class: center, middle, inverse, title-slide

# Numerical Ecology
## Lecture 3 - Cluster analysis
### Felipe Melo
### Laboratório de Ecologia Aplicada - UFPE
### 2021-12-03

---

# Cluster analysis

Today we need to learn how to interpret this:

<img src="https://www.datanovia.com/en/wp-content/uploads/dn-tutorials/003-hierarchical-clustering-in-r/figures/005-visualizing-dendrograms-cutree-1.png" width="600" height="420" />

---

# Or this

<img src="https://1.bp.blogspot.com/-sxbeFY-yzCo/TgTJsykT0kI/AAAAAAAAC5o/hZ5zF45pzs4/s1600/heatmap_cluster1.png">

---

# Or this one

<img src="https://analise-estatistica.pt/wp-content/uploads/2012/10/diagrama-de-dispersao-analise-de-clusters.png">

---

# Cluster analysis

### What is it?

- Techniques that identify "similar" objects or entities

.center[
<img src="https://i.pinimg.com/originals/fc/60/55/fc6055ddb5fa9d6f5aeed0ea6201ef77.jpg" height="400" />]

---

# Cluster analysis

### What is it for?

- To form groups (classes) with high internal homogeneity (small distances within groups) and high heterogeneity between groups (large distances between groups)

.center[
<img src="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQ2BjBaFQJFYCI9sS2fBoqDfchgW6WBoVBWdA&usqp=CAU">]

[Source: Neves 2013](https://dspace.bc.uepb.edu.br/jspui/bitstream/123456789/4313/1/PDF%20-%20Reginaldo%20Ferreira%20Neves.pdf)

---

# Principles of cluster analysis

### Decisions left to the researcher:

a) Choose the similarity (or distance) measure

b) Choose the clustering algorithm

c) Define a number of clusters that makes sense

---

# Distance measures

.pull-left[<img src="https://bigdata-madesimple.com/wp-content/uploads/2015/06/Five-most-popular-similarity-measures-implementation-in-python-1.png" height="300" />]

.pull-right[
### - Chosen according to the structure of the data

### - The variables usually need to be "normalized" (e.g. standardized) so that those measured on large scales do not dominate the distance

### - **VERY IMPORTANT** for the outcome of the analysis
]

---

# Clustering algorithms

### Hierarchical methods

.center[
<img
src="https://github.com/fplmelo/eco_numerica/blob/master/slides/libs/met_clust.png?raw=true" height="450" />]

[Source: Marcelo Lauretto](http://www.each.usp.br/lauretto/cursoR2017/04-AnaliseCluster.pdf)

---

# Clustering algorithms

### Non-hierarchical methods (K-means)

.panelset[
.panel[.panel-name[R Code1]

```r
# k-means on env without columns 1, 4 and 12: 3 clusters and 25 random
# starts (the solution with the lowest total within-cluster SS is kept).
# Note: the variables are not standardized, so large-scale variables such
# as elevation (ele) dominate the Euclidean distances.
k3 <- kmeans(env[, -c(1, 4, 12)], centers = 3, nstart = 25)
k3
```

```
## K-means clustering with 3 clusters of sizes 14, 7, 9
##
## Cluster means:
##        ele      slo       pH      har       pho      nit         amm       oxy
## 1 248.7143 0.700000 8.028571 94.57143 1.0050000 2.795714 0.402142857  8.128571
## 2 464.4286 3.057143 8.100000 90.57143 0.1900000 1.085714 0.008571429 11.485714
## 3 857.1111 8.188889 8.044444 69.44444 0.1477778 0.320000 0.065555556  9.722222
##        bod
## 1 6.864286
## 2 3.000000
## 3 4.044444
##
## Clustering vector:
##  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
##  3  3  3  3  3  3  3  3  3  2  2  2  2  2  2  2  1  1  1  1  1  1  1  1  1  1
## 27 28 29 30
##  1  1  1  1
##
## Within cluster sum of squares by cluster:
## [1] 40309.58 35691.36 35819.23
##  (between_SS / total_SS =  94.8 %)
##
## Available components:
##
## [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
## [6] "betweenss"    "size"         "iter"         "ifault"
```

```r
# factoextra provides functions to visualize clustering results
library(factoextra)
```

```
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
```
]

.panel[.panel-name[Plot1]
<img src="slide_aula3_cluster_files/figure-html/unnamed-chunk-3-1.png" width="400px" />
]

.panel[.panel-name[R Code2]

```r
# Same variables, now partitioned into 2 clusters
k2 <- kmeans(env[, -c(1, 4, 12)], centers = 2, nstart = 25)
k2
```

```
## K-means clustering with 2 clusters of sizes 20, 10
##
## Cluster means:
##     ele   slo   pH  har   pho    nit   amm  oxy  bod
## 1 305.8 1.065 8.07 93.8 0.767 2.2995 0.284 9.21 5.64
## 2 833.1 8.360 8.01 70.7 0.139 0.3630 0.060 9.75 4.07
##
## Clustering vector:
##  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
##  2  2  2  2  2  2  2  2  2  2  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
## 27 28 29 30
##  1  1  1  1
##
## Within cluster sum of squares by cluster:
## [1] 200947.72  87852.16
##  (between_SS / total_SS =  86.5 %)
##
## Available components:
##
## [1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss"
## [6] "betweenss"    "size"         "iter"         "ifault"
```
]

.panel[.panel-name[Plot2]
<img src="slide_aula3_cluster_files/figure-html/unnamed-chunk-5-1.png" width="400px" />
]
]

---

# Applications in Ecology

<img src="https://media.springernature.com/full/springer-static/image/art%3A10.1038%2Fs41598-020-69925-9/MediaObjects/41598_2020_69925_Fig1_HTML.png?as=webp" height="400" />

[Yam diversity, Darkwa et al 2020](https://www.nature.com/articles/s41598-020-69925-9)

---

# Applications in Ecology

<img src="https://media.springernature.com/lw685/springer-static/image/art%3A10.1007%2Fs10021-019-00415-4/MediaObjects/10021_2019_415_Fig3_HTML.png" height="400" />

[Vallejos et al 2020](https://link.springer.com/article/10.1007/s10021-019-00415-4)

---

# Applications in Ecology

<img src="https://www.pnas.org/content/pnas/115/8/1837/F1.large.jpg?width=800&height=600&carousel=1" height="400" />

[Slik et al 2018](https://www.pnas.org/content/115/8/1837/tab-figures-data)

---

class: center, middle

# THE END