class: inverse, center

# R for data analysis 🚀<br>
<br>
## 🔧 dplyr tips 💻
<br>
<br>
<br>
.large[Roxana N. Villafañe | LEMyP | <a href='http://twitter.com/data_datum'>@data_datum</a>]
<br>
.large[Florencia D'Andrea | INTA-CONICET | <a href="http://twitter.com/cantoflor_87">@cantoflor_87</a><br>] ✨
<br>
Course website: <https://flor14.github.io/Curso_r_unne_2020/> 🌟

---

<img src="img/hex-dplyr.png" width="10%" align="right" />

# Tip 1: selecting the same columns several times

```r
library(dplyr)
library(gapminder)

# store the column names in a character vector
cols <- c("country", "lifeExp", "gdpPercap")

gapminder %>%
  select(!!cols) # select the columns named in the vector
```

```
## # A tibble: 1,704 x 3
##    country     lifeExp gdpPercap
##    <fct>         <dbl>     <dbl>
##  1 Afghanistan    28.8      779.
##  2 Afghanistan    30.3      821.
##  3 Afghanistan    32.0      853.
##  4 Afghanistan    34.0      836.
##  5 Afghanistan    36.1      740.
##  6 Afghanistan    38.4      786.
##  7 Afghanistan    39.9      978.
##  8 Afghanistan    40.8      852.
##  9 Afghanistan    41.7      649.
## 10 Afghanistan    41.8      635.
## # … with 1,694 more rows
```

---

<img src="img/hex-dplyr.png" width="10%" align="right" />

# Tip 2: selecting columns with a regular expression (regex)

```r
gapminder %>%
  select(matches("gdp")) %>% # the regular expression is "gdp"
  head()
```

```
## # A tibble: 6 x 1
##   gdpPercap
##       <dbl>
## 1      779.
## 2      821.
## 3      853.
## 4      836.
## 5      740.
## 6      786.
```

---

<img src="img/hex-dplyr.png" width="10%" align="right" />

# Tip 3: reordering columns

```r
# the named columns go first; everything() keeps the rest in their original order
gapminder %>%
  select("lifeExp", "gdpPercap", everything()) %>%
  head()
```

```
## # A tibble: 6 x 6
##   lifeExp gdpPercap country     continent  year      pop
##     <dbl>     <dbl> <fct>       <fct>     <int>    <int>
## 1    28.8      779. Afghanistan Asia       1952  8425333
## 2    30.3      821. Afghanistan Asia       1957  9240934
## 3    32.0      853. Afghanistan Asia       1962 10267083
## 4    34.0      836. Afghanistan Asia       1967 11537966
## 5    36.1      740. Afghanistan Asia       1972 13079460
## 6    38.4      786. Afghanistan Asia       1977 14880372
```

---

<img src="img/hex-dplyr.png" width="10%" align="right" />

# Tip 4: dropping a column

Call `select()` with the column name preceded by a minus sign.

```r
gapminder %>%
  select(-pop)
```

```
## # A tibble: 1,704 x 5
##    country     continent  year lifeExp gdpPercap
##    <fct>       <fct>     <int>   <dbl>     <dbl>
##  1 Afghanistan Asia       1952    28.8      779.
##  2 Afghanistan Asia       1957    30.3      821.
##  3 Afghanistan Asia       1962    32.0      853.
##  4 Afghanistan Asia       1967    34.0      836.
##  5 Afghanistan Asia       1972    36.1      740.
##  6 Afghanistan Asia       1977    38.4      786.
##  7 Afghanistan Asia       1982    39.9      978.
##  8 Afghanistan Asia       1987    40.8      852.
##  9 Afghanistan Asia       1992    41.7      649.
## 10 Afghanistan Asia       1997    41.8      635.
## # … with 1,694 more rows
```

---

<img src="img/hex-dplyr.png" width="10%" align="right" />

# `select_all()`

- Selects every column and applies a function to all of the column names

```r
gapminder %>%
  select_all(toupper) %>%
  head()
```

```
## # A tibble: 6 x 6
##   COUNTRY     CONTINENT  YEAR LIFEEXP      POP GDPPERCAP
##   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
## 1 Afghanistan Asia       1952    28.8  8425333      779.
## 2 Afghanistan Asia       1957    30.3  9240934      821.
## 3 Afghanistan Asia       1962    32.0 10267083      853.
## 4 Afghanistan Asia       1967    34.0 11537966      836.
## 5 Afghanistan Asia       1972    36.1 13079460      740.
## 6 Afghanistan Asia       1977    38.4 14880372      786.
```

- To convert every column name to lower case instead (note that this does not restore mixed-case names such as `lifeExp`)

```r
gapminder %>%
  select_all(tolower)
```

---

<img src="img/hex-dplyr.png" width="10%" align="right" />

# Tip 5: use `between()` to specify ranges

- `between(x, left, right)` is shorthand for `x >= left & x <= right`; here it is combined with `filter()`:

```r
gapminder %>%
  select(country, lifeExp, year) %>%
  filter(between(lifeExp, 60, 85)) %>%
  head()
```

```
## # A tibble: 6 x 3
##   country lifeExp  year
##   <fct>     <dbl> <int>
## 1 Albania    64.8  1962
## 2 Albania    66.2  1967
## 3 Albania    67.7  1972
## 4 Albania    68.9  1977
## 5 Albania    70.4  1982
## 6 Albania    72    1987
```

---

<img src="img/hex-dplyr.png" width="10%" align="right" />

# Tip 6: to keep only the new column, use `transmute()`

```r
gapminder %>%
  transmute(gdp = pop * gdpPercap) %>%
  head()
```

```
## # A tibble: 6 x 1
##            gdp
##          <dbl>
## 1  6567086330.
## 2  7585448670.
## 3  8758855797.
## 4  9648014150.
## 5  9678553274.
## 6 11697659231.
```

---

<img src="img/hex-dplyr.png" width="10%" align="right" />

# Useful functions to combine with `summarise()`:

.pull-left[
#### base R
<table>
 <thead>
  <tr>
   <th style="text-align:left;"> functions </th>
   <th style="text-align:left;"> description </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> min(), max() </td>
   <td style="text-align:left;"> minimum and maximum values </td>
  </tr>
  <tr>
   <td style="text-align:left;"> mean() </td>
   <td style="text-align:left;"> mean </td>
  </tr>
  <tr>
   <td style="text-align:left;"> median() </td>
   <td style="text-align:left;"> median </td>
  </tr>
  <tr>
   <td style="text-align:left;"> sum() </td>
   <td style="text-align:left;"> sum of the values </td>
  </tr>
  <tr>
   <td style="text-align:left;"> var(), sd() </td>
   <td style="text-align:left;"> variance and standard deviation </td>
  </tr>
</tbody>
</table>
]

.pull-right[
#### dplyr
<table>
 <thead>
  <tr>
   <th style="text-align:left;"> functions </th>
   <th style="text-align:left;"> description </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> first() </td>
   <td style="text-align:left;"> first value of a vector </td>
  </tr>
  <tr>
   <td style="text-align:left;"> last() </td>
   <td style="text-align:left;"> last value of a vector </td>
  </tr>
  <tr>
   <td style="text-align:left;"> n() </td>
   <td style="text-align:left;"> number of rows in the current group (takes no arguments) </td>
  </tr>
  <tr>
   <td style="text-align:left;"> n_distinct() </td>
   <td style="text-align:left;"> number of distinct values in a vector </td>
  </tr>
  <tr>
   <td style="text-align:left;"> nth() </td>
   <td style="text-align:left;"> value at position n of a vector </td>
  </tr>
</tbody>
</table>
]

---

<img src="img/hex-dplyr.png" width="10%" align="right" />

# `summarise_all()`

- Takes a function that is applied to every non-grouping column

```r
iris %>%
  group_by(Species) %>%
  summarise_all(mean) %>%
  head()
```

```
## # A tibble: 3 x 5
##   Species    Sepal.Length Sepal.Width Petal.Length Petal.Width
##   <fct>             <dbl>       <dbl>        <dbl>       <dbl>
## 1 setosa             5.01        3.43         1.46       0.246
## 2 versicolor         5.94        2.77         4.26       1.33 
## 3 virginica          6.59        2.97         5.55       2.03
```

---

<img src="img/hex-dplyr.png" width="10%" align="right" />

# `summarise_at()`

- Takes two arguments: the columns to summarise, and the function used to summarise them.

```r
iris %>%
  group_by(Species) %>%
  summarise_at(vars(contains("Sepal")), mean)
```

```
## # A tibble: 3 x 3
##   Species    Sepal.Length Sepal.Width
##   <fct>             <dbl>       <dbl>
## 1 setosa             5.01        3.43
## 2 versicolor         5.94        2.77
## 3 virginica          6.59        2.97
```

```r
# summarises the variables whose names contain "Sepal"
```

---

<img src="img/hex-dplyr.png" width="10%" align="right" />

# `summarise_if()`:

- Takes two arguments: a predicate that picks the columns (here `is.numeric`) and the function applied to them

```r
gapminder %>%
  group_by(continent) %>%
  summarise_if(is.numeric, mean)
```

```
## # A tibble: 5 x 5
##   continent  year lifeExp       pop gdpPercap
##   <fct>     <dbl>   <dbl>     <dbl>     <dbl>
## 1 Africa    1980.    48.9  9916003.     2194.
## 2 Americas  1980.    64.7 24504795.     7136.
## 3 Asia      1980.    60.1 77038722.     7902.
## 4 Europe    1980.    71.9 17169765.    14469.
## 5 Oceania   1980.    74.3  8874672.    18622.
```

---

<img src="img/hex-dplyr.png" width="10%" align="right" />

# If in doubt

### We can consult the documentation

--

```r
?dplyr::select
?dplyr::filter
?dplyr::mutate
?dplyr::arrange
?dplyr::summarise
?dplyr::group_by
```

---

```r
devtools::session_info()
```

```
## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 3.6.3 (2020-02-29)
##  os       Ubuntu 20.04 LTS
##  system   x86_64, linux-gnu
##  ui       X11
##  language es_AR:es
##  collate  es_AR.UTF-8
##  ctype    es_AR.UTF-8
##  tz       America/Argentina/Cordoba
##  date     2020-06-05
##
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package     * version    date       lib source
##  assertthat    0.2.1      2019-03-21 [1] CRAN (R 3.6.3)
##  backports     1.1.7      2020-05-13 [1] CRAN (R 3.6.3)
##  callr         3.4.3      2020-03-28 [1] CRAN (R 3.6.3)
##  cli           2.0.2      2020-02-28 [1] CRAN (R 3.6.3)
##  crayon        1.3.4      2017-09-16 [1] CRAN (R 3.6.3)
##  desc          1.2.0      2018-05-01 [1] CRAN (R 3.6.3)
##  devtools      2.2.2      2020-02-17 [3] CRAN (R 3.6.3)
##  digest        0.6.25     2020-02-23 [1] CRAN (R 3.6.3)
##  dplyr       * 0.8.5      2020-03-07 [1] CRAN (R 3.6.3)
##  ellipsis      0.3.1      2020-05-15 [1] CRAN (R 3.6.3)
##  emo         * 0.0.0.9000 2020-05-12 [1] Github (hadley/emo@3f03b11)
##  evaluate      0.14       2019-05-28 [1] CRAN (R 3.6.3)
##  fansi         0.4.1      2020-01-08 [1] CRAN (R 3.6.3)
##  fontawesome * 0.1.0      2020-05-12 [1] Github (rstudio/fontawesome@2b64e31)
##  fs            1.4.1      2020-04-04 [1] CRAN (R 3.6.3)
##  gapminder   * 0.3.0      2017-10-31 [1] CRAN (R 3.6.3)
##  generics      0.0.2      2018-11-29 [1] CRAN (R 3.6.3)
##  glue          1.4.1      2020-05-13 [1] CRAN (R 3.6.3)
##  highr         0.8        2019-03-20 [1] CRAN (R 3.6.3)
##  htmltools     0.4.0      2019-10-04 [1] CRAN (R 3.6.3)
##  knitr         1.28       2020-02-06 [1] CRAN (R 3.6.3)
##  lifecycle     0.2.0      2020-03-06 [1] CRAN (R 3.6.3)
##  lubridate     1.7.8      2020-04-06 [1] CRAN (R 3.6.3)
##  magrittr      1.5        2014-11-22 [1] CRAN (R 3.6.3)
##  memoise       1.1.0      2017-04-21 [3] CRAN (R 3.5.0)
##  pillar        1.4.4      2020-05-05 [1] CRAN (R 3.6.3)
##  pkgbuild      1.0.8      2020-05-07 [1] CRAN (R 3.6.3)
##  pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 3.6.3)
##  pkgload       1.1.0      2020-05-29 [1] CRAN (R 3.6.3)
##  prettyunits   1.1.1      2020-01-24 [1] CRAN (R 3.6.3)
##  processx      3.4.2      2020-02-09 [1] CRAN (R 3.6.3)
##  ps            1.3.3      2020-05-08 [1] CRAN (R 3.6.3)
##  purrr         0.3.4      2020-04-17 [1] CRAN (R 3.6.3)
##  R6            2.4.1      2019-11-12 [1] CRAN (R 3.6.3)
##  Rcpp          1.0.4.6    2020-04-09 [1] CRAN (R 3.6.3)
##  remotes       2.1.1      2020-02-15 [1] CRAN (R 3.6.3)
##  rlang         0.4.6      2020-05-02 [1] CRAN (R 3.6.3)
##  rmarkdown     2.1        2020-01-20 [1] CRAN (R 3.6.3)
##  rprojroot     1.3-2      2018-01-03 [1] CRAN (R 3.6.3)
##  sessioninfo   1.1.1      2018-11-05 [3] CRAN (R 3.5.1)
##  stringi       1.4.6      2020-02-17 [1] CRAN (R 3.6.3)
##  stringr       1.4.0      2019-02-10 [1] CRAN (R 3.6.3)
##  testthat      2.3.2      2020-03-02 [1] CRAN (R 3.6.3)
##  tibble        3.0.1      2020-04-20 [1] CRAN (R 3.6.3)
##  tidyselect    1.0.0      2020-01-27 [1] CRAN (R 3.6.3)
##  usethis       1.5.1      2019-07-04 [3] CRAN (R 3.6.2)
##  utf8          1.1.4      2018-05-24 [1] CRAN (R 3.6.3)
##  vctrs         0.3.1      2020-06-05 [1] CRAN (R 3.6.3)
##  withr         2.2.0      2020-04-20 [1] CRAN (R 3.6.3)
##  xaringan      0.16       2020-03-31 [1] CRAN (R 3.6.3)
##  xfun          0.13       2020-04-13 [1] CRAN (R 3.6.3)
##  yaml          2.2.1      2020-02-01 [1] CRAN (R 3.6.3)
##
## [1] /home/roxana/R/x86_64-pc-linux-gnu-library/3.6
## [2] /usr/local/lib/R/site-library
## [3] /usr/lib/R/site-library
## [4] /usr/lib/R/library
```
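
---

# Extra: helper functions inside `summarise()`

The slide listing functions to combine with `summarise()` has no example of its own, so here is a minimal sketch of how several of them might be used in one call. It assumes the built-in `iris` data; the object name `sepal_summary` and the output column names are illustrative and do not come from the original slides.

```r
library(dplyr)

# One row per species, combining dplyr helpers with a base R function
sepal_summary <- iris %>%
  group_by(Species) %>%
  summarise(
    n_rows           = n(),                       # rows in the current group
    distinct_lengths = n_distinct(Sepal.Length),  # distinct values in the group
    first_length     = first(Sepal.Length),       # first value in the group
    third_length     = nth(Sepal.Length, 3),      # value at position 3
    last_length      = last(Sepal.Length),        # last value in the group
    mean_length      = mean(Sepal.Length)         # base R functions work as well
  )

sepal_summary
```

Because the data are grouped first, each helper is evaluated once per species, so the result has one row for each of the three species.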