Relational Data with dplyr: Mutating Joints and Filtering Joints
Key words are key, value, inner_join(), left_join(), right_join(), full_join, semi_join(), and anti_join(). Read a book written by Hadley Wickham and Garrett Grolemund.
keys and values
# How to count number with each cylinder category?
mtcars %>% group_by(cyl) %>% summarise(n=n()) # my old way
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 3 x 2
## cyl n
## <dbl> <int>
## 1 4 11
## 2 6 7
## 3 8 14
mtcars %>% count(cyl) # !
## cyl n
## 1 4 11
## 2 6 7
## 3 8 14
# multiple keys
mtcars %>% count(cyl,vs)
## cyl vs n
## 1 4 0 1
## 2 4 1 10
## 3 6 0 3
## 4 6 1 4
## 5 8 0 14
# explanation of 11 variables in mtcars data
?mtcars
Usage of inner_join(), left_join(), right_join(),full_join,semi_join(),anti_join()
- power of %in% in base R, but pipe is not available.
- Mutating Joints (inner_join(), left_join(), right_join(),full_join) and Filtering Joints (semi_join() and anti_join()) are useful.
- Filtering Joints are substitute of %in%.
# six cylinder cars
mtcars.6cyl <- mtcars %>% rownames_to_column() %>% filter(cyl==6)
# heavy cars
mtcars.heavy <- mtcars %>% rownames_to_column() %>% filter(wt>3)
# six cylinder and heavy
## traditional way
mtcars.6cyl[mtcars.6cyl$rowname %in% mtcars.heavy$rowname,]
## rowname mpg cyl disp hp drat wt qsec vs am gear carb
## 3 Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## 4 Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## 5 Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## 6 Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## inner_joint()
mtcars.6cyl %>% inner_join(mtcars.heavy,by="rowname") # disaster of columns
## rowname mpg.x cyl.x disp.x hp.x drat.x wt.x qsec.x vs.x am.x gear.x
## 1 Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3
## 2 Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3
## 3 Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4
## 4 Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4
## carb.x mpg.y cyl.y disp.y hp.y drat.y wt.y qsec.y vs.y am.y gear.y carb.y
## 1 1 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## 2 1 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## 3 4 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## 4 4 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
## semi_joint()
mtcars.6cyl %>% semi_join(mtcars.heavy,by="rowname") # Columns are not modified.
## rowname mpg cyl disp hp drat wt qsec vs am gear carb
## 1 Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
## 2 Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
## 3 Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
## 4 Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4
select()
mtcars %>% dplyr::select(matches("m"))
## mpg am
## Mazda RX4 21.0 1
## Mazda RX4 Wag 21.0 1
## Datsun 710 22.8 1
## Hornet 4 Drive 21.4 0
## Hornet Sportabout 18.7 0
## Valiant 18.1 0
## Duster 360 14.3 0
## Merc 240D 24.4 0
## Merc 230 22.8 0
## Merc 280 19.2 0
## Merc 280C 17.8 0
## Merc 450SE 16.4 0
## Merc 450SL 17.3 0
## Merc 450SLC 15.2 0
## Cadillac Fleetwood 10.4 0
## Lincoln Continental 10.4 0
## Chrysler Imperial 14.7 0
## Fiat 128 32.4 1
## Honda Civic 30.4 1
## Toyota Corolla 33.9 1
## Toyota Corona 21.5 0
## Dodge Challenger 15.5 0
## AMC Javelin 15.2 0
## Camaro Z28 13.3 0
## Pontiac Firebird 19.2 0
## Fiat X1-9 27.3 1
## Porsche 914-2 26.0 1
## Lotus Europa 30.4 1
## Ford Pantera L 15.8 1
## Ferrari Dino 19.7 1
## Maserati Bora 15.0 1
## Volvo 142E 21.4 1
mtcars %>% dplyr::select(ends_with("m"))
## am
## Mazda RX4 1
## Mazda RX4 Wag 1
## Datsun 710 1
## Hornet 4 Drive 0
## Hornet Sportabout 0
## Valiant 0
## Duster 360 0
## Merc 240D 0
## Merc 230 0
## Merc 280 0
## Merc 280C 0
## Merc 450SE 0
## Merc 450SL 0
## Merc 450SLC 0
## Cadillac Fleetwood 0
## Lincoln Continental 0
## Chrysler Imperial 0
## Fiat 128 1
## Honda Civic 1
## Toyota Corolla 1
## Toyota Corona 0
## Dodge Challenger 0
## AMC Javelin 0
## Camaro Z28 0
## Pontiac Firebird 0
## Fiat X1-9 1
## Porsche 914-2 1
## Lotus Europa 1
## Ford Pantera L 1
## Ferrari Dino 1
## Maserati Bora 1
## Volvo 142E 1
Session info
sessionInfo()
## R version 3.6.2 (2019-12-12)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.6
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] rlang_0.4.10 forcats_0.5.0 stringr_1.4.0 dplyr_1.0.2
## [5] purrr_0.3.4 readr_1.4.0 tidyr_1.1.2 tibble_3.0.4
## [9] ggplot2_3.3.3 tidyverse_1.3.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.6 cellranger_1.1.0 pillar_1.4.7 compiler_3.6.2
## [5] dbplyr_2.0.0 tools_3.6.2 digest_0.6.27 lubridate_1.7.9.2
## [9] jsonlite_1.7.2 evaluate_0.14 lifecycle_0.2.0 gtable_0.3.0
## [13] pkgconfig_2.0.3 reprex_0.3.0 cli_2.2.0 rstudioapi_0.13
## [17] DBI_1.1.0 yaml_2.2.1 blogdown_1.2.2 haven_2.3.1
## [21] xfun_0.22 withr_2.3.0 xml2_1.3.2 httr_1.4.2
## [25] knitr_1.31 fs_1.5.0 hms_0.5.3 generics_0.1.0
## [29] vctrs_0.3.6 grid_3.6.2 tidyselect_1.1.0 glue_1.4.2
## [33] R6_2.5.0 fansi_0.4.1 readxl_1.3.1 rmarkdown_2.7
## [37] bookdown_0.21 modelr_0.1.8 magrittr_2.0.1 backports_1.2.1
## [41] scales_1.1.1 ellipsis_0.3.1 htmltools_0.5.1.1 rvest_0.3.6
## [45] assertthat_0.2.1 colorspace_2.0-0 utf8_1.1.4 stringi_1.5.3
## [49] munsell_0.5.0 broom_0.7.3 crayon_1.3.4