Relational Data with dplyr: Mutating Joints and Filtering Joints

Key words are key, value, inner_join(), left_join(), right_join(), full_join, semi_join(), and anti_join(). Read a book written by Hadley Wickham and Garrett Grolemund.

keys and values

# How to count number with each cylinder category?
mtcars %>% group_by(cyl) %>% summarise(n=n()) # my old way
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 3 x 2
##     cyl     n
##   <dbl> <int>
## 1     4    11
## 2     6     7
## 3     8    14
mtcars %>% count(cyl) # ! 
##   cyl  n
## 1   4 11
## 2   6  7
## 3   8 14
# multiple keys
mtcars %>% count(cyl,vs) 
##   cyl vs  n
## 1   4  0  1
## 2   4  1 10
## 3   6  0  3
## 4   6  1  4
## 5   8  0 14
# explanation of 11 variables in mtcars data
?mtcars

Usage of inner_join(), left_join(), right_join(),full_join,semi_join(),anti_join()

  • power of %in% in base R, but pipe is not available.
  • Mutating Joints (inner_join(), left_join(), right_join(),full_join) and Filtering Joints (semi_join() and anti_join()) are useful.
  • Filtering Joints are substitute of %in%.
# six cylinder cars
mtcars.6cyl <- mtcars %>% rownames_to_column() %>% filter(cyl==6)
# heavy cars
mtcars.heavy <- mtcars %>% rownames_to_column() %>% filter(wt>3)
# six cylinder and heavy
## traditional way
mtcars.6cyl[mtcars.6cyl$rowname %in% mtcars.heavy$rowname,] 
##          rowname  mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## 3 Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## 4        Valiant 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## 5       Merc 280 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## 6      Merc 280C 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
## inner_joint()
mtcars.6cyl %>% inner_join(mtcars.heavy,by="rowname") # disaster of columns
##          rowname mpg.x cyl.x disp.x hp.x drat.x  wt.x qsec.x vs.x am.x gear.x
## 1 Hornet 4 Drive  21.4     6  258.0  110   3.08 3.215  19.44    1    0      3
## 2        Valiant  18.1     6  225.0  105   2.76 3.460  20.22    1    0      3
## 3       Merc 280  19.2     6  167.6  123   3.92 3.440  18.30    1    0      4
## 4      Merc 280C  17.8     6  167.6  123   3.92 3.440  18.90    1    0      4
##   carb.x mpg.y cyl.y disp.y hp.y drat.y  wt.y qsec.y vs.y am.y gear.y carb.y
## 1      1  21.4     6  258.0  110   3.08 3.215  19.44    1    0      3      1
## 2      1  18.1     6  225.0  105   2.76 3.460  20.22    1    0      3      1
## 3      4  19.2     6  167.6  123   3.92 3.440  18.30    1    0      4      4
## 4      4  17.8     6  167.6  123   3.92 3.440  18.90    1    0      4      4
## semi_joint()
mtcars.6cyl %>% semi_join(mtcars.heavy,by="rowname") # Columns are not modified.
##          rowname  mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## 1 Hornet 4 Drive 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
## 2        Valiant 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
## 3       Merc 280 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
## 4      Merc 280C 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4

select()

mtcars %>% dplyr::select(matches("m"))
##                      mpg am
## Mazda RX4           21.0  1
## Mazda RX4 Wag       21.0  1
## Datsun 710          22.8  1
## Hornet 4 Drive      21.4  0
## Hornet Sportabout   18.7  0
## Valiant             18.1  0
## Duster 360          14.3  0
## Merc 240D           24.4  0
## Merc 230            22.8  0
## Merc 280            19.2  0
## Merc 280C           17.8  0
## Merc 450SE          16.4  0
## Merc 450SL          17.3  0
## Merc 450SLC         15.2  0
## Cadillac Fleetwood  10.4  0
## Lincoln Continental 10.4  0
## Chrysler Imperial   14.7  0
## Fiat 128            32.4  1
## Honda Civic         30.4  1
## Toyota Corolla      33.9  1
## Toyota Corona       21.5  0
## Dodge Challenger    15.5  0
## AMC Javelin         15.2  0
## Camaro Z28          13.3  0
## Pontiac Firebird    19.2  0
## Fiat X1-9           27.3  1
## Porsche 914-2       26.0  1
## Lotus Europa        30.4  1
## Ford Pantera L      15.8  1
## Ferrari Dino        19.7  1
## Maserati Bora       15.0  1
## Volvo 142E          21.4  1
mtcars %>% dplyr::select(ends_with("m"))
##                     am
## Mazda RX4            1
## Mazda RX4 Wag        1
## Datsun 710           1
## Hornet 4 Drive       0
## Hornet Sportabout    0
## Valiant              0
## Duster 360           0
## Merc 240D            0
## Merc 230             0
## Merc 280             0
## Merc 280C            0
## Merc 450SE           0
## Merc 450SL           0
## Merc 450SLC          0
## Cadillac Fleetwood   0
## Lincoln Continental  0
## Chrysler Imperial    0
## Fiat 128             1
## Honda Civic          1
## Toyota Corolla       1
## Toyota Corona        0
## Dodge Challenger     0
## AMC Javelin          0
## Camaro Z28           0
## Pontiac Firebird     0
## Fiat X1-9            1
## Porsche 914-2        1
## Lotus Europa         1
## Ford Pantera L       1
## Ferrari Dino         1
## Maserati Bora        1
## Volvo 142E           1

Session info

sessionInfo()
## R version 3.6.2 (2019-12-12)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.6
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] rlang_0.4.10    forcats_0.5.0   stringr_1.4.0   dplyr_1.0.2    
##  [5] purrr_0.3.4     readr_1.4.0     tidyr_1.1.2     tibble_3.0.4   
##  [9] ggplot2_3.3.3   tidyverse_1.3.0
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.6        cellranger_1.1.0  pillar_1.4.7      compiler_3.6.2   
##  [5] dbplyr_2.0.0      tools_3.6.2       digest_0.6.27     lubridate_1.7.9.2
##  [9] jsonlite_1.7.2    evaluate_0.14     lifecycle_0.2.0   gtable_0.3.0     
## [13] pkgconfig_2.0.3   reprex_0.3.0      cli_2.2.0         rstudioapi_0.13  
## [17] DBI_1.1.0         yaml_2.2.1        blogdown_1.2.2    haven_2.3.1      
## [21] xfun_0.22         withr_2.3.0       xml2_1.3.2        httr_1.4.2       
## [25] knitr_1.31        fs_1.5.0          hms_0.5.3         generics_0.1.0   
## [29] vctrs_0.3.6       grid_3.6.2        tidyselect_1.1.0  glue_1.4.2       
## [33] R6_2.5.0          fansi_0.4.1       readxl_1.3.1      rmarkdown_2.7    
## [37] bookdown_0.21     modelr_0.1.8      magrittr_2.0.1    backports_1.2.1  
## [41] scales_1.1.1      ellipsis_0.3.1    htmltools_0.5.1.1 rvest_0.3.6      
## [45] assertthat_0.2.1  colorspace_2.0-0  utf8_1.1.4        stringi_1.5.3    
## [49] munsell_0.5.0     broom_0.7.3       crayon_1.3.4

R books

R and bioinformatics