The basic features of R are incredibly useful for someone coming from a non-R background, but experienced R users have sought faster and more effective data manipulation and improved plotting abilities, as well as multimedia options.
Hadley Wickham’s contribution to the R universe includes his ‘tidyverse’ concept and the associated set of packages. These packages contain R functions designed around a common data philosophy and graphical grammar. The tidyverse is a framework built on data concepts like ‘tidy data’ and ‘tibbles’ (extensions of R’s native data.frame). The standard for ‘tidy data’, as defined by Hadley, is the basic rectangular form on which much of R is based:
“Tidy data is data where:
- Each variable is in a column.
- Each observation is a row.
- Each value is a cell.”
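As a small illustration of that definition, here is a sketch in base R. The data set and its column names (country, year, cases) are invented for this example and do not come from any real source.

```r
# Hypothetical data: one row per observation, one column per variable.
tidy <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(2019, 2020, 2019, 2020),
  cases   = c(10, 12, 7, 9)
)

# Because each variable is its own column, column-wise operations are direct.
mean_cases <- mean(tidy$cases)

# Because each observation is its own row, filtering rows selects observations.
only_2020 <- tidy[tidy$year == 2020, ]
```

The same values stored with one column per year would not be tidy in this sense, because the variable ‘year’ would be spread across column names.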
The tidyverse packages can all be installed collectively from within RStudio with install.packages("tidyverse"), or you can install each one separately. Once the tidyverse is installed, the R command library(tidyverse) attaches the core packages of the collection in a single step.
It is useful to know the history and naming conventions of the tidyverse packages. Their function names are separated by an underscore instead of the dot that Base R uses. For this reason, the tidyverse has data_frame instead of data.frame (data_frame has since been deprecated in favour of tibble), read_csv instead of read.csv, write_csv instead of write.csv, and so on. reshape2 was a rewrite of the earlier reshape package. ‘dplyr’ was the 2014 evolution of an earlier package called plyr, also by Hadley.
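The naming difference can be seen side by side in a short sketch. It assumes the readr package is installed; the temporary file and the small data frame are made up for the example.

```r
library(readr)

# Hypothetical data written to a temporary CSV file.
path <- tempfile(fileext = ".csv")
df <- data.frame(id = 1:3, name = c("a", "b", "c"))
write_csv(df, path)                          # readr: underscore naming

base_df <- read.csv(path)                    # Base R: dot naming, returns a data.frame
tidy_df <- read_csv(path, col_types = cols()) # readr: returns a tibble

class(base_df)
class(tidy_df)
```

Passing col_types = cols() asks readr to guess the column types without printing its usual message.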
1. dplyr provides another abstraction layer for data, introducing key verbs such as mutate, filter and summarise, and the key adverb “by group” (group_by). See Hadley’s own take on this.
Grouping and aggregating in Base R (for example with aggregate) often took several steps to implement, and dplyr makes these operations more concise. The trade-off is that a chain of verbs usually spans more than one line. There is some discussion about the pros and cons of dplyr versus data.table.
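A minimal sketch of these verbs, using the mtcars data set that ships with R and assuming dplyr is installed. The derived column power_to_weight is invented purely for illustration.

```r
library(dplyr)

# filter keeps rows, mutate adds a column.
strong <- mtcars %>%
  filter(hp > 100) %>%
  mutate(power_to_weight = hp / wt)

# group_by plus summarise gives one row per group.
by_cyl <- strong %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), cars = n())

# A Base R equivalent of the grouped mean, for comparison.
base_by_cyl <- aggregate(mpg ~ cyl, data = mtcars[mtcars$hp > 100, ], FUN = mean)
```

Both approaches give the same group means; the dplyr version reads as a sequence of steps, which is where the extra lines come from.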
2. tibble is the name of a package, but it is also the name of a data object/concept that is central to the tidyverse. In scripts it shows up as the class ‘tbl_df’ (tibble data frame). There is online PDF documentation.
Tibbles are an alternative to the data.table. Tibbles extend R’s native data frames, not data.table objects (which come from a separate package by a different author). Tibbles, in fact, form the central data type of the ‘tidyverse’. Data frames, lists and vectors can be converted to tibbles with as_tibble() (for vectors, enframe() is the recommended route).
Note that a tibble can be the output of a ‘tidyverse’ command such as ‘data_frame’, but despite the similar name the result is NOT a plain Base R data.frame.
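The relationship between the two types can be checked directly. This is a small sketch assuming the tibble package is installed; the example data frame is made up.

```r
library(tibble)

df <- data.frame(x = 1:3, y = c("a", "b", "c"))
tb <- as_tibble(df)

class(df)          # a plain data.frame
class(tb)          # tbl_df and tbl come first, data.frame last
is.data.frame(tb)  # a tibble still counts as a data frame

# One behavioural difference: selecting a single column with [ , ]
col_df <- df[, "x"]   # Base R drops to a vector
col_tb <- tb[, "x"]   # a tibble stays a tibble
```

Because a tibble inherits from data.frame, most functions written for data frames accept it, but code that relies on Base R's dropping behaviour may need adjusting.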
3. tidyr is for data tidying. It includes functions like ‘gather’, which collapses multiple columns into key-value pairs according to a specification. The complement is ‘spread’, in which a key-value pair is spread across multiple columns.
gather (unpivot) <-> spread (pivot)
For an independent view, see Garrett Gman.
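The round trip between the two shapes can be sketched as follows, assuming tidyr is installed. The data frame and the column names variable and value are invented for the example. Newer versions of tidyr also offer pivot_longer and pivot_wider for the same job, but gather and spread remain available.

```r
library(tidyr)

# Hypothetical wide data: one column per measured variable.
wide <- data.frame(id = c(1, 2), a = c(10, 20), b = c(30, 40))

# gather: collapse columns a and b into key-value pairs (unpivot).
long <- gather(wide, key = "variable", value = "value", a, b)

# spread: turn the key-value pairs back into columns (pivot).
back <- spread(long, key = variable, value = value)
```

Here long has one row per id and variable combination, and back has the same columns as wide.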
4. ggplot2 is a ‘grammar of graphics’ inspired alternative to R’s native plot function.
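As a rough sketch of what the grammar looks like in practice, a plot is built by combining data, aesthetic mappings and layers. This assumes ggplot2 is installed and uses the built-in mtcars data; the choice of variables and labels is only illustrative.

```r
library(ggplot2)

# Data + aesthetic mappings + a geometric layer + labels, combined with "+".
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders")

# Printing the object renders the plot on the active graphics device.
print(p)
```

A comparable Base R call would be plot(mtcars$wt, mtcars$mpg), with colours and the legend added in separate steps.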