Python’s adoption of R’s ‘data frame’

Because R has influenced Python, the two share some terminology, which can be confusing. The confusion is easier to avoid once you know that many of these concepts originated in R and were later picked up, piece by piece, in the Python universe and elsewhere.

Here are some examples:

The ‘data frame’ was introduced to Python through the ‘pandas’ library.

Generally, the Python libraries follow the lead of R. In R, the ‘data frame’ is the standard two-dimensional structure for tabular data: rows are observations and columns are variables.

Wes McKinney, a statistician, built a Python library he called ‘pandas’. It provides a Series data structure (one-dimensional) and a DataFrame data structure (two-dimensional), and was originally written for time series processing in a financial context.
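
As a minimal sketch of the two structures, the snippet below builds a Series and a DataFrame over the same date index. The prices, volumes and variable names are made up for illustration and are not taken from McKinney’s work.

```python
import pandas as pd

# Hypothetical daily closing prices, indexed by date (illustrative values only).
dates = pd.date_range("2024-01-01", periods=3, freq="D")
close = pd.Series([101.0, 102.5, 101.8], index=dates, name="close")

# A DataFrame is two-dimensional: each column is a Series sharing one index.
prices = pd.DataFrame({"close": close, "volume": [1200, 1500, 900]}, index=dates)

print(close.ndim)    # 1
print(prices.ndim)   # 2
print(prices.shape)  # (3, 2)
```

Selecting a single column, as in prices["close"], gives back a Series, which is the same relationship a column has to a data frame in R.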

Pandas is built on top of the Python library NumPy, so those who need better performance, such as in an HPC context, may decide to work with NumPy directly.
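
The relationship to NumPy can be seen by pulling the underlying array out of a DataFrame. This is only a sketch with invented values; whether working on the raw array is actually faster depends on the workload and should be measured.

```python
import numpy as np
import pandas as pd

# Illustrative data only; the column names are arbitrary.
frame = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [4.0, 5.0, 6.0]})

# to_numpy() returns the values as a plain NumPy array, without the index
# and column labels that pandas adds.
values = frame.to_numpy()
print(type(values))   # <class 'numpy.ndarray'>
print(values.shape)   # (3, 2)

# The same column mean, computed through pandas and directly with NumPy.
mean_pandas = frame["x"].mean()
mean_numpy = values[:, 0].mean()
print(mean_pandas == mean_numpy)
```

Working on the array means giving up labelled rows and columns, so it suits numeric inner loops better than general data handling.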

McKinney has also written a book, “Python for Data Analysis” (2nd ed., 2017).

ggplot (python)

ggplot2, for example, is a package in the R universe whose main command is ‘ggplot()’. Its later implementation in the Python universe (as a library) is called ‘ggplot’. ggplot (Python) is unashamedly described as ‘a plotting system for Python based on R’s ggplot2 and the Grammar of Graphics’. To achieve its goals, it also requires a Python implementation of a data frame, which it gets from the pandas library.

datatable (python)

This is a Python library based on Matt Dowle’s R package ‘data.table’. The project started in 2017 at H2O.ai, which is not surprising since Matt Dowle has also worked there as a “Hacker” since 2015. H2O.ai is a Silicon Valley machine-learning company in Mountain View, California.
