Abstract

This post demonstrates how to use pivot_longer() to convert your wide data to long data. This is part 1 of the Pivoting your tables with Tidyr series.

Intro

One of the primary data manipulation operations is pivoting your tabular data from “wide” format to “long” format and vice versa.

The idea is to make your tabular data “tidy”, i.e.:

Every column is a variable.
Every row is an observation.
Every cell is a single value.

In other words, every column contains just one type of information, every row in the table is a snapshot or a version of the information your table captures and every cell contains just one piece of information.¹

While the wide format is often easier for humans to read, the long format is usually preferred for data manipulation and plotting in R, Python and other data processing languages. The {tidyr} R package provides functions that transform your tabular data between the two formats.
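As a small illustration of why long data is convenient, the sketch below (plain base R, no {tidyr}) takes a few of the life expectancy values used later in this post, already in long format, and averages them per continent with a single grouped call. With the wide layout you would have to name each continent column yourself. The data frame `long_sample` is a hypothetical excerpt built only for this example.

```r
# Hypothetical excerpt of this post's data, already in long format
long_sample <- data.frame(
  year                    = c(2002L, 2007L, 2002L, 2007L),
  continent               = c("Africa", "Africa", "Europe", "Europe"),
  average_life_expectancy = c(53.33, 54.81, 76.70, 77.65)
)

# One formula-based call groups by continent, however many continents there are
per_continent <- aggregate(average_life_expectancy ~ continent,
                           data = long_sample, FUN = mean)
per_continent
```

Adding more continents to `long_sample` would need no change to the `aggregate()` call, which is the practical payoff of keeping "continent" in a single column.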

In this post, we will see how to convert a wide dataframe to long format using the pivot_longer() function from the {tidyr} package.

The wide one

Consider the following data table, created from the famous Gapminder dataset. It shows the average life expectancy in each continent for 2 years. The Gapminder data covers many more years, but here we consider just the latest 2 years (2002 and 2007) to keep the explanation and the visuals simple.

Figure 1: Continent-wise Average Life Expectancy over the last 2 years

my_data is in wide format because the continent names are column headers and each of those columns holds average life expectancy values. To convert this data frame to long format, we need to gather the continent names into one column and their corresponding values into another column. See Figure 2.

The long one

The long format of this table would ideally have only the year, continent and average_life_expectancy columns and look something like the table below.

The long format repeats the values of the columns that are not gathered. In this case, each year value is repeated once for every continent.

Let’s recreate the above transformation in R. First, we create the my_data table.


my_data <- data.frame(
  year     = c(2002L, 2007L), 
  Africa   = c(53.33, 54.81), 
  Americas = c(72.42, 73.61), 
  Asia     = c(69.23, 70.73), 
  Europe   = c(76.70, 77.65), 
  Oceania  = c(79.74, 80.72)
)

knitr::kable(my_data)

year	Africa	Americas	Asia	Europe	Oceania
2002	53.33	72.42	69.23	76.70	79.74
2007	54.81	73.61	70.73	77.65	80.72
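Before reaching for {tidyr}, it can help to see where the repeated year values come from. Below is a rough base-R sketch of the same reshaping: `utils::stack()` gathers the continent columns into one column of values and one column naming where each value came from, and `rep()` repeats year once per gathered column. The names `value_cols`, `stacked` and `long_base` are made up for this example, and the row order differs from what pivot_longer() produces (stack() walks column by column), so treat it as an illustration of the mechanics, not a replacement.

```r
my_data <- data.frame(
  year     = c(2002L, 2007L),
  Africa   = c(53.33, 54.81),
  Americas = c(72.42, 73.61),
  Asia     = c(69.23, 70.73),
  Europe   = c(76.70, 77.65),
  Oceania  = c(79.74, 80.72)
)

# Every column except year gets gathered
value_cols <- setdiff(names(my_data), "year")

# stack() returns `values` (the cell contents) and `ind` (a factor naming the
# source column). It goes column by column: both Africa rows, then both Americas rows, ...
stacked <- utils::stack(my_data[value_cols])

long_base <- data.frame(
  # year is not gathered, so it is repeated once per gathered column
  year                    = rep(my_data$year, times = length(value_cols)),
  continent               = as.character(stacked$ind),
  average_life_expectancy = stacked$values
)

# 2 original rows x 5 gathered columns = 10 rows
nrow(long_base)
```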

To convert this table into long format, we use the pivot_longer() function from the {tidyr} R package. Let us see how to use this function.

Tip

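Use `formals()` to view all the formal arguments of a function and their default values. `formals()` returns a pairlist, which you can index by name just like a named list (for primitive functions such as `sum()`, it returns `NULL`).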


library(tidyr, quietly = TRUE, warn.conflicts = FALSE)

formals(pivot_longer)

$data


$cols


$names_to
[1] "name"

$names_prefix
NULL

$names_sep
NULL

$names_pattern
NULL

$names_ptypes
NULL

$names_transform
NULL

$names_repair
[1] "check_unique"

$values_to
[1] "value"

$values_drop_na
[1] FALSE

$values_ptypes
NULL

$values_transform
NULL

$...

The output of formals(pivot_longer) tells us that, at a minimum, we must supply values for the data and cols arguments: all other arguments have default values and are therefore optional.
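The same reading of formals() works for any R function. In the sketch below, `area()` is a hypothetical function made up for illustration: `width` has no default and must be supplied, while `height` and `units` fall back to their defaults when omitted.

```r
# Hypothetical function, used only to illustrate formals()
area <- function(width, height = 1, units = "cm") {
  paste(width * height, units)
}

area_formals <- formals(area)

class(area_formals)   # "pairlist"
names(area_formals)   # "width" "height" "units"
area_formals$height   # default value: 1
area_formals$units    # default value: "cm"

# Only width is required; height and units use their defaults
area(3)               # "3 cm"
area(3, 2, "m")       # "6 m"
```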

Calling pivot_longer() with only these two arguments returns a long-format tibble with the columns year, name and value.


long_minimal <- pivot_longer(data = my_data,
                             cols = c("Africa", "Americas", "Asia", "Europe", "Oceania"))

knitr::kable(long_minimal)

year	name	value
2002	Africa	53.33
2002	Americas	72.42
2002	Asia	69.23
2002	Europe	76.70
2002	Oceania	79.74
2007	Africa	54.81
2007	Americas	73.61
2007	Asia	70.73
2007	Europe	77.65
2007	Oceania	80.72

Notice that the continent names and their corresponding average life expectancy values appear in columns named name and value. These are the default column names. We can change these column names by providing our own names to the arguments names_to and values_to.

Since year is the only column that stays as is, we can select the columns to gather with !year (every column except year) and, in the same call, supply our own column names:


my_data_longer <- pivot_longer(data      = my_data,
                               cols      = !year,
                               names_to  = "continent",
                               values_to = "average_life_expectancy")

knitr::kable(my_data_longer)

year	continent	average_life_expectancy
2002	Africa	53.33
2002	Americas	72.42
2002	Asia	69.23
2002	Europe	76.70
2002	Oceania	79.74
2007	Africa	54.81
2007	Americas	73.61
2007	Asia	70.73
2007	Europe	77.65
2007	Oceania	80.72
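!year is only one way to tell cols which columns to gather. cols accepts tidyselect syntax, so the sketch below (assuming {tidyr} 1.2.0 or later, the version listed in the references) checks that negation with `!`, the minus sign, a column range and an explicit vector of names all pick the same five continent columns and give identical results. The object names are made up for this example.

```r
library(tidyr, quietly = TRUE, warn.conflicts = FALSE)

my_data <- data.frame(
  year     = c(2002L, 2007L),
  Africa   = c(53.33, 54.81),
  Americas = c(72.42, 73.61),
  Asia     = c(69.23, 70.73),
  Europe   = c(76.70, 77.65),
  Oceania  = c(79.74, 80.72)
)

by_negation <- pivot_longer(my_data, cols = !year)           # everything except year
by_minus    <- pivot_longer(my_data, cols = -year)           # same, with a minus sign
by_range    <- pivot_longer(my_data, cols = Africa:Oceania)  # contiguous column range
by_name     <- pivot_longer(my_data,
                            cols = c("Africa", "Americas", "Asia", "Europe", "Oceania"))

identical(by_negation, by_minus)   # TRUE
identical(by_negation, by_range)   # TRUE
identical(by_negation, by_name)    # TRUE
```

Selecting by exclusion (!year or -year) has the practical advantage that it keeps working if more continent columns are added later.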

If you are a visual person like me and wish to see this transformation explained step by step, check out this GIF I made using good ol’ PowerPoint.

Figure 4: {tidyr} pivot_longer() explained

Conclusion

pivot_longer() is the successor to the great gather() function and has several advantages over it. pivot_longer() repeats the values of every column not included in the cols argument, once for each gathered column. Therefore, if your dataframe/tibble had a primary key before the transformation, the primary key of your transformed “longer” dataframe is your old primary key plus the new column created by names_to. The function has many other arguments, such as names_sep, names_pattern and values_drop_na, that allow some truly powerful transformations. Mastering this function (and its wide counterpart, pivot_wider()) is a great skill upgrade when massaging your data to make it “tidy”.
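The primary-key claim is easy to check on my_data. In the sketch below, year alone identifies each row of the wide table; after pivoting, year is no longer unique on its own, but year plus the new continent column is. The last step goes back to wide format with pivot_wider(), the wide counterpart of pivot_longer(); the argument values shown are simply this example's choice, and the object names are made up.

```r
library(tidyr, quietly = TRUE, warn.conflicts = FALSE)

my_data <- data.frame(
  year     = c(2002L, 2007L),
  Africa   = c(53.33, 54.81),
  Americas = c(72.42, 73.61),
  Asia     = c(69.23, 70.73),
  Europe   = c(76.70, 77.65),
  Oceania  = c(79.74, 80.72)
)

my_data_longer <- pivot_longer(my_data,
                               cols      = !year,
                               names_to  = "continent",
                               values_to = "average_life_expectancy")

# Old key: year is unique in the wide table
anyDuplicated(my_data$year) == 0                              # TRUE
# In the long table, year alone repeats ...
anyDuplicated(my_data_longer$year) == 0                       # FALSE
# ... but year + continent (the names_to column) is unique
anyDuplicated(my_data_longer[c("year", "continent")]) == 0    # TRUE

# Round trip: pivot_wider() rebuilds the original wide layout
my_data_wider <- pivot_wider(my_data_longer,
                             names_from  = continent,
                             values_from = average_life_expectancy)
isTRUE(all.equal(as.data.frame(my_data_wider), my_data))      # TRUE
```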

Happy Gathering!

References

Hadley Wickham and Maximilian Girlich (2022). tidyr: Tidy Messy Data. R package version 1.2.0. https://CRAN.R-project.org/package=tidyr
Yihui Xie (2022). knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.39.

Footnotes

Long vs. Wide Data: What’s the Difference? https://www.statology.org/long-vs-wide-data/

Reuse

CC BY 4.0

Citation

BibTeX citation:

@online{katti2022,
  author = {Katti, Vishal},
  title = {Pivoting Your Tables with {Tidyr:} {Part} {I}},
  date = {2022-07-08},
  url = {https://vishalkatti.com/posts/tidyr-pivot-longer/},
  langid = {en},
  abstract = {This post demonstrates how to use `pivot\_longer()` to
    convert your wide data to long data. This is part 1 of the Pivoting
    your tables with Tidyr series.}
}

For attribution, please cite this work as:

Katti, Vishal. 2022. “Pivoting Your Tables with Tidyr: Part I.” July 8, 2022. https://vishalkatti.com/posts/tidyr-pivot-longer/.