This post demonstrates how to use pivot_wider() to convert your long data to wide data. This is part 2 of the Pivoting your tables with Tidyr series.
Intro
We discussed the advantages of using the long format during data analysis, most users feel that the wide format is more readable by human. This is why most reports tend to have the data arranged in the wide format.
The wide format has at least one column which acts as a primary key i.e. it is unique and each value appears only once. It can also have multiple column whose unique combination acts as a primary key i.e. each combination appears only once.
While the long format is preferred and is desirable for data and plotting operations using R, Python or other data processing programming languages, the wide format is more human-readable. The {tidyr} R package has functions that allow you to transform your tabular data between the two formats.
In this post, we will see how to convert a long dataframe to wide format using the pivot_wider() function from {tidyr} package.
The long one
Consider the following data table. It has been created from the famous Gapminder dataset. This table shows the average life expectancy in each continent for 2 years. While some of you may say that Gapminder data contains records for a lot more number of years, here we consider just the latest 2 years for ease of explanation and visual purposes. We have added an extra id column for teaching purpose.
my_data is in the long format as we have continent names and year in their own column and average life expectancy values for each unique combination of year and continent. If we want to compare life expectancy across years for each continent, we need to have the life expectancy values for each continent side-by-side for easier viewing i.e. we need to convert to the wide format. To convert this tibble to the wide format, we need to push the year values into the headers and the average_life_expectancy values under the corresponding year column.
The wide one
The wide format of this table would ideally have only continent and columns having each unique value in the year column as a header. In this case, the wide one would look something like the table below.
The wide format has unique values of the column that are not pushed into headers. In this case, the continent column becomes unique for each row.
Let’s recreate the above transformation in R. First, we create the my_data table.
The result of formals(pivot_wider) tells us that the minimum information needed to use this function is to provide values to the data,names_from and values_from arguments as all other arguments have default values and hence, are optional.
Using only the minimum arguments with pivot_wider(), we get a wide formatted tibble but with missing data!
pivot_wider() creates unique combinations of all columns not included in names_from or values_from argument. Therefore, if your dataframe/tibble had a primary key prior to the transformation, the primary key of your transformed “wide” dataframe is your old primary key + unique combinations of all columns not included in names_from or values_from argument. We do have id column as a primary key in the original tibble. This gives an unusable output with NAs for each combination.
To specify which column/s to be made unique, pass their name to the id_cols argument. Here we pass the continent column to the id_cols argument.
If you are a visual person like me and wish to see this transformation with explanations, check out this GIF I made using good ol’ PowerPoint.
Conclusion
pivot_wider() is the successor for the great spread() function and has many advantages over the latter. This function has many other arguments that allow some truly great transformations. Mastering this function (and its long counterpart) is a great skill upgrade while massaging your data to make it “tidy”.
@online{katti2022,
author = {Katti, Vishal},
title = {Pivoting Your Tables with {Tidyr:} {Part} {II}},
date = {2022-08-29},
url = {https://vishalkatti.com/posts/tidyr-pivot-wider/},
langid = {en},
abstract = {This post demonstrates how to use `pivot\_wider()` to
convert your long data to wide data. This is part 2 of the Pivoting
your tables with Tidyr series.}
}