COLUMN OPERATIONS IN DPLYR

Add, Remove, & Rename Columns In R Using dplyr

This tutorial covers the dplyr package. It’s one of the tidyverse packages and focuses on data manipulation: it lets you sort and filter rows and add, remove, and rename columns in R.

These features are also available in Power Query, so they aren’t unique to R. However, R arguably handles them more concisely.

It’s important to learn how these techniques can be done using dplyr as they’re fundamental to report development and working with data in R.

Specifically, this tutorial will focus on column operations in dplyr.

Getting Started

Column operations allow you to calculate, add, remove, and rename columns in R using dplyr. Open a new R script in RStudio. If you don’t know how, click on the links to find out how to install RStudio and create an R script.

For this demonstration, the Lahman dataset package is used. This contains baseball records dating back over a hundred years. It’s a good dataset to use for practice. It’s available on CRAN, so you can install it by running install.packages("Lahman").

The Lahman package has a dataset labeled Teams, with a capital T. A best practice for naming conventions in R is using lowercase letters. So assign it to a new object named teams first.
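A minimal sketch of this setup step, assuming the Lahman package is available from CRAN (the install line is commented out so the script can be rerun safely):

```r
# install.packages("Lahman")  # one-time install from CRAN, if needed
library(Lahman)

# Copy the built-in Teams data frame to a lowercase name
teams <- Teams

# Confirm the copy is a data frame and check its dimensions
class(teams)
dim(teams)
```

The original Teams object is left untouched, so you can always start over by running the assignment again.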

Basic Functions For Column Operations

1. Add New Columns In R

The first function is mutate(). This creates new columns based on existing columns.

If you want to calculate a new column, call the mutate function with the data frame followed by a name and a formula for the new column: mutate(df, new_variable = formula).

df is a stand-in name for any kind of data frame. So when in actual use, replace df with the name of the data frame you want to mutate. Then, you place the new variables that need to be named along with the formula for deriving the new column.

As an example, the mutate function will be used to find the winning percentage for each team. In the Lahman dataset, wins and losses are stored in the W and L columns. To get the percentage, divide W by the sum of W and L. But before you can do that, you need to bring in the dplyr package.

If you run the mutate function without loading dplyr, you’ll get an error saying could not find function "mutate".

So, here’s how to bring dplyr into R. You only need to run library(tidyverse).

You’ll see that dplyr is among the packages that the tidyverse loads. Another option is to run library(dplyr), which loads dplyr on its own.

Now if you place your cursor on the code with the mutate function and run it, you’ll then see the Wpct column containing the winning percentages.
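As a rough sketch of this step, assuming the dplyr and Lahman packages are installed, the winning percentage could be computed like this. The call is wrapped in head() only to keep the printed output short:

```r
library(dplyr)   # library(tidyverse) would attach dplyr as well
library(Lahman)

teams <- Teams

# Winning percentage: wins divided by the sum of wins and losses.
# The result is printed but not saved anywhere.
head(mutate(teams, Wpct = W / (W + L)))
```

The new Wpct column is appended after the existing columns, so it appears at the far right of the printed output.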

In this instance, the result of the mutate function was only printed; it wasn’t assigned back to the data.

If you want to assign the result of the mutate function to the data teams, you need to use the assignment operator ( <- ), as in teams <- mutate(teams, Wpct = W / (W + L)). Once done, run it. Then in another line, run head(teams) to print the first six rows and confirm that the new column is now part of the teams dataset.
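Put together, the assignment step might look like the following sketch. It assumes dplyr and Lahman are installed; the subset of columns passed to head() is chosen only to keep the output narrow:

```r
library(dplyr)
library(Lahman)

teams <- Teams

# Overwrite teams with the mutated copy so the new column is kept
teams <- mutate(teams, Wpct = W / (W + L))

# Show the first six rows of a few columns, including the new one
head(teams[, c("yearID", "teamID", "W", "L", "Wpct")])
```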

If you want to check what columns are available in a data set, use the names() function. This will list all of the column names in the data.

You can also use existing functions as part of the mutate function. For example, you can take the log of a specific column using the log() function.
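Both ideas can be sketched together as follows, assuming dplyr and Lahman are installed. The names teams_log and log_W are hypothetical, chosen for illustration, and W (wins) is just one column you could transform:

```r
library(dplyr)
library(Lahman)

teams <- Teams

# List every column name in the data frame
names(teams)

# Use an existing function inside mutate(): natural log of the wins column.
# teams_log and log_W are illustrative names, not part of the dataset.
teams_log <- mutate(teams, log_W = log(W))
head(teams_log$log_W)
```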

2. Select Columns In R

Another function in dplyr is select(). It either keeps or drops the given columns. Its basic syntax is select(df, column1, column2), with as many column names as you need.

You need to input the data frame name and then the columns you want to select.

For example, if you want to keep the yearID, W (wins), and L (losses) columns in the dataset, you only need to run select(teams, yearID, W, L). The result contains only those three columns.

However, if you don’t use the head() function, R prints as many rows as it can, and the console ends up at the bottom of the output. So if you’re dealing with many rows of data, you’ll need to scroll up to get to the top of the columns.

A best practice is to use the head function along with select so that, when you run the code, the result shows only the top rows.
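A short sketch of select combined with head, assuming dplyr and Lahman are installed:

```r
library(dplyr)
library(Lahman)

teams <- Teams

# Keep only the year, wins, and losses columns;
# head() limits the printed output to the first six rows
head(select(teams, yearID, W, L))
```

Because the result isn’t assigned to anything, teams itself still has all of its original columns.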

Now if you want to remove columns from the dataset, you only need to place a minus sign ( - ) before the column name, as in select(df, -column_name).

To check if a column has indeed been removed, you can compare the new dataset with the old one. Here’s how to do it:

First, assign the R code with the select function to an object. In this example, it’s been assigned to teams_short. To count the number of columns, use the ncol() function. Run the ncol function for both teams_short and teams.

You’ll then see that one column was removed from the dataset.
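The check described above could be sketched like this, assuming dplyr and Lahman are installed. The column being dropped (divID) is an arbitrary choice for illustration; any single column would work the same way:

```r
library(dplyr)
library(Lahman)

teams <- Teams

# Drop one column; divID is an arbitrary pick for this example
teams_short <- select(teams, -divID)

# Compare column counts before and after
ncol(teams)
ncol(teams_short)
```

The second count should be exactly one less than the first.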

3. Rename Columns In R

The last column function covered here is rename(). As the name suggests, it renames selected columns in R.

This is its basic syntax: rename(df, new_name = old_name).

And you’ll notice that it’s a bit counterintuitive; the new name comes first while the old name comes after that. So make sure to not get those mixed up.

As an example, the current yearID and divID columns will be renamed to year_id and division_id, respectively. Before running the code, make sure to assign this to a new object so as to not disrupt the original dataset.

To check if these selected columns had their names successfully changed, use the names() function.

You’ll see that the columns have indeed been renamed.
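A minimal sketch of the rename step, assuming dplyr and Lahman are installed. The object name teams_renamed is hypothetical; any new name keeps the original dataset intact:

```r
library(dplyr)
library(Lahman)

teams <- Teams

# rename() takes new_name = old_name pairs.
# teams_renamed is an illustrative name; teams itself stays unchanged.
teams_renamed <- rename(teams, year_id = yearID, division_id = divID)

# Check the result: the new names replace the old ones
names(teams_renamed)
```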

Conclusion

This tutorial has discussed three basic dplyr functions you can use to perform column operations. Specifically, you learned how to add, remove, and rename columns in R.

There are still other functions that you’ve yet to explore. But it’s important to know and be familiar with mutate(), select(), and rename() as they are the most common.

These column editing techniques can also be done in Power Query. But it’s great to have knowledge of how to do this in dplyr, too. This will surely assist you when you move to analyzing statistical data sets.

George
