Factor Levels In R: Using Categorical & Ordinal Variables

This tutorial covers factors and factor levels in R. You’ll learn how to create a factor and how to adjust factor levels.

Factors are used to store and work with categorical variables in R.

In this tutorial, you’ll be dealing with categorical and ordinal variables. Categorical variables take their values from a set of categories that aren’t ordered in any specific way. An example would be colors. Ordinal variables are similar, with the difference that their categories have a clear ordering, such as low, medium, and high.

This is an introduction to more statistical terms. You are now slowly exploring R’s capabilities for data and statistical analysis.

Categorical Factor Levels In R

If you recall from an earlier lesson on data frames, you used the dollar sign ($) to print out the Species column from the iris dataset. Do this again in RStudio. At the bottom of the output, there’s a line labeled Levels that lists setosa, versicolor, and virginica.

This is R’s way of handling categories in data.

If you use the unique() function, R lists the unique values in the specified column. For example, if you Run unique(iris$Species), the Console displays the three levels of Species.

There’s no inherent ordering for these levels. You can’t say that setosa is greater than the other two species. R, by default, arranges the levels in alphabetical order.
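The steps above can be sketched as follows. This is a minimal example that assumes only the built-in iris dataset; the variable name species is introduced here for illustration.

```r
# iris ships with base R (datasets package), so nothing needs to be installed
species <- iris$Species  # 'species' is a name chosen for this sketch

# Printing a factor shows its values followed by a "Levels:" line
print(head(species))

# unique() returns each distinct value once
print(unique(species))

# levels() returns the category labels as a character vector
print(levels(species))
```

Because Species was created without an explicit level order, its levels appear alphabetically.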

Ordinal Factor Levels In R

Now let’s explore factors whose categories have an inherent ordering.

Create a vector and name it orders. For this example, fill that vector with Starbucks’ cup size names: tall, venti, and grande. Then, print it out.

Arranged from smallest to biggest, these should read tall, grande, and venti. But when you Run the unique() function on orders, the values aren’t arranged in that order; unique() simply returns them in the order in which they first appear.
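A minimal sketch of this step is shown below. The exact contents of orders are not given in the text, so the five values used here are an assumption made for illustration.

```r
# Hypothetical order data: the specific values are assumed, not from the lesson
orders <- c("venti", "tall", "grande", "tall", "venti")
print(orders)

# unique() on a character vector keeps first-appearance order;
# it knows nothing about cup sizes
print(unique(orders))

# A plain character vector has no levels at all
print(levels(orders))
```

With this data, unique() lists venti first simply because it is the first value in the vector.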

Here’s how to turn them into ordinal variables. First, you need to create a new vector. In this case, the vector is called new_orders_factor. Assign it the result of the factor() function. Inside this function, pass the vector whose levels you want to set. Then, use the levels argument to list the levels in the order you want them to appear.

Highlight this entire line of code and then Run it. A new Value is then added in Environment.
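The factor() call described above might look like the sketch below. It assumes the hypothetical order data used for illustration and the Starbucks size order tall, then grande, then venti.

```r
# Hypothetical order data (assumed for illustration)
orders <- c("venti", "tall", "grande", "tall", "venti")

# factor() takes the vector first, then the levels in the desired order
new_orders_factor <- factor(orders, levels = c("tall", "grande", "venti"))

# The printed "Levels:" line now follows the order given above
print(new_orders_factor)
```

Any value in orders that is missing from levels would become NA, so the levels argument should cover every category in the data.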

To check if a vector has been properly assigned as a factor, use the is.factor() function. If you check the two vectors, orders and new_orders_factor, you can see that the former returns FALSE while the new vector is indeed a factor.
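As a quick sketch of this check, again using assumed example data for orders:

```r
# Hypothetical order data (assumed for illustration)
orders <- c("venti", "tall", "grande", "tall", "venti")
new_orders_factor <- factor(orders, levels = c("tall", "grande", "venti"))

# The original vector is plain character data
print(is.factor(orders))             # FALSE

# The vector built with factor() passes the check
print(is.factor(new_orders_factor))  # TRUE
```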

A factor is a special way to store a series of text values. Though it looks like a character vector, R stores it as integer codes paired with a fixed set of labels, the levels. This allows it to have a given number of categories with a specific ordering.
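To see how a factor is stored, the sketch below inspects one directly. The data values are assumed for illustration.

```r
# Hypothetical order data (assumed for illustration)
orders <- c("venti", "tall", "grande", "tall", "venti")
new_orders_factor <- factor(orders, levels = c("tall", "grande", "venti"))

# Under the hood a factor is an integer vector...
print(typeof(new_orders_factor))      # "integer"

# ...where each code is the position of the value in the levels
print(as.integer(new_orders_factor))  # 3 1 2 1 3

# as.character() recovers the original text
print(as.character(new_orders_factor))
```

Here tall is level 1, grande is level 2, and venti is level 3, which is why the first value, venti, is stored as 3.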

If you check using the levels() function, you can see that the levels are now in the correct order.
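A short sketch of that check follows, using assumed example data. The last part goes one step beyond the lesson: the ordered = TRUE argument of factor() is not used in this tutorial, but it is the base R option that lets the levels be compared with < and >.

```r
# Hypothetical order data (assumed for illustration)
orders <- c("venti", "tall", "grande", "tall", "venti")
size_levels <- c("tall", "grande", "venti")  # assumed smallest to biggest

new_orders_factor <- factor(orders, levels = size_levels)

# Levels come back in the order they were supplied
print(levels(new_orders_factor))

# Sorting follows the level order rather than the alphabet
print(sort(new_orders_factor))

# Optional extension, not part of the lesson: an ordered factor
ordered_sizes <- factor(orders, levels = size_levels, ordered = TRUE)
print(ordered_sizes[2] < ordered_sizes[1])  # is tall smaller than venti?
```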

Conclusion

Though this lesson may seem esoteric, you’ll see how it makes a difference when dealing with more advanced R coding. It’s important to learn about factors and levels since they come up often in R coding and statistical analysis.

George Mount
George Mount has over 10 years of analytics experience ranging from retail to healthcare. He is the author of the book "Advancing into Analytics: From Excel to Python and R."