
R Programming Complete Tutorial

R Factors

Categorical data

R factors are variables that take a limited number of distinct values, which is why they are often called categorical variables. To categorize data and store it as a set of levels, R uses a data object called a factor. A factor can be created from either a character vector or a numeric (integer) vector. Internally, R stores each value as an integer code that points to a character label, called a level. Factors are particularly useful for data frame columns that have a limited number of unique values.
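The following sketch shows the point about storage: even when a factor is built from numbers, its levels are kept as character labels and each element is stored as an integer code. The data (`size_codes`) is hypothetical and chosen only for illustration.

```r
# Hypothetical data: numeric size codes
size_codes <- c(10, 20, 10, 30)
size_factor <- factor(size_codes)

levels(size_factor)      # "10" "20" "30"  (levels are character labels)
as.integer(size_factor)  # 1 2 1 3         (internal codes pointing into levels)
class(size_factor)       # "factor"
typeof(size_factor)      # "integer"

# To recover the original numbers, convert the labels, not the codes
as.numeric(as.character(size_factor))  # 10 20 10 30
```

Note that `as.integer()` returns the codes (1, 2, 1, 3), not the original values, which is a common source of bugs when a numeric column has been turned into a factor.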

Create a Factor

To create an R factor, we use the factor() function. It accepts the arguments x (the data), levels, labels, exclude, ordered, and nmax. In practice only the data is supplied; the other arguments are optional and have default values.

Factor Syntax:
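The syntax of factor() in base R (recent versions) is reproduced in the comment below, and `formals()` confirms the argument names. The second part is a small sketch of two optional arguments, ordered and exclude, on a hypothetical vector `sizes`.

```r
# factor(x = character(), levels, labels = levels,
#        exclude = NA, ordered = is.ordered(x), nmax = NA)
names(formals(factor))
# "x" "levels" "labels" "exclude" "ordered" "nmax"

# Hypothetical data for illustration
sizes <- c("small", "large", "medium", "small", "unknown")

# ordered = TRUE makes the level order meaningful for comparisons;
# "unknown" is not one of the given levels, so it becomes NA
size_ord <- factor(sizes,
                   levels = c("small", "medium", "large"),
                   ordered = TRUE)
size_ord[1] < size_ord[2]  # TRUE, because "small" < "large"

# exclude drops a value from the levels; matching elements become NA
size_excl <- factor(sizes, exclude = "unknown")
levels(size_excl)          # "large" "medium" "small"
```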

Create a factor data set:

To create a factor, we can pass only the data and keep the other arguments at their defaults. Here we have initialized a variable (department) with a character vector of four elements. To create the factor, we pass this department vector as the argument to factor().
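A minimal sketch of this step follows. The four department names are hypothetical, since the original values are not shown. Without a levels argument, factor() uses the sorted unique values as levels.

```r
# Hypothetical department names (four elements, one repeated)
department <- c("Sales", "HR", "IT", "Sales")

# Only the data is passed; levels, labels, etc. keep their defaults
dept_factor <- factor(department)

print(dept_factor)
levels(dept_factor)      # "HR" "IT" "Sales"  (sorted unique values)
as.integer(dept_factor)  # 3 1 2 3            (codes into the levels)
nlevels(dept_factor)     # 3
```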

We can also use the levels and labels arguments as required. For instance, we can specify four levels (Male, Man, Lady, Female) and assign them only two labels, Male and Female. As a result, every Male and Man value is labeled Male, and every Lady and Female value is labeled Female.
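A sketch of this mapping, using a hypothetical `gender` vector, is shown below. It relies on duplicated values in labels, which merge several levels into one; this behavior requires R 3.4.0 or later.

```r
# Hypothetical raw responses with inconsistent spellings
gender <- c("Male", "Lady", "Man", "Female", "Male")

# Four levels mapped onto two labels (duplicated labels need R >= 3.4.0)
gender_factor <- factor(gender,
                        levels = c("Male", "Man", "Lady", "Female"),
                        labels = c("Male", "Male", "Female", "Female"))

levels(gender_factor)        # "Male" "Female"
as.character(gender_factor)  # "Male" "Female" "Male" "Female" "Male"
```

The order of the resulting levels follows the first appearance of each label in labels, so Male comes before Female here.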

Create a Factor from a Data Frame:

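We can also convert character columns to factors directly when creating a data frame. The data.frame() function has a stringsAsFactors argument: when it is TRUE, all character columns are converted to factors. Note that TRUE was the default only before R 4.0.0; since R 4.0.0 the default is FALSE, so the argument has to be set explicitly to get this conversion.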

In this example, the Dept variable contains character data. Because stringsAsFactors is set to TRUE, Dept is converted into a factor, and when we display it, R shows it as a factor with four levels.
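A minimal sketch of such a data frame follows. The column names `ID` and `Dept` and all values are hypothetical; Dept is given four distinct departments so the resulting factor has four levels.

```r
# Hypothetical employee data; Dept holds four distinct departments
employees <- data.frame(
  ID   = 1:5,
  Dept = c("Sales", "HR", "IT", "Finance", "Sales"),
  stringsAsFactors = TRUE  # convert character columns to factors
)

is.factor(employees$Dept)  # TRUE
levels(employees$Dept)     # "Finance" "HR" "IT" "Sales"
nlevels(employees$Dept)    # 4
is.factor(employees$ID)    # FALSE, numeric columns are left unchanged
str(employees)
```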