R Factors

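Factors are R data objects used to represent categorical data. A factor stores each distinct value once as a level and records every element as an integer code pointing to one of those levels. Factors can be built from both character and numeric vectors, although the levels themselves are always stored as character strings. They are widely used in data analysis and statistical modeling. The factor() function is used to create factors.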
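Besides the data itself, factor() accepts levels and labels arguments. The short sketch below uses made-up survey codes (the names answers and f are hypothetical) to show that levels fixes which values are allowed and in what order, while labels renames those levels by position. Any value not listed in levels becomes NA.

```r
# Hypothetical survey answers stored as short codes
answers <- c("y", "n", "y", "maybe")

# levels: the allowed values, in the order they should appear
# labels: display names for those levels, matched by position
f <- factor(answers, levels = c("n", "y"), labels = c("No", "Yes"))
print(f)

# "maybe" is not one of the levels, so factor() turned it into NA
levels(f)
```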

Example 1:

data <- c('Physics','Maths','Physics','Bio','Bio','Maths','Physics','Maths','Bio')
fdata <- factor(data)
fdata

Output:

[1] Physics Maths   Physics Bio     Bio     Maths   Physics Maths   Bio
Levels: Bio Maths Physics
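To see how the factor from Example 1 is stored internally, the sketch below inspects its levels and integer codes. By default the levels are the distinct values sorted alphabetically, and each element holds the position of its level.

```r
data <- c('Physics','Maths','Physics','Bio','Bio','Maths','Physics','Maths','Bio')
fdata <- factor(data)

# The distinct values, sorted alphabetically by default
levels(fdata)      # "Bio" "Maths" "Physics"
nlevels(fdata)     # 3

# Each element is stored as an integer index into levels(fdata)
as.integer(fdata)  # 3 2 3 1 1 2 3 2 1

# The underlying type is integer; the class attribute is "factor"
typeof(fdata)      # "integer"
class(fdata)       # "factor"
```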

Example 2:

d <- c(100, 400, 200, 500, 100, 400, 300, 200, 100, 500, 400, 500, 100)
ndata <- factor(d)
ndata

Output:

[1] 100 400 200 500 100 400 300 200 100 500 400 500 100
Levels: 100 200 300 400 500
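Because a numeric factor like the one in Example 2 stores level codes rather than the numbers themselves, converting it back needs care. The sketch below (codes, values and values2 are illustrative names) contrasts the common mistake with two correct conversions; the last form is the one suggested in the R FAQ.

```r
d <- c(100, 400, 200, 500, 100, 400, 300, 200, 100, 500, 400, 500, 100)
ndata <- factor(d)

# Wrong: returns the integer level codes, not the original numbers
codes <- as.numeric(ndata)      # 1 4 2 5 1 4 3 2 1 5 4 5 1

# Right: convert the labels to character first, then to numeric
values <- as.numeric(as.character(ndata))

# Also right, and cheaper for long factors:
# convert each level once, then index by the codes
values2 <- as.numeric(levels(ndata))[ndata]
```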

Accessing the Elements of a Factor

Elements of a factor are accessed with the same [ ] indexing used for vectors. The result is still a factor and keeps all of the original levels.

Example:

data <- c('Physics','Maths','Physics','Bio','Bio','Maths','Physics','Maths','Bio')
fdata <- factor(data)
fdata
# access 4th element
fdata[4]
# access 2nd and 4th element
fdata[c(2,4)]

Output:

[1] Physics Maths   Physics Bio     Bio     Maths   Physics Maths   Bio    
Levels: Bio Maths Physics
[1] Bio
Levels: Bio Maths Physics
[1] Maths Bio 
Levels: Bio Maths Physics
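Logical indexing also works as it does for vectors, and, as the output above shows, a subset keeps levels that no longer occur in it. A minimal sketch (bio_only and trimmed are illustrative names) using droplevels() to remove those unused levels:

```r
data <- c('Physics','Maths','Physics','Bio','Bio','Maths','Physics','Maths','Bio')
fdata <- factor(data)

# Select elements with a logical condition, as with a vector
bio_only <- fdata[fdata == "Bio"]
bio_only           # three Bio values, but all three levels are still listed

# droplevels() removes levels that no longer occur in the data
trimmed <- droplevels(bio_only)
trimmed
```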

How to Modify a Factor

Elements of a factor can be modified using simple assignments. However, the new value must be one of the factor's existing levels; assigning any other value produces NA together with a warning.

Example:

data <- c('Physics','Maths','Physics','Bio','Bio','Maths','Physics','Maths','Bio')
fdata <- factor(data)
# modify 4th element
fdata[4] <- "Maths"
fdata
# cannot assign values outside the levels
fdata[3] <- "Chemistry"
fdata

Output:

[1] Physics Maths   Physics Maths   Bio     Maths   Physics Maths   Bio    
Levels: Bio Maths Physics
Warning message:
In `[<-.factor`(`*tmp*`, 3, value = "Chemistry") :
  invalid factor level, NA generated
[1] Physics Maths   <NA>    Maths   Bio     Maths   Physics Maths   Bio    
Levels: Bio Maths Physics
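If a value outside the current levels is really needed, one approach is to extend the levels first and assign afterwards. The sketch below appends "Chemistry" as a new level, so the assignment that produced NA above now succeeds.

```r
data <- c('Physics','Maths','Physics','Bio','Bio','Maths','Physics','Maths','Bio')
fdata <- factor(data)

# Append a new level; existing elements keep their values
levels(fdata) <- c(levels(fdata), "Chemistry")

# The assignment is now valid and no NA is generated
fdata[3] <- "Chemistry"
fdata
```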

Changing the Order of Levels

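We can change the order of the levels in a factor by calling factor() again on it and passing the desired order in the levels argument. The data values stay the same; only the order of the levels changes, as the Levels: line in the output shows.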

Example:

data <- c('Physics','Maths','Physics','Bio','Bio','Maths','Physics','Maths','Bio')
fdata <- factor(data)
print(fdata)
# Apply the factor function with required order of the level.
newData <- factor(fdata,levels = c("Physics","Maths","Bio"))
print(newData)

Output:

[1] Physics Maths   Physics Bio     Bio     Maths   Physics Maths   Bio    
Levels: Bio Maths Physics
[1] Physics Maths   Physics Bio     Bio     Maths   Physics Maths   Bio    
Levels: Physics Maths Bio
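Level order matters in practice: summaries such as table() follow it, and relevel() offers a shortcut for moving one level to the front. Reordering levels does not, however, make a factor ordered, so comparisons such as > need ordered = TRUE. In the sketch below the ranking Bio < Maths < Physics is purely hypothetical, and counts, physFirst and ranked are illustrative names.

```r
data <- c('Physics','Maths','Physics','Bio','Bio','Maths','Physics','Maths','Bio')
newData <- factor(data, levels = c("Physics", "Maths", "Bio"))

# table() lists counts in level order: Physics, Maths, Bio (3 each)
counts <- table(newData)
counts

# relevel() moves one level to the front and keeps the rest in order
fdata <- factor(data)
physFirst <- relevel(fdata, ref = "Physics")
levels(physFirst)      # "Physics" "Bio" "Maths"

# A reordered factor is still unordered
is.ordered(newData)    # FALSE

# Hypothetical ranking; ordered = TRUE enables comparison operators
ranked <- factor(data, levels = c("Bio", "Maths", "Physics"), ordered = TRUE)
ranked[1] > ranked[4]  # TRUE: element 1 is Physics, element 4 is Bio
```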

Generating Factor Levels

The gl() function generates a factor whose levels follow a regular pattern, repeating each level a fixed number of times. Let’s see the syntax of the gl() function:

Syntax:

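gl(n, k, length = n*k, labels = seq_len(n), ordered = FALSE)

Here,

  • n is the number of levels
  • k is the number of times each level is repeated consecutively
  • length is the length of the result (by default n*k)
  • labels is an optional vector of labels for the resulting factor levels
  • ordered indicates whether the result should be an ordered factor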

Example:

g <- gl(3, 4, labels = c("Nikita", "Deep","Ayesha"))
print(g)

Output:

[1] Nikita Nikita Nikita Nikita Deep   Deep   Deep   Deep   Ayesha Ayesha Ayesha Ayesha
Levels: Nikita Deep Ayesha
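To make the pattern explicit, the sketch below builds the same factor by hand with rep() and factor(), keeping the levels in the given order rather than alphabetical order. It then uses the length argument to recycle a pattern. The names labs, manual and alt and the labels "Low"/"High" are illustrative only. Because length is gl()'s third positional argument, labels should be passed by name, as in the example above.

```r
labs <- c("Nikita", "Deep", "Ayesha")
g <- gl(3, 4, labels = labs)

# The same factor built manually: repeat each label k = 4 times,
# and fix the level order to match labs
manual <- factor(rep(labs, each = 4), levels = labs)
identical(as.character(g), as.character(manual))  # TRUE
identical(levels(g), levels(manual))              # TRUE

# length recycles the pattern; with k = 1 the levels alternate
alt <- gl(2, 1, length = 6, labels = c("Low", "High"))
alt
```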

Reference: https://www.stat.berkeley.edu/~s133/factors.html