R Data Reshaping

In R programming, data reshaping is the process of changing the way data is organized into rows and columns.

Most data processing in R takes its input as a data frame.

Sometimes we need a data frame in a different format. R provides many functions to split and merge data frames and to convert rows to columns and vice versa.

Joining Columns and Rows in a Data Frame

cbind()

Syntax:

cbind(x1, x2, ...)

Here, x1 and x2 may be a data frame, matrix or vector.

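We use the cbind() function to combine vectors, matrices or data frames by columns. When every argument is a vector or matrix, the result is a matrix; when at least one argument is a data frame, the result is a data frame.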

Example:

name <- c("Deep", "Nikita", "Akansha", "Raj")
cls <- c("BE", "MCA", "MSC", "MCA")
age <- c(28, 24, 25, 30)
# Combine the three vectors by columns.
# The result is a character matrix, not a data frame, so age becomes text.
students <- cbind(name, cls, age)
# Print the matrix.
print(students)

Output:

     name      cls   age
[1,] "Deep"    "BE"  "28"
[2,] "Nikita"  "MCA" "24"
[3,] "Akansha" "MSC" "25"
[4,] "Raj"     "MCA" "30"
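
The quoted "28" in the output above shows that age was coerced to text, because a matrix holds a single type. A minimal sketch of how to keep age numeric, assuming a data frame is what you actually want (students.matrix, students.df, students.df2 and the passed column are hypothetical names used only for illustration):

```r
name <- c("Deep", "Nikita", "Akansha", "Raj")
cls <- c("BE", "MCA", "MSC", "MCA")
age <- c(28, 24, 25, 30)

# cbind() on plain vectors returns a matrix, so every value becomes a string.
students.matrix <- cbind(name, cls, age)
print(is.character(students.matrix[, "age"]))

# data.frame() keeps each column's own type, so age stays numeric.
students.df <- data.frame(name, cls, age)
print(is.numeric(students.df$age))

# cbind() returns a data frame when at least one argument is a data frame.
# The passed column is made-up example data.
students.df2 <- cbind(students.df, passed = c(TRUE, TRUE, FALSE, TRUE))
print(is.data.frame(students.df2))
```

Building the data frame first and then adding columns with cbind() avoids the silent conversion to character.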

rbind()

The rbind() function combines vectors, matrices or data frames by rows:

Syntax:

rbind(x1, x2, ...)

Here, x1 and x2 may be a data frame, matrix or vector.

Example:

students <- data.frame(
  name = c("Deep", "Nikita", "Akansha", "Raj"),
  cls = c("BE", "MCA", "MSC", "MCA"),
  age = c(28, 24, 25, 30)
)
# Print the first data frame.
print(students)
new.students <- data.frame(
  name = c("Akash", "Jhon", "Dolly", "Priya"),
  cls = c("BSC", "BCA", "MSC", "MCA"),
  age = c(21, 20, 26, 24)
)
# Print the second data frame.
print(new.students)
# Combine the rows from both data frames.
all.students <- rbind(students, new.students)
# Print the result.
print(all.students)

Output:

     name cls age
1    Deep  BE  28
2  Nikita MCA  24
3 Akansha MSC  25
4     Raj MCA  30
   name cls age
1 Akash BSC  21
2  Jhon BCA  20
3 Dolly MSC  26
4 Priya MCA  24
     name cls age
1    Deep  BE  28
2  Nikita MCA  24
3 Akansha MSC  25
4     Raj MCA  30
5   Akash BSC  21
6    Jhon BCA  20
7   Dolly MSC  26
8   Priya MCA  24
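
For data frames, rbind() matches columns by name rather than by position, and it stops with an error when the names differ. The following is a small sketch of both cases; the reordered and renamed data frames are hypothetical examples, not part of the data above.

```r
students <- data.frame(
  name = c("Deep", "Nikita"),
  cls = c("BE", "MCA"),
  age = c(28, 24)
)

# Same columns in a different order (hypothetical example data).
reordered <- data.frame(
  age = c(21, 20),
  name = c("Akash", "Jhon"),
  cls = c("BSC", "BCA")
)

# Columns are matched by name; the result follows the first argument's order.
combined <- rbind(students, reordered)
print(combined)

# A column named "class" instead of "cls" does not match, so rbind() fails.
renamed <- data.frame(name = "Dolly", class = "MSC", age = 26)
result <- tryCatch(
  rbind(students, renamed),
  error = function(e) "error"
)
print(result)
```

Checking names() on both data frames before calling rbind() is a cheap way to catch such mismatches early.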

Merging Data Frames

We can merge two data frames by using the merge() function. It is similar to a join in SQL. To join on a key, both data frames must have at least one common column.

Example:

Let's see an example to merge two data frames by roll_no:

data.frame1 <- data.frame(
  roll_no = c(101, 102, 103, 104),
  name = c("Deep", "Nikita", "Akansha", "Raj"),
  cls = c("BE", "MCA", "MSC", "MCA"),
  age = c(28, 24, 25, 30)
)
# Print the first data frame.
print(data.frame1)
data.frame2 <- data.frame(
  roll_no = c(101, 102, 103, 104),
  dept = c("CS", "IT", "IT", "CS"),
  marks = c(90, 70, 88, 22)
)
# Print the second data frame.
print(data.frame2)
# Join the two data frames on the shared roll_no column.
total <- merge(data.frame1, data.frame2, by = "roll_no")
print(total)

Output:

  roll_no    name cls age
1     101    Deep  BE  28
2     102  Nikita MCA  24
3     103 Akansha MSC  25
4     104     Raj MCA  30
  roll_no dept marks
1     101   CS    90
2     102   IT    70
3     103   IT    88
4     104   CS    22
  roll_no    name cls age dept marks
1     101    Deep  BE  28   CS    90
2     102  Nikita MCA  24   IT    70
3     103 Akansha MSC  25   IT    88
4     104     Raj MCA  30   CS    22
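
In the example above every roll_no appears in both data frames. When the keys only partly overlap, the all, all.x and all.y arguments of merge() control which unmatched rows are kept, much like outer joins in SQL. A minimal sketch with hypothetical data (left, right and the result names are made up for illustration):

```r
left <- data.frame(
  roll_no = c(101, 102, 103),
  name = c("Deep", "Nikita", "Akansha")
)
right <- data.frame(
  roll_no = c(102, 103, 104),
  marks = c(70, 88, 22)
)

# Default: keep only keys present in both data frames (inner join).
inner <- merge(left, right, by = "roll_no")

# all.x = TRUE: keep every row of the first data frame (left join).
# Missing marks are filled with NA.
left.join <- merge(left, right, by = "roll_no", all.x = TRUE)

# all = TRUE: keep every row of both data frames (full outer join).
full <- merge(left, right, by = "roll_no", all = TRUE)

print(inner)
print(left.join)
print(full)
```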

Transpose

We can transpose a matrix or data frame by using the t() function. Transposing a data frame returns a matrix.

Example:

a <- matrix(c(1:9), nrow = 3, byrow = TRUE)
a
x <- t(a)
x

Output:

     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
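
The example above transposes a matrix. As a small sketch of the data frame case, the code below uses a hypothetical data frame called scores: t() returns a matrix whose row names are the old column names, and as.data.frame() converts it back if a data frame is needed.

```r
# Hypothetical data frame with row names "a" and "b".
scores <- data.frame(
  x1 = c(5, 2),
  x2 = c(6, 1),
  row.names = c("a", "b")
)

# t() converts the data frame to a matrix and swaps rows and columns.
flipped <- t(scores)
print(is.matrix(flipped))
print(flipped)

# Convert back to a data frame if later steps expect one.
back <- as.data.frame(flipped)
print(is.data.frame(back))
```

If the data frame mixes numeric and character columns, the resulting matrix is character, so this works best on all-numeric data.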

Melting and Casting

We can reshape data in multiple steps in order to convert the input data into the required format.

We generally melt data so that each row is a unique combination of id variables and a measured variable. Then we cast the melted data into the desired shape. The reshape package provides the melt() and cast() functions for these two steps.

melt()

Example:

# Install the reshape package (needed only once).
install.packages("reshape")
# Create a data frame.
data1 <- data.frame(
  id = c(1, 1, 2, 2),
  rank = c(1, 2, 1, 2),
  x1 = c(5, 2, 6, 7),
  x2 = c(6, 1, 5, 8)
)
# Load the reshape package.
library(reshape)
# Melt the data, keeping id and rank as id variables.
mdata <- melt(data1, id = c("id", "rank"))
print(mdata)

Output:

  id rank variable value
1  1    1       x1     5
2  1    2       x1     2
3  2    1       x1     6
4  2    2       x1     7
5  1    1       x2     6
6  1    2       x2     1
7  2    1       x2     5
8  2    2       x2     8

cast()

Example:

Let's cast the melted data to compute the mean of each variable, first per id and then per rank:

# Cast the melted data, aggregating with mean.
idmeans <- cast(mdata, id~variable, mean)
rankmeans <- cast(mdata, rank~variable, mean)
print(idmeans)
print(rankmeans)

Output:

  id  x1  x2
1  1 3.5 3.5
2  2 6.5 6.5
  rank  x1  x2
1    1 5.5 5.5
2    2 4.5 4.5
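
The same melt-then-aggregate idea can be approximated without the reshape package. The sketch below is an alternative, not the method used above: it assumes base R's reshape() and tapply() are acceptable substitutes, and the names long and idmeans.base are hypothetical.

```r
data1 <- data.frame(
  id = c(1, 1, 2, 2),
  rank = c(1, 2, 1, 2),
  x1 = c(5, 2, 6, 7),
  x2 = c(6, 1, 5, 8)
)

# Wide to long with base R: one row per id, rank and variable combination.
long <- reshape(
  data1,
  direction = "long",
  varying = c("x1", "x2"),
  v.names = "value",
  timevar = "variable",
  times = c("x1", "x2"),
  idvar = c("id", "rank")
)
print(long)

# Mean of value for each id and variable, returned as a matrix
# with one row per id and one column per variable.
idmeans.base <- tapply(long$value, list(long$id, long$variable), mean)
print(idmeans.base)
```

The result is a matrix rather than a data frame, so it is indexed by row and column names such as idmeans.base["1", "x1"].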

Reference: https://data-flair.training/blogs/r-data-reshaping-function-package/


