R Data Frames

A data frame is a table, or two-dimensional array-like structure, in which each column contains the values of one variable and each row contains one set of values, one from each column. Different columns can hold different types: for example, the first column can be character, the second numeric, and the third logical. A data frame has the following characteristics:

  • Column names should be non-empty.
  • Row names should be unique.
  • The data stored in a data frame can be of numeric, character, logical, or factor type.
  • Each column must contain the same number of data items.
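
These rules can be checked directly in R. The following is a minimal sketch; the data frame df and its column names are made up for illustration and are not part of the student example used in the rest of this tutorial.

```r
# Hypothetical data frame used only to illustrate the rules above
df <- data.frame(
  name   = c("A", "B", "C"),
  score  = c(90.5, 82, 77),
  passed = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Each column holds values of a single type
col_types <- sapply(df, class)
print(col_types)

# Columns of incompatible lengths (3 and 2) are rejected by data.frame()
length_check <- tryCatch(
  data.frame(x = 1:3, y = 1:2),
  error = function(e) "error: differing number of rows"
)
print(length_check)

# Duplicate row names are rejected as well
rowname_check <- tryCatch(
  { rownames(df) <- c("r1", "r1", "r2"); "ok" },
  error = function(e) "error: duplicate row names"
)
print(rowname_check)
```

tryCatch() is used here only so that the script keeps running after each expected error.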

Creating a Data Frame

To create a data frame, we use the data.frame() function.

std_id = c(1:4)
std_name = c("Nikita","Deep","Priya","Pihu")
marks = c(87, 89, 72, 65)
std.data <- data.frame(std_id, std_name, marks)
std.data

Output:

  std_id std_name marks
1      1   Nikita    87
2      2     Deep    89
3      3    Priya    72
4      4     Pihu    65
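
data.frame() also accepts name = value pairs, so the columns can be named at creation time without defining separate vectors first. The sketch below assumes the same student data; the variable name students and the shorter column names are chosen only for illustration.

```r
# Build the data frame in a single call, naming each column explicitly
students <- data.frame(
  id    = 1:4,
  name  = c("Nikita", "Deep", "Priya", "Pihu"),
  marks = c(87, 89, 72, 65),
  stringsAsFactors = FALSE
)

# Column names come from the argument names
print(names(students))
print(students)
```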

Get the Structure of Data Frame

We can get the structure of the data frame by using the str() function.

Example:

std_id = c(1:4)
std_name = c("Nikita","Deep","Priya","Pihu")
marks = c(87, 89, 72, 65)
std.data <- data.frame(std_id, std_name, marks, stringsAsFactors = FALSE)
# get the structure of the data frame
str(std.data)

Output:

'data.frame':	4 obs. of  3 variables:
 $ std_id  : int  1 2 3 4
 $ std_name: chr  "Nikita" "Deep" "Priya" "Pihu"
 $ marks   : num  87 89 72 65
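
Alongside str(), a few base R functions report individual parts of the structure. The following sketch rebuilds the same student data frame and queries its dimensions and column names; the variable names dims, n_rows, n_cols and col_names are illustrative.

```r
std.data <- data.frame(
  std_id   = 1:4,
  std_name = c("Nikita", "Deep", "Priya", "Pihu"),
  marks    = c(87, 89, 72, 65),
  stringsAsFactors = FALSE
)

dims      <- dim(std.data)    # number of rows, then number of columns
n_rows    <- nrow(std.data)   # number of rows only
n_cols    <- ncol(std.data)   # number of columns only
col_names <- names(std.data)  # column names as a character vector

print(dims)
print(col_names)
```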

Summary of Data in Data Frame

We can obtain a summary of each column, including descriptive statistics for the numeric columns, by applying the summary() function.

Example:

std_id = c(1:4)
std_name = c("Nikita","Deep","Priya","Pihu")
marks = c(87, 89, 72, 65)
std.data <- data.frame(std_id, std_name, marks, stringsAsFactors = FALSE)
# print the summary
print(summary(std.data))

Output:

     std_id       std_name             marks
 Min.   :1.00   Length:4           Min.   :65.00
 1st Qu.:1.75   Class :character   1st Qu.:70.25
 Median :2.50   Mode  :character   Median :79.50
 Mean   :2.50                      Mean   :78.25
 3rd Qu.:3.25                      3rd Qu.:87.50
 Max.   :4.00                      Max.   :89.00
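
summary() can also be applied to a single column, and individual statistics can be computed with functions such as mean() and range(). A minimal sketch on the marks column of the same data; the variable names marks_summary, avg and rng are illustrative.

```r
std.data <- data.frame(
  std_id   = 1:4,
  std_name = c("Nikita", "Deep", "Priya", "Pihu"),
  marks    = c(87, 89, 72, 65),
  stringsAsFactors = FALSE
)

# Summary of one numeric column
marks_summary <- summary(std.data$marks)
print(marks_summary)

# Individual statistics
avg <- mean(std.data$marks)   # arithmetic mean of the marks
rng <- range(std.data$marks)  # smallest and largest mark
print(avg)
print(rng)
```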

Access Specific Column from a Data Frame

We can extract specific columns from a data frame by column name, and specific rows and columns by index using square brackets.

Example 1:

Let's see an example that extracts the names and marks of the students from the student data frame:

std_id = c(1:4)
std_name = c("Nikita","Deep","Priya","Pihu")
marks = c(87, 89, 72, 65)
std.data <- data.frame(std_id, std_name, marks, stringsAsFactors = FALSE)
# extract specific columns (avoid naming the result c, which would mask the c() function name)
result <- data.frame(std.data$std_name, std.data$marks)
print(result)

Output:

  std.data.std_name std.data.marks
1            Nikita             87
2              Deep             89
3             Priya             72
4              Pihu             65

Example 2:

Let's see an example that accesses only the first two rows with all columns:

std_id = c(1:4)
std_name = c("Nikita","Deep","Priya","Pihu")
marks = c(87, 89, 72, 65)
std.data <- data.frame(std_id, std_name, marks, stringsAsFactors = FALSE)
#extract top two rows
r <- std.data[1:2,]
print(r)

Output:

  std_id std_name marks
1      1   Nikita    87
2      2     Deep    89

Example 3:

Let's see an example that extracts the 1st and 4th rows with the 2nd and 3rd columns:

std_id = c(1:4)
std_name = c("Nikita","Deep","Priya","Pihu")
marks = c(87, 89, 72, 65)
std.data <- data.frame(std_id, std_name, marks, stringsAsFactors = FALSE)
# Extract 1st and 4th row with 2nd and 3rd column.
a <- std.data[c(1,4),c(2,3)]
print(a)

Output:

  std_name marks
1   Nikita    87
4     Pihu    65
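
Rows can also be selected by a logical condition rather than by position. The sketch below shows three common base R forms on the same data; the threshold of 80 and the variable names high, names_only and top are chosen only for illustration.

```r
std.data <- data.frame(
  std_id   = 1:4,
  std_name = c("Nikita", "Deep", "Priya", "Pihu"),
  marks    = c(87, 89, 72, 65),
  stringsAsFactors = FALSE
)

# Keep only the rows where marks exceed 80 (all columns)
high <- std.data[std.data$marks > 80, ]
print(high)

# Extract one column as a plain vector using [[ ]]
names_only <- std.data[["std_name"]]
print(names_only)

# subset() combines a row condition with a column selection
top <- subset(std.data, marks > 80, select = c(std_name, marks))
print(top)
```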

Modify a Data Frame

We can modify a data frame by assigning new values to selected cells, rows, or columns.

Example:

std_id = c(1:4)
std_name = c("Nikita","Deep","Priya","Pihu")
marks = c(87, 89, 72, 65)
std.data <- data.frame(std_id, std_name, marks, stringsAsFactors = FALSE)
# modify the marks of Pihu to 93
std.data[4,"marks"] <- 93
print(std.data)

Output:

  std_id std_name marks
1      1   Nikita    87
2      2     Deep    89
3      3    Priya    72
4      4     Pihu    93
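
Assignment can also target every row that meets a condition instead of a fixed position. The following sketch adds 5 marks to each student scoring below 75; this bonus rule is a made-up example, not part of the original data.

```r
std.data <- data.frame(
  std_id   = 1:4,
  std_name = c("Nikita", "Deep", "Priya", "Pihu"),
  marks    = c(87, 89, 72, 65),
  stringsAsFactors = FALSE
)

# Logical vector marking the rows to change (hypothetical rule: marks below 75)
low <- std.data$marks < 75

# Update only the selected rows of the marks column
std.data$marks[low] <- std.data$marks[low] + 5
print(std.data)
```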

Adding Rows to a Data Frame

We can add rows to a data frame using the rbind() function.

Example 1:

std_id = c(1:4)
std_name = c("Nikita","Deep","Priya","Pihu")
marks = c(87, 89, 72, 65)
std.data <- data.frame(std_id, std_name, marks, stringsAsFactors = FALSE)
rbind(std.data,list(5,"Nidhi", 90))

Output:

  std_id std_name marks
1      1   Nikita    87
2      2     Deep    89
3      3    Priya    72
4      4     Pihu    65
5      5    Nidhi    90

We can also use rbind() in another way. In the example below, we create a second data frame containing the new rows and combine it with the existing data frame to create the final data frame:

Example 2:

# create the first data frame
std_id = c(1:4)
std_name = c("Nikita","Aman","Priya","Pihu")
marks = c(87, 89, 72, 65)
std.data1 <- data.frame(std_id, std_name, marks, stringsAsFactors = FALSE)
# create the 2nd data frame
std_id = c(5:6)
std_name = c("Jhon","Paul")
marks = c(81, 70)
std.data2 <- data.frame(std_id, std_name, marks, stringsAsFactors = FALSE)
# Bind the two data frames.
std.finaldata <- rbind(std.data1,std.data2)
print(std.finaldata)

Output:

  std_id std_name marks
1      1   Nikita    87
2      2     Aman    89
3      3    Priya    72
4      4     Pihu    65
5      5     Jhon    81
6      6     Paul    70
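
Two points about rbind() are worth seeing in code: it returns a new data frame, so the result must be assigned to keep it, and when binding two data frames the column names must match. A minimal sketch, in which new_row and the mismatched column names id, name and score are illustrative:

```r
std.data <- data.frame(
  std_id   = 1:4,
  std_name = c("Nikita", "Deep", "Priya", "Pihu"),
  marks    = c(87, 89, 72, 65),
  stringsAsFactors = FALSE
)

# A one-row data frame with the same column names
new_row <- data.frame(std_id = 5L, std_name = "Nidhi", marks = 90,
                      stringsAsFactors = FALSE)

# Assign the result back to keep the added row
std.data <- rbind(std.data, new_row)
print(std.data)

# Binding a data frame whose column names differ raises an error
mismatch <- tryCatch(
  rbind(std.data, data.frame(id = 6L, name = "X", score = 50)),
  error = function(e) "error: column names do not match"
)
print(mismatch)
```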

Adding Columns to a Data Frame

We can add columns to a data frame using the cbind() function.

Example 1:

# create the data frame
std_id = c(1:4)
std_name = c("Nikita","Deep","Priya","Pihu")
marks = c(87, 89, 72, 65)
std.data <- data.frame(std_id, std_name, marks, stringsAsFactors = FALSE)
cbind(std.data,std_class=c("MCA","BE", "MSC", "MSC"))

Output:

  std_id std_name marks std_class
1      1   Nikita    87       MCA
2      2     Deep    89        BE
3      3    Priya    72       MSC
4      4     Pihu    65       MSC

Since data frames are implemented as lists, we can also add new columns through simple list-like assignment.

Example 2:

# create the data frame
std_id = c(1:4)
std_name = c("Nikita","Deep","Priya","Pihu")
marks = c(87, 89, 72, 65)
std.data <- data.frame(std_id, std_name, marks, stringsAsFactors = FALSE)
std.data$std_class <- c("MCA","BE", "MSC", "MSC")
print(std.data)

Output:

  std_id std_name marks std_class
1      1   Nikita    87       MCA
2      2     Deep    89        BE
3      3    Priya    72       MSC
4      4     Pihu    65       MSC
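
The same list-like assignment works for columns computed from existing ones. In the sketch below, the column name passed and the pass mark of 70 are assumptions made for illustration.

```r
std.data <- data.frame(
  std_id   = 1:4,
  std_name = c("Nikita", "Deep", "Priya", "Pihu"),
  marks    = c(87, 89, 72, 65),
  stringsAsFactors = FALSE
)

# Derive a logical column from the marks column (hypothetical pass mark: 70)
std.data$passed <- std.data$marks >= 70
print(std.data)
```

Because the comparison is vectorised, the new column automatically has one value per row.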

Deleting Columns and Rows from a Data Frame

We can delete a data frame column by assigning NULL to it, and delete a row by using a negative row index.

Example:

# create the data frame, including a std_class column to delete
std_id = c(1:4)
std_name = c("Nikita","Deep","Priya","Pihu")
marks = c(87, 89, 72, 65)
std_class = c("MCA","BE", "MSC", "MSC")
std.data <- data.frame(std_id, std_name, marks, std_class, stringsAsFactors = FALSE)
# Delete the std_class column
std.data$std_class <- NULL
# Delete the 3rd row
std.data <- std.data[-3,]
print(std.data)

Output:

  std_id std_name marks
1      1   Nikita    87
2      2     Deep    89
4      4     Pihu    65
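
Columns and rows can also be dropped without modifying the original data frame, by selecting what to keep. A minimal sketch; the variable names keep, no_marks and no_low and the cutoff of 70 are illustrative.

```r
std.data <- data.frame(
  std_id   = 1:4,
  std_name = c("Nikita", "Deep", "Priya", "Pihu"),
  marks    = c(87, 89, 72, 65),
  stringsAsFactors = FALSE
)

# Drop a column by name: keep every column except "marks"
keep     <- setdiff(names(std.data), "marks")
no_marks <- std.data[keep]
print(no_marks)

# Drop rows by condition: keep only rows with marks of at least 70
no_low <- std.data[std.data$marks >= 70, ]
print(no_low)
```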

Reference: https://www.rdocumentation.org/packages/base/versions/3.5.3/topics/data.frame