R Data Frames
A data frame is a table, or two-dimensional array-like structure, in which each column holds the values of one variable and each row holds one set of values, one from each column. Columns may differ in type: for example, the first column can be character, the second numeric, and the third logical. Some characteristics of a data frame:
- Column names should be non-empty.
- Row names should be unique.
- The data stored in a data frame can be of numeric, character, logical, or factor type.
- Each column must contain the same number of data items.
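The rules on column length and row names can be checked directly. The following is a minimal sketch: the names ok, bad_lengths, and bad_rownames are illustrative and not part of the original tutorial, and tryCatch() is used only to turn the errors into a printable value.

```r
# Columns of different types but equal length are accepted.
ok <- data.frame(id = 1:3,
                 name = c("a", "b", "c"),
                 passed = c(TRUE, FALSE, TRUE))

# Columns whose lengths cannot be recycled to a common size are rejected
# (3 is not a multiple of 2).
bad_lengths <- tryCatch(
  data.frame(id = 1:3, score = c(10, 20)),
  error = function(e) "error"
)

# Duplicate row names are rejected as well.
bad_rownames <- tryCatch(
  data.frame(id = 1:2, row.names = c("r1", "r1")),
  error = function(e) "error"
)

print(dim(ok))
print(bad_lengths)
print(bad_rownames)
```

Note that a shorter column is recycled silently when its length divides the longest column exactly, so it is safest to supply columns of identical length.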
Creating a Data Frame
To create a data frame, use the data.frame() function.
std_id <- 1:4
std_name <- c("Nikita", "Deep", "Priya", "Pihu")
marks <- c(87, 89, 72, 65)
std.data <- data.frame(std_id, std_name, marks)
std.data
Output:
  std_id std_name marks
1      1   Nikita    87
2      2     Deep    89
3      3    Priya    72
4      4     Pihu    65
Get the Structure of Data Frame
We can get the structure of the data frame by using the str() function.
Example:
std_id <- 1:4
std_name <- c("Nikita", "Deep", "Priya", "Pihu")
marks <- c(87, 89, 72, 65)
std.data <- data.frame(std_id, std_name, marks, stringsAsFactors = FALSE)
# get the structure of the data frame
str(std.data)
Output:
'data.frame':	4 obs. of  3 variables:
 $ std_id  : int  1 2 3 4
 $ std_name: chr  "Nikita" "Deep" "Priya" "Pihu"
 $ marks   : num  87 89 72 65
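Besides str(), a few base R helpers report individual pieces of the same information. This is a small sketch that rebuilds the tutorial's std.data so it can be run on its own:

```r
std.data <- data.frame(std_id = 1:4,
                       std_name = c("Nikita", "Deep", "Priya", "Pihu"),
                       marks = c(87, 89, 72, 65),
                       stringsAsFactors = FALSE)

print(dim(std.data))            # number of rows, then number of columns
print(names(std.data))          # column names
print(sapply(std.data, class))  # class of each column
```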
Summary of Data in Data Frame
We can obtain a statistical summary of each column by applying the summary() function.
Example:
std_id <- 1:4
std_name <- c("Nikita", "Deep", "Priya", "Pihu")
marks <- c(87, 89, 72, 65)
std.data <- data.frame(std_id, std_name, marks, stringsAsFactors = FALSE)
# print the summary
print(summary(std.data))
Output:
     std_id       std_name             marks
 Min.   :1.00   Length:4           Min.   :65.00
 1st Qu.:1.75   Class :character   1st Qu.:70.25
 Median :2.50   Mode  :character   Median :79.50
 Mean   :2.50                      Mean   :78.25
 3rd Qu.:3.25                      3rd Qu.:87.50
 Max.   :4.00                      Max.   :89.00
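summary() can also be applied to a single column, and the individual statistics are available through their own functions. A brief sketch, where marks_summary is an illustrative name:

```r
std.data <- data.frame(std_id = 1:4,
                       std_name = c("Nikita", "Deep", "Priya", "Pihu"),
                       marks = c(87, 89, 72, 65),
                       stringsAsFactors = FALSE)

# summary of one numeric column
marks_summary <- summary(std.data$marks)
print(marks_summary)

# individual statistics
print(mean(std.data$marks))
print(median(std.data$marks))
```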
Access Specific Column from a Data Frame
We can extract specific columns from a data frame using the column name, and specific rows or cells using [row, column] indexing.
Example 1:
Let's see an example that extracts the names and marks of the students from the student data frame:
std_id <- 1:4
std_name <- c("Nikita", "Deep", "Priya", "Pihu")
marks <- c(87, 89, 72, 65)
std.data <- data.frame(std_id, std_name, marks, stringsAsFactors = FALSE)
# extract specific columns (the result is not named c, to avoid masking the base function c())
name_marks <- data.frame(std.data$std_name, std.data$marks)
print(name_marks)
Output:
  std.data.std_name std.data.marks
1            Nikita             87
2              Deep             89
3             Priya             72
4              Pihu             65
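Columns can also be selected with bracket indexing, which keeps the original column names instead of producing names such as std.data.std_name. A minimal sketch; by_name, one_col_df, and marks_vec are illustrative names:

```r
std.data <- data.frame(std_id = 1:4,
                       std_name = c("Nikita", "Deep", "Priya", "Pihu"),
                       marks = c(87, 89, 72, 65),
                       stringsAsFactors = FALSE)

# several columns by name: the result is a data frame with the same column names
by_name <- std.data[, c("std_name", "marks")]
print(by_name)

# a single column as a one-column data frame
one_col_df <- std.data["marks"]

# a single column as a plain vector (same result as std.data$marks)
marks_vec <- std.data[["marks"]]
print(marks_vec)
```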
Example 2:
Let's see an example that accesses only the first two rows, with all columns:
std_id <- 1:4
std_name <- c("Nikita", "Deep", "Priya", "Pihu")
marks <- c(87, 89, 72, 65)
std.data <- data.frame(std_id, std_name, marks, stringsAsFactors = FALSE)
# extract the top two rows
r <- std.data[1:2, ]
print(r)
Output:
  std_id std_name marks
1      1   Nikita    87
2      2     Deep    89
Example 3:
Let's see an example that extracts the 1st and 4th rows with the 2nd and 3rd columns:
std_id <- 1:4
std_name <- c("Nikita", "Deep", "Priya", "Pihu")
marks <- c(87, 89, 72, 65)
std.data <- data.frame(std_id, std_name, marks, stringsAsFactors = FALSE)
# extract the 1st and 4th rows with the 2nd and 3rd columns
a <- std.data[c(1, 4), c(2, 3)]
print(a)
Output:
  std_name marks
1   Nikita    87
4     Pihu    65
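The row index does not have to be a set of positions; a logical condition works in the same place. The sketch below selects students by marks. The threshold of 80 and the name high are arbitrary choices made for illustration.

```r
std.data <- data.frame(std_id = 1:4,
                       std_name = c("Nikita", "Deep", "Priya", "Pihu"),
                       marks = c(87, 89, 72, 65),
                       stringsAsFactors = FALSE)

# keep rows where marks exceed 80, and only the name and marks columns
high <- std.data[std.data$marks > 80, c("std_name", "marks")]
print(high)
```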
Modify a Data Frame
We can modify a data frame by assigning a new value to the cells we select.
Example:
std_id <- 1:4
std_name <- c("Nikita", "Deep", "Priya", "Pihu")
marks <- c(87, 89, 72, 65)
std.data <- data.frame(std_id, std_name, marks, stringsAsFactors = FALSE)
# modify the marks of Pihu to 93
std.data[4, "marks"] <- 93
print(std.data)
Output:
  std_id std_name marks
1      1   Nikita    87
2      2     Deep    89
3      3    Priya    72
4      4     Pihu    93
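When the row number is not known in advance, the same update can be written with a condition on another column. A minimal sketch using the same data:

```r
std.data <- data.frame(std_id = 1:4,
                       std_name = c("Nikita", "Deep", "Priya", "Pihu"),
                       marks = c(87, 89, 72, 65),
                       stringsAsFactors = FALSE)

# set the marks to 93 for every row whose name is "Pihu"
std.data$marks[std.data$std_name == "Pihu"] <- 93
print(std.data)
```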
Adding Rows to a Data Frame
We can add rows to a data frame using the rbind() function.
Example 1:
std_id <- 1:4
std_name <- c("Nikita", "Deep", "Priya", "Pihu")
marks <- c(87, 89, 72, 65)
std.data <- data.frame(std_id, std_name, marks, stringsAsFactors = FALSE)
rbind(std.data, list(5, "Nidhi", 90))
Output:
  std_id std_name marks
1      1   Nikita    87
2      2     Deep    89
3      3    Priya    72
4      4     Pihu    65
5      5    Nidhi    90
rbind() can also combine two data frames. In the example below, we create a second data frame holding the new rows and bind it to the existing data frame to produce the final data frame:
Example 2:
# create the first data frame
std_id <- 1:4
std_name <- c("Nikita", "Aman", "Priya", "Pihu")
marks <- c(87, 89, 72, 65)
std.data1 <- data.frame(std_id, std_name, marks, stringsAsFactors = FALSE)
# create the second data frame
std_id <- 5:6
std_name <- c("Jhon", "Paul")
marks <- c(81, 70)
std.data2 <- data.frame(std_id, std_name, marks, stringsAsFactors = FALSE)
# bind the two data frames
std.finaldata <- rbind(std.data1, std.data2)
print(std.finaldata)
Output:
  std_id std_name marks
1      1   Nikita    87
2      2     Aman    89
3      3    Priya    72
4      4     Pihu    65
5      5     Jhon    81
6      6     Paul    70
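Two properties of rbind() are easy to miss: it returns a new data frame rather than changing its argument, and when binding data frames it matches columns by name, not by position. The sketch below illustrates both; extra and combined are illustrative names.

```r
std.data <- data.frame(std_id = 1:4,
                       std_name = c("Nikita", "Deep", "Priya", "Pihu"),
                       marks = c(87, 89, 72, 65),
                       stringsAsFactors = FALSE)

# same column names as std.data, deliberately listed in a different order
extra <- data.frame(marks = 90,
                    std_name = "Nidhi",
                    std_id = 5L,
                    stringsAsFactors = FALSE)

combined <- rbind(std.data, extra)

print(nrow(std.data))  # the original data frame still has its 4 rows
print(combined)        # columns follow the order of the first data frame
```

To keep the added row, the result has to be assigned, for example back to std.data.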
Adding Columns to a Data Frame
We can add columns to a data frame using the cbind() function.
Example 1:
# create the data frame
std_id <- 1:4
std_name <- c("Nikita", "Deep", "Priya", "Pihu")
marks <- c(87, 89, 72, 65)
std.data <- data.frame(std_id, std_name, marks, stringsAsFactors = FALSE)
cbind(std.data, std_class = c("MCA", "BE", "MSC", "MSC"))
Output:
  std_id std_name marks std_class
1      1   Nikita    87       MCA
2      2     Deep    89        BE
3      3    Priya    72       MSC
4      4     Pihu    65       MSC
Since data frames are implemented as lists, we can also add new columns through simple list-like assignment.
Example 2:
# create the data frame
std_id <- 1:4
std_name <- c("Nikita", "Deep", "Priya", "Pihu")
marks <- c(87, 89, 72, 65)
std.data <- data.frame(std_id, std_name, marks, stringsAsFactors = FALSE)
# add the std_class column by assignment
std.data$std_class <- c("MCA", "BE", "MSC", "MSC")
print(std.data)
Output:
  std_id std_name marks std_class
1      1   Nikita    87       MCA
2      2     Deep    89        BE
3      3    Priya    72       MSC
4      4     Pihu    65       MSC
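The same assignment form can derive a new column from existing ones. In this sketch the column name passed and the pass mark of 70 are hypothetical, chosen only to illustrate the idea:

```r
std.data <- data.frame(std_id = 1:4,
                       std_name = c("Nikita", "Deep", "Priya", "Pihu"),
                       marks = c(87, 89, 72, 65),
                       stringsAsFactors = FALSE)

# logical column computed from marks (70 is an assumed pass mark)
std.data$passed <- std.data$marks >= 70
print(std.data)
```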
Deleting Columns and Rows from a Data Frame
We can delete a data frame column by assigning NULL to it, and delete a row by selecting with a negative row index.
Example:
# create the data frame, including a std_class column
std_id <- 1:4
std_name <- c("Nikita", "Deep", "Priya", "Pihu")
marks <- c(87, 89, 72, 65)
std.data <- data.frame(std_id, std_name, marks, stringsAsFactors = FALSE)
std.data$std_class <- c("MCA", "BE", "MSC", "MSC")
# delete the std_class column
std.data$std_class <- NULL
# delete the 3rd row
std.data <- std.data[-3, ]
print(std.data)
Output:
  std_id std_name marks
1      1   Nikita    87
2      2     Deep    89
4      4     Pihu    65
Reference: https://www.rdocumentation.org/packages/base/versions/3.5.3/topics/data.frame