R CSV Files
R can read data from files and write data to files. It supports several formats for this, such as CSV, Excel, and XML.
Here, we will learn to read data from a CSV (comma-separated values) file and then write data to a CSV file.
Getting and Setting the Working Directory
We can get the current working directory of R using the getwd() function.
# get the current working directory
> getwd()
[1] "C:/Users/Nikita/Documents"
We can also set a new working directory using the setwd() function.
# set the working directory
> setwd("F:/PC/Tutorial/R Tutorial")
# get the current working directory
> getwd()
[1] "F:/PC/Tutorial/R Tutorial"
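The paths above are specific to one machine. As a minimal sketch that should run anywhere, the snippet below switches to R's temporary directory and then restores the previous directory. It relies on setwd() invisibly returning the directory that was current before the change; the variable name old_dir is only illustrative.

```r
# Switch to a temporary directory; setwd() invisibly returns the previous one.
old_dir <- setwd(tempdir())
print(getwd())

# Restore the original working directory.
setwd(old_dir)
print(getwd())
```

Saving and restoring the directory this way avoids leaving the session pointed at a different folder after a script finishes.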
Create a CSV File
A CSV file is a text file in which the values in each row are separated by commas. To create one, type the column values separated by commas in Notepad or any other text editor, and save the file with the .csv extension.
For example, save the following content as demo.csv:
id,name,salary,Doj,dept
1,Nikita,50000,2018-01-01,IT
2,Deep,55000,2018-09-23,Operations
3,Kamlesh,30000,2019-11-15,IT
4,Priya,32000,2018-05-11,HR
5,Amita,25000,2015-03-26,Finance
6,Aman,60000,2017-05-21,IT
7,Divya,40000,2018-07-30,Operations
8,Kashish,20000,2019-06-17,Finance
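The same file can also be created from R instead of a text editor. The sketch below writes the rows with writeLines(). It assumes a temporary directory as the target so that nothing in the working directory is overwritten; the names csv_path and rows are illustrative, and the path would need to be adjusted to read the file later as "demo.csv".

```r
# Hypothetical location: a temporary directory, so no existing file is overwritten.
csv_path <- file.path(tempdir(), "demo.csv")

# One string per line of the CSV file, header first.
rows <- c(
  "id,name,salary,Doj,dept",
  "1,Nikita,50000,2018-01-01,IT",
  "2,Deep,55000,2018-09-23,Operations",
  "3,Kamlesh,30000,2019-11-15,IT",
  "4,Priya,32000,2018-05-11,HR",
  "5,Amita,25000,2015-03-26,Finance",
  "6,Aman,60000,2017-05-21,IT",
  "7,Divya,40000,2018-07-30,Operations",
  "8,Kashish,20000,2019-06-17,Finance"
)

writeLines(rows, csv_path)
print(file.exists(csv_path))
```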
Reading a CSV File
We can read a CSV file with the read.csv() function. This function returns the contents of the file as a data frame.
Example:
data <- read.csv("demo.csv")
print(data)
Output:
  id    name salary        Doj       dept
1  1  Nikita  50000 2018-01-01         IT
2  2    Deep  55000 2018-09-23 Operations
3  3 Kamlesh  30000 2019-11-15         IT
4  4   Priya  32000 2018-05-11         HR
5  5   Amita  25000 2015-03-26    Finance
6  6    Aman  60000 2017-05-21         IT
7  7   Divya  40000 2018-07-30 Operations
8  8 Kashish  20000 2019-06-17    Finance
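By default, read.csv() guesses the type of each column, and a date column such as Doj is read as text. As a hedged sketch, the colClasses argument can ask for Doj to be parsed as a Date while the other columns are still guessed. The example reads from an inline string through the text argument (an assumption made here so the snippet does not depend on demo.csv being present) and uses only the first two rows of the sample data.

```r
# Inline sample: header plus the first two rows of the demo data.
csv_text <- "id,name,salary,Doj,dept
1,Nikita,50000,2018-01-01,IT
2,Deep,55000,2018-09-23,Operations"

# Parse Doj as Date; the remaining columns are converted automatically.
data <- read.csv(text = csv_text,
                 stringsAsFactors = FALSE,
                 colClasses = c(Doj = "Date"))

# Inspect the column types of the resulting data frame.
str(data)
```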
Analyzing the CSV File
Example 1: Find the number of rows and columns
We can find the number of columns and rows in the data frame read from the CSV file using the ncol() and nrow() functions:
ncol(data)
nrow(data)
Output:
[1] 5
[1] 8
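Both counts are also available in one call. As a small sketch, dim() returns the number of rows followed by the number of columns, and names() lists the column names. The three-row data frame here is a stand-in built inline, not the full demo.csv.

```r
# Small stand-in data frame (three rows of the demo data, three columns).
data <- data.frame(
  id = 1:3,
  name = c("Nikita", "Deep", "Kamlesh"),
  salary = c(50000, 55000, 30000)
)

# dim() gives rows first, then columns.
size <- dim(data)
print(size)

# names() gives the column names.
print(names(data))
```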
Example 2: Get the Maximum Salary
# Create a data frame.
data <- read.csv("demo.csv")
# Get the max salary from data frame.
sal <- max(data$salary)
sal
Output:
[1] 60000
Example 3: Get the details of the person with maximum Salary
# Create a data frame.
data <- read.csv("demo.csv")
# Keep only the row(s) whose salary equals the max salary.
result <- subset(data, salary == max(salary))
result
Output:
  id name salary        Doj dept
6  6 Aman  60000 2017-05-21   IT
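An alternative to subset() is to index the data frame by position. The sketch below uses which.max(), which returns the index of the first maximum, so unlike the subset() approach it yields a single row even when several people share the top salary. The inline data frame is a stand-in for demo.csv.

```r
# Stand-in data: three rows taken from the demo data.
data <- data.frame(
  id = c(1, 2, 6),
  name = c("Nikita", "Deep", "Aman"),
  salary = c(50000, 55000, 60000),
  stringsAsFactors = FALSE
)

# which.max() gives the position of the first largest salary.
top <- data[which.max(data$salary), ]
print(top)
```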
Example 4: Get the details of people working in Operations department
# Create a data frame.
data <- read.csv("demo.csv")
# Keep only the rows for the Operations department.
result <- subset(data, dept == "Operations")
print(result)
Output:
  id  name salary        Doj       dept
2  2  Deep  55000 2018-09-23 Operations
7  7 Divya  40000 2018-07-30 Operations
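To match more than one department, the == comparison can be swapped for %in%. This is a minimal sketch using logical indexing on a stand-in data frame; the choice of "Operations" and "HR" is only for illustration.

```r
# Stand-in data: four rows taken from the demo data.
data <- data.frame(
  name = c("Nikita", "Deep", "Priya", "Divya"),
  dept = c("IT", "Operations", "HR", "Operations"),
  stringsAsFactors = FALSE
)

# Keep rows whose department is any of the listed values.
result <- data[data$dept %in% c("Operations", "HR"), ]
print(result)
```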
Example 5: Get the details of persons who joined after 2018-01-01
# Create a data frame.
data <- read.csv("demo.csv")
# Compare as dates; > excludes anyone who joined exactly on 2018-01-01.
result <- subset(data, as.Date(Doj) > as.Date("2018-01-01"))
print(result)
Output:
  id    name salary        Doj       dept
2  2    Deep  55000 2018-09-23 Operations
3  3 Kamlesh  30000 2019-11-15         IT
4  4   Priya  32000 2018-05-11         HR
7  7   Divya  40000 2018-07-30 Operations
8  8 Kashish  20000 2019-06-17    Finance
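Note that the comparison above uses >, so a person who joined exactly on 2018-01-01 (Nikita in the sample data) is not in the output. If that date should be included, >= can be used instead. The sketch below shows this on a stand-in data frame with four of the demo rows.

```r
# Stand-in data: four rows taken from the demo data.
data <- data.frame(
  name = c("Nikita", "Deep", "Amita", "Aman"),
  Doj = c("2018-01-01", "2018-09-23", "2015-03-26", "2017-05-21"),
  stringsAsFactors = FALSE
)

# >= keeps rows dated on or after 2018-01-01.
result <- subset(data, as.Date(Doj) >= as.Date("2018-01-01"))
print(result)
```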
Writing into a CSV File
The write.csv() function writes an existing data frame to a CSV file.
Example:
# Create a data frame.
data <- read.csv("demo.csv")
result <- subset(data, salary > 30000)
# Write filtered data into a new file.
write.csv(result, "demoWrite.csv", row.names = FALSE)
# Read the new file back to check its contents.
newdata <- read.csv("demoWrite.csv")
print(newdata)
Output:
  id   name salary        Doj       dept
1  1 Nikita  50000 2018-01-01         IT
2  2   Deep  55000 2018-09-23 Operations
3  4  Priya  32000 2018-05-11         HR
4  6   Aman  60000 2017-05-21         IT
5  7  Divya  40000 2018-07-30 Operations
Here, the row.names argument controls whether the row names of the data frame are written to the output file. Its default value is TRUE, and because this data frame has no explicit row names, R writes the row numbers as an extra first column. Setting row.names to FALSE omits that column.
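The effect of row.names is easiest to see in the header line of the written file. As a minimal sketch, the snippet below writes the same two-row data frame twice to temporary files (the file and variable names are illustrative) and prints the first line of each.

```r
# Small stand-in data frame.
df <- data.frame(id = 1:2, name = c("Nikita", "Deep"))

with_names <- tempfile(fileext = ".csv")
without_names <- tempfile(fileext = ".csv")

# Default behaviour: row.names = TRUE adds an unnamed first column.
write.csv(df, with_names)
# row.names = FALSE writes only the data columns.
write.csv(df, without_names, row.names = FALSE)

# Compare the header line of each file.
header_with <- readLines(with_names)[1]
header_without <- readLines(without_names)[1]
print(header_with)
print(header_without)
```

With the default, the header starts with an empty quoted name for the row-name column; with row.names = FALSE, it starts directly with "id".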
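There is also a col.names argument. In read.csv(), it supplies the column names when reading a file that has no header row (together with header = FALSE). write.csv() itself does not let us change col.names: an attempt to set it is ignored with a warning, so write.table() is the function to use when control over the header line is needed.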
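As a sketch of col.names on the reading side, the snippet below reads header-less data from an inline string (the text argument is used here only to keep the example self-contained) and assigns the column names explicitly.

```r
# Two data rows with no header line.
csv_text <- "1,Nikita,50000
2,Deep,55000"

# header = FALSE treats the first line as data; col.names names the columns.
data <- read.csv(text = csv_text,
                 header = FALSE,
                 col.names = c("id", "name", "salary"))
print(data)
```

Without col.names, the columns of a header-less file would get the default names V1, V2, and V3.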
Reference: https://stat.ethz.ch/R-manual/R-devel/library/utils/html/read.table.html