R CSV Files

R can read data from files and write data to files. It supports formats such as CSV, Excel, and XML, some of them through add-on packages.

Here, we will learn to read data from a CSV (comma-separated values) file and then write data to a CSV file.

Getting and Setting the Working Directory

We can get the current working directory of R by using the getwd() function.

# Get the current working directory
> getwd()
[1] "C:/Users/Nikita/Documents"

We can also set a new working directory using the setwd() function. In Windows paths, use forward slashes or doubled backslashes.

# Set the working directory
> setwd("F:/PC/Tutorial/R Tutorial")
# Get the current working directory
> getwd()
[1] "F:/PC/Tutorial/R Tutorial"
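
Changing the working directory affects every relative path used afterwards, so it can be useful to restore the previous directory when we are done. A minimal sketch, assuming tempdir() as a stand-in for a project folder (it is not a path from this tutorial):

```r
# Remember the current working directory.
old_wd <- getwd()

# Switch to a temporary directory (a stand-in for your own project folder).
setwd(tempdir())
print(getwd())

# Restore the original working directory.
setwd(old_wd)
print(identical(getwd(), old_wd))
```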

Create a CSV File

A CSV file is a plain-text file in which the values in each row are separated by commas. To create one, type the column values separated by commas in Notepad or any other text editor, one record per line, and save the file with the .csv extension.

For example, save the following as demo.csv in the working directory:

id,name,salary,Doj,dept
1,Nikita,50000,2018-01-01,IT
2,Deep,55000,2018-09-23,Operations
3,Kamlesh,30000,2019-11-15,IT
4,Priya,32000,2018-05-11,HR
5,Amita,25000,2015-03-26,Finance
6,Aman,60000,2017-05-21,IT
7,Divya,40000,2018-07-30,Operations
8,Kashish,20000,2019-06-17,Finance
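
The same file can also be created from R itself instead of a text editor. A minimal sketch, assuming the file goes into the session's temporary directory (demo_path is a hypothetical name used only here):

```r
# Rows of the demo file, one string per line.
csv_lines <- c(
  "id,name,salary,Doj,dept",
  "1,Nikita,50000,2018-01-01,IT",
  "2,Deep,55000,2018-09-23,Operations",
  "3,Kamlesh,30000,2019-11-15,IT",
  "4,Priya,32000,2018-05-11,HR",
  "5,Amita,25000,2015-03-26,Finance",
  "6,Aman,60000,2017-05-21,IT",
  "7,Divya,40000,2018-07-30,Operations",
  "8,Kashish,20000,2019-06-17,Finance"
)

# Hypothetical location: the session's temporary directory.
demo_path <- file.path(tempdir(), "demo.csv")
writeLines(csv_lines, demo_path)

# Confirm that the file exists and show its first lines.
print(file.exists(demo_path))
print(readLines(demo_path, n = 3))
```

To place the file in the working directory instead, replace demo_path with "demo.csv".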

Reading a CSV File

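We can read a CSV file with the read.csv() function, which returns the contents as a data frame. By default, the first line of the file is treated as the header. A relative file name such as "demo.csv" is looked up in the working directory.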

Example:

data <- read.csv("demo.csv")
print(data)

Output:

  id    name salary        Doj       dept
1  1  Nikita  50000 2018-01-01         IT
2  2    Deep  55000 2018-09-23 Operations
3  3 Kamlesh  30000 2019-11-15         IT
4  4   Priya  32000 2018-05-11         HR
5  5   Amita  25000 2015-03-26    Finance
6  6    Aman  60000 2017-05-21         IT
7  7   Divya  40000 2018-07-30 Operations
8  8 Kashish  20000 2019-06-17    Finance
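
After reading a file, it is worth checking which type each column received. The sketch below uses a shortened inline copy of demo.csv, passed through the text argument so that it runs without the file. Parsing Doj as a Date through colClasses and setting stringsAsFactors explicitly are choices made for this illustration, not steps from the example above.

```r
# A shortened copy of demo.csv, passed inline through the text argument.
csv_text <- "id,name,salary,Doj,dept
1,Nikita,50000,2018-01-01,IT
2,Deep,55000,2018-09-23,Operations
3,Kamlesh,30000,2019-11-15,IT"

data <- read.csv(
  text = csv_text,
  stringsAsFactors = FALSE,     # keep text columns as character
  colClasses = c(Doj = "Date")  # parse Doj as a Date while reading
)

# Show the class of every column, then a compact summary.
print(sapply(data, class))
str(data)
```

Setting stringsAsFactors explicitly keeps the result the same across R versions, since its default changed in R 4.0.0.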

Analyzing the CSV File

Example 1: Find the number of rows and columns

We can find the number of columns and rows in the data frame read from the CSV file by using the ncol() and nrow() functions:

ncol(data)
nrow(data)

Output:

[1] 5
[1] 8
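
The dim() function reports both counts in one call, and names() lists the column names. A small sketch, again using a shortened inline copy of demo.csv so that it runs without the file:

```r
# A shortened copy of demo.csv, passed inline through the text argument.
data <- read.csv(text = "id,name,salary,Doj,dept
1,Nikita,50000,2018-01-01,IT
2,Deep,55000,2018-09-23,Operations
3,Kamlesh,30000,2019-11-15,IT")

# dim() returns the number of rows first, then the number of columns.
print(dim(data))

# names() returns the column names taken from the header line.
print(names(data))
```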

Example 2: Get the Maximum Salary

# Create a data frame.
data <- read.csv("demo.csv")
# Get the max salary from data frame.
sal <- max(data$salary)
sal

Output:

[1] 60000
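
Real files often contain missing values, and max() returns NA as soon as one is present. A minimal sketch with hypothetical salary values (not taken from demo.csv) showing the na.rm argument:

```r
# Hypothetical salaries with one missing entry.
salary <- c(50000, NA, 30000, 60000)

# Without na.rm the missing value propagates to the result.
print(max(salary))

# With na.rm = TRUE the missing value is dropped before the calculation.
print(max(salary, na.rm = TRUE))
print(min(salary, na.rm = TRUE))
print(mean(salary, na.rm = TRUE))
```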

Example 3: Get the details of the person with maximum Salary

# Create a data frame.
data <- read.csv("demo.csv")
# Get the max salary from data frame.
sal <- max(data$salary)
# Get the details of the person having the max salary.
result <- subset(data, salary == sal)
result

Output:

  id name salary        Doj dept
6  6 Aman  60000 2017-05-21   IT
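
An alternative is which.max(), which returns the position of the first maximum. The two approaches differ when several rows share the top value. A sketch with hypothetical data containing a tie (these salaries are not the ones in demo.csv):

```r
# Hypothetical data in which two people share the highest salary.
data <- read.csv(text = "id,name,salary
1,Nikita,50000
2,Deep,60000
3,Aman,60000", stringsAsFactors = FALSE)

# subset() keeps every row that matches the maximum.
top_all <- subset(data, salary == max(salary))
print(top_all)

# which.max() gives the index of the first maximum only.
top_first <- data[which.max(data$salary), ]
print(top_first)
```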

Example 4: Get the details of people working in Operations department

# Create a data frame.
data <- read.csv("demo.csv")
result <- subset(data, dept == "Operations")
print(result)

Output:

  id  name salary        Doj       dept
2  2  Deep  55000 2018-09-23 Operations
7  7 Divya  40000 2018-07-30 Operations
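
Conditions inside subset() can be combined. The sketch below, using a shortened inline copy of demo.csv, matches several departments with %in% and joins two conditions with &. The variable names are chosen for this illustration only.

```r
# A shortened copy of demo.csv, passed inline through the text argument.
data <- read.csv(text = "id,name,salary,Doj,dept
1,Nikita,50000,2018-01-01,IT
2,Deep,55000,2018-09-23,Operations
3,Kamlesh,30000,2019-11-15,IT
4,Priya,32000,2018-05-11,HR", stringsAsFactors = FALSE)

# People working in either the IT or the HR department.
it_or_hr <- subset(data, dept %in% c("IT", "HR"))
print(it_or_hr)

# People in IT who earn more than 40000.
it_high <- subset(data, dept == "IT" & salary > 40000)
print(it_high)
```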

Example 5: Get the details of persons who joined in or after 2018

# Create a data frame.
data <- read.csv("demo.csv")
# Keep the rows whose joining date is on or after 1 January 2018.
result <- subset(data, as.Date(Doj) >= as.Date("2018-01-01"))
print(result)

Output:

  id    name salary        Doj       dept
1  1  Nikita  50000 2018-01-01         IT
2  2    Deep  55000 2018-09-23 Operations
3  3 Kamlesh  30000 2019-11-15         IT
4  4   Priya  32000 2018-05-11         HR
7  7   Divya  40000 2018-07-30 Operations
8  8 Kashish  20000 2019-06-17    Finance
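
Once the column is converted with as.Date(), we can also filter on a date range or on the year alone. A sketch using a few rows of demo.csv; the helper column year is an assumption added for this illustration:

```r
# A few rows of demo.csv, passed inline through the text argument.
data <- read.csv(text = "id,name,Doj
1,Nikita,2018-01-01
3,Kamlesh,2019-11-15
5,Amita,2015-03-26
6,Aman,2017-05-21", stringsAsFactors = FALSE)

# Convert the text column to Date once, instead of inside every filter.
data$Doj <- as.Date(data$Doj)

# Extract the year as an integer (hypothetical helper column).
data$year <- as.integer(format(data$Doj, "%Y"))

# People who joined during 2018.
joined_2018 <- subset(data, year == 2018)
print(joined_2018)

# People who joined between two dates, both ends included.
in_range <- subset(data, Doj >= as.Date("2017-01-01") & Doj <= as.Date("2018-12-31"))
print(in_range)
```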

Writing into a CSV File

The write.csv() function is used to create a CSV file from an existing data frame.

Example:

# Create a data frame.
data <- read.csv("demo.csv")
# Keep only the rows with a salary above 30000.
result <- subset(data, salary > 30000)
# Write the filtered data into a new file.
write.csv(result, "demoWrite.csv", row.names = FALSE)
# Read the new file back to check its contents.
newdata <- read.csv("demoWrite.csv")
print(newdata)

Output:

  id   name salary        Doj       dept
1  1 Nikita  50000 2018-01-01         IT
2  2   Deep  55000 2018-09-23 Operations
3  4  Priya  32000 2018-05-11         HR
4  6   Aman  60000 2017-05-21         IT
5  7  Divya  40000 2018-07-30 Operations

Here, the row.names argument controls whether the row names of the data frame are written to the file. Its default value is TRUE. Because this data frame has no meaningful row names, write.csv() would write the row numbers as an extra first column. Setting row.names to FALSE leaves that column out.
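
The effect of row.names is easiest to see by writing the same data frame both ways and comparing the files. A minimal sketch, assuming a tiny hypothetical data frame and two file names in the session's temporary directory:

```r
# A tiny hypothetical data frame.
data <- data.frame(id = c(1, 2), name = c("Nikita", "Deep"),
                   stringsAsFactors = FALSE)

# Hypothetical output files in the temporary directory.
with_rn <- file.path(tempdir(), "with_row_names.csv")
without_rn <- file.path(tempdir(), "without_row_names.csv")

# Default: row names are written as an extra first column.
write.csv(data, with_rn)
# row.names = FALSE: only the data columns are written.
write.csv(data, without_rn, row.names = FALSE)

# Compare the raw contents of the two files.
print(readLines(with_rn))
print(readLines(without_rn))

# Reading the first file back shows the extra column.
print(names(read.csv(with_rn)))
print(names(read.csv(without_rn)))
```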

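There is also a col.names argument. In read.csv(), it supplies the column names, which is useful for a file without a header row (read with header = FALSE). write.csv() does not allow col.names to be changed: an attempt to set it is ignored with a warning. To control the header line when writing, use write.table() instead.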
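
When a file has no header row, read.csv() can assign the column names through col.names. A minimal sketch with inline data passed through the text argument; the column names are chosen for this illustration:

```r
# Two records without a header line.
csv_text <- "1,Nikita,50000
2,Deep,55000"

# Without col.names, R generates the names V1, V2, V3.
default_names <- names(read.csv(text = csv_text, header = FALSE))
print(default_names)

# With col.names, the columns get the names we supply.
data <- read.csv(
  text = csv_text,
  header = FALSE,
  col.names = c("id", "name", "salary"),
  stringsAsFactors = FALSE
)
print(data)
```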

Reference: https://stat.ethz.ch/R-manual/R-devel/library/utils/html/read.table.html