R Excel Files

Microsoft Excel is the spreadsheet program which stores data in the .xls and .xlsx format. R has the facility to read directly from these files using some excel specific packages.

Some excel packages are: XLConnect, xlsx, gdata, RODBC, RExcel, etc. Here, we will use the xlsx package.

Install xlsx package

Xlsx package provides all necessary tools to interact with Excel files from R. Many of the benefits of the xlsx package come from being able to export and format excel files from R. Following command is used to install the xlsx package in R:

install.packages("xlsx")

Verify and Load the “xlsx” Package

Following command is used to verify and load the package:

# Verify the package is installed.
any(grepl("xlsx",installed.packages()))
# Load the library into R workspace.
library("xlsx")

Output:

TRUE
package ‘rJava’ successfully unpacked and MD5 sums checked
package ‘xlsxjars’ successfully unpacked and MD5 sums checked
package ‘xlsx’ successfully unpacked and MD5 sums checked

Creating an Excel File

Create two worksheets in excel. Use the default name sheet1 and sheet2. Give the name of excel is Student.xlsx.

Sheet1:

  Roll_no   name Age addmission_date    dept
       1 Nikita  24      2018-01-01      IT
       2   Deep  29      2015-09-23      CS
       3  Aryan  21      2017-11-15 Finance
       4  Nidhi  25      2015-03-27      HR
       5   Aman  27      2013-05-21      IT

And sheet2:

    name      city
Nikita    Bangalore
   Deep    Delhi
  Aryan    Bhopal
  Nidhi    Raipur
   Aman    Noida

Reading the Excel File

In R, read.xlsx() function is used to read the excel file. The result will store as a data frame in the R environment.

Example:

Let's see an example to read the first worksheet using a sheet index or name:

# using sheet name
read.xlsx("Student.xlsx", sheetName = "Sheet1")
# using sheet index
read.xlsx("Student.xlsx", sheetIndex = 1)

Output:

  Roll_no   name Age addmission_date    dept
1       1 Nikita  24      2018-01-01      IT
2       2   Deep  29      2015-09-23      CS
3       3  Aryan  21      2017-11-15 Finance
4       4  Nidhi  25      2015-03-27      HR
5       5   Aman  27      2013-05-21      IT
  Roll_no   name Age addmission_date    dept
1       1 Nikita  24      2018-01-01      IT
2       2   Deep  29      2015-09-23      CS
3       3  Aryan  21      2017-11-15 Finance
4       4  Nidhi  25      2015-03-27      HR
5       5   Aman  27      2013-05-21      IT

Read the second worksheet:

# using sheet index
read.xlsx("Student.xlsx", sheetIndex = 2)

Output:

   name      city
1 Nikita Bangalore
2   Deep     Delhi
3  Aryan    Bhopal
4  Nidhi    Raipur
5   Aman     Noida

If we want to read data starts further down in the Excel worksheet we can include the startRow argument.

Example:

# reading data below the second line
read.xlsx("Student.xlsx", sheetName = "Sheet2", startRow = 3)

Output:

  Deep  Delhi
1 Aryan Bhopal
2 Nidhi Raipur
3  Aman  Noida

We can also specify the number of rows and columns. For this xlsx package has rowIndex and colIndex argument.

Example:

# read in a range of rows
read.xlsx("Student.xlsx", sheetName = "Sheet2", rowIndex = 3:5)

Output:

   Deep  Delhi
1 Aryan Bhopal
2 Nidhi Raipur
Reference: https://www.datacamp.com/community/tutorials/r-tutorial-read-excel-into-r