R XML Files

XML stands for Extensible Markup Language. It is a plain-text format used to store and exchange structured data on the web, on intranets, and elsewhere. It is a popular choice for storing large amounts of complex data. Like HTML, it is built from markup tags.

HTML tags describe the structure of a page, whereas XML tags describe the meaning of the data contained in the file.

Install the XML Package

To read an XML file in R, use the “XML” package. To install this package, run the following command:

install.packages("XML")

Create an XML File

To create an XML file, copy the data below (or write your own) into a text editor such as Notepad and save the file with the .xml extension. Here, the file is named Student.xml.

<RECORDS>
   <STUDENT>
      <ID>1</ID>
      <NAME>Nikita</NAME>
      <AGE>24</AGE>
      <ADDMISSIONDATE>1/1/2018</ADDMISSIONDATE>
      <DEPT>IT</DEPT>
   </STUDENT>
   <STUDENT>
      <ID>2</ID>
      <NAME>Deep</NAME>
      <AGE>29</AGE>
      <ADDMISSIONDATE>9/23/2018</ADDMISSIONDATE>
      <DEPT>CS</DEPT>
   </STUDENT>   
   <STUDENT>
      <ID>3</ID>
      <NAME>Aryan</NAME>
      <AGE>21</AGE>
      <ADDMISSIONDATE>11/15/2017</ADDMISSIONDATE>
      <DEPT>Finance</DEPT>
   </STUDENT>   
   <STUDENT>
      <ID>4</ID>
      <NAME>Nidhi</NAME>
      <AGE>21</AGE>
      <ADDMISSIONDATE>3/27/2015</ADDMISSIONDATE>
      <DEPT>HR</DEPT>
   </STUDENT>
   <STUDENT>
      <ID>5</ID>
      <NAME>Aman</NAME>
      <AGE>27</AGE>
      <ADDMISSIONDATE>5/21/2013</ADDMISSIONDATE>
      <DEPT>IT</DEPT>
   </STUDENT>   
</RECORDS>
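
If you would rather not use a text editor, the same file can be generated from R itself. The following is a minimal sketch using only base R; the variable names (id, name, age, adm_date, dept, template, out_file) are introduced here for illustration, and the output path is an assumption.

```r
# Field values copied from the sample records above.
id       <- 1:5
name     <- c("Nikita", "Deep", "Aryan", "Nidhi", "Aman")
age      <- c(24L, 29L, 21L, 21L, 27L)
adm_date <- c("1/1/2018", "9/23/2018", "11/15/2017", "3/27/2015", "5/21/2013")
dept     <- c("IT", "CS", "Finance", "HR", "IT")

# Template for one <STUDENT> record. sprintf() is vectorised,
# so it returns one string per student.
template <- paste(
  "   <STUDENT>",
  "      <ID>%d</ID>",
  "      <NAME>%s</NAME>",
  "      <AGE>%d</AGE>",
  "      <ADDMISSIONDATE>%s</ADDMISSIONDATE>",
  "      <DEPT>%s</DEPT>",
  "   </STUDENT>",
  sep = "\n"
)
records <- sprintf(template, id, name, age, adm_date, dept)

# Wrap the records in the root element and write the file.
# Assumed path: the working directory; an existing file is overwritten.
out_file <- "Student.xml"
writeLines(c("<RECORDS>", records, "</RECORDS>"), out_file)
```

This sketch does no escaping, so it is only suitable for values that contain no special XML characters such as & or <.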

Read an XML File

To read an XML file, the “XML” package provides a function called xmlParse().

Example:

# Load the package XML.
library("XML")
# Load the methods package, which is also required.
library("methods")
# Parse the XML file; it is assumed to be in the working directory.
result <- xmlParse(file = "Student.xml")
# Print the result.
print(result)

Output:

<?xml version="1.0"?>
<RECORDS>
  <STUDENT>
    <ID>1</ID>
    <NAME>Nikita</NAME>
    <AGE>24</AGE>
    <ADDMISSIONDATE>1/1/2018</ADDMISSIONDATE>
    <DEPT>IT</DEPT>
  </STUDENT>
  <STUDENT>
    <ID>2</ID>
    <NAME>Deep</NAME>
    <AGE>29</AGE>
    <ADDMISSIONDATE>9/23/2018</ADDMISSIONDATE>
    <DEPT>CS</DEPT>
  </STUDENT>
  <STUDENT>
    <ID>3</ID>
    <NAME>Aryan</NAME>
    <AGE>21</AGE>
    <ADDMISSIONDATE>11/15/2017</ADDMISSIONDATE>
    <DEPT>Finance</DEPT>
  </STUDENT>
  <STUDENT>
    <ID>4</ID>
    <NAME>Nidhi</NAME>
    <AGE>21</AGE>
    <ADDMISSIONDATE>3/27/2015</ADDMISSIONDATE>
    <DEPT>HR</DEPT>
  </STUDENT>
  <STUDENT>
    <ID>5</ID>
    <NAME>Aman</NAME>
    <AGE>27</AGE>
    <ADDMISSIONDATE>5/21/2013</ADDMISSIONDATE>
    <DEPT>IT</DEPT>
  </STUDENT>
</RECORDS>

Find the Number of Nodes Present in XML File

# Load the packages required to read XML files.
library("XML")
library("methods")
# Give the input file name to the function.
result <- xmlParse(file = "Student.xml")
# Extract the root node from the XML file.
rootnode <- xmlRoot(result)
# Count the child nodes of the root.
rootsize <- xmlSize(rootnode)
# Print the result.
print(rootsize)

Output:

[1] 5
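
xmlSize() counts the direct children of whatever node it is given, so it can also be applied to a single record. The sketch below illustrates this on a small inline document rather than Student.xml, so that it runs on its own; the xml_text string and the variable names are assumptions made for this example.

```r
library("XML")

# Inline sample with two records (an assumption for illustration;
# the structure mirrors Student.xml).
xml_text <- "<RECORDS>
  <STUDENT><ID>1</ID><NAME>Nikita</NAME><AGE>24</AGE></STUDENT>
  <STUDENT><ID>2</ID><NAME>Deep</NAME><AGE>29</AGE></STUDENT>
</RECORDS>"

# asText = TRUE tells xmlParse() that the input is XML content, not a file name.
doc <- xmlParse(xml_text, asText = TRUE)
rootnode <- xmlRoot(doc)

# Number of <STUDENT> records under the root.
n_records <- xmlSize(rootnode)
# Number of fields inside the first record.
n_fields <- xmlSize(rootnode[[1]])
# Name of the root element.
root_name <- xmlName(rootnode)

print(n_records)
print(n_fields)
print(root_name)
```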

Get the Details of the First Node

Example:

# Load the packages required to read XML files.
library("XML")
library("methods")
# Give the input file name to the function.
result <- xmlParse(file = "Student.xml")
# Extract the root node from the XML file.
rootnode <- xmlRoot(result)
# Print the result.
print(rootnode[1])

Output:

$STUDENT
<STUDENT>
  <ID>1</ID>
  <NAME>Nikita</NAME>
  <AGE>24</AGE>
  <ADDMISSIONDATE>1/1/2018</ADDMISSIONDATE>
  <DEPT>IT</DEPT>
</STUDENT>
attr(,"class")
[1] "XMLInternalNodeList" "XMLNodeList"
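
The class information at the end of the output appears because single brackets return a list of nodes, not a node. Double brackets return the node itself, as in regular R list indexing. The following sketch shows the difference on a small inline document; xml_text and the variable names are assumptions made for this example.

```r
library("XML")

# Inline sample with two records (an assumption for illustration).
xml_text <- "<RECORDS>
  <STUDENT><ID>1</ID><NAME>Nikita</NAME></STUDENT>
  <STUDENT><ID>2</ID><NAME>Deep</NAME></STUDENT>
</RECORDS>"

doc <- xmlParse(xml_text, asText = TRUE)
rootnode <- xmlRoot(doc)

# Single brackets: a list containing the first child node.
node_list <- rootnode[1]
# Double brackets: the first child node itself.
first_node <- rootnode[[1]]

print(length(node_list))
print(xmlName(first_node))
```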

Get Different Elements of a Node

Example:

# Load the packages required to read XML files.
library("XML")
library("methods")
# Give the input file name to the function.
result <- xmlParse(file = "Student.xml")
# Extract the root node from the XML file.
rootnode <- xmlRoot(result)
# Get the first element of the fifth node.
print(rootnode[[5]][[1]])
# Get the second element of the first node.
print(rootnode[[1]][[2]])
# Get the third element of the second node.
print(rootnode[[2]][[3]])

Output:

<ID>5</ID>
<NAME>Nikita</NAME>
<AGE>29</AGE>
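
The calls above print whole elements, tags included. When only the text inside an element is needed, the package's xmlValue() function can be used, and xmlSApply() or xpathSApply() can apply it to several nodes at once. The following is a sketch on a small inline document; xml_text, the variable names, and the XPath expression are assumptions chosen for this example.

```r
library("XML")

# Inline sample with three records (an assumption for illustration).
xml_text <- "<RECORDS>
  <STUDENT><ID>1</ID><NAME>Nikita</NAME><DEPT>IT</DEPT></STUDENT>
  <STUDENT><ID>2</ID><NAME>Deep</NAME><DEPT>CS</DEPT></STUDENT>
  <STUDENT><ID>5</ID><NAME>Aman</NAME><DEPT>IT</DEPT></STUDENT>
</RECORDS>"

doc <- xmlParse(xml_text, asText = TRUE)
rootnode <- xmlRoot(doc)

# Text content of the second element of the first node.
first_name <- xmlValue(rootnode[[1]][[2]])

# Text content of every field in the first node, as a named character vector.
first_fields <- xmlSApply(rootnode[[1]], xmlValue)

# Names of all students whose DEPT is "IT", selected with an XPath expression.
it_names <- xpathSApply(doc, "//STUDENT[DEPT='IT']/NAME", xmlValue)

print(first_name)
print(first_fields)
print(it_names)
```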

Convert XML to Data Frame

The xmlToDataFrame() function converts an XML file into a data frame.

Example:

# Load the packages required to read XML files.
library("XML")
library("methods")
# Convert the input XML file to a data frame.
df <- xmlToDataFrame("Student.xml")
print(df)

Output:

  ID   NAME AGE ADDMISSIONDATE    DEPT
1  1 Nikita  24       1/1/2018      IT
2  2   Deep  29      9/23/2018      CS
3  3  Aryan  21     11/15/2017 Finance
4  4  Nidhi  21      3/27/2015      HR
5  5   Aman  27      5/21/2013      IT
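
Since XML stores everything as text, the columns of the resulting data frame generally need to be converted before numeric or date operations can be applied. The following is a sketch of one way to do this, using a small inline document so that it runs on its own; xml_text, the variable names, and the month/day/year date format are assumptions based on the sample data.

```r
library("XML")

# Inline sample with two records (an assumption for illustration).
xml_text <- "<RECORDS>
  <STUDENT><ID>1</ID><NAME>Nikita</NAME><AGE>24</AGE><ADDMISSIONDATE>1/1/2018</ADDMISSIONDATE></STUDENT>
  <STUDENT><ID>2</ID><NAME>Deep</NAME><AGE>29</AGE><ADDMISSIONDATE>9/23/2018</ADDMISSIONDATE></STUDENT>
</RECORDS>"

doc <- xmlParse(xml_text, asText = TRUE)

# Keep text columns as character vectors rather than factors.
df <- xmlToDataFrame(doc, stringsAsFactors = FALSE)

# Convert the columns to more useful types.
df$ID <- as.integer(df$ID)
df$AGE <- as.integer(df$AGE)
df$ADDMISSIONDATE <- as.Date(df$ADDMISSIONDATE, format = "%m/%d/%Y")

# Numeric operations now work on the converted columns.
mean_age <- mean(df$AGE)

str(df)
print(mean_age)
```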