Convert XML to JSON in Python
Converting XML to JSON is useful when we work on an API that must return data in JSON format while the source data is stored as XML.
JSON
A JSON file stores simple data structures and objects in JSON (JavaScript Object Notation) format, a standard data interchange format. JSON is commonly used to transmit data between a server and a web application.
The data in a JSON object takes the form of key/value pairs. Each key is a string and each value is a JSON type (string, number, boolean, null, array, or object); a colon separates the key from its value. JSON files can be edited in any text editor.
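As a small illustration of the key/value structure described above, the sketch below uses Python's built-in json module to turn a dictionary into JSON text and back. The teacher dictionary and its field names are made up for this example.

```python
import json

# A Python dict maps naturally onto a JSON object: string keys, JSON-typed values.
# The fields below are illustrative only.
teacher = {
    "name": "Akash mishra",
    "memberid": 10001,
    "active": True,
    "subjects": ["Math", "Science"],
    "address": None,
}

# Serialize to a JSON string; Python's True becomes true and None becomes null.
text = json.dumps(teacher)
print(text)

# Parse the string back into an equal Python dict.
restored = json.loads(text)
print(restored == teacher)
```

Note that the keys stay strings in both directions, while the values keep their JSON types (number, boolean, array, null).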
XML
XML (Extensible Markup Language) is used for storing and transporting data. It is case-sensitive. XML lets us define our own markup elements and so build a customized markup language. The element is the basic unit of an XML document. There are no predefined tags in XML. It simplifies sharing data, transporting data, and moving data between platforms.
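The properties listed above (user-defined elements, attributes, case sensitivity) can be seen in a few lines with the standard library's xml.etree.ElementTree parser. This is only a sketch; the sample document and the is_well_formed helper are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Tag names are user-defined; XML has no fixed vocabulary.
doc = '<teacher memberid="10001"><name>Akash mishra</name></teacher>'
root = ET.fromstring(doc)

print(root.tag)                 # name of the root element
print(root.attrib["memberid"])  # attribute values are always strings
print(root.find("name").text)   # text content of a child element


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML, False otherwise (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# XML is case-sensitive: <Name> is not closed by </name>.
print(is_well_formed("<name>Akash</name>"))
print(is_well_formed("<Name>Akash</name>"))
```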
Converting XML to JSON
Python's standard library includes XML parsers such as xml.etree.ElementTree, but it has no built-in function that converts XML directly to JSON. The third-party xmltodict library fills this gap. First, we need to install the xmltodict module via pip:
pip install xmltodict
Now, we need to import xmltodict together with the built-in json module by using the import keyword
import xmltodict
import json
Reading the XML file
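We can read and parse the XML file with the code snippet below
with open("xml_file.xml") as xml_file:
    data_dict = xmltodict.parse(xml_file.read())
Closing the XML file
The with statement closes the file automatically when its block ends, so an explicit xml_file.close() call is not needed.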
Convert the dictionary into a JSON string
json_data = json.dumps(data_dict)
Writing the json_data to the output file
with open("data.json", "w") as json_file:
    json_file.write(json_data)
Closing the output file
As before, the with statement closes the output file automatically, so json_file.close() is not needed.
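The steps above write the JSON as one long line. As an optional variation, the sketch below uses json.dump() with indent=2 to write readable, indented JSON straight to the file. The write_json helper and the small data_dict are stand-ins invented for this example; in practice data_dict would come from xmltodict.parse().

```python
import json
import tempfile
from pathlib import Path


def write_json(data, path):
    """Write data to path as indented, human-readable JSON (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as json_file:
        # json.dump() serializes straight into the open file object.
        json.dump(data, json_file, indent=2, ensure_ascii=False)


# Hypothetical stand-in for the data_dict returned by xmltodict.parse().
data_dict = {"teacher": {"@memberid": "10001", "name": "Akash mishra"}}

# The temp directory is used here only to keep the sketch self-contained.
out_path = Path(tempfile.gettempdir()) / "data.json"
write_json(data_dict, out_path)

# Read the file back to confirm it holds the same data.
with open(out_path, encoding="utf-8") as json_file:
    print(json.load(json_file) == data_dict)
```

Setting ensure_ascii=False keeps non-ASCII characters readable in the file, which is why the file is opened with an explicit UTF-8 encoding.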
Example:
import xmltodict
import json
obj = xmltodict.parse("""<teachers>
    <teacher memberid="10001">
        <name>Akash mishra</name>
        <gender>male</gender>
        <subjects>
            <subject name="Math" class="KG" />
            <subject name="Science" class="PG" />
            <subject name="Social Science" class="LKG" />
        </subjects>
    </teacher>
    <teacher memberid="10002">
        <name>Anuj rana</name>
        <gender>male</gender>
        <subjects>
            <subject name="Math" class="KG" />
            <subject name="Science" class="PG" />
            <subject name="Social Science" class="LKG" />
        </subjects>
    </teacher>
</teachers>""")
print(json.dumps(obj))
Output:
{"teachers": {"teacher": [{"@memberid": "10001", "name": "Akash mishra", "gender": "male", "subjects": {"subject": [{"@name": "Math", "@class": "KG"}, {"@name": "Science", "@class": "PG"}, {"@name": "Social Science", "@class": "LKG"}]}}, {"@memberid": "10002", "name": "Anuj rana", "gender": "male", "subjects": {"subject": [{"@name": "Math", "@class": "KG"}, {"@name": "Science", "@class": "PG"}, {"@name": "Social Science", "@class": "LKG"}]}}]}}
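The xmltodict.parse() function converts the XML document into a Python dictionary. XML attributes become keys prefixed with @, and repeated child elements, such as teacher and subject above, are collected into lists.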
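To make the mapping in the output above more concrete, here is a rough sketch of a similar conversion written with only the standard library. It is not xmltodict's actual implementation; element_to_dict is a hypothetical helper that imitates the conventions visible in the output (the @ prefix for attributes, lists for repeated tags) and assumes that text sitting next to attributes is stored under "#text". It ignores mixed content and namespaces.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(element):
    """Convert an Element into plain Python values, roughly in xmltodict's style."""
    # Attributes become keys prefixed with "@".
    node = {"@" + key: value for key, value in element.attrib.items()}

    # Child elements: a repeated tag is collected into a list.
    for child in element:
        value = element_to_dict(child)
        if child.tag in node:
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value

    text = (element.text or "").strip()
    if not node:
        # Leaf element without attributes: just its text (None when empty).
        return text or None
    if text:
        # Text alongside attributes or children is kept under "#text".
        node["#text"] = text
    return node


xml_text = """
<teacher memberid="10001">
  <name> Akash mishra </name>
  <subjects>
    <subject name="Math" class="KG" />
    <subject name="Science" class="PG" />
  </subjects>
</teacher>
"""

root = ET.fromstring(xml_text)
result = {root.tag: element_to_dict(root)}
print(json.dumps(result))
```

For this input the sketch yields the same nesting as the xmltodict output above: memberid appears as "@memberid", the name text is stripped of surrounding spaces, and the two subject elements form a list.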
Convert XML to Python Object
Another library, untangle, converts an XML document into a Python object whose attributes mirror the XML elements.
We can install it with the command below
pip install untangle
Example:
import untangle
obj = untangle.parse("""
<employees>
<employee>
<name>Akash Sharma</name>
<role>Software engineer</role>
<age>29</age>
</employee>
</employees>
""")
#Access the name
print(obj.employees.employee.name.cdata)
Output:
Akash Sharma
Reading an XML file with the untangle.parse() method
import untangle
obj = untangle.parse("Employee.xml")
print(obj.employees.employee.name.cdata)
Output:
Akash Sharma
XML to JSON with Pandas
We can also convert XML to JSON with the Pandas module. Pandas is widely used for data cleaning and analysis in data science.
Installation
To read XML with Pandas, we need to install the pandas-read-xml package via pip. It can be installed with the command below.
pip install pandas-read-xml
With the code below, we can convert XML to JSON using Pandas
import pandas_read_xml as pdx
df = pdx.read_xml("Student.xml")
print(df.to_json())
This Pandas approach pulls in many other modules (idna, numpy, chardet, pytz, six, python-dateutil, pandas, certifi, urllib3, requests, pyarrow). Because of these dependencies it takes considerably more memory, so the xmltodict method is the more lightweight choice for XML to JSON conversion.
XML to JSON with Beautifulsoup
The main purpose of the Beautiful Soup library is parsing HTML. It can also parse XML using the third-party parser lxml. First, we need to install both modules. These can be installed with the commands below
pip install beautifulsoup4
pip install lxml
Example:
from bs4 import BeautifulSoup
import json
#Loading xml
with open('Employee.xml') as xml_file:
    xml_parser = BeautifulSoup(xml_file, 'xml')
#Extracting the suitable information
name = xml_parser.find('name').contents[0]
age = xml_parser.find('age').contents[0]
role = xml_parser.find('role').contents[0]
employee = {
    'name': name,
    'age': age,
    'role': role
}
print(json.dumps(employee))
Output:
{"name": "Akash Sharma", "age": "29", "role": "Software engineer"}
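The example above picks out only the first name, age, and role it finds and leaves the age as a string. As a rough sketch of how we might collect every employee and write the age as a JSON number, the snippet below uses the standard library's xml.etree.ElementTree instead of Beautiful Soup, a deliberate substitution so that it runs without extra installs. The second employee record is invented for the illustration.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical document; the second employee is made up for this example.
xml_text = """
<employees>
  <employee>
    <name>Akash Sharma</name>
    <role>Software engineer</role>
    <age>29</age>
  </employee>
  <employee>
    <name>Example Person</name>
    <role>Tester</role>
    <age>31</age>
  </employee>
</employees>
"""

root = ET.fromstring(xml_text)

employees = []
for node in root.findall("employee"):
    employees.append({
        "name": node.findtext("name"),
        # Convert age to an integer so it is written as a JSON number.
        "age": int(node.findtext("age")),
        "role": node.findtext("role"),
    })

print(json.dumps(employees, indent=2))
```

The same loop can be written with Beautiful Soup's find_all('employee'); the structure of the resulting list of dictionaries stays the same.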
Conclusion
In this tutorial, we saw how to convert XML to JSON in Python with xmltodict, Pandas, and Beautiful Soup, and how to read XML into a Python object with untangle.