XML is a portable, open-source language that allows programmers to develop applications that can be read by other applications, regardless of the operating system and/or development language.
XML stands for Extensible Markup Language. It is a markup language much like HTML or SGML. It is recommended by the World Wide Web Consortium and is available as an open standard.
XML is extremely useful for keeping track of small to medium amounts of data without requiring a SQL-based backbone.
The Python standard library provides a minimal but useful set of interfaces to work with XML.
The two most basic and broadly used APIs to XML data are the SAX and DOM interfaces.
SAX streams through a document from start to finish, so it cannot retrieve scattered pieces of information from a large file as quickly as DOM can. On the other hand, DOM loads the whole document into memory, so using it exclusively can really exhaust your resources, especially if it is used on a lot of small files.
SAX is read-only, while DOM allows changes to the XML file. Since these two different APIs complement each other, there is no reason why you cannot use both of them in large projects.
For all our XML code examples, let's use a simple XML file movies.xml as an input:
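<collection shelf="New Arrivals">
<movie title="Enemy Behind">
   <type>War, Thriller</type>
   <format>DVD</format>
   <year>2003</year>
   <rating>PG</rating>
   <stars>10</stars>
   <description>Talk about a US-Japan war</description>
</movie>
<movie title="Transformers">
   <type>Anime, Science Fiction</type>
   <format>DVD</format>
   <year>1989</year>
   <rating>R</rating>
   <stars>8</stars>
   <description>A schientific fiction</description>
</movie>
<movie title="Trigun">
   <type>Anime, Action</type>
   <format>DVD</format>
   <episodes>4</episodes>
   <rating>PG</rating>
   <stars>10</stars>
   <description>Vash the Stampede!</description>
</movie>
<movie title="Ishtar">
   <type>Comedy</type>
   <format>VHS</format>
   <rating>PG</rating>
   <stars>2</stars>
   <description>Viewable boredom</description>
</movie>
</collection>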
Parsing XML with SAX APIs
SAX is a standard interface for event-driven XML parsing. Parsing XML with SAX generally requires you to create your own ContentHandler by subclassing xml.sax.ContentHandler.
Your ContentHandler handles the particular tags and attributes of your flavor(s) of XML. A ContentHandler object provides methods to handle various parsing events. Its owning parser calls ContentHandler methods as it parses the XML file.
The methods startDocument and endDocument are called at the start and the end of the XML file. The method characters(text) is passed the character data of the XML file via the parameter text.
The ContentHandler is called at the start and end of each element. If the parser is not in namespace mode, the methods startElement(tag, attributes) and endElement(tag) are called; otherwise, the corresponding methods startElementNS and endElementNS are called. Here, tag is the element tag, and attributes is an Attributes object.
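The callback sequence described above can be sketched with a small handler that only records what the parser tells it. This is a minimal illustration, not part of the movies example: the class name EventLogger and the inline XML snippet are assumptions made for the sketch, and namespace mode is left off so startElement and endElement are the methods that fire.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Hypothetical handler that records each callback the parser makes."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("startDocument")

    def endDocument(self):
        self.events.append("endDocument")

    def startElement(self, tag, attributes):
        # attributes is an Attributes object; copy it into a plain dict
        self.events.append(("start", tag, dict(attributes.items())))

    def endElement(self, tag):
        self.events.append(("end", tag))

    def characters(self, text):
        # The parser may deliver text in one or more chunks
        self.events.append(("characters", text))


logger = EventLogger()
xml.sax.parseString(b'<movie title="Ishtar"><format>VHS</format></movie>', logger)

# Print the events in the order the parser reported them
for event in logger.events:
    print(event)
```

Recording events in a list rather than printing inside each callback keeps the handler easy to inspect after parsing has finished.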
Here are other important methods to understand before proceeding.
The following method creates a new parser object and returns it. The parser object created will be of the first parser type the system finds.
xml.sax.make_parser( [parser_list] )
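Here is the detail of the parameters:
parser_list: an optional list of strings naming modules that provide a create_parser() function. Modules named here are tried before the default list of parsers.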
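As a rough sketch of how a parser from make_parser can be configured, the snippet below switches namespace mode on so that startElementNS is called instead of startElement. The handler name NamespaceLogger, the urn:example:movies namespace, and the inline document are all made up for illustration.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class NamespaceLogger(ContentHandler):
    """Hypothetical handler that collects namespace-qualified element names."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def startElementNS(self, name, qname, attributes):
        # In namespace mode, name is a (namespace URI, local name) tuple
        self.seen.append(name)


# Illustrative document; the namespace URI is an assumption
document = b'<m:movie xmlns:m="urn:example:movies"><m:type>War</m:type></m:movie>'

parser = xml.sax.make_parser()
parser.setFeature(feature_namespaces, True)  # turn namespace mode on
handler = NamespaceLogger()
parser.setContentHandler(handler)
parser.parse(io.BytesIO(document))  # parse() also accepts a file-like object

print(handler.seen)
```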
The following method creates a SAX parser and uses it to parse a document.
xml.sax.parse( xmlfile, contenthandler[, errorhandler])
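Here is the detail of the parameters:
xmlfile: the name of the XML file to read from. A file object is also accepted.
contenthandler: must be a ContentHandler object.
errorhandler: if specified, must be a SAX ErrorHandler object.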
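A minimal sketch of xml.sax.parse follows, assuming in-memory streams in place of a file on disk. The TitleCollector class and both inline documents are hypothetical. The second call relies on the default error handler, which raises SAXParseException when the input is not well-formed.

```python
import io
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Hypothetical handler that gathers the title attribute of each movie."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def startElement(self, tag, attributes):
        if tag == "movie":
            self.titles.append(attributes["title"])


# Well-formed input: every movie element is closed
good = io.BytesIO(b'<collection><movie title="Trigun"/><movie title="Ishtar"/></collection>')
collector = TitleCollector()
xml.sax.parse(good, collector)
print(collector.titles)

# Malformed input: the movie element is never closed
bad = io.BytesIO(b'<collection><movie title="Trigun"></collection>')
parse_error = None
try:
    xml.sax.parse(bad, TitleCollector())
except xml.sax.SAXParseException as exc:
    parse_error = exc
    print("Not well-formed:", exc.getMessage())
```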
There is one more method to create a SAX parser and to parse the specified XML string.
xml.sax.parseString(xmlstring, contenthandler[, errorhandler])
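Here is the detail of the parameters:
xmlstring: the XML string to read from.
contenthandler: must be a ContentHandler object.
errorhandler: if specified, must be a SAX ErrorHandler object.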
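For XML that is already in memory, parseString avoids the need for a file. The sketch below counts how often each tag occurs. The TagCounter class and the shortened document are assumptions for illustration only.

```python
import xml.sax
from collections import Counter


class TagCounter(xml.sax.ContentHandler):
    """Hypothetical handler that counts how many times each tag starts."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def startElement(self, tag, attributes):
        self.counts[tag] += 1


# Shortened stand-in for movies.xml, passed as bytes
xml_text = b"""<collection shelf="New Arrivals">
<movie title="Enemy Behind"><type>War, Thriller</type></movie>
<movie title="Ishtar"><type>Comedy</type></movie>
</collection>"""

counter = TagCounter()
xml.sax.parseString(xml_text, counter)

for tag, count in counter.counts.items():
    print(tag, count)
```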
#!/usr/bin/env python3

import xml.sax
import xml.sax.handler


class MovieHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.CurrentData = ""
        self.type = ""
        self.format = ""
        self.year = ""
        self.rating = ""
        self.stars = ""
        self.description = ""

    # Called when an element starts
    def startElement(self, tag, attributes):
        self.CurrentData = tag
        if tag == "movie":
            print("*****Movie*****")
            title = attributes["title"]
            print("Title:", title)

    # Called when an element ends
    def endElement(self, tag):
        if self.CurrentData == "type":
            print("Type:", self.type)
        elif self.CurrentData == "format":
            print("Format:", self.format)
        elif self.CurrentData == "year":
            print("Year:", self.year)
        elif self.CurrentData == "rating":
            print("Rating:", self.rating)
        elif self.CurrentData == "stars":
            print("Stars:", self.stars)
        elif self.CurrentData == "description":
            print("Description:", self.description)
        self.CurrentData = ""

    # Called with character data; the parser may deliver an element's text
    # in more than one chunk, and this simple handler keeps only the last one
    def characters(self, content):
        if self.CurrentData == "type":
            self.type = content
        elif self.CurrentData == "format":
            self.format = content
        elif self.CurrentData == "year":
            self.year = content
        elif self.CurrentData == "rating":
            self.rating = content
        elif self.CurrentData == "stars":
            self.stars = content
        elif self.CurrentData == "description":
            self.description = content


if __name__ == "__main__":
    # create an XMLReader
    parser = xml.sax.make_parser()
    # turn off namespaces
    parser.setFeature(xml.sax.handler.feature_namespaces, 0)

    # override the default ContentHandler
    Handler = MovieHandler()
    parser.setContentHandler(Handler)

    parser.parse("movies.xml")
This would produce the following result:
*****Movie*****
Title: Enemy Behind
Type: War, Thriller
Format: DVD
Year: 2003
Rating: PG
Stars: 10
Description: Talk about a US-Japan war
*****Movie*****
Title: Transformers
Type: Anime, Science Fiction
Format: DVD
Year: 1989
Rating: R
Stars: 8
Description: A schientific fiction
*****Movie*****
Title: Trigun
Type: Anime, Action
Format: DVD
Rating: PG
Stars: 10
Description: Vash the Stampede!
*****Movie*****
Title: Ishtar
Type: Comedy
Format: VHS
Rating: PG
Stars: 2
Description: Viewable boredom
For complete details on the SAX API, please refer to the standard Python SAX API documentation.
Parsing XML with DOM APIs
The Document Object Model ("DOM") is a cross-language API from the World Wide Web Consortium (W3C) for accessing and modifying XML documents.
The DOM is extremely useful for random-access applications. SAX only allows you a view of one bit of the document at a time. If you are looking at one SAX element, you have no access to another.
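To make the random-access point concrete, here is a small sketch using the standard library's xml.dom.minidom. The inline document is a shortened, hypothetical stand-in for movies.xml. Once the tree is built, any node can be reached in any order, and neighbours and parents can be reached from it.

```python
import xml.dom.minidom

# Shortened illustrative document with no whitespace between elements
doc = xml.dom.minidom.parseString(
    "<collection>"
    "<movie title='Enemy Behind'><stars>10</stars></movie>"
    "<movie title='Transformers'><stars>8</stars></movie>"
    "<movie title='Ishtar'><stars>2</stars></movie>"
    "</collection>"
)

movies = doc.getElementsByTagName("movie")

# Jump straight to the last movie, then step back to the one before it
last = movies[2]
previous = last.previousSibling

# Go back to the first movie and read the text of its stars child element
first_stars = movies[0].getElementsByTagName("stars")[0].firstChild.data

print(last.getAttribute("title"))
print(previous.getAttribute("title"))
print(last.parentNode.tagName)
print(first_stars)
```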
Here is the easiest way to quickly load an XML document and to create a minidom object using the xml.dom.minidom module. The minidom object provides a simple parser method that quickly creates a DOM tree from the XML file.
The sample code calls the parse( file [,parser] ) function of the minidom object to parse the XML file designated by file into a DOM tree object.
#!/usr/bin/env python3

import xml.dom.minidom

# Open XML document using minidom parser
DOMTree = xml.dom.minidom.parse("movies.xml")
collection = DOMTree.documentElement
if collection.hasAttribute("shelf"):
    print("Root element : %s" % collection.getAttribute("shelf"))

# Get all the movies in the collection
movies = collection.getElementsByTagName("movie")

# Print detail of each movie.
for movie in movies:
    print("*****Movie*****")
    if movie.hasAttribute("title"):
        print("Title: %s" % movie.getAttribute("title"))

    # Each lookup returns a list of matching elements; take the first one
    movie_type = movie.getElementsByTagName("type")[0]
    print("Type: %s" % movie_type.childNodes[0].data)
    movie_format = movie.getElementsByTagName("format")[0]
    print("Format: %s" % movie_format.childNodes[0].data)
    rating = movie.getElementsByTagName("rating")[0]
    print("Rating: %s" % rating.childNodes[0].data)
    description = movie.getElementsByTagName("description")[0]
    print("Description: %s" % description.childNodes[0].data)
This would produce the following result:
Root element : New Arrivals
*****Movie*****
Title: Enemy Behind
Type: War, Thriller
Format: DVD
Rating: PG
Description: Talk about a US-Japan war
*****Movie*****
Title: Transformers
Type: Anime, Science Fiction
Format: DVD
Rating: R
Description: A schientific fiction
*****Movie*****
Title: Trigun
Type: Anime, Action
Format: DVD
Rating: PG
Description: Vash the Stampede!
*****Movie*****
Title: Ishtar
Type: Comedy
Format: VHS
Rating: PG
Description: Viewable boredom
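Since DOM allows changes while SAX is read-only, a short sketch of editing a tree may help. It assumes a tiny inline document instead of movies.xml, and the new shelf name and added movie are invented for illustration. It uses only minidom calls from the standard library: setAttribute, createElement, appendChild, and toxml.

```python
import xml.dom.minidom

# Illustrative document; not the full movies.xml
doc = xml.dom.minidom.parseString(
    '<collection shelf="New Arrivals"><movie title="Ishtar"/></collection>'
)
root = doc.documentElement

# Change an existing attribute on the root element
root.setAttribute("shelf", "Classics")

# Create a new movie element and attach it to the collection
new_movie = doc.createElement("movie")
new_movie.setAttribute("title", "Trigun")
root.appendChild(new_movie)

# Serialize the modified tree back to an XML string
updated = root.toxml()
print(updated)
```

To persist the change, the serialized string could be written back to a file with the usual file APIs.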