Thursday, June 18, 2009

SAX XML Parsing using Java

If you work in software development in almost any language (except perhaps mainframe-related languages), you will find XML very handy. The reason is probably that XML, a data-storage language rather than a data-processing language, is compatible with almost every known language.

XML stores data, and if that data needs to be passed from one system to another (the systems might be applications written in different languages), it needs to be parsed. Basically, parsing is a way of reading data from an XML document.


There are a number of ways to parse an XML document. The two most popular are SAX (Simple API for XML) parsing and DOM (Document Object Model) parsing.


Here we are going to look at SAX parsing. I have summarized some important points from Wikipedia below.


SAX (Simple API for XML) is a serial-access parser API for XML. A SAX parser works as a stream parser with an event-driven API.


The user defines a number of callback methods that will be called when events occur during parsing.

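The SAX events include: XML text nodes, XML element nodes, XML processing instructions and XML comments (comments are reported through the optional org.xml.sax.ext.LexicalHandler interface rather than ContentHandler).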


Events are fired when each of these XML features is encountered, and again when its end is encountered. XML attributes are provided as part of the data passed to element events.
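To make the callback model concrete, here is a minimal sketch. It uses the JAXP SAX API that ships with the JDK (javax.xml.parsers.SAXParserFactory) instead of Xerces directly, so it runs without extra jars. It shows that attributes have no event of their own: they arrive in the Attributes argument of startElement(). The class AttributeDemo, its attributes() method and the id attribute in the sample XML are hypothetical, made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class AttributeDemo {

    // Hypothetical helper: returns "element@attribute=value" for every attribute the parser reports.
    public static List<String> attributes(String xml) throws Exception {
        final List<String> found = new ArrayList<String>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                // Attributes are delivered together with the start-element event.
                for (int i = 0; i < attrs.getLength(); i++) {
                    found.add(qName + "@" + attrs.getQName(i) + "=" + attrs.getValue(i));
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return found;
    }

    public static void main(String[] args) throws Exception {
        // The id attribute is hypothetical; Company.xml only has the company name attribute.
        String xml = "<company name=\"TelComm Company Inc.\"><employee id=\"1\"/></company>";
        for (String attribute : attributes(xml)) {
            System.out.println(attribute);
        }
    }
}
```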


SAX parsing is unidirectional; previously parsed data cannot be re-read without starting the parsing operation again.
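Because the parser cannot go back, anything you need after parsing has to be saved by the handler while the events go past. A minimal sketch, again via JAXP, that keeps every first name from the Company.xml layout in a list during a single pass; the class CollectFirstNames and its firstNames() method are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class CollectFirstNames {

    // Hypothetical helper: collects the text of every <first> element during one pass.
    public static List<String> firstNames(String xml) throws Exception {
        final List<String> names = new ArrayList<String>();
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                text.setLength(0); // start a fresh buffer for each element
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("first")) {
                    // Save the value now; the parser will not revisit this element.
                    names.add(text.toString().trim());
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<employees>"
                + "<employee><name><first>Virendra</first><last>Sehwag</last></name></employee>"
                + "<employee><name><first>Yuvraj</first><last>Singh</last></name></employee>"
                + "</employees>";
        System.out.println(firstNames(xml));
    }
}
```

To read the data a second time you would either keep it in a structure like this list or call parse() again on a fresh input stream.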



For a simple XML document, for example a single element that contains only text, a SAX parser generates a sequence of events like the following:
XML element start, XML text node, XML element end.
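One way to see this sequence for yourself is to record every callback. The sketch below (JAXP again; the class EventSequenceDemo and the greeting element are hypothetical) logs start, text and end events in order. Adjacent characters() calls are merged into one text entry, because a parser is allowed to split one text node into several chunks.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class EventSequenceDemo {

    // Records each callback as a short string so the order of events can be inspected.
    public static class RecordingHandler extends DefaultHandler {
        public final List<String> events = new ArrayList<String>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            events.add("start " + qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            String chunk = new String(ch, start, length);
            int last = events.size() - 1;
            // Merge consecutive chunks so one text node is recorded as one event.
            if (last >= 0 && events.get(last).startsWith("text ")) {
                events.set(last, events.get(last) + chunk);
            } else {
                events.add("text " + chunk);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            events.add("end " + qName);
        }
    }

    public static List<String> events(String xml) throws Exception {
        RecordingHandler handler = new RecordingHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : events("<greeting>Hello</greeting>")) {
            System.out.println(event);
        }
    }
}
```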


There are several APIs available for SAX parsing in Java. I found Apache Xerces the most useful.
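Xerces is not the only option: the JAXP API (javax.xml.parsers) has been part of the JDK since Java 1.4, and current JDKs bundle an internal implementation derived from Xerces, so a SAX parser can be obtained without adding xercesImpl.jar. A minimal sketch that counts elements by name through SAXParserFactory; JaxpCountDemo and countElements() are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class JaxpCountDemo {

    // Hypothetical helper: counts the start-element events whose qualified name equals `name`.
    public static int countElements(String xml, final String name) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                if (qName.equals(name)) {
                    count[0]++;
                }
            }
        };
        // The factory picks the JDK's default SAX implementation; no extra jar is needed.
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<company name=\"TelComm Company Inc.\"><employees>"
                + "<employee/><employee/></employees></company>";
        System.out.println(countElements(xml, "employee"));
    }
}
```

With this approach a handler for Company.xml could also be given a FileInputStream directly through SAXParser.parse(InputStream, DefaultHandler).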


The link for the API is xercesAPI

To download the library jar (xercesImpl.jar) click here.

The XML file (Company.xml) that we are going to parse is:

<company name="TelComm Company Inc.">
  <employees>
    <employee>
      <name>
        <first>Virendra</first>
        <last>Sehwag</last>
      </name>
      <office>Delhi</office>
      <telephone>123456</telephone>
    </employee>
    <employee>
      <name>
        <first>Yuvraj</first>
        <last>Singh</last>
      </name>
      <office>Punjab</office>
      <telephone>123457</telephone>
    </employee>
  </employees>
</company>



The code for the SAX parsing is:

// SAXParserDemo parses Company.xml with the Xerces SAX parser (xercesImpl.jar must be on the classpath).
// The parser calls many DefaultHandler methods while parsing, but this class overrides only three of them:
// startElement(), endElement() and characters(). getText() is a helper method, not an override.

/*
The parameters of the startElement and endElement methods:
uri - The Namespace URI, or the empty string if the element has no Namespace URI or if Namespace processing is not being performed.
localName - The local name (without prefix), or the empty string if Namespace processing is not being performed.
qName - The qualified XML 1.0 name (with prefix), or the empty string if qualified names are not available.
attributes - The specified or defaulted attributes of the element (startElement only).
*/

import java.io.CharArrayWriter;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import org.apache.xerces.parsers.SAXParser;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SAXParserDemo extends DefaultHandler {
    // Buffers character data; it is reset at the start of every element.
    private final CharArrayWriter text = new CharArrayWriter();
    private final SAXParser parser = new SAXParser();

    // Values of the employee currently being parsed.
    private String firstName;
    private String lastName;
    private String office;
    private String telephone;

    public void parse(InputStream is) throws SAXException, IOException {
        parser.setContentHandler(this);
        parser.parse(new InputSource(is));
    }

    // startElement - called at the start of each element while parsing.
    // Use it for start-of-element actions (such as allocating a new tree node or writing output to a file).
    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        text.reset();
        if (qName.equals("company")) {
            String name = attributes.getValue("name");
            System.out.println("Employee Listing For " + name);
            System.out.println();
        }
    }

    // endElement - called at the end of each element while parsing.
    // Use it for end-of-element actions (such as finalising a tree node or writing output to a file).
    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (qName.equals("first")) {
            firstName = getText();
        } else if (qName.equals("last")) {
            lastName = getText();
        } else if (qName.equals("office")) {
            office = getText();
        } else if (qName.equals("telephone")) {
            telephone = getText();
        } else if (qName.equals("employee")) {
            // All fields of this employee have been read; print one line per employee.
            System.out.println(office + "\t" + firstName + "\t" + lastName + "\t" + telephone);
        }
    }

    // Helper: returns the buffered text of the element that just ended, without surrounding whitespace.
    public String getText() {
        return text.toString().trim();
    }

    // characters - called for each chunk of character data.
    // One text node may arrive in several chunks, so the data is appended to the buffer.
    @Override
    public void characters(char[] ch, int start, int length) {
        text.write(ch, start, length);
    }

    public static void main(String[] args) throws Exception {
        SAXParserDemo demo = new SAXParserDemo();
        // try-with-resources closes the file once parsing is finished.
        try (InputStream in = new FileInputStream("Company.xml")) {
            demo.parse(in);
        }
    }
}
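Why does SAXParserDemo collect text in a CharArrayWriter instead of simply assigning it in characters()? The SAX contract allows a parser to deliver the character data of a single text node across several characters() calls, for example around an entity reference such as &amp;. The sketch below (JAXP; ChunkDemo, ChunkCounter and the sample office text are hypothetical) counts the calls and buffers the data; the number of calls depends on the parser, but the buffered text is always complete.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ChunkDemo {

    // Counts characters() calls and buffers their data, the way SAXParserDemo buffers text.
    public static class ChunkCounter extends DefaultHandler {
        public int calls = 0;
        public final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            calls++;
            text.append(ch, start, length);
        }
    }

    public static ChunkCounter parse(String xml) throws Exception {
        ChunkCounter counter = new ChunkCounter();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), counter);
        return counter;
    }

    public static void main(String[] args) throws Exception {
        // &amp; is an entity reference; the parser may report the text around it in separate chunks.
        ChunkCounter counter = parse("<office>Delhi &amp; Punjab</office>");
        System.out.println("characters() calls: " + counter.calls);
        System.out.println("buffered text: " + counter.text);
    }
}
```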



Thanks. I hope you all find it useful.
