Previously we’ve explored how to parse XML data using Node.js as well as PHP. Continuing on the trend of parsing data using various programming languages, this time we’re going to take a look at parsing XML data using the dom4j library with Java.
Now, dom4j is not the only way to parse XML data in Java. There are many other options, including the SAX parser, and everyone will have their own opinion on which to use.
To keep up with my previous two XML tutorials, we’re going to use the following XML data saved in a file called data.xml at the root of the project:
<?xml version='1.0'?>
<business>
    <company>Code Blog</company>
    <owner>Nic Raboy</owner>
    <employees>
        <employee>
            <firstname>Nic</firstname>
            <lastname>Raboy</lastname>
        </employee>
        <employee>
            <firstname>Maria</firstname>
            <lastname>Campos</lastname>
        </employee>
    </employees>
</business>
With our XML content figured out, let’s make sure we structure our project like the following:
project root
    src
        xmlparser
            MainDriver.java
    libs
        dom4j-1.6.1.jar
    build.xml
    data.xml
Based on our project structure, you can probably tell that we’re going to be using Apache Ant for building. Say what you want about Ant, but I’m one of many who still use it. Feel free to switch to Apache Maven or another build tool to better meet your needs.
We’re now ready to crack open our src/xmlparser/MainDriver.java to start adding our parse logic.
package xmlparser;

import java.io.*;
import java.util.*;
import org.dom4j.*;
import org.dom4j.io.*;

public class MainDriver {

    public static void main(String[] args) {
        // filled in later in this tutorial
    }

    public static void printRecursive(Element element) {
        // filled in later in this tutorial
    }

    public static Document readFile(String filename) throws Exception {
        return null; // placeholder so the skeleton compiles; filled in later
    }

}
To further explain our intentions, the readFile(String filename) method will load the data.xml file and return it as a Document object for further parsing. The printRecursive(Element element) method will visit every element of the XML, at every level of nesting, and print its text if it contains only text.
So let’s start with readFile(String filename):
public static Document readFile(String filename) throws Exception {
    SAXReader reader = new SAXReader();
    Document document = reader.read(new File(filename));
    return document;
}
There’s not much to the above code: a SAXReader parses the file and hands back a Document. In fact, I pulled most of it from the dom4j quick-start code.
The printRecursive(Element element) method is where things get more complex:
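public static void printRecursive(Element element) {
    // Walk every child node (elements, text, comments, ...) of this element
    for (int i = 0, size = element.nodeCount(); i < size; i++) {
        Node node = element.node(i);
        // Only child elements are of interest; other node types are skipped
        if (node instanceof Element) {
            Element currentNode = (Element) node;
            // Print only elements that have no child elements of their own
            if (currentNode.isTextOnly()) {
                System.out.println(currentNode.getText());
            }
            // Descend one level; the loop simply ends when nothing is left to visit
            printRecursive(currentNode);
        }
    }
}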
Some of the above code was taken from the dom4j quick-start, but the rest is some custom work. We are basically looking at each node and trying to visit any available children. If none exist, bail out. We also only want to print if there is text.
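If you want to try the same walk without dom4j on the classpath, here is a rough sketch of the idea using the DOM API that ships with the JDK instead (org.w3c.dom, not dom4j). The LeafTextWalker class and its leafTexts method are names I made up for illustration, and it returns the values as a list rather than printing them, so treat it as an approximation of the approach rather than a drop-in replacement:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the tutorial project: the same recursive
// walk as printRecursive, but built on the JDK's DOM API rather than dom4j.
public class LeafTextWalker {

    // Parses an XML string and returns the text of every element that has
    // no child elements, in document order.
    public static List<String> leafTexts(String xml) {
        List<String> out = new ArrayList<>();
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            collect(root, out);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return out;
    }

    // Visits each child element, records it if it is a leaf, then recurses.
    private static void collect(Element element, List<String> out) {
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node.getNodeType() == Node.ELEMENT_NODE) {
                Element child = (Element) node;
                if (!hasChildElements(child)) {
                    out.add(child.getTextContent());
                }
                collect(child, out);
            }
        }
    }

    // Rough stand-in for dom4j's isTextOnly(): true if any child is an element.
    private static boolean hasChildElements(Element element) {
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // A trimmed-down version of data.xml, inlined to keep the sketch self-contained
        String xml = "<business><company>Code Blog</company>"
                + "<employees><employee><firstname>Nic</firstname>"
                + "<lastname>Raboy</lastname></employee></employees></business>";
        for (String text : leafTexts(xml)) {
            System.out.println(text);
        }
    }
}
```

Just like the dom4j version, the root element itself is never printed, only its descendants. Returning a list instead of printing is my own choice here, since it makes the result easier to check or reuse.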
Finally, we’re looking at the main(String[] args) method to bring it all together:
public static void main(String[] args) {
    try {
        Element root = readFile("data.xml").getRootElement();
        printRecursive(root);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
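Just like that, we’ve printed the text of each node of our XML document. For the data.xml above, that means six lines of output: Code Blog, Nic Raboy, Nic, Raboy, Maria, and Campos.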
In case you’re interested in the build.xml code, it can be seen below:
<project>
    <property name="lib.dir" value="libs" />
    <property name="jar.dir" value="build/jar" />
    <property name="jar.name" value="XMLParser.jar" />
    <path id="classpath">
        <fileset dir="${lib.dir}" includes="**/*.jar"/>
    </path>
    <target name="clean">
        <delete dir="build"/>
    </target>
    <target name="compile" depends="clean">
        <mkdir dir="build/classes"/>
        <javac srcdir="src" destdir="build/classes" classpathref="classpath"/>
    </target>
    <target name="build" depends="compile">
        <mkdir dir="build/jar"/>
        <jar destfile="${jar.dir}/${jar.name}" basedir="build/classes">
            <zipgroupfileset dir="libs" includes="*.jar"/>
            <manifest>
                <attribute name="Main-Class" value="xmlparser.MainDriver"/>
            </manifest>
        </jar>
    </target>
    <target name="run">
        <java jar="${jar.dir}/${jar.name}" fork="true"/>
    </target>
    <target name="buildandrun" depends="build, run" />
</project>
To test the project, just run ant buildandrun from your command prompt or Terminal, assuming of course that you have Apache Ant configured correctly.
The dom4j library is very thorough, so I recommend having a look at the Javadocs that go with it.