Friday, May 13, 2011

Using JAX-B to parse XML files without DTD or XSD

XML parsing is a common task for any developer, especially in Java. A beginner looking for how to do it will easily find plenty of articles online showing how to parse XML with a DOM parser or, worse, a SAX parser!

Both DOM and SAX parsers are powerful, general-purpose tools, but even simple documents require a lot of code. For simple XML parsing (and writing), Java has the JAX-B API. JAX-B stands for Java Architecture for XML Binding.
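To give a rough idea of that overhead, here is a minimal sketch that pulls the movie titles out of a tiny document with the JDK's DOM parser. The class name `DomMovieTitles` and the inline sample XML are made up for this illustration; only the element names mirror the file parsed later in this post.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DomMovieTitles {
    // Hypothetical sample input, not one of the post's files.
    static final String XML =
          "<media-list>"
        + "<disc label=\"disc1\"><movie>Thor</movie><movie>Iron Man</movie></disc>"
        + "</media-list>";

    /** Collects the text of every <movie> element by walking the DOM tree. */
    static List<String> movieTitles(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList movies = doc.getElementsByTagName("movie");
        List<String> titles = new ArrayList<>();
        for (int i = 0; i < movies.getLength(); i++) {
            Element movie = (Element) movies.item(i);
            titles.add(movie.getTextContent());
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(movieTitles(XML));
    }
}
```

Note that this only extracts strings. Attributes, dates, numbers, and the nesting of discs would each need more hand-written traversal and conversion code.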

Even though there are plenty of JAX-B tutorials on the internet, most of them start by writing an XSD file and then using an IDE, a Maven plugin, or another tool to generate the matching object model for the XML structure, with the binding annotations added automatically.

There are some situations where we don't want to do this:
  • We don't have the DTD or XSD available.
  • The DTD or XSD is outdated.
  • We really don't want to write a DTD or XSD for a simple domain-specific file that we won't re-use!
Using JAX-B is pretty straightforward but it has some concepts and terms we need to know first:
  • Marshalling: the process of converting a Java object into an XML representation.
  • Unmarshalling: the process of converting an XML document into a Java object.
For this, Java provides us with two interfaces and one factory class:
  • Marshaller
  • Unmarshaller
  • JAXBContext
It also provides a large set of annotations that let us bind a data model to an XML representation. So let's get started!
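Before moving on to the real file, here is a minimal sketch of how those three types fit together in a round trip. The `Note` class and its `text` field are hypothetical and exist only for this illustration. The sketch also assumes `javax.xml.bind` is available: it was bundled with Java SE 6 through 10, while later JDKs need the JAXB API and a runtime on the classpath.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Marshaller;
import javax.xml.bind.Unmarshaller;
import javax.xml.bind.annotation.XmlRootElement;

public class RoundTripSketch {
    // Hypothetical one-field bean, used only for this illustration.
    @XmlRootElement(name = "note")
    public static class Note {
        private String text;
        public String getText() { return text; }
        public void setText(String text) { this.text = text; }
    }

    /** Marshalling: Java object to XML text. */
    static String toXml(Note note) throws Exception {
        JAXBContext context = JAXBContext.newInstance(Note.class);
        Marshaller marshaller = context.createMarshaller();
        StringWriter out = new StringWriter();
        marshaller.marshal(note, out);
        return out.toString();
    }

    /** Unmarshalling: XML text to Java object. */
    static Note fromXml(String xml) throws Exception {
        JAXBContext context = JAXBContext.newInstance(Note.class);
        Unmarshaller unmarshaller = context.createUnmarshaller();
        return (Note) unmarshaller.unmarshal(new StringReader(xml));
    }

    public static void main(String[] args) throws Exception {
        Note note = new Note();
        note.setText("hello");
        String xml = toXml(note);
        System.out.println(xml);
        System.out.println(fromXml(xml).getText());
    }
}
```

The JAXBContext is the factory here: it is built from the annotated classes and hands out both the Marshaller and the Unmarshaller.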

The following snippet shows the XML file we will be parsing:

<?xml version="1.0" encoding="UTF-8"?>
<movie-backup-index>
    <file-info lastBackup='2011-04-02' totalMovies='4'/>
    <media-list>
        <disc label="disc1">
            <movie>The Hulk</movie>
            <movie>Thor</movie>
        </disc>
        <disc label="disc1">
            <movie>Iron Man</movie>
            <movie>Captain America</movie>
        </disc>
    </media-list>
</movie-backup-index>

So the first thing we need is a class to hold this structure. Let's call it MovieIndex and add a few annotations to it. Since we want to work with JavaBeans, we'll tell JAX-B to use the getters and setters of the class:
@XmlRootElement(name="movie-backup-index")
@XmlAccessorType(XmlAccessType.PROPERTY)
public class MovieIndex implements Serializable {

}

That's our first step. Next we need to model the file-info tag. We could use an inner class or a new public class (among other options), but since this is a potential API ;) I'll create another public class, MovieIndexInformation, and add some annotations to it:

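import java.io.Serializable;
import java.util.Date;
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlAttribute;
import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement(name="file-info")
@XmlAccessorType(XmlAccessType.PROPERTY)
public class MovieIndexInformation implements Serializable {
    private Date lastBackup;
    private Integer totalMovies;
    public void setLastBackup(Date lastBackup) {
        this.lastBackup = lastBackup;
    }
    public void setTotalMovies(Integer totalMovies) {
        this.totalMovies = totalMovies;
    }
    // Bound to the lastBackup attribute of <file-info>.
    @XmlAttribute
    public Date getLastBackup() {
        return lastBackup;
    }
    // Bound to the totalMovies attribute of <file-info>.
    @XmlAttribute
    public Integer getTotalMovies() {
        return totalMovies;
    }
}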
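Because the class carries its own @XmlRootElement, it can be tried in isolation against just the file-info tag. The following is a minimal sketch under that assumption: `FileInfoSketch` and its `parse` helper are hypothetical names, the nested class is a condensed copy of the one above so the example compiles on its own, and `javax.xml.bind` is assumed to be available (bundled with Java SE 6 through 10, a separate dependency on later JDKs).

```java
import java.io.Serializable;
import java.io.StringReader;
import java.util.Date;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlAttribute;
import javax.xml.bind.annotation.XmlRootElement;

public class FileInfoSketch {
    // Condensed copy of MovieIndexInformation so the sketch is self-contained.
    @XmlRootElement(name = "file-info")
    @XmlAccessorType(XmlAccessType.PROPERTY)
    public static class MovieIndexInformation implements Serializable {
        private Date lastBackup;
        private Integer totalMovies;

        @XmlAttribute
        public Date getLastBackup() { return lastBackup; }
        public void setLastBackup(Date lastBackup) { this.lastBackup = lastBackup; }

        @XmlAttribute
        public Integer getTotalMovies() { return totalMovies; }
        public void setTotalMovies(Integer totalMovies) { this.totalMovies = totalMovies; }
    }

    /** Unmarshals a standalone <file-info .../> element (hypothetical helper). */
    static MovieIndexInformation parse(String xml) throws Exception {
        JAXBContext context = JAXBContext.newInstance(MovieIndexInformation.class);
        return (MovieIndexInformation) context.createUnmarshaller()
                .unmarshal(new StringReader(xml));
    }

    public static void main(String[] args) throws Exception {
        MovieIndexInformation info =
                parse("<file-info lastBackup='2011-04-02' totalMovies='4'/>");
        System.out.println(info.getTotalMovies());
        System.out.println(info.getLastBackup());
    }
}
```

Both values arrive already typed, as an Integer and a Date, without any manual string conversion.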


With the @XmlAttribute annotation we instruct JAX-B to read the data from an attribute instead of a child element. Now we need to create a model for our backup media and add some annotations:

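import java.io.Serializable;
import java.util.List;
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlAttribute;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement(name="disc")
@XmlAccessorType(XmlAccessType.PROPERTY)
public class Media implements Serializable {
    private String label;
    private List<String> movies;
    // Bound to the label attribute of <disc>.
    @XmlAttribute
    public String getLabel() {
        return label;
    }
    // Each <movie> child element becomes one entry in the list.
    @XmlElement(name="movie")
    public List<String> getMovies() {
        return movies;
    }
    public void setLabel(String label) {
        this.label = label;
    }
    public void setMovies(List<String> movies) {
        this.movies = movies;
    }
}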


OK, now we have everything we need, so we can add the remaining elements to our MovieIndex class:

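import java.io.Serializable;
import java.util.List;
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlElementWrapper;
import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement(name="movie-backup-index")
@XmlAccessorType(XmlAccessType.PROPERTY)
public class MovieIndex implements Serializable {
    private MovieIndexInformation fileInfo;
    private List<Media> media;

    @XmlElement(name="file-info")
    public MovieIndexInformation getFileInfo() {
        return fileInfo;
    }

    // <disc> elements live inside the enclosing <media-list> element.
    @XmlElement(name="disc")
    @XmlElementWrapper(name="media-list")
    public List<Media> getMedia() {
        return media;
    }
    public void setFileInfo(MovieIndexInformation fileInfo) {
        this.fileInfo = fileInfo;
    }
    public void setMedia(List<Media> media) {
        this.media = media;
    }
}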

We've used the XmlElementWrapper annotation because an extra element, media-list, wraps our list of discs. The annotation saves us from creating a new type that holds nothing but a list.
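The wrapper behaviour can be seen in isolation with a reduced example. This is only a sketch: the `Shelf` class, its `shelf` root element, and the `parseDiscs` helper are hypothetical and hold plain strings instead of Media objects. It assumes `javax.xml.bind` is available (bundled with Java SE 6 through 10, a separate dependency on later JDKs).

```java
import java.io.StringReader;
import java.util.List;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlElementWrapper;
import javax.xml.bind.annotation.XmlRootElement;

public class WrapperSketch {
    // Hypothetical holder class, reduced to the wrapped list only.
    @XmlRootElement(name = "shelf")
    public static class Shelf {
        private List<String> discs;

        @XmlElementWrapper(name = "media-list") // the extra enclosing element
        @XmlElement(name = "disc")              // each item inside the wrapper
        public List<String> getDiscs() { return discs; }
        public void setDiscs(List<String> discs) { this.discs = discs; }
    }

    /** Unmarshals the document and returns the wrapped list. */
    static List<String> parseDiscs(String xml) throws Exception {
        JAXBContext context = JAXBContext.newInstance(Shelf.class);
        Shelf shelf = (Shelf) context.createUnmarshaller()
                .unmarshal(new StringReader(xml));
        return shelf.getDiscs();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<shelf><media-list>"
                   + "<disc>disc1</disc><disc>disc2</disc>"
                   + "</media-list></shelf>";
        System.out.println(parseDiscs(xml));
    }
}
```

The list lands directly on `Shelf`, with no intermediate class for the media-list element.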

So we're ready to parse! To test quickly, I've added toString methods to the model objects (I won't show them here because they're NetBeans-generated).

So now we need to add the boilerplate code to bootstrap the JAXBContext and unmarshal our file:

import java.io.InputStream;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Unmarshaller;

public class JAXBDemoParse {
    /**
    * In this main method we parse an XML file into a Java object using
    * the JAXB library included in the JDK.
    * @param args not used
    */
    public static void main(String[] args) throws Exception {
        // Bootstrap the context.
        JAXBContext context = JAXBContext.newInstance(MovieIndex.class);
        // Create an unmarshaller.
        Unmarshaller unmarshaller = context.createUnmarshaller();
        // Parse the XML file, loaded from the same package as this class.
        InputStream is = JAXBDemoParse.class.getResourceAsStream("demoXML.xml");
        MovieIndex index = (MovieIndex) unmarshaller.unmarshal(is);
        System.out.println(index);
    }
}

So finally! We loaded all the data from the XML file into the object with only three lines of code and some annotations. Here is the program output. Notice how the date in the XML attribute was parsed correctly: JAXB handled all the data conversion.

 MovieIndex{
  fileInfo=MovieIndexInformation{
      lastBackup=Sat Apr 02 00:00:00 GMT-03:00 2011,
      totalMovies=4},
  media=[
      Media{
          label=disc1,
          movies=[The Hulk, Thor]},
      Media{
          label=disc1,
          movies=[Iron Man, Captain America]
      }]
}

I've added some line breaks and tabs to the output to make it easier to read.

So that's all for now. One final note: if you would like to marshal the object to an XML file, ask the JAXBContext for a Marshaller and call its marshal method with the object and an output stream.
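As a rough sketch of that last step, the example below marshals a single disc into an in-memory stream. `MarshalSketch` and `toXml` are hypothetical names, the nested class is a condensed copy of Media so the example compiles on its own, and `javax.xml.bind` is assumed to be available (bundled with Java SE 6 through 10, a separate dependency on later JDKs).

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.List;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Marshaller;
import javax.xml.bind.annotation.XmlAttribute;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;

public class MarshalSketch {
    // Condensed copy of the Media class so the sketch is self-contained.
    @XmlRootElement(name = "disc")
    public static class Media {
        private String label;
        private List<String> movies;

        @XmlAttribute
        public String getLabel() { return label; }
        public void setLabel(String label) { this.label = label; }

        @XmlElement(name = "movie")
        public List<String> getMovies() { return movies; }
        public void setMovies(List<String> movies) { this.movies = movies; }
    }

    /** Marshals the object into an output stream and returns the XML text. */
    static String toXml(Media media) throws Exception {
        JAXBContext context = JAXBContext.newInstance(Media.class);
        Marshaller marshaller = context.createMarshaller();
        // Optional: indent the output so it is easier to read.
        marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        marshaller.marshal(media, out);
        return out.toString("UTF-8");
    }

    public static void main(String[] args) throws Exception {
        Media disc = new Media();
        disc.setLabel("disc1");
        disc.setMovies(Arrays.asList("The Hulk", "Thor"));
        System.out.println(toXml(disc));
    }
}
```

To write to a file instead, a `java.io.FileOutputStream` can be passed to `marshal` in place of the in-memory stream.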

6 comments:

  1. Nice and well prepared article.. thank u ...

  2. Thanks for the comment, I've updated the highlighting of the code snippets.

  3. "JAX-B stands for Java And Xml Binding." No it doesnt.. http://en.wikipedia.org/wiki/Java_Architecture_for_XML_Binding

    1. Another good way of generating the JAXB stubs is to get the xml response/snippet and generate the xsd from it (Using any IDE). Then you can use xjc compiler to generate the bindings (from the xsd). This saves you writing them yourself.

    2. Hi, thanks for the errata I will change it right away. Also I'm aware of the automated tools but the intent of the post was to illustrate the use of the annotations.
