The Android SDK provides two Java XML parsers that most developers are accustomed to: DOM (Document Object Model) and SAX (Simple API for XML). A while back, the Android SDK was enhanced to include an XML Pull Parser, with a note that the change provided "higher performance for small memory applications", such as portable devices.
I initially assumed that meant the Pull parser would be the XML parser of choice for the Android platform. Otherwise, why would the Android developers include it so early on, with such a change description? Before I began testing, I expected the Pull parser to be the fastest of the three available methods, and I had already begun using it as my parser of choice. But was it really the right choice? And if so, how much faster was it? I had to know.
Understanding How the Different Parsers Work
A DOM parser works by parsing an XML file into an in-memory tree of nodes that mirrors the hierarchy of the file. Because the entire file is read and processed up front, a DOM parser typically uses the most memory of the three. A SAX parser works by having the developer implement a handler class whose methods the parser calls as it encounters events, such as the start or end of a tag or a run of character data; a tag's attributes are delivered along with its start event. A Pull parser works by having the developer write a loop that repeatedly asks the parser for the next event and handles that event directly within the loop. The idea behind the Pull parser is that parsing can easily be stopped at any point, work is done only on demand, and the overhead of extra method calls and classes is removed.
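As a rough illustration of how the three styles differ in shape, here is a minimal sketch that counts trkpt elements in a GPX string each way. It is not the code used for the measurements in this article, and the class and method names are hypothetical. The DOM and SAX versions use the standard javax.xml.parsers APIs, which Android also provides. For the pull style, the sketch swaps in Java SE's StAX reader (javax.xml.stream) so that it runs on a desktop JVM; Android does not ship StAX, and on a handset the same loop would be written against org.xmlpull.v1.XmlPullParser using next(), XmlPullParser.START_TAG and getName().

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class: the same task done in DOM, SAX and pull style.
public class ParserStyles {

    // DOM: the whole document is parsed into an in-memory tree first,
    // and only then is the tree queried.
    public static int countWithDom(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName("trkpt").getLength();
    }

    // SAX: the parser drives, calling back into a handler object per event.
    public static int countWithSax(String xml) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                // The factory is not namespace-aware by default, so the
                // element name arrives in qName.
                if ("trkpt".equals(qName)) {
                    count[0]++;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return count[0];
    }

    // Pull: the caller drives, asking for the next event in its own loop.
    // StAX stands in here for Android's org.xmlpull.v1.XmlPullParser.
    public static int countWithPull(String xml) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        int count = 0;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "trkpt".equals(reader.getLocalName())) {
                    count++;
                }
                // A pull loop could simply break here once it has enough data.
            }
        } finally {
            reader.close();
        }
        return count;
    }
}
```

The structural difference is who owns the loop: in the SAX version the parser calls the handler, while in the pull version the caller decides when (and whether) to ask for the next event.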
Although the three parsers work differently, they share the same fundamental purpose: parsing XML. This means that the code for handling what they find, such as interpreting tag data and attributes, can remain basically the same across all three implementations. Because that code stays the same, any performance differences should come from the parsing algorithms themselves, including their use of memory, since higher memory use can cause Java garbage collection to run more frequently.
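One way to keep the per-record handling identical across parsers is to put it behind a small shared helper that only needs a way to look up an attribute by name. The sketch below assumes a hypothetical TrackPoint class and fromAttributes method (neither comes from the article), and uses Java 8's Function for brevity; Android 1.1-era code would use a small interface instead.

```java
import java.util.function.Function;

// Hypothetical value class shared by all three parser implementations.
public final class TrackPoint {
    public final double lat;
    public final double lon;

    private TrackPoint(double lat, double lon) {
        this.lat = lat;
        this.lon = lon;
    }

    // Shared per-record logic: read "lat" and "lon" from a <trkpt> tag,
    // however the caller's parser exposes attributes. For example:
    //   SAX:  TrackPoint.fromAttributes(attributes::getValue)
    //   Pull: TrackPoint.fromAttributes(name -> parser.getAttributeValue(null, name))
    //   DOM:  TrackPoint.fromAttributes(element::getAttribute)
    public static TrackPoint fromAttributes(Function<String, String> attribute) {
        return new TrackPoint(
                Double.parseDouble(attribute.apply("lat")),
                Double.parseDouble(attribute.apply("lon")));
    }
}
```

Because only the adapter lambda differs per parser, any timing difference between implementations built this way should come from the parsers rather than from the record-handling code.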
Crafting a Reasonable Test of Parser Performance
I decided a fun comparison exercise for all three parsers would be to parse a typical GPX XML file. I chose GPX because of the popularity of location-based services on mobile devices and the prevalence of the format among GPS units. A GPX file often has a relatively large number of records, at least compared to something like a news feed, which may only ever hold the latest twenty items. Each record, however, is relatively simple and contains just a few pieces of data.
Here is an example GPX-format record:
<trkpt lat="43.95048" lon="-71.0852">
Since I was using a real-world XML file format, I originally intended to use files of real-world sizes. For instance, a GPS unit that records its location once a minute for a week will generate a GPX file with just over 10,000 records (60 × 24 × 7 = 10,080). As it turns out, my quick-and-dirty parsers were a bit slow for a file that size, so I pared the tests down to three file sizes:
- Small size of 70 records
- Medium size of 560 records
- Large size of 2,796 records
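If you want to reproduce comparable inputs without real track logs, a small generator can produce GPX documents with an exact record count. This is a hypothetical helper, not how the files above were produced; it assumes minimal trkpt records with only lat and lon attributes, whereas real GPX track points usually also carry child elements such as ele (elevation) and time.

```java
import java.util.Locale;

// Hypothetical generator for synthetic GPX 1.1 test files.
public final class GpxGenerator {

    // Builds a GPX document containing exactly `count` <trkpt> records.
    // Coordinates drift slightly from a fixed start so records differ.
    public static String generate(int count) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        // GPX 1.1 requires the version and creator attributes on <gpx>.
        sb.append("<gpx version=\"1.1\" creator=\"GpxGenerator\" ")
          .append("xmlns=\"http://www.topografix.com/GPX/1/1\">\n");
        sb.append("<trk><trkseg>\n");
        for (int i = 0; i < count; i++) {
            double lat = 43.95048 + i * 0.0001;
            double lon = -71.0852 - i * 0.0001;
            // Locale.ROOT keeps '.' as the decimal separator on every device.
            sb.append(String.format(Locale.ROOT,
                    "<trkpt lat=\"%.5f\" lon=\"%.5f\"></trkpt>%n", lat, lon));
        }
        sb.append("</trkseg></trk>\n</gpx>\n");
        return sb.toString();
    }
}
```

Calling generate(70), generate(560) and generate(2796) gives documents with the same record counts as the three test sizes, although they will be smaller in bytes than real files whose points include elevation and timestamps.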
I ran each of the tests between two and six times to help factor out any other activity on the handset. All tests were run on a production T-Mobile G1 handset with Android 1.1.
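Repeating each test and averaging is easy to wrap in a small harness. The sketch below is one way to do it, assuming a hypothetical ParseTimer class that times any task with System.nanoTime and reports the mean in milliseconds; the article does not say exactly how its timings were taken.

```java
import java.util.concurrent.Callable;

// Hypothetical timing harness for repeated parser runs.
public final class ParseTimer {

    // Runs `task` `runs` times and returns the mean wall-clock time in ms.
    public static double meanMillis(Callable<?> task, int runs) throws Exception {
        if (runs < 1) {
            throw new IllegalArgumentException("runs must be >= 1");
        }
        long totalNanos = 0;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.call(); // the parse under test
            totalNanos += System.nanoTime() - start;
        }
        return totalNanos / (double) runs / 1_000_000.0;
    }
}
```

The task would wrap whichever parse call is being measured; on Android 1.1 there were no lambdas, so it would be an anonymous Callable. Any garbage collection triggered during a run counts toward that run's time, which seems appropriate here since memory use is part of what differs between the parsers.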