Java: Always explicitly specify which XML parser to use
There is the following design error in Java (at least in Servlets):
1. A server may serve multiple applications; each application may use different libraries or even different versions of the same library, "side by side".
2. XML parsers, transformers (XSLT), etc., have a standard interface, and there may be different implementations of this interface from different vendors, open-source projects, etc.
3. Which XML parser, transformer, etc. is actually used depends on a global system variable (a JVM-wide "system property").
And it's point 3 that's the problem really. Points 1 and 2 are debatable: they certainly bring advantages, but they certainly bring complexity too.
I just had the problem that one of my web applications stopped working, but only intermittently. Restarting the server led to everything being OK, but later things would not be OK. I do hate environments where everything appears to work, yet in fact doesn't. I mean how do you know when you're "done" in such an environment? (Or how do you even know you are in such an environment?)
The bug was caused by:
- Application 1 used the default XML parser, and didn't have any extra JARs (libraries) for reading XML
- Application 2 required a special XML parser, set the global variable so it would be used, and included the JARs necessary for the special XML parser
So when a request reached application 1 after a request had reached application 2, the system tried to instantiate the special XML parser within application 1 (as named in the global variable set by application 2) but couldn't find it, because it wasn't deployed in application 1 (and applications can't use one another's libraries, due to point 1).
This seems obvious when one describes it, but looking at the logs, on a live server, with the system down and the clock ticking? – Far from obvious.
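The failure described above can be reproduced in a single JVM. The following is only a minimal sketch, not the code from my server: the class name StaleParserPropertyDemo and the factory name com.example.MissingDocumentBuilderFactory are made up for illustration, the latter standing in for "a parser that application 1 does not have on its classpath".

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.FactoryConfigurationError;

class StaleParserPropertyDemo {
    static final String KEY = "javax.xml.parsers.DocumentBuilderFactory";

    /** Returns true if the no-argument lookup fails while the property names the given class. */
    static boolean lookupFailsWith(String factoryClassName) {
        String previous = System.getProperty(KEY);
        // What "application 2" did: point the JVM-wide property at its own parser.
        System.setProperty(KEY, factoryClassName);
        try {
            // What "application 1" did: ask for "the" parser without naming one.
            DocumentBuilderFactory.newInstance();
            return false;
        } catch (FactoryConfigurationError e) {
            // The named class cannot be loaded from here, so the lookup fails.
            return true;
        } finally {
            // Put the property back the way we found it.
            if (previous == null) {
                System.clearProperty(KEY);
            } else {
                System.setProperty(KEY, previous);
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical class name that is deliberately not on the classpath.
        System.out.println(lookupFailsWith("com.example.MissingDocumentBuilderFactory"));
    }
}
```

Note that the failure surfaces as an Error rather than an Exception, so a catch (Exception e) block around the parsing code will not see it.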
So now, I assert, every time you want to create an XML parser, do the following:
If you require a special XML library, use:
    System.setProperty("javax.xml.parsers.DocumentBuilderFactory", "org.apache.xerces.jaxp.DocumentBuilderFactoryImpl");
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    ...
If you require the standard XML library, use:
    // Remove any override left behind by another application (System.clearProperty exists since Java 5)
    System.clearProperty("javax.xml.parsers.DocumentBuilderFactory");
    DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
    ...
There is also the option of naming the implementation class directly: since Java 6, DocumentBuilderFactory.newInstance(String factoryClassName, ClassLoader classLoader) creates the named factory without consulting the global variable. That's a good option too, as it doesn't "corrupt" this global variable ("system property"). However, I think one should be defensive and always delete the global variable when one wants the standard XML parser; then it doesn't matter whether the global variable has been corrupted or not.
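As a rough sketch of the "pass a parameter" option: the two-argument newInstance overload takes the factory class name and a class loader. The class ExplicitParserFactory and its helper methods are invented for this example, and the factory class name used here is an assumption: it is the name of the parser bundled with OpenJDK and Oracle JDKs, and other JVMs may ship a different one.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class ExplicitParserFactory {
    // Assumption: implementation class bundled with OpenJDK / Oracle JDKs.
    static final String JDK_FACTORY =
            "com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl";

    /** Creates the named factory directly; the system property is not consulted. */
    static DocumentBuilderFactory newJdkFactory() {
        // A null class loader means the current thread's context class loader is used.
        return DocumentBuilderFactory.newInstance(JDK_FACTORY, null);
    }

    /** Parses the given XML text and returns the name of its root element. */
    static String rootElementName(String xml) {
        try {
            Document doc = newJdkFactory()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getNodeName();
        } catch (Exception e) {
            // Wrap checked parser exceptions to keep the sketch short.
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rootElementName("<greeting/>"));
    }
}
```

The point of this design is that it neither reads nor writes the global variable, so it cannot be broken by another application and cannot break one either. The price is that a vendor-specific class name ends up hard-coded in the application.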
Never do the following:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
This simply relies on whichever XML parser is currently set in the global variable. You have no way to guarantee that some other application running on the same server won't set the global variable to an XML parser that isn't installed in your application. Even if you control the server and all its applications, you don't know what software you'll be writing in the future. (In this case I installed a new application on a server which had been running fine for a year, and because the new application set the global variable, the old application broke.)
The same applies to all those other "factory" situations, such as TransformerFactory.newInstance() etc.
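For transformers the global variable is named javax.xml.transform.TransformerFactory, and TransformerFactory has the same kind of two-argument newInstance overload (since Java 6). A hedged sketch along the same lines follows; the class ExplicitTransformerFactory and its method are made up for illustration, and the factory class name is again an assumption that holds for OpenJDK and Oracle JDKs.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ExplicitTransformerFactory {
    // Assumption: XSLT implementation bundled with OpenJDK / Oracle JDKs.
    static final String JDK_FACTORY =
            "com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl";

    /** Copies the XML through an identity transform created by the named factory. */
    static String identityCopy(String xml) {
        try {
            // The javax.xml.transform.TransformerFactory property is not consulted here.
            TransformerFactory factory = TransformerFactory.newInstance(JDK_FACTORY, null);
            Transformer transformer = factory.newTransformer(); // identity transform
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transform failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(identityCopy("<a>text</a>"));
    }
}
```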
This all feels quite inelegant to me, and it has just cost me a lot of time, and it's not as if I'm new to programming Java. I wonder whether there is a better way to approach it. Or is Java just broken in this particular respect?
P.S. This is not the only thing that went wrong with the old application today. I upgraded from Java 5 to Java 6 and suddenly some XML was not compliant against a schema according to Java – I had hit this error.