XML (eXtensible Markup Language) remains a widely used data format for configuration, web services, and data interchange. Java provides several APIs for parsing it; DOM (Document Object Model) and SAX (Simple API for XML) are two of the most established. This guide explores both parsers, their use cases, and practical implementations.
Understanding the Two Parsing Models
DOM (Document Object Model)
- Tree-based: Loads entire XML document into memory as a tree structure
- Random Access: Allows navigation and modification in any direction
- Memory Intensive: Not suitable for very large XML files
- Read/Write: Supports both reading and modifying XML documents
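To make the tree-based, random-access model concrete, here is a minimal sketch. It assumes a small in-memory document instead of a file, and the class name `DomNavigationSketch` and its sample data are hypothetical. It walks down to a node, steps up to its parent, moves sideways to a sibling, and modifies the tree in place.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

class DomNavigationSketch {

    // Hypothetical sample data; no whitespace between tags, so siblings are elements only.
    static final String XML =
        "<employees><employee id=\"101\"><name>John Doe</name>"
        + "<salary>75000</salary></employee></employees>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Navigate down to <salary>, up to its parent, sideways to <name>, then edit the salary.
    static String navigateAndModify() {
        Document doc = parse(XML);
        Node salary = doc.getElementsByTagName("salary").item(0);  // downward lookup
        Element employee = (Element) salary.getParentNode();       // upward
        Node name = salary.getPreviousSibling();                   // sideways
        salary.setTextContent("80000");                            // in-place modification
        return employee.getAttribute("id") + ":" + name.getTextContent()
                + ":" + salary.getTextContent();
    }

    public static void main(String[] args) {
        System.out.println(navigateAndModify());
    }
}
```

None of these moves requires re-reading the input, which is what the whole tree being held in memory buys you.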
SAX (Simple API for XML)
- Event-based: Parses XML sequentially and triggers events
- Sequential Access: Processes document from start to end
- Memory Efficient: Suitable for large XML files
- Read-Only: Only supports reading, not modification
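The event-based model is easier to see when the callbacks are simply logged in the order they fire. The following sketch assumes an in-memory input string, and the class name `SaxEventOrderSketch` is hypothetical. It records start, text, and end events as the parser moves through the document once.

```java
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;
import javax.xml.parsers.SAXParserFactory;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class SaxEventOrderSketch {

    // Returns the sequence of events the parser reported for the given XML.
    static List<String> events(String xml) {
        List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                log.add("start:" + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // A parser may deliver one text run in several chunks; each chunk is logged.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    log.add("text:" + text);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("end:" + qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return log;
    }

    public static void main(String[] args) {
        events("<employee><name>John Doe</name></employee>").forEach(System.out::println);
    }
}
```

Nothing is retained unless the handler stores it, which is why memory use stays flat regardless of document size.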
DOM Parser: In-Memory Tree Processing
1. Basic DOM Parsing
Sample XML File (employees.xml):
<?xml version="1.0" encoding="UTF-8"?>
<employees>
    <employee id="101">
        <name>John Doe</name>
        <position>Software Engineer</position>
        <department>IT</department>
        <salary>75000</salary>
    </employee>
    <employee id="102">
        <name>Jane Smith</name>
        <position>Project Manager</position>
        <department>Operations</department>
        <salary>85000</salary>
    </employee>
    <employee id="103">
        <name>Bob Johnson</name>
        <position>QA Analyst</position>
        <department>IT</department>
        <salary>65000</salary>
    </employee>
</employees>
Basic DOM Reader:
import org.w3c.dom.*;
import javax.xml.parsers.*;
import java.io.*;
public class DOMXmlParser {
public static void parseEmployees(String filePath) {
try {
// Create DocumentBuilder
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
// Parse XML file
Document document = builder.parse(new File(filePath));
// Normalize the tree: merge adjacent text nodes and remove empty ones
document.getDocumentElement().normalize();
// Get root element
Element root = document.getDocumentElement();
System.out.println("Root element: " + root.getNodeName());
// Get all employee elements
NodeList employeeList = document.getElementsByTagName("employee");
System.out.println("=== Employee Details ===");
for (int i = 0; i < employeeList.getLength(); i++) {
Node employeeNode = employeeList.item(i);
if (employeeNode.getNodeType() == Node.ELEMENT_NODE) {
Element employeeElement = (Element) employeeNode;
// Get attributes
String id = employeeElement.getAttribute("id");
// Get child elements
String name = getElementValue(employeeElement, "name");
String position = getElementValue(employeeElement, "position");
String department = getElementValue(employeeElement, "department");
String salary = getElementValue(employeeElement, "salary");
System.out.printf("Employee ID: %s%n", id);
System.out.printf("Name: %s%n", name);
System.out.printf("Position: %s%n", position);
System.out.printf("Department: %s%n", department);
System.out.printf("Salary: %s%n%n", salary);
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
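// Note: getElementsByTagName searches all descendants, not just direct children;
// that is fine here because each tag name occurs once per <employee>
private static String getElementValue(Element parent, String tagName) {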
NodeList nodeList = parent.getElementsByTagName(tagName);
if (nodeList.getLength() > 0) {
Element element = (Element) nodeList.item(0);
return element.getTextContent();
}
return "";
}
public static void main(String[] args) {
parseEmployees("employees.xml");
}
}
2. DOM Writing and Modification
import org.w3c.dom.*;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import java.io.*;
public class DOMXmlWriter {
public static void createEmployeeXml(String outputPath) {
try {
// Create DocumentBuilder
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
// Create new document
Document document = builder.newDocument();
// Create root element
Element rootElement = document.createElement("employees");
document.appendChild(rootElement);
// Add first employee
Element employee1 = createEmployeeElement(document, "101",
"John Doe", "Software Engineer", "IT", "75000");
rootElement.appendChild(employee1);
// Add second employee
Element employee2 = createEmployeeElement(document, "102",
"Jane Smith", "Project Manager", "Operations", "85000");
rootElement.appendChild(employee2);
// Add third employee
Element employee3 = createEmployeeElement(document, "103",
"Bob Johnson", "QA Analyst", "IT", "65000");
rootElement.appendChild(employee3);
// Write to file
writeDocumentToFile(document, outputPath);
System.out.println("XML file created successfully: " + outputPath);
} catch (Exception e) {
e.printStackTrace();
}
}
private static Element createEmployeeElement(Document doc, String id,
String name, String position,
String department, String salary) {
Element employee = doc.createElement("employee");
employee.setAttribute("id", id);
employee.appendChild(createTextElement(doc, "name", name));
employee.appendChild(createTextElement(doc, "position", position));
employee.appendChild(createTextElement(doc, "department", department));
employee.appendChild(createTextElement(doc, "salary", salary));
return employee;
}
private static Element createTextElement(Document doc, String tagName, String textContent) {
Element element = doc.createElement(tagName);
element.appendChild(doc.createTextNode(textContent));
return element;
}
private static void writeDocumentToFile(Document document, String filePath) {
try {
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
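// Format output. INDENT is a standard property; indent-amount is a Xalan-specific
// extension that the JDK's built-in transformer also honors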
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
DOMSource source = new DOMSource(document);
StreamResult result = new StreamResult(new File(filePath));
transformer.transform(source, result);
} catch (TransformerException e) {
e.printStackTrace();
}
}
public static void main(String[] args) {
createEmployeeXml("new_employees.xml");
}
}
3. DOM Modification Example
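import org.w3c.dom.*;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import java.io.*;
public class DOMXmlModifier {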
public static void updateEmployeeSalary(String filePath, String employeeId, String newSalary) {
try {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(new File(filePath));
document.getDocumentElement().normalize();
// Find employee by ID
NodeList employees = document.getElementsByTagName("employee");
for (int i = 0; i < employees.getLength(); i++) {
Element employee = (Element) employees.item(i);
String id = employee.getAttribute("id");
if (id.equals(employeeId)) {
// Update salary
Node salaryNode = employee.getElementsByTagName("salary").item(0);
salaryNode.setTextContent(newSalary);
System.out.println("Updated salary for employee " + employeeId);
break;
}
}
// Save changes
writeDocumentToFile(document, filePath);
} catch (Exception e) {
e.printStackTrace();
}
}
private static void writeDocumentToFile(Document document, String filePath) {
try {
TransformerFactory transformerFactory = TransformerFactory.newInstance();
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
DOMSource source = new DOMSource(document);
StreamResult result = new StreamResult(new File(filePath));
transformer.transform(source, result);
} catch (TransformerException e) {
e.printStackTrace();
}
}
public static void main(String[] args) {
updateEmployeeSalary("employees.xml", "101", "80000");
}
}
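Modification is not limited to changing text. As a hedged sketch of a structural change, the example below removes an employee element by ID and serializes the result to a string. The class name `DomRemoveSketch` and the in-memory input are assumptions made to keep the example self-contained.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import java.io.StringReader;
import java.io.StringWriter;

class DomRemoveSketch {

    // Removes every <employee> whose id attribute matches, then returns the serialized document.
    static String removeEmployee(String xml, String employeeId) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            NodeList employees = doc.getElementsByTagName("employee");
            // Iterate backwards: the NodeList is live and shrinks as nodes are removed.
            for (int i = employees.getLength() - 1; i >= 0; i--) {
                Element employee = (Element) employees.item(i);
                if (employeeId.equals(employee.getAttribute("id"))) {
                    employee.getParentNode().removeChild(employee);
                }
            }

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not update XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<employees><employee id=\"101\"><name>John Doe</name></employee>"
                + "<employee id=\"102\"><name>Jane Smith</name></employee></employees>";
        System.out.println(removeEmployee(xml, "101"));
    }
}
```

The backwards loop matters because `getElementsByTagName` returns a live list; removing items while iterating forwards would skip elements.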
SAX Parser: Event-Based Streaming
1. Basic SAX Parser
SAX Handler Implementation:
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import java.util.*;
public class EmployeeSaxHandler extends DefaultHandler {
private List<Employee> employees;
private Employee currentEmployee;
private StringBuilder currentValue;
// Event handlers
@Override
public void startDocument() throws SAXException {
employees = new ArrayList<>();
System.out.println("Start parsing document...");
}
@Override
public void endDocument() throws SAXException {
System.out.println("End parsing document.");
System.out.println("Total employees found: " + employees.size());
}
@Override
public void startElement(String uri, String localName,
String qName, Attributes attributes) throws SAXException {
currentValue = new StringBuilder();
if ("employee".equals(qName)) {
currentEmployee = new Employee();
String id = attributes.getValue("id");
currentEmployee.setId(id);
}
}
@Override
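public void endElement(String uri, String localName, String qName) throws SAXException {
// Ignore elements that close outside an <employee> (avoids a NullPointerException)
if (currentEmployee == null) return;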
if ("employee".equals(qName)) {
employees.add(currentEmployee);
currentEmployee = null;
} else if ("name".equals(qName)) {
currentEmployee.setName(currentValue.toString());
} else if ("position".equals(qName)) {
currentEmployee.setPosition(currentValue.toString());
} else if ("department".equals(qName)) {
currentEmployee.setDepartment(currentValue.toString());
} else if ("salary".equals(qName)) {
currentEmployee.setSalary(currentValue.toString());
}
}
@Override
public void characters(char[] ch, int start, int length) throws SAXException {
currentValue.append(ch, start, length);
}
// Getter for parsed data
public List<Employee> getEmployees() {
return employees;
}
}
// Employee data class
class Employee {
private String id;
private String name;
private String position;
private String department;
private String salary;
// Constructors, getters, and setters
public Employee() {}
public String getId() { return id; }
public void setId(String id) { this.id = id; }
public String getName() { return name; }
public void setName(String name) { this.name = name; }
public String getPosition() { return position; }
public void setPosition(String position) { this.position = position; }
public String getDepartment() { return department; }
public void setDepartment(String department) { this.department = department; }
public String getSalary() { return salary; }
public void setSalary(String salary) { this.salary = salary; }
@Override
public String toString() {
return String.format("Employee[ID=%s, Name=%s, Position=%s, Department=%s, Salary=%s]",
id, name, position, department, salary);
}
}
SAX Parser Implementation:
import javax.xml.parsers.*;
import java.io.*;
public class SAXXmlParser {
public static void parseWithSAX(String filePath) {
try {
// Create SAXParserFactory
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
// Create handler
EmployeeSaxHandler handler = new EmployeeSaxHandler();
// Parse the XML file
saxParser.parse(new File(filePath), handler);
// Display results
System.out.println("\n=== Parsed Employees ===");
handler.getEmployees().forEach(System.out::println);
} catch (Exception e) {
e.printStackTrace();
}
}
public static void main(String[] args) {
parseWithSAX("employees.xml");
}
}
2. Advanced SAX Handler with Filtering
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
import java.util.ArrayList;
import java.util.List;
// Reuses the Employee class defined alongside EmployeeSaxHandler
public class FilteringSaxHandler extends DefaultHandler {
private List<Employee> itEmployees;
private Employee currentEmployee;
private StringBuilder currentValue;
private boolean inITDepartment = false;
@Override
public void startDocument() throws SAXException {
itEmployees = new ArrayList<>();
System.out.println("Starting filtered parsing for IT department employees...");
}
@Override
public void startElement(String uri, String localName,
String qName, Attributes attributes) throws SAXException {
currentValue = new StringBuilder();
if ("employee".equals(qName)) {
currentEmployee = new Employee();
currentEmployee.setId(attributes.getValue("id"));
}
}
@Override
public void characters(char[] ch, int start, int length) throws SAXException {
if (currentValue != null) {
currentValue.append(ch, start, length);
}
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
if (currentEmployee == null) return;
switch (qName) {
case "name":
currentEmployee.setName(currentValue.toString().trim());
break;
case "position":
currentEmployee.setPosition(currentValue.toString().trim());
break;
case "department":
String dept = currentValue.toString().trim();
currentEmployee.setDepartment(dept);
inITDepartment = "IT".equals(dept);
break;
case "salary":
currentEmployee.setSalary(currentValue.toString().trim());
break;
case "employee":
if (inITDepartment) {
itEmployees.add(currentEmployee);
}
currentEmployee = null;
inITDepartment = false;
break;
}
}
@Override
public void endDocument() throws SAXException {
System.out.println("Filtered parsing completed.");
System.out.println("IT department employees: " + itEmployees.size());
itEmployees.forEach(System.out::println);
}
public List<Employee> getItEmployees() {
return itEmployees;
}
}
Performance Comparison and Analysis
Performance Test Class
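import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;
import javax.xml.parsers.*;
import java.io.*;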
public class ParserPerformanceTest {
public static void main(String[] args) {
String largeXmlFile = "large_employees.xml"; // 10,000+ records
System.out.println("=== Parser Performance Comparison ===");
// DOM Parser Test (single-run timings include JVM warm-up, so treat the numbers as rough)
long domStartTime = System.currentTimeMillis();
testDOMParser(largeXmlFile);
long domEndTime = System.currentTimeMillis();
System.out.printf("DOM Parser Time: %d ms%n", domEndTime - domStartTime);
// SAX Parser Test
long saxStartTime = System.currentTimeMillis();
testSAXParser(largeXmlFile);
long saxEndTime = System.currentTimeMillis();
System.out.printf("SAX Parser Time: %d ms%n", saxEndTime - saxStartTime);
// Heap snapshot after both runs (one overall figure, not a per-parser comparison)
Runtime runtime = Runtime.getRuntime();
runtime.gc(); // Suggest garbage collection
long memory = runtime.totalMemory() - runtime.freeMemory();
System.out.printf("Memory used: %d MB%n", memory / (1024 * 1024));
}
private static void testDOMParser(String filePath) {
try {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(new File(filePath));
document.getDocumentElement().normalize();
NodeList employees = document.getElementsByTagName("employee");
System.out.printf("DOM: Parsed %d employees%n", employees.getLength());
} catch (Exception e) {
e.printStackTrace();
}
}
private static void testSAXParser(String filePath) {
try {
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
CountingSaxHandler handler = new CountingSaxHandler();
saxParser.parse(new File(filePath), handler);
System.out.printf("SAX: Parsed %d employees%n", handler.getCount());
} catch (Exception e) {
e.printStackTrace();
}
}
}
// Simple counter for SAX performance test
class CountingSaxHandler extends DefaultHandler {
private int count = 0;
@Override
public void startElement(String uri, String localName,
String qName, Attributes attributes) {
if ("employee".equals(qName)) {
count++;
}
}
public int getCount() {
return count;
}
}
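The test above assumes a `large_employees.xml` file already exists. One way to produce it is sketched below, under the assumption that synthetic records are good enough for a rough comparison. The class name `LargeXmlGenerator`, its method names, and the generated field values are all hypothetical.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;
import javax.xml.parsers.SAXParserFactory;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class LargeXmlGenerator {

    // Streams `count` synthetic employee records to `target`, one record per line.
    static void generate(Path target, int count) {
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            out.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<employees>\n");
            for (int i = 1; i <= count; i++) {
                out.write("  <employee id=\"" + i + "\">"
                        + "<name>Employee " + i + "</name>"
                        + "<position>Engineer</position>"
                        + "<department>" + (i % 2 == 0 ? "IT" : "Operations") + "</department>"
                        + "<salary>" + (50000 + i) + "</salary>"
                        + "</employee>\n");
            }
            out.write("</employees>\n");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Sanity check: counts <employee> elements with SAX so the file is never fully loaded.
    static int countEmployees(Path source) {
        int[] count = {0};
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(source.toFile(), new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes attributes) {
                    if ("employee".equals(qName)) {
                        count[0]++;
                    }
                }
            });
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse " + source, e);
        }
        return count[0];
    }

    // Generates into a temporary file, counts the records, and cleans up.
    static int generateAndCount(int count) {
        try {
            Path tmp = Files.createTempFile("large_employees", ".xml");
            try {
                generate(tmp, count);
                return countEmployees(tmp);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        generate(Paths.get("large_employees.xml"), 10_000);
    }
}
```

Running `main` once writes the input file; increasing the record count is the simplest way to see where DOM's memory cost starts to matter on a given machine.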
Factory Configuration and Security
Secure Parser Configuration
import org.xml.sax.SAXException;
import javax.xml.parsers.*;
public class SecureXmlParser {
public static DocumentBuilder createSecureDocumentBuilder() throws ParserConfigurationException {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
// Security configurations to prevent XXE attacks
factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
factory.setXIncludeAware(false);
factory.setExpandEntityReferences(false);
// General parser settings
factory.setNamespaceAware(true);
factory.setValidating(false);
return factory.newDocumentBuilder();
}
public static SAXParser createSecureSAXParser() throws ParserConfigurationException, SAXException {
SAXParserFactory factory = SAXParserFactory.newInstance();
// Security configurations
factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
factory.setNamespaceAware(true);
factory.setValidating(false);
return factory.newSAXParser();
}
}
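One way to check that this hardening takes effect is to feed the parser a document that declares an external entity and confirm it is refused. The sketch below assumes the JDK's default parser implementation, which supports the `disallow-doctype-decl` feature. The class name `DoctypeRejectionSketch` and the sample payload are hypothetical.

```java
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.StringReader;

class DoctypeRejectionSketch {

    // Hypothetical XXE-style payload: a DOCTYPE that declares an external entity.
    static final String PAYLOAD =
        "<?xml version=\"1.0\"?>"
        + "<!DOCTYPE employees [<!ENTITY xxe SYSTEM \"file:///etc/passwd\">]>"
        + "<employees><employee id=\"1\"><name>&xxe;</name></employee></employees>";

    static final String PLAIN =
        "<employees><employee id=\"1\"><name>John Doe</name></employee></employees>";

    // Returns true if the hardened parser refuses the document with a parse error.
    static boolean isRejected(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return false;
        } catch (SAXParseException e) {
            return true;
        } catch (Exception e) {
            throw new IllegalStateException("Unexpected parser failure", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Payload rejected: " + isRejected(PAYLOAD));
        System.out.println("Plain document rejected: " + isRejected(PLAIN));
    }
}
```

Because the DOCTYPE itself is disallowed, the parser stops before it ever resolves the entity, so the referenced file is not read.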
Best Practices and Recommendations
When to Use DOM:
- Small to medium XML files (as a rough guide, under about 10 MB; the in-memory tree is typically several times larger than the file)
- Need random access to any part of document
- Require modification of XML structure
- Complex XPath queries needed
- Document needs to be manipulated multiple times
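Since XPath is one of the main reasons to keep a DOM tree around, here is a minimal sketch using the standard `javax.xml.xpath` API. The class name `XPathQuerySketch`, the trimmed-down sample data, and the `$dept` variable name are assumptions made for illustration.

```java
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class XPathQuerySketch {

    // Hypothetical, shortened version of the employees document.
    static final String XML =
        "<employees>"
        + "<employee id=\"101\"><name>John Doe</name><department>IT</department></employee>"
        + "<employee id=\"102\"><name>Jane Smith</name><department>Operations</department></employee>"
        + "<employee id=\"103\"><name>Bob Johnson</name><department>IT</department></employee>"
        + "</employees>";

    // Returns the names of all employees in the given department, in document order.
    static List<String> namesInDepartment(String xml, String department) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Bind the value as an XPath variable instead of concatenating it into the
            // expression. This simple resolver returns the same value for any variable name.
            xpath.setXPathVariableResolver(variableName -> department);

            NodeList nodes = (NodeList) xpath.evaluate(
                    "/employees/employee[department = $dept]/name",
                    doc, XPathConstants.NODESET);

            List<String> names = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getTextContent());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("XPath query failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(namesInDepartment(XML, "IT"));
    }
}
```

The same filter written against SAX needs hand-maintained state, as the filtering handler earlier in this guide shows; with DOM it is a one-line expression.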
When to Use SAX:
- Large XML files (as a rough guide, about 10 MB and up)
- Memory efficiency is critical
- Only need to read data sequentially
- Processing can be done in single pass
- Real-time streaming of XML data
Security Considerations:
- Always disable DTD processing unless specifically required
- Use secure parser factories with XXE protection
- Validate input against expected schema
- Limit entity expansion to prevent billion laughs attack
- Use whitelists for allowed elements and attributes
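Schema validation from the list above can be sketched with the standard `javax.xml.validation` API. The inline XSD here is a hypothetical, simplified schema covering only `name` and `salary`; a real schema for the employees file would describe every element. The class name `SchemaValidationSketch` is likewise an assumption.

```java
import org.xml.sax.SAXException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

class SchemaValidationSketch {

    // Hypothetical schema: employees > employee(id) > name, salary (positive integer).
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"employees\"><xs:complexType><xs:sequence>"
        + "<xs:element name=\"employee\" maxOccurs=\"unbounded\"><xs:complexType>"
        + "<xs:sequence>"
        + "<xs:element name=\"name\" type=\"xs:string\"/>"
        + "<xs:element name=\"salary\" type=\"xs:positiveInteger\"/>"
        + "</xs:sequence>"
        + "<xs:attribute name=\"id\" type=\"xs:string\" use=\"required\"/>"
        + "</xs:complexType></xs:element>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Returns true if the document conforms to the schema, false on any validation error.
    static boolean isValid(String xml) {
        try {
            SchemaFactory schemaFactory =
                    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = schemaFactory.newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, validation errors are thrown as SAXException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String good = "<employees><employee id=\"101\"><name>John Doe</name>"
                + "<salary>75000</salary></employee></employees>";
        String bad = "<employees><employee id=\"101\"><name>John Doe</name>"
                + "<salary>lots</salary></employee></employees>";
        System.out.println("good valid: " + isValid(good));
        System.out.println("bad valid: " + isValid(bad));
    }
}
```

A schema also acts as the allowlist mentioned above: elements or attributes it does not declare cause validation to fail.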
Performance Tips:
- Use SAX for read-only operations on large files
- Reuse factory and parser instances when possible, but do not share them across threads (they are not guaranteed to be thread-safe)
- Disable validation if not needed
- Use streaming for very large files
- Consider StAX for pull parsing as a middle ground
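For readers curious about the StAX option, the following is a minimal pull-parsing sketch using the standard `javax.xml.stream` API. The class name `StaxPullSketch` and the in-memory input are assumptions. Unlike SAX, the application asks for the next event instead of receiving callbacks.

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class StaxPullSketch {

    // Collects the text of every <name> element by pulling events one at a time.
    static List<String> readNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // Same hardening idea as for DOM and SAX: no DTDs, no external entities.
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
            factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);

            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    int event = reader.next();
                    if (event == XMLStreamConstants.START_ELEMENT
                            && "name".equals(reader.getLocalName())) {
                        // getElementText() reads up to the matching end tag.
                        names.add(reader.getElementText());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not read XML", e);
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<employees>"
                + "<employee id=\"101\"><name>John Doe</name></employee>"
                + "<employee id=\"102\"><name>Jane Smith</name></employee>"
                + "</employees>";
        System.out.println(readNames(xml));
    }
}
```

Because the loop is ordinary application code, it can stop early with a `break` once it has what it needs, which is awkward to do from inside SAX callbacks.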
Conclusion
Both DOM and SAX parsers have their specific use cases in Java XML processing. DOM provides a convenient in-memory representation suitable for small documents and complex manipulations, while SAX offers memory-efficient streaming for large files and simple extractions. Understanding the strengths and limitations of each approach allows developers to choose the right tool for their XML processing requirements, balancing memory usage, performance, and functionality. For modern applications, also consider JAXB for object binding (a separate dependency since Java 11, when it was removed from the JDK) or StAX for pull parsing.