Introduction to Geospatial PDF
Geospatial PDF (often called GeoPDF, which is strictly TerraGo's trademarked name for its implementation) is a PDF that carries georeferencing information alongside the page content, allowing users to interact with map data directly in PDF viewers. These files keep both the visual representation of a map and the link between page positions and real-world coordinates, enabling measurements, coordinate queries, and GIS integration.
Java provides several powerful libraries for working with Geospatial PDFs, each with different capabilities and use cases.
Key Java Libraries for Geospatial PDF
1. Apache PDFBox with Geospatial Extensions
Apache PDFBox is a widely used open-source Java library for PDF manipulation. It has no dedicated geospatial API, but it can be combined with JTS and GeoTools to read, draw, and transform geospatial content.
Maven Dependencies
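<dependencies>
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>pdfbox</artifactId>
        <version>2.0.29</version>
    </dependency>
    <!-- Provides the org.apache.xmpbox classes used when reading XMP metadata -->
    <dependency>
        <groupId>org.apache.pdfbox</groupId>
        <artifactId>xmpbox</artifactId>
        <version>2.0.29</version>
    </dependency>
    <dependency>
        <groupId>org.locationtech.jts</groupId>
        <artifactId>jts-core</artifactId>
        <version>1.19.0</version>
    </dependency>
    <!-- GeoTools artifacts are published in the OSGeo repository rather than Maven Central.
         CRS.decode("EPSG:...") also needs an EPSG database plugin such as gt-epsg-hsql on the classpath. -->
    <dependency>
        <groupId>org.geotools</groupId>
        <artifactId>gt-geojson</artifactId>
        <version>26.0</version>
    </dependency>
</dependencies>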
2. Reading Geospatial PDF Metadata
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDMetadata;
import org.apache.xmpbox.XMPMetadata;
import org.apache.xmpbox.schema.DublinCoreSchema;
import org.apache.xmpbox.xml.DomXmpParser;
import java.io.File;
import java.io.InputStream;

public class GeospatialPDFReader {

    public void readGeoPDFMetadata(String filePath) {
        try (PDDocument document = PDDocument.load(new File(filePath))) {
            // Check for XMP metadata attached to each page
            for (PDPage page : document.getPages()) {
                readXmp(page.getMetadata());
            }
            // Check document-level metadata
            readXmp(document.getDocumentCatalog().getMetadata());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Parses one XMP stream if it is present and closes it afterwards
    private void readXmp(PDMetadata metadata) throws Exception {
        if (metadata == null) {
            return;
        }
        try (InputStream inputStream = metadata.exportXMPMetadata()) {
            XMPMetadata xmp = new DomXmpParser().parse(inputStream);
            // Extract geospatial information from XMP
            extractGeospatialInfo(xmp);
        }
    }

    private void extractGeospatialInfo(XMPMetadata xmp) {
        // Look for descriptive schemas in the XMP metadata
        DublinCoreSchema dc = xmp.getDublinCoreSchema();
        if (dc != null) {
            System.out.println("Title: " + dc.getTitle());
            System.out.println("Description: " + dc.getDescription());
        }
        // Custom geospatial properties would be extracted here.
        // This varies by how the GeoPDF was created: many producers store the
        // georeferencing in the page dictionary (for example a /VP viewport array
        // or an /LGIDict entry) rather than in XMP.
    }
}
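Once the georeferencing arrays have been read from a file, a "coordinate query" is mostly interpolation. In ISO 32000-style geospatial PDFs a page viewport typically pairs a /BBox rectangle (in page points) with a /GPTS array of latitude/longitude corner values. The following is a minimal sketch of that lookup using only the standard library. The class name ViewportGeoref, the corner order (lower-left, upper-left, upper-right, lower-right), and the sample numbers in the usage note are assumptions for illustration; real files may order the points differently or use a projected CRS, in which case plain interpolation is only an approximation.

```java
/** Hypothetical helper: maps page points to latitude/longitude for one viewport. */
final class ViewportGeoref {
    private final double x0, y0, x1, y1; // viewport bounding box in page points
    private final double[] gpts;         // lat/lon pairs, assumed order: LL, UL, UR, LR

    ViewportGeoref(double[] bbox, double[] gpts) {
        if (bbox.length != 4 || gpts.length != 8) {
            throw new IllegalArgumentException("expected 4 bbox values and 8 GPTS values");
        }
        // Normalise the box so that (x0, y0) is the lower-left corner
        this.x0 = Math.min(bbox[0], bbox[2]);
        this.x1 = Math.max(bbox[0], bbox[2]);
        this.y0 = Math.min(bbox[1], bbox[3]);
        this.y1 = Math.max(bbox[1], bbox[3]);
        this.gpts = gpts.clone();
    }

    /** Returns {latitude, longitude} for a page point via bilinear interpolation. */
    double[] toLatLon(double pageX, double pageY) {
        double u = (pageX - x0) / (x1 - x0); // 0 at the left edge, 1 at the right edge
        double v = (pageY - y0) / (y1 - y0); // 0 at the bottom edge, 1 at the top edge
        double[] out = new double[2];
        for (int k = 0; k < 2; k++) {        // k = 0: latitude, k = 1: longitude
            double ll = gpts[k], ul = gpts[2 + k], ur = gpts[4 + k], lr = gpts[6 + k];
            double bottom = ll + u * (lr - ll);
            double top = ul + u * (ur - ul);
            out[k] = bottom + v * (top - bottom);
        }
        return out;
    }
}
```

For example, with a box of {0, 0, 600, 800} and corners spanning latitude 40 to 41 and longitude -106 to -105, the page centre (300, 400) maps to latitude 40.5, longitude -105.5.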
3. Creating Geospatial PDFs
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.common.PDMetadata;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;
import org.locationtech.jts.geom.Coordinate;
import java.awt.Color;
import java.util.List;

public class GeospatialPDFCreator {

    public void createGeoPDFWithMap(String outputPath,
                                    List<Coordinate> coordinates,
                                    String mapImagePath) {
        try (PDDocument document = new PDDocument()) {
            PDPage page = new PDPage(PDRectangle.A4);
            document.addPage(page);
            // Add map image as background
            addMapImage(document, page, mapImagePath);
            // Draw geospatial features
            drawGeospatialFeatures(document, page, coordinates);
            // Add geospatial metadata
            addGeospatialMetadata(document, coordinates);
            document.save(outputPath);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private void addMapImage(PDDocument document, PDPage page, String imagePath) {
        try {
            PDImageXObject pdImage = PDImageXObject.createFromFile(imagePath, document);
            // try-with-resources closes the stream even if drawing fails
            try (PDPageContentStream contentStream = new PDPageContentStream(document, page)) {
                // Stretch the image over the whole page (this ignores the image's aspect ratio)
                contentStream.drawImage(pdImage, 0, 0,
                        page.getMediaBox().getWidth(),
                        page.getMediaBox().getHeight());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private void drawGeospatialFeatures(PDDocument document, PDPage page,
                                        List<Coordinate> coordinates) {
        // APPEND keeps the background image drawn earlier; the final flag enables compression
        try (PDPageContentStream contentStream = new PDPageContentStream(
                document, page, PDPageContentStream.AppendMode.APPEND, true)) {
            // Set up coordinate transformation
            // This is a simplified example: it assumes coordinates in the range 0..1000.
            // A real implementation would need a proper projection and data-derived bounds.
            float scaleX = page.getMediaBox().getWidth() / 1000f;
            float scaleY = page.getMediaBox().getHeight() / 1000f;
            // Draw the coordinates as one outlined shape
            contentStream.setStrokingColor(Color.RED);
            contentStream.setLineWidth(2);
            if (!coordinates.isEmpty()) {
                Coordinate first = coordinates.get(0);
                contentStream.moveTo((float) first.x * scaleX, (float) first.y * scaleY);
                for (int i = 1; i < coordinates.size(); i++) {
                    Coordinate coord = coordinates.get(i);
                    contentStream.lineTo((float) coord.x * scaleX, (float) coord.y * scaleY);
                }
                // closeAndStroke() always joins the last point back to the first (a polygon);
                // call stroke() instead to draw an open line
                contentStream.closeAndStroke();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private void addGeospatialMetadata(PDDocument document, List<Coordinate> coordinates) {
        // Placeholder: attaches an empty metadata stream to the document catalog.
        // Real XMP content (projection information, bounds, etc.) still has to be
        // written into the stream; the coordinates parameter is not used yet.
        try {
            PDMetadata metadata = new PDMetadata(document);
            document.getDocumentCatalog().setMetadata(metadata);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
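The fixed divisor of 1000 in drawGeospatialFeatures only works when the input coordinates happen to fall in that range. A rough sketch of deriving the mapping from the data's own extent is shown below: it uses one uniform scale so shapes are not distorted and centres the result inside a margin. The class name WorldToPage and its parameters are hypothetical, and the page size is passed in as plain numbers (in PDF points) so the example needs nothing beyond the standard library.

```java
/** Hypothetical helper: fits a world-coordinate extent onto a page, keeping aspect ratio. */
final class WorldToPage {
    private final double minX, minY, scale, offsetX, offsetY;

    WorldToPage(double minX, double minY, double maxX, double maxY,
                double pageWidth, double pageHeight, double margin) {
        double worldW = maxX - minX;
        double worldH = maxY - minY;
        if (worldW <= 0 || worldH <= 0) {
            throw new IllegalArgumentException("extent must have positive width and height");
        }
        double availW = pageWidth - 2 * margin;
        double availH = pageHeight - 2 * margin;
        // The smaller ratio guarantees the whole extent fits in both directions
        this.scale = Math.min(availW / worldW, availH / worldH);
        this.minX = minX;
        this.minY = minY;
        // Centre the scaled extent inside the available area
        this.offsetX = margin + (availW - worldW * scale) / 2;
        this.offsetY = margin + (availH - worldH * scale) / 2;
    }

    // PDF user space has its origin at the bottom-left with y pointing up,
    // so no vertical flip is needed for typical easting/northing data.
    float pageX(double worldX) { return (float) (offsetX + (worldX - minX) * scale); }
    float pageY(double worldY) { return (float) (offsetY + (worldY - minY) * scale); }
}
```

The resulting pageX and pageY values could be passed to moveTo and lineTo in place of the scaleX and scaleY multiplications. The data should already be in a projected CRS before this step, since the mapping itself is purely linear.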
4. Advanced Geospatial PDF Processing with GeoTools
For more sophisticated geospatial operations, integrate with GeoTools:
// GeoTools 26.x (the version in the dependency list) ships this interface under org.opengis.
// From GeoTools 30 onward it is org.geotools.api.referencing.crs.CoordinateReferenceSystem.
import org.opengis.referencing.crs.CoordinateReferenceSystem;
import org.geotools.geometry.jts.JTS;
import org.geotools.referencing.CRS;
import org.locationtech.jts.geom.Envelope;
import org.locationtech.jts.geom.Geometry;

public class GeoPDFTransformer {

    public void transformCoordinatesForPDF(Geometry geometry,
                                           String sourceCRS,
                                           String targetCRS) {
        try {
            // Define coordinate reference systems, e.g. "EPSG:4326" and "EPSG:3857".
            // Note: by default CRS.decode("EPSG:4326") uses latitude/longitude axis order;
            // CRS.decode(code, true) forces longitude first.
            CoordinateReferenceSystem source = CRS.decode(sourceCRS);
            CoordinateReferenceSystem target = CRS.decode(targetCRS);
            // Perform coordinate transformation (lenient = true tolerates missing datum shift parameters)
            Geometry transformedGeometry = JTS.transform(geometry,
                    CRS.findMathTransform(source, target, true));
            // Now the geometry is in a CRS suitable for PDF display
            processForPDFRendering(transformedGeometry);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private void processForPDFRendering(Geometry geometry) {
        // Process the transformed geometry for PDF rendering
        Envelope envelope = geometry.getEnvelopeInternal();
        System.out.println("Bounds: " + envelope);
        // Scale coordinates to fit PDF page dimensions
        // Add to PDF document...
    }
}
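To make the transformation step less of a black box, here is a minimal sketch of the forward spherical Mercator formula, which is the math behind going from EPSG:4326 (longitude/latitude in degrees) to EPSG:3857 (metres). It is for illustration only, and GeoTools remains the better choice for real work because it handles datums, axis order, and the many other projections. The class name WebMercator is hypothetical.

```java
/** Hypothetical illustration of the EPSG:4326 to EPSG:3857 forward projection. */
final class WebMercator {
    static final double EARTH_RADIUS = 6378137.0;  // WGS84 semi-major axis in metres
    static final double MAX_LATITUDE = 85.05112878; // beyond this the projection is cut off

    /** Returns {x, y} in metres for a longitude/latitude given in degrees. */
    static double[] forward(double lonDeg, double latDeg) {
        // Clamp latitude, because y tends to infinity at the poles
        double lat = Math.max(-MAX_LATITUDE, Math.min(MAX_LATITUDE, latDeg));
        double x = EARTH_RADIUS * Math.toRadians(lonDeg);
        double y = EARTH_RADIUS * Math.log(Math.tan(Math.PI / 4 + Math.toRadians(lat) / 2));
        return new double[] {x, y};
    }
}
```

Projected metres like these are much easier to scale onto a page than raw degrees, which is why a transformation normally precedes the drawing step.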
5. Extracting Geospatial Data from PDF Annotations
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotation;
import java.io.File;
import java.util.List;

public class GeoPDFAnnotationExtractor {

    public void extractGeospatialAnnotations(String pdfPath) {
        try (PDDocument document = PDDocument.load(new File(pdfPath))) {
            document.getPages().forEach(page -> {
                try {
                    List<PDAnnotation> annotations = page.getAnnotations();
                    for (PDAnnotation annotation : annotations) {
                        // Check if annotation has geospatial data
                        if (isGeospatialAnnotation(annotation)) {
                            processGeospatialAnnotation(annotation);
                        }
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private boolean isGeospatialAnnotation(PDAnnotation annotation) {
        // Check annotation properties for geospatial indicators
        String contents = annotation.getContents();
        return contents != null &&
                (contents.contains("EPSG:") ||
                 contents.contains("COORD:") ||
                 contents.contains("LAT:") ||
                 contents.contains("LON:"));
    }

    private void processGeospatialAnnotation(PDAnnotation annotation) {
        // Extract coordinate information from annotation
        System.out.println("Geospatial Annotation: " + annotation.getContents());
        System.out.println("Position: " + annotation.getRectangle());
        // Parse custom geospatial data from annotation contents
        // This would be specific to how the GeoPDF was created
    }
}
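The parsing step left open in processGeospatialAnnotation depends entirely on the convention a given producer uses. As a sketch under one such assumed convention, the helper below pulls decimal-degree values out of text containing "LAT:" and "LON:" markers (the same markers the detection method looks for) and rejects values outside the valid range. The class name AnnotationCoordinateParser and the exact text layout are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for annotation text such as "Summit LAT: 48.8584 LON: 2.2945". */
final class AnnotationCoordinateParser {
    private static final Pattern LAT = Pattern.compile("LAT:\\s*(-?\\d+(?:\\.\\d+)?)");
    private static final Pattern LON = Pattern.compile("LON:\\s*(-?\\d+(?:\\.\\d+)?)");

    /** Returns {latitude, longitude}, or empty if either value is missing or out of range. */
    static Optional<double[]> parse(String contents) {
        if (contents == null) {
            return Optional.empty();
        }
        Matcher lat = LAT.matcher(contents);
        Matcher lon = LON.matcher(contents);
        if (!lat.find() || !lon.find()) {
            return Optional.empty();
        }
        double latitude = Double.parseDouble(lat.group(1));
        double longitude = Double.parseDouble(lon.group(1));
        if (Math.abs(latitude) > 90 || Math.abs(longitude) > 180) {
            return Optional.empty();
        }
        return Optional.of(new double[] {latitude, longitude});
    }
}
```

Returning Optional rather than null makes the "no usable coordinates" case explicit for callers, which matters because free-text annotations are often malformed.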
6. Batch Processing Geospatial PDFs
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.stream.Stream;

public class GeoPDFBatchProcessor {
    private final ExecutorService executor = Executors.newFixedThreadPool(4);

    public void processDirectory(String inputDir, String outputDir) {
        // Files.walk keeps directory handles open, so close the stream when done
        try (Stream<Path> paths = Files.walk(Paths.get(inputDir))) {
            paths.filter(path -> path.toString().toLowerCase().endsWith(".pdf"))
                 .forEach(pdfPath -> processSingleFile(pdfPath, outputDir));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private void processSingleFile(Path pdfPath, String outputDir) {
        executor.submit(() -> {
            try {
                String filename = pdfPath.getFileName().toString();
                // Target for a processed copy; not written yet in this example
                String outputPath = outputDir + File.separator + "processed_" + filename;
                GeospatialPDFReader reader = new GeospatialPDFReader();
                // Process the GeoPDF and extract geospatial data
                reader.readGeoPDFMetadata(pdfPath.toString());
                System.out.println("Processed: " + filename);
            } catch (Exception e) {
                System.err.println("Error processing: " + pdfPath + " - " + e.getMessage());
            }
        });
    }

    public void shutdown() {
        // Stop accepting new tasks, then wait for the queued files to finish
        executor.shutdown();
        try {
            if (!executor.awaitTermination(10, TimeUnit.MINUTES)) {
                executor.shutdownNow();
            }
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }
}
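A fire-and-forget submit makes it hard to know which files succeeded. One possible alternative, sketched here with only java.util.concurrent, collects a result per input and surfaces the first failure to the caller. The class name BatchRunner and the generic work function are hypothetical; in the batch processor above, the function would wrap whatever per-file extraction is needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical helper: runs a function over many inputs in parallel. */
final class BatchRunner {

    /** Applies work to every input and returns the results in input order. */
    static <T, R> List<R> runAll(List<T> inputs, Function<T, R> work, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<R>> tasks = new ArrayList<>();
            for (T input : inputs) {
                tasks.add(() -> work.apply(input));
            }
            List<R> results = new ArrayList<>();
            // invokeAll blocks until every task has completed or failed
            for (Future<R> future : pool.invokeAll(tasks)) {
                results.add(future.get()); // throws if the task threw
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("batch interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Because invokeAll returns futures in the same order as the submitted tasks, results line up with the input list even though the work itself runs concurrently.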
Best Practices and Considerations
1. Coordinate Systems
- Always track the Coordinate Reference System (CRS) used in your GeoPDF
- Use standard EPSG codes (e.g., EPSG:4326 for WGS84, EPSG:3857 for Web Mercator)
- Implement proper coordinate transformations when needed
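One lightweight way to follow the first point is to never pass bare numbers around: keep the CRS code attached to every coordinate and refuse to combine values from different systems. The sketch below illustrates the idea; the class name GeoPoint and its validation rule (an "EPSG:" prefix followed by digits) are assumptions, not part of any library used here.

```java
/** Hypothetical value type that keeps a coordinate together with its CRS code. */
final class GeoPoint {
    final String crs; // e.g. "EPSG:4326" or "EPSG:3857"
    final double x;
    final double y;

    GeoPoint(String crs, double x, double y) {
        if (crs == null || !crs.matches("EPSG:\\d+")) {
            throw new IllegalArgumentException("CRS must look like EPSG:<code>, got: " + crs);
        }
        this.crs = crs;
        this.x = x;
        this.y = y;
    }

    /** Planar distance; only meaningful when both points share a projected CRS. */
    double distanceTo(GeoPoint other) {
        if (!crs.equals(other.crs)) {
            throw new IllegalArgumentException("CRS mismatch: " + crs + " vs " + other.crs);
        }
        return Math.hypot(other.x - x, other.y - y);
    }
}
```

Failing fast on a mismatch turns a silent wrong-location bug into an immediate, traceable error.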
2. Performance Optimization
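- For large PDF files, let PDFBox buffer to temporary files instead of the heap, e.g. PDDocument.load(file, MemoryUsageSetting.setupTempFileOnly())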
- Implement caching for frequently accessed geospatial data
- Consider using spatial indexes for complex geometries
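For the caching point, a small least-recently-used cache is often enough, for instance keyed by file path with the extracted bounds as the value. The following is a minimal sketch built on LinkedHashMap from the standard library; the class name LruCache is hypothetical, and the class is not thread-safe on its own, so it would need wrapping (for example with Collections.synchronizedMap) before being shared across the batch processor's worker threads.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical LRU cache: evicts the least recently used entry once capacity is exceeded. */
final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    LruCache(int capacity) {
        // accessOrder = true: get() and put() move an entry to the most-recent position
        super(16, 0.75f, true);
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; returning true drops the eldest entry
        return size() > capacity;
    }
}
```

With a capacity of 2, inserting a, then b, reading a, and inserting c leaves a and c in the cache, because b was the least recently used entry at the time of eviction.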
3. Error Handling
public class GeoPDFException extends Exception {
    public GeoPDFException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Use in your methods:
public void safeGeoPDFOperation(String filePath) throws GeoPDFException {
    try {
        // GeoPDF operations
    } catch (Exception e) {
        throw new GeoPDFException("Failed to process GeoPDF: " + filePath, e);
    }
}
Conclusion
Java provides robust capabilities for working with Geospatial PDFs through libraries like Apache PDFBox, GeoTools, and JTS. While true GeoPDF support requires careful implementation of coordinate systems and metadata handling, these tools enable you to:
- Read and extract geospatial data from existing GeoPDFs
- Create new Geospatial PDFs with proper coordinate information
- Transform between different coordinate systems
- Batch process multiple GeoPDF files
- Integrate with broader GIS workflows
The key to success is understanding both PDF structure and geospatial concepts, ensuring that coordinate information is properly preserved and transformed throughout your processing pipeline.