Integrated Syntactic/Semantic XML Data Validation with a Reusable Software Component

Steven Golikov, Pace University


Data integration is a critical component of enterprise system integration, and XML data validation is the foundation for sound data integration of XML-based information systems. Because B2B e-commerce relies on data validation as one of the critical components of enterprise integration, financial industries and e-commerce systems must address this complex integration challenge. Given the complexity of the validation process and the need to support validation requirements flexibly and reusably, such validations are usually based on declarative XML schemas and kept separate from the applications' business logic. Semantic validation of XML data depends on specialized schema languages such as Schematron, which is the most expressive for semantic co-constraints. However, Schematron's design and expressiveness have been considerably limited by its current implementation, which is based on Extensible Stylesheet Language Transformations (XSLT). The XSLT implementation is also difficult to extend to new constraints because it lacks support for straightforward invocation by external APIs. Furthermore, because the current implementation separates semantic validation from syntactic validation, it cannot take advantage of information derived from syntactic validation. This separation can lose that information, which in turn can produce incorrect results for the semantic validation of XML documents. This research overcomes these limitations by proposing an algorithm for integrated syntactic (based on DTD and XSD) and semantic (based on Schematron) validation, and by presenting a reusable software component that implements this integrated validation process. The new validation component expands on the capabilities of Schematron v1.5/ISO and is implemented with DOM Level 3 XPath in Java SE 7.
The component can also interact with its environment through event-driven loose coupling, which is necessary for seamless integration into the applications that contain it. Its primary advantage is high extensibility, which lets it serve as a test bed for research on supporting new co-constraints and dynamic constraints across multiple XML documents. The results of this research can be applied to XML constraint validation in various industry domains, including e-commerce, government, and finance.
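
The general idea of running syntactic and semantic checks over one parsed document can be illustrated with a minimal sketch. It is not the dissertation's algorithm or component: it simply validates a DOM tree against an XSD and then evaluates Schematron-style assertions as XPath expressions on the same tree, reporting failures through a listener. The class name, the `ValidationListener` interface, the sample order schema, and the sample rule are all hypothetical, and the standard `javax.xml.xpath` API is assumed here in place of the DOM Level 3 XPath API named in the abstract.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class IntegratedValidationSketch {

    /** Hypothetical callback so a host application can react to failures without tight coupling. */
    public interface ValidationListener {
        void onFailure(String phase, String message);
    }

    /** Hypothetical sample schema: an order holds a total followed by one or more priced items. */
    public static final String ORDER_XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='order'><xs:complexType><xs:sequence>"
        + "<xs:element name='total' type='xs:decimal'/>"
        + "<xs:element name='item' maxOccurs='unbounded'><xs:complexType>"
        + "<xs:attribute name='price' type='xs:decimal' use='required'/>"
        + "</xs:complexType></xs:element>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    /**
     * Validates xml against xsd, then checks each rule (XPath assertion mapped to a message)
     * on the same DOM tree. Returns true only if both phases pass.
     */
    public static boolean validate(String xml, String xsd, Map<String, String> rules,
                                   ValidationListener listener) {
        Document doc;
        try {
            // Phase 1 (syntactic): parse once, then validate the DOM tree against the XSD.
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required for schema validation of a DOM source
            doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
            schema.newValidator().validate(new DOMSource(doc));
        } catch (SAXException e) {
            // Simplification: malformed XML, a bad schema, and invalid content all land here.
            listener.onFailure("syntactic", e.getMessage());
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }

        // Phase 2 (semantic): evaluate every assertion on the tree that passed phase 1.
        XPath xpath = XPathFactory.newInstance().newXPath();
        boolean ok = true;
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            try {
                Boolean holds = (Boolean) xpath.evaluate(rule.getKey(), doc, XPathConstants.BOOLEAN);
                if (!holds) {
                    listener.onFailure("semantic", rule.getValue());
                    ok = false;
                }
            } catch (XPathExpressionException e) {
                throw new IllegalArgumentException("Bad XPath assertion: " + rule.getKey(), e);
            }
        }
        return ok;
    }

    public static void main(String[] args) {
        Map<String, String> rules = new LinkedHashMap<String, String>();
        // Co-constraint that an XSD 1.0 schema cannot express: the total must match the item prices.
        rules.put("sum(/order/item/@price) = /order/total",
                  "order total must equal the sum of item prices");
        String xml = "<order><total>31</total><item price='10'/><item price='20'/></order>";
        boolean valid = validate(xml, ORDER_XSD, rules, new ValidationListener() {
            public void onFailure(String phase, String message) {
                System.out.println(phase + ": " + message);
            }
        });
        System.out.println("valid = " + valid);
    }
}
```

In this sketch the semantic phase runs only on documents that already passed schema validation, so the XPath assertions can rely on the structure the XSD guarantees. It does not carry schema-derived type information into the semantic phase, which is the kind of integration the abstract describes.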

Subject Area

Educational technology|Computer science

Recommended Citation

Golikov, Steven, "Integrated Syntactic/Semantic XML Data Validation with a Reusable Software Component" (2013). ETD Collection for Pace University. AAI3536869.


