A Modular Integrated Syntactic/Semantic XML Data Validation Solution

Christian Martinez, Pace University


Data integration between disparate systems can be difficult when the systems use distinct data formats and impose different syntactic and semantic constraints. Such differences can cause miscommunication that leads to incorrect interpretations of the data. We propose leveraging XML as a means to define not only syntactic constraints but also semantic constraints. XML has been widely adopted across multiple industries and is heavily used as a data protocol. However, commonly used XML parsers embed only syntactic validation. In other words, if semantic constraints are needed, they come into play only after a parser has validated the XML message. Furthermore, semantic constraints tend to be declared inside the client system, either in the code itself or in some other form of persistent storage such as a database. Our solution to this problem is to integrate the syntactic and semantic validation phases into a single parser, so that all syntactic and semantic rules can be configured outside the client system. For our purposes, semantic rules are defined as co-constraints: a co-constraint exists when the value, presence, or absence of an element or attribute depends on the value, presence, or absence of another element or attribute in the same XML message. Using the same concept, we have also built a parser that, based on co-constraints, can express business constraints that transcend the message definition. Our research provides a reusable, modular middleware solution that integrates syntactic and semantic validation. We also demonstrate how the same semantic validating parser can be used to execute business rules triggered by semantic rules. Combining semantic and syntactic validation in the same XML parser or interpreter is a powerful approach to rapid integration between disparate systems.
Another key aspect of our proposal is to keep the syntactic and semantic definitions separate, allowing them to evolve independently. For example, the syntax of a message might be identical across systems while the semantic constraints differ between message consumers.
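The idea of one validation pass that checks syntax first and then co-constraints, with the semantic rules kept apart from the syntax definition, can be illustrated with a small Python sketch. This is not the author's implementation. Everything in it is hypothetical: the `<order>` message and its element names, the `CoConstraint` record, the `validate` function, and the rule sets. The syntax definition is reduced to a list of required element paths because the standard-library `xml.etree.ElementTree` module cannot validate against an XML Schema; a real system would use a schema validator for that phase.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET

# Hypothetical syntax definition for an <order> message: element paths
# that must be present. It stands in for an XML Schema, which the
# standard-library ElementTree module cannot validate against.
ORDER_SYNTAX = ["customer", "paymentMethod"]


@dataclass(frozen=True)
class CoConstraint:
    """If the text at when_path equals when_value, the element at
    then_path must be present (require=True) or absent (require=False)."""
    name: str
    when_path: str
    when_value: str
    then_path: str
    require: bool


def validate(xml_text, syntax, co_constraints):
    """Check syntax first, then co-constraints; return a list of errors."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"syntax: not well-formed ({exc})"]
    errors = [f"syntax: missing <{p}>" for p in syntax if root.find(p) is None]
    if errors:
        return errors  # co-constraints assume a syntactically valid message
    for rule in co_constraints:
        if (root.findtext(rule.when_path) or "").strip() != rule.when_value:
            continue  # the rule's condition does not hold for this message
        if (root.find(rule.then_path) is not None) != rule.require:
            verb = "requires" if rule.require else "forbids"
            errors.append(
                f"semantic: {rule.name}: {rule.when_path}={rule.when_value} "
                f"{verb} <{rule.then_path}>"
            )
    return errors


# Hypothetical semantic rule sets, defined apart from the syntax so that
# each consumer can supply its own while sharing ORDER_SYNTAX.
BILLING_RULES = [
    CoConstraint("card-needs-number", "paymentMethod", "credit", "cardNumber", True),
    CoConstraint("cash-has-no-number", "paymentMethod", "cash", "cardNumber", False),
]
SHIPPING_RULES = BILLING_RULES + [
    CoConstraint("express-needs-phone", "shipping", "express", "customer/phone", True),
]

msg = """<order>
  <customer><name>Ada</name></customer>
  <paymentMethod>credit</paymentMethod>
  <cardNumber>4111111111111111</cardNumber>
  <shipping>express</shipping>
</order>"""

print(validate(msg, ORDER_SYNTAX, BILLING_RULES))   # []
print(validate(msg, ORDER_SYNTAX, SHIPPING_RULES))  # one error: express-needs-phone
```

The same message passes the billing consumer's rules but fails the shipping consumer's, although both share one syntax definition. Because each rule fires only when its condition holds, a consumer that has no interest in shipping simply omits that rule without touching the syntax.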

Subject Area

Computer science

Recommended Citation

Martinez, Christian, "A Modular Integrated Syntactic/Semantic XML Data Validation Solution" (2016). ETD Collection for Pace University. AAI10128879.


