Improving Enterprise Data Governance Through Ontology and Linked Data

R.J DeStefano, Pace University


In the past decade, the role of data has expanded dramatically, from being the output of a process to becoming a true corporate asset. As the business landscape becomes more complex and the pace of change accelerates, companies need a clear awareness of their data assets, their movement, and how they relate to the organization in order to make informed decisions, reduce cost, and identify opportunity. The increased complexity of corporate technology has also created a high level of risk: because data moves across a multitude of systems, a failure or change in one place is more likely to affect dependent processes and systems. The result of this increased difficulty in managing corporate data assets is poor enterprise data quality, the impact of which ranges into the billions of dollars in waste and lost opportunity for businesses. Tools and processes exist to help companies manage this phenomenon; however, data projects are often subject to heavy scrutiny as senior leadership struggles to identify return on investment. While there are many tools and methods to increase a company’s ability to govern data, this research rests on the premise that you can’t govern that which you don’t know. This lack of awareness of the corporate data landscape impairs the ability to govern data, which in turn lowers overall data quality within organizations. This research proposes a means for companies to better model the landscape of their data, processes, and organizational attributes through the use of linked data, via the Resource Description Framework (RDF), and ontology. The outcome of adopting such techniques is an increased level of data awareness within the organization, resulting in an improved ability to govern corporate data assets. It does this primarily by addressing corporate leadership’s low tolerance for taking on large-scale, data-centric projects.
The nature of linked data, with its incremental and decentralized approach to storing information, combined with a rich ecosystem of open-source or low-cost tools, reduces the financial barriers to entry for these initiatives. Additionally, linked data’s distributed nature and flexible structure help foster maximum participation throughout the enterprise in capturing information about data assets. This increased participation improves the quality of the information captured by empowering more of the individuals who handle the data to contribute. Ontology, in conjunction with linked data, provides a powerful means to model the complex relationships between an organization, its people, processes, and technology assets. When combined with the graph-based nature of RDF, the model lends itself to presenting concepts such as data lineage, allowing an organization to see the true reach of its data. This research further proposes an ontology based on data governance standards, visualization examples and queries against data that simulate common data governance situations, and guidelines to assist in its implementation in an enterprise setting. Adopting such techniques allows an enterprise to accurately reflect the data assets, stewardship information, and integration points that are necessary to institute effective data governance.

Subject Area

Information Technology|Information science

Recommended Citation

DeStefano, R.J, "Improving Enterprise Data Governance Through Ontology and Linked Data" (2016). ETD Collection for Pace University. AAI10097925.


