Dimitris Gavrilis, Eleni Afiontzi, Johan Fihn, Olof Olsson, Sebastian Cuy, Achille Felicetti, Franco Niccolucci
Abstract Most infrastructure projects, both recent and ongoing, involve a data aggregation task to bring together the heterogeneous information typical of the EU landscape. The main reason is the plethora of technologies, standards, languages and practices found across the EU. Data aggregation typically homogenizes heterogeneous data through a process that includes ingestion, normalization, transformation and validation. The European-funded project Ariadne (http://www.ariadne-infrastructure.eu/) aims at true integration of data by modelling the underlying domain and providing the technical framework for automatic integration of heterogeneous resources.
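The four aggregation stages named above can be illustrated with a minimal sketch. It is not the Ariadne aggregator: the field mapping, the required fields and all function names are assumptions made purely for illustration.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]

# Hypothetical mapping from provider-specific field names to a common schema.
FIELD_MAP = {"titel": "title", "name": "title", "id": "identifier"}
# Hypothetical minimum set of fields a record needs to be accepted.
REQUIRED = ("title", "identifier")


def ingest(raw: List[Record]) -> List[Record]:
    """Ingestion: copy incoming records so later stages never mutate the source."""
    return [dict(record) for record in raw]


def normalize(record: Record) -> Record:
    """Normalization: trim whitespace and lowercase the field names."""
    return {key.strip().lower(): value.strip() for key, value in record.items()}


def transform(record: Record) -> Record:
    """Transformation: rename provider fields to the common schema."""
    return {FIELD_MAP.get(key, key): value for key, value in record.items()}


def validate(record: Record) -> bool:
    """Validation: every required field must be present and non-empty."""
    return all(record.get(field) for field in REQUIRED)


def aggregate(raw: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Run all four stages and separate accepted from rejected records."""
    accepted: List[Record] = []
    rejected: List[Record] = []
    for record in ingest(raw):
        candidate = transform(normalize(record))
        (accepted if validate(candidate) else rejected).append(candidate)
    return accepted, rejected
```

Calling `aggregate` on a list of provider records returns the homogenized records together with those that failed validation, so that rejected input can be reported back to the provider rather than silently dropped.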
This infrastructure comprises a set of heterogeneous technologies: a metadata aggregator, including a set of enrichment and data integration micro-services; an RDF store with reasoning capabilities (through SPARQL); and a powerful indexing mechanism. The output of this process is published to a portal that can provide useful information to a variety of potential users, ranging from casual visitors to domain researchers.
The data integration services can mine for links among resources and link them to one another and to language resources such as vocabularies. Complex records can be split into their individual components, which are represented, enriched and stored separately while maintaining their identity through semantic linking. These individual components are represented in the underlying model (ACDM) and include agents, language resources, datasets, collections, reports, databases, etc. Each integrated resource is assigned a URI and is published in RDF. This practice enables knowledge mining, semantic queries and reasoning, supported by engines provided within the project (e.g. a SPARQL engine and Jena).
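As a rough illustration of splitting a record into separately identified components, the sketch below turns one flat record into a dataset and an agent, assigns each a URI, and serializes the result as N-Triples. The base URI, the input field names and the helper functions are hypothetical, and the Dublin Core and FOAF properties merely stand in for the ACDM properties, which are not reproduced here.

```python
from typing import Dict, List, Tuple

BASE = "http://example.org/resource/"   # hypothetical base URI, not Ariadne's
DCT = "http://purl.org/dc/terms/"       # Dublin Core terms namespace
FOAF = "http://xmlns.com/foaf/0.1/"     # FOAF namespace

Triple = Tuple[str, str, str]


def uri(kind: str, local_id: str) -> str:
    """Build an N-Triples URI reference for a component of the given kind."""
    return f"<{BASE}{kind}/{local_id}>"


def literal(text: str) -> str:
    """Build an N-Triples string literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def split_record(record: Dict[str, str]) -> List[Triple]:
    """Split one flat record into a dataset and an agent linked by URI."""
    dataset = uri("dataset", record["id"])
    agent = uri("agent", record["publisher_id"])
    return [
        (dataset, f"<{DCT}title>", literal(record["title"])),
        # The semantic link that keeps the two components connected.
        (dataset, f"<{DCT}publisher>", agent),
        (agent, f"<{FOAF}name>", literal(record["publisher_name"])),
    ]


def to_ntriples(triples: List[Triple]) -> str:
    """Serialize triples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)
```

Because the agent has its own URI, it can be enriched and stored independently of the dataset, while the `publisher` statement preserves the relationship between the two for later SPARQL queries.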
The technical infrastructure has been developed using various programming languages, such as Java, PHP and JavaScript. It is distributed across multiple virtual machines and brings together different established technologies and components. The portal is based on the Laravel PHP framework and uses the Elasticsearch search engine to collect and browse through the data. Both the technical infrastructure and the portal will be presented and demonstrated in more detail.