Use cases
Aggregating multiple databases with Record Lineage
Aggregating multiple databases with Record Lineage enables data from different sources to be grouped and unified, while retaining the links between records and their original sources.


The need
Unify heterogeneous data sources within a single, centralized portal
Our customer wanted to publish, on a single portal, a database created by pooling the records of 12 source databases.
Because the source databases overlapped, deduplication was needed to give portal visitors a single view of each record.
In addition, because portal users can correct and/or enrich the published information (crowdsourcing), each entry in the aggregated database had to keep a link to the corresponding record(s) in the source databases (Record Lineage), so that corrections could be passed back to the sources.
This use case concerned cultural sites, but the same approach applies to company or individual listings (CRM), product databases and similar data.
Proposed solution
Intelligent database aggregation, guided by traceability and automation
Verification and geolocation of postal addresses.
Verification of postal codes and translation of postal codes into INSEE codes (the official French municipality codes).
Harmonization of the data from each of the 12 source databases into a single target format.
Multi-criteria (name, address) and multi-strategy (phonetic matching, Levenshtein distance, N-grams, etc.) deduplication.
Record Lineage: each record's identifier and its original source database are retained throughout the processing chain.
Automation of the entire processing chain in both directions (source databases → aggregated database AND aggregated database → source databases), so that updates and enrichments made on either side are propagated to the other.
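To make the deduplication and Record Lineage steps more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the pipeline used in this project. The record fields, the 0.8 similarity threshold, the rule that both name AND address must match, and the choice to keep the first record's values for the merged entry are all hypothetical. Phonetic matching is left out for brevity. Only Levenshtein distance and character trigrams are shown.

```python
import unicodedata
from dataclasses import dataclass, field


@dataclass
class SourceRecord:
    source: str      # hypothetical name of the source database
    record_id: str   # identifier of the record in that source
    name: str
    address: str


@dataclass
class AggregatedRecord:
    name: str
    address: str
    # Record Lineage: every (source, record_id) pair merged into this entry
    lineage: list[tuple[str, str]] = field(default_factory=list)


def normalize(text: str) -> str:
    """Lowercase, strip accents and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.split())


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed row by row (dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                # deletion
                               current[j - 1] + 1,             # insertion
                               previous[j - 1] + (ca != cb)))  # substitution
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    longest = max(len(a), len(b)) or 1
    return 1 - levenshtein(a, b) / longest


def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard similarity of the sets of character n-grams."""
    grams_a = {a[i:i + n] for i in range(len(a) - n + 1)}
    grams_b = {b[i:i + n] for i in range(len(b) - n + 1)}
    if not grams_a or not grams_b:
        return float(a == b)
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def field_score(x: str, y: str) -> float:
    """Multi-strategy: keep the best score of the available strategies."""
    x, y = normalize(x), normalize(y)
    return max(levenshtein_similarity(x, y), ngram_similarity(x, y))


def is_duplicate(r1: SourceRecord, r2: SourceRecord, threshold: float = 0.8) -> bool:
    """Multi-criteria: name AND address must both be similar enough."""
    return (field_score(r1.name, r2.name) >= threshold
            and field_score(r1.address, r2.address) >= threshold)


def aggregate(records: list[SourceRecord]) -> list[AggregatedRecord]:
    """Cluster duplicates with union-find and keep lineage for each cluster."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if is_duplicate(records[i], records[j]):
                parent[find(i)] = find(j)

    clusters: dict[int, list[SourceRecord]] = {}
    for i, rec in enumerate(records):
        clusters.setdefault(find(i), []).append(rec)

    return [AggregatedRecord(name=members[0].name, address=members[0].address,
                             lineage=[(m.source, m.record_id) for m in members])
            for members in clusters.values()]


if __name__ == "__main__":
    sample = [
        SourceRecord("base_A", "A-17", "Musée de la Marine", "17 place du Trocadéro, Paris"),
        SourceRecord("base_B", "B-0042", "Musee de la Marine", "17 Place du Trocadero, Paris"),
        SourceRecord("base_C", "C-9", "Château de Vincennes", "Avenue de Paris, Vincennes"),
    ]
    for entry in aggregate(sample):
        print(entry.name, entry.lineage)
```

With the sample data, the first two records differ only by accents and case, so they merge into one entry whose lineage lists both `(base_A, A-17)` and `(base_B, B-0042)`. That pair list is what would allow a crowdsourced correction to be routed back to each source. Union-find is used so that duplicates are grouped transitively. The pairwise loop is quadratic, so on large sources one would likely compare only records sharing a key such as the INSEE code.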


Benefits
A unified, reliable and traceable database to facilitate service modernization
A single view of each record on the portal, thanks to deduplication.
The ability for the owners of the 12 source databases to collect crowdsourced corrections and apply them to their own databases.
Up-to-date data on the portal, reflecting both the latest changes made in the source databases AND crowdsourced corrections and enrichments.
Complete automation of the process, enabling corrections to be propagated in both directions at regular intervals.
Product benefits
Integrate
Seamless integration of AI and No-Code technology for efficient data refinement.
Collaborate
Collaborate across teams to ensure complete data quality.
Shareable
Go beyond data transformation scripts for a readable, shareable project.
View
Move from the "black box" of scripting to an intuitive, documented visualization of all your data transformations.
Powerful
Use powerful dashboards to simplify analysis and drive continuous improvement.
Quality
Achieve superior data quality faster and at lower cost, while demonstrating the tangible value of investing in data excellence.
Intuition
Get a complete, intuitive visual representation of the data journey, from raw data to finalized data products.
Organized
Align your organization with data quality initiatives.
Aligned
Easily align your business with ever-changing regulatory requirements.