Use cases

Aggregating multiple databases with Record Lineage

Aggregating multiple databases with Record Lineage enables data from different sources to be grouped and unified, while retaining the links between records and their original sources.


The need

Unify heterogeneous data sources within a single, centralized portal

Our customer wanted to publish, on a single portal, a database resulting from the pooling of records from 12 source databases.

As overlaps existed between the various source databases, deduplication was necessary to provide portal visitors with a single view of each record.

In addition, portal users can correct and/or enrich the published information (crowdsourcing). Each entry in the aggregated database therefore had to keep a link to the corresponding record(s) in the source databases (record lineage), so that corrections could be passed back to the source.

This use case concerned cultural sites. However, the same approach applies to corporate or individual listings (CRM), product databases, etc.

Proposed solution

Intelligent database aggregation, guided by traceability and automation

Verification and geolocation of postal addresses.

Verification of postal codes and their conversion to INSEE codes.

Harmonization of data from each of the 12 source databases to obtain a single target format.

Multi-criteria (name, address) and multi-strategy (phonetic, Levenshtein distance, N-gram, etc.) deduplication.

Record Lineage: each record's identifier and its source database are retained throughout the processing chain.

Automation of the entire processing chain in both directions (source databases → aggregated database AND aggregated database → source databases) in order to propagate any updates and enrichments that may occur on either side.
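
To make the deduplication and Record Lineage steps more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the product's actual implementation: the class names, the source labels (museums_db, tourism_db), the sample records and the 0.8 threshold are all hypothetical, and the phonetic strategy mentioned above is left out for brevity. Two records are treated as duplicates when both name and address match under at least one strategy (Levenshtein or character trigram similarity), and every merged record keeps its (source, identifier) pair.

```python
from dataclasses import dataclass, field


@dataclass
class SourceRecord:
    source: str      # hypothetical label of the source database
    record_id: str   # identifier of the record inside that source
    name: str
    address: str


@dataclass
class AggregatedRecord:
    name: str
    address: str
    # Record lineage: every (source, record_id) pair merged into this entry.
    lineage: list = field(default_factory=list)


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so comparisons ignore formatting."""
    return " ".join(text.lower().split())


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard overlap of the character n-grams of both strings."""
    grams_a = {a[i:i + n] for i in range(max(len(a) - n + 1, 1))}
    grams_b = {b[i:i + n] for i in range(max(len(b) - n + 1, 1))}
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def is_duplicate(x, y, threshold: float = 0.8) -> bool:
    """Multi-criteria: name AND address must match.
    Multi-strategy: either similarity measure may reach the threshold."""
    def similar(u: str, v: str) -> bool:
        u, v = normalize(u), normalize(v)
        return max(levenshtein_similarity(u, v),
                   ngram_similarity(u, v)) >= threshold
    return similar(x.name, y.name) and similar(x.address, y.address)


def aggregate(records):
    """Merge duplicates into single entries while accumulating lineage."""
    aggregated = []
    for rec in records:
        for entry in aggregated:
            if is_duplicate(rec, entry):
                entry.lineage.append((rec.source, rec.record_id))
                break
        else:
            aggregated.append(AggregatedRecord(
                rec.name, rec.address, [(rec.source, rec.record_id)]))
    return aggregated


records = [
    SourceRecord("museums_db", "M-17", "Musée du Louvre", "Rue de Rivoli, 75001 Paris"),
    SourceRecord("tourism_db", "T-204", "Musee du Louvre", "rue de Rivoli 75001 Paris"),
    SourceRecord("tourism_db", "T-311", "Château de Chambord", "41250 Chambord"),
]
portal = aggregate(records)
for entry in portal:
    print(entry.name, entry.lineage)
```

This pairwise comparison is quadratic in the number of records, which is acceptable for a sketch. A production pipeline would typically restrict comparisons to candidate groups (for example, records sharing a postal code) before scoring them.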


Benefits

A unified, reliable and traceable database to facilitate service modernization

A single view of each record on the portal, thanks to deduplication.

The possibility for owners of the 12 source databases to receive crowdsourced corrections and apply them to their own database.

Up-to-date data on the portal, including both the latest modifications made in the source databases AND crowdsourced corrections and enrichments.

Complete automation of the process, enabling corrections to be propagated in both directions at regular intervals.
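
As an illustration of how a portal correction could travel back to the sources, the following Python sketch groups one correction into per-source update instructions using the lineage attached to a portal entry. The function name, the field name and the sample identifiers are hypothetical, and how each source database actually applies the update is out of scope here.

```python
from collections import defaultdict


def route_correction(lineage, field_name, new_value):
    """Turn one portal correction into update instructions per source.

    lineage: list of (source, record_id) pairs kept for the portal entry.
    Returns a dict {source: [(record_id, field_name, new_value), ...]}.
    """
    updates = defaultdict(list)
    for source, record_id in lineage:
        updates[source].append((record_id, field_name, new_value))
    return dict(updates)


# Hypothetical portal entry that was merged from two source records.
lineage = [("museums_db", "M-17"), ("tourism_db", "T-204")]
updates = route_correction(lineage, "opening_hours", "09:00-18:00")
for source, instructions in updates.items():
    print(source, instructions)
```

Each source owner then receives only the instructions that concern their own records, which matches the idea of applying corrections to their own database.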
