Overview

This guide introduces the Federated Data Quality Framework (FDQF) and its tooling.
It is intended for users who want to deploy, operate, or understand the system at a high level.

For deeper technical information, configuration guides, and deployment steps, please refer to the additional sections in the documentation menu.


What is the FDQF?

The Federated Data Quality Framework (FDQF) provides a structured approach and supporting software to assess and improve data quality in federated environments — situations where data remains distributed across independent locations due to legal, organizational, or technical constraints.

Instead of moving data to a central location, the FDQF enables local data quality analysis and privacy-preserving reporting across multiple sites.

Key Components

The FDQF consists of two main software components:

Data Quality Agent (DQA)

Runs locally at each data-holding site.

Responsibilities:

  • Executes predefined, domain-specific data quality checks
  • Converts human-readable rules into machine-readable queries
    (currently supported: HL7 CQL — Clinical Quality Language)
  • Queries the connected database locally
  • Generates aggregated and privacy-preserving results
  • Pushes data quality reports to the central server (one-way; no data pull)
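The agent flow above can be sketched in a few lines of Python. This is an illustrative sketch only, not the actual DQA implementation: the table, column, check name, and server endpoint are hypothetical, and the real agent translates rules into CQL rather than SQL. The key property it demonstrates is that only aggregated counts ever leave the site, via a one-way push.

```python
import json
import sqlite3
import urllib.request

# Hypothetical sketch of the Data Quality Agent flow; table, column,
# check name, and endpoint are illustrative, not part of the FDQF API.

def run_completeness_check(conn: sqlite3.Connection) -> dict:
    """Execute a local completeness check and return only aggregate counts."""
    total = conn.execute("SELECT COUNT(*) FROM patients").fetchone()[0]
    missing = conn.execute(
        "SELECT COUNT(*) FROM patients WHERE birth_date IS NULL"
    ).fetchone()[0]
    # Only aggregated, privacy-preserving numbers are included in the report.
    return {"check": "birth_date_completeness", "total": total, "missing": missing}

def push_report(report: dict, server_url: str) -> None:
    """One-way push of the quality report to the central server (no data pull)."""
    req = urllib.request.Request(
        server_url,
        data=json.dumps(report).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    urllib.request.urlopen(req)
```

Note that the report contains counts, never record-level data; the raw database is only ever read locally.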

Privacy-First Approach

To learn how privacy preservation and anonymization of the shared results are guaranteed, see the Privacy page.

Data Quality Server (DQS)

Runs centrally to collect and present results.

Responsibilities:

  • Receives quality reports from multiple sites
  • Aggregates results across the network
  • Provides dashboards and views for end-users
  • Provides a REST API for integrating reports into external dataset catalogues
  • Enables researchers or study investigators to evaluate multi-site data quality without accessing raw data
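The server-side aggregation step can be sketched as follows. The field names here mirror the hypothetical per-site report above and are assumptions, not the actual FDQF report schema; the point is that cross-site metrics can be computed from aggregate counts alone.

```python
# Illustrative sketch of how the Data Quality Server might combine
# per-site reports; field names are assumptions, not the FDQF schema.

def aggregate_reports(reports: list[dict]) -> dict:
    """Combine aggregate counts from multiple sites into a network-wide view."""
    total = sum(r["total"] for r in reports)
    missing = sum(r["missing"] for r in reports)
    return {
        "sites": len(reports),
        "total": total,
        "missing": missing,
        # Network-wide completeness, derived without any raw data access.
        "completeness": round(1 - missing / total, 4) if total else None,
    }
```

A dashboard or the REST API would then expose this aggregate view, so that no raw records are needed to compare sites.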

Why Federated Data Quality?

Traditional data quality validation typically requires central access to raw datasets.
In many real-world environments — such as healthcare or regulated research settings — this is not possible.

The FDQF enables:

  • Local processing at each data site
  • Privacy-preserving quality metrics
  • Cross-site comparison without data sharing
  • Standardized and reproducible quality evaluation
  • Support for federated biomedical and research infrastructures

Who Is This For?

The FDQF is designed for:

  • Research networks and data consortia
  • Healthcare institutions
  • Federated biobanking and cohort infrastructures
  • Privacy-sensitive or regulated environments
  • Projects requiring data quality transparency without data transfer

Roles that may use the system include:

  • Data Stewards
  • Study Principal Investigators
  • Clinical Researchers
  • Data Engineers and IT Operators

In Summary

  • Federated approach: data stays at each site; processing happens locally
  • Local Agent: executes checks and generates privacy-preserving metrics
  • Central Server: collects, aggregates, and displays results
  • Goal: enable cross-site data quality assessment without raw data sharing

The FDQF ensures trust, privacy, and transparency in multi-institution data ecosystems by combining local computation with centralized insight.

See for yourself

If you would like to experiment with the tooling, please go to the Getting Started page.

Licensed under the GNU GPL v3.0