Abstract : Enterprise analytics environments face critical scalability challenges with manual and reactive data quality programs that cannot accommodate cloud-native, sub-hourly data integration pipelines ingesting data from hundreds of sources. This paper presents a self-healing data pipeline architecture comprising declarative quality rules, observability-driven telemetry, lineage-aware diagnosis, and orchestration-integrated remediation. The approach implements declarative rules for uniform handling of data quality across datasets, columns, keys, and relationships. A formal taxonomy of failure modes, including freshness, completeness, schema, semantic, constraint, duplicative, and relationship failures, provides a shared lexicon applicable to both automated repairs and manual remediation. Remediation strategies, including quarantine, selective replay, targeted backfill, and contractual rollback employ guardrail policies to ensure reversibility, auditability, and manageable cold start constraints. The framework emphasizes structural properties such as idempotency, determinism, explicit data contracts, incremental checkpointing, and promotion workflows that enable safe automated remediation without proprietary platform lock-in. Trust in automation is built incrementally through detection pilots, controlled remediation progression, and scale, supported by standardized platform-level structures that reduce organizational effort and maintain auditability for large engineering teams
Full article