I recently led a project where the plant asked a deceptively simple question: "Why do our batch yields reported in MES, SCADA, and the ERP never agree?" The catch: they wanted automatic reconciliation without modifying any of the three source systems. If you’ve been in factories long enough, you know this is a common pain — and a hard one to fix unless you treat it as a data-integration and process problem rather than a single-system bug.
Below I share a pragmatic, vendor‑agnostic approach I’ve used to reconcile batch yields automatically across MES, SCADA, and legacy ERP systems while leaving source systems untouched. The pattern works for brownfield sites where touching production systems is risky, expensive, or blocked by governance. It focuses on data capture, normalization, reconciliation logic, and operational workflows — everything needed for a production‑grade solution.
## Why reconciliation is hard (and what to accept up front)
Before designing the solution, be explicit about the root causes you’ll encounter. I always start by listing these so stakeholders have realistic expectations:
- Different measurement points: MES measures batch completion, SCADA measures processed mass/flow, ERP records material issued/consumed at business transaction times.
- Timing and latency: SCADA events are near real‑time, MES may buffer transactions, ERP posts at end of shift or during batch settlement.
- Unit and granularity mismatches: SCADA measures liters, MES uses kilograms, and the ERP posts in purchasing units (e.g., cases).
- Master data drift: Part codes, recipe names, and lot IDs are not identical across systems.
- Human corrections: Manual overrides, scrap entries, and adjustments appear only in one system.
Accepting these realities helps you design reconciliation that flags differences intelligently, instead of chasing a mythical single "truth".
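To make these mismatches concrete, here is a small Python sketch of how one physical batch might look in each system. All identifiers, conversion factors, and quantities are invented for illustration:

```python
# Hypothetical snapshots of the same physical batch as each system reports it.
# All IDs, units, and quantities here are invented for illustration.
mes_record = {"batch_id": "B-2024-0117", "product": "PRD-100",
              "produced_kg": 1012.0, "scrap_kg": 8.0}      # mass at batch completion
scada_record = {"run_id": "RUN_88412", "recipe": "prd100_std",
                "processed_l": 1060.0}                     # volume, near real-time
erp_record = {"lot": "0000117", "material": "100-PRD",
              "posted_cases": 101}                         # posted at settlement

# Even after converting to a common unit, residual differences remain:
DENSITY_KG_PER_L = 0.96   # product-specific; must come from master data
KG_PER_CASE = 10.0

mes_net_kg = mes_record["produced_kg"] - mes_record["scrap_kg"]  # 1004.0
scada_kg = scada_record["processed_l"] * DENSITY_KG_PER_L        # ~1017.6
erp_kg = erp_record["posted_cases"] * KG_PER_CASE                # 1010.0
print(mes_net_kg, scada_kg, erp_kg)  # three plausible, non-identical answers
```

Three defensible numbers for "the" batch yield, none of them wrong in its own frame of reference: that is the situation the reconciliation layer has to manage.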
## High‑level architecture pattern (without touching source systems)
The solution I recommend is a non‑intrusive reconciliation layer that sits alongside existing systems and performs the following roles:
- Capture relevant events and transactions from MES, SCADA, and ERP using read‑only adapters.
- Normalize and enrich data into a common data model.
- Run a reconciliation engine that matches records, computes deltas, and applies business rules.
- Provide dashboards and exception workflows for operators and supply chain staff to resolve differences.
- Maintain a full audit trail and automated closing logic for agreed exceptions.
Typical components I use:
- Extract layer: read‑only connectors (OPC UA, MQTT, JDBC, REST APIs, FTP, flat file pickers). Tools: Apache NiFi, Talend, MuleSoft, custom Python connectors for proprietary APIs.
- Message bus / staging: Kafka or Azure Service Bus to decouple sources from processing.
- Normalization and enrichment: ETL/ELT jobs — can use Spark, Databricks, or SQL-based pipelines. Enrichment includes unit conversions, master data alignment, and timestamp normalization.
- Reconciliation engine: rules engine or custom microservice that applies matching logic, threshold checks, and scoring.
- Data store: a reconciliation data mart (Postgres, SQL Server) or an OLAP store for reporting (ClickHouse, Snowflake).
- User UI and workflow: web app (React/Angular) or low-code tools for exceptions, approvals, and adjustments. BI for dashboards (Power BI, Qlik, Tableau).
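As a sketch of the normalization and enrichment component, here is what mapping a raw SCADA event into the common data model can look like. The event shape, field names, and cross-reference values are my assumptions for illustration; in practice the lookup tables live in a governed master-data store:

```python
from datetime import datetime, timezone

# Illustrative cross-reference maps; in production these are master-data
# tables validated by business owners (all values here are hypothetical).
PRODUCT_XREF = {"prd100_std": "PRD-100"}        # SCADA recipe -> canonical product
UNIT_FACTORS = {("l", "kg", "PRD-100"): 0.96}   # (from, to, product) -> factor

def normalize_scada_event(event: dict) -> dict:
    """Map a raw SCADA event into the common reconciliation model."""
    product = PRODUCT_XREF[event["recipe"]]
    factor = UNIT_FACTORS[(event["uom"], "kg", product)]
    return {
        "source_system": "SCADA",
        "source_batch_id": event["run_id"],
        "product_id": product,
        "quantity_produced": event["quantity"] * factor,  # canonical unit: kg
        "timestamp_end": datetime.fromtimestamp(event["ts"], tz=timezone.utc),
        "ingest_time": datetime.now(tz=timezone.utc),     # kept for latency debugging
    }

raw = {"run_id": "RUN_88412", "recipe": "prd100_std",
       "uom": "l", "quantity": 1060.0, "ts": 1_700_000_000}
print(normalize_scada_event(raw))
```

Note that both event time and ingest time are carried through, which pays off later when diagnosing latency-driven mismatches.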
## Step‑by‑step implementation checklist
Here is the step sequence I use when delivering these projects. It keeps risk low and delivers value early.
1. Identify minimal required fields: don't try to capture everything. For batch yield reconciliation you typically need batch ID, product ID, start/end timestamps, produced quantity, scrap quantity, unit of measure, and a source system identifier.
2. Implement read-only extraction: use vendor APIs or database read replicas. For SCADA, OPC UA or historian exports (e.g., OSIsoft PI) work well. For legacy ERPs a scheduled export (CSV over SFTP) is acceptable if APIs aren't available.
3. Normalize master data: build a cross-reference table for part codes, units, and recipe names. This is the most important non-technical item, because business owners must validate the mapping rules.
4. Normalize timestamps: convert to a single timezone and produce canonical event timestamps. Keep both event time and ingest time so you can troubleshoot latency issues.
5. Define a matching strategy: create deterministic and fuzzy matching rules, starting with exact batch ID match, falling back to time window plus product plus operator, then fuzzy string match on recipe names. Score matches and set acceptance thresholds.
6. Apply yield calculation and variance rules: compute yield with one consistent formula (e.g., (produced − scrap) / input). Store the raw inputs so auditors can re-compute with different formulas.
7. Classify exceptions and close automatically: auto-close small variances under a configurable threshold; flag larger differences for manual review and route them to the right role (plant engineer, logistics, ERP analyst).
8. Provide a reconciliation dashboard and workflow: show batches with deltas, the history of adjustments, and required actions. Integrate with ticketing (ServiceNow, Jira) if you need traceability.
9. Implement an audit trail and immutability: store original events, normalized records, and reconciliation decisions with who/when/why metadata.
10. Monitor and improve: instrument pipeline health metrics, match-rate trends, and the most common exception causes. Iterate on matching rules and master data maps.
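Step 6 in code form, as a minimal sketch: one consistent yield formula, with the raw inputs kept alongside the result so auditors can re-compute it later with a different formula. The sample quantities are invented.

```python
def compute_yield(produced: float, scrap: float, input_qty: float) -> dict:
    """Yield by a single consistent formula; raw inputs are retained so the
    number can be re-derived with a different formula during an audit."""
    return {
        "raw": {"produced": produced, "scrap": scrap, "input": input_qty},
        "yield_pct": round(100.0 * (produced - scrap) / input_qty, 2),
    }

result = compute_yield(produced=1012.0, scrap=8.0, input_qty=1050.0)
print(result["yield_pct"])  # 95.62
```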
## Example reconciliation data model
A compact table of the core dataset helps when communicating with IT and operations. Below is an example schema I usually implement:
| Field | Description |
|---|---|
| source_system | Originating system: MES, SCADA, or ERP |
| source_batch_id | Original batch or lot identifier from source |
| canonical_batch_id | Unified batch ID after matching |
| product_id | Canonical product code |
| quantity_produced | Numeric; stored in canonical unit |
| quantity_scrap | Numeric; canonical unit |
| timestamp_start | Event time normalized |
| timestamp_end | Event time normalized |
| reconciliation_status | Matched, Unmatched, Exception, or Auto‑closed |
| delta | Difference between systems for yield or quantity |
| audit_log | JSON of decisions and manual actions |
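One way to carry this schema through the pipeline is a plain record type. Here is a minimal Python dataclass mirroring the table above; the field types and defaults are my assumptions, not a fixed contract:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

@dataclass
class ReconciliationRecord:
    # Mirrors the example schema; types and defaults are illustrative choices.
    source_system: str                          # "MES", "SCADA", or "ERP"
    source_batch_id: str                        # original batch/lot identifier
    product_id: str                             # canonical product code
    quantity_produced: float                    # stored in canonical unit
    quantity_scrap: float                       # canonical unit
    timestamp_start: datetime                   # normalized event time
    timestamp_end: datetime                     # normalized event time
    canonical_batch_id: Optional[str] = None    # filled in after matching
    reconciliation_status: str = "Unmatched"    # Matched / Exception / Auto-closed
    delta: Optional[float] = None               # cross-system difference
    audit_log: list = field(default_factory=list)  # decisions and manual actions
```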
## Matching rules and heuristics that work in practice
From my experience, a layered approach gives the best results:
- Layer 1 — Exact ID match: If batch/lot IDs are identical and product codes line up, accept with a high confidence score.
- Layer 2 — Time window + product: Match MES batch to SCADA runs within a configurable time window (e.g., batch end ± 15 minutes) and same canonical product ID.
- Layer 3 — Quantity + fuzzy metadata: If IDs differ, compare expected quantity ranges, recipe name similarity (Levenshtein distance), and operator/line identifiers.
- Layer 4 — Manual escalation: If none of the above exceed thresholds, classify as unmatched and surface to an operator with the most likely candidates.
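A minimal sketch of the layered matcher. The scores, time window, and thresholds are illustrative, the record shapes are invented, and I use the standard library's difflib as a stand-in for a dedicated Levenshtein library:

```python
from difflib import SequenceMatcher

TIME_WINDOW_S = 15 * 60   # configurable: batch end ± 15 minutes
FUZZY_THRESHOLD = 0.85    # acceptance threshold for recipe-name similarity

def match_score(mes: dict, scada: dict) -> float:
    """Layered matching: each layer returns early with a confidence score."""
    # Layer 1: exact batch ID and product match -> high confidence
    if (mes["batch_id"] == scada["batch_id"]
            and mes["product_id"] == scada["product_id"]):
        return 1.0
    # Layer 2: same canonical product within the configurable time window
    if (mes["product_id"] == scada["product_id"]
            and abs(mes["end_ts"] - scada["end_ts"]) <= TIME_WINDOW_S):
        return 0.9
    # Layer 3: fuzzy recipe-name similarity plus quantity plausibility
    similarity = SequenceMatcher(None, mes["recipe"], scada["recipe"]).ratio()
    if (similarity >= FUZZY_THRESHOLD
            and abs(mes["qty"] - scada["qty"]) / mes["qty"] < 0.05):
        return 0.7
    return 0.0   # Layer 4: unmatched -> escalate to an operator with candidates

mes = {"batch_id": "B-2024-0117", "product_id": "PRD-100",
       "end_ts": 1_700_000_300, "recipe": "PRD100 STD", "qty": 1004.0}
scada = {"batch_id": "RUN_88412", "product_id": "PRD-100",
         "end_ts": 1_700_000_000, "recipe": "prd100_std", "qty": 1017.6}
print(match_score(mes, scada))  # 0.9: Layer 2 (same product, within window)
```

In production you would score every candidate pair and surface the ranked list for anything below the acceptance threshold, rather than picking a single winner silently.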
## Operationalizing exceptions and business rules
People will ask: when should the system auto‑close a difference vs. require manual approval? I recommend two knobs:
- Absolute and relative thresholds: e.g., auto‑close if |delta| < 2 units OR |delta| < 0.5% of produced quantity.
- Exception categories: auto‑close for measurement noise; require review for material loss or ERP posting differences.
Make the thresholds configurable per product family and plant. In one project I set tighter thresholds for expensive active ingredients and looser ones for bulk packaging lines.
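The two knobs above can be expressed in a few lines. The threshold values here are just the examples from the text; in practice they would be configured per product family and plant:

```python
ABS_THRESHOLD = 2.0     # units; tune per product family / plant
REL_THRESHOLD = 0.005   # 0.5% of produced quantity

def classify_delta(delta: float, produced: float) -> str:
    """Auto-close small variances; route everything else to manual review."""
    if abs(delta) < ABS_THRESHOLD or abs(delta) < REL_THRESHOLD * produced:
        return "Auto-closed"
    return "Exception"   # route to plant engineer / logistics / ERP analyst

print(classify_delta(1.4, produced=1000.0))   # Auto-closed (under 2 units)
print(classify_delta(8.0, produced=5000.0))   # Auto-closed (0.16% of produced)
print(classify_delta(30.0, produced=1000.0))  # Exception
```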
## Tools and quick wins
If you want to prototype quickly:
- Use Apache NiFi for low‑code extraction and flow orchestration.
- Stream events to Kafka for resilience and replayability.
- Implement normalization and matching in Python (pandas) or a Spark job, and store results in Postgres for a simple web UI.
- Dashboard with Power BI or Grafana for operations, and a small React app for exception workflows.
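For a first prototype of the "store and compare" step, an embedded database is enough. The sketch below uses sqlite3 purely as a stand-in for the Postgres reconciliation mart; the schema and sample rows are illustrative:

```python
import sqlite3

# In-memory sqlite3 as a stand-in for the Postgres reconciliation mart.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE reconciliation (
        canonical_batch_id TEXT,
        source_system TEXT,
        quantity_produced REAL,
        reconciliation_status TEXT
    )""")
rows = [
    ("B-2024-0117", "MES",   1004.0, "Matched"),
    ("B-2024-0117", "SCADA", 1017.6, "Matched"),
    ("B-2024-0117", "ERP",   1010.0, "Matched"),
]
conn.executemany("INSERT INTO reconciliation VALUES (?, ?, ?, ?)", rows)

# A simple per-batch delta for the dashboard: spread between the highest
# and lowest quantity reported across the three systems.
delta = conn.execute("""
    SELECT canonical_batch_id,
           MAX(quantity_produced) - MIN(quantity_produced) AS delta
    FROM reconciliation
    GROUP BY canonical_batch_id
""").fetchone()
print(delta)
```

The same query ports to Postgres unchanged, which is the point of prototyping this way.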
For enterprise rollouts, consider managed services (Azure Data Factory + Synapse, AWS Glue + Redshift) or an integration platform (MuleSoft) to reduce maintenance burden.
If you want, I can sketch a lightweight data flow diagram or provide a sample SQL query for matching MES vs SCADA batches based on time window + product + quantity tolerance. Tell me which systems you're integrating (names and available export methods) and I’ll tailor the matching rules and mapping checklist to your stack.