Building a Semantic Layer for Excel: One Source of Truth for KPIs
Ask three analysts at the same firm to pull the EBITDA margin for a portfolio company and there is a meaningful chance you will get three different numbers. Not because any of them made an arithmetic error. Because one of them included management fees in the EBITDA bridge and the other two did not. Because the third used trailing twelve months while the first two used the fiscal year. Because the treatment of stock-based compensation as an add-back is inconsistent across two models that were built independently six months apart by people who never compared notes on the methodology. Each analyst is confident their number is correct because it is internally consistent with the model they built. The problem is not competence. The problem is that the organization never defined what EBITDA margin means, formally and centrally, in a way that every downstream calculation is obligated to reference. The definition lives in individual analysts' heads and in the implicit choices embedded in individual models, which means it is not really a definition at all — it is a collection of interpretations that happen to agree most of the time and diverge precisely when the stakes are highest.
This is the problem that a semantic layer solves, and it solves it at the architectural level rather than through better communication or stricter review processes. A semantic layer is a centralized definition layer that sits between raw data and the tools that consume it — Excel models, Power BI dashboards, client reporting templates, management presentations — and ensures that every metric, every KPI, every financial ratio is calculated from the same logic, applied to the same data, with the same adjustments, every single time it appears anywhere in the organization. The semantic layer does not just store numbers. It stores the rules for producing numbers: the formula for each metric, the data fields it draws from, the adjustments it applies, the time dimension it operates over, the entity scope it covers, and the conditions under which its calculation changes. When a downstream tool requests EBITDA margin, it does not implement its own version of that calculation. It asks the semantic layer for the result, and the semantic layer applies the canonical definition. Every output that references EBITDA margin is guaranteed to be consistent because they all draw from the same source of truth.
The practical starting point for building a semantic layer is the metrics dictionary, which is exactly what it sounds like: a structured document that defines every metric the organization uses in financial reporting, investor communication, management analysis, and regulatory submissions. Each entry in the dictionary specifies the metric's canonical name, its plain-language definition, its precise calculation formula expressed in terms of specific data fields, the data source those fields come from, any standard adjustments that are applied before the calculation runs, the time period convention the metric uses by default, the unit and scale of the output, and the business context in which the metric is used. The metrics dictionary is not a spreadsheet glossary or a footnote in a model. It is a formal, versioned, governed document that is treated with the same seriousness as a legal definition, because in the context of investor reporting and regulatory compliance, it effectively is one.
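A dictionary entry of this shape can be made machine-readable as a small, structured record. The sketch below uses a Python dataclass; every field name and the example EBITDA margin entry are illustrative assumptions, not a prescribed schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MetricDefinition:
    # All field names here are illustrative, not a prescribed schema.
    name: str             # canonical name, e.g. "ebitda_margin"
    definition: str       # plain-language definition
    formula: str          # calculation expressed over named data fields
    source_fields: tuple  # data fields the formula draws from
    adjustments: tuple = ()   # standard adjustments applied before calculation
    period: str = "TTM"       # default time period convention
    unit: str = "percent"     # unit and scale of the output
    version: str = "1.0.0"    # ties the entry to the governed version history

# Hypothetical entry: the adjustment names and TTM convention are examples.
EBITDA_MARGIN = MetricDefinition(
    name="ebitda_margin",
    definition="EBITDA as a percentage of revenue, trailing twelve months.",
    formula="(operating_income + depreciation + amortization) / revenue",
    source_fields=("operating_income", "depreciation", "amortization", "revenue"),
    adjustments=("exclude_management_fees", "add_back_sbc"),
)
```

Because the record is frozen and versioned, a change to any field is a change to the governed definition, not a silent local tweak.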
Turning the metrics dictionary from a static document into a living computational layer is where Python becomes the enabling technology. The canonical calculation rules defined in the dictionary are implemented as Python functions — one function per metric, each accepting the required data fields as inputs and returning the calculated output with full transparency into the logic applied. These functions are stored in a shared code library, version-controlled the same way software code is version-controlled, and subject to the same change governance process that governs model changes: a proposed update to the EBITDA margin calculation must be reviewed, approved, and merged before it affects any downstream consumer. The function library becomes the authoritative implementation of the metrics dictionary, bridging the gap between the written definition and the actual number that appears in a report.
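One metric function in such a library might look like the following minimal sketch. The name, signature, and the note about where adjustments happen are assumptions for illustration, not a universal convention:

```python
def ebitda_margin(operating_income: float, depreciation: float,
                  amortization: float, revenue: float) -> float:
    """Canonical EBITDA margin per the metrics dictionary.

    EBITDA = operating income + depreciation + amortization. This sketch
    assumes standard adjustments (e.g. management fee exclusion, SBC
    add-back) are applied upstream before the fields reach this function.
    """
    if revenue == 0:
        raise ValueError("ebitda_margin is undefined for zero revenue")
    ebitda = operating_income + depreciation + amortization
    return ebitda / revenue
```

Every downstream consumer calls this function rather than re-deriving the ratio, so a reviewed change to its body changes the number everywhere at once.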
Serving this semantic layer into Excel is accomplished through a Python integration pattern that is increasingly common in modern finance architectures. A lightweight Python script imports the metrics library, queries the underlying data store — whether that is a Parquet file, a database, or an API endpoint — applies the canonical metric functions to produce the required outputs, and writes the results into the designated cells of the Excel model using openpyxl or xlwings. The Excel model receives pre-calculated, semantically consistent metric values rather than implementing the calculation logic itself. The formulas in the workbook reference the values the Python layer has deposited, not raw data fields that each model interprets independently. This separation of concerns — calculation logic in Python, presentation and analysis logic in Excel — is the architectural boundary that eliminates metric inconsistency across workbooks.
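The delivery step can be sketched with openpyxl. The metric logic is inlined here so the example is self-contained, the input values stand in for a real query against the data store, and the workbook path and cell address are arbitrary assumptions:

```python
from pathlib import Path
from tempfile import gettempdir

from openpyxl import Workbook

# Normally imported from the shared metrics library; inlined for the sketch.
def ebitda_margin(operating_income, depreciation, amortization, revenue):
    return (operating_income + depreciation + amortization) / revenue

# Hard-coded inputs stand in for a query against Parquet, a database, or an API.
value = ebitda_margin(operating_income=100.0, depreciation=20.0,
                      amortization=10.0, revenue=500.0)

# Deposit the pre-calculated value into the designated cell of the model.
path = Path(gettempdir()) / "portfolio_model.xlsx"
wb = Workbook()
ws = wb.active
ws.title = "KPIs"
ws["B4"] = value  # workbook formulas reference B4, never the raw fields
wb.save(path)
```

In a production setup the script would open the existing model with `load_workbook` instead of creating a fresh one; the principle — the workbook receives a finished number, not calculation logic — is the same.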
The same metric functions that serve Excel also serve Power BI through a different delivery mechanism. Power BI's dataflow and dataset layers can consume Python-computed outputs directly, either through scheduled data refreshes that invoke the same metric library scripts or through a shared data store — a database table or Parquet partition — that the Python layer writes to and Power BI reads from. Because both the Excel outputs and the Power BI outputs draw their calculated values from the same Python function library, they are guaranteed to agree. The EBITDA margin that appears on slide fourteen of the board deck, the EBITDA margin in the Power BI operational dashboard, and the EBITDA margin in the quarterly investor report are not three independently computed figures that someone manually reconciles before publication. They are three renderings of the same number, produced by the same logic, from the same data.
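The shared-store variant can be sketched with nothing but the standard library. Here sqlite3 stands in for the production database, and the table name, schema, and metric values are illustrative; in a real deployment Power BI would read the same table through its database connector:

```python
import sqlite3

def publish_metrics(conn, period, metrics):
    """Write canonical metric values for one period into the shared table.

    Table name and schema are illustrative, not a prescribed layout.
    """
    conn.execute(
        """CREATE TABLE IF NOT EXISTS metric_values (
               period TEXT, metric TEXT, value REAL,
               PRIMARY KEY (period, metric)
           )"""
    )
    conn.executemany(
        "INSERT OR REPLACE INTO metric_values VALUES (?, ?, ?)",
        [(period, name, value) for name, value in metrics.items()],
    )
    conn.commit()

# Example publication run with placeholder values.
conn = sqlite3.connect(":memory:")
publish_metrics(conn, "2024-Q4", {"ebitda_margin": 0.26, "net_irr": 0.142})
```

Because Excel and Power BI both read from `metric_values` rather than computing independently, agreement between them is a property of the architecture, not of manual reconciliation.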
Client reporting is where metric consistency has the highest external stakes and where discrepancies are most damaging. A limited partner who compares the net IRR figure in their quarterly statement to the figure in the annual report and finds a difference — even a small one attributable to a rounding convention or a methodology change that was never formally disclosed — has a legitimate basis for concern about the reliability of the firm's reporting infrastructure. Building client reporting outputs from the same semantic layer that feeds internal management reporting eliminates this category of discrepancy by construction. The metric either has one definition or it has a formally governed set of definitions with explicit rules specifying which definition applies in which context — and that governance is documented, auditable, and enforced through the architecture rather than through manual reconciliation.
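The "governed set of definitions with explicit rules" can itself be expressed in code. The sketch below is a hypothetical registry mapping (metric, reporting context) pairs to exactly one approved calculation; the context names and the placeholder function are assumptions:

```python
def net_irr_v2(cash_flows):
    """Placeholder for the canonical net IRR implementation."""
    raise NotImplementedError

# Every (metric, context) pair resolves to exactly one approved definition,
# so the quarterly statement and the annual report cannot silently diverge.
METRIC_CONTEXTS = {
    ("net_irr", "quarterly_statement"): net_irr_v2,
    ("net_irr", "annual_report"): net_irr_v2,
    ("net_irr", "internal_dashboard"): net_irr_v2,
}

def resolve(metric, context):
    key = (metric, context)
    if key not in METRIC_CONTEXTS:
        raise KeyError(f"no governed definition for {metric!r} in {context!r}")
    return METRIC_CONTEXTS[key]
```

An unregistered context fails loudly instead of falling back to someone's local interpretation, which is the enforcement-through-architecture the paragraph describes.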
Maintaining the semantic layer over time requires the same change governance discipline applied to any production system. When the definition of a metric changes — because of a methodology decision, a regulatory update, a change in the underlying data schema, or a business evolution that makes the original definition no longer meaningful — the change is proposed, reviewed, approved, implemented in the function library, version-tagged, and communicated to all downstream consumers with sufficient notice to update their outputs. The metrics dictionary is updated in lockstep with the code change, so the written definition and the computational implementation never diverge. This is the organizational discipline that separates a semantic layer that remains authoritative over time from a metrics glossary that gradually becomes aspirational fiction as individual models drift away from it.
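The lockstep requirement can be enforced mechanically rather than by discipline alone. A minimal sketch, in which both the dictionary entry and the function carry a version tag (the tags, names, and registry shape here are assumptions):

```python
# Both the written definition and the implementation carry a version tag;
# the names and tags below are illustrative only.
METRICS_DICTIONARY = {"ebitda_margin": {"version": "2.1.0"}}

def ebitda_margin(operating_income, depreciation, amortization, revenue):
    return (operating_income + depreciation + amortization) / revenue

ebitda_margin.version = "2.1.0"  # tagged when the approved change is merged

REGISTRY = {"ebitda_margin": ebitda_margin}

def get_metric(name):
    """Serve a metric only if code and dictionary are in lockstep."""
    fn = REGISTRY[name]
    if fn.version != METRICS_DICTIONARY[name]["version"]:
        raise RuntimeError(f"{name}: implementation and dictionary versions diverged")
    return fn
```

A release pipeline could run this check for every metric, turning "the definition and the code never diverge" from a policy into a test that fails the build.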
Cell Fusion Solutions designs and builds semantic layer architectures for finance teams that are ready to move beyond the fragile, interpretation-dependent world of independently maintained metric calculations. We build the metrics dictionaries, implement the Python function libraries, and connect them to existing Excel models, Power BI environments, and client reporting workflows in a way that makes metric consistency automatic rather than effortful. If your organization has experienced the boardroom moment where two slides show different numbers for the same metric, Cell Fusion Solutions can build the infrastructure that ensures it never happens again.