The Why Layer: Make Unstructured Data First-Class in Your Semantic Layer
TL;DR: Traditional business intelligence tells you what moved. The why is already in your warehouse—inside notes, tickets, survey comments, reviews, emails, and other text. By modeling unstructured data in your semantic layer and letting artificial intelligence read it alongside your metrics, you move from arguments to evidence-backed explanations.
What is a (modern) semantic layer—and what’s missing?
A semantic layer is the business translation layer that standardizes and serves consistent definitions (like revenue, churn, active user) across tools and teams. In modern stacks it creates a unified, human-meaningful view of data regardless of how or where that data is stored.
What’s been missing is unstructured data—all the text and media that doesn’t fit into traditional business intelligence charts and dashboards. Unstructured data often contains the reasons behind performance changes, but most teams leave it out of their models because the consumption layer hasn’t made it easy to work with.
Why now: the size of the opportunity (and the cost of ignoring it)
Multiple industry sources estimate that 80–90% of enterprise data is unstructured and that it’s growing faster than structured data. That means the explanations you want are already being collected, just not used at scale.
At the same time, warehouse platforms and adjacent tooling have made it far easier to store and analyze text in place. For example, leading warehouse providers such as Snowflake and BigQuery now offer native AI functions for analyzing text and documents at scale. These functions can summarize and extract information, and their results can be joined with the other fields in your tables.
The Why Layer: modeling unstructured data inside your semantic layer
We propose that teams build a “Why Layer”: a part of the semantic model that treats unstructured fields as first-class citizens. In practice:
- You label text columns and link them to your existing business entities, time dimensions, and measures, so each comment or note travels with the numbers it helps explain.
- You query structured and unstructured together: “Show win-rate by segment, and surface the top reasons we lost—with quotes and source links.”
- You keep one source of truth for definitions while broadening what “analysis” includes—numbers and narratives. This extends, rather than replaces, your current semantic approach (for dbt users, think of it as expanding the surface area your metrics and models can draw from).
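As a rough illustration of the labeling idea above, here is a minimal sketch in Python. It is not tied to any particular semantic-layer product: the class names (`TextField`, `SemanticModel`), the table `analytics.deals`, and the columns `loss_notes`, `account_id`, and `closed_at` are all hypothetical, chosen only to show how a text column might be declared next to the measure it helps explain.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TextField:
    """A text column registered in the semantic model (hypothetical structure)."""
    column: str                 # text column in the warehouse table
    entity_key: str             # column linking each row to a business entity
    time_column: str            # timestamp used to align text with metric periods
    explains: tuple[str, ...]   # names of measures this text may help explain


@dataclass
class SemanticModel:
    """One model holding both measures and the text fields tied to them."""
    name: str
    table: str
    measures: dict[str, str] = field(default_factory=dict)  # measure name -> SQL expression
    text_fields: list[TextField] = field(default_factory=list)

    def text_for(self, measure: str) -> list[TextField]:
        """Return the text fields declared as context for a measure."""
        if measure not in self.measures:
            raise KeyError(f"unknown measure: {measure}")
        return [t for t in self.text_fields if measure in t.explains]


# Assumed example: deal notes registered as context for win rate.
deals = SemanticModel(
    name="deals",
    table="analytics.deals",
    measures={"win_rate": "AVG(CASE WHEN stage = 'closed_won' THEN 1.0 ELSE 0.0 END)"},
    text_fields=[TextField("loss_notes", "account_id", "closed_at", ("win_rate",))],
)

print([t.column for t in deals.text_for("win_rate")])
```

The point of the sketch is that the metric definition stays where it is; the text field is simply declared alongside it, with the entity key and time column that let a note be joined back to the numbers.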
How AI actually unlocks the “why”
Modern language models can read enterprise text to summarize, extract facts, identify themes, and classify reasons. The most reliable approach in production is to keep models grounded in your data: retrieve the relevant passages, have the model reason over them, and return explanations with citations back to the exact lines of text. (This “retrieve-then-reason” pattern—often called retrieval-augmented generation—exists to reduce hallucination and keep answers verifiable.)
Put simply:
- Read the text fields you choose;
- Understand topics, reasons, entities, and sentiment;
- Connect findings to people/products/time/metrics;
- Explain with short quotes and links to sources.
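The retrieve-then-reason pattern described above can be sketched as follows. This is a simplified illustration, not a production design: word overlap stands in for the vector or keyword search a real system would likely use, the model call itself is omitted, and the note IDs and texts are made up. What it does show is the grounding step, where the model is handed only the retrieved passages, each tagged with a source ID it can cite.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    source_id: str  # hypothetical identifier, e.g. a CRM note ID
    text: str


def tokenize(s: str) -> set[str]:
    """Lowercase and split into alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def retrieve(question: str, passages: list[Passage], k: int = 2) -> list[Passage]:
    """Rank passages by word overlap with the question; keep the top k that overlap at all."""
    q = tokenize(question)
    scored = [(len(q & tokenize(p.text)), p) for p in passages]
    scored = [(s, p) for s, p in scored if s > 0]
    scored.sort(key=lambda sp: sp[0], reverse=True)  # stable: ties keep input order
    return [p for _, p in scored[:k]]


def build_prompt(question: str, evidence: list[Passage]) -> str:
    """Assemble a grounded prompt: only retrieved text, each line tagged by source."""
    lines = [f"[{p.source_id}] {p.text}" for p in evidence]
    return (
        "Answer using only the passages below. Cite source IDs in brackets.\n"
        + "\n".join(lines)
        + f"\nQuestion: {question}"
    )


notes = [
    Passage("note-101", "Lost the deal after security review raised objections."),
    Passage("note-102", "Customer renewed; happy with onboarding."),
    Passage("note-103", "Deal lost: no champion on the buyer side."),
]

question = "Why was the deal lost?"
evidence = retrieve(question, notes)
prompt = build_prompt(question, evidence)
print(prompt)
```

Because the prompt carries source IDs, whatever explanation the model returns can be checked against the original notes, which is what keeps the answer verifiable.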
What changes when you add the Why Layer
- Sales & RevOps: You learn why win rate dropped in enterprise last quarter (e.g., missing champion, security objections), with excerpts from deal notes.
- Customer Success & Support: You see the themes that predict churn in premium plans by reading ticket text tied to cancellations.
- Marketing: You map campaign notes and keyword narratives to qualified lead creation, so you can re-allocate budget to what actually persuades.
- Product & Engineering: You align feedback, postmortems, and incidents with adoption and reliability metrics to prioritize fixes that matter most.
Industry write-ups continue to report that teams make higher-quality decisions when they combine structured metrics and unstructured signals in one analysis flow, rather than treating them as separate worlds.
Getting started with the Why Layer
- Start with one metric and one unstructured source (for example, closed-won rate + CRM notes).
- Keep governance first: choose text fields intentionally, apply redaction and role-based access, and always show citations. Trustworthy AI depends on well-governed data and transparent sources.
- Use your data warehouse's integrated LLM functions, or sign up for Push, to perform qualitative analysis alongside your quantitative analysis.
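To make the governance step above a little more concrete, here is a minimal sketch of redaction combined with a role-based check before any text reaches a model. Everything in it is an assumption for illustration: the role names, the field names, and the two regular expressions, which catch only simple email and phone formats and are not a substitute for a real PII detection process.

```python
import re

# Illustrative patterns only; real redaction needs broader coverage.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

# Hypothetical mapping of roles to the text fields they may read.
ALLOWED_FIELDS = {
    "sales_analyst": {"deal_notes"},
    "support_lead": {"ticket_body", "deal_notes"},
}


def redact(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


def read_field(role: str, field_name: str, text: str) -> str:
    """Return redacted text only if the role is allowed to read the field."""
    if field_name not in ALLOWED_FIELDS.get(role, set()):
        raise PermissionError(f"{role} may not read {field_name}")
    return redact(text)


print(read_field("sales_analyst", "deal_notes",
                 "Call Ana at ana@example.com or 555-123-4567."))
```

In practice the access rules would more likely live in the warehouse itself; the sketch only shows the order of operations, with the permission check first, redaction second, and analysis last.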