Making Agentic AI Smarter at the Architecture Level
The enterprise AI conversation has centered on model selection, token costs, and agent orchestration. The more consequential variable, however, is what goes into the context window. Dan Yarmoluk's breakdown of where that capacity goes is stark: 25% to rules and constraints, 30% to orchestration overhead, and 30% to probabilistic RAG retrieval, leaving 15% for the domain knowledge that drives useful AI reasoning.
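To make those shares concrete, here is a minimal Python sketch that turns the percentages into token counts. The percentages are the ones quoted above; the 128,000-token window and the `token_budget` helper are assumptions chosen only for illustration, not figures from the episode.

```python
# Assumed example window size; the episode does not name a specific model or window.
CONTEXT_WINDOW_TOKENS = 128_000

# Shares of the context window as quoted in the summary above.
ALLOCATION_PERCENT = {
    "rules_and_constraints": 25,
    "orchestration_overhead": 30,
    "rag_retrieval": 30,
    "domain_knowledge": 15,
}


def token_budget(window_tokens, allocation_percent):
    """Split a context window into per-category token counts (hypothetical helper)."""
    if sum(allocation_percent.values()) != 100:
        raise ValueError("allocation must sum to 100 percent")
    return {
        name: window_tokens * pct // 100
        for name, pct in allocation_percent.items()
    }


budget = token_budget(CONTEXT_WINDOW_TOKENS, ALLOCATION_PERCENT)
for name, tokens in budget.items():
    print(f"{name:24s}{tokens:>8,d} tokens")
```

Under that assumed window size, only 19,200 of 128,000 tokens would be left for domain knowledge, which is the imbalance the episode argues should be addressed architecturally.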
The fix is architectural. Knowledge graphs and structured ontologies compress domain knowledge into a form compact enough to fit in the context window and structured enough to reason on directly. A 63,000-word book, organized as a knowledge graph, fits in roughly 20 kilobytes, and a model that reasons from it doesn't have to retrieve from it.
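As a rough illustration of the idea, the sketch below stores a toy knowledge graph as subject-relation-object triples, serializes it to compact text that could be placed in a prompt, and answers a multi-hop question by following edges rather than by similarity search. The entities, relation names, and helper functions are all hypothetical; this is not the format Graphify.md or any particular tool uses.

```python
from collections import defaultdict

# Hypothetical toy triples (subject, relation, object) for a supply-chain domain.
TRIPLES = [
    ("Pump-A", "part_of", "Line-1"),
    ("Pump-A", "supplied_by", "Vendor-X"),
    ("Vendor-X", "located_in", "Region-North"),
    ("Line-1", "produces", "Product-Z"),
]


def serialize(triples):
    """Render triples as one compact line each, suitable for pasting into a prompt."""
    return "\n".join(f"{s} {r} {o}" for s, r, o in triples)


def build_index(triples):
    """Map each subject to its outgoing (relation, object) edges."""
    index = defaultdict(list)
    for s, r, o in triples:
        index[s].append((r, o))
    return index


def follow(index, start, relations):
    """Walk a chain of relations from start and return the set of nodes reached."""
    frontier = {start}
    for rel in relations:
        frontier = {
            o
            for node in frontier
            for r, o in index.get(node, [])
            if r == rel
        }
    return frontier


text = serialize(TRIPLES)
index = build_index(TRIPLES)

print(text)
print(len(text.encode("utf-8")), "bytes")
# Deterministic two-hop lookup: where is the supplier of Pump-A located?
print(follow(index, "Pump-A", ["supplied_by", "located_in"]))
```

The design point is that the answer comes from explicit edges in a small, fully loaded structure, so the same question always resolves the same way. How well this scales to a full book or enterprise ontology depends on the schema and is not shown here.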
That changes the intelligence-per-watt calculation, because agents working from structured domain knowledge produce more reliable inference with less compute. The organizations building this architecture now are the ones positioned to reach the decision-making speed that agentic AI has been promising.
Key takeaways:
- The context window is a resource to be managed. As model context windows grow, the question of what occupies them becomes more consequential. How that space is allocated determines what the model can reason on.
- RAG has limits that matter at scale. Probabilistic retrieval works for general queries. In domains where accuracy is non-negotiable—clinical, financial, supply chain—retrieving answers probabilistically introduces error at exactly the point where it's least acceptable.
- Structured domain knowledge changes the ratio. Knowledge graphs and ontologies compress institutional knowledge into a compact, structured form that models can reason on directly.
- Intelligence per watt is an architecture decision. Every agentic deployment adds to token consumption and energy costs. The architecture decision that reduces those costs is the same one that improves inference quality.
- The CFO metric is changing. The question is shifting from token volume to inference value: how much useful understanding is generated per dollar spent. Organizations that build toward that metric now are ahead of the question.
About the guest
Dan Yarmoluk, Founder of Graphify.md and Adjunct Faculty – Software Engineering and Data Science at the University of St. Thomas in Minneapolis
Dan Yarmoluk is the Founder of Graphify.md and a Context Architect focused on building domain knowledge systems at enterprise scale. His work centers on structuring institutional knowledge so it can be efficiently reasoned over by AI, compressing what organizations know into a form models can actually use. Dan has spent his career at the intersection of data science, IoT, and digital transformation, working across industries including healthcare, industrial, and financial services. He also serves as Adjunct Faculty in Software Engineering and Data Science at the University of St. Thomas in Minneapolis.