Writing on building software, fine-tuning ML models & LLMs (and using generative AI).

As well as on how to stay economically efficient while building (and whatnot).

I Spent May Evaluating Different Engines for OCR

I scouted 93 docs that could act as a proxy for what companies use OCR for (handwritten notes, tables, financial legacy docs) and tested them on 14 different OCR engines, including smaller specialized open-source vision models, document parsing tools, and general frontier models. Let me show you which engines won in the easy, medium, and hard tiers, and how to save money when building OCR pipelines.

Agentic AI: How to Save on Tokens

Let's talk about how to decrease agent costs by 90% using various techniques. We'll go through how prompt caching works and why it’s a quick win, semantic caching, lazy-loading tools and MCPs, routing and cascading, delegating to subagents, and a bit on keeping the context clean.
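To give a feel for what "cascading" means here: try the cheapest model first and only escalate to a pricier one when the cheap answer isn't trusted. Below is a minimal sketch, assuming each model is wrapped in a hypothetical callable that returns an answer plus a confidence score in [0, 1] (how you get that score, whether self-reported, logprob-based, or from a verifier, is up to you). The `Tier` class, the costs, and the stub models are all made up for illustration; this is not the article's implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Tier:
    name: str
    cost_per_call: float
    # Hypothetical model wrapper: returns (answer, confidence in [0, 1]).
    call: Callable[[str], Tuple[str, float]]


def cascade(prompt: str, tiers: List[Tier], threshold: float = 0.8) -> Tuple[str, str, float]:
    """Try tiers cheapest-first and stop at the first answer whose confidence
    clears `threshold`. If none does, return the last (most capable) tier's answer.
    Returns (answer, name of the tier that answered, total cost spent)."""
    if not tiers:
        raise ValueError("cascade needs at least one tier")
    spent = 0.0
    answer, name = "", ""
    for tier in tiers:
        answer, confidence = tier.call(prompt)
        spent += tier.cost_per_call
        name = tier.name
        if confidence >= threshold:
            break  # cheap answer is good enough; skip the expensive tiers
    return answer, name, spent


# Stub "models" standing in for real API calls (hypothetical numbers).
small = Tier("small", 0.001, lambda p: ("4", 0.95) if "2+2" in p else ("not sure", 0.3))
large = Tier("large", 0.03, lambda p: ("a careful answer", 0.99))

print(cascade("What is 2+2?", [small, large]))  # ('4', 'small', 0.001)
print(cascade("Explain prompt caching", [small, large]))  # escalates to 'large'
```

The savings come from the easy queries never touching the large model; the cost is an extra cheap call on every query that does escalate, so the threshold is the knob to tune.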

When Does Adding Fancy RAG Features Work?

Let's use LLM judges to compare a pipeline with features like query optimization, detailed chunking with neighbors and keys, and context expansion against a plain RAG pipeline.

Running Evals on a Bloated RAG Pipeline

Does expanding chunks to neighbors improve answers? That’s what we’ll test here. We’ll run some basic evaluations and look at metrics like faithfulness, answer relevancy, context relevance, and hallucination rate, and compare results across different models and datasets.
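For anyone unfamiliar with the idea, "expanding chunks to neighbors" means that when retrieval returns chunk *i*, you also pull in the chunks right before and after it so the model sees the surrounding context. A minimal sketch, assuming chunks are keyed by their sequential position within a single document; the function name and the `window` parameter are hypothetical, not taken from the article's pipeline.

```python
from typing import Dict, List


def expand_with_neighbors(hit_ids: List[int], chunks: Dict[int, str], window: int = 1) -> List[str]:
    """Add up to `window` chunks on each side of every retrieved chunk id,
    drop duplicates, and return the chunk texts in document order."""
    selected = set()
    for i in hit_ids:
        for j in range(i - window, i + window + 1):
            if j in chunks:  # skip positions past the start/end of the document
                selected.add(j)
    return [chunks[j] for j in sorted(selected)]


chunks = {0: "A", 1: "B", 2: "C", 3: "D", 4: "E"}
print(expand_with_neighbors([2], chunks))     # ['B', 'C', 'D']
print(expand_with_neighbors([0, 4], chunks))  # ['A', 'B', 'D', 'E']
```

The trade-off the evals are probing: a bigger window adds context that may help faithfulness, but it also adds tokens and possibly irrelevant text, which can drag down context relevance.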