Glean extracts structured data from PDFs using AI-powered data pipelines
Cake’s “all-in-one” AIOps platform saved Glean two-and-a-half technical FTE hires in building and operating their AI-driven pipelines
Deploying generative AI on Cake enabled Glean to build a strategic AI moat, enhance data privacy, improve reliability, raise model accuracy from 70% to over 90%, and lower AI operating costs by 30%
Glean.ai (“Glean”) provides an accounts payable (AP) platform that helps customers better understand their expenditures with analysis and insights powered by machine learning. The Glean platform automates essential AP functions, including invoice processing and bill payment, analyzes spending data, and offers customers a variety of benchmarking information. For organizations with limited AP resources, this means streamlined operations, deeper financial insights, and the ability to strategically reduce costs.
High accuracy is critical for Glean’s accountant-heavy customer base. Glean needed to build an effective, accurate solution while maintaining low operating costs, so it selected Cake’s platform to manage AIOps and infrastructure.
“Cake helps you deploy the entire ML stack — everything you need in one package. It’s an all-in-one SaaS platform for MLOps that requires far less effort.” —Artiom Tarasiuk, Data Engineer, Glean.ai
While most AP solutions limit themselves to processing basic header-level data (e.g., invoice totals and due dates), Glean set out to tackle a more technically difficult challenge: analyzing spend at the line-item level. Glean's ML pipeline extracts messy, granular purchase data from PDFs of invoices and receipts, maps that information to canonical taxonomies, and generates insights from the newly structured data. The system's models continuously learn from new invoices, enabling the platform to provide benchmarking insights, time-series analyses, and spend comparisons.
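To make the pipeline concrete, here is a minimal Python sketch of the extract-and-map step. The regex, the LineItem shape, the toy taxonomy, and the fuzzy keyword match are all illustrative assumptions; Glean's production system uses learned models rather than rules like these.

```python
# Minimal sketch of the extract-and-map step: parse raw invoice text into
# structured line items, then map each description onto a canonical taxonomy.
# The regex, LineItem shape, taxonomy, and fuzzy matching are illustrative
# assumptions; Glean's production pipeline uses learned models, not rules.
import re
from dataclasses import dataclass
from difflib import get_close_matches

# Hypothetical canonical taxonomy: category -> representative keywords.
TAXONOMY = {
    "office supplies": ["paper", "toner", "pens"],
    "software": ["saas subscription", "license", "cloud hosting"],
    "travel": ["airfare", "hotel", "mileage"],
}

@dataclass
class LineItem:
    description: str
    quantity: float
    unit_price: float
    category: str | None = None

# Matches lines like "Toner cartridge   2   39.99".
LINE_PATTERN = re.compile(r"^(?P<desc>.+?)\s{2,}(?P<qty>[\d.]+)\s+(?P<price>[\d.]+)$")

def parse_line(raw: str) -> LineItem | None:
    """Parse one raw text line extracted from a PDF into a line item."""
    match = LINE_PATTERN.match(raw.strip())
    if not match:
        return None
    return LineItem(match["desc"].lower(), float(match["qty"]), float(match["price"]))

def map_to_taxonomy(item: LineItem) -> LineItem:
    """Assign the closest canonical category by fuzzy keyword match."""
    keywords = {kw: cat for cat, kws in TAXONOMY.items() for kw in kws}
    hits = get_close_matches(item.description, list(keywords), n=1, cutoff=0.4)
    item.category = keywords[hits[0]] if hits else None
    return item

if __name__ == "__main__":
    for raw in ("Toner cartridge  2  39.99", "SaaS subscription  1  500.00"):
        item = parse_line(raw)
        if item:
            print(map_to_taxonomy(item))
```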
The first phase of Glean’s implementation used classic ML approaches; Glean chose ML over rule-based systems for its consistency and ability to generalize. This initial phase involved migrating several in-house ML models to a Cake-managed Kubeflow/KServe infrastructure, marking the beginning of Glean’s partnership with Cake.
Cake helped solve Glean’s data-labeling challenges in classical ML, particularly in training line-item models and associating bounding boxes on PDFs with labels. However, the emergence of capable LLMs prompted a strategic shift away from classical ML.
“Cake offers a platform with a set of open source technologies that cover the full lifecycle of an ML project.” —Jeremy Johnson, Director of Data Science and Engineering, Glean.ai
The second phase marked Glean’s transition to proprietary, commercial LLMs, including OpenAI’s models and Gemini Pro. This allowed Glean to take advantage of the latest AI breakthroughs, but it also presented its own challenges.
While these solutions were easy to get started with, they ultimately proved unstable and expensive. Reliability problems with one major vendor led to frequent timeout errors, and because customers expect invoices to be processed quickly, those timeouts pushed the Glean team to rely more heavily on costly human processing than on the models themselves. Other large-scale vendors were too expensive to operate at scale, and some Glean customers were uncomfortable sending data, embeddings, and queries to external service providers.
Beyond the reliability, performance, and security concerns, the proprietary-LLM approach was simply too generic to establish the strategic moat Glean sought to create in their product. As Johnson described, “The reality is external models aren’t trained on your data. Either they aren’t as accurate as you need them to be, or the fine-tuning process is extraordinarily costly.”
The partnership with Cake evolved significantly during this period, as Glean began implementing an ensemble of large language models to improve extraction and mapping through retrieval-augmented generation (RAG). Ultimately, Glean wanted a reliable, customized model that would reduce reliance on the costly human-in-the-loop portion of their product.
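As one illustration of what the retrieval side of such a RAG flow can look like, the sketch below queries a pgvector store (the vector database Glean adopted, described further below) for the most similar previously labeled line items. The connection string, table, columns, and embed() helper are hypothetical.

```python
# Minimal sketch of the retrieval step in a RAG flow over pgvector.
# The connection string, table, columns, and embed() helper are hypothetical.
import psycopg2

def embed(text: str) -> list[float]:
    # Placeholder: in practice this calls the pipeline's embedding model.
    raise NotImplementedError

def retrieve_similar_line_items(query: str, k: int = 5) -> list[tuple]:
    """Fetch the k previously labeled line items closest to the query."""
    conn = psycopg2.connect("dbname=glean")  # illustrative DSN
    try:
        with conn.cursor() as cur:
            # pgvector's <=> operator computes cosine distance (smaller = closer).
            cur.execute(
                """
                SELECT description, category
                FROM line_item_embeddings
                ORDER BY embedding <=> %s::vector
                LIMIT %s
                """,
                (str(embed(query)), k),
            )
            return cur.fetchall()
    finally:
        conn.close()
```

The retrieved neighbors are then placed in the prompt so the model sees labeled precedents when mapping a new, messy description onto the canonical taxonomy.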
This collaboration culminated in Glean’s third phase: developing an in-house, fine-tuned LLM as part of their long-term strategy.
“We found long-term value in building our own in-sourced model with Cake. There is a strategic moat you can develop with your own IP – along with lower costs, greater security, better reliability, and improved accuracy when compared with external proprietary LLMs.” —Jeremy Johnson, Director of Data Science and Engineering, Glean.ai
Glean decided to bring core generative AI capabilities in-house for five primary reasons: a strategic moat, enhanced data privacy, greater reliability, improved model accuracy, and lower operating costs.
Cake has been at the foundation of Glean’s in-house modeling efforts. Johnson explained, "The most recent work on our in-house LLM reflects how Cake has become a key technology choice for advancing our AI R&D initiatives. We've had a longstanding dialogue about different iterations of the product—Cake understands our use case and the problems we'd like to address with a large language model."
Glean’s ML infrastructure evolved beyond basic inference as the strategy shifted toward developing an in-house model. The current stack emphasizes stability and distributed inference that runs asynchronously. At the core of the stack are the following components (a sample inference call in Python follows the list):
Kubeflow: the core platform that serves as the foundation of the ML infrastructure
KServe: the primary inference tool, integrated with multiple models
TriggerMesh: an event-driven tool that manages data processing by reading from queues and initiating inference jobs on request, supporting the asynchronous nature of the pipeline
pgvector: the current vector store, chosen to replace Milvus for its schema management controls
Argo CD: the deployment tool implemented at the beginning of Glean’s partnership with Cake; it proved so successful that it was adopted organization-wide, significantly improving DevOps processes and deployment management across departments
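To make the serving flow concrete, the sketch below issues a plain KServe V1-protocol predict call from Python. The host, model name, and payload shape are illustrative assumptions; in Glean's pipeline, such calls would be initiated asynchronously (for example, by TriggerMesh reading from a queue) rather than inline.

```python
# Minimal sketch of a KServe V1-protocol predict call. The host, model name,
# and payload shape are illustrative assumptions; in Glean's pipeline such
# calls are initiated asynchronously (e.g., by TriggerMesh reading a queue).
import requests

KSERVE_URL = "http://line-item-extractor.models.example.internal"  # hypothetical
MODEL_NAME = "line-item-extractor"                                 # hypothetical

def predict(instances: list[dict]) -> dict:
    """POST instances to the model's V1 predict endpoint; returns predictions."""
    resp = requests.post(
        f"{KSERVE_URL}/v1/models/{MODEL_NAME}:predict",
        json={"instances": instances},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()  # e.g. {"predictions": [...]}
```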
Cake has also introduced several key components for fine-tuning Glean’s in-house models, including:
Llama Factory: a visual interface for fine-tuning, training, and evaluating large language models, with easy-to-use tools for managing datasets and configuring hyperparameters such as epochs, context window, and batch size
vLLM: a high-performance inference framework tailored to large language models; by enabling faster, more efficient inference, it increases throughput and reduces operating costs for AI-driven applications
Llama 3.2: the latest iteration of the Llama model family, delivering greatly improved generative capabilities with enhanced accuracy, speed, and resource efficiency
Using Llama Factory with Llama models and vLLM, Glean can rapidly train and deploy cutting-edge open source large language models. Llama Factory streamlines the training and tuning process, while vLLM delivers high-speed inference for optimized AI performance.
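As a small illustration of this serving path, the sketch below loads a fine-tuned checkpoint with vLLM's offline inference API. The checkpoint path, prompt, and sampling settings are assumptions for illustration, not Glean's actual configuration.

```python
# Minimal sketch of serving a fine-tuned checkpoint with vLLM's offline
# inference API; the checkpoint path, prompt, and sampling settings are
# illustrative assumptions, not Glean's actual configuration.
from vllm import LLM, SamplingParams

llm = LLM(model="./llama-3.2-finetuned")  # hypothetical local checkpoint
params = SamplingParams(temperature=0.0, max_tokens=256)  # deterministic output

prompt = (
    "Extract the line items from this invoice text as JSON:\n"
    "Toner cartridge  2  39.99\n"
    "SaaS subscription  1  500.00"
)
outputs = llm.generate([prompt], params)
print(outputs[0].outputs[0].text)
```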
As Glean’s infrastructure continues to evolve, Cake has helped balance specific needs with flexibility and scalability. As Johnson outlined, “The process of developing and fine-tuning, as well as making core decisions about model operations—we've been able to defer largely to the expertise at Cake. The bulk of the informed decision-making has been driven by our trust in the expertise of the folks we're working with at Cake.”
“If we were to have brought on contractors for this project work instead of Cake, it would have taken about two-and-a-half FTEs.” —Jeremy Johnson, Director of Data Science and Engineering, Glean.ai
Glean has a small in-house team of ten technical staff across front-end, back-end, and data science and relies on Cake to augment this internal team. “Having specialized resources and added levels of expertise from Cake has been extremely helpful for implementing ML and LLM-related projects,” according to Johnson.
As an AIOps platform, Cake streamlined Glean’s initial implementation by accelerating experimentation and technology selection. As Tarasiuk explained, “Testing different labeling processes required significant AIOps effort. Since we don't have a huge team, we were fortunate that Cake helped considerably with the implementation and infrastructure roadmap. It was quite an intense period.”
Johnson continued, “Our relationship with Cake has included both services and thought leadership components in addition to the platform. They've served as trusted advisors in implementing several projects with us.”
One of the challenges of working in data science and AI is that the field changes so rapidly that even experienced practitioners frequently need to learn new skills.
As Johnson outlined, “Having conducted ML job searches for someone with large language model experience, it is rough. There's significant time invested in hiring and ramping people up. Having an ongoing relationship with the Cake team is super helpful. They understand our projects and requirements and have deep expertise in the latest technologies.”
Even if teams can find the right person at the right time, there’s still the chance they lack familiarity with key portions of a project. Tarasiuk echoed the challenges in hiring, noting, “It's a significant challenge to find someone with such diverse domain experience. With Cake, we get that range of expertise because they have many different supported components. I don't think we could find a single person similarly experienced across all those areas.”
Working with Cake, the Glean team moved key parts of their product in-house, saving time, money, and resources. To learn more about moving LLM projects in-house, please sign up for a demo here.