
What LLM Does Glean Use? A Full Breakdown

Published: 05/2025
26 minute read

Improving model accuracy from 70% to over 90% while simultaneously cutting human-in-the-loop operating costs by more than 30% sounds like a dream scenario. For the accounts payable platform Glean.ai, it became a reality. These impressive results weren't achieved by switching to another big-name proprietary model. Instead, they came from a strategic decision to take control of their AI stack and build a fine-tuned, in-house solution. This success often prompts the question: what LLM does Glean use to get such great performance? While the answer involves cutting-edge open-source models, the real story is in the execution. This case study details how Glean leveraged the Cake platform to build, deploy, and manage their own models, saving the equivalent of two and a half full-time engineering hires.

Key takeaways

  1. Glean extracts structured data from PDFs using AI-powered data pipelines

  2. Cake’s “all-in-one” AIOps platform saved Glean two-and-a-half technical FTE hires in building and operating their AI-driven pipelines

  3. Deploying generative AI on Cake empowered Glean to build a strategic AI moat, enhance data privacy, achieve greater reliability, improve model accuracy from 70% to 90%+, and lower AI operating costs by 30%

Glean.ai (“Glean”) provides an accounts payable (AP) platform that helps customers better understand their expenditures with analysis and insights powered by machine learning. The Glean platform automates essential AP functions, including invoice processing and bill payment, analyzes spending data, and offers customers a variety of benchmarking information. For organizations with limited AP resources, this means streamlined operations, deeper financial insights, and the ability to strategically reduce costs.

High accuracy is critical for Glean’s accountant-heavy customer base. Glean needed to build an effective, accurate solution while maintaining low operating costs, so it selected Cake’s platform to manage AIOps and infrastructure.

“Cake helps you deploy the entire ML stack — everything you need in one package. It’s an all-in-one SaaS platform for MLOps that requires far less effort.”

— Artiom Tarasiuk, Data Engineer, Glean.ai

From traditional ML to in-house LLMs in three phases

While most AP solutions limit themselves to processing basic header-level data (e.g., invoice totals and due dates), Glean set out to tackle a more technically difficult challenge: analyzing spend at the line-item level. Glean's ML pipeline extracts messy, granular purchase data from PDFs of invoices and receipts, maps that information to canonical taxonomies, and generates insights from the newly structured data. The system's models continuously learn from new invoices, enabling the platform to provide benchmarking insights, time-series analyses, and spend comparisons.
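To make that pattern concrete, here is a minimal sketch of the extract-then-map flow described above. It is illustrative only, not Glean's actual pipeline: `call_llm` is a hypothetical stand-in for whatever model serves the extraction step, and the string-similarity mapping is a deliberately naive substitute for a production classifier.

```python
# Illustrative sketch of the extract-then-map pattern: not Glean's actual
# pipeline. `call_llm` is a hypothetical stand-in for whatever model serves
# the extraction step (a KServe endpoint, vLLM, etc.).
import json
from difflib import get_close_matches

CANONICAL_TAXONOMY = [
    "Software & SaaS", "Office Supplies", "Travel", "Professional Services",
]

EXTRACTION_PROMPT = (
    "Extract every line item from the invoice text below as a JSON list of "
    'objects with the keys "description", "quantity", and "amount".\n\n{text}'
)

def call_llm(prompt: str) -> str:
    """Placeholder for the model call."""
    raise NotImplementedError

def extract_line_items(invoice_text: str) -> list[dict]:
    # Turn messy PDF text into structured line items.
    raw = call_llm(EXTRACTION_PROMPT.format(text=invoice_text))
    return json.loads(raw)

def map_to_taxonomy(description: str) -> str:
    # Deliberately naive string-similarity mapping; a production system
    # would use embeddings or a fine-tuned classifier instead.
    return get_close_matches(description, CANONICAL_TAXONOMY, n=1, cutoff=0.0)[0]
```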

Phase 1: Classical ML

The first phase of Glean’s implementation used classical ML approaches, chosen over rule-based systems for their consistency and ability to generalize. This initial phase involved migrating several in-house ML models to a Cake-managed Kubeflow/KServe infrastructure, marking the beginning of Glean’s partnership with Cake.

Cake helped solve Glean’s data labeling challenges in classical ML, particularly in training line item models and associating bounding boxes on PDFs with labels. However, the emergence of capable LLMs prompted a strategic shift away from classical ML. 

“Cake offers a platform with a set of open source technologies that cover the full lifecycle of an ML project.”

— Jeremy Johnson, Director of Data Science and Engineering, Glean.ai

Phase 2: External Proprietary LLMs

The second phase marked Glean’s transition to proprietary, commercial LLMs, including models from OpenAI and Google’s Gemini Pro. This gave Glean access to the latest AI breakthroughs, but it also presented its own challenges.

While these solutions were easy to get started with, they ultimately proved unstable and expensive. Reliability challenges with one major vendor led to frequent timeout errors. Given customer expectations of quick turnaround on invoice processing, these errors forced the Glean team to rely more heavily on costly human processing than on the models themselves. Other major vendors were simply too expensive to operate at scale. Some Glean customers were also uncomfortable with sending data, embeddings, and queries out to external service providers.

Outside of reliability, performance, and security concerns, the proprietary LLM approach was simply too generic to establish the strategic moat Glean sought to create in their product. As Johnson described, “The reality is external models aren’t trained on your data. Either they aren’t as accurate as you need them to be, or the fine-tuning process is extraordinarily costly.”  

Phase 3: Insourced LLMs

The partnership with Cake evolved significantly during this period, as Glean began implementing an ensemble of large language models to improve extraction and mapping processes through retrieval-augmented generation (“RAG”). Ultimately, Glean wanted to get to a point where they could reduce reliance on the costly human-in-the-loop aspect of their product with a reliable, customized model. 

This collaboration culminated in Glean’s third phase: developing an in-house, fine-tuned LLM as part of their long-term strategy.

“We found long-term value in building our own in-sourced model with Cake. There is a strategic moat you can develop with your own IP – along with lower costs, greater security, better reliability, and improved accuracy when compared with external proprietary LLMs.”

— Jeremy Johnson, Director of Data Science and Engineering, Glean.ai

Why move to an in-house LLM?

Glean decided to bring core generative AI capabilities in-house for five primary reasons: a strategic moat, enhanced data privacy, greater reliability, improved model accuracy, and lower operating costs.

  1. Strategic moat: Building in-house gave Glean a proprietary advantage. Johnson described this differentiator: “We like having more of our own IP — having in-house, fine-tuned models allows us to build more of a moat.”

  2. Data privacy: Although Glean mainly processes receipts, invoices, and relatively standard financial documents that don't contain much PII or sensitive data, some customers insist that none of their data be sent to external providers. An in-house model offers a more secure and compliant solution.

  3. Reliability: By developing and hosting their own LLM in-house, Glean optimized for low latency, minimized timeout errors, and ensured consistent performance — critical factors when processing time-sensitive customer data and delivering a timely response.

  4. Model accuracy: Glean had started their work with in-house LLMs expecting accuracy that would be marginally higher than 70%. Results have been significantly better—testing above 90% accuracy. As Johnson described, “In-house model accuracy has substantially exceeded our expectations, as well as what we’ve seen with other models.”

  5. Operating costs: Advances in model accuracy directly limit the need for human-in-the-loop (HITL) intervention, one of the more costly elements of running an LLM-powered service in production. Raising accuracy from 70% to over 90% shrinks the share of documents requiring human review from roughly 30% to under 10%. Compared with the estimated need for HITL intervention with proprietary models, the Glean team estimates that managing their model in-house will save more than 30% (and eventually up to 70%) of HITL operating costs.

What is Glean and what does it do?

At its core, Glean is an AI-powered enterprise search platform designed to tackle one of the biggest challenges in the modern workplace: finding information. Think about how many different applications your team uses every day—Slack, Google Drive, Jira, Salesforce, and countless others. Important data gets scattered across all these silos, making it incredibly difficult to find the one document or conversation you need. Glean connects to all of these apps, creating a single, unified search experience. It intelligently understands your company’s unique language, people, and content, allowing you to ask natural questions and get highly relevant answers instantly. Instead of spending hours digging through different platforms, your team can find what they need in seconds, which is a massive productivity gain.

The platform goes beyond simple keyword matching by using AI to understand the context and intent behind your search. It knows who you are, what projects you’re working on, and who you collaborate with, so it can surface the information that’s most relevant to you personally. This personalized approach ensures that the results aren't just accurate but are also tailored to your specific role and needs within the organization. By breaking down information silos and making institutional knowledge accessible, Glean empowers employees to work faster and make more informed decisions without the usual friction of hunting for data.

An AI-powered enterprise search platform

Glean acts as a smart, central hub for all of your company's knowledge. It securely integrates with dozens of workplace apps to index every document, message, ticket, and file your organization creates. When an employee performs a search, Glean doesn't just look for keywords; it uses sophisticated AI to understand the relationships between different pieces of information. This is powered by a deep understanding of your company's internal workings, including its structure, teams, and projects. The result is a search experience that feels intuitive and almost magical, delivering precise answers pulled from the most authoritative sources available across your entire digital workplace.

Going beyond search with Glean Assistant and Glean Agents

Glean is more than just a search bar; it's an active participant in your workflow. With features like Glean Assistant, the platform can proactively answer questions, summarize long documents, and even draft content based on your company's internal knowledge. Imagine asking an assistant to "summarize the key takeaways from the Q3 marketing report" and getting a concise, accurate summary in seconds. Glean Agents take this a step further by automating entire workflows. You can create agents to handle repetitive tasks, like generating weekly project updates or finding the right expert on a specific topic, freeing up your team to focus on more strategic work.

Understanding Glean's flexible multi-LLM strategy

One of the most powerful aspects of Glean's platform is its flexible, multi-LLM strategy. Instead of tying its entire system to a single large language model from one provider, Glean remains model-agnostic. This means it can leverage a wide range of the best-performing models from different developers, including OpenAI, Anthropic, Google, and Meta. This approach has several key advantages. First, it prevents vendor lock-in, giving Glean the freedom to adapt as the AI landscape evolves. Second, it allows the platform to use the right tool for the right job. Some models are better at creative text generation, while others excel at logical reasoning or data analysis. By having access to multiple LLMs, Glean can optimize performance and deliver the highest quality results for any given task.

This strategy is made possible through a centralized system that simplifies the management of these diverse models. For companies looking to build their own AI solutions, adopting a similar approach can be complex. It requires significant engineering effort to integrate, manage, and maintain different models. This is where a comprehensive platform like Cake can be invaluable, as it provides the entire managed stack needed to deploy and operate multiple open-source AI models efficiently. Glean’s strategy highlights a best practice in the AI world: flexibility and choice are key to building a resilient and future-proof AI infrastructure that can consistently deliver value.

Why Glean is model-agnostic

Glean’s model-agnostic stance is a strategic choice designed to maximize performance and reliability. The world of large language models is changing at an incredible pace, with new and improved versions being released constantly. By not committing to a single model, Glean ensures it can always integrate the latest and greatest technology without being stuck with an outdated or underperforming solution. This flexibility also allows for greater cost-efficiency, as Glean can choose the most economical model that meets the performance requirements for a specific task, rather than using an expensive, high-powered model for everything.

The Glean Model Hub: A marketplace for models

To manage its diverse collection of LLMs, Glean uses its Model Hub. You can think of the Model Hub as a centralized marketplace where different AI models are made available within a single, secure platform. This system abstracts away the complexity of dealing with multiple APIs and environments, making it much simpler for Glean to use many different AI models. For administrators, the Model Hub provides a clear and straightforward way to see which models are available, configure their usage, and manage access, all without needing deep technical expertise in each model's specific architecture.
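Glean's Model Hub internals aren't public, but the pattern it describes — one registry in front of many providers — is a common one. The sketch below is a generic illustration of that abstraction; the names and signatures are invented for the example.

```python
# Generic sketch of a "model hub" registry: one interface in front of many
# providers. Names and signatures are invented for illustration.
from typing import Callable, Protocol

class ChatModel(Protocol):
    def complete(self, prompt: str) -> str: ...

_REGISTRY: dict[str, Callable[[], ChatModel]] = {}

def register(name: str):
    """Decorator that adds a model factory to the hub under `name`."""
    def wrap(factory: Callable[[], ChatModel]) -> Callable[[], ChatModel]:
        _REGISTRY[name] = factory
        return factory
    return wrap

def get_model(name: str) -> ChatModel:
    # Callers select a model by name; the hub hides each provider's API.
    return _REGISTRY[name]()
```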

So, which LLMs can you actually use with Glean?

Glean's commitment to a multi-LLM strategy means users have access to a curated selection of top-tier models from the biggest names in AI. This variety ensures that organizations can choose the model that best fits their specific needs for performance, cost, and compliance. The platform integrates seamlessly with leading model providers, allowing teams to leverage cutting-edge AI without the headache of managing individual integrations. This approach not only provides flexibility but also future-proofs the platform, as new and emerging models can be added to the hub as they become available. The ability to select from a roster of powerful LLMs is a significant advantage for any business looking to get the most out of its AI investments.

Supported models from OpenAI

As a pioneer in the generative AI space, OpenAI's models are a core part of Glean's offerings. Users can access various models from the GPT family, which are well-known for their powerful reasoning and text generation capabilities. This integration allows teams to leverage the same technology that powers tools like ChatGPT, but applied securely to their own internal company data. Whether it's for summarizing complex documents or drafting internal communications, OpenAI's models provide a robust and reliable option within the Glean ecosystem.

Supported models from Anthropic

Glean also supports models from Anthropic, such as those in the Claude family. Anthropic is known for its focus on AI safety and creating models that are helpful, honest, and harmless. For organizations with stringent compliance or ethical considerations, having access to Claude models can be a major benefit. These models are particularly adept at handling nuanced conversations and providing thoughtful, detailed responses, making them an excellent choice for tasks that require a high degree of accuracy and contextual understanding.

Supported models from Google

With its deep roots in AI research, Google offers a powerful suite of models that are also available through Glean. This includes models from the Gemini family, which are designed to be multimodal, meaning they can understand and process information from text, images, and other formats. Integrating Google's models gives Glean users access to cutting-edge AI that excels at complex reasoning and data synthesis. This is especially useful for teams that work with diverse types of information and need an AI that can connect the dots across different content formats.

Supported models from Meta

For organizations that prioritize open-source solutions, Glean's support for models from Meta, like the Llama family, is a key feature. Open-source models offer greater transparency and customizability, which can be a critical factor for companies with specific technical requirements or those who want more control over their AI stack. By including powerful open-source options, Glean provides a level of flexibility that caters to a wide range of enterprise needs, from startups to large corporations that prefer to build on open technologies.

The core technology connecting LLMs to your data

The real magic of Glean isn't just its access to powerful LLMs; it's how the platform securely connects those models to your company's private data. Simply pointing a generic LLM at your internal documents would be a recipe for disaster, leading to inaccurate answers (known as "hallucinations") and serious security risks. Glean avoids this by using a sophisticated architecture built on two key components: Retrieval-Augmented Generation (RAG) and a unique Knowledge Graph. This combination ensures that every answer generated by the AI is grounded in your actual company information, making the responses trustworthy, relevant, and secure. It’s a complex process that requires careful orchestration of data pipelines and AI workflows.

This is another area where the principles of a managed platform shine. Building a production-ready RAG system from scratch is a major undertaking that involves integrating vector databases, search indexes, and LLMs. For teams that want to build similar custom solutions, platforms like Cake provide the pre-built components and managed infrastructure to accelerate development. By handling the underlying complexity, these platforms allow data science and engineering teams to focus on fine-tuning the user experience instead of getting bogged down in infrastructure management, much like Glean does for its own customers.

How RAG ensures answers are accurate and secure

Retrieval-Augmented Generation, or RAG, is the process Glean uses to make sure its AI provides factual answers based on your company's data. When you ask a question, the system first searches your company's connected apps to find the most relevant documents and information. This "retrieval" step acts as a fact-checking mechanism. Then, it passes that verified information to the LLM along with your original question, instructing the model to generate an answer based only on the provided context. This grounds the AI, dramatically improving accuracy and preventing it from making things up.
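In code, the retrieve-then-generate loop looks roughly like the sketch below. It is a minimal illustration of the RAG pattern just described, not Glean's implementation; `search_index` and `generate` are hypothetical stand-ins for the retrieval backend and the language model.

```python
# Minimal sketch of the retrieve-then-generate loop described above.
# `search_index` and `generate` are hypothetical stand-ins.
def answer_with_rag(question: str, search_index, generate) -> str:
    # 1. Retrieval: find the most relevant passages across connected sources.
    passages = search_index.query(question, top_k=5)
    context = "\n\n".join(p.text for p in passages)
    # 2. Generation: constrain the model to the retrieved context only.
    prompt = (
        "Answer the question using ONLY the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return generate(prompt)
```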

The role of Glean's unique Knowledge Graph

Supporting the RAG process is Glean's Knowledge Graph. This graph is a dynamic map of all your organization's information and the relationships within it. It understands who reports to whom, which teams work on which projects, and what acronyms mean in your company's specific context. When the RAG system retrieves information, the Knowledge Graph provides crucial context that helps it find the most relevant and authoritative sources. This deep, contextual understanding is what allows Glean to deliver personalized and highly accurate answers that a generic search tool simply couldn't match.
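Glean's Knowledge Graph is proprietary, but the general idea — using relationship data to prefer sources close to the asking user — can be sketched with an ordinary graph library. The nodes, edges, and scoring rule below are invented for illustration.

```python
# Invented-for-illustration sketch: boost documents "close" to the asking
# user in a relationship graph. Glean's Knowledge Graph is proprietary.
import networkx as nx

graph = nx.Graph()
graph.add_edge("alice", "project-atlas", relation="works_on")
graph.add_edge("atlas-design-doc", "project-atlas", relation="belongs_to")

def context_boost(user: str, doc: str) -> float:
    """Score a document higher the closer it sits to the user in the graph."""
    try:
        distance = nx.shortest_path_length(graph, user, doc)
        return 1.0 / (1 + distance)
    except (nx.NetworkXNoPath, nx.NodeNotFound):
        return 0.0
```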

How you can customize the AI experience for your team

Glean understands that every organization is different, so it provides a suite of tools that allow administrators to tailor the AI experience to their team's specific needs. This level of customization is crucial for ensuring that the platform aligns with a company's workflows, security policies, and communication style. From choosing which AI models to use to setting behavioral guidelines, Glean puts control in the hands of the user. This ensures that the AI acts as a natural extension of the team, reflecting the company's unique culture and operational requirements. These customization options empower businesses to create a bespoke AI environment that feels both powerful and perfectly integrated into their daily operations.

Choosing your preferred models in the Admin Console

Within the Glean Admin Console, you have the power to decide which LLMs are used for different tasks or by different teams. For example, you might configure the marketing team to use a model that excels at creative writing, while the engineering team uses a model optimized for code analysis. This granular control allows you to pick the best AI model for the job and manage costs effectively. The intuitive interface makes it easy to switch between models or set defaults without needing to write a single line of code.

Using your own keys with major cloud providers

For companies that have existing relationships and negotiated rates with cloud providers like AWS, Azure, or Google Cloud, Glean offers the ability to use your own API keys. This means you can run AI workloads through your own cloud accounts, which provides greater control over security, compliance, and billing. It allows you to consolidate your cloud spending and take advantage of any enterprise discounts you may have, offering a more integrated and cost-effective way to manage your AI usage within the platform.

Setting custom instructions to guide AI behavior

To ensure the AI's output aligns with your company's tone and standards, Glean allows you to set custom instructions. These are system-level directives that guide the AI's behavior across all interactions. For instance, you could instruct the AI to always adopt a formal tone, to structure summaries with bullet points, or to avoid using specific jargon. This feature helps maintain consistency in communications and ensures that the AI-generated content feels like it truly comes from your organization, reinforcing your brand and culture.
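A common way platforms implement system-level directives like these is to prepend them to every model call. The snippet below is a hypothetical sketch of that mechanism, not Glean's implementation; the instruction text is invented for the example.

```python
# Hypothetical sketch: system-level custom instructions prepended to every
# model call. The directive text is invented for the example.
CUSTOM_INSTRUCTIONS = (
    "Always use a formal tone. Structure summaries as bullet points. "
    "Avoid internal project codenames and unexplained jargon."
)

def build_messages(user_prompt: str) -> list[dict]:
    # Standard chat-message convention: the system message steers behavior
    # across all interactions without the user restating it.
    return [
        {"role": "system", "content": CUSTOM_INSTRUCTIONS},
        {"role": "user", "content": user_prompt},
    ]
```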

Important operational details to consider

While the capabilities of a platform like Glean are impressive, it's also important to understand the operational details that come with managing an enterprise-grade AI solution. This includes navigating the pricing structures, understanding how different models are tiered, and planning for the inevitable lifecycle of AI models as new versions are released. A successful implementation requires more than just flipping a switch; it involves thoughtful planning around cost management, performance optimization, and long-term maintenance. Being aware of these factors from the outset will help you make informed decisions and ensure you get the most value from your investment in AI technology over time.

Effectively managing these operational aspects is critical for controlling costs and ensuring consistent performance. For instance, the choice of an LLM can have a significant impact on both latency and your monthly bill. As the Glean.ai case study showed, bringing models in-house can lead to substantial operational savings, potentially reducing human-in-the-loop costs by over 30%. This highlights the importance of having a clear strategy for model selection and lifecycle management, whether you're using a turnkey solution like Glean or building your own AI applications on a platform like Cake.

Model tiers, FlexCredits, and cloud restrictions

Most AI platforms, including Glean, offer different tiers of models. You might have access to a standard, cost-effective model for everyday tasks, as well as a premium, high-performance model for more complex queries. Usage is often managed through a credit system, where different actions consume a certain number of credits based on the model's complexity. It's also important to be aware of any cloud-specific restrictions or requirements, as some features or models may only be available if you are hosted on a particular cloud provider.
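As a hypothetical illustration of how such a credit system translates into a monthly bill, consider the toy calculation below. The tier names and per-query costs are invented for the example; consult Glean's documentation for actual FlexCredit rates.

```python
# Hypothetical illustration of credit-based metering. Tier names and
# per-query costs are invented, not Glean's real FlexCredit rates.
CREDITS_PER_QUERY = {"standard": 1, "premium": 5}

def monthly_credits(standard_queries: int, premium_queries: int) -> int:
    return (standard_queries * CREDITS_PER_QUERY["standard"]
            + premium_queries * CREDITS_PER_QUERY["premium"])

# e.g. 10,000 routine queries plus 500 complex ones:
# monthly_credits(10_000, 500) == 12_500 credits
```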

Managing the model lifecycle and retirement schedules

The AI landscape moves fast, and LLMs are constantly being updated. This means that older model versions are eventually retired. A key operational task is managing this lifecycle. You'll need to stay informed about model retirement schedules and have a plan to test and migrate your workflows to newer versions. This ensures that your AI tools continue to perform optimally and benefit from the latest improvements in accuracy and efficiency. Proactive management of the model lifecycle is essential for maintaining a reliable and effective AI-powered workplace.

Managing AIOps with Cake

Cake has been at the foundation of Glean’s in-house modeling efforts. Johnson explained, "The most recent work on our in-house LLM reflects how Cake has become a key technology choice for advancing our AI R&D initiatives. We've had a longstanding dialogue about different iterations of the product—Cake understands our use case and the problems we'd like to address with a large language model."

Glean’s ML infrastructure evolved beyond basic inference as the strategy shifted toward developing an in-house model. The current stack emphasizes stability and distributed inference that runs asynchronously. Its core components include:

  • Kubeflow: Core platform that serves as the foundation for the ML infrastructure

  • KServe: Primary inference tool integrated with multiple models

  • TriggerMesh: Event-driven tool that manages data processing by reading from queues and initiating inference jobs based on requests, supporting the asynchronous nature of the pipeline

  • pgvector: Current vector store solution, chosen to replace Milvus for its schema management controls (see the query sketch after this list)

  • Argo CD: Deployment tool implemented at the beginning of Glean’s partnership with Cake; it proved so successful that it was adopted organization-wide, significantly improving DevOps processes and deployment management across departments
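For a sense of what pgvector looks like in practice, here is a brief similarity-search sketch using the psycopg2 driver. The table and column names are illustrative, not Glean's schema.

```python
# Similarity-search sketch with pgvector via the psycopg2 driver. The table
# and column names are illustrative, not Glean's schema.
import psycopg2
from pgvector.psycopg2 import register_vector

conn = psycopg2.connect("dbname=invoices")
register_vector(conn)  # teach the driver about the `vector` column type

def nearest_line_items(embedding, k: int = 5):
    # `embedding` is a numpy array; `<->` is pgvector's L2-distance operator.
    with conn.cursor() as cur:
        cur.execute(
            "SELECT id, description FROM line_items "
            "ORDER BY embedding <-> %s LIMIT %s",
            (embedding, k),
        )
        return cur.fetchall()
```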

Cake has also introduced several key components for fine-tuning Glean’s in-house models, including: 

  • Llama Factory: A visual interface for fine-tuning, training, and evaluating large language models, with easy-to-use management of datasets and hyperparameters such as epochs, context window, and batch size

  • vLLM: Provides a high-performance inference framework specifically tailored for large language models. By enabling faster and more efficient inference, it enhances throughput and reduces operational costs for AI-driven applications.

  • Llama 3.2: The latest iteration of the Llama model family, delivering greatly improved generative capabilities with enhanced accuracy, speed, and resource efficiency.

Using Llama Factory with Llama models and vLLM, Glean is able to rapidly train and deploy the most cutting-edge open source large language models. Llama Factory streamlines the training and tuning process, while vLLM delivers high-speed inference for optimized AI performance. 
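As a quick illustration of the inference side, the following is a minimal vLLM sketch for offline batch generation with a Llama 3.2 model. The model name and sampling settings are illustrative, not Glean's production configuration.

```python
# Minimal vLLM sketch: offline batch generation with a Llama 3.2 model.
# Model name and sampling settings are illustrative.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.2-3B-Instruct")
params = SamplingParams(temperature=0.0, max_tokens=256)

prompts = ["Extract the line items from this invoice text as JSON: ..."]
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```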

As Glean’s infrastructure continues to evolve, Cake has helped balance specific needs with flexibility and scalability. As Johnson outlined, “The process of developing and fine-tuning, as well as making core decisions about model operations—we've been able to defer largely to the expertise at Cake. The bulk of the informed decision-making has been driven by our trust in the expertise of the folks we're working with at Cake.”

Saving Glean two-and-a-half FTEs

“If we were to have brought on contractors for this project work instead of Cake, it would have taken about two-and-a-half FTEs.”

— Jeremy Johnson, Director of Data Science and Engineering, Glean.ai

Glean has a small in-house team of ten technical staff across front-end, back-end, and data science and relies on Cake to augment this internal team. “Having specialized resources and added levels of expertise from Cake has been extremely helpful for implementing ML and LLM-related projects,” according to Johnson.

As an AIOps platform, Cake streamlined Glean’s initial implementation by accelerating experimentation and technology selection. As Tarasiuk explained, “Testing different labeling processes required significant AIOps effort. Since we don't have a huge team, we were fortunate that Cake helped considerably with the implementation and infrastructure roadmap. It was quite an intense period.”

Johnson continued, “Our relationship with Cake has included both services and thought leadership components in addition to the platform. They've served as trusted advisors in implementing several projects with us.”

Cake as an LLM project accelerator

One of the challenges of working in the data science and AI space is that the field changes so rapidly that even experienced people need to learn new skills frequently.

As Johnson outlined, “Having conducted ML job searches for someone with large language model experience, it is rough. There's significant time invested in hiring and ramping people up. Having an ongoing relationship with the Cake team is super helpful. They understand our projects and requirements and have deep expertise in the latest technologies.”

Even if teams can find the right person at the right time, there’s still the chance they lack familiarity with key portions of a project. Tarasiuk echoed the challenges in hiring, noting, “It's a significant challenge to find someone with such diverse domain experience. With Cake, we get that range of expertise because they have many different supported components. I don't think we could find a single person similarly experienced across all those areas.”

Working with Cake, the Glean team moved key parts of their product in-house, saving time, money, and resources. To learn more about moving LLM projects in-house, sign up for a demo.

Frequently asked questions

Why did Glean switch from well-known proprietary models to an in-house solution? While starting with big-name models was a quick way to get going, Glean eventually ran into issues with reliability, high costs, and data privacy concerns from their customers. Building their own model gave them full control. This allowed them to create a unique feature that competitors couldn't easily copy, ensure better security, and dramatically improve accuracy, all while lowering their long-term operating expenses.

What was Cake's specific role in this project? Cake provided both the foundational platform and the expert partnership Glean needed. Think of it as the complete toolkit and the instruction manual. The platform offered all the necessary open-source components in one managed package. Beyond the software, the Cake team acted as trusted advisors, helping Glean make key technical decisions and navigate the complexities of building and deploying their custom model.

Does my company need a large team of AI experts to build an in-house model like this? Not necessarily, and that's a key reason platforms like Cake exist. Glean has a lean technical team. By using Cake, they avoided having to hire the equivalent of two and a half full-time specialists for the project. The platform handles the heavy lifting of managing the AI infrastructure, which allows a smaller team to focus on the model itself and achieve results that would typically require a much larger staff.

The case study mentions many open-source tools. Do we need to master all of them to work with Cake? Absolutely not. The value of the Cake platform is that it integrates all these powerful open-source tools into a single, managed solution. You get the benefits of cutting-edge technologies without the significant engineering effort required to set up, configure, and maintain them yourself. Cake handles the underlying complexity so your team can focus on building and fine-tuning your AI application.

Glean's project focused on processing financial documents. Is the Cake platform only for this type of use case? While Glean's story is a great example of document processing, the platform is designed to be versatile. The core challenge Glean solved—deploying, managing, and fine-tuning a custom large language model—is fundamental to many different AI applications. Whether you're building a sophisticated chatbot, a content generation tool, or a complex data analysis system, Cake provides the essential infrastructure to accelerate your project.
