Model Context Protocol (MCP) - A Deep Dive
Introduction
Model Context Protocol (MCP), released by Anthropic in November 2024, has taken the Generative AI (GenAI) world by storm, claiming to offer a new way for users to provide data and tools to LLMs in order to build agentic systems. In a nutshell, agentic systems work by providing an LLM with prompts and the ability to access tools via function calling. Thus, when a user queries an agent, the system can access data and tools (such as APIs), which the LLM then uses as context in its response. MCP offers a new, standardized format for doing so, centered around the idea of tools being self-contained, reusable across systems, and standardized in how they connect to the agent.
To more fully understand the context of what MCP is adding to the current GenAI landscape, let's back up a little. LLMs, or Large Language Models, are extremely large neural networks trained to generate text given an input. The first wave of popular LLMs hit in 2022, when OpenAI released ChatGPT, a chatbot with then-unparalleled humanlike conversational ability. Even as new waves of releases from OpenAI and others have shown vast improvements in LLM performance, many users have run up against the limitations of generalized text generation. Retrieval Augmented Generation, or RAG, proposed a solution: provide embedded context documents to LLMs so that they can answer questions about specific subject matter. From there, the idea of context was broadened to any service that could send data back via an API. Thus, tool calls were developed and added to agents.
MCP builds upon agentic structures by providing a protocol for how these services provide data to LLMs. Simply put, MCP is a standardized, open-source format for allowing LLMs to interact with data sources and APIs. While the result does not sound so different from a typical agent, from a developer standpoint, MCP offers three main improvements:
- Organization of tool calls by providing a modular system for packaging them, grouped by service.
- Standardization of these services, so that they all plug into the code base the same way, meaning less time to integrate.
- Reusability of these services, meaning that once they are configured, there is no need to re-code them in a different system, saving developers time and energy.
Thus, the developer can easily connect the AI of their choice to as many different services and tools as they want and build as many different agents as they want with a mix of the same tools, resulting in more complex and customizable systems.
MCP does not just promise standardization, though; it is rapidly picking up buy-in from a range of users in a variety of ways. The release has been closely linked to Anthropic's Claude Desktop application. By editing a configuration file that the desktop app itself can open, even non-technical users can quickly get simple MCP setups running and start asking Claude questions while it talks to their tools. A small number of external software integrations have been released by Anthropic, including GitHub, Slack and Puppeteer (which can browse the internet for the user). Clients are also accessible through a growing number of integrations with other UIs, including both official and community releases. This has made MCP highly accessible to the personal user who wants their own setup.
On the developer side, Anthropic has also released SDKs in a variety of languages for quickly building custom MCP components on the backend, most notably in TypeScript and Python, and a package called FastMCP further simplifies and streamlines the building process for TypeScript and Python users. This supports the development of custom MCP components that can be run entirely through the developer's setup, instead of a third-party UI like Claude Desktop. Organizations and community members have been rapidly building MCP servers as well, with integrations ranging from tech services like Grafana and Milvus to consumer apps like PayPal and Gmail.
In this article, we'll dive into the more technical side of MCP, with a focus on how to build custom setups for production. We will then talk about the implications and potential for MCP in business applications. While we address the technical aspects from a mostly code-agnostic viewpoint, we did all experimentation and testing in Python.
Technical details
MCP follows a standard client-server architecture and consists of several main components: the host (which includes the client), servers, and data sources.
The host is the main LLM-powered application running everything, such as Claude Desktop, an IDE, or a Command Line Interface (CLI). It provides the direct interface for the user and contains the client, which is the source code that directs communication between the LLM and servers. Servers are lightweight programs that expose the desired services via the protocol and feed data back to the LLM. In the context of MCP, the definition of data source is quite broad and encompasses any source that contains the context we want to pass to the host LLM. Data sources can consist of files, databases, APIs, etc., and can be local or on the internet.
Host
As mentioned above, the host is the application through which the user connects to the servers. From a user perspective, there are two parts: the user interface (UI) and the client. When using integrations like Claude Desktop, the client setup, connection to the LLM and server integration are abstracted away. However, a custom client can also be built using any LLM, in a variety of languages. To delve more into how clients work and how they interface with servers, we will focus on the structure of custom clients and their necessary components.
Clients
Most clients use the following basic structure:
- Client initialization. This is where the client starts up and connects to the LLM.
- Server connections. These are scripts, functions, and configuration files that manage connections to each server. The FastMCP package has many tools for easily building servers and integrating them with clients.
- Query processing. These functions handle tool interaction, including calling APIs and services, querying data and executing tools. They might also include system prompts that inform the agent how to handle tools and resources.
- Interactive interface. This code defines the connection to the UI or CLI that is communicating with the user and handles the passing of data to and from the interface.
- Resource management. This consists of resource cleanup, error handling and graceful shutdown.
In addition to the above structure, a custom client offers the flexibility to customize tool calling, response handling, and interface management.
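To make this structure concrete, below is a minimal sketch of a custom client using the official Python SDK. It assumes a local server script named my_server.py and a tool called example_tool, both hypothetical, and it omits the LLM and interface layers, which vary by setup; in a full client, the tool name and arguments would come from the LLM's function-calling output rather than being hard-coded.

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Hypothetical local server script; replace with the server you want to connect to.
server_params = StdioServerParameters(command="python", args=["my_server.py"])

async def main():
    # Server connection: launch the server as a subprocess and communicate over stdio.
    async with stdio_client(server_params) as (read, write):
        # Client initialization: open a session and perform the MCP handshake.
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Discover what the server exposes so the LLM can choose among its tools.
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])

            # Query processing: forward the tool call the LLM has chosen.
            result = await session.call_tool("example_tool", arguments={"query": "hello"})
            print(result)
    # Resource management: the context managers above handle cleanup and shutdown.

if __name__ == "__main__":
    asyncio.run(main())
```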
Servers
Servers are how clients connect to applications in a uniform way. Each server should connect to one and only one service. This convention allows for a more modular construction of agents, as opposed to typical agentic frameworks, where the necessary tool calls are all built into one system.
An initial set of servers was developed in collaboration with Anthropic to serve as a starting point for developers. Since the launch of MCP, both other companies and community users have developed third-party integrations to be used by anyone. Some well-established servers also have official Docker images that can be used to run the server. Servers that require user credentials store them as environment variables.
There are three main components of servers: tools, resources and prompts. Each component has a different way of interacting with the user and the LLM. While many servers can exist within an MCP setup, each server has its own set of tools, resources, and prompts that belong to it. For example, all the GitHub server's functionalities revolve around repository and user actions, such as managing pull requests, issues, etc. Meanwhile, Puppeteer handles browsing the internet, which, of course, includes GitHub's website.
While the GitHub server is best for managing one's own repositories, Puppeteer can be used, for example, to search for the appropriate repository on Google. The LLM can then use the information from that search to populate a request to the GitHub server. However, the user does not have to figure out how to find this information—the LLM can decide based on simple requests. The LLM might even inform the user that they need the exact repository name to create an issue.
Tools
Tools are functions that perform specific actions. Each server integration offers a fixed set of tools, which can be found in the server's documentation. Most tools require user-provided arguments about the specific data, user accounts, or information needed to fulfill the request. Tools are controlled by the LLM, which means the LLM, not the user, decides whether a tool is needed and, if so, which tool or tools to use. Therefore, while the server handles the actual fetching and calling of the tool, the client side must have appropriate handler functions and system prompts that tell the LLM when to use the tools and how to feed that information back to the user, so that the tools integrate smoothly into the MCP setup.
Tools receive information structured in a dictionary format containing the name of the tool and its arguments. When prompted to do a task, the LLM decides the best way to accomplish it; if the user has not provided the appropriate arguments, the LLM will prompt the user for them. In a basic setup, the user formulates a tool call for the LLM, but system prompts and embedded tool calls can easily be incorporated so that the LLM formats its own call to the tool in dictionary format. When the LLM receives the data back from the tool, it passes it on to the user. This process can also be combined with other generative tasks such as summarizing.
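For illustration, a tool call targeting the GitHub server's issue-creation tool might be structured roughly as follows (the tool name and field values are examples for illustration, not an exact schema):

```python
# Hypothetical tool call as the LLM would structure it: a tool name plus arguments.
tool_call = {
    "name": "create_issue",
    "arguments": {
        "owner": "example-user",
        "repo": "example-repo",
        "title": "Fix broken link in README",
    },
}
```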
Continuing the example from above, Puppeteer has a tool called `puppeteer_navigate` that takes a URL as its input and navigates to that page. Meanwhile, GitHub's server offers a variety of tools for handling issues, branches, and pull requests, which usually require information such as the repository name, the user's account name, and any input text. The user can also create their own custom tools, which can both add new functionalities and call the default tools of a server. This can be extremely useful for combining multiple tool calls into one action, especially if the two tools usually need to be called together (e.g., committing changes to a branch and then opening a pull request).
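As a sketch of such a combined action, the hypothetical server below defines a single custom tool that chains two related git steps (committing staged changes and pushing the branch). The server name, tool behavior, and reliance on a local git installation are all assumptions; a production version would add error handling or call the GitHub server's own tools instead.

```python
import subprocess

from mcp.server.fastmcp import FastMCP

# Hypothetical custom server that wraps a pair of git actions in one tool.
mcp = FastMCP("git-helper")

@mcp.tool()
def commit_and_push(repo_path: str, branch: str, message: str) -> str:
    """Commit staged changes and push the branch in a single step."""
    subprocess.run(["git", "-C", repo_path, "commit", "-m", message], check=True)
    subprocess.run(["git", "-C", repo_path, "push", "origin", branch], check=True)
    return f"Committed and pushed '{message}' to {branch}"

if __name__ == "__main__":
    mcp.run()  # communicates with the client over stdio by default
```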
Resources
Resources are data sources connected to the server. While a tool handles direct, structured access to a specific piece of data, resources provide context for the LLM to use at its own discretion. Resource types range from filesystems and databases to API responses, screenshots and images, live system data, and more. Currently, resources can exist as UTF-8-encoded text or as raw binary data encoded in base64.
In server source code, resources are defined by a URI in the format `[protocol]://[host]/[path]`. For example, the URI for a GitHub repo would be `repo://owner-name/repo-name/path`. A key feature of resources is that their URIs can also be defined dynamically using resource templates. Resource templates are abstractions of URIs that allow arguments to be passed into the resource definition by the LLM, with the placeholder arguments denoted by curly brackets. For example, the GitHub server could use the template `repo://{owner-name}/{repo-name}/{path}` to call a file stored in any GitHub repository owned by any user, provided the proper credentials are supplied.
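As a sketch, a resource template for a hypothetical server exposing local project files could be declared with FastMCP as follows; the URI scheme, directory layout, and names are assumptions for illustration.

```python
from pathlib import Path

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("project-files")  # hypothetical server exposing local project files

# Resource template: {project} and {filename} are filled in when the resource is requested.
@mcp.resource("docs://{project}/{filename}")
def read_project_file(project: str, filename: str) -> str:
    """Return the contents of a file inside a local projects directory."""
    return (Path.home() / "projects" / project / filename).read_text(encoding="utf-8")

if __name__ == "__main__":
    mcp.run()
```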
Prompts
Prompts are user-controlled templates that can be added to servers to allow for more standardization of repetitive tasks. The user can define prompt arguments. Then, when interfacing with the agent, they can select a prompt and simply provide the arguments needed. For example, the user could create a prompt to quickly make commits to a specific GitHub repo they frequently update, so that they only need to provide a commit message and no other data.
In addition to speeding up repetitive or frequently used tasks, prompts can be chained together into multi-step workflows. For example, a prompt could contain a series of system-user exchanges in which a user asks whether there are any open pull requests in a repo. If there are none, the chain ends. If there are, the system returns their names to the user and asks if the user would like to view any of them. The user can then respond with their answer, and if it is yes, the system could prompt the user for their review. In this way, multi-step processes that require user interaction, and therefore cannot be handled by chaining multiple tool calls together in one custom tool, can still be streamlined by chaining user and system prompts.
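As a sketch, the quick-commit idea from above could be implemented as a server prompt roughly as follows; the repository name is a hard-coded placeholder, so the user only supplies a commit message.

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("git-helper")  # hypothetical server

@mcp.prompt()
def quick_commit(message: str) -> str:
    """Reusable prompt for committing to a frequently updated repository."""
    # The repository is baked into the prompt template, so only the message varies.
    return (
        "Commit the currently staged changes in the repository "
        f"'example-user/example-repo' with the commit message: {message}"
    )

if __name__ == "__main__":
    mcp.run()
```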
Client-server connections
The connection between the client and server is 1:1, meaning each user's client connects to the server directly, rather than going through a centralized hub. There are two potential setups for the server to communicate with the client:
- The client and server are running on the same machine. In this case, the server uses stdio (standard input/output) to communicate with the client.
- The client connects to the server via HTTP. After initial setup, the server can push messages to the client over a persistent connection using SSE (Server-Sent Events).
The SDK provides functions for handling both types of connection.
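As a rough sketch with the Python SDK, the same server can be started over either transport, and a client can connect to a running SSE server over HTTP; the URL below is a placeholder that assumes the SDK's default port and path.

```python
import asyncio

from mcp import ClientSession
from mcp.client.sse import sse_client
from mcp.server.fastmcp import FastMCP

# Server side: the same server can be started over either transport.
server = FastMCP("example")
# server.run()                 # stdio (default): the client launches the server locally
# server.run(transport="sse")  # SSE: the server listens over HTTP for remote clients

# Client side (SSE variant): connect to an already-running server over HTTP.
async def connect_over_sse():
    async with sse_client("http://localhost:8000/sse") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            print(await session.list_tools())

if __name__ == "__main__":
    asyncio.run(connect_over_sse())
```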
To connect servers to the host, the developer writes a server configuration file in JSON format, stored in the root of the host's application directory (Claude Desktop will navigate you straight to the config file with a single click). Each enabled server entry contains the command required to run the server (e.g., uv, npx, python, or docker), a list of required arguments, and any environment variables to be passed in. Access to local filesystems can also be configured as servers.
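For reference, a Claude Desktop configuration entry enabling the official GitHub server might look roughly like the following; the token value is a placeholder.

```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "<your-token>"
      }
    }
  }
}
```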
As mentioned in the introduction, Anthropic has developed a limited number of servers [linked here] to demonstrate MCP capabilities, including GitHub, Google Drive, and Slack. There is also a growing list of third-party services with official MCP servers [linked here], as well as community-built offerings [linked here]. Finally, the FastMCP Python/TypeScript package abstracts away a lot of the code required to construct a server, including tool, resource and prompt decorators for easy connections between the different parts of the server.
Implications and business applications
MCP has provided an open-source, standardized way to connect LLMs to data and services, in which the data and services can be easily reused across various AI systems. This is a further step towards abstraction of software development and tasks, where the tasks given to the LLMs now exist in a standardized, portable format.
As it is still in its infancy, MCP remains underdeveloped in some places, especially in many areas of custom component building and security. Though security is not the focus of our article, we highly recommend researching security concerns and taking care with data when experimenting or implementing third-party servers. We also recommend building custom servers and clients when integrating with sensitive data and credentials, as well as implementing rigorous session management and LLM guardrails to protect against hallucination.
As MCP develops, we expect to see more adoption of custom client and server setups, both because of the aforementioned security risks and because they can be tailored to an organization's needs while remaining portable within the organization. Just as there is a market of open-source servers, within a single organization servers can be the building blocks for connecting to resources and services such as databases and APIs. If a custom server is defined strictly within the confines of a single service, with a base set of simple tools, developers can build prompts and custom tools on top of it for more specific use cases, while the server itself can be used by anyone who needs to reach that resource. This means there is a one-time cost to building a tool that can then be connected to many agents.
As a result, a well-constructed, reusable server could lower costs by reducing developer time and reducing the amount of code and infrastructure to maintain, with the savings multiplying with each reuse. From a services perspective, we can emphasize the greater value this framework has over a typical agentic framework and make clear that the servers we build are part of the offering: the client is not just receiving an agent, they are receiving the reusable tool. This also leaves the door open for future collaborations if the client wishes to reuse their server with other agents.
For example, WWT's management consulting division might contract with an organization to build an agent for sales representatives to query about both the organization's own and its competitors' products. The agent could have an MCP framework with two servers: one that connects to an internal database to handle questions about the organization's products, and one that connects to the internet to look for information about the competitors, with a server-enabled prompt to guide the process of finding the right information. We could then locate the servers in a central, internal location. If another team, say, customer service, also needs an agent to answer questions about products as well as customer data, they can integrate the database server and connect both datasets as resources. They might also wish to add a server that connects to the application that handles customer tickets. Developers could take on this new project with less time and a lower cost, because they only need to build one new server. As time goes on, the analytics team might want to use the internet access server and the internal database for their own MCP agents. Therefore, the servers built for the initial use case have compounding value across the organization and, because of the standardized protocol, are easily compatible with other clients. Additionally, as the ones who built the server, we might have a continuing relationship with the client as we help them leverage its potential and build new ones.
From a business perspective, by centering agentic frameworks around MCP, we can offer not just the agent but also the servers, which become building blocks for us or the client to continue expanding their AI capabilities beyond the task at hand. Therefore, to best leverage MCP in all business contexts, it is important to understand not only the technical details of how to build a custom client-server setup but also how to build it in a way that gives the customer the most value from their servers. In addition, we must learn to communicate the new features MCP brings to the table and the specific value custom servers add as reusable, cost-saving and time-saving components.
Author's Note: I would like to thank Adrien Hernandez, Vinay Garg, Chris Carpenter and Andrew Xavier for their help and insight in writing this blog!