Model Context Protocol (MCP) - A Deep Dive
Introduction
Model Context Protocol (MCP), released by Anthropic in November 2024, has taken the Generative AI world by storm, claiming to offer a new way for users to provide data and tools to LLMs in order to build agentic systems. In a nutshell, agentic systems work by providing an LLM with prompts and the ability to access tools via function calling. Thus, when a user queries an agent, the system can access data and tools (such as APIs), which the LLM then uses as context in its response. MCP offers a new, standardized format for doing so, centered around the idea of tools being self-contained, re-usable across systems, and standardized in how they connect to the agent.
To more fully understand what MCP adds to the current Gen AI landscape, let's back up a little. LLMs, or Large Language Models, are extremely large neural networks trained to generate text given an input. The first wave of popular LLMs hit in 2022, when OpenAI released ChatGPT, a chatbot with then-unparalleled humanlike conversational ability. Even as new waves of releases from OpenAI and others showed vast improvements in LLM performance, many users realized the limitations of generalized text generation. Retrieval Augmented Generation, or RAG, proposed a solution: provide embedded context documents to LLMs so that they can answer questions about specific subject matter. From there, the idea of context was broadened to any service that could send data back via an API. Thus, tool calling was developed and standardized agent frameworks were built. RAG and agent frameworks can also be combined so that both context documents and tool calls inform an LLM's response.
MCP builds upon agentic structures by providing a protocol for how these services provide data to LLMs. Instead of calling tools directly, an MCP agent connects to servers, each of which represents a single service or piece of software one wishes to connect to (e.g., an internal database, GitHub, an internet browser, an API, etc.). The server is built in a standardized way and contains tools and resources that the LLM can call and use in its response. The term "MCP" refers to the standardization and set of rules that the agent and server must obey.
While the result does not sound so different from the agent, from a developer standpoint, MCP offers three main improvements:
- Organization of tool calls by providing a modular system for packaging them, grouped by service.
- Standardization of these services, so that they all plug into the code base the same way, meaning less time to integrate.
- Reusability of these services, meaning that once they are configured, there is no need to re-code them in a different system, saving developer time and energy.
Thus, the developer can easily connect as many different services and tools as they want to their LLM of choice, and build as many different agents as they want with a mix of the same tools. Essentially, MCP is an application of many principles of software engineering (modularization, standardization) into an agentic context.
MCP does not just promise standardization though – it is rapidly picking up buy-in from a range of users, in a variety of ways. The release has been closely linked to Anthropic's Claude Desktop application. By editing a configuration file that the desktop app itself can open, even non-technical users can quickly get simple MCP setups running and begin asking Claude questions while it talks to their tools. A small number of external software integrations have been released by Anthropic, including GitHub, Slack, and Puppeteer (which is able to browse the internet for the user). Clients are also accessible through a growing number of integrations with other UIs, including both official and community releases. This has made MCP highly accessible to the personal user who wants their own setup.
On the developer side, Anthropic has also released SDKs in a variety of languages for quickly building custom MCP components on the backend, most notably in TypeScript and Python, and a package called FastMCP further simplifies and streamlines the building process for TypeScript and Python users. This supports the development of custom MCP components that can be run entirely through the developer's own setup, instead of a third-party UI like Claude Desktop. Organizations and community members have been rapidly building MCP servers as well, with integrations ranging from tech services like Grafana and Milvus to consumer apps like PayPal and Gmail.
In this article, we'll dive into the more technical side of MCP, with a focus on how to build custom setups for production. We will then discuss the implications and potential of MCP in business applications. While we address the technical aspects from a mostly code-agnostic viewpoint, we did all experimentation and testing in Python.
Technical Details
MCP follows a standard client-server architecture and consists of three main components: the host (which includes the client), servers, and data sources.
The host is the main LLM-powered application running everything, such as Claude Desktop, an IDE, or a Command Line Interface (CLI). It provides the direct interface for the user, and contains the client, which is the source code that directs communication between the LLM and servers. Servers are lightweight programs exposing the desired services via context protocols and feeding data sources back to the LLM. In the context of MCP, the definition of data source is quite broad and encompasses any source that contains the context we want to pass to the host LLM. Data sources can consist of files, databases, APIs, etc., and can be local or on the internet.
Hosts
As mentioned above, the host is the application through which the user connects to the servers. From a user perspective, there are two parts: the user interface (UI) and the client. When using integrations like Claude Desktop, the client setup, connection to the LLM and server integration are abstracted away. However, a custom client can also be built using any LLM, in a variety of languages. To delve more into how clients work and how they interface with servers, we will focus on the structure of custom clients and their necessary components.
Clients
Most clients use the following basic structure:
- Client Initialization. This is where the client starts up and connects to the LLM.
- Server Connections. These are scripts, functions, and configuration files that manage connections to each server. The FastMCP package has many tools for easily building servers and integrating them with clients.
- Query Processing. These functions handle tool interaction, including calling APIs and services, querying data, and executing tools. It might also include system prompts to inform the agent how to handle tools and resources.
- Interactive Interface. This code defines the connection to the UI or CLI that is communicating with the user and handles the passing of data to and from the interface.
- Resource Management. This consists of resource cleanup, error handling, and graceful shutdown.
In addition to the above structure, a custom client offers the flexibility to customize tool calling, response handling, and interface management.
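For concreteness, here is a skeletal custom client in Python using the official MCP SDK. This is a sketch, not a prescribed implementation: the class and method names are our own, and the query-processing body is elided because it depends entirely on the chosen LLM provider.

```python
from contextlib import AsyncExitStack

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


class MCPClient:
    """Hypothetical skeleton mapping the five components above to methods."""

    def __init__(self):
        # Client initialization: this is also where a connection to the
        # LLM provider of choice would be configured.
        self.session: ClientSession | None = None
        self.exit_stack = AsyncExitStack()

    async def connect_to_server(self, server_script: str):
        # Server connections: launch the server as a subprocess and open
        # a session over stdio.
        params = StdioServerParameters(command="python", args=[server_script])
        read, write = await self.exit_stack.enter_async_context(stdio_client(params))
        self.session = await self.exit_stack.enter_async_context(
            ClientSession(read, write)
        )
        await self.session.initialize()

    async def process_query(self, query: str) -> str:
        # Query processing: send the query and the server's tool list to
        # the LLM, execute any tool calls it requests, return the answer.
        ...

    async def chat_loop(self):
        # Interactive interface: a simple command-line read-eval-print loop.
        while (query := input("> ").strip()) != "quit":
            print(await self.process_query(query))

    async def cleanup(self):
        # Resource management: close sessions and subprocesses gracefully.
        await self.exit_stack.aclose()
```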
Servers
Servers are how clients connect to applications in a uniform manner. Each server should connect to one, and only one, main service or piece of software. This convention allows for a more modular construction of agents than typical agentic frameworks, where the necessary tool calls are all built into one system. While the server-handling code discussed in the client section might vary from client to client, the code behind a server should be the same no matter the system.
A number of reference servers were developed in collaboration with Anthropic to serve as a starting point for developers. Since the launch of MCP, other companies and community users alike have developed third-party integrations for anyone to use. Some well-established servers also have official Docker images that can be used to run the server. Servers that require user credentials store them as environment variables.
There are three main components of servers: tools, resources, and prompts. Tools and resources are different formats for providing data to the LLM, while prompts are a structure for users to format tool calls quickly. While many servers can exist within an MCP setup, each server has its own set of tools, resources, and prompts that belong to it. For example, all the GitHub server's functionalities revolve around repository and user actions, such as managing pull requests, issues, etc. Meanwhile, Puppeteer handles browsing the internet, which of course includes GitHub websites.
While the GitHub server manages GitHub-native services like pull requests, actions, and issues, Puppeteer can be used, for example, to search for the appropriate repository on Google. The LLM can then use the information from that search to populate a request to the GitHub server. However, the user does not have to do the work of figuring out how to find this information – the LLM can decide based on simple requests. The LLM might even be the one to inform the user that they need the exact repository name to create an issue.
Tools
Tools are functions that perform specific actions. Each server integration offers a fixed set of tools, which can be found in the server's documentation, but custom tools can be added and can even combine the functions of other tools into one. Most tools require user-provided arguments specifying the data, user accounts, or other information needed to fulfill the request. Tools are controlled by the agent, meaning the client source code decides whether a tool is needed and which tool or tools to use. Therefore, while the server handles the actual fetching and calling of the tool, the client side must have appropriate handler functions and system prompts that tell the LLM what tools are available, what they do, and how to feed the results back to the user.
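As a minimal sketch using the FastMCP interface from the official Python SDK (the server name and the tool itself are illustrative, not part of any released server):

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-server")


@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two numbers and return the sum."""
    return a + b


if __name__ == "__main__":
    # Defaults to stdio transport for local client-server communication.
    mcp.run()
```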
Tools receive information structured in dictionary format, with the name of the tool and its arguments. When prompted to do a task, the LLM decides the best way to accomplish it on the user's behalf. If the user has not provided the appropriate arguments, the LLM will prompt the user for them. In a basic setup, the user formulates a tool call for the LLM, but system prompts and embedded tool calls can easily be incorporated so that the LLM formats its own call to the tool in dictionary format. When the LLM receives the data from the tool, it passes the results back to the user. This process can also be combined with other generative tasks, such as summarizing.
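A hypothetical example of that dictionary structure (the tool and argument names here are invented for illustration):

```python
# A tool call as the client would structure it: the tool's name plus a
# dictionary of the arguments it requires.
tool_call = {
    "name": "create_issue",
    "arguments": {
        "owner": "example-user",
        "repo": "example-repo",
        "title": "Fix broken link in README",
    },
}
```

A custom client would then forward this to the server with the SDK's `session.call_tool(tool_call["name"], tool_call["arguments"])`.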
Continuing the example from above, Puppeteer has a tool called `puppeteer_navigate` that takes a URL as its input and navigates to that page. Meanwhile, GitHub's server offers a variety of tools for handling issues, branches, and pull requests, which usually require information such as the repository name, the user's account name, and any input text. The user can also create their own custom tools, which can both add new functionalities and call the default tools of a server. This can be extremely useful for combining multiple tool calls into one action, especially if two tools usually need to be called together (e.g., committing changes to a branch and then opening a pull request).
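As a sketch of that idea, the hypothetical tool below wraps two GitHub operations into one action with FastMCP. The helper functions are placeholders standing in for calls to the GitHub API or the server's default tools, not real GitHub server tools.

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("github-helpers")


async def push_commit(repo: str, branch: str, message: str) -> None:
    """Placeholder: would call the GitHub API or a default server tool."""
    ...


async def open_pull_request(repo: str, branch: str, title: str) -> str:
    """Placeholder: would call the GitHub API or a default server tool."""
    return f"https://github.com/{repo}/pulls/1"


@mcp.tool()
async def commit_and_open_pr(repo: str, branch: str, message: str) -> str:
    """Commit changes to a branch, then open a pull request, in one action."""
    await push_commit(repo, branch, message)
    pr_url = await open_pull_request(repo, branch, title=message)
    return f"Opened pull request: {pr_url}"
```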
Resources
Resources are data sources connected to the server. While direct and structured access to a specific piece of data can be handled by a tool, resources provide context for the LLM to use at its own discretion. Resource types range from filesystems and databases to API responses, screenshots and images, live system data, and more. Currently, resources can exist as UTF-8-encoded text or as raw binary data encoded in base64.
In server source code, resources are defined by a URI in the format `[protocol]://[host]/[path]`. For example, the URI for a GitHub repo would be `repo://owner-name/repo-name/path`. A key feature of resources is that their URIs can also be dynamically defined using resource templates. Resource templates are abstractions of URIs that allow arguments to be passed into the resource definition by the LLM. The placeholder arguments are denoted by curly brackets. For example, the GitHub server could use the template `repo://{owner-name}/{repo-name}/{path}` to call a file stored in any GitHub repository owned by any user, provided the proper credentials are supplied.
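For illustration, a resource template like this can be declared with FastMCP. The URI scheme mirrors the example above, and the function body is a placeholder rather than a real GitHub integration.

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("repo-resources")


# The {owner}, {repo}, and {path} placeholders become arguments the LLM
# can fill in when it requests the resource.
@mcp.resource("repo://{owner}/{repo}/{path}")
def read_repo_file(owner: str, repo: str, path: str) -> str:
    """Placeholder: would fetch the file contents via the GitHub API."""
    return f"(contents of {path} in {owner}/{repo})"
```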
Figure: A flowchart showing a general example of how an MCP agent can be configured to handle tool and server information to provide the correct answer.
Prompts
Prompts are user-controlled templates that can be added to servers to allow for more standardization of repetitive tasks. The user can define prompt arguments. Then, when interfacing with the agent, they can select a prompt and simply provide the arguments needed. For example, the user could create a prompt to quickly make commits to a specific GitHub repo they frequently update, so that they only need to provide a commit message and no other data.
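A minimal sketch of such a prompt with FastMCP might look like the following; the repository name is a hypothetical stand-in for whatever repo the user frequently updates.

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("git-shortcuts")


@mcp.prompt()
def quick_commit(message: str) -> str:
    """Prompt template: commit to a fixed, frequently used repo."""
    # "example-user/daily-notes" is a hypothetical repository.
    return (
        "Commit all staged changes in the repository "
        f"example-user/daily-notes with the commit message: {message}"
    )
```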
In addition to speeding up repetitive or frequently used tasks, prompts can be chained together in multi-step workflows. For example, a prompt could contain a series of system-user exchanges in which a user asks whether there are any open pull requests in a repo. If there are none, the chain ends. Otherwise, the system returns their names and asks whether the user would like to view any of them. If the user says yes, the system could then prompt them for their review. Thus, multi-step processes that require user interaction, which cannot be handled by chaining multiple tool calls together in one custom tool, can still be streamlined by chaining user and system prompts.
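The official Python SDK supports returning a list of messages from a prompt, which is one way to express such a chain. The sketch below is an assumed simplification of the pull-request example, not a complete workflow.

```python
from mcp.server.fastmcp import FastMCP
from mcp.server.fastmcp.prompts import base

mcp = FastMCP("pr-review")


@mcp.prompt()
def review_open_prs(repo: str) -> list[base.Message]:
    """Seed a multi-step exchange about open pull requests in a repo."""
    return [
        base.UserMessage(f"Are there any open pull requests in {repo}?"),
        base.AssistantMessage(
            "Let me check. If there are any, I'll list them and you can "
            "tell me which one you'd like to review."
        ),
    ]
```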
Client-Server Connections
The connection between the client and server is 1:1, meaning each user's client connects to the server directly, rather than going through a centralized hub. There are two potential setups for the server to communicate with the client:
- The client and server are running on the same machine. In this case, the server uses stdio (standard input/output) to communicate with the client.
- The client connects to the server via HTTP. After initial setup, the server can push messages to the client over a persistent connection using SSE (Server-Sent Events).
The SDK provides functions for handling both types of connection.
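For the remote case, a client session can be opened over SSE with the official Python SDK, as in this rough sketch; the URL is a placeholder for wherever the server is hosted (the stdio case follows the same pattern via `stdio_client`, as in the client skeleton earlier).

```python
import asyncio

from mcp import ClientSession
from mcp.client.sse import sse_client


async def main():
    # Placeholder URL; MCP SSE servers conventionally expose an /sse endpoint.
    async with sse_client("http://localhost:8000/sse") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])


asyncio.run(main())
```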
To connect servers to the host, the developer writes a server configuration file in JSON format, stored in the root of the host's application directory (Claude Desktop will navigate you straight to the config file with a single click). Each enabled server entry contains the command required to run the server (e.g., uv, npx, python, Docker, etc.), a list of required arguments, and any environment variables to be passed in. Access to local filesystems can also be configured as servers.
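For example, a Claude Desktop configuration enabling the GitHub reference server might look like the following, matching the documented pattern for that server; the token value is a placeholder for the user's own credential.

```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "<your-token>"
      }
    }
  }
}
```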
Anthropic has developed a limited number of servers [linked here] to demonstrate MCP capabilities, including GitHub, Google Drive, and Slack. There is also a growing list of third parties offering official MCP servers [linked here], as well as community-built offerings [linked here].
As mentioned earlier, the FastMCP Python and TypeScript packages abstract away much of the code required to construct a server, including tool, resource, and prompt decorators for easy connections between the different parts of a server.
Implications and business applications
In some ways, especially in the client architecture, MCP has many similarities to its agentic predecessors and builds upon the paradigm they set, rather than changing it. The largest functional shift MCP has made in the agentic world is with servers, which bundle and modularize tools by service and provide an open-source, standard framework for connecting to clients. Of the three capabilities of servers, tools remain the most central and important. Resources offer potential gains in how efficiently context is retrieved, but it is not yet clear whether incorporating resources in servers is as effective as tool-calling, which remains the main offering. Prompts currently offer little more than user customization, and from our perspective do not need to be leveraged at all to get the most out of one's MCP agent. In summary, the biggest gain of MCP is the implementation of tool calls via servers, enabling data and services to be packaged as modular code that can be re-used across various AI systems.
Still in its infancy, MCP is underdeveloped in places, especially in many areas of custom component building and security. Though not the focus of this article, we highly recommend researching security concerns and taking care with data when experimenting with or implementing third-party servers. We also recommend building custom servers and clients when integrating with sensitive data and credentials, as well as implementing rigorous session management and LLM guardrails to protect against hallucination.
As MCP develops, we expect to see more adoption of custom client and server setups, both because of security risks and due to their ability to be customized to an organization's needs while still being portable within the organization. Just as there is a market of open-source servers available, within a single organization, servers can be the building blocks for connecting to resources and services such as databases and APIs. If a custom server is defined strictly within the confines of a single service, with a base set of simple tools, prompts and custom tools can be built by developers for more specific use-cases, but the server itself can be used by anyone who needs to reach that resource.
A well-constructed, reusable server can lower costs by reducing developer time and the amount of code and infrastructure to maintain. The more a server is used across different applications, the more it reduces costs. Therefore, service providers such as management consultants should consider the server a potential value-add when building agentic frameworks for customers. By offering the server, we offer reusability of the product in other agents, as well as the opportunity for continued collaboration if the client wishes to reuse their server with other agents.
For example, WWT's management consulting division might contract with an organization to build an agent for sales representatives to query about both the organization's own and its competitors' products. The agent could have an MCP framework with two servers: one that connects to an internal database to handle questions about the organization's products, and one that connects to the internet to look for information about the competitors, with a server-enabled prompt to guide the process of finding the right information. We could then locate the servers in a central, internal location. If another team, say, customer service, also needs an agent to answer questions about products as well as customer data, they can integrate the database server and connect both datasets as resources. They might also wish to add a third server that connects to the application that handles customer tickets. Developers could take on this new project with less time and a lower cost, because they only need to build one new server. As time goes on, the analytics team might want to use the internet access server and the internal database for their own MCP agents. Therefore, the servers built for the initial use-case have compounding value across the organization, and because of the standardized protocol, are easily compatible with other clients. Additionally, as the ones who built the server, we might have a continuing relationship with the client as we help them leverage its potential and build new ones.
By centering agentic frameworks around MCP in the future, we can offer not just the agent but also the servers, which become building blocks for us or the client to continue expanding their AI capabilities beyond the task at hand. Therefore, to best leverage MCP in all business contexts, it is important to understand not only the technical details of how to build a custom client-server setup, but also how to build servers in a way that gives the customer the most value. In addition, we must learn to communicate the new features MCP brings to the table, and the specific value-add of custom servers.
Author's Note: I would like to thank Adrien Hernandez, Vinay Garg, Chris Carpenter and Andrew Xavier for their help and insight in writing this blog! Special thanks to Yoni Malchi for his incredible feedback.