Introduction

I've been doing a lot of research into what the future of artificial intelligence (AI) applications will look like for software engineers. Last year, retrieval augmented generation (RAG) was all the rage, and now we are building those systems into production applications. Today it's all about agents and function calling, so I set out to build an application that demonstrates an agentic workflow: one that can move from one step to the next while building on previous work, keeping the human in the loop, and tracking history. Enter Protoman, my take on building a Devin-type interface with the scaled-down task of helping create a new project from its pair programmer's specification.

Initial project setup

To start, I used aider, which proved highly effective for initial project setup, especially when the requirements were clear. To create a Devin-like interface, I needed a chat panel with a familiar user interface (UI), a terminal output to visualize the commands being run, and a file explorer to view the structure of the project workspace. Since I knew what I was building, getting this going with aider was straightforward.

Once the user interface was established, the next step was to implement basic chat functionality, the Hello World of chat applications. This involved simply wrapping the large language model (LLM) and ensuring that the interactions were displayed correctly in the UI. To support real-time agent updates later on, I incorporated WebSockets into the architecture.
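As a rough illustration, here is a minimal sketch of that LLM-wrapping chat loop over a WebSocket, assuming a FastAPI backend and a LangChain chat model; the endpoint path and model name are placeholders rather than Protoman's actual code.

```python
from fastapi import FastAPI, WebSocket
from langchain_openai import ChatOpenAI

app = FastAPI()
llm = ChatOpenAI(model="gpt-4o")  # placeholder model choice

@app.websocket("/chat")
async def chat(websocket: WebSocket):
    await websocket.accept()
    while True:
        # Receive a user message from the UI, wrap the LLM call,
        # and push the reply back over the same socket in real time.
        user_message = await websocket.receive_text()
        reply = await llm.ainvoke(user_message)
        await websocket.send_text(reply.content)
```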

Throughout this process, AI was instrumental in handling tasks such as configuring CORS on the Python backend, which can often be challenging. Letting the AI manage these intricacies streamlined development and got me through the basic build-a-web-application step quickly, so I could move on to the core business functionality: the LLM agents.
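For reference, the CORS setup on a FastAPI backend usually comes down to a few lines of middleware like the following; the allowed origin here is a placeholder for wherever the UI happens to be served from.

```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()

# Allow the browser-based UI to call the API from a different origin.
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:3000"],  # placeholder UI origin
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
```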

Oracle and conversation flow

At this point, the application could graduate from being a simple LLM wrapper to running an agent workflow. I leveraged LangGraph, as it provided a good framework for implementing my agents, gave me the history I was going to need, and allowed me to interrupt the workflow to ask for human feedback. The first feature I implemented was the conditional entry point, and here I placed an oracle backed by an LLM call.

The oracle's role begins with analyzing the user's initial input to understand its context and intent. It assesses whether the user is engaging in casual conversation, attempting a task outside the AI's capabilities, or seeking assistance with a coding task. This initial assessment sets the stage for all subsequent interactions.
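To give a feel for the shape of this, here is a minimal sketch of a conditional entry point with an oracle in LangGraph. The node names, routing labels, and the keyword check standing in for the LLM classification are illustrative assumptions, not Protoman's actual implementation.

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    messages: list[str]

def oracle(state: AgentState) -> str:
    # In the real workflow this is an LLM call; a keyword check stands in here.
    text = state["messages"][-1].lower()
    if "create" in text or "build" in text:
        return "coding"
    if "weather" in text:  # stand-in for a request outside the agent's scope
        return "unsupported"
    return "chat"

graph = StateGraph(AgentState)
graph.add_node("chat", lambda state: {})     # casual conversation path
graph.add_node("decline", lambda state: {})  # politely refuse unsupported tasks
graph.add_node("plan", lambda state: {})     # hand off to the coding workflow
graph.set_conditional_entry_point(
    oracle,
    {"chat": "chat", "unsupported": "decline", "coding": "plan"},
)
graph.add_edge("chat", END)
graph.add_edge("decline", END)
graph.add_edge("plan", END)
oracle_graph = graph.compile()
```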

[Diagram: the oracle routing the initial request to the chat, unsupported, or coding path]

KISS, how could you fail me now? 

Developing the LLM flow for the "coding" path was an adventure all on its own. I was going to start with a simple plan, then deal with any enhancements to that process as I went.

  1. I would have the agent formulate a plan using a set of tools: create a file, add file content, run commands, and ask the user for information or to take an action (sketched after this list).
  2. I would have another agent evaluate the plan to ensure it was valid.
  3. I would ask the user to review the plan.
  4. Run the plan.
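The tools themselves can be thought of as small, declarative functions the planner is allowed to reference. A hedged sketch using LangChain's tool decorator follows; the exact names and signatures in Protoman may differ.

```python
import subprocess
from langchain_core.tools import tool

@tool
def create_file(path: str) -> str:
    """Create an empty file at the given path in the project workspace."""
    open(path, "a").close()
    return f"created {path}"

@tool
def add_file_content(path: str, content: str) -> str:
    """Write content into a file in the project workspace."""
    with open(path, "w") as f:
        f.write(content)
    return f"wrote {len(content)} characters to {path}"

@tool
def run_command(command: str) -> str:
    """Run a shell command in the workspace and return its output."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout + result.stderr

@tool
def ask_user(question: str) -> str:
    """Pause the workflow and ask the human for information or an action."""
    return input(question)
```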

If the evaluating agent in step 2 didn't think the plan was complete, or the user in step 3 didn't think the plan was good, the flow would start over at step 1 with that feedback.
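Wired up as a graph, the loop looks roughly like the sketch below: conditional edges send the flow back to the planner whenever the evaluator or the user produces feedback. The node functions are stubs and the state fields are assumptions for illustration.

```python
from typing import Optional, TypedDict
from langgraph.graph import StateGraph, END

class PlanState(TypedDict):
    request: str
    plan: Optional[str]
    feedback: Optional[str]

# Stub nodes; the real ones call the LLM, the evaluator prompt, and the tools.
def plan_step(state: PlanState) -> dict:
    return {"plan": f"plan for: {state['request']}", "feedback": None}

def evaluate_step(state: PlanState) -> dict:
    return {}  # would set "feedback" when the plan looks incomplete

def review_step(state: PlanState) -> dict:
    return {}  # would set "feedback" when the user rejects the plan

def run_step(state: PlanState) -> dict:
    return {}  # would execute the approved plan with the tools

def needs_revision(state: PlanState) -> str:
    # Any feedback sends the flow back to the planner.
    return "revise" if state.get("feedback") else "continue"

g = StateGraph(PlanState)
g.add_node("planner", plan_step)
g.add_node("evaluator", evaluate_step)
g.add_node("human_review", review_step)
g.add_node("runner", run_step)

g.set_entry_point("planner")
g.add_edge("planner", "evaluator")
g.add_conditional_edges("evaluator", needs_revision,
                        {"revise": "planner", "continue": "human_review"})
g.add_conditional_edges("human_review", needs_revision,
                        {"revise": "planner", "continue": "runner"})
g.add_edge("runner", END)
workflow = g.compile()
```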

[Diagram: the plan, evaluate, review, and run loop]

As I was developing the agent workflow, I tested it with a simple prompt: create a FastAPI application. I thought this was simple enough, but boy was I wrong. The plans the AI came up with seemed logical: ask what the application should be called, then create the application with that information. What I came to realize was that this would get complicated once I reached the run step, because I needed to store the gathered information in a structured way.
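One way to capture those answers in a structured form is to have the model fill in a schema instead of replying in free text, for example with LangChain's structured-output support. The ProjectInfo fields below are a made-up example of the kind of data the plan depended on.

```python
from pydantic import BaseModel
from langchain_openai import ChatOpenAI

class ProjectInfo(BaseModel):
    app_name: str     # what the application should be called
    description: str  # a short summary of what it should do

llm = ChatOpenAI(model="gpt-4o")  # placeholder model choice
extractor = llm.with_structured_output(ProjectInfo)

info = extractor.invoke(
    "Call the project 'protoman-demo'; it's a starter FastAPI service."
)
print(info.app_name)  # e.g. "protoman-demo"
```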

Those were the complications I saw when everything functioned correctly, but handling errors in the graph was another level of complexity entirely. Sometimes the LLM calls failed, or the planner and evaluator couldn't agree that the plan was ready and got themselves into infinite loops.
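LangGraph does offer one blunt guardrail against runaway loops: a recursion limit on the compiled graph. A sketch of capping the planner/evaluator round trips, reusing the `workflow` graph from the earlier sketch, might look like this; the limit and the fallback handling are illustrative.

```python
from langgraph.errors import GraphRecursionError

try:
    # Abort if the planner and evaluator keep bouncing the plan back and forth.
    result = workflow.invoke(
        {"request": "create a FastAPI application"},
        config={"recursion_limit": 10},
    )
except GraphRecursionError:
    result = {"error": "planner and evaluator could not converge on a plan"}
```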

[Diagram: error-handling paths in the graph]

I tried my best to handle each of these complications as they came, but by this point, I realized that I had a ball of duct tape and dreams, and I needed to sit down with what I had learned and start over.

This time, with a plan

Developing one step at a time, while flexible, didn't quite address the complexities of error handling or the interdependencies between gathering user information and executing tasks. Now I could clearly see how difficult this process was, instead of just intuitively sensing it.

The first step to starting over was to introduce a state machine into the process. This state machine would track the different stages of the agent flow. By clearly defining these stages, the agents were able to ask for human feedback freely and ensure that each step was completed before moving on to the next.

The second step was to change the agent flow to account for gathering all the additional information the user did not provide in their first ask. In the data generation stage, the LLM would gather all the necessary information from the user. This involved asking a series of questions to ensure that all the required details were collected. Once the data was gathered, the planning stage began. Here, the LLM used the collected information to create a detailed plan for the task at hand. This plan included all the steps needed to complete the task, along with any dependencies and potential pitfalls. Finally, the run stage involved carrying out the plan. The agent monitored the progress, ensuring that each step was executed.
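A minimal sketch of what that looks like in practice: the stage lives in the graph state, and a conditional entry point routes to the gathering, planning, or run node based on it. The stage names, state fields, and node stubs are assumptions for illustration, not Protoman's exact code.

```python
from enum import Enum
from typing import TypedDict
from langgraph.graph import StateGraph, END

class Stage(str, Enum):
    GATHERING = "gathering"  # collect missing details from the user
    PLANNING = "planning"    # draft and validate the plan
    RUNNING = "running"      # execute the approved plan

class ProtoState(TypedDict):
    stage: Stage
    answers: dict
    plan: list

def route_by_stage(state: ProtoState) -> str:
    # The stage recorded in state decides where the graph resumes,
    # which is what makes pausing for human feedback safe.
    return state["stage"].value

# Stub nodes; the real ones ask questions, plan, and run tool calls.
def gather_info(state: ProtoState) -> dict: return {}
def planner(state: ProtoState) -> dict: return {}
def runner(state: ProtoState) -> dict: return {}

g = StateGraph(ProtoState)
g.add_node("gather_info", gather_info)
g.add_node("planner", planner)
g.add_node("runner", runner)
g.set_conditional_entry_point(route_by_stage, {
    "gathering": "gather_info",
    "planning": "planner",
    "running": "runner",
})
g.add_edge("gather_info", END)  # stop and wait for the user's answers
g.add_edge("planner", END)      # stop for plan review
g.add_edge("runner", END)
stage_graph = g.compile()

# Each invocation resumes at whatever stage the state says we're in.
stage_graph.invoke({"stage": Stage.GATHERING, "answers": {}, "plan": []})
```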

[Diagram: the staged workflow of gathering information, planning, and running]

By adding the stage to the graph, the process became much more robust. The state machine provided a clear structure, while the graph state held useful information and allowed for flexibility and adaptability. Interestingly, the stability and error-handling benefits of the state machine were not something I had planned; they turned out to be a happy accident that greatly improved the process. This new approach made the agent flow much more reliable and efficient: errors were handled more effectively, and gathering user information and running tasks became smoother and more streamlined.

Conclusion

The journey of learning how to work with agents and building this prototype was enlightening and transformative. The implementation of a state machine not only added structure and reliability but also underscored the importance of planning and adaptability in complex systems. Every challenge encountered was a learning opportunity, revealing the intricacies and potential of leveraging advanced methodologies to enhance agent flows.

Developing this prototype also highlighted the value of iterative improvement and feedback. By meticulously tracking each stage and incorporating flexibility for human input, the process evolved into a robust system capable of handling unforeseen errors and dependencies.

This undertaking was a profound exercise in problem-solving and innovation. It demonstrated that with the right tools and a strategic approach, even complex processes can be streamlined and optimized. The lessons learned and the prototype developed lay a strong foundation for future work on agent-based systems, promising even greater efficiency and reliability in the tasks they perform.