This is a work in progress. A series, if you will.
I have been working with many AI tools over the past year or so. ChatGPT has become my personal advisor, therapist, and developer assistant.
I have worked with multiple paid tools: Gemini, Claude, and the aforementioned ChatGPT. I have also worked with ChatGPT (or rather, OpenAI) from the API side of things, using the API for programmatic development.
Recently, I came across Ollama. By all means, visit the website. It’s an open source tool that enables you to run open source LLMs on your local system, avoiding the costs of paid services or the rate limits of free ones.
There are limitations, of course, tied to the models themselves but also to your own local setup capabilities. This article isn’t going to address them. Maybe a future article will.
One of my motivations to run Ollama is quite simple. I am a fan of open-source tooling and have been for over 20 years. When it comes to a choice between paid and open source, I lean towards “run it myself on open source”.
Another motivation is cost, especially when it comes to experimentation and running your own lab. I have been working on several AI or LLM-driven projects over the past year and, up until recently, have incurred costs using paid services. Using Ollama allows me to truly experiment with AI / LLM / GenAI capabilities without running up a large bill. The cost is my own time, energy, and, I suppose, electricity.
LangGraph / LangChain
LangChain is a framework for building LLM applications; you can read more in the source documentation.
LangGraph is built on top of LangChain and is specifically for working with multiple AI agents, or chaining AI agents together in a workflow.
AI Agents, or Agentic AI, is the concept of giving your LLM a profile, e.g. “You are a professional web developer. I will give you a list of requirements for a website and you will return a series of code files, in JSON format. These files will be in [pick your language here] and they will adhere to certain standards.” You can do a search of Agentic AI profiles or prompts to see just how involved these things can get.
A project I am interested in and have been working on is developing a workflow of the agents that are typically involved in building a web application.
When building a web application, these are some of the typical people involved in the project:
- Product Manager
- Architect
- UX / UI Designer
- Backend Developer
- Frontend Developer
- Quality Assurance
- DevOps Engineer
Each of these personas has a role to play in the process of building a web application. The project I am working on is to have:
- Multiple roles defined as separate AI Agents, in JSON
- Each agent receiving an input (more on that later) and delivering a specific output
Using AI to build the AI
Let’s be very transparent: I have become a huge fan of working with AI / LLM tools for as many things as I can. AI has become my default mobile and desktop app and has supplanted Google as my go-to for any and all questions. The only difference is which AI tools I’m using.
For this project, I did use ChatGPT, and I also used Copilot hooked into VS Code to write the code with me.
Again, the scope of the project is essentially to build an automated web-development team, with options to review the produced artifacts before moving on to the next step. The final step, at this stage, is code. The documents (or artifacts) produced before that final deliverable consist of a Project Requirement Document (PRD), high-level architecture documents, and UI/UX designs.
Agent flow
Each Agent is defined in JSON and stored in /agents/profiles/
Here is the product_manager.json file, as an example:
{
    "order": 1,
    "name": "ProductManagerAgent",
    "output_key": "product_spec",
    "doc_type": "product_spec",
    "persona": "You are a senior product manager. Based on the user's prompt, create a clear and concise product specification. Include goals, user stories, key features, and success criteria.",
    "template_file": "product_spec.txt",
    "enabled": "True"
}
From the JSON file, you can see we have a name and an order (where in the sequence of agents this particular agent will run; ProductManagerAgent is 1, or first). There is a persona, and there is an enabled key to turn each agent on or off.
The output_key, doc_type, and template_file, as you can see, have similar values; that redundancy will be fixed in a later round of optimization. As I’ve heard before, “Don’t optimize too early”, and I believe that.
As a bit of a side note, I think, as an industry (software engineering), we are too quick to optimize and prevent tech debt, but that is for a later conversation.
So I’ll have four or five of these agent profiles, stored in an /agents/profiles/ directory.
My application entry point, app.py, will start by requesting a user prompt:
user_prompt = input_with_default("What kind of web app?", DEFAULT_APP)
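(input_with_default is a small helper; a minimal sketch of what it does, assuming it simply falls back to a default when the user hits Enter:)

def input_with_default(prompt: str, default: str) -> str:
    # Prompt the user; if they just hit Enter, fall back to the default value.
    value = input(f"{prompt} [{default}]: ").strip()
    return value or default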
That initial input will be stored in a vector database (I’m using Chroma DB), keyed to the session ID of this application session.
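As a rough sketch of that storage step (the collection name, metadata keys, and client setup here are illustrative, not necessarily what the project uses):

import uuid
import chromadb

# Create (or open) a local Chroma collection to act as the shared memory store.
client = chromadb.PersistentClient(path="./memory_store")
collection = client.get_or_create_collection("agent_outputs")

# Store the initial user prompt, tagged with a session ID so later agents can find it.
session_id = str(uuid.uuid4())
collection.add(
    documents=[user_prompt],
    metadatas=[{"session_id": session_id, "doc_type": "user_prompt"}],
    ids=[f"{session_id}-user_prompt"],
)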
I will also initiate an LLM session (again, using Ollama and LangChain):
from langchain_ollama import OllamaLLM
from config.settings import DEFAULT_MODEL

def get_llm(model_name: str = DEFAULT_MODEL):
    return OllamaLLM(model=model_name)
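Calling it with no arguments uses the default from config; any other model you’ve pulled locally with Ollama (e.g. llama3) can be swapped in without touching the rest of the code:

llm = get_llm()                 # uses DEFAULT_MODEL from config/settings.py
coder_llm = get_llm("llama3")   # or any other model pulled via `ollama pull`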
From there, I will get all the agents stored in the aforementioned /agents/profiles/ directory and list them by order (for those that are enabled):
import json
import os

def get_registered_agents_dynamically(llm):
    profiles_dir = os.path.join(os.path.dirname(__file__), "profiles")
    agents = []
    for filename in os.listdir(profiles_dir):
        if filename.endswith(".json"):
            with open(os.path.join(profiles_dir, filename), "r") as f:
                profile = json.load(f)
            if profile.get("enabled") == "True":
                # GenericAgent (described below) wraps the LLM and the profile.
                agents.append((profile.get("order", 9999), GenericAgent(llm, profile)))
    # Sort by 'order' and strip out the tuples
    sorted_agents = [agent for _, agent in sorted(agents, key=lambda x: x[0])]
    return sorted_agents
Aaaand, with my agents, I will add them to my graph:
from typing import List
from langgraph.graph import StateGraph

# AppState and BaseAgent are defined elsewhere in the project.
def create_flow(agents: List[BaseAgent], memory_store, session_id: str):
    builder = StateGraph(AppState)
    for agent in agents:
        # Bind agent via a default argument so each node closes over its own agent.
        builder.add_node(agent.name, lambda state, agent=agent: agent.run(state, session_id=session_id, memory_store=memory_store))
    # Chain the agents in order: each agent's output feeds the next.
    for i in range(len(agents) - 1):
        builder.add_edge(agents[i].name, agents[i + 1].name)
    builder.set_entry_point(agents[0].name)
    builder.set_finish_point(agents[-1].name)
    return builder.compile()
That will be run via:
result = graph.invoke({"user_prompt": user_prompt})
At this stage, I have a base class (BaseAgent) and a GenericAgent that is a subclass of BaseAgent. These two work together to run each agent, gather the inputs, and produce an output.
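I won’t walk through those classes here, but a simplified sketch of the shape (the memory_store helper names below are illustrative stand-ins, not the project’s actual methods) looks roughly like this:

class BaseAgent:
    def __init__(self, llm, profile: dict):
        self.llm = llm
        self.profile = profile
        self.name = profile["name"]

    def run(self, state: dict, session_id: str, memory_store) -> dict:
        raise NotImplementedError


class GenericAgent(BaseAgent):
    def run(self, state: dict, session_id: str, memory_store) -> dict:
        # Gather all previously stored documents for this session as context.
        context = memory_store.get_session_documents(session_id)  # illustrative helper
        # Build the prompt from the persona, the prior documents, and the user prompt.
        prompt = (
            f"{self.profile['persona']}\n\n"
            f"Previous documents:\n{context}\n\n"
            f"User prompt: {state['user_prompt']}"
        )
        output = self.llm.invoke(prompt)
        # Persist the output so downstream agents can read it.
        memory_store.save(session_id, self.profile["output_key"], output)  # illustrative helper
        state[self.profile["output_key"]] = output
        return state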
Each output (a JSON file) will be stored in a vector database (Chroma DB). Each agent will pull all the previously stored outputs associated with the same session ID from that vector database. This means that each agent will review all the previous documents before producing its own output. The reasoning here is that, much like on a web or product development team, each participant should be aware of all the requirements as they work on their own artifact.
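Pulling those prior outputs back out of Chroma is a simple metadata filter; building on the illustrative collection from earlier, something like:

def get_session_documents(collection, session_id: str) -> str:
    # Fetch every document stored for this session and join them into one context block.
    results = collection.get(where={"session_id": session_id})
    return "\n\n".join(results["documents"])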
Human in the Loop
For each agent and its output, I have a function that asks for input and gives a human the opportunity to provide feedback; the agent will then re-produce its output with this new feedback, in addition to the previous outputs.
A future goal might be to allow a human to review any of the documents at any time in the process, make adjustments, and then re-run the entire workflow from that point (e.g. if I need to adjust the architecture document, all subsequent documents, such as the UI/UX designs and the backend and frontend code, would also be re-run).
Another future state would be to ensure that any feedback doesn’t conflict with earlier outputs. For example, if the original product requirements were to build a soup-based recipe site and my feedback for the backend developer is to change this to a music listing site, the agents should identify that as a change in scope or a conflict with the original document.
def human_review(output: str, agent_name: str) -> tuple[str | None, str | None]:
    print(f"\n--- {agent_name} Output ---")
    print(output)
    while True:
        choice = input("Approve output? (y = yes, n = revise with feedback) ").strip().lower()
        if choice == "y":
            return output, None
        elif choice == "n":
            feedback = input("Enter feedback for improvement: ")
            return None, feedback
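Wired into the agent loop, the pattern is roughly: generate, show the human, and regenerate with the feedback folded back in (produce_output here is a hypothetical stand-in for however the agent actually generates its text):

draft = agent.produce_output(state)              # hypothetical generation call
approved, feedback = human_review(draft, agent.name)
if feedback:
    # Re-run the same agent with the human feedback added to its context.
    state["feedback"] = feedback
    draft = agent.produce_output(state)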
Structured Outputs
This is something I’ve come to love but am also still working on.
When it comes to asking the backend and frontend developer agents to produce their output, I am asking for a JSON-structured list of files and code. That JSON is then used to generate actual code files.
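The expected shape, and the step that turns it into files on disk, looks roughly like this (the exact keys are illustrative):

import json
import os

# Illustrative expected structure: {"files": [{"path": "app/main.py", "content": "..."}]}
def write_code_files(agent_output: str, target_dir: str = "generated") -> None:
    data = json.loads(agent_output)
    for file_spec in data["files"]:
        full_path = os.path.join(target_dir, file_spec["path"])
        os.makedirs(os.path.dirname(full_path), exist_ok=True)
        with open(full_path, "w") as f:
            f.write(file_spec["content"])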
The biggest challenge here was (and still is) getting the LLM to give only structured JSON output. Too often, the output would be something like:
“here is your output {files:[…]}”
That first line, “here is your output”, would break the parsers.
This is still a work in progress, in terms of fixing. I have looked at structured outputs, but even those need well-structured JSON. I’ve had to resort to some functions that sanitize the output before applying the structured output parsing. I feel like I’m going down a path of regular expressions (as in, “I had a problem, so I decided to use regular expressions. Now I have two problems.”).
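As an example of the kind of sanitizing I mean, a rough sketch (not the exact function I’m using) is to grab the substring between the first { and the last } and try to parse that:

import json

def extract_json(raw_output: str) -> dict:
    # Strip any chatter before/after the first JSON object in the LLM response.
    start = raw_output.find("{")
    end = raw_output.rfind("}")
    if start == -1 or end == -1 or end < start:
        raise ValueError("No JSON object found in LLM output")
    return json.loads(raw_output[start:end + 1])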
I will continue to work on this problem.
Status and Next Steps
I now have a working prototype application that will:
- Take a series of agents, stored as JSON, and build a LangGraph
- Ask for a user input to start the process
- Go through each agent, reviewing the previous inputs / outputs and producing its output, as defined in its profile
- Accept human-in-the-loop feedback and make modifications based on that feedback
- Save the backend and frontend developer outputs as code
Next Steps:
There are a number of steps to improve upon.
- I have created an API endpoint to test against; however, this should be built out, along with a front end, to allow a user to perform these tasks via a browser.
- They should also be permitted to review any previous documents at any given time
- Agent improvement: The agent profiles are all one-line profiles. Building out the agent profiles to improve accuracy will be critical.
- Comparing different models: Ollama supports multiple open source models, and being able to select different models for different tasks will be helpful. Being able to measure and compare model outputs against each other to determine the best model to select is even better.
- Code output: The code output is a skeleton at this point and there’s no real functionality.
- Running and testing code: Making the code runnable on a local environment will be helpful, as well.
That is the state of this project, at this time. As I continue to make enhancements, I will document those here.