
Make a powerful AI agent to convert PDFs to code
I recently posted a PDF describing a pairs trading strategy on Twitter.
https://x.com/pyquantnews/status/1891848025846288541
I decided to implement the strategy in Python, and quickly realized an LLM could do the job much faster than I could.
I started to upload the PDF into ChatGPT, then decided to build an agent workflow instead.
In today’s newsletter, you’ll build an AI agent workflow that reads a PDF, develops an implementation plan, and writes code.
All based on a PDF.
Let’s go!
AI agents are transforming trading strategies by integrating insights from academic research.
We now have the tools to make academic research accessible by converting complex formulas and jargon into code. This involves data ingestion, planning an implementation, and writing code. From there, we can backtest and deploy strategies in live markets.
LlamaIndex is a great framework that lets us use LLMs to build agents. We’ll use LlamaIndex to parse a PDF describing a pairs trading strategy. The first agent will extract details from the PDF and explain how to implement the strategy. The next two agents will plan an implementation and write the code.
Let's see how it works.
Imports and setup
We’ll use asyncio to run the agents asynchronously and LlamaIndex to build them. Download the pairs trading PDF to work along with this example.
This code sets up our environment and initializes the language model. We load environment variables, including our OpenAI API key. Then we create an instance of the OpenAI language model, specifically using the GPT-4 model. This prepares us for natural language processing tasks.
Define our tool functions
We define asynchronous functions that will serve as tools for our agents.
These functions are tools that our agents will use. The read_pdf_tool extracts information from a PDF file about the pairs trading strategy. The build_plan_tool and write_code_tool update the workflow state with an implementation plan and Python code, respectively. These tools allow our agents to process information and generate outputs with the help of an LLM.
Create our agents
We define three function agents to handle different tasks in our workflow.
We create three function agents: PDFReaderAgent, PlanBuilderAgent, and CodeWriterAgent. Each agent has a specific role in our workflow. They use the language model and their assigned tools to perform tasks like reading PDFs, building implementation plans, and writing code.
While it looks like a lot of code, each agent follows the same pattern. They each take a name, a description, and a prompt. This is what allows the LLM to select the right tool. Finally, you pass in the LLM, the tools the agent can use, and the downstream agent it can hand off to.
Set up our agent workflow
We create an agent workflow to orchestrate the interaction between our agents.
This code sets up our agent workflow. It defines the sequence of agents that will work on our task, starting with the PDFReaderAgent. We also set an initial state for our workflow, which will be updated as the agents perform their tasks. This workflow structure allows our agents to collaborate effectively.
Now we can execute the workflow.
We execute our agent workflow by providing a user message that outlines the task. The code then processes the events streamed by the workflow. It prints information about which agent is currently active, their outputs, tool usage, and results. This allows us to see the step-by-step progress of our workflow as it processes the PDF, builds a plan, and generates code for the pairs trading strategy.
The output should look something like this.
Among a lot of text, you should see the generated code near the end.
Your next steps
The first thing to do is test the code the agent creates (which I didn’t do). Even if you come across bugs, you have 90%+ of the basic scaffolding already done! You can also try other LLMs and refine the prompts to get more predictable outputs.