Build your first advanced AI equity research agent

AI can already analyze stock price data, but what about building research reports for comparable analysis?

You may have heard of AI agents and agentic workflows—these are structured steps designed to achieve a specific goal. Each step is connected through specialized prompts that guide the AI’s reasoning and decision-making.

2025 is the year of agents.

In today’s newsletter, you’ll get Python code that builds an agent to extract financial data from Tesla’s and Ford’s earnings reports and recommend which stock to buy.

There’s a lot of code in today’s newsletter. Just copy and paste it, get everything running, then go line by line to learn how it works.

Let’s go!

There’s some setup you’ll need to take care of.

First, head over to LlamaCloud and get yourself a free account. You’ll need your organization ID, your project ID, and your API key.

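Once done, create a .env file in the same directory as this notebook. The workflow below defaults to OpenAI’s gpt-4o, so you’ll need an OpenAI API key as well. Add the following to the file:

LLAMA_CLOUD_API_KEY=<llx_YOUR_KEY>
OPENAI_API_KEY=<YOUR_OPENAI_KEY>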

Next, create a folder called data. Download the following two files:

https://digitalassets.tesla.com/tesla-contents/image/upload/IR/TSLA-Q2-2024-Update.pdf

https://s205.q4cdn.com/882619693/files/doc_financials/2024/q2/Q2-2024-Ford-Earnings-Press-Release.pdf

Name them tesla_q2_earnings.pdf and ford_q2_earnings_press_release.pdf, respectively, and put them in the data folder.

Finally, create a file called modeling_assumptions.txt in the data directory and add the following to it:

# Financial Modeling Assumptions
Discount Rate: 8%
Terminal Growth Rate: 2%
Tax Rate: 25%
Revenue Growth (Years 1-5): 10% per annum
Revenue Growth (Years 6-10): 5% per annum
Capital Expenditures as % of Revenue: 7%
Working Capital Assumption: 3% of Revenue
Depreciation Rate: 10% per annum
Cost of Capital Assumption: 8%
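
The workflow later passes this file to the LLM as plain text, so nothing in the newsletter parses it. If you want to sanity-check the numbers yourself, here is a minimal sketch that reads lines of this shape into a dictionary of percentages. The function name parse_assumptions is hypothetical, and the sample text is a shortened copy of the file above.

```python
import re

# Shortened copy of data/modeling_assumptions.txt, inlined so the sketch runs on its own.
ASSUMPTIONS_TEXT = """\
# Financial Modeling Assumptions
Discount Rate: 8%
Terminal Growth Rate: 2%
Tax Rate: 25%
Revenue Growth (Years 1-5): 10% per annum
"""


def parse_assumptions(text: str) -> dict:
    """Map each 'Name: N% ...' line to its percentage as a float."""
    parsed = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the heading
        name, _, value = line.partition(":")
        match = re.search(r"(\d+(?:\.\d+)?)\s*%", value)
        if match:
            parsed[name.strip()] = float(match.group(1))
    return parsed


assumptions = parse_assumptions(ASSUMPTIONS_TEXT)
print(assumptions)
```

To check the real file, pass the result of open("data/modeling_assumptions.txt").read() instead of the inlined text.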

Ok, you’re all set.

Let’s get to the code!

Imports and set up

Import the LlamaIndex libraries we’ll need for the research agent.

from pydantic import BaseModel, Field
from typing import Optional, List, Dict
from llama_cloud_services import LlamaExtract
from llama_cloud.core.api_error import ApiError
from llama_cloud import ExtractConfig
from llama_index.core.workflow import (
    Event,
    StartEvent,
    StopEvent,
    Context,
    Workflow,
    step,
)
# draw_all_possible_flows is imported once, from the workflow utilities package
from llama_index.utils.workflow import draw_all_possible_flows
from llama_index.llms.openai import OpenAI
from llama_index.core.llms.llm import LLM
from llama_index.core.prompts import ChatPromptTemplate
import nest_asyncio
from dotenv import load_dotenv

# Allow nested event loops so the async workflow can run inside a notebook
nest_asyncio.apply()
# Load the API keys from the .env file
load_dotenv()

These libraries provide tools for data modeling, API interactions, workflow management, and natural language processing. We use them to create a structured workflow for financial analysis and report generation.
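
A missing key usually only shows up later as an authentication error, so it can help to check the environment right after load_dotenv(). This is a minimal sketch using only the standard library. The helper name missing_env_vars is hypothetical. OPENAI_API_KEY is included on the assumption that you use the default OpenAI model, whose client reads that variable.

```python
import os
from typing import Iterable, List, Mapping, Optional


def missing_env_vars(
    names: Iterable[str], env: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# LLAMA_CLOUD_API_KEY comes from the .env file described earlier.
# OPENAI_API_KEY is an assumption tied to using the default OpenAI model.
REQUIRED = ["LLAMA_CLOUD_API_KEY", "OPENAI_API_KEY"]

missing = missing_env_vars(REQUIRED)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```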

Define the data models

We create data models to structure our financial information and analysis outputs. Pydantic is great for making sure the output of agents conforms to what we expect.

class RawFinancials(BaseModel):
    revenue: Optional[float] = Field(
        None, description="Extracted revenue (in million USD)"
    )
    operating_income: Optional[float] = Field(
        None, description="Extracted operating income (in million USD)"
    )
    eps: Optional[float] = Field(None, description="Extracted earnings per share")


class InitialFinancialDataOutput(BaseModel):
    company_name: str = Field(
        ..., description="Company name as extracted from the earnings deck"
    )
    ticker: str = Field(..., description="Stock ticker symbol")
    report_date: str = Field(..., description="Date of the earnings deck/report")
    raw_financials: RawFinancials = Field(
        ..., description="Structured raw financial metrics"
    )
    narrative: Optional[str] = Field(
        None, description="Additional narrative content (if any)"
    )


class FinancialModelOutput(BaseModel):
    revenue_projection: float = Field(
        ..., description="Projected revenue for next year (in million USD)"
    )
    operating_income_projection: float = Field(
        ..., description="Projected operating income for next year (in million USD)"
    )
    growth_rate: float = Field(..., description="Expected revenue growth rate (%)")
    discount_rate: float = Field(
        ..., description="Discount rate (%) used for valuation"
    )
    terminal_growth_rate: float = Field(
        ..., description="Terminal growth rate (%) used in the model"
    )
    valuation_estimate: float = Field(
        ..., description="Estimated enterprise value (in million USD)"
    )
    key_assumptions: str = Field(
        ..., description="Key assumptions such as tax rate, CAPEX ratio, etc."
    )
    summary: str = Field(
        ..., description="A brief summary of the preliminary financial model analysis."
    )


class ComparativeAnalysisOutput(BaseModel):
    comparative_analysis: str = Field(
        ..., description="Comparative analysis between Company A and Company B"
    )
    overall_recommendation: str = Field(
        ..., description="Overall investment recommendation with rationale"
    )


class FinalEquityResearchMemoOutput(BaseModel):
    company_a_model: FinancialModelOutput = Field(
        ..., description="Financial model summary for Company A"
    )
    company_b_model: FinancialModelOutput = Field(
        ..., description="Financial model summary for Company B"
    )
    comparative_analysis: ComparativeAnalysisOutput = Field(
        ..., description="Comparative analysis between Company A and Company B"
    )

We define several data models using Pydantic. These models help structure our financial data and analysis outputs.

They include classes for raw financials, initial financial data output, financial model output, comparative analysis, and the final equity research memo. Each model specifies the expected fields and their descriptions.
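
To see what "conforms to what we expect" means in practice, here is a small sketch that validates two payloads against a copy of the RawFinancials model. The payload values are made up for illustration and are not figures from either earnings report.

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError


class RawFinancials(BaseModel):
    # Same fields as the RawFinancials model defined above.
    revenue: Optional[float] = Field(
        None, description="Extracted revenue (in million USD)"
    )
    operating_income: Optional[float] = Field(
        None, description="Extracted operating income (in million USD)"
    )
    eps: Optional[float] = Field(None, description="Extracted earnings per share")


# Made-up payloads for illustration only.
good_payload = {"revenue": 100.0, "eps": 0.5}
bad_payload = {"revenue": "not available"}

# operating_income is omitted, so it falls back to its default of None.
ok = RawFinancials(**good_payload)
print(ok)

try:
    RawFinancials(**bad_payload)
    rejected = False
except ValidationError as err:
    # A string that cannot be read as a number fails validation.
    rejected = True
    print("Rejected:", err)
```

Optional fields let the extraction succeed even when a report leaves out a metric, while a value of the wrong type still fails loudly.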

Set up the LlamaExtract agent

Make sure you have your organization ID and project ID from LlamaCloud.

llama_extract = LlamaExtract(
    # Your project ID here
    project_id="e8aabe96-8170-4987-a058-168961a97375",

    # Your organization ID here
    organization_id="14b36159-7d91-4d2d-8048-d8ce28654ef3",
)

try:
    existing_agent = llama_extract.get_agent(name="automotive-sector-analysis")
    if existing_agent:
        llama_extract.delete_agent(existing_agent.id)
except ApiError as e:
    if e.status_code == 404:
        pass
    else:
        raise

extract_config = ExtractConfig(
    extraction_mode="BALANCED"
)

agent = llama_extract.create_agent(
    name="automotive-sector-analysis",
    data_schema=InitialFinancialDataOutput,
    config=extract_config,
)

We set up the LlamaExtract agent for our automotive sector analysis. The client is initialized with your project and organization IDs, any existing agent with the same name is deleted, and a new agent is created with the balanced extraction mode.

The agent is configured to use our InitialFinancialDataOutput schema for data extraction.

Define workflow events and classes

This is where the magic happens. This code sets up the events and the workflow that runs the agents.

class DeckAParseEvent(Event):
    deck_content: InitialFinancialDataOutput


class DeckBParseEvent(Event):
    deck_content: InitialFinancialDataOutput


class CompanyModelEvent(Event):
    model_output: FinancialModelOutput


class ComparableDataLoadEvent(Event):
    company_a_output: FinancialModelOutput
    company_b_output: FinancialModelOutput


class LogEvent(Event):
    msg: str
    delta: bool = False


class AutomotiveSectorAnalysisWorkflow(Workflow):
    def __init__(
        self,
        agent: LlamaExtract,
        modeling_path: str,
        llm: Optional[LLM] = None,
        **kwargs
    ):
        super().__init__(**kwargs)
        self.agent = agent
        self.llm = llm or OpenAI(model="gpt-4o")
        with open(modeling_path, "r") as f:
            self.modeling_data = f.read()

    async def _parse_deck(self, ctx: Context, deck_path) -> InitialFinancialDataOutput:
        extraction_result = await self.agent.aextract(deck_path)
        initial_output = extraction_result.data
        ctx.write_event_to_stream(LogEvent(msg="Transcript parsed successfully."))
        return initial_output

    # Both parse steps accept the StartEvent, so the two decks are processed as separate branches.
    @step
    async def parse_deck_a(self, ctx: Context, ev: StartEvent) -> DeckAParseEvent:
        initial_output = await self._parse_deck(ctx, ev.deck_path_a)
        await ctx.set("initial_output_a", initial_output)
        return DeckAParseEvent(deck_content=initial_output)

    @step
    async def parse_deck_b(self, ctx: Context, ev: StartEvent) -> DeckBParseEvent:
        initial_output = await self._parse_deck(ctx, ev.deck_path_b)
        await ctx.set("initial_output_b", initial_output)
        return DeckBParseEvent(deck_content=initial_output)

    async def _generate_financial_model(
        self, ctx: Context, financial_data: InitialFinancialDataOutput
    ) -> FinancialModelOutput:
        prompt_str = """
    You are an expert financial analyst.
    Using the following raw financial data from an earnings deck and financial modeling assumptions,
    refine the data to produce a financial model summary. Adjust the assumptions based on the company-specific context.
    Please use the most recent quarter's financial data from the earnings deck.

    Raw Financial Data:
    {raw_data}
    Financial Modeling Assumptions:
    {assumptions}

    Return your output as JSON conforming to the FinancialModelOutput schema.
    You MUST make sure all fields are filled in the output JSON.

    """
        prompt = ChatPromptTemplate.from_messages([("user", prompt_str)])
        refined_model = await self.llm.astructured_predict(
            FinancialModelOutput,
            prompt,
            raw_data=financial_data.model_dump_json(),
            assumptions=self.modeling_data,
        )
        return refined_model

    @step
    async def refine_financial_model_company_a(
        self, ctx: Context, ev: DeckAParseEvent
    ) -> CompanyModelEvent:
        print("deck content A", ev.deck_content)
        refined_model = await self._generate_financial_model(ctx, ev.deck_content)
        print("refined_model A", refined_model)
        print(type(refined_model))
        await ctx.set("CompanyAModelEvent", refined_model)
        return CompanyModelEvent(model_output=refined_model)

    @step
    async def refine_financial_model_company_b(
        self, ctx: Context, ev: DeckBParseEvent
    ) -> CompanyModelEvent:
        print("deck content B", ev.deck_content)
        refined_model = await self._generate_financial_model(ctx, ev.deck_content)
        print("refined_model B", refined_model)
        print(type(refined_model))
        await ctx.set("CompanyBModelEvent", refined_model)
        return CompanyModelEvent(model_output=refined_model)

    @step
    async def cross_reference_models(
        self, ctx: Context, ev: CompanyModelEvent
    ) -> StopEvent:
        company_a_model = await ctx.get("CompanyAModelEvent", default=None)
        company_b_model = await ctx.get("CompanyBModelEvent", default=None)
        # This step fires once per CompanyModelEvent; do nothing until both models are stored.
        if company_a_model is None or company_b_model is None:
            return None

        prompt_str = """
    You are an expert investment analyst.
    Compare the following refined financial models for Company A and Company B.
    Based on this comparison, provide a specific investment recommendation for Tesla (Company A).
    Focus your analysis on:
    1. Key differences in revenue projections, operating income, and growth rates
    2. Valuation estimates and their implications
    3. Clear recommendation for Tesla with supporting rationale

    Return your output as JSON conforming to the ComparativeAnalysisOutput schema.
    You MUST make sure all fields are filled in the output JSON.

    Company A Model:
    {company_a_model}

    Company B Model:
    {company_b_model}
        """
        prompt = ChatPromptTemplate.from_messages([("user", prompt_str)])
        comp_analysis = await self.llm.astructured_predict(
            ComparativeAnalysisOutput,
            prompt,
            company_a_model=company_a_model.model_dump_json(),
            company_b_model=company_b_model.model_dump_json(),
        )
        final_memo = FinalEquityResearchMemoOutput(
            company_a_model=company_a_model,
            company_b_model=company_b_model,
            comparative_analysis=comp_analysis,
        )
        return StopEvent(result={"memo": final_memo})

We define custom events and a workflow class for our automotive sector analysis. The workflow includes steps for parsing financial decks, generating financial models, and performing comparative analysis.

Each step is defined as an asynchronous method within the AutomotiveSectorAnalysisWorkflow class.
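
The workflow has a fan-out and join shape: two branches run independently, and the final step only produces a result once both company models are stored. Here is a rough sketch of that pattern in plain asyncio. It is not how LlamaIndex implements workflows. The names join_models, run_branch, and main are hypothetical, and the dictionaries stand in for FinancialModelOutput objects.

```python
import asyncio
from typing import Dict, List, Optional


def join_models(store: Dict[str, dict]) -> Optional[dict]:
    """Return a combined result only once both company models are stored."""
    model_a = store.get("CompanyAModelEvent")
    model_b = store.get("CompanyBModelEvent")
    if model_a is None or model_b is None:
        return None  # the other branch has not finished yet
    return {"company_a_model": model_a, "company_b_model": model_b}


async def run_branch(
    store: Dict[str, dict], key: str, deck_path: str
) -> Optional[dict]:
    await asyncio.sleep(0)  # stand-in for the extraction and LLM calls
    store[key] = {"source": deck_path}  # placeholder for a financial model
    return join_models(store)


async def main() -> List[Optional[dict]]:
    store: Dict[str, dict] = {}
    # Run both branches concurrently, as the two parse steps do.
    return await asyncio.gather(
        run_branch(store, "CompanyAModelEvent", "deck_a.pdf"),
        run_branch(store, "CompanyBModelEvent", "deck_b.pdf"),
    )


results = asyncio.run(main())
completed = [r for r in results if r is not None]
print(completed)
```

Whichever branch finishes first gets None from the join, and the second one gets the combined result. In a notebook, await main() can replace asyncio.run(main()).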

Now let’s run it!

You can visualize the agent workflow with this code. It will produce an HTML file that you can open in any browser.

draw_all_possible_flows(
    AutomotiveSectorAnalysisWorkflow,
    filename="data/automotive_sector_analysis_workflow.html",
)

The result looks something like this.

Now run the agent workflow. The workflow processes earnings reports for Tesla and Ford, generates financial models, and produces a comparative analysis. The final result is an equity research memo that provides insights and recommendations based on the analysis of both companies.

modeling_path = "data/modeling_assumptions.txt"
workflow = AutomotiveSectorAnalysisWorkflow(
    agent=agent, modeling_path=modeling_path, verbose=True, timeout=240
)

result = await workflow.run(
    deck_path_a="data/tesla_q2_earnings.pdf",
    deck_path_b="data/ford_q2_earnings_press_release.pdf",
)
final_memo = result["memo"]
print("\n********Final Equity Research Memo:********\n", final_memo)

The result looks something like this.

To view just the final analysis, run:

final_memo.comparative_analysis

And you end up with something like this.

Your next steps

There is a lot of code here. The good news is that slight modifications let you run the analysis against different pairs of companies. Try downloading earnings PDFs from NVIDIA and AMD, updating the file paths and the company named in the comparison prompt, and seeing what you get. If you hit a ModuleNotFoundError, pip install the missing package.
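
One modification worth sketching: the comparison prompt in cross_reference_models names Tesla directly, so a different pair of companies would still get a Tesla recommendation unless that text changes. A possible approach is to turn the company name into a template variable. The example below uses plain str.format rather than the ChatPromptTemplate from the workflow, and the names COMPARISON_INSTRUCTION and build_instruction are hypothetical.

```python
# Excerpt of the comparison prompt with the company name as a placeholder.
COMPARISON_INSTRUCTION = (
    "Compare the following refined financial models for Company A and Company B.\n"
    "Based on this comparison, provide a specific investment recommendation "
    "for {company_a_name} (Company A)."
)


def build_instruction(company_a_name: str) -> str:
    """Fill in the name of the company the recommendation should target."""
    return COMPARISON_INSTRUCTION.format(company_a_name=company_a_name)


instruction = build_instruction("NVIDIA")
print(instruction)
```

In the workflow itself, the same idea would mean adding a {company_a_name} placeholder to the prompt string and passing the value alongside company_a_model and company_b_model.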