How to Build an AI Lead Scoring Agent with Claude API (Step-by-Step)

What You'll Build

If you've ever wasted time chasing leads that were never going to convert, this tutorial is for you. We're going to build a working AI lead-scoring agent in Python that uses the Claude API to automatically evaluate and rank inbound leads, in roughly 300 lines of code including the sample data and tool schemas.

By the end, you'll have an agent that accepts raw lead data, calls custom scoring tools, and returns a structured score with reasoning for each lead. I built this pattern for a real estate client here in Southwest Florida, but it works for any industry.

📦 Full Source Code Note: The complete, working code is assembled step-by-step throughout this tutorial. Every snippet below is a real piece of the final agent — by the end of Step 5, you'll have the entire thing. I recommend following along section by section rather than skipping ahead.

Prerequisites

Python 3.9 or higher installed
An Anthropic API key (create one in the Anthropic Console)
anthropic SDK installed: pip install anthropic
Basic familiarity with Python classes and dictionaries
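ANTHROPIC_API_KEY set as an environment variable (if you keep it in a .env file, load it yourself, for example with python-dotenv, since the script only reads os.environ)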

Step 1: Set Up Your Claude API Client and Environment

First, let's get the project structure in place. Create a new file called lead_scoring_agent.py and set up the Anthropic client. This is the foundation everything else builds on.

We're using claude-sonnet-4-6 for this agent — it's the right balance of speed and reasoning quality for scoring tasks.

lead_scoring_agent.py

import os
import json
from anthropic import Anthropic

# Load your API key from the environment
client = Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))

MODEL = "claude-sonnet-4-6"

# Sample lead data — in production this would come from your CRM or form submissions
SAMPLE_LEADS = [
    {
        "id": "lead_001",
        "name": "Maria Gonzalez",
        "source": "Zillow",
        "budget": 850000,
        "timeline": "3 months",
        "location_preference": "Naples, FL",
        "has_preapproval": True,
        "contacted_count": 2,
        "property_type": "Single Family",
        "notes": "Ready to move, relocating from Chicago for retirement."
    },
    {
        "id": "lead_002",
        "name": "Derek Simmons",
        "source": "Facebook Ad",
        "budget": 300000,
        "timeline": "12+ months",
        "location_preference": "Fort Myers, FL",
        "has_preapproval": False,
        "contacted_count": 0,
        "property_type": "Condo",
        "notes": "Just browsing for now, no urgency mentioned."
    },
    {
        "id": "lead_003",
        "name": "Susan Park",
        "source": "Referral",
        "budget": 1200000,
        "timeline": "1 month",
        "location_preference": "Marco Island, FL",
        "has_preapproval": True,
        "contacted_count": 5,
        "property_type": "Waterfront",
        "notes": "Highly motivated buyer, selling current home next week."
    }
]

Nothing fancy here — just the client, the model name, and three realistic leads to test with. The SAMPLE_LEADS list represents what you'd pull from a CRM like HubSpot or a form submission webhook.

Step 2: Define Lead Scoring Tool Functions and Schema

This is where the agent gets its intelligence. We're going to define two tools: one that scores a lead based on weighted criteria, and one that formats the final output. Claude will decide when to call each one.

The tool schema is just a JSON dictionary that tells Claude what each tool does and what parameters it expects. Think of it as a function signature that Claude can read.

lead_scoring_agent.py (continued)

# Tool definitions — these describe the tools to Claude
TOOLS = [
    {
        "name": "score_lead",
        "description": (
            "Evaluates a real estate lead and returns a numerical score from 0 to 100 "
            "based on budget, timeline urgency, pre-approval status, engagement level, "
            "and lead source quality. Higher scores indicate higher conversion probability."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "lead_id": {
                    "type": "string",
                    "description": "The unique identifier for the lead."
                },
                "budget_score": {
                    "type": "integer",
                    "description": "Score from 0-25 reflecting budget size and seriousness."
                },
                "timeline_score": {
                    "type": "integer",
                    "description": "Score from 0-25 reflecting urgency (0 = 12+ months, 25 = within 1 month)."
                },
                "qualification_score": {
                    "type": "integer",
                    "description": "Score from 0-25 based on pre-approval status and engagement."
                },
                "source_score": {
                    "type": "integer",
                    "description": "Score from 0-25 based on lead source quality (referral = highest)."
                },
                "reasoning": {
                    "type": "string",
                    "description": "A brief explanation of why this lead received its scores."
                }
            },
            "required": [
                "lead_id",
                "budget_score",
                "timeline_score",
                "qualification_score",
                "source_score",
                "reasoning"
            ]
        }
    },
    {
        "name": "format_lead_report",
        "description": (
            "Takes scored lead data and formats it into a clean summary report "
            "suitable for a sales team dashboard or CRM import."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "lead_id": {
                    "type": "string",
                    "description": "The unique identifier for the lead."
                },
                "total_score": {
                    "type": "integer",
                    "description": "The total score from 0 to 100."
                },
                "priority_tier": {
                    "type": "string",
                    "description": "One of: HOT, WARM, COLD based on total score."
                },
                "recommended_action": {
                    "type": "string",
                    "description": "Specific next step the sales agent should take with this lead."
                },
                "reasoning": {
                    "type": "string",
                    "description": "Summary of why this lead was scored and prioritized this way."
                }
            },
            "required": [
                "lead_id",
                "total_score",
                "priority_tier",
                "recommended_action",
                "reasoning"
            ]
        }
    }
]

💡 Tip: The more descriptive your tool descriptions are, the better Claude performs. Don't be lazy with those description fields — Claude uses them to decide when and how to call the tool. Treat them like docstrings that an AI will read.
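Since Claude reads these schemas literally, a typo in a required list or a property without a description is easy to miss. Here is a minimal sketch of a sanity check you could run over TOOLS before starting the agent. The helper name check_tool_schema and the example tool are hypothetical and not part of the agent code above.

```python
def check_tool_schema(tool: dict) -> list:
    """Return a list of human-readable problems found in one tool definition."""
    name = tool.get("name", "?")
    schema = tool.get("input_schema", {})
    properties = schema.get("properties", {})
    problems = []

    if not tool.get("description"):
        problems.append(f"{name}: missing tool description")

    # Every name listed under "required" should also be defined under "properties"
    for field in schema.get("required", []):
        if field not in properties:
            problems.append(f"{name}: required field '{field}' has no property definition")

    # Every property should carry its own description for Claude to read
    for field, spec in properties.items():
        if not spec.get("description"):
            problems.append(f"{name}: property '{field}' has no description")

    return problems


# Hypothetical tool with a deliberate typo in "required" and a missing description
example_tool = {
    "name": "score_lead",
    "description": "Scores a lead from 0 to 100.",
    "input_schema": {
        "type": "object",
        "properties": {
            "lead_id": {"type": "string", "description": "Unique lead identifier."},
            "budget_score": {"type": "integer"},
        },
        "required": ["lead_id", "budget_scor"],
    },
}

for problem in check_tool_schema(example_tool):
    print(problem)
```

In the real script you would loop over TOOLS instead of example_tool and stop if any problems come back.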

Step 3: Build the Main Agent Loop with Tool Use

Now we build the core of the agent — the agentic loop. This is the part that sends messages to Claude, handles tool call responses, executes the actual Python functions, and keeps the conversation going until Claude is done.

This pattern is called a "tool use loop" and it's the foundation of basically every real agent you'll build with the Anthropic SDK. Once you understand this, everything else is just filling in the details.

lead_scoring_agent.py (continued)

def run_tool(tool_name: str, tool_input: dict) -> dict:
    """Execute the requested tool and return its result."""
    if tool_name == "score_lead":
        return execute_score_lead(tool_input)
    elif tool_name == "format_lead_report":
        return execute_format_report(tool_input)
    else:
        return {"error": f"Unknown tool: {tool_name}"}


def score_lead_with_agent(lead: dict) -> dict:
    """
    Run the full agentic loop for a single lead.
    Keeps calling Claude until it stops requesting tool use.
    """
    system_prompt = (
        "You are a lead scoring specialist for a real estate agency in Southwest Florida. "
        "Your job is to evaluate inbound leads and score them from 0 to 100 using the "
        "score_lead tool, then format a report using the format_lead_report tool. "
        "Always use both tools in sequence — score first, then format. "
        "Be precise and consistent in your scoring logic."
    )

    # Initial user message with the raw lead data
    messages = [
        {
            "role": "user",
            "content": (
                f"Please score and format a report for the following lead:\n\n"
                f"{json.dumps(lead, indent=2)}"
            )
        }
    ]

    final_report = {}

    # Agentic loop: runs until Claude returns "end_turn" or a response with no tool calls
    while True:
        response = client.messages.create(
            model=MODEL,
            max_tokens=1024,
            system=system_prompt,
            tools=TOOLS,
            messages=messages
        )

        # If Claude is done (no more tool calls), break out of the loop
        if response.stop_reason == "end_turn":
            break

        # Process each content block in the response
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                tool_result = run_tool(block.name, block.input)

                # Capture the validated report (with the tier computed in Python) from the tool result
                if block.name == "format_lead_report":
                    final_report = {**tool_result, "lead_name": lead["name"]}

                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": json.dumps(tool_result)
                })

        # If no tool calls were found, we're done
        if not tool_results:
            break

        # Append Claude's response and our tool results to the message history
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": tool_results})

    return final_report

The while True loop is the key pattern here. Claude keeps going as long as it has tool calls to make. When it's satisfied, it returns stop_reason: "end_turn" and we break out.

Notice how we append both Claude's response AND our tool results back to the messages list. That's what gives Claude the context it needs to continue reasoning correctly.
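One detail the loop above leaves out: run_tool returns an {"error": ...} dict for unknown tools, but the result is sent back like any other. The Messages API lets a tool_result block carry an is_error flag so Claude knows the call failed. A small sketch of a builder that sets it, assuming errors are signalled by an "error" key as in run_tool (make_tool_result is a hypothetical helper name):

```python
import json


def make_tool_result(tool_use_id: str, result: dict) -> dict:
    """Wrap a tool's return value in a tool_result block for the next user message."""
    block = {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        # content is sent as a JSON string, matching the loop above
        "content": json.dumps(result),
    }
    # Assumption: our tool executors report failure with an "error" key
    if "error" in result:
        block["is_error"] = True
    return block


ok_block = make_tool_result("toolu_demo_1", {"status": "scored", "total_score": 81})
bad_block = make_tool_result("toolu_demo_2", {"error": "Unknown tool: foo"})
print(ok_block)
print(bad_block)
```

The tool_use_id values here are made up for the demo; in the loop they come from block.id.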

Step 4: Implement Lead Evaluation Logic

Now let's write the actual Python functions that get called when Claude triggers the tools. These are the real scoring mechanics — Claude decides the subscores, but we validate and compute the final total here.

lead_scoring_agent.py (continued)

def execute_score_lead(tool_input: dict) -> dict:
    """
    Validate and compute the total score from Claude's subscores.
    Each subscore is clamped to the 0-25 range so no category can skew the total.
    """
    budget_score = max(0, min(int(tool_input.get("budget_score", 0)), 25))
    timeline_score = max(0, min(int(tool_input.get("timeline_score", 0)), 25))
    qualification_score = max(0, min(int(tool_input.get("qualification_score", 0)), 25))
    source_score = max(0, min(int(tool_input.get("source_score", 0)), 25))

    total_score = budget_score + timeline_score + qualification_score + source_score

    return {
        "lead_id": tool_input["lead_id"],
        "total_score": total_score,
        "subscores": {
            "budget": budget_score,
            "timeline": timeline_score,
            "qualification": qualification_score,
            "source": source_score
        },
        "status": "scored"
    }


def execute_format_report(tool_input: dict) -> dict:
    """
    Format the scored lead data into a clean report dict.
    Priority tier is determined by the total score range.
    """
    total_score = int(tool_input.get("total_score", 0))

    # Assign tier based on score range
    if total_score >= 75:
        tier = "HOT"
    elif total_score >= 45:
        tier = "WARM"
    else:
        tier = "COLD"

    return {
        "lead_id": tool_input["lead_id"],
        "total_score": total_score,
        "priority_tier": tier,
        "recommended_action": tool_input.get("recommended_action", "Follow up within 48 hours."),
        "reasoning": tool_input.get("reasoning", ""),
        "status": "report_ready"
    }


def score_all_leads(leads: list) -> list:
    """Run the scoring agent on a list of leads and return sorted results."""
    results = []
    for lead in leads:
        print(f"  Scoring lead: {lead['name']}...")
        report = score_lead_with_agent(lead)
        if report:
            results.append(report)

    # Sort by total score descending so highest-priority leads appear first
    results.sort(key=lambda x: x.get("total_score", 0), reverse=True)
    return results

⚠️ Important: Notice the min() calls in execute_score_lead. Claude is generally accurate, but capping subscores at their maximum ensures no single category can inflate the total. Always validate tool inputs on your side — don't trust any model output blindly.
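One gap worth knowing about: the int() conversions in execute_score_lead raise if a subscore arrives as None or as a non-numeric string. A minimal sketch of a helper that coerces and clamps in one place is shown below. The name clamp_subscore and the choice to fall back to 0 on bad input are assumptions for illustration, not part of the tutorial's code.

```python
def clamp_subscore(value, max_points: int = 25) -> int:
    """Coerce a model-supplied subscore to an int within 0..max_points."""
    try:
        number = int(value)
    except (TypeError, ValueError):
        # Assumption: an unusable value counts as zero points
        return 0
    return max(0, min(number, max_points))


print(clamp_subscore(30))      # above the cap
print(clamp_subscore(-5))      # below zero
print(clamp_subscore("12"))    # numeric string
print(clamp_subscore(None))    # missing value
```

Falling back to 0 keeps the agent running; if you would rather surface bad input, return an error dict to Claude instead so it can retry the call.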

Step 5: Test with Real Lead Data

Last piece: the main runner. This ties everything together and prints a clean leaderboard of your scored leads. Run it and you should see ranked output in your terminal once all three leads are scored. Each lead takes a few API round trips, so allow more than a couple of seconds.

lead_scoring_agent.py (continued)

def print_results(scored_leads: list) -> None:
    """Print a formatted leaderboard of scored leads to the terminal."""
    print("\n" + "="*60)
    print("       AI LEAD SCORING REPORT — NAPLES AI AGENCY")
    print("="*60)

    tier_colors = {"HOT": "🔴", "WARM": "🟡", "COLD": "🔵"}

    for i, lead in enumerate(scored_leads, 1):
        tier = lead.get("priority_tier", "UNKNOWN")
        icon = tier_colors.get(tier, "⚪")
        print(f"\n#{i} {icon} [{tier}] {lead.get('lead_name', 'Unknown')} — Score: {lead.get('total_score', 0)}/100")
        print(f"   Action: {lead.get('recommended_action', 'N/A')}")
        print(f"   Reason: {lead.get('reasoning', 'N/A')[:120]}...")

    print("\n" + "="*60)
    print(f"  Total leads scored: {len(scored_leads)}")
    hot_count = sum(1 for l in scored_leads if l.get("priority_tier") == "HOT")
    print(f"  HOT leads requiring immediate follow-up: {hot_count}")
    print("="*60 + "\n")


if __name__ == "__main__":
    print("\nStarting AI Lead Scoring Agent...")
    print(f"Model: {MODEL}")
    print(f"Leads to score: {len(SAMPLE_LEADS)}\n")

    scored = score_all_leads(SAMPLE_LEADS)
    print_results(scored)

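Run it with python lead_scoring_agent.py. Your output should look something like this (exact scores and wording will vary from run to run, since Claude generates them):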

Terminal Output

Starting AI Lead Scoring Agent...
Model: claude-sonnet-4-6
Leads to score: 3

  Scoring lead: Maria Gonzalez...
  Scoring lead: Derek Simmons...
  Scoring lead: Susan Park...

============================================================
       AI LEAD SCORING REPORT — NAPLES AI AGENCY
============================================================

#1 🔴 [HOT] Susan Park — Score: 94/100
   Action: Call immediately — seller closing next week, waterfront buyer with full approval.
   Reason: Referral source with maximum trust signal, pre-approved, 1-month timeline, $1.2M budget...

#2 🔴 [HOT] Maria Gonzalez — Score: 81/100
   Action: Schedule showing this week — motivated relocation buyer with pre-approval confirmed.
   Reason: Strong Zillow lead with pre-approval, 3-month timeline, high budget for Naples market...

#3 🔵 [COLD] Derek Simmons — Score: 28/100
   Action: Add to drip email campaign — no urgency, no pre-approval, early research stage.
   Reason: Facebook ad lead without pre-approval, 12+ month timeline, low engagement history...

============================================================
  Total leads scored: 3
  HOT leads requiring immediate follow-up: 2
============================================================

How It Works

Here's the plain-English version of what just happened. When you call score_lead_with_agent(), you send a lead's data to Claude as a user message. Claude reads it and decides to call the score_lead tool, filling in subscores based on its reasoning about the lead's quality.

Your Python code receives that tool call, runs execute_score_lead() to validate and total the scores, then sends the result back to Claude. Claude sees the result, decides to call format_lead_report next, and you repeat the process.

The loop ends when Claude has nothing left to do and returns stop_reason: "end_turn". You're left with a clean, structured report for each lead — no regex, no parsing, just a dictionary you can push directly to a CRM or database.
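To make the "push it to a CRM or database" step concrete, here is one possible sketch that turns a list of report dictionaries into CSV text using only the standard library. The column list and the reports_to_csv name are assumptions; a real CRM import would dictate its own columns.

```python
import csv
import io

# Assumed column order for the export; adjust to whatever your CRM expects
FIELDS = ["lead_id", "lead_name", "total_score", "priority_tier", "recommended_action"]


def reports_to_csv(reports: list) -> str:
    """Serialize scored lead reports to CSV text, ignoring keys not in FIELDS."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for report in reports:
        writer.writerow(report)
    return buffer.getvalue()


# Illustrative report shaped like the agent's output
sample_reports = [
    {
        "lead_id": "lead_003",
        "lead_name": "Susan Park",
        "total_score": 94,
        "priority_tier": "HOT",
        "recommended_action": "Call today",
        "reasoning": "Referral, pre-approved, 1-month timeline.",
    }
]

print(reports_to_csv(sample_reports))
```

The extrasaction="ignore" setting drops extra keys such as reasoning, so the export stays stable even if the report dictionary grows.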

Common Errors and Fixes

Error 1: AuthenticationError — Invalid API Key

anthropic.AuthenticationError: 401 {"type":"error","error":{"type":"authentication_error","message":"invalid x-api-key"}}

Fix: Your environment variable isn't being read. Run export ANTHROPIC_API_KEY="sk-ant-..." in your terminal before running the script, or add a python-dotenv loader at the top of your file. Double-check there are no extra spaces or quotes inside the key string itself.
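A quick way to catch this before the first API call is to check the key at startup. The sketch below fails fast when the variable is missing and strips stray whitespace or wrapping quotes. The function name load_api_key is hypothetical, and "sk-ant-example" is a placeholder, not a real key.

```python
import os


def load_api_key(env=None) -> str:
    """Read ANTHROPIC_API_KEY, failing early with a clear message if it is unusable."""
    env = os.environ if env is None else env
    raw = env.get("ANTHROPIC_API_KEY")
    if raw is None:
        raise RuntimeError("ANTHROPIC_API_KEY is not set in this environment.")

    # Remove surrounding whitespace and any quotes that were pasted in with the key
    key = raw.strip().strip("'\"")
    if not key:
        raise RuntimeError("ANTHROPIC_API_KEY is set but empty.")
    return key


# Demo with a fake environment mapping instead of the real os.environ
print(load_api_key({"ANTHROPIC_API_KEY": ' "sk-ant-example" '}))
```

In the script you would then pass the result to the client, for example Anthropic(api_key=load_api_key()).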

Error 2: KeyError on tool_input fields

KeyError: 'budget_score'

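Fix: The schema marks these fields as required, but tool inputs are not guaranteed to match the schema, so Claude can occasionally omit one. Use tool_input.get("budget_score", 0) rather than tool_input["budget_score"] for every field in your tool executor functions, including lead_id, which the Step 4 code still reads with square brackets. The get() pattern with a default value is safer for all tool input handling.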
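Defaults hide the problem rather than report it. An alternative sketch is to check for missing fields up front and return an error dict, which goes back to Claude as the tool result so it can retry with complete input. The names find_missing_fields and guarded_score_lead are hypothetical, and the final return is a placeholder where the real code would call execute_score_lead.

```python
# Mirrors the "required" list in the score_lead schema from Step 2
REQUIRED_SCORE_FIELDS = [
    "lead_id",
    "budget_score",
    "timeline_score",
    "qualification_score",
    "source_score",
    "reasoning",
]


def find_missing_fields(tool_input: dict, required: list) -> list:
    """Return the required field names that are absent from the tool input."""
    return [name for name in required if name not in tool_input]


def guarded_score_lead(tool_input: dict) -> dict:
    """Reject incomplete input with an error dict instead of raising KeyError."""
    missing = find_missing_fields(tool_input, REQUIRED_SCORE_FIELDS)
    if missing:
        return {"error": f"Missing required fields: {', '.join(missing)}"}
    # Placeholder: the real agent would call execute_score_lead(tool_input) here
    return {"status": "ok", "lead_id": tool_input["lead_id"]}


print(guarded_score_lead({"lead_id": "lead_001", "budget_score": 20}))
```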

Error 3: Infinite loop — agent never stops

# Script keeps running, no output, API credits draining fast

Fix: This happens when your tool result format is wrong and Claude keeps re-requesting the tool. Make sure your tool results are wrapped exactly like this: {"type": "tool_result", "tool_use_id": block.id, "content": json.dumps(result)}. The content field must be a string, not a dict. Also add a loop counter as a safety net: if iteration > 10: break.
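The loop counter can be sketched as follows. To keep the example runnable without an API key, call_model is a stand-in for client.messages.create, and stuck_model is a fake that always asks for another tool call. Both names, along with run_bounded_loop, are assumptions for illustration.

```python
import json
from types import SimpleNamespace


def run_bounded_loop(call_model, run_tool, messages, max_turns=10):
    """Drive a tool-use loop, giving up after max_turns model calls."""
    for turn in range(1, max_turns + 1):
        response = call_model(messages)
        tool_calls = [b for b in response.content if b.type == "tool_use"]
        if response.stop_reason != "tool_use" or not tool_calls:
            return turn  # number of model calls it took to finish

        results = [
            {
                "type": "tool_result",
                "tool_use_id": b.id,
                "content": json.dumps(run_tool(b.name, b.input)),
            }
            for b in tool_calls
        ]
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": results})

    raise RuntimeError(f"Agent did not finish within {max_turns} turns")


def stuck_model(messages):
    """Fake model response that requests the same tool forever."""
    block = SimpleNamespace(type="tool_use", id="toolu_demo", name="score_lead", input={})
    return SimpleNamespace(stop_reason="tool_use", content=[block])


try:
    run_bounded_loop(stuck_model, lambda name, args: {"status": "scored"}, [], max_turns=3)
except RuntimeError as exc:
    print(exc)
```

Raising instead of silently breaking makes a runaway loop visible in logs; score_all_leads could catch the exception and skip that lead.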

Next Steps

Now that you have a working agent, here are four ways to take it further:

Connect a real CRM: Replace SAMPLE_LEADS with a HubSpot or Salesforce API call. Pull new leads every hour and push scored results back automatically.
Add a multi-agent layer: Build a second "routing agent" that reads the scored leads and drafts personalized follow-up emails for each tier using a separate Claude call.
Expand scoring criteria: Add tools for checking property history via MLS API, verifying phone numbers, or pulling social signals — each as its own tool Claude can choose to call.
Build a Slack or SMS alert: Pipe HOT leads to a Twilio SMS or Slack webhook so your sales team gets notified the moment a high-priority lead comes in.
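As a starting point for the alerting idea, here is a rough sketch that filters HOT leads and posts a message to a Slack incoming webhook using only the standard library. The function names and message wording are assumptions, and the webhook URL is something you would supply yourself; send_alert is defined but not called here.

```python
import json
import urllib.request


def build_hot_lead_alert(report: dict) -> dict:
    """Build a Slack incoming-webhook payload for one scored lead."""
    text = (
        f"HOT lead: {report.get('lead_name', 'Unknown')} "
        f"({report.get('total_score', 0)}/100). "
        f"Next step: {report.get('recommended_action', 'N/A')}"
    )
    return {"text": text}


def send_alert(webhook_url: str, payload: dict) -> int:
    """POST the payload as JSON to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Illustrative reports shaped like the agent's output
reports = [
    {"lead_name": "Susan Park", "total_score": 94, "priority_tier": "HOT",
     "recommended_action": "Call today"},
    {"lead_name": "Derek Simmons", "total_score": 28, "priority_tier": "COLD",
     "recommended_action": "Add to drip campaign"},
]

alerts = [build_hot_lead_alert(r) for r in reports if r.get("priority_tier") == "HOT"]
print(alerts)
```

To send for real, call send_alert(your_webhook_url, alert) for each item in alerts.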