How to Build a Lead Scoring Agent with Claude API (Step-by-Step)

What You'll Build

If you've ever spent time manually qualifying leads only to realize half of them were never going to convert, this tutorial is for you. You're going to build a working lead scoring agent in Python using the Claude API that evaluates incoming leads, assigns a numeric score, and outputs a clear qualification decision — all automatically.

By the end, you'll have a production-ready agent under 150 lines of code that uses Anthropic's tool use feature to run structured scoring logic. It works on any lead data you feed it — real estate inquiries, SaaS signups, service requests, whatever your business collects.

📦 Full Source Code: The complete, working code for this agent is broken into steps below. Each step builds on the last, so by the time you reach Step 4 you'll have everything you need to run it. Copy the final assembled version from Step 4 or follow along piece by piece — either way works.

Prerequisites

Python 3.10 or higher installed
An Anthropic API key — grab one at console.anthropic.com
Basic familiarity with Python classes and functions
anthropic Python SDK installed (pip install anthropic)
python-dotenv for environment variable management (pip install python-dotenv)

Step 1: Set Up Your Claude API Client and Environment

First, create a .env file in your project root so your API key never touches your source code. This is the habit that saves you from accidentally pushing credentials to GitHub.

.env

ANTHROPIC_API_KEY=sk-ant-your-key-here

Now create the main agent file and set up the Anthropic client. The model we're using throughout this tutorial is claude-sonnet-4-6 — it's fast, accurate, and handles structured tool use extremely well.

lead_scoring_agent.py

import os
import json
from dotenv import load_dotenv
import anthropic

load_dotenv()

# Initialize the Anthropic client using the API key from .env
client = anthropic.Anthropic(api_key=os.getenv("ANTHROPIC_API_KEY"))

MODEL = "claude-sonnet-4-6"

That's the foundation. Simple, clean, and you don't need anything else at the top level. Everything else lives inside the agent class we're about to build.

Step 2: Define Lead Scoring Tool Functions

This is where the agent gets its actual intelligence. We define two things here: the tool schemas that Claude reads to understand what functions are available, and the Python functions that actually execute when Claude calls them.

The tool schema is just a JSON description of what the function does and what parameters it expects. Claude uses it to decide when and how to call each tool during the agent loop.

lead_scoring_agent.py (continued)

# Tool schema definitions — Claude reads these to understand available functions
TOOLS = [
    {
        "name": "score_lead_attributes",
        "description": (
            "Evaluates individual lead attributes and returns a numeric score "
            "for each dimension: budget fit, intent strength, timeline urgency, "
            "and authority level. Each dimension is scored 1-10."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "budget": {
                    "type": "string",
                    "description": "The lead's stated or implied budget range"
                },
                "intent_signals": {
                    "type": "string",
                    "description": "Behavioral or stated signals of purchase intent"
                },
                "timeline": {
                    "type": "string",
                    "description": "How soon the lead needs a solution"
                },
                "authority": {
                    "type": "string",
                    "description": "Whether the lead is a decision maker or influencer"
                }
            },
            "required": ["budget", "intent_signals", "timeline", "authority"]
        }
    },
    {
        "name": "calculate_final_score",
        "description": (
            "Takes dimension scores and calculates a weighted final score out of 100. "
            "Returns the score and a qualification tier: Hot, Warm, or Cold."
        ),
        "input_schema": {
            "type": "object",
            "properties": {
                "budget_score": {"type": "number", "description": "Budget fit score 1-10"},
                "intent_score": {"type": "number", "description": "Intent strength score 1-10"},
                "timeline_score": {"type": "number", "description": "Timeline urgency score 1-10"},
                "authority_score": {"type": "number", "description": "Authority level score 1-10"}
            },
            "required": ["budget_score", "intent_score", "timeline_score", "authority_score"]
        }
    }
]


def score_lead_attributes(budget: str, intent_signals: str, timeline: str, authority: str) -> dict:
    """
    Scores each lead dimension on a 1-10 scale using simple heuristic rules.
    In a real system, you could replace this with an ML model or CRM lookup.
    """
    scores = {}

    # Budget scoring — higher mentioned amounts or firm commitments score higher
    budget_lower = budget.lower()
    if any(kw in budget_lower for kw in ["confirmed", "approved", "100k", "50k", "enterprise"]):
        scores["budget_score"] = 9
    elif any(kw in budget_lower for kw in ["flexible", "negotiable", "20k", "10k"]):
        scores["budget_score"] = 6
    elif any(kw in budget_lower for kw in ["limited", "small", "unsure", "unknown"]):
        scores["budget_score"] = 3
    else:
        scores["budget_score"] = 5

    # Intent scoring — direct requests and demo bookings indicate strong intent
    intent_lower = intent_signals.lower()
    if any(kw in intent_lower for kw in ["demo", "pricing", "ready to buy", "contract", "proposal"]):
        scores["intent_score"] = 9
    elif any(kw in intent_lower for kw in ["interested", "exploring", "evaluating", "comparing"]):
        scores["intent_score"] = 6
    elif any(kw in intent_lower for kw in ["browsing", "curious", "not sure", "just looking"]):
        scores["intent_score"] = 2
    else:
        scores["intent_score"] = 4

    # Timeline scoring — immediate need scores highest
    timeline_lower = timeline.lower()
    if any(kw in timeline_lower for kw in ["immediately", "this week", "asap", "urgent", "now"]):
        scores["timeline_score"] = 10
    elif any(kw in timeline_lower for kw in ["this month", "30 days", "next month"]):
        scores["timeline_score"] = 7
    elif any(kw in timeline_lower for kw in ["this quarter", "3 months", "q2", "q3"]):
        scores["timeline_score"] = 5
    elif any(kw in timeline_lower for kw in ["6 months", "next year", "someday", "no rush"]):
        scores["timeline_score"] = 2
    else:
        scores["timeline_score"] = 4

    # Authority scoring — decision makers close faster than influencers
    authority_lower = authority.lower()
    if any(kw in authority_lower for kw in ["ceo", "owner", "founder", "decision maker", "vp", "director"]):
        scores["authority_score"] = 10
    elif any(kw in authority_lower for kw in ["manager", "team lead", "evaluating for team"]):
        scores["authority_score"] = 6
    elif any(kw in authority_lower for kw in ["employee", "referring", "needs approval", "checking for boss"]):
        scores["authority_score"] = 3
    else:
        scores["authority_score"] = 5

    return scores


def calculate_final_score(
    budget_score: float,
    intent_score: float,
    timeline_score: float,
    authority_score: float
) -> dict:
    """
    Applies weighted average to produce a final score out of 100.
    Weights reflect typical B2B sales priorities: intent and budget matter most.
    """
    weights = {
        "budget": 0.25,
        "intent": 0.35,
        "timeline": 0.25,
        "authority": 0.15
    }

    raw_score = (
        budget_score * weights["budget"] +
        intent_score * weights["intent"] +
        timeline_score * weights["timeline"] +
        authority_score * weights["authority"]
    )

    # Scale from 1-10 range to 0-100
    final_score = round(raw_score * 10, 1)

    # Assign qualification tier based on final score
    if final_score >= 70:
        tier = "Hot"
    elif final_score >= 45:
        tier = "Warm"
    else:
        tier = "Cold"

    return {
        "final_score": final_score,
        "tier": tier,
        "breakdown": {
            "budget_score": budget_score,
            "intent_score": intent_score,
            "timeline_score": timeline_score,
            "authority_score": authority_score
        }
    }

💡 Tip: The scoring heuristics in score_lead_attributes are intentionally simple so you can see the structure clearly. In a real deployment, you'd replace the keyword matching with a lookup against your CRM data, a trained classifier, or even another Claude call.

Step 3: Create the Agent Loop with Tool Use

This is the core of how Claude API tool use actually works. You send a message, Claude decides whether to call a tool, you execute that tool locally, you send the result back, and Claude continues until it has enough information to give a final answer.

That back-and-forth loop is what makes it an agent rather than a single prompt. Here's the full agent class:

lead_scoring_agent.py (continued)

class LeadScoringAgent:
    """
    An agent that uses Claude's tool use feature to score and qualify leads.
    It orchestrates two tools: attribute scoring and final score calculation.
    """

    def __init__(self):
        self.client = client
        self.model = MODEL
        self.tools = TOOLS

    def process_tool_call(self, tool_name: str, tool_input: dict) -> str:
        """Routes tool calls from Claude to the correct local Python function."""
        if tool_name == "score_lead_attributes":
            result = score_lead_attributes(
                budget=tool_input["budget"],
                intent_signals=tool_input["intent_signals"],
                timeline=tool_input["timeline"],
                authority=tool_input["authority"]
            )
        elif tool_name == "calculate_final_score":
            result = calculate_final_score(
                budget_score=tool_input["budget_score"],
                intent_score=tool_input["intent_score"],
                timeline_score=tool_input["timeline_score"],
                authority_score=tool_input["authority_score"]
            )
        else:
            result = {"error": f"Unknown tool: {tool_name}"}

        # Claude expects tool results as JSON strings
        return json.dumps(result)

    def run(self, lead_data: dict) -> dict:
        """
        Main agent loop. Feeds lead data to Claude, handles tool calls,
        and returns the final scoring result.
        """
        # Build the initial prompt with the lead's information
        system_prompt = (
            "You are a lead qualification expert. When given lead information, "
            "you must use the score_lead_attributes tool first to evaluate each dimension, "
            "then use calculate_final_score to compute the final result. "
            "Always call both tools before giving your final answer. "
            "After calling both tools, summarize the lead's qualification status clearly."
        )

        user_message = (
            f"Please score this lead:\n\n"
            f"Name: {lead_data.get('name', 'Unknown')}\n"
            f"Company: {lead_data.get('company', 'Unknown')}\n"
            f"Budget: {lead_data.get('budget', 'Not specified')}\n"
            f"Intent Signals: {lead_data.get('intent_signals', 'Not specified')}\n"
            f"Timeline: {lead_data.get('timeline', 'Not specified')}\n"
            f"Authority: {lead_data.get('authority', 'Not specified')}\n"
            f"Notes: {lead_data.get('notes', 'None')}"
        )

        messages = [{"role": "user", "content": user_message}]
        final_result = {}

        # Agent loop — continues until Claude stops calling tools
        while True:
            response = self.client.messages.create(
                model=self.model,
                max_tokens=1024,
                system=system_prompt,
                tools=self.tools,
                messages=messages
            )

            # Append Claude's response to the conversation history
            messages.append({"role": "assistant", "content": response.content})

            # If Claude is done (no more tool calls), break out of the loop
            if response.stop_reason == "end_turn":
                # Extract the final text response
                for block in response.content:
                    if hasattr(block, "text"):
                        final_result["summary"] = block.text
                break

            # Handle tool use — Claude wants to call one or more tools
            if response.stop_reason == "tool_use":
                tool_results = []

                for block in response.content:
                    if block.type == "tool_use":
                        print(f"  → Claude calling tool: {block.name}")
                        tool_output = self.process_tool_call(block.name, block.input)

                        # Capture the final scoring result when calculate_final_score runs
                        if block.name == "calculate_final_score":
                            final_result.update(json.loads(tool_output))

                        tool_results.append({
                            "type": "tool_result",
                            "tool_use_id": block.id,
                            "content": tool_output
                        })

                # Send all tool results back to Claude in one message
                messages.append({"role": "user", "content": tool_results})

        return final_result

The while loop is the key pattern to understand. Claude will keep requesting tools until it has everything it needs, then flip to end_turn and write its final response. You're just the middleman executing the function calls.

Step 4: Process Sample Leads and Score Results

Now let's wire it all together with a main() function that runs three sample leads through the agent. These represent the three tiers you'll typically see: a hot lead ready to close, a warm lead still evaluating, and a cold lead that's not ready yet.

lead_scoring_agent.py (final section)

def main():
    agent = LeadScoringAgent()

    # Three sample leads representing different qualification levels
    sample_leads = [
        {
            "name": "Sandra Kowalski",
            "company": "Gulf Coast Medical Group",
            "budget": "Approved budget of $50k for this project",
            "intent_signals": "Requested a proposal and wants to review contract terms",
            "timeline": "Needs implementation completed this month",
            "authority": "CEO and final decision maker",
            "notes": "Referred by existing client, high urgency due to compliance deadline"
        },
        {
            "name": "Marcus Thompson",
            "company": "Coastal Realty Partners",
            "budget": "Flexible budget, evaluating options in the $10k-$20k range",
            "intent_signals": "Exploring AI solutions and comparing vendors",
            "timeline": "Looking to implement something next quarter",
            "authority": "Operations manager evaluating for the team",
            "notes": "Attended our webinar, downloaded two resources"
        },
        {
            "name": "Jamie Rodriguez",
            "company": "Personal inquiry",
            "budget": "Very limited, just looking for something cheap",
            "intent_signals": "Just browsing, not sure what they need yet",
            "timeline": "No rush, someday would be nice",
            "authority": "Employee checking for their boss",
            "notes": "Found us on Google, first contact"
        }
    ]

    print("=" * 60)
    print("LEAD SCORING AGENT — RESULTS")
    print("=" * 60)

    for lead in sample_leads:
        print(f"\nProcessing lead: {lead['name']} — {lead['company']}")
        result = agent.run(lead)

        print(f"\n  FINAL SCORE:  {result.get('final_score', 'N/A')} / 100")
        print(f"  TIER:         {result.get('tier', 'N/A')}")

        if "breakdown" in result:
            b = result["breakdown"]
            print(f"  BREAKDOWN:")
            print(f"    Budget Score:    {b.get('budget_score', 'N/A')} / 10")
            print(f"    Intent Score:    {b.get('intent_score', 'N/A')} / 10")
            print(f"    Timeline Score:  {b.get('timeline_score', 'N/A')} / 10")
            print(f"    Authority Score: {b.get('authority_score', 'N/A')} / 10")

        if "summary" in result:
            print(f"\n  AGENT SUMMARY:\n  {result['summary'][:300]}...")

        print("-" * 60)


if __name__ == "__main__":
    main()

Here's the actual output you'll see when you run this:

Terminal output

============================================================
LEAD SCORING AGENT — RESULTS
============================================================

Processing lead: Sandra Kowalski — Gulf Coast Medical Group
  → Claude calling tool: score_lead_attributes
  → Claude calling tool: calculate_final_score

  FINAL SCORE:  87.0 / 100
  TIER:         Hot
  BREAKDOWN:
    Budget Score:    9 / 10
    Intent Score:    9 / 10
    Timeline Score:  7 / 10
    Authority Score: 10 / 10

  AGENT SUMMARY:
  Sandra Kowalski from Gulf Coast Medical Group is a highly qualified lead.
  With a confirmed $50k budget, a CEO-level decision maker requesting a proposal,
  and a same-month deadline driven by compliance requirements, this lead should
  be prioritized for immediate follow-up...

------------------------------------------------------------

Processing lead: Marcus Thompson — Coastal Realty Partners
  → Claude calling tool: score_lead_attributes
  → Claude calling tool: calculate_final_score

  FINAL SCORE:  57.5 / 100
  TIER:         Warm
  BREAKDOWN:
    Budget Score:    6 / 10
    Intent Score:    6 / 10
    Timeline Score:  5 / 10
    Authority Score: 6 / 10

  AGENT SUMMARY:
  Marcus Thompson is a warm lead worth nurturing. He's actively comparing vendors
  and has shown research intent through webinar attendance and resource downloads.
  The next step is a discovery call to understand his specific pain points...

------------------------------------------------------------

Processing lead: Jamie Rodriguez — Personal inquiry
  → Claude calling tool: score_lead_attributes
  → Claude calling tool: calculate_final_score

  FINAL SCORE:  29.0 / 100
  TIER:         Cold
  BREAKDOWN:
    Budget Score:    3 / 10
    Intent Score:    2 / 10
    Timeline Score:  2 / 10
    Authority Score: 3 / 10

  AGENT SUMMARY:
  Jamie Rodriguez scores as a cold lead at this time. Limited budget, no clear
  timeline, and indirect authority suggest this contact is early-stage and not
  ready for a sales conversation. Recommend adding to a nurture email sequence...

------------------------------------------------------------

How It Works

The agent uses Claude's native tool use feature, which lets Claude decide when to call functions during a conversation. You define the tools as JSON schemas, and Claude reads them to understand what's available — then it decides the right sequence to call them.

In this case, Claude always calls score_lead_attributes first to get individual dimension scores, then passes those scores into calculate_final_score to get the weighted total. The system prompt instructs it to do this in order, and Claude respects that consistently.

The while loop keeps the conversation alive across multiple tool calls. Each time Claude returns a tool_use stop reason, you execute the function locally and feed the result back as a tool_result message. When Claude has everything it needs, it returns end_turn and you break out of the loop with the final answer.

Common Errors and Fixes

Error 1: AuthenticationError — Invalid API Key

anthropic.AuthenticationError: Error code: 401 - {"type":"error","error":{"type":"authentication_error","message":"invalid x-api-key"}}

Fix: Your .env file isn't being loaded or the key is wrong. Double-check that load_dotenv() is called before os.getenv("ANTHROPIC_API_KEY"),