What you're building
A prospecting pipeline backed by Cloudflare D1 that discovers businesses from two public data sources — county parcel records and Google Places — deduplicates them, and enriches each lead before an AI coding agent scores, classifies, and loads them into D1.
The two discovery paths have different enrichment needs, so each gets its own walkthrough.
Places-first path
Google Places is the entry point. Each result already includes coordinates, website, rating, and review count — so geocode and business intel enrichment are skipped. The remaining stages fill in email and ownership data.
Parcel-first path
Parcel records are the entry point. For each unlinked parcel, a single Places Text Search call finds the operating business and returns coordinates, rating, reviews, and status in the same response — so no separate geocode or business intel passes are needed. The remaining enrichment stages fill in email and confirm ownership.
Everything lands in a Cloudflare D1 database. The discovery scripts output SQL files, the enrichment scripts query D1 directly via wrangler d1 execute, and the AI agent reads and writes through the same interface. One database, no moving parts.
Why two sources? Places-first gives you broad coverage — every dentist, restaurant, or church in a city. Parcel-first gives you owner-operators — the person who owns the building is often the decision-maker. The two pipelines find different leads with different conversion profiles. Run both.
Prerequisites
- A Cloudflare account (free tier works)
- The wrangler CLI installed and authenticated (npm install -g wrangler && wrangler login)
- A Google Places API key (the new API, not the legacy one)
- Python 3 (stdlib only for core pipeline)
- County parcel data as shapefiles or file geodatabases — most US counties publish these for free
- An AI coding agent (Claude Code, Cursor, etc.)
```bash
# ~/.zshrc or ~/.bashrc
export GOOGLE_API_KEY="your-places-api-key"
```
Set up D1
Create the database and apply a migration with two tables: leads for the prospects the pipeline discovers, and parcels for the county property records you'll match against.
```bash
npx wrangler d1 create prospecting-db
# Copy the database_id from the output
```
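Migrations are applied through wrangler, which reads the database binding from wrangler.toml. A minimal sketch (the binding name is illustrative; the database_id placeholder stands in for the value you copied):

```toml
# wrangler.toml -- minimal config for D1 migrations (illustrative)
name = "prospecting"

[[d1_databases]]
binding = "DB"
database_name = "prospecting-db"
database_id = "paste-the-id-from-the-create-output"
migrations_dir = "migrations"
```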
The schema
```sql
-- 0001_initial.sql
CREATE TABLE leads (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  company TEXT,
  phone TEXT,
  email TEXT,
  website TEXT,
  address TEXT,
  city TEXT,
  state TEXT,
  zip TEXT,
  lat REAL,
  lng REAL,
  lead_type TEXT DEFAULT 'tenant',
  score TEXT,
  parcel_id TEXT,
  rating REAL,
  review_count INTEGER,
  business_status TEXT,
  place_id TEXT,
  geocode_enriched INTEGER NOT NULL DEFAULT 0,
  email_enriched INTEGER NOT NULL DEFAULT 0,
  hunter_enriched INTEGER NOT NULL DEFAULT 0,
  parcel_enriched INTEGER NOT NULL DEFAULT 0,
  intel_enriched INTEGER NOT NULL DEFAULT 0,
  created_at TEXT NOT NULL DEFAULT (datetime('now'))
);

CREATE TABLE parcels (
  prop_id TEXT PRIMARY KEY,
  owner_name TEXT,
  situs_addr TEXT,
  situs_city TEXT,
  situs_zip TEXT,
  land_use TEXT,
  acres REAL,
  market_value REAL
);

CREATE INDEX idx_leads_city ON leads(city);
CREATE INDEX idx_leads_enrichment ON leads(geocode_enriched, email_enriched, hunter_enriched, parcel_enriched, intel_enriched);
CREATE INDEX idx_parcels_city ON parcels(situs_city);
CREATE INDEX idx_parcels_addr ON parcels(situs_addr);
```
The five _enriched flags on leads are the backbone of the pipeline. More on those in the enrichment section.
```bash
npx wrangler d1 migrations apply prospecting-db --remote
```
Load parcel data into D1
This is a one-time step. US counties publish property records as GIS datasets — owner name, address, land use code, acreage, market value. You download the shapefiles, extract the commercial parcels, and load them into D1's parcels table. After that, everything queries the database.
In Texas, the statewide parcel dataset lives at data.geographic.texas.gov. Most states have an equivalent. Download shapefiles for the counties you care about.
Extract and filter
Parcel shapefiles contain residential, commercial, agricultural, and exempt properties. You only want the commercial ones. Land use codes vary by county, but most follow a pattern:
| Code | Category |
|---|---|
| F1, F3 | Commercial / office |
| B1, B2, B3 | Multifamily |
| EX9 | Church / nonprofit |
| EX5 | School |
Write a script that reads the .dbf file (the attribute table in a shapefile), filters by land use code, and writes the records you care about:
```python
import struct

def read_dbf(path):
    """Read a .dbf file into a list of dicts. stdlib only."""
    with open(path, 'rb') as f:
        header = f.read(32)
        num_records = struct.unpack('<I', header[4:8])[0]
        header_size = struct.unpack('<H', header[8:10])[0]
        record_size = struct.unpack('<H', header[10:12])[0]
        # Field descriptors are 32 bytes each, terminated by a 0x0D byte
        fields = []
        while True:
            field_desc = f.read(32)
            if not field_desc or field_desc[0] == 0x0D:
                break
            name = field_desc[:11].split(b'\x00')[0].decode()
            ftype = chr(field_desc[11])
            fsize = field_desc[16]
            fields.append((name, ftype, fsize))
        f.seek(header_size)
        rows = []
        for _ in range(num_records):
            record = f.read(record_size)
            if record[0:1] == b'*':  # deletion flag: skip deleted records
                continue
            pos = 1  # first byte of each record is the deletion flag
            row = {}
            for name, ftype, fsize in fields:
                val = record[pos:pos + fsize].decode('latin-1').strip()
                row[name] = val
                pos += fsize
            rows.append(row)
    return rows

COMMERCIAL_CODES = {'F1', 'F3', 'F10', 'B1', 'B2', 'B3', 'EX9', 'EX5'}

parcels = read_dbf('collin_county/parcels.dbf')
commercial = [p for p in parcels if p.get('land_use') in COMMERCIAL_CODES]
```
Filter by your ideal customer profile
Not every commercial parcel is worth pursuing. Before you spend API calls on enrichment, filter for the businesses you actually want to reach. Define your ICP and drop everything else.
This depends on what you sell. A landscaping company might want parcels over two acres. A SaaS company might only want medical offices. A cleaning service might want office buildings above a certain market value. Whatever the criteria, apply them early — enrichment costs money per lead.
```python
def matches_icp(parcel):
    """Filter parcels by your ideal customer profile. Customize this."""
    land_use = parcel.get('land_use', '')
    if land_use not in COMMERCIAL_CODES:
        return False
    # Example: skip parcels without a street address (can't geocode)
    if not parcel.get('situs_addr'):
        return False
    # Example: only properties above a certain acreage
    try:
        acres = float(parcel.get('acres', 0))
        if acres < 0.25:
            return False
    except ValueError:
        pass
    return True

filtered = [p for p in commercial if matches_icp(p)]
```
After filtering, a single county typically yields thousands of parcels worth investigating. Four counties in the Dallas-Fort Worth area produced ~86,000 commercial parcels total — with ICP filters applied, that narrowed to around 12,000.
Load into D1
Generate SQL INSERTs from the filtered parcels and apply them:
```python
def sql_quote(s):
    """Escape single quotes so owner names like O'BRIEN don't break the SQL."""
    return "'" + str(s).replace("'", "''") + "'"

with open('/tmp/load_parcels.sql', 'w') as f:
    for p in filtered:
        f.write(
            f"INSERT OR IGNORE INTO parcels (prop_id, owner_name, situs_addr, situs_city, "
            f"situs_zip, land_use, acres, market_value) VALUES ("
            f"{sql_quote(p['prop_id'])}, {sql_quote(p['owner_name'])}, {sql_quote(p['situs_addr'])}, "
            f"{sql_quote(p['situs_city'])}, {sql_quote(p.get('situs_zip', ''))}, {sql_quote(p['land_use'])}, "
            f"{p.get('acres') or 0}, {p.get('market_value') or 0});\n"
        )
```
```bash
npx wrangler d1 execute prospecting-db --remote --file /tmp/load_parcels.sql
```
Once the parcels are in D1, you never touch the shapefiles again. Discovery scripts query the parcels table directly.
The two discovery pipelines
Pipeline A: Places-first
Search Google Places by business type and city. This finds every business matching your categories, whether they own or rent their space. Good for broad coverage.
```python
import json
import urllib.request

PLACES_BASE = "https://places.googleapis.com/v1/places"

def places_search(query, api_key, max_results=20):
    payload = json.dumps({
        "textQuery": query,
        "maxResultCount": max_results,
    }).encode()
    req = urllib.request.Request(
        f"{PLACES_BASE}:searchText",
        data=payload,
        headers={
            "Content-Type": "application/json",
            "X-Goog-Api-Key": api_key,
            "X-Goog-FieldMask": ",".join([
                "places.id", "places.displayName",
                "places.nationalPhoneNumber", "places.formattedAddress",
                "places.location", "places.websiteUri",
                "places.rating", "places.userRatingCount",
                "places.businessStatus",
            ]),
        },
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=15) as r:
        return json.loads(r.read()).get("places", [])
```
Run a batch of queries by category and city:
```python
import os
import time

QUERIES = [
    "dentist Allen TX",
    "dental office Allen TX",
    "restaurant Allen TX",
    "church Allen TX",
    "auto repair Allen TX",
]

all_results = []
for q in QUERIES:
    results = places_search(q, api_key=os.environ["GOOGLE_API_KEY"])
    all_results.extend(results)
    time.sleep(0.2)  # small pause between requests
```
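The snippets above stop at raw Places results. At insert time each result still has to be flattened onto the leads schema. A sketch of that mapping, with place_to_lead as a hypothetical helper; note that geocode_enriched and intel_enriched can be set to 1 in the INSERT because the Places response already carried that data (more in the enrichment section):

```python
def place_to_lead(place):
    """Flatten a Places API result onto the leads schema. Illustrative helper."""
    loc = place.get("location", {})
    return {
        "company": place.get("displayName", {}).get("text", ""),
        "phone": place.get("nationalPhoneNumber", ""),
        "website": place.get("websiteUri", ""),
        "address": place.get("formattedAddress", ""),
        "lat": loc.get("latitude"),
        "lng": loc.get("longitude"),
        "rating": place.get("rating"),
        "review_count": place.get("userRatingCount", 0),
        "business_status": place.get("businessStatus", "UNKNOWN"),
        "place_id": place.get("id"),
    }

leads = [place_to_lead(p) for p in all_results]
```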
Pipeline B: Parcel-first
Query unlinked parcels from D1 — ones not yet matched to any lead. For each parcel address, search Google Places to find the operating business. This finds owner-operators — the person on the parcel deed is often the decision-maker you want to reach.
```bash
# Find parcels not yet linked to any lead
npx wrangler d1 execute prospecting-db --remote --json \
  --command "SELECT * FROM parcels WHERE prop_id NOT IN (SELECT parcel_id FROM leads WHERE parcel_id IS NOT NULL)"
```
```python
def find_business_at_parcel(parcel, api_key):
    """Look up the Google Places business operating at a parcel address."""
    addr = f"{parcel['situs_addr']}, {parcel['situs_city']}, TX"
    results = places_search(addr, api_key, max_results=5)
    if not results:
        return None
    # Prefer a result whose address shares the parcel's street number
    for place in results:
        place_addr = place.get("formattedAddress", "").upper()
        parcel_street = parcel["situs_addr"].upper().split()[0]
        if parcel_street in place_addr:
            return place
    return results[0]
```
The parcel tells you who owns the property. The Places result tells you who operates there. When those match (the business name resembles the owner name), you've found an owner-operator — someone who both controls the property and runs the business.
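To drive this over every unlinked parcel, parse the JSON that wrangler emits and feed each row through the lookup. A sketch, assuming wrangler d1 execute --json prints an array of result sets each holding a results list, and reusing the hypothetical place_to_lead from the Places-first pipeline:

```python
import json
import os
import subprocess

# Pull unlinked parcels out of D1 (same query as above)
out = subprocess.run(
    ["npx", "wrangler", "d1", "execute", "prospecting-db", "--remote", "--json",
     "--command",
     "SELECT * FROM parcels WHERE prop_id NOT IN "
     "(SELECT parcel_id FROM leads WHERE parcel_id IS NOT NULL)"],
    capture_output=True, text=True, check=True,
)
unlinked = json.loads(out.stdout)[0]["results"]

new_leads = []
for parcel in unlinked:
    place = find_business_at_parcel(parcel, os.environ["GOOGLE_API_KEY"])
    if place:
        lead = place_to_lead(place)
        lead["parcel_id"] = parcel["prop_id"]  # link back to the parcel row
        new_leads.append(lead)
```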
Deduplication
Both pipelines produce overlapping results. Before inserting, export your existing leads from D1 for dedup:
```bash
npx wrangler d1 execute prospecting-db --remote --json \
  --command "SELECT company, phone, address, city FROM leads" > /tmp/existing_leads.json
```
Deduplicate by phone number first (most reliable), then by normalized company name and street address:
```python
import re

def normalize(s):
    """Strip punctuation, collapse whitespace, uppercase."""
    return re.sub(r'\s+', ' ', re.sub(r'[^A-Z0-9 ]', '', s.upper())).strip()

def normalize_phone(s):
    """Keep digits only so '(972) 555-0100' matches '972-555-0100'."""
    return re.sub(r'\D', '', s or '')

def dedup_leads(new_leads, existing_leads):
    """Filter out leads already in D1."""
    seen_phones = {normalize_phone(c['phone']) for c in existing_leads if c.get('phone')}
    seen_names = {normalize(c['company']) for c in existing_leads if c.get('company')}
    seen_addrs = {normalize(c['address']) for c in existing_leads if c.get('address')}
    unique = []
    for lead in new_leads:
        phone = normalize_phone(lead.get('phone', ''))
        name = normalize(lead.get('company', ''))
        addr = normalize(lead.get('address', ''))
        if phone and phone in seen_phones:
            continue
        if name and name in seen_names:
            continue
        if addr and addr in seen_addrs:
            continue
        seen_phones.add(phone)
        seen_names.add(name)
        seen_addrs.add(addr)
        unique.append(lead)
    return unique
```
Three-layer dedup matters. A business might have a different phone on Google than in your database (cell vs. landline). Or the address might be formatted differently ("123 Main St" vs. "123 Main Street"). Checking all three catches most duplicates without false positives.
The enrichment pipeline
After discovery, each lead has basic info — name, address, maybe a phone number. Enrichment fills in the rest: email, property ownership, and for leads that didn't come through a Places call, coordinates and business intelligence.
Both pipelines call the Places API during discovery and get coordinates, rating, reviews, and status back in that same response. Set the geocode_enriched and intel_enriched flags at insert time for those leads — no separate pass needed. The flags exist for leads imported from other sources that didn't go through Places.
The flag pattern
Each enrichment stage is controlled by a boolean flag on the lead record. The flag means "this stage has been attempted" — not "it succeeded." This prevents retrying dead ends (businesses with no website will never yield an email, and you don't want to re-scrape them every run).
| Flag | Stage | What it does |
|---|---|---|
| geocode_enriched | Geocode | Backfill lat/lng — set at insert for Places/Parcel discovery, run separately for other sources |
| email_enriched | Email Scrape | Scrape website homepage + /contact page for email addresses |
| hunter_enriched | Hunter.io | Domain-based email lookup for leads with a website but no scraped email |
| parcel_enriched | Parcel Link | Match address to parcel records for owner info |
| intel_enriched | Business Intel | Rating, reviews, status — set at insert for Places/Parcel discovery, run separately for other sources |
Every script follows the same pattern:
```bash
# 1. Query D1 for unenriched candidates
npx wrangler d1 execute prospecting-db --remote --json \
  --command "SELECT * FROM leads WHERE geocode_enriched = 0 AND address IS NOT NULL"
```
```python
# 2. Process each one, build SQL updates
updates = []
for lead in candidates:
    result = do_enrichment(lead)
    updates.append(build_update(lead, result))
    # 3. Set the flag regardless of outcome
    updates.append(f"UPDATE leads SET geocode_enriched = 1 WHERE id = {lead['id']};")

# 4. Write updates to a SQL file, apply to D1
with open('/tmp/geocode_updates.sql', 'w') as f:
    f.write('\n'.join(updates))
```
```bash
npx wrangler d1 execute prospecting-db --remote --file /tmp/geocode_updates.sql
```
The "set flag regardless" rule is what makes the pipeline idempotent. Run it as many times as you want — it only processes new leads.
Stage 1: Geocode
If a lead has an address but no coordinates, geocode it via Places Text Search. You need lat/lng for distance filtering and territory mapping later.
```python
def geocode(address, api_key):
    results = places_search(address, api_key, max_results=1)
    if results and "location" in results[0]:
        loc = results[0]["location"]
        return loc["latitude"], loc["longitude"]
    return None, None
```
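Plugged into the flag pattern above (where do_enrichment is this geocode call), the build_update placeholder might look like this sketch:

```python
def build_update(lead, result):
    """Turn a (lat, lng) geocode result into an UPDATE statement. Illustrative."""
    lat, lng = result
    if lat is None:
        return ""  # nothing to write; the flag update still runs
    return f"UPDATE leads SET lat = {lat}, lng = {lng} WHERE id = {lead['id']};"
```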
Stage 2: Email scraping
For each lead with a website, fetch the homepage and the /contact page, then regex out email addresses. Filter aggressively — junk domains, noreply addresses, and framework artifacts pollute the results.
```python
import re
import ssl
import urllib.request

EMAIL_RE = re.compile(r'[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}')

JUNK_DOMAINS = {
    'example.com', 'sentry.io', 'wixpress.com', 'googleapis.com',
    'w3.org', 'schema.org', 'wordpress.org', 'squarespace.com',
    'gravatar.com', 'godaddy.com',
}
JUNK_PREFIXES = {'noreply', 'no-reply', 'mailer-daemon', 'postmaster', 'webmaster'}

# Relaxed SSL: small-business certs are frequently broken (see note below)
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

def scrape_emails(website):
    """Scrape email addresses from a website's homepage and /contact page."""
    emails = set()
    for path in ['', '/contact', '/contact-us']:
        url = website.rstrip('/') + path
        try:
            req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
            with urllib.request.urlopen(req, timeout=10, context=ctx) as r:
                html = r.read().decode(errors='replace')
            for match in EMAIL_RE.findall(html):
                addr = match.lower()
                domain = addr.split('@')[1]
                prefix = addr.split('@')[0]
                if domain not in JUNK_DOMAINS and prefix not in JUNK_PREFIXES:
                    emails.add(addr)
        except Exception:
            continue
    return list(emails)
```
Why the relaxed SSL context? Small-business websites frequently have expired, self-signed, or misconfigured certificates. You're scraping a public homepage for a mailto link, not transmitting credentials.

One caveat: verify before sending. Scraped emails are unverified — the address may exist in the HTML but bounce when you send to it. Run leads through an email verification service (ZeroBounce, NeverBounce, or Hunter's built-in verifier) before any outreach campaign. Sending to a cold, unverified list damages your sender reputation faster than any other mistake.
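Scraping usually turns up more than one address per site. One simple heuristic, purely illustrative, is to prefer an address on the business's own domain over generic webmail. This assumes the stored website URL includes a scheme, as Places returns it:

```python
from urllib.parse import urlparse

def pick_best_email(emails, website):
    """Prefer an address on the business's own domain. Illustrative heuristic."""
    site_domain = urlparse(website).netloc.lower().removeprefix('www.')
    for addr in sorted(emails):
        if site_domain and addr.endswith('@' + site_domain):
            return addr
    return emails[0] if emails else None
```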
Stage 3: Hunter.io email enrichment
For leads with a website but no email after scraping, Hunter.io can find a deliverable address by querying the domain. This catches professional services and companies that hide their email behind contact forms.
```python
import json
import os
import urllib.request

def hunter_find_email(domain, api_key):
    """Find the most likely email address for a domain via Hunter.io."""
    url = f"https://api.hunter.io/v2/domain-search?domain={domain}&api_key={api_key}&limit=1"
    try:
        with urllib.request.urlopen(url, timeout=10) as r:
            data = json.loads(r.read())
        emails = data.get("data", {}).get("emails", [])
        if emails:
            return emails[0].get("value")
    except Exception:
        pass
    return None
```
Query D1 for leads that have a website but no email and haven't been through Hunter yet:
```bash
npx wrangler d1 execute prospecting-db --remote --json \
  --command "SELECT id, company, website FROM leads WHERE hunter_enriched = 0 AND website IS NOT NULL AND (email IS NULL OR email = '')"
```
Hunter.io's free plan gives you 50 requests/month — enough for a small pipeline. It works best for companies with a domain-level email pattern (e.g. firstname@company.com). The flag follows the same idempotent pattern — set it regardless of whether an email was found.
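One glue detail: hunter_find_email wants a bare domain, while the leads table stores full website URLs. A stdlib-only sketch to bridge the two:

```python
from urllib.parse import urlparse

def website_to_domain(website):
    """Extract a bare domain ('acme.com') from a stored website URL."""
    if '//' not in website:
        website = 'https://' + website  # urlparse needs a scheme
    host = urlparse(website).netloc.lower()
    return host.removeprefix('www.').split(':')[0]
```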
Stage 4: Parcel linking
Match leads to parcel records in D1 by normalized street address. When a lead's address matches a parcel row, you get the owner's name, acreage, market value, and land use — context that feeds into scoring later.
```python
def normalize_address(addr):
    """Collapse to comparable form: uppercase, strip suite/unit, abbreviate."""
    addr = addr.upper().strip()
    # Match on whitespace, not \b: '#' has no word boundary after a space
    addr = re.sub(r'\s(SUITE|STE|UNIT|APT|#)\s*\S+', '', addr)
    addr = re.sub(r'\bSTREET\b', 'ST', addr)
    addr = re.sub(r'\bAVENUE\b', 'AVE', addr)
    addr = re.sub(r'\bBOULEVARD\b', 'BLVD', addr)
    addr = re.sub(r'\bDRIVE\b', 'DR', addr)
    return re.sub(r'\s+', ' ', addr).strip()
```
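The matching pass itself can then be a dictionary join: index parcels by normalized address, look each lead up, and write the link plus the flag back to D1. A sketch, reusing the sql_quote helper from the parcel loader and assuming parcels and candidates were fetched via wrangler as in earlier stages:

```python
# Index parcels by normalized street address
parcels_by_addr = {normalize_address(p['situs_addr']): p
                   for p in parcels if p.get('situs_addr')}

updates = []
for lead in candidates:
    # Leads store full formatted addresses; compare on the street portion only
    street = (lead.get('address') or '').split(',')[0]
    parcel = parcels_by_addr.get(normalize_address(street))
    if parcel:
        updates.append(
            f"UPDATE leads SET parcel_id = {sql_quote(parcel['prop_id'])} "
            f"WHERE id = {lead['id']};"
        )
    # Flag is set whether or not a match was found
    updates.append(f"UPDATE leads SET parcel_enriched = 1 WHERE id = {lead['id']};")
```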
Stage 5: Business intelligence
Google Places returns more than a name and address. The same API call that discovered the lead can populate review count, average rating, business status, and category. You use these to prioritize outreach.
```python
def enrich_business_intel(lead, api_key):
    """Pull business intelligence from Google Places for a lead."""
    query = f"{lead['company']} {lead['address']}"
    results = places_search(query, api_key, max_results=1)
    if not results:
        return {}
    place = results[0]
    return {
        'rating': place.get('rating'),
        'review_count': place.get('userRatingCount', 0),
        'business_status': place.get('businessStatus', 'UNKNOWN'),
        'place_id': place.get('id'),
    }
```
A business with 200 reviews is probably larger than one with 5. A business marked CLOSED_TEMPORARILY is not worth emailing. These signals come back in the same API response you already paid for.
AI-operated qualification
Two jobs that would take a human hours per batch take an AI agent minutes:
Lead scoring
The agent reviews enrichment data and assigns a priority tier. What counts as "hot" depends on your business — here's the signal table:
| Signal | Where it comes from | What it indicates |
|---|---|---|
| Review count | Google Places | Business size and foot traffic |
| Has email | Email scrape | Reachable via cold outreach |
| Owner-operator match | Parcel link | Decision-maker identified |
| Property value / acreage | Parcel data | Scale of opportunity |
| Business status | Google Places | Active vs. closed vs. temporarily shut |
The agent weighs these signals against your ICP and assigns a tier (hot, warm, cold). Present the results in a table and let the user adjust before writing to D1.
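For a sense of what that weighing looks like in code, here's a deliberately crude rubric. The weights and thresholds are illustrative, not part of the pipeline; the agent's judgment replaces them in practice:

```python
def score_lead(lead, parcel=None):
    """Assign a rough tier from enrichment signals. Illustrative weights."""
    if lead.get('business_status') == 'CLOSED_PERMANENTLY':
        return 'cold'  # not worth emailing regardless of other signals
    points = 0
    if lead.get('email'):
        points += 2  # reachable via cold outreach
    if (lead.get('review_count') or 0) >= 50:
        points += 1  # established, visible business
    if lead.get('lead_type') == 'owner_operator':
        points += 2  # decision-maker identified
    if parcel and (parcel.get('market_value') or 0) >= 1_000_000:
        points += 1  # scale of opportunity
    return 'hot' if points >= 4 else 'warm' if points >= 2 else 'cold'
```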
Owner-operator detection
When a lead's business name resembles the parcel owner's name, they're likely an owner-operator — they own the building and make decisions about it directly. If you sell anything property-related, this is the person you want to talk to.
```python
STOP_WORDS = {'INC', 'LLC', 'LP', 'LTD', 'CORP', 'CO', 'THE', 'OF', 'AND', '&'}

def fuzzy_owner_match(business_name, owner_name):
    """Check if a business name is a fuzzy match for a parcel owner name."""
    def tokenize(s):
        tokens = re.sub(r'[^A-Z0-9 ]', '', s.upper()).split()
        return {t for t in tokens if t not in STOP_WORDS and len(t) > 1}
    biz_tokens = tokenize(business_name)
    own_tokens = tokenize(owner_name)
    if not biz_tokens or not own_tokens:
        return False
    overlap = biz_tokens & own_tokens
    return len(overlap) >= 2 or (len(overlap) == 1 and len(biz_tokens) <= 2)
```
Churches, schools, and fraternal organizations almost always own and operate their facility. The agent can classify these automatically without fuzzy matching.
Lead type taxonomy
Every lead gets a type based on the relationship between the business and the property. This determines who you're actually talking to when you reach out:
| Type | Who they are | What it means for outreach |
|---|---|---|
| owner_operator | Owns the building, runs the business | Direct decision-maker |
| tenant | Rents space, doesn't control the property | May need to route you to their landlord |
| property_manager | Manages properties for owners | Controls vendor selection for multiple sites |
The pipeline assigns owner_operator when the business name matches the parcel owner, and tenant otherwise. The agent can reclassify during qualification if it finds evidence one way or the other.
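A minimal classification sketch building on fuzzy_owner_match; the property-manager keyword check is a hypothetical heuristic, not something the pipeline ships with:

```python
def classify_lead_type(lead, parcel):
    """Assign a lead type from the business/owner relationship. Illustrative."""
    if parcel and fuzzy_owner_match(lead.get('company', ''),
                                    parcel.get('owner_name', '')):
        return 'owner_operator'
    # Hypothetical keyword check for management companies
    if 'PROPERTY MANAGEMENT' in (lead.get('company') or '').upper():
        return 'property_manager'
    return 'tenant'
```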
Wire it up as a skill
The whole pipeline is designed to be run by an AI coding agent. Here's the Claude Code skill file that orchestrates it:
```yaml
---
name: prospecting
description: Discover and enrich new prospects from public records.
  Use when user mentions "find prospects", "new leads", "prospecting",
  "parcel search", or "enrich contacts". Handles Places search,
  parcel-first discovery, dedup, and multi-stage enrichment.
---
```
Workflow
The skill body walks the agent through seven steps:
- Ask discovery source. Always start here — the user picks Places-first or Parcel-first.
- Analyze gaps. Query D1 for coverage by city and enrichment progress. Help the user pick targets where coverage is thin.
- Run discovery. Execute the chosen pipeline, deduplicate against existing D1 records.
- Review and insert. Show the user totals, city breakdown, dedup stats. Get explicit confirmation before writing to D1.
- Enrich. Run all five stages in order. Each stage is a separate script invocation — geocode, email scrape, Hunter.io, parcel link, business intel.
- Score and classify. The agent reviews enrichment data, assigns lead scores and types. Present a table for user confirmation before updating.
- Verify. Query enrichment progress, show before/after comparison.
Rules to encode in the skill
- Never insert duplicates. The dedup step is non-negotiable.
- Confirm before inserting. The agent must present discovery results and wait for user approval.
- Confirm before reclassifying. Score and type changes must be reviewed.
- Set flags regardless of outcome. The idempotent flag pattern is the backbone of the pipeline.
The pattern: The skill file defines workflow and rules. The scripts do the work. The agent orchestrates — it asks the user for decisions at each gate, runs the right script, and presents results. You approve, it executes.
What's next
- Cold email sequences. Build template-per-lead-type outreach and wire up a sending service (Resend, SendGrid, SES). The pipeline already classifies leads by type — map each type to a template.
- Scheduled enrichment. Run enrichment nightly as a cron job. New leads from today's discovery get geocoded and email-scraped by morning.
- Multi-county expansion. Each county is a new shapefile download and a filter config. The pipeline scales linearly with no code changes.
- Additional enrichment stages. LinkedIn for decision-maker names. Yelp for review data. Clearbit for company firmographics. Each one is a new flag and a new script — the pattern stays the same.
