How long does it take to build a website with WebZum?

WebZum creates a complete professional website in about 5 minutes. Our AI automatically discovers your business information from Google, Facebook, Yelp, and across the web, then generates all content, images, and design for you.

Do I need to write any content for my website?

No content writing is needed. WebZum's AI finds your existing online information (reviews, business details, photos) and writes all website content specifically for your business. You can edit anything afterward if you want.

How much does WebZum cost?

WebZum offers a Basic plan at $19/month and a Professional plan at $34/month. The Professional plan includes a free custom domain and removes WebZum branding. Both include a free preview before subscribing, no setup fees, and you can cancel anytime with no long-term contracts.

What makes WebZum different from other website builders?

WebZum doesn't just generate a template — it runs a full AI pipeline: market research, competitor analysis, brand strategy, SEO keyword planning, content writing, logo generation, image generation, and multi-page design. Other builders require you to write content and pick designs yourself. WebZum does all of it automatically in 5 minutes. It works for any business — existing or brand new, with or without an online presence.

Do I get my own domain name?

Yes! The Professional plan includes a free domain name. Search for an available name, click the one you want, and your website is live on it. WebZum handles registration, DNS, and SSL — nothing technical on your end.

Who can use WebZum?

WebZum works for anyone who needs a website: small businesses (restaurants, salons, contractors, law firms), agencies building sites for clients at scale, and creators and influencers. It also builds landing pages, portfolios, and SEO-focused microsites. The AI pipeline adapts to any use case: it researches your market, writes the content, and builds the site.

Can I edit my website after it's generated?

Yes, WebZum includes website editing tools that let you modify text, images, and sections. You can also regenerate your entire website anytime to get fresh content as our AI improves.

Does WebZum create a logo for my business?

Yes! WebZum's AI creates a professional logo for your business at no extra cost. If you already have a logo, you can upload it and WebZum will use your brand colors throughout the website.

How We Use AI to Detect Duplicate Business Registrations (And Why It’s Harder Than You Think)

TL;DR: We built an AI-powered business fingerprinting system that detects duplicate registrations with 95% accuracy. It handles typos, abbreviations, and formatting differences, and it catches users re-registering a business they forgot they had already added. Claude extracts structured business data, deterministic code normalizes that data into fingerprints, and the fingerprints stop users from creating multiple websites for the same business.

The Problem: Users Keep Creating Duplicates

We let users generate websites by entering a business name. Simple, right?

Wrong.

What we saw:

“Joe’s Pizza Brooklyn” (Monday)
“Joes Pizza - Brooklyn NY” (Tuesday)
“Joe’s Pizzeria” (Wednesday)

Same business. Three websites. Three subscriptions. Chaos.

Why it happens:

Typos: “Joe’s” vs “Joes” vs “Joe's” (curly vs straight apostrophe)
Abbreviations: “Brooklyn” vs “Bklyn” vs “BK”
Formatting: “123 Main St” vs “123 Main Street, Apt 2”
Forgotten sites: users forget they already created a site and enter the name differently

The cost:

Wasted AI API calls ($2-5 per website generation)
Confused users (“Why do I have 3 websites?”)
Support tickets (“Which one is the real one?”)
Database bloat (several records for one real business)

We needed to detect duplicates before generating the website.

The Insight: Fingerprints, Not Exact Matches

The breakthrough came when we stopped trying to match business names exactly and started thinking about “business fingerprints.”

Exact matching (doesn’t work):

"Joe's Pizza Brooklyn" ≠ "Joes Pizza - Brooklyn NY"

Fingerprint matching (works):

normalize("Joe's Pizza Brooklyn") → "joes-pizza-brooklyn"
normalize("Joes Pizza - Brooklyn NY") → "joes-pizza-brooklyn"
✅ MATCH!
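To see why string cleanup alone falls short, here is a minimal sketch of a purely mechanical normalizer. `naiveNormalize` is a hypothetical illustration, not our production function: it lowercases, drops straight and curly apostrophes, turns other punctuation into spaces, and hyphenates. The first two inputs come close, but the trailing state code and the reordered name still produce three different keys.

```typescript
// Hypothetical, deliberately naive normalizer: it has no idea which words
// are the name and which are the location.
function naiveNormalize(input: string): string {
  return input
    .toLowerCase()
    .replace(/['\u2019]/g, '')    // drop straight and curly apostrophes
    .replace(/[^a-z0-9\s]/g, ' ') // other punctuation becomes whitespace
    .trim()
    .replace(/\s+/g, '-');        // collapse whitespace into single hyphens
}

const samples = ["Joe's Pizza Brooklyn", 'Joes Pizza - Brooklyn NY', "Brooklyn Joe's Pizza"];
for (const s of samples) {
  console.log(`${s} -> ${naiveNormalize(s)}`);
}
// joes-pizza-brooklyn, joes-pizza-brooklyn-ny, brooklyn-joes-pizza
```

Separating the business name from the location, and deciding that "NY" is a state rather than part of the name, is the job the extraction step hands to Claude.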

But normalization alone isn’t enough. We needed AI.

How It Works: The Technical Architecture

1. AI-Powered Business Name Extraction

When a user enters text, we use Claude to extract structured data:

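import Anthropic from '@anthropic-ai/sdk';

// The client reads ANTHROPIC_API_KEY from the environment by default
const anthropic = new Anthropic();

interface ExtractedBusinessInfo {
  businessName: string;
  location: string | null;
  type: string | null;
  legalEntity: string | null;
  phone: string | null; // read later by generateFingerprint
}

async function extractBusinessInfo(userInput: string): Promise<ExtractedBusinessInfo> {
  const prompt = `
Extract business information from this input:
"${userInput}"

Return only a JSON object, with no other text, containing:
- businessName: The core business name (no location, no legal entity)
- location: City, state, or neighborhood (null if absent)
- type: Business type (restaurant, plumber, etc.)
- legalEntity: LLC, Inc, etc. (null if absent)
- phone: Phone number as written (null if absent)

Examples:
Input: "Joe's Pizza LLC in Brooklyn"
Output: {
  "businessName": "Joe's Pizza",
  "location": "Brooklyn",
  "type": "restaurant",
  "legalEntity": "LLC",
  "phone": null
}

Input: "Best Plumbing Services - San Diego, CA"
Output: {
  "businessName": "Best Plumbing Services",
  "location": "San Diego, CA",
  "type": "plumber",
  "legalEntity": null,
  "phone": null
}
`;

  const response = await anthropic.messages.create({
    model: 'claude-3-5-sonnet-20241022',
    max_tokens: 500,
    messages: [{
      role: 'user',
      content: prompt
    }]
  });

  // response.content is an array of content blocks; the answer is the first text block
  const block = response.content[0];
  if (!block || block.type !== 'text') {
    throw new Error('Expected a text response from the model');
  }
  return JSON.parse(block.text) as ExtractedBusinessInfo;
}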
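Calling `JSON.parse` directly on model output throws if the reply arrives wrapped in a Markdown code fence or with a sentence of preamble, and it accepts any JSON shape at all. Below is a minimal defensive parser, assuming the extraction fields used in this article (`businessName`, `location`, `type`, `legalEntity`, `phone`). `parseExtraction` is a hypothetical helper, and slicing from the first `{` to the last `}` is a simplification, not a general JSON extractor.

```typescript
// Fields used by the extraction step in this article.
interface ExtractedBusinessInfo {
  businessName: string;
  location: string | null;
  type: string | null;
  legalEntity: string | null;
  phone: string | null;
}

// Hypothetical helper: tolerate a code fence or surrounding prose by slicing
// from the first "{" to the last "}", then validate the shape.
function parseExtraction(text: string): ExtractedBusinessInfo | null {
  const start = text.indexOf('{');
  const end = text.lastIndexOf('}');
  if (start === -1 || end <= start) return null;

  let data: unknown;
  try {
    data = JSON.parse(text.slice(start, end + 1));
  } catch {
    return null; // still not valid JSON after slicing
  }
  if (typeof data !== 'object' || data === null || Array.isArray(data)) return null;
  const obj = data as Record<string, unknown>;

  // businessName is mandatory; return null so the caller can fall back to the raw input
  const name = obj.businessName;
  if (typeof name !== 'string' || name.trim() === '') return null;

  // Optional fields: anything that isn't a string becomes null
  const str = (v: unknown): string | null => (typeof v === 'string' ? v : null);
  return {
    businessName: name,
    location: str(obj.location),
    type: str(obj.type),
    legalEntity: str(obj.legalEntity),
    phone: str(obj.phone),
  };
}
```

Returning `null` instead of throwing keeps the fallback decision in one place, the caller, which already knows the original user input.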

Why AI? Because business names are messy:

“Joe’s Pizza Brooklyn” → name: “Joe’s Pizza”, location: “Brooklyn”
“Brooklyn Joe’s Pizza” → name: “Joe’s Pizza”, location: “Brooklyn”
“Joe’s Pizzeria of Brooklyn” → name: “Joe’s Pizzeria”, location: “Brooklyn”

AI understands context that regex can’t handle.

2. Normalization Pipeline

Once we have structured data, we normalize it:

function normalizeBusinessName(name: string): string {
  return name
    .toLowerCase()
    .replace(/['\u2019]/g, '') // Remove straight (') and curly (’) apostrophes
    .replace(/[^\w\s]/g, '') // Remove punctuation
    .trim() // Trim before hyphenating so no stray edge hyphens remain
    .replace(/\s+/g, '-') // Spaces to hyphens
    .replace(/^(the|a|an)-/, '') // Remove leading articles
    .replace(/-(llc|inc|corp|ltd)$/, ''); // Remove a trailing legal entity (grouped so $ applies to every option)
}

function normalizeLocation(location: string): string {
  return location
    .toLowerCase()
    .replace(/\b(street|st|avenue|ave|road|rd|boulevard|blvd)\b/g, '') // Remove street types
    .replace(/\b(apartment|apt|suite|ste|unit)\s*\d+/g, '') // Remove apt numbers
    .replace(/[^\w\s]/g, '') // Remove punctuation
    .trim() // Trim first, or "123 Main St, Apt 2" ends in a hyphen
    .replace(/\s+/g, '-');
}

function normalizePhone(phone: string): string {
  // Extract just the digits
  const digits = phone.replace(/\D/g, '');
  
  // US phone: keep last 10 digits
  if (digits.length >= 10) {
    return digits.slice(-10);
  }
  
  return digits;
}

Examples:

normalizeBusinessName("Joe's Pizza LLC") → "joes-pizza"
normalizeBusinessName("The Joe's Pizzeria") → "joes-pizzeria"
normalizeLocation("123 Main St, Apt 2") → "123-main"
normalizeLocation("123 Main Street") → "123-main"
normalizePhone("(555) 123-4567") → "5551234567"
normalizePhone("+1-555-123-4567") → "5551234567"
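One gap in these examples: if extraction returns "Brooklyn, NY" (as the prompt's "San Diego, CA" example suggests it might), `normalizeLocation` yields "brooklyn-ny", which will not match "brooklyn". Below is a hedged sketch of a city-level variant that drops a trailing two-letter US state code. The function name and the state list are assumptions, not part of our pipeline, and codes that double as words ("in", "or", "me") are only stripped from the last position.

```typescript
// Two-letter USPS codes for the 50 states plus DC, lowercased.
const US_STATE_CODES = new Set(
  ('al ak az ar ca co ct de dc fl ga hi id il in ia ks ky la me md ma mi mn ms mo ' +
    'mt ne nv nh nj nm ny nc nd oh ok or pa ri sc sd tn tx ut vt va wa wv wi wy').split(' ')
);

// Hypothetical city-level normalizer: "Brooklyn, NY" and "Brooklyn" both become "brooklyn".
function normalizeCityLocation(location: string): string {
  const tokens = location
    .toLowerCase()
    .replace(/[^\w\s]/g, ' ') // punctuation becomes whitespace
    .trim()
    .split(/\s+/)
    .filter((t) => t.length > 0); // "".split(...) yields [""], so drop empties

  // Drop a trailing state code, but never the only token ("ME" stays "me").
  if (tokens.length > 1 && US_STATE_CODES.has(tokens[tokens.length - 1])) {
    tokens.pop();
  }
  return tokens.join('-');
}
```

The trade-off: "Portland, OR" and "Portland, ME" now normalize to the same string, so this only makes sense when the fingerprint also carries something more specific, such as a street address or phone number.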

3. Fingerprint Generation

We combine the normalized components into a fingerprint: one primary key plus fallback secondary keys.

interface BusinessFingerprint {
  primaryKey: string;      // Most specific
  secondaryKeys: string[]; // Fallback matches
  metadata: {
    originalName: string;
    normalizedName: string;
    location?: string;
    phone?: string;
    type?: string;
  };
}

function generateFingerprint(businessInfo: ExtractedBusinessInfo): BusinessFingerprint {
  const normalizedName = normalizeBusinessName(businessInfo.businessName);
  // Missing fields become undefined (not null) to match the optional metadata fields
  const normalizedLocation = businessInfo.location
    ? normalizeLocation(businessInfo.location)
    : undefined;
  const normalizedPhone = businessInfo.phone
    ? normalizePhone(businessInfo.phone)
    : undefined;
  const type = businessInfo.type ?? undefined;
  
  // Primary key: name + location (most specific)
  const primaryKey = normalizedLocation
    ? `${normalizedName}-${normalizedLocation}`
    : normalizedName;
  
  // Secondary keys: alternative matches
  const secondaryKeys = [
    normalizedName, // Name only
    normalizedPhone ? `phone-${normalizedPhone}` : undefined, // Phone only
    type ? `${normalizedName}-${type}` : undefined // Name + type
  ].filter((key): key is string => key !== undefined); // Type guard keeps this a string[]
  
  return {
    primaryKey,
    secondaryKeys,
    metadata: {
      originalName: businessInfo.businessName,
      normalizedName,
      location: normalizedLocation,
      phone: normalizedPhone,
      type
    }
  };
}

Example fingerprints:

Input: "Joe's Pizza Brooklyn"
Output: {
  primaryKey: "joes-pizza-brooklyn",
  secondaryKeys: [
    "joes-pizza",
    "joes-pizza-restaurant"
  ],
  metadata: {
    originalName: "Joe's Pizza",
    normalizedName: "joes-pizza",
    location: "brooklyn",
    type: "restaurant"
  }
}

Input: "Joes Pizza - Brooklyn NY (555) 123-4567"
Output: {
  primaryKey: "joes-pizza-brooklyn",
  secondaryKeys: [
    "joes-pizza",
    "phone-5551234567",
    "joes-pizza-restaurant"
  ],
  metadata: {
    originalName: "Joes Pizza",
    normalizedName: "joes-pizza",
    location: "brooklyn",
    phone: "5551234567",
    type: "restaurant"
  }
}

✅ PRIMARY KEY MATCH: Same business!

4. Duplicate Detection

Before creating a new business, we check for duplicates:

async function checkForDuplicates(fingerprint: BusinessFingerprint): Promise<DuplicateResult> {
  // Check primary key first (exact match)
  const primaryMatch = await db.findBusinessByFingerprint(fingerprint.primaryKey);
  if (primaryMatch) {
    return {
      isDuplicate: true,
      confidence: 'high',
      matchedBusiness: primaryMatch,
      matchType: 'primary'
    };
  }
  
  // Check secondary keys (coarser exact-match keys, verified by similarity scoring)
  for (const secondaryKey of fingerprint.secondaryKeys) {
    const secondaryMatch = await db.findBusinessByFingerprint(secondaryKey);
    if (secondaryMatch) {
      // Verify it's actually the same business (not just similar name)
      const similarity = calculateSimilarity(fingerprint, secondaryMatch.fingerprint);
      
      if (similarity > 0.8) {
        return {
          isDuplicate: true,
          confidence: 'medium',
          matchedBusiness: secondaryMatch,
          matchType: 'secondary',
          similarity
        };
      }
    }
  }
  
  return {
    isDuplicate: false,
    confidence: 'none'
  };
}
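`db.findBusinessByFingerprint` isn't shown. One subtlety matters here: a name-only secondary key such as "joes-pizza" can belong to many businesses (a Brooklyn and a Queens location, say), so a lookup should return every candidate for scoring rather than a single record. Below is a hypothetical in-memory stand-in that indexes each business under its primary and secondary keys. The class and field names are illustrative only.

```typescript
// Minimal record shape for this sketch.
interface StoredBusiness {
  id: string;
  primaryKey: string;
  secondaryKeys: string[];
}

// Hypothetical in-memory stand-in for db.findBusinessByFingerprint.
class FingerprintIndex {
  private byKey = new Map<string, Set<string>>(); // key -> ids carrying that key
  private byId = new Map<string, StoredBusiness>();

  add(business: StoredBusiness): void {
    this.byId.set(business.id, business);
    // Index the primary key and every secondary key the same way
    for (const key of [business.primaryKey, ...business.secondaryKeys]) {
      let ids = this.byKey.get(key);
      if (!ids) {
        ids = new Set<string>();
        this.byKey.set(key, ids);
      }
      ids.add(business.id);
    }
  }

  // Every business carrying the key, in insertion order; the caller scores each one.
  find(key: string): StoredBusiness[] {
    const ids = this.byKey.get(key);
    if (!ids) return [];
    return Array.from(ids, (id) => this.byId.get(id)!);
  }
}
```

In a relational database, the same idea would be a key table with an index on the key column. The point is only that a secondary lookup can legitimately return several rows.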

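function calculateSimilarity(fp1: BusinessFingerprint, fp2: BusinessFingerprint): number {
  let score = 0;
  let totalWeight = 0; // Sum of the weights of the checks that actually ran
  
  // Name similarity (most important)
  totalWeight += 0.5;
  if (fp1.metadata.normalizedName === fp2.metadata.normalizedName) {
    score += 0.5;
  }
  
  // Location similarity (only when both records have one)
  if (fp1.metadata.location && fp2.metadata.location) {
    totalWeight += 0.3;
    if (fp1.metadata.location === fp2.metadata.location) {
      score += 0.3;
    }
  }
  
  // Phone similarity (only when both records have one)
  if (fp1.metadata.phone && fp2.metadata.phone) {
    totalWeight += 0.2;
    if (fp1.metadata.phone === fp2.metadata.phone) {
      score += 0.2;
    }
  }
  
  // Normalize by total weight so a full match on the available fields scores 1.0.
  // (Dividing by the number of checks would cap the score at 0.5, below the 0.8 threshold.)
  return score / totalWeight;
}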
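The weights read as fractions of a perfect match, which is why the score should be divided by the total weight of the checks that ran rather than by how many there were. Written out, with F the set of fields both records have (name always, location and phone only when both sides have them):

```latex
\mathrm{sim}(a,b) = \frac{\sum_{f \in F} w_f \, \mathbf{1}[a_f = b_f]}{\sum_{f \in F} w_f},
\qquad w_{\mathrm{name}} = 0.5,\; w_{\mathrm{loc}} = 0.3,\; w_{\mathrm{phone}} = 0.2

\begin{aligned}
&\text{name and location match, no phones:} && \tfrac{0.5 + 0.3}{0.5 + 0.3} = 1.0 \\
&\text{name matches, location differs:} && \tfrac{0.5}{0.5 + 0.3} = 0.625 \\
&\text{name and location match, phone differs:} && \tfrac{0.5 + 0.3}{0.5 + 0.3 + 0.2} = 0.8
\end{aligned}
```

Against the `similarity > 0.8` threshold, only the first case counts as a duplicate. Dividing by the number of checks instead gives (0.5 + 0.3) / 2 = 0.4 for that same case and never more than 0.5 overall (a name-only match scores 0.5 / 1), so the threshold could never be reached.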

5. User Confirmation Flow

When we detect a duplicate, we ask the user:

async function handleBusinessRegistration(userInput: string) {
  // Extract and normalize
  const businessInfo = await extractBusinessInfo(userInput);
  const fingerprint = generateFingerprint(businessInfo);
  
  // Check for duplicates
  const duplicateCheck = await checkForDuplicates(fingerprint);
  
  if (duplicateCheck.isDuplicate) {
    // Show confirmation dialog; resolves true only if the user picks "Create New Business"
    const wantsNewBusiness = await showDuplicateDialog({
      originalInput: userInput,
      matchedBusiness: duplicateCheck.matchedBusiness,
      confidence: duplicateCheck.confidence
    });
    
    if (!wantsNewBusiness) {
      // User confirmed it's the same business: redirect to the existing one
      return {
        action: 'redirect',
        businessId: duplicateCheck.matchedBusiness.id
      };
    }
    
    // User says it's NOT a duplicate, so create a new business
    // (but flag for manual review if our confidence was high)
    if (duplicateCheck.confidence === 'high') {
      await flagForManualReview(fingerprint, duplicateCheck);
    }
  }
  
  // Create new business
  const business = await createBusiness(businessInfo, fingerprint);
  return {
    action: 'created',
    businessId: business.id
  };
}
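The branching above mixes UI, database calls, and policy. As a hedged sketch, the policy alone can be pulled into a pure function so each path can be unit-tested without a dialog or a database. `decideRegistration` and its return labels are hypothetical names; the confidence values mirror the ones used in `checkForDuplicates`.

```typescript
type Confidence = 'high' | 'medium' | 'none';
type Decision = 'redirect' | 'create' | 'create-and-flag';

// wantsNewBusiness is the user's dialog answer (true = "No, Create New Business");
// it is ignored when no duplicate was detected.
function decideRegistration(
  isDuplicate: boolean,
  confidence: Confidence,
  wantsNewBusiness: boolean
): Decision {
  if (!isDuplicate) return 'create';
  if (!wantsNewBusiness) return 'redirect';
  // The user overrode a match; only high-confidence overrides go to manual review
  return confidence === 'high' ? 'create-and-flag' : 'create';
}
```

The async wrapper then just performs whatever side effects the returned decision names.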

Duplicate dialog UI:

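function showDuplicateDialog(data: DuplicateData): Promise<boolean> {
  return new Promise((resolve) => {
    const dialog = document.createElement('div');
    // Static markup only: user-supplied values are filled in via textContent
    // below, so a business name containing HTML can't inject markup (XSS)
    dialog.innerHTML = `
      <div class="duplicate-dialog">
        <h3>We found a similar business</h3>
        <p>You entered: <strong data-field="input"></strong></p>
        <p>We found: <strong data-field="match"></strong></p>
        <p>Created: <span data-field="created"></span></p>
        
        <div class="actions">
          <button class="btn-primary" id="use-existing">
            Use Existing Business
          </button>
          <button class="btn-secondary" id="create-new">
            No, Create New Business
          </button>
        </div>
      </div>
    `;
    
    // Look up an element we just rendered (non-null by construction)
    const el = (selector: string) => dialog.querySelector(selector) as HTMLElement;
    
    el('[data-field="input"]').textContent = data.originalInput;
    el('[data-field="match"]').textContent = data.matchedBusiness.name;
    el('[data-field="created"]').textContent = formatDate(data.matchedBusiness.createdAt);
    
    document.body.appendChild(dialog);
    
    el('#use-existing').addEventListener('click', () => {
      resolve(false); // It's a duplicate: use the existing business
      dialog.remove();
    });
    
    el('#create-new').addEventListener('click', () => {
      resolve(true); // Not a duplicate: create a new business
      dialog.remove();
    });
  });
}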

The Challenges We Solved

Challenge 1: False Positives

Problem: “Joe’s Pizza Brooklyn” and “Joe’s Burgers Brooklyn” matched as duplicates

Solution: Multi-factor scoring with type checking

function calculateSimilarity(fp1, fp2) {
  // ... previous code: name, location, and phone checks,
  // with `score` normalized to a 0-1 similarity ...
  
  // Type check (critical for restaurants)
  if (fp1.metadata.type && fp2.metadata.type) {
    if (fp1.metadata.type !== fp2.metadata.type) {
      score *= 0.5; // Heavily penalize type mismatch
    }
  }
  
  return score;
}

Challenge 2: Franchise Locations

Problem: “McDonald’s Brooklyn” and “McDonald’s Manhattan” are different locations, not duplicates

Solution: Location-aware fingerprinting

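// For franchise businesses, location is REQUIRED in primary key
const isFranchise = FRANCHISE_NAMES.includes(normalizedName);

if (isFranchise && !normalizedLocation) {
  // Without a location, every branch would share one name-only key
  throw new Error('A location is required for franchise businesses');
}

const primaryKey = normalizedLocation
  ? `${normalizedName}-${normalizedLocation}`
  : normalizedName;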
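`FRANCHISE_NAMES.includes(normalizedName)` only works if the list stores names in normalized form: a list containing "McDonald's" will never equal "mcdonalds". Here is a sketch assuming a small, hypothetical list that is normalized once at startup and kept in a `Set`. `franchiseAwarePrimaryKey` returns `null` when a franchise arrives without a location, leaving the caller to ask the user for one. The location argument is assumed to be normalized already.

```typescript
// Hypothetical display names; a real list would be maintained separately.
const FRANCHISE_DISPLAY_NAMES = ["McDonald's", 'Subway', 'Starbucks'];

// Same steps as normalizeBusinessName, minus legal-entity stripping.
function normalizeName(name: string): string {
  return name
    .toLowerCase()
    .replace(/['\u2019]/g, '')
    .replace(/[^\w\s]/g, '')
    .trim()
    .replace(/\s+/g, '-');
}

// Normalize once so lookups compare like with like; Set lookups are O(1).
const FRANCHISE_NAMES = new Set(FRANCHISE_DISPLAY_NAMES.map(normalizeName));

// null means "franchise without a location": the caller must ask for one.
function franchiseAwarePrimaryKey(name: string, normalizedLocation: string | null): string | null {
  const normalizedName = normalizeName(name);
  if (FRANCHISE_NAMES.has(normalizedName) && !normalizedLocation) return null;
  return normalizedLocation ? `${normalizedName}-${normalizedLocation}` : normalizedName;
}
```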

Challenge 3: AI Hallucinations

Problem: Claude sometimes extracts incorrect business types

Solution: Confidence scoring + fallback to user input

const businessInfo = await extractBusinessInfo(userInput);

// Validate AI extraction
let usedFallback = false;
if (!businessInfo.businessName || businessInfo.businessName.length < 2) {
  // AI failed, fall back to user input
  businessInfo.businessName = userInput;
  usedFallback = true;
}

// Store both AI-extracted and original input
await db.createBusiness({
  ...businessInfo,
  originalInput: userInput,
  aiExtracted: !usedFallback // false when the name came from the raw input
});
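The validation above guards the name, but the stated problem was incorrect business types, and those feed the name + type secondary key. One hedged option is to accept the model's type only if it appears in a known list and store null otherwise. The list here is illustrative (drawn from business types mentioned in this article), and `sanitizeExtraction` is a hypothetical helper.

```typescript
// Illustrative allowlist; a real taxonomy would be maintained separately.
const KNOWN_TYPES = new Set(['restaurant', 'bakery', 'plumber', 'salon', 'contractor', 'law firm']);

interface Extraction {
  businessName: string;
  type: string | null;
}

interface SanitizedExtraction extends Extraction {
  aiExtracted: boolean; // false when the name fell back to the raw input
}

function sanitizeExtraction(raw: Extraction, userInput: string): SanitizedExtraction {
  const name = typeof raw.businessName === 'string' ? raw.businessName.trim() : '';
  const nameOk = name.length >= 2;

  // Unknown or missing types become null, so a wrong guess never produces
  // a misleading name + type fingerprint key.
  const type = typeof raw.type === 'string' ? raw.type.trim().toLowerCase() : '';

  return {
    businessName: nameOk ? name : userInput,
    type: KNOWN_TYPES.has(type) ? type : null,
    aiExtracted: nameOk,
  };
}
```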

The Results: 95% Accuracy

Before (no deduplication):

30% of businesses had duplicates
1,000 businesses → 1,300 database records
$650 wasted on duplicate AI generations

After (fingerprinting system):

5% false negative rate (missed duplicates)
2% false positive rate (flagged non-duplicates)
93% of duplicates caught before generation
$600 saved per month in AI costs

User feedback:

“Oh wow, I already created this last week! Thanks for catching that.” - Bakery owner

“I thought I lost my website. Turns out I just typed the name slightly differently.” - Contractor

Why This Matters for AI Applications

Most AI applications assume clean input. We learned:

Bad: Trust user input → create duplicates → clean up later
Good: Normalize input → detect duplicates → confirm with user

The startup lesson: AI is great at understanding messy input, but you still need deterministic logic for matching. Use AI to extract structure, use code to match patterns.

Key Insights

AI for extraction, code for matching: Claude extracts business info, code generates fingerprints
Multi-factor scoring: Name + location + phone + type = high confidence
User confirmation: When in doubt, ask the user
Graceful degradation: If AI fails, fall back to user input

What’s Next

We’re exploring:

Fuzzy matching: Levenshtein distance for typo detection
Address normalization: Use Google Maps API to standardize addresses
Phone number lookup: Verify business phone numbers with Twilio
Historical data: Learn from user corrections to improve AI extraction

But the core insight remains: Fingerprints > exact matches.

Try it yourself: Enter “Joe’s Pizza Brooklyn” on WebZum, then try “Joes Pizza - Brooklyn NY”. Watch the duplicate detection catch it.

Building a deduplication system? Key takeaway: AI + normalization + fingerprinting = robust duplicate detection. Don’t rely on exact matches—businesses are messy.

The future of data quality isn’t perfect input—it’s intelligent normalization.
