An AI receptionist is a 1- to 3-year decision. Pick the wrong one and you spend six months unwinding bookings that never made it into your calendar, fighting overage clauses that doubled your bill, and migrating to a competitor while your current vendor's "retention specialist" pretends not to understand the cancellation request.

This is the 12-point checklist that separates the vendors that earn your business from the vendors that survive on inertia. Run every shortlisted vendor through this before signing. For the broader market context, read our 2026 buyer guide.

Point 1: Call the Vendor's Demo Line

Every serious AI receptionist vendor publishes a phone number that connects you directly to their AI in production. If a vendor only offers a recorded marketing pitch instead of a live demo line, that tells you the AI is not ready for your scrutiny.

What to test on the demo:

  • Ask a basic question. "What services do you offer?" The answer should be specific and accurate.
  • Ask a pricing question. "How much for X?" The AI should answer or transfer to a human.
  • Try to break it. Ask something off-script. "What's your favorite color?" or "Can you tell me a joke?" Watch how it recovers.
  • Try a different language if your patient base is bilingual. Switch to Spanish mid-sentence.

If the AI sounds robotic, takes more than 1.5 seconds to respond, or fails to handle ambiguity gracefully, the vendor is not ready. Move on.

Point 2: Ask for a Real Customer Call Recording

Marketing demos are scripted. Real customer calls are not. Ask the vendor to share a sanitized recording of an actual customer interaction (with PII redacted). Listen for:

  • Conversation flow that handles real-world ambiguity
  • Voice quality consistency across the entire call
  • How the AI handles a customer changing their mind mid-conversation

If the vendor refuses or only offers polished demos, you do not yet know how the product behaves under real conditions.

Point 3: Confirm Booking Integration in Writing

"We integrate with Boulevard" can mean three different things:

  1. Direct API integration. The AI writes appointments to Boulevard in real time during the call. This is what you want.
  2. Zapier middleware. The AI sends data to Zapier, which writes to Boulevard. This breaks two to three times per month and adds 15- to 60-second delays.
  3. Manual handoff. The AI sends an email to your front desk, who manually enters the appointment. This is not integration. It is voicemail with extra steps.

Ask which one. Get it in writing. The difference between these three options is the difference between a booking system that works and one that quietly fails 15 percent of the time.

Point 4: Read the Overage Clause Out Loud

Most surprise bills come from overage clauses written in language that obscures the trigger. Read your contract's overage clause aloud. If you cannot say it without confusion, your future bill will reflect that confusion.

Watch for triggers like:

  • Per-minute overage above an included bucket
  • Per-call overage above an included call count
  • Per-unique-caller overage (Goodcall does this)
  • "Premium minutes" that bill at a higher rate during certain hours
  • SMS overage on top of voice overage

If the vendor markets a flat rate, confirm what "unlimited" actually covers. We dig into this in our cost comparison post.
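
To see how a per-minute overage clause plays out, it can help to run the numbers before signing. The sketch below assumes the simplest trigger from the list above (a flat per-minute rate above an included bucket). The function name `monthly_bill` and every figure in it are hypothetical, not any vendor's actual pricing.

```python
def monthly_bill(base_fee, included_minutes, used_minutes, overage_rate):
    """Return the total monthly charge for a plan with per-minute overage."""
    # Only minutes above the included bucket are billed at the overage rate.
    overage_minutes = max(0, used_minutes - included_minutes)
    return base_fee + overage_minutes * overage_rate


# Hypothetical plan: $199/month, 500 included minutes, $0.40 per extra minute.
quiet_month = monthly_bill(199.0, 500, 450, 0.40)
busy_month = monthly_bill(199.0, 500, 1000, 0.40)

print(f"Quiet month (450 min):  ${quiet_month:.2f}")
print(f"Busy month (1000 min):  ${busy_month:.2f}")
```

With these made-up numbers, doubling the minutes roughly doubles the bill. Plug in the rates from your own contract, and add separate terms for SMS or "premium minutes" if the clause includes them.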

Point 5: Verify HIPAA / BAA at Your Tier

Some vendors only sign BAAs (Business Associate Agreements) on Enterprise plans that cost 3x what their published mid-tier costs. If you handle protected health information, the BAA is non-negotiable. Confirm in writing:

  • BAA available at the tier you intend to buy
  • BAA signed before any PHI flows through the system
  • BAA covers all subprocessors (the LLM, the speech-to-text engine, the cloud provider)

If the vendor cannot answer the subprocessor question, they have not done their compliance homework.

Point 6: Test the Cancellation Flow

Before signing, ask the vendor exactly how you cancel. The right answer is "log into the dashboard and click cancel" or "email support@ and we will process within 7 days." The wrong answer is "speak to your account manager" or "submit a written request to retention."

If the cancellation flow involves friction, the vendor is buying inertia, not earning loyalty. That tells you everything about how they will treat you when you have a problem.

Point 7: Pin Down Implementation Timeline

Real AI receptionist implementations land in 7 to 14 days. The breakdown:

  • Days 1 to 2: Discovery and intake
  • Days 3 to 5: Calendar and CRM integrations
  • Days 6 to 8: Voice tuning
  • Days 9 to 11: QA testing
  • Days 12 to 14: Soft launch with split traffic

If a vendor quotes 30+ days, ask why. The answer is usually that their implementation is custom-coded per customer, which means it will be expensive to maintain and slow to update. If a vendor quotes 24 hours, ask what they are skipping. The answer is usually QA testing.

Point 8: Get a Reference at Your Call Volume

Vendor case studies are filtered. References give you the unfiltered version. Ask for a customer in your industry, at your call volume, who has been live for at least 6 months. Then call them and ask:

  • How does the actual bill compare with what they were quoted?
  • How many bookings does the AI miss or get wrong per month?
  • How responsive is the vendor when something breaks?
  • Would they pick the vendor again, knowing what they know now?

If a vendor cannot produce a reference at your volume in your industry, they probably do not have one. That tells you something.

Point 9: Understand the Failure Mode

Every AI fails sometimes. The vendor's failure handling tells you everything:

  • Best case: The AI escalates to a human, the human picks up within 30 seconds, the call resumes seamlessly.
  • Acceptable: The AI escalates to voicemail with a transcript and the front desk calls back within 15 minutes.
  • Unacceptable: The call drops, the customer hears silence, or the AI loops without escalation.

Test the failure mode on the demo line. Ask something complex enough to trigger escalation.

Point 10: Check the Uptime SLA

"99.9% uptime" sounds great until you do the math. That is 8.76 hours of downtime per year, or roughly 44 minutes per month. For a phone system, every minute of downtime is missed calls and lost revenue.

What to confirm:

  • The published SLA percentage (99.9%, 99.95%, 99.99%)
  • What the SLA covers (the AI agent, the voice routing, the booking integration, all of the above)
  • The credits or remedies if the SLA is breached
  • Whether the vendor publishes a status page with historical incident data

A vendor without a public status page is hiding their reliability record.
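
The downtime math behind an SLA percentage is easy to check yourself. This is a minimal sketch, assuming a 365-day year and twelve equal months; the function name `allowed_downtime` is ours for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def allowed_downtime(sla_percent):
    """Return (hours per year, minutes per month) of downtime an SLA permits."""
    down_fraction = 1 - sla_percent / 100
    hours_per_year = HOURS_PER_YEAR * down_fraction
    minutes_per_month = hours_per_year * 60 / 12
    return hours_per_year, minutes_per_month


# The three SLA levels listed in the checklist above.
for sla in (99.9, 99.95, 99.99):
    hours, minutes = allowed_downtime(sla)
    print(f"{sla}% uptime -> {hours:.2f} h/year, about {minutes:.0f} min/month")
```

For 99.9%, this reproduces the 8.76 hours per year and roughly 44 minutes per month quoted above. Each step up in the SLA shrinks the permitted downtime accordingly.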

Point 11: Look at Contract Length

Month-to-month is the 2026 standard. If a vendor still requires a 12-month commitment, the product is probably not strong enough to keep you voluntarily. Exceptions:

  • Custom enterprise deals with significant implementation costs
  • Heavily discounted annual prepays where the discount is genuine

For a typical mid-tier purchase, demand month-to-month. If the vendor refuses, walk.

Point 12: Run a 30-Day Pilot

Before going all-in, run a 30-day pilot. Recommendations:

  • Route 50 percent of inbound traffic to the AI, 50 percent to your existing setup
  • Track booking rate, customer feedback, escalation rate, and revenue captured
  • Review every call recording from the AI for the first 7 days
  • Compare the numbers honestly at day 30

If the pilot shows the AI matches or exceeds your existing setup on the metrics that matter, go all-in. If it does not, you have hard data to bring back to the vendor or move on without commitment.
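
One way to keep the day-30 comparison honest is to compute the same rates for both halves of the split. The sketch below is an illustration only: the `PilotArm` class and all of the numbers are made up, and customer feedback is left out because it is usually qualitative.

```python
from dataclasses import dataclass


@dataclass
class PilotArm:
    """Totals for one side of the 50/50 traffic split over the pilot."""
    name: str
    calls: int
    bookings: int
    escalations: int
    revenue: float

    @property
    def booking_rate(self):
        return self.bookings / self.calls if self.calls else 0.0

    @property
    def escalation_rate(self):
        return self.escalations / self.calls if self.calls else 0.0

    @property
    def revenue_per_call(self):
        return self.revenue / self.calls if self.calls else 0.0


# Hypothetical pilot totals; replace with your own call logs.
ai = PilotArm("AI receptionist", calls=400, bookings=120, escalations=40, revenue=30000.0)
existing = PilotArm("Existing setup", calls=400, bookings=100, escalations=0, revenue=26000.0)

for arm in (ai, existing):
    print(
        f"{arm.name}: booking rate {arm.booking_rate:.1%}, "
        f"escalation rate {arm.escalation_rate:.1%}, "
        f"revenue per call ${arm.revenue_per_call:.2f}"
    )
```

Comparing rates rather than raw totals matters when the split is not exactly even, since one side may simply have received more calls.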

How to Use This Checklist

Print or copy these 12 points. For each shortlisted vendor (typically 3 to 5), assign a score from 1 to 10 on each point, then add up the scores. The highest-scoring vendor is your top candidate.
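
If you prefer a spreadsheet-style tally, the scoring step can be sketched in a few lines. The vendor names and scores here are placeholders, and `total_score` is a hypothetical helper that checks each vendor has 12 scores between 1 and 10.

```python
CHECKLIST_POINTS = 12


def total_score(scores):
    """Sum one vendor's scores after checking there are 12, each from 1 to 10."""
    if len(scores) != CHECKLIST_POINTS:
        raise ValueError(f"expected {CHECKLIST_POINTS} scores, got {len(scores)}")
    if any(not 1 <= s <= 10 for s in scores):
        raise ValueError("each score must be between 1 and 10")
    return sum(scores)


# Placeholder scores, one per checklist point, in order (point 1 to point 12).
vendor_scores = {
    "Vendor A": [8, 7, 9, 6, 10, 8, 7, 6, 8, 9, 10, 7],
    "Vendor B": [9, 6, 5, 7, 4, 6, 8, 7, 6, 7, 5, 8],
    "Vendor C": [7, 8, 8, 8, 9, 7, 6, 8, 7, 8, 9, 8],
}

# Rank vendors from highest total to lowest.
ranked = sorted(vendor_scores, key=lambda v: total_score(vendor_scores[v]), reverse=True)

for vendor in ranked:
    print(f"{vendor}: {total_score(vendor_scores[vendor])} / {CHECKLIST_POINTS * 10}")
```

A plain sum treats all 12 points equally. If one point is non-negotiable for you (the BAA in point 5, for example), check that score on its own before trusting the total.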

For most med spas, the right call is to evaluate AutoMeit, Goodcall, Rosie, and one live service for baseline comparison. Book a demo with us to see how AutoMeit scores on your specific evaluation criteria. Or call our live AI at +1-470-706-9896 and run point 1 right now.

FAQ

How long does AI receptionist evaluation typically take? A serious evaluation takes 3 to 4 weeks: 1 week of demo testing and reference calls, 1 to 2 weeks of contract review and feature verification, and a final week of pricing negotiation. Compressing it into less than 2 weeks usually means missed details that surface as problems later.

What is the most common AI receptionist evaluation mistake? Skipping the live demo line test. Buyers rely on marketing videos and sales-led demos, both of which are scripted. Calling the live demo line and trying to break the AI surfaces problems that no scripted demo will reveal.

Should I get a custom demo before signing? Yes. After the live demo line test, ask the vendor to configure a custom demo using your service menu, hours, and FAQs. Test the custom demo with your team and 2 to 3 trusted patients before going live. This catches knowledge base errors and tone mismatches.

What contract terms should I negotiate? Month-to-month billing, no auto-renewal traps, BAA inclusion at your tier, transparent overage rates with caps, and a 30-day exit window if the AI fails QA in production. These terms are reasonable in 2026, and a vendor that refuses them is a red flag.

How do I know if an AI receptionist is med-spa specific? Ask the vendor for a list of med spa customers (names, ideally), specific integrations they have built for med spa platforms (Boulevard, Mangomint, Mindbody, Zenoti, AestheticsPro), and how their knowledge base handles med spa-specific FAQs (PRP pricing, Botox follow-ups, consultation vs treatment booking flows). Generic AI receptionists will not have these answers.