What to ask before signing with any AI partner

The market for AI help is full of firms that look identical from the outside. Same website language, same case-study format, same retainer pricing, same promises. The real differences show up in the answers to specific questions — not in the marketing.

This is the question set we’d want an operator to use on us. We’ve been asked all of these. Some we answer well, some have made us sharper, and none of them should leave a serious partner stuck. Walk through these 14 with any firm you’re considering. If they get vague on more than two or three, walk.

Track 1: Have they actually done this work?

1. How many mid-market clients have you delivered AI work to in the last 18 months?

You’re looking for real, in-production work — not pilots, not workshops, not “we ran a session for them.” A firm with three clients can still be excellent, but it’s a riskier bet because the patterns haven’t stabilized. A firm claiming “hundreds” is usually counting things that aren’t really engagements. The honest number for most reputable mid-market AI partners sits somewhere in the double digits over 18 months.

2. Show me a working solution you’ve shipped — not a deck.

This is the most diagnostic question on the list. If the answer is “we’d have to get permission” or “we can describe one but can’t show one,” they probably don’t have many to show. A firm that has shipped real work will have a short walkthrough of a workflow running in production (anonymized), a specific KPI improvement it can defend, and an honest account of what went wrong during the build and what it learned. If you can’t see the work, you can’t trust the work.

3. What’s the longest-running client relationship you have, and why has it lasted?

Longevity tells you something. If the longest client is 14 months, the firm is either young or it churns. If it’s five-plus years, ask why. Sometimes a long relationship means the partner has gone native and stopped pushing; sometimes it means the partner has kept adding value across multiple cycles of the client’s business. Listen for how they describe it — operationally (“we ship X for them every quarter”) or relationally (“we’ve been there through three CEO transitions”). Both are signals. Vague is not.

Track 2: Who’s actually doing the work?

4. Who’s on my delivery team — names, backgrounds, where based?

A reputable partner can tell you, within days of an engagement starting, exactly who will work on your account. By name. With profiles. Their seniority should match what you’re paying for. Watch for the bait-and-switch (senior on the pitch, junior on the delivery), offshore teams when you expected onshore, “pods” without named individuals, and any inability to commit to who’s actually on your account. The right answer is a small, named team — typically two or three people at the AI Office tier, more for a Value Sprint or a multi-quarter program.

5. What’s the seniority of the person who’ll be in the weekly meeting with my team?

The seniority of the person actually in the room with you matters more than the org chart. Most mid-market work needs someone with real operating experience — eight to twelve years — because the hard calls aren’t technical. Which workflow to prioritize, which one to kill, when to push back on the CEO: those are judgment calls. A junior operator can be excellent at execution and still struggle with them.

6. What kind of work do you turn down?

A firm that takes everything is signaling either desperation or undifferentiated capability. A firm with clear “no” categories has thought about who it’s actually good for. Listen for specifics: “We turn down anyone who won’t commit a sponsor.” “We turn down regulated work where we don’t have specific experience.” “We turn down what’s really data engineering dressed up as AI.” If they can’t name a category they decline, they’re not being honest with you about fit.

Track 3: How do they actually operate?

7. What does a typical first 90 days look like?

“We’ll align on objectives and develop a roadmap” is the answer of a firm that has never delivered. The right answer is operational and specific: a kickoff in week one (in person if possible), a roadmap inside the first two weeks, a first build started by week three, a documented ROI baseline by day 30, and something in production around day 45 to 60. If the answer involves no production work in the first 90 days, the engagement model is wrong for the mid-market.

8. How do you measure ROI?

The honest answer involves baselines, specific business KPIs — not technology metrics — and a methodology that survives a skeptical CFO. Watch for “time saved” that never converts to dollars, “user satisfaction” as a primary metric, “adoption rate” as the headline number, or “hours of AI-generated output” offered as proof of impact. Those are all real measures, but none of them answers “did this engagement pay for itself.” The right answer lets you tell your board, in one sentence, what you got.

This is also where you press on commitments. Ask the firm to put accountability where it belongs. A fixed-scope build (a Value Sprint or a multi-quarter program) is where a 12-month KPI guarantee belongs. A monthly retainer like an AI Office should be held to a different standard: it should target at least 3x payback, so that it effectively pays for itself. On our own retainers we still stand behind the KPIs we set, and we keep working until they move.
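To make the payback test concrete, here is a minimal sketch of the arithmetic, assuming you have already agreed a baseline with the firm. The function names, the `realized_share` discount, and every number below are illustrative assumptions, not figures from a real engagement. The point is that "time saved" only counts once it is converted to dollars and compared against fees.

```python
def hours_to_dollars(hours_saved: float, loaded_hourly_cost: float,
                     realized_share: float) -> float:
    """Convert time saved into dollars.

    realized_share (0 to 1) is an assumed discount: the fraction of saved
    hours that was actually redeployed or removed from cost.
    """
    if not 0.0 <= realized_share <= 1.0:
        raise ValueError("realized_share must be between 0 and 1")
    return hours_saved * loaded_hourly_cost * realized_share


def payback_multiple(benefit_dollars: float, fees_dollars: float) -> float:
    """Dollars of measured benefit per dollar of fees paid."""
    if fees_dollars <= 0:
        raise ValueError("fees_dollars must be positive")
    return benefit_dollars / fees_dollars


# Hypothetical inputs, for illustration only.
benefit = hours_to_dollars(hours_saved=4000, loaded_hourly_cost=60.0,
                           realized_share=0.5)
fees = 12 * 3000.0  # twelve months of an assumed monthly retainer
multiple = payback_multiple(benefit, fees)
meets_target = multiple >= 3.0  # the 3x payback bar described above

print(f"payback multiple: {multiple:.2f}x, meets 3x target: {meets_target}")
```

A skeptical CFO will push hardest on the `realized_share` assumption, so that is the number to agree on in writing before the engagement starts.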

9. What happens when the AI gets something wrong in production?

This is the question most vendors haven’t thought hard about. The right answer involves human-in-the-loop checkpoints where consequences matter, documented escalation paths, monitoring that flags errors to the build team, a process for capturing lessons and updating the system, and a clear line on who owns what when something goes sideways. If they say “we test thoroughly so this shouldn’t happen,” they’re not ready for production. AI gets things wrong. The question is how they’ve prepared for it.
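One way to picture a human-in-the-loop checkpoint is as a routing rule that sits between the AI and whatever acts on its output. The sketch below is a hypothetical illustration, not a description of any particular vendor's system: the `Output` fields, the 0.9 threshold, and the route names are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Output:
    task: str
    confidence: float  # assumed score in [0, 1] attached to the AI output
    high_stakes: bool  # True when a mistake would be costly to the business


def route(output: Output, min_confidence: float = 0.9) -> str:
    """Send high-stakes or low-confidence outputs to a person.

    Everything else is approved automatically. The threshold is an
    assumed starting point that a team would tune from logged errors.
    """
    if output.high_stakes or output.confidence < min_confidence:
        return "human_review"
    return "auto_approve"


# Hypothetical examples.
print(route(Output("issue refund", 0.99, high_stakes=True)))
print(route(Output("tag support ticket", 0.95, high_stakes=False)))
print(route(Output("tag support ticket", 0.50, high_stakes=False)))
```

A firm that is ready for production should be able to tell you where its equivalent of this rule lives, who receives the reviewed items, and how the threshold gets updated after an error.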

Track 4: How do they think about the relationship?

10. What’s your cancellation clause?

A confident partner offers a low cancellation barrier — month-to-month after a short initial commitment, no long lock-ins. The logic is simple: if the work is valuable, you’ll stay; if it isn’t, both sides should move on. Be cautious of 12-month minimum commitments, cancellation fees, “mutual termination” clauses riddled with carve-outs, or anything that makes leaving expensive. How easy a firm makes it to leave is a direct read on its confidence in its own work.

11. How do you transfer knowledge to my team?

The right answer is “we measure ourselves on your independence, not your dependence” — followed by how. Documentation, working sessions with your people in the room, runbooks, training, and a clear point where you can operate without them. The wrong answer is some version of “you’ll always need us for X.” That’s a partner who has built a moat by hoarding knowledge, and you’ll resent it by month 18.

12. If we wanted to take a workflow you built and run it without you, could we?

The honest answer is yes, with caveats. The system should be documented well enough that another competent team could operate it, and you should own the code, the prompts, and the configurations. The firm’s value should live in continued partnership, not in the lock-in of what it has already built. If the answer is “no, our systems require our ongoing involvement,” that’s a business model that depends on your dependence.

Track 5: How do they handle the hardest cases?

13. Tell me about a client where the engagement didn’t work — what happened, and what did you learn?

If they can’t name one, they’re either very young or not being straight with you. Every serious AI partner has had engagements that didn’t land. The mature ones can describe specifically what went wrong — usually the wrong sponsor, the wrong problem, or the wrong time — and what they changed in their process because of it. The texture of the answer matters more than the answer itself. A firm that tells you a specific story is a firm you can work with.

14. What’s the question I haven’t asked that I should be asking?

This is the best closer on the list. A firm that has been through enough engagements will have a specific answer — something about sponsor risk, organizational readiness, or the trap most mid-market companies fall into. If they don’t have one, they haven’t thought hard enough about why engagements succeed and fail. That’s a yellow flag.
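If you are comparing several firms, the rule of thumb from the introduction (walk if a firm gets vague on more than two or three of the 14) is easy to keep as a simple tally. This is a minimal sketch; the function name and the default cutoff of three are assumptions, and you should set the cutoff to whatever you decided before the calls.

```python
def should_walk(vague_questions: set[int], max_vague: int = 3,
                total_questions: int = 14) -> bool:
    """Return True when a firm was vague on more questions than you tolerate.

    vague_questions holds the question numbers (1 to 14) where the answer
    was vague. max_vague is an assumed cutoff.
    """
    invalid = {q for q in vague_questions if not 1 <= q <= total_questions}
    if invalid:
        raise ValueError(f"unknown question numbers: {sorted(invalid)}")
    return len(vague_questions) > max_vague


# Hypothetical notes from two calls.
print(should_walk({2, 8}))         # vague on two questions
print(should_walk({2, 8, 9, 12}))  # vague on four questions
```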

How we’d score ourselves on these 14

Honestly: we’d score well on most, fine on a couple, and we’re still working on one. We’d encourage you to push us on any of them in a 30-minute call. The conversation will tell you more than any website can.
