Elementary Interactive
Blog

Why don't we send a thousand cold emails?

June 5, 2026

And what we do instead.

There's a certain type of B2B growth advice we've never been able to take seriously. It goes like this: send more. Automate the sequence. A/B test the subject line. Trust the funnel. "A thousand emails equals two clients" — as if the only variable that matters is volume, and relevance is just a rounding error.

We make our living building enterprise systems: Laravel backends, Filament 4 admin layers, multi-tenant platforms that have to survive real production load and deliver value our partners can measure and monetize with their own clients. So when we started thinking seriously about how to find new clients, we approached it the way we'd approach a system design problem: what's the actual constraint, what's the failure mode, how do we work within that constraint, and how do we make all of it measurable so we decide on data instead of received wisdom?

This post is about the conclusion we reached, and the two things we're now doing instead of mass outreach: a deliberate method for who we reach, and a measurement setup that shows us whether any of it is working.

The volume model optimizes the wrong variable

Mass cold outreach treats the number of replies as a function of volume: assume a fixed response rate, send enough, and the law of large numbers does the rest. The arithmetic isn't wrong; it's optimizing the wrong thing.

The companies we want to work with don't choose a technical partner because a subject line landed. A serious enterprise — the kind that runs mission-critical systems, with a procurement process and an architecture review — selects a partner because someone visibly understood their problem before making contact. That understanding is exactly what doesn't survive scale. The moment you turn it into a template, parameterize it, and send it through a list, you strip out the one property that made it worth sending.

So volume and relevance aren't independent dials. Past a certain point they're in direct tension: the mechanism that makes scaling possible is the same mechanism that destroys the signal. A thousand near-identical emails aren't a thousand chances. They're the same low-probability bet placed a thousand times, while you quietly spend down something that never shows up in the open rate: your credibility. Every poorly targeted message makes the next one a little less likely to be taken seriously.
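The volume argument reduces to one line of arithmetic. If each message converts with probability p, the expected number of clients from n messages is n·p; the slogan "a thousand emails equals two clients" implies p = 0.002 and assumes p stays fixed as n grows. Our claim is that p itself depends on n. Written as a sketch (p(n) is illustrative, not a curve we have measured):

```latex
\begin{aligned}
\text{Volume model:}\quad & \mathbb{E}[\text{clients}] = n \cdot p, && 1000 \times 0.002 = 2 \\
\text{Our view:}\quad & \mathbb{E}[\text{clients}] = n \cdot p(n), && \frac{dp}{dn} < 0 \ \text{past the point where messages must be templated}
\end{aligned}
```

If p falls as n grows, doubling n yields less than double the expected clients, and the credibility cost, which this sketch doesn't model, makes each extra message more expensive still.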

For a small, experienced shop, that's a terrible trade. We don't have a brand large enough to absorb the reputational cost, and we don't need the volume — we need a handful of the right engagements per year.

Method: targeting as a qualification problem

If the limiting factor is relevance, then the work moves upstream — into selection, before a single word is written. We treat this as a qualification problem, and most of the effort goes into disqualifying.

In practice that means examining signals an engineer can actually read, rather than a marketer's guesswork about a persona:

Technology fit. What does the company actually run? A visible PHP footprint or an aging Laravel install, a job posting that quietly reveals the stack, a careers page with something brittle bolted onto it: these tell us whether the problem we're good at solving actually exists there. We're not the right partner for everyone, and pretending otherwise wastes both sides' time.

Evidence of a real problem. Public signals of friction: a slow or fragile system, a recent incident, a migration in progress, a team clearly under-resourced for what they're trying to ship. If we can't articulate a specific problem they have, we don't reach out. That's the disqualifier, and it kills most candidates.

Decision-maker reachability. At a smaller enterprise, can we get to someone who actually owns the technical decision? Writing into a black hole is just volume with extra steps.

What comes out of this is not a list of a thousand. It's a short list — companies where we can write a specific first sentence about their situation by hand, with no merge tags. If we can't write that sentence, they're not on the list. That filter alone does most of the work.
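The three signals and the hand-written-sentence rule combine into an all-or-nothing filter. A minimal sketch, assuming each candidate has already been researched by hand; the `Candidate` fields, the company names, and the merge-tag pattern are hypothetical illustrations, not a tool we're describing:

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical placeholder syntaxes used by mail-merge tools, e.g. {{first_name}} or [COMPANY].
MERGE_TAG = re.compile(r"\{\{.*?\}\}|\[[A-Z_]+\]")


@dataclass
class Candidate:
    company: str
    tech_fit: bool                   # the stack we're good at is visibly there
    specific_problem: Optional[str]  # a concrete, nameable problem, or None
    decision_maker_reachable: bool   # we can reach whoever owns the technical decision
    first_sentence: str              # written by hand for this company


def qualifies(c: Candidate) -> bool:
    """All-or-nothing: any failed signal disqualifies the candidate."""
    if not (c.tech_fit and c.specific_problem and c.decision_maker_reachable):
        return False
    sentence = c.first_sentence.strip()
    # No sentence, or one that still contains a template placeholder, means no outreach.
    return bool(sentence) and MERGE_TAG.search(sentence) is None


candidates = [
    Candidate("Acme", True, "checkout times out under load", True,
              "Your checkout has been timing out during evening peaks."),
    Candidate("Beta", True, None, True, "Hello there."),
    Candidate("Gamma", True, "slow admin panel", True, "Hi {{first_name}}, quick question."),
]
shortlist = [c.company for c in candidates if qualifies(c)]
```

Only "Acme" survives: "Beta" has no nameable problem, and "Gamma" still has a merge tag in its first sentence.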

This is slower per contact by design — the bet being that a smaller number of genuinely relevant conversations is worth more than a large volume of indifferent ones. Whether that bet pays off is exactly what we set out to measure.

Measurement: instrumenting outreach as a system

Here's the point where the engineering brain refuses to switch off — and where, we'd argue, most "thought leadership" about outreach quietly falls apart. People describe their process in glowing terms and show you exactly zero of the underlying numbers. We'd rather instrument the thing properly and let the data embarrass us if it needs to.

The setup is deliberately boring and standard, which is the point — it's reproducible:

  • UTM parameters on every link. Every outgoing link — whether in a message, a follow-up, or a piece of content — carries consistent utm_source / utm_medium / utm_campaign tagging. This is the difference between "someone visited the site" and "this outreach step drove this visit." Without it, attribution is just guessing.

  • GTM as the single, consent-aware instrumentation layer. The page loads only the Google Tag Manager container; GA4 fires through GTM as a tag rather than through a second hardcoded snippet. That gives us one source of truth, versioned, with rollback. Critically, it's also consent-aware: the consent defaults are set by a tag on GTM's Consent Initialization trigger, which fires before any other tag, and GA4 respects the resulting analytics consent state instead of firing unconditionally. GDPR consent is part of the setup by construction, not an afterthought.

  • Events that map to the funnel, not to vanity metrics. Page views on their own are noise. What we care about is the sequence that actually correlates with a conversation: did the targeted contact arrive, did they look at our work, did they reach the contact point. Those are the events worth defining, because those are the ones a decision can rest on.
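Consistent UTM tagging is easiest to keep consistent when links are generated rather than typed. A minimal sketch using only Python's standard `urllib.parse`; the function name `tag_link`, the example URL, and the source/medium/campaign values are hypothetical:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

UTM_KEYS = ("utm_source", "utm_medium", "utm_campaign")


def tag_link(url: str, source: str, medium: str, campaign: str) -> str:
    """Return `url` with utm_source/utm_medium/utm_campaign set, keeping other query params."""
    parts = urlsplit(url)
    # Keep existing non-UTM parameters; drop stale UTM values so each link carries exactly one set.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k not in UTM_KEYS]
    # Lowercase and trim so "LinkedIn" and "linkedin " don't show up as separate sources.
    query += [(k, v.strip().lower()) for k, v in zip(UTM_KEYS, (source, medium, campaign))]
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, `tag_link("https://example.com/work?ref=1#top", "LinkedIn", "direct_message", "q3-outreach")` returns `https://example.com/work?ref=1&utm_source=linkedin&utm_medium=direct_message&utm_campaign=q3-outreach#top`: the existing parameter and the fragment survive, and the tag values are normalized.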

The point of this isn't the dashboards. The point is that the next decision (do more of what, less of what, change the targeting criteria) should rest on evidence, not on how the week felt. A small shop can't afford to spend months on a motion that doesn't convert and only find out late because nobody was measuring.
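Once the three funnel events from the list above exist, "resting on evidence" can start as something this plain: count how many sessions reached each step, in order. A sketch, assuming events have already been exported as one list per session; the event names are hypothetical custom events, not GA4 built-ins:

```python
from typing import Iterable, Sequence

# Hypothetical custom event names for the three funnel steps.
FUNNEL: tuple[str, ...] = ("outreach_arrival", "view_work", "reach_contact")


def funnel_counts(sessions: Iterable[Sequence[str]], steps: Sequence[str] = FUNNEL) -> list[int]:
    """For each step, count sessions that reached it after completing every earlier step in order."""
    counts = [0] * len(steps)
    for events in sessions:
        reached = 0
        for name in events:
            # Advance only when the next expected step occurs; other events are ignored.
            if reached < len(steps) and name == steps[reached]:
                reached += 1
        for i in range(reached):
            counts[i] += 1
    return counts


sessions = [
    ["outreach_arrival", "view_work", "reach_contact"],
    ["outreach_arrival", "page_view"],
    ["view_work", "reach_contact"],                     # no tracked arrival: counts for nothing
    ["outreach_arrival", "reach_contact", "view_work"],  # contact before work: steps 1 and 2 only
]
print(funnel_counts(sessions))  # → [3, 2, 1]
```

Requiring the steps in order is a deliberate choice: a visit that never came through a tagged outreach link shouldn't be credited to outreach.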

A note on sample size, and on honesty

We're holding to one principle: we're not going to wave around early numbers as if they meant something. A few contacts and one positive reply isn't a trend — it's statistical noise, and presenting it as a signal would be exactly the kind of dishonesty we're standing against. You need a real sample before the data says anything, and we'd rather say "not yet" than invent a narrative.
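To put a number on why a handful of contacts says little, consider a confidence interval for the response rate. The Wilson score interval below is a standard binomial interval, chosen here for illustration rather than a method the series commits to; the counts are hypothetical:

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    p_hat = successes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    centre = (p_hat + z2 / (2 * trials)) / denom
    margin = z * math.sqrt(p_hat * (1 - p_hat) / trials + z2 / (4 * trials * trials)) / denom
    return centre - margin, centre + margin
```

One positive reply from five contacts gives an observed rate of 20%, but `wilson_interval(1, 5)` spans roughly 4% to 62%, which is consistent with almost any story. At 40 replies from 200 contacts, the same 20% narrows to roughly 15% to 26%.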

So this is the first post in a series, and the gist is simple: we'll share what actually happens as it develops — the real response rates, the conversations, what converted and what didn't — without any polish. As soon as enough data accumulates to mean something, we'll show it. If it works, we'll show why. If it doesn't, we'll show that too.

Most "here's how we grew" content has only a superficial acquaintance with reality. We think the honest version is the rare one — and frankly, it's the only version worth writing when the entire premise is that relevance beats volume.

More to come.