Piotrek Wąsik - Product Manager

Two months after we opened the beta on a payments product I'd been driving end to end, the dashboard was almost empty. A couple of customers live, a few workspaces connected, a balance sitting in an account, and close to zero in actual payouts.

If you judged it by the success criteria we'd written down, that's a launch in trouble. The thing is, it wasn't in trouble at all. The product was doing exactly what it was supposed to do — we were just measuring it with the wrong instrument. Working out why is the most useful thing I took from the whole project, and it generalises to almost any enterprise launch.

Every metric we'd written down was a scoreboard

When we set success criteria, we did what most teams do: we picked the things that obviously matter. Dollars processed. Accounts paid. Share of eligible campaigns adopting the feature. All sensible. All the right things to want.

And all completely useless for deciding what to do in any given week.

Amazon has the cleanest language for this. In Working Backwards, Bryar and Carr describe how the company splits metrics into output metrics and controllable input metrics — what the rest of the industry calls lagging and leading indicators. Output metrics (revenue, orders, profit) are the things you care about but can't move directly or sustainably. Input metrics are the controllable activities that, done right, produce those outputs. Bezos's standing instruction was to spend the team's energy on the inputs and let the outputs follow.

The reason isn't philosophical; it's practical. By the time a lagging metric moves, or fails to, the decisions that drove it are weeks or months in the past, and the window to do anything about them has usually closed. A lagging metric is a great scoreboard. It is a terrible steering wheel. You cannot drive a car by staring at the final score.

Our problem was that we'd brought a scoreboard to a job that needed a steering wheel.

The enterprise wrinkle: the product can't move for weeks

There's a second thing going on in enterprise that makes this sharper, and it's the part people miss.

In a self-serve product, a signed-up user can reach value in minutes, so early in-app metrics are at least measurable. In enterprise, there's a structural delay between "the customer wants this" and "the customer can use this." In our case the chain was: close the deal → sign a separate contract (weeks) → set up and connect a workspace → get added as a vendor in the customer's finance system → onboard the actual end users. Only at the end of that chain can a single transaction happen.

So at month two, looking for volume on the dashboard was looking in the one place where, structurally, nothing could have happened yet. The demand was real and it was validating — it was just showing up in the sales pipeline, not the product. Win rates were high. Enterprise deals were closing. The only slow step was signature. That is a leading indicator screaming that you've built the right thing. We were ignoring it because it wasn't on the scoreboard.

The lesson: in enterprise, demand validates upstream of the product, often by weeks or months. If your only instruments live inside the product, you're flying blind during exactly the period when you most need to know whether to keep going.

The fix is two instruments, not a shorter goal

When a launch's headline numbers look quiet, the instinct from the GTM side of the house is reasonable: "twelve months is too long to wait — give us a three-month goal so we can adjust." That instinct is half right. Twelve months is too long to wait for a signal. But the answer isn't to shorten the output goal. Adoption curves aren't linear; dividing an annual target by four and expecting a quarter of it by month three will just manufacture a fake miss and trigger panic adjustments to something that's working.
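To make the fake miss concrete, here is a minimal sketch that assumes cumulative adoption follows a logistic (S-shaped) curve. The curve shape, the midpoint and steepness values, and the function name are all illustrative assumptions, not numbers from our launch.

```python
import math

def cumulative_share(month, midpoint=8.0, steepness=0.7, horizon=12):
    """Fraction of the annual target reached by `month` on a logistic curve.

    midpoint and steepness are illustrative assumptions, not launch data.
    """
    def logistic(t):
        return 1.0 / (1.0 + math.exp(-steepness * (t - midpoint)))

    # Rescale so the curve is 0 at month 0 and 1 at the end of the horizon.
    return (logistic(month) - logistic(0)) / (logistic(horizon) - logistic(0))

linear_expectation = 3 / 12              # "a quarter of it by month three"
s_curve_at_month_3 = cumulative_share(3)

print(f"linear split expects by month 3: {linear_expectation:.0%}")
print(f"S-curve delivers by month 3:     {s_curve_at_month_3:.0%}")
```

Under these assumed parameters the curve still hits the full target at month twelve, yet at month three it sits far below the 25% a linear split would demand. Same product, same year-end result, and a quarterly review that reads as a miss.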

The answer is to run two instruments at two different cadences:

Outputs → annual targets, reviewed quarterly. These stay where they are. You don't reset them every time you get nervous. This is just the nested-cadence model that OKR practitioners have used for years: long-horizon targets that you review on a shorter rhythm without resetting them on every review. Setting cadence and review cadence are two different things, and conflating them is where these arguments usually go wrong.

Controllable inputs → a six-week scorecard you can act on. This is the steering wheel. A short list of leading metrics that (a) you control, (b) move before the outputs do, and (c) are hard to game. For the payments product, that scorecard looked roughly like this:

Deals with the feature in the pitch, and the win rate on them — the demand signal that shows up first
Contracts signed and contracts in flight, plus time-to-contract — the structural bottleneck
Workspaces connected — the readiness step
The activation funnel: invited → onboarded → first payment, plus time-to-first-payment — the part that actually predicts volume

None of those is the thing you ultimately care about. Every one of them tells you what to do this month, and every one of them moves weeks before the dashboard does.
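The activation funnel is the most mechanical item on that scorecard, so here is a rough sketch of how it could be computed. The EndUser record, its field names, and the sample dates are hypothetical; a real version would read from whatever system tracks invitations and payments.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median
from typing import Optional

@dataclass
class EndUser:
    # Hypothetical record: one row per invited end user.
    invited_on: date
    onboarded_on: Optional[date] = None
    first_payment_on: Optional[date] = None

def activation_funnel(users):
    """Count each funnel stage and summarise time-to-first-payment."""
    invited = len(users)
    onboarded = sum(1 for u in users if u.onboarded_on is not None)
    paid = [u for u in users if u.first_payment_on is not None]
    days = [(u.first_payment_on - u.invited_on).days for u in paid]
    return {
        "invited": invited,
        "onboarded": onboarded,
        "first_payment": len(paid),
        "invited_to_first_payment": len(paid) / invited if invited else 0.0,
        "median_days_to_first_payment": median(days) if days else None,
    }

# Made-up sample data, for illustration only.
users = [
    EndUser(date(2024, 1, 8), date(2024, 1, 10), date(2024, 1, 22)),
    EndUser(date(2024, 1, 8), date(2024, 1, 15), date(2024, 2, 5)),
    EndUser(date(2024, 1, 9), date(2024, 1, 12)),
    EndUser(date(2024, 1, 9)),
]

result = activation_funnel(users)
print(result)
```

Measuring time-to-first-payment from the invitation date is a choice made for this sketch; measuring from contract signature or workspace connection would be equally defensible, as long as the definition stays fixed between reviews.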

How to pick inputs that are actually worth tracking

The trap with leading metrics is choosing ones that are easy to move but don't connect to anything. Amazon's own story about this is instructive: their "selection" metric started as the number of product detail pages, then became the share of detail-page views where the item was in stock, then the share where the item was in stock and available for fast shipping. Each revision was a response to the same question asked in their weekly review — "if we move this metric as currently defined, does the output we want actually follow?" — until the input genuinely predicted the output and couldn't be inflated by gaming it.

That's the test. A good input metric is controllable, leads the output, and resists being gamed. "Number of creators invited" looks like progress but is trivially inflatable; "invited → first payment conversion" is not. Pick the version that hurts to fake.
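A quick way to apply that test is to imagine the cheapest possible way to inflate each metric and see what it does to the numbers. The sketch below does that with made-up figures; the scorecard function and both scenarios are hypothetical.

```python
def scorecard(invited, first_payments):
    """Return the easy-to-fake metric next to the hard-to-fake one."""
    conversion = first_payments / invited if invited else 0.0
    return {"invited": invited, "invited_to_first_payment": conversion}

# Made-up figures, for illustration only.
baseline = scorecard(invited=100, first_payments=30)

# Gaming scenario: bulk-invite 400 more people who never transact.
padded = scorecard(invited=500, first_payments=30)

print("baseline:", baseline)
print("padded:  ", padded)
```

Padding the invite list makes the raw count look five times better while the conversion rate drops, which is the behaviour you want from a metric that hurts to fake.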

And keep the list short. The more input metrics you track, the harder it is to weigh them or make trade-offs, and the faster the whole thing decays into a vanity dashboard nobody acts on. A handful you review religiously beats twenty you glance at.

What this changes about defending your numbers

The practical payoff isn't just better measurement; it's a better conversation with everyone watching the launch. When the outputs are quiet and someone asks why, "trust me, twelve months" is a weak answer and "here's a quarter of the annual target, which we missed" is a worse one. "Here's the input scorecard, here's the demand validating in the pipeline, here's exactly where it's bottlenecked and what we're doing about it" is a position you can hold in front of anyone. It also reframes a scary-looking dashboard from failure into expected lag, with the leading signals all green.

If I were starting the same launch again, I'd write the input scorecard before the output targets. The outputs are the easy part; everyone can name them. The inputs are where the judgement lives, and they're what carry you through the months when the scoreboard, correctly, still reads zero.

---

Field notes from building and shipping product. More at buildwithpiotrek.com.