Architecture Engineering Deep-Dive

Why I Stopped Fighting Latency and Started Measuring It

Mandeep Kaur
Senior Full Stack Developer
February 2026
9 min read

At WyldTrace, we had a traceability API that was slow — not catastrophically slow, but frustratingly, unpredictably slow. The kind of slow that makes stakeholders raise eyebrows during demos and causes that uncomfortable silence when a product manager asks why the page took four seconds to load.

After months of profiling, tuning, and one critical revelation about connection pooling under load, we brought average request latency down by 38%. This is what I learned — and what I'd do differently from day one.

"The biggest mistake wasn't the slow code. It was not knowing which code was slow, when it was slow, and why it mattered at scale."

The Problem We Had

The platform handled product provenance lookups — scanning a QR code on a physical product should return its full supply chain history in under two seconds. In development, it did. In staging, it did. In production, under real load, it sometimes took five or six. Sporadically. Infuriatingly.

The initial instinct — and I'll admit this was mine — was to throw solutions at it. Add caching. Tune the JVM heap. Rewrite the worst-looking query. These things helped marginally. But we were optimising blind.

Lesson one: Optimising without measuring is guessing. You might guess right occasionally, but you'll never know why it worked, and you won't be able to reproduce the result deliberately.

Building Visibility First

Before touching a single line of application code, we instrumented everything. Spring Boot Actuator gave us the foundation. We wired in Micrometer with a Prometheus backend, added custom timers around our most-called service methods, and deployed a Grafana dashboard that gave us per-endpoint p50, p95, and p99 latency in real time.

The first thing the dashboard told us was humbling: the problem wasn't our code at all.

// Before: fire and forget, no visibility
public ProvenanceRecord lookupRecord(String qrCode) {
    return repository.findByQrCode(qrCode);
}

// After: instrumented with Micrometer timer
private final MeterRegistry registry; // injected via constructor

public ProvenanceRecord lookupRecord(String qrCode) {
    // register() is idempotent: subsequent calls return the existing
    // meter for this name/tag combination rather than creating a new one
    return Timer.builder("provenance.lookup")
        .tag("endpoint", "qr-scan")
        .register(registry)
        .record(() -> repository.findByQrCode(qrCode));
}
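Exposing these timers to Prometheus is mostly configuration. A minimal sketch, assuming the standard `micrometer-registry-prometheus` dependency is on the classpath and using Spring Boot's Actuator property names:

```properties
# application.properties — expose the Prometheus scrape endpoint via Actuator
management.endpoints.web.exposure.include=health,prometheus
management.endpoint.prometheus.enabled=true
# attach a common tag so every metric identifies the service
# ("provenance-api" is an illustrative name, not our real service name)
management.metrics.tags.application=provenance-api
```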

Once we could see the data, patterns emerged immediately. Latency spikes happened at predictable intervals — roughly every 10 minutes — and they correlated almost perfectly with connection pool exhaustion events in our HikariCP logs.
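The per-endpoint percentile panels that surfaced those patterns were driven by PromQL queries along these lines — a sketch assuming the timer above is published as a Prometheus histogram (i.e. with `publishPercentileHistogram()` enabled), with the metric name following Micrometer's Prometheus naming convention:

```
# p95 latency per endpoint over a 5-minute window
histogram_quantile(0.95,
  sum(rate(provenance_lookup_seconds_bucket[5m])) by (le, endpoint))
```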

The Real Culprit

Our microservice was configured with default HikariCP settings. The default maximum pool size is 10 connections. Under normal load, fine. Under the burst of 50–80 concurrent lookups that came with a real product launch event, threads were queuing for a database connection for up to 3.4 seconds before the actual query even ran.

The query itself was fast — under 40ms with proper indexing. But threads were waiting 3,400ms just to get a connection. We'd been profiling queries while the real problem was a config value we'd never touched.
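The eventual fix was a few lines of configuration. A sketch using Spring Boot's HikariCP property names — the numbers are the ones we settled on for our workload, not recommendations for yours:

```properties
# HikariCP tuning — sized for an observed burst of 50–80 concurrent lookups
spring.datasource.hikari.maximum-pool-size=30
# fail fast (2s) instead of letting threads queue for seconds
spring.datasource.hikari.connection-timeout=2000
# recycle connections every 10 minutes
spring.datasource.hikari.max-lifetime=600000
```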

  • 3.4s: average connection wait before tuning
  • 38%: average latency reduction achieved
  • 1.2s: end-to-end latency at scale, after the fixes

What We Changed

  • HikariCP pool size — tuned from default 10 to 30, with a connection timeout of 2s and a max lifetime of 10 minutes. Monitored the pool utilisation to find the right ceiling without over-provisioning.
  • Database indexing — added a composite index on (qr_code, product_id, created_at) which reduced our most common query from a full table scan to a sub-5ms index seek.
  • Read replicas — directed all lookup traffic to a read replica on AWS RDS, freeing the primary for writes and reducing contention.
  • Request pipeline batching — grouped concurrent lookups for the same product into a single downstream call using a short-lived in-flight cache keyed on QR code hash.
  • N+1 query elimination — Hibernate was triggering one query per supply chain step. Replacing this with a single JOIN-based fetch via @EntityGraph cut the per-request query count from 12 to 1.
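The request-batching idea is the least standard of these changes, so here is a minimal, self-contained sketch of the in-flight cache pattern. The class name `SingleFlight` and its methods are illustrative, not our production code: concurrent callers for the same key share one underlying fetch, and the entry is removed once the fetch completes so later requests see fresh data.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Collapses concurrent lookups for the same key into one downstream call.
// Hypothetical illustration of the in-flight cache, not the WyldTrace class.
public class SingleFlight<K, V> {

    private final ConcurrentHashMap<K, CompletableFuture<V>> inFlight =
        new ConcurrentHashMap<>();

    public CompletableFuture<V> get(K key, Function<K, V> fetch) {
        CompletableFuture<V> created = new CompletableFuture<>();
        CompletableFuture<V> existing = inFlight.putIfAbsent(key, created);
        if (existing != null) {
            return existing; // another caller is already fetching this key
        }
        try {
            created.complete(fetch.apply(key)); // synchronous fetch for simplicity
        } catch (RuntimeException e) {
            created.completeExceptionally(e);
        } finally {
            inFlight.remove(key); // later callers trigger a fresh fetch
        }
        return created;
    }
}
```

In production you would key this on the QR code hash and cap how long an entry may live; the sketch keeps only the deduplication logic.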

What I'd Do Differently

Looking back, nearly all of this pain was avoidable. The fixes were not complex — the real cost was the weeks we spent optimising the wrong things before we could see clearly what was wrong.

  • Instrument from day one. Add Micrometer, wire up Prometheus, build a Grafana dashboard before the first PR is merged. It costs an afternoon and pays back tenfold.
  • Load test early and often. A single-user response time tells you almost nothing. What matters is behaviour under your p95 concurrent load. Use k6 or Gatling and test from the first week.
  • Never leave connection pool config at default. Profile your actual concurrent usage, set maximumPoolSize deliberately, and alert on pool saturation.
  • Audit every ORM query. Hibernate is powerful and treacherous in equal measure. Log SQL in staging, count queries per request, and treat any N+1 as a bug.
  • Treat latency as a feature. It's not a performance concern to defer to a later sprint. Your users feel it on the first day.
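For the ORM-audit point, the cheapest starting position is making Hibernate tell you what it is doing in staging. A sketch using standard Spring Boot and Hibernate properties — the 50ms threshold is illustrative:

```properties
# log every SQL statement Hibernate issues (staging only — very noisy)
spring.jpa.show-sql=true
spring.jpa.properties.hibernate.format_sql=true
# collect per-session statistics, including query counts per transaction
spring.jpa.properties.hibernate.generate_statistics=true
# warn on any individual query slower than 50ms (Hibernate 5.4+)
spring.jpa.properties.hibernate.session.events.log.LOG_QUERIES_SLOWER_THAN_MS=50
```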

"Slow software isn't just a technical problem. It's a trust problem. Every second a user waits is a second they're wondering whether your product is reliable."

The Outcome

After implementing the changes above over three focused sprints, our average end-to-end latency for a QR provenance lookup dropped from 4.2 seconds to 1.2 seconds at scale — well within our target. The p99 dropped from a painful 8.1 seconds to 2.4 seconds. No more stakeholder eyebrows.

More importantly, we now had the instrumentation to know what was happening at all times. When a new deployment causes a regression, we see it on the dashboard within minutes — not in a Slack message from an unhappy user.

That shift — from reacting to symptoms to observing causes — is the real lesson here. The code changes were almost secondary.

TL;DR: If your API is slow and you don't have per-endpoint p95 latency visible on a dashboard right now, that's the first thing to fix. Measure before you optimise. Everything else follows from seeing clearly.

Mandeep Kaur
Senior Full Stack Developer · Glasgow, UK

7+ years building enterprise Java systems across fintech, healthcare, and analytics. Currently scaling a production traceability platform at WyldTrace. MSc Data Analytics (Distinction), University of Strathclyde.