Billion User Trap - The Pagination Mistake That Can Take Down Your Database

Share on:

Background

In Part 1, we established that access patterns & not entities should drive your design. In Part 2, we fixed the database by replacing brittle three-table JOINs with a covering index, reducing P99 on the listing query. The schema was right. The index was right. The listing API itself was fast. And then it quietly started taking the database down - not because of a bad query, but because of how that fast query was being paginated.

The Query That Kills Production

It's one line of SQL. It is the default behavior in most ORMs.

And at scale, it silently impacts database performance.

1SELECT user_id, full_name, kyc_status
2FROM   users
3ORDER  BY created_at DESC
4LIMIT  50 OFFSET 5000000;

This is offset-based pagination. It works correctly. And it will become your worst problem once your user table crosses tens of millions of rows.


What OFFSET Actually Does

Here's what most engineers believe OFFSET does:

"Skip to the 5 millionth record and return the next 50."

Here's what the database actually does:

"Scan and load the first 5,000,050 rows. Discard the first 5,000,000. Return the last 50."

And that too - EVERY TIME

The database must traverse every row before the offset to reach the rows you want. The deeper the page, the more work. The more concurrent users paginating, the more overlapping full-scans running simultaneously.


The Performance Profile of OFFSET at Scale

On a 200M row users table (PostgreSQL, SSD-backed, well-tuned):

Page Number (50 records/page)OFFSET ValueApprox. Query Time (P99)
Page 104ms
Page 1004,9506ms
Page 1,00049,95018ms
Page 10,000499,950140ms
Page 100,0004,999,9501,800ms

What happens while OFFSET query runs - It holds buffer pool pages and I/O bandwidth that your real-time user queries also need.


Cursor-Based Pagination

The correct approach is keyset pagination (also called cursor-based pagination).

Instead of "give me page N," you ask: "give me 50 records that come after a specific record."

 1-- First page (no cursor)
 2SELECT user_id, full_name, kyc_status, created_at
 3FROM   users
 4WHERE  region = 'NA'
 5ORDER  BY created_at DESC, user_id DESC
 6LIMIT  50;
 7
 8-- Subsequent pages (cursor = last record from previous page)
 9SELECT user_id, full_name, kyc_status, created_at
10FROM   users
11WHERE  region = 'NA'
12  AND  (created_at, user_id) < (:lastCreatedAt, :lastUserId)
13ORDER  BY created_at DESC, user_id DESC
14LIMIT  50;

What changed:

  1. No OFFSET - the WHERE clause positions the query directly in the index
  2. The database uses the index to jump to the exact start position
  3. Reads exactly 50 rows. Not 5,000,050.
  4. Query time is O(1) with respect to page depth - page 100,000 is exactly as fast as page 1

The same query at page 100,000 now runs in 4ms instead of 1,800ms. Not 10% faster. 450X faster.


Implementing the Cursor Token

Exposing raw timestamps and IDs as cursor values, leaks your schema. Most common mistake I have seen is considering Base 64 encoding as a security mechanism that will safeguard your application from leaking schema.

The correct cursor design addresses this security problem, and ties the payload structure directly to the index that serves it.

Step-1 : The payload mirrors the index

The cursor payload is not an arbitrary bag of fields - it must contain exactly the columns in the same order as the index that powers the query. Recall the covering index from Part 2 :

1  CREATE INDEX idx_user_summary ON users(region, created_at DESC, user_id)
2    INCLUDE (full_name, kyc_status);

The cursor payload must comprise of the same three fields and that too in the same order

1  public record CursorPayload(
2    String region,
3    Instant createdAt,
4    UUID userId
5  ) {}

Step-2 : JSON instead of pipe-separated strings

Pipe-delimited strings are fragile - they break the moment a field value contains the delimiter

Step-3 : HMAC-signed cursor - encoding with integrity

The cursor must be a signed token : payload + a signature computed over that payload using a server-held secret key

 1@Component
 2public class CursorCodec {
 3
 4    private static final String HMAC_ALGORITHM = "HmacSHA256";
 5    private final SecretKeySpec signingKey;
 6    private final ObjectMapper mapper;
 7
 8    public CursorCodec(@Value("${cursor.signing.secret}") String secret) {
 9        this.signingKey = new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), HMAC_ALGORITHM);
10        this.mapper = new ObjectMapper().registerModule(new JavaTimeModule());
11    }
12
13    public String encode(CursorPayload payload) {
14        try {
15            byte[] payloadBytes = mapper.writeValueAsBytes(payload);
16            byte[] signature = sign(payloadBytes);
17
18            // Final token = base64(payloadBytes) "." base64(signature)
19            String encodedPayload = Base64.getUrlEncoder().withoutPadding().encodeToString(payloadBytes);
20            String encodedSignature = Base64.getUrlEncoder().withoutPadding().encodeToString(signature);
21
22            return encodedPayload + "." + encodedSignature;
23        } catch (Exception e) {
24            throw new CursorEncodingException("Failed to encode cursor", e);
25        }
26    }
27
28    public CursorPayload decode(String cursorToken) {
29        try {
30            String[] parts = cursorToken.split("\\.");
31            if (parts.length != 2) {
32                throw new InvalidCursorException("Malformed cursor token");
33            }
34
35            byte[] payloadBytes = Base64.getUrlDecoder().decode(parts[0]);
36            byte[] providedSignature = Base64.getUrlDecoder().decode(parts[1]);
37
38            byte[] expectedSignature = sign(payloadBytes);
39            if (!MessageDigest.isEqual(expectedSignature, providedSignature)) {
40                // Constant-time comparison — prevents timing attacks on signature verification
41                throw new InvalidCursorException("Cursor signature mismatch — possible tampering");
42            }
43
44            return mapper.readValue(payloadBytes, CursorPayload.class);
45        } catch (InvalidCursorException e) {
46            throw e;
47        } catch (Exception e) {
48            throw new InvalidCursorException("Failed to decode cursor", e);
49        }
50    }
51
52    private byte[] sign(byte[] data) throws Exception {
53        Mac mac = Mac.getInstance(HMAC_ALGORITHM);
54        mac.init(signingKey);
55        return mac.doFinal(data);
56    }
57}

A few details worth calling out :

  • The signing key lives in configuration, and not in code
  • Token format is payload.signature

Your API response carries the cursor for the next page :

1  {
2    "users": [...],
3    "nextCursor": "eyJyZWdpb24iOiJOQSIsImNyZWF0ZWRBdCI6IjIwMjYtMDYtMTVUMDQ6MzA6MDBaIiwidXNlcklkIjoiM2Y0OGEyLi4uIn0.k3J8x9PqRz...",
4    "hasMore": true
5  }

Client sends ?cursor=eyJyZWdpb24i... on the next request.

No page numbers, no offsets. Stateless, Scalable & Secured.

Advantages of this approach

  • Consistent performance at any depth
  • No database-killing deep scans

The Trade-off You Must Acknowledge

Cursor pagination shown here as part of my solution to the problem statement comes with constraints -

  1. Jump-to-page navigation ("go to page 37") - impossible with cursors
  2. Total count - accurate total requires a COUNT(*) which is itself expensive at scale
  3. Bidirectional browsing - reverse cursors are possible but with some additional complexity

Architect's Note

OFFSET-based pagination is one of the most common database performance mistake I have seen in platforms that creep in due to original design assumptions. It runs fine for initial years, and then quietly starts eating database capacity as the volume of data grows. If your platform's user listing or any "search and scroll" feature uses OFFSET under the hood - it's worth auditing before scale forces the issue.

comments powered by Disqus