On Pagination

Today, I went online to try to find good explanations for pagination, and was greeted with a lot of stuff that made it harder for someone new to understand the concept. So, I’m making a simple little post to explain the concept.

The entire point of my search for the concept was to help one of the developers of Weit (my friend), to understand the need, necessity and purpose for pagination intuitively. I might have missed some very cool blog posts that were addressing this concept adequately, but I blame the Google search engine for this sad state of affairs.

The normal flow of data #

Assume that you have a database that has a table called food. This is great, because the food table has every single ingredient that you can think of¹. Now, let’s visualize how many food items we have:

A whole lot of food items — Too many food items

Clearly, we have a LOT of food items. This is nice because when we ask for all the food items, we get all of them. Take the case of an unsuspecting user asking for all the food items present in the database:

API server and the food items — One user requesting for the food items

Now, this is a problem for two reasons:

The user generally doesn’t want everything everywhere all at once.
There’s a little too much data being sent here.

While it may seem that bigger is always better, we should dive deeper into the second point here. There’s a few internal demons that we’re fighting off:

A large request to the database ends up taking a long time to process if there’s too many records to return.
A large response from the database takes time to reach the API server.
Validating the response received from the database is going to be slow on the API side because there’s too much data.
When you send back so much data from the API server to the user, there’s a strain on the network bandwidth and the resources that you’re using up on a single server.
Think about when you have 1000+ users, the cost for processing stuff skyrockets in terms of resources and money.

To prevent all of these issues from plaguing us in our dreams, we’re talking about pagination today.

Pagination to the rescue! #

Instead of sending too much data back at once, we can break it down into smaller chunks called “pages”, and send back one page at a time whenever the user requests for a page.

Breaking stuff down into pages — Break it down for me

Honestly, that’s kinda it. Break stuff down into chunks and send it back to your user. Now, you might ask why this works better than our generic approach of send it all back and be done with it. To that I say,

Your user might just want a general view of what the data looks like. Sending everything back is a waste because of this.
Perhaps they’ve used a filtering condition and the response they want will show up at the very first record.
It’s easier for frontend developers to keep their interface clean.
You’re not making your API servers beg for a swift death.
You’re not trying to overclock your database in place of your gaming rig.

The code to enable this good boy behavior #

Now, there’s actually a surprising amount of debate on how to set up your database request to effectively paginate and provide you with responses. A large part of the implementation does end up boiling down to your system architecture and what might be an efficient method for you to implement pagination.

I’ll discuss the most bog standard way because this is an introductory blog post and nothing that goes into the thick of it.

# Unpaginated code
def get_food_items(connector):
    # ... database connection logic

    cursor.execute("""
    SELECT *
    FROM food
    ORDER BY fdc_id
    """)
    results = cursor.fetchall()

    # ... transformation to fit expected response data structure
    return response

Now, the previous one looks simple but it nails the coffin for the API server and the database. Let’s look at a better approach that increases the processing a bit on the database side but reduces the load significantly on the API server side².

# Paginated code
def get_food_items(connector, page: int = 0, size: int = 20):
    # ... database connection logic

    cursor.execute(f"""
    SELECT *
    FROM food
    ORDER BY fdc_id
    LIMIT {size} OFFSET {page * size}
    """)
    results = cursor.fetchall()

    # ... transformation to fit expected response data structure

In the above code, the page is used to determine which chunk is to be retrieved while size is used to determine the number of records in a page. By default, a user only receives 20 records at the start. However, this approach also allows you the flexibility of maxing out your size to send all the records in the database back as well.

It is to be noted that this approach isn’t very good for very huge datasets. Even an index might not help due to the data scanning needing to take place at a pretty massive scale.

That’s about it. Thanks for reading all the way through!

Further Optimization Reading #

Planetscale has a pretty good blogpost on explaining what strategies smart people have adopted to solve hard pagination problems.

It certainly is misleading because I would assume that food should be equated the same as a “dish” but that’s what the folks at USDA FoodCentral Dataset have dictated. ↩︎
Credit for the code piece and disadvantages is attributed to @nimmikrishnab ↩︎