Mon Feb 03 2020

How Optimized Threaded Pagination Works

Article size is 3.5 kB - 173 kB and is a 3 min read

For FastComments one of our challenges was dealing with long comment threads. For example, our demo page already has over 200 comments at time of writing. Long threads make us miss our SLA of 40ms for a /comments api call. We still miss this occasionally for other reasons, but this was a big hit.

Ideally, the call finishes in 30ms. That's our internal goal.

There are a lot of easy solutions to pagination, nothing revolutionary there. However, it gets a little more fun when you throw nested replies (threads) into the mix.

Threads Example

Reason being is that if you cop-out and do it the easy (and potentially bad way) way that works 90% of the time you end up with something that's very slow the other 10% of the time. Being Fast, we don't want to do that. Some examples which don't make us happy:

  1. Fast - Bad Interaction - not showing the threads and instead just showing the "top-level" comments.
  2. Mostly Fast - Bad Interaction - paginating by the top-level comments and always querying all replies (could potentially show hundreds instead of just 30 a page like we want to).

What we wanted to do is paginate by "content breaks". To illustrate let's break up the example from earlier:

Threads Example Sliced Up

A page can occur at any point in the thread. This gives us a lot of flexibility. But how do you optimize this? An obvious solution is to do what we did originally - query the whole set of comments - then just run DFS through the threads and short-circuit when you reach your pagination limit.

We take it a step further and only do this operation during write. So any vote, comment, or moderator action triggers a job that updates the pageNumber at a comment level. Those from the SQL world will think of this as a stored procedure that runs "ON UPDATE".

Of course, for us it's about 20 lines of JavaScript. This job so far seems to execute in around 300ns for threads with ~300 comments. This includes reading from and writing to the DB.

Some OK Code

Another option would have been to use a graph database. However, wanting to build a stable product and not having the operational experience with a graph database made that not a good idea.

For very large threads, a check is done to offload the work into a dedicated batch job for processing. To reduce the confusion of a submitted comment in a large thread not showing right away, it is inserted in the "correct" place in the correct page, even if it increases the size of that page slightly.