Understanding database scaling patterns

There are lot of articles online describing database scalability patterns, but they are mostly scattered articles —just techniques are defined haphazardly without much context, they are not defined in a step by step manner i.e; when to choose which scaling option, which scaling options are feasible in practise & why. Moreover I am planing to discuss some of the techniques in details in future articles, so I feel it’s better if I discuss step by step techniques with some context in my own way. This article is high level article — I will not discuss scaling techniques in details here.

Assume you have built a startup which offers ride sharing at cheap cost. Initially when you start, you target a city & hardly you have tens of customers after initial advertisement. You save all customer, trip, locations, bookings data, customer trip history in the same database or most likely in a single physical machine, there is no fancy caching or big data pipeline to solve problems since your app is very new. This is perfect for your use case at this moment since there are very less customers & your system hardly books 1 trip in 5 minute say.

Now more people start signing up in your system since you are the cheapest service in the market & thanks to your promotion and ads. You start booking say 10 bookings per minute, slowly the number increases to 20, 30 bookings per minute. At this point of time, you realize that the system has started performing poorly, API latency has increased a lot, some transactions deadlock or starve and eventually they fail. Your app is taking more time to respond causing customer dissatisfaction. What can you do to solve the problem?

Pattern 1- Query Optimization & Connection Pool implementation:

Another parallel optimization that you can do is tweaking around database connections. Database client libraries & external libraries are available in almost all programming languages. You can use connection pool libraries to cache database connections or can configure connection pool size in the database management system itself. Creating any network connection is costly since it requires some back & forth communication between client & server. Pooling connections may help you to optimize on number of connections. Connection pool libraries may help you to multiplex connections — multiple application threads can use the same database connection. I shall see if I can explain connection pooling in details in a separate article later.

Your measure latency of your APIs & find probably 20–50% or more reduced latency. This is good optimization at this point of time.

You have now scaled your business to one more city, more customer sign up, you slowly start to do 80–100 bookings per minute. Your system is not able to handle this scale. Again you see API latency has increased, database layer has given up, but this time, no query optimization is giving you any significant performance gain. You check the system metric, you find disk space is almost full, CPU is busy 80% of the time, RAM fills up very quickly.

Pattern 2- Vertical Scaling or Scale Up:

But how do you set up machine for vertical scaling?

You allocate a bigger machine. One approach is not to migrate data manually from old machine rather set the new machine as slave to the existing machine (master)-make a temporary master slave configuration. Let the replication happen naturally. Once the replication is done, promote the new machine to master & take the older machine offline. Since the bigger machine is expected to serve all request, all read / write will happen on this machine.

Cool. Your system is up & running again with increased performance.

Your business is doing very well & you decide to scale to 3 more cities — you are now operational in 5 cities total. Traffic is 3x times than earlier, you are expected to do around 300 bookings per minute. Before even achieving this target booking, you again hit the performance crunch, database index size is increasing heavily in memory, it needs constant maintenance, table scanning with index is getting slower than ever. You calculate the cost of scaling up the machine further but not convinced with the cost. What do you do now?

Pattern 3- Command Query Responsibility Segregation (CQRS):

Most of the medium scale startups which serve few hundred thousand requests everyday can survive with master-slave set up provided that they periodically archive older data.

Now you scale to 2 more cities, you see that your master is not able to handle all write requests. Many write requests are having latency. Moreover, the lag between master & slave sometimes impact customers & drivers ex — when trip ends, customer pays the driver successfully, but the driver is not able to see the payment since customer’s activity is a write request that goes to the master, while driver’s activity is a read request that goes to one of the slaves. Your overall system is so slow that driver is not able to see the payment for at least half a minute — frustrating for both driver & customer. How do you solve it?

Pattern 4-Multi Master Replication:

In a multi-master configuration, all the machines can work as both master & slave. You can think of multi-master as a circle of machines say A->B->C->D->A. B can replicate data from A, C can replicate data from B, D can replicate data from C, A can replicate data from D. You can write data to any node, while reading data, you can broadcast the query to all nodes, whoever replies return that. All nodes will have same database schema, same set of tables, index etc. So you have to make sure there are no collision in id across nodes in the same table, otherwise during broadcasting, multiple nodes would return different data for the same id. Generally it’s better to use UUID or GUID for id. One more disadvantage of this technique is — read queries might be inefficient since it involves broadcasting query & getting the correct result — basically scatter gather approach.

Now you scale to 5 more cities & your system is in pain again. You are expected to handle roughly 50 request per second. You are in desperate need to handle heavy number of concurrent requests. How do you achieve that?

Pattern 5- Partitioning:

Now imagine you have expanded your business to a total of 20 cities in your country & planning to expand to Australia soon. Your increasing demand of app requires faster & faster response. None of the above method can help you to the extreme now. You must scale your system in such a way that expanding to other countries / regions does not always need you to do frequent engineering or architecture changes. How do you do that?

Pattern 6 -Horizontal Scaling:

Sharding is in general hard — at least engineers from different companies say that. But when you serve millions or billions of requests, you have to make such tough decision.

I will discuss sharding in greater detail in my next post, so holding back my temptation to discuss more in this post.

Now since you have sharding in place, you are confident that you can scale to many countries. Your business has grown so much that investors are pushing you to scale the business across continents. You again see some problem here. API latency again. Your service is hosted in USA & people from Vietnam are having difficult time book rides. Why? What do you do about it?

Pattern 7- Data Centre Wise Partition:

These are some general step by step techniques for database scaling. Although most of the engineers don’t get enough chance to implement these techniques, but as a whole it’s better to get a broader idea about such system which in future may help you to do better system & architecture designing.

In my next articles, I will try to discuss some of the concepts in details. Please feel free to give appropriate feedback for this post if any.

Deep discussions on problem solving, distributed systems, computing concepts, real life systems designing. Developer @PayPal. https://in.linkedin.com/in/kousikn