Distributed Systems : CAP theorem

devcat
5 min readNov 24, 2023

Understanding CAP theorem with database

Welcome to the Distributed Systems series, In this article we will learn and understand what is CAP theorem. CAP stands for consistency, availability and partition tolerance. When we talk about CAP theorem, we mostly talk about distributed systems. First, Let’s understand what is distributed system. Distributed System is a system which is made up of multiple process that runs on single machine or multiple machine. In this lecture, we are going to learn CAP theorem in distributed system perspective using a simple database analogy.

What is CAP theorem?

CAP theorem states that in a Distributed System, while network partition occurs we can only choose either consistency or availability. This is coined by Eric Brewer to understand distributed system. CAP stands for consistency, availability and partition tolerance.

consistency — all client will see the most recent write.

availability — system is available and return data even if the node that contains the most recent data fails.

partition tolerance — system should work despite network failures.

Now, we have understood the definition of CAP theorem, Next we will see how it plays a role in designing or choosing the distributed system such as database, cache, storage..etc.

Let’s first understand with a simple analogy. We are going to insert a record into mysql database. Mysql server runs on single machine in single process to handle read and write requests as specified in the below diagram.

Client wants to read and write records to the Relational Mysql DB. Since the database instance is running on single machine, System will be consistent which means that client will see the most recent write. System will not be available if the node fails or if there is network failure between client and database server.

As the number of users grow, our database should scale more reads and writes. There are multiple approach available to scale the database, but we are going to discuss how CAP theorem is applied, when we replicate or partition the database. This is where distributed system comes into practice. In order to scale more reads, we will replicate the data from mysql master to replica for every change that occurs in the master database. This approach helps to split the write and read requests to separate mysql instance running on separate machine.

As specified in the below diagram, mysql instance is running on master which accept write requests, All the writes are then replicated into the replica machines to handle the read requests. Compare to the previous single instance setup, we have separated the writes and reads into separate machines.

Now we have distributed mysql database which serves writes and reads from master and replica mysql instances. Let’s look at how CAP theorem is used to define this mysql setup. Since our database setup becomes distributed, It is partition tolerant by default which means when some nodes fail, system should work to handle some client requests.

In the above example, we have not partitioned the database instead we have only added replication to scale read requests. If the master node fails, replica instances are available to serve the read requests. This is bit different from the previous single instance setup. As CAP theorem states that in a distributed system we can only choose either consistency or availability.

Choosing Consistency or Availability

Let’s say that our database system required to be highly consistent, It should return the most recent data at any time when client makes a request. Let’s consider a simple scenario of writing records to master db.

If the replica machines are not available, then master db has two choices,

  • It could return an error to the client.(Synchronous)
  • It can write the data to the local machine and returns success.(Asynchronous)

If we go by first choice, returning error to the client. Our system will be highly consistent if there is network partition occurs between master and replica instances. since master and replica instances have the same data, our client will read the most recent write. Our system is now more of CP.

If we go by second choice, Our system will be highly available. It will return success response to our client and store the write to our local machine. Our writes will be replicated asynchronously at later point of time. Meanwhile, if the client which writes to the master, reads from replica instance it may or may not see the most recent data. This is due to delay in asynchronous replication due to network partition or network failure.

Our system is now more of AP since our client might not see the most recent data from read replica due to network partition between master and replica.

In a distributed system, It’s impossible to achieve all the three properties of the CAP. we can only choose any two of the CAP such as CA, CP, AP. CA does not make any sense, In most cases distributed systems are partition tolerant. So, we either go with CP or AP.

When we choose CP by opting the first choice of the above example, It is not that our system will not be available. Our database will not be available to answer the write when the master instance can’t connect to the replica instance. In this scenarios, We are favouring consistency over availability.

We can call distributed systems as CP or AP because​ these properties are favoured during certain scenario such as network failure, Here we can not sacrifice P(partition tolerant) since distributed systems are partition tolerant by default.

So, we can not build any distributed system without P, Either we have to favour C (CP) or A(AP) to define our system.

Conclusion

In this article we have seen what is CAP theorem and how it is used to define the properties of distributed system. When we build distributed database, cache or storage..etc, We have choice to make how our system should behave. Whether It has to favour consistency or availability during certain scenario. In this article we have understood the properties of distributed mysql read-replica and what we can expect from this during network failures.

--

--