Caching in System Design

While surfing the web, you have probably wondered why sites you have visited before load faster than new ones.

You may also have noticed that the text and labels on a website appear or load faster than high-quality images and other heavy components. The reason behind this is caching.

If you have ever opened a social media platform like Instagram or Twitter on a slow data connection, you will have noticed that posts take longer to load than their captions or text. For big firms and businesses, these details matter a lot: showing the text first keeps the user engaged until the images load completely, preventing them from switching to another webpage. A client may immediately leave a website if it takes longer than usual to load, so a better customer experience can save a business from losing existing clients and help it establish itself in the market.

For example, when a video buffers on a streaming platform, you would rather switch platforms or find some other way to entertain yourself.

Many such problems around engagement and user experience can be solved using the well-known technique of caching.

Caching Technique

You may ask: what is the basic idea behind the caching technique?

Think of it like this: when you prepare dinner, do you find yourself standing in the supermarket every day to buy the commonly used spices, or do you keep them stocked up in your kitchen?

The only logical answer is to stock them in the kitchen and save yourself the time and hassle of visiting the supermarket daily.

This is precisely the concept of caching. In this analogy, the kitchen shelf works like the cache: a temporary store in your system. The same thing happens when our system needs to fetch data from primary memory (RAM). Repeatedly accessing data, even from primary memory, is comparatively slow. The overall time to fetch frequently used data can be reduced by keeping it in an even faster, more easily accessible storage layer, i.e., cache memory.

Cache memory is a small, very fast storage layer used to hold temporary data. It has limited capacity, but it is a lot faster than any other storage system, and it is used to store the most recently accessed items.

So if you ever need a piece of information repeatedly in the same program, save it in the cache to make it faster to retrieve each time.

You may now be wondering: if cache memory is this beneficial, why isn't it used to store all the data? There are several reasons. One major reason is that the hardware used for caches is extremely expensive compared to regular storage devices. Moreover, filling the cache with too much data would defeat the very property that makes it fast and easy to access. Hence, only recently accessed data should be stored in the cache.

How Can We Use Cache in Our Systems?

Caching is used in almost every layer of the IT stack. Your system hardware itself has layered caches: the CPU's own cache sits closest to the processor, a second-level cache sits below it, and the regular RAM forms the final layer.

Operating systems also rely on cache memory, for example to cache kernel data structures and frequently accessed file-system contents.

Web browsers also maintain a cache to provide a better user experience by storing data from the sites you visit frequently.

In mobile and web applications, the cache stores front-end assets so that the interface loads faster and feels more responsive to the user.

In a well-designed system, the cache memory is placed close to the central processing unit, and sometimes on the CPU chip itself. Cache memory is connected to the system via the data bus; it reduces latency and speeds up input/output operations.

How Does a Cache Work?

Usually, web applications store their users' data in a database. The read and write operations performed on that database require network calls and input/output operations.

Cache memory helps you avoid this unnecessary hassle by reducing the number of network calls to the database, which improves service quality. To understand this better, take the example of a social media platform. Assume a post goes viral and is trending: thousands of users now want to access the same file.

It would not be sensible to retrieve the data from the database for every request.

These social media platforms have millions of users, and this much traffic just to retrieve one file would degrade the user experience. To reduce the traffic and save time, we can keep such frequently accessed items separately in cache memory, which improves the performance of the system.

In a typical web application, we add a server-side cache using an in-memory store such as Redis alongside our application servers, which can noticeably improve performance.
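For instance, here is a minimal sketch of this pattern in Python, assuming a Redis server running locally and the redis-py client. The names `get_user` and `db_fetch_user` are hypothetical stand-ins, not a specific framework's API: on a hit the cache answers directly, and on a miss the data is loaded from the database and cached with a short expiry.

```python
import json
import redis

# Assumes a local Redis server; decode_responses returns strings, not bytes.
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def db_fetch_user(user_id):
    # Stand-in for a real (slow) database query, for illustration only.
    return {"id": user_id, "name": f"user{user_id}"}

def get_user(user_id):
    key = f"user:{user_id}"
    cached = r.get(key)                    # 1. try the cache first
    if cached is not None:
        return json.loads(cached)          # cache hit: no database call
    user = db_fetch_user(user_id)          # 2. cache miss: hit the database
    r.setex(key, 300, json.dumps(user))    # 3. store with a 5-minute expiry
    return user
```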

Types of Cache

Usually, you can implement a cache in your system in one of the following four ways:

1. Application Server Cache

While designing a web application, one can easily add an application server cache by allocating some memory for it on the web server. This type of cache is called an application server cache because it is implemented on your application's server. When the web application runs on a single node, the cache can live alongside the in-memory state of that application server.

The user's latest request is then stored in the cache; if it is repeated, the data is fetched directly from the cache instead of being looked up in the database. If the user sends a new request, the data is fetched from disk and stored in the cache for subsequent use.
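As a rough sketch, a single-node application server cache can be as simple as a dictionary with expiry timestamps. The names here (`AppServerCache`, `load_from_db`) are hypothetical, chosen only for illustration:

```python
import time

class AppServerCache:
    """A minimal in-process cache for a single-node application server."""

    def __init__(self, ttl_seconds=60):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, expiry timestamp)

    def get(self, key, load_fn):
        entry = self.store.get(key)
        if entry is not None and entry[1] > time.time():
            return entry[0]                      # fresh cache hit
        value = load_fn(key)                     # miss: load from disk/database
        self.store[key] = (value, time.time() + self.ttl)
        return value

# Hypothetical usage: load_from_db stands in for the real data source.
cache = AppServerCache(ttl_seconds=30)
load_from_db = lambda key: f"row for {key}"
print(cache.get("user:42", load_from_db))  # miss: loads and caches
print(cache.get("user:42", load_from_db))  # hit: served from memory
```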

Note that the cache uses the memory space of the web server itself, which can affect the server's performance.

An application server cache is a good solution for small systems, but the primary problem arises when you need to scale the application up. If the cache still lives on the individual application servers, then as the number of servers increases, the number of cache misses increases too: since a load balancer routes requests across the servers, each node is utterly unaware of what the other nodes have already cached, making the data inconsistent and complicating the design. You can use either a global or a distributed cache in your design to overcome this problem. Let us now look into them individually.

2. Global Cache

As the name suggests, a global cache is implemented as a single cache space: every node in the application uses the same space to store cached requests. Global caches can be classified into two variants:

First, when the requested data is not found in the global cache space, it is the responsibility of the cache itself to fetch the missing data from the disk or database.

Second, when the data is not found in the cache space, the requesting node contacts the database or disk directly to retrieve it.

3. Distributed Cache

While designing an application with multiple servers, you can use a distributed cache. As the name suggests, the cache is distributed across the network, and each node is assigned a part of it. Each request that reaches the load balancer is routed to the server where the cached data is most likely to be found; that server is chosen by passing every request through a consistent hashing function before it is routed.

Key points:

  • Each node holds a part of the cache.
  • The server a request is routed to is decided by passing every request through a consistent hashing function (a sketch follows this list).
  • Cache space can be increased by simply adding more servers to the application design.
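Below is a minimal sketch of such a consistent hash ring in Python, assuming hypothetical node names. The key property is that adding or removing a server only remaps the keys adjacent to it on the ring, rather than rehashing everything:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Maps keys to cache nodes so that adding a node moves few keys."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas   # virtual nodes per server smooth the load
        self.ring = []             # sorted list of (hash, node)
        for node in nodes:
            self.add_node(node)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.replicas):
            bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

    def get_node(self, key):
        # The first virtual node clockwise from the key's hash owns the key.
        idx = bisect.bisect(self.ring, (self._hash(key),)) % len(self.ring)
        return self.ring[idx][1]

# Hypothetical node names; each request's key picks its cache server.
ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
print(ring.get_node("user:42"))
```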

4. Content Distribution Network (CDN)

A CDN is used to reduce network traffic on your servers. It is mainly used by websites that serve large amounts of static data: anything from HTML and JavaScript files to standard images. The data is first requested from the CDN, and only if it is unavailable there is it requested from the backend services.

Cache Invalidation

Caching is a great way to boost the performance of your system, but consider a scenario where users are performing constant write operations on the database. If the database is constantly updated, all servers or nodes should see the same data to stay consistent. The cache creates a problem here, as it is challenging to keep the copies held in cache memory on each and every server consistent.

We use cache invalidation approaches to avoid inconsistent application behaviour and keep the cached data coherent with the source database.

There are three common cache invalidation approaches:

1. Write-Through Cache

As the name suggests, write operations are performed through the cache: any write is first applied to the cache and then to the database, keeping the two consistent. Any read request can then be served directly from cache memory, since every read sees the most recent write.

One of the advantages of this approach is that it considerably reduces the risk involved in updating the cache, since the write operations on the cache and the actual database happen together. The write-through technique suits applications that need to read data repeatedly soon after it is written: writes take longer, but that cost is compensated by low read latency and a consistent cache.
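A minimal sketch of the idea, using plain dictionaries as stand-ins for the cache and the real database:

```python
class WriteThroughCache:
    """Writes go to the cache and the database together, so reads
    served from the cache always see the latest write."""

    def __init__(self, database):
        self.cache = {}
        self.database = database      # stand-in dict for a real database

    def write(self, key, value):
        self.cache[key] = value       # update the cache...
        self.database[key] = value    # ...and the database in one operation

    def read(self, key):
        if key in self.cache:
            return self.cache[key]         # hit: consistent by construction
        value = self.database.get(key)     # miss: fall back to the database
        if value is not None:
            self.cache[key] = value
        return value

db = {}
c = WriteThroughCache(db)
c.write("post:1", "hello")
print(c.read("post:1"), db["post:1"])  # both reflect the write
```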

2. Write-Around Cache

Unlike the write-through technique, in this approach we write the data to the database without updating the cache memory. This makes write operations faster, but it can leave stale data in the cache. The primary drawback is that a read request for recently written data will cause a cache miss and must be served from the slower backend storage. On the other hand, data that will never be re-read is never loaded into the cache, so this approach is best suited for applications that rarely re-read the latest written data.
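A sketch of write-around under the same stand-in database; note how the write skips the cache and merely invalidates any stale copy:

```python
class WriteAroundCache:
    """Writes skip the cache and go straight to the database; the stale
    cache entry is dropped so a later read misses and reloads it."""

    def __init__(self, database):
        self.cache = {}
        self.database = database     # stand-in dict for a real database

    def write(self, key, value):
        self.database[key] = value   # the write goes around the cache
        self.cache.pop(key, None)    # drop any stale copy

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.database.get(key)   # recently written data misses here
        if value is not None:
            self.cache[key] = value      # cache only what is actually read
        return value
```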

3. Write-Back Cache

This approach addresses the problems of both the write-through and write-around techniques: updating the cache and the database together increases write latency, while updating only the database can leave the cache inconsistent. In write-back caching, we update only the cache first and mark the entry as modified ("dirty"). Asynchronous jobs scheduled at regular intervals then update the database with the modified data.

The main drawback of this approach is a high risk of data loss. Suppose you have updated the cache and are waiting to write to the database, but the machine suddenly crashes and everything in the cache is lost. Since the database is the source of truth and the data was never written to it, that data is gone.
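A sketch of write-back with the same stand-in database; `flush` models the scheduled async job, and whatever sits in `dirty` is exactly what a crash before the flush would lose:

```python
class WriteBackCache:
    """Writes update only the cache and mark the entry dirty; a periodic
    flush pushes dirty entries to the database."""

    def __init__(self, database):
        self.cache = {}
        self.dirty = set()           # keys modified since the last flush
        self.database = database     # stand-in dict for a real database

    def write(self, key, value):
        self.cache[key] = value
        self.dirty.add(key)          # the database is now behind for this key

    def flush(self):
        # In a real system this would run as a scheduled async job.
        for key in self.dirty:
            self.database[key] = self.cache[key]
        self.dirty.clear()

db = {}
c = WriteBackCache(db)
c.write("post:1", "hello")
print(db.get("post:1"))  # None until the flush runs
c.flush()
print(db.get("post:1"))  # "hello"
```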

Eviction Techniques

We have now seen most of the essential concepts related to caching in system design. But you may wonder what happens when the cache memory in your system gets full. Since the cache is limited, you have to use an eviction algorithm to replace existing data in cache memory with new data, and this should be done in a way that causes the fewest cache misses.

Let us now understand some of the commonly used Cache Eviction approaches:

1. Least Recently Used (LRU)

LRU is one of the most straightforward and hence the most popular approaches. It has good runtime performance and a reasonable hit rate in typical systems.

As the name suggests, the least recently used cache entry is replaced or overwritten by the new write request. Whenever a new entry arrives, the entry at the bottom of the recency order is removed, and the new entry is placed at the top.

Look at the main feed of any social media platform: as new posts arrive, the oldest posts fall off the page, and the latest posts are shown at the top.
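An LRU cache is commonly built on a structure that keeps entries in access order. Here is a minimal sketch using Python's `OrderedDict`:

```python
from collections import OrderedDict

class LRUCache:
    """Evicts the least recently used entry when capacity is exceeded."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()   # insertion order doubles as recency order

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict the least recently used

cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # "a" is now the most recently used
cache.put("c", 3)      # evicts "b", the least recently used
print(cache.get("b"))  # None
```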

2. Least Frequently Used 

In this approach, the least frequently used cache entry is removed before the latest write request is served. Take the example of the word suggestions you get while typing on your phone: these are the most frequently used words, stored in your device's cache.
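A minimal LFU sketch that tracks an access count per key; this simple version scans for the minimum on eviction, whereas production implementations use constant-time schemes:

```python
class LFUCache:
    """Evicts the entry with the fewest accesses when the cache is full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = {}
        self.counts = {}   # key -> access frequency

    def get(self, key):
        if key not in self.data:
            return None
        self.counts[key] += 1
        return self.data[key]

    def put(self, key, value):
        if key not in self.data and len(self.data) >= self.capacity:
            victim = min(self.counts, key=self.counts.get)  # least used key
            del self.data[victim]
            del self.counts[victim]
        self.data[key] = value
        self.counts[key] = self.counts.get(key, 0) + 1
```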

3. Most Recently Used

This approach suits applications where recently read data is the least likely to be read again. It removes the most recently read data from the cache and prefers to hold on to older data. It is used in applications where repetition should be avoided to improve the user experience, evicting the data you have just interacted with.

4. Random Replacement 

Random replacement is the simplest approach of all: it replaces a random cache entry to make the necessary space for new data.

So, this is everything you should know about the concept of Caching in System Design.