What would you do if you wanted to read a book every day? Would you go to the library every day, or just purchase a copy and keep it at your desk? You would obviously purchase it. Similarly, when a restaurant needs milk, instead of going to the dairy every time, we can store it in a refrigerator.
This is exactly what caching is about. Read on to find out:
- What is cache/caching?
- How does caching help make websites faster?
- Where can we use cache?
- What are the different types of cache?
- What is Cache Invalidation?
- How to determine what data to keep?
What is cache/caching?
Caching means storing frequently requested data closer to where it is needed.
In the above examples of books and milk, the desk and refrigerator were acting as the cache.
In computing, a cache is temporary storage used to access required data quickly, such as the subdirectory on the hard drive where a browser keeps downloaded files. Caching works on the principle of locality of reference: ‘Recently requested data is likely to be requested again’. It is like short-term memory: it has limited space, but it is faster and contains the most recently accessed items.
flowchart for caching
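The idea of keeping recently requested data close at hand can be sketched in a few lines of Python. This is only an illustration: `fetch_from_library` is a hypothetical stand-in for a slow source such as a database, and the standard library's `functools.lru_cache` plays the role of the desk.

```python
from functools import lru_cache

backend_calls = 0  # counts how often the slow source is actually consulted


def fetch_from_library(title: str) -> str:
    """Hypothetical slow lookup standing in for a database or remote call."""
    global backend_calls
    backend_calls += 1
    return f"contents of {title}"


@lru_cache(maxsize=128)  # keep up to 128 recently used results in memory
def read_book(title: str) -> str:
    return fetch_from_library(title)


read_book("Dune")   # first request goes to the slow source
read_book("Dune")   # repeated request is answered from the cache
read_book("Emma")   # a different title goes to the slow source again
print(backend_calls)  # 2
```

Three requests were made, but the slow source was consulted only twice, because the repeated request was served from memory.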
Now you know what caching is. Don’t you wonder how this helps in making your website faster? Read further!
How does caching help make websites faster?
Caching is an important contributor to the performance of any system. A cache helps deliver low latency for data access and high throughput for client requests. It can also keep serving previously fetched data while the original source is slow or briefly unavailable, although it is not a durable backup. With a proper cache implementation, the server can return data faster than it could by fetching it from the original database.
Where can we use cache?
Cache can be used in almost every layer, such as hardware, the OS, browsers, and web applications, and at levels such as:
4. CDN (internet level)
5. Scatter Gather (infrastructure level)
Different levels where caching can be implemented.
What are the different types of cache?
1. Application server cache
Cache-HIT and Cache-MISS?
Cache-HIT – when the cache has the requested data
Cache-MISS – when the cache doesn’t have the requested data and it must be fetched from the original source
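Hits and misses are easy to count, and their ratio is a common way to judge whether a cache is earning its keep. The sketch below is illustrative only: `CountingCache` and its `loader` argument are names invented for this example, and the lambda stands in for a real data source.

```python
class CountingCache:
    """Dictionary-backed cache that records hits and misses (illustrative only)."""

    def __init__(self, loader):
        self._loader = loader  # called on a miss to fetch from the original source
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._store:
            self.hits += 1          # cache hit: the data is already here
            return self._store[key]
        self.misses += 1            # cache miss: go to the source and remember the result
        value = self._loader(key)
        self._store[key] = value
        return value

    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


cache = CountingCache(loader=lambda key: key.upper())
for key in ["a", "b", "a", "a"]:
    cache.get(key)
print(cache.hits, cache.misses, cache.hit_ratio())  # 2 2 0.5
```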
What is Cache Invalidation?
Have you wondered what happens if the data is modified in the database? The cached copy is then out of date, so we have to invalidate it: remove the old value from the cache or replace it with the new one.
Here are some techniques that are used for cache invalidation: write-through cache, write-around cache, write-back cache, cache-aside, read-through cache, etc.
Cache Invalidation Techniques
Write-through cache:
Writes are made synchronously to both the cache and the backend storage. This is simple to operate but makes writes slower, because every write has to reach both the cache and the storage.
Consistency between the cache and the database is maintained.
We always have a backup of the data in the database.
Latency is higher, because we have to update both the cache and the backend storage before returning the response to the client.
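A minimal sketch of the write-through idea, assuming a plain dictionary stands in for the backend storage. The class and method names are made up for this illustration.

```python
class WriteThroughCache:
    """Every write goes to the backing store and the cache before returning."""

    def __init__(self, database: dict):
        self._db = database  # plain dict standing in for the backend storage
        self._cache = {}

    def write(self, key, value):
        # Both copies are updated synchronously, so they never disagree.
        self._db[key] = value
        self._cache[key] = value

    def read(self, key):
        if key in self._cache:
            return self._cache[key]
        value = self._db[key]  # raises KeyError if the key does not exist
        self._cache[key] = value
        return value


db = {}
cache = WriteThroughCache(db)
cache.write("user:1", "alice")
print(db["user:1"], cache.read("user:1"))  # alice alice
```

The write only returns after both updates have finished, which is where the extra write latency comes from.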
Write-back cache:
In this technique we write or update the data in the cache first, and the database is updated later at certain time intervals. This technique is more useful than the others when the application is write-heavy.
It reduces how often the database is written to.
It comes with the risk of losing data if the cache crashes, since the cache holds the only copy of the written data until it is flushed. If we lose the data from the cache before it is written to the database, that data is gone.
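The write-back behavior can be sketched as follows. This is a simplified illustration: a dictionary stands in for the database, and `flush` is called by hand here, whereas a real system would typically trigger it from a timer or background worker.

```python
class WriteBackCache:
    """Writes land in the cache first and reach the database only on flush()."""

    def __init__(self, database: dict):
        self._db = database   # plain dict standing in for the database
        self._cache = {}
        self._dirty = set()   # keys written to the cache but not yet persisted

    def write(self, key, value):
        self._cache[key] = value
        self._dirty.add(key)

    def read(self, key):
        if key in self._cache:
            return self._cache[key]
        value = self._db[key]  # raises KeyError if the key does not exist
        self._cache[key] = value
        return value

    def flush(self) -> int:
        """Persist all dirty entries and return the number of database writes."""
        for key in self._dirty:
            self._db[key] = self._cache[key]
        written = len(self._dirty)
        self._dirty.clear()
        return written


db = {}
cache = WriteBackCache(db)
cache.write("counter", 1)
cache.write("counter", 2)
cache.write("counter", 3)
print("counter" in db)         # False: nothing has reached the database yet
written = cache.flush()
print(written, db["counter"])  # 1 3
```

Three writes to the cache turned into a single database write. Anything still marked dirty when the cache crashes would be lost, which is the risk described above.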
Cache Aside:
When a client sends a request, the system first checks for the data in the cache. If it's a cache-hit, it returns the data without any involvement of the database. If it's a cache-miss, the system looks for the data in the database, updates the cache with that data, and then sends the response back to the client. The next time a client requests the same data, it will be available in the cache.
It works best for read-heavy systems where the data is not frequently updated, such as user data like username, email ID, and user ID.
This technique can introduce inconsistency if the data gets updated in the database while an older copy is still cached. To limit this inconsistency we use a TTL (Time To Live), an expiry time for cached data. After this interval the data is invalidated from the cache.
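The cache-aside flow with a TTL might look roughly like this. It is a sketch under stated assumptions: a dictionary stands in for the database, and the `clock` parameter is an addition made for this example so that expiry can be demonstrated without waiting in real time.

```python
import time


class CacheAside:
    """Check the cache first; on a miss or expired entry, load from the database."""

    def __init__(self, database: dict, ttl_seconds: float, clock=time.monotonic):
        self._db = database      # plain dict standing in for the database
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so the example can control time
        self._cache = {}         # key -> (value, expires_at)

    def get(self, key):
        now = self._clock()
        entry = self._cache.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]      # cache hit, entry still within its TTL
        value = self._db[key]    # miss or expired: fall back to the database
        self._cache[key] = (value, now + self._ttl)
        return value


fake_now = [0.0]  # simulated clock, in seconds
db = {"user:1": "alice@example.com"}
users = CacheAside(db, ttl_seconds=60, clock=lambda: fake_now[0])

first = users.get("user:1")              # miss: loaded from the database
db["user:1"] = "alice@new.example.com"   # the database changes behind the cache
stale = users.get("user:1")              # hit: still the old value within the TTL
fake_now[0] = 61.0                       # move past the TTL
fresh = users.get("user:1")              # expired: reloaded from the database
print(first, stale, fresh)
```

The middle read shows the inconsistency window: until the TTL expires, the cache keeps returning the old email address.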
How to determine what data to keep?
We use cache replacement (eviction) policies to decide which entries to remove when the cache is full. As we already know, a cache does not have vast memory to store data. Hence, we must consider cache eviction policies while designing a cache.
Cache replacement methods
The most common cache replacement policies are:
1. Belady’s algorithm (Hawkeye, Mockingjay)
3. Queue-based policies (FIFO, LIFO)
4. Recency-based policies (least recently used (LRU), time-aware least recently used, most recently used, segmented LRU, LRU approximations (Pseudo-LRU, CLOCK-Pro))
5. Frequency-based policies (least frequently used, least frequent recently used, LFU with dynamic aging)
6. Cache replacement using machine learning
7. RRIP-style policies (re-reference interval prediction (static, bimodal, dynamic))
8. Other cache replacement policies (adaptive replacement cache, adaptive climb, clock with adaptive replacement, multi-queue, Pannier, low inter-reference recency set)
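As one concrete example from the list, least recently used (LRU) eviction can be sketched with `collections.OrderedDict` from the Python standard library. The `LRUCache` class below is an illustrative minimal version, not a production implementation.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._items = OrderedDict()  # ordered from least to most recently used

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # drop the least recently used entry

    def keys(self):
        return list(self._items)


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")       # "a" becomes the most recently used entry
cache.put("c", 3)    # the cache is full, so "b" is evicted
print(cache.keys())  # ['a', 'c']
```

Other policies in the list differ mainly in which entry they choose to drop. FIFO, for instance, would have evicted "a" here because it was inserted first.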