Caching in on Scale and Performance – Part III

In this concluding part of the article, I would like to discuss the various Cache management methodologies. But before we delve in that, Architects and Software Designers of today need to first make sure they are using the right Database/Storage technology. Because a Distributed Scalable No-SQL service like CosmosDB with it’s concept of request units (RUs) might not even need caching. The scale issue is solved by partitioning, distributing and scaling the database itself.

But if your design calls for a classic RDBMS style database, then a Caching layer and caching techniques need to be thought through.

In this section we will cover Cache Updates techniques and Caching Infrastructure considerations.

Fetch on miss

Most basic Cache systems are designed to be empty to begin with. When the application needs data, it tries to read it from the Cache. Since the cache is empty, it generates a “Not found” event which then can trigger a Database fetch. All subsequent reads can then fetch from the cache.

The advantage of this method is that you don’t have to populate the entire cache with data which may or may not be used. Only the required data is uploaded to cache. So you save on space and hence infrastructure cost. If you manage cache TTL (Time to live) properly, using this method you can very efficiently manage your Cache with a minimal infrastructure by keeping only the most frequently used data in the cache and nothing else.

The disadvantage of this method is that cold run for the application has is slow on response time as the data needs to get loaded.

This method is ideal for scenarios where only some parts of the data is being used frequently and occasional cache miss cost is acceptable to the users. Small cache is more important.

Pre-loaded Cache

Here you pre-load the entire cache-able data all at once. And then only update the database as and when data changes using one of the Cache Update methods (described later). This is an Anti-pattern. Loading everything into Cache could undo the performance benefits because of the added Cache management. While initially it might be inefficient, the system should eventually evict cache that not being used and come to optimum cache store.

Cache Eviction Policy

If you are using Redis, you can use a combination of TTL (Time to Live) and Expire commands to manage the Cache optimally. A good Cache Eviction policy can help you manage the size and availability of your Caching system.

Cache Update Methods

These are the standard patterns of updating your cache. Each pattern has it’s merits and demerits.

Write Through: When data changes, it is simultaneously updated to the Cache and to the Database. Advantage is consistency between Database and Cache. Disadvantage is keeping everything in the cache updated whether needed or not.

Write Around: Data is first written to the Database first to ensure that data is persisted first and then fetched into the Cache when accessed. The write logic can expire the cache when it writes to the database, so that the application knows that the data needs to be fetched when a Cache hit comes in.

Write Back: Data is first written to the cache and asynchronously updated to the database. Data loss risk is very high. So this method must be used only when data loss is affordable but data access needs to be very fast. This method can be used when the Cache layer is replicated and hence loss of one Cache server will not impact the database update.

Hopefully, this three-part article has covered areas of caching that most people are concerned with. One of the purposes of writing these articles was to make my job easier. I don’t have to point customers in different directions when caching is being discussed.

Caching in on Scale and Performance – Part II

In Part I of this 3-Part article, we looked at the importance of caching and cost of not doing so. We then built a sample application with Redis Cache as an example.

Going back to our Cash-in-the-wallet example from the previous article, the entire transaction chain from the Bank to Wallet has many locations where money can be held in smaller quantities. The ATM has some part of the money. At the Bank Branch, the teller’s drawer has some cash stored while the bigger pile of cash is probably in the back of the bank inside a large vault. There might be an even bigger stash of cash at the bank HQ. Armored vehicles keep moving cash between locations.

This is very similar to the situation with Data. Cached data can be found across the application tiers. Some of them might be completely transparent to the Developer (SQL Cache, Browser Cached Pages etc.) while some Caching needs to be built grounds-up by the developers (App Tier caching, Page level caching using JavaScript and JSON etc.).

Cache

The above diagram depicts all the places Data can be cached. The red arrows indicate expensive network trips to fetch data that adds latency and reduce performance. The diagram is agnostic of Cloud or On-Prem solutions.

So, the question now is, what to cache and where? The key concept to note here is that SQL Server Database is your single source of Truth. Which means that while all updates to data must be written to the Database, every piece of data need not be fetched from the database.

Create a data heat map

The most important characteristics for Caching is the frequency of updates to data values. Some data like Countries, Cities, Zip Codes, Names of people, Date of birth etc. won’t change. Then there is some data can change but not too often. Customer address, Customer Preferences, Software Customization, Customized Screen Layouts are examples where there may be change, but not that frequently. And then there is real-time transactional data like Bank Balances, Instrument Values in Hospitals etc., that needs real-time read-writes to permanent storage and differences between permanent storage and cache can create big business issues.

What you need to do is to look at your entire Application data and split them into the following three categories:

  1. Data that never changes
  2. Data that could change every few months
  3. Data that changes Daily
  4. Real-time data, changes every second

Once you have these broad categories, you can then decide where to cache the Data. The first two categories, depending on the volume of data, can be cached in the web page as JSON objects managed by JavaScript or HTML5 Session Store. The third Category can stay closer to the Database in a Clustered and Load Balanced Cache system. The last one needs to be fetched from Database directly (But that trip can also be avoided by using Write-Through Cache mechanisms).

In the next (and the concluding part) we will discuss Cache usage patterns and Architecting the Cache sub-system for scale.

Caching in on Scale and Performance – Part I

he Technologies covered: Azure SQL Database , Azure Redis Cache


Say it’s lunch time and you are standing in line for a sandwich. What’s the first thing you do? Check your wallet for cash? Right? If you don’t have cash, you go to an ATM to get it. But you most definitely don’t go to the bank branch and withdraw cash for this. And when you go to the ATM, do you withdraw just enough cash for the Sandwich. No. You take out an amount that you think you will need for a few days.

Going to your SQL Server Database (or any RDBMS) to fetch data every time your client application needs is like going to the Bank to withdraw cash every time you need cash. And while going to the bank to withdraw money might have been the modus operandi a few decades ago, it most definitely is not the right way to operate today. The same holds for Applications. A decade ago Client-Server systems operated this way. But with all the various scalable caching options available today, it makes no sense to keep going to the database for fetching every piece of data.

For over a decade, I have been helping customers scale their Client-Server or Web applications. A decade ago no one thought about Caching. Which was obvious. There weren’t too many Caching/No-SQL technologies and RDBMS was the only Database choice. But, I observe that even today, with all the No-SQL and Caching technologies available, many customers have still not given it a thought.

So, I decided to demystify the importance of caching in a 3-part blog series. This one covers the “Why Cache?”. The next will cover “What can you cache?”. And in the third one I will cover Caching Design and Architecture with Redis cache as an example.

Disclaimer

With a host of distributed No-SQL Database technologies available, there might be other Database options that can get you similar benefits. So, this blog series will focus on Traditional Client-Server systems built on SQL Server like RDBMS technologies either On-Premise or in the Cloud.

The Build-up to a clog

This is a typical development pattern that I have observed around how companies land up in situation where their systems start stalling and are unable to scale.

Stage 1:¬†All new application development starts from a developer writing code against a test database. Every time he hits the F5, the application zips to the database and shows the data on his application at lightning speed. And why wouldn’t it? He and a few others are the only ones connected to the database.

Stage 2: This application is then deployed to production and starts getting used by a few hundred people. Performance drop is not particularly evident.

Stage 3: As the popularity of the application grows and more and more people start using it, there is a noticeable drop in performance and that is when the Development team goes back to the whiteboard and starts to find solutions. But usually at this point, re-architecture is a huge cost.

Hence, building Caching into your architecture and design right from the beginning!

A caching example

Lets us walk through the process of building a simple web page that uses Redis Cache as opposed to a Database. I have built two web-based application applications on Azure. One connected to a Azure SQL Database and one connected to Azure Redis Cache. This article assumes that you have a basic understanding of Azure Services and .Net coding.

The database used is the AdventureWorks sample available on Azure when you set up a Azure SQL Instance.

Azure DB Create 2

Then I setup a Redis Cache service on Azure and called it AWCache

Azure Redis Cache Create 3

Once the database and cache were setup, I built a Console Application that builds the Cache from the Database.

Here are a the basic steps of constructing the Cache:

Prerequisites: NewtonSoft JSon, StackExchange Redis Library . Use Nuget to import the latest packages into Visual Studio.

Build a Join Query that gets ProductCategory X Product on Category ID:

select SalesLT.ProductCategory.ProductCategoryID,
SalesLT.ProductCategory.Name, 
SalesLT.Product.ProductID,SalesLT.Product.Name 
from SalesLT.Product, SalesLT.ProductCategory 
where SalesLT.ProductCategory.ProductCategoryID = SalesLT.Product.ProductCategoryID 
order by SalesLT.ProductCategory.Name

Build a Cache Structure to load the Join query into.

//Structure to store Category X Products
 public class CategoryXProduct
 {
    public string CategoryID;
    public string CategoryName;
    public List<Product> lstProducts;
    public CategoryXProduct()
    {
       lstProducts = new List<Product>();
    }
 }

 //Structure to store Products
 public class Product
 {
    public string ProductID;
    public string ProductName;

    public Product(string ID, string Name)
    {
        ProductID = ID;
        ProductName = Name;
    }
  }

Connect to the Cache and the SQL Database

//Connect to Cache Database with a ConnectionMultiplexer Helper Class
 IDatabase cache = Helper.Connection.GetDatabase();
 var server = Helper.Connection.GetServer("AWCache.redis.cache.windows.net:6380");
 
 //Clear the Cache
 server.FlushDatabase();

//Connect to AdventureWorks Data on Azure SQL
 dbConn = new SqlConnection("Server=tcp:mtcdemo.database.windows.net,1433;Initial Catalog=AdventureWorksDB;MultipleActiveResultSets=True; Persist Security Info=False;User ID=anideswandikar;Password=@ni123@ni123;MultipleActiveResultSets=False;Encrypt=True;TrustServerCertificate=False;Connection Timeout=30;");
 dbConn.Open();

//Run a loop on the DataReader and load the results of the query to the Cache.
reader.Read();
 cxp = new CategoryXProduct();

 string categoryID = reader.GetValue(0).ToString();
 string categoryName = reader.GetValue(1).ToString();
 string productID = reader.GetValue(2).ToString();
 string productName = reader.GetValue(3).ToString();

 cxp.CategoryID = categoryID;
 cxp.CategoryName = categoryName;
 cxp.lstProducts.Add(new Product(productID, productName));
 
 //Remember the current CategoryID to detect a change
 prevCategoryID = categoryID;

 int i = 1;

 //Loop and watch for CategoryID Change
 
 while (reader.Read())
 {
   categoryID = reader.GetValue(0).ToString();
   categoryName = reader.GetValue(1).ToString();

   //If Category is same, add product
   if (categoryID == prevCategoryID)
   {
      productID = reader.GetValue(2).ToString();
      productName = reader.GetValue(3).ToString();
      cxp.lstProducts.Add(new Product(productID, productName));
   }
   //Add the Object to Cache and start a new Object
   else
   {
     cache.StringSet(prevCategoryID, JsonConvert.SerializeObject(cxp));
     i++;
     cxp = new CategoryXProduct();
 
     //Get te new Category name and first product

     productID = reader.GetValue(2).ToString();
     productName = reader.GetValue(3).ToString();
     cxp.CategoryID = categoryID;
     cxp.CategoryName = categoryName;
     //Add the CategoryXProduct item to Cache
     cxp.lstProducts.Add(new Product(productID, productName));
     //Remember the new CategoryID
     prevCategoryID = categoryID;
    }
 
   }

 }

Once the cache is built, I built two very similar looking Web Applications. Once connects to the Database and one to the Cache. The Cache code looks something like this:

 string id = DLCategory.SelectedValue;
 CategoryXProduct item = 
    JsonConvert.DeserializeObject<CategoryXProduct>(cache.StringGet(id));
 foreach (Product p in item.lstProducts)
 {
 LBProducts.Items.Add(p.ProductName);
 }

BothApps

And voila! We just built a page in an Application that never has to go to the database for getting Data. But, what if a new Product is added or discontinued? Or a a new Product Category gets added. That is what we will cover in the next topic. What to cache? And the various caching methodologies.