A caching system supporting … - Refurbed Engineering

refurbed strives to take decisions based on data. One way to validate assumptions is to perform A/B testing.

In our architecture, NGINX sits in front of the web application. In the previous A/B system, NGINX hashed the remote address to split users 50%/50% and set a User-Segment header to “A” or “B”, which the web application then used to present different versions of the page.

This can be done in NGINX like this:

split_clients "${remote_addr}" $user_segment {
  50% "A";
  50% "B";
}

add_header       User-Segment  $user_segment;
proxy_set_header User-Segment  $user_segment;

We write web handlers in Go on top of the gin framework. Our HTML templates are written with quicktemplate and compiled to Go.

Pages are the result of expensive API requests to the backend and, indirectly, the database. For this reason we cache pages aggressively. This means the NGINX cache key has to include the segment, to avoid serving “B” pages to “A” users and vice versa.

proxy_cache_key   $proxy_host$host$request_uri$user_segment;

[Figure: cache before]

We needed to run multiple experiments simultaneously, so we enhanced the A/B segment API. The new API generates 30 random segments (each A or B), distributed 50%/50%, 80%/20% or 90%/10% depending on how we configured the slot. We assign each feature under test to a specific slot, and the web application has an API to check the current segment for a given slot.

split_clients "${remote_addr}001" $us001 {
  50% "A";
  50% "B";
}
...
split_clients "${remote_addr}029" $us029 {
  80% "A";
  20% "B";
}
                                   
set $user_segments_value "$us001,$us002,$us003,$us004,$us005,$us006,$us007,$us008,$us009,$us010";
set $user_segments_value "$user_segments_value,$us011,$us012,$us013,$us014,$us015,$us016,$us017,$us018,$us019,$us020";
set $user_segments_value "$user_segments_value,$us021,$us022,$us023,$us024,$us025,$us026,$us027,$us028,$us029,$us030";
                                  
add_header       User-Segments $user_segments_value;
proxy_set_header User-Segments $user_segments_value;

This results in a header like User-Segments: A,B,A,A,B,B,B,A,A,B,A,A,A,B,A,B,A,B,A,B,A,B,A,A,A,B,B,A,B,B
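On the application side, checking a slot boils down to parsing that header. The following is a minimal sketch, not refbwebd's actual API: the `Segments` type, `ParseSegments` and `IsB` are hypothetical names, and treating a missing slot as "A" is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// Segments holds the A/B value of each slot, in header order.
// The type and its methods are hypothetical names for illustration.
type Segments []string

// ParseSegments splits a User-Segments header value such as "A,B,A" into slots.
func ParseSegments(header string) Segments {
	if header == "" {
		return nil
	}
	parts := strings.Split(header, ",")
	for i, p := range parts {
		parts[i] = strings.TrimSpace(p)
	}
	return parts
}

// IsB reports whether the 1-based slot is in segment "B".
// Assumption: unknown or missing slots fall back to "A".
func (s Segments) IsB(slot int) bool {
	if slot < 1 || slot > len(s) {
		return false
	}
	return s[slot-1] == "B"
}

func main() {
	segs := ParseSegments("A,B,A,A,B")
	fmt.Println(segs.IsB(2), segs.IsB(3), segs.IsB(40)) // true false false
}
```

In a gin handler the header value could be read with `c.GetHeader("User-Segments")` before being passed to a parser like this one.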

With 30 slots there are 2^30 (over a billion) possible combinations for a given page. Caching rendered pages is no longer workable, because almost every user gets a different combination.

The goal

To use the 30 A/B slots in our pages and perform multiple experiments, while still being able to serve pages fast and without overloading our APIs and database.

Possible solutions

Caching API responses with NGINX in front of the API server
Caching fragments of the page
A custom cache in our web application caching API responses

Requirements

Consistency. Final pages are a consistent result of a set of API calls. If we move away from caching rendered pages, we need to make sure the cached API responses are consistent with each other.
Acceptable response times and use of resources (memory, CPU).
As we deploy our stack to Kubernetes, being able to share the cache across replicas would be a welcome improvement. In the original setup, the NGINX cache is local to each pod.

The new cache layer in refbwebd

We decided to implement caching inside refbwebd, our front web application.

Web handlers inside the application already perform an atomic set of consistent API calls. They also know their inputs, so we can generate a proper cache key. For example, we can include relevant URL parameters in the key and discard tracking ones. For this, we ported handlers to declare their parameters using BindQuery. Any parameter that is not explicitly declared or added to an allow list is not used as part of the key.

We settled on Redis as the default cache store. Redis is available as a managed service in GCP (Memorystore) and we can set it up locally for development as well. We had already planned to introduce Redis for a different project. Redis also improves on our previous caching by sharing the cache between refbwebd pods, which was not the case with the NGINX caches.

[Figure: cache after]

The caching system acts as a facade over the methods that pages use to make API calls. When the first call happens, the cache attempts to retrieve the full set of API responses for the handler. If there is a valid entry, subsequent API calls are answered from that set. Because the facade is transparent, the handler does not know that the responses come from the cache.

[Figure: cache hit]

When there is a cache miss, the cache layer records all API calls. When the handler completes, it writes the entry to the cache store, again with the full set of API responses.

[Figure: cache miss]
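The hit and miss paths can be sketched as a small record-and-replay facade. This is an illustration under assumptions: `Session`, `Fetcher` and the map-based entry are hypothetical, and the real layer stores entries in Redis rather than passing maps around.

```go
package main

import "fmt"

// Fetcher performs a real API call. All names in this sketch are hypothetical.
type Fetcher func(url string) (string, error)

// Session is the facade a handler uses for its API calls during one request.
type Session struct {
	fetch    Fetcher
	loaded   map[string]string // full set loaded from the cache, nil on a miss
	recorded map[string]string // responses seen during this request
}

func NewSession(fetch Fetcher, loaded map[string]string) *Session {
	return &Session{fetch: fetch, loaded: loaded, recorded: map[string]string{}}
}

// Get answers from the loaded set when possible and otherwise performs
// the real call. Either way the response is recorded.
func (s *Session) Get(url string) (string, error) {
	if body, ok := s.loaded[url]; ok {
		s.recorded[url] = body
		return body, nil
	}
	body, err := s.fetch(url)
	if err != nil {
		return "", err
	}
	s.recorded[url] = body
	return body, nil
}

// Entry returns the full set of responses to store once the handler completes.
func (s *Session) Entry() map[string]string { return s.recorded }

func main() {
	calls := 0
	fetch := func(url string) (string, error) {
		calls++
		return "response for " + url, nil
	}

	miss := NewSession(fetch, nil) // cache miss: every call is real
	_, _ = miss.Get("/api/products")
	_, _ = miss.Get("/api/prices")

	hit := NewSession(fetch, miss.Entry()) // cache hit: replay the stored set
	body, _ := hit.Get("/api/products")
	fmt.Println(calls, body) // 2 response for /api/products
}
```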

Throttling

An RWMutex per handler key (inspired by mutexkv) prevents simultaneous requests to the same handler that miss the cache from generating a snowballing sequence of API calls. If a handler is already generating API calls, another request to the same handler waits until it completes; by that time the entry is already in the cache.
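A minimal sketch of this throttling follows, with one RWMutex per key and a re-check after acquiring the write lock. `KeyedMutex`, `Store` and `Load` are hypothetical names, and the in-process `sync.Map` stands in for the real cache store.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// KeyedMutex hands out one RWMutex per key, in the spirit of mutexkv.
type KeyedMutex struct {
	mu    sync.Mutex
	locks map[string]*sync.RWMutex
}

func NewKeyedMutex() *KeyedMutex {
	return &KeyedMutex{locks: map[string]*sync.RWMutex{}}
}

func (k *KeyedMutex) get(key string) *sync.RWMutex {
	k.mu.Lock()
	defer k.mu.Unlock()
	l, ok := k.locks[key]
	if !ok {
		l = &sync.RWMutex{}
		k.locks[key] = l
	}
	return l
}

// Store is an in-memory stand-in for the cache store.
type Store struct {
	locks *KeyedMutex
	data  sync.Map // handler key -> cached value
}

// Load returns the cached value for key. On a miss, only one caller per
// key runs generate; the others wait and then read the stored value.
func (s *Store) Load(key string, generate func() string) string {
	l := s.locks.get(key)

	l.RLock()
	v, ok := s.data.Load(key)
	l.RUnlock()
	if ok {
		return v.(string)
	}

	l.Lock()
	defer l.Unlock()
	if v, ok := s.data.Load(key); ok { // filled while we waited
		return v.(string)
	}
	out := generate()
	s.data.Store(key, out)
	return out
}

func main() {
	s := &Store{locks: NewKeyedMutex()}
	var generated int32
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.Load("category", func() string {
				atomic.AddInt32(&generated, 1)
				time.Sleep(10 * time.Millisecond) // simulate slow API calls
				return "page data"
			})
		}()
	}
	wg.Wait()
	fmt.Println(atomic.LoadInt32(&generated)) // 1
}
```

Requests for different handler keys take different locks, so a slow handler does not block unrelated pages.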

When a handler retrieves an entry from the cache and then finds an API call response missing from it, the cache enters an inconsistent mode: it falls back to real requests and triggers a background refresh. This can happen when newly deployed code changes the set of API calls while an older entry is still in the cache.

Marshaling and compression

We first marshal the API responses to msgpack, which is a fast and popular serialization format.

For compression we settled on snappy, which offers a good balance of speed and compression ratio and is therefore a common choice for this use case.

With snappy, Redis memory usage stays stable at around 350 MB, whereas without compression we filled the available 1 GB of memory in a few minutes.
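The marshal-then-compress pipeline has the following shape. To keep the sketch runnable with the standard library only, it swaps in encoding/json for msgpack and compress/gzip for snappy; these are stand-ins, not what refbwebd uses, and `Encode` and `Decode` are hypothetical names.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/json"
	"fmt"
	"io"
)

// Encode marshals the recorded API responses and compresses the result.
// Stand-ins: JSON instead of msgpack, gzip instead of snappy.
func Encode(entry map[string]string) ([]byte, error) {
	raw, err := json.Marshal(entry)
	if err != nil {
		return nil, err
	}
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(raw); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil { // flushes the remaining compressed data
		return nil, err
	}
	return buf.Bytes(), nil
}

// Decode reverses Encode: decompress first, then unmarshal.
func Decode(data []byte) (map[string]string, error) {
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	raw, err := io.ReadAll(zr)
	if err != nil {
		return nil, err
	}
	var entry map[string]string
	if err := json.Unmarshal(raw, &entry); err != nil {
		return nil, err
	}
	return entry, nil
}

func main() {
	entry := map[string]string{"/api/products": `{"items":[1,2,3]}`}
	blob, err := Encode(entry)
	if err != nil {
		panic(err)
	}
	back, err := Decode(blob)
	if err != nil {
		panic(err)
	}
	fmt.Println(back["/api/products"] == entry["/api/products"]) // true
}
```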

Grace periods

In our NGINX setup, we served stale content, i.e. expired cache entries, for a grace period. This spares the client that finds an expired entry from experiencing high latency. We then fixed the situation for the next user by triggering a refresh in the background.

We emulate the same behavior in the new subsystem. Entries are stored with RedisTTL = TTL + GracePeriod, so they remain available for a while after they expire. If a hit happens after the TTL has expired, we serve the request from the stale entry and schedule a background refresh job once the handler has completed. The effect is that nearly every request hits the cache and latencies stay low. The refresh job includes all the API URLs recorded during the request that triggered it.

[Figure: grace hit]

We process background refreshes using a typical worker pool. We use a Redis lock to single-flight background refreshes in case a handler already has a refresh in progress. Single-flight with Redis is implemented by atomically writing a key only if it does not exist (SETNX). The key is deleted when the refresh job finishes, or it expires with a default TTL.

Future possibilities

The API we expose to the pages and Gin can be improved so that the presence of the cache is totally transparent. Porting a page today is changing a few lines, but it could be none.
Enhance cache keys to include a deployment-unique marker. This would prevent cache inconsistencies across deployments.
We currently disable the cache for authenticated pages, even when we know those pages are public. With some work, the cache could identify public pages and stay enabled for them.

A caching system supporting A/B testing