Elastic memory caching tiers


Performance Analysis of Computer Systems Lab

Computer Science Department, Stony Brook University

Location: Room 336, New CS Building. Lab PI: Anshul Gandhi (anshul (at) cs.stonybrook.edu).

Elastic memory caching tiers
(January 2016 - present)

Large-scale Web services often employ distributed memory caching solutions to reduce client response latencies by lowering the load on the critical backend database tier. Such in-memory caching solutions are also offered as a service by several cloud service providers, including Amazon Web Services and Google Cloud Platform. A popular example of such a caching solution is Memcached, which is currently employed by many online service providers, including Facebook, Reddit, Twitter, Wikipedia, YouTube, and Zynga. The memory caching tier sits between the client and the backend database or storage tier, and aggregates the available memory of all nodes in the caching tier to cache data. Requests from clients are first looked up at the faster (memory access) caching tier. If the lookup fails in the caching tier, the request is then sent to the slower (disk access), persistent database tier.
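The lookup path described above can be illustrated with a minimal Python sketch. It is not Memcached's implementation or API: the class name `CachingTier`, the `backend_get` callback, and the hash-modulo placement of keys on nodes are all assumptions made for illustration, and each cache node is modeled as a plain in-process dictionary.

```python
import zlib
from typing import Callable, Dict, List, Optional


class CachingTier:
    """Toy caching tier: several in-memory nodes in front of a slower backend.

    Hypothetical illustration only; names and placement scheme are assumptions.
    """

    def __init__(self, num_nodes: int,
                 backend_get: Callable[[str], Optional[str]]) -> None:
        # Each node is modeled as a dict; together they form the tier's memory.
        self.nodes: List[Dict[str, str]] = [{} for _ in range(num_nodes)]
        self.backend_get = backend_get  # stands in for the database tier
        self.hits = 0
        self.misses = 0

    def _node_for(self, key: str) -> Dict[str, str]:
        # Assumed placement: deterministic hash of the key, modulo node count.
        return self.nodes[zlib.crc32(key.encode()) % len(self.nodes)]

    def get(self, key: str) -> Optional[str]:
        node = self._node_for(key)
        if key in node:
            # Fast path: the request is served from memory.
            self.hits += 1
            return node[key]
        # Slow path: the cache lookup failed, so ask the backend.
        self.misses += 1
        value = self.backend_get(key)
        if value is not None:
            node[key] = value  # populate the cache for later requests
        return value
```

For example, with `database = {"user:1": "alice"}` and `tier = CachingTier(3, database.get)`, the first `tier.get("user:1")` is a miss that falls through to the backend, and a second call for the same key is served from the caching tier.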

However, distributed caching systems, such as Memcached, are not elastic due to their stateful nature. There are many challenges in dynamically scaling the caching tier. The primary challenge is that a scaling action results in an immediate, albeit transient, performance degradation. Adding a new cache node results in a cold cache, whereas removing an existing cache node results in the loss of hot data. In both cases, performance suffers by as much as a factor of 10 due to cache misses until the cache is warm again, which can take many minutes and can cost businesses significant revenue. To avoid such performance issues, system administrators typically over-provision caching systems, leading to significant cost and energy waste given the large amounts of expensive DRAM deployed.
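The transient degradation after a scaling action can be reproduced in a small simulation. The sketch below is a toy model rather than a measurement of any real system: it assumes keys are placed on nodes by a hash-modulo scheme (the function names `node_index`, `hit_rate`, and `simulate_scale_out` are hypothetical), warms a four-node tier, adds a fifth node, and compares the hit rate before and after. Other placement schemes, such as the consistent hashing offered by many Memcached client libraries, would remap fewer keys, but the new node would still start cold.

```python
import zlib
from typing import Dict, List, Tuple


def node_index(key: str, num_nodes: int) -> int:
    # Assumed placement: deterministic hash of the key, modulo node count.
    return zlib.crc32(key.encode()) % num_nodes


def hit_rate(nodes: List[Dict[str, str]], keys: List[str]) -> float:
    """Request every key once; on a miss, fill the cache as a real tier would."""
    hits = 0
    for key in keys:
        node = nodes[node_index(key, len(nodes))]
        if key in node:
            hits += 1
        else:
            node[key] = "value-from-database"  # placeholder for a backend fetch
    return hits / len(keys)


def simulate_scale_out(num_keys: int = 1000,
                       initial_nodes: int = 4) -> Tuple[float, float]:
    keys = [f"key:{i}" for i in range(num_keys)]
    nodes: List[Dict[str, str]] = [{} for _ in range(initial_nodes)]

    hit_rate(nodes, keys)                 # first pass warms the cache
    warm = hit_rate(nodes, keys)          # steady state: every request hits

    nodes.append({})                      # scale out: the new node is empty
    after_scaling = hit_rate(nodes, keys) # many keys now map to a cold location
    return warm, after_scaling


if __name__ == "__main__":
    warm, after_scaling = simulate_scale_out()
    print(f"hit rate when warm:         {warm:.2f}")
    print(f"hit rate right after scale: {after_scaling:.2f}")
```

In this model, every request that misses after the scaling action would have to be served by the slower database tier until the cache warms up again, which is the degradation the paragraph above describes.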

This project aims to investigate novel architectures for memory caching systems that will enable dynamic scaling without any performance degradation. Our research will enable significant cost and energy savings, and will also scale to Internet-scale systems such as those operated by Facebook and Amazon.




Copyright 2014-2016 PACE Lab, Stony Brook University