The difference between a cache and a buffer


Is it correct to say that a cache is a special kind of buffer? They perform similar functions, but is there some underlying difference that I am missing?

Best Answer

From Wikipedia's article on data buffers:

a buffer is a region of a physical memory storage used to temporarily hold data while it is being moved from one place to another

A buffer ends up cycling through and holding every piece of data that passes from one location to another (as with a circular buffer in audio processing). It provides exactly what the name suggests: a "buffer" of data before and after your current position in the data stream.
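To make the "every piece of data passes through" point concrete, here is a minimal circular (ring) buffer sketch. The `RingBuffer` class is hypothetical, written just for this illustration; note that all seven samples travel through the buffer, and only the most recent window remains at the end.

```python
from collections import deque

class RingBuffer:
    """A minimal circular buffer: every sample passes through it in order."""
    def __init__(self, capacity):
        # deque with maxlen drops the oldest item automatically when full
        self._buf = deque(maxlen=capacity)

    def push(self, item):
        self._buf.append(item)

    def contents(self):
        return list(self._buf)

rb = RingBuffer(4)
for sample in range(7):      # stream 7 samples through a 4-slot buffer
    rb.push(sample)
print(rb.contents())         # only the most recent 4 samples remain: [3, 4, 5, 6]
```

Every sample was held by the buffer at some point; the buffer never decides *which* data to hold, only *how much* of the stream's recent history to keep.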

A buffer and a cache do share some common aspects. However, a cache in the conventional sense (e.g. a CPU cache) usually does not hold all of the data being moved from place to place.

The purpose of a cache is to store data transparently, keeping just enough of it that the rest can be fetched without a noticeable performance penalty. In this context, the cache "pre-fetches" only a small amount of data (how much depends on transfer rates, cache size, and so on).

The main difference is that a buffer will eventually have held all of the data. A cache, conversely, may have held all, some, or none of the data (depending on the design). In either case, a cache is accessed as if you were accessing the data directly in the first place: what exactly gets cached is transparent to the "user" of the cache.

The difference is in the interface. When you use a cache to access a data source, you use it as if the cache were the data source: you can reach every part of the source through the cache, and the cache decides where each piece of data comes from (the cache itself, or the source). The cache determines which parts of the data to preload (usually just the beginning, but sometimes all of it), while the cache replacement algorithm in use determines what gets evicted, and when. The best example of this, aside from the CPU cache itself, is a prefetcher/readahead system. Both load the parts of the data they think you will use most into memory, and fall back to the hard drive when something isn't cached.
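The "cache as interface" idea can be sketched as a read-through cache with LRU replacement. Everything here is illustrative: `ReadThroughCache` and `read_from_source` are made-up names, and `read_from_source` stands in for whatever slow backing store the cache sits in front of. The caller only ever talks to the cache; hits and misses are invisible to it.

```python
from collections import OrderedDict

def read_from_source(key):
    # stand-in for the slow backing store (disk, network, ...)
    return key * 2

class ReadThroughCache:
    """Transparent cache: callers ask the cache, never the source directly."""
    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()   # insertion order doubles as recency order

    def get(self, key):
        if key in self._data:                 # cache hit
            self._data.move_to_end(key)       # mark as most recently used
            return self._data[key]
        value = read_from_source(key)         # cache miss: fall back to source
        self._data[key] = value
        if len(self._data) > self.capacity:   # LRU replacement policy
            self._data.popitem(last=False)    # evict least recently used
        return value

cache = ReadThroughCache(capacity=2)
cache.get(1); cache.get(2); cache.get(1)
cache.get(3)                 # evicts key 2, the least recently used entry
print(list(cache._data))     # [1, 3]
```

Whether `get(1)` was served from memory or from the source, the caller sees the same value; that transparency is what makes it a cache rather than a buffer.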

Conversely, a buffer cannot instantly move your position in the data stream unless the new position has already been moved into the buffer. Jumping elsewhere forces the buffer to relocate (when the new position falls outside the buffered window), effectively "restarting" the buffer from the new location. The best example of this is dragging the slider in a YouTube video.
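The slider behavior can be sketched as follows, with the hypothetical `StreamBuffer` modeling a buffered view of a stream: reads inside the buffered window are free, while a seek outside it forces a refill from the new position.

```python
class StreamBuffer:
    """Buffered window over a stream; seeking outside the window
    forces a refill ("restart") from the new position."""
    def __init__(self, source, window):
        self.source = source          # stand-in for the remote data stream
        self.window = window
        self.start = 0
        self.buf = source[:window]    # initially buffered from position 0
        self.refills = 0

    def read_at(self, pos):
        if not (self.start <= pos < self.start + len(self.buf)):
            # like dragging the slider past what's buffered:
            self.start = pos
            self.buf = self.source[pos:pos + self.window]
            self.refills += 1
        return self.buf[pos - self.start]

stream = list(range(100))
sb = StreamBuffer(stream, window=10)
sb.read_at(3)        # inside the buffered window: no refill
sb.read_at(42)       # outside: buffer restarts at position 42
print(sb.refills)    # 1
```

A cache in front of the same stream could serve position 42 from whatever it had retained; the buffer has no choice but to start over.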

Another good example of a buffer is audio playback in Winamp. Because audio files must be decoded by the CPU, there is a delay between reading the song in, processing the audio, and sending it to your sound card. Winamp buffers some of the audio data so that enough of it is already processed to avoid any "lock-ups" (i.e. the CPU is always preparing the audio you'll hear a few hundred milliseconds from now; it's never real-time. What you hear comes from the buffer, which is what the CPU prepared in the past).
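That decode-ahead pattern can be sketched as follows. This is a simplified single-threaded model, not how Winamp actually works: `decode_chunk` stands in for CPU-side decoding, and the playback loop always reads from the buffer, which the decoder keeps a few chunks ahead.

```python
from collections import deque

def decode_chunk(n):
    # stand-in for CPU-side audio decoding of chunk n
    return f"chunk-{n}"

AHEAD = 3            # keep ~3 chunks decoded ahead of playback
buffer = deque()
next_chunk = 0

def play(total_chunks):
    """Playback loop: the 'sound card' reads only from the buffer,
    which holds audio the decoder prepared in the past."""
    global next_chunk
    played = []
    for _ in range(total_chunks):
        # decoder works ahead, topping the buffer up to AHEAD chunks
        while len(buffer) < AHEAD and next_chunk < total_chunks:
            buffer.append(decode_chunk(next_chunk))
            next_chunk += 1
        played.append(buffer.popleft())   # playback consumes the oldest chunk
    return played

played = play(5)
print(played)    # ['chunk-0', 'chunk-1', 'chunk-2', 'chunk-3', 'chunk-4']
```

As in the Winamp description, every chunk passes through the buffer in order; the buffer only smooths out the timing between decoding and playback.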