Implementing a shared cache: Part 1

So a while ago I mentioned that I was trying to add a new shared-memory cache for the next version of the KDE platform. It’s almost done now, and has been submitted for review (both old-skool on kde-core-devel, and all Web 2.0-style on our review board).

Given the number of things I had to think about while implementing it (and I promise you that even now it’s not as fully thought out as it could be), I decided that I could probably make a half-decent, if very technical, series of posts about the implementation process.

I’ve got a basic outline set out, but without further ado, I’ll go over in this post what exactly a shared-memory cache is, why KDE has one now, and why I’m trying to make a different one.

Why a shared-memory cache?

Most programmer types are already familiar with the idea of a cache: you have some input that isn’t particularly helpful right now, you have a function to turn that non-helpful input into something you can use, but that function takes a while to run. So, you save the output of that function (the value) once, somewhere you can refer to it later (by a key) if you need it. The idea is to trade some extra memory usage for reduced time requirements. Caching is used everywhere: in your CPU, in your web browser, in the operating system, and more.
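To make the idea concrete, here is a minimal sketch (my own illustration, not KDE code) of a key/value cache wrapped around a slow function. QHash stands in for whatever lookup structure a real cache would use, and expensiveRender() is a hypothetical slow operation:

    #include <QByteArray>
    #include <QHash>
    #include <QString>

    QByteArray expensiveRender(const QString &svgPath); // hypothetical slow function

    QByteArray cachedRender(const QString &svgPath)
    {
        static QHash<QString, QByteArray> cache;  // key -> previously computed value

        const QHash<QString, QByteArray>::const_iterator it = cache.constFind(svgPath);
        if (it != cache.constEnd()) {
            return it.value();                    // cache hit: no expensive work needed
        }

        const QByteArray result = expensiveRender(svgPath); // cache miss: pay the cost once
        cache.insert(svgPath, result);
        return result;
    }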

Now, the shared-memory part allows this cache to be shared between running processes (if you don’t know what a process is, just think of it as an instance of a running program; each different instance, even of the same program, is a different process). The operating system normally does a very good job of making sure that different processes can’t interfere with each other, but there are times when it makes sense to open a couple of gateways between them to let them share the burden. In KDE’s case, many of the icons used by a standard KDE application will be used unchanged by other KDE programs, so it makes sense for us to cache generated icons for use by other KDE programs. We could simply write the icons out to disk where other programs could read them, but putting them into shared memory allows for the near-immediate transfer of that data without any disk I/O.
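For readers who haven’t used shared memory before, here is a rough sketch of the general mechanism using Qt’s QSharedMemory (this is not how KPixmapCache is implemented; the segment name and helper function are made up for illustration):

    #include <QByteArray>
    #include <QSharedMemory>
    #include <cstring>

    bool publishToSharedMemory(const QByteArray &pixels)
    {
        QSharedMemory shm(QLatin1String("example-icon-cache")); // hypothetical segment name

        // Create the segment, or attach to it if another process already created it.
        if (!shm.create(pixels.size()) && !shm.attach()) {
            return false;
        }

        shm.lock();                                    // guard against concurrent writers
        std::memcpy(shm.data(), pixels.constData(),
                    qMin(pixels.size(), shm.size()));  // other attached processes see this data
        shm.unlock();

        // Note: in real code the segment must stay attached for as long as the
        // data should remain available; letting shm go out of scope detaches it.
        return true;
    }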

I’d like to find examples of current shared-memory caches (besides our current KPixmapCache), but the only ones I can find are the fully distributed type like memcached. Cairo has a cache for glyphs, but that seems to be done per-process. GTK+ has a cache which is designed to be read in directly using mmap(2), but not necessarily to be accessed via shared memory. Let me know if you find any though!

So again, in our case we use a shared-memory cache in large part to handle icon loading and Plasma themes (both potentially heavy users of SVG icons). This gives us two speedups: 1) We don’t always have to re-generate a pixmap from the source data (a very big speedup when the source is an SVG), and 2) If one process generates a pixmap, every other KDE process can get access to the results nearly instantly.

What KDE currently does

My interest in shared-memory caching came about from looking into some bugs reported against our current cache, KPixmapCache, which was developed in the lead-up to KDE 4.0 to allow the new Plasma subsystem and the existing icon subsystem to use the fancy SVG graphics without taking forever.

In addition, KPixmapCache had a feature where it could cache not only the image data (the 0’s and 1’s that make up the image itself), but also the resulting QPixmap handle to the image as it exists in “the graphics memory” (I’ll gloss over the distinction for now; it will be important in a future part).
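From a caller’s point of view the existing cache is used roughly like this (a sketch from memory, so the signatures may not match kpixmapcache.h exactly; renderIconFromSvg() is a hypothetical expensive step):

    #include <kpixmapcache.h>
    #include <QPixmap>
    #include <QString>

    QPixmap renderIconFromSvg(const QString &key); // hypothetical expensive SVG rendering

    QPixmap themedIcon(KPixmapCache &cache, const QString &key)
    {
        QPixmap pix;
        if (cache.find(key, pix)) {   // hit: possibly served straight from shared memory
            return pix;
        }

        pix = renderIconFromSvg(key); // miss: pay the rendering cost once...
        cache.insert(key, pix);       // ...then publish it for every other KDE process
        return pix;
    }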

KPixmapCache is implemented by keeping two different files in a known location, one ending in .index and the other ending in .data. These files hold, respectively, the index metadata needed for the cache to work and the actual image data.

Any time you talk about shared resources, you also need to think about how to protect access to those resources to keep everything consistent. KPixmapCache uses the trusty KLockFile to protect against concurrent access (this has the benefit of being safe if the partition is mounted on NFS, although I think the real reason is that it’s what already existed in kdelibs).
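From the caller’s side the locking looks roughly like this (a sketch from memory; check klockfile.h for the exact enums and signatures, and the lock file name is made up):

    #include <klockfile.h>
    #include <QString>

    void updateCacheFiles(const QString &cacheDir)
    {
        KLockFile lock(cacheDir + QLatin1String("/mycache.lock")); // hypothetical lock file

        // Try to take the lock file; the file-based scheme is also NFS-safe.
        if (lock.lock() != KLockFile::LockOK) {
            return; // could not acquire the lock, better to give up than corrupt the cache
        }

        // ... read or modify the .index and .data files here ...

        lock.unlock();
    }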

From there, KPixmapCache uses a sorted binary tree (sorted by the hash code) to manage the index, and treats the .data file as a simple block of memory. Every time a new item is entered into the cache, a new chunk of free space is allocated from the .data file, even if empty space already exists. Likewise, if a new index entry is required to insert an item, it is always added at the end of the index (with one exception: when overwriting an existing index entry due to a hash collision). I said the index was sorted earlier, which might seem incompatible with always adding new entries at the end. The implementer solved this by having each index item hold pointers to its actual parent and children, so that a virtual hierarchy can be arranged without entries ever moving. The binary tree is never rebalanced (as it would be in AVL trees or red-black trees), so the initial root node is always the root node. The overall structure looks like this figure:

Schematic of KPixmapCache layout
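To make that a bit more concrete, an index entry conceptually carries something like the following (my own illustration, not the actual struct used in the KPixmapCache sources):

    #include <QtGlobal>

    struct IndexEntry
    {
        quint32 keyHash;          // hash of the cache key; the tree is ordered by this
        quint32 dataOffset;       // where this item's data starts in the .data file
        quint32 dataSize;         // how many bytes of data it occupies

        qint32  parentOffset;     // offsets into the .index file (or -1) of the
        qint32  leftChildOffset;  // neighbouring tree nodes, forming the "virtual"
        qint32  rightChildOffset; // hierarchy without ever moving entries around
    };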

One disadvantage of this architecture is that it is difficult and inefficient to delete entries that should be expired from the cache. Properly removing an entry could require moving entries in the data file in order to minimize fragmentation. That alone wouldn’t be a big deal (I end up having to do the same thing in KSharedDataCache), but updating the .index file is even harder, since it requires updating information for both parent and children (although again, not impossible by any means). Avoiding fragmentation in the index would require either moving nodes around in the index file (possibly recursively) or scanning for the first free node when adding items. None of these are big problems, but they do make the implementation more annoying.

KPixmapCache worked around all of this by punting the problem: no entries ever got removed until the cache filled up. At that point the entries would be ranked according to whatever expiration policy was in effect (i.e. least recently used preferred, newest preferred, etc.), a new (smaller) cache would be constructed holding the most-desired entries, and the old cache would be deleted. Although infrequent, this could take a not-insignificant amount of time when it did happen.

So why a new implementation?

Probably the one thing that led to me starting from a different architecture, however, was the interface to KPixmapCache: it is designed to be subclassed, and to allow subclasses access to both the index and individual data items through a QDataStream (see KIconCache for an example usage). Doing this meant the internal code had to use a QIODevice to interface to the data and index, so even though KPixmapCache tries to place all of the data in shared memory, it always ends up accessing it as if it were a disk file anyway (albeit through a memory-to-QIODevice adapter).
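The shape of that subclassing interface is roughly as follows (a sketch with the virtual names recalled from memory; see kpixmapcache.h and KIconCache for the real thing, and the subclass and its metadata member are hypothetical):

    #include <kpixmapcache.h>
    #include <QDataStream>
    #include <QString>

    class ExampleIconCache : public KPixmapCache // hypothetical subclass
    {
    public:
        ExampleIconCache() : KPixmapCache(QLatin1String("example-icons")) {}

    protected:
        // Subclasses are handed QDataStreams into the cached data, which is why
        // the base class has to expose everything through QIODevice-style access.
        virtual bool loadCustomData(QDataStream &stream)
        {
            stream >> m_extraMetadata;
            return stream.status() == QDataStream::Ok;
        }

        virtual bool writeCustomData(QDataStream &stream)
        {
            stream << m_extraMetadata;
            return stream.status() == QDataStream::Ok;
        }

    private:
        QString m_extraMetadata; // whatever per-item extras the subclass wants to store
    };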

Having to support subclassing (in the public API, no less) makes changing many of the implementation details a journey fraught with hazard, and it’s bad enough that any little problem in KPixmapCache seemingly guarantees a crash. Since KPixmapCache is used in very core desktop platform code (Plasma and KIconLoader), I knew I wanted to go in a different direction.

So, I started work on a “KIconCache”. However, all the work I was doing was hardly specific to icons, and when I heard of a developer who was abusing KPixmapCache to hold non-image data somehow, I decided to make a generic shared-memory cache, KSharedDataCache. In the next post I’ll try to explain the direction I decided to take with KSharedDataCache.
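As a teaser, the generic interface ends up being used roughly like this (a sketch based on the code currently under review, so the details may still shift; the cache name is made up):

    #include <kshareddatacache.h>
    #include <QByteArray>
    #include <QString>

    void example()
    {
        // Cache name and a default size in bytes; every process that opens the
        // same name shares the same underlying cache.
        KSharedDataCache cache(QLatin1String("example-cache"), 5 * 1024 * 1024);

        cache.insert(QLatin1String("some-key"),
                     QByteArray("any serialized data at all"));

        QByteArray data;
        if (cache.find(QLatin1String("some-key"), &data)) {
            // use data; it may well have been inserted by a different process
        }
    }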

9 thoughts on “Implementing a shared cache: Part 1”

  1. Tim Henderson

    I am not a KDE developer; however, I have some experience with file structures and I have a few questions/suggestions on your index structure.

    If I read your post correctly, you want a disk-based “cache” to hold large binary blobs. Effectively you have a key-value store. When the store is full you copy the most accessed items to a new index, and delete the old one. I see two problems with this: the first is with the index structure itself, the second is with your page eviction technique (whose issues stem from the first problem).

    Currently you are using a fairly straightforward implementation of a disk-based binary search tree (BST) for your index. While the BST is a good structure in memory, it is not good on disk, because it is a “tall tree”, i.e. its height is log2(N) where N is the number of elements in the tree. The problem with a tall tree is that it takes many disk accesses to get to the leaf nodes, and what you would like is to minimize the disk accesses. The best tree structure which supports *both* dynamic inserts and dynamic deletions is a B+Tree. For more information on B+Trees specifically, see my blog post http://blog.hackthology.com/lessons-learned-while-implementing-a-btree.

    The second issue is the approach to page replacement when the buffer is full. As you say, you currently punt on the problem. If you use a well-known dynamic disk-based tree (like the B+Tree) you can implement proper page replacement on top of it. I would suggest implementing LRU page replacement. LRU stands for Least Recently Used, and the algorithm works like this: when your cache is full, evict the page that was used least recently.

    The simple way to implement this is to have a conceptual stack of pages: when a page is used it is placed on the top of the stack (if it was already in the stack it is of course removed from its old spot so it can go on top). When it comes time to evict a page, remove the page on the bottom of the stack.
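    For instance (an illustrative sketch only; a real implementation would add a hash map so the reordering is not a linear scan):

        #include <QList>
        #include <QString>

        class LruTracker
        {
        public:
            // Record that 'page' was just used: move it to the top of the stack.
            void touch(const QString &page)
            {
                m_order.removeOne(page); // drop the old position, if it was present
                m_order.prepend(page);   // most recently used lives at the front
            }

            // Evict the least recently used page (the bottom of the stack).
            QString evict()
            {
                return m_order.isEmpty() ? QString() : m_order.takeLast();
            }

        private:
            QList<QString> m_order;      // front = most recent, back = least recent
        };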

    This structure, and algorithm, can be implemented on top of the B+Tree in an efficient manner. Indeed, it can be quite straightforward to do so. If you would like to talk about this, please contact me.

  2. ruurd

    Some comments. First of all, what does not become clear in the article is why COTS components weren’t considered for performing this function; memcached immediately comes to mind. Yes, one of its downsides is that it needs to be installed. The second thing is that I see an eviction policy enumeration; wouldn’t this be a chance to use a pattern?

  3. gadnio

    Hello,
    Having dealt with these issues frequently, I would still recommend B+ trees. Even if your hash collisions aren’t that frequent, they are efficient enough, as the leaves are sorted by the key and you can traverse them at close to no cost (the only thing to keep in mind is the micromanagement of free lists in the leaves when doing a full scan).

    However, implementing B+ trees seems quite a challenge, already solved by others: picking the leaf size, implementing the inserts/updates/deletes… Since MySQL is already used (or Virtuoso, in the Nepomuk case), you could consider reusing their implementation for free, as they have already solved that problem for you…

  4. TheBlackCat

    What about caching things like the “OK”, “Cancel”, and “Apply” buttons that are used in pretty much every dialog in KDE? Is it even meaningful to cache buttons? I don’t know, it is just something I know is used a lot.

  5. Pellaeon

    Some of the commenters don’t seem to have picked up on the fact that the post mainly describes an existing implementation Michael is about to replace with something better, which he’ll describe in the next post. Michael also didn’t write the old version.

  6. mpyne (Post author)

    Tim: See Pellaeon’s comment; at this point I am merely trying to describe the existing functionality. I agree that it is suboptimal. I will get more into what I *am* implementing in future posts, but basically it is a fixed-size hash table where the data is handled in pages instead of blocks of memory, for convenience. It’s definitely not the best possible implementation, but I don’t have tons of experience, so I’m starting off by aiming small (plus there’s the rampant paranoia that I might accidentally access memory beyond the mapped area and cause a SIGSEGV). A B+-Tree implementation was considered but rejected due to the relatively high degree of sophistication required to implement one correctly.

    ruurd: I considered memcached, but unless I’m mistaken its distributed nature doesn’t extend to shared memory, and I’m not sure I want to be pushing large pixmaps over IPC channels. It seems reasonable in other respects though. As far as patterns go, you’d have to be more specific about what you’re asking.

    gadnio: If anything, I would surmise that a NoSQL key/value store would be the best base to “steal” from. Perhaps in KDE 4.6; I’m not personally super-attached to what I’ll describe in the next post.

    TheBlackCat: Caching the icons for those buttons is done; caching the buttons themselves doesn’t make a lot of sense. The code to generate those buttons is shared automatically by the kernel, though.

    Pellaeon: Exactly, although I’m not sure how much “better” the next one will be, just as long as it’s less crash-prone. At least deleting items will be faster. ;)

  7. Karellen

    I always worry about application cache sizes. KDE has a cache. Firefox has a cache. GTK/Gnome apps have a cache. Other non-KDE apps might have their own cache.

    How do these caches know how big to be? How do they know what else I might want to do with my memory? When people complain about Firefox bloat, the FF devs and apologists make the claim that unused RAM is wasted RAM, and so caching as much as possible for as long as possible is the way forward. But that approach sounds to me like the developer assumes that their app is the only important app on the system, and that I should want it to be performant over all other apps.

    IMO, the advantage of a disk-based mmap() cache is that the kernel has a much better view of the entire state of memory – all processes, all users, etc. – at any time than any application (or application framework) ever could. It also already has all the hard work in place for expiration policies (expiring from RAM, at least), which again is shared across the whole system. And a disk-based mmap() cache should also share pages between different processes completely automatically: if one process has the file mmap()ed in its address space, and it is currently swapped in to physical RAM, a second process immediately gets to read the same physical RAM. And the index is the filesystem. Given a hash, e.g. “abcdef1234567890”, the data just goes in “~/.cache/ab/cdef1234567890” (or equivalent), similar to how git stores objects. Or you can add more directory layers if you suspect many, many objects (~/.cache/ab/cd/ef/12/34567890).
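    Something along these lines, say (an illustrative sketch; SHA-1 just stands in for whatever key hash you prefer):

        #include <QByteArray>
        #include <QCryptographicHash>
        #include <QDir>
        #include <QString>

        QString cachePathForKey(const QByteArray &key)
        {
            // Hash the key and use the first two hex digits as a subdirectory,
            // much like git's object store does.
            const QByteArray hex =
                QCryptographicHash::hash(key, QCryptographicHash::Sha1).toHex();
            const QString hash = QString::fromLatin1(hex.constData());

            return QDir::homePath() + QLatin1String("/.cache/")
                + hash.left(2) + QLatin1Char('/') + hash.mid(2);
        }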

  8. ruurd

    Re patterns: in the case of an eviction policy, one could imagine using a plugin structure to plug the proper evictor into the cache class. There MUST be some design pattern usually applied in these cases.

    Re memcached: I venture the guess that if you can cache all the small pixmaps, that alone would already greatly improve the situation. So why not cap the cached pixmaps by size?

  9. Pingback: KGameRenderer: Caching on multiple levels « The Geek Shall Inherit The Earth

Comments are closed.