Tag Archives: kpixmapcache

Implementing a shared cache: Part 2

In my last post, I gave some background on what a shared-memory cache is, and how KDE already uses one (KPixmapCache) to save memory and make the desktop more efficient. I also noted how the current implementation leaves some things to be desired, and hinted at a new implementation I was working on.

In this second part, I’ll discuss some of the basic design principles of the new class, which I called KSharedDataCache.

Why a new class?

If you didn’t read Part 1, you may be wondering why I don’t just fix the current implementation, KPixmapCache, instead of writing new code. It’s a good question, but the short story is that due to the public API used by KPixmapCache, it is non-trivial (to say the least :) to improve KPixmapCache and take some necessary steps to improve its performance. The penalty of getting it wrong is pretty severe as well, as there have been probably hundreds of reported crash bugs already due to KPixmapCache.

So, someone on IRC gave me the idea that, why don’t I just make my improvements in a different class, like a KPixmapCache2 and move the majority of the current users of KPixmapCache to use that instead? It sounded like a good option to me, so that’s what I started on, eventually settling on a generic cache layer under a slightly more specialized image-handling cache.

KSharedDataCache

KSharedDataCache is a class that manages a cache, keyed by QString values, and holding QByteArrays for generality. The cache is held in shared memory, which is accessed across multiple processes based on the cache name (which is converted internally to a file name).

The central data structure is the cache itself. Everything that is needed to be able to insert items, find items, and otherwise manage the cache is kept in the same memory segment, instead of being split into two different files like in KPixmapCache. A very ugly drawing of the layout would look like this:

Block diagram of the KSharedDataCache memory layout

Starting from the left, we have the header for the shared cache itself. This contains several important pieces of data, including the cache size, the page size (which is adjustable), the number of free pages, and the mutex which protects against concurrent access to the shared data.

One note about the mutex, is that it is used instead of KLockFile. It requires support for process-shared POSIX thread primitives (which is required for XSI-conformant systems, but was not present in Linux/glibc until NPTL IIRC). As long as your system tells the truth about whether it supports process-shared primitives KSharedDataCache will still work (even if it can’t use shared memory).

After the cache header, the entry index table is located (starting from the first byte meeting alignment criteria to avoid crashing on non-x86 systems… although I have none to test!). This table is a fixed-size table, based on the total cache size and page size. Entries are placed into the entry table based on the hash of the entry key, and each entry contains information such as the item size, hash code, use count, time of last access, and location of its data.

Collisions are possible with any hash table. The standard answer to handling collisions is to use a method called “chaining” to just make a list of entries which share the same hash code. Unfortunately dynamic memory allocation is much more involved when you’re dealing with a fixed-size block of shared memory, so currently quadratic probing is used to try to seek out other, hopefully empty candidates. Since this is just a cache, the probing is only continued for a small number of attempts.

Following the entry table is the page table, which simply records the entry currently using every page in memory. It is possible to compress the page table by using a bit vector, and making a full page table only when needed (currently only during defragmenting) but I didn’t have time to implement that.

Finally, the rest of the shared memory is devoted to a paged memory allocation system (this is probably the most suboptimal part of my current implementation, but at least it can be fixed later this time ;). Every entry is stored in this data area, with the key that is used followed by the actual QByteArray data.

Resolving an access for an item with a key of “juk_128x128.png” would work something like this:

  1. Lock the cache. If unable to acquire the lock, assume the cache is corrupt, unlink it on disk, and create it all over again.
  2. Convert the key to a byte array using the UTF-8 encoding, then determine the hash code.
  3. Use the hash code to find the appropriate entry index. Compare the hash code to the candidate entry’s hash code, and if they don’t match, use quadratic probing to find another candidate. Give up if the entry is not found within several attempts.
  4. If the hash codes match, search the matching data area to determine the saved key value, and make sure the keys also match. If they don’t match then go back to before, using quadratic probing to find a match.
  5. If the keys did match, we found our entry. Update the entry’s use count and last access time, and then copy the data out of the page or pages to return to the caller. Since this all happened in shared memory this should be much much faster than loading it from disk (assuming of course that the operating system hasn’t paged out the shared memory to disk in the meantime).

You might have noticed that I possibly unlink the cache with reckless abandon in step 1. I actually do this in many more places, the idea being that a corrupt cache can lead to bugs that are very hard for the end user to diagnose and correct, and by definition a cache can be expected to drop entries at any time. The only danger would be tampering with a cache that other processes are currently using in shared memory. By unlinking (and only unlinking) the cache, the other processes can continue to use the inode that used to be associated with the file, and the kernel will finish the cleanup when the other processes exit.

Of course I’m up past a thousand words now, so I’ll continue in Part 3, where I’ll discuss how pointers work in shared memory, how initial cache setup is performed, and my attempts at handling cache pressure, defragmentation, etc.

Implementing a shared cache: Part 1

So awhile ago I mentioned that I was trying to add a new shared-memory cache for the next version of the KDE platform. It’s almost done now, and has been submitted for review (both old-skool on kde-core-devel, and all Web 2.0-style on our review board).

Given the number of things I had to think about while implementing it (and I promise you that even still it’s not fully as thought-out as it could be), I decided that I could probably make a half-decent, if very technical series of posts about the implementation process.

I’ve got a basic outline set out, but without further ado, I’ll go over in this post what exactly a shared-memory cache is, why KDE has one now, and why I’m trying to make a different one.

Why a shared-memory cache?

Most of the programmer types are already familiar with the idea of a cache: You have some input that isn’t particularly helpful right now, you have a function to turn that non-helpful input into something you can use, but that function takes awhile to run. So, you save the output of that function (the value) once, somewhere where you can refer to it later (by a key) if you need it. The idea is to trade some extra memory usage for reduced time requirements. Caching is used everywhere, in your CPU, in your Web browser, the operating system, and more.

Now, the shared-memory part allows this cache to be shared between running processes (if you don’t know what a process is, just think of it as an instance of a running program. Each different instance, even of the same program, would be a different process). The operating system normally does a very good job of making sure that different processes can’t interfere with each other, but there are times when it makes sense to open a couple of gateways between them to let them share the burden. In KDE’s case, many of the icons used by a standard KDE application will be used unchanged by other KDE programs, so it makes sense for us to cache generated icons for use by other KDE program. We could simply write the icons out to disk where other programs could read them, but putting them into shared memory allows for the near-immediate transfer of that data without any disk I/O.

I’d like to find examples of current shared-memory caches (besides our current KPixmapCache), but the only ones I can find are the fully distributed type like memcached. Cairo has a cache for glyphs, but that seems to be done per-process. GTK+ has a cache which is designed to be read in directly using mmap(2), but not necessarily to be accessed via shared memory. Let me know if you find any though!

So again, in our case we use a shared-memory cache in large part to handle icon loading and Plasma themes (both potentially heavy users of SVG icons). This gives us two speedups: 1) We don’t always have to re-generate a pixmap from the source data (a very big speedup when the source is an SVG), and 2) If one process generates a pixmap, every other KDE process can get access to the results nearly instantly.

What KDE currently does

My interest in shared-memory caching came about from looking into some bugs reported against our current cache, KPixmapCache, which was developed in conjunction with the lead-up to KDE 4.0 to allow the new Plasma subsystem and the existing icon subsystem to use the fancy SVG graphics without taking forever.

In addition, KPixmapCache had a feature where not only could it cache the image data (the 0’s and 1’s that make up the image itself), but also the resulting QPixmap handle to the image as it exists in “the graphics memory” (I’ll gloss over the distinction for now, it will be important in a future part).

KPixmapCache is implemented by hosting two different files in a known location, one ending in .index and the other ending in .data. Respectively these files hold the index metadata needed for the cache to work, and the actual image data.

Anytime you talk about shared resources, you also need to think about how to protect access to those shared resources to keep everything consistent. KPixmapCache uses the trusty KLockFile to protect against concurrent access (this has the benefit of being safe if the partition is mounted on NFS, although I think the reason is more because that’s what already existed in kdelibs).

From there, KPixmapCache uses a sorted binary tree (sorted by the hash code) to manage the index, and treats the .data file as a simple block of memory. Every time a new item is entered into the cache, a new chunk of free space is allocated from the .data file, even if there already exists empty space. Likewise, if a new index entry is required to insert an item, it is always added at the end of the index (with one exception, when overwriting an existing index entry due to a hash collision). I said the index was sorted earlier, but that might seem incompatible with always adding new entries at the end. What the implementer did to solve this problem is hold pointers to the actual children and parent in each index item, that way a virtual hierarchy could be arranged. The binary tree is never re-arranged (like in AVL trees or red-black trees), so the initial root node is always the root node. The overall structure looks like this figure:

Schematic of KPixmapCache layout

One disadvantage to this architecture is that it is difficult and inefficient to delete entries that should be expired out of the cache. Properly removing an entry would require possibly having to move entries in the data file in order to minimize fragmentation. This alone wouldn’t be a large deal (I end up having to do the same thing in KSharedDataCache), but updating the .index file is even harder, since it requires updating information for both parent and children (although again, not impossible by any means). Avoiding fragmentation in the index would require either moving nodes around in the index file (possibly recursively), or having to scan for the first free node when adding items. None of these are big problems, but it does make the implementation more annoying.

KPixmapCache worked around all of this by punting the problem: No entries ever got removed until the cache filled up. At this point the entries would be ranked by whatever the expiration policy in effect was (i.e. least recently used preferred, newest preferred, etc.), a new (smaller) cache would be constructed holding the most-desired entries, and the old cache would be deleted. Although infrequent, this could possibly take a not-insignificant amount of time when it did happen.

So why a new implementation?

Probably the one thing that led to me starting from a different architecture however, was the interface to KPixmapCache: It is designed to be subclassed, and to allow subclasses access to both the index and individual data items through a QDataStream (see KIconCache for an example usage). Doing this meant the internal code had to use a QIODevice to interface to the data and index, and so what ends up happening is that even though KPixmapCache tries to place all of the data in shared memory, it always ends up accessing it like it was a disk file anyways (even though a memory-to-QIODevice adapter is used).

Having to support subclassing (in the public API no less) makes changing many of the implementation details a journey fraught with hazard, and it’s bad enough that any little problem in KPixmapCache seemingly guarantees a crash. Since KPixmapCache is used in very core desktop platform code (Plasma and KIconLoader) I knew I wanted to go a different direction.

So, I started work on a “KIconCache”. However all the work I was doing was hardly specific to icons, and when I’d heard of a developer that was abusing KPixmapCache to hold non-image-data somehow, I decided to make a generic shared-memory cache, KSharedDataCache. Next post I’ll try to explain the direction I decided to take with KSharedDataCache.

KSharedDataCache

So in my last post I had mentioned some of the issues that have been encountered with KPixmapCache, and that I was working on a separate implementation.

Based on the fact that the underlying code is easily made more general purpose I’ve gone ahead and went down that path. With that in mind, the current code I have is available from a branch of kdelibs that I’ve made a few hours ago, called kdelibs-shareddatacache, available from /branches/work. This branch adds a class called KSharedDataCache, adds a KImageCache which operates on top of KSharedDataCache, and updates Plasma::Theme and KIconLoader to both use KSharedDataCache.

I don’t have time to go over the implementation details (I’ll try to do so this week if I get time though) but I will throw out the following:

  • My port of KIconLoader was so much of a hack that the original KIconCache is still there! It’s just not being used for icons. I think the Plasma::Theme work is a little cleaner but probably could be better handled itself.
  • Changing the icons requires a separate patch to kdebase/runtime/kcontrol/icons that I will apply if this should ever land in trunk. (It’s a bit faster too…)
  • On the topic of speed, I don’t have any benchmarks, mostly because that wasn’t the reason I was mucking about in the cache. It seems a bit faster to me but I’ve done no formal testing.
  • My larger concern was correctness. Absolutely everything is locked (double-locked in one particular scenario I was able to think of, look in the code if you’re interested…) and things seem to pass the consistency tests I’ve been able to throw at it now. The limited testsuite for kiconloader seems to work as well.
  • This work depends on having a pthreads library with “process-shared” locking primitives (at the very least mutexes) but I have not added relevant CMake checks yet.
  • Now that I think about it I also probably should be using timeouts and checking some more return codes here and there.

It’s available if you’re interested though. If you do go this route, I have an auxiliary tool which you can use to view the contents of a cache:

It obviously requires the headers for the new classes so you will need to have installed the modified kdelibs. You can download the source from http://purinchu.net/dumping-ground/cacheviewer-1.0.tar.bz2. When it asks for a cache name, just give it one of the *.kcache files in your /var/tmp/kdecache-$USER directory (examples: icon-cache or plasma_theme_Aya (or whatever theme you’re using)).

(By the way, isn’t it *scary* that the icon cache had been accessed more than 100,000 times in a 12-hour period? ;)

Bug of Legend

So. Those of you who notice the things I post will presumably notice I haven’t blogged as much recently. Essentially it’s because I have less time for development between school and work (I’m on shore duty so I don’t deploy, but I’m in charge of a division so my hours are still substantial).

The spare time I have had over the past month I’ve mostly spent trying to (re) fix my new favorite bug ever, bug 182026 which is a crash bug in KPixmapCache::discard().

You should hopefully have no clue about what KPixmapCache is other than that it’s what’s crashing most of your favorite apps, so I’ll give you the rundown. Back when KDE 4 was being developed and we were shifting to use the shiny SVG icons and graphics everywhere it was noticed that rendering SVG graphics took rather more time than simply loading a PNG from disk. So, instead of loading a SVG file from disk every single time it was needed, the end result (a PNG) was cached to disk instead, and a quick check is done to ensure that the underlying SVG file didn’t change. I didn’t write the code but thought it was interesting and committed a couple of bug fixes and now I apparently maintain it. =D

For efficiency reasons that on-disk cache is accessed by each KDE process using a shared memory segment (although it then gets inefficient again by using I/O operations anyways, but that’s for later). Anytime you have multiple processes or threads accessing a shared resource you need a way to handle that contention to make sure that the sensitive parts of the code are only being run by one thread at a time.

The second problem that was reported as 182026 was that the “discard” feature of the cache seemed to cause crashes for a lot of people. (Although oddly enough, not for any of the 3 test systems I have. Call me lucky…). The idea of discarding the cache contents is that the application code knows that something changed which means that essentially all of the stored graphics will not be needed. The major way most people see this is by changing Plasma themes (where the old theme’s graphics are now useless) or by changing the icon theme.

The underlying issue is that ::discard() uses essentially no locking. Instead it tells the shared caches to reload themselves on their next operation (a process referred to as “invalidation”), disconnects from shared memory, and deletes the cache file on disk. Or at least, it used to. Merely adding locking helps, but is solving the wrong problem. Any time the cache is disconnected from shared memory, it will try to reconnect later, re-creating the file on disk if necessary (and it is necessary in this case). But this is all just to say that the cache is now empty, there’s no underlying reason that we have to deallocate everything just to turn around and reallocate it.

So when I committed my fix for 182026, I did the following:

  • I added a flag to the cache to mark if it was simply empty or not. This allows me to discard the cache without having to disconnect it from shared memory or invalidate it.
  • Made KPixmapCache::discard() take the cache lock first before it did any of that.
  • When copying the cache for resizing, KSaveFile is used instead of truncating the cache (especially in case any cache locks have to be broken due to timeouts)

Unfortunately my initial fix had the side effect of breaking the “cache” part of KPixmapCache as it would never find cached items again. It was noticed by a KDE Games developer though so 4.4.2 should still be good to go now that I’ve fixed that.

A final side note is that I think I need to change the code that deletes an entire named cache to discard the cache as well for those processes that already have it attached to shared memory.

I have an alternative shared-memory cache implementation which I hope to integrate for KDE Platform 4.5 or 4.6 (although it may simply end up as KSharedCache or similar instead of merely replacing KPixmapCache due to API additions in KPixmapCache I don’t feel like supporting). At some point in between I may write a post about the underlying cache architecture and some “Do”s and “Don’t”s but I think I’ll just stop here for now. :P