Monthly Archives: May 2010

Implementing a shared cache: Part 4

Almost two weeks since I posted Part 3, which means it’s probably time to wrap up my series on implementing KSharedDataCache with this post. As a recap for those who don’t want to skip back to the end of Part 3: I said I’d talk about defragmenting the shared cache and porting KIconLoader.

Defragmentation

One of the sub-optimal parts of my implementation of KSharedDataCache is the fact that the data must be contiguous in memory. It would actually be fairly simple to change this due to the existing page design, but right now this is what we’ve got.

This is sub-optimal because of an effect called fragmentation (more precisely, external fragmentation). The effect is illustrated in the following diagram:

The problem with fragmentation can be seen in the cache layout at the bottom. The maroon blocks indicate allocated pages. In the bottom cache layout, even though the overall allocation is less than in the other cache, it is not possible to fit the new block in memory because there is no solid block of free memory of sufficient size.

This problem could be solved if only there were a way to move the allocated memory around. In the case of KSharedDataCache this can be done, because only the cache records the location of each data block (when an item is returned to an application, it is actually a copy). This process is called defragmentation, and is essentially the same idea as what disk-based defragmenters do.

My defragmentation routine is probably fairly naïve, but it does the job: it simply looks for used pages of memory and moves them as close to the beginning as possible (a rough sketch follows the list below). The more interesting part is deciding when to defragment. Right now defragmentation is performed as follows:

  • When we would otherwise have to remove currently-used pages because there is not enough consecutive free memory, but total free space is above a certain threshold; defragmentation runs before any pages are actually removed.
  • After evicting cache entries, since there are probably more holes in memory, and the reason we evicted things from the cache was due to insufficient consecutive free memory.
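
To make the “move everything toward the beginning” idea concrete, here is a rough sketch of such a compaction pass. The PageTable structure and its fields are invented for illustration; this is not the actual KSharedDataCache code.

    // Naive compaction sketch: slide every used page as far toward the
    // start of the data area as possible. Because pages are moved in
    // order, contiguous entries stay contiguous.
    #include <cstdint>
    #include <cstring>

    struct PageTable {              // hypothetical stand-in structure
        int32_t *pageOwners;        // index entry owning each page, -1 if free
        char    *pageData;          // start of the page data area
        uint32_t pageCount;
        uint32_t pageSize;
    };

    static void defragment(PageTable &table)
    {
        uint32_t nextFreePage = 0;
        for (uint32_t page = 0; page < table.pageCount; ++page) {
            if (table.pageOwners[page] < 0) {
                continue;           // free page, nothing to move
            }
            if (page != nextFreePage) {
                // Move the used page down into the lowest free slot.
                std::memmove(table.pageData + nextFreePage * table.pageSize,
                             table.pageData + page * table.pageSize,
                             table.pageSize);
                table.pageOwners[nextFreePage] = table.pageOwners[page];
                table.pageOwners[page] = -1;
                // A real implementation would also update the cache index
                // so the entry records its new first page (omitted here).
            }
            ++nextFreePage;
        }
    }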

As I said, my defragmentation routine is very simple and could probably be improved easily. I haven’t noticed it becoming a problem during desktop usage, but that’s perhaps attributable to it not coming into use very often (if at all) due to the cache aging I mentioned in Part 3.

Porting KIconLoader

Perhaps the largest impetus driving me to do all this work in the first place was KIconLoader, which used to use KPixmapCache to cache loaded icons. KIconLoader is used everywhere in KDE, so many KPixmapCache-related crashes were first noticed in seemingly unrelated applications when they tried to load icons.

Porting KIconLoader to KSharedDataCache was unfortunately not a direct method-name replacement, as one of the things KIconLoader stored was the path where an icon was found. (This used to be done using the custom data API for KPixmapCache that I mentioned in Part 1.) I had first intended to use KImageCache (an image-oriented subclass of KSharedDataCache) for KIconLoader, but I ended up using KSharedDataCache directly, holding cache entries that simply contain the pixmap and the path.
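
As a sketch of what such an entry might look like: the helper functions below are invented for illustration, and they assume only the QByteArray-based insert()/find() calls in KSharedDataCache’s public API.

    // Hypothetical helpers: pack a pixmap plus its source path into a
    // single cache entry, and unpack it again on lookup.
    #include <kshareddatacache.h>
    #include <QBuffer>
    #include <QDataStream>
    #include <QPixmap>
    #include <QString>

    static QByteArray packIconEntry(const QPixmap &pix, const QString &path)
    {
        QByteArray blob;
        QBuffer buffer(&blob);
        buffer.open(QIODevice::WriteOnly);
        QDataStream stream(&buffer);
        stream << path << pix;      // path first, then the pixmap data
        return blob;
    }

    static bool lookupIconEntry(KSharedDataCache &cache, const QString &key,
                                QPixmap *pix, QString *path)
    {
        QByteArray blob;
        if (!cache.find(key, &blob)) {  // find() hands back a copy of the entry
            return false;
        }
        QDataStream stream(blob);       // read-only stream over the copy
        stream >> *path >> *pix;
        return true;
    }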

One problem that came up, which I only fixed a few minutes ago, was that KIconLoader would not only cache loaded icons, it would also cache failed icon lookups, a behavior I had not ported over. This was apparently especially noticeable when browsing large directories in Dolphin. Either way, that is now fixed.

Future Directions

I’m proud of the work I put into KSharedDataCache, especially since it has been running in trunk for about four weeks now with no major issues popping up. However, there are quite a few things that could be improved about it:

  • Cache corruption: Although in my opinion the risk of crashes from a corrupted cache is lower now, due to the cache layout and the non-use of QDataStream, the possibility is not zero. Because corruption can lead to crashes, it would be nice to have an efficient way to mark a disk cache as corrupt so that it gets deleted. I’ve thought of some things but have no concrete plans at this point.
  • The page table/index table method of storing data is very simplistic. There is surely a more appropriate method buried in some ACM or IEEE publication somewhere, even within the limits of a fixed memory size. As it stands, my method blends some of the disadvantages of 1960s-era memory allocators with those of paged memory allocators, without all of the benefits.
  • Assuming defragmentation remains required, the defragmenter could probably be made faster as well.
  • It is not at this point possible to resize a cache once it has been created. There’s no reason in theory that it can’t be done, it’s just not implemented. (Note that implementing this is more complicated than simply changing a size flag in the cache…)
  • The cache could possibly be made more concurrent with lock-free algorithms or finer-grained locking. This is not something I’d like to touch until I have a way to verify correctness of the result, however.
  • Finally, it is possible that someone has done this far better and I simply missed it, in which case we should look at whether to adopt that library as a dependency and make KSharedDataCache a wrapper around it.
  • Should we remove old KPixmapCache caches after starting up a shiny new 4.5 desktop?

So, this concludes my series on implementing a shared cache. I’ve got to get working on other efficiency improvements, new kdesvn-build releases, classes, etc. It’s been fun writing though!

Implementing a shared cache: Part 3

So it has been a few days since Part 2, where I promised I’d talk about some issues that come with using pointers in shared memory, initial cache setup, and the arbitrary methods I use to handle various scenarios.

Pointing to things in shared memory

First I’ll talk about pointers. Essentially all you need to know about them is that a pointer holds a memory address, namely the address of the data you’re really interested in.

Now, every process in modern operating systems has its own “address space”, which defines where things are in memory. So, memory addresses in process 1 have no relation to addresses used in process 2, or any other process.

What this means for shared memory algorithms is that you cannot use normal pointers, since they rely on pointing to a specific spot in a process’s address space. See below for an example:

Demonstration of the address spaces of three processes sharing a cache

Three KSharedDataCache-using processes are running, and let’s say that kcalc was the first to create that cache, so the pointers are created from the perspective of kcalc’s address space. If KSharedDataCache used normal pointers to point to the data in the cache (from the cache header) then things would fail to work right in kate where we point into the middle of the data. The case of krunner is even worse, as we point into the krunner process’s private data!

The solution is not too hard. The memory-mapping call that creates the connection tells you where the mapping starts. So instead of saying “the data is at address 0x400000”, use pointers that say “the data is 1024 bytes past the start”. These are called offsets. For example, the pthread library that is standard on POSIX systems could use this type of technique to implement “process-shared” mutexes (mutexes are by default merely thread-shared).
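
A minimal sketch of the idea, with a made-up header layout rather than KSharedDataCache’s actual one:

    // Offsets instead of pointers: the shared data stores "bytes past the
    // start of the mapping", and each process adds its own mapping address.
    #include <cstdint>

    struct CacheHeader {            // illustrative only
        uint32_t dataOffset;        // bytes past the start of the mapping
        uint32_t dataSize;
    };

    // Resolve an offset relative to wherever *this* process mapped the cache.
    static inline void *fromOffset(void *mappingStart, uint32_t offset)
    {
        return static_cast<char *>(mappingStart) + offset;
    }

    // Usage: each process passes its own base address, so the same stored
    // offset resolves correctly in every address space.
    // void *data = fromOffset(mapBase, header->dataOffset);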

Initial cache setup

Taking that first step of creating a cache is hard. Once the cache is set up we can rely on having some means of locking, entry tables that are already set up, and other niceties. Creating all of that in the face of race conditions is another matter, though.

My decision to use pthreads for the mutex made this part harder than it could have been otherwise, as the mutex has to be stored with the cache. But you can’t use the mutex without initializing it first (remember that pthread mutexes default to not being process-shared). If two processes try to create a non-existing cache at the same time, they would both try to initialize the mutex, and the process that initializes the mutex the second time could potentially cause logic errors in the first process.
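
For reference, initializing a process-shared pthread mutex that lives inside the mapping looks roughly like this; it is a sketch, not the actual KSharedDataCache code, and it is exactly the step that must only ever run once:

    // Mark a mutex stored in shared memory as usable across processes.
    #include <pthread.h>

    static bool initSharedMutex(pthread_mutex_t *mutexInSharedMemory)
    {
        pthread_mutexattr_t attr;
        if (pthread_mutexattr_init(&attr) != 0) {
            return false;
        }
        // Mutexes default to PTHREAD_PROCESS_PRIVATE (thread-shared only);
        // this attribute makes the mutex work across process boundaries.
        pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
        int result = pthread_mutex_init(mutexInSharedMemory, &attr);
        pthread_mutexattr_destroy(&attr);
        return result == 0;
    }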

So, I went with a simpler solution for this rare case: A spinlock, using Qt’s built-in atomic operations. It is not quite a pure spinlock because there are a couple of possibilities (numbered as they are in the code):

Case 0 is that the cache has just been created, without ever having been attached to. (0 because that is the default value for an initially empty file that has just been mapped into shared memory).

Case 1 is that there is a process which has noticed that the cache is not initialized, and is initializing it (in other words, it atomically switched the 0 flag to 1). This is a locking condition: No other process will attempt to initialize the cache. But the cache is not initialized yet!

Case 2 occurs when the cache has been finally initialized, and can have the standard locks and methods used. To access a cache that is in this condition you must use the cache mutex.

I don’t use a spinlock all the time because my implementation does not do any magical lock-free algorithms, and therefore some operations might hold the lock for a significant time. Using a mutex allows waiting threads to sleep, saving CPU and battery power, which would not work with a spinlock.
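
Here is a simplified sketch of that three-state guard. It uses std::atomic for clarity instead of the Qt atomic operations mentioned above, and the names and layout are illustrative only:

    // Three-state initialization guard stored at a fixed spot in the mapping.
    #include <atomic>
    #include <cstdint>

    enum : uint32_t {
        Uninitialized = 0,  // case 0: freshly created, all-zero mapping
        Initializing  = 1,  // case 1: one process is setting up the cache
        Ready         = 2   // case 2: use the pthread mutex from now on
    };

    static void ensureInitialized(std::atomic<uint32_t> *state)
    {
        uint32_t expected = Uninitialized;
        if (state->compare_exchange_strong(expected, Initializing)) {
            // We won the race: set up the process-shared mutex, index
            // tables and so on, then publish the result.
            // initializeCacheStructures();   // hypothetical helper
            state->store(Ready);
            return;
        }
        // Someone else is (or already has finished) initializing;
        // busy-wait for the rare moment it takes to become ready.
        while (state->load() != Ready) {
        }
    }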

Cache handling

Any cache needs a method of deciding when to remove old entries. This is especially vital for hash-based caches that use probing, like KSharedDataCache, where allowing the cache to reach maximum capacity makes it very slow, since probing becomes both more common and more lengthy. I use several techniques to try to get rid of old entries. I make no promises as to their effectiveness, but I felt it was better to try something than to do nothing. The techniques are:

  • Limited Quadratic Probing: One standard method of handling items that hash to the same location in a hash table is to use “probing”, where the insertion/find algorithms look at the next entry, then the next, and so on until a free spot is found. Obviously this takes longer, especially if the hash function tends to make entries cluster anyways. In the case of KSharedDataCache it’s perfectly acceptable to simply give up after a certain number of attempts, and I quite willingly do so (but see the next technique). On the other hand, if you can avoid colliding you don’t have to worry about finding empty spots, so to that end I use the “FNV” hash function by Fowler, Noll, and Vo (see the sketch after this list).
  • Ramped Entry Aging™: The basic idea is that as the number of entries in the cache goes up, it becomes more likely that the insertion method will, instead of probing past a colliding entry, artificially decrease its use count and kick it out if it becomes unused. There are competing effects here: there’s no point having a cache if you’re not going to employ it, so this aging never happens if the cache is lightly loaded. On the other hand, entries that get added to the cache and only ever used once could cause collisions for weeks afterwards in the long-lived scenarios I envision, so it is important to make entries justify their use. So as the cache load increases, there is a higher and higher chance of unused entries being evicted. I simply divide the use count in half, so an entry can be quickly evicted even if it was used a million times a month ago.
  • Collateral Evictions®: In the event that the first two options don’t work, some entries have to be kicked out to make room for new ones. This process is called eviction. In my Collateral Evictions plan, any time we kick entries out, we kick even more entries out on top of that (I chose twice as many, for no particular reason). The idea is that if we’re running out of space we’ll probably have to kick someone else out on the very next insert call anyways, and since eviction is a time-consuming operation we might as well make it effective. The exact entries that get kicked out are chosen based on the developer-configurable eviction policy.
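
A sketch of the first two techniques, assuming a made-up table layout; the FNV-1a constants are the standard 32-bit ones, while the probe limit and the “free slot” convention are invented for this example:

    // FNV-1a hashing plus limited quadratic probing with simple aging.
    #include <cstddef>
    #include <cstdint>

    static uint32_t fnv1a(const char *data, size_t length)
    {
        uint32_t hash = 2166136261u;              // FNV offset basis
        for (size_t i = 0; i < length; ++i) {
            hash ^= static_cast<unsigned char>(data[i]);
            hash *= 16777619u;                    // FNV prime
        }
        return hash;
    }

    // Return the index of a matching or free slot, or -1 after giving up.
    // A zero hash marks a free slot in this simplified table.
    static int findSlot(const uint32_t *keyHashes, size_t tableSize,
                        uint32_t wantedHash)
    {
        const int maxProbes = 8;                  // arbitrary cut-off
        for (int i = 0; i < maxProbes; ++i) {
            size_t index = (wantedHash + i * i) % tableSize;  // quadratic step
            if (keyHashes[index] == 0 || keyHashes[index] == wantedHash) {
                return static_cast<int>(index);
            }
        }
        return -1;  // caller falls back to aging or eviction
    }

    // Ramped entry aging: halve a colliding entry's use count instead of
    // probing past it; an entry whose count reaches zero can be reclaimed.
    static bool ageEntry(uint32_t *useCount)
    {
        *useCount /= 2;
        return *useCount == 0;
    }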

Next time I’ll talk about defragmentation, porting KIconLoader, and any other random things I can think up. I hope it hasn’t been deathly boring so far!

If trunk is broken for you…

It might be because I completed the port of KIconLoader to use the KSharedDataCache class I recently introduced.

Everything should be fine if it’s not broken yet though, as Plasma was ported over a few days ago.

The cache files are stored under $(kde4-config --path cache)/*.kcache (normally this is /var/tmp/kdecache-$USER/*.kcache). You shouldn’t need the old plasma_theme*.index or plasma_theme*.data files anymore, and you can delete those if you’d like.

I’m not done with the series I had going on making KSharedDataCache, I just haven’t had a lot of time recently. There are one or two more parts I’d like to add; stay tuned…

Today I learned…

… that the “~QX11PixmapData(): QPixmap objects must be destroyed before the QApplication object, otherwise the native pixmap object will be leaked.” warning that most KDE applications display when exiting is actually false.

The X server automatically cleans up any open resources, including pixmaps, when the client exits. This is much like how the kernel automatically closes open files and releases allocated memory when an application exits.

(The QPixmaps in question are the ones cached by KIconLoader as best as I can tell. They are cleaned up, just not before QApplication’s destructor runs.)