Category Archives: Computing Troubles

Debugging KF5 build failures

Those who run development versions of KDE software know the drill: sometimes you have to remove your whole development install directory and start all over in order to resolve certain types of build errors.

That advice has always seemed to me to be overly mysterious, even if occasionally effective. It’s just as logical as the old saw about rebooting your computer; it often works, but why?

I got to re-examine that question today when resolving a build error on my infrequently-updated KF5 install, a build error that could have been fixed by removing the install directory and re-building everything. Only this time, I can explain why.


In short, the issue relates to the movement of a framework class from one framework module to a different one (a not-uncommon circumstance when working on a new major version of a software release).

Not too long ago, the trusty old KPluginLoader class occupied a spot within the KService framework, which was responsible for installing the needed includes, libraries, etc. for application developers (and other Frameworks) to use KPluginLoader.

However, as part of our ongoing code and API cleanups, we decided to move KPluginLoader and its associated classes to a more generic Framework module (KCoreAddons). This move was performed back in March and was designed to maintain source compatibility with code that used a KService class to construct KPluginLoader.

To do this, an intermediate class, KPluginName, was created to link KService with KPluginLoader (instead of using QString directly as an implicit conversion, which could easily become disastrous). What this means is that all code using the KService class would now depend on the new KPluginName class, which is bundled with KPluginLoader (all of which was moved to KCoreAddons).

That describes the theory. The reality isn’t far off either, unless you are like me and simply install updated modules to a fixed install path without deleting the install directory first every time.

With that setup, what happens to application code using KService is that there are two possible kpluginloader.h files that might be found: the new one from KCoreAddons, and the old one from KService left over in the install directory. If CMake finds the old one in the KService include directory, there won’t be a KPluginName class, which means any applications using KService itself will fail to compile at kservice.h (which needs KPluginName, and properly #includes kpluginloader.h to get it).
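If you suspect this kind of stale-header problem, a quick sanity check before deleting anything is to look for duplicate copies of the header under your install prefix (the path below is only an example; use whatever prefix your kdesrc-buildrc points at):

# Example install prefix; substitute your own
find ~/kf5/usr/include -name kpluginloader.h

More than one hit means the old copy is still lying around and may be shadowing the new one.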

This is the exact kind of error that deleting your install directory first helps to overcome. So let me remind my fellow developers and testers, if you get weird build errors (especially for software in flux like KF5 is), don’t forget to try removing your install directory when you’re troubleshooting. It takes more time, sure, but there’s a reason that distribution package managers uninstall old versions of software before installing updated versions.

Vim tip: Finding differences without separate files

While debugging a failure in the kdesrc-build test suite after a fairly extensive series of refactorings, I ended up with an object hierarchy which was different from another, supposedly equivalent hierarchy… but different where?

The obvious solution is to use something like Test::More’s is_deeply test, which displays the places where the two structures are different.
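(For reference, usage looks roughly like this, with $got and $expected standing in for the two structures being compared:)

use Test::More;

# is_deeply() recursively compares the two structures and reports the
# first place where they diverge.
is_deeply($got, $expected, 'module hierarchy matches the expected one');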

Unfortunately it reported that the two structures were identical. (In fairness, I checked just now and it’s an older version from 2009; there have been several bugfixes in the function since then which probably close this… assuming I was using the method right.)

The other quick-and-dirty debugging method I use is to simply dump the structure to the output using something like Data::Dumper.
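(Roughly like so, with $module_tree standing in for whatever structure is under suspicion:)

use Data::Dumper;

# Sorting hash keys makes two dumps of equivalent structures line up,
# which matters if you plan on comparing them.
$Data::Dumper::Sortkeys = 1;
print STDERR Dumper($module_tree);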

Unfortunately it’s not easy to “spot the difference” in two consecutive Data::Dumper outputs of nearly-identical structures. That is where Vim came in to save my day.

I think most Vim users already know about the vimdiff mode, which helps significantly with applying patches to files, but what if you don’t have a file to work on? In my case I was dealing with Konsole output. Of course I could cut-and-paste that output to different files and then run vimdiff, but that’s annoying (what if I forget to delete the files when I’m done? what do I name them? etc.).

Instead I dumped all of the output into a Vim window, then used :vnew to open a new buffer and window. Afterwards I moved the second output block to the new window and cleaned up the leading and trailing empty lines.

The only thing left to do is put Vim into its diff mode, which is done by running :diffthis in each window. That gave me the following (click to enlarge):

Screenshot of a vim window displaying only differences between two different buffers
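For anyone who wants to repeat the trick, the whole sequence boils down to roughly the following (the motions used to move the second block will obviously vary):

" open an empty buffer in a new vertical split, then move the second
" output block into it
:vnew

" enable diff mode in every window of the current tab
:windo diffthis

" turn diff mode back off when finished
:diffoff!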

This makes it much easier to find the bug, and is very easy to accomplish quickly. Perhaps you’ll find it useful at some point as well.

Tracking down a library bug

So today I noticed build failures in quite a few modules, all based on errors linking to libkipi, involving undefined references to KIPI::ImageInfoShared::~ImageInfoShared().

Normally fixing this is as easy as ensuring that the library which provides the symbol has been updated, built, and installed and then running kdesrc-build --refresh-build on the modules that had failed. In this case it didn’t work though. Hmm…

Looking at my log/latest/gwenview/error.log showed that it was the same build failure, so I went to the affected build directory and ran make VERBOSE=1, which shows the command line involved.

The output was a whole lot of something like:

/usr/lib/ccache/bin/g++   -march=native -pipe [... .o files removed ...]
/home/kde-svn/kde-4/lib/libkfile.so.4.7.0
../lib/libgwenviewlib.so.4.7.0 /home/kde-svn/kde-4/lib/libkio.so.5.7.0
/home/kde-svn/kde-4/lib64/libkipi.so [... more removed ...]

Had I been paying close attention I might have noticed the actual problem right here, but in the event I merely noticed that libkipi had no version number referenced, only the bare libkipi.so.

The next step for me was to try and figure out why the symbol wasn’t defined, but first I wanted to make sure that the symbol wasn’t defined, which can be accomplished using the nm tool.

The output of nm lib64/libkipi.so needs to be filtered to make it useful. I ended up just grepping for the mangled symbol name, but you can demangle the symbol names and grep for those as well. After running nm lib64/libkipi.so | grep ImageInfoShared I saw that the destructor was actually defined three times!

$ nm  /home/kde-svn/kde-4/lib64/libkipi.so | grep _ZN4KIPI15ImageInfoSharedD
0000000000014728 t _ZN4KIPI15ImageInfoSharedD0Ev
00000000000146e0 t _ZN4KIPI15ImageInfoSharedD1Ev
00000000000146e0 t _ZN4KIPI15ImageInfoSharedD2Ev
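(Incidentally, you can skip the manual demangling step by passing -C to GNU nm, or by piping the output through c++filt:)

nm -C /home/kde-svn/kde-4/lib64/libkipi.so | grep 'ImageInfoShared::~'
nm /home/kde-svn/kde-4/lib64/libkipi.so | c++filt | grep 'ImageInfoShared::~'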

Two of the destructor names pointed to the same address, so there were really only two different functions, but why were there even two? Looking into this further revealed that the different destructors are actually defined by the C++ ABI implemented by gcc, specifically that:

  • The D0 destructor is used as the deleting destructor.
  • The D1 destructor is used as the complete object destructor.
  • The D2 destructor is used as the base object destructor.

The D1 destructor is presumably used as a shortcut when the compiler is able to determine the actual class hierarchy of a given object and can therefore inline the destructors together into a “full destructor”, while the D2 destructor would be used when the ancestry is unclear and therefore the full C++ destruction chain is run. Neither of these would deallocate memory though, which is why the separate D0 destructor is needed (which is presumably otherwise equivalent to D2, but that’s just me guessing).

Either way, the destructors were actually just normal operation of the compiler. All the bases appeared to be covered: the t in the nm output means that the symbol is defined in the text (code) section, which means it should be available, right?

As it turns out I had to read the nm manpage more closely… lowercase symbol categories mean that the symbol is local to that library, or in other words that it does not participate in symbol linking amongst other shared objects. That would explain why gcc seemingly couldn’t find it.
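(A quick way to verify this is to list only the dynamic symbol table, which is what other shared objects and executables actually link against; a properly exported symbol shows up there with an uppercase T, while a local one simply doesn’t appear:)

nm -D /home/kde-svn/kde-4/lib64/libkipi.so | grep ImageInfoShared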

But why was the symbol local instead of global? As far as this goes, I’m still honestly not sure. It turns out that LIBKIPI_EXPORT was defined to expand to KDE_IMPORT instead of KDE_EXPORT. But on Linux both of those change the visibility of the affected symbol to be exported (KDE_IMPORT makes more sense on the Windows platform, but is essentially the default on Linux already). So although this appeared to be the issue, it was actually not a concern.

However, playing around with that #define and recompiling libkipi made me realize that the affected library didn’t appear to have changed after I ran make install… which turned out to be due to my KDE libraries getting installed to $HOME/kde-4/lib, but libkipi was in $HOME/kde-4/lib64, and was getting picked up by CMake and FindKipi.cmake somehow.
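(If you ever need to see exactly which copy of a library CMake latched onto, grepping the cache in the module’s build directory is a quick check; the path below is just an example:)

grep -i kipi ~/kde-src/build/gwenview/CMakeCache.txt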

Perhaps I hit a transient buildsystem bug, perhaps it was something else. But removing the stray lib64 directory and rebuilding the affected modules appears to have fixed everything. At least I learned the reason there’s up to 3 different destructor symbol names for a C++ class, I guess. ;)

Good job Aaron

243 votes. 61 duplicated bugs (as of this post). 21 backtraces submitted as attachments. 5 months of troubleshooting. 1 detailed Valgrind log. And now, Aaron Seigo has figured out and fixed Bug 258706, a crash in Plasma related to the system tray (often but not always with Amarok).

The problem? In basic terms, a QIcon was able to outlive the custom KIconLoader that was used to actually load the QIcon’s pixmap. It’s hard to properly share the underlying KIconLoader without breaking compatibility, but it’s not hard to figure out when the KIconLoader is deleted by using QWeakPointer. So, Aaron’s fix uses QWeakPointer to always have an up-to-date status of the KIconLoader used, and gives a fallback pixmap if the KIconLoader was deleted before the QIcon loaded the pixmap it needed.

If you’ve been hitting this bug, the fix will be in KDE Platform 4.6.4, although maybe your packagers will re-spin 4.6.3 packages to include this fix (hint hint ;)

Perl hijinks

So I’ve been trying to modularize my kdesrc-build Perl script (i.e. actually split it into logical objects/modules) while still keeping it all in one script, the idea being to get the logic into a more understandable state where possible and overall make the codebase less brittle.

I achieved a large milestone today in finally managing to group together the debugging methods in a way which remains compatible with the rest of the script.

What I mean by this is that I used the prototypes feature of Perl subroutines to allow for methods like:

return 1 if pretending;

(Notice how there are no parentheses after the pretending call), and

info "\tPerforming source update for g[$module]";

(Likewise no parentheses for the info method call).

Now, in retrospect I probably should have simply not used prototypes, at least for the output methods which would not be significantly less readable with parentheses. All the same however, prototypes were interfering with grouping these debugging routines into their own module.

This is because subroutine prototypes actually affect the Perl parser itself, and these prototypes are not exported by the normal Perl routines for exporting subroutines out of modules. In other words, a subroutine Debug::pretending() (with a prototype that is completely empty) would get exported to main as main::pretending (with no prototype at all). This would break code that used the pretending routine without parentheses, since the Perl parser wouldn’t know it is supposed to accept no arguments.
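(For the curious, the declaration in question looks something like this; the package variable is made up for illustration:)

package ksb::Debug;

our $pretending = 0;

# The empty prototype tells the parser that pretending() takes no
# arguments, which is what lets call sites such as
# "return 1 if pretending;" omit the parentheses.
sub pretending()
{
    return $pretending;
}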

After beating my head on this problem off-and-on for awhile it finally occurred to me today that Perl is a “dynamic language”: Why couldn’t I just manually feed the appropriate declaration into the parser on-the-fly when necessary?

After some prototyping I came up with an import method that seemed to work:

my $pkg = shift;
my $caller = caller;
my @exports = qw(debug pretending); # etc...

# This loop is only slightly "magical". Basically to import functions
# into a different package in Perl, we can use something like:
# *PACKAGE::FUNCTION = \&SOURCE_PACKAGE::FUNCTION;
# where the *PACKAGE modifies the symbol table for that package.
#
# The extra part, which requires using eval, is to predeclare the
# subroutine with a prototype first.
# "sub foo($old_prototype);"

for my $fn (@exports) {
    my $prototype = prototype($fn);
    eval "sub ${caller}::${fn}(${prototype});\n" .
         "*${caller}::${fn} = \\&${pkg}::${fn};";
}

All that I’m really doing is reading in the prototype for each function that is exported by using the built-in prototype function, and then eval-ing a string that predeclares the exported subroutine with the appropriate prototype and then aliases that subroutine to the implementation in the source package.

This exports the prototype information when the import method is *run*, but the Perl parser will parse as much of the file as possible before starting execution. So if we want the parser to be updated as well, we must force the new import method to be run as soon as possible, which can be done using the standard Perl BEGIN { ... } block, which runs the code inside of it as soon as it is encountered.

So in kdesrc-build, instead of having “use ksb::Debug;”, I have:

BEGIN {
    ksb::Debug->import();
}

And now everything is parsed (and run) just as before! I’ll likely still convert the code to not need this circumlocution at some point, but I thought it was at least technically interesting.

Today I learned…

… that the “~QX11PixmapData(): QPixmap objects must be destroyed before the QApplication object, otherwise the native pixmap object will be leaked.” warning most KDE applications display when exiting is actually false.

The X server will clean up any open resources, including pixmaps, automatically when the client exits. This is much like how the kernel automatically closes open files and releases memory allocations when an application exits.

(The QPixmaps in question are the ones cached by KIconLoader as best as I can tell. They are cleaned up, just not before QApplication’s destructor runs.)

Bug of Legend

So. Those of you who notice the things I post will presumably notice I haven’t blogged as much recently. Essentially it’s because I have less time for development between school and work (I’m on shore duty so I don’t deploy, but I’m in charge of a division so my hours are still substantial).

The spare time I have had over the past month I’ve mostly spent trying to (re)fix my new favorite bug ever, bug 182026, which is a crash bug in KPixmapCache::discard().

You should hopefully have no clue about what KPixmapCache is other than that it’s what’s crashing most of your favorite apps, so I’ll give you the rundown. Back when KDE 4 was being developed and we were shifting to use the shiny SVG icons and graphics everywhere, it was noticed that rendering SVG graphics took rather more time than simply loading a PNG from disk. So, instead of loading an SVG file from disk every single time it was needed, the end result (a PNG) was cached to disk instead, and a quick check is done to ensure that the underlying SVG file didn’t change. I didn’t write the code, but I thought it was interesting and committed a couple of bug fixes, and now I apparently maintain it. =D

For efficiency reasons that on-disk cache is accessed by each KDE process using a shared memory segment (although it then gets inefficient again by using I/O operations anyways, but that’s for later). Anytime you have multiple processes or threads accessing a shared resource you need a way to handle that contention to make sure that the sensitive parts of the code are only being run by one thread at a time.

The second problem, reported as bug 182026, was that the “discard” feature of the cache seemed to cause crashes for a lot of people. (Although oddly enough, not for any of the 3 test systems I have. Call me lucky…) The idea of discarding the cache contents is that the application code knows that something changed which means that essentially all of the stored graphics will no longer be needed. The major way most people see this is by changing Plasma themes (where the old theme’s graphics are now useless) or by changing the icon theme.

The underlying issue is that ::discard() uses essentially no locking. Instead it tells the shared caches to reload themselves on their next operation (a process referred to as “invalidation”), disconnects from shared memory, and deletes the cache file on disk. Or at least, it used to. Merely adding locking helps, but it solves the wrong problem. Any time the cache is disconnected from shared memory, it will try to reconnect later, re-creating the file on disk if necessary (and it is necessary in this case). But this is all just to say that the cache is now empty; there’s no underlying reason that we have to deallocate everything just to turn around and reallocate it.

So when I committed my fix for 182026, I did the following:

  • I added a flag to the cache to mark if it was simply empty or not. This allows me to discard the cache without having to disconnect it from shared memory or invalidate it.
  • Made KPixmapCache::discard() take the cache lock first before it did any of that.
  • When copying the cache for resizing, KSaveFile is used instead of truncating the cache (especially in case any cache locks have to be broken due to timeouts).

Unfortunately my initial fix had the side effect of breaking the “cache” part of KPixmapCache as it would never find cached items again. It was noticed by a KDE Games developer though so 4.4.2 should still be good to go now that I’ve fixed that.

A final side note is that I think I need to change the code that deletes an entire named cache to discard the cache as well for those processes that already have it attached to shared memory.

I have an alternative shared-memory cache implementation which I hope to integrate for KDE Platform 4.5 or 4.6 (although it may simply end up as KSharedCache or similar instead of merely replacing KPixmapCache due to API additions in KPixmapCache I don’t feel like supporting). At some point in between I may write a post about the underlying cache architecture and some “Do”s and “Don’t”s but I think I’ll just stop here for now. :P

ImageMagick Fun

The “fun” in the title should be read in your most sarcastic tone of voice… Anyways, one of my professors mailed us a PDF of a scanned document to read (and print out) for the next class. Given that it was scanned in (by what appeared to be the professor literally holding it above a scanner), there was a lot of excess black in the picture.

I don’t know about you, but printing 2 large blocks of solid black, for 22 pages, doesn’t sound like a wise investment of toner. But ah! Why don’t I just crop off the excess part of each page so that just the scanned-in text is visible, and print that out? This has to be easy, right?

Unfortunately it wasn’t as easy as I’d hoped (most of the picture editors that can even handle PDFs can’t print out each layer as a separate page, and there’s no way I’m doing the exact same operation 22 times). ImageMagick looked like the thing I needed, even if it would take some trial-and-error to figure out exactly how much to crop off.

Turned out it only took a couple of runs to figure out exactly how much I could get away with cropping. But I had a worse problem than having to do trial runs: The output looked horrible.

I tried reading the man page, going to the website, and the rest, and couldn’t figure out what to do. Using the -density option seemed to be the right idea, but alas I couldn’t get it to work.

I troubleshot further, even getting to the point of running gs manually to see if Ghostscript or ImageMagick was the problem (turned out it was myself, I guess). Eventually I realized that Ghostscript was rendering the initial image to ImageMagick at a low resolution (72 DPI), but viewing the source in Okular, it was obvious that much better was possible (I’d estimate 200 DPI although I ended up using 300). So if I could figure out how to get ImageMagick to pass the right DPI to Ghostscript I should have the problem fixed.

More directed Google searching revealed I’d had the right flag the whole time, -density. I just had it in the wrong spot. Something like this is right: convert -density 300x300 input.pdf -crop ... output.pdf. Instead I’d been using convert input.pdf -density 300x300 -crop ... output.pdf.
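For the record, a complete invocation would look something like the following (the crop geometry here is made up; yours depends entirely on the scan):

# -density must come before the input file so the PDF is rasterized
# at 300 DPI; +repage drops the leftover canvas offset after the crop
convert -density 300x300 input.pdf -crop 2200x3000+200+250 +repage output.pdf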

I figured I’d put my experience out there in the great Internet Memory Machine in case others have similar troubles.

Retro tunes with Phonon

So, in The Beginning, when I was just a young padawan on the Internet, I had been let into a glorious secret: Emulation (not of IBM System/360 machines, but of more important things like the Super NES). Some branching from there led me to zophar.net, a popular emulation site, and their message boards, and also left me with a fascination with emulation.

The attributes of some of the older systems like the NES and Super NES made it fairly easy to capture their music-producing software, since those systems used separate co-processors to handle music effects. NES music would be stored in the NSF format, and SNES music was handled with the SPC format (named after the audio chip used, the Sony SPC700). There were (and still are) specialized plugins on many systems to play these formats (they emulated only the music chip, not the rest of the system).

I’ve been involved on the periphery of some of these things for the past couple of years. (For instance, I had written a KFileMetaInfo plugin for KDE 3, and had helped Chris Lee with adding SPC playback support to GStreamer.)

One problem with the previous GStreamer solution (which I’ll call gst-spc) is that the underlying playback library, libopenspc, is written in x86 assembly and has some crash bugs associated with it. The code has also long been orphaned. I’m not really any good at writing emulation code, and although I could learn, it would take far too much time for me to do anything useful.

Luckily for me the state of the art has advanced and last year I was pointed to a library called game-music-emu. This library included a very good SPC emulator written in C++, which had been merged into some popular SNES emulators already. Unfortunately it didn’t really have a great build system (using it involved simply copying it into your existing program) so my initial proposal to port GStreamer to use game-music-emu by simply including the source files with GStreamer was rejected. The GStreamer devs preferred to have an external library which could be used (or not) and I couldn’t blame them since in general good OSS projects avoid copying or forking external code.

So I contacted the game-music-emu author (Blargg) asking about the possibility of adding support for building a library, and ended up with commit access and an invitation to do it myself. Hmm.

So I did, and a while ago I made a release of “libgme” 0.5.5, working with Blargg as he got free time. My subsequent patch to GStreamer was accepted, and since gst-plugins-bad-0.10.14 it has been possible to use libgme to play back many emulated music file types (not just SNES, but others as well).

With that solved I left the issue, but I recently came back to it when I figured out that even after upgrading to gst-plugins-bad-0.10.17 the other day, GStreamer playback was still not using libgme, but the older libopenspc.

At first I thought it was simply my fault, as I’d still had gst-spc installed from years and years ago. Removing gst-spc and libopenspc (just to be double-sure) left me with no SPC playback features. Running gst-inspect confirmed I did not have any gme decoder. WTF.
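(For the 0.10 series, checking for the plugin looks something like this:)

# inspect the gme plugin directly, or grep the full element listing
gst-inspect-0.10 gme
gst-inspect-0.10 | grep -i gme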

I then again thought it was my fault because I had installed libgme to /usr/local instead of /usr. So I dutifully wrapped up libgme in an ebuild and installed it. And still nothing. WTF.

I dug into the Gentoo ebuild for gst-plugins-bad and it seems that, for whatever reason, not all possible plugins are installed. Apparently the new approach is for each individual plugin to have its own ebuild (i.e. gst-plugins-gme), like how Gentoo has split out other packages such as KDE into individual ebuilds. Fair enough.

I wrote another ebuild, and finally hit paydirt:

Screenshot of music player playing SPC files
The Qt example music player playing SPC files

Obviously this does require that you use the GStreamer backend for Phonon; otherwise you can just try it in some other GStreamer-using application. (I’d show it in JuK, but I’d have to add SPC support to TagLib first.)

If you’re interested in the ebuilds I used you can use this Portage overlay (SHA-512 sum c0ff9aa5413b0c0b14f7c52d5b3ee887edc4e7bf47182e58c21e9c340d8ff7e9). The overlay may or may not work for you, and I don’t even know if overlays are still the “hip” way to do things in Gentoo, but It Works For Me. ;)

kdesvn-build git bug possibly fixed?

So if you’ve used kdesvn-build to build some of the modules that are hosted on Gitorious then you are probably familiar with an error that always comes up when doing the initial checkout. This error is so famous that every “how to build using kdesvn-build” guide I’ve seen over the past couple of months has mentioned that the clone step for qt-copy needs to be done manually.

A Konversation developer, argonel, noticed the issue the other day and got in touch with me, so I had him strace both the (successful) manual run and the (unsuccessful) kdesvn-build run. It wasn’t initially super helpful, although it clarified what was going on (the gitorious.org end of the connection was being closed for some reason).

That was the conclusion of that, but then I got an email the next day from argonel saying that he’d done more digging, and that it was a known issue which could be worked around by adding the -v flag to git, which forces progress output to be displayed even if the output is redirected. (The issue has something to do with kdesvn-build redirecting the git output; if you run the git command manually but redirect its stdout to a file, you’ll see the clone fail after about 30 seconds as well.)

This progress output makes the logged output look really bad, however, so the workaround I ended up implementing in kdesvn-build is to show the progress output on the terminal and redirect the rest (you may actually prefer this as it’s possible to see the progress of the checkout now).
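In shell terms the workaround amounts to something like this (a sketch with made-up paths; kdesvn-build does the equivalent from Perl):

# git's progress output goes to stderr, so redirecting only stdout keeps
# the progress meter on the terminal while the rest still lands in the log
git clone -v "$repository" "$checkout_dir" > "$log_dir/git-clone.log"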

In short, the Great kdesvn-build Git Clone Bug should be fixed. Please test the trunk version of kdesvn-build for me to make sure I got it though!