Tracking down a library bug

So today I had noticed I had build failures in quite a few modules that were based on errors linking to libkipi, involving undefined references to KIPI::ImageInfoShared::~ImageInfoShared().

Normally fixing this is as easy as ensuring that the library which provides the symbol has been updated, built, and installed and then running kdesrc-build --refresh-build on the modules that had failed. In this case it didn’t work though. Hmm…

Looking at my log/latest/gwenview/error.log showed that it was the same build failure, so I went to the affected build directory and ran make VERBOSE=1, which shows the command line involved.

The output was a whole lot of something like:

/usr/lib/ccache/bin/g++   -march=native -pipe removed .o files
/home/kde-svn/kde-4/lib/libkfile.so.4.7.0
../lib/libgwenviewlib.so.4.7.0 /home/kde-svn/kde-4/lib/libkio.so.5.7.0
/home/kde-svn/kde-4/lib64/libkipi.so more removed stuff

Had I been paying close attention I may have noticed the actual problem right here, but in the event I merely noticed that libkipi had no version number referenced, only the libkipi.so.

The next step for me was to try and figure out why the symbol wasn’t defined, but first I wanted to make sure that the symbol wasn’t defined, which can be accomplished using the nm tool.

The output of nm lib64/libkipi.so needs to be filtered to make it useful. I ended up just grepping for the mangled symbol name but you can unmangle the symbol names and grep for that as well. After running nm lib64/libkipi.so | grep ImageInfoShared I saw that the destructor was actually defined three times!

$ nm  /home/kde-svn/kde-4/lib64/libkipi.so | grep _ZN4KIPI15ImageInfoSharedD
0000000000014728 t _ZN4KIPI15ImageInfoSharedD0Ev
00000000000146e0 t _ZN4KIPI15ImageInfoSharedD1Ev
00000000000146e0 t _ZN4KIPI15ImageInfoSharedD2Ev

Two of the destructor names pointed to the same address so there was only two different functions, but why were there even 2? Looking into this further revealed that the different destructors are actually defined in the C++ ABI implemented by gcc, specifically that:

  • The D0 destructor is used as the deleting destructor.
  • The D1 destructor is used as the complete object destructor
  • The D2 destructor is used as the base object destructor.

The D1 destructor is presumably used as a shortcut when the compiler is able to determine the actual class hierarchy of a given object and can therefore inline the destructors together into a “full destructor”, which the D2 destructor would be used when the ancestry is unclear and therefore the full C++ destruction chain is run. Neither of these would deallocate memory though, which is why the separate D0 destructor is needed (which is presumably otherwise equivalent to D2, but that’s just me guessing).

Either way, the destructors were actually just normal operation of the compiler. All the bases appeared to be covered, t from the nm output means that the symbol is defined in the text section, which means it should be available, right?

As it turns out I had to read the nm manpage closer… lowercase symbol categories imply that the symbol is local to that library, or in other words that it does not participate in symbol linking amongst other shared objects. That would explain why gcc seemingly couldn’t find it.

But why was the symbol local instead of global? As far as this goes, I’m still honestly not sure. It turns out that LIBKIPI_EXPORT was defined to expand to KDE_IMPORT instead of KDE_EXPORT. But on Linux both of those change the visibility of the affected symbol to be exported (KDE_IMPORT makes more sense on the Windows platform, but is essentially the default on Linux already). So although this appeared to be the issue, it was actually not a concern.

However, playing around with that #define and recompiling libkipi made me realize that the affected library didn’t appear to have changed after I ran make install… which turned out to be due to my KDE libraries getting installed to $HOME/kde-4/lib, but libkipi was in $HOME/kde-4/lib<b>64</b>, and was getting picked up by CMake and FindKipi.cmake somehow.

Perhaps I hit a transient buildsystem bug, perhaps it was something else. But removing the stray lib64 directory and rebuilding the affected modules appears to have fixed everything. At least I learned the reason there’s up to 3 different destructor symbol names for a C++ class, I guess. ;)