73

I am trying to compile a program that uses the udis86 library. It is the example program from the library's user manual, but compilation fails with these errors:

example.c:(.text+0x7): undefined reference to 'ud_init'
example.c:(.text+0x7): undefined reference to 'ud_set_input_file'
.
.
example.c:(.text+0x7): undefined reference to 'ud_insn_asm'

The command I am using is:

$ gcc -ludis86 example.c -o example 

as instructed in the user-manual.

Clearly, the linker is not able to link the udis86 library. But if I change my command to:

$ gcc example.c -ludis86 -o example 

it works. Can someone please explain what is wrong with the first command?

3 Answers

108

Because that's how the linking algorithm used by the GNU linker works (at least when it comes to linking static libraries). The linker is a single-pass linker and it does not revisit libraries once they have been seen.

A library is a collection (an archive) of object files. When you add a library using the -l option, the linker does not unconditionally take all object files from the library. It only takes those object files that are currently needed, i.e. files that resolve some currently unresolved (pending) symbols. After that, the linker completely forgets about that library.

The list of pending symbols is continuously maintained by the linker as it processes input object files, one after another from left to right. As it processes each object file, some symbols get resolved and removed from the list, while other, newly discovered unresolved symbols get added to it.

So, if you included some library by using -l, the linker uses that library to resolve as many currently pending symbols as it can, and then completely forgets about that library. If it later suddenly discovers that it now needs some additional object file(s) from that library, the linker will not "return" to that library to retrieve those additional object files. It is already too late.

For this reason, it is always a good idea to use the -l option late in the linker's command line, so that by the time the linker gets to that -l it can reliably determine which object files it needs and which it doesn't. Placing the -l option as the very first parameter to the linker generally makes no sense at all: at the very beginning the list of pending symbols is empty (or, more precisely, consists of the single symbol main), meaning that the linker will not take anything from the library at all.

In your case, your object file example.o contains references to the symbols ud_init, ud_set_input_file, etc. The linker should receive that object file first. It will add these symbols to the list of pending symbols. After that you can use the -l option to add your library: -ludis86. The linker will search your library and take everything from it that resolves those pending symbols.

If you place the -ludis86 option first on the command line, the linker will effectively ignore your library, since at the beginning it does not know that it will need ud_init, ud_set_input_file, etc. Later, when processing example.o, it will discover these symbols and add them to the pending symbol list. But these symbols will remain unresolved to the end, since -ludis86 was already processed (and effectively ignored).

Sometimes, when two (or more) libraries refer to each other in circular fashion, one might even need to use the -l option twice with the same library, to give the linker two chances to retrieve the necessary object files from that library.

jww
AnT
  • 16
    It's not just a GNU thing. This is the standard, POSIX-required behavior: *-l library Search the library named liblibrary.a. A library shall be searched when its name is encountered, so the placement of a -l option is significant. Several standard libraries can be specified in this manner, as described in the EXTENDED DESCRIPTION section. Implementations may recognize implementation-defined suffixes other than .a as denoting libraries.* See http://pubs.opengroup.org/onlinepubs/9699919799/utilities/c99.html – R.. GitHub STOP HELPING ICE Aug 10 '12 at 02:37
  • 3
    @R.. This begs the question: why does the standard require this behavior? Is there some advantage to this approach? Other toolchains like MSVC and Borland don't follow it and work just fine. In many ways that seems better, since it's less error-prone for users of the tool. – greatwolf Aug 04 '13 at 01:44
  • 1
    @greatwolf: MSVC is just about the opposite of "works just fine" when it comes to C. Anyway, the motivation for the order mattering is that the same symbols might be defined in more than one library, in which case you want to be able to control which one gets used. – R.. GitHub STOP HELPING ICE Aug 04 '13 at 02:15
  • 1
    My impression is that this is not only a static library issue. If you explicitly specify -l:libwhatever.so, for example, the undefined reference linker error persists as long as the -l:libwhatever.so token occurs earlier in the gcc command than the object_file.o token. – alexandre iolov Dec 27 '14 at 13:16
  • Your explanation is very clear. Much better than the official doc. They only explain what instead of why. Thank you very much. – Zhang LongQI Nov 17 '17 at 03:48
  • 3
    You might want to add a paragraph on groups in GNU's `ld`. See `--start-group` and `--end-group` in the [`ld(1)` man page](https://linux.die.net/man/1/ld). It effectively tells the linker to revisit archives in the group. – jww Feb 18 '18 at 07:11
  • Note the `lld` linker of the LLVM project [does not implement](https://lld.llvm.org/NewLLD.html#key-concepts) the POSIX behaviour (see @R..'s comment): _"Here is how LLD approaches the problem. Instead of memorizing only undefined symbols, we program LLD so that it memorizes all symbols. When it sees an undefined symbol that can be resolved by extracting an object file from an archive file it previously visited, it immediately extracts the file and link it. It is doable because LLD does not forget symbols it have seen in archive files."_ – nh2 Jul 21 '18 at 01:49
  • 1
    @nh2: That probably breaks certain intentional use of the standard behavior. Is there any option to disable it? – R.. GitHub STOP HELPING ICE Jul 21 '18 at 13:40
9

I hit this same issue a while back. The bottom line is that the GNU tools won't always "search back" in the library list to resolve missing symbols. Any of the following is an easy fix:

  1. Just specify the libs and objs in the dependency order (as you have discovered above)

  2. OR if you have a circular dependency (where libA references a function in libB, but libB references a function in libA), then just specify the libs on the command line twice. This is what the manual page suggests as well. E.g.

    gcc foo.c -lfoo -lbar -lfoo
    
  3. Use the -( and -) params to specify a group of archives that have such circular dependencies. Look at the GNU linker manual for --start-group and --end-group.

When you use option 2 or 3, you're likely introducing a performance cost for linking. If you don't have that much to link, it may not matter.

Community
selbie
4

Or use rescan

From page 41 of the Oracle Solaris 11.1 Linkers and Libraries Guide:

Interdependencies between archives can exist, such that the extraction of members from one archive must be resolved by extracting members from another archive. If these dependencies are cyclic, the archives must be specified repeatedly on the command line to satisfy previous references.

$ cc -o prog .... -lA -lB -lC -lA -lB -lC -lA 

The determination, and maintenance, of repeated archive specifications can be tedious.

The -z rescan-now option makes this process simpler. The -z rescan-now option is processed by the link-editor immediately when the option is encountered on the command line. All archives that have been processed from the command line prior to this option are immediately reprocessed. This processing attempts to locate additional archive members that resolve symbol references. This archive rescanning continues until a pass over the archive list occurs in which no new members are extracted. The previous example can be simplified as follows.

$ cc -o prog .... -lA -lB -lC -z rescan-now 

Alternatively, the -z rescan-start and -z rescan-end options can be used to group mutually dependent archives together into an archive group. These groups are reprocessed by the link-editor immediately when the closing delimiter is encountered on the command line. Archives found within the group are reprocessed in an attempt to locate additional archive members that resolve symbol references. This archive rescanning continues until a pass over the archive group occurs in which no new members are extracted. Using archive groups, the previous example can be written as follows.

$ cc -o prog .... -z rescan-start -lA -lB -lC -z rescan-end
flerb