This post gives a fairly thorough breakdown of the contents of a "Hello World" executable on OS X 10.13.3 (High Sierra). The source used to generate the executable is as follows:
#include <stdio.h>

int main()
{
    printf("Hello World!\n");
    return 0;
}
You might find useful information here if you
are curious about how an executable is structured on a modern *nix OS,
need to manipulate object files in the Mach-O format, or
are interested in the inner workings of dynamic linking on OS X.
Official documentation for the Mach-O object file format is sparse, and much of the unofficial documentation available – while still very valuable – is out of date in crucial respects. For example, Z. Liu's minimal Mach-O executable doesn't work on recent OS X versions. Aidan Steele's useful guide leaves out some crucial features of modern Mach-O executables.
I assume that you're familiar with basic concepts from low-level programming (pointers, memory addresses, registers, the stack, etc.). No detailed knowledge of x86-64 assembly is required. However, it would be helpful to have a rough idea of what the MOV, JMP, CALL and LEA instructions do.
The dynamic linker makes extensive use of LEB128 encoding. Briefly, LEB128 encodes integer values of arbitrary size as variable-length sequences of bytes. Every byte except the last has its most significant bit set; the last byte has it clear. The integer encoded is given by concatenating the lowest 7 bits of each byte, with the first byte supplying the least significant bits (little-endian).
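To make the encoding concrete, here is a minimal sketch of an unsigned LEB128 decoder. The function name `read_uleb128` and its error convention are my own choices for illustration, not something taken from dyld.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one unsigned LEB128 value starting at buf[*pos].
 * On success, stores the value in *out, advances *pos past the
 * encoded bytes and returns 0. Returns -1 if the input ends early
 * or the encoding is too long for a 64-bit value. */
int read_uleb128(const uint8_t *buf, size_t len, size_t *pos, uint64_t *out)
{
    uint64_t result = 0;
    unsigned shift = 0;

    while (*pos < len) {
        uint8_t byte = buf[(*pos)++];
        if (shift >= 64)
            return -1;                    /* too many bytes for a uint64_t */
        result |= (uint64_t)(byte & 0x7f) << shift;
        if ((byte & 0x80) == 0) {         /* high bit clear: this is the last byte */
            *out = result;
            return 0;
        }
        shift += 7;
    }
    return -1;                            /* ran off the end of the buffer */
}
```

For example, the two bytes E0 1E, which appear in the export trie of our executable, decode to 0x60 | (0x1E << 7) = 0xF60 = 3936.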
OS X on x86-64 uses the System V calling conventions. The details of these conventions are not relevant here, but it would be helpful to skim Wikipedia's description. The important things to bear in mind are that (i) not all arguments to a function are passed on the stack and that (ii) there is a convention for determining which arguments go in which registers.
Mach-O executables generated by the standard Xcode tools have a zero page to ensure that dereferencing of a null pointer is trapped by the OS.
OS X has the following tools for dumping the contents of Mach-O files:
List segments/sections:
otool -l a.out
Show dyld opcodes (location of dyldinfo varies with Xcode version):
/Library/Developer/CommandLineTools/usr/bin/dyldinfo -opcodes a.out
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/dyldinfo -opcodes a.out
jtool is a cross-platform alternative: http://www.newosxbook.com/tools/jtool.html
x86-64 has a number of instruction-pointer-relative addressing modes. Roughly speaking, any instruction that takes an address operand can also take a memory address specified as a signed 32-bit offset from the value of RIP, the instruction pointer register. The addition of RIP-relative addressing reduces the cost of position-independent code in terms of code size and performance.
As an example, take the following jmp instruction. The address to jump to is stored in a given location in memory. The address of this memory location is specified not in absolute terms, but relative to the value that RIP has following decoding of the jmp instruction. As the offset specified is 0x61, and the jmp instruction itself occupies 6 bytes, the target is the address stored at the address of the first byte of the jmp instruction plus 0x67.
jmp QWORD PTR [rip+0x61] # jump to address of this instruction + 0x67
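The arithmetic can be written down as a small helper. This is only a sketch; the function name `rip_relative_target` is hypothetical.

```c
#include <stdint.h>

/* Address referenced by a RIP-relative operand.
 *   insn_addr: address of the first byte of the instruction
 *   insn_len:  encoded length of the instruction in bytes
 *   disp32:    the signed 32-bit displacement from the encoding */
uint64_t rip_relative_target(uint64_t insn_addr, unsigned insn_len, int32_t disp32)
{
    /* RIP already points at the next instruction when the operand is resolved,
     * so the instruction length is added before the displacement. */
    return insn_addr + insn_len + (uint64_t)(int64_t)disp32;
}
```

With a 6-byte jmp and a displacement of 0x61, the referenced memory location is 0x67 bytes past the start of the instruction. Note that this gives the location of the pointer, not the jump destination itself: the destination is whatever address is stored there.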
On OS X, or any other modern general purpose operating system, an executable never knows at which address it is going to be loaded, or at which addresses any shared libraries it makes use of are going to be loaded. Any absolute addresses contained in the executable must therefore be translated prior to execution. The use of RIP-relative addressing ensures that relatively few of these relocations need to be performed by the dynamic linker. For a more general introduction to linkers and the concept of relocations, I recommend Ian Lance Taylor's series of blog posts (post 6 in particular).
A Mach-O file consists of:
A Mach64 header.
A sequence of load commands, some of which specify the size and location of code and data segments.
Data for the segments described in the load commands.
Segments can be split into multiple sections. This document does not cover the format of the header or the load commands. This information is collected in Aidan Steele's guide.
If you're on OS X, the MachOView utility is useful for browsing the structure of Mach-O files.
The overall layout of our example executable is as follows.
The Mach-O header holds no surprises. It specifies the architecture, the number of load commands, and the size of the load commands. Details here.
The LC_SEGMENT_64 command for __PAGEZERO loads a segment that occupies the first 4GB of the process's memory, but that takes up no space in the executable file. The essential purpose of the __PAGEZERO segment is to ensure that null pointer dereferences are trapped. This is achieved by ensuring that no protection rights are assigned to the segment: it is neither readable, writable nor executable. See this StackOverflow question for discussion of why the (virtual) size of this segment is so large.
The LC_SEGMENT_64 command for __TEXT loads the text segment, which is split into multiple sections.
The __text section contains the code for the main function.
The code in the __stubs and __stub_helper sections is crucially involved in calls to dynamically linked functions. Dynamic linking is covered in more detail in the subsequent section ‘Lazy vs. non-lazy symbol binding’.
The __cstring section contains zero-terminated C string constants. In our executable there is one such constant: "Hello World!\n".
From the man page for the unwinddump utility:
When a C++ (or x86_64 Objective-C) exception is thrown, the runtime must unwind the stack looking for some function to catch the exception. Traditionally, the unwind information is stored in the __TEXT/__eh_frame section of each executable as Dwarf CFI (call frame information). Beginning in Mac OS X 10.6, the unwind information is also encoded in the __TEXT/__unwind_info section using a two-level lookup table of compact unwind encodings.
The comment on __TEXT/__eh_frame is somewhat out of date, as the current Xcode tools omit this section. This appears to be a relatively recent change (see e.g. this blog post and this LLVM mailing list post from 2014).
In a more complex executable, the __DATA segment contains actual program data. In our executable, it contains only lazy and non-lazy symbol pointers (in the sections __la_symbol_ptr and __nl_symbol_ptr respectively). These symbol pointers are involved in calling dynamically linked functions. Details on how this works are in the section ‘Lazy vs. non-lazy symbol binding’.
The __LINKEDIT segment contains data interpreted by the dynamic linker. Its internal structure is specified in an additional DYLD_INFO_ONLY load command.
The DYLD_INFO_ONLY load command specifies the internal structure of the __LINKEDIT segment. In particular, it gives the offset and size of
some bytecode interpreted by OS X's dynamic linker, and
the symbol export trie.
The LC_SYMTAB load command gives the location of the symbol table, which is a list of nlist_64 structures, and of the string table. The symbol table is included in modern executables largely for legacy reasons, and the executable will in fact run successfully with it removed. The string table referenced by this load command is, however, still used.
The LC_DYSYMTAB load command specifies the offset of the indirect symbol table. The indirect symbol table is a list of indices into the symbol table. The following fields of the same load command categorize symbols by specifying ranges of the symbol table:
unsigned long ilocalsym; /* index to local symbols */
unsigned long nlocalsym; /* number of local symbols */
unsigned long iextdefsym; /* index to externally defined symbols */
unsigned long nextdefsym; /* number of externally defined symbols */
unsigned long iundefsym; /* index to undefined symbols */
unsigned long nundefsym; /* number of undefined symbols */
Our executable has no local symbols, so ilocalsym and nlocalsym are 0. It defines two external symbols (__mh_execute_header and _main), so iextdefsym is 0 and nextdefsym is 2. There are two undefined external symbols (_printf and dyld_stub_binder), so nundefsym is 2 and iundefsym is 2 (because the first two entries in the symbol table are those for __mh_execute_header and _main). The role of dyld_stub_binder is discussed in more detail in the section ‘Lazy vs. non-lazy symbol binding’.
The LC_LOAD_DYLINKER load command is very simple: it just specifies the location of the dynamic linker, /usr/lib/dyld.
The LC_UUID load command specifies a unique identifier for the executable.
The LC_VERSION_MIN_MACOSX load command specifies the minimum version of OS X compatible with the executable (10.13.0).
The LC_SOURCE_VERSION load command specifies the version of the source code used to generate the executable. In our executable, this has the default value of 0.0.
The LC_MAIN load command gives the offset of the _main function in the file (3936). In our executable, _main is at the beginning of the __text section of the __TEXT segment.
There is one LOAD_DYLIB load command for every library to which the executable is dynamically linked. In our executable, the only such library is libc, /usr/lib/libSystem.B.dylib.
The LC_FUNCTION_STARTS load command gives the offset and size of the function starts data. Mark Rowe explains on Stack Overflow that this data is
... used by tools that need to symbolicate addresses in crash logs, samples, spindumps, etc. to determine if a given address falls inside a function. It could also be useful to debuggers to help them more quickly find the bounds of the function that a given address is within.
The data within this section is formatted as a zero-terminated sequence of DWARF-style ULEB128 values. The first value is the offset from the start of the __TEXT segment to the start of the first function. The remaining values are offsets to the start of the next function from the previous function.
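The following sketch decodes such a sequence into offsets from the start of the __TEXT segment. The function name `decode_function_starts` and its interface are hypothetical, and the ULEB128 decoding is inlined to keep the example self-contained.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode function starts data into offsets from the start of __TEXT.
 * Writes at most max_out offsets to out and returns the number of
 * function starts found in the data. */
size_t decode_function_starts(const uint8_t *data, size_t len,
                              uint64_t *out, size_t max_out)
{
    size_t pos = 0, count = 0;
    uint64_t offset = 0;

    while (pos < len) {
        /* Inline ULEB128 decode of the next delta. */
        uint64_t delta = 0;
        unsigned shift = 0;
        uint8_t byte;
        do {
            byte = data[pos++];
            if (shift < 64)
                delta |= (uint64_t)(byte & 0x7f) << shift;
            shift += 7;
        } while ((byte & 0x80) && pos < len);

        if (byte & 0x80)
            break;              /* input ended in the middle of a value */
        if (delta == 0)
            break;              /* a zero value terminates the sequence */

        offset += delta;        /* deltas accumulate into absolute offsets */
        if (count < max_out)
            out[count] = offset;
        count++;
    }
    return count;
}
```

Assuming the data for an executable with a single function at offset 3936 is E0 1E 00, the decoder yields the single offset 3936.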
The LC_DATA_IN_CODE load command specifies the offset and size of a table that records the locations of certain pieces of data that are inlined in the __TEXT segment. This table is empty in our example executable. When present, the format appears to be simply a list of data_in_code_entry structs. See the LLVM source and the entry for this struct in the LLVM docs.
Following the Mach-O header, the contents of the executable are as follows:
__TEXT segment
__stubs section
__stub_helper section
__cstring section
__unwind_info section
__DATA segment
__nl_symbol_ptr section
__la_symbol_ptr section
Our Hello World executable is dynamically linked against libc. By default, dynamically bound symbols like printf are bound lazily. That is, printf is not bound when the executable is loaded, but only when the first call to printf is made.
The basic concept of how this works is simple. For each dynamically bound symbol the executable stores a function pointer. This function pointer initially points to a ‘stub’. The stub calls the dynamic linker and asks it to look up the address of the relevant function. The function pointer is then overwritten with the function's address. As a result, subsequent function calls proceed directly.
Things are slightly more complex than this in practice because the stub is split into a stub proper and a stub helper. The stub proper always consists of a single jmp instruction. Initially, this jump targets the stub helper. The stub helper then calls the dynamic linker.
In fact, things are more complex still because the stub helper is itself decomposed into a stub helper and a ‘stub binding helper’. The reason for this decomposition is that the dyld_stub_binder function, which is called by each stub helper, requires two arguments. One of these arguments is different for each dynamically bound symbol; the other is the same. The stub binding helper pushes the constant argument onto the stack. The stub helper pushes the varying argument onto the stack and then jumps to the stub binding helper. Unlike a regular C function, dyld_stub_binder does not follow the System V calling conventions and takes both of its arguments on the stack.
Initial state: the lazy symbol pointer for printf points to the stub helper for printf.
First function call:
The call to printf compiles down to a call to the associated stub. The arguments to printf are moved into registers prior to this call. The stub itself has no arguments.
The stub jumps to the address stored in the lazy symbol pointer, thereby entering the stub helper.
The stub helper calls the dynamic linker. The dynamic linker overwrites the lazy symbol pointer with the address of the printf function itself. All arguments to dyld_stub_binder are passed on the stack, so the arguments to printf aren't clobbered.
The dynamic linker jumps to printf.
Subsequent function calls: the stub jumps through the lazy symbol pointer directly to printf itself.
The lazy symbol pointer section is 8 bytes long, and thus contains one lazy symbol pointer. This is what we might expect given that our executable calls a single library function, printf. Its contents are as follows:
A0 0F 00 00 01 00 00 00
The most significant non-zero byte (the 01 in the fifth position of this little-endian value, 0x100000FA0) is present because page zero occupies the first 4GB of the executable's address space. The corresponding file offset is therefore 0x0FA0 = 4000. The __stub_helper section starts at 3984 = 0xf90. As we will see shortly, the first stub helper is 16 bytes long, and 0xf90 + 16 = 4000. Thus, the lazy symbol pointer points to the second stub helper. This is because, as mentioned in the previous section, the first stub helper is a special one that is called by all of the other stub helpers rather than the helper for a specific function.
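The conversion from the raw bytes of the pointer to a file offset can be sketched as follows. Both helper names are hypothetical, and the address-to-offset conversion assumes the layout of this particular executable, where the file is mapped contiguously right after a 4GB page zero.

```c
#include <stdint.h>

/* Size of __PAGEZERO in this executable: the first 4GB of the address space. */
#define PAGEZERO_SIZE 0x100000000ULL

/* Assemble a little-endian 64-bit value from 8 raw bytes. */
uint64_t read_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--)
        v = (v << 8) | p[i];
    return v;
}

/* Convert a virtual address to a file offset. Assumption: the file is
 * mapped contiguously starting at PAGEZERO_SIZE, as in our example. */
uint64_t vmaddr_to_file_offset(uint64_t vmaddr)
{
    return vmaddr - PAGEZERO_SIZE;
}
```

Applied to the eight bytes of the lazy symbol pointer, `read_le64` gives 0x100000FA0, and `vmaddr_to_file_offset` then gives 0xFA0 = 4000, which is 16 bytes past the start of __stub_helper.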
The __stub_helper section starts at 0xf90 and has size 0x1a=26. It disassembles as follows:
0: 4c 8d 1d 71 00 00 00 lea r11,[rip+0x71] # 0x78
7: 41 53 push r11
9: ff 25 61 00 00 00 jmp QWORD PTR [rip+0x61] # 0x70
f: 90 nop
10: 68 00 00 00 00 push 0x0
15: e9 e6 ff ff ff jmp rip-0x1a
The nop instruction pads the first stub helper to 16 bytes.
The first three instructions form the stub binding helper, dyld_stub_binding_helper, which is different in form from the subsequent stub helpers. The address jumped to in the third instruction is the address stored in the memory location at 0x100000f90 + 9 + 6 + 0x61 (where 9 is the offset of the jmp instruction within the section and 6 is the size of the jmp instruction itself). The resulting address is 0x100001000, which corresponds to offset 0x1000=4096 in the file. This is the start of the __nl_symbol_ptr section. Thus, the address jumped to is the address pointed to by the first non-lazy symbol pointer. The __nl_symbol_ptr section is zeroed out in the file, but when the executable is loaded, the relevant entry is non-lazily set to point to dyld_stub_binder.
The value loaded into R11 in the snippet above is the address of the ImageLoader cache (see dyld_stub_binder.s). In a regular executable, where the only non-lazily loaded symbol is dyld_stub_binder, and the __nl_symbol_ptr section is 16 bytes long, the address of the ImageLoader cache is the starting address of __nl_symbol_ptr plus 8. I don't know exactly what the ImageLoader cache is, or how the internals of this work.
The last two instructions in the listing above form the sole ordinary stub helper in this executable. In a larger executable, there would be a long sequence of stub helpers like this. Each ordinary stub helper pushes a dyld bytecode offset onto the stack and then jumps to dyld_stub_binding_helper (which in turn calls dyld_stub_binder). There is no padding between ordinary stub helpers.
__stubs starts at 0xf8a, has a length of 6 bytes and disassembles as follows:
ff 25 80 00 00 00 jmp QWORD PTR [rip+0x80] # 0x86
This is the stub for printf. The operand to jmp is the memory address (specified via a RIP-relative offset) of the first (and in our executable, only) lazy pointer in __la_symbol_ptr. It's important not to get confused into thinking that this is a jump specified via a simple relative offset. Rather, it is a jump to the memory location stored in the relevant lazy symbol pointer. It is the location of the lazy symbol pointer that is specified in a RIP-relative manner.
The string table is simply a sequence of null-terminated strings. In our Hello World executable it starts at 8376=0x20b8:
20 00 _ _ m h _ e x e c u t e _ h e a
d e r 00 _ m a i n 00 _ p r i n t f 00
d y l d _ s t u b _ b i n d e r 00
Because zero offsets into the string table have a special meaning, the first entry is a dummy. The convention appears to be to use the string " " for this purpose.
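Looking up a name in the string table is just pointer arithmetic plus a bounds check. A minimal sketch, with a hypothetical function name:

```c
#include <stddef.h>
#include <string.h>

/* Return the string that starts at index strx of a string table, or NULL
 * if strx is out of range or the string is not terminated inside the table. */
const char *strtab_lookup(const char *strtab, size_t size, size_t strx)
{
    if (strx >= size)
        return NULL;
    if (memchr(strtab + strx, '\0', size - strx) == NULL)
        return NULL;
    return strtab + strx;
}
```

With the table shown above, index 2 yields "__mh_execute_header" and index 22 yields "_main". These indices are the n_strx values stored in the symbol table.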
The symbol table is a list of nlist_64 structs:
// Size: 16 bytes
struct nlist_64 {
union { uint32_t n_strx; } n_un;
uint8_t n_type;
uint8_t n_sect;
uint16_t n_desc;
uint64_t n_value;
};
In our executable the symbol table has offset 8296=0x2068 and contains 4 symbols, thus 64 bytes. Its contents are as follows:
(one nlist_64 per line)
02 00 00 00 0F 01 10 00 00 00 00 00 01 00 00 00
16 00 00 00 0F 01 00 00 60 0F 00 00 01 00 00 00
1C 00 00 00 01 00 00 01 00 00 00 00 00 00 00 00
24 00 00 00 01 00 00 01 00 00 00 00 00 00 00 00
See Aidan Steele's guide for more details on these fields.
The first nlist_64:
n_strx = 2 the index of the string "__mh_execute_header" in the
string table
n_type = 0x0F N_SECT | N_EXT (N_SECT means that n_sect gives the
section number in this file where the symbol is
defined)
n_sect = 1
n_desc = 0x0010 REFERENCED_DYNAMICALLY
n_value = 0x100000000 the address of the symbol (this = size of page zero
in the case of __mh_execute_header)
The second nlist_64:
n_strx = 22 the index of the string "_main" in the string table
n_type = 0x0F N_SECT | N_EXT
n_sect = 1
n_desc = 0x0000
n_value = 0x100000F60 the beginning of the __text section.
The third nlist_64:
n_strx = 28 the index of the string "_printf" in the string table
n_type = 0x01 N_EXT (symbol not defined in this file)
n_sect = 0 dummy value (because N_SECT not set in n_type)
n_desc = 0x0100 library ordinal 1 in the high byte (the symbol comes from libSystem); the reference type bits are 0
n_value = 0 dummy value (because not defined in this file)
The fourth nlist_64:
n_strx = 36 the index of the string "dyld_stub_binder" in the
string table
n_type = 0x01 N_EXT (symbol not defined in this file)
n_sect = 0 dummy value (because N_SECT not set in n_type)
n_desc = 0x0100 library ordinal 1 in the high byte (the symbol comes from libSystem); the reference type bits are 0
n_value = 0 dummy value (because not defined in this file)
The indirect symbol table is a sequence of 32-bit values. Each value is an index into the symbol table. The purpose of the indirect symbol table is to record which symbol is associated with each
stub,
non-lazy symbol pointer, and
lazy symbol pointer.
The indices in a given section of the indirect symbol table are in the same order as the stubs / non-lazy symbol pointers / lazy symbol pointers. So for example, to find the symbol associated with the second lazy symbol pointer, we
add 1 (the zero-based position of the second pointer) to the specified offset into the indirect symbol table,
look up the index at this offset into the indirect symbol table, then
go to the entry in the symbol table at the resulting index.
In our executable, the indirect symbol table starts at 8360=0x20a8 and has a length of 4 * sizeof(uint32) = 16 bytes. Its contents are as follows:
02 00 00 00 | 03 00 00 00 | 00 00 00 40 | 02 00 00 00
These values have the following interpretations:
Index into indirect    Index into symtab
0                      2           --> _printf
1                      3           --> dyld_stub_binder
2                      0x40000000  --> INDIRECT_SYMBOL_ABS (no symbol table entry)
3                      2           --> _printf
Offsets into the indirect symbol table are as follows:
__stubs 0
__nl_symbol_ptr 1
__la_symbol_ptr 3
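The lookup can be sketched as below. The function name `indirect_symbol_index` is hypothetical; the two marker constants have the values defined in <mach-o/loader.h> and are repeated here so that the sketch compiles on any platform.

```c
#include <stddef.h>
#include <stdint.h>

/* Special indirect symbol table values, as in <mach-o/loader.h>. */
#define INDIRECT_SYMBOL_LOCAL 0x80000000u
#define INDIRECT_SYMBOL_ABS   0x40000000u

/* Find the symbol table index for the n-th entry (zero-based) of a
 * section whose entries start at section_start in the indirect symbol
 * table. Returns -1 if the slot is out of range or carries no symbol. */
int64_t indirect_symbol_index(const uint32_t *indirect, size_t count,
                              uint32_t section_start, uint32_t n)
{
    size_t slot = (size_t)section_start + n;
    if (slot >= count)
        return -1;

    uint32_t value = indirect[slot];
    if (value & (INDIRECT_SYMBOL_LOCAL | INDIRECT_SYMBOL_ABS))
        return -1;              /* special marker, not a symbol table index */
    return value;
}
```

For our executable, the first lazy symbol pointer (section_start 3, n 0) maps to symbol table index 2, which is _printf, while the second non-lazy pointer (section_start 1, n 1) has no associated symbol.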
The dynamic linker is called via dyld_stub_binder. The arguments of this function do not directly specify which symbols to bind. Instead, dyld_stub_binder is given an offset into a special bytecode segment within the executable that is interpreted by the dynamic linker.
The bytecode for the dynamic linker is split into four sections:
rebase info
binding info
lazy binding info
export info
We can disassemble the dynamic linker bytecode using dyldinfo:
dyldinfo -opcodes a.out
This gives the following result:
rebase opcodes:
0x0000 REBASE_OPCODE_SET_TYPE_IMM(1)
0x0001 REBASE_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB(2, 0x00000010)
0x0003 REBASE_OPCODE_DO_REBASE_IMM_TIMES(1)
0x0004 REBASE_OPCODE_DONE()
binding opcodes:
0x0000 BIND_OPCODE_SET_DYLIB_ORDINAL_IMM(1)
0x0001 BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM(0x00, dyld_stub_binder)
0x0013 BIND_OPCODE_SET_TYPE_IMM(1)
0x0014 BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB(0x02, 0x00000000)
0x0016 BIND_OPCODE_DO_BIND()
0x0017 BIND_OPCODE_DONE
no compressed weak binding info
lazy binding opcodes:
0x0000 BIND_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB(0x02, 0x00000010)
0x0002 BIND_OPCODE_SET_DYLIB_ORDINAL_IMM(1)
0x0003 BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM(0x00, _printf)
0x000C BIND_OPCODE_DO_BIND()
0x000D BIND_OPCODE_DONE
0x000E BIND_OPCODE_DONE
0x000F BIND_OPCODE_DONE
Each opcode is a single byte. The most significant four bits identify the opcode. Some opcodes allow an immediate value to be stored in the least significant 4 bits. For example, REBASE_OPCODE_SET_TYPE_IMM(1) is encoded as 0x10 | 0x01. Other opcodes can have immediate values following them. These immediate values are typically either LEB-encoded integer values or zero-terminated strings.
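As an illustration of the encoding, here is a sketch of an interpreter for the handful of rebase opcodes that appear in the listing above. The opcode values are those defined in <mach-o/loader.h>; the `rebase_site` struct, the function names, and the fixed 8-byte pointer size are assumptions made for this example. Any opcode outside this subset is reported as an error rather than handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Opcode values as in <mach-o/loader.h>, repeated so this compiles anywhere. */
#define REBASE_OPCODE_MASK                        0xF0
#define REBASE_IMMEDIATE_MASK                     0x0F
#define REBASE_OPCODE_DONE                        0x00
#define REBASE_OPCODE_SET_TYPE_IMM                0x10
#define REBASE_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB 0x20
#define REBASE_OPCODE_DO_REBASE_IMM_TIMES         0x50

#define POINTER_SIZE 8   /* assumption: 64-bit pointers */

struct rebase_site { uint8_t type; uint8_t segment; uint64_t offset; };

static uint64_t read_uleb128(const uint8_t *buf, size_t len, size_t *pos)
{
    uint64_t value = 0;
    unsigned shift = 0;
    while (*pos < len) {
        uint8_t byte = buf[(*pos)++];
        if (shift < 64)
            value |= (uint64_t)(byte & 0x7f) << shift;
        shift += 7;
        if ((byte & 0x80) == 0)
            break;
    }
    return value;
}

/* Collect the locations to rebase. Returns the number of sites found,
 * or -1 on an opcode that this sketch does not handle. */
int decode_rebases(const uint8_t *code, size_t len,
                   struct rebase_site *out, size_t max_out)
{
    uint8_t type = 0, segment = 0;
    uint64_t offset = 0;
    size_t pos = 0;
    int count = 0;

    while (pos < len) {
        uint8_t byte = code[pos++];
        uint8_t imm = byte & REBASE_IMMEDIATE_MASK;   /* low four bits */

        switch (byte & REBASE_OPCODE_MASK) {          /* high four bits */
        case REBASE_OPCODE_DONE:
            return count;
        case REBASE_OPCODE_SET_TYPE_IMM:
            type = imm;
            break;
        case REBASE_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB:
            segment = imm;
            offset = read_uleb128(code, len, &pos);
            break;
        case REBASE_OPCODE_DO_REBASE_IMM_TIMES:
            for (uint8_t i = 0; i < imm; i++) {
                if ((size_t)count < max_out) {
                    out[count].type = type;
                    out[count].segment = segment;
                    out[count].offset = offset;
                }
                count++;
                offset += POINTER_SIZE;   /* consecutive pointers */
            }
            break;
        default:
            return -1;
        }
    }
    return count;
}
```

The rebase opcodes of our executable are encoded as the five bytes 11 22 10 51 00, which this sketch decodes to a single pointer-typed site at offset 0x10 of segment 2.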
It's easy enough to find the encoding of each opcode by googling and/or looking at the headers, so I won't list them here.
A lazy symbol pointer starts out pointing to the address of a stub helper. The address of this helper changes following relocation of the program. Thus, each lazy symbol pointer must be rebased when the program is loaded. We have a single lazy symbol pointer (for _printf), so the rebase opcodes section describes a single rebase. The REBASE_OPCODE_SET_SEGMENT_AND_OFFSET_ULEB opcode specifies the index of the load command for the data segment (counting from zero) and an offset of 0x10 into this segment, which is the start of the __la_symbol_ptr section. Setting the type to 1 specifies that the entity being rebased is a pointer. REBASE_OPCODE_DO_REBASE_IMM_TIMES is used to rebase a contiguous sequence of pointers using a single command. Thus, if our program called three functions in libc rather than one, REBASE_OPCODE_DO_REBASE_IMM_TIMES(1) would become REBASE_OPCODE_DO_REBASE_IMM_TIMES(3).
The binding opcodes section contains the command to non-lazily bind dyld_stub_binder. BIND_OPCODE_SET_DYLIB_ORDINAL_IMM(1) takes as its argument the index of /usr/lib/libSystem.B.dylib. The index is 1 because this library is loaded by the first LC_LOAD_DYLIB load command in the file. Setting the type to 1 specifies that the symbol is a pointer. 0x02 is the index of the load command for the data segment (counting from zero). The offset of zero specifies the beginning of the first section of the data segment, __nl_symbol_ptr. Thus, the effect of this command is to set the pointer in __nl_symbol_ptr[0] to point to dyld_stub_binder.
The lazy binding opcodes section binds the lazy symbol pointers. In the case of our example executable, the only lazy symbol pointer is the one for _printf. The offset is 0x10 because the two 8-byte non-lazy symbol pointers at the beginning of the data segment come first. Note that BIND_OPCODE_DONE is zero. The last two BIND_OPCODE_DONE opcodes in the listing are just padding.
The export trie is primarily of interest for dylibs rather than for executables. Nonetheless, our executable does export two symbols: __mh_execute_header and _main. The export trie stores the names of all exported symbols together with various associated properties. The headers give the following description:
The symbols exported by a dylib are encoded in a trie. This is a compact representation that factors out common prefixes. It also reduces LINKEDIT pages in RAM because it encodes all information (name, address, flags) in one small, contiguous range. The export area is a stream of nodes. The first node sequentially is the start node for the trie.
Nodes for a symbol start with a uleb128 that is the length of the exported symbol information for the string so far. If there is no exported symbol, the node starts with a zero byte. If there is exported info, it follows the length.
First is a uleb128 containing flags. Normally, it is followed by a uleb128 encoded offset which is location of the content named by the symbol from the mach_header for the image. If the flags is EXPORT_SYMBOL_FLAGS_REEXPORT, then following the flags is a uleb128 encoded library ordinal, then a zero terminated UTF8 string. If the string is zero length, then the symbol is re-export from the specified dylib with the same name. If the flags is EXPORT_SYMBOL_FLAGS_STUB_AND_RESOLVER, then following the flags is two uleb128s: the stub offset and the resolver offset. The stub is used by non-lazy pointers. The resolver is used by lazy pointers and must be called to get the actual address to use.
After the optional exported symbol information is a byte of how many edges (0-255) that this node has leaving it, followed by each edge. Each edge is a zero terminated UTF8 of the addition chars in the symbol, followed by a uleb128 offset for the node that edge points to.
There is also a good description of the export trie on the following page (under the "Export Trie" heading):
http://www.m4b.io/reverse/engineering/mach/binaries/2015/03/29/mach-binaries.html
Some flag values:
EXPORT_SYMBOL_FLAGS_REEXPORT = 8
EXPORT_SYMBOL_FLAGS_STUB_AND_RESOLVER = 16
The trie data from our executable is as follows:
byte 5
|
00 01 _ 00 05 00 02 _ m h _ e x e c u
t e _ h e a d e r 00 21 m a i n 00
25 02 00 00 00 03 00 E0 1E 00 00 00 00 00 00 00
| |
byte 33 byte 37
We can sort of see already from this that the overall structure of the trie is as follows:
o
|
|
'_' BRANCH 1
|
|
o
/ \
BRANCH 2 / \ BRANCH 3
/ \
'_mh_execute_header' 'main'
/ \
o o
The two symbols encoded in the trie are __mh_execute_header and _main.
Byte(s) | Encoded value | Interpretation |
---|---|---|
0 | 0 | No terminal string info here. Root node. |
1 | 01 | Number of branches leaving this node. |
2-3 | _\0 | Label of branch 1 (see diagram above). |
4 | 05 | Offset from start of trie to beginning of next node. |
5 | 0 | No terminal string info here. |
6 | 2 | Number of branches leaving this node. |
7-25 | _mh_execute_header\0 | Label of branch 2 (see diagram above) |
26 | 33 | Offset from start of trie to beginning of next node. |
27-31 | main\0 | Label of branch 3 (see diagram above). |
32 | 0x25=37 | Offset from start of trie to beginning of next node. |
33 | 2 | Length of terminal string info. |
34 | 0 | Symbol export flags. |
35 | 0 | Offset of symbol __mh_execute_header in file. |
36 | 0 | Number of branches leaving this node. |
37 | 3 | Length of terminal string info. |
38 | 0 | Symbol export flags. |
39-40 | 3936 (LEB) | Offset of symbol _main in file. |
41 | 0 | Number of branches leaving this node. |
42-48 | -- | Padding. |
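The walk described in the table can be sketched as a lookup function. The name `trie_lookup` and its return convention are hypothetical, and the sketch handles only regular exports, where the terminal info is a flags value followed by a single offset.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static int uleb128(const uint8_t *buf, size_t len, size_t *pos, uint64_t *out)
{
    uint64_t value = 0;
    unsigned shift = 0;
    while (*pos < len) {
        uint8_t byte = buf[(*pos)++];
        if (shift < 64)
            value |= (uint64_t)(byte & 0x7f) << shift;
        shift += 7;
        if ((byte & 0x80) == 0) {
            *out = value;
            return 0;
        }
    }
    return -1;                       /* truncated value */
}

/* Look up a symbol in an export trie. Returns 1 and fills in *flags and
 * *offset if the symbol is exported, 0 if it is not, -1 if the trie is
 * malformed. Only regular exports (flags + offset) are handled. */
int trie_lookup(const uint8_t *trie, size_t len, const char *symbol,
                uint64_t *flags, uint64_t *offset)
{
    size_t node = 0;

    /* The step bound guards against cycles in a malformed trie. */
    for (size_t steps = 0; steps <= len; steps++) {
        size_t pos = node;
        uint64_t info_len;

        if (uleb128(trie, len, &pos, &info_len) != 0 || info_len > len - pos)
            return -1;

        if (*symbol == '\0') {       /* whole name consumed: terminal node? */
            if (info_len == 0)
                return 0;
            if (uleb128(trie, len, &pos, flags) != 0 ||
                uleb128(trie, len, &pos, offset) != 0)
                return -1;
            return 1;
        }

        pos += (size_t)info_len;     /* skip any terminal info */
        if (pos >= len)
            return -1;
        unsigned nedges = trie[pos++];

        uint64_t child = 0;
        int found = 0;
        for (unsigned e = 0; e < nedges && !found; e++) {
            const uint8_t *label = trie + pos;
            const uint8_t *nul = memchr(label, 0, len - pos);
            if (nul == NULL)
                return -1;
            size_t label_len = (size_t)(nul - label);
            pos += label_len + 1;
            if (uleb128(trie, len, &pos, &child) != 0)
                return -1;
            if (label_len > 0 &&
                strncmp(symbol, (const char *)label, label_len) == 0) {
                symbol += label_len; /* follow this edge */
                found = 1;
            }
        }
        if (!found)
            return 0;
        if (child >= len)
            return -1;
        node = (size_t)child;
    }
    return -1;
}
```

Looking up "_main" in the 42 non-padding bytes shown above follows branch 1 and then branch 3, and ends at the node at byte 37 with flags 0 and offset 3936.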
The export trie tells us that __mh_execute_header starts at the beginning of the file while _main starts at byte 3936. It makes sense that __mh_execute_header starts at the beginning of the file, since this symbol is made available so that programs can inspect their Mach-O headers. The value of 3936 for _main also makes sense as this is the offset of the __text section of the __TEXT segment.
There isn't any interesting use of the ‘symbol flags’ byte in our executable. This byte can be used to encode the following info:
Kinds (least significant two bits in flags byte):
0 Regular symbol
1 Thread local symbol
2 Absolute symbol
Types (higher bits in flags byte, given here as flag values):
0 Regular
4 Weak definition (EXPORT_SYMBOL_FLAGS_WEAK_DEFINITION)
8 Reexport
16 A ‘stub’ with a uleb128 stub offset followed by a uleb128 resolver offset. Not to be confused with stubs in the sense above. Don't know what this is exactly.
The use of ULEB encoding for node offsets makes it surprisingly difficult to generate the export trie. Non-terminal nodes in the trie reference other nodes via their offsets in the encoded byte stream. The number of bytes used to encode an offset varies depending on the size of the offset value. Increasing the number of bytes occupied by an encoded offset has a knock-on effect on the values of other offsets, which in turn affects the number of bytes required to encode these offsets.
The following is a sketch of the export trie generation algorithm used by the standard tools (see makeTrie in MachOTrie.hpp). First, calculate the size of a node on the assumption that the offsets of each of its children occupy a single byte. If one of the offsets can't fit in a byte, then increase the size of this offset. Update as necessary the offset values of the node's other children and the offset values of its descendants' children. These updates may cause the encoded size of some of the offsets to increase. Repeat the cycle until the encoded size of all offsets stabilizes.
Pseudocode:
Each trie node has the fields
    SIZE (integer),
    MAX_DISP_SIZE (integer).
Initial value of SIZE for each node =
    encoded size of the node excluding any child offsets
Initial value of MAX_DISP_SIZE for each node =
    1
START:
    Set OFFSET := 0
    Visit each node N of the trie in pre-order:
        OFFSET += N.SIZE
        If N has as-yet unvisited children:
            ULEN := uleb encoded length of OFFSET
            If ULEN > N.MAX_DISP_SIZE:
                N.SIZE += ULEN - N.MAX_DISP_SIZE
                N.MAX_DISP_SIZE = ULEN
                GOTO START
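A compact way to express the same fixed-point idea in C is to recompute every node's offset in emission order until nothing moves. This is a sketch of the general technique rather than a transcription of the pseudocode above or of the code in MachOTrie.hpp; the `trie_node` struct, the `MAX_CHILDREN` limit, and the function names are all assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_CHILDREN 8   /* arbitrary limit for this sketch */

struct trie_node {
    uint32_t fixed_size;              /* encoded size excluding child offsets */
    size_t   nchildren;
    size_t   children[MAX_CHILDREN];  /* indices of child nodes in the array */
    uint32_t offset;                  /* output: offset of the node in the trie */
};

/* Number of bytes needed to ULEB128-encode value. */
static uint32_t uleb128_size(uint32_t value)
{
    uint32_t size = 0;
    do {
        value >>= 7;
        size++;
    } while (value != 0);
    return size;
}

/* Assign offsets to nodes, which must be listed in emission order.
 * Returns the total encoded size of the trie. */
uint32_t layout_trie(struct trie_node *nodes, size_t count)
{
    uint32_t total;
    int changed;

    for (size_t i = 0; i < count; i++)
        nodes[i].offset = 0;

    do {
        changed = 0;
        total = 0;
        for (size_t i = 0; i < count; i++) {
            /* A node's size depends on how many bytes its children's
             * current offsets take to encode. */
            uint32_t size = nodes[i].fixed_size;
            for (size_t c = 0; c < nodes[i].nchildren; c++)
                size += uleb128_size(nodes[nodes[i].children[c]].offset);

            if (nodes[i].offset != total) {
                nodes[i].offset = total;
                changed = 1;          /* something moved: run another pass */
            }
            total += size;
        }
    } while (changed);

    return total;
}
```

Offsets only ever grow from one pass to the next, so the loop terminates. For the four nodes of our executable's trie (fixed sizes 4, 26, 4 and 5 bytes), it settles on offsets 0, 5, 33 and 37 with a total size of 42 bytes, matching the table above.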
On recent OS X versions, a viable Mach-O executable must be almost as complex as the executable for a Hello World C program produced by the standard Xcode tools. However, I have verified that the following load commands (and associated segments where applicable) can be removed without rendering the object file unexecutable on OS X 10.13.4:
LC_VERSION_MIN_MACOSX
LC_SOURCE_VERSION
LC_DATA_IN_CODE
LC_FUNCTION_STARTS
LC_UUID
LC_SYMTAB
The __cstring and __unwind_info sections of the __TEXT segment.
The __cstring section can't be removed from our example executable because it contains the string constant passed to printf. But in general, a Mach-O file without a __cstring section will still execute.
[1] A comprehensive but dated reference for the Mach-O file format by Aidan Steele. Based on a PDF released by Apple in 2009.
[2] A twenty part essay on linkers by Ian Lance Taylor. No Mach-O specific content, but a wealth of useful information on linking and object file formats in general.
[3] Stack Overflow question with discussion of requirements for a minimal Mach-O executable.
[4] Useful information on the internal workings of the OS X dynamic linker and its bytecode format (from Jonathan Levin's New OS X Book).
[5] A detailed but dated description of a minimal Mach-O executable by Mike Ash.
[6][7] A two-part comparison of ELF and Mach-O by Joe Damato. Contains some useful info about how dynamic linking works on OS X.
[8] Facebook's ‘fishhook’ utility. The readme has an extremely useful diagram showing the interaction of the various different symbol-table structures in a Mach-O executable.
[9] Some useful discussion of stubs on Stack Exchange.
[10] Info on the global offset table in Apple's official docs.
[11] Slides for a presentation on the structure of MachO files by Anthony Shoumikhin. Covers calls to dynamically bound functions in detail.