Memory Layout of an Executable

This article will be discussing the layout of a process loaded into memory, its sections, and how the process is run.

Before we dive into what sections are and how they're utilized by processes, however, we need a basic understanding of memory. Memory is a buffer of bits that a computer uses to store values. In a nutshell, memory is just a way for your computer to store, read, and change values when necessary.

For a more detailed look at how memory works at a hardware level, I highly recommend checking out Core Dumped's Youtube channel.

What exactly are sections?

When a program is compiled into an executable file, the file doesn't just contain the code to execute.

If we look at the structure of an executable file, we can see that there are actually quite a few parts to it divided by "section headers".

Using objdump, we can view these section headers:


stoatsec@blog~$ objdump -h bof

bof:     file format elf64-x86-64

Sections:
Idx Name          Size      VMA               LMA               File off  Algn
    0 .interp       0000001c  0000000000000318  0000000000000318  00000318  2**0
                    CONTENTS, ALLOC, LOAD, READONLY, DATA
    1 .note.gnu.property 00000040  0000000000000338  0000000000000338  00000338  2**3
                    CONTENTS, ALLOC, LOAD, READONLY, DATA
    2 .note.gnu.build-id 00000024  0000000000000378  0000000000000378  00000378  2**2
                    CONTENTS, ALLOC, LOAD, READONLY, DATA
    3 .note.ABI-tag 00000020  000000000000039c  000000000000039c  0000039c  2**2
                    CONTENTS, ALLOC, LOAD, READONLY, DATA
    4 .gnu.hash     00000024  00000000000003c0  00000000000003c0  000003c0  2**3
                    CONTENTS, ALLOC, LOAD, READONLY, DATA
    5 .dynsym       00000138  00000000000003e8  00000000000003e8  000003e8  2**3
                    CONTENTS, ALLOC, LOAD, READONLY, DATA
    6 .dynstr       000000c8  0000000000000520  0000000000000520  00000520  2**0
                    CONTENTS, ALLOC, LOAD, READONLY, DATA
    7 .gnu.version  0000001a  00000000000005e8  00000000000005e8  000005e8  2**1
                    CONTENTS, ALLOC, LOAD, READONLY, DATA
    8 .gnu.version_r 00000040  0000000000000608  0000000000000608  00000608  2**3
                    CONTENTS, ALLOC, LOAD, READONLY, DATA
    9 .rela.dyn     000000d8  0000000000000648  0000000000000648  00000648  2**3
                    CONTENTS, ALLOC, LOAD, READONLY, DATA
    10 .rela.plt     00000090  0000000000000720  0000000000000720  00000720  2**3
                    CONTENTS, ALLOC, LOAD, READONLY, DATA
    11 .init         0000001b  0000000000001000  0000000000001000  00001000  2**2
                    CONTENTS, ALLOC, LOAD, READONLY, CODE
    12 .plt          00000070  0000000000001020  0000000000001020  00001020  2**4
                    CONTENTS, ALLOC, LOAD, READONLY, CODE
    13 .text         00000857  0000000000001090  0000000000001090  00001090  2**4
                    CONTENTS, ALLOC, LOAD, READONLY, CODE
    14 .fini         0000000d  00000000000018e8  00000000000018e8  000018e8  2**2
                    CONTENTS, ALLOC, LOAD, READONLY, CODE
    15 .rodata       000002dc  0000000000002000  0000000000002000  00002000  2**3
                    CONTENTS, ALLOC, LOAD, READONLY, DATA
    16 .eh_frame_hdr 00000064  00000000000022dc  00000000000022dc  000022dc  2**2
                    CONTENTS, ALLOC, LOAD, READONLY, DATA
    17 .eh_frame     0000017c  0000000000002340  0000000000002340  00002340  2**3
                    CONTENTS, ALLOC, LOAD, READONLY, DATA
    18 .init_array   00000008  0000000000003dd0  0000000000003dd0  00002dd0  2**3
                    CONTENTS, ALLOC, LOAD, DATA
    19 .fini_array   00000008  0000000000003dd8  0000000000003dd8  00002dd8  2**3
                    CONTENTS, ALLOC, LOAD, DATA
    20 .dynamic      000001e0  0000000000003de0  0000000000003de0  00002de0  2**3
                    CONTENTS, ALLOC, LOAD, DATA
    21 .got          00000028  0000000000003fc0  0000000000003fc0  00002fc0  2**3
                    CONTENTS, ALLOC, LOAD, DATA
    22 .got.plt      00000048  0000000000003fe8  0000000000003fe8  00002fe8  2**3
                    CONTENTS, ALLOC, LOAD, DATA
    23 .data         00000010  0000000000004030  0000000000004030  00003030  2**3
                    CONTENTS, ALLOC, LOAD, DATA
    24 .bss          00000010  0000000000004040  0000000000004040  00003040  2**4
                    ALLOC
    25 .comment      0000001b  0000000000000000  0000000000000000  00003040  2**0
                    CONTENTS, READONLY
                    

These section headers are prefixed by a period, and then followed by some string of characters. Each area is used for a specific purpose, be it storing variable data, code and program metadata, etc.

Noteable sections include '.text', which contains the program's code, '.data', which is used for values known at compile time (global/static variables), and '.bss', which contains data initialized with zero. There are also ".plt" and ".got" sections that are used in the dynamic linking of processes, but we won't be covering that process now.

These sections contained in the compiled binary are then loaded into memory, creating the process' structure.

Loading the process into memory is handled by the kernel and is a bit involved, so I'll be talking about that in another paper. For now, just understand that the data stored in the executable file's sections is loaded into memory in corresponding sections.

A portion of the process' memory will be used for the stack and heap.

The stack and the heap

The stack and heap are regions of a process' memory used for storing data at runtime. These are both data structures that to store data like variables, but exhibit different behavior.

The stack, for instance, can be thought of as a stack of books. It follows a "last in first out" system, and is addressed with the assembly instructions "pop" and "push". "push" places some data on the top of the stack, while "pop" takes the last piece of data placed on the stack and stores it in a CPU register.


               Stack      
        ┌──────────────────┐
        │  Variable X      │ <-- accessed first
        ├──────────────────┤
        │  Variable Y      │ <-- accessed second
        ├──────────────────┤
        │  Variable Z      │ <-- accessed third
        ├──────────────────┤
        │        ...       │
        └──────────────────┘
                    

Variable X would be "popped off the stack" first, then variable Y, and then variable Z.

Important to note, the stack grows downwards instead of upwards, as shown in the example, and data on the stack is immutable by nature.

Data on the heap, however, is dynamic and can grow.

The heap is a slab of memory that contains "chunks" of data that can be accessed whenever and from wherever, regardless of their position in the heap. If a variable stored on the heap wants to grow in size, the chunks of data associated with that variable can grow in size to accommodate that change.

Data stored on the heap can also be broken up into more than one chunk in a process called "fragmentation", so if a chunk of data for a variable that wants to grow in size is sandwiched between two other chunks of data, another chunk can be added elsewhere that the variable will be able to grow in to.


                         Heap      
    ┌───────────────┬───────────────────────────────────┐
    │  Variable X   │              Variable Y           │
    ├───────────────┴──┬─────────────────────────┬──────┤ 
    │    Variable Z    │  Variable X (2nd chunk) │      │
    ├──────────────────┴─────────────────────────┴──────┤
    │                                                   │
    │                      ...                          │
    │                                                   │
    └───────────────────────────────────────────────────┘
                    

Variable X in this graphic has two chunks. An example of how a variable might be fragmented is a string with some characters stored in either chunk.

One thing that is important to remember is that the implementation of the stack and heap can differ between operating systems. Ultimately, these behaviors exhibited by the stack and heap are defined by the developers of the operating system running the process. Another important thing to remember is that despite their different behaviors, the stack, heap, and any other regions of memory are just a slice of the overall buffer of bits that is memory. Poorly defined or not defined checks on memory allocations have created the 'buffer overflow' class of vulnerability, which remains one of the most commonly abused binary exploitation techniques to this day.

Also, when threads are spawned by a process, they typically share the main process' heap and have their own stack. Browsers like Chromium actually spawn new processes for each tab, which provides greater security. The drawback is that processes are much more resource-intensive than threads. Its a tradeoff, and also one of the reasons people complain about Chromium-based browsers being RAM hogs.

Execution

Once a program has been loaded into memory, execution can begin. I won't be covering process execution here either, as it requires a good understanding of assembly, but a specialized CPU register known as the "instruction pointer", "ip", or "rip" is directed to the entrypoint of the process' code. In ELF executables, some code held in the '.init' section will be run before control is handed off to the main() function.

Back to Home