Anatomy of a PE

Preamble

Executables (EXE/DLL) in Microsoft Windows make use of a format called PE (Portable Executable).

A PE image typically lies in the hard drive as a .exe or .dll file, and is loaded by the Windows Loader into the RAM of the system when a process is created from that file (e.g., by double-clicking the .exe). So, basically, we have two different states:

Physical PE image file.
Loaded PE image (running process).

Layout

The layout of a PE image contains the following elements:

A header, glued at the first byte of the file.
A list of consecutive sections.
An optional chunk of trailing (unused) bytes.

These elements are concatenated, and there may or may not be padding space between them. This padding space, if present, is usually filled with 0x00 bytes.

It is possible to add trailing bytes to a PE image. Actually, those files will simply lie there doing nothing. However, some types of processes such as package installers can use the trailing area to add some payload that the process will use (e.g., decrypt, uncompress, …) at some point.

The PE Header

The PE header contains a wealth of information, structured in a fixed way.

Among the many details about the PE that you can gather from the header, there’s the list of sections. The list of sections contains the name, position, and size of each section in the PE, both in physical file format, and when loaded in memory as a process.

Many other things are described by the header, such as the address of the Entry Point (the instruction where code execution must start), and the address of some data directories. It is in these data directories where you can find the list of DLLs the PE depends on, the list of functions exported by the PE, etc…

The PE Header is usually 0x0400 (1024) bytes long. The data structures contained in it are defined in winnt.h, and they are always the same (so you can assume offsets and such). One must pay attention, though, to the fact that depending on whether the process is 32-bit or 64-bit, some pieces of the header will be PE32 or PE64. One can tell whether the process is 32-bit or 64-bit by checking the header flags (which are found at a location which is common to PE32 and PE64 headers).

RVAs and VAs

In general, the addresses found in the PE header are given as RVAs (Relative Virtual Addresses). An RVA is an offset relative to the first byte of the PE image. Assuming that you know the location in memory of the first byte of the PE image (the image base pointer), then the relationship between a VA (Virtual Address) and its corresponding RVA is given by:

VA = ( base + RVA )

RVA = ( VA - base )

If the PE is a physical file, then the base pointer is simply the start of the file. However, if the PE has been loaded as a process, then the base address can be found in several ways that should (in theory) match:

The value returned by GetModuleHandle(0) is, in fact, the base pointer.
The HINSTANCE received by WinMain is, in fact, the base pointer.
The Windows Loader stores the base pointer at the PE header on load, in the BaseOfImage field.

The PE header is glued at this location in memory. So, in run-time, one can use this knowledge to do PE-related operations such as leap-frogging through the PE sections. A typical anti-cracking use is to run a CRC32 on the code section in order to display a badboy message if the executable code has been patched, infected, or tampered with.

Physical vs. loaded state

The anatomy of the PE image is different, yet quite similar, in both (physical vs. loaded) PE states.

When a PE is loaded as a process, the PE image file gets chopped in sections, and these sections get relocated (copied) in memory:

In both states, the PE image forms a block of contiguous bytes.
In both states, the header and the sections are found in the same order.
The amount of padding between sections usually differs.

In their physical file form, PE sections do not need to have any particular padding, so EXE/DLL files can be as small as possible. On the other hand, when loaded, each PE section occupies a certain number of whole memory pages with certain permissions (execution, read-only, etc…). The amount of padding for the start address and the size of each section is given by the PE header.

It is important to note that this relocation procedure simply adds some padding between sections. The section chunks themselves remain identical, and are a straight copy of the original bytes found in the physical file. The only exceptions are some areas which get initialised by the Windows Loader on load (e.g., some fields in the PE header, the IAT, …). I will talk about these in a future post.

[EDIT] I have made significant progress in this area in the 10+ years since I wrote this. But Part II of this post never saw the light of day. Spare time has always been scarce.