Tutorial 02: Program Structure

Objectives:
  • Understand the structure of an easm source code file.
  • Understand the purpose of each of the source code sections.

Introduction

An easm source code file is organised into 'sections'. The word section has been used to reflect the close relationship that an easm source code file has to its final executable. For example - a Win32 binary must have a code section which contains the binary machine code of the program. Similarly, it must have a data section which contains space for any global variable declarations made in the program. Mirroring this, easm contains a code section and a data section too - for exactly the same purposes.

Since I've already mentioned two of the sections, I'll mention the rest for completeness. Following from code and data, there are also sections for imports, constants, functions and structures. Don't worry – we will dig deeper into each of them later on in this lesson.

Subsystem

Now that the concept of sections is out there, I'll start from the top of a typical source code file and explain what we might find. The first instruction you should come across is the subsystem instruction. It takes one of two possible operands: cui or gui. The former specifies that the current source code file should be compiled as a console application and the latter specifies that it should be compiled as a graphical user interface application. The only difference is that a cui application gets a console window and a gui application doesn't. Note, however, that it is still possible for a gui application to have a console too.

subsystem cui

Includes

The next instruction you might see is the include directive. The include directive is used to import declarations from other source code files. We will look at includes in more depth in another lesson.

include "includes\win32.easm"

Imports

We now reach the imports section, where bindings to external functions are declared. In this section you would usually list each of the external functions that you wish to call at some stage in your application. A calling convention needs to be specified only when the function is not stdcall (for example, C library functions such as printf and scanf, which are cdecl).

section imports
    from msvcrt.dll import printf using cdecl
    from msvcrt.dll import scanf using cdecl
    from user32.dll import MessageBoxA
    from kernel32.dll import ExitProcess

As you can see from the code example, four functions are being imported from three different libraries. The syntax should be easy enough to understand - we have three keywords: from, import and using. The using keyword is only needed when we have to specify that a function is not a stdcall function. Here, we indicate that the functions printf and scanf are both cdecl functions. The functions MessageBoxA and ExitProcess are assumed to be stdcall as they do not have a using keyword in their declarations. Now that these four functions have been identified, they may be called in the code as if they were functions we had declared ourselves.
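To make the defaulting rule concrete, here is a minimal Python sketch that reads the four declarations above and fills in stdcall wherever the using keyword is absent. It is only an illustration of the rule as described in this lesson; the regular expression and the parse_import helper are assumptions of mine and say nothing about how the easm assembler is actually written.

```python
import re

# Hypothetical pattern for one import declaration; easm's real parser may differ.
IMPORT_RE = re.compile(
    r"^from\s+(?P<dll>\S+)\s+import\s+(?P<name>\w+)(?:\s+using\s+(?P<conv>\w+))?$"
)

def parse_import(line):
    """Return (dll, function, calling_convention) for one declaration."""
    match = IMPORT_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not an import declaration: {line!r}")
    # No 'using' clause means the function is assumed to be stdcall.
    return match["dll"], match["name"], match["conv"] or "stdcall"

declarations = [
    "from msvcrt.dll import printf using cdecl",
    "from msvcrt.dll import scanf using cdecl",
    "from user32.dll import MessageBoxA",
    "from kernel32.dll import ExitProcess",
]
imports = [parse_import(d) for d in declarations]

for dll, name, convention in imports:
    print(f"{name:<12} {dll:<13} {convention}")
```

The only design point worth noting is that the default lives in one place (the `or "stdcall"` expression), which mirrors the way the lesson describes stdcall as the assumed convention.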

Going Deeper: For those that are interested in how this works - the easm assembler reads these declarations and converts them into a section within the final executable called the "import table". The import table lists the DLLs that should be loaded into the same address space as the executable and which functions from these DLLs will be invoked by the executable. When a call instruction is found that points to an imported function - the address generated is the address of the import table's entry for the particular function. This address is replaced by the virtual address of the function once the executable and the related DLLs have been loaded into memory by the operating system's executable loader.
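The loading step can be pictured with a toy model in Python. This is a sketch under heavy assumptions: a real import table is a binary structure, and the dictionaries, function names and addresses below are invented purely to show the idea of slots that start empty and are filled in by the loader.

```python
# Toy model of an import table; the layout and addresses are made up for illustration.
def build_import_table(imports):
    """Group imported function names by DLL; each slot starts unresolved (None)."""
    table = {}
    for dll, name in imports:
        table.setdefault(dll, {})[name] = None
    return table

def resolve(table, loaded):
    """Mimic the loader: fill each slot with the function's address in its loaded DLL."""
    for dll, slots in table.items():
        for name in slots:
            slots[name] = loaded[dll][name]

imports = [
    ("msvcrt.dll", "printf"),
    ("msvcrt.dll", "scanf"),
    ("user32.dll", "MessageBoxA"),
    ("kernel32.dll", "ExitProcess"),
]
table = build_import_table(imports)

# Hypothetical addresses standing in for where the loader mapped each function.
loaded = {
    "msvcrt.dll": {"printf": 0x1000, "scanf": 0x1040},
    "user32.dll": {"MessageBoxA": 0x2000},
    "kernel32.dll": {"ExitProcess": 0x3000},
}
resolve(table, loaded)

for dll, slots in table.items():
    for name, address in slots.items():
        print(f"{dll}!{name} -> {address:#x}")
```

Before resolve runs, every slot is empty; afterwards each call that goes through a slot reaches the real function, which is the effect the paragraph above describes.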

Constants

Moving on, we reach the constants section. This section is used for the declaration of symbolic constants. A symbolic constant is really just a macro. Wherever the constant identifier is found, it is replaced by the constant value instead. This provides a mechanism for giving meaningful names to arbitrary values.

section constants
    const HWND = 4
    const NULL = 00h
    const INT = dword

The code example above creates three symbolic constants named HWND, NULL and INT respectively. Again, the syntax here should be fairly straightforward. A constant declaration begins with the keyword const, followed by the desired constant identifier, then an equals sign and finally the constant's value. Keep in mind that every instance of the constant identifier is replaced by its associated value. This means that the value must be a valid language element, otherwise the substitution will result in an invalid statement.
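The substitution can be sketched in a few lines of Python. This is an illustration only: the substitute_constants helper is hypothetical, the statement being expanded is a made-up data declaration, and matching on whole identifiers is my assumption rather than documented easm behaviour.

```python
import re

def substitute_constants(statement, constants):
    """Replace each whole-word constant identifier with its value."""
    if not constants:
        return statement
    # \b keeps 'INT' from matching inside a longer identifier such as 'INTERNAL'.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, constants)) + r")\b")
    return pattern.sub(lambda m: constants[m.group(1)], statement)

# The three constants declared in the example above.
constants = {"HWND": "4", "NULL": "00h", "INT": "dword"}

# A hypothetical declaration that uses two of them.
expanded = substitute_constants("INT counter = NULL", constants)
print(expanded)
```

Because the replacement is purely textual, a constant whose value is not a valid language element would produce a statement that fails to assemble, which is the caveat mentioned above.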

Structures

The next section we commonly find is the structures section. As you may have guessed, structures are defined here. Let's see how this looks before getting into the semantics of it.

section structures
    structure RECT
        dword left
        dword top
        dword right
        dword bottom

Anyone familiar with Win32 programming will recognise the structure definition as the standard structure for defining rectangles. Each structure definition follows the same pattern. The structure keyword is followed by the desired structure identifier to create the initial structure. Following this are the individual field definitions within the structure, and these follow the same rules as normal data declarations (which we haven't yet spoken about - so don't worry if you don't know what they are). Now that a structure has been defined, it may be declared in the same way as a regular variable is declared. You will see more of this in the next part when we talk about the data section.

Structures are used to group common declarations under the same identifier. The support for structures in easm allows greater flexibility when it comes to Windows programming. Many internal Win32 API functions use some kind of structure in some way - so it is important that this can be captured in easm.

Going Deeper: As of the current easm release (1.0.1), structures are just 'syntactic sugar' in that they result in a declaration being made for each of the fields within the structure. Each of these declarations combines the structure identifier with the field name to produce a regular variable declaration. Using the example provided above - if a structure named myRect were declared within the data section, four variable declarations would be generated with the identifiers: myRect.left, myRect.top, myRect.right and myRect.bottom. This means that the address of the structure is actually the address of the first field in the structure. In this example, the address of the structure would be found using the statement &myRect.left.
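The expansion described above can be sketched in Python as follows. The expand_structure helper is hypothetical, and the offsets assume that the generated declarations are laid out one after another with no padding, which this lesson does not state explicitly.

```python
# Field sizes in bytes for the data types used in this tutorial.
TYPE_SIZES = {"byte": 1, "word": 2, "dword": 4}

# The RECT definition from the structures section: (type, field name) pairs.
RECT = [("dword", "left"), ("dword", "top"), ("dword", "right"), ("dword", "bottom")]

def expand_structure(instance, fields):
    """Return (identifier, type, offset) for each field of a structure instance."""
    expanded = []
    offset = 0
    for type_name, field in fields:
        expanded.append((f"{instance}.{field}", type_name, offset))
        offset += TYPE_SIZES[type_name]  # assumes consecutive, unpadded fields
    return expanded

declarations = expand_structure("myRect", RECT)
for identifier, type_name, offset in declarations:
    print(f"{type_name} {identifier}  (offset {offset})")
```

The first generated identifier, myRect.left, sits at offset 0, which is why its address doubles as the address of the whole structure.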

Variables

The data section commonly follows the structures section and this is where you will find variable declarations. Let's take a look at an example.

section data
    byte byteVariable
    word wordVariable = 01h
    dword dwordVariable = abcdabcdh
    string buffer[255d]
    struct RECT rectangle

Five variables are declared in this example. The first three declarations create a byte-sized variable, a word-sized variable and a dword-sized variable. Notice how the second and third declarations also specify initial values. The variable byteVariable receives an implicit zero value because it was not initialised explicitly.

The fourth declaration creates a string buffer containing 255 bytes of space. The space for the buffer is allocated in the same location as the other three declarations and adds 255 bytes to the executable file size. A single string declaration can reserve at most 4096 bytes.

The fifth and final variable declaration creates an instance of the RECT structure named rectangle.
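To get a feel for how much space these five declarations take up, here is a rough Python sketch that adds up their sizes. It assumes the variables are placed back to back in declaration order with no alignment padding; that is my simplification for the sake of the arithmetic, not something easm is documented to do.

```python
TYPE_SIZES = {"byte": 1, "word": 2, "dword": 4}
RECT_SIZE = 4 * TYPE_SIZES["dword"]  # RECT has four dword fields

# (identifier, size in bytes) in declaration order.
variables = [
    ("byteVariable", TYPE_SIZES["byte"]),
    ("wordVariable", TYPE_SIZES["word"]),
    ("dwordVariable", TYPE_SIZES["dword"]),
    ("buffer", 255),
    ("rectangle", RECT_SIZE),
]

def layout(variables):
    """Return ({identifier: offset}, total_size), assuming no padding between variables."""
    offsets = {}
    offset = 0
    for name, size in variables:
        offsets[name] = offset
        offset += size
    return offsets, offset

offsets, total = layout(variables)
for name, size in variables:
    print(f"{name:<14} offset {offsets[name]:>3}  size {size}")
print(f"total data size: {total} bytes")
```

Under that assumption the buffer accounts for 255 of the 278 bytes, which shows why large string declarations have a visible effect on the size of the executable.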

Functions

The second-to-last section that can occur in an easm source code file is the functions section. This section is where you will find declarations for each user-defined function in the program. Let's see what it looks like:

section functions
    function void DoHelloWorld (message:4)
        // do something with parameter 'message'
    end

As you can see from the example, a function declaration has the following syntax:

function [return-type] [identifier] ([parameter:size], ...)
end

The [return-type] element has two options: void and dword representing no return value and a 32-bit return value respectively. If the function doesn't need to return a value, you should use void. If the function must return a value, you should use dword.

The [identifier] element represents the function name. In the example given, the function identifier is DoHelloWorld. This is the identifier that will be used when accessing the function in later code. Next we see that the function identifier must be followed by a set of parentheses that contain the function parameters. A function parameter has the form identifier : size. The parameter identifier must be unique among the parameters declared within the same function. The size element of the parameter has three options: 1, 2 and 4, representing a byte-sized parameter, a word-sized parameter and a dword-sized parameter respectively. Commonly you will find the parameter size being set to 4 to reflect the nature of 32-bit Windows programming.
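The two parameter rules (unique identifiers, and a size of 1, 2 or 4) can be expressed as a short Python check. The parse_parameters helper and its error messages are hypothetical and only restate the rules from this lesson; they are not taken from the easm assembler.

```python
# Permitted parameter sizes in bytes and the data type each one corresponds to.
VALID_SIZES = {1: "byte", 2: "word", 4: "dword"}

def parse_parameters(text):
    """Parse 'name:size, name:size' into a list of (name, size) pairs."""
    parameters = []
    seen = set()
    for item in filter(None, (part.strip() for part in text.split(","))):
        name, _, size_text = item.partition(":")
        name = name.strip()
        size = int(size_text)  # raises ValueError if the size is missing or not a number
        if size not in VALID_SIZES:
            raise ValueError(f"parameter {name!r}: size must be 1, 2 or 4, got {size}")
        if name in seen:
            raise ValueError(f"parameter {name!r} is declared more than once")
        seen.add(name)
        parameters.append((name, size))
    return parameters

# The parameter list of DoHelloWorld from the example above.
params = parse_parameters("message:4")
print(params)
```

A list such as "a:4, a:2" or "count:3" would be rejected with a ValueError, matching the uniqueness and size rules described above.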

The remainder of the function will be the series of statements that make up the actual functionality - commonly referred to as the function body. The keyword end is used to mark the completion of the function. Once a function is declared, it may be called from within the code section via the call instruction (we'll talk about the code section in the next part). Be aware that the call instruction can be used to call both your own user-defined functions and imported functions too.

Code

The final section you will see in a typical easm source code file is the code section. The code section can be thought of as the entry point to your application. The first instruction within the code section is the first instruction that will be executed. Let's look at an example:

section code
    set eax = 01h
    eax += 01h
    call DoHelloWorld ("Hello World")
    call ExitProcess (00h)

The example isn't really doing anything useful here. First we set the value of the eax register to 1. We then add 1 to the eax register - making the value 2. The code then calls the DoHelloWorld function from an earlier example and finally calls the imported function ExitProcess. The aim of the example is to show that the code section can contain any number of statements that do whatever is needed to serve the purpose of the application.

Usually a source code file will have a code section, but it's possible for the code section to be omitted. This situation only occurs when we're writing an include library. It makes sense for an include library to have no code section, as the purpose of the include library is to declare reusable parts of code that are simply copied into other source code files through the include keyword.

On a final note, you will see a call to ExitProcess as the last statement in most easm applications. This is because if the code doesn't explicitly end the current process when it is finished, execution will run on into whatever memory follows the code and eventually cause an access violation.

Summary

In this lesson we have looked at the structure of an easm source code file. You have seen basic examples of each of the potential source code sections and seen what each section is used for. In the next lesson, we will look at a fully working example and break it down into its component parts, using the knowledge we have gained in this lesson.