Assembly Language Fundamentals


Basic Elements of Assembly Language

Although the DOS Debug utility is adequate for the assembly language examples used here, its limitations make it somewhat awkward to use. These examples are best implemented with either the Microsoft Assembler (MASM) or the Borland Turbo Assembler (TASM).

 

Constants and expressions

Numeric literal: a combination of digits with an optional sign, decimal point, and exponent.

        Examples of numeric literals:

       5
       5.5
      -5.5
       26.5E+05

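Integer constants can end with an uppercase or lowercase radix (base) suffix: h = hexadecimal, q (or o) = octal, d = decimal (the default), b = binary. A hexadecimal constant that begins with a letter must be written with a leading zero (0A5h, not A5h) so that the assembler does not mistake it for a name. A constant expression is a well-defined combination of numeric literals, operators, and defined symbolic constants; the assembler computes its value at assembly time.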
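The radix rules above can be checked with a short Python model. It is a sketch of the suffix rules only, not MASM's actual number parser: the function name parse_int_constant is hypothetical, and it assumes the default radix is decimal (no .RADIX directive in effect).

```python
# Radix suffixes (case-insensitive) and the base each one selects.
RADIX_SUFFIXES = {"h": 16, "q": 8, "o": 8, "d": 10, "b": 2}

def parse_int_constant(text: str) -> int:
    """Return the value of an integer constant such as '1Ah' or '1101b'.

    Sketch of the suffix rules only; assumes the default radix is decimal.
    """
    s = text.strip()
    sign = 1
    if s and s[0] in "+-":
        sign = -1 if s[0] == "-" else 1
        s = s[1:]
    if not s:
        raise ValueError(f"empty constant: {text!r}")
    base = RADIX_SUFFIXES.get(s[-1].lower())
    if base is None:
        digits, base = s, 10            # no suffix: decimal
    else:
        digits = s[:-1]                 # drop the suffix letter
    if not digits or not digits[0].isdigit():
        # e.g. 'A5h' must be written '0A5h' so it is not read as a name
        raise ValueError(f"constant must start with a digit: {text!r}")
    return sign * int(digits, base)     # int() rejects digits invalid in base
```

For example, 26, 1Ah, 32q and 11010b all denote the same value, 26, and parse_int_constant returns 26 for each of them.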

Examples of integer constants and constant expressions:

Integer constants          Constant expressions
26      (decimal)          5
1Ah     (hexadecimal)      26.5
1101b   (binary)           4 * 20
36D     (decimal)          -3 * 4 / 6
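To make the idea of folding an expression into one constant concrete, here is a small Python evaluator. It is not MASM's evaluator: it accepts only decimal integers, + - * / and parentheses, and it assumes the usual precedence (* and / before + and -), left-to-right grouping, and division that truncates toward zero. The function names are illustrative.

```python
import re

TOKEN = re.compile(r"\s*(\d+|[-+*/()])")   # decimal integers, operators, parens

def tokenize(expr: str) -> list[str]:
    expr = expr.rstrip()
    tokens, pos = [], 0
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if m is None:
            raise ValueError(f"unexpected text: {expr[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def evaluate(expr: str) -> int:
    """Fold a constant expression into a single integer value."""
    tokens, pos = tokenize(expr), 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        pos += 1
        return tokens[pos - 1]

    def factor() -> int:
        tok = take()
        if tok == "-":                  # unary minus
            return -factor()
        if tok == "+":                  # unary plus
            return factor()
        if tok == "(":
            value = expression()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        if not tok.isdigit():
            raise ValueError(f"unexpected token {tok!r}")
        return int(tok)

    def term() -> int:                  # * and / bind tighter than + and -
        value = factor()
        while peek() in ("*", "/"):
            op, rhs = take(), factor()
            if op == "*":
                value *= rhs
            else:                       # assumed: truncate toward zero
                q = abs(value) // abs(rhs)
                value = q if (value < 0) == (rhs < 0) else -q
        return value

    def expression() -> int:            # equal precedence groups left to right
        value = term()
        while peek() in ("+", "-"):
            op, rhs = take(), term()
            value = value + rhs if op == "+" else value - rhs
        return value

    result = expression()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result
```

evaluate("4 * 20") returns 80 and evaluate("-3 * 4 / 6") returns -2; in each case the assembler would place only the resulting constant in the generated code.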

String and character constants are enclosed in single or double quotes. A string can contain quotation marks of the other kind, as the last two examples show.

Examples of string and character constants:

'ABC'
'X'
"This is a message"
'4042'
"This isn't a test"
'Say "hello" to John'
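A string constant is stored as one byte per character, so '4042' occupies four bytes holding the ASCII codes 34h, 30h, 34h and 32h, not the number 4042. The short Python sketch below (the function name is hypothetical) shows that mapping for the examples above; it assumes plain ASCII text and handles only the embedded-quote form shown here.

```python
def string_constant_bytes(literal: str) -> bytes:
    """Return the bytes stored for a quoted string constant (e.g. after db).

    Sketch: the outer quotes are removed and each character becomes one
    ASCII byte. Only quotes of the other kind may appear inside.
    """
    if len(literal) < 2 or literal[0] not in "'\"" or literal[-1] != literal[0]:
        raise ValueError(f"not a quoted constant: {literal!r}")
    return literal[1:-1].encode("ascii")
```

For instance, string_constant_bytes("'ABC'") returns the three bytes 41h, 42h, 43h.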

 

A variable is a location in a program's data area that has been assigned a name. For example:

count1    db    50        ; count1 is a byte variable initialized to 50
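A variable name stands for an offset within the data segment. The assembler assigns it by keeping a location counter that advances by the size of each definition. The following Python sketch (the function name assemble_data is hypothetical) models this for byte (db) data only; alignment, padding and other data types are ignored.

```python
def assemble_data(definitions):
    """Lay out db definitions in order, like an assembler's location counter.

    definitions: list of (name, byte_values) in source order.
    Returns (symbols, image): symbols maps each name to its offset in the
    data image. Sketch only: byte data, no alignment or padding.
    """
    symbols, image = {}, bytearray()
    for name, values in definitions:
        if name in symbols:
            raise ValueError(f"duplicate variable name: {name}")
        symbols[name] = len(image)      # current location counter value
        image.extend(values)            # each value must be in 0..255 here
    return symbols, bytes(image)
```

For example, laying out count1 followed by the HelloMess string from Sample Program 1 (a hypothetical combination) with assemble_data([("count1", [50]), ("HelloMess", list(b"Hello, World") + [13, 10, ord("$")])]) gives count1 offset 0 and HelloMess offset 1, in a 16-byte image.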

A label serves as a place marker when a program needs to jump or loop from one location to another. A label can stand on a line by itself, in which case it marks the next instruction, or it can share a line with an instruction. In the following example, Label1 and Label2 are labels identifying locations in a program:

Label1:	mov	ax, 0
	mov	bx, 0
	:
	:
Label2:
	jmp	Label1	; jump to Label1

A keyword always has a predefined meaning to the assembler. It can be an instruction mnemonic, a directive, or another reserved word such as a register name. Examples are MOV, PROC, TITLE, ADD, AX, and END. Keywords are reserved and cannot be used as labels or variable names. In the following example, using add as a label is a syntax error:

	add:	mov	ax, 10	; Error: add cannot be used as a label

 

Statements

An assembly language statement is either an instruction (an executable statement) or a directive (which tells the assembler how to generate code):

            [<label:>]  <mnemonic> [<operands>]      [; <comment>]

Statements are free-form, with white space separating the components. A statement cannot be longer than 128 characters, but it can be continued on the following line if its last character is a backslash (\).

Examples of instructions shown by category:

call	MySub		; transfer of control
mov	ax, 5		; data transfer
add	ax, 20		; arithmetic
jz	next1		; conditional transfer (jump if Zero flag is set)
in	al, 20		; input/output (reads from hardware port)
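The statement layout, the backslash continuation rule and the reserved-word restriction can be illustrated with a small Python parser. This is a rough sketch, not how MASM or TASM actually parse source: the regular expression, the helper names and the short RESERVED set are assumptions made for illustration. Names written without a colon (such as count1 db 50 or main proc) are treated as mnemonics, and semicolons inside quoted strings are not handled.

```python
import re

# [<label:>] <mnemonic> [<operands>] [; <comment>]
STATEMENT = re.compile(
    r"^\s*(?:(?P<label>[A-Za-z_@$?][\w@$?]*):)?"  # optional label with colon
    r"\s*(?P<mnemonic>[A-Za-z.]\w*)?"            # instruction or directive
    r"\s*(?P<operands>[^;]*?)"                   # operands, up to any comment
    r"\s*(?:;(?P<comment>.*))?$"                 # optional comment
)

# A tiny, illustrative subset of reserved words (not MASM's full list).
RESERVED = {"add", "mov", "call", "jmp", "jz", "in", "proc", "endp",
            "end", "title", "ax", "bx", "cx", "dx"}

def join_continuations(lines):
    """Join each line that ends in a backslash with the line after it."""
    joined, pending = [], ""
    for line in lines:
        stripped = line.rstrip()
        if stripped.endswith("\\"):
            pending += stripped[:-1] + " "   # drop the backslash
        else:
            joined.append(pending + line)
            pending = ""
    if pending:
        joined.append(pending.rstrip())
    return joined

def parse_statement(line):
    """Split one statement into label, mnemonic, operands and comment."""
    fields = STATEMENT.match(line).groupdict()
    parsed = {k: (v.strip() or None) if v else None for k, v in fields.items()}
    label = parsed["label"]
    if label is not None and label.lower() in RESERVED:
        raise ValueError(f"reserved word used as a label: {label}")
    return parsed
```

Applied to the call line above, parse_statement returns the mnemonic call, the operand MySub and the comment text, with no label.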

 

Sample Program 1

The following example shows an assembly program that displays the traditional "Hello World" message. The first line contains the TITLE directive; the rest of that line is treated as a comment (it becomes the title of the listing file). The line beginning with a semicolon is an ordinary comment.

Segments are the building blocks of programs: The code segment is where the program instructions are stored; the data segment contains all the variables, and the stack segment contains the program's runtime stack. The stack is a special area in memory that the program uses when calling and returning from subroutines.

Example 1: The Hello World program: 

title Hello World program 	(hello.asm)

; This program displays "Hello world"
.model small	; Simplified segment directives begin with '.'
.stack 100h	; Allocate 256 bytes of stack
.data
HelloMess	db	'Hello, World',13,10,'$'
.code
main proc
	mov	ax, @data
	mov	ds, ax			; Set DS to point to data segment
	mov	ah, 9			; Print string function
	mov	dx, OFFSET HelloMess	; Point to "Hello World"
	int	21h			; Display "Hello World"
	mov	ax, 4C00h		; Terminate function (AH=4Ch), return code 0 (AL)
	int	21h			; Terminate the program
main endp
end main

Description of important lines in program:

The .model small directive indicates that the program uses no more than 64K memory for code and 64K for data. The .stack directive sets aside 100h (256) bytes of stack space for the program. The .data directive marks the beginning of the data segment where variables are stored. 

The HelloMess variable is declared to hold the string "Hello, World", followed by a carriage return (13) and a line feed (10), which move the cursor to the start of the next line. The '$' is a required terminator: the DOS string output function used below prints characters up to, but not including, the '$'.
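DOS function 09h writes the bytes starting at DS:DX and stops at the first '$'. The Python model below shows that behavior; the function name is hypothetical and a bytes object stands in for the data segment. If the terminator were missing, real DOS would keep printing whatever bytes follow, whereas this sketch raises ValueError instead.

```python
def dos_print_string(memory: bytes, offset: int) -> str:
    """Model INT 21h function 09h: output bytes from DS:DX up to the '$'.

    memory stands in for the data segment and offset for DX. The '$'
    itself is not printed. Sketch only: console details are ignored.
    """
    end = memory.index(b"$", offset)    # ValueError if no terminator follows
    return memory[offset:end].decode("ascii")
```

With the HelloMess bytes, dos_print_string(b"Hello, World\r\n$", 0) returns "Hello, World\r\n".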

The .code directive marks the beginning of the code segment, where the executable instructions are located. The proc directive declares the beginning of a procedure. Here, the procedure is called main.

The first two statements in the main procedure copy the address of the data segment (@data) into the DS register. The value passes through AX because the x86 cannot move an immediate value directly into a segment register. The mov instruction always has two operands: first the destination, then the source (i.e., mov <destination>, <source>).

Next, a character string is written to the screen by calling the DOS function that displays a string whose address is in DS:DX. First, the function number (9) is placed in the AH register. Then, the offset address of the start of the string (indicated by the variable HelloMess) is copied to the DX register. Finally, the program executes int 21h, which calls the DOS function selected by AH.
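Because the string is located through the DS:DX pair, DS must hold the data segment before the call. In real mode the 8086 forms a 20-bit physical address by shifting the segment value left four bits and adding the 16-bit offset, which is also why a single segment spans at most 64K. The Python sketch below (function name hypothetical) models that calculation; the segment values in the examples are made up, since DOS decides the actual value of @data when it loads the program.

```python
def physical_address(segment: int, offset: int) -> int:
    """8086 real-mode address: segment * 16 + offset, wrapped to 20 bits.

    On the original 8086, addresses past 1 MB wrap around to 0.
    """
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    return ((segment << 4) + offset) & 0xFFFFF
```

For example, physical_address(0x1234, 0x0010) returns 0x12350.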

The last two statements load function number 4Ch (terminate program) into AH, with a return code of 0 in AL, and call int 21h. This halts the program and returns control to the operating system.
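AH and AL are the high and low halves of the 16-bit AX register, so a single 16-bit move of 4C00h loads the function number 4Ch into AH and the return code 0 into AL at the same time; a 16-bit value such as 4C00h does not fit in the 8-bit AH register alone. The small Python model below (helper names hypothetical) shows the split.

```python
def split_ax(ax: int) -> tuple[int, int]:
    """Return (AH, AL), the high and low bytes of the 16-bit AX register."""
    if not 0 <= ax <= 0xFFFF:
        raise ValueError("AX holds a 16-bit value")
    return ax >> 8, ax & 0xFF

def join_ax(ah: int, al: int) -> int:
    """Rebuild AX from its two 8-bit halves."""
    if not (0 <= ah <= 0xFF and 0 <= al <= 0xFF):
        raise ValueError("AH and AL hold 8-bit values")
    return (ah << 8) | al
```

split_ax(0x4C00) returns (0x4C, 0x00), and join_ax(0x4C, 0x00) returns 0x4C00.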

The statement main endp uses the endp directive to mark the end of the procedure main. Procedures may not overlap.

The program ends with the end directive, which marks the last line to be assembled. The label main next to it identifies the entry point, that is, the point at which the CPU starts to execute the program.

Standard Assembly Directives

Directive   Description
end         End of program assembly (Required)
endp        End of procedure (Required by proc directive)
page        Set a page format for the listing file (Optional)
proc        Begin procedure (Optional)
title       Title of the listing file (Optional)
.code       Marks the start of the code segment (Required)
.data       Marks the start of the data segment (Required)
.model      Specifies the program's memory model (Highly recommended)
.stack      Sets the size of the stack segment (Required)

 

Assembling, Linking and Debugging

The Assemble-Link-Execute Cycle

A text editor is used to produce the ASCII source file. The assembler reads the source file and produces an object file, which is a machine-language translation of the program. The object file may contain references to subroutines in an external link library. The linker then combines the object file with the needed subroutines copied from the link library, creates a special header record at the beginning of the program, and produces an executable program.

The assembler can optionally produce a listing file, which is a copy of the program's source file (suitable for printing) with line numbers and translated machine code. The linker can optionally produce a map file, which contains information about the program's code, data and stack segments.

A link library is a file containing subroutines that are already compiled into machine language. The table below lists the files involved when the program above is assembled and linked.

Filename    Description          When/how created
hello.asm   Source program       Text editor
hello.obj   Object program       Assembler
hello.lst   Listing file         Assembler
hello.exe   Executable program   Linker
hello.map   Map file             Linker
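The linker's job of copying only the needed subroutines out of a link library, described at the start of this section, amounts to resolving symbols. The Python sketch below models just that step, with hypothetical module and symbol names (io, dos, WriteString, DosCall and so on); real object and library files are binary formats, and TLINK or LINK also do much more, such as combining segments and building the EXE header.

```python
def link(objects, library):
    """Sketch of symbol resolution by a linker.

    objects and library map a module name to {"defines": set, "needs": set}.
    Returns the names of the library modules that had to be pulled in.
    """
    defined, needed = set(), set()
    for module in objects.values():
        defined |= module["defines"]
        needed |= module["needs"]
    pulled = []
    unresolved = needed - defined
    while unresolved:
        symbol = unresolved.pop()
        provider = next((name for name, mod in library.items()
                         if symbol in mod["defines"]), None)
        if provider is None:
            raise LookupError(f"unresolved external: {symbol}")
        pulled.append(provider)               # copy this module in
        defined |= library[provider]["defines"]
        needed |= library[provider]["needs"]  # it may need more modules
        unresolved = needed - defined
    return pulled
```

A library module that no one references (for example a hypothetical math module defining Sqrt) is never pulled into the executable.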

Using Borland Turbo Assembler (TASM)

With Borland Turbo Assembler (TASM), the command to assemble the program would be:

	C:\> tasm /l/n/z hello

The /l option produces a listing file, the /n option suppresses the symbol table in the listing, and the /z option displays the source lines that contain errors. If there are no assembly errors, the screen output during assembly may look like this:

Turbo Assembler   Version 4.1
Copyright (c) 1988, 1996 Borland International
Assembling file:	hello.ASM
Error messages:		None
Warning messages:	None
Passes:			1
Remaining memory:	418k

This will produce the object file hello.obj and the listing file hello.lst. To link the object file the command will be:

C:\> tlink /3/m/v hello

This will produce the executable file hello.exe and map file hello.map. The /3 option allows the use of 32-bit registers; the /m option creates a map file, and the /v option includes debugging information in the executable program. To run the program, simply type:

C:\> hello

To test the program in Turbo Debugger before running it (highly recommended), type:

C:\> td hello

 

Using Microsoft Assembler (MASM)

The Microsoft Assembler package contains the ML.EXE program, which assembles and links one or more assembly language source files, producing an object file (*.obj) and an executable file (*.exe). The general syntax is:

ML options filename.ASM

Each command-line option must be preceded by at least one space. For example, the following commands assemble and link hello.asm with different options:

ML /Zi hello.asm	; include debugging information
ML /Fl hello.asm	; produce a listing file (hello.lst)
ML /Fm hello.asm	; produce a map file (hello.map)
ML /Zm hello.asm	; use MASM 5.1 compatibility mode

The following command assembles hello.asm and links hello.obj to the link library linkfile.lib in the C:\MASM directory:

ML /Zi /Zm /Fm /Fl hello.asm /link /co c:\MASM\linkfile

Use the CV command to run hello.exe in the CodeView debugger:

CV hello		; load and run hello.exe
 

Sample Program 2

Create the source file reverse.asm containing the following code, which displays a user-entered string in reverse. Assemble and link it, then run the executable file in the CodeView debugger:

Example 2: The Reverse String program:

title Reverse String program 	(reverse.asm)

; This program displays a user-entered string in reverse
.model small
.stack 100h
.data
MAX_STRING_LENGTH	EQU	1000
StringToReverse		DB	MAX_STRING_LENGTH	DUP(?)
ReverseString		DB	MAX_STRING_LENGTH	DUP(?)
.code
main proc
	mov	ax, @data
	mov	ds, ax			; Set DS to point to data segment
	mov	ah, 3Fh			; Read from handle function
	mov	bx, 0			; Standard input handle
	mov	cx, MAX_STRING_LENGTH	; Read up to MAX chars
	mov	dx, OFFSET StringToReverse	; Store string here
	int	21h			; Get the string (AX = bytes read)
	and	ax, ax			; Read any characters?
	jz	Done			; No, so done
	mov	cx, ax			; Put string length in CX, where
					; it can be used as a counter
	push	cx			; Save the string length
	mov	bx, OFFSET StringToReverse
	mov	si, OFFSET ReverseString
	add	si, cx
	dec	si			; Point to the end of the
					; reverse string buffer
ReverseLoop:
	mov	al, [bx]		; Get the next character
	mov	[si], al		; Store the characters in reverse
	inc	bx			; Point to the next character
	dec	si			; Point to previous location in buffer
	loop	ReverseLoop		; Move next character, if any
	pop	cx			; Get back the string length
	mov	ah, 40h			; Write to handle function
	mov	bx, 1			; Standard output handle
	mov	dx, OFFSET ReverseString
	int	21h			; Print the reversed string
Done:
	mov	ax, 4C00h		; Terminate program function, return code 0
	int	21h			; Terminate the program
main endp
end main
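The pointer arithmetic in ReverseLoop can be traced with a Python model that keeps BX, SI and CX as plain integers. This is a sketch, not a translation of the whole program: the buffer offsets 0 and MAX_STRING_LENGTH are assumptions (the real offsets are chosen by the assembler), and the DOS read and write calls are replaced by a bytes argument and a return value.

```python
MAX_STRING_LENGTH = 1000

def reverse_like_example2(data: bytes) -> bytes:
    """Trace Example 2's ReverseLoop with BX, SI and CX as Python ints.

    A bytearray stands in for the data segment; the two buffers sit at
    assumed offsets 0 and MAX_STRING_LENGTH.
    """
    string_to_reverse = 0                     # OFFSET StringToReverse (assumed)
    reverse_string = MAX_STRING_LENGTH        # OFFSET ReverseString (assumed)
    memory = bytearray(2 * MAX_STRING_LENGTH)

    data = data[:MAX_STRING_LENGTH]           # the read stops after CX bytes
    memory[string_to_reverse:string_to_reverse + len(data)] = data
    ax = len(data)                            # AX = number of bytes read
    if ax == 0:                               # and ax, ax / jz Done
        return b""
    cx = ax                                   # mov cx, ax
    saved_length = cx                         # push cx
    bx = string_to_reverse                    # mov bx, OFFSET StringToReverse
    si = reverse_string + cx - 1              # mov si, ... / add si, cx / dec si
    while True:                               # ReverseLoop:
        memory[si] = memory[bx]               # mov al, [bx] / mov [si], al
        bx += 1                               # inc bx
        si -= 1                               # dec si
        cx -= 1                               # loop: CX = CX - 1, then
        if cx == 0:                           #   jump back only if CX != 0
            break
    cx = saved_length                         # pop cx
    return bytes(memory[reverse_string:reverse_string + cx])  # write CX bytes
```

Two details are visible in the trace. LOOP decrements CX before testing it, so entering the loop with CX = 0 would run it 65,536 times; the and ax, ax / jz Done test prevents that. Also, when the input comes from the keyboard, DOS includes the carriage return and line feed that end the line, so reverse_like_example2(b"hello\r\n") returns b"\n\rolleh" and the line break is printed before the reversed text.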

Last modified 25 July 2002, by A. Vahed