x86 assembly language


x86 assembly language is the name for a family of assembly languages that provide some level of backward compatibility with CPUs dating back to the Intel 8008 microprocessor, introduced in April 1972. [1] [2] It is used to produce object code for the x86 class of processors.

Regarded as a programming language, assembly is machine-specific and low-level. Like all assembly languages, x86 assembly uses mnemonics to represent fundamental CPU instructions, or machine code. [3] Assembly languages are most often used for detailed and time-critical applications such as small real-time embedded systems, operating-system kernels, and device drivers, but can also be used for other applications. A compiler will sometimes produce assembly code as an intermediate step when translating a high-level program into machine code.

Keyword

Reserved keywords of x86 assembly language [4] [5]

  • aaa
  • aad
  • aam
  • aas
  • adc
  • add
  • and
  • arpl
  • bound
  • bsf
  • bsr
  • bswap
  • bt
  • btc
  • btr
  • bts
  • call
  • cbtw
  • clc
  • cld
  • cli
  • cltd
  • clts
  • cmc
  • cmp
  • cmps
  • cmpxchg
  • cwtd
  • cwtl
  • daa
  • das
  • dec
  • div
  • enter
  • f2xm1
  • fabs
  • fadd
  • faddp
  • fbld
  • fbstp
  • fchs
  • fclex
  • fcom
  • fcomp
  • fcompp
  • fcos
  • fdecstp
  • fdiv
  • fdivp
  • fdivr
  • fdivrp
  • ffree
  • fiadd
  • ficom
  • ficomp
  • fidiv
  • fidivr
  • fild
  • fimul
  • fincstp
  • finit
  • fist
  • fistp
  • fisub
  • fisubr
  • fld
  • fld1
  • fldcw
  • fldenv
  • fldl2e
  • fldl2t
  • fldlg2
  • fldln2
  • fldpi
  • fldz
  • fmul
  • fmulp
  • fnclex
  • fninit
  • fnop
  • fnsave
  • fnstenv
  • fnstcw
  • fnstsw
  • fpatan
  • fprem
  • fprem1
  • fptan
  • frndint
  • frstor
  • fsave
  • fscale
  • fsin
  • fsincos
  • fsqrt
  • fst
  • fstenv
  • fstcw
  • fstp
  • fstsw
  • fsub
  • fsubp
  • fsubr
  • fsubrp
  • ftst
  • fucom
  • fucomp
  • fucompp
  • fwait
  • fxam
  • fxch
  • fxtract
  • fyl2x
  • fyl2xp1
  • hlt
  • idiv
  • imul
  • in
  • inc
  • ins
  • int
  • into
  • invd
  • invlpg
  • iret
  • jcxz
  • jmp
  • lahf
  • lar
  • lcall
  • lds
  • lea
  • leave
  • les
  • lfs
  • lgdt
  • lgs
  • lidt
  • ljmp
  • lldt
  • lmsw
  • lock
  • lods
  • loop
  • loopnz
  • loopz
  • lret
  • lsl
  • lss
  • ltr
  • mov
  • movs
  • movsx
  • movw
  • movzb
  • mul
  • neg
  • nop
  • not
  • or
  • out
  • outs
  • pop
  • popa
  • popf
  • push
  • pusha
  • pushf
  • rcl
  • rcr
  • rep
  • repnz
  • repz
  • ret
  • rol
  • ror
  • sahf
  • sal
  • sar
  • sbb
  • scas
  • setcc
  • sgdt
  • shl
  • shld
  • shr
  • shrd
  • sidt
  • sldt
  • smsw
  • stc
  • std
  • sti
  • stos
  • str
  • sub
  • test
  • verr
  • verw
  • wait
  • wbinvd
  • xadd
  • xchg
  • xlat
  • xor

Mnemonics and opcodes

Each x86 assembly instruction is represented by a mnemonic which, often combined with one or more operands, translates to one or more bytes called an opcode; for example, the NOP instruction translates to 0x90 and the HLT instruction translates to 0xF4. [3] There are potential opcodes with no documented mnemonic which different processors may interpret differently, so that a program using them behaves inconsistently or even generates an exception on some processors. These opcodes often turn up in code-writing competitions as a way to make the code smaller, faster, more elegant, or simply to show off the author's prowess.
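As a rough illustration of the mnemonic-to-opcode mapping, the Python sketch below encodes only the two operand-less instructions named in the text. The names `OPCODES` and `assemble` are hypothetical and introduced here for illustration; a real assembler also handles operands, prefixes and many more encodings.

```python
# Toy lookup table: only the two single-byte instructions mentioned in the text.
OPCODES = {
    "nop": 0x90,  # no operation
    "hlt": 0xF4,  # halt
}


def assemble(source: str) -> bytes:
    """Translate one mnemonic per line into its single opcode byte."""
    out = bytearray()
    for line in source.splitlines():
        mnemonic = line.split(";", 1)[0].strip().lower()  # drop ';' comments
        if not mnemonic:
            continue  # skip blank and comment-only lines
        out.append(OPCODES[mnemonic])  # KeyError for anything not in the table
    return bytes(out)


print(assemble("nop\nnop\nhlt").hex())  # 9090f4
```

Any mnemonic outside the table raises `KeyError`, which keeps the sketch honest about how little of the instruction set it covers.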

Syntax

x86 assembly language has two main syntax branches: Intel syntax and AT&T syntax. [6] Intel syntax is dominant in the DOS and Windows world, and AT&T syntax is dominant in the Unix world, since Unix was created at AT&T Bell Labs. [7] Here is a summary of the main differences between Intel syntax and AT&T syntax:

Many x86 assemblers use Intel syntax, including FASM, MASM, NASM, TASM, and YASM. GAS, which originally used AT&T syntax, has supported both syntaxes since version 2.10 via the .intel_syntax directive. [6] [8] [9] A quirk of the AT&T syntax for x86 is that x87 operands are reversed, an inherited bug from the original AT&T assembler. [10]

The AT&T syntax is nearly universal to all other architectures (retaining the same mov order); it was originally a syntax for PDP-11 assembly. The Intel syntax is specific to the x86 architecture and is the one used in the x86 platform's documentation. The Intel 8080, which predates the x86, also uses the "destination-first" order for mov. [11]

Registers

x86 processors have a collection of registers that can be used as stores for binary data. Collectively the data and address registers are called the general registers. Each register has a special purpose in addition to what they can all do: [3]

Along with the general registers there are additionally the:

The IP register points to the memory offset of the next instruction in the code segment (it points to the first byte of the instruction). The IP register cannot be accessed by the programmer directly.

The x86 registers can be used by means of the MOV instructions. For example, in Intel syntax:

mov ax, 1234h ; copies the value 1234hex (4660d) into register AX
mov bx, ax    ; copies the value of the AX register into the BX register

Segmented addressing

The x86 architecture in real and virtual 8086 mode uses a process known as segmentation to address memory, not the flat memory model used in many other environments. Segmentation involves composing a memory address from two parts, a segment and an offset; the segment points to the beginning of a 64 KiB (64×2^10 bytes) group of addresses and the offset determines how far from this beginning address the desired address is. In segmented addressing, two registers are required for a complete memory address: one to hold the segment, the other to hold the offset. To translate back into a flat address, the segment value is shifted four bits left (equivalent to multiplying by 2^4, or 16) and then added to the offset to form the full address, which allows breaking the 64 KiB barrier through clever choice of addresses, though it makes programming considerably more complex.

For example, in real mode (or virtual 8086 mode), if DS contains the hexadecimal number 0xDEAD and DX contains the number 0xCAFE, they would together point to the memory address 0xDEAD * 0x10 + 0xCAFE == 0xEB5CE. Therefore, the CPU can address up to 1,048,576 bytes (1 MB) in real mode. By combining segment and offset values we find a 20-bit address.
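The segment-and-offset arithmetic can be checked with a few lines of Python. This is a minimal sketch: `linear_address` is a hypothetical helper name, and address wrap-around above 1 MB is deliberately not modelled.

```python
def linear_address(segment: int, offset: int) -> int:
    """Real-mode address: segment shifted left by 4 bits, plus the offset."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must both be 16-bit values")
    return (segment << 4) + offset


print(hex(linear_address(0xDEAD, 0xCAFE)))  # 0xeb5ce
# Different segment:offset pairs can name the same byte:
print(hex(linear_address(0xEB5C, 0x000E)))  # 0xeb5ce
```

The second call shows why segmented pointers cannot be compared field by field: many segment:offset pairs alias the same flat address.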

The original IBM PC restricted programs to 640 KB, but an expanded memory specification was used to implement a bank-switching scheme that fell out of use when later operating systems, such as Windows, used the larger address ranges of newer processors and implemented their own virtual memory schemes.

Protected mode, starting with the Intel 80286, was used by OS/2. Several shortcomings, such as the inability to access the BIOS and the inability to switch back to real mode without resetting the processor, prevented widespread usage. [12] The 80286 was also still limited to addressing memory in 16-bit segments, meaning only 2^16 bytes (64 kilobytes) could be accessed at a time. To access the extended functionality of the 80286, the operating system would set the processor into protected mode, enabling 24-bit addressing and thus 2^24 bytes of memory (16 megabytes).

In protected mode, the segment selector can be broken down into three parts: a 13-bit index, a Table Indicator bit that determines whether the entry is in the GDT or LDT, and a 2-bit Requested Privilege Level; see x86 memory segmentation.

When referring to an address with a segment and an offset, the notation segment:offset is used, so in the above example the flat address 0xEB5CE can be written as 0xDEAD:0xCAFE or as a segment and offset register pair, DS:DX.

There are some special combinations of segment registers and general registers that point to important addresses:
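The three selector fields can be pulled apart with shifts and masks. The sketch below assumes the conventional bit layout (Requested Privilege Level in bits 0-1, Table Indicator in bit 2, index in bits 3-15), which the text above does not spell out; `Selector`, `decode_selector` and the sample value 0x002B are illustrative only.

```python
from typing import NamedTuple


class Selector(NamedTuple):
    index: int  # 13-bit index into the descriptor table
    ti: int     # table indicator: 0 = GDT, 1 = LDT
    rpl: int    # 2-bit requested privilege level


def decode_selector(value: int) -> Selector:
    """Split a 16-bit protected-mode segment selector into its three fields."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("a selector is a 16-bit value")
    return Selector(index=value >> 3, ti=(value >> 2) & 1, rpl=value & 0b11)


print(decode_selector(0x002B))  # Selector(index=5, ti=0, rpl=3)
```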

The Intel 80386 featured three operating modes: real mode, protected mode and virtual mode. The protected mode, which debuted in the 80286, was extended to allow the 80386 to address up to 4 GB of memory. The all-new virtual 8086 mode (VM86) made it possible to run one or more real mode programs in a protected environment which largely emulated real mode, though some programs were not compatible (typically as a result of memory addressing tricks or using unspecified opcodes).

The 32-bit flat memory model of the 80386's extended protected mode may be the most important feature change for the x86 processor family until AMD released x86-64 in 2003, as it helped drive large scale adoption of Windows 3.1 (which relied on protected mode) since Windows could now run many applications at once, including DOS applications, by using virtual memory and simple multitasking.

Execution modes

The x86 processors support five modes of operation for x86 code, Real Mode, Protected Mode, Long Mode, Virtual 86 Mode, and System Management Mode, in which some instructions are available and others are not. A 16-bit subset of instructions is available on the 16-bit x86 processors, which are the 8086, 8088, 80186, 80188, and 80286. These instructions are available in real mode on all x86 processors, and in 16-bit protected mode (80286 onwards), additional instructions relating to protected mode are available. On the 80386 and later, 32-bit instructions (including later extensions) are also available in all modes, including real mode; on these CPUs, V86 mode and 32-bit protected mode are added, with additional instructions provided in these modes to manage their features. SMM, with some of its own special instructions, is available on some Intel i386SL, i486 and later CPUs. Finally, in long mode (AMD Opteron onwards), 64-bit instructions, and more registers, are also available. The instruction set is similar in each mode but memory addressing and word size vary, requiring different programming strategies.

The modes in which x86 code can be executed are:

Switching modes

The processor runs in real mode immediately after power on, so an operating system kernel, or other program, must explicitly switch to another mode if it wishes to run in anything but real mode. Switching modes is accomplished by modifying certain bits of the processor's control registers after some preparation, and some additional setup may be required after the switch.
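To make "modifying certain bits" concrete, the Python sketch below models the one bit that selects protected mode. It assumes the documented CR0 layout (PE in bit 0, PG in bit 31); the starting value is hypothetical, and the preparation the paragraph mentions (descriptor tables, the jump that reloads the code segment) is not modelled at all.

```python
CR0_PE = 1 << 0    # Protection Enable: set to leave real mode
CR0_PG = 1 << 31   # Paging: only meaningful once PE is set


def enter_protected_mode(cr0: int) -> int:
    """Return the CR0 value with the Protection Enable bit set."""
    return cr0 | CR0_PE


def in_protected_mode(cr0: int) -> bool:
    return bool(cr0 & CR0_PE)


cr0 = 0x0000_0010                  # hypothetical real-mode value, for illustration
cr0 = enter_protected_mode(cr0)
print(hex(cr0), in_protected_mode(cr0))  # 0x11 True
```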

Examples

With a computer running legacy BIOS, the BIOS and the boot loader run in Real mode. The 64-bit operating system kernel checks and switches the CPU into Long mode and then starts new kernel-mode threads running 64-bit code.

With a computer running UEFI, the UEFI firmware (except CSM and legacy Option ROM), the UEFI boot loader and the UEFI operating system kernel all run in Long mode.

Instruction types

In general, the features of the modern x86 instruction set are:

Stack instructions

The x86 architecture has hardware support for an execution stack mechanism. Instructions such as push, pop, call and ret are used with the properly set up stack to pass parameters, to allocate space for local data, and to save and restore call-return points. The ret size instruction is very useful for implementing space efficient (and fast) calling conventions where the callee is responsible for reclaiming stack space occupied by parameters.

When setting up a stack frame to hold local data of a recursive procedure there are several choices; the high level enter instruction (introduced with the 80186) takes a procedure-nesting-depth argument as well as a local size argument, and may be faster than more explicit manipulation of the registers (such as push bp ; mov bp, sp ; sub sp, size). Whether it is faster or slower depends on the particular x86-processor implementation as well as the calling convention used by the compiler, programmer or particular program code; most x86 code is intended to run on x86-processors from several manufacturers and on different technological generations of processors, which implies highly varying microarchitectures and microcode solutions as well as varying gate- and transistor-level design choices.

The full range of addressing modes (including immediate and base+offset), even for instructions such as push and pop, makes direct usage of the stack for integer, floating-point and address data simple, and keeps the ABI specifications and mechanisms relatively simple compared to some RISC architectures (which require more explicit call stack details).

Integer ALU instructions

x86 assembly has the standard mathematical operations, add, sub, neg, imul and idiv (for signed integers), with mul and div (for unsigned integers); the logical operators and, or, xor, not; bitshift arithmetic and logical, sal/sar (for signed integers), shl/shr (for unsigned integers); rotate with and without carry, rcl/rcr, rol/ror, a complement of BCD arithmetic instructions, aaa, aad, daa and others.

Floating-point instructions

x86 assembly language includes instructions for a stack-based floating-point unit (FPU). The FPU was an optional separate coprocessor for the 8086 through the 80386, was an on-chip option for the 80486 series, and has been a standard feature of every Intel x86 CPU since the Pentium. The FPU instructions include addition, subtraction, negation, multiplication, division, remainder, square roots, integer truncation, fraction truncation, and scale by power of two. The operations also include conversion instructions, which can load or store a value from memory in any of the following formats: binary-coded decimal, 32-bit integer, 64-bit integer, 32-bit floating-point, 64-bit floating-point or 80-bit floating-point (upon loading, the value is converted to the currently used floating-point mode). x86 also includes a number of transcendental functions, including sine, cosine, tangent, arctangent, exponentiation with the base 2 and logarithms to bases 2, 10, or e.

The stack register to stack register format of the instructions is usually fop st, st(n) or fop st(n), st, where st is equivalent to st(0), and st(n) is one of the 8 stack registers (st(0), st(1), ..., st(7)). Like the integers, the first operand is both the first source operand and the destination operand. fsubr and fdivr should be singled out as first swapping the source operands before performing the subtraction or division. The addition, subtraction, multiplication, division, store and comparison instructions include instruction modes that pop the top of the stack after their operation is complete. So, for example, faddp st(1), st performs the calculation st(1) = st(1) + st(0), then removes st(0) from the top of stack, thus making what was the result in st(1) the top of the stack in st(0).
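The register-stack behaviour described above can be modelled in Python. This is a simplified sketch, not a description of the hardware: `X87Stack` is a hypothetical class, values are ordinary Python floats rather than 80-bit extended precision, and only three instructions are modelled.

```python
class X87Stack:
    """Minimal model of the x87 register stack; list index 0 is st(0), the top."""

    def __init__(self) -> None:
        self.regs: list[float] = []

    def fld(self, value: float) -> None:
        """Push a value; it becomes the new st(0)."""
        if len(self.regs) == 8:
            raise OverflowError("the x87 stack holds at most 8 values")
        self.regs.insert(0, value)

    def faddp(self, n: int = 1) -> None:
        """faddp st(n), st: st(n) = st(n) + st(0), then pop st(0)."""
        self.regs[n] += self.regs[0]
        self.regs.pop(0)

    def fsubr(self, n: int) -> None:
        """fsubr st, st(n): operands swapped, so st(0) = st(n) - st(0)."""
        self.regs[0] = self.regs[n] - self.regs[0]


fpu = X87Stack()
fpu.fld(1.5)     # st(0) = 1.5
fpu.fld(2.25)    # st(0) = 2.25, st(1) = 1.5
fpu.faddp(1)     # st(1) = 3.75, then the old st(0) is popped
print(fpu.regs)  # [3.75]
```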

SIMD instructions

Modern x86 CPUs contain SIMD instructions, which largely perform the same operation in parallel on many values encoded in a wide SIMD register. Various instruction technologies support different operations on different register sets, but taken as complete whole (from MMX to SSE4.2) they include general computations on integer or floating-point arithmetic (addition, subtraction, multiplication, shift, minimization, maximization, comparison, division or square root). So for example, paddw mm0, mm1 performs 4 parallel 16-bit (indicated by the w) integer adds (indicated by the padd) of mm0 values to mm1 and stores the result in mm0. Streaming SIMD Extensions or SSE also includes a floating-point mode in which only the very first value of the registers is actually modified (expanded in SSE2). Some other unusual instructions have been added including a sum of absolute differences (used for motion estimation in video compression, such as is done in MPEG) and a 16-bit multiply accumulation instruction (useful for software-based alpha-blending and digital filtering). SSE (since SSE3) and 3DNow! extensions include addition and subtraction instructions for treating paired floating-point values like complex numbers.
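The lane-wise behaviour of paddw can be sketched in Python by treating a 64-bit MMX register as four independent 16-bit lanes. The function name mirrors the mnemonic for readability only, and the operand values are made up for the example.

```python
def paddw(dst: int, src: int) -> int:
    """Add four packed 16-bit lanes of two 64-bit values, wrapping per lane."""
    result = 0
    for lane in range(4):
        shift = 16 * lane
        a = (dst >> shift) & 0xFFFF
        b = (src >> shift) & 0xFFFF
        # The sum is truncated to 16 bits, so a carry never reaches the next lane.
        result |= ((a + b) & 0xFFFF) << shift
    return result


mm0 = 0x0001_FFFF_1234_8000
mm1 = 0x0001_0001_1111_8000
print(f"{paddw(mm0, mm1):016x}")  # 0002000023450000
```

Two of the four lanes overflow and wrap to zero without disturbing their neighbours, which is the difference from a single 64-bit add.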

These instruction sets also include numerous fixed sub-word instructions for shuffling, inserting and extracting the values around within the registers. In addition there are instructions for moving data between the integer registers and XMM (used in SSE)/FPU (used in MMX) registers.

Memory instructions

The x86 processor also includes complex addressing modes for addressing memory with an immediate offset, a register, a register with an offset, a scaled register with or without an offset, and a register with an optional offset and another scaled register. So for example, one can encode mov eax, [Table + ebx + esi*4] as a single instruction which loads 32 bits of data from the address computed as (Table + ebx + esi * 4) offset from the ds selector, and stores it to the eax register. In general x86 processors can load and use memory matched to the size of any register it is operating on. (The SIMD instructions also include half-load instructions.)
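The address arithmetic behind an operand such as [Table + ebx + esi*4] can be written out as follows. This is a sketch under assumptions: the scale factors 1, 2, 4 and 8 are the ones the 32-bit encoding allows, the segment base is ignored (flat model), and the value given to `TABLE` and the register contents are hypothetical.

```python
def effective_address(base: int, index: int, scale: int, disp: int, bits: int = 32) -> int:
    """Compute base + index*scale + displacement, truncated to the address width."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    return (base + index * scale + disp) & ((1 << bits) - 1)


TABLE = 0x0040_2000      # hypothetical address of the label Table
ebx, esi = 0x100, 3      # hypothetical register contents
print(hex(effective_address(ebx, esi, 4, TABLE)))  # 0x40210c
```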

Most 2-operand x86 instructions, including integer ALU instructions, use a standard "addressing mode byte" [13] often called the MOD-REG-R/M byte. [14][15][16] Many 32-bit x86 instructions also have a SIB addressing mode byte that follows the MOD-REG-R/M byte. [17][18][19][20][21]
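A small Python sketch shows how the MOD-REG-R/M byte is packed. It assumes the usual field layout (mod in bits 7-6, reg in bits 5-3, r/m in bits 2-0) and the standard 32-bit register numbering, neither of which is given in the text above; the helper names are hypothetical, and only the register-to-register form of opcode 0x89 is covered.

```python
# Standard 32-bit register numbers used in the reg and r/m fields.
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3, "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


def modrm(mod: int, reg: int, rm: int) -> int:
    """Pack the three fields of a MOD-REG-R/M byte."""
    if not (0 <= mod <= 3 and 0 <= reg <= 7 and 0 <= rm <= 7):
        raise ValueError("mod is 2 bits; reg and r/m are 3 bits each")
    return (mod << 6) | (reg << 3) | rm


def encode_mov_reg_reg(dst: str, src: str) -> bytes:
    """Encode 'mov dst, src' for 32-bit registers with opcode 0x89."""
    # mod = 0b11 selects register-direct; reg holds the source, r/m the destination.
    return bytes([0x89, modrm(0b11, REG32[src], REG32[dst])])


print(encode_mov_reg_reg("eax", "ebx").hex())  # 89d8
```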

In principle, because the instruction opcode is separate from the addressing mode byte, those instructions are orthogonal because any of those opcodes can be mixed-and-matched with any addressing mode. However, the x86 instruction set is generally considered non-orthogonal because many other opcodes have some fixed addressing mode (they have no addressing mode byte), and every register is special.[21][22]

The x86 instruction set includes string load, store, move, scan and compare instructions (lods, stos, movs, scas and cmps) which perform each operation to a specified size (b for 8-bit byte, w for 16-bit word, d for 32-bit double word) then increments/decrements (depending on DF, direction flag) the implicit address register (si for lods, di for stos and scas, and both for movs and cmps). For the load, store and scan operations, the implicit target/source/comparison register is in the al, ax or eax register (depending on size). The implicit segment registers used are ds for si and es for di. The cx or ecx register is used as a decrementing counter, and the operation stops when the counter reaches zero or (for scans and comparisons) when inequality is detected. Unfortunately, over the years the performance of some of these instructions became neglected and in certain cases it is now possible to get faster results by writing out the algorithms yourself. Intel and AMD have refreshed some of the instructions though, and a few now have very respectable performance, so it is recommended that the programmer should read recent respected benchmark articles before choosing to use a particular instruction from this group.
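The counter and direction-flag behaviour of a repeated byte move can be modelled in Python. This is a simplified sketch of the rep movsb combination only: memory is one flat bytearray, the ds and es segments are ignored, and `rep_movsb` is a hypothetical function name.

```python
def rep_movsb(memory: bytearray, si: int, di: int, cx: int, df: int = 0):
    """Copy cx bytes from memory[si] to memory[di], one byte per iteration.

    Returns the final (si, di, cx), mirroring the registers after the instruction.
    """
    step = -1 if df else 1       # DF = 1 walks downwards through memory
    while cx != 0:
        memory[di] = memory[si]  # movsb: byte-sized move
        si += step
        di += step
        cx -= 1                  # the counter decrements until it reaches zero
    return si, di, cx


mem = bytearray(b"hello.....")
print(rep_movsb(mem, si=0, di=5, cx=5))  # (5, 10, 0)
print(mem)                               # bytearray(b'hellohello')
```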

The stack is a region of memory and an associated ‘stack pointer’, which points to the bottom of the stack. The stack pointer is decremented when items are added (‘push’) and incremented after things are removed (‘pop’). In 16-bit mode, this implicit stack pointer is addressed as SS:[SP], in 32-bit mode it is SS:[ESP], and in 64-bit mode it is [RSP]. The stack pointer actually points to the last value that was stored, under the assumption that its size will match the operating mode of the processor (i.e., 16, 32, or 64 bits) to match the default width of the push/pop/call/ret instructions. Also included are the instructions enter and leave which reserve and remove data from the top of the stack while setting up a stack frame pointer in bp/ebp/rbp. However, direct setting, or addition and subtraction to the sp/esp/rsp register is also supported, so the enter/leave instructions are generally unnecessary.

This code is the beginning of a function typical for a high-level language when compiler optimisation is turned off for ease of debugging:

 push rbp          ; Save the calling function’s stack frame pointer (rbp register)
 mov rbp, rsp      ; Make a new stack frame below our caller’s stack
 sub rsp, 32       ; Reserve 32 bytes of stack space for this function’s local variables.
                   ; Local variables will be below rbp and can be referenced relative to rbp,
                   ; again best for ease of debugging, but for best performance rbp will not
                   ; be used at all, and local variables would be referenced relative to rsp
                   ; because, apart from the code saving, rbp then is free for other uses.
                   ; However, if rbp is altered here, its value should be preserved for the caller.
 mov [rbp-8], rdx  ; Example of accessing a local variable: stores register rdx into the memory location at rbp-8

The three-instruction prologue above (push rbp, mov rbp, rsp and sub rsp, 32) is functionally equivalent to just:

 enter 32, 0

Other instructions for manipulating the stack include pushfd (32-bit) / pushfq (64-bit) and popfd / popfq for storing and retrieving the EFLAGS (32-bit) / RFLAGS (64-bit) register.

Values for a SIMD load or store are assumed to be packed in adjacent positions for the SIMD register and will align them in sequential little-endian order. Some SSE load and store instructions require 16-byte alignment to function properly. The SIMD instruction sets also include "prefetch" instructions which perform the load but do not target any register, used for cache loading. The SSE instruction sets also include non-temporal store instructions which will perform stores straight to memory without performing a cache allocate if the destination is not already cached (otherwise it will behave like a regular store.)

Most generic integer and floating-point (but no SIMD) instructions can use one parameter as a complex address as the second source parameter. Integer instructions can also accept one memory parameter as a destination operand.

Program flow

The x86 assembly has an unconditional jump operation, jmp, which can take an immediate address, a register or an indirect address as a parameter (note that most RISC processors only support a link register or short immediate displacement for jumping).

Also supported are several conditional jumps, including jz (jump on zero), jnz (jump on non-zero), jg (jump on greater than, signed), jl (jump on less than, signed), ja (jump on above/greater than, unsigned), jb (jump on below/less than, unsigned). These conditional operations are based on the state of specific bits in the (E)FLAGS register. Many arithmetic and logic operations set, clear or complement these flags depending on their result. The comparison cmp (compare) and test instructions set the flags as if they had performed a subtraction or a bitwise AND operation, respectively, without altering the values of the operands. There are also instructions such as clc (clear carry flag) and cmc (complement carry flag) which work on the flags directly. Floating point comparisons are performed via fcom or ficom instructions which eventually have to be converted to integer flags.
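The relationship between cmp, the flags, and the conditional jumps listed above can be sketched in Python. The flag formulas are the usual ones for a subtraction, and the jump conditions are the standard definitions for these six mnemonics; `cmp_flags` and `taken` are hypothetical helper names.

```python
def cmp_flags(a: int, b: int, bits: int = 32) -> dict:
    """Flags produced by 'cmp a, b', which computes a - b and discards the result."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    result = (a - b) & mask
    return {
        "ZF": result == 0,
        "CF": a < b,                                 # unsigned borrow
        "SF": bool(result & sign),
        "OF": bool((a ^ b) & (a ^ result) & sign),   # signed overflow
    }


def taken(jump: str, f: dict) -> bool:
    """Whether the given conditional jump would be taken for these flags."""
    conditions = {
        "jz": f["ZF"],
        "jnz": not f["ZF"],
        "jg": not f["ZF"] and f["SF"] == f["OF"],   # signed greater
        "jl": f["SF"] != f["OF"],                   # signed less
        "ja": not f["CF"] and not f["ZF"],          # unsigned above
        "jb": f["CF"],                              # unsigned below
    }
    return conditions[jump]


flags = cmp_flags(0xFFFFFFFF, 1)   # -1 versus 1 when read as signed values
print(taken("jl", flags), taken("ja", flags))  # True True
```

The same bit pattern is "less than" for the signed jump and "above" for the unsigned one, which is why the two families of jumps exist.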

Each jump operation has three different forms, depending on the size of the operand. A short jump uses an 8-bit signed operand, which is an offset relative to the address of the instruction that follows the jump. A near jump is similar to a short jump but uses a 16-bit signed operand (in real or protected mode) or a 32-bit signed operand (in 32-bit protected mode only). A far jump is one that uses the full segment base:offset value as an absolute address. There are also indirect and indexed forms of each of these.
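How a short jump's 8-bit signed operand turns into a target address can be worked through in Python. The sketch assumes the two-byte short jmp encoding (one opcode byte plus one displacement byte) and measures the displacement from the end of the instruction; `short_jump_target` is a hypothetical name.

```python
def short_jump_target(instr_addr: int, rel8: int) -> int:
    """Target of a 2-byte short jump located at instr_addr with raw operand rel8."""
    if not 0 <= rel8 <= 0xFF:
        raise ValueError("rel8 is a single byte")
    disp = rel8 - 0x100 if rel8 & 0x80 else rel8   # sign-extend the 8-bit operand
    return instr_addr + 2 + disp                   # relative to the next instruction


print(hex(short_jump_target(0x1000, 0x10)))  # 0x1012
print(hex(short_jump_target(0x1000, 0xFE)))  # 0x1000, a jump to itself
```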

In addition to the simple jump operations, there are the call (call a subroutine) and ret (return from subroutine) instructions. Before transferring control to the subroutine, call pushes the segment offset address of the instruction following the call onto the stack; ret pops this value off the stack, and jumps to it, effectively returning the flow of control to that part of the program. In the case of a far call, the segment base is pushed before the offset; far ret pops the offset and then the segment base to return.

There is also a similar instruction, int (interrupt), which saves the current (E)FLAGS register value on the stack, then performs a far call, except that instead of an address, it uses an interrupt vector, an index into a table of interrupt handler addresses. Typically, the interrupt handler saves all other CPU registers it uses, unless they are used to return the result of an operation to the calling program (in software-called interrupts). The matching return from interrupt instruction is iret, which restores the flags after returning. Soft interrupts of the type described above are used by some operating systems for system calls, and can also be used in debugging hard interrupt handlers. Hard interrupts are triggered by external hardware events, and must preserve all register values as the state of the currently executing program is unknown. In protected mode, interrupts may be set up by the OS to trigger a task switch, which will automatically save all registers of the active task.

Examples

The following examples use the so-called Intel-syntax flavor as used by the assemblers Microsoft MASM, NASM and many others. (Note: There is also an alternative AT&T-syntax flavor where the order of source and destination operands is swapped, among many other differences.) [23]

"Hello world!" program for MS-DOS in MASM-style assembly

This example uses the software interrupt 21h instruction to call the MS-DOS operating system for output to the display; other samples use libc's C printf() routine to write to stdout. Note that the first example is a 30-year-old example using 16-bit mode as on an Intel 8086. The second example is Intel 386 code in 32-bit mode. Modern code will be in 64-bit mode. [24]

.model small
.stack 100h

.data
msg     db      'Hello world!$'

.code
start:
        mov     ax, @DATA   ; Initializes Data segment
        mov     ds, ax
        mov     ah, 09h     ; Sets 8-bit register ‘ah’, the high byte of register ax, to 9, to
                            ; select a sub-function number of an MS-DOS routine called below
                            ; via the software interrupt int 21h to display a message
        lea     dx, msg     ; Takes the address of msg, stores the address in 16-bit register dx
        int     21h         ; Various MS-DOS routines are callable by the software interrupt 21h
                            ; Our required sub-function was set in register ah above
        mov     ax, 4C00h   ; Sets register ax to the sub-function number for MS-DOS’s software
                            ; interrupt int 21h for the service ‘terminate program’.
        int     21h         ; Calling this MS-DOS service never returns, as it ends the program.
end start

"Hello world!" program for Windows in MASM style assembly

; requires /coff switch on 6.15 and earlier versions
.386
.model small,c
.stack 1000h

.data
msg     db "Hello world!",0

.code
includelib libcmt.lib
includelib libvcruntime.lib
includelib libucrt.lib
includelib legacy_stdio_definitions.lib

extrn printf:near
extrn exit:near

public main
main proc
        push    offset msg
        call    printf
        push    0
        call    exit
main endp

end

"Hello world!" program for Windows in NASM style assembly

; Image base = 0x00400000
%define RVA(x) (x-0x00400000)

section .text
push dword hello
call dword [printf]
push byte +0
call dword [exit]
ret

section .data
hello db "Hello world!", 0      ; NUL terminator added so that printf stops at the end of the string

section .idata
dd RVA(msvcrt_LookupTable)
dd -1
dd 0
dd RVA(msvcrt_string)
dd RVA(msvcrt_imports)
times 5 dd 0                    ; ends the descriptor table

msvcrt_string dd "msvcrt.dll", 0

msvcrt_LookupTable:
dd RVA(msvcrt_printf)
dd RVA(msvcrt_exit)
dd 0

msvcrt_imports:
printf dd RVA(msvcrt_printf)
exit dd RVA(msvcrt_exit)
dd 0

msvcrt_printf:
dw 1
dw "printf", 0

msvcrt_exit:
dw 2
dw "exit", 0
dd 0

"Hello world!" program for Linux in AT&T style assembly

.data                           # section for initialized data
str: .ascii "Hello, world!\n"   # define a string of text containing "Hello, world!" and then a new line
str_len = . - str               # get the length of str by subtracting its address

.text                           # section for program functions
.globl _start                   # export the _start function so it can be run
_start:                         # begin the _start function
    movl $4, %eax               # specify the instruction to 'sys_write'
    movl $1, %ebx               # specify the output to the standard output, 'stdout'
    movl $str, %ecx             # specify the outputted text to our defined string
    movl $str_len, %edx         # specify the character amount to write as the length of our defined string
    int $0x80                   # call a system interrupt to initiate the syscall we have created
    movl $1, %eax               # specify the instruction to 'sys_exit'
    movl $0, %ebx               # specify the exit code to 0, meaning success
    int $0x80                   # call another system interrupt to end the program

"Hello world!" program for Linux in NASM style assembly

;
; This program runs in 32-bit protected mode.
;  build: nasm -f elf -F stabs name.asm
;  link:  ld -o name name.o
;
; In 64-bit long mode you can use 64-bit registers (e.g. rax instead of eax, rbx instead of ebx, etc.)
; Also change "-f elf " for "-f elf64" in build command.
;
section .data                   ; section for initialized data
str:     db 'Hello world!', 0Ah ; message string with new-line char at the end (10 decimal)
str_len: equ $ - str            ; calcs length of string (bytes) by subtracting the str's start address
                                ; from ‘here, this address’ (‘$’ symbol meaning ‘here’)

section .text                   ; this is the code section (program text) in memory
global _start                   ; _start is the entry point and needs global scope to be 'seen' by the
                                ; linker --equivalent to main() in C/C++
_start:                         ; definition of _start procedure begins here
        mov     eax, 4          ; specify the sys_write function code (from OS vector table)
        mov     ebx, 1          ; specify file descriptor stdout --in gnu/linux, everything's treated as a file,
                                ; even hardware devices
        mov     ecx, str        ; move start _address_ of string message to ecx register
        mov     edx, str_len    ; move length of message (in bytes)
        int     80h             ; interrupt kernel to perform the system call we just set up -
                                ; in gnu/linux services are requested through the kernel
        mov     eax, 1          ; specify sys_exit function code (from OS vector table)
        mov     ebx, 0          ; specify return code for OS (zero tells OS everything went fine)
        int     80h             ; interrupt kernel to perform system call (to exit)

In 64-bit long mode, "lea rcx, str" would load the address of the message; note the 64-bit register rcx.

"Hello world!" program for Linux in NASM style assembly using the C standard library

;
; This program runs in 32-bit protected mode.
; gcc links the standard-C library by default
;  build: nasm -f elf -F stabs name.asm
;  link:  gcc -o name name.o
;
; In 64-bit long mode you can use 64-bit registers (e.g. rax instead of eax, rbx instead of ebx, etc.)
; Also change "-f elf " for "-f elf64" in build command.
;
        global  main            ; ‘main’ must be defined, as it is being compiled
                                ; against the C Standard Library
        extern  printf          ; declares the use of external symbol, as printf
                                ; printf is declared in a different object-module.
                                ; The linker resolves this symbol later.

segment .data                   ; section for initialized data
string  db 'Hello world!', 0Ah, 0   ; message string ending with a newline char (10
                                ; decimal) and the zero byte ‘NUL’ terminator
                                ; ‘string’ now refers to the starting address
                                ; at which 'Hello world!' is stored.

segment .text
main:
        push    string          ; Push the address of ‘string’ onto the stack.
                                ; This reduces esp by 4 bytes before storing
                                ; the 4-byte address ‘string’ into memory at
                                ; the new esp, the new bottom of the stack.
                                ; This will be an argument to printf()
        call    printf          ; calls the C printf() function.
        add     esp, 4          ; Increases the stack-pointer by 4 to put it back
                                ; to where it was before the ‘push’, which
                                ; reduced it by 4 bytes.
        ret                     ; Return to our caller.

"Hello world!" program for 64-bit mode Linux in NASM style assembly

This example is in modern 64-bit mode.

; build: nasm -f elf64 -F dwarf hello.asm
; link: ld -o hello hello.o

DEFAULT REL                     ; use RIP-relative addressing modes by default, so [foo] = [rel foo]

SECTION .rodata                 ; read-only data should go in the .rodata section on GNU/Linux, like .rdata on Windows
Hello:      db "Hello world!", 10   ; Ending with a byte 10 = newline (ASCII LF)
len_Hello:  equ $-Hello         ; Get NASM to calculate the length as an assembly-time constant
                                ; the ‘$’ symbol means ‘here’. write() takes a length so that
                                ; a zero-terminated C-style string isn't needed.
                                ; It would be for C puts()

SECTION .text
global _start
_start:
        mov eax, 1              ; __NR_write syscall number from Linux asm/unistd_64.h (x86_64)
        mov edi, 1              ; int fd = STDOUT_FILENO
        lea rsi, [rel Hello]    ; x86-64 uses RIP-relative LEA to put static addresses into regs
        mov rdx, len_Hello      ; size_t count = len_Hello
        syscall                 ; write(1, Hello, len_Hello); call into the kernel to actually do the system call
                                ;; return value in RAX. RCX and R11 are also overwritten by syscall

        mov eax, 60             ; __NR_exit call number (x86_64) is stored in register eax.
        xor edi, edi            ; This zeros edi and also rdi.
                                ; This xor-self trick is the preferred common idiom for zeroing
                                ; a register, and is always by far the fastest method.
                                ; When a 32-bit value is stored into eg edx, the high bits 63:32 are
                                ; automatically zeroed too in every case. This saves you having to set
                                ; the bits with an extra instruction, as this is a case very commonly
                                ; needed, for an entire 64-bit register to be filled with a 32-bit value.
                                ; This sets our routine’s exit status = 0 (exit normally)
        syscall                 ; _exit(0)

Running it under strace verifies that no extra system calls are made in the process. The printf version would make many more system calls to initialize libc and do dynamic linking. But this is a static executable because we linked using ld without -pie or any shared libraries; the only instructions that run in user-space are the ones you provide.

$ strace ./hello > /dev/null    # without a redirect, your program's stdout is mixed with strace's logging on stderr. Which is normally fine
execve("./hello", ["./hello"], 0x7ffc8b0b3570 /* 51 vars */) = 0
write(1, "Hello world!\n", 13) = 13
exit(0) = ?
+++ exited with 0 +++

Using the flags register

Flags are heavily used for comparisons in the x86 architecture. When two values are compared, the CPU sets the relevant flag or flags. Conditional jump instructions can then check those flags and branch to the code that should run, e.g.:

    cmp eax, ebx
    jne do_something
    ; ...
do_something:
    ; do something here

Aside from compare instructions, many arithmetic and other instructions also set bits in the flags register; examples include sub, test and add. Common combinations such as cmp followed by a conditional jump are internally fused ('macro-fusion') into a single micro-operation (μ-op) on many modern processors, and are fast provided the processor correctly predicts whether the jump will be taken.
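As an illustration of branching on flags set by instructions other than cmp, here is a minimal sketch for 64-bit Linux in NASM syntax. The file name and label names are hypothetical, and the build commands assume the same nasm and ld setup as the "Hello world!" example.

```nasm
; build: nasm -f elf64 flags_demo.asm      (hypothetical file name)
; link:  ld -o flags_demo flags_demo.o
SECTION .text
global _start
_start:
    xor   edi, edi      ; edi = 0: running sum, later used as the exit status
    mov   ecx, 5        ; loop counter
.loop:
    add   edi, ecx      ; sum += counter (add also writes the flags)
    sub   ecx, 1        ; sub sets ZF when the result reaches zero
    jnz   .loop         ; no separate cmp needed: branch on ZF left by sub

    test  edi, edi      ; test is an AND that discards the result and only sets flags
    jz    .done         ; would be taken only if the sum were zero
    ; execution falls through here because the sum is 5+4+3+2+1 = 15
.done:
    mov   eax, 60       ; __NR_exit (x86_64)
    syscall             ; _exit(edi)
```

Running the program and then printing the shell's last exit status (for example with echo $?) should show 15.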

The flags register is also used in the x86 architecture to turn certain features or execution modes on and off. For example, to disable all maskable interrupts, privileged code can use the instruction:

cli

The flags register can also be accessed directly. The low 8 bits of the flags register can be loaded into ah using the lahf instruction. The entire flags register can also be pushed onto and popped off the stack using the instructions pushfd/pushfq, popfd/popfq, int (including into) and iret.
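The two access paths can be sketched as follows for 64-bit Linux in NASM syntax. This is an illustrative example rather than a canonical one: the label names are hypothetical, and it assumes a processor that supports lahf in 64-bit mode (a few very early x86-64 processors did not).

```nasm
; build: nasm -f elf64 readflags.asm       (hypothetical file name)
; link:  ld -o readflags readflags.o
SECTION .text
global _start
_start:
    xor   eax, eax      ; result is zero, so ZF = 1 and CF = 0
    stc                 ; set CF = 1 explicitly
    lahf                ; ah = flags bits 7..0: SF ZF 0 AF 0 PF 1 CF
    pushfq              ; push the whole 64-bit RFLAGS register
    pop   rdx           ; rdx = copy of RFLAGS

    movzx edi, ah       ; edi = low flag byte obtained through lahf
    and   edi, 0x41     ; keep only ZF (bit 6) and CF (bit 0)
    and   edx, 0x41     ; same two bits from the pushfq copy
    cmp   edi, edx      ; both methods should agree
    je    .exit
    mov   edi, 1        ; not expected: the two copies disagree
.exit:
    mov   eax, 60       ; __NR_exit (x86_64)
    syscall             ; _exit(edi)
```

With ZF and CF both set, the exit status is 0x41, i.e. 65. The mask is needed because xor leaves AF undefined.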

The x87 floating-point subsystem also has its own independent 'flags'-type register, the FP status word. In the 1990s, accessing the flag bits in this register was an awkward and slow procedure, but modern processors provide floating-point compare instructions (such as fcomi) that set the ordinary flags, so the normal conditional jump instructions can be used directly without any intervening steps.
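The difference between the two approaches can be sketched as below, for 64-bit Linux in NASM syntax. The data labels a and b and the file name are hypothetical; the legacy path assumes sahf is available in 64-bit mode, which holds on all but a few very early x86-64 processors.

```nasm
; build: nasm -f elf64 fpcmp.asm           (hypothetical file name)
; link:  ld -o fpcmp fpcmp.o
SECTION .rodata
a:  dq 1.5
b:  dq 2.5

SECTION .text
global _start
_start:
    fld   qword [rel b] ; st0 = 2.5
    fld   qword [rel a] ; st0 = 1.5, st1 = 2.5
    xor   edi, edi      ; exit status collects one bit per method

    ; Legacy sequence: compare, copy the status word out, load it into flags
    fcom  st1           ; sets C0/C2/C3 in the x87 status word
    fnstsw ax           ; ax = status word
    sahf                ; ah -> flags: C0 -> CF, C2 -> PF, C3 -> ZF
    jae   .skip_legacy  ; CF = 0 would mean st0 >= st1
    or    edi, 1        ; legacy path saw a < b
.skip_legacy:

    ; Direct sequence: fcomi writes ZF/PF/CF itself
    fcomi st0, st1
    jae   .skip_direct
    or    edi, 2        ; direct path saw a < b
.skip_direct:
    mov   eax, 60       ; __NR_exit (x86_64)
    syscall             ; _exit(edi)
```

Because 1.5 is less than 2.5, both paths set their bit and the exit status is 3.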

Using the instruction pointer register

The instruction pointer is called ip in 16-bit mode, eip in 32-bit mode, and rip in 64-bit mode. The instruction pointer register points to the address of the next instruction that the processor will attempt to execute. It cannot be directly accessed in 16-bit or 32-bit mode, but a sequence like the following can be written to put the address of next_line into eax (32-bit code):

    call next_line
next_line:
    pop eax

Writing to the instruction pointer is simple: a jmp instruction stores the given target address into the instruction pointer. For example, the following instruction puts the contents of rax into rip (64-bit code):

    jmp rax

In 64-bit mode, instructions can reference data relative to the instruction pointer, so there is less need to copy the value of the instruction pointer to another register.
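A short sketch of RIP-relative addressing for 64-bit Linux in NASM syntax follows. The label names counter and .here and the file name are hypothetical. The lea replaces the call/pop sequence that 32-bit code needs, and the data accesses are encoded relative to rip because of DEFAULT REL.

```nasm
; build: nasm -f elf64 riprel.asm          (hypothetical file name)
; link:  ld -o riprel riprel.o
DEFAULT REL                     ; [label] is assembled as [rel label]

SECTION .data
counter: dd 41

SECTION .text
global _start
_start:
    lea   rax, [rel .here]      ; rax = address of .here, computed from rip,
.here:                          ; with no call/pop pair needed
    add   dword [counter], 1    ; RIP-relative read-modify-write of the data
    mov   edi, [counter]        ; edi = 42, used as the exit status
    mov   eax, 60               ; __NR_exit (x86_64)
    syscall                     ; _exit(edi)
```

The program exits with status 42; the value left in rax by lea is only there to show the technique and is overwritten before the system call.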

References

  1. ^ "Intel 8008 (i8008) microprocessor family". www.cpu-world.com. Retrieved 2021-03-25.
  2. ^ "Intel 8008". CPU MUSEUM - MUSEUM OF MICROPROCESSORS & DIE PHOTOGRAPHY. Retrieved 2021-03-25.
  3. ^ a b c "Intel 8008 OPCODES". www.pastraiser.com. Retrieved 2021-03-25.
  4. ^ "Assembler language reference". www.ibm.com. Retrieved 2022-11-28.
  5. ^ "x86 Assembly Language Reference Manual" (PDF).
  6. ^ a b c d e Narayam, Ram (2007-10-17). "Linux assemblers: A comparison of GAS and NASM". IBM. Archived from the original on October 3, 2013. Retrieved 2008-07-02.
  7. ^ "The Creation of Unix". Archived from the original on April 2, 2014.
  8. ^ Hyde, Randall. "Which Assembler is the Best?". Retrieved 2008-05-18.
  9. ^ "GNU Assembler News, v2.1 supports Intel syntax". 2008-04-04. Retrieved 2008-07-02.
  10. ^ "i386-Bugs (Using as)". Binutils documentation. Retrieved 15 January 2020.
  11. ^ "Intel 8080 Assembly Language Programming Manual" (PDF). Retrieved 12 May 2023.
  12. ^ Mueller, Scott (March 24, 2006). "P2 (286) Second-Generation Processors". Upgrading and Repairing PCs, 17th Edition (Book) (17 ed.). Que. ISBN 0-7897-3404-4. Retrieved 2017-12-06.
  13. ^ Curtis Meadow. "Encoding of 8086 Instructions".
  14. ^ Igor Kholodov. "6. Encoding x86 Instruction Operands, MOD-REG-R/M Byte".
  15. ^ "Encoding x86 Instructions".
  16. ^ Michael Abrash. "Zen of Assembly Language: Volume I, Knowledge". "Chapter 7: Memory Addressing". Section "mod-reg-rm Addressing".
  17. ^ Intel 80386 Reference Programmer's Manual. "17.2.1 ModR/M and SIB Bytes"
  18. ^ "X86-64 Instruction Encoding: ModR/M and SIB bytes"
  19. ^ "Figure 2-1. Intel 64 and IA-32 Architectures Instruction Format".
  20. ^ "x86 Addressing Under the Hood".
  21. ^ a b Stephen McCamant. "Manual and Automated Binary Reverse Engineering".
  22. ^ "X86 Instruction Wishlist".
  23. ^ Peter Cordes (18 December 2011). "NASM (Intel) versus AT&T Syntax: what are the advantages?". Stack Overflow.
  24. ^ "I just started Assembly". daniweb.com. 2008.

Further reading

Manuals

Books