Assembly Language Basics


<asm> Assembly Language;

The closest you can get to the metal — a complete guide from registers to real programs

Posted by MB Blogger  ·  Estimated read: 12 min  ·  Category: Low-Level Programming

01. What is Assembly Language?

Assembly language is a low-level programming language whose instructions correspond very closely to the machine code of a specific computer architecture. In a high-level language (Python, Java, C++), one statement may compile to many instructions. In assembly, the mapping is almost 1-to-1: each instruction becomes one machine instruction (an opcode plus its operands).

Every CPU architecture has its own assembly language — x86, ARM, RISC-V, MIPS — all speak different dialects. An assembler translates human-readable assembly mnemonics into binary machine code the CPU can execute directly.

// Key Insight Assembly sits one level above machine code (raw binary) and one level below compiled languages like C. It gives you full control over every byte the processor touches.

02. Why Learn Assembly?

Use Case | Why Assembly Matters
OS Kernel Development | Boot loaders, interrupt handlers and context switching need direct hardware control
Reverse Engineering | Disassembling malware or closed-source binaries produces assembly output
Embedded Systems | Microcontrollers with only a few KB of memory sometimes need hand-tuned code for size or cycle-exact timing
Cryptography & SIMD | Hand-optimised vector code can outperform compiler output in hot loops
Security Research | Shellcode is written in assembly, and exploits such as ROP chains are built from existing instruction sequences
Understanding Compilers | Reading compiler output helps you write faster high-level code

03. Architecture Overview

x86 (32-bit)

Intel's classic architecture, dominant in PCs from the 1980s through the 2000s. Its 32-bit general-purpose registers are EAX, EBX, ECX, EDX, ESI, EDI, EBP and ESP. Instructions are variable-length (1 to 15 bytes, CISC), which makes encoding complex but code compact.

x86-64 (64-bit)

AMD's 64-bit extension of x86, now the dominant architecture on desktops and servers. The 64-bit registers carry an R prefix (RAX, RBX…), and 8 new general-purpose registers (R8–R15) are added. The System V AMD64 ABI governs Linux; Microsoft has its own calling convention.

ARM (AArch64)

RISC architecture dominant in smartphones, tablets, Apple Silicon, and embedded systems. Fixed-length 32-bit instructions, 31 general-purpose registers (X0–X30). Designed for power efficiency.

// Note This guide focuses on x86-64 (Intel/AMD) on Linux using NASM syntax — the most common starting point for PC-focused learners.

04. CPU Registers

Registers are tiny, lightning-fast storage locations inside the CPU. In x86-64, each 64-bit register has 32-bit, 16-bit, and 8-bit sub-names:

Register Naming (RAX family)
; 64-bit  32-bit  16-bit  8-bit
; RAX     EAX     AX      AL (bits 0–7) / AH (bits 8–15)

RAX | Accumulator – return values, arithmetic
RBX | Base – callee-saved general purpose
RCX | Counter – loop counter, 4th arg
RDX | Data – I/O, 3rd arg, div remainder
RSI | Source Index – 2nd arg, string ops
RDI | Destination Index – 1st arg, string ops
RSP | Stack Pointer – top of stack
RBP | Base Pointer – stack frame base
R8–R15 | Extra general-purpose (64-bit only); R8 and R9 carry the 5th and 6th args
RIP | Instruction Pointer – next instruction
RFLAGS | Status flags – ZF, CF, SF, OF…
XMM0–XMM15 | 128-bit SIMD / floating-point registers

05. Memory Model & Addressing Modes

x86-64 uses a flat memory model: each process sees one large, linear (virtual) address space, and segmentation is essentially unused in 64-bit mode. The way an instruction refers to memory is called its addressing mode:

Addressing Modes (NASM syntax)
; Immediate – literal value
mov  rax, 42

; Register – value in a register
mov  rbx, rax

; Direct memory – fixed address
mov  rax, [0x601020]

; Register indirect – address in register
mov  rax, [rbx]

; Base + offset – struct field access
mov  rax, [rbx + 8]

; Base + index * scale + disp – array access (scale is 1, 2, 4 or 8)
mov  rax, [rbx + rcx*8 + 16]

Memory Segments

Segment | Contents | Permissions
.text | Executable instructions | Read + Execute
.data | Initialised global variables | Read + Write
.bss | Uninitialised globals (zeroed) | Read + Write
Stack | Local vars, return addresses | Read + Write
Heap | Dynamic allocations (malloc) | Read + Write

06. Core Instruction Set

Data Movement

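MOV dst, src | Copy value from src to dst
PUSH src | RSP -= 8, then store src at [RSP]
POP dst | Load [RSP] into dst, then RSP += 8
LEA dst, [mem] | Load effective address: computes the address without reading memory (pointer arithmetic)
XCHG a, b | Swap two operands (implicitly atomic when one operand is in memory)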

Arithmetic

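ADD dst, src | dst = dst + src; sets flags
SUB dst, src | dst = dst - src; sets flags
MUL src | Unsigned (64-bit form): RDX:RAX = RAX × src
IMUL src | Signed multiply (also has 2- and 3-operand forms such as IMUL dst, src)
DIV src | Unsigned (64-bit form): divides RDX:RAX by src; quotient→RAX, remainder→RDX (clear RDX first)
INC / DEC | Increment or decrement by 1 (CF is left unchanged)
NEG dst | Two's complement negation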

Bitwise & Logic

AND dst, src | Bitwise AND
OR dst, src | Bitwise OR
XOR dst, src | Bitwise XOR (XOR reg, reg zeroes the register)
NOT dst | Bitwise NOT (one's complement)
SHL / SHR | Shift left / right (unsigned multiply / divide by 2ⁿ; SAR is the sign-preserving right shift)
ROL / ROR | Rotate bits left / right

Comparison & Flags

CMP a, b | SUB without storing the result; sets flags
TEST a, b | AND without storing the result; sets flags

07. Program Structure

NASM x86-64 Program Skeleton
; ── Declarations ──────────────────────────
global _start          ; entry point for linker
extern printf           ; external C function (unused here; only needed when linking against libc)

; ── Data section ──────────────────────────
section .data
  msg   db "Hello!", 0x0A  ; string + newline
  msglen equ $ - msg      ; calc length at assemble time
  count  dq 0             ; 64-bit integer, value 0

; ── BSS section ───────────────────────────
section .bss
  buffer resb 64          ; reserve 64 bytes (uninitialised)

; ── Code section ──────────────────────────
section .text
_start:
  ; ... your code here ...
  mov  rax, 60            ; syscall: exit
  xor  rdi, rdi           ; exit code 0
  syscall

NASM data definition directives: db (byte), dw (word/2B), dd (dword/4B), dq (qword/8B). Reserve uninitialised space with resb/resw/resd/resq.

08. Hello, World! — Complete Program

hello.asm — x86-64 Linux (NASM)
global _start

section .data
  msg    db  "Hello, World!", 0x0A
  msglen equ  $ - msg

section .text
_start:
  ; write(1, msg, msglen)
  mov   rax, 1        ; syscall number: sys_write
  mov   rdi, 1        ; file descriptor: stdout
  mov   rsi, msg      ; pointer to message
  mov   rdx, msglen   ; number of bytes to write
  syscall

  ; exit(0)
  mov   rax, 60       ; syscall number: sys_exit
  xor   rdi, rdi      ; status = 0
  syscall
Build & Run (terminal)
# Assemble into object file
nasm -f elf64 hello.asm -o hello.o

# Link into executable
ld hello.o -o hello

# Execute
./hello
Hello, World!
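// How it works Linux system calls are invoked with the syscall instruction. RAX holds the syscall number, and up to 6 arguments go in RDI, RSI, RDX, R10, R8, R9. The kernel returns the result in RAX (a negative errno value on failure), and the syscall instruction itself overwrites RCX and R11.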

09. Control Flow

Unconditional Jump

JMP
jmp loop_start     ; always jump to label

Conditional Jumps (after CMP/TEST)

JE / JZ | Jump if equal / zero flag set
JNE / JNZ | Jump if not equal / zero flag clear
JL / JG | Jump if less / greater (signed)
JLE / JGE | Jump if less-or-equal / greater-or-equal (signed)
JB / JA | Jump if below / above (unsigned)

Loop Example — Sum 1 to 10

loop_sum.asm
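global _start

section .text
_start:
  xor  rax, rax       ; sum = 0
  mov  rcx, 1         ; counter i = 1

.loop:
  add  rax, rcx       ; sum += i
  inc  rcx            ; i++
  cmp  rcx, 11        ; compare i to 11
  jl   .loop          ; if i < 11 (signed), repeat
                      ; rax = 55 here

  mov  rdi, rax       ; exit status = sum (check with: echo $?)
  mov  rax, 60        ; syscall: exit
  syscall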

LOOP Instruction

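The LOOP instruction decrements RCX and jumps if the result is not zero, giving a compact countdown. It does not change the flags, and if RCX is 0 on entry it wraps around and repeats 2⁶⁴ times. On many modern CPUs a dec rcx / jnz pair is also faster.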

LOOP example
  mov rcx, 5
.repeat:
  ; body executes 5 times
  loop .repeat

10. Stack & Calling Conventions

The stack grows downward in x86-64. RSP always points to the topmost (lowest address) valid data.

Stack Operations
push rax    ; RSP -= 8 ; [RSP] = RAX
pop  rbx    ; RBX = [RSP] ; RSP += 8

System V AMD64 ABI (Linux calling convention)

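Parameter # (integer/pointer) | Register | Note
1st | RDI
2nd | RSI
3rd | RDX
4th | RCX
5th | R8
6th | R9
7th+ | Stack | Pushed right to left (the 7th argument ends up at the lowest address)
Return value | RAX | 64-bit integer result (RDX holds the high half of a 128-bit result)
Floating-point args | XMM0–XMM7 | Floating-point results are returned in XMM0
Callee-saved | RBX, RBP, R12–R15 | The called function must preserve these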

Function Prologue & Epilogue

Standard Function Frame
my_func:
  ; ── Prologue ──
  push  rbp            ; save caller's base pointer
  mov   rbp, rsp       ; establish our frame
  sub   rsp, 32        ; allocate 32 bytes of locals

  ; ... function body ...

  ; ── Epilogue ──
  mov   rsp, rbp       ; restore stack pointer
  pop   rbp            ; restore caller's base pointer
  ret                   ; pop return address → RIP
// Stack Alignment The System V ABI requires RSP to be 16-byte aligned before a call instruction. After call pushes the return address (8 bytes), RSP is misaligned by 8. The prologue's push rbp restores 16-byte alignment.

11. Tools: Assemblers & Debuggers

Tool | Type | Description
NASM | Assembler | Netwide Assembler: popular, clean Intel syntax, multiplatform
GAS (GNU as) | Assembler | GNU Assembler: uses AT&T syntax by default; part of binutils
MASM | Assembler | Microsoft Macro Assembler: Windows only, Intel syntax
YASM | Assembler | Rewrite of NASM with extra features
GDB | Debugger | GNU Debugger: step through instructions, inspect registers/memory
LLDB | Debugger | LLVM debugger, the default on macOS
Radare2 | Disassembler | Open-source reverse engineering framework
Ghidra | Decompiler | NSA-released reversing suite with a pseudo-C decompiler
godbolt.org | Online tool | Compiler Explorer: see the assembly output for C/C++ and many other languages
Quick GDB Cheat Sheet
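# Show Intel syntax (matches NASM) instead of the default AT&T
(gdb) set disassembly-flavor intel

# Set breakpoint at label
(gdb) b _start

# Run until the breakpoint is hit
(gdb) run

# Disassemble function
(gdb) disas _start

# Step one instruction
(gdb) si

# Print all registers
(gdb) info registers

# Examine 8 bytes at RSP (hex)
(gdb) x/8xb $rsp

# Print RAX in hex
(gdb) p/x $rax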

12. Real-World Applications

1. Inline Assembly in C

GCC Inline ASM
int add(int a, int b) {
  int result = a;
  __asm__ (
    "addl %1, %0"      // AT&T syntax (GCC default): result += b
    : "+r"(result)     // output: read-write operand, starts as a
    : "r"(b)           // input
    : "cc"             // ADD modifies the flags
  );
  return result;
}

2. SIMD Optimisation

With SSE/AVX, a single instruction can operate on 4, 8, or 16 32-bit values at once (128-bit SSE, 256-bit AVX, 512-bit AVX-512), which is critical for image processing, audio, and machine learning kernels.

SSE: Add 4 floats at once
; Assumes rdi = dst, rsi = A, rdx = B, each 16-byte aligned
; (movaps faults on unaligned addresses; use movups otherwise)
movaps  xmm0, [rsi]      ; load A[0..3]
movaps  xmm1, [rdx]      ; load B[0..3]
addps   xmm0, xmm1       ; xmm0 = A + B (4 additions, 1 instruction)
movaps  [rdi], xmm0      ; store result

3. OS Bootloader

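On legacy BIOS systems, the firmware loads the first 512-byte sector of the boot disk (the MBR) to address 0x7C00 and jumps to it in 16-bit real mode; the sector must end with the signature bytes 0x55 0xAA. Boot code therefore starts with a small assembly stub that sets up the CPU before switching to 32-bit protected mode or 64-bit long mode and calling into C. Modern UEFI firmware instead loads a regular executable, skipping this 16-bit stage.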

// Bottom Line Assembly is a superpower — it unlocks the CPU's full capability and gives you complete visibility into what your code actually does. Even a surface understanding of assembly makes you a significantly better programmer in any language.
</asm>   Written for programmers who want to go deeper
