Introduction to ARM Assembly

This article was originally published on Medium.


Computers only understand binary-coded data, which can represent a wide variety of information. To act on that data, the computer has to process it and make ‘sense’ of it, and the CPU is designed specifically for this task. A computer needs specific instructions to do anything, and we provide those instructions through programming languages. However, there is more to this. Most programming languages are high-level languages that read more like English. CPUs don’t understand English, so a compiler translates the high-level program into assembly language. Assembly is closer to what the CPU understands, but the CPU still cannot execute it directly. Typical assembly code consists of mnemonics, which represent OPCODES (operations), and their operands. Another program, called the assembler, translates this into the binary machine code that the CPU executes.

What are OPCODES?

Opcodes are the specific, limited set of instructions that a CPU of a particular architecture can decode and execute. Every CPU is designed differently, with a different number, arrangement and size of registers. Manufacturers like Intel, AMD, MediaTek, etc. release documentation with their CPUs that describes how to program a specific model and generation of CPU. This includes the set of operations the CPU can perform, i.e. its OPCODES, and is usually referred to as the ISA (Instruction Set Architecture). We shall discuss all of this in detail in the CPU architecture section of the course.
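To make the idea of an opcode concrete, here is a minimal sketch in C that pulls the fields out of one 32-bit instruction word. It assumes the AArch64 "ADD (shifted register)" encoding, under which `add w8, w8, w9` assembles to `0x0B090108`. The struct and function names are made up for illustration and are not part of any ARM tool.

```c
#include <stdint.h>

/* Selected fields of an AArch64 "ADD (shifted register)" instruction word.
   AddFields and decode_add are hypothetical names for this sketch. */
typedef struct {
    unsigned sf;      /* bit 31: 0 = 32-bit (W) registers, 1 = 64-bit (X) */
    unsigned opcode;  /* bits 28..24: 0b01011 for this instruction class */
    unsigned rm;      /* bits 20..16: second source register number */
    unsigned rn;      /* bits 9..5:   first source register number */
    unsigned rd;      /* bits 4..0:   destination register number */
} AddFields;

/* Extract the fields by shifting and masking the instruction word. */
AddFields decode_add(uint32_t word) {
    AddFields f;
    f.sf     = (word >> 31) & 0x1;
    f.opcode = (word >> 24) & 0x1F;
    f.rm     = (word >> 16) & 0x1F;
    f.rn     = (word >> 5)  & 0x1F;
    f.rd     = word & 0x1F;
    return f;
}
```

Calling `decode_add(0x0B090108)` yields destination register 8, first source register 8 and second source register 9, which is what the mnemonic form spells out for humans.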

CPU Architecture — ARM cores

ARM processors follow the RISC (Reduced Instruction Set Computer) architecture. Its hallmarks are that most instructions execute in a single cycle, the register set is orthogonal, and memory is accessed through a load-store architecture. In other words, the RISC philosophy reduces the complexity of individual instructions and allows only simple instructions in the design. The aim is to decrease the number of clock cycles needed to complete an instruction and thus increase the rate at which instructions are executed.

To modify a variable in memory, you need to load its value from memory into a register in the core, process or modify it there, and then store it back to memory. Some other architectures can modify values in memory directly, whereas ARM is designed differently: the only operations ARM performs on memory are LOAD and STORE (which matches the general perception of what memory is designed for).
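The load, modify, store sequence can be sketched with a toy model in C. The `memory` and `reg` arrays and the helper names are assumptions made for this illustration; they only mimic the idea and do not model a real ARM core.

```c
#include <stdint.h>

/* Toy model: 'memory' stands in for RAM, 'reg' for the core registers. */
static uint32_t memory[4];
static uint32_t reg[16];

/* Like LDR: copy a word from memory into a register. */
void load(int rd, int addr)  { reg[rd] = memory[addr]; }

/* Like STR: copy a word from a register into memory. */
void store(int rs, int addr) { memory[addr] = reg[rs]; }

/* Increment the word at 'addr' the load-store way. */
void increment_in_memory(int addr) {
    load(0, addr);    /* 1. load the value into a register */
    reg[0] += 1;      /* 2. modify it inside the core */
    store(0, addr);   /* 3. store the result back to memory */
}
```

The arithmetic never touches `memory` directly; only `load` and `store` do.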

Most internal registers are 32 bits wide, except for some registers used for vector processing. They are generally processed by a 32-bit ALU, and memory is generally manipulated in 32-bit units, i.e. the word size for ARM is fixed at 32 bits.

WORD SIZE = 32 bits
HALFWORD SIZE = 16 bits
DOUBLEWORD SIZE = 64 bits
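These sizes map naturally onto the fixed-width integer types in C. The sketch below uses type and function names invented for this example to show how a doubleword is made up of two words.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type names mirroring the ARM data sizes. */
typedef uint16_t halfword_t;
typedef uint32_t word_t;
typedef uint64_t doubleword_t;

/* Number of bits in an object of the given size in bytes. */
unsigned bits_in(size_t size_in_bytes) {
    return (unsigned)(size_in_bytes * CHAR_BIT);
}

/* Split a doubleword into its low and high words. */
word_t low_word(doubleword_t d)  { return (word_t)d; }
word_t high_word(doubleword_t d) { return (word_t)(d >> 32); }
```

For example, `bits_in(sizeof(word_t))` gives 32, matching the word size above.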

The ARM ISA includes the 32-bit ARM instruction set and the 16/32-bit Thumb instruction set, which provides better code density. ARM processors can switch between the ARM and Thumb instruction sets, which is termed interworking. This is handled automatically by the compiler/linker, and the programmer need not worry about it.

Operating Modes

Most processors have at least two modes of operation, one privileged and one not; for example, you might have heard of User mode and Supervisor mode. Fewer things can be done in User mode than in a privileged mode. This restriction prevents unintended operations that could be fatal to the system, for safety and security purposes.

ARM cores have 7 specific operating modes. Each of these has its own stack space and a subset of private registers, with the exception of System mode, which shares its registers with User mode. This is known as register banking, and it makes handling nested exceptions efficient.

Diagram showing the 7 ARM processor operating modes: User, FIQ, IRQ, Supervisor, Abort, Undefined, and System modes with their privilege levels

Table showing ARM processor modes with their mode bits, privilege levels, and register banking configuration

Source: ARM Instruction Set Reference Guide v1.0

Thus User mode is the unprivileged mode, where user tasks execute, while the OS kernel runs in the privileged modes. We will focus more on the register set for User mode, as these are the registers we will usually deal with.

User mode registers

Diagram of ARM User Mode registers showing R0-R15 general purpose registers, including Stack Pointer (R13), Link Register (R14), and Program Counter (R15)

When the operating mode changes, R13 and R14, i.e. the Stack Pointer and the Link Register, change as well because, as mentioned before, each mode has its own stack space and its own subset of private registers.
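Register banking can be pictured as keeping one copy of SP and LR per mode and selecting the copy that belongs to the current mode. The C sketch below is a simplified model with made-up names: it banks only R13 and R14 (FIQ mode banks more registers in a real core) and assumes the classic ARM arrangement in which System mode reuses the User mode registers.

```c
#include <stdint.h>

enum Mode { USER, FIQ, IRQ, SUPERVISOR, ABORT, UNDEFINED, SYSTEM, MODE_COUNT };

/* Hypothetical core model: one SP (R13) and LR (R14) slot per mode. */
typedef struct {
    uint32_t banked_sp[MODE_COUNT];
    uint32_t banked_lr[MODE_COUNT];
    enum Mode mode;
} Core;

/* System mode shares the User mode bank; every other mode has its own. */
static int bank_of(enum Mode m) { return m == SYSTEM ? USER : m; }

/* Access the SP and LR visible in the current mode. */
uint32_t *sp(Core *c) { return &c->banked_sp[bank_of(c->mode)]; }
uint32_t *lr(Core *c) { return &c->banked_lr[bank_of(c->mode)]; }

/* Changing mode changes which SP/LR copy is visible; nothing is copied. */
void switch_mode(Core *c, enum Mode m) { c->mode = m; }
```

Because each exception mode sees its own SP and LR, an interrupt handler can start using its stack without first saving the interrupted task's stack pointer.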

In the AArch64 ISA the registers are referred to differently. The architectural names use the prefix R, while on 64-bit ARM processors a register is referred to as W or X, where W refers to the 32-bit view and X to the full 64-bit register. For example, register R0 is referred to as W0 or X0.

AArch64 register naming convention diagram showing W (32-bit) and X (64-bit) register prefixes in 64-bit ARM architecture
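The relationship between the W and X names can be sketched in C by treating the X register as a 64-bit value and the W register as its low 32 bits. The function names are invented for this example. The sketch relies on the AArch64 rule that writing a W register clears the upper 32 bits of the corresponding X register.

```c
#include <stdint.h>

/* Reading Wn returns the low 32 bits of Xn. */
uint32_t read_w(uint64_t x) { return (uint32_t)x; }

/* Writing Wn zero-extends the value, so the upper 32 bits of Xn become 0.
   The return value is the new content of Xn. */
uint64_t write_w(uint32_t value) { return (uint64_t)value; }
```

So if X0 holds `0x1122334455667788`, reading W0 gives `0x55667788`, and writing 5 to W0 leaves X0 equal to 5.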

ARM OPCODES

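As discussed before, ARM follows a load-store architecture, so only two kinds of operation can be performed on memory: LOAD and STORE. Let’s look at the most common of the other instructions. The examples below are written in the 32-bit ARM syntax of the reference guide (registers r0 to r15); AArch64 provides equivalent instructions that use the W and X register names.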

Overview of ARM OPCODES showing the main instruction categories including data processing, load/store, and branch instructions

MOV / MVN — Move

MOV r8, #10           ; r8 = 10
MVN r9, r8            ; r9 = NOT r8 (bitwise complement of r8)
                      ; > [ mov / mvn ]  - instruction
                      ; > [ r8 / r9 ]    - register | Operand1 (destination)
                      ; > [ #10 / r8 ]   - value or register | Operand2

ARM MOV instruction syntax and encoding diagram from the ARM Instruction Set Reference Guide showing operand format and condition codes

Source: ARM Instruction Set Reference Guide v1.0
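In C terms, MOV is a plain assignment and MVN is an assignment of the bitwise complement. A minimal sketch, with function names chosen only for this illustration:

```c
#include <stdint.h>

/* MOV rd, operand2: copy the operand into the destination. */
uint32_t mov(uint32_t operand2) { return operand2; }

/* MVN rd, operand2: copy the bitwise NOT of the operand. */
uint32_t mvn(uint32_t operand2) { return ~operand2; }
```

With `r8 = mov(10)` and `r9 = mvn(r8)`, `r9` ends up as `0xFFFFFFF5`, every bit of 10 flipped.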

LDR — Load

LDR     r8, [r10]        ; loads R8 from the address in R10.
LDRNE   r2, [r5,#960]!   ; (conditionally) loads R2 from a word
                         ; 960 bytes above the address in R5, and
                         ; increments R5 by 960

ARM LDR (Load Register) instruction syntax showing addressing modes for loading data from memory into registers

Source: ARM Instruction Set Reference Guide v1.0
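The two addressing forms above can be sketched in C over a byte array standing in for memory. The function names are hypothetical, and the condition (NE) is left out to keep the example short.

```c
#include <stdint.h>
#include <string.h>

/* LDR rd, [rn]: load the word at the address held in rn. */
uint32_t ldr(const uint8_t *mem, uint32_t rn) {
    uint32_t value;
    memcpy(&value, mem + rn, sizeof value);
    return value;
}

/* LDR rd, [rn, #offset]!: pre-indexed with write-back.
   The base register is first updated to rn + offset,
   then the word at the updated address is loaded. */
uint32_t ldr_pre_writeback(const uint8_t *mem, uint32_t *rn, uint32_t offset) {
    *rn += offset;
    return ldr(mem, *rn);
}
```

The `!` is what makes the base register change: without it, the offset is used only to compute the address and R5 keeps its old value.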

STR — Store

STR     r2, [r9,#consta-struc]   ; consta-struc is an expression
                                 ; evaluating to a constant in
                                 ; the range 0-4095.

ARM STR (Store Register) instruction syntax showing addressing modes for storing register data to memory locations

Source: ARM Instruction Set Reference Guide v1.0
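An expression such as `consta-struc` is presumably the distance of a field from the start of a structure. The closest C analogue is `offsetof`. In the sketch below, the structure layout and the names are assumptions made for illustration only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical structure: 'consta' lies 4 bytes after the start. */
struct Config {
    uint32_t flags;
    uint32_t consta;
};

/* Like STR r2, [r9, #consta-struc]: store 'value' at base + field offset. */
void str_consta(uint8_t *base, uint32_t value) {
    memcpy(base + offsetof(struct Config, consta), &value, sizeof value);
}
```

Here `base` plays the role of R9, and `offsetof(struct Config, consta)` is the constant offset that the assembler would compute from the two labels.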

ADD — Addition

ADD r8, r8, r9            
                      ; > [ add ]       - instruction
                      ; > [ r8 ]        - register | Operand1
                      ; > [ r8 ]        - register | Operand2
                      ; > [ r9 ]        - register | Operand3

; This instruction adds operand2 and operand3
; and stores the result in operand1.

ARM ADD instruction syntax diagram showing register addition operations with operand encoding from the ARM Reference Guide

Source: ARM Instruction Set Reference Guide v1.0

SUB — Subtraction

SUB r8, r8, r9            
                      ; > [ sub ]       - instruction
                      ; > [ r8 ]        - register | Operand1
                      ; > [ r8 ]        - register | Operand2
                      ; > [ r9 ]        - register | Operand3

; This instruction subtracts operand3 from operand2
; and stores the result in operand1.

ARM SUB (Subtraction) instruction syntax diagram showing register subtraction operations with operand encoding

Source: ARM Instruction Set Reference Guide v1.0

MUL — Multiplication

MUL r8, r8, r9            
                      ; > [ mul ]       - instruction
                      ; > [ r8 ]        - register | Operand1
                      ; > [ r8 ]        - register | Operand2
                      ; > [ r9 ]        - register | Operand3

; This instruction multiplies operand2 by operand3
; and stores the result in operand1.

ARM MUL (Multiply) instruction syntax diagram showing register multiplication operations with operand encoding

Source: ARM Instruction Set Reference Guide v1.0
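Since the registers are 32 bits wide, the results of ADD, SUB and MUL are kept modulo 2^32. Unsigned 32-bit arithmetic in C behaves the same way, so the three instructions can be sketched as follows (the function names are made up for this example, and condition flags are ignored):

```c
#include <stdint.h>

/* ADD rd, rn, rm: rd = rn + rm, wrapping at 32 bits. */
uint32_t add32(uint32_t rn, uint32_t rm) { return rn + rm; }

/* SUB rd, rn, rm: rd = rn - rm, wrapping at 32 bits. */
uint32_t sub32(uint32_t rn, uint32_t rm) { return rn - rm; }

/* MUL rd, rn, rm: rd = low 32 bits of rn * rm. */
uint32_t mul32(uint32_t rn, uint32_t rm) { return rn * rm; }
```

For instance, `sub32(5, 10)` gives `0xFFFFFFFB`, which is -5 when the register is read as a signed two's complement value.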

Code to CPU Instruction

To understand how high-level code is converted into CPU instructions and what happens inside the CPU, we will take an example program and decode its operations one by one.

I am on a MacBook Pro with an M2 Pro chip, which is based on the ARM architecture and follows the AArch64 ISA. The assembly code generated here will therefore use the ARM ISA and mnemonics. If you are on a computer with an Intel or AMD chip, your ISA, and the assembly you get, will be different.

Below is a simple C program that adds two numbers and stores the result in a third variable.

#include <stdio.h>

int main() {
    int a = 5;       // first operand
    int b = 10;      // second operand
    int c = a + b;   // sum stored in a third variable
}

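You can generate the assembly for the C file above with the following command. The -S flag tells the compiler to stop after generating assembly, which it writes to <filename>.s.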

gcc -c -S <filename>.c

Here is the assembly code. Let’s go through it line by line to understand how ARM processors perform these operations.

.section __TEXT,__text,regular,pure_instructions
 .build_version macos, 14, 0 sdk_version 14, 2
 .globl _main                           ; -- Begin function main
 .p2align 2
_main:                                  ; @main
 .cfi_startproc
; %bb.0:
 sub sp, sp, #16               ; Move the SP down by 16 bytes (reserve stack space)
 .cfi_def_cfa_offset 16
 mov w8, #5                    ; Move 5 into register w8
 str w8, [sp, #12]             ; Store the value in w8 at 12 bytes
                                ; above the SP
 mov w8, #10                   ; Move 10 into w8
 str w8, [sp, #8]              ; Store the value in w8 at 8 bytes
                                ; above the SP
 ldr w8, [sp, #12]             ; Load the value at 12 bytes above the SP into w8
 ldr w9, [sp, #8]              ; Load the value at 8 bytes above the SP into w9
 add w8, w8, w9                ; Add the values in w8 and w9 and store the sum in w8
 str w8, [sp, #4]              ; Store the value in w8 at 4 bytes above the SP
 mov w0, #0                    ; Move 0 into w0
 add sp, sp, #16               ; Add 16 to the SP and store the result in the SP
                                ; (the SP moves back to its original position)
 ret                           ; Return
 .cfi_endproc
                                        ; -- End function
.subsections_via_symbols

To begin with, below is the section of the assembly code that we need to look at and analyse. The surrounding lines are assembler directives and function setup, and discussing them is out of the scope of this article. However, curious readers can look for details in the provided references.

1.  mov  w8, #5                   ; Move 5 into register w8
2.  str  w8, [sp, #12]            ; Store the value in w8 at 12 bytes
                                  ; above the SP
3.  mov  w8, #10                  ; Move 10 into w8
4.  str  w8, [sp, #8]             ; Store the value in w8 at 8 bytes
                                  ; above the SP
5.  ldr  w8, [sp, #12]            ; Load the value at 12 bytes above the SP into w8
6.  ldr  w9, [sp, #8]             ; Load the value at 8 bytes above the SP into w9
7.  add  w8, w8, w9               ; Add the values in w8 and w9 and store the sum in w8
8.  str  w8, [sp, #4]             ; Store the value in w8 at 4 bytes above the SP
9.  mov  w0, #0                   ; Move 0 into w0
10. add  sp, sp, #16              ; Add 16 to the SP and store the result in the SP
                                  ; (the SP moves back to its original position)

Analysis of the code

On close inspection, you can pick out the mnemonics and the register names in the instructions. We are referring to the 32-bit registers (W) of the AArch64 architecture.

In the first step, we move the value 5 into register W8. If you look carefully, this corresponds to this line from the C code: int a = 5;
In the second step, we store the value in W8 at the location 12 bytes above the address in the Stack Pointer (SP).
In the third step, we move the value 10 into the W8 register. This corresponds to the line int b = 10; in the C program.
In the fourth step, we store the value in W8 at the location 8 bytes above the address in the SP.
In the fifth step, we load the value at 12 bytes above the SP into register W8.
In the sixth step, we load the value at 8 bytes above the SP into register W9.
In the seventh step, we add the values in W8 and W9 and put the sum back in W8. This corresponds to this line in the C code: int c = a + b;
In the eighth step, we store the value in W8 at 4 bytes above the SP location.
In the ninth step, we move 0 into W0. In the ARM64 calling convention, W0, the first parameter register, also holds a function’s return value, and the value returned from main becomes the exit code of the program. Thus moving 0 to W0 indicates the successful execution of the program.
In the last step, we add 16 to the SP, which moves it back to the position it had before the sub sp, sp, #16 at the start of the function.
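The steps above can be replayed in a small C simulation. This is only a sketch under simplifying assumptions: the `Machine` struct, the 64-byte stack and the helper names are invented here, and `sp` is an index into the array rather than a real address.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical machine: a small stack, a stack pointer and 16 W registers. */
typedef struct {
    uint8_t  stack[64];
    uint32_t sp;       /* index into stack[]; must start at 16 or above */
    uint32_t w[16];
} Machine;

/* str wT, [sp, #off]: store register wT at sp + off. */
static void str_w(Machine *m, int t, uint32_t off) {
    memcpy(&m->stack[m->sp + off], &m->w[t], sizeof m->w[t]);
}

/* ldr wT, [sp, #off]: load register wT from sp + off. */
static void ldr_w(Machine *m, int t, uint32_t off) {
    memcpy(&m->w[t], &m->stack[m->sp + off], sizeof m->w[t]);
}

/* Replays the generated code for main and returns the value left in w0. */
uint32_t run_main(Machine *m) {
    m->sp -= 16;                     /*     sub sp, sp, #16    */
    m->w[8] = 5;                     /* 1.  mov w8, #5         */
    str_w(m, 8, 12);                 /* 2.  str w8, [sp, #12]  (a) */
    m->w[8] = 10;                    /* 3.  mov w8, #10        */
    str_w(m, 8, 8);                  /* 4.  str w8, [sp, #8]   (b) */
    ldr_w(m, 8, 12);                 /* 5.  ldr w8, [sp, #12]  */
    ldr_w(m, 9, 8);                  /* 6.  ldr w9, [sp, #8]   */
    m->w[8] = m->w[8] + m->w[9];     /* 7.  add w8, w8, w9     */
    str_w(m, 8, 4);                  /* 8.  str w8, [sp, #4]   (c) */
    m->w[0] = 0;                     /* 9.  mov w0, #0         */
    m->sp += 16;                     /* 10. add sp, sp, #16    */
    return m->w[0];
}
```

After `run_main` returns, the stack pointer is back at its starting value, and the slot that held `c` contains 15, even though that memory is no longer reserved.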

With this, I would like to conclude this article.

Conclusion

This introduction has provided a basic understanding of the ARM assembly language and some fundamental instructions. We’ve seen how ARM processors operate on data using load-store architecture and how a simple C program translates into assembly instructions.

As you continue your exploration of ARM assembly, you can delve deeper into more complex instructions, branching statements, and assembly specific to different ARM cores. I have referenced different resources I came across while researching for this article. Feel free to explore them.

References

Thank you for giving your valuable time.

I hope this article gave you some insight into how your valuable and beautiful code provides instructions to the hardware. This is my first article in an upcoming series on Computer Architecture and Organisation, and Low-Level Programming.

Please write your feedback on the delivery and the composition. Also, do let me know if I missed or misunderstood anything. One of my primary goals is to gain more knowledge and insight from valuable and enlightened audiences on the platform.
