NaCl SFI model on x86-64 systems
This document addresses the details of the Software Fault Isolation (SFI) model for executable code that can be run in Native Client on an x86-64 system. An overview of this model can be found in the paper: Adapting Software Fault Isolation to Contemporary CPU Architectures. The primary focus of the SFI model is a Windows x86-64 system but the same techniques can be applied to run identical x86-64 binaries on other x86-64 systems such as Linux, Mac, FreeBSD, etc, so the description of the SFI model tries to abstract away system dependencies when possible.
Please note: throughout this document we use the AT&T notation for assembler syntax, in which the target operand appears last, e.g.
mov src, dst.
The format of Native Client executable binaries is identical to the x86-64 ELF binary format (, , , ) for Linux or BSD with a few extra requirements. The additional rules that a Native Client ELF binary must follow are:
- The ELF magic OS ABI field must be 123.
- The ELF magic OS ABI VERSION field must be 5.
- The ELF e_flags field must be 0x200000 (32-byte alignment).
- There must be exactly one PT_LOAD text segment. It must begin at 0x20000 (128 kB) and be marked RX (no W). The contents of the text segment must follow Text Segment Rules.
- There can be at most one PT_LOAD data segment marked R.
- There can be at most one PT_LOAD data segment marked RW.
- There can be at most one PT_GNU_STACK segment. It must be marked RW.
- All segments must end before limit address (4 GiB).
To ensure fault isolation at runtime, the system must maintain a number of runtime invariants across the lifetime of the running program. Both the Validator and the Service Runtime are responsible for maintaining the invariants. See the paper for the rationale for the invariants:
RIPalways points to valid instruction boundary (the validator must ensure this with direct jumps and direct calls).
RZP) is never modified by code (the validator must ensure this). Low 32 bits of
RZPare all zero (loader must ensure this).
RSPare always in the safe zone: between
RBPare allowed to be in the range of
- 84GiB are allocated for NaCl module (i.e. untrusted region):
R15+4GIB..R15+44GiBare buffer zones with PROT_NONE flags.
- The 4GB safe zone has pages with either PROT_WRITE or PROT_EXEC but must not have PROT_WRITE+PROT_EXEC pages.
- All executable code in PROT_EXEC pages is validatable and guaranteed to obey the invariant.
- Trampoline/springboard code is mapped to a non-writable region in the untrusted 84GB region; each trampoline/springboard is 32-byte aligned and fits within a single bundle.
- The OS must not put any internal structures/code into the untrusted region at any time (not using OS dynamic linker, etc)
Text Segment Rules
- The validation process must ensure that the text segment complies with the following rules. The validation process must complete successfully strictly before executing any instruction of the untrusted code.
- The following instructions are illegal and must be rejected by the validator (the list is not exhaustive as the validator uses a whiteist, not a blacklist; this means there is a large but finite list of instructions the validator allows, not a small list of instructions the validator rejects):
- any privileged instructions
movto/from segment registers
popa(not dangerous but not needed for GCC)
- There must be space for at least 32 bytes after the text segment and before the next segment in ELF (towards higher addresses) that ends strictly at a 64K boundary (a minimum page size for untrusted code). This space will be padded with HLT instructions as part of the validation process, along with the optional 64K page.
- Neither instructions nor pseudo-instructions are permitted to span a 32-byte boundary.
- The ELF entry address must be 32-byte aligned.
- must point to a valid instruction boundary
- must not point into a pseudo-instruction
- must not point between a restricted register (see below for definition) producer instruction and its corresponding restricted register consumer instruction.
CALLinstructions must be 5 bytes before a 32-byte boundary, so that the return address will be 32-byte aligned.
- Indirect call targets must be 32-byte aligned. Instead of indirect
naclcall(see below for definitions of these pseudo-instructions)
- All instructions that read or write from/to memory must use one of the four registers
RSPas a base, restricted (see below) register index (multiplied by 0, 1, 2, 4 or 8) and constant displacement (optional).
Exception to this rule: string instructions are allowed if used in following sequences (the sequences should not cross bundle boundaries; segment overrides are disallowed):
mov %edi, %edi lea (%rZP,%rdi),%rdi [rep] stos ; other string instructions can be used here
Note: this is identical to the pseudo-instruction:
[rep] stos %?ax, %nacl:(%rdi),%rZP
- An operand of a command is said to be a restricted register iff it is a register that is the target of a 32-bit move in the immediately-preceding command in the same bundle (consider the previous command as additional sandboxing prefix):
; any 32-bit register can be used here; the first operand is ; unrestricted but often is the same register mov ..., %eXX
- Instructions capable of changing
%RSPare forbidden, except the instruction sequences in the whitelist below, which must not cross bundle boundaries:
mov %rbp, %rsp mov %rsp, %rbp mov ..., %ebp ; restoration of %RBP from memory, register or stack - keeps the ; invariant intact add %rZP, %rbp mov ..., %esp ; restoration of %RSP from memory, register or stack - keeps the ; invariant intact add %rZP, %rsp lea xxx(%rbp), %esp add %rZP, %rsp ; restoration of %RSP from %RBP with adjust sub ..., %esp add %rZP, %rsp ; stack space allocation add ..., %esp add %rZP, %rsp ; stack space deallocation and $XX, %rsp ; alignment; XX must be between -128 and -1 pushq ... popq ... ; except pop %RSP, pop %RBP
List of Pseudo-instructions
Pseudo-instructions were introduced to let the compiler maintain the invariants without needing to know the code alignment rules. The assembler guarantees 32-bit alignment for all pseudo-instructions in the table below. In addition, to the pseudo-instructions, one pseudo-operand prefix is introduced:
%nacl. Presence of the
%nacl operand prefix ensures that:
- The instruction
"%mov %eXX, %eXX"is added immediately before the actual command using prefix
%eXXis a 32-bit part of the index register of the actual command, for example: in operand
%nacl:(,%r11), the notation
%eXXis referring to
- The resulting sequence of two instructions does not cross the bundle boundary.
For example, the instruction:
is translated by the assembler to:
mov %edi,%edi mov %eax,(%r15,%rdi,2)
The complete list of introduced pseudo-instructions is as follows:
|Pseudo-instruction||Is translated to|
|[rep] cmps %nacl:(%rsi),%nacl:(%rdi),%rZP|
[rep] cmps (%rsi),(%rdi)
|[rep] movs %nacl:(%rsi),%nacl:(%rdi),%rZP|
[rep] movs (%rsi),(%rdi)
(sandboxed stack increment)
(sandboxed indirect call)
|and $-32, %eXX|
add %rZP, %rXX
Note: the assembler ensures all calls (including naclcall) will end at the bundle boundary.
(sandboxed indirect jump)
(sandboxed %ebp/rbp restore)
|naclrestsp ...,%rZP (sandboxed %esp/rsp restore)||mov ...,%esp|
|naclrestsp_noflags ...,%rZP (sandboxed %esp/rsp restore)||mov ...,%esp|
(sandboxed %esp/rsp restore from %rbp; incudes $N offset)
(sandboxed stack decrement)
|[rep] scas %nacl:(%rdi),%?ax,%rZP|
[rep] scas (%rdi),%?ax
|[rep] stos %?ax,%nacl:(%rdi),%rZP|
[rep] stos %?ax,(%rdi)