Control-flow Integrity (CFI)

Control-flow Integrity (CFI) capabilities help defend against Return-Oriented Programming (ROP) and Call/Jump-Oriented Programming (COP/JOP) style control-flow subversion attacks. These attack methodologies use code sequences in authorized modules, with at least one instruction in the sequence being a control transfer instruction that depends on attacker-controlled data either in the return stack or in memory used to obtain the target address for a call or jump. Attackers stitch these sequences together by diverting the control flow instructions (e.g., JALR, C.JR, C.JALR), from their original target address to a new target via modification in the return stack or in the memory used to obtain the jump/call target address.

RV32/RV64 provides two types of control transfer instructions - unconditional jumps and conditional branches. Conditional branches encode an offset in the immediate field of the instruction and are thus direct branches that are not susceptible to control-flow subversion. Unconditional direct jumps using JAL transfer control to a target that is in a +/- 1 MiB range from the current pc. Unconditional indirect jumps using the JALR obtain their branch target by adding the sign extended 12-bit immediate encoded in the instruction to the rs1 register.

The RV32I/RV64I does not have a dedicated instruction for calling a procedure or returning from a procedure. A JAL or JALR may be used to perform a procedure call and JALR to return from a procedure. The RISC-V ABI however defines the convention that a JAL/JALR where rd (i.e. the link register) is x1 or x5 is a procedure call, and a JALR where rs1 is the conventional link register (i.e. x1 or x5) is a return from procedure. The architecture allows for using these hints and conventions to support return address prediction (See Return-address stack prediction hints).

The RVC standard extension for compressed instructions provides unconditional jump and conditional branch instructions. The C.J and C.JAL instructions encode an offset in the immediate field of the instruction and thus are not susceptible to control-flow subversion. The C.JR and C.JALR RVC instructions perform an unconditional control transfer to the address in register rs1. The C.JALR additionally writes the address of the instruction following the jump (pc+2) to the link register x1 and is a procedure call. The C.JR is a return from procedure if rs1 is a conventional link register (i.e. x1 or x5); else it is an indirect jump.

The term call is used to refer to a JAL or JALR instruction with a link register as destination, i.e., rd≠x0. Conventionally, the link register is x1 or x5. A call using JAL or C.JAL is termed a direct call. A C.JALR expands to JALR x1, 0(rs1) and is a call. A call using JALR or C.JALR is termed an indirect-call.

The term return is used to refer to a JALR instruction with rd=x0 and with rs1=x1 or rs1=x5. A C.JR instruction expands to JALR x0, 0(rs1) and is a return if rs1=x1 or rs1=x5.

The term indirect-jump is used to refer to a JALR instruction with rd=x0 and where the rs1 is not x1 or x5 (i.e., not a return). A C.JR instruction where rs1 is not x1 or x5 (i.e., not a return) is an indirect-jump.

The Zicfiss and Zicfilp extensions build on these conventions and hints and provide backward-edge and forward-edge control flow integrity respectively.

The Unprivileged ISA for Zicfilp extension is specified in Landing Pad (Zicfilp) and for the Unprivileged ISA for Zicfiss extension is specified in Shadow Stack (Zicfiss). The Privileged ISA for these extensions is specified in the Privileged ISA specification.

Landing Pad (Zicfilp)

To enforce forward-edge control-flow integrity, the Zicfilp extension introduces a landing pad (LPAD) instruction. The LPAD instruction must be placed at the program locations that are valid targets of indirect jumps or calls. The LPAD instruction (See Landing Pad Instruction) is encoded using the AUIPC major opcode with rd=x0.

Compilers emit a landing pad instruction as the first instruction of an address-taken function, as well as at any indirect jump targets. A landing pad instruction is not required in functions that are only reached using a direct call or direct jump.

The landing pad is designed to provide integrity to control transfers performed using indirect calls and jumps, and this is referred to as forward-edge protection. When the Zicfilp is active, the hart tracks an expected landing pad (ELP) state that is updated by an indirect_call or indirect_jump to require a landing pad instruction at the target of the branch. If the instruction at the target is not a landing pad, then a software-check exception is raised.

A landing pad may be optionally associated with a 20-bit label. With labeling enabled, the number of landing pads that can be reached from an indirect call or jump sites can be defined using programming language-based policies. Labeling of the landing pads enables software to achieve greater precision in pairing up indirect call/jump sites with valid targets. When labeling of landing pads is used, indirect call or indirect jump site can specify the expected label of the landing pad and thereby constrain the set of landing pads that may be reached from each indirect call or indirect jump site in the program.

In the simplest form, a program can be built with a single label value to implement a coarse-grained version of forward-edge control-flow integrity. By constraining gadgets to be preceded by a landing pad instruction that marks the start of indirect callable functions, the program can significantly reduce the available gadget space. A second form of label generation may generate a signature, such as a MAC, using the prototype of the function. Programs that use this approach would further constrain the gadgets accessible from a call site to only indirectly callable functions that match the prototype of the called functions. Another approach to label generation involves analyzing the control-flow-graph (CFG) of the program, which can lead to even more stringent constraints on the set of reachable gadgets. Such programs may further use multiple labels per function, which means that if a function is called from two or more call sites, the functions can be labeled as being reachable from each of the call sites. For instance, consider two call sites A and B, where A calls the functions X and Y, and B calls the functions Y and Z. In a single label scheme, functions X, Y, and Z would need to be assigned the same label so that both call sites A and B can invoke the common function Y. This scheme would allow call site A to also call function Z and call site B to also call function X. However, if function Y was assigned two labels - one corresponding to call site A and the other to call site B, then Y can be invoked by both call sites, but X can only be invoked by call site A and Z can only be invoked by call site B. To support multiple labels, the compiler could generate a call-site-specific entry point for shared functions, with each entry point having its own landing pad instruction followed by a direct branch to the start of the function. This would allow the function to be labeled with multiple labels, each corresponding to a specific call site. A portion of the label space may be dedicated to labeled landing pads that are only valid targets of an indirect jump (and not an indirect call).

The LPAD instruction uses the code points defined as HINTs for the AUIPC opcode. When Zicfilp is not active at a privilege level or when the extension is not implemented, the landing pad instruction executes as a no-op. A program that is built with LPAD instructions can thus continue to operate correctly, but without forward-edge control-flow integrity, on processors that do not support the Zicfilp extension or if the Zicfilp extension is not active.

Compilers and linkers should provide an attribute flag to indicate if the program has been compiled with the Zicfilp extension and use that to determine if the Zicfilp extension should be activated. The dynamic loader should activate the use of Zicfilp extension for an application only if all executables (the application and the dependent dynamically linked libraries) used by that application use the Zicfilp extension.

When Zicfilp extension is not active or not implemented, the hart does not require landing pad instructions at the targets of indirect calls/jumps, and the landing instructions revert to being no-ops. This allows a program compiled with landing pad instructions to operate correctly but without forward-edge control-flow integrity.

The Zicfilp extensions may be activated for use individually and independently for each privilege mode.

The Zicfilp extension depends on the Zicsr extension.

Landing Pad Enforcement

To enforce that the target of an indirect call or indirect jump must be a valid landing pad instruction, the hart maintains an expected landing pad (ELP) state to determine if a landing pad instruction is required at the target of an indirect call or an indirect jump. The ELP state can be one of:

0 - NO_LP_EXPECTED
1 - LP_EXPECTED

The ELP state is initialized to NO_LP_EXPECTED by the hart upon reset.

The Zicfilp extension, when enabled, determines if an indirect call or an indirect jump must land on a landing pad, as specified in Landing pad expected determination. If is_lp_expected is 1, then the hart updates the ELP to LP_EXPECTED.

Landing pad expected determination

  is_lp_expected = ( (JALR || C.JR || C.JALR) &&
                     (rs1 != x1) && (rs1 != x5) && (rs1 != x7) ) ? 1 : 0;

An indirect branch using JALR, C.JALR, or C.JR with rs1 as x7 is termed a software guarded branch. Such branches do not need to land on a LPAD instruction and thus do not set ELP to LP_EXPECTED.

When the register source is a link register and the register destination is x0, then it’s a return from a procedure and does not require a landing pad at the target.

When the register source and register destination are both link registers, then it is a semantically-direct-call. For example, the call offset pseudoinstruction may expand to a two instruction sequence composed of a lui ra, imm20 or a auipc ra, imm20 instruction followed by a jalr ra, imm12(ra) instruction where ra is the link register (either x1 or x5). Since the address of the procedure was not explicitly taken and the computed address is not obtained from mutable memory, such semantically-direct calls do not require a landing pad to be placed at the target. Compilers and JITers must use the semantically-direct calls only if the rs1 was computed as a PC-relative or an absolute offset to the symbol.

The tail offset pseudoinstruction used to tail call a far-away procedure may also be expanded to a two instruction sequence composed of a lui x7, imm20 or auipc x7, imm20 followed by a jalr x0, x7. Since the address of the procedure was not explicitly taken and the computed address is not obtained from mutable memory, such semantically-direct tail-calls do not require a landing pad to be placed at the target.

Software guarded branches may also be used by compilers to generate code for constructs like switch-cases. When using the software guarded branches, the compiler is required to ensure it has full control on the possible jump targets (e.g., by obtaining the targets from a read-only table in memory and performing bounds checking on the index into the table, etc.).

The landing pad may be labeled. Zicfilp extension designates the register x7 for use as the landing pad label register. To support labeled landing pads, the indirect call/jump sites establish an expected landing pad label (e.g., using the LUI instruction) in the bits 31:12 of the x7 register. The LPAD instruction is encoded with a 20-bit immediate value called the landing-pad-label (LPL) that is matched to the expected landing pad label. When LPL is encoded as zero, the LPAD instruction does not perform the label check and in programs built with this single label mode of operation the indirect call/jump sites do not need to establish an expected landing pad label value in x7.

When ELP is set to LP_EXPECTED, if the next instruction in the instruction stream is not 4-byte aligned, or is not LPAD, or if the landing pad label encoded in LPAD is not zero and does not match the expected landing pad label in bits 31:12 of the x7 register, then a software-check exception (cause=18) with xtval set to "landing pad fault (code=2)" is raised else the ELP is updated to NO_LP_EXPECTED.

The tracking of ELP and the requirement for a landing pad instruction at the target of indirect call and jump enables a processor implementation to significantly reduce or to prevent speculation to non-landing-pad instructions. Constraining speculation using this technique, greatly reduces the gadget space and increases the difficulty of using techniques such as branch-target-injection, also known as Spectre variant 2, which use speculative execution to leak data through side channels.

The LPAD requires a 4-byte alignment to address the concatenation of two instructions A and B accidentally forming an unintended landing pad in the program. For example, consider a 32-bit instruction where the bytes 3 and 2 have a pattern of ?017h (for example, the immediate fields of a LUI, AUIPC, or a JAL instruction), followed by a 16-bit or a 32-bit instruction. When patterns that can accidentally form a valid landing pad are detected, the assembler or linker can force instruction A to be aligned to a 4-byte boundary to force the unintended LPAD pattern to become misaligned, and thus not a valid landing pad, or may use an alternate register allocation to prevent the accidental landing pad.

Landing Pad Instruction

When Zicfilp is enabled, LPAD is the only instruction allowed to execute when the ELP state is LP_EXPECTED. If Zicfilp is not enabled then the instruction is a no-op. If Zicfilp is enabled, the LPAD instruction causes a software-check exception with xtval set to "landing pad fault (code=2)" if any of the following conditions are true:

The pc is not 4-byte aligned and ELP is LP_EXPECTED.
The ELP is LP_EXPECTED and the LPL is not zero and the LPL does not match the expected landing pad label in bits 31:12 of the x7 register.

If a software-check exception is not caused then the ELP is updated to NO_LP_EXPECTED.

The operation of the LPAD instruction is as follows:

LPAD operation

if (xLPE == 1 && ELP == LP_EXPECTED)
    // If PC not 4-byte aligned then software-check exception
    if pc[1:0] != 0
        raise software-check exception
    // If landing pad label not matched -> software-check exception
    else if (inst.LPL != x7[31:12] && inst.LPL != 0)
        raise software-check exception
    else
        ELP = NO_LP_EXPECTED
else
    no-op
endif

Shadow Stack (Zicfiss)

The Zicfiss extension introduces a shadow stack to enforce backward-edge control-flow integrity. A shadow stack is a second stack used to store a shadow copy of the return address in the link register if it needs to be spilled.

The shadow stack is designed to provide integrity to control transfers performed using a return, where the return may be from a procedure invoked using an indirect call or a direct call, and this is referred to as backward-edge protection.

A program using backward-edge control-flow integrity has two stacks: a regular stack and a shadow stack. The shadow stack is used to spill the link register, if required, by non-leaf functions. An additional register, shadow-stack-pointer (ssp), is introduced in the architecture to hold the address of the top of the active shadow stack.

The shadow stack, similar to the regular stack, grows downwards, from higher addresses to lower addresses. Each entry on the shadow stack is XLEN wide and holds the link register value. The ssp points to the top of the shadow stack, which is the address of the last element stored on the shadow stack.

The shadow stack is architecturally protected from inadvertent corruptions and modifications, as detailed in the Privileged specification.

The Zicfiss extension provides instructions to store and load the link register to/from the shadow stack and to check the integrity of the return address. The extension provides instructions to support common stack maintenance operations such as stack unwinding and stack switching.

When Zicfiss is enabled, each function that needs to spill the link register, typically non-leaf functions, store the link register value to the regular stack and a shadow copy of the link register value to the shadow stack when the function is entered (the prologue). When such a function returns (the epilogue), the function loads the link register from the regular stack and the shadow copy of the link register from the shadow stack. Then, the link register value from the regular stack and the shadow link register value from the shadow stack are compared. A mismatch of the two values is indicative of a subversion of the return address control variable and causes a software-check exception.

The Zicfiss instructions, except SSAMOSWAP.W/D, are encoded using a subset of May-Be-Operation instructions defined by the Zimop and Zcmop extensions. This subset of instructions revert to their Zimop/Zcmop defined behavior when the Zicfiss extension is not implemented or if the extension has not been activated. A program that is built with Zicfiss instructions can thus continue to operate correctly, but without backward-edge control-flow integrity, on processors that do not support the Zicfiss extension or if the Zicfiss extension is not active. The Zicfiss extension may be activated for use individually and independently for each privilege mode.

Compilers should flag each object file (for example, using flags in the ELF attributes) to indicate if the object file has been compiled with the Zicfiss instructions. The linker should flag (for example, using flags in the ELF attributes) the binary/executable generated by linking objects as being compiled with the Zicfiss instructions only if all the object files that are linked have the same Zicfiss attributes.

The dynamic loader should activate the use of Zicfiss extension for an application only if all executables (the application and the dependent dynamically-linked libraries) used by that application use the Zicfiss extension.

An application that has the Zicfiss extension active may request the dynamic loader at runtime to load a new dynamic shared object (using dlopen() for example). If the requested object does not have the Zicfiss attribute then the dynamic loader, based on its policy (e.g., established by the operating system or the administrator) configuration, could either deny the request or deactivate the Zicfiss extension for the application. It is strongly recommended that the policy enforces a strict security posture and denies the request.

The Zicfiss extension depends on the Zicsr and Zimop extensions. Furthermore, if the Zcmop extension is implemented, the Zicfiss extension also provides the C.SSPUSH and C.SSPOPCHK instructions. Moreover, use of Zicfiss in U-mode requires S-mode to be implemented. Use of Zicfiss in M-mode is not supported.

Zicfiss Instructions Summary

The Zicfiss extension introduces the following instructions:

Push to the shadow stack (See Push to the Shadow Stack)
- SSPUSH x1 and SSPUSH x5 - encoded using MOP.RR.7
- C.SSPUSH x1 - encoded using C.MOP.1
Pop from the shadow stack (See Pop from the Shadow Stack)
- SSPOPCHK x1 and SSPOPCHK x5 - encoded using MOP.R.28
- C.SSPOPCHK x5 - encoded using C.MOP.5
Read the value of ssp into a register (See Read ssp into a Register)
- SSRDP - encoded using MOP.R.28
Perform an atomic swap from a shadow stack location (See Atomic Swap from a Shadow Stack Location)
- SSAMOSWAP.W and SSAMOSWAP.D

Zicfiss does not use all encodings of MOP.RR.7 or MOP.R.28. When a MOP.RR.7 or MOP.R.28 encoding is not used by the Zicfiss extension, the corresponding instruction adheres to its Zimop-defined behavior, unless redefined by another extension.

Shadow Stack Pointer (`ssp`)

The ssp CSR is an unprivileged read-write (URW) CSR that reads and writes XLEN low order bits of the shadow stack pointer (ssp). The CSR address is 0x011. There is no high CSR defined as the ssp is always as wide as the XLEN of the current privilege mode. The bits 1:0 of ssp are read-only zero. If the UXLEN or SXLEN may never be 32, then the bit 2 is also read-only zero.

Zicfiss Instructions

Push to the Shadow Stack

A shadow stack push operation is defined as decrement of the ssp by XLEN/8 followed by a store of the value in the link register to memory at the new top of the shadow stack.

Only x1 and x5 registers are supported as rs2 for SSPUSH. Zicfiss provides a 16-bit version of the SSPUSH x1 instruction using the Zcmop defined C.MOP.1 encoding. The C.SSPUSH x1 expands to SSPUSH x1.

The SSPUSH instruction and its compressed form C.SSPUSH can be used to push a link register on the shadow stack. The SSPUSH and C.SSPUSH instructions perform a store identically to the existing store instructions, with the difference that the base is implicitly ssp and the width is implicitly XLEN.

The operation of the SSPUSH and C.SSPUSH instructions is as follows:

SSPUSH and C.SSPUSH operation

if (xSSE == 1)
    mem[ssp - (XLEN/8)] = X(src)  # Store src value to ssp - XLEN/8
    ssp = ssp - (XLEN/8)          # decrement ssp by XLEN/8
endif

The ssp is decremented by SSPUSH and C.SSPUSH only if the store to the shadow stack completes successfully.

Pop from the Shadow Stack

A shadow stack pop operation is defined as an XLEN wide read from the current top of the shadow stack followed by an increment of the ssp by XLEN/8.

Only x1 and x5 registers are supported as rs1 for SSPOPCHK. Zicfiss provides a 16-bit version of the SSPOPCHK x5 using the Zcmop defined C.MOP.5 encoding. The C.SSPOPCHK x5 expands to SSPOPCHK x5.

Programs with a shadow stack push the return address onto the regular stack as well as the shadow stack in the prologue of non-leaf functions. When returning from these non-leaf functions, such programs pop the link register from the regular stack and pop a shadow copy of the link register from the shadow stack. The two values are then compared. If the values do not match, it is indicative of a corruption of the return address variable on the regular stack.

The SSPOPCHK instruction, and its compressed form C.SSPOPCHK, can be used to pop the shadow return address value from the shadow stack and check that the value matches the contents of the link register, and if not cause a software-check exception with xtval set to "shadow stack fault (code=3)".

While any register may be used as link register, conventionally the x1 or x5 registers are used. The shadow stack instructions are designed to be most efficient when the x1 and x5 registers are used as the link register.

Return-address prediction stacks are a common feature of high-performance instruction-fetch units, but they require accurate detection of instructions used for procedure calls and returns to be effective. For RISC-V, hints as to the instructions' usage are encoded implicitly via the register numbers used. The return-address stack (RAS) actions to pop and/or push onto the RAS are specified in Return-address stack prediction hints.

Using x1 or x5 as the link register allows a program to benefit from the return-address prediction stacks. Additionally, since the shadow stack instructions are designed around the use of x1 or x5 as the link register, using any other register as a link register would incur the cost of additional register movements.

Compilers, when generating code with backward-edge CFI, must protect the link register, e.g., x1 and/or x5, from arbitrary modification by not emitting unsafe code sequences.

Storing the return address on both stacks preserves the call stack layout and the ABI, while also allowing for the detection of corruption of the return address on the regular stack. The prologue and epilogue of a non-leaf function that uses shadow stacks is as follows:

    function_entry:
        addi sp,sp,-8  # push link register x1
        sd x1,(sp)     # on regular stack
        sspush x1      # push link register x1 on shadow stack
         :
        ld x1,(sp)     # pop link register x1 from regular stack
        addi sp,sp,8
        sspopchk x1    # fault if x1 not equal to shadow
                       # return address
        ret

This example illustrates the use of x1 register as the link register. Alternatively, the x5 register may also be used as the link register.

A leaf function, a function that does not itself make function calls, does not need to spill the link register. Consequently, the return value may be held in the link register itself for the duration of the leaf function’s execution.

The C.SSPOPCHK, and SSPOPCHK instructions perform a load identically to the existing load instructions, with the difference that the base is implicitly ssp and the width is implicitly XLEN.

The operation of the SSPOPCHK and C.SSPOPCHK instructions is as follows:

SSPOPCHK and C.SSPOPCHK operation

if (xSSE == 1)
    temp = mem[ssp]            # Load temp from address in ssp and
    if temp != X(src)          # Compare temp to value in src and
                               # cause an software-check exception
                               # if they are not bitwise equal.
                               # Only x1 and x5 may be used as src
       raise software-check exception
    else
       ssp = ssp + (XLEN/8)    # increment ssp by XLEN/8.
    endif
endif

If the value loaded from the address in ssp does not match the value in rs1, a software-check exception (cause=18) is raised with xtval set to "shadow stack fault (code=3)". The software-check exception caused by SSPOPCHK/ C.SSPOPCHK is lower in priority than a load/store/AMO access-fault exception.

The ssp is incremented by SSPOPCHK and C.SSPOPCHK only if the load from the shadow stack completes successfully and no software-check exception is raised.

The use of the compressed instruction C.SSPUSH x1 to push on the shadow stack is most efficient when the ABI uses x1 as the link register, as the link register may then be pushed without needing a register-to-register move in the function prologue. To use the compressed instruction C.SSPOPCHK x5, the function should pop the return address from regular stack into the alternate link register x5 and use the C.SSPOPCHK x5 to compare the return address to the shadow copy stored on the shadow stack. The function then uses C.JR x5 to jump to the return address.

    function_entry:
        c.addi sp,sp,-8  # push link register x1
        c.sd x1,(sp)     # on regular stack
        c.sspush x1      # push link register x1 on shadow stack
         :
        c.ld x5,(sp)     # pop link register x5 from regular stack
        c.addi sp,sp,8
        c.sspopchk x5    # fault if x5 not equal to shadow return address
        c.jr x5

Store-to-load forwarding is a common technique employed by high-performance processor implementations. Zicfiss implementations may prevent forwarding from a non-shadow-stack store to the SSPOPCHK or the C.SSPOPCHK instructions. A non-shadow-stack store causes a fault if done to a page mapped as a shadow stack. However, such determination may be delayed till the PTE has been examined and thus may be used to transiently forward the data from such stores to SSPOPCHK or to C.SSPOPCHK.

Read `ssp` into a Register

The SSRDP instruction is provided to move the contents of ssp to a destination register.

Encoding rd as x0 is not supported for SSRDP.

The operation of the SSRDP instructions is as follows:

SSRDP operation

if (xSSE == 1)
    X(dst) = ssp
else
    X(dst) = 0
endif

The property of Zimop writing 0 to the rd when the extension using Zimop is not implemented or not active may be used by to determine if Zicfiss extension is active. For example, functions that unwind shadow stacks may skip over the unwind actions by dynamically detecting if the Zicfiss extension is active.

An example sequence such as the following may be used:

    ssrdp t0                      # mv ssp to t0
    beqz t0, zicfiss_not_active   # zero is not a valid shadow stack
                                  # pointer by convention
    # Zicfiss is active
    :
    :
zicfiss_not_active:

To assist with the use of such code sequences, operating systems and runtimes must not locate shadow stacks at address 0.

A common operation performed on stacks is to unwind them to support constructs like setjmp/longjmp, C++ exception handling, etc. A program that uses shadow stacks must unwind the shadow stack in addition to the stack used to store data. The unwind function must verify that it does not accidentally unwind past the bounds of the shadow stack. Shadow stacks are expected to be bounded on each end using guard pages. A guard page for a stack is a page that is not accessible by the process that owns the stack. To detect if the unwind occurs past the bounds of the shadow stack, the unwind may be done in maximal increments of 4 KiB, testing whether the ssp is still pointing to a shadow stack page or has unwound into the guard page. The following examples illustrate the use of shadow stack instructions to unwind a shadow stack. This example assumes that the setjmp function itself does not push on to the shadow stack (being a leaf function, it is not required to).

setjmp() {
    :
    :
    // read and save the shadow stack pointer to jmp_buf
    asm("ssrdp %0" : "=r"(cur_ssp):);
    jmp_buf->saved_ssp = cur_ssp;
    :
    :
}
longjmp() {
    :
    // Read current shadow stack pointer and
    // compute number of call frames to unwind
    asm("ssrdp %0" : "=r"(cur_ssp):);
    // Skip the unwind if backward-edge CFI not active
    asm("beqz %0, back_cfi_not_active" : "=r"(cur_ssp):);
    // Unwind the frames in a loop
    while ( jmp_buf->saved_ssp > cur_ssp ) {
        // advance by a maximum of 4K at a time to avoid
        // unwinding past bounds of the shadow stack
        cur_ssp = ( (jmp_buf->saved_ssp - cur_ssp) >= 4096 ) ?
                  (cur_ssp + 4096) : jmp_buf->saved_ssp;
        asm("csrw ssp, %0" : :  "r" (cur_ssp));
        // Test if unwound past the shadow stack bounds
        asm("sspush x5");
        asm("sspopchk x5");
    }
back_cfi_not_active:
    :
}

Atomic Swap from a Shadow Stack Location

For RV32, SSAMOSWAP.W atomically loads a 32-bit data value from address of a shadow stack location in rs1, puts the loaded value into register rd, and stores the 32-bit value held in rs2 to the original address in rs1. SSAMOSWAP.D (RV64 only) is similar to SSAMOSWAP.W but operates on 64-bit data values.

SSAMOSWAP.W for RV32 and SSAMOSWAP.D (RV64 only) operation

  if privilege_mode != M && menvcfg.SSE == 0
      raise illegal-instruction exception
  else if S-mode not implemented
      raise illegal-instruction exception
  else if privilege_mode == U && senvcfg.SSE == 0
      raise illegal-instruction exception
  else if privilege_mode == VS && henvcfg.SSE == 0
      raise virtual-instruction  exception
  else if privilege_mode == VU && senvcfg.SSE == 0
      raise virtual-instruction  exception
  else
      X(rd) = mem[X(rs1)]
      mem[X(rs1)] = X(rs2)
  endif

For RV64, SSAMOSWAP.W atomically loads a 32-bit data value from address of a shadow stack location in rs1, sign-extends the loaded value and puts it in rd, and stores the lower 32 bits of the value held in rs2 to the original address in rs1.

SSAMOSWAP.W for RV64

  if privilege_mode != M && menvcfg.SSE == 0
      raise illegal-instruction exception
  else if S-mode not implemented
      raise illegal-instruction exception
  else if privilege_mode == U && senvcfg.SSE == 0
      raise illegal-instruction exception
  else if privilege_mode == VS && henvcfg.SSE == 0
      raise virtual-instruction  exception
  else if privilege_mode == VU && senvcfg.SSE == 0
      raise virtual-instruction  exception
  else
      temp[31:0] = mem[X(rs1)]
      X(rd) = SignExtend(temp[31:0])
      mem[X(rs1)] = X(rs2)[31:0]
  endif

Just as for AMOs in the A extension, SSAMOSWAP.W/D requires that the address held in rs1 be naturally aligned to the size of the operand (i.e., eight-byte aligned for doublewords, and four-byte aligned for words). The same exception options apply if the address is not naturally aligned.

Just as for AMOs in the A extension, SSAMOSWAP.W/D optionally provides release consistency semantics, using the aq and rl bits, to help implement multiprocessor synchronization. An SSAMOSWAP.W/D operation has acquire semantics if aq=1 and release semantics if rl=1.

Stack switching is a common operation in user programs as well as supervisor programs. When a stack switch is performed the stack pointer of the currently active stack is saved into a context data structure and the new stack is made active by loading a new stack pointer from a context data structure.

When shadow stacks are active for a program, the program needs to additionally switch the shadow stack pointer. If the pointer to the top of the deactivated shadow stack is held in a context data structure, then it may be susceptible to memory corruption vulnerabilities. To protect the pointer value, the program may store it at the top of the deactivated shadow stack itself and thereby create a checkpoint. A legal checkpoint is defined as one that holds a value of X, where X is the address at which the checkpoint is positioned on the shadow stack.

An example sequence to restore the shadow stack pointer from the new shadow stack and save the old shadow stack pointer on the old shadow stack is as follows:

# a0 hold pointer to top of new shadow stack to switch to
stack_switch:
   ssrdp ra
   beqz ra, 2f                    # skip if Zicfiss not active
   ssamoswap.d ra, x0,  (a0)      # ra=*[a0] and *[a0]=0
   beq         ra, a0,  1f        # [a0] must be == [ra]
   unimp                          # else crash
1: addi        ra, ra,  XLEN/8    # pop the checkpoint
   csrrw       ra, ssp, ra        # swap ssp: ra=ssp, ssp=ra
   addi        ra, ra,  -(XLEN/8) # checkpoint = "old ssp - XLEN/8"
   ssamoswap.d x0, ra,  (ra)      # Save checkpoint at "old ssp - XLEN/8"
2:

This sequence uses the ra register. If the privilege mode at which this sequence is executed can be interrupted, then the trap handler should save the ra on the shadow stack itself. There it is guarded against tampering and can be restored prior to returning from the trap.

When a new shadow stack is created by the supervisor, it needs to store a checkpoint at the highest address on that stack. This enables the shadow stack pointer to be switched using the process outlined in this note. The SSAMOSWAP.W/D instruction can be used to store this checkpoint. When the old value at the memory location operated on by SSAMOSWAP.W/D is not required, rd can be set to x0.