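Control-flow Integrity
What is control-flow integrity?
A program that runs has no obvious structural problems, at least when viewed from a distance. Down at the machine-code level things can look different: code that seems fine may hide hazards such as stack overflows. And even when a program contains no malicious code of its own, an attacker who manages to corrupt a return address or a function pointer can still reuse the program's existing code to mount an attack, as ROP and JOP do.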
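ROP, JOP, and COP
Before going through these attacks, a word on "code reuse attacks". As the name suggests, the idea is not to inject new code into memory but to stitch together instruction fragments that already exist in the program until they form the control flow the attacker wants. In other words, the program's own code is turned against it.
ROP, Return-Oriented Programming. Anyone who likes tinkering with Casio calculators will recognize it: the tricks that make the calculator display arbitrary characters, or even run games, are ROP. It chains many small fragments that end in ret (usually called gadgets). Because ret takes the next return address from the stack, an attacker who can modify return addresses can steer the program through a sequence of fragments that the programmer never intended to connect.
JOP, Jump-Oriented Programming. It has the same goal as ROP, chaining gadgets, but it does not depend on ret. Instead it relies mainly on indirect jumps (the jmp reg family). The key point of JOP is that even if ret is monitored or restricted, an attacker can still chain gadgets with jumps.
COP, Call-Oriented Programming. It is very similar to JOP; the difference is that it leans on call-style control transfers to connect gadgets.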
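To make the "chain of gadgets" idea concrete, here is a deliberately harmless toy model in C. It is not an exploit and touches no real stack: the array `fake_stack`, the three `gadget_*` functions, and `run_chain` are all names I made up for illustration. The only point is that the order of the addresses in the attacker-controlled list, not the program's source code, decides what gets computed.

```c
#include <stddef.h>

/* A "gadget" here is just a tiny function that does one thing and returns. */
typedef void (*gadget_fn)(int *acc);

static void gadget_add3(int *acc)   { *acc += 3; }
static void gadget_double(int *acc) { *acc *= 2; }
static void gadget_negate(int *acc) { *acc = -*acc; }

/* Toy dispatcher: consumes one "return address" per step, loosely
 * mimicking how ret pops the next target off the real stack. */
static int run_chain(const gadget_fn *fake_stack, size_t depth, int acc) {
    for (size_t i = 0; i < depth; i++) {
        fake_stack[i](&acc);
    }
    return acc;
}
```

Feeding `{gadget_add3, gadget_double, gadget_negate}` with a start value of 1 yields -8, while `{gadget_double, gadget_add3}` yields 5: same gadgets, different chain, different program.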
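Forward and backward edges
A forward edge goes from the current code to the next piece of code to execute: call foo(), a call through a function pointer, a virtual call, an indirect jump, and so on. A backward edge is the return from callee to caller once a function finishes, that is, the ret path.
An example in C: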
    #include <stdio.h>

    void target(void) {
        printf("3 target: now executing inside target()\n");
        printf("4 target: about to return to whoever called me\n");
    }

    void run(void (*fp)(void)) {
        printf("2 run: forward -> calling fp() through a function pointer\n");
        fp();
        printf("5 run: backward <- fp() has returned to run()\n");
    }

    int main(void) {
        printf("1 main: forward -> calling run()\n");
        run(target);
        printf("6 main: backward <- run() has returned to main()\n");
        return 0;
    }
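The program executes in this order:
1. main: forward -> calling run()
2. run: forward -> calling fp() through a function pointer
3. target: now executing inside target()
4. target: about to return to whoever called me
5. run: backward <- fp() has returned to run()
6. main: backward <- run() has returned to main()
Mapped onto Zicfilp and Zicfiss: the former protects forward edges and the latter protects backward edges.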
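Security-oriented extensions: Zicfilp/Zicfiss
Zicfilp (Landing Pad) and Zicfiss (Shadow Stack) go back to 2022. The basic concepts of control-flow protection were settled in internal discussions of the Shadow Stacks and Landing Pads task group. In June of that year the riscv/riscv-cfi repository was created, and Zicfiss and Zicfilp were pushed forward as a matched pair from the start: one guards return addresses, the other guards indirect control-transfer targets. It was only a rough prototype at the time, but work on control-flow protection had clearly begun early.
In 2023 the LLVM toolchain announced support for Zicfilp, although only for a draft version, and a Zicfiss patch was publicly submitted for review in the middle of that year. By then both extensions had taken shape but were still experimental. In the second half of the year the riscv-cfi repository moved quickly, publishing six releases in six months while the instructions and their encodings were still being worked out.
In March 2024 Zicfilp and Zicfiss entered a 30-day public review. The ratified version followed in July, and with it control-flow integrity became a chapter of the Privileged specification.
A specification alone is not enough, though, and community adoption has been slow. As of this writing, only one CVA6-based fork implements the Zicfilp and Zicfiss extensions.
The core idea of Zicfilp is to use a landing pad (LPAD) to constrain the legal targets of indirect jumps and calls, which mainly defends against JOP/COP. Zicfiss uses a shadow stack to protect function return addresses, which mainly defends against ROP. Quite the double act, these two.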
Zicfilp
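Zicfilp introduces a single new instruction, lpad, the "landing pad". With -fcf-protection=branch enabled, the compiler places it wherever an indirect call or indirect jump is allowed to land, such as the entry of an address-taken function or other legitimate indirect-jump targets. Once Zicfilp is active, the processor tracks an ELP (expected landing pad) state: when an indirect jump is deemed to require a legitimate entry point, the first instruction at the target address must be lpad, otherwise a software-check exception is raised.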
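The ELP tracking described above can be sketched as a two-state machine. The model below is a simplification written for this post, not the core's RTL: `lp_state_t`, `lp_step`, and `jalr_sets_elp` are hypothetical names, and the rule that a jalr through x1, x5, or x7 does not arm ELP is my reading of the Zicfilp specification (x1/x5 are treated as return addresses, x7 as a software-guarded branch), so check it against the spec before relying on it.

```c
#include <stdbool.h>

typedef struct {
    bool lpe; /* landing pads enabled at the current privilege level */
    bool elp; /* true = a landing pad is expected next */
} lp_state_t;

/* Assumption (my reading of the spec): an indirect jump arms ELP
 * unless its base register is x1, x5, or x7. */
static bool jalr_sets_elp(unsigned rs1) {
    return rs1 != 1 && rs1 != 5 && rs1 != 7;
}

/* Advance the model by one retired instruction.
 * Returns false where the hardware would raise a software-check exception. */
static bool lp_step(lp_state_t *s, bool is_lpad, bool is_jalr, unsigned rs1) {
    if (s->lpe && s->elp) {
        if (!is_lpad) {
            return false;   /* target did not start with lpad */
        }
        s->elp = false;     /* landing pad consumed */
    }
    if (s->lpe && is_jalr) {
        s->elp = jalr_sets_elp(rs1);
    }
    return true;
}
```

Label matching against x7 is left out here on purpose; this sketch only covers when a landing pad is demanded and when the demand is satisfied.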
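What about older CPUs that do not support it? lpad uses exactly the same opcode as auipc with rd fixed to x0, and the upper 20 immediate bits are interpreted as the lpl (landing-pad label) instead of auipc's PC-relative upper immediate. Because rd=x0 places the encoding in HINT space, implementations are free to ignore it: when Zicfilp is not implemented, or is not active at a given privilege level, lpad executes as a no-op and leaves architecturally visible state untouched.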
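The encoding trick is easy to check by hand. The helpers below are an illustrative sketch: the function names are mine, while the match/mask constants are the `MATCH_LPAD` and `MASK_LPAD` values used in defines.svh later in this post.

```c
#include <stdbool.h>
#include <stdint.h>

#define MATCH_LPAD 0x00000017u /* auipc opcode (0010111) with rd = x0 */
#define MASK_LPAD  0x00000FFFu /* covers opcode[6:0] and rd[11:7] */

/* Build an lpad instruction word from a 20-bit landing-pad label. */
static uint32_t lpad_encode(uint32_t lpl) {
    return ((lpl & 0xFFFFFu) << 12) | MATCH_LPAD;
}

/* True when the word is "auipc x0, imm", i.e. an lpad. */
static bool is_lpad(uint32_t instr) {
    return (instr & MASK_LPAD) == MATCH_LPAD;
}

/* Extract the landing-pad label from bits 31:12. */
static uint32_t lpad_label(uint32_t instr) {
    return instr >> 12;
}
```

For example, label 0x12345 encodes to 0x12345017, while 0x00000097 (auipc x1, 0) is an ordinary auipc and is not recognized as lpad.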
Zicfiss
Zicfiss brings several new instructions: sspush, sspopchk, and ssrdp. There is also ssamoswap, but since this core does not support atomic operations, it is left out.
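As a rough software model of what sspush and sspopchk do, consider the sketch below. The struct, the fixed depth `SS_WORDS`, and the function names are assumptions made for illustration; the overflow and underflow checks are artifacts of the model (real hardware would fault through memory protection instead). The behavior it tries to mirror is: sspush decrements ssp and stores the link register, and sspopchk loads the top entry, compares it with the link register, and only pops when they match.

```c
#include <stdbool.h>
#include <stdint.h>

#define SS_WORDS 16 /* hypothetical depth of the modeled shadow stack */

typedef struct {
    uint32_t mem[SS_WORDS];
    uint32_t ssp; /* index of the top entry; the stack grows downward */
} shadow_stack_t;

static void ss_init(shadow_stack_t *ss) {
    ss->ssp = SS_WORDS; /* empty */
}

/* sspush: move ssp down one slot, then store the return address. */
static bool ss_push(shadow_stack_t *ss, uint32_t ra) {
    if (ss->ssp == 0) {
        return false; /* model-only overflow guard */
    }
    ss->ssp--;
    ss->mem[ss->ssp] = ra;
    return true;
}

/* sspopchk: compare the top entry with ra; pop only on a match.
 * A false return stands in for the software-check exception. */
static bool ss_popchk(shadow_stack_t *ss, uint32_t ra) {
    if (ss->ssp == SS_WORDS) {
        return false; /* model-only underflow guard */
    }
    if (ss->mem[ss->ssp] != ra) {
        return false; /* return address was tampered with */
    }
    ss->ssp++;
    return true;
}
```

A function prologue would call the push with its return address and the epilogue would call the check just before returning; if the regular stack copy was overwritten in between, the comparison fails.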
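Zicfi integration and testing
From here on the two extensions are referred to together as Zicfi. Zicfiss officially requires a CPU core that includes S-mode. We added minimal S-mode support earlier, so it is a good time to try it.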
defines.svh
New definitions are needed here:
    `define MEM_SSPUSH   4'b11_00
    `define MEM_SSPOPCHK 4'b11_01

    `define EXC_LOAD_ACCESS_FAULT  32'd5
    `define EXC_STORE_ACCESS_FAULT 32'd7
    `define EXC_SOFTWARE_CHECK     32'd18

    `define SOFTCHK_LPAD_FAULT         32'd2
    `define SOFTCHK_SHADOW_STACK_FAULT 32'd3

    `define MATCH_LPAD     32'h0000_0017
    `define MASK_LPAD      32'h0000_0FFF

    `define MATCH_SSPUSH   32'hCE00_4073
    `define MASK_SSPUSH    32'hFE0F_FFFF

    `define MATCH_SSPOPCHK 32'hCDC0_4073
    `define MASK_SSPOPCHK  32'hFFF0_7FFF

    `define MATCH_SSRDP    32'hCDC0_4073
    `define MASK_SSRDP     32'hFFFF_F07F

    `define ENVCFG_LPE_BIT 2
    `define ENVCFG_SSE_BIT 3

    `define MSECCFG_MLPE_BIT 10

    `define MSTATUS_SPELP_BIT  23
    `define MSTATUSH_MPELP_BIT 9
With new instructions defined, the decoder has to change: it needs an extra port, new load/store type cases, and new signals.
    output logic is_lpad_instr,

    logic is_lpad_internal;
    logic is_sspush_internal;
    logic is_sspopchk_internal;
    logic is_ssrdp_encoding;
    logic is_ssrdp_internal;

    assign is_lpad_internal     = ((instr & `MASK_LPAD) == `MATCH_LPAD);
    assign is_sspush_internal   = ((instr & `MASK_SSPUSH) == `MATCH_SSPUSH);
    assign is_ssrdp_encoding    = ((instr & `MASK_SSRDP) == `MATCH_SSRDP);
    assign is_ssrdp_internal    = is_ssrdp_encoding && (instr[11:7] != 5'd0);
    assign is_sspopchk_internal = !is_ssrdp_encoding && ((instr & `MASK_SSPOPCHK) == `MATCH_SSPOPCHK);
    assign is_lpad_instr        = is_lpad_internal;
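The mask/match logic above is easy to sanity-check in software. The C sketch below mirrors the decoder's priority, using the constants from defines.svh; the enum and the function name `cfi_classify` are mine. Note that, like the decoder in this post, it accepts any register in the sspush rs2 field and the sspopchk rs1 field; as far as I understand the specification, only x1 and x5 are actually defined there, so a stricter decoder may want an extra check.

```c
#include <stdint.h>

#define MATCH_LPAD     0x00000017u
#define MASK_LPAD      0x00000FFFu
#define MATCH_SSPUSH   0xCE004073u
#define MASK_SSPUSH    0xFE0FFFFFu /* leaves rs2 (bits 24:20) free */
#define MATCH_SSPOPCHK 0xCDC04073u
#define MASK_SSPOPCHK  0xFFF07FFFu /* leaves rs1 (bits 19:15) free */
#define MATCH_SSRDP    0xCDC04073u
#define MASK_SSRDP     0xFFFFF07Fu /* leaves rd (bits 11:7) free */

typedef enum { CFI_NONE, CFI_LPAD, CFI_SSPUSH, CFI_SSPOPCHK, CFI_SSRDP } cfi_kind_t;

/* Classify one 32-bit instruction word the same way the decoder does. */
static cfi_kind_t cfi_classify(uint32_t instr) {
    if ((instr & MASK_LPAD) == MATCH_LPAD) {
        return CFI_LPAD;
    }
    if ((instr & MASK_SSPUSH) == MATCH_SSPUSH) {
        return CFI_SSPUSH;
    }
    if ((instr & MASK_SSRDP) == MATCH_SSRDP) {
        /* rs1 is x0 here; ssrdp additionally needs a non-zero rd */
        return ((instr >> 7) & 0x1Fu) != 0 ? CFI_SSRDP : CFI_NONE;
    }
    if ((instr & MASK_SSPOPCHK) == MATCH_SSPOPCHK) {
        return CFI_SSPOPCHK;
    }
    return CFI_NONE;
}
```

The ssrdp test has to come before the sspopchk test because both share the base pattern 0xCDC04073 and differ only in which register field is non-zero.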
Next comes deciding whether rs1 and rs2 are used, because sspush encodes its source register in the rs2 field while sspopchk reads from rs1.
    assign rs1_used = is_sspopchk_internal ? 1'b1 :
                      (is_lpad_internal || is_sspush_internal || is_ssrdp_encoding) ? 1'b0 :
                      ~((opcode == OPCODE_LUI) || (opcode == OPCODE_AUIPC) ||
                        (opcode == OPCODE_JAL) || (opcode == OPCODE_ZERO) || csr_use_imm ||
                        ((opcode == OPCODE_ZICSR) && (funct3 == `FUNCT3_CALL)));

    assign rs2_used = is_sspush_internal ||
                      (opcode == OPCODE_RTYPE) || (opcode == OPCODE_BTYPE) || (opcode == OPCODE_STYPE);
Then adjust the auipc check, since lpad shares its opcode with auipc.
    assign is_auipc = (opcode == OPCODE_AUIPC) && !is_lpad_internal;
Next is the write-back multiplexer, which now needs a path for the shadow stack as well.
    // wd_sel selection (excerpt)
    if (is_mul_internal) begin
        wd_sel = `WD_SEL_FROM_MUL;
    end else if (is_ssrdp_internal) begin
        wd_sel = `WD_SEL_FROM_SSP;  // ssrdp writes ssp back to rd
    end else if (is_lpad_internal) begin
        wd_sel = 3'b0;

    // rf_we selection (excerpt)
    if (is_mul_internal) begin
        rf_we = 1'b0;
    end else if (is_ssrdp_internal) begin
        rf_we = 1'b1;
    end else if (is_lpad_internal || is_sspush_internal || is_sspopchk_internal) begin
        rf_we = 1'b0;

    // sspush stores to the shadow stack, so it also drives the write enable
    assign dram_we = (opcode == OPCODE_STYPE) || is_sspush_internal;
The load/store type selection also gains the shadow-stack cases:
    always_comb begin : sl_selection
        if (is_sspush_internal) begin
            sl_type = `MEM_SSPUSH;
        end else if (is_sspopchk_internal) begin
            sl_type = `MEM_SSPOPCHK;
        end else begin
            unique case (opcode)
                OPCODE_LTYPE: begin
                    case (funct3)
                        `FUNCT3_LB:  sl_type = `MEM_LB;
                        `FUNCT3_LBU: sl_type = `MEM_LBU;
                        `FUNCT3_LH:  sl_type = `MEM_LH;
                        `FUNCT3_LHU: sl_type = `MEM_LHU;
                        `FUNCT3_LW:  sl_type = `MEM_LW;
                        default:     sl_type = `MEM_NOP;
                    endcase
                end

                OPCODE_STYPE: begin
                    case (funct3)
                        `FUNCT3_SB: sl_type = `MEM_SB;
                        `FUNCT3_SH: sl_type = `MEM_SH;
                        `FUNCT3_SW: sl_type = `MEM_SW;
                        default:    sl_type = `MEM_NOP;
                    endcase
                end

                default: sl_type = `MEM_NOP;
            endcase
        end
    end
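The CSR address extraction changes too, so that the shadow-stack instructions, which share the SYSTEM opcode, are not mistaken for CSR instructions: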
    always_comb begin : csr_detection
        csr_addr = instr[31:20];
        csr_op   = funct3;

        if (is_sspush_internal || is_sspopchk_internal || is_ssrdp_encoding) begin
            is_csr_instr = 1'b0;
            is_ecall     = 1'b0;
            is_mret      = 1'b0;
            is_sret      = 1'b0;
        end else if (opcode == OPCODE_ZICSR) begin
            if (funct3 == `FUNCT3_CALL) begin
                is_csr_instr = 1'b0;

                is_ecall = (instr[31:7] == 25'b0);

                is_mret  = (instr[31:7] == 25'b0011000000100000000000000);

                is_sret  = (instr[31:7] == 25'b0001000000100000000000000);
            end else begin
                is_csr_instr = 1'b1;
                is_ecall     = 1'b0;
                is_mret      = 1'b0;
                is_sret      = 1'b0;
            end
        end else begin
            is_csr_instr = 1'b0;
            is_ecall     = 1'b0;
            is_mret      = 1'b0;
            is_sret      = 1'b0;
        end
    end
Finally, complete the dispatch for the ZICSR opcode:
    OPCODE_ZICSR: begin
        if (is_sspush_internal || is_sspopchk_internal || is_ssrdp_internal) begin
            is_illegal_instr = 1'b0;
        end else if (is_ssrdp_encoding) begin
            is_illegal_instr = 1'b1;
        end else if (funct3 == `FUNCT3_CALL) begin
            is_illegal_instr = !((instr == 32'h00000073) ||
                                 (instr == 32'h30200073) ||
                                 (instr == 32'h10200073));
        end else begin
            is_illegal_instr = (funct3 == 3'b100);
        end
    end
There is too much to change in the CSR module to walk through piece by piece, so here is the whole file; see the comments.

    `include "include/defines.svh"

    module CSR (
        input  logic        clk,
        input  logic        rst_n,

        // CSR instruction interface
        input  logic        csr_we,
        input  logic [11:0] csr_addr,
        input  logic [31:0] csr_wdata,
        input  logic [ 2:0] csr_op,
        output logic [31:0] csr_rdata,

        // Trap entry and return
        input  logic        exception_valid,
        input  logic [31:0] exception_pc,
        input  logic [31:0] exception_cause,
        input  logic [31:0] exception_tval,
        input  logic        mret_valid,
        input  logic        sret_valid,

        // Zicfiss: ssp updates coming from sspush/sspopchk
        input  logic        ssp_update_valid,
        input  logic [31:0] ssp_update_data,
        // Zicfilp: ELP updates coming from the pipeline
        input  logic        elp_update_valid,
        input  logic        elp_update_expected,

        output logic        trap_to_mmode,
        output logic [31:0] trap_target,
        output logic [31:0] xret_target,
        output logic [ 1:0] current_priv_mode,
        output logic        mstatus_tsr,
        output logic        mstatus_tvm,
        output logic        current_sse_enabled,
        output logic        current_lpe_enabled,
        output logic        elp_expected,
        output logic [31:0] ssp_value
    );

        logic [31:0] mstatus;
        logic [31:0] mstatush;
        logic [31:0] mtvec;
        logic [31:0] mepc;
        logic [31:0] mcause;
        logic [31:0] mscratch;
        logic [31:0] mtval;
        logic [31:0] mie;
        logic [31:0] mip;
        logic [31:0] misa;
        logic [31:0] medeleg;
        logic [31:0] mideleg;
        logic [31:0] menvcfg;
        logic [31:0] mseccfg;
        logic [31:0] mcounteren;
        logic [31:0] mnstatus;
        logic [31:0] pmpcfg0;
        logic [31:0] pmpaddr0;

        logic [31:0] stvec;
        logic [31:0] sepc;
        logic [31:0] scause;
        logic [31:0] sscratch;
        logic [31:0] stval;
        logic [31:0] satp;
        logic [31:0] senvcfg;
        logic [31:0] scounteren;
        logic [31:0] ssp;        // Zicfiss shadow stack pointer
        logic        elp_state;  // Zicfilp expected-landing-pad state

        logic [63:0] mcycle;
        logic [ 1:0] priv_mode;

        localparam integer SIE_BIT   = 1;
        localparam integer MIE_BIT   = 3;
        localparam integer SPIE_BIT  = 5;
        localparam integer MPIE_BIT  = 7;
        localparam integer SPP_BIT   = 8;
        localparam integer MPP_LOW   = 11;
        localparam integer MPP_HIGH  = 12;
        localparam integer TVM_BIT   = 20;
        localparam integer TSR_BIT   = 22;
        localparam integer SPELP_BIT = `MSTATUS_SPELP_BIT;
        localparam integer MPELP_BIT = `MSTATUSH_MPELP_BIT;

        localparam logic [31:0] MSTATUS_WRITABLE_MASK  = 32'h00F0_19AA;
        localparam logic [31:0] MSTATUSH_WRITABLE_MASK = 32'h0000_0200;
        localparam logic [31:0] SSTATUS_MASK           = 32'h0080_0122;
        localparam logic [31:0] MENVCFG_WRITABLE_MASK  = 32'h0000_000C;
        localparam logic [31:0] SENVCFG_WRITABLE_MASK  = 32'h0000_000C;
        localparam logic [31:0] MSECCFG_WRITABLE_MASK  = 32'h0000_0400;

        function automatic [31:0] align_trap_vector(input logic [31:0] value);
            begin
                align_trap_vector = {value[31:2], 2'b00};
            end
        endfunction

        function automatic [31:0] align_epc(input logic [31:0] value);
            begin
                align_epc = {value[31:2], 2'b00};
            end
        endfunction

        function automatic [31:0] compose_sstatus(input logic [31:0] mstatus_value);
            begin
                compose_sstatus = mstatus_value & SSTATUS_MASK;
            end
        endfunction

        // senvcfg.SSE reads as zero while menvcfg.SSE is clear
        function automatic [31:0] compose_senvcfg(input logic [31:0] menvcfg_value,
                                                  input logic [31:0] senvcfg_value);
            logic [31:0] view_value;
            begin
                view_value = senvcfg_value & SENVCFG_WRITABLE_MASK;
                if (!menvcfg_value[`ENVCFG_SSE_BIT]) begin
                    view_value[`ENVCFG_SSE_BIT] = 1'b0;
                end
                compose_senvcfg = view_value;
            end
        endfunction

        function automatic [31:0] sanitize_mstatus(input logic [31:0] new_value);
            logic [31:0] sanitized;
            begin
                sanitized = new_value & MSTATUS_WRITABLE_MASK;
                // MPP = 2'b10 is reserved; fall back to U-mode
                if (sanitized[MPP_HIGH:MPP_LOW] == 2'b10) begin
                    sanitized[MPP_HIGH:MPP_LOW] = `PRV_U;
                end
                sanitize_mstatus = sanitized;
            end
        endfunction

        function automatic [31:0] sanitize_mstatush(input logic [31:0] new_value);
            begin
                sanitize_mstatush = new_value & MSTATUSH_WRITABLE_MASK;
            end
        endfunction

        function automatic [31:0] sanitize_menvcfg(input logic [31:0] new_value);
            begin
                sanitize_menvcfg = new_value & MENVCFG_WRITABLE_MASK;
            end
        endfunction

        function automatic [31:0] sanitize_senvcfg(input logic [31:0] new_value,
                                                   input logic [31:0] menvcfg_value);
            logic [31:0] sanitized;
            begin
                sanitized = new_value & SENVCFG_WRITABLE_MASK;
                if (!menvcfg_value[`ENVCFG_SSE_BIT]) begin
                    sanitized[`ENVCFG_SSE_BIT] = 1'b0;
                end
                sanitize_senvcfg = sanitized;
            end
        endfunction

        function automatic [31:0] sanitize_mseccfg(input logic [31:0] new_value);
            begin
                sanitize_mseccfg = new_value & MSECCFG_WRITABLE_MASK;
            end
        endfunction

        // Writes through sstatus only touch the S-visible bits, including SPELP
        function automatic [31:0] update_sstatus_view(input logic [31:0] old_mstatus,
                                                      input logic [31:0] new_sstatus);
            logic [31:0] merged;
            begin
                merged            = old_mstatus;
                merged[SIE_BIT]   = new_sstatus[SIE_BIT];
                merged[SPIE_BIT]  = new_sstatus[SPIE_BIT];
                merged[SPP_BIT]   = new_sstatus[SPP_BIT];
                merged[SPELP_BIT] = new_sstatus[SPELP_BIT];
                update_sstatus_view = merged;
            end
        endfunction

        // Zicfiss enable per privilege level (never active in M-mode)
        function automatic logic shadow_stack_enabled(
            input logic [ 1:0] priv,
            input logic [31:0] menvcfg_value,
            input logic [31:0] senvcfg_value);
            begin
                unique case (priv)
                    `PRV_S:  shadow_stack_enabled = menvcfg_value[`ENVCFG_SSE_BIT];
                    `PRV_U:  shadow_stack_enabled = menvcfg_value[`ENVCFG_SSE_BIT] &&
                                                    senvcfg_value[`ENVCFG_SSE_BIT];
                    default: shadow_stack_enabled = 1'b0;
                endcase
            end
        endfunction

        // Zicfilp enable per privilege level
        function automatic logic landing_pad_enabled(
            input logic [ 1:0] priv,
            input logic [31:0] menvcfg_value,
            input logic [31:0] senvcfg_value,
            input logic [31:0] mseccfg_value);
            begin
                unique case (priv)
                    `PRV_M:  landing_pad_enabled = mseccfg_value[`MSECCFG_MLPE_BIT];
                    `PRV_S:  landing_pad_enabled = menvcfg_value[`ENVCFG_LPE_BIT];
                    `PRV_U:  landing_pad_enabled = senvcfg_value[`ENVCFG_LPE_BIT];
                    default: landing_pad_enabled = 1'b0;
                endcase
            end
        endfunction

        function automatic logic take_delegated_trap(input logic [ 1:0] priv,
                                                     input logic [31:0] cause,
                                                     input logic [31:0] medeleg_value,
                                                     input logic [31:0] mideleg_value);
            begin
                if (priv == `PRV_M) begin
                    take_delegated_trap = 1'b0;
                end else if (cause[31]) begin
                    take_delegated_trap = mideleg_value[cause[4:0]];
                end else begin
                    take_delegated_trap = medeleg_value[cause[4:0]];
                end
            end
        endfunction

        logic [31:0] senvcfg_view;
        assign senvcfg_view = compose_senvcfg(menvcfg, senvcfg);

        // CSR read mux
        always_comb begin
            case (csr_addr)
                `CSR_SSP:        csr_rdata = ssp;
                `CSR_SSTATUS:    csr_rdata = compose_sstatus(mstatus);
                `CSR_SIE:        csr_rdata = mie & mideleg;
                `CSR_STVEC:      csr_rdata = stvec;
                `CSR_SCOUNTEREN: csr_rdata = scounteren;
                `CSR_SENVCFG:    csr_rdata = senvcfg_view;
                `CSR_SSCRATCH:   csr_rdata = sscratch;
                `CSR_SEPC:       csr_rdata = sepc;
                `CSR_SCAUSE:     csr_rdata = scause;
                `CSR_STVAL:      csr_rdata = stval;
                `CSR_SIP:        csr_rdata = mip & mideleg;
                `CSR_SATP:       csr_rdata = satp;

                `CSR_MSTATUS:    csr_rdata = mstatus;
                `CSR_MISA:       csr_rdata = misa;
                `CSR_MVENDORID:  csr_rdata = 32'b0;
                `CSR_MARCHID:    csr_rdata = 32'b0;
                `CSR_MIMPID:     csr_rdata = 32'b0;
                `CSR_MHARTID:    csr_rdata = 32'b0;
                `CSR_MEDELEG:    csr_rdata = medeleg;
                `CSR_MIDELEG:    csr_rdata = mideleg;
                `CSR_MIE:        csr_rdata = mie;
                `CSR_MNSTATUS:   csr_rdata = mnstatus;
                `CSR_MTVEC:      csr_rdata = mtvec;
                `CSR_MSTATUSH:   csr_rdata = mstatush;
                `CSR_MENVCFG:    csr_rdata = menvcfg;
                `CSR_MCOUNTEREN: csr_rdata = mcounteren;
                `CSR_MSCRATCH:   csr_rdata = mscratch;
                `CSR_MEPC:       csr_rdata = mepc;
                `CSR_MCAUSE:     csr_rdata = mcause;
                `CSR_MTVAL:      csr_rdata = mtval;
                `CSR_MIP:        csr_rdata = mip;
                `CSR_MSECCFG:    csr_rdata = mseccfg;
                `CSR_PMPCFG0:    csr_rdata = pmpcfg0;
                `CSR_PMPADDR0:   csr_rdata = pmpaddr0;

                `CSR_MCYCLE, `CSR_CYCLE:   csr_rdata = mcycle[31:0];
                `CSR_MCYCLEH, `CSR_CYCLEH: csr_rdata = mcycle[63:32];
                // instret is aliased to the cycle counter in this core
                `CSR_INSTRET:              csr_rdata = mcycle[31:0];
                `CSR_INSTRETH:             csr_rdata = mcycle[63:32];

                default: csr_rdata = 32'b0;
            endcase
        end

        // Read-modify-write value for CSRRW/CSRRS/CSRRC
        logic [31:0] csr_new_value;
        always_comb begin
            case (csr_op)
                `FUNCT3_CSRRW, `FUNCT3_CSRRWI: csr_new_value = csr_wdata;
                `FUNCT3_CSRRS, `FUNCT3_CSRRSI: csr_new_value = csr_rdata | csr_wdata;
                `FUNCT3_CSRRC, `FUNCT3_CSRRCI: csr_new_value = csr_rdata & (~csr_wdata);
                default:                       csr_new_value = csr_rdata;
            endcase
        end

        logic delegated_exception;
        assign delegated_exception = exception_valid && take_delegated_trap(
            priv_mode, exception_cause, medeleg, mideleg
        );

        logic [1:0] mret_target_priv;
        logic [1:0] sret_target_priv;
        assign mret_target_priv = mstatus[MPP_HIGH:MPP_LOW];
        assign sret_target_priv = mstatus[SPP_BIT] ? `PRV_S : `PRV_U;

        always_ff @(posedge clk or negedge rst_n) begin
            if (!rst_n) begin
                mstatus    <= 32'h0000_1800;
                mstatush   <= 32'h0000_0000;
                mtvec      <= 32'h0114_5140;
                mepc       <= 32'h0000_0000;
                mcause     <= 32'h0000_0000;
                mscratch   <= 32'h0000_0000;
                mtval      <= 32'h0000_0000;
                mie        <= 32'h0000_0000;
                mip        <= 32'h0000_0000;
                misa       <= 32'h4014_0100;
                medeleg    <= 32'h0000_0000;
                mideleg    <= 32'h0000_0000;
                menvcfg    <= 32'h0000_0000;
                mseccfg    <= 32'h0000_0000;
                mcounteren <= 32'h0000_0000;
                mnstatus   <= 32'h0000_0000;
                pmpcfg0    <= 32'h0000_0000;
                pmpaddr0   <= 32'h0000_0000;
                stvec      <= 32'h0114_5140;
                scounteren <= 32'h0000_0000;
                senvcfg    <= 32'h0000_0000;
                sepc       <= 32'h0000_0000;
                scause     <= 32'h0000_0000;
                sscratch   <= 32'h0000_0000;
                stval      <= 32'h0000_0000;
                satp       <= 32'h0000_0000;
                ssp        <= 32'h0000_0000;
                elp_state  <= 1'b0;
                priv_mode  <= `PRV_M;
            end else if (exception_valid) begin
                if (delegated_exception) begin
                    // Trap into S-mode: save ELP into SPELP and clear it
                    sepc               <= align_epc(exception_pc);
                    scause             <= exception_cause;
                    stval              <= exception_tval;
                    mstatus[SPIE_BIT]  <= mstatus[SIE_BIT];
                    mstatus[SIE_BIT]   <= 1'b0;
                    mstatus[SPP_BIT]   <= (priv_mode == `PRV_S);
                    mstatus[SPELP_BIT] <= elp_state;
                    elp_state          <= 1'b0;
                    priv_mode          <= `PRV_S;
                end else begin
                    // Trap into M-mode: save ELP into MPELP and clear it
                    mepc                      <= align_epc(exception_pc);
                    mcause                    <= exception_cause;
                    mtval                     <= exception_tval;
                    mstatus[MPIE_BIT]         <= mstatus[MIE_BIT];
                    mstatus[MIE_BIT]          <= 1'b0;
                    mstatus[MPP_HIGH:MPP_LOW] <= priv_mode;
                    mstatush[MPELP_BIT]       <= elp_state;
                    elp_state                 <= 1'b0;
                    priv_mode                 <= `PRV_M;
                end
            end else if (mret_valid) begin
                priv_mode                 <= mret_target_priv;
                mstatus[MIE_BIT]          <= mstatus[MPIE_BIT];
                mstatus[MPIE_BIT]         <= 1'b1;
                mstatus[MPP_HIGH:MPP_LOW] <= `PRV_U;
                // Restore ELP only if landing pads are enabled in the target mode
                elp_state <= landing_pad_enabled(
                    mret_target_priv, menvcfg, senvcfg, mseccfg
                ) ? mstatush[MPELP_BIT] : 1'b0;
                mstatush[MPELP_BIT] <= 1'b0;
            end else if (sret_valid) begin
                priv_mode         <= sret_target_priv;
                mstatus[SIE_BIT]  <= mstatus[SPIE_BIT];
                mstatus[SPIE_BIT] <= 1'b1;
                mstatus[SPP_BIT]  <= 1'b0;
                elp_state <= landing_pad_enabled(
                    sret_target_priv, menvcfg, senvcfg, mseccfg
                ) ? mstatus[SPELP_BIT] : 1'b0;
                mstatus[SPELP_BIT] <= 1'b0;
            end else if (ssp_update_valid) begin
                ssp <= ssp_update_data;
            end else if (elp_update_valid) begin
                elp_state <= elp_update_expected;
            end else if (csr_we) begin
                case (csr_addr)
                    `CSR_SSP:        ssp        <= csr_new_value;
                    `CSR_SSTATUS:    mstatus    <= update_sstatus_view(mstatus, csr_new_value);
                    `CSR_SIE:        mie        <= (mie & ~mideleg) | (csr_new_value & mideleg);
                    `CSR_STVEC:      stvec      <= align_trap_vector(csr_new_value);
                    `CSR_SCOUNTEREN: scounteren <= csr_new_value;
                    `CSR_SENVCFG:    senvcfg    <= sanitize_senvcfg(csr_new_value, menvcfg);
                    `CSR_SSCRATCH:   sscratch   <= csr_new_value;
                    `CSR_SEPC:       sepc       <= align_epc(csr_new_value);
                    `CSR_SCAUSE:     scause     <= csr_new_value;
                    `CSR_STVAL:      stval      <= csr_new_value;
                    `CSR_SIP:        mip        <= (mip & ~mideleg) | (csr_new_value & mideleg);
                    `CSR_SATP:       satp       <= csr_new_value;

                    `CSR_MSTATUS:    mstatus    <= sanitize_mstatus(csr_new_value);
                    `CSR_MEDELEG:    medeleg    <= csr_new_value;
                    `CSR_MIDELEG:    mideleg    <= csr_new_value;
                    `CSR_MIE:        mie        <= csr_new_value;
                    `CSR_MNSTATUS:   mnstatus   <= csr_new_value;
                    `CSR_MTVEC:      mtvec      <= align_trap_vector(csr_new_value);
                    `CSR_MSTATUSH:   mstatush   <= sanitize_mstatush(csr_new_value);
                    `CSR_MENVCFG:    menvcfg    <= sanitize_menvcfg(csr_new_value);
                    `CSR_MCOUNTEREN: mcounteren <= csr_new_value;
                    `CSR_MSCRATCH:   mscratch   <= csr_new_value;
                    `CSR_MEPC:       mepc       <= align_epc(csr_new_value);
                    `CSR_MCAUSE:     mcause     <= csr_new_value;
                    `CSR_MTVAL:      mtval      <= csr_new_value;
                    `CSR_MIP:        mip        <= csr_new_value;
                    `CSR_MSECCFG:    mseccfg    <= sanitize_mseccfg(csr_new_value);
                    `CSR_PMPCFG0:    pmpcfg0    <= csr_new_value;
                    `CSR_PMPADDR0:   pmpaddr0   <= csr_new_value;
                    default:         ;
                endcase
            end
        end

        // Free-running cycle counter, writable through mcycle/mcycleh
        always_ff @(posedge clk or negedge rst_n) begin
            if (!rst_n) begin
                mcycle <= 64'h0;
            end else if (csr_we && csr_addr == `CSR_MCYCLE) begin
                mcycle[31:0] <= csr_new_value;
            end else if (csr_we && csr_addr == `CSR_MCYCLEH) begin
                mcycle[63:32] <= csr_new_value;
            end else begin
                mcycle <= mcycle + 64'h1;
            end
        end

        assign trap_to_mmode = exception_valid && !delegated_exception;
        assign trap_target   = delegated_exception ? align_trap_vector(stvec)
                                                   : align_trap_vector(mtvec);
        assign xret_target         = mret_valid ? mepc : sepc;
        assign current_priv_mode   = priv_mode;
        assign mstatus_tsr         = mstatus[TSR_BIT];
        assign mstatus_tvm         = mstatus[TVM_BIT];
        assign current_sse_enabled = shadow_stack_enabled(priv_mode, menvcfg, senvcfg_view);
        assign current_lpe_enabled = landing_pad_enabled(priv_mode, menvcfg, senvcfg_view, mseccfg);
        assign elp_expected        = elp_state;
        assign ssp_value           = ssp;

    endmodule
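The per-privilege enable logic in `landing_pad_enabled` and `shadow_stack_enabled` is the part of the CSR module that is easiest to get subtly wrong, so here is a small C mirror of it that can be unit-tested on a host machine. The bit positions come from defines.svh; the `PRV_*` values are an assumption on my part (the usual RISC-V encoding 0/1/3, since the post does not show those defines), and the helper `bit` is just a convenience.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed standard privilege encodings; not shown in the post's defines. */
enum { PRV_U = 0, PRV_S = 1, PRV_M = 3 };

#define ENVCFG_LPE_BIT   2
#define ENVCFG_SSE_BIT   3
#define MSECCFG_MLPE_BIT 10

static bool bit(uint32_t value, unsigned n) {
    return ((value >> n) & 1u) != 0;
}

/* Landing pads: each mode is gated by the CSR of the mode above it. */
static bool landing_pad_enabled(int priv, uint32_t menvcfg,
                                uint32_t senvcfg, uint32_t mseccfg) {
    switch (priv) {
    case PRV_M: return bit(mseccfg, MSECCFG_MLPE_BIT);
    case PRV_S: return bit(menvcfg, ENVCFG_LPE_BIT);
    case PRV_U: return bit(senvcfg, ENVCFG_LPE_BIT);
    default:    return false;
    }
}

/* Shadow stack: never active in M-mode; U-mode needs both SSE bits. */
static bool shadow_stack_enabled(int priv, uint32_t menvcfg, uint32_t senvcfg) {
    switch (priv) {
    case PRV_S: return bit(menvcfg, ENVCFG_SSE_BIT);
    case PRV_U: return bit(menvcfg, ENVCFG_SSE_BIT) && bit(senvcfg, ENVCFG_SSE_BIT);
    default:    return false;
    }
}
```

One thing this makes visible: setting menvcfg.LPE alone enables landing pads for S-mode but not for U-mode, which needs senvcfg.LPE.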
The shadow stack also has to be read and written, so the load/store control module needs changes:
    // Store path (excerpt): sspush always writes a full word
    if (dram_we) begin
        if (sl_type == `MEM_SSPUSH) begin
            wstrb        = 4'b1111;
            store_data_o = store_data_i;
        end else begin
            unique case (sl_type[1:0])

    // Load path: sspopchk reads a full word without extension
    always_comb begin
        logic [31:0] raw;

        raw         = 32'b0;
        load_data_o = 32'b0;

        if (sl_type == `MEM_SSPOPCHK) begin
            load_data_o = load_data_i;
        end else begin
            case (sl_type[1:0])
                2'b01: begin
                    raw = (load_data_i >> (addr[1:0] * 8)) & 32'h000000FF;
                end
                2'b10: begin
                    raw = (load_data_i >> (addr[1] * 16)) & 32'h0000FFFF;
                end
                2'b11: begin
                    raw = load_data_i;
                end
                default: raw = 32'b0;
            endcase

            if (is_load_unsigned) begin
                load_data_o = raw;
            end else begin
                case (sl_type[1:0])
                    2'b01:   load_data_o = {{24{raw[7]}}, raw[7:0]};
                    2'b10:   load_data_o = {{16{raw[15]}}, raw[15:0]};
                    2'b11:   load_data_o = raw;
                    default: load_data_o = 32'b0;
                endcase
            end
        end
    end
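In Zicfilp, register x7 holds the label expected by the call site, in x7[31:12]. It therefore has to be exported from the register file so that it can be compared later. This may cause synthesis problems down the line.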
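The comparison that x7 is exported for can be written down in a few lines. This is a sketch of the check as I understand it from the specification, not a transcription of the core's RTL: when a landing pad is expected, the target must be 4-byte aligned, must decode as lpad, and its label must either be zero or equal x7[31:12]. The function name `landing_pad_ok` is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MATCH_LPAD 0x00000017u
#define MASK_LPAD  0x00000FFFu

/* Evaluate the landing-pad check for the first instruction at an
 * indirect-branch target. A false result stands in for a software-check
 * exception with the landing-pad fault code. */
static bool landing_pad_ok(uint32_t pc, uint32_t instr, uint32_t x7) {
    if ((pc & 0x3u) != 0) {
        return false;                 /* lpad must be 4-byte aligned */
    }
    if ((instr & MASK_LPAD) != MATCH_LPAD) {
        return false;                 /* not an lpad at all */
    }
    uint32_t lpl = instr >> 12;       /* label embedded in the lpad */
    return lpl == 0 || lpl == (x7 >> 12);
}
```

A label of zero acts as a wildcard, which is why code compiled without per-function labels still passes the check regardless of what x7 contains.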
    `include "include/defines.svh"

    module RegisterF (
        input  logic        clk,
        input  logic        rf_we,
        input  logic [ 4:0] rR1,
        input  logic [ 4:0] rR2,
        input  logic [ 4:0] wR,
        input  logic [31:0] wD,

        // Second write port
        input  logic        rf_we2,
        input  logic [ 4:0] wR2,
        input  logic [31:0] wD2,

        output logic [31:0] rD1,
        output logic [31:0] rD2,
        output logic [31:0] x7_o   // exported for the lpad label check
    );

        logic [31:0] rf_in[32];

        initial begin
            rf_in[0] = '0;
        end

        always_ff @(posedge clk) begin
            if (rf_we && wR != 5'd0) begin
                rf_in[wR] <= wD;
            end

            // Port 1 wins when both ports target the same register
            if (rf_we2 && wR2 != 5'd0 && !(rf_we && wR == wR2)) begin
                rf_in[wR2] <= wD2;
            end
        end

        always_comb begin
            rD1  = (rR1 == 0) ? {32{1'b0}} : rf_in[rR1];
            rD2  = (rR2 == 0) ? {32{1'b0}} : rf_in[rR2];
            x7_o = rf_in[7];
        end

    endmodule
The hazard unit also needs a check for shadow-stack instructions:
    input logic shadow_serialize_EX,
    input logic shadow_serialize_MEM,
    input logic shadow_serialize_WB,

    logic shadow_serialize_hazard;

    assign shadow_serialize_hazard = shadow_serialize_EX || shadow_serialize_MEM || shadow_serialize_WB;

    assign any_hazard = load_use_hazard || mul_use_hazard || mul_struct_hazard || mul_waw_hazard || shadow_serialize_hazard;
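Build and test
The toolchain has to support rv32i_zicsr_zmmul_zicfilp_zicfiss. I built an RV64 toolchain myself, which also works for RV32; it is available at https://github.com/Zxis233/riscv-gnu-toolchain/releases/latest and its architecture is rv64gc_zicfilp_zicfiss.
Edit the compile options in core_portme.mak and add: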
    ARCH_FLAGS = -march=rv32i_zicsr_zmmul_zicfilp_zicfiss -mabi=ilp32
    BAREMETAL_FLAGS = -ffreestanding -fno-builtin -nostdlib -nostartfiles
    CFI_FLAGS = \
        -fcf-protection=full -fno-inline -fno-optimize-sibling-calls

    CFLAGS = $(PORT_CFLAGS) $(ARCH_FLAGS) $(BAREMETAL_FLAGS) $(CFI_FLAGS)\
        -I$(PORT_DIR) -I. -DFLAGS_STR=\"$(FLAGS_STR)\"
Then run clean and rebuild. The disassembly now contains lpad and the shadow-stack instructions.
Let the CPU run it:
    _end=0x000030c0 stack_top=0x0000ff00 sp=0x0000f6a0
    Now Time: 0x00000000000004d7
    2K performance run parameters for coremark.
    CoreMark Size    : 666
    Total ticks      : 5184678
    Total time (secs): 10
    Iterations/Sec   : 1
    Iterations       : 10
    Compiler version : GCC15.2.0
    Compiler flags   : -O2 -g -DPERFORMANCE_RUN=1
    Memory location  : STACK
    seedcrc          : 0xe9f5
    [0]crclist       : 0xe714
    [0]crcmatrix     : 0x1fd7
    [0]crcstate      : 0x8e3a
    [0]crcfinal      : 0xfcaf
    Correct operation validated. See README.md for run and reporting rules.
    Now Time: 0x00000000004fb6cc
    52248755000| [PASS] | Finished
|                 | No Zicfi    | Zicfi       | Loss  |
|-----------------|-------------|-------------|-------|
| Total ticks     | 4598265     | 5184678     | 12.8% |
| Simulation Time | 46317325000 | 52248755000 | 12.8% |
This performance overhead is acceptable.
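For reference, the loss figure is just the relative increase over the baseline. A tiny helper (the name `overhead_percent` is mine) reproduces it from the numbers measured above:

```c
/* Relative overhead of the CFI-enabled run over the baseline, in percent. */
static double overhead_percent(double baseline, double with_cfi) {
    return (with_cfi - baseline) / baseline * 100.0;
}
```

Plugging in the tick counts 4598265 and 5184678 gives roughly 12.75%, and the simulation times 46317325000 and 52248755000 give roughly 12.81%; both round to 12.8%.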