Control-flow Integrity

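What is control-flow integrity?

If a program runs, its structure has no obvious problems, at least at the macro level. Down at the binary level, though, code that looks fine can still hide hazards such as stack overflows. And even if the program's own logic is flawless, an attacker who manages to corrupt a single code pointer can still turn its existing code into an attack, as in ROP and JOP.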

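ROP, JOP, and COP

Before going through these attacks, a word on "code-reuse attacks". As the name suggests, the idea is not to inject new code into memory but to stitch together instruction fragments that already exist in the program into the control flow the attacker wants. In other words, the program's own code is turned against it.

ROP, Return-Oriented Programming. Anyone who tinkers with Casio calculators will find it familiar: the tricks for displaying specific characters on the calculator, or even playing games on it, are all ROP. It chains many small fragments that end in ret (usually called gadgets). Because ret takes the next return address from the stack, once an attacker can modify return addresses, the program can be led through a chain of fragments that the programmer never intended to connect that way.

JOP, Jump-Oriented Programming. It has the same goal as ROP, chaining gadgets, but it does not rely on ret. It relies mainly on indirect jumps (for example the jmp reg family). The key point of JOP is that even if ret is monitored or restricted, an attacker can still chain gadgets with jumps instead.

COP, Call-Oriented Programming. It closely resembles JOP; the difference is that it leans on call-type control transfers to connect gadgets.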

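Forward and backward edges

A forward edge goes from the current code to the next piece of code to execute, for example call foo(), a call through a function pointer, a virtual function call, or an indirect jump. A backward edge is the path taken when a function finishes and returns from the callee to the caller, that is, ret.

A C example: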

#include <stdio.h>

void target(void) {
    printf("3 target: now executing inside target()\n");
    printf("4 target: about to return to the caller\n");
}

void run(void (*fp)(void)) {
    printf("2 run: forward -> calling fp() through a function pointer\n");
    fp();  // forward edge: run jumps to target
    printf("5 run: backward <- fp() has returned to run()\n");
}

int main(void) {
    printf("1 main: forward -> calling run()\n");
    run(target);  // forward edge: main jumps to run
    printf("6 main: backward <- run() has returned to main()\n");
    return 0;
}

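The program executes in this order:

  1. main: forward -> calling run()
  2. run: forward -> calling fp() through a function pointer
  3. target: now executing inside target()
  4. target: about to return to the caller
  5. run: backward <- fp() has returned to run()
  6. main: backward <- run() has returned to main()

Mapped onto the two extensions: Zicfilp protects forward edges, and Zicfiss protects backward edges.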

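Security-oriented extensions: Zicfilp/Zicfiss

Zicfilp (Landing Pad) and Zicfiss (Shadow Stack) can be traced back to 2022. The basic concepts of control-flow protection were settled in internal discussions of the Shadow Stacks and Landing Pads task group. In June of that year the riscv/riscv-cfi repository was created, and from the start Zicfiss and Zicfilp were developed as a matched pair: one guards return addresses, the other guards the targets of indirect control transfers. It was only a prototype at the time, but it shows that work on control-flow protection began early.

Then, in 2023, the LLVM toolchain explicitly stated support for Zicfilp, albeit only for a draft version. The patch for Zicfiss was publicly submitted for review in the middle of the same year. By then both extensions had taken shape but were still experimental. In the second half of the year the riscv-cfi repository moved quickly, publishing six releases within six months, while the instructions and encodings were still being worked out.

In March 2024, Zicfilp and Zicfiss entered a 30-day public review. The final version followed in July, which made the control-flow integrity extensions a chapter of the privileged specification.

A specification alone is not enough, though, and community adoption has been slow. At the time of writing, only one CVA6-based fork implements the Zicfilp and Zicfiss extensions.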

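The core idea of Zicfilp is to use a landing pad (LPAD) to constrain the legal targets of indirect jumps and calls, which mainly defends against JOP/COP. Zicfiss uses a shadow stack to protect function return addresses, which mainly defends against ROP. Quite the pair, these two.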

Zicfilp

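Zicfilp adds a single new instruction, lpad, the "landing pad" instruction. With -fcf-protection=branch enabled, the compiler places it at every location that may legitimately be the target of an indirect call or indirect jump, such as the entry of an address-taken function. Once Zicfilp is enabled, the processor tracks an ELP (expected landing pad) state: whenever an indirect jump is one whose target must be a legal entry point, the first instruction executed at the target must be lpad; otherwise a software-check exception is raised.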
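The ELP rule can be sketched as a small software model in C. This is only an illustration under stated assumptions: the function and type names are hypothetical, and the model covers the uncompressed jalr case only. It assumes the behavior described by the extension: a jalr arms the check unless rs1 is x1, x5, or x7, and an armed check passes only if the target holds a 4-byte-aligned lpad whose label is zero or equals x7[31:12].

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; this models the rule, not any particular core. */
typedef enum { NO_LP_EXPECTED = 0, LP_EXPECTED = 1 } elp_t;

/* A jalr arms the check unless rs1 is x1/x5 (return) or x7
   (software-guarded jump), which are exempt. */
elp_t elp_after_jalr(uint32_t rs1) {
    assert(rs1 < 32u);
    return (rs1 != 1u && rs1 != 5u && rs1 != 7u) ? LP_EXPECTED : NO_LP_EXPECTED;
}

/* Returns true if the instruction at the jump target would raise a
   software-check exception. */
bool lpad_check_faults(elp_t elp, uint32_t pc, uint32_t instr, uint32_t x7) {
    if (elp == NO_LP_EXPECTED) return false;        /* nothing is armed */
    bool is_lpad = (instr & 0xFFFu) == 0x017u;      /* auipc opcode, rd = x0 */
    if (!is_lpad) return true;                      /* landed on something else */
    if (pc & 0x3u) return true;                     /* lpad must be 4-byte aligned */
    uint32_t lpl = instr >> 12;                     /* 20-bit landing-pad label */
    return lpl != 0u && lpl != (x7 >> 12);          /* label 0 accepts any caller */
}
```

For example, an armed check that lands on an ordinary addi faults, while landing on an lpad with label 0 clears the expectation regardless of what x7 holds.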

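What about older CPUs that do not support it? lpad uses exactly the same opcode as auipc with rd fixed to x0, and its upper 20 bits are interpreted as the lpl (landing-pad label) rather than auipc's PC-relative upper immediate. Because rd=x0 places the encoding in HINT space, implementations are free to ignore it: when Zicfilp is not implemented, or is not active at the current privilege level, lpad executes as a no-op and leaves architecturally visible state untouched.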
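To make the backward-compatibility argument concrete, here is a minimal C sketch of how a core without Zicfilp would treat the same word. The helper names (encode_lpad, legacy_auipc) are hypothetical and the register file is a plain array; the only facts assumed are the ones stated above, namely the auipc opcode (0x17), rd = x0, and the label in the upper 20 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Build an lpad word: label in bits [31:12], rd = x0, auipc opcode. */
uint32_t encode_lpad(uint32_t label) {
    assert(label <= 0xFFFFFu);
    return (label << 12) | 0x17u;
}

/* What a core without Zicfilp does with an auipc-opcode word:
   rd = pc + upper immediate, except that writes to x0 are discarded. */
void legacy_auipc(uint32_t regs[32], uint32_t pc, uint32_t instr) {
    assert((instr & 0x7Fu) == 0x17u);
    uint32_t rd = (instr >> 7) & 0x1Fu;
    if (rd != 0u) {
        regs[rd] = pc + (instr & 0xFFFFF000u);
    }
}
```

Running legacy_auipc on encode_lpad(0x123) leaves every register unchanged, whereas an ordinary auipc t0, 1 (0x00001297) at pc 0x100 writes 0x1100 to x5.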

Zicfiss

Zicfiss adds several new instructions: sspush, sspopchk, and ssrdp. There is also ssamoswap, but our core does not support atomic operations, so it is omitted here.
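The three instructions can be pictured with a toy C model. This is a sketch under assumptions: the struct, the depth SS_WORDS, and the function names are invented for illustration, the stack lives in a small array rather than in protected memory, and a mismatch is reported by a return value where hardware would raise a software-check exception.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SS_WORDS 16u  /* hypothetical depth of the toy shadow stack */

typedef struct {
    uint32_t mem[SS_WORDS];
    uint32_t ssp;  /* byte offset into mem; grows downward like ssp */
} shadow_stack_t;

void ss_init(shadow_stack_t *ss) { ss->ssp = SS_WORDS * 4u; }

/* sspush: move ssp down by one word (RV32), then store the return address. */
void ss_push(shadow_stack_t *ss, uint32_t ra) {
    assert(ss->ssp >= 4u);
    ss->ssp -= 4u;
    ss->mem[ss->ssp / 4u] = ra;
}

/* sspopchk: compare the top entry with ra. On mismatch report a fault and
   leave ssp alone; on match pop the entry. */
bool ss_popchk(shadow_stack_t *ss, uint32_t ra) {
    assert(ss->ssp < SS_WORDS * 4u);
    if (ss->mem[ss->ssp / 4u] != ra) return false;
    ss->ssp += 4u;
    return true;
}

/* ssrdp: read the current shadow stack pointer. */
uint32_t ss_rdp(const shadow_stack_t *ss) { return ss->ssp; }
```

A function prologue would call ss_push with its return address and the epilogue would call ss_popchk with the value it is about to return to; if the regular stack copy was overwritten in between, the comparison fails.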


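Zicfi integration and testing

Here the two extensions are collectively called Zicfi. Zicfiss is specified to run on a CPU core that includes S-mode. We added minimal S-mode support earlier, so we can try it out.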

defines.svh

New definitions are needed here:

// ================== MemOP definitions ==================
`define MEM_SSPUSH 4'b11_00
`define MEM_SSPOPCHK 4'b11_01

// ================== Exception cause definitions ==================
`define EXC_LOAD_ACCESS_FAULT 32'd5
`define EXC_STORE_ACCESS_FAULT 32'd7
`define EXC_SOFTWARE_CHECK 32'd18

// ================== Software-check definitions ==================
`define SOFTCHK_LPAD_FAULT 32'd2
`define SOFTCHK_SHADOW_STACK_FAULT 32'd3

// ================== Zicfilp/Zicfiss definitions ==================
`define MATCH_LPAD 32'h0000_0017
`define MASK_LPAD 32'h0000_0FFF
`define MATCH_SSPUSH 32'hCE00_4073
`define MASK_SSPUSH 32'hFE0F_FFFF
`define MATCH_SSPOPCHK 32'hCDC0_4073
`define MASK_SSPOPCHK 32'hFFF0_7FFF
`define MATCH_SSRDP 32'hCDC0_4073
`define MASK_SSRDP 32'hFFFF_F07F

`define ENVCFG_LPE_BIT 2
`define ENVCFG_SSE_BIT 3
`define MSECCFG_MLPE_BIT 10
`define MSTATUS_SPELP_BIT 23
`define MSTATUSH_MPELP_BIT 9

Decoder.sv

We defined new instructions, so the decoder has to change: extra ports, memory access type detection, and new signal wires.

    output logic is_lpad_instr, // LPAD instruction flag
// ...
logic is_lpad_internal;
logic is_sspush_internal;
logic is_sspopchk_internal;
logic is_ssrdp_encoding;
logic is_ssrdp_internal;

assign is_lpad_internal = ((instr & `MASK_LPAD) == `MATCH_LPAD);
assign is_sspush_internal = ((instr & `MASK_SSPUSH) == `MATCH_SSPUSH);
assign is_ssrdp_encoding = ((instr & `MASK_SSRDP) == `MATCH_SSRDP);
assign is_ssrdp_internal = is_ssrdp_encoding && (instr[11:7] != 5'd0);
assign is_sspopchk_internal = !is_ssrdp_encoding && ((instr & `MASK_SSPOPCHK) == `MATCH_SSPOPCHK);
assign is_lpad_instr = is_lpad_internal;
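The mask/match scheme is easy to check in software. The C sketch below reuses the constants from defines.svh and follows the same priority as the assigns above; the enum and function names are hypothetical. It makes one detail visible: sspopchk and ssrdp share the same match value and differ only in which register field is left free, so the rs1 = x0, rd = x0 corner has to be treated separately.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    CFI_NONE, CFI_LPAD, CFI_SSPUSH, CFI_SSPOPCHK, CFI_SSRDP, CFI_RESERVED
} cfi_kind_t;

/* Helper for building test words: sspush with the source register in rs2. */
uint32_t encode_sspush(uint32_t rs2) {
    assert(rs2 < 32u);
    return 0xCE004073u | (rs2 << 20);
}

cfi_kind_t classify_cfi(uint32_t instr) {
    if ((instr & 0x00000FFFu) == 0x00000017u) return CFI_LPAD;
    if ((instr & 0xFE0FFFFFu) == 0xCE004073u) return CFI_SSPUSH;
    /* ssrdp encoding: rs1 = x0, rd free. rd = x0 is flagged as reserved,
       matching the decoder's is_ssrdp_internal condition. */
    if ((instr & 0xFFFFF07Fu) == 0xCDC04073u) {
        return ((instr >> 7) & 0x1Fu) ? CFI_SSRDP : CFI_RESERVED;
    }
    /* sspopchk encoding: rd = x0, rs1 free. */
    if ((instr & 0xFFF07FFFu) == 0xCDC04073u) return CFI_SSPOPCHK;
    return CFI_NONE;
}
```

With these masks, 0xCDC0C073 (rs1 = x1) classifies as sspopchk, 0xCDC04573 (rd = x10) as ssrdp, and the bare 0xCDC04073 falls into the reserved case.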

Next, decide whether rs1 and rs2 are used, because sspush encodes its GPR in the rs2 field.

assign rs1_used = is_sspopchk_internal ?
1'b1 : (is_lpad_internal || is_sspush_internal || is_ssrdp_encoding) ? 1'b0 :
~((opcode == OPCODE_LUI) || (opcode == OPCODE_AUIPC) || (opcode == OPCODE_JAL) ||
(opcode == OPCODE_ZERO) || csr_use_imm || (
(opcode == OPCODE_ZICSR) && (funct3 == `FUNCT3_CALL))); // ECALL/MRET/SRET don't use rs1
// SSPUSH consumes the GPR encoded in bits [24:20], i.e. the rs2 slot.
assign rs2_used = is_sspush_internal || (opcode == OPCODE_RTYPE) || (opcode == OPCODE_BTYPE) ||
(opcode == OPCODE_STYPE);

Then adjust the auipc detection, because lpad shares its opcode with auipc.

assign is_auipc = (opcode == OPCODE_AUIPC) && !is_lpad_internal;

Then comes the write-back multiplexer, which now also needs a path for the shadow stack.

// First check whether this is a multiply instruction
if (is_mul_internal) begin
wd_sel = `WD_SEL_FROM_MUL;
end else if (is_ssrdp_internal) begin
wd_sel = `WD_SEL_FROM_SSP;
end else if (is_lpad_internal) begin
wd_sel = 3'b0;
// ...

// First check whether this is a multiply instruction
if (is_mul_internal) begin
rf_we = 1'b0; // The multiplier handles write-back for multiply instructions
end else if (is_ssrdp_internal) begin
rf_we = 1'b1;
end else if (is_lpad_internal || is_sspush_internal || is_sspopchk_internal) begin
rf_we = 1'b0;
// ...
assign dram_we = (opcode == OPCODE_STYPE) || is_sspush_internal;

The memory access type selection also gains the shadow-stack cases:

// Memory access type selection
always_comb begin : sl_selection
if (is_sspush_internal) begin
sl_type = `MEM_SSPUSH;
end else if (is_sspopchk_internal) begin
sl_type = `MEM_SSPOPCHK;
end else begin
unique case (opcode)
OPCODE_LTYPE: begin
case (funct3)
`FUNCT3_LB: sl_type = `MEM_LB;
`FUNCT3_LBU: sl_type = `MEM_LBU;
`FUNCT3_LH: sl_type = `MEM_LH;
`FUNCT3_LHU: sl_type = `MEM_LHU;
`FUNCT3_LW: sl_type = `MEM_LW;
default: sl_type = `MEM_NOP;
endcase
end

OPCODE_STYPE:begin
case (funct3)
`FUNCT3_SB: sl_type = `MEM_SB;
`FUNCT3_SH: sl_type = `MEM_SH;
`FUNCT3_SW: sl_type = `MEM_SW;
default: sl_type = `MEM_NOP;
endcase
end
default: sl_type = `MEM_NOP;
endcase
end
end

And CSR address extraction:

always_comb begin : csr_detection
// Extract the CSR address
csr_addr = instr[31:20];
csr_op = funct3;

if (is_sspush_internal || is_sspopchk_internal || is_ssrdp_encoding) begin
is_csr_instr = 1'b0;
is_ecall = 1'b0;
is_mret = 1'b0;
is_sret = 1'b0;
end else if (opcode == OPCODE_ZICSR) begin
if (funct3 == `FUNCT3_CALL) begin
// ECALL: instr = 0x00000073
// MRET: instr = 0x30200073
// SRET: instr = 0x10200073
is_csr_instr = 1'b0;
is_ecall = (instr[31:7] == 25'b0);
is_mret = (instr[31:7] == 25'b0011000000100000000000000);
is_sret = (instr[31:7] == 25'b0001000000100000000000000);
end else begin
// CSR instructions: CSRRW, CSRRS, CSRRC, CSRRWI, CSRRSI, CSRRCI
is_csr_instr = 1'b1;
is_ecall = 1'b0;
is_mret = 1'b0;
is_sret = 1'b0;
end
end else begin
is_csr_instr = 1'b0;
is_ecall = 1'b0;
is_mret = 1'b0;
is_sret = 1'b0;
end
end

Finally, complete the dispatch for the ZICSR opcode:

OPCODE_ZICSR: begin
// CSR and system instructions
if (is_sspush_internal || is_sspopchk_internal || is_ssrdp_internal) begin
is_illegal_instr = 1'b0;
end else if (is_ssrdp_encoding) begin
is_illegal_instr = 1'b1; // SSRDP with rd=x0 is kept as an illegal encoding
end else if (funct3 == `FUNCT3_CALL) begin
// ECALL: instr = 0x00000073
// MRET: instr = 0x30200073
// SRET: instr = 0x10200073
// EBREAK: instr = 0x00100073 (not supported, marked as illegal)
// WFI: instr = 0x10500073 (not supported, marked as illegal)
// Check ECALL: bits[31:7] = 0
// Check MRET: bits[31:20]=0x302, bits[19:7]=0
// Check SRET: bits[31:20]=0x102, bits[19:7]=0
is_illegal_instr = !((instr == 32'h00000073) || // ECALL
(instr == 32'h30200073) || // MRET
(instr == 32'h10200073)); // SRET
end else begin
// CSR instructions: funct3 001-011, 101-111 are valid
// funct3 = 000 handled above, funct3 = 100 is invalid
is_illegal_instr = (funct3 == 3'b100);
end
end

CSR.sv

There is too much to change here to walk through step by step; the comments explain it.

`include "include/defines.svh"

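// CSR (control and status register) module
// Implements the Zicsr extension for RV32I with minimal working M/S/U privilege support.
// Interrupt control, paging translation, and vectored traps are not implemented; only synchronous exceptions and xRET are supported.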
module CSR (
input logic clk,
input logic rst_n,

// CSR instruction interface
input logic csr_we,
input logic [11:0] csr_addr,
input logic [31:0] csr_wdata,
input logic [ 2:0] csr_op,
output logic [31:0] csr_rdata,

// Exception/trap interface
input logic exception_valid,
input logic [31:0] exception_pc,
input logic [31:0] exception_cause,
input logic [31:0] exception_tval,

// xRET interface
input logic mret_valid,
input logic sret_valid,

// Shadow stack / landing pad state updates
input logic ssp_update_valid,
input logic [31:0] ssp_update_data,
input logic elp_update_valid,
input logic elp_update_expected,

// Trap outputs
output logic trap_to_mmode,
output logic [31:0] trap_target,
output logic [31:0] xret_target,

// Current privilege state outputs
output logic [ 1:0] current_priv_mode,
output logic mstatus_tsr,
output logic mstatus_tvm,
output logic current_sse_enabled,
output logic current_lpe_enabled,
output logic elp_expected,
output logic [31:0] ssp_value
);

// Machine-level CSRs
logic [31:0] mstatus;
logic [31:0] mstatush;
logic [31:0] mtvec;
logic [31:0] mepc;
logic [31:0] mcause;
logic [31:0] mscratch;
logic [31:0] mtval;
logic [31:0] mie;
logic [31:0] mip;
logic [31:0] misa;
logic [31:0] medeleg;
logic [31:0] mideleg;
logic [31:0] menvcfg;
logic [31:0] mseccfg;
logic [31:0] mcounteren;
logic [31:0] mnstatus;
logic [31:0] pmpcfg0;
logic [31:0] pmpaddr0;

// Supervisor-level CSRs
logic [31:0] stvec;
logic [31:0] sepc;
logic [31:0] scause;
logic [31:0] sscratch;
logic [31:0] stval;
logic [31:0] satp;
logic [31:0] senvcfg;
logic [31:0] scounteren;

// Zicfiss/Zicfilp state
logic [31:0] ssp;
logic elp_state;

// Counters
logic [63:0] mcycle;

// Current privilege level
logic [ 1:0] priv_mode;

// mstatus bit definitions
localparam integer SIE_BIT = 1;
localparam integer MIE_BIT = 3;
localparam integer SPIE_BIT = 5;
localparam integer MPIE_BIT = 7;
localparam integer SPP_BIT = 8;
localparam integer MPP_LOW = 11;
localparam integer MPP_HIGH = 12;
localparam integer TVM_BIT = 20;
localparam integer TSR_BIT = 22;
localparam integer SPELP_BIT = `MSTATUS_SPELP_BIT;
localparam integer MPELP_BIT = `MSTATUSH_MPELP_BIT;

localparam logic [31:0] MSTATUS_WRITABLE_MASK = 32'h00F0_19AA;
localparam logic [31:0] MSTATUSH_WRITABLE_MASK = 32'h0000_0200;
localparam logic [31:0] SSTATUS_MASK = 32'h0080_0122;
localparam logic [31:0] MENVCFG_WRITABLE_MASK = 32'h0000_000C;
localparam logic [31:0] SENVCFG_WRITABLE_MASK = 32'h0000_000C;
localparam logic [31:0] MSECCFG_WRITABLE_MASK = 32'h0000_0400;


// 4-byte alignment
function automatic [31:0] align_trap_vector(input logic [31:0] value);
begin
align_trap_vector = {value[31:2], 2'b00};
end
endfunction

function automatic [31:0] align_epc(input logic [31:0] value);
begin
align_epc = {value[31:2], 2'b00};
end
endfunction

// Extract the externally visible part
function automatic [31:0] compose_sstatus(input logic [31:0] mstatus_value);
begin
compose_sstatus = mstatus_value & SSTATUS_MASK;
end
endfunction

function automatic [31:0] compose_senvcfg(input logic [31:0] menvcfg_value,
input logic [31:0] senvcfg_value);
logic [31:0] view_value;
begin
view_value = senvcfg_value & SENVCFG_WRITABLE_MASK;
if (!menvcfg_value[`ENVCFG_SSE_BIT]) begin
view_value[`ENVCFG_SSE_BIT] = 1'b0;
end
compose_senvcfg = view_value;
end
endfunction

// Normalize values written by software
// Keeps software from writing mstatus into an illegal or unimplemented state
function automatic [31:0] sanitize_mstatus(input logic [31:0] new_value);
logic [31:0] sanitized;
begin
sanitized = new_value & MSTATUS_WRITABLE_MASK;
// If MPP is set to the reserved encoding, force it to user mode
if (sanitized[MPP_HIGH:MPP_LOW] == 2'b10) begin
sanitized[MPP_HIGH:MPP_LOW] = `PRV_U;
end
sanitize_mstatus = sanitized;
end
endfunction

function automatic [31:0] sanitize_mstatush(input logic [31:0] new_value);
begin
sanitize_mstatush = new_value & MSTATUSH_WRITABLE_MASK;
end
endfunction

// Keep unimplemented bits from being written
function automatic [31:0] sanitize_menvcfg(input logic [31:0] new_value);
begin
sanitize_menvcfg = new_value & MENVCFG_WRITABLE_MASK;
end
endfunction

// Constrain writes so S-mode cannot bypass the M-mode master switch
function automatic [31:0] sanitize_senvcfg(input logic [31:0] new_value,
input logic [31:0] menvcfg_value);
logic [31:0] sanitized;
begin
sanitized = new_value & SENVCFG_WRITABLE_MASK;
if (!menvcfg_value[`ENVCFG_SSE_BIT]) begin
sanitized[`ENVCFG_SSE_BIT] = 1'b0;
end
sanitize_senvcfg = sanitized;
end
endfunction

// Controls the M-mode landing pad capability
function automatic [31:0] sanitize_mseccfg(input logic [31:0] new_value);
begin
sanitize_mseccfg = new_value & MSECCFG_WRITABLE_MASK;
end
endfunction

// Merge a write to sstatus into mstatus
function automatic [31:0] update_sstatus_view(input logic [31:0] old_mstatus,
input logic [31:0] new_sstatus);
logic [31:0] merged;
begin
merged = old_mstatus;
merged[SIE_BIT] = new_sstatus[SIE_BIT];
merged[SPIE_BIT] = new_sstatus[SPIE_BIT];
merged[SPP_BIT] = new_sstatus[SPP_BIT];
merged[SPELP_BIT] = new_sstatus[SPELP_BIT];
update_sstatus_view = merged;
end
endfunction

// Whether the shadow stack is active at the current privilege level
function automatic logic shadow_stack_enabled(
input logic [1:0] priv, input logic [31:0] menvcfg_value, input logic [31:0] senvcfg_value);
begin
unique case (priv)
`PRV_S: shadow_stack_enabled = menvcfg_value[`ENVCFG_SSE_BIT];
`PRV_U:
shadow_stack_enabled = menvcfg_value[`ENVCFG_SSE_BIT] &&
senvcfg_value[`ENVCFG_SSE_BIT];
default: shadow_stack_enabled = 1'b0;
endcase
end
endfunction

// Whether landing pads are active at the current privilege level
function automatic logic landing_pad_enabled(
input logic [1:0] priv, input logic [31:0] menvcfg_value, input logic [31:0] senvcfg_value,
input logic [31:0] mseccfg_value);
begin
unique case (priv)
`PRV_M: landing_pad_enabled = mseccfg_value[`MSECCFG_MLPE_BIT];
`PRV_S: landing_pad_enabled = menvcfg_value[`ENVCFG_LPE_BIT];
`PRV_U: landing_pad_enabled = senvcfg_value[`ENVCFG_LPE_BIT];
default: landing_pad_enabled = 1'b0;
endcase
end
endfunction

// Whether the current exception/interrupt should be delegated to S-mode
function automatic logic take_delegated_trap(input logic [1:0] priv, input logic [31:0] cause,
input logic [31:0] medeleg_value,
input logic [31:0] mideleg_value);
begin
if (priv == `PRV_M) begin
take_delegated_trap = 1'b0;
end else if (cause[31]) begin
take_delegated_trap = mideleg_value[cause[4:0]];
end else begin
take_delegated_trap = medeleg_value[cause[4:0]];
end
end
endfunction

logic [31:0] senvcfg_view;
assign senvcfg_view = compose_senvcfg(menvcfg, senvcfg);

// CSR read logic
always_comb begin
case (csr_addr)
`CSR_SSP: csr_rdata = ssp;
`CSR_SSTATUS: csr_rdata = compose_sstatus(mstatus);
`CSR_SIE: csr_rdata = mie & mideleg;
`CSR_STVEC: csr_rdata = stvec;
`CSR_SCOUNTEREN: csr_rdata = scounteren;
`CSR_SENVCFG: csr_rdata = senvcfg_view;
`CSR_SSCRATCH: csr_rdata = sscratch;
`CSR_SEPC: csr_rdata = sepc;
`CSR_SCAUSE: csr_rdata = scause;
`CSR_STVAL: csr_rdata = stval;
`CSR_SIP: csr_rdata = mip & mideleg;
`CSR_SATP: csr_rdata = satp;
`CSR_MSTATUS: csr_rdata = mstatus;
`CSR_MISA: csr_rdata = misa;
`CSR_MVENDORID: csr_rdata = 32'b0;
`CSR_MARCHID: csr_rdata = 32'b0;
`CSR_MIMPID: csr_rdata = 32'b0;
`CSR_MHARTID: csr_rdata = 32'b0;
`CSR_MEDELEG: csr_rdata = medeleg;
`CSR_MIDELEG: csr_rdata = mideleg;
`CSR_MIE: csr_rdata = mie;
`CSR_MNSTATUS: csr_rdata = mnstatus;
`CSR_MTVEC: csr_rdata = mtvec;
`CSR_MSTATUSH: csr_rdata = mstatush;
`CSR_MENVCFG: csr_rdata = menvcfg;
`CSR_MCOUNTEREN: csr_rdata = mcounteren;
`CSR_MSCRATCH: csr_rdata = mscratch;
`CSR_MEPC: csr_rdata = mepc;
`CSR_MCAUSE: csr_rdata = mcause;
`CSR_MTVAL: csr_rdata = mtval;
`CSR_MIP: csr_rdata = mip;
`CSR_MSECCFG: csr_rdata = mseccfg;
`CSR_PMPCFG0: csr_rdata = pmpcfg0;
`CSR_PMPADDR0: csr_rdata = pmpaddr0;
`CSR_MCYCLE, `CSR_CYCLE: csr_rdata = mcycle[31:0];
`CSR_MCYCLEH, `CSR_CYCLEH: csr_rdata = mcycle[63:32];
// Minimal Zicntr support: expose instret as a legal read-only counter view.
`CSR_INSTRET: csr_rdata = mcycle[31:0];
`CSR_INSTRETH: csr_rdata = mcycle[63:32];
default: csr_rdata = 32'b0;
endcase
end

// Compute the new CSR value from the operation type
logic [31:0] csr_new_value;
always_comb begin
case (csr_op)
`FUNCT3_CSRRW, `FUNCT3_CSRRWI: csr_new_value = csr_wdata;
`FUNCT3_CSRRS, `FUNCT3_CSRRSI: csr_new_value = csr_rdata | csr_wdata;
`FUNCT3_CSRRC, `FUNCT3_CSRRCI: csr_new_value = csr_rdata & (~csr_wdata);
default: csr_new_value = csr_rdata;
endcase
end

logic delegated_exception;
assign delegated_exception = exception_valid && take_delegated_trap(
priv_mode, exception_cause, medeleg, mideleg
);

logic [1:0] mret_target_priv;
logic [1:0] sret_target_priv;
assign mret_target_priv = mstatus[MPP_HIGH:MPP_LOW];
assign sret_target_priv = mstatus[SPP_BIT] ? `PRV_S : `PRV_U;

// CSR writes, exception entry, and xRET restore
always_ff @(posedge clk or negedge rst_n) begin
if (!rst_n) begin
mstatus <= 32'h0000_1800;
mstatush <= 32'h0000_0000;
mtvec <= 32'h0114_5140;
mepc <= 32'h0000_0000;
mcause <= 32'h0000_0000;
mscratch <= 32'h0000_0000;
mtval <= 32'h0000_0000;
mie <= 32'h0000_0000;
mip <= 32'h0000_0000;
misa <= 32'h4014_0100; // RV32I + S + U
medeleg <= 32'h0000_0000;
mideleg <= 32'h0000_0000;
menvcfg <= 32'h0000_0000;
mseccfg <= 32'h0000_0000;
mcounteren <= 32'h0000_0000;
mnstatus <= 32'h0000_0000;
pmpcfg0 <= 32'h0000_0000;
pmpaddr0 <= 32'h0000_0000;
stvec <= 32'h0114_5140;
scounteren <= 32'h0000_0000;
senvcfg <= 32'h0000_0000;
sepc <= 32'h0000_0000;
scause <= 32'h0000_0000;
sscratch <= 32'h0000_0000;
stval <= 32'h0000_0000;
satp <= 32'h0000_0000;
ssp <= 32'h0000_0000;
elp_state <= 1'b0;
priv_mode <= `PRV_M;
end else if (exception_valid) begin
if (delegated_exception) begin
sepc <= align_epc(exception_pc);
scause <= exception_cause;
stval <= exception_tval;
mstatus[SPIE_BIT] <= mstatus[SIE_BIT];
mstatus[SIE_BIT] <= 1'b0;
mstatus[SPP_BIT] <= (priv_mode == `PRV_S);
mstatus[SPELP_BIT] <= elp_state;
elp_state <= 1'b0;
priv_mode <= `PRV_S;
end else begin
mepc <= align_epc(exception_pc);
mcause <= exception_cause;
mtval <= exception_tval;
mstatus[MPIE_BIT] <= mstatus[MIE_BIT];
mstatus[MIE_BIT] <= 1'b0;
mstatus[MPP_HIGH:MPP_LOW] <= priv_mode;
mstatush[MPELP_BIT] <= elp_state;
elp_state <= 1'b0;
priv_mode <= `PRV_M;
end
end else if (mret_valid) begin
priv_mode <= mret_target_priv;
mstatus[MIE_BIT] <= mstatus[MPIE_BIT];
mstatus[MPIE_BIT] <= 1'b1;
mstatus[MPP_HIGH:MPP_LOW] <= `PRV_U;
elp_state <= landing_pad_enabled(
mret_target_priv, menvcfg, senvcfg, mseccfg
) ? mstatush[MPELP_BIT] : 1'b0;
mstatush[MPELP_BIT] <= 1'b0;
end else if (sret_valid) begin
priv_mode <= sret_target_priv;
mstatus[SIE_BIT] <= mstatus[SPIE_BIT];
mstatus[SPIE_BIT] <= 1'b1;
mstatus[SPP_BIT] <= 1'b0;
elp_state <= landing_pad_enabled(
sret_target_priv, menvcfg, senvcfg, mseccfg
) ? mstatus[SPELP_BIT] : 1'b0;
mstatus[SPELP_BIT] <= 1'b0;
end else if (ssp_update_valid) begin
ssp <= ssp_update_data;
end else if (elp_update_valid) begin
elp_state <= elp_update_expected;
end else if (csr_we) begin
case (csr_addr)
`CSR_SSP: ssp <= csr_new_value;
`CSR_SSTATUS: mstatus <= update_sstatus_view(mstatus, csr_new_value);
`CSR_SIE: mie <= (mie & ~mideleg) | (csr_new_value & mideleg);
`CSR_STVEC: stvec <= align_trap_vector(csr_new_value);
`CSR_SCOUNTEREN: scounteren <= csr_new_value;
`CSR_SENVCFG: senvcfg <= sanitize_senvcfg(csr_new_value, menvcfg);
`CSR_SSCRATCH: sscratch <= csr_new_value;
`CSR_SEPC: sepc <= align_epc(csr_new_value);
`CSR_SCAUSE: scause <= csr_new_value;
`CSR_STVAL: stval <= csr_new_value;
`CSR_SIP: mip <= (mip & ~mideleg) | (csr_new_value & mideleg);
`CSR_SATP: satp <= csr_new_value;
`CSR_MSTATUS: mstatus <= sanitize_mstatus(csr_new_value);
`CSR_MEDELEG: medeleg <= csr_new_value;
`CSR_MIDELEG: mideleg <= csr_new_value;
`CSR_MIE: mie <= csr_new_value;
`CSR_MNSTATUS: mnstatus <= csr_new_value;
`CSR_MTVEC: mtvec <= align_trap_vector(csr_new_value);
`CSR_MSTATUSH: mstatush <= sanitize_mstatush(csr_new_value);
`CSR_MENVCFG: menvcfg <= sanitize_menvcfg(csr_new_value);
`CSR_MCOUNTEREN: mcounteren <= csr_new_value;
`CSR_MSCRATCH: mscratch <= csr_new_value;
`CSR_MEPC: mepc <= align_epc(csr_new_value);
`CSR_MCAUSE: mcause <= csr_new_value;
`CSR_MTVAL: mtval <= csr_new_value;
`CSR_MIP: mip <= csr_new_value;
`CSR_MSECCFG: mseccfg <= sanitize_mseccfg(csr_new_value);
`CSR_PMPCFG0: pmpcfg0 <= csr_new_value;
`CSR_PMPADDR0: pmpaddr0 <= csr_new_value;
default: ;
endcase
end
end

// mcycle counter
always_ff @(posedge clk or negedge rst_n) begin
if (!rst_n) begin
mcycle <= 64'h0;
end else if (csr_we && csr_addr == `CSR_MCYCLE) begin
mcycle[31:0] <= csr_new_value;
end else if (csr_we && csr_addr == `CSR_MCYCLEH) begin
mcycle[63:32] <= csr_new_value;
end else begin
mcycle <= mcycle + 64'h1;
end
end

assign trap_to_mmode = exception_valid && !delegated_exception;
assign trap_target = delegated_exception ? align_trap_vector(stvec) : align_trap_vector(mtvec);
assign xret_target = mret_valid ? mepc : sepc;

assign current_priv_mode = priv_mode;
assign mstatus_tsr = mstatus[TSR_BIT];
assign mstatus_tvm = mstatus[TVM_BIT];
assign current_sse_enabled = shadow_stack_enabled(priv_mode, menvcfg, senvcfg_view);
assign current_lpe_enabled = landing_pad_enabled(priv_mode, menvcfg, senvcfg_view, mseccfg);
assign elp_expected = elp_state;
assign ssp_value = ssp;

endmodule
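The per-privilege enable rules in shadow_stack_enabled and landing_pad_enabled are the part of this module most worth double-checking, so here is a minimal C restatement of them. It mirrors the SystemVerilog functions above using the bit positions from defines.svh; the macro and enum names are chosen for this sketch, and senvcfg is assumed to be passed as the already-masked view (SSE reads as zero when menvcfg.SSE is clear).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { PRV_U = 0, PRV_S = 1, PRV_M = 3 };

#define ENVCFG_LPE   (1u << 2)   /* menvcfg/senvcfg landing pad enable */
#define ENVCFG_SSE   (1u << 3)   /* menvcfg/senvcfg shadow stack enable */
#define MSECCFG_MLPE (1u << 10)  /* mseccfg M-mode landing pad enable */

/* Landing pads: each mode is controlled by the CSR of the mode above it,
   and M-mode by mseccfg.MLPE. */
bool landing_pad_enabled(int priv, uint32_t menvcfg, uint32_t senvcfg,
                         uint32_t mseccfg) {
    assert(priv == PRV_U || priv == PRV_S || priv == PRV_M);
    switch (priv) {
    case PRV_M: return (mseccfg & MSECCFG_MLPE) != 0u;
    case PRV_S: return (menvcfg & ENVCFG_LPE) != 0u;
    default:    return (senvcfg & ENVCFG_LPE) != 0u;
    }
}

/* Shadow stack: never active in M-mode here; U-mode needs both switches. */
bool shadow_stack_enabled(int priv, uint32_t menvcfg, uint32_t senvcfg) {
    assert(priv == PRV_U || priv == PRV_S || priv == PRV_M);
    switch (priv) {
    case PRV_S: return (menvcfg & ENVCFG_SSE) != 0u;
    case PRV_U: return (menvcfg & ENVCFG_SSE) != 0u && (senvcfg & ENVCFG_SSE) != 0u;
    default:    return false;
    }
}
```

A model like this can serve as a reference when writing testbench checks for current_sse_enabled and current_lpe_enabled.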

StoreUnit.sv and LoadUnit.sv

The shadow stack also needs to be read and written, so the memory access modules have to change:

if (dram_we) begin
if (sl_type == `MEM_SSPUSH) begin
wstrb = 4'b1111;
store_data_o = store_data_i;
end else begin
unique case (sl_type[1:0])
// ...

And on the load side:

always_comb begin
logic [31:0] raw;
raw = 32'b0;
load_data_o = 32'b0;

if (sl_type == `MEM_SSPOPCHK) begin
load_data_o = load_data_i;
end else begin
// Extract data according to sl_type and the addr offset
case (sl_type[1:0])
2'b01: begin // byte
raw = (load_data_i >> (addr[1:0] * 8)) & 32'h000000FF;
end
2'b10: begin // half
raw = (load_data_i >> (addr[1] * 16)) & 32'h0000FFFF;
end
2'b11: begin // word
raw = load_data_i;
end
default: raw = 32'b0;
endcase

// Sign extension or zero extension
if (is_load_unsigned) begin
load_data_o = raw; // zero extension
end else begin
case (sl_type[1:0])
2'b01: load_data_o = {{24{raw[7]}}, raw[7:0]}; // LB
2'b10: load_data_o = {{16{raw[15]}}, raw[15:0]}; // LH
2'b11: load_data_o = raw; // LW
default: load_data_o = 32'b0;
endcase
end
end
end
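The explicit MEM_SSPOPCHK branch matters because the low two bits of that code (4'b11_01) are 01, which the regular path would read as a byte access. The regular path itself can be sketched in C as below; this is an illustrative model with a hypothetical function name, where size_code stands for sl_type[1:0] (1 = byte, 2 = half, 3 = word) and the sign extension uses the xor-and-subtract idiom instead of bit replication.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Pick the addressed byte/half out of a 32-bit memory word and extend it. */
uint32_t extract_load(uint32_t word, uint32_t addr, unsigned size_code,
                      bool is_unsigned) {
    assert(size_code <= 3u);
    uint32_t raw;
    switch (size_code) {
    case 1u: raw = (word >> ((addr & 3u) * 8u)) & 0xFFu; break;          /* byte */
    case 2u: raw = (word >> (((addr >> 1) & 1u) * 16u)) & 0xFFFFu; break; /* half */
    case 3u: return word;                                                 /* word */
    default: return 0u;
    }
    if (is_unsigned) return raw;                  /* zero extension */
    uint32_t sign = (size_code == 1u) ? 0x80u : 0x8000u;
    return (raw ^ sign) - sign;                   /* sign extension */
}
```

For the word 0x80FF7F01, a signed byte load at offset 3 yields 0xFFFFFF80 and a signed half load at offset 2 yields 0xFFFF80FF, while the unsigned variants yield 0x80 and 0x80FF.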

RegisterF.sv

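In Zicfilp, register x7 holds the label expected by the call site, in x7[31:12]. It therefore has to be exported from the register file so it can be compared later. This may introduce synthesis issues.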

`include "include/defines.svh"

module RegisterF (
input logic clk,
// input logic rst_n,
input logic rf_we,
// Read address ports
input logic [ 4:0] rR1,
input logic [ 4:0] rR2,
// Write address port 1 (main pipeline)
input logic [ 4:0] wR,
// Write data port 1 (main pipeline)
input logic [31:0] wD,
// Write port 2 (multiplier)
input logic rf_we2,
input logic [ 4:0] wR2,
input logic [31:0] wD2,
// Read data ports
output logic [31:0] rD1,
output logic [31:0] rD2,
output logic [31:0] x7_o
);

logic [31:0] rf_in[32]; // 32 registers; the unpacked dimension uses [32]

// Initialization: x0 = 0
// Helps Yosys recognize it during synthesis
initial begin
rf_in[0] = '0; // initial x0 = 0
end

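// Writes use sequential logic with two write ports
// When both ports write the same register in the same cycle, the main pipeline port wins
// Multiply is a long-latency instruction, so a short instruction writing back at the same time is later in program order and its result is newer
// The newer register value should be kept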
always_ff @(posedge clk) begin
// Main pipeline write port
if (rf_we && wR != 5'd0) begin
rf_in[wR] <= wD;
end
// Multiplier write port (lower priority)
if (rf_we2 && wR2 != 5'd0 && !(rf_we && wR == wR2)) begin
rf_in[wR2] <= wD2;
end
end

// Reads use combinational logic
always_comb begin
rD1 = (rR1 == 0) ? {32{1'b0}} : rf_in[rR1]; // x0 always reads as 0
rD2 = (rR2 == 0) ? {32{1'b0}} : rf_in[rR2];
x7_o = rf_in[7];
end

endmodule

HazardUnit.sv

A check for the shadow stack has to be added here as well:

input  logic             shadow_serialize_EX,
input logic shadow_serialize_MEM,
input logic shadow_serialize_WB,
// ...
logic shadow_serialize_hazard;
assign shadow_serialize_hazard = shadow_serialize_EX || shadow_serialize_MEM || shadow_serialize_WB;
assign any_hazard = load_use_hazard || mul_use_hazard || mul_struct_hazard || mul_waw_hazard ||
shadow_serialize_hazard;

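Build and test

The toolchain has to support rv32i_zicsr_zmmul_zicfilp_zicfiss. I built an RV64 one myself, which also works for RV32; it is available at https://github.com/Zxis233/riscv-gnu-toolchain/releases/latest, with architecture rv64gc_zicfilp_zicfiss.

In core_portme.mak, change the compiler options by adding: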

ARCH_FLAGS  = -march=rv32i_zicsr_zmmul_zicfilp_zicfiss -mabi=ilp32
BAREMETAL_FLAGS = -ffreestanding -fno-builtin -nostdlib -nostartfiles
CFI_FLAGS = \
-fcf-protection=full -fno-inline -fno-optimize-sibling-calls

CFLAGS = $(PORT_CFLAGS) $(ARCH_FLAGS) $(BAREMETAL_FLAGS) $(CFI_FLAGS)\
-I$(PORT_DIR) -I. -DFLAGS_STR=\"$(FLAGS_STR)\"

After a clean rebuild, the disassembly shows lpad and the shadow-stack instructions.

Run it on the CPU:

_end=0x000030c0 stack_top=0x0000ff00 sp=0x0000f6a0
Now Time: 0x00000000000004d7
2K performance run parameters for coremark.
CoreMark Size : 666
Total ticks : 5184678
Total time (secs): 10
Iterations/Sec : 1
Iterations : 10
Compiler version : GCC15.2.0
Compiler flags : -O2 -g -DPERFORMANCE_RUN=1
Memory location : STACK
seedcrc : 0xe9f5
[0]crclist : 0xe714
[0]crcmatrix : 0x1fd7
[0]crcstate : 0x8e3a
[0]crcfinal : 0xfcaf
Correct operation validated. See README.md for run and reporting rules.
Now Time: 0x00000000004fb6cc
52248755000| [PASS] | Finished

Comparison with and without Zicfi:

                 No Zicfi      Zicfi         Loss
Total ticks      4598265       5184678       12.8%
Simulation time  46317325000   52248755000

This performance cost is acceptable.
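For reference, the loss figure is the relative increase over the run without Zicfi. A small C helper (the function name is made up for this sketch) reproduces it from the tick counts reported above; applying the same formula to the two simulation times gives roughly the same ratio.

```c
#include <assert.h>
#include <stdint.h>

/* Relative overhead in percent: (with_cfi - baseline) / baseline * 100. */
double overhead_percent(uint64_t baseline, uint64_t with_cfi) {
    assert(baseline > 0u);
    return 100.0 * ((double)with_cfi - (double)baseline) / (double)baseline;
}
```

Calling overhead_percent(4598265, 5184678) gives a value between 12.7 and 12.8, which rounds to the 12.8% quoted for total ticks.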