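Notes on the RV32I instruction set and on the code of a multi-cycle RV32I CPU; they can also serve as an RV32I quick reference.
Instruction formats
The RISC-V instruction set is modular. RV32I is the base 32-bit integer ISA: it has only 47 instructions, yet that is enough to serve as a compiler target and to give a modern operating system the basic support it needs.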
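Mnemonic | Description |
---|---|
I | Base integer |
M | Integer multiplication/division |
A | Atomic operations |
F | Single-precision floating point |
D | Double-precision floating point |
C | Compressed instructions |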
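RV32I has 32 integer registers, x0 to x31, each 32 bits wide. x0 is hardwired to the constant 0; it can also be used as a destination register to discard the result of an instruction. The other 31 registers, x1 to x31, are general-purpose. The PC is a separate register: it is not one of x0 to x31 and cannot be addressed as a general-purpose register. By convention, the 31 general-purpose registers have fixed roles (for example, x10 usually holds the first function argument and the function return value); this convention is part of the ABI.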
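The x0 convention can be sketched in a few lines of Verilog. This is only a minimal illustration, assuming a 32-word register bank with one read port; the module name `regfile_sketch` and its port list are hypothetical, and only `RegisterBank`, `rdId` and `rs1Id` are names used elsewhere in these notes.

```verilog
// Hypothetical register-bank sketch: module name and ports are illustrative.
module regfile_sketch(
   input  wire        clk,
   input  wire        we,      // write enable
   input  wire [4:0]  rdId,    // destination register index
   input  wire [31:0] wdata,   // value to write
   input  wire [4:0]  rs1Id,   // source register index
   output wire [31:0] rs1      // value read (combinational)
);
   reg [31:0] RegisterBank [0:31];
   integer i;

   // Start with every register at 0 (simulation / FPGA initial value).
   initial begin
      for (i = 0; i < 32; i = i + 1) RegisterBank[i] = 32'b0;
   end

   // Writes to x0 are dropped, so x0 keeps its initial value 0 and an
   // instruction that names x0 as destination simply discards its result.
   always @(posedge clk) begin
      if (we && rdId != 5'd0) RegisterBank[rdId] <= wdata;
   end

   assign rs1 = RegisterBank[rs1Id];
endmodule
```

Guarding the write rather than the read keeps the read path a plain memory lookup, which is the same choice the state machine at the end of these notes makes with `rdId != 0`.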
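RV32I instructions are 32 bits long ([31:0]) and are stored little-endian. The lowest bits [6:0] (the opcode) give the instruction class, which is one of the following 11: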
instruction | description | algo |
---|---|---|
branch | conditional jump, 6 variants | if(reg OP reg) PC<-PC+imm |
ALU reg | Three-registers ALU ops, 10 variants | reg <- reg OP reg |
ALU imm | Two-registers ALU ops, 9 variants | reg <- reg OP imm |
load | Memory-to-register, 5 variants | reg <- mem[reg + imm] |
store | Register-to-memory, 3 variants | mem[reg+imm] <- reg |
LUI | load upper immediate | reg <- (imm << 12) |
AUIPC | add upper immediate to PC | reg <- PC+(imm << 12) |
JAL | jump and link | reg <- PC+4 ; PC <- PC+imm |
JALR | jump and link register | reg <- PC+4 ; PC <- reg+imm |
FENCE | memory-ordering for multicores | |
SYSTEM | system calls, breakpoints | |
The instruction class is determined by testing the opcode:
// See the table P. 105 in RISC-V manual
// The 10 instruction classes decoded here (FENCE is not decoded)
wire isALUreg = (instr[6:0] == 7'b0110011); // rd <- rs1 OP rs2
wire isALUimm = (instr[6:0] == 7'b0010011); // rd <- rs1 OP Iimm
wire isBranch = (instr[6:0] == 7'b1100011); // if(rs1 OP rs2) PC<-PC+Bimm
wire isJALR = (instr[6:0] == 7'b1100111); // rd <- PC+4; PC<-rs1+Iimm
wire isJAL = (instr[6:0] == 7'b1101111); // rd <- PC+4; PC<-PC+Jimm
wire isAUIPC = (instr[6:0] == 7'b0010111); // rd <- PC + Uimm
wire isLUI = (instr[6:0] == 7'b0110111); // rd <- Uimm
wire isLoad = (instr[6:0] == 7'b0000011); // rd <- mem[rs1+Iimm]
wire isStore = (instr[6:0] == 7'b0100011); // mem[rs1+Simm] <- rs2
wire isSYSTEM = (instr[6:0] == 7'b1110011); // special
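To make the opcode test concrete, here is a small simulation-only sketch that feeds three hand-encoded instructions through three of the comparisons above. The testbench module name is hypothetical, and the encodings were worked out by hand from the field layout described in these notes.

```verilog
// Illustrative testbench; the module name is hypothetical.
module opcode_decode_tb;
   reg [31:0] instr;

   // Same comparisons as in the notes, for three of the classes.
   wire isALUreg = (instr[6:0] == 7'b0110011);
   wire isALUimm = (instr[6:0] == 7'b0010011);
   wire isLoad   = (instr[6:0] == 7'b0000011);

   initial begin
      instr = 32'h00500093;  // addi x1, x0, 5
      #1 $display("addi: ALUreg=%b ALUimm=%b Load=%b", isALUreg, isALUimm, isLoad);
      instr = 32'h002081B3;  // add x3, x1, x2
      #1 $display("add : ALUreg=%b ALUimm=%b Load=%b", isALUreg, isALUimm, isLoad);
      instr = 32'h0000A103;  // lw x2, 0(x1)
      #1 $display("lw  : ALUreg=%b ALUimm=%b Load=%b", isALUreg, isALUimm, isLoad);
      $finish;
   end
endmodule
```

Each instruction should raise exactly one of the three flags (ALUimm for addi, ALUreg for add, Load for lw), since the opcode values are mutually exclusive.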
Instructions use six encoding formats (R, I, S, B, U and J). The positions of the common fields in each format are:
type | opcode | rd | funct3 | rs1 | rs2 | funct7 |
---|---|---|---|---|---|---|
R | [6:0] | [11:7] | [14:12] | [19:15] | [24:20] | [31:25] |
I | [6:0] | [11:7] | [14:12] | [19:15] | | |
S | [6:0] | | [14:12] | [19:15] | [24:20] | |
B | [6:0] | | [14:12] | [19:15] | [24:20] | |
U | [6:0] | [11:7] | | | | |
J | [6:0] | [11:7] | | | | |
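The table shows that an instruction has at most two source registers, rs1 and rs2, and at most one destination register, rd. Although the immediate is laid out differently in each format, the register fields and most other fields sit at the same bit positions in every format that has them. The immediates, function codes and register indices can be extracted as in the following Verilog: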
// The 5 immediate formats
wire [31:0] Uimm={ instr[31], instr[30:12], {12{1'b0}}};
wire [31:0] Iimm={{21{instr[31]}}, instr[30:20]};
wire [31:0] Simm={{21{instr[31]}}, instr[30:25],instr[11:7]};
wire [31:0] Bimm={{20{instr[31]}}, instr[7],instr[30:25],instr[11:8],1'b0};
wire [31:0] Jimm={{12{instr[31]}}, instr[19:12],instr[20],instr[30:21],1'b0};
// Source and destination registers
wire [4:0] rs1Id = instr[19:15];
wire [4:0] rs2Id = instr[24:20];
wire [4:0] rdId = instr[11:7];
// function codes
wire [2:0] funct3 = instr[14:12];
wire [6:0] funct7 = instr[31:25];
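As a worked example of the immediate extraction, the sketch below decodes two hand-encoded instructions with negative immediates. It is a simulation-only illustration; the testbench module name is hypothetical, while the `Iimm`, `Bimm`, `rs1Id` and `rdId` expressions are copied from the code above.

```verilog
// Illustrative testbench; the module name is hypothetical.
module imm_decode_tb;
   reg [31:0] instr;

   // Expressions copied from the notes.
   wire [31:0] Iimm = {{21{instr[31]}}, instr[30:20]};
   wire [31:0] Bimm = {{20{instr[31]}}, instr[7], instr[30:25], instr[11:8], 1'b0};
   wire [4:0]  rs1Id = instr[19:15];
   wire [4:0]  rdId  = instr[11:7];

   initial begin
      instr = 32'hFFF00093;  // addi x1, x0, -1
      #1 $display("Iimm=%h rdId=%0d rs1Id=%0d", Iimm, rdId, rs1Id);
      instr = 32'hFE000EE3;  // beq x0, x0, -4
      #1 $display("Bimm=%h", Bimm);
      $finish;
   end
endmodule
```

For the first instruction `Iimm` is ffffffff (that is, -1) with rdId 1 and rs1Id 0; for the second `Bimm` is fffffffc (that is, -4). In both cases instr[31] is the sign bit that gets replicated into the upper bits, and the B-type immediate always has bit 0 forced to 0.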
ALU
An arithmetic logic unit (ALU) is a combinational digital circuit that performs arithmetic and bitwise operations on integer binary numbers.
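The ALU handles the instructions below (the table on p. 130 of the RISC-V reference manual lists all of them clearly, grouped by instruction-set extension). Note that immediates in RISC-V are sign-extended, and signed integers use two's complement: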
instr | opcode | funct3 | funct7 | algo |
---|---|---|---|---|
ADD | 7'b0110011 | 3'b000 | 7'b0000000 | [rd] <- [rs1] + [rs2] |
ADDI | 7'b0010011 | 3'b000 | | [rd] <- [rs1] + Iimm |
SUB | 7'b0110011 | 3'b000 | 7'b0100000 | [rd] <- [rs1] - [rs2] |
SLL | 7'b0110011 | 3'b001 | 7'b0000000 | [rd] <- [rs1] << ([rs2] & 0x1F), logical |
SLLI | 7'b0010011 | 3'b001 | 7'b0000000 | [rd] <- [rs1] << shamt[4:0], logical |
SRL | 7'b0110011 | 3'b101 | 7'b0000000 | [rd] <- [rs1] >> ([rs2] & 0x1F), logical |
SRLI | 7'b0010011 | 3'b101 | 7'b0000000 | [rd] <- [rs1] >> shamt[4:0], logical |
SRA | 7'b0110011 | 3'b101 | 7'b0100000 | [rd] <- [rs1] >> ([rs2] & 0x1F), arithmetic |
SRAI | 7'b0010011 | 3'b101 | 7'b0100000 | [rd] <- [rs1] >> shamt[4:0], arithmetic |
SLT | 7'b0110011 | 3'b010 | 7'b0000000 | [rd] <- ([rs1] < [rs2] ? 1 : 0), signed |
SLTU | 7'b0110011 | 3'b011 | 7'b0000000 | [rd] <- ([rs1] < [rs2] ? 1 : 0), unsigned |
SLTI | 7'b0010011 | 3'b010 | | [rd] <- ([rs1] < Iimm ? 1 : 0), signed |
SLTIU | 7'b0010011 | 3'b011 | | [rd] <- ([rs1] < Iimm ? 1 : 0), unsigned |
AND | 7'b0110011 | 3'b111 | 7'b0000000 | [rd] <- [rs1] & [rs2] |
ANDI | 7'b0010011 | 3'b111 | | [rd] <- [rs1] & Iimm |
OR | 7'b0110011 | 3'b110 | 7'b0000000 | [rd] <- [rs1] \| [rs2] |
ORI | 7'b0010011 | 3'b110 | | [rd] <- [rs1] \| Iimm |
XOR | 7'b0110011 | 3'b100 | 7'b0000000 | [rd] <- [rs1] ^ [rs2] |
XORI | 7'b0010011 | 3'b100 | | [rd] <- [rs1] ^ Iimm |
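With this table, decoding amounts to checking opcode, funct3 and funct7 to identify the instruction and then performing the selected operation.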
// The ALU
wire [31:0] aluIn1 = rs1;
wire [31:0] aluIn2 = isALUreg | isBranch ? rs2 : Iimm;
reg [31:0] aluOut;
wire [4:0] shamt = isALUreg ? rs2[4:0] : instr[24:20]; // shift amount
wire [32:0] aluMinus = {1'b1, ~aluIn2} + {1'b0,aluIn1} + 33'b1;
wire [31:0] aluPlus = aluIn1 + aluIn2;
wire EQ = (aluMinus[31:0] == 0);
wire LTU = aluMinus[32];
wire LT = (aluIn1[31] ^ aluIn2[31]) ? aluIn1[31] : aluMinus[32];
// ADD/SUB/ADDI:
// funct7[5] is 1 for SUB and 0 for ADD. We need also to test instr[5]
// to make the difference with ADDI
//
// SRLI/SRAI/SRL/SRA:
// funct7[5] is 1 for arithmetic shift (SRA/SRAI) and
// 0 for logical shift (SRL/SRLI)
always @(*) begin
case(funct3)
// SUB <- funct7[5]==1 && instr[5]==1 (ALUimm has no SUB)
3'b000: aluOut = (funct7[5] & instr[5]) ? aluMinus[31:0] : aluPlus;
// left shift
3'b001: aluOut = aluIn1 << shamt;
// signed comparison (<)
3'b010: aluOut = {31'b0, LT};
// unsigned comparison (<)
3'b011: aluOut = {31'b0, LTU};
// XOR
3'b100: aluOut = (aluIn1 ^ aluIn2);
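// logical right shift (funct7[5]=0) or arithmetic right shift (funct7[5]=1);
// both branches keep $signed so the ?: result stays signed and >>> shifts in the sign bit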
3'b101: aluOut = funct7[5]? ($signed(aluIn1) >>> shamt) :
($signed(aluIn1) >> shamt);
// OR
3'b110: aluOut = (aluIn1 | aluIn2);
// AND
3'b111: aluOut = (aluIn1 & aluIn2);
endcase
end
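The 33-bit subtraction used for the comparisons is worth a worked example. `{1'b1, ~aluIn2} + 33'b1` is the 33-bit two's complement of the zero-extended `aluIn2`, so `aluMinus` is `aluIn1 - aluIn2` computed on 33 bits and its top bit is set exactly when the unsigned subtraction borrows. The sketch below is a simulation-only illustration with a hypothetical testbench module name; the three expressions are copied from the ALU code above.

```verilog
// Illustrative testbench; the module name is hypothetical.
module alu_compare_tb;
   reg [31:0] aluIn1, aluIn2;

   // Expressions copied from the notes.
   wire [32:0] aluMinus = {1'b1, ~aluIn2} + {1'b0, aluIn1} + 33'b1;
   wire LTU = aluMinus[32];
   wire LT  = (aluIn1[31] ^ aluIn2[31]) ? aluIn1[31] : aluMinus[32];

   initial begin
      // 0xFFFFFFFF is -1 when signed, 4294967295 when unsigned.
      aluIn1 = 32'hFFFFFFFF; aluIn2 = 32'h00000001;
      #1 $display("diff=%h LT=%b LTU=%b", aluMinus[31:0], LT, LTU);
      // Same sign: signed and unsigned comparison agree.
      aluIn1 = 32'h00000001; aluIn2 = 32'h00000002;
      #1 $display("diff=%h LT=%b LTU=%b", aluMinus[31:0], LT, LTU);
      $finish;
   end
endmodule
```

In the first case the difference is fffffffe with LT=1 and LTU=0: the operands have different signs, so LT is taken from the sign of `aluIn1`, while the unsigned comparison sees a very large number. In the second case the difference is ffffffff and both LT and LTU are 1.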
Branch Unit
The Branch Unit is the part of the CPU which allows the program to make decisions, and also to perform jumps (changes to the PC) and procedure calls.
instr | opcode | funct3 | algo |
---|---|---|---|
BEQ | 7'b1100011 | 3'b000 | if(rs1==rs2)PC<-PC+Bimm |
BNE | 7'b1100011 | 3'b001 | if(rs1!=rs2) PC<-PC+Bimm |
BLT | 7'b1100011 | 3'b100 | if(rs1<rs2) PC<-PC+Bimm; (signed comparison) |
BGE | 7'b1100011 | 3'b101 | if(rs1>=rs2) PC<-PC+Bimm; (signed comparison) |
BLTU | 7'b1100011 | 3'b110 | if(rs1<rs2) PC<-PC+Bimm; (unsigned comparison) |
BGEU | 7'b1100011 | 3'b111 | if(rs1>=rs2) PC<-PC+Bimm; (unsigned comparison) |
// BRANCHES
// BEQ rs1, rs2, imm if(rs1==rs2)PC<-PC+Bimm;
// BNE rs1, rs2, imm if(rs1!=rs2) PC<-PC+Bimm;
// BLT rs1, rs2, imm if(rs1<rs2) PC<-PC+Bimm; (signed comparison)
// BGE rs1, rs2, imm if(rs1>=rs2) PC<-PC+Bimm; (signed comparison)
// BLTU rs1, rs2, imm if(rs1<rs2) PC<-PC+Bimm; (unsigned comparison)
// BGEU rs1, rs2, imm if(rs1>=rs2) PC<-PC+Bimm; (unsigned comparison)
reg takeBranch;
always @(*) begin
case(funct3)
3'b000: takeBranch = EQ;
3'b001: takeBranch = !EQ;
3'b100: takeBranch = LT;
3'b101: takeBranch = !LT;
3'b110: takeBranch = LTU;
3'b111: takeBranch = !LTU;
default: takeBranch = 1'b0;
endcase
end
// Address computation
// An adder used to compute branch address, JAL address and AUIPC.
// branch->PC+Bimm AUIPC->PC+Uimm JAL->PC+Jimm
// Equivalent to PCplusImm = PC + (isJAL ? Jimm : isAUIPC ? Uimm : Bimm)
wire [31:0] PCplusImm = PC + ( instr[3] ? Jimm[31:0] :
instr[4] ? Uimm[31:0] :
Bimm[31:0] );
wire [31:0] PCplus4 = PC+4;
wire [31:0] nextPC = ((isBranch && takeBranch) || isJAL) ? PCplusImm :
isJALR ? {aluPlus[31:1],1'b0} :
PCplus4;
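Two details of the next-PC logic are easy to check by hand: a backward branch adds a negative `Bimm` to PC, and JALR clears bit 0 of the computed target. The sketch below is a simulation-only illustration; the testbench module name and the helper wires `branchTarget`, `jalrSum` and `jalrTarget` are hypothetical stand-ins for `PCplusImm` and `{aluPlus[31:1],1'b0}` in the code above.

```verilog
// Illustrative testbench; module and helper wire names are hypothetical.
module next_pc_tb;
   reg [31:0] PC, rs1, instr;

   // Immediate extraction copied from the notes.
   wire [31:0] Iimm = {{21{instr[31]}}, instr[30:20]};
   wire [31:0] Bimm = {{20{instr[31]}}, instr[7], instr[30:25], instr[11:8], 1'b0};

   wire [31:0] branchTarget = PC + Bimm;               // taken-branch target
   wire [31:0] jalrSum      = rs1 + Iimm;              // plays the role of aluPlus
   wire [31:0] jalrTarget   = {jalrSum[31:1], 1'b0};   // bit 0 forced to 0

   initial begin
      PC = 32'h00000100; rs1 = 32'h00000000;
      instr = 32'hFE000EE3;  // beq x0, x0, -4
      #1 $display("branch target = %h", branchTarget);
      rs1 = 32'h00001001;
      instr = 32'h00208067;  // jalr x0, 2(x1)
      #1 $display("jalr target   = %h", jalrTarget);
      $finish;
   end
endmodule
```

The branch target is 000000fc (0x100 minus 4), and the JALR target is 00001002: the sum 0x1001 + 2 = 0x1003 has its lowest bit cleared.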
Load & Store
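RISC-V is a load-store architecture: arithmetic instructions operate only on registers, and memory is accessed only through load and store instructions.
The load instructions are listed below. "Sign extend" means that LH and LB fill the upper bits of rd with copies of the sign bit, whereas LHU and LBU fill them with zeros.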
instr | funct3 | algo |
---|---|---|
LB | 3'b000 | [rd] <- Mem(imm[11:0] + rs1) & 0xFF, sign extend |
LBU | 3'b100 | [rd] <- Mem(imm[11:0] + rs1) & 0xFF |
LH | 3'b001 | [rd] <- Mem(imm[11:0] + rs1) & 0xFFFF, sign extend |
LHU | 3'b101 | [rd] <- Mem(imm[11:0] + rs1) & 0xFFFF |
LW | 3'b010 | [rd] <- Mem(imm[11:0] + rs1) |
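The difference between sign extension and zero extension can be shown in isolation. This is a minimal simulation-only sketch, not the CPU's load path; the module name and the signals `b`, `h`, `lb`, `lbu`, `lh` and `lhu` are hypothetical.

```verilog
// Illustrative testbench; all names are hypothetical.
module load_extend_tb;
   reg [7:0]  b;   // byte read from memory
   reg [15:0] h;   // halfword read from memory

   wire [31:0] lb  = {{24{b[7]}},  b};   // LB : replicate the sign bit
   wire [31:0] lbu = {24'b0,       b};   // LBU: fill with zeros
   wire [31:0] lh  = {{16{h[15]}}, h};   // LH : replicate the sign bit
   wire [31:0] lhu = {16'b0,       h};   // LHU: fill with zeros

   initial begin
      b = 8'h80;      // sign bit set
      h = 16'h7FFF;   // sign bit clear
      #1 $display("LB=%h LBU=%h LH=%h LHU=%h", lb, lbu, lh, lhu);
      $finish;
   end
endmodule
```

With these inputs LB gives ffffff80 and LBU gives 00000080, while LH and LHU both give 00007fff because the halfword's sign bit is 0.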
The store instructions are:
instr | funct3 | algo |
---|---|---|
SB | 3'b000 | Mem(imm[11:0] + rs1) <- [rs2] & 0xFF |
SH | 3'b001 | Mem(imm[11:0] + rs1) <- [rs2] & 0xFFFF |
SW | 3'b010 | Mem(imm[11:0] + rs1) <- [rs2] |
wire [ADDR_WIDTH-1:0] loadstore_addr = rs1 + (isStore ? Simm : Iimm);
// Load
// All memory accesses are aligned on 32 bits boundary. For this
// reason, we need some circuitry that does unaligned halfword
// and byte load/store, based on:
// - funct3[1:0]: 00->byte 01->halfword 10->word
// - mem_addr[1:0]: indicates which byte/halfword is accessed
wire mem_byteAccess = funct3[1:0] == 2'b00;
wire mem_halfwordAccess = funct3[1:0] == 2'b01;
wire [15:0] LOAD_halfword =
loadstore_addr[1] ? mem_rdata[31:16] : mem_rdata[15:0];
wire [7:0] LOAD_byte =
loadstore_addr[0] ? LOAD_halfword[15:8] : LOAD_halfword[7:0];
// LOAD, in addition to funct3[1:0], LOAD depends on:
// - funct3[2] (instr[14]): 0->do sign extension 1->no sign extension
wire LOAD_sign = !funct3[2] & (mem_byteAccess ? LOAD_byte[7] : LOAD_halfword[15]);
wire [31:0] LOAD_data =
mem_byteAccess ? {{24{LOAD_sign}}, LOAD_byte} :
mem_halfwordAccess ? {{16{LOAD_sign}}, LOAD_halfword} :
mem_rdata ;
// Store
// ------------------------------------------------------------------------
assign mem_wdata[ 7: 0] = rs2[7:0];
assign mem_wdata[15: 8] = loadstore_addr[0] ? rs2[7:0] : rs2[15: 8];
assign mem_wdata[23:16] = loadstore_addr[1] ? rs2[7:0] : rs2[23:16];
assign mem_wdata[31:24] = loadstore_addr[0] ? rs2[7:0] :
loadstore_addr[1] ? rs2[15:8] : rs2[31:24];
// The memory write mask:
// 1111 if writing a word
// 0011 or 1100 if writing a halfword
// (depending on loadstore_addr[1])
// 0001, 0010, 0100 or 1000 if writing a byte
// (depending on loadstore_addr[1:0])
wire [3:0] STORE_wmask =
mem_byteAccess ?
(loadstore_addr[1] ?
(loadstore_addr[0] ? 4'b1000 : 4'b0100) :
(loadstore_addr[0] ? 4'b0010 : 4'b0001)
) :
mem_halfwordAccess ?
(loadstore_addr[1] ? 4'b1100 : 4'b0011) :
4'b1111;
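The store path replicates the byte or halfword onto every lane where it might be needed and lets the write mask pick the lane that is actually written. The sketch below checks this by simulation; the testbench module name and `addr_lo` (standing in for `loadstore_addr[1:0]`) are hypothetical, and the remaining expressions are copied from the code above.

```verilog
// Illustrative testbench; module name and addr_lo are hypothetical.
module store_lanes_tb;
   reg [31:0] rs2;
   reg [1:0]  addr_lo;   // stands in for loadstore_addr[1:0]
   reg [2:0]  funct3;

   wire mem_byteAccess     = funct3[1:0] == 2'b00;
   wire mem_halfwordAccess = funct3[1:0] == 2'b01;

   wire [31:0] mem_wdata;
   assign mem_wdata[ 7: 0] = rs2[7:0];
   assign mem_wdata[15: 8] = addr_lo[0] ? rs2[7:0] : rs2[15: 8];
   assign mem_wdata[23:16] = addr_lo[1] ? rs2[7:0] : rs2[23:16];
   assign mem_wdata[31:24] = addr_lo[0] ? rs2[7:0] :
                             addr_lo[1] ? rs2[15:8] : rs2[31:24];

   wire [3:0] STORE_wmask =
      mem_byteAccess ?
         (addr_lo[1] ? (addr_lo[0] ? 4'b1000 : 4'b0100)
                     : (addr_lo[0] ? 4'b0010 : 4'b0001)) :
      mem_halfwordAccess ?
         (addr_lo[1] ? 4'b1100 : 4'b0011) :
         4'b1111;

   initial begin
      rs2 = 32'h11223344;
      funct3 = 3'b000; addr_lo = 2'b11;   // SB to byte 3 of the word
      #1 $display("SB: wdata=%h wmask=%b", mem_wdata, STORE_wmask);
      funct3 = 3'b001; addr_lo = 2'b10;   // SH to the upper halfword
      #1 $display("SH: wdata=%h wmask=%b", mem_wdata, STORE_wmask);
      $finish;
   end
endmodule
```

For the SB case the data word is 44444444 with mask 1000, so only the top byte lane is written. For the SH case the data word is 33443344 with mask 1100, so the low halfword of rs2 lands in the upper half of the memory word.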
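Multi-cycle state machine
Main states:
FETCH_INSTR -> WAIT_INSTR -> EXECUTE -> WAIT_DATA
1. FETCH_INSTR: fetch the instruction. mem_addr is set to PC and mem_rstrb is driven high to request a read from RAM/ROM.
2. WAIT_INSTR: wait until the instruction at address PC has been read.
3. EXECUTE: execute the instruction:
- decode the instruction
- let the ALU compute its result
- update the destination register RegisterBank[rdId]
- update PC: Branch, JAL and JALR instructions change PC as the instruction specifies; otherwise PC = PC + 4 (4 bytes = 1 instruction)
- for a load instruction, set mem_addr to loadstore_addr (the load address), drive mem_rstrb high to request a read from RAM/ROM/IO, and go to WAIT_DATA; any other instruction skips WAIT_DATA and goes straight back to FETCH_INSTR to fetch the next instruction
- for a store instruction, set mem_addr to loadstore_addr (the store address), set mem_wmask, and write mem_wdata to RAM/IO
4. WAIT_DATA: wait until the load has completed.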
// The state machine
localparam FETCH_INSTR = 0;
localparam WAIT_INSTR = 1;
localparam EXECUTE = 2;
localparam WAIT_DATA = 3;
reg [1:0] state = FETCH_INSTR;
assign writeBackEn = (state==EXECUTE && !isBranch && !isStore) ||
(state==WAIT_DATA) ;
assign mem_addr = (state == WAIT_INSTR || state == FETCH_INSTR) ?
PC : loadstore_addr ;
assign mem_rstrb = (state == FETCH_INSTR || (state == EXECUTE & isLoad));
assign mem_wmask = {4{(state == EXECUTE) & isStore}} & STORE_wmask;
always @(posedge clk) begin
if(!resetn) begin
PC <= 32'h00820000; // jump to SPI FLASH + 128 kB
state <= WAIT_DATA; // just wait for !mem_rbusy
end else begin
if(writeBackEn && rdId != 0) begin
RegisterBank[rdId] <= writeBackData;
end
case(state)
FETCH_INSTR: begin
state <= WAIT_INSTR;
end
WAIT_INSTR: begin
instr <= mem_rdata; // keep all 32 bits: the decoder above tests instr[6:0]
rs1 <= RegisterBank[mem_rdata[19:15]];
rs2 <= RegisterBank[mem_rdata[24:20]];
if(!mem_rbusy) begin
state <= EXECUTE;
end
end
EXECUTE: begin
if(!isSYSTEM) begin
PC <= nextPC;
end
state <= isLoad ? WAIT_DATA : FETCH_INSTR;
end
WAIT_DATA: begin
if(!mem_rbusy) begin
state <= FETCH_INSTR;
end
end
endcase
end
end
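The state machine above assumes a memory that returns read data one clock after mem_rstrb and honours a per-byte write mask. These notes do not show that memory, so the following is only a minimal sketch under stated assumptions: the module name `memory_sketch`, the 256-word size, the 32-bit width of `mem_addr` and the always-idle `mem_rbusy` are all hypothetical. Only the `mem_*` signal names come from the notes.

```verilog
// Hypothetical memory model; size, address width and module name are
// assumptions made for illustration.
module memory_sketch(
   input  wire        clk,
   input  wire [31:0] mem_addr,
   output reg  [31:0] mem_rdata,
   input  wire        mem_rstrb,   // high for one cycle to request a read
   input  wire [31:0] mem_wdata,
   input  wire [3:0]  mem_wmask,   // one bit per byte lane
   output wire        mem_rbusy
);
   reg [31:0] MEM [0:255];                  // 256 words = 1 kB
   wire [7:0] word_addr = mem_addr[9:2];    // word index; upper bits ignored

   // Assumption: this memory always answers in one cycle, so it is never busy.
   assign mem_rbusy = 1'b0;

   always @(posedge clk) begin
      // Read data becomes valid in the cycle after mem_rstrb, which matches
      // FETCH_INSTR -> WAIT_INSTR and EXECUTE -> WAIT_DATA.
      if (mem_rstrb) mem_rdata <= MEM[word_addr];
      // Byte-masked write.
      if (mem_wmask[0]) MEM[word_addr][ 7: 0] <= mem_wdata[ 7: 0];
      if (mem_wmask[1]) MEM[word_addr][15: 8] <= mem_wdata[15: 8];
      if (mem_wmask[2]) MEM[word_addr][23:16] <= mem_wdata[23:16];
      if (mem_wmask[3]) MEM[word_addr][31:24] <= mem_wdata[31:24];
   end
endmodule
```

Because only address bits [9:2] are used, the reset value of PC in the code above would simply alias into this small array; a real system would map that address onto SPI flash, as the comment in the reset branch indicates.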