Kika's
Blog

The RV32I Instruction Set and a Multi-Cycle Verilog Implementation

2023-08-22

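Notes on the RV32I instruction set and on the code of a multi-cycle RV32I CPU. They can also serve as a quick reference for RV32I.

The full code is in the repository; see also the original author's tutorial.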

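Instruction formats

The RISC-V ISA is modular. RV32I is the base 32-bit integer ISA. It has only 47 instructions (40 in the ratified 2.1 version of the base ISA, which moved the CSR and FENCE.I instructions into separate extensions), yet that is enough to support a modern operating system and to serve as a compiler target.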

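| Mnemonic | Description |
|---|---|
| I | Base integer |
| M | Integer multiplication/division |
| A | Atomic operations |
| F | Single-precision floating point |
| D | Double-precision floating point |
| C | Compressed instructions |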

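RV32I has 31 general-purpose integer registers, x1 to x31, each 32 bits wide. x0 is hardwired to the constant 0; it can also be used as a destination register to discard an instruction's result. The program counter PC is a separate register: it is not one of the x registers and cannot be addressed as one. By convention the general-purpose registers have fixed roles (for example, x10 usually holds the first function argument and the function return value). This convention is the ABI, described in the RISC-V calling convention.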
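The x0 behaviour can be illustrated with a tiny software model. This is only a sketch in Python, not part of the repository; the class name `RegisterFile` and its methods are made up for illustration.

```python
class RegisterFile:
    """Toy model of the 32 integer registers; x0 always reads as 0."""

    def __init__(self):
        self.x = [0] * 32

    def read(self, idx):
        return self.x[idx]

    def write(self, idx, value):
        # Writes to x0 are silently dropped, which is how an instruction
        # can discard its result by naming x0 as the destination.
        if idx != 0:
            self.x[idx] = value & 0xFFFFFFFF


rf = RegisterFile()
rf.write(0, 123)   # ignored
rf.write(10, -1)   # stored as the 32-bit pattern 0xFFFFFFFF
```

The Verilog state machine later in this post applies the same rule with the `rdId != 0` guard on the register write.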

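RV32I instructions are 32 bits wide ([31:0]) and stored little-endian. The lowest seven bits [6:0] (the opcode) give the instruction class, which is one of the following 11: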

| instruction | description | algo |
|---|---|---|
| branch | conditional jump, 6 variants | if(reg OP reg) PC<-PC+imm |
| ALU reg | Three-register ALU ops, 10 variants | reg <- reg OP reg |
| ALU imm | Two-register ALU ops, 9 variants | reg <- reg OP imm |
| load | Memory-to-register, 5 variants | reg <- mem[reg + imm] |
| store | Register-to-memory, 3 variants | mem[reg+imm] <- reg |
| LUI | load upper immediate | reg <- (imm << 12) |
| AUIPC | add upper immediate to PC | reg <- PC+(imm << 12) |
| JAL | jump and link | reg <- PC+4 ; PC <- PC+imm |
| JALR | jump and link register | reg <- PC+4 ; PC <- reg+imm |
| FENCE | memory-ordering for multicores | |
| SYSTEM | system calls, breakpoints | |

The opcode is decoded to determine the instruction class:

 // See the table P. 105 in RISC-V manual
 // The 10 instruction classes decoded here (FENCE is not decoded)
 wire isALUreg  =  (instr[6:0] == 7'b0110011); // rd <- rs1 OP rs2   
 wire isALUimm  =  (instr[6:0] == 7'b0010011); // rd <- rs1 OP Iimm
 wire isBranch  =  (instr[6:0] == 7'b1100011); // if(rs1 OP rs2) PC<-PC+Bimm
 wire isJALR    =  (instr[6:0] == 7'b1100111); // rd <- PC+4; PC<-rs1+Iimm
 wire isJAL     =  (instr[6:0] == 7'b1101111); // rd <- PC+4; PC<-PC+Jimm
 wire isAUIPC   =  (instr[6:0] == 7'b0010111); // rd <- PC + Uimm
 wire isLUI     =  (instr[6:0] == 7'b0110111); // rd <- Uimm   
 wire isLoad    =  (instr[6:0] == 7'b0000011); // rd <- mem[rs1+Iimm]
 wire isStore   =  (instr[6:0] == 7'b0100011); // mem[rs1+Simm] <- rs2
 wire isSYSTEM  =  (instr[6:0] == 7'b1110011); // special
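To cross-check the decoder against hand-assembled instruction words, the same opcode table can be written as a small software model. This is a Python sketch for illustration only; the names `OPCODES` and `classify` are not from the repository.

```python
# Maps instr[6:0] to the instruction class, mirroring the wires above.
OPCODES = {
    0b0110011: "ALUreg",
    0b0010011: "ALUimm",
    0b1100011: "Branch",
    0b1100111: "JALR",
    0b1101111: "JAL",
    0b0010111: "AUIPC",
    0b0110111: "LUI",
    0b0000011: "Load",
    0b0100011: "Store",
    0b1110011: "SYSTEM",
}


def classify(instr):
    """Return the class selected by the low seven bits of a 32-bit word."""
    return OPCODES.get(instr & 0x7F, "unknown")


print(classify(0x00000013))  # addi x0, x0, 0 (the canonical NOP)
print(classify(0x00C58533))  # add x10, x11, x12
```

FENCE (opcode 0001111) falls through to "unknown" here, just as none of the ten wires above is asserted for it.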

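In terms of encoding, instructions fall into roughly the following six formats:

From the figure above, the positions of the common fields in each format can be summarized as follows: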

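| type | opcode | rd | funct3 | rs1 | rs2 | funct7 |
|---|---|---|---|---|---|---|
| R | [6:0] | [11:7] | [14:12] | [19:15] | [24:20] | [31:25] |
| I | [6:0] | [11:7] | [14:12] | [19:15] | | |
| S | [6:0] | | [14:12] | [19:15] | [24:20] | |
| B | [6:0] | | [14:12] | [19:15] | [24:20] | |
| U | [6:0] | [11:7] | | | | |
| J | [6:0] | [11:7] | | | | |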

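The table shows that an instruction has at most two source registers, rs1 and rs2, and at most one destination register, rd. Although the immediate is laid out differently in each format, the registers and many other fields sit at the same position in every instruction. The Verilog below shows how the immediates, function codes, and other fields are extracted: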

 // The 5 immediate formats
 wire [31:0] Uimm={    instr[31],   instr[30:12], {12{1'b0}}};
 wire [31:0] Iimm={{21{instr[31]}}, instr[30:20]};
 wire [31:0] Simm={{21{instr[31]}}, instr[30:25],instr[11:7]};
 wire [31:0] Bimm={{20{instr[31]}}, instr[7],instr[30:25],instr[11:8],1'b0};
 wire [31:0] Jimm={{12{instr[31]}}, instr[19:12],instr[20],instr[30:21],1'b0};

 // Source and destination registers
 wire [4:0] rs1Id = instr[19:15];
 wire [4:0] rs2Id = instr[24:20];
 wire [4:0] rdId  = instr[11:7];

 // function codes
 wire [2:0] funct3 = instr[14:12];
 wire [6:0] funct7 = instr[31:25];
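The bit shuffling in the five immediate formats is easy to get wrong, so a software reference can help when checking the wires above. The following Python sketch is an illustration, not code from the repository; the helpers `bits`, `sext`, and `decode_imms` are hypothetical names. It returns signed Python integers, whereas the Verilog produces the equivalent 32-bit patterns.

```python
def bits(x, hi, lo):
    """Extract x[hi:lo] as an unsigned integer."""
    return (x >> lo) & ((1 << (hi - lo + 1)) - 1)


def sext(value, width):
    """Sign-extend a width-bit value to a signed Python int."""
    sign = 1 << (width - 1)
    return (value ^ sign) - sign


def decode_imms(instr):
    """Decode all five immediates of a 32-bit instruction word."""
    return {
        "U": sext(bits(instr, 31, 12) << 12, 32),
        "I": sext(bits(instr, 31, 20), 12),
        "S": sext(bits(instr, 31, 25) << 5 | bits(instr, 11, 7), 12),
        "B": sext(bits(instr, 31, 31) << 12 | bits(instr, 7, 7) << 11
                  | bits(instr, 30, 25) << 5 | bits(instr, 11, 8) << 1, 13),
        "J": sext(bits(instr, 31, 31) << 20 | bits(instr, 19, 12) << 12
                  | bits(instr, 20, 20) << 11 | bits(instr, 30, 21) << 1, 21),
    }


print(decode_imms(0xFFF00093)["I"])  # addi x1, x0, -1
print(decode_imms(0xFE000EE3)["B"])  # beq x0, x0, -4
```

Only the immediate that matches the instruction's format is meaningful; the others are decoded anyway, which mirrors the hardware, where all five wires exist in parallel and the opcode selects which one is used.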

ALU

An arithmetic logic unit (ALU) is a combinational digital circuit that performs arithmetic and bitwise operations on integer binary numbers.

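The ALU handles the instructions below (see the table on p. 130 of the RISC-V reference manual, which lists everything clearly, grouped by instruction set). Note that immediates in RISC-V are always sign-extended, and that signed integers use two's complement: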

| instr | opcode | funct3 | funct7 | algo |
|---|---|---|---|---|
| ADD | 7'b0110011 | 3'b000 | 7'b0000000 | [rd] <- [rs1] + [rs2] |
| ADDI | 7'b0010011 | 3'b000 | | [rd] <- [rs1] + Iimm |
| SUB | 7'b0110011 | 3'b000 | 7'b0100000 | [rd] <- [rs1] - [rs2] |
| SLL | 7'b0110011 | 3'b001 | 7'b0000000 | [rd] <- [rs1] << ([rs2] & 0x1F), logical |
| SLLI | 7'b0010011 | 3'b001 | 7'b0000000 | [rd] <- [rs1] << shamt[4:0], logical |
| SRL | 7'b0110011 | 3'b101 | 7'b0000000 | [rd] <- [rs1] >> ([rs2] & 0x1F), logical |
| SRLI | 7'b0010011 | 3'b101 | 7'b0000000 | [rd] <- [rs1] >> shamt[4:0], logical |
| SRA | 7'b0110011 | 3'b101 | 7'b0100000 | [rd] <- [rs1] >> ([rs2] & 0x1F), arithmetic |
| SRAI | 7'b0010011 | 3'b101 | 7'b0100000 | [rd] <- [rs1] >> shamt[4:0], arithmetic |
| SLT | 7'b0110011 | 3'b010 | 7'b0000000 | [rd] <- ([rs1] < [rs2] ? 1 : 0), signed |
| SLTU | 7'b0110011 | 3'b011 | 7'b0000000 | [rd] <- ([rs1] < [rs2] ? 1 : 0), unsigned |
| SLTI | 7'b0010011 | 3'b010 | | [rd] <- ([rs1] < imm[11:0] ? 1 : 0), signed |
| SLTIU | 7'b0010011 | 3'b011 | | [rd] <- ([rs1] < imm[11:0] ? 1 : 0), unsigned |
| AND | 7'b0110011 | 3'b111 | 7'b0000000 | [rd] <- [rs1] & [rs2] |
| ANDI | 7'b0010011 | 3'b111 | | [rd] <- [rs1] & imm[11:0] |
| OR | 7'b0110011 | 3'b110 | 7'b0000000 | [rd] <- [rs1] \| [rs2] |
| ORI | 7'b0010011 | 3'b110 | | [rd] <- [rs1] \| imm[11:0] |
| XOR | 7'b0110011 | 3'b100 | 7'b0000000 | [rd] <- [rs1] ^ [rs2] |
| XORI | 7'b0010011 | 3'b100 | | [rd] <- [rs1] ^ imm[11:0] |

With this table at hand, the ALU only needs to check opcode, funct3, and funct7 to identify the operation and then perform it.

// The ALU
wire [31:0] aluIn1 = rs1;
wire [31:0] aluIn2 = (isALUreg | isBranch) ? rs2 : Iimm;
reg [31:0] aluOut;
wire [4:0] shamt = isALUreg ? rs2[4:0] : instr[24:20]; // shift amount

wire [32:0] aluMinus = {1'b1, ~aluIn2} + {1'b0,aluIn1} + 33'b1;
wire [31:0] aluPlus = aluIn1 + aluIn2;
wire EQ = (aluMinus[31:0] == 0);
wire LTU = aluMinus[32];
wire LT = (aluIn1[31] ^ aluIn2[31]) ? aluIn1[31] : aluMinus[32];

// ADD/SUB/ADDI: 
// funct7[5] is 1 for SUB and 0 for ADD. We need also to test instr[5]
// to make the difference with ADDI
//
// SRLI/SRAI/SRL/SRA: 
// funct7[5] is 1 for arithmetic shift (SRA/SRAI) and 
// 0 for logical shift (SRL/SRLI)
always @(*) begin
        case(funct3)
        // SUB when funct7[5]==1 && instr[5]==1 (ALUimm has no SUB)
        3'b000: aluOut = (funct7[5] & instr[5]) ? aluMinus[31:0] : aluPlus;
        // left shift
        3'b001: aluOut = aluIn1 << shamt;
        //  signed comparison (<)
        3'b010: aluOut = {31'b0, LT};
        // unsigned comparison (<)
        3'b011: aluOut = {31'b0, LTU};
        // XOR
        3'b100: aluOut = (aluIn1 ^ aluIn2);
        // logical right shift(0) or arithmetic right shift(1)
        3'b101: aluOut = funct7[5]? ($signed(aluIn1) >>> shamt) : 
                                     ($signed(aluIn1) >> shamt); 
        // OR
        3'b110: aluOut = (aluIn1 | aluIn2);
        // AND
        3'b111: aluOut = (aluIn1 & aluIn2); 
        endcase
end
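The 33-bit subtraction is the least obvious part of this ALU: bit 32 of `{1'b1, ~aluIn2} + {1'b0, aluIn1} + 1` is set exactly when `aluIn1 < aluIn2` as unsigned numbers, and the signed comparison only differs when the two sign bits differ. A Python reference model makes this easy to test. This is a sketch for illustration; the function `alu` and its parameter names are not from the repository, and all values are passed as 32-bit unsigned patterns.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


def alu(funct3, in1, in2, shamt=0, funct7_5=0, instr_5=0):
    """Software model of the case(funct3) block above."""
    # 33-bit result of in1 - in2, computed like aluMinus.
    minus = (((1 << 32) | (~in2 & MASK32)) + in1 + 1) & ((1 << 33) - 1)
    ltu = (minus >> 32) & 1
    # Different signs: in1 is smaller iff it is the negative one.
    lt = (in1 >> 31) if ((in1 ^ in2) >> 31) & 1 else ltu
    if funct3 == 0b000:
        return minus & MASK32 if (funct7_5 & instr_5) else (in1 + in2) & MASK32
    if funct3 == 0b001:
        return (in1 << shamt) & MASK32
    if funct3 == 0b010:
        return lt
    if funct3 == 0b011:
        return ltu
    if funct3 == 0b100:
        return in1 ^ in2
    if funct3 == 0b101:
        if funct7_5:
            return (to_signed(in1) >> shamt) & MASK32  # arithmetic shift
        return in1 >> shamt                            # logical shift
    if funct3 == 0b110:
        return in1 | in2
    return in1 & in2


print(hex(alu(0b000, 5, 7, funct7_5=1, instr_5=1)))  # SUB: 5 - 7
print(alu(0b010, 0xFFFFFFFF, 1))                     # SLT: -1 < 1
print(alu(0b011, 0xFFFFFFFF, 1))                     # SLTU
```

In the Verilog, both branches of the right-shift ternary are wrapped in `$signed`. This matters: if one branch of a conditional expression is unsigned, the whole expression is evaluated as unsigned and `>>>` would degrade to a logical shift.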

Branch Unit

The Branch Unit is the part of the CPU which allows the program to make decisions, and also to perform jumps (changes to the PC) and procedure calls.

| instr | opcode | funct3 | algo |
|---|---|---|---|
| BEQ | 7'b1100011 | 3'b000 | if(rs1==rs2) PC<-PC+Bimm |
| BNE | 7'b1100011 | 3'b001 | if(rs1!=rs2) PC<-PC+Bimm |
| BLT | 7'b1100011 | 3'b100 | if(rs1<rs2) PC<-PC+Bimm; (signed comparison) |
| BGE | 7'b1100011 | 3'b101 | if(rs1>=rs2) PC<-PC+Bimm; (signed comparison) |
| BLTU | 7'b1100011 | 3'b110 | if(rs1<rs2) PC<-PC+Bimm; (unsigned comparison) |
| BGEU | 7'b1100011 | 3'b111 | if(rs1>=rs2) PC<-PC+Bimm; (unsigned comparison) |

// BRANCHES
// BEQ  rs1, rs2, imm   if(rs1==rs2)PC<-PC+Bimm;
// BNE  rs1, rs2, imm   if(rs1!=rs2) PC<-PC+Bimm;
// BLT  rs1, rs2, imm   if(rs1<rs2) PC<-PC+Bimm; (signed comparison)
// BGE  rs1, rs2, imm   if(rs1>=rs2) PC<-PC+Bimm; (signed comparison)
// BLTU rs1, rs2, imm   if(rs1<rs2) PC<-PC+Bimm; (unsigned comparison)
// BGEU rs1, rs2, imm   if(rs1>=rs2) PC<-PC+Bimm; (unsigned comparison)
reg takeBranch;
always @(*) begin
        case(funct3)
                3'b000: takeBranch = EQ;
                3'b001: takeBranch = !EQ;
                3'b100: takeBranch = LT;
                3'b101: takeBranch = !LT;
                3'b110: takeBranch = LTU;
                3'b111: takeBranch = !LTU;
                default: takeBranch = 1'b0;
        endcase
end

// Address computation
// An adder used to compute branch address, JAL address and AUIPC.
// branch->PC+Bimm    AUIPC->PC+Uimm    JAL->PC+Jimm
// Equivalent to PCplusImm = PC + (isJAL ? Jimm : isAUIPC ? Uimm : Bimm)
wire [31:0] PCplusImm = PC + ( instr[3] ? Jimm[31:0] :
            instr[4] ? Uimm[31:0] :
            Bimm[31:0] );
wire [31:0] PCplus4 = PC+4;

wire [31:0] nextPC = ((isBranch && takeBranch) || isJAL) ? PCplusImm   :
                                             isJALR   ? {aluPlus[31:1],1'b0} :
                                             PCplus4;
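The branch decision and the next-PC selection can be summarized in a short reference model. This is a Python sketch under the assumption that register values are 32-bit unsigned patterns and immediates are signed integers; `take_branch` and `next_pc` are illustrative names, not taken from the repository.

```python
MASK32 = 0xFFFFFFFF


def to_signed(x):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return x - (1 << 32) if x & 0x80000000 else x


def take_branch(funct3, rs1, rs2):
    """Mirror of the takeBranch case statement."""
    s1, s2 = to_signed(rs1), to_signed(rs2)
    decisions = {
        0b000: rs1 == rs2,  # BEQ
        0b001: rs1 != rs2,  # BNE
        0b100: s1 < s2,     # BLT
        0b101: s1 >= s2,    # BGE
        0b110: rs1 < rs2,   # BLTU
        0b111: rs1 >= rs2,  # BGEU
    }
    return decisions.get(funct3, False)


def next_pc(pc, kind, imm=0, rs1=0, taken=False):
    """Mirror of the nextPC multiplexer; kind is the instruction class."""
    if kind == "JAL" or (kind == "Branch" and taken):
        return (pc + imm) & MASK32
    if kind == "JALR":
        return (rs1 + imm) & MASK32 & ~1  # bit 0 is forced to 0
    return (pc + 4) & MASK32


print(take_branch(0b100, 0xFFFFFFFF, 0))              # BLT: -1 < 0
print(hex(next_pc(0x100, "Branch", -4, taken=True)))  # backward branch
```

Clearing bit 0 of the JALR target is what `{aluPlus[31:1],1'b0}` does in the Verilog.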

Load & Store

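RISC-V is a load-store architecture: arithmetic instructions operate only on registers, and memory is accessed only through load and store instructions.

The load instructions are listed below. LH and LB sign-extend the loaded value into the upper bits of rd, whereas LHU and LBU fill the upper bits with zeros.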

| instr | funct3 | algo |
|---|---|---|
| LB | 3'b000 | [rd] <- Mem(imm[11:0] + rs1) & 0xFF, sign extend |
| LBU | 3'b100 | [rd] <- Mem(imm[11:0] + rs1) & 0xFF |
| LH | 3'b001 | [rd] <- Mem(imm[11:0] + rs1) & 0xFFFF, sign extend |
| LHU | 3'b101 | [rd] <- Mem(imm[11:0] + rs1) & 0xFFFF |
| LW | 3'b010 | [rd] <- Mem(imm[11:0] + rs1) |

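The store instructions are:

| instr | funct3 | algo |
|---|---|---|
| SB | 3'b000 | Mem(imm[11:0] + rs1) <- [rs2] & 0xFF |
| SH | 3'b001 | Mem(imm[11:0] + rs1) <- [rs2] & 0xFFFF |
| SW | 3'b010 | Mem(imm[11:0] + rs1) <- [rs2] |

// Effective address: stores use the S-type immediate, loads the I-type one
wire [ADDR_WIDTH-1:0] loadstore_addr = rs1 + (isStore ? Simm : Iimm);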

// Load
// All memory accesses are aligned on 32 bits boundary. For this
// reason, we need some circuitry that does unaligned halfword
// and byte load/store, based on:
// - funct3[1:0]:  00->byte 01->halfword 10->word
// - mem_addr[1:0]: indicates which byte/halfword is accessed

wire mem_byteAccess     = funct3[1:0] == 2'b00;
wire mem_halfwordAccess = funct3[1:0] == 2'b01;


wire [15:0] LOAD_halfword =
         loadstore_addr[1] ? mem_rdata[31:16] : mem_rdata[15:0];

wire  [7:0] LOAD_byte =
         loadstore_addr[0] ? LOAD_halfword[15:8] : LOAD_halfword[7:0];

// In addition to funct3[1:0], LOAD depends on:
// - funct3[2] (instr[14]): 0->sign extension   1->zero extension
wire LOAD_sign = !funct3[2] & (mem_byteAccess ? LOAD_byte[7] : LOAD_halfword[15]);

wire [31:0] LOAD_data =
         mem_byteAccess ? {{24{LOAD_sign}},     LOAD_byte} :
         mem_halfwordAccess ? {{16{LOAD_sign}}, LOAD_halfword} :
         mem_rdata ;

// Store
// ------------------------------------------------------------------------

assign mem_wdata[ 7: 0] = rs2[7:0];
assign mem_wdata[15: 8] = loadstore_addr[0] ? rs2[7:0]  : rs2[15: 8];
assign mem_wdata[23:16] = loadstore_addr[1] ? rs2[7:0]  : rs2[23:16];
assign mem_wdata[31:24] = loadstore_addr[0] ? rs2[7:0]  :
             loadstore_addr[1] ? rs2[15:8] : rs2[31:24];

// The memory write mask:
//    1111                     if writing a word
//    0011 or 1100             if writing a halfword
//                                (depending on loadstore_addr[1])
//    0001, 0010, 0100 or 1000 if writing a byte
//                                (depending on loadstore_addr[1:0])

wire [3:0] STORE_wmask =
        mem_byteAccess      ?
                    (loadstore_addr[1] ?
                    (loadstore_addr[0] ? 4'b1000 : 4'b0100) :
                    (loadstore_addr[0] ? 4'b0010 : 4'b0001)
                                ) :
        mem_halfwordAccess ?
                    (loadstore_addr[1] ? 4'b1100 : 4'b0011) :
                    4'b1111;
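Because memory is only ever accessed as aligned 32-bit words, the byte and halfword selection above is pure multiplexing on the low two address bits. The Python sketch below models the load path and the write mask so they can be checked with concrete values. It is an illustration only; `load_data` and `store_wmask` are hypothetical names, and `mem_rdata` is the full 32-bit word read from the aligned address.

```python
def load_data(funct3, addr, mem_rdata):
    """Model of LOAD_data: pick the byte/halfword, then extend it."""
    half = (mem_rdata >> 16) & 0xFFFF if addr & 2 else mem_rdata & 0xFFFF
    byte = (half >> 8) & 0xFF if addr & 1 else half & 0xFF
    size = funct3 & 0b11            # 00 byte, 01 halfword, 10 word
    signed = not (funct3 & 0b100)   # funct3[2] = 1 selects zero extension
    if size == 0b00:
        return (byte | 0xFFFFFF00) if signed and byte & 0x80 else byte
    if size == 0b01:
        return (half | 0xFFFF0000) if signed and half & 0x8000 else half
    return mem_rdata & 0xFFFFFFFF


def store_wmask(funct3, addr):
    """Model of STORE_wmask: one bit per byte lane of the 32-bit word."""
    size = funct3 & 0b11
    if size == 0b00:
        return 1 << (addr & 3)
    if size == 0b01:
        return 0b1100 if addr & 2 else 0b0011
    return 0b1111


word = 0x80112233
print(hex(load_data(0b000, 3, word)))  # LB  from byte lane 3
print(hex(load_data(0b100, 3, word)))  # LBU from byte lane 3
print(bin(store_wmask(0b001, 2)))      # SH to the upper halfword
```

On the store side, the Verilog replicates the low byte or halfword of rs2 onto the lanes where it might be needed, and the write mask then selects which lanes are actually written.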

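Multi-cycle state machine

Main states:

FETCH_INSTR -> WAIT_INSTR -> EXECUTE -> WAIT_DATA

1. FETCH_INSTR: instruction fetch. mem_addr is set to PC and mem_rstrb is driven high to request a read from RAM/ROM.

2. WAIT_INSTR: wait until the instruction at address PC has been read.

3. EXECUTE: execute the instruction:

  • Decode the instruction
  • Compute in the ALU
  • Update the destination register RegisterBank[rdId]
  • Update PC: for Branch, JAL, and JALR the instruction determines the new PC; otherwise PC = PC + 4 (4 bytes = 1 instruction)
  • For a load, set mem_addr to loadstore_addr (the load address), drive mem_rstrb high to request a read from RAM/ROM/IO, and enter WAIT_DATA; otherwise skip WAIT_DATA and go straight to FETCH_INSTR to fetch the next instruction.
  • For a store, set mem_addr to loadstore_addr (the store address), set mem_wmask, and write mem_wdata to RAM/IO

4. WAIT_DATA: wait for the load to complete.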

// The state machine
localparam FETCH_INSTR = 0;
localparam WAIT_INSTR  = 1;
localparam EXECUTE     = 2;
localparam WAIT_DATA   = 3;
reg [1:0] state = FETCH_INSTR;

assign writeBackEn = (state==EXECUTE && !isBranch && !isStore) ||
                                         (state==WAIT_DATA) ;

assign mem_addr = (state == WAIT_INSTR || state == FETCH_INSTR) ?
                                     PC : loadstore_addr ;

assign mem_rstrb = (state == FETCH_INSTR || (state == EXECUTE & isLoad));
assign mem_wmask = {4{(state == EXECUTE) & isStore}} & STORE_wmask;

always @(posedge clk) begin
    if(!resetn) begin
        PC    <= 32'h00820000; // jump to SPI FLASH + 128 kB
        state <= WAIT_DATA;    // just wait for !mem_rbusy
    end else begin
        // Register write-back; writes to x0 are dropped
        if(writeBackEn && rdId != 0) begin
            RegisterBank[rdId] <= writeBackData;
        end
        case(state)
            FETCH_INSTR: begin
                state <= WAIT_INSTR;
            end
            WAIT_INSTR: begin
                // Latch the full word: the opcode decoder earlier in this
                // post tests instr[6:0], so instr is assumed to be declared
                // as reg [31:0] (mem_rdata[31:2] only fits a reg [31:2]).
                instr <= mem_rdata;
                rs1 <= RegisterBank[mem_rdata[19:15]];
                rs2 <= RegisterBank[mem_rdata[24:20]];
                if(!mem_rbusy) begin
                    state <= EXECUTE;
                end
            end
            EXECUTE: begin
                if(!isSYSTEM) begin
                    PC <= nextPC;
                end
                state <= isLoad ? WAIT_DATA : FETCH_INSTR;
            end
            WAIT_DATA: begin
                if(!mem_rbusy) begin
                    state <= FETCH_INSTR;
                end
            end
        endcase
    end
end
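The state transitions can be checked in isolation with a small model that counts how many clock cycles one instruction takes. This is a Python sketch under the assumption that the memory is never busy unless stated (mem_rbusy = 0); `next_state` and `cycles_per_instruction` are illustrative names, not from the repository.

```python
FETCH_INSTR, WAIT_INSTR, EXECUTE, WAIT_DATA = range(4)


def next_state(state, is_load=False, mem_rbusy=False):
    """Mirror of the case(state) block in the clocked process above."""
    if state == FETCH_INSTR:
        return WAIT_INSTR
    if state == WAIT_INSTR:
        return WAIT_INSTR if mem_rbusy else EXECUTE
    if state == EXECUTE:
        return WAIT_DATA if is_load else FETCH_INSTR
    return WAIT_DATA if mem_rbusy else FETCH_INSTR  # WAIT_DATA


def cycles_per_instruction(is_load):
    """Clock edges from FETCH_INSTR until the next FETCH_INSTR."""
    state, cycles = FETCH_INSTR, 0
    while True:
        state = next_state(state, is_load)
        cycles += 1
        if state == FETCH_INSTR:
            return cycles


print(cycles_per_instruction(is_load=False))  # ALU, branch, store, jump
print(cycles_per_instruction(is_load=True))   # load adds WAIT_DATA
```

With an idle memory this gives three cycles for most instructions and four for loads; every cycle spent with mem_rbusy high in WAIT_INSTR or WAIT_DATA adds one more.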

References