==PROBLEM==
The Queue in ChiselUtil is generating incorrect code - the Queue's internal RAM's width does not match the io.enq width. This is causing it to extract the wrong signals, as now the outbound io.deq signals don't match the incoming io.enq signals.
==SYMPTOM==
Code compiles in both C++ and Verilog but returns the wrong answer. The values of the io.deq signals do not match the incoming io.enq signals. The values of signals has been shifted over by two.
==CAUSE==
An ambiguous width signal in the Bundle being used as the "gen" type for the Queue.
==SUSPECTED PROBLEM==
I suspect that one of the fields has a width of "-1", which is screwing up the math of the extract code. In particular, the Verilog says:
assign T29 = T14[1'h1/* -1*/:1'h1/* -1*/];
It looks like it is literally trying to extract Bits T14[-1:-1], and by dumb luck extracting [1:1] properly. But the next field in the Bundle is reset to extract from [0:0], causing a shift in the extract logic and causing the fields in the higher order bits of T14 to be pulled out incorrectly.
This bug may actually reside in Mem.
==MORE INFO==
I have a Bundle() which is passed into the Queue as the "gen" type. I added a new field to the bundle with an undefined width (btb_pred_taken_idx = UInt()):
class FetchBundle(implicit conf: BOOMConfiguration) extends Bundle
{
val fetch_width = conf.icache.ibytes/4
val pc = UInt(width = XPRLEN)
val insts = Vec.fill(fetch_width) { Bits(width = 32) }
val mask = Bits(width = fetch_width) // mark which words are val
val br_predictions = Vec.fill(fetch_width) { new BrPrediction } // 3 bits in width
val btb_pred_taken = Bool()
val btb_pred_taken_idx = UInt() // << --- THIS IS BREAKING THINGS
override def clone = new FetchBundle().asInstanceOf[this.type]
}
val fetch_bundle = new FetchBundle()
val FetchBuffer = Module(new Queue(gen=new FetchBundle,
entries=FETCH_BUFFER_SZ,
pipe=false,
flow=ENABLE_FETCH_BUFFER_FLOW_THROUGH,
_reset=(fetchbuffer_kill || reset.toBool)))
The bug is that io.enq.bits.pc does not match io.deq.bits.pc!
The FetchBuffer is a 102-bit wide bundle (for a fetch_width of 1) stored in variable T21; however, it generates an underlying RAM of 100 bits in size.
The PC is stored in bits [101:37] of the aggregated bundle T21. The Queue module pulls out bits [99:36] for io.deq.bits.pc. :( :( :(
Generated Verilog Code
module Queue_0(input clk, input reset,
output io_enq_ready,
input io_enq_valid,
input [63:0] io_enq_bits_pc,
input [31:0] io_enq_bits_insts_0,
input io_enq_bits_mask,
input io_enq_bits_br_predictions_0_taken,
input io_enq_bits_br_predictions_0_agreement,
input io_enq_bits_br_predictions_0_local_taken,
input io_enq_bits_btb_pred_taken,
input io_enq_bits_btb_pred_taken_idx,
input io_deq_ready,
output io_deq_valid,
output[63:0] io_deq_bits_pc,
output[31:0] io_deq_bits_insts_0,
output io_deq_bits_mask,
output io_deq_bits_br_predictions_0_taken,
output io_deq_bits_br_predictions_0_agreement,
output io_deq_bits_br_predictions_0_local_taken,
output io_deq_bits_btb_pred_taken,
output io_deq_bits_btb_pred_taken_idx,
output[2:0] io_count
);
<snip>
wire ptr_match;
reg[1:0] enq_ptr;
wire[1:0] T7;
wire[1:0] T8;
wire T9;
wire[1:0] T10;
wire[1:0] T11;
wire T12;
wire T13;
wire[101:0] T14;
wire[101:0] T15;
wire T16;
wire[99:0] T17;
reg [99:0] ram [3:0];
wire[99:0] T18;
wire[99:0] T19;
wire[101:0] T20;
wire[101:0] T21;
wire T22;
wire T23;
wire T24;
wire T25;
wire T26;
wire[31:0] T27;
wire[63:0] T28;
<snip>
assign io_deq_bits_btb_pred_taken_idx = T13;
assign T13 = T14[1'h0/* 0*/:1'h0/* 0*/];
assign T14 = maybe_flow ? T21 : T15;
assign T15 = {T28, T27, T26, T25, T24, T23, T22, T16};
assign T16 = T17[1'h0/* 0*/:1'h0/* 0*/];
assign T17 = ram[deq_ptr];
assign T19 = T20[7'h63/* 99*/:1'h0/* 0*/];
assign T20 = T21;
assign T21 = {io_enq_bits_pc, io_enq_bits_insts_0, io_enq_bits_mask, io_enq_bits_br_predictions_0_taken, io_enq_bits_br_predictions_0_agreement, io_enq_bits_br_predictions_0_local_taken, io_enq_bits_btb_pred_taken, io_enq_bits_btb_pred_taken_idx};
assign T22 = T17[1'h1/* -1*/:1'h1/* -1*/];
assign T23 = T17[1'h0/* 0*/:1'h0/* 0*/];
assign T24 = T17[1'h1/* 1*/:1'h1/* 1*/];
assign T25 = T17[2'h2/* 2*/:2'h2/* 2*/];
assign T26 = T17[2'h3/* 3*/:2'h3/* 3*/];
assign T27 = T17[6'h23/* 35*/:3'h4/* 4*/];
assign T28 = T17[7'h63/* 99*/:6'h24/* 36*/];
assign io_deq_bits_btb_pred_taken = T29;
assign T29 = T14[1'h1/* -1*/:1'h1/* -1*/];
assign io_deq_bits_br_predictions_0_local_taken = T30;
assign T30 = T14[1'h0/* 0*/:1'h0/* 0*/];
assign io_deq_bits_br_predictions_0_agreement = T31;
assign T31 = T14[1'h1/* 1*/:1'h1/* 1*/];
assign io_deq_bits_br_predictions_0_taken = T32;
assign T32 = T14[2'h2/* 2*/:2'h2/* 2*/];
assign io_deq_bits_mask = T33;
assign T33 = T14[2'h3/* 3*/:2'h3/* 3*/];
assign io_deq_bits_insts_0 = T34;
assign T34 = T14[6'h23/* 35*/:3'h4/* 4*/];
assign io_deq_bits_pc = T35;
assign T35 = T14[7'h63/* 99*/:6'h24/* 36*/];
assign io_deq_valid = T36;
assign T36 = T37 || io_enq_valid;
assign T37 = ! maybe_flow;
assign io_enq_ready = T38;
assign T38 = ! full;
assign full = ptr_match && maybe_full;
always @(posedge clk) begin
if(reset) begin
deq_ptr <= 2'h0/* 0*/;
end else if(do_deq) begin
deq_ptr <= T10;
end
if(reset) begin
maybe_full <= 1'h0/* 0*/;
end else if(T4) begin
maybe_full <= do_enq;
end
if(reset) begin
enq_ptr <= 2'h0/* 0*/;
end else if(do_enq) begin
enq_ptr <= T7;
end
if (do_enq)
ram[enq_ptr] <= T19;
end
endmodule
// if you've made it this far, here's a joke for your time:
// A SQL query goes into a bar, walks up to two tables and asks, "Can I join you?"
From the above code, we can discern the following:
- the Incoming io.enq signals are placed into T21, which is properly 102 bits in size.
- T21 is improperly shortened to 100-bits and placed into "ram".
- ram is placed into T17, from which the individual io.deq.bits fields are extracted from
- the first field (btb_pred_taken_idx), our hero of unspecified width (should be 1-bit in size), is extracted from T17[0:0] and into T16.
- The second field (btb_pred_taken), is extracted from T17[-1:-1]. Ruh-roh Shaggie!
- The third field (br_predictions_0_local_taken) is extracted from T17[0:0]… the same as btb_pred_taken_idx!
- The extracting of the remaining fields continues, but now offset by 2.
==WORKAROUND==
Specify the width of UInt() btb_pred_taken_idx. Continue to curse the width inference code.
Thank you for reading. I'd fix this myself, but I can't make heads or tails of the subtly different meanings of width, getWidth, inferWidth, WidthOf(),fixWidth(), matchWidth(), and friends.