Pointless Hacks - Asynchronous FIFO

Asynchronous FIFO

Clocks & Synchronization

Take some digital circuitry with some clocks (repeating digital pulses) driving the logic. Every time a pulse comes around the circuit, logic gates flip into different states, marking the next stage of the circuit's computation. Say that there is a clock running at 333MHz and another at 400MHz. The 400MHz circuitry wants to respond to some signal coming from the 333MHz domain. The obvious way to do this is just to connect a wire from the signal in the 333MHz domain to the 400MHz domain, like so:

Unfortunately, this doesn't work. Every now and again, the phase of the 333MHz domain will line up with the 400MHz domain in such a way that the signal between the two domains will become meta-stable - that is, in a state that is neither 0 nor 1. This state may propagate to other circuitry in the 400MHz domain, leaving the logic completely broken.

The solution is a synchronizer, which removes the meta-stability by passing the signal through a series of flip-flops until it is stable again. That looks like the following:

Now suppose that one wishes to pass more than 1 bit of data from one domain to another. The obvious way to do this is of course to use multiple syncronizers, one for each bit of information. Unfortunately this runs into problems. When more than 1 synchronizer is used, the bits don't necessarily change on the same clock cycle after synchronization.

The solution to the problem is called an "Asynchronous First-In-First-Out Queue" - or Async FIFO for short. An Async FIFO is a key component of any electronic device that wishes to pass data between two independent clock domains. It is composed of a block of read/write memory, a read pointer, and a write pointer. Data is written to the queue every write-clock cycle, provided the queue is not full, and write is enabled, and data is read every read-clock cycle, provided the queue is not empty, and read is enabled.

In Verilog, the Async FIFO's interface looks like the following:


module afifo(
    input wr_clock,
    input rd_clock,
    input reset_n,
    
    input wr_enable,
    input [WIDTH-1:0] wr_data,
    
    input rd_enable,
    output [WIDTH-1:0] rd_data,
    
    output full,
    output empty
);

With WIDTH passed as a parameter to represent the data width. The data itself is very easy to represent as an array:


reg [WIDTH-1:0] data[2 ** ADDR_WIDTH-1:0];
assign rd_data = data[rd_addr];

In order to synchronize the full/empty states between the two clock domains, the read and write pointers must be synchronized, and compared for equality. Astute readers will notice that if there is more than 1 bit in the pointers, then this is exactly the problem that was to be solved! This solution appears to head nowhere. Of course, there is one thing that is true about the read and write pointers that is not true of general data - they always increment by 1, or stay the same. This is just 1 bit of information.

Gray Codes

The trick to transferring this information is called a "Gray Code", after Frank Gray. It is an ordering of the binary numeral system such that each successive number differs by only 1 bit to the previous number. It is useful in this case because a single bit changing will always be synchronized correctly. The receiving clock domain will never observe a state with bits changing out-of-order (and therefore in an inconsistent state). Synchronizing Gray-coded pointers is therefore the way that the full/empty states are synchronized between the clock domains in an Async FIFO.

Gray-coding a binary value can be done very easily with just a single shift and an XOR operation:


reg [ADDR_WIDTH-1:0] wr_addr;
wire [ADDR_WIDTH-1:0] wr_addr_gray = (wr_addr >> 1) ^ wr_addr;

reg [ADDR_WIDTH-1:0] rd_addr;
wire [ADDR_WIDTH-1:0] rd_addr_gray = (rd_addr >> 1) ^ rd_addr;

These values are then put through a series of synchronizers to reach the other clock domain:


reg [ADDR_WIDTH-1:0] wr_addr_gray_sync[WR_TO_RD_SYNC_STAGES-1:0];
always @(posedge wr_clock) begin
    rd_addr_gray_sync[0] <= rd_addr_gray;
    for (i = 1; i < RD_TO_WR_SYNC_STAGES; i = i + 1)
        rd_addr_gray_sync[i] <= rd_addr_gray_sync[i - 1];
end

reg [ADDR_WIDTH-1:0] rd_addr_gray_sync[RD_TO_WR_SYNC_STAGES-1:0];
always @(posedge rd_clock) begin
    wr_addr_gray_sync[0] <= wr_addr_gray;
    for (i = 1; i < WR_TO_RD_SYNC_STAGES; i = i + 1)
        wr_addr_gray_sync[i] <= wr_addr_gray_sync[i - 1];
end

When the read/write pointers are equal (as they are after a reset), there is no more data to be read, and so the FIFO is marked as empty. When the write pointer is one less than the read pointer, it's time to stop writing lest the queue be marked as full in the next clock, and so in this situation the FIFO is marked as full. The following verilog code illustrates this:


assign empty = rd_addr_gray == wr_addr_gray_sync[WR_TO_RD_SYNC_STAGES-1];
assign full = wr_addr_next_gray == rd_addr_gray_sync[RD_TO_WR_SYNC_STAGES-1];

Finally, on each clock, if the read/write is enabled, and the queue is not empty/full respectively, the respective pointer will increment.


wire [ADDR_WIDTH-1:0] wr_addr_next = wr_addr + 1;
wire [ADDR_WIDTH-1:0] rd_addr_next = rd_addr + 1;

always @(posedge wr_clock)
    if (wr_enable && !full) begin
        data[wr_addr] <= wr_data;
        wr_addr <= wr_addr_next;
    end

always @(posedge rd_clock)
    if (rd_enable && !empty)
        rd_addr <= rd_addr_next;

The full code for this Asyncronous FIFO may be found here.