-1

The design

I'm trying to implement a RGB to YUV444 conversion algorithm in hardware based in the next approximation I've got working in a C based program:

#define CLIP(X) ( (X) > 255 ? 255 : (X) < 0 ? 0 : X)
#define RGB2Y(R, G, B) CLIP(( (  66 * (R) + 129 * (G) +  25 * (B) + 128) >> 8) +  16)
#define RGB2U(R, G, B) CLIP(( ( -38 * (R) -  74 * (G) + 112 * (B) + 128) >> 8) + 128)
#define RGB2V(R, G, B) CLIP(( ( 112 * (R) -  94 * (G) -  18 * (B) + 128) >> 8) + 128)

Simulation/verification done so far

I simulated and verified the VHDL code using a self-checking approach. I compared the VHDL output against a 'golden' reference YUV444 values generated by the know working C algorithm using several images and all simulations run successfully.

The problem

When I implemented it in hardware and inserted within a video pipeline, the video output looks fine in terms of sync and there are not obvious issues in the frame rate, flickering...etc but the problem is that the colours are not right, there's a magenta/purple cast, e.g. yellow will look light magenta and red a bit darker...etc

I'm thinking that it is likely that the UV values are clipped/saturated and Y (luminance) works fine and this is what I'm seeing in the resulting video.

The code

Please note that for simplicity I only posted the part which is doing the conversion and the relevant signal declaration types and function. The rest of the code is just a axi video stream signal wrapper which works well in terms of video synchronisation and there are not video issues in that sense, let me know if you think it would be helpful and I'll post it too.

--Functions:

  -- Absolute operation of: "op1 - op2 + op3" for the UV components
  function uv_op_abs(op1 : unsigned(15 downto 0); op2 : unsigned(15 downto 0); op3 : unsigned(15 downto 0))
  return unsigned is
    variable res1 : unsigned(15 downto 0);
  begin
    if op2 > op1 then
      res1 := (op2 - op1);
      if res1 > op3 then
        return res1 - op3;
      else 
        return op3 - res1;
      end if;
    else
      return (op1 - op2) + op3;
    end if;
  end uv_op_abs;

 function clip(mult_in : unsigned(15 downto 0))
  return unsigned is
  begin
    if to_integer(unsigned(mult_in)) > 240 then
      return unsigned(to_unsigned(240,8));
    else
      return unsigned(mult_in(7 downto 0));
    end if;
  end clip;

-- Signals/Constants declarations:

  --Constants
  constant coeff_0 : unsigned(7 downto 0) := "01000010";  --66
  constant coeff_1 : unsigned(7 downto 0) := "10000001";  --129  
  constant coeff_2 : unsigned(7 downto 0) := "00011001";  --25 
  constant coeff_3 : unsigned(7 downto 0) := "00100110";  --38
  constant coeff_4 : unsigned(7 downto 0) := "01001010";  --74
  constant coeff_5 : unsigned(7 downto 0) := "01110000";  --112
  constant coeff_6 : unsigned(7 downto 0) := "01011110";  --94
  constant coeff_7 : unsigned(7 downto 0) := "00010010";  --18
  constant coeff_8 : unsigned(7 downto 0) := "10000000";  --128
  constant coeff_9 : unsigned(7 downto 0) := "00010000";  --16

  --Pipeline registers
  signal red_reg : unsigned(7 downto 0);
  signal green_reg : unsigned(7 downto 0);
  signal blue_reg : unsigned(7 downto 0);

  signal y_red_reg_op1 : unsigned(15 downto 0);
  signal y_green_reg_op1 : unsigned(15 downto 0);
  signal y_blue_reg_op1 : unsigned(15 downto 0);

  signal u_red_reg_op1 : unsigned(15 downto 0);
  signal u_green_reg_op1 : unsigned(15 downto 0);
  signal u_blue_reg_op1 : unsigned(15 downto 0);

  signal v_red_reg_op1 : unsigned(15 downto 0);
  signal v_green_reg_op1 : unsigned(15 downto 0);
  signal v_blue_reg_op1 : unsigned(15 downto 0);

  signal y_reg_op2 : unsigned(15 downto 0);
  signal u_reg_op2 : unsigned(15 downto 0);
  signal v_reg_op2 : unsigned(15 downto 0);

  signal y_reg_op3 : unsigned(7 downto 0);
  signal u_reg_op3 : unsigned(7 downto 0);
  signal v_reg_op3 : unsigned(7 downto 0);

-- The process for YUV444 conversion:

  RGB_YUV_PROC : process(clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        red_reg <= (others => '0');
        green_reg <= (others => '0');
        blue_reg <= (others => '0');
        y_red_reg_op1 <= (others => '0');
        y_green_reg_op1 <= (others => '0');
        y_blue_reg_op1 <= (others => '0');
        u_red_reg_op1 <= (others => '0');
        u_green_reg_op1 <= (others => '0');
        u_blue_reg_op1 <= (others => '0');
        v_red_reg_op1 <= (others => '0');
        v_green_reg_op1 <= (others => '0');
        v_blue_reg_op1 <= (others => '0');
        y_reg_op2 <= (others => '0');
        u_reg_op2 <= (others => '0');
        v_reg_op2 <= (others => '0');
        y_reg_op3 <= (others => '0');
        u_reg_op3 <= (others => '0');
        v_reg_op3 <= (others => '0');
        yuv444_out <= (others => '0');
        soff_sync <= '0';
      else

        --Sync with first video frame with the tuser (sof) input signal
        if rgb_sof_in = '1' then
          soff_sync <= '1';
        end if;

        --Fetch a pixel
        if (rgb_sof_in = '1' or soff_sync = '1') and rgb_valid_in = '1' and yuv444_ready_out = '1' and bypass = '0' then
          green_reg <= unsigned(rgb_in(7 downto 0));
          blue_reg <= unsigned(rgb_in(15 downto 8));
          red_reg <= unsigned(rgb_in(23 downto 16));
        end if;

        -- RGB to YUV conversion
        -- Y--> CLIP(( (  66 * (R) + 129 * (G) +  25 * (B) + 128) >> 8) +  16)
        -- U--> CLIP(( ( -38 * (R) -  74 * (G) + 112 * (B) + 128) >> 8) + 128)
        -- V--> CLIP(( ( 112 * (R) -  94 * (G) -  18 * (B) + 128) >> 8) + 128)
        if (rgb_sof_in = '1' or soff_sync = '1') and (valid_delay = '1' or validff1 = '1') and yuv444_ready_out = '1' and bypass = '0' then
          --Y calc (  66 * (R) + 129 * (G) +  25 * (B) + 128) >> 8) +  16)
          y_red_reg_op1 <= coeff_0 * red_reg;
          y_green_reg_op1 <= coeff_1 * green_reg;
          y_blue_reg_op1 <= coeff_2 * blue_reg; 
          y_reg_op2 <=  y_red_reg_op1 + y_green_reg_op1 + y_blue_reg_op1 + (X"00" & coeff_8);
          y_reg_op3 <= (y_reg_op2(15 downto 8) + coeff_9);

          --U calc ( -38 * (R) -  74 * (G) + 112 * (B) + 128) >> 8) + 128)
          u_red_reg_op1 <= coeff_3 * red_reg;
          u_green_reg_op1 <= coeff_4 * green_reg;
          u_blue_reg_op1 <= coeff_5 * blue_reg;
          u_reg_op2 <= uv_op_abs(u_blue_reg_op1, (u_red_reg_op1 + u_green_reg_op1), (X"00" & coeff_8));
          u_reg_op3 <= (u_reg_op2(15 downto 8) + coeff_8);

          --V calc ( 112 * (R) -  94 * (G) -  18 * (B) + 128) >> 8) + 128)
          v_red_reg_op1 <= coeff_5 * red_reg;
          v_green_reg_op1 <= coeff_6 * green_reg;
          v_blue_reg_op1 <= coeff_7 * blue_reg;
          v_reg_op2 <= uv_op_abs(v_red_reg_op1, (v_blue_reg_op1 + v_green_reg_op1), (X"00" & coeff_8));
          v_reg_op3 <= (v_reg_op2(15 downto 8) + coeff_8);

          --Output data
          yuv444_out <= std_logic_vector(v_reg_op3) & std_logic_vector(u_reg_op3) & std_logic_vector(y_reg_op3);
        elsif yuv444_ready_out = '1' and rgb_valid_in = '1' and bypass = '1' then
          yuv444_out <= rgb_in;
        end if;

      end if;
    end if;
  end process; -- RGB_YUV_PROC

I've also tried adding the 'clip; function to control overflow thinking that it'll help to the the synthesis tool in case of 'clipping' but it didn't help, the issue persists:

if (rgb_sof_in = '1' or soff_sync = '1') and (valid_delay = '1' or validff1 = '1') and yuv444_ready_out = '1' and bypass = '0' then
  --Y calc (  66 * (R) + 129 * (G) +  25 * (B) + 128) >> 8) +  16)
  y_red_reg_op1 <= coeff_0 * red_reg;
  y_green_reg_op1 <= coeff_1 * green_reg;
  y_blue_reg_op1 <= coeff_2 * blue_reg; 
  y_reg_op2 <=  y_red_reg_op1 + y_green_reg_op1 + y_blue_reg_op1 + (X"00" & coeff_8);
  y_reg_op3 <= clip( X"00" & (y_reg_op2(15 downto 8) + coeff_9));

  --U calc ( -38 * (R) -  74 * (G) + 112 * (B) + 128) >> 8) + 128)
  u_red_reg_op1 <= coeff_3 * red_reg;
  u_green_reg_op1 <= coeff_4 * green_reg;
  u_blue_reg_op1 <= coeff_5 * blue_reg;
  u_reg_op2 <= uv_op_abs(u_blue_reg_op1, (u_red_reg_op1 + u_green_reg_op1), (X"00" & coeff_8));
  u_reg_op3 <= clip( X"00" & (u_reg_op2(15 downto 8) + coeff_8));

  --V calc ( 112 * (R) -  94 * (G) -  18 * (B) + 128) >> 8) + 128)
  v_red_reg_op1 <= coeff_5 * red_reg;
  v_green_reg_op1 <= coeff_6 * green_reg;
  v_blue_reg_op1 <= coeff_7 * blue_reg;
  v_reg_op2 <= uv_op_abs(v_red_reg_op1, (v_blue_reg_op1 + v_green_reg_op1), (X"00" & coeff_8));
  v_reg_op3 <= clip( X"00"& (v_reg_op2(15 downto 8) + coeff_8));

Question

I'm aware that in hardware design a successful simulation doesn't necessarily mean that the the design is going to work in the same way after synthesis and I'm sure there are lots of things can be improved in the code. There's something fundamentally wrong in this approach but I can't see it so far, does anyone have an idea what could be wrong and why?

joe
  • 175
  • 1
  • 12
  • Does the simulation test the full range of YCbCr? (I think you mean YCbCr, as YUV is analog video) Does the saturation occur in the simulation when you check the full range? – Tricky May 14 '21 at 12:30
  • Yes that's right sorry I meant YCbCr....The problem is that it happens always regardless the light conditions and one of the experiments was to feed the testbench with the rgb values from a still image taken with the camera and compare results with the software conversion 'golden' values, the verification completed succesfully but testing the hardware implementation pointing the camera to the same view is giving the color cast issue – joe May 14 '21 at 16:37
  • How do you know the problem doesn't also exist in the golden reference? Did you convert the test results to a bitmap (or similar) to visually see the results? – Tricky May 15 '21 at 06:34

1 Answers1

0

First things first: check if you're using YUV or YCbCr. Those are often confused and not the same!!! Don't mix them.

Then I see:

#define CLIP(X) ( (X) > 255 ? 255 : (X) < 0 ? 0 : X)

and

function clip(mult_in : unsigned(15 downto 0))
  return unsigned is
  begin
    if to_integer(unsigned(mult_in)) > 240 then
      return unsigned(to_unsigned(240,8));
    else
      return unsigned(mult_in(7 downto 0));
    end if;
  end clip;

Those are very different functions. The first uses a signed data type and clips between 255 and 0 and the second one only clips 240 positive, and possible arithmetic overflow due to subtraction will not be handled properly. For some reason you're using unsigned arithmetic throughout the code! (Why? What's wrong with signed?)

So you're already comparing apples with oranges.

Next it seems you are using a absolute function?! Why? That's not part of the original code at all. That will of course produce artifacts. You cant just flip the sign on negative values and expect them to be correct?

Also, please use proper naming. A constant value of 16 should not be named coeff_9. Makes the code hard to read and maintain. If you want flexibility, use a different structure altogether. A name like coeff_X tells you nothing: sure it's probably a coefficient, but what's it used for and such.

Actually, you can just write (note, I will already assume signed arithmetic)

y_red_reg_op1 <= red_reg * to_signed(66, 8);

or, because red_reg'length is already 8, even

y_red_reg_op1 <= red_reg * 66;

which is much easier to read

Then the code can become something like (again assuming you will be using signed)

  --U calc (( -38 * (R) -  74 * (G) + 112 * (B) + 128) >> 8) + 128)
  u_red_reg_op1 <= -38 * red_reg;
  u_green_reg_op1 <= 74 * green_reg;
  u_blue_reg_op1 <= 112 * blue_reg;
  u_reg_op2 <= u_red_reg_op1 - u_green_reg_op1 + u_blue_reg_op1 + 128;
  u_reg_op3 <= clip(shift_right(u_reg_op2, 8) + 128);

and clip should of course be

function clip(value : signed(15 downto 0))
  return signed is
begin
  if value > 255 then
    return to_signed(255, 8);
  elsif value < 0 then
    return to_signed(0, 8);
  else
    return value;
  end if;
end clip;

p.s. I'm hoping you're using numeric_std.

If all of this still produces artifacts, check if you haven't mixed up the RGB or YCbCr signal component order. That's a common mistake.

Final p.s. VHDL actually has a fixed-point library, with saturation logic. And it is supported by the big FPGA manufacturers. You could even consider using that to write a 'better' solution then the C one.

edit: I just read wikipedia and the whole algorithm doesn't require clipping or abs or any of that at all.

  1. Basic transform from 8-bit RGB to 16-bit values (Y′: unsigned, U/V: signed, matrix values got rounded so that the later-on desired Y′UV range of each [0..255] is reached whilst no overflow can happen):
  2. Scale down (">>8") to 8-bit values with rounding ("+128") (Y′: unsigned, U/V: signed):
  3. Add an offset to the values to eliminate any negative values (all results are 8-bit unsigned):

you should implement your algo's likewise

Will probably become something like

  --U calc ((- 38 * (R) -  74 * (G) + 112 * (B) + 128) >> 8) + 128)
  u_red_reg_op1 <= -38 * red_reg; -- 16 bit signed
  u_green_reg_op1 <= -74 * green_reg; -- 16 bit signed
  u_blue_reg_op1 <= 112 * blue_reg; -- 16 bit signed
  u_reg_op2 <= u_red_reg_op1 + u_green_reg_op1 + u_blue_reg_op1 + 128; -- 16 bit signed
  u_reg_op3 <= unsigned(resize(shift_right(u_reg_op2, 8), 8) + 128); -- 8 bit unsigned

(CHECK!)

And last be not least: you need better testbenches, that confirm the results of the 'golden' source with the output of your implementation.

JHBonarius
  • 6,889
  • 3
  • 12
  • 28
  • Thanks for your answer and suggestions, to be honest I was doubting whether to use signed or unsigned but my mistake was to try unsigned thinking it would be simpler (makeing it more complex at the end) and trusting on a not great testbench :) I think it might explain and solve the issues I'm having and the missmatch with the simulation. Agreed with the naming I generally try to make it readable but this was written as a quick prototype and if it worked I of course planned to rewrite them with better names....I'll apply the changes improve the testbench and see if that works – joe May 15 '21 at 11:20