
I'm writing some code to render a camera preview using SkiaSharp. The code is cross-platform, but I ran into a problem while writing the Android implementation.

I needed to convert YUV_420_888 to RGB8888 because that's what SkiaSharp supports, and with the help of this thread I managed to show decent-quality images on my SkiaSharp canvas. The problem is speed. At best I get about 8 fps, but usually it's just 4 or 5 fps. The biggest factor turned out to be the conversion. I now have about 3 versions of my ToRgb converter, and I've even tried "unsafe" code and parallel loops. Here is my best one so far.

private unsafe byte[] ToRgb(byte[] yValuesArr, byte[] uValuesArr,
        byte[] vValuesArr, int uvPixelStride, int uvRowStride)
{
        var width = PixelSize.Width;
        var height = PixelSize.Height;
        var rgb = new byte[width * height * 4];

        var partitions = Partitioner.Create(0, height);
        Parallel.ForEach(partitions, range =>
        {
            var (item1, item2) = range;
            Parallel.For(item1, item2, y =>
            {
                for (var x = 0; x < width; x++)
                {
                    var yIndex = x + width * y;
                    var currentPosition = yIndex * 4;
                    var uvIndex = uvPixelStride * (x / 2) + uvRowStride * (y / 2);

                    fixed (byte* rgbFixed = rgb)
                    fixed (byte* yValuesFixed = yValuesArr)
                    fixed (byte* uValuesFixed = uValuesArr)
                    fixed (byte* vValuesFixed = vValuesArr)
                    {
                        var rgbPtr = rgbFixed;
                        var yValues = yValuesFixed;
                        var uValues = uValuesFixed;
                        var vValues = vValuesFixed;

                        var yy = *(yValues + yIndex);
                        var uu = *(uValues + uvIndex);
                        var vv = *(vValues + uvIndex);

                        var rTmp = yy + vv * 1436 / 1024 - 179;
                        var gTmp = yy - uu * 46549 / 131072 + 44 - vv * 93604 / 131072 + 91;
                        var bTmp = yy + uu * 1814 / 1024 - 227;

                        rgbPtr = rgbPtr + currentPosition;
                        *rgbPtr = (byte) (rTmp < 0 ? 0 : rTmp > 255 ? 255 : rTmp);
                        rgbPtr++;

                        *rgbPtr = (byte) (gTmp < 0 ? 0 : gTmp > 255 ? 255 : gTmp);
                        rgbPtr++;

                        *rgbPtr = (byte) (bTmp < 0 ? 0 : bTmp > 255 ? 255 : bTmp);
                        rgbPtr++;

                        *rgbPtr = 255;
                    }
                }
            });
        });

        return rgb;
}

You can also find it on my repo, along with the part where I render the output to SkiaSharp.
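As a reading aid for the constants in the code above: they appear to be a fixed-point approximation of the full-range BT.601 (JFIF) YUV-to-RGB conversion. That is an inference from the numbers, not something stated in the post. In floating-point form, with Y, U and V as byte values, the conversion would be:

```latex
\begin{aligned}
R &= Y + 1.402\,(V - 128) \\
G &= Y - 0.344\,(U - 128) - 0.714\,(V - 128) \\
B &= Y + 1.772\,(U - 128)
\end{aligned}
```

Under that reading, 1436/1024 ≈ 1.402 with offset 179 ≈ 1.402 · 128, 93604/131072 ≈ 0.714 with offset 91 ≈ 0.714 · 128, and 1814/1024 ≈ 1.772 with offset 227 ≈ 1.772 · 128. The green U term is the odd one out: 46549/131072 ≈ 0.355 is slightly larger than 0.344, while its offset 44 matches 0.344 · 128.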

For a preview size of 1440x1080 on my phone, this code takes about 120 ms per frame. Even if every other part were optimized, the most I could get from that is about 8 fps. And no, it's not my hardware, because the built-in camera app runs smoothly. By the way, 1440x1080 is the output of the ChooseOptimalSize algorithm that I took from the mono-droid examples for Android's Camera2 API. I don't know whether it's the best approach, or whether it is missing logic to detect the frame rate and scale down the preview to make it faster.

brendt
  • In line 176 are you spawning a thread from within a thread? – BHawk Jan 25 '19 at 10:43
  • Yes. I have a version with just a nested loop, and a version that only has the partitions. I only chose the fastest one based on my benchmarks. – brendt Jan 25 '19 at 11:04
  • I know nothing about SkiaSharp but wondered about replacing the 4 writes to `*rgbPtr` with writing to a local array of 4 bytes (where the last byte is set to 255 outside the loop), and moving that in one go with a single 32-bit write to your output array. That might save incrementing the pointer 4 times, and do a single 32-bit write instead of 4 single-byte writes and avoid writing the 255 for each pixel. – Mark Setchell Jan 25 '19 at 11:25
  • I don't understand how I would write the byte outside the loop. Before the loop or after? Consider this byte array: `r0, g0, b0, __, r1, g1, b1, __, ...`. How would you fill all those blanks with 255 without iterating over them? – brendt Jan 25 '19 at 16:14
  • In C, you would do this outside the loop `unsigned char RGBA[4]={0,0,0,255}` and inside the loop you would do `RGBA[0]=...; RGBA[1]=...; RGBA[2]=...;` leaving `RGBA[3]` untouched for all subsequent pixels. Then move the full 32-bits of RGBA in one go. – Mark Setchell Jan 25 '19 at 16:28
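The single 32-bit write suggested in the comments could look roughly like this in C#. This is a minimal sketch, not a measured improvement: the class name `YuvSketch`, the method name `ToRgba`, and the explicit `width`/`height` parameters (instead of `PixelSize`) are hypothetical, and the byte packing assumes a little-endian CPU. It also pins the arrays once per row rather than once per pixel and uses a single `Parallel.For`. The arithmetic is copied unchanged from the question.

```csharp
using System;
using System.Threading.Tasks;

public static class YuvSketch
{
    // Clamp an int into 0..255 and return it as uint so it can be shifted and OR-ed.
    private static uint Clamp(int v) => (uint)(v < 0 ? 0 : v > 255 ? 255 : v);

    // Hypothetical variant of ToRgb from the question; width and height are parameters here.
    public static unsafe byte[] ToRgba(byte[] yArr, byte[] uArr, byte[] vArr,
        int width, int height, int uvPixelStride, int uvRowStride)
    {
        var rgba = new byte[width * height * 4];

        Parallel.For(0, height, row =>
        {
            // Pin the arrays once per row instead of once per pixel.
            fixed (byte* yBase = yArr, uBase = uArr, vBase = vArr, outBase = rgba)
            {
                var yPtr = yBase + row * width;
                var uvRow = uvRowStride * (row / 2);
                var outPtr = (uint*)outBase + row * width; // one uint per output pixel

                for (var x = 0; x < width; x++)
                {
                    var uvIndex = uvRow + uvPixelStride * (x / 2);
                    int yy = yPtr[x], uu = uBase[uvIndex], vv = vBase[uvIndex];

                    // Same integer arithmetic as in the question.
                    var r = Clamp(yy + vv * 1436 / 1024 - 179);
                    var g = Clamp(yy - uu * 46549 / 131072 + 44 - vv * 93604 / 131072 + 91);
                    var b = Clamp(yy + uu * 1814 / 1024 - 227);

                    // Assumes little-endian: bytes land in memory as R, G, B, A (A = 255).
                    outPtr[x] = r | (g << 8) | (b << 16) | 0xFF000000u;
                }
            }
        });

        return rgba;
    }
}
```

The constant `0xFF000000u` plays the role of the `RGBA[3]` byte that is set once in the C version: the alpha value is folded into the packed word, so no separate store is needed for it. Like the original, this has to be compiled with unsafe code enabled.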

1 Answer


Does SkiaSharp support GPU drawing? If you connect the camera to a SurfaceTexture, you can use the preview frames as GL textures and render them efficiently into an OpenGL scene.

Even if not, you may still get faster results by sending the frames to the GPU and reading them back to the CPU with something like glReadPixels, since the GPU then does the RGB conversion for you.

Eddy Talvala