
I am writing an app in Swift which employs the Scandit barcode scanning SDK. The SDK permits you to access camera frames directly and provides the frame as a CMSampleBuffer. They provide documentation in Objective-C, which I am having trouble getting to work in Swift. I do not know if the problem is in porting the code, or if there is something amiss with the sample buffer itself, perhaps due to a change in Core Media since their documentation was generated.

Their API exposes the frame as follows (Objective-C):

@interface YourViewController () <SBSProcessFrameDelegate>
...
- (void)barcodePicker:(SBSBarcodePicker*)barcodePicker
      didProcessFrame:(CMSampleBufferRef)frame
              session:(SBSScanSession*)session {
    // Process the frame yourself.
}

Building from several answers here on SO, I attempt to process the frame with:

let imageBuffer = CMSampleBufferGetImageBuffer(frame)!
CVPixelBufferLockBaseAddress(imageBuffer, 0)
let baseAddress = CVPixelBufferGetBaseAddress(imageBuffer)

let width = CVPixelBufferGetWidth(imageBuffer)
let height = CVPixelBufferGetHeight(imageBuffer)
let bytesPerRow = CVPixelBufferGetBytesPerRow(imageBuffer)

let colorSpace = CGColorSpaceCreateDeviceRGB()
let bitmapInfo = CGBitmapInfo(rawValue: CGImageAlphaInfo.NoneSkipFirst.rawValue | CGBitmapInfo.ByteOrder32Little.rawValue)
let context = CGBitmapContextCreate(baseAddress, width, height, 8, bytesPerRow, colorSpace, bitmapInfo.rawValue)

let quartzImage = CGBitmapContextCreateImage(context)
CVPixelBufferUnlockBaseAddress(imageBuffer,0)

let image = UIImage(CGImage: quartzImage!)

But, this fails with:

Jan 29 09:01:30  Scandit[1308] <Error>: CGBitmapContextCreate: invalid data bytes/row: should be at least 7680 for 8 integer bits/component, 3 components, kCGImageAlphaNoneSkipFirst.
Jan 29 09:01:30  Scandit[1308] <Error>: CGBitmapContextCreateImage: invalid context 0x0. If you want to see the backtrace, please set CG_CONTEXT_SHOW_BACKTRACE environmental variable.
fatal error: unexpectedly found nil while unwrapping an Optional value

The fatal error is in attempting to resolve a UIImage from quartzImage.

The width, height, and bytesPerRow are (at the base address):

Width: 1920
Height: 1080
Bytes per row: 2904
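
The first error message follows from simple arithmetic: a bitmap context with 8 bits per component and a skipped alpha byte needs 4 bytes for every pixel in a row. A quick check using only the numbers reported above (the constant names are mine, for illustration):

```swift
// Numbers taken from the question; nothing here touches Core Graphics.
let width = 1920
let bytesPerPixel = 4            // 3 color components + 1 skipped byte, 8 bits each
let reportedBytesPerRow = 2904   // value returned by CVPixelBufferGetBytesPerRow

let requiredBytesPerRow = width * bytesPerPixel
print("required: \(requiredBytesPerRow), reported: \(reportedBytesPerRow)")
// prints "required: 7680, reported: 2904"
```

So the buffer cannot be a packed 32-bit RGB image, whatever else it turns out to be.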

As passed from the delegate, here is what the buffer contains according to CMSampleBufferGetFormatDescription(frame):

Optional(<CMVideoFormatDescription 0x1447dafa0 [0x1a1864b68]> {
    mediaType:'vide' 
    mediaSubType:'420f' 
    mediaSpecific: {
        codecType: '420f'       dimensions: 1920 x 1080 
    } 
    extensions: {<CFBasicHash 0x1447dba10 [0x1a1864b68]>{type = immutable dict, count = 6,
entries =>
    0 : <CFString 0x19d28b678 [0x1a1864b68]>{contents = "CVImageBufferYCbCrMatrix"} = <CFString 0x19d28b6b8 [0x1a1864b68]>{contents = "ITU_R_601_4"}
    1 : <CFString 0x19d28b7d8 [0x1a1864b68]>{contents = "CVImageBufferTransferFunction"} = <CFString 0x19d28b698 [0x1a1864b68]>{contents = "ITU_R_709_2"}
    2 : <CFString 0x19d2b65c0 [0x1a1864b68]>{contents = "CVBytesPerRow"} = <CFNumber 0xb00000000000b582 [0x1a1864b68]>{value = +2904, type = kCFNumberSInt32Type}
    3 : <CFString 0x19d2b6640 [0x1a1864b68]>{contents = "Version"} = <CFNumber 0xb000000000000022 [0x1a1864b68]>{value = +2, type = kCFNumberSInt32Type}
    5 : <CFString 0x19d28b758 [0x1a1864b68]>{contents = "CVImageBufferColorPrimaries"} = <CFString 0x19d28b698 [0x1a1864b68]>{contents = "ITU_R_709_2"}
    6 : <CFString 0x19d28b818 [0x1a1864b68]>{contents = "CVImageBufferChromaLocationTopField"} = <CFString 0x19d28b878 [0x1a1864b68]>{contents = "Center"}
}
}
})
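
The mediaSubType '420f' in this dump is a four-character code. In Core Video it corresponds to kCVPixelFormatType_420YpCbCr8BiPlanarFullRange, that is, bi-planar full-range YCbCr 4:2:0 rather than packed RGB. A small sketch for turning such a code into readable text (fourCCString is a hypothetical helper name, not part of any SDK):

```swift
// Turn a four-character code (an OSType, i.e. a UInt32) into readable text.
// `fourCCString` is an illustrative helper, not an Apple or Scandit API.
func fourCCString(_ code: UInt32) -> String {
    // Extract the four bytes from most significant to least significant.
    let bytes = [24, 16, 8, 0].map { UInt8((code >> UInt32($0)) & 0xFF) }
    return String(decoding: bytes, as: UTF8.self)
}

// 0x34323066 is the numeric value of the characters '4', '2', '0', 'f'.
let subType: UInt32 = 0x34323066
print(fourCCString(subType))   // prints "420f"
```

On a device, the value to decode would come from CVPixelBufferGetPixelFormatType(imageBuffer).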

I realize there may be multiple "planes" here, but querying each plane separately:

let pixelBufferBytesPerRow0 = CVPixelBufferGetBytesPerRowOfPlane(imageBuffer, 0)
let pixelBufferBytesPerRow1 = CVPixelBufferGetBytesPerRowOfPlane(imageBuffer, 1)

gives:

Pixel buffer bytes per row (Plane 0): 1920
Pixel buffer bytes per row (Plane 1): 1920

I don't understand that discrepancy.

I also attempted to process each pixel individually, as it is clear the buffer contains some manner of YCbCr, but every approach I have tried fails. The Scandit API documentation suggests (Objective-C):

// Get the buffer info for the YCbCrBiPlanar format.
void *baseAddress = CVPixelBufferGetBaseAddress(imageBuffer);
CVPlanarPixelBufferInfo_YCbCrBiPlanar *bufferInfo = (CVPlanarPixelBufferInfo_YCbCrBiPlanar *)baseAddress;

But, I cannot find a Swift implementation that permits access to the buffer info using CVPlanarPixelBufferInfo... everything I have tried fails, so I am unable to determine the offset for "Y", "Cr", etc.
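
For reference, the CoreVideo header declares CVPlanarPixelBufferInfo_YCbCrBiPlanar as two entries (Y, then CbCr), each holding a big-endian 32-bit offset followed by a big-endian 32-bit rowBytes. The sketch below parses that layout from a plain byte array so it runs without Core Video. PlaneInfo, readBigEndian32 and parseBiPlanarHeader are hypothetical names, and whether the base address really points at such a header is what the Scandit snippet implies, not something verified here:

```swift
// Layout assumed from the CoreVideo header: each plane entry is a
// big-endian Int32 offset followed by a big-endian UInt32 rowBytes.
struct PlaneInfo {
    let offset: Int
    let rowBytes: Int
}

// Read one big-endian 32-bit value starting at `index`.
func readBigEndian32(_ bytes: [UInt8], at index: Int) -> UInt32 {
    return bytes[index..<index + 4].reduce(UInt32(0)) { ($0 << 8) | UInt32($1) }
}

// Parse the 16-byte bi-planar header: Y entry first, then CbCr entry.
func parseBiPlanarHeader(_ header: [UInt8]) -> (y: PlaneInfo, cbcr: PlaneInfo) {
    precondition(header.count >= 16, "header must hold two 8-byte entries")
    func plane(at index: Int) -> PlaneInfo {
        let offset = Int32(bitPattern: readBigEndian32(header, at: index))
        let rowBytes = readBigEndian32(header, at: index + 4)
        return PlaneInfo(offset: Int(offset), rowBytes: Int(rowBytes))
    }
    return (plane(at: 0), plane(at: 8))
}

// Made-up header bytes for illustration only.
let header: [UInt8] = [0, 0, 0, 16,  0, 0, 7, 128,  0, 0, 1, 0,  0, 0, 7, 128]
let info = parseBiPlanarHeader(header)
print(info.y.offset, info.y.rowBytes, info.cbcr.offset, info.cbcr.rowBytes)
```

When working with the imported struct directly, the same byte swap can be done with Int32(bigEndian:) on each field. Using the per-plane Core Video accessors avoids the header entirely.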

How can I access the pixel data in the buffer? Is this a problem with the CMSampleBuffer the SDK is passing, a problem with iOS 9, or both?

ph0t0n
  • Have a look at http://stackoverflow.com/questions/34569750/get-pixel-value-from-cvpixelbufferref-in-swift/34570127#34570127 – Codo Jan 29 '16 at 14:30

2 Answers


Working from Codo's "hints" and integrating with Objective-C code in the Scandit documentation, I worked out a solution in Swift. Though I accepted Codo's answer as it helped tremendously, I'm also answering my own question in the hopes that a complete solution would help someone in the future:

let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer)!
CVPixelBufferLockBaseAddress(pixelBuffer, 0)
let lumaBaseAddress = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0)
let chromaBaseAddress = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 1)

let width = CVPixelBufferGetWidth(pixelBuffer)
let height = CVPixelBufferGetHeight(pixelBuffer)

let lumaBytesPerRow = CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 0)
let chromaBytesPerRow = CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 1)
let lumaBuffer = UnsafeMutablePointer<UInt8>(lumaBaseAddress)
let chromaBuffer = UnsafeMutablePointer<UInt8>(chromaBaseAddress)

var rgbaImage = [UInt8](count: 4*width*height, repeatedValue: 0)
for var x = 0; x < width; x++ {
    for var y = 0; y < height; y++ {
        let lumaIndex = x+y*lumaBytesPerRow
        let chromaIndex = (y/2)*chromaBytesPerRow+(x/2)*2
        let yp = lumaBuffer[lumaIndex]
        let cb = chromaBuffer[chromaIndex]
        let cr = chromaBuffer[chromaIndex+1]

        let ri = Double(yp)                                + 1.402   * (Double(cr) - 128)
        let gi = Double(yp) - 0.34414 * (Double(cb) - 128) - 0.71414 * (Double(cr) - 128)
        let bi = Double(yp) + 1.772   * (Double(cb) - 128)

        let r = UInt8(min(max(ri,0), 255))
        let g = UInt8(min(max(gi,0), 255))
        let b = UInt8(min(max(bi,0), 255))

        rgbaImage[(x + y * width) * 4] = b
        rgbaImage[(x + y * width) * 4 + 1] = g
        rgbaImage[(x + y * width) * 4 + 2] = r
        rgbaImage[(x + y * width) * 4 + 3] = 255
    }
}

let colorSpace = CGColorSpaceCreateDeviceRGB()
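// Copy the pixels into an NSData so they outlive this scope; a raw pointer to the Swift array is only guaranteed valid for the duration of the call.
let dataProvider: CGDataProviderRef = CGDataProviderCreateWithCFData(NSData(bytes: rgbaImage, length: rgbaImage.count))!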
let bitmapInfo = CGBitmapInfo(rawValue: CGImageAlphaInfo.NoneSkipFirst.rawValue | CGBitmapInfo.ByteOrder32Little.rawValue)
let cgImage: CGImageRef = CGImageCreate(width, height, 8, 32, width * 4, colorSpace!, bitmapInfo, dataProvider, nil, true, CGColorRenderingIntent.RenderingIntentDefault)!
let image: UIImage = UIImage(CGImage: cgImage)
CVPixelBufferUnlockBaseAddress(pixelBuffer,0)

Despite iterating over every pixel of the 1920 x 1080 frame (about 2.1 MP, producing roughly 8.3 MB of output), the code executes very quickly. I freely admit that I don't have a deep understanding of the Core Media frameworks. I first assumed this speed meant the code was executing on the GPU, but a plain Swift loop like this one runs on the CPU. I would appreciate any comments on making the code more efficient, or on improving its "Swiftness", as I am a complete amateur.

ph0t0n
  • To speed up the code I see two main things: (1) check whether "Float" instead of "Double" results in an improvement. (2) Get rid of all the integer multiplication. As you're going sequentially through the memory, you should be able to work with increments and additions instead of multiplications. – Codo Feb 02 '16 at 09:07
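
One way Codo's two suggestions might look is sketched below, in current Swift syntax rather than the Swift 2 used in the answer. It uses Float, computes each row's start once instead of multiplying per pixel, and walks rows in the outer loop so memory is read sequentially. The function name and the array-based inputs are assumptions for illustration; on a device the planes would come from CVPixelBufferGetBaseAddressOfPlane, and whether this is actually faster would need measuring:

```swift
// Convert bi-planar full-range YCbCr (4:2:0) to BGRA bytes.
// Inputs are plain arrays so the sketch runs without Core Video.
func convertToBGRA(luma: [UInt8], lumaBytesPerRow: Int,
                   chroma: [UInt8], chromaBytesPerRow: Int,
                   width: Int, height: Int) -> [UInt8] {
    // Pre-fill with 255 so the fourth byte of every pixel is already set.
    var out = [UInt8](repeating: 255, count: 4 * width * height)
    var outIndex = 0
    for y in 0..<height {
        // Row starts are computed once per row, not once per pixel.
        let lumaRow = y * lumaBytesPerRow
        let chromaRow = (y / 2) * chromaBytesPerRow
        for x in 0..<width {
            let yp = Float(luma[lumaRow + x])
            // (x & ~1) equals (x / 2) * 2 for non-negative x.
            let chromaIndex = chromaRow + (x & ~1)
            let cb = Float(chroma[chromaIndex]) - 128
            let cr = Float(chroma[chromaIndex + 1]) - 128

            // Same coefficients as in the answer above.
            let r = yp + 1.402 * cr
            let g = yp - 0.34414 * cb - 0.71414 * cr
            let b = yp + 1.772 * cb

            out[outIndex]     = UInt8(min(max(b, 0), 255))
            out[outIndex + 1] = UInt8(min(max(g, 0), 255))
            out[outIndex + 2] = UInt8(min(max(r, 0), 255))
            outIndex += 4
        }
    }
    return out
}

// A 2 x 2 mid-gray test image with two bytes of padding per luma row.
let testLuma: [UInt8] = [100, 100, 0, 0, 100, 100, 0, 0]
let testChroma: [UInt8] = [128, 128, 0, 0]
let bgra = convertToBGRA(luma: testLuma, lumaBytesPerRow: 4,
                         chroma: testChroma, chromaBytesPerRow: 4,
                         width: 2, height: 2)
print(bgra)
```

With neutral chroma (128, 128) every output pixel should have equal blue, green and red values, which makes a convenient sanity check.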

This is not a complete answer, just some hints:

Scandit uses the YCbCrBiPlanar format. It has a Y byte for each pixel and a Cb and a Cr byte for each group of 2x2 pixels. The Y values are on the first plane, the Cb and Cr values on the second plane.

If the image is w x h pixels large, then the first plane contains h rows of w bytes (and maybe some padding for each line).

The second plane contains h / 2 lines of w / 2 pairs of bytes. Each pair consists of a Cb and a Cr value. Again, each line might have some padding at the end.

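So the value of Y for the pixel at position (x, y) can be found at the following address (Plane1 and Plane2 here denote the first and second planes, which the Core Video functions index as 0 and 1):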

Y: baseAddressPlane1 + y * bytesPerRowPlane1 + x

And the value Cb and Cr for the pixel at position (x, y) can be found at the address:

Cb: baseAddressPlane2 + (y / 2) * bytesPerRowPlane2 + (x / 2) * 2

Cr: baseAddressPlane2 + (y / 2) * bytesPerRowPlane2 + (x / 2) * 2 + 1

The divisions by 2 are integer divisions that discard the fractional part.
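
As a rough sketch, the address formulas above can be written as two small Swift functions that return offsets relative to each plane's base address. The function names are mine, and the row size of 1920 bytes is the value reported in the question:

```swift
// Byte offset of the Y sample for pixel (x, y) within the luma plane.
func lumaOffset(x: Int, y: Int, bytesPerRow: Int) -> Int {
    return y * bytesPerRow + x
}

// Byte offset of the Cb sample within the interleaved CbCr plane.
// The matching Cr sample is the next byte (offset + 1).
func chromaOffset(x: Int, y: Int, bytesPerRow: Int) -> Int {
    // Integer division discards the fractional part.
    return (y / 2) * bytesPerRow + (x / 2) * 2
}

let rowBytes = 1920
print(lumaOffset(x: 3, y: 2, bytesPerRow: rowBytes))     // prints 3843
print(chromaOffset(x: 2, y: 2, bytesPerRow: rowBytes))   // prints 1922
print(chromaOffset(x: 3, y: 3, bytesPerRow: rowBytes))   // prints 1922
```

The last two lines show that all four pixels of a 2x2 block share the same Cb/Cr pair.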

Codo
  • I've made a small update to the formula for Cb and Cr. – Codo Jan 29 '16 at 15:32
  • OK, this helps significantly. I am able to access the luma data. Iterating over each pixel takes quite some time. Though I ultimately want to process the image on a pixel-wise basis, I'd be happy at this point just getting a UIImage out of the buffer. Referring to the code in my original question, I can now successfully get an image context using CGBitmapContextCreate, but the resulting image is a mix of the luma and chroma components repeated 4 times... which I'm sure has to do with striding improperly over the array, and the fact that ARGB has 4 bytes per pixel... – ph0t0n Jan 29 '16 at 19:10