101

With Swift 3 leaning towards Data instead of [UInt8], I'm trying to ferret out the most efficient/idiomatic way to encode/decode Swift's various number types (UInt8, Double, Float, Int64, etc.) as Data objects.

There's this answer for using [UInt8], but it seems to be using various pointer APIs that I can't find on Data.

I'd basically like to write some custom extensions that look something like:

let input = 42.13 // implicit Double
let bytes = input.data
let roundtrip = bytes.to(Double) // --> 42.13

The part that really eludes me, I've looked through a bunch of the docs, is how I can get some sort of pointer thing (OpaquePointer or BufferPointer or UnsafePointer?) from any basic struct (which all of the numbers are). In C, I would just slap an ampersand in front of it, and there ya go.

Travis Griggs

3 Answers

269

Note: The code has been updated for Swift 5 (Xcode 10.2). (Swift 3 and Swift 4.2 versions can be found in the edit history.) Possibly unaligned data is now also handled correctly.

How to create Data from a value

As of Swift 4.2, data can be created from a value simply with

let value = 42.13
let data = withUnsafeBytes(of: value) { Data($0) }

print(data as NSData) // <713d0ad7 a3104540>

Explanation:

  • withUnsafeBytes(of: value) invokes the closure with a buffer pointer covering the raw bytes of the value.
  • A raw buffer pointer is a sequence of bytes, therefore Data($0) can be used to create the data.

How to retrieve a value from Data

As of Swift 5, the withUnsafeBytes(_:) method of Data invokes the closure with an “untyped” UnsafeRawBufferPointer to the bytes. The load(fromByteOffset:as:) method then reads the value from the memory:

let data = Data([0x71, 0x3d, 0x0a, 0xd7, 0xa3, 0x10, 0x45, 0x40])
let value = data.withUnsafeBytes {
    $0.load(as: Double.self)
}
print(value) // 42.13

There is one problem with this approach: It requires that the memory is properly aligned for the type (here: aligned to an 8-byte address). But that is not guaranteed, e.g. if the data was obtained as a slice of another Data value.

It is therefore safer to copy the bytes to the value:

let data = Data([0x71, 0x3d, 0x0a, 0xd7, 0xa3, 0x10, 0x45, 0x40])
var value = 0.0
let bytesCopied = withUnsafeMutableBytes(of: &value, { data.copyBytes(to: $0)} )
assert(bytesCopied == MemoryLayout.size(ofValue: value))
print(value) // 42.13

Explanation:

  • withUnsafeMutableBytes(of:_:) invokes the closure with a mutable buffer pointer covering the raw bytes of the value.
  • The copyBytes(to:) method of DataProtocol (to which Data conforms) copies bytes from the data to that buffer.

The return value of copyBytes() is the number of bytes copied. It is equal to the size of the destination buffer, or less if the data does not contain enough bytes.
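The same copying technique also covers the slice case mentioned above. The following is only a minimal sketch: readDouble(from:atOffset:) is a hypothetical helper name, not part of Foundation, and the bytes are assumed to be in host (little-endian) order. It copies a sub-range of the data into the value, so the source bytes may start at any address:

```swift
import Foundation

// Hypothetical helper (not a Foundation API): reads a Double that starts
// `offset` bytes into `data`, copying the bytes so alignment does not matter.
func readDouble(from data: Data, atOffset offset: Int) -> Double? {
    let size = MemoryLayout<Double>.size
    guard offset >= 0, data.count - offset >= size else { return nil }
    // A Data slice keeps the indices of its parent, so index from startIndex.
    let start = data.startIndex + offset
    var value = 0.0
    _ = withUnsafeMutableBytes(of: &value) {
        data.copyBytes(to: $0, from: start ..< start + size)
    }
    return value
}

// One leading byte, followed by the eight bytes of 42.13 from the example above.
let packet = Data([0xff, 0x71, 0x3d, 0x0a, 0xd7, 0xa3, 0x10, 0x45, 0x40])
let slice = packet[1...]   // a slice whose storage may not be 8-byte aligned

let fromOffset = readDouble(from: packet, atOffset: 1)
let fromSlice = readDouble(from: slice, atOffset: 0)
let tooShort = readDouble(from: packet, atOffset: 2)   // only 7 bytes remain
```

Indexing relative to startIndex is the important detail here: a slice such as packet[1...] does not start at index 0.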

Generic solution #1

The above conversions can now easily be implemented as generic methods of struct Data:

extension Data {

    init<T>(from value: T) {
        self = Swift.withUnsafeBytes(of: value) { Data($0) }
    }

    func to<T>(type: T.Type) -> T? where T: ExpressibleByIntegerLiteral {
        var value: T = 0
        guard count >= MemoryLayout.size(ofValue: value) else { return nil }
        _ = Swift.withUnsafeMutableBytes(of: &value, { copyBytes(to: $0)} )
        return value
    }
}

The constraint T: ExpressibleByIntegerLiteral is added here so that we can easily initialize the value to “zero”. That is not really a restriction, because this method should only be used with “trivial” (integer and floating point) types anyway, see below.

Example:

let value = 42.13 // implicit Double
let data = Data(from: value)
print(data as NSData) // <713d0ad7 a3104540>

if let roundtrip = data.to(type: Double.self) {
    print(roundtrip) // 42.13
} else {
    print("not enough data")
}

Similarly, you can convert arrays to Data and back:

extension Data {

    init<T>(fromArray values: [T]) {
        self = values.withUnsafeBytes { Data($0) }
    }

    func toArray<T>(type: T.Type) -> [T] where T: ExpressibleByIntegerLiteral {
        var array = Array<T>(repeating: 0, count: self.count/MemoryLayout<T>.stride)
        _ = array.withUnsafeMutableBytes { copyBytes(to: $0) }
        return array
    }
}

Example:

let value: [Int16] = [1, Int16.max, Int16.min]
let data = Data(fromArray: value)
print(data as NSData) // <0100ff7f 0080>

let roundtrip = data.toArray(type: Int16.self)
print(roundtrip) // [1, 32767, -32768]

Generic solution #2

The above approach has one disadvantage: It actually works only with "trivial" types like integers and floating point types. "Complex" types like Array and String have (hidden) pointers to the underlying storage and cannot be passed around by just copying the struct itself. It also would not work with reference types which are just pointers to the real object storage.

To solve that problem, one can

  • Define a protocol which defines the methods for converting to Data and back:

    protocol DataConvertible {
        init?(data: Data)
        var data: Data { get }
    }
    
  • Implement the conversions as default methods in a protocol extension:

    extension DataConvertible where Self: ExpressibleByIntegerLiteral {
    
        init?(data: Data) {
            var value: Self = 0
            guard data.count == MemoryLayout.size(ofValue: value) else { return nil }
            _ = withUnsafeMutableBytes(of: &value, { data.copyBytes(to: $0)} )
            self = value
        }
    
        var data: Data {
            return withUnsafeBytes(of: self) { Data($0) }
        }
    }
    

    I have chosen a failable initializer here which checks that the number of bytes provided matches the size of the type.

  • And finally declare conformance to all types which can safely be converted to Data and back:

    extension Int : DataConvertible { }
    extension Float : DataConvertible { }
    extension Double : DataConvertible { }
    // add more types here ...
    

This makes the conversion even more elegant:

let value = 42.13
let data = value.data
print(data as NSData) // <713d0ad7 a3104540>

if let roundtrip = Double(data: data) {
    print(roundtrip) // 42.13
}

The advantage of the second approach is that you cannot inadvertently do unsafe conversions. The disadvantage is that you have to list all "safe" types explicitly.

You could also implement the protocol for other types which require a non-trivial conversion, such as:

extension String: DataConvertible {
    init?(data: Data) {
        self.init(data: data, encoding: .utf8)
    }
    var data: Data {
        // Note: a conversion to UTF-8 cannot fail.
        return Data(self.utf8)
    }
}

or implement the conversion methods in your own types to do whatever is necessary to serialize and deserialize a value.
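As a minimal sketch of such a hand-written conformance, here is one possible version for Bool, a type that does not conform to ExpressibleByIntegerLiteral and therefore cannot use the default implementation. The one-byte encoding (0 for false, 1 for true, any nonzero byte read back as true) is an assumption made for this illustration, not something the standard library prescribes:

```swift
import Foundation

// The protocol is repeated here only so that the snippet compiles on its own.
protocol DataConvertible {
    init?(data: Data)
    var data: Data { get }
}

extension Bool: DataConvertible {
    init?(data: Data) {
        // Assumed encoding: exactly one byte, nonzero means true.
        guard data.count == 1, let byte = data.first else { return nil }
        self = byte != 0
    }

    var data: Data {
        return Data([self ? 1 : 0])
    }
}

let encoded = true.data                    // a single byte, 0x01
let decoded = Bool(data: encoded)          // Optional(true)
let rejected = Bool(data: Data([1, 0]))    // nil, wrong byte count
```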

Byte order

No byte order conversion is done in the above methods; the data is always in the host byte order. For a platform-independent representation (e.g. “big endian” aka “network” byte order), use the corresponding integer properties and initializers. For example:

let value = 1000
let data = value.bigEndian.data
print(data as NSData) // <00000000 000003e8>

if let roundtrip = Int(data: data) {
    print(Int(bigEndian: roundtrip)) // 1000
}

Of course this conversion can also be done generally, in the generic conversion method.
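One way this might look, sketched for integer types only: the names bigEndianData and init(bigEndianData:) are made up for this illustration, and the extension relies on the bigEndian property and the init(bigEndian:) initializer that FixedWidthInteger provides.

```swift
import Foundation

extension FixedWidthInteger {

    // Hypothetical name: the bytes of the value in big-endian (network) order.
    var bigEndianData: Data {
        return Swift.withUnsafeBytes(of: self.bigEndian) { Data($0) }
    }

    // Hypothetical name: interprets `data` as a big-endian integer.
    // Fails if the byte count does not match the size of the type.
    init?(bigEndianData data: Data) {
        guard data.count == MemoryLayout<Self>.size else { return nil }
        var value = Self.zero
        _ = Swift.withUnsafeMutableBytes(of: &value) { data.copyBytes(to: $0) }
        self = Self(bigEndian: value)
    }
}

let wire = UInt32(1000).bigEndianData          // bytes 00 00 03 e8
let back = UInt32(bigEndianData: wire)         // Optional(1000)
let wrongSize = UInt16(bigEndianData: wire)    // nil, 4 bytes given
```

Floating point values could be handled along the same lines by converting their bitPattern, which is an integer.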

Martin R
  • Does the fact that we have to make a `var` copy of the initial value, mean that we're copying the bytes twice? In my current use case, I'm turning them into Data structs, so I can `append` them to a growing stream of bytes. In straight C, this is as easy as `*(cPointer + offset) = originalValue`. So the bytes are copied just once. – Travis Griggs Jun 26 '16 at 00:36
  • 1
    @TravisGriggs: Copying an int or float will most probably not be relevant, but you *can* do similar things in Swift. If you have a `ptr: UnsafeMutablePointer` then you can assign to the referenced memory via something like `UnsafeMutablePointer(ptr + offset).pointee = value` which closely corresponds to your C code. There is one potential problem: Some processors allow only *aligned* memory access, e.g. you cannot store an Int at an odd memory location. I don't know if that applies to the currently used Intel and ARM processors. – Martin R Jun 26 '16 at 12:32
  • 1
    @TravisGriggs: (cont'd) ... Also this requires that a sufficiently large Data object has already been created, and in Swift you can only create *and initialize* the Data object, so you might have an additional copy of zero bytes during the initialization. – If you need more details then I would suggest that you post a new question. – Martin R Jun 26 '16 at 12:35
  • I love the generic solution #2 and am using it. But anytime I subscript a Data, it doesn't work. Is there an elegant way to make it work with subscripted Data ranges? E.g. Int32(data: input[0...3])? – Travis Griggs Jun 27 '16 at 21:10
  • @TravisGriggs: `Int32(data: data.subdata(in: 0 ..< 4))` should work, but I don't know if that involves another copy. It should also be possible to expand the above methods by another parameter, e.g. `Int32(data: data, atOffset: 4)`. – Martin R Jun 27 '16 at 21:17
  • @MartinR how would array types work with generic solution #2? – Hans Brende Nov 11 '16 at 19:26
  • 2
    @HansBrende: I am afraid that is currently not possible. It would require an `extension Array: DataConvertible where Element: DataConvertible`. That is not possible in Swift 3, but planned for Swift 4 (as far as I know). Compare "Conditional conformances" in https://github.com/apple/swift/blob/master/docs/GenericsManifesto.md#conditional-conformances- – Martin R Nov 11 '16 at 19:36
  • I am wondering if there are any potential issues with byte alignment here. Could it be that the pointer casting causes a non-aligned pointer to be returned which is then dereferenced as, for example, a double, causing a memory fault. If that is the case, might it be better to prefer a solution that byte copies the source bytes to the target? – rghome Jan 26 '17 at 22:31
  • @rghome: That is a good point. I strongly assume that the buffer of a newly created data object is aligned for all scalar types, similar to what malloc() returns, but I don't know if that is guaranteed. – Martin R Jan 27 '17 at 06:28
  • I am pretty sure a newly allocated `Data` is long aligned and I see your code only ever gets/sets stuff from the entire object, so it is OK. Where you might have problems is attempting to reuse parts of the code (as I am) to get/set stuff from offsets in the `Data`. In that case, you can't just cast the byte pointer to a non-byte pointer and dereference; you will have to do a byte copy into the target. Maybe I will post an answer if I get something working. – rghome Jan 27 '17 at 11:05
  • Are these solutions still valid for Swift 4? – user965972 Jun 26 '17 at 19:39
  • @user965972: They should. Do you have any problems with it in Swift 4? – Martin R Jun 26 '17 at 19:45
  • @user965972 Generic solution #2 works perfect with Swift 4. – Baran Emre Mar 11 '18 at 08:39
  • One more nice trick you want to add to your answer. Once you have your extension set up, you can create a mutating append function that takes any type. – Chris Garrett Apr 12 '18 at 19:59
  • I was looking for a little more safety with *Generic Solution 1*, so added some type restrictions and preconditions `func to(_ type: T.Type) -> T where T: FixedWidthInteger` and then inside the function `precondition(self.count == T.bitWidth >> 2)`. This prevents memory corruption with something like `data[0..4].to(UInt32.self)` – nteissler Jul 26 '18 at 20:31
  • oops that should be a `>> 3` – nteissler Jul 26 '18 at 21:28
  • Generic solution #1 crashes on Swift 4.2. – Moebius Dec 13 '18 at 13:19
  • @Moebius: Can you provide more details? It works for me (just double-checked in Xcode 10.1, Swift 4.2). – Martin R Dec 13 '18 at 13:22
  • @MartinR Just a heads up (because I know you have lots of answers that use this API), the `withUnsafeBytes` method on `Data` that gives you back a typed pointer [is to be deprecated in Swift 5](https://github.com/apple/swift/blob/0d4a5853bf665eb860ad19a16048664899c6cce3/stdlib/public/Darwin/Foundation/Data.swift#L2161-L2169). It will now [give you back a raw buffer pointer](https://github.com/apple/swift/blob/0d4a5853bf665eb860ad19a16048664899c6cce3/stdlib/public/Darwin/Foundation/Data.swift#L2171-L2174) to the underlying bytes. – Hamish Dec 28 '18 at 21:16
  • @Hamish: Thank you for the information, it seems that I have to update them all :( – Can you tell me what the reason for the deprecation was? – Martin R Dec 28 '18 at 21:41
  • @MartinR The old API made it far too easy to accidentally write code that wasn't memory safe – it was implemented using `assumingMemoryBound(to:)`, which can yield undefined behaviour for `withUnsafeBytes` [if the user gets two pointers of unrelated type to Data's underlying buffer](https://forums.swift.org/t/how-to-use-data-withunsafebytes-in-a-well-defined-manner/12811/8). Even if it were switched to use `bindMemory(to:)`, that would cause memory soundness issues with Data's `init(bytesNoCopy:)` initialiser. – Hamish Dec 28 '18 at 22:05
  • There's some further discussion on this issue over at: https://forums.swift.org/t/how-to-use-data-withunsafebytes-in-a-well-defined-manner/12811 – Hamish Dec 28 '18 at 22:05
  • The missing piece to this answer is that for UInts you should use one of the [CFSwap](https://developer.apple.com/documentation/corefoundation/byte-order_utilities) functions to ensure you get the right byte order because this answer will implicitly load the data in host order. – Max May 13 '19 at 13:22
  • @Max: I think that I addressed the byte order issue at the very end of this answer. There are “native” Swift methods which can be used instead of CFSwapXXX. Please let me know if something is missing or unclear. – Martin R May 13 '19 at 13:23
  • @MartinR yeah, you're right. I didn't realize that `UInt16(littleEndian:)` does the same thing as `CFSwapInt16HostToLittle`. The documentation is written in a way I find confusing but they are equivalent. – Max May 13 '19 at 15:36
  • In Swift 5, solution #1 does not seem to work for `Int`s of any kind (`Int8`, `Int16` etc) since they do not conform to `ExpressibleByIntegerLiteral`. I don't know if conformance was never there, or it was removed in later versions of Swift. – m_katsifarakis Sep 19 '19 at 07:17
  • @m_katsifarakis: I am fairly sure that it works with all integer types. `let data = Data([1, 2]); if let value = data.to(type: Int16.self) { print(value) }` compiles and runs as expected in Xcode 11 with Swift 5. – Martin R Sep 19 '19 at 07:24
  • @MartinR on Xcode 10.2.1 I get `Instance method 'to(type:)' requires that 'Int.Type' conform to 'ExpressibleByIntegerLiteral'`. Perhaps I should update Xcode and try again. – m_katsifarakis Sep 19 '19 at 07:27
  • @m_katsifarakis: I do not have Xcode 10.2.1 anymore, but it works with Xcode 10.3, which is the current released version. – Martin R Sep 19 '19 at 07:32
  • 1
    @m_katsifarakis: Could it be that you mistyped `Int.self` as `Int.Type` ? – Martin R Sep 19 '19 at 07:35
  • @MartinR that was it! Such a stupid mistake. Thanks very much for your help and the brilliant solution! – m_katsifarakis Sep 19 '19 at 07:38
  • @MartinR Hello, thank you for the great solution. What about using solution #2 for types that don't conform to `ExpressibleByIntegerLiteral`? I was using your Swift 4 solution #2 with Bool type (and others like Struct) and it was working nicely but now I can't use it anymore because they don't conform to `ExpressibleByIntegerLiteral`. – ciclopez Nov 02 '20 at 14:55
  • @ciclopez: One option would be to write a `extension Bool: DataConvertible`. – This ExpressibleByIntegerLiteral requirement is actually just a workaround for the fact that there is no protocol that all “simple” types conform to. There may be better solutions in Swift 5.3, I'll think about it some time ... – Martin R Nov 02 '20 at 15:10
3

You can get an unsafe pointer to mutable objects by using withUnsafePointer:

withUnsafePointer(to: &input) { /* $0 is your pointer */ }

I don't know of a way to get one for immutable objects, because the inout operator only works on mutable objects.

This is demonstrated in the answer that you've linked to.
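For what it's worth, newer Swift versions (4.2 and later, as far as I can tell) also provide overloads of withUnsafePointer(to:_:) and withUnsafeBytes(of:_:) that take the value itself rather than an inout argument, so a constant works too. A minimal sketch, in which the pointer is only valid inside the closure:

```swift
// `input` is an immutable constant; no `var` copy and no `&` are needed.
let input = 42.13

// Read the value back through a typed pointer.
let viaPointer = withUnsafePointer(to: input) { $0.pointee }

// Copy the raw bytes out of the value (in host byte order).
let rawBytes = withUnsafeBytes(of: input) { Array($0) }
```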

zneak
2

In my case, Martin R's answer helped, but the bytes came out in reversed order, so I made a small change to his code:

extension UInt16: DataConvertible {

    init?(data: Data) {
        guard data.count == MemoryLayout<UInt16>.size else {
            return nil
        }
        // Copy the bytes instead of dereferencing a typed pointer,
        // which is deprecated in Swift 5 and may be misaligned.
        var value: UInt16 = 0
        _ = withUnsafeMutableBytes(of: &value) { data.copyBytes(to: $0) }
        self = value
    }

    var data: Data {
        // I think the iOS default is little-endian, because the bytes were reversed.
        let value = CFSwapInt16HostToBig(self)
        return withUnsafeBytes(of: value) { Data($0) }
    }
}

The problem is related to little-endian versus big-endian byte order.

Beto Caldas