1

I am reading from an excel file. In one of the columns (which includes only very long numbers), the MATLAB imported it as CELL (because there's a heading).

This is the sample imported:
'980000684210053338'

These are my tries:

Converts to DOUBLE but numbers change
tableM.v1 = cellfun(@str2num,tableM.v1);

Converts to DOUBLE but numbers change
tableM.v1 = cellfun(@str2double,tableM.v1);

Converts to CHAR but numbers are correct
tableM.v1 = cell2mat(tableM.v1);

How do I convert this CELL to DOUBLE while maintaining the correct values?

Thanks

P.S. I am using MATLAB R2016a.

UPDATE:
I ran this code from the answers:
tableM.v1 = vpa(tableM.v1); % assuming tableM.v1 is a cellstr

And I received this error:

Warning: Support of strings that are not valid variable names or define a number will be removed in a future release. To create symbolic expressions, first create symbolic variables and then use operations on them.
In sym>convertExpression (line 1536)
In sym>convertChar (line 1441)
In sym>tomupad (line 1198)
In sym (line 177)
In cell2sym (line 28)
In sym>tomupad (line 1208)
In sym (line 177)
In vpa (line 44)
Error using symengine
New array must have the same number of elements as the original array.

Error in sym/reshape (line 50)
ySym = mupadmex('symobj::reshape',x.s,args{:});

Error in cell2sym (line 34)
S = reshape(sym([Csym{:}]), size(C));

Error in sym>tomupad (line 1208)
xsym = cell2sym(x);

Error in sym (line 177)
S.s = tomupad(x);

Error in vpa (line 44)
ss = sym(s);

PM0087
  • 123
  • 8

2 Answers2

1

Double precision numbers have at-most 15 stable decimal places or, as MathWorks puts it, "double values cannot represent all integers greater than 253 correctly". Since the Excel number is 18 significant decimal places long, loss of precision to doubleconversion is unavoidable.

To avoid loss of precision, you can convert the strings to numbers that use variable precision arithmetic:

 tableM.v1 = vpa(tableM.v1); % assuming tableM.v1 is a cellstr

This will more than likely kill performance, but that's the pay-off for exact representation (until MATLAB natively supports 128-bit floating point, which is both likely far away in time and more performance intensive anyway).

In theory, uint64 could also hold the integer exactly, but there's doesn't appear to be a clean way to convert a string to an n-bit integer that I can find.

Community
  • 1
  • 1
TroyHaskin
  • 8,191
  • 3
  • 20
  • 22
  • I receive errors when I use this code. I've updated the Question with the error. – PM0087 Dec 07 '16 at 16:30
  • @PeyM87 That tells me that `tableM.v1` is not entirely a cell array of strings containing only numbers. That's the only way it will work. – TroyHaskin Dec 07 '16 at 16:51
1

Better Answer

The answer down below is the accepted answer, but while procrastinating on something, I realized that it's too clever. I think what you really want is to use textscan:

tableM.v1 = cellfun(@(x) textscan(x, '%u64'), tableM.v1);

Textscan already has checks for exponentials and decimals, goes straight to integer class without passing through double, and gets overflow right (my overflow example below isn't quite right because the exponent vector also overflows. Maximum uint64 is actually 9223372036854775807). Instead of a nice vector, however, you will end up with a cell array of numbers because that's what textscan spits out. Any badly formed numbers will result in empty cells, which you will have to deal with before converting to a vector.

>> in = {'cat', '1e10', '980000684210053338};
>> out = cellfun(@(x) textscan(x, '%u64'), in)
out =

  1×3 cell array

  [0×1 uint64]    [10000000000]    [980000684210053338]

Once you've fixed those, you can convert to a vector with cell2mat.


Original Answer

As the other answers point out, doubles just can't hold those numbers due to loss of precision. You need to cast them to a 64-bit integer without first passing through the double filter. Try this three-line function:

function out = str2uint64(in)
    % Convert the digits into an array of numbers and cast to
    % uint64
    in = uint64(in - 48);

    % Create the order of magnitude for each digit and convert
    % that also to uint64
    exponents = uint64(logspace(numel(in)-1, 0, numel(in)));

    % Why would sum default to convert your numbers to doubles?!?
    % The 'native' tag is recent, I believe, but if you have it,
    % it will preserve the data type.
    out = sum(in .* exponents, 'native');
end

To use:

tableM.v1 = cellfun(@str2uint64,tableM.v1);

One caveat is that for some dumb reason, when MATLAB sums numbers, it casts them to double. In the current release, R2016b, there is a flag to tell it to sum without casting. I don't know when that flag was released, so your mileage may vary. If you don't have that option, you would have to do the sum in a loop instead.

Another caveat: this function has no input or output checking, so str2uint64('cat') = 5658 (I'm just converting numbers based on their position in the ASCII table) and str2uint64('1000000000000000000') = 18446744073709551615 (overflow). Use at your own risk.


craigim
  • 3,704
  • 1
  • 20
  • 37
  • Nice +1. I was very surprised to see that MATLAB didn't already have a `str2int*` suite. The internal `double` casting is also really weird and annoying. I noticed it, for some reason, happens when converting a VPA non-scalar to `uint64` as well: `str = '980000684210053338'; disp(uint64(vpa(str)));disp(uint64(vpa(repmat({str},1,2))));` produces different results, which is ridiculous. – TroyHaskin Dec 06 '16 at 14:56
  • This seems to work, one small problem though. It converts '0,0090' to '90' or '3,5610' to '305610'. – PM0087 Dec 07 '16 at 16:35
  • Your example didn't include characters other than digits. That's why I added that last caveat about error checking. The function also assumes that your inputs are already integers, not floats, because you didn't indicate that it was an option. That will take a few extra steps to search for the decimal indicator and then truncate the number. – craigim Dec 07 '16 at 17:37
  • I know. I was just checking it with different numbers to see what the limitations are so I'm somewhat aware. I've accepted your answer. Thanks. – PM0087 Dec 07 '16 at 17:52
  • @PeyM87 I know you already accepted my answer, but I think I came up with a better and more robust solution. – craigim Dec 08 '16 at 13:31
  • Thanks. I tested both scripts and the results seem the same (on my data). I didn't get any "badly formed numbers". P.S. This one also doesn't work correctly for the (0,0090) example. But I guess that's not the purpose. I solved those digits numbers example in some other way: `tableM.v1 = cellfun(@str2double,tableM.v1,'UniformOutput',0); tableM.v1 = cellfun(@(x) x/10000,tableM.v1);` – PM0087 Dec 08 '16 at 14:50
  • Really? For `0,0090` I get `0`, which is the correct answer. – craigim Dec 08 '16 at 15:27