
I have tab-delimited ASCII data in .txt files that are stored inside zip archives (each zip may or may not contain other files). I would like to read this data into a matrix without extracting the zip files to disk.

There were a few similar @matlab / @java posts earlier:

Read the data of CSV file inside Zip File without extracting the contents in Matlab

Extracting specific file from zip in matlab

Read Content from Files which are inside Zip file

I have gotten this far thanks to the above - I can identify the .txt inside the zip, but don't know how to actually read its contents. First example:

zipFilename = 'example.zip';
zipJavaFile = java.io.File(zipFilename);
zipFile=org.apache.tools.zip.ZipFile(zipJavaFile);
entries=zipFile.getEntries;
cnt=1;
while entries.hasMoreElements
    tempObj=entries.nextElement;
    file{cnt,1}=tempObj.getName.toCharArray';
    cnt=cnt+1;
end
ind=regexp(file,'\.txt$');   % match entry names ending in .txt
ind=find(~cellfun(@isempty,ind));
file=file(ind);
file = cellfun(@(x) fullfile('.',x),file,'UniformOutput',false);
% Now operate on the file.
zipFile.close

However, I found no example of how to "operate anything on file". I can extract the path within the zip file, but I don't know how to read the contents of this txt file. (I want to read its contents directly into memory, as a matrix, without extraction if possible.)

The other example is

zipFilename = 'example.zip';
zipFile = org.apache.tools.zip.ZipFile(zipFilename);
entries = zipFile.getEntries;
while entries.hasMoreElements
    entry = entries.nextElement;
    entryName = char(entry.getName);
    [~,~,ext] = fileparts(entryName);
    if strcmp(ext,'.txt')
        inputStream  = zipFile.getInputStream(entry);
        %Read the contents of the file
        inputStream.close;
    end
end
zipFile.close

The original example contained code to extract the file, but I merely want to read it directly into memory. Again, I don't know how exactly to work with this inputStream.

Could anyone give me a suggestion with a MWE?

fusionfan
  • If it's compressed, you must extract it if you want to read it. It's like reading encrypted content: you must decrypt it first, mustn't you? – TDG May 26 '16 at 09:17
  • Well, I suppose you could extract into memory and read it from there without having to write to the hard disk. Not sure if the `unzip` function does this; Matlab basically sucks at handling compressed files. – GameOfThrows May 26 '16 at 09:37
  • @GameOfThrows is correct, I want to avoid writing to disk, and want to uncompress into memory. The second example above would be able to extract the file by writing it on the disk. Instead, I want to write the contents into a matrix in memory, and this is where I am stuck. – fusionfan May 26 '16 at 10:36
  • [there are many possibilities](http://stackoverflow.com/questions/309424/read-convert-an-inputstream-to-a-string). – Daniel May 26 '16 at 12:18

1 Answer


It might be a little late, but maybe someone can use this (the code was tested in MATLAB R2018a):

zipFilename = 'example.zip';
zipFile = org.apache.tools.zip.ZipFile(zipFilename);
entries = zipFile.getEntries;
while entries.hasMoreElements
    entry = entries.nextElement;
    entryName = char(entry.getName);
    [~,~,ext] = fileparts(entryName);
    if strcmp(ext,'.txt')
        inputStream  = zipFile.getInputStream(entry);
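        % Read the contents of the file: copy the entry's stream into an
        % in-memory byte buffer (nothing is written to disk).
        buffer = java.io.ByteArrayOutputStream();
        org.apache.commons.io.IOUtils.copy(inputStream, buffer);
        % Java bytes arrive in MATLAB as int8; reinterpret them as uint8
        % and convert to a char row vector.
        data = char(typecast(buffer.toByteArray(), 'uint8')');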

        inputStream.close;
    end
end
zipFile.close
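
The code above leaves the file's text in `data` as a char vector. To turn that into a numeric matrix, here is a minimal sketch. It assumes the file holds only numbers (no header line) and that every row has the same number of tab-separated columns. The sample string and the names `firstLine`, `nCols`, `C` and `M` are illustrative only and stand in for the `data` read from the zip entry.

```matlab
% Sample text standing in for the 'data' variable read from the zip entry
% (assumption: numeric values only, no header line, tab-delimited).
data = sprintf('1\t2\t3\n4\t5\t6\n');

% Count the columns on the first line so the format string can be built.
firstLine = strtok(data, sprintf('\r\n'));
nCols = numel(strsplit(firstLine, sprintf('\t'), 'CollapseDelimiters', false));

% Parse every column as a double and collect the columns into one matrix.
C = textscan(data, repmat('%f', 1, nCols), ...
    'Delimiter', '\t', 'CollectOutput', true);
M = C{1};   % numeric matrix, one row per line of text
```

For the sample string, `M` is a 2-by-3 matrix. If the file has header lines, the `'HeaderLines'` option of `textscan` may help, but that case is not covered by this sketch.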
Aboe12