
I am trying to consume a streamed response from a SOAP API in Python and write it out as a CSV file. The response is a Base64-encoded string, which I do not know what to do with. The API documentation also says that the response must be read into a destination buffer by buffer.

Here is the C# code provided in the API's documentation:

byte[] buffer = new byte[4000];
bool endOfStream = false;
int bytesRead = 0;
long totalBytes = 0;
using (FileStream localFileStream = new FileStream(destinationPath, FileMode.Create, FileAccess.Write))
{
    using (Stream remoteStream = client.DownloadFile(jobId))
    {
        while (!endOfStream)
        {
            bytesRead = remoteStream.Read(buffer, 0, buffer.Length);
            if (bytesRead > 0)
            {
                localFileStream.Write(buffer, 0, bytesRead);
                totalBytes += bytesRead;
            }
            else
            {
                endOfStream = true;
            }
        }
    }
}
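For comparison, here is a minimal Python 3 sketch of the same buffer-by-buffer loop. It assumes you already have a binary file-like object (`remote_stream`); `copy_stream` is a hypothetical helper, not part of the API or of suds.

```python
def copy_stream(remote_stream, destination_path, buffer_size=4000):
    """Copy a binary file-like object to destination_path in fixed-size chunks.

    Mirrors the C# loop: read up to buffer_size bytes, write them, and stop
    when read() returns an empty bytes object (end of stream).
    """
    total_bytes = 0
    with open(destination_path, "wb") as local_file:
        while True:
            chunk = remote_stream.read(buffer_size)
            if not chunk:  # b"" signals end of stream
                break
            local_file.write(chunk)
            total_bytes += len(chunk)
    return total_bytes
```

The standard library's `shutil.copyfileobj(remote_stream, local_file, 4000)` performs the same loop. This only helps if the call really returns a stream; a plain string can simply be written in one go.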

I have tried many different ways to turn this stream into a readable CSV file, but none have worked.

with open('test.csv', 'w') as f: f.write(FileString)

This produces a CSV with the Base64 string spread over multiple lines.

Here is my latest attempt:

with open('csvfile13.csv', 'wb') as csvfile:
          FileString = client.service.DownloadFile(yyy.JobId, False)
          stream = io.BytesIO(str(FileString))
          with open(stream,"rt",4000) as readstream:
             csvfile.write(readstream)

This produces the error:

TypeError: coercing to Unicode: need string or buffer, _io.BytesIO found

Any help would be greatly appreciated, even if it is just a pointer in the right direction. I will be sure to award the points to whoever is most helpful, even if I do not completely solve the issue!

I have asked several questions similar to this one, but I have yet to find an answer that works completely:

What is the Python equivalent to FileStream in C#?

Write Streamed Response(file-like object) to CSV file Byte by Byte in Python

How to replicate C# 'byte' and 'Write' in Python

Let me know if you need further clarification!

Update: I have tried print(base64.b64decode(str(FileString)))

This prints a page full of garbled, Webdings-like characters such as:

]�P�O�J��Y��KW �

I have also tried

for data in client.service.DownloadFile(yyy.JobId, False):
    print data

But this just loops through the output character by character, like any other string.

I have also managed to get a long string of bytes like \xbc\x97_D\xfb (not the actual bytes, just a similar format) by decoding the entire string, but I do not know how to make it readable.

Edit: Corrected the output of the sample Python code, added more example code, and fixed formatting.

walker_4
  • What is `type(FileString)` in `FileString = client.service.DownloadFile(yyy.JobId, False)` ??? Also, what version of Python 2 are you using? – juanpa.arrivillaga Mar 08 '17 at 00:12
  • class 'suds.sax.text.Text' – walker_4 Mar 08 '17 at 00:14
  • And I am using Python 2.7.12 |Anaconda 4.2.0 – walker_4 Mar 08 '17 at 00:15
  • But basically, you don't need to `open` an `io.BytesIO` object. I'm surprised you aren't getting an error. – juanpa.arrivillaga Mar 08 '17 at 00:15
  • Yeah sorry, just double checked that now. I am getting:"TypeError: coercing to Unicode: need string or buffer, _io.BytesIO found" – walker_4 Mar 08 '17 at 00:17
  • Well, what happens when you `print str(FileString)`??? – juanpa.arrivillaga Mar 08 '17 at 00:17
  • I'm pretty sure all you need is `with open('test.csv', 'w') as f: f.write(FileString)`. – juanpa.arrivillaga Mar 08 '17 at 00:21
  • When I run print str(FileString) I get a long string of characters in what appears to be Base64, interspersed with "/" at irregular intervals. When I try to write it, it simply puts the encoded string in the CSV on several different lines – walker_4 Mar 08 '17 at 00:25
  • I think this is due to the fact that the response must be read "buffer-by-buffer" – walker_4 Mar 08 '17 at 00:27
  • `byte to byte` requires a window number and an end-of-packet delimiter; otherwise you can't preserve character bit positions if packets have a standard length. Another point: `file_write` requires a string or bytes, not a `byte_object`. – dsgdfg Mar 10 '17 at 12:57
  • The 4000 byte buffer is needed in C# only. Can you put up a link to the output of ``with open('test.csv', 'w') as f: f.write(FileString)``? If not, examine: ``import base64; print(base64.b64decode(str(FileString)))`` – wolfmanx Mar 12 '17 at 15:23
  • Thanks for your response, when I try import base64; print(base64.b64decode(str(FileString))) I get a page full of webdings like **]�P�O�J��Y��KW �** – walker_4 Mar 13 '17 at 16:25
  • If you know how to read this file in C#, you can use pythonnet to do the same in CPython. Sorry, I cannot test this because you have NOT provided a full reproducible sample. – denfromufa Mar 14 '17 at 18:11
  • Yeah, apologies for not providing a reproducible sample. The info is sensitive, and I am being extra cautious. I'll look into pythonnet – walker_4 Mar 14 '17 at 20:11
  • Would you be able to point me to a page that shows something similar? – walker_4 Mar 14 '17 at 21:09

2 Answers


It sounds like you need to use the base64 module to decode the downloaded data.

It might be as simple as:

import base64

with open(destinationPath, 'wb') as localFile:
    remoteFile = client.service.DownloadFile(yyy.JobId, False)
    remoteData = base64.b64decode(str(remoteFile))
    localFile.write(remoteData)

I suggest you break the problem down and determine what data you have at each stage. For example, what exactly are you getting back from client.service.DownloadFile?
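One way to do that is a small diagnostic helper. This is a hypothetical Python 3 sketch (`describe_download` is not part of suds); it assumes the returned value is either a file-like object with `.read()` or something `str()` can convert, and it reports the type, length, a short preview, and whether the text is strictly valid Base64.

```python
import base64


def describe_download(value, preview=40):
    """Summarise what a download call returned (hypothetical helper)."""
    original_type = type(value).__name__
    if hasattr(value, "read"):
        value = value.read()  # drain a file-like object
    if isinstance(value, bytes):
        text = value.decode("ascii", "replace")
    else:
        text = str(value)
    compact = "".join(text.split())  # line breaks are not part of Base64 data
    try:
        base64.b64decode(compact, validate=True)
        is_base64 = True
    except ValueError:  # binascii.Error is a subclass of ValueError
        is_base64 = False
    return {
        "type": original_type,
        "length": len(text),
        "preview": text[:preview],
        "is_base64": is_base64,
    }
```

Printing the result of `describe_download(remoteFile)` at each step shows whether you are holding Base64 text, decoded bytes, or something else.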

Decoding your sample downloaded data (given in the comments):

'UEsYAItH7brgsgPutAG\AoAYYAYa='.decode('base64')

gives

'PK\x18\x00\x8bG\xed\xba\xe0\xb2\x03\xee\xb4\x01\x80\xa0\x06\x18\x01\x86'

This looks suspiciously like a ZIP file header. I suggest you rename the file with a .zip extension and open it as an archive to investigate.
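In Python 3, `str` no longer has a `.decode('base64')` method. A minimal sketch of the same check with the `base64` module, using the sample string from the comments, might look like this:

```python
import base64

# Sample string from the comments (gibberish, but in the real format).
# A raw string keeps the backslash literal; b64decode discards characters
# outside the Base64 alphabet (such as "\") because validate defaults to False.
sample = r"UEsYAItH7brgsgPutAG\AoAYYAYa="

decoded = base64.b64decode(sample)
print(repr(decoded[:6]))  # b'PK\x18\x00\x8bG'

# Real ZIP local file headers start with b"PK\x03\x04"; the sample is
# gibberish, so only the two-byte "PK" prefix is checked here.
looks_like_zip = decoded.startswith(b"PK")
print(looks_like_zip)  # True
```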

If remoteData is a ZIP, something like the following should extract and write your CSV.

import base64
import io
import zipfile

remoteFile = client.service.DownloadFile(yyy.JobId, False)
remoteData = base64.b64decode(str(remoteFile))

zipStream = io.BytesIO(remoteData)
z = zipfile.ZipFile(zipStream, 'r')
csvData = z.read(z.infolist()[0])  # contents of the first file in the archive

with open(destinationPath, 'wb') as localFile:
    localFile.write(csvData)

Note: Base64 has some variants regarding padding and alternate character mappings, but once you can see the data it should be reasonably clear what you need. Of course, read the documentation of your SOAP interface carefully.
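If the data does turn out to use one of those variants, a lenient decoder is one option. This Python 3 sketch assumes whitespace is insignificant and that `-` and `_` stand in for `+` and `/` (the RFC 4648 URL-safe alphabet); `decode_base64_lenient` is a hypothetical helper.

```python
import base64
import re


def decode_base64_lenient(text):
    """Decode Base64 that may use the URL-safe alphabet or lack padding."""
    cleaned = re.sub(r"\s+", "", text)  # drop spaces and line breaks
    # Map the URL-safe alphabet back to the standard one.
    cleaned = cleaned.replace("-", "+").replace("_", "/")
    # Restore any "=" padding that was stripped.
    cleaned += "=" * (-len(cleaned) % 4)
    # validate=True raises binascii.Error instead of silently skipping junk.
    return base64.b64decode(cleaned, validate=True)
```

Because of `validate=True`, stray characters such as the `\` in the sample string above are reported as a `binascii.Error` rather than skipped, which helps when you are still working out what the data is.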

Mike Robins
  • Thanks for your response Mike. When I try your first code batch it gives me the error **AttributeError: 'Text' object has no attribute 'readline'**, and when I do the second batch it simply gives me the encoded string character by character. – walker_4 Mar 13 '17 at 16:08
  • I have managed to get the individual bytes in a form like this: **\xbc\x97_D\xfb** by decoding the entire string, but I do not know how to make this readable, and I suspect it must be decoded in chunks somehow. – walker_4 Mar 13 '17 at 16:13
  • It sounds like DownloadFile returns the whole file as a Base64-encoded string. There is a typo in my second code snippet, sorry. It should read `base64.b64decode(data)`. Have a look at [this question](http://stackoverflow.com/questions/3866316/whats-the-difference-between-utf8-utf16-and-base64-in-terms-of-encoding). Try `localFile.write(base64.b64decode(remoteFile))` – Mike Robins Mar 14 '17 at 01:48
  • Thanks again Mike. Unfortunately, using "localFile.write(base64.b64decode(remoteFile))" gives me a CSV file full of webdings. I believe I need to split the file string, but I do not know how to do it. – walker_4 Mar 14 '17 at 20:38
  • Hmm... we really need to "determine what data you have at each stage". What does `client.service.DownloadFile` return? A string, a file-like object, or a suds object on which we have to call `remoteFile.Read()` to get the data? Let's forget about the Base64 decoding until you have a string like `'TWlrZSBSb2JpbnM='` to decode. What do you get from `print remoteFile` and `print remoteFile.Read()`? Suds is definitely succeeding in the SOAP call, is it? I see a lot of questions about authentication. – Mike Robins Mar 15 '17 at 05:40
  • Yes, sorry I have not been clearer. client.service.DownloadFile returns a very long encoded string like: **UEsYAItH7brgsgPutAG\AoAYYAY...** (note this string is gibberish, but in the same general format as the actual string). I simply see the string printed if I put in print remoteFile, and I get the error 'AttributeError: 'Text' object has no attribute 'read''. I do not think that it is an authentication issue, as I am able to check the status of the job, and have received authentication errors when changing passwords. – walker_4 Mar 15 '17 at 16:35
  • @cptnhaddock, Base 64 decoding the string you supplied `'UEsYAItH7brgsgPutAG\AoAYYAYa='.decode('base64')` gives a byte string. It is not a line of CSV. I puzzled about UTF encodings etc but it does not look like that. The bytes start with `'PK\x18\x00\x8bG\xed'`, a quick search tells me that zip files start with 'PK'. I think your downloaded file is a ZIP of the CSV. I'll amend my answer above. – Mike Robins Mar 16 '17 at 02:22
  • BOOM! You got it! Thanks so much for your help! – walker_4 Mar 16 '17 at 15:53
  • That's great, I'm glad we got to the bottom of that. One further thing you may need to consider: if the downloaded file is large, you may need to loop around receiving chunks of the file, base64 decoding them and appending to a temporary zip file, from which you finally extract the CSV. I'll leave that as an exercise for the reader. – Mike Robins Mar 17 '17 at 05:48
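A minimal Python 3 sketch of that idea follows. It assumes `chunks` is an iterable of Base64 text pieces; whether and how the SOAP service can deliver the file in pieces is not covered here, and `zip_from_base64_chunks` is a hypothetical helper. The key detail is that Base64 decodes in 4-character groups, so any partial group is carried over to the next chunk.

```python
import base64
import shutil
import tempfile
import zipfile


def zip_from_base64_chunks(chunks, csv_path):
    """Decode Base64 text arriving in pieces into a temporary ZIP file,
    then extract the archive's first member to csv_path."""
    pending = ""
    with tempfile.TemporaryFile() as tmp:
        for chunk in chunks:
            pending += "".join(chunk.split())  # drop line breaks
            # Only whole 4-character groups decode cleanly on their own.
            usable = len(pending) - len(pending) % 4
            tmp.write(base64.b64decode(pending[:usable]))
            pending = pending[usable:]
        if pending:
            raise ValueError("Base64 data ended mid-group")
        tmp.seek(0)
        with zipfile.ZipFile(tmp) as archive:
            first = archive.infolist()[0]
            with archive.open(first) as src, open(csv_path, "wb") as dst:
                shutil.copyfileobj(src, dst)  # stream, don't load it all
    return first.filename
```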

Are you sure FileString is a Base64 string? Based on the suds source code, suds.sax.text.Text is a subclass of unicode. You can write it to a file as you would a normal string, but whatever you use to read the data back from the file may corrupt it unless the file is UTF-8-encoded.

You can try writing your Text object to a UTF-8-encoded file using io.open:

import io
with io.open('/path/to/my/file.txt', 'w', encoding='utf_8') as f:
    f.write(FileString)

Bear in mind that your console or text editor may have trouble displaying non-ASCII characters, but that doesn't mean they're not encoded properly. Another way to inspect them is to open the file back up in the Python interactive shell:

import io
with io.open('/path/to/my/file.txt', 'r', encoding='utf_8') as f:
    next(f) # displays the representation of the first line of the file as a Unicode object

In Python 3 you can even use the built-in csv module to parse the file; in Python 2, however, you'll need to pip install backports.csv because the built-in module doesn't work with Unicode objects:

from backports import csv
import io
with io.open('/path/to/my/file.txt', 'r', encoding='utf_8') as f:
    r = csv.reader(f)
    next(r) # displays the representation of the first line of the file as a list of Unicode objects (each value separated)
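For Python 3, where no backport is needed, a minimal sketch (the helper name is hypothetical) that assumes the file already contains decoded, UTF-8 CSV text rather than the raw Base64 string:

```python
import csv


def read_csv_rows(path, encoding="utf-8"):
    """Return all rows of a CSV file as lists of str (Python 3)."""
    # newline="" lets the csv module handle \r\n and embedded newlines itself.
    with open(path, newline="", encoding=encoding) as f:
        return list(csv.reader(f))
```

If the rows still come back as Base64 text, the file holds the encoded payload, not CSV, and it has to be decoded (and possibly unzipped) first.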
Jared
  • Thanks Jared, I believe you are on the right track; however, when I run `from backports import csv import io with io.open('/path/to/my/file.txt', 'r', encoding='utf_8') as f: r = csv.reader(f) next(r)` the lines are still encoded in what I believe to be Base64 – walker_4 Mar 15 '17 at 19:24
  • When I run the same code with base64.b64decode(next(r)), I get long lines of bytes in the form "\xe5\xd1\" – walker_4 Mar 15 '17 at 19:28
  • Can you give an example including a) what you expect to see and b) what you actually see? And can you explain why you expect to see it? – Jared Mar 16 '17 at 18:04