I am working on a Python Flask app running on Bluemix. I know how to use Object Storage with the swiftclient module to create a container and save a file in it, but how do I dump a joblib or pickle file into that container? And how do I load it back in my Python program?

Here is the code to store a simple text file.

import json
import os

import swiftclient
from flask import Flask
from flask_cors import CORS

app = Flask(__name__)
CORS(app)


cloudant_service = json.loads(os.environ['VCAP_SERVICES'])['Object-Storage'][0]
objectstorage_creds = cloudant_service['credentials']

if objectstorage_creds:
   auth_url = objectstorage_creds['auth_url'] + '/v3' #authorization URL
   password = objectstorage_creds['password'] #password
   project_id = objectstorage_creds['projectId'] #project id
   user_id = objectstorage_creds['userId'] #user id 
   region_name = objectstorage_creds['region'] #region name 

def predict_joblib():
  print('start')
  conn = swiftclient.Connection(key=password,authurl=auth_url,auth_version='3',os_options={"project_id": project_id,"user_id": user_id,"region_name": region_name})
  container_name = 'new-container'

  # File name for testing
  file_name = 'requirment.txt'

  # Create a new container
  conn.put_container(container_name)
  print ("\nContainer %s created successfully." % container_name)

  # List your containers
  print ("\nContainer List:")
  for container in conn.get_account()[1]:
    print (container['name'])

  # Create a file for uploading
  with open(file_name, 'w') as example_file:
    conn.put_object(container_name,file_name,contents= "",content_type='text/plain')

  # List the objects in each container, printing each object's name, size, and last-modified date
  print ("\nObject List:")
  for container in conn.get_account()[1]:
    for data in conn.get_container(container['name'])[1]:
      print ('object: {0}\t size: {1}\t date: {2}'.format(data['name'], data['bytes'], data['last_modified']))

  # Download an object and save it locally under the same file name
  obj = conn.get_object(container_name, file_name)
  with open(file_name, 'wb') as my_example:  # the object body comes back as bytes
    my_example.write(obj[1])
  print ("\nObject %s downloaded successfully." % file_name)




@app.route('/')
def hello():
    dff = predict_joblib()
    return 'Welcome to Python Flask!'

@app.route('/signUp')
def signUp():
    return 'signUp'


port = os.getenv('PORT', '5000')
if __name__ == "__main__":
    app.debug = True
    app.run(host='0.0.0.0', port=int(port))
sagar43

1 Answer

pickle.dumps returns a bytes object, and a file opened with open() can be read as bytes too, as the Python docs describe:

pickle.dumps(obj, protocol=None, *, fix_imports=True) Return the pickled representation of the object as a bytes object, instead of writing it to a file.

open(name[, mode[, buffering]]) Open a file, returning an object of the file type described in section File Objects. If the file cannot be opened, IOError is raised. When opening a file, it’s preferable to use open() instead of invoking the file constructor directly.

So you can pass the pickled bytes directly as the contents of the upload, where obj is the object you want to store:

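# Pickle the object in memory and upload the resulting bytes
pickled = pickle.dumps(obj)
object_name = 'model.pkl'  # hypothetical object name, choose your own
conn.put_object(container_name, object_name, contents=pickled, content_type='application/python-pickle')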

The content type changes because of MIME conventions used in HTTP. I took this from another SO question, which is worth checking. As stated there:

It is the de-facto standard. RFC2046 states: 4.5.3. Other Application Subtypes. It is expected that many other subtypes of "application" will be defined in the future. MIME implementations must at a minimum treat any unrecognized subtypes as being equivalent to "application/octet-stream". So, to a non-pickle-aware system, the stream will look like any other octet-stream, but for a pickle-enabled system this is vital information.
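
To cover the loading half of the question, here is a minimal round-trip sketch. It assumes `conn` behaves like a `swiftclient.Connection`, using only `put_object` and `get_object` (the latter returns a `(headers, body)` tuple). The helper names `upload_pickle` and `download_pickle`, the object name `model.pkl`, and the `InMemoryConnection` class are hypothetical; the stand-in class only exists so the sketch can be tried without Bluemix credentials.

```python
import pickle


def upload_pickle(conn, container_name, object_name, obj):
    """Pickle obj in memory and store the bytes as one object (hypothetical helper)."""
    payload = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    conn.put_object(container_name, object_name,
                    contents=payload,
                    content_type='application/python-pickle')
    return len(payload)


def download_pickle(conn, container_name, object_name):
    """Fetch the stored bytes and unpickle them (hypothetical helper)."""
    headers, body = conn.get_object(container_name, object_name)
    return pickle.loads(body)


class InMemoryConnection:
    """Hypothetical stand-in that mimics the two connection calls used above."""

    def __init__(self):
        self.objects = {}

    def put_object(self, container, obj, contents, content_type=None):
        self.objects[(container, obj)] = (content_type, bytes(contents))

    def get_object(self, container, obj):
        content_type, body = self.objects[(container, obj)]
        return {'content-type': content_type}, body


# Try the round trip against the stand-in; swap in a real connection later.
conn = InMemoryConnection()
model = {'weights': [0.1, 0.2, 0.3], 'label': 'demo'}
upload_pickle(conn, 'new-container', 'model.pkl', model)
restored = download_pickle(conn, 'new-container', 'model.pkl')
```

With a real connection you would pass the `conn` built in the question instead of the stand-in. For joblib, the same idea should work by dumping into an `io.BytesIO` buffer and uploading `buffer.getvalue()`, though I have not tested that against Object Storage.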

JeanPaulDepraz
  • Thanks for the suggestion, but with the swift client we don't get a path to save the file to; we just have to store the file in the container with "put_object". – sagar43 May 20 '16 at 10:25
  • @JeanPaulDepraq Thanks for the answer, but I am still getting an "out of memory" error on pickle.dumps(obj). As you say, "dumps" returns bytes, and on the next line I have to save that object. Does that mean it is using my disk space temporarily? My main issue is that on Bluemix I only have 2 GB of storage, but the pickle file I created is 5 GB, so I have to save it in Object Storage. – sagar43 May 23 '16 at 10:53
  • @sagar43 I checked the `dumps` implementation and it does not use disk space. You are running out of RAM, so I would divide the object into chunks of about 500 MB, send them, and rebuild the whole object when getting it back. I don't know how much RAM you have, but you could even write a 1.5 GB pickle chunk to disk, then open the file and send it as in your example. – JeanPaulDepraz May 24 '16 at 03:26
  • I have 8 GB of RAM, but when I use pickle compression I also get a RAM memory error. Without compression I get an error on dump(), which is a physical memory error. I don't know why this is happening; I am clueless. – sagar43 May 24 '16 at 07:34
  • @JeanPaulDepraq Are you open to using PySpark and a Jupyter notebook? If so, you can write out the pickle using [saveAsPickleFile](http://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.RDD.saveAsPickleFile) which would perform parallel writes and avoid the RAM memory exhaustion. The write path would use a "swift://" protocol url. If that's an option, willing to write up an answer along these lines. – Sanjay.Joshi Jan 23 '17 at 17:57
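
The chunking idea from the comments can be sketched roughly as follows. This is an illustration under assumptions, not a tested Bluemix recipe: `SegmentWriter`, `load_segments`, the `prefix/00000000` naming scheme, and `InMemoryConnection` are all hypothetical, and `conn` is assumed to offer `put_object` and `get_object` like a `swiftclient.Connection`. The writer is handed to `pickle.dump` as a file-like object, so each full segment is uploaded as soon as it is produced instead of holding the whole pickle in memory or on disk.

```python
import pickle


class SegmentWriter:
    """File-like object that uploads every segment_size bytes as its own object."""

    def __init__(self, conn, container, prefix, segment_size):
        self.conn = conn
        self.container = container
        self.prefix = prefix
        self.segment_size = segment_size
        self.buffer = bytearray()
        self.count = 0

    def write(self, data):
        # pickle.dump calls this repeatedly with pieces of the pickle stream.
        self.buffer += data
        while len(self.buffer) >= self.segment_size:
            self._flush(self.segment_size)
        return len(data)

    def _flush(self, size):
        segment = bytes(self.buffer[:size])
        del self.buffer[:size]
        name = '%s/%08d' % (self.prefix, self.count)
        self.conn.put_object(self.container, name,
                             contents=segment,
                             content_type='application/octet-stream')
        self.count += 1

    def close(self):
        # Upload whatever is left and report how many segments were written.
        if self.buffer:
            self._flush(len(self.buffer))
        return self.count


def load_segments(conn, container, prefix, count):
    """Download the segments in order and unpickle the joined bytes."""
    parts = []
    for index in range(count):
        _, body = conn.get_object(container, '%s/%08d' % (prefix, index))
        parts.append(body)
    return pickle.loads(b''.join(parts))


class InMemoryConnection:
    """Hypothetical stand-in for trying the sketch without credentials."""

    def __init__(self):
        self.objects = {}

    def put_object(self, container, obj, contents, content_type=None):
        self.objects[(container, obj)] = bytes(contents)

    def get_object(self, container, obj):
        return {}, self.objects[(container, obj)]


conn = InMemoryConnection()
data = {'rows': list(range(10000))}
writer = SegmentWriter(conn, 'new-container', 'model.pkl', segment_size=1024)
pickle.dump(data, writer, protocol=pickle.HIGHEST_PROTOCOL)
segment_count = writer.close()
restored = load_segments(conn, 'new-container', 'model.pkl', segment_count)
```

Note that `load_segments` still joins everything in memory before unpickling, so this only relieves the upload side; the segment size here is tiny for demonstration and would be much larger in practice.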