
I have a spaCy object of type <class 'spacy.tokens.doc.Doc'>:

import spacy
import requests
nlp = spacy.load("en_core_web_sm")
spacy_doc = nlp("hello")

How can I send this object as a payload in an API request? Currently, when I try to post it using the code below

requests.post(API_URL, json={'doc': spacy_doc}, timeout=3)

I get the following error:

ERROR:Object of type 'Doc' is not JSON serializable

Doc.to_bytes() is mentioned in https://spacy.io/usage/saving-loading

but when I use it

requests.post(API_URL, json={'doc': spacy_doc.to_bytes()}, timeout=3)

I get

ERROR:Object of type 'bytes' is not JSON serializable

What is the correct way to send a spaCy Doc and reconstruct it back in the API? (I am developing this target API service, and it is built with FastAPI.)

– GeorgeOfTheRF

3 Answers


Using .to_json() will convert the Doc to JSON, which can then be sent over the network:

requests.post(API_URL, json={'doc': spacy_doc.to_json()}, timeout=3)
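On the FastAPI side the payload then arrives as a plain dict. A minimal sketch of the receiving end, assuming a FastAPI service (the endpoint path and model name below are only illustrative, not from the answer above):

    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()

    class DocPayload(BaseModel):
        doc: dict  # the output of Doc.to_json()

    @app.post("/doc")
    def receive_doc(payload: DocPayload):
        # the dict contains plain-JSON keys like "text", "ents", "sents", "tokens"
        return {"text": payload.doc["text"]}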
– JPG
  • How do we reconstruct it back to a spaCy Doc object once we receive it in the API? I don't see a corresponding from_json() spaCy method. – GeorgeOfTheRF Nov 04 '20 at 09:17
  • I am not quite good at spaCy, but you might need to use form-data to send the raw bytes; AFAIK, JSON is not compatible with bytes. – JPG Nov 04 '20 at 09:26
  • Good idea. Is it possible to send bytes as form data and also send regular JSON in a single POST request? – GeorgeOfTheRF Nov 04 '20 at 09:31
  • I don't think you can send form-data and JSON at the same time. But [I found this](https://stackoverflow.com/questions/19439961/python-requests-post-json-and-file-in-single-request), which might be useful for you. – JPG Nov 04 '20 at 10:08
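A rough sketch of what the approach from the linked comment above could look like here, sending the raw Doc bytes as a file part next to a regular JSON form field (endpoint path and field names are illustrative, and FastAPI needs python-multipart installed for File/Form parameters):

    import json
    import requests

    # client: raw bytes go in a file part, regular JSON in a form field
    requests.post(
        API_URL,
        files={"doc": ("doc.bin", spacy_doc.to_bytes(), "application/octet-stream")},
        data={"meta": json.dumps({"source": "example"})},
        timeout=3,
    )

    # server (FastAPI): read the bytes back into a Doc
    from fastapi import FastAPI, File, Form, UploadFile
    from spacy.tokens import Doc

    app = FastAPI()

    @app.post("/doc")
    async def receive(doc: UploadFile = File(...), meta: str = Form(...)):
        data = await doc.read()
        # assumes the service has an nlp pipeline loaded with a compatible vocab
        restored = Doc(nlp.vocab).from_bytes(data)
        return {"text": restored.text, "meta": json.loads(meta)}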

So you want to serialize the doc objects:

Often it’s sufficient to use the Doc.to_array functionality to serialize docs and just serialize the numpy arrays – but other times you want a more general way to save and restore Doc objects.

The DocBin class makes it easy to serialize and deserialize a collection of Doc objects together, and is much more efficient than calling Doc.to_bytes on each individual Doc object. You can also control what data gets saved (see attrs=["LEMMA", "ENT_IOB", "ENT_TYPE"]), and you can merge pallets together for easy map/reduce-style processing.

    import spacy
    from spacy.tokens import DocBin
    
    doc_bin = DocBin(attrs=["LEMMA", "ENT_IOB", "ENT_TYPE"], store_user_data=True) 
    texts = ["Some text", "Lots of texts...", "..."]
    nlp = spacy.load("en_core_web_sm")
    for doc in nlp.pipe(texts):
        doc_bin.add(doc)
    bytes_data = doc_bin.to_bytes()

    # Deserialize later, e.g. in a new process
    nlp = spacy.blank("en")
    doc_bin = DocBin().from_bytes(bytes_data)
    docs = list(doc_bin.get_docs(nlp.vocab))

source
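The DocBin bytes still aren't JSON serializable, so to send bytes_data in a JSON body one option (not part of the quoted docs, just a common workaround) is to base64-encode it first:

    import base64
    import requests

    # continues from the snippet above (bytes_data and nlp are already defined)
    payload = {"doc_bin": base64.b64encode(bytes_data).decode("ascii")}
    requests.post(API_URL, json=payload, timeout=3)

    # receiving side: undo the base64 step, then rebuild the docs
    raw = base64.b64decode(payload["doc_bin"])
    doc_bin = DocBin().from_bytes(raw)
    docs = list(doc_bin.get_docs(nlp.vocab))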

– dreadnaught

This solution worked:

I.e. convert the Python object to a str and put it in the JSON payload, then later reconstruct the Python object back from the str.

import pickle
import codecs

# pickle the Doc, base64-encode the bytes, then decode to a plain string
# (codecs.encode returns bytes, so it needs to be decoded to a string)
pickled = codecs.encode(pickle.dumps(spacy_doc), 'base64').decode()

type(pickled)  # <class 'str'>

# reverse the steps to get the Doc object back
unpickled = pickle.loads(codecs.decode(pickled.encode(), 'base64'))
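The resulting string is JSON-safe, so it fits in a normal json= payload. One caveat: unpickling data from untrusted clients is unsafe, so this only makes sense for a service you fully control:

    import requests

    # the base64 string travels in a regular JSON body
    requests.post(API_URL, json={'doc': pickled}, timeout=3)

    # on the API side, apply the same decode + pickle.loads steps as the
    # 'unpickled' line above to recover the Doc object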
– GeorgeOfTheRF