The following URL works as expected and returns "null".

https://zga2tn1wgd.execute-api.us-east-1.amazonaws.com/mycall?url=https://mr.wikipedia.org/s/4jp4

But the same page, requested with a Unicode string in the URL instead of an ASCII string, throws an error:

"errorMessage": "'ascii' codec can't encode characters in position 10-20: ordinal not in range(128)", "errorType": "UnicodeEncodeError"

How do I encode the Unicode characters when passing the string to API Gateway?

https://zga2tn1wgd.execute-api.us-east-1.amazonaws.com/mycall?url=https://mr.wikipedia.org/wiki/%E0%A4%95%E0%A4%BF%E0%A4%B6%E0%A5%8B%E0%A4%B0%E0%A4%BE%E0%A4%B5%E0%A4%B8%E0%A5%8D%E0%A4%A5%E0%A4%BE


I am using the following bookmarklet to generate the URL mentioned above:

javascript:(function(){location.href='https://z3nt6lcj40.execute-api.us-east-1.amazonaws.com/mycall?url='+encodeURIComponent(location.href);})();
shantanuo
  • what is `/mycall` ? – devio Apr 10 '20 at 06:20
  • @devio python code from line 57 to 91 https://datameetgeobk.s3.amazonaws.com/cftemplates/furl.yaml.txt – shantanuo Apr 11 '20 at 08:01
  • The likely answer is exactly what the error message is telling you: the `mycall` function is using an ASCII codec, which can't understand Unicode. If you don't own the code for `mycall`, you can't give it Unicode because it's using an ASCII codec. If you do, you need to change which codec it's using to interpret the byte string it receives. If you do own the code for `mycall`, it'd be easier to help if we saw that – TurnipEntropy Apr 15 '20 at 20:48

1 Answer

Your Lambda function contains this line, which unquotes the URL:

url1 = urllib.parse.unquote(url)

from

'https://zga2tn1wgd.execute-api.us-east-1.amazonaws.com/mycall?url=https://mr.wikipedia.org/wiki/%E0%A4%95%E0%A4%BF%E0%A4%B6%E0%A5%8B%E0%A4%B0%E0%A4%BE%E0%A4%B5%E0%A4%B8%E0%A5%8D%E0%A4%A5%E0%A4%BE'

to

'https://zga2tn1wgd.execute-api.us-east-1.amazonaws.com/mycall?url=https://mr.wikipedia.org/wiki/किशोरावस्था'

The non-US-ASCII parts of the result above have to be percent-encoded again before the request is performed. Here they are in the query component.
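To see why the unquoted form fails, here is a minimal sketch. It assumes the HTTP client ends up encoding a request line such as `GET /wiki/... HTTP/1.1` as ASCII; the names `request_line` and `failure` are illustrative and do not come from your function.

```python
import urllib.parse

quoted_path = (
    "/wiki/%E0%A4%95%E0%A4%BF%E0%A4%B6%E0%A5%8B%E0%A4%B0"
    "%E0%A4%BE%E0%A4%B5%E0%A4%B8%E0%A5%8D%E0%A4%A5%E0%A4%BE"
)
unquoted_path = urllib.parse.unquote(quoted_path)

# Assumption: the client builds a request line like this and encodes it as ASCII.
request_line = "GET " + unquoted_path + " HTTP/1.1"

try:
    request_line.encode("ascii")
    failure = None
except UnicodeEncodeError as exc:
    failure = exc  # same error type as in the question

# The percent-encoded form contains only ASCII characters, so it encodes fine.
ascii_bytes = ("GET " + quoted_path + " HTTP/1.1").encode("ascii")
```

Under that assumption, the failing characters start right after `GET /wiki/`, which would match the "position 10-20" in the reported message.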

It is recommended to split a URI into its components before encoding it, so that the encoding does not change its semantics.
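A small sketch of what goes wrong otherwise: quoting the whole string also escapes the delimiters, while quoting per component leaves them intact. The `?action=raw` query is a made-up example and not part of the original URL.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit

# Hypothetical URL: the query string is added only for illustration.
url = "https://mr.wikipedia.org/wiki/किशोरावस्था?action=raw"

# Quoting the whole string escapes ":", "?" and "=" too, so the result
# is no longer the same URL.
whole = quote(url)

# Quoting only the path keeps the scheme, host and query delimiters as they were.
parts = urlsplit(url)
per_component = urlunsplit(parts._replace(path=quote(parts.path)))

# Decoding the per-component result gives back the original URL.
round_trip = unquote(per_component)
```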

Here are some more things to do before making the request to the URL:

import urllib.parse

# Decode the incoming URL, then split it into its components
url1 = urllib.parse.unquote(url)
urlparts = urllib.parse.urlparse(url1)
# parse_qs returns {name: [values]}, so doseq=True is needed when re-encoding
querypart = urllib.parse.parse_qs(urlparts.query)
querypart_enc = urllib.parse.urlencode(querypart, doseq=True)

# Rebuild URL with escaped query part
url1 = urllib.parse.urlunparse((
    urlparts.scheme, urlparts.netloc,
    urlparts.path, urlparts.params,
    querypart_enc, urlparts.fragment
))
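If the non-ASCII characters sit in the path of the URL you finally request (as in `/wiki/किशोरावस्था`), the path may need the same treatment as the query. The following is a sketch only: `to_ascii_url` is a hypothetical helper, not part of your function, and it does not handle non-ASCII host names.

```python
from urllib.parse import parse_qsl, quote, unquote, urlencode, urlsplit, urlunsplit


def to_ascii_url(url):
    """Percent-encode the path and query of url; other parts are left alone."""
    parts = urlsplit(url)
    # "%" is kept safe so escapes that are already present are not
    # encoded a second time (a literal "%" is therefore left as-is).
    path = quote(parts.path, safe="/%")
    # parse_qsl decodes each name/value pair and urlencode re-encodes it.
    query = urlencode(parse_qsl(parts.query, keep_blank_values=True))
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


encoded = (
    "https://mr.wikipedia.org/wiki/%E0%A4%95%E0%A4%BF%E0%A4%B6%E0%A5%8B%E0%A4%B0"
    "%E0%A4%BE%E0%A4%B5%E0%A4%B8%E0%A5%8D%E0%A4%A5%E0%A4%BE"
)
# Same shape as the line in the Lambda function: unquote first, then re-encode.
print(to_ascii_url(unquote(encoded)))
```

Keeping `%` in the safe set makes the helper idempotent for input that is already encoded, at the cost of leaving a stray literal `%` untouched.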
Oluwafemi Sule