
Introduction

I'm creating a scraper bot with telepot and selenium. The text I need to send with the Telegram bot is unreadable, because it contains Unicode escape sequences (emoji) with doubled backslashes, like:

"hi I like this emoji: \\u265B\\u2655"

Output

"hi I like this emoji: \u265B\u2655"

Needed Output

"hi I like this emoji: ♛♕"

In my case I can't use u"hi I like this emoji: \u265B\u2655", because my string is stored in a variable obtained with selenium and a regex.

What I have tried

I used json.loads("hi I like this emoji: \\u265B\\u2655") and got this:

Exception Raised

raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Question

How can I format this string to obtain the needed output?

Edit

I tried this:

json.loads('"' + mystring + '"')

and I got:

json.decoder.JSONDecodeError: Invalid control character at: line 1 column 23 (char 22)

As requested in the comments, this is the result of print(repr(mystring)):

'La Spezia\\ud83d\\udccd\\n\\ud83d\\udcdaLiceo Scientifico Sportivo A. Pacinotti\\ud83c\\udfeb\\nITALIAN FENCER \\ud83c\\uddee\\ud83c\\uddf9 \\ud83e\\udd3a SPCS!!\\nELECTRIC BASS\\ud83c\\udfb8\\ud83c\\udfb6\\nBooks \\ud83d\\udcd6\\n2a T ( ESCONI ) \\ud83d\\ude0d \\ud83c\\udf93'
Leonardo Scotti
  • You're *probably* simply looking at JSON encoding?! But that is rather unclear without more details. – deceze Jan 20 '21 at 10:39
  • I've edited the question to be clearer about the problem and the situation. @deceze, can you please vote to reopen it? – Leonardo Scotti Jan 20 '21 at 10:54
  • `json.loads('"hi I like this emoji: \\u265B\\u2655"')` → 'hi I like this emoji: ♛♕'… – deceze Jan 20 '21 at 10:58
  • There's a big difference between `json.loads('"..."')` and `json.loads("...")`. If you have a *JSON value* somewhere, that value should include double quotes. If you put that JSON value into a Python string literal, those quotes must be contained within the Python string literal, i.e.: `s = '"\\u265B..."'` to represent the JSON value `"\u265B..."`. *That* should be perfectly JSON-decodable. It's unclear what exactly you're dealing with and whether you're just failing because you copy a valid JSON value incorrectly into Python source code for testing, or something else… – deceze Jan 20 '21 at 11:17
  • To make this unambiguous, show `print(repr(the_json_string_you_extracted_from_somewhere))`. – deceze Jan 20 '21 at 11:19
  • You probably don't want to use regex to extract this string in the first place (for example, did you handle escaped quotes?) – user202729 Jan 20 '21 at 11:43
  • Works…?! https://repl.it/@DavidZentgraf/AccomplishedSqueakyLegacysystem – deceze Jan 20 '21 at 11:44
  • It isn't really clear what exactly can happen in the string. If it's just `\u<...>` you can easily do it with a regex, but what about escaped backslashes/other characters? – user202729 Jan 20 '21 at 11:44
  • Yes @deceze, it works. Thank you so much. – Leonardo Scotti Jan 20 '21 at 14:41

1 Answer


From your final edit, the scraped string looks like the contents of a JSON-encoded string that was extracted directly out of a JSON file somewhere, without its surrounding quotes. A JSON string value must be wrapped in double quotes before json.loads can decode it:

>>> import json
>>> s='La Spezia\\ud83d\\udccd\\n\\ud83d\\udcdaLiceo Scientifico Sportivo A. Pacinotti\\ud83c\\udfeb\\nITALIAN FENCER \\ud83c\\uddee\\ud83c\\uddf9 \\ud83e\\udd3a SPCS!!\\nELECTRIC BASS\\ud83c\\udfb8\\ud83c\\udfb6\\nBooks \\ud83d\\udcd6\\n2a T ( ESCONI ) \\ud83d\\ude0d \\ud83c\\udf93'
>>> print(json.loads(f'"{s}"'))
La Spezia📍
📚Liceo Scientifico Sportivo A. Pacinotti🏫
ITALIAN FENCER 🇮🇹 🤺 SPCS!!
ELECTRIC BASS🎸🎶
Books 📖
2a T ( ESCONI ) 😍 🎓
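
If the same decoding is needed in several places, the quoting step can be wrapped in a small helper. This is only a sketch: the name decode_json_escapes is hypothetical, and it assumes the scraped fragment contains no unescaped double quotes. Passing strict=False is an assumption about the "Invalid control character" error from the question's edit; it tells json to tolerate literal control characters (such as a real newline) inside the string.

```python
import json


def decode_json_escapes(raw: str) -> str:
    """Decode JSON-style escapes (\\uXXXX, \\n, ...) in a scraped fragment.

    Hypothetical helper: assumes `raw` is the inside of a JSON string,
    i.e. it has no surrounding quotes and no unescaped double quotes.
    """
    # Wrap the fragment in double quotes so it is a complete JSON string value.
    # strict=False allows literal control characters (e.g. a real newline).
    return json.loads(f'"{raw}"', strict=False)


scraped = 'hi I like this emoji: \\u265B\\u2655'
print(decode_json_escapes(scraped))  # hi I like this emoji: ♛♕
```

If the fragment does contain an unescaped double quote, json.loads raises json.JSONDecodeError, so it may be safer to extract the string together with its original quotes instead of adding them afterwards.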
Mark Tolonen