Your attempt doesn't actually try to remove the characters. You can use the `replace` method to replace characters in a string, and it can also remove characters if you replace them with the empty string.
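A minimal illustration of deleting characters with `str.replace` (Python 3 syntax; the sample string is made up):

```python
text = "a-b-c"

# Replacing with the empty string deletes every occurrence.
print(text.replace("-", ""))     # abc

# The optional third argument limits how many occurrences are replaced.
print(text.replace("-", "", 1))  # ab-c

# Strings are immutable: replace() returns a new string and leaves text unchanged.
print(text)                      # a-b-c
```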
The only tricky part is representing the code point 0xF0B7 correctly in your source code, and the right way depends on whether the text of each paragraph in `document.paragraphs` (its `.text` attribute) is a normal string or a unicode string (I'd recommend using Python 3 to avoid unicode problems). I assume it is unicode, in which case you can write the code point as `u"\uF0B7"` (if it is a normal string, the representation depends on the encoding).
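A short sketch of what the escape means, written for Python 3 (the sample paragraph text is made up). It also shows why a byte-string approach depends on the encoding: the same character is a different byte sequence in UTF-8 and UTF-16.

```python
# "\uF0B7" is an escape for the single code point U+F0B7.
BULLET = "\uF0B7"
assert BULLET == chr(0xF0B7) and len(BULLET) == 1

# Made-up sample: paragraph text that starts with the bullet character.
line = BULLET + "First item"
cleaned = line.replace(BULLET, "")
print(cleaned)                     # First item

# As bytes, the character looks different in each encoding.
print(BULLET.encode("utf-8"))      # b'\xef\x82\xb7'
print(BULLET.encode("utf-16-le"))  # b'\xb7\xf0'
```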
Apart from that, the way you build `text_string` may be inefficient: if it is built by repeated concatenation, each step can end up copying the growing string. A more idiomatic way to build a string from fragments is to put the fragments in a list and then join them with `"".join(l)`.
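A small comparison of the two approaches, using made-up fragments. Whether `+=` actually copies each time is an implementation detail (CPython can sometimes extend the string in place), so treat the cost difference as likely rather than guaranteed; the `join` version is also shorter and puts the separator only between items.

```python
fragments = ["first paragraph", "second paragraph", "third paragraph"]

# Concatenation in a loop: each += may build a new string from the old one.
text = ""
for i, fragment in enumerate(fragments):
    if i > 0:
        text += "\n"  # separator only between fragments
    text += fragment

# Collect-then-join: the result is built in one pass.
joined = "\n".join(fragments)

print(joined == text)  # True
```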
Putting this together, you get (assuming that the paragraph text is unicode):
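```python
from docx import Document  # python-docx

document = Document(file_to_read)  # file_to_read: the path from your question
# document.paragraphs holds Paragraph objects, so read each one's .text
text_string = u"\n".join([p.text.replace(u"\uF0B7", u"")
                          for p in document.paragraphs])
print(text_string)
```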
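To try the same logic without a .docx file, here is a sketch that wraps it in a function. `paragraphs_to_text`, `HasText` and `FakeParagraph` are hypothetical names; the function only relies on each item having a `.text` string, which python-docx `Paragraph` objects provide.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol

BULLET = "\uF0B7"


class HasText(Protocol):
    text: str


def paragraphs_to_text(paragraphs: Iterable[HasText], strip: str = BULLET) -> str:
    """Join paragraph texts with newlines, deleting every `strip` character."""
    return "\n".join(p.text.replace(strip, "") for p in paragraphs)


@dataclass
class FakeParagraph:
    # Hypothetical stand-in for a python-docx Paragraph (only .text is used).
    text: str


sample = [FakeParagraph(BULLET + "Apples"),
          FakeParagraph(BULLET + "Pears"),
          FakeParagraph("Done.")]
result = paragraphs_to_text(sample)
print(result)
```

With a real document the call would be `paragraphs_to_text(Document(file_to_read).paragraphs)`.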
If you use Python 3, the `u` prefixes are unnecessary, since all strings in Python 3 are unicode (Python 3.3 and later still accept them, but 3.0 to 3.2 reject them as a syntax error). Also note that when printing, you must make sure the output encoding supports all the characters in the document (which may have been the reason you wanted to remove the bullets in the first place).
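A sketch of what goes wrong with an output encoding that lacks the character, using cp1252 (a legacy Windows code page) only as an example. `printable` is a hypothetical helper that swaps unencodable characters for `?` via the standard `errors="replace"` handler.

```python
import sys


def printable(text: str, encoding: str) -> str:
    """Hypothetical helper: replace characters `encoding` cannot represent with '?'."""
    return text.encode(encoding, errors="replace").decode(encoding)


sample = "\uF0B7 Item"

# A strict encode fails because cp1252 has no mapping for U+F0B7.
try:
    sample.encode("cp1252")
except UnicodeEncodeError as exc:
    print("cp1252 cannot encode", repr(exc.object[exc.start:exc.end]))

print(printable(sample, "cp1252"))  # ? Item

# Use the encoding stdout reports, falling back to UTF-8 if it reports none.
out_encoding = getattr(sys.stdout, "encoding", None) or "utf-8"
print(printable(sample, out_encoding))
```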