I have a text file (CA-Final.txt) containing amino acid records along with some other data. Here is a snippet of the file:
ATOM 109 CA ASER A 48 10.832 19.066 -2.324 0.50 61.96 C
ATOM 121 CA AALA A 49 12.327 22.569 -2.163 0.50 60.22 C
ATOM 131 CA AGLN A 50 8.976 24.342 -1.742 0.50 56.71 C
ATOM 145 CA APRO A 51 7.689 25.565 1.689 0.50 51.89 C
ATOM 158 CA GLN A 52 5.174 23.336 3.467 1.00 43.45 C
ATOM 167 CA HIS A 53 2.339 24.135 5.889 1.00 38.39 C
ATOM 177 CA PHE A 54 0.900 22.203 8.827 1.00 33.79 C
ATOM 188 CA TYR A 55 -1.217 22.065 11.975 1.00 34.89 C
ATOM 200 CA ALA A 56 0.334 20.465 15.090 1.00 31.84 C
ATOM 205 CA VAL A 57 0.000 20.066 18.885 1.00 30.46 C
ATOM 212 CA VAL A 58 2.738 21.762 20.915 1.00 27.28 C
Essentially, my problem is that a few of the amino acid abbreviations have an extra letter A in front of them (for example, ASER instead of SER). Amino acid abbreviations are supposed to be 3 letters long. I have attempted to use a regular expression to remove the A wherever it appears in front of an amino acid abbreviation. Here is my code so far:
import re

def Trimmer(txtFileName):
    i = open('CA-final.txt', 'w')
    j = open(txtFileName, 'r')
    for record in j:
        with open(txtFileName, 'r') as j:
            content = j.read()
            content_new = re.sub('^ATOM\s+\d+\s+CA\s+A[ADTSEPGCVMILYFHKRWQN]', r'^ATOM\s+\d+\s+CA\s+[ADTSEPGCVMILYFHKRWQN]', content, flags=re.M)
When I run the function, it raises an error:
File "C:\Users\UserName\AppData\Local\conda\conda\envs\biopython\lib\sre_parse.py", line 1024, in parse_template
raise s.error('bad escape %s' % this, len(this))
error: bad escape \s
My idea is that this function will find every instance of an A in front of a string of 3 characters and replace it with just the 3 other characters. Why exactly am I getting this error?
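For reference, here is a minimal sketch of the behaviour I am after. As far as I understand, the second argument of `re.sub` is a replacement template rather than a pattern, so regex syntax such as `\s` is not valid there (Python 3.7 and later reject unknown letter escapes in the template), and capture groups with backreferences seem to be the usual way to keep the matched text. The names `ALT_LOC` and `strip_alt_loc` are made up for illustration, and the sketch assumes the stray A is always followed by exactly three uppercase letters and then whitespace.

```python
import re

# Group 1 keeps the "ATOM <serial> CA " prefix, group 2 keeps the 3-letter
# residue name plus the whitespace after it. The stray "A" between the two
# groups is not captured, so it disappears in the replacement.
ALT_LOC = re.compile(r'^(ATOM\s+\d+\s+CA\s+)A([A-Z]{3}\s)', flags=re.M)


def strip_alt_loc(text):
    """Return text with the extra leading A removed from 4-letter residue names."""
    # \1 and \2 are backreferences to the captured groups, not regex syntax.
    return ALT_LOC.sub(r'\1\2', text)


sample = (
    "ATOM 109 CA ASER A 48 10.832 19.066 -2.324 0.50 61.96 C\n"
    "ATOM 200 CA ALA A 56 0.334 20.465 15.090 1.00 31.84 C\n"
)
print(strip_alt_loc(sample))
```

Requiring three letters after the A means a genuine ALA record is left untouched, because only two letters follow its leading A.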