I'm using Python 2.7.6.

Sample document:

   <div id="memo_img">
      <table style="table-layout: fixed; width: 100%">
         <tbody>
            <tr>
              <td>This is just simple sentence
              </td>
           </tr>
         </tbody>
      </table>
   </div>

This HTML contains a lot of whitespace.

I want to capture just "This is just simple sentence".

My regex:

<table style="table-layout: fixed; width: 100%"><tbody><tr><td>(.*)</td>

does not work.

How can I ignore whitespace and tabs?

Please help me.

Moumit
nontoxice

1 Answer

You can approach this with regex too. I've made the string a bit messier so you can see how it works in hard mode:

import re
a = '''
    <table style="table-layout: fixed; width: 100%"><tbody><tr><td>

                                    This is just simple sentence
word
                other          word
 number
                         22    14        </td></tr></tbody></table>
                                    </div>
'''
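# re.DOTALL lets "." match newlines; the non-greedy ".*?" stops at the first </td>
m = re.search(r'<td>(.*?)</td>', a, re.DOTALL)
text = m.group(1)  # avoid naming this "str", which would shadow the builtin
# split() with no arguments splits on any whitespace run; join() collapses it to single spaces
print(' '.join(text.split()))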

The result will be: This is just simple sentence word other word number 22 14
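
If you would rather keep the full pattern from the question, one option is to allow optional whitespace between the tags with \s*. This is only a minimal sketch, assuming the tags always appear in exactly this order with exactly this style attribute; the names html, pattern, and sentence are just for illustration.

```python
import re

# The sample document from the question, whitespace and all.
html = '''
   <div id="memo_img">
      <table style="table-layout: fixed; width: 100%">
         <tbody>
            <tr>
              <td>This is just simple sentence
              </td>
           </tr>
         </tbody>
      </table>
   </div>
'''

# \s* between tags matches any run of spaces, tabs, or newlines (including none).
# re.DOTALL lets "." cross the newline that sits before </td>.
pattern = re.compile(
    r'<table style="table-layout: fixed; width: 100%">\s*'
    r'<tbody>\s*<tr>\s*<td>(.*?)</td>',
    re.DOTALL,
)

m = pattern.search(html)
if m:
    # Collapse the leftover whitespace inside the cell to single spaces.
    sentence = ' '.join(m.group(1).split())
    print(sentence)
```

The original regex failed because it expected the tags to sit directly next to each other, and because "." does not match a newline unless re.DOTALL is set. For anything less regular than this sample, an HTML parser is likely to be more robust than a regex.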

Anatolii Chmykhalo