I am looking for a regular expression that works in the JavaScript regexp engine and satisfies the following requirements.
I have a file with content structured in the following way (the text within the box):
Column 1 Column 2 Column 3
_______________________________________________________________________________________________
line 1|Heading 1 Heading 2 Heading 3 |
line 2| 123 456 Quisque imperdiet nibh nec fermentum sollicitudin. |
line 3| Vestibulum eu elit rutrum, eleifend ligula eu, interdum massa. |
line 4| 789 012 Suspendisse vel urna vulputate, porta ex ut, varius felis. |
line 5| Praesent a metus faucibus, porttitor magna at, fermentum libero. |
line 6| |
line 7| |
line 8|Heading 1 Heading 2 Heading 3 |
line 9| 123 456 Quisque imperdiet nibh nec fermentum sollicitudin. |
line 10| Vestibulum eu elit rutrum, eleifend ligula eu, interdum massa. |
line 11| 789 012 Suspendisse vel urna vulputate, porta ex ut, varius felis. |
line 12| Praesent a metus faucibus, porttitor magna at, fermentum libero. |
|_____________________________________________________________________________________________|
Note that the file does not contain tabs, only spaces, but I would prefer if the regular expression was extended to be able to handle tabs.
Column Description:
The heading lines are simply letters. I already know how to create a regular expression to match the heading lines.
The first two columns can either only be empty or can only contain a number with an arbitrary number of digits.
The third column can have any combination of letters, numbers, and some special characters as well: brackets of any type (curly, round, angle), forward slash, period, hyphen, equals sign.
The third column may contain a number followed by a space followed by a word or special character. These examples are valid entries in the third column: "5 RANDOMWORD", "5 (10)", "5 AND 10".
The third column will never contain: (1) a single number, (2) only numbers separated by spaces.
I want a regular expression which will allow me to match extra spaces (either two or more spaces, tabs, or any combination of tabs or spaces) in the contents in the third column so I can easily delete them. The goal is to find multiple spaces in the third column and replace them with a single space.
I want to ignore the heading lines completely.
I also do not want to match the spaces around the numbers present in the first two columns. Note that the first two columns may not always contain numbers.
The regular expression I have been able to piece together so far looks like this:
/(?=^(?:(?!Heading 1 Heading 2 Heading 3).)*$)([ \t]*[\S]+[^\n]*)[ \t]{2,}/
The first part,
/(?=^(?:(?!Heading 1 Heading 2 Heading 3).)*$)/
allows me to ignore heading lines completely. The second part,
/([ \t]*[\S]+[^\n]*)[ \t]{2,}/
allows me to find multiple spaces in the lines which do not have numbers in the first two columns. However, the problem with this one is that it also matches the spaces after the number in the second column (as in lines 2 and 9), which I do not want.
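To make the problem concrete, here is a minimal sketch that runs the expression above against one data line. The sample line and its three-space column gaps are assumptions, since the exact spacing of the real file is not shown here.

```javascript
// Regex from the question, unchanged.
const re = /(?=^(?:(?!Heading 1 Heading 2 Heading 3).)*$)([ \t]*[\S]+[^\n]*)[ \t]{2,}/;

// Hypothetical data line: two numeric columns, then third-column text
// that contains no extra spaces at all.
const line = "   123   456   Quisque imperdiet nibh.";

const m = re.exec(line);

// The match starts at the beginning of the line and ends exactly where
// the third column begins, so the run of spaces it picks up is the
// column gap after "456", not anything inside the third column.
const matched = m[0];
const rest = line.slice(m.index + matched.length);

console.log(JSON.stringify(matched));
console.log(JSON.stringify(rest));
```

Because the greedy [^\n]* backtracks to the last run of two or more spaces on the line, a line with a clean third column always falls back to the gap after the second column.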
If JavaScript supported lookbehind, I think this problem would be easy to solve; without it, I am at a loss as to how to solve it.
Edit 1: Apologies, my original question was not clear. I am not looking for JavaScript code, but merely a regular expression that works in the JavaScript regexp engine.
Also, my preference would be a single regexp expression as opposed to doing it in multiple steps.
Edit 2: More details added in the specifications.
Edit 3: Lookbehind assertions have been accepted into the JavaScript standard and are supported by some, but not all, JavaScript engines as of this writing. See: Javascript: negative lookbehind equivalent?. This might be possible with a single regexp using lookbehinds, but I have not tested this yet.
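For reference, one possible shape of such a lookbehind-based expression is sketched below. It is untested against the real file and rests on assumptions: the engine supports variable-length lookbehind, the heading pattern is a placeholder for whatever actually matches the heading lines, and the third column is taken to begin at the first character on the line that is neither a digit nor whitespace. Under that heuristic, extra spaces directly after a leading number in the third column would not be matched.

```javascript
// Sketch only. Requires an engine with lookbehind support.
// Lookbehind: from the start of a non-heading line up to this point,
// at least one character that is neither whitespace nor a digit has
// been seen, so the run is assumed to lie inside the third column.
// The heading pattern is a placeholder (assumption).
const extraSpaces =
  /(?<=^(?!Heading 1[ \t]+Heading 2[ \t]+Heading 3)[^\n]*[^\s\d][^\n]*)[ \t]{2,}/gm;

// Hypothetical sample; the column spacing is made up.
const sample = [
  "Heading 1   Heading 2   Heading 3",
  "   123   456   Quisque  imperdiet nibh.",
  "               Vestibulum eu \t elit rutrum.",
  "   789   012   5 AND   10",
].join("\n");

// Collapse each matched run to a single space.
const cleaned = sample.replace(extraSpaces, " ");

console.log(cleaned);
```

In this sample the heading line and the gaps around the first two columns are left alone, while runs inside the text of the third column are collapsed.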
Thanks a lot for your help.