
I need a regex (or another nice solution) that will match whitespace only between tags inside a table. My current regex will match whitespace between all tags.

const result = `
<div>
  <table class="foo">
    <tr>
      <td>
        Lorem ipsum
      </td>
    </tr>
    <tr>
      <td>
        Dolor
      </td>
    </tr>
  </table>
</div>
`.replace(/>\s+</g, '><');

I want to achieve this:

<div>
  <table class="foo"><tr><td>Lorem ipsum</td></tr><tr><td>Dolor</td></tr></table>
</div>
Ben Besuijen
  • I can see what you're trying to do, but can I ask as to why you're trying to do that? – JO3-W3B-D3V Dec 31 '18 at 13:58
  • In React I'm converting a string to a JSX element, but when there are whitespace characters it gives an error: whitespace text nodes cannot appear as a child of <table>. I don't want to affect the other elements outside the table.
    – Ben Besuijen Dec 31 '18 at 14:04
  • If you're trying to minify your code, there are lots of off-the-shelf minifiers around; using [regex to parse HTML leads to madness](https://stackoverflow.com/a/1732454/542251) – Liam Dec 31 '18 at 14:06
  • Here are a couple of resources to help with that, https://i.pinimg.com/originals/f4/e8/35/f4e835a17ffc770a69f632d257b77473.png, https://pics.me.me/types-of-headaches-migraine-hypertension-parsing-x-html-stresswith-regex-gexas-29878500.png, https://s3.amazonaws.com/websitebeaver/blog/escape-html-inside-code-or-pre-tag-to-entities-to-display-raw-code-with-prismjs/main.jpg – shanks Dec 31 '18 at 14:09
  • [Parsing HTML with regex is hard job](https://stackoverflow.com/a/4234491/372239) – Toto Dec 31 '18 at 14:58

2 Answers


Explanation

This isn't quite a regular expression solution, but I feel it's actually simpler. Feel free to provide feedback.

Since you want to target table tags specifically, I think this should suffice:

let words = ['Lorem ipsum', 'Dolor'];
let result = `
<div>
  <table class="foo" id="demo" style="">
    <tr>
      <td>
        words[0]
      </td>
    </tr>
    <tr>
      <td>
        words[1]
      </td>
    </tr>
  </table>
</div>
`;

let newResult = '';

// Normalize one chunk of the markup (the text that follows a "<").
const cleanseString = str => {
  const attributes = ['id', 'class', 'style']; // etc ...

  // Drop all whitespace, including the spaces that separated attributes.
  str = str.replace(/\s/g, '');

  // Any digits in the chunk are taken as the index of a words[n] placeholder.
  const index = str.replace(/\D/g, '');
  const marker = `words[${index}]`;

  if (str.indexOf(marker) >= 0) {
    str = str.replace(marker, words[index]);
  }

  // Put back a single space in front of each known attribute name.
  attributes.forEach(attr => {
    const pos = str.indexOf(attr);
    if (pos >= 0) {
      str = str.substring(0, pos) + " " + str.substring(pos);
    }
  });

  return str;
};

// Rebuild the markup chunk by chunk; only the table tags get a line break.
result.split("<").forEach(str => {
  str = cleanseString(str);

  if (str !== '') {
    if (str.indexOf("/table") >= 0) newResult += "<" + str + '\n';
    else if (str.indexOf('table') >= 0) newResult += '\n\t' + "<" + str;
    else newResult += "<" + str;
  }
});

//console.clear();
console.log(newResult);
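
As a rough sketch of the same split-on-tags idea that leaves everything outside the table untouched, one could track whether the current position is inside a table and drop only whitespace-only text there. `stripTableWhitespace` is a hypothetical helper name, and the sketch assumes that no `>` appears inside attribute values, comments, or scripts.

```javascript
// Hypothetical helper: removes whitespace-only text between tags,
// but only while inside <table>...</table>.
function stripTableWhitespace(html) {
  // The capturing group makes split() keep the tags as separate tokens.
  const tokens = html.split(/(<[^>]+>)/);
  let depth = 0; // number of <table> elements currently open
  let out = '';

  for (const token of tokens) {
    if (/^<table\b/i.test(token)) {
      depth++;
    } else if (/^<\/table\s*>/i.test(token)) {
      depth--;
    } else if (depth > 0 && /^\s*$/.test(token)) {
      continue; // whitespace-only text inside a table: drop it
    }
    out += token;
  }

  return out;
}
```

Text inside a cell keeps its own leading and trailing whitespace, which should be fine since the warning concerns whitespace directly between table elements. Nested tables are covered by the depth counter.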
JO3-W3B-D3V

I saw you wanted it to fix this bug -- Whitespace text nodes cannot appear as a child of <table> -- in the html-to-react package.

I encountered the same bug, and I'm using the following to fix it (where sliced in this example is the name of the string which contains the HTML):

      // React expects no whitespace between table elements
      sliced = sliced.replace(/<table>\s*<thead>/g, "<table><thead>");
      sliced = sliced.replace(/<table>\s*<tbody>/g, "<table><tbody>");
      sliced = sliced.replace(/<thead>\s*<tr>/g, "<thead><tr>");
      sliced = sliced.replace(/<tbody>\s*<tr>/g, "<tbody><tr>");
      sliced = sliced.replace(/<tr>\s*<th>/g, "<tr><th>");
      sliced = sliced.replace(/<tr>\s*<td>/g, "<tr><td>");

      sliced = sliced.replace(/<\/thead>\s*<tbody>/g, "</thead><tbody>");

      sliced = sliced.replace(/<\/thead>\s*<\/table>/g, "</thead></table>");
      sliced = sliced.replace(/<\/tbody>\s*<\/table>/g, "</tbody></table>");
      sliced = sliced.replace(/<\/tr>\s*<\/thead>/g, "</tr></thead>");
      sliced = sliced.replace(/<\/tr>\s*<\/tbody>/g, "</tr></tbody>");
      sliced = sliced.replace(/<\/th>\s*<\/tr>/g, "</th></tr>");
      sliced = sliced.replace(/<\/td>\s*<\/tr>/g, "</td></tr>");

      sliced = sliced.replace(/<\/tr>\s*<tr>/g, "</tr><tr>");
      sliced = sliced.replace(/<\/th>\s*<th>/g, "</th><th>");
      sliced = sliced.replace(/<\/td>\s*<td>/g, "</td><td>");

Using regular expressions this way assumes the input is regular, predictable HTML (here, table tags written without attributes) and not "arbitrary" HTML.
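
If the table tags carry attributes (as in the question's `<table class="foo">`), a more compact variant is to match each table block first and collapse whitespace only within it. This is a sketch, not something html-to-react provides: `collapseTableWhitespace` is a made-up name, and the lazy match assumes tables are not nested.

```javascript
// Hypothetical helper: collapses whitespace between tags, but only
// inside each <table>...</table> block.
function collapseTableWhitespace(html) {
  // [\s\S]*? lazily matches everything (newlines included) up to the
  // first closing </table>, so nested tables are NOT handled.
  return html.replace(/<table\b[\s\S]*?<\/table>/gi, table =>
    // Within the matched block, remove whitespace that sits directly
    // between a closing ">" and the next opening "<".
    table.replace(/>\s+</g, '><')
  );
}
```

Whitespace around the cell text itself is left alone, and markup outside the table is returned unchanged.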

ChrisW