32

I'm wondering if there is a lightweight way to use JavaScript or jQuery to sniff out a specific text character across a document, say €, and find all instances of it, and then replace every instance with, say, a $.

I found this snippet for starters:

var str = 'test: &#39;';

str = str.replace(/&#39;/g, "'");

Essentially, I want a solution for a one-page document: grab all instances of X and make each one XY. Only text characters.

fred randall
  • 2
    Yes, there probably is a way to do that, though the simplest way would also remove all event handlers and element data from your page: `$("body").html( $("body").html().replace(/€/g,'$') )`. This is a bad way of doing it. – Kevin B Sep 05 '13 at 18:49
  • 2
    If you wanted to avoid losing events and element data, it gets far more complex. – Kevin B Sep 05 '13 at 18:50
  • 1
    Thanks @KevinB -- I'd like to flesh this out further. Just something I am curious about. What could I add to be more specific? – fred randall Sep 05 '13 at 19:01
  • 2
    The best thing would be to do it server-side, but if that isn't an option, client-side you would need to select all text nodes in the document, iterate over them, then perform the replace on each text node individually. – Kevin B Sep 05 '13 at 19:05
  • 1
    Here's a sample using a plugin i wrote a while back to highlight text. http://jsfiddle.net/2t8TV/1/ I used it to wrap all occurrences of `€` with a span, then i replaced the text inside of those spans. – Kevin B Sep 05 '13 at 19:09

14 Answers

37

How about this, replacing @ with $:

$("body").children().each(function () {
    $(this).html( $(this).html().replace(/@/g,"$") );
});

http://jsfiddle.net/maximua/jp96C/1/
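One caveat when the replacement is a dollar sign: `$` has a special meaning inside a replacement string. A lone "$" is inserted literally, but sequences such as `$&` or `$$` are interpreted. The sketch below uses a made-up sample string to illustrate; passing a function as the replacement is one way to sidestep the issue.

```javascript
// Sample string is made up for illustration.
const sample = "cost: @5";

// A lone "$" is inserted literally.
const lone = sample.replace(/@/g, "$");
// "$$" is the escape sequence for a single "$".
const doubled = sample.replace(/@/g, "$$");
// "$&" re-inserts the matched text, so nothing appears to change.
const wholeMatch = sample.replace(/@/g, "$&");
// The return value of a replacer function is used verbatim.
const viaFunction = sample.replace(/@/g, () => "$&");

console.log(lone, "|", doubled, "|", wholeMatch, "|", viaFunction);
```

If the replacement text comes from user input, the function form is probably the safer default.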

017Bluefield
Max Malyk
  • 4
    Adding `i` to the regex will give you a case-insensitive search if you're replacing alpha characters `.replace(/soMeWord/gi,"Another Word"));` – Dylan Valade Feb 23 '14 at 20:20
  • 1
    @DavidBailey good catch, including spaces before and after allowed me to reduce that chance – D34dman Oct 21 '16 at 15:17
22

My own suggestion is as follows:

function nativeSelector() {
    var elements = document.querySelectorAll("body, body *");
    var results = [];
    var child;
    for(var i = 0; i < elements.length; i++) {
        child = elements[i].childNodes[0];
        if(elements[i].hasChildNodes() && child.nodeType == 3) {
            results.push(child);
        }
    }
    return results;
}

var textnodes = nativeSelector(),
    _nv;
for (var i = 0, len = textnodes.length; i<len; i++){
    _nv = textnodes[i].nodeValue;
    textnodes[i].nodeValue = _nv.replace(/£/g,'€');
}

JS Fiddle demo.

The nativeSelector() function comes from an answer (posted by Anurag) to this question: getElementsByTagName() equivalent for textNodes.
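Note that the function above only looks at the first child node of each element, so text that follows a nested element is not collected. A minimal sketch of a recursive variant follows; `collectTextNodes` and `replaceInTextNodes` are hypothetical names, not part of either linked answer, and the code only assumes that nodes expose `nodeType`, `nodeName`, `childNodes` and `nodeValue` the way DOM nodes do.

```javascript
// Same value as Node.TEXT_NODE in browsers.
const TEXT_NODE = 3;

// Recursively gather every text node below `root`, skipping the
// contents of elements whose nodeName is listed in `skipTags`.
function collectTextNodes(root, skipTags = ["SCRIPT", "STYLE", "NOSCRIPT"]) {
  const found = [];
  (function walk(node) {
    if (node.nodeType === TEXT_NODE) {
      found.push(node);
      return;
    }
    if (skipTags.includes(node.nodeName)) {
      return;
    }
    for (const child of Array.from(node.childNodes || [])) {
      walk(child);
    }
  })(root);
  return found;
}

// Apply a replacement to each collected text node in place.
function replaceInTextNodes(root, pattern, replacement) {
  for (const textNode of collectTextNodes(root)) {
    textNode.nodeValue = textNode.nodeValue.replace(pattern, replacement);
  }
}
```

In a browser this would be called as replaceInTextNodes(document.body, /£/g, '€'). Because only nodeValue is changed, element nodes and their event handlers are left alone.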

Community
David says reinstate Monica
19

ECMAScript 2015+ approach

Pitfalls when solving this task

This seems like an easy task, but you have to take care of several things:

  • Simply replacing the entire HTML kills all DOM functionality, like event listeners
  • Replacing the HTML may also replace <script> or <style> contents, or HTML tags or attributes, which is not always desired
  • Changing the HTML may result in an XSS attack
  • You may want to replace attributes like title and alt (in a controlled manner) as well

Guarding against attacks generally can’t be solved by using the approaches below. E.g. if a fetch call reads a URL from somewhere on the page, then sends a request to that URL, the functions below won’t stop that, since this scenario is inherently unsafe.

Replacing the text contents of all elements

This basically selects all elements that contain normal text, goes through their child nodes — among those are also text nodes —, seeks those text nodes out and replaces their contents.

You can optionally specify a different root target, e.g. replaceOnDocument(/€/g, "$", { target: someElement });; by default, the <body> is chosen.

const replaceOnDocument = (pattern, string, {target = document.body} = {}) => {
  // Handle `string` — see the last section
  [
    target,
    ...target.querySelectorAll("*:not(script):not(noscript):not(style)")
  ].forEach(({childNodes: [...nodes]}) => nodes
    .filter(({nodeType}) => nodeType === document.TEXT_NODE)
    .forEach((textNode) => textNode.textContent = textNode.textContent.replace(pattern, string)));
};

replaceOnDocument(/€/g, "$");

Replacing text nodes, element attributes and properties

Now, this is a little more complex: you need to check three cases: whether a node is a text node, whether it’s an element and its attribute should be replaced, or whether it’s an element and its property should be replaced. A replacer object provides methods for text nodes and for elements.

Before replacing attributes and properties, the replacer needs to check whether the element has a matching attribute; otherwise new attributes get created, undesirably. It also needs to check whether the targeted property is a string, since only strings can be replaced, or whether the matching property to the targeted attribute is not a function, since this may lead to an attack.

In the example below, you can see how to use the extended features: in the optional third argument, you may add an attrs property and a props property, which is an iterable (e.g. an array) each, for the attributes to be replaced and the properties to be replaced, respectively.

You’ll also notice that this snippet uses flatMap. If that’s not supported, use a polyfill or replace it with the reduce/concat or map/reduce/concat construct, as seen in the linked documentation.
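As a rough illustration of that fallback, here is a minimal sketch built on reduce and concat. The helper name `flatMapFallback` is hypothetical, and the sample objects merely stand in for elements with child nodes.

```javascript
// Fallback for Array.prototype.flatMap built from reduce and concat.
// concat flattens one level, which matches what flatMap does.
const flatMapFallback = (array, callback) =>
  array.reduce(
    (flat, item, index) => flat.concat(callback(item, index, array)),
    []
  );

// Stand-ins for elements and their child nodes.
const parents = [
  { childNodes: ["a", "b"] },
  { childNodes: [] },
  { childNodes: ["c"] }
];

// The callback returns a real array, since concat would not spread a NodeList.
const children = flatMapFallback(parents, ({ childNodes }) => Array.from(childNodes));
console.log(children);
```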

const replaceOnDocument = (() => {
    const replacer = {
      [document.TEXT_NODE](node, pattern, string){
        node.textContent = node.textContent.replace(pattern, string);
      },
      [document.ELEMENT_NODE](node, pattern, string, {attrs, props} = {}){
        attrs.forEach((attr) => {
          if(typeof node[attr] !== "function" && node.hasAttribute(attr)){
            node.setAttribute(attr, node.getAttribute(attr).replace(pattern, string));
          }
        });
        props.forEach((prop) => {
          if(typeof node[prop] === "string" && node.hasAttribute(prop)){
            node[prop] = node[prop].replace(pattern, string);
          }
        });
      }
    };

    return (pattern, string, {target = document.body, attrs: [...attrs] = [], props: [...props] = []} = {}) => {
      // Handle `string` — see the last section
      [
        target,
        ...[
          target,
          ...target.querySelectorAll("*:not(script):not(noscript):not(style)")
        ].flatMap(({childNodes: [...nodes]}) => nodes)
      ].filter(({nodeType}) => replacer.hasOwnProperty(nodeType))
        .forEach((node) => replacer[node.nodeType](node, pattern, string, {
          attrs,
          props
        }));
    };
})();

replaceOnDocument(/€/g, "$", {
  attrs: [
    "title",
    "alt",
    "onerror" // This will be ignored
  ],
  props: [
    "value" // Changing an `<input>`’s `value` attribute won’t change its current value, so the property needs to be accessed here
  ]
});

Replacing with HTML entities

If you need to make it work with HTML entities like &shy;, the above approaches will just literally produce the string &shy;, since that’s an HTML entity and will only work when assigning .innerHTML or using related methods.

So let’s solve it by passing the input string to something that accepts an HTML string: a new, temporary HTMLDocument. This is created by the DOMParser’s parseFromString method; in the end we read its documentElement’s textContent:

string = new DOMParser().parseFromString(string, "text/html").documentElement.textContent;

If you want to use this, choose one of the approaches above, depending on whether or not you want to replace HTML attributes and DOM properties in addition to text; then simply replace the comment // Handle `string` — see the last section by the above line.

Now you can use replaceOnDocument(/Güterzug/g, "G&uuml;ter&shy;zug");.

NB: If you don’t use the string handling code, you may also remove the { } around the arrow function body.

Note that this parses HTML entities but still disallows inserting actual HTML tags, since we’re reading only the textContent. This is also safe against most cases of XSS: since we’re using parseFromString and the page’s document isn’t affected, no <script> gets downloaded and no onerror handler gets executed.

You should also consider using \xAD instead of &shy; directly in your JavaScript string, if it turns out to be simpler.
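A small sketch of that alternative, using a made-up sample sentence: the escape "\xAD" produces U+00AD (soft hyphen), the same character that &shy; decodes to, so no entity parsing is needed.

```javascript
// "\xAD" is U+00AD SOFT HYPHEN, the character that &shy; decodes to.
const softHyphen = "\xAD";

// Made-up sample sentence for illustration.
const text = "Der Güterzug fährt ab.";

// Insert the soft hyphen directly; it is invisible unless a line breaks there.
const hyphenated = text.replace(/Güterzug/g, "Güter" + softHyphen + "zug");

// The result is exactly one character longer than the input.
console.log(hyphenated.length - text.length);
```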

Sebastian Simon
  • 1
    Looks great in itself, but I need to replace words with the same words with soft hyphens inserted. That is because of errors being made by the CSS hyphenation function with certain Dutch words. And then your function makes the browser render: `Lo&shy;rem` and `ip&shy;sum` (Latin examples). You wouldn't happen to know how to solve that, would you? – Frank Conijn Jun 26 '18 at 11:39
  • 2
    Thanks! You had already set me on the right track, that of `innerHTML`. I'm working on another way with that, and will make Fiddles with both. So we can see which one is the fastest and easiest to maintain for the next web dev. As Arnold used to say: I'll be back. ;-) – Frank Conijn Jun 26 '18 at 12:52
  • 2
    Here I am again. Actually, I don't really know how to test scripts for speed. I can imagine something, but that's about it. However, I did come up with a simple forked script that can replace with other normal words, and can insert HTML entities. See https://jsfiddle.net/FrankConijn/t1e0k2fx/4/. Move the center scrollbar to the right, shrinking the output field, and see that the word 'medewerkerstevredenheidsonderzoek' gets hyphenated. – Frank Conijn Jun 26 '18 at 20:12
  • 1
    @FrankConijn Although that solution uses `innerHTML` replacement which has the problem of overriding event listeners… – Sebastian Simon Jun 26 '18 at 20:49
  • 1
    I know what event listeners are, but don't quite see what exactly you mean. Can you give a small example? – Frank Conijn Jun 26 '18 at 20:54
  • 2
    @FrankConijn See [Is it possible to append to `innerHTML` without destroying descendants' event listeners?](https://stackoverflow.com/q/595808/4642212). Basically, `body.children[someChild].addEventListener(someEvent, someListener);` then `body.innerHTML += someAdditionalContent;` or `body.innerHTML = someNewContent;` serializes and re-parses the entire HTML, clearing off any previously bound event listener. That’s why, in my approach, I carefully replace the contents of individual text nodes. – Sebastian Simon Jun 26 '18 at 21:00
  • 1
    Interesting. Didn't know that, so thanks for the warning. Nonetheless, there are several easy workarounds for the problem of the destruction of event listeners: 1. In case of (furthermore) static pages: add the event listeners last. 2. On the page you linked to, there are several workarounds for pages with altered markup. To which I added a simple one: https://stackoverflow.com/a/51054276/2056165. – Frank Conijn Jun 27 '18 at 04:27
3

Similar to @max-malik's answer, but without using jQuery, you can also do this using document.createTreeWalker:

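button.addEventListener('click', e => {
  // Visit text nodes only; assigning textContent on an element would replace its child elements
  const treeWalker = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT);
  while (treeWalker.nextNode()) {
    const node = treeWalker.currentNode;
    node.textContent = node.textContent.replace(/@/g, '$');
  }
});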
<div>This is an @ that we are @ replacing.</div>
<div>This is another @ that we are replacing.</div>
<div>
  <span>This is an @ in a span in @ div.</span>
</div>
<br>
<input id="button" type="button" value="Replace @ with $" />
Jsilvermist
3

I think you may be overthinking this.

My approach is simple.

Enclose your page in a div tag:

<div id="mydiv">
<!-- your page here -->
</div>

In your javascript:

var html=document.getElementById('mydiv').innerHTML;
html = html.replace(/this/g,"that");
document.getElementById('mydiv').innerHTML=html;
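
One thing to keep in mind with this approach is that the pattern is applied to the markup as well as the visible text, so attribute values and tag names that contain the search term are rewritten too. The sketch below shows the effect on a made-up HTML string, along with a deliberately naive workaround that only touches the segments between tags. `replaceOutsideTags` is a hypothetical helper; it does not handle comments, `<script>` contents, or attribute values containing `>`.

```javascript
// Made-up markup for illustration.
const html = '<a href="/this/page">this</a> and this';

// A plain replace also rewrites the href attribute.
const naive = html.replace(/this/g, "that");

// Naive workaround: split on tags (kept in the result by the capture
// group, always at odd indices) and replace only in the other pieces.
function replaceOutsideTags(source, pattern, replacement) {
  return source
    .split(/(<[^>]*>)/)
    .map((piece, index) =>
      index % 2 === 1 ? piece : piece.replace(pattern, replacement)
    )
    .join("");
}

const safer = replaceOutsideTags(html, /this/g, "that");
console.log(naive);
console.log(safer);
```

Writing the result back through innerHTML still rebuilds the elements, so the caveat about lost event handlers raised in the comments on the question applies either way.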
Joe Bonds
2

In javascript without using jquery:

document.body.innerText = document.body.innerText.replace('actualword', 'replacementword');
Ivan Rubinson
sohail.hussain.dyn
  • 4
    Without using a regex, with the `g` switch/modifier, that will only replace the first instance of the 'actualword'. – David says reinstate Monica Sep 05 '13 at 18:54
  • 1
    Please make a demo to show that this works. In my test set-up, it doesn't do a thing. – Frank Conijn Jun 26 '18 at 11:00
  • 2
    @FrankConijn You actually have to set the innerText, which isn't shown in the answer. `document.body.innerText = document.body.innerText.replace('actualword', 'replacementword')`... – Grant Nov 27 '18 at 23:12
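
The difference the first comment describes can be seen on a plain string. A small sketch with a made-up sentence follows; `replaceAll` is assumed to be available (it was added in ES2021, so older browsers need the regex form).

```javascript
// Made-up sentence for illustration.
const sentence = "actualword, then actualword again";

// A string pattern replaces only the first occurrence.
const firstOnly = sentence.replace("actualword", "replacementword");

// A regex with the g flag replaces every occurrence.
const everyMatch = sentence.replace(/actualword/g, "replacementword");

// replaceAll with a string pattern also replaces every occurrence.
const viaReplaceAll = sentence.replaceAll("actualword", "replacementword");

console.log(firstOnly);
console.log(everyMatch);
```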
2

The best option would be to do this server-side, or to wrap the currency symbols in an element you can select before returning the page to the browser. If neither is an option, you can select all text nodes within the body and do the replace on them. Below I'm doing this using a plugin I wrote two years ago that was meant for highlighting text. I find all occurrences of € and wrap each one in a span with the class currency-symbol, then I replace the text of those spans.

Demo

(function($){

    $.fn.highlightText = function () {
        // handle the first parameter
        // is the first parameter a regexp?
        var re,
            hClass,
            reStr,
            argType = $.type(arguments[0]),
            defaultTagName = $.fn.highlightText.defaultTagName;

        if ( argType === "regexp" ) {
            // first argument is a regular expression
            re = arguments[0];
        }       
        // is the first parameter an array?
        else if ( argType === "array" ) {
            // first argument is an array, generate
            // regular expression string for later use
            reStr = arguments[0].join("|");
        }       
        // is the first parameter a string?
        else if ( argType === "string" ) {
            // store string in regular expression string
            // for later use
            reStr = arguments[0];
        }       
        // else, return out and do nothing because this
        // argument is required.
        else {
            return;
        }

        // the second parameter is optional, however,
        // it must be a string or boolean value. If it is 
        // a string, it will be used as the highlight class.
        // If it is a boolean value and equal to true, it 
        // will be used as the third parameter and the highlight
        // class will default to "highlight". If it is undefined,
        // the highlight class will default to "highlight" and 
        // the third parameter will default to false, allowing
        // the plugin to match partial matches.
        // ** The exception is if the first parameter is a regular
        // expression, the third parameter will be ignored.
        argType = $.type(arguments[1]);
        if ( argType === "string" ) {
            hClass = arguments[1];
        }
        else if ( argType === "boolean" ) {
            hClass = "highlight";
            if ( reStr ) {
                reStr = "\\b" + reStr + "\\b";
            }
        }
        else {
            hClass = "highlight";
        }

        if ( arguments[2] && reStr ) {
            reStr = "\\b" + reStr + "\\b";
        }

        // if re is not defined ( which means either an array or
        // string was passed as the first parameter ) create the
        // regular expression.
        if (!re) {
            re = new RegExp( "(" + reStr + ")", "ig" );
        }

        // iterate through each matched element
        return this.each( function() {
            // select all contents of this element
            // .andSelf() was removed in jQuery 3.0; .addBack() is its replacement
            $( this ).find( "*" ).addBack().contents()

            // filter to only text nodes that aren't already highlighted
            .filter( function () {
                return this.nodeType === 3 && $( this ).closest( "." + hClass ).length === 0;
            })

            // loop through each text node
            .each( function () {
                var output;
                output = this.nodeValue
                    .replace( re, "<" + defaultTagName + " class='" + hClass + "'>$1</" + defaultTagName +">" );
                if ( output !== this.nodeValue ) {
                    $( this ).wrap( "<p></p>" ).parent()
                        .html( output ).contents().unwrap();
                }
            });
        });
    };

    $.fn.highlightText.defaultTagName = "span";

})( jQuery );

$("body").highlightText("€","currency-symbol");
$("span.currency-symbol").text("$");
Kevin B
  • 2
    Obviously it would be horribly inefficient to do this on the body element on a large website, therefore you should replace "body" with a selector that selects all the areas where this symbol can occur. – Kevin B Sep 05 '13 at 19:27
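
Because the plugin builds a RegExp from the string or array it is given, search terms containing regex metacharacters (for example `$` or `.`) would need escaping first; the plugin as posted does not do this. Below is a small sketch of a commonly used escaping helper. `escapeRegExp` is a hypothetical name and not part of the plugin, and the sample text is made up.

```javascript
// Escape characters that have special meaning in a regular expression.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build the pattern the same way the plugin does, but from an escaped term.
const term = "$5.00";
const pattern = new RegExp("(" + escapeRegExp(term) + ")", "ig");

// Only the literal "$5.00" matches; "$5x00" does not, because "." is escaped.
const matches = "costs $5.00, not $5x00".match(pattern);
console.log(matches);
```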
2

Use split and join method

$("#idBut").click(function() {
    $("body").children().each(function() {
        $(this).html($(this).html().split('@').join("$"));
    });
});

here is solution

Suresh Mahawar
1

You can use:

str.replace(/text/g, "replaced text");
Praxis Ashelin
  • 1
    Does that work with strings/text? The [API docs for `replaceAll()`](http://api.jquery.com/replaceAll/) suggests it's used to replace one set of elements with another element. – David says reinstate Monica Sep 05 '13 at 18:59
  • 1
    You are correct, I didn't even know that. But in that case this should help: http://stackoverflow.com/questions/13574980/jquery-replace-all-instances-of-a-character-in-a-string – Praxis Ashelin Sep 05 '13 at 19:03
1
str.replace(/replacetext/g,'actualtext')

This replaces all instances of replacetext with actualtext

Venkata Krishna
  • 1
    Author wanted to "sniff out a specific text character across a document" not just in a string – Max Malyk Sep 05 '13 at 19:07
  • 1
    @MaxMalyk from the question.. i think its clear that the sniffing part is done & the "str" value has been extracted. – Venkata Krishna Sep 05 '13 at 19:08
  • 1
    no, it is not: >>Essentially; I am wanting a solution for a one page document. Grab all instances of X and make it XY. Only text characters.< – Max Malyk Sep 05 '13 at 19:10
1

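For each element inside the document body, modify its text using the .text(fn) function.

$("body *").text(function() {
    // note: .text() as a setter replaces any child elements with plain text
    // a string pattern would replace only the first match, so use a global regex
    return $(this).text().replace(/x/g, "xy");
});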
letiagoalves
1

As you'll be using jQuery anyway, try:

https://github.com/cowboy/jquery-replacetext

Then just do

$("p").replaceText("£", "$")

It seems to do a good job of replacing only text without messing with other elements.

James
1

Vanilla JavaScript solution:

document.body.innerHTML = document.body.innerHTML.replace(/Original/g, "New")
Bjørnar Hagen
0

Here is something that might help someone looking for this answer. The following uses jQuery; it searches the whole document and replaces only the text. For example, if we had

<a href="/i-am/123/a/overpopulation">overpopulation</a>

and we wanted to add a span with the class overpop around the word overpopulation

<a href="/i-am/123/a/overpopulation"><span class="overpop">overpopulation</span></a>

we would run the following

        $("*:containsIN('overpopulation')").filter(
            function() {
                // keep only the innermost elements that contain the match
                return $(this).find("*:containsIN('overpopulation')").length == 0;
            }
        ).html(function(_, html) {
            if (typeof html !== 'undefined') {
                return html.replace(/(overpopulation)/gi, '<span class="overpop">$1</span>');
            }
        });

The search is case-insensitive, covers the whole document, and replaces only the text portions; in this case we are searching for the string 'overpopulation'. It relies on the following custom selector, which must be defined before the code above runs:

    $.extend($.expr[":"], {
        "containsIN": function(elem, i, match, array) {
            return (elem.textContent || elem.innerText || "").toLowerCase().indexOf((match[3] || "").toLowerCase()) >= 0;
        }
    });
Natdrip