How to capture plain text site into a variable using JavaScript or jQuery?

Question

I need to save all the content of a website inside a variable in order to search for a specific string.

Here is an example of what the content looks like; it is literally a plain text log.

======================================================================
BEGIN_TEXT

APPLEYARD IAN 23761347 BA 2 Airport Data:
Code = JFK
Name = JFK/John F Kennedy International
City = New York
State = NY
Airport Data:
Code = LCY
Name = Lcy/London City Airport            '
City = London
State = England
sysTime:XXXXXXXXX0000 year:2012 month:7 day:16 hour:7 min:10 pm,END_TEXT

======================================================================

----(1341920977.93286)
                           2012/07/10 12:03:22.582
MAP UNAVAIL-AIRPORT CHECK IN ONLY *
>

======================================================================

----(1341920977.93286)
                           2012/07/10 12:03:23.202
I

======================================================================

----(1341920977.93286)
                           2012/07/10 12:03:23.337
IGND 
>

======================================================================

----(1341920977.93286)
                           2012/07/10 12:03:23.337
9V/BA2R16AUGJFKLCY

======================================================================

This was my original idea:

var content = document.body.textContent; //But there is no body!
var pos = content.search("UNAVAIL-AIRPORT"); // Just an example to search for

So my questions are:

How do I capture that content?
Once I get the position of the string, how can I scroll there and highlight the match? I basically want to recreate the CTRL + F function.

Thanks in advance!

There is always a body in html document. The body *tag* doesn't have to exist in markup, but the body *element* does exist. If you serve this as html, it will be in `document.body.textContent` indeed. — Esailija, Aug 24 '12 at 14:32
I don't think the result (the log per se) is an HTML document. If you do a view source you see the exact same thing. It's like looking at a TXT file from a browser. The script will be injected as a bookmarklet. — fedxc, Aug 24 '12 at 14:40
@fedxc `The script will be injected as a bookmarklet.` -- Aha! That's hugely important information, and it should be edited into the question. (Without mentioning that you are writing a bookmarklet, most people will assume you have another page where you are pulling in the text file via Ajax or an iframe.) — apsillers, Aug 24 '12 at 17:15

score 1 · Accepted Answer · edited May 23 '17 at 12:20

When you load a text file as a web page in your browser, your browser creates a bare-bones HTML scaffolding around the text:

<html>
    <head></head>
    <body>
        <pre>[your entire text document here]</pre>
    </body>
</html>

This doesn't show up if you view the source of the page, but it is visible if you inspect the page (e.g. with Firebug or Chrome dev tools).

A simple way to manipulate and style the text is to grab the innerHTML of the <pre> block and add tags into it:

function highlightText(regexStr) {
    var preTag = document.getElementsByTagName("pre")[0];
    preTag.innerHTML = preTag.innerHTML
                         .replace(new RegExp("("+regexStr+")", "g"),
                                  "<span style='background-color:orange;'>$1</span>");
}

highlightText("some regex phrase to highlight");

Add whatever styles you like in the <span> to achieve the desired highlighting effect.

Note that the string you pass into highlightText is used in a regex, so you should escape special regex characters like $ and ^ before you pass the string into the function (or make the function sanitize its own input). This has been addressed in How do you pass a variable to a Regular Expression JavaScript?:

str.replace(/([.?*+^$[\]\\(){}|-])/g, "\\$1");
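A minimal sketch of the "make the function sanitize its own input" variant mentioned above, assuming the search phrase should be matched literally. The helper names (`escapeRegExp`, `escapeHtml`, `buildHighlightedHtml`) are hypothetical and not part of the original answer. The sketch works from the plain text rather than from `innerHTML`, because characters such as `>` in the log appear HTML-escaped (`&gt;`) in `innerHTML`.

```javascript
// Escape characters that have special meaning in a regular expression.
function escapeRegExp(str) {
    return str.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Escape characters that have special meaning in HTML.
function escapeHtml(str) {
    return str.replace(/&/g, "&amp;")
              .replace(/</g, "&lt;")
              .replace(/>/g, "&gt;");
}

// Hypothetical helper: returns HTML for `text` in which every literal
// occurrence of `phrase` is wrapped in a highlighted <span>.
function buildHighlightedHtml(text, phrase) {
    if (phrase === "") {
        return escapeHtml(text); // nothing to highlight
    }
    var pattern = new RegExp(escapeRegExp(phrase), "g");
    var html = "";
    var lastIndex = 0;
    var match;
    while ((match = pattern.exec(text)) !== null) {
        // Copy the text before the match, then the wrapped match itself.
        html += escapeHtml(text.slice(lastIndex, match.index));
        html += "<span style='background-color:orange;'>" +
                escapeHtml(match[0]) + "</span>";
        lastIndex = match.index + match[0].length;
    }
    return html + escapeHtml(text.slice(lastIndex));
}

// Browser usage (assumes the page is the plain-text log described above):
//   var pre = document.getElementsByTagName("pre")[0];
//   pre.innerHTML = buildHighlightedHtml(pre.textContent, "UNAVAIL-AIRPORT");
//   var first = pre.getElementsByTagName("span")[0];
//   if (first) first.scrollIntoView(); // jump to the first match
```

Building the markup from `textContent` keeps the phrase from accidentally matching inside entities or tags, and the `scrollIntoView` call in the usage comment is one way to get the "scroll to the match" part of Ctrl+F.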

score 0 · Answer 2 · answered Aug 24 '12 at 14:30

1. Load the content into a variable:

var content = '';

$.get('ajax/test.html', function(data) {
    content = data;
});

2. Use $(content).find() to locate what you are searching for, and add a class to highlight it.

Thiago Custodio


`find` won't work here since it's all plain text, not an HTML document. `find` won't locate any DOM elements, because there are no DOM elements at all. – apsillers Aug 24 '12 at 14:32
AJAX is asynchronous. Everything must be done within the callback function (i.e. vars declaration also). – sp00m Aug 24 '12 at 14:33
@sp00m Are you objecting to separation of the declaration and assignment of `content`? There is no reason why a function cannot reference a variable that exists in an outer scope. The `content` in the callback simply refers to (and sets the value of) the `content` defined in the outer scope. – apsillers Aug 24 '12 at 14:34
@apsillers It can, but it's too confusing. `content` won't be necessarily initialized after (DOM-speaking) the AJAX request. – sp00m Aug 24 '12 at 14:37
@sp00m I think I misunderstood you before. Do you mean that all **manipulation** of `content` that uses the result of the Ajax call must take place inside of (or be triggered by) the callback? Yes, of course, I agree with that. However, you might still use an outer scope in certain specialized cases (e.g., `content` is constantly set and reset by repeated Ajax calls and is read by event handlers that read its current value at the time of the event). I can appreciate this is not one of those cases, and the `content` declaration may as well be contained inside the callback. – apsillers Aug 24 '12 at 14:48
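
The two points raised in these comments (the response is plain text, and work on it belongs in the callback) can be sketched roughly as follows. This is only an illustration: `findMatchPositions` and `handleLog` are hypothetical names, and `"log.txt"` stands in for whatever URL actually serves the log.

```javascript
// Returns the index of every literal occurrence of `phrase` in `text`.
// Plain string search is used because the log contains no DOM elements.
function findMatchPositions(text, phrase) {
    var positions = [];
    if (phrase === "") {
        return positions; // avoid an endless loop on an empty phrase
    }
    var index = text.indexOf(phrase);
    while (index !== -1) {
        positions.push(index);
        index = text.indexOf(phrase, index + phrase.length);
    }
    return positions;
}

// Hypothetical callback: everything that depends on the response
// runs here, after the request has completed.
function handleLog(data) {
    return findMatchPositions(data, "UNAVAIL-AIRPORT");
}

// With jQuery loaded, the callback would be wired up like this
// ("log.txt" is a placeholder URL):
//   $.get("log.txt", handleLog, "text");
```

Passing `"text"` as the data type asks jQuery to hand the response to the callback as a plain string instead of trying to guess its type.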
