Perhaps I'm asking too much, but asking is much better than assuming it doesn't exist.

In order to create a converter that turns a webpage into a single file (with all of the HTML's external resources included), I'm planning to use the Data URI Scheme. The problem with such a conversion is that lots of pages use JavaScript code to come up with the URI of a resource, i.e. it's completely ordinary for a page to set the src of an image with JS.

In the worst-case scenario the URI is composed at runtime, in which case it's impossible to replace the URI with a Data URI ahead of time, e.g. img.src = 'domain.com/' + some_variable + '.png';.

The only way I can think of to overcome such scenarios, and all others, is to introduce an intermediate block of code that runs after the URI is changed and before the resource is downloaded. If such an opportunity exists, I can work on the requested URI and replace it with the appropriate Data URI. This approach also benefits from a global Data URI storage, which leads to a smaller HTML file size.

Does anyone know if such an opportunity exists? Or if one can create it?

[EDIT]

Here's how I would like my single-file webpage to be:

<script>
var uri_dictionary = {
    'http://www.domain.com/1.png': '...',
    'http://www.domain.com/2.png': '...'
};

function translateURI(url) {
    if (uri_dictionary[url]) {
        return uri_dictionary[url];
    }
    else {
        return url;
    }
}
</script>

The above is a simplified version of the code that I'm planning to inject into the single-file webpage that I'm going to generate. The translateURI function just needs to intercept all outgoing requests before they are sent and replace the requested URI where feasible.
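
To illustrate the composed-URI case: if an assignment can be routed through translateURI, the lookup works at runtime even though the full URI never appears literally in the page source. This is only a minimal sketch. The dictionary values are kept as the same '...' placeholders used above, and the value given to some_variable is an assumption made for the example.

```javascript
// Placeholder entries; '...' stands in for the real Data URI strings.
var uri_dictionary = {
    'http://www.domain.com/1.png': '...',
    'http://www.domain.com/2.png': '...'
};

function translateURI(url) {
    // hasOwnProperty avoids matching inherited keys such as 'toString'.
    if (Object.prototype.hasOwnProperty.call(uri_dictionary, url)) {
        return uri_dictionary[url];
    }
    return url;
}

// The URI is only known at runtime (some_variable is hypothetical).
var some_variable = 2;
var composed = 'http://www.domain.com/' + some_variable + '.png';

var known = translateURI(composed);                         // found in the dictionary
var unknown = translateURI('http://www.domain.com/3.png');  // falls through unchanged
```

The open question remains how to make every assignment go through translateURI without editing the page's own scripts.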

Mehran
  • Do you want to do this on your own sites or on others'? – Lorenz Oct 28 '14 at 19:26
  • On my own, for the website I code. But I don't see how that matters. – Mehran Oct 28 '14 at 19:28
  • Can you use PhantomJS? It could run all of the JavaScript, and you could later use the full URLs to download the images. If you are interested, I can provide an example. – Lorenz Oct 28 '14 at 19:30
  • Actually, I do plan on using PhantomJS. But I don't think that's going to solve anything. The simplest case to demonstrate the problem is an image that toggles between two sources when it is clicked, where the new source's URI is composed by JS code. How would PhantomJS help with a problem like that? – Mehran Oct 28 '14 at 19:34

1 Answer

I don't think there is a general solution to this problem. You could (in theory) analyze the code paths and try all possible cases, but that would be "a bit" overkill and probably cause the site to load millions of images into the HTML. You could also try to manually specify the images to load, since it is your own site.

Your approach of intercepting the image load event is also not possible, according to https://stackoverflow.com/a/6974878/2224188:

You can't just "cancel" the loading process of an image (setting the src attribute to an empty string is not a good idea). In fact, even if you could, doing so would only make things worse, as your server would continually send data that would get cancelled, increasing its load factor and slowing it down.

I also don't really like the idea of embedding images into HTML because it circumvents all caching mechanisms and puts more load on your server.

Edit: I hacked some code together. The $("img").removeAttr part seemed to work fine (it aborted the load quickly enough, at least on Chrome); the rest is untested.

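<script type="text/javascript" charset="utf-8">
$(document).ready(function() {
  // Only images that actually have a src, so the indices below stay aligned.
  var images = $("img[src]");
  // Remember the original src of every image, in document order.
  var links = images.map(function() {
    return $(this).attr("src");
  }).get();
  // Drop the src attributes so the pending loads are aborted.
  images.removeAttr("src");
  // Restore each src through translateURI; jQuery passes (index, element).
  images.each(function(index, image) {
    $(image).attr("src", translateURI(links[index]));
  });
});
</script>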

Try putting that in the head of the page.

Lorenz
  • You are absolutely right about the loss of caching and all the other downsides of embedded resources. But I'm working on a scenario where this is not only feasible (the resources are small) but also the only way. In other words, it's impossible for me to keep a webpage in multiple files. – Mehran Oct 28 '14 at 19:47
  • Then my first approach would be to insert the data into the JavaScript instead of the links. If that is not possible either, look at the canvas tag. – Lorenz Oct 28 '14 at 19:50
  • To be frank, I'll be amazed if I can do what I'm asking. But the post you've mentioned is not a veto for my question. I'm not trying to abort a request that has been sent; I'm trying to divert it before it is even sent. That means I'm trying to prevent any request from being sent and instead load the resource from a previously downloaded Data URI string. – Mehran Oct 28 '14 at 19:50
  • Your hack solves the problem for the initial value of `src`, but it won't do any good if the `src` is updated by code later on. – Mehran Oct 28 '14 at 20:21
  • Register a DOM modification handler (http://stackoverflow.com/a/13835369/2224188) and apply the same steps I used above to every change. – Lorenz Oct 28 '14 at 20:27
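
One way the last suggestion could look, as a rough sketch only: it assumes the standard MutationObserver API is the kind of handler meant (the linked answer may describe a different mechanism), and rewriteChangedSources is a hypothetical helper name. The dictionary keeps the question's '...' placeholder in place of a real Data URI. Note that the observer callback runs after the attribute has already changed, so the browser may have started a request for the original URL before the rewrite happens.

```javascript
// Stand-in for the question's dictionary; '...' is a placeholder value.
var uri_dictionary = { 'http://www.domain.com/1.png': '...' };

function translateURI(url) {
    return Object.prototype.hasOwnProperty.call(uri_dictionary, url)
        ? uri_dictionary[url]
        : url;
}

// Hypothetical helper: rewrites the src of every <img> named in a batch
// of mutation records.
function rewriteChangedSources(mutations) {
    mutations.forEach(function (mutation) {
        var target = mutation.target;
        if (mutation.type !== 'attributes' || mutation.attributeName !== 'src') return;
        if (target.tagName !== 'IMG') return;
        var current = target.getAttribute('src');
        if (current === null) return;
        var translated = translateURI(current);
        // Write only when the value changes, so the observer does not
        // loop on its own edits.
        if (translated !== current) target.setAttribute('src', translated);
    });
}

// Attach only in a browser-like environment.
if (typeof MutationObserver !== 'undefined' && typeof document !== 'undefined') {
    new MutationObserver(rewriteChangedSources).observe(document.documentElement, {
        attributes: true,
        attributeFilter: ['src'],
        subtree: true
    });
}
```

This only watches changes to existing elements' src attributes; images inserted later with a src already set would need a childList observer as well.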