
I need to be able to offer replica sites (to www.google.com, www.facebook.com, etc. any site) through my node server. I found this library:

https://github.com/nodejitsu/node-http-proxy

And I used the following code when proxying requests:

options = {
  ignorePath: true,
  changeOrigin: false
}

var proxy = httpProxy.createProxyServer({options});

router.get(function(req, res) {
  proxy.web(req, res, { target: req.body.url });
});

However, this configuration causes an error for most sites. Depending on the site, I'll get an Unknown service error coming from the target url, or an Invalid host... something along those lines. However, when I pass

changeOrigin: true

I get a functioning proxy service, but the user's browser gets redirected to the actual URL of their request, not to mine (so if req.body.url = http://www.google.com, the request goes to http://www.google.com).

How can I make it so my site's url gets shown, but so that I can exactly copy whatever is being displayed? I need to be able to add a few JS files to the request, which I'm doing using another library.

For clarification, here is a summary of the problem:

  1. The user requests a resource that has a url property

  2. This url is in the form of http://www.example.com

  3. My server, running on www.pv.com, needs to be able to direct the user to www.pv.com/http://www.example.com

  4. The HTTP response returned alongside www.pv.com/http://www.example.com is a full representation of http://www.example.com. I need to be able to add my own Javascript/HTML files in this response as well.

serv-inc
db2791
  • In what form does your proxy get incoming requests? If the user wants to open the site `http://example.com/somepage.html`, what do you get? Is it just a simple HTTP GET request to `http://yourproxy.somedomain/?url=http://example.com/somepage.html`? Or what **_exactly_** is it? – SergGr Apr 05 '17 at 15:56
  • @SergGr There is not just one way the user accesses the proxy. One way, however, is that the server fetches an item from a database that contains a `url` field, which holds the URL that will be used in the proxy. The URL is not encoded, so it is in the form of `http://example.com/somepage.html` – db2791 Apr 05 '17 at 16:01
  • Sorry, but if this is not a proxy in the simple sense, why don't you put your real problem into the question? Is there any incoming HTTP request at all? Should it be served with the response from the remote server, or should the remote server's response be post-processed? – SergGr Apr 05 '17 at 16:06
  • @SergGr I hope I've clarified the question with my edit – db2791 Apr 05 '17 at 17:03
  • @db2791 This link can be helpful for you http://stackoverflow.com/questions/42156282/how-to-cluster-node-app-in-multiple-machines – Arpit Kumar Apr 11 '17 at 05:24
  • Do you need to support HTTPS, redirects from the target resource, url rewrite (so clicking on the link will work correctly)? – evgeny.myasishchev Apr 11 '17 at 14:51

3 Answers


Looking at https://stackoverflow.com/a/32704647/1587329, the only difference is that it uses a different target parameter:

var http = require('http');
var httpProxy = require('http-proxy');
var proxy = httpProxy.createProxyServer({});

http.createServer(function(req, res) {
    proxy.web(req, res, { target: 'http://www.google.com' });
}).listen(3000);

This would explain the Invalid host error: you need to pass a host as the target parameter, not the whole URL. Thus, the following might work:

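var httpProxy = require('http-proxy');

var options = {
  ignorePath: true,
  changeOrigin: false
};

// Pass the options object itself; {options} would nest it under an "options" key
var proxy = httpProxy.createProxyServer(options);

// The '/' route path is an assumption; the original call had no path argument
router.get('/', function(req, res) {
  // req.body.url is a string, so parse it before reading protocol and host
  // (URL is a global in Node.js 10 and later)
  var url = new URL(req.body.url);
  proxy.web(req, res, { target: url.protocol + '//' + url.host });
});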

For the URL object, see the NodeJS website.
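
As a minimal sketch of that parsing step, assuming the stored value is an absolute URL string, the WHATWG `URL` class (a global in Node.js 10 and later) can split it into the origin to use as `target` and the remaining path. The helper name `splitTarget` is hypothetical and not part of http-proxy.

```javascript
// Hypothetical helper: split an absolute URL string into the pieces a proxy call needs.
function splitTarget(rawUrl) {
  var url = new URL(rawUrl); // throws a TypeError if rawUrl is not an absolute URL
  return {
    target: url.protocol + '//' + url.host, // scheme plus host[:port]
    path: url.pathname + url.search         // path and query string on the target
  };
}

var parts = splitTarget('http://www.example.com/somepage.html?x=1');
console.log(parts.target); // http://www.example.com
console.log(parts.path);   // /somepage.html?x=1
```

According to the http-proxy README, `ignorePath: true` makes the proxy ignore the path of the incoming request, so with that option the path would have to be carried in `target` itself.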

Community
serv-inc
  • My `req.body.url` is replaceable with `'http://www.google.com'`, but it doesn't change the errors I'm getting. – db2791 Apr 14 '17 at 17:10

Use a headless browser to navigate to the requested site, read its HTML, and send that HTML back as the response. One advantage of a headless browser is that it also captures the HTML of pages rendered with JavaScript. Nightmare.js, a browser automation library built on Electron, is a good choice; Electron is faster than PhantomJS (an alternative). With Nightmare.js you can inject a JavaScript file into the page, as shown in the code snippet below. You may need to tweak the code to add other features. I am currently only allowed to add two links, so links to other resources are in the code snippet.


apt-get update && apt-get install -y xvfb x11-xkb-utils xfonts-100dpi \
  xfonts-75dpi xfonts-scalable xfonts-cyrillic x11-apps clang \
  libdbus-1-dev libgtk2.0-dev libnotify-dev libgnome-keyring-dev \
  libgconf2-dev libasound2-dev libcap-dev libcups2-dev libxtst-dev \
  libxss1 libnss3-dev gcc-multilib g++-multilib

// example: http://hostname.com/http://www.tutorialspoint.com/articles/how-to-configure-and-install-redis-on-ubuntu-linux
//X server: http://www.linfo.org/x_server.html

var express = require('express')
var Nightmare = require('nightmare')// headless browser
var Xvfb = require('xvfb')// run headless browser using X server
var vo = require('vo')// run generator function
var app = express()
var xvfb = new Xvfb()


app.get('/', function (req, res) {
  res.end('')
})

// start the X server to run nightmare.js headless browser
xvfb.start(function (err, xvfbProcess) {
  if (!err) {
    app.get('/*', function (req, res) {
      var run = function * () {
        var nightmare = new Nightmare({
          show: false,
          maxAuthRetries: 10,
          waitTimeout: 100000,
          electronPath: require('electron'),
          ignoreSslErrors: 'true',
          sslProtocol: 'tlsv1'
        })

        var result = yield nightmare.goto(req.url.toString().substring(1))
        .wait()
        // .inject('js', '/path/to/.js') inject a javascript file to manipulate or inject html
        .evaluate(function () {
          return document.documentElement.outerHTML
        })
        .end()
        return result
      }

      // execute generator function
      vo(run)(function (err, result) {
        if (!err) {
          res.end(result)
        } else {
          console.log(err)
          res.status(500).end()
        }
      })
    })
  }
})

app.listen(8080, '0.0.0.0')
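
The snippet above takes the target from `req.url.substring(1)` without checking it. As a hedged sketch, the two helpers below show one way to validate that value and to add a script tag to the returned HTML string on the server side. Both names (`targetFromPath`, `injectScript`) and the `/inject.js` path are hypothetical, and `URL` is assumed to be available as a global (Node.js 10 and later).

```javascript
// Hypothetical helper: turn a request path such as "/http://www.example.com/page"
// into an absolute http(s) URL, or return null if it is anything else.
function targetFromPath(reqUrl) {
  var candidate = reqUrl.substring(1); // drop the leading "/"
  var parsed;
  try {
    parsed = new URL(candidate);
  } catch (err) {
    return null; // not an absolute URL, e.g. "/favicon.ico"
  }
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    return null; // reject file:, javascript:, and other schemes
  }
  return parsed.href;
}

// Hypothetical helper: insert a script tag before the last </body>,
// or append it if the page has no closing body tag.
// src is assumed to be a trusted value; it is not escaped here.
function injectScript(html, src) {
  var tag = '<script src="' + src + '"></script>';
  var idx = html.search(/<\/body\s*>(?![\s\S]*<\/body\s*>)/i);
  if (idx === -1) {
    return html + tag;
  }
  return html.slice(0, idx) + tag + html.slice(idx);
}

console.log(targetFromPath('/http://www.example.com/page?a=1'));
// http://www.example.com/page?a=1
console.log(injectScript('<body><p>hi</p></body>', '/inject.js'));
// <body><p>hi</p><script src="/inject.js"></script></body>
```

In the handler above, a `null` result from `targetFromPath` could be answered with a 400 status instead of starting a browser instance.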
serv-inc
Citrudev

You need to support HTTPS, as most of the websites you mentioned will redirect to the HTTPS version of their site. Perhaps, instead of an HTTP proxy, you would be better off with a SOCKS proxy if you want to provide access to websites from places where they are forbidden or blocked.
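
If the proxy passes such a redirect through unchanged, the browser follows the `Location` header straight to the real site. A minimal sketch of one way to keep the user on the proxy, assuming the `www.pv.com/<full URL>` scheme from the question, is to rewrite the redirect target. The helper name `rewriteLocation` is hypothetical; one place it might be called is http-proxy's `proxyRes` event, but that wiring is not shown and is untested here.

```javascript
// Hypothetical helper: map a redirect target back onto the proxy's own URL space.
function rewriteLocation(location, currentTarget, proxyBase) {
  // Resolve relative redirects (e.g. "/login") against the page being proxied.
  var absolute = new URL(location, currentTarget).href;
  return proxyBase + '/' + absolute;
}

console.log(rewriteLocation('https://www.example.com/', 'http://www.example.com/', 'http://www.pv.com'));
// http://www.pv.com/https://www.example.com/
```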

Pavel P