
I'm trying to get the links to all the articles on one blog ( https://www.mrmoneymustache.com ) so I can compile them into a PDF, but I'm a complete noob in JavaScript. Somebody on Reddit told me to use this code, which is supposed to do what I want:

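const fs = require('fs');
const EventEmitter = require('events').EventEmitter;
const fetch = require('node-fetch'); // needs node-fetch v2 for require(): npm install node-fetch@2
const cheerio = require('cheerio');

const e = new EventEmitter();

e.on('fetchPage', link => {
  fetch(link)
    .then(r => r.text())
    .then(html => cheerio.load(html))
    .then($ => {
      const postTitle = $(".headline").text().trim();
      const postContent = $(".post_content").html();
      console.log(postTitle);
      // Replace characters that are not allowed in file names (e.g. "/" or ":")
      const fileName = postTitle.replace(/[\\/:*?"<>|]/g, '_') + ".html";
      fs.writeFileSync(fileName, postContent || '');
      // Save the current post first, then stop if there is no next post,
      // so the final post is written too
      const nextLink = $(".next_post a").attr('href');
      if (nextLink === undefined) return;
      // Wait 5 seconds between requests to go easy on the server
      setTimeout(() => e.emit('fetchPage', nextLink), 5000);
    })
    .catch(err => console.error('Failed to fetch ' + link + ': ' + err.message));
});

// Placeholder: replace with the URL of the first post you want to save
e.emit('fetchPage', 'https://whatever/post1');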

But I don't really get how I'm supposed to run this program. Help, please?

Anthony

2 Answers


Install Node.js, then run this command in a command shell:

node yourfile.js
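Before running the scraper, it can help to confirm that Node.js can actually find every package the script `require`s. The sketch below is a hypothetical helper (`check-deps.js` and `missingModules` are names made up for illustration) that uses `require.resolve`, which throws when a module cannot be found, to list anything still missing. Save it next to the scraper and run it with `node check-deps.js`.

```javascript
// check-deps.js (hypothetical helper): report which modules the scraper needs
// but Node.js cannot find from this folder.
const REQUIRED = ['fs', 'events', 'node-fetch', 'cheerio'];

function missingModules(names) {
  return names.filter(name => {
    try {
      require.resolve(name); // throws if the module cannot be found
      return false;
    } catch (err) {
      return true;
    }
  });
}

const missing = missingModules(REQUIRED);
if (missing.length === 0) {
  console.log('All modules found; you can run: node yourfile.js');
} else {
  console.log('Missing modules, install them with npm: ' + missing.join(' '));
}

module.exports = { missingModules };
```

Built-in modules such as `fs` and `events` always resolve; `node-fetch` and `cheerio` only resolve once they have been installed with npm in that folder (or a parent folder).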
Luca Kiebel
Sébastien S.
  • I think I'm really close to making it work, but now I get this error: fetch(pageURL).then(r => r.text()).then(cheerio.load).then($ => { ^ ReferenceError: pageURL is not defined – Anthony Apr 22 '18 at 07:48
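The modified code isn't shown, so this is an assumption: the error usually means the body uses `pageURL` while the handler's parameter has a different name (the original snippet calls it `link`). A function body can only refer to the URL by the parameter's own name. The minimal sketch below renames the parameter to match and records the emitted URL instead of fetching it, so it runs without the network (`visited` and the example URL are illustrative only).

```javascript
const EventEmitter = require('events').EventEmitter;

const e = new EventEmitter();
const visited = [];

// The parameter name must match the name used in the body.
// Writing `link => { fetch(pageURL) ... }` throws "ReferenceError: pageURL is not defined".
e.on('fetchPage', pageURL => {
  visited.push(pageURL); // the real scraper would call fetch(pageURL) here
});

e.emit('fetchPage', 'https://example.com/post1'); // emit() runs the handler synchronously
console.log(visited); // [ 'https://example.com/post1' ]
```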

You will have to install Node.js and then install node-fetch and cheerio using npm, the Node package manager. Then run the program with

node thenameoftheprogram.js

There are, however, many scraping tools that can be used online and that have a less steep learning curve. They may be a better match for your problem.
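On the install step: current node-fetch releases (v3) are published as ES modules only, so on many Node.js versions `require('node-fetch')` fails unless v2 is installed (`npm install node-fetch@2`). Node.js 18 and later also ship a built-in global `fetch`. The hypothetical helper below (`getFetch` is a name chosen for illustration) prefers the built-in one and only falls back to node-fetch v2.

```javascript
// Pick a fetch implementation for the scraper.
// Node.js 18+ provides a global fetch; older versions need node-fetch v2,
// which (unlike v3) can still be loaded with require().
function getFetch() {
  if (typeof globalThis.fetch === 'function') {
    return globalThis.fetch;
  }
  return require('node-fetch');
}

module.exports = { getFetch };
```

In the scraper, `const fetch = require('node-fetch');` could then become `const fetchPage = getFetch();`, with `fetchPage(link)` used in place of `fetch(link)`; on Node.js 18+ nothing extra needs to be installed for that part.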

jjmerelo
  • So now I'm trying to run `npm install cheerio` and `npm install node-fetch`, but I get errors in both cases. EDIT: I wasn't in the directory of Cheerio's files; now I get this ... – Anthony Apr 22 '18 at 07:13
  • That's a different problem, and that is why I said that it was better to use a tool with a less steep learning curve. `npm install cheerio` should work out of the box. Please step back for a minute and consider that maybe that program is not the best way to solve your problem, since it's causing additional ones. – jjmerelo Apr 22 '18 at 07:18
  • I would like to avoid learning all of JS just for one script... – Anthony Apr 22 '18 at 07:23
  • That's exactly what I'm saying... You might want to use things like this https://darrennewton.com/2011/10/30/mirror-site-and-convert-to-pdf/ instead. It's only a matter of installing a couple of utilities. – jjmerelo Apr 22 '18 at 07:36
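If you do stay with the Node.js scraper, note that it leaves one `.html` file per post rather than a PDF. A minimal sketch, assuming all the files were written by a single run of the scraper into one folder, is to merge them into one HTML page and use a browser's "Print to PDF". `mergePosts` and `all-posts.html` are hypothetical names; files are ordered by modification time because the scraper writes them one after another, 5 seconds apart.

```javascript
// merge-posts.js (hypothetical helper): combine the per-post .html files
// into a single page that a browser can print to PDF.
const fs = require('fs');
const path = require('path');

// Escape text taken from file names before putting it inside <h1>.
function escapeHtml(text) {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

function mergePosts(dir, outFile) {
  const files = fs.readdirSync(dir)
    .filter(name => name.endsWith('.html') && name !== path.basename(outFile))
    // Oldest first: assumes the files come from one scraper run
    .map(name => ({ name, mtime: fs.statSync(path.join(dir, name)).mtimeMs }))
    .sort((a, b) => a.mtime - b.mtime)
    .map(entry => entry.name);

  const sections = files.map(name => {
    const title = escapeHtml(path.basename(name, '.html'));
    const body = fs.readFileSync(path.join(dir, name), 'utf8');
    return '<h1>' + title + '</h1>\n' + body;
  });

  const html = '<!DOCTYPE html>\n<html><head><meta charset="utf-8"></head><body>\n' +
    sections.join('\n<hr>\n') + '\n</body></html>\n';
  fs.writeFileSync(outFile, html);
  return files.length; // number of posts merged
}

module.exports = { mergePosts };
```

For example, `mergePosts('.', 'all-posts.html')` run in the scraper's folder would produce `all-posts.html`, which can then be opened in a browser and printed to PDF.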