60

I want to create a scraper that:

  1. opens a headless browser,
  2. goes to a URL,
  3. logs in (there is Steam OAuth),
  4. fills in some inputs,
  5. and clicks 2 buttons.

My problem is that every new instance of the headless browser clears my login session, so I have to log in again and again...

How can I save the session across instances? (I'm using Puppeteer with headless Chrome.)

Or how can I open a headless Chrome instance that is already logged in (given that I'm already logged in in my main Chrome window)?
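Roughly, the flow I have in mind looks like this (the URL and selectors such as #username or #first-button are just placeholders for my real ones):

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({ headless: true });
  const page = await browser.newPage();

  await page.goto('https://example.com/login'); // placeholder URL

  // The Steam OAuth login step goes here; with a fresh headless instance
  // I have to repeat it on every single run.
  await page.type('#username', 'my-user');
  await page.type('#password', 'my-password');
  await page.click('#login-button');
  await page.waitForNavigation();

  // Fill some inputs and click 2 buttons.
  await page.type('#some-input', 'some value');
  await page.click('#first-button');
  await page.click('#second-button');

  await browser.close();
})();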

theDavidBarton
Anton Kurtin
  • See also [Puppeteer: how to store a session (including cookies, page state, local storage, etc) and continue later?](https://stackoverflow.com/questions/57987585/puppeteer-how-to-store-a-session-including-cookies-page-state-local-storage#57995750) – ggorlen Apr 26 '21 at 21:39

4 Answers

100

There is an option to persist user data by passing the userDataDir option when launching Puppeteer. This stores the session (cookies, local storage) and the other profile data Chrome keeps between launches.

const puppeteer = require('puppeteer');

const browser = await puppeteer.launch({
  // Reuse the same profile directory on every launch so the session persists
  userDataDir: './user_data'
});

It doesn't go into great detail but here's a link to the docs for it: https://pptr.dev/#?product=Puppeteer&version=v1.6.1&show=api-puppeteerlaunchoptions
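As a sketch of how this plays out across runs (the URL below is a placeholder): the first run performs the login, and because the profile is written to ./user_data, a later script launched with the same userDataDir starts out already authenticated.

const puppeteer = require('puppeteer');

(async () => {
  // Every run points at the same profile directory, so cookies from a
  // previous run (e.g. the Steam OAuth session) are still available.
  const browser = await puppeteer.launch({ userDataDir: './user_data' });
  const page = await browser.newPage();

  await page.goto('https://example.com/dashboard'); // placeholder URL
  // If the saved session is still valid, no login step is needed here.

  await browser.close();
})();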

meatherly
  • This is a better solution; it keeps cookies and local storage – r1si Jan 16 '19 at 09:23
  • This is the easiest way to persist the session, though you might end up storing more data than you need. Just launching a browser with this configuration creates a folder containing ~3 MB of data. If storage is a concern, you might want to consider @EcoVirtual's solution. Otherwise, this is perfect. – Rafael Mejía Mar 30 '19 at 20:17
  • Good answer, but this takes more disk space. Can I save only the cookies into this folder? – Ghyath Darwish Jul 28 '19 at 09:39
  • I used it but it doesn't work, what should I do? – Sinosaurus Oct 17 '19 at 10:41
  • Does this approach store SESSION cookies? Semantically, the session ends when you close the browser, so I could see the argument for not persisting session cookies between puppeteer page instances this way. – jamis0n Feb 02 '20 at 18:56
  • Here's a version-agnostic link to the Puppeteer docs for launch options, since the version update has killed the old link: https://pptr.dev/#?product=Puppeteer&show=api-puppeteerlaunchoptions – Rick Gladwin Aug 16 '20 at 03:28
49

In Puppeteer you have access to the session cookies through page.cookies().

So once you log in, you can grab every cookie and save it to a JSON file:

const fs = require('fs');
const cookiesFilePath = 'cookies.json';

// Save session cookies
const cookiesObject = await page.cookies();
// Write cookies to a temp file to be used in other profile pages
fs.writeFile(cookiesFilePath, JSON.stringify(cookiesObject), function (err) {
  if (err) {
    console.log('The file could not be written.', err);
  } else {
    console.log('Session has been successfully saved');
  }
});

Then, on your next run, right before calling page.goto(), you can use page.setCookie() to load the cookies from the file one by one:

const previousSession = fs.existsSync(cookiesFilePath);
if (previousSession) {
  // If the file exists, load the cookies
  const cookiesString = fs.readFileSync(cookiesFilePath);
  const parsedCookies = JSON.parse(cookiesString);
  if (parsedCookies.length !== 0) {
    for (let cookie of parsedCookies) {
      await page.setCookie(cookie);
    }
    console.log('Session has been loaded in the browser');
  }
}

Check out the docs for page.cookies() and page.setCookie().

EcoVirtual
  • jsonfile does not seem to work when headless: false, the documentation says "Note: this module cannot be used in the browser." – Zeeshan Chawdhary Jan 07 '19 at 21:22
  • fileExistSync is not a valid function... need to use: https://stackoverflow.com/questions/4482686/check-synchronously-if-file-directory-exists-in-node-js – r1si Jan 16 '19 at 08:42
  • Just updated to use Node's "fs" instead of external dependency for writing and reading files. – EcoVirtual Nov 10 '20 at 12:42
19

For a version of the above solution that actually works and doesn't rely on jsonfile (instead using the more standard fs), check this out:

Setup:

const fs = require('fs');
const cookiesPath = "cookies.txt";

Reading the cookies (put this code first):

// If the cookies file exists, read the cookies.
const previousSession = fs.existsSync(cookiesPath)
if (previousSession) {
  const content = fs.readFileSync(cookiesPath);
  const cookiesArr = JSON.parse(content);
  if (cookiesArr.length !== 0) {
    for (let cookie of cookiesArr) {
      await page.setCookie(cookie)
    }
    console.log('Session has been loaded in the browser')
  }
}

Writing the cookies:

// Write Cookies
const cookiesObject = await page.cookies()
fs.writeFileSync(cookiesPath, JSON.stringify(cookiesObject));
console.log('Session has been saved to ' + cookiesPath);
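
Putting the two pieces together, a full run might look roughly like this (the URL is a placeholder, and the login step is only needed when no valid saved session exists yet):

const fs = require('fs');
const puppeteer = require('puppeteer');
const cookiesPath = "cookies.txt";

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Restore cookies from a previous run, if any.
  if (fs.existsSync(cookiesPath)) {
    const cookiesArr = JSON.parse(fs.readFileSync(cookiesPath));
    for (let cookie of cookiesArr) {
      await page.setCookie(cookie);
    }
  }

  await page.goto('https://example.com'); // placeholder URL
  // ...log in here only if the restored session is no longer valid...

  // Save the (possibly refreshed) cookies for the next run.
  fs.writeFileSync(cookiesPath, JSON.stringify(await page.cookies()));

  await browser.close();
})();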
Daniel Porteous
0

For writing cookies (loading them from a file and setting them on the page):

async function writingCookies() {
  // C.cookieFile can be replaced by './filename.json'
  const cookieArray = require(C.cookieFile);
  await page.setCookie(...cookieArray);
  // C.feedUrl can be replaced by 'https://example.com'
  await page.cookies(C.feedUrl);
}

For reading cookies from the page and saving them to a file, you'll need to install jsonfile in your project: npm install jsonfile

const jsonfile = require('jsonfile');

async function getCookies() {
  const cookiesObject = await page.cookies();
  jsonfile.writeFile('linkedinCookies.json', cookiesObject, { spaces: 2 },
    function (err) {
      if (err) {
        console.log('The Cookie file could not be written.', err);
      } else {
        console.log("Cookie file has been successfully saved in the current working directory: '" + process.cwd() + "'");
      }
    });
}

Call these two functions using await and it will work for you.
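
For instance (just a sketch of the call order, assuming page and C are already set up as above):

// Load cookies from the file into the page, then later dump the
// page's current cookies back out to a JSON file.
await writingCookies();
// ...navigate and log in if necessary...
await getCookies();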