3
app.post('/api/auth/check', async (req, res) => {
try {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(
    'https://www.google.com'
  );
  res.json({message: 'Success'})
} catch (e) {
  console.log(e);
  res.status(500).json({ message: 'Error' });
}});

app.post('/api/auth/register', async (req, res) => {
  console.log('register');
  // Here i'm need to transfer the current user session (page and browser) and then perform actions on the same page.
  await page.waitForTimeout(1000);
  await browser.close();
}});

Is it possible to somehow transfer page and browser from one route to another while maintaining puppeteer concurrency. If you set the variable globally, then the page and browser will be overwritten and multitasking will not work.

Vaviloff
  • 13,291
  • 3
  • 39
  • 45
apsenT
  • 33
  • 3
  • Welcome to SO! I'm not sure if the terms "parallelism", "concurrency" and "multitasking" are what you're looking for here. Node is single-threaded, async event-driven. I think you mean you want to maintain individual browser instances across routes without blocking the event loop and I answered under this assumption. Feel free to clarify if your intent is something else. – ggorlen Apr 05 '21 at 00:22

1 Answers1

1

One approach is to create a closure that returns promises that will resolve to the same page and browser instances. Since HTTP is stateless, I assume you have some session/authentication management system that associates a user's session with a Puppeteer browser instance.

I've simplified your routes a bit and added a naive token management system to associate a user with a session in the interests of making a complete, runnable example but I don't think you'll have problems adapting it to your use case.

const express = require("express");
const puppeteer = require("puppeteer");

// https://stackoverflow.com/questions/51391080/handling-errors-in-express-async-middleware 
const asyncHandler = fn => (req, res, next) =>
  Promise.resolve(fn(req, res, next)).catch(next)
;
const startPuppeteerSession = async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  return {browser, page};
};
const sessions = {};

express()
  .use((req, res, next) => 
    req.query.token === undefined ? res.sendStatus(401) : next()
  )
  .get("/start", asyncHandler(async (req, res) => {
    sessions[req.query.token] = await startPuppeteerSession();
    res.sendStatus(200);
  }))
  .get("/navigate", asyncHandler(async (req, res) => {
    const page = await sessions[req.query.token].page;
    await page.goto(req.query.to || "http://www.example.com");
    res.sendStatus(200);
  }))
  .get("/content", asyncHandler(async (req, res) => {
    const page = await sessions[req.query.token].page;
    res.send(await page.content()); 
  }))
  .get("/kill", asyncHandler(async (req, res) => {
    const browser = await sessions[req.query.token].browser;
    await browser.close();
    delete sessions[req.query.token];
    res.sendStatus(200);
  }))
  .use((err, req, res, next) => res.sendStatus(500))
  .listen(8000, () => console.log("listening on port 8000"))
;

Sample usage from the client's perspective:

$ curl localhost:8000/start?token=1
OK
$ curl 'localhost:8000/navigate?to=https://stackoverflow.com/questions/66935883&token=1'
OK
$ curl localhost:8000/content?token=1 | grep 'apsenT'
        <a href="/users/15547056/apsent">apsenT</a><span class="d-none" itemprop="name">apsenT</span>
            <a href="/users/15547056/apsent">apsenT</a> is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                        <a href="/users/15547056/apsent">apsenT</a> is a new contributor. Be nice, and check out our <a href="/conduct">Code of Conduct</a>.
$ curl localhost:8000/kill?token=1
OK

You can see the client associated with token 1 has persisted a single browser session across multiple routes. Other clients can launch browser sessions and manipulate them simultaneously.

To reiterate, this is only a proof-of-concept of sharing a Puppeteer browser instance across routes. Using the code above, a user can just spam the start route and create browsers until the server crashes, so this is totally unfit for production without real authentication and session management/error handling.

Packages used: express ^4.17.1, puppeteer ^8.0.0.

ggorlen
  • 26,337
  • 5
  • 34
  • 50