
We are currently looking into CI/CD with our team for our website. We recently also adopted a monorepo structure, as this makes our dependencies and overall overview a lot easier to manage. Testing etc. is ready for the CI, but I'm now onto the deployment. I would like to create docker images of the needed packages.

Things I considered:

1) Pull the full monorepo into the docker project. But running a yarn install in our project results in a total project size of about 700MB, and this is mainly due to our react native app, which shouldn't even have a docker image. It would also mean a long image pull time every time we deploy a new release. (A naive Dockerfile for this approach is sketched after this list.)

2) Bundle my projects in some way. Our frontend has a working setup, so that should be ok. But I just tried to add webpack to our express api and ended up with an error inside my bundle due to this issue: https://github.com/mapbox/node-pre-gyp/issues/308

3) I tried running yarn install only inside the needed project, but because of workspace hoisting this still installs the node_modules for all my projects.

4) Run the npm package pkg. This results in a single executable file ready to run on a certain system with a certain node version. This DOES work, but I'm not sure how well it handles errors and crashes.

5) Another solution could be copying the project out of the workspace and running a yarn install on it over there. The issue with this is that the benefit of yarn workspaces (implicitly linked dependencies) is as good as gone; I would have to add my other workspace dependencies explicitly. A possibility is referencing them by a certain commit hash, which I'm going to test right now. (EDIT: it seems you can't reference a subdirectory of a repository as a yarn package.)

6) ???
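
For reference, a naive Dockerfile for option 1 would look roughly like this; the base image, paths and entry point are placeholders, not our actual setup:

FROM node:8-alpine

WORKDIR /app

# Copy the whole monorepo, react native app included
COPY . .

# Installs node_modules for every workspace (~700MB in our case)
RUN yarn install --production --pure-lockfile --non-interactive

# Entry point is a placeholder
CMD ["node", "services/api/src/index.js"]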

I'd like to know if I'm missing an option to have only the needed node_modules for a certain project so I can keep my docker images small.

33Fraise33
  • have you found a solution to this? I am working on a similar project. – Peter Sep 05 '18 at 16:42
  • This is not going to be a problem if you publish your packages to npm; you should not depend directly on the package on disk during deployment, but on the one published to the registry. The automatic linking yarn does should only be used during development. If you keep this in mind you are going to have no problems with a normal deployment where you just copy the service directory to the docker image and install the deps there. – jonathancardoso Dec 04 '19 at 15:09

3 Answers


I've worked on a project following a structure similar to yours; it looked like this:

project
├── package.json
├── packages
│   ├── package1
│   │   ├── package.json
│   │   └── src
│   ├── package2
│   │   ├── package.json
│   │   └── src
│   └── package3
│       ├── package.json
│       └── src
├── services
│   ├── service1
│   │   ├── Dockerfile
│   │   ├── package.json
│   │   └── src
│   └── service2
│       ├── Dockerfile
│       ├── package.json
│       └── src
└── yarn.lock

The services/ folder contains one service per sub-folder. Every service is written in node.js and has its own package.json and Dockerfile. They are typically web servers or REST APIs based on Express.

The packages/ folder contains all the packages that are not services, typically internal libraries.

A service can depend on one or more packages, but not on another service. A package can depend on another package, but not on a service.

The main package.json (the one at the project root) only contains some devDependencies, such as eslint, the test runner, etc.
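
A root package.json for this layout could look roughly like the following; the exact devDependencies and version ranges are just illustrative:

{
  "name": "project",
  "private": true,
  "workspaces": [
    "packages/*",
    "services/*"
  ],
  "devDependencies": {
    "eslint": "^5.6.0",
    "jest": "^23.6.0"
  }
}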

An individual Dockerfile looks like this, assuming service1 depends on both package1 & package3:

FROM node:8.12.0-alpine AS base

WORKDIR /project

FROM base AS dependencies

# We only copy the dependencies we need
COPY packages/package1 packages/package1
COPY packages/package3 packages/package3

COPY services/service1 services/service1

# The global package.json only contains build dependencies
COPY package.json .

COPY yarn.lock .

RUN yarn install --production --pure-lockfile --non-interactive --cache-folder ./ycache; rm -rf ./ycache

The actual Dockerfiles I used were more complicated, as they had to build the sub-packages, run the tests, etc., but you should get the idea from this sample.
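
As a rough illustration, a final runtime stage for such a Dockerfile could look like this; the entry point is an assumption and depends on how the service is started:

FROM base AS release

# Reuse everything prepared in the dependencies stage
COPY --from=dependencies /project /project

# Entry point is an assumption, adapt it to the service
CMD ["node", "services/service1/src/index.js"]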

As you can see, the trick is to copy only the packages that are needed by a specific service. The yarn.lock file contains the list of package@version with the exact resolved versions and dependencies. Copying it without all the sub-packages is not a problem; yarn will use the versions resolved there when installing the dependencies of the included packages.

In your case the react-native project will never be part of any Dockerfile, since none of the services depend on it, which saves a lot of space.

For the sake of conciseness I omitted a lot of details in this answer; feel free to ask for clarification in the comments if something isn't clear.

Anthony Garcia-Labiad
    How does `COPY packages/package1 packages/package1` work if the Dockerfile is located inside the service1 directory? Isn't it `COPY ../../packages/package1 packages/package1`? – HenningCash Nov 20 '18 at 11:56
    It's because I was using a build command such as `docker build -f ./services/service1/Dockerfile .` that sets the context to the current directory (the project root in this case) with the Dockerfile of service1. – Anthony Garcia-Labiad Nov 20 '18 at 22:59
    I really wish there was a way to not have to copy the packages in and just let webpack handle installing the dependencies. Is this possible? – Travis Tubbs Feb 12 '19 at 16:28
    The downside of this approach is that you have to define your dependencies twice; once in your service's `package.json` and once in your `Dockerfile`. – Nepoxx Nov 04 '20 at 20:06
  • You can auto generate parts of `Dockerfile`s in precommit hook/ci with info from `package.json` files. – hexagoncode Jan 30 '21 at 20:47
  • How does using typescript change things? When you compile, it creates a directory with the js files ... – Nathan H Apr 25 '21 at 15:04

We moved our backend services to a monorepo recently, and this was one of the few points we had to solve. Yarn doesn't offer anything that would help us in this regard, so we had to look elsewhere.

First we tried @zeit/ncc. There were some issues, but eventually we managed to get the final builds working. It produces one big file that includes all your code and all your dependencies' code. It looked great: I only had to copy a few files (js, source maps, static assets) to the docker image, the images were much, much smaller, and the app worked. BUT the runtime memory consumption grew a lot: instead of ~70MB, the running container consumed ~250MB. Not sure if we did something wrong, but I haven't found any solution and there's only one issue mentioning this. I guess Node.js parses and loads all the code from the bundle even though most of it is never used.
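
For reference, producing such a bundle with ncc is roughly a one-liner; the entry file path is an assumption:

# Bundles the service and all its dependencies into dist/index.js
npx @zeit/ncc build src/index.js -o dist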

All we needed was to separate each package's production dependencies in order to build a slim docker image. It turned out not to be so simple, but we found a tool after all.

We're now using fleggal/monopack. It bundles our code with Webpack and transpiles it with Babel. So it also produces a single-file bundle, but it doesn't contain all the dependencies, just our code. That step is something we didn't really need, but we don't mind that it's there. For us the important part is that monopack copies only the package's production dependency tree to the dist/bundled node_modules. That's exactly what we needed. Docker images are now 100MB-150MB instead of 700MB.

There's one easier option. If you have only a few really big npm modules in your node_modules, you can use nohoist in your root package.json. That way yarn keeps those modules in each package's local node_modules and they don't have to be copied into the Docker images of all the other services.

e.g.:

"nohoist": [
  "**/puppeteer",
  "**/puppeteer/**",
  "**/aws-sdk",
  "**/aws-sdk/**"
]
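
For context, the nohoist list lives under the workspaces field of the root package.json; the package globs below are assumptions about your layout:

{
  "private": true,
  "workspaces": {
    "packages": ["packages/*", "services/*"],
    "nohoist": [
      "**/puppeteer",
      "**/puppeteer/**",
      "**/aws-sdk",
      "**/aws-sdk/**"
    ]
  }
}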
Pavel

After a lot of trial and error, I've found that careful use of the .dockerignore file is a great way to control the contents of your final image. This works great in a monorepo for excluding the "other" packages.

For each package, we have a similarly named dockerignore file that replaces the live .dockerignore file just before the build.

e.g., cp admin.dockerignore .dockerignore
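
Put together, the build step for one package might look something like this; the Dockerfile location and image tag are placeholders:

# Swap in the package-specific ignore rules, then build from the repo root
cp admin.dockerignore .dockerignore
docker build -f packages/admin/Dockerfile -t myorg/admin:latest .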

Below is an example of admin.dockerignore. Note the * at the top of the file, which means "ignore everything". The ! prefix means "don't ignore", i.e. retain. The combination means ignore everything except the explicitly listed files.

*
# Build specific keep
!packages/admin

# Common Keep
!*.json
!yarn.lock
!.yarnrc
!packages/common

**/.circleci
**/.editorconfig
**/.dockerignore
**/.git
**/.DS_Store
**/.vscode
**/node_modules
Simeon