Containerized e2e is the difference between a green build and a four-hour debugging session over Slack DM. "Works on my machine" stops being funny the third time it happens. This guide gives you a battle-tested Dockerfile and docker-compose.yml for running Cypress and Playwright in CI, plus the YoBox plumbing that keeps disposable inboxes and webhook receivers out of your container image.
The Docker Builder tool scaffolds the baseline; this article explains the why behind every line.
What good looks like
A solid e2e container has five properties:
Deterministic — same image, same result, this year and next.
Cached — node_modules and browser binaries don't re-download every run.
Shardable — N replicas, N× faster, no shared state.
Small enough — under ~1.5 GB so pulls don't dominate wall-clock time.
Observable — traces, screenshots, and videos survive the run.
YoBox helps with property 3 by giving every shard its own disposable inbox and webhook URL over plain HTTP. No mail server in the compose file, no tunnel sidecar.
Base image choice
Both Playwright and Cypress publish official images. They are large but they save you from chasing missing Chromium libs at 1 AM. Use them.
Playwright
FROM mcr.microsoft.com/playwright:v1.49.0-jammy
Cypress
FROM cypress/included:14.0.0
If you need both in one image (rare, but useful for golden tests), start from the Playwright image and npm i cypress on top — Cypress brings its own Electron, Playwright brings its own browsers, and they coexist fine.
A production Dockerfile
```dockerfile
FROM mcr.microsoft.com/playwright:v1.49.0-jammy

WORKDIR /app

# 1. Dependencies: separate layer for cache reuse
COPY package.json package-lock.json ./
RUN npm ci --no-audit --no-fund

# 2. Browsers (already installed in the base image; uncomment if pinning)
# RUN npx playwright install --with-deps

# 3. Source
COPY . .

# 4. Default env
ENV CI=1 \
    YOBOX=https://yobox.dev/api \
    PWDEBUG=0

ENTRYPOINT ["npx", "playwright", "test"]
```
Key choices:
npm ci not npm install — reproducible installs.
Source copied after deps so a one-line spec change doesn't blow the cache.
YOBOX baked in so tests have a sane default but can be overridden per environment.
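One way the tests might read that baked-in default while still honoring a per-environment override is sketched here. `resolveConfig` is a hypothetical helper, not part of YoBox or Playwright; the `YOBOX` and `BASE_URL` variable names come from this guide, and the fallback values are assumptions.

```javascript
// Hypothetical helper: resolve e2e settings from the environment.
// The defaults mirror the values used in this guide; anything set via
// `docker run -e` or compose `environment:` takes precedence.
const DEFAULTS = {
  YOBOX: "https://yobox.dev/api",
  BASE_URL: "http://app:3000", // assumed default, taken from the compose example
};

function resolveConfig(env = process.env) {
  const pick = (name) => {
    const value = env[name];
    // Treat unset and empty-string variables the same way.
    return value && value.trim() !== "" ? value.trim() : DEFAULTS[name];
  };
  return {
    // Strip trailing slashes so `yobox + "/mail/new"` never doubles up.
    yobox: pick("YOBOX").replace(/\/+$/, ""),
    baseUrl: pick("BASE_URL").replace(/\/+$/, ""),
  };
}
```

Calling `resolveConfig()` once in a shared fixture keeps the override logic in one place instead of scattering `process.env` reads across specs.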
docker-compose for local parity
```yaml
services:
  app:
    build: ./web
    ports: ["3000:3000"]

  e2e:
    build: ./tests
    environment:
      - BASE_URL=http://app:3000
      - YOBOX=https://yobox.dev/api
    depends_on:
      - app
    command: ["--shard=1/1"]
```
That's it. No mail server. No webhook tunnel. The tests reach YoBox over the public internet, which is exactly what CI does too, so a local run exercises the same image and the same network path as CI.
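One caveat with the short form of `depends_on`: it waits for the app container to start, not for the server inside it to accept requests. A small polling helper, sketched here, is one way to close that gap. `waitForApp` and its option names are hypothetical; it could be called from a Playwright global setup before the first test runs.

```javascript
// Sketch: poll a URL until the app answers, so tests don't race startup.
// `fetchImpl` is injectable so the helper can be exercised without a network.
async function waitForApp(
  url,
  { attempts = 30, delayMs = 1000, fetchImpl = globalThis.fetch } = {}
) {
  for (let i = 1; i <= attempts; i++) {
    try {
      const response = await fetchImpl(url);
      if (response.ok) return i; // number of attempts it took
    } catch {
      // Connection refused while the server is still booting: retry.
    }
    if (i < attempts) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error(`${url} not ready after ${attempts} attempts`);
}
```

A call such as `await waitForApp(process.env.BASE_URL)` at the start of the run fails fast with a clear message instead of producing a wall of connection errors.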
Sharding in CI
GitHub Actions, four shards:
jobs:
  e2e:
    strategy:
      matrix: { shard: ["1/4", "2/4", "3/4", "4/4"] }
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: docker compose build e2e
      - run: docker compose run --rm e2e --shard=${{ matrix.shard }}
Because every test asks YoBox for its own inbox and its own webhook URL, the shards never collide.
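To make the `--shard=k/n` argument concrete, here is a small illustration of splitting a list of spec files into n disjoint slices. It is not Playwright's actual algorithm, only a sketch of the idea; `parseShard` and `filesForShard` are hypothetical names.

```javascript
// Illustrative only: Playwright performs its own sharding internally.
function parseShard(arg) {
  const match = /^(\d+)\/(\d+)$/.exec(arg);
  if (!match) throw new Error(`Invalid shard "${arg}", expected "k/n"`);
  const index = Number(match[1]);
  const total = Number(match[2]);
  if (total < 1 || index < 1 || index > total) {
    throw new Error(`Shard index out of range in "${arg}"`);
  }
  return { index, total };
}

function filesForShard(files, arg) {
  const { index, total } = parseShard(arg);
  // Sort first so every shard sees the same order, then deal round-robin.
  return [...files].sort().filter((_, i) => i % total === index - 1);
}
```

Because every file lands in exactly one slice, the shards share no test state; the per-test inbox and webhook URL remove the remaining shared resource.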
Cache strategy
Image pulls are the single biggest cost in containerized e2e. Two tricks:
- uses: docker/setup-buildx-action@v3
- uses: docker/build-push-action@v6
  with:
    context: ./tests
    cache-from: type=gha
    cache-to: type=gha,mode=max
    load: true
    tags: e2e:latest

The GitHub Actions cache backend (type=gha) can turn a cold 4-minute build into a 20-second warm build.
Trace and video artifacts
Mount an output volume and upload on failure:
- run: docker compose run --rm -v "$PWD/test-results:/app/test-results" e2e
- if: failure()
  uses: actions/upload-artifact@v4
  with:
    # job-index avoids the "/" in matrix.shard, which artifact names reject
    name: traces-${{ strategy.job-index }}
    path: test-results

Pair Playwright's trace: "on-first-retry" with this and every flaky failure ships you a viewable trace.
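A minimal Playwright config that lines up with the `test-results` volume mount might look like this. The option names (`retries`, `outputDir`, `use.trace`, `use.video`, `use.screenshot`) are standard Playwright settings; the specific values, and reading `BASE_URL` from the environment, are assumptions for this sketch.

```javascript
// playwright.config.js (sketch): a plain object export, no imports needed.
const isCI = Boolean(process.env.CI);

const config = {
  // One retry in CI so "on-first-retry" has a retry to record a trace on.
  retries: isCI ? 1 : 0,
  // Relative to this file; with WORKDIR /app this is /app/test-results,
  // the same path the compose run mounts from the host.
  outputDir: "test-results",
  use: {
    baseURL: process.env.BASE_URL || "http://localhost:3000", // assumed fallback
    trace: "on-first-retry",
    video: "retain-on-failure",
    screenshot: "only-on-failure",
  },
};

module.exports = config;
```

Keeping artifacts limited to retries and failures means green runs write almost nothing to the mounted directory.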
Image size: what actually matters
| Layer | Size | Notes |
| --- | --- | --- |
| Playwright base | ~1.2 GB | Includes Chromium, Firefox, WebKit. |
| Cypress base | ~1.4 GB | Includes Electron + Xvfb. |
| node_modules | 200–500 MB | Cacheable separately. |
| Source | <10 MB | Negligible. |
Stripping browsers you don't use saves more than micro-optimizing layers. Pick one browser engine per suite when you can.
Pairing with YoBox
Inside the container, every helper is a one-liner:
const inbox = await fetch(process.env.YOBOX + "/mail/new", { method: "POST" }).then(r => r.json());
That's all the integration code you need. The Cypress guide and Playwright guide cover the full fixture patterns.
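If you want the one-liner to fail loudly rather than hand a test an error payload, a slightly longer variant is sketched here. Only the `POST /mail/new` call comes from this guide; `createInbox`, its options, and the timeout value are hypothetical, and the response body is returned as-is because its shape is not specified here.

```javascript
// Sketch: create a fresh inbox per test, surfacing HTTP errors and stalls.
// `fetchImpl` is injectable so the helper can be unit-tested offline.
async function createInbox({
  baseUrl = process.env.YOBOX,
  fetchImpl = globalThis.fetch,
  timeoutMs = 10000,
} = {}) {
  if (!baseUrl) throw new Error("YOBOX is not set");
  const response = await fetchImpl(`${baseUrl}/mail/new`, {
    method: "POST",
    // Abort instead of hanging the shard if the network stalls.
    signal: AbortSignal.timeout(timeoutMs),
  });
  if (!response.ok) {
    throw new Error(`YoBox returned HTTP ${response.status}`);
  }
  // Response shape is not assumed; callers read the fields they need.
  return response.json();
}
```

In a spec this reads as `const inbox = await createInbox();`, the same call shape as the one-liner with an error path added.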
Common pitfalls
npm install in CI. Always npm ci. Always.
--ipc=host missing for Chromium. Without it, Chromium can run out of shared memory and crash under load. Use docker run --ipc=host ..., or set ipc: host on the service in compose.
Mounting the host node_modules. Don't. Native modules differ between host and container.
No browser pinning. Tag the Playwright base image with an exact version. latest will betray you.
Skipping retries. Set retries: 1 in CI to absorb single-request blips without masking real bugs.
FAQ
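Can I run this on ARM (M-series Macs)? Yes. Both Playwright and Cypress publish multi-arch images, though the set of bundled browsers can differ on arm64, so check the image notes for the tag you pin.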
How do I avoid pulling the image every run? Use a self-hosted runner with a Docker volume, or the GitHub Actions build cache (type=gha) so unchanged layers are restored instead of rebuilt.
Should I bake node_modules into the image? Yes for CI, no for local dev where you bind-mount source.
Where do I store test reports? Upload as an artifact and link from the PR. Don't commit them.
Conclusion
Containerized e2e is non-negotiable for any team running tests across more than one machine. The recipe above — official base image, npm ci, sharded compose, GHA cache, YoBox-backed inboxes and webhooks — gets you to a green pipeline in an afternoon. Generate your starting Dockerfile from the Docker Builder and customize from there.
See also: The Only docker-compose.yml Pattern You Need, Cypress + YoBox, Playwright + YoBox.
Multi-stage builds for smaller images
For self-hosted runners with bandwidth caps, a multi-stage build keeps only the runtime needed for tests:
```dockerfile
FROM mcr.microsoft.com/playwright:v1.49.0-jammy AS deps
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci

FROM mcr.microsoft.com/playwright:v1.49.0-jammy
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY . .
ENTRYPOINT ["npx", "playwright", "test"]
```
Pinning vs floating tags
`v1.49.0-jammy` is reproducible across years; `latest` is reproducible for about 12 hours. Pin in CI, float in personal sandboxes.
GHA cache versus self-hosted
The GitHub Actions cache is fast but capped per repo. Self-hosted runners with a persistent Docker volume win above ~50 e2e jobs per day. Below that, GHA cache is simpler.
Migration tips
Most teams adopting containerized e2e do it after they've outgrown a single CI machine. The migration order that works: containerize the test runner first, then add sharding, then move to a self-hosted runner pool once cache pressure shows up.