
How npm, Yarn, and pnpm Store Dependencies
As front-end developers, package managers (npm, Yarn, pnpm) are essential tools in our daily work. When we run the install command, how are thousands of dependency packages downloaded, organized, and stored?
Understanding their underlying storage principles not only helps us solve real-world problems like phantom dependencies, disk space usage, and slow installation speeds, but is also a necessary step towards more efficient and stable front-end engineering.
This article will deeply analyze the storage strategies and the technical philosophies behind the three major tools: npm, Yarn, and pnpm.
In npm v2 and earlier versions, dependencies were stored in a fully nested way.
- Storage Structure: If package A depends on B, and B depends on C, the file structure strictly follows the dependency tree:
node_modules āāā A āāā node_modules āāā B āāā node_modules āāā C - Problems:
- Wasted Disk Space: Different versions of the same library were downloaded and stored multiple times.
- Long Paths: This "nested hell" could easily hit path length limits, especially on Windows systems.
To solve the "nested hell," npm v3 introduced a flattening mechanism called Hoisting. The classic Yarn (v1) used a similar approach.
- Core Mechanism: The installer tries to "hoist" (lift) as many modules as possible from the dependency tree into the project's root
node_modulesdirectory. - Storage Structure: Dependencies with version conflicts are nested inside their parent's
node_modules. Dependencies without conflicts are hoisted to the root. - Problems: Phantom Dependencies and Doppelgangers.
- Phantom Dependencies: Your code can accidentally use indirect dependencies (dependencies of your dependencies) that are hoisted but not listed in your
package.json. This creates a mismatch between declared and actual dependencies. - Doppelgangers: Even with flattening, the storage model is still based on file copying. Dependencies that cannot be hoisted remain as full, separate copies on disk.
- Phantom Dependencies: Your code can accidentally use indirect dependencies (dependencies of your dependencies) that are hoisted but not listed in your
With the release of Yarn v2 (Berry), Yarn introduced Plug'n'Play (PnP), a radical dependency management strategy that completely challenges the traditional node_modules structure.
- Storage Mechanism: Yarn PnP completely removes the traditional
node_modulesdirectory. Dependency packages are no longer extracted into your project folder. - Storage Location: Dependencies are downloaded as Zip files and stored in Yarn's global cache or the project's
.yarn/cachedirectory. - Benefits:
- Very Fast Installation: Avoids the slow file I/O operations of extracting files and building the
node_modulestree. - No Phantom Dependencies: Completely eliminates phantom dependencies because there is no flat
node_modulesfolder for non-declared dependencies to be accessed from.
- Very Fast Installation: Avoids the slow file I/O operations of extracting files and building the
- Core Idea: PnP takes over Node.js's standard module resolution (
require/import) by generating a special.pnp.cjsfile. - How it Works: This
.pnp.cjsfile is a large JavaScript file containing a complex map. It precisely records each package's name, version, and the exact location of its files within the Zip cache. - Module Lookup: When your code tries to import a module, Node.js first loads
.pnp.cjs. This file then uses the map to directly locate the file inside the dependency's Zip file, enabling isolated access.
Summary: Yarn PnP achieves near-zero installation time (if dependencies are already downloaded) and strict isolation. However, it works at the software level (by taking over Node.js) rather than the file system level, and requires specific configuration support from other tools (like Webpack, TypeScript).
pnpm (Performant npm) changed the storage model for package managers. It borrows the content-addressable idea from systems like Git, achieving true file sharing at a global level and strict dependency isolation.
pnpm's first key innovation is a global content store, which is the foundation of its efficiency and disk space savings.
- Storage Location: pnpm maintains a single storage directory called the Store in a global location on your disk (usually
~/.pnpm-store). - Storage Mechanism: When pnpm downloads a package, it generates a unique hash based on the package's content, version, and metadata. The Store's internal structure is organized using these hash values.
- For example, files for
lodash@4.17.21might be stored in a path like~/.pnpm-store/v3/files/c2b53c...d81a2f.
- For example, files for
- Result:
- Maximum Deduplication: No matter how many projects or versions depend on the same package, the actual files are stored on disk only once. This is true deduplication and space saving.
- Immutability: Hash-based storage ensures files cannot be changed once stored, improving reliability.
Instead of copying files into the project's node_modules, pnpm uses file system linking for efficient referencing and strict isolation.
š¹ First Layer: Physical Storage (Hard Links)
Hard links are the key technology for saving space.
- Operation: When you run
pnpm install, it creates hard links inside the project'snode_modulesthat point to the actual files in the global Store (~/.pnpm-store). - Advantage of Hard Links: They are file system "aliases" pointing to the same physical disk space (inode). Creating a hard link uses almost no extra disk space, and access speed is the same as accessing the original file.
š”ļø Second Layer: Virtual Store Directory
The structure of the node_modules folder in a pnpm project is key to its strict isolation.
- Virtual Store Directory (
.pnpm): There is a special.pnpmdirectory inside the rootnode_modules. This "virtual store" reflects pnpm's precise dependency graph and contains a flat list of all dependencies (both direct and indirect). - Naming Rule: Directory names include the package name, version, and a hash of its peer dependencies, for example:
The files inside this directory are hard links pointing to the actual files in the global Store.
node_modules/.pnpm/lodash@4.17.21/node_modules/lodash
š Third Layer: Dependency Isolation (Symbolic Links)
Symbolic links are the key technology for dependency isolation.
- Building the Tree: Inside each package directory in the virtual store (e.g.,
.../lodash/node_modules/lodash), symbolic links point to the exact locations of its dependencies within the.pnpmdirectory. - Project Root
node_modules: This directory contains only symbolic links for packages you directly declared in yourpackage.json. These links point to the corresponding entries in the.pnpmvirtual directory.
This layered structure perfectly solves the problems of traditional package managers:
| Layer | Content / Method | Solves |
|---|---|---|
| Store (Global) | Actual files, stored by content hash, shared across projects. | Wasted disk space (global deduplication). |
Virtual Dir (.pnpm) | Contains hard links to all dependencies, building strict relationships. | File redundancy (via hard links to the Store). |
Project node_modules | Contains only symbolic links to direct dependencies, pointing to .pnpm. | Phantom dependencies (forced isolation). |
When Node.js tries to resolve a package:
- It first looks in the project root
node_modules. - If found, it's a symbolic link. Node.js follows it to the
.pnpmvirtual directory. - Inside the
.pnpmdirectory, Node.js only sees the dependencies that this package directly declared. Non-declared dependencies are hidden by the unique symbolic link structure, achieving strict isolation.
Conclusion: pnpm's storage model is an engineering masterpiece. It uses hard links at the physical storage layer for maximum deduplication and symbolic links at the logical access layer for strict dependency isolation.
| Feature | npm (v3+) & Yarn (v1) | Yarn PnP (v2+) | pnpm |
|---|---|---|---|
| Storage Model | File copying/moving based on project node_modules | Zip cache-based, no node_modules | Global content store + project hard links |
| Disk Usage | Saves some repeats inside project (Hoisting), but global redundancy exists. | Uses Zip cache. Fast install, but needs tooling support. | Global deduplication. Files stored once. Saves huge space. |
node_modules | Mostly flat, but can be unpredictable. | No node_modules. Uses .pnp.cjs map. | Strict symbolic link structure. Forces isolation. |
| File Operations | Lots of file copying, moving, deleting. | Downloads Zip files. Almost no project file ops. | Mainly file system linking. Very I/O efficient. |
| Speed | Slower. | Very Fast (zero install if cached, needs first download). | Very Fast (especially on reinstall or switching projects). |
| Dependency Certainty | Risk of phantom dependencies. | Strict isolation. No phantom dependencies. | Strict isolation. No phantom dependencies. |
For those seeking efficiency and stability, especially in complex environments with multiple projects and dependency versions, both pnpm and Yarn PnP offer better solutions than traditional Hoisting. pnpm's advantage lies in its file-system-based universality and extreme disk space efficiency, making it a top choice for modern engineering.