Conversation

Review requested.

The notable-change label has been added. Please suggest text for the release notes if you'd like to include a more detailed summary, then update the PR description with the text or a link to the suggested-text comment. Otherwise, the commit will be placed in the Other Notable Changes section. |

Nice! This is a great addition. Since it's such a large PR, this will take me some time to review. I will try to tackle it over the next week. |
```js
 */
existsSync(path) {
  // Prepend prefix to path for VFS lookup
  const fullPath = this.#prefix + (StringPrototypeStartsWith(path, '/') ? path : '/' + path);
```

```js
  validateObject(files, 'options.files');
}
```
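The quoted hunk prepends the mount prefix before looking the path up in the VFS. Outside core (where the `StringPrototypeStartsWith` primordial isn't available), the same normalization is just a plain method call; a minimal stand-alone sketch:

```javascript
'use strict';

// Plain-JS equivalent of the prefix join in the quoted existsSync hunk;
// core uses the StringPrototypeStartsWith primordial instead of the
// String method to guard against prototype tampering.
function toVfsPath(prefix, p) {
  return prefix + (p.startsWith('/') ? p : '/' + p);
}

console.log(toVfsPath('/virtual', 'config.json'));  // '/virtual/config.json'
console.log(toVfsPath('/virtual', '/config.json')); // '/virtual/config.json'
```

Either way, relative and absolute inputs land under the same mount prefix.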
```js
const { VirtualFileSystem } = require('internal/vfs/virtual_fs');
```

Shouldn't we import this at the top level / lazy-load it at the top level?
```js
ArrayPrototypePush(this.#mocks, {
  __proto__: null,
  ctx,
  restore: restoreFS,
```

Suggested change:

```diff
-  restore: restoreFS,
+  restore: ctx.restore,
```

nit
lib/internal/vfs/entries.js
Outdated
```js
 * @param {object} [options] Optional configuration
 */
addFile(name, content, options) {
  const path = this._directory.path + '/' + name;
```
lib/internal/vfs/virtual_fs.js
Outdated
```js
let entry = current.getEntry(segment);
if (!entry) {
  // Auto-create parent directory
  const dirPath = '/' + segments.slice(0, i + 1).join('/');
```
lib/internal/vfs/virtual_fs.js
Outdated
```js
let entry = current.getEntry(segment);
if (!entry) {
  // Auto-create parent directory
  const parentPath = '/' + segments.slice(0, i + 1).join('/');
```
lib/internal/vfs/virtual_fs.js
Outdated
```js
    }
  }
  callback(null, content);
}).catch((err) => {
```

Suggested change:

```diff
-}).catch((err) => {
+}, (err) => {
```
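The suggested change matters because `.then(onFulfilled).catch(onRejected)` also routes errors thrown *inside* the fulfillment handler (here, a throwing `callback` invocation) to the rejection handler, which can invoke the callback twice. Passing the rejection handler as the second argument to `.then()` scopes it to the original promise only. A small illustration:

```javascript
'use strict';

const calls = [];

// Variant 1: .catch() also sees errors thrown by the fulfillment handler,
// so both handlers end up running for a single resolved promise.
Promise.resolve('data')
  .then((content) => {
    calls.push('ok1');
    throw new Error('thrown in callback');
  })
  .catch((err) => calls.push('catch1: ' + err.message));

// Variant 2: the second .then() argument only handles rejection of the
// original promise, never errors thrown by the fulfillment handler.
Promise.resolve('data')
  .then((content) => {
    calls.push('ok2');
    throw new Error('thrown in callback');
  }, (err) => calls.push('reject2'))
  .catch(() => {}); // swallow, so the demo doesn't die on an unhandled rejection

setTimeout(() => {
  // Variant 1 ran both handlers; variant 2's rejection handler never ran.
  console.log(calls);
}, 10);
```

So the two-argument form is the safer shape when the fulfillment path itself calls into user-provided callbacks that may throw.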
lib/internal/vfs/virtual_fs.js
Outdated
```js
const bytesToRead = Math.min(length, available);
content.copy(buffer, offset, readPos, readPos + bytesToRead);
```
lib/internal/vfs/virtual_fs.js
Outdated
```js
  }

  callback(null, bytesToRead, buffer);
}).catch((err) => {
```

Suggested change:

```diff
-}).catch((err) => {
+}, (err) => {
```
|
Left an initial review, but like @Ethan-Arrowood said, it'll take time for a more in depth look |
|
It's nice to see some momentum in this area, though from a first glance it seems the design has largely overlooked the feedback from real-world use cases collected 4 years ago: https://github.com/nodejs/single-executable/blob/main/docs/virtual-file-system-requirements.md - I think it's worth checking that the API satisfies the constraints that users of this feature have provided, so as not to waste the work that prior contributors did to gather them, or to have to reinvent it later (possibly in a breaking manner) to satisfy these requirements from real-world use cases. |
Codecov Report

❌ Patch coverage is … Additional details and impacted files:

```
@@           Coverage Diff            @@
##             main   #61478     +/-  ##
========================================
+ Coverage   89.68%   89.81%   +0.12%
========================================
  Files         676      692      +16
  Lines      206555   215106    +8551
  Branches    39552    41196    +1644
========================================
+ Hits       185249   193189    +7940
- Misses      13444    14030     +586
- Partials     7862     7887      +25
```
|
And why not something like OPFS aka whatwg/fs?

```js
const rootHandle = await navigator.storage.getDirectory()
await rootHandle.getFileHandle('config.json', { create: true })
fs.mount('/app', rootHandle) // to make it work with fs
fs.readFileSync('/app/config.json')
```

OR

```js
const rootHandle = await navigator.storage.getDirectory()
await rootHandle.getFileHandle('config.json', { create: true })
fs.readFileSync('sandbox:/config.json')
```

fs.createVirtual seems like something like a competing specification |
Force-pushed from 5e317de to 977cc3d.
I generally prefer not to interleave with WHATWG specs as much as possible for core functionality (e.g., SEA). In my experience, they tend to perform poorly on our codebase and remove a few degrees of flexibility. (I also don't find much fun in working on them, and I'm way less interested in contributing to that.) On an implementation side, the core functionality of this feature will be identical (technically, it's missing writes that OPFS supports), as we would need to impact all our internal fs methods anyway. If this lands, we can certainly iterate on a WHATWG-compatible API for this, but I would not add this to this PR. |
|
Small prior art: https://github.com/juliangruber/subfs |
Force-pushed from 8d711c1 to 73c18cd.
|
I also worked on this a bit on the side recently: Qard@73b8fc6 That is very much in chaotic ideation stage with a bunch of LLM assistance to try some different ideas, but the broader concept I was aiming for was to have a setup like:

```js
module.exports = new VirtualFileSystem(new LocalProvider())
```

I intended for it to be extensible for a bunch of different interesting scenarios, so there's also an S3 provider and a zip file provider there, mainly just to validate that the model can be applied to other varieties of storage systems effectively. Keep in mind, like I said, the current state is very much just ideation in a branch I pushed up just now to share, but I think there are concepts for extensibility in there that we could consider to enable a whole ecosystem of flexible storage providers. 🙂 Personally, I would hope for something which could provide both read and write access through an abstraction with swappable backends of some variety; this way we could pass around these virtualized file systems like objects and let an ecosystem grow around accepting any generalized virtual file system for its storage backing. I think it'd be very nice for a lot of use cases like file uploads or archive management to be able to just treat them like any other readable and writable file system. |
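The swappable-backend idea above can be sketched in a few lines. This is a toy illustration only; all names here (`Provider`, `InMemoryProvider`, `VirtualFileSystem`) are hypothetical and do not reflect the actual API in Qard's branch or in this PR:

```javascript
'use strict';

// Hypothetical base class: each backend implements the same surface.
class Provider {
  readFile(p) { throw new Error('not implemented'); }
}

// One concrete backend: a Map-backed in-memory store.
class InMemoryProvider extends Provider {
  #files = new Map();
  writeFile(p, data) { this.#files.set(p, data); }
  readFile(p) {
    if (!this.#files.has(p)) throw new Error('ENOENT: ' + p);
    return this.#files.get(p);
  }
}

// The VFS front-end delegates to whichever backend it was given, so an
// S3 or zip provider could be swapped in without changing the callers.
class VirtualFileSystem {
  constructor(provider) { this.provider = provider; }
  readFile(p) { return this.provider.readFile(p); }
}

const mem = new InMemoryProvider();
mem.writeFile('/config.json', '{"ok":true}');
const vfs = new VirtualFileSystem(mem);
console.log(vfs.readFile('/config.json')); // prints {"ok":true}
```

The point of the shape is that an ecosystem can accept "any VFS object" without caring what storage sits behind it.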
just a bit off topic... but this reminds me of why i created this feature request: Would not lie, it would be cool if NodeJS also provided some type of static example that would only work in NodeJS (based on how it works internally):

```js
const size = 26
const blobPart = BlobFrom({
  size,
  stream (start, end) {
    // can either be sync or async (that resolves to a ReadableStream)
    // return new Response('abcdefghijklmnopqrstuvwxyz'.slice(start, end)).body
    // return new Blob(['abcdefghijklmnopqrstuvwxyz'.slice(start, end)]).stream()
    return fetch('https://httpbin.dev/range/' + size, {
      headers: {
        range: `bytes=${start}-${end - 1}`
      }
    }).then(r => r.body)
  }
})

blobPart.text().then(text => {
  console.log('a-z', text)
})
blobPart.slice(-3).text().then(text => {
  console.log('x-z', text)
})
const a = blobPart.slice(0, 6)
a.text().then(text => {
  console.log('a-f', text)
})
const b = a.slice(2, 4)
b.text().then(text => {
  console.log('c-d', text)
})
```

An actual working PoC (I would not rely on this unless it became officially supported by nodejs core - this is a hack):

```js
const blob = new Blob()
const symbols = Object.getOwnPropertySymbols(blob)
const blobSymbol = symbols.map(s => [s.description, s])
const symbolMap = Object.fromEntries(blobSymbol)
const {
  kHandle,
  kLength,
} = symbolMap

function BlobFrom ({ size, stream }) {
  const blob = new Blob()
  if (size === 0) return blob
  blob[kLength] = size
  blob[kHandle] = {
    span: [0, size],
    getReader () {
      const [start, end] = this.span
      if (start === end) {
        return { pull: cb => cb(0) }
      }
      let reader
      return {
        async pull (cb) {
          reader ??= (await stream(start, end)).getReader()
          const { done, value } = await reader.read()
          cb(done ^ 1, value)
        }
      }
    },
    slice (start, end) {
      const [baseStart] = this.span
      return {
        span: [baseStart + start, baseStart + end],
        getReader: this.getReader,
        slice: this.slice,
      }
    }
  }
  return blob
}
```

currently problematic to do: also need to handle properly clone, serialize & deserialize. if this were to be sent off to another worker - then i would transfer a MessageChannel where the worker thread asks main frame to hand back a transferable ReadableStream when it needs to read something. but there are probably better ways to handle this internally in core with piping data directly to and from different destinations without having to touch the js runtime? - if only getReader could return the reader directly instead of needing to read from the ReadableStream using js? |
lib/internal/fs/cp/cp-sync.js
Outdated
```js
let srcIsDir;
const vfsStat = vfsState.handlers?.statSync(src);
if (vfsStat) {
  srcIsDir = vfsStat.isDirectory();
} else {
  srcIsDir = fsBinding.internalModuleStat(src) === 1;
}
```

nit

Suggested change:

```js
const srcIsDir = vfsState.handlers?.statSync(src)?.isDirectory() ?? fsBinding.internalModuleStat(src) === 1;
```
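The one-liner works because `?.` short-circuits to `undefined` when either `handlers` is null or `statSync(src)` reports a miss, and `??` then falls through to the native check; a VFS stat that exists but is not a directory yields `false` directly and skips the native check, matching the original if/else. A standalone sketch, with `vfsState` and `fsBinding` stubbed out as hypothetical stand-ins for the internal objects:

```javascript
'use strict';

// Hypothetical stand-in for the internal fs binding: 1 means directory.
const fsBinding = { internalModuleStat: (src) => (src === '/real/dir' ? 1 : 0) };

function srcIsDir(vfsState, src) {
  return vfsState.handlers?.statSync(src)?.isDirectory() ??
    (fsBinding.internalModuleStat(src) === 1);
}

// VFS hit: answered entirely from the VFS, native check skipped.
const vfsHit = { handlers: { statSync: () => ({ isDirectory: () => true }) } };
// VFS miss: statSync returns undefined, so ?? falls back to the binding.
const vfsMiss = { handlers: { statSync: () => undefined } };
// No VFS at all: handlers is null, ?. short-circuits immediately.
const noVfs = { handlers: null };

console.log(srcIsDir(vfsHit, '/virtual/dir')); // true (from VFS)
console.log(srcIsDir(vfsMiss, '/real/dir'));   // true (from binding)
console.log(srcIsDir(noVfs, '/other'));        // false
```

The three branches of the original if/else collapse into the three short-circuit paths shown here.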
lib/internal/fs/cp/cp.js
Outdated
```js
let srcIsDir;
if (vfsState.handlers !== null && vfsState.handlers.statSync(src)) {
  srcIsDir = (await stat(src)).isDirectory();
} else {
  srcIsDir = fsBinding.internalModuleStat(src) === 1;
}
```

Suggested change:

```js
const srcIsDir = vfsState.handlers?.statSync(src)?.isDirectory() ?? fsBinding.internalModuleStat(src) === 1;
```
lib/internal/vfs/providers/memory.js
Outdated
```js
getContentSync() {
  if (this.contentProvider !== null) {
    const result = this.contentProvider();
    if (typeof result?.then === 'function') {
```

Since we're not dealing with user provided objects, it's probably better to check for actual promise objects:

```diff
-    if (typeof result?.then === 'function') {
+    if (isPromise(result)) {
```
lib/vfs.js
Outdated
```js
function create(provider, options) {
  // Handle case where first arg is options (no provider)
  if (provider != null &&
      !(provider instanceof VirtualProvider) &&
```

Suggested change:

```diff
-    !(provider instanceof VirtualProvider) &&
+    !FunctionPrototypeSymbolHasInstance(VirtualProvider, provider) &&
```
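The primordial exists because plain `instanceof` consults `Symbol.hasInstance`, which user code can override; core instead invokes the original `Function.prototype[Symbol.hasInstance]` captured at startup. Outside core, the same effect can be sketched like this:

```javascript
'use strict';

class VirtualProvider {}

// Capture the untampered check up front, as the primordial does at startup.
const uncurried = Function.prototype[Symbol.hasInstance];
const FunctionPrototypeSymbolHasInstance = (ctor, obj) =>
  uncurried.call(ctor, obj);

// A hostile constructor that lies to plain `instanceof`.
class Evil {}
Object.defineProperty(Evil, Symbol.hasInstance, { value: () => true });

console.log({} instanceof Evil);                           // true (spoofed)
console.log(FunctionPrototypeSymbolHasInstance(Evil, {})); // false (real check)
console.log(
  FunctionPrototypeSymbolHasInstance(VirtualProvider, new VirtualProvider()),
); // true
```

Inside `lib/`, the linter enforces primordials for exactly this reason: core behavior must not depend on user-modifiable globals.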
```js
const fs = require('fs');
const assert = require('assert');

// Test that the VFS is automatically mounted at /sea
```

On the edge-case side, can we test/document what happens when switching drive letters in the working directory at runtime on Windows? My intuition is that it would break /sea reads because the normalized path would change from `c:\sea\config.json` to `d:\sea\config.json` (assuming we switch from C to D). But I'm not sure if that is the current or expected behavior.
```js
  return this.lstatSync(path, options);
}

readdirSync(path, options) {
```

Is the recursive option handled for readdir and readdirSync? Just making sure, because it's missing in the @platformatic/vfs package.
|
Does this PR adhere to the Developer's Certificate of Origin?

(a) Does not seem to apply.

(b) Also doesn't apply, because this is based on previous work, but no one can assert knowledge of the licensing terms of the code it is based on.

(c) Does not apply.

I admire the aspiration behind the change, though. |
|
@indutny, based on your question, you are asking if AI-assisted development is compatible with "Developer's Certificate of Origin 1.1". I'm not a lawyer, so I'm not prepared to answer that question from a legal standpoint. My understanding is that this PR aligns with the LF's recommendations (https://www.linuxfoundation.org/legal/generative-ai) and with the discussion in openjs-foundation/cross-project-council#1509 (comment). I would prefer to keep this thread technical. Would you mind opening an issue in the Node.js or TSC repository to discuss this matter? Is this a hard block? |
|
If that's not clear, I assert that I've followed the DCO. |
I'm neither, but if we find it hard to answer the questions in the Certificate of Origin from a legal standpoint, then at least it warrants further review and discussion.

I believe this discussion to be in direct relation to the content and nature of this contribution. I don't mind starting a separate conversation elsewhere, but we might as well have it here, since in my opinion it has to be resolved before this change can be merged.

Although I'm not a fan of stating things strongly, yes. For me it is a hard block. (Please don't take any of my comments as personal criticism. You have my deep respect for both the work you're doing and the interactions we have had.) |
|
While anyone can see your points, this is my position on the DCO:

This contribution was created by me with massive help from @claude. The design of this feature is mostly mine, with a lot of inspiration from @Qard's earlier work that he shared with me (https://github.com/Qard/node/tree/vfs) after I opened the PR (I'm open to adding him as a Co-Authored-By on this PR). As with the majority of the features in this project, the outcome is the result of all the contributors participating.

This contribution is massively based on Node.js itself. It exposes the same API as the … All of this can be seen from the massive commit history in this branch. As it seems that you have a different position on this issue, I will bring this to the attention of the @nodejs/tsc for an official vote. On a side note, I would also officially raise the topic at the next OpenJS board meeting. On a personal note, I think this issue raises a different question: whether AI-assisted development is recognized as a practice when contributing to Open Source, and what the long-term impact would be for projects not accepting AI-assisted contributions. |
|
FYI there's a different PR discussing policy regarding AI-assisted/AI-generated contributions: #62105 |
I agree that part of Claude's training data is Node.js itself, but it is also well known that it contains both unlicensed source code and source code with incompatible licenses. Producing code with LLM tools requires effort on the author's side through writing prompts, but I view the generated code as unattributed and unlicensed material that shouldn't become part of Node.js (especially given the size and scope of this change).
I appreciate it! |
Yes. The DCO is correctly applied here. It doesn't matter what tool we individually use to assist in writing the code. While AI agents are more advanced than the typical auto-complete / auto-suggest mechanisms in code editors, they fall into the same basic category of coding assistant tools. The DCO does not assert that every line of code in the contribution was written specifically by hand by the person opening the PR; otherwise we wouldn't be able to take tool-generated OpenSSL configurations or other automation-generated files in any PR. There's no issue here and no reason to block. By opening the PR, @mcollina is asserting that he is responsible for the code and has reviewed it himself, which I trust. |
Simplify srcIsDir checks in cp-sync.js and cp.js to one-liner with optional chaining and nullish coalescing. Use isPromise() instead of duck-typing in memory provider. Use FunctionPrototypeSymbolHasInstance instead of instanceof in vfs.js.
By the same logic I could claim that … But anyway, I appreciate y'all initiating a vote and a formal discussion on this. |
That's a false equivalency. Auto-complete, templated generators, etc aren't just copying. If you can point to any specific lines of code in this PR that are copied verbatim from any source that is incompatible with our license and doesn't meet the common sense reasonable use standards (e.g. you can't copyright common patterns) then those should be called out specifically. |
Doesn't DCO exist partly to make sure that the burden of identifying potential license violations lies on the submitter of the proposed change? Is "copied verbatim" the minimum bar for plagiarism that we set for the contributions? Would exactly the same code with changed variable names or swapped if/else clauses be considered a different implementation? I think we can agree that at the very least these are highly debatable issues unlike the contributions written without LLM assistance that we have been encouraging and merging since the beginning of this project. |
|
The key part of the DCO assertion is this (in bold): "The contribution was created in whole or in part by me and I have the right to submit it under the open source license indicated in the file." In other words, the contributor is asserting they have the right to submit. If agent output turns out to include material that violates someone's copyright, the contributor made a false certification, just as they would have had they introduced that material themselves. Take the use of AI out of it completely. What if I paid someone to write the code for me and then I opened the PR as my own submission? That would still be OK under the DCO, and I would be just as liable for the IP assertion as when I write the code myself. The key bit is whether I have the right to submit it, not what tool was used. |
I agree.

You are right, but there is also an aspect of it where the certification was made, yet the contributed code is in clear violation (e.g. has copy-pasted copyright headers incompatible with the license). This would be (and always was) grounds for rejecting the contribution despite the present certification.

If the reviewer knows about this misrepresentation, then it is at least ethically wrong to accept the contribution. |
|
I checked with legal and the foundation is fine with the DCO on AI-assisted contributions. We’ll work on getting this documented. |
|
If the Node.js community would not like to merge the improvement because it is AI-driven, we will be very happy to have it on Edge.js @mcollina ❤️ I'm hoping we can all find alignment that writing code with AI assistance is becoming the norm now. If Node.js doesn't embrace it fully, it may be left behind (just because it may lack a good iteration speed vs other projects). |
Yes. Which goes back to what I said previously: if there's any specific part of this contribution that appears to have been inappropriately copied from code with an incompatible license, then that should be called out. Blocking as a general rule because it might-be-but-we-don't-know-for-sure is not valid. |
This is what I'm doing here: calling out LLM-generated code as a re-phrasing of other software's code. It is known that LLMs produce verbatim copies of work they are trained on, so I cannot rule out that this has happened in this pull request. |
## Code Provenance Analysis: PR #61478 (Virtual File System)

### Background

PR #61478 adds a Virtual File System (~19k lines across 76 files) to Node.js. The author (@mcollina) disclosed it was "created by me with massive help from @claude." A reviewer raised concerns that LLM-generated code might contain verbatim or near-verbatim copies from copyrighted code with incompatible licenses.

### Methodology

Every new file and every modification in the PR was read and analyzed for:

### Accounting for Linter Normalization

Node.js has one of the most aggressive linters in open source. It forces:

This means surface-level style conformity with Node.js conventions is NOT evidence of originality -- the linter would force any code into that style. The analysis therefore focuses on linter-proof structural indicators: algorithm structure, data model choices, method decomposition, architecture, constants, edge case handling, and integration points.

### Findings

**No evidence of copied code was found.**

#### 1. Algorithm Structure Differs from All Known Libraries

Path resolution (…)

Write buffer expansion (…)

Symlink loop detection: The PR implements proper ELOOP detection with …

#### 2. Data Model Is Structurally Different from All Known Libraries

The …

#### 3. Architecture Has No Precedent

The handler-registration pattern (…)

The mount-point routing + overlay mode + virtual cwd + module loader hooks + …

#### 4. Specific Constants Match Node.js Core, Not Any Library

#### 5. Deliberate Omissions Inconsistent with Copying

If code were copied from a mature library and adapted, you'd expect it to retain capabilities from the source. This implementation lacks:

These omissions are consistent with a minimal implementation built for a specific purpose, not with code copied from a more complete library.

#### 6. No Distinctive Naming from Any Known Library

Zero occurrences of: …

#### 7. No License Headers, Attribution, URLs, or External References

Zero copyright notices, license headers (MIT, Apache, BSD, ISC, SPDX), …

#### 8. Integration Uses Node.js-Internal APIs That Don't Exist Externally

Several integration points use APIs that only exist inside Node.js core, making direct copy-paste from external libraries impossible for these sections (though an AI trained on Node.js core could generate code using these APIs independently):

#### 9. Vestigial Code Indicates Iterative Development, Not Copy-Paste

The dead ternary at …

### Conclusion

After thorough analysis accounting for linter normalization, no evidence was found that this PR contains code copied verbatim or near-verbatim from any other open-source project. The algorithm structures, data models, architectural patterns, constant values, capability omissions, user-defined identifiers (class/method/variable names, which the linter does not alter), and internal API usage are all inconsistent with code derived from memfs, mock-fs, unionfs, fs-monkey, BrowserFS, or any other known VFS library. The evidence points to original code purpose-built for Node.js core integration, with AI assistance for rapid generation of repetitive boilerplate.

The reviewer's concern that "it cannot be confirmed" the code is clean is technically true of any contribution -- human-written or AI-assisted. But the specific, concrete, linter-proof structural evidence here is that this code does not structurally resemble any known VFS library from which it could have been copied.

Update: Corrected a few factual errors: …
|
|
@jasnell you can't be serious... responding to concerns about LLM use in a large code change with several pages of LLM analysis of the PR? |
|
Is anything about the analysis incorrect? I'm happy to make corrections in it as necessary. |
|
I'll let others decide. I hope the comments provided here express my point of view succinctly, so I'll try avoid thrashing this PR further :-) |
A first-class virtual file system module (`node:vfs`) with a provider-based architecture that integrates with Node.js's fs module and module loader.

**Key Features**

- **Provider Architecture** - Extensible design with pluggable providers:
  - `MemoryProvider` - In-memory file system with full read/write support
  - `SEAProvider` - Read-only access to Single Executable Application assets
  - `VirtualProvider` - Base class for creating custom providers
- **Standard fs API** - Uses familiar `writeFileSync`, `readFileSync`, `mkdirSync` instead of custom methods
- **Mount Mode** - VFS mounts at a specific path prefix (e.g., `/virtual`), with clear separation from the real filesystem
- **Module Loading** - `require()` and `import` work seamlessly from virtual files
- **SEA Integration** - Assets automatically mounted at `/sea` when running as a Single Executable Application
- **Full fs Support** - readFile, stat, readdir, exists, streams, promises, glob, symlinks

**Example**

**SEA Usage**

When running as a Single Executable Application, bundled assets are automatically available:

**Public API**
Disclaimer: I've used a significant amount of Claude Code tokens to create this PR. I've reviewed all changes myself.
Fixes #60021