Split maven artifacts into component libraries #4031

Merged
kddnewton merged 9 commits into ruby:main from headius:split-maven-artifacts
Mar 24, 2026

Conversation

@headius
Contributor

headius commented Mar 23, 2026

This PR will move all Java-related components under the java/ dir and begin splitting up the aggregate prism-parser artifact into component libraries:

  • prism-parser-api, under java/api, is the Loader API and support classes.
  • prism-parser-native, under java/native, is the native JNI binding for the Prism shared library.
  • prism-parser-wasm, under java/wasm, is the WASM-based binding of the Prism library.

There will be at least two more components added to this list, either as part of this PR or as separate work.

  • prism-parser-complete, a build of the WASM binding that includes non-semantic information like comments and line numbers.

This will likely require some enhancements in the API module to support these other elements. This will also be the basis for the JRuby version of the Ruby-based Prism parser API.

  • prism-parser-native-<platform> or similar will handle shipping pre-built native binaries for the JNI backend.

* The Loader API lives under java/api.
* The current native endpoint for the Prism shared library lives
  under java/native.
* The WASM build and binding lives under java/wasm.

The libraries will be released together but can be developed and
snapshotted independently. Users that copy the source from the
previous java/ will want to grab both java/api/src/main/java and
java/native/src/main/java contents.
@headius headius changed the title from "Begin splitting the Java artifact into components" to "Split maven artifacts into component libraries" Mar 23, 2026
headius added 2 commits March 23, 2026 15:04
This uses the JRuby rake-maven-plugin to generate the templates
as part of the Maven build. The generated output for the Java
templates will be under java/api/target/generated-sources/java.
@eregon
Member

eregon commented Mar 23, 2026

prism-parser-api will need to be templated both with and without non-semantic fields. How are you planning to handle that?
Different artifact names for the same pom.xml seems the easiest.

@headius
Contributor Author

headius commented Mar 23, 2026

@eregon Perhaps a different artifact or perhaps a sibling API within the same artifact. I have not decided what would be cleanest.

It seems like the ideal case would be that the API just includes empty elements for non-semantic elements when those are not enabled by the backend.

@eregon
Member

eregon commented Mar 23, 2026

For example, modifying Loader to handle both "all fields" and "only semantic fields" will not fly, because the Java fields of the nodes need to reflect the chosen set of fields (and always including all fields would be a huge memory overhead).

@eregon
Member

eregon commented Mar 23, 2026

A different artifact seems the cleanest to me, also because each of these 2 artifacts would then depend on either prism-parser-wasm or prism-parser-complete but not both.
And we wouldn't need to make compromises on flexibility during templating time like identifier types, see #4009 (comment)

@eregon
Member

eregon commented Mar 23, 2026

prism-parser-native
prism-parser-native-<platform>
prism-parser-wasm
prism-parser-complete

Regarding naming, prism-parser-complete seems too unclear (e.g. is it like jruby-complete and effectively includes all jruby-parser-*? No it isn't), prism-parser-wasm-complete would be clearer.

But, since all artifacts are either "all fields" or "only-semantic fields" I think separate prefixes would be best.
Note we'll also need "native with all fields", unless we're OK to always use the slower WASM for that case.

So I'd suggest this:

prism-semantic-parser-{api,native,native-<platform>,wasm}
prism-full-parser-{api,native,native-<platform>,wasm}
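To make the count concrete, the two patterns above expand to eight artifact IDs. The following is only a sketch that enumerates them; the class and method names are made up for illustration, and `<platform>` is kept as a placeholder exactly as written in the proposal.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: enumerates the artifact IDs implied by the two patterns above.
public class ArtifactNames {
    // Field sets and component kinds copied from the proposal; "<platform>" stays a placeholder.
    static final List<String> FIELD_SETS = List.of("semantic", "full");
    static final List<String> COMPONENTS = List.of("api", "native", "native-<platform>", "wasm");

    // Builds every "prism-<fields>-parser-<component>" combination.
    static List<String> expand() {
        List<String> names = new ArrayList<>();
        for (String fields : FIELD_SETS) {
            for (String component : COMPONENTS) {
                names.add("prism-" + fields + "-parser-" + component);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        expand().forEach(System.out::println);
    }
}
```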

@eregon
Member

eregon commented Mar 23, 2026

In terms of dependencies would there be any between those artifacts? Or rather the users would pick:

  • An -api artifact
  • And choose one of -{native,native-<platform>,wasm}

@headius
Contributor Author

headius commented Mar 23, 2026

> Regarding naming, prism-parser-complete seems too unclear

We can bikeshed the naming later.

> In terms of dependencies

Everyone on the Java side of things will use api and select one artifact that provides a parser backend. Those backends might be configured via SPI, but I haven't decided if that's worth it (really only useful if multiple backends will be used in a single JVM app, like JRuby's fallback on WASM, but that can be configured at a higher level).
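If backends were configured via SPI, the lookup could be built on the JDK's `java.util.ServiceLoader`. The sketch below is only an assumption about what that might look like: `ParserBackend`, `StubBackend`, and `BackendLookup` are hypothetical names and are not part of Prism's Java API. A backend artifact would register its implementation under `META-INF/services`, and the caller would fall back to a default when no provider is found.

```java
import java.util.Optional;
import java.util.ServiceLoader;

// Hypothetical SPI: neither this interface nor its methods exist in Prism's Java API.
interface ParserBackend {
    String name();
    byte[] parse(byte[] source);
}

// Stand-in fallback used when no provider is registered on the classpath.
final class StubBackend implements ParserBackend {
    public String name() { return "stub"; }
    public byte[] parse(byte[] source) { return new byte[0]; }
}

public class BackendLookup {
    // Returns the first provider discovered by ServiceLoader, or the given fallback.
    static ParserBackend select(ParserBackend fallback) {
        Optional<ParserBackend> found = ServiceLoader.load(ParserBackend.class).findFirst();
        return found.orElse(fallback);
    }

    public static void main(String[] args) {
        // With no provider registered, the fallback is selected.
        System.out.println(select(new StubBackend()).name());
    }
}
```

Selecting a fallback at a higher level, as described for JRuby, would need none of this; the sketch only shows the shape of the SPI option.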

@headius
Contributor Author

headius commented Mar 23, 2026

Latest patch cleans up some path locations in the CI builds.

The "build java*" jobs use make, which uses the rake-compiler JavaExtensionTask to build, but that plugin has no way to fetch the Maven dependencies the build needs, such as JUnit and Chicory. The make build for the Java API should not be using the extension plugin (which is designed for building JRuby extensions that have no other dependencies) and instead should use the Maven build, but I'm still sorting out where that's all done.

@headius headius force-pushed the split-maven-artifacts branch 2 times, most recently from d5ab64f to 2b6c107 Compare March 23, 2026 20:41
@headius
Contributor Author

headius commented Mar 23, 2026

Bit of a chicken and egg issue trying to get the generated parts of the Java build in place:

  • generate-sources phase for the java/api build generates the .java sources, but also generates the .c sources.
  • generate-resources phase for java/wasm needs the .wasm to be already built, which needs make to run.
  • make needs the generated .c sources.

I'll play with different configurations to figure out an appropriate sequence and hopefully get all those steps to work in both the maven builds and the rake builds.
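The ordering constraint in the list above can be sketched as a small dependency graph and resolved with a depth-first topological sort. This is only an illustration: the step labels are shorthand rather than real commands or Maven goals, and the class is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: step labels are shorthand, not actual build targets.
public class BuildOrder {
    // Maps each step to the steps that must run before it.
    static Map<String, List<String>> prerequisites() {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("Maven build of java/wasm", List.of("build .wasm with make"));
        deps.put("Maven build of java/api", List.of("generate templates"));
        deps.put("build .wasm with make", List.of("generate templates"));
        deps.put("generate templates", List.of());
        return deps;
    }

    // Depth-first topological sort; throws if the steps form a cycle.
    static List<String> order(Map<String, List<String>> deps) {
        Set<String> done = new LinkedHashSet<>();
        for (String step : deps.keySet()) {
            visit(step, deps, done, new LinkedHashSet<>());
        }
        return new ArrayList<>(done);
    }

    private static void visit(String step, Map<String, List<String>> deps,
                              Set<String> done, Set<String> inProgress) {
        if (done.contains(step)) return;
        if (!inProgress.add(step)) throw new IllegalStateException("cycle at " + step);
        for (String pre : deps.getOrDefault(step, List.of())) {
            visit(pre, deps, done, inProgress);
        }
        inProgress.remove(step);
        done.add(step);
    }

    public static void main(String[] args) {
        order(prerequisites()).forEach(System.out::println);
    }
}
```

The graph itself has no cycle; the difficulty is that template generation feeds both the make build and the Maven build, so it has to run before either.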

@headius headius force-pushed the split-maven-artifacts branch 3 times, most recently from 6745e86 to d0d894c Compare March 23, 2026 21:23
@headius headius force-pushed the split-maven-artifacts branch from d0d894c to 1983383 Compare March 23, 2026 21:25
@eregon
Member

eregon commented Mar 23, 2026

> We can bikeshed the naming later.

#4031 (comment) is not just about the name of that one but the general organization, I think that makes sense to discuss now. It doesn't need to block this PR, but that's an important discussion and architecture point to decide early on.

@headius
Contributor Author

headius commented Mar 23, 2026

> is it like jruby-complete

The jruby-complete naming was done a long time ago. I probably wouldn't use that naming now.

> So I'd suggest this

That's eight artifacts. If we also are publishing separate artifacts for every identifier form, that would be 24 artifacts.

I think that's overkill.

There are better ways to design the API to have optional fields in the AST, such as by having a set of full non-semantic AST subclass nodes that can be created by the full non-semantic parser build. The AST would look the same unless you require the non-semantic data, and if you don't it's the simpler API that doesn't include those fields.
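A minimal sketch of that subclass idea, assuming made-up node shapes: `CallNode`, `LocatedCallNode`, and their accessors are hypothetical and are not taken from the generated Nodes.java. The base class carries only semantic fields, and the subclass, created only by a parser build that keeps non-semantic data, adds the extra fields.

```java
// Hypothetical node shapes for illustration; the generated Nodes.java may differ.
class CallNode {
    final String name; // semantic field, always present

    CallNode(String name) { this.name = name; }

    // Lean form: non-semantic data is reported as absent and costs no fields.
    boolean hasLocation() { return false; }
    int startOffset() { return -1; }
    int length() { return 0; }
}

// Created only by a parser build that keeps non-semantic data.
final class LocatedCallNode extends CallNode {
    private final int startOffset;
    private final int length;

    LocatedCallNode(String name, int startOffset, int length) {
        super(name);
        this.startOffset = startOffset;
        this.length = length;
    }

    @Override boolean hasLocation() { return true; }
    @Override int startOffset() { return startOffset; }
    @Override int length() { return length; }
}

public class NodeShapes {
    // Consumers written against CallNode work with either form.
    static String describe(CallNode node) {
        return node.hasLocation()
            ? node.name + "@" + node.startOffset() + "+" + node.length()
            : node.name;
    }

    public static void main(String[] args) {
        System.out.println(describe(new CallNode("puts")));
        System.out.println(describe(new LocatedCallNode("puts", 4, 4)));
    }
}
```

In this shape, only nodes produced by the full build pay for the location fields, which is the memory concern raised earlier in the thread.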

@eregon
Member

eregon commented Mar 23, 2026

> That's eight artifacts.

Yes, and they can be created as needs arise; for now a subset is fine. But the general naming needs to take the fields distinction into account somehow.

> If we also are publishing separate artifacts for every identifier form, that would be 24 artifacts.

No, semantic implies RubySymbol (or byte[] if you prefer that for JRuby), and non-semantic implies either j.l.String or byte[] (whatever is best for general API users, still unclear at this stage), so it's still just 8.
See #4009 (comment)

@kddnewton kddnewton added the java Pull requests that update Java code label Mar 23, 2026
There are a lot of chicken-and-egg issues with trying to have Maven
do all the steps for the Java artifact builds right now, so back
off and require that templates (and WASM) are generated before the
Maven builds of the relevant modules.
@headius headius force-pushed the split-maven-artifacts branch 2 times, most recently from 01093a6 to 0008f5e Compare March 23, 2026 22:28
@headius headius force-pushed the split-maven-artifacts branch from 0008f5e to f369cab Compare March 23, 2026 22:35
@headius headius force-pushed the split-maven-artifacts branch from 84bb9c9 to 32661ee Compare March 24, 2026 00:22
Comment on lines +691 to +693
"java/api/target/generated-sources/java/org/ruby_lang/prism/Loader.java",
"java/api/target/generated-sources/java/org/ruby_lang/prism/Nodes.java",
"java/api/target/generated-sources/java/org/ruby_lang/prism/AbstractNodeVisitor.java",
Member

Is it necessary to put them under target/generated-sources instead of java/api/src/main/java?
I saw the docs mention mvn clean cleans that, but is that actually useful? There is rake clean and also the most common case is probably to run rake templates to update the generated files rather than cleaning.

This is the script used to import the .java files in TruffleRuby, it'd be nice if we don't have to manually merge from different folders.
I think it's also nicer in an editor when files in the same package are siblings on the filesystem.

Contributor Author

target/generated-sources is the standard location for sources generated by the Maven build, but after I reverted to generating outside of the build I'm not sure if that fits. I'll look into alternatives.

Member

Right but the sources are not generated by the Maven build here, so it seems better under java/api/src/main/java/org/ruby_lang/prism

Member

Continued in #4039

@eregon
Member

eregon commented Mar 24, 2026

The nesting is quite deep with Maven, e.g.
java/org/ruby_lang/prism/ParseResult.java is moved to
java/api/src/main/java/org/ruby_lang/prism/ParseResult.java
that's 8 levels of nested folders before arriving at a source file (vs 4 before).

I looked and found there is:

<build>
    <sourceDirectory>${project.basedir}/src</sourceDirectory>
</build>

So it'd be java/api/src/org/ruby_lang/prism/ParseResult.java which would be nice since the main/java parts are purely redundant.

Could you try that if it doesn't cause any issue with the Maven plugins used?

@eregon
Member

eregon commented Mar 24, 2026

> Bit of a chicken and egg issue trying to get the generated parts of the Java build in place:

What you landed on looks good to me, i.e. requiring rake to be run first.
That way there is no duplication and there is no need for Prism contributors to install Maven (if some Java-related CI job would fail), etc. Keeping to use Rake::JavaExtensionTask is good because it's low friction and ensures there are zero dependencies (on Maven packages).

@kddnewton kddnewton marked this pull request as ready for review March 24, 2026 11:08
@kddnewton kddnewton merged commit a98ab82 into ruby:main Mar 24, 2026
68 checks passed
@headius headius deleted the split-maven-artifacts branch March 24, 2026 15:48
@headius
Contributor Author

headius commented Mar 24, 2026

> that's 8 levels of nested folders

The layout I used is standard Maven layout requiring no configuration.

  • One of those levels is for splitting up the components, so that's unavoidable.
  • The src path is required either way to separate sources from other elements.

So only main/java is really extra, and only for the non-wasm sources. In the wasm component, all levels are being used:

  • src/main/java for Java sources
  • src/main/java-templates for the generated sources
  • src/test/java for the Java tests
  • src/test/resources for the WASM build

If we generate sources into src/main/java-templates (perhaps preferable to target/generated-sources) and add src/test/java for some minimal unit tests, the divisions are no longer redundant. I'd rather leave the layout like this than have to add it back again later.

> it'd be nice if we don't have to manually merge from different folders.

The generated C sources do go into src, but they are not versioned like the Java sources used to be:

[] prism $ ls -1 templates/src
diagnostic.c.erb
json.c.erb
node.c.erb
prettyprint.c.erb
serialize.c.erb
tokens.c.erb
[] prism $ diff <(git ls-files src) <(ls src/*)
4a5
> src/diagnostic.c
6a8
> src/json.c
9a12
> src/node.c
11a15
> src/prettyprint.c
13a18
> src/serialize.c
19a25
> src/tokens.c

Perhaps it would be easiest for you if the TR test build saved an archive of sources?

> I think it's also nicer in an editor when files in the same package are siblings on the filesystem.

Actually, the -native and -wasm sources should be in subpackages, because JPMS does not allow separate modules to have files in the same package.
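The split-package restriction can be illustrated with the JDK's `java.lang.module.ModuleDescriptor` API. This is a sketch under assumptions: the module names `prism.api` and `prism.wasm` and the subpackage `org.ruby_lang.prism.wasm` are hypothetical, chosen only to show the overlap check.

```java
import java.lang.module.ModuleDescriptor;
import java.util.HashSet;
import java.util.Set;

// Illustrative only: module and subpackage names are hypothetical.
public class SplitPackageCheck {
    // Packages declared by both descriptors; a non-empty result is a split package.
    static Set<String> sharedPackages(ModuleDescriptor a, ModuleDescriptor b) {
        Set<String> shared = new HashSet<>(a.packages());
        shared.retainAll(b.packages());
        return shared;
    }

    // Builds a descriptor that declares only a name and a set of packages.
    static ModuleDescriptor module(String name, String... packages) {
        return ModuleDescriptor.newModule(name).packages(Set.of(packages)).build();
    }

    public static void main(String[] args) {
        ModuleDescriptor api = module("prism.api", "org.ruby_lang.prism");
        ModuleDescriptor flat = module("prism.wasm", "org.ruby_lang.prism");
        ModuleDescriptor nested = module("prism.wasm", "org.ruby_lang.prism.wasm");

        // Same package in both modules: a conflict once both are resolved together.
        System.out.println(sharedPackages(api, flat));
        // Backend sources moved to a subpackage: no overlap.
        System.out.println(sharedPackages(api, nested));
    }
}
```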

> That way there is no duplication and there is no need for Prism contributors to install Maven

We can set up a ./mvnw Maven wrapper to avoid anyone having to install it.

> Keeping to use Rake::JavaExtensionTask is good because it's low friction and ensures there are zero dependencies (on Maven packages)

There may be test-time dependencies in the future, such as on JUnit for testing. JavaExtensionTask will neither be able to build nor run those tests. It already can't verify the wasm component because it knows nothing of dependencies on JUnit and JRuby for testing nor Chicory for the build. It's just inadequate for non-trivial Java projects.

Verification that everything builds and tests should be done by a Maven build.
