
Backend neutral API services #70

Merged
solsson merged 52 commits into main from backend-neutral-api-services
Mar 26, 2026
Conversation

@solsson solsson commented Mar 16, 2026

  • adds y-s3-api.blobs.svc.cluster.local
  • adds y-bootstrap.kafka.svc.cluster.local
  • the blobstore backend (VersityGW by default) now runs in the blobs namespace
  • the kafka backend (Redpanda) now runs in the kafka namespace
  • new convention for bucket create: use http://y-kustomize.ystack.svc.cluster.local/v1/blobs/setup-bucket-job/base-for-annotations.yaml as a resource
  • new convention for topic create: use http://y-kustomize.ystack.svc.cluster.local/v1/kafka/setup-topic-job/base-for-annotations.yaml as a resource
  • y-cluster-converge-ystack should now represent a rule set for yolean.se/module-part labels and ordered bases in ./k3s/
  • y-cluster-sudoers now seems to handle sudo-rs well enough
  • y-kubefwd adds --domain for non-local contexts, for http://y-skaffold.ystack.prod1 etc.
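A consumer of the new bucket convention might look roughly like this. This is a sketch only: the namespace, bucket name, and the exact `yolean.se/bucket-name` annotation key are assumptions (the PR confirms annotation-driven bucket naming via commonAnnotations, but not the key spelling):

```yaml
# Hypothetical consumer kustomization — annotation key and names are guesses
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: myapp
resources:
- http://y-kustomize.ystack.svc.cluster.local/v1/blobs/setup-bucket-job/base-for-annotations.yaml
commonAnnotations:
  yolean.se/bucket-name: myapp-data   # assumed annotation key
```

The topic convention would presumably mirror this with the /v1/kafka/setup-topic-job/ URL.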

solsson and others added 30 commits March 25, 2026 07:15
Adds a readiness check after cluster creation to prevent TLS handshake
timeouts when kubectl apply runs before k3s has finished initializing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Allows skipping the image cache and containerd load steps while still
running converge, useful when Docker networking is flaky or images are
already loaded.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Introduce y-s3-api.blobs:80 (ExternalName to minio or versitygw) and
y-bootstrap.kafka:9092 (ClusterIP selecting redpanda pods) so consumers
don't need to know which implementation backs S3 or Kafka.

Namespace resources are now managed in dedicated nn-namespace-* bases
(00-ystack, 01-blobs, 02-kafka, 03-monitoring) instead of being bundled
with workload bases, preventing accidental namespace deletion on
kubectl delete -k.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
SWS (static-web-server) serves kustomize base files from secrets.
When versitygw is installed it creates the secret
y-kustomize.blobs.setup-bucket-job, mounted into SWS at
/blobs/setup-bucket-job/. Consumers reference individual resources:

  resources:
  - http://y-kustomize.ystack.svc.cluster.local/blobs/setup-bucket-job/setup-bucket-job.yaml

The setup-bucket-job uses the y-s3-api.blobs abstraction so consumers
don't need to know the S3 backend. Kustomize treats HTTP directory
URLs as git repos, so individual file URLs are used instead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Versioned base URLs at /v1/blobs/setup-bucket-job/
- setup-bucket-job.yaml now includes a credentials Secret (name: bucket)
  alongside the Job, so consumers get endpoint+creds after setup
- builds-registry-versitygw adapted: reads S3 config from
  builds-registry-bucket secret instead of hardcoded blobs-versitygw
- No longer depends on registry/generic,versitygw; uses registry/generic
  + versitygw/defaultsecret + y-kustomize HTTP resource directly
- SWS: enable symlinks and hidden files for k8s secret mounts, add
  --health flag

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Canonical URL: /v1/{category}/{job}/base-for-annotations.yaml
- Rename versitygw secret from "minio" to "versitygw-server" with
  root-prefixed keys to clarify these are admin credentials
- Move disableNameSuffixHash from global generatorOptions to per-generator options
- Add y-kustomize/openapi/openapi.yaml (OpenAPI 3.1) specifying the
  API contract for any y-kustomize implementation
- Add TODO_VALIDATE.md with design for spec-based validation job

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
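The canonical URL contract could be specified along these lines. This is only a guess at the shape — the authoritative spec is the y-kustomize/openapi/openapi.yaml added in this commit:

```yaml
# Sketch only; see y-kustomize/openapi/openapi.yaml for the real contract
openapi: 3.1.0
info:
  title: y-kustomize
  version: v1
paths:
  /v1/{category}/{job}/base-for-annotations.yaml:
    get:
      parameters:
      - {name: category, in: path, required: true, schema: {type: string}}
      - {name: job, in: path, required: true, schema: {type: string}}
      responses:
        "200":
          description: Kustomize base YAML, customized by consumers via commonAnnotations
          content:
            application/yaml: {}
```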
Run y-k8s-ingress-hosts after y-kustomize HTTPRoute is created so
kustomize can resolve the hostname. Curl-based readiness loops with
short timeouts gate any step that depends on y-kustomize HTTP resources.

After versitygw creates the blobs secret, restart y-kustomize and
wait for the base-for-annotations.yaml endpoint before applying
builds-registry.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add yolean.se/module-part=gateway label to all Gateway, HTTPRoute,
GRPCRoute resources and the y-kustomize stack (Deployment, Service,
HTTPRoute).

Pass 1 applies only gateway-labeled resources from all bases using
label selector, then runs y-k8s-ingress-hosts to update /etc/hosts.
Pass 2 does the full apply including bases that depend on y-kustomize
HTTP resources being reachable from the host.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
y-k8s-ingress-hosts now reads yolean.se/override-ip from the ystack
gateway annotation, removing the need to pass -override-ip through
the call chain. The new --ensure flag combines check + write.

Converge persists YSTACK_OVERRIDE_IP env as a gateway annotation
after pass 1, then calls --ensure. Provision scripts set the env var
instead of managing hosts separately. Also adds 30s timeout on
y-kustomize host reachability and renders bases to temp files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- ingress-hosts --check now verifies IP matches, not just hostname
  presence, detecting stale entries from previous provisioners
- --ensure appends -write when check fails (was missing)
- Fix dry-run label filtering: drop --server-side from dry-run=client
  (incompatible combination caused all gateway resources to be skipped)
- Increase curl connect-timeout to 10s (macOS mDNS adds 5s delay for
  .cluster.local hostnames) and overall timeout to 60s
- Add --request-timeout=5s to kubectl calls in ingress-hosts
- Fix y-image-list-ystack to extract from BASES array (was grepping
  for removed apply_base function)
- Update multipass provision to use converge's built-in --ensure

Tested: full multipass provision converge completes successfully.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Create kafka/common with secretGenerator for y-kustomize.kafka.setup-topic-job,
mirroring the blobs pattern in versitygw/common. The base serves a Secret
(kafka-bootstrap with broker endpoint) and a Job (setup-topic using rpk).

Add k3s/09-kafka-common to deploy the secret in ystack namespace.
Update converge to apply it and validate y-kustomize serves kafka bases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
and not require the user running it to be in the sudo group.

Also gives up (once again) on sudo-rs,
as we still haven't found rules that work for us on that sudo impl.
- Move kafka/redpanda/kafka/kustomization.yaml to kafka/redpanda/ (depth-2 convention)
- Upgrade redpanda image to v24.2.14@sha256:a91cddd8a93181b85107a3cde0beebb
- Use fully qualified image name (docker.redpanda.com/redpandadata/redpanda)
- Add k3s/10-redpanda/ base with redpanda-image component
- Include redpanda in converge-ystack (deploy + rollout wait)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Rename versitygw/ to blobs-versitygw/ and minio/ to blobs-minio/,
deploy to blobs namespace instead of ystack. y-s3-api becomes a
direct ClusterIP service selecting pods (removes ExternalName
indirection). Registry and other ystack consumers use
y-s3-api.blobs.svc.cluster.local.

- Remove intermediate blobs-versitygw/blobs-minio services
- Add k3s/09-blobs-common for y-kustomize secret in ystack namespace
- Remove versitygw from k3s/40-buildkit (not buildkit's concern)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
CPU limits cause throttling even when the node has spare capacity,
hurting latency and throughput without meaningful benefit.
Memory limits are kept.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Consolidate 23 k3s bases into 12 using numbered ranges (0*-6*),
rewrite converge script as a generic 10-step phase loop that
discovers bases by directory listing and uses label selectors
(config, services, gateway) instead of hardcoded base names.

Bases with -disabled suffix are skipped by convention.
Deferred bases (referencing y-kustomize HTTP) are detected
automatically and rendered/applied after y-kustomize restart.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
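The discovery-by-listing convention described above can be sketched like this (directory names are invented for the demo; the real script lives in converge):

```shell
#!/bin/sh
# Sketch of convention-based base discovery with the -disabled skip rule.
set -eu
d=$(mktemp -d)
mkdir -p "$d/k3s/00-namespace-ystack" "$d/k3s/50-monitoring-disabled" "$d/k3s/60-builds-registry"
applied=""
for base in "$d"/k3s/*/; do
  name=$(basename "$base")
  case "$name" in
    *-disabled) continue ;;   # skipped by convention
  esac
  applied="$applied $name"
  echo "apply $name"
done
rm -rf "$d"
```

Globbing sorts lexically, which is what makes the numbered-prefix ordering (0*-6*) work without a hardcoded base list.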
Split 21-ystack-config into category-grouped bases so folder names
reflect the namespace relationship. Move namespace declarations from
individual resource yamls to kustomization.yaml.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…headers

Feature bases now own their target namespace, making them
portable across provisioners. Only k3s/60-builds-registry
retains namespace: ystack (cross-namespace override for
blobs-versitygw/defaultsecret, documented with comment).

Extract gateway/ feature base from k3s/20-ystack-core.

Add yaml-language-server schema comment and apiVersion/kind
to all kustomization files touched in this branch.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Namespace bases are not subject to consolidation — keep one per
namespace for clarity and to support selective provisioning.

Remove yaml-language-server schema directive from kustomize
Component (redpanda-image) since schemastore has no Component
schema.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Simplify to inline render-and-apply piping. The only base with
y-kustomize HTTP dependency (60-builds-registry) renders naturally
in step 8 after y-kustomize is restarted in step 7.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
with very strange json payloads on stdout

and we might not need emulation with the new cluster strategy
Mounted secrets refresh in-place so no restart is needed, supporting
repeated converge. Replace until-loops with curl --retry flags.
Drop step numbering. Add --teardown-prune flag to lima provision.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
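The until-loop replacement can be illustrated with a minimal sketch; the readiness probe here is a stand-in (the real scripts curl y-kustomize endpoints):

```shell
#!/bin/sh
# Old pattern: hand-rolled until-loop readiness gate (probe is simulated).
flag=$(mktemp -u)
attempts=0
until [ -e "$flag" ]; do
  attempts=$((attempts + 1))
  if [ "$attempts" -ge 2 ]; then touch "$flag"; fi   # pretend the probe succeeds on try 2
done
echo "ready after $attempts attempts"
rm -f "$flag"
# New pattern, using curl's built-in retries instead (URL hypothetical):
#   curl -fsS --retry 20 --retry-delay 2 --retry-all-errors "$URL" >/dev/null
```

curl's `--retry`/`--retry-delay` flags push the backoff logic into one invocation, which matches the 20x2s retry params mentioned later in this PR.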
Consistent log attribution when scripts call each other.
Also fix setup-bucket job name in validate and update converge
completion message.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use static-web-server based example (two layers) instead of node app.
Verify build+push via registry API instead of deploying a pod.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The Envoy proxy was needed before Traefik supported GRPCRoute.
Clients connect directly through the gateway on port 80 with
fallback to port 8547 for remote clusters using y-kubefwd.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add annotation-driven bucket-name to setup-bucket-job base so
consumers can use commonAnnotations instead of JSON patches.
Create registry/builds-bucket and registry/builds-topic feature
bases. Remove legacy generic,kafka (pixy-based notifications).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Switch from render pipe to kubectl apply -k. Only tolerate
"no objects passed to apply" from label-selector phases.
All other errors are now fatal.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
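The new error policy — tolerate only the empty-selector message, treat everything else as fatal — could be implemented roughly as follows (the helper name is made up):

```shell
#!/bin/sh
# Sketch: only "no objects passed to apply" from an empty label selector
# is tolerated; any other failure propagates.
tolerant_apply() {
  out=$("$@" 2>&1) && { [ -z "$out" ] || printf '%s\n' "$out"; return 0; }
  case "$out" in
    *"no objects passed to apply"*)
      echo "SKIP: label selector matched nothing"; return 0 ;;
    *)
      printf '%s\n' "$out" >&2; return 1 ;;
  esac
}
# stand-in for: tolerant_apply kubectl apply -k "$base" -l yolean.se/module-part=gateway
tolerant_apply sh -c 'echo "error: no objects passed to apply" >&2; exit 1'
```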
solsson and others added 20 commits March 25, 2026 07:15
Replace multi-pass label selector approach with single-pass converge
that waits for deployment rollouts between digit groups. Bootstrap
y-kustomize with empty secrets (09-*) so it can start before real
secrets arrive at 3*/4*. Split 20-ystack-core into 20-gateway and
29-y-kustomize. Rename bases to use consistent naming: httproute→gateway,
grpcroute→gateway, common→y-kustomize. Update all kustomization.yaml
references to match renamed paths.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace pod exec + wget with direct curl to registry service in
validate. Use faster retry params (20x2s) across both scripts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use &&/|| chains so curl failures flow into report instead of aborting.
Add retries to tags check. Extract REGISTRY_HOST variable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
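The &&/|| reporting pattern might look like this minimal sketch (check names are invented; the real checks curl the registry):

```shell
#!/bin/sh
# Sketch: a failed check is recorded in the report instead of aborting.
report=""
check() {
  name=$1; shift
  "$@" >/dev/null 2>&1 && report="$report OK:$name" || report="$report FAIL:$name"
}
check catalog true    # stand-in for: curl -fsS "$REGISTRY_HOST/v2/_catalog"
check tags false      # stand-in for a failing tags request
echo "report:$report"
```

Because each check ends in a `|| report=...` branch, the script keeps running under `set -e` and can print a complete pass/fail summary at the end.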
Avoids polluting .local DNS domain when forwarding from remote
clusters. Skipped when context is "local" or --domain is explicit.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The -E (preserve environment) flag is rejected by scoped NOPASSWD
sudoers rules. These scripts pass all needed config via command-line
flags, not environment variables, so -E is unnecessary.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Provision scripts (k3d, multipass, lima) accept --exclude=SUBSTRING
(default: monitoring) and forward it to y-cluster-converge-ystack which
filters out k3s bases matching the substring. YSTACK_OVERRIDE_IP env var
replaced by --override-ip flag on converge-ystack.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace hardcoded CRD waits (gateway, prometheus-operator) with
kubectl wait --for=condition=Established crd --all. Replace hardcoded
namespace list for rollout status with dynamic discovery of namespaces
that have deployments. This makes --exclude work for any base without
hanging on missing CRDs or namespaces.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…delay

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Provisions an Ubuntu cloud image VM with k3s via QEMU/KVM.
Uses cloud-init for SSH access, port-forwarding for API server.
Runs y-cluster-converge-ystack with Gateway API and y-k8s-ingress-hosts,
followed by y-cluster-validate-ystack.

The VM disk can be exported as a VMware appliance via --export-vmdk.

Prerequisites: qemu-system-x86 qemu-utils cloud-image-utils, /dev/kvm

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Detection order: qemu (if qemu-system-x86_64, qemu-img, cloud-localds,
and /dev/kvm are all present) > multipass > k3d (docker).

Also add ystack-qemu to y-cluster-local-detect for teardown support.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
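The detection chain described above can be sketched as a `command -v` cascade (function name and the `none` fallback are assumptions):

```shell
#!/bin/sh
# Sketch of provisioner detection: qemu > multipass > k3d (docker).
detect_provisioner() {
  if command -v qemu-system-x86_64 >/dev/null 2>&1 \
     && command -v qemu-img >/dev/null 2>&1 \
     && command -v cloud-localds >/dev/null 2>&1 \
     && [ -e /dev/kvm ]; then
    echo qemu
  elif command -v multipass >/dev/null 2>&1; then
    echo multipass
  elif command -v docker >/dev/null 2>&1; then
    echo k3d
  else
    echo none   # the real script presumably errors out here instead
  fi
}
detect_provisioner
```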
… installed

Instead of failing, print SKIP when the monitoring namespace doesn't
exist. This avoids false failures when converge excludes monitoring.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Curl each base YAML and dry-run apply to verify y-kustomize is serving
valid resources. Runs after registry checks since registry being up
implies the bases should be available.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Adds ls and cat subcommands modeled after gcloud storage ls/cat.
Uses context:// URL format (e.g. local://bucket/key) instead of gs://.
Supports ls -l for long listing with size and timestamp.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
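Splitting a context:// URL into its parts is straightforward with POSIX parameter expansion; a sketch (the tool's actual parsing may differ):

```shell
#!/bin/sh
# Sketch: split context://bucket/key into components.
url="local://mybucket/path/to/key"
context=${url%%://*}   # kube context, e.g. "local"
rest=${url#*://}
bucket=${rest%%/*}
key=${rest#*/}
echo "context=$context bucket=$bucket key=$key"
```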
Fix Distribution v3 config mount path (/etc/distribution not /etc/docker/registry)
and add S3 storage driver + cache settings to generic config.yml so the registry
actually uses VersityGW for persistence instead of emptyDir.

Add port 80/443 forwarding to QEMU provisioner so kustomize HTTP resource
fetches and host curl reach Traefik inside the VM. Require
ip_unprivileged_port_start<=80 with clear error and fix instructions.

Make validate-ystack robust with a restart cycle: run all pre-build checks,
restart versitygw + registry, re-run checks — confirms state survives pod
replacement. Use kurl (K8s API proxy) instead of direct curl for service
checks so they work regardless of provisioner networking.

Tighten --exclude in converge-ystack to reject unknown values by validating
against 0N-namespace-* directories in k3s/.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the minio/mc one-shot pod with kubectl exec into the versitygw
pod for ~100x speedup (~65ms vs seconds). Add -r flag for recursive
listing (default is now direct children only). Output full context://
bucket/path URLs so ls entries can be copy-pasted directly into cat.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Switch from ghcr.io/yolean/redpanda:v24.2.14 to upstream v25.3.10.
Enable developer_mode to bypass the 1GB minimum memory check, allowing
--memory=512M with --reserve-memory=0M. Idle footprint: 11m CPU / 102Mi.

Tune for minimal dev use: group_topic_partitions=1, storage_min_free_bytes
10MB, log segments 16MB. Resource requests set to actual idle use (10m CPU,
200Mi memory), memory limit 600Mi, no CPU limit.

Add y-cluster-kafka script wrapping rpk via kubectl exec into the redpanda
pod. Subcommands: topic list, topic produce, topic consume, rpk.

Add kafka validation to y-cluster-validate-ystack: creates a test topic
via y-kustomize setup job, then verifies list, produce, and consume.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The unhelm'd templates from redpanda-5.7.34 used /bin/bash for the
configurator init container and lifecycle hooks, but v25.3.10 images
may not include bash or may have architecture issues with it. Switch
to /bin/sh for portability.

Also simplify configurator.sh by replacing bash arrays with plain
variables; with replicas=1 the array indirection was unnecessary.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@solsson solsson force-pushed the backend-neutral-api-services branch from b49a6f5 to 2b094dd on March 25, 2026 06:15
solsson and others added 2 commits March 25, 2026 07:18
because we wanted to know that it's compatible with our setup and then keep using the same version as in prod
@solsson solsson merged commit ed7eae9 into main Mar 26, 2026
1 check passed