feat(identifier): denormalize base_type onto identifier table #35164
…ry performance

Adds a base_type (INT4) column to the identifier table that mirrors structure.structuretype, eliminating structure joins from hot-path queries.

- Identifier bean, factory, transformer updated to read/write base_type
- base_type set automatically on createNewIdentifier for all contentlet types
- Task260331AddBaseTypeColumnToIdentifier: adds column + index at startup
- PopulateIdentifierBaseTypeJob/Util: backfills existing rows in batches of 500 with 250ms sleep between batches to avoid lock pressure on large tables; fires immediately at startup, self-unschedules when migration is complete
- postgres.sql updated with column and index for fresh installs
- Integration tests for startup task, Quartz job, util, and IdentifierFactory

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
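The batching behavior the commit describes (500 rows per pass, 250 ms sleep, stop when a pass touches nothing) can be sketched as a small loop. This is an illustrative stand-in, not the actual dotCMS implementation: the queue simulates the pending rows, and `updateNextBatch` stands in for the batched SQL UPDATE.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class BackfillSketch {
    static final int BATCH_SIZE = 500;  // rows updated per pass
    static final long SLEEP_MS = 250;   // pause between passes to ease lock pressure

    /** Stand-in for one batched UPDATE; returns the number of rows touched. */
    static int updateNextBatch(Queue<Integer> pendingRows) {
        int updated = 0;
        while (updated < BATCH_SIZE && pendingRows.poll() != null) {
            updated++;
        }
        return updated;
    }

    /** Loops until a pass updates zero rows; returns total rows backfilled. */
    public static int populate(Queue<Integer> pendingRows) {
        int total = 0;
        int updated;
        while ((updated = updateNextBatch(pendingRows)) > 0) {
            total += updated;
            try {
                Thread.sleep(SLEEP_MS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop early; design is restart-safe
                break;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Queue<Integer> rows = new ArrayDeque<>();
        for (int i = 0; i < 1234; i++) rows.add(i);
        System.out.println(populate(rows)); // prints 1234 (batches of 500, 500, 234)
    }
}
```

The self-termination (a zero-row pass ends the loop) is what lets the job run repeatedly without tracking progress externally: any restart simply resumes from whatever rows are still NULL.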
Claude finished @wezell's task in 3m 11s

PR Review: feat(identifier): denormalize base_type onto identifier table
3 issues worth a second look: 1.
…-migration

Previously forceRun() returned false once idx_identifier_base_type existed, permanently abandoning millions of NULL base_type rows if the server restarted mid-backfill.

Fix: also return true when hasPendingRows() detects identifier rows with null base_type and non-null asset_subtype. DDL uses IF NOT EXISTS so re-running executeUpgrade() is always safe.

Tests: replace the incorrect false-when-index-exists assertion with two cases:
- returns false only when index exists AND no pending rows remain
- returns true when index exists but backfill is incomplete (restart scenario)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
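The corrected decision reduces to a two-input predicate, sketched below with the database checks passed in as booleans so the decision table is easy to verify. The method name mirrors the startup-task API; the boolean-parameter form is illustrative only.

```java
public class ForceRunSketch {
    /** Run the upgrade if the index is missing OR the backfill is incomplete. */
    public static boolean forceRun(boolean indexExists, boolean hasPendingRows) {
        if (!indexExists) {
            return true;          // fresh upgrade: column/index not created yet
        }
        return hasPendingRows;    // restart scenario: index exists, rows still NULL
    }
}
```

The old bug was equivalent to returning `!indexExists` alone, which made the index's existence a proxy for "migration done" even when the backfill had been interrupted.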
…n is confirmed complete

Previously removeJob() was called unconditionally after populate(), which caused the job to unschedule itself even when interrupted mid-run with rows still pending. Now hasPendingRows() is checked after populate() and removeJob() is only called when the backfill is confirmed finished.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
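The guard can be sketched as follows. The `Backfill` interface and `removed` flag are illustrative stand-ins for PopulateIdentifierBaseTypeUtil and the Quartz unschedule call, not the actual dotCMS code.

```java
public class JobSketch {
    interface Backfill {
        void populate();            // runs (part of) the backfill; may be interrupted
        boolean hasPendingRows();   // true while NULL base_type rows remain
    }

    static boolean removed = false; // stands in for the scheduler's remove-job call

    static void removeJob() {
        removed = true;
    }

    /** Previously removeJob() ran unconditionally; now it is guarded. */
    public static void execute(Backfill util) {
        util.populate();
        if (!util.hasPendingRows()) {
            removeJob();            // only unschedule once confirmed complete
        }
    }
}
```

If populate() is cut short, the job stays scheduled and simply resumes the remaining rows on its next fire.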
…startup errors

1. hasPendingRows() now joins structure so orphaned identifiers (asset_subtype with no matching structure entry) cannot keep the backfill job running indefinitely. The populate loop already skips them via inner join; the sentinel must match.
2. forceRun() now returns true on DotDataException instead of false so a transient DB error during startup does not permanently skip the migration. executeUpgrade() is idempotent (IF NOT EXISTS) so re-running is always safe.
3. Added integration test covering the orphaned-row case for hasPendingRows().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
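The orphan case in point 1 is the subtle one: a row can look pending (null base_type, non-null asset_subtype) yet be unreachable by the populate loop's inner join. A sketch of the corrected sentinel, with the SQL join simulated over in-memory data (field and parameter names are illustrative):

```java
import java.util.List;
import java.util.Set;

public class SentinelSketch {
    public record Row(Integer baseType, String assetSubtype) {}

    /**
     * Pending = base_type is null, asset_subtype is non-null, AND the subtype
     * matches a structure entry. Orphaned identifiers (no matching structure
     * row) are excluded, matching the inner join in the populate loop.
     */
    public static boolean hasPendingRows(List<Row> identifiers, Set<String> structureNames) {
        return identifiers.stream().anyMatch(r ->
                r.baseType() == null
                && r.assetSubtype() != null
                && structureNames.contains(r.assetSubtype()));
    }
}
```

Without the structure membership check, a single orphaned identifier would keep `hasPendingRows()` true forever while the populate loop updates nothing, so the job would never unschedule itself.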
- Make hasPendingRows() public — called from Task260331 in a different package
- Fix DotConnect.executeUpdate() arg order: connection comes first, not second
- Remove unused @WrapInTransaction import (replaced by manual connection management)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…entifier()

The ContentTypeAPI lookup that resolves a missing base_type was running on every saveIdentifier() call regardless of whether it was an INSERT or UPDATE. INSERT callers always provide base_type (set in createNewIdentifier), so the guard was only ever needed for UPDATE — where a stale cached Identifier loaded before the backfill ran could write null back over a freshly populated value.

Moving the guard inside the isIdentifier==true branch eliminates the API+DB call on all INSERT paths, so bulk operations (imports, publishing pipelines) that create many new identifiers are unaffected during the migration window.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
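The effect of moving the guard can be shown with a counter standing in for the ContentTypeAPI + DB call: INSERTs never pay for the lookup, while an UPDATE carrying a stale null still gets repaired. All names here are illustrative, not the actual dotCMS code.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SaveGuardSketch {
    static final AtomicInteger LOOKUPS = new AtomicInteger();

    /** Stand-in for the ContentTypeAPI lookup that resolves base_type. */
    static Integer resolveBaseTypeViaApi(String assetSubtype) {
        LOOKUPS.incrementAndGet();
        return 1; // e.g. CONTENT
    }

    /** isUpdate mirrors the isIdentifier==true (row already exists) branch. */
    public static Integer save(boolean isUpdate, Integer baseType, String assetSubtype) {
        if (isUpdate && baseType == null && assetSubtype != null) {
            baseType = resolveBaseTypeViaApi(assetSubtype); // guard only on UPDATE
        }
        return baseType; // value that would be written to the row
    }
}
```

The interesting case is `save(true, null, …)`: a cached Identifier loaded before the backfill ran carries null, and without the guard it would overwrite the freshly populated column.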
…up tasks

Task260403SetPermissionReferenceUnlogged:
- Converts permission_reference to UNLOGGED via ALTER TABLE SET UNLOGGED
- Table is a regenerable permission cache; crash-safety is not required
- Eliminates WAL write cost on every permission rebuild INSERT/DELETE
- forceRun() checks pg_class.relpersistence to skip if already unlogged

Task260403SetLz4CompressionOnTextColumns:
- Sets LZ4 compression on all text/bytea/jsonb/json columns in public schema
- Queries pg_attribute dynamically — no hardcoded column list, covers plugins
- Skips columns already using LZ4 (attcompression='l') for idempotency
- Continues on per-column failure so one bad column does not block all others
- Gracefully skips on PostgreSQL < 14 (attcompression column absent)
- LZ4 decompresses 3-5x faster than pglz; affects future writes only

postgres.sql:
- permission_reference CREATE TABLE now uses UNLOGGED
- LZ4 SET COMPRESSION block added for all eligible columns (fresh installs)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
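A sketch of the LZ4 task's per-column statement generation. In the real task the (table, column) pairs come from a pg_attribute/pg_class query; here they are passed in, and the SQL follows PostgreSQL 14's ALTER COLUMN ... SET COMPRESSION syntax. Class and record names are illustrative.

```java
import java.util.List;
import java.util.stream.Collectors;

public class Lz4Sketch {
    public record Col(String table, String column) {}

    /** One ALTER statement per eligible column; executed independently so a
     *  failure on one column does not block the rest. */
    public static List<String> alterStatements(List<Col> cols) {
        return cols.stream()
                .map(c -> "ALTER TABLE " + c.table()
                        + " ALTER COLUMN " + c.column() + " SET COMPRESSION lz4")
                .collect(Collectors.toList());
    }
}
```

Note SET COMPRESSION only affects values written after the change; existing rows keep their pglz-compressed form until rewritten.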
…on startup tasks"

This reverts commit ac32518.
Summary
- base_type INT4 column + idx_identifier_base_type index added to the identifier table, mirroring structure.structuretype
- Identifier bean, IdentifierFactoryImpl, and IdentifierTransformer updated to read/write base_type on every save
- Task260331AddBaseTypeColumnToIdentifier adds the column and index at startup (DDL only — fast), then schedules the backfill job
- PopulateIdentifierBaseTypeJob / PopulateIdentifierBaseTypeUtil backfill existing rows in batches of 500 (250 ms sleep between batches), fire immediately at startup, and self-unschedule when complete
- postgres.sql updated for fresh installs

Test plan
- Task260331AddBaseTypeColumnToIdentifierTest — column/index created, idempotent, correct metadata
- PopulateIdentifierBaseTypeUtilTest — sets correct base type per content type, ignores folders (null asset_subtype), returns 0 on second run
- PopulateIdentifierBaseTypeJobTest — schedules, skips when no work pending, removes itself
- IdentifierFactoryTest (5 new methods) — base_type persisted for CONTENT and FILEASSET, survives re-save, transformer maps it, folders get null

fixes #35162