Iceberg stress testing and performance tuning by kbatuigas · Pull Request #1599 · redpanda-data/docs

kbatuigas · 2026-03-04T23:33:32Z

Description

This pull request introduces significant improvements to the Iceberg documentation by splitting out performance tuning and troubleshooting content into dedicated pages, enhancing navigation, and updating cross-references for clarity. The changes make it easier for users to find best practices for optimizing and troubleshooting Iceberg topics, and ensure that documentation links are accurate and up-to-date.

Documentation Restructuring and Navigation:

Added new navigation entries for "Tune Performance" and "Troubleshoot" under the Iceberg section in nav.adoc, improving discoverability of these topics.
Updated links in the "About Iceberg Topics" page to reference the new dedicated performance tuning and troubleshooting pages, and removed the inlined troubleshooting and performance sections for better modularity.

New and Updated Content:

Created a new iceberg-performance-tuning.adoc page with comprehensive guidance on partitioning, compaction, flush threshold and lag tuning, cluster sizing, and monitoring translation performance for Iceberg topics.
- This standalone page now hosts content currently found in the About Iceberg Topics > Performance considerations section.
Created a new standalone iceberg-troubleshooting.adoc page that hosts content currently at About Iceberg Topics > Troubleshoot errors. Added list of relevant metrics.

Resolves https://redpandadata.atlassian.net/browse/
Review deadline:

Page previews

Tune Performance for Iceberg Topics
Troubleshoot Iceberg Topics

Checks

New feature
Content gap
Support Follow-up
Small fix (typos, links, copyedits, etc)

netlify · 2026-03-04T23:33:37Z

✅ Deploy Preview for redpanda-docs-preview ready!

Name	Link
🔨 Latest commit	`d9955d4`
🔍 Latest deploy log	https://app.netlify.com/projects/redpanda-docs-preview/deploys/69d80a4cd0d81d000894943c
😎 Deploy Preview	https://deploy-preview-1599--redpanda-docs-preview.netlify.app

coderabbitai · 2026-03-04T23:33:43Z

Walkthrough

This PR reorganizes Iceberg documentation by extracting troubleshooting content and performance tuning guidance from about-iceberg-topics.adoc into two new dedicated pages: iceberg-troubleshooting.adoc and iceberg-performance-tuning.adoc. The troubleshooting page documents dead-letter queue handling, invalid record processing, and diagnostic workflows. The performance page covers custom partitioning, table compaction, and translation throughput tuning. Cross-references in the configuration property overrides file and four existing documentation pages are updated to point to the new documentation locations.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

docs(iceberg): Add DLQ table retention note to Iceberg data retention section #1597: Modifies DLQ-related documentation in about-iceberg-topics.adoc, which is directly affected by the DLQ content relocation in this PR.
26.1 fixes #1646: Updates to the same Iceberg documentation files with overlapping structural changes.
Iceberg data retention #1171: Adds content to about-iceberg-topics.adoc, which is reorganized and consolidated in this PR.

Suggested reviewers

andrwng
mattschumpert

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Title check	❓ Inconclusive	The title describes documentation restructuring into performance tuning and troubleshooting pages, but 'stress testing' is not reflected in the actual changes provided, which focus on performance tuning and troubleshooting.	Clarify whether stress testing content is included in the changes. If not, revise the title to accurately reflect the actual content (e.g., 'Iceberg performance tuning and troubleshooting documentation').

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description check	✅ Passed	The description is comprehensive and well-structured, covering the documentation restructuring, new content, and page previews. However, the Jira ticket reference is a placeholder and the review deadline is missing.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


coderabbitai

🧹 Nitpick comments (3)

modules/manage/pages/iceberg/specify-iceberg-schema.adoc (1)
63-63: Use empty-bracket xrefs for consistency with repo linking style.

Consider xref:manage:iceberg/iceberg-troubleshooting.adoc#dead-letter-queue[] and xref:manage:iceberg/iceberg-troubleshooting.adoc[] to avoid hard-coded labels.

Based on learnings, prefer AsciiDoc xref links with empty brackets to auto-resolve link titles from target pages.

Also applies to: 187-187

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modules/manage/pages/iceberg/specify-iceberg-schema.adoc` at line 63, Replace
hard-coded AsciiDoc xref labels with empty-bracket xrefs so titles auto-resolve:
change occurrences of
"xref:manage:iceberg/iceberg-troubleshooting.adoc#dead-letter-queue[dead-letter
queue]" and "xref:manage:iceberg/iceberg-troubleshooting.adoc[Iceberg
troubleshooting]" (or any other hard-coded label at the other occurrence) to
"xref:manage:iceberg/iceberg-troubleshooting.adoc#dead-letter-queue[]" and
"xref:manage:iceberg/iceberg-troubleshooting.adoc[]" respectively to follow the
repo linking style.

modules/manage/pages/iceberg/iceberg-topics-gcp-biglake.adoc (1)
249-249: Prefer title-derived xrefs instead of hard-coded link text.

Use xref:manage:iceberg/iceberg-troubleshooting.adoc#dead-letter-queue[] so link text stays aligned automatically if the target title changes.

Based on learnings, prefer AsciiDoc xref links with empty brackets so the referenced document title is rendered automatically.

Also applies to: 296-296

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modules/manage/pages/iceberg/iceberg-topics-gcp-biglake.adoc` at line 249,
Replace the hard-coded link text "dead-letter queue (DLQ) table" with an
AsciiDoc title-derived xref by using
xref:manage:iceberg/iceberg-troubleshooting.adoc#dead-letter-queue[] so the link
text is rendered from the target title; update the occurrence in the sentence
mentioning the iceberg_dlq_table_suffix property and also make the same change
at the other occurrence referenced (line ~296) to ensure both links use the
empty-bracket xref form.

modules/manage/pages/iceberg/iceberg-performance-tuning.adoc (1)
25-25: Consider normalizing xrefs to xref:...[] across the new page.

Using title-derived xrefs improves maintainability and keeps link text in sync with target page titles.

Based on learnings, AsciiDoc links in this repo should prefer xref:...[] so displayed text is pulled from the referenced document.

Also applies to: 33-33, 77-79, 108-108, 137-146

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modules/manage/pages/iceberg/iceberg-performance-tuning.adoc` at line 25,
Replace any xref that hardcodes link text with a title-derived xref so AsciiDoc
pulls the target page title; for example change
xref:manage:iceberg/about-iceberg-topics.adoc[About Iceberg Topics] to
xref:manage:iceberg/about-iceberg-topics.adoc[] and apply the same change to all
other explicit xref[...] instances on the iceberg-performance-tuning.adoc page
so displayed text is sourced from the referenced document.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 10ce3270-f91a-40f6-a356-52466a149ef5

📥 Commits

Reviewing files that changed from the base of the PR and between 2c85927 and d9955d4.

📒 Files selected for processing (6)

docs-data/property-overrides.json
modules/manage/pages/iceberg/about-iceberg-topics.adoc
modules/manage/pages/iceberg/iceberg-performance-tuning.adoc
modules/manage/pages/iceberg/iceberg-topics-gcp-biglake.adoc
modules/manage/pages/iceberg/iceberg-troubleshooting.adoc
modules/manage/pages/iceberg/specify-iceberg-schema.adoc

andrwng · 2026-04-10T00:15:57Z

modules/manage/pages/iceberg/iceberg-performance-tuning.adoc

+
+=== Cluster sizing and backpressure
+
+When you enable Iceberg for any substantial workload and start translating topic data to the Iceberg format, you may see most of your cluster's CPU utilization increase. If this additional workload overwhelms the brokers and causes the Iceberg table lag to exceed the configured target lag, Redpanda automatically applies backpressure to producers to prevent Iceberg tables from lagging further. This ensures that Iceberg tables keep up with the volume of incoming data, but sacrifices ingress throughput of the cluster.


IIRC we removed backpressure on the produce path because we saw it leading to incidents. We do automatically increase the scheduling priority of Iceberg writes if we have a large backlog though.

andrwng · 2026-04-10T00:20:21Z

modules/manage/pages/iceberg/iceberg-performance-tuning.adoc

+Over time, Iceberg translation can produce many small Parquet files, especially with low-throughput topics or short lag targets. Compaction merges small files into larger ones, reducing the number of metadata operations query engines must perform and improving read performance.
+
+* Managed service: Some managed query engines and data platforms, such as Snowflake and Databricks, automatically compact Iceberg tables.
+* Manual or scheduled compaction: Tools like https://spark.apache.org/[Apache Spark^] can run compaction jobs on a schedule. This is useful if your query engine does not compact automatically.


nit: typically it isn't the query engine service that does the automatic compaction. For instance, I know that Glue does automatic compaction. Manual compaction can be a function of query engines (like Spark), though.

andrwng · 2026-04-10T00:31:15Z

modules/manage/pages/iceberg/iceberg-performance-tuning.adoc

+To check the current values of key translation cluster properties:
+
+[,bash]
+----
+rpk cluster config get datalake_translator_flush_bytes
+rpk cluster config get iceberg_target_lag_ms
+rpk cluster config get iceberg_target_backlog_size
+----


Seems a bit odd to be in the section about monitoring?

Also might be worth mentioning how to get the topic properties, if we want to keep this somewhere?

andrwng · 2026-04-10T00:36:03Z

modules/manage/pages/iceberg/iceberg-troubleshooting.adoc

@@ -0,0 +1,118 @@
+= Troubleshoot Iceberg Topics


Just noting that there's also an admin endpoint for validating REST catalog connectivity: redpanda-data/redpanda#29677

ballard26 · 2026-04-10T04:06:24Z

modules/manage/pages/iceberg/iceberg-performance-tuning.adoc

+If query latency is a concern and your workload produces large messages, consider:
+
+* Reducing individual message sizes if your data model allows it.
+* Increasing `datalake_translator_flush_bytes` to produce Parquet files with more records per file.


The key cluster config properties they'll want to change are:

datalake_translator_flush_bytes - 32MiB default

iceberg_target_lag_ms - 1min default

Currently we'll translate records on a given partition for 30s. Afterwards we'll check whether the total translated size exceeds datalake_translator_flush_bytes or the oldest translated record exceeds iceberg_target_lag_ms. If either is true, we upload the Parquet file.

This can mean that Parquet files can be larger than 32MiB if the average message rate is 2 or more per 30s on a given partition, even with the default cluster config values for datalake_translator_flush_bytes and iceberg_target_lag_ms.
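
As a rough illustration of the check described in this comment, here is a minimal Python sketch. It is not Redpanda's implementation: the function names, the fixed bytes-per-window model, and the assumption that the oldest record arrives at the start of the first window are all hypothetical. Only the 30s window and the 32MiB and 1min defaults come from the comment.

```python
MIB = 1024 * 1024

# Values quoted in this review comment.
DEFAULT_FLUSH_BYTES = 32 * MIB      # datalake_translator_flush_bytes
DEFAULT_TARGET_LAG_MS = 60_000      # iceberg_target_lag_ms
TRANSLATION_WINDOW_MS = 30_000      # translation interval per partition


def should_flush(translated_bytes, oldest_record_age_ms,
                 flush_bytes=DEFAULT_FLUSH_BYTES,
                 target_lag_ms=DEFAULT_TARGET_LAG_MS):
    """Hypothetical helper: flush if either threshold is exceeded."""
    return (translated_bytes > flush_bytes
            or oldest_record_age_ms > target_lag_ms)


def simulate_file_size(bytes_per_window,
                       flush_bytes=DEFAULT_FLUSH_BYTES,
                       target_lag_ms=DEFAULT_TARGET_LAG_MS,
                       window_ms=TRANSLATION_WINDOW_MS):
    """Hypothetical model: translate a fixed number of bytes per window
    and run the flush check only at the end of each window."""
    translated = 0
    age_ms = 0
    while True:
        translated += bytes_per_window
        age_ms += window_ms  # assumes the oldest record arrived at time 0
        if should_flush(translated, age_ms, flush_bytes, target_lag_ms):
            return translated


# 20 MiB per window: the check passes only after the second window,
# so the file ends up larger than the 32 MiB threshold.
print(simulate_file_size(20 * MIB) / MIB)  # 40.0
```

Because the check runs only at the end of a window, the threshold acts as a lower bound on the flushed size in this model rather than a cap.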

However, this is a subtlety of our implementation, and end users should increase datalake_translator_flush_bytes and iceberg_target_lag_ms so that they get a decent number of records per Parquet file. A very rough estimate could be:

messages_per_file = min( datalake_translator_flush_bytes / message_size, throughput * iceberg_target_lag_ms )

All of this to say that we might want to include iceberg_target_lag_ms in the docs here lol.
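
The rough estimate above could be turned into a small calculator along these lines. This is a sketch only: the function name and parameters are hypothetical, throughput is assumed to be in messages per second on a single partition (so the lag is converted from milliseconds to seconds), and the defaults are the 32MiB and 1min values quoted in this comment.

```python
def estimate_messages_per_file(message_size_bytes,
                               throughput_msgs_per_sec,
                               flush_bytes=32 * 1024 * 1024,
                               target_lag_ms=60_000):
    """Hypothetical helper for the rough estimate:
    min(flush_bytes / message_size, throughput * target_lag)."""
    if message_size_bytes <= 0:
        raise ValueError("message_size_bytes must be positive")
    # Messages that fit before the size threshold is reached.
    limit_by_size = flush_bytes / message_size_bytes
    # Messages that arrive before the lag threshold is reached.
    limit_by_lag = throughput_msgs_per_sec * (target_lag_ms / 1000)
    return min(limit_by_size, limit_by_lag)


# 1 KiB messages at 100 msg/s: the lag target is the limiting factor.
print(estimate_messages_per_file(1024, 100))         # 6000.0
# 1 MiB messages at 100 msg/s: the flush size is the limiting factor.
print(estimate_messages_per_file(1024 * 1024, 100))  # 32.0
```

Seeing which term is the minimum suggests which property to raise first: iceberg_target_lag_ms in the first case, datalake_translator_flush_bytes in the second.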

kbatuigas requested a review from mattschumpert March 4, 2026 23:42

kbatuigas requested a review from ballard26 March 19, 2026 19:53

kbatuigas marked this pull request as ready for review March 24, 2026 20:40

kbatuigas requested a review from a team as a code owner March 24, 2026 20:40

kbatuigas changed the title ~~[26.1] Iceberg stress testing and performance tuning~~ Iceberg stress testing and performance tuning Mar 26, 2026

Base automatically changed from v-WIP/26.1 to main March 31, 2026 16:18

kbatuigas requested a review from andrwng April 9, 2026 15:59

kbatuigas added 5 commits April 9, 2026 13:19

Draft stress testing and tuning doc

5e7f3ed

Reorganize draft based on SME feedback

76d5823

Update Iceberg xrefs

8473997

Minor xref edit

d4ec3b6

Style improvements

d9955d4

kbatuigas force-pushed the DOC-1848-Document-feature-Iceberg-stress-testing branch from 0041b57 to d9955d4 Compare April 9, 2026 20:21

coderabbitai bot reviewed Apr 9, 2026

andrwng reviewed Apr 10, 2026

ballard26 reviewed Apr 10, 2026

kbatuigas wants to merge 5 commits into main from DOC-1848-Document-feature-Iceberg-stress-testing
