Iceberg stress testing and performance tuning #1599
✅ Deploy Preview for redpanda-docs-preview ready!
📝 Walkthrough
This PR reorganizes Iceberg documentation by extracting troubleshooting content and performance tuning guidance from …
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (inconclusive)
Branch updated from 0041b57 to d9955d4.
🧹 Nitpick comments (3)
modules/manage/pages/iceberg/specify-iceberg-schema.adoc (1)
63-63: Use empty-bracket xrefs for consistency with the repo linking style. Consider `xref:manage:iceberg/iceberg-troubleshooting.adoc#dead-letter-queue[]` and `xref:manage:iceberg/iceberg-troubleshooting.adoc[]` to avoid hard-coded labels.
Based on learnings, prefer AsciiDoc xref links with empty brackets to auto-resolve link titles from target pages.
Also applies to: 187-187
modules/manage/pages/iceberg/iceberg-topics-gcp-biglake.adoc (1)
249-249: Prefer title-derived xrefs instead of hard-coded link text. Use `xref:manage:iceberg/iceberg-troubleshooting.adoc#dead-letter-queue[]` so link text stays aligned automatically if the target title changes.
Based on learnings, prefer AsciiDoc xref links with empty brackets so the referenced document title is rendered automatically.
Also applies to: 296-296
modules/manage/pages/iceberg/iceberg-performance-tuning.adoc (1)
25-25: Consider normalizing xrefs to `xref:...[]` across the new page. Using title-derived xrefs improves maintainability and keeps link text in sync with target page titles.
Based on learnings, AsciiDoc links in this repo should prefer `xref:...[]` so displayed text is pulled from the referenced document.
Also applies to: 33-33, 77-79, 108-108, 137-146
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@modules/manage/pages/iceberg/iceberg-performance-tuning.adoc`:
- Line 25: Replace any xref that hardcodes link text with a title-derived xref
so AsciiDoc pulls the target page title; for example change
xref:manage:iceberg/about-iceberg-topics.adoc[About Iceberg Topics] to
xref:manage:iceberg/about-iceberg-topics.adoc[] and apply the same change to all
other explicit xref[...] instances on the iceberg-performance-tuning.adoc page
so displayed text is sourced from the referenced document.
In `@modules/manage/pages/iceberg/iceberg-topics-gcp-biglake.adoc`:
- Line 249: Replace the hard-coded link text "dead-letter queue (DLQ) table"
with an AsciiDoc title-derived xref by using
xref:manage:iceberg/iceberg-troubleshooting.adoc#dead-letter-queue[] so the link
text is rendered from the target title; update the occurrence in the sentence
mentioning the iceberg_dlq_table_suffix property and also make the same change
at the other occurrence referenced (line ~296) to ensure both links use the
empty-bracket xref form.
In `@modules/manage/pages/iceberg/specify-iceberg-schema.adoc`:
- Line 63: Replace hard-coded AsciiDoc xref labels with empty-bracket xrefs so
titles auto-resolve: change occurrences of
"xref:manage:iceberg/iceberg-troubleshooting.adoc#dead-letter-queue[dead-letter
queue]" and "xref:manage:iceberg/iceberg-troubleshooting.adoc[Iceberg
troubleshooting]" (or any other hard-coded label at the other occurrence) to
"xref:manage:iceberg/iceberg-troubleshooting.adoc#dead-letter-queue[]" and
"xref:manage:iceberg/iceberg-troubleshooting.adoc[]" respectively to follow the
repo linking style.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 10ce3270-f91a-40f6-a356-52466a149ef5
📒 Files selected for processing (6)
- docs-data/property-overrides.json
- modules/manage/pages/iceberg/about-iceberg-topics.adoc
- modules/manage/pages/iceberg/iceberg-performance-tuning.adoc
- modules/manage/pages/iceberg/iceberg-topics-gcp-biglake.adoc
- modules/manage/pages/iceberg/iceberg-troubleshooting.adoc
- modules/manage/pages/iceberg/specify-iceberg-schema.adoc
=== Cluster sizing and backpressure

When you enable Iceberg for any substantial workload and start translating topic data to the Iceberg format, you may see most of your cluster's CPU utilization increase. If this additional workload overwhelms the brokers and causes the Iceberg table lag to exceed the configured target lag, Redpanda automatically applies backpressure to producers to prevent Iceberg tables from lagging further. This ensures that Iceberg tables keep up with the volume of incoming data, but sacrifices ingress throughput of the cluster.
IIRC we removed backpressure on the produce path because we saw it leading to incidents. We do automatically increase the scheduling priority of Iceberg writes if we have a large backlog though.
Over time, Iceberg translation can produce many small Parquet files, especially with low-throughput topics or short lag targets. Compaction merges small files into larger ones, reducing the number of metadata operations query engines must perform and improving read performance.

* Managed service: Some managed query engines and data platforms, such as Snowflake and Databricks, automatically compact Iceberg tables.
* Manual or scheduled compaction: Tools like https://spark.apache.org/[Apache Spark^] can run compaction jobs on a schedule. This is useful if your query engine does not compact automatically.
nit: typically it isn't the query engine service that does the automatic compaction. For instance, I know that Glue does automatic compaction. Manual compaction, though, can be a function of query engines (like Spark).
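To illustrate the manual or scheduled compaction path with Spark, here is a minimal sketch. It assumes a Spark session with the Iceberg runtime and SQL extensions enabled, and a Spark catalog configured against the same Iceberg catalog Redpanda writes to. The helper builds a call to Iceberg's `rewrite_data_files` stored procedure. The function name `rewrite_data_files_sql`, the catalog name `lake`, and the table `my_namespace.my_topic` are hypothetical. The 512 MiB target file size is only an illustrative value, not a recommendation.

```python
from typing import Optional


def rewrite_data_files_sql(
    catalog: str, table: str, target_file_size_bytes: Optional[int] = None
) -> str:
    """Build a Spark SQL CALL to Iceberg's rewrite_data_files procedure.

    catalog: Spark catalog name pointing at the Iceberg catalog (placeholder).
    table: "<namespace>.<table>" identifier of the Iceberg table (placeholder).
    target_file_size_bytes: optional target size for compacted output files.
    """
    args = [f"table => '{table}'"]
    if target_file_size_bytes is not None:
        # Procedure options are passed as a map of string keys to string values.
        args.append(
            f"options => map('target-file-size-bytes', '{target_file_size_bytes}')"
        )
    return f"CALL {catalog}.system.rewrite_data_files({', '.join(args)})"


if __name__ == "__main__":
    # Hypothetical catalog and table names; 512 MiB is an illustrative target size.
    sql = rewrite_data_files_sql("lake", "my_namespace.my_topic", 512 * 1024 * 1024)
    print(sql)
    # Inside a SparkSession with the Iceberg runtime and SQL extensions enabled:
    # spark.sql(sql)
```

You could run a statement like this from a cron job or a workflow orchestrator. As the reviewer notes, services such as Glue can compact automatically instead.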
To check the current values of key translation cluster properties:

[,bash]
----
rpk cluster config get datalake_translator_flush_bytes
rpk cluster config get iceberg_target_lag_ms
rpk cluster config get iceberg_target_backlog_size
----
Seems a bit odd to be in the section about monitoring?
Also might be worth mentioning how to get the topic properties, if we want to keep this somewhere?
@@ -0,0 +1,118 @@
= Troubleshoot Iceberg Topics
Just noting that there's also an admin endpoint for validating REST catalog connectivity: redpanda-data/redpanda#29677
If query latency is a concern and your workload produces large messages, consider:

* Reducing individual message sizes if your data model allows it.
* Increasing `datalake_translator_flush_bytes` to produce Parquet files with more records per file.
The key cluster config properties they'll want to change are:
- datalake_translator_flush_bytes: 32 MiB default
- iceberg_target_lag_ms: 1 min default
Currently we'll translate records on a given partition for 30s. Afterwards, we'll check whether the total translated size exceeds datalake_translator_flush_bytes or the age of the oldest translated record exceeds iceberg_target_lag_ms. If either is true, we upload the Parquet file.
This can mean that Parquet files are larger than 32 MiB if the average message rate is 2 or more per 30s on a given partition, even with the default cluster config values for datalake_translator_flush_bytes and iceberg_target_lag_ms.
However, this is a subtlety of our implementation, and end users should increase datalake_translator_flush_bytes and iceberg_target_lag_ms so that they get a decent number of records per Parquet file. A very rough estimate could be:
messages_per_file = min(
datalake_translator_flush_bytes / message_size,
throughput * iceberg_target_lag_ms
)
All of this to say that we might want to include iceberg_target_lag_ms in the docs here lol.
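The following is a minimal Python model of the upload check and the rough estimate described in this thread. It is a simplified sketch, not Redpanda's implementation. `PartitionState`, `should_upload`, and `messages_per_file` are hypothetical names. The model assumes that reaching a threshold (`>=`) is enough to trigger an upload and that throughput is measured in messages per second on a single partition. It converts `iceberg_target_lag_ms` to seconds so the units in the `min(...)` estimate match. The model also ignores the 30-second check granularity, which is one reason real files can differ from the estimate.

```python
from dataclasses import dataclass

# Defaults quoted in the review comment: 32 MiB and 1 minute.
DEFAULT_FLUSH_BYTES = 32 * 1024 * 1024
DEFAULT_TARGET_LAG_MS = 60 * 1000


@dataclass
class PartitionState:
    """Hypothetical snapshot of one partition's in-progress Parquet file."""

    translated_bytes: int      # bytes translated since the last upload
    oldest_record_age_ms: int  # age of the oldest record not yet uploaded


def should_upload(
    state: PartitionState,
    flush_bytes: int = DEFAULT_FLUSH_BYTES,
    target_lag_ms: int = DEFAULT_TARGET_LAG_MS,
) -> bool:
    """Check made after each translation window: upload if either the
    size threshold or the lag threshold has been reached."""
    return (
        state.translated_bytes >= flush_bytes
        or state.oldest_record_age_ms >= target_lag_ms
    )


def messages_per_file(
    message_size_bytes: float,
    throughput_msgs_per_s: float,
    flush_bytes: int = DEFAULT_FLUSH_BYTES,
    target_lag_ms: int = DEFAULT_TARGET_LAG_MS,
) -> float:
    """Rough estimate of records per Parquet file for one partition."""
    size_bound = flush_bytes / message_size_bytes
    # Convert the lag target from milliseconds to seconds to match msgs/s.
    lag_bound = throughput_msgs_per_s * (target_lag_ms / 1000)
    return min(size_bound, lag_bound)


if __name__ == "__main__":
    # 1 KiB messages at 100 msgs/s with default settings: the lag target is
    # reached first, so files hold about 6000 records.
    print(messages_per_file(1024, 100))
    # With a 10-minute lag target, the 32 MiB flush size becomes the limit.
    print(messages_per_file(1024, 100, target_lag_ms=10 * 60 * 1000))
```

In this example, raising only `iceberg_target_lag_ms` increases records per file until the size bound takes over. After that point, `datalake_translator_flush_bytes` must also increase.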
Description
This pull request introduces significant improvements to the Iceberg documentation by splitting out performance tuning and troubleshooting content into dedicated pages, enhancing navigation, and updating cross-references for clarity. The changes make it easier for users to find best practices for optimizing and troubleshooting Iceberg topics, and ensure that documentation links are accurate and up-to-date.
Documentation Restructuring and Navigation:
nav.adoc, improving discoverability of these topics.
New and Updated Content:
- Created a new iceberg-performance-tuning.adoc page with comprehensive guidance on partitioning, compaction, flush threshold and lag tuning, cluster sizing, and monitoring translation performance for Iceberg topics.
- Created a new standalone iceberg-troubleshooting.adoc page that hosts content currently at About Iceberg Topics > Troubleshoot errors, and added a list of relevant metrics.

Resolves https://redpandadata.atlassian.net/browse/
Review deadline:
Page previews
Tune Performance for Iceberg Topics
Troubleshoot Iceberg Topics
Checks