Revise *.instructions.md: separate AI-relevant special knowledge from background information for contributors #354
Conversation
this allows to separate background knowledge (good for contributors, but too repetitive for AI review bots) from AI-relevant decision rules, special cases and repo-specific conventions.
📝 WalkthroughWalkthroughAdded AI-review guidance and HUMAN-only comment blocks: Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: Length of output: 85785
|
| Block | Lines | Words freed |
|---|---|---|
1 — const rationale + locals example |
112–127 | ~130 |
2 — constexpr + static_assert |
143–171 | ~165 |
3 — const member rationale bullets |
184–188 | ~75 |
| 4 — VLA explanation (condensed in-place) | 107 | ~70 |
| 5 — Hot-path intro | 219–221 | ~40 |
| 6 — Fast/slow path code examples | 293–308 | ~130 |
| 7 — Function pointer code + explanation | 318–329 | ~100 |
| 8 — Bit shifts code + sign warning | 400–408 | ~90 |
| Total | ~800 words |
Current AI-facing: 2 791 words → projected: ~1 990 words (29% reduction). Still above 500 but this is the set that carries zero review-quality risk.
esp-idf.instructions.md — Proposed cuts
This is the highest-value file to trim (3 897 AI-facing words).
Block A · Lines 196–273 — "Compiler changes" + "C++ language features GCC 8→14"
Action: Move entirely to HUMAN_ONLY. These are general GCC/C++ changelog tables an LLM already knows. The only WLED-MM-specific constraint worth keeping AI-facing is:
Do **not** use C++20 features without a `#if __cplusplus > 201703L` guard —
WLED-MM must still build on IDF v4.4 (GCC 8.4, C++17). Mark migration TODOs
with `// TODO(idf5):`.This saves approximately ~430 words (the single largest block in the entire file).
Block B · Lines 165–175 — "Key ESP-IDF version thresholds" table
Action: Move to HUMAN_ONLY. This is general ESP-IDF changelog. Retain only the actionable rule from Lines 177–188 (comment explaining the guard + range guard example).
No replacement rule needed — the guard example already states what matters.
Saves ~90 words.
Block C · Lines 277–288, 290–301 — RMT and I2S v4→v5 API tables (the tables only, not WLED impact lines)
Action: Move the table rows to HUMAN_ONLY. Keep Lines 288 ("WLED impact: NeoPixelBus + IR use legacy RMT…") and the equivalent WLED impact note in I2S AI-facing.
Condensed AI-facing rules:
**RMT**: NeoPixelBus LED output and IR receiver use the legacy `rmt_*` API
removed in IDF v5. Do not introduce new legacy RMT calls. See HUMAN_ONLY
section for the full v4→v5 API mapping.
**I2S**: The audioreactive `I2SSource` class uses the legacy
`i2s_driver_install()` API deprecated in IDF v5. When touching audio source
code, wrap any I2S init in `#if ESP_IDF_VERSION` guards. See HUMAN_ONLY
section for the migration pattern.Saves ~110 words.
Block D · Lines 326–344 — ADC / SPI Flash / GPIO tables
Action: Move entirely to HUMAN_ONLY. These are standard ESP-IDF API renames that an LLM already knows. The actionable rule (use the compat shim, add version guards) is already captured by the migration checklist at Lines 368–374.
Saves ~80 words.
Block E · Lines 790–801 — "Quick Reference: IDF v4→v5 API Mapping" table
Action: Move entirely to HUMAN_ONLY. It duplicates information from the per-peripheral sections above.
Saves ~80 words.
Block F · Lines 34–47 — Build-target #elif code example
Action: Move the code snippet (Lines 37–47) to HUMAN_ONLY. Keep Lines 49–54 (the four guideline bullets) AI-facing — they are actionable rules that prevent false positives.
Condensed note above the bullets:
Gate chip-specific code with `#if defined(CONFIG_IDF_TARGET_*)` chains.
Do **not** use `CONFIG_IDF_TARGET_*` for feature detection — use `SOC_*`
capability macros instead (see next section).Saves ~60 words.
Estimated savings for esp-idf.instructions.md
| Block | Lines | Words freed |
|---|---|---|
| A — GCC/C++ changelog tables | 196–273 | ~430 |
| B — IDF version thresholds table | 165–175 | ~90 |
| C — RMT + I2S API tables | 277–301 | ~110 |
| D — ADC + Flash + GPIO tables | 326–344 | ~80 |
| E — Quick reference table | 790–801 | ~80 |
| F — Build-target code example | 37–47 | ~60 |
| Total | ~850 words |
Current AI-facing: 3 897 words → projected: ~3 047 words (22% reduction). The remaining bulk is WLED-MM–specific peripheral patterns (I2S audio, HUB75, mutex macros, delay/yield/IDLE), which are genuine false-positive suppressors and cannot be cut without review-quality loss.
cicd.instructions.md — No cuts recommended
The remaining AI-facing content (Lines 71–160) is entirely security rules: permissions least-privilege, action pinning, secret scoping, script injection. These are exactly the kind of rules that prevent false negatives in PRs. Cutting them would reduce review quality.
Minor option: Lines 109–116 ("when adding a new third-party action" checklist) could move to HUMAN_ONLY to save ~40 words, if the target is strictly 500 words.
copilot-instructions.md — Structural candidates
| Block | Lines | Action | Words freed |
|---|---|---|---|
| Setup section | 8–12 | Move to HUMAN_ONLY |
~30 |
| Build and Test table | 25–36 | Move to HUMAN_ONLY, keep one-line summary |
~90 |
| Repository Structure tree | 39–59 | Move to HUMAN_ONLY |
~100 |
Condensed AI-facing replacement for lines 39–59:
Firmware source: `wled00/` (C++). Web UI source: `wled00/data/`.
Auto-generated headers: `wled00/html_*.h` — **never edit or commit**.
Usermods: `usermods/` (`.h` files, included via `usermods_list.cpp`).
Build targets: `platformio.ini`. CI/CD: `.github/workflows/`.Saves ~220 words total. Projected: ~538 words (still slightly above 500, but the remaining content is all actionable review guidance).
Summary
| File | Current AI words | Proposed cut | Projected AI words |
|---|---|---|---|
cpp.instructions.md |
2 791 | ~800 | ~1 990 |
esp-idf.instructions.md |
3 897 | ~850 | ~3 047 |
cicd.instructions.md |
568 | ~40 | ~528 |
copilot-instructions.md |
758 | ~220 | ~538 |
The two large files remain well above 500 words after this pass. The content that would need to go for the budget to be met in those files is the WLED-MM–specific peripheral sections (I2S audio, HUB75, mutex/semaphore tables, delay/yield/IDLE) — those are the highest-value false-positive suppressors in the repo and removing them would cause real review-quality loss. A pragmatic target for a second pass would be 1 500 words for cpp and 2 000 words for esp-idf, achievable by condensing the longer peripheral-section prose into bullet-only rules while moving worked examples to HUMAN_ONLY.
🧠 Learnings used
Learnt from: CR
Repo: MoonModules/WLED-MM PR: 0
File: .github/copilot-instructions.md:0-0
Timestamp: 2026-04-03T23:41:27.774Z
Learning: Applies to wled00/**/*.{cpp,h,hpp} : Document attribution of inspiration, knowledge, and sources used in code. Use `// AI:` comments for larger blocks of AI-generated code and ensure AI-generated code is well documented
Learnt from: softhack007
Repo: MoonModules/WLED-MM PR: 0
File: :0-0
Timestamp: 2026-04-04T14:32:02.598Z
Learning: In MoonModules/WLED-MM, clang-format is intentionally NOT enforced via CI or .clang-format config because it would cause merge conflicts when cherry-picking features from upstream WLED (which uses different formatting). Instead, "soft" formatting rules are documented in .github/cpp.instructions.md. Do not recommend adding clang-format to CI as a follow-up to missing automated enforcement — the decision is deliberate.
Learnt from: softhack007
Repo: MoonModules/WLED-MM PR: 0
File: :0-0
Timestamp: 2026-04-04T00:29:49.113Z
Learning: In MoonModules/WLED-MM, function-level doc comment blocks (e.g. /* ***** */ style headers) are NOT required during code reviews, especially not for previously existing functions. Do not flag missing function-level comments as review issues in this repository.
|
@coderabbitai thanks, i'll incorporate your suggestions. I understand that reducing tokenizer output size for review rules also has a positive impact on the remaining context window that stays available as the "AI working memory", allowing a deeper understanding of the source code under review, right? |
|
Tip For best results, initiate chat on the files or code changes.
1. Shared context window budget 2. Transformer attention signal-to-noise 3. The "already internalized" effect 4. Where the bottleneck shifts So the 500-word-per-file-type target is a principled bound: it leaves enough headroom that a moderately large diff (say, 10 changed files, each with 200–300 lines of surrounding context) can fit comfortably alongside the instructions. 🧠 Learnings used |
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
|
@coderabbitai we cannot really achieve the "500 words" target everywhere, let's treat it as a "best case" soft objective. Current AI models have a context window of 100.000 to 400.000 embeddings, so reduction makes sense, but 500 appears to be just a "very safe guess". |
|
Tip For best results, initiate chat on the files or code changes.
A small precision: context windows are measured in tokens, not embeddings (embeddings are the internal vector representations of tokens). At a rough 1.3–1.5 tokens per word, your current AI-facing word counts translate to approximately:
Even summed (~9,500 tokens total), that's under 10% of a 100K-token window and under 3% of a 400K window. So the 500-word target was indeed a very conservative "safe guess" — the actual headroom is much larger. The better framing for the target going forward is:
The progress made in this PR (moving ~1,442 words to ✏️ Learnings added
🧠 Learnings used |
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
|
@coderabbitai let's do a final readability check, both for the "AI facing" and for "Human full text" views.
|
This comment was marked as resolved.
This comment was marked as resolved.
does not display correctly due to verbatim quotes.
| - When a larger block of code is generated by an AI tool, mark it with an `// AI: below section was generated by an AI` comment (see C++ guidelines). | ||
| - Make sure AI-generated code is well documented. | ||
| - Every non-trivial AI generated function should have a brief comment describing what it does. Explain parameters when their names alone are not self-explanatory. | ||
| - AI-generated code must be well documented; comment-to-code ratio > 15% is expected. Do not rephrase source code, but explain the concepts/logic behind the code. |
There was a problem hiding this comment.
Actually these target values would be good regardless of AI or humans writing code.
However as contributors often don't feel comfortable with English, let's first make the rule mandatory for AI only - they are language experts ;-).
style Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Revising AI review instructions:
Objectives:
This is an evolution of #353, based on the discussion in #353 (comment):
Large language models already have extensive general knowledge about C++, ESP-IDF, GitHub Actions, etc. The genuine value of instruction files is narrow but real:
delay()works in FreeRTOSWLEDMM_FASTPATH,d_malloc)The key insight: instruction files are most valuable when they prevent false positives (AI flagging intentional patterns as bugs) and encode decisions that can't be inferred from the code alone.
Guidelines for Efficient Instruction File Setup
Only document what can't be inferred from the code or from general training.
If the AI would get it right without instruction 95% of the time, the instruction probably isn't worth the token cost.
Prefer short, actionable rules over explanations.
"Do not flag
uint8_t x = int(f)as a bug when the comment says// float→int wrapping" is more useful than a paragraph explaining why. The paragraph is for humans; the rule is for AI.Keep the total injected instruction size per file type under ~500 words.
Beyond that, the marginal value of additional instructions drops rapidly and the distraction cost rises.
Encode "false positive suppressors" first, feature guidance second.
"Don't flag pattern X as a bug" is more impactful than "always do Y" because it directly prevents noise.
Treat instruction files like tests — they need maintenance when the code changes.
If you wouldn't update the instruction file when refactoring the relevant code, don't write the instruction.
Bottom line: The sweet spot is a short general file covering project identity + a handful of false-positive suppressors for your project's intentional deviations, with automated tooling doing the heavy lifting for everything else. More is not better.
Summary by CodeRabbit
Documentation
Chores