GNU3 demangler: rewrite, bug fixes, and type name corrections #8038
Draft
Replace the TypeBuilder-based demangling path with a lightweight
DemangledTypeNode representation that defers type object construction
until the symbol is fully parsed. This avoids repeated heap allocation
and ref-count churn during recursive descent.
Key changes:
- Add DemangledTypeNode / demangled_type_node.{h,cpp}: a compact IR
that mirrors the type grammar without allocating BN Type objects
- Use a thread_local demangler instance to amortize vector allocations
across calls
Result: ~3x throughput improvement on a 180K-symbol corpus with
97.7% success rate (matching the previous implementation).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Demangle GCC transaction clone symbols (_ZGTt/_ZGTn) into 'name [transaction clone](params)' format. The encoding is GTt<encoding> for transaction-safe clones and GTn<encoding> for non-transaction-safe clones; the t/n qualifier is consumed before the inner symbol is demangled.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- TH <encoding>: demangle as "tls_init_function_for_<name>"
- TW <encoding>: demangle as "tls_wrapper_function_for_<name>"
- Tc <call-offset> <call-offset> <encoding>: demangle as "covariant_return_thunk_to_<name>(<params>)"
- Remove stale TODO comment from Tv (virtual thunk) case
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Handles ____ZN..._block_invoke.N style symbols produced by Clang's block invocation ABI. Strips the _block_invoke[.N|_N] suffix, normalizes excess leading underscores back to a _Z prefix, recursively demangles the base symbol, and wraps the result as "invocation function for block in <base>", preserving the base function's return type and parameter list. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
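A minimal sketch of the suffix-stripping and underscore-normalization steps described above. StripBlockInvoke is a hypothetical helper, not the PR's actual function, and it returns only the base mangled name. Recursively demangling that name and wrapping the result are left to the caller.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string>

// Hypothetical helper: turn "____ZN3Foo3barEv_block_invoke.2" into the base
// mangled name "_ZN3Foo3barEv". Returns std::nullopt if the symbol does not
// end in _block_invoke, _block_invoke.N or _block_invoke_N.
std::optional<std::string> StripBlockInvoke(const std::string& symbol)
{
    static const std::string kSuffix = "_block_invoke";
    size_t pos = symbol.rfind(kSuffix);
    if (pos == std::string::npos)
        return std::nullopt;

    // Anything after the suffix must be '.' or '_' followed by digits.
    size_t tail = pos + kSuffix.size();
    if (tail != symbol.size())
    {
        char sep = symbol[tail];
        if ((sep != '.' && sep != '_') || tail + 1 == symbol.size())
            return std::nullopt;
        for (size_t i = tail + 1; i < symbol.size(); i++)
            if (!std::isdigit(static_cast<unsigned char>(symbol[i])))
                return std::nullopt;
    }

    // Collapse any run of leading underscores before 'Z' back to "_Z".
    std::string base = symbol.substr(0, pos);
    size_t first = base.find_first_not_of('_');
    if (first == std::string::npos || first == 0 || base[first] != 'Z')
        return std::nullopt;
    return "_" + base.substr(first);
}
```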
Change "invocation function for block in " to "invocation_function_for_block_in_" to match the underscore convention used elsewhere (guard_variable_for_, tls_init_function_for_, non-virtual_thunk_to_, etc.) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Generic lambdas encode 'auto' params as T_, T0_, T1_... which reference the lambda's own operator() template params, not any outer template scope. Previously GetTemplateType() would throw because m_templateSubstitute held the outer context and had no entries for these refs. Fix by saving and replacing m_templateSubstitute with 'auto' placeholder nodes before parsing lambda params, then restoring it after. Also fix MyLogDebug signature to match updated PerformLog API (used when GNUDEMANGLE_DEBUG is enabled). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three fixes:
1. DemangleLocalName: give the local function its own template scope. Previously m_topLevel=false prevented the local function's template args from being pushed to m_templateSubstitute, so T0_/T1_ references in the function's parameter types would throw. Fix by saving/clearing m_templateSubstitute and setting m_topLevel=true around DemangleSymbol, then restoring both afterward.
2. Pointer-to-member type (case 'M'): set substitute=true so the type is pushed to m_substitute. Member pointer types are substitutable per the Itanium ABI; not tracking them shifted all subsequent substitution indices by 1, breaking e.g. SO_ references that follow.
3. Pack expansion (Dp): append '...' to the expanded type string so that e.g. DpOT_ renders as 'auto&&...' instead of 'auto&&'.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add case 'I' to DemangleTemplateArgs to handle I...E encoded argument packs (GCC sometimes uses I...E instead of J...E for variadic template argument packs, e.g. emplace_back<IIS1_EE>)
- Fix Dp (pack expansion) to omit trailing '...' for concrete types; only abstract types like 'auto&&...' (generic lambda params) keep '...'
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Handles _ZGV<isa><mask><vlen><params>_<name> mangled names such as _ZGVeN16v___logf_finite. Supports all ISA codes (b/c/d/e/x/y/Y/z/Z), mask/nomask, vector length, and all parameter kinds: linear (l/R/U/L) with stride and alignment, uniform (u), and vector (v). Disambiguates from existing guard variable handling (_ZGV<symbol>) by checking the ISA code character. Example: _ZGVeN16v___logf_finite -> __logf_finite [SIMD:AVX512,nomask,N=16,(vector)] Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
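A rough, self-contained sketch of parsing this form, written against std::string rather than the demangler's reader. DemangleVectorAbi is a hypothetical name. Only 'e' is spelled out as AVX512, taken from the example above, and other ISA codes are printed raw. The "(uniform)"/"(linear)" labels and the comma join for several parameters are assumptions.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string>

// Hypothetical sketch of "_ZGV<isa><mask><vlen><params>_<name>".
// Assumptions: only 'e' -> "AVX512" is named; stride/alignment suffixes are
// consumed but not printed; multiple parameters are joined with ','.
std::optional<std::string> DemangleVectorAbi(const std::string& s)
{
    const std::string prefix = "_ZGV";
    if (s.compare(0, prefix.size(), prefix) != 0)
        return std::nullopt;
    size_t i = prefix.size();

    // ISA code: this is what separates the form from guard variables (_ZGV<symbol>).
    if (i >= s.size() || std::string("bcdexyYzZ").find(s[i]) == std::string::npos)
        return std::nullopt;
    std::string isa = (s[i] == 'e') ? std::string("AVX512") : std::string(1, s[i]);
    i++;

    // Mask: 'N' = nomask, 'M' = masked.
    if (i >= s.size() || (s[i] != 'N' && s[i] != 'M'))
        return std::nullopt;
    std::string mask = (s[i] == 'N') ? "nomask" : "mask";
    i++;

    // Vector length: one or more decimal digits.
    size_t vlenStart = i;
    while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i])))
        i++;
    if (i == vlenStart)
        return std::nullopt;
    std::string vlen = s.substr(vlenStart, i - vlenStart);

    // Parameter kinds up to the '_' separator.
    std::string params;
    while (i < s.size() && s[i] != '_')
    {
        char kind = s[i++];
        std::string label;
        if (kind == 'v')
            label = "vector";
        else if (kind == 'u')
            label = "uniform";
        else if (kind == 'l' || kind == 'R' || kind == 'U' || kind == 'L')
        {
            label = "linear";
            // Optional stride: 'n' (negative) or 's' (stride in another param), then digits.
            if (i < s.size() && (s[i] == 'n' || s[i] == 's'))
                i++;
            while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i])))
                i++;
        }
        else
            return std::nullopt;
        // Optional alignment: 'a' followed by digits.
        if (i < s.size() && s[i] == 'a')
        {
            i++;
            while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i])))
                i++;
        }
        params += (params.empty() ? "" : ",") + ("(" + label + ")");
    }
    if (i >= s.size())
        return std::nullopt;  // no '_' separator before the name
    std::string name = s.substr(i + 1);
    if (name.empty())
        return std::nullopt;

    std::string result = name + " [SIMD:" + isa + "," + mask + ",N=" + vlen;
    if (!params.empty())
        result += "," + params;
    return result + "]";
}
```

The guard variable _ZGVZN3foo3barEvE1x also begins with a valid ISA character ('Z'). This sketch rejects it only because 'f' is not a parameter kind, which is why the z/Z case needs the more careful disambiguation described in a later commit.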
- Add U <source-name> [<template-args>] <type> handling, used for Objective-C block pointer types (U13block_pointer <function-type>) and other vendor extensions; produces e.g. "void (T*, T2*, T3*) block_pointer"
- Fix TI (typeinfo) case to use UnknownNamedTypeClass instead of StructNamedTypeClass, which was causing the type name to appear duplicated in the output (e.g. "struct typeinfo_for_X _typeinfo_for_X")
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two related fixes in the sr (scope resolution) expression handler:
1. Break loop after qualifier with template-args rather than waiting for a standalone E. GCC sometimes reuses the template-args E to also terminate the qualifier list, omitting the explicit qualifier-list E.
2. Push each sr qualifier level to the substitution table (both the bare name and its template instantiation if present). The compiler tracks these as substitution candidates when encoding, so downstream SX_ references inside the same expression expect them to be present.
Together these allow symbols like:
_Z27qRegisterNormalizedMetaTypeI16QQmlListPropertyI20QQuickStateOperationEEiRK10QByteArrayPT_N9QtPrivate21MetaTypeDefinedHelperIS6_Xaasr12QMetaTypeId2IS6_E7DefinedntsrSB_9IsBuiltInEE11DefinedTypeE
to demangle successfully where c++filt and demumble both fail.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
DemangleSubstitution only handled single-character seq-ids (S0_ through SZ_, covering positions 1-36). Multi-character seq-ids like S10_, S11_ (positions 37, 38, ...) would throw because the code read one char then expected '_' but got more digits. Fix: accumulate all [0-9A-Z] digits before '_' using base-36 arithmetic, then add 1 to get the substitution table index. This allows symbols with many unique type substitutions (e.g. large template visitor tables with 17+ type variants) to demangle correctly. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
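The base-36 arithmetic can be sketched as follows. ParseSeqId is a hypothetical stand-alone function that covers only the S <seq-id> _ form, not the standard abbreviations such as St or Sa. It expects the cursor to sit just after the 'S'.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical sketch: decode the seq-id of "S <seq-id> _" into a zero-based
// substitution-table index: "S_" -> 0, "S0_" -> 1, "SZ_" -> 36, "S10_" -> 37.
// On success `pos` is advanced past '_'; on failure it is left unchanged.
std::optional<size_t> ParseSeqId(const std::string& text, size_t& pos)
{
    size_t start = pos;
    if (pos < text.size() && text[pos] == '_')
    {
        pos++;
        return 0;  // "S_" is the first entry
    }
    size_t value = 0;
    while (pos < text.size())
    {
        char c = text[pos];
        if (c >= '0' && c <= '9')
            value = value * 36 + static_cast<size_t>(c - '0');
        else if (c >= 'A' && c <= 'Z')
            value = value * 36 + static_cast<size_t>(c - 'A' + 10);
        else
            break;
        pos++;
    }
    if (pos == start || pos >= text.size() || text[pos] != '_')
    {
        pos = start;
        return std::nullopt;
    }
    pos++;
    return value + 1;  // seq-id 0 names the second entry
}
```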
…ames
In DemangleNestedName(), the previous code broke out of the parsing loop when it saw 'B' (an ABI tag prefix), leaving the 'E' nested-name terminator unconsumed and causing the remainder of the qualified name (e.g. constructor C1) to be lost.
The fix: instead of breaking, consume all consecutive 'B <source-name>' sequences inline, appending "[abi:tag]" to the last name segment, then continue parsing the rest of the nested name normally. The m_lastName is saved and restored so that a following C1/D1 ctor/dtor still resolves to the class name rather than the ABI tag string.
In DemangleFunction(), the old ad-hoc 'B' handler (which only recognized "cxx11" and set a wrong return type) is replaced with the same ABI-tag loop, allowing correct return-type detection for constructors/destructors with ABI-tagged names.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
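A minimal sketch of the consume-all-tags loop. It uses a hypothetical Cursor type in place of the demangler's reader, and AppendAbiTags is also a hypothetical name. It shows that the loop leaves the cursor on whatever follows the tags (the nested-name 'E', or a C1/D1) instead of breaking out early.

```cpp
#include <cassert>
#include <cctype>
#include <stdexcept>
#include <string>

// Hypothetical cursor over a mangled string.
struct Cursor
{
    explicit Cursor(const std::string& t) : text(t) {}
    const std::string& text;
    size_t pos = 0;
    bool Peek(char c) const { return pos < text.size() && text[pos] == c; }
};

// Read an Itanium <source-name>: "<decimal length><identifier>".
std::string ReadSourceName(Cursor& in)
{
    size_t len = 0;
    size_t start = in.pos;
    while (in.pos < in.text.size() && std::isdigit(static_cast<unsigned char>(in.text[in.pos])))
        len = len * 10 + static_cast<size_t>(in.text[in.pos++] - '0');
    if (in.pos == start || in.pos + len > in.text.size())
        throw std::runtime_error("malformed source-name");
    std::string name = in.text.substr(in.pos, len);
    in.pos += len;
    return name;
}

// Consume every consecutive "B <source-name>" and append "[abi:tag]" to
// `segment`, leaving the cursor on the next token rather than stopping at 'B'.
void AppendAbiTags(Cursor& in, std::string& segment)
{
    while (in.Peek('B'))
    {
        in.pos++;
        segment += "[abi:" + ReadSourceName(in) + "]";
    }
}
```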
When a template non-type parameter uses a pointer to a function as its value, the Itanium ABI encodes it as L <mangled-name> E inside the template argument. DemanglePrimaryExpression() handled this case via a _Z prefix check, but called DemangleSymbol with m_topLevel=false and without clearing the outer template substitution table. This caused T_ / T0_ / ... references inside the embedded symbol's parameter list to resolve against the outer symbol's template params, not the inner symbol's, leading to a DemangleException and a complete demangling failure for these symbols. Fix: save/clear m_templateSubstitute and set m_topLevel=true before recursing into DemangleSymbol for the embedded name, then restore both afterwards. This mirrors the same pattern used by DemangleLocalName. Also use GetTypeAndName() instead of manually concatenating GetStringBeforeName() + name + GetStringAfterName() to ensure proper spacing between return type and function name. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
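The save/clear/restore pattern used here and in DemangleLocalName could also be written as an RAII guard. With a guard, the outer scope is restored even if the inner DemangleSymbol throws. This is an alternative sketch, not the PR's code. DemanglerState and ScopedInnerSymbol are hypothetical, and the template table is simplified to strings.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in holding only the two members involved.
struct DemanglerState
{
    std::vector<std::string> m_templateSubstitute;
    bool m_topLevel = true;
};

// RAII variant of the manual pattern: give the embedded symbol an empty
// template scope and top-level status, and restore the outer values when
// the guard goes out of scope (normal exit or exception).
class ScopedInnerSymbol
{
public:
    explicit ScopedInnerSymbol(DemanglerState& state)
        : m_state(state),
          m_savedTemplates(std::move(state.m_templateSubstitute)),
          m_savedTopLevel(state.m_topLevel)
    {
        m_state.m_templateSubstitute.clear();  // moved-from: make it definitely empty
        m_state.m_topLevel = true;
    }

    ~ScopedInnerSymbol()
    {
        m_state.m_templateSubstitute = std::move(m_savedTemplates);
        m_state.m_topLevel = m_savedTopLevel;
    }

    ScopedInnerSymbol(const ScopedInnerSymbol&) = delete;
    ScopedInnerSymbol& operator=(const ScopedInnerSymbol&) = delete;

private:
    DemanglerState& m_state;
    std::vector<std::string> m_savedTemplates;
    bool m_savedTopLevel;
};
```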
- Increase MAX_DEMANGLE_LENGTH to 262144 (256KB) to handle deeply templated symbols from real-world binaries
- Fix substitution table off-by-one for template functions whose template args are all types (no non-type L/X args): DemangleName's 'N' case now pushes the complete specialized qualified name to the substitution table, matching the compiler's encoding. Introduced DemangleTemplateArgs(hadNonTypeArg) and DemangleNestedName(allTypeTemplateArgs) out-parameters to detect when all template args are types and the push is appropriate.
- Fix DemangleName's 'Z' (local-name) context: added m_inLocalName flag set around the inner DemangleSymbol call in DemangleLocalName, so the push only fires for the outer function name, not inner scope functions.
- Fix DemangleUnresolvedType decltype handling: the DT/Dt branch was peeking for the prefix but not consuming it before calling DemangleExpression, causing the expression parser to see 'D','T' with no matching case. Now correctly consumes the two-char prefix and the mandatory trailing 'E', enabling sr(decltype(...))::member patterns.
- Add forward template ref support for cv conversion operator types: ResolveForwardTemplateRefs patches placeholder types in the result once the enclosing template-args are known.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…p_ in decltype
- DemanglePrimaryExpression: handle LZ<encoding>E (function address without leading
underscore, used by GCC/Clang for pointer-to-function non-type template args).
e.g. __ZNK1aI1bIP1cELZN1a1e1fEvELZNS5_1fEEE1iEv -> a<b<c*>, a::e::f(), a::e::f>::i() const
- fp_/fp<N>_ in decltype return types: return placeholder name ("fp", "fp0", etc.)
instead of throwing when m_functionSubstitute is empty or too small (params not
yet parsed at the time the return type DT expression is evaluated).
- cl expression: format as callable(args) instead of (callable, args).
- dt/pt expressions: remove unnecessary parens around operands; use obj.member and
obj->member format directly for clean output.
e.g. _ZN1a1bIN1c1dINS0_1eEEEE1fIKS3_EEDTcldtfp_1gEERT_i ->
fp.g() a::b<c::d<a::b::e>>::f<a::b::e const>(a::b::e const&, int32_t)
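One way to express the fp_/fp<N>_ fallback is as a helper over a vector of already-rendered parameters. FunctionParamName is hypothetical, and the vector stands in for m_functionSubstitute, whose element type is not shown here. Per the Itanium ABI, fp_ is the first parameter and fp<N>_ is parameter N+2, so it lands at table index N+1.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: `number` is empty for "fp_" and N for "fp<N>_".
std::string FunctionParamName(const std::vector<std::string>& params, std::optional<size_t> number)
{
    // fp_ -> table index 0; fp<N>_ -> table index N + 1.
    size_t index = number ? *number + 1 : 0;
    if (index < params.size())
        return params[index];
    // Parameters not parsed yet (e.g. inside a decltype return type):
    // emit a placeholder instead of throwing.
    return number ? "fp" + std::to_string(*number) : std::string("fp");
}
```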
…guard disambiguation
- sr unresolved-qualifier-level: remove incorrect PushType calls (expressions
don't generate substitution candidates per the Itanium ABI); fix the trailing
E to be conditional rather than unconditional to handle GCC's occasional omission
when the last qualifier had template-args whose closing E served double duty
- Add CI1/CI2 inheriting constructor support in DemangleUnqualifiedName; save
and restore m_lastName around DemangleType() to prevent the inherited class
type from clobbering the enclosing class name
- Handle 'M' data-member-prefix in DemangleNestedName (closure/lambda inside
a data member initializer; the M is consumed and the outer name is retained)
- Improve guard variable vs Vector ABI disambiguation for z/Z suffixes by
scanning all vparam chars ({v,l,u,R,L,s,digits}) and requiring a trailing
underscore, preventing false positive vector matches on guard variables
GCC emits "sr St 6__and_I<T>E 5value" for std::__and_<T>::value in
enable_if expressions. The previous sr/else handler assumed exactly one
qualifier after the unresolved type, leaving trailing source-name
components unread and causing DemangleException in subsequent parsing.
Fix: loop over digit-started source names after the initial unresolved
type, treating each as an intermediate qualifier when it has template
args AND another source name follows, or as the final name otherwise.
Operator-name ("on") and destructor-name ("dn") final components fall
through to the existing DemangleBaseUnresolvedName() call.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…d sr intermediate qualifiers
Two missing substitution table entries caused out-of-range access (SG_=17) when decoding symbols like std::swap<T> with enable_if/enable_if constraints:
1. DemangleName case 'S': after processing template args for an unscoped name (e.g. St 4swap I...E), the encoder pushes BOTH the prefix (std::swap) AND the full instantiation (std::swap<T>) to the substitution table. We were only pushing the prefix. Fixed by adding PushType(type) after template args are appended.
2. DemangleExpression sr else branch: when processing a multi-level sr scope without the N prefix (e.g. sr St 6__and_ I...E ...), intermediate template qualifiers must be pushed to the substitution table. Previously the while loop appended to 'out' but never called PushType, unlike the N-prefix branch. Fixed by adding PushType(CreateUnknownType(...)) for intermediate qualifiers.
Reduces failures from 211 to 90 on the 178,883-symbol test corpus (99.95%).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ch substitution pushes
- sr N-prefix branch: guard DemangleUnresolvedType() call with isdigit check so GCC-style N <source-name>+ E patterns don't throw on digit-start names
- sr N-prefix branch: push template instantiation (name+args) after template args, in addition to the bare name, to keep the substitution table in sync
- sr digit branch: push bare name and template instantiation to substitution table so forward references like SK_ resolve correctly
Reduces demangling failures on the corpus from 90 to 4.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…fL fix, decltype
- GR <object name> [<seq-id>] _: implement reference temporary special names
- u <source-name> [<template-args>]: add vendor extended type in DemangleType
- $tlv$init suffix: strip the macOS thread-local init suffix before demangling and append it back to the resulting variable name
- fL <level> p <CV> [<param>] _: allow out-of-range listNumber to fall through to the existing "fp" placeholder paths instead of throwing, fixing fL inside decltype return types where m_functionSubstitute is not yet populated
- DT/Dt: wrap the demangled expression in decltype(...) instead of emitting it raw
Reduces failures on a 290k-symbol BN corpus from 119 to 14 (88% improvement).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
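A minimal sketch of the $tlv$init split. SplitTlvInitSuffix is a hypothetical name. It returns the mangled part to demangle and the suffix to re-attach; the exact re-attachment format is left to the caller.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper: split "<mangled>$tlv$init" into {mangled, suffix}.
// Symbols without the suffix come back unchanged with an empty suffix.
std::pair<std::string, std::string> SplitTlvInitSuffix(const std::string& symbol)
{
    static const std::string kSuffix = "$tlv$init";
    if (symbol.size() > kSuffix.size() &&
        symbol.compare(symbol.size() - kSuffix.size(), kSuffix.size(), kSuffix) == 0)
        return {symbol.substr(0, symbol.size() - kSuffix.size()), kSuffix};
    return {symbol, ""};
}
```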
- ABI tags (B<source-name>): consolidate handling after switch in DemangleUnqualifiedName, covering source names and operator names (e.g. operator<=>[abi:ne180100])
- operator<=> (ss): add to GetOperator and DemangleUnqualifiedName
- Template param indices: fix DemangleTemplateSubstitution to parse multi-digit decimal indices (T10_, T11_, ...) instead of only single-digit (fixes references beyond T9_, i.e. the 12th and later template parameters)
- cv cast expression: consume '_' delimiter before calling DemangleExpressionList for the multi-arg form (cv <type> _ <expr>* E), fixing decltype(cm(..., (void)())) return types
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
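Template-parameter references use decimal numbers, unlike the base-36 seq-ids in substitutions. Below is a sketch of the multi-digit parse. ParseTemplateParamIndex is a hypothetical name, and it expects the cursor to sit just after the 'T'.

```cpp
#include <cassert>
#include <cctype>
#include <optional>
#include <string>

// Hypothetical sketch: decode "T_" / "T<number>_" into a zero-based template
// argument index. T_ -> 0, T0_ -> 1, T9_ -> 10, T10_ -> 11.
// On failure `pos` is restored.
std::optional<size_t> ParseTemplateParamIndex(const std::string& text, size_t& pos)
{
    size_t start = pos;
    size_t value = 0;
    bool haveDigits = false;
    while (pos < text.size() && std::isdigit(static_cast<unsigned char>(text[pos])))
    {
        value = value * 10 + static_cast<size_t>(text[pos] - '0');
        haveDigits = true;
        pos++;
    }
    if (pos >= text.size() || text[pos] != '_')
    {
        pos = start;
        return std::nullopt;
    }
    pos++;
    return haveDigits ? value + 1 : 0;
}
```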
- Fix DemangleBaseUnresolvedName: consume 'on' prefix before reading operator chars
- Fix PeekString(32) in Intel Vector ABI z/Z disambiguation: use min(32, Length())
- Remove unused NextIsOneOf and ReadUntil Reader methods
- Guard GetRaw, GetTemplateType, and PrintTables behind GNUDEMANGLE_DEBUG
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove unreachable C1-C5/D0-D5 cases from GetNameType(); these are handled directly by DemangleUnqualifiedName's own case blocks
- Remove unused DemangleInitializer() (declaration and definition)
- Remove dead "." operator check in DemangleBinaryExpression; the dt (member access) operator has its own handler in DemangleExpression
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix DemanglePrimaryExpression 'n' case: type is __int128 (signed), not __uint128, matching the Itanium ABI spec for type code 'n'
- Fix dot-extension appending (.eh, .eh_frame, etc.): add space before the human-readable extension name to avoid "fooexception handler"
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- x/y: use "long long"/"unsigned long long" (was indistinguishable from long)
- a: use "signed char" (was mapped to "char", same as type 'c')
- n/o: use "__int128"/"unsigned __int128" (was "int128_t"/"uint128_t")
- g: use "__float128" via FloatType with altName (was "float128")
- Dd/Df/De: use "decimal64"/"decimal32"/"decimal128" (were mapped to binary float types double/float/float128)
- Dh: use "_Float16" (was "float16")
- Dp: always show "..." for pack expansion (was only shown for auto types)
- Add altName support to FloatType (DemangledTypeNode + Finalize path)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
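The corrected spellings fit naturally in a small lookup table. This is a sketch, not the PR's code. CorrectedBuiltinName is a hypothetical helper that returns plain strings, whereas the PR builds type nodes (with the new altName for FloatType), and it covers only the codes listed above.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical table of the corrected spellings, keyed by Itanium
// <builtin-type> code. Codes not listed fall through to std::nullopt.
std::optional<std::string> CorrectedBuiltinName(const std::string& code)
{
    static const std::unordered_map<std::string, std::string> kNames = {
        {"x", "long long"},           // previously indistinguishable from long
        {"y", "unsigned long long"},
        {"a", "signed char"},         // previously "char", same as 'c'
        {"n", "__int128"},            // previously "int128_t"
        {"o", "unsigned __int128"},   // previously "uint128_t"
        {"g", "__float128"},          // previously "float128"
        {"Dd", "decimal64"},          // decimal, not binary, floating point
        {"Df", "decimal32"},
        {"De", "decimal128"},
        {"Dh", "_Float16"},           // previously "float16"
    };
    auto it = kNames.find(code);
    if (it == kNames.end())
        return std::nullopt;
    return it->second;
}
```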
bdash reviewed Mar 24, 2026
};
// Non-templated static functions (not part of the template class)
Contributor
What "template class" is this referring to?
Member
Author
Oh yeah this is some cruft that accidentally made it in.
Member
Author
Converted to a draft as I have a couple of changes that are needed. I'm also planning on cutting back down the
Summary
- Replace the TypeBuilder-based path with DemangledTypeNode for performance (eliminates intermediate Type allocations)
Fixes #1985, fixes #1895, fixes #1016
Type name fixes
- x/y: long long / unsigned long long (were indistinguishable from long)
- a: signed char (was mapped to char, same as type c)
- n/o: __int128 / unsigned __int128 (were non-standard int128_t / uint128_t)
- g: __float128 (was float128)
- Dd/Df/De: decimal64 / decimal32 / decimal128 (were incorrectly mapped to binary float types)
- Dh: _Float16 (was float16)
- Dp: always show ... for pack expansion (was only shown for auto types)
- n in primary expressions: __int128, not __uint128
New construct support
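- Intel Vector ABI (_ZGV...) with ISA/mask/vlen/parameter parsing
- TLS init/wrapper functions (TH/TW)
- Transaction clones (GTt/GTn)
- Covariant return thunks (Tc)
- Clang block invocations (_block_invoke)
- Reference temporaries (GR)
- ABI tags (B<source-name>)
- Operator <=> (spaceship)
- Inheriting constructors (CI)
Bug fixes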
- Fix PeekString in z/Z Vector ABI disambiguation (was reading past end)
- Fix DemangleBaseUnresolvedName consuming on prefix chars as operator
- Fix dot-extension spacing (.eh → exception handler now has space)
- Fix substitution table entries in sr scope resolution paths
Cleanup
- Remove dead code: GetNameType C/D constructor/destructor cases, DemangleInitializer, BinaryExpression "." check
- Guard GetRaw/PrintTables/GetTemplateType behind GNUDEMANGLE_DEBUG
- Remove unused ReadUntil/NextIsOneOf
- Add altName parameter to FloatType (parallels IntegerType)
Test plan
- EXPECT_EQ validation

🤖 Generated with Claude Code