GNU3 demangler: rewrite, bug fixes, and type name corrections#8038

Draft
plafosse wants to merge 26 commits into dev from test_gnu3_demangler_improvements

plafosse (Member) commented Mar 24, 2026

Summary

  • Rewrite GNU3 demangler internals using DemangledTypeNode for performance (eliminates intermediate Type allocations)
  • Fix type names to match the Itanium C++ ABI specification (verified against demumble/LLVM)
  • Add support for many previously unhandled mangling constructs
  • Remove dead code and unused functions

Fixes #1985, fixes #1895, fixes #1016

Type name fixes

  • x/y: long long / unsigned long long (were indistinguishable from long)
  • a: signed char (was mapped to char, same as type c)
  • n/o: __int128 / unsigned __int128 (were non-standard int128_t/uint128_t)
  • g: __float128 (was float128)
  • Dd/Df/De: decimal64/decimal32/decimal128 (were incorrectly mapped to binary float types)
  • Dh: _Float16 (was float16)
  • Dp: always show ... for pack expansion (was only shown for auto types)
  • Primary expression n: __int128 not __uint128

New construct support

  • Intel Vector Function ABI (_ZGV...) with ISA/mask/vlen/parameter parsing
  • TLS init/wrapper functions (TH/TW)
  • Transaction clones (GTt/GTn)
  • Covariant return thunks (Tc)
  • Block invocations (_block_invoke)
  • Reference temporaries (GR)
  • ABI tags (B<source-name>)
  • Operator <=> (spaceship)
  • Conversion operators with forward template references
  • Inheriting constructors (CI)
  • Generic lambdas with auto parameters
  • Many expression operators for decltype contexts

Bug fixes

  • PeekString in z/Z Vector ABI disambiguation (was reading past end)
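  • DemangleBaseUnresolvedName: consume the on prefix before reading the operator chars (the prefix itself was being parsed as the operator)
  • Dot extension spacing (a space now precedes the human-readable name, e.g. "exception handler" for .eh)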
  • Substitution table tracking for sr scope resolution paths

Cleanup

  • Remove dead code: GetNameType C/D constructor/destructor cases, DemangleInitializer, BinaryExpression "." check
  • Guard GetRaw/PrintTables/GetTemplateType behind GNUDEMANGLE_DEBUG
  • Remove unused ReadUntil/NextIsOneOf
  • Add altName parameter to FloatType (parallels IntegerType)

Test plan

  • 365 unit tests with full-signature EXPECT_EQ validation
  • 93% code region coverage, 89% branch coverage
  • Comparison script against demumble (LLVM) confirms 205/282 exact matches; remaining differences are documented (expression formatting, return type presence)

🤖 Generated with Claude Code

plafosse and others added 26 commits March 24, 2026 15:01
Replace the TypeBuilder-based demangling path with a lightweight
DemangledTypeNode representation that defers type object construction
until the symbol is fully parsed. This avoids repeated heap allocation
and ref-count churn during recursive descent.

Key changes:
- Add DemangledTypeNode / demangled_type_node.{h,cpp}: a compact IR
  that mirrors the type grammar without allocating BN Type objects
- Use a thread_local demangler instance to amortize vector allocations
  across calls

Result: ~3x throughput improvement on a 180K-symbol corpus with
97.7% success rate (matching the previous implementation).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Demangle GCC transaction clone symbols (_ZGTt/_ZGTn) into
'name [transaction clone](params)' format. The encoding is
GTt<encoding> for transaction-safe clones and GTn<encoding>
for non-transaction-safe clones, consuming the t/n qualifier
before demangling the inner symbol.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- TH <encoding>: demangle as "tls_init_function_for_<name>"
- TW <encoding>: demangle as "tls_wrapper_function_for_<name>"
- Tc <call-offset> <call-offset> <encoding>: demangle as
  "covariant_return_thunk_to_<name>(<params>)"
- Remove stale TODO comment from Tv (virtual thunk) case

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Handles ____ZN..._block_invoke.N style symbols produced by Clang's
block invocation ABI. Strips the _block_invoke[.N|_N] suffix,
normalizes excess leading underscores back to a _Z prefix, recursively
demangles the base symbol, and wraps the result as
"invocation function for block in <base>", preserving the base
function's return type and parameter list.
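
The suffix stripping and underscore normalization described above could look roughly like the following. This is a minimal sketch, not the code from this PR: `StripBlockInvoke` is a hypothetical name, and the accepted suffix forms (nothing, `.N`, or `_N` after `_block_invoke`) are taken only from the description in this commit message.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper (not from the PR): if `sym` ends in
// "_block_invoke", "_block_invoke.N" or "_block_invoke_N", store the base
// symbol in `base` with its leading underscores collapsed to a single '_'
// (so it starts with "_Z") and return true. Otherwise return false.
bool StripBlockInvoke(const std::string& sym, std::string& base)
{
    const std::string marker = "_block_invoke";
    const std::size_t at = sym.rfind(marker);
    if (at == std::string::npos)
        return false;

    // Validate the optional ".N" / "_N" tail after the marker.
    std::size_t i = at + marker.size();
    if (i < sym.size()) {
        if (sym[i] != '.' && sym[i] != '_')
            return false;
        ++i;
        if (i == sym.size())
            return false;  // separator without a number
        for (; i < sym.size(); ++i)
            if (!std::isdigit(static_cast<unsigned char>(sym[i])))
                return false;
    }

    // Skip every leading underscore; the first other character must be 'Z'.
    const std::size_t z = sym.find_first_not_of('_');
    if (z == std::string::npos || z == 0 || z >= at || sym[z] != 'Z')
        return false;

    base = "_" + sym.substr(z, at - z);
    return true;
}
```

With this sketch, `____ZN3foo3barEv_block_invoke.2` yields the base symbol `_ZN3foo3barEv`, which would then be demangled recursively and wrapped in the block-invocation prefix.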

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Change "invocation function for block in " to
"invocation_function_for_block_in_" to match the underscore
convention used elsewhere (guard_variable_for_, tls_init_function_for_,
non-virtual_thunk_to_, etc.)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Generic lambdas encode 'auto' params as T_, T0_, T1_... which reference
the lambda's own operator() template params, not any outer template scope.
Previously GetTemplateType() would throw because m_templateSubstitute held
the outer context and had no entries for these refs. Fix by saving and
replacing m_templateSubstitute with 'auto' placeholder nodes before parsing
lambda params, then restoring it after.

Also fix MyLogDebug signature to match updated PerformLog API (used when
GNUDEMANGLE_DEBUG is enabled).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Three fixes:

1. DemangleLocalName: give the local function its own template scope.
   Previously m_topLevel=false prevented the local function's template args
   from being pushed to m_templateSubstitute, so T0_/T1_ references in the
   function's parameter types would throw. Fix by saving/clearing
   m_templateSubstitute and setting m_topLevel=true around DemangleSymbol,
   then restoring both afterward.

2. Pointer-to-member type (case 'M'): set substitute=true so the type is
   pushed to m_substitute. Member pointer types are substitutable per the
   Itanium ABI; not tracking them shifted all subsequent substitution
   indices by 1, breaking e.g. SO_ references that follow.

3. Pack expansion (Dp): append '...' to the expanded type string so that
   e.g. DpOT_ renders as 'auto&&...' instead of 'auto&&'.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add case 'I' to DemangleTemplateArgs to handle I...E encoded argument
  packs (GCC sometimes uses I...E instead of J...E for variadic template
  argument packs, e.g. emplace_back<IIS1_EE>)
- Fix Dp (pack expansion) to omit trailing '...' for concrete types;
  only abstract types like 'auto&&...' (generic lambda params) keep '...'

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Handles _ZGV<isa><mask><vlen><params>_<name> mangled names such as
_ZGVeN16v___logf_finite.

Supports all ISA codes (b/c/d/e/x/y/Y/z/Z), mask/nomask, vector length,
and all parameter kinds: linear (l/R/U/L) with stride and alignment,
uniform (u), and vector (v). Disambiguates from existing guard variable
handling (_ZGV<symbol>) by checking the ISA code character.

Example: _ZGVeN16v___logf_finite -> __logf_finite [SIMD:AVX512,nomask,N=16,(vector)]
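
To illustrate the field layout `_ZGV<isa><mask><vlen><params>_<name>`, here is a simplified sketch of a splitter. It is not the PR's parser: `VectorVariant` and `ParseVectorVariant` are hypothetical names, the parameter codes are kept as a raw string rather than decoded, `M` is assumed to be the mask counterpart of the `N` seen in the example, and the guard-variable disambiguation is left out.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical result type: the raw fields of a vector-variant name.
struct VectorVariant {
    char isa = 0;         // raw ISA code character, e.g. 'e'
    bool masked = false;  // true for 'M', false for 'N'
    unsigned vlen = 0;    // vector length
    std::string params;   // undecoded parameter codes, e.g. "v"
    std::string name;     // scalar function name
};

// Splits "_ZGV<isa><mask><vlen><params>_<name>" into its fields.
// Returns false if the symbol does not have that shape.
bool ParseVectorVariant(const std::string& sym, VectorVariant& out)
{
    const std::string prefix = "_ZGV";
    if (sym.compare(0, prefix.size(), prefix) != 0)
        return false;

    std::size_t i = prefix.size();
    if (i + 2 > sym.size())
        return false;  // need at least the ISA and mask characters
    out.isa = sym[i++];

    const char mask = sym[i++];
    if (mask != 'N' && mask != 'M')
        return false;
    out.masked = (mask == 'M');

    // Decimal vector length.
    const std::size_t digitsStart = i;
    unsigned vlen = 0;
    while (i < sym.size() && std::isdigit(static_cast<unsigned char>(sym[i])))
        vlen = vlen * 10 + static_cast<unsigned>(sym[i++] - '0');
    if (i == digitsStart)
        return false;
    out.vlen = vlen;

    // Parameter codes run up to the first '_'; the name follows it.
    const std::size_t sep = sym.find('_', i);
    if (sep == std::string::npos || sep == i)
        return false;
    out.params = sym.substr(i, sep - i);
    out.name = sym.substr(sep + 1);
    return !out.name.empty();
}
```

For `_ZGVeN16v___logf_finite` this gives ISA `e`, nomask, vector length 16, parameters `v`, and the name `__logf_finite`. A guard variable such as `_ZGVZ3foovE3bar` is rejected here only because its second character is not `N` or `M`; real disambiguation needs more care.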

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add U <source-name> [<template-args>] <type> handling, used for
  Objective-C block pointer types (U13block_pointer <function-type>)
  and other vendor extensions; produces e.g.
  "void (T*, T2*, T3*) block_pointer"
- Fix TI (typeinfo) case to use UnknownNamedTypeClass instead of
  StructNamedTypeClass, which was causing the type name to appear
  duplicated in the output (e.g. "struct typeinfo_for_X _typeinfo_for_X")

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Two related fixes in the sr (scope resolution) expression handler:

1. Break loop after qualifier with template-args rather than waiting for
   a standalone E. GCC sometimes reuses the template-args E to also
   terminate the qualifier list, omitting the explicit qualifier-list E.

2. Push each sr qualifier level to the substitution table (both the bare
   name and its template instantiation if present). The compiler tracks
   these as substitution candidates when encoding, so downstream SX_
   references inside the same expression expect them to be present.

Together these allow symbols like:
  _Z27qRegisterNormalizedMetaTypeI16QQmlListPropertyI20QQuickStateOperationEE
  iRK10QByteArrayPT_N9QtPrivate21MetaTypeDefinedHelperIS6_Xaasr12QMetaTypeId2
  IS6_E7DefinedntsrSB_9IsBuiltInEE11DefinedTypeE

to demangle successfully where c++filt and demumble both fail.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
DemangleSubstitution only handled single-character seq-ids (S0_ through
SZ_, covering positions 1-36). Multi-character seq-ids like S10_, S11_
(positions 37, 38, ...) would throw because the code read one char then
expected '_' but got more digits.

Fix: accumulate all [0-9A-Z] digits before '_' using base-36 arithmetic,
then add 1 to get the substitution table index.

This allows symbols with many unique type substitutions (e.g. large
template visitor tables with 17+ type variants) to demangle correctly.
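
The base-36 accumulation can be sketched as below. `ParseSeqId` is a hypothetical standalone helper, not the PR's DemangleSubstitution; it parses only the text after the leading `S` and ignores overflow on absurdly long seq-ids.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: parses a <seq-id> followed by '_' starting at `pos`
// (the character after the leading 'S') and writes the substitution table
// index to `index`. "_" maps to 0; otherwise the base-36 value plus 1.
// On success `pos` is advanced past the '_'.
bool ParseSeqId(const std::string& in, std::size_t& pos, std::size_t& index)
{
    if (pos < in.size() && in[pos] == '_') {  // bare S_
        ++pos;
        index = 0;
        return true;
    }

    std::size_t value = 0;
    std::size_t i = pos;
    while (i < in.size() && in[i] != '_') {
        const char c = in[i];
        std::size_t digit;
        if (c >= '0' && c <= '9')
            digit = static_cast<std::size_t>(c - '0');
        else if (c >= 'A' && c <= 'Z')
            digit = static_cast<std::size_t>(c - 'A') + 10;
        else
            return false;  // not a base-36 digit
        value = value * 36 + digit;
        ++i;
    }
    if (i >= in.size())
        return false;  // ran out of input before the '_'

    pos = i + 1;
    index = value + 1;
    return true;
}
```

Under this scheme `S0_` maps to index 1, `SZ_` to 36, and `S10_` to 37, matching the positions quoted above.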

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ames

In DemangleNestedName(), the previous code broke out of the parsing loop
when it saw 'B' (an ABI tag prefix), leaving the 'E' nested-name terminator
unconsumed and causing the remainder of the qualified name (e.g. constructor
C1) to be lost.

The fix: instead of breaking, consume all consecutive 'B <source-name>'
sequences inline, appending "[abi:tag]" to the last name segment, then
continue parsing the rest of the nested name normally. The m_lastName is
saved and restored so that a following C1/D1 ctor/dtor still resolves to
the class name rather than the ABI tag string.

In DemangleFunction(), the old ad-hoc 'B' handler (which only recognized
"cxx11" and set a wrong return type) is replaced with the same ABI-tag
loop, allowing correct return-type detection for constructors/destructors
with ABI-tagged names.
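
A loop that consumes consecutive `B <source-name>` sequences might look like this. It is a sketch under assumptions: `ConsumeAbiTags` is a hypothetical free function over a plain string, whereas the PR works on its own reader and name state (including the m_lastName save/restore, which is not modeled here).

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: starting at `pos`, consumes zero or more
// "B<length><tag>" sequences and appends "[abi:<tag>]" to `name` for each.
// Returns false on a malformed tag; `name` may then hold the tags that were
// appended before the error.
bool ConsumeAbiTags(const std::string& in, std::size_t& pos, std::string& name)
{
    while (pos < in.size() && in[pos] == 'B') {
        std::size_t i = pos + 1;

        // Decimal length prefix of the source name.
        const std::size_t digitsStart = i;
        std::size_t len = 0;
        while (i < in.size() && std::isdigit(static_cast<unsigned char>(in[i])))
            len = len * 10 + static_cast<std::size_t>(in[i++] - '0');
        if (i == digitsStart || len == 0 || len > in.size() - i)
            return false;

        name += "[abi:" + in.substr(i, len) + "]";
        pos = i + len;
    }
    return true;
}
```

Given the remaining input `B5cxx11Ev` and the name `foo`, the sketch produces `foo[abi:cxx11]` and leaves the position on the `E` terminator, so the nested-name loop can carry on normally.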

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When a template non-type parameter uses a pointer to a function as its
value, the Itanium ABI encodes it as L <mangled-name> E inside the
template argument.  DemanglePrimaryExpression() handled this case via
a _Z prefix check, but called DemangleSymbol with m_topLevel=false and
without clearing the outer template substitution table.

This caused T_ / T0_ / ... references inside the embedded symbol's
parameter list to resolve against the outer symbol's template params,
not the inner symbol's, leading to a DemangleException and a complete
demangling failure for these symbols.

Fix: save/clear m_templateSubstitute and set m_topLevel=true before
recursing into DemangleSymbol for the embedded name, then restore
both afterwards.  This mirrors the same pattern used by DemangleLocalName.
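
One way to express this save/clear/restore pattern is a small scope guard, which also restores the state if the inner call throws. The sketch below is an illustration only: `DemanglerState` and `TemplateScope` are hypothetical stand-ins, and the source does not say whether the PR uses a guard object or manual save and restore.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the two pieces of demangler state involved.
struct DemanglerState {
    std::vector<std::string> templateSubstitute;
    bool topLevel = false;
};

// Saves the template table and top-level flag, gives the inner symbol a
// fresh scope, and restores both when the guard goes out of scope.
class TemplateScope {
public:
    explicit TemplateScope(DemanglerState& state)
        : m_state(state),
          m_savedTable(std::move(state.templateSubstitute)),
          m_savedTopLevel(state.topLevel)
    {
        m_state.templateSubstitute.clear();  // moved-from: make it empty
        m_state.topLevel = true;
    }

    ~TemplateScope()
    {
        m_state.templateSubstitute = std::move(m_savedTable);
        m_state.topLevel = m_savedTopLevel;
    }

    TemplateScope(const TemplateScope&) = delete;
    TemplateScope& operator=(const TemplateScope&) = delete;

private:
    DemanglerState& m_state;
    std::vector<std::string> m_savedTable;
    bool m_savedTopLevel;
};

// Stand-in for recursing into an embedded symbol: the inner symbol sees an
// empty table and registers its own first template parameter.
std::size_t DemangleEmbedded(DemanglerState& state)
{
    TemplateScope scope(state);
    state.templateSubstitute.push_back("inner T_");
    return state.templateSubstitute.size();
}
```

After `DemangleEmbedded` returns, the outer table and flag are back to their previous values, so `T_` references in the outer symbol keep resolving against the outer template parameters.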

Also use GetTypeAndName() instead of manually concatenating
GetStringBeforeName() + name + GetStringAfterName() to ensure
proper spacing between return type and function name.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Increase MAX_DEMANGLE_LENGTH to 262144 (256KB) to handle deeply
  templated symbols from real-world binaries

- Fix substitution table off-by-one for template functions whose
  template args are all types (no non-type L/X args): DemangleName's
  'N' case now pushes the complete specialized qualified name to the
  substitution table, matching the compiler's encoding. Introduced
  DemangleTemplateArgs(hadNonTypeArg) and DemangleNestedName(allTypeTemplateArgs)
  out-parameters to detect when all template args are types and the
  push is appropriate.

- Fix DemangleName's 'Z' (local-name) context: added m_inLocalName flag
  set around the inner DemangleSymbol call in DemangleLocalName, so the
  push only fires for the outer function name, not inner scope functions.

- Fix DemangleUnresolvedType decltype handling: the DT/Dt branch was
  peeking for the prefix but not consuming it before calling
  DemangleExpression, causing the expression parser to see 'D','T' with
  no matching case. Now correctly consumes the two-char prefix and the
  mandatory trailing 'E', enabling sr(decltype(...))::member patterns.

- Add forward template ref support for cv conversion operator types:
  ResolveForwardTemplateRefs patches placeholder types in the result
  once the enclosing template-args are known.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…p_ in decltype

  - DemanglePrimaryExpression: handle LZ<encoding>E (function address without leading
    underscore, used by GCC/Clang for pointer-to-function non-type template args).
    e.g. __ZNK1aI1bIP1cELZN1a1e1fEvELZNS5_1fEEE1iEv -> a<b<c*>, a::e::f(), a::e::f>::i() const

  - fp_/fp<N>_ in decltype return types: return placeholder name ("fp", "fp0", etc.)
    instead of throwing when m_functionSubstitute is empty or too small (params not
    yet parsed at the time the return type DT expression is evaluated).

  - cl expression: format as callable(args) instead of (callable, args).

  - dt/pt expressions: remove unnecessary parens around operands; use obj.member and
    obj->member format directly for clean output.
    e.g. _ZN1a1bIN1c1dINS0_1eEEEE1fIKS3_EEDTcldtfp_1gEERT_i ->
         fp.g() a::b<c::d<a::b::e>>::f<a::b::e const>(a::b::e const&, int32_t)
…guard disambiguation

  - sr unresolved-qualifier-level: remove incorrect PushType calls (expressions
    don't generate substitution candidates per the Itanium ABI); fix the trailing
    E to be conditional rather than unconditional to handle GCC's occasional omission
    when the last qualifier had template-args whose closing E served double duty

  - Add CI1/CI2 inheriting constructor support in DemangleUnqualifiedName; save
    and restore m_lastName around DemangleType() to prevent the inherited class
    type from clobbering the enclosing class name

  - Handle 'M' data-member-prefix in DemangleNestedName (closure/lambda inside
    a data member initializer; the M is consumed and the outer name is retained)

  - Improve guard variable vs Vector ABI disambiguation for z/Z suffixes by
    scanning all vparam chars ({v,l,u,R,L,s,digits}) and requiring a trailing
    underscore, preventing false positive vector matches on guard variables
GCC emits "sr St 6__and_I<T>E 5value" for std::__and_<T>::value in
enable_if expressions. The previous sr/else handler assumed exactly one
qualifier after the unresolved type, leaving trailing source-name
components unread and causing DemangleException in subsequent parsing.

Fix: loop over digit-started source names after the initial unresolved
type, treating each as an intermediate qualifier when it has template
args AND another source name follows, or as the final name otherwise.
Operator-name ("on") and destructor-name ("dn") final components fall
through to the existing DemangleBaseUnresolvedName() call.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…d sr intermediate qualifiers

Two missing substitution table entries caused out-of-range access (SG_=17)
when decoding symbols like std::swap<T> with enable_if/enable_if constraints:

1. DemangleName case 'S': after processing template args for an unscoped name
   (e.g. St 4swap I...E), the encoder pushes BOTH the prefix (std::swap) AND
   the full instantiation (std::swap<T>) to the substitution table.  We were
   only pushing the prefix.  Fixed by adding PushType(type) after template
   args are appended.

2. DemangleExpression sr else branch: when processing a multi-level sr scope
   without the N prefix (e.g. sr St 6__and_ I...E ...), intermediate template
   qualifiers must be pushed to the substitution table.  Previously the while
   loop appended to 'out' but never called PushType, unlike the N-prefix branch.
   Fixed by adding PushType(CreateUnknownType(...)) for intermediate qualifiers.

Reduces failures from 211 to 90 on the 178,883-symbol test corpus (99.95% success rate).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ch substitution pushes

- sr N-prefix branch: guard DemangleUnresolvedType() call with isdigit check
  so GCC-style N <source-name>+ E patterns don't throw on digit-start names
- sr N-prefix branch: push template instantiation (name+args) after template args,
  in addition to the bare name, to keep the substitution table in sync
- sr digit branch: push bare name and template instantiation to substitution table
  so forward references like SK_ resolve correctly

Reduces demangling failures on the corpus from 90 to 4.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…fL fix, decltype

- GR <object name> [<seq-id>] _: implement reference temporary special names
- u <source-name> [<template-args>]: add vendor extended type in DemangleType
- $tlv$init suffix: strip macOS thread-local init suffix before demangling,
  append it back to the resulting variable name
- fL <level> p <CV> [<param>] _: allow out-of-range listNumber to fall through
  to the existing "fp" placeholder paths instead of throwing, fixing fL inside
  decltype return types where m_functionSubstitute is not yet populated
- DT/Dt: wrap the demangled expression in decltype(...) instead of emitting raw

Reduces failures on a 290k-symbol BN corpus from 119 to 14 (88% improvement).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- ABI tags (B<source-name>): consolidate handling after switch in
  DemangleUnqualifiedName, covering source names and operator names
  (e.g. operator<=>[abi:ne180100])
- operator<=> (ss): add to GetOperator and DemangleUnqualifiedName
- Template param indices: fix DemangleTemplateSubstitution to parse
  multi-digit decimal indices (T10_, T11_, ...) instead of only
  single-digit (fixes symbols with 10+ template parameters)
- cv cast expression: consume '_' delimiter before calling
  DemangleExpressionList for the multi-arg form (cv <type> _ <expr>* E),
  fixing decltype(cm(..., (void)())) return types

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix DemangleBaseUnresolvedName: consume 'on' prefix before reading operator chars
- Fix PeekString(32) in Intel Vector ABI z/Z disambiguation: use min(32, Length())
- Remove unused NextIsOneOf and ReadUntil Reader methods
- Guard GetRaw, GetTemplateType, and PrintTables behind GNUDEMANGLE_DEBUG
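
The lookahead clamp can be illustrated with a toy reader. This is a sketch under assumptions: the `Reader` below is a hypothetical minimal class whose strict `PeekString` throws on over-long requests, and `PeekLookahead` is an invented wrapper; the behavior of the PR's actual Reader is not shown in the source.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical minimal reader, not the PR's Reader class.
class Reader {
public:
    explicit Reader(std::string data) : m_data(std::move(data)), m_pos(0) {}

    // Number of unread characters.
    std::size_t Length() const { return m_data.size() - m_pos; }

    // Strict peek: refuses to look past the end of the input.
    std::string PeekString(std::size_t count) const
    {
        if (count > Length())
            throw std::out_of_range("PeekString past end of input");
        return m_data.substr(m_pos, count);
    }

private:
    std::string m_data;
    std::size_t m_pos;
};

// Call-site clamp in the spirit of min(32, Length()): ask for at most
// `want` characters, but never more than what is left.
std::string PeekLookahead(const Reader& reader, std::size_t want = 32)
{
    return reader.PeekString(std::min(want, reader.Length()));
}
```

With a short input such as `zN4v_f`, `PeekLookahead` returns the six remaining characters, whereas an unclamped `PeekString(32)` on this toy reader throws.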

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Remove unreachable C1-C5/D0-D5 cases from GetNameType() — these
  are handled directly by DemangleUnqualifiedName's own case blocks
- Remove unused DemangleInitializer() (declaration and definition)
- Remove dead "." operator check in DemangleBinaryExpression — the
  dt (member access) operator has its own handler in DemangleExpression

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix DemanglePrimaryExpression 'n' case: type is __int128 (signed),
  not __uint128 — matches the Itanium ABI spec for type code 'n'
- Fix dot-extension appending (.eh, .eh_frame, etc.): add space before
  the human-readable extension name to avoid "fooexception handler"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- x/y: use "long long"/"unsigned long long" (was indistinguishable from long)
- a: use "signed char" (was mapped to "char", same as type 'c')
- n/o: use "__int128"/"unsigned __int128" (was "int128_t"/"uint128_t")
- g: use "__float128" via FloatType with altName (was "float128")
- Dd/Df/De: use "decimal64"/"decimal32"/"decimal128" (were mapped to
  binary float types double/float/float128)
- Dh: use "_Float16" (was "float16")
- Dp: always show "..." for pack expansion (was only shown for auto types)
- Add altName support to FloatType (DemangledTypeNode + Finalize path)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
};


// Non-templated static functions (not part of the template class)
Contributor

What "template class" is this referring to?

Member Author

Oh yeah this is some cruft that accidentally made it in.

@plafosse plafosse marked this pull request as draft March 24, 2026 20:02
@plafosse
Member Author

Converted to a draft as I have a couple of changes that are needed. I'm also planning on cutting back down the MAX_DEMANGLE_LENGTH as I only expanded that for testing purposes.

Development

Successfully merging this pull request may close these issues.

  • Demangling decltype failure case
  • Name demangler edge cases
  • Unsupported demangle type: transaction clone

2 participants