Add DIGEST-MD5 SASL delegation token auth to HiveCatalog#3150

Open
ShreyeshArangath wants to merge 5 commits into apache:main from ShreyeshArangath:feat/add-delegation-token

@ShreyeshArangath

Rationale for this change

Enable PyIceberg's HiveCatalog to authenticate using DIGEST-MD5 SASL with delegation tokens read from $HADOOP_TOKEN_FILE_LOCATION, which is the standard mechanism in secure Hadoop environments. This unblocks PyIceberg adoption in production clusters that don't use Kerberos directly.

Summary

  • Add HiveAuthError exception for Hive-specific auth failures
  • Add hadoop_credentials module to parse HDTS binary token files
  • Add _DigestMD5SaslTransport to work around THRIFT-5926 (None initial response)
  • Support hive.metastore.authentication property (NONE/KERBEROS/DIGEST-MD5)
  • Add pure-sasl to hive extras in pyproject.toml
  • Backward compatible: existing kerberos_auth boolean still works
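
To illustrate how the new property and the legacy boolean could coexist, here is a minimal sketch. It is not the code in this PR: `resolve_auth_mechanism` is a hypothetical helper, `HiveAuthError` is redefined locally so the snippet runs on its own, the legacy key name `hive.kerberos-authentication` is assumed, and the rule that the new property wins over the legacy boolean is an assumption as well.

```python
from __future__ import annotations


class HiveAuthError(Exception):
    """Local stand-in for the exception added by the PR (assumption)."""


SUPPORTED_MECHANISMS = {"NONE", "KERBEROS", "DIGEST-MD5"}


def resolve_auth_mechanism(properties: dict[str, str]) -> str:
    """Hypothetical helper: choose the auth mechanism from catalog properties."""
    explicit = properties.get("hive.metastore.authentication")
    if explicit is not None:
        # Assumption: matching is case-insensitive and the new property
        # takes precedence over the legacy boolean.
        mechanism = explicit.strip().upper()
        if mechanism not in SUPPORTED_MECHANISMS:
            # Fail loudly instead of silently falling back to no auth.
            raise HiveAuthError(f"Unknown auth mechanism: {explicit!r}")
        return mechanism

    # Backward compatibility: the legacy boolean still selects Kerberos.
    # The key name below is assumed, not taken from the PR text.
    legacy = properties.get("hive.kerberos-authentication", "false")
    return "KERBEROS" if legacy.strip().lower() == "true" else "NONE"
```

With this shape, an empty or misspelled mechanism raises `HiveAuthError` rather than producing an unauthenticated connection.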

Closes #3145

Are these changes tested?

Unit tests

Are there any user-facing changes?

Yes. This introduces DIGEST-MD5 SASL delegation token support.

ShreyeshArangath and others added 4 commits March 16, 2026 13:44
Enable PyIceberg's HiveCatalog to authenticate using DIGEST-MD5 SASL
with delegation tokens from $HADOOP_TOKEN_FILE_LOCATION, which is the
standard mechanism in secure Hadoop environments. This unblocks PyIceberg
adoption in production clusters that don't use Kerberos directly.

- Add HiveAuthError exception for Hive-specific auth failures
- Add hadoop_credentials module to parse HDTS binary token files
- Add _DigestMD5SaslTransport to work around THRIFT-5926 (None initial response)
- Support hive.metastore.authentication property (NONE/KERBEROS/DIGEST-MD5)
- Add pure-sasl to hive extras in pyproject.toml
- Backward compatible: existing kerberos_auth boolean still works

Closes apache#3145

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Address all findings from code review:

Critical:
- Rewrite VInt decoder to match Java WritableUtils.readVLong exactly,
  using signed-byte interpretation and correct prefix/length semantics

High:
- Catch OSError (not just FileNotFoundError) when reading token file
- Reject unknown auth mechanisms with HiveAuthError instead of silently
  falling back to unauthenticated TBufferedTransport
- Replace monkey-patching sasl.process in _DigestMD5SaslTransport with
  a clean send_sasl_msg override (thread-safe, no shared state mutation)

Medium:
- Fix kerberos_service_name default from config key to actual value
- Wrap UnicodeDecodeError in HiveAuthError for invalid UTF-8 in tokens
- Rewrite VInt test encoder to match real Hadoop encoding format
- Fix dead kerberos backward-compat tests to actually exercise __init__

Low:
- Add upper bound to pure-sasl dependency (<1.0.0)
- Fix tmp_path typing from object to pathlib.Path
- Fix docs to say pure-sasl (pip package name) not puresasl

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
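
For readers unfamiliar with the VInt format mentioned under "Critical", the following is a rough sketch of a decoder that follows the logic of Java's `WritableUtils.readVLong`. It is not the PR's implementation: the names `read_vlong` and `_to_signed_byte` are hypothetical, and the function signature (bytes plus offset, returning value and next offset) is an assumption chosen for illustration.

```python
from __future__ import annotations


def _to_signed_byte(value: int) -> int:
    """Reinterpret an unsigned byte (0..255) as a Java signed byte (-128..127)."""
    return value - 256 if value > 127 else value


def read_vlong(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one Hadoop VLong starting at pos; return (value, next_pos)."""
    if pos >= len(data):
        raise ValueError("unexpected end of data while reading VLong")
    first = _to_signed_byte(data[pos])
    pos += 1

    # Values in [-112, 127] are stored directly in the single prefix byte.
    if first >= -112:
        return first, pos

    # Otherwise the prefix encodes sign and payload length:
    #   -113..-120 -> positive value, 1..8 payload bytes
    #   -121..-128 -> negative value, 1..8 payload bytes
    negative = first < -120
    n_bytes = (-120 - first) if negative else (-112 - first)
    if pos + n_bytes > len(data):
        raise ValueError("truncated VLong payload")

    magnitude = int.from_bytes(data[pos:pos + n_bytes], "big")
    pos += n_bytes
    # Negative values are stored as their one's complement.
    # Note: Python ints do not wrap at 64 bits, so an 8-byte payload with the
    # high bit set would differ from Java; lengths in token files stay small.
    return (~magnitude if negative else magnitude), pos
```

For example, the bytes `8f 80` decode to 128 and `87 70` decode to -113, which matches what Hadoop's `writeVLong` emits for those values.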
The parent TSaslClientTransport.send_sasl_msg() has no type annotations,
so there is no override incompatibility for mypy to suppress.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>


class _DigestMD5SaslTransport(TTransport.TSaslClientTransport):
    """TSaslClientTransport subclass that works around THRIFT-5926.
Author

This will be addressed as part of apache/thrift#3342
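
As context for the workaround, here is a small self-contained sketch of the idea. It does not import Thrift: `FakeSaslClientTransport` is a stand-in that imitates the SASL message framing (1-byte status plus 4-byte big-endian length) as I understand it from Thrift's Python `TSaslClientTransport`, and the choice to coerce a `None` body to empty bytes is an assumption about how the override behaves, not a copy of the PR's code.

```python
import struct


class FakeSaslClientTransport:
    """Stand-in for Thrift's TSaslClientTransport framing (assumption)."""

    OK = 2  # status code for an in-progress SASL exchange in this sketch

    def __init__(self) -> None:
        self.sent = bytearray()  # captures what would go over the wire

    def send_sasl_msg(self, status: int, body: bytes) -> None:
        # Header: 1-byte status followed by 4-byte big-endian payload length.
        header = struct.pack(">BI", status, len(body))
        self.sent += header + body


class DigestMD5Transport(FakeSaslClientTransport):
    """Hypothetical subclass showing a send_sasl_msg override."""

    def send_sasl_msg(self, status, body):
        # DIGEST-MD5 has no initial client response, so the SASL library may
        # hand back None; len(None) would fail in the parent, so send an
        # empty payload instead.
        super().send_sasl_msg(status, b"" if body is None else body)
```

Overriding the method on a subclass keeps the fix local to each transport instance, in contrast to patching `sasl.process` on a shared object.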

- Document that hive.kerberos-service-name applies to both KERBEROS and DIGEST-MD5
- Add precedence note for hive.metastore.authentication vs legacy boolean
- Add test for empty-string auth mechanism raising HiveAuthError
- Add integration test for KERBEROS via hive.metastore.authentication config
- Expand HiveAuthError docstring to cover token file errors

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ShreyeshArangath marked this pull request as ready for review March 22, 2026 19:02

Development

Successfully merging this pull request may close these issues.

Support DIGEST-MD5 / delegation token authentication for HMS
