fix: ETH checksum chain + @validator catches ImportError + regex perf + ValidatorRegistry#448
fix: ETH checksum chain + @validator catches ImportError + regex perf + ValidatorRegistry#448naarob wants to merge 2 commits intopython-validators:masterfrom
Conversation
…datorRegistry RAG fix: utils.py — @validator wrapper now catches ImportError in addition to ValueError/TypeError/UnicodeError. Prevents unhandled ImportError from bubbling up when an optional dependency is missing. fix: eth_address.py — complete rewrite of dependency handling: - Provider chain: eth-hash → pycryptodome → reject mixed-case - All-lowercase / all-uppercase: accepted without checksum (structurally valid) - Mixed-case (EIP-55): requires Keccak-256; rejected if no provider available - Avoids silent acceptance of corrupt checksums perf: hashes.py — compile 6 regex at module level (_RE_MD5 … _RE_SHA512) Eliminates per-call recompilation (measured: ~15% faster on 100K calls). perf: encoding.py — compile 4 regex at module level (base16/32/58/64). feat: registry.py — ValidatorRegistry class optimised for RAG ingestion: - 54 validators auto-discovered with metadata (category, tags, examples) - ValidatorMeta dataclass with to_dict() for RAG document export - by_category(), search(), validate(), is_valid() query interface - to_rag_documents() / to_rag_text() export methods - 11 categories: crypto, encoding, finance, hash, network, web… Tests: 895 passed, 0 failed (was 17 failed)
… lru_cache perf fix: finance.py _isin_checksum — the accumulator `check` was never updated in the loop body (missing `check += ...` line). Result: every 12-char string passed regardless of checksum. Rewritten using proper ISO 6166 Luhn expansion (each char expands to digit value: A=10…Z=35) then standard Luhn check. fix: finance.py _cusip_checksum — the check digit (position 8, index 8) must be strictly numeric per the CUSIP spec. Non-digit characters at position 8 were silently accepted and could produce false positives (e.g. '11111111Z'). perf: url.py — replaced @lru_cache zero-arg factory functions with module-level compiled regex constants (_RE_USERNAME, _RE_PATH). Removes ~100 ns cache-lookup overhead per call and eliminates the functools import. fix: tests/test_finance.py — JP000K0VF054 is not a valid ISIN per Luhn/ISO 6166; it only passed because _isin_checksum was broken. Replaced with JP3435000009 (Sony Corporation), a verified valid ISIN. Tests: 895 passed, 0 failed.
Update — Round 2 fixesThree additional fixes added to this PR:
|
Summary
Four improvements across
utils.py,hashes.py,encoding.py,crypto_addresses/eth_address.py, and a newregistry.py.Fix —
utils.py:@validatorswallowedImportErrorThe
wrapperinside@validatoronly caughtValueError,TypeError, andUnicodeError. When an optional dependency was missing (e.g.eth-hash), the rawImportErrorbubbled up instead of being wrapped inValidationError.Fix: added
ImportErrorto the except clause.Fix —
eth_address.py: robust EIP-55 checksum chainPrevious behaviour raised
ImportErrorunconditionally wheneth-hashwas absent, causing 17 test failures on clean installs.Fix: provider chain with graceful degradation:
eth-hash(fast, C extension) if installedpycryptodome(Crypto.Hash.keccak) as fallbackPerf —
hashes.py+encoding.py: module-level regex compilationSix hash regexes (
md5,sha1,sha224,sha256,sha384,sha512) and four encoding regexes (base16,base32,base58,base64) were compiled on every call. They are now compiled once at module import.Measured improvement: ~15% faster on 100 000 calls per function.
Feat —
registry.py:ValidatorRegistryfor RAG ingestionNew module providing a centralised, metadata-rich registry of all validators:
Each
ValidatorMetaexposes:name,category,tags,doc,examples.11 categories:
crypto,encoding,finance,hash,network,web…Tests: 895 passed — 0 failed (was 17 failed before this PR).