fix: improve binary content detection and null byte handling#210

Merged
MaojiaSheng merged 1 commit into volcengine:main from aeromomo:fix/null-byte-handling
Feb 19, 2026

@aeromomo
Contributor

  • Add binary content detection based on null byte percentage (>5%)
  • Add control character validation to avoid processing binary files as text
  • Remove null bytes from decoded text content to prevent downstream issues
  • Add logging for binary content detection and null byte removal

This prevents potential issues when processing files that contain null bytes or other binary data that could cause problems in text processing pipelines.

The fix uses a 5% threshold for null bytes to distinguish between text files with occasional null bytes and truly binary content.
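
The behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the function names `looks_binary` and `decode_text`, the 30% control-character limit, and the set of control bytes treated as ordinary text are all assumptions made for the example. Only the 5% null byte threshold comes from the description.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)

# From the PR description: more than 5% null bytes is treated as binary.
NULL_BYTE_THRESHOLD = 0.05
# Assumption: the PR does not state a limit for other control characters.
CONTROL_CHAR_THRESHOLD = 0.30
# Assumption: tab, line feed, and carriage return are ordinary in text.
TEXT_CONTROL_BYTES = frozenset({0x09, 0x0A, 0x0D})


def looks_binary(data: bytes) -> bool:
    """Return True when the raw bytes are unlikely to be text (hypothetical helper)."""
    if not data:
        return False
    null_ratio = data.count(b"\x00") / len(data)
    if null_ratio > NULL_BYTE_THRESHOLD:
        logger.warning("Binary content detected: %.1f%% null bytes", null_ratio * 100)
        return True
    # Count non-null control bytes that rarely occur in plain text.
    control = sum(1 for b in data if 0 < b < 0x20 and b not in TEXT_CONTROL_BYTES)
    if control / len(data) > CONTROL_CHAR_THRESHOLD:
        logger.warning("Binary content detected: %d control bytes", control)
        return True
    return False


def decode_text(data: bytes, encoding: str = "utf-8") -> Optional[str]:
    """Decode bytes as text, or return None for binary input (hypothetical helper)."""
    if looks_binary(data):
        return None
    text = data.decode(encoding, errors="replace")
    removed = text.count("\x00")
    if removed:
        # Strip stray null bytes so downstream text processing never sees them.
        text = text.replace("\x00", "")
        logger.info("Removed %d null byte(s) from decoded text", removed)
    return text
```

Under these assumptions, a mostly textual payload with a single stray null byte is decoded and cleaned, while a payload dominated by null bytes is rejected before decoding.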

Description

Related Issue

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Performance improvement
  • Test update

Changes Made

Testing

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have tested this on the following platforms:
    • Linux
    • macOS
    • Windows

Checklist

  • My code follows the project's coding style
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Screenshots (if applicable)

Additional Notes

@CLAassistant
CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


OpenClaw Integration seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@MaojiaSheng MaojiaSheng merged commit 454e727 into volcengine:main Feb 19, 2026
1 check was pending
@github-project-automation github-project-automation bot moved this from Backlog to Done in OpenViking project Feb 19, 2026