feat: concurrent embedding, GitHub ZIP download, read offset/limit #267

Merged: MaojiaSheng merged 1 commit into volcengine:main from yangxinxin-7:feat/code on Feb 24, 2026
@yangxinxin-7
Collaborator

Performance improvements:

  • upload_directory: three-phase approach (collect → pre-create dirs →
    concurrent upload via asyncio.Semaphore, limit 8). Memoized mkdir
    eliminates redundant AGFS calls.
  • tree_builder: replace recursive file-by-file move with single agfs.mv()
    wrapped in asyncio.to_thread
  • TextEmbeddingHandler: offload blocking embed() to thread pool
  • EmbeddingQueue: configurable max_concurrent workers via
    EmbeddingConfig.max_concurrent (default 1)
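
The three-phase upload described above could look roughly like the sketch below. It is an illustration only, not the code from this PR: `upload_directory` here takes hypothetical `mkdir` and `write` async callables in place of the real AGFS client, and the input is assumed to be a dict of relative POSIX paths. Only the three phases, the memoized mkdir, and the semaphore limit of 8 come from the description.

```python
import asyncio
from pathlib import PurePosixPath

MAX_CONCURRENT_UPLOADS = 8  # limit stated in the PR description


async def upload_directory(files, mkdir, write):
    """Upload `files` (dict: relative POSIX path -> bytes).

    `mkdir` and `write` are hypothetical async callables standing in
    for the AGFS client calls; they are not the real API.
    """
    # Phase 1: collect the files to upload in a stable order.
    paths = sorted(files)

    # Phase 2: pre-create directories. The `created` set memoizes
    # mkdir so each directory triggers at most one backend call.
    created = set()

    async def ensure_dir(directory):
        if directory in created:
            return
        created.add(directory)
        await mkdir(directory)

    for path in paths:
        # parents are listed deepest first; reverse to create parents first
        for parent in list(PurePosixPath(path).parents)[::-1]:
            if str(parent) != ".":
                await ensure_dir(str(parent))

    # Phase 3: upload concurrently, bounded by a semaphore.
    sem = asyncio.Semaphore(MAX_CONCURRENT_UPLOADS)

    async def upload_one(path):
        async with sem:
            await write(path, files[path])

    await asyncio.gather(*(upload_one(p) for p in paths))
    return created
```

Creating every directory before any upload starts means the concurrent phase never races on a missing parent, which is one plausible reason for separating the phases.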

New features:

  • CodeRepositoryParser: use GitHub archive ZIP API instead of git clone
    for GitHub URLs without a specific commit (faster, no git history).
    Includes Zip Slip validation.
  • read()/read_file(): add offset/limit line-slicing, propagated through
    all client/service/HTTP/CLI layers
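
As a rough sketch of the two ideas above: `slice_lines` and `safe_extract` are hypothetical names, not functions from this PR. The description does not say whether `offset` is 0-based or what an omitted `limit` means, so the sketch assumes a 0-based offset and treats `limit=None` as "read to the end". The Zip Slip check shown is the common resolve-and-compare approach and may differ from the validation the PR actually uses.

```python
import os
import zipfile


def slice_lines(text, offset=0, limit=None):
    """Return `limit` lines of `text` starting at 0-based line `offset`.

    Assumption: limit=None means "through the last line".
    """
    if offset < 0 or (limit is not None and limit < 0):
        raise ValueError("offset and limit must be non-negative")
    lines = text.splitlines(keepends=True)
    end = None if limit is None else offset + limit
    return "".join(lines[offset:end])


def safe_extract(zf, dest_dir):
    """Extract an open ZipFile into dest_dir, rejecting Zip Slip paths.

    Every member is resolved against the destination first; if any
    resolved path falls outside dest_dir, nothing is extracted.
    """
    dest_root = os.path.realpath(dest_dir)
    for member in zf.namelist():
        target = os.path.realpath(os.path.join(dest_root, member))
        if os.path.commonpath([dest_root, target]) != dest_root:
            raise ValueError(f"unsafe path in archive: {member!r}")
    zf.extractall(dest_root)
```

Validating all members before extracting any of them keeps a malicious archive from leaving a partially extracted tree behind.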

@CLAassistant
CLAassistant commented Feb 24, 2026

CLA assistant check
All committers have signed the CLA.

@MaojiaSheng MaojiaSheng merged commit 7557f5d into volcengine:main Feb 24, 2026
5 of 6 checks passed