feat: hybrid retrieval approach #8

Open
srini047 wants to merge 4 commits into devrev:main from srini047:srini-01

Conversation

srini047 commented Mar 16, 2026

Implementation:
BM42 hybrid retrieval with Haystack + Qdrant: a hybrid search system that combines sparse (BM42) and dense (mxbai-embed-large-v1) embeddings for information retrieval over the DevRev knowledge base.

Indexing Pipeline:
DocumentCleaner → FastembedSparseDocumentEmbedder (BM42)
→ SentenceTransformersDocumentEmbedder (1024-dim)
→ DocumentWriter (Qdrant)
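
A minimal sketch of how the indexing chain above could be wired in Haystack 2.x. It is not the code from this PR: the model IDs, the component names, and the in-memory Qdrant location are assumptions for illustration, and it requires the `haystack-ai`, `qdrant-haystack`, `fastembed-haystack`, and `sentence-transformers` packages.

```python
# Assumed model IDs: the PR description names only "BM42" and "mxbai-embed-large-v1".
SPARSE_MODEL = "Qdrant/bm42-all-minilm-l6-v2-attentions"
DENSE_MODEL = "mixedbread-ai/mxbai-embed-large-v1"
DENSE_DIM = 1024  # dense embedding size stated in the PR description


def build_indexing_pipeline(location: str = ":memory:"):
    """Wire cleaner -> sparse embedder -> dense embedder -> writer (hypothetical helper)."""
    # Imported lazily so this module loads even without the heavy dependencies.
    from haystack import Pipeline
    from haystack.components.embedders import SentenceTransformersDocumentEmbedder
    from haystack.components.preprocessors import DocumentCleaner
    from haystack.components.writers import DocumentWriter
    from haystack_integrations.components.embedders.fastembed import (
        FastembedSparseDocumentEmbedder,
    )
    from haystack_integrations.document_stores.qdrant import QdrantDocumentStore

    store = QdrantDocumentStore(
        location,                    # assumption: in-memory Qdrant for local testing
        embedding_dim=DENSE_DIM,
        use_sparse_embeddings=True,  # the store must hold sparse vectors for hybrid search
        recreate_index=True,
    )

    pipeline = Pipeline()
    pipeline.add_component("cleaner", DocumentCleaner())
    pipeline.add_component(
        "sparse_embedder", FastembedSparseDocumentEmbedder(model=SPARSE_MODEL)
    )
    pipeline.add_component(
        "dense_embedder", SentenceTransformersDocumentEmbedder(model=DENSE_MODEL)
    )
    pipeline.add_component("writer", DocumentWriter(document_store=store))

    # Each component passes its `documents` output to the next one.
    pipeline.connect("cleaner", "sparse_embedder")
    pipeline.connect("sparse_embedder", "dense_embedder")
    pipeline.connect("dense_embedder", "writer")
    return pipeline
```

With the dependencies installed, `build_indexing_pipeline().run({"cleaner": {"documents": docs}})` would clean, embed, and write a list of Haystack `Document` objects.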

Retrieval Pipeline:
FastembedSparseTextEmbedder (BM42) + SentenceTransformersTextEmbedder
→ QdrantHybridRetriever (RRF fusion)
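
To illustrate the RRF (reciprocal rank fusion) step, here is a small standalone sketch of the textbook formula, in which each document scores the sum of `1 / (k + rank)` over the rankings it appears in. The function name, the document IDs, and the constant `k = 60` are assumptions for illustration; Qdrant performs the fusion server-side and may use a different constant.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one list (hypothetical helper).

    k is the usual smoothing constant; 60 is a common default, assumed here.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit of each list contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from the sparse (BM42) and dense retrievers.
sparse_hits = ["a", "b", "c"]
dense_hits = ["b", "d", "a"]
fused = reciprocal_rank_fusion([sparse_hits, dense_hits])
```

Documents returned by both retrievers (`a` and `b` here) end up ahead of those returned by only one, which is the behavior hybrid retrieval relies on.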

Work item: ISS-269621

@srini047
Author

@nimit2801 Please help with the PR description validation.

@srini047
Author

Please use this JSON file for the results: test_queries_results.json. I have the Parquet file as well.

@nimit2801
Collaborator

hey @srini047

Kindly add this in your PR description: https://app.devrev.ai/devrev/works/ISS-269621

@prakhar7651
Contributor

Hey!
These are your scores.
Recall@10: 0.3188
Precision@10: 0.2598
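
For reference, a sketch of how Recall@k and Precision@k are conventionally computed per query and then averaged over queries. The evaluation pipeline used for this PR is not shown in the thread, so the function names, the averaging scheme, and the document IDs below are assumptions, not the organizers' implementation.

```python
from typing import Iterable, List, Sequence, Tuple


def precision_recall_at_k(
    retrieved: Sequence[str], relevant: Iterable[str], k: int = 10
) -> Tuple[float, float]:
    """Return (precision@k, recall@k) for a single query (hypothetical helper)."""
    if k <= 0:
        raise ValueError("k must be positive")
    relevant_set = set(relevant)
    # Drop duplicate IDs while keeping rank order, then cut to the top k.
    top_k = list(dict.fromkeys(retrieved))[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_set)
    precision = hits / k  # divides by k even when fewer than k results came back
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


def mean_precision_recall_at_k(
    runs: List[Tuple[Sequence[str], Iterable[str]]], k: int = 10
) -> Tuple[float, float]:
    """Average the per-query metrics over all (retrieved, relevant) pairs."""
    if not runs:
        return 0.0, 0.0
    pairs = [precision_recall_at_k(ret, rel, k) for ret, rel in runs]
    n = len(pairs)
    return sum(p for p, _ in pairs) / n, sum(r for _, r in pairs) / n


# Hypothetical query: 2 of the 3 relevant documents appear in the top 4 results.
p, r = precision_recall_at_k(["d1", "d7", "d3", "d9"], {"d1", "d3", "d5"}, k=4)
```

Here `p` is 0.5 (2 hits in 4 slots) and `r` is 2/3 (2 of 3 relevant documents found).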

@prakhar7651
Contributor

Hey!
There were some errors in the evaluation pipeline. These are your scores.
Recall: 0.1613
Precision: 0.1739

@srini047
Author

srini047 commented Apr 1, 2026

> Hey!
> There were some errors in the evaluation pipeline. These are your scores.
> Recall: 0.1613
> Precision: 0.1739

What are the older scores then?

@prakhar7651
Contributor

prakhar7651 commented Apr 1, 2026

> What are the older scores then?

Previous calculations had some errors. We have added the correct scores. Please check.

@prakhar7651
Contributor

prakhar7651 commented Apr 1, 2026

Given the quality of the submissions and how eager folks are to contribute, we're extending the deadline to April 7th. Evaluations will still be ongoing. Please keep contributing.

3 participants