feat(translation-worker): add translation worker with argostranslate#8
winsomeglint wants to merge 5 commits into main
Conversation
```python
# Temporal utils
async def async_batches(
```
I feel like these ones could go in here: https://github.com/ICIJ/icij-python/tree/main/icij-common/icij_common
Since these are not datashare / temporal specific but rather asyncio / itertools related utils, useful in all ICIJ Python projects.
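For context, such a batching helper fits in a few lines of asyncio. This is only a hypothetical sketch: the name `async_batches` comes from the quoted diff, but the signature and behavior shown here are assumptions, not the PR's actual implementation.

```python
import asyncio
from typing import AsyncIterable, AsyncIterator, TypeVar

T = TypeVar("T")


async def async_batches(
    items: AsyncIterable[T], batch_size: int
) -> AsyncIterator[list[T]]:
    """Group an async iterable into lists of at most batch_size items."""
    batch: list[T] = []
    async for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # flush the trailing partial batch
        yield batch
```

Being generic over any `AsyncIterable`, a utility like this indeed has no datashare or temporal dependency, which supports moving it into `icij-common`.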
```python
task: str = Field(default=TRANSLATION_TASK_NAME, frozen=True)
device: str = Field(default=CPU, frozen=True)
batch_size: int = 16
```
I think we should split between the TranslationConfig and the TranslationWorkerConfig.
The TranslationConfig should hold all parameters which can be tweaked between calls; it should be filled by the caller/dev. The TranslationWorkerConfig should be created and set by the person who deploys the worker, who can tune parameters according to the actual worker resources.
For some parameters it's not trivial to decide, but for others it's more straightforward.
I'd say:
- `device` should be in `TranslationWorkerConfig` (the worker knows whether it's a CPU or GPU worker)
- `batch_size` / `max_parallel_batches`: it's tempting to put them in the `TranslationConfig`, but they are probably more appropriate in the `TranslationWorkerConfig`. This way we can deploy the same code and just change the deployments if workers are hitting OOM or are not running at full speed
- `beam_size` / `num_hypotheses` should be in `TranslationConfig` since they impact the translation output
- `inter_threads` / `intra_threads` / `compute_type` seem to be more worker options than runtime options
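A minimal sketch of the proposed split, using plain dataclasses for illustration (the actual code uses pydantic `Field`; the field names come from this thread, while the defaults shown are assumptions):

```python
from dataclasses import dataclass

CPU = "cpu"  # placeholder for the project's CPU device constant


@dataclass(frozen=True)
class TranslationConfig:
    # Per-call knobs that change the translation output, set by the caller/dev
    beam_size: int = 2


@dataclass(frozen=True)
class TranslationWorkerConfig:
    # Deployment-time knobs tuned by whoever operates the worker,
    # according to its actual resources (CPU vs GPU, RAM, cores)
    device: str = CPU
    batch_size: int = 16
    max_parallel_batches: int = 1
    inter_threads: int = 1
    intra_threads: int = 0
    compute_type: str = "default"
```

The payoff of the split is operational: the same code can be redeployed with a different `TranslationWorkerConfig` if workers hit OOM or underuse their hardware, without touching any caller.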
My concern with allowing beam_size and num_hypotheses to be determined by the requestor is that both of these have a significant effect on resource consumption, beam_size especially. Someone could theoretically blow the GPU out in a single pass if they set it high enough. I prefer putting them all onto a worker config and actually moving away from the payload completely, or else setting a beam_size max (I'm also going to drop num_hypotheses for now since we're only ever returning the highest scorer).
That sounds reasonable, agreed.
Adds a translation worker under `translation-worker`, powered by `ctranslate2` and `argostranslate`.