Description
Hi NTv3 Team,
I’m trying to fully understand the NTv3 post-training setup and would appreciate clarification on a few technical points regarding the methodology and evaluation. All the following questions are based on your paper: A foundational model for joint sequence-function multi-species modeling at scale for long-range genomic prediction.
1. Center 37.5% Clipping for Functional Tracks (Section 4.3.5)
The paper mentions calculating the loss only on the center 37.5% of the sequence to ensure adequate context.
- Determination: How was this specific 37.5% threshold determined? Was it through ablation studies on the effective receptive field?
- Confidence Decay: Have you evaluated how the prediction quality or confidence decays as the distance from the center increases?
2. Stride of 0.625 vs 0.375 during Inference (Section 4.7.1)
The text states: "we use a stride of 0.625 (calculated as 1 - 0.375)". This confuses me:
- If the sequence is [0, 1] and the target is [0.3125, 0.6875] (the center 37.5%), a stride of 0.375 would perfectly align the next window's target region, [0.6875, 1.0625].
- A stride of 0.625 would move the next window's start to 0.625, placing its target region at [0.9375, 1.3125], which leaves a gap in the genomic coordinates.
Could you please clarify the exact sliding window mechanism used to ensure every nucleotide is covered?
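To make the arithmetic in this question concrete, here is a minimal sketch of how I am computing the target regions. It is my own illustration, not code from the NTv3 repository: the function names `target_regions` and `gaps` are hypothetical, and it assumes the window has unit length, the scored region is exactly the center 37.5%, and consecutive windows are shifted by a fixed stride.

```python
def target_regions(stride, n_windows=3, window=1.0, center_frac=0.375):
    """Return the (start, end) of the scored center region for each window.

    Assumption: window i starts at i * stride and only the central
    `center_frac` of each window is scored.
    """
    margin = (window - center_frac * window) / 2  # unscored flank on each side
    regions = []
    for i in range(n_windows):
        start = i * stride
        regions.append((start + margin, start + window - margin))
    return regions


def gaps(regions, tol=1e-12):
    """Return the uncovered intervals between consecutive scored regions."""
    return [
        (prev_end, next_start)
        for (_, prev_end), (next_start, _) in zip(regions, regions[1:])
        if next_start > prev_end + tol
    ]


for stride in (0.375, 0.625):
    regions = target_regions(stride)
    print(f"stride={stride}: regions={regions}, gaps={gaps(regions)}")
```

Under these assumptions, a stride of 0.375 tiles the scored regions end to end, while a stride of 0.625 leaves an uncovered interval between every pair of consecutive windows. If the actual inference code stitches windows differently (for example, by keeping more than the center 37.5% of each prediction), that would resolve the discrepancy.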
3. Log1p Transformation in Benchmark Metrics
Section 4.3.7 mentions that functional tracks are evaluated using PCC on log-transformed raw values. However, the provided tutorial appears to calculate PCC on raw (scaled) values, without the transformation, for the test set. I’d like to confirm how PCC is calculated in the NTv3 benchmark.
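The distinction matters because the two conventions can give noticeably different numbers on the same predictions. The sketch below is a toy illustration of my own, not the benchmark code: the helper names `pcc` and `pcc_log1p` are hypothetical, the data are made up, and I am assuming the log transform is `log1p` applied to both predictions and targets.

```python
import math


def pcc(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def pcc_log1p(x, y):
    """PCC after applying log1p to both inputs (assumed convention)."""
    return pcc([math.log1p(a) for a in x], [math.log1p(b) for b in y])


# Toy track values with one large peak, as is common for coverage signals.
target = [0.0, 1.0, 2.0, 3.0, 10.0]
# Predictions chosen so that log1p(pred) is exactly 2 * log1p(target).
pred = [(1.0 + t) ** 2 - 1.0 for t in target]

print("PCC on raw values:   ", pcc(pred, target))
print("PCC on log1p values: ", pcc_log1p(pred, target))
```

In this toy case the log1p-space correlation is essentially perfect while the raw-space correlation is clearly lower, so results computed with the tutorial code would not be directly comparable to numbers reported under the other convention.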
4. Comparison with AlphaGenome
With AlphaGenome now open-sourced, are there any plans to benchmark NTv3 against it, to compare the performance of a general-purpose post-trained foundation model with that of an S2F expert model?
Thank you for your time and for the contribution of NTv3 to the community!
Best regards,
Cheng