
8587 test erros on pytorch release 2508 on series 50#8770

Open
garciadias wants to merge 37 commits into Project-MONAI:dev from garciadias:8587-test-erros-on-pytorch-release-2508-on-series-50

Conversation

Contributor

@garciadias garciadias commented Mar 9, 2026

Fixes #8587.

Description

On NVIDIA Blackwell GPUs (compute capability 12.x, sm_120), running MONAI with USE_COMPILED=True produces silently incorrect spatial transform results. The root cause is that the MONAI compiled C extension (monai._C) is built at install time against a fixed set of CUDA architectures (TORCH_CUDA_ARCH_LIST). Blackwell (sm_120) is not included in the default build list (which tops out at sm_90, Hopper), so grid_pull executes against a mismatched PTX or JIT path and silently returns wrong values — there is no runtime error.

This affects two independent code paths:

1. spatial_resample / Resample (monai/transforms/spatial/functional.py)
When USE_COMPILED=True, spatial_resample unconditionally calls grid_pull. On Blackwell GPUs this produces incorrect resampling output without raising any exception.

2. Warp (monai/networks/blocks/warp.py)
Warp.__init__ stores interpolation and padding modes as integers when USE_COMPILED=True (as required by grid_pull). The PyTorch-native fallback path (F.grid_sample) requires string modes. Without a Blackwell-aware fallback, there was no path to trigger this mismatch — but once the device check forces the fallback, the integer modes cause a type error at runtime.

Fix

A private helper _compiled_unsupported(device: torch.device) -> bool is added to monai/transforms/spatial/functional.py. It returns True for CUDA devices with compute capability major ≥ 12, and False for all other devices (CPU, older GPUs).

  • In spatial_resample, the compiled path is now gated on USE_COMPILED and not _compiled_unsupported(img.device), falling back to the PyTorch-native affine_grid + grid_sample path on unsupported devices.
  • In Warp, the same gate is applied in forward(). Additionally, __init__ now always stores both the compiled integer modes (for grid_pull) and the native string modes (for F.grid_sample), ensuring the fallback path has correctly-typed arguments regardless of how USE_COMPILED was set at initialisation time.

Behaviour on all previously supported GPU architectures (sm_75 through sm_90) is unchanged.

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests.
  • In-line docstrings updated.
  • Documentation updated, tested make html command in the docs/ folder.

Signed-off-by: R. Garcia-Dias <rafaelagd@gmail.com>
…en USE_COMPILED=True

monai._C (grid_pull) was not compiled with sm_120 (Blackwell) architecture
support, causing spatial_resample to produce incorrect results on RTX 50-series
GPUs when USE_COMPILED=True.

Add _compiled_unsupported() to detect compute capability major >= 12 at
runtime and transparently fall back to the PyTorch-native affine_grid +
grid_sample path, which is verified correct on sm_120.

Fixes test_flips_inverse_124 in tests.transforms.spatial.test_spatial_resampled
on NVIDIA GeForce RTX 5090 (Blackwell, sm_120).
The same USE_COMPILED guard that was fixed in spatial_resample
(functional.py) was also present in Resample.__call__ (array.py),
used by Affine, RandAffine and related transforms.

Apply the same _compiled_unsupported() check so that grid_pull is not
called on sm_120 (Blackwell) devices when monai._C lacks sm_120 support,
preventing garbage output in test_affine, test_affined, test_rand_affine
and test_rand_affined on RTX 50-series GPUs.
Signed-off-by: R. Garcia-Dias <rafaelagd@gmail.com>
Contributor

coderabbitai bot commented Mar 9, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

The diff mostly standardizes string formatting and minor tuple/whitespace adjustments across many modules. It introduces a private helper _compiled_unsupported(device: torch.device) -> bool and uses it to disable the compiled spatial resampling path on specific CUDA devices. Related changes add device-gated fallbacks (using torch.nn.functional.grid_sample) in spatial resampling and a warp block, and update mode/backend resolution to respect the new gate. A new test module for compiled/GPU support detection was added. No public API signatures were changed.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 3

❌ Failed checks (2 warnings, 1 inconclusive)

  • Out of Scope Changes check (⚠️ Warning): PR includes extensive formatting changes (multi-line strings to single-line, type-ignore comment spacing, blank lines after docstrings) that are out of scope relative to the stated objective of fixing Blackwell GPU support. Resolution: Separate formatting/whitespace changes into a dedicated cleanup PR; keep this PR focused solely on GPU capability detection and fallback logic for #8587.
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 76.60%, which is below the required threshold of 80.00%. Resolution: Write docstrings for the functions missing them to satisfy the coverage threshold.
  • Title check (❓ Inconclusive): Title is vague and poorly formatted; '8587 test erros on pytorch release 2508 on series 50' lacks clarity and contains a typo ('erros' instead of 'errors'), making it unclear what the PR actually accomplishes. Resolution: Use a clear, descriptive title like 'Add GPU compute capability check for compiled spatial transforms' or 'Fallback to native grid_sample on unsupported Blackwell GPUs' to accurately convey the main change.
✅ Passed checks (2 passed)
  • Linked Issues check (✅ Passed): PR successfully addresses #8587 by implementing the _compiled_unsupported helper to detect unsupported CUDA compute capability and fall back to the native PyTorch path for spatial resampling on Blackwell GPUs.
  • Description check (✅ Passed): PR description covers the issue (#8587), root cause, fix approach, affected code paths, and types of changes. All required template sections are present and substantively completed.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
monai/transforms/spatial/array.py (1)

2066-2117: ⚠️ Potential issue | 🟠 Major

Convert grid coordinates when falling back to grid_sample with norm_coords=False.

When _compiled_unsupported() returns True, the fallback path at line 2103+ passes grids directly to grid_sample without converting from compiled convention [0, size-1] to PyTorch convention [-1, 1]. This causes silent missampling on unsupported devices (Blackwell cc12.x). Add coordinate conversion in the else block or reject the norm_coords=False + compiled fallback combination explicitly. Add test coverage for this scenario.
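A minimal sketch of the suggested conversion (function name hypothetical; assumes the last grid dimension holds per-axis coordinates ordered to match `spatial_size`):

```python
import torch

def to_normalized_coords(grid: torch.Tensor, spatial_size) -> torch.Tensor:
    """Convert compiled-convention coords [0, size-1] to grid_sample's [-1, 1]."""
    out = grid.clone()
    for i, dim in enumerate(spatial_size):
        # max(1, dim - 1) avoids division by zero for singleton dimensions
        out[..., i] = out[..., i] * 2.0 / max(1, dim - 1) - 1.0
    return out
```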

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@monai/transforms/spatial/array.py` around lines 2066 - 2117, The fallback
branch that calls torch.nn.functional.grid_sample (else block) fails to convert
compiled-style grid coordinates [0, size-1] to PyTorch normalized coords [-1, 1]
when self.norm_coords is False; update the else branch before calling
grid_sample to convert grid_t per-dimension: for each i and corresponding
spatial size dim from img_t.shape, compute grid_t[0, ..., i] = (grid_t[0, ...,
i] * 2.0 / max(1, dim - 1)) - 1.0 (use max to avoid division by zero) so
grid_sample receives normalized coords, and preserve dtype/device/memory_format
as done elsewhere (target symbols: _compiled_unsupported, grid_sample,
self.norm_coords, grid_t, img_t, moveaxis).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@monai/apps/auto3dseg/bundle_gen.py`:
- Around line 266-268: The error message raised in BundleAlgo._run_cmd uses a
concatenated string missing a space after the period; update the
NotImplementedError text that references self.device_setting['MN_START_METHOD']
so it reads "...not supported yet. Try modify BundleAlgo._run_cmd for your
cluster." (preserve the current exception chaining "from err") to fix the
missing space between sentences.

In `@monai/apps/detection/networks/retinanet_detector.py`:
- Around line 519-522: The error message raised when self.inferer is None
contains a missing space after the period; update the string in the raise
ValueError call to insert a space so it reads "`self.inferer` is not defined.
Please refer to function self.set_sliding_window_inferer(*)" (locate the check
referencing self.inferer and the call to self.set_sliding_window_inferer to
apply the fix).

---

Outside diff comments:
In `@monai/transforms/spatial/array.py`:
- Around line 2066-2117: The fallback branch that calls
torch.nn.functional.grid_sample (else block) fails to convert compiled-style
grid coordinates [0, size-1] to PyTorch normalized coords [-1, 1] when
self.norm_coords is False; update the else branch before calling grid_sample to
convert grid_t per-dimension: for each i and corresponding spatial size dim from
img_t.shape, compute grid_t[0, ..., i] = (grid_t[0, ..., i] * 2.0 / max(1, dim -
1)) - 1.0 (use max to avoid division by zero) so grid_sample receives normalized
coords, and preserve dtype/device/memory_format as done elsewhere (target
symbols: _compiled_unsupported, grid_sample, self.norm_coords, grid_t, img_t,
moveaxis).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: aa93797a-0eea-4904-a754-c937994e99c8

📥 Commits

Reviewing files that changed from the base of the PR and between daaedaa and 36e2623.

📒 Files selected for processing (14)
  • monai/apps/auto3dseg/bundle_gen.py
  • monai/apps/detection/networks/retinanet_detector.py
  • monai/apps/detection/utils/anchor_utils.py
  • monai/apps/detection/utils/detector_utils.py
  • monai/auto3dseg/analyzer.py
  • monai/data/wsi_reader.py
  • monai/losses/unified_focal_loss.py
  • monai/metrics/meandice.py
  • monai/networks/blocks/patchembedding.py
  • monai/networks/layers/factories.py
  • monai/transforms/croppad/array.py
  • monai/transforms/regularization/array.py
  • monai/transforms/spatial/array.py
  • monai/transforms/spatial/functional.py

Contributor Author

garciadias commented Mar 9, 2026

Tolerance tests are passing.

🐧 ❯ docker run --rm -v ./:/opt/monai/ --name monai_slim --gpus=all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 -it monai:slim python -m unittest \
                 tests.transforms.test_affine \
                 tests.transforms.test_affined \
                 tests.transforms.test_affine_grid \
                 tests.transforms.test_rand_affine \
                 tests.transforms.test_rand_affine_grid \
                 tests.transforms.test_rand_affined \
                 tests.transforms.test_create_grid_and_affine \
                 tests.transforms.test_spatial_resample \
                 tests.transforms.spatial.test_spatial_resampled \
                 tests.networks.layers.test_affine_transform \
                 tests.data.utils.test_zoom_affine \
                 tests.integration.test_meta_affine
tf32 enabled: False
monai.transforms.spatial.array Orientation.__init__:labels: Current default value of argument `labels=(('L', 'R'), ('P', 'A'), ('I', 'S'))` was changed in version None from `labels=(('L', 'R'), ('P', 'A'), ('I', 'S'))` to `labels=None`. Default value changed to None meaning that the transform now uses the 'space' of a meta-tensor, if applicable, to determine appropriate axis labels.
monai.transforms.spatial.dictionary Orientationd.__init__:labels: Current default value of argument `labels=(('L', 'R'), ('P', 'A'), ('I', 'S'))` was changed in version None from `labels=(('L', 'R'), ('P', 'A'), ('I', 'S'))` to `labels=None`. Default value changed to None meaning that the transform now uses the 'space' of a meta-tensor, if applicable, to determine appropriate axis labels.
.........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................__array__ implementation doesn't accept a copy keyword, so passing copy=False failed. __array__ must implement 'dtype' and 'copy' keyword arguments. To learn more, see the migration guide https://numpy.org/devdocs/numpy_2_0_migration_guide.html#adapting-to-changes-in-the-copy-keyword
........__array__ implementation doesn't accept a copy keyword, so passing copy=False failed. __array__ must implement 'dtype' and 'copy' keyword arguments. To learn more, see the migration guide https://numpy.org/devdocs/numpy_2_0_migration_guide.html#adapting-to-changes-in-the-copy-keyword
...................................................................................................................................................................................................................................................................................Since version 1.3.0, affine_grid behavior has changed for unit-size grids when align_corners=True. This is not an intended use case of affine_grid. See the documentation of affine_grid for details.
...........................2026-03-09 13:37:04,857 - INFO - Verified 'ref_avg152T1_LR.nii.gz', sha256: c01a50caa7a563158ecda43d93a1466bfc8aa939bc16b06452ac1089c54661c8.
2026-03-09 13:37:04,858 - INFO - File exists: /opt/monai/tests/testing_data/ref_avg152T1_LR.nii.gz, skipped downloading.
2026-03-09 13:37:04,858 - INFO - Verified 'ref_avg152T1_RL.nii.gz', sha256: 8a731128dac4de46ccb2cc60d972b98f75a52f21fb63ddb040ca96f0aed8b51a.
2026-03-09 13:37:04,858 - INFO - File exists: /opt/monai/tests/testing_data/ref_avg152T1_RL.nii.gz, skipped downloading.
builtin type SwigPyPacked has no __module__ attribute
builtin type SwigPyObject has no __module__ attribute
builtin type swigvarlink has no __module__ attribute
...........monai.transforms.spatial.array Orientation.__init__:labels: Current default value of argument `labels=(('L', 'R'), ('P', 'A'), ('I', 'S'))` was changed in version None from `labels=(('L', 'R'), ('P', 'A'), ('I', 'S'))` to `labels=None`. Default value changed to None meaning that the transform now uses the 'space' of a meta-tensor, if applicable, to determine appropriate axis labels.
........................monai.transforms.spatial.dictionary Orientationd.__init__:labels: Current default value of argument `labels=(('L', 'R'), ('P', 'A'), ('I', 'S'))` was changed in version None from `labels=(('L', 'R'), ('P', 'A'), ('I', 'S'))` to `labels=None`. Default value changed to None meaning that the transform now uses the 'space' of a meta-tensor, if applicable, to determine appropriate axis labels.
............
----------------------------------------------------------------------
Ran 910 tests in 12.732s

OK
sys:1: DeprecationWarning: builtin type swigvarlink has no __module__ attribute

…ytorch-release-2508-on-series-50

DCO Remediation Commit for R. Garcia-Dias <rafaelagd@gmail.com>

I, R. Garcia-Dias <rafaelagd@gmail.com>, hereby add my Signed-off-by to this commit: ba56a6d
I, R. Garcia-Dias <rafaelagd@gmail.com>, hereby add my Signed-off-by to this commit: 09c2cd9
I, R. Garcia-Dias <rafaelagd@gmail.com>, hereby add my Signed-off-by to this commit: 7cd0607

Signed-off-by: R. Garcia-Dias <rafaelagd@gmail.com>
@garciadias garciadias force-pushed the 8587-test-erros-on-pytorch-release-2508-on-series-50 branch from f64b17a to 356956a on March 9, 2026 13:43
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
monai/apps/detection/networks/retinanet_detector.py (1)

519-522: ⚠️ Potential issue | 🟡 Minor

Missing space after period.

"is not defined.Please""is not defined. Please".

Proposed fix
             raise ValueError(
-                "`self.inferer` is not defined.Please refer to function self.set_sliding_window_inferer(*)."
+                "`self.inferer` is not defined. Please refer to function self.set_sliding_window_inferer(*)."
             )


🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@monai/apps/detection/networks/retinanet_detector.py` around lines 519 - 522,
The ValueError message raised when self.inferer is None in retinanet_detector.py
contains a missing space after the period; update the string in the check inside
the method where self.inferer is validated (the block referencing self.inferer
and the suggestion to call self.set_sliding_window_inferer) to read
"`self.inferer` is not defined. Please refer to function
self.set_sliding_window_inferer(*)" so there is a space after the period.
🧹 Nitpick comments (1)
monai/transforms/spatial/functional.py (1)

57-76: Function implementation is sound; docstring lacks formal Args/Returns.

The logic correctly gates Blackwell GPUs (sm_120+). Consider adding explicit Args: and Returns: sections to align with Google-style docstrings per project conventions.

📝 Suggested docstring format
 def _compiled_unsupported(device: torch.device) -> bool:
     """
     Return True if ``monai._C`` (the compiled C extension providing ``grid_pull``) is not
     compiled with support for the given CUDA device's compute capability.
-
+    
+    Args:
+        device: The torch device to check for compiled extension support.
+    
+    Returns:
+        True if the device is CUDA with compute capability major >= 12 (Blackwell+),
+        False otherwise.
+    
     ``monai._C`` is built at install time against a fixed set of CUDA architectures.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@monai/transforms/spatial/functional.py` around lines 57 - 76, The
_compiled_unsupported function's docstring is missing formal Google-style
"Args:" and "Returns:" sections; update the docstring for
_compiled_unsupported(device: torch.device) to include an "Args:" entry
describing the device parameter (type and semantics) and a "Returns:" entry
stating it returns a bool indicating whether the compiled monai._C lacks support
for the device (True for CUDA devices with major compute capability >= 12, False
otherwise), keeping the existing explanatory text intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@monai/transforms/spatial/functional.py`:
- Around line 57-76: Warp.forward currently calls grid_pull when USE_COMPILED is
true without checking whether the compiled C extension supports the device;
update Warp.forward to compute _use_compiled = USE_COMPILED and not
_compiled_unsupported(image.device) (reuse the existing _compiled_unsupported
function from functional.py) and branch: if not _use_compiled use the PyTorch
native affine_grid + grid_sample path, else call grid_pull as before, ensuring
grid_pull is only invoked on supported CUDA devices (reference symbols:
Warp.forward, USE_COMPILED, _compiled_unsupported, grid_pull).

---

Duplicate comments:
In `@monai/apps/detection/networks/retinanet_detector.py`:
- Around line 519-522: The ValueError message raised when self.inferer is None
in retinanet_detector.py contains a missing space after the period; update the
string in the check inside the method where self.inferer is validated (the block
referencing self.inferer and the suggestion to call
self.set_sliding_window_inferer) to read "`self.inferer` is not defined. Please
refer to function self.set_sliding_window_inferer(*)" so there is a space after
the period.

---

Nitpick comments:
In `@monai/transforms/spatial/functional.py`:
- Around line 57-76: The _compiled_unsupported function's docstring is missing
formal Google-style "Args:" and "Returns:" sections; update the docstring for
_compiled_unsupported(device: torch.device) to include an "Args:" entry
describing the device parameter (type and semantics) and a "Returns:" entry
stating it returns a bool indicating whether the compiled monai._C lacks support
for the device (True for CUDA devices with major compute capability >= 12, False
otherwise), keeping the existing explanatory text intact.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a83bf9dd-d11c-4601-9888-15b532b84f0b

📥 Commits

Reviewing files that changed from the base of the PR and between f64b17a and 356956a.

📒 Files selected for processing (7)
  • monai/apps/auto3dseg/bundle_gen.py
  • monai/apps/detection/networks/retinanet_detector.py
  • monai/auto3dseg/analyzer.py
  • monai/data/wsi_reader.py
  • monai/transforms/regularization/array.py
  • monai/transforms/spatial/array.py
  • monai/transforms/spatial/functional.py
💤 Files with no reviewable changes (1)
  • monai/transforms/regularization/array.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • monai/data/wsi_reader.py
  • monai/apps/auto3dseg/bundle_gen.py

Signed-off-by: R. Garcia-Dias <rafaelagd@gmail.com>
- Import _compiled_unsupported from spatial.functional
- Check device compatibility before using grid_pull
- Fall back to PyTorch grid_sample for unsupported CUDA devices
- Prevents silent failures on Blackwell GPUs and future architectures
- Add coordinate conversion from [0, size-1] to [-1, 1] convention
- Implement fallback to PyTorch grid_sample when compiled extension unavailable
- Prevents misaligned output on Blackwell GPUs and unsupported architectures
- Ensures correct behavior when _compiled_unsupported() returns True
- Add Args section documenting device parameter
- Add Returns section describing return values
- Add Note section with detailed explanation
- Clarify compute capability threshold and rebuild implications
- bundle_gen.py: Add space after period in NotImplementedError message
- retinanet_detector.py: Add space after period in ValueError message
- Improves readability and follows English punctuation standards
- TestCompiledUnsupported: Verify device detection logic
- TestResampleFallback: Verify fallback behavior on unsupported devices
- Tests for CPU, CUDA, and non-CUDA device handling
- Uses unittest framework only (no pytest dependency)
- All tests pass on current supported architectures
Signed-off-by: R. Garcia-Dias <rafaelagd@gmail.com>
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
monai/networks/blocks/warp.py (1)

142-154: ⚠️ Potential issue | 🔴 Critical

Mode type mismatch causes runtime failure on unsupported devices.

When USE_COMPILED=True at instantiation, self._interp_mode and self._padding_mode are integers (e.g., 1, 7). If _compiled_unsupported() returns True at runtime (Blackwell GPU), the code falls back to F.grid_sample which requires string modes ("bilinear", "zeros"). Passing integers will raise RuntimeError.

🐛 Proposed fix: store both mode representations
     def __init__(self, mode=GridSampleMode.BILINEAR.value, padding_mode=GridSamplePadMode.BORDER.value, jitter=False):
         ...
         super().__init__()
-        # resolves _interp_mode for different methods
-
+        # Store native (string) mode for fallback path
+        self._native_interp_mode = GridSampleMode(mode).value
+        self._native_padding_mode = GridSamplePadMode(padding_mode).value
+
+        # resolves _interp_mode for compiled path
         if USE_COMPILED:
             if mode in (inter.value for inter in GridSampleMode):
                 mode = GridSampleMode(mode)
                 if mode == GridSampleMode.BILINEAR:
                     mode = 1
                 elif mode == GridSampleMode.NEAREST:
                     mode = 0
                 elif mode == GridSampleMode.BICUBIC:
                     mode = 3
                 else:
                     mode = 1  # default to linear
             self._interp_mode = mode
         else:
-            warnings.warn("monai.networks.blocks.Warp: Using PyTorch native grid_sample.")
-            self._interp_mode = GridSampleMode(mode).value
+            self._interp_mode = self._native_interp_mode

         # resolves _padding_mode for different methods
         if USE_COMPILED:
             ...
             self._padding_mode = padding_mode
         else:
-            self._padding_mode = GridSamplePadMode(padding_mode).value
+            self._padding_mode = self._native_padding_mode

Then in forward:

         if not _use_compiled:  # pytorch native grid_sample
             for i, dim in enumerate(grid.shape[1:-1]):
                 grid[..., i] = grid[..., i] * 2 / (dim - 1) - 1
             index_ordering: list[int] = list(range(spatial_dims - 1, -1, -1))
             grid = grid[..., index_ordering]  # z, y, x -> x, y, z
             return F.grid_sample(
-                image, grid, mode=self._interp_mode, padding_mode=f"{self._padding_mode}", align_corners=True
+                image, grid, mode=self._native_interp_mode, padding_mode=self._native_padding_mode, align_corners=True
             )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@monai/networks/blocks/warp.py` around lines 142 - 154, The fallback path to
F.grid_sample fails when self._interp_mode and self._padding_mode are stored as
integers by the compiled path; update the forward logic around the _use_compiled
check (where USE_COMPILED and _compiled_unsupported are used) to convert the
integer modes to the string names expected by torch.nn.functional.grid_sample
(e.g., map numeric self._interp_mode to "bilinear"/"nearest" and numeric
self._padding_mode to "zeros"/"border"/"reflection") before calling
F.grid_sample; keep the compiled-path call to grid_pull unchanged and reuse any
existing mappings or create a minimal mapping helper near the class (referencing
self._interp_mode, self._padding_mode, F.grid_sample, and grid_pull) so runtime
fallback on unsupported devices uses string mode values.
🧹 Nitpick comments (3)
tests/transforms/test_spatial_gpu_support.py (3)

26-34: Redundant test: test_non_cuda_device_always_supported duplicates test_cpu_device_always_supported.

Both tests create a CPU device and assert the same condition. Either remove the duplicate or expand the second test to cover other non-CUDA device types (e.g., "mps" if relevant).

♻️ Suggested fix
-    def test_non_cuda_device_always_supported(self):
-        """Non-CUDA devices should always be supported."""
-        device = torch.device("cpu")
-        self.assertFalse(_compiled_unsupported(device))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/transforms/test_spatial_gpu_support.py` around lines 26 - 34, The test
test_non_cuda_device_always_supported is a duplicate of
test_cpu_device_always_supported; remove the redundant test or replace it to
cover other non-CUDA device types (e.g., create torch.device("mps") and assert
False from _compiled_unsupported(device) when MPS is relevant), ensuring you
keep one explicit CPU check (test_cpu_device_always_supported) and either delete
test_non_cuda_device_always_supported or modify it to assert behavior for other
non-CUDA devices using the same helper _compiled_unsupported.

65-77: Consider overlap with TestCompiledUnsupported.test_cuda_device_detection.

This test duplicates CUDA detection logic from line 36-47. If both are intended, consolidate; otherwise pick one location.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/transforms/test_spatial_gpu_support.py` around lines 65 - 77, The test
test_compiled_unsupported_logic duplicates CUDA detection logic already present
in TestCompiledUnsupported.test_cuda_device_detection; remove the duplication by
either deleting test_compiled_unsupported_logic or merge its assertions into
TestCompiledUnsupported.test_cuda_device_detection, keeping the validation of
_compiled_unsupported(cuda_device) vs. cc_major >= 12 under a single test;
ensure the helper _compiled_unsupported is exercised exactly once for CUDA
detection to avoid redundant coverage.

36-47: Unnecessary conditionals inside skip-guarded test.

Line 39's ternary and line 40's if device.type == "cuda" are redundant—the @unittest.skipIf already guarantees CUDA availability. Simplify:

♻️ Suggested simplification
     @unittest.skipIf(not torch.cuda.is_available(), reason="CUDA not available")
     def test_cuda_device_detection(self):
         """Verify CUDA compute capability detection."""
-        device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
-        if device.type == "cuda":
-            cc_major = torch.cuda.get_device_properties(device).major
-            unsupported = _compiled_unsupported(device)
-            # Device is unsupported if cc_major >= 12
-            if cc_major >= 12:
-                self.assertTrue(unsupported)
-            else:
-                self.assertFalse(unsupported)
+        device = torch.device("cuda:0")
+        cc_major = torch.cuda.get_device_properties(device).major
+        unsupported = _compiled_unsupported(device)
+        # Device is unsupported if cc_major >= 12
+        if cc_major >= 12:
+            self.assertTrue(unsupported)
+        else:
+            self.assertFalse(unsupported)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/transforms/test_spatial_gpu_support.py` around lines 36-47,
test_cuda_device_detection contains redundant checks already covered by
`@unittest.skipIf`(not torch.cuda.is_available()), so remove the ternary and inner if:
set device directly to torch.device("cuda:0") (no fallback), then call
torch.cuda.get_device_properties(device).major and _compiled_unsupported(device)
and assert based on cc_major >= 12; update references to device,
torch.cuda.is_available, and _compiled_unsupported accordingly to eliminate the
needless conditional branches.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/transforms/test_spatial_gpu_support.py`:
- Around line 80-83: Remove the duplicated module entry point: there are two
identical "if __name__ == \"__main__\": unittest.main()" blocks in
tests/transforms/test_spatial_gpu_support.py—delete the extra one so only a
single main invocation remains at the end of the file.
- Around line 59-63: The test method test_resample_compilation_flag_respected is
empty and will always pass; either implement a real check or explicitly
skip/remove it. Fix by implementing a unit test that verifies Resample respects
the _compiled_unsupported flag (e.g., mock device properties or simulate a
Blackwell GPU and assert Resample raises/skips compilation) or replace the body
with unittest.skip("TODO: implement test for _compiled_unsupported") or raise
unittest.SkipTest with a clear reason; target the test method name
test_resample_compilation_flag_respected and any helpers used to construct
Resample/mocked device properties.
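Of the options the prompt lists, the explicit-skip variant is a one-liner; the class name below is illustrative, not the actual test file's structure:

```python
import unittest


class TestResampleGPUSupport(unittest.TestCase):
    def test_resample_compilation_flag_respected(self):
        # Placeholder until a mocked-Blackwell check is implemented; an explicit
        # skip is honest, whereas an empty test body silently always passes.
        raise unittest.SkipTest("TODO: implement test for _compiled_unsupported")
```

Under unittest, this shows up as a skipped test with a clear reason instead of a vacuous pass.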

---

Outside diff comments:
In `@monai/networks/blocks/warp.py`:
- Around line 142-154: The fallback path to F.grid_sample fails when
self._interp_mode and self._padding_mode are stored as integers by the compiled
path; update the forward logic around the _use_compiled check (where
USE_COMPILED and _compiled_unsupported are used) to convert the integer modes to
the string names expected by torch.nn.functional.grid_sample (e.g., map numeric
self._interp_mode to "bilinear"/"nearest" and numeric self._padding_mode to
"zeros"/"border"/"reflection") before calling F.grid_sample; keep the
compiled-path call to grid_pull unchanged and reuse any existing mappings or
create a minimal mapping helper near the class (referencing self._interp_mode,
self._padding_mode, F.grid_sample, and grid_pull) so runtime fallback on
unsupported devices uses string mode values.

---

Nitpick comments:
In `@tests/transforms/test_spatial_gpu_support.py`:
- Around line 26-34: The test test_non_cuda_device_always_supported is a
duplicate of test_cpu_device_always_supported; remove the redundant test or
replace it to cover other non-CUDA device types (e.g., create
torch.device("mps") and assert False from _compiled_unsupported(device) when MPS
is relevant), ensuring you keep one explicit CPU check
(test_cpu_device_always_supported) and either delete
test_non_cuda_device_always_supported or modify it to assert behavior for other
non-CUDA devices using the same helper _compiled_unsupported.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 4e55f54c-4323-4d1d-9f33-104f1e2e6279

📥 Commits

Reviewing files that changed from the base of the PR and between 356956a and 4b5bf1e.

📒 Files selected for processing (19)
  • monai/apps/auto3dseg/bundle_gen.py
  • monai/apps/detection/networks/retinanet_detector.py
  • monai/apps/detection/transforms/box_ops.py
  • monai/auto3dseg/analyzer.py
  • monai/bundle/scripts.py
  • monai/data/dataset.py
  • monai/handlers/utils.py
  • monai/metrics/utils.py
  • monai/networks/blocks/warp.py
  • monai/transforms/io/array.py
  • monai/transforms/spatial/array.py
  • monai/transforms/spatial/functional.py
  • monai/transforms/utility/array.py
  • tests/integration/test_loader_semaphore.py
  • tests/profile_subclass/profiling.py
  • tests/profile_subclass/pyspy_profiling.py
  • tests/transforms/croppad/test_pad_nd_dtypes.py
  • tests/transforms/test_spatial_gpu_support.py
  • versioneer.py
✅ Files skipped from review due to trivial changes (15)
  • tests/profile_subclass/pyspy_profiling.py
  • tests/integration/test_loader_semaphore.py
  • monai/transforms/io/array.py
  • tests/profile_subclass/profiling.py
  • tests/transforms/croppad/test_pad_nd_dtypes.py
  • monai/handlers/utils.py
  • versioneer.py
  • monai/metrics/utils.py
  • monai/auto3dseg/analyzer.py
  • monai/apps/detection/transforms/box_ops.py
  • monai/bundle/scripts.py
  • monai/apps/auto3dseg/bundle_gen.py
  • monai/apps/detection/networks/retinanet_detector.py
  • monai/data/dataset.py
  • monai/transforms/utility/array.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • monai/transforms/spatial/functional.py
  • monai/transforms/spatial/array.py

Signed-off-by: R. Garcia-Dias <rafaelagd@gmail.com>
I, R. Garcia-Dias <rafaelagd@gmail.com>, hereby add my Signed-off-by to this commit: 8de64af
I, R. Garcia-Dias <rafaelagd@gmail.com>, hereby add my Signed-off-by to this commit: 7c2ddb6
I, R. Garcia-Dias <rafaelagd@gmail.com>, hereby add my Signed-off-by to this commit: cfe5524
I, R. Garcia-Dias <rafaelagd@gmail.com>, hereby add my Signed-off-by to this commit: 1dec216
I, R. Garcia-Dias <rafaelagd@gmail.com>, hereby add my Signed-off-by to this commit: ec8bf1f

Signed-off-by: R. Garcia-Dias <rafaelagd@gmail.com>
Contributor

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
tests/transforms/test_spacing.py (1)

34-37: Missing docstring on helper function.

Per coding guidelines, definitions should have docstrings describing parameters and return values.

📝 Suggested docstring
 def _template_5_expected_output(device: torch.device) -> torch.Tensor:
+    """Return expected output tensor for template 5 based on device capabilities.
+
+    Args:
+        device: Target torch device to check for compiled extension support.
+
+    Returns:
+        _TEMPLATE_5_COMPILED if compiled path is active, _TEMPLATE_5_NATIVE otherwise.
+    """
     if USE_COMPILED and not _compiled_unsupported(device):
         return _TEMPLATE_5_COMPILED
     return _TEMPLATE_5_NATIVE
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/transforms/test_spacing.py` around lines 34 - 37, Add a docstring to
the helper function _template_5_expected_output that briefly describes what the
function returns, documents the device parameter (type: torch.device) and the
return value (torch.Tensor), and notes behavior differences when USE_COMPILED is
true and _compiled_unsupported(device) is checked; place the docstring
immediately under the def line for _template_5_expected_output.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 143d48d8-9830-4c5f-b6dc-584e8fedbffd

📥 Commits

Reviewing files that changed from the base of the PR and between 4b5bf1e and 057ff4d.

📒 Files selected for processing (7)
  • monai/apps/detection/transforms/dictionary.py
  • monai/apps/detection/utils/anchor_utils.py
  • monai/apps/reconstruction/transforms/array.py
  • monai/bundle/utils.py
  • monai/losses/dice.py
  • monai/losses/focal_loss.py
  • tests/transforms/test_spacing.py
✅ Files skipped from review due to trivial changes (6)
  • monai/bundle/utils.py
  • monai/losses/focal_loss.py
  • monai/apps/detection/utils/anchor_utils.py
  • monai/apps/reconstruction/transforms/array.py
  • monai/losses/dice.py
  • monai/apps/detection/transforms/dictionary.py

claude and others added 7 commits March 30, 2026 08:24
…GPUs

When USE_COMPILED=True, Warp.__init__ stores integer interpolation/padding
modes (e.g. 1, 0, 7) for grid_pull. However, when a Blackwell GPU triggers
the runtime fallback in forward(), F.grid_sample is called with those integers
instead of the required strings ("bilinear", "zeros", etc.), causing a crash.

Fix by always storing _interp_mode_native and _padding_mode_native as string
attributes in __init__, and using them exclusively in the F.grid_sample call.

Also clean up test_spatial_gpu_support.py:
- Remove duplicate test_non_cuda_device_always_supported (identical to cpu test)
- Implement test_resample_compilation_flag_respected using unittest.mock so it
  runs without a physical Blackwell device
- Remove duplicate if __name__ == "__main__" block

https://claude.ai/code/session_015SGxtVTnVKUHk7hWHLSRZK
Tests that hang indefinitely (e.g. GPU ops stuck on Blackwell) block the
entire suite. Add a --timeout SECONDS option to tests/runner.py that uses
SIGALRM to interrupt any individual test that exceeds the limit; the test is
recorded as an error and the runner continues with the next test.

- Default is 0 (disabled); SIGALRM support is checked at runtime so the flag
  is silently ignored on Windows.
- runtests.sh gains a matching --timeout [secs] flag (default 180s when the
  flag is given without a value) that is forwarded to runner.py for unit tests.

Usage:
  ./runtests.sh -u --timeout          # 3-minute per-test limit
  ./runtests.sh -u --timeout 60       # 1-minute per-test limit
  python tests/runner.py --timeout 180

https://claude.ai/code/session_015SGxtVTnVKUHk7hWHLSRZK
…-Apjwl' into 8587-test-erros-on-pytorch-release-2508-on-series-50
…handling for video writing

Signed-off-by: R. Garcia-Dias <rafaelagd@gmail.com>
…vent access errors

Signed-off-by: R. Garcia-Dias <rafaelagd@gmail.com>
…vent access errors

Signed-off-by: R. Garcia-Dias <rafaelagd@gmail.com>

DCO Remediation Commit for Claude <noreply@anthropic.com>

I, R. Garcia-Dias <rafaelagd@gmail.com>, hereby add my Signed-off-by to this commit: 2b5b367
I, R. Garcia-Dias <rafaelagd@gmail.com>, hereby add my Signed-off-by to this commit: 1b5ac46

Signed-off-by: R. Garcia-Dias <rafaelagd@gmail.com>
@garciadias garciadias force-pushed the 8587-test-erros-on-pytorch-release-2508-on-series-50 branch from 10d3e12 to 0297a48 Compare April 2, 2026 13:01
I, Claude <noreply@anthropic.com>, hereby add my Signed-off-by to this commit: 2b5b367
I, Claude <noreply@anthropic.com>, hereby add my Signed-off-by to this commit: 1b5ac46

Signed-off-by: Claude <noreply@anthropic.com>
@garciadias garciadias force-pushed the 8587-test-erros-on-pytorch-release-2508-on-series-50 branch from 0297a48 to 2feca34 Compare April 2, 2026 13:02
claude and others added 8 commits April 2, 2026 13:05
I, R. Garcia-Dias <rafaelagd@gmail.com>, hereby add my Signed-off-by to this commit: 2b5b367
I, R. Garcia-Dias <rafaelagd@gmail.com>, hereby add my Signed-off-by to this commit: 1b5ac46

Signed-off-by: R. Garcia-Dias <rafaelagd@gmail.com>
…-Apjwl' into 8587-test-erros-on-pytorch-release-2508-on-series-50
I, Claude <noreply@anthropic.com>, hereby add my Signed-off-by to this commit: 2b5b367
I, Claude <noreply@anthropic.com>, hereby add my Signed-off-by to this commit: 1b5ac46
I, Claude <noreply@anthropic.com>, hereby add my Signed-off-by to this commit: 0b887a3

Signed-off-by: Claude <noreply@anthropic.com>
…-Apjwl' into 8587-test-erros-on-pytorch-release-2508-on-series-50
I, R. Garcia-Dias <rafaelagd@gmail.com>, hereby add my Signed-off-by to this commit: 43dd636

Signed-off-by: R. Garcia-Dias <rafaelagd@gmail.com>


Development

Successfully merging this pull request may close these issues.

Test erros on PyTorch Release 25.08 on Series 50

2 participants