
fixing NDArray save method to use its own compression parameters #601

Merged
FrancescAlted merged 1 commit into Blosc:main from evgmik:fix-ndarray-save on Mar 14, 2026

@evgmik (Contributor) commented on Mar 13, 2026

The 'NDArray.save' method does not pass 'cparams' to the parent class's 'copy' method; consequently, the array is recompressed with the default 'cparams'.
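The failure mode is a keyword argument that a subclass method does not forward to its parent. Below is a minimal pure-Python sketch of that pattern. 'Params', 'BaseArray', 'BuggyArray' and 'FixedArray' are hypothetical names with illustrative default values; the sketch does no compression or file I/O and is not blosc2's actual code.

```python
from dataclasses import dataclass


# Hypothetical stand-in for a set of compression parameters; the default
# values are illustrative only and are not taken from blosc2.
@dataclass(frozen=True)
class Params:
    codec: str = "ZSTD"
    clevel: int = 1
    filters: tuple = ("SHUFFLE",)


class BaseArray:
    def __init__(self, data, urlpath=None, cparams=None):
        self.data = data
        self.urlpath = urlpath  # where the copy would be stored (no real I/O here)
        # Any caller that omits cparams silently gets the defaults.
        self.cparams = cparams if cparams is not None else Params()

    def copy(self, urlpath=None, cparams=None):
        return BaseArray(self.data, urlpath=urlpath, cparams=cparams)


class BuggyArray(BaseArray):
    def save(self, urlpath):
        # Bug: self.cparams is not forwarded, so the copy falls back to Params().
        return super().copy(urlpath=urlpath)


class FixedArray(BaseArray):
    def save(self, urlpath):
        # Fix: forward the array's own compression parameters.
        return super().copy(urlpath=urlpath, cparams=self.cparams)


user = Params(codec="ZSTD", clevel=9, filters=("BITSHUFFLE",))
print(BuggyArray([1, 2, 3], cparams=user).save("save.b2nd").cparams == user)  # False
print(FixedArray([1, 2, 3], cparams=user).save("save.b2nd").cparams == user)  # True
```

The buggy variant only "works" when the caller happened to use the defaults, which is why the problem shows up as a size regression rather than an error.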

To observe the bug, run the script below.

~~~~~python
import os

import blosc2
import numpy as np

a = np.arange(10_000_000)
a = a * a

# Non-default compression parameters for the in-memory array.
cparams = blosc2.CParams(
    codec=blosc2.Codec.ZSTD,
    clevel=9,
    filters=[blosc2.Filter.BITSHUFFLE],
)

ba = blosc2.asarray(a, cparams=cparams)
print(f"Blosc2 memory: size {ba.cbytes}\t cratio: {ba.cratio}")

outdir = "cache/"
os.makedirs(outdir, exist_ok=True)  # make sure the output directory exists
prefix = "save"
fname = outdir + prefix + ".b2nd"
# Before the fix, this wrote the data recompressed with the default cparams.
ba.save(fname, mode="w")
fsize = os.path.getsize(fname)
print(f"Blosc2 array save:\t saved file size = {fsize}\t cratio: {a.nbytes/fsize}")
~~~~~

You should see:

~~~~~
Blosc2 memory: size 4370284      cratio: 18.74477722729232
Blosc2 array save:       saved file size = 12370369      cratio: 6.467066584675041
~~~~~

ba.cbytes ---> 4370284
a.nbytes / ba.cbytes ---> 18.3

That is, the in-memory array has 'cbytes=4370284', but the saved file is 12370369 bytes, which gives a 'cratio' of about 6.5. Moreover, if the array is loaded back from the file, it is easy to see that its 'cparams' differ from those of the original array: they have been reset to the defaults.
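The ratios quoted above follow directly from the reported sizes. The sketch below recomputes them, assuming 'a' holds 10,000,000 int64 values of 8 bytes each (80,000,000 bytes), which is consistent with the printed ratios.

```python
# Sizes reported in the bug description; nbytes assumes 8-byte int64 elements.
nbytes = 10_000_000 * 8          # uncompressed size of 'a', 80_000_000 bytes
cbytes_in_memory = 4_370_284     # ba.cbytes
file_size_before_fix = 12_370_369


def cratio(uncompressed: int, compressed: int) -> float:
    """Compression ratio: uncompressed size divided by compressed size."""
    return uncompressed / compressed


print(round(cratio(nbytes, file_size_before_fix), 3))     # 6.467 (saved file)
print(round(cratio(nbytes, cbytes_in_memory), 1))         # 18.3 (in memory)
print(round(file_size_before_fix / cbytes_in_memory, 2))  # 2.83 (file vs. memory)
```

So, before the fix, the file written by 'save' is roughly 2.8 times larger than the compressed array it came from.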

After the patch, the in-memory 'cbytes' closely match the saved file size, and the saved array keeps the original 'cparams':

~~~~~
Blosc2 memory: size 4370284      cratio: 18.74477722729232
Blosc2 array save:       saved file size = 4370051       cratio: 18.30642251085856
~~~~~
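One way to turn this observation into a regression check is to require the saved file size to stay close to the in-memory 'cbytes'. The function name and the 1% tolerance below are assumptions chosen for illustration; they are not part of the patch or of blosc2.

```python
def save_preserves_compression(cbytes: int, file_size: int, rel_tol: float = 0.01) -> bool:
    """Return True if file_size is within rel_tol (a fraction) of cbytes.

    The 1% default tolerance is an arbitrary assumption that allows for small
    differences between the in-memory and on-disk sizes.
    """
    return abs(file_size - cbytes) <= rel_tol * cbytes


print(save_preserves_compression(4_370_284, 12_370_369))  # False: before the patch
print(save_preserves_compression(4_370_284, 4_370_051))   # True: after the patch
```

In a real test, 'cbytes' would come from the in-memory array and 'file_size' from 'os.path.getsize', as in the script above.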

@FrancescAlted (Member)

LGTM. Thanks @evgmik !

FrancescAlted merged commit 1123f14 into Blosc:main on Mar 14, 2026
12 of 17 checks passed

2 participants