Dask Performance and Best Practices#
Overview#
nima leverages xarray and dask to support large bioimage datasets that do not fit in memory. By using lazy evaluation and chunked arrays, nima can process multi-terabyte datasets efficiently, provided that individual 2D planes (Y, X) fit in memory.
Chunking Strategy#
The recommended chunking strategy for nima is to parallelize across the Time (T), Channel (C), and Z-stack (Z) dimensions, while keeping the spatial dimensions (Y, X) contiguous.
Recommended Chunks#
# Optimal for nima workflows
chunks = {
"T": 1,
"C": 1,
"Z": 1,
"Y": -1, # Full size
"X": -1 # Full size
}
Why Y/X must be contiguous?#
Core operations in nima such as median filtering and segment (thresholding, watershed) are inherently 2D spatial operations.
Median Filter: Uses
scipy.ndimage.median_filter. Splitting Y/X into chunks would require complex overlap handling (map_overlap) to avoid edge artifacts. Currently,nimaprocesses full 2D planes.Segmentation: Thresholding methods (Yen, Li) and labeling (connected components) require global context of the 2D plane.
Performance Benchmarks#
Benchmarks were run on a standard workstation with the following parameters:
Array Size: T=10, C=3, Y=2048, X=2048 (Float64)
Total Data: ~1.2 Billion pixels (~9.6 GB raw)
Operation: Compute single timepoint (T=1)
Operation |
Time (per T=1) |
Throughput (approx) |
Notes |
|---|---|---|---|
Shading Correction |
~0.06s |
~1.6 GB/s |
Fully parallel, limited by memory bandwidth. |
Median Filter |
~0.13s |
~250 MB/s |
CPU-bound spatial convolution (3x3 disk). |
Note: Benchmarks vary by hardware.
Limitations#
Memory Usage: The largest single 2D plane (C, Y, X) or (Y, X) must fit in memory. For a 2k x 2k float64 image, this is small (~32MB). For very large stitched images (e.g., 50k x 50k), this approach will hit OOM errors.
Global Thresholding: Calculating thresholds requires computing histograms of the full 2D plane.
Best Practices#
Loading Data: When using
nima.io.read_image, dask arrays are created automatically if supported by the backend.Processing: Pipelines are built lazily. Use
.compute()only when you need the final result (e.g., plotting, saving).Saving: Use
to_netcdforto_zarrfor efficient parallel writing of results.