Dynamic Compute Adaptation

Chloros 1.1.0 introduces intelligent hardware detection and automatic processing strategy selection. The processing engine adapts to your hardware — from a Jetson Nano to a multi-GPU workstation — without any manual configuration.


How It Works

When Chloros starts, it automatically profiles your system:

  1. Detects the operating system — Windows or Linux

  2. Identifies CPU cores and total RAM

  3. Detects GPU presence — NVIDIA CUDA capability, VRAM, model

  4. Identifies Jetson model (if applicable) — via /proc/device-tree/model

  5. Checks thermal sensors (Jetson) — for temperature-aware processing

  6. Selects the optimal compute strategy — based on all detected hardware

  7. Configures worker count, pipeline type, and memory allocation automatically

The result is cached so subsequent runs start faster. If hardware changes (e.g., a GPU is added), Chloros re-profiles on the next launch.
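The first steps of the profiling pass can be sketched with the Python standard library. This is a minimal illustration, not Chloros's actual implementation: `HardwareProfile`, `read_jetson_model`, and `profile_system` are hypothetical names, GPU and thermal probing are left out, and the check for the substring "jetson" in the model string is an assumption about how the board identifies itself.

```python
import os
import platform
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

# Path named in step 4 above; only present on device-tree based Linux boards.
JETSON_MODEL_PATH = Path("/proc/device-tree/model")


def read_jetson_model(path: Path = JETSON_MODEL_PATH) -> Optional[str]:
    """Return the board model string if it looks like a Jetson, else None."""
    try:
        raw = path.read_bytes()
    except OSError:
        return None  # file missing, e.g. on Windows or a desktop Linux machine
    # Device-tree strings are NUL-terminated, so strip the trailing byte.
    model = raw.decode("utf-8", errors="replace").rstrip("\x00").strip()
    # Assumption: Jetson boards include "Jetson" in their model string.
    return model if "jetson" in model.lower() else None


@dataclass
class HardwareProfile:  # hypothetical container, not a Chloros type
    os_name: str                 # "Windows" or "Linux"
    cpu_cores: int
    jetson_model: Optional[str]


def profile_system(model_path: Path = JETSON_MODEL_PATH) -> HardwareProfile:
    """Collect the OS, CPU core count, and Jetson model (if any)."""
    return HardwareProfile(
        os_name=platform.system(),
        cpu_cores=os.cpu_count() or 1,  # cpu_count() may return None
        jetson_model=read_jetson_model(model_path),
    )


if __name__ == "__main__":
    print(profile_system())
```

Passing the model path as a parameter keeps the sketch testable on machines that are not Jetsons.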


Compute Strategies

Chloros selects one of three compute strategies based on your hardware:

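| Strategy     | GPU Required                            | Workers   | Pipeline     | Best For                                               |
|--------------|-----------------------------------------|-----------|--------------|--------------------------------------------------------|
| GPU_PARALLEL | Yes (12GB+ VRAM or 16GB+ shared memory) | 3-4       | fused_gpu    | Desktop GPUs with 12GB+, Jetson Orin NX 16GB, AGX Orin |
| GPU_SINGLE   | Yes (< 12GB VRAM)                       | 1-3       | tiled_gpu    | Entry-level GPUs, Jetson Nano, Orin Nano               |
| CPU_PARALLEL | No                                      | cores - 1 | cpu_fallback | Systems without NVIDIA GPU                             |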

Pipeline Types

  • fused_gpu — Full GPU processing path. All debayer, correction, and index operations run on the GPU in a single fused pass. Highest throughput but requires more VRAM.

  • tiled_gpu — Memory-efficient GPU path. Processes images in tiles to fit within limited GPU memory. Lower throughput but works on memory-constrained devices.

  • cpu_fallback — CPU-only processing using multi-threaded parallelism. Used when no NVIDIA GPU is available.
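The selection rules in the strategy table can be expressed as a small decision function. This is a sketch under stated assumptions rather than the engine's real logic: `ComputePlan` and `select_plan` are hypothetical names, the worker value is the upper bound of each range in the table, and for unified-memory devices the function expects the nominal capacity (16 for an Orin NX 16GB) rather than the slightly lower figure the device reports.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ComputePlan:  # hypothetical type for illustration
    strategy: str
    pipeline: str
    max_workers: int


def select_plan(gpu_memory_gb: Optional[float],
                unified_memory: bool,
                cpu_cores: int) -> ComputePlan:
    """Map detected hardware to a strategy, following the table above.

    gpu_memory_gb is None when no NVIDIA GPU is present.
    """
    if gpu_memory_gb is None:
        # No NVIDIA GPU: leave one core free for the rest of the system.
        return ComputePlan("CPU_PARALLEL", "cpu_fallback", max(1, cpu_cores - 1))

    # 12GB+ of dedicated VRAM, or 16GB+ when memory is shared with the CPU.
    threshold_gb = 16.0 if unified_memory else 12.0
    if gpu_memory_gb >= threshold_gb:
        return ComputePlan("GPU_PARALLEL", "fused_gpu", 4)
    return ComputePlan("GPU_SINGLE", "tiled_gpu", 3)


if __name__ == "__main__":
    print(select_plan(24.0, unified_memory=False, cpu_cores=16))
    print(select_plan(8.0, unified_memory=True, cpu_cores=6))
    print(select_plan(None, unified_memory=False, cpu_cores=8))
```

The actual worker count within each range depends on the platform, as the next section shows.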


Platform-Specific Behavior

| Platform               | Strategy     | Workers   | Pipeline               | Notes                                                  |
|------------------------|--------------|-----------|------------------------|--------------------------------------------------------|
| Jetson Nano 8GB        | GPU_SINGLE   | 1         | tiled_gpu (serialized) | Memory-efficient mode, processes one image at a time   |
| Jetson Orin NX 16GB    | GPU_PARALLEL | 3         | fused_gpu (concurrent) | Recommended edge device; real parallel GPU processing  |
| Jetson AGX Orin 64GB   | GPU_PARALLEL | 4         | fused_gpu (concurrent) | Maximum edge performance                               |
| Desktop with 8GB GPU   | GPU_SINGLE   | 3         | tiled_gpu              | Good desktop performance with memory-efficient tiles   |
| Desktop with 12GB+ GPU | GPU_PARALLEL | 3-4       | fused_gpu              | Optimal desktop performance                            |
| CPU-only system        | CPU_PARALLEL | cores - 1 | cpu_fallback           | No GPU required, uses ThreadPool                       |

Jetson unified memory: Jetson devices share GPU and CPU memory. A Jetson Orin NX 16GB reports ~15.3GB of VRAM, but this is the same physical RAM used by the OS and CPU processes. Chloros accounts for this when setting memory allocation thresholds.


Dynamic GPU Memory Allocation

Chloros uses a 4-thread processing pipeline:

  • Thread 1 (Detection) — Image loading, EXIF parsing, target detection

  • Thread 2 (Calibration) — Reflectance calibration computation

  • Thread 3 (Processing) — GPU debayer, vignette correction, index calculation

  • Thread 4 (Export) — File writing, metadata embedding

As earlier pipeline threads complete their work (e.g., all images have been detected), their GPU memory allocation is released and redistributed to the remaining active threads. This means Thread 3 (the GPU-intensive stage) gets progressively more memory as the pipeline advances, improving throughput for the most compute-intensive work.

Allocation Stages

| Stage     | Active Threads | GPU Memory Distribution             |
|-----------|----------------|-------------------------------------|
| Early     | 1, 2, 3, 4     | Split across all threads            |
| Mid-Early | 2, 3, 4        | Thread 1 memory redistributed       |
| Mid-Late  | 3, 4           | Threads 1+2 memory goes to 3+4      |
| Late      | 3 or 4         | Maximum memory for remaining thread |
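The effect of redistribution can be illustrated with a short calculation. The even split used here is an assumption made for the example; the text above only says memory is split across active threads, not in what proportion. `redistribute` is a hypothetical helper, and the 8192 MB budget is an arbitrary figure.

```python
from typing import Dict, Iterable

THREAD_NAMES = {1: "Detection", 2: "Calibration", 3: "Processing", 4: "Export"}


def redistribute(total_mb: float, active: Iterable[int]) -> Dict[int, float]:
    """Split the GPU budget evenly across the threads that are still active.

    Assumption: an even split. The real weighting per thread is not documented.
    """
    threads = sorted(set(active))
    if not threads:
        return {}
    share = total_mb / len(threads)
    return {thread: share for thread in threads}


if __name__ == "__main__":
    stages = [
        ("Early", [1, 2, 3, 4]),
        ("Mid-Early", [2, 3, 4]),
        ("Mid-Late", [3, 4]),
        ("Late", [3]),
    ]
    for stage, active in stages:
        budget = redistribute(8192, active)
        # Thread 3's share grows as earlier threads finish.
        print(f"{stage:<10} {THREAD_NAMES[3]}: {budget[3]:.0f} MB")
```

Under this assumption, Thread 3's share grows from a quarter of the budget in the Early stage to the whole budget in the Late stage.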


Texture Aware Processing

The Texture Aware debayer method (Chloros+ only) uses significantly more GPU memory than the Standard method due to the AI/ML denoising model:

  • Systems with < 7GB VRAM are forced into a synchronous processing loop for Texture Aware mode (one image at a time)

  • Systems with 7GB+ VRAM can process Texture Aware concurrently, though at reduced worker count compared to Standard
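A minimal sketch of the 7GB rule follows. `texture_aware_workers` is a hypothetical helper, and the `reduction` parameter is an assumption: the text says the worker count is reduced relative to Standard but does not say by how much.

```python
def texture_aware_workers(vram_gb: float,
                          standard_workers: int,
                          reduction: int = 1) -> int:
    """Return how many images Texture Aware mode may process at once."""
    if vram_gb < 7.0:
        # Below 7GB VRAM: synchronous loop, one image at a time.
        return 1
    # 7GB+: concurrent, but with fewer workers than the Standard method.
    # The size of the reduction is an illustrative assumption.
    return max(1, standard_workers - reduction)


if __name__ == "__main__":
    print(texture_aware_workers(6.0, standard_workers=3))
    print(texture_aware_workers(12.0, standard_workers=4))
```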


Thermal Management (Jetson)

Jetson devices have thermal constraints, especially in enclosed or airborne deployments. Chloros monitors GPU and CPU temperatures and automatically adjusts processing:

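| Temperature     | Response                                                   |
|-----------------|------------------------------------------------------------|
| < 70°C          | Normal operation, full speed                               |
| 70°C (Warning)  | Reduce batch size                                          |
| 80°C (Critical) | Aggressive throttling: lower concurrency and worker count  |
| 90°C (Shutdown) | Stop GPU processing entirely                               |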

Temperature monitoring uses tegrastats on Jetson platforms. On desktop systems with adequate cooling, thermal throttling is rarely triggered.
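The threshold table maps naturally onto a lookup function. The sketch below assumes each threshold applies at or above the stated temperature; `ThermalAction` and `thermal_action` are hypothetical names, and reading the temperature itself (for example, from tegrastats output) is left out.

```python
from enum import Enum


class ThermalAction(Enum):  # hypothetical names for the four responses
    NORMAL = "normal operation, full speed"
    REDUCE_BATCH = "reduce batch size"
    THROTTLE = "lower concurrency and worker count"
    STOP_GPU = "stop GPU processing entirely"


# Thresholds from the table above, highest first so the first match wins.
THRESHOLDS_C = [
    (90.0, ThermalAction.STOP_GPU),
    (80.0, ThermalAction.THROTTLE),
    (70.0, ThermalAction.REDUCE_BATCH),
]


def thermal_action(temperature_c: float) -> ThermalAction:
    """Pick the response for a temperature reading in degrees Celsius."""
    for threshold, action in THRESHOLDS_C:
        # Assumption: a threshold takes effect at or above its value.
        if temperature_c >= threshold:
            return action
    return ThermalAction.NORMAL


if __name__ == "__main__":
    for reading in (55.0, 72.5, 84.0, 91.0):
        print(reading, thermal_action(reading).name)
```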


Memory Pressure Handling

Chloros monitors system memory pressure during processing:

  • Memory threshold: 85% utilization triggers conservative behavior

  • OOM reduction: If an out-of-memory event occurs, allocation is reduced by 25% (0.75x multiplier)

  • Pipeline fallback: Under severe memory pressure, the pipeline falls back from fused_gpu to tiled_gpu automatically

  • Swap recommendations: On Jetson, Chloros warns you if swap space is insufficient for your dataset size
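The numeric rules above can be sketched as three small helpers. The function names are hypothetical, and what counts as "severe" pressure is left to the caller because the text does not define it. Only the 85% threshold, the 0.75x multiplier, and the fused_gpu to tiled_gpu fallback come from the list.

```python
MEMORY_THRESHOLD = 0.85  # utilization that triggers conservative behavior
OOM_MULTIPLIER = 0.75    # allocation is cut by 25% after an OOM event


def is_under_pressure(used_bytes: int, total_bytes: int) -> bool:
    """True once memory utilization reaches the 85% threshold."""
    return used_bytes / total_bytes >= MEMORY_THRESHOLD


def reduce_after_oom(allocation_mb: float) -> float:
    """Shrink an allocation by 25% after an out-of-memory event."""
    return allocation_mb * OOM_MULTIPLIER


def fallback_pipeline(current: str, severe_pressure: bool) -> str:
    """Drop from fused_gpu to tiled_gpu under severe memory pressure."""
    if severe_pressure and current == "fused_gpu":
        return "tiled_gpu"
    return current


if __name__ == "__main__":
    allocation = 4096.0
    for _ in range(2):  # two consecutive OOM events
        allocation = reduce_after_oom(allocation)
    print(allocation)  # 4096 -> 3072 -> 2304
    print(fallback_pipeline("fused_gpu", severe_pressure=True))
```

Because the multiplier is applied per event, repeated OOM events shrink the allocation geometrically rather than by a fixed amount.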


Monitoring Compute Adaptation

CLI Status Output

When processing starts, the CLI displays the detected hardware profile.

System Diagnostics

Run chloros-cli selftest to see a full hardware profile and verify compute capabilities. The self-test checks CUDA availability, GPU memory, denoiser models, and backend connectivity.

