Spaces:
Running on Zero
The purpose of these changes is to stop LoRA adapters from accumulating in GPU memory forever.
#17
by RahulRathod7 - opened
Before:
- Every newly selected LoRA stayed loaded for the lifetime of the app.
- VRAM usage could keep growing as users tried more adapters.
- Because the pipeline is global, concurrent requests could also interfere with adapter switching.
After:
- The app keeps only a small number of recently used LoRAs in memory.
- When the cache is full, it evicts the least recently used adapter before loading a new one.
- A lock ensures adapter load/evict/switch operations don’t race with inference.
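The eviction and locking behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `LoraLRUCache`, `run_with_adapter`, and the `load`/`unload` callbacks are hypothetical names. In a diffusers-based app the callbacks would presumably wrap the pipeline's own adapter loading and deletion calls.

```python
import threading
from collections import OrderedDict
from typing import Callable, List


class LoraLRUCache:
    """Tracks resident adapters and evicts the least recently used one.

    Hypothetical helper for illustration; `load` and `unload` stand in
    for whatever the app does to attach or remove an adapter.
    """

    def __init__(self, max_cached: int,
                 load: Callable[[str], None],
                 unload: Callable[[str], None]) -> None:
        if max_cached < 1:
            raise ValueError("max_cached must be at least 1")
        self.max_cached = max_cached
        self._load = load
        self._unload = unload
        # Keys are adapter names, ordered from least to most recently used.
        self._loaded: "OrderedDict[str, None]" = OrderedDict()
        # One lock shared by adapter switching and inference.
        self.lock = threading.Lock()

    def activate(self, name: str) -> None:
        """Make `name` resident. The caller must hold `self.lock`."""
        if name in self._loaded:
            # Cache hit: mark as most recently used, nothing to load.
            self._loaded.move_to_end(name)
            return
        # Cache full: evict from the least recently used end first.
        while len(self._loaded) >= self.max_cached:
            oldest, _ = self._loaded.popitem(last=False)
            self._unload(oldest)
        self._load(name)
        self._loaded[name] = None

    def loaded(self) -> List[str]:
        """Resident adapter names, least recently used first."""
        return list(self._loaded)


def run_with_adapter(cache: LoraLRUCache, name: str,
                     infer: Callable[[str], str]) -> str:
    # Holding the lock across both the switch and the inference call keeps
    # another request from evicting or swapping the adapter mid-generation.
    with cache.lock:
        cache.activate(name)
        return infer(name)
```

Holding a single lock for the whole request serializes inference on the shared pipeline. That trades throughput for safety, which seems consistent with the goal stated here of avoiding shared-pipeline corruption.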
So the practical goals are:
- lower long-session VRAM growth
- make the app more stable under repeated adapter switching
- reduce the chance of OOMs and shared-pipeline corruption
You can control the cache size with MAX_CACHED_LORAS.
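How `MAX_CACHED_LORAS` is read is not shown here. As a sketch, assuming it is supplied as an environment variable (it may equally be a module-level constant), a defensive parse could look like this. The helper name `read_max_cached_loras` and the fallback value of 3 are illustrative assumptions, not values taken from this PR.

```python
import os


def read_max_cached_loras(default: int = 3) -> int:
    """Parse MAX_CACHED_LORAS from the environment.

    Assumption: the setting is an environment variable. The default of 3
    is a placeholder chosen for this example only.
    """
    raw = os.environ.get("MAX_CACHED_LORAS", "")
    try:
        value = int(raw)
    except ValueError:
        # Unset or non-numeric: fall back to the default.
        return default
    # A cache smaller than one adapter could never hold the active LoRA.
    return max(1, value)
```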