Commit e29f012

Update pkgimage-loading-performance.md
1 parent 3bcd7a9 commit e29f012

1 file changed: +42 -2 lines changed

doc/src/devdocs/pkgimage-loading-performance.md

Lines changed: 42 additions & 2 deletions
@@ -427,9 +427,49 @@ Potential future approaches:
427427
| Refactor `jl_codegen_params_t` for explicit lock management | Enable shared context | Very High | Requires careful analysis of all shared state |
428428
| Pipeline parallelism (inference ‖ codegen ‖ LLVM opt) | Better utilization | High | Data dependencies, buffering |
429429
| Batch codegen at module granularity | Coarser parallelism | Medium | Load balancing |
430-
| Parallel LLVM optimization (already done) | ✅ Already parallelized | - | 3 threads used for native code gen |
430+
| Parallel LLVM optimization (already done) | ✅ Already parallelized | - | Variable threads based on module size |
431431

432-
**Current status:** Parallel codegen disabled. The native code generation phase already uses 3 threads (visible in debug output), so some parallelism exists in the later stages.
432+
**Current status:** Parallel codegen disabled. The native code generation phase uses parallel LLVM optimization (visible in debug output as "threads: N"), with thread count determined dynamically based on module complexity.
433+
434+
### Thread Pool Coordination
435+
436+
When multiple Julia processes run parallel precompilation (e.g., `Pkg.precompile()`), each process independently decides how many threads to use for native code generation. This can lead to thread oversubscription on the system.
437+
438+
A thread pool mechanism allows coordination across processes:
439+
440+
**Location:** The thread pool file is stored at `DEPOT_PATH[1]/compiled/threadpool` (derived automatically from the precompilation output path).
441+
442+
**Environment Variables:**
443+
444+
- `JULIA_IMAGE_THREAD_POOL`: Set to `0` to disable cross-process thread coordination. Enabled by default.
445+
- `JULIA_IMAGE_THREAD_POOL_SIZE`: Maximum threads in the pool (default: number of CPU threads). This limits total threads used across all concurrent precompilation workers.
446+
447+
**Usage Example:**
448+
449+
```bash
# Limit thread pool to 8 threads across all workers
export JULIA_IMAGE_THREAD_POOL_SIZE=8
julia -e "using Pkg; Pkg.precompile()"

# Disable thread pool coordination (each worker chooses its own thread count)
export JULIA_IMAGE_THREAD_POOL=0
julia -e "using Pkg; Pkg.precompile()"
```
458+
459+
**How it works:**
460+
461+
1. Thread acquisition happens just before LLVM parallel optimization begins (not at precompilation start)
462+
2. Each worker acquires up to its desired threads from the pool, waiting if necessary
463+
3. Threads are released immediately after LLVM optimization completes
464+
4. This just-in-time approach minimizes lock contention since native code gen is only ~24% of total precompilation time
465+
5. The pool file is automatically located at `~/.julia/compiled/threadpool` (or equivalent depot path)
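
The acquire/release accounting in the steps above can be illustrated with a small shell sketch. This is not Julia's implementation: the `mkdir`-based lock, the plain-text counter file, and the names `POOL_FILE`, `POOL_SIZE`, `pool_lock`, `pool_unlock`, `pool_in_use`, `pool_acquire`, and `pool_release` are all hypothetical. It also assumes that "up to its desired threads" means a worker takes whatever is free (at most what it asked for) and waits only when the pool is fully exhausted.

```bash
#!/bin/sh
# Hypothetical sketch only: a file-backed "threads in use" counter guarded by
# a mkdir lock. None of these names or file formats come from Julia itself.
POOL_DIR=$(mktemp -d)
POOL_FILE="$POOL_DIR/threadpool"
POOL_SIZE=8

# mkdir is atomic, so only one process at a time can create the lock directory.
pool_lock()   { until mkdir "$POOL_FILE.lock" 2>/dev/null; do sleep 1; done; }
pool_unlock() { rmdir "$POOL_FILE.lock"; }

# Print the number of threads currently recorded as in use (0 if no file yet).
pool_in_use() { if [ -f "$POOL_FILE" ]; then cat "$POOL_FILE"; else echo 0; fi; }

# Acquire up to $1 threads; prints how many were granted.
pool_acquire() {
    want=$1
    while :; do
        pool_lock
        used=$(pool_in_use)
        free=$((POOL_SIZE - used))
        if [ "$free" -gt 0 ]; then
            got=$want
            if [ "$got" -gt "$free" ]; then got=$free; fi
            echo "$((used + got))" > "$POOL_FILE"
            pool_unlock
            echo "$got"
            return 0
        fi
        pool_unlock
        sleep 1   # pool exhausted: wait, then retry
    done
}

# Return $1 threads to the pool.
pool_release() {
    pool_lock
    used=$(pool_in_use)
    echo "$((used - $1))" > "$POOL_FILE"
    pool_unlock
}

echo 2 > "$POOL_FILE"            # pretend another worker already holds 2 threads
got=$(pool_acquire 3)            # call this just before the parallel phase
echo "acquired $got threads ($(pool_in_use) in use, pool size $POOL_SIZE)"
pool_release "$got"              # release as soon as the parallel phase ends
```

Holding the lock only while the counter is read and updated, rather than for the whole optimization phase, is what keeps contention low in a design like this; the threads themselves are reserved only between `pool_acquire` and `pool_release`.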
466+
467+
**Debug output:** When `JL_DEBUG_SAVING` is enabled, thread pool operations are logged:
468+
469+
```text
[pkgsave] thread pool: acquired 3 threads (2 -> 5 in use, pool size 8)
[pkgsave] thread pool: released 3 threads (5 -> 2 in use)
```
433473

434474
## Performance Characteristics
435475
