Unity Job System: Leveraging Multiple CPU Cores
Getting Started with Unity’s Job System
The Unity Job System lets Unity games and applications run work on several threads at once, so they can make use of every core on a multicore machine. Spreading work across worker threads keeps the cores busy and stops the main thread from becoming the bottleneck. Some key benefits of leveraging the Unity Job System include:
- Increased performance through parallelization
- More efficient use of multiple cores and threads
- Asynchronous non-blocking operations
- Simplified multithreaded programming
Setting up and running jobs in Unity involves identifying key areas of your game logic, like physics, AI, or gameplay systems, that can benefit from parallelization across threads. These areas then need to be implemented using Unity’s job interfaces and data patterns. The main building blocks include:
- IJob – A single job that runs once on a worker thread
- IJobParallelFor – A parallel for loop that runs Execute once per index across worker threads
- IJobChunk – A job from the Entities package that iterates over chunks of entity data
- JobHandle – A handle returned by scheduling, used to express dependencies and wait for completion
Systems and logic wrapped in these job interfaces get the behind-the-scenes scheduling and thread management handled by the job system. It takes care of splitting workloads efficiently across threads and enables easy parallelism.
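As a minimal sketch of that workflow, the example below schedules a single IJob and waits for its result. The names AddJob, AddJobExample, and Compute are hypothetical and chosen only for illustration; the Unity calls used are Schedule, Complete, and NativeArray.

```csharp
using Unity.Collections;
using Unity.Jobs;

// Hypothetical job: adds two numbers on a worker thread.
public struct AddJob : IJob
{
    public float A;
    public float B;

    // A job cannot return a value, so the result is written into a native container.
    public NativeArray<float> Result;

    public void Execute()
    {
        Result[0] = A + B;
    }
}

public static class AddJobExample
{
    // Hypothetical helper: schedules the job, waits for it, and returns the sum.
    public static float Compute(float a, float b)
    {
        var result = new NativeArray<float>(1, Allocator.TempJob);
        try
        {
            var job = new AddJob { A = a, B = b, Result = result };

            // Schedule() queues the job for a worker thread and returns immediately.
            JobHandle handle = job.Schedule();

            // The main thread is free to do other work here.

            // Complete() blocks until the job has finished.
            handle.Complete();
            return result[0];
        }
        finally
        {
            // Native containers are not garbage collected and must be disposed.
            result.Dispose();
        }
    }
}
```

In a real game the call to Complete() would usually be delayed until the result is needed, so the job has time to run alongside other main-thread work.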
Understanding Unity’s Job System Architecture
Under the hood, the Unity Job System is implemented in Unity’s native engine code on top of the platform’s threading APIs and is exposed to scripts through a C# API. The main thread runs the standard Unity game loop and is where jobs are scheduled. A pool of worker threads, typically sized to the number of logical CPU cores, picks up scheduled jobs and runs them in parallel.
Data is shared between the main thread and jobs through native containers such as NativeArray, which wrap unmanaged memory rather than garbage-collected objects. A job struct is copied when it is scheduled, so native containers are how results get back to the main thread. Read/write access to shared data must be declared explicitly, for example with the [ReadOnly] attribute, so that Unity’s safety system can detect potential race conditions between threads.
Some key aspects of the architecture include:
- Main thread – Runs core application logic
- Worker threads – Run scheduled jobs
- Job queue – Holds scheduled jobs until a worker thread is free to run them
- NativeArrays – Unmanaged data containers shared between the main thread and jobs
- Burst compiler – Compiles job code to highly optimized native code, including SIMD instructions
By understanding this underlying architecture and avoiding shared mutable state, highly parallelized systems can be built to fully leverage multicore CPU power without introducing race conditions or deadlocks.
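The access declarations described above can be sketched as follows. ScaleJob, ScaleExample, and Scale are hypothetical names used only for this illustration; the point is the [ReadOnly] marker on the input and the separate output container.

```csharp
using Unity.Collections;
using Unity.Jobs;

// Hypothetical job: multiplies every input value by a factor.
public struct ScaleJob : IJob
{
    // Declared read-only, so several jobs may read this container at the same time.
    [ReadOnly] public NativeArray<float> Input;

    // Not marked read-only, so this job may write to it.
    public NativeArray<float> Output;

    public float Factor;

    public void Execute()
    {
        for (int i = 0; i < Input.Length; i++)
        {
            Output[i] = Input[i] * Factor;
        }
    }
}

public static class ScaleExample
{
    // Hypothetical helper: copies managed data in, runs the job, copies the result out.
    public static float[] Scale(float[] values, float factor)
    {
        var input = new NativeArray<float>(values, Allocator.TempJob);
        var output = new NativeArray<float>(values.Length, Allocator.TempJob);
        try
        {
            var job = new ScaleJob { Input = input, Output = output, Factor = factor };
            JobHandle handle = job.Schedule();

            // Reading 'output' here, before Complete(), is reported as an error
            // by the safety system in the Editor.
            handle.Complete();

            return output.ToArray();
        }
        finally
        {
            input.Dispose();
            output.Dispose();
        }
    }
}
```

Keeping inputs and outputs in separate containers is one simple way to avoid shared mutable state between jobs.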
Implementing Parallel For Jobs
A very common type of job in Unity is a parallel for loop, which iterates over a large set of data or calculations in parallel. This is handled via the IJobParallelFor interface and JobHandle struct for scheduling.
Key aspects when implementing an IJobParallelFor include:
- Identifying independent, parallelizable calculations
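- Choosing a batch size, which controls how many iterations a worker thread takes at a time
- Copying input and output data into NativeArrays
- Handling job scheduling and dependencies
- Completing jobs and disposing of their containers (a scheduled job cannot be cancelled, so any early exit has to be built into the job itself)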
For example, applying physics to a large number of game objects like projectiles can be parallelized by treating each projectile’s update as an independent iteration. The job scheduler splits the index range into batches and distributes those batches across the worker threads.
Care must be taken, however, to balance the overall workload, avoid race conditions when accessing data, and manage job lifecycles via JobHandles. Well-structured parallel for jobs can dramatically increase the performance of systems in a Unity application.
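The projectile case could look roughly like the sketch below. MoveProjectilesJob, ProjectileExample, and Step are hypothetical names, and the batch size of 64 is an arbitrary starting point rather than a recommended value.

```csharp
using Unity.Collections;
using Unity.Jobs;
using UnityEngine;

// Hypothetical job: advances each projectile by its velocity.
public struct MoveProjectilesJob : IJobParallelFor
{
    [ReadOnly] public NativeArray<Vector3> Velocities;
    public NativeArray<Vector3> Positions;
    public float DeltaTime;

    // Called once per index; each call only touches its own element.
    public void Execute(int index)
    {
        Positions[index] = Positions[index] + Velocities[index] * DeltaTime;
    }
}

public static class ProjectileExample
{
    // Hypothetical helper: runs one simulation step and returns the new positions.
    public static Vector3[] Step(Vector3[] positions, Vector3[] velocities, float deltaTime)
    {
        var nativePositions = new NativeArray<Vector3>(positions, Allocator.TempJob);
        var nativeVelocities = new NativeArray<Vector3>(velocities, Allocator.TempJob);
        try
        {
            var job = new MoveProjectilesJob
            {
                Positions = nativePositions,
                Velocities = nativeVelocities,
                DeltaTime = deltaTime
            };

            // First argument: number of iterations. Second argument: batch size,
            // the number of iterations a worker processes at a time.
            JobHandle handle = job.Schedule(nativePositions.Length, 64);
            handle.Complete();

            return nativePositions.ToArray();
        }
        finally
        {
            nativePositions.Dispose();
            nativeVelocities.Dispose();
        }
    }
}
```

A smaller batch size tends to spread work more evenly at the cost of more scheduling overhead, so the right value is best found by profiling.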
Leveraging the Burst Compiler
The Burst compiler, which ships as a package in Unity’s Data-Oriented Technology Stack, is a specialized LLVM-based compiler that produces highly optimized native code from C# jobs. It performs advanced optimizations such as loop unrolling, function inlining, and SIMD vectorization.
Burst is enabled by adding the [BurstCompile] attribute to a job struct. The job’s code has to stay within the subset of C# that Burst supports, which broadly means structs and unmanaged data rather than managed objects. Advanced options allow fine-grained control of optimizations and compilation targets.
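A minimal sketch of a Burst-compiled job is shown below. VectorLengthJob, BurstExample, and Lengths are hypothetical names, and the example assumes the Burst and Mathematics packages are installed in the project.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using Unity.Mathematics;

// Hypothetical job: computes the length of each vector.
// The attribute also accepts options, for example
// [BurstCompile(FloatPrecision.Standard, FloatMode.Fast)].
[BurstCompile]
public struct VectorLengthJob : IJobParallelFor
{
    [ReadOnly] public NativeArray<float3> Vectors;
    [WriteOnly] public NativeArray<float> Lengths;

    public void Execute(int index)
    {
        // Unity.Mathematics types and functions are designed to vectorize well under Burst.
        Lengths[index] = math.length(Vectors[index]);
    }
}

public static class BurstExample
{
    // Hypothetical helper: runs the job over a managed array and returns the lengths.
    public static float[] Lengths(float3[] vectors)
    {
        var input = new NativeArray<float3>(vectors, Allocator.TempJob);
        var output = new NativeArray<float>(vectors.Length, Allocator.TempJob);
        try
        {
            var job = new VectorLengthJob { Vectors = input, Lengths = output };
            job.Schedule(input.Length, 64).Complete();
            return output.ToArray();
        }
        finally
        {
            input.Dispose();
            output.Dispose();
        }
    }
}
```

The job body is the same code that would run without Burst; only the attribute changes how it is compiled.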
Key benefits when leveraging Burst compilation for job code include:
- Faster per-thread code – Compounds with the parallelism the job system provides
- Increased efficiency – More work done per processor cycle
- SIMD vectorization – Automatic use of SSE/AVX instructions where loops allow it
- No garbage collection pressure – Job code works on structs and unmanaged memory instead of managed heap objects
By enabling Burst compilation alongside parallel for jobs and native containers, Unity applications can take much better advantage of modern CPU architecture, with speedups that can reach an order of magnitude in suitable systems and logic.
Avoiding Common Pitfalls
While the job system is immensely powerful, there are some common pitfalls to avoid when implementing performant and safe jobified systems in Unity.
Debugging crashes and errors in multithreaded code can be challenging. Some tips include:
- Validate jobified code sequentially first, for example by calling Run() instead of Schedule()
- Use the Unity profiler to locate hotspots
- Implement stringent error handling logic
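- Treat exceptions thrown inside jobs with care: they are raised on worker threads rather than the main thread, and Burst-compiled code has limited exception support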
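One way to apply the first tip is to run the same job both sequentially and in parallel and compare the results. The sketch below assumes hypothetical names SquareJob, SquareValidation, and Compute; Run executes the job immediately on the calling thread, while Schedule hands it to the worker threads.

```csharp
using Unity.Collections;
using Unity.Jobs;

// Hypothetical job: squares each input value.
public struct SquareJob : IJobParallelFor
{
    [ReadOnly] public NativeArray<int> Input;
    public NativeArray<int> Output;

    public void Execute(int index)
    {
        Output[index] = Input[index] * Input[index];
    }
}

public static class SquareValidation
{
    // Hypothetical helper: runs the job either in parallel or sequentially.
    public static int[] Compute(int[] values, bool parallel)
    {
        var input = new NativeArray<int>(values, Allocator.TempJob);
        var output = new NativeArray<int>(values.Length, Allocator.TempJob);
        try
        {
            var job = new SquareJob { Input = input, Output = output };
            if (parallel)
            {
                // Distributed across worker threads in batches of 32 iterations.
                job.Schedule(input.Length, 32).Complete();
            }
            else
            {
                // Executes every iteration immediately on the calling thread,
                // which makes stepping through with a debugger straightforward.
                job.Run(input.Length);
            }
            return output.ToArray();
        }
        finally
        {
            input.Dispose();
            output.Dispose();
        }
    }
}
```

If the two paths ever disagree, the difference points at a threading problem rather than a logic error in the job body.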
Performance bottlenecks can also sneak in, usually related to sub-optimal workload distribution. Ensure an even load balance across threads, minimize shared data access, and keep job data in native containers such as NativeArray.
Finally, incorrect read/write patterns can introduce hard-to-diagnose data races. Operate on independent data whenever possible, mark inputs with [ReadOnly], and express ordering between jobs through JobHandle dependencies rather than relying on timing.
Putting it All Together
When implemented correctly, the Unity Job System can dramatically speed up and parallelize key game systems, including:
- Physics – Parallel rigidbody calculations
- AI – Pathfinding, decision making logic
- Gameplay – Parallel systems managing enemies, projectiles etc.
These jobified systems are ordered relative to each other and to the rest of the frame through JobHandle dependencies. This enables fine-grained control over the order of operations in a performant, asynchronous manner.
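Dependency ordering of this kind can be sketched with two jobs, where the second must not start until the first has finished. FillJob, SumJob, DependencyExample, and SumOfFirst are hypothetical names used only for illustration.

```csharp
using Unity.Collections;
using Unity.Jobs;

// Hypothetical first stage: writes 1..n into the array in parallel.
public struct FillJob : IJobParallelFor
{
    public NativeArray<int> Values;

    public void Execute(int index)
    {
        Values[index] = index + 1;
    }
}

// Hypothetical second stage: sums the values produced by the first stage.
public struct SumJob : IJob
{
    [ReadOnly] public NativeArray<int> Values;
    public NativeArray<int> Total;

    public void Execute()
    {
        int sum = 0;
        for (int i = 0; i < Values.Length; i++)
        {
            sum += Values[i];
        }
        Total[0] = sum;
    }
}

public static class DependencyExample
{
    // Hypothetical helper: returns 1 + 2 + ... + n using two chained jobs.
    public static int SumOfFirst(int n)
    {
        var values = new NativeArray<int>(n, Allocator.TempJob);
        var total = new NativeArray<int>(1, Allocator.TempJob);
        try
        {
            JobHandle fillHandle = new FillJob { Values = values }.Schedule(n, 64);

            // Passing fillHandle makes the sum job wait until the fill job is done.
            JobHandle sumHandle = new SumJob { Values = values, Total = total }
                .Schedule(fillHandle);

            // Completing the last handle also completes everything it depends on.
            sumHandle.Complete();
            return total[0];
        }
        finally
        {
            values.Dispose();
            total.Dispose();
        }
    }
}
```

When a job depends on several others, JobHandle.CombineDependencies can merge their handles into a single one to pass to Schedule.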
For optimizing workflows, build Unity projects with the Burst compiler enabled from the start. Identify performance hotspots and implement parallelism via performant jobified code as early as possible. Profile regularly and aim for an even workload distribution to maximize utilization of all CPU cores and threads.
By following best practices and fully leveraging Unity’s Job System, previously CPU-bound games can achieve new levels of performance and responsiveness, enabling more enemies, more physics objects, and richer gameplay than was previously possible.