1 core, 2 cores, 4 cores, 8 cores, 16 cores! Who's got more? The more cores, the better. The Unity Job System will squeeze them all, right? Yeah, probably. But what if Unity were over-milking your CPU? Would that be a problem?
A word of warning...
This is a niche post. Chances are, you'll never run into this kind of trouble.
But still, you might be somewhat curious about the Unity Job System. And performance in general.
I'll tackle some interesting points in this article. Will you join me?
Types of Performance Bottlenecks
Another word of warning: don't make eye contact with the following performance bottlenecks. If you do, they'll find your game. And you won't be happy to meet them.
When we talk about performance, game developers tend to classify your game into three categories:
- No bottlenecks: your game runs smoothly on your target device. No further action required.
- Under-performant: your game doesn't reach the target frame-rate and players are upset. Two main reasons:
  - GPU-bound: the Graphics Processing Unit is overwhelmed and slows down your CPU. Typical reasons: overdraw, fragment shader complexity, polygon count, etc.
  - CPU-bound: the Central Processing Unit is upset at the amount of work you're giving it, so your CPU is slowing down your GPU. This often happens because of slow scripts, a high number of draw calls, physics complexity, etc.
This categorization is a bit of a generalization, but it works well in most situations.
Developers have many tools at their disposal to tackle both scenarios. After all, what we all want is to reach the target frame-rate or FPS.
Fixing CPU Bottlenecks: Unity Job System
When it comes to removing CPU bottlenecks, one of the core strategies is splitting the workload across different CPU cores. This is commonly known as multithreading.
And multithreading is great. The problem? It's very easy to mess it up. And debugging these issues is no fun.
Unity has been investing heavily in multithreading technologies over the last few years. You might have heard of the Unity Job System, DOTS and such. Their implementations rely heavily on splitting the work into small units that each CPU core can work on.
The job system was born to help Unity developers in two ways. For one, Unity jobifies some of its own systems, such as animation, physics and rendering. For another, you can write your own jobs for your game code. Either way, jobifying a task means:
- Your main CPU thread, commonly called the game thread, splits a task into several sub-tasks. Each sub-task is very similar in behavior but different in data.
- Then, the main thread hands these sub-tasks to different jobs. Each job takes a number of these sub-tasks and is responsible for finishing them based solely on their data.
- The jobs execute the sub-tasks in parallel (on different cores, hopefully). When they're done, they report their results back to the main thread.
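To make those three steps concrete, here is a minimal sketch of a custom job built on Unity's `IJobParallelFor`. The job (`ScaleJob`), the `JobifyExample` component, the data and the batch size of 64 are all made up for illustration. The sketch only shows where the split, the parallel execution and the hand-back to the main thread happen.

```csharp
using Unity.Collections;
using Unity.Jobs;
using UnityEngine;

// Hypothetical job: one "sub-task" per index. Every index runs the same code on different data.
public struct ScaleJob : IJobParallelFor
{
    [ReadOnly] public NativeArray<float> input;
    public NativeArray<float> output;
    public float factor;

    public void Execute(int index)
    {
        output[index] = input[index] * factor;
    }
}

// Hypothetical component used only to drive the example.
public class JobifyExample : MonoBehaviour
{
    const int Count = 10000;
    const int BatchSize = 64; // indices a worker grabs at a time (work-stealing granularity)

    // Number of batches the work is split into, rounded up.
    public static int BatchCount(int count, int batchSize) => (count + batchSize - 1) / batchSize;

    void Start()
    {
        var input = new NativeArray<float>(Count, Allocator.TempJob);
        var output = new NativeArray<float>(Count, Allocator.TempJob);
        for (int i = 0; i < Count; i++) input[i] = i;

        // 1. The main thread describes the task and schedules it in batches.
        var job = new ScaleJob { input = input, output = output, factor = 2f };
        JobHandle handle = job.Schedule(Count, BatchSize);

        // 2. Job workers (and possibly the main thread) run the batches in parallel.
        //    The main thread is free to do unrelated work here.

        // 3. The main thread waits for the jobs and reads the results back.
        handle.Complete();
        Debug.Log($"{BatchCount(Count, BatchSize)} batches, last value = {output[Count - 1]}");

        input.Dispose();
        output.Dispose();
    }
}
```

The batch size is a tuning knob. Smaller batches spread the work more evenly across workers. Larger ones reduce scheduling overhead.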
If you go to McDonald's, you'll see a screen with all the orders to be prepared. The employees work in parallel to make the burgers assigned to them. Well, that monitor full of orders is the job scheduler, and the employees are the job workers.
Ideally, we send exactly the amount of work your CPU cores can handle. If we give them less, we waste CPU resources (employees sit idle). But if we overload them, the CPU stalls (customers wait in longer lines).
When The Unity Job System Goes Wrong
In practice, the Unity Job System never gets to use 100% of the CPU. Most of the time, it leaves part of your CPU's capacity unused.
In any case, Unity is improving the Job System substantially, so it'll only get better over time. And don't get me wrong, it's already great.
Except when it isn't.
There are cases where Unity sends way too much work to your CPU. And that is a mistake. Your CPU becomes overstressed because it tries to do too many things at the same time.
And that decreases performance.
How is that?
As you know, there are many different platforms on the market. So it is just plain hard for Unity engineers to tweak the engine to be performant for all of the devices at the same time. And when Unity fails, you lose a bit of performance.
Here's an example: the Oculus Quest. The Quest's CPU has 4+4 physical cores: four performance cores and four efficiency cores.
The thing is, not all of these CPU cores are at your disposal. The greedy operating system reserves some to do important calculations like VR timewarp, spatial tracking and finding the best Facebook ads to serve. Ok, the last one is a joke. Hopefully.
The Unity engine doesn't really know beforehand how many cores are actually available to it. So it tries to guess. And even if it did know, finding the optimal count is hard, since so many processes are running in the background.
Right now, Unity assumes you have more cores at your disposal than are actually available on Quest. Unity spawns four job workers (threads) plus all the other existing threads, such as audio and rendering.
When threads are over-allocated like this, they all compete for the CPU's attention.
“If you think humans aren't good at multi-tasking, well... CPUs are better only up to a point”
Rubén Torres Bonet
So when there's just too much work, threads start stealing each other's time. The CPU has to keep switching between them, a process called context switching, and switching is expensive.
As another example, the PlayStation 4 also reserves some cores for the greedy operating system. I don't know whether the same issue is present there; I guess not. It's been years since I last touched that platform.
But yeah, usually consoles reserve CPU cores to run their services. So if you are developing for consoles, it's time you checked your profiler.
Your profiler might surprise you.
I Caught You Red-Handed, Unity
If you catch sneaky Unity spawning more job workers than you have available cores, sit down and have a chat with the engine. Tell Unity to calm down with the workers thing for a second.
Here's how:
using Unity.Jobs.LowLevel.Unsafe;
// Limit the job system to three worker threads
JobsUtility.JobWorkerCount = 3;
You can invoke that at run-time and even change the number dynamically as you please. But what is the best number?
I have a rule of thumb:
Job worker count = number of available CPU cores - 1
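Here is a minimal sketch of applying that rule once at startup. `availableCores` is a hypothetical, hand-set value. As the Quest example shows, the OS may keep some cores for itself, so a count reported by the engine such as `SystemInfo.processorCount` can overstate what your game actually gets. The clamp to at least one worker is my own assumption. `JobsUtility.JobWorkerMaximumCount` only exists in recent Unity versions, so check the scripting reference for yours.

```csharp
using Unity.Jobs.LowLevel.Unsafe;
using UnityEngine;

// Hypothetical component: applies "available cores - 1" to the job worker count.
public class JobWorkerCountLimiter : MonoBehaviour
{
    // Assumption: the number of cores you believe the OS leaves for your game
    // on the target device. Set it per platform after profiling.
    [SerializeField] int availableCores = 4;

    // Rule of thumb: leave one core for the main thread.
    // Clamped to [1, maxWorkers]; the lower bound of 1 is an assumption.
    public static int WorkerCountForCores(int cores, int maxWorkers)
    {
        return System.Math.Max(1, System.Math.Min(cores - 1, maxWorkers));
    }

    void Awake()
    {
        Debug.Log($"processorCount={SystemInfo.processorCount}, " +
                  $"job workers before={JobsUtility.JobWorkerCount}");

        JobsUtility.JobWorkerCount =
            WorkerCountForCores(availableCores, JobsUtility.JobWorkerMaximumCount);

        Debug.Log($"job workers after={JobsUtility.JobWorkerCount}");
    }
}
```

Log the before/after values on the device. Then confirm in the profiler that fewer worker threads show up and that the frame time actually improved.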
Is it scientific? No.
Is it accurate? No.
So why the hell are you making it up? And why that minus 1? It's making me nervous.
In my experience, that number works well for several reasons. For one, the job director (main thread) often works alongside the job workers, like the producer who stays late with the developers, even if just to order pizza. So the main thread counts.
And even if it didn't, we also have the render thread, the audio thread, the reddit thread and all other threads you can think of.
Will you see substantial performance gains from this?
Probably not; it highly depends on your operating system. If the CPU scheduler is a slow one, as in my case, you'll see some benefit.
So if you find a formula that works better in your case, please let me know. I'll add you to the hall of fame.