Every game I run on my computer, no matter how old and simple, constantly uses an entire CPU thread even when idling at a menu. (Except for some newer multi-threaded games, which do the same with multiple threads!) What raises this from a curiosity to a problem is that my computer's fans run at full blast whenever I have a game going, so I notice.
To be clear, that symptom could be the result of many different
possible causes, others of which I may explore in future blog
posts.1 But specifically for systems with Nvidia GPUs using
the Nvidia proprietary driver (as opposed to
nouveau), setting the environment variable
__GL_YIELD to USLEEP fixed the issue in some games for me. To do so when running
a single game, run
__GL_YIELD="USLEEP" /path/to/game or
to do so permanently, add the line
export __GL_YIELD="USLEEP" to
~/.profile and restart X.
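If you only want the setting for some games, the one-off form can be wrapped in a small shell function. This is a minimal sketch: the function name with_usleep_yield is made up for illustration, and only the __GL_YIELD variable and its USLEEP value come from the text above.

```shell
# Hypothetical helper: run any command with __GL_YIELD=USLEEP set for that
# command only, leaving the rest of the shell session untouched.
with_usleep_yield() {
    env __GL_YIELD=USLEEP "$@"
}

# Usage (the path is a placeholder):
#   with_usleep_yield /path/to/game
```

Because the variable is passed through env, it applies only to the launched command and its children, which makes it easy to compare a game with and without the setting.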
How to measure CPU usage
A normal process monitor like
htop shows which process
is using the CPU, but getting more detail requires different tools.
This is essentially a performance optimization problem, and when
writing a program, profiling tools are usually used to
help understand what the program is spending its time doing. While
they're intended as debugging tools, some are usable with non-debug
builds, albeit with more difficult to understand output.
The tool I used is
perf2. I'm new to perf,
so I don't have much wisdom to impart on its usage. I got started
through this quick intro I found, which also points
to a larger tutorial in the official documentation.
I also found someone describing how they used it to look at context
The basic way to get started is to run the application and run
sudo perf top -p $(pidof target_program)
(or read the PID off of
htop, especially if there are multiple
processes and you want to pick which one to examine).
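A possible wrinkle when several processes share a name: perf's -p option takes a comma-separated list of PIDs, while pidof prints them separated by spaces, so the substitution above may not work as intended in that case. A small sketch of a workaround, using a made-up helper name (comma_pids):

```shell
# Hypothetical helper: turn whitespace-separated PIDs on stdin (the format
# pidof prints) into the comma-separated list that perf's -p option accepts.
comma_pids() {
    tr -s ' \n' ',,' | sed 's/^,//; s/,$//'
}

# Usage (target_program is a placeholder, as in the command above):
#   sudo perf top -p "$(pidof target_program | comma_pids)"
```

This profiles all matching processes at once; to examine just one, reading the PID off of htop is still the simpler route.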
An aside on debugging symbols
If debugging tools show just hex digits like
in place of function names, then you're missing debugging symbols. You
may be able to get
-dbg packages from your package manager, which
provide debugging symbols for the corresponding package. You'll want
libc6-dbg or your system's equivalent. If you have source
code, then you can compile the program being examined (and any relevant
libraries) with -g in GCC to generate debugging symbols. Also
note that hex digits that start with
are kernel addresses and you may need to run the tool as
root (e.g. with sudo) to get information on them.
What's using the CPU?
Here's what I saw in perf top:
Samples
Overhead  Shared Object  Symbol
  20.11%  [kernel]       [k] do_syscall_64
  12.16%  [kernel]       [k] entry_SYSCALL_64
So the largest share of time is spent inside the kernel processing system calls. And the fact that the hot spot is the system call entry point, not the code that actually processes some specific system call, suggests the issue is a large number of system calls being made rather than a small number of computationally intensive ones.
The next step is to figure out which system call(s) are causing
the issues. The tool I actually used was strace,
which lists all of the system calls made along with their arguments.
There I saw lots of calls to
sched_yield() in a row.
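Scrolling through a long trace works, but counting the calls makes the pattern harder to miss. The following is a rough sketch that assumes the trace was saved in strace's usual text format (one call per line, optionally prefixed with a PID); the helper name count_syscalls and the file name trace.log are made up for illustration.

```shell
# Hypothetical helper: read strace-style lines on stdin and print how many
# times each system call name appears, most frequent first.
count_syscalls() {
    # Drop an optional "[pid N] " or "N " prefix, then keep the name before "(".
    sed -n -E 's/^(\[pid +[0-9]+\] +|[0-9]+ +)?([a-z_0-9]+)\(.*/\2/p' |
        sort | uniq -c | sort -rn
}

# Usage (target_program and trace.log are placeholders):
#   sudo strace -f -o trace.log -p "$(pidof target_program)"   # Ctrl-C to stop
#   count_syscalls < trace.log | head
```

strace can also produce a similar per-call summary on its own with its -c option; the function above is just a way to get the same kind of tally from a trace that has already been captured.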
perf with the
-g switch ("Enables call-graph (stack
chain/backtrace) recording.") also answers the question:
sudo perf top -g -p $(pidof target_program)
Samples
  Children      Self  Shared Object  Symbol
+   55.85%     1.50%  libc-2.30.so   [.] __sched_yield
-   33.18%     0.73%  [kernel]       [k] entry_SYSCALL_64_after_hwframe
   - 22.05% entry_SYSCALL_64_after_hwframe
   - 22.54% do_syscall_64
   + 3.58% __x64_sys_sched_yield
-   32.45%    19.81%  [kernel]       [k] do_syscall_64
   + 19.74% __sched_yield
   + 2.76% do_syscall_64
(+ expands/collapses the trees.)
Unfortunately, for whatever reason,
perf wasn't revealing
where the calls to
sched_yield were coming from. A quick
web search revealed that lots of calls to it can be a performance
problem but didn't give a hint as to where they
might be coming from.
Since the game I was debugging happened to be open source, I
was able to search the source code for calls to it... and found
there weren't any, so I knew the call must be coming from one of
the libraries it's using. Looking further down in the perf
output, I noticed the only library taking any significant amount of time was
libnvidia-glcore.so.440.82 (other than a diversion into
__vdso_clock_gettime taking time, which didn't end up leading
anywhere). Including libnvidia-glcore.so in my search hit upon
this PR quoting Nvidia's README on the
__GL_YIELD environment variable, which can be used to tell it to
either do nothing or call
usleep(0) instead of sched_yield().
Running the game with
__GL_YIELD="USLEEP" reduced the processor usage
to around 20%, confirming that was the performance problem (or, at
least, the primary problem).
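To put a number on a before/after comparison like this without eyeballing htop, one option is to sample the process's accumulated CPU time from /proc. This is a Linux-specific sketch, not the method used above; the helper names cpu_ticks and cpu_percent are made up, and the result is an integer percentage of a single core, so a multi-threaded process can exceed 100.

```shell
# Hypothetical helper: print the total CPU time (user + system, in clock
# ticks) that process $1 has used so far, read from /proc/PID/stat.
cpu_ticks() {
    # Strip the leading "pid (comm) " so a process name containing spaces
    # cannot shift the fields; utime and stime are then fields 12 and 13.
    sed 's/^.*) //' "/proc/$1/stat" | awk '{ print $12 + $13 }'
}

# Hypothetical helper: print the CPU usage of process $1, as a percentage
# of one core, averaged over $2 seconds (default 5).
cpu_percent() {
    pid=$1
    secs=${2:-5}
    hz=$(getconf CLK_TCK)          # clock ticks per second
    before=$(cpu_ticks "$pid")
    sleep "$secs"
    after=$(cpu_ticks "$pid")
    echo $(( (after - before) * 100 / (hz * secs) ))
}

# Usage (target_program is a placeholder):
#   cpu_percent "$(pidof target_program)" 10
```

Running it once with the default driver behavior and once with __GL_YIELD="USLEEP" gives two figures that can be compared directly.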