## The problem
Every game I run on my computer, no matter how old or simple, constantly uses an entire CPU thread, even when idling at a menu. (Some newer multi-threaded games do the same with multiple threads!) What raises this from a curiosity to a problem is that my computer's fans run at full blast whenever I have a game going, so I notice.
## The solution
To be clear, that symptom could be the result of many different
possible causes, others of which I may explore in future blog
posts.1 But specifically for systems with Nvidia GPUs using
the Nvidia proprietary driver (as opposed to nouveau), setting the
environment variable `__GL_YIELD` to `USLEEP` fixed the issue in
some games for me. To do so when running a single game, run

```
__GL_YIELD="USLEEP" /path/to/game
```

or, to do so permanently, add the line

```
export __GL_YIELD="USLEEP"
```

to `~/.profile` and restart X.
## The details
### How to measure CPU usage
A normal process monitor like `htop` shows which process
is using the CPU, but getting more detail requires different tools.
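As a quick self-contained illustration of what "a process pinning a CPU thread" looks like from the command line, here's a sketch that spawns a busy loop as a stand-in for a spinning game process and samples its CPU usage with `ps` (the busy loop and the two-second sample window are invented for this demo, not from the original debugging session):

```shell
# Spawn a busy loop standing in for a game process stuck spinning.
sh -c 'while :; do :; done' &
pid=$!

# Let it run briefly, then sample its CPU usage; %cpu is the
# process's CPU time divided by its elapsed wall-clock time.
sleep 2
cpu=$(ps -o %cpu= -p "$pid" | tr -d ' ')
echo "busy loop is using ${cpu}% CPU"

# Clean up the stand-in process.
kill "$pid"
```

A pure spin loop like this will report close to 100%, which is the same signature `htop` shows for a game idling at a menu.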
This is essentially a performance optimization problem, and when
writing a program, profiling tools are usually used to
help understand what the program is spending its time doing. While
they're intended as debugging tools, some are usable with non-debug
builds, albeit with output that's more difficult to understand.
The tool I used is `perf`.2 I'm new to `perf`,
so I don't have much wisdom to impart on its usage. I got started
through this quick intro I found, which also points
to a larger tutorial in the official documentation.
I also found someone describing how they used it to look at context
switches.
The basic way to get started is to run the application and run

```
sudo perf top -p $(pidof target_program)
```

(or read the PID off of `htop`, especially if there are multiple
processes and you want to pick which one to examine).
### An aside on debugging symbols
If debugging tools show just hex digits like `0x0000000000f9a8d0`
in place of function names, then you're missing debugging symbols. You
may be able to get `-dbg` packages from your package manager, which
provide debugging symbols for the corresponding package. You'll want
at least `libc6-dbg` or your system's equivalent. If you have source
code, then you can compile the program being examined (and any relevant
libraries) with `-g` in GCC to generate debugging symbols. Also
note that hex digits that start with `ffff` like `0xffffffffbda0008c`
are kernel addresses, and you may need to run the tool as root (i.e.
with `sudo`) to get information on them.
### What's using the CPU?
Here's what I saw in `perf`:

```
Samples
Overhead  Shared Object  Symbol
 20.11%   [kernel]       [k] do_syscall_64
 12.16%   [kernel]       [k] entry_SYSCALL_64
```
So the largest amount of time is spent inside the kernel processing system calls. And the fact that the time is concentrated at the syscall entry points, rather than in the code implementing any specific system call, suggests the issue is a large number of system calls being made, not a smaller number of computation-intensive ones.
The next step is to figure out which system call(s) are causing
the issue. The tool I actually used was `strace`,
which lists all of the system calls made along with their arguments.
There I saw lots of calls to `sched_yield()` in a row.
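`strace` can attach to a running process with `-p`, and its `-c` flag tallies calls per syscall instead of printing each one, which makes a flood of `sched_yield()` calls stand out at a glance. A minimal self-contained run, tracing the trivial command `true` as a stand-in for attaching to the game:

```shell
# -c summarizes syscall counts; -o writes the summary to a file.
# Against a real game this would be:
#   sudo strace -c -p "$(pidof target_program)"   (Ctrl-C to stop)
strace -c -o summary.txt true
cat summary.txt
```

The summary table sorts syscalls by time spent, so a `sched_yield` storm would sit at the top with an enormous call count.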
`perf` with the `-g` switch ("Enables call-graph (stack
chain/backtrace) recording.") also answers the question:

```
sudo perf top -g -p $(pidof target_program)
```

outputs
```
Samples
  Children      Self  Shared Object  Symbol
+   55.85%     1.50%  libc-2.30.so   [.] __sched_yield
-   33.18%     0.73%  [kernel]       [k] entry_SYSCALL_64_after_hwframe
   - 22.05% entry_SYSCALL_64_after_hwframe
      - 22.54% do_syscall_64
         + 3.58% __x64_sys_sched_yield
-   32.45%    19.81%  [kernel]       [k] do_syscall_64
   + 19.74% __sched_yield
   + 2.76% do_syscall_64
```
(`+` expands/collapses the trees.)
### What's calling `sched_yield`?
Unfortunately, for whatever reason, `perf` wasn't revealing
where the calls to `sched_yield` were coming from. A quick
web search revealed that lots of calls to it can be a performance
problem, but didn't give a hint as to where they
might be coming from.
Since the game I was debugging happened to be open source, I
was able to search the source code for calls to it... and found
there weren't any, so I knew the call must be coming from one of
the libraries it's using. Looking further down in the `perf`
output, I noticed the only library taking any significant amount of
time was `libnvidia-glcore.so.440.82` (other than a diversion into
`__vdso_clock_gettime` taking time, which didn't end up leading
anywhere).
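The search described above can be sketched like this (the miniature source tree and `/bin/sh` are placeholders invented for the demo; substitute the real source directory and game binary):

```shell
# Search a source tree for direct calls to sched_yield. (In the post's
# case this came up empty, pointing the finger at a library instead.)
mkdir -p demo_src
printf '#include <sched.h>\nint main(void){sched_yield();return 0;}\n' \
    > demo_src/spin.c
grep -rn 'sched_yield' demo_src

# When the program's own code has no hits, list the shared libraries
# its binary links against to find the other suspects.
ldd /bin/sh
```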
Including `libnvidia-glcore.so` in my search hit upon
this PR quoting Nvidia's README on the
`__GL_YIELD` environment variable, which can be used to tell it to
either do nothing or call `usleep(0)` instead of `sched_yield`.
Running the game with `__GL_YIELD="USLEEP"` reduced the processor usage
to around 20%, confirming that was the performance problem (or, at
least, the primary problem).
1. For example, this workaround fixes a completely unrelated WINE issue in some games which also manifests as 100% CPU usage. ↩
2. My system has `perf` and `perf_5.6`, and only the latter actually works for collecting data. I've just written `perf` in this post to avoid confusion. ↩