A Weird Imagination

100% CPU usage in games with Nvidia Linux drivers

The problem

Every game I run on my computer, no matter how old or simple, constantly uses an entire CPU thread even when idling at a menu. (Except for some newer multi-threaded games, which do the same with multiple threads!) To raise this from a curiosity to a problem: it means my computer's fans are on at full blast whenever I have a game running, so I notice.

The solution

To be clear, that symptom could be the result of many different causes, some of which I may explore in future blog posts.1 But specifically for systems with Nvidia GPUs using the Nvidia proprietary driver (as opposed to nouveau), setting the environment variable __GL_YIELD to USLEEP fixed the issue in some games for me. To apply it to a single game, run __GL_YIELD="USLEEP" /path/to/game; to apply it permanently, add the line

export __GL_YIELD="USLEEP"

to ~/.profile and restart X.
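If you'd rather not set the variable for your whole session, a small wrapper script can scope it to just the games that need it. This is a sketch; the script name is hypothetical:

```shell
#!/bin/sh
# run-game.sh: launch the given command with the Nvidia yield
# workaround enabled, without affecting the rest of the session.
export __GL_YIELD="USLEEP"
exec "$@"
```

Then launch games as ./run-game.sh /path/to/game.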

The details

How to measure CPU usage

A normal process monitor like htop shows which process is using the CPU, but getting more detail requires different tools. This is essentially a performance optimization problem, and when writing a program, profiling tools are usually used to understand what the program is spending its time doing. While they're intended as debugging tools, some are usable with non-debug builds, albeit with harder-to-interpret output.

The tool I used is perf2. I'm new to perf, so I don't have much wisdom to impart on its usage. I got started through this quick intro I found, which also points to a larger tutorial in the official documentation. I also found someone describing how they used it to look at context switches.

The basic way to get started is to run the application and run

sudo perf top -p $(pidof target_program)

(or read the PID off of htop, especially if there's multiple processes and you want to pick which one to examine).
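perf top shows a live view; to capture a profile for later study instead, perf can also record samples to a file. A sketch (the 10-second window is an arbitrary choice):

```shell
# Record ~10 seconds of samples from the running game into perf.data...
sudo perf record -p "$(pidof target_program)" -- sleep 10
# ...then browse the recorded profile interactively.
sudo perf report
```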

An aside on debugging symbols

If debugging tools show just hex addresses like 0x0000000000f9a8d0 in place of function names, then you're missing debugging symbols. You may be able to get -dbg packages from your package manager, which provide debugging symbols for the corresponding package. You'll want at least libc6-dbg or your system's equivalent. If you have source code, then you can compile the program being examined (and any relevant libraries) with -g in GCC to generate debugging symbols. Also note that addresses starting with ffff like 0xffffffffbda0008c are kernel addresses, and you may need to run the tool as root (i.e. with sudo) to get information on them.

What's using the CPU?

Here's what I saw in perf:

Samples
Overhead  Shared Object               Symbol
  20.11%  [kernel]                    [k] do_syscall_64
  12.16%  [kernel]                    [k] entry_SYSCALL_64

So the largest amount of time is spent inside the kernel processing system calls. And the fact that the hotspot is the system call entry point, not the code for any specific system call, suggests the issue is a large number of system calls being made, not a smaller number of computation-intensive ones.

The next step is to figure out which system call(s) are causing the issue. The tool I actually used was strace, which lists every system call made along with its arguments. There I saw lots of calls to sched_yield() in a row. perf with the -g switch ("Enables call-graph (stack chain/backtrace) recording.") also answers the question:

sudo perf top -g -p $(pidof target_program)

outputs

Samples
  Children      Self  Shared Object               Symbol
+   55.85%     1.50%  libc-2.30.so                [.] __sched_yield
-   33.18%     0.73%  [kernel]                    [k] entry_SYSCALL_64_after_hwframe
   - 22.05% entry_SYSCALL_64_after_hwframe
      - 22.54% do_syscall_64
         + 3.58% __x64_sys_sched_yield
-   32.45%    19.81%  [kernel]                    [k] do_syscall_64
   + 19.74% __sched_yield
   + 2.76% do_syscall_64

(+ expands/collapses the trees.)
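The strace approach can also produce a summary instead of a scrolling log: the -c flag tallies calls per syscall, which makes a flood of sched_yield() obvious at a glance. A sketch:

```shell
# Attach to the running game and count system calls instead of printing
# each one; press Ctrl-C to detach and print the per-syscall summary.
sudo strace -c -p "$(pidof target_program)"
```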

What's calling sched_yield?

Unfortunately, for whatever reason, perf wasn't revealing where the calls to sched_yield were coming from. A quick web search revealed that lots of calls to it can be a performance problem, but didn't give a hint as to where they might be coming from.

Since the game I was debugging happened to be open source, I was able to search the source code for calls to it... and found there weren't any, so I knew the call must be coming from one of the libraries it's using. Looking further down in the perf output, I noticed the only library taking any significant amount of time is libnvidia-glcore.so.440.82 (other than a diversion into __vdso_clock_gettime taking time, which didn't end up leading anywhere).

Including libnvidia-glcore.so in my search hit upon this PR quoting Nvidia's README on the __GL_YIELD environment variable, which can be used to tell the driver to either do nothing or call usleep(0) instead of sched_yield.

Running the game with __GL_YIELD="USLEEP" reduced the processor usage to around 20%, confirming that was the performance problem (or, at least, the primary problem).
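One rough way to quantify a change like this is a batch-mode snapshot from top, taken once with the workaround and once without (target_program stands in for the actual game binary):

```shell
# One non-interactive snapshot of the game's CPU usage (%CPU column).
top -b -n 1 -p "$(pidof target_program)"
```

Note that a single snapshot is noisy; averaging a few readings while the game sits at a menu gives a fairer comparison.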


  1. For example, this workaround fixes a completely unrelated WINE issue in some games which also manifests as 100% CPU usage. 

  2. My system has perf and perf_5.6 and only the latter actually works for collecting data. I've just written perf in this post to avoid confusion. 
