Sunday, December 6, 2015

HOW-TO: Control fan speed of multiple headless GPUs

Hi all,
    Deep learning is all the rage now, so it is no surprise if you have two or more GPUs in one machine. The problem is that heat builds up between the GPUs, and the default fan speed is quite low (22%), so the fan speed needs to be increased. In this post, I will show you how to increase the fan speed of multiple GPUs. The only equipment required is two monitors. I have successfully raised the fan speed of my two Titan Xs with this method.
    The principle is very simple. A GPU needs to have an X screen attached to it before its fan speed can be changed through Coolbits [1]. First, we use nvidia-settings to attach a monitor to each GPU. The first monitor is always attached to the first GPU for display. The second monitor is then attached to the second GPU to create another X screen. See the image below for how to do it through the GUI.
    The steps are as follows. First, we need sudo privileges to generate the X configuration file (only if you do not already have one) and to start nvidia-settings:
sudo nvidia-xconfig
sudo nvidia-settings
Then, we attach another X screen as shown in the following image and save the result to the X configuration file.

Next, open the X configuration file /etc/X11/xorg.conf and add the Coolbits option to both X screens, for example:
Section "Screen"
    Identifier     "Screen0"
    Device         "Device0"
    Monitor        "Monitor0"
    DefaultDepth    24
    Option         "Coolbits" "4"
    Option         "Stereo" "0"
    Option         "nvidiaXineramaInfoOrder" "DFP-0"
    Option         "metamodes" "nvidia-auto-select +0+0"
    Option         "SLI" "Off"
    Option         "MultiGPU" "Off"
    Option         "BaseMosaic" "off"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection

Section "Screen"
    Identifier     "Screen1"
    Device         "Device1"
    Monitor        "Monitor1"
    DefaultDepth    24
    Option         "Coolbits" "4"
    Option         "Stereo" "0"
    Option         "metamodes" "nvidia-auto-select +0+0"
    Option         "SLI" "Off"
    Option         "MultiGPU" "Off"
    Option         "BaseMosaic" "off"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection
 Note that the only line we add by hand is the Coolbits option (line: Option         "Coolbits" "4") in each available X screen. Alternatively, we can run the following command to set Coolbits:
sudo nvidia-xconfig --cool-bits=4
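Before logging out, it can be worth checking that every Screen section really contains the Coolbits option. The following is only a minimal sketch: the function name check_coolbits is made up for illustration, and it assumes the file uses the plain Section "Screen" ... EndSection layout shown above.

```shell
# Hypothetical helper: list Screen sections of an xorg.conf that lack Coolbits.
# Prints one line per offending screen and returns 1 if any is found.
check_coolbits() {
    awk '
        /^[ \t]*#/ { next }                       # ignore commented-out lines
        /^[ \t]*Section[ \t]+"Screen"/ { in_screen = 1; has_cb = 0; id = "?"; next }
        in_screen && /Identifier/ { id = $2; gsub(/"/, "", id) }
        in_screen && /Option[ \t]+"Coolbits"/ { has_cb = 1 }
        in_screen && /^[ \t]*EndSection/ {
            if (!has_cb) { print id " is missing Coolbits"; missing = 1 }
            in_screen = 0
        }
        END { exit (missing ? 1 : 0) }
    ' "$1"
}
```

Running check_coolbits /etc/X11/xorg.conf prints nothing when every X screen has the option.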
Finally, log out and log in again and, voila, you can change your fan speed now. Please save a copy of the X configuration for future use:
sudo cp /etc/X11/xorg.conf /etc/X11/xorg.conf-backup-coolbit
Choose Selection to attach a monitor to a new GPU.


    Then, we need to increase the fan speed of the GPUs through a script [2]. For example, I want to set the fan speed of two GPUs to 95%. The bash script is as simple as this:
#!/bin/bash

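# Enable manual fan control on each GPU, then set each fan's target speed to 95%.
# The quotes keep the shell from treating the square brackets as glob patterns.
nvidia-settings -a "[gpu:0]/GPUFanControlState=1" -a "[fan:0]/GPUTargetFanSpeed=95" \
                -a "[gpu:1]/GPUFanControlState=1" -a "[fan:1]/GPUTargetFanSpeed=95"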
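If you have more than two GPUs, the same command can be built in a loop. This is a rough sketch rather than a tested tool: the function name set_fan_speed and the DRY_RUN variable are my own inventions, and it assumes that fan index i belongs to GPU index i, as in the two-GPU script above.

```shell
# Hypothetical helper: set the same fan speed on GPUs 0..N-1.
# Usage: set_fan_speed NUM_GPUS SPEED_PERCENT
# With DRY_RUN set to a non-empty value, the command is printed instead of run.
set_fan_speed() {
    num_gpus=$1
    speed=$2
    if [ "$speed" -lt 0 ] || [ "$speed" -gt 100 ]; then
        echo "fan speed must be between 0 and 100" >&2
        return 1
    fi
    i=0
    set --    # reuse the positional parameters as the argument list
    while [ "$i" -lt "$num_gpus" ]; do
        set -- "$@" -a "[gpu:$i]/GPUFanControlState=1" \
                    -a "[fan:$i]/GPUTargetFanSpeed=$speed"
        i=$((i + 1))
    done
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "nvidia-settings $*"
    else
        nvidia-settings "$@"
    fi
}
```

For example, DRY_RUN=1 set_fan_speed 2 95 only prints the nvidia-settings command for two GPUs, which is a safe way to inspect it before running set_fan_speed 2 95 for real.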
 Save the script as set_gpu_fan.sh and add it to the startup programs on Ubuntu, so that it is executed automatically when you log in to your workstation and the X server starts. Alternatively, you can enable persistence mode on the GPUs by adding the following command to root's crontab. The state of the GPUs (including fan speed) will then be preserved until reboot.
sudo crontab -e
Then add the following line to the crontab:
@reboot nvidia-smi --persistence-mode=ENABLED
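To confirm that the fan setting is actually in effect, you can read the current fan speeds back with nvidia-smi and compare them against a minimum. The sketch below is only an illustration: check_fan_speeds is a made-up name, and it assumes the input has the "index, speed" form that nvidia-smi prints for --query-gpu=index,fan.speed with --format=csv,noheader,nounits.

```shell
# Hypothetical helper: read "index, fan_speed" lines on stdin and report
# every GPU whose fan speed is below the given minimum (in percent).
# Returns 0 if all fans are at or above the minimum, 1 otherwise.
check_fan_speeds() {
    min=$1
    rc=0
    while IFS=', ' read -r idx speed; do
        case $speed in
            ''|*[!0-9]*)
                # Some GPUs do not report a numeric fan speed.
                printf 'GPU %s: fan speed not reported\n' "$idx"
                rc=1 ;;
            *)
                if [ "$speed" -lt "$min" ]; then
                    printf 'GPU %s: fan at %s%%, below %s%%\n' "$idx" "$speed" "$min"
                    rc=1
                fi ;;
        esac
    done
    return "$rc"
}
```

A typical call would be: nvidia-smi --query-gpu=index,fan.speed --format=csv,noheader,nounits | check_fan_speeds 90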
Finally, the methods above only work if you have as many monitors as GPUs. The principle is the same for more than 2 GPUs: you just edit /etc/X11/xorg.conf so that each GPU has its own X screen. Then, you can manually adjust the fan speed of each GPU. The drawback of this method is that you need to log in after the workstation restarts, but it works for me on a commodity workstation used for research in a lab.
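For more than two GPUs, writing the Screen sections by hand gets repetitive, so here is a small sketch that prints them. The function name print_screen_sections is hypothetical, and it assumes the Device0..DeviceN-1 and Monitor0..MonitorN-1 sections already exist in your xorg.conf (for example as generated by nvidia-xconfig or nvidia-settings); it only produces the Screen part, with a reduced set of options compared to the listing above.

```shell
# Hypothetical helper: print one Screen section with Coolbits for each GPU.
# Usage: print_screen_sections NUM_GPUS
print_screen_sections() {
    num_gpus=$1
    i=0
    while [ "$i" -lt "$num_gpus" ]; do
        cat <<EOF
Section "Screen"
    Identifier     "Screen$i"
    Device         "Device$i"
    Monitor        "Monitor$i"
    DefaultDepth    24
    Option         "Coolbits" "4"
    SubSection     "Display"
        Depth       24
    EndSubSection
EndSection

EOF
        i=$((i + 1))
    done
}
```

Review the output of print_screen_sections 3 before pasting it into /etc/X11/xorg.conf, and keep the backup copy made earlier in case X fails to start.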