Matthew Bauer

12 Oct 2019

Improved performance in Nixpkgs

1 Avoiding subshells

A common complaint about Nixpkgs is that things can become slow when you have lots of dependencies. Build inputs are processed in Bash, which tends to be pretty hard to make performant. Bash doesn’t give us any way to loop through dependencies in parallel, so we end up with pretty slow setup scripts. Luckily, someone has found ways to speed this up with some clever tricks in the setup.sh script.

1.1 Pull request

Albert Safin (@xzfc on GitHub) made an excellent PR that promises to improve performance for all users of Nixpkgs. The PR is available at PR #69131. The basic idea is to avoid invoking “subshells” in Bash. In this context, a subshell is basically anything that uses command substitution, $(cmd ...). Each subshell requires forking a new process, which has a roughly constant cost of about 2ms. This isn’t much in isolation, but it adds up in big loops.

Subshells are usually used in Bash because they are convenient and easy to reason about. It’s easy to understand how a subshell works as it’s just substituting the result of one command into another’s arguments. We don’t usually care about the performance cost of subshells. In the hot path of Nixpkgs’ setup.sh, however, it’s pretty important to squeeze every bit of performance we can.
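To get a feel for that per-subshell cost on your own machine, here is a rough micro-benchmark sketch. The function names and the iteration count are made up for illustration, and the absolute timings will vary a lot between machines; the point is only the relative difference between the two loops.

```shell
#!/usr/bin/env bash
# Rough micro-benchmark: command substitution vs. returning through a variable.
# Hypothetical names; timings depend entirely on the machine.
iterations=500

# Returns its result on stdout, so callers need $(...), which forks.
getValueStdout() {
    echo "value"
}

# Returns its result in a variable, so callers need no fork.
getValueVar() {
    valueResult="value"
}

withSubshell() {
    local i v
    for (( i = 0; i < iterations; i++ )); do
        v="$(getValueStdout)"
    done
}

withoutSubshell() {
    local i v
    for (( i = 0; i < iterations; i++ )); do
        getValueVar
        v="$valueResult"
    done
}

# `time` is a Bash keyword and can time shell functions directly.
time withSubshell
time withoutSubshell
```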

A few interesting changes were required to make this work. I’ll go through and document what they are. More information can be found in the Bash manual.

diff --git a/pkgs/stdenv/generic/setup.sh b/pkgs/stdenv/generic/setup.sh
index 326a60676a26..60067a4051de 100644
--- a/pkgs/stdenv/generic/setup.sh
+++ b/pkgs/stdenv/generic/setup.sh
@@ -98,7 +98,7 @@ _callImplicitHook() {
 # hooks exits the hook, not the caller. Also will only pass args if
 # command can take them
 _eval() {
-    if [ "$(type -t "$1")" = function ]; then
+    if declare -F "$1" > /dev/null 2>&1; then
         set +u
         "$@" # including args
     else

The first change is pretty easy to understand. It just replaces the type call with a declare call, using an exit code in place of stdout. Unfortunately, declare is a Bashism that is not available in all POSIX shells. It has been unclear whether Bashisms can be used in Nixpkgs, but from now on Nixpkgs will require being sourced with Bash 4+.
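As a small illustration of the same trick outside setup.sh, here is a sketch with two hypothetical helpers (isFunctionOld and isFunctionNew) that answer “is this name a shell function?”. The first needs a subshell to capture type -t’s output; the second relies only on declare -F’s exit status.

```shell
#!/usr/bin/env bash
# Two ways to ask "is $1 a shell function?" (hypothetical helper names).

# Old style: capture type -t's stdout, which costs a subshell.
isFunctionOld() {
    [ "$(type -t "$1")" = function ]
}

# New style: declare -F exits 0 only if $1 names a function, so the
# exit status answers the question without any command substitution.
isFunctionNew() {
    declare -F "$1" > /dev/null 2>&1
}

myHook() { echo "running myHook"; }

for name in myHook ls; do
    if isFunctionNew "$name"; then
        echo "$name is a function"
    else
        echo "$name is not a function"
    fi
done
```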

diff --git a/pkgs/stdenv/generic/setup.sh b/pkgs/stdenv/generic/setup.sh
index 60067a4051de..7e7f8739845b 100644
--- a/pkgs/stdenv/generic/setup.sh
+++ b/pkgs/stdenv/generic/setup.sh
@@ -403,6 +403,7 @@ findInputs() {
     # The current package's host and target offset together
     # provide a <=-preserving homomorphism from the relative
     # offsets to current offset
+    local -i mapOffsetResult
     function mapOffset() {
         local -ri inputOffset="$1"
         if (( "$inputOffset" <= 0 )); then
@@ -410,7 +411,7 @@ findInputs() {
         else
             local -ri outputOffset="$inputOffset - 1 + $targetOffset"
         fi
-        echo "$outputOffset"
+        mapOffsetResult="$outputOffset"
     }

     # Host offset relative to that of the package whose immediate
@@ -422,8 +423,8 @@ findInputs() {

         # Host offset relative to the package currently being
         # built---as absolute an offset as will be used.
-        local -i hostOffsetNext
-        hostOffsetNext="$(mapOffset relHostOffset)"
+        mapOffset relHostOffset
+        local -i hostOffsetNext="$mapOffsetResult"

         # Ensure we're in bounds relative to the package currently
         # being built.
@@ -441,8 +442,8 @@ findInputs() {

             # Target offset relative to the package currently being
             # built.
-            local -i targetOffsetNext
-            targetOffsetNext="$(mapOffset relTargetOffset)"
+            mapOffset relTargetOffset
+            local -i targetOffsetNext="$mapOffsetResult"

             # Once again, ensure we're in bounds relative to the
             # package currently being built.

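Similarly, this change makes mapOffset store its result in mapOffsetResult instead of printing it to stdout, which avoids the subshell. This works because Bash variables are dynamically scoped: mapOffset can assign to the mapOffsetResult local that its caller, findInputs, declared. Less functional, but more performant!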
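The same “return through a variable” pattern can be sketched in isolation. The names doubleStdout, doubleVar, doubleResult and caller are hypothetical stand-ins for mapOffset, mapOffsetResult and findInputs; the sketch only shows how dynamic scoping lets a callee write into its caller’s local.

```shell
#!/usr/bin/env bash
# Returning a value through stdout vs. through a variable the caller owns.
# All names here are hypothetical.

# Stdout style: the caller must use $(...), which forks a subshell.
doubleStdout() {
    echo "$(( $1 * 2 ))"
}

# Variable style: write into doubleResult. Bash scoping is dynamic, so this
# assignment lands in whichever caller declared `local doubleResult`.
doubleVar() {
    doubleResult=$(( $1 * 2 ))
}

caller() {
    local -i doubleResult
    doubleVar 21
    echo "caller sees $doubleResult"
}

caller
```

Because caller declared doubleResult as local, the value disappears when caller returns and no global variable leaks out, which is the same property findInputs gets from its local -i mapOffsetResult.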

diff --git a/pkgs/stdenv/generic/setup.sh b/pkgs/stdenv/generic/setup.sh
index 7e7f8739845b..e25ea735a93c 100644
--- a/pkgs/stdenv/generic/setup.sh
+++ b/pkgs/stdenv/generic/setup.sh
@@ -73,21 +73,18 @@ _callImplicitHook() {
     set -u
     local def="$1"
     local hookName="$2"
-    case "$(type -t "$hookName")" in
-        (function|alias|builtin)
-            set +u
-            "$hookName";;
-        (file)
-            set +u
-            source "$hookName";;
-        (keyword) :;;
-        (*) if [ -z "${!hookName:-}" ]; then
-                return "$def";
-            else
-                set +u
-                eval "${!hookName}"
-            fi;;
-    esac
+    if declare -F "$hookName" > /dev/null; then
+        set +u
+        "$hookName"
+    elif type -p "$hookName" > /dev/null; then
+        set +u
+        source "$hookName"
+    elif [ -n "${!hookName:-}" ]; then
+        set +u
+        eval "${!hookName}"
+    else
+        return "$def"
+    fi
     # `_eval` expects hook to need nounset disable and leave it
     # disabled anyways, so Ok to to delegate. The alternative of a
     # return trap is no good because it would affect nested returns.

This change replaces the type -t command with calls to more specific Bash builtins. declare -F tells us whether the hook is a function, type -p tells us whether hookName is a file, and otherwise [ -n "${!hookName:-}" ] tells us whether the variable named by hookName is non-empty (${!hookName} is Bash’s indirect expansion). Again, this introduces a Bashism.

In the worst case, this replaces one case statement with several if branches. But since most hooks are functions, most of the time only the first test runs.
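Here is a sketch of the same dispatch order, written as a hypothetical classifyHook helper that reports what kind of hook it found instead of running it. It is not the real _callImplicitHook; the hook names and the use of sh as an example file on PATH are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the new dispatch order, but printing the
# kind of hook found instead of running it.
classifyHook() {
    local hookName="$1"
    if declare -F "$hookName" > /dev/null; then
        echo function
    elif type -p "$hookName" > /dev/null 2>&1; then
        # A file on PATH; `source` also searches PATH for a bare name.
        echo file
    elif [ -n "${!hookName:-}" ]; then
        # ${!hookName} is indirect expansion: the value of the variable
        # whose name is stored in hookName.
        echo variable
    else
        echo none
    fi
}

myFunctionHook() { :; }
myVariableHook='echo "hello from a variable hook"'

for hook in myFunctionHook myVariableHook sh noSuchHook; do
    echo "$hook: $(classifyHook "$hook")"
done
```

The ordering matters for speed: since most hooks are functions, the cheap declare -F check usually short-circuits everything else.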

diff --git a/pkgs/stdenv/generic/setup.sh b/pkgs/stdenv/generic/setup.sh
index e25ea735a93c..ea550a6d534b 100644
--- a/pkgs/stdenv/generic/setup.sh
+++ b/pkgs/stdenv/generic/setup.sh
@@ -449,7 +449,8 @@ findInputs() {
             [[ -f "$pkg/nix-support/$file" ]] || continue

             local pkgNext
-            for pkgNext in $(< "$pkg/nix-support/$file"); do
+            read -r -d '' pkgNext < "$pkg/nix-support/$file" || true
+            for pkgNext in $pkgNext; do
                 findInputs "$pkgNext" "$hostOffsetNext" "$targetOffsetNext"
             done
         done

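This change replaces the $(< ) call with a read call. This is a little surprising, since read is given an empty delimiter ('') instead of the default newline. With -d '', read only stops at a NUL byte, which a text file does not contain, so it reads the whole file into pkgNext and then returns a non-zero status at end of file; that is why the call ends with || true. The unquoted $pkgNext in the for loop is then word-split into individual packages, just as the output of $(< ) was. This replaces one Bashism, $(< ), with another, -d, and in the process gets rid of a remaining subshell.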
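A standalone sketch of the “slurp a file with read -d ''” idiom follows. The readDeps helper, the deps array and the /nix/store/... paths are hypothetical placeholders, not the real contents of a nix-support file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: load the whitespace-separated entries of a file
# into the array `deps` without a command substitution.
readDeps() {
    local contents
    # With -d '' read stops at a NUL byte. A text file has none, so read
    # consumes the whole file, then returns non-zero at EOF: hence `|| true`.
    read -r -d '' contents < "$1" || true
    # Unquoted on purpose: word splitting yields one element per entry.
    # Like the original loop, it would also glob-expand entries with *, ? or [.
    deps=( $contents )
}

depsFile="$(mktemp)"
printf '%s\n' /nix/store/aaa-foo /nix/store/bbb-bar > "$depsFile"
readDeps "$depsFile"
printf 'dep: %s\n' "${deps[@]}"
rm -f "$depsFile"
```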

diff --git a/pkgs/build-support/bintools-wrapper/setup-hook.sh b/pkgs/build-support/bintools-wrapper/setup-hook.sh
index f65b792485a0..27d3e6ad5120 100644
--- a/pkgs/build-support/bintools-wrapper/setup-hook.sh
+++ b/pkgs/build-support/bintools-wrapper/setup-hook.sh
@@ -61,9 +61,8 @@ do
     if
         PATH=$_PATH type -p "@targetPrefix@${cmd}" > /dev/null
     then
-        upper_case="$(echo "$cmd" | tr "[:lower:]" "[:upper:]")"
-        export "${role_pre}${upper_case}=@targetPrefix@${cmd}";
-        export "${upper_case}${role_post}=@targetPrefix@${cmd}";
+        export "${role_pre}${cmd^^}=@targetPrefix@${cmd}";
+        export "${cmd^^}${role_post}=@targetPrefix@${cmd}";
     fi
 done

This replaces a call to tr with the ^^ operator. ${parameter^^pattern} is a Bash 4 feature that upper-cases a string (or just the characters matching pattern) without calling out to tr, so both the command substitution and the pipeline go away.
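A minimal side-by-side sketch, using hypothetical helpers upperCaseOld and upperCaseNew that both store their answer in upperCaseResult. Only the old one forks: once for the command substitution and again for the pipeline through tr.

```shell
#!/usr/bin/env bash
# Upper-casing a command name, with and without forking (hypothetical names).

# Old style: command substitution plus a pipeline through tr.
upperCaseOld() {
    upperCaseResult="$(echo "$1" | tr '[:lower:]' '[:upper:]')"
}

# New style: Bash 4's ${parameter^^} case modification, no new process.
upperCaseNew() {
    local s="$1"
    upperCaseResult="${s^^}"
}

for cmd in ar ld objdump; do
    upperCaseNew "$cmd"
    echo "$cmd -> $upperCaseResult"
done
```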

diff --git a/pkgs/build-support/bintools-wrapper/setup-hook.sh b/pkgs/build-support/bintools-wrapper/setup-hook.sh
index 27d3e6ad5120..2e15fa95c794 100644
--- a/pkgs/build-support/bintools-wrapper/setup-hook.sh
+++ b/pkgs/build-support/bintools-wrapper/setup-hook.sh
@@ -24,7 +24,8 @@ bintoolsWrapper_addLDVars () {
         # Python and Haskell packages often only have directories like $out/lib/ghc-8.4.3/ or
         # $out/lib/python3.6/, so having them in LDFLAGS just makes the linker search unnecessary
         # directories and bloats the size of the environment variable space.
-        if [[ -n "$(echo $1/lib/lib*)" ]]; then
+        local -a glob=( $1/lib/lib* )
+        if [ "${#glob[*]}" -gt 0 ]; then
             export NIX_${role_pre}LDFLAGS+=" -L$1/lib"
         fi
     fi

Here, we check whether any files matching $1/lib/lib* exist using a glob. The old code used a subshell to check whether the echoed glob was empty; this change instead expands the glob into an array and uses Bash’s ${#glob[*]} to count its elements.
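Below is a sketch of counting glob matches with an array, wrapped in a hypothetical hasLibs helper. One assumption of this sketch deserves a loud note: it turns on Bash’s nullglob option while expanding the glob. Without nullglob, Bash leaves an unmatched pattern in place as a literal word, so the array would always have at least one element.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if "$1/lib" contains at least one lib* entry.
hasLibs() {
    # Enable nullglob so an unmatched pattern expands to nothing; without it
    # Bash keeps the literal pattern and the array always has one element.
    local nullglobWasOff=0
    shopt -q nullglob || nullglobWasOff=1
    shopt -s nullglob
    local -a glob=( "$1"/lib/lib* )
    if (( nullglobWasOff )); then
        shopt -u nullglob
    fi
    # ${#glob[@]} is the number of matches; no subshell involved.
    (( ${#glob[@]} > 0 ))
}

prefix="$(mktemp -d)"
mkdir -p "$prefix/lib"
hasLibs "$prefix" && echo "before: has libs" || echo "before: no libs"
touch "$prefix/lib/libexample.so"
hasLibs "$prefix" && echo "after: has libs" || echo "after: no libs"
rm -rf "$prefix"
```

Saving and restoring nullglob with shopt -q uses the same “exit code instead of stdout” idea as the rest of the PR.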

diff --git a/pkgs/stdenv/generic/setup.sh b/pkgs/stdenv/generic/setup.sh
index 311292169ecd..326a60676a26 100644
--- a/pkgs/stdenv/generic/setup.sh
+++ b/pkgs/stdenv/generic/setup.sh
@@ -17,7 +17,8 @@ fi
 # code). The hooks for <hookName> are the shell function or variable
 # <hookName>, and the values of the shell array ‘<hookName>Hooks’.
 runHook() {
-    local oldOpts="$(shopt -po nounset)"
+    local oldOpts="-u"
+    shopt -qo nounset || oldOpts="+u"
     set -u # May be called from elsewhere, so do `set -u`.

     local hookName="$1"
@@ -32,7 +33,7 @@ runHook() {
         set -u # To balance `_eval`
     done

-    eval "${oldOpts}"
+    set "$oldOpts"
     return 0
 }

@@ -40,7 +41,8 @@ runHook() {
 # Run all hooks with the specified name, until one succeeds (returns a
 # zero exit code). If none succeed, return a non-zero exit code.
 runOneHook() {
-    local oldOpts="$(shopt -po nounset)"
+    local oldOpts="-u"
+    shopt -qo nounset || oldOpts="+u"
     set -u # May be called from elsewhere, so do `set -u`.

     local hookName="$1"
@@ -57,7 +59,7 @@ runOneHook() {
         set -u # To balance `_eval`
     done

-    eval "${oldOpts}"
+    set "$oldOpts"
     return "$ret"
 }

@@ -500,10 +502,11 @@ activatePackage() {
     (( "$hostOffset" <= "$targetOffset" )) || exit -1

     if [ -f "$pkg" ]; then
-        local oldOpts="$(shopt -po nounset)"
+        local oldOpts="-u"
+        shopt -qo nounset || oldOpts="+u"
         set +u
         source "$pkg"
-        eval "$oldOpts"
+        set "$oldOpts"
     fi

     # Only dependencies whose host platform is guaranteed to match the
@@ -522,10 +525,11 @@ activatePackage() {
     fi

     if [[ -f "$pkg/nix-support/setup-hook" ]]; then
-        local oldOpts="$(shopt -po nounset)"
+        local oldOpts="-u"
+        shopt -qo nounset || oldOpts="+u"
         set +u
         source "$pkg/nix-support/setup-hook"
-        eval "$oldOpts"
+        set "$oldOpts"
     fi
 }

@@ -1273,17 +1277,19 @@ showPhaseHeader() {

 genericBuild() {
     if [ -f "${buildCommandPath:-}" ]; then
-        local oldOpts="$(shopt -po nounset)"
+        local oldOpts="-u"
+        shopt -qo nounset || oldOpts="+u"
         set +u
         source "$buildCommandPath"
-        eval "$oldOpts"
+        set "$oldOpts"
         return
     fi
     if [ -n "${buildCommand:-}" ]; then
-        local oldOpts="$(shopt -po nounset)"
+        local oldOpts="-u"
+        shopt -qo nounset || oldOpts="+u"
         set +u
         eval "$buildCommand"
-        eval "$oldOpts"
+        set "$oldOpts"
         return
     fi

@@ -1313,10 +1319,11 @@ genericBuild() {

         # Evaluate the variable named $curPhase if it exists, otherwise the
         # function named $curPhase.
-        local oldOpts="$(shopt -po nounset)"
+        local oldOpts="-u"
+        shopt -qo nounset || oldOpts="+u"
         set +u
         eval "${!curPhase:-$curPhase}"
-        eval "$oldOpts"
+        set "$oldOpts"

         if [ "$curPhase" = unpackPhase ]; then
             cd "${sourceRoot:-.}"

This last change is maybe the trickiest. $(shopt -po nounset) was used to capture the old value of nounset as a string (set -o nounset or set +o nounset) that could later be passed to eval. The nounset setting tells Bash to treat unset variables as an error. setup.sh switches it on or off temporarily around phases and hooks, and resets it to its previous value once the current phase or hook has been evaluated. To avoid the subshell here, the stdout of shopt -po is replaced with the exit code of shopt -qo nounset: oldOpts starts out as -u, and if shopt -qo nounset fails (meaning nounset was off), it is switched to +u. Restoring the setting is then a plain set "$oldOpts", so the eval goes away as well.
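The save/restore pattern can be packaged as a small sketch. withoutNounset is a hypothetical helper, not part of setup.sh: it runs a command with nounset disabled and then puts back whatever state the caller had, using only shopt -qo’s exit status.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command with nounset disabled, then restore
# the caller's nounset state without $(shopt -po nounset).
withoutNounset() {
    # shopt -qo prints nothing; its exit status says whether nounset is on.
    local oldOpts="-u"
    shopt -qo nounset || oldOpts="+u"
    set +u
    "$@"
    local ret=$?
    set "$oldOpts"   # either `set -u` or `set +u`
    return "$ret"
}

set -u
withoutNounset eval 'echo "unset variable expands to: [$notDefinedAnywhere]"'
shopt -qo nounset && echo "nounset is on again"
set +u
```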

This commit was first merged on September 20, but it takes a while for changes like this to reach master. Today (October 13), it was finally merged into master in 4e6826a, so we can now see the benefits from it!

1.2 Benchmarking

Hyperfine makes it easy to compare timings between commands. You can install it locally with:

$ nix-env -iA nixpkgs.hyperfine

Here are some of the results:

$ hyperfine --warmup 3 \
  'nix-shell -I nixpkgs=https://github.com/NixOS/nixpkgs/archive/33366cc.tar.gz -p stdenv --run :' \
  'nix-shell -I nixpkgs=https://github.com/NixOS/nixpkgs/archive/4e6826a.tar.gz -p stdenv --run :'
Benchmark #1: nix-shell -I nixpkgs=https://github.com/NixOS/nixpkgs/archive/33366cc.tar.gz -p stdenv --run :
  Time (mean ± σ):     436.4 ms ±   8.5 ms    [User: 324.7 ms, System: 107.8 ms]
  Range (min … max):   430.8 ms … 459.6 ms    10 runs

Benchmark #2: nix-shell -I nixpkgs=https://github.com/NixOS/nixpkgs/archive/4e6826a.tar.gz -p stdenv --run :
  Time (mean ± σ):     244.5 ms ±   2.3 ms    [User: 190.7 ms, System: 34.2 ms]
  Range (min … max):   241.8 ms … 248.3 ms    12 runs

Summary
  'nix-shell -I nixpkgs=https://github.com/NixOS/nixpkgs/archive/4e6826a.tar.gz -p stdenv --run :' ran
    1.79 ± 0.04 times faster than 'nix-shell -I nixpkgs=https://github.com/NixOS/nixpkgs/archive/33366cc.tar.gz -p stdenv --run :'

$ hyperfine --warmup 3 \
  'nix-shell -I nixpkgs=https://github.com/NixOS/nixpkgs/archive/33366cc.tar.gz -p i3.buildInputs --run :' \
  'nix-shell -I nixpkgs=https://github.com/NixOS/nixpkgs/archive/4e6826a.tar.gz -p i3.buildInputs --run :'
Benchmark #1: nix-shell -I nixpkgs=https://github.com/NixOS/nixpkgs/archive/33366cc.tar.gz -p i3.buildInputs --run :
  Time (mean ± σ):      3.428 s ±  0.015 s    [User: 2.489 s, System: 1.081 s]
  Range (min … max):    3.404 s …  3.453 s    10 runs

Benchmark #2: nix-shell -I nixpkgs=https://github.com/NixOS/nixpkgs/archive/4e6826a.tar.gz -p i3.buildInputs --run :
  Time (mean ± σ):     873.4 ms ±  12.2 ms    [User: 714.7 ms, System: 89.3 ms]
  Range (min … max):   861.5 ms … 906.4 ms    10 runs

Summary
  'nix-shell -I nixpkgs=https://github.com/NixOS/nixpkgs/archive/4e6826a.tar.gz -p i3.buildInputs --run :' ran
    3.92 ± 0.06 times faster than 'nix-shell -I nixpkgs=https://github.com/NixOS/nixpkgs/archive/33366cc.tar.gz -p i3.buildInputs --run :'

$ hyperfine --warmup 3 \
  'nix-shell -I nixpkgs=https://github.com/NixOS/nixpkgs/archive/33366cc.tar.gz -p inkscape.buildInputs --run :' \
  'nix-shell -I nixpkgs=https://github.com/NixOS/nixpkgs/archive/4e6826a.tar.gz -p inkscape.buildInputs --run :'
Benchmark #1: nix-shell -I nixpkgs=https://github.com/NixOS/nixpkgs/archive/33366cc.tar.gz -p inkscape.buildInputs --run :
  Time (mean ± σ):      4.380 s ±  0.024 s    [User: 3.155 s, System: 1.443 s]
  Range (min … max):    4.339 s …  4.409 s    10 runs

Benchmark #2: nix-shell -I nixpkgs=https://github.com/NixOS/nixpkgs/archive/4e6826a.tar.gz -p inkscape.buildInputs --run :
  Time (mean ± σ):      1.007 s ±  0.011 s    [User: 826.7 ms, System: 114.2 ms]
  Range (min … max):    0.995 s …  1.026 s    10 runs

Summary
  'nix-shell -I nixpkgs=https://github.com/NixOS/nixpkgs/archive/4e6826a.tar.gz -p inkscape.buildInputs --run :' ran
    4.35 ± 0.05 times faster than 'nix-shell -I nixpkgs=https://github.com/NixOS/nixpkgs/archive/33366cc.tar.gz -p inkscape.buildInputs --run :'

Try running these commands yourself, and compare the results.

1.3 Results

In these benchmarks, avoiding subshells makes nix-shell between 1.79 and 4.35 times faster. The exact multiplier depends on how many inputs are being processed: the more dependencies, the bigger the win. It’s a pretty impressive improvement, and it comes at almost no cost beyond the new Bash 4+ requirement. These kinds of easy performance wins are pretty rare, and worth celebrating!