random read disk stress test

When a GNU/Linux machine exhibits a high iowait (i.e. more than 20% of the processor time is spent waiting for IO to complete), it does not necessarily mean that a lot of bytes are being read or written. The test below demonstrates that random reads on the disk can keep it busy most of the time while reading less than 5MB/s, which translates into an iowait greater than 20%. A workaround is to give more RAM to the (virtual) machine so that pages are cached in memory after the first read, which reduces the number of random reads hitting the disk.
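
A quick way to watch iowait while such a test runs is the wa column of vmstat or the %iowait column of iostat -c. The short script below is a minimal sketch that computes it directly from two samples of /proc/stat; it ignores the irq/softirq/steal fields, so the percentage is only approximate.

#!/bin/sh
# Read the aggregated "cpu" line of /proc/stat twice, 5 seconds apart.
# Its first fields are: user nice system idle iowait ...
read -r cpu u1 n1 s1 i1 w1 rest < /proc/stat
sleep 5
read -r cpu u2 n2 s2 i2 w2 rest < /proc/stat
total=$(( (u2-u1) + (n2-n1) + (s2-s1) + (i2-i1) + (w2-w1) ))
echo "iowait over the last 5 seconds: $(( 100 * (w2-w1) / total ))%"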

The following command runs random reads on the disk to create high disk usage with low bandwidth (i.e. few bytes read per second).

mkdir -p /tmp/data ; fio --runtime=300 --time_based \
 --name=random-read --rw=randread --size=128m \
 --directory=/tmp/data
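
Note that with a 128MB file on a machine with enough free RAM, most of the reads may end up served from the page cache after the first pass. If the goal is to make sure every read hits the disk, fio can be asked to bypass the page cache with non-buffered IO; a possible variant (not the command used for the results below):

mkdir -p /tmp/data ; fio --runtime=300 --time_based \
 --name=random-read --rw=randread --size=128m \
 --direct=1 --directory=/tmp/data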

While it runs, the command

iostat -xm -d 30

should display output similar to:

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.00     2.80  146.97   10.23     0.57     0.05     8.14    99.10  630.42   6.35  99.85

where %util is 99.85, showing the disk is busy almost all of the time, and rMB/s and wMB/s show the number of megabytes read and written per second, for a total of 0.57 + 0.05 = 0.62MB/s.
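
The same figure can be extracted on the fly with a small sketch like the one below, assuming the column layout shown above (rMB/s and wMB/s in columns 6 and 7, which may vary between sysstat versions) and that the device of interest is sda:

iostat -xm -d 30 | awk '/^sda/ { printf "%s: %.2f MB/s total\n", $1, $6 + $7 }'
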
The results of the fio command are listed below for two disk configurations, a SAS RAID1 array and an SSD:

SAS 500GB RAID1 Dell Virtual Disk 1028
random-read: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=sync, iodepth=1
Starting 1 process
random-read: Laying out IO file(s) (1 file(s) / 128MB)
Jobs: 1 (f=1): [r] [100.0% done] [1,085K/0K /s] [265/0 iops] [eta 00m:00s]
random-read: (groupid=0, jobs=1): err= 0: pid=15008
  read : io=128MB, bw=672KB/s, iops=168, runt=194926msec
    clat (usec): min=106, max=451K, avg=5942.98, stdev=5092.48
    bw (KB/s) : min=  331, max= 1304, per=100.17%, avg=673.11, stdev=94.81
  cpu          : usr=0.09%, sys=0.63%, ctx=32778, majf=0, minf=37
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued r/w: total=32768/0, short=0/0
     lat (usec): 250=6.84%, 500=1.73%, 750=0.16%, 1000=0.04%
     lat (msec): 2=2.63%, 4=19.32%, 10=63.87%, 20=4.25%, 50=1.10%
     lat (msec): 100=0.03%, 250=0.02%, 500=0.01%

Run status group 0 (all jobs):
   READ: io=128MB, aggrb=672KB/s, minb=688KB/s, maxb=688KB/s, mint=194926msec, maxt=194926msec

Disk stats (read/write):
  dm-5: ios=32740/16, merge=0/0, ticks=193432/10996, in_queue=204428, util=99.30%, aggrios=0/0, aggrmerge=0/0, aggrticks=0/0, aggrin_queue=0, aggrutil=0.00%
    sda: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=-nan%

The disk is used at almost 100% (util=99.30%) and delivers about 0.7MB/s (bw=672KB/s).

ATA INTEL SSDSA2M080 2CV1
random-read: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=sync, iodepth=1
Starting 1 process
Jobs: 1 (f=1): [r] [100.0% done] [8658K/0K /s] [2113/0 iops] [eta 00m:00s]
random-read: (groupid=0, jobs=1): err= 0: pid=20854
  read : io=5157MB, bw=17603KB/s, iops=4400, runt=300000msec
    clat (usec): min=3, max=316976, avg=220.94, stdev=1063.96
    bw (KB/s) : min=  562, max=21352, per=100.38%, avg=17669.09, stdev=3110.63
  cpu          : usr=2.85%, sys=13.11%, ctx=1326519, majf=0, minf=91
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued r/w: total=1320221/0, short=0/0
     lat (usec): 4=0.01%, 10=0.01%, 250=89.65%, 500=9.56%, 750=0.36%
     lat (usec): 1000=0.08%
     lat (msec): 2=0.27%, 4=0.07%, 10=0.01%, 20=0.01%, 50=0.01%
     lat (msec): 100=0.01%, 250=0.01%, 500=0.01%

Run status group 0 (all jobs):
   READ: io=5157MB, aggrb=17602KB/s, minb=18025KB/s, maxb=18025KB/s, mint=300000msec, maxt=300000msec

Disk stats (read/write):
  sdb: ios=1319853/4446, merge=0/37970, ticks=243372/10172108, in_queue=10684012, util=92.83%

The disk is used at over 90% (util=92.83%) and delivers about 17MB/s (bw=17603KB/s). In both cases the bandwidth follows directly from the number of 4KB random reads the disk sustains: 168 iops x 4KB = 672KB/s for the SAS array versus 4400 iops x 4KB = 17600KB/s for the SSD.

background

This stress test was run after this behaviour was observed on a virtual machine running packaging-farm. In the listings below, the date of each file shows when the build of a package completed. All packages were built in sequence.

...
 -rw-r--r-- 1 root root    0 Apr 28 05:37 pf-xivo-utils
 -rw-r--r-- 1 root root    0 Apr 28 05:31 pf-xivo-sysconfd
 -rw-r--r-- 1 root root    0 Apr 28 05:24 pf-xivo-queues-logger
 -rw-r--r-- 1 root root    0 Apr 28 05:18 pf-xivo-provisioning
 -rw-r--r-- 1 root root    0 Apr 28 05:12 pf-xivo-lib-python
 -rw-r--r-- 1 root root    0 Apr 28 05:05 pf-xivo-fetchfw
 -rw-r--r-- 1 root root    0 Apr 28 04:59 pf-xivo-extra
...

It shows that building pf-xivo-fetchfw took 6 minutes, the difference between 04:59 and 05:05. The machine had 512MB of RAM and therefore very little opportunity to keep a useful file system cache. After it was allocated 2GB, the build time of each package dropped by about 5 minutes, as the listing below shows (a small script to compute these intervals is sketched after it).

...
 -rw-r--r-- 1 root root    0 Apr 29 09:20 pf-xivo-utils
 -rw-r--r-- 1 root root    0 Apr 29 09:19 pf-xivo-sysconfd
 -rw-r--r-- 1 root root    0 Apr 29 09:17 pf-xivo-queues-logger
 -rw-r--r-- 1 root root    0 Apr 29 09:16 pf-xivo-provisioning
 -rw-r--r-- 1 root root    0 Apr 29 09:14 pf-xivo-lib-python
 -rw-r--r-- 1 root root    0 Apr 29 09:13 pf-xivo-fetchfw
 -rw-r--r-- 1 root root    0 Apr 29 09:12 pf-xivo-extra
...
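
For reference, the interval between two consecutive builds can be computed from the modification times of these files; a minimal sketch, to be run in the directory holding the timestamp files (GNU ls is assumed for --time-style):

# list the files oldest first with epoch timestamps, then print how many
# minutes separate each build marker from the previous one
ls -ltr --time-style=+%s | awk 'NR > 1 { if (prev) printf "%-30s +%d min\n", $7, ($6 - prev) / 60; prev = $6 }'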