SSDs and Linux Native Command Queuing
Native Command Queuing
Native Command Queuing (NCQ) is an extension of the Serial ATA protocol that allows multiple requests to be outstanding on a drive at the same time, so the drive can service them in whatever order it considers optimal.
This is very handy for rotational media like conventional hard drives, because they have to move the head all over the platter to do random IO; in theory, if the drive is allowed to optimise the ordering itself, it can do a better job of it. If the drive supports NCQ it will advertise this fact to the operating system, and Linux will enable it by default.
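If you want to see what your own drives negotiated, you can ask both the drive and the kernel. A quick sketch, assuming a SATA disk at /dev/sda and that hdparm is installed (both commands need root):

# Ask the drive what it advertises; look for "Native Command Queueing"
# and the queue depth in the capabilities output
hdparm -I /dev/sda | grep -i queue

# See what depth the kernel reported when it probed the drive
dmesg | grep -i ncq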
Queue depth
The maximum practical depth of the queue in SATA is 31, so if the drive supports NCQ then Linux will usually set the depth to 31. You can change the depth by writing a number between 1 and 31 to /sys/block/<device>/device/queue_depth. Writing 1 to the file effectively disables NCQ for that device.
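For example, assuming the drive in question is sda (and running as root), something like this would inspect and then effectively disable NCQ on it:

# Show the current queue depth; normally 31 when NCQ is enabled
cat /sys/block/sda/device/queue_depth

# Drop to a depth of 1, which effectively switches NCQ off for sda
echo 1 > /sys/block/sda/device/queue_depth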
NCQ and SSDs
So what about SSDs? They aren’t rotational media; any access is in theory the same as any other access, so no need to optimally order the commands, right?
Well, let’s see what fio has to say about that.
The setup
- Two Intel DC s3610 1.6TB SSDs in an MD RAID-10 on Debian 8.1.
- noop IO scheduler.
- fio operating on a 4GiB test file that is on an ext4 filesystem backed by LVM.
- fio set to do a 70/30% mix of read vs write operations with 128 simultaneous IO operations in flight.
The goal of this is to simulate a busy, highly parallel server load, such as you might see with a database.
The fio command line looks like this:
fio --randrepeat=1 \
    --ioengine=libaio \
    --direct=1 \
    --gtod_reduce=1 \
    --name=ncq \
    --filename=test \
    --bs=4k \
    --iodepth=128 \
    --size=4G \
    --readwrite=randrw \
    --rwmixread=70
Expected output will be something like this:
ncq: (g=0): rw=randrw, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=128
fio-2.1.11
Starting 1 process
Jobs: 1 (f=1): [m(1)] [100.0% done] [50805KB/21546KB/0KB /s] [12.8K/5386/0 iops] [eta 00m:00s]
ncq1: (groupid=0, jobs=1): err= 0: pid=11272: Sun Aug 9 06:29:33 2015
  read : io=2867.6MB, bw=44949KB/s, iops=11237, runt= 65327msec
  write: io=1228.5MB, bw=19256KB/s, iops=4813, runt= 65327msec
  cpu          : usr=4.39%, sys=25.20%, ctx=732814, majf=0, minf=6
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued    : total=r=734099/w=314477/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=128

Run status group 0 (all jobs):
   READ: io=2867.6MB, aggrb=44949KB/s, minb=44949KB/s, maxb=44949KB/s, mint=65327msec, maxt=65327msec
  WRITE: io=1228.5MB, aggrb=19255KB/s, minb=19255KB/s, maxb=19255KB/s, mint=65327msec, maxt=65327msec

Disk stats (read/write):
    dm-0: ios=732755/313937, merge=0/0, ticks=4865644/3457248, in_queue=8323636, util=99.97%, aggrios=734101/314673, aggrmerge=0/0, aggrticks=0/0, aggrin_queue=0, aggrutil=0.00%
  md4: ios=734101/314673, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, aggrios=364562/313849, aggrmerge=2519/1670, aggrticks=2422422/2049132, aggrin_queue=4471730, aggrutil=94.37%
  sda: ios=364664/313901, merge=2526/1618, ticks=2627716/2223944, in_queue=4852092, util=94.37%
  sdb: ios=364461/313797, merge=2513/1722, ticks=2217128/1874320, in_queue=4091368, util=91.68%
The figures we’re interested in are the iops= ones, in this case 11237 and 4813 for read and write respectively.
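If you end up doing a lot of runs and just want those two numbers, a rough one-liner like this works against fio's normal human-readable output (ncq.log here is just a hypothetical file you saved the output to):

grep -Eo 'iops=[0-9]+' ncq.log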
Results
Here’s how different NCQ queue depths affected things.
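The exact harness behind the graph isn’t shown here, but a minimal sketch of how such a sweep could be scripted, assuming the two drives are sda and sdb, reusing the fio options from above, and with the depths listed just as an example set:

#!/bin/sh
# Hypothetical sweep: set each NCQ queue depth on both drives, rerun the
# same fio job, and keep one log per depth for later comparison
for depth in 1 2 4 8 16 31; do
    for dev in sda sdb; do
        echo "$depth" > /sys/block/$dev/device/queue_depth
    done
    fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 \
        --name=ncq --filename=test --bs=4k --iodepth=128 --size=4G \
        --readwrite=randrw --rwmixread=70 > "fio-depth-$depth.log"
done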