Scsi_debug adapter driver for Linux



Introduction

Parameters

Supported SCSI commands

Logical and physical block size

Logical block provisioning

Unit attentions

Zoned Block Devices

iopoll/blk_poll

Examples

Basic

Adding and removing hosts and devices

Mode pages

per_host_store

VERIFY

ZBC test scripts

REPORT LUNS Well Known LU

SAS personality

Downloads

Conclusion



Introduction

The scsi_debug adapter driver simulates a variable number of SCSI disks, each typically (but not always) sharing a common amount of RAM allocated by the driver to act as (volatile) storage. With one SCSI disk simulated, the scsi_debug driver is functionally equivalent to a RAM disk. When multiple SCSI disks are simulated, they could be viewed as multiple paths to the same storage device or simply as separate devices. The driver can also be used to simulate very large disks, 20 terabytes or more in size, by "wrapping" its data access within the available RAM.

A small but hopefully useful set of SCSI commands is supported along with some crude error checking. The number of simulated devices and the shared RAM size for storage can be given as module parameters or boot time parameters if the scsi_debug driver is built into the kernel. The number of simulated devices (and hosts) can be varied at run time via sysfs. Various error conditions can be optionally generated to test the reaction of upper levels of the kernel and applications to abnormal situations.

To create real SCSI targets and Logical Units for something like an iSCSI server, the reader may prefer the Linux target subsystem, specifically the tcm_loop driver. In short: if you want to create real SCSI devices then use the target subsystem; if you want to test or break something then read on. At the time of writing, the most recent version of this driver is 1.90, which adds the tur_ms_to_ready parameter and supports iopoll/blk_poll()/hipri requests.

This page describes the driver as found in the Linux kernel version 5.11.0 and possibly 5.12.0. Earlier versions of this driver worked with the Linux kernel 2.6 series. For information about the scsi_debug driver found in the lk 2.4 production series see this page. A not so ancient, earlier version is here.

Parameters

The parameter name given in the table below is both the module parameter name and the sysfs file name. The boot time parameter (used if the scsi_debug driver is built into the kernel, which is not recommended) has "scsi_debug." prepended to it. Hence the boot time parameter corresponding to add_host=2 is scsi_debug.add_host=2.

When the scsi_debug module is loaded, many parameters can be given on the command line, separated by spaces: for example to simulate 140 disks "modprobe scsi_debug max_luns=2 num_tgts=7 add_host=10" could be used. This will generate 140 devices: 10 hosts, each with 7 targets, each with 2 logical units.

Sysfs parameters can be read with the cat command and written with the echo command. Sysfs expects a driver to be associated with a bus (e.g. PCI) so the "pseudo" bus was created for drivers like scsi_debug. An example:

# cd /sys/bus/pseudo/drivers/scsi_debug
# cat dev_size_mb
8
# echo 2 > max_luns
# echo 1 > add_host

Parameters appearing on the kernel boot line or given when the driver is loaded as a module may be either decimal or hexadecimal (hex). Hex values are indicated by a '0x' prefix. Values written via sysfs must be decimal unless otherwise noted.
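Since sysfs writes normally expect decimal, a value you have in hex must be converted first. Shell arithmetic expansion can do this. The sdebug_to_dec function below is a hypothetical convenience helper for illustration, not something shipped with scsi_debug:

```shell
# Hypothetical helper (not part of scsi_debug): print a value given either
# in decimal or as 0x-prefixed hex as a plain decimal number.
sdebug_to_dec() {
    # POSIX shell arithmetic understands the 0x prefix. Beware: a leading 0
    # without an x (e.g. 010) is read as octal, not decimal.
    printf '%d\n' "$(( $1 ))"
}

# Example use (needs root and a loaded scsi_debug module):
#   cd /sys/bus/pseudo/drivers/scsi_debug
#   echo "$(sdebug_to_dec 0x40)" > max_queue
```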

In the /sys/module/scsi_debug/parameters directory the parameters used when the scsi_debug module was started (or their default values) are listed. Even though some of those parameters have writable permissions, writing to them has no effect on the driver. Note that some parameters appear in this directory but not in the /sys/bus/pseudo/drivers/scsi_debug directory. If the sysfs access cell for a parameter in the following table is blank then it doesn't appear in the /sys/bus/pseudo/drivers/scsi_debug directory but does appear in the /sys/module/scsi_debug/parameters directory.

Here is a list of scsi_debug driver parameters:

Parameter name | Default value | sysfs access | Hex accepted | sysfs write effect | New in version or year | Notes
add_host | 1 | read-write | | immediate | | Can add or remove hosts at runtime.
ato | 1 | read-only | | - | 1.81 | Application tag ownership (0 -> disk, 1 -> host).
bind | | write-only | | | Linux | Takes an h:c:t:l tuple where "h" is the host number, "c" is the channel number, "t" is the target number and "l" is the LUN. Can be used together with the unbind option. It is not a driver parameter and is only found in the /sys/bus/pseudo/drivers/scsi_debug directory.
cdb_len | 10 | read-write | | next command | 0187 | 6, 10, 12, 16 and 32 are accepted; other numbers are treated the same as 10. Sets the size of READs, WRITEs and MODE SENSEs generated by the sd driver for the block layer. When 32 is given, it is treated as if 16 was given.
clustering | 0 | | | | 1.84 | Enable large transfers.
delay | 1 | read-write | | next command | | IO command response delay in jiffies (a jiffy is 1 to 10 ms, depending on kernel configuration). 0: no delay, all in one thread; -1: use a workqueue to complete each request (no delay other than workqueue overhead).
dev_size_mb | 8 [zbc: 128] | read-only | | | | Units are Mebibytes (2**20 bytes). When zbc=managed, zbc=aware or ptype=0x14 is given at module or driver load time, the default size is 128 MiB.
dif | 0 | read-only | | | 1.81 | Data integrity field type [T10: protection type].
dix | 0 | read-only | | | 1.81 | Data integrity extension mask; integrity is checked when non-zero.
dsense | 0 | read-write | | immediate | 1.81 | 0 -> fixed; 1 -> descriptor sense format.
every_nth | 0 | read-write | yes | n commands from now | | For error injection: 0 -> no error injection. When non-zero (it can be negative) the statistics parameter is set to 1 if it isn't already.
fake_rw | 0 | read-write | | next command | 1.80 | When set, no processing is done when a READ or WRITE command (of any cdb size) is received. When fake_rw=1 no RAM is allocated.
guard | 0 | read-only | | | 1.81 | Protection checksum: 0 -> CRC; 1 -> IP.
host_lock | 0 | read-write | | next command | 1.84, 1.88 | When set, wraps each submitted command in a host_lock, which is detrimental in a multi-queue system. From version 1.88 this parameter is ignored.
host_max_queue | 0 | read-only | | | 1.89 | When >0, enables the host-wide tag feature whereby each submitted command on a host has a unique tag taken from [0...max_queue). When >0, the max_queue parameter is changed to be equal to this parameter.
inq_product | "scsi_debug" | | | | 0187 | User can set at module load time; 16 bytes long, padded with spaces on the right to comply with SPC requirements.
inq_rev | driver_ver | | | | 0187 | For example: "0187". The decimal point was dropped and a leading "0" added in this version. User can set at module load time; 4 bytes long.
inq_vendor | "Linux" | | | | 0187 | User can set at module load time; 8 bytes long, padded with spaces on the right to comply with SPC requirements.
lowest_aligned | 0 | | | | 1.81 | READ CAPACITY(16)'s lowest aligned logical block address (max: 0x3fff).
lbprz | 1 | | | | 2012 | LB provisioning: return zeros when reading an unmapped block.
lbpu | 0 | | | | 2012 | LB provisioning: support UNMAP.
lbpws | 0 | | | | 2012 | LB provisioning: support WRITE SAME(16) with UNMAP.
lbpws10 | 0 | | | | 2012 | LB provisioning: support WRITE SAME(10) with UNMAP.
lun_format | 0 | read-write | | next positive add_host or scan | 1.90 | When max_luns is 256 or less, this driver by default generates LUNs with the "peripheral device addressing format" (address_method=0, bus_identifier=0). When lun_format is 1 it generates "flat space" LUNs (address_method=1) instead. When max_luns is more than 256 (and less than or equal to 16,384) it generates "flat space" LUNs irrespective of the lun_format setting. As long as no_lun_0 is not set to 1, LUN 0 is always generated as address_method=0, bus_identifier=0, lun=0, which is a LUN made of 8 zero bytes.
map | | read-only | | | | When logical block provisioning is active, shows the internal provisioning map. Otherwise shows '0-<sdebug_store_sectors>'.
max_luns | 1 | read-write | | next positive add_host or scan | | Generates a sequence of LUNs: either 0 ... (max_luns-1) or 1 ... (max_luns-1); the second sequence is generated if no_lun_0 is set. In version 1.89 the largest supported max_luns increased from 256 to 16,384. Uses LUN peripheral device addressing format (address_method=0, bus_identifier=0) if max_luns is 256 or less, unless lun_format is set to 1. Uses LUN flat space addressing if lun_format is set to 1 or max_luns is more than 256. If no_lun_0 is not set, LUN 0 is made of 8 zero bytes, irrespective of the addressing method.
max_queue | 192 | read-write | | next command | 1.82 | Number of commands the driver can queue before telling the mid-level it is full. Safe to change when commands are already queued. Note: if host_max_queue is non-zero, it overrides the value of this parameter and any attempts to write to this parameter are ignored.
medium_error_count | 10 | | | | 0188 | Only active when bit 1 (0x2) of the opts parameter is set.
medium_error_start | 0x1234 | | | | 0188 | Only active when bit 1 (0x2) of the opts parameter is set.
ndelay | 0 | read-write | | next command | 1.84 | IO command response delay in nanoseconds. If > 0 then the delay parameter is ignored (it appears as -9999).
no_lun_0 | 0 | read-write | | next positive add_host or scan | 1.77 | No LUN 0, but it still responds to INQUIRY and REPORT LUNS as per SPC-2.
no_uld | 0 | read-only | | | 1.82 | Devices (LUs) created by this driver are only attached to the sg and bsg drivers. So, whatever their ptype (peripheral device type), there will be no corresponding /dev/sd*, /dev/sr*, /dev/st* or /dev/ses* device nodes; there will be /dev/sg* and /dev/bsg/<h:c:t:l> device nodes.
num_parts | 0 | read-only | | | | Number of (MSDOS) partitions to configure on each store. If there is only one store then all LUs share the same partition table.
num_tgts | 1 | read-write | | next positive add_host or scan | | Targets per host.
opt_blks | 64 | | | | 1.84 | 'Optimal transfer length' field in the Block Limits VPD page.
opt_xferlen_exp | physblk_exp | | | | 1.84 | Controls the 'Optimal transfer length granularity' field in the Block Limits VPD page.
opts | 0 | read-write | yes | usually following commands | | 0 -> quiet and no error injection (mask 16 to inject aborted_command, new in 1.81).
per_host_store | 0 | read-write | | next positive add_host | 1.89 | 0 -> a following add_host (or module/driver invocation) will create one or more new hosts whose LUs share the same storage as the previously added host. 1 -> a following add_host will create one or more new hosts, each with a new store that is shared by that host's LUs. Note that each scsi_debug device (LU) has its own metadata (e.g. start/stop state and Unit Attention state).
physblk_exp | 0 | | | | 1.81 | Sets READ CAPACITY(16)'s logical blocks per physical block exponent field, so there are 2**physblk_exp logical blocks per physical block.
poll_queues | 0 | | | | 1.90 | Number of io_uring iopoll queues. Should be less than submit_queues. If it is not less than submit_queues it is changed to 1 (unless submit_queues is 1, in which case poll_queues is changed to 0). It must be less because at least one submit queue needs to be available for non-iopoll requests.
ptype | 0 | read-write | | next positive add_host or scan | | Peripheral device type (0 -> disk).
random | 0 | read-write | | | 1.89 | When 0 (the default), delays the response of media access SCSI commands as precisely as possible to the duration indicated by the delay and ndelay parameters. When 1, chooses a delay from a uniform distribution from no delay (0) to the duration indicated by the delay and ndelay parameters. When multiple threads are issuing commands, random=1 can be used to simulate out-of-order responses.
removable | 0 | read-write | | | | When non-zero, sets the RMB bit in the INQUIRY response, indicating the device is removable.
scsi_level | 7 | read-only | | | | From: 0 (no compliance), 1, 2 (SCSI-2), 3 (SPC), 4 (SPC-2), 5 (SPC-3), 6 (SPC-4), 7 (SPC-5).
sector_size | 512 | read-only | | | 1.81 | Logical block size in bytes. 512, 1024, 2048 and 4096 are accepted.
statistics | 0 | read-write | | next command | 1.86 | Collect statistics that are output by 'cat /proc/scsi/scsi_debug/<host_num>'. Needs a kernel built with CONFIG_SCSI_PROC_FS selected. Due to sysfs policy this is not superseded by sysfs. [Don't believe everything you read in kernel config menus.]
strict | 0 | read-write | | next command | 1.85 | Check for bits set in the reserved parts of SCSI command blocks. If any are found, report the position of the first offending bit.
submit_queues | 1 | read-only | | | 0187 | Multi-queue setting from 1 (i.e. non-mq) up to the number of processors on the machine.
tur_ms_to_ready | 0 | read-only | | | 1.90 | Number of milliseconds after a scsi_debug device is created during which TUR (and other media access commands) will return a NOT READY sense key with the additional sense of "LU in process of becoming ready". Further, in the case of TUR, an estimate of the number of milliseconds before a subsequent TUR will first return GOOD status is placed in the sense data INFO field (see: T10 accepted proposal 20-061r2).
uevent | | write-only | | | Linux | See the bind option restrictions.
unbind | | write-only | | | Linux | Opposite action of the bind option; see its description.
unmap_alignment | 0 | | | | 1.81 | Block Limits VPD page's unmap granularity alignment.
unmap_granularity | 1 | | | | 1.81 | Block Limits VPD page's optimal unmap granularity.
unmap_max_blocks | 0xffffffff | | | | 1.81 | Block Limits VPD page's maximum unmap LBA count.
unmap_max_desc | 255 | | | | 1.81 | Block Limits VPD page's maximum unmap block descriptor count.
uuid_ctl | 0 | read-only | | | 1.86 | If 1 then each LU name is an internally generated UUID; if 2 then all LUs share the same UUID; if 0 then the LU name is a locally assigned NAA.
virtual_gb | 0 | read-write | | immediate, next READ CAPACITY | 1.79 | When 0, each device is a RAM disk of (the same) dev_size_mb size. When n > 0, a "virtual" n Gibibyte disk is simulated, wrapping on dev_size_mb of actual RAM. A Gibibyte is 2**30 bytes.
vpd_use_hostno | 1 | read-write | | next positive add_host or scan | 1.80 | The driver generates serial numbers and SAS naa-5 addresses based on host number ("hostno"), target id and LUN. When set to 0, the generated numbers ignore "hostno".
wp | 0 | | | | 1.89 | Write Protection. When 0 (the default), store-modifying data access commands are permitted. When 1, store-modifying data access commands (e.g. WRITE) are not allowed.
write_same_length | 0xffff | | | | 2012 | Maximum blocks per WRITE SAME command.
zbc | 0 ['none'] | read-only | | | 1.89 | The default is 0 or the string 'none'. To specify host-aware scsi_debug devices use 1 or 'host-aware'. To specify host-managed scsi_debug devices use 2 or 'host-managed', which will set the ptype (i.e. the Peripheral Device Type (pdt)) to 0x14. The three strings can be shortened to 'no', 'aware' and 'managed'. After the scsi_debug module is loaded with, say, 'zbc=managed', using sysfs to change the ptype parameter to 0 will turn all scsi_debug devices into normal disk simulations.
zone_max_open | 8 | | | | 1.89 | Limit on the number of co-incident OPEN ZONE commands that are allowed on a scsi_debug ZBC device. This can only be set at module/driver load time and is not visible in sysfs. The resultant value (which may be changed due to other zone_* parameters) can be seen in the Zoned Block Device Characteristics VPD page [0xb6].
zone_nr_conv | 1 | | | | 1.89 | Number of conventional zones that will be configured on each scsi_debug ZBC device. This can only be set at module/driver load time and is not visible in sysfs. Conventional zones comply with SBC (i.e. "normal" disks) rather than ZBC models. Conventional zones typically appear first (i.e. at the lowest LBAs), before any "sequential write required" or "sequential write preferred" zones.
zone_size_mb | 128 | | | | 1.89 | Maximum size (in Mebibytes) of each zone that is allowed on a scsi_debug ZBC device. This can only be set at module/driver load time and is not visible in sysfs. If this parameter is not given (when zbc=host-managed) then a zone size of 128 MiB is assumed.




The add_host parameter is the number of hosts (HBAs) to simulate. The default is 1. For boot time and module loads the allowable values are 0 up to a large positive number. For sysfs writes, a value of 0 does nothing, a positive number adds that many hosts and a negative number removes that number of hosts. A sysfs read of this parameter shows the current number of hosts scsi_debug is simulating. No more than num_tgts target ids will be used per host. Target ids are in ascending order from 0, excluding the target id that is used by the initiator (i.e. HBA) if any. The default setting of num_tgts is 1. The default setting for max_luns is 1. So the number of pseudo disks simulated at driver initialization time is (add_host * num_tgts * max_luns). Note that if any of these three parameters is set to zero at kernel boot time or module load time then no devices are created. Modifying the add_host parameter in sysfs can be used to simulate hot plugging and unplugging of hosts. See below for adding and deleting individual SCSI devices.
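The device count formula can be checked with a one-line shell function. This is a sketch that assumes no_lun_0=0 (with no_lun_0 set, LUN 0 is skipped on each target, so the count would be lower); the function name is hypothetical:

```shell
# Devices created at driver initialization: add_host * num_tgts * max_luns.
# Assumes no_lun_0=0. If any factor is 0, no devices are created.
sdebug_num_devices() {   # args: add_host num_tgts max_luns
    echo $(( $1 * $2 * $3 ))
}

# The modprobe example in the Parameters section
# (modprobe scsi_debug max_luns=2 num_tgts=7 add_host=10):
sdebug_num_devices 10 7 2    # prints 140

# At runtime add_host writes are relative (needs root):
#   echo 2 > /sys/bus/pseudo/drivers/scsi_debug/add_host    # add two hosts
#   echo -1 > /sys/bus/pseudo/drivers/scsi_debug/add_host   # remove one host
```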

The ato parameter sets the field of the same name in the control mode page. The default value is 1 which implies the host is the application tag owner. A value of 0 implies the device server (e.g. the (pseudo) disk) is the application tag owner.

The cdb_len parameter controls the SCSI cdb lengths generated by the sd driver, typically when it receives requests from the block layer. The sd driver has three internal boolean variables: use_10_for_rw, use_16_for_rw and use_10_for_ms. "ms" stands for MODE SENSE/SELECT, whose cdb can be 6 or 10 bytes long. If both use_10_for_rw and use_16_for_rw are false then READ(6) or WRITE(6) is used if the LBA and the number_of_blocks are not too large. This parameter can have these settings:

Note that this parameter has no control over the sd driver's use of READ(32) and WRITE(32) commands which are generated for some settings of Protection Information (PI).
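As a sketch of how each cdb_len value maps onto the three sd flags, the function below follows our reading of the cdb_len handling in drivers/scsi/scsi_debug.c; treat the mapping as an assumption and check it against your kernel's source. It is consistent with the table above (32 acts as 16, unknown values act as 10). The function name is hypothetical:

```shell
# Assumed mapping from cdb_len to the sd driver's per-device flags
# (1 = true, 0 = false). Verify against drivers/scsi/scsi_debug.c.
sdebug_cdb_flags() {   # arg: cdb_len
    case "$1" in
        6)     rw10=0 rw16=0 ms10=0 ;;   # READ/WRITE(6), MODE SENSE(6)
        12)    rw10=1 rw16=0 ms10=1 ;;   # 10 byte RWs, 10 byte MODE SENSE
        16|32) rw10=0 rw16=1 ms10=1 ;;   # 32 is treated as 16
        *)     rw10=1 rw16=0 ms10=0 ;;   # 10 and any other value
    esac
    echo "use_10_for_rw=$rw10 use_16_for_rw=$rw16 use_10_for_ms=$ms10"
}

# Changing it at runtime (needs root):
#   echo 16 > /sys/bus/pseudo/drivers/scsi_debug/cdb_len
```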

The clustering parameter informs the SCSI mid layer whether (1) or not (0) clustering is enabled. By default it is not enabled (0). Setting this parameter facilitates large transfers of data with a single command.

The delay parameter is the number of jiffies by which the driver will delay responses. The default is 1 jiffy unless the ndelay parameter is given; see its description. Setting this parameter to 0 will cause the response to be sent back to the mid level before the request function is completed. The "jiffy" is a kernel space jiffy (1/HZ seconds; with HZ=1000, the largest typical value, it is 1 millisecond on i386) rather than a user space jiffy (1/USER_HZ seconds, typically 10 milliseconds on i386). HZ and USER_HZ are configurable in the kernel build. Both delayed and immediate responses are permitted, however delayed responses are more realistic. For delayed responses, a kernel timer is used. [Real adapters would generate an interrupt when the response was ready (i.e. the command had completed).] For a fast RAM disk set the delay parameter to 0. These SCSI commands ignore the delay parameter and respond immediately: INQUIRY, REPORT LUNS, REQUEST SENSE, SYNCHRONIZE CACHE plus various other non "media access" commands. TEST UNIT READY is considered a media access command.

The delay parameter may be set to -1, in which case a kernel workqueue is used to generate a more or less immediate response (but in a different kernel thread). Trying to write a new value to delay while there are queued command responses may result in an EBUSY error.

The dev_size_mb parameter allows the user to specify the size of the simulated storage. The unit is Mebibytes (each 2**20 bytes, a bit larger than a Megabyte) and the default value is 8. The maximum value depends on the capabilities of the vmalloc() call on the target architecture. If the module fails to load with a "cannot allocate memory" message then a "vmalloc=nn{KMG}" boot time argument may be needed. [See the kernel source file: Documentation/kernel-parameters.txt for more information on this.] The RAM reserved for storage is initialized to zeros, which leads the sd (SCSI disk) driver and the block layer to believe there is no partition table present. Partitions can be simulated with num_parts (see below). All simulated dummy devices share the same RAM. If a value of 0 or less is given then dev_size_mb is forced to 1, so 1 MiB of RAM is used. Given 512 byte logical blocks, the largest RAM disk that can be allocated is 2 TB, but it is unlikely a system would be able to allocate that much RAM (a situation that would be bypassed if fake_rw=1). Very large amounts of "virtual" storage can be simulated with the virtual_gb parameter (see below).
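The store size in logical blocks follows from dev_size_mb and sector_size. The second function models virtual_gb "wrapping" as the virtual LBA taken modulo the store size; that mapping is our assumption about the driver's behaviour, and both function names are hypothetical:

```shell
# Logical blocks in the RAM store: dev_size_mb MiB divided by sector_size.
sdebug_store_sectors() {   # args: dev_size_mb sector_size
    echo $(( $1 * 1024 * 1024 / $2 ))
}

# With virtual_gb > 0 the disk looks bigger than the store. Assumed model:
# a virtual LBA lands on (virtual_lba mod store_sectors) in RAM.
sdebug_wrap_lba() {   # args: virtual_lba store_sectors
    echo $(( $1 % $2 ))
}

sdebug_store_sectors 8 512    # default 8 MiB store: prints 16384
```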

The dif parameter sets the T10 protection type which is a value between 0 and 3 where 0 (the default) is no protection. Protection information is extra bytes of data (typically 8) associated with blocks of data transferred between a SCSI initiator and a SCSI block logical unit (as defined in T10 SBC standards). T10 protection information is often called the "data integrity field" hence the name DIF. For information about DIF and DIX see https://oss.oracle.com/projects/data-integrity/documentation/ .

The dix parameter, when set, causes protection information to be carried between the operating system and the SCSI initiator. DIX is an abbreviation of "data integrity eXtension" and can be viewed as a front end to DIF. When its value is zero (the default) no protection information is carried within the operating system. When the dix parameter is non-zero, the DIX type will be the same as the dif parameter. So if dif=2 and dix=1 then both DIF and DIX are set to type 2 protection. Note that if dif=0 it doesn't matter what the dix parameter is: both DIF and DIX are set to type 0 protection (which is no protection).
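The dif/dix rules above reduce to a small decision, sketched here as a hypothetical shell function that prints the resulting protection types:

```shell
# Effective protection types for given dif and dix values, following the
# rules described above (0 means no protection).
sdebug_pi_types() {   # args: dif dix
    if [ "$1" -eq 0 ]; then
        echo "DIF=0 DIX=0"        # dif=0: the dix value is irrelevant
    elif [ "$2" -ne 0 ]; then
        echo "DIF=$1 DIX=$1"      # non-zero dix takes dif's type
    else
        echo "DIF=$1 DIX=0"       # DIF only, nothing carried in the OS
    fi
}

sdebug_pi_types 2 1    # the dif=2, dix=1 example: prints DIF=2 DIX=2
```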

The every_nth parameter takes a decimal number as an argument. When this number is greater than zero, incoming commands are counted and when <n> is reached the associated command generates some sort of error. Currently the available errors are timeout (when "opts & 4" is true) and RECOVERED_ERROR (when "opts & 8" is true). Once the command count reaches <n> it is reset to zero. For example, setting every_nth to 3 and opts to 4 will cause every third command to be ignored (and hence to time out). If every_nth is not given it defaults to 0 and timeouts and recovered errors will not be generated. Note that for the "every nth" mechanism to work the statistics parameter needs to be set.

If every_nth is negative then an internal command counter counts down to that value and, when it is reached, continually generates the error condition (specified in opts) on each newly received command. The driver flags this continual error state by setting every_nth to -1. The user can stop error conditions being generated on receipt of every subsequent command by writing 0 to every_nth (or opts). From version 1.90 a hexadecimal number (prefixed by '0x') may be given for this parameter at runtime.
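The counting rules can be illustrated with a simplified model. This is not the driver's code: it just numbers incoming commands from 1 and lists which of them would hit the error selected in opts, for positive and negative every_nth. The function name is hypothetical:

```shell
# Simplified model of every_nth: print which commands (numbered from 1)
# would get the injected error, out of num_commands.
sdebug_nth_hits() {   # args: every_nth num_commands
    n=$1 total=$2 i=1 hits=""
    while [ "$i" -le "$total" ]; do
        if [ "$n" -gt 0 ] && [ $(( i % n )) -eq 0 ]; then
            hits="$hits $i"    # counter reached n, then reset to zero
        elif [ "$n" -lt 0 ] && [ "$i" -ge $(( -n )) ]; then
            hits="$hits $i"    # countdown reached: error on every later command
        fi
        i=$(( i + 1 ))
    done
    echo "${hits# }"
}

# Matching sysfs setup for "every third command times out" (needs root):
#   cd /sys/bus/pseudo/drivers/scsi_debug
#   echo 3 > every_nth; echo 4 > opts
```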

The fake_rw parameter instructs the scsi_debug driver to ignore all READ and WRITE commands and return a GOOD status. This means the data "read" when fake_rw is set is whatever was previously in the scatter gather list. The default value is 0 (i.e. process READ and WRITE commands). This parameter is for testing and when set can confuse the kernel or utilities that look for partitions and other information on a "disk".

The guard parameter, when set to zero (the default), selects the T10-defined CRC for the protection information. When set to one, the IP (internet protocol) checksum (as used by iSCSI?) is used.

The host_lock parameter indicates whether each command (excluding its response delay and associated callback into the mid-layer) is surrounded by a per host host_lock (which is a kernel "spin lock"). In a SCSI multi-queue system the presence of this host lock will have the effect of serializing all commands from a host, and that is detrimental to system performance. Prior to version 1.84 this parameter was not available and the host_lock surrounded all commands. In version 1.84 and later the default is 0, which means the host_lock is not applied. Set host_lock=1 for the old behaviour. In version 1.88 this functionality (i.e. the host_lock) was removed and setting this parameter has no effect. It is kept so that scripts that use it will not break.

The inq_product parameter is the 16 byte ASCII string (left justified, space characters to the right) that is reported in this driver's standard INQUIRY response. The default is "scsi_debug      ".

The inq_rev parameter is the 4 byte ASCII string (left justified, space characters to the right) that is reported in this driver's standard INQUIRY response. The driver version number (was "1.86") has been reformatted to be suitable for this field. The default value is now "0187" and will increase as changes are added to this driver.

The inq_vendor parameter is the 8 byte ASCII string (left justified, space characters to the right) that is reported in this driver's standard INQUIRY response. The default is "Linux   ".
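The fixed-width, left-justified layout of these INQUIRY fields can be reproduced with printf. The helper below is hypothetical and only shows the byte layout; how the driver itself treats a string that is too long is not described here, so the truncation is purely a property of this helper:

```shell
# Left-justify a string in a fixed-width INQUIRY field, padding with
# spaces on the right (and truncating, in this helper, if too long).
# Widths: inq_vendor 8, inq_product 16, inq_rev 4.
sdebug_inq_field() {   # args: string width
    w=$2
    printf "%-${w}.${w}s" "$1"
}

# Show the padded default product string between brackets:
printf '[%s]\n' "$(sdebug_inq_field scsi_debug 16)"
```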

The lbpu parameter, if set, causes the logical block provisioning VPD page to set the field of the same name. The default is to set the LBPU field to 0. When set, this field indicates the UNMAP command is supported.

The lbpws and lbpws10 parameters cause the corresponding bits in the logical block provisioning VPD page to be set. They indicate that the UNMAP bit within the WRITE SAME(16) and WRITE SAME(10) commands, respectively, is supported.

The lbprz parameter, if set, causes the logical block provisioning VPD page to set the field of the same name. When this field is set, reading unmapped logical blocks returns block(s) of data full of zeros.

The lowest_aligned parameter sets the field called LOWEST ALIGNED LOGICAL BLOCK ADDRESS in the READ CAPACITY (16) command response.
The default is zero, which implies that logical block 0 starts at the beginning of a physical block.

The max_luns parameter allows an upper limit to be placed on the logical unit number (lun) that the scsi_debug driver will respond to. A value of 2 means that this driver will respond to logical unit numbers 0 and 1. If max_luns is modified by a sysfs write then the scsi_debug driver modifies the scsi_host::max_lun member of all hosts that it owns. When max_luns is modified by a sysfs write then it will take effect the next time a host is added (see add_host) or when a scan is done on any existing host. The mid level scanning code will scan for up to but not including max_scsi_luns which is a SCSI mid level boot and module load time parameter.
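The LUN sequence, and the 8-byte LUN each number becomes, can be sketched as follows. The encoding follows the addressing rules in the lun_format and max_luns table rows: peripheral device addressing (address_method=0, bus_identifier=0) puts the LUN in byte 1, while flat space addressing (address_method=1) sets 01b in the top two bits of byte 0 and carries a 14-bit LUN across bytes 0 and 1, as defined by the T10 SAM standards. The function names are hypothetical:

```shell
# LUNs generated on one target: 0..max_luns-1, or 1..max_luns-1 if no_lun_0.
sdebug_luns() {   # args: max_luns no_lun_0
    i=0; [ "$2" -ne 0 ] && i=1
    out=""
    while [ "$i" -lt "$1" ]; do out="$out $i"; i=$(( i + 1 )); done
    echo "${out# }"
}

# 8-byte LUN (printed as hex) for one of those numbers.
sdebug_lun_bytes() {   # args: lun lun_format max_luns
    lun=$1
    if [ "$lun" -eq 0 ]; then
        b0=0 b1=0                                    # LUN 0: 8 zero bytes
    elif [ "$2" -eq 1 ] || [ "$3" -gt 256 ]; then
        b0=$(( 0x40 | (lun >> 8) )) b1=$(( lun & 0xff ))   # flat space
    else
        b0=0 b1=$lun                                 # peripheral device
    fi
    printf '%02x%02x000000000000\n' "$b0" "$b1"
}
```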

The max_queue parameter indicates the maximum number of queued responses the driver can handle. This defaults to an internal define in the scsi_debug driver called SCSI_DEBUG_CANQUEUE, which is currently 192 on 64 bit machines (96 on 32 bit machines). If both the delay and ndelay parameters are 0, no commands have queued responses. If there is an attempt to exceed this value then either SCSI_MLQUEUE_HOST_BUSY is returned to the mid-layer (the default) or a status of TASK_SET_FULL (if the 0x200 opts mask is set). Sysfs can be used at any time to change the value of max_queue, even when there are queued command responses.

The medium_error_count parameter indicates the number of blocks, including the medium_error_start LBA, on which to yield a SCSI MEDIUM ERROR sense key. This only occurs when the opts parameter has its bit 1 (i.e. 0x2) set. Its default value is 10.

The medium_error_start parameter indicates the first LBA to yield a SCSI MEDIUM ERROR sense key. This only occurs when the opts parameter has its bit 1 (i.e. 0x2) set. Its default value is 0x1234 (4660 in decimal).
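Whether a given read falls into the simulated bad region can be sketched as an overlap test between the read's LBA range and [medium_error_start, medium_error_start + medium_error_count). The overlap semantics are our reading of the driver and should be treated as an assumption; the function name is hypothetical, and it presumes opts has bit 1 (0x2) set:

```shell
# Does a read of NUM blocks starting at LBA overlap the medium error range?
# Defaults match the driver defaults: start 0x1234 (4660), count 10.
sdebug_hits_medium_error() {   # args: lba num [start] [count]
    start=${3:-4660} count=${4:-10}
    if [ $(( $1 + $2 )) -gt "$start" ] && [ "$1" -lt $(( start + count )) ]; then
        echo yes
    else
        echo no
    fi
}
```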

The ndelay parameter is the response delay whose units are nanoseconds. This mechanism depends on high resolution timers in the kernel, which may not be supported on small or old systems (it is a kernel build config option). Its default value is 0, which means the delay parameter is operative. If ndelay is a positive value then a response delay of that many nanoseconds is active (and to indicate the delay parameter is overridden, it is set to -9999). Depending on the hardware, setting ndelay to less than a few microseconds probably causes no further reduction in the observed response delays. Trying to write a new value to ndelay while there are queued command responses may result in an EBUSY error.
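The interplay between delay, ndelay and the kernel's HZ can be summarised as a nominal delay in nanoseconds. This hypothetical function ignores timer granularity and the random parameter; HZ is whatever your kernel was built with (commonly 100, 250 or 1000):

```shell
# Nominal response delay in ns for delay (jiffies), ndelay (ns) and HZ.
sdebug_delay_ns() {   # args: delay ndelay hz
    if [ "$2" -gt 0 ]; then
        echo "$2"                           # ndelay wins; delay reads -9999
    elif [ "$1" -gt 0 ]; then
        echo $(( $1 * 1000000000 / $3 ))    # one jiffy is 1/HZ seconds
    else
        echo 0                              # 0: immediate; -1: workqueue, no timer
    fi
}

# Selecting a 20 microsecond delay at runtime (needs root):
#   echo 20000 > /sys/bus/pseudo/drivers/scsi_debug/ndelay
```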

The no_lun_0 parameter, when set to a non-zero value, causes a lun 0 INQUIRY response of peripheral_qualifier==3, indicating there is no actual lu there. As required by SPC, lun 0 will still respond to a REPORT LUNS command. If the REPORT LUNS has a 'select report' code of 1 or 2, then one of the luns reported will be the REPORT LUNS well known logical unit (lun 49409 or 0xc101). The default value is 0. If max_luns is greater than 1, the first lun generated by scsi_debug will be lun 1 (since lun 0 is skipped). The REPORT LUNS well known logical unit (wlun) only supports the INQUIRY, REPORT LUNS, REQUEST SENSE and TEST UNIT READY SCSI commands. To make this wlun appear as a SCSI generic (sg) device see the REPORT LUNS well known LUN example below.

The num_parts parameter writes a partition table to the ramdisk if the parameter's value is greater than 0. The default is 0, in which case the ramdisk is simply all zeros. When num_parts is greater than zero a DOS format primary partition block is written to logical block 0, so the number of partitions is limited to a maximum of 4. The partitions are given an id of 0x83, which is a "Linux" partition. The available space on the ramdisk is roughly divided evenly between partitions when 2 or more partitions are requested. The partitions are not initialized with any file system. Even if no partitions are specified, a utility like fdisk can be used to add them later.

The num_tgts parameter allows the number of targets per host to be specified. It should be 0 or greater. Target id numbers start at 0 and ascend, bypassing the target id of the initiator (i.e. the HBA). If num_tgts is modified by a sysfs write then the scsi_debug driver modifies the scsi_host::max_id member of all hosts that it owns. When num_tgts is modified by a sysfs write then it will take effect the next time a host is added (see add_host) or when a scan is done on any existing host.

The opt_blks parameter is placed in the "Optimal transfer length" field of the Block Limits VPD page. Its default value is 64.

The opt_xferlen_exp parameter (with help from the physblk_exp parameter) controls the "Optimal transfer length granularity" field (OTLG) in the Block Limits VPD page. If 0 (default) or less than, or equal to, physblk_exp then the OTLG field is set to 2**physblk_exp, making physblk_exp the effective default value. Otherwise, if this parameter is greater than physblk_exp then the OTLG field is set to 2**opt_xferlen_exp.
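The OTLG rule above as a small worked function (the name is hypothetical); it returns the value placed in the Block Limits VPD page field:

```shell
# Optimal transfer length granularity from opt_xferlen_exp and physblk_exp.
sdebug_otlg() {   # args: opt_xferlen_exp physblk_exp
    if [ "$1" -le "$2" ]; then
        echo $(( 1 << $2 ))    # 0 or <= physblk_exp: 2**physblk_exp
    else
        echo $(( 1 << $1 ))    # larger: 2**opt_xferlen_exp
    fi
}

sdebug_otlg 0 3    # defaults for opt_xferlen_exp with physblk_exp=3: prints 8
```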

The opts parameter takes a number as an argument which is the bitwise "or" of several flags. Values can be given either in decimal or hex (prefixed by '0x'). The flags that mention "nth" are only active when every_nth != 0. So-called "read-write" commands include some others such as VERIFY. The flags supported are:



READ commands in the above list include READ(6), READ(10), READ(12), READ(16) and READ(32). WRITE commands include WRITE(6), WRITE(10), WRITE(12), WRITE(16), WRITE(32), WRITE SCATTERED(16) and WRITE SCATTERED(32). 'Media access' commands are any command that accesses this driver's data store (e.g. VERIFY(16)). If no commands are specified in a line of the above list then the error injection can occur on any command. Where commands are specified and the "nth" occurs on another command then a global flag is set and the next time one of the specified commands is processed then the error is injected (and the global flag cleared). These injection rules were changed to what is documented here in version 1.90 of this driver.

The opts "noisy" (or debug) flag will cause all scsi_debug entry points to be logged in the system log (and often sent to the console, depending on how kernel informational messages are processed). With this flag set, commands are listed in hex and, if they yield a result other than success, that result is shown. In a busy system this may produce too much log "noise", in which case this combination of flags may be useful: opts=0x6201 .

The opts "medium error" flag will cause any read command whose LBA range overlaps the medium_error_count blocks starting at medium_error_start (default: 0x1234, i.e. 4660 in decimal) to return a medium error indication to the mid level.

The "ignore nth" flag is only active when every_nth != 0 . When an internal command counter reaches the value in every_nth and the "ignore nth" flag is set, that command is ignored (i.e. quietly not processed). Typically this will cause the SCSI mid level code to time out the command, which leads to further error processing. The internal command counter is reset to zero whenever opts is written to, whenever every_nth is written to, when the every_nth value is reached, and at driver load time.

The "recovered error" flag works in a similar fashion to the "ignore nth" flag; however, when the every_nth value is reached and the command is either a read or a write, the command is processed normally but yields a "recovered error" indication. Such an indication is _not_ a hard error, but for a real disk it could indicate deteriorating media. The "aborted command" flag injects a transport error in a similar fashion to the way the "recovered error" flag works.

A minor point: the kernel boot time and module load time opts parameter is a decimal integer. However, the sysfs value is output as a hexadecimal number (e.g. 0x9), while a value written to sysfs is interpreted as hexadecimal if prefixed by "0x" and as decimal otherwise. When combining these flags it is easier to consider them as hexadecimal numbers.
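Because the same opts value may be written in decimal or in hex, it can help to normalize it and list the individual flag bits. The hypothetical helpers below assume the sysfs input rule described above (hex only with a "0x" prefix, decimal otherwise, so a leading zero does not mean octal) and decompose a value such as 0x6201 into its bits. They do not name the flags, since the flag table is not reproduced here.

```bash
#!/usr/bin/env bash
# parse_opts VALUE
# Hex if the value starts with "0x", decimal otherwise (10# stops bash from
# treating a leading zero as octal).
parse_opts() {
  case $1 in
    0x*) echo $(( 16#${1:2} )) ;;
    *)   echo $(( 10#$1 )) ;;
  esac
}

# list_bits N
# Prints each set bit of N as a hex value, lowest bit first.
list_bits() {
  local n=$1 bit=1 out=""
  while [ "$n" -gt 0 ]; do
    if [ $(( n & 1 )) -eq 1 ]; then
      out="$out $(printf '0x%x' "$bit")"
    fi
    n=$(( n >> 1 ))
    bit=$(( bit << 1 ))
  done
  echo "${out# }"
}

v=$(parse_opts 0x6201)
printf 'decimal %d, sysfs style 0x%x\n' "$v" "$v"   # decimal 25089, sysfs style 0x6201
list_bits "$v"                                      # 0x1 0x200 0x2000 0x4000
```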

The physblk_exp parameter becomes the "Logical blocks per physical block exponent" field in the READ CAPACITY (16) response. The default value is 0 which means the logical block and physical block sizes are the same.

The ptype parameter allows the SCSI peripheral type to be set or modified. The default value is 0 which corresponds to a disk. Other useful peripheral types are 1 for tape, 3 for processor, 5 for dvd/cd and 13 for enclosure (SES).

The scsi_level parameter is the ANSI SCSI standard level with which the simulated disk claims to comply. The INQUIRY response generated by scsi_debug contains this ANSI SCSI standard level value (in byte 2).

The sector_size parameter (default 512) is the logical block size in bytes (assuming ptype=0 which means a block storage device).

The statistics parameter controls whether several internal counters are incremented or not. For speed the default is 0 (i.e. don't collect statistics). The "every nth" mechanism requires those internal counters so specifying a non-zero every_nth parameter will cause the statistics collection to be turned on.

The strict parameter can be 1 or 0 (the default). If 1 then it uses the cdb mask given in the REPORT SUPPORTED OPERATION CODES command to check each command cdb received by this driver. If any bit is set in the cdb but the corresponding bit is not set in the mask, then the command is rejected with a status of CHECK CONDITION, a sense key of ILLEGAL REQUEST and additional sense of INVALID FIELD in CDB. The sense data also points to the byte and bit position in the cdb that first failed the mask comparison. Byte long (and longer) fields will always point at bit 7 as failed. Each cdb is scanned in ascending byte order.
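The strict mask check can be sketched as follows. The function name and the example mask are hypothetical (a real mask would come from the REPORT SUPPORTED OPERATION CODES data). The sketch skips byte 0 on the assumption that the operation code was already used to select the mask, scans the remaining bytes in ascending order, and reports the highest offending bit in the first failing byte. It does not reproduce the rule that byte-long and longer fields are always reported at bit 7.

```bash
#!/usr/bin/env bash
# first_mask_violation "CDB_BYTES" "MASK_BYTES"
# Both arguments are space separated hex bytes of equal length.
# Prints "byte=K bit=J" for the first cdb bit not allowed by the mask, else "ok".
first_mask_violation() {
  local -a cdb=($1) mask=($2)
  local k j rem
  # byte 0 (the operation code) is assumed to have selected the mask already
  for (( k = 1; k < ${#cdb[@]}; k++ )); do
    # bits set in the cdb but clear in the mask
    rem=$(( 16#${cdb[k]} & ~16#${mask[k]} & 0xff ))
    if [ "$rem" -ne 0 ]; then
      # report the most significant offending bit in this byte
      for (( j = 7; j >= 0; j-- )); do
        if [ $(( (rem >> j) & 1 )) -eq 1 ]; then
          echo "byte=$k bit=$j"
          return 0
        fi
      done
    fi
  done
  echo ok
}

# hypothetical mask for a 10-byte cdb
mask="28 f6 ff ff ff ff 3f ff ff c7"
first_mask_violation "28 00 00 00 10 00 00 00 01 00" "$mask"   # ok
first_mask_violation "28 08 00 00 00 00 00 00 01 00" "$mask"   # byte=1 bit=3
```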

The submit_queue parameter sets the number of submission queues the SCSI multi-queue logic will maintain for this driver. The default value is 1 which implies no multi-queue. If a value is given that exceeds the number of processors on the machine then the value used will be the number of processors on the machine. A warning is issued to the log if the driver reduces this value.

The uuid_ctl parameter controls whether a locally assigned NAA (64 bit value) is used to identify each logical unit (LU) simulated by this driver, or if a UUID (128 bit, RFC 4122) is used. If the value is 0 (the default) a locally assigned NAA is used. If the value is 1 then a new UUID (effectively a random value) is generated for each LU. If the value is 2 then the same generated UUID is used for all LUs simulated by this driver.

The virtual_gb parameter allows the scsi_debug driver to simulate a much larger storage device than the physical RAM available in the machine. When the virtual_gb parameter is 0 (its default value), the maximum storage available is that indicated by the dev_size_mb parameter. When the virtual_gb parameter is greater than zero, that many Gibibytes (each of 2**30 bytes, which is larger than a Gigabyte) are reported by the READ CAPACITY command. Reading and writing of those Gibibytes of data wraps around within the available physical RAM (which the scsi_debug driver has allocated and is dev_size_mb Mebibytes in size). When the number of virtual Gibibytes is 2048 or greater, READ CAPACITY (16) is needed to represent the size, and READ (16) and/or WRITE (16) are needed to access data at the 2048 Gibibyte boundary and beyond. Assuming 512-byte blocks, that boundary is at LBA 2**32, one past 2**32-1, the largest LBA that a 32-bit field can hold. The "wrapping" action still allows partitions to be written with fdisk and, in many cases, a file system to be initialized. Trying to store and retrieve any useful data on such a big virtual disk would not be wise! Setting the dev_size_mb parameter to a prime number that is larger than the default value (which is 8) and that doesn't starve the machine of resources seems to help when creating ext3 file systems. This is because mkfs writes the file system super block at several offsets within the partition, and the wrap may cause the file system header to be overwritten. The virtual_gb option is designed for testing, not practical data storage.
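The arithmetic behind virtual_gb can be checked with a few hypothetical shell helpers. They assume that the reported capacity is exactly virtual_gb GiB and that an LBA wraps into the RAM store by a simple modulo of the store size in blocks; the latter is an assumption about the driver's internals.

```bash
#!/usr/bin/env bash
# virtual_blocks VIRTUAL_GB SECTOR_SIZE: logical blocks reported by READ CAPACITY
virtual_blocks() { echo $(( $1 * (1 << 30) / $2 )); }

# needs_rc16 VIRTUAL_GB SECTOR_SIZE: "yes" if the last LBA does not fit the
# READ CAPACITY (10) response (0xffffffff there means "use READ CAPACITY (16)")
needs_rc16() {
  local last_lba=$(( $(virtual_blocks "$1" "$2") - 1 ))
  if [ "$last_lba" -ge $(( 0xffffffff )) ]; then echo yes; else echo no; fi
}

# wrapped_lba LBA DEV_SIZE_MB SECTOR_SIZE: where that LBA lands in the RAM
# store, assuming a plain modulo wrap over dev_size_mb MiB
wrapped_lba() {
  local store_blocks=$(( $2 * (1 << 20) / $3 ))
  echo $(( $1 % store_blocks ))
}

needs_rc16 2047 512        # no
needs_rc16 2048 512        # yes
wrapped_lba 20000 8 512    # 3616 (an 8 MiB store holds 16384 blocks of 512 bytes)
```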

The vpd_use_hostno parameter affects the way the scsi_debug driver generates its serial numbers, SAS and naa-5 addresses. When vpd_use_hostno is set to 1 (its default value), the host number ("hostno"), target_id and lun are used to generate the serial number, SAS and naa-5 addresses. The formula is "((hostno + 1) * 2000) + (target_id * 1000) + lun". When vpd_use_hostno is set to 0, the "hostno" term in the formula is set to 0. This has the effect of making multiple simulated hosts look like they are connected to the same drives (i.e. there are only "num_tgts * max_luns" unique simulated devices). The kernel will still report "add_host * num_tgts * max_luns" devices, but higher level multipath-aware software may see the difference.
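The identifier formula can be written as a small shell function (the name lu_id is hypothetical). With vpd_use_hostno=0 the hostno term is treated as 0, which is why two hosts then produce the same identifier for the same target and lun.

```bash
#!/usr/bin/env bash
# lu_id HOSTNO TARGET_ID LUN [VPD_USE_HOSTNO]
lu_id() {
  local hostno=$1 target_id=$2 lun=$3 use_hostno=${4:-1}
  # vpd_use_hostno=0: the hostno term is set to 0
  if [ "$use_hostno" -eq 0 ]; then hostno=0; fi
  echo $(( (hostno + 1) * 2000 + target_id * 1000 + lun ))
}

lu_id 1 0 0      # 4000
lu_id 2 0 0      # 6000
lu_id 1 0 0 0    # 2000
lu_id 2 0 0 0    # 2000, the same as host 1, so both look like one drive
```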

Supported SCSI commands

Below is a list of supported commands. Some do nothing (e.g. SYNCHRONIZE CACHE). Those that have interesting functionality have notes in brackets. If the feature was introduced in a recent version (i.e. since 1.76) then that is noted. 

The implementations of the above commands are sufficient for the scsi subsystem to detect and attach devices. The fdisk, e2fsck and mount commands also work, as do the utilities found in the sg3_utils package (see the main page). Crude error processing picks up unsupported commands and attempts to read or write outside the available RAM storage area.

Modern SCSI devices use vital product page 0x83 for identification. This driver yields both "T10 vendor identification" and "NAA" descriptors. The former yields an ASCII string like "Linux   scsi_debug      4000" where the "4000" is ((host_no + 1) * 2000) + (target_id * 1000) + lun. In this case "4000" corresponds to host_no==1, target_id==0 and lun==0. The "NAA-5" descriptor is an 8 byte binary value that looks like this hex sequence: "51 23 45 60 00 00 0f a0" where the IEEE company id is 0x123456 (fake) and the vendor specific identifier in the least significant bytes is 4000 (which is fa0 in hex). [The "4000" is derived the same way for both descriptors.]
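The two descriptors can be reproduced from the identifier with the hypothetical helpers below. The NAA-5 value is assembled from the NAA nibble (5), the fake 24-bit IEEE company id 0x123456 and the identifier in the low-order bits. The T10 string pads "Linux" and "scsi_debug" to 8 and 16 characters, which is an assumption based on the example string above.

```bash
#!/usr/bin/env bash
# t10_id ID: vendor "Linux" and product "scsi_debug" padded to 8 and 16 characters
t10_id() { printf '%-8s%-16s%d\n' Linux scsi_debug "$1"; }

# naa5 ID: NAA nibble 5, fake IEEE company id 0x123456, then ID in the low bits
naa5() {
  local id=$1 company=$(( 0x123456 ))
  local v=$(( (5 << 60) | (company << 36) | id ))
  # print as 16 hex digits, then split into space separated bytes
  printf '%016x\n' "$v" | sed 's/../& /g; s/ $//'
}

t10_id 4000   # Linux   scsi_debug      4000
naa5 4000     # 51 23 45 60 00 00 0f a0
```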

Read and write commands executed by the scsi_debug driver are atomic (i.e. a write to one scsi_debug device will not interrupt (split) a read from another scsi_debug device). So a read command will yield either the contents of RAM before a coincident write, or those after the coincident write has finished. In the presence of multiple stores, the read-write locks are per store.

The START STOP UNIT (SSU) and SYNCHRONIZE CACHE (SC) commands have special longer delay processing from version 1.88 onward. For both commands, if ndelay <= 10,000 (10 microseconds) then long delays are ignored. Otherwise SSU has a delay of at least 1 second, and if delay > 1 its delay is that many seconds. For SC the longer delay is 1/20 of that for SSU (e.g. if delay=2 then SSU's delay is 2 seconds and SC's delay is 100 milliseconds).

Logical and physical block size

scsi_debug supports emulating devices with logical block sizes bigger than 512 bytes. This can be specified using the sector_size parameter.

Some storage devices use physical block sizes bigger than 512 bytes internally but expose a 512-byte logical block size to the host for compatibility reasons. The physblk_exp parameter can be used to indicate that the internal block size is 2^physblk_exp times bigger than the reported logical block size. For instance, supplying physblk_exp=3 on the command line will cause scsi_debug to simulate a device with 512-byte logical blocks and 4 KiB physical blocks.

Not all storage devices have logical block 0 aligned to a physical block boundary. These devices can be emulated using scsi_debug's lowest_aligned parameter, which indicates the lowest LBA that is aligned to a physical block boundary.
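The relationship between physblk_exp, lowest_aligned and physical alignment can be sketched as follows. The helper names are hypothetical, and the alignment test assumes that aligned LBAs are exactly those that differ from lowest_aligned by a multiple of 2**physblk_exp.

```bash
#!/usr/bin/env bash
# phys_block_size LOGICAL_BLOCK_SIZE PHYSBLK_EXP
phys_block_size() { echo $(( $1 << $2 )); }

# is_aligned LBA PHYSBLK_EXP LOWEST_ALIGNED: "yes" if LBA starts a physical block
is_aligned() {
  local lba=$1 per=$(( 1 << $2 )) lowest=$3
  if [ $(( (lba - lowest) % per )) -eq 0 ]; then echo yes; else echo no; fi
}

phys_block_size 512 3   # 4096
is_aligned 2048 3 0     # yes
is_aligned 63 3 7       # yes: 63 - 7 = 56 is a multiple of 8
is_aligned 64 3 7       # no
```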

Logical block provisioning

SBC-3 introduced logical block provisioning. That term covers both "thin provisioning" (the earlier term for this facility) and "over provisioning" as used in modern SSDs.

Thin provisioning means that devices can report a capacity that is bigger than the space actually allocated. When files are deleted, the relevant blocks can be reclaimed by the storage device and used for something else. Consequently, only blocks that are actively in use consume physical storage space.

SBC-3 specifies two different approaches for marking blocks as unused: WRITE SAME(16) with the UNMAP bit set, and the UNMAP command. scsi_debug supports both methods and they are controlled via 4 module parameters:

Examples:

 modprobe scsi_debug lbpws=1 unmap_max_desc=0 unmap_granularity=1

will simulate a device that only supports WRITE SAME(16) and which tracks usage on a per logical block basis. This is how most solid state drives work.

 modprobe scsi_debug lbpu=1 unmap_max_desc=64 unmap_granularity=2048


will simulate a device that supports UNMAP and which is provisioned in 1 MiB chunks (2048 blocks of 512 bytes). This is a common scenario for thinly provisioned storage arrays.
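The unmap_granularity value is given in logical blocks, so with 512-byte blocks a value of 2048 corresponds to 1 MiB. The hypothetical helper below shows which granule an LBA falls into; it assumes granules start at LBA 0 (i.e. no alignment offset).

```bash
#!/usr/bin/env bash
# granule_info LBA UNMAP_GRANULARITY [LOGICAL_BLOCK_SIZE]
# Assumes granules start at LBA 0 (no alignment offset).
granule_info() {
  local lba=$1 gran=$2 bs=${3:-512}
  local idx=$(( lba / gran ))
  echo "granule=$idx bytes=$(( gran * bs )) first_lba=$(( idx * gran ))"
}

granule_info 5000 2048   # granule=2 bytes=1048576 first_lba=4096
granule_info 5000 1      # granule=5000 bytes=512 first_lba=5000
```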

The current block allocation bitmap can be viewed from user space via:

 cat /sys/bus/pseudo/drivers/scsi_debug/map

Unit attentions

An important feature of the SCSI command sets is the concept of a Unit Attention (UA). This is a mechanism for the "device server" within a logical unit (e.g. a disk) to report to the originator (e.g. a user space program, a file system or the kernel) that something, not directly related to the command that was just sent, has happened. That report takes the form of the command not being done and sense data with the UNIT ATTENTION sense key being returned. Additional information about the UA is provided in the sense data and the originator is expected to take note. UAs are typically only reported once so if the initiator repeats the command it should work (or a different type of UA might be delivered).

An example might make this clearer. It is possible to change the number of logical blocks on a disk; the FORMAT UNIT command could do that. In the scsi_debug driver, even though dev_size_mb cannot be changed at run time, the virtual_gb parameter can be. If the virtual_gb parameter is changed (via sysfs, after the driver has been running), then the "Capacity data has changed" UA condition is set. The next command sent to that device will receive that UA (with some exceptions) in the returned sense data (and the command is not done). The exceptions are the INQUIRY, REPORT LUNS and REQUEST SENSE commands, which skip UA reporting (see SAM-5 for details). Once the originator sees that UNIT ATTENTION sense key, it should note the reason and repeat the command unless it is directly impacted. If the command that got "hit" by this UA was a READ or a WRITE, then the originator might want to do a READ CAPACITY command first, at least to check that the LBA given to the READ or WRITE command is still in range.
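The "note the reason and repeat" pattern can be scripted. This sketch assumes the sg3_utils convention that a utility exits with status 6 when the device reports a UNIT ATTENTION (check the EXIT STATUS section of the relevant man page); the retry_on_ua function and its limit of three repeats are hypothetical choices.

```bash
#!/usr/bin/env bash
# retry_on_ua CMD [ARGS...]
# Runs CMD; while it exits with status 6 (assumed: UNIT ATTENTION), repeats it
# up to 3 more times. Returns CMD's last exit status.
retry_on_ua() {
  local tries=0 rc
  while :; do
    "$@" && rc=0 || rc=$?
    if [ "$rc" -ne 6 ] || [ "$tries" -ge 3 ]; then
      return "$rc"
    fi
    tries=$(( tries + 1 ))
  done
}
```

For example, 'retry_on_ua sg_readcap --long /dev/sdc' re-reads the capacity after a "Capacity data has changed" UA, which is a sensible step before retrying a READ or WRITE that was hit by it.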

The scsi_debug driver reports these Unit Attentions:

If there is more than one UA, then they are reported in the ascending order of that list.

Zoned Block Devices

Version 1.89 of this driver added support for host-managed and host-aware Zoned Block Devices (i.e. disks). Host-managed devices must have one or more zones that comply with the sequential write required model. Host-aware devices must have one or more zones that comply with the sequential write preferred model. Both host-managed and host-aware devices may have one or more conventional zones and, if present, they are usually placed before (i.e. have lower LBAs than) the sequential write zones. A host-aware ZBC device is a less restrictive form of ZBC which provides backward compatibility with regular disks and, as a result, random write operations have unpredictable performance. On the other hand, a host-managed ZBC device rejects random writes to ensure fast execution of sequential writes. For both ZBC types, read command processing is similar in performance to normal disks.

All scsi_debug devices generated when this module is loaded with the "zbc=host-managed" parameter will be of that type. All scsi_debug devices generated when this module is loaded with the "zbc=host-aware" parameter will be of that type, which has a ptype value of 0x0 and that happens to be the SCSI Peripheral Device Type (pdt) of a normal disk. An application can distinguish between a normal disk and a host-aware ZBC disk by reading the ZONED field in the Block Device Characteristics VPD page (0xb1); a normal disk will have 0 in the ZONED field, a ZBC host-aware disk will have 1 in the ZONED field while a ZBC host-managed disk will have 2 in the ZONED field. Also a ZBC host-managed disk will have a pdt of 0x14 which is probably the simplest way to distinguish it.
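A classification along those lines can be sketched as below. The function is hypothetical: it takes the peripheral device type and byte 8 of the Block Device Characteristics VPD page, and assumes the ZONED field occupies bits 5 and 4 of that byte (as in SBC-4).

```bash
#!/usr/bin/env bash
# zbc_kind PDT BDC_BYTE8
zbc_kind() {
  local pdt=$(( $1 )) b8=$(( $2 ))
  # pdt 0x14: host-managed zoned block device
  if [ "$pdt" -eq $(( 0x14 )) ]; then echo host-managed; return 0; fi
  # ZONED field, assumed to be bits 5..4 of byte 8 of VPD page 0xb1
  case $(( (b8 >> 4) & 3 )) in
    0) echo "normal disk" ;;
    1) echo host-aware ;;
    2) echo host-managed ;;
    *) echo reserved ;;
  esac
}

zbc_kind 0x0 0x00    # normal disk
zbc_kind 0x0 0x10    # host-aware
zbc_kind 0x14 0x20   # host-managed
```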

The following table shows how each scsi_debug device is set up when zbc=managed is given:

dev_size_mb   nr_zones [first conventional]   zone_size   max_opens
not given     4                               32 MiB      1
64 MiB        4                               16 MiB      1
128 MiB       4                               32 MiB      1
256 MiB       4                               64 MiB      1
512 MiB       4                               128 MiB     1
1024 MiB      8                               128 MiB     3
2048 MiB      16                              128 MiB     8



A few observations from this table:

Three zone_* parameters can be given at module/driver load time, but in some cases they are overridden (e.g. max_opens). If they are not given (and "zbc=host-managed") then the driver tries to mimic what a real ZBC/ZAC disk does. One significant difference is that real ZBC disks can have sizes in the order of 20 TB and not many systems have enough ram to allow dev_size_mb=20000000.

There is an example below that exercises a scsi_debug ZBC device using a zonefs-tools test script. In that example the relevant scsi_debug driver parameters (for ZBC) are 'zbc=managed zone_size_mb=8 zone_nr_conv=3'; this leads to a device size of 128 MiB with 13 sequential write required zones in addition to the 3 conventional zones requested, for a total of 16 zones, each 8 MiB long. The maximum number of open zones is 8.
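The zone counts in that example follow from simple division. This hypothetical helper assumes the device size is an exact multiple of the zone size.

```bash
#!/usr/bin/env bash
# zone_layout DEV_SIZE_MB ZONE_SIZE_MB ZONE_NR_CONV
zone_layout() {
  local nr_zones=$(( $1 / $2 ))
  echo "zones=$nr_zones conventional=$3 seq_write_required=$(( nr_zones - $3 ))"
}

zone_layout 128 8 3   # zones=16 conventional=3 seq_write_required=13
```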

iopoll/blk_poll

This driver supports iopoll/blk_poll() in version 1.90 and later. It does this by defining the mq_poll() callback. This driver notices the REQ_HIPRI setting on an incoming request which is the block layer's way of indicating that some upper layer is going to call blk_poll() on this request until completion occurs. Usually command/request completion is signalled by a (software) interrupt from the LLD. For this LLD (i.e. the scsi_debug driver) those events are generated by kernel timers when delay (or ndelay) has expired. These kernel timers are not set up for REQ_HIPRI requests. Instead upper layers are assumed to call blk_poll() on those requests and this driver sees them as invocations of its mq_poll() entry point. And mq_poll() manually checks if delay (or ndelay) has expired, and if so carries out command/request completion.

Note that if the upper layers do not call blk_poll() after setting REQ_HIPRI on a request then the SCSI mid-level will time-out that request (typically after 60 seconds). Timed out commands are logged and appear like this:

sd 6:0:0:0: [sdf] tag#0 FAILED Result: hostbyte=DID_TIME_OUT driverbyte=DRIVER_OK cmd_age=125s

usually together with the SCSI command (in hex bytes) that failed. Excessive timeouts can lead to SCSI sub-system instability and should be avoided, if possible. When simulating a fast device (say with a latency of < 50 microseconds) poll based techniques (such as iopoll/blk_poll) have been found to perform better than event/interrupt based techniques.

There is an important difference between poll(2) and epoll(2) techniques and iopoll/blk_poll. Consider an async user space program using the SIGIO signal to indicate command/request completion: there are two stages of events for each completion. First the LLD sends a (software) interrupt to the SCSI mid-level to indicate a command/request has completed; then the block layer sends a SIGIO signal to the process that has been "armed" to receive that SIGIO signal. The poll(2) and epoll(2) system calls can be used to bypass SIGIO generation and handling, but the former (software) interrupt remains. iopoll/blk_poll is more efficient (and thus marginally faster) because it reaches down to the LLD to "short circuit" the usual two stage process.

The Linux io_uring mechanism allows requests with REQ_HIPRI to be issued and handles blk_poll() invocations until completion. The fio utility can be used with the io_uring engine and the hipri=1 setting to test this facility. The Linux scsi generic (sg) driver has a new flag: SGV4_FLAG_HIPRI that leads to its requests being issued with REQ_HIPRI set. Also the sg driver's ioctl(SG_GET_NUM_WAITING) will call blk_poll() if appropriate as will several other user space calls (e.g. the blocking ioctl(SG_IO) ).

Examples

Basic

Since scsi_debug is for testing it seems more useful to build it as a module rather than build it into the kernel. Some parameters cannot be changed once the scsi_debug driver is running. So if it is a module then it can be removed with rmmod and reloaded with another modprobe call with the desired parameters.

When the driver is loaded successfully simulated disks should be visible just like other SCSI devices:

# modprobe scsi_debug

# lsscsi -s
[0:0:0:0] disk SEAGATE ST33000650SS 0005 /dev/sda 3.00TB
[0:0:1:0] enclosu Intel RES2SV240 0d00 - -
[4:0:0:0] disk ATA ST3160812AS D /dev/sdb 160GB
[7:0:0:0] disk Linux scsi_debug 0184 /dev/sdc 8.38MB

In this case there is a 3 TB SAS disk, an ATA disk and a small scsi_debug pseudo disk. The other device (at [0:0:1:0]) is a SCSI Enclosure Services (SES) device. The /dev/sdc pseudo disk is full of zeros and has no partitions. To get a partition, the num_parts parameter could have been used on the modprobe line, or a partition could be created from the command line with the fdisk /dev/sdc command. Assuming one ext3 partition is allocated to the whole pseudo disk (8 MB in this case), the mkfs.ext3 /dev/sdc1 command can be used to make an ext3 file system. Now /dev/sdc1 can be mounted and treated like a normal file system. Naturally, when the power is turned off anything stored in /dev/sdc1 will be forgotten.
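When several disks are present it can be handy to pick out the scsi_debug ones from lsscsi output. The awk filter below is a sketch that assumes the column layout shown above (vendor "Linux", model "scsi_debug", device node in the sixth column); the function name is hypothetical, and it prints the host number together with the device node.

```bash
#!/usr/bin/env bash
# scsi_debug_disks: read lsscsi output on stdin, print "HOST DEVICE" for
# each device whose vendor is "Linux" and model is "scsi_debug"
scsi_debug_disks() {
  awk '$3 == "Linux" && $4 == "scsi_debug" {
         h = substr($1, 2)      # drop the leading "["
         sub(/:.*/, "", h)      # keep only the host number
         print h, $6
       }'
}
```

Typical use is 'lsscsi | scsi_debug_disks'; the host number it prints is the one used in the /proc/scsi/scsi_debug/ path.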

Rather than mounting the pseudo disk, the sg3_utils package could be used to carry out various tests on it.

Information about the scsi_debug driver version, its current parameters and some other data can be found in the "proc" file system. The trailing number in the path is the scsi_debug host number, which is the first element of the 4-item tuple shown by lsscsi (the output below comes from a setup where that host number is 2):

# cat /proc/scsi/scsi_debug/2
scsi_debug adapter driver, version 0189 [20200225]
num_tgts=1, shared (ram) size=1024 MB, opts=0x0, every_nth=0
delay=-9999, ndelay=100000, max_luns=10, sector_size=512 bytes
cylinders=130, heads=255, sectors=63, command aborts=0
RESETs: device=0, target=0, bus=0, host=0
dix_reads=0, dix_writes=0, dif_errors=0
usec_in_jiffy=1000, statistics=0
cmnd_count=0, completions=0, miss_cpus=0, a_tsf=0
submit_queues=1
  queue 0:
    in_use_bm BUSY: first,last bits: 0,65
this host_no=2

host list:
  0: host_no=0, si_idx=0
  1: host_no=1, si_idx=1
  2: host_no=2, si_idx=2

per_store array [most_recent_idx=2]:
  0: idx=0
  1: idx=1
  2: idx=2

Here is an important sysfs directory for the scsi_debug driver:

# cd /sys/bus/pseudo/drivers/scsi_debug/

# ls -x
adapter0     adapter1        adapter2   add_host  ato        bind        cdb_len         delay
dev_size_mb  dif             dix        dsense    every_nth  fake_rw     guard           host_lock
map          max_luns        max_queue  ndelay    no_lun_0   no_uld      num_parts       num_tgts
opts         per_host_store  ptype      random    removable  scsi_level  sector_size     statistics
strict       submit_queues   uevent     unbind    uuid_ctl   virtual_gb  vpd_use_hostno  zbc

Those files correspond to most of the scsi_debug parameters. Those that are writable can be modified, and the scsi_debug actions will change accordingly thereafter. Certain parameters cannot be changed while the driver is busy (e.g. it has queued command responses), in which case EBUSY is returned if the user attempts to change one. Reading one can be done with the cat command and changing one with the echo command:
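Since a write can fail with EBUSY while commands are queued, a script may want to retry. The hypothetical helpers below wrap cat and echo; SDEBUG_DIR defaults to the sysfs directory above and can be pointed elsewhere for testing. The shell cannot easily tell EBUSY apart from other failures, so this sketch retries on any failed write.

```bash
#!/usr/bin/env bash
SDEBUG_DIR=${SDEBUG_DIR:-/sys/bus/pseudo/drivers/scsi_debug}

# get_param NAME: print the current value of a parameter
get_param() { cat "$SDEBUG_DIR/$1"; }

# set_param NAME VALUE [TRIES]: write VALUE, retrying once per second on failure
set_param() {
  local name=$1 value=$2 tries=${3:-5}
  while :; do
    if { echo "$value" > "$SDEBUG_DIR/$name"; } 2>/dev/null; then
      return 0
    fi
    tries=$(( tries - 1 ))
    if [ "$tries" -le 0 ]; then
      echo "set_param: could not write $value to $name" >&2
      return 1
    fi
    sleep 1
  done
}
```

For example, 'set_param every_nth 2000' followed by 'get_param every_nth' mirrors the echo and cat commands above.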

# cat every_nth
0
# echo 2000 > every_nth

Another important sysfs directory for (any) disks is /sys/block/<disk_node_name> and its queue sub-directory. So in the case of this scsi_debug pseudo disk that directory would be /sys/block/sdc/queue . There is also the scsi_device sysfs directory, which has the form /sys/class/scsi_device/<h:c:t:l>/device where the <h:c:t:l> tuple is found at the left hand side of each device listed by lsscsi. This sysfs directory contains many important SCSI device parameters, some of which can be modified.

Adding and removing hosts and devices

Individual devices can be removed via sysfs and the mid-level by writing any value into the "delete" member in the sysfs directory corresponding to the scsi device. Given these devices:

# lsscsi -s
[0:0:0:0] disk SEAGATE ST200FM0073 0A04 /dev/sda 200GB
[4:0:0:0] disk ATA ST3160812AS D /dev/sdb 160GB
[7:0:0:0] disk Linux scsi_debug 0184 /dev/sdc 21.4GB

then the scsi_debug (pseudo) disk can be deleted like this:

# echo 1 > /sys/class/scsi_device/7:0:0:0/device/delete

After which this should be seen:

# lsscsi -s
[0:0:0:0] disk SEAGATE ST200FM0073 0A04 /dev/sda 200GB
[4:0:0:0] disk ATA ST3160812AS D /dev/sdb 160GB

This will work for any scsi device (not just those belonging to scsi_debug). That scsi device can be re-added with the following command:

# echo "0 0 0" > /sys/class/scsi_host/host7/scan

# lsscsi
[0:0:0:0] disk SEAGATE ST200FM0073 0A04 /dev/sda 
[4:0:0:0] disk ATA ST3160812AS D /dev/sdb 
[7:0:0:0] disk Linux scsi_debug 0184 /dev/sdc

The three numbers in the "echo" are channel number, target number and lun, respectively. Wildcards (hyphen: "-") can be given for any or all of the three numbers.

# echo 3 > /sys/bus/pseudo/drivers/scsi_debug/max_luns
# echo 2 > /sys/bus/pseudo/drivers/scsi_debug/num_tgts
# echo "0 - -" > /sys/class/scsi_host/host7/scan

# lsscsi
[0:0:0:0] disk SEAGATE ST200FM0073 0A04 /dev/sda 
[4:0:0:0] disk ATA ST3160812AS D /dev/sdb 
[7:0:0:0] disk Linux scsi_debug 0184 /dev/sdc 
[7:0:0:1] disk Linux scsi_debug 0184 /dev/sdd 
[7:0:0:2] disk Linux scsi_debug 0184 /dev/sde 
[7:0:1:0] disk Linux scsi_debug 0184 /dev/sdf 
[7:0:1:1] disk Linux scsi_debug 0184 /dev/sdg 
[7:0:1:2] disk Linux scsi_debug 0184 /dev/sdh

The 'echo "0 - -" > scan' line above added five devices: /dev/sdd to /dev/sdh .


Extra hosts can be added and removed from the scsi_debug driver as follows:

# cd /sys/bus/pseudo/drivers/scsi_debug
# echo 1 > add_host  # add a new host (after the existing hosts)
# echo -2 > add_host # remove the last two hosts (if at least that many are present)

The scsi_debug driver does not have any limits on the number of scsi devices it can create. By default, when loaded, it has one scsi device (owned by a host). Larger numbers of devices can be introduced at load time by specifying the add_host, num_tgts and/or max_luns parameters; the number of scsi devices created is the product of those 3 parameters (they all default to 1). Alternatively, sysfs can be used to add (or remove) scsi devices after the scsi_debug driver is loaded. Two strategies can be used:

Even though scsi_debug can create ten thousand or more devices, it doesn't mean that the scsi mid-level, sd, sg, the block layer and various other kernel components will handle that many gracefully.

Mode pages

The supported mode pages are listed following the MODE SENSE entry in the supported commands section above. Prior to version 1.80, when a mode page is read no block descriptor is included in the response. From version 1.78 the MODE SELECT command is supported. Three mode pages can be modified:

The saved pages are not supported, reflecting that the scsi_debug driver has only volatile storage. All fields can be changed, but only those fields indicated above have side effects.

per_host_store

Various users have asked for each scsi_debug device (i.e. "Logical Unit" (LU) in SCSI parlance) to have its own ram or backing store rather than all devices sharing the same ram. The answer has been: "look at tcm_loop" because this driver has been built to simulate thousands of hosts, targets and devices (LUs) without consuming the sort of resources that would usually imply.

As a compromise the per_host_store parameter has been added. It can be used during the driver/module invocation or via sysfs, where it is read-writable. It is boolean and defaults to 0 (false). Setting it to 1 (true) has no immediate effect, but any following add_host write (given a positive number) will create one or more new hosts, each with its own "store". A "store" in this case includes its own ramdisk, logical block provisioning map and protection information (the latter two are optional). Each store has its own read-write lock protecting access to it. Note that if a host has more than one target and/or more than one device (i.e. logical unit or LU), then they will all share the same store. So the granularity of the store is at the SCSI host level.

The number of stores and the number of (scsi_debug) hosts are not necessarily the same. Any host added (by writing to add_host in sysfs) when per_host_store=0 will share the most recently added store. So if per_host_store has been 0 ever since the driver/module was loaded, then all added hosts will share the one (and only) store, which is the same situation that occurred before version 1.89 .

When sysfs is used to remove a host that has its own store (e.g. with 'echo -1 > add_host'), the host (usually the last added) will be removed but its store will not. Even if that store is orphaned (i.e. no other host is using it), it will continue to consume RAM on the machine. If a new host is subsequently added (when per_host_store=1), it will most likely be given that orphaned store (marked "not in use" in the output of 'cat /proc/scsi/scsi_debug/<host_no>'), and that store will then be marked as in use. Keeping stores around like that is a programmer's way of guarding against "use after remove" errors, which are otherwise very difficult to prevent. The downside is that the amount of RAM this driver consumes will not shrink.

Since the scsi_debug driver may be built in and causing the machine grief due to its RAM consumption, a mechanism has been added to shrink the number of stores back to one (since the only other option might be to reboot the machine). If the fake_rw parameter is toggled (i.e. first written to 1 and then written to 0), the number of stores is shrunk to one, and all hosts (i.e. the same number of hosts present before the toggle) will share the one remaining store.

VERIFY

Having more than one store allows more thorough testing of copy operations. If the source and destination of a copy are on different stores owned by this driver then, just like with a "real" copy, the accuracy of the copy can be checked by confirming that the source and destination are indeed the same after the copy. The normal way to check that would be to read the source and destination into user space buffers, then compare those two buffers. There is another way: read from one, send that data to the other, and ask the device to do the compare operation. NVMe calls its command for that "Compare"; in SCSI the naming is a bit more obscure: VERIFY(BYTCHK=1). BYTCHK is a two-bit field that, when set to 1, effectively does a compare operation. It can do one other trick: when BYTCHK=3 (which is implemented in version 1.89) it takes a single logical block and compares it with each block in the LU in the VERIFY range (i.e. the (starting) LBA and number of blocks). That would allow an LU to be checked for being all zeros, for example, without having to send Terabytes of zeros to the LU.

The SCSI VERIFY(BYTCHK=1) command in version 1.89 of this driver returns GOOD status if the comparison succeeds. If that comparison fails, it stops at that point and returns a CHECK CONDITION status, with a sense key of MISCOMPARE.

The following first loads the scsi_debug module with 4 hosts, each with one LU. Each host (and therefore each LU) has its own 1 GiB store. The logical block size defaults to 512 bytes, and ndelay=100000 random=1 means that the delay on each media command is uniformly distributed between 0 and 100 microseconds. dd is not very smart (it tries to write random data past the end of /dev/sdc) but it does the job: filling /dev/sdc (/dev/sg2) with pseudo-random data. Then the data is copied from /dev/sg2 (aka /dev/sdc) to /dev/sg3 . Finally the --verify option of the sg_dd utility (found in version 1.45 and later of the sg3_utils package) compares /dev/sg2 and /dev/sg3 using the SCSI VERIFY(BYTCHK=1) command.

# modprobe scsi_debug dev_size_mb=1024 ndelay=100000 random=1 per_host_store=1 add_host=4
# lsscsi -gs
[0:0:0:0]    disk    Linux    scsi_debug       0189  /dev/sda   /dev/sg0   1.07GB
[1:0:0:0]    disk    Linux    scsi_debug       0189  /dev/sdb   /dev/sg1   1.07GB
[2:0:0:0]    disk    Linux    scsi_debug       0189  /dev/sdc   /dev/sg2   1.07GB
[3:0:0:0]    disk    Linux    scsi_debug       0189  /dev/sdd   /dev/sg3   1.07GB
[N:0:1:1]    disk    INTEL SSDPEKKF256G7L__1                    /dev/nvme0n1  -           256GB

# dd if=/dev/urandom of=/dev/sdc
dd: writing to '/dev/sdc': No space left on device
2097153+0 records in
2097152+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 49.4036 s, 21.7 MB/s

# sg_dd if=/dev/sg2 of=/dev/sg3 bs=512
2097152+0 records in
2097152+0 records out

# sg_dd --verify if=/dev/sg2 of=/dev/sg3 bs=512
2097152+0 records in
2097152+0 records verified
#
# # force a compare failure:
# dd if=/dev/zero seek=1234 bs=512 of=/dev/sdc count=1
1+0 records in
1+0 records out
512 bytes copied, 0.00119379 s, 429 kB/s

# sg_dd --verify if=/dev/sg2 of=/dev/sg3 bs=512
verifying: SCSI status: Check Condition 
Fixed format, current; Sense key: Miscompare
Additional sense: Miscompare during verify operation
sg_write failed, seek=1152
Some error occurred,  remaining block count=2096000
1280+0 records in
1152+0 records verified
1 unrecovered error(s)

#
# sg_dd --verify if=/dev/sg2 of=/dev/sg3 bs=512 bpt=1
verifying: SCSI status: Check Condition 
Fixed format, current; Sense key: Miscompare
Additional sense: Miscompare during verify operation
sg_write failed, seek=1234
Some error occurred,  remaining block count=2095918
1235+0 records in
1234+0 records verified
1 unrecovered error(s)
#

To demonstrate that the compare/verify is working, the second half of the above example writes a block of zeros at LBA 1234 in /dev/sdc and then repeats the sg_dd --verify command. This time the verify fails at some point after block 1152. This error report is a bit "coarse" because the "blocks per transfer" parameter (bpt) defaults to 128 (when the logical block size is 512 bytes), so the miscompare is only located to within the 128 block chunk starting at LBA 1152. Repeating sg_dd --verify with bpt=1 gives a more finely tuned report: 1234 LBs were verified (i.e. blocks 0 to 1233 inclusive) before the verification failed. In the sg_dd utility the "unrecovered" error shown above has since been changed to a "miscompare" error. That should better convey that the error is logical in nature, rather than a hardware problem with the media.
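The record counts printed above can be reproduced with a little arithmetic. This is a minimal sketch, assuming (not taken from the sg_dd source) that sg_dd reads bpt blocks from if= and sends them to of= in one VERIFY(BYTCHK=1) command, so a miscompare anywhere in a chunk fails that whole chunk. The helper names are hypothetical.

```python
# Sketch: model how sg_dd --verify reports a single bad LBA.
# Assumption: verification proceeds in chunks of `bpt` logical blocks and the
# first chunk containing a miscompare stops the run.

def capacity_in_blocks(dev_size_mb: int, block_size: int = 512) -> int:
    """Number of logical blocks in a store of dev_size_mb MiB."""
    return dev_size_mb * 1024 * 1024 // block_size

def verify_report(total_blocks: int, bad_lba: int, bpt: int):
    """Return (records_in, records_verified, remaining) for one bad LBA."""
    chunk_start = (bad_lba // bpt) * bpt                # start of failing chunk
    records_in = min(chunk_start + bpt, total_blocks)   # failing chunk was read
    records_verified = chunk_start                      # earlier chunks passed
    remaining = total_blocks - chunk_start
    return records_in, records_verified, remaining

total = capacity_in_blocks(1024)          # dev_size_mb=1024 from the modprobe line
print(total)                              # 2097152
print(verify_report(total, 1234, 128))    # (1280, 1152, 2096000)
print(verify_report(total, 1234, 1))      # (1235, 1234, 2095918)
```

Both tuples match the "records in", "records verified" and "remaining block count" lines in the transcript, which is consistent with the chunking assumption.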

Aside: the SCSI VERIFY command with the BYTCHK field set to 3 allows a relatively fast check of whether a disk (or a portion of it) is all zeros or all 0xff bytes. In this configuration the application client sends a single logical block in the data-out buffer, and the LU's device server (in the target) compares that logical block with each LB on the disk, starting at the given LBA, until the given "verification length" (in LBs) is reached or until a comparison fails, whichever comes first. From version 1.89 the scsi_debug driver also supports this configuration.
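The BYTCHK=3 comparison described above can be modelled on an in-memory "disk". This is an illustrative sketch of the semantics only, not the scsi_debug implementation; the function name and its return convention are hypothetical.

```python
# Sketch of VERIFY with BYTCHK=3: one data-out block is compared against
# every block in [lba, lba + vlen). Returns None if all match, else the
# first LBA that miscompares (where a real device server would report a
# MISCOMPARE sense key).
from typing import Optional

def verify_bytchk3(disk: bytes, lba: int, vlen: int, block: bytes,
                   block_size: int = 512) -> Optional[int]:
    if len(block) != block_size:
        raise ValueError("data-out buffer must hold exactly one logical block")
    for i in range(vlen):
        start = (lba + i) * block_size
        if disk[start:start + block_size] != block:
            return lba + i
    return None

bs = 512
disk = bytearray(16 * bs)                   # tiny 16 block "disk", all zeros
zeros = bytes(bs)
print(verify_bytchk3(disk, 0, 16, zeros))   # None: every block is zero
disk[5 * bs] = 0xff                         # corrupt one byte in LBA 5
print(verify_bytchk3(disk, 0, 16, zeros))   # 5
print(verify_bytchk3(disk, 6, 10, zeros))   # None: range starts after LBA 5
```

Only one block crosses the transport regardless of the verification length, which is why this form is relatively fast for "is it all zeros?" checks.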

The following is a variant of the same verification check using a single scsi_debug "disk". Say it is 1 GiB in size, which is 0x200000 512 byte blocks. Split the copy and verify in two by LBA range: urandom fills the whole disk, the bottom half is then copied onto the top half, and finally the bottom half is verified against the top half.

# ddpt if=/dev/urandom of=/dev/sg0
Assume logical block size of 512 bytes for both input and output
2097152+0 records in
2097152+0 records out
time to transfer data: 10.970504 secs at 97.88 MB/sec

# sgh_dd if=/dev/sg0 bs=512 of=/dev/sg0 seek=0x100000 count=0x100000
time to transfer data was 0.511428 secs, 1049.75 MB/sec
1048576+0 records in
1048576+0 records out

# sgh_dd --verify if=/dev/sg0 bs=512 of=/dev/sg0 seek=0x100000 count=0x100000
Doing verify/cmp rather than copy
time to transfer data was 1.018330 secs, 527.21 MB/sec
1048576+0 records in
1048576+0 records verified

This also demonstrates that request sharing works (since sgh_dd uses request sharing by default) when both file descriptors (i.e. the one associated with if=/dev/sg0 and the other associated with of=/dev/sg0) are opened on the same sg device.

ZBC test scripts

The following test script is found in the libzbc package, which may need to be built before running it. The latest source can be found at:

# git clone https://github.com/hgst/libzbc.git

The build sequence should be: 'cd <libzbc_root_directory> ; ./autogen.sh ; ./configure --with-test ; make ; make install' with the 'make install' probably needing root permissions. To run the test, starting from the same directory:

# modprobe scsi_debug max_luns=1 sector_size=512 zbc=managed zone_size_mb=8 zone_nr_conv=3
# lsscsi -g
< to see where the ZBC SCSI Generic device has "landed", say /dev/sg1 >
# cd test
# ./zbc_test.sh /dev/sg1
Executing section 00 - command completion tests...
    00.010: REPORT_ZONES command completion...                                            [Passed]
    00.011: REPORT_ZONES (partial bit) command completion...                              [Passed]
    00.012: REPORT_ZONES (reporting option 0x10) command completion...                    [Passed]
    00.013: REPORT_ZONES (reporting option 0x11) command completion...                    [Passed]
    00.014: REPORT_ZONES (reporting option 0x3F) command completion...                    [Passed]
    00.020: OPEN_ZONE command completion...                                               [Passed]
...
    02.073: WRITE implicit open to full...                                                [Passed]
    02.074: WRITE closed to implicit open...                                              [Passed]
    02.075: WRITE closed to full...                                                       [Passed]
    02.076: WRITE explicit open to explicit open...                                       [Passed]
    02.077: WRITE explicit open to full...                                                [Passed]
    02.078: WRITE full to full...                                                         [Passed]

All tests should show "Passed" (although some may show "N/A", meaning not applicable).

Another test relies on the zonefs file system, which was introduced in lk 5.5. It is in the zonefs-tools package, which may need to be built. The latest source can be found at:

# git clone https://github.com/damien-lemoal/zonefs-tools.git

and may rely on other packages being built (e.g. build with './autogen.sh ; ./configure ; make ; make install'). The fio package should also be installed (if it is not, several tests will be skipped). This test runs on a normal SCSI block device (e.g. /dev/sdb) rather than a SCSI Generic device (e.g. /dev/sg1) as used by the test above. To run the zonefs test:

# modprobe scsi_debug max_luns=1 sector_size=512 zbc=managed zone_size_mb=8 zone_nr_conv=3
# lsscsi -s
< to see where the ZBC device has "landed", say /dev/sdc >
# cd <dir_with_zonefs-tools>/tests
# ./zonefs-tests.sh /dev/sdc
zonefs-tests on /dev/sdc:
   16 zones (3 conventional zones, 13 sequential zones)
   16384 512B sectors per zone (8 MiB)
Running tests
  Test 0010:  mkzonefs (options)                                   ... PASS
  Test 0011:  mkzonefs (force format)                              ... PASS
....
  Test 0087:  Sequential file append (async)                       ... PASS
  Test 0088:  Sequential file random read                          ... PASS
  Test 0089:  Sequential file mmap read/write                      ... PASS

44 / 44 tests passed
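The zone layout that zonefs-tests reports follows from the module parameters. A minimal sketch, assuming a device capacity of 128 MiB: dev_size_mb was not given on the modprobe line, and 128 MiB is what "16 zones" of 8 MiB implies (presumably a ZBC-specific default in the driver). It also assumes the capacity is an exact multiple of the zone size; the helper name is hypothetical.

```python
# Sketch: derive zone counts and zone size in sectors from scsi_debug-style
# parameters (zone_size_mb, zone_nr_conv, sector_size) plus a capacity.

def zone_layout(dev_size_mb: int, zone_size_mb: int, zone_nr_conv: int,
                sector_size: int = 512):
    nr_zones = dev_size_mb // zone_size_mb          # assumes exact multiple
    sectors_per_zone = zone_size_mb * 1024 * 1024 // sector_size
    return {
        "zones": nr_zones,
        "conventional": zone_nr_conv,               # first zones are conventional
        "sequential": nr_zones - zone_nr_conv,      # the rest are sequential
        "sectors_per_zone": sectors_per_zone,
    }

print(zone_layout(dev_size_mb=128, zone_size_mb=8, zone_nr_conv=3))
# {'zones': 16, 'conventional': 3, 'sequential': 13, 'sectors_per_zone': 16384}
```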

The author found that with Ubuntu 18.04 LTS the mount command was too old (from util-linux 2.31.1), but all was well on Debian 10.3 and Ubuntu 19.10. To make the test more thorough, either 'random=1' could be added to the scsi_debug load line or sysfs could be used to change that parameter like this:

# echo 1 > /sys/bus/pseudo/drivers/scsi_debug/random

In the absence of either the delay or ndelay parameter being set, the scsi_debug driver delays each response to a media access SCSI command (e.g. READs and WRITEs) by 1 millisecond. With 'random=1' set, each IO access selects its delay from a uniform distribution between 0 and 1 millisecond. For asynchronously run commands, 'random=1' will lead to out-of-order completions. For example if commands A, B and C are submitted in that order, they may well complete in the order A, C, B.
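The reordering effect of 'random=1' can be seen in a toy model where each command's delay is drawn uniformly from 0 up to the configured maximum. This is an illustration under simplifying assumptions (all three commands submitted at t=0), not the driver's implementation; completion_order() is a hypothetical helper. With ndelay=100000 from the earlier example the maximum would be 100_000 ns instead.

```python
# Toy model of delay/ndelay with random=1: per-command delay is uniform in
# [0, max_delay_ns], so completion order can differ from submission order.
import random

def completion_order(cmds, max_delay_ns, rng):
    """Submit cmds back to back at t=0; return them in completion order."""
    finish = {c: rng.uniform(0, max_delay_ns) for c in cmds}
    return sorted(cmds, key=lambda c: finish[c])

rng = random.Random(42)        # fixed seed so runs are repeatable
default_max = 1_000_000        # 1 ms default delay, in nanoseconds
reordered = 0
for _ in range(1000):
    if completion_order(["A", "B", "C"], default_max, rng) != ["A", "B", "C"]:
        reordered += 1
print(f"{reordered} of 1000 A,B,C submissions completed out of order")
```

Only one of the six possible orders matches submission order, so roughly five in six runs complete out of order in this model.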

REPORT LUNS Well Known LU

There are two techniques for discovering the luns that a SCSI target supports. The first (and oldest) is based on sending commands like INQUIRY and REPORT LUNS to lun 0 (i.e. the LUN whose value is eight bytes of zero), even if the target has no lun 0. The second technique is based on one of the so-called "well known logical units", specifically the REPORT LUNS well known logical unit. If present, it must support the INQUIRY, REPORT LUNS, REQUEST SENSE and TEST UNIT READY commands. Simulating one with scsi_debug is somewhat contorted:

# modprobe scsi_debug no_lun_0=1 max_luns=2
#
# lsscsi -g
[0:0:0:0] disk ATA INTEL SSDSC2BW18 DC32 /dev/sda /dev/sg0 
[3:0:0:1] disk Linux scsi_debug 0184 /dev/sdb /dev/sg1
#
# lsscsi --hosts
[0] ahci 
[1] ahci 
[2] ahci 
[3] scsi_debug
#

# ## Pick the host number corresponding to scsi_debug (i.e. "3")
# ## The host number is also the first number in the tuple (i.e. "3" in [3:0:0:1]).
#
# cd /sys/class/scsi_host/host3
# echo "- - 49409" > scan
#
# lsscsi -g
[0:0:0:0] disk ATA INTEL SSDSC2BW18 DC32 /dev/sda /dev/sg0 
[3:0:0:1] disk Linux scsi_debug 0184 /dev/sdb /dev/sg1 
[3:0:0:49409]wlun Linux scsi_debug 0184 - /dev/sg2

The scsi_debug driver needed to be told that it had no lun 0 (no_lun_0=1) so that it started generating luns at 1 ([3:0:0:1]); then the SCSI subsystem needed to be told to scan specifically for lun 49409 (0xc101). Thereafter the REPORT LUNS wlun appeared.
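Why 49409? The sketch below packs the Linux LUN integer into the 8-byte SCSI LUN the same way the kernel's int_to_scsilun() does (two bytes per addressing level, lowest level first), then decodes the first level. The field interpretation (extended addressing, well known LU method, W-LUN 01h for REPORT LUNS) follows the SCSI architecture model as understood here; the variable names are illustrative.

```python
# Sketch: map the Linux LUN integer 49409 (0xc101) to its 8-byte wire form
# and decode the first addressing level.

def int_to_scsilun(lun: int) -> bytes:
    out = bytearray(8)
    for i in range(0, 8, 2):          # two bytes per level, lowest level first
        out[i] = (lun >> 8) & 0xff
        out[i + 1] = lun & 0xff
        lun >>= 16
    return bytes(out)

lun = int_to_scsilun(49409)
print(lun.hex())                      # c101000000000000
addr_method = lun[0] >> 6             # 0b11: extended logical unit addressing
elen = (lun[0] >> 4) & 0x3            # 0: no additional address bytes
ext_method = lun[0] & 0x0f            # 0x1: well known logical unit
w_lun = lun[1]                        # 0x01: REPORT LUNS well known LU
print(addr_method, elen, ext_method, w_lun)   # 3 0 1 1
```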

The way a SCSI initiator (host) scans for targets is transport specific. In the case of the scsi_debug driver it has a magic transport (bus) called "pseudo" which does the right thing. Apart from target discovery, the scsi_debug driver tries to simulate SAS devices, see the next section.

SAS personality

The scsi_debug driver has a Serial Attached SCSI (SAS) personality. For any application that cares, it looks like a dual ported SAS disk accessed via the primary port (relative target port 1). In one case it masquerades as a SATA disk behind a SCSI to ATA Translation (SAT) layer (SATL). Many of the settings are in common with Fibre Channel dual ported disks.

The driver sets the MULTIP (multiport) bit in the INQUIRY response. The following VPD pages are SAS or SAT specific:

The naa-5 addresses are meant to be world wide unique names, which presents a challenge to the scsi_debug driver. Amongst other things, Linux does not have an IEEE company id [memo: OSDL]. Even if it did, making the names truly unique in a virtual driver, especially if multiple boxes could somehow see each other, would be difficult.

There are also several SAS specific mode pages:

Both the VPD and mode pages can be viewed from user space with a utility like sdparm. Below is an example of the device identification VPD page:

# sdparm -i /dev/sda
 /dev/sda: Linux scsi_debug 0004
Device identification VPD page:
 Addressed logical unit:
 desig_type: T10 vendor identification, code_set: ASCII
 vendor id: Linux
 vendor specific: scsi_debug 2000
 desig_type: NAA, code_set: Binary
 0x53333330000007d0
 Target port:
 desig_type: Relative target port, code_set: Binary
 transport: Serial Attached SCSI (SAS)
 Relative target port: 0x1
 desig_type: NAA, code_set: Binary
 transport: Serial Attached SCSI (SAS)
 0x52222220000007ce
 Target device that contains addressed lu:
 desig_type: NAA, code_set: Binary
 transport: Serial Attached SCSI (SAS)
 0x52222220000007cd
 desig_type: SCSI name string, code_set: UTF-8
 transport: Serial Attached SCSI (SAS)
 SCSI name string:
 naa.52222220000007CD

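The NAA designators in that output use the NAA 5 (IEEE Registered) format: a 4-bit NAA field, a 24-bit IEEE company id and a 36-bit vendor specific identifier. A minimal decoding sketch (decode_naa5() is a hypothetical helper):

```python
# Sketch: split NAA-5 designators into NAA, company id and vendor specific id.

def decode_naa5(designator: int):
    naa = designator >> 60                          # top 4 bits
    company_id = (designator >> 36) & 0xffffff      # next 24 bits
    vendor_specific = designator & ((1 << 36) - 1)  # low 36 bits
    return naa, company_id, vendor_specific

for name, value in [("logical unit", 0x53333330000007d0),
                    ("target port", 0x52222220000007ce),
                    ("target device", 0x52222220000007cd)]:
    naa, cid, vsid = decode_naa5(value)
    print(f"{name:14} NAA={naa} company_id=0x{cid:06x} vendor_specific={vsid}")
```

The logical unit's vendor specific field decodes to 2000, which appears to match the "scsi_debug 2000" T10 vendor identification line, and the repeated-digit company ids (0x333333, 0x222222) look like placeholders, consistent with the lack of a real IEEE company id noted above.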
Below is an example of the SCSI ports VPD page showing a dual ported target:

# sdparm -i -p sp /dev/sda
 /dev/sda: Linux scsi_debug 0004
SCSI Ports VPD page:
Relative port=1
 Target port descriptor(s):
 desig_type: NAA, code_set: Binary
 transport: Serial Attached SCSI (SAS)
 0x52222220000007ce
Relative port=2
 Target port descriptor(s):
 desig_type: NAA, code_set: Binary
 transport: Serial Attached SCSI (SAS)
 0x52222220000007cf

Notice that the relative target port (0x1) and target port SAS address (0x52222220000007ce) in the device identification VPD page imply that the INQUIRY was sent via port 1 (port A) of the emulated SAS dual ported target. The protocol specific port phy control and discover mode subpage [0x19,0x1] has target port/phy SAS addresses that correspond to those in the SCSI ports VPD page:

# sdparm -t sas -p pcd -l /dev/sda
 /dev/sda: Linux scsi_debug 0004
 Direct access device specific parameters: WP=0 DPOFUA=0
port: phy control and discover (SAS) mode page:
 PPID_1 6 [cha: n, def: 6] Port's (transport) protocol identifier
 NOP 2 [cha: n, def: 2] Number of phys
 PHID 0 [cha: n, def: 0] Phy identifier
 ADT 1 [cha: n, def: 1] Attached device type
 NPLR 9 [cha: n, def: 9] Negotiated physical link rate
 ASIP 1 [cha: n, def: 1] Attached SSP initiator port
 ATIP 0 [cha: n, def: 0] Attached STP initiator port
 AMIP 0 [cha: n, def: 0] Attached SMP initiator port
 ASTP 0 [cha: n, def: 0] Attached SSP target port
 ATTP 0 [cha: n, def: 0] Attached STP target port
 AMTP 0 [cha: n, def: 0] Attached SMP target port
 SASA 0x52222220000007ce [cha: n, def:0x52222220000007ce] SAS address
 ASASA 0x5111111000000001 [cha: n, def:0x5111111000000001] Attached SAS address
 APHID 2 [cha: n, def: 2] Attached phy identifier
 PMILR 8 [cha: n, def: 8] Programmed minimum link rate
 HMILR 8 [cha: n, def: 8] Hardware minimum link rate
 PMALR 9 [cha: n, def: 9] Programmed maximum link rate
 HMALR 9 [cha: n, def: 9] Hardware maximum link rate
 2_PHID 1 [cha: n, def: 1] Phy identifier
 2_ADT 1 [cha: n, def: 1] Attached device type
 2_NPLR 9 [cha: n, def: 9] Negotiated physical link rate
 2_ASIP 1 [cha: n, def: 1] Attached SSP initiator port
 2_ATIP 0 [cha: n, def: 0] Attached STP initiator port
 2_AMIP 0 [cha: n, def: 0] Attached SMP initiator port
 2_ASTP 0 [cha: n, def: 0] Attached SSP target port
 2_ATTP 0 [cha: n, def: 0] Attached STP target port
 2_AMTP 0 [cha: n, def: 0] Attached SMP target port
 2_SASA 0x52222220000007cf [cha: n, def:0x52222220000007cf] SAS address
 2_ASASA 0x5111111000000001 [cha: n, def:0x5111111000000001] Attached SAS address
 2_APHID 3 [cha: n, def: 3] Attached phy identifier
 2_PMILR 8 [cha: n, def: 8] Programmed minimum link rate
 2_HMILR 8 [cha: n, def: 8] Hardware minimum link rate
 2_PMALR 9 [cha: n, def: 9] Programmed maximum link rate
 2_HMALR 9 [cha: n, def: 9] Hardware maximum link rate

Other supported mode pages can be accessed in a similar way by the sdparm utility. Note that transport specific mode pages need the transport identified: hence the '-t sas' option above.
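The link rate fields in the phy control and discover subpage are coded values. The sketch below decodes them assuming the usual SAS encoding (8h = 1.5 Gbit/s, 9h = 3 Gbit/s, Ah = 6 Gbit/s, Bh = 12 Gbit/s); that table is an assumption to check against the SAS standards, and link_rate() is a hypothetical helper.

```python
# Sketch: decode SAS link rate codes from the sdparm output above.
# Assumed encoding; codes below 8h carry status meanings, not rates.
SAS_LINK_RATES = {
    0x8: "1.5 Gbit/s",
    0x9: "3 Gbit/s",
    0xA: "6 Gbit/s",
    0xB: "12 Gbit/s",
}

def link_rate(code: int) -> str:
    return SAS_LINK_RATES.get(code, f"not a rate code (0x{code:x})")

phy = {"NPLR": 9, "PMILR": 8, "HMILR": 8, "PMALR": 9, "HMALR": 9}
for field, code in phy.items():
    print(f"{field:6} {code:2} -> {link_rate(code)}")
```

Under that assumption, each emulated phy reports a negotiated rate of 3 Gbit/s within a 1.5 to 3 Gbit/s range.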

Downloads

There is nothing to download; see <linux_kernel_source>/drivers/scsi/scsi_debug.c .

Conclusion

Hopefully the design of the scsi_debug driver lends itself to many extensions. If you think that you have a useful extension that others may be interested in, please contact the linux-scsi list or the author with a patch.



Douglas Gilbert <dgilbert at interlog dot com>
with additions from
Martin K. Petersen <martin dot petersen at oracle dot com>

Last updated: 16th April 2021 12:00 -0400