Scsi_debug adapter driver for Linux

Introduction

The scsi_debug adapter driver simulates a variable number of SCSI disks, each sharing a common amount of RAM allocated by the driver to act as (volatile) storage. With one SCSI disk simulated, the scsi_debug driver is functionally equivalent to a RAM disk. When multiple SCSI disks are simulated, they could be viewed as multiple paths to the same storage device or simply as separate devices. The driver can also be used to simulate very large disks, 2 terabytes or more in size, by "wrapping" its data access within the available RAM.

A small but hopefully useful set of SCSI commands is supported along with some crude error checking. The number of simulated devices and the shared RAM size for storage can be given as module parameters or boot time parameters if the scsi_debug driver is built into the kernel. The number of simulated devices (and hosts) can be varied at run time via sysfs. Various error conditions can be optionally generated to test the reaction of upper levels of the kernel and applications to abnormal situations.

This page describes this driver as found in the Linux kernel version 3.17.0; earlier versions of this driver worked with the Linux kernel 2.6 series. For information about the scsi_debug driver found in the lk 2.4 production series see this page.

Parameters

The parameter name given in the table below is the module parameter name and the sysfs file name. If the scsi_debug driver is built into the kernel (not recommended), the boot time parameter has "scsi_debug." prepended to it. Hence the boot time parameter corresponding to add_host=2 is scsi_debug.add_host=2.

When the scsi_debug module is loaded, many parameters can be given on the command line, separated by spaces: for example to simulate 140 disks "modprobe scsi_debug max_luns=2 num_tgts=7 add_host=10" could be used. This will generate 140 devices: 10 hosts, each with 7 targets, each with 2 logical units.
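The naming rule for boot time parameters can be sketched in shell. The helper name to_boot_params is hypothetical (it is not part of the driver or any package); it only prepends "scsi_debug." to each name=value pair, as described above, and prints the result without touching the kernel.

```shell
# Hypothetical helper: convert module-style parameters into the
# boot time form by prepending "scsi_debug." to each name=value pair.
to_boot_params() {
    out=""
    for kv in "$@"; do
        out="${out:+$out }scsi_debug.$kv"
    done
    printf '%s\n' "$out"
}

to_boot_params max_luns=2 num_tgts=7 add_host=10
# prints: scsi_debug.max_luns=2 scsi_debug.num_tgts=7 scsi_debug.add_host=10
```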

Sysfs parameters can be read with the cat command and written with the echo command. Examples:
# cd /sys/bus/pseudo/drivers/scsi_debug
# cat dev_size_mb
8
# echo 1 > add_host
These parameters are also found in the /sys/module/scsi_debug/parameters directory; however even if a write operation is permitted (by sysfs) the scsi_debug driver takes no account of the change.

Here is a list of scsi_debug specific driver parameters:

Parameter name	default value	sysfs access	sysfs write effect	new in version	notes
add_host	1	read-write	immediate		can add or remove hosts at runtime
ato	1	read only	-	1.81	application tag ownership (0 -> disk, 1 -> host)
clustering	0			1.84	enable large transfers
delay	1	read-write	next command		IO command response delay: units are jiffies (configurable: 1 to 10 ms) . 0: no delay, all in one thread; -1: use "hi" tasklet; -2: use normal tasklet
dev_size_mb	8	read only	-		units are Mebibytes (2**20 bytes)
dif	0	read-only	-	1.81	data integrity field type [T10: protection type]
dix	0	read-only	-	1.81	data integrity extension mask; check integrity when non zero
dsense	0	read-write	immediate	1.81	0 -> fixed; 1-> descriptor sense format
every_nth	0	read-write	n commands from now		for error injection: 0 -> don't do error injection
fake_rw	0	read-write	next command	1.80	when set does no processing when a READ or WRITE command (of any cdb size) is received. When fake_rw=1 no ram is allocated.
guard	0	read-only	-	1.81	protection checksum: 0 -> crc; 1 -> ip
host_lock	0	read-write	next command	1.84	when set wraps each submitted command in a host_lock, which is detrimental in a multi-queue system
lbpu	0	read-only	-	2012	LB provisioning: support UNMAP
lbpws	0	read-only	-	2012	LB provisioning: support WRITE SAME(16) and UNMAP
lbpws10	0	read-only	-	2012	LB provisioning: support WRITE SAME(10)
lbprz	1	read-only	-	2012	LB provisioning: returns 0s when reading unmapped block
lowest_aligned	0	read-only	-	1.81	RCAP_16's lowest aligned logical block address (max: 0x3fff)
max_luns	1	read-write	next positive add_host or scan		responds to luns: 0 ... (max_luns-1) or 1 ... (max_luns-1) if no_lun_0 is set
max_queue	576	read-write	next command	1.82	number of commands driver can queue before telling mid-level it is full. Safe to change when commands already queued.
ndelay	0	read-write	next command	1.84	IO command response delay: units are nanoseconds. If > 0 then the delay parameter will be ignored (it appears as -9999)
no_lun_0	0	read-write	next positive add_host or scan	1.77	no lun 0 but responds to INQUIRY and REPORT LUNS as per SPC-2
no_uld	0	read only	-	1.82	only attaches to sg and bsg devices
num_parts	0	read only	-		number of partitions
num_tgts	1	read-write	next positive add_host or scan		targets per host
opt_blks	64	read-only		1.84	'Optimal transfer length' field in Block Limits VPD page
opts	0	read-write	usually following commands		0 -> quiet and no error injection (mask 16 to inject aborted_command new in 1.81)
physblk_exp	0	read only	-	1.81	2**physblk_exp sets READ CAPACITY(16)'s logical blocks per physical block exponent field
ptype	0	read-write	next positive add_host or scan		peripheral device type (0==disk)
scsi_level	5	read only	-		from: 0 (no compliance), 1, 2 (SCSI-2), 3 (SPC), 4 (SPC-2), 5 (SPC-3), 6 (SPC-4)
sector_size	512	read only	-	1.81	logical block size in bytes. 512, 1024, 2048 and 4096 accepted
unmap_alignment	0	read only	-	1.81	Block limits VPD page's unmap granularity alignment
unmap_granularity	0	read only	-	1.81	Block limits VPD page's optimal unmap granularity
unmap_max_blocks	0	read only	-	1.81	Block limits VPD page's maximum unmap LBA count
unmap_max_desc	0	read only	-	1.81	Block limits VPD page's maximum unmap block descriptor count
virtual_gb	0	read-write	immediate, next READ CAPACITY	1.79	When 0 then device is dev_size_mb sized ram disk. When n > 0, "virtual" n Gibibyte size disk, wrapping on dev_size_mb actual ram. The Gibibyte unit is 2**30 bytes
vpd_use_hostno	1	read-write	next positive add_host or scan	1.80	the driver generates serial numbers and SAS naa-5 addresses based on host number ("hostno"), target id and lun. When set to 0, the generated numbers ignore "hostno".
write_same_length	0xffff	read-write	-	2012	maximum blocks per WRITE SAME command

The add_host parameter is the number of hosts (HBAs) to simulate. The default is 1. For boot time and module loads the allowable values are 0 through to a large positive number. For sysfs writes, a value of 0 does nothing while a positive number adds that many hosts and a negative number removes that number of hosts. A sysfs read of this parameter shows the current number of hosts scsi_debug is simulating. No more than num_tgts target ids will be used per host. Target ids are in ascending order from 0 excluding the target id that is used by the initiator (i.e. HBA) if any. The default setting of num_tgts is 1. The default setting for max_luns is 1. So the number of pseudo disks simulated at driver initialization time is (add_host * num_tgts * max_luns). Note that if any of these three parameters is set to zero at kernel boot time or module load time then no devices are created. Modifying the add_host parameter in sysfs can be used to simulate hot plugging and unplugging of hosts. See below for adding and deleting individual scsi devices.
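As a quick check of the (add_host * num_tgts * max_luns) product, here is a minimal shell sketch. The function name sdebug_device_count is an illustrative assumption; it only does the arithmetic and does not load the module.

```shell
# Hypothetical helper: number of pseudo devices created at driver
# initialization time. $1 = add_host, $2 = num_tgts, $3 = max_luns
sdebug_device_count() {
    echo $(( $1 * $2 * $3 ))
}

sdebug_device_count 10 7 2   # prints 140
sdebug_device_count 1 1 1    # prints 1 (all defaults)
sdebug_device_count 4 0 2    # prints 0 (any zero factor means no devices)
```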

The ato parameter sets the field of the same name in the control mode page. The default value is 1 which implies the host is the application tag owner. A value of 0 implies the device server (e.g. the (pseudo) disk) is the application tag owner.

The clustering parameter informs the SCSI mid layer whether (1) or not (0) clustering is enabled. The default is that it is not (0). Setting this parameter facilitates large transfers of data with a single command.

The delay parameter is the number of jiffies by which the driver will delay responses. The default is 1 jiffy unless the ndelay parameter is given, see its description. Setting this parameter to 0 will cause the response to be sent back to the mid level before the request function is completed. The "jiffy" is a kernel space jiffy (with the largest HZ setting a jiffy is typically 1 millisecond on i386) rather than a user space jiffy (a USER_HZ jiffy is typically 10 milliseconds on i386). HZ and USER_HZ are configurable in the kernel build. Both delayed and immediate responses are permitted; however, delayed responses are more realistic. For delayed responses, a kernel timer is used. [Real adapters would generate an interrupt when the response was ready (i.e. the command had completed).] For a fast ram disk set the delay parameter to 0. These SCSI commands ignore the delay parameter and respond immediately: INQUIRY, REPORT LUNS, REQUEST SENSE, SYNCHRONIZE CACHE plus various other non "media access" commands. TEST UNIT READY is considered a media access command.
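To see what a given delay means in wall-clock terms, the jiffy count can be converted using the kernel's HZ value. The sketch below assumes the usual relation of one jiffy = 1/HZ seconds; the function name delay_usecs is hypothetical and the HZ values are examples only.

```shell
# Hypothetical helper: convert a delay in jiffies to microseconds.
# $1 = delay (jiffies, > 0), $2 = kernel HZ for the running build
delay_usecs() {
    echo $(( $1 * 1000000 / $2 ))
}

delay_usecs 1 1000   # prints 1000  (1 ms jiffy)
delay_usecs 1 100    # prints 10000 (10 ms jiffy)
```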

The delay parameter may be set to -1 or -2 which uses a kernel tasklet to generate a more or less immediate response (but in a different kernel thread). The -1 variant schedules a high priority tasklet while -2 schedules a normal priority tasklet. Trying to write a new value to delay while there are queued command responses may result in an EBUSY error.

The dev_size_mb parameter allows the user to specify the size of the simulated storage. The unit is Mebibytes (each 2**20 bytes and a bit larger than a Megabyte) and the default value is 8. The maximum value depends on the capabilities of the vmalloc() call on the target architecture. If the module fails to load with a "cannot allocate memory" message then a "vmalloc=nn{KMG}" boot time argument may be needed. [See the kernel source file: Documentation/kernel-parameters.txt for more information on this.] The RAM reserved for storage is initialized to zeros which leads the sd (scsi disk) driver and the block layer to believe there is no partition table present. Partitions can be simulated with num_parts (see below). All simulated dummy devices share the same RAM. If a value of 0 or less is given then dev_size_mb is forced to 1 so 1 MB of RAM is used. Given 512 byte logical blocks, the largest ramdisk that can be allocated is 2 TB but it is unlikely a system would be able to allocate that much ram (a situation that would be bypassed if fake_rw=1). Very large amounts of "virtual" storage can be simulated with the virtual_gb parameter (see below).

The every_nth parameter takes a decimal number as an argument. When this number is greater than zero, then incoming commands are counted and when <n> is reached then the associated command generates some sort of error. Currently the available errors are timeout (when "opts & 4" is true) and RECOVERED_ERROR (when "opts & 8" is true) . Once the command count reaches <n> then it is reset to zero. For example setting every_nth to 3 and opts to 4 will cause every third command to be ignored (and hence a timeout). If every_nth is not given it is defaulted to 0 and timeouts and recovered errors will not be generated.

If every_nth is negative then an internal command counter counts down to that value and when it is reached, continually generates the error condition (specified in opts) on each newly received command. The driver flags this continual error state by setting every_nth to -1 . The user can stop error conditions being generated on receipt of every subsequent command by writing 0 to every_nth (or opts ).
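The counting behaviour for a positive every_nth can be modelled in a few lines of shell. This is only a sketch of the description above (count each command, fire on the nth, reset to zero), not the driver's code; the function name nth_hits is an assumption and the negative every_nth case is not modelled.

```shell
# Hypothetical model: which commands (numbered from 1) trigger the
# injected error. $1 = every_nth (> 0), $2 = number of commands
nth_hits() {
    count=0
    i=1
    hits=""
    while [ "$i" -le "$2" ]; do
        count=$((count + 1))
        if [ "$count" -ge "$1" ]; then
            hits="${hits:+$hits }$i"
            count=0              # counter is reset once <n> is reached
        fi
        i=$((i + 1))
    done
    printf '%s\n' "$hits"
}

nth_hits 3 10   # prints: 3 6 9
```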

The fake_rw parameter instructs the scsi_debug driver to ignore all READ and WRITE commands and return a GOOD status. This means the data "read" when fake_rw is set is whatever was previously in the scatter gather list. The default value is 0 (i.e. process READ and WRITE commands). This parameter is for testing and when set can confuse the kernel or utilities that look for partitions and other information on a "disk".

The guard parameter, when set to zero (the default), uses the T10 defined CRC in the protection information. When set to one the IP (internet protocol) checksum (as used by iSCSI ?) is used.

The host_lock parameter indicates whether each command (excluding its response delay and associated callback into the mid-layer) is surrounded by a per host host_lock (which is a kernel "spin lock"). In a SCSI multi-queue system the presence of this host lock will have the effect of serializing all commands from a host, and that is detrimental to system performance. Prior to version 1.84 this parameter was not available and the host_lock surrounded all commands. In version 1.84 and later the default is 0 which means the host_lock is not applied. Set host_lock=1 for the old behaviour.

The lbpu parameter, if set, causes the logical block provisioning VPD page to set the field of the same name. The default is to set the LBPU field to 0. When set this field indicates the UNMAP command is supported.

The lbpws and lbpws10 parameters cause the corresponding bits in the logical block provisioning VPD page to be set. They imply that the UNMAP field within the WRITE SAME(16) and WRITE SAME(10) commands, respectively, is supported.

The lbprz parameter, if set, causes the logical block provisioning VPD page to set the field of the same name. When this field is set, reading unmapped logical blocks will return block(s) of data full of zeros.

The lowest_aligned parameter sets the field called LOWEST ALIGNED LOGICAL BLOCK ADDRESS in the READ CAPACITY (16) command response.
The default is zero which implies the logical block size and the physical block size are the same.

The max_luns parameter allows an upper limit to be placed on the logical unit number (lun) that the scsi_debug driver will respond to. A value of 2 means that this driver will respond to logical unit numbers 0 and 1. If max_luns is modified by a sysfs write then the scsi_debug driver modifies the scsi_host::max_lun member of all hosts that it owns. When max_luns is modified by a sysfs write then it will take effect the next time a host is added (see add_host) or when a scan is done on any existing host. The mid level scanning code will scan for up to but not including max_scsi_luns which is a SCSI mid level boot and module load time parameter.

The max_queue parameter indicates the maximum number of queued responses the driver can handle. This defaults to an internal define in the scsi_debug driver called SCSI_DEBUG_CANQUEUE which is currently 576. If both the delay and ndelay parameters are 0, no commands have queued responses. If there is an attempt to exceed this value then either SCSI_MLQUEUE_HOST_BUSY is returned to the mid-layer (the default) or a status of TASK SET FULL (if the 0x400 opts mask is set, see the opts flags below). Sysfs can be used at any time to change the value of max_queue, even when there are queued command responses.

The ndelay parameter is the response delay whose units are nanoseconds. This mechanism depends on high resolution timers in the kernel which may not be supported on small or old systems (it is a kernel build config option). Its default value is 0 which means the delay parameter is operative. If ndelay is a positive value then a response delay for that many nanoseconds is active (and to indicate the delay parameter is overridden, it is set to -9999). Depending on the hardware, setting ndelay to less than a few microseconds probably causes no further reduction in the observed response delays. Trying to write a new value to ndelay while there are queued command responses may result in an EBUSY error.

The num_parts parameter writes a partition table to the ramdisk if the parameter's value is greater than 0. The default is 0 so in that case the ramdisk is simply all zeros. When num_parts is greater than zero a DOS format primary partition block is written to logical block 0, so the number of partitions is limited to a maximum of 4. The partitions are given an id of 0x83 which is a "Linux" partition. The available space on the ramdisk is roughly divided evenly between partitions when 2 or more partitions are requested. The partitions are not initialized with any file system. Even if no partitions are specified, a utility like fdisk can be used to add them later.

The num_tgts parameter allows the number of targets per host to be specified. It should be 0 or greater. Target id numbers start at 0 and ascend, bypassing the target id of the initiator (i.e. the HBA). If num_tgts is modified by a sysfs write then the scsi_debug driver modifies the scsi_host::max_id member of all hosts that it owns. When num_tgts is modified by a sysfs write then it will take effect the next time a host is added (see add_host) or when a scan is done on any existing host.

The no_lun_0 parameter when set to a non zero value causes a lun 0 INQUIRY response of peripheral_qualifier==3 indicating there is no actual lu there. As required by SPC, lun 0 will still respond to a REPORT LUNS command. If the REPORT LUNS has a 'select report' code of 1 or 2, then one of the luns reported will be the REPORT LUNS well known logical unit (lun 49409 or 0xc101). The default value is 0. If max_luns is greater than 1, the first lun generated by scsi_debug will be lun 1 (since lun 0 is skipped). The REPORT LUNS well known logical unit (wlun) only supports the INQUIRY, REPORT LUNS, REQUEST SENSE and TEST UNIT READY SCSI commands. To make this wlun appear as a scsi generic (sg) device see the REPORT LUNS well known LUN example below.
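The interaction between max_luns and no_lun_0 can be illustrated with a small shell sketch that lists the logical unit numbers a target would respond to. The function name sdebug_luns is hypothetical, and the REPORT LUNS well known logical unit is deliberately left out of the listing.

```shell
# Hypothetical model: luns that a simulated target responds to.
# $1 = max_luns, $2 = no_lun_0 (0 or non zero)
sdebug_luns() {
    lun=0
    if [ "$2" -ne 0 ]; then
        lun=1                    # lun 0 is skipped when no_lun_0 is set
    fi
    out=""
    while [ "$lun" -lt "$1" ]; do
        out="${out:+$out }$lun"
        lun=$((lun + 1))
    done
    printf '%s\n' "$out"
}

sdebug_luns 2 0   # prints: 0 1
sdebug_luns 2 1   # prints: 1
```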

The opt_blks parameter is placed in the "Optimal transfer length" field of the Block Limits VPD page. Its default value is 64.

The opts parameter takes a number as an argument which is the bitwise "or" of several flags. The flags that mention "nth" are only active when every_nth != 0 . So-called "read-write" commands include some others such as VERIFY. The flags supported are:

1 - "noisy" flag: all calls to entry points of driver are logged. Commands to be executed are shown in hex. Additional information such as check conditions, command aborts and resets are logged
2 - "medium error" flag: simulates a SCSI MEDIUM ERROR when sector 0x1234 (that is 4660 in decimal) is read
4 - ignore "nth" command causing a timeout.
8 - cause "nth" read or write command to yield a RECOVERED_ERROR.
0x10 - cause "nth" read-write command to yield an ABORTED_COMMAND (ack/nak timeout) which is a SAS transport error.
0x20 - cause "nth" read-write command to yield an ABORTED_COMMAND (logical block guard check failed), nominally a DIF (Protection Information) error
0x40 - cause "nth" read-write command to yield an ABORTED_COMMAND (logical block guard check failed), nominally a DIX error
0x80 - ignore "nth" media access command causing a timeout
0x100 - cause "nth" read command to yield half the data it was requested to read
0x200 - log generation of TASK SET FULL and host busy plus changes to queue depth and type
0x400 - if max_queue is exceeded yield a TASK SET FULL (default: host busy)
0x800 - cause "nth" read-write command whose queue_depth is at its maximum value to yield a status of TASK SET FULL
0x1000 - set WCE field in the caching page to 0 (default WCE=1)
0x2000 - log only abort commands and the various levels of reset
0x4000 - used together with the noisy flag (1) to suppress the logging of cdbs; additional information (if any) is still logged.

The opts "noisy" (or debug) flag will cause all scsi_debug entry points to be logged in the system log (and often sent to the console depending on how kernel informational messages are processed). With this flag set commands are listed in hex and if they yield a result other than successful then that is shown. In a busy system this may prove to be too much log "noise" in which case this combination of flags may be useful: opts=0x6201 .
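Since opts is a bitwise "or" of flags, a combined value such as 0x6201 can be built with shell arithmetic. The variable names below are illustrative assumptions chosen for this sketch; only the numeric masks come from the flag list above.

```shell
# Illustrative names for four of the opts masks listed above.
OPT_NOISE=0x1            # "noisy" flag
OPT_Q_NOISE=0x200        # log TASK SET FULL and host busy
OPT_RESET_NOISE=0x2000   # log aborts and resets
OPT_NO_CDB_NOISE=0x4000  # suppress logging of cdbs

opts=$(( $OPT_NOISE | $OPT_Q_NOISE | $OPT_RESET_NOISE | $OPT_NO_CDB_NOISE ))

printf 'opts=0x%x\n' "$opts"   # prints opts=0x6201
printf 'opts=%d\n' "$opts"     # prints opts=25089 (the same value in decimal)
```

The hexadecimal form could then be written to the opts sysfs file described earlier, for example with echo 0x6201 > opts from the driver's sysfs directory.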

The opts "medium error" flag will cause any read command whose range of sectors includes sector 0x1234 (4660 in decimal) to return a medium error indication to the mid level. The "ignore nth" flag is only active when every_nth != 0 . When an internal command counter reaches the value in every_nth and the "ignore nth" flag is set, then this command is ignored (i.e. quietly not processed). Typically this will cause the SCSI mid level code to timeout the command which leads to further error processing. The internal command counter is reset to zero whenever opts is written to, whenever every_nth is written to, when the every_nth value is reached and at driver load time. The "recovered error" flag works in a similar fashion to the "ignore nth" flag, however when the every_nth value is reached and it is either a read or a write command then the command is processed normally but yields a "recovered error" indication. Such an indication is _not_ a hard error but for a real disk could indicate deteriorating media. The "aborted command" flag injects a transport error in a similar fashion to the way the "recovered error" flag works. A minor point: the kernel boot time and module load time opts parameter is a decimal integer. However the output sysfs value is a hexadecimal number (output as 0x9 for example) while the input value is interpreted as hexadecimal if prefixed by "0x" and decimal otherwise. When combining these flags it is easier to consider them as hexadecimal numbers.

The physblk_exp parameter becomes the "Logical blocks per physical block exponent" field in the READ CAPACITY (16) response. The default value is 0 which means the logical block and physical block sizes are the same.

The ptype parameter allows the SCSI peripheral type to be set or modified. The default value is 0 which corresponds to a disk. Other useful peripheral types are 1 for tape, 3 for processor, 5 for dvd/cd and 13 for enclosure (SES).

The scsi_level parameter is the ANSI SCSI standard level that the simulated disk announces that it is compliant to. The INQUIRY response which is generated by scsi_debug contains the ANSI SCSI standard level value (in byte 2).

The virtual_gb parameter allows the scsi_debug driver to simulate a much larger storage device than the physical RAM available in the machine. When the virtual_gb parameter is 0 (its default value) then the maximum storage available is that indicated by the dev_size_mb parameter. When the virtual_gb parameter is greater than zero, that many Gibibytes (each of 2**30 bytes and larger than a Gigabyte) are reported by the READ CAPACITY command. Reading and writing of the "Gigabytes" of data wraps around within the available physical RAM (which the scsi_debug driver has allocated and is dev_size_mb Mebibytes in size). When the number of virtual Gibibytes is 2048 or greater then READ CAPACITY (16) is needed to represent the size and READ (16) and/or WRITE (16) are needed to access data at the 2048 Gibibyte boundary and beyond. This boundary corresponds to 2**32 blocks (sectors) of 512 bytes each, one more than the largest value (2**32-1) that fits in a 32 bit field. The "wrapping" action still allows partitions to be written with fdisk and in many cases a file system to be initialized. Trying to store and retrieve any useful data on such a big virtual disk would not be wise! Setting the dev_size_mb parameter to a prime number, larger than the default value (which is 8) and that doesn't starve the machine for resources, seems to help in creating ext3 file systems. This occurs since mkfs writes the file system super block at several offsets within the partition, and the wrap may cause the file system header to be overwritten. The virtual_gb option is designed for testing, not practical data storage.
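The reported capacity and the "wrapping" of accesses can be sketched with shell arithmetic. This models the description above (a modulo over the number of blocks that fit in dev_size_mb), which is an assumption about the mapping rather than a copy of the driver's code. The function names are hypothetical, and the sketch assumes a shell with 64 bit arithmetic.

```shell
# Hypothetical model of reported capacity, in logical blocks.
# $1 = virtual_gb, $2 = dev_size_mb, $3 = sector_size
sdebug_capacity_blocks() {
    if [ "$1" -gt 0 ]; then
        echo $(( $1 * 1073741824 / $3 ))   # virtual size: n * 2**30 bytes
    else
        echo $(( $2 * 1048576 / $3 ))      # real size: dev_size_mb * 2**20 bytes
    fi
}

# Hypothetical model of where a logical block lands in the shared RAM.
# $1 = lba, $2 = dev_size_mb, $3 = sector_size
sdebug_wrapped_lba() {
    store_blocks=$(( $2 * 1048576 / $3 ))
    echo $(( $1 % store_blocks ))
}

sdebug_capacity_blocks 0 8 512      # prints 16384
sdebug_capacity_blocks 2048 8 512   # prints 4294967296 (2**32)
sdebug_wrapped_lba 20000 8 512      # prints 3616
```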

The vpd_use_hostno parameter affects the way the scsi_debug driver generates its serial numbers, SAS and naa-5 addresses. When vpd_use_hostno is set to 1 (its default value) then the host number ("hostno"), target_id and lun are used to generate the serial number, SAS and naa-5 addresses. The formula is "((hostno + 1) * 2000) + (target_id * 1000) + lun". When vpd_use_hostno is set to 0 then the "hostno" term in the formula is set to 0. This has the effect of making multiple simulated hosts look like they are connected to the same drives (i.e. there are only "num_tgts * max_luns" unique simulated devices). The kernel will still report "add_host * num_tgts * max_luns" devices but higher level multipath aware software may see the difference.

Supported SCSI commands

Below is a list of supported commands. Some do nothing (e.g. SYNCHRONIZE CACHE). Those that have interesting functionality have notes in brackets. If the feature was introduced in a recent version (i.e. since 1.76) then that is noted.

ALLOW MEDIUM REMOVAL
GET LBA STATUS
INQUIRY [vital product data pages: 0, 0x80, 0x83] [1.77: VPD pages: 0x85, 0x86, 0x87, 0x88, 0x89, 0xb0]
LOG SENSE [1.78: temperature(0xd) and informational exceptions(0x2f)] [1.80: support log subpages]
MODE SELECT (6), MODE SELECT (10) [1.84: changeable pages: 0x8 (caching), 0xa (control) and 0x1c (informational exceptions)]
MODE SENSE (6), MODE SENSE (10) [sense pages: 1 (rw error recovery), 2 (disconnect), 3 (format), 8 (caching), 0xa (control), 0x1c (informational exceptions), 0x3f (read all)] [1.77: subpage support plus SAS pages: 0x19,0 0x19,1 and 0x19,2]
READ (6), READ (10), READ(12), READ(16), READ(32)
READ CAPACITY (10), READ CAPACITY (16) [1.79: added 16 byte command]
RELEASE (6), RELEASE (10)
REPORT LUNS [1.77: shows REPORT LUNS wlun]
REPORT TARGET PORT GROUPS
REQUEST SENSE [1.79: shows MRIE=6 failure prediction, power states]
RESERVE (6), RESERVE (10)
REZERO UNIT (which is REWIND for tapes)
SEND DIAGNOSTIC
START STOP [1.78: maintains start and stop states, when stopped fails media access commands]
SYNCHRONIZE CACHE
TEST UNIT READY [1.78: in stopped state gives appropriate error]
VERIFY (10)
WRITE (6), WRITE (10), WRITE (12), WRITE (16), WRITE(32)
WRITE SAME(10), WRITE SAME(16)
UNMAP
XDWRITEREAD [which is a bidirectional command]

The implementations of the above commands are sufficient for the scsi subsystem to detect and attach devices. The fdisk, e2fsck and mount commands also work as do the utilities found in the sg3_utils package (see the main page). Crude error processing picks up unsupported commands and attempts to read or write outside the available RAM storage area.

Modern SCSI devices use vital product page 0x83 for identification. This driver yields both "T10 vendor identification" and "NAA" descriptors. The former yields an ASCII string like "Linux scsi_debug 4000" where the "4000" is ((host_no + 1) * 2000) + (target_id * 1000) + lun. In this case "4000" corresponds to host_no==1, target_id==0 and lun==0. The "NAA-5" descriptor is an 8 byte binary value that looks like this hex sequence: "51 23 45 60 00 00 0f a0" where the IEEE company id is 0x123456 (fake) and the vendor specific identifier in the least significant bytes is 4000 (which is fa0 in hex). [The "4000" is derived the same way for both descriptors.]
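The identifier arithmetic can be reproduced in shell. The sketch below follows the formula and the example hex sequence given above; the function names are hypothetical, and the NAA-5 layout (a fixed "5123456" prefix followed by a 9 hex digit identifier) is inferred from that single example rather than taken from the driver source.

```shell
# Hypothetical helper: device number used in the VPD 0x83 descriptors.
# $1 = host_no, $2 = target_id, $3 = lun, $4 = vpd_use_hostno (default 1)
sdebug_dev_num() {
    host=$1
    if [ "${4:-1}" -eq 0 ]; then
        host=0                   # vpd_use_hostno=0: "hostno" is treated as 0
    fi
    echo $(( (host + 1) * 2000 + $2 * 1000 + $3 ))
}

# Hypothetical helper: NAA-5 value as 16 hex digits, layout inferred
# from the "51 23 45 60 00 00 0f a0" example.
sdebug_naa5() {
    printf '5123456%09x\n' "$(sdebug_dev_num "$@")"
}

sdebug_dev_num 1 0 0   # prints 4000
sdebug_naa5 1 0 0      # prints 5123456000000fa0
```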

Read and write commands executed by the scsi_debug driver are atomic, i.e. a write to one scsi_debug device will not interrupt (split) a read from another scsi_debug device. So a read command will either yield the contents of RAM before a coincident write, or after the coincident write has finished.

Logical and physical block size

scsi_debug supports emulating devices with logical block sizes bigger than 512 bytes. This can be specified using the sector_size option.

Some storage devices use physical block sizes bigger than 512 bytes internally but expose a 512-byte logical block size to the host for compatibility reasons. The physblk_exp parameter can be used to indicate that the internal block size is 2^n times bigger than the reported logical block size. For instance: Supplying physblk_exp=3 on the command line will cause scsi_debug to simulate a device with 512-byte logical blocks and 4KB physical blocks.
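The relation between sector_size, physblk_exp and the physical block size is a simple power of two, which this shell sketch makes explicit. The function name phys_block_bytes is an illustrative assumption.

```shell
# Hypothetical helper: physical block size in bytes.
# $1 = sector_size (logical block size), $2 = physblk_exp
phys_block_bytes() {
    echo $(( $1 << $2 ))
}

phys_block_bytes 512 3   # prints 4096
phys_block_bytes 512 0   # prints 512 (logical and physical sizes equal)
```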

Not all storage devices have logical block 0 aligned to a physical block boundary. These devices can be emulated using scsi_debug's lowest_aligned option. The parameter indicates the lowest LBA that is aligned to a physical block boundary.

Thin provisioning

New in SBC-3 is the ability for block devices to be thinly provisioned.

This means that devices can report a capacity that is bigger than the space actually allocated. When files are deleted, the relevant blocks can be reclaimed by the storage device and used for something else. And consequently only blocks that are actively in use consume physical storage space.

SBC-3 specifies two different approaches for marking blocks as unused: WRITE SAME(16) with the UNMAP bit set, and the UNMAP command. scsi_debug supports both methods and they are controlled via 4 module parameters:

unmap_max_desc specifies the maximum number of ranges that can be unmapped using a single UNMAP command. If this is set to 0, only WRITE SAME is supported and UNMAP will cause a check condition.
unmap_granularity specifies the granularity at which to track mapped blocks (specified in number of logical blocks). 2048 (1 MB) is a realistic value for disk arrays although some may have a finer granularity.
unmap_alignment specifies the first LBA which is naturally aligned on an unmap_granularity boundary.
unmap_max_blocks specifies the maximum number of blocks that can be unmapped using a single UNMAP command. Default is 0xffffffff.

Examples:
    modprobe scsi_debug unmap_max_desc=0 unmap_granularity=1
will simulate a device that only supports WRITE SAME(16) and which tracks usage on a per logical block basis. This is how most solid state drives work.
    modprobe scsi_debug unmap_max_desc=64 unmap_granularity=2048
will simulate a device that supports UNMAP and which is provisioned in 1MB chunks. This is a common scenario for thinly provisioned storage arrays.
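The "1MB chunks" figure follows from multiplying unmap_granularity by the logical block size. A minimal shell sketch, assuming the default 512 byte sector_size and using a hypothetical function name:

```shell
# Hypothetical helper: provisioning chunk size in bytes.
# $1 = unmap_granularity (logical blocks), $2 = sector_size (bytes)
unmap_chunk_bytes() {
    echo $(( $1 * $2 ))
}

unmap_chunk_bytes 2048 512   # prints 1048576 (1 MiB)
unmap_chunk_bytes 1 512      # prints 512 (per logical block tracking)
```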

The current block allocation bitmap can be viewed from user space via:
    cat /sys/bus/pseudo/drivers/scsi_debug/map

Examples

Basic

Since scsi_debug is for testing it seems more useful to build it as a module rather than build it into the kernel. Some parameters cannot be changed once the scsi_debug driver is running. So if it is a module then it can be removed with rmmod and reloaded with another modprobe call with the desired parameters.

When the driver is loaded successfully simulated disks should be visible just like other SCSI devices:

# modprobe scsi_debug
# lsscsi -s
[0:0:0:0]    disk    SEAGATE  ST33000650SS     0005  /dev/sda   3.00TB
[0:0:1:0]    enclosu Intel    RES2SV240        0d00  -               -
[4:0:0:0]    disk    ATA      ST3160812AS      D     /dev/sdb    160GB
[7:0:0:0]    disk    Linux    scsi_debug       0184  /dev/sdc   8.38MB

In this case there is a 3 TB SAS disk, an ATA disk and a small scsi_debug pseudo disk. The other device (at [0:0:1:0]) is a SCSI Enclosure Service (SES) device. The /dev/sdc pseudo disk is full of zeros and has no partitions. To get a partition the num_parts parameter could have been used on the modprobe line or it could be done from the command line with the fdisk /dev/sdc command. Assuming one ext3 partition is allocated to the whole pseudo disk (8 MB in this case) then the mkfs.ext3 /dev/sdc1 command can be used to make an ext3 file system. Now /dev/sdc1 can be mounted and treated like a normal file system. Naturally when the power is turned off anything stored in /dev/sdc1 will be forgotten.

Rather than mounting the pseudo disk, the sg3_utils package could be used to carry out various tests on it.

Information about the scsi_debug driver version, its current parameters and some other data can be found in the "proc" file system. The trailing number in the path is the scsi_debug host number, which is the first element in the 4 item tuple shown in the lsscsi output above:

# cat /proc/scsi/scsi_debug/7
scsi_debug adapter driver, version 1.84 [20140706]
num_tgts=1, shared (ram) size=8 MB, opts=0x0, every_nth=0
delay=1, ndelay=0, max_luns=1, q_completions=1859648
sector_size=512 bytes, cylinders=64, heads=8, sectors=32
command aborts=0; RESETs: device=0, target=0, bus=0, host=0
dix_reads=0 dix_writes=0 dif_errors=0
usec_in_jiffy=1000

Here is an important sysfs directory for the scsi_debug driver:

# cd /sys/bus/pseudo/drivers/scsi_debug/
# ls
adapter0  dev_size_mb  fake_rw    max_queue  num_tgts    sector_size
add_host  dif          guard      ndelay     opts        uevent
ato       dix          host_lock  no_lun_0   ptype       unbind
bind      dsense       map        no_uld     removable   virtual_gb
delay     every_nth    max_luns   num_parts  scsi_level  vpd_use_hostno

Those files correspond to most of the scsi_debug parameters. Those that are writable can be modified, and the driver's actions change accordingly thereafter. Certain parameters cannot be changed while the driver is busy (e.g. it has queued command responses); in that case EBUSY is returned if the user attempts to change one. A parameter can be read with the cat command and changed with the echo command:

# cat every_nth
0
# echo 2000 > every_nth
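In a script it may be worth checking whether the write succeeded, since a busy driver rejects it. A minimal sketch follows; the sdebug_get and sdebug_set helper names and the SDEBUG_DIR variable are hypothetical, the latter existing only so the directory can be overridden.

```shell
#!/bin/sh
# Assumption: the driver's sysfs directory, overridable through SDEBUG_DIR.
SDEBUG_DIR="${SDEBUG_DIR:-/sys/bus/pseudo/drivers/scsi_debug}"

# Print the current value of parameter $1.
sdebug_get() {
    cat "$SDEBUG_DIR/$1"
}

# Write value $2 to parameter $1; return non-zero if the write is refused
# (for example EBUSY while command responses are queued, or a read-only file).
sdebug_set() {
    if ! echo "$2" > "$SDEBUG_DIR/$1"; then
        echo "could not set $1 to $2" >&2
        return 1
    fi
}
```

For example, 'sdebug_set every_nth 2000 && sdebug_get every_nth' should print 2000 when the driver accepts the change.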

Another important sysfs directory for (any) disks is /sys/block/<disk_node_name> and its queue sub-directory. In the case of this scsi_debug pseudo disk that directory would be /sys/block/sdc/queue. There is also the scsi_device sysfs directory, which has the form /sys/class/scsi_device/<h:c:t:l>/device where the <h:c:t:l> tuple is found at the left hand side of each device listed by lsscsi. This sysfs directory contains many important SCSI device parameters, some of which can be modified.
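To get a quick overview of what these two directories hold, their attribute files can be dumped as "name=value" lines. The sketch below is only an illustration; the show_attrs, show_queue_attrs and show_device_attrs helpers and the SYSFS_ROOT variable are hypothetical names. Attributes that cannot be read (for example write-only ones) are shown with an empty value.

```shell
#!/bin/sh
# Assumption: sysfs mount point, overridable through SYSFS_ROOT.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Print "name=value" for each regular attribute file in directory $1.
show_attrs() {
    for f in "$1"/*; do
        [ -f "$f" ] && [ -r "$f" ] || continue
        printf '%s=%s\n' "${f##*/}" "$(cat "$f" 2>/dev/null)"
    done
}

# $1: disk node name, e.g. sdc
show_queue_attrs() {
    show_attrs "$SYSFS_ROOT/block/$1/queue"
}

# $1: <h:c:t:l> tuple as printed by lsscsi, e.g. 7:0:0:0
show_device_attrs() {
    show_attrs "$SYSFS_ROOT/class/scsi_device/$1/device"
}
```

For the pseudo disk above, 'show_queue_attrs sdc' and 'show_device_attrs 7:0:0:0' would be the matching calls.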

Adding and removing hosts and devices

Individual devices can be removed via sysfs and the mid-level by writing any value into the "delete" member in the sysfs directory corresponding to the scsi device. Given these devices:

# lsscsi -s
[0:0:0:0]    disk    SEAGATE  ST200FM0073      0A04  /dev/sda    200GB
[4:0:0:0]    disk    ATA      ST3160812AS      D     /dev/sdb    160GB
[7:0:0:0]    disk    Linux    scsi_debug       0184  /dev/sdc   21.4GB

then the scsi_debug (pseudo) disk can be deleted like this:

# echo 1 > /sys/class/scsi_device/7:0:0:0/device/delete

After which this should be seen:

# lsscsi -s
[0:0:0:0]    disk    SEAGATE  ST200FM0073      0A04  /dev/sda    200GB
[4:0:0:0]    disk    ATA      ST3160812AS      D     /dev/sdb    160GB

This will work for any scsi device (not just those belonging to scsi_debug). That scsi device can be re-added with the following command:

# echo "0 0 0" > /sys/class/scsi_host/host7/scan

# lsscsi
[0:0:0:0]    disk    SEAGATE ST200FM0073      0A04 /dev/sda
[4:0:0:0]    disk    ATA      ST3160812AS      D     /dev/sdb
[7:0:0:0]    disk    Linux    scsi_debug       0184 /dev/sdc

The three numbers in the "echo" are channel number, target number and lun, respectively. Wildcards (hyphen: "-") can be given for any or all of the three numbers.

# echo 3 > /sys/bus/pseudo/drivers/scsi_debug/max_luns
# echo 2 > /sys/bus/pseudo/drivers/scsi_debug/num_tgts
# echo "0 - -" > /sys/class/scsi_host/host7/scan

# lsscsi
[0:0:0:0]    disk    SEAGATE ST200FM0073      0A04 /dev/sda
[4:0:0:0]    disk    ATA      ST3160812AS      D     /dev/sdb
[7:0:0:0]    disk    Linux    scsi_debug       0184 /dev/sdc
[7:0:0:1]    disk    Linux    scsi_debug       0184 /dev/sdd
[7:0:0:2]    disk    Linux    scsi_debug       0184 /dev/sde
[7:0:1:0]    disk    Linux    scsi_debug       0184 /dev/sdf
[7:0:1:1]    disk    Linux    scsi_debug       0184 /dev/sdg
[7:0:1:2]    disk    Linux    scsi_debug       0184 /dev/sdh

The 'echo "0 - -" > scan' line above added five devices: /dev/sdd to /dev/sdh.
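The delete and scan operations shown above can be wrapped in two small functions. This is a sketch rather than a definitive tool: the scsi_delete_device and scsi_scan_host names and the SYSFS_ROOT variable are hypothetical, and any channel, target or lun left out of the scan call is replaced by the "-" wildcard.

```shell
#!/bin/sh
# Assumption: sysfs mount point, overridable through SYSFS_ROOT.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Remove one scsi device.
#   $1: <h:c:t:l> tuple as printed by lsscsi, e.g. 7:0:0:0
scsi_delete_device() {
    echo 1 > "$SYSFS_ROOT/class/scsi_device/$1/device/delete"
}

# Ask a host to scan for devices.
#   $1: host number; $2 $3 $4: channel, target, lun (each defaults to "-")
scsi_scan_host() {
    echo "${2:--} ${3:--} ${4:--}" > "$SYSFS_ROOT/class/scsi_host/host$1/scan"
}
```

With these, 'scsi_delete_device 7:0:0:0' followed by 'scsi_scan_host 7 0 0 0' would repeat the remove and re-add steps, and 'scsi_scan_host 7 0' corresponds to 'echo "0 - -" > scan'.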

Extra hosts can be added and removed from the scsi_debug driver as follows:

# cd /sys/bus/pseudo/drivers/scsi_debug
# echo 1 > add_host     # add a new host (after the existing hosts)
# echo -2 > add_host    # remove the last two hosts (if at least that many are present)

The scsi_debug driver does not have any limits on the number of scsi devices it can create. By default when loaded it has one scsi device (owned by a host). Larger numbers of devices can be introduced at load time by specifying the add_host, num_tgts and/or max_luns parameters. The number of scsi devices created is the product of the 3 parameters (they all default to 1). Alternatively sysfs can be used to add (or remove) scsi devices after the scsi_debug driver is loaded. Two strategies can be used:

increase the value of num_tgts or max_luns, then write a line like 'echo "0 - -" > scan' (shown above) to the scan file of a host already owned by the scsi_debug driver.
add more hosts with a line like 'echo 3 > add_host'. Each new host will create (num_tgts * max_luns) new scsi devices. Of course num_tgts or max_luns can be modified before 'echo 3 > add_host' is run.
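The arithmetic behind both strategies is a simple product, which the following sketch spells out. The function names are hypothetical; the defaults of 1 follow the parameter defaults stated above.

```shell
#!/bin/sh
# Devices created at load time: add_host * num_tgts * max_luns.
#   $1: add_host, $2: num_tgts, $3: max_luns (each defaults to 1)
sdebug_dev_count() {
    echo $(( ${1:-1} * ${2:-1} * ${3:-1} ))
}

# Devices contributed by each extra host added through add_host.
#   $1: num_tgts, $2: max_luns (each defaults to 1)
sdebug_devs_per_host() {
    echo $(( ${1:-1} * ${2:-1} ))
}

sdebug_dev_count 10 7 2    # add_host=10 num_tgts=7 max_luns=2, prints 140
```

So with num_tgts=2 and max_luns=3, 'echo 3 > add_host' would be expected to add 3 * 6 = 18 devices.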

Even though the scsi_debug driver can create ten thousand or more devices, that does not mean that the scsi mid-level, sd, sg, the block layer and various other kernel components will handle them gracefully.

Mode pages

The supported mode pages are listed following the MODE SENSE entry in the supported commands sections above. Prior to version 1.80, no block descriptor was included in the response when a mode page was read. From version 1.78 the MODE SELECT command is supported. Three mode pages can be modified:

caching (WCE field is changeable) [added in version 1.84]
control (D_SENSE field is acted upon)
informational exceptions control (MRIE and TEST fields are acted upon by REQUEST SENSE)

Saved pages are not supported, reflecting that the scsi_debug driver has only volatile storage. All fields can be changed, but only the fields indicated above have side effects.

REPORT LUNS Well Known LU

There are two techniques for discovering the luns that a SCSI target supports. The first (and oldest) is based on sending commands like INQUIRY and REPORT LUNS to lun 0, even if the target has no lun 0. The second technique is based on one of the so-called "well known logical units", specifically the REPORT LUNS well known logical unit. If present it must support the INQUIRY, REPORT LUNS, REQUEST SENSE and TEST UNIT READY commands. Simulating one with scsi_debug is somewhat contorted:

# modprobe scsi_debug no_lun_0=1 max_luns=2
#
# lsscsi -g
[0:0:0:0]    disk    ATA      INTEL SSDSC2BW18 DC32 /dev/sda   /dev/sg0
[3:0:0:1]    disk    Linux    scsi_debug       0184 /dev/sdb   /dev/sg1
#
# lsscsi --hosts
[0]    ahci
[1]    ahci
[2]    ahci
[3]    scsi_debug
#
# cd /sys/class/scsi_host/host3
# echo "- - 49409" > scan
#
# lsscsi -g
[0:0:0:0]    disk    ATA      INTEL SSDSC2BW18 DC32 /dev/sda   /dev/sg0
[3:0:0:1]    disk    Linux    scsi_debug       0184 /dev/sdb   /dev/sg1
[3:0:0:49409]wlun    Linux    scsi_debug       0184 -          /dev/sg2

The scsi_debug driver needed to be told that it had no_lun_0 so it started generating luns at 1 ([3:0:0:1]) and then the scsi sub-system needed to be told to scan specifically for lun 49409 (0xc101). Thereafter the REPORT LUNS wlun appeared.
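The scan line above takes the lun in decimal while the well known lun is usually quoted in hex. A small sketch of the conversion, using the shell's printf; lun_hex_to_dec is a hypothetical helper name.

```shell
#!/bin/sh
# Convert a hexadecimal lun (written with a 0x prefix) to the decimal
# form used in the 'echo "- - <lun>" > scan' line.
lun_hex_to_dec() {
    printf '%d\n' "$1"
}

lun_hex_to_dec 0xc101    # prints 49409
```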

The way a SCSI initiator (host) scans for targets is transport specific. In the case of the scsi_debug driver it has a magic transport (bus) called "pseudo" which does the right thing. Apart from target discovery, the scsi_debug driver tries to simulate SAS devices, see the next section.

SAS personality

The scsi_debug driver has a Serial Attached SCSI (SAS) personality. For any application that cares, it looks like a dual ported SAS disk accessed via the primary port (relative target port 1). In one case it masquerades as a SATA disk behind a SCSI to ATA Translation (SAT) layer (SATL). Many of the settings are in common with Fibre Channel dual ported disks.

The driver sets the MULTIP (multiport) bit in the INQUIRY response. The following VPD pages are SAS or SAT specific:

device identification page [0x83] (yields naa-5 addresses for the lu, the accessing target port and the target device, plus some other designators)
SCSI ports [0x88] (shows the naa-5 addresses of both ports)
ATA information [0x89] (simulates a SATA disk in a SAS domain, defined in SAT)

The naa-5 addresses are meant to be world wide unique names, which presents a challenge to the scsi_debug driver. Amongst other things, Linux does not have an IEEE company id [memo: OSDL]. Even if it did, making the addresses truly unique in a virtual driver would be difficult, especially if multiple boxes could somehow see each other.

There are also several SAS specific mode pages:

protocol specific port page (SAS): short format page [0x19,0x0]
protocol specific port page (SAS): phy control and discover subpage [0x19,0x1]
protocol specific port page (SAS): shared mode subpage [0x19,0x2] (sas2 version)

Both the VPD and mode pages can be viewed from user space with an application like sdparm. Below is an example of the device identification VPD page:

# sdparm -i /dev/sda
    /dev/sda: Linux     scsi_debug        0004
Device identification VPD page:
Addressed logical unit:
    desig_type: T10 vendor identification, code_set: ASCII
      vendor id: Linux
      vendor specific: scsi_debug      2000
    desig_type: NAA, code_set: Binary
      0x53333330000007d0
Target port:
    desig_type: Relative target port, code_set: Binary
     transport: Serial Attached SCSI (SAS)
      Relative target port: 0x1
    desig_type: NAA, code_set: Binary
     transport: Serial Attached SCSI (SAS)
      0x52222220000007ce
Target device that contains addressed lu:
    desig_type: NAA, code_set: Binary
     transport: Serial Attached SCSI (SAS)
      0x52222220000007cd
    desig_type: SCSI name string, code_set: UTF-8
     transport: Serial Attached SCSI (SAS)
      SCSI name string:
      naa.52222220000007CD

Below is an example of the SCSI ports VPD page showing a dual ported target:

# sdparm -i -p sp /dev/sda
    /dev/sda: Linux     scsi_debug        0004
SCSI Ports VPD page:
Relative port=1
   Target port descriptor(s):
    desig_type: NAA, code_set: Binary
     transport: Serial Attached SCSI (SAS)
      0x52222220000007ce
Relative port=2
   Target port descriptor(s):
    desig_type: NAA, code_set: Binary
     transport: Serial Attached SCSI (SAS)
      0x52222220000007cf

Notice that the above implies that the INQUIRY was sent via port 1 (port A) of the emulated SAS dual ported target. The protocol specific port phy control and discover mode subpage [0x19,0x1] has target port/phy SAS addresses that correspond to the SCSI ports VPD page:

# sdparm -t sas -p pcd -l /dev/sda
    /dev/sda: Linux     scsi_debug        0004
    Direct access device specific parameters: WP=0 DPOFUA=0
port: phy control and discover (SAS) mode page:
PPID_1      6 [cha: n, def: 6] Port's (transport) protocol identifier
NOP         2 [cha: n, def: 2] Number of phys
PHID        0 [cha: n, def: 0] Phy identifier
ADT         1 [cha: n, def: 1] Attached device type
NPLR        9 [cha: n, def: 9] Negotiated physical link rate
ASIP        1 [cha: n, def: 1] Attached SSP initiator port
ATIP        0 [cha: n, def: 0] Attached STP initiator port
AMIP        0 [cha: n, def: 0] Attached SMP initiator port
ASTP        0 [cha: n, def: 0] Attached SSP target port
ATTP        0 [cha: n, def: 0] Attached STP target port
AMTP        0 [cha: n, def: 0] Attached SMP target port
SASA      0x52222220000007ce [cha: n, def:0x52222220000007ce] SAS address
ASASA     0x5111111000000001 [cha: n, def:0x5111111000000001] Attached SAS address
APHID       2 [cha: n, def: 2] Attached phy identifier
PMILR       8 [cha: n, def: 8] Programmed minimum link rate
HMILR       8 [cha: n, def: 8] Hardware minimum link rate
PMALR       9 [cha: n, def: 9] Programmed maximum link rate
HMALR       9 [cha: n, def: 9] Hardware maximum link rate
2_PHID      1 [cha: n, def: 1] Phy identifier
2_ADT       1 [cha: n, def: 1] Attached device type
2_NPLR      9 [cha: n, def: 9] Negotiated physical link rate
2_ASIP      1 [cha: n, def: 1] Attached SSP initiator port
2_ATIP      0 [cha: n, def: 0] Attached STP initiator port
2_AMIP      0 [cha: n, def: 0] Attached SMP initiator port
2_ASTP      0 [cha: n, def: 0] Attached SSP target port
2_ATTP      0 [cha: n, def: 0] Attached STP target port
2_AMTP      0 [cha: n, def: 0] Attached SMP target port
2_SASA    0x52222220000007cf [cha: n, def:0x52222220000007cf] SAS address
2_ASASA   0x5111111000000001 [cha: n, def:0x5111111000000001] Attached SAS address
2_APHID     3 [cha: n, def: 3] Attached phy identifier
2_PMILR     8 [cha: n, def: 8] Programmed minimum link rate
2_HMILR     8 [cha: n, def: 8] Hardware minimum link rate
2_PMALR     9 [cha: n, def: 9] Programmed maximum link rate
2_HMALR     9 [cha: n, def: 9] Hardware maximum link rate

Other supported mode pages can be accessed in a similar way by the sdparm utility. Note that transport specific mode pages need the transport identified: hence the '-t sas' option above.

Downloads

There is nothing to download, see <linux_kernel_source>/drivers/scsi/scsi_debug.c .

Conclusion

Hopefully the design of the scsi_debug driver lends itself to many extensions. If you think that you have a useful extension that others may be interested in, please contact the author with a patch.


Douglas Gilbert <dgilbert at interlog dot com>
with additions from
Martin K. Petersen <martin dot petersen at oracle dot com>

Last updated: 10th July 2014