Scsi_debug adapter driver for Linux
Introduction
The scsi_debug adapter driver simulates a variable number of SCSI
disks, each sharing a common amount of RAM allocated by the driver
to act as (volatile) storage. With one SCSI disk simulated, the
scsi_debug driver is functionally equivalent to a RAM disk. When
multiple SCSI disks are simulated, they could be viewed as multiple
paths to the same storage device or simply separate devices. The
driver can also be used to simulate very large disks, 2 terabytes or
more in size, by "wrapping" its data access within the available RAM.
A small but hopefully useful set of SCSI commands is supported
along with some crude error checking. The number of simulated
devices and the shared RAM size for storage can be given as module
parameters or boot time parameters if the scsi_debug driver is
built into the kernel. The number of simulated devices (and hosts)
can be varied at run time via sysfs. Various error conditions can
be optionally generated to test the reaction of upper levels of
the kernel and applications to abnormal situations.
This page describes the driver as found in Linux kernel version
3.17.0. Earlier versions of this driver worked with the
Linux kernel 2.6 series. For information about the
scsi_debug driver found in the lk 2.4 production series see this page.
Parameters
The parameter name given in the table below is the module parameter
name and the sysfs file name. The boot time parameter (if the
scsi_debug driver is built into the kernel (not recommended)) has
"scsi_debug." prepended to it. Hence the boot time parameter
corresponding to add_host=2 is scsi_debug.add_host=2.
When the scsi_debug module is loaded, many parameters can be given
on the command line, separated by spaces: for example to simulate
140 disks "modprobe scsi_debug max_luns=2 num_tgts=7 add_host=10"
could be used. This will generate 140 devices: 10 hosts, each with 7
targets, each with 2 logical units.
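The device count in the example above is a plain product of three parameters. A minimal sketch of that arithmetic follows; the function name is hypothetical and it ignores refinements such as no_lun_0.

```python
def simulated_device_count(add_host: int, num_tgts: int, max_luns: int) -> int:
    """Devices created at load time: hosts x targets per host x LUNs per target."""
    if min(add_host, num_tgts, max_luns) < 0:
        raise ValueError("load-time values are expected to be zero or positive")
    return add_host * num_tgts * max_luns


# The modprobe example: max_luns=2 num_tgts=7 add_host=10
print(simulated_device_count(add_host=10, num_tgts=7, max_luns=2))  # 140
```

If any of the three values is zero the product is zero, which matches the statement that no devices are created in that case.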
Sysfs parameters can be read with the cat command and written with the echo command. Examples:
# cd /sys/bus/pseudo/drivers/scsi_debug
# cat dev_size_mb
8
# echo 1 > add_host
These parameters are also found in the /sys/module/scsi_debug/parameters
directory; however even if a write operation is permitted (by sysfs)
the scsi_debug driver takes no account of the change.
Here is a list of scsi_debug specific driver parameters:
| Parameter name | default value | sysfs access | sysfs write effect | new in version | notes |
|---|---|---|---|---|---|
| add_host | 1 | read-write | immediate | | can add or remove hosts at runtime |
| ato | 1 | read-only | - | 1.81 | application tag ownership (0 -> disk, 1 -> host) |
| clustering | 0 | | | 1.84 | enable large transfers |
| delay | 1 | read-write | next command | | IO command response delay: units are jiffies (configurable: 1 to 10 ms). 0: no delay, all in one thread; -1: use "hi" tasklet; -2: use normal tasklet |
| dev_size_mb | 8 | read-only | - | | units are Mebibytes (2**20 bytes) |
| dif | 0 | read-only | - | 1.81 | data integrity field type [T10: protection type] |
| dix | 0 | read-only | - | 1.81 | data integrity extension mask; check integrity when non zero |
| dsense | 0 | read-write | immediate | 1.81 | 0 -> fixed; 1 -> descriptor sense format |
| every_nth | 0 | read-write | n commands from now | | for error injection: 0 -> don't do error injection |
| fake_rw | 0 | read-write | next command | 1.80 | when set does no processing when a READ or WRITE command (of any cdb size) is received. When fake_rw=1 no ram is allocated. |
| guard | 0 | read-only | - | 1.81 | protection checksum: 0 -> crc; 1 -> ip |
| host_lock | 0 | read-write | next command | 1.84 | when set wraps each submitted command in a host_lock, which is detrimental in a multi-queue system |
| lbpu | 0 | read-only | - | 2012 | LB provisioning: support UNMAP |
| lbpws | 0 | read-only | - | 2012 | LB provisioning: support WRITE SAME(16) and UNMAP |
| lbpws10 | 0 | read-only | - | 2012 | LB provisioning: support WRITE SAME(10) |
| lbprz | 1 | read-only | - | 2012 | LB provisioning: returns 0s when reading unmapped block |
| lowest_aligned | 0 | read-only | - | 1.81 | RCAP_16's lowest aligned logical block address (max: 0x3fff) |
| max_luns | 1 | read-write | next positive add_host or scan | | responds to luns: 0 ... (max_luns-1) or 1 ... (max_luns-1) if no_lun_0 is set |
| max_queue | 576 | read-write | next command | 1.82 | number of commands driver can queue before telling mid-level it is full. Safe to change when commands already queued. |
| ndelay | 0 | read-write | next command | 1.84 | IO command response delay: units are nanoseconds. If > 0 then the delay parameter will be ignored (it appears as -9999) |
| no_lun_0 | 0 | read-write | next positive add_host or scan | 1.77 | no lun 0 but responds to INQUIRY and REPORT LUNS as per SPC-2 |
| no_uld | 0 | read-only | - | 1.82 | only attaches to sg and bsg devices |
| num_parts | 0 | read-only | - | | number of partitions |
| num_tgts | 1 | read-write | next positive add_host or scan | | targets per host |
| opt_blks | 64 | read-only | | 1.84 | 'Optimal transfer length' field in Block Limits VPD page |
| opts | 0 | read-write | usually following commands | | 0 -> quiet and no error injection (mask 16 to inject aborted_command new in 1.81) |
| physblk_exp | 0 | read-only | - | 1.81 | 2**physblk_exp sets READ CAPACITY(16)'s logical blocks per physical block exponent field |
| ptype | 0 | read-write | next positive add_host or scan | | peripheral device type (0==disk) |
| scsi_level | 5 | read-only | - | | from: 0 (no compliance), 1, 2 (SCSI-2), 3 (SPC), 4 (SPC-2), 5 (SPC-3), 6 (SPC-4) |
| sector_size | 512 | read-only | - | 1.81 | logical block size in bytes. 512, 1024, 2048 and 4096 accepted |
| unmap_alignment | 0 | read-only | - | 1.81 | Block limits VPD page's unmap granularity alignment |
| unmap_granularity | 0 | read-only | - | 1.81 | Block limits VPD page's optimal unmap granularity |
| unmap_max_blocks | 0 | read-only | - | 1.81 | Block limits VPD page's maximum unmap LBA count |
| unmap_max_desc | 0 | read-only | - | 1.81 | Block limits VPD page's maximum unmap block descriptor count |
| virtual_gb | 0 | read-write | immediate, next READ CAPACITY | 1.79 | When 0 then device is dev_size_mb sized ram disk. When n > 0, "virtual" n Gibibyte size disk, wrapping on dev_size_mb actual ram. The Gibibyte unit is 2**30 bytes |
| vpd_use_hostno | 1 | read-write | next positive add_host or scan | 1.80 | the driver generates serial numbers and SAS naa-5 addresses based on host number ("hostno"), target id and lun. When set to 0, the generated numbers ignore "hostno". |
| write_same_length | 0xffff | read-write | - | 2012 | maximum blocks per WRITE SAME command |
The add_host parameter is the number of hosts (HBAs) to
simulate. The default is 1. For boot time and module loads the
allowable values are 0 through to a large positive number. For sysfs
writes, a value of 0 does nothing while a positive number adds that
many hosts and a negative number removes that number of hosts. A
sysfs read of this parameter shows the current number of hosts
scsi_debug is simulating. No more than num_tgts target ids
will be used per host. Target ids are in ascending order from 0
excluding the target id that is used by the initiator (i.e. HBA) if
any. The default setting of num_tgts is 1. The default
setting for max_luns is 1. So the number of pseudo disks
simulated at driver initialization time is (add_host * num_tgts
* max_luns). Note that if any of these three
parameters is set to zero at kernel boot time or module load time
then no devices are created. Modifying the add_host
parameter in sysfs can be used to simulate hot plugging and
unplugging of hosts. See below for adding and deleting individual
scsi devices.
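The sysfs write semantics of add_host (0 does nothing, positive adds, negative removes) can be modelled as below. This is only a sketch: the function name is hypothetical, and clamping at zero when more hosts are removed than exist is an assumption, not something the text above states.

```python
def hosts_after_sysfs_write(current_hosts: int, written_value: int) -> int:
    """Model the host count that a later read of add_host would report."""
    if written_value > 0:
        return current_hosts + written_value      # hot plug that many hosts
    if written_value < 0:
        # Assumption: removal stops once no hosts are left.
        return max(0, current_hosts + written_value)
    return current_hosts                          # writing 0 changes nothing


print(hosts_after_sysfs_write(1, 2))
print(hosts_after_sysfs_write(3, -1))
```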
The ato parameter sets the field of the same name in the
control mode page. The default value is 1 which implies the host is
the application tag owner. A value of 0 implies the device server
(e.g. the (pseudo) disk) is the application tag owner.
The clustering parameter informs the SCSI mid layer whether
(1) or not (0) clustering is enabled. The default is that it is not
(0). Setting this parameter facilitates large transfers of data with
a single command.
The delay parameter is the number of jiffies by which the
driver will delay responses. The default is 1 jiffy unless the
ndelay parameter is given, see its description. Setting this
parameter to 0 will cause the response to be sent back to the mid
level before the request function is completed. The "jiffy" is a
kernel space jiffy (typically the largest HZ figure yields a 1
millisecond jiffy on i386) rather than a user space jiffy (USER_HZ is
typically 10 milliseconds on i386). HZ and USER_HZ are
configurable in the kernel build. Both delayed and immediate
responses are permitted; however, delayed responses are more
realistic. For delayed responses, a kernel timer is used. [Real
adapters would generate an interrupt when the response was ready
(i.e. the command had completed).] For a fast ram disk set the delay
parameter to 0. These SCSI commands ignore the delay parameter and respond
immediately: INQUIRY, REPORT LUNS, REQUEST SENSE, SYNCHRONIZE
CACHE plus various other non "media access" commands. TEST UNIT
READY is considered a media access command.
The delay parameter may be set to -1 or -2 which uses a
kernel tasklet to generate a more or less immediate response (but
in a different kernel thread). The -1 variant schedules a high
priority tasklet while -2 schedules a normal priority tasklet.
Trying to write a new value to delay while there are
queued command responses may result in an EBUSY error.
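The meanings of the delay values described above can be summarised in a small helper. This is an illustrative sketch only: the function name and the wording of the returned strings are hypothetical, and the HZ value must be supplied by the caller because it depends on the kernel build.

```python
def describe_delay(delay: int, hz: int = 1000) -> str:
    """Explain what a given delay setting means for a kernel built with HZ=hz."""
    if delay > 0:
        # One jiffy lasts 1000/HZ milliseconds.
        return f"kernel timer, about {delay * 1000 / hz:g} ms"
    if delay == 0:
        return "immediate response, no timer"
    if delay == -1:
        return "high priority tasklet"
    if delay == -2:
        return "normal priority tasklet"
    if delay == -9999:
        return "overridden by ndelay"
    raise ValueError(f"unexpected delay value: {delay}")


print(describe_delay(1, hz=1000))
print(describe_delay(1, hz=100))
```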
The dev_size_mb parameter allows the user to specify the
size of the simulated storage. The unit is Mebibytes (each 2**20
bytes and a bit larger than a Megabyte) and the default value is
8. The maximum value depends on the capabilities of the vmalloc()
call on the target architecture. If the module fails to load with
a "cannot allocate memory" message then a "vmalloc=nn{KMG}" boot
time argument may be needed. [See the kernel source file: Documentation/kernel-parameters.txt
for more information on this.] The RAM reserved for storage is
initialized to zeros which leads the sd (scsi disk) driver and the
block layer to believe there is no partition table present.
Partitions can be simulated with num_parts
(see below). All simulated dummy devices share the same RAM. If a
value of 0 or less is given then dev_size_mb
is forced to 1 so 1 MB of RAM is used. Given 512 byte logical
blocks, the largest ramdisk that can be allocated is 2 TB but it
is unlikely a system would be able to allocate that much ram (a
situation that would be bypassed if fake_rw=1). Very large
amounts of "virtual" storage can be simulated with the virtual_gb
parameter (see below).
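As a rough illustration of the sizes involved, the sketch below converts dev_size_mb into bytes and logical blocks, applying the "0 or less is forced to 1" rule described above. The function names are hypothetical.

```python
MIB = 2 ** 20  # one Mebibyte in bytes


def ram_bytes(dev_size_mb: int) -> int:
    """Bytes of RAM used for storage; values below 1 are treated as 1."""
    return max(1, dev_size_mb) * MIB


def logical_blocks(dev_size_mb: int, sector_size: int = 512) -> int:
    """Number of logical blocks that fit in the allocated RAM."""
    return ram_bytes(dev_size_mb) // sector_size


print(ram_bytes(8), logical_blocks(8))  # 8388608 16384
```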
The every_nth parameter takes a decimal number as an
argument. When this number is greater than zero, then incoming
commands are counted and when <n> is reached then the
associated command generates some sort of error. Currently the
available errors are timeout (when "opts & 4" is true)
and RECOVERED_ERROR (when "opts & 8" is true). Once
the command count reaches <n> then it is reset to zero. For
example setting every_nth to 3 and opts to 4 will
cause every third command to be ignored (and hence a timeout). If
every_nth is not given it is defaulted to 0 and timeouts
and recovered errors will not be generated.
If every_nth is negative then an internal command counter counts
down to that value and, when it is reached, continually generates
the error condition (specified in opts) on each newly received
command. The driver flags this continual error state by setting
every_nth to -1. The user can stop error conditions being generated
on receipt of every subsequent command by writing 0 to every_nth
(or opts).
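The counting behaviour of every_nth can be pictured with a small simulation. This is a sketch under assumptions, not the driver's code: the function name is hypothetical, and for negative values it assumes the continual error state begins with the command that reaches the count.

```python
def injected_errors(every_nth: int, num_commands: int) -> list:
    """Return one boolean per command: True where an error would be injected."""
    hits = []
    count = 0
    for _ in range(num_commands):
        count += 1
        if every_nth > 0:
            hit = count >= every_nth
            if hit:
                count = 0                  # counter restarts after each hit
        elif every_nth < 0:
            # Assumption: once |every_nth| commands have been seen, every
            # later command fails (the driver then shows every_nth as -1).
            hit = count >= -every_nth
        else:
            hit = False                    # 0 disables error injection
        hits.append(hit)
    return hits


print(injected_errors(3, 7))
print(injected_errors(-3, 5))
```

With every_nth=3 the third and sixth commands are hit; with every_nth=-3 the model reports the third command and all later ones.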
The fake_rw parameter instructs the scsi_debug driver to
ignore all READ and WRITE commands and return a GOOD status. This
means the data "read" when fake_rw is set is whatever was
previously in the scatter gather list. The default value is 0
(i.e. process READ and WRITE commands). This parameter is for
testing and when set can confuse the kernel or utilities that look
for partitions and other information on a "disk".
The guard parameter, when set to zero (the default), uses the
T10 defined CRC in the protection information. When set to one the
IP (internet protocol) checksum (as used by iSCSI?) is used.
The host_lock parameter indicates whether each command
(excluding its response delay and associated callback into the
mid-layer) is surrounded by a per host host_lock (which is a
kernel "spin lock"). In a SCSI multi-queue system the presence of
this host lock will have the effect of serializing all commands
from a host, and that is detrimental to system performance. Prior
to version 1.84 this parameter was not available and the host_lock
surrounded all commands. In version 1.84 and later the default is 0
which means the host_lock is not applied. Set host_lock=1 for the
old behaviour.
The lbpu parameter, if set, causes the logical block
provisioning VPD page to set the field of the same name. The
default is to set the LBPU field to 0. When set this field
indicates the UNMAP command is supported.
The lbpws and lbpws10 parameters cause the
corresponding bits in the logical block provisioning VPD page to
be set. They imply that the UNMAP field within the WRITE SAME(16)
and WRITE SAME(10) commands respectively is supported.
The lbprz parameter, if set, causes the logical block
provisioning VPD page to set the field of the same name. When this
field is set, reading unmapped logical blocks will return block(s)
of data full of zeros.
The lowest_aligned parameter sets the field called LOWEST
ALIGNED LOGICAL BLOCK ADDRESS in the READ CAPACITY (16) command
response.
The default is zero which implies the logical block size and the
physical block size are the same.
The max_luns parameter allows an upper limit to be placed
on the logical unit number (lun) that the scsi_debug driver will
respond to. A value of 2 means that this driver will respond to
logical unit numbers 0 and 1. If max_luns is modified by a
sysfs write then the scsi_debug driver modifies the scsi_host::max_lun member
of all hosts that it owns. When max_luns is modified by a
sysfs write then it will take effect the next time a host is added
(see add_host) or when a scan is done on any existing
host. The mid level scanning code will scan for up to but not
including max_scsi_luns which is a SCSI mid level boot and
module load time parameter.
The max_queue parameter indicates the maximum number of
queued responses the driver can handle. This defaults to an
internal define in the scsi_debug driver called
SCSI_DEBUG_CANQUEUE which is currently 576. If both the delay
and ndelay parameters are 0, no commands have queued
responses. If there is an attempt to exceed this value then either
SCSI_MLQUEUE_HOST_BUSY is returned to the mid-layer (the default)
or a status of TASK_SET_FULL (if the 0x400 opts mask is set).
Sysfs can be used at any time to change the value of max_queue,
even when there are queued command responses.
The ndelay parameter is the response delay whose units
are nanoseconds. This mechanism depends on high resolution timers
in the kernel which may not be supported on small or old systems
(it is a kernel build config option). Its default value is 0 which
means the delay parameter is operative. If ndelay is a
positive value then a response delay for that many nanoseconds is
active (and to indicate the delay parameter is overridden, it is
set to -9999). Depending on the hardware, setting ndelay
to less than a few microseconds probably causes no further
reduction in the observed response delays. Trying to write a new
value to ndelay while there are queued command responses
may result in an EBUSY error.
The num_parts parameter
writes a partition table to the ramdisk if the parameter's value is
greater than 0. The default is 0 so in that case the ramdisk is
simply all zeros. When num_parts
is greater than zero a DOS format primary partition block is written
to logical block 0, so the number of partitions is limited to a
maximum of 4. The partitions are given an id of 0x83 which is a
"Linux" partition. The available space on the ramdisk is roughly
divided evenly between partitions when 2 or more partitions are
requested. The partitions are not initialized with any file system. Even if
no partitions are specified, a utility like fdisk can be used to add them
later.
The num_tgts parameter allows the number of targets
per host to be specified. It should be 0 or greater. Target id
numbers start at 0 and ascend, bypassing the target id of the
initiator (i.e. the HBA). If num_tgts is modified by a sysfs
write then the scsi_debug driver modifies the scsi_host::max_id member of
all hosts that it owns. When num_tgts is modified by a
sysfs write then it will take effect the next time a host is added
(see add_host) or when a scan is done on any existing host.
The no_lun_0 parameter, when
set to a non zero value, causes a lun 0 INQUIRY response of
peripheral_qualifier==3 indicating there is no actual lu there. As
required by SPC, lun 0 will still respond to a REPORT LUNS
command. If the REPORT LUNS has a 'select report' code of 1 or 2,
then one of the luns reported will be the REPORT LUNS well known
logical unit (lun 49409 or 0xc101). The default value is 0. If
max_luns is greater than 1, then the first lun generated by
scsi_debug will be lun 1 (since lun 0 is skipped). The REPORT LUNS
well known logical unit (wlun) only supports the INQUIRY, REPORT
LUNS, REQUEST SENSE and TEST UNIT READY SCSI commands. To make this
wlun appear as a scsi generic (sg) device see the REPORT LUNS well
known LUN example below.
The opt_blks parameter is placed in the "Optimal transfer
length" field of the Block Limits VPD page. Its default value is 64.
The opts parameter takes a number as an argument
which is the bitwise "or" of several flags. The flags that mention
"nth" are only active when every_nth != 0. So-called
"read-write" commands include some others such as VERIFY. The
flags supported are:
- 1 - "noisy" flag: all calls to
entry points of driver are logged. Commands to be executed are
shown in hex. Additional information such as check conditions,
command aborts and resets are logged
- 2 - "medium error" flag:
simulates a SCSI MEDIUM ERROR when sector 0x1234 (that is 4660
in decimal) is read
- 4 - ignore "nth" command
causing a timeout.
- 8 - cause "nth" read or write
command to yield a RECOVERED_ERROR.
- 0x10 - cause "nth" read-write command
to yield an ABORTED_COMMAND (ack/nak timeout) which is a SAS
transport error.
- 0x20 - cause "nth" read-write command
to yield an ABORTED_COMMAND (logical block guard check failed),
nominally a DIF (Protection Information) error
- 0x40 - cause "nth" read-write command
to yield an ABORTED_COMMAND (logical block guard check failed),
nominally a DIX error
- 0x80 - ignore "nth" media access
command causing a timeout
- 0x100 - cause "nth" read command to yield
half the data it was requested to read
- 0x200 - log generation of TASK SET FULL and
host busy plus changes to queue depth and type
- 0x400 - if max_queue is exceeded yield a
TASK SET FULL (default: host busy)
- 0x800 - cause "nth" read-write command
whose queue_depth is at its maximum value to yield a status of
TASK SET FULL
- 0x1000 - set WCE field in the caching page
to 0 (default WCE=1)
- 0x2000 - log only abort commands and the
various levels of reset
- 0x4000 - used together with the noisy flag
(1) to suppress the logging of cdbs; additional information (if
any) is still logged.
The opts "noisy" (or debug) flag will cause all scsi_debug
entry points to be logged in the system log (and often sent to the
console depending on how kernel informational messages are
processed). With this flag set commands are listed in hex and if
they yield a result other than successful then that is shown. In a
busy system this may prove to be too much log "noise" in which case
this combination of flags may be useful: opts=0x6201.
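Because opts is a bitwise "or" of flags, a value such as 0x6201 is easiest to read once split into its bits. The sketch below does that using the mask values from the flag list above; the short labels and the function name are paraphrases invented for this example.

```python
# Mask values as listed above; labels are abbreviated paraphrases.
OPTS_FLAGS = {
    0x1: "noisy",
    0x2: "medium error at sector 0x1234",
    0x4: "ignore nth command (timeout)",
    0x8: "RECOVERED_ERROR on nth read or write",
    0x10: "ABORTED_COMMAND (transport) on nth read-write",
    0x20: "ABORTED_COMMAND (DIF) on nth read-write",
    0x40: "ABORTED_COMMAND (DIX) on nth read-write",
    0x80: "ignore nth media access command (timeout)",
    0x100: "nth read yields half the data",
    0x200: "log TASK SET FULL, host busy and queue changes",
    0x400: "TASK SET FULL when max_queue is exceeded",
    0x800: "TASK SET FULL on nth command at full queue depth",
    0x1000: "WCE=0 in the caching page",
    0x2000: "log only aborts and resets",
    0x4000: "suppress cdb logging",
}


def decode_opts(opts: int) -> list:
    """Return (mask, label) pairs for each known flag set in opts, lowest bit first."""
    return [(mask, label) for mask, label in sorted(OPTS_FLAGS.items()) if opts & mask]


for mask, label in decode_opts(0x6201):
    print(f"{mask:#06x}  {label}")
```

For 0x6201 this lists masks 0x1, 0x200, 0x2000 and 0x4000.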
The opts "medium error" flag will cause any read command
whose range of sectors includes sector 0x1234 (4660 in decimal) to
return a medium error indication to the mid level. The "ignore nth"
flag is only active when every_nth != 0. When an internal
command counter reaches the value in every_nth and the
"ignore nth" flag is set, then this command is ignored (i.e. quietly
not processed). Typically this will cause the SCSI mid level code to
timeout the command which leads to further error processing. The
internal command counter is reset to zero whenever opts is
written to, whenever every_nth is written to, when the every_nth
value is reached and at driver load time. The "recovered error" flag
works in a similar fashion to the "ignore nth" flag, however
when the every_nth value is reached and it is either a read
or a write command then the command is processed normally but yields
a "recovered error" indication. Such an indication is _not_ a hard
error but for a real disk could indicate deteriorating media. The
"aborted command" flag injects a transport error in a similar
fashion to the way the "recovered error" flag works. A minor point:
the kernel boot time and module load time opts parameter is
a decimal integer. However the output sysfs value is a hexadecimal
number (output as 0x9 for example) while the input value is
interpreted as hexadecimal if prefixed by "0x" and decimal
otherwise. When combining these flags it is easier to consider them
as hexadecimal numbers.
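Two of the rules above lend themselves to a short sketch: how a value written to the opts sysfs file is interpreted (hexadecimal with a "0x" prefix, decimal otherwise), and when a read trips the "medium error" flag. The function names are hypothetical and the code only models the described behaviour.

```python
MEDIUM_ERROR_LBA = 0x1234   # sector 4660, as stated above
OPT_MEDIUM_ERR = 0x2


def parse_opts_text(text: str) -> int:
    """Interpret a string as the sysfs opts file is described to do."""
    text = text.strip()
    if text.lower().startswith("0x"):
        return int(text, 16)
    return int(text, 10)


def read_gets_medium_error(start_lba: int, num_blocks: int, opts: int) -> bool:
    """True if the flag is set and the read range includes sector 0x1234."""
    if not opts & OPT_MEDIUM_ERR:
        return False
    return start_lba <= MEDIUM_ERROR_LBA < start_lba + num_blocks


print(parse_opts_text("0x9"), parse_opts_text("16"))
print(read_gets_medium_error(0x1230, 8, parse_opts_text("2")))
```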
The physblk_exp parameter becomes the "Logical blocks per
physical block exponent" field in the READ CAPACITY (16) response.
The default value is 0 which means the logical block and physical
block sizes are the same.
The ptype parameter allows
the SCSI peripheral type to be set or modified. The default value is
0 which corresponds to a disk. Other useful peripheral types are 1
for tape, 3 for processor, 5 for dvd/cd and 13 for enclosure (SES).
The scsi_level parameter is the ANSI SCSI standard level
that the simulated disk announces that it is compliant to. The
INQUIRY response which is generated by scsi_debug contains the ANSI
SCSI standard level value (in byte 2).
The virtual_gb parameter
allows the scsi_debug driver to simulate a much larger storage
device than physical RAM available in the machine. When the virtual_gb
parameter is 0 (its default value) then the maximum storage
available is that indicated by the dev_size_mb
parameter. When the virtual_gb
parameter is greater than zero, that many Gibibytes (each of 2**30
bytes and larger than a Gigabyte) are reported by the READ CAPACITY
command. Reading and writing of the "Gigabytes" of data wraps around
within the available physical ram (which the scsi_debug driver has
allocated and is dev_size_mb
Mebibytes in size). When the number of virtual Gibibytes is 2048 or
greater then READ CAPACITY (16) is needed to represent the size and
READ (16) and/or WRITE (16) are needed to access data at the 2048
Gibibyte boundary and beyond. This boundary represents 2**32-1
blocks (sectors), assuming each is 512 bytes long. The "wrapping" action
still allows partitions to be written with fdisk and in many cases a
file system to be initialized. Trying to store and retrieve any
useful data on such a big virtual disk would not be wise! Setting
the dev_size_mb parameter
to a prime number, larger than the default value (which is 8) and
that doesn't starve the machine for resources, seems to help in
creating ext3 file systems. This occurs since mkfs writes the file
system super block at several offsets within the partition, and the
wrap may cause the file system header to be overwritten. The virtual_gb option is designed for
testing, not practical data storage.
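The wrapping described above amounts to modular arithmetic on byte offsets. The following sketch models it; the function names are hypothetical and the driver's internal arithmetic may be organised differently.

```python
MIB = 2 ** 20
GIB = 2 ** 30


def reported_blocks(virtual_gb: int, dev_size_mb: int, sector_size: int = 512) -> int:
    """Capacity in logical blocks: virtual size if virtual_gb > 0, else the RAM size."""
    size = virtual_gb * GIB if virtual_gb > 0 else dev_size_mb * MIB
    return size // sector_size


def backing_offset(lba: int, dev_size_mb: int, sector_size: int = 512) -> int:
    """Byte offset within the shared RAM that a given LBA wraps onto."""
    return (lba * sector_size) % (dev_size_mb * MIB)


def needs_16_byte_commands(num_blocks: int) -> bool:
    """True when the capacity no longer fits the 32 bit fields of the 10 byte commands."""
    return num_blocks >= 2 ** 32


print(reported_blocks(0, 8))             # 16384
print(backing_offset(16385, 8))          # 512: one block past the end wraps round
print(needs_16_byte_commands(reported_blocks(2048, 8)))  # True
```

With the default 8 MiB of RAM, LBA 16384 lands on the same bytes as LBA 0, which is why data stored on a large virtual disk overwrites itself.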
The vpd_use_hostno
parameter affects the way the scsi_debug driver generates its serial
numbers, SAS and naa-5 addresses. When vpd_use_hostno is set to 1 (its default value) then
the host number ("hostno"), target_id and lun are used to generate
the serial number, SAS and naa-5 addresses. The formula is "((hostno
+ 1) * 2000) + (target_id * 1000) + lun". When vpd_use_hostno is set to 0 then
the "hostno" term in the formula is set to 0. This has the effect of
making multiple simulated hosts look like they are connected to the
same drives (i.e. there are only "num_tgts * max_luns"
unique simulated devices). The kernel will still report "add_host
* num_tgts * max_luns" devices but higher level
multipath aware software may see the difference.
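The numbering formula can be tried out with the sketch below. The function name is hypothetical, and it reads "the hostno term is set to 0" literally, i.e. hostno itself becomes 0 while the "+ 1" remains; that reading is an assumption.

```python
def device_id_number(hostno: int, target_id: int, lun: int,
                     vpd_use_hostno: int = 1) -> int:
    """Number used to derive the serial number and naa-5 address of a device."""
    if not vpd_use_hostno:
        hostno = 0   # assumption: only hostno is zeroed, the "+ 1" stays
    return ((hostno + 1) * 2000) + (target_id * 1000) + lun


print(device_id_number(1, 0, 0))                    # 4000
print(device_id_number(1, 0, 0, vpd_use_hostno=0))  # same as any other host
print(device_id_number(5, 0, 0, vpd_use_hostno=0))
```

With vpd_use_hostno=0 the last two calls return the same number, which is what makes several hosts appear to reach the same device.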
Supported SCSI commands
Below is a list of supported commands. Some do nothing (e.g.
SYNCHRONIZE CACHE). Those that have interesting functionality have
notes in brackets. If the feature was introduced in a recent
version (i.e. since 1.76) then that is noted.
- ALLOW MEDIUM REMOVAL
- GET LBA STATUS
- INQUIRY [vital product data pages: 0, 0x80, 0x83] [1.77: VPD
pages: 0x85, 0x86, 0x87, 0x88, 0x89, 0xb0]
- LOG SENSE [1.78: temperature(0xd) and informational
exceptions(0x2f)] [1.80: support log subpages]
- MODE SELECT (6), MODE SELECT (10) [1.84: changeable pages: 0x8
(caching), 0xa (control) and 0x1c (informational exceptions)]
- MODE SENSE (6), MODE SENSE (10) [sense pages: 1 (rw error
recovery), 2 (disconnect), 3 (format), 8 (caching), 0xa
(control), 0x1c (informational exceptions), 0x3f (read all)]
[1.77: subpage support plus SAS pages: 0x19,0 0x19,1 and 0x19,2]
- READ (6), READ (10), READ(12), READ(16), READ(32)
- READ CAPACITY (10), READ CAPACITY (16) [1.79: added 16 byte
command]
- RELEASE (6), RELEASE (10)
- REPORT LUNS [1.77: shows REPORT LUNS wlun]
- REPORT TARGET PORT GROUPS
- REQUEST SENSE [1.79: shows MRIE=6 failure prediction, power
states]
- RESERVE (6), RESERVE (10)
- REZERO UNIT (which is REWIND for tapes)
- SEND DIAGNOSTIC
- START STOP [1.78: maintains start and stop states, when
stopped fails media access commands]
- SYNCHRONIZE CACHE
- TEST UNIT READY [1.78: in stopped state gives appropriate
error]
- VERIFY (10)
- WRITE (6), WRITE (10), WRITE (12), WRITE (16), WRITE(32)
- WRITE SAME(10), WRITE SAME(16)
- UNMAP
- XDWRITEREAD [which is a bidirectional command]
The implementations of the above commands are sufficient for the
scsi subsystem to detect and attach devices. The fdisk, e2fsck
and mount commands also work as do the utilities found in
the sg3_utils package (see the main page). Crude error processing
picks up unsupported commands and attempts to read or write outside
the available RAM storage area.
Modern SCSI devices use vital product data (VPD) page 0x83 for
identification. This driver yields both "T10 vendor
identification" and "NAA" descriptors. The former yields an ASCII
string like "Linux scsi_debug 4000" where the "4000" is
((host_no + 1) * 2000) + (target_id * 1000) + lun. In this
case "4000" corresponds to host_no==1, target_id==0 and
lun==0. The "NAA-5" descriptor is an 8 byte binary value that
looks like this hex sequence: "51 23 45 60 00 00 0f a0" where the
IEEE company id is 0x123456 (fake) and the vendor specific
identifier in the least significant bytes is 4000 (which is fa0 in
hex). [The "4000" is derived the same way for both descriptors.]
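The hex sequence quoted above can be reproduced by packing the three NAA-5 fields (4 bit NAA value 5, 24 bit IEEE company id, 36 bit vendor specific identifier) into 8 bytes. This is a sketch of the layout only; the function name is hypothetical.

```python
def naa5_descriptor(vendor_specific_id: int, company_id: int = 0x123456) -> bytes:
    """Pack an NAA-5 identifier: 4 bit NAA, 24 bit company id, 36 bit vendor id."""
    if not 0 <= company_id < 2 ** 24:
        raise ValueError("company id must fit in 24 bits")
    if not 0 <= vendor_specific_id < 2 ** 36:
        raise ValueError("vendor specific identifier must fit in 36 bits")
    value = (0x5 << 60) | (company_id << 36) | vendor_specific_id
    return value.to_bytes(8, "big")


print(" ".join(f"{b:02x}" for b in naa5_descriptor(4000)))
# 51 23 45 60 00 00 0f a0
```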
Read and write commands executed by the scsi_debug driver are
atomic (i.e. a write to one scsi_debug device will not interrupt
(split) a read from another scsi_debug device). So a read
command will either yield the contents of ram before a co-incident
write, or after the co-incident write has finished.
Logical and physical block size
scsi_debug supports emulating devices with logical block sizes
bigger than 512 bytes. This can be specified using the sector_size
option.
Some storage devices use physical block sizes bigger than 512 bytes
internally but expose a 512-byte logical block size to the host for
compatibility reasons. The physblk_exp parameter can be used to
indicate that the internal block size is 2^n times bigger than the
reported logical block size. For instance: Supplying physblk_exp=3 on the
command line will cause scsi_debug to simulate a device with
512-byte logical blocks and 4KB physical blocks.
Not all storage devices have logical block 0 aligned to a physical
block boundary. These devices can be emulated using scsi_debug's
lowest_aligned option. The parameter indicates the lowest LBA that
is aligned to a physical block boundary.
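The relationship between sector_size, physblk_exp and lowest_aligned can be sketched as follows. The function names are hypothetical, and the lowest_aligned=7 value in the usage lines is purely illustrative.

```python
def physical_block_size(sector_size: int = 512, physblk_exp: int = 0) -> int:
    """Physical block size in bytes: logical block size times 2**physblk_exp."""
    return sector_size << physblk_exp


def is_physically_aligned(lba: int, physblk_exp: int, lowest_aligned: int = 0) -> bool:
    """True if lba starts a physical block, given the lowest aligned LBA."""
    blocks_per_physical = 1 << physblk_exp
    return (lba - lowest_aligned) % blocks_per_physical == 0


print(physical_block_size(512, 3))        # 4096
print(is_physically_aligned(8, 3))        # True: LBA 0 is aligned, so 8 is too
print(is_physically_aligned(8, 3, 7))     # False: with lowest_aligned=7 it is 7, 15, ...
```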
Thin provisioning
New in SBC-3 is the ability for block devices to be thinly
provisioned.
This means that devices can report a capacity that is bigger than
the space actually allocated. When files are deleted, the relevant
blocks can be reclaimed by the storage device and used for something
else. And consequently only blocks that are actively in use consume
physical storage space.
SBC-3 specifies two different approaches for marking blocks as
unused: WRITE SAME(16) with the UNMAP bit set, and the UNMAP
command. scsi_debug supports both methods and they are controlled
via 4 module parameters:
- unmap_max_desc specifies the maximum number of ranges that can
be unmapped using a single UNMAP command. If this is set to 0,
only WRITE SAME is supported and UNMAP will cause a check
condition.
- unmap_granularity specifies the granularity at which to track
mapped blocks (specified in number of logical blocks). 2048 (1
MB) is a realistic value for disk arrays although some may have
a finer granularity.
- unmap_alignment specifies the first LBA which is naturally
aligned on an unmap_granularity boundary.
- unmap_max_blocks specifies the maximum number of blocks that
can be unmapped using a single UNMAP command. Default is
0xffffffff.
Examples:
modprobe scsi_debug unmap_max_desc=0 unmap_granularity=1
will simulate a device that only supports WRITE SAME(16) and which
tracks usage on a per logical block basis. This is how most solid
state drives work.
modprobe scsi_debug unmap_max_desc=64 unmap_granularity=2048
will simulate a device that supports UNMAP and which is provisioned
in 1MB chunks. This is a common scenario for thinly provisioned
storage arrays.
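The two modprobe examples above differ in whether UNMAP is accepted and in the size of the tracked chunk. A minimal sketch of that interpretation, assuming 512 byte logical blocks (the function name and dictionary keys are hypothetical):

```python
def provisioning_summary(unmap_max_desc: int, unmap_granularity: int,
                         sector_size: int = 512) -> dict:
    """Summarise what a pair of thin provisioning parameters implies."""
    return {
        # unmap_max_desc=0 means UNMAP is rejected and only WRITE SAME is usable
        "unmap_supported": unmap_max_desc > 0,
        # granularity is given in logical blocks, so convert it to bytes
        "chunk_bytes": unmap_granularity * sector_size,
    }


print(provisioning_summary(0, 1))        # per logical block tracking
print(provisioning_summary(64, 2048))    # 1 MiB chunks, UNMAP accepted
```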
The current block allocation bitmap can be viewed from user space
via:
cat /sys/bus/pseudo/drivers/scsi_debug/map
Examples
Basic
Since scsi_debug is for testing it seems more useful to build it as
a module rather than build it into the kernel. Some parameters
cannot be changed once the scsi_debug driver is running. So if it is
a module then it can be removed with rmmod and reloaded with
another modprobe call with the desired parameters.
When the driver is loaded successfully simulated disks should be
visible just like other SCSI devices:
# modprobe scsi_debug
# lsscsi -s
[0:0:0:0]    disk    SEAGATE  ST33000650SS     0005  /dev/sda   3.00TB
[0:0:1:0]    enclosu Intel    RES2SV240        0d00  -               -
[4:0:0:0]    disk    ATA      ST3160812AS      D     /dev/sdb    160GB
[7:0:0:0]    disk    Linux    scsi_debug       0184  /dev/sdc   8.38MB
In this case there is a 3 TB SAS disk, an ATA disk and a small
scsi_debug pseudo disk. The other device (at [0:0:1:0])
is a SCSI Enclosure Service (SES) device. The /dev/sdc
pseudo disk is full of zeros and has no partitions. To get a
partition the num_parts parameter could have been used on
the modprobe line or it could be done from the command
line with the fdisk /dev/sdc command. Assuming one ext3
partition is allocated to the whole pseudo disk (8 MB in this
case) then the mkfs.ext3 /dev/sdc1 command can be used to
make an ext3 file system. Now /dev/sdc1 can be mounted and treated like a
normal file system. Naturally when the power is turned off
anything stored in /dev/sdc1
will be forgotten.
Rather than mounting the pseudo disk, the sg3_utils
package could be used to carry out various tests on it.
Information about the scsi_debug driver version, its current
parameters and some other data can be found in the "proc" file
system. The trailing number in the path is the scsi_debug host
number which is the first element in the 4 item tuple shown in the
lsscsi output above:
# cat /proc/scsi/scsi_debug/7
scsi_debug adapter driver, version 1.84 [20140706]
num_tgts=1, shared (ram) size=8 MB, opts=0x0, every_nth=0
delay=1, ndelay=0, max_luns=1, q_completions=1859648
sector_size=512 bytes, cylinders=64, heads=8, sectors=32
command aborts=0; RESETs: device=0, target=0, bus=0, host=0
dix_reads=0 dix_writes=0 dif_errors=0 usec_in_jiffy=1000
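A script may want a single value out of that output. The following is a minimal sketch, assuming the layout shown above; the function name sdebug_proc_field is made up for this example. It only handles the comma separated "name=value" items (such as delay or max_luns), not the space separated ones on the last line.

```shell
# usage: sdebug_proc_field <name> < /proc/scsi/scsi_debug/<host_no>
# Splits the input at commas, then prints the value of the item
# that starts with "<name>=" (leading blanks are ignored).
sdebug_proc_field() {
    tr ',' '\n' | sed -n "s/^ *$1=//p"
}
```

For example, 'sdebug_proc_field max_luns < /proc/scsi/scsi_debug/7' would print the max_luns value of host 7.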
Here is an important sysfs directory for the scsi_debug driver:
# cd /sys/bus/pseudo/drivers/scsi_debug/
# ls
adapter0  dev_size_mb  fake_rw    max_queue  num_tgts    sector_size
add_host  dif          guard      ndelay     opts        uevent
ato       dix          host_lock  no_lun_0   ptype       unbind
bind      dsense       map        no_uld     removable   virtual_gb
delay     every_nth    max_luns   num_parts  scsi_level  vpd_use_hostno
These files correspond to most of the scsi_debug parameters. Those
that are writable can be modified, and the driver's behaviour changes
accordingly thereafter. Certain parameters cannot be changed while
the driver is busy (e.g. it has queued command responses), in
which case EBUSY is returned if the user attempts to change one.
Reading one can be done with the cat command and changing one can
be done with the echo command:
# cat every_nth
0
# echo 2000 > every_nth
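Since a write can be refused (for example with EBUSY), a script may prefer to check the result. Below is a small sketch; the names SDEBUG_DIR and sdebug_set are hypothetical, chosen for this example. It writes a value, then reads the parameter back.

```shell
# Directory holding the parameter files; can be overridden for testing.
SDEBUG_DIR=${SDEBUG_DIR:-/sys/bus/pseudo/drivers/scsi_debug}

# usage: sdebug_set <param> <value>
# Writes the value, then reads the parameter back and prints it.
# Returns non-zero if the file is not writable or the write fails
# (for example with EBUSY while the driver has queued responses).
sdebug_set() {
    f="$SDEBUG_DIR/$1"
    [ -w "$f" ] || { echo "$1: not writable" >&2; return 1; }
    echo "$2" > "$f" || return 1
    cat "$f"
}
```

With the driver loaded, 'sdebug_set every_nth 2000' (as root) should print 2000 if the write was accepted.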
Another important sysfs directory for (any) disks is /sys/block/<disk_node_name>
and its queue sub-directory. For this
scsi_debug pseudo disk the queue directory would be
/sys/block/sdc/queue. Also there is the scsi_device sysfs
directory that has the form
/sys/class/scsi_device/<h:c:t:l>/device where the
<h:c:t:l> tuple is found at the left hand side of
each device listed by lsscsi. This sysfs
directory contains many important SCSI device parameters, some of
which can be modified.
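The two directories can be tied together in a script. The sketch below assumes the sysfs layout of reasonably recent kernels, in which the scsi_device directory of a disk has a "block" sub-directory containing the disk node name; SYSFS_ROOT, hctl_to_disk and disk_queue_dir are names invented for this example.

```shell
# Root of sysfs; can be overridden for testing.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

# usage: hctl_to_disk <h:c:t:l>
# Prints the block node name(s) (e.g. sdc) found under the device's
# "block" sub-directory; fails for devices with no block node.
hctl_to_disk() {
    d="$SYSFS_ROOT/class/scsi_device/$1/device/block"
    [ -d "$d" ] && ls "$d"
}

# usage: disk_queue_dir <disk_node_name>
# Prints the path of the queue sub-directory for that disk.
disk_queue_dir() {
    echo "$SYSFS_ROOT/block/$1/queue"
}
```

For the pseudo disk above, 'disk_queue_dir "$(hctl_to_disk 7:0:0:0)"' would be expected to print /sys/block/sdc/queue.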
Adding and removing hosts and devices
Individual devices can be removed via sysfs and the mid-level by
writing any value into the "delete" member in the sysfs directory
corresponding to the scsi device. Given these devices:
# lsscsi -s
[0:0:0:0]    disk    SEAGATE  ST200FM0073      0A04  /dev/sda    200GB
[4:0:0:0]    disk    ATA      ST3160812AS      D     /dev/sdb    160GB
[7:0:0:0]    disk    Linux    scsi_debug       0184  /dev/sdc   21.4GB
then the scsi_debug (pseudo) disk can be deleted like this:
# echo 1 > /sys/class/scsi_device/7:0:0:0/device/delete
After which this should be seen:
# lsscsi -s
[0:0:0:0]    disk    SEAGATE  ST200FM0073      0A04  /dev/sda    200GB
[4:0:0:0]    disk    ATA      ST3160812AS      D     /dev/sdb    160GB
This will work for any scsi device (not just those belonging to
scsi_debug). That scsi device can be re-added with the following
command:
# echo "0 0 0" > /sys/class/scsi_host/host7/scan
# lsscsi
[0:0:0:0]    disk    SEAGATE  ST200FM0073      0A04  /dev/sda
[4:0:0:0]    disk    ATA      ST3160812AS      D     /dev/sdb
[7:0:0:0]    disk    Linux    scsi_debug       0184  /dev/sdc
The three numbers in the "echo" are channel number, target number
and lun, respectively. Wildcards (hyphen: "-") can be given for any
or all of the three numbers, for example:
# echo 3 > /sys/bus/pseudo/drivers/scsi_debug/max_luns
# echo 2 > /sys/bus/pseudo/drivers/scsi_debug/num_tgts
# echo "0 - -" > /sys/class/scsi_host/host7/scan
# lsscsi
[0:0:0:0]    disk    SEAGATE  ST200FM0073      0A04  /dev/sda
[4:0:0:0]    disk    ATA      ST3160812AS      D     /dev/sdb
[7:0:0:0]    disk    Linux    scsi_debug       0184  /dev/sdc
[7:0:0:1]    disk    Linux    scsi_debug       0184  /dev/sdd
[7:0:0:2]    disk    Linux    scsi_debug       0184  /dev/sde
[7:0:1:0]    disk    Linux    scsi_debug       0184  /dev/sdf
[7:0:1:1]    disk    Linux    scsi_debug       0184  /dev/sdg
[7:0:1:2]    disk    Linux    scsi_debug       0184  /dev/sdh
The 'echo "0 - -" > scan' line above added five devices:
/dev/sdd to /dev/sdh .
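The delete and scan operations shown above can be wrapped in two small helpers. This is a sketch only; SYSFS_ROOT, scsi_dev_delete and scsi_host_scan are names made up for this example. The scan helper checks that each of the three fields is either a decimal number or the "-" wildcard before writing.

```shell
# Root of sysfs; can be overridden for testing.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

# usage: scsi_dev_delete <h:c:t:l>
# Writes a value into the device's "delete" member.
scsi_dev_delete() {
    echo 1 > "$SYSFS_ROOT/class/scsi_device/$1/device/delete"
}

# usage: scsi_host_scan <host_no> <channel> <target> <lun>
# Each of channel, target and lun is a decimal number or "-" (wildcard).
scsi_host_scan() {
    for v in "$2" "$3" "$4"; do
        case $v in
            -) ;;
            ''|*[!0-9]*) echo "bad scan field: '$v'" >&2; return 1 ;;
        esac
    done
    printf '%s %s %s\n' "$2" "$3" "$4" > "$SYSFS_ROOT/class/scsi_host/host$1/scan"
}
```

For example, 'scsi_dev_delete 7:0:0:0' followed by 'scsi_host_scan 7 0 - -' corresponds to the delete and wildcard scan lines used above.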
Extra hosts can be added and removed from the scsi_debug driver as
follows:
# cd /sys/bus/pseudo/drivers/scsi_debug
# echo 1 > add_host     # add a new host (after the existing hosts)
# echo -2 > add_host    # remove the last two hosts (if at least that many are present)
The scsi_debug driver does not have any limits on the number of scsi
devices it can create. By default when loaded it has one scsi device
(owned by a host). Larger numbers of devices can be introduced at
load time by specifying the add_host, num_tgts and/or max_luns
parameters. The number of scsi devices created is the product of the
3 parameters (they all default to 1). Alternatively sysfs can be used
to add (or remove) scsi devices after the scsi_debug driver is loaded.
Two strategies can be used:
- increase the value of num_tgts or max_luns then issue a line like
  'echo "0 - -" > scan' (shown above) to a host already owned by the
  scsi_debug driver.
- add more hosts with a line like 'echo 3 > add_host'. Each new host
  will create (num_tgts * max_luns) new scsi devices. Of course
  num_tgts or max_luns can be modified prior to calling
  'echo 3 > add_host'.
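The arithmetic behind the device count can be sketched as follows. The function name sdebug_dev_count is made up for this example; missing arguments fall back to the default of 1, and the effect of no_lun_0 is ignored.

```shell
# usage: sdebug_dev_count <add_host> <num_tgts> <max_luns>
# Prints the product of the three parameters (each defaults to 1).
# Assumption: no_lun_0 is not set.
sdebug_dev_count() {
    echo $(( ${1:-1} * ${2:-1} * ${3:-1} ))
}

sdebug_dev_count 10 7 2    # prints 140
```

The same function with a first argument of 1 gives the number of devices each newly added host would create.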
Even though the scsi_debug driver can create ten thousand or more
devices, that does not mean the scsi mid-level, sd, sg, the block layer
and various other kernel components will handle them gracefully.
Mode pages
The supported mode pages are listed following the MODE SENSE entry
in the supported commands sections above. Prior to version 1.80,
when a mode page is read no block descriptor is included in the
response. From version 1.78 the MODE SELECT command is supported.
Three mode pages can be modified:
- caching (WCE field is changeable) [added in version 1.84]
- control (D_SENSE field is acted upon)
- informational exceptions control (MRIE and TEST fields are
acted upon by REQUEST SENSE)
Saved pages are not supported, reflecting that the scsi_debug
driver has only volatile storage. All fields can be changed, but only
the fields indicated above have side effects.
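One way to exercise MODE SELECT from user space is with the sdparm utility from the sg3_utils family. The wrappers below are a sketch: mode_field_get, mode_field_set and the SDPARM override are names invented for this example, and the field acronyms (WCE, D_SENSE, MRIE, TEST) are assumed to be those sdparm uses for the fields listed above.

```shell
# Command to run; set SDPARM=echo to preview the command line instead.
SDPARM=${SDPARM:-sdparm}

# usage: mode_field_get <device> <acronym>
mode_field_get() {
    $SDPARM --get="$2" "$1"
}

# usage: mode_field_set <device> <acronym> <value>
# The --save option is deliberately not given, since saved pages
# are not supported by the driver.
mode_field_set() {
    $SDPARM --set="$2=$3" "$1"
}
```

For example, 'mode_field_set /dev/sdc WCE 1' followed by 'mode_field_get /dev/sdc WCE' should show the changed value, assuming /dev/sdc is a scsi_debug pseudo disk.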
REPORT LUNS Well Known LU
There are two techniques for discovering the luns that a SCSI target
supports. The first (and oldest) is based on sending commands like
INQUIRY and REPORT LUNS to lun 0, even if the target has no lun 0.
The second technique is based on one of the so-called "well known
logical units", specifically the REPORT LUNS well known logical
unit. If present it must support the INQUIRY, REPORT LUNS, REQUEST
SENSE and TEST UNIT READY commands. Simulating one with scsi_debug is
somewhat contorted:
# modprobe scsi_debug no_lun_0=1 max_luns=2
#
# lsscsi -g
[0:0:0:0]    disk    ATA      INTEL SSDSC2BW18 DC32  /dev/sda   /dev/sg0
[3:0:0:1]    disk    Linux    scsi_debug       0184  /dev/sdb   /dev/sg1
#
# lsscsi --hosts
[0]    ahci
[1]    ahci
[2]    ahci
[3]    scsi_debug
#
# cd /sys/class/scsi_host/host3
# echo "- - 49409" > scan
#
# lsscsi -g
[0:0:0:0]    disk    ATA      INTEL SSDSC2BW18 DC32  /dev/sda   /dev/sg0
[3:0:0:1]    disk    Linux    scsi_debug       0184  /dev/sdb   /dev/sg1
[3:0:0:49409]wlun    Linux    scsi_debug       0184  -          /dev/sg2
The scsi_debug driver needed to be told that it had no_lun_0 so it
started generating luns at 1 ([3:0:0:1]) and then the scsi
sub-system needed to be told to scan specifically for lun 49409
(0xc101). Thereafter the REPORT LUNS wlun appeared.
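The scan file takes the lun in decimal, so the hex value has to be converted first. A small sketch of that conversion follows; the function name wlun_scan_string is made up for this example.

```shell
# usage: wlun_scan_string <hex_lun>
# Prints a "channel target lun" scan string with wildcard channel
# and target, and the lun converted from hex to decimal.
wlun_scan_string() {
    printf '%s %s %d\n' - - "$1"
}

wlun_scan_string 0xc101    # prints: - - 49409
```

The printed string is what was written to the scan file in the session above.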
The way a SCSI initiator (host) scans for targets is transport
specific. In the case of the scsi_debug driver it has a magic
transport (bus) called "pseudo" which does the right thing. Apart
from target discovery, the scsi_debug driver tries to simulate SAS
devices, see the next section.
SAS personality
The scsi_debug driver has a Serial Attached SCSI (SAS) personality.
For any application that cares, it looks like a dual ported SAS disk
accessed via the primary port (relative target port 1). In one case
it masquerades as a SATA disk behind a SCSI to ATA Translation (SAT)
layer (SATL). Many of the settings are in common with Fibre Channel
dual ported disks.
The driver sets the MULTIP (multiport) bit in the INQUIRY response.
The following VPD pages are SAS or SAT specific:
- device identification page [0x83] (yields naa-5 addresses for
the lu, the accessing target port and the target device, plus
some other designators)
- SCSI ports [0x88] (shows the naa-5 addresses of both ports)
- ATA information [0x89] (simulates a SATA disk in a SAS domain,
defined in SAT)
The naa-5 addresses are meant to be world wide unique names, which
presents a challenge to the scsi_debug driver. Amongst other
things Linux does not have an IEEE company id [memo: OSDL]. Even if
it did, making them truly unique in a virtual driver, especially if
multiple boxes could somehow see each other, would be difficult.
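For reference, an naa-5 address is 64 bits long: a 4 bit NAA field holding 5, a 24 bit IEEE company id and a 36 bit vendor specific identifier. The sketch below splits an address written as 16 hex digits into the last two parts; the function name naa5_split is made up for this example.

```shell
# usage: naa5_split <address>
# Accepts 16 hex digits with an optional 0x prefix, the first digit being 5.
# Prints "<company_id> <vendor_specific>"; returns 1 if not naa-5.
naa5_split() {
    a=${1#0x}
    printf '%s\n' "$a" | grep -Eq '^5[0-9a-fA-F]{15}$' || return 1
    company=$(printf '%s\n' "$a" | cut -c2-7)    # 6 hex digits = 24 bits
    vendor=$(printf '%s\n' "$a" | cut -c8-16)    # 9 hex digits = 36 bits
    echo "$company $vendor"
}

naa5_split 0x52222220000007ce   # prints: 222222 0000007ce
```

Applied to an address of the form the driver generates, the "company id" comes out as a repeated-digit placeholder rather than a registered IEEE value.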
There are also several SAS specific mode pages:
- protocol specific port page (SAS): short format page [0x19,0x0]
- protocol specific port page (SAS): phy control and discover
  subpage [0x19,0x1]
- protocol specific port page (SAS): shared mode subpage [0x19,0x2]
  (sas2 version)
Both the VPD and mode pages can be viewed from user space with
an application like sdparm. Below is an
example of the device identification VPD page:
# sdparm -i /dev/sda
    /dev/sda: Linux     scsi_debug        0004
Device identification VPD page:
  Addressed logical unit:
    desig_type: T10 vendor identification,  code_set: ASCII
      vendor id: Linux
      vendor specific: scsi_debug      2000
    desig_type: NAA,  code_set: Binary
      0x53333330000007d0
  Target port:
    desig_type: Relative target port,  code_set: Binary
     transport: Serial Attached SCSI (SAS)
      Relative target port: 0x1
    desig_type: NAA,  code_set: Binary
     transport: Serial Attached SCSI (SAS)
      0x52222220000007ce
  Target device that contains addressed lu:
    desig_type: NAA,  code_set: Binary
     transport: Serial Attached SCSI (SAS)
      0x52222220000007cd
    desig_type: SCSI name string,  code_set: UTF-8
     transport: Serial Attached SCSI (SAS)
      SCSI name string:
      naa.52222220000007CD
Below is an example of the SCSI ports VPD page showing a dual ported
target:
# sdparm -i -p sp /dev/sda
    /dev/sda: Linux     scsi_debug        0004
SCSI Ports VPD page:
Relative port=1
  Target port descriptor(s):
    desig_type: NAA,  code_set: Binary
     transport: Serial Attached SCSI (SAS)
      0x52222220000007ce
Relative port=2
  Target port descriptor(s):
    desig_type: NAA,  code_set: Binary
     transport: Serial Attached SCSI (SAS)
      0x52222220000007cf
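When comparing addresses across pages it can help to pull just the naa-5 values out of such output. The filter below is a sketch that assumes the addresses are printed as whitespace separated words of the form 0x5 followed by 15 hex digits, as in the listings above; the name naa5_list is made up for this example.

```shell
# usage: sdparm -i -p sp <device> | naa5_list
# Puts each whitespace separated word on its own line, then keeps
# only the words that look like naa-5 addresses.
naa5_list() {
    tr -s ' \t' '\n\n' | grep -E '^0x5[0-9a-fA-F]{15}$'
}
```

Applied to the SCSI ports VPD page above it would be expected to print the two target port addresses, one per line.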
Notice that the above implies that the INQUIRY was sent via port 1
(port A) of the emulated SAS dual ported target. The protocol
specific port phy control and discover mode subpage [0x19,0x1] has
target port/phy SAS addresses that correspond to the SCSI ports VPD
page:
# sdparm -t sas -p pcd -l /dev/sda
    /dev/sda: Linux     scsi_debug        0004
    Direct access device specific parameters: WP=0  DPOFUA=0
port: phy control and discover (SAS) mode page:
  PPID_1      6  [cha: n, def:  6]  Port's (transport) protocol identifier
  NOP         2  [cha: n, def:  2]  Number of phys
  PHID        0  [cha: n, def:  0]  Phy identifier
  ADT         1  [cha: n, def:  1]  Attached device type
  NPLR        9  [cha: n, def:  9]  Negotiated physical link rate
  ASIP        1  [cha: n, def:  1]  Attached SSP initiator port
  ATIP        0  [cha: n, def:  0]  Attached STP initiator port
  AMIP        0  [cha: n, def:  0]  Attached SMP initiator port
  ASTP        0  [cha: n, def:  0]  Attached SSP target port
  ATTP        0  [cha: n, def:  0]  Attached STP target port
  AMTP        0  [cha: n, def:  0]  Attached SMP target port
  SASA        0x52222220000007ce  [cha: n, def:0x52222220000007ce]  SAS address
  ASASA       0x5111111000000001  [cha: n, def:0x5111111000000001]  Attached SAS address
  APHID       2  [cha: n, def:  2]  Attached phy identifier
  PMILR       8  [cha: n, def:  8]  Programmed minimum link rate
  HMILR       8  [cha: n, def:  8]  Hardware minimum link rate
  PMALR       9  [cha: n, def:  9]  Programmed maximum link rate
  HMALR       9  [cha: n, def:  9]  Hardware maximum link rate
  2_PHID      1  [cha: n, def:  1]  Phy identifier
  2_ADT       1  [cha: n, def:  1]  Attached device type
  2_NPLR      9  [cha: n, def:  9]  Negotiated physical link rate
  2_ASIP      1  [cha: n, def:  1]  Attached SSP initiator port
  2_ATIP      0  [cha: n, def:  0]  Attached STP initiator port
  2_AMIP      0  [cha: n, def:  0]  Attached SMP initiator port
  2_ASTP      0  [cha: n, def:  0]  Attached SSP target port
  2_ATTP      0  [cha: n, def:  0]  Attached STP target port
  2_AMTP      0  [cha: n, def:  0]  Attached SMP target port
  2_SASA      0x52222220000007cf  [cha: n, def:0x52222220000007cf]  SAS address
  2_ASASA     0x5111111000000001  [cha: n, def:0x5111111000000001]  Attached SAS address
  2_APHID     3  [cha: n, def:  3]  Attached phy identifier
  2_PMILR     8  [cha: n, def:  8]  Programmed minimum link rate
  2_HMILR     8  [cha: n, def:  8]  Hardware minimum link rate
  2_PMALR     9  [cha: n, def:  9]  Programmed maximum link rate
  2_HMALR     9  [cha: n, def:  9]  Hardware maximum link rate
Other supported mode pages can be accessed in a similar way by the sdparm utility. Note that transport
specific mode pages need the transport identified: hence the '-t
sas' option above.
Downloads
There is nothing to download; the driver is part of the kernel source, see
<linux_kernel_source>/drivers/scsi/scsi_debug.c .
Conclusion
Hopefully the design of the scsi_debug driver lends itself to many
extensions. If you think that you have a useful extension that
others may be interested in, please contact the author with a patch.
Douglas Gilbert <dgilbert at interlog dot com>
with additions from
Martin K. Petersen <martin dot petersen at oracle dot com>
Last updated: 10th July 2014