The Linux SCSI Subsystem in kernel 2.4
======================================
2000/05/16

Please note: I'm experimenting with DocBook (and trying to improve my
patience threshold at the same time). Meanwhile this ASCII text form will
stay dormant. Hopefully I will find out how to produce flat text back from
the DocBook source (no, sgml2txt doesn't work). Anyway try pointing your
browser at:
    http://www.torque.net/sg/p/linux_scsi_24/linuxscsi24.html
Otherwise, read on ....

0) Introduction
---------------
This document attempts to describe the SCSI subsystem in the Linux kernel
as the 2.3 development series of kernels gives way to the 2.4 production
series. An external view of the SCSI subsystem is the main theme.
Sections included:
     1) - architectural overview
     2) - names and addresses
            - SCSI addressing
            - device names
            - device names using devfs
     3) - kernel configuration
     4) - boot parameters
     5) - modules and their parameters
     6) - "proc" pseudo file system
     7) - mid level, unifying layer
     8) - upper level drivers
            - disk (sd)
            - cdrom (sr)
            - tape (st)
            - generic (sg)
     9) - lower level drivers
            - pseudo drivers
    10) - "devfs" pseudo file system
     A) - appendix: common SCSI bus types
     B) - appendix: changes between lk 2.2 and 2.4
     C) - appendix: performance and debugging tools
     D) - appendix: compile options and system calls including ioctls
     E) - appendix: references and credits

Material is included to help with the system administration of the Linux
SCSI subsystem. There is also material relevant to those writing
applications that use this subsystem (e.g. ioctl()s). However internal
data structures and design issues are not addressed [see reference W2 for
these]. To unclutter the presentation, compile options and system calls
(including ioctls) have been placed in an appendix.

1) Architectural Overview
-------------------------
Following is a diagram that shows the three levels of drivers in the
SCSI subsystem:
    - upper level: sd, sr, st and sg
    - mid level showing scsi.c and friends (considered as one driver)
    - lower level showing adapter drivers and pseudo adapter drivers

                              User space
-------------------------------------------------------------------------
                              Kernel space

       |------------|  |------------|  |------------|
upper  |    SD      |  |    SR      |  |    ST      |  |------------|
level  |   disks    |  |cdroms/dvds |  |   tapes    |  |    SG      |
       |block device|  |block device|  |char device |  |pass-through|
       | {sd_mod.o} |  | {sr_mod.o} |  |   {st.o}   |  |char device |
       |------------|  |------------|  |------------|  |   {sg.o}   |
                                                       |------------|

       |--------------------------------------------|
       |                    SCSI                    |
mid    |               unifying layer               |
level  |                {scsi_mod.o}                |
       |    scsi*.[hc], hosts.[hc], constants.c     |
       |--------------------------------------------|

       |---------------|          |---------------|
       |Host (e.g. UW) |-|        |Pseudo drivers |-|
lower  | Bus Adapter   | |-|      |for non SCSI   | |-|
level  |  Drivers      | | |      |buses          | | |
       |(e.g. aic7xxx) | | |      |(e.g. ide-scsi)| | |
       |---------------| | |      |---------------| | |
         |---------------| |        |---------------| |
           |---------------|          |---------------|

The upper level supports the user-kernel interface. In the case of sd and
sr this is a block device interface while for st and sg this is a
character device interface. Any operation using the SCSI subsystem (e.g.
reading a sector from a disk) involves one driver at each of the 3 levels
(e.g. sd, SCSI mid level and aic7xxx drivers).

As can be seen from the diagram, the SCSI mid level is common to all
operations. The SCSI mid level defines internal interfaces and provides
common services to the upper and lower level drivers. Ioctls provided by
the mid level are available to the file descriptors belonging to any of
the 4 upper level drivers.

2) Names and Addresses
----------------------
This section covers the various naming schemes that exist in the Linux
and SCSI worlds and how they interact.

2.1) SCSI Addressing
--------------------
Linux has a four level hierarchical addressing scheme for SCSI devices:
    - SCSI adapter number   {host}
    - channel number        {bus}
    - id number             {target}
    - lun                   {lun}
      (which is an abbreviation for "Logical Unit Number")
The terms in braces are the name conventions used by devfs. "Bus" is used
in preference to "channel" in the description below.

The SCSI adapter number is typically an arbitrary numbering of the
adapter cards on the internal IO buses (e.g. PCI, PCMCIA, ISA etc) of the
computer. Such adapters are sometimes termed HBAs (host bus adapters).

Each HBA may control one or more SCSI buses. The various types of SCSI
buses are listed in Appendix A.

Each SCSI bus can have multiple SCSI devices connected to it. In SCSI
parlance the HBA is called the "initiator" and takes up one SCSI id
number (typically 7). The initiator talks to targets which are commonly
known as SCSI devices (e.g. disks). On SCSI parallel buses the number of
ids is related to the width. 8 bit buses (sometimes called "narrow") can
have 8 SCSI ids of which 1 is taken by the HBA leaving 7 for SCSI
devices. Wide SCSI buses are 16 bits wide and can have a maximum of 15
SCSI devices (targets) attached. [SCSI standards allow for multiple
initiators to be present on a single bus. The SCSI 3 draft standard
allows a large number of ids to be present on a SCSI bus.]

Each SCSI device can contain multiple Logical Unit Numbers (LUNs). These
are typically used by sophisticated tape and cdrom units that support
multiple media.

So Linux's flavour of SCSI addressing is a four level hierarchy:
    <scsi_adapter_number, channel, id, lun>
Using the naming conventions of devfs this becomes:
    <host, bus, target, lun>

2.2) Device Names
-----------------
The device names of the various SCSI devices are found within the /dev
directory.
Traditionally in Linux, SCSI devices have been identified by their major
and minor device number rather than their SCSI bus addresses (e.g. SCSI
target id and LUN). The device pseudo file system (devfs) redresses this
shortcoming [see section 2.3 and ref: W5].

Eight block major numbers are reserved for SCSI disks: 8, 65, 66, 67, 68,
69, 70 and 71. Each major can accommodate 256 minor numbers which, in the
case of SCSI disks, are subdivided as follows:
    [b,8,0]     /dev/sda
    [b,8,1]     /dev/sda1
    ....
    [b,8,15]    /dev/sda15
    [b,8,16]    /dev/sdb
    [b,8,17]    /dev/sdb1
    ....
    [b,8,255]   /dev/sdp15
The disk device names without a trailing digit refer to the whole disk
(e.g. /dev/sda) while the others refer to 1 of the allowable 15
partitions within the disk.

The remaining 7 SCSI disk block major numbers follow a similar pattern:
    [b,65,0]    /dev/sdq
    [b,65,1]    /dev/sdq1
    ....
    [b,65,159]  /dev/sdz15
    [b,65,160]  /dev/sdaa
    [b,65,161]  /dev/sdaa1
    ....
    [b,65,255]  /dev/sdaf15
    [b,66,0]    /dev/sdag
    [b,66,1]    /dev/sdag1
    ....
    [b,66,255]  /dev/sdav15
    ....
    [b,71,255]  /dev/sddx15
So there are 128 possible disks (i.e. /dev/sda to /dev/sddx) each having
up to 15 partitions. By way of contrast, the IDE subsystem allows 20
disks (10 controllers each with 1 master and 1 slave) which can have up
to 63 partitions each.

SCSI CD-ROM devices are allocated the block major number of 11.
Traditionally "sr" has been the device name but "scd" probably is more
recognizable and is favoured by several recent distributions. 256 SCSI
CD-ROM devices are allowed:
    [b,11,0]    /dev/scd0       [or /dev/sr0]
    [b,11,255]  /dev/scd255     [or /dev/sr255]

SCSI tape devices are allocated the char major number of 9. Up to 32 tape
devices are supported, each of which can be accessed in one of four modes
(0, 1, 2 and 3) with or without rewind. The devices are allocated as
follows:
    [c,9,0]     /dev/st0        [tape 0, mode 0, rewind]
    [c,9,1]     /dev/st1        [tape 1, mode 0, rewind]
    ....
    [c,9,31]    /dev/st31       [tape 31, mode 0, rewind]
    [c,9,32]    /dev/st0l       [tape 0, mode 1, rewind]
    ....
    [c,9,63]    /dev/st31l      [tape 31, mode 1, rewind]
    [c,9,64]    /dev/st0m       [tape 0, mode 2, rewind]
    ....
    [c,9,96]    /dev/st0a       [tape 0, mode 3, rewind]
    ....
    [c,9,127]   /dev/st31a      [tape 31, mode 3, rewind]
    [c,9,128]   /dev/nst0       [tape 0, mode 0, no rewind]
    ....
    [c,9,160]   /dev/nst0l      [tape 0, mode 1, no rewind]
    ....
    [c,9,192]   /dev/nst0m      [tape 0, mode 2, no rewind]
    ....
    [c,9,224]   /dev/nst0a      [tape 0, mode 3, no rewind]
    ....
    [c,9,255]   /dev/nst31a     [tape 31, mode 3, no rewind]

The SCSI generic (sg) devices are allocated the char major number of 21.
There are 256 possible SCSI generic (sg) devices:
    [c,21,0]    /dev/sg0
    [c,21,1]    /dev/sg1
    ....
    [c,21,255]  /dev/sg255
Note that the SCSI generic device name's use of a trailing letter (e.g.
/dev/sgc) is deprecated.

Each SCSI disk (but not each partition), each SCSI CD-ROM and each SCSI
tape is mapped to an sg device. SCSI devices that don't fit into these
three categories (e.g. scanners) also appear as sg devices.

Pseudo devices [section 9.1] can cause devices that are usually not
considered as SCSI to appear under SCSI device names. For example an IDE
ATAPI CD-ROM may be picked up by the ide-scsi pseudo driver and mapped to
/dev/scd0 .

The linux/Documentation/devices.txt file supplied within the kernel
source is the definitive reference for Linux device names and their
corresponding major and minor number allocations.

2.3) Device Names in devfs
--------------------------
The device pseudo file system can be mounted as /dev in which case it
replaces the traditional Linux device subdirectory. Alternatively it can
be mounted elsewhere (e.g. /devfs) and supplement the existing device
structure.

Without devfs, device names are typically maintained in the "dev"
directory of the root partition. Hence the device names (and their
associated permissions) have file system persistence. The existence of a
device name does not necessarily imply such a device (or even its driver)
is present.
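
As an aside on the traditional numbering of section 2.2, the sd and st
tables can be reproduced with a little arithmetic. The following is only
a sketch in Python; the function names (disk_letters, sd_name, st_name)
are invented for illustration and are not part of any kernel interface.
It assumes 16 minors per disk and 32 tape units per mode, as tabulated in
section 2.2.

```python
import string

SD_MAJORS = [8, 65, 66, 67, 68, 69, 70, 71]  # block majors reserved for sd


def disk_letters(index):
    """Return the letter suffix for disk number `index` (0 -> 'a', 26 -> 'aa')."""
    letters = ""
    index += 1  # work 1-based so that 'z' is followed by 'aa'
    while index > 0:
        index, rem = divmod(index - 1, 26)
        letters = string.ascii_lowercase[rem] + letters
    return letters


def sd_name(major, minor):
    """Map a block (major, minor) pair to its traditional /dev/sd name."""
    disk = SD_MAJORS.index(major) * 16 + minor // 16  # 16 disks per major
    part = minor % 16                                 # 0 means the whole disk
    name = "/dev/sd" + disk_letters(disk)
    return name + (str(part) if part else "")


def st_name(minor):
    """Map a char major 9 minor number to its /dev/st or /dev/nst name."""
    tape = minor % 32                  # 32 tape units
    mode = (minor // 32) % 4           # modes 0..3 use suffix '', 'l', 'm', 'a'
    prefix = "/dev/nst" if minor >= 128 else "/dev/st"
    return prefix + str(tape) + ["", "l", "m", "a"][mode]


print(sd_name(65, 160))   # first disk on the second sd major
print(st_name(255))       # last tape minor
```

The same arithmetic run in reverse gives the major and minor numbers to
pass to mknod when a device name is missing from a non-devfs /dev
directory.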
To save users having to create device name entries (with the mknod
command) most Linux distributions come with thousands of device names
defined in the "/dev" directory. When an application tries to open() such
a device name, an errno value of ENODEV indicates there is no
corresponding device (or driver).

Devfs takes a different approach in which the existence of the device
name is directly related to the presence of the corresponding device (and
its driver).

Assuming devfs is mounted on /dev (the default when "devfs=mount" is
given as a boot parameter) then SCSI devices have primary device names as
follows:
    /dev/scsi/host0/bus0/target1/lun0/disc     [whole disk]
    /dev/scsi/host0/bus0/target1/lun0/part6    [partition 6 of same disk]
    /dev/scsi/host1/bus0/target2/lun0/cd       [CD reader (or writer)]
    /dev/scsi/host2/bus0/target0/lun0/mt       [tape unit, mode 0, rewind]
    /dev/scsi/host2/bus0/target0/lun0/mtan     [tape unit, mode 3, no rewind]
    /dev/scsi/host0/bus0/target1/lun0/generic  [sg device corresponding
                                                to disk]
[Notice the quirky spelling of "disc".] It can be seen that devfs's
naming scheme closely matches the SCSI addressing discussed in section
2.1 . It is worth noting that the IDE subsystem uses a similar devfs
device naming scheme with the word "scsi" replaced with "ide".

Devfs is discussed further in section 10.

3) Kernel Configuration
-----------------------
The Linux kernel configuration is usually found in the kernel source in
the file: /usr/src/linux/.config . It is not recommended to edit this
file directly but to use one of these configuration options:
    make config     - starts a character based question and answer session
    make menuconfig - starts a "chunky" graphics menu-based configuration
                      tool
    make xconfig    - starts an X based configuration tool
The descriptions of these selections that are displayed by the associated
help button can be found in the flat ASCII file:
    /usr/src/linux/Documentation/Configure.help

Ultimately these configuration tools edit the ".config" file.
An option will either indicate some driver is built into the kernel
("=y") or will be built as a module ("=m") or is not selected. The
unselected state can either be indicated by a line starting with "#"
(e.g. "# CONFIG_SCSI is not set") or by the absence of the relevant line
from the ".config" file.

The 3 states of the main selection option for the SCSI subsystem (which
actually selects the SCSI mid level driver) follow:
    CONFIG_SCSI=y
    CONFIG_SCSI=m
    # CONFIG_SCSI is not set

Some other common SCSI configuration options follow:
    CONFIG_BLK_DEV_SD         [disk (sd) driver]
    CONFIG_SD_EXTRA_DEVS      [extra slots for disks added later]
    CONFIG_CHR_DEV_ST         [tape (st) driver]
    CONFIG_BLK_DEV_SR         [SCSI cdrom (sr) driver]
    CONFIG_BLK_DEV_SR_VENDOR  [permits vendor specific cdrom commands]
    CONFIG_SR_EXTRA_DEVS      [extra slots for cdroms added later]
    CONFIG_CHR_DEV_SG         [SCSI generic (sg) driver]
    CONFIG_DEBUG_QUEUES       [for debugging multiple queues]
    CONFIG_SCSI_MULTI_LUN     [allow probes above lun 0]
    CONFIG_SCSI_CONSTANTS     [symbolic decode of sense buffer (errors)]
    CONFIG_SCSI_LOGGING       [allow logging to be runtime selected]
    CONFIG_SCSI_<ll_driver>   [numerous lower level adapter drivers]
    CONFIG_SCSI_DEBUG         [lower level driver for debugging]
    CONFIG_SCSI_PPA           [older type parallel port zip drives]
    CONFIG_SCSI_IMM           [newer type parallel port zip drives]
    CONFIG_BLK_DEV_IDESCSI    [ide-scsi pseudo adapter, see section 8.2.1]
    CONFIG_I2O_SCSI
    CONFIG_SCSI_PCMCIA
    CONFIG_USB_STORAGE

4) Boot Parameters
------------------
In the following the LILO boot loader is assumed. Other loaders such as
"grub" [see www.gnu.org/software/grub] should be considered if the root
partition is a reiserfs or ext3 partition.

Some related boot parameters:
    single           {enter single user mode}
    root=/dev/sda6   {root partition on /dev/sda6 *}
    devfs=mount      {when using devfs, it needs to be mounted}

* Even when devfs is in use the initial read-only mount of the root
  partition is done via the old /dev/sd notation.
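
The three states of a ".config" option described in section 3 can be
checked mechanically. What follows is a minimal sketch in Python, not a
kernel tool: the function name config_state is invented here, and
returning "n" for the unselected state is purely a convention of this
example.

```python
def config_state(config_text, option):
    """Return 'y', 'm' or 'n' for `option` given the text of a .config file."""
    for line in config_text.splitlines():
        line = line.strip()
        # Match "CONFIG_SCSI=" exactly so that CONFIG_SCSI_DEBUG= is skipped.
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    # Either "# CONFIG_X is not set" or no line at all: not selected.
    return "n"


sample = """\
CONFIG_SCSI=m
CONFIG_BLK_DEV_SD=m
# CONFIG_CHR_DEV_ST is not set
CONFIG_SCSI_DEBUG=y
"""
for name in ("CONFIG_SCSI", "CONFIG_CHR_DEV_ST", "CONFIG_CHR_DEV_SG"):
    print(name, config_state(sample, name))
```

On a real system the text would be read from /usr/src/linux/.config
instead of the inline sample.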
The combination "root=/dev/sda6 single" may be useful when disk or
adapter changes have broken the kernel boot load.

5) Modules and their Parameters
-------------------------------
There are many SCSI related modules. The mid and upper level modules are
listed below:
    scsi_mod.o
    sd_mod.o
    sr_mod.o
    st.o
    sg.o
Notice that 3 have "_mod" appended to their normal names. Most module
names use the device name followed by ".o".

Lower level drivers tend to have names or abbreviations of the HBA's
manufacturer (e.g. advansys) plus optionally the chip number of the major
controller chip (e.g. sym53c8xx for symbios controllers based on the NCR
53c8?? family of chips).

All SCSI modules depend on the mid level. This means if the SCSI mid
level is not built into the kernel and if scsi_mod.o has not already been
loaded then a command like 'modprobe st' will cause the scsi_mod.o module
to be loaded. There could well be other dependencies as well, for example
'modprobe sr_mod' will also cause the cdrom module to be loaded if it
hasn't been already.

Modules can be loaded with the 'modprobe <module_name>' command which
will first try to load any modules that the nominated <module_name>
depends on. Also <module_name> does not need the trailing ".o" extension
which is assumed if not given. The 'insmod <module_name>' command will
also try to load <module_name> but without first loading the modules it
depends on.

6) Proc pseudo file system
--------------------------
The proc pseudo file system provides some very useful information about
the SCSI subsystem. [The kernel configuration option is CONFIG_PROC_FS
and in almost all cases proc_fs should be built in.]

SCSI specific information is found under the directory /proc/scsi.
Probably the most commonly accessed entry is 'cat /proc/scsi/scsi' which
lists the attached SCSI devices. See section 7.3 for more details.

The lower level drivers are allocated proc_fs entries of the form:
    /proc/scsi/<driver_name>/<host_no>
where the <driver_name> is something like "aic7xxx" or "BusLogic". The
<host_no> is to distinguish between different hosts (i.e.
HBAs) that may be controlled by the same driver. What is stored at this
location is lower level driver dependent (and it may be possible to set
parameters via this file). When reporting problems to newsgroups or
maintainers it is useful to include the output of that file (e.g. 'cat
/proc/scsi/aic7xxx/0').

The sg driver provides information about hosts and devices in the
directory /proc/scsi/sg . See section 8.4.3 .

7) Mid Level, Unifying layer
----------------------------
The SCSI mid level is common to all usage of the SCSI subsystem. Probably
its most important role is to define internal interfaces and services
that are used by all other SCSI drivers. These internal mechanisms are
not discussed in this document [see ref: W2].

The primary kernel configuration parameter "CONFIG_SCSI" determines
whether the mid level is built in (when "=y") or a module (when "=m"). If
"CONFIG_SCSI=m" then all other SCSI subsystem drivers must also be
modules.

When the mid level is built as a module then it probably never needs to
be loaded explicitly because using 'modprobe' to load any other SCSI
subsystem module will cause the mid level to be loaded first (if it is
not already).

7.1) boot parameters
--------------------
SCSI drivers that are built into the kernel are checked in a
predetermined order. The user has no control over this order which in
most cases is arbitrary but in the case of some older ISA adapters is
required to stop misidentification.

The recently introduced devfs defines a "scsihosts" boot time parameter
to give the user some control over this. See the devfs documentation
[ref: W5] for a description. The situation is made more complex by the
fact that one lower level driver can detect multiple hosts. The
"scsihosts" boot parameter attempts to re-arrange the host list after the
host detection phase and before the SCSI bus probe phase that identifies
attached devices.
    scsihosts=host1:hosts2...
    [>>>>>this doesn't work yet<<<<<<<<<]
    scsi_logging=<n>
        where <n> is 0 to turn logging off
        where <n> is non-zero to turn logging on

7.2) module parameters
----------------------
    scsi_logging_level=<n>
        where <n> is the logging level mask (0 for logging off)
    scsihosts=host1:hosts2...
    [>>>>>sensible for modules??<<<<<<<]

7.3) proc interface
-------------------
To display the SCSI devices currently attached to (and recognized by) the
SCSI subsystem use:
    cat /proc/scsi/scsi
The output looks like this:
    Attached devices:
    Host: scsi0 Channel: 00 Id: 02 Lun: 00
      Vendor: PIONEER  Model: DVD-ROM DVD-303  Rev: 1.10
      Type:   CD-ROM                           ANSI SCSI revision: 02
    Host: scsi1 Channel: 00 Id: 00 Lun: 00
      Vendor: IBM      Model: DNES-309170W     Rev: SA30
      Type:   Direct-Access                    ANSI SCSI revision: 03
After the "Attached devices:" line there are 3 lines for each recognized
device. The first of these lines is the SCSI address information
discussed in section 2.1 . The following 2 lines of data are obtained
from an INQUIRY command that was performed on the device when it was
attached. See section 8.4 for the relationship between the ordering of
these devices compared with the sg driver's ordering (which most of the
time is the same).

Existing devices can be removed using:
    echo "scsi remove-single-device <h> <b> <t> <l>" > /proc/scsi/scsi
where the variables are host, bus (channel), target (scsi id) and lun.
The success (or otherwise) of this command can be determined by sending a
subsequent 'cat /proc/scsi/scsi' command. The removal will fail if the
device is busy (e.g. if a file system on the device is mounted).

New devices can be added using:
    echo "scsi add-single-device <h> <b> <t> <l>" > /proc/scsi/scsi
where the variables are host, bus (channel), target (scsi id) and lun.
The success (or otherwise) of this command can be determined by sending a
subsequent 'cat /proc/scsi/scsi' command.

The SCSI subsystem does not support hot-plugging of SCSI devices (there
are also electrical issues on the associated SCSI bus).
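
Since 'cat /proc/scsi/scsi' is the way to confirm an add or remove, it
can help to turn its three-line records into structured data. Below is a
rough sketch in Python that parses the sample output shown earlier in
this section; the function name parse_proc_scsi and the dictionary keys
are assumptions of this example, and the regular expressions are only
known to fit the layout shown in this document.

```python
import re

HOST_RE = re.compile(
    r"Host:\s*scsi(\d+)\s+Channel:\s*(\d+)\s+Id:\s*(\d+)\s+Lun:\s*(\d+)")
VENDOR_RE = re.compile(r"Vendor:\s*(.*?)\s+Model:\s*(.*?)\s+Rev:\s*(.*)")
TYPE_RE = re.compile(r"Type:\s*(.*?)\s+ANSI SCSI revision:\s*(\S+)")

SAMPLE = """\
Attached devices:
Host: scsi0 Channel: 00 Id: 02 Lun: 00
  Vendor: PIONEER  Model: DVD-ROM DVD-303  Rev: 1.10
  Type:   CD-ROM                           ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 00 Lun: 00
  Vendor: IBM      Model: DNES-309170W     Rev: SA30
  Type:   Direct-Access                    ANSI SCSI revision: 03
"""


def parse_proc_scsi(text):
    """Return one dict per device found in /proc/scsi/scsi style text."""
    devices = []
    for line in text.splitlines():
        line = line.strip()
        m = HOST_RE.match(line)
        if m:
            # Address line: start a new record with the four level address.
            host, bus, target, lun = (int(x) for x in m.groups())
            devices.append(
                {"host": host, "bus": bus, "target": target, "lun": lun})
            continue
        m = VENDOR_RE.match(line)
        if m and devices:
            devices[-1].update(vendor=m.group(1), model=m.group(2),
                               rev=m.group(3).strip())
            continue
        m = TYPE_RE.match(line)
        if m and devices:
            devices[-1].update(type=m.group(1), ansi_rev=m.group(2))
    return devices


for dev in parse_proc_scsi(SAMPLE):
    print(dev["host"], dev["bus"], dev["target"], dev["lun"], dev["model"])
```

To check the live system, pass the contents of /proc/scsi/scsi instead of
SAMPLE and look for the (host, bus, target, lun) tuple of interest.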
It is recommended that those who use add+remove-single-device make sure
that other devices on that SCSI bus are inactive if re-plugging is going
to take place [caveat emptor].

To output a list of internal SCSI command blocks use:
    echo "scsi dump <n>" > /proc/scsi/scsi
where the numeric value of <n> doesn't matter. This is probably only of
interest to people chasing down bugs within the SCSI subsystem.

To start (or stop) logging information being sent to the console/log use:
    echo "scsi log <token> <n>" > /proc/scsi/scsi
where <token> is one of:
    {all, none, error, timeout, scan, mlqueue, mlcomplete, llqueue,
     llcomplete, hlqueue, hlcomplete, ioctl}
and <n> is a number between 0 and 7. The tokens "all" and "none" don't
take an <n> argument. Prefix meanings:
    hl    upper level drivers [exception: sg uses "timeout"]
    ml    mid level
    ll    lower level drivers [adapter drivers often have their own flags]
The value "0" turns off logging while "7" maximizes the volume of output.
Logging information will only be output if CONFIG_SCSI_LOGGING was
selected in the kernel build.

Warning: "scsi log all" (and several other variants) can cause an
infinite logging loop if the log file (typically /var/log/messages) lies
on a SCSI disk. One solution to this is to turn off the kernel logging
daemon (or direct it to a non SCSI device).

8) Upper level drivers
----------------------
The upper level drivers maintain the kernel side of the OS interface for
the logical class of devices they represent (e.g. disks). They are also
responsible for managing certain kernel and SCSI subsystem resources such
as kernel memory and SCSI command structures. Applications in the user
space access these drivers by opening a special file (block or char)
typically found in the /dev directory tree.

8.1) Disk driver (sd)
---------------------
The sd driver is a block device which means that it is closely associated
with the block subsystem. It also supports the concept of partitions.
["man sd" dates from 1992.]

8.1.1) sd boot parameters
-------------------------
None.

8.1.2) sd module parameters
---------------------------
None.

8.2) CDROM driver (sr or scd)
-----------------------------
The SCSI upper level device name is "sr" while "sr_mod" is the module
name. The device file name is either "/dev/sr" or "/dev/scd". Following
is a diagram illustrating the CDROM subsystem of which sr is a part:

       |-------------------------------------------------|
upper  |              Uniform CD-ROM layer               |
level  |           __________________________            |
       |          |     DVD, audio, etc      |           |
       |-------------------------------------------------|

       |------------|   |------------|   |------------|
mid    |   ide-cd   |   |     sr     |   |   Older    |
level  |ATAPI CD-ROM|   |cdroms/dvds |   |generic CD  |
       |driver      |   |block device|   |drivers (sb,|
       | {ide-cd.o} |   | {sr_mod.o} |   |mitsumi,etc)|
       |------------|   |------------|   |------------|

       |------------|   |------------|
low    |    IDE     |   |  SCSI mid  |
level  | sub system |   |   level    |
       |driver      |   |            |
       | {ide-cd.o} |   |+ SCSI lower|
       |------------|   |------------|

Many modern IDE CDROM players and all DVD players use the ATAPI standard
which then allows them to be controlled by the SCSI subsystem with the
ide-scsi lower level pseudo driver. The default action is for the IDE
subsystem to take ownership of all IDE devices and in this case the
default driver would be ide-cd.o . Once the IDE subsystem "owns" an ATAPI
CDROM player then this excludes the ide-scsi driver from attaching itself
to the same device. In order to change this default action see the
following boot and module parameters.

8.2.1) sr boot parameters
-------------------------
None but the following is related:
During the boot sequence of the Linux kernel, the IDE devices are scanned
before SCSI devices. This means that if both ide-cd and ide-scsi are
built in, then the ide-cd driver will claim the device. To override this
action use the boot parameter "hd<x>=scsi" where <x> is the appropriate
drive letter. This indicates that the device is to be managed by ide-scsi
instead.

8.2.2) sr module parameters
---------------------------
None but the following is related:
Continuing on from the ATAPI CDROM driver override discussion; for
modules this can also be done with "insmod ide-cd ignore=hdb" to exclude
that device. If ide-scsi is loaded after that, it will claim hdb.

8.3) Tape driver (st)
---------------------
The tape driver interface is documented in the file
.../linux/drivers/scsi/README.st and on the st(4) man page (man st). The
file README.st also documents the different parameters and options of the
driver together with the basic mechanisms used in the driver.

8.3.1) st boot parameters
-------------------------
    st=xxx[,yyy]
where xxx is one of the following:
    buffer_kbs:<n>
    write_threshold_kbs:<n>
    max_buffers:<n>
    max_sg_segs:<n>
(The old boot parameters st=aa[,bb[,cc[,dd]]] are supported but
deprecated.)

8.3.2) st module parameters
---------------------------
    buffer_kbs=<n>
    write_threshold_kbs=<n>
    max_buffers=<n>
    max_sg_segs=<n>

8.3.3) st proc interface
------------------------
None.

8.4) Generic driver (sg)
------------------------
See reference [W4] for the SCSI Generic (sg) driver documentation. For
SCSI standards see reference [W1] and for a book on the subject of SCSI
programming and pass through mechanisms see reference [3].

Currently the sg documentation focuses on the production version of sg
found in the lk 2.2 series. The abridged form is in the file
scsi-generic.txt which can also be found in the kernel source at
/usr/src/linux/Documentation/scsi-generic.txt . The web site also
contains a longer form called scsi-generic_long.txt . This documentation
describes what is termed "version 2" sg.

The sg driver in lk 2.4 will be a "version 3" sg driver which adds an
additional interface and some new ioctl()s. The most interesting new
ioctl() is SG_IO which sends a SCSI command and waits for its response.
The additions and differences in the version 3 sg driver are documented
on the web site in the file scsi-generic_v3.txt .
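
To give a feel for SG_IO, here is a hedged sketch in Python that issues a
standard INQUIRY through the version 3 interface. The structure layout
and the SG_IO and SG_DXFER_FROM_DEV values are transcribed from the
sg_io_hdr definition in scsi/sg.h and should be checked against the
header on the target system. The helper names (SgIoHdr, build_inquiry,
inquiry), the 96 byte allocation length, the 5 second timeout and the
default device /dev/sg0 are assumptions of this example only.

```python
import ctypes
import fcntl
import os

SG_IO = 0x2285            # ioctl request number, per scsi/sg.h
SG_DXFER_FROM_DEV = -3    # data flows from the device to the caller


class SgIoHdr(ctypes.Structure):
    """ctypes mirror of struct sg_io_hdr (sg version 3 interface)."""
    _fields_ = [
        ("interface_id", ctypes.c_int),       # always ord('S')
        ("dxfer_direction", ctypes.c_int),
        ("cmd_len", ctypes.c_ubyte),
        ("mx_sb_len", ctypes.c_ubyte),
        ("iovec_count", ctypes.c_ushort),
        ("dxfer_len", ctypes.c_uint),
        ("dxferp", ctypes.c_void_p),
        ("cmdp", ctypes.c_void_p),
        ("sbp", ctypes.c_void_p),
        ("timeout", ctypes.c_uint),           # milliseconds
        ("flags", ctypes.c_uint),
        ("pack_id", ctypes.c_int),
        ("usr_ptr", ctypes.c_void_p),
        ("status", ctypes.c_ubyte),
        ("masked_status", ctypes.c_ubyte),
        ("msg_status", ctypes.c_ubyte),
        ("sb_len_wr", ctypes.c_ubyte),
        ("host_status", ctypes.c_ushort),
        ("driver_status", ctypes.c_ushort),
        ("resid", ctypes.c_int),
        ("duration", ctypes.c_uint),
        ("info", ctypes.c_uint),
    ]


def build_inquiry(alloc_len=96):
    """Build the header and buffers for a 6 byte INQUIRY (opcode 0x12)."""
    cdb = (ctypes.c_ubyte * 6)(0x12, 0, 0, 0, alloc_len, 0)
    data = ctypes.create_string_buffer(alloc_len)
    sense = ctypes.create_string_buffer(64)   # up to 64 bytes of sense data
    hdr = SgIoHdr()
    hdr.interface_id = ord("S")
    hdr.dxfer_direction = SG_DXFER_FROM_DEV
    hdr.cmd_len = len(cdb)
    hdr.mx_sb_len = len(sense)
    hdr.dxfer_len = alloc_len
    hdr.dxferp = ctypes.addressof(data)
    hdr.cmdp = ctypes.addressof(cdb)
    hdr.sbp = ctypes.addressof(sense)
    hdr.timeout = 5000
    # The buffers are returned too so they stay alive during the ioctl.
    return hdr, cdb, data, sense


def inquiry(path="/dev/sg0"):
    """Send INQUIRY to `path`; return the (vendor, model) strings."""
    hdr, cdb, data, sense = build_inquiry()
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, SG_IO, hdr)           # blocks until the response
    finally:
        os.close(fd)
    if hdr.status != 0:
        raise OSError("INQUIRY failed, SCSI status 0x%02x" % hdr.status)
    raw = data.raw
    return (raw[8:16].decode("ascii").strip(),
            raw[16:32].decode("ascii").strip())
```

Calling inquiry() needs a kernel with the version 3 sg driver and read
permission on the sg device; the vendor and model strings it returns
should match those shown by 'cat /proc/scsi/scsi'.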

The abbreviation "sg" is used within the kernel to refer both to the SCSI
generic driver and the scatter-gather capability offered by many modern
IO devices (usually associated with DMA). The context usually makes it
clear which one is being referred to. Note the contorted sg ioctl() named
SG_GET_SG_TABLESIZE where the second "SG" refers to scatter gather.

The public interface for sg is found in the file:
/usr/src/linux/include/scsi/sg.h . Depending on the distribution this may
or may not contain the same information as /usr/include/scsi/sg.h which
is controlled by the GNU library maintainers.

8.4.1) sg boot parameters
-------------------------
The sg driver maintains a reserved buffer for each open file descriptor.
The purpose is to guarantee applications that data transfers up to the
size of the reserved buffer will not fail for lack of kernel memory. This
is important for applications like cdrecord that cannot easily recover
(the CDR) from an ENOMEM error. In the absence of the boot parameter
'sg_def_reserved_size' or the sg module parameter 'def_reserved_size',
each time an sg file descriptor is opened the reserved buffer size is
inherited from SG_DEF_RESERVED_SIZE which is defined in
include/scsi/sg.h .

The SG_DEF_RESERVED_SIZE define value can be overridden by this kernel
boot option:
    sg_def_reserved_size=<n>

8.4.2) sg module parameters
---------------------------
When the sg module is loaded the SG_DEF_RESERVED_SIZE define value can be
overridden by supplying this option:
    def_reserved_size=<n>

8.4.3) sg proc interface
------------------------
All the following files are readable by all with ASCII output. The file
'def_reserved_size' is also writeable by root. The ASCII output has been
formatted in such a way as to be human and machine readable (and hence a
compromise). Use Unix commands of the form 'cat device_hdr devices' to
see the output of tables.
    /proc/scsi/sg/debug              [current state of sg's file
                                      descriptors]
    /proc/scsi/sg/def_reserved_size  [like boot/module load parameter,
                                      writeable]
    /proc/scsi/sg/devices            [table of numeric device data]
    /proc/scsi/sg/device_hdr         [column headers for sg/devices]
    /proc/scsi/sg/device_strs        [INQUIRY data for devices]
    /proc/scsi/sg/hosts              [table of numeric host data]
    /proc/scsi/sg/host_hdr           [column headers for sg/hosts]
    /proc/scsi/sg/host_strs          [adapter driver's id string for host]
    /proc/scsi/sg/version            [sg version number and date]

9) Low Level drivers
--------------------
Only generalities here or perhaps an example like the aic7xxx adapter.

9.1) Pseudo drivers
-------------------
    ide-scsi
    ppa
    imm
    usb
    i2o
    ppscsi

10) Devfs pseudo file system
----------------------------
The main documentation for devfs can be found at: reference W5. The devfs
name conventions for the SCSI subsystem are outlined in section 2.3 . The
devfs SCSI node names with their default permissions are:
    disc     rw-------  whole disk including mbr
    part1    rw-------  first partition                          {...p1}
    ...
    part15   rw-------  15th partition (absent if no 15th        {...p15}
                        partition)
    cd       rw-rw-rw-  cd or dvd devices
    mt       rw-rw-rw-  tape mode 0 with rewind                  {...m0}
    mtl      rw-rw-rw-  tape mode 1 with rewind                  {...m1}
    mtm      rw-rw-rw-  tape mode 2 with rewind                  {...m2}
    mta      rw-rw-rw-  tape mode 3 with rewind                  {...m3}
    mtn      rw-rw-rw-  tape mode 0 with no rewind               {...m0n}
    mtln     rw-rw-rw-  tape mode 1 with no rewind               {...m1n}
    mtmn     rw-rw-rw-  tape mode 2 with no rewind               {...m2n}
    mtan     rw-rw-rw-  tape mode 3 with no rewind               {...m3n}
    generic  rw-r-----
These node names are only present if the corresponding device (or
subentities of the device (e.g. partitions)) and driver are present. For
example if there is no sg driver present then there is no "generic"
device name. The strings that appear above in braces are appended to the
abridged "c0b0t0u0" notations outlined below as appropriate.

The devfs file names that are block or character special files will be
called the primary device names in this description. The devfs daemon,
called devfsd, introduces many symbolic links to those primary device
names. This is done both for backward compatibility and convenience.
These symbolic links will be called secondary device names.

The secondary device names are controlled by the devfsd configuration
file usually found in /etc/devfsd.conf . Following is a list of secondary
device names when the default devfsd.conf file is used:

    Secondary name        slink to this primary device name
    ----------------------------------------------------------------------
    /dev/sda              /dev/scsi/host0/bus0/target2/lun0/disc
    /dev/sda1             /dev/scsi/host0/bus0/target2/lun0/part1
    /dev/sd/c0b0t2u0      /dev/scsi/host0/bus0/target2/lun0/disc
    /dev/sd/c0b0t2u0p1    /dev/scsi/host0/bus0/target2/lun0/part1
    /dev/sr0              /dev/scsi/host0/bus0/target4/lun0/cd
    /dev/sr/c0b0t4u0      /dev/scsi/host0/bus0/target4/lun0/cd
    /dev/st0              /dev/scsi/host1/bus0/target0/lun0/mt
    /dev/nst0a            /dev/scsi/host1/bus0/target0/lun0/mtan
    /dev/st/c1b0t0u0m0    /dev/scsi/host1/bus0/target0/lun0/mt
    /dev/st/c1b0t0u0m3n   /dev/scsi/host1/bus0/target0/lun0/mtan
    /dev/sg0              /dev/scsi/host0/bus0/target2/lun0/generic
    /dev/sg1              /dev/scsi/host0/bus0/target4/lun0/generic
    /dev/sg2              /dev/scsi/host1/bus0/target0/lun0/generic
    /dev/sg/c0b0t2u0      /dev/scsi/host0/bus0/target2/lun0/generic
    /dev/sg/c0b0t4u0      /dev/scsi/host0/bus0/target4/lun0/generic
    /dev/sg/c1b0t0u0      /dev/scsi/host1/bus0/target0/lun0/generic

Note that the more common /dev/scd0 variant for SCSI cdroms is not
supported.

There are also /dev/discs, /dev/cdroms and /dev/tapes directories that
contain symbolic links to all devices (i.e.
not just SCSI devices) that fall into that categorization:

    Secondary name        slink to this primary device
    ----------------------------------------------------------------------
    /dev/discs/disc0      /dev/ide/host0/bus0/target0/lun0       *
    /dev/discs/disc1      /dev/scsi/host0/bus0/target2/lun0      *
    /dev/cdroms/cdrom0    /dev/ide/host0/bus1/target1/lun0/cd
    /dev/cdroms/cdrom1    /dev/scsi/host0/bus0/target4/lun0/cd
    /dev/tapes/tape0      /dev/scsi/host1/bus0/target0/lun0      *

Those entries marked with "*" are directories containing the primary
devices. Note that IDE devices are listed before SCSI devices.

These secondary device names mimic the same persistence rules as the
primary device names. So when a SCSI device (?), or its lower level
driver or its upper level driver is removed then so are the primary and
secondary device names associated with it.

Even with devfs mounted as /dev, the old "/dev/sda6" type naming lives on
in some contexts. For example if the user wants to change the root
partition on a "devfs" machine then something like this is needed:
    boot: linux root=/dev/sda6
The root partition is first mounted read only and then remounted rw . The
read only mount uses major/minor naming (e.g. /dev/sda6) while the
subsequent rw mount uses fstab and devfs.

There are many device scanning programs that expect to see the pre-devfs
device names present and it will be some time before they become devfs
aware. Also some programs rely on an open of /dev/sg0 (for example) to
load the sg driver (assuming it is a module and not already loaded). This
can be arranged by an entry in /etc/devfsd.conf of:
    LOOKUP sg.* MODLOAD
and the following in /etc/modules.conf :
    probeall /dev/sg scsi-hosts sg
    alias /dev/sg* /dev/sg

The sg device permissions can be changed with this entry in devfsd.conf :
    REGISTER scsi/host.*/bus.*/target.*/lun.*/generic PERMISSIONS 0.0 rw-rw-rw-
See "man devfsd" for more information on the possibilities.
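
The relationship between a primary device name and its abridged
"c0b0t0u0" secondary name, tabulated earlier in this section, is
mechanical. The following Python sketch illustrates it; the function name
abridged_name is invented here, and the suffix table simply restates the
strings in braces from the node name list above. It is not how devfsd
itself is implemented.

```python
import re

PRIMARY_RE = re.compile(
    r"^/dev/scsi/host(\d+)/bus(\d+)/target(\d+)/lun(\d+)/(\w+)$")

# Tape node name -> suffix appended to the cNbNtNuN string.
TAPE_SUFFIX = {"mt": "m0", "mtl": "m1", "mtm": "m2", "mta": "m3",
               "mtn": "m0n", "mtln": "m1n", "mtmn": "m2n", "mtan": "m3n"}


def abridged_name(primary):
    """Map a devfs primary SCSI name to its abridged secondary name."""
    m = PRIMARY_RE.match(primary)
    if not m:
        raise ValueError("not a devfs SCSI primary name: " + primary)
    host, bus, target, lun, leaf = m.groups()
    base = "c%sb%st%su%s" % (host, bus, target, lun)
    if leaf == "disc":
        return "/dev/sd/" + base
    if leaf.startswith("part"):
        return "/dev/sd/" + base + "p" + leaf[4:]   # part6 -> ...p6
    if leaf == "cd":
        return "/dev/sr/" + base
    if leaf == "generic":
        return "/dev/sg/" + base
    if leaf in TAPE_SUFFIX:
        return "/dev/st/" + base + TAPE_SUFFIX[leaf]
    raise ValueError("unknown devfs node name: " + leaf)


print(abridged_name("/dev/scsi/host0/bus0/target2/lun0/part1"))
print(abridged_name("/dev/scsi/host1/bus0/target0/lun0/mtan"))
```

The two lines printed correspond to the /dev/sd/c0b0t2u0p1 and
/dev/st/c1b0t0u0m3n rows of the secondary name table.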
An application can determine whether devfs is active by the presence or
otherwise of the file ".devfsd".


Appendix A) Common SCSI bus types
---------------------------------
SCSI
FAST SCSI
WIDE SCSI
ULTRA SCSI
ULTRA WIDE
ULTRA 2 WIDE
ULTRA 160 WIDE
FC-AL
Firewire*

NON SCSI buses that can use a SCSI (like) transport:
----------------------------------------------------
IDE (ATAPI)
USB
PC Parallel port
I2O


Appendix B) Changes between lk 2.2 and 2.4
------------------------------------------

B.1) General
------------
Multiple (per device) queues rather than the single queue used previously.

B.2) Mid level changes
----------------------
SCSI_IOCTL_GET_IDLUN    {changed}

B.3) sd changes
---------------
HDIO_GETGEO_BIG         {new}
SCSI_EMULATED_HOST      {new}

B.4) sr changes
---------------

B.5) st changes
---------------
No interface changes.

B.6) sg changes
---------------
sg_io_hdr               {new interface structure}
SG_IO                   {new ioctl}
direct IO               {present but commented out, see ALLOW_DIO}
procfs output           {new information in /proc/scsi/sg directory}
boot/module parameters  {new}

Up to 64 bytes of sense data can be obtained from the sg_io_hdr interface
structure. Also a residual count associated with the data transfer is
available (if the lower level driver supports it; if not, the residual
count will be 0).


Appendix C) performance and debugging tools
-------------------------------------------
dd              {"man dd"}
vmstat          {in most distributions, try "man vmstat"}
sard            {ftp.uk.linux.org/pub/linux/sct/fs/profiling}
scsi_debug      {low level driver for debugging (no adapter required)}
sg_utils        {utilities package for sg: www.torque.net/sg}
intlat          {internal timing info (pegs + interrupt latencies)Morton}
ps -eo cmd,wchan


Appendix D) Compile options and System calls including ioctls
-------------------------------------------------------------
The compile options in this appendix are those which a system
administrator might conceivably want to change.
Naturally the defaults are chosen so the vast majority of users will not
need to modify anything. In some cases setting kernel build time options,
kernel boot time parameters or module load parameters has the same effect
as changing a driver compile time option.

System calls act as the interface between application programs and the
kernel and its drivers. In the case of the layered driver architecture
that the SCSI subsystem uses, the upper layer drivers handle most of the
system calls.

The SCSI subsystem has a "bubble down" ioctl structure. First the upper
level driver associated with the open file descriptor attempts to decode
the ioctl. If it doesn't recognize it then the ioctl is passed down to the
mid level. If the mid level doesn't recognize it then the ioctl is passed
down to the lower level driver associated with the file descriptor. If the
lower level driver doesn't recognize it then the EINVAL error is
generated. Some ioctls are dispatched to related subsystems.

D.1) Mid level
--------------
The following header files in the kernel source are relevant to the mid
level:

    /usr/src/linux/include/scsi/scsi.h
    /usr/src/linux/include/scsi/scsi_ioctl.h
    /usr/src/linux/drivers/scsi/scsi.h

The first 2 files are meant for external consumption (other than parts in
a __KERNEL__ conditional compilation block). They may also be found in the
/usr/include/scsi directory but it is best not to trust these versions as
they are maintained with the glibc library and may lag the kernel version
being used. Since in Linux systems /usr/include/linux can be relied upon
to be a symbolic link to the kernel source's include area (typically
/usr/src/linux/include/linux ), the following trick is recommended for
applications trying to include scsi_ioctl.h :

    #include

The third include file (listed above) is the key internal header file for
the SCSI subsystem.
As such it will not be discussed here other than to point out it has the
same file name (but it's in a different directory) as the first include
file. This sometimes causes confusion.

D.1.1) Mid level compile options
--------------------------------
None.

D.1.2) Mid level ioctls
-----------------------
See the following files:

    /usr/src/linux/include/scsi/scsi.h

Note that the SCSI status constants defined in include/scsi/scsi.h are
shifted 1 bit right from the values in the SCSI standards:

scsi.h constant         value           SCSI 2 standard value
-------------------------------------------------------------
CHECK_CONDITION         0x1             0x2
CONDITION_GOOD          0x2             0x4
BUSY                    0x4             0x8
....

A summary of the ioctl()s follows:

SCSI_IOCTL_SEND_COMMAND
    This interface is deprecated - users should use the scsi generic (sg)
    interface instead, as this is a more flexible approach to performing
    generic SCSI commands on a device.

    The structure that we are passed should look like:

    struct sdata {
        unsigned int inlen;      [i] Length of data to be written to device
        unsigned int outlen;     [i] Length of data to be read from device
        unsigned char cmd[x];    [i] SCSI command (6 <= x <= 12).
                                 [o] Data read from device starts here.
                                 [o] On error, sense buffer starts here.
        unsigned char wdata[y];  [i] Data written to device starts here.
    };

    Notes:
      - The SCSI command length is determined by examining the 1st byte
        of the given command. There is no way to override this.
      - Data transfers are limited to PAGE_SIZE (4K on i386, 8K on alpha).
      - The length (x + y) must be at least OMAX_SB_LEN bytes long to
        accommodate the sense buffer when an error occurs. The sense
        buffer is truncated to OMAX_SB_LEN (16) bytes so that old code
        will not be surprised.
      - If a Unix error occurs (e.g. ENOMEM) then the user will receive
        a negative return and the Unix error code in 'errno'. If the
        SCSI command succeeds then 0 is returned. Positive numbers
        returned are the compacted SCSI error codes (4 bytes in one int)
        where the lowest byte is the SCSI status.
        See the drivers/scsi/scsi.h file for more information on this.

SCSI_IOCTL_GET_IDLUN
    This ioctl takes a pointer to a "struct scsi_idlun" object as its
    third argument. The "struct scsi_idlun" definition is found in . It
    gets populated with scsi host, channel, device id and lun data for the
    given device. Unfortunately that header file "hides" that structure
    behind a "#ifdef __KERNEL__" block. To use this ioctl, the structure
    needs to be replicated in the user's program. Something like:

    typedef struct my_scsi_idlun {
        int four_in_one;    /* 4 separate bytes of info compacted into 1 int */
        int host_unique_id; /* distinguishes adapter cards from same supplier */
    } My_scsi_idlun;

    "four_in_one" is made up as follows:

        (scsi_device_id | (lun << 8) | (channel << 16) | (host << 24))

    These 4 components are assumed (or masked) to be 1 byte each.

SCSI_IOCTL_GET_BUS_NUMBER   {new SCSI_IOCTL_GET_IDLUN replaces its need}
SCSI_IOCTL_TAGGED_ENABLE    {does little, low level responsibility}
SCSI_IOCTL_TAGGED_DISABLE   {does little, low level responsibility}
SCSI_IOCTL_PROBE_HOST
SCSI_IOCTL_DOORLOCK
SCSI_IOCTL_DOORUNLOCK
SCSI_IOCTL_TEST_UNIT_READY
SCSI_IOCTL_START_UNIT
SCSI_IOCTL_STOP_UNIT
SCSI_EMULATED_HOST          {same as SG_EMULATED_HOST}

D.2) sd driver
--------------

D.2.1) sd compile options
-------------------------
MAX_RETRIES             {5}
SD_TIMEOUT              {30 seconds}
SD_MOD_TIMEOUT          {75 seconds}

D.2.2) sd ioctls and user interface
-----------------------------------
The relevant files to see:

    include/linux/hdreg.h
    include/linux/genhd.h
    include/linux/fs.h

A list of the ioctl()s follows:

HDIO_GETGEO_BIG
HDIO_GETGEO             [retrieve disk geometry]
BLKGETSIZE              [number of sectors in device]
BLKROSET                [set read only flag]
BLKROGET                [get read only flag]
BLKRASET                [set read ahead value]
BLKRAGET                [get read ahead value]
BLKFLSBUF               [instructs SCSI subsystem to flush buffers]
BLKSSZGET
BLKPG
BLKELVGET
BLKELVSET
BLKRRPART               [reread the partition table]

open()                  (all flags ignored)
close()
ioctl()                 (see list above)

D.3) sr driver
--------------

D.3.1) sr compile options
-------------------------
None.

D.3.2) sr ioctls and user interface
-----------------------------------
See the following files:

    /usr/src/linux/include/linux/cdrom.h
    /usr/src/linux/drivers/cdrom/cdrom.c  [revision history section]
    /usr/src/linux/Documentation/cdrom/cdrom-standard.tex

Some of the following ioctls are described in cdrom-standard.tex :

CDROMCLOSETRAY
CDROM_SET_OPTIONS
CDROM_CLEAR_OPTIONS
CDROM_SELECT_SPEED
CDROM_SELECT_DISC
CDROM_MEDIA_CHANGED
CDROM_DRIVE_STATUS
CDROM_CHANGER_NSLOTS
CDROM_LOCKDOOR
CDROM_DEBUG
CDROM_GET_CAPABILITY
DVD_READ_STRUCT
DVD_WRITE_STRUCT
DVD_AUTH
CDROM_SEND_PACKET
CDROM_NEXT_WRITABLE
CDROM_LAST_WRITTEN

The O_NONBLOCK flag on the open() of scd devices is important. Without it
the open() will wait until there is media in the device before returning.

open()                  O_NONBLOCK
close()
read()
write()
ioctl()

D.4) st driver
--------------

D.4.1) st compile options
-------------------------
Most of the following compile options can be overridden with boot/module
parameters and/or runtime configuration (ioctl).

The following parameters are defined in linux/drivers/scsi/st_options.h :

ST_NOWAIT                   {0}
ST_IN_FILE_POS              {0}
ST_RECOVERED_WRITE_FATAL    {0}
ST_DEFAULT_BLOCK            {0}
ST_BUFFER_BLOCKS            {32}
ST_WRITE_THRESHOLD_BLOCKS   {30}
ST_MAX_BUFFERS              {4}
ST_MAX_SG                   {16}
ST_FIRST_SG                 {8}
ST_FIRST_ORDER              {5}
ST_TWO_FM                   {0}
ST_BUFFER_WRITES            {1}
ST_ASYNC_WRITES             {1}
ST_READ_AHEAD               {1}
ST_AUTO_LOCK                {0}
ST_FAST_MTEOM               {0}
ST_SCSI2LOGICAL             {0}
ST_SYSV                     {0}

The following parameters are defined in linux/drivers/scsi/st.c :

ST_TIMEOUT                  {900*HZ}
ST_LONG_TIMEOUT             {14000*HZ}

D.4.2) st ioctls and user interface
-----------------------------------
The Linux tape interface is defined in
/usr/src/linux/include/linux/mtio.h

The following ioctl()s are listed with a brief explanation to the right.
[See st documentation for more details.]
MTIOCTOP        [execute tape commands and set drive/driver options]
MTIOCGET        [get the status of the drive]
MTIOCPOS        [get the current tape location]

open()          O_RDONLY, O_RDWR
close()
read()
write()
ioctl()

D.5) sg driver
--------------
The following header file in the kernel source is relevant to the sg
driver:

    /usr/src/linux/include/scsi/sg.h

As pointed out in section D.1 this is best included in applications using:

    #include

D.5.1) sg compile options
-------------------------
Here are some defines from the sg.h file that the user could conceivably
want to change. The current default values are shown in braces on the
right:

SG_SCATTER_SZ           {32768}
SG_DEF_RESERVED_SIZE    {SG_SCATTER_SZ}
SG_DEF_FORCE_LOW_DMA    {0}
SG_DEF_FORCE_PACK_ID    {0}
SG_DEF_KEEP_ORPHAN      {0}
SG_MAX_QUEUE            {16}
SG_DEFAULT_RETRIES      {1}
SG_BIG_BUFF             {SG_DEF_RESERVED_SIZE}
SG_DEFAULT_TIMEOUT      {60 seconds}
SG_DEF_COMMAND_Q        {0 but set to 1 when sg_io_hdr structure seen}
SG_DEF_UNDERRUN_FLAG    {0}

D.5.2) sg ioctls and user interface
-----------------------------------
The following ioctl()s are listed in alphabetical order with a brief
explanation to the right. [See sg documentation for more details.]
SG_EMULATED_HOST        [indicate if adapter is ide-scsi]
SG_GET_COMMAND_Q        [get state of command queuing flag]
SG_GET_KEEP_ORPHAN      [state of interrupted SG_IO keep orphan flag]
SG_GET_LOW_DMA          [state of "low dma flag" (<= 16 MB on i386)]
SG_GET_NUM_WAITING      [number of responses waiting to be read()]
SG_GET_PACK_ID          [pack_id of next to read() response (-1 if none)]
SG_GET_REQUEST_TABLE    [yields array of requests being processed]
SG_GET_RESERVED_SIZE    [current size of reserved buffer for this fd]
SG_GET_SCSI_ID          [a little more info than SCSI_IOCTL_GET_IDLUN]
SG_GET_SG_TABLESIZE     [max elements in host's scatter gather table]
SG_GET_TIMEOUT          [yields timeout (unit: jiffies (10ms on i386))]
SG_GET_TRANSFORM        [state of ide-scsi's transform flag]
SG_IO                   [send given SCSI command and wait for response]
SG_NEXT_CMD_LEN         [change command length of next command]
SG_SCSI_RESET           [send a SCSI bus, device or host reset]
SG_SET_COMMAND_Q        [set command queuing state {old=0, new=1}]
SG_SET_DEBUG            [set debug level {0}]
SG_SET_KEEP_ORPHAN      [set SG_IO's keep orphan flag {0}]
SG_SET_FORCE_LOW_DMA    [force DMA buffer low (<= 16 MB on i386) {0}]
SG_SET_FORCE_PACK_ID    [so read() can fetch by pack_id {0}]
SG_SET_RESERVED_SIZE    [change default buffer size {SG_DEF_RESERVED_SIZE}]
SG_SET_TIMEOUT          [change current timeout {60 secs}]
SG_SET_TRANSFORM        [set ide-scsi's ATAPI transform flag {0}]

open()          [recognized oflags: O_RDONLY, O_RDWR, O_EXCL, O_NONBLOCK]
close()
read()
write()
ioctl()
poll()          [used when in O_NONBLOCK mode]
fasync()        [associated with generation of SIGIO signal for read()]


Appendix E) References and Credits
----------------------------------
Web:
[W1] SCSI (draft) standards, resources
[W2] Eric Youngdale's site - Eric is the chief architect of the Linux
     SCSI subsystem
[W3] Jens Axboe's site - Jens maintains the cdrom subsystem which
     includes sr
[W4] the author's scsi generic (sg) site
[W5] Richard Gooch's devfs site:

Newsgroups:
linux-scsi@vger.rutgers.edu    [reflector]
linux-kernel@vger.rutgers.edu  [reflector]
Books:
[1] "Linux Device Drivers" by A. Rubini
    [O'Reilly 1998 ISBN 1-56592-292-1]
    - solid text book on subject circa lk 2.2/2.2 [new edition?]
[2] "Running Linux" 3rd edition by M. Welsh, M. K. Dalheimer & L. Kaufman
    [O'Reilly 1999 ISBN 1-56592-469-X]
    - classic Linux tome which includes some SCSI configuration info
[3] "The Programmer's Guide to SCSI" by B. Sawert
    [Addison Wesley 1998 ISBN 0-201-18538-5]
    - covers many topics, including Linux and ASPI/32

Credits:
This document was started by Douglas Gilbert with contributions from:
  - Kai Makisara (st)
  - Jens Axboe (sr)
  - Richard Gooch (devfs)