A Discussion on the SG device driver in Linux 2.2.6
===================================================
                                        Douglas Gilbert
                                        11th May 1999

Introduction
------------
In kernel 2.2.6 Linux received a new SCSI generic packet driver (sg)
which was the first significant upgrade since the original driver was
released by Lawrence Foard in 1992. The interface (ie sg.h) is
constrained by the requirement to be backward compatible with the
original interface which, in spite of its failings, is used by a very
large number of people. So the exercise is to add some useful
information while minimizing backward compatibility problems. These
are conflicting aims.

The bulk of the changes are in the implementation (ie sg.c) adding per
file descriptor sequencing, command queuing and scatter gather.
Another important addition is to handle the more difficult kernel
memory environment that Linux presents in the 2.2 series of kernels
(compared with the 2.0 series). Even the critics of the interface seem
happy to accept these changes.

Some discussion has arisen about the interface and its 36 byte long
structure called "sg_header". The original header is presented
followed by the 2 proposed "enhanced" headers to replace it. This
document attempts to demonstrate that they differ in the degree of
backward compatibility while offering essentially the same amount of
extra useful information/capabilities.

Jörg Schilling has produced a similar document which readers are
invited to examine. It presents an alternate viewpoint.
It can be found at:
  http://www.fokus.gmd.de/research/cc/glone/employees/joerg.schilling/private/sg.txt
A sadder, less structured presentation can be found at:
  http://www.fokus.gmd.de/research/cc/glone/employees/joerg.schilling/private/linuxscsi.html


The original header: HEADER 0
-----------------------------
struct sg_header
{
    int pack_len;    /* [o] by implementation, [i] by documentation */
    int reply_len;   /* [i] */
    int pack_id;     /* [io], untouched by implementation */
    int result;      /* [o] */
    unsigned int twelve_byte:1;   /* [i] */
    unsigned int other_flags:31;  /* "not used" according to documentation */
                                  /* also described as "for future use" */
    unsigned char sense_buffer[16]; /* [o] */
};

This interface was written by Lawrence Foard in 1992. The
"documentation" referred to is the SCSI Programming HOWTO v1.5 written
by Heiko Eißfeldt on 7th May 1996. It is still the official LDP HOWTO
on this driver at the time of writing. When "documentation" is
referred to below it is a reference to this document.
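
The 36 byte figure quoted in the introduction, and the question of what
an application leaves in 'other_flags' (taken up under the criticism of
HEADER A), can both be illustrated with a small sketch. It is only a
sketch: the helper name sg_header_prepare() is hypothetical and is not
part of any sg.h, and the size holds only on ABIs with a 32 bit int
and the usual bit-field packing (such as i386).

```c
#include <assert.h>
#include <string.h>

/* HEADER 0 as listed above (comments shortened). */
struct sg_header {
    int pack_len;                   /* [o] by implementation */
    int reply_len;                  /* [i] */
    int pack_id;                    /* [io] */
    int result;                     /* [o] */
    unsigned int twelve_byte:1;     /* [i] */
    unsigned int other_flags:31;    /* "not used" */
    unsigned char sense_buffer[16]; /* [o] */
};

/*
 * Hypothetical helper (not part of any driver header): clear the whole
 * header before filling in the inputs, so that 'other_flags' and the
 * sense buffer never carry stack or heap garbage into write().
 */
static void sg_header_prepare(struct sg_header *hp, int reply_len,
                              int pack_id)
{
    assert(hp != NULL);
    memset(hp, 0, sizeof(*hp));
    hp->reply_len = reply_len;  /* input: max length of expected reply */
    hp->pack_id = pack_id;      /* application chosen packet id */
}
```

A header held in an automatic variable or a malloc-ed buffer would be
passed through such a helper before the SCSI command and any data are
appended to it.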

Proposed HEADER A
-----------------
struct sg_header
{
    /*
     * cmd_status contains:
     * driver_byte << 24 | host_byte << 16 | msg_byte << 8 | status_byte
     */
#define sg_cmd_status pack_len
    int pack_len;    /* [i] length of incoming packet (including header)
                        / [o] command status */
    int reply_len;   /* [i] maximum length of expected reply
                        / [o] actual transfer count */
    int pack_id;     /* [i/o] id number of packet */
    int result;      /* [o] 0==ok, otherwise refer to errno codes */
    unsigned int twelve_byte :1;  /* [i] OBSOLETE: Force 12 byte command
                                     length (unused if want_new is set) */
    unsigned int want_new    :1;  /* [i] User requests new behavior */
    unsigned int grant_new   :1;  /* [o] Driver grants new behavior */
    unsigned int cdb_len     :5;  /* [i] Command descriptor block length
                                     6..31 (mandatory) */
    unsigned int sense_len   :5;  /* [i/o] Set max / get actual sense
                                     length 0..31 (mandatory) */
    unsigned int other_flags :19; /* for future use */
    unsigned char sense_buffer[SG_MAX_SENSE]; /* [o] used only by reads */
    /* XXX mid-level currently only allocates 16 bytes to sense data,
     * XXX this violates CCS, SCSI-2, SCSI-3, but upgrading would
     * XXX break sg binary compatibility. If the mid-level is corrected,
     * XXX we need to introduce a ioctl to read the sense data.
     */
    /* command follows then data for command */
};

This interface and the implementation it was originally distributed
with bear Heiko Eißfeldt's copyright. It is said that it first
appeared in April 1998. Of late it has been promoted by Jörg Schilling.


Proposed HEADER B
-----------------
struct sg_header
{
    int pack_len;    /* [o] reply_len (ie useless) ignored as input */
    int reply_len;   /* [i] max length of expected reply (inc. sg_header) */
    int pack_id;     /* [io] id number of packet (use ints >= 0) */
    int result;      /* [o] 0==ok, else (+ve) Unix errno code (e.g. EIO) */
    unsigned int twelve_byte:1;    /* [i] Force 12 byte command length
                                      for group 6 & 7 commands */
    unsigned int target_status:5;  /* [o] scsi status from target */
    unsigned int host_status:8;    /* [o] host status (see "DID" codes) */
    unsigned int driver_status:8;  /* [o] driver status+suggestion */
    unsigned int other_flags:10;   /* unused */
    unsigned char sense_buffer[SG_MAX_SENSE]; /* [o] Output in 3 cases:
           when target_status is CHECK_CONDITION or
           when target_status is COMMAND_TERMINATED or
           when (driver_status & DRIVER_SENSE) is true. */
};      /* This structure is 36 bytes long on i386 */

This interface was written by the author and, although not publicly
released until January 1999, it was presented to Eric Youngdale and
Jörg Schilling around September 1998. It was developed without
knowledge of HEADER A.


A Comparison of HEADER A and HEADER B
-------------------------------------
The first point to make is that neither interface is anywhere near
ideal since they are attempting to squeeze a bit of extra
functionality out of an existing interface in a backward compatible
way. A better approach may be a dual interface: one interface with
little or no extra functionality (eg HEADER 0 or HEADER B) together
with a completely new header that has an interface identifying number,
with a negative value, in the position corresponding to 'reply_len'.
The implementation knows that 'reply_len' is always given as input and
should be positive for the current interface. Hence a negative value
could flag a completely different interface, one with more in common
with CAM, or at least one that could support scatter gather in the
user space and direct io.

Now to return to a comparison between HEADER A and B. An obvious
observation is that HEADER A contains more than HEADER B.
To make this a little fairer the features shown in HEADER A that
cannot be implemented in the 2.2 series of kernels (without a major
rework of the SCSI mid level, which is slated for the 2.3 series of
kernels) should be removed. This would remove the following:
    - ability to return DMA count
    - set max / get actual sense length
No implementation offered with HEADER A offers these features (because
they are not available). This leaves one extra feature (ability to
give cdb length) and a different way of reporting errors as the major
differences.

In his document concerning HEADER B when compared to HEADER A Jörg
Schilling states:
    "This interface has the following advantages compared to the
     interface from Heiko and me:
     % none"
On the face of this it is true. Now for a closer look.


Criticism of HEADER A
---------------------
HEADER A is supposed to be backward compatible with HEADER 0 but
HEADER A compromises that backward compatibility to squeeze in more
functionality. The 'cdb_len' extra functionality is also flawed.

The 'pack_len' variable is not suitable as an input variable: its name
suggests it is input, as does the documentation, but the original
implementation ignores it as input and puts an output value in it!
This leaves 2 bona fide input variables: 'reply_len' and
'twelve_byte'. So if these are used how do you get a new 'cdb_len'
5 bit input variable into such an interface? In my opinion the answer
to this is by living dangerously.

A previously "not used" 31 bit field called 'other_flags' is used in
certain bit positions as input. The danger is that the buffer used to
hold HEADER A is either automatic (on the stack) or malloc-ed (on the
heap) and thus uninitialized. This point was discussed on the
linux-scsi list in January and February 1999. Jörg Schilling has
stated that all existing implementations should have interpreted "not
used" (or "reserved") as 'other_flags = 0;'. Even this is not
sufficient as will be demonstrated later.
The only safe approach is to "memset" the buffer holding HEADER A to
zero before dispatching it. This is not documented. The documentation
gives example code that does not initialize HEADER 0 but is "safe" in
this context because a static array at file scope holds the buffer
(hence C initializes that array to 0).

The risk in this case is that some applications already in use based
on HEADER 0 do not arrange for other_flags to be zeroed. Jörg
Schilling has stated that no such significant applications exist. It
is ironic that the remnant 19 bit 'other_flags' field in HEADER A
bears the comment "reserved" and not "to be zeroed by app"!

The 'cdb_len' field is a more direct way of specifying the SCSI
command length than the existing method of using the first byte of the
command and allowing group 6 and 7 command lengths to be overridden
(from 10 to 12 bytes). It does not seem that there are a lot of vendor
specific SCSI commands that need 'cdb_len' but it would be "nice to
have". Are there any risks to backward compatibility? Well there is
the one explained above (ie pushing input data through a previously
"not used" field) and an additional one.

The buffer passed to write() on a sg device holds a HEADER [0, A or
B], then the SCSI command, followed by optional data. There is a
misalignment possibility on this buffer when an application using
HEADER A tries to use the original driver. For example if a vendor has
a group 7 SCSI "write-type" command that is 6 bytes long then the
'cdb_len' would be 6, the 6 byte SCSI command would follow HEADER A
and the data associated with the write would follow that. What would
the original driver make of this? It would interpret the SCSI command
as being 10 bytes long (since it is group 7) and do a misaligned
write.

Another interesting point comes out of the above example.
Unless an application based on HEADER A arranges to have a 0
'grant_new' (ie by assignment or memset) then it will read back a
random bit because the original driver doesn't touch that bit. Is this
obvious when 'grant_new' is documented as being "[o]" (ie output)?

Of the 3 criticisms brought up in this section, the first is addressed
by a statement of faith about existing applications, the second can be
addressed by using runtime selection of the interface and the third by
the judicious use of memset. It is also noted that the Linux transport
layer in the latest cdrecord (1.8a21) still uses compile time
selection to choose its interface. Runtime selection also makes the
'want_new'/'grant_new' flags in HEADER A redundant.

A criticism of the compacted error value is deferred till the next
section.


Criticism of HEADER B
---------------------
This header has fewer extensions to HEADER 0, using previously "not
used" bits in 'other_flags' to output various error codes. It presents
no backward compatibility problem to applications compiled against
HEADER 0 and using a Linux kernel with a driver using HEADER B. In 2
months of external testing (since 6th March 1999 in the "ac" series
and since 16th April 1999 in the full Linux kernel (2.2.6)) no
problems have been presented in this area.

Documentation for the driver associated with HEADER B can be found at:
    http://www.torque.net/sg/p/scsi-generic.txt [abridged]
    http://www.torque.net/sg/p/scsi-generic_long.txt
and web pages at:
    http://www.torque.net/sg
In the main page is a link to the latest utilities and test programs.

More care is needed when an application tries to use HEADER B since
there is a high probability that it will be run on the original driver
(HEADER 0). Runtime selection is recommended in its documentation and
is used by several of its utilities (example programs).
These utilities (eg sg_dd512 which is a "dd" variant) run on both the
new and original driver by using runtime selection to determine which
driver they actually have. The recommended method of runtime selection
involves calling an ioctl only present in the new driver and noting
whether an error occurs.

If compile time techniques were used then problems could arise from an
application compiled with HEADER B but run on the original driver. The
error/status values (eg 'target_status') would not be written by the
driver. Hopefully this is made clear in the documentation and is
reasonably obvious to application writers.

Similar error information, almost totally absent from HEADER 0, is
passed back to applications by HEADER A and HEADER B. HEADER A passes
back a compacted error integer via 'pack_len'. When the 4 component
parts are broken out separately, the logical mapping is:

    HEADER A        HEADER B
    --------        --------
    status_byte     target_status  [== ((status_byte >> 1) & 0x1f) ]
    msg_byte
    host_byte       host_status
    driver_byte     driver_status

HEADER B outputs its 3 values in separate fields while HEADER A
outputs its 4 values as byte masks within an integer. The latter
approach is exactly the same way that various levels within the SCSI
sub-system exchange error information. What if it changes in Linux 2.3
or beyond?

In the following paragraphs points raised by Jörg Schilling are
individually addressed. His points are shown indented. "It" will
usually refer to HEADER B or something closely related to it.

  > It hides the SCSI message byte from the user.
  > As Douglas Gilbert's "driver enhancements" are intended to support
  > tagged command queuing, knowledge of the message byte highly recommended.

The 'msg_byte' is a status code associated with the SCSI message level
which logically sits below the SCSI command level.
Most Linux low level drivers don't return the 'msg_byte' to the mid
level and if they do and it is anything other than COMMAND_COMPLETE
(0x0) then it is treated as an error. Architecturally the messaging
protocol should be handled by the low level device driver. Put simply,
sg sees zero for this value.

  > Only 5 of the 8 SCSI status bits are available from user space. As the
  > hidden bits from the status byte are defined "reserved" by the SCSI
  > standard, it could happen that the SCSI standard defines these bits to
  > be included in future standards. Creating an interface with such
  > limitations will then prevent access to the newly defined information.

Interesting point. Due to almost 15 years of abuse of the SCSI status
byte it is extremely unlikely that those reserved bits will ever be
used. What is going to happen is that the 9 defined values (in SCSI 2)
out of the 32 available are going to be expanded. The "abuse" referred
to is code like this:
    #define ST_CHK_COND 0x2
    if (ST_CHK_COND == status_byte) ....
which assumes the 3 reserved bits are zero. It should read:
    if (ST_CHK_COND == (0x3e & status_byte)) ....
Masking is hard work. Then there is the clever code:
    if (status_byte & ST_CHK_COND) ....
It is clever because it picks up both CHECK CONDITION and COMMAND
TERMINATED (due to the latter having the code 0x22). It is not so
clever when new statuses are added that use the bit in question. [The
latter line of code can be found in cdrecord 1.8a21
libscg/scsi-linux-sg.c line 756.]

Yes, the 'target_status' is not the same bit pattern as the SCSI
status byte but it is clearly documented as such and the header file
has appropriately matching constants (eg #define CHECK_CONDITION 0x1).
The fact that macros and constants already existed in the Linux SCSI
sub-system suggested that its architects wanted it to be that way.

  > It does not allow to specify vendor unique SCSI cdb lengths.

This would be nice to have. It is discussed above.
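
The mapping between the two headers' error values, and the three styles
of status test quoted above, can be sketched in a few lines of C. This
is an illustration only: the struct sg_status_parts and the function
names are hypothetical and appear in neither driver, and an unsigned 32
bit type is used for the compacted value (HEADER A stores it in the int
'pack_len') to keep the shifts well defined.

```c
#include <assert.h>
#include <stdint.h>

#define ST_CHK_COND 0x2     /* CHECK CONDITION in the raw status byte */

/* Hypothetical container for the 4 parts of HEADER A's cmd_status. */
struct sg_status_parts {
    unsigned status_byte;   /* raw SCSI status byte from the target */
    unsigned msg_byte;
    unsigned host_byte;
    unsigned driver_byte;
};

/*
 * Split the compacted value described in HEADER A:
 * driver_byte << 24 | host_byte << 16 | msg_byte << 8 | status_byte
 */
static struct sg_status_parts sg_unpack_cmd_status(uint32_t cmd_status)
{
    struct sg_status_parts p;

    p.status_byte = cmd_status & 0xff;
    p.msg_byte    = (cmd_status >> 8) & 0xff;
    p.host_byte   = (cmd_status >> 16) & 0xff;
    p.driver_byte = (cmd_status >> 24) & 0xff;
    return p;
}

/* HEADER B's 5 bit target_status, as given in the mapping above. */
static unsigned sg_target_status(unsigned status_byte)
{
    return (status_byte >> 1) & 0x1f;
}

/* The masked comparison: reserved bits in the status byte are ignored. */
static int sg_is_check_condition(unsigned status_byte)
{
    return (status_byte & 0x3e) == ST_CHK_COND;
}
```

With these definitions a raw status byte of 0x02 gives a target_status
of 0x1, matching the CHECK_CONDITION constant mentioned above, while
0x22 (COMMAND TERMINATED) is rejected by the masked comparison even
though the bitwise-and form `status_byte & ST_CHK_COND` accepts it.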

  > It does not allow the user to see whether SCSI sense data is actually
  > available. The only way to check for arrived sense data is to check for
  > nonzero fields inside the sense buffer.

It does. See its documentation.

  > The new ioctl SG_SET_FORCE_LOW_DMA is something that should never be
  > accessible from user space. Note that such an user interface may fail to
  > port to other Linux platforms (like Sparc etc.).

In a perfect world Jörg Schilling is correct. It is placed there for
backward compatibility: to force the sg implementation to obtain all
its memory below the 16MB limit (on i386) since this is what the
original did. That ioctl is not the only mode changing ioctl, nor is
its default the only one that can be changed to force old behaviour.
The following defaults are taken from the new sg.h :

/* Default modes, commented if they differ from original sg driver */
#define SG_DEF_COMMAND_Q 0
#define SG_DEF_MERGE_FD 0       /* was 1 -> per device sequencing */
#define SG_DEF_FORCE_LOW_DMA 0  /* was 1 -> memory below 16MB on i386 */
#define SG_DEF_FORCE_PACK_ID 0
#define SG_DEF_UNDERRUN_FLAG 0

Associated with each one of these is a SET and GET ioctl to manipulate
them on a per file (or per device) basis. They have proved quite
useful. Cdparanoia III Alpha 9.4 managed to find a way to detect that
command queuing was present. Rather than throw it out, it was
defaulted off.

The qualification on the 16MB ISA DMA problem is meant to imply that
it is primarily an "i386" architecture problem. The sg driver makes
the standard kernel calls to obtain memory. The controlling #define in
the kernel is called MAX_DMA_ADDRESS and is architecture dependent. On
both the sparc and sparc64 this is around 32 bits. With luck, no
application may need to call SG_SET_FORCE_LOW_DMA and even if one did
then on many architectures it would have no effect. It is noteworthy
that the SG_SET_FORCE_LOW_DMA ioctl is in the driver now being offered
by Jörg Schilling.
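
Returning to the question of whether sense data is available: the
comment on 'sense_buffer' in HEADER B lists three cases, and they can
be tested without inspecting the sense buffer itself. The following is
a sketch under stated assumptions. The function name is hypothetical.
CHECK_CONDITION is 0x1 as stated earlier and COMMAND_TERMINATED follows
from the 0x22 status code shifted right by one; the DRIVER_SENSE value
of 0x08 is assumed from the Linux SCSI headers and should be checked
against the headers actually in use.

```c
#include <assert.h>

/* Shifted status codes, as compared against 'target_status'. */
#define CHECK_CONDITION    0x01   /* stated in the text above */
#define COMMAND_TERMINATED 0x11   /* 0x22 >> 1 */
#define DRIVER_SENSE       0x08   /* assumed value, see lead-in */

/*
 * Hypothetical helper: returns non-zero when, according to the HEADER B
 * comment, the driver has written sense data into 'sense_buffer'.
 */
static int sg_sense_available(unsigned target_status,
                              unsigned driver_status)
{
    assert(target_status <= 0x1f);  /* 'target_status' is a 5 bit field */
    return target_status == CHECK_CONDITION ||
           target_status == COMMAND_TERMINATED ||
           (driver_status & DRIVER_SENSE) != 0;
}
```

An application would call this with the 'target_status' and
'driver_status' fields read back from the header, and only on a driver
that runtime selection has shown to be the new one.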


Implementation
--------------
The implementation associated with the August 1998 version of the
driver based on HEADER A added slightly better buffer capabilities to
the application. It had the unfortunate side effect of making kernel
memory allocation more difficult within the still-emerging 2.2 kernel.
The latter problem was the very reason a new sg driver based on HEADER
B was being considered for inclusion in February 1999.

In early March 1999 the sg driver based on HEADER B was included in
Alan Cox's linux test version 2.2.2-ac6. It remained in every "ac"
version until it went "live" in 2.2.6 on 16th April 1999, 6 weeks and
several bug fixes after it had first appeared in the "ac" version.

In mid-March 1999, about 2 weeks after the new driver first appeared
in an "ac" version, Jörg Schilling released a new version of the
driver based on HEADER A but this time it had almost the same
implementation as HEADER B. A careful reading of the "sg.c" file in
question (or the use of "diff") will confirm this. Is there anything
wrong with this? Not really, this is what the free software movement
is about. In this case, however, there is at least one technical
problem and several practical ones.

That implementation reserves kernel buffers for DMA purposes on a per
file descriptor basis. The amount actually reserved depends on several
dynamic variables (not least that memory is a limited resource). Jörg
Schilling in his "hybrid" driver reports (via the ioctl
SG_GET_BUFSIZE) a constant (SG_BIG_BUFF) instead of the variable. This
constant is then used by his linux transport layer in cdrecord. The
author has pointed this out on several occasions, but this does not
match the cdrecord transport level architecture so it is "incorrect".

Then there are the practical issues of who will support that "hybrid"
driver and the 8 weeks of fine tuning, documentation, utility and test
program work since Jörg Schilling forked the development.
To date there has been no attempt to track the implementation of the
released driver.

Jörg Schilling's stated aim (see sg.h in his new version) is that all
applications based on the sg device use his transport layer (called
libscg). The "abstraction" that this transport layer offers is to make
all other SCSI generic interfaces, including CAM and NT's ASPI, look
like a SCSI generic packet driver called scg that Jörg Schilling wrote
in 1986 for SUN hardware. The danger here is that questionable design
decisions in libscg become reflected in the sg device driver and hence
are forced on other sg-based applications (eg SANE and cdparanoia)
that are disinclined to use libscg.


Conclusion
----------
Even though HEADER A was first released publicly in the middle of 1998
it was built on top of an implementation that did not supply the
requested enhancements. For whatever reasons it was not accepted at
the time into the development kernel (2.1.x). The author played no
part in this decision making process but in private email (August
1998) did point out some design weaknesses.

After months of inaction and the release of the 2.2 series of kernels,
the author's implementation based on HEADER B was "dusted off" and
tested by people who were having real problems (usually memory
related) with the original sg driver. Alan Cox noticed this and put
the new driver in 2.2.2-ac6. Within weeks Jörg Schilling re-entered
the fray with a driver using HEADER A based on the same
implementation.

The extensions offered by the two headers are not mutually compatible.
Still, an attempt was made to merge them, which failed. Jörg Schilling
put his technical objections up quite forcibly and was listened to.
Those technical objections have been the subject of this paper. Those
objections were either considered not to have merit or not to be
sufficient to change the sg driver under test.