        A Discussion on the SG device driver in Linux 2.2.6
        ===================================================
                                                        Douglas Gilbert 
                                                        11th May 1999

Introduction
------------
In kernel 2.2.6 Linux received a new SCSI generic packet driver (sg) which was
the first significant upgrade since the original driver was released by
Lawrence Foard in 1992.

The interface (ie sg.h) is constrained by the requirement to be backward
compatible with the original interface which, in spite of its failings, is
used by a very large number of people. So the exercise is to add some useful
information while minimizing backward compatibility problems. These are 
conflicting aims.

The bulk of the changes are in the implementation (ie sg.c), adding per file
descriptor sequencing, command queuing and scatter gather. Another important
addition is to handle the more difficult kernel memory environment that Linux
presents in the 2.2 series of kernels (compared with the 2.0 series). Even 
the critics of the interface seem happy to accept these changes.

Some discussion has arisen about the interface and its 36 byte long structure
called "sg_header". The original header is presented followed by the 2 
proposed "enhanced" headers to replace it. This document attempts to 
demonstrate that they differ in the degree of backward compatibility while 
offering essentially the same amount of extra useful
information/capabilities.

Jörg Schilling has produced a similar document which readers are invited
to examine. It presents an alternate viewpoint. It can be found at:
http://www.fokus.gmd.de/research/cc/glone/employees/joerg.schilling/private/sg.txt
A sadder, less structured presentation can be found at:
http://www.fokus.gmd.de/research/cc/glone/employees/joerg.schilling/private/linuxscsi.html


The original header: HEADER 0
-----------------------------

struct sg_header {
    int pack_len;   /* [o] by implementation, [i] by documentation */
    int reply_len;  /* [i] */
    int pack_id;    /* [io], untouched by implementation */
    int result;     /* [o] */
    unsigned int twelve_byte:1;  /* [i] */
    unsigned int other_flags:31; /* "not used" according to documentation */
                                 /* also described as "for future use" */
    unsigned char sense_buffer[16]; /* [o] */
};

This interface was written by Lawrence Foard in 1992.
The "documentation" referred to is the SCSI Programming HOWTO v1.5
written by Heiko Eißfeldt on 7th May 1996. It is still the official
LDP HOWTO on this driver at the time of writing. When "documentation"
is referred to below it is a reference to this document.


Proposed HEADER A
-----------------

struct sg_header {
/*
 * cmd_status contains:
 * driver_byte << 24 | host_byte << 16 | msg_byte << 8 | status_byte
 */
#define sg_cmd_status   pack_len
  int pack_len;      /* [i] length of incoming packet (including header) 
                        / [o] command status */
  int reply_len;     /* [i] maximum length of expected reply 
                        / [o] actual transfer count */
  int pack_id;       /* [i/o] id number of packet */
  int result;        /* [o] 0==ok, otherwise refer to errno codes */
  unsigned int twelve_byte :1;  /* [i] OBSOLETE: Force 12 byte command 
                                   length (unused if want_new is set) */
  unsigned int want_new    :1;  /* [i] User requests new behavior */
  unsigned int grant_new   :1;  /* [o] Driver grants new behavior */
  unsigned int cdb_len     :5;  /* [i] Command descriptor block 
                                   length 6..31 (mandatory) */
  unsigned int sense_len   :5;  /* [i/o] Set max / get actual sense 
                                   length 0..31 (mandatory) */
  unsigned int other_flags :19; /* for future use */
  unsigned char sense_buffer[SG_MAX_SENSE];/* [o] used only by reads */
        /* XXX mid-level currently only allocates 16 bytes to sense data,
         * XXX this violates CCS, SCSI-2, SCSI-3, but upgrading would
         * XXX break sg binary compatibility. If the mid-level is corrected,
         * XXX we need to introduce a ioctl to read the sense data.
         */
  /* command follows then data for command */
};

This interface and the implementation it was originally distributed
with bear Heiko Eißfeldt's copyright. It is said that it first 
appeared in April 1998. Of late it has been promoted by Jörg Schilling. 


Proposed HEADER B
-----------------

struct sg_header
{
    int pack_len;    /* [o] reply_len (ie useless) ignored as input */
    int reply_len;   /* [i] max length of expected reply (inc. sg_header) */
    int pack_id;     /* [io] id number of packet (use ints >= 0) */
    int result;      /* [o] 0==ok, else (+ve) Unix errno code (e.g. EIO) */
    unsigned int twelve_byte:1;
        /* [i] Force 12 byte command length for group 6 & 7 commands  */
    unsigned int target_status:5;   /* [o] scsi status from target */
    unsigned int host_status:8;     /* [o] host status (see "DID" codes) */
    unsigned int driver_status:8;   /* [o] driver status+suggestion */
    unsigned int other_flags:10;    /* unused */
    unsigned char sense_buffer[SG_MAX_SENSE]; /* [o] Output in 3 cases:
           when target_status is CHECK_CONDITION or 
           when target_status is COMMAND_TERMINATED or
           when (driver_status & DRIVER_SENSE) is true. */
};      /* This structure is 36 bytes long on i386 */

This interface was written by the author and although not publicly
released until January 1999, it was presented to Eric Youngdale
and Jörg Schilling around September 1998. It was developed
without knowledge of HEADER A.


A Comparison of HEADER A and HEADER B
-------------------------------------

The first point to make is that neither interface is anywhere near
ideal since they are attempting to squeeze a bit of extra functionality
out of an existing interface in a backward compatible way. A
better approach may be a dual interface: one interface with little
or no extra functionality (eg HEADER 0 or HEADER B) together with
a completely new header that just happened to have an interface
identifying number in the position corresponding to 'reply_len'
which had a negative value. The implementation knows that 'reply_len'
is always given as input and should be positive for the current
interface. Hence a negative value could flag a completely different
interface, one with more in common with CAM, or at least one that
could support scatter gather in the user space and direct io.
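A minimal sketch of how such a dual interface might tell the two headers apart, assuming a hypothetical new header whose interface identifier occupies the slot where HEADER 0 keeps 'reply_len'. The names sg_classify, SG_IFACE_NEW_ID and the enum are invented for illustration; neither proposed driver contains them.

```c
#include <stddef.h>
#include <string.h>

/* HEADER 0 as shown above, with 16 bytes of sense data. */
struct sg_header {
    int pack_len;
    int reply_len;
    int pack_id;
    int result;
    unsigned int twelve_byte:1;
    unsigned int other_flags:31;
    unsigned char sense_buffer[16];
};

/* Hypothetical identifier for a completely new header. */
#define SG_IFACE_NEW_ID (-1)

enum sg_iface { SG_IFACE_OLD = 0, SG_IFACE_NEW = 1, SG_IFACE_BAD = 2 };

/* Classify the buffer handed to write() by the int found where HEADER 0
 * keeps 'reply_len'. Existing callers always supply a positive value there,
 * so any negative value is free to name a different interface. */
static enum sg_iface sg_classify(const void *buf)
{
    int id;
    /* memcpy avoids assuming anything about the alignment of buf */
    memcpy(&id, (const char *)buf + offsetof(struct sg_header, reply_len),
           sizeof(id));
    if (id > 0)
        return SG_IFACE_OLD;
    if (id == SG_IFACE_NEW_ID)
        return SG_IFACE_NEW;
    return SG_IFACE_BAD;   /* zero or an unknown negative identifier */
}
```

The attraction of this design is that old binaries never produce a negative 'reply_len', so no existing application could be misread as using the new header.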

Now to return to a comparison between HEADER A and B. An obvious 
observation is that HEADER A contains more than HEADER B. To make
this a little fairer, the features shown in HEADER A that cannot
be implemented in the 2.2 series of kernels (without a major rework
of the SCSI mid level, which is slated for the 2.3 series of kernels)
should be removed. This would remove the following:
        - ability to return DMA count
        - set max / get actual sense length
No implementation distributed with HEADER A provides these features
(because the 2.2 mid level cannot supply them).

This leaves one extra feature (ability to give cdb length) and a
different way of reporting errors as the major differences.

In his document concerning HEADER B when compared to HEADER A Jörg
Schilling states:
"This interface has the following advantages compared to the interface 
from Heiko and me:

        % none"

On the face of it, this is true. Now for a closer look. 


Criticism of HEADER A
---------------------

HEADER A is supposed to be backward compatible with HEADER 0 but HEADER A
compromises that backward compatibility to squeeze more functionality.
The 'cdb_len' extra functionality is also flawed.

The 'pack_len' variable is not suitable as an input variable: its name and
the documentation both suggest that it is input, but the original
implementation ignores it as input and puts an output value in it! This
leaves 2 bona fide input variables, 'reply_len' and 'twelve_byte'. So if these
are used, how do you get a new 5 bit 'cdb_len' input variable into such an
interface?

In my opinion the answer is: by living dangerously. A previously 
"not used" 31 bit field called 'other_flags' is used in certain bit positions
as input. The danger is that the buffer used to hold HEADER A is usually
either automatic (on the stack) or malloc-ed (on the heap) and thus
uninitialized. This point was discussed on the linux-scsi list in January
and February 1999. Jörg
Schilling has stated that all existing implementations should have interpreted
"not used" (or "reserved") as 'other_flags = 0;'. Even this is not sufficient
as will be demonstrated later. The only safe approach is to "memset" the
buffer holding HEADER A to zero before dispatching it. This is not documented.
The documentation gives example code that does not initialize HEADER 0 but 
is "safe" in this context because a static array at file scope holds the 
buffer (hence C initializes that array to 0).
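A minimal sketch of the safe approach, assuming the HEADER A layout shown above with SG_MAX_SENSE taken as 16 (the size the XXX comment in HEADER A says the mid level allocates). The helper prepare_header_a is hypothetical; it simply shows the memset happening before any input field is filled in.

```c
#include <string.h>

#define SG_MAX_SENSE 16   /* the 16 bytes the mid level allocates */

/* HEADER A as shown above (comments abbreviated). */
struct sg_header_a {
    int pack_len;
    int reply_len;
    int pack_id;
    int result;
    unsigned int twelve_byte :1;
    unsigned int want_new    :1;
    unsigned int grant_new   :1;
    unsigned int cdb_len     :5;
    unsigned int sense_len   :5;
    unsigned int other_flags :19;
    unsigned char sense_buffer[SG_MAX_SENSE];
};

/* Hypothetical helper: build header + cdb in 'buf' for write().
 * Returns the number of bytes used, or 0 if the arguments are unusable.
 * Without the memset, stale stack or heap bits would reach 'other_flags',
 * and a HEADER A driver would read them as 'want_new', 'cdb_len', ... */
static size_t prepare_header_a(unsigned char *buf, size_t buf_len,
                               const unsigned char *cdb, unsigned int cdb_len,
                               int reply_len, int pack_id)
{
    struct sg_header_a h;
    size_t total = sizeof(h) + cdb_len;

    if (total > buf_len || cdb_len < 6 || cdb_len > 31)
        return 0;
    memset(&h, 0, sizeof(h));          /* zero every bit field first */
    h.reply_len = reply_len;
    h.pack_id = pack_id;
    h.want_new = 1;                    /* explicitly request new behaviour */
    h.cdb_len = cdb_len;
    memcpy(buf, &h, sizeof(h));        /* copy avoids alignment concerns */
    memcpy(buf + sizeof(h), cdb, cdb_len);
    return total;
}
```

Building the header in a local struct and copying it out keeps the zeroing explicit even when the caller's buffer is full of garbage; it also guarantees that 'grant_new' reads back as 0 on a driver that never touches it.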

The risk in this case is that some applications already in use based on 
HEADER 0 do not arrange for other_flags to be zeroed. Jörg Schilling has
stated that no such significant applications exist. It is ironic that the
remnant 19 bit 'other_flags' field in HEADER A bears the comment "for future
use" and not "to be zeroed by app"!

The 'cdb_len' field is a more direct way of specifying the SCSI command 
length than the existing method of using the first byte of the command and
allowing group 6 and 7 command lengths to be overridden (from 10 to 12 bytes).
It does not seem that there are a lot of vendor specific SCSI commands that
need 'cdb_len' but it would be "nice to have". Are there any risks to backward
compatibility?  Well there is the one explained above (ie pushing input data 
through a previously "not used" field) and an additional one. The buffer passed
to write() on a sg device holds a HEADER [0, A or B], then the SCSI command,
followed by optional data. This buffer can become misaligned when an
application using HEADER A is run on the original driver. For example, if
a vendor has a group 7 SCSI "write-type" command that is 6 bytes long then 
the 'cdb_len' would be 6, the 6 byte SCSI command would follow HEADER A
and the data associated with the write would follow that. What would the
original driver make of this? It would interpret the SCSI command as
being 10 bytes long (since it is group 7) and do a misaligned write.
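The original driver's reasoning can be sketched as follows. The sketch assumes the per-group command lengths of the 2.2 mid level table are 6, 10, 10, 12, 12, 12, 10, 10 (indexed by the top 3 bits of the opcode) and that 'twelve_byte' only overrides groups 6 and 7; the function names and the opcode 0xE0 are illustrative.

```c
/* Per-group CDB lengths, assumed to match the 2.2 mid level table.
 * Groups 6 and 7 are vendor specific and default to 10 bytes. */
static const unsigned char group_len[8] = { 6, 10, 10, 12, 12, 12, 10, 10 };

/* Length the original (HEADER 0) driver assumes for a command. */
static unsigned int original_cdb_len(unsigned char opcode, int twelve_byte)
{
    unsigned int group = (opcode >> 5) & 7;   /* group code = top 3 bits */
    if (twelve_byte && group >= 6)
        return 12;
    return group_len[group];
}

/* Bytes of data the original driver would swallow as part of the command
 * when a HEADER A application placed only cdb_len command bytes. */
static int misalignment(unsigned char opcode, unsigned int cdb_len)
{
    return (int)original_cdb_len(opcode, 0) - (int)cdb_len;
}
```

For the 6 byte group 7 command in the example, misalignment(0xE0, 6) is 4: the first 4 bytes of the data would be sent as command bytes and the rest of the data would be written 4 bytes short.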

Another interesting point comes out of the above example. Unless an
application based on HEADER A arranges for 'grant_new' to be 0 (ie by
assignment or memset), it will read back a random bit because the
original driver doesn't touch that bit. Is this obvious when 'grant_new'
is documented as being "[o]" (ie output)?

Of the 3 criticisms brought up in this section, the first is addressed by
a statement of faith about existing applications, the second can be
addressed by using runtime selection of the interface and the third by
the judicious use of memset. It is also noted that the Linux transport
layer in the latest cdrecord (1.8a21) still uses compile time selection
to choose its interface. Runtime selection also makes the
'want_new'/'grant_new' flags in HEADER A redundant.

A criticism of the compacted error value is deferred till the next section.


Criticism of HEADER B
---------------------

This header has fewer extensions to HEADER 0, using previously "not used"
bits in 'other_flags' to output various error codes. It presents no
backward compatibility problem to applications compiled against HEADER 0
and run on a Linux kernel with a driver using HEADER B. In 2 months of
external testing (since 6th March 1999 in the "ac" series and since
16th April 1999 in the full Linux kernel (2.2.6)) no problems have been
presented in this area.

Documentation for the driver associated with HEADER B can be found at:
http://www.torque.net/sg/p/scsi-generic.txt          [abridged]
http://www.torque.net/sg/p/scsi-generic_long.txt 
and web pages at:
http://www.torque.net/sg 
In the main page is a link to the latest utilities and test programs.

More care is needed when an application tries to use HEADER B since there
is a high probability that it will be run on the original driver (HEADER 0).
Runtime selection is recommended in its documentation and is used by several
of its utilities (example programs). These utilities (eg sg_dd512 which is
a "dd" variant) run on both the new and original driver by using runtime 
selection to determine which driver they actually have. The recommended method
of runtime selection involves calling an ioctl only present in the new driver 
and noting whether an error occurs. If compile time techniques were used
then problems could arise from an application compiled with HEADER B but
run on the original driver: the error/status values (eg 'target_status')
would not be written by that driver, so they would hold whatever the
application had left in them. Hopefully this is made clear in the
documentation and reasonably obvious to application writers.
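A minimal sketch of that runtime probe. The helper sg_is_new_driver is hypothetical, and the ioctl request is passed in rather than named here: the caller is assumed to supply a request code that only the new driver implements, as its documentation describes. Any failure of the call is taken to mean the original driver.

```c
#include <sys/ioctl.h>

/* Hypothetical helper. 'probe_req' must be an ioctl request that only
 * the new driver understands; the original driver rejects it. */
static int sg_is_new_driver(int fd, unsigned long probe_req)
{
    int val = 0;

    if (ioctl(fd, probe_req, &val) < 0)
        return 0;   /* error: assume original driver, HEADER 0 only */
    return 1;       /* success: new driver, HEADER B status fields valid */
}
```

An application would call this once after open() and, when it returns 0, ignore 'target_status', 'host_status' and 'driver_status' entirely rather than trust bits the original driver never writes.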

Both HEADER A and HEADER B pass back similar error information, which is
almost totally absent from HEADER 0. HEADER A passes back
a compacted error integer via 'pack_len'. When the 4 component parts
are broken out separately, the logical mapping is
    HEADER A            HEADER B
    --------            --------
    status_byte         target_status [== ((status_byte >> 1) & 0x1f) ]
    msg_byte            <not available>
    host_byte           host_status
    driver_byte         driver_status

HEADER B outputs its 3 values in separate fields while HEADER A packs its
4 values as individual bytes within an integer. The latter approach mirrors
exactly the way that various levels within the SCSI sub-system exchange error
information. What if that packing changes in Linux 2.3 or beyond?
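The mapping in the table above can be sketched in a few lines. The struct and function names are illustrative only. The example value 0x08000002 assumes a driver byte of 0x08 (DRIVER_SENSE in Linux's <scsi/scsi.h>) and a raw status byte of 0x02 (CHECK CONDITION), which HEADER B reports as a 'target_status' of 0x01.

```c
/* The four bytes packed into HEADER A's 'pack_len' (sg_cmd_status). */
struct sg_status_parts {
    unsigned int status_byte;   /* raw SCSI status from the target */
    unsigned int msg_byte;
    unsigned int host_byte;
    unsigned int driver_byte;
};

/* driver_byte << 24 | host_byte << 16 | msg_byte << 8 | status_byte */
static struct sg_status_parts unpack_cmd_status(unsigned int cmd_status)
{
    struct sg_status_parts p;

    p.status_byte = cmd_status & 0xff;
    p.msg_byte    = (cmd_status >> 8) & 0xff;
    p.host_byte   = (cmd_status >> 16) & 0xff;
    p.driver_byte = (cmd_status >> 24) & 0xff;
    return p;
}

/* HEADER B's 5 bit 'target_status' derived from the raw status byte. */
static unsigned int to_target_status(unsigned int status_byte)
{
    return (status_byte >> 1) & 0x1f;
}
```

Either way the same information reaches the application; the difference is whether the byte positions inside an int become part of the user interface.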

In the following paragraphs points raised by Jörg Schilling are individually
addressed. His points are shown indented. "It" will usually refer to HEADER B
or something closely related to it.

> It hides the SCSI message byte from the user.
> As Douglas Gilbert's "driver enhancements" are intended to support
> tagged command queuing, knowledge of the message byte highly recommended.

The 'msg_byte' is a status code associated with the SCSI message level which
logically sits below the SCSI command level. Most Linux low level drivers
don't return the 'msg_byte' to the mid level and if they do and it is
anything other than COMMAND_COMPLETE (0x0) then it is treated as an error.
Architecturally the messaging protocol should be handled by the low level
device driver. Put simply, sg sees zero for this value.

> Only 5 of the 8 SCSI status bits are available from user space. As the 
> hidden bits from the status byte are defined "reserved" by the SCSI 
> standard, it could happen that the SCSI standard defines these bits to 
> be included in future standards. Creating an interface with such 
> limitations will then prevent access to the newly defined information.

Interesting point. Due to almost 15 years of abuse of the SCSI status byte it
is extremely unlikely that those reserved bits will ever be used. What is going
to happen is that the 9 defined values (in SCSI 2) out of the 32 available
are going to be expanded. The "abuse" referred to is code like this:
    #define ST_CHK_COND 0x2
    if (ST_CHK_COND == status_byte) ....
which assumes the 3 reserved bits are zero. It should read:
    if (ST_CHK_COND == (0x3e & status_byte)) ....
Masking is hard work. Then there is the clever code:
    if (status_byte & ST_CHK_COND) ....
It is clever because it picks up both CHECK CONDITION and COMMAND TERMINATED
(due to the latter having the code 0x22). It is not so clever when new
statuses are added that use the bit in question. [The latter line of code can
be found in cdrecord 1.8a21 libscg/scsi-linux-sg.c line 756.] 
Yes, the 'target_status' is not the same bit pattern as the SCSI status
byte but it is clearly documented as such and the <scsi/scsi.h> header file
has appropriately matching constants (eg #define CHECK_CONDITION 0x1).
The fact that macros and constants already existed in the Linux SCSI
sub-system suggested that its architects wanted it to be that way.
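The three styles of test discussed above behave differently once a reserved bit is set or COMMAND TERMINATED arrives. A small sketch, with illustrative function names and the raw status byte values given above (0x02 and 0x22):

```c
#define ST_CHK_COND  0x02   /* CHECK CONDITION, raw status byte */
#define ST_CMD_TERM  0x22   /* COMMAND TERMINATED, raw status byte */

/* Assumes the 3 reserved bits (0, 6 and 7) are zero. */
static int naive_chk(unsigned char status_byte)
{
    return status_byte == ST_CHK_COND;
}

/* Masks off the reserved bits first, as recommended above. */
static int masked_chk(unsigned char status_byte)
{
    return (status_byte & 0x3e) == ST_CHK_COND;
}

/* The "clever" test: true for any status with bit 1 set, so it also
 * matches COMMAND TERMINATED (0x22). */
static int clever_chk(unsigned char status_byte)
{
    return (status_byte & ST_CHK_COND) != 0;
}
```

Only the masked test gives the intended answer in both cases: the naive test misses CHECK CONDITION when bit 7 is set (0x82), and the clever test also fires for 0x22.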

> It does not allow to specify vendor unique SCSI cdb lengths.            

This would be nice to have. It is discussed above.

> It does not allow the user to see whether SCSI sense data is actually
> available. The only way to check for arrived sense data is to check for
> nonzero fields inside the sense buffer.

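It does. See its documentation: as the comment on 'sense_buffer' in HEADER B
states, sense data is output when 'target_status' is CHECK_CONDITION or
COMMAND_TERMINATED, or when (driver_status & DRIVER_SENSE) is true.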
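That check can be written directly from the three cases in the HEADER B comment. The constant values below are those this sketch assumes Linux's <scsi/scsi.h> uses (status codes there are the raw status byte shifted right by one bit); sense_available is an illustrative name.

```c
/* Values assumed to match Linux <scsi/scsi.h>. */
#define CHECK_CONDITION      0x01
#define COMMAND_TERMINATED   0x11
#define DRIVER_SENSE         0x08

/* HEADER B: non-zero when sense_buffer holds valid sense data. */
static int sense_available(unsigned int target_status,
                           unsigned int driver_status)
{
    return target_status == CHECK_CONDITION ||
           target_status == COMMAND_TERMINATED ||
           (driver_status & DRIVER_SENSE) != 0;
}
```

No scan of the sense buffer for non-zero bytes is needed.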

> The new ioctl SG_SET_FORCE_LOW_DMA is something that should never be 
> accessible from user space. Note that such an user interface may fail to
> port to other Linux platforms (like Sparc etc.).

In a perfect world Jörg Schilling is correct. It is placed there for
backward compatibility: to force the sg implementation to obtain all its
memory below the 16MB limit (on i386) since this is what the original did. 
That ioctl is not the only mode-changing ioctl, nor the only one that can
restore the original behaviour. The following defaults are taken from the new sg.h:
  /* Default modes, commented if they differ from original sg driver */
  #define SG_DEF_COMMAND_Q 0
  #define SG_DEF_MERGE_FD 0       /* was 1 -> per device sequencing */
  #define SG_DEF_FORCE_LOW_DMA 0  /* was 1 -> memory below 16MB on i386 */
  #define SG_DEF_FORCE_PACK_ID 0
  #define SG_DEF_UNDERRUN_FLAG 0
Associated with each one of these is a SET and GET ioctl to manipulate
them on a per file (or per device) basis. They have proved quite useful.
Cdparanoia III Alpha 9.4 managed to find a way to detect that command
queuing was present. Rather than throw command queuing out, it was
defaulted off.

The qualification on the 16MB ISA DMA problem is meant to imply that it is
primarily an "i386" architecture problem. The sg driver makes the standard
kernel calls to obtain memory. The controlling #define in the kernel is 
called MAX_DMA_ADDRESS and is architecture dependent. On both sparc
and sparc64 it is around 32 bits. With luck, no application will need
to call SG_SET_FORCE_LOW_DMA and even if one did then on many architectures
it would have no effect.

It is noteworthy that the SG_SET_FORCE_LOW_DMA ioctl is in the driver now
being offered by Jörg Schilling.


Implementation
--------------

The implementation associated with the August 1998 version of the driver
based on HEADER A added slightly better buffer capabilities to the
application. It had the unfortunate side effect of making kernel memory
allocation more difficult within the still-emerging 2.2 kernel. The latter
problem was the very reason a new sg driver based on HEADER B was being
considered for inclusion in February 1999. In early March 1999 the sg
driver based on HEADER B was included in Alan Cox's linux test version
2.2.2-ac6. It remained in every "ac" version until it went "live" in
2.2.6 on 16th April 1999, 6 weeks and several bug fixes after it had
first appeared in the "ac" version.

In mid-March 1999, about 2 weeks after the new driver first appeared in
an "ac" version, Jörg Schilling released a new version of the driver
based on HEADER A but this time it had almost the same implementation as
the driver based on HEADER B. A careful reading of the "sg.c" file in question (or the use
of "diff") will confirm this.

Is there anything wrong with this? Not really, this is what the free
software movement is about. However, in this case there is at least one
technical problem and several practical ones.

That implementation reserves kernel buffers for DMA purposes on a per 
file descriptor basis. The amount actually reserved depends on several 
dynamic variables (not least that memory is a limited resource). Jörg 
Schilling in his "hybrid" driver reports (via the ioctl SG_GET_BUFSIZE)
a constant (SG_BIG_BUFF) instead of the amount actually reserved. This
constant is then used by his linux transport layer in cdrecord. The author
has pointed this out on several occasions, but reporting the actual amount
does not match the cdrecord transport layer architecture, so the point is
deemed "incorrect". 

Then there are the practical issues of who will support that "hybrid"
driver, and of the 8 weeks of fine tuning, documentation, utility and test 
program work done on the released driver since Jörg Schilling forked the
development. To date there has been no attempt to track the implementation
of the released driver.

Jörg Schilling's stated aim (see sg.h in his new version) is that all
applications based on the sg device use his transport layer (called libscg). 
The "abstraction" that this transport layer offers is to make all other SCSI
generic interfaces, including CAM and NT's ASPI, look like a SCSI generic 
packet driver called scg that Jörg Schilling wrote in 1986 for SUN hardware. 
The danger here is that questionable design decisions in libscg become
reflected in the sg device driver and hence are forced on other sg-based
applications (eg SANE and cdparanoia) that are disinclined to use libscg.


Conclusion
----------

Even though HEADER A was first released publicly in the middle of 1998, it
was built on top of an implementation that did not supply the requested
enhancements. For whatever reasons it was not accepted at the time into the
development kernel (2.1.x). The author played no part in this decision making
process but in private email (August 1998) did point out some design
weaknesses.

After months of inaction and the release of the 2.2 series of kernels,
the author's implementation based on HEADER B was "dusted off" and tested
by people who were having real problems (usually memory related) with the
original sg driver. Alan Cox noticed this and put the new driver in 
2.2.2-ac6. Within weeks Jörg Schilling re-entered the fray with a driver
using HEADER A based on the same implementation. The extensions offered by
the two headers are mutually incompatible; still, an attempt was made to
merge them, which failed. Jörg Schilling put forward his technical objections
quite forcibly and was listened to. Those technical objections have been
the subject of this paper. They were either considered to lack merit or
not to be sufficient to change the sg driver under test.