Introduction
Linux Version 3 SCSI Generic Driver
The advantage of direct IO is that it allows the SCSI adapter's DMA chip to transfer data directly into (or from) user memory. This saves the copy through kernel buffers that otherwise occurs. For relatively slow SCSI devices such as CD writers and scanners (except high end professional models) direct IO offers very little performance advantage.
Sg is still capable of transferring data at 40 MB/sec (at less than 25% CPU utilization) using typical PC hardware (e.g. 500 MHz Celeron) with indirect IO. That is about as fast as a single high performance SCSI disk can source continuous data today (e.g. Seagate ST318451LW 15,000 rpm 3.9ms access).
If direct IO is requested and if it is not available then the driver transparently performs indirect IO. The 'info' output member in sg_io_hdr flags what type of transfer actually occurred. Reasons why direct IO may not be available include:
The user space scatter gather is independent of scatter gather performed by the DMA chip associated with most modern SCSI adapters. It has no alignment requirements and it is added as a convenience for applications that need to marshal a lot of data.
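As a rough illustration of the user space scatter gather described above, the sketch below points a version 3 header at an array of sg_iovec_t elements. The helper name sg_attach_iovec is made up for this example, and it assumes that dxfer_len should be set to the sum of the iov_len values; check that against the driver version in use.

```c
#include <stddef.h>
#include <scsi/sg.h>

/* Hypothetical helper: make 'hdr' use a scatter gather list rather than
 * one flat buffer. The 'iov' array and the memory it points at must stay
 * valid until the command completes. Returns the total byte count.
 * Assumption: dxfer_len is the sum of all iov_len values. */
unsigned int sg_attach_iovec(sg_io_hdr_t *hdr, sg_iovec_t *iov,
                             unsigned short count)
{
    unsigned int total = 0;
    unsigned short i;

    for (i = 0; i < count; i++)
        total += (unsigned int)iov[i].iov_len;

    hdr->iovec_count = count;   /* non-zero selects scatter gather */
    hdr->dxferp = iov;          /* points at the list, not at data */
    hdr->dxfer_len = total;
    return total;
}
```

With two buffers of 16 and 48 bytes in the list, the helper sets iovec_count to 2 and dxfer_len to 64. The elements need no particular alignment.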
Pointers to the SCSI command, the data buffer (which may be a scatter gather list) and the sense buffer are passed through to the sg_io_hdr interface in the version 3 driver. This frees the user of the new interface from having to marshal and unmarshal this distinct data into a single buffer as required in the sg_header interface. Several applications that support multiple OSes in their transport layer are forced to allocate additional buffers and do otherwise redundant copies to cope with the alignment restrictions of the sg_header structure interface to sg. The SANE application for scanners is an example of this wasteful procedure. That "buffer shuffling" will no longer be required with the new interface.
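To show what "three separate pointers" looks like in practice, here is a minimal sketch that fills a header for a 6 byte INQUIRY command. The function name sg_prepare_inquiry and the 20 second timeout are choices made for this example only; the command, response and sense buffers stay wherever the application already keeps them.

```c
#include <string.h>
#include <scsi/sg.h>

#define INQ_CMD_LEN 6

/* Hypothetical helper: set up 'hdr' for a standard INQUIRY.
 * 'cdb' must have room for 6 bytes. Nothing is copied; the header
 * only records where the three buffers live. */
void sg_prepare_inquiry(sg_io_hdr_t *hdr, unsigned char *cdb,
                        unsigned char *resp, unsigned char resp_len,
                        unsigned char *sense, unsigned char sense_len)
{
    memset(cdb, 0, INQ_CMD_LEN);
    cdb[0] = 0x12;              /* INQUIRY opcode */
    cdb[4] = resp_len;          /* allocation length */

    memset(hdr, 0, sizeof(*hdr));
    hdr->interface_id = 'S';
    hdr->dxfer_direction = SG_DXFER_FROM_DEV;
    hdr->cmd_len = INQ_CMD_LEN;
    hdr->cmdp = cdb;            /* SCSI command */
    hdr->dxfer_len = resp_len;
    hdr->dxferp = resp;         /* data buffer */
    hdr->mx_sb_len = sense_len;
    hdr->sbp = sense;           /* sense buffer */
    hdr->timeout = 20000;       /* milliseconds; arbitrary for this sketch */
}
```

After this the application would call ioctl(sg_fd, SG_IO, &hdr) on an open sg device and then inspect the output members of the header.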
The residual DMA count is not supported by all SCSI adapters. Those that don't support it yield 0. At the time of writing the advansys, sym53c8xx, aha152x and aic7xxx adapter drivers support resid.
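A small sketch of how an application might use resid follows. The helper name sg_bytes_transferred is invented here, and ignoring an out-of-range residual is an assumption of this example, not documented driver behaviour.

```c
#include <scsi/sg.h>

/* Hypothetical helper: bytes actually moved, i.e. dxfer_len - resid.
 * Adapters without residual support report resid == 0, so the result
 * is then simply the requested length. A negative residual, or one
 * larger than dxfer_len, is treated as unusable (assumption). */
unsigned int sg_bytes_transferred(const sg_io_hdr_t *hdr)
{
    if (hdr->resid <= 0 || (unsigned int)hdr->resid > hdr->dxfer_len)
        return hdr->dxfer_len;
    return hdr->dxfer_len - (unsigned int)hdr->resid;
}
```

Because a resid of 0 can mean either "everything was transferred" or "this adapter does not report residuals", the result is only an upper bound unless the adapter driver is known to support resid.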
This definition of sg_io_hdr opens up the possibility of it also being used as an alternate structure passed to the SCSI_IOCTL_SEND_COMMAND ioctl() command. The current rather inflexible structure (called Scsi_Ioctl_Command) also requires a non-negative integer in its second integer position. This would make the new interface structure available to all SCSI devices (not just sg).
typedef struct sg_iovec { /* parallels "struct iovec" in readv() system call */
    void * iov_base;            /* start address */
    size_t iov_len;             /* length in bytes */
} sg_iovec_t; /* the scatter-gather list is an array of objects of this type */

typedef struct sg_io_hdr
{
    int interface_id;           /* [i] 'S' for SCSI generic (required) */
    int dxfer_direction;        /* [i] data transfer direction */
    unsigned char cmd_len;      /* [i] SCSI command length ( <= 16 bytes) */
    unsigned char mx_sb_len;    /* [i] max length to write to sbp */
    unsigned short iovec_count; /* [i] 0 implies no scatter gather */
    unsigned int dxfer_len;     /* [i] byte count of data transfer */
    void * dxferp;              /* [i] [*io] points to data transfer memory or
                                   scatter gather list */
    unsigned char * cmdp;       /* [i] [*i] points to SCSI command to perform */
    unsigned char * sbp;        /* [i] [*o] points to sense_buffer memory */
    unsigned int timeout;       /* [i] MAX_UINT->no timeout (unit: millisec) */
    unsigned int flags;         /* [i] 0 -> default, see SG_FLAG... */
    int pack_id;                /* [i->o] unused internally (normally) */
    void * usr_ptr;             /* [i->o] unused internally */
    unsigned char status;       /* [o] scsi status */
    unsigned char masked_status;/* [o] shifted, masked scsi status */
    unsigned char msg_status;   /* [o] messaging level data (optional) */
    unsigned char sb_len_wr;    /* [o] byte count actually written to sbp */
    unsigned short host_status; /* [o] errors from host adapter */
    unsigned short driver_status;/* [o] errors from software driver */
    int resid;                  /* [o] dxfer_len - actual_transferred */
    unsigned int duration;      /* [o] time taken (unit: millisec) */
    unsigned int info;          /* [o] auxiliary information */
} sg_io_hdr_t;  /* around 64 bytes long (on i386) */

/* Use negative values to flag difference from original sg_header structure */
#define SG_DXFER_NONE -1        /* e.g. a SCSI Test Unit Ready command */
#define SG_DXFER_TO_DEV -2      /* e.g. a SCSI WRITE command */
#define SG_DXFER_FROM_DEV -3    /* e.g. a SCSI READ command */
#define SG_DXFER_TO_FROM_DEV -4 /* treated like SG_DXFER_FROM_DEV with the
                                   additional property that during indirect
                                   IO the user buffer is copied into the
                                   kernel buffers before the transfer */
#define SG_DXFER_UNKNOWN -5     /* Unknown data direction */

/* following flag values can be "or"-ed together */
#define SG_FLAG_DIRECT_IO 1     /* default is indirect IO */
#define SG_FLAG_LUN_INHIBIT 2   /* default is to put device's lun into */
                                /* the 2nd byte of SCSI command */
#define SG_FLAG_MMAP_IO 4       /* selects memory mapped IO. Introduced in
                                   version 3.1.22 . May not be present in
                                   GNU library headers for some time */
#define SG_FLAG_NO_DXFER 0x10000 /* no transfer of kernel buffers to/from */
                                /* user space (debug indirect IO) */

/* following 'info' values are "or"-ed together */
#define SG_INFO_OK_MASK 0x1
#define SG_INFO_OK 0x0          /* no sense, host nor driver "noise" */
#define SG_INFO_CHECK 0x1       /* something abnormal happened */

#define SG_INFO_DIRECT_IO_MASK 0x6
#define SG_INFO_INDIRECT_IO 0x0 /* data xfer via kernel buffers (or no xfer) */
#define SG_INFO_DIRECT_IO 0x2
#define SG_INFO_MIXED_IO 0x4    /* part direct, part indirect IO */

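Since a request for direct IO may silently fall back to indirect IO, an application that cares needs to decode the 'info' member after each command. The sketch below applies the two masks from the listing; the function names sg_io_mode and sg_needs_check are hypothetical.

```c
#include <scsi/sg.h>

/* Hypothetical helper: name the kind of transfer that actually occurred. */
const char *sg_io_mode(unsigned int info)
{
    switch (info & SG_INFO_DIRECT_IO_MASK) {
    case SG_INFO_DIRECT_IO:
        return "direct";
    case SG_INFO_MIXED_IO:
        return "mixed";
    default:                    /* SG_INFO_INDIRECT_IO, or no transfer */
        return "indirect";
    }
}

/* Hypothetical helper: non-zero when the status, host_status,
 * driver_status or sense buffer deserve a closer look. */
int sg_needs_check(unsigned int info)
{
    return (info & SG_INFO_OK_MASK) == SG_INFO_CHECK;
}
```

A caller that set SG_FLAG_DIRECT_IO in 'flags' and then sees sg_io_mode(io_hdr.info) report "indirect" knows the driver fell back to copying through kernel buffers.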
Direct IO involves locking down user allocated memory. This is to stop user memory "moving" while the DMA element in the SCSI adapter is accessing it. Linux locks RAM in PAGE_SIZE units (4096 bytes on the i386 architecture). The user of "dio" needs to take care that a single unit of memory is not locked more than once during one sg transaction. Without precautions this could happen when queuing commands, accessing sg via multiple threads or some external mechanism (e.g. mlock() ?). Another worrying scenario is several processes sharing memory that contains a buffer used by sg (this happens in some SANE drivers). The following code snippet will ensure that no two allocations are given the same page:
    psz = getpagesize();
    if (NULL == (alloc_bp = malloc(sz + psz)))  /* 'sz' bytes required */
        exit(1);                                /* out of memory */
    buffp = (unsigned char *)
            (((unsigned long)alloc_bp + psz - 1) & (~(psz - 1)));
The above code additionally aligns 'buffp' to the beginning of a page which is not strictly necessary but may improve performance.
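An alternative sketch, not taken from the driver documentation, uses posix_memalign() and rounds the size up to whole pages so that the buffer both starts on a page boundary and owns every page it touches. The function name alloc_whole_pages is hypothetical, and posix_memalign() is assumed to be available in the C library.

```c
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: allocate at least 'sz' bytes that begin on a page
 * boundary and occupy a whole number of pages.
 * Returns NULL on failure; release the buffer with free(). */
void *alloc_whole_pages(size_t sz)
{
    size_t psz = (size_t)sysconf(_SC_PAGESIZE);
    size_t rounded = ((sz + psz - 1) / psz) * psz;  /* round up */
    void *p = NULL;

    if (rounded == 0)
        rounded = psz;          /* always hand out at least one page */
    if (posix_memalign(&p, psz, rounded) != 0)
        return NULL;
    return p;
}
```

The returned pointer can be passed to free() directly, whereas with the malloc() approach the original 'alloc_bp' has to be remembered for that purpose.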
Memory that looks contiguous in the user space (e.g. from a single malloc() ) is typically non-contiguous when the kernel looks at it. That normally limits the largest single transfer using dio to (PAGE_SIZE * adapter_scatter_gather_list_length). The latter variable is determined by the adapter driver but sg limits it to 255. So on the i386 architecture that limits a single transfer to just under 1 MB.
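The arithmetic behind "just under 1 MB" can be written out as a one line function. The name max_dio_bytes is made up, and the calculation assumes the worst case in which every scatter gather element maps exactly one page.

```c
/* Hypothetical helper: upper bound on a single direct IO transfer when
 * each scatter gather element covers exactly one page (worst case). */
unsigned long max_dio_bytes(unsigned long page_size, unsigned long sg_elems)
{
    return page_size * sg_elems;
}
```

For i386, max_dio_bytes(4096, 255) gives 1044480 bytes, which is one page (4096 bytes) short of 1048576.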
The following utilities in the sg3_utils package (see the main page) have options to use dio: sg_dd, sgp_dd, sgq_dd and sg_rbuf.
Mmap-ed IO, like direct IO, removes the extra copy usually performed from sg's kernel buffers into the user space (or vice versa). Unfortunately the strategy used by direct IO causes significant per command overhead (think 1 millisecond) so that it is only a performance win for SCSI commands with big data payloads (e.g. a READ of 256 KB in one command). Mmap-ed IO has next to no per command overhead imposed by the sg driver.
Using mmap-ed IO requires an application to change its buffer management. An application will no longer call malloc() [or one of its friends] but rather call mmap() which returns a valid pointer if successful. Here is some pseudo code to illustrate the point:
    sg_fd = open(sg_dev_filename, O_RDWR);
    k = res_sz = 128 * 1024;        /* max data transfer size required */
    psz = getpagesize();
    if (0 != (k % psz))             /* not already a page size multiple */
        k = ((k / psz) + 1) * psz;  /* round up to page size multiple */
    ioctl(sg_fd, SG_SET_RESERVED_SIZE, &k);
    mmBuff = mmap(NULL, res_sz, PROT_READ | PROT_WRITE,
                  MAP_SHARED, sg_fd, 0);
    for ( .... ) {
        sg_io_hdr_t io_hdr;

        memset(&io_hdr, 0, sizeof(sg_io_hdr_t));
        ...
        /* no need to set io_hdr.dxferp */
        io_hdr.flags = SG_FLAG_MMAP_IO;
        ioctl(sg_fd, SG_IO, &io_hdr);
        ...
        /* assuming a READ like SCSI command */
        /* application now reads data at mmBuff */
    }
Here are a few points to note:
mmap() can be called multiple times on a single sg file descriptor. The process can also be forked or the sg file descriptor duplicated with dup(). I can't think why these variations would be useful but they have been tested.
Zero copy between two SCSI devices (but still DMA in and DMA out) is possible if one device uses mmap-ed IO and the other uses direct IO. [My guess prior to timing it is that copying between two mmap-ed buffers in user space (one for the read device, the other for the write device) will be quicker than the mmap/direct IO combination. Simpler still: do mmap-ed IO on the read side and normal IO on the write side. The latter approach is what sgm_dd does.]
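The pseudo code earlier leaves out error handling. A slightly fuller sketch of the setup step is given below; the names round_up_to_page and sg_map_reserved are hypothetical, and SG_FLAG_MMAP_IO is defined locally only because older library headers may lack it.

```c
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <scsi/sg.h>

#ifndef SG_FLAG_MMAP_IO
#define SG_FLAG_MMAP_IO 4   /* value from the sg header listing */
#endif

/* Round 'n' up to the next multiple of 'psz'. */
size_t round_up_to_page(size_t n, size_t psz)
{
    return ((n + psz - 1) / psz) * psz;
}

/* Hypothetical helper: ask sg to reserve at least 'want' bytes on an open
 * sg descriptor and map that reserved buffer into the process.
 * Returns the mapping (its length in *mapped_len) or NULL on failure. */
void *sg_map_reserved(int sg_fd, size_t want, size_t *mapped_len)
{
    size_t len = round_up_to_page(want, (size_t)sysconf(_SC_PAGESIZE));
    int k = (int)len;
    void *p;

    if (ioctl(sg_fd, SG_SET_RESERVED_SIZE, &k) < 0)
        return NULL;
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, sg_fd, 0);
    if (p == MAP_FAILED)
        return NULL;
    *mapped_len = len;
    return p;
}
```

Each command then sets io_hdr.flags = SG_FLAG_MMAP_IO and leaves dxferp alone, as in the pseudo code, and the mapping is released with munmap() when no longer needed. The driver may grant a smaller reserved buffer than requested, so a careful application would also read the size back (the SG_GET_RESERVED_SIZE ioctl) before relying on it.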
The best example is real code. See the latest sg3_utils package (downloads on main page) that includes a new sg_dd variant called sgm_dd that uses mmap-ed IO. sg_simple4 is a bare bones example of mmap-ed IO on an INQUIRY response. The sg_rbuf SCSI bus speed tester has a new "-m" argument. There is also a new program called "sg_read" that reads multiple blocks from the same logical address. It has a command line syntax like "sg_dd" but with no "of" argument. Both "sg_rbuf" and "sg_read" offer internal transfer timing.
Author: Douglas Gilbert (dgilbert@interlog.com)
Last Updated: 13th April 2002 13:00