IO Architecture and Device Drivers

Source: 岁月联盟  Editor: zhuzhu  Date: 2008-10-23
1. I/O ports
An important objective for system designers is to offer a unified approach to
I/O programming without sacrificing performance. Toward that end, the I/O ports
of each device are structured into a set of specialized registers.
The in, out, ins, and outs assembly language
instructions access I/O ports.
The tree of all I/O addresses currently assigned to I/O devices can be obtained
from the /proc/ioports file.
# cat /proc/ioports
0000-001f : dma1
0020-0021 : pic1
0040-0043 : timer0
0050-0053 : timer1
0060-006f : keyboard
0070-0077 : rtc
0080-008f : dma page reg
00a0-00a1 : pic2
00c0-00df : dma2
00f0-00ff : fpu
01f0-01f7 : ide0
02f8-02ff : serial
03c0-03df : vga+
03f6-03f6 : ide0
03f8-03ff : serial
0800-0803 : PM1a_EVT_BLK
0804-0805 : PM1a_CNT_BLK
0808-080b : PM_TMR
0828-082f : GPE0_BLK
0ca8-0cac : ipmi_si
0cf8-0cff : PCI conf1
bca0-bcbf : 0000:00:1d.2
  bca0-bcbf : uhci_hcd
bcc0-bcdf : 0000:00:1d.1
  bcc0-bcdf : uhci_hcd
bce0-bcff : 0000:00:1d.0
  bce0-bcff : uhci_hcd
cc00-ccff : 0000:10:0d.0
d000-dfff : PCI Bus #0e
  dcc0-dcdf : 0000:0e:00.1
    dcc0-dcdf : e1000
  dce0-dcff : 0000:0e:00.0
    dce0-dcff : e1000
e000-efff : PCI Bus #0c
  e800-e8ff : 0000:0c:00.1
    e800-e8ff : qla2400
  ec00-ecff : 0000:0c:00.0
    ec00-ecff : qla2400
fc00-fc0f : 0000:00:1f.1
  fc00-fc07 : ide0
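Each line of /proc/ioports has the form "start-end : owner", with the port numbers in hexadecimal. As a minimal user-space sketch, the helper below parses one such line; the struct ioport_range type and the function names are hypothetical and are not part of any kernel interface.

```c
#include <assert.h>
#include <stdio.h>

/* One parsed entry of /proc/ioports (hypothetical helper type). */
struct ioport_range {
    unsigned long start;   /* first port of the range */
    unsigned long end;     /* last port of the range, inclusive */
    char owner[64];        /* name of the owning device or driver */
};

/*
 * Parse one line such as "0060-006f : keyboard".
 * Leading spaces (which show nesting in the tree) are skipped.
 * Returns 0 on success, -1 if the line does not match the format.
 */
int parse_ioport_line(const char *line, struct ioport_range *out)
{
    assert(line != NULL && out != NULL);
    if (sscanf(line, " %lx-%lx : %63[^\n]",
               &out->start, &out->end, out->owner) != 3)
        return -1;
    return out->start <= out->end ? 0 : -1;
}

/* Number of I/O ports covered by a parsed range. */
unsigned long ioport_count(const struct ioport_range *r)
{
    assert(r != NULL);
    return r->end - r->start + 1;
}
```

For the keyboard line in the listing above, the parsed range starts at port 0x60, ends at 0x6f, and covers 16 ports.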
An I/O interface is a hardware
circuit inserted between a group of I/O ports and the corresponding device
controller. It acts as an interpreter that translates the values in the I/O
ports into commands and data for the device. In the opposite direction, it
detects changes in the device state and correspondingly updates the I/O port
that plays the role of status register. This circuit can also be connected
through an IRQ line to a Programmable Interrupt Controller, so that it issues
interrupt requests on behalf of the device.
2. The Device Driver Model
1) sysfs filesystem
The sysfs filesystem
is a special filesystem similar to /proc that
is usually mounted on the /sys directory. A goal of the sysfs filesystem is to expose the
hierarchical relationships among the components of the device driver model.
Relationships between components of the device driver model
are expressed in the sysfs filesystem as symbolic
links between directories and files. For example, the
/sys/block/sda/device file can be a symbolic link to a subdirectory
nested in /sys/devices/pci0000:00 representing the SCSI controller
connected to the PCI bus. Moreover, the /sys/block/sda/device/block file
is a symbolic link to /sys/block/sda, stating that this PCI device is the
controller of the SCSI disk.
The main role of regular files in the sysfs filesystem is to represent
attributes of drivers and devices. For instance, the dev file in the
/sys/block/hda directory contains the major and minor numbers of the
master disk in the first IDE chain.
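A dev attribute holds the two numbers as text in the form "major:minor". The following sketch shows how a User Mode program might read and parse such a file; the function names are hypothetical, and the path passed in (for example /sys/block/hda/dev) is assumed to exist on the system at hand.

```c
#include <assert.h>
#include <stdio.h>

/* Parse the text of a sysfs "dev" attribute, e.g. "3:0\n".
 * Returns 0 on success, -1 if the text is not "major:minor". */
int parse_sysfs_dev(const char *text,
                    unsigned int *major_nr, unsigned int *minor_nr)
{
    assert(text != NULL && major_nr != NULL && minor_nr != NULL);
    return sscanf(text, "%u:%u", major_nr, minor_nr) == 2 ? 0 : -1;
}

/* Read a "dev" attribute file and parse its first line.
 * Returns 0 on success, -1 if the file is missing or malformed. */
int read_sysfs_dev(const char *path,
                   unsigned int *major_nr, unsigned int *minor_nr)
{
    char buf[32];
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof buf, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_sysfs_dev(buf, major_nr, minor_nr);
}
```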
2) Kobjects
The core data structure of the device driver model is a generic data structure
named kobject, which is inherently tied to the
sysfs filesystem: each kobject corresponds to a
directory in that filesystem.
Kobjects are embedded inside larger objects, the so-called "containers", that
describe the components of the device driver model.
The descriptors of
buses, devices, and drivers are typical examples of containers.
struct kobject {
    char * k_name;
    char name[KOBJ_NAME_LEN];
    atomic_t refcount;
    struct list_head entry;
    struct kobject * parent;
    struct kset * kset;
    struct kobj_type * ktype;
    struct dentry * dentry;
};
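Because a kobject is embedded in its container, code that holds only a pointer to the kobject can recover the container by subtracting the field offset. The kernel does this with its container_of macro. Below is a simplified user-space sketch of the idea; the mock_kobject and mock_container types are stand-ins invented for illustration, and the macro here omits the type checking of the kernel version.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's kobject (assumption: two fields only). */
struct mock_kobject {
    const char *name;
    int refcount;
};

/* Hypothetical container: a device descriptor that embeds a kobject. */
struct mock_container {
    int irq;
    struct mock_kobject kobj;   /* embedded, not a pointer */
};

/* Recover a pointer to the enclosing structure from a pointer to a member. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Map an embedded kobject back to the descriptor that contains it. */
struct mock_container *to_mock_container(struct mock_kobject *kobj)
{
    assert(kobj != NULL);
    return container_of(kobj, struct mock_container, kobj);
}
```

This is why generic code can operate on kobjects alone: any function that receives a kobject can still reach the bus, device, or driver descriptor around it.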
The kobjects can be organized in a hierarchical tree by means of ksets. A kset is a
collection of kobjects of the same type, that is, included in the same type of
container. Collections of ksets called subsystems also exist. A subsystem may include ksets of different
types, and it is represented by a subsystem data structure.
sys, bus -- subsystem
pci -- kset
drivers -- kobject
As a general rule, if you want a kobject, kset, or subsystem to appear in the
sysfs subtree, you must first register it.
Kobjects are registered with kobject_register() and removed with kobject_unregister().
3) Components of Device Driver Model
Objects related:

device. Usually, the device object is statically embedded in a larger
descriptor. For instance, PCI devices are described by pci_dev data
structures;
device_driver. Usually, the device_driver object is statically embedded in a larger
descriptor. For instance, PCI device drivers are
described by pci_driver data structures;
class. All class objects belong to the class_subsys subsystem associated with
the /sys/class directory. The classes of the device driver model essentially
provide a standard method for exporting the interfaces of the logical devices
to User Mode applications.

3. Device Files
Unix-like operating systems are based on the notion of a file, which is just an
information container structured as a sequence of bytes. I/O devices are treated as special files called device
files. Network cards are a notable exception to this schema, because
they are hardware devices that are not directly associated with device
files.
A device file is usually a real file stored in a filesystem. Its inode, however,
doesn't need to include pointers to blocks of data on the disk (the file's data)
because there are none. Instead, the inode must include an identifier of the
hardware device corresponding to the character or block device file. The mknod( ) system call is used to create
device files. It receives the name of the device file, its type, and the major
and minor numbers as its parameters.
As far as the kernel is concerned, the name of the device file is irrelevant. If
you create a device file named /tmp/disk of type "block" with the major
number 3 and minor number 0, it would be equivalent to the /dev/hda
device file shown in the table. The kernel uses only the major and minor numbers to identify the device.
1) User mode handling of Device Files
The size of the device numbers has been increased in Linux 2.6: the major number
is now encoded in 12 bits, while the minor number is encoded in 20 bits. Both
numbers are usually kept in a single 32-bit variable of type dev_t; the
MAJOR and MINOR macros extract the major and minor numbers,
respectively, from a dev_t value, while the MKDEV macro
encodes the two device numbers in a dev_t value.
#define MINORBITS 20
#define MINORMASK ((1U << MINORBITS) - 1)
#define MAJOR(dev) ((unsigned int) ((dev) >> MINORBITS))
#define MINOR(dev) ((unsigned int) ((dev) & MINORMASK))
#define MKDEV(ma,mi) (((ma) << MINORBITS) | (mi))
The udev toolset can scan the subdirectories of /sys/class looking for the dev files. For each
such file, which represents a combination of major and minor number for a
logical device supported by the kernel, the program creates a corresponding
device file in /dev.
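The 12-bit/20-bit split handled by the MAJOR, MINOR, and MKDEV macros described above can be checked with a few lines of user-space C. This sketch redefines the macros under a K_ prefix and uses a kdev_t typedef; both are assumptions made here to avoid clashing with system headers.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t kdev_t;   /* stand-in for the kernel's 32-bit dev_t */

#define K_MINORBITS 20
#define K_MINORMASK ((1U << K_MINORBITS) - 1)
#define K_MAJOR(dev) ((unsigned int)((dev) >> K_MINORBITS))
#define K_MINOR(dev) ((unsigned int)((dev) & K_MINORMASK))
#define K_MKDEV(ma, mi) (((kdev_t)(ma) << K_MINORBITS) | (mi))

/* Encode a (major, minor) pair and check that decoding returns it unchanged.
 * The major must fit in 12 bits and the minor in 20 bits. */
int kdev_roundtrip_ok(unsigned int ma, unsigned int mi)
{
    kdev_t dev;

    assert(ma < (1U << 12) && mi <= K_MINORMASK);
    dev = K_MKDEV(ma, mi);
    return K_MAJOR(dev) == ma && K_MINOR(dev) == mi;
}
```

With this encoding, major number 3 and minor number 0 become the value 0x300000: the major number occupies the upper 12 bits and the minor number the lower 20.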
2) VFS Handling of Device Files
Device files live in the system directory tree but are
intrinsically different from regular files and directories. When a process
accesses a regular file, it is accessing some data blocks in a disk partition
through a filesystem; when a process accesses a device file, it is just driving
a hardware device. For instance, a process might access a device file to read
the room temperature from a digital thermometer connected to the computer. It is
the VFS's responsibility to hide the differences between device files and
regular files from application programs.
To do this, the VFS changes the default file operations of a
device file when it is opened; as a result, each system call on the device file
is translated to an invocation of a device-related function instead of the
corresponding function of the hosting filesystem. The device-related function
acts on the hardware device to perform the operation requested by the process.
Let's suppose that a process executes an open( ) system call on a device file (either of type block or
character). Essentially,
the corresponding service routine resolves the pathname to the device file and
sets up the corresponding inode object, dentry object, and file object.
The inode object is initialized by reading the corresponding
inode on disk through a suitable function of the filesystem (usually
ext2_read_inode( ) or ext3_read_inode( )). When this
function determines that the disk inode refers to a device file, it invokes
init_special_inode( ), which initializes the i_rdev field of
the inode object to the major and minor numbers of the device file, and sets the
i_fop field of the inode object to the address of either the
def_blk_fops or the def_chr_fops file operation table,
according to the type of device file. The service routine of the open( )
system call also invokes the dentry_open( ) function, which
allocates a new file object and sets its f_op field to the address
stored in i_fop, that is, to the address of def_blk_fops or
def_chr_fops once again. Thanks to these two tables, every system call
issued on a device file will activate a device driver's function rather than a
function of the underlying filesystem.
void ext3_read_inode(struct inode * inode)
{
    /* ... */
    if (S_ISREG(inode->i_mode)) {
        inode->i_op = &ext3_file_inode_operations;
        inode->i_fop = &ext3_file_operations;
    } else if (S_ISDIR(inode->i_mode)) {
        inode->i_op = &ext3_dir_inode_operations;
        inode->i_fop = &ext3_dir_operations;
    } else if (S_ISLNK(inode->i_mode)) {
        if (ext3_inode_is_fast_symlink(inode))
            inode->i_op = &ext3_fast_symlink_inode_operations;
        else
            inode->i_op = &ext3_symlink_inode_operations;
    } else {
        /* Special file: device file, FIFO, or socket. */
        inode->i_op = &ext3_special_inode_operations;
        if (raw_inode->i_block[0])
            init_special_inode(inode, inode->i_mode,
                old_decode_dev(le32_to_cpu(raw_inode->i_block[0])));
        else
            init_special_inode(inode, inode->i_mode,
                new_decode_dev(le32_to_cpu(raw_inode->i_block[1])));
    }
    /* ... */
}
void init_special_inode(struct inode *inode, umode_t mode, dev_t rdev)
{
    inode->i_mode = mode;
    if (S_ISCHR(mode)) {
        /* Character device: use the default character device operations. */
        inode->i_fop = &def_chr_fops;
        inode->i_rdev = rdev;
    } else if (S_ISBLK(mode)) {
        /* Block device: use the default block device operations. */
        inode->i_fop = &def_blk_fops;
        inode->i_rdev = rdev;
    } else if (S_ISFIFO(mode))
        inode->i_fop = &def_fifo_fops;
    else if (S_ISSOCK(mode))
        inode->i_fop = &bad_sock_fops;
    else
        printk(KERN_DEBUG "init_special_inode: bogus i_mode (%o)\n",
               mode);
}
struct file_operations def_blk_fops = {
    .open = blkdev_open,
    .release = blkdev_close,
    .llseek = block_llseek,
    .read = generic_file_read,
    .write = blkdev_file_write,
    .aio_read = generic_file_aio_read,
    .aio_write = blkdev_file_aio_write,
    .mmap = generic_file_mmap,
    .fsync = block_fsync,
    .ioctl = block_ioctl,
    .readv = generic_file_readv,
    .writev = generic_file_write_nolock,
    .sendfile = generic_file_sendfile,
};
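The mechanism described in this section, where the inode picks an operation table by file type and open( ) copies it into the file object, can be imitated in user space with plain function pointers. The following is a toy model only: every mock_ type and function is invented for illustration and none of them corresponds to a real kernel symbol.

```c
#include <assert.h>
#include <stddef.h>

/* Which layer ends up serving a read request. */
enum handler { HANDLER_FILESYSTEM, HANDLER_DRIVER };

struct mock_file;

/* Toy operation table with a single method. */
struct mock_file_operations {
    enum handler (*read)(struct mock_file *f);
};

struct mock_inode {
    const struct mock_file_operations *i_fop;
};

struct mock_file {
    const struct mock_file_operations *f_op;
};

static enum handler fs_read(struct mock_file *f)
{
    (void)f;
    return HANDLER_FILESYSTEM;
}

static enum handler chr_read(struct mock_file *f)
{
    (void)f;
    return HANDLER_DRIVER;
}

static const struct mock_file_operations mock_fs_fops = { .read = fs_read };
static const struct mock_file_operations mock_chr_fops = { .read = chr_read };

/* Plays the role of the filesystem's read_inode method:
 * a device file gets the device table instead of the filesystem one. */
void mock_read_inode(struct mock_inode *inode, int is_char_device)
{
    assert(inode != NULL);
    inode->i_fop = is_char_device ? &mock_chr_fops : &mock_fs_fops;
}

/* Plays the role of dentry_open(): copy i_fop into the new file object. */
void mock_open(const struct mock_inode *inode, struct mock_file *file)
{
    assert(inode != NULL && file != NULL);
    file->f_op = inode->i_fop;
}

/* Plays the role of a read() service routine: dispatch through f_op. */
enum handler mock_sys_read(struct mock_file *file)
{
    assert(file != NULL && file->f_op != NULL);
    return file->f_op->read(file);
}
```

The caller of mock_sys_read( ) never tests the file type; the choice was made once, when the inode was set up, which is how the VFS hides the difference from applications.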
4. Device Drivers
1) Registration
Each system call issued on a device file is translated by the kernel into an
invocation of a suitable function of a corresponding device driver. To achieve
this, a device driver must register itself. In
other words, registering a device driver means
allocating a new device_driver descriptor, inserting it in the data
structures of the device driver model, and linking it to the corresponding device file(s).
When a device driver is being registered, the kernel looks for unsupported
hardware devices that could be possibly handled by the driver. To do this, it
relies on the match method of the relevant bus_type bus type
descriptor, and on the probe method of the device_driver
object. If a hardware device that can be handled by the driver is discovered,
the kernel allocates a device object and invokes device_register( )
to insert the device in the device driver model.
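The match-then-probe sequence can be sketched as a loop over the devices on a bus. This is a deliberately simplified user-space model: the toy_ types and functions are hypothetical, devices are assumed to exist already in an array (the real kernel allocates and registers device objects as described above), and matching is reduced to comparing a single integer ID.

```c
#include <assert.h>
#include <stddef.h>

struct toy_driver;

struct toy_device {
    int id;                       /* hardware ID reported by the device */
    struct toy_driver *driver;    /* NULL while the device is unsupported */
};

struct toy_driver {
    int supported_id;
    int (*probe)(struct toy_device *dev);   /* returns 0 if it can drive dev */
};

/* Bus-level match method: could this driver handle this device at all? */
static int toy_bus_match(const struct toy_device *dev,
                         const struct toy_driver *drv)
{
    return dev->id == drv->supported_id;
}

/* Register a driver: bind it to every unsupported device that matches and
 * whose probe succeeds. Returns the number of devices bound. */
int toy_driver_register(struct toy_driver *drv,
                        struct toy_device *devs, size_t n)
{
    int bound = 0;

    assert(drv != NULL && drv->probe != NULL && (devs != NULL || n == 0));
    for (size_t i = 0; i < n; i++) {
        if (devs[i].driver != NULL)
            continue;             /* already handled by another driver */
        if (!toy_bus_match(&devs[i], drv))
            continue;
        if (drv->probe(&devs[i]) != 0)
            continue;             /* driver declined the device */
        devs[i].driver = drv;
        bound++;
    }
    return bound;
}

/* Trivial probe that accepts every device it is offered. */
int toy_probe_accept(struct toy_device *dev)
{
    (void)dev;
    return 0;
}
```

Splitting the check in two keeps the cheap, bus-specific comparison (match) separate from the driver-specific test (probe), which may need to talk to the hardware.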
2) Device Driver Initialization
Registering a device driver and initializing it are two
different things. A device driver is registered as soon as possible, so User
Mode applications can use it through the corresponding device files. In
contrast, a device driver is initialized at the last possible moment. In fact,
initializing a driver means allocating precious resources of the system, which
are therefore not available to other drivers.
3) Monitoring I/O Operations

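Polling mode. The CPU repeatedly checks (polls) the device's status register
until its value signals that the I/O operation has completed. This is similar to
spin locks: when a processor tries to acquire a busy spin lock, it repeatedly
polls the variable until its value becomes 0.
Interrupt mode. The device signals the completion of the I/O operation by
raising an interrupt, so the CPU does not have to wait in a loop.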

4) DMA (Direct Memory Access)
Nowadays, all PCs include auxiliary DMA circuits, which can
transfer data between the RAM and an I/O device. Once activated by the CPU, the
DMA is able to continue the data transfer on its own; when the data transfer is
completed, the DMA issues an interrupt request. The conflicts that occur when
CPUs and DMA circuits need to access the same memory location at the same time
are resolved by a hardware circuit called a memory arbiter.
An example of asynchronous DMA is a network card that is
receiving a frame (data packet) from a LAN. The peripheral stores the frame in
its I/O shared memory, then raises an interrupt. The device driver of the
network card acknowledges the interrupt, then instructs the peripheral to copy
the frame from the I/O shared memory into a kernel buffer. When the data
transfer completes, the network card raises another interrupt, and the device
driver notifies the upper kernel layer about the new frame.
Why should the kernel be concerned at all about bus addresses? Well, in a DMA operation, the data transfer takes
place without CPU intervention; the data bus is driven directly by the I/O
device and the DMA circuit. Therefore, when the kernel sets up a DMA operation,
it must write the bus address of the memory buffer involved in the proper I/O
ports of the DMA or I/O device.
The system architecture does not necessarily offer a coherency
protocol between the hardware cache and the DMA circuits at the hardware level,
so the DMA helper functions must take into consideration the hardware cache when
implementing DMA mapping operations. To see why, suppose that the device driver
fills the memory buffer with some data, then immediately instructs the hardware
device to read that data with a DMA transfer. If the DMA accesses the physical
RAM locations but the corresponding hardware cache lines have not yet been
written to RAM, then the hardware device fetches the old values of the memory
buffer.
5. Character Device Drivers