DRAFT

			The UNIX Time-Sharing System

				D. M. Ritchie

1. Introduction

UNIX is a general-purpose, multi-user time sharing system implemented
on several Digital Equipment Corporation PDP series machines.

UNIX was written by K. L. Thompson, who also wrote many of the
command programs. The author of this memorandum contributed
several of the major commands, including the assembler and the
debugger. The file system was originally designed by Thompson,
the author, and R. H. Canaday.

There are two versions of UNIX. The first, which has been in existence
about a year, runs on the PDP-7 and -9 computers; a more
modern version, a few months old, uses the PDP-11. This document
describes UNIX-11, since it is more modern and many of the differences
between it and UNIX-7 result from redesign of features
found to be deficient or lacking in the earlier system.
Although the PDP-7 and PDP-11 are both small computers, the
design of UNIX is amenable to expansion for use on more powerful
machines. Indeed, UNIX contains a number of features very seldom
offered even by larger systems, including

   1. A versatile, convenient file system with complete integration
      between disk files and I/O devices;

   2. The ability to initiate asynchronously running processes.

It must be said, however, that the most important features of
UNIX are its simplicity, elegance, and ease of use.

Besides the system proper, the major programs available under
UNIX are an assembler, a text editor based on QED, a symbolic
debugger for examining and patching faulty programs, and "B", a
higher level language resembling BCPL. UNIX-7 also has a version
of the compiler writing language TMGL contributed by M. D.
McIlroy, and besides its own assembler, there is a PDP-11
assembler which was used to write UNIX-11. On the PDP-11 there is
a version of BASIC [reference] adapted from the one supplied by
DEC [reference]. All but the last of these programs were written
locally, and except for the very first versions of the editor and
assembler, using UNIX itself.

2. Hardware

The PDP-11 on which UNIX is implemented is a 16-bit 12K computer,
and UNIX occupies 8K words. More than half of this space,
however, is utilized for a variable number of disk buffers; with
some loss of speed the number of buffers could be cut significantly.

The PDP-11 has a 256K word disk, almost all of which is used for
file system storage. It is equipped with DECTAPE, a variety of
magnetic tape facility in which individual records may be addressed
and rewritten at will. Also available are a high-speed paper
tape reader and punch. Besides the standard Teletype, there are
several variable-speed communications interfaces.

3. The File System

The most important role of UNIX is to provide a file system. From
the point of view of the user, there are three kinds of files:
ordinary disk files, directories, and special files.

3.1 Ordinary Files

A file contains whatever information the user places there, for
example symbolic or binary (object) programs. No particular
structuring is expected by the system. Files of text ordinarily
consist simply of a string of characters, with lines demarcated
by the new-line character. Binary programs are sequences of words
as they will appear in core memory when the program starts
executing. A few user programs generate and expect files with
more structure; for example, the assembler generates, and the
debugger expects, a name list file in a particular format;
however, the structure of files is controlled solely by the
programs which use them, not by the system.

3.2 Directories

Directories (sometimes, "catalogs") provide the mapping between
the names of files and the files themselves, and thus induce a
structure on the file system as a whole. Each user has a directory
of his own files; he may also create subdirectories to contain groups
of files conveniently treated together.

A directory is exactly like an ordinary file except that it cannot
be written on by user programs, so that the system controls
the contents of directories. However, anyone with appropriate
permission may read a directory just like any other file.

The system maintains several directories for its own use. One of
these is the root directory. All files in the system can be found
by tracing a path through a chain of directories until the
desired file is reached. The starting point for such searches is
often the root, which contains an entry for each user's master
directory. Another system directory contains all the programs
provided as part of the system; that is, all the commands
(elsewhere, "subsystems"). As will be seen, however, it is by no
means necessary that a program reside in this directory for it to
be used as a command.

Files and directories are named by sequences of eight or fewer
characters. When the name of a file is specified to the system,
it may be in the form of a path name, which is a sequence of
directory names separated by slashes and ending in a file name.
If the sequence begins with a slash, the search begins in the
root directory. The name "/a/b/c" causes the system to search the
root for directory "a"; then to search "a" for "b", and then to
find "c" in "b". "c" may be an ordinary file, a directory, or a
special file. As a limiting case, the name "/" refers to the root
itself.

The same non-directory file may appear in several directories under
possibly different names. This feature is called "linking"; a
directory entry for a file is sometimes called a link. UNIX differs
from other systems in which linking is permitted in that all
links to a file have equal status. That is, a file does not exist
within a particular directory; the directory entry for a file
consists merely of its name and a pointer to the information actually
describing the file. Thus a file exists independently of
any directory entry, although in practice a file is made to
disappear along with the last link to it.
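
The equal status of links can be observed on present-day systems
descended from this design. The following is a minimal sketch in
Python, assuming a system on which os.link, os.unlink, and os.stat
behave as in POSIX; the file names "first" and "second" are made
up for the illustration.

```python
import os
import tempfile

def link_demo():
    """Create a file, link it under a second name, then remove the first name."""
    with tempfile.TemporaryDirectory() as d:
        first = os.path.join(d, "first")
        second = os.path.join(d, "second")
        with open(first, "w") as f:
            f.write("shared contents\n")
        os.link(first, second)                 # a second directory entry, same file
        links_before = os.stat(first).st_nlink
        os.unlink(first)                       # remove one link; the file survives
        with open(second) as f:
            survived = f.read()
        links_after = os.stat(second).st_nlink
        return links_before, links_after, survived

print(link_demo())
```

Neither name is privileged: the contents remain reachable through
whichever link is left.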

When a user logs into UNIX, he is assigned a default current
directory, but he may change to any directory readable by him. A
path name not starting with "/" causes the system to begin the
search in the user’s current directory. Thus, the name "a/b"
specifies the file named "b" in directory "a", which is found in
the current working directory. The simplest kind of name, for example
"a", refers to a file which itself is found in the working directory.

Each directory always has at least two entries. The name "." in
each directory refers to the directory itself. Thus a program may
read the current directory under the name "." without knowing its
actual path name. The name ".." by convention refers to the
parent of the directory in which it appears; that is, the directory
in which it was first created.
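
The search rules of this section can be sketched in a few lines
of Python. The model is an assumption made for illustration only:
a directory is represented as a dictionary from names to entries,
the helper names mkdir and resolve are hypothetical, and the root
is taken to be its own parent. The eight-character limit on names
is not enforced.

```python
def mkdir(parent=None):
    """Make a directory (a dict) holding the two entries every directory has."""
    d = {}
    d["."] = d
    d[".."] = parent if parent is not None else d   # assumption: root is its own parent
    return d

def resolve(root, current, path):
    """Follow a path name from the root or from the current directory."""
    # A leading slash starts the search in the root directory.
    node = root if path.startswith("/") else current
    for name in path.split("/"):
        if name == "":
            continue                    # from the leading slash, or "/" alone
        if not isinstance(node, dict):
            raise NotADirectoryError(name)
        if name not in node:
            raise FileNotFoundError(name)
        node = node[name]
    return node

root = mkdir()
a = root["a"] = mkdir(root)
b = a["b"] = mkdir(a)
b["c"] = "contents of c"

print(resolve(root, a, "/a/b/c"))   # search begins in the root
print(resolve(root, a, "b/c"))      # search begins in current directory "a"
```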

The directory structure is constrained to have the form of a
rooted tree. Except for the special entries "." and "..", each
directory must appear as an entry in exactly one other, which is
its parent. The reason for this is to simplify the writing of
programs which visit subtrees of the directory structure, and
more important, to avoid the separation of portions of the
hierarchy. If arbitrary links to directories were permitted, it
would be quite difficult to detect when the last connection from
the root to a directory was severed.

3.3 Special Files

Special files constitute the most unusual feature of the UNIX
file system. Each I/O device supported by UNIX is associated with
at least one special file. Special files are read and written
just like ordinary disk files, but the result is activation of
the associated device. Entries for all special files reside in
the root directory, so they may all be referred to by "/" followed
by the appropriate name.

The special files are discussed further in section 6 below.

3.4 Protection

The protection scheme in UNIX is quite simple. Each user of the
system is assigned a unique user number. When a file is created,
it is marked with the number of its creator. Also given for new
files is a set of protection bits. Four of these specify independently
permission to read or write for the owner of the file and
for all other users. A fifth bit indicates permission to execute
the file as a program. If the sixth bit is on, the system will
temporarily change the user identification of the current user to
that of the creator of the file whenever the file is executed as
a program. This feature provides for privileged programs which
may use files which should neither be read nor changed by other
users. If the set-user-identification bit is on for a program,
the accounting file may be accessed during the program’s execution
but not otherwise.
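
The six protection bits lend themselves to a short sketch. The
bit values and the function names below are assumptions chosen
for illustration; this memorandum does not give the actual layout.

```python
# Assumed bit assignments, for illustration only.
OWNER_READ, OWNER_WRITE = 0o01, 0o02
OTHER_READ, OTHER_WRITE = 0o04, 0o10
EXECUTE = 0o20
SET_USER_ID = 0o40

def may_access(bits, owner, user, want_write):
    """Decide whether user may read (or write) a file owned by owner."""
    if user == owner:
        needed = OWNER_WRITE if want_write else OWNER_READ
    else:
        needed = OTHER_WRITE if want_write else OTHER_READ
    return (bits & needed) != 0

def effective_user(bits, owner, user):
    """User number in force while the file is executed as a program."""
    return owner if (bits & SET_USER_ID) else user

# A privileged program: private to user 1, but runs with user 1's identity.
program = OWNER_READ | OWNER_WRITE | EXECUTE | SET_USER_ID
print(may_access(program, 1, 2, False), effective_user(program, 1, 2))
```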

3.5 System I/O Calls

The system calls to do I/O are designed to eliminate the differences
between the various devices and styles of access. There
is no distinction between "random" and sequential I/O, nor is any
logical or physical record size imposed by the system. The size
of a file on the disk is determined by the location of the last
piece of information written on it; no predetermination of the
size of a file is necessary. In UNIX-11, the unit of information
is the 8-bit byte, since the PDP-11 is a byte-oriented machine.

To illustrate the essentials of I/O in UNIX, the basic calls are
summarized below in an anonymous higher level language which will
indicate the needed parameters without getting into the complexities
of machine language programming. (All system calls are also
described in Appendix 1 in their actual form.) Each call to the
system may potentially result in an error return, which for simplicity
is not represented in the calling sequence.

3.5.1 Open

To read or write a file assumed to exist already, it must be
opened by the following call:

      filep = open(name, flag)


Name indicates the name of the file. An arbitrary path name may
be given. The flag argument indicates whether the file is to be
read or written. If the file is to be "updated", that is read and
written simultaneously, it may be opened twice, once for reading
and once for writing.

The returned argument filep is called a file descriptor. It is
used to identify the file in subsequent calls to read, write or
otherwise manipulate the file.

There are no locks in the file system, nor is there any restriction
on the number of users who may have a file open for reading
or writing. Although one may imagine situations in which this
fact is unfortunate, in practice difficulties are quite rare.

3.5.2 Create

To create a new file, the following call is used.

      filep = create(name, mode)

Here filep and name are as before. If the file already existed,
it is truncated to zero length. Creation of a file implies
opening for writing as well. The mode argument indicates the permissions
which are to be placed on the file by the protection
mechanism. To create a file, the user must have write permission
in the directory in which the file is being created.

3.5.3 Write

Except as indicated below, reading and writing are sequential.
This means that if a particular byte in the file was the last
byte written (or read), the next I/O call implicitly refers to
the first following byte. For each open file there is a pointer,
maintained by the system, which always indicates the next byte to
be read or written. If n bytes are read, the pointer advances by
n bytes.

Once a file is open for writing, the following call may be used.

      nwritten = write(filep, buffer, count)

Buffer is the address of count sequentially stored bytes (words
in UNIX-7) which will be written onto the file. nwritten is the
number of bytes actually written; except in rare cases it is the
same as count. Occasionally, an error may be indicated; for example
if paper tape is being written, an error occurs if the tape
runs out.

For disk files which already existed (that is, were opened by
open, not create) the bytes written affect only those implied by
the position of the write pointer and the number of bytes
written; no other part of the file is changed.

3.5.4 Read

To read, the call is

      nread = read(filep, buffer, count)

Up to count bytes are read from the file into buffer. The number
actually read is returned as nread. Every program must be
prepared for the possibility that nread is less than count. If
the read pointer is so near the end of the file that reading
count characters would cause reading beyond the end, only sufficient
bytes are transmitted to reach the end of the file.
Furthermore, devices like the typewriters work in units of lines.
Suppose, for example, that before anything has been typed a
program tries to read 128 characters from the console. This forces
the program to wait, since nothing has been typed. The user
now types a line consisting, say, of 10 characters and hits the
"new line" key. At this point the read call would return indicating
11 characters read (including the new line). On the
other hand, it is permissible to read fewer characters than were
typed without losing information; for example bytes may be picked
up one at a time.

When the read call returns with nread equal to zero, it indicates
the end of the file. For disk files this occurs when the read
pointer becomes equal to the current size of the file. It is possible
to generate an end-of-file from a typewriter by use of an
escape sequence which depends on the device used.
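
The calling pattern implied by these rules, reading until a count
of zero comes back and never assuming a full buffer, can be
sketched with the corresponding calls in Python's os module.
These are present-day counterparts, not the calls of this system;
the file name "example" and the function name copy_out are
assumptions for the illustration.

```python
import os
import tempfile

def copy_out(path, count=128):
    """Read a file count bytes at a time until read returns nothing."""
    chunks = []
    fd = os.open(path, os.O_RDONLY)
    try:
        while True:
            data = os.read(fd, count)   # may return fewer than count bytes
            if len(data) == 0:          # zero bytes read: end of file
                break
            chunks.append(data)
    finally:
        os.close(fd)
    return chunks

with tempfile.TemporaryDirectory() as d:
    name = os.path.join(d, "example")
    # Counterpart of create: make the file, truncating it if it exists.
    fd = os.open(name, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    nwritten = os.write(fd, b"x" * 300)
    os.close(fd)
    sizes = [len(c) for c in copy_out(name)]

print(nwritten, sizes)
```

The last chunk is shorter than the rest because the read pointer
reaches the end of the file part way through the request.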

3.5.5 Seek

To do "random", that is, direct access I/O it is only necessary
to move the read or write pointer to the appropriate location in
the file.

      seek(filep, base, offset)

The read pointer (respectively write pointer) associated with
filep is moved to a position offset words from the beginning,
from the current position of the pointer, or from the end of the
file, depending on whether base is 0, 1, or 2. Offset may be
negative to move the pointer backwards. For some devices (e.g.
paper tape and typewriters) seek calls are meaningless and are
ignored.
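
As an illustration, the three values of base correspond to the
three values of the "whence" argument of os.lseek in Python. The
wrapper below keeps the argument order used in this memorandum;
note that os.lseek takes the offset before the base and counts in
bytes. The wrapper is a sketch, not the system call itself.

```python
import os
import tempfile

def seek(filep, base, offset):
    """Memorandum argument order; os.lseek takes (fd, offset, whence)."""
    return os.lseek(filep, offset, base)

with tempfile.TemporaryFile() as f:
    f.write(b"0123456789")
    f.flush()
    fd = f.fileno()
    seek(fd, 0, 3)          # 3 bytes from the beginning
    a = os.read(fd, 2)
    seek(fd, 1, -1)         # back 1 byte from the current position
    b = os.read(fd, 1)
    seek(fd, 2, -2)         # 2 bytes before the end
    c = os.read(fd, 2)

print(a, b, c)
```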

3.5.6 Tell

The current position of the pointer may be discovered as follows:

      offset = tell(filep, base)

As with seek, filep is the file descriptor for an open file, and
base specifies whether the desired offset is to be measured from
the beginning of the file, from the current position of the pointer,
or from the end. In the second case, of course, the result
is always zero.
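
A sketch of tell in terms of present-day calls follows. It
assumes Python's os.lseek and os.fstat, and it assumes that an
offset measured from the end is reported as a negative number
when the pointer lies before the end; the memorandum does not
state the sign convention.

```python
import os
import tempfile

def tell(filep, base):
    """Report the pointer relative to the beginning (0), itself (1), or the end (2)."""
    position = os.lseek(filep, 0, os.SEEK_CUR)     # moving by zero reports the position
    if base == 0:
        return position
    if base == 1:
        return 0
    return position - os.fstat(filep).st_size      # assumed sign: negative before the end

with tempfile.TemporaryFile() as f:
    f.write(b"0123456789")
    f.flush()
    fd = f.fileno()
    os.lseek(fd, 4, os.SEEK_SET)
    offsets = (tell(fd, 0), tell(fd, 1), tell(fd, 2))

print(offsets)
```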

4. Implementation of the File System

As mentioned in section 3.2 above, a directory entry contains
only a name for the associated file and a pointer to the file
itself. This pointer is an integer called the i-number (for identification
number) of the file. When the file is accessed, its i-number
is looked up in a system table stored in a known part of
the disk. The entry thereby found (the file's i-node) contains
the description of the file:

      1. its owner;
      2. its protection bits;
      3. the physical disk addresses for the file contents;
      4. its size;
      5. times of creation and last modification;
      6. the number of links to the file; that is, the number of
	 times it appears in a directory;
      7. bits indicating whether the file is a directory and whether
	 it is special (in which case the size and disk addresses
	 are meaningless);
      8. a bit indicating whether the file is "large" or "small."

There is space in each i-node for eight disk addresses. A file
which fits into eight or fewer 64-word (128-byte) blocks is considered
small; in this case the addresses of the blocks themselves
are stored. For large files, each of the eight disk addresses
may point to an indirect block of 64 words containing the
addresses of the blocks constituting the file itself. Thus files
may be as large as 8*64*128, or 65,536 bytes.
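
The arithmetic can be checked with a short sketch that maps a
byte offset to the i-node address slot, the slot within the
indirect block (for large files), and the byte within the data
block. The function and constant names are hypothetical; only the
sizes come from the text above.

```python
BLOCK_BYTES = 128          # 64 words of 2 bytes each
ADDRS_IN_INODE = 8
ADDRS_PER_INDIRECT = 64    # one address per word of an indirect block

SMALL_MAX = ADDRS_IN_INODE * BLOCK_BYTES
LARGE_MAX = ADDRS_IN_INODE * ADDRS_PER_INDIRECT * BLOCK_BYTES

def locate(offset, large):
    """Return (i-node slot, indirect slot or None, byte within block)."""
    limit = LARGE_MAX if large else SMALL_MAX
    if not 0 <= offset < limit:
        raise ValueError("offset outside the maximum extent of the file")
    block, within = divmod(offset, BLOCK_BYTES)
    if not large:
        return block, None, within      # the i-node holds the block address itself
    slot, indirect = divmod(block, ADDRS_PER_INDIRECT)
    return slot, indirect, within       # the i-node holds an indirect block address

print(SMALL_MAX, LARGE_MAX)
print(locate(1000, large=False), locate(65535, large=True))
```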

When the number of links to a file drops to zero, its contents
are freed and its i-node is marked unused.

To the user, both reading and writing of files appear to be synchronous
and unbuffered. That is, immediately after return from a
read call the data is available, and conversely after a write the
user's workspace may be reused. In fact the system maintains, unseen
by the user, a rather complicated buffering mechanism.
Suppose a write call is made specifying transmission of a single
byte. UNIX will search its own buffers to see whether the affected
disk block currently resides in its own buffers; if not, it
will be read in from the disk. Then the affected byte is replaced
in the buffer and an entry is made in a list of blocks to be
written on the disk. The return from the write call may then take
place, although the actual I/O may not be completed until a later
time. Conversely, if a single byte is read, the system determines
whether the disk block in which the byte is located is already in
one of the system's buffers; if so, the byte can be returned
immediately. If not, the block is read into a buffer and the byte
picked out. Because sequential reading of a file is so common,
UNIX attempts to optimize this situation by prereading the disk
block following the one in which the requested byte is found.
This strategy tends to minimize and in some cases eliminate disk
latency delays.
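
The behavior just described can be modeled in miniature. The
class below is a simplified sketch, not the mechanism used by
UNIX: the names are hypothetical, the "disk" is a list of
128-byte arrays, buffers are never evicted, and offsets are
assumed to lie within the disk.

```python
BLOCK = 128

class BufferCache:
    def __init__(self, disk):
        self.disk = disk          # list of 128-byte bytearrays standing in for the disk
        self.buffers = {}         # block number -> in-core copy
        self.dirty = []           # blocks waiting to be written out
        self.disk_reads = 0

    def _fetch(self, n):
        """Bring block n into a buffer unless it is already there."""
        if n not in self.buffers and 0 <= n < len(self.disk):
            self.buffers[n] = bytearray(self.disk[n])
            self.disk_reads += 1
        return self.buffers.get(n)

    def read_byte(self, offset):
        n, i = divmod(offset, BLOCK)
        buf = self._fetch(n)
        self._fetch(n + 1)        # preread the following block
        return buf[i]

    def write_byte(self, offset, value):
        n, i = divmod(offset, BLOCK)
        buf = self._fetch(n)      # read the block in if not resident
        buf[i] = value
        if n not in self.dirty:
            self.dirty.append(n)  # the actual output is deferred

    def flush(self):
        for n in self.dirty:
            self.disk[n] = bytearray(self.buffers[n])
        self.dirty.clear()

disk = [bytearray(BLOCK) for _ in range(4)]
cache = BufferCache(disk)
cache.write_byte(130, 7)          # returns before the disk is touched
print(disk[1][2], cache.read_byte(130))
```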

A program which reads or writes files in units of 128 bytes has
an advantage over a program which reads or writes a single byte
at a time, but the gain is not immense. As an example, the editor
ed (8.9 and A2.4 below) was originally written, for simplicity,
to do I/O one character at a time; it increased its speed by a
factor of about two when it was rewritten to use 128-byte units.
Because the system attempts to retain copies of the most recently
used disk blocks in core, the speed gain in dealing with large
units comes principally from elimination of system overhead, not
from latency delays.

5. The Shell

5.1 General

Communication with UNIX is carried on with the aid of a program
called the Shell. The Shell is a command line interpreter: it
reads lines typed by the user and interprets them as requests to
execute other programs. In simplest form, a command line consists
of the command name followed by arguments to the command, all
separated by spaces:

      command arg1 arg2 ... argn

The Shell splits up the command name and the arguments into
separate strings. Then a file with name command is sought;
command may be a path name including the "/" character to specify
any file in the system. If command is found, it is brought into
core and executed. The arguments collected by the Shell are accessible
to the command. When the command is finished, the Shell
resumes its own execution, and indicates its readiness to accept
another command by typing the prompt character "@".

If file command cannot be found, the Shell prefixes the string
"/bin/" to command and attempts again to find the file. Directory
"/bin" contains all the commands provided by the system itself.

5.2 Standard I/O

The discussion of I/O given above seems to imply that every file
used by a program must be opened or created by the program in
order to get a file descriptor for the file. In fact, this is not
quite true. There are two files always accessible to every
program without an explicit "open" or "create"; they have file
descriptors 0 and 1. As a program begins execution, file 1 is
open for writing, and is best understood as the standard output
file. Except under circumstances indicated below, this file is
the user’s typewriter. Thus programs which wish to write informative
or diagnostic information ordinarily use file descriptor 1.
Conversely, file 0 starts off open for reading, and programs
which wish to read messages typed by the user usually read this file.

The Shell is able to change the standard assignments of these
file descriptors from the user's typewriter printer and keyboard.
If one of the arguments to a command is prefixed by ">", file
descriptor 1 will, for the duration of the command, refer to the
file named after the ">". For example,

      ls

ordinarily lists, on the typewriter, the names of the files in
the current directory. The command

      ls >files

creates a file called "files" and places the listing there. Thus
the argument ">files" means, `place output on "files"'. On the
other hand,

      ed

ordinarily enters the editor, which takes requests from the user
via his typewriter. The command

      ed <script

interprets "script" as a file of editor commands; thus "<script"
means, `take input from "script"'.

Although the file name following "<" or ">" appears to be an argument
to the command, in fact it is interpreted completely by
the Shell and is not passed to the command at all. Thus no special
coding is needed within each command; the command need
merely use the standard file descriptors 0 and 1 where appropriate.
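
The separation of redirection arguments from the arguments
actually passed to the command might be sketched as follows. The
function name parse_command is hypothetical, and the sketch
handles only the simple form in which the file name follows "<"
or ">" with no intervening space.

```python
def parse_command(line):
    """Split a command line into (arguments, input file, output file)."""
    argv, infile, outfile = [], None, None
    for word in line.split():
        if word.startswith("<"):
            infile = word[1:]       # interpreted here, not passed to the command
        elif word.startswith(">"):
            outfile = word[1:]
        else:
            argv.append(word)
    return argv, infile, outfile

print(parse_command("ls >files"))
print(parse_command("ed <script"))
```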

5.3 Command Separators

Another feature provided by the Shell is relatively
straightforward. Commands need not be on different lines; instead
they may be separated by semicolons.

      ls; ed

will first list the contents of the current directory, then enter
the editor.

A related feature is more interesting. If a command is followed
by "&", the Shell will not wait for the command to finish before
returning with its signal "@"; instead, it is ready immediately
to accept a new command. For example,

      as source >output &

causes "source" to be assembled, with diagnostic output going to
"output"; however, no matter how long the assembly takes, the
Shell returns immediately. The "&" may be used several times in a
line:

      as source >output & ls >files &

does both the assembly and the listing in the background. In all
the examples above using "&", an output file other than the
typewriter was provided; if this had not been done, the outputs
of the various commands would have been intermingled.
(Incidentally, the spaces before and after the "&" in the examples
above are not necessary.)
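
A sketch of how a line might be divided at ";" and "&" follows.
The function name split_commands is hypothetical, and the sketch
makes no provision for quoting or other refinements.

```python
def split_commands(line):
    """Return a list of (command text, runs_in_background) pairs."""
    commands, current = [], []
    for ch in line:
        if ch in ";&":
            text = "".join(current).strip()
            if text:
                commands.append((text, ch == "&"))   # "&" marks a background command
            current = []
        else:
            current.append(ch)
    text = "".join(current).strip()
    if text:
        commands.append((text, False))               # final command, no separator
    return commands

print(split_commands("ls; ed"))
print(split_commands("as source >output & ls >files &"))
```

Because the line is scanned character by character, the spaces
around the separators make no difference to the result.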

5.4 The Shell as a Command

The Shell is itself a command, and may be called recursively.
Suppose file "tryout" contains the lines

      as source
      mv a.out testprog
      testprog

The mv command causes the file "a.out" to be renamed "testprog".
"a.out" is the (binary) output of the assembler, ready to be
executed. Thus if the three lines above were typed on the console,
"source" would be assembled, the resulting program named
"testprog", and "testprog" executed. When the lines are in
"tryout", the command

      sh <tryout

would cause the Shell sh to execute the commands sequentially.
(The Shell has further capabilities, including the ability to interpret
parameters to filed command sequences; see section 8.18).

When the user types the "&" character as part of a command line,
he is explicitly invoking the multitasking facilities of UNIX.
That is, he is creating a process which runs asynchronously from
his normal command stream. Although this ability is quite convenient
for the user directly, it is even more useful to UNIX itself.

5.5 Processes and forking

A process in UNIX is the execution of a program. The evidence of
the existence of a process is a core image. While the processor
is executing on behalf of a process, the core image, quite
naturally, resides in the core memory of the computer; during the
execution of other processes, a core image is kept on the disk.
In order to provide fast response to user’s requests, UNIX, like
most time-sharing systems, swaps the core images of processes
between core and the disk.

Except while UNIX is bootstrapping itself into operation, a new
process can come into existence in only one way: by use of the
fork system call.

      processid = fork(label)

When fork is executed by a process, it splits into two independently
executing processes. The two processes have core images
which are copies of each other, but they are not precisely
equivalent: one of them is considered the parent process. In the
parent, control does not return directly from the fork, but instead
passes to location label; in the child process, there is a
normal return. The processid returned by the fork call is the
identification of the other, offspring process.

Because the return points in the parent and child process are not
the same, each copy of a program existing after a fork may determine
whether it is the parent or child process.
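
For illustration, here is the same idea expressed with Python's
os.fork on a present-day system. That interface differs from the
call described here: instead of resuming at a separate label,
both processes return from the same call, and the child is
distinguished by a return value of zero. The exit status 7 and
the function name are arbitrary choices for the example.

```python
import os

def fork_demo():
    """Fork once; the child exits with status 7 and the parent collects it."""
    pid = os.fork()
    if pid == 0:
        # Child: a copy of the parent's image, told apart by the return value.
        os._exit(7)
    # Parent: pid identifies the child; wait for it to finish.
    waited_pid, status = os.waitpid(pid, 0)
    return pid == waited_pid, os.WEXITSTATUS(status)

print(fork_demo())
```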

5.6 Execution of programs

Another system primitive on which the Shell depends heavily is
invoked by

      status = execute(file, arg1, arg2, ... , argn)

which requests the system to read in and execute the program
named by file, passing it arguments arg1, arg2, ... argn.
Ordinarily, arg1 should be the same string as file. If this call
is successful, control never returns to the program which uses
it. That is, the image of the named file replaces the current
program. Only if the call fails, for example because file could
not be found or because its execute-permission bit was not set,
does a return take place from the execute primitive.

The third and last process control system call used by the Shell is

      processid, status = wait()

This primitive causes its caller to suspend execution until one
of its children has completed execution. Then wait returns the
processid of the terminated process and a status value indicating
how the process died. (Processes which are never waited for die
unnoticed and presumably unmourned).

5.7 Operation of the Shell

The outline of the operation of the Shell can now be understood.
Most of the time, the Shell is waiting for the user to type a
command. When the new line character is typed, the Shell's read
call returns. The Shell analyzes the command line, putting the
arguments in a form appropriate for execute. Then fork is called.
The child process, whose code of course is still that of the
Shell, then attempts to perform an execute with the appropriate
arguments. If successful, this will bring in and start execution
of the program whose name was given. Meanwhile, the other process
resulting from the fork, which is the parent process, waits for
the child process to die. When this happens, the Shell knows the
command is finished, so it types out "@" and reads the typewriter
to obtain another command.

Given this framework, the implementation of background processes
is trivial; whenever a command line terminates with "&", the
Shell merely refrains from waiting for the process which it
created to execute the command.

Happily, all of this mechanism meshes very nicely with the notion
of standard input and output files. When a process is created by
the fork primitive, it inherits not only the core image of its
parent but also all the files currently open in its parent, including
those with file descriptors 0 and 1. The Shell, of
course, uses these files to read command lines and to write its
signal "@", and in the ordinary case its children, the command
programs, inherit them automatically. When an argument with "<"
or ">" is given, however, the offspring process, just before it
performs execute, closes file 0 or 1 respectively and opens the
named file. Because the process in which the command program runs
simply terminates when it is through, the association between a
file specified after "<" or ">" and file descriptor 0 or 1 is ended
automatically when the process dies. Therefore the Shell need
not know the actual names of the files which are its own standard
input and output, since it need never reopen them.

In ordinary circumstances, the main loop of the Shell never
terminates. (The main loop includes that branch of the return
from fork belonging to the parent process; that is, the branch
which does a wait, then reads another command line). The one
thing which causes the Shell to terminate is discovering an end-of-file
condition on its input file. Thus, when the Shell is executed
as a command with a given input file, as in

      sh <comfile

the commands in "comfile" will be executed until the end of
"comfile" is reached; then the instance of the Shell invoked by
sh will terminate. Since this Shell process is the child of
another instance of the Shell, the wait executed in the latter
will return, and another command may be processed.

The instances of the Shell to which each UNIX user types commands
are themselves children of another process. The last step in the
initialization of UNIX is the creation of a single process and
the invocation (via execute) of a program called init. The code
for init is kept in a file, like every other command. Its role is
to create one process for each typewriter channel which may be
dialed up by a user. The various subinstances of init open the
appropriate typewriters for input and output. Since when init was
invoked there were no files open, in each process the typewriter
keyboard will receive file descriptor 0 and the printer file
descriptor 1. Each process types out a message requesting that
the user log in and waits, reading the typewriter, for a reply.
At the outset, no one is logged in, so each process simply hangs.
Finally someone types his name or other identification. The appropriate
instance of init wakes up, receives the log-in line,
and reads a password file. If the user is found, and if he is
able to supply the correct password, init changes to the user’s
default current directory, sets the user number to that of the
person logging in, and performs an execute of the Shell. At this
point the Shell is ready to receive commands and the logging-in
protocol is complete.

Meanwhile, the mainstream path of init (the parent of all the
subinstances of itself which will later become Shells) does a
wait. If one of the child processes terminates, either because a
Shell found an end of file or because a user typed an incorrect
name or password, this path of init simply recreates the defunct
process, which reopens the appropriate input and output files and
types another log-in message. Thus a user may log out simply by
typing the end-of-file sequence in place of a command to the
Shell.

6. Census of Special Files

Here is a list of the special files currently implemented. Since
an entry for each resides in the root directory, the file "xyz"
may be referred to by "/xyz". Alternatively, one may link to any
of these files under any name desired.

6.1 ppt

When read, "ppt" refers to the paper tape reader; when written,
to the punch. Null characters are ignored for both reading and
writing, so "ppt" is suitable only for ASCII (not binary)
information; on the other hand, the program need not take account
of the leader or trailer. End of file occurs during a read when
the end of the tape passes through the sensors.

6.2 bppt

"bppt" also refers to paper tape. The tape is in a blocked format
with checksums. Completely arbitrary information may be written
and recovered unchanged in this mode.

6.3 rppt

This is raw input and output for paper tape. Every character is
passed to the program, including nulls, so that the program must
know when the leader ends and information begins during a read.
On the other hand, this mode is suitable when tapes of unusual
format must be read.

6.4 tty

This is the console typewriter. Null characters are ignored for
both reading and writing. For reading, the line is a unit of
information; a program reading "tty" will wait until a whole line
has been typed, and at most one line will be passed back to the
program. However, characters may be read one at a time from the line.

On input, erase and kill processing are performed: "#" will erase
the last character typed; "@" kills the entire line.
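
The effect of erase and kill processing on a typed line may be
sketched as follows; the function name edit_line is hypothetical.

```python
def edit_line(typed):
    """Apply "#" (erase) and "@" (kill) to the characters typed on one line."""
    line = []
    for ch in typed:
        if ch == "#":
            if line:
                line.pop()      # erase the last character typed
        elif ch == "@":
            line.clear()        # kill the entire line so far
        else:
            line.append(ch)
    return "".join(line)

print(edit_line("lz#s"), edit_line("garbage@ed"))
```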

The ASCII character "EOT" signals an end of file to the program.
The ASCII "new line" character is the standard means of ending an
input line. On the Teletype models 33 and 35 and some other terminals
UNIX must simulate this function by echoing a "return"
character when it receives a "line space" (whose code corresponds
to the ASCII "new line").

The name "tty" refers to the user's own typewriter, no matter
which physical channel he may be using. There are also special
files for each typewriter. They have the names "ctty" (for the
central site terminal), and "tty1", "tty2", ... "ttyn" (for
user's typewriters).

6.5 rtty

This is "raw" typewriter I/O. It is identical to "tty" for output,
but on input the program waits only until at least one
character has been typed before a return from the read occurs. No
erase or kill processing is done.

6.6 tap0; tap1

These files refer to DECTAPE logical units 0 and 1. When they are
opened, the program waits until a tape is mounted on the appropriate drive.

6.7 disk

This file refers to the entire disk in a way independent of the
file system; it reads or writes the physical block corresponding
to the current file pointer.

One use of this file demonstrates convincingly the versatility of
the special file concept. There is a program called check which
scrutinizes the entire file system to determine its consistency
and the number of disk blocks used for various purposes. This
program is in no sense part of the system; it is an ordinary command
invokable by any user. Check operates by reading the file
"disk". In this way it is able to examine the list of i-nodes
(cf. section 4) which define files without depending on ad hoc
system calls to obtain its information.

6.8 system

This special file causes the area of core memory occupied by the
system to be treated as a file. Thus the system can be examined
and patched during operation by use of the ordinary debugger db
discussed below.

7. Traps

The PDP-11 hardware detects a number of program faults, such as
references to non-existent memory, unimplemented instructions,
and odd addresses used where an even address is required. Such
faults cause the processor to trap to a system routine. When an
illegal action is caught, the system writes the user's core image
on file "core" in the current working directory. Because of the
way the hardware and the system operate, the contents of all the
program-accessible registers are stored within this core image
file. Thus, the debugger db discussed below can be used to determine
the state of the program at the time of the fault.

The user may also force the program to stop and a core image file
to be written by sending an interrupt signal. Currently this signal
is generated by typing the ASCII "FS" character (control "\"
on model 37 Teletypes). Thus programs which are looping or about
which the user has second thoughts may be halted.

If the user has several processes in execution simultaneously
(because he used the "&" facility of the Shell) only one of these
processes is stopped, and there is no control over which one; it
depends on which is currently in execution or executes next.
Clearly this situation leaves much to be desired, for several reasons:

      1. when the user has several processes he cannot interrupt
	 with any selectivity;
      2. in all cases the rather large core image file is produced,
	 when the user may merely have wished, for example, to stop
	 a long printout;
      3. it is often useful to send an asynchronous signal to a
	 process without stopping it (for example by causing a trap
	 to an agreed-upon location within the process’s core
	 image).

Doubtless, therefore, the interrupt facility will be reworked in
the future. Unfortunately, there are not only implementation
problems but even conceptual ones-- e.g. how does one specify a
process which may have started an arbitrary time ago?

8. Some Commands

This section summarizes several of the commands available in
UNIX. The list is not exhaustive, but it covers those most frequently
used. The assembler as, the debugger db, the editor ed
and DECTAPE manipulator tap are documented in more detail in
Appendix 2.

Where an argument like name is given, a file name is meant. In
every case, an arbitrary path name may be used to specify any
file in the system, subject to the constraints imposed by the
protection system.

Arguments enclosed in square brackets are optional.

8.1 as -- assemble

As is the assembler for the PDP-11. It is called as follows:

      as name1 name2 ... namen

The concatenation of the files name1 ... namen is assembled. The
resulting binary output is placed in file "a.out" in the current
working directory; a copy of the name list from the assembly is
placed on "n.out". See Appendix 2 for more information.

8.2 b -- compile B program

B [reference] is a new higher-level language with implementations
on the PDP-7, PDP-11, and Honeywell 635. To compile several B
programs,

      b -name1 ... -namen

Notice that each name should be preceded by a "-" (not part of
the file name). Also, b supplies the conventional suffix ".b".
For example, to concatenate and compile "abc.b" and "def.b", type

      b -abc -def

The binary output is left on "a.out" and the name list on "n.out"
(just like the assembler). See the B reference manual [reference]
for more information.

8.3 cat -- concatenate files

The cat command concatenates several files and copies the result
onto the standard output file.

      cat name1 ... namen

Notice that ">" may be used: "cat a b >/ppt" punches the concatenation
of "a" and "b". "cat x" simply lists x on the typewriter.
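
As a rough modern analogue, the action of cat can be sketched in
Python as below. The function and the scratch files are
illustrative only; the 512-character read size is an arbitrary
choice, not a figure taken from the system.

```python
import io
import os
import tempfile


def cat(names, out):
    """Copy each named file, in order, onto the output stream."""
    for name in names:
        with open(name, "rb") as f:
            while True:
                chunk = f.read(512)
                if not chunk:       # a zero-length read means end of file
                    break
                out.write(chunk)


# Demonstration with two scratch files named "a" and "b".
with tempfile.TemporaryDirectory() as d:
    a = os.path.join(d, "a")
    b = os.path.join(d, "b")
    with open(a, "wb") as f:
        f.write(b"first\n")
    with open(b, "wb") as f:
        f.write(b"second\n")
    result = io.BytesIO()
    cat([a, b], result)

print(result.getvalue())   # b'first\nsecond\n'
```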

8.4 chdir -- change directories

To change the current directory, use

      chdir dirname

This command is the only one that does not reside in directory
"/bin"; instead it is part of the Shell. The reason is
interesting. Recall that each ordinary command is executed as a
separate process created by the Shell. If the system’s chdir
primitive were executed in such a process, it would have essentially
no effect, since the process would terminate instantly
without affecting the current directory of the Shell process and
its subsequent offspring. The Shell itself recognizes the chdir
command and calls the system to change directories without
creating a new process.
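
The point can be checked on a present-day system with the small
Python sketch below. It is an analogue only: it relies on the
modern subprocess facility rather than on anything described in
this memorandum. The child changes its own directory and
terminates; the parent's directory is untouched.

```python
import os
import subprocess
import sys

before = os.getcwd()

# Run a child process whose only action is to change its own
# current directory (to the root) and then terminate.
subprocess.run(
    [sys.executable, "-c", "import os; os.chdir(os.sep)"],
    check=True,
)

after = os.getcwd()
print(before == after)   # True: the parent's directory is unchanged
```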

8.5 chmod -- change mode of file

To change the protection bits for a set of files,

      chmod mode name1 ... namen

The modes of name1, ..., namen are set to mode. Mode is an octal
number whose bits in the binary representation have the following
meanings:

      1 write, non-owner
      2 read, non-owner
      4 write, owner
     10 read, owner
     20 execute
     40 set user ID on execution

See also section 3.4 on the protection system. Most command
programs create files with mode 17; the assembler's "a.out" file
has mode 37.
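
The table can be read mechanically. The following Python sketch
(the names MODE_BITS and describe_mode are hypothetical) decodes
an octal mode into the meanings listed above, and shows that mode
17 grants reading and writing to everyone while mode 37 adds the
execute bit.

```python
# Bit values are octal, exactly as in the table above.
MODE_BITS = [
    (0o01, "write, non-owner"),
    (0o02, "read, non-owner"),
    (0o04, "write, owner"),
    (0o10, "read, owner"),
    (0o20, "execute"),
    (0o40, "set user ID on execution"),
]


def describe_mode(mode):
    """Return the meanings of the bits that are set in `mode`."""
    return [meaning for bit, meaning in MODE_BITS if mode & bit]


print(describe_mode(0o17))
print(describe_mode(0o37))
```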

8.6 chown -- change owner of files

To change the owner of a sequence of files,

      chown owner name1 ... namen

Owner is a user number assigned by the system administrators.
Only the owner of a file may donate the file to another user.
Notice that chown does not change the directory in which the link
to the file exists.

8.7 cp -- copy file

To make a copy of a file,

      cp name1 name2

Either name1 or name2 may be special files.

8.8 db -- debug

To examine or patch a (usually binary) file,

      db [ name [ namelist ] ]

The first argument is the file to be examined. The second is a
name list file produced as "n.out" when name was assembled. The
brackets indicate that both arguments are optional. If the first
argument is not given, "core" is assumed. If the second is not
given, "n.out" is assumed. (Of course, the first argument alone
cannot be omitted.)

Db is discussed in complete detail in Appendix 2.

8.9 ed -- edit

Ed is the editor. It is essentially a subset of QED [references];
see Appendix 2 for the differences.

8.10 ln -- link

To create a link,

      ln name1 [ name2 ]

A link to file name1 is created. If name2 is given, the link has
name name2, otherwise it has the (last component of) name1. For
example, "ln /a/b /c/d" creates a link named "d" in directory
"/c" to file "/a/b". The user must have permission to write in
directory "/c".

8.11 ls -- list directory

To list the names of the files in a directory,

      ls [ name ]

If name is not given, the contents of the current directory are
listed.

8.12 mkdir -- make directory

To create a directory,

      mkdir name

8.13 mv -- move file

To move or rename a file,

      mv name1 name2 ...

This command does not copy the file. It operates by linking to
name1 by the name name2, then unlinking name1. Mv is often used
to rename a file.

If name2 is a directory, name1 is moved into that directory under
the name which is the last component of name1. For example,

      mv x /dirname

moves x to /dirname/x.
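
The link-then-unlink procedure can be sketched in Python with the
present-day os.link and os.unlink calls. This is an analogue, not
the source of mv; the function name and the scratch directory are
illustrative.

```python
import os
import tempfile


def mv(name1, name2):
    """Move a file by linking to it under a new name, then unlinking."""
    # If name2 is a directory, the file keeps its last path component.
    if os.path.isdir(name2):
        name2 = os.path.join(name2, os.path.basename(name1))
    os.link(name1, name2)    # a second name for the same file
    os.unlink(name1)         # remove the original name; no data is copied


with tempfile.TemporaryDirectory() as d:
    x = os.path.join(d, "x")
    dirname = os.path.join(d, "dirname")
    os.mkdir(dirname)
    with open(x, "w") as f:
        f.write("data")
    mv(x, dirname)
    moved = os.path.exists(os.path.join(dirname, "x"))
    gone = not os.path.exists(x)

print(moved, gone)   # True True
```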

8.14 nm -- get namelist

To get a printed listing of the symbol table (name list) from an
assembly,

      nm [ name ]

where name is the "n.out" file from some assembly. If name is not
given, "n.out" is listed.

8.15 pr -- print

The command

      pr name1 name2 ... namen

prints the contents of the named files. The output is separated
into pages headed by the file name, the time and date, and the
page number.

8.16 rm -- remove file

To unlink one or more files (remove them from directories),

      rm name1 ...

Recall that removing the last link to a file causes it to go
away.

8.17 roff -- run off (format)

Roff is a program similar to the one under GE-TSS which formats
text files under the control of commands embedded in the text.
The command

      roff name1 ... namen

will run off the concatenation of name1, ... namen. UNIX roff
supports all the features of TSS roff except "merge", tabs, and
footnotes. See [reference] for details.

8.18 sh -- Shell

To invoke the Shell,

      sh [ name ]

Name is interpreted as a file of commands. Name need not be
given, in which case the Shell will read its standard input file.
When called with an argument, the Shell refrains from typing its
prompt character "@". See section 5 above. The Shell has several
features besides those mentioned in section 5.

      1. Arguments or parts of arguments to commands enclosed in
	 single (’) or double (") quotes are taken literally, so that
	 arbitrary character strings can be passed (including spaces,
	 "<" or ">" etc.).

      2. The character "\" serves to quote the next character. In
	 this way a single command may extend over several lines,
	 since a new line preceded by "\" is treated like a space.

      3. When the Shell is invoked as a command, the character sequences
	 "$0", "$1", ... "$9" are treated as parameters. "$0"
	 is replaced by the name of the file being interpreted; "$1"
	 through "$9" are replaced by the first through ninth argument
	 following the file name. For example, when

	      sh runcom arg1 arg2 arg3

	 is typed, "$0" inside of "runcom" is replaced by "runcom",
	 "$1" is replaced by "arg1", etc.

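The parameter mechanism of item 3 can be sketched in Python as
follows. The function name is hypothetical, and the treatment of a
parameter with no corresponding argument (here, replacement by
nothing) is an assumption; the text does not say what the Shell
does in that case.

```python
import re


def substitute(line, argv):
    """Replace $0 ... $9 in `line`; argv[0] is the command file name."""
    def value(match):
        index = int(match.group(1))
        # Assumption: a parameter with no matching argument
        # is replaced by the empty string.
        return argv[index] if index < len(argv) else ""
    return re.sub(r"\$([0-9])", value, line)


argv = ["runcom", "arg1", "arg2", "arg3"]
print(substitute("cp $1 $2", argv))   # cp arg1 arg2
print(substitute("$0", argv))         # runcom
```
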
8.19 stat -- get file status

To discover interesting information about one or more files,

      stat name1 ...

Stat gives the i-number, the mode, the owner, the size, and the
times of creation and last modification for each of name1, ....

8.20 tap -- manipulate DECTAPE

Tap is used to load and dump portions of the hierarchy onto
DECTAPE. See Appendix 2 for details.

8.21 tm -- time

To discover various information connected with time, the tm command
can be used:

      tm [ command arg1 ... argn ]

If called without arguments, tm prints out the time of day and
the total times accumulated in several categories:

      1. Processor time charged to the user.
      2. System overhead time.
      3. Time spent waiting for the disk.
      4. Idle time.
      5. Time spent in the interrupt routines.

Without an argument, tm gives these values both in absolute form
(i.e., totals since creation of the system) and as changes since
the last time tm was called. When called with one or more arguments,
the arguments are assumed to constitute a command to be
timed. Tm executes the given command and prints the times required
for the command in each of the above categories.

8.22 un -- undefined symbols

It is sometimes useful to know the names of all the undefined
symbols in a given assembly. The command

      un [ name ]

searches the (name list) file name and prints all the symbols undefined
therein. If name is omitted, "n.out" is used.

APPENDIX 1

This appendix summarizes all the system calls. To understand the
calls to UNIX, it is fortunately necessary to know only very little
of the structure of the PDP-11. The machine contains several
general registers, of which only two are used for arguments to
the calls, namely R0 and R1. There is also a condition register,
one of whose bits records a carry occurring during an arithmetic
operation. To indicate an error the system sets this carry bit;
it is cleared for successful calls. There is a conditional branch
instruction to test the state of the bit. All registers not used
to communicate explicit arguments are unchanged by calls to the
system.

The instruction used to call the system is known to the assembler
as "sys"; when the processor executes this instruction it is
trapped to a specific location inside UNIX. The address field of
"sys" contains a number indicating which system call is desired.

The arguments for a call are placed either in a register or
immediately following the "sys" instruction.

A number of the calls, principally those dealing with the file
system, take strings as arguments. There is a standard format for
such a string: it consists of a sequence of bytes ending in a
null character. The open call below contains a complete example
of how to write such a string.
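
The standard string format is easily modelled. In the Python
sketch below (both function names are hypothetical) a path name is
converted to a sequence of bytes ending in a null character, and
recovered again from a region of memory.

```python
def to_system_string(text):
    """Encode text as a sequence of bytes ending in a null character."""
    return text.encode("ascii") + b"\0"


def from_system_string(memory, address=0):
    """Read the string that starts at `address` and ends at the next null."""
    end = memory.index(b"\0", address)
    return memory[address:end].decode("ascii")


encoded = to_system_string("/dmr/x")
print(encoded)                       # b'/dmr/x\x00'
print(from_system_string(encoded))   # /dmr/x
```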

A1.1 exit

Exit is used to terminate a process as follows:

      sys exit

There are no arguments, nor is there ever any return from this call.

A1.2 fork

This is the primitive used to generate new processes.

      sys fork
      (old process return)
      (new process return)

There are no input arguments. The error bit is set if no space is
available to create a new process, and control returns only to
the old process. R0 contains the process identification of the
new process. See also section 5.

The parent process returns immediately after the sys call; the
new process skips one word. (The label argument mentioned in the
discussion of fork in section 5.5 was a white lie.)

A1.3 read

To read an open file whose file descriptor is filep, load filep
into R0 and

      sys read
      buffer
      nchars

Buffer is the address of the place into which information is
read, and nchars is the maximum number of characters desired.
(The actual number, not its address.) The number of characters
actually read returns in R0. If R0 is zero, the end of the file
has been reached. The error bit may be set if, for example, the
file is a tape file and there was a permanent read error, or if
an attempt was made to read into an area not part of the user's
core image.
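
The usual way of consuming a file with this call is to read
repeatedly until a count of zero comes back. The sketch below uses
Python's os.read, a present-day descendant of the call, as a
stand-in for "sys read". The pipe merely supplies a convenient
open file descriptor and is not part of the system described here.

```python
import os


def read_all(fd, nchars=512):
    """Read from descriptor fd until a zero-length read signals end of file."""
    data = b""
    while True:
        chunk = os.read(fd, nchars)   # at most nchars characters
        if not chunk:                 # zero characters read: end of file
            break
        data += chunk
    return data


r, w = os.pipe()
os.write(w, b"hello")
os.close(w)
result = read_all(r, nchars=2)
os.close(r)
print(result)   # b'hello'
```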

A1.4 write

To write an open file with file descriptor filep, load filep into
R0 and

      sys write
      buffer
      nchars

where buffer and nchars are the same as for read. The number of
characters actually written returns in R0 (ordinarily it is the
same as nchars) and errors are indicated by the error bit.

A1.5 open

To open an already existing file,

      sys open
      name
      mode
      ...
      name:<pathname\0>

name is the address of a string of characters constituting a path
name. The name is terminated by a null (all zero) character,
which is indicated by "\0"; the characters "<" and ">" are string
quotes. Mode is 0 or 1 to indicate reading or writing respectively.
The file descriptor returns in R0. If the file cannot be
found or if permission is not granted the error bit is set.

A1.6 creat

To create or recreate a file,

      sys creat
      name
      mode

name is the same as for open. Mode is a number encoding the protection
bits as specified under the "chmod" command (section 8.5).
Creation of a file implies opening for writing; the file descriptor
is returned in R0.

A1.7 close

To close a file, move the file descriptor into R0 and

      sys close

A program may have only a limited number of files open at one
time (currently, 10). Closing a file allows another file to be
opened in its place. Closing is otherwise unnecessary, for an
automatic close on all files is performed when the process terminates.

A1.8 wait

To wait for a child process to terminate,

      sys wait

The identification of the terminated process is returned in R0.
At the present time no further information is returned; in the
future a means of determining the fate of the process will be
provided.

If the process executing a wait has no living children, an error
is returned.
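
The interplay of fork, exit and wait can be tried with their
present-day counterparts in Python, as sketched below. The
calling conventions differ from those of UNIX-11 (there is no
skipped word, and the modern call also returns a status), so this
is an analogue only. It requires a system that provides os.fork.

```python
import os

pid = os.fork()
if pid == 0:
    # New process: terminate at once.
    os._exit(0)

# Old process: fork returned the identification of the new process.
# waitpid waits for that particular child and returns its
# identification together with a status word.
finished, status = os.waitpid(pid, 0)
print(finished == pid)   # True
```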

A1.9 link

To create a link in an arbitrary directory,

      sys link
      name1
      name2

Name1 and name2 are pointers to names as in open. File name1 is
linked to, and the link has name name2. An error is indicated if
name1 does not exist or if name2 does exist.

A1.10 unlink

To remove the name of a file from a directory,

      sys unlink
      name

Name is a pointer to a name. The specified entry is removed from
its directory; if this was the last entry (link) pointing to the
file, the file is destroyed.

A1.11 exec

To cause execution of a file as a program,

      sys exec
      name
      argp
      ...
 argp:arg1
      arg2
       .
       .
       .
      0

The first argument is the address of a file name. The second
argument is the address of a list of argument pointers terminated
by a zero pointer. Each argument pointer is the address of a
string to be passed to the command or other program. The first
argument pointer arg1 is, by convention, the name of the file
being invoked. When a program is executed by the Shell, it can
determine the name by which it was called. Thus one may write a
single program with several names which takes various actions
according to the name used.

A file invoked by exec begins execution at its relative location
0. At the start, its stack pointer (one of the general registers)
points to a list of its own arguments as follows:

      count
      arg1
      arg2

where there are count arguments in the list. Each argi points to
a standard format string. The argi are the same as those specified
to the exec call.
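
A program that takes various actions according to the name used
might be organized as in the Python sketch below. The names
"upper" and "lower" are invented for the illustration and
correspond to no actual UNIX command.

```python
import posixpath


def main(argv):
    """Choose an action from the name under which the program was called."""
    # argv[0] is, by convention, the name of the file being invoked.
    name = posixpath.basename(argv[0])
    if name == "upper":
        return " ".join(argv[1:]).upper()
    if name == "lower":
        return " ".join(argv[1:]).lower()
    return name + ": unknown name"


print(main(["/bin/upper", "Hello"]))   # HELLO
print(main(["lower", "Hello"]))        # hello
```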

A1.12 chdir

To change the current directory,

      sys chdir
      dirname

dirname points to the standard format string describing a directory.

A1.13 time

The call

      sys time

returns in the AC and MQ registers the number of sixtieths of a
second since the start of the current year.
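
The quantity returned can be reproduced with a little arithmetic,
as in the Python sketch below (the function name is hypothetical).
The last line shows that even at the end of a leap year the count
is below 2^32 = 4294967296, so it fits in two 16-bit words,
assuming AC and MQ are 16 bits each.

```python
from datetime import datetime


def sixtieths_since_new_year(now):
    """Sixtieths of a second elapsed since the start of now's year."""
    start = datetime(now.year, 1, 1)
    seconds = (now - start).total_seconds()
    return int(seconds * 60)


# One full day into the year: 24 * 60 * 60 * 60 sixtieths.
print(sixtieths_since_new_year(datetime(1971, 1, 2)))   # 5184000

# Largest possible value, reached at the end of a leap year.
print(366 * 24 * 60 * 60 * 60)                          # 1897344000
```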

A1.14 mkdir

The call

      sys mkdir
      name

creates the file whose name is pointed to by name and marks it as
a directory.

A1.15 chmod

The mode of a file is changed by

      sys chmod
      name
      newmode

See the "chmod" command (section 8.5) for the interpretation of
the mode. Only the owner of a file may change the mode.

A1.16 chown

To change the owner of a file,

      sys chown
      name
      newowner

Only the owner of a file may change its owner.

A1.17 break

To save time, UNIX does not swap all of the 4K user core area
when exchanging core images. The locations swapped are those from
the beginning of the core image to the initial program break, and
from the top of user core down to the stack pointer. The initial
program break is determined by the size of the file containing
the program. The system’s idea of how much to swap may be altered
by using this call:

      sys break
      newbreak

Newbreak becomes the first location not swapped. If it points
beyond the stack, or to the very first word in the core image,
the entire core image is swapped.
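
The rule may be summarized by the Python sketch below. Several
things in it are assumptions: addresses are counted from zero at
the beginning of the core image, the unit of addressing is left
open, and "beyond the stack" is taken to mean above the current
stack pointer. The function name is hypothetical.

```python
def swapped_ranges(core_size, program_break, stack_pointer):
    """Return the half-open address ranges written out on a swap."""
    if program_break == 0 or program_break > stack_pointer:
        # Break at the very first word, or beyond the stack:
        # the entire core image is swapped.
        return [(0, core_size)]
    # Otherwise: the program up to the break, and the stack from
    # the stack pointer up to the top of user core.
    return [(0, program_break), (stack_pointer, core_size)]


print(swapped_ranges(4096, 1000, 3900))   # [(0, 1000), (3900, 4096)]
print(swapped_ranges(4096, 0, 3900))      # [(0, 4096)]
```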

A1.18 stat

The user may obtain a copy of the i-node for a named file:

      sys stat
      name
      buffer

Name is the name of a file, and buffer is the address of 34
sequential bytes into which information concerning the file is
placed. See section 4 for what information is passed; consult a
UNIX programming counselor for its format.

A1.19 seek

To move the read or write pointer associated with the open file
with file descriptor filep, load filep into R0 and

      sys seek
      base
      offset

See also section 3.5.5.

A1.20 tell

To discover the position of the read or write pointer associated
with the open file with file descriptor filep, move filep into R0
and

      sys tell
      base
      offset

The result returns in R0. See also section 3.5.6.

A1.21 (unassigned)

A1.22 intr

To control the handling of "break" signals sent by the user,

      sys intr

If R0 is zero on entry, interrupts are disabled; if R0 is non-zero,
they are enabled.

APPENDIX 2

This Appendix discusses in more detail the usage of the
assembler, the editor, the debugger, and the DECTAPE manipulation
command.

A2.1 as

As is based on the DEC-provided assembler PAL-11 [references],
although it was coded locally. Therefore, only the differences
will be recorded.

Character changes are:

      for   use
       @     *
       #     $
       ;     /

In as, the character ";" is a logical new line; several operations
may appear on one line if separated by ";". Several new
expression operators have been provided:

      \>    right shift (logical)
      \<    left shift
      *     multiplication
      \/    division
      \     remainder
      !     one's complement (unary)
      []    parentheses for grouping

There is a conditional assembly operation code:

      .if expression
      .endif

If the expression evaluates to non-zero, the section of code
between the ".if" and the ".endif" is assembled; otherwise it is
ignored. ".if"s may be nested.

Temporary labels like those introduced by Knuth [reference] may
be employed. A temporary label is defined as follows:

      n:

where n is a digit 0 ... 9. Symbols of the form "nf" refer to the
first label "n:" following the use of the symbol; those of the
form "nb" refer to the last "n:". The same "n" may be used many
times. Labels of this form are less taxing both on the imagination
of the programmer and on the symbol table space of the assembler.
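
The resolution rule for temporary labels can be sketched in Python
as follows. The sketch is not taken from the assembler: the
function name is hypothetical, and it assumes that a label and a
reference to it never fall on the same source line.

```python
def resolve(reference, use_line, definitions):
    """Resolve "nf" or "nb" to the line number of the matching "n:" label.

    `definitions` is a list of (line, digit) pairs, one for each
    temporary label definition in the source.
    """
    digit, direction = reference[0], reference[1]
    lines = [line for line, d in definitions if d == digit]
    if direction == "f":
        # First definition following the use.
        following = [line for line in lines if line > use_line]
        return min(following) if following else None
    # Last definition before the use.
    preceding = [line for line in lines if line < use_line]
    return max(preceding) if preceding else None


defs = [(1, "1"), (5, "1"), (9, "1")]   # "1:" defined on lines 1, 5 and 9
print(resolve("1f", 3, defs))   # 5
print(resolve("1b", 7, defs))   # 5
print(resolve("1b", 3, defs))   # 1
```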

The PAL-11 opcodes ".eot" and ".end" are redundant and are omitted.

The symbols

      r0 ... r5
      sp
      pc
      ac
      mq
      div
      mul
      lsh
      ash
      nor
      csw

are predefined with appropriate values.

The new opcode "sys" is used to specify system calls. Names for
system calls are predefined. See Appendix 1 for the list of
calls.

Strings of characters may be assembled in a way more convenient
than PAL-11's ".ascii" operation (which is, therefore, omitted).
Strings are included between the string quotes "<" and ">":

      <here is a string>

Escape sequences exist to enter non graphic and other difficult
characters. These sequences are also effective in single and
double character constants introduced by single and double quotes
respectively:

      use   for
      \n    newline (012)
      \0    NULL (000)
      \>    >
      \t    TAB (011)
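
The effect of the string quotes and escape sequences can be
imitated in Python as below. The function name is hypothetical,
only the four escapes listed above are handled, and nothing is
assumed about how as itself is written.

```python
# Escape letters and the character codes they stand for.
ESCAPES = {"n": 0o12, "0": 0o00, ">": ord(">"), "t": 0o11}


def assemble_string(source):
    """Translate a string written between < and > into its bytes."""
    if not (source.startswith("<") and source.endswith(">")):
        raise ValueError("string must be enclosed in < and >")
    body = source[1:-1]
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            i += 1                        # look at the escaped character
            out.append(ESCAPES[body[i]])
        else:
            out.append(ord(ch))
        i += 1
    return bytes(out)


print(assemble_string("<here is a string>"))   # b'here is a string'
print(assemble_string(r"<pathname\0>"))        # b'pathname\x00'
```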

When errors occur, a single-character diagnostic is typed out
together with the line number and the file name in which it
occurred. Errors in pass 1 cause cancellation of pass 2. The
possible errors are:

      )   parentheses error
      ]   parentheses error
      *   Indirection ("*") used illegally
      .   . (the location counter) has become undefined
      A   error in Address
      B   Branch instruction has too remote an address
      E   error in Expression
      F   error in local ("f" or "b") type symbol
      G   garbage (unknown) character
      M   Multiply defined symbol as label
      O   Odd-- word quantity assembled at odd address
      P   Phase error-- "." different in pass 2 from pass 1 value
      R   Relocation error
      U   Undefined symbol
      X   syntaX error

The binary output of the assembler is placed on the file "a.out"
in the current directory. The assembler also generates a file
"n.out" which is a copy of the name list from the assembly, that
is, a table of the names of symbols used and their values.
"n.out" is used by the db, nm, and un commands.

The assembler does not produce a listing of the source program.
This is not a serious drawback; the debugger db discussed below
is sufficiently powerful to render a printed octal translation of
the source unnecessary.

A2.3 db

Unlike many debugging packages (including DEC's ODT, on which db
is loosely based) db is not loaded as part of the core image
which it is used to examine; instead it examines files. Typically,
the file will be either a core image produced after a
fault (see section 7) or the binary output of the assembler. Db
is called as follows:

      db [ name [ namelist ] ]

Name is the file being debugged; if omitted "core" is assumed.
namelist is the "n.out" file produced when name was assembled; if
omitted, "n.out" is assumed. If no appropriate name list file can
be found, db can still be used but some of its symbolic facilities
become unavailable.

The format for most db requests is an address followed by a one
character command.

Addresses are expressions built up as follows:

      1. A name has the value assigned to it when the input file was
	 assembled. It may be relocatable or not depending on the
	 use of the name during the assembly.

      2. An octal number is an absolute quantity with the appropriate value.

      3. An octal number immediately followed by "r" is a relocatable
	 quantity with the appropriate value.

      4. The symbol "." indicates the current pointer of db. The
	 current pointer is set by many db requests.

      5. Expressions separated by "+" or " " (blank) are expressions
	 with value equal to the sum of the components. At most one
	 of the components may be relocatable.

      6. Expressions separated by "-" form an expression with value
	 equal to the difference of the components. If the right
	 component is relocatable, the left component must be
	 relocatable.

      7. Expressions are evaluated left to right.
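
The rules above can be collected into a small evaluator, sketched
here in Python. It is not db's own code. Values are represented as
(number, relocatable) pairs; the function names are hypothetical;
and the result rules (relocatable minus relocatable is absolute,
relocatable plus or minus absolute is relocatable) are the
conventional ones, assumed rather than stated in the text. A
leading minus sign is not handled.

```python
import re


def operand(token, symbols, dot):
    """Return the (number, relocatable) value of one component."""
    if token == ".":
        return dot
    if re.fullmatch(r"[0-7]+r?", token):
        # Octal number, relocatable if immediately followed by "r".
        return (int(token.rstrip("r"), 8), token.endswith("r"))
    return symbols[token]


def evaluate(expr, symbols, dot=(0, False)):
    """Evaluate an address expression strictly left to right."""
    value = None
    pending = None
    for token in re.findall(r"[+\- ]|[^+\- ]+", expr.strip()):
        if token == " ":
            if pending is None:       # a blank alone means "+"
                pending = "+"
            continue
        if token in ("+", "-"):
            pending = token
            continue
        number, reloc = operand(token, symbols, dot)
        if value is None:
            value = (number, reloc)
        elif pending == "-":
            if reloc and not value[1]:
                raise ValueError("absolute minus relocatable")
            value = (value[0] - number, value[1] and not reloc)
        else:
            if reloc and value[1]:
                raise ValueError("two relocatable components in a sum")
            value = (value[0] + number, value[1] or reloc)
        pending = None
    return value


symbols = {"start": (0o100, True)}
print(evaluate("start+10", symbols))      # (72, True)
print(evaluate("start 4-2", symbols))     # (66, True)
print(evaluate("start-start", symbols))   # (0, False)
```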

If no address is given for a command, the current address (also
specified by ".") is assumed. In general, "." points to the last
word or byte printed by db.

There are db commands for examining locations interpreted as
octal numbers, machine instructions, ASCII characters, and
addresses. For numbers and characters, either bytes or words may
be examined. The following commands are used to examine the
specified file.

      / The addressed word is printed in octal.

      \ The addressed byte is printed in octal.

      " The addressed word is printed as two ASCII characters.

      ' The addressed byte is printed as an ASCII character.

      ? The addressed word is interpreted as a machine instruction
	and a symbolic form of the instruction, including symbolic
	addresses, is printed. Usually, the result will appear
	exactly as it was written in the source program.

      & The addressed word is interpreted as a symbolic address and
	is printed as the name of the symbol whose value is closest
	to the addressed word, possibly followed by a signed
	offset.

      <nl> (i.e., the character "new line") This command advances
	the current location counter "." and prints the resulting
	location in the mode last specified by one of the above
	requests.

      ^ This character decrements "." and prints the resulting
	location in the mode last selected by one of the above
	requests. It is a converse to <nl>.

It is illegal for the word-oriented commands to have odd
addresses. The incrementing and decrementing of "." done by the
<nl> and ^ requests is by two or one depending on whether the
last command was word or byte oriented.
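
The symbolic printing done by the "&" request can be sketched in
Python as below. The function name, the printed format, and the
choice made when two symbols are equally close are all
assumptions; the text specifies only that the closest symbol is
used, possibly followed by a signed offset.

```python
def nearest_symbol(address, symbols):
    """Format address as the closest symbol plus a signed octal offset."""
    name = min(symbols, key=lambda s: abs(address - symbols[s]))
    offset = address - symbols[name]
    if offset == 0:
        return name
    return "%s%+o" % (name, offset)


symbols = {"start": 0o100, "loop": 0o120}
print(nearest_symbol(0o104, symbols))   # start+4
print(nearest_symbol(0o116, symbols))   # loop-2
print(nearest_symbol(0o120, symbols))   # loop
```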

The address portion of any of the above commands may be followed
by a comma and then by an expression. In this case that number of
sequential words or bytes specified by the expression is printed.
"." is advanced so that it points at the last thing printed.
There are two commands to interpret the value of expressions.

      = When preceded by an expression, the value of the expression
		<< page A7 is missing here >>
	typewriter (EOT character).

A2.4 ed

Ed is nearly a subset of QED [reference]. When called by

      ed name

ed performs an automatic "r" (read) command on file name. The
major differences between ed and QED are:

      1. There is no "\f" character; input mode is left by typing
	 "." alone on a line.

      2. There are no buffers and hence no "\b" stream directive.

      3. The commands are limited to:

	     a c d i p q r s w = !

      4. The only special characters in regular expressions are:

	     * ^ $ [ .

	 which have the usual meanings. However, "^" and "$" are
	 only effective if they are the first or last character
	 respectively of the regular expression. Otherwise
	 suppression of special meaning is done by preceding the
	 character by "\", which is not otherwise special.

      5. In the substitute command, only the leftmost occurrence of
	 the matched regular expression is substituted.
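
The behavior of the substitute command described in item 5 can be
imitated with Python's re module by limiting the number of
replacements to one. Python's regular expression syntax is richer
than ed's, so this is an analogue only; the function name is
hypothetical.

```python
import re


def substitute_once(pattern, replacement, line):
    """Replace only the leftmost match of pattern in line."""
    return re.sub(pattern, replacement, line, count=1)


print(substitute_once("a", "X", "banana"))     # bXnana
print(substitute_once("an*", "-", "banana"))   # b-ana
```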

A2.5 tap

The tap command is used as follows:

      tap [01][crxdt][s][v] name1 ... namen -name1 ... -namen

The first argument consists of characters which indicate what is
to be done. Subsequent arguments specify a set of files.

A digit (0 or 1) in the first argument indicates the logical unit
number on which the tape is mounted. C, r, x, d and t are
mutually exclusive:

      c indicates the creation of a new tape. Files name1, ...,
	namen are placed on the tape. If any of these are directories,
	all files and subdirectories therein are placed on
	the tape as well. Arguments preceded by "-" indicate files
	or directories which are not to be placed on the tape even
	though implied by one of the other arguments.

      r indicates that the files specified (exactly as for c) will
	be added to the tape. If there was a file of the same name
	as one of the specified files already on the tape, it will
	be replaced.

      x indicates that the specified files are to be extracted from
	the tape and copied onto the disk. If any directory needed
	does not exist, it will be created.

      d indicates that the specified files are to be deleted from
	the tape.

      t causes a partial table of contents of the tape to be produced,
	including all files implied by the following arguments.
	(E.g., "tap t /dmr" gives the names of all files on
	the tape in directory "/dmr").
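
The way the plain and "-" arguments together determine a set of
files can be sketched in Python as follows. The function name and
the file names are invented for the illustration; how tap itself
traverses directories is not shown.

```python
def select(files, names, excluded):
    """Pick the files implied by `names` but not by `excluded`."""
    def under(path, root):
        # True if path is root itself or lies in the hierarchy below it.
        return path == root or path.startswith(root.rstrip("/") + "/")

    chosen = []
    for path in files:
        wanted = any(under(path, n) for n in names)
        dropped = any(under(path, e) for e in excluded)
        if wanted and not dropped:
            chosen.append(path)
    return chosen


files = ["/dmr/a", "/dmr/src/b", "/dmr/src/c", "/ken/d"]
# Corresponds to the arguments "/dmr -/dmr/src".
print(select(files, ["/dmr"], ["/dmr/src"]))   # ['/dmr/a']
```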

Argument "v" (for "verify") may be used in addition to the preceding
arguments. Before each file is dealt with as indicated by
one of the preceding arguments, the "v" option causes tap to
pause, type the name of the affected file, and request the user
to decide whether the file should be treated. The reply "y"
means "yes"; an empty line means "no"; a "q" means "no, and exit
from the tap command". For example, by the use of "xv", files can
be selectively restored.

Argument "s" may be used alone or in addition to one of c, r, x,
t, d. It causes tap to examine the tape, verify that it can be
read properly, and produce statistics on the contents of the
tape.