Uses and Advantages of UNIX Systems
Statistical Software Resources Available
The UNIX Learning Curve
Basic UNIX Commands
UNIX Editors
PC X-Server Software
Batch Processing
UNIX Printing
Uses and Advantages of UNIX Systems
In addition to providing email and web services, UNIX systems are the
primary home for statistical computing, research computing, and archival
data resources. UNIX systems in general and Sociology systems in
particular have a number of desirable attributes:
- Data capacity - UNIX systems scale well to large RAID data
storage and tape backup - the Sociology system houses and
backs up over 100GB of user data.
- Processing speed and capacity - a dual processor compute
server with 4GB of memory provides excellent throughput for a workgroup
of this department's size.
- Remote Access - Sociology systems may be securely accessed
from anywhere on the internet via Secure Shell connections.
- Egalitarian Access - a policy objective is to provide capable
processing power and a range of statistical alternatives to all who
require them - the availability of central UNIX servers ensures this.
- Collaborative Access - UNIX systems readily allow for controlled
access to data that fosters collaborative research efforts.
- Access to Archival Data Resources - the Sociology data library
manages the Duke Inter-university Consortium for Political and Social
Research (ICPSR) membership - any ICPSR study holding can be obtained
by request - those studies that have been ordered are locally archived
and directly available through the UNIX network.
Statistical Software Resources Available
The following statistical packages are currently licensed:
- Stata - easy to learn with powerful interactive capabilities
and an active user support community - the general purpose package
of choice for most.
- SAS - a general purpose statistical package with powerful
data management capabilities that can be used in interactive or batch
mode.
- SPSS - another venerable product that many like to use interactively,
but which also can be run in batch mode - better supported in the
Windows environment.
- S-Plus - similar to Stata in design but not much favored
by sociologists - widely used by the Institute for Statistics and
Decision Sciences (ISDS).
- GAUSS - a more specialized program that is difficult to use,
but which excels at certain types of maximum likelihood estimation.
- LIMDEP - an econometric package with many specialized features.
- LISREL/PRELIS - a structural equation modeling package.
- RATE - a very specialized package used for event history
analysis.
- GLIM - a generalized linear interactive modeling program.
- DBMS/COPY - a conversion utility for porting system files
between statistical packages.
The UNIX Learning Curve
UNIX systems are typically accessed through command line based terminal
windows that lack, except through specific software applications, the
kind of graphical user interface that Windows users expect and rely
upon. [This will change as Linux systems that provide a desktop
environment become more prevalent.] There is actually vast flexibility
at the command line, but this is of little comfort to the uninitiated.
So a basic survival strategy entails the following:
- Learn a few basic UNIX commands.
- Pick an editor and master its rudiments.
- Understand the role of PC X-server software.
- Learn the difference between interactive and batch job submission.
- Learn how to print UNIX output.
Basic UNIX Commands
In the UNIX resources section you will find quick and basic reference
guides. The quick guide covers essential commands. The basic guide expands
the command repertoire a bit and demonstrates concepts that allow UNIX
commands to be combined to perform useful tasks. The key point to remember
is that each command is fully described in an online manual page that
can be displayed with the man command. For example, man ls
displays the manpage for the ls command. Scroll through lengthy
pages screen-by-screen with the spacebar or line-by-line with
the Enter key. Reverse scroll with the B key.
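As a rough illustration of combining commands, the sketch below connects
ls and grep with a pipe so that the output of one becomes the input of the
other. The scratch directory and the file names (job1.sas, job2.sas,
notes.txt) are made up for the example and are not taken from the guides
mentioned above.

```shell
# Create a scratch directory with a few example files (hypothetical names).
workdir=$(mktemp -d)
touch "$workdir/job1.sas" "$workdir/job2.sas" "$workdir/notes.txt"

# ls lists the directory; the pipe (|) feeds that list to grep,
# which counts (-c) the names ending in .sas.
sas_count=$(ls "$workdir" | grep -c '\.sas$')
echo "SAS program files: $sas_count"    # prints "SAS program files: 2"

# Remove the scratch directory when done.
rm -rf "$workdir"
```

The manual pages for each piece (man ls, man grep) describe the options
used here.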
UNIX Editors
The editor we recommend is emacs. A more basic but intuitive alternative
is pico, the editor embedded in the Pine email client. Finally, some
may prefer vi, an editor commonly used by UNIX system administrators.
PC X-Server Software
The standard terminal window provided by SSH logins with F-Secure or
TeraTerm is capable of displaying only textual information. It cannot
display, for example, the graphical interfaces of Stata or SAS. To display
UNIX graphics on a PC the SSH client must hand over the task to another
piece of software called a PC X-server, so named because it can display
graphics instructions written in a UNIX-based standard called X-Windows.
The X-server software enables the PC desktop to perform this function,
which is not native to Microsoft Windows.
A PC X-server application is a standard install component on departmental
PCs used for UNIX processing. We use X-Win32 from StarNet Communications,
for which Duke has a campus-wide site license. To use the X-Win32 software,
simply load it from your start menu so that its icon is added to the
tray at the bottom of your Windows session. [Many department PCs are
configured to automatically load this application when you log in.] SSH
clients are configured to transparently forward X-window requests to
the X-server.
The Duke license for X-Win32 is restricted to Duke networks, which means
that it will not work off-campus. Educational licensing is available
to faculty and students directly from StarNet.
Whether a PC X-server is a necessity for home use depends on your computing
habits and tastes. Some prefer having a graphical interface for all
work. Other users evolve toward use of non-graphical, batch submission.
Batch Processing
MS Windows computing is heavily oriented toward the interactive computing
model. This also works fine under UNIX, but there are times when batch
computing is preferable. Batch job submission involves preparing
a set of instructions (e.g., a Stata or SAS job) and
submitting them as a task to be performed in the background while you
do other work. There are a number of advantages to this approach:
- Jobs can be prepared and submitted from the command line without
the overhead of a graphical interface. This is easy to do remotely
and does not require a PC X-server.
- Work can be performed in a unified, modular way. The output of a
task is stored in one or more files that log the specific instructions
performed and the results reported. For complex projects with many
data management and modelling steps, this approach provides a useful
method of establishing an audit trail that is often easier to sort
out than interactive logs.
- Complex models involving large data sets may take hours or days
to complete. Batch submission frees you to do other things. Because
execution of the job is not tied to a specific output terminal, you
may even log off the UNIX system and check the job later from another
location.
Batch submission is simple. Edit the instructions to be performed and
save them into a file, usually with a file name extension that makes
the content apparent (e.g., job2.sas). Then submit a command that tells
UNIX which application to run, which program file to use, and that the
job should run in background mode. Continuing the SAS example:
$ sas job2.sas &
This runs a SAS job using the instructions stored in job2.sas.
The ampersand (&) forces background mode processing and returns
control of the terminal window. If the job finishes while you are still
logged on, a notification message appears on the terminal. In this instance,
the SAS log is written to a file called job2.log and any
output is written to job2.lst. Variations on the batch
submission idea used by different statistical packages are discussed
where relevant.
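The submit, continue working, and check-later cycle can be sketched with
a stand-in job, so that it runs on a machine without SAS installed. The
script job.sh and the log file job.log are hypothetical. The use of nohup
is an addition to the example above: whether a plain background job
survives logout depends on how the shell is configured, and nohup is a
common safeguard.

```shell
# Stand-in for a statistical job: a small script that writes a log file.
# (Hypothetical; a real job would be a command such as: sas job2.sas)
workdir=$(mktemp -d)
cat > "$workdir/job.sh" <<'EOF'
#!/bin/sh
sleep 1
echo "job finished" > "$1"
EOF

# nohup shields the job from the hangup signal sent at logout;
# the trailing & runs it in the background and frees the terminal.
nohup sh "$workdir/job.sh" "$workdir/job.log" > /dev/null 2>&1 &
job_pid=$!
echo "submitted as process $job_pid"

# Other work can be done here. Later, wait for the job to end,
# record its exit status, and read the log it produced.
wait "$job_pid"
job_status=$?
result=$(cat "$workdir/job.log")
echo "exit status $job_status: $result"
```

In a later login session the wait command is not available for a job
started earlier; checking the log file, or looking for the process with
ps, serves the same purpose.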
UNIX Printing
Each of the network printers in the department is configured for UNIX
printing from the servers angst and charisma. Most printing consists
of output from batch jobs or logs from interactive sessions. A general
guide to UNIX printing explains the commands for efficiently printing
this material. Special printing needs or techniques associated with
a particular application are covered under its notes.
The files stored on the UNIX system are not directly accessible on
your PC, so they cannot be printed to a locally attached PC printer.
This is a minor inconvenience when in the department. Home printing
requires file transfers to your PC.