Sociology at Duke

UNIX Systems

Uses and Advantages of UNIX Systems
Statistical Software Resources Available
The UNIX Learning Curve
Basic UNIX Commands
UNIX Editors
PC X-Server Software
Batch Processing
UNIX Printing

Uses and Advantages of UNIX Systems

In addition to providing email and web services, UNIX systems are the primary home of statistical computing, research computing, and archival data resources. UNIX systems in general, and the Sociology systems in particular, have a number of desirable attributes:

  • Data capacity - UNIX systems scale well to large RAID storage arrays and tape backup systems; the Sociology system houses and backs up over 100GB of user data.
  • Processing speed and capacity - a dual-processor compute server with 4GB of memory provides excellent throughput for a workgroup the size of this department.
  • Remote access - Sociology systems can be accessed securely from anywhere on the Internet via Secure Shell (SSH) connections.
  • Egalitarian access - a policy objective is to provide adequate processing power and a range of statistical software to everyone who needs it; central UNIX servers make this possible.
  • Collaborative access - UNIX systems readily allow controlled sharing of data, which fosters collaborative research.
  • Access to archival data resources - the Sociology data library manages Duke's membership in the Inter-university Consortium for Political and Social Research (ICPSR). Any ICPSR study can be obtained by request; studies that have been ordered are archived locally and are directly available through the UNIX network.
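The controlled sharing mentioned under collaborative access usually comes down to standard UNIX file permissions. The sketch below makes a data file readable by members of one group but by no other users. The file name shared_data.csv, its made-up toy rows, and the PROJECT_GROUP variable are hypothetical; when no project group is given, the sketch falls back to your own primary group (from id -gn) so that it runs on any system. How project groups are actually set up on the Sociology servers is not covered here.

```shell
#!/bin/sh
# Hypothetical data file to be shared within a research group (toy rows).
file=shared_data.csv
printf 'id,age\n1,34\n2,51\n' > "$file"

# Hypothetical project group; defaults to your own primary group so that
# chgrp succeeds on any system.
group=${PROJECT_GROUP:-$(id -gn)}
chgrp "$group" "$file"     # assign the file to the group

# 640: owner may read and write, group may read, others get no access.
chmod 640 "$file"
ls -l "$file"              # permission string begins with -rw-r-----
```

For a shared directory, group members also need execute permission on the directory itself (for example, chmod 750) before they can reach the files inside it.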

Statistical Software Resources Available

The following statistical packages are currently licensed:

  • Stata - easy to learn, with powerful interactive capabilities and an active user-support community - the general-purpose package of choice for most users.
  • SAS - a general purpose statistical package with powerful data management capabilities that can be used in interactive or batch mode.
  • SPSS - another venerable product that many like to use interactively, but which also can be run in batch mode - better supported in the Windows environment.
  • S-Plus - similar to Stata in design but not much favored by sociologists - widely used by the Institute for Statistics and Decision Sciences (ISDS).
  • GAUSS - a more specialized program that is difficult to use, but which excels at certain types of maximum likelihood estimation.
  • LIMDEP - an econometric package with many specialized features.
  • LISREL/PRELIS - a structural equation modeling package.
  • RATE - a very specialized package used for event history analysis.
  • GLIM - a generalized linear interactive modeling program.
  • DBMS/COPY - a conversion utility for porting system files between statistical packages.

The UNIX Learning Curve

UNIX systems are typically accessed through command-line terminal windows that, apart from specific software applications, lack the kind of graphical user interface that Windows users expect and rely on. [This is changing as Linux systems that provide a graphical desktop environment become more prevalent.] The command line actually offers enormous flexibility, but this is of little comfort to the uninitiated. A basic survival strategy therefore involves the following:

  1. Learn a few basic UNIX commands.
  2. Pick an editor and master its rudiments.
  3. Understand the role of PC X-server software.
  4. Learn the difference between interactive and batch job submission.
  5. Learn how to print UNIX output.

Basic UNIX Commands

In the UNIX resources section you will find quick and basic reference guides. The quick guide covers essential commands. The basic guide expands the command repertoire a bit and demonstrates how UNIX commands can be combined to perform useful tasks. Remember that every command is fully described in an online manual page, which can be displayed with the man command. For example, man ls displays the manual page for the ls command. Scroll through lengthy pages screen by screen with the spacebar or line by line with the Enter key; scroll backward with the B key.
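A short session with a few commonly used commands gives a feel for the command line. The sketch below is only an illustration, not a substitute for the quick and basic guides; the directory unix_demo and the file names are hypothetical. The middle section shows two commands grouped with a pipe, the idea the basic guide develops further.

```shell
#!/bin/sh
# A few essential commands, run in a hypothetical scratch directory.
mkdir -p unix_demo          # create a directory (no error if it already exists)
cd unix_demo || exit 1      # move into it
pwd                         # print the current (working) directory

# Create a small text file listing some hypothetical job files.
printf 'job1.sas\njob2.sas\nnotes.txt\n' > files.txt
cat files.txt               # display the file's contents
ls -l                       # long listing: permissions, owner, size, date

# Grouping commands with a pipe (|): the output of grep becomes the input
# of wc, which counts the lines that name SAS programs.
grep '\.sas$' files.txt | wc -l

cp files.txt files.bak      # copy a file
mv files.bak old_files.txt  # rename (move) the copy
rm old_files.txt            # delete it

# To read the manual page for a command (interactive, so not run here):
#   man ls
```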

UNIX Editors

The editor we recommend is emacs. A more basic but intuitive alternative is pico, the editor embedded in the Pine email client. Finally, some may prefer vi, an editor commonly used by UNIX system administrators.
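Which of these editors is available can vary from one UNIX system to another. A minimal sketch for checking, assuming only the POSIX command -v builtin, looks like this; the editor_report function name is hypothetical.

```shell
#!/bin/sh
# Report which of the editors discussed above are on the search path.
editor_report() {
    for editor in emacs pico vi; do
        if command -v "$editor" >/dev/null 2>&1; then
            echo "$editor: installed at $(command -v "$editor")"
        else
            echo "$editor: not found"
        fi
    done
}
editor_report

# Opening a file is then simply, for example (interactive, so not run here):
#   emacs job2.sas
#   pico job2.sas
#   vi job2.sas
```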

PC X-Server Software

The standard terminal window provided by SSH logins with F-Secure or TeraTerm can display only text. It cannot display, for example, the graphical interfaces of Stata or SAS. To display UNIX graphics on a PC, the SSH client must hand the task over to another piece of software called a PC X-server, so named because it can display graphics instructions written in the UNIX-based standard called the X Window System (often called X-Windows or X11). The X-server software gives the PC desktop this capability, which Microsoft Windows does not provide natively.

A PC X-server application is a standard component installed on departmental PCs used for UNIX processing. We use X-Win32 from StarNet Communications, for which Duke has a campus-wide site license. To use X-Win32, simply start it from your Start menu so that its icon is added to the system tray at the bottom of your Windows session. [Many department PCs are configured to load this application automatically when you log in.] SSH clients are configured to forward X Window requests to the X-server transparently.
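When X forwarding is working, the SSH session on the UNIX side has the DISPLAY environment variable set, and graphical programs send their output through it to the PC X-server. A quick check such as the sketch below can tell you whether a session can show graphics before you start a graphical interface. The x_status function is hypothetical, and the ssh -X login in the comment applies to a Unix-like client running OpenSSH, which is an assumption; it is not the F-Secure or TeraTerm setup described above.

```shell
#!/bin/sh
# Report whether this session can display X11 graphics.
# With SSH X forwarding active, sshd sets DISPLAY (often to something
# like localhost:10.0); without forwarding, DISPLAY is usually unset.
#
# Example login from a Unix-like client with OpenSSH (hypothetical host):
#   ssh -X netid@server.example.edu
x_status() {
    if [ -n "${DISPLAY:-}" ]; then
        echo "graphics available: DISPLAY=$DISPLAY"
    else
        echo "text only: no DISPLAY set (use command-line or batch mode)"
    fi
}
x_status
```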

The Duke license for X-Win32 is restricted to Duke networks, which means that it will not work off campus. Educational licensing is available to faculty and students directly from StarNet. Whether a PC X-server is necessary for home use depends on your computing habits and tastes. Some users prefer a graphical interface for all work; others gradually shift toward non-graphical batch submission.

Batch Processing

MS Windows computing is heavily oriented toward the interactive computing model. This model also works fine under UNIX, but there are times when batch computing is preferable. Batch job submission means preparing a set of instructions (e.g., a Stata or SAS job) and submitting them as a task to be performed in the background while you do other work. This approach has a number of advantages:

  1. Jobs can be prepared and submitted from the command line without the overhead of a graphical interface. This is easy to do remotely and does not require a PC X-server.
  2. Work can be performed in a unified, modular way. The output of a task is stored in one or more files that log the specific instructions performed and the results reported. For complex projects with many data management and modelling steps, this approach provides a useful method of establishing an audit trail that is often easier to sort out than interactive logs.
  3. Complex models involving large data sets may take hours or days to complete. Batch submission frees you to do other things. Because execution of the job is not tied to a specific output terminal, you may even log off the UNIX system and check the job later from another location.
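Point 3 depends on the job surviving your logout. Depending on the shell and on how the session ends, a job started with & alone may be sent a hangup signal when you log off; the standard nohup utility guards against that. The sketch below uses a short sleep as a hypothetical stand-in for a long statistical job (for a real SAS run the line would be nohup sas job2.sas &). The file names long_job.out and long_job.pid are also hypothetical. It saves the process ID so that ps can check on the job from a later login.

```shell
#!/bin/sh
# Stand-in for a long-running job; for SAS this would be:
#   nohup sas job2.sas &
# nohup makes the job ignore the hangup signal sent at logout.
nohup sh -c 'sleep 2; echo "job done"' > long_job.out 2>&1 &
echo $! > long_job.pid            # save the process ID for later checks

# Later (even from a new login session), check whether the job is still running.
pid=$(cat long_job.pid)
if ps -p "$pid" >/dev/null 2>&1; then
    echo "job $pid is still running"
else
    echo "job $pid has finished; see long_job.out"
fi
```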

Batch submission is simple. Edit the instructions to be performed and save them in a file, usually with a file name extension that makes the content apparent (e.g., job2.sas). Then submit a command that tells UNIX which application to run, which program file to use, and that the job should run in background mode. Continuing the SAS example:

     $ sas job2.sas &

This runs a SAS job using the instructions stored in job2.sas. The ampersand (&) forces background mode processing and returns control of the terminal window. If the job finishes while you are still logged on, a notification message appears on the terminal. In this instance, the SAS log is written to a file called job2.log and any output is written to job2.lst. Variations on the batch submission idea used by different statistical packages are discussed where relevant.
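Within a single session, the shell's own tools let you keep track of a background job. The sketch below substitutes a hypothetical stand-in, fake_sas, for the sas command, since SAS may not be installed where you try it; the stand-in writes job2.log and job2.lst the way the SAS run above does. $! holds the process ID of the most recent background job, and wait pauses until that job finishes and returns its exit status. Interactively, the jobs command lists your current background jobs.

```shell
#!/bin/sh
# Hypothetical stand-in for "sas job2.sas &": pretends to run a job and
# writes a log file and a listing file, as SAS does for job2.sas.
fake_sas() {
    base=${1%.sas}                   # job2.sas -> job2
    sleep 1
    echo "NOTE: processed $1" > "$base.log"
    echo "results for $1" > "$base.lst"
}

fake_sas job2.sas &                  # run in the background
pid=$!                               # process ID of that job
echo "submitted job2.sas as process $pid"

# ... the terminal is free for other work at this point ...

wait "$pid"                          # pause until the job finishes
echo "job exited with status $?"
cat job2.log                         # review the log first
```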

UNIX Printing

Each of the network printers in the department is configured for UNIX printing from the servers angst and charisma. Most printing consists of output from batch jobs or logs from interactive sessions. A general guide to UNIX printing explains the commands for printing this material efficiently. Special printing needs or techniques associated with a particular application are covered in that application's notes.
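The general printing guide is the authority for the departmental setup, but on most UNIX systems the core commands are lpr to queue a file, lpq to list a printer's queue, and lprm to cancel a queued job. The sketch below assumes those BSD-style commands and a hypothetical printer name, soc_printer. It defaults to a dry run that only shows the commands, so nothing is printed unless DRY_RUN is set to 0.

```shell
#!/bin/sh
# Hypothetical printer queue name; substitute a real departmental printer.
PRINTER=${PRINTER:-soc_printer}
DRY_RUN=${DRY_RUN:-1}                 # 1 = only show the commands

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

run lpr -P "$PRINTER" job2.lst        # queue the SAS listing file
run lpr -P "$PRINTER" job2.log        # and its log
run lpq -P "$PRINTER"                 # show what is waiting in the queue
# A queued job can be cancelled with: lprm -P <printer> <job-number>
```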

The files stored on the UNIX system are not directly accessible on your PC, so they cannot be printed to a locally attached PC printer. This is a minor inconvenience when in the department. Home printing requires file transfers to your PC.
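One common way to transfer files for home printing is scp, which copies files over the same Secure Shell connection used for remote logins. The right command depends on your home client, so the sketch below assumes a machine with OpenSSH; netid, server.example.edu, and the fetch_cmd function are hypothetical placeholders. It prints the commands rather than running them.

```shell
#!/bin/sh
# Print the scp command that copies a remote file into the current local
# directory ("."). $1 = user@host (hypothetical), $2 = remote file name.
fetch_cmd() {
    printf 'scp %s:%s .\n' "$1" "$2"
}

for f in job2.lst job2.log; do
    fetch_cmd "netid@server.example.edu" "$f"
done
# Running a printed command copies the file; scp prompts for your password
# unless SSH keys have been set up.
```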

