This document provides general guidelines for writing SAS programs.
The SAS Command File
SAS programs are divided into DATA steps and PROCs. The purpose
of the data step is to create one or more SAS data sets. The data
step contains statements which read in raw data files or existing SAS
data sets. Other data step tasks include transforming, creating, and selecting
variables, selecting cases, defining missing data, and providing labels
for variables. The data step begins with the word DATA followed by the name
of a data set. See Sample Program #1 for
an example of a simple data step.
SAS PROCs are used to analyze or graph data or provide information
about a SAS data set. For example, PROC REG performs multiple regression
on sample data, while PROC CONTENTS tells the user the name and location
of the variables in a SAS data set.
A SAS program may contain one or more data steps and/or one or more PROCs.
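As a rough illustration of this layout, the sketch below pairs one data step with two PROCs. The data set name SCORES, the variable names, and the data values are invented for the example and do not come from the sample programs.

```sas
* Data step: create a temporary data set named SCORES from inline data ;
data scores ;
  input id x y ;
  datalines ;
1 2.0 3.1
2 3.5 4.8
3 5.0 7.2
4 6.5 8.9
;
run ;

* PROC steps: describe the data set, then regress Y on X ;
proc contents data=scores ;
run ;

proc reg data=scores ;
  model y = x ;
run ;
quit ;
```

The RUN statements mark the end of each step explicitly; QUIT ends PROC REG, which otherwise stays active waiting for further statements.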
The following texts provide useful information for writing your SAS programs:
- SAS Language and Procedures (Version 6)
- SAS Language (Version 6)
- Applied Statistics and the SAS Programming Language by Ronald
P. Cody and Jeffrey K. Smith (Elsevier Science Publishing Co., NY).
Basic Syntax Rules
The SAS data step consists of a series of statements. Rules for
writing these statements follow.
- Words in statements must be separated by one or more blanks.
- A word may not be split between lines.
- Words may be in upper, lower, or mixed case.
- Values of character variables must match data values exactly (case-sensitive).
- Variable names may be one through eight characters in length (the Version 6 limit; later releases allow up to 32 characters).
- All variable names must begin with an alphabetic character (A-Z, a-z)
or an underscore (_). Subsequent characters may include digits.
- A variable list such as
- SAS matches variable names character for character, but ignores
case. That is,
V1 is not the same as
V1 is the same as
- Variable names may not contain embedded blanks.
V_1 is acceptable;
V 1 is not.
- Certain names are reserved for use by SAS, e.g.,
_NAME_. Similarly, logical operators
should not be used as variable names.
- A statement may begin anywhere on a line and may be continued on additional
lines as necessary.
- Statements end with a semicolon (;).
- Statements that begin with an asterisk (*) are treated
as comments and are not interpreted. A comment is concluded with a semicolon.
- A group of statements preceded by /* is ignored until */
is read (block comment). Semicolons between the /* ... */ have no effect.
- Multiple statements may appear on a line; they must be separated by
semicolons.
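Several of the rules above can be seen together in a short sketch. The data set DEMO and its variables are hypothetical names chosen only for illustration.

```sas
* A statement-style comment ends at the first semicolon ;

/* A block comment may span several lines;
   semicolons inside it have no effect. */

DATA demo ;                    /* keywords may be in any case          */
  v1 = 1 ; V_1 = 2 ;           /* two statements on one line           */
  total = V1
        + v_1 ;                /* one statement continued on two lines;
                                  V1 and v_1 refer to v1 and V_1       */
  length name $ 8 ;
  name = 'Smith' ;
  if name = 'SMITH' then match = 1 ;  /* character values are case-sensitive */
  else match = 0 ;                    /* so MATCH is set to 0 here           */
run ;

proc print data=demo ;
run ;
```

Because variable names are not case-sensitive, TOTAL is computed from the same two variables assigned on the previous line, while the comparison of 'Smith' with 'SMITH' fails because character values must match exactly.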
The Data Step
- The data step begins with the word DATA followed by a name for the
temporary or permanent data set to be output by the data step. See the
Examples page for sample programs which create
and use temporary SAS data sets.
- The data step includes instructions about where to find the data and
how to read the values from the data file. For more details, see the page
on Data Files in SAS.
- The data step may contain instructions to create new variables or transform
existing variables, label variables, and select cases or variables. The
following statements are examples of valid statements for the SAS data step:
y = sum(of x1-x15) ;
label y = 'total score' ;
if y > 10 then group = 1 ;
else group = 2 ;
keep group y ;
- To refer to a missing value for a numeric variable, use a period (.).
For example, the statement
if a = 99 then a = . ; forces SAS
to treat a value of 99 as if it were missing.
- The data step ends when SAS encounters a RUN statement, the word PROC
(signifying the beginning of a SAS procedure), or another DATA statement.
When inline data follow the words DATALINES ; or CARDS ;, the step ends
after the last line of data.
- All data step statements must appear within a data step. To use data step
statements after a PROC, begin a new data step and read in the existing
data set.
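The points in this list can be sketched as one small program. The data set names SURVEY and SURVEY2, the variable names, and the data values are assumptions made for the example; only the statements involving Y, GROUP, and A follow the ones quoted above.

```sas
data survey ;
  input id a x1-x3 ;
  if a = 99 then a = . ;       /* treat the code 99 as missing        */
  y = sum(of x1-x3) ;          /* SUM ignores missing arguments       */
  label y = 'total score' ;
  if y > 10 then group = 1 ;
  else group = 2 ;
  keep id a y group ;          /* drop x1-x3 from the output data set */
  datalines ;
1 25 4 5 6
2 99 1 2 3
3 31 . 7 2
;
run ;

proc print data=survey label ;
run ;

/* Data step statements after a PROC require a new data step,
   which here reads the existing data set with SET. */
data survey2 ;
  set survey ;
  ysq = y * y ;
run ;
```

The second data step shows the usual way to add a variable after a PROC has already run: start a new step and bring the earlier data set in with a SET statement.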
SAS PROCs
- SAS PROCs (procedures) are used for many purposes, including
carrying out statistical analysis (e.g., PROC REG, PROC MEANS), displaying
information about a SAS data set (e.g., PROC CONTENTS, PROC PRINT),
and creating graphs (PROC PLOT).
- Most PROCs produce output of some kind. The output of statistical PROCs
usually appears in the listing file.
- The PROC(s) must appear after a data step which creates the SAS
data set used in the procedure.
- The word PROC automatically terminates a SAS data step.
- Data step commands may not appear after a PROC unless a new data step
is initiated with the word DATA.
- A SAS PROC begins with the word PROC followed by the name of the
specific procedure (e.g., PROC REG).
- Some PROCs have options or subcommands which allow the user to output
information into a SAS data set (e.g., PROC UNIVARIATE, PROC REG).
- The default data set used by a PROC is the data set created by the
last data step or PROC before the current PROC. To change the data set
used by a PROC, use the DATA= option on the PROC line.
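A brief sketch of the default data set rule and the DATA= option follows. It uses PROC MEANS rather than the PROCs named above to write an output data set, and the names SCORES, STATS, MEAN_X, and MEAN_Y are hypothetical.

```sas
data scores ;
  input id x y ;
  datalines ;
1 2.0 3.1
2 3.5 4.8
3 5.0 7.2
4 6.5 8.9
;
run ;

/* PROC MEANS prints summary statistics and, through the OUTPUT
   statement, also writes the means to a new data set named STATS. */
proc means data=scores ;
  var x y ;
  output out=stats mean=mean_x mean_y ;
run ;

/* No DATA= option: the PROC uses the most recently created
   data set, which is now STATS. */
proc print ;
run ;

/* DATA= selects SCORES explicitly. */
proc print data=scores ;
run ;
```

Writing DATA= on every PROC statement is a common habit because it removes any dependence on which data set happened to be created last.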
The OPTIONS statement allows the programmer to set options
for the current session. For example,
OPTIONS NOCENTER LINESIZE=80 ;
sets the line size in the listing file to 80 columns
and shifts the output to the left side of the page.
INFILE is used to identify the external raw data file that a data step
reads. An example of an INFILE statement appears in Example #2.
ENDSAS is used to terminate the SAS program.
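The three statements above might appear together as in the sketch below. The file name mydata.dat and the column layout in the INPUT statement are placeholders, not taken from the examples; substitute the path and layout of a real raw data file.

```sas
options nocenter linesize=80 ;   /* left-align output, 80-column lines */

data raw ;
  infile 'mydata.dat' ;          /* hypothetical file name             */
  input id 1-3 score 5-7 ;       /* assumed column positions           */
run ;

proc print data=raw ;
run ;

endsas ;                         /* end the SAS program                */
```

Any statements placed after ENDSAS are not executed, so it normally appears last, if it is used at all.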