1. What is SAS, and what are its functions?
SAS (Statistical Analysis System) is a statistical software suite developed by SAS Institute for data analysis and report writing. Development began in the 1960s at North Carolina State University, and SAS Institute was founded in 1976. SAS runs on Windows and UNIX and is also available on mainframes. It is an integrated software suite for statistical analysis that can retrieve, transform, and manage data from a variety of sources. It provides a graphical point-and-click interface for non-technical users and more advanced options through the SAS language.
Functions of SAS
- Statistical and Mathematical Analysis
- Data Entry, Retrieval, and Management
- Forecasting, planning and Decision Support
- Operations Research and Project Management
- Report Creation and Graphics
2. What is a SAS data set?
A SAS data set is a table (file) stored in a SAS library that SAS can recognize and process. It can be created from datalines in the program itself, or as the result of extracting or manipulating data from a database or an external raw file. Its values are organized as a table of observations (rows) and variables (columns). In addition to the data values, a SAS data set carries descriptor information, which includes the variable names, types, and lengths, and the engine that was used to create the data set.
3. Describe the basic structure of a SAS program.
A SAS program is a sequence of steps. DATA steps and PROC steps are its two building blocks, and together they produce the output:
- The DATA step starts with the DATA keyword. It reads input data into a SAS data set and is also used to manipulate and edit data, for example to check for errors and correct them.
- The PROC step starts with the PROC keyword followed by a procedure name. It analyzes or processes the data in a SAS data set.
- The output is the result of these steps: reports and listings written to the output destination, messages written to the log, and, optionally, new SAS data sets (for example, a data set that stores summary statistics). A step ends at a RUN statement or at the start of the next DATA or PROC step.
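As a minimal sketch of this structure, the program below has one DATA step and one PROC step. The data set name work.scores and its variables are made up for illustration.

```sas
/* DATA step: read three in-line records into a temporary data set */
data work.scores;
    input name $ score;
    datalines;
Ann 82
Bob 75
Cara 91
;
run;

/* PROC step: analyze the data set; the report goes to the output destination */
proc means data=work.scores;
    var score;
run;
```

Each step ends with RUN, and the report produced by PROC MEANS is the output of the program.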
4. What are the syntax rules of a SAS program?
The general syntax rules of a SAS program are as follows:
- A statement can begin and end anywhere on a line, and it can continue over several lines. Every statement must end with a semicolon.
- Several statements can appear on the same line, as long as each one ends with a semicolon.
- The words in a statement are separated by spaces.
- A step ends at a RUN statement (or QUIT for some procedures) or when the next DATA or PROC step begins. The last step in a program should end with RUN so that it is executed.
Rules for variable names and data set names
- A variable name can be up to 32 characters long.
- A variable name cannot contain blanks.
- A variable name must start with a letter or an underscore (_).
- A variable name can include digits, but not as its first character.
- Variable names are case insensitive.
- A data set name can be prefixed with a library name (libref.name). If the library is anything other than WORK, the data set is permanent and persists after the session ends.
- A single-word name in the DATA statement refers to a temporary data set in the WORK library, which is erased at the end of the session.
- If the data set name is omitted from the DATA statement, SAS generates a name automatically (DATA1, DATA2, etc.).
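The naming rules above can be sketched with three short DATA steps. All names here (sales, _total2024, Unit_Price) are illustrative only.

```sas
/* Single-word name: stored as WORK.SALES and erased when the session ends */
data sales;
    _total2024 = 100;    /* valid: starts with an underscore, digits come later */
    Unit_Price = 9.5;    /* UNIT_PRICE and unit_price refer to the same variable */
run;

/* Two-level name with WORK: identical to the single-word form above */
data work.sales;
    amount = 1;
run;

/* No name given: SAS generates one (DATA1, DATA2, ...) in the WORK library */
data;
    x = 1;
run;
```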
5. What is PDV?
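The PDV (Program Data Vector) is a logical area of memory that SAS creates when it compiles a DATA step. It holds one observation at a time: SAS builds each observation in the PDV and then writes it to the output data set. Variables enter the PDV from INPUT and assignment statements and from data sets read by the SET, MERGE, MODIFY, or UPDATE statements. When the DATA step reads raw data from an external file, an input buffer is also created during the compilation phase to hold each record before its values are moved into the PDV.
The PDV holds two kinds of variables in every DATA step:
- Permanent: data set variables and computed variables, which are written to the output data set.
- Temporary: automatically generated variables (such as _N_ and _ERROR_) and option-defined variables (such as those created by the IN= or END= options), which are not written to the output data set.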
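One way to look at the contents of the PDV is to write all of its variables to the log with PUT _ALL_. The sketch below uses a made-up data set work.pdv_demo.

```sas
data work.pdv_demo;
    input x y;
    total = x + y;    /* computed variable: permanent, kept in the output data set */
    put _all_;        /* writes every PDV variable, including _N_ and _ERROR_, to the log */
    datalines;
1 2
3 4
;
run;
```

The log shows x, y, and total together with _ERROR_ and _N_ on each iteration, while the data set work.pdv_demo contains only x, y, and total.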
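6. How many data types are there in SAS?
SAS has two data types: character and numeric. Dates are not a separate type and are not stored as characters. A SAS date is a numeric value that counts the number of days since January 1, 1960, and date formats and functions are used to display and work with it.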
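A short sketch that shows the numeric nature of dates, using date constants and the DATE9. format:

```sas
data _null_;
    d = '01JAN1960'd;    /* date constant for the SAS reference date */
    put d=;              /* writes d=0 to the log */
    e = '15MAR2024'd;
    put e=;              /* unformatted: the day count since January 1, 1960 */
    put e= date9.;       /* formatted: writes e=15MAR2024 */
run;
```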
7. What are _N_ and _Error_ in SAS?
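_N_ and _ERROR_ are temporary variables that SAS generates automatically during DATA step processing. _N_ counts the number of times the DATA step has iterated. _ERROR_ is 0 by default and is set to 1 when an error, such as invalid input data, occurs during the current iteration. Neither variable is written to the output data set.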
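As an illustration, the DATA step below reads one deliberately invalid record and uses both automatic variables to report it. The data set name and values are made up.

```sas
data work.check;
    input id amount;
    /* _ERROR_ becomes 1 when the INPUT statement meets invalid data */
    if _error_ then put 'Invalid data at iteration ' _n_;
    datalines;
1 100
2 abc
3 300
;
run;
```

The second record has a non-numeric amount, so SAS sets amount to missing and _ERROR_ to 1 for that iteration, and the message reports iteration 2.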
8. Difference between PROC MEANS and PROC SUMMARY.
- PROC MEANS produces printed output by default, while PROC SUMMARY prints nothing unless the PRINT option is specified.
- By default, PROC MEANS reports the statistics N, mean, standard deviation, minimum, and maximum. PROC SUMMARY is typically used to write results to an output data set, which includes the automatic variables _TYPE_ and _FREQ_ (and _STAT_ when no statistics are named in the OUTPUT statement).
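The difference can be sketched with the SASHELP.CLASS sample data set that ships with SAS. The output data set name work.stats is an assumption for this example.

```sas
/* Prints N, mean, standard deviation, minimum and maximum by default */
proc means data=sashelp.class;
    var height weight;
run;

/* Prints the same report only because PRINT is specified */
proc summary data=sashelp.class print;
    var height weight;
run;

/* No printed report: results go to a data set that contains */
/* _TYPE_, _FREQ_ and _STAT_ alongside the analysis variables */
proc summary data=sashelp.class;
    var height weight;
    output out=work.stats;
run;
```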
9. Differences between One-to-One Merging and Match Merging.
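- One-to-one merging combines observations from multiple data sets into single observations in a new data set according to their position: the first observation of each data set is combined, then the second, and so on. Match-merging combines observations according to the values of one or more common variables.
- One-to-one merging uses the MERGE statement without a BY statement, while match-merging uses a BY statement immediately after the MERGE statement. For a match-merge, the input data sets must be sorted (or indexed) by the BY variables.
- One-to-one merging is suitable when the observations in the data sets are already in the same order and correspond to each other by position. Match-merging is suitable when the observations do not line up by position and must be matched on a key.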
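The following sketch contrasts the two kinds of merge on two small, made-up data sets that share the variable id and are already sorted by it.

```sas
data work.a;
    input id name $;
    datalines;
1 Ann
2 Bob
3 Cara
;
run;

data work.b;
    input id score;
    datalines;
1 82
3 91
4 60
;
run;

/* One-to-one merge: no BY statement, observations are paired by position */
data work.one_to_one;
    merge work.a work.b;
run;

/* Match-merge: observations are paired by the value of id */
data work.matched;
    merge work.a work.b;
    by id;
run;
```

In work.one_to_one, Bob ends up paired with the score 91 and with id 3, because values from the data set listed last overwrite same-named variables. In work.matched, Bob (id 2) has a missing score, and id 4 has a score but a blank name.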
10. How to create a permanent SAS data set?
To create a permanent SAS data set:
- Define a SAS library with the LIBNAME statement, which assigns a libref to a storage location and, optionally, an engine.
- Write the data using a two-level name that gives both the library (anything other than WORK) and the data set name.
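A minimal sketch of these two steps is shown below. The libref mylib and the directory path are hypothetical; the directory must already exist and be writable.

```sas
/* Step 1: assign a libref to a storage location (hypothetical path) */
libname mylib 'C:\sasdata';

/* Step 2: write the data with a two-level name: libref.dataset */
data mylib.class_copy;
    set sashelp.class;    /* sample data set that ships with SAS */
run;
```

The data set class_copy stays in that directory after the session ends and can be read in a later session once the libref is assigned again.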
11. List the error types that SAS recognizes.
SAS recognizes the following types of errors:
- Syntax errors occur when programming statements do not conform to the rules of the SAS language. They are detected at compile time.
- Semantic errors occur when a language element is syntactically correct but is not valid for a particular usage. They are detected at compile time.
- Execution-time errors occur when SAS tries to execute a program and the execution fails. They are detected at execution time.
- Data errors are caused by invalid data values and are detected at execution time.
- Macro-related errors are caused by incorrect use of the macro facility and are detected at macro compile time or at execution time.
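12. What do the INPUT and PUT functions do?
The INPUT function reads a character value with an informat and is typically used for character-to-numeric conversion.
The PUT function writes a value with a format and always returns a character value, so it is typically used for numeric-to-character conversion.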
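A brief sketch of both conversions. The variable names are illustrative, and the 8. informat and format are one reasonable choice among many.

```sas
data _null_;
    char_value   = '1234';
    num_value    = input(char_value, 8.);    /* character to numeric, read with the 8. informat */
    back_to_char = put(num_value, 8.);       /* numeric to character, written with the 8. format */
    doubled      = num_value * 2;            /* arithmetic works because num_value is numeric */
    put num_value= doubled= back_to_char=;
run;
```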
13. What is SAS language?
The SAS language is a programming language used for statistical analysis. It can read data from common spreadsheets and databases and output the results of the analysis as tables and graphs, and as RTF, HTML, and PDF documents.