SAS Macros

This web page contains ERDC-developed data cleaning, linking and other helpful macros available for public use.

ERDC has developed several Statistical Analysis System (SAS) macros to assist us in the cleaning and linking of education data. Since other states and other entities might find them useful, these SAS macros are available to the public.  

  • The software is free.  

  • ERDC releases all rights to these macros. We place all ERDC macros listed on this page into the public domain. If you use these macros, please cite ERDC as the original contributor of the material and include a link back to our site. 

  • If you experience issues using these macros, please email erdc@ofm.wa.gov

General Macros 

Macro Description Version Updated
CreateResearchlds.sas   Creates a research identifier based on, but completely masks, some other confidential identifier such as SSN, Student_Id, or Person_Id. 1.0  Oct. 14, 2011 
ProcFreqCubed.sas   Analyzes a SAS dataset by running PROC FREQ against every column in the dataset. This is a very useful macro for characterizing all the variables in a dataset, especially “wide” datasets where it might be onerous to manually run PROC FREQ against each variable.  1.4 May 20, 2015
SQUEEZE.sas   Developed by Ross Bettinger, with slight modifications by ERDC. A useful macro for optimizing space required to store variables in a SAS dataset.    
GetLibraryMetadata.sas  Creates a report in the form of a data set that contains user-selected metadata types for each data set in a library. Example metadata types include the record count (RecordCount), the creation date (crdate), the number of columns (nvar) and the column names (ColumnNames). Possible metadata types include all those available in SAS's Dictionary Table.  1.13 June 23, 2014 
CopyDataSetsToWorkbook.sas  Copies all the data sets in a SAS library to an Excel workbook. Which data sets that are copied can be filtered using a regular expression filter.  0.91 April 7, 2017 

 

Data Cleaning Macros 

Macro Description Version Updated
DeleteSuperfluousRecords.sas   Deletes superfluous records from a data set. A superfluous record here means a record that contains information that is a strict subset of another record. An example is with name data, where one record has a first name, last name and a complete middle name, and other records have the same first and last names, but they have only the middle initial or no initial at all. In this case, the record with the complete middle name has more complete information than the other records, so this macro will delete the other records.  0.941  April 1, 2022 

DeleteSuperfluousRecords.py 

DeleteSuperflousRecordsTest.py

A Python implementation of DeleteSupefluousRecords.sas. The second file contains Pytest unit tests for delete_superfluous_records.py. These will download as .txt files and must be converted to .py. 0.1  April 11, 2022 
ConvertFlags.sas Converts all flags in all character fields in a dataset from one defined set of two values to another defined set of two values. Can be used, for example, to convert all Y/N flags to 1/0 flags, or vise versa.    Dec. 21, 2011
DenullifyDataset.sas  Sets to empty all text fields in a dataset are equal to a given value. For example, this macro can be used to set all instances of "NULL" to the empty string.  1.0 May 8, 2015 
StandardizeNames.sas  Applies a set of business rules to standardize the first, middle and last names in a dataset. For example, the macro converts all names to upper case, strips out generational suffixes such as JR or III from the name fields and places these in a “Suffix” field, etc. 1.3  Nov. 7, 2014 
StripDownAllCharVariables.sas  Applies the SAS function strip() to all variables in each character field in a dataset. This results in all leading and trailing spaces being removed from all character variables.  1.01 June 23, 2014 
RemoveUnusedColumns.sas  Analyzed an inputted data set for columns that have no (null) data. It then outputs a second data set in which the columns without data have been removed. By using the optional “AlwaysKeep” parameter, certain columns can always be kept, even when they do not contain data.  1.01  April 7, 2017 
StackAllDataSets.sas  Stacks (appends) all data sets in a library into a single data set. This macro determines the maximum length of each variable. These lengths are used in each variables definition in the outputted data set. As a result, the macro does not truncate data. If the data sets attempting to be stacked have variables with different type definitions (char/numeric), the macro will instead generate a data set report delineating each variable's type incompatibilities.  0.97  Sep. 18, 2017 
RenameVariablesInLibref.sas For all data sets in a libref, renames all variables that have old name/new name mappings in a renaming Excel worksheet. You can use this macro to standardize variable names across multiple data sets. For example, some of your data sets could have a “LastName” variable, and others a “PersonLastName” field. With the concordant mappings in the pertinent renaming worksheet, you could standardize this field to be something else, such as “LAST_NAME”. Optionally, the renamings can be based on variable labels. This macro will also work with DBMS systems such as SQL Server.  0.93  June 3, 2015
RenameVariablesInDataSet.sas  Similar to RenameVariablesInLibref.sas, but operates on a single data set.  0.92  June 3, 2015 

 

General Linking Macros 

Macro Description Version Updated
DivineDisparateDedupLinks.sas Compares two methods that were used to deduplicate (consolidate) either the same data set, or two data sets that share an overlapping key. Outputs a data set of instances where the two methods did not identically deduplicate the same keys.  1.0 Dec. 24, 2013 
ExamineDedupKeyValue.sas  For a single record, displays how two different methods for de-duplication linked a single record to other records. Used in conjunction with DivineDisparateDeDupLinks.sas    Oct. 6, 2011 
LinkViaNameChange.sas  A set of macros that uses external name change data to match two sets of person ids that contain different names. The external name change data needs to have been arranged into old name/new name pairings. Examples of name change data sources include marriage data, divorce data and court name change data. Though there is some example test data at the bottom of this file, non-test external name change data is not provided. However, any given state will typically have either a Vital Records Office or a Center for Health Statistics from which record level marriage and divorce data could be obtained.  0.91  Feb. 20, 2013 
CreateCrosslinkID.sas 

This program resolves many-to-many relationships between two record linkage identifiers into a single identifier, the “CrosslinkID.” These many-to-many relationships can occur under certain conditions even when using software that performs one-to-many record linkages. 
Say you have two files that you have linked together through an identity matching/record linkage process. This process creates a data set containing two columns, where IDs from the first data set are linked to IDs in the second data set. If there is a one-to-one correspondence between the two sets of IDs, then you could use the file as is. Similarly, if there was a one-to-many or many-to-one correspondence, you could again use the data set as is. But if there is a many-many correspondence between the two IDs fields, then you will need to extract a "CrosslinkID" using this macro that resolves the many-to-many relationships. The result is a data set containing three columns: 

  1. CrosslinkID 

  1. The ID from the first data set 

  1. The ID from the second data set 

Do not use CreateCrosslinkID.sas against a single data source! If you want to create a unique identifier, such as a person ID, for a single data source, use %CreateConsolidationID() instead. %CreateConsolidationID() is contained in CrosslinkID_ConsolidationID_Suite.sas below. 

1.01 April 7, 2017 
CrosslinkID_ConsolidationID_Suite.sas 

This source file contains four macros that can be used to deduplicate and consolidated record linkages: 

  1. %CreateCrosslinkID() – See description for CreateCrosslinkID.sas above. 

  1. %CombineCrosslinkIDs() – Combines the linking variables between two sets of overlapping or non-overlapping data sets, and creates a single CrosslinkID 

  1. %CreateConsolidationID() – Whereas %CreateCrosslinkID()will resolve many-to-many relationships between two data sets, %CreateConsolidationID() resolves many-to-many linking variable relationships in a single partly deduplicated data set. 
    Say you have partly deduplicated a data set through an identity matching/record linkage process. The result is a series of one-to-many relationships between the deduplication ID and the ID (for example the record number) of the data set that was deduplicated. But if these one-to-many relationships overlap, this would mean more than one deduplication ID is linked to a single ID (record number). This then would result in a many-to-many relationship, in which case you will need to extract out the "ConsolidationID" using this macro. The result is a data set containing two columns: 
     

  1. ConsolidationID, 

  1. ID ((often the record number of the data set that was deduplicated). 

An example of when you would use %CreateConsolidationID() if you had used Link Plus to deduplicate a file, and you needed to merge two or more of the resulting manual review files. 

  1. %CombineConsolidationID() – Combines the linking variables between two overlapping or non-overlapping data sets, and creates a single ConsolidationID. 
    In addition to the four macros listed above, the program listing also contains a help macro, %_CreateCrosslinkID_RecurseFunc(). To gain further insight into what each macro does, please see sample data and macro calls below each macro listing in the source file. 

0.9

May 28, 2014

 

Link Plus Related Macros

Link Plus is a free probabilistic record linkage program developed at CDC's Division of Cancer Prevention and Control. It is a general-purpose program that can be used for deduplicating and linking any data set. These SAS macros serve to enhance the capabilities of Link Plus. 

Macro Description Version Updated
LinkPlusToolkit.sas  Combines all the Link Plus program files listed below into a single program file. Running this program will compile every Link Plus macro into SAS.    Oct. 9, 2012 
ReadInLinkPlusReport.sas   Reads a Link Plus external linkage report into SAS.    Jan. 31, 2012 
CreateLinkPlusFunctions.sas  Using PROC FCMP, creates a helper function for use by ReadInLinkPlusReport.sas   Jan. 31, 2012 
ModifyLinkPlusReport.sas  Modifies a Link Plus external linkage report that was brought into SAS by using ReadInLinkPlusReport.sas. Modifies the “score” variable so that each “class” value has a non-overlapping range of scores. This facilitates clerical review by allowing you to accept and reject matches en masse by score and class. This program can be modified to add additional classes beyond the default 15 classes that Link Plus provides.    Jan. 31, 2012 
AssembleLinkPlusReport.sas  Creates a Link Plus external linkage report from the constituent parts produced by running ReadInLinkPlusReport.sas and modified by ModifyLinkPlusReport.sas. This modified report can then be opened using the manual review function of Link Plus.    Jan. 31, 2012 
MergeLinkPlusLinkageFiles.sas  Combines all the data file to data file Link Plus export files in a single directory.    Feb. 27, 2012