This web page contains ERDC-developed data cleaning, linking and other helpful macros available for public use.
ERDC has developed several Statistical Analysis System (SAS) macros to assist us in the cleaning and linking of education data. Since other states and other entities might find them useful, these SAS macros are available to the public.
-
The software is free.
-
ERDC releases all rights to these macros. We place all ERDC macros listed on this page into the public domain. If you use these macros, please cite ERDC as the original contributor of the material and include a link back to our site.
-
If you experience issues using these macros, please email erdc@ofm.wa.gov.
General Macros
Macro | Description | Version | Updated |
---|---|---|---|
CreateResearchlds.sas | Creates a research identifier based on, but completely masks, some other confidential identifier such as SSN, Student_Id, or Person_Id. | 1.0 | Oct. 14, 2011 |
ProcFreqCubed.sas | Analyzes a SAS dataset by running PROC FREQ against every column in the dataset. This is a very useful macro for characterizing all the variables in a dataset, especially “wide” datasets where it might be onerous to manually run PROC FREQ against each variable. | 1.4 | May 20, 2015 |
SQUEEZE.sas | Developed by Ross Bettinger, with slight modifications by ERDC. A useful macro for optimizing space required to store variables in a SAS dataset. | ||
GetLibraryMetadata.sas | Creates a report in the form of a data set that contains user-selected metadata types for each data set in a library. Example metadata types include the record count (RecordCount), the creation date (crdate), the number of columns (nvar) and the column names (ColumnNames). Possible metadata types include all those available in SAS's Dictionary Table. | 1.13 | June 23, 2014 |
CopyDataSetsToWorkbook.sas | Copies all the data sets in a SAS library to an Excel workbook. Which data sets that are copied can be filtered using a regular expression filter. | 0.91 | April 7, 2017 |
Data Cleaning Macros
Macro | Description | Version | Updated |
---|---|---|---|
DeleteSuperfluousRecords.sas | Deletes superfluous records from a data set. A superfluous record here means a record that contains information that is a strict subset of another record. An example is with name data, where one record has a first name, last name and a complete middle name, and other records have the same first and last names, but they have only the middle initial or no initial at all. In this case, the record with the complete middle name has more complete information than the other records, so this macro will delete the other records. | 0.941 | April 1, 2022 |
A Python implementation of DeleteSupefluousRecords.sas. The second file contains Pytest unit tests for delete_superfluous_records.py. These will download as .txt files and must be converted to .py. | 0.1 | April 11, 2022 | |
ConvertFlags.sas | Converts all flags in all character fields in a dataset from one defined set of two values to another defined set of two values. Can be used, for example, to convert all Y/N flags to 1/0 flags, or vise versa. | Dec. 21, 2011 | |
DenullifyDataset.sas | Sets to empty all text fields in a dataset are equal to a given value. For example, this macro can be used to set all instances of "NULL" to the empty string. | 1.0 | May 8, 2015 |
StandardizeNames.sas | Applies a set of business rules to standardize the first, middle and last names in a dataset. For example, the macro converts all names to upper case, strips out generational suffixes such as JR or III from the name fields and places these in a “Suffix” field, etc. | 1.3 | Nov. 7, 2014 |
StripDownAllCharVariables.sas | Applies the SAS function strip() to all variables in each character field in a dataset. This results in all leading and trailing spaces being removed from all character variables. | 1.01 | June 23, 2014 |
RemoveUnusedColumns.sas | Analyzed an inputted data set for columns that have no (null) data. It then outputs a second data set in which the columns without data have been removed. By using the optional “AlwaysKeep” parameter, certain columns can always be kept, even when they do not contain data. | 1.01 | April 7, 2017 |
StackAllDataSets.sas | Stacks (appends) all data sets in a library into a single data set. This macro determines the maximum length of each variable. These lengths are used in each variables definition in the outputted data set. As a result, the macro does not truncate data. If the data sets attempting to be stacked have variables with different type definitions (char/numeric), the macro will instead generate a data set report delineating each variable's type incompatibilities. | 0.97 | Sep. 18, 2017 |
RenameVariablesInLibref.sas | For all data sets in a libref, renames all variables that have old name/new name mappings in a renaming Excel worksheet. You can use this macro to standardize variable names across multiple data sets. For example, some of your data sets could have a “LastName” variable, and others a “PersonLastName” field. With the concordant mappings in the pertinent renaming worksheet, you could standardize this field to be something else, such as “LAST_NAME”. Optionally, the renamings can be based on variable labels. This macro will also work with DBMS systems such as SQL Server. | 0.93 | June 3, 2015 |
RenameVariablesInDataSet.sas | Similar to RenameVariablesInLibref.sas, but operates on a single data set. | 0.92 | June 3, 2015 |
General Linking Macros
Macro | Description | Version | Updated |
---|---|---|---|
DivineDisparateDedupLinks.sas | Compares two methods that were used to deduplicate (consolidate) either the same data set, or two data sets that share an overlapping key. Outputs a data set of instances where the two methods did not identically deduplicate the same keys. | 1.0 | Dec. 24, 2013 |
ExamineDedupKeyValue.sas | For a single record, displays how two different methods for de-duplication linked a single record to other records. Used in conjunction with DivineDisparateDeDupLinks.sas | Oct. 6, 2011 | |
LinkViaNameChange.sas | A set of macros that uses external name change data to match two sets of person ids that contain different names. The external name change data needs to have been arranged into old name/new name pairings. Examples of name change data sources include marriage data, divorce data and court name change data. Though there is some example test data at the bottom of this file, non-test external name change data is not provided. However, any given state will typically have either a Vital Records Office or a Center for Health Statistics from which record level marriage and divorce data could be obtained. | 0.91 | Feb. 20, 2013 |
CreateCrosslinkID.sas |
This program resolves many-to-many relationships between two record linkage identifiers into a single identifier, the “CrosslinkID.” These many-to-many relationships can occur under certain conditions even when using software that performs one-to-many record linkages.
Do not use CreateCrosslinkID.sas against a single data source! If you want to create a unique identifier, such as a person ID, for a single data source, use %CreateConsolidationID() instead. %CreateConsolidationID() is contained in CrosslinkID_ConsolidationID_Suite.sas below. |
1.01 | April 7, 2017 |
CrosslinkID_ConsolidationID_Suite.sas |
This source file contains four macros that can be used to deduplicate and consolidated record linkages:
An example of when you would use %CreateConsolidationID() if you had used Link Plus to deduplicate a file, and you needed to merge two or more of the resulting manual review files.
|
0.9 |
May 28, 2014 |
Link Plus Related Macros
Link Plus is a free probabilistic record linkage program developed at CDC's Division of Cancer Prevention and Control. It is a general-purpose program that can be used for deduplicating and linking any data set. These SAS macros serve to enhance the capabilities of Link Plus.
Macro | Description | Version | Updated |
---|---|---|---|
LinkPlusToolkit.sas | Combines all the Link Plus program files listed below into a single program file. Running this program will compile every Link Plus macro into SAS. | Oct. 9, 2012 | |
ReadInLinkPlusReport.sas | Reads a Link Plus external linkage report into SAS. | Jan. 31, 2012 | |
CreateLinkPlusFunctions.sas | Using PROC FCMP, creates a helper function for use by ReadInLinkPlusReport.sas. | Jan. 31, 2012 | |
ModifyLinkPlusReport.sas | Modifies a Link Plus external linkage report that was brought into SAS by using ReadInLinkPlusReport.sas. Modifies the “score” variable so that each “class” value has a non-overlapping range of scores. This facilitates clerical review by allowing you to accept and reject matches en masse by score and class. This program can be modified to add additional classes beyond the default 15 classes that Link Plus provides. | Jan. 31, 2012 | |
AssembleLinkPlusReport.sas | Creates a Link Plus external linkage report from the constituent parts produced by running ReadInLinkPlusReport.sas and modified by ModifyLinkPlusReport.sas. This modified report can then be opened using the manual review function of Link Plus. | Jan. 31, 2012 | |
MergeLinkPlusLinkageFiles.sas | Combines all the data file to data file Link Plus export files in a single directory. | Feb. 27, 2012 |