/*------------------------------------------------------------------------------ File: StackAllDataSets.sas Author: John Sabel john.sabel@ofm.wa.gov Washington State Education Research & Data Center Creation date: 2014-07-07 Version: 0.96 Purpose Stacks (appends) all data sets in a library into a single data set. This macro determines the maximum length for each variable. These lengths are used in each variable's definition in the outputted data set. As a result, the macro does not truncate data. If the data sets attempting to be stacked have variables with different type definitions (char/numeric), the macro will instead generate a data set report delineating each variable's type incompatibilities. The macro includes a set of filters to limit which data sets and/or which variables are stacked. Parameters LibnameIn: Name of library containing data sets to be merged. DsnOut: The name of the outputted data set. RegExDsnFilter_1: (Optional) A Perl Regular Expression used to filter the data sets to merge in the "LibnameIn" library. The default is /./, which acts as no filter at all, so by default all data sets in a library will be merged. When you do use this regular expression filter consider using the "I" option in order to make the filter case insensitive. SourceColumn: (Optional) When set to Y, creates the "SourceDataset" variable in DsnOut. This variable contains the original source data set name for each record. Default is Y. OnlyTenRecords: (Optional) When set to Y (actually to not N), will only stack ten records from each data set. Allows you to get a sense of what the macro will do before you try to stack numerous large complete data sets together. Default is N. RegExColFilter_1: (Optional) A Perl regular expression used to filter which variables (columns) get stacked. The default, /./, is no filter at all (unfiltered). Debug: (Optional) When set to Y, intermediate data sets are not deleted. Also, this macro outputs to the Results Viewer the variable names, types lengths as well as data set names. Note If one or more variable name has different types (i.e. num and char) in different data sets that are trying to be stacked, this macro will instead create the data set "SchzoidVariables" which will provide information concerning the type incompatilities. If this happens, you will need to resolve the type incompatibilites before trying to run this macro again. Signature %StackAllDataSets(LibnameIn=, DsnOut=, RegExDsnFilter_1=/./I, RegExColFilter_1=/./I, SourceColumn=Y, OnlyTenRecords=N, Debug=N) Revisions: Date Version Author Comments 2014-08-20 0.91 John Added "Source" to DsnOut and added input parameter "SourceColumn". 2014-08-21 0.92 John Removed DISTINCT keyword from final DsnOut SQL statement. Also, deleting temp table error. Also, improved the reordering of DsnOut variables. 2014-08-26 0.93 John Redid sort of SchiziodVariables to improve table presentation. Also, removed UPCASE statements from SQL where clauses involving LIBNAMEs and MEMNAMEs. 2014-09-16 0.94 John Fixed lengths of DsnOut variables bug. Added parameter "OnlyTenRecords". Added parameter check for LibnameIn. 2014-11-07 0.95 John For all "MEMNAME NOT IN" clauses, added adddtional temporary tables so that macro can disregard any temp tables that were generated from previous runs where Debug parameter was set to Y. Renamed DsnOut's "Source" variable to "SourceDataset" because sometimes source data might already already have a pre-existing "Source" column. 2014-12-31 0.96 John Added code to check if given the input parameters there were any columns that could be stacked. If not, the macro terminates. Also, added code to add labels, formats and informats to DsnOut for the variables in the data sets being stacked. ------------------------------------------------------------------------------*/ %macro StackAllDataSets(LibnameIn=, DsnOut=, RegExDsnFilter_1=/./I, RegExColFilter_1=/./I, SourceColumn=Y, OnlyTenRecords=N, Debug=N); %put --- Start of %upcase(&sysmacroname) macro; %local NumVars; %local NumDataSets; %local err or; %local LibnameExists; %let err=ERR; %let or=OR; %if &Debug.~=N %then %do; %put _local_; %end; /* Helping to make repeated macro calls stateless...*/ %if %sysfunc(exist(&DsnOut.)) %then %do; proc sql; DROP TABLE &DsnOut.; quit; %end; proc sql noprint; SELECT DISTINCT LIBNAME INTO :Foo FROM DICTIONARY.TABLES WHERE LIBNAME = "%upcase(&LibnameIn.)" ; %let LibnameExists=&sqlobs.; quit; %if &LibnameExists.=0 %then %do; %put &ERR.&OR.: LibnameIn &LibnameIn. does not exist. Macro will now terminate.; %goto exit; %end; proc sql; CREATE TABLE SchizoidVariables AS SELECT DISTINCT COLUMNS.Name, COLUMNS.MEMNAME, COLUMNS.TYPE, COLUMNS.LENGTH, COLUMNS.VARNUM FROM DICTIONARY.COLUMNS INNER JOIN ( SELECT Name, COUNT(Type) AS TypeCount FROM ( SELECT DISTINCT Name, Type FROM DICTIONARY.COLUMNS WHERE LIBNAME = "%upcase(&LibnameIn.)" AND MEMNAME NOT IN ( '__STACKALL_VARS_01', '__STACKALL_VARS_01A', '__STACKALL_VARS_01B', '__STACKALL_VARS_02', '__STACKALL_DSNOUT', '__STACKALL_TEMP', 'SCHIZOIDVARIABLES') AND prxmatch("&RegExDsnFilter_1", MEMNAME) > 0 AND prxmatch("&RegExColFilter_1", NAME) > 0 ) TypeCheck GROUP BY Name HAVING TypeCount > 1 ) MultiTypes ON COLUMNS.Name = MultiTypes.Name WHERE COLUMNS.LIBNAME = "%upcase(&LibnameIn.)" AND MEMNAME NOT IN ( '__STACKALL_VARS_01', '__STACKALL_VARS_01A', '__STACKALL_VARS_01B', '__STACKALL_VARS_02', '__STACKALL_DSNOUT', '__STACKALL_TEMP', 'SCHIZOIDVARIABLES') AND prxmatch("&RegExDsnFilter_1", COLUMNS.MEMNAME) > 0 AND prxmatch("&RegExColFilter_1", COLUMNS.NAME) > 0 ORDER BY Name,MEMNAME ; quit; %if &sqlobs.=0 %then %do; proc sql; DROP TABLE SchizoidVariables; quit; %end; %if &sqlobs.>0 %then %do; %put &err&or: (&sysmacroname.) One or more variables are defined as both numeric and character in data sets that macro was attempting to stack.; %put &err&or: (&sysmacroname.) See SchizoidVariables data set for more information.; %goto exit; %end; /* Fetch a list of all variables and other information for all data sets that are to be stacked. */ proc sql; CREATE TABLE __STACKALL_VARS_01 AS SELECT DISTINCT Memname, Name, Length, VarNum, Type, label, format, informat FROM DICTIONARY.COLUMNS WHERE LIBNAME = "%upcase(&LibnameIn.)" AND MEMNAME NOT IN ('__STACKALL_VARS_01', '__STACKALL_VARS_01A', '__STACKALL_VARS_01B', '__STACKALL_VARS_02', '__STACKALL_DSNOUT', '__STACKALL_TEMP', 'SCHIZOIDVARIABLES') AND prxmatch("&RegExDsnFilter_1", MEMNAME) > 0 AND prxmatch("&RegExColFilter_1", NAME) > 0 ORDER BY Name, memname; ; quit; %if &sqlobs=0 %then %do; %put WARN%str(ING): (&sysmacroname.) There are no columns to stack. Macro will now terminate.; %goto exit; %end; /* The data set __STACKALL_VARS_01 data set was sorted by NAME and MEMNAME. So first.Name outputs the first occurance of a NAME and associated data within a list of alphabetically sorted MEMNAMES. The reason for this work is so that in the final outputted data set, the order of the variables will follow the order of the first data set (MEMNAME). Any additional variables from the second data set that aren't in the first data set will be tacked on the end, and so on through all the data sets. */ data __STACKALL_VARS_01a; set __STACKALL_VARS_01; by Name; if first.Name then output; run; /* Back to __STACKALL_VARS_01 to determine the maximum length of each NAME (variable). This will be used to help define the schema of DsnOut so that no data will be truncated. */ proc sql; CREATE TABLE __STACKALL_VARS_01b AS SELECT Name, Max(Length) AS Length FROM __STACKALL_VARS_01 GROUP BY Name ; quit; proc sql; CREATE TABLE __STACKALL_VARS_02 AS SELECT DISTINCT ColumnMetadata.Memname, ColumnMetadata.Name, LengthMetadata.Length, ColumnMetadata.varnum, ColumnMetadata.type, ColumnMetadata.label, ColumnMetadata.format, ColumnMetadata.informat FROM __STACKALL_VARS_01a ColumnMetadata INNER JOIN __STACKALL_VARS_01b LengthMetadata ON ColumnMetadata.Name = LengthMetadata.Name ; quit; /* For when Debug=Y, and you want to examine this table. */ proc sort data=__STACKALL_VARS_02; by Memname VarNum; run; proc sql %if &Debug.=N %then %do; noprint; %end; %else %do; ; title "Variable Names, Types and Maximum Lengths"; title2 "Library: %upcase(&LibnameIn.), RegEx Table Filter: &RegExDsnFilter_1., , RegEx Column Filter: &RegExColFilter_1."; %end; SELECT Name, Type, VarNum, Length AS MaximumVariableLength, label, format, informat INTO :Names1-:Names999, :Types1-:Types999, :VarNums1-:VarNums999, :Lengths1-:Lengths999, :Labels1-:Labels999, :Formats1-:Formats999, :Informats1-:Informats999 FROM __STACKALL_VARS_02 ORDER BY Memname, VarNum; ; %Let NumVars=&sqlobs.; quit; /* Create the lis of data sets to be stacked, and the number of data sets to be stacked. */ proc sql %if &Debug.=N %then %do; noprint; %end; %else %do; ; title "Data Set Names"; title2 "Library: %upcase(&LibnameIn.), RegEx Table Filter: &RegExDsnFilter_1., , RegEx Column Filter: &RegExColFilter_1."; %end; SELECT DISTINCT MEMNAME INTO :MemNames1-:MemNames999 FROM Dictionary.COLUMNS WHERE LIBNAME = "%upcase(&LibnameIn.)" AND MEMTYPE = 'DATA' AND MEMNAME NOT IN ('__STACKALL_VARS_01', '__STACKALL_VARS_01A', '__STACKALL_VARS_01B', '__STACKALL_VARS_02', '__STACKALL_DSNOUT', '__STACKALL_TEMP', 'SCHIZOIDVARIABLES') AND prxmatch("&RegExDsnFilter_1", MEMNAME) > 0 AND prxmatch("&RegExColFilter_1", NAME) > 0 ; %let NumDataSets=&sqlobs.; quit; proc sql noprint; CREATE TABLE __STACKALL_DSNOUT ( SourceDataset char(32), %do I = 1 %to &NumVars.; &&Names&I %if &&Types&I=num %then NUM(&&Lengths&I); %else CHAR(&&Lengths&I); LABEL = "&&Labels&I" %if &&Formats&I ~= %then FORMAT=&&Formats&I; %if &&Informats&I ~= %then INFORMAT=&&Informats&I; %if &I.~=&NumVars. %then ,; %end; ) ; quit; %do I = 1 %to &NumDataSets.; %if &OnlyTenRecords.=N %then %do; proc append base=__STACKALL_DSNOUT data=&LibnameIn..&&MemNames&I force; run; %end; %else %do; data __STACKALL_TEMP; set &LibnameIn..&&MemNames&I (obs=10); run; proc append base=__STACKALL_DSNOUT data=__STACKALL_TEMP force; run; %end; proc sql; UPDATE __STACKALL_DSNOUT SET SourceDataset="&&MemNames&I.." WHERE SourceDataset IS NULL ; quit; %end; proc sql; CREATE TABLE &DsnOut. AS SELECT * FROM __STACKALL_DSNOUT %if &SourceColumn.~=Y %then (drop=SourceDataset);; quit; %if &Debug.=N %then %do; proc sql; DROP TABLE __STACKALL_VARS_01, __STACKALL_VARS_01A, __STACKALL_VARS_01B, __STACKALL_VARS_02, __STACKALL_DSNOUT; quit; %if &OnlyTenRecords.=Y %then %do; proc sql; DROP TABLE __STACKALL_TEMP; quit; %end; %end; %exit: %if &Debug.~=N %then %do; %put _user_; %end; title ""; title2 ""; %put --- End of %upcase(&sysmacroname) macro; %mend StackAllDataSets; /* data NameTestData1; input @1 FirstName $10. @11 LastName $10. @21 BirthDate mmddyy10.; format BirthDate yymmddd10.; informat BirthDate mmddyy10.; cards; Jason Long 10/21/1950 Audrey Flytrap 12/25/2014 ; run; data NameTestData2; input FirstName $1-10 LastName $12-21 ShoeSize 24-25; cards; Johnjacob Longfellow 11 Dominique Luvidillawa 7 ; run; data PerhapsUnrelatedData; input RPM MPH; cards; 1298 62 4890 145 ; run; * This will merge every data set in the work directory. Since PerhapsUnrelatedData has different variables compared to NameTestData1 and NameTestData2, the ordering of the variables will be odd.; %StackAllDataSets(LibnameIn=work, DsnOut=TestData_Out1, RegExDsnFilter_1=/^(Name|Perhaps)/I, SourceColumn=Y, Debug=N) data BadName; input FirstName; cards; 1 2 3 ; run; * Data set "NameBad defines FirstName as numeric. So rerunning %StackAllDataSets() will cause this a collision of type definitions, resulting in the creation of the SchizoidVariables data set report.; %StackAllDataSets(LibnameIn=work, DsnOut=TestData_Out1, SourceColumn=Y, Debug=N) * This will merge only those data sets whose names begin with "Name". Note that the FirstName and LastName variables in NameTestData1 only have a length of 8, while the same variables in NameTestData2 have a length of 10. The resulting outputted data set uses the larger of the two lengths, 10.; %StackAllDataSets(LibnameIn=work, DsnOut=TestData_Out2, RegExDsnFilter_1=/^Name/I, Debug=N) */