QC’ing ID and time variables

When a data file is delivered in CVS or in Excel format and when it is converted into a software specific data table (e.g., SAS data), it is very important to check if ID and time variables are accurately converted.  If data are delivered to you regularly and periodically, check if SAS (or other software programs) is consistently reading the variables accurately.

The in-coming data may look the same, but when SAS reads them, the format definition of some variables may be different.

Check if all digits in numeric IDs are read.  If 2142 is read as 214 by mistake, this creates duplicative IDs as 2142 and 2143, for example, will be both 214.  This happens because when SAS is reading a CSV or Excel file, it is misreading the number of digits that a column includes.  You may need to create a fake row at the top row to force SAS to read the row correctly (e.g., enter 999999, so SAS will have ample space to read the number-based values).

Check if time/calendar vairiables are read correctly every time.  Again EVERYTIME the data file is delivered, this has to be checked.

This is an example of an error:

06/15/2015  (CVS) IS READ AS 10MAR2002 (SAS).

When merging datasets, be sure the same variable names are not used

Whe merging data A and B and if A and B has variables that happen to have the same variable name, one of them will be deleted. For example, if A has “GPA” meaning individual-level GPA and if B has GPA meaning school-level GPA, one of them will be deleted after the merge.

data A;
gpa=1.2;
run;

data B;
gpa=999;
run;

data both;
merge A B;
run;

proc print;
run;

RESULT:
Obs gpa
1 999