![]() |
|
SAS Tip
of the Month I recently had a question from Bob in New Jersey asking me to look at whether NODUPKEY always selected the first record in a BY group -- his opinion was that it always did. The short answer is 'NOT REALLY'. However the answer does depend on two other options -- the EQUALS|NOEQUALS option in the SORT procedure, and the global option SORTEQUAL|NOSORTEQUALS (default when shipped from SAS is SORTEQUAL). Lets look at first what the global option SORTEQUAL|NOSORTEQUALS is all about, quoting from the SAS documentation:
It is interesting to note that SAS suggests that in order to save resources, use NOSORTEQUALS when you do not need to maintain a specific order of observations with identical BY variable values. Now lets look at the EQUALS|NOEQUALS option in the SORT procedure:
From the SAS documentation, the NOEQUALS option is useful if you are sorting a large dataset as it can save in CPU but has the disadvantage of producing inconsistent results in your output data sets if you run the code multiple times. Now that we have looked at all the documentation, lets take a look at some data: data ae0;
input ptnum $6. aeseq event $20.;
cards;
001002 1 HEADACHE
001001 1 FEVER
001001 2 HEADACHE
001003 1 NAUSEA
001003 4 DIARRHEOA
001003 2 VOMITING
001004 1 DIARRHEOA
001001 3 DIARRHEOA
001002 2 DIARRHEOA
001004 2 HEADACHE
001003 3 FEVER
;
run;
In addition, to make sure the global option SORTEQUAL|NOSORTEQUALS is set to its usual shipped default the option shall be set to SORTEQUAL: options SORTEQUAL; Now lets look at the following code where we just use NODUPKEY (I have put the EQUALS option for clarity although this is not needed with the setting of the global option SORTEQUAL): proc sort data=ae0 out=ae1 nodupkey equals;
by ptnum;
title1 "Data with NODUPKEY and EQUALS";
proc print data=ae1 noobs;
run;
=== Output ===
Data with NODUPKEY and EQUALS
ptnum aeseq event
001001 1 FEVER
001002 1 HEADACHE
001003 1 NAUSEA
001004 1 DIARRHEOA
In the code above the first record of each "by group" value is output, i.e. the first record in the sequence. Now lets just suppose that the option EQUALS|NOEQUALS was set to NOEQUALS (or the setting of the global option NOSORTEQUALS): proc sort data=ae0 out=ae1 nodupkey noequals;
by ptnum;
title1 "Data with NODUPKEY and NOEQUALS";
proc print data=ae1 noobs;
run;
=== Output ===
Data with NODUPKEY and NOEQUALS
ptnum aeseq event
001001 3 DIARRHEOA
001002 1 HEADACHE
001003 3 FEVER
001004 1 DIARRHEOA
Note that the output shows that it is not necessarily the first record for the PTNUM variable. Inside SAS there are a large number of SAS options that can affect the output that is produced, whether is be from a procedure or datastep. If your program produces unexpected results, after you have gone though looking at the SAS LOG and exhausted other remedies, take a look at the SAS options that are applied. I know this is a very technical tip this month but I hope that it shows that SAS is a very sophisticated piece of software and must be treated with respect. I hope this topic is interesting for some. See you in March. |
|
| ________________________________ Updated February 02, 2010 |
|