![]() |
|
SAS Tip
of the Month I recently had an email from Nancy in New Jersey on how to do a many-to-many merge in SAS without using SQL, so for this month is a discussion on this issue that often occurs when dealing with data. Lets look at some data first -- this is in the form of SAS card statements (you could also use a DATALINES statement, but for this example I am using the older CARDS statement) so you can copy this data an load it into your SAS Program window: *----------------------------------------------------*;
* Lets load some data;
*----------------------------------------------------*;
data ae; ** Adverse Event Data;
infile cards;
input ptnum $ 1-3 @5 date date9. event $ 15-35;
format date date9.;
cards;
001 16NOV2009 Nausea
002 16NOV2009 Heartburn
002 16NOV2009 Acid Indigestion
002 18NOV2009 Nausea
003 17NOV2009 Fever
003 18NOV2009 Fever
005 17NOV2009 Fever
;
run;
data cm; ** Concomitant Medication Data;
infile cards;
input ptnum $ 1-3 @5 date date9. medication $ 15-35;
format date date9.;
cards;
001 16NOV2009 Dopamine
002 16NOV2009 Antacid
002 16NOV2009 Sodium bicarbonate
002 18NOV2009 Dopamine
003 18NOV2009 Asprin
004 19NOV2009 Asprin
005 17NOV2009 Asprin
;
run;
The merge that is going to be done here is a merge by PTNUM (subject number) and DATE. Some notes on the data:
Lets jump ahead to what we are looking for when we merge the two datasets -- note that we are doing an inner join, i.e. where there is no match the observation is excluded: Obs ptnum date event medication
1 001 16NOV2009 Nausea Dopamine
2 002 16NOV2009 Heartburn Antacid
3 002 16NOV2009 Heartburn Sodium bicarbonate
4 002 16NOV2009 Acid Indigestion Antacid
5 002 16NOV2009 Acid Indigestion Sodium bicarbonate
6 002 18NOV2009 Nausea Dopamine
7 003 18NOV2009 Fever Asprin
8 005 17NOV2009 Fever Asprin
When first looking at merging the two datasets, the first thought may be to use a simple MERGE statement as shown below: ** SAS Code;
*----------------------------------------------------*;
* This will not work!;
*----------------------------------------------------*;
data all0;
merge ae cm;
by ptnum date;
run;
title1 "Merge using the MERGE statement -- this fails";
proc print data=all0;
run;
** SAS Output;
Merge using the MERGE statement -- this fails
Obs ptnum date event medication
1 001 16NOV2009 Nausea Dopamine
2 002 16NOV2009 Heartburn Antacid
3 002 16NOV2009 Acid Indigestion Sodium bicarbonate
4 002 18NOV2009 Nausea Dopamine
5 003 17NOV2009 Fever
6 003 18NOV2009 Fever Asprin
7 004 19NOV2009 Asprin
8 005 17NOV2009 Fever Asprin
As can be seen, this clearly does not work, so how can a many to many merge be done successfully? The most common way that this match is done is with an SQL call, as the following code demonstrates: ** SAS Code;
*----------------------------------------------------*;
* SQL, the most common way that it is seen done;
*----------------------------------------------------*;
proc sql;
create table all0 as
select a.*, b.medication
from ae a inner join cm b
on a.ptnum=b.ptnum and
a.date=b.date;
quit;
run;
title1 "Merge using SQL -- most common way this is seen done";
proc print data=all0;
run;
** SAS Output;
Merge using SQL -- most common way this is seen done
Obs ptnum date event medication
1 001 16NOV2009 Nausea Dopamine
2 002 16NOV2009 Heartburn Antacid
3 002 16NOV2009 Heartburn Sodium bicarbonate
4 002 16NOV2009 Acid Indigestion Antacid
5 002 16NOV2009 Acid Indigestion Sodium bicarbonate
6 002 18NOV2009 Nausea Dopamine
7 003 18NOV2009 Fever Asprin
8 005 17NOV2009 Fever Asprin
This works, but it is not what we want -- we do not want to use SQL in the solution. This brings about the use of the POINT option in the SET statement -- take a look at the following code and try it out in your Program window: ** SAS Code;
*----------------------------------------------------*;
* The datastep with POINT option -- not often seen,;
* but gives most control. Key to this is that the;
* datastep takes each observation in AE, and then;
* tries to match this with each observation in;
* CM -- this is basically a loop within a loop!;
*----------------------------------------------------*;
data all1;
set ae;
drop _:; ** Drop temporary variables;
match=0; ** Match flag;
** Our loop within a loop -- output if match;
do i=1 to xnobs;
** Need to rename the "merging" variables within the CM
dataset;
set cm (rename=(ptnum=_ptnum date=_date)) nobs=xnobs point=i;
** Have to rename matching variables so that they do not overwrite
the original values in AE;
if ptnum=_ptnum and date=_date then do;
match=1; ** Yes, there is a match my the "merging" variables;
output;
end;
end;
run;
title1 "Merge using using the POINT option in a SET statement";
proc print data=all1;
run;
** SAS Output;
Merge using using the POINT option in a SET statement
Obs ptnum date event match medication
1 001 16NOV2009 Nausea 1 Dopamine
2 002 16NOV2009 Heartburn 1 Antacid
3 002 16NOV2009 Heartburn 1 Sodium bicarbonate
4 002 16NOV2009 Acid Indigestion 1 Antacid
5 002 16NOV2009 Acid Indigestion 1 Sodium bicarbonate
6 002 18NOV2009 Nausea 1 Dopamine
7 003 18NOV2009 Fever 1 Asprin
8 005 17NOV2009 Fever 1 Asprin
This is a lot of code. An important note here is that I have had to rename the matching variables in CM so that they do not overwrite the original values in AE (very important) but use the DROP statment to get rid of these when the dataset ALL1 is created. Note also that I have printed out the variable MATCH so that it is easy to see where the match is. Yippie, the output is what we have expected AND the we have not used SQL but instead used a datastep.
There are a couple of other ways to do this merge without SQL, notably a variation on the "Loading Unique Dataset into an Array" which I presented first at NESUG in 2008, which can be seen here, but the technique shown above is a shorter method with similar control (however the use of the array will do a many to many merge with the equivilant SQL OUTER JOIN). Personally I prefer the datastep approach given above since it has the most control. Hope this helps. Have a very happy and safe holiday season. See you all next year. |
|
| ________________________________ Updated December 1, 2009 |
|