![]() |
|
SAS Tip
of the Month This month I am going to talk about dataset variable labels, and why they are useful. There are many parts to a variable, key among them are the name of the variable (NAME), type (TYPE), and length (LENGTH). Under SAS version 8 and above the name of a variable can be 32 characters long, its type can be either Character or Numeric, and the length for a character variable can be set between 1 and 32767 characters while a numeric variable can be set between 3 and 8 (yes, some operating systems allow for 2 but not all). There are other parts to a variable, but the three most common are the informat, format and label -- we shall look at this last part this month. What exactly is a label and why is it used? In simple terms the label is a short descriptive text giving a more user friendly description of the variable. Take for example a very simple example, the variable WEIGHT in a dataset -- sure it tells me what the variable is (it is weight) but it does not tell me other things I may need to know like what unit it is in or when the weight data point was collected, all of this being very important information. Given that I am allowed 32 characters for a variable name I could change the variable name to reflect this information, but in most cases it is not possible to do this and it indeed impractical. So here comes the use of the LABEL statement. The label itself can be 256 characters, including blanks, in length (40 characters if you are still using SAS version 6.xx) -- this gives us plenty of room to write a good description of the variable. Setting the label is commonly achieved by using the LABEL statement inside a datastep or the LABEL option under the MODIFY statement in the DATASETS procedure, the syntax of which is given below: *** Inside a datastep;
data class;
set class;
label weight='Weight (kg) at Start of Study';
run;
*** Inside the DATASETS procedure;
proc datasets library=work;
modify class;
label weight='Weight (kg) at Start of Study';
quit;
run;
I personally use the datastep method if I am creating or modifying a variable inside that datastep, otherwise I use the DATASETS procedure, but is only my convention -- there is no set rule with this. Before going on to a real world example, lets first see how we would look at what labels are set, if any, inside a dataset. The two easiest ways to look at the data is to either run a CONTENTS procedure call on the dataset, or if you have the SAS Viewer installed, look at the attributes window for that dataset. Now lets look at a real world example where it will all come clearer. For this example I will use the dataset SASHELP.CLASS. Lets look first at the structure of the dataset using the CONTENTS procedure (will actually make a copy first so I don't overwrite the original data): data class; *** Make a copy of the dataset;
set sashelp.class;
run;
proc contents data=class; *** Get structure of the dataset;
run;
Running this code we get the following output (abridged): Alphabetic List of Variables and Attributes # Variable Type Len 3 Age Num 8 4 Height Num 8 1 Name Char 8 2 Sex Char 1 5 Weight Num 8 If a label existed for a variable we would see a column headed "Label", but in this case there is no labels applied to the dataset. So now lets set one for WEIGHT as we indicated above (this time I shall do the content structure not from the CONTENTS procedure, but the CONTENTS statement inside the DATASETS procedure): proc datasets library=work;
modify class;
label weight='Weight (kg) at Start of Study';
contents data=class;
quit;
run;
Running this code we get the following output (abridged): Alphabetic List of Variables and Attributes # Variable Type Len Label 3 Age Num 8 4 Height Num 8 1 Name Char 8 2 Sex Char 1 5 Weight Num 8 Weight (kg) at Start of Study As you can see the variable WEIGHT now has a label -- the reason why it is useful will become clear shortly. For the purposes of this example I will also now add a label to the variable AGE and HEIGHT using the DATASETS procedure as above: proc datasets library=work;
modify class;
label height='Height (in) at Start of Study'
age='Age (years) at Start of Study';
contents data=class;
quit;
run;
Note that I did not redo the label for the variable WEIGHT in the above code as it was already done previously, although I could have put in the step and even replaced it with new text. Now why is the label useful. As you can see already it gives a useful description of what the variable is. Now lets extend that to a small report using the PRINT procedure: proc print data=class noobs; run; that produces the following output (abridged): Name Sex Age Height Weight Alfred M 14 69.0 112.5 Alice F 13 56.5 84.0 Barbara F 13 65.3 98.0 Carol F 14 62.8 102.5 Henry M 14 63.5 102.5 The column headers are the same as the variable names, and we have the same problem of not knowing what the units are for Age, Height and Weight (but did we not put that in a label earlier) but lets add another option to get the labels: proc print data=class noobs LABEL; run; that produces the following output (abridged): Age (years) Height (in) Weight (kg)
at Start at Start at Start
Name Sex of Study of Study of Study
Alfred M 14 69.0 112.5
Alice F 13 56.5 84.0
Barbara F 13 65.3 98.0
Carol F 14 62.8 102.5
Henry M 14 63.5 102.5
Now, as you can see, we have a report that gives a clear description of the variables though the use of dataset variable labels. There will be those reading this who will say that I could have put the label in the PRINT procedure call (yes, the LABEL statement is a global statement that can be used in almost any procedure) and would be done this way using the following code: proc print data=class noobs LABEL;
label weight='Weight (kg) at Start of Study'
height='Height (in) at Start of Study'
age='Age (years) at Start of Study';
run;
But the one reason I don't normally use this method is that the label does not carry forward in the dataset. Now see a complete example where the CLASS data is copied from the SASHELP directory, useful labels put on the dataset, then doing a PRINT, MEANS and TABULATE call on the same data, all with the same labels: proc datasets library=work;
copy in=sashelp out=work;
select class;
modify class;
label weight='Weight (kg) at Start of Study'
height='Height (in) at Start of Study'
age='Age (years) at Start of Study';
quit;
run;
proc print data=class noobs LABEL;
run;
proc means data=class;
var age height weight;
run;
proc tabulate data=class;
class sex;
var age height weight;
tables age*
(n*f=8.0 mean*f=8.2 std*f=8.3
median*f=8.2 min*f=8.0 max*f=8.0)
(height weight)*
(n*f=8.0 mean*f=8.3 std*f=8.4
median*f=8.3 min*f=8.1 max*f=8.1),
sex all='Total';
run;
produces the following output: The PRINT Procedure
Age (years) Height (in) Weight (kg)
at Start at Start at Start
Name Sex of Study of Study of Study
Alfred M 14 69.0 112.5
Alice F 13 56.5 84.0
Barbara F 13 65.3 98.0
Carol F 14 62.8 102.5
Henry M 14 63.5 102.5
James M 12 57.3 83.0
Jane F 12 59.8 84.5
Janet F 15 62.5 112.5
Jeffrey M 13 62.5 84.0
John M 12 59.0 99.5
Joyce F 11 51.3 50.5
Judy F 14 64.3 90.0
Louise F 12 56.3 77.0
Mary F 15 66.5 112.0
Philip M 16 72.0 150.0
Robert M 12 64.8 128.0
Ronald M 15 67.0 133.0
Thomas M 11 57.5 85.0
William M 15 66.5 112.0
The MEANS Procedure
Variable Label N Mean Std Dev Minimum Maximum
---------------------------------------------------------------------------------------------------
Age Age (years) at Start of Study 19 13.3157895 1.4926722 11.0000000 16.0000000
Height Height (in) at Start of Study 19 62.3368421 5.1270752 51.3000000 72.0000000
Weight Weight (kg) at Start of Study 19 100.0263158 22.7739335 50.5000000 150.0000000
---------------------------------------------------------------------------------------------------
The TABULATE Procedure
----------------------------------------------------
| | Sex | |
| |-----------------| |
| | F | M | Total |
|-----------------------+--------+--------+--------|
|Age (years)|N | 9| 10| 19|
|at Start of|-----------+--------+--------+--------|
|Study |Mean | 13.22| 13.40| 13.32|
| |-----------+--------+--------+--------|
| |Std | 1.394| 1.647| 1.493|
| |-----------+--------+--------+--------|
| |Median | 13.00| 13.50| 13.00|
| |-----------+--------+--------+--------|
| |Min | 11| 11| 11|
| |-----------+--------+--------+--------|
| |Max | 15| 16| 16|
|-----------+-----------+--------+--------+--------|
|Height (in)|N | 9| 10| 19|
|at Start of|-----------+--------+--------+--------|
|Study |Mean | 60.589| 63.910| 62.337|
| |-----------+--------+--------+--------|
| |Std | 5.0183| 4.9379| 5.1271|
| |-----------+--------+--------+--------|
| |Median | 62.500| 64.150| 62.800|
| |-----------+--------+--------+--------|
| |Min | 51.3| 57.3| 51.3|
| |-----------+--------+--------+--------|
| |Max | 66.5| 72.0| 72.0|
|-----------+-----------+--------+--------+--------|
|Weight (kg)|N | 9| 10| 19|
|at Start of|-----------+--------+--------+--------|
|Study |Mean | 90.111| 108.950| 100.026|
| |-----------+--------+--------+--------|
| |Std | 19.3839| 22.7272| 22.7739|
| |-----------+--------+--------+--------|
| |Median | 90.000| 107.250| 99.500|
| |-----------+--------+--------+--------|
| |Min | 50.5| 83.0| 50.5|
| |-----------+--------+--------+--------|
| |Max | 112.5| 150.0| 150.0|
----------------------------------------------------
Note that the label we defined in the DATASETS procedure carried forward to the PRINT, MEANS and TABULATE procedures. I hope you find this months tip useful. |
|
| ________________________________ Updated May 8, 2009 |
|