This tool provides several methods for filtering the dataset. The window that opens has four options for you to choose from:
Levels of a categorical variable
After selecting a categorical variable from the drop down box, you can select which levels you want to retain in the data set.
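The idea of retaining only chosen levels can be sketched in a few lines of Python. This is only an illustration of the concept, not the tool's own code: the function name `filter_levels` and the example records are hypothetical.

```python
# Hypothetical records; the variable names and values are illustrative only.
rows = [
    {"gender": "male", "height": 185},
    {"gender": "female", "height": 170},
    {"gender": "female", "height": 182},
    {"gender": "male", "height": 176},
]


def filter_levels(rows, variable, keep):
    """Return only the rows whose value of `variable` is one of the levels in `keep`."""
    keep = set(keep)
    return [row for row in rows if row[variable] in keep]


# Retain only the "female" level of the categorical variable "gender".
females = filter_levels(rows, "gender", ["female"])
```

All other variables are carried along unchanged; only the set of rows shrinks.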
Numeric condition
This allows you to define a condition with which to filter your data.
For example, you could include only the observations of height over 180 cm by selecting height from the drop down menu, choosing the > symbol, and entering 180 in the third box.
Row number
Exclude a range of row numbers as follows:
Randomly
Essentially, this allows you to perform bootstrap randomisation manually.
The current behaviour is this:
Sort the rows of the data by one or more variables. The ordering is nested, so that the data is first ordered by "Variable 1", then by "Variable 2" within ties of "Variable 1", and so on. For categorical variables, the ordering is based on the order of the variable's levels (by default, this is alphabetical unless manually changed in "Manipulate Variables" > "Categorical Variables" > "Reorder Levels").
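As a rough sketch of what nested ordering means, the Python below sorts by a tuple of keys, so later variables only break ties in earlier ones. The function `nested_sort`, its `level_order` argument, and the example data are assumptions made for illustration; they are not part of the tool.

```python
# Hypothetical records used only to illustrate nested ordering.
rows = [
    {"gender": "male", "age": 30},
    {"gender": "female", "age": 41},
    {"gender": "male", "age": 25},
    {"gender": "female", "age": 19},
]


def nested_sort(rows, variables, level_order=None):
    """Sort by the first variable, then the second within ties, and so on.

    `level_order` optionally maps a categorical variable to its list of
    levels; rows are then ordered by position in that list instead of
    alphabetically.
    """
    level_order = level_order or {}

    def key(row):
        parts = []
        for var in variables:
            value = row[var]
            if var in level_order:
                parts.append(level_order[var].index(value))
            else:
                parts.append(value)
        return tuple(parts)

    return sorted(rows, key=key)


# Default (alphabetical) level order: female before male.
by_default = nested_sort(rows, ["gender", "age"])

# Manually reordered levels: male before female.
reordered = nested_sort(rows, ["gender", "age"], {"gender": ["male", "female"]})
```

With the default order the female rows come first, sorted by age; with the reordered levels the male rows come first instead.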
This function essentially allows you to obtain "summaries" of all of the numeric variables in the data set for combinations of categorical variables.
Variables: if only one variable is specified, the new data set will have one row for each level of the variable.
If two (or more) are specified, then there will be one row for each combination.
For example, the categorical variables gender = {male, female} and ethnicity = {white, black, asian, other} will result in a data set with 2 x 4 = 8 rows.
Summaries: each row will have the chosen summaries given for each numeric variable in the data set.
For example, if the data set has the variables gender (cat) and height (num), and if the user selects Mean and Sd, then the new data set will have the columns gender, height.Mean and height.Sd.
In the rows, the values will be for that combination of categorical variables; the row for gender = female will have the mean height of the females, and the standard deviation of height for the females.
A visual example of this would be to drag height into the Variable 1 slot and gender into the Variable 2 slot.
Clicking on "Get Summary" would provide the same information. The advantage of using Aggregate is that the summaries are calculated for every numeric variable in the data set, not just one of them.
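The shape of the aggregated data set can be illustrated with a small Python sketch. It is not the tool's implementation: the function `aggregate`, the `SUMMARIES` table, and the example values are hypothetical, and Sd is assumed here to be the sample standard deviation.

```python
from statistics import mean, stdev

# Hypothetical data: one categorical and two numeric variables.
rows = [
    {"gender": "male", "height": 180.0, "weight": 80.0},
    {"gender": "male", "height": 170.0, "weight": 70.0},
    {"gender": "female", "height": 165.0, "weight": 60.0},
    {"gender": "female", "height": 175.0, "weight": 64.0},
]

# Assumption: "Sd" is the sample standard deviation (n - 1 denominator).
SUMMARIES = {"Mean": mean, "Sd": stdev}


def aggregate(rows, by, numeric, summaries):
    """Return one row per combination of the `by` variables, with one
    column per (numeric variable, summary) pair, named "variable.Summary"."""
    groups = {}
    for row in rows:
        groups.setdefault(tuple(row[v] for v in by), []).append(row)

    result = []
    for levels, members in groups.items():
        out = dict(zip(by, levels))
        for var in numeric:
            values = [m[var] for m in members]
            for name in summaries:
                out[f"{var}.{name}"] = SUMMARIES[name](values)
        result.append(out)
    return result


summary = aggregate(rows, ["gender"], ["height", "weight"], ["Mean", "Sd"])
```

Each row of `summary` carries the columns gender, height.Mean, height.Sd, weight.Mean and weight.Sd, which shows how every numeric variable is summarised at once.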
Convert from table form (rows corresponding to subjects) to long form (rows corresponding to observations).
In many cases, the data may be in tabular form, in which multiple observations are made but placed in different columns.
An example of this may be a study of blood pressure on patients using several medications. The columns of this data set may be: patient.id, gender, drug, Week1, Week2, Week3. Here, each patient has their own row in the data set, but each row contains three observations of blood pressure.
patient.id | gender | drug | Week1 | Week2 | Week3 |
---|---|---|---|---|---|
1 | male | A | 130 | 125 | 120 |
2 | male | B | 140 | 130 | 110 |
3 | female | A | 120 | 119 | 116 |
We may want to convert to long form, where we have each observation in a new row, and use a categorical variable to differentiate the weeks.
In this case, we would select Week1, Week2, and Week3 as the variables in the list. The new data set will have the columns patient.id, gender, drug, stack.variable ("Week"), and stack.value ("blood pressure").
patient.id | gender | drug | stack.variable | stack.value |
---|---|---|---|---|
1 | male | A | Week1 | 130 |
1 | male | A | Week2 | 125 |
1 | male | A | Week3 | 120 |
2 | male | B | Week1 | 140 |
2 | male | B | Week2 | 130 |
2 | male | B | Week3 | 110 |
3 | female | A | Week1 | 120 |
3 | female | A | Week2 | 119 |
3 | female | A | Week3 | 116 |
Of course, you can rename the variables as appropriate using "Manipulate Variables" > "Rename Variables".
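The conversion shown in the two tables above can be sketched in Python as follows. The function name `stack` is hypothetical and the code is only meant to show the reshaping logic; the data are the three patients from the example.

```python
# The three patients from the example, in table (wide) form.
rows = [
    {"patient.id": 1, "gender": "male", "drug": "A", "Week1": 130, "Week2": 125, "Week3": 120},
    {"patient.id": 2, "gender": "male", "drug": "B", "Week1": 140, "Week2": 130, "Week3": 110},
    {"patient.id": 3, "gender": "female", "drug": "A", "Week1": 120, "Week2": 119, "Week3": 116},
]


def stack(rows, stack_vars):
    """Convert wide rows to long form: one output row per (input row, stacked variable)."""
    long_rows = []
    for row in rows:
        # Variables that are not stacked are repeated on every output row.
        kept = {k: v for k, v in row.items() if k not in stack_vars}
        for var in stack_vars:
            long_rows.append({**kept, "stack.variable": var, "stack.value": row[var]})
    return long_rows


long_form = stack(rows, ["Week1", "Week2", "Week3"])
```

Three patients with three stacked columns give nine rows, in the same order as the long-form table.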
Restores the data set to the way it was when it was initially imported.