Transcript Lecture 8

Lecture 14
Statistical Example
Chapters 10 and 12
Outline
10.1 Solving Simple Problems
10.2 Assembling Solution Steps
10.3 Summary of Operations
10.4 Solving Larger Problems
12.1 Behavioral Abstraction
12.2 Matrix Operations
12.3 MATLAB Implementation
“Simple” Problems
• Basic Character of the Data and Operations
– Define the input data
– Define the output data
– Extract the transformations upon the input that
produce the output
– Write the transformations as code operations
• Debugging as necessary
“Not That Simple” Problems
• Build solutions to problems keeping in mind
operations we know how to perform.
• Think how applying one operation might make
the problem easier.
• Keep doing this, until the problem is broken
down into parts that we can do.
• Build modular solutions, so that the building
blocks for future problems are larger than the
operations supplied by the language.
ANOVA
• Planned tests are determined before looking at
the data and post hoc tests are performed after
looking at the data. Post hoc tests such as Tukey's
test most commonly compare every group mean
with every other group mean and typically
incorporate some method of controlling Type I
errors.
• *from
http://en.wikipedia.org/wiki/Analysis_of_variance
Example, Apply ANOVA
• Measure B/C preference
• Three groups of data
– Tiz
– Dax
– Zup
• Is the B/C preference influenced by use of Tiz,
Dax, Zup, or, are the groups all the same?
Getting to Numbers
• Some number of tests, n, will be used to
develop a number of times the outcome B
occurs.
• Some fraction of the time, outcome B will
occur. Let this be b/n.
• If n = 1, then b/n can only be 1 or 0.
• We want to obtain a certain number of test
results, so that we can calculate their mean
and variance.
Formulae for Mean, Variance
• Mean = sum(the results)/number of results
• Square of deviation = (one result – Mean)^2
• Variance = sum(squares of deviation)/number
of results
• Standard deviation = positive square root of
variance
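The formulae above can be sketched in MATLAB as follows. The values in results are made up for illustration, and the variable names are only suggestions. Note that the variance here divides by the number of results, as on the slide, rather than by one less than that number.

```matlab
% Hypothetical results: fraction of b choices in each of five sessions
results = [0.2 0.4 0.4 0.6 0.9];

nResults = numel(results);
theMean = sum(results) / nResults;                 % Mean
squaredDeviations = (results - theMean).^2;        % Square of deviation, one per result
theVariance = sum(squaredDeviations) / nResults;   % Variance (divides by N)
theStd = sqrt(theVariance);                        % Standard deviation

% The built-in functions give the same numbers when asked to normalize by N:
% mean(results), var(results, 1), std(results, 1)
```

The element-wise operator .^ squares each deviation separately; a plain ^ would attempt a matrix power.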
Information from Data
• We can produce these sums over all subjects
holding the shape constant.
– We can try to find out whether shape matters.
• We can produce these sums over all shapes
holding the subject constant.
– We can try to find out whether the subjects are
different from one another.
Arranging Information in an Array
• Suppose we have several subjects (A, F, G, J) and
several types of test (T, D, Z) and a result (number of b
choices per total choices) for each.
• We could use the subject as an index on an array.
• We could use the type of test as an index on an array.
• We could use the index of the test (A’s 12th test session
with D, so, 12) as an index on an array.
• We would store the ratio b/n as the value in the
location given by the indices:
biasMeasure(subject, type, index) = number of b choices per
total choices
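A minimal sketch of this arrangement, assuming 4 subjects, 3 types and 12 sessions. The sizes, the numeric coding of subjects and types, and the stored value are all invented for illustration.

```matlab
% Hypothetical sizes: 4 subjects (A, F, G, J), 3 types (T, D, Z), 12 sessions
nSubjects = 4;
nTypes = 3;
nIndices = 12;
biasMeasure = zeros(nSubjects, nTypes, nIndices);

% In this made-up coding, subject A is 1 and type D is 2.
% Suppose A chose b in 7 of 10 trials during the 12th session with D.
subjectId = 1;
typeId = 2;
sessionIndex = 12;
biasMeasure(subjectId, typeId, sessionIndex) = 7 / 10;

% All of A's results with D, pulled out as a 12-element column vector
resultsForAWithD = squeeze(biasMeasure(subjectId, typeId, :));
```

A colon in any position selects every value along that dimension, which is what makes the later sums over subjects, types or sessions easy to write.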
Generalizing
• The example had three dimensions, subject,
type, index.
• The number of dimensions could be more or
fewer.
Mean, Variance
of Some Particular Thing
• Suppose we wanted the mean and standard
deviation of Jack’s data, averaged over all values
of index and type
• Suppose Jack’s subject identifier is “7”.
• with biasMeasure (subject, type, index)
• Mean = (1/(nTypes*nIndices))*
sum(sum(biasMeasure(7,:,:)))
• Variance = (1/(nTypes*nIndices))*
sum(sum((biasMeasure(7,:,:)-Mean).^2))
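As a sketch of the calculation for one subject, the slice for subject 7 can be flattened into a single column before summing, which avoids the nested sum calls. The array contents below are made up, and names such as jackValues are only illustrative.

```matlab
% Made-up data: 8 subjects, 3 types, 5 sessions, values between 0 and 1
nSubjects = 8;
nTypes = 3;
nIndices = 5;
nValues = nSubjects * nTypes * nIndices;
biasMeasure = reshape(mod(1:nValues, 11) / 10, nSubjects, nTypes, nIndices);

jack = biasMeasure(7, :, :);    % 1 x nTypes x nIndices slice for subject 7
jackValues = jack(:);           % flatten the slice to a column vector

jackMean = sum(jackValues) / (nTypes * nIndices);
jackVariance = sum((jackValues - jackMean).^2) / (nTypes * nIndices);
jackStd = sqrt(jackVariance);   % the standard deviation asked for above
```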
Mean, Variance of
Something Else
• Suppose we wanted the mean and standard
deviation of one type data, averaged over all
values of index and subject
• Suppose the type’s identifier is “3”.
• with biasMeasure (subject, type, index)
• Mean = (1/(nIndices*nSubjects))*
sum(sum(biasMeasure(:,3,:)))
• Variance = (1/(nIndices*nSubjects))*
sum(sum((biasMeasure(:,3,:)-Mean).^2))
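The same quantities can be computed for every type at once. The sketch below is one possible approach, not necessarily the one intended in the lecture: it moves the type dimension to the front and flattens the rest, so that each row holds every result for one type. The data are made up.

```matlab
% Made-up data: 4 subjects, 3 types, 5 sessions, values between 0 and 1
nSubjects = 4;
nTypes = 3;
nIndices = 5;
nValues = nSubjects * nTypes * nIndices;
biasMeasure = reshape(mod(1:nValues, 7) / 6, nSubjects, nTypes, nIndices);

% Put type first, then flatten subject and index into one dimension
nPerType = nSubjects * nIndices;
byType = reshape(permute(biasMeasure, [2 1 3]), nTypes, nPerType);

typeMeans = sum(byType, 2) / nPerType;                % one mean per type
deviations = byType - repmat(typeMeans, 1, nPerType); % subtract each row's mean
typeVariances = sum(deviations.^2, 2) / nPerType;     % one variance per type
```

Here typeMeans(3) and typeVariances(3) are the values for the type whose identifier is 3.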
Analysis of Variance
The ANOVA tests the null hypothesis that samples in
two or more groups are drawn from the same
population. To do this, two estimates are made of the
population variance. These estimates rely on various
assumptions. The ANOVA produces an F statistic, the
ratio of the variance calculated among the means to
the variance within the samples. If the group means
are drawn from the same population, the variance
between the group means should be lower than the
variance of the samples, following the central limit
theorem. A higher ratio therefore implies that the
samples were drawn from different populations.
See http://en.wikipedia.org/wiki/One-way_ANOVA and
Howell, David (2002). Statistical Methods for Psychology. Duxbury. pp. 324-325
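A minimal sketch of the F statistic for a one-way ANOVA, assuming three groups of equal size. The scores are invented for illustration, and the variable names are not from the lecture. Only basic MATLAB operations are used.

```matlab
% Made-up b/n scores, one column per group (Tiz, Dax, Zup), one row per test
scores = [0.6 0.4 0.7;
          0.5 0.3 0.8;
          0.7 0.5 0.9;
          0.6 0.4 0.8];
[nPerGroup, nGroups] = size(scores);

groupMeans = sum(scores, 1) / nPerGroup;
grandMean = sum(scores(:)) / numel(scores);

% Between-group estimate: spread of the group means around the grand mean
ssBetween = nPerGroup * sum((groupMeans - grandMean).^2);
msBetween = ssBetween / (nGroups - 1);

% Within-group estimate: spread of each score around its own group mean
ssWithin = sum(sum((scores - repmat(groupMeans, nPerGroup, 1)).^2));
msWithin = ssWithin / (nGroups * (nPerGroup - 1));

F = msBetween / msWithin;   % ratio of the two variance estimates
```

For these made-up scores F comes out at about 24, because the group means differ far more than the scores within each group do. Turning F into a probability requires the F distribution with (nGroups - 1) and nGroups*(nPerGroup - 1) degrees of freedom, which is not computed here.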
Knowing Array,
Choosing File Design
• We would like a multidimensional array, so
that we can calculate easily the variances we
want.
• Let’s look at some sample code for reading in
from a file into a multidimensional array.
The File
All entries are coded as numbers.
Sample Code - 1
[nums text raw] = xlsread('exmple4ANOVA.xls')
nums =
     1     1     1
     1     2     0
     1     3     1
     2     1     1
     2     2     0
     2     3     0
     3     1     1
     3     2     0
     3     3     0
     4     1     1
     4     2     1
     4     3     0
The subjects are numbered 1-4.
The types are numbered 1-3.
The outcome (b or s) is coded 1 for a b.
Sample Code -2
function outArray = fillMDArrayFrom2DNums(nums)
%fillMDArrayFrom2DNums(nums) takes an array of numbers that has been
%read in from a file
%and extracts the values (dependent variables) associated with settings of
%the independent variables
%for example, subject, type and index might be independent variables
%the number of 'b' choices in n trials might be the dependent variable
%the returned array has a dimension for each of the independent variables
%the file has a column for each independent variable (except index),
%plus one column for the dependent variable
%for example, a column for subject, a column for type, a column for b
%the number of trials with the same independent variables
%is obtained by counting the number of repetitions
%the index of the trial is obtained by the value of the counter
[nRows, nColumns] = size(nums);
%now we can see how many independent variables there are
nIndependentVariables = nColumns - 1;
dependentVariables = nums(:, end); %the last column
numsExceptLast = nums(:, 1:nIndependentVariables);
allMins = min(numsExceptLast);
allMaxs = max(numsExceptLast);
allRanges = allMaxs - allMins + 1;
outArray = zeros(allRanges);
columnValue = ones(1, nIndependentVariables);
for rowIndex = 1:nRows
    if dependentVariables(rowIndex) == 1 %if there is a b to be added on
        for independentVariableIndex = 1:nIndependentVariables
            %shift each identifier so that the smallest one maps to index 1
            columnValue(independentVariableIndex) = ...
                nums(rowIndex, independentVariableIndex) ...
                - allMins(independentVariableIndex) + 1;
        end
        %use every independent variable as a subscript, not only the first two
        subscripts = num2cell(columnValue);
        outArray(subscripts{:}) = outArray(subscripts{:}) + 1;
    end
end
end
File Output
>> outArray = fillMDArrayFrom2DNums(nums)
outArray =
     1     0     1
     1     0     0
     1     0     0
     1     1     0
>> xlswrite('theOutputFile.xls', outArray);
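As an alternative sketch, MATLAB's built-in accumarray can produce the same count of b outcomes without an explicit loop. This is offered as a comparison, not as the lecture's method, and it assumes that the subject and type identifiers already start at 1, so no shift by the minimum is needed.

```matlab
% The nums matrix from the sample: subject, type, outcome (1 = b)
nums = [1 1 1; 1 2 0; 1 3 1;
        2 1 1; 2 2 0; 2 3 0;
        3 1 1; 3 2 0; 3 3 0;
        4 1 1; 4 2 1; 4 3 0];

subscripts = nums(:, 1:end-1);   % one column per independent variable
outcomes = nums(:, end);         % the dependent variable

% Each row of subscripts names a cell; outcomes landing in the same cell are summed
outArray = accumarray(subscripts, outcomes);
```

Each row of the result is a subject and each column is a type, matching the layout returned by fillMDArrayFrom2DNums.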
The File