Statistics I - Robert Wood Johnson Medical School


Statistics Workshop 2011
Ramsey A. Foty, Ph.D.
Department of Surgery
UMDNJ-RWJMS
“An unsophisticated forecaster uses statistics as a drunkard uses lamp-posts, for support rather than for illumination.”
Andrew Lang (1844-1912), Scottish poet and novelist.
“Then there is the man who drowned crossing a stream with an average depth of six inches.”
W.I.E. Gates, German author.
“Statistics: The only science that enables different experts using the same figures to draw different conclusions.”
Evan Esar, American humorist.
Topics
• Why do we need statistics?
• Sample vs population.
• Gaussian/normal distribution.
• Descriptive Statistics.
– Measures of location.
• Mean, Median, Mode.
– Measures of dispersion.
• Range, Variance, Standard Deviation.
– Precision of the mean.
• Standard Error, Confidence Interval.
– Outliers.
• Grubbs' test.
• The null hypothesis.
• Significance testing.
• Variability.
• Comparing two means.
– T-test
– Group exercise
• Comparing 3 or more groups.
– ANOVA
– Group exercise
• Linear Regression.
• Power Analysis.
Why do we need statistics?
• Variability can obscure
important findings.
• We naturally assume
that observed
differences are real and
not due to natural
variability.
• Variability is the norm.
• Statistics allow us to
draw from the sample,
conclusions about the
general population.
Sample vs Population
• Taking samples of information can be an efficient way to draw conclusions when gathering all the data is impractical.
• If you measure the concentration of factor X in the blood of 10 people, does that accurately reflect the concentration of factor X of the human race in general? How about from 100, 1000, or 10,000 people? How about if you sampled everyone on the planet?
Statistical methods were developed
based on a simple model:
• Assume that an infinitely large
population of values exists and that your
sample was randomly selected from a
large subset of that population. Now,
use the rules of probability to make
inferences about the general population.
The Gaussian Distribution
• If samples are large enough, the sample distribution will often be approximately bell-shaped.
• The Gaussian function describing this shape is defined as follows:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
where μ represents the population mean and σ the standard deviation.
An example of a Gaussian distribution
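The Gaussian function can be evaluated directly. The following is a minimal sketch; the function name `gaussian_pdf` and the example values (a standard normal curve with mean 0 and standard deviation 1) are illustrative assumptions, not taken from the slides.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Height of the Gaussian curve at x for mean mu and standard deviation sigma."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)

# Illustrative values only: mean 0, standard deviation 1.
peak = gaussian_pdf(0.0, 0.0, 1.0)         # height at the mean
one_sd_above = gaussian_pdf(1.0, 0.0, 1.0)  # height one SD above the mean
one_sd_below = gaussian_pdf(-1.0, 0.0, 1.0) # same height, by symmetry
```

The curve is highest at the mean and symmetric around it, which is what gives the distribution its bell shape.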
Descriptive Statistics
Measures of Location
A typical or central value that best describes the data.
• Mean
• Median
• Mode
Measures of Dispersion
Describe spread (variation) of the data around that central value.
• Range
• Variance
• Standard Deviation
• Standard Error
• Confidence Interval
No single parameter can fully describe distribution of data in the sample. Most
statistics software will provide a comprehensive table describing the distribution.

Measures of Location: Mean
Mean
• More commonly referred to
as “the average”.
• It is the sum of the data
points divided by the number
of data points.
$$M = \frac{49 + 27 + 132 + 24 + 78 + 80 + 62 + 39 + 200}{9} = 76.78 \approx 77 \text{ microns}$$
Migration Assay
| Cell # | Distance travelled (microns) |
| --- | --- |
| 1 | 49 |
| 2 | 27 |
| 3 | 132 |
| 4 | 24 |
| 5 | 78 |
| 6 | 80 |
| 7 | 62 |
| 8 | 39 |
| 9 | 200 |
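As a quick check, the mean of the migration data can be computed in a couple of lines. This is a minimal sketch; the variable names are illustrative.

```python
# Distances travelled (microns) by the 9 cells in the migration assay.
distances = [49, 27, 132, 24, 78, 80, 62, 39, 200]

# Mean = sum of the data points divided by the number of data points.
mean_distance = sum(distances) / len(distances)
```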
Measures of Location: Median
Median for odd sample
size
• The value which has half the
data smaller than that point
and half the data larger.
• For odd numbers, you first
rank order then pick the
middle number.
• Therefore the 5’th number in
the sequence is the median =
62 microns.
Migration assay (ranked)
| Cell # | Distance traveled (microns) |
| --- | --- |
| 1 | 24 |
| 2 | 27 |
| 3 | 39 |
| 4 | 49 |
| 5 | 62 |
| 6 | 78 |
| 7 | 80 |
| 8 | 132 |
| 9 | 200 |
Measures of Location: Median
Median for even sample
size
• Find the middle two numbers
then find the value that lies
between them.
• Add two middle ones
together and divide by 2.
• Median is (7+13)/2=10.
• The median is less sensitive
for extreme scores than the
mean and is useful for
skewed data.
| Unranked | Ranked |
| --- | --- |
| 3 | 3 |
| 13 | 5 |
| 7 | 7 |
| 5 | 13 |
| 21 | 21 |
| 23 | 23 |
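The two median rules (odd and even sample size) can be sketched as one small function. The function name `median` is an illustrative choice; Python's standard library also offers `statistics.median`, which follows the same rules.

```python
def median(values):
    """Rank the values, then take the middle one (odd n) or the
    average of the two middle ones (even n)."""
    ranked = sorted(values)
    n = len(ranked)
    middle = n // 2
    if n % 2 == 1:
        return ranked[middle]
    return (ranked[middle - 1] + ranked[middle]) / 2

# Odd sample size: the cell migration data (9 values).
migration_median = median([49, 27, 132, 24, 78, 80, 62, 39, 200])

# Even sample size: the 6-value example.
even_median = median([3, 13, 7, 5, 21, 23])
```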
Measures of Location: Mode
Mode
• Value of the sample which
occurs most frequently.
• It’s a good measure of
central tendency.
• The mode for this data set is Purple, since this is the color with the highest frequency (72) in the data set.
• Not all data sets have a
single mode. It’s only useful
in very limited situations.
• Data sets can be bi-modal.
| Marble Color | Frequency |
| --- | --- |
| Black | 6 |
| Brown | 2 |
| Blue | 34 |
| Purple | 72 |
| Pink | 71 |
| Green | 58 |
| Rainbow | 34 |
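Finding the mode amounts to picking the category with the highest frequency. The sketch below uses the marble counts from the table; the small bi-modal list is a hypothetical example added only to show that a data set can have more than one mode.

```python
from statistics import multimode

# Frequencies from the marble table.
marble_counts = {
    "Black": 6, "Brown": 2, "Blue": 34, "Purple": 72,
    "Pink": 71, "Green": 58, "Rainbow": 34,
}

# The mode is the category that occurs most often.
mode_color = max(marble_counts, key=marble_counts.get)

# Hypothetical bi-modal data: both 1 and 2 occur twice.
bimodal_modes = multimode([1, 1, 2, 2, 3])
```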
Boxplots
[Figure: boxplot with the following features labeled, from top to bottom]
• Largest observed value that is not an outlier
• 75th percentile
• Median
• 25th percentile
• Smallest observed value that is not an outlier
Data: 12, 13, 5, 8, 9, 20, 16, 14, 14, 6, 9, 12, 12
Ranked: 5, 6, 8, 9, 9, 12, 12, 12, 13, 14, 14, 16, 20
Boxplots are used to display summary statistics.
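The summary statistics behind a boxplot can be sketched with the standard library. Note that several conventions exist for computing percentiles, so other software may report slightly different quartiles; the `"inclusive"` method used here is one assumption among several.

```python
import statistics

data = [12, 13, 5, 8, 9, 20, 16, 14, 14, 6, 9, 12, 12]

# Quartiles (25th, 50th, 75th percentiles) using the "inclusive" convention.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

smallest, largest = min(data), max(data)
```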
Measures of Location…
do not provide information on
spread or variability of the data
Measures of Dispersion
• Describe the spread or
variability within the
data.
• Two distinct samples
can have the same mean
but completely
different levels of
variability.
• Which mean has a
higher level of
variability?
110 ± 5 or 110 ± 25
• Typical measures of
dispersion include
Range, Variance, and
Standard Deviation.
Measures of Dispersion: Range
Range
• The difference between
the largest and smallest
sample values.
• It depends only on
extreme values and
provides no information
about how the remaining
data is distributed.
For the cell migration data:
Largest distance = 200 microns
Smallest distance = 24 microns
Range = 200-24 = 176 microns.
NOT a reliable measure of
dispersion of the whole data
set.
Measures of Dispersion:
Variance
Variance
• Defined as the average of the squared distance of each value from the mean.
To calculate variance, it is first necessary to calculate the mean score then measure the amount that each score deviates from the mean.
The formula for calculating variance is:
$$S^2 = \frac{\sum (X - M)^2}{N - 1}$$
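Why Square?
• Squaring makes them all positive numbers (to eliminate negatives, which would otherwise cancel the positives and reduce the variance).
• Makes the bigger differences stand out: 100² (10,000) is a lot bigger than 50² (2,500).
N vs N-1
• N: size of the population. Divide by N when the data are the entire population.
• N-1: size of the sample. Divide by N-1 when the data are a sample.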
For the cell migration data, the sample variance is:
$$S^2 = \frac{(-28)^2 + (-50)^2 + (55)^2 + (-53)^2 + (1)^2 + (3)^2 + (-15)^2 + (-38)^2 + (123)^2}{8} \approx 3241 \text{ microns}^2$$
NOT a very user-friendly statistic.
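The variance calculation can be sketched step by step, alongside the range for comparison. The code uses the unrounded mean (76.78), so the result differs very slightly from a hand calculation that rounds the mean to 77.

```python
distances = [49, 27, 132, 24, 78, 80, 62, 39, 200]
n = len(distances)
mean = sum(distances) / n

# Range: depends only on the two extreme values.
value_range = max(distances) - min(distances)

# Sample variance: sum of squared deviations divided by N - 1.
squared_deviations = [(x - mean) ** 2 for x in distances]
sample_variance = sum(squared_deviations) / (n - 1)

# Population variance (divide by N) is shown only for contrast.
population_variance = sum(squared_deviations) / n
```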
Measures of Dispersion:
Standard Deviation
Standard Deviation
• The most common and useful measure of dispersion.
• Tells you how tightly each sample is clustered around the mean. When the samples are tightly bunched together, the Gaussian curve is narrow and the standard deviation is small.
• When the samples are spread apart, the Gaussian curve is flat and the standard deviation is large.
• The formula to calculate
standard deviation is:
SD = square root of the variance.
• For this data set, the
mean and standard
deviation are:
77 ± 57 microns
Conclusion: There’s lots of
scatter in this data set.
But then again….
• This is a fairly small sample (n=9).
• What if we were to count the migration of 90,
or 900, or 9000 cells.
• Would this give us a better sense of what the
average migration distance is?
• In other words, how can we determine
whether our mean is precise?
Precision of the Mean
Standard Error
• An estimate of how far the sample mean is likely to be from the population mean.
For our data set:
$$SEM = \frac{SD}{\sqrt{N}} = \frac{57}{\sqrt{9}} = \frac{57}{3} = 19$$
SEM gets smaller as sample size increases since the mean of a larger sample is likely to be closer to the population mean.
Increasing sample size does
not change scatter in the data.
SD may increase or decrease.
Increasing sample size will, however,
predictably reduce the standard
error.
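Standard deviation and standard error for the migration data can be sketched as follows. The last line is a hypothetical illustration: it assumes the SD stayed the same with 900 cells, which real data would not guarantee.

```python
import math
import statistics

distances = [49, 27, 132, 24, 78, 80, 62, 39, 200]

sd = statistics.stdev(distances)        # square root of the sample variance
sem = sd / math.sqrt(len(distances))    # SEM = SD / sqrt(N)

# Hypothetical: if the SD were unchanged with N = 900, the SEM would shrink 10-fold.
sem_if_900_cells = sd / math.sqrt(900)
```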
Should we show standard deviation or
standard error?
Use Standard Deviation
• If the scatter is caused by biological variability and you want to show that variability.
• For example: You aliquot 10 plates each with a different cell line and measure integrin expression of each.
Use Standard Error
• If the variability is caused by experimental imprecision and you want to show the precision of the calculated mean.
• For example: You aliquot 10 plates of the same cell line and measure integrin expression of each.
Precision of the Mean
Confidence Intervals
• Combines the scatter in any given sample with the size of that sample.
• Generates an interval that is likely to contain the population mean.
The formula for calculating CI:
CI = X ± (SEM x Z)
• X is the sample mean and Z is the critical value for the normal distribution.
• For the 95% CI, Z=1.96.
• For our data set: 95% CI = 77 ± (19 x 1.96) = 77 ± 37, so the 95% CI = 40-114.
• This means that there’s a 95% chance that the CI you calculated contains the population mean.
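The confidence interval formula CI = X ± (SEM x Z) can be sketched for the migration data. This sketch uses Z = 1.96 as on the slide and unrounded intermediate values; for small samples, software often substitutes a larger critical value from the t distribution, which would widen the interval.

```python
import math
import statistics

distances = [49, 27, 132, 24, 78, 80, 62, 39, 200]
Z_95 = 1.96  # critical value of the normal distribution for a 95% CI

mean = statistics.mean(distances)
sem = statistics.stdev(distances) / math.sqrt(len(distances))

half_width = Z_95 * sem
ci_low, ci_high = mean - half_width, mean + half_width
```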
CI: A Practical Example
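| Data set A | Data set B |
| --- | --- |
| 80 | 90 |
| 85 | 52 |
| 90 | 30 |
| 88 | 44 |
| 79 | 68 |
| 92 | 77 |
| 88 | 55 |
| 85 | 62 |
| 88 | 75 |
| 86 | 88 |

| | Data set A | Data set B |
| --- | --- | --- |
| Mean | 86.1 | 64.1 |
| SD | 4.1 | 19.3 |
| SEM | 1.3 | 6.1 |
| Low 95% CI | 83.2 | 50.3 |
| High 95% CI | 89.0 | 77.9 |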
Between these two data
sets, which mean do you
think best reflects the
population mean and why?
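The summary values for the two data sets can be reproduced with a short sketch. One assumption is labeled in the code: the tabulated 95% limits appear consistent with a t-distribution critical value for 9 degrees of freedom (2.262, from a standard t table) rather than Z = 1.96, so that value is used here.

```python
import math
import statistics

data_a = [80, 85, 90, 88, 79, 92, 88, 85, 88, 86]
data_b = [90, 52, 30, 44, 68, 77, 55, 62, 75, 88]

# Assumption: two-tailed 95% critical t for N - 1 = 9 degrees of freedom.
T_CRIT_DF9 = 2.262

def summarize(values):
    """Return mean, SD, SEM, and 95% CI limits for one data set."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    sem = sd / math.sqrt(len(values))
    half_width = T_CRIT_DF9 * sem
    return {"mean": mean, "sd": sd, "sem": sem,
            "low": mean - half_width, "high": mean + half_width}

summary_a = summarize(data_a)
summary_b = summarize(data_b)
```

Data set A has the smaller SD and SEM, so its confidence interval is much narrower than that of data set B.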
SD/SEM/95% CI error bars
[Figure: error bars drawn as SD, SEM, and 95% CI]
Outliers
• An observation that
is numerically
distant from the
rest of the data.
• Can be caused by
systematic error,
flaw in the theory
that generated the
data point, or by
natural variability.
How to deal with outliers?
• In general, we first quantify the difference between the mean and the outlier, then we divide by the scatter (usually SD).
Grubbs' test
$$Z = \frac{\text{mean} - \text{value}}{SD}$$
For the cell migration data set: The mean is 77 microns. The sample furthest from the mean is the 200 micron point and the SD is 57. So:
$$Z = \frac{77 - 200}{57} = -2.16$$
What does a Z value of -2.16 mean?
• In order to answer this
question, we must
compare this number to
a probability value (P) to
answer the following
question:
• “If all the values were
really sampled from a
normal population, what
is the chance of
randomly obtaining an
outlier so far from the
other values?”
• To do this, we compare
the Z value obtained
with a table listing the
critical value of Z at the
95% probability level.
• If the computed Z is
larger than the critical
value of Z in the table,
then the P value is less
than 5% and you can
delete the outlier.
For our data set:
• The absolute value of Z calc (2.16) is less than Z Tab (2.21), so P is greater than 5% and the outlier must be retained.
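The outlier check can be sketched as follows. The critical value 2.21 is the tabulated value quoted above and is hard-coded as an assumption; it depends on sample size and would need to be looked up again for other data sets.

```python
import statistics

distances = [49, 27, 132, 24, 78, 80, 62, 39, 200]
Z_TAB = 2.21  # critical Z quoted on the slide for this data set

mean = statistics.mean(distances)
sd = statistics.stdev(distances)

# The suspect value is the one furthest from the mean.
suspect = max(distances, key=lambda value: abs(value - mean))

# The absolute value is compared with the table, so the sign does not matter.
z_calc = abs(mean - suspect) / sd
can_delete_outlier = z_calc > Z_TAB
```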
The Null Hypothesis
• Appears in the form Ho: μ1 = μ2, where:
Ho = null hypothesis
μ1 = mean of population 1
μ2 = mean of population 2
• An alternate form is Ho: μ1 - μ2 = 0
• The null hypothesis is presumed true until statistical evidence in the form of a hypothesis test indicates otherwise.
Statistical Significance
• When a statistic is significant, it simply
means that the statistic is reliable.
• It does not mean that it is biologically
important or interesting.
• When testing the relationship between
two parameters we might be sure that
the relationship exists, but is it weak or
strong?
Strong vs weak relationships
[Figure: two example plots, one with r² = 0.2381 and one with r² = 1.000]
Significance Testing
Type I and Type II errors
• Type I error: a true null hypothesis can be incorrectly rejected.
– False positive
• Type II error: a false null hypothesis can fail to be rejected.
– False negative
Statistical decision versus the true state of the null hypothesis:
| Statistical Decision | Ho true | Ho false |
| --- | --- | --- |
| Reject Ho | Type I error | Correct |
| Do not reject Ho | Correct | Type II error |
A Practical Example
Type I error
• A pregnancy test has
produced a "positive" result
(indicating that the woman
taking the test is pregnant);
if the woman is actually not
pregnant, then we say the
test produced a "false
positive".
Type II error
• A Type II error, or a "false
negative", is the error of
failing to reject a null
hypothesis when the
alternative hypothesis is the
true state of nature….i.e if a
pregnancy test reports
"negative" when the woman
is, in fact, pregnant.
In significance testing we must be able to reduce the chance of rejecting a true null-hypothesis
to as low a value as desired. The test must be so devised that it will reject
the hypothesis tested when it is likely to be false.
Sources of Variability
Random Error
• Caused by inherently unpredictable fluctuations in the readings of a measurement apparatus or in the experimenter's interpretation of the instrumental reading.
• Can occur in either direction.
Systematic Error
• Is predictable, and typically constant or proportional to the true value. Systematic errors are caused by imperfect calibration of measurement instruments or imperfect methods of observation.
• Typically occurs only in 1 direction.
Some Examples
| Type of Error | Example | How to Minimize |
| --- | --- | --- |
| Random Error | You measure the mass of a ring three times using the same balance and get slightly different values: 17.46 g, 17.42 g, 17.44 g | Take more data. Random errors can be evaluated through statistical analysis and can be reduced by averaging over a large number of observations. |
| Systematic Error | The electronic scale you use reads 0.05 g too high for all your mass measurements (because it is improperly tared throughout your experiment). | Systematic errors are difficult to detect and cannot be analyzed statistically, because all of the data is off in the same direction (either too high or too low). |
Repeatability/Reproducibility
Repeatability
• The variation in
measurements taken by a
single person or
instrument on the same
item and under the same
conditions.
• An experiment, if
performed by the same
person, using the same
equipment, reagents, and
supplies, must yield the
same result.
Reproducibility
• The ability of a test or
experiment to be
accurately reproduced
or replicated by
someone else working
independently.
• Cold fusion is an
example of an unreproducible
experiment.
Hypothesis Testing
Observe Phenomenon → Propose Hypothesis → Design Study → Collect and Analyze Data → Interpret Results → Draw Conclusions
Statistics are an important part of the study design.
Comparing Two Means
• Are these two means
significantly different?
• Variability can strongly
influence whether the means
are different. Consider these
3 scenarios: Which of these
will likely yield significant
differences?
Comparing Two Means
Student t-test
• Introduced in 1908 by
William Sealy Gosset.
• Gosset was a chemist working for the Guinness Brewery in Dublin.
• He devised the t-test as a way to cheaply monitor the quality of stout.
• He was forced to use a pen name by his employer; he chose to use the name Student.
• N < 30
• Independent data points,
except when using a paired
t-test.
• Normal distribution for
equal and unequal variance
• Random sampling
• Equal sample size.
• Degrees of freedom
important.
• Most useful when
comparing 2 sample means.
The Student t-test
• Given two data sets, each characterized by its mean, standard deviation, and number of samples, we can determine whether the means are significantly different by using a t-test.
Note below that the difference between the means is the same but the variability is very different.
An Example
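| Drop # | Sample 1 | Sample 2 |
| --- | --- | --- |
| 1 | 345 | 134 |
| 2 | 376 | 116 |
| 3 | 292 | 154 |
| 4 | 415 | 142 |
| 5 | 359 | 177 |
| 6 | 364 | 111 |
| 7 | 298 | 189 |
| 8 | 295 | 187 |
| 9 | 352 | 166 |
| 10 | 316 | 184 |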
• The null hypothesis
states that there is no
difference in the
means between
samples:
• 1) Calculate means.
• 2) Calculate SDs.
• 3) Calculate SEs.
• 4) Calculate t-value.
• 5) Compare t_calc to t_tab.
• 6) Accept/reject Ho.
Plot Data
[Figure: box plot and bar graph of the two samples]
1) Calculate Mean
$$M_1 = \frac{\sum X_1}{N_1} = \frac{345 + 376 + 292 + 415 + 359 + 364 + 298 + 295 + 352 + 316}{10} = 341$$
$$M_2 = \frac{\sum X_2}{N_2} = \frac{134 + 116 + 154 + 142 + 177 + 111 + 189 + 187 + 166 + 184}{10} = 156$$
2) Calculate SD
$$SD_1 = \sqrt{\frac{\sum (x_i - M_1)^2}{N - 1}}$$
$$= \sqrt{\frac{(345-341)^2 + (376-341)^2 + (292-341)^2 + (415-341)^2 + (359-341)^2 + (364-341)^2 + (298-341)^2 + (295-341)^2 + (352-341)^2 + (316-341)^2}{9}}$$
$$= \sqrt{\frac{16 + 1225 + 2401 + 5476 + 324 + 529 + 1849 + 2116 + 121 + 625}{9}} = \sqrt{1631} \approx 40$$
$$SD_2 = \sqrt{\frac{\sum (x_i - M_2)^2}{N - 1}}$$
$$= \sqrt{\frac{(134-156)^2 + (116-156)^2 + (154-156)^2 + (142-156)^2 + (177-156)^2 + (111-156)^2 + (189-156)^2 + (187-156)^2 + (166-156)^2 + (184-156)^2}{9}}$$
$$= \sqrt{\frac{484 + 1600 + 4 + 196 + 441 + 2025 + 1089 + 961 + 100 + 784}{9}} = \sqrt{854} \approx 29$$
3) Calculate SE
$$SE_1 = \frac{SD_1}{\sqrt{N}} = \frac{40}{\sqrt{10}} = \frac{40}{3.2} = 12.5 \approx 13$$
$$SE_2 = \frac{SD_2}{\sqrt{N}} = \frac{29}{\sqrt{10}} = \frac{29}{3.2} = 9.1 \approx 9$$
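4) Calculate the t-statistic
| | Sample 1 | Sample 2 |
| --- | --- | --- |
| Mean | 341 | 156 |
| SD | 40 | 29 |
| SE | 13 | 9 |
| N | 10 | 10 |

$$t = \frac{M_1 - M_2}{\sqrt{SE_1^2 + SE_2^2}} = \frac{341 - 156}{\sqrt{(13)^2 + (9)^2}} = \frac{185}{\sqrt{250}} = \frac{185}{16} = 11.6$$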
Now we have to compare our t-value to
a table of critical t-values to determine
whether the sample means differ.
But….
We first have to determine the
degrees of freedom….
• Describe the number
of values in the final
calculation of a
statistic that are
free to vary.
• For our data set, the
degrees of freedom
is 2N-2 or 2(10)-2 or
20-2=18.
Why 18 degrees of freedom….
• To calculate SD we must
first calculate the mean
and then compute the
sum of the several
squared deviations from
that mean.
• While there are n
deviations, only n-1 are
actually free to assume
any value whatsoever.
• This is because we used
an n value to calculate
the mean.
• Since we have 2 data
sets, then df=2n-2=18
Did you hear the one about the
statistician who was thrown in jail?
• He now has zero degrees of freedom.
5) Compare t_calc to t_tab for 18 df
• For the 95% confidence level and a df of 18, t_tab = 2.101. Our t-value was 11.6.
• Since t_calc > t_tab, we must reject the Ho and conclude that the sample means are significantly different.
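The six steps of the worked example can be sketched in a few lines. Because the code keeps unrounded intermediate values, the t-value comes out slightly higher than the hand-calculated 11.6; the conclusion is unchanged. The critical value 2.101 is the tabulated value quoted above, hard-coded as an assumption.

```python
import math
import statistics

sample_1 = [345, 376, 292, 415, 359, 364, 298, 295, 352, 316]
sample_2 = [134, 116, 154, 142, 177, 111, 189, 187, 166, 184]

def standard_error(values):
    """Sample SD (N - 1 denominator) divided by the square root of N."""
    return statistics.stdev(values) / math.sqrt(len(values))

# Steps 1-3: means, SDs (inside standard_error), and SEs.
mean_1, mean_2 = statistics.mean(sample_1), statistics.mean(sample_2)
se_1, se_2 = standard_error(sample_1), standard_error(sample_2)

# Step 4: t = difference between the means / combined standard error.
t_calc = (mean_1 - mean_2) / math.sqrt(se_1 ** 2 + se_2 ** 2)

# Step 5: degrees of freedom and the tabulated critical value.
df = len(sample_1) + len(sample_2) - 2
T_TAB = 2.101  # critical t for 18 df at the 95% level

# Step 6: reject Ho if the calculated t exceeds the tabulated t.
reject_null = abs(t_calc) > T_TAB
```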
Your Turn…
One-tailed vs two-tailed t-test
One-tailed t-test
A one-tailed test will test either if the mean is significantly
greater than x or if the mean is significantly less than x,
but not both. The one-tailed test provides more power to
detect an effect in one direction by not testing the effect
in the other direction.
Two-tailed t-test
A two-tailed test will test both if the mean is significantly
greater than x and if the mean is significantly less than x.
The mean is considered significantly different from x if the
test statistic is in the top 2.5% or bottom 2.5% of its
probability distribution, resulting in a p-value less than 0.05.
Paired vs Unpaired t-test
Paired
• The observed data are from
the same subject or from a
matched subject and are
drawn from a population with
a normal distribution.
• Example: Measuring glucose
concentration in diabetic
patients before and after
insulin injection.
Unpaired
• The observed data are from
two independent, random
samples from a population
with a normal distribution.
• Example: Measuring glucose
concentration of diabetic
patients versus nondiabetics.
P values
“If the P value is low, the null hypothesis must go”
Comparing Three or More Means
Why not just do
multiple t-tests?
• If you set the significance level (α) at 5% and do repeated t-tests, you will eventually reject the null hypothesis when you shouldn’t, i.e. you increase your chance of making a Type I error.
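| Number of Groups | Number of Comparisons | Chance of a Type I error (α = 0.05) |
| --- | --- | --- |
| 2 | 1 | 0.05 |
| 3 | 3 | 0.14 |
| 4 | 6 | 0.26 |
| 5 | 10 | 0.40 |
| 6 | 15 | 0.54 |
| 7 | 21 | 0.66 |
| 8 | 28 | 0.76 |
| 9 | 36 | 0.84 |
| 10 | 45 | 0.90 |
| 11 | 55 | 0.94 |
| 12 | 66 | 0.97 |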
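The values in this table are consistent with treating each pairwise comparison as an independent test at α = 0.05, so that the chance of at least one false positive is 1 - (1 - α)^k for k comparisons. The sketch below rests on that assumption; the function name is illustrative.

```python
from math import comb

def multiple_comparison_risk(groups, alpha=0.05):
    """Return (number of pairwise comparisons, chance of at least one
    Type I error), assuming independent tests at the given alpha."""
    comparisons = comb(groups, 2)  # groups * (groups - 1) / 2
    risk = 1 - (1 - alpha) ** comparisons
    return comparisons, risk

comparisons_4, risk_4 = multiple_comparison_risk(4)
comparisons_12, risk_12 = multiple_comparison_risk(12)
```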
Frog Germ Layer Experiment
Germ Layer Surface Tensions
Multiple t-test results for significance:
• Endo vs Meso: p=0.0293 Yes
• Endo vs Ecto: p=0.0045 Yes
• Endo vs Ecto under: p=0.0028 Yes
• Meso vs Ecto: p=0.0512 No
• Meso vs Ecto under: p=0.0018 Yes
• Ecto vs Ecto under: p=0.0007 Yes
4 groups, 6 possible comparisons, 26% chance of detecting a significant difference when none exists.
To compare three or more means we must use
Analysis of Variance (ANOVA)
• In ANOVA we don’t actually measure variance. We measure a term called “sum of squares.”
• There are 3 sums of squares we need to measure.
1) Total sum of squares.
• Total scatter around the grand mean.
2) Between-group sum of squares.
• Total scatter of the group means with respect to the grand mean.
3) Within-group sum of squares.
• The scatter of the scores within each group.
Frog Germ Layer Experiment
[Figure: Germ Layer Surface Tensions, ANOVA/MCT]
Significant difference by multiple t-tests versus ANOVA/MCT:
| Comparison | T-test | Anova/MCT |
| --- | --- | --- |
| Endo vs Meso | Yes | No |
| Endo vs Ecto | Yes | No |
| Endo vs Ecto under | Yes | Yes |
| Meso vs Ecto | No | No |
| Meso vs Ecto under | Yes | Yes |
| Ecto vs Ecto under | Yes | Yes |
ANOVA
The fundamental equation for ANOVA is:
$$SS_{Tot} = SS_{BG} + SS_{WG}$$
From this we can calculate the mean sum of squares by dividing the sum of squares by the degrees of freedom.
$$MS_{BG} = \frac{SS_{BG}}{df_{BG}}; \quad MS_{WG} = \frac{SS_{WG}}{df_{WG}}$$
We can then calculate the F statistic:
$$F = \frac{MS_{BG}}{MS_{WG}}$$
To calculate sums of squares we first
need to calculate two types of means.
• 1) group means ($\bar{x}$)
• 2) the grand mean ($\bar{X}$)
$$SS_{total} = \sum (X - \bar{X})^2$$
Sum of squares of each sample (X) minus the grand mean.
$$SS_{BG} = \sum (\bar{x} - \bar{X})^2 \times n$$
Sum of squares of each group mean minus the grand mean, multiplied by the number of samples in each group (n).
$$SS_{WG} = \sum (x - \bar{x})^2$$
Sum of squares of each sample (x) minus the group mean.
df for ANOVA
• To calculate the MS BG and MS WG, we need to know the df.
$$MS_{BG} = \frac{SS_{BG}}{df_{BG}}; \quad MS_{WG} = \frac{SS_{WG}}{df_{WG}}$$
• To determine the df for these two parameters we need to partition:
• df of SS BG = number of groups minus 1. Therefore for 3 groups, df=2.
• df of SS WG = the sum of (n-1) over all groups. Therefore for 30 samples (10 in each of the 3 groups), df=27.
• We can then compare the F_calc to the F_tab to determine whether significant differences exist in the entire data set.
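The sums of squares, degrees of freedom, and F statistic can be sketched as follows. The three groups are hypothetical values invented for illustration (they are not the frog germ layer data). In the between-group term, each group's squared deviation is weighted by the number of samples in that group, which is what makes the total equal the sum of the two parts.

```python
# Hypothetical data: 3 groups of 4 measurements each.
groups = {
    "A": [4, 5, 6, 5],
    "B": [6, 7, 8, 7],
    "C": [9, 10, 11, 10],
}

all_values = [x for values in groups.values() for x in values]
grand_mean = sum(all_values) / len(all_values)
group_means = {name: sum(v) / len(v) for name, v in groups.items()}

# Total: every sample against the grand mean.
ss_total = sum((x - grand_mean) ** 2 for x in all_values)
# Between groups: each group mean against the grand mean, weighted by group size.
ss_bg = sum(len(v) * (group_means[name] - grand_mean) ** 2
            for name, v in groups.items())
# Within groups: every sample against its own group mean.
ss_wg = sum((x - group_means[name]) ** 2
            for name, v in groups.items() for x in v)

df_bg = len(groups) - 1                            # number of groups - 1
df_wg = sum(len(v) - 1 for v in groups.values())   # sum of (n - 1) over groups

f_calc = (ss_bg / df_bg) / (ss_wg / df_wg)
```

The resulting `f_calc` would then be compared with a tabulated F value for `df_bg` and `df_wg` degrees of freedom.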
Your Turn….
One-way versus two-way ANOVA
One-Way ANOVA
• 1 measurement variable
and 1 nominal variable.
• For example, you might
measure glycogen
content for multiple
samples of heart, liver,
kidney, lung etc…
Two-Way ANOVA
• 1 measurement variable
and 2 nominal variables.
• For example, you might
measure a response to
three different drugs in
both men and women.
Drug treatment is one
factor and gender is the
other.
ANOVA only tells us that the
smallest and largest means likely
differ from each other. But what
about other means?
In order to test other means,
we have to run post hoc multiple
comparisons tests.
Post hoc tests
• Are only used if the null hypothesis is rejected.
• There are many, including Tukey’s, Bonferroni’s, Scheffé’s, Dunn’s, Newman-Keuls.
• All test whether any of the group means differ
significantly.
• These tests don’t suffer from the same issues
as performing multiple t-tests. They all apply
different “corrections” to account for the
multiple comparisons.
• Accordingly, some post hoc tests are more
“stringent” than others.
Linear Regression
• The goal of linear
regression is to
adjust the values of
slope and intercept
to find the line that
best predicts Y from
X.
• More precisely, the goal
is to minimize the sum
of the squares of the
vertical distances of the
points from the line.
Note that linear regression does not test whether your data are linear.
It assumes that your data are linear, and finds the slope and intercept that
make a straight line that best fits your data.
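A least-squares fit can be sketched without any external library. The x and y values below are hypothetical, chosen only to illustrate the calculation; the slope and intercept formulas are the standard closed-form solution that minimizes the sum of squared vertical distances.

```python
# Hypothetical data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Closed-form least-squares estimates.
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x

# Sum of squared vertical distances of the points from the fitted line.
ss_residual = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

# Any other line, e.g. a slightly steeper one, gives a larger sum of squares.
ss_other = sum((yi - (intercept + (slope + 0.1) * xi)) ** 2 for xi, yi in zip(x, y))
```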
r², a measure of goodness-of-fit of linear regression
• The value r² is a fraction between 0.0 and 1.0, and has no units.
• An r² value of 0.0 means that knowing X does not help you predict Y.
• When r² equals 1.0, all points lie exactly on a straight line with no scatter. Knowing X lets you predict Y perfectly.
How is r² calculated?
• The left panel shows the best-fit linear regression line. In this example, the sum of squares of the vertical distances of the points from the line (SSreg) equals 0.86.
• The right half of the figure shows the null hypothesis -- a horizontal line through the mean of all the Y values. Goodness-of-fit of this model (SStot) is 4.907.
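The two sums of squares combine into r² through the standard relation r² = 1 - SSreg/SStot, where SSreg is the scatter around the regression line and SStot is the scatter around the horizontal line. The slide does not print this formula, so the short sketch below spells it out with the values quoted above.

```python
ss_reg = 0.86    # scatter of the points around the best-fit line
ss_tot = 4.907   # scatter of the points around the mean of Y

# Fraction of the total scatter that the regression line accounts for.
r_squared = 1 - ss_reg / ss_tot
```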
An Example
Power Analysis: How many samples
are enough?
• If sample size is too
low, the experiment
will lack the precision
to provide reliable
answers to the
questions it is
investigating.
• If sample size is too
large, time and
resources will be
wasted, often for
minimal gain.
Calculation of power requires 3
pieces of information:
1) A research hypothesis.
• This will determine how many control and treatment groups are required.
2) The variability of the outcome measure.
• Standard Deviation is the best option.
3) An estimate of the clinically (or biologically) relevant difference.
• A difference between groups that is large enough to be considered important. By convention, this is set at 0.8 SD.
An Example
• We would like to design
a study to measure two
skin barriers for burn
patients.
• We are interested in
“pain” as the clinical
outcome using the
“Oucher” scale (1-5).
• We know from previous
studies that the Oucher
scale has a SD of 1.5.

• What sample size is needed to detect a difference of 1 unit (D) on the Oucher scale?
Here’s the equation:
$$n = \frac{(\sigma_1^2 + \sigma_2^2)(z_{1-\alpha/2} + z_{1-\beta})^2}{D^2}$$
Here, $z_{1-\alpha/2}$ is the critical value of z at 0.975 (1.96) and $z_{1-\beta}$ is the z value for a power of 80% (0.84).
$$n = \frac{(1.5^2 + 1.5^2)(1.96 + 0.84)^2}{1^2} = 35.3 \approx 36$$
What would happen to n if our clinically relevant difference was set at 2 Oucher units? Here:
$$n = \frac{(1.5^2 + 1.5^2)(1.96 + 0.84)^2}{2^2} = 8.8 \approx 9$$
What would happen to n if our clinically relevant difference was set at 0.5 Oucher units? Here:
$$n = \frac{(1.5^2 + 1.5^2)(1.96 + 0.84)^2}{0.5^2} = 141.12 \approx 142$$
Another Example
• You want to measure whether
aggregates of invasive cell
lines are less cohesive than
those generated from noninvasive counterparts.
• You know that SD for the control group is 3 dynes/cm and for the invasive group is 2 dynes/cm.
• You set α at 0.05 (z=1.96), power at 80% (z=0.84), and D at 2 dynes/cm.
• How many aggregates from each group would you need?
$$n = \frac{(3^2 + 2^2)(1.96 + 0.84)^2}{2^2} = \frac{(9 + 4)(2.80)^2}{4} = \frac{13 \times 7.84}{4} = \frac{101.9}{4} = 25.5 \approx 26$$
Therefore, we need 26 aggregates
in each group to be able to
reliably detect a difference of 2
dynes/cm cohesivity between invasive
and non-invasive cells.
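The sample size equation can be sketched as a small function and applied to both worked examples. The function and parameter names are illustrative; the default z values (1.96 and 0.84) are the ones used in the examples, and the result is rounded up because a fraction of a sample cannot be collected.

```python
import math

def samples_per_group(sd_1, sd_2, difference, z_alpha=1.96, z_power=0.84):
    """n = (sd_1^2 + sd_2^2) * (z_alpha + z_power)^2 / difference^2, rounded up."""
    n = (sd_1 ** 2 + sd_2 ** 2) * (z_alpha + z_power) ** 2 / difference ** 2
    return math.ceil(n)

# Oucher scale example: SD of 1.5 in both groups.
n_oucher_1_unit = samples_per_group(1.5, 1.5, 1)
n_oucher_2_units = samples_per_group(1.5, 1.5, 2)
n_oucher_half_unit = samples_per_group(1.5, 1.5, 0.5)

# Aggregate cohesivity example: SDs of 3 and 2 dynes/cm, difference of 2 dynes/cm.
n_aggregates = samples_per_group(3, 2, 2)
```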
In general, how do variability, detection
difference, and power influence n?
| Condition | Effect on n |
| --- | --- |
| More variability in the data | Higher n required |
| Less variability in the data | Lower n required |
| Detect small differences between groups | Higher n required |
| Detect large differences between groups | Lower n required |
| Smaller α (0.01) | Higher n required |
| Less power (smaller 1 - β) | Lower n required |
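These trends can be checked numerically. The sketch below derives the z values from the chosen α and power using the standard library's normal distribution instead of hard-coding them; the function name and the specific comparison values are illustrative assumptions.

```python
import math
from statistics import NormalDist

def required_n(sd, difference, alpha=0.05, power=0.80):
    """Samples per group for two groups with the same SD (two-tailed test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # about 0.84 for 80% power
    n = (2 * sd ** 2) * (z_alpha + z_power) ** 2 / difference ** 2
    return math.ceil(n)

baseline = required_n(sd=1.5, difference=1)

more_variability = required_n(sd=3.0, difference=1)
larger_difference = required_n(sd=1.5, difference=2)
smaller_alpha = required_n(sd=1.5, difference=1, alpha=0.01)
less_power = required_n(sd=1.5, difference=1, power=0.50)
```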