#### Transcript Slide 1 - Institute of Information Sciences and Technology

159.410/710 User Interface Design © Paul Lyons 2010

Epistemology: approaches to knowledge (diagram labels): engineers, holistic, constructionist; scientists, reductionist, analyst/synthesists; arts, complexity, subjectivists.

"I hear and I forget. I see and I remember. I do and I understand." Confucius (attributed)

Epistemology: types of HCI research.
- development of interaction widgets
- usability: efficiency, enjoyability
- the Internet and the web
- social applications
- mobile applications: shoehorning complex applications onto tiny screens

Characteristics of HCI research. HCI research focuses on people: what computers can do is not the main point; what computers can help people do is. A variety of disciplines contribute (sociology, psychology, statistics, computer science), bringing techniques such as observation, controlled experiments, handling noisy data, and developing genuinely new interface paradigms. Rigorous research methodologies are required: it isn't enough to develop a new interface or a new interface component. Does the new interface make things better? How do you know? Are you sure?

Characteristics of HCI research: things to measure.
- Performance measures (largely industry-driven): time to complete a task; number of tasks completed in a standard time; accuracy of performing a task.
- Enjoyment and emotional wellbeing.
- Why people choose to spend discretionary time using computers, e.g. contributing to Wikipedia (difficult to measure in a laboratory setting).
- Why people choose to stop using applications.
- People's usage patterns of mobile computing devices and social apps.

Characteristics of HCI research: replication of results. Multiple studies should reach the same or similar conclusions. Triangulate by using different research methods: if a single method produces identical results repeatedly, the reason may be a flawed method. Results may also change over time, e.g. people's reasons for using a computer in the 1980s vs. the 2000s, or finding information by searching and tagging vs. by hierarchical directories.

Characteristics of HCI research: tradeoffs.
- Speed vs. accuracy (Fitts' Law).
- Better interface vs. familiar interface: e.g. a more efficient keyboard vs. the QWERTY keyboard. The iPad is cool and new, and it's the coolth that persuades people to adopt it; how do you measure that?
- Security vs. usability: eye scans and fingerprints?
- A revolutionary, undeniably better computer vs. the environmental costs of computer disposal.

Characteristics of HCI research: HCI is an interdisciplinary discipline. In the past it drew mainly on human factors engineering and psychology, which both suit experimental design: a reductionist approach that is widely accepted (statistical tests, control groups) and reliable. In the present and the future (ubiquitous computing? virtual reality? mind-activated interfaces?) it also draws on library science, information science, and art and design, perhaps even competitions with judges (cf. architecture). These approaches are more holistic and more subjective, and less trusted (though not less trustworthy).

Epistemology: types of model.
- Generative: produce principles and guidelines, or actual systems (e.g. Colour Harmoniser).
- Prescriptive: suggest ways of building things (e.g. patterns).
- Predictive: allow us to plan for the future.
- Explanatory: explain what causes the data that have been observed.
- Descriptive: generalisations about the data that allow us to see order amidst chaos.

Experimental research: usability testing.

> The goal of usability testing is simply to find flaws in a specific interface. A small number of users may take part … it can be structured or unstructured. … there is no claim that the results can be generalised. The goal is simply to find flaws and help the developers improve the interface. If that involves jumping in and helping a user or changing the task mid-process, that is acceptable.

(Lazar, Feng and Hochheiser, Research Methods in HCI, 2010)

Experimental research: HCI research comes in 57 varieties. Observations, field studies, surveys, usability studies, interviews, focus groups and controlled experiments span a range from rich but not reproducible to reproducible and reductionist.
- Descriptive research: observations, which may be quantitative and accurate.
- Relational research: establishes correlations between factors, but does not establish causality. E.g. if typing speed is correlated with hours spent gaming, does time spent gaming improve typing, or are good typists successful gamers?
- Experimental research: can establish causality. Allocate users to two groups randomly, expose one group to games and the other not, then measure the typing ability of both groups after a suitable interval.

Experimental research: null and alternative hypotheses.
- H0 (null hypothesis): the new treatment (e.g. the widget) causes no effect: no change in speed and no change in user satisfaction.
- H1 (alternative hypothesis): the new treatment causes an effect: some change in speed and also some change in user satisfaction.

H0 and H1 are mutually exclusive, like the two ends of a seesaw. However, testing multiple hypotheses (here, speed and satisfaction) can complicate controls and variables. A good hypothesis:
- is clear and unambiguous
- clearly distinguishes between independent and dependent variables
- is testable in a single experiment
- clearly identifies control groups and the conditions of the experiment
- generally derives from preliminary observational studies

Each combination of independent-variable values is a condition.

Experimental research: independent and dependent variables. The independent variable is the "cause": variations in its value are under the experimenter's control. The dependent variable is the "effect": variations in its value are observed. For the null hypothesis "there is no speed change between the original widget and the new widget", the experimenter chooses the widget, so the widget is the independent variable; the experimenter measures speed, so speed is the dependent variable. If the results are plotted on a graph, the independent variable goes on the x-axis and the dependent variable on the y-axis.

Experimental research: typical variables.
- Typical independent variables: technology (typing vs. speech; mouse vs. joystick, touchpad, etc.); design (pull-down vs. pop-up menu, colour scheme, layout); demographic (gender, age, experience, education); context (lighting, noise, seated vs. standing, other people in the vicinity).
- Typical dependent variables: efficiency (time to complete a task, speed); accuracy (error rate); subjective satisfaction (Likert-scale ratings); ease of learning and retention rate (time to learn, loss after a week or a month); cognitive demand (time before onset of fatigue).

Experimental research: components of an experiment. Randomisation is often necessary.
- Treatments: the things we want to compare (cf. medical treatments). E.g. to compare two splines, A and B, for a CAD tool using a within-subjects design, measure time to complete a task with A, then with B. Flaw: subjects learn the task, so B looks best. Solution: randomise the order of tasks.
- Units: the "things" a treatment is applied to (normally human subjects). E.g. to compare two treatments using a between-subjects design, allocate subjects to treatment A until there are enough, then allocate subjects to treatment B. Flaw: A is applied to early birds, B to late sleepers. Solution: randomise allocation to the treatments.
- Assignment method: how subjects are assigned to treatments.

Experimental research: significance tests. This approach depends on being able to distinguish between an effect and no effect. How do we decide whether an effect is real? We measure the probability that a difference at least as large as the one observed would occur by chance if there were no real effect. If that probability is sufficiently low, we say that there is a significant effect.
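To make "the probability that a difference occurred by chance" concrete, here is a minimal Python sketch of an exact permutation test. The slides do not name this technique; it tests the same null hypothesis (no difference between groups) without assuming normally distributed data. The function name `permutation_p_value` is hypothetical, and the sample values are the two groups from the ANOVA worked example later in this transcript.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(a, b):
    """Exact two-sided permutation test for a difference between two means.

    If the null hypothesis is true, the group labels are arbitrary, so every
    split of the pooled data into groups of the original sizes is equally
    likely. The p-value is the fraction of splits whose absolute difference
    in means is at least as large as the difference actually observed.
    """
    pooled = list(a) + list(b)
    observed = abs(mean(a) - mean(b))
    splits = 0
    as_extreme = 0
    for chosen in combinations(range(len(pooled)), len(a)):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen]
        group_b = [x for i, x in enumerate(pooled) if i not in chosen_set]
        splits += 1
        # Small tolerance so floating-point rounding cannot hide a tie.
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-9:
            as_extreme += 1
    return as_extreme / splits


print(permutation_p_value([2, 3, 1], [6, 7, 5]))  # 0.1
```

With three observations per group there are only 20 possible splits, and the observed split and its mirror image always count as "at least as extreme", so 0.1 is the smallest p-value this test can give for groups of this size: a concrete illustration of why very small samples struggle to reach p < 0.05.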
A result of p < 0.05 (or p < 0.005) says that, if there were no real effect, behaviour at least as extreme as that observed would occur by chance less than 5% (0.5%) of the time. It does not mean that the probability that the effect is real exceeds 95% (99.5%). Whether the threshold is good enough depends on the application: for a new drug, a significance level of p < 0.05 is not good enough if the null hypothesis is "the standard dose is not fatal".

Experimental research: Type I and Type II errors.

| | Widget is no different | Widget is better |
|---|---|---|
| Study concludes widget is better | Type I (gullibility) error, aka "false positive" | correct |
| Study concludes widget is no better | correct | Type II (blindness) error, aka "false negative" |

- The probability of a Type I error is α, the significance level: we generally aim for p < 0.05, i.e. α = 0.05.
- The probability of a Type II error is β. The statistical power of a test, 1 − β, is the probability of correctly rejecting an incorrect null hypothesis, i.e. of finding an effect that does exist.
- α and β are related: the less gullible you are, the more likely you are to be blind to improvements. Keep β low by using large sample sizes.

Experimental research: limitations of experimental research. Controlled experiments are a very powerful technique, but the hypothesis must be well defined, and the number of variables must be limited and preferably orthogonal. HCI problems can be difficult to define, and many interrelated factors may be involved. Factors other than the independent variables may affect the dependent variables: e.g. it is difficult to factor out familiarity with technology in an age-related study. Either prescreen subjects to ensure homogeneity between groups, or use statistical techniques designed to filter out confounding factors (analysis of covariance). Also, subjects' behaviour in a lab differs from their behaviour in the real world.

Experimental design: true experiments. A true experiment has:
- a testable hypothesis: "<x> is an intuitive interface" is not testable; "subjects will be able to use <x> correctly in under 1 minute" is closer, but all of them? 50%? More than 75%?
- at least two conditions (one treatment, one control), sometimes more; without a control group it is a non-experiment
- random assignment of subjects; without it, it is a quasi-experiment (e.g. it is not ethical to randomly assign children to parents to study the effect of single-parent upbringing)
- quantitative measurements
- significance tests
- attention to bias elimination
- replicability

Experimental design: other types of experiment.
- Quasi-experiments (subjects not randomly assigned) may be necessary for practical or ethical reasons. They can still produce useful results but are more susceptible to confounding factors.
- Non-experiments (no control group) arise when there are insufficient subjects (use what's available) or the researcher lacks influence (e.g. over a modified Word interface). They may be necessary for practical or ethical reasons and can still produce useful results, but are even more susceptible to confounding factors. E.g. usability trials, where the aim is to detect problems.
- Formal experiments are designed to detect subtle effects and to factor out researcher bias. However, a researcher's specialist knowledge may trump the population's preferences (e.g. user surveys for Xerox showed little demand for such a device).

Is demonstrating that it is possible to build something a valid experiment?
Engineering research often stops at this point.

Experimental design: important considerations.
- Number of independent variables. Hypothesis: "There is no difference between target selection speed when using a mouse, a joystick, or a trackball to select icons of different sizes (small, medium, large)." How many independent variables? Two: type of pointing device and icon size.
- Number of conditions: 3 × 3 = 9.
- Number of dependent variables: 1 (target selection speed).
- Measurement may need careful thought: e.g. is typing speed measured in wpm or error-free wpm? Is a speech recogniser's error rate definitive?

Experimental design: structure of an experiment. A basic design has one independent variable; a factorial design has more than one.
- Within-group design: only one group, but subjects experience multiple conditions. This eliminates individual differences and requires a smaller population, so it suits small subject pools and tasks with large differences between individuals, e.g. cognitively complex tasks (inter-subject differences increase with complexity). However, learning and fatigue may cause effects: randomise the order of conditions and/or provide preliminary training tasks, or order tasks using a Latin square to factor out fatigue:

| Subject | 1st | 2nd | 3rd |
|---|---|---|---|
| Subj1 | 1 | 2 | 3 |
| Subj2 | 2 | 3 | 1 |
| Subj3 | 3 | 1 | 2 |

- Between-group design: one group per condition, and each subject experiences only one condition. There is no learning effect and less fatigue effect, but the design is susceptible to differences between groups, so it requires big, randomly selected groups. It suits cognitively simple tasks with no learning effect.
- Split-plot design: a mix of within-group and between-group. It suits studies where a subject difference is an independent variable, e.g. the effect of using GPS (binary, within-group) on three age groups (between-groups).

Experimental design: watch out for interaction effects. If the effect of one variable depends on the value of another, the variables interact: the variables are (and should be) independent, but their effects interact.

[Figure: task duration for a simple and a complex task in Office 2003 and Office 2007, contrasting genuinely independent variables with interacting ones.]

Experimental design: reliability of experimental results.
- Random errors: research involving human subjects is noisy; observed value = actual value + random error (noise). As sample size increases, the actual values add up while the noise tends to cancel, so its relative size tends to 0.
- Systematic errors are the same each time, so they are not cancelled by a large sample; they are more deleterious than noise.

Experimental design: systematic errors.
- Instrumentation errors: instruments (e.g. a stopwatch) can often be replaced with software.
- Experimental procedure:
  - Non-random task or condition order allows learning and fatigue (which produce opposite systematic errors) to have an effect; randomise conditions and tasks when using a within-group design.
  - Instructions may introduce errors: "complete the task as fast as possible" vs. "take your time, no rush" produced different results (subjects under time stress were slower!), and instructions from different members of a research team may differ. Use identical instructions for all participants, written or recorded.
  - Trivial details matter: for data entry on a PDA, holding the PDA in hand produced different results from sitting it on a table.
  - Run pilot studies beforehand to detect potential biases: you don't want to realise half-way through the experiment that all the results are compromised because you have overlooked something.
  - Use real participants from the target population.
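The Latin-square ordering and randomised allocation described above can be sketched in Python as follows. The function names, the cyclic construction and the seeded `random.Random` are illustrative assumptions, not a procedure prescribed by the slides.

```python
import random


def cyclic_latin_square(n):
    """Return an n x n cyclic Latin square of condition numbers 1..n.

    Row i gives the order of conditions for participant (or participant
    group) i. Every condition appears once in each row and once in each
    column, so each condition is tested in each serial position equally often.
    """
    return [[(row + col) % n + 1 for col in range(n)] for row in range(n)]


def assign_orders(participants, n_conditions, seed=None):
    """Randomly allocate participants to the rows of a cyclic Latin square.

    Shuffling first means allocation does not depend on arrival order (the
    early-birds vs. late-sleepers flaw); cycling through the rows keeps the
    number of participants per order as equal as possible.
    """
    rng = random.Random(seed)
    square = cyclic_latin_square(n_conditions)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    return {p: square[i % n_conditions] for i, p in enumerate(shuffled)}


for row in cyclic_latin_square(3):
    print(row)  # prints [1, 2, 3], then [2, 3, 1], then [3, 1, 2]
```

The three rows reproduce the Subj1 to Subj3 orders in the table above. A cyclic square balances serial position but not which condition immediately follows which; if carry-over between adjacent conditions is a concern, a balanced (Williams) Latin square is the usual refinement.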
Experimental design: systematic errors from participants. Watch for age bias, education bias (particularly prevalent in university studies), and interest in the product (or its domain).
- Recruit a set of participants representative of the target population, which may be quite skewed, e.g. for elder-care systems.
- Don't stress the participants: explain that the system is under test, not them, and that any result they produce is good.
- Organise the schedule conservatively so participants aren't inconvenienced: it's polite, and it produces better results!
Experimental design: systematic errors from experimenter behaviour.
- Express no opinion about the system and maintain noncommittal body language.
- Be ready to start on time.
- Use the same experimenter each time, if possible, or a recorded protocol; if multiple experimenters are necessary, require them to follow a written experimental protocol.

Experimental design: systematic errors from environmental factors. These include the physical environment (noise, temperature, humidity, lighting, vibration) and the social environment (people nearby, power relationships between participants and people nearby, interruptions).
- Provide a quiet room, suitable lighting, comfortable furniture and a non-distracting environment.
- Observe by CCTV or from behind a one-way mirror, if possible.
- For field studies, visit the location beforehand to check for problems.

Experimental design: experimental procedures. The overall research process:
1. Identify a research hypothesis
2. Design the study
3. Run a pilot
4. Recruit participants
5. Run data collection sessions
6. Analyse the data
7. Report the conclusions

Each data collection session:
1. Set up the experimental environment/equipment
2. Greet participants
3. Outline the purpose of the study and the procedures
4. Obtain participants' consent
5. Assign participants to an experimental condition
6. Participants complete a pre-survey (if any)
7. Participants complete a training task
8. Participants complete the survey task
9. Participants complete a post-survey (if any)
10. Debrief (this can be more useful than a formal survey)

Analysing the data: there are many analytical tools, including the independent-samples t-test, the paired-samples t-test (e.g. for before/after tests), one-way ANOVA, factorial ANOVA, repeated-measures ANOVA, correlation, regression and chi-square.

Analysing the data: data preparation.
- Error checking and correction: e.g. incorrect grouping of survey forms, computing experience greater than age, or an impossible age. Paper forms need checking; survey software (or Excel) could check at data collection time, e.g. when age is entered as "23½", "23 years and 7 months" or "nearly twenty-four". If data can't be corrected (e.g. because subjects are anonymous), it may need to be thrown away.
- Pre-processing: coding text as numbers (e.g. 1, 2, 3 for no degree, bachelor's, postgraduate); extracting general themes from individual interviews; coding interaction events (e.g. click(100, 250) → "select book"). Consistency may need to be verified if there is more than one coder.
- Restructuring: related information may be spread across different surveys (pre- and post-trial, for example), and analysis software requires specific formatting. E.g. SPSS independent-samples and paired-samples t-tests use the same data, laid out in one column and in two parallel columns respectively.

Analysing the data: start with an exploratory analysis to get a feel for the data, using box-and-whisker plots (good for an initial comparison between groups) and histograms.
- Measures of central tendency: mean; median (the 50th percentile, good for skewed distributions, e.g. Pareto, and when outliers may be errors); mode (the most popular value).
- Measures of spread: range, data_max − data_min (crude: likely to increase with sample size, and sensitive to outliers); variance, which estimates spread around the mean and is sign-independent but has different units from the samples; and standard deviation, which has the same units as the samples and measures the root-mean-square deviation of the samples from their mean:

$$ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}, \qquad s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} $$

More complex measures of spread assume a normal distribution, so it may be necessary to modify (transform) the data to conform.

Analysing the data: mean differences. A difference between means indicates a difference between treatments, whether two groups perform the same task (e.g. comparing two search engines) or one group performs two tasks. Significance tests are necessary to determine the probability that a difference is due to chance.

[Figure: two pairs of distributions, each with a difference between means of Δ = 5, captioned "are these means different?"]
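The measures of central tendency and spread above can be computed with Python's standard library. This sketch computes the variance explicitly with the n − 1 divisor so it mirrors the formula; the `describe` helper is hypothetical, and the sample values are the two groups from the ANOVA worked example later in this transcript.

```python
from statistics import median, multimode


def describe(samples):
    """Exploratory summary: central tendency and spread of one sample."""
    xs = list(samples)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two samples to estimate spread")
    x_bar = sum(xs) / n
    # Sample variance: squared deviations from the mean, divided by n - 1.
    variance = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    return {
        "mean": x_bar,
        "median": median(xs),        # 50th percentile
        "modes": multimode(xs),      # most popular value(s)
        "range": max(xs) - min(xs),  # crude: tends to grow with sample size
        "variance": variance,        # squared units of the samples
        "sd": variance ** 0.5,       # same units as the samples
    }


group_1, group_2 = [2, 3, 1], [6, 7, 5]
print(describe(group_1)["variance"], describe(group_2)["variance"])  # 1.0 1.0
print(describe(group_1)["mean"] - describe(group_2)["mean"])         # -4.0
```

The mean difference on its own says nothing about significance; the tests in the table that follows weigh it against the spread within each group.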
Which test to use depends on the design:

| Design | IVs | Conditions per IV | Test |
|---|---|---|---|
| between-groups | 1 | 2 | independent-samples t-test |
| between-groups | 1 | ≥3 | one-way ANOVA |
| between-groups | ≥2 | ≥2 | factorial ANOVA |
| within-group | 1 | 2 | paired-samples t-test |
| within-group | 1 | ≥3 | repeated-measures ANOVA |
| within-group | ≥2 | ≥2 | repeated-measures ANOVA |
| mixed | ≥2 | ≥2 | split-plot ANOVA |

Analysing the data: to compare 2 means, use a t-test. Null hypothesis: task completion times for subjects using word-prediction software do not differ from task completion times for subjects who do not use the software.

$$ t = \frac{\text{signal}}{\text{noise}} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} $$

where s² is the variance of each group. For equal group sizes this is the standard (pooled-variance) independent-samples t statistic. We generally say there's a significant effect if p ≤ 0.05 (the significance level α). However, the significance of a particular t depends on the size of the subject groups, specifically on the degrees of freedom, df = total participants − number of groups = n₁ + n₂ − 2. Consult published tables showing p for particular (t, df) combinations; statistical software usually outputs p from built-in tables.

- For unrelated samples, use an independent-samples t-test.
- For a single group measured under both conditions, use a paired-samples t-test.
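The two t statistics described above can be sketched with Python's standard library as below. This is an illustrative implementation under stated assumptions: the function names are hypothetical, the independent test uses the pooled-variance form, and only t and df are returned, leaving the p-value to tables or statistical software as the slide suggests.

```python
from math import sqrt
from statistics import mean, stdev, variance


def independent_t(a, b):
    """Student's independent-samples t statistic and its degrees of freedom.

    Uses the pooled variance; when n1 == n2 this equals the
    sqrt(s1^2/n1 + s2^2/n2) denominator given above.
    """
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t, df


def paired_t(first, second):
    """Paired-samples t: a one-sample t-test on each subject's difference."""
    diffs = [x - y for x, y in zip(first, second)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


t, df = independent_t([2, 3, 1], [6, 7, 5])
print(round(t, 3), df, round(t * t, 3))  # -4.899 4 24.0
```

For the two groups in the ANOVA worked example, t² comes out at 24, the same value as that example's F, which illustrates that F = t² when only two groups are compared.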
In SPSS, data for an independent-samples t-test comprise the times plus a group-membership column (e.g. times for the group using word-prediction software and times for the group using conventional software); data for a paired-samples t-test comprise each subject's time with the software and time without it, in parallel columns. A high t value corresponds to a low p-value, i.e. stronger evidence that the null hypothesis is false.

Analysing the data: what if the hypothesis predicts the sign of the difference? If we know that the sign of the effect will be + or −, instruct the analysis software to use a one-tailed t-test. For the same t, the one-tailed p is half the two-tailed p, so a one-tailed test at α = 0.05 uses the same critical t as a two-tailed test at α = 0.1. Do NOT switch to a one-tailed t-test because a two-tailed test indicates no significance: the test should be hypothesis-driven, not data-driven!

Analysing the data: ANOVA compares within-group variances with the population variance. Null hypothesis: sample sets A, B, C and D belong to one population. If the spread of the means of sample sets A to D is greater than the combined population variability would predict, there is more than one population:

$$ F = \frac{\text{found variation in averages}}{\text{expected variation in averages}} $$

F ≈ 1 is consistent with the null hypothesis. Each group's variability is summarised by its sum of squares, $SS_j = \sum_i (x_{ji} - \bar{x}_j)^2$.

The within-groups variability is also called the error variance; the variability due to differences between means is called the effect. If the effect variance is large with respect to the error variance, the treated and untreated groups act as different populations (the treatment has an effect). A worked example, with much larger differences between means than in the diagram:

| | Group 1 | Group 2 |
|---|---|---|
| Observation 1 | 2 | 6 |
| Observation 2 | 3 | 7 |
| Observation 3 | 1 | 5 |
| Mean | 2 | 6 |
| Sum of squares (SS) | 2 | 2 |

The overall mean is 4 and the total sum of squares is 28. ANOVA determines p taking df into account:

| Source | SS | df | MS | F | p |
|---|---|---|---|---|---|
| Effect (main effect) | 24.0 | 1 | 24.0 | 24.0 | .008 |
| Error | 4.0 | 4 | 1.0 | | |

Analysing the data: use ANOVA (aka the F-test) to compare the means of ≥ 2 groups; F is the parameter actually generated by the calculation (cf. t in the t-test). We've already seen the special case of ANOVA for comparing 2 means: the t-test (F = t²).
- One-way ANOVA: between-group, 1 IV, ≥3 conditions.
- Factorial ANOVA: between-group, ≥2 IVs.
- Repeated-measures ANOVA: within-group.
- Split-plot ANOVA: between-group and within-group.

One-way ANOVA example. The SPSS data input, pared down to the minimum, is one column of task durations (245, 236, 321, … for standard text entry, the control group; 246, 213, 265, … for text prediction; 178, 289, 222, … for dictation) and one column coding the condition (0, 1, 2). SPSS output from the analysis:

| | Sum of squares | df | Mean square | F | Significance |
|---|---|---|---|---|---|
| Between groups | 7842.250 | 2 | 3921.125 | 2.174 | 0.139 |
| Within groups | 37880.375 | 21 | 1803.827 | | |
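The partition of variability in the worked example can be reproduced with a short Python sketch. The function name and its return layout are assumptions made for illustration; the p-value (.008 in the example) is left to tables or statistical software, as on the slides.

```python
from statistics import mean


def one_way_anova(*groups):
    """Partition variability for a one-way between-groups ANOVA.

    Returns (SS_effect, SS_error, df_effect, df_error, F), where
    F = MS_effect / MS_error and each MS is an SS divided by its df.
    """
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    group_means = [mean(g) for g in groups]
    # Effect (between-groups): distance of each group mean from the grand mean.
    ss_effect = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Error (within-groups): spread of each observation around its group mean.
    ss_error = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_effect = len(groups) - 1
    df_error = len(values) - len(groups)
    f = (ss_effect / df_effect) / (ss_error / df_error)
    return ss_effect, ss_error, df_effect, df_error, f


ss_effect, ss_error, df_effect, df_error, f = one_way_anova([2, 3, 1], [6, 7, 5])
print(f"F({df_effect}, {df_error}) = {f:.1f}; SS effect {ss_effect:.1f}, error {ss_error:.1f}")
# F(1, 4) = 24.0; SS effect 24.0, error 4.0
```

The same ratio applied to the SPSS sums of squares from the text-entry example gives (7842.250 / 2) / (37880.375 / 21) ≈ 2.174, the F value reported there.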
The calculation produces the F value; its significance, obtained by table lookup, is 0.139, which is greater than 0.05, so the result is not significant. The same calculation also produces two other sets of results, which aren't relevant here. With 24 samples in 3 groups, the within-groups df = 21. So, how would we summarise this in a thesis or report?

> A one-way ANOVA with text-entry method as the independent variable and task completion time as the dependent variable suggests there is no significant difference between the three conditions (F(2, 21) = 2.174, n.s.).

Analysing the data: factorial ANOVA (between-group, ≥2 IVs). Q: does the nature of the task (composition or transcription) affect performance?
The SPSS function is called Univariate analysis. Each row holds one task time with a task-type code (0 or 1) and a data-entry-method code (0 = standard, 1 = predictive, 2 = dictation), giving six groups:

| Task times | Task type | Data entry method |
|---|---|---|
| 245, 236, … | 0 | 0 (standard) |
| 246, 213, … | 0 | 1 (predictive) |
| 178, 289, … | 0 | 2 (dictation) |
| 256, 269, … | 1 | 0 (standard) |
| 265, 232, … | 1 | 1 (predictive) |
| 189, 321, … | 1 | 2 (dictation) |

SPSS output:

| Source | Sum of squares | df | Mean square | F | Significance |
|---|---|---|---|---|---|
| Task type | 2745.188 | 1 | 2745.188 | 1.410 | 0.242 |
| Entry method | 17564.625 | 2 | 8782.313 | 4.512 | 0.017 |
| Interaction (task × entry) | 114.875 | 2 | 57.437 | 0.030 | 0.971 |
| Error | 81751.625 | 42 | 1946.467 | | |

- Task type caused no significant effect: F(1, 42) = 1.41, n.s.
- Entry method had a significant effect: F(2, 42) = 4.51, p < 0.05.
- There is no significant interaction between task and entry method.

Analysing the data: use repeated-measures ANOVA for within-group studies. The previous between-groups design requires lots of participants (72, if there are 12 subjects per group). What about a within-groups design?
This is especially useful if only some people are eligible, e.g. disabled users.
- To study the effect of 1 IV, use a one-way repeated-measures ANOVA, with the 3 data points from each participant all in the same row (the slide's example values are 245, 236, 321, 246, 213, 265, 278, 289, 222).
- To study the effect of more than 1 IV, use a multi-level repeated-measures ANOVA; for a 3 × 2 factorial study there are 6 data points per participant per row:

| | Transcription: standard | Transcription: predictive | Transcription: dictation | Composition: standard | Composition: predictive | Composition: dictation |
|---|---|---|---|---|---|---|
| participant1 | 245 | 246 | 178 | 256 | 265 | 189 |
| participant2 | 236 | 213 | 289 | 269 | 232 | 321 |

A within-groups design is faster, involves less fatigue, can control for learning, and needs a smaller sample.

Analysing the data: assumptions of t tests and F tests.
- No systematic errors: e.g. with different instructors giving different sets of instructions, correlation between the errors of participants in each instructor's group will systematically skew results.
- Homogeneity of variance (identical distribution of errors): the populations should have comparable variances. [Figure: two distributions with means x̄₁ and x̄₂ and very different spreads, asking "do these distributions have significantly different means?"] It is not easy to say, either for people or for software.
- Normal distribution of errors: this may be violated if the data are highly skewed (a non-normal distribution).

Analysing the data: use Pearson's r to identify correlations. Is factor a related to factor b?
Determine Pearson's product-moment correlation coefficient (r). r varies from −1 to 1: −1 is a perfect negative linear relationship, 0 no linear relationship, and +1 a perfect positive linear relationship. For example, with data such as the following (first rows shown), we can determine r values for experience × time with standard software, experience × time with predictive software, and standard × predictive:

| Computer experience | Time with standard software | Time with predictive software |
|---|---|---|
| 12 | 245 | 246 |
| 6 | 236 | 213 |
| 3 | 321 | 265 |
| 19 | 212 | 189 |
| … | … | … |

| Pair | r | Significance |
|---|---|---|
| experience, time_std | −0.723 | 0.043 |
| experience, time_pred | −0.468 | 0.243 |
| time_std, time_pred | 0.325 | 0.432 |

(experience, time_std) has a significant negative correlation: time with standard software decreases with computer experience. There are no other significant correlations.

Analysing the data: interpreting Pearson's r. r² represents the proportion of variance in X that can be explained by variable Y (equivalently, of variance in Y explained by X). But beware: correlation does not imply causation. E.g. there is a negative correlation between income and speed of internet search. Does earning more make you worse at using the internet? Or does higher income imply greater age and less familiarity with the internet (higher income ↔ greater age → less internet experience → lower performance)?
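Pearson's r, r², and the t statistic conventionally used to test whether r differs from zero can be sketched in Python as follows. The function names are hypothetical, and the sample size of eight used below is an assumption: only four rows of the data appear in the transcript.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of paired samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def r_to_t(r, n):
    """t statistic (df = n - 2) for testing whether r differs from zero."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


r = -0.723  # experience vs. time with standard software, from the table above
print(round(r * r, 3), round(r_to_t(r, 8), 2))  # 0.523 -2.56
```

r² ≈ 0.52 means roughly half the variance in time with standard software is accounted for by computer experience. Under the assumed n = 8, t ≈ −2.56 on 6 df lies between the two-tailed 0.05 and 0.02 critical values in standard t tables, consistent with the reported significance of 0.043. None of this says which way causation runs.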