Empirical Usability Testing in a Component-Based Environment:
Improving Test Efficiency with Component-Specific Usability Measures
Willem-Paul Brinkman
Brunel University, London
Reinder Haakma
Philips Research Laboratories Eindhoven
Don Bouwhuis
Eindhoven University of Technology
Topics
• Research Motivation
• Testing Method
• Experimental Evaluation of the Testing Method
• Conclusions
Research Motivation
Studying the usability of a system
• External comparison: relating differences in usability to differences in the systems
• Internal comparison: trying to link usability problems with parts of the systems
Component-Based Software Engineering
• Multiple versions testing paradigm (external comparison)
• Single version testing paradigm (internal comparison)
[Figure: component-based development cycle. Labels: Create, Manage, Support, Re-use; product requirements and existing software components from repository; new components; products; feedback.]
Research Motivation
PROBLEM
1. Empirical analysis of the overall system only (task time, keystrokes, questionnaires, etc.): not powerful
2. Usability tests, heuristic evaluations, and cognitive walkthroughs in which experts identify problems: unreliable
SOLUTION
• Component-specific usability measures: more powerful and reliable
Testing Method
Procedure
• Normal procedures of a usability test
• User task which requires interaction with the components under investigation
• Users must complete the task successfully
Component-specific component measures
• Perceived ease-of-use
• Perceived satisfaction
• Objective performance
A component-specific questionnaire helps users to remember their interaction experience with a particular component.
Perceived ease-of-use
Perceived Usefulness and Ease-of-Use questionnaire (Davis, 1989), 6 questions, e.g.:
• Learning to operate [name] would be easy for me.
• I would find it easy to get [name] to do what I want it to do.
Answer scale: Unlikely ... Likely
Perceived satisfaction
Post-Study System Usability Questionnaire (Lewis, 1995), e.g.:
• The interface of [name] was pleasant.
• I like using the interface of [name].
Answer scale: Strongly disagree ... Strongly agree
Objective performance
Number of messages received directly, or indirectly from lower-level components.
The effort users put into the interaction.
Control loop: each message is a cycle of the control loop.
[Figure: control process and component]
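The objective measure can be illustrated with a small sketch. It assumes a hypothetical interaction log in which each entry records the sender and the receiver of one message; the log format and the example entries are invented for illustration and are not taken from the study.

```python
from collections import Counter

# Hypothetical interaction log: one (sender, receiver) pair per message.
# The log format is an assumption; the entries are made-up examples.
log = [
    ("User", "Keypad"),
    ("User", "Keypad"),
    ("Keypad", "Send Text Message"),  # a letter passed up to the higher layer
    ("User", "Keypad"),
    ("Keypad", "Send Text Message"),
    ("User", "Function Selector"),
]


def messages_received(entries):
    """Count how many messages each component received."""
    return Counter(receiver for _sender, receiver in entries)


counts = messages_received(log)
print(counts["Keypad"])             # 3
print(counts["Send Text Message"])  # 2
```

In this reading, a higher count for one version of a component suggests that users needed more control-loop cycles to reach the same goal.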
Architectural Element
Interaction component
Elementary unit of an interactive system, on which behavioural-based evaluation is possible.
A unit within an application that can be represented as a finite state machine which directly, or indirectly via other components, receives signals from the user.
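A minimal sketch of such a unit, assuming a toy editor component with two hypothetical states (`empty`, `has_digits`) and a counter for received messages. The class, state, and message names are invented for illustration.

```python
class EditorComponent:
    """Toy interaction component modelled as a finite state machine."""

    # (state, message kind) -> next state; unlisted pairs leave the state unchanged.
    TRANSITIONS = {
        ("empty", "digit"): "has_digits",
        ("has_digits", "digit"): "has_digits",
        ("has_digits", "clear"): "empty",
    }

    def __init__(self):
        self.state = "empty"
        self.buffer = ""
        self.messages_received = 0

    def receive(self, kind, value=""):
        """Handle one user message and return the state after it."""
        self.messages_received += 1
        next_state = self.TRANSITIONS.get((self.state, kind), self.state)
        if kind == "digit":
            self.buffer += value
        elif kind == "clear" and next_state == "empty":
            self.buffer = ""
        self.state = next_state
        return self.state


editor = EditorComponent()
for kind, value in [("digit", "1"), ("digit", "5"), ("clear", ""), ("digit", "2")]:
    editor.receive(kind, value)
print(editor.state, editor.buffer, editor.messages_received)  # has_digits 2 4
```

Because the state is explicit, every received message can be logged against the component that handled it, which is what makes a per-component message count possible.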
Users must be able to perceive or infer the state of the interaction component.
Example of suitable agent models: PAC, MVC, Interactor (CNUCE model)
[Figure: agent diagrams labelled A, C, P (PAC) and M, V, C (MVC)]
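Interaction layers
[Figure: a calculator shown as layered interaction components between User and Calculator. An Editor controls the equation (15, 23, +, =; also shown: 15 ++ 23); a Processor controls the results (Add: 01111, 10111, 100110); the display shows 15 + 23 = 38.]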
Control Loop
[Figure: control loop. Labels: Reference value, User, Evaluation, User message, Feedback, Component, System.]
[Figure: Lower Level Control Loop between User and Calculator]
[Figure: Higher Level Control Loop between User and Calculator]
Experimental Evaluation of the Testing Method
• 80 users
• 8 mobile telephones
• 3 components were manipulated according to Cognitive Complexity Theory (Kieras & Polson, 1985):
1. Function Selector
2. Keypad
3. Short Text Messages
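One way to read these numbers, offered as an assumption rather than something the slides state: if each of the three components comes in two versions (the version labels appear on the evaluation-study and results slides), a full crossing gives 2 × 2 × 2 = 8 telephones, and an even split would put 10 of the 80 users on each.

```python
from itertools import product

# Version labels come from the evaluation-study and results slides.
# Crossing them fully and splitting users evenly are assumptions.
versions = {
    "Function Selector": ["Broad", "Deep"],
    "Keypad": ["RK", "MMP"],
    "Send Text Message": ["Simple", "Complex"],
}

# Each telephone is one combination of component versions.
phones = [dict(zip(versions, combo)) for combo in product(*versions.values())]
users_per_phone = 80 // len(phones)

print(len(phones))      # 8
print(users_per_phone)  # 10
```

The error degrees of freedom of 72 in the univariate results would be consistent with this reading (80 users minus 8 cells), but that is an inference, not a reported detail.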
Architecture Mobile telephone
[Figure: layered architecture of the mobile telephone. Components shown: Send Text Message, Read text message, Edit Address list, Read address list, Edit Diary, Read Diary, Call, Voice Mail, Stand-by, Main, Telephone Router, Function Selector, Keypad, Menu, Mode, Keyboard, Screen. Message labels shown: function request, Ok, Cancel, letter, number, cursor move, backspace key, function results, flow redirection, mode restriction, menu direction, characters, cursor position, STM menu icons, menu icons, mode symbol; 0..9 keys, * key, # key, Mode key, Backspace key; function keys, left key, right key, menu key, ok key, cancel key.]
Evaluation study – Function Selector
Versions: Broad/shallow, Narrow/deep
Evaluation study – Keypad
Versions: Repeated-Key method (illustrated with “L”), Modified-Modal-Position method (illustrated with “J”)
Evaluation study – Send Text Message
Versions: Simple, Complex
Statistical Tests
[Figure: axes labelled task time and number of keystrokes]
$\bar{x}$ = sample mean (estimator of $\mu$)
$s$ = estimate of the standard deviation ($\sigma$)
$s_{\bar{x}}$ = estimate of the standard error of the mean, $s_{\bar{x}}^2 = s^2 / n$
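As a worked example of these three quantities, the sketch below computes them with Python's standard library. The task times are invented for illustration and are not data from the study.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical task times in seconds for one group of users (invented data).
task_times = [812, 950, 1003, 877, 1120, 934, 990, 860]

n = len(task_times)
x_bar = mean(task_times)  # sample mean, estimator of mu
s = stdev(task_times)     # sample standard deviation (n - 1 denominator)
se = s / sqrt(n)          # standard error of the mean, so se**2 == s**2 / n

print(x_bar, round(s, 1), round(se, 1))
```

The standard error shrinks with the square root of the number of users, which is why larger samples make smaller usability differences detectable.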
Statistical Tests
p-value: probability of making a type I, or α, error: wrongly rejecting the hypothesis that the underlying distribution is the same.
Results – Function Selector
Results of two multivariate analyses and related univariate analyses of variance with the version of the Function Selector as independent between-subjects variable.

Measure                          Mean             df              F        p        η²
                                 Broad   Deep     Hyp.   Er.
Normal
  Joint measure                  -       -        7      66       34.47    <0.001   0.80
  Time in seconds                947     1394     1      72       29.56    <0.001   0.29
  Number of keystrokes           461     686      1      72       37.72    <0.001   0.34
  Number of messages received    67      265      1      72       155.34   <0.001   0.68
  Ease of use mobile phone       5.5     4.8      1      72       11.86    0.001    0.14
  Ease of use menu               5.6     4.5      1      72       22.33    <0.001   0.24
  Satisfaction of mobile phone   4.4     3.8      1      72       4.25     0.043    0.06
  Satisfaction of menu           4.6     3.5      1      72       15.96    <0.001   0.18
Corrected (a)
  Joint measure                  -       -        2      71       60.96    <0.001   0.63
  Number of keystrokes           437     602      1      72       20.27    <0.001   0.22
  Number of messages received    52      190      1      72       75.36    <0.001   0.51

(a) Corrected for all a-priori differences between versions of the components.
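The η² column appears to be partial eta squared. Under that assumption (the slides do not give the formula), the univariate values can be recovered from F and the degrees of freedom as η² = F·df_hyp / (F·df_hyp + df_err). The function name below is illustrative.

```python
def partial_eta_squared(f_value, df_hyp, df_err):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_hyp) / (f_value * df_hyp + df_err)


# Univariate rows from the Function Selector results: (F, df hyp., df error).
rows = {
    "Time in seconds": (29.56, 1, 72),
    "Number of keystrokes": (37.72, 1, 72),
    "Number of messages received": (155.34, 1, 72),
}
for measure, (f_value, df_hyp, df_err) in rows.items():
    print(measure, round(partial_eta_squared(f_value, df_hyp, df_err), 2))
```

For these three rows the sketch gives 0.29, 0.34, and 0.68, matching the reported values.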
Results – Keypad
Results of multivariate and related univariate analyses of variance with the version of the Keypad as independent between-subjects variable.

Measure                          Mean             df              F        p        η²
                                 RK      MMP      Hyp.   Er.
Normal
  Joint measure                  -       -        7      66       4.05     0.001    0.30
  Time in seconds                872     1083     1      72       9.44     0.003    0.12
  Number of keystrokes           438     537      1      72       10.34    0.002    0.13
  Number of messages received    233     271      1      72       13.92    <0.001   0.16
  Ease of use mobile phone       5.3     5.0      1      72       1.07     0.305    0.02
  Ease of use keyboard           5.6     4.9      1      72       11.13    0.001    0.13
  Satisfaction of mobile phone   4.3     3.9      1      72       1.76     0.188    0.02
  Satisfaction of keyboard       4.6     3.8      1      72       8.97     0.004    0.11
Results – Send Text Message
Results of two multivariate analyses and related univariate analyses of variance with the version of the STM component as independent between-subjects variable.

Measure                          Mean             df              F        p        η²
                                 Simple  Complex  Hyp.   Er.
Normal
  Joint measure                  -       -        7      66       18.16    <0.001   0.66
  Time in seconds                523     672      1      72       8.15     0.006    0.10
  Number of keystrokes           269     320      1      72       4.56     0.036    0.06
  Number of messages received    12      49       1      72       74.18    <0.001   0.51
  Ease of use mobile phone       5.0     5.3      1      72       1.15     0.288    0.02
  Ease of use STM function       5.1     4.9      1      72       0.35     0.555    0.01
  Satisfaction of mobile phone   3.9     4.2      1      72       0.93     0.339    0.01
  Satisfaction of STM function   3.9     3.8      1      72       0.26     0.614    0.01
Corrected (a)
  Joint measure                  -       -        2      71       20.85    <0.001   0.37
  Number of keystrokes           249     289      1      72       2.30     0.134    0.03
  Number of messages received    12      34       1      72       26.23    <0.001   0.27

(a) Corrected for all a-priori differences between versions of the components.
Power of component-specific measures
Type II, or β, error: failing to reject the hypothesis when it is false
Statistical power: 1 - β
Component-specific measures are less affected by usability problems users may or may not encounter with other parts of the system.
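To illustrate why a less noisy measure gives more power, here is a rough sketch using a normal approximation to a two-sided, two-sample test. The function name and all numbers are hypothetical, and the study's own power analysis may have been computed differently.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(mean_diff, sd, n_per_group, alpha=0.05):
    """Approximate power (1 - beta) of a two-sided two-sample z-test."""
    z = NormalDist()
    se = sd * sqrt(2.0 / n_per_group)  # standard error of the difference in means
    z_crit = z.inv_cdf(1 - alpha / 2)  # critical value for the chosen alpha
    shift = abs(mean_diff) / se        # true difference in standard-error units
    # Probability of landing in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Same true difference and sample size; only the spread of the measure differs.
noisy = approx_power(mean_diff=50, sd=100, n_per_group=20)
focused = approx_power(mean_diff=50, sd=50, n_per_group=20)
print(round(noisy, 2), round(focused, 2))
```

With the same true difference and sample size, halving the standard deviation raises the approximate power considerably, which is the argument made for component-specific measures.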
Results – Power Analysis
[Figure: power (0 to 1) plotted against number of subjects (0 to 80) for nine measures]
1. messages received
2. corrected messages received
3. task duration
4. keystrokes
5. corrected keystrokes
6. comp.-spec. ease-of-use
7. comp.-spec. satisfaction
8. overall ease-of-use
9. overall satisfaction
Average probability that a measure finds a significant (α = 0.05) effect for the usability difference between the two versions of the FS, STM, or Keypad components.
Conclusions
Component-specific measures can be used to test the difference in usability between different versions of an interaction component:
1. Objective performance measure: number of messages received directly, or indirectly via lower-level components
2. Subjective usability measures: ease-of-use and satisfaction questionnaires
Component-specific measures are potentially more powerful than overall usability measures.
Questions / Discussion
Thanks for your attention
Component-Based Interactive Systems
Layered Protocol Theory
(Taylor, 1988)
Reflection
Limitations
1. Different lower-level versions → different effort involved when sending a message
2. Usability of a component can affect the interaction users have with other components → overall measure more powerful?
3. Can instrumentation code be inserted?
Reflection
Other Evaluation Methods
1. Unit testing → lacks the context of a real task
2. Sequential Data Analysis → lacks direct link with higher layers
3. Not Event-Based Usability Evaluation → lacks direct link with component
Reflection
Exploitation of the Testing Method
1. Creation process → reducing the need to deal with a component each time it is deployed
2. Re-use process → still needs a final usability test
Testing Method
Aim: to evaluate the difference in usability between two or more versions of a component
