Verification of Data Driven Web Applications

Download Report

Transcript Verification of Data Driven Web Applications

CS 290C: Formal Models for Web Software
Lecture 18: Verification of Data Driven Web
Applications
Instructor: Tevfik Bultan
WAVE: A Verification Tool
• Wave is a verification tool for web applications
• Focuses on data driven web applications
– Such as the web applications targeted by WebML
• Wave is capable of verifying temporal properties of WebML
style web application specifications
– Temporal properties are specified using a variant of LTL
– Web application is specified using a core query
language
Interactive, data-driven web applications
• Web application generates web pages dynamically by
sending queries to a backend database
• Web application receives input from the user
– It responds by taking some action, updating its internal
state and generating a new web page determined by a
query
– In Wave queries are specified in FO: first order queries
• FO is an abstraction of the data manipulation core of
SQL
• A run is a sequence of inputs together with the web pages,
states and actions generated by the web application
Infinite State Verification
• Data driven web applications interact with a back-end
database
• Since the possible configurations of the back-end database
is unbounded, verification of data-driven web applications is
an infinite state verification problem
• Most infinite state verification problems are undecidable
• There are two main approaches to handle this:
– Perform unsound verification by restricting the state
space (for example like in Alloy)
• This approach might miss bugs (false negative)
– Perform sound verification by mapping the infinite state
space to a finite abstraction
• This approach might generate false alarms (false
positive)
Infinite State Verification
• Wave uses a different approach
– It focuses on a restricted class of systems and
implements a sound and complete verification technique
for that restricted class
– For the specifications that are not in that class, Wave
can still be used as an unsound verification tool
• The restricted class of systems analyzed by Wave are
called input-bounded systems
– The range of quantifications in the queries used to
specify the system are restricted to the input values
– Example: for all x, for all y [pay(x,y) => price(x,y)]
where pay(x,y) is an input and price is a database
relation
Verification with Wave
• Wave takes a web application specification WA and a
property p as input
• The property p is specified using LTL-F0
– LTL-FO is an extension of LTL that allows specification
of data related properties
• Verification approach
– Explicitly specify the tuples in the database that use only
a small set of relevant constants C computed from WA
and p. This is called the core of the database and
remains unchanged throughout the run
– At each step in the run, make additional assumptions
about the content of the database, needed to determine
the next possible configurations. The assumptions
involve only a small set of additional values
Formal Model
• Wave uses the following formal model for web applications
• Each web application WA consists of
– a finite set of web page schemas, one of which is
designated to be the home page
– A database relational schema D
– A state relational schema S
• Each web page schema W specifies the following:
– The types of inputs accepted by W
– State update rules specifying the tuples to be inserted or
deleted from the state relations of S
– Actions taken in response to user input
– Target web page rules
Input schemas
• Each web page W has an input schema identifying the
types of inputs accepted by W
– Inputs can be either text inputs or a choice from an
option list (to model pull-down menus, radio buttons,
scroll-down lists etc.)
• Input is specified as an input schema containing constants
(corresponding to text input, which is undefined until
entered by the user) and relations (representing input
option lists)
• For each input relation R the options generated by the web
page is defined as an input rule
OptionsR(x) <- q(x) where q is a query on the database,
state relations and inputs provided by the user at the
previous step
Rules for States, Actions and Target pages
• These specify the tuples to be inserted or deleted from the
state relations of S
– Insertions: S(x) <- q(x)
– Deletions: !S(x) <- q(x)
• Actions taken in response to user input
– Such as sending an e-mail, an invoice, etc.
– Specified as insertions to actions relations, like the state
insertion rules
• Target web page rules
– Specify for each possible target page the condition
under which the transition occurs
– The conditions are queries to the database, current state
and current or previous user inputs
Semantics
• A web application WA produces a sequence of pages in
response to user inputs, starting at the home page
• Transitions occur as follows:
– Each web page generates input options corresponding
to its input schema
– User responds by making one choice for each input
relation and providing values for the input constants
– Then the web application takes the actions defined by
the action rules, updates the state based on the state
insertion and deletion rules and moves to the web page
based on the target rules
Semantics
• The content of the database, state relations, current web
page current input choices and computed actions form a
configuration of WA
• A run over a database instance D is a sequence of
configurations C0, C1, C2, … where C0 is the initial
configuration of he home page
• The database does not change during a run
An Example
• An e-commerce website for online computer shopping
• The web application consists of a set of pages:
– HP: home page
– RP: new user registration page
– CP: customer page
– LSP: laptop search page
– PIP: displays products returned by he search
– CC: allows the user to view the cart contents and order
items in it
Specification for LSP
Page LSP
Inputs: laptopsearch(ram,hdisk,display), button(x)
Input Rules:
Optionsbutton(x) <- x = “search” or x=“new-cart” or x=“logout”
Optionslaptopsearch(r,h,d) <- criteria(“laptop”, “ram”, r) and
criteria(“laptop”,”hdd”, h) and criteria(“laptop”, “display”, d)
State Rules:
userchoice(r,h,d) <- laptopsearch(r,h,d) and button(“search”)
Target Web Page Rules:
HP <- button(“logout”)
PIP <- exists r, h, d laptopsearch(r,h,d) and button(“search”)
C <- button(“view cart”)
End Page
Specification of the properties
• Properties are specified in LTL-FO using temporal
operators G, F, X, U, B where
– p B q holds if either q never holds or it eventually does
and p must hold sometime before q becomes true
• An example LTL-FO formula:
for all x, y, id ( pay(id,x,y) and price(x,y) B ship(id,x) )
• Input bounded LTL-FO formulas: all quantified variables
range over values from user inputs
– Given an input bounded web application WA and an
input-bounded LTL-FO formula f, it is decidable to
determine if WA satisfies f
Finite State LTL model checking
• Given an LTL property one can construct a Buchi
automaton (an automaton that accepts infinite runs) that
accepts precisely the sequences that satisfy the LTL
property
• Finite state LTL model checking algorithm (used in Spin)
– Given a transition system and a LTL property, construct
the Buchi automaton for the negated LTL property
– Search for accepting runs in the product automaton that
corresponds to the product of the transition system and
the negation of the input property. If such a run exists
then the property is violated and the accepting run is the
counter example
– An accepting run is always an accepting cycle reachable
from an initial state. Search for an accepting run can be
done using a nested depth first search
Finite State LTL Model Checking
• Finite state LTL model checking works because during the
nested depth first search we eventually run out of new
states to visit
– There are only finitely many states to visit
• We either find a counter-example path and conclude
that the property is violated or
• When there are no more new states left to visit we
conclude that there are no counter-example paths,
hence the property is satisfied
• This approach does not work for an infinite state system
since we are not able to visit all states
– For the web application verification problem, the issue is
that there are an infinite number of databases that
should be searched
The Verification Approach in Wave
• An earlier result by the authors show that to verify a web
application WA with respect to a LTL-FO property f it is
sufficient to inspect only a finitely many runs
– These are runs on a finite set of databases constructed
over a domain which depends only on the specification
and the property
• Then, it is possible to use a nested-depth first search
algorithm
– For each representative database, execute a nesteddepth first search
– Note that the only nondeterminism in a web application
model is due to user input, so the search algorithm can
try all possible user inputs
The Verification Approach in Wave
• The number of representative databases to be investigated
is exponential in the size of the web application
specification and the property
• Since we also have to search for all combinations of user
choices, the resulting algorithm ends up being doubleexponential
• The first optimization of the algorithm constructs the
representative database lazily
– at each step in the run only construct the portions of he
database that is required for evaluating the rules and the
property
• The second optimization prunes irrelevant configurations
based on a dataflow analysis
• When the two optimization are combined the verification
takes a few seconds
Further Optimizations
• Representing database configurations efficiently using
bitmaps
• Detecting the visited configurations efficiently using a trie
data structure (linear time update and membership tests in
the size of the bitmap)
• The database configurations are stored in a database using
a DBMS for efficient evaluation of the schema rules
• Efficient translation between the bitmap representation and
the database tables
• Efficient evaluation of FO queries in page schema rules
• Efficient evaluation of FO property components
• Using an efficient DBMS
Experiments
• They experimented with the following representative web
applications
– An online computer shopping application (based on Dell
web site)
– A specification of a sports website (based on Motorcycle
Grand Prix web site)
– An airline reservation site (based on Expedia web site)
– A specification of a book shopping applications (based
on Barnes & Noble web site)
Property Categories
•
•
•
•
•
•
•
•
•
•
Sequence: p B q
Session: Gp => Gq
Correlation: Fp => Fq
Response: p => F q
Reachability: G p or F q
Progress: G(F p)
Strong non-progress: F(G p)
Weak non-progress: G(p => X p)
Guarantee: F p
Invariance: G p
Example properties
• F HP: The home page is eventually reached in all runs
– This is trivially true since all runs start at the home page
• Any confirmed product was previously paid for
• An order must have status ordered before it can be
cancelled
• If a product is added into the cart, then the user must
eventually view the details of this product
• Every run mus reach the error page EP and be trapped
there forever
– This property should not hold
• Wave verifies these properties (and others)
– The maximum verification time is 4 seconds
Wave Tool
• The tool consists of four modules
– Specification module: the input can either be a text file in
the input language of Wave tool or it can be a WebML
specification that can be translated to a Wave
specification using a WebML import module
– Verification module: Implements the verification
algorithm
– Explanation module: Prints out the counter-example runs
in an informative way
– Code generation module: Automatically generates JSP
pages from the input specification
Extensions
• The authors extended the verification approach used in
Wave to verification of compositions of web services
• Message based asynchronous communication
– Flat & nested messages
– Rules can refer to messages currently read and sent
• They consider asynchronous communication with FIFO
channels
– In order to obtain decidability they have to bound the
channel sizes
• They also investigated modular verification of service
compositions where only a subset of the peers are
analyzed