Tools for Refactoring Functional Programs

Download Report

Transcript Tools for Refactoring Functional Programs

Tools for Refactoring
Functional Programs
Simon Thompson
with
Huiqing Li
Claus Reinke
www.cs.kent.ac.uk/projects/refactor-fp
Design
Models
Prototypes
Design documents
Visible artifacts
LIL 2006
2
All in the code …
Functional programs
embody their design in
their code.
This is enabled by their
high-level nature:
constructs, types …
data Message
= Message Head Body
data Head
= Head Metadata Title
data Metadata
= Metadata [Tags]
type Title = String
…
LIL 2006
3
Evolution
Successful systems are
long lived …
… and evolve
continuously.
Supporting evolution
of code and design?
LIL 2006
4
Soft-Ware
There’s no single
correct design …
… different options for
different situations.
Maintain flexibility as
the system evolves.
LIL 2006
5
Refactoring
Refactoring means changing the design or
structure of a program … without changing
its behaviour.
Modify
LIL 2006
Refactor
6
Not just programming
Paper or presentation
moving sections about; amalgamate sections; move
inline code to a figure; animation; …
Proof
add lemma; remove, amalgamate hypotheses, …
Program
the topic of the lecture
LIL 2006
7
Splitting a function in two
LIL 2006
8
Splitting a function in two
LIL 2006
9
Splitting a function in two
LIL 2006
10
Splitting a function
module Split where
f :: [String] -> String
f ys = foldr (++) [] [ y++"\n" | y <- ys ]
LIL 2006
11
Splitting a function
module Split where
f :: [String] -> String
f ys = foldr (++) [] [ y++"\n" | y <- ys ]
LIL 2006
12
Splitting a function
module Split where
f :: [String] -> String
f ys = join [y ++ "\n" | y <- ys]
where
join = foldr (++) []
LIL 2006
13
Splitting a function
module Split where
f :: [String] -> String
f ys = join [y ++ "\n" | y <- ys]
where
join = foldr (++) []
LIL 2006
14
Splitting a function
module Split where
f :: [String] -> String
f ys = join addNL
where
join zs = foldr (++) [] zs
addNL = [y ++ "\n" | y <- ys]
LIL 2006
15
Splitting a function
module Split where
f :: [String] -> String
f ys = join addNL
where
join zs = foldr (++) [] zs
addNL = [y ++ "\n" | y <- ys]
LIL 2006
16
Splitting a function
module Split where
f :: [String] -> String
f ys = join (addNL ys)
where
join zs = foldr (++) [] zs
addNL ys = [y ++ "\n" | y <- ys]
LIL 2006
17
Splitting a function
module Split where
f :: [String] -> String
f ys = join (addNL ys)
where
join zs = foldr (++) [] zs
addNL ys = [y ++ "\n" | y <- ys]
LIL 2006
18
Splitting a function
module Split where
f :: [String] -> String
f ys = join (addNL ys)
join zs = foldr (++) [] zs
addNL ys = [y ++ "\n" | y <- ys]
LIL 2006
19
Overview
Example refactorings: what they involve.
Building the HaRe tool.
Design rationale.
Infrastructure.
Haskell and Erlang.
The Wrangler tool.
Conclusions.
LIL 2006
20
Haskell 98
Standard, lazy, strongly typed, functional
programming language.
Layout is significant … “offside rule” … and
idiosyncratic. doSwap pnt = applyTP (full_buTP (idTP `adhocTP` inMatch
`adhocTP` inExp
`adhocTP` inDecl))
where
inMatch ((HsMatch loc fun pats rhs ds)::HsMatchP)
| fun == pnt
= case pats of
(p1:p2:ps) -> do pats'<-swap p1 p2 pats
return (HsMatch loc fun pats' rhs ds)
_
-> error "Insufficient arguments to swap."
inMatch m = return m
inExp exp@((Exp (HsApp (Exp (HsApp e e1)) e2))::HsExpP)
| expToPNT e == pnt = swap e1 e2 exp
inExp e = return e
LIL 2006
21
Why refactor Haskell?
The only design artefact is (in) the code.
Semantics of functional languages support largescale transformations (?)
Building real tools to support functional
programming … heavy lifting.
Platform for research and experimentation.
LIL 2006
22
Lift / demote
fxy=…h…
where
h=…

f x y = … (h y) …
hy=…

Hide a function which is
Makes h accessible to the
clearly subsidiary to f;
other functions in the
clear up the
module and beyond.
namespace.
Free variables: which parameters of f are used in h?
Need h not to be defined at the top level, … ,
Type of h will generally change … .
LIL 2006
23
Algebraic or abstract type?
data Tr a
flatten :: Tr a -> [a]
= Leaf a |
Node a (Tr a) (Tr a)
Tr
Leaf
Node
flatten (Leaf x) = [x]
flatten (Node s t)
= flatten s ++
flatten t
LIL 2006
24
Algebraic or abstract type?
Tr
data Tr a
= Leaf a |
Node a (Tr a) (Tr a)
isLeaf = …
isNode = …
…
LIL 2006
isLeaf
flatten :: Tr a -> [a]
isNode
leaf
left
right
mkLeaf
mkNode
flatten t
| isleaf t = [leaf t]
| isNode t
= flatten (left t)
++ flatten (right t)
25
Information required
Lexical structure of programs,
abstract syntax,
binding structure,
type system and
module system.
LIL 2006
26
Program transformations
Program optimisation source-to-source transformations
to get more efficient code
Program derivation calculating efficient code from
obviously correct specifications
Refactoring transforming code structure usually
bidirectional and conditional.
Refactoring = Transformation + Condition
LIL 2006
27
Conditions: renaming f to g
“No change to the binding structure”
1.
2.
3.
No two definitions of g at the same level.
No capture of g.
No capture by g.
LIL 2006
28
Capture of renamed identifier
hx=…h…f…g…
where
gy=…
hx=…h…g…g…
where
gy=…
fx=…
gx=…
LIL 2006
29
Capture by renamed identifier
hx=…h…f…g…
where
fy=…f…g…
hx=…h…g…g…
where
gy=…g…g…
gx=…
gx=…
LIL 2006
30
Refactoring by hand?
By hand = in a text editor
Tedious
Error-prone
• Implementing the transformation …
• … and the conditions.
Depends on compiler for type checking, …
… plus extensive testing.
LIL 2006
31
Machine support invaluable
Reliable
Low cost of do / undo,
even for large
refactorings.
Increased effectiveness
and creativity.
LIL 2006
32
Demonstration of HaRe, hosted in vim.
LIL 2006
33
LIL 2006
34
LIL 2006
35
LIL 2006
36
The refactorings in HaRe
Move def between modules
Rename
Delete/add to exports
Delete
Clean imports
Lift / Demote
Make imports explicit
Introduce definition
Remove definition
data type to ADT
Unfold
Short-cut, warm fusion
Generalise
All module aware
Add/remove parameters
LIL 2006
37
HaRe design rationale
Integrate with existing development tools.
Work with the complete language: Haskell 98
Preserve comments and the formatting style.
Reuse existing libraries and systems.
Extensibility and scriptability.
LIL 2006
38
Information required
Lexical structure of programs,
abstract syntax,
binding structure,
type system and
module system.
LIL 2006
39
The Implementation of HaRe
Information
gathering
Pre-condition
checking
Program
transformation
Strafunski
Program
rendering
LIL 2006
40
Finding free variables ‘by hand’
instance FreeVbls HsExp where
freeVbls (HsVar v) = [v]
freeVbls (HsApp f e)
= freeVbls f ++ freeVbls e
freeVbls (HsLambda ps e)
= freeVbls e \\ concatMap paramNames ps
freeVbls (HsCase exp cases)
= freeVbls exp ++ concatMap freeVbls cases
freeVbls (HsTuple _ es)
= concatMap freeVbls es …
Boilerplate code: 1000 noise : 100 significant.
LIL 2006
41
Strafunski
Strafunski allows a user to write general (read
generic), type safe, tree traversing programs, with
ad hoc behaviour at particular points.
Top-down / bottom up, type preserving / unifying,
full
LIL 2006
stop
one
42
Strafunski in use
Traverse the tree accumulating free variables from
components, except in the case of lambda
abstraction, local scopes, …
Strafunski allows us to work within Haskell …
Other options? Generic Haskell, Template Haskell,
AG, Scrap Your Boilerplate, …
LIL 2006
43
Rename an identifier
rename:: (Term t)=>PName->HsName->t->Maybe t
rename oldName newName = applyTP worker
where
worker = full_tdTP (idTP ‘adhocTP‘ idSite)
idSite :: PName -> Maybe PName
idSite v@(PN name orig)
| v == oldName
= return (PN newName orig)
idSite pn = return pn
LIL 2006
44
The coding effort
Transformations: straightforward in Strafunski …
… the chore is implementing conditions that the
transformation preserves meaning.
This is where much of our code lies.
LIL 2006
45
Program rendering example
-- This is an example
-- This is an example
module Main where
module Main where
sumSquares x y = sq x + sq y
where sq :: Int->Int
sq x = x ^ pow
pow = 2 :: Int
sumSquares x y = sq pow x + sq pow y
where pow = 2 :: Int
main = sumSquares 10 20
sq :: Int->Int->Int
sq pow x = x ^ pow
main = sumSquares 10 20
module Main where
sumSquares x y
= sq pow x + sq pow y where pow = 2 :: Int
sq :: Int->Int->Int
sq pow x = x ^ pow
main = sumSquares 10 20
LIL 2006
46
Token stream and AST
White space + comments only in token stream.
Modification of the AST guides the modification of
the token stream.
After a refactoring, the program source is recovered
from the token stream not the AST.
Heuristics associate comments with program
entities.
LIL 2006
47
Work in progress
‘Fold’ against definitions … find duplicate code.
All, some or one? Effect on the interface …
fx=…e…e…
Symbolic evaluation
Data refactorings
Interfaces … ‘bad smell’ detection.
LIL 2006
48
API and DSL
Combining forms
???
Refactorings
Refactoring
utilities
Strafunski
Library functions
Grammar as data
Strafunski
Haskell
LIL 2006
49
What have we learned?
Efficiency and robustness of libraries in question.
• type checking large systems,
• linking,
• editor script languages (vim, emacs).
The cost of infrastructure in building practical
tools.
Reflections on Haskell itself.
LIL 2006
50
Reflections on Haskell
Cannot hide items in an export list (cf import).
Field names for prelude types?
Scoped class instances not supported.
‘Ambiguity’ vs. name clash.
‘Tab’ is a nightmare!
Correspondence principle fails …
LIL 2006
51
Correspondence
Operations on definitions and operations on
expressions can be placed in one to one
correspondence
(R.D.Tennent, 1980)
LIL 2006
52
Correspondence
Definitions
Expressions
where
let
fxy=e
\x y -> e
fx
| g1 = e1
| g2 = e2
f x = if g1 then e1
g2 … …
LIL 2006
else if
53
Function clauses
fx
| g1 = e1
f x = if g1 then e1
g2 …
else if
fx
| g2 = e2
Can ‘fall through’ a function
clause … no direct
correspondence in the
expression language.
LIL 2006
No clauses for anonymous
functions … no reason to
omit them.
54
Haskell 98 vs. Erlang: generalities
Haskell 98: a lazy,
statically typed, purely
functional programming
language featuring
higher-order functions,
polymorphism, type
classes and monadic
effects.
LIL 2006
Erlang: a strict,
dynamically typed
functional programming
language with support
for concurrency,
communication,
distribution and faulttolerance.
55
Haskell 98 vs. Erlang: example
-- Factorial In Haskell.
module Fact(fac) where
fac :: Int -> Int
fac 0 = 1
fac n | n>0 = n * fac(n-1)
%% Factorial In Erlang.
-module (fact).
-export ([fac/1]).
fac(0) -> 1;
fac(N) when N > 0 -> N * fac(N-1).
LIL 2006
56
Haskell 98 vs. Erlang: pragmatics

Type system makes
implementation complex.
 Layout and comment
preservation.
 Types also affect the
refactorings themselves.

Clearer semantics for
refactorings, but more
complex infrastructure.
Dynamic semantics of
Erlang makes refactorings
harder to pin down.
LIL 2006
Untyped traversals much
simpler.
 Use the layout given by
emacs.
 Use cases which cannot be
understood statically.
57
Challenges of Erlang refactoring


Multiple binding occurrences of variables.
Indirect function call or function spawn:
apply (lists, rev, [[a,b,c]])




Multiple arities … multiple functions: rev/1
Concurrency
Refactoring within a design library: OTP.
Side-effects.
LIL 2006
58
Generalisation and side-effects
-module (test).
-module (test).
-export([f/0]).
-export([f/0]).
repeat(0) -> ok;
repeat(N) ->
io:format (“hello\n"),
repeat(N-1).
repeat(A, 0) -> ok;
repeat(A, N) ->
A,
repeat(A,N-1).
f( ) -> repeat(5).
f( ) -> repeat (io:format (“hello\n”), 5).
LIL 2006
59
Generalisation and side-effects
-module (test).
-module (test).
-export([f/0]).
-export([f/0]).
repeat(0) -> ok;
repeat(N) ->
io:format (“hello\n"),
repeat(N-1).
repeat(A, 0) -> ok;
repeat(A, N) ->
A(),
repeat(A,N-1).
f( ) -> repeat(5).
f( ) -> repeat (fun( )->
io:format (“hello\n”), 5).
LIL 2006
60
The Wrangler
Program source
Scanner/Parser
Parse Tree
Syntax tools
AST annotated with comments
Refactorer
AST + comments + binding structure
Program analysis and transformation
by the refactorer
Transformed AST
Pretty printer
Program source
LIL 2006
61
Teaching and learning design
Exciting prospect of using a refactoring tool as an
integral part of an elementary programming
course.
Learning a language: learn how you could modify
the programs that you have written …
… appreciate the design space, and
… the features of the language.
LIL 2006
62
Conclusions
Refactoring + functional programming: good fit.
Real win from available libraries … with work.
Substantial effort in infrastructure.
De facto vs de jure: GHC vs Haskell 98.
Correctness and verification …
Language independence …
LIL 2006
63