Slide 0 - Faculteit Wiskunde en Informatica

Download Report

Transcript Slide 0 - Faculteit Wiskunde en Informatica

Code and Pattern Mining in
C/C++
Aditya S. Deshpande
Namratha Nayak
Guides:
Dr. A.Serebrenik(TU/e)
P.Kourzanov, ir(NXP)
Y.Dajsuren, PDEng(Virage Logic)
Agenda
•
•
•
•
•
Introduction
Problem Definition
Data flow
Design patterns
Summary
/ Faculteit Wiskunde en Informatica
29-3-2016
PAGE 1
Introduction
• Code mining – Process of extracting patterns
from source code.
• Design Patterns – A design pattern is a general
reusable solution to a commonly occurring problem
in software design.
• Streaming Data - Data streaming is the transfer of
data at a steady high-speed rate sufficient to support
such applications as high-definition television or a
radio signal.
/ Faculteit Wiskunde en Informatica
29-3-2016
PAGE 2
Problem Definition
• Lack of synchronisation between models and source
code.
• Significant amount of repetitive code in different
modules.
• Identifying patterns and integrating them in the
framework.
• Objective
/ Faculteit Wiskunde en Informatica
29-3-2016
PAGE 3
Approach
• Study the Design flow models available.
• Study the various design pattern matching methods
and tools.
/ Faculteit Wiskunde en Informatica
29-3-2016
PAGE 4
Data flow models
• Kahn Process Networks.
• Synchronous Data Flow.
/ Faculteit Wiskunde en Informatica
29-3-2016
PAGE 5
Kahn Process Network - Introduction
• Processes communicate via FIFO.
• Parallel communication is organized as follows
• Autonomous computing stations are connected to each
other in a network by communication lines.
• A station computes on data coming on its input lines to
produce output on some or all of its output lines.
• Assumptions
• Communication lines are the only means of communication.
• Communication lines transmit info within a finite time.
/ Faculteit Wiskunde en Informatica
29-3-2016
PAGE 6
Kahn Process Network - Introduction
• Restrictions
• At any given time a computing station is either computing or
waiting for information on one of its input lines.
• Each computing station follows a sequential program.
/ Faculteit Wiskunde en Informatica
29-3-2016
PAGE 7
Kahn Process Network - Example
• From Kahn’s original 1974 paper
process f(in int u, in int v, out int w)
{
u
int i; bool b = true;
for (;;) {
f
w
i = b ? wait(u) : wait(v);
v
printf("%i\n", i);
send(i, w);
Process alternately reads from u
b = !b;
and v, prints the data value, and
writes it to w.
}
}
/ Faculteit Wiskunde en Informatica
29-3-2016
PAGE 8
Kahn Process Network - Example
• From Kahn’s original 1974 paper
Process interface
includes FIFO’s.
process f(in int u, in int v, out int w)
{
int i; bool b = true;
wait() returns the next
for (;;) {
token in an input FIFO,
blocking if it’s empty
i = b ? wait(u) : wait(v);
printf("%i\n", i);
send(i, w);
send() writes a data value
b = !b;
on an output FIFO
}
}
/ Faculteit Wiskunde en Informatica
29-3-2016
PAGE 9
SDF - Introduction
• Synchronous data flow graph (SDF) is a network of
synchronous nodes (also called blocks).
• For a synchronous node, the consumptions and
productions are known a priori.
• Homogeneous SDF
/ Faculteit Wiskunde en Informatica
29-3-2016
PAGE 10
SDF - Delay
• Delay of signal processing
• Unit delay on arc between A and B, means
A
d
B
• nth sample consumed by B, is (n-1)th sample
produced by A.
• The arc is initialized with ‘d’ zero samples.
/ Faculteit Wiskunde en Informatica
29-3-2016
PAGE 11
SDF - Implementation
• Implementation requires:
• Buffering of the data samples passing between nodes
• Schedule nodes when inputs are available
• Dynamic implementation (= runtime) requires
• Runtime scheduler checks when inputs are available
and schedules nodes when a processor is free.
/ Faculteit Wiskunde en Informatica
29-3-2016
PAGE 12
SDF - Implementation
• Contribution of Lee-87:
• SDF graphs can be scheduled at compile time
• No overhead
• Compiler will:
• Determine the execution order of the nodes on one or
multiple processors or data path units
• Determine communication buffers between nodes.
/ Faculteit Wiskunde en Informatica
29-3-2016
PAGE 13
Design Patterns
• Describe solutions for common recurring problems
• Can be used in a wider context as they are defined
informally
• Documenting them in a software system simplifies
maintenance and program understanding
• Usually it is not documented, so there is a need to
discover design patterns from source code
/ Faculteit Wiskunde en Informatica
29-3-2016
PAGE 14
Pattern Mining
• Structure of design pattern is searched in the source
code.
• Should include the main properties of the design
pattern
• Flexible to describe the slightly distorted pattern
occurrences.
• Helps to understand the relationships between the
different parts of a large system
/ Faculteit Wiskunde en Informatica
29-3-2016
PAGE 15
Pattern Mining
• Reverse Engineering
• Analysis of a system to
− Identify the components and their interrelationships
− Create representations of the system in another form
• Why tools for Reverse Engineering?
• Existing legacy code
• High number of participants in code development
• Tools developed to mine the patterns from the
source code
/ Faculteit Wiskunde en Informatica
29-3-2016
PAGE 16
Pattern Mining Tools
• Aspects in the different mining tools
• Programming Language : Tools for Java and C++
• Method used to discover design patterns : Graph Matching ,
Constraint Satisfaction Problem, pattern inference
• Intermediate Representation – Abstract Semantic Graph,
Abstract Syntax Tree, Matrix and Vector
/ Faculteit Wiskunde en Informatica
29-3-2016
PAGE 17
Columbus – Design Pattern Mining Tool
• Reverse engineering framework
• Developed in cooperation between the Research
Group on Artificial Intelligence in Szeged, the
Software Technology Laboratory of the Nokia
Research Center and FrontEndART Ltd.
• Analyze large C/C++ projects and extract data
according to the Columbus Schema
• Supports project handling , data extraction , data
representation, data storage, filtering and
visualization
/ Faculteit Wiskunde en Informatica
29-3-2016
PAGE 18
Columbus - Design Pattern Mining Tool
• Has a C/C++ extractor plug-in that performs the
parsing of the source code
• Information collected by the plug-in corresponds to
the Columbus Schema
• Schema captures C++ language at low detail(i.e,
Abstract Syntax Tree) and has the higher –level
elements(i.e., semantics of types)
• Supports various file formats for exporting the
extracted data
/ Faculteit Wiskunde en Informatica
29-3-2016
PAGE 19
Other Pattern Mining Tools
• Other tools to be studied
• CPP2XMI
• Maisa
• CrocoPat
/ Faculteit Wiskunde en Informatica
29-3-2016
PAGE 20
Issues to be considered
• Can the tools support NXP source Code?
• Would it be possible to add proprietary patterns to
these tools?
• Can these tools be extended to support other
languages like C?
/ Faculteit Wiskunde en Informatica
29-3-2016
PAGE 21
Summary
• Overview of the Data flow models
• Introduced the design pattern mining tool Columbus
• Find the patterns present in the NXP source code
and check whether these can be mined using the
available tools
/ Faculteit Wiskunde en Informatica
29-3-2016
PAGE 22
References
• E.A.Lee and D.G.Messerschmitt, “Synchronous data
flow”,Proc. IEEE, vol. 75, pp. 1235-1245, Sept 1987.
• G.Kahn, “The semantics of a simple language for
parallel programming”, Proc.IFIP congr., Stockholm,
Sweden, Aug.1974, pp.471-475
• Gamma, E., Helm, R., Johnson, R. and Vlissides, J.
Design Patterns - Elements of Reusable ObjectOriented Software. Addison-Wesley, 1995.
/ Faculteit Wiskunde en Informatica
29-3-2016
PAGE 23
References
• R. Ferenc, A. Beszedes, M. Tarkiainen, and T.
Gyimothy. Columbus – Reverse Engineering Tool
and Schema for C++. In Proceedings of the 6th
International Conference on Software Maintenance
(ICSM 2002), pages 172–181. IEEE Computer Society,
Oct. 2002.
• R. Ferenc , and A. Beszedes. Data Exchange with the
Columbus Schema for C++. In Proceedings of the 6th
European Conference on Software Maintenance and
Reengineering (CSMR 2002), pages 59–66. IEEE
Computer Society, Mar. 2002.
/ Faculteit Wiskunde en Informatica
29-3-2016
PAGE 24
References
• Z. Balanyi, and R. Ferenc. Mining Design Patterns
from C++ Source Code. In Proceedings of the 19th
International Conference on Software Maintenance
(ICSM 2003), pages 305–314. IEEE Computer Society,
Sept. 2003.
/ Faculteit Wiskunde en Informatica
29-3-2016
PAGE 25
Questions
/ Faculteit Wiskunde en Informatica
29-3-2016
PAGE 26