prefer - Software Engineering Laboratory

Download Report

Transcript prefer - Software Engineering Laboratory

Development of a Software Search
Engine for the World Wide Web
Ken-ichi Matsumoto
Akito Monden
Toshiyuki Kamei
Haruaki Tamada
Naoki Ohsugi
— 松本健一
— 門田暁人
— 亀井俊之
— 玉田春昭
— 大杉直樹
Software Engineering Laboratory
Nara Institute of Science and Technology
Needs for Software Search from WWW
Search for examples
Search for better implementations
Is there any other
implementation for this
function?
What is a typical usage of
this library component?
Developer
Is there any useful library for
my program?
Search for unaware components
2
Goal
Construct a software search engine for developers:



Collects various resources related to software development
from the WWW, e.g. source code, executables, Tips,
developer’s “blogs”, etc.
Provides a flexible query interface
Provides a recommendation of useful resources.
In this presentation, we focus our target on Java
programs.
3
System Architecture
Resource Summary
Repository
Analysis
Retrieval
Collection
Interface
Result
Query
- Pointers to
resources (url)
-Recommendations
Software Resource
Repository
Users
4
Three Major Features of Our Search Engine
1: Finding a typical usage of a component
Name of a component M
A set of components that use M
User
2: Finding a similar component
An implementation of a component M
User
Software Search
Engine
Components that have similar functionality to M
3: Get a recommendation
An unfinished program M
User
A set of components useful for M
5
Three Major Features of Our Search Engine
1: Finding a typical usage of a component
Name of a component M
A set of components that use M
User
2: Finding a similar component
An implementation of a component M
User
Software Search
Engine
Components that have similar functionality to M
3: Get a recommendation
User
We employ Software Birthmark,
Unfinished program M
Similarity Evaluation, and Collaborative
to implement
AFiltering
set of components
useful for M these features.
6
Software Birthmark

A set of characteristics of a program*





Constant Values in Field Variables (CVFV birthmark)
Sequence of Method Calls (SMC birthmark)
p
Inheritance Structure (IS birthmark)
Used Classes (UC birthmark)
etc.
CVFV
CVFV(p)


SMC
IS
SMC(p)
IS(p)
UC
UC(p)
Useful for detection of software theft (plagiarism)
Also useful for detection of a set programs having similar
functionality (UC birthmark and SMC birthmark)
* H. Tamada, M. Nakamura, A. Monden, and K. Matsumoto, “Design and
evaluation of birthmarks for detecting theft of Java programs,” In Proc.
7
IASTED Int’l Conf. on Software Engineering, pp.569-575, Feb. 2004.
Example of Software Birthmark for Java

UC birthmark is a set of used classes.
UC Birthmark of ArrayIterator
java.lang.reflect.Array
java.lang.Class
java.lang.IllegalArgumentException
java.lang.Object
java.lang.String
java.util.Iterator
import java.util.Iterator;
import java.lang.reflect.Array;
public class ArrayIterator extends Object
implements Iterator{
private Object array;
private int index = 0;
public ArrayIterator(Object array){
if(!Class.isArray(array.getClass())){
throw new IllegalArgumentException(
“not array type”); }
this.array = array;
}
public Object next(){
return Array.get(array, index++);
}
public boolean hasNext(){
return index < Array.getLength(array); }
...
8
Similarity between Two Components

Similarity computation of UC birthmark i and j
based on correlation coefficient
sim (i, j ) 

uU
( Ru ,i  Ri )( Ru , j  R j )
2
(
R

R
)
uU u ,i i
where
2
(
R

R
)
uU u , j j
U: A set of all classfiles
 0 (u does not use class i)
Ru,i = 
 1 (u uses class i)
Ri = # of classfiles used by i / |U|
Other computations are also available, e.g. vector (cosine)
similarity, adjusted cosine, etc.
9
Example (1): Search for typical usages


Data source: rt.jar (9206 class files)
Search for typical usages of “java.util.BitSet”
10
Example (2): Search for Similar Component


Data source: a part of bcel5.1 (100 class files)
Search for classfiles similar to “ArithmeticInstruction”
11
Collaborative Filtering (CF)


Filtering: means selecting preferred items from a
large collection of items.
Collaborative: means using the other users’
preferences to filter items.
Selecting preferred items
F is good!
K is cool!
?
F
?
K
Using the other users’ preferences
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
Large amount of items
12
Two Steps in CF


Evaluate similarities between target user and the
other users.
Estimate the preference using the other users’
preferences for target item and their similarities.
Item 1
Item 2
User A
5
(prefer)
User B
User C
User D
Item 4
Item 5
5
1
(prefer) (not prefer)
3
(even)
5
?
(target)
(prefer)
5
(prefer)
5
1
(prefer) (not prefer)
3
(even)
5
(prefer)
Similar User
5
(prefer)
5
1
5
(prefer) (not prefer) (prefer)
5
(prefer)
Similar User
1
1
(not prefer) (not prefer)
Item 3
3
(even)
5
(prefer)
1
(not prefer)
Estimate
Dissimilar User
13
CF for Software Components


Evaluate similarities between target component and
the other components based on UC birthmark.
Estimate the usefulness using the other components’
UC birthmark for target classfile and their similarities.
Class 1
Class 2
Class 3
Class 4
Class 5
Component A
1
1
0
1
1
?
(target)
(useful)
Component B
1
0
0
1
1
Similar Component
Component C
1
1
0
0
1
Similar Component
Component D
0
0
1
0
0
Dissimilar Component
0 … not used 1 … used
Estimate
14
Example (3): Get Recommendations


Data source: a part of bcel5.1 (100 class files)
Search for recommendation for “ArithmeticInstruction”
Actually used
15
Summary

Three features of a software search engine




Providing typical usage of a component
Providing a similar component
Making a recommendation
Three key technologies



Software Birthmark
Similarity Evaluation
Collaborative Filtering
16