Mining Large Software Compilations over Time:
Another Perspective on Software Evolution
Gregorio Robles, Jesus M. Gonzalez-Barahona, Martin Michlmayr,
Juan Jose Amor
Presented by
Brian Chan
Cisc 864
12th October 2007
Overview







Background Information
Motivations for Paper
Problems Addressed
Solutions and Data Analysis
Conclusions
Thoughts about the paper
Questions/Comments
Background


Libre Software (open source software)
Compilations of software by vendors:

Group software from different sources together
as a product.
Must be easy to install, configure, and
administer
Background Information

Example of Libre Software:

Debian: a distribution built around the Linux kernel
Versions 2.0, 2.1, 2.2, 3.0 and 3.1
Lots of volunteers; all project information (mail, etc.)
is publicly available.
Motivation for Paper

Studying the evolution of products created by
compiling software is new

Companies have trouble categorizing all the
programs built by different vendors.
This differs from normal software
evolution:

Integration vs. development
Maintenance means adding new software, not
removing faults or adding new functionality
Problems addressed



Dealing with the addition and removal of
packages across Debian releases and
libre software “in the large”
Addressing versioning in packages
Debian is treated as indicative of libre software in
general because of its size.
Solutions/Data Gathered

Product information (Sources.gz)
contains, for each source package:

Name, version, list of binary packages built
from it, and name and email address of the
maintainer.
The experiment focuses on source lines of code
(SLOC), counted with SLOCCount
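The fields listed above (name, version, binary packages, maintainer) can be pulled out of one entry of Debian's Sources index with a few lines of Python. This is a minimal sketch, not the authors' tooling: it assumes the index uses "Field: value" stanzas with the field names Package, Binary, Version and Maintainer, and the sample stanza and the helper names `parse_stanza` and `summarize` are invented for illustration.

```python
# Hypothetical stanza: every value below is made up for illustration.
SAMPLE_STANZA = """\
Package: examplepkg
Binary: examplepkg, examplepkg-doc
Version: 1.2-3
Maintainer: Jane Doe <jane@example.org>
"""


def parse_stanza(text):
    """Parse one 'Field: value' stanza into a dict.

    Lines starting with whitespace are treated as continuations
    of the previous field.
    """
    fields = {}
    last = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t" and last is not None:
            fields[last] += " " + line.strip()
            continue
        # Split on the first colon only, so values may contain colons.
        name, _, value = line.partition(":")
        last = name.strip()
        fields[last] = value.strip()
    return fields


def summarize(fields):
    """Keep only the fields the slide mentions."""
    binaries = [b.strip() for b in fields.get("Binary", "").split(",") if b.strip()]
    return {
        "name": fields.get("Package"),
        "version": fields.get("Version"),
        "binaries": binaries,
        "maintainer": fields.get("Maintainer"),
    }


record = summarize(parse_stanza(SAMPLE_STANZA))
print(record)
```

A real index holds many stanzas separated by blank lines; splitting the file on blank lines and calling `parse_stanza` on each piece would be one way to extend this.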
Solutions/Data Gathered

SLOCCount transforms data into
relational and XML data formats for
viewing purposes.
Solutions/Data Gathered



Size measured in MSLOC (million
lines of code) and in number of
packages
Size doubles about every two
years
Growth was faster in the earlier
years
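Doubling every two years can be restated as a yearly growth rate. The sketch below does the arithmetic under the simplifying assumption of smooth exponential growth (the slide itself notes that growth was faster in the earlier years); the function names are illustrative only.

```python
def annual_growth_rate(doubling_years):
    """Yearly growth rate implied by doubling every `doubling_years` years."""
    return 2 ** (1.0 / doubling_years) - 1


def projected_size(initial, years, doubling_years=2.0):
    """Size after `years`, assuming smooth exponential growth."""
    return initial * 2 ** (years / doubling_years)


rate = annual_growth_rate(2.0)
print(f"implied yearly growth: {rate:.1%}")
print(f"size after 4 years, starting from 1.0: {projected_size(1.0, 4)}")
```

Under that assumption, doubling every two years corresponds to roughly 41% growth per year, and to a fourfold increase over four years.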
Solutions/Data Gathered



Rule 1: Large packages grow over time
Rule 2: Many small packages are introduced
Result: Mean size of packages stays the same
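A toy calculation shows how the two rules can cancel out in the mean. The package sizes below are invented purely for illustration (they are not data from the paper), and were chosen so that the means match exactly.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Hypothetical package sizes (say, in KSLOC) for an earlier release.
earlier = [10, 20, 90]

# Later release: the large package has grown (90 -> 140, Rule 1)
# and two small packages have been introduced (Rule 2).
later = [10, 20, 140, 15, 15]

mean_before = mean(earlier)
mean_after = mean(later)
print(mean_before, mean_after)
```

Total size more than doubles here (120 to 200 is the invented example's growth), yet the mean package size does not move, which is why the mean alone says little about how individual packages behave.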
Solutions/Data Gathered

Common packages:

Same files, but
updated in later
versions

Common versions:

Same files with no
change
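One way to compute these two sets is with plain set operations over package names and versions. This sketch assumes a particular reading of the slide: a "common package" has the same name in both releases, and a "common version" has the same name and the same version string. The release contents and the function name `compare_releases` are hypothetical.

```python
def compare_releases(old, new):
    """Compare two releases given as {package_name: version} dicts."""
    common_packages = set(old) & set(new)
    # Common versions: present in both releases with an identical version.
    common_versions = {p for p in common_packages if old[p] == new[p]}
    return {
        "common_packages": common_packages,
        "common_versions": common_versions,
        "removed": set(old) - set(new),
        "added": set(new) - set(old),
    }


# Hypothetical releases: names and versions are invented.
release_a = {"alpha": "1.0", "beta": "2.1", "gamma": "0.9", "delta": "3.3"}
release_b = {"alpha": "1.0", "beta": "2.4", "delta": "3.3", "epsilon": "0.1"}

result = compare_releases(release_a, release_b)
removed_share = len(result["removed"]) / len(release_a)
unchanged_share = len(result["common_versions"]) / len(release_a)
print(result)
print(f"removed: {removed_share:.0%}, unchanged: {unchanged_share:.0%}")
```

Dividing the removed and unchanged counts by the size of the older release gives percentages of the kind reported for Debian.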
Solutions/Data Gathered



25% of packages have been completely removed
15% of packages have been unchanged
Number of packages with versions in common increases
Solutions/Data Gathered

C dominates in all versions (between
55% and 85% of the lines of code)
Solutions/Data Gathered


300% increase in lines of C code
But the overall trend is toward Python and Perl

Reasoning: many more shell scripts for installation
purposes
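A language can grow in absolute terms while its share of the total shrinks. The sketch below illustrates this with invented per-language line counts (not figures from the paper), chosen so that C grows by 300% while its share drops from 80% to about 55%.

```python
def language_shares(sloc_by_language):
    """Return each language's fraction of the total line count."""
    total = sum(sloc_by_language.values())
    return {lang: count / total for lang, count in sloc_by_language.items()}


# Hypothetical line counts (arbitrary units) for two releases.
early = {"C": 80, "shell": 8, "Perl": 6, "Python": 1, "other": 5}
late = {"C": 320, "shell": 100, "Perl": 60, "Python": 20, "other": 80}

shares_early = language_shares(early)
shares_late = language_shares(late)

# Relative growth of C in absolute lines: 3.0 means a 300% increase.
c_growth = (late["C"] - early["C"]) / early["C"]

print(f"C growth: {c_growth:.0%}")
print(f"C share: {shares_early['C']:.0%} -> {shares_late['C']:.0%}")
```

In this made-up example the other languages grow faster than C, so C still leads in percentage of lines but by a smaller margin.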
Conclusions

Evidence shows:



Stable versions double in size (in terms
of packages or lines of code) every two
years.
Mean size of packages stays the same
This is not indicative of individual package behavior!

Packages gain files and lines over time, but many
small packages are added as well
Conclusions


One developer can only handle a limited
number (N) of files, but the software keeps
getting larger => more developers are needed
C is becoming less important, even
though it still leads in terms of
percentage of lines
Conclusions


More research is needed to find the links between
skills, number of developers, complexity, and
the activities performed.
Debian provides a good example for
understanding compilation evolution.
Thoughts about the paper

Strong points:



The data show an interesting
progression of versioning for this product:
another face of software evolution
Good choice of a Linux product with
well-established versioning; Ubuntu, for
example, may have been too new
Good explanation of the reasons behind the trends, e.g.
constant mean package size, more shell code.
Thoughts about the paper

Points that need improvement:

Borrows terms like "maintenance" but departs from their
usual definition; "versioning" probably would have
sufficed.
Does not really explain the significance of common
packages and files between versions; it just lists them.
It is a bold claim that Debian is indicative of software
compilation evolution as a whole:

Other distributions may show different patterns => the paper
should show background research on that.
Questions/Comments