Geographical Locations of Developers at SourceForge


Gregorio Robles
Jesus M. Gonzalez-Barahona
Presented by Brian Chan
CISC 864
Overview





Background
Motivation
Data Gathering Methods
Results
Conclusions
Background Information

Developers on a project are distributed across the world
e.g. libre software projects
It is hard to account for all the developers, and harder still to control these resources
Motivation


To accurately account for all personnel
in the world for a given project
Interesting for academic and economic
reasons
Data Gathering Methods

Two Primary Sources of Information:



Private email address
Time Zone of the User
Acquired from special database for
research purposes
Data Gathering Methods

Useful information in email and time
zone:


ccTLD (Country Code Top-Level Domain)
e.g. the .es in gsyc.escet.urjc.es
Figure 1.0
Figure 2.0
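The ccTLD idea on this slide can be sketched in a few lines of Python. This is only an illustration, not the authors' tooling: the function name `classify_tld`, the small `GENERIC_TLDS` set and the example addresses are assumptions, and the two-letter test is a heuristic that does not check the label against the official ccTLD list.

```python
# Hypothetical helper: classify the top-level domain of an email address.
# Assumption: any two-letter ASCII label is treated as a country code.
GENERIC_TLDS = {"com", "org", "net", "edu", "gov", "mil", "int", "info", "biz"}


def classify_tld(email):
    """Return (kind, tld) where kind is 'ccTLD', 'gTLD' or 'unknown'."""
    # Keep only the part after the last '@', normalised to lower case.
    domain = email.rsplit("@", 1)[-1].strip().lower().rstrip(".")
    # The TLD is the label after the last dot.
    tld = domain.rsplit(".", 1)[-1]
    if len(tld) == 2 and tld.isascii() and tld.isalpha():
        return ("ccTLD", tld)
    if tld in GENERIC_TLDS:
        return ("gTLD", tld)
    return ("unknown", tld)


if __name__ == "__main__":
    # Made-up addresses; only the host gsyc.escet.urjc.es comes from the slide.
    print(classify_tld("someone@gsyc.escet.urjc.es"))
    print(classify_tld("someone@example.com"))
```

An address classified as a ccTLD hints directly at a country; a gTLD address gives no such hint, so the time zone has to be used instead.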
Data Gathering Methods

Hard to pinpoint location when the data consists of:
gTLD (Generic Top-Level Domain)
Time zones set to GMT (Greenwich Mean Time)
Figure 3.0
Data Gathering Methods

Use a distribution method to estimate where users should be assigned:
e.g. of 22 users in one domain, 10 are unaccounted for because they only show GMT
Figure 4.0
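One plausible reading of this slide is a proportional allocation: users who cannot be located are spread over the candidate countries in the same proportions as the users who can. The sketch below assumes that reading. The function name `distribute_proportionally` and the split of the 12 located users into countries "A" and "B" are hypothetical; only the totals (22 users, 10 unaccounted for) come from the slide.

```python
def distribute_proportionally(known_counts, unassigned):
    """Spread `unassigned` users over countries in proportion to known counts.

    known_counts: dict mapping country -> number of users already located.
    unassigned:   number of users whose location could not be determined.
    Returns a dict of estimated (possibly fractional) users per country.
    """
    total_known = sum(known_counts.values())
    if total_known == 0:
        raise ValueError("no located users to take proportions from")
    return {
        country: count + unassigned * count / total_known
        for country, count in known_counts.items()
    }


if __name__ == "__main__":
    # 22 users in one domain, 10 of them show only GMT (figures from the slide).
    # The 9/3 split of the remaining 12 is invented for illustration.
    located = {"A": 9, "B": 3}
    estimate = distribute_proportionally(located, unassigned=10)
    print(estimate)
    print(sum(estimate.values()))  # the estimates still add up to all 22 users
```

The estimates are fractional on purpose: the goal is aggregate counts per country, not a decision about any single user.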
Data Gathering Methods

What if the user actually is in the GMT zone?

Need to rebalance the equation to account for data that was ignored
Figure 5.0
Data Gathering Methods


Weight the results by that factor
Assume the ratio of own time zone to GMT is the same for GMT countries as for non-GMT regions
Figure 6.0
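The slides do not spell out the rebalancing equation, so the following is only one way the stated assumption could be turned into arithmetic. It assumes a simple model: a share d of all users leave the time zone at GMT regardless of where they live, d is estimated from users located by ccTLD in non-GMT countries, and observed GMT users = true GMT users + d × (all other users). Both function names and all numbers are hypothetical.

```python
def default_gmt_share(own_tz_users, gmt_users):
    """Share of users known to live outside GMT who nevertheless show GMT."""
    total = own_tz_users + gmt_users
    if total == 0:
        raise ValueError("no users to estimate the share from")
    return gmt_users / total


def estimate_true_gmt_users(observed_gmt, total_users, share):
    """Solve observed = true + share * (total - true) for `true`."""
    if not 0 <= share < 1:
        raise ValueError("share must be in [0, 1)")
    true_gmt = (observed_gmt - share * total_users) / (1 - share)
    # A negative estimate would only reflect noise in the inputs.
    return max(0.0, true_gmt)


if __name__ == "__main__":
    # Invented figures: among located non-GMT users, 80 set their own time
    # zone and 20 still show GMT.
    share = default_gmt_share(own_tz_users=80, gmt_users=20)
    # Invented figures: 100 unlocated users, 36 of whom show GMT.
    print(estimate_true_gmt_users(observed_gmt=36, total_users=100, share=share))
```

Under this model, the users who show GMT but are estimated not to live there are the ones that would be redistributed to other regions.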
Data Gathering Methods

Different Types of
data sets
Figure 7.0
Results


Top 50 Countries
account for 96.5%
of developers in
SourceForge.
Top 20 Countries
account for 83.9%
of developers in
SourceForge
Figure 8.0
Results


Most developers are from Europe and North America, in an almost 50-50 ratio
With similar developer counts, penetration of libre software is higher in North America, because Europe has the larger population
Figure 9.0
Conclusion



Method for redistributing developers to their place of origin
Not meant to tie individual users to a single geographical location, but to estimate aggregate numbers of developers of a given national origin
Can be used to look for correlations with GDP, GDP per capita or other economic patterns
Personal Thoughts

Good Points


Interesting results: North America accounts for almost half of total activity
Interesting method for redistributing users of unknown location to specific regions
Personal Thoughts

Points for Improvement


Questionable: is it really that hard to ascertain the developer's nationality or a geographical location entry (even though it is private information)?
SourceForge might be one of the most
common open source systems, but is this
indicative of all open source systems?
Questions? Comments?