Analyzing the Evolution of Scientific Citations
By
Soumajit Pramanik
Guide: Dr. Bivas Mitra
Important Author-based Metrics:
• In-Citation Count
• H-Index etc.
Previous works on Citation Network mainly
focused on:
◦ Analyzing the evolution of citation and
collaboration networks using “Preferential
Attachment” [Barabasi et al. 2002]
◦ Understanding the importance of community
structure in citation networks [Chin et al. 2006]
◦ Studying the evolution of research topics [He et al.
2009]
Previous works on Collaboration Network
mainly focused on:
◦ Adopting social network measures of degree,
closeness, betweenness and eigenvector centrality
to explore individuals’ positions in a given co-authorship network [Liu et al. 2005].
◦ Analyzing the importance of the geographical
proximity (same university/city/country etc.) of the
collaborators [Divakarmurthy et al. 2011].
1. Existing studies focused on the
dominant factors like preferential
attachment
2. None of these factors can be self-regulated.
3. Does there exist any self-tunable factor
(suppressed by dominant factors) for
boosting one's own citations/collaborations?
Advantage of attending
Conferences:
Face-to-Face interactions
with Fellow Scientists
Studying the influence of
such interactions on the
evolution of Citation and
Collaboration Networks
The authors, whose talks are scheduled in the
same technical session of a conference, have
high chances of interaction.
In general, the first or the last author (or
sometimes both) of a paper attends the
conference.
Citations & Collaborations:
◦ DBLP Dataset for Computer Science domain (1960-2008)
◦ Around 1 million papers along with information
about author, year, venue and references
◦ 501060 authors tagged with continents (using
Microsoft Academic Search)
◦ 6559415 author-wise citation links
http://arnetminer.org/citation
http://cse.iitkgp.ac.in/resgrp/cnerg/Files/resources.html
Interactions:
◦ Two domains: 1> Networking & Distributed
Computing
2> Artificial Intelligence
◦ Selected 3 leading conferences from each domain:
1> INFOCOM, ICDCS, IPDPS from the first domain (1982-2007)
2> AAAI, ICRA, ICDE from the second domain (1980-2008)
◦ Collected session information from DBLP and program
schedule of the conferences
To regulate some important parameters and
manifest their effects on the citation network
Followed statistics regarding articles per field
per year, distribution of the number of
authors in a paper and citation information
from the real dataset
Only tunable parameter used: Successful
interaction Rate p (p=0.1,0.2,…,1)
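A minimal sketch of how the single tunable parameter could enter a simulation, assuming each potential interaction succeeds independently with probability p. The function name, the `seed` argument, the toy author pairs and the independence assumption are illustrative and not taken from the slides.

```python
import random

def simulate_successful_interactions(interactions, p, seed=0):
    """Keep each interaction independently with probability p.

    Assumption: a simple Bernoulli model; the slides only state that p is
    the successful interaction rate.
    """
    rng = random.Random(seed)
    # rng.random() is in [0, 1), so p = 1 keeps everything and p = 0 keeps nothing.
    return [pair for pair in interactions if rng.random() < p]

# Hypothetical author pairs who shared a session.
interactions = [("a1", "a2"), ("a1", "a3"), ("a2", "a3"), ("a4", "a5")]

# Sweep p over 0.1, 0.2, ..., 1 as on the slide.
for p in [round(0.1 * k, 1) for k in range(1, 11)]:
    kept = simulate_successful_interactions(interactions, p)
    print(p, len(kept))
```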
Multiplex Network Construction:
For each year t:
◦ Citation Layer:
Directed author-wise citation links created at t, pointing to
papers published before t (or sometimes, in t)
◦ Interaction Layer:
Undirected interaction links between authors presenting in
same sessions in selected conferences in t
◦ Co-authorship Layer:
Undirected collaboration links between two authors if they
co-author a paper published in those chosen conferences
in t
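The three layers can be sketched as per-year edge sets. This is an illustrative sketch only: the input tuple formats, the function name `build_layers` and the sample author names are assumptions, not the data format actually used.

```python
from collections import defaultdict
from itertools import combinations

def build_layers(citations, sessions, papers):
    """Build per-year edge sets for the three layers.

    citations: iterable of (year, citing_author, cited_author)
    sessions:  iterable of (year, [authors presenting in one session])
    papers:    iterable of (year, [co-authors of one paper])
    (All three input formats are assumed for illustration.)
    """
    citation = defaultdict(set)     # directed edges: (citing, cited)
    interaction = defaultdict(set)  # undirected edges: frozenset({a, b})
    coauthor = defaultdict(set)     # undirected edges: frozenset({a, b})

    for year, src, dst in citations:
        citation[year].add((src, dst))
    for year, presenters in sessions:
        # Every pair of authors in the same session gets an interaction link.
        for a, b in combinations(sorted(set(presenters)), 2):
            interaction[year].add(frozenset((a, b)))
    for year, authors in papers:
        # Every pair of co-authors of the same paper gets a collaboration link.
        for a, b in combinations(sorted(set(authors)), 2):
            coauthor[year].add(frozenset((a, b)))
    return citation, interaction, coauthor

# Hypothetical toy input.
citation, interaction, coauthor = build_layers(
    citations=[(2001, "bob", "alice")],
    sessions=[(2000, ["alice", "bob", "carol"])],
    papers=[(2000, ["alice", "dave"])],
)
print(sorted(tuple(sorted(e)) for e in interaction[2000]))
```

Storing undirected links as `frozenset` pairs avoids counting (a, b) and (b, a) separately, while the citation layer keeps ordered tuples because direction matters there.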
1. Conversion Rate (CR) for a conference C for a
time-span T:
CR = (No. of “successful” interactions in C during T) / (Total no. of interactions in C during T)
From this, the definition of the Overall Conversion
rate can be simply extended.
2. Induced Citation Link Repetition (LR):
LR measures the no. of times each “induced”
citation link appears within the recorded time
period.
3. Lifespan of Induced citation (LS):
The Lifespan of an “induced” citation is measured
as the difference between the first and the last
appearing year of the “induced” citation link.
4. Rate of appearance (RA):
The rate of appearance of an induced
citation link is the ratio of its
repetition count to its lifespan.
Hence RA = LR / LS
5. Influence Gap of a successful interaction (IG):
The Influence Gap of a “successful” interaction is
measured as the latency between the “successful”
interaction and the formation of the first induced
citation.
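The five measures above can be sketched for a single author pair as follows. The function names, the input format (a list of years in which the induced citation link appears) and the handling of LS = 0 are assumptions made for illustration; the slides do not say how RA is treated when a link appears in only one year.

```python
def conversion_rate(successful, total):
    """CR = successful interactions / total interactions."""
    return successful / total if total else 0.0

def induced_citation_metrics(interaction_year, citation_years):
    """Compute LR, LS, RA and IG for one induced citation link.

    interaction_year: year of the successful interaction
    citation_years:   one entry per appearance of the induced citation link
                      (assumed non-empty and not earlier than the interaction)
    """
    years = sorted(citation_years)
    lr = len(years)                   # LR: number of appearances
    ls = years[-1] - years[0]         # LS: last appearing year minus first
    ra = lr / ls if ls > 0 else None  # RA = LR / LS; left undefined for LS = 0 (assumption)
    ig = years[0] - interaction_year  # IG: latency until the first induced citation
    return {"LR": lr, "LS": ls, "RA": ra, "IG": ig}

# Hypothetical pair: interaction in 2000, citation link seen in these years.
print(induced_citation_metrics(2000, [2002, 2003, 2003, 2006]))
# → {'LR': 4, 'LS': 4, 'RA': 1.0, 'IG': 2}

# CR with the AI-domain counts reported in the results (1291 out of 61896).
print(f"{100 * conversion_rate(1291, 61896):.1f}%")  # → 2.1%
```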
Interactions to Citations
Real Datasets:
Networking Domain:
2.87% (381 out of
13240) for [0.9,0.1]
interaction probabilities
AI Domain:
2.1% (1291 out of
61896) for [0.9,0.1]
interaction probabilities
Synthetic Dataset:
Drop near the final years due to the
“Boundary Effect”
Networking Domain:
1. Overall Value increasing
2. Distributed Contribution
AI Domain:
1. Overall Value slowly
increasing
2. Dominated Contribution
[Figure panels: Networking Domain, AI Domain]
Significant no. of “induced”
citations have high RA values
Reasons can be
a) Low LS and/or
b) High LR
In both domains,
1. Power-Law distribution
2. A significant no. of “induced”
citations repeat a high no.
of times
[Figure panels: Networking Domain, AI Domain]
1. High RA ratio results
mainly from low LS
2. A large no. of “induced” citations
are missing from the right side of the
plot due to the boundary effect.
1. Aperiodicity of repetitions
of “induced” citations
increases almost linearly with
their Lifespan
2. High LR does not necessarily imply
high standard deviation
[Figure panels: Networking Domain, AI Domain]
Influence Gap (IG)
1. All the highly repeating
“induced” citations
have low “Influence”
Gap
[Figure panels: Networking Domain, AI Domain]
Influence of Continents
Dominance of
North America-North America
pairs
[Figure panels: AI Domain, Networking Domain]
Domain                           | LR vs LS | Standard Deviation vs LS | LR vs IG | LS vs IG
Artificial Intelligence          | 0.57     | 0.98                     | -0.13    | -0.12
Networking & Distributed Systems | 0.61     | 0.97                     | -0.14    | -0.13
Citations to Collaborations
Conversion Rates
◦ 1. Considered only collaboration between established researchers
(having at least 1 publication)
◦ 2. In Networking domain, out of 8920 co-author links, 2495
(28%) exhibit a past history of mutual citations!
◦ 3. In AI domain 3211 out of 10192 (31.5%) are such “induced”
co-author links.
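A minimal sketch of how a co-author link might be flagged as “induced”, assuming that “a past history of mutual citations” means at least one citation between the two authors before the collaboration year. Whether citations in both directions are required is not stated on the slides, so the hypothetical `require_both` flag exposes both readings; the function name and tuple format are also assumptions.

```python
def is_induced_collaboration(a, b, collab_year, citations, require_both=False):
    """Check whether authors a and b cited each other before collaborating.

    citations: list of (year, citing_author, cited_author) tuples (assumed format)
    require_both: if True, demand citations in both directions (assumption)
    """
    a_cites_b = any(y < collab_year and s == a and d == b for y, s, d in citations)
    b_cites_a = any(y < collab_year and s == b and d == a for y, s, d in citations)
    if require_both:
        return a_cites_b and b_cites_a
    return a_cites_b or b_cites_a

# Hypothetical history: alice cited bob in 1998, bob cited alice in 2003.
citations = [(1998, "alice", "bob"), (2003, "bob", "alice")]
print(is_induced_collaboration("alice", "bob", 2001, citations))
print(is_induced_collaboration("alice", "bob", 2001, citations, require_both=True))
```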
Induced Collaboration Repetition Count and
Influence Gap
Here also, all highly repeating
“induced” collaborations have
small “influence” gap
[Figure panels: Networking Domain, AI Domain]
Networking Domain:
1. Giant component
size 8152,
Second Largest
Component size 63
2. 28% (167) of induced
collaboration links
took part in the
merging process
AI Domain:
1. Giant component size 16203, Second Largest Component size 41
2. 36.6% (263) of induced collaboration links took part in the merging
process
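One way to identify links that take part in merging is a union-find pass over the co-author links in chronological order: a link merges two components if its endpoints were not yet connected when it appeared. This is a sketch under that assumption; the slides do not spell out the exact criterion (for instance, whether attaching a brand-new author counts as a merge), and the class and function names are hypothetical.

```python
class DisjointSet:
    """Union-find with path halving and union by size."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Merge the components of a and b; return True if they were separate."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        return True

def merging_links(edges):
    """edges: list of (year, a, b, induced) tuples (assumed format).

    Returns the induced links that joined two separate components,
    plus the final component sizes in descending order.
    """
    ds = DisjointSet()
    merged = []
    for year, a, b, induced in sorted(edges, key=lambda e: e[0]):
        if ds.union(a, b) and induced:
            merged.append((year, a, b))
    roots = {ds.find(x) for x in list(ds.parent)}
    sizes = sorted((ds.size[r] for r in roots), reverse=True)
    return merged, sizes

# Hypothetical co-author links; the last element marks "induced" links.
edges = [
    (2000, "a", "b", False),
    (2000, "c", "d", False),
    (2001, "b", "c", True),   # joins {a, b} with {c, d}
    (2002, "a", "d", True),   # endpoints already connected, no merge
]
merged, sizes = merging_links(edges)
print(merged, sizes)
```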
Interactions during conferences can be used as a tool to boost one's own
citation count.
This can indirectly help create effective future collaborations, and
the cycle continues.
Over time, researchers are becoming increasingly aware of the benefits
of interacting with fellow researchers during conferences.
Need to check
1. Influence of specific fields of interacting authors on
creation of “induced” citations
2. Effects of “induced” citations/collaborations on the
citation/collaboration degree distribution
3. Modeling the dynamics
1. A. L. Barabasi, H. Jeong, Z. Neda, E. Ravasz, A. Schubert, and T.
Vicsek: “Evolution of the social network of scientific collaborations”.
Physica A: Statistical Mechanics and its Applications, 311(3-4):590-614, 2002.
2. A. Chin and M. Chignell: “A social hypertext model for finding
community in blogs”. In HYPERTEXT '06: Proceedings of the seventeenth
conference on Hypertext and hypermedia, pages 11-22, New York, NY,
USA, 2006. ACM Press.
3. Q. He, B. Chen, J. Pei, B. Qiu, P. Mitra, and C. L. Giles: “Detecting topic
evolution in scientific literature: how can citations help?” In CIKM, pages
957-966, 2009.
4. X. Liu, J. Bollen, M. L. Nelson, and H. Van de Sompel: “Co-authorship
networks in the digital library research community”. Information
Processing & Management, 41(6):1462-1480, 2005.
5. P. Divakarmurthy, P. Biswas, and R. Menezes: “A temporal analysis of
geographical distances in computer science collaborations”. In
SocialCom/PASSAT, pages 657-660. IEEE, 2011.