5-SEASR-Analytics-For

SEASR Analytics and Zotero
University of Illinois at Urbana-Champaign
Outline
• Brief Zotero Introduction
• SEASR Analytics for Zotero Plugin
• Interaction between SEASR and VUE
• Zotero Flows
• Configuration Mechanism
• Web Service Components
• Zotero-enabled Flows
• VUE-enabled Flows
• Attendee Project Work
The Zotero Picture
[Diagram: the Web and the Zotero store]
What is Zotero? (from Zotero Quick Start Guide)
• A citation manager. It is designed to store, manage, and cite bibliographic references, such as books and articles. In Zotero, each of these references constitutes an item.
• An extension for the Firefox web browser by the Center for History and New Media at George Mason University.
• Installed by visiting zotero.org and clicking the download button on the page.
Zotero Features (from zotero.org)
• Automatically capture citations
• Remotely back up and sync your
library
• Store PDFs, images, and web
pages
• Cite from within Word and
OpenOffice
• Take rich-text notes in any
language
• Wide variety of import/export
options
• Free, open source, and extensible
• Collaborate with group libraries
• Organize with collections and tags
• Access your library from anywhere
• Automatically grab metadata for
PDFs
• Use thousands of bibliographic
styles
• Instantly search your PDFs and
notes
• Advanced search and data mining
tools
• Interface available in over 30
languages
The Zotero + SEASR Picture
[Diagram: the Web and the Zotero store]
SEASR Analytics for Zotero
• An extension for the Firefox web-browser by the SEASR
Team
• Uses your Zotero Collections
• Performs analysis using SEASR Services
SEASR Analytics for Zotero Interface
How to Set Up Your Machine
• Install/Open Firefox
• Install Zotero
– https://addons.mozilla.org/en-US/firefox/addon/3504
– http://zotero.org
• Install the SEASR Zotero plugin
– https://addons.mozilla.org/en-US/firefox/addon/10020
• The plugin points to the default services provided by
SEASR (running on our server)
Zotero and SEASR
Tag Cloud Analysis
Readability Analysis
Date Entity to Simile Timeline
Automatic Summarization
Network Analysis
Location Entity to Google Map
Example: Zotero, SEASR, Protovis,
Google Maps, Simile
Tag Cloud Examples
• Tag Cloud Viewer
– Creates tag cloud for all items submitted (with a url), stop words
filtered including common tokens (punctuation), stemmed, top
100 words displayed in tag cloud viewer
• NGram Tag Cloud Viewer
– Creates tag cloud for all items submitted (with a url), stop words
filtered including common tokens (punctuation), 2-grams, top
100 2-grams displayed in tag cloud viewer
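The counting behind both viewers can be sketched roughly as follows. This is a minimal illustration, not the SEASR flow itself: the stop-word list is a tiny placeholder, stemming is omitted, and the function names `tokenize` and `top_ngrams` are hypothetical. It also assumes stop words are removed before n-grams are formed, which the slide does not specify.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption); a real flow would use a longer one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "on"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only, which drops punctuation."""
    return re.findall(r"[a-z]+", text.lower())


def top_ngrams(text, n=1, limit=100):
    """Return the `limit` most frequent n-grams after stop-word filtering."""
    tokens = [t for t in tokenize(text) if t not in STOP_WORDS]
    # Slide n staggered copies of the token list against each other to form n-grams.
    grams = zip(*(tokens[i:] for i in range(n)))
    counts = Counter(" ".join(gram) for gram in grams)
    return counts.most_common(limit)
```

Calling `top_ngrams(text)` gives the single-word counts for a tag cloud, and `top_ngrams(text, n=2)` gives the 2-gram counts.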
Entity Extraction Examples
• Date Entities to Simile Timeline
– Extracts date entities from all items submitted (with a url), and
plots these dates on the Simile Timeline
• Location Entities to Google Map
– Extracts location entities from all items submitted (with a url), and
plots these on a Google Map
• Entities to Protovis Network
– Extracts entities, creates relationships between entities that occur in the
same sentence, and displays them in a Protovis force-directed node-link
graph
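The "same sentence" relationship used for the network view can be sketched as below. This sketch assumes the entities have already been extracted and are passed in as plain strings; the naive sentence splitter and the function name `cooccurrence_edges` are illustrative assumptions, not part of SEASR.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence_edges(text, entities):
    """Count, for each pair of entities, the sentences in which both appear."""
    edges = Counter()
    # Naive sentence split on ., ! or ? followed by whitespace (an assumption).
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        present = sorted(e for e in entities if e in sentence)
        for pair in combinations(present, 2):
            edges[pair] += 1
    return edges
```

Each key of the result is an edge between two entities and each value is its weight, which is the kind of input a force-directed layout needs.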
Text Summarization
HITS Summarizer
– Finds top sentences and tokens from all items submitted (with a
url) and displays them in a report
Flesch-Kincaid Readability Test
• Given: Zotero item(s)
• Results show scores for
each item selected
– Designed to indicate
comprehension difficulty
when reading a passage of
contemporary academic
English
– Flesch Reading Ease:
higher scores indicate
material that is easier to
read; lower numbers mark
passages that are more
difficult to read
– Flesch–Kincaid Grade
Level: result is a number
that corresponds with a
U.S. grade level
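For reference, both scores are computed from three counts using the standard published Flesch formulas. The sketch below takes the counts as inputs, since counting syllables requires a heuristic that is out of scope here; the function names are illustrative.

```python
def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease: higher values mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid Grade Level: approximate U.S. school grade."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
```

Both formulas depend only on average sentence length (words per sentence) and average word length (syllables per word), which is why the two scores move in opposite directions.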
Authorship Analysis
• Given: Zotero Collection (or
multiple items selection) with
Author/Co-Author Information
• Determine the importance of the
authors in this collection
– Each author is a vertex in the graph
– Authors are connected with an edge
if they are co-authors of an item
– List of Authors ranked by the
Betweenness Centrality Measure
– Betweenness is a centrality measure
of a vertex within a graph. Vertices
that occur on many shortest paths
between other vertices have higher
betweenness than those that do not.
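The ranking described above can be sketched in plain Python. This is an illustrative implementation of Brandes' algorithm for unweighted, undirected graphs, not the code of the SEASR flow. The input format (one list of author names per item) and the function names are assumptions.

```python
from collections import deque
from itertools import combinations


def coauthor_graph(items):
    """Authors are vertices; two authors share an edge if they co-authored an item."""
    graph = {}
    for authors in items:
        for author in authors:
            graph.setdefault(author, set())
        for a, b in combinations(set(authors), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def betweenness(graph):
    """Unnormalized betweenness centrality via Brandes' algorithm."""
    score = {v: 0.0 for v in graph}
    for source in graph:
        # Breadth-first search, counting shortest paths (sigma) from the source.
        order = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, visiting the farthest vertices first.
        delta = {v: 0.0 for v in graph}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Every undirected pair was counted from both endpoints, so halve the totals.
    return {v: total / 2 for v, total in score.items()}
```

Sorting the returned dictionary by value in descending order gives the ranked author list.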
The Value Added
• Analytical Results are saved as Zotero items (View Snapshot)
– Includes metadata
– Item naming strategy identifies the item or collection processed
– Creator indicates the Menu Label of the SEASR Analysis
• Related Tab links to the items processed in the Analysis
• No need to install the analysis; it runs as a web service
The Zotero Plugin
• Open Firefox
• Install Zotero
– https://addons.mozilla.org/en-US/firefox/addon/3504
• Install the SEASR Zotero plugin
– https://addons.mozilla.org/en-US/firefox/addon/10020
• The plugin will point to the default services provided by
SEASR
• You can develop and deploy your own (samples
available)
• SEASR plugin preferences allow you to point to other service
providers
Zotero and VUE
The VUE team has integrated their tool, so that items can
be exported into VUE
SEASR Support in VUE
• Goal: Provide functionality in
VUE to use SEASR flows
• Implementations:
– Add content to map for top
10 words from the given url
– Get metadata for content
– Get information about
content
SEASR and VUE
• Top words from 2 different web pages with nodes moved
around to see overlap
[Figure: VUE map in which the top words from each page appear as nodes linked to page nodes titled "Dynamic Distributed Data-Intensive Applications ESIWiki", "Database Paradigms - ESIWiki", "Programming Paradigms ESIWiki", and "Analysis Paradigms - ESIWiki"]
SEASR and VUE
[Figure: larger VUE map of top-word nodes linked to the page nodes "Dynamic Distributed Data-Intensive Applications ESIWiki", "Programming Paradigms ESIWiki", "Database Paradigms - ESIWiki", and "Analysis Paradigms - ESIWiki"]
Demonstration
• We will be demonstrating how to install and use the
SEASR Analytics extension for Zotero
• We will also showcase Tufts' Visual Understanding
Environment (VUE) for Zotero and its integration with
SEASR
Learning Exercises: Zotero Collection
Have participants run some of the Zotero-enabled flows
– Set up a Zotero collection; if you already have one you want to use, skip to the next step
• Create a collection by right-clicking on "My Library" and selecting "New
Collection"
– Give the collection a name (such as "DocSouth")
• Select this collection
• Use Firefox to navigate to http://docsouth.unc.edu/neh/aaron/aaron.html
• Open Zotero by clicking the Zotero icon in Firefox (bottom-right corner)
• Capture the current webpage as a Zotero item by clicking the "Create
new item from current page" button (fifth from the left on the Zotero
toolbar)
• Navigate to http://docsouth.unc.edu/neh/adams/adams.html and repeat
the previous step
Learning Exercise: Access SEASR
• Select one or more items in Zotero and then right-click
on one of the selected items and choose SEASR
Analytics -> SEASR -> Tag Cloud Viewer to create a tag
cloud from text extracted from your Zotero item(s)
• Do the same thing but select SEASR Analytics ->
SEASR -> Hits Summarizer instead, to view a list of top
tokens and sentences extracted from your item(s)
• Repeat the same procedure one more time, but this time
select SEASR Analytics -> SEASR -> Date Entities to
Simile Timeline to view a timeline containing dates
extracted from your item(s)
Learning Exercise: Zotero and VUE
• Run an analysis on a Zotero collection through VUE.
These steps will create nodes in VUE and extract the
words from these documents and connect them with a
link. By doing this for multiple documents, you will see
what words (concepts) are mentioned in multiple
documents. Note that we could change from words to an
extracted entity, such as Person, and automatically build a
social network around the documents that are selected.
– Note: This exercise requires the existence of a Zotero collection;
you can create one by following step 1 in the previous exercise,
if necessary
Learning Exercise: SEASR and VUE
– Open Zotero by clicking on the Zotero icon in Firefox (bottom-right corner)
– Click the Settings button on the Zotero toolbar (third from left)
and select "Start VUE"
• Confirm the security prompt, if one is presented
– Right-click the collection (ex: "DocSouth") and select "Send to
VUE and Add to Map"
– Select one of the boxes (documents) in the VUE workspace and
then choose Analysis -> SEASR from the VUE menu
– In the new window that is displayed, select "Create new nodes"
in step 2, and the "Resource Word Count" analysis in step 3 and
press Analyze
– Repeat this for additional nodes in your graph to build a more
complex network of words. You can now use the functionality of
VUE to rearrange your graph to tell a story.
Discussion Questions
• What kinds of data assets would you be creating in
Zotero?
• What other analysis would you like to use against this
data?
Creating Zotero Flows
Outline
• Zotero Flow
• SEASR Configuration File
• VUE-SEASR Configuration File
SEASR Plugin Preferences
• Configuration files are
managed in a list
• Each configuration file
can be enabled or
disabled
• Reload will refresh the
plugin with the flows in
the configuration files
Local Setup
• Copy config file to your machine from
– http://repository.seasr.org/Zotero/config/seasr.config
• In Zotero,
– Select Preferences from Menu
– Go to SEASR
• Click Add
– Specify a Provider Name
– Specify a URL for the config file
(file:///Users/lauvil/Sites/zotero.config)
– Click box for Enabled
• Note: In the future, after editing the config you only
need to click “Reload”
Extensible to Analysis that You Create
• You can deploy the flows we have on your own server or
ask your university to host the analysis
• You can modify these flows and redeploy
• You can create new flows
– Perhaps you want to see only nouns or verbs
– Perhaps you want to see a list of extracted entities
• You can share these flows back to the community
Configuration File (XML or JSON)
• Each flow contains 2 attribute-value pairs
– name: label to use in the Zotero drop-down display
– url: URL to which the POST is sent
• XML
<seasr_analytics>
  <flows>
    <flow name="Author Centrality Analysis"
          url="http://services.seasr.org:10000/http://seasr.org/flows/zotero-socialnetwork/instance/service-head-post/1"/>
  </flows>
</seasr_analytics>
• JSON
{"seasr_flows":[
  {"name":"Author Centrality Analysis",
   "url":"http://services.seasr.org:1718/meandre://seasr.org/components/zotero/service-head-post/instance/shp"},
  {"name":"Flesch-Kincaid Readability Test",
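To illustrate how a consumer might read either format into the same list of (name, url) pairs, here is a minimal sketch using only the Python standard library. It is not the plugin's own loader: the function names are hypothetical, and the JSON sample is closed off with `]}` as an assumption, because the listing on the slide is cut off.

```python
import json
import xml.etree.ElementTree as ET

XML_SAMPLE = """<seasr_analytics>
  <flows>
    <flow name="Author Centrality Analysis"
          url="http://services.seasr.org:10000/http://seasr.org/flows/zotero-socialnetwork/instance/service-head-post/1"/>
  </flows>
</seasr_analytics>"""

# Assumed closing brackets: the slide's JSON listing is truncated.
JSON_SAMPLE = """{"seasr_flows": [
  {"name": "Author Centrality Analysis",
   "url": "http://services.seasr.org:1718/meandre://seasr.org/components/zotero/service-head-post/instance/shp"}
]}"""


def flows_from_xml(text):
    """Return (name, url) for every <flow> element in an XML config."""
    root = ET.fromstring(text)
    return [(flow.get("name"), flow.get("url")) for flow in root.iter("flow")]


def flows_from_json(text):
    """Return (name, url) for every entry under "seasr_flows" in a JSON config."""
    return [(flow["name"], flow["url"]) for flow in json.loads(text)["seasr_flows"]]
```

Each pair then maps directly onto one menu entry: the name is the label shown in Zotero and the url is where the selected items are posted.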
What Does a Web Service Flow Look Like?
Common components used for creating a web service flow
• Service Head Post
– Receives the HTTP POST and sends the data to the rest of the flow
• Service Tail Text
– Sends the results back in response to the HTTP request
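The head/tail pattern can be imitated with Python's built-in HTTP server. This is only an analogy to show the request/response shape, not Meandre code: `analyze`, `FlowHandler`, `post_text`, and `demo` are hypothetical names, and the token count stands in for whatever analysis sits between the head and tail components.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def analyze(text):
    """Placeholder for the middle of the flow: count whitespace-separated tokens."""
    return f"tokens: {len(text.split())}"


class FlowHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # "Head": read the posted body and pass it to the rest of the flow.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode("utf-8")
        payload = analyze(body).encode("utf-8")
        # "Tail": send the text result back on the same HTTP exchange.
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # keep the demo quiet


def post_text(url, text):
    """POST text to a flow URL and return the response body."""
    request = urllib.request.Request(url, data=text.encode("utf-8"), method="POST")
    # An empty ProxyHandler bypasses any configured proxy for this local demo.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request) as response:
        return response.read().decode("utf-8")


def demo():
    """Start the service on a free local port, post to it once, and shut it down."""
    server = HTTPServer(("127.0.0.1", 0), FlowHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return post_text(f"http://127.0.0.1:{server.server_port}/", "a b c")
    finally:
        server.shutdown()
        server.server_close()
```

The caller sees an ordinary web service: it posts data and reads back text, without needing to know which components run in between.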
Another Zotero Service Flow
Components that read Zotero data from the web service
• Zotero Author Extractor (previous slide)
– Extracts the author-coauthor from each item
• Zotero URL Extractor
– Extracts the url from each item
VUE-SEASR Configuration File (XML)
<?xml version="1.0" encoding="UTF-8"?>
<seasr_analytics>
  <flow_group label="Create New Nodes" input="one" output="map">
    <flow label="Resource Word Count" uri="http://vue.tufts.edu/word-countsvuetokenizer/word-counts-for-vue-using-vue-tokenizer/" url="http://vuedl.tccs.tufts.edu:1719/service/ping" duplicate="false">
      <input>location</input>
    </flow>
  </flow_group>
  <flow_group label="Add Metadata" input="one" output="map">
    <flow label="Resource Word Count" uri="http://vue.tufts.edu/word-countsvuetokenizer/word-counts-for-vue-using-vue-tokenizer/" url="http://vuedl.tccs.tufts.edu:1719/service/ping" duplicate="false">
      <input>location</input>
    </flow>
  </flow_group>
</seasr_analytics>
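A short sketch of how such a file could be read into flow groups, using Python's standard XML parser. This is an illustration under assumptions, not VUE's own loader: the function name `flow_groups` and the returned dictionary layout are hypothetical, and the sample omits the XML declaration and uses straight quotes throughout.

```python
import xml.etree.ElementTree as ET

VUE_CONFIG = """<seasr_analytics>
  <flow_group label="Create New Nodes" input="one" output="map">
    <flow label="Resource Word Count" uri="http://vue.tufts.edu/word-countsvuetokenizer/word-counts-for-vue-using-vue-tokenizer/" url="http://vuedl.tccs.tufts.edu:1719/service/ping" duplicate="false">
      <input>location</input>
    </flow>
  </flow_group>
  <flow_group label="Add Metadata" input="one" output="map">
    <flow label="Resource Word Count" uri="http://vue.tufts.edu/word-countsvuetokenizer/word-counts-for-vue-using-vue-tokenizer/" url="http://vuedl.tccs.tufts.edu:1719/service/ping" duplicate="false">
      <input>location</input>
    </flow>
  </flow_group>
</seasr_analytics>"""


def flow_groups(text):
    """Map each flow_group label to the flows it offers."""
    root = ET.fromstring(text)
    groups = {}
    for group in root.iter("flow_group"):
        groups[group.get("label")] = [
            {
                "label": flow.get("label"),
                "url": flow.get("url"),
                "inputs": [node.text for node in flow.findall("input")],
            }
            for flow in group.findall("flow")
        ]
    return groups
```

The group labels correspond to the choices offered in step 2 of the VUE analysis window ("Create new nodes"), and the flow labels to the analyses listed in step 3 ("Resource Word Count").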
Demonstration
• We will go through an example of what a Zotero-enabled
flow looks like and what's special about it
• We will show how to modify an existing Zotero-enabled
flow and how to "deploy" it so that it can be leveraged
within Zotero
Learning Exercises
1. Create a new flow (or adapt an existing flow) using the Meandre Workbench that performs some simple analysis and "deploy" it for access by Zotero
   1. We can use the flow we constructed in an earlier session as a base
   2. Execute this flow
   3. Change the configuration of the SEASR plugin so that it knows how to access this flow
   4. From Zotero, refresh the configuration file
   5. Select some data to process through the updated SEASR flow
Discussion Questions
• What kinds of data assets would you be creating in
Zotero?
• What other analysis would you like to use against this
data?