Multi-media Monitoring: The Global Media Lab
digital methodologies for global media research
Randy Kluver
Dept of Communication
Texas A&M University
speech, language and digital media: History
1974 First demonstration of speech transmission over ARPANET using 2400 bit/s LPC speech coding.
1976 BBN developed one of the first continuous speech understanding systems.
1982 First use of statistical modeling for speaker identification.
1986 Introduced the Byblos recognition system with context-dependent phonetic units.
1992 First software-only, real-time, large-vocabulary, speaker-independent continuous speech recognition.
1993 First demonstration of a 20,000-word real-time speech recognizer. HARK™ Recognizer product introduced.
1995 Pioneered statistical language understanding and data extraction.
1996 Development of multilingual optical character recognition (OCR).
1998 Audio Mining system combines speech and language processing to index broadcast news.
2000 Call center automation launched.
2004 Real-time monitoring of foreign broadcast news with retrieval, alert, and playback capabilities.
2005 DARPA GALE contract awarded, focusing on machine translation.
2008 First demonstration of end-to-end GALE Distillation, DARPA MADCAT, TSWG CALL.
2011 TSWG GMT, DARPA BOLT, operational deployment of Media Monitoring and TransTalk.
©Raytheon BBN Technologies Corp., 2011. Proprietary Information. All Rights Reserved.
Multi-Media Monitoring System (M3S) at Texas A&M
Satellite television: stream, automatically transcribe, and translate foreign-language broadcasts
24/7 video stream with machine-generated transcript and translation
Data then remain available within the system for later research
Websites
Targeted websites in languages of interest
Broad-based identification of websites that offer strong regional perspectives
Social media
Tracks selected social media actors and captures all of their tweets, retweets, and mentions
Currently supports Twitter and Facebook (Arabic only)
Russian under development (mid-2016)
advantages of this system
Provides a big-data approach to global media studies by harvesting large amounts of content
Allows English-language access to content that is inaccessible to users without target-language skills
Allows keyword and Boolean searches of content in either English or the target language across the entire dataset
Archives material in an unaltered, unedited format for multiple types of analysis
Creates exportable clips and/or rich media content for incorporation into analyses, reports, or other media
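The keyword/Boolean search capability can be illustrated with a minimal sketch. It assumes a hypothetical record type (`Segment`, holding a source-language `transcript` and an English `translation`) and a simple inverted index shared by both languages; the slides do not describe how M3S actually implements search, so this is only one plausible approach.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Segment:
    """One archived broadcast segment (hypothetical schema)."""
    seg_id: int
    transcript: str   # machine transcript in the source language
    translation: str  # machine translation into English


def tokenize(text):
    # \w is Unicode-aware in Python 3, so Arabic or Cyrillic words stay whole.
    return re.findall(r"\w+", text.lower())


def build_index(segments):
    """Inverted index mapping each term to the ids of segments containing it.

    Source-language and English tokens share one index, so a query may mix both.
    """
    index = defaultdict(set)
    for seg in segments:
        for term in tokenize(seg.transcript) + tokenize(seg.translation):
            index[term].add(seg.seg_id)
    return index


def search(query, index, all_ids):
    """Evaluate a Boolean query using AND, OR, NOT and parentheses.

    NOT binds tightest, then AND, then OR; operators must be upper-case.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        pos += 1
        return tokens[pos - 1]

    def parse_or():
        result = parse_and()
        while peek() == "OR":
            take()
            result = result | parse_and()
        return result

    def parse_and():
        result = parse_not()
        while peek() == "AND":
            take()
            result = result & parse_not()
        return result

    def parse_not():
        if peek() == "NOT":
            take()
            return all_ids - parse_not()
        if peek() == "(":
            take()
            result = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return result
        # A plain search term: look it up case-insensitively.
        return set(index.get(take().lower(), set()))

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result


if __name__ == "__main__":
    segments = [
        Segment(1, "احتجاجات في دمشق", "protests in Damascus"),
        Segment(2, "الانتخابات في القاهرة", "elections in Cairo"),
        Segment(3, "احتجاجات في القاهرة", "protests in Cairo"),
    ]
    index = build_index(segments)
    ids = {s.seg_id for s in segments}
    print(sorted(search("protests AND NOT Cairo", index, ids)))  # [1]
    print(sorted(search("القاهرة AND (elections OR protests)", index, ids)))  # [2, 3]
```

Because both languages feed one index, a single query can combine an Arabic term with English terms, which mirrors the "either English or target language" claim above. A production archive would add stemming, phrase search, and persistent storage.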
broadcast translation system
story view
pivot view of stories
network analysis of Twitter influence
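The slides do not say how influence is measured. One common choice, shown here purely as an assumption, is to treat each captured retweet or mention as a directed edge from the acting account to the account it engages, then rank accounts with PageRank. The helpers `build_graph` and `pagerank` and the account names are hypothetical.

```python
from collections import defaultdict


def build_graph(interactions):
    """Directed graph: edge u -> v means account u retweeted or mentioned v."""
    nodes = set()
    out_links = defaultdict(set)
    for source, target in interactions:
        nodes.update((source, target))
        if source != target:  # ignore self-mentions
            out_links[source].add(target)
    return nodes, out_links


def pagerank(nodes, out_links, damping=0.85, max_iter=200, tol=1e-12):
    """Power-iteration PageRank over the interaction graph.

    Accounts with no outgoing edges ("dangling" nodes) spread their
    rank evenly over all accounts, so the ranks always sum to 1.
    """
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if not out_links.get(v))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        for u, targets in out_links.items():
            share = damping * rank[u] / len(targets)
            for v in targets:
                new_rank[v] += share
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    # Hypothetical (actor, engaged_account) pairs from captured retweets and mentions.
    interactions = [
        ("a", "hub"), ("b", "hub"), ("c", "hub"),
        ("hub", "a"), ("c", "b"),
    ]
    nodes, out_links = build_graph(interactions)
    rank = pagerank(nodes, out_links)
    for account in sorted(rank, key=rank.get, reverse=True):
        print(account, round(rank[account], 3))
```

In this toy graph "hub" ranks first. PageRank rewards accounts that are retweeted or mentioned by other well-connected accounts, not merely by many accounts; plain in-degree would be a simpler alternative.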
developing research agendas
Globalize traditional research agendas
Multi-modal (broadcast/web/social media)
Multi-lingual/regional
Framing, content analysis, cross-platform agenda setting
Regional, comparative
Dramatically broadens the scope of available data
Enables studies that normally cannot be conducted across cultural/linguistic boundaries
Potential for new methods
Sharable, marked-up databases
Big data approaches
where we are going:
Developing M3S as an academic/pedagogical resource
Developing collaborative research projects with key researchers/institutions
Developing research templates for the archive
Creating datasets of key global media content for use by multiple scholars
E.g., Arab Spring Archive
Developing a corpus of research emerging from the systems to raise their visibility
Technical refinement
Developing a mechanism for incorporating the Digital Object Architecture into the systems
Developing meta-tagging capability for current content
Improving usability of the system
partners
TSWG (Technical Support Working Group, Department of Defense)
Raytheon BBN Technologies
CNRI (Corporation for National Research Initiatives)
For more info:
mms.tamu.edu, gnma.tamu.edu
[email protected], [email protected]