Multi-media Monitoring: The Global Media Lab


digital methodologies for global media research
Randy Kluver
Dept of Communication
Texas A&M University

speech, language and digital media: History
2011 TSWG GMT, DARPA BOLT, operational deployment of Media Monitoring and TransTalk.
2008 First demonstration of end-to-end GALE Distillation, DARPA MADCAT, TSWG CALL.
2005 DARPA GALE contract awarded, focusing on machine translation.
2004 Real-time monitoring of foreign broadcast news with retrieval, alert, and playback capabilities.
2000 Call center automation launched.
1998 Audio Mining system combines speech and language processing to index broadcast news.
1996 Development of multi-lingual optical character recognition (OCR).
1995 Pioneered statistical language understanding and data extraction.
1993 First demonstration of 20,000-word real-time speech recognizer. HARK™ Recognizer product introduced.
1992 First software-only, real-time, large-vocabulary, speaker-independent continuous speech recognition.
1986 Introduced Byblos recognition system with context-dependent phonetic units.
1982 First use of statistical modeling for speaker identification.
1976 BBN developed one of the first continuous speech understanding systems.
1974 First demonstration of speech transmission over ARPANET using 2400 bit/s LPC speech coding.
©Raytheon BBN Technologies Corp., 2011. Proprietary Information. All Rights Reserved.
Multi-Media Monitoring System (M3S) at Texas A&M

Satellite television: stream, automatically transcribe, and translate foreign-language broadcasts
• 24/7 video stream with machine-generated transcript and translation
• Data then remains available within the system for later research

Web sites
• Targeted websites in languages of interest
• Broad-based identification of websites that provide a solid regional perspective

Social media
• Tracks selected social media actors and captures all of their tweets, retweets, and mentions
• Currently supports Twitter and Facebook (Arabic only)
• Russian under development (mid-2016)
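
As an illustration of the actor-based capture described above, here is a minimal sketch in Python. It is not the M3S implementation: the post fields (`author`, `text`, `retweet_of`), the handles, and the sample posts are all hypothetical, and a real collector would read from a platform API rather than a list.

```python
import re

# Hypothetical handles of the social media actors selected for tracking.
TRACKED = {"actor_one", "actor_two"}

# Matches @handle mentions inside the post text.
MENTION = re.compile(r"@(\w+)")


def relevant(post, tracked=TRACKED):
    """Keep a post if a tracked actor wrote it, was retweeted, or was mentioned."""
    author = post["author"].lower()
    retweeted = (post.get("retweet_of") or "").lower()
    mentioned = {m.lower() for m in MENTION.findall(post["text"])}
    return author in tracked or retweeted in tracked or bool(mentioned & tracked)


# Made-up sample posts, for illustration only.
posts = [
    {"author": "actor_one", "text": "statement on the vote", "retweet_of": None},
    {"author": "someone", "text": "RT: statement on the vote", "retweet_of": "actor_one"},
    {"author": "other", "text": "I disagree with @Actor_Two", "retweet_of": None},
    {"author": "other", "text": "unrelated post", "retweet_of": None},
]

captured = [p for p in posts if relevant(p)]
```

Handles are compared case-insensitively so that `@Actor_Two` still counts as a mention of `actor_two`.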
advantages to this system
• Provides a big-data approach to global media studies by harvesting large amounts of content
• Allows English-language access to content that is inaccessible to users without target-language skills
• Allows keyword and Boolean searches on content in either English or the target language across the entire dataset
• Archives material in an unaltered, unedited format for multiple types of analysis
• Creates exportable clips and/or rich media content for incorporation into analysis, reports, or other media
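
The Boolean search over original and translated text can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's actual search engine: the `Segment` schema, the `matches` and `search` helpers, and the sample archive are hypothetical, and matching is done on whole whitespace-separated words with no stemming.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One archived item (hypothetical schema): source transcript plus machine translation."""
    source_text: str
    english_text: str


def matches(segment, all_of=(), any_of=(), none_of=()):
    """Boolean match: every all_of term AND at least one any_of term AND no none_of term."""
    # Search both languages at once by pooling the words of transcript and translation.
    words = set((segment.source_text + " " + segment.english_text).lower().split())
    has_all = all(t.lower() in words for t in all_of)
    has_any = not any_of or any(t.lower() in words for t in any_of)
    has_none = not any(t.lower() in words for t in none_of)
    return has_all and has_any and has_none


def search(archive, **query):
    """Return every segment in the archive that satisfies the Boolean query."""
    return [s for s in archive if matches(s, **query)]


# Made-up sample archive, for illustration only.
archive = [
    Segment("انتخابات الرئاسة غدا", "presidential elections tomorrow"),
    Segment("أسعار النفط ترتفع", "oil prices rise"),
    Segment("احتجاجات بعد الانتخابات", "protests after the elections"),
]

# "elections" AND NOT "protests", queried in English.
hits = search(archive, all_of=["elections"], none_of=["protests"])

# The same archive can be queried with a target-language term.
arabic_hits = search(archive, any_of=["انتخابات"])
```

Because the match is on exact words, the Arabic query finds only the first segment; the third uses the form with the definite article, which a real system would need morphological analysis to match.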
broadcast translation system
story view
pivot view of stories
network analysis of Twitter influence
developing research agendas
• Globalize traditional research agendas
• Multi-modal (broadcast/web/social media)
• Multi-lingual/regional
• Framing, content analysis, cross-platform agenda setting
• Regional, comparative
• Dramatically broadens the scope of available data
• Allows studies that normally cannot be conducted across cultural/linguistic boundaries
• Potential for new methods
• Sharable, marked-up databases
• Big data approaches
where we are going:

Developing M3S as an academic/pedagogical resource
• Developing collaborative research projects with key researchers/institutions
• Developing research templates for the archive
• Creating datasets of key global media content for use by multiple scholars (e.g., Arab Spring Archive)
• Developing a corpus of research emerging from the systems to raise visibility

Technical Refinement
• Developing a mechanism for incorporating digital object architecture into the systems
• Developing meta-tagging capacity for current content
• Improving usability of the system
partners
• TSWG (Technical Support Working Group, Department of Defense)
• Raytheon BBN Technologies
• CNRI (Corporation for National Research Initiatives)

For more info:
• mms.tamu.edu, gnma.tamu.edu
• [email protected], [email protected]