The Long March (長征) to 3D Video

Leonardo Chiariglione
Speech at 3D Systems and Applications
Seoul – 2014/05/28

It has already been not a short march

Analogue: Printing, Photography, Telegraphy, Telephony, Audio recording, Radio, Television, Video recording
Digital: Video conference, Video telephony, Video interactive, Television, (3D TV)

The dimensions of future media

- Time/space resolution
- File format
- Screen content
- Sensors/actuators
- Colour
- Human interaction
- Brightness
- Fusion of real & virtual
- Scalability
- Detection/analysis
- 3D Video
- Linking
- 3D Audio
- Energy saving
- Metadata
- User profile

There has been progress in resolution

- QSIF
- SIF
- Standard Definition (interlaced)
- High Definition (interlaced/progressive)
- 4k (progressive)
- 8k (progressive)

The cost of being digital

Video   #lines  #pixels  Frame freq. (Hz)  Mbit/s (uncompressed)
“VHS”   288     360      25                41
SD      576     720      25                166
HD      1080    1920     25                829
4k      2160    3840     50                6636
8k      4320    7680     50                26542

Audio   Sampling freq. (kHz)  bits/sample  #channels  Mbit/s (uncompressed)
Speech  8                     8            1          0.064
CD      44.1                  16           2          1.411
Stereo  48                    16           2          1.536
5.1     48                    16           5.33       4.093
22.2    48                    16           22.66      17.403

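The uncompressed video rates above follow directly from the raster parameters. A minimal sketch; the 16 bits/pixel figure (4:2:2 chroma sampling, 8 bits per component) is an assumption, chosen because it reproduces the table's Mbit/s column:

```python
# Uncompressed video bitrate = lines * pixels/line * frames/s * bits/pixel.
VIDEO_FORMATS = {          # name: (lines, pixels per line, frames per second)
    '"VHS"': (288, 360, 25),
    "SD":    (576, 720, 25),
    "HD":    (1080, 1920, 25),
    "4k":    (2160, 3840, 50),
    "8k":    (4320, 7680, 50),
}
BITS_PER_PIXEL = 16        # assumption: 4:2:2 sampling, 8 bits per component

for name, (lines, pixels, fps) in VIDEO_FORMATS.items():
    print(f"{name}: {lines * pixels * fps * BITS_PER_PIXEL / 1e6:.0f} Mbit/s")
# -> "VHS": 41, SD: 166, HD: 829, 4k: 6636, 8k: 26542
```
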
Compression is making progress affordable

                      MP1 (92)  MP2 (94)  MP4-V (98)  AVC (03)  HEVC (13)  ?
Base                  ~VHS      2 Mbit/s  -25%        -30%      -60%       ?
Scalable                        -10%      -10%        -25%      -25%       ?
Stereo                          -15%      -15%        -25%      -25%       ?
Depth                                                 -20%      -20%       ?
Selectable viewpoint                                  5/10%     5/10%      ?

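Reading the table with the convention (an assumption, not stated on the slide) that each percentage is a bitrate reduction relative to the previous generation, the "Base" row compounds as follows from the ~2 Mbit/s MPEG-2 point:

```python
# Compounding the per-generation reductions of the "Base" row above,
# assuming each percentage is relative to the previous generation.
rate = 2.0                                 # Mbit/s: the MP2 (1994) point
for codec, cut in [("MP4-V", 0.25), ("AVC", 0.30), ("HEVC", 0.60)]:
    rate *= 1 - cut
    print(f"{codec}: ~{rate:.2f} Mbit/s for comparable quality")
# -> MP4-V ~1.50, AVC ~1.05, HEVC ~0.42
```
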
Are there limits to compression?

- Input bandwidth to humans
  - Eyes: 2 channels of 430–790 THz
  - Ears: 2 channels of 20 Hz – 20 kHz
  - A nerve fiber connecting the senses to the brain can transmit a new impulse every ~6 ms = 167 spikes/s (1 bit ~ 16 spikes)
- Eye
  - 1.2 M fibers transmit 10 bit/s each
  - An eye sends ~12 Mbit/s to the brain
- Ear
  - 30 k fibers in the cochlear nerve
  - An ear sends ~300 kbit/s to the brain

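The eye and ear totals follow from the per-fiber figures above; a quick check of the arithmetic (the slide rounds 167/16 ≈ 10 bit/s per fiber):

```python
# Sensors-to-brain bitrates implied by the figures above.
spikes_per_s = 1 / 0.006              # one impulse every ~6 ms -> ~167 spikes/s
bits_per_fiber = spikes_per_s / 16    # 1 bit ~ 16 spikes -> ~10 bit/s per fiber

print(f"eye: {1.2e6 * bits_per_fiber / 1e6:.0f} Mbit/s")  # 1.2 M optic fibers -> ~12 Mbit/s
print(f"ear: {30e3 * bits_per_fiber / 1e3:.0f} kbit/s")   # 30 k cochlear fibers -> ~0.3 Mbit/s
```
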
Sensors-to-brain bitrates

[Diagram: eye, 430–790 THz, 1.2 M nerve fibers → ~12 Mbit/s; ear, 0.02–20 kHz, 30 k nerve fibers → ~0.3 Mbit/s]

High Dynamic Range and Wider Color Gamut

- High Dynamic Range and Wider Color Gamut can give users a better sense of “being there”, with a viewing experience closer to real-life experience
  - Light bulb: > 10,000 nits
  - Surface lit by sunlight: > 100,000 nits
  - Night sky: < 0.005 nits
- Question: if the dynamic range and the volume of the color gamut increase significantly, are existing MPEG video coding standards able to efficiently support future needs?

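A back-of-envelope calculation shows why the question arises: the luminance range quoted above spans roughly 24 photographic stops, far beyond the range 8-bit video was designed to carry.

```python
import math

# Dynamic range of the examples above, in stops (doublings of luminance).
low, high = 0.005, 100_000                    # nits: night sky vs sunlit surface
print(f"{math.log2(high / low):.1f} stops")   # -> ~24.3
```
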
Wider Color Gamut

[Chromaticity diagram comparing the ITU-R BT.709 and ITU-R BT.2020 color gamuts]

Dynamic Range: Examples

[Example images: bright areas can have > 10,000 cd/m² luminance; dark areas can have < 0.01 cd/m² luminance]

Screen Content applications

- Wireless display
- Cloud computing and gaming
- Companion screen
- Factory automation display
- Control rooms with high-resolution display walls
- Digital operating room (DiOR)
- Virtual desktop infrastructure (VDI)
- Screen/desktop sharing and collaboration
- Supervisory control and data acquisition (SCADA) display
- Automotive/navigation display
- PC over IP (PCoIP)
- Ultra-thin client
- Remote sensing

Use case #1: Hi-res display wall
Use case #2: Collaboration
Use case #3: DiOR

Where we are

- January 2014: Joint Call for Proposals for Coding of Screen Content
- April 2014: Proposal evaluation
  - Conclusion: evidence that significantly improved coding efficiency can be obtained by exploiting screen content characteristics with novel dedicated coding tools
- April 2014: Standardization plan and tentative timeline
  - First Test Model: Apr. 2014
  - PDAM: Oct. 2014
  - DAM: Feb. 2015
  - FDAM: Oct. 2015

Test sequence #1 (text and graphics with motion)
Test sequence #2 (text and graphics with motion)
Test sequence #3 (mixed content)
Test sequence #4 (animation)
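
One reason dedicated tools pay off on sequences like these (a hedged, illustrative sketch, not an MPEG algorithm): text and graphics typically use very few distinct colors per block, so a palette plus an index map can beat general-purpose transform coding. The function below is hypothetical.

```python
import numpy as np

def palette_candidate(block: np.ndarray, max_colors: int = 8):
    """If a block has few distinct colors, return (palette, index map)."""
    pixels = block.reshape(-1, block.shape[-1])          # rows of RGB values
    palette, indices = np.unique(pixels, axis=0, return_inverse=True)
    if len(palette) > max_colors:
        return None              # camera-captured content: code it normally
    return palette, indices.reshape(block.shape[:2])

# A flat two-color block (e.g. black text on white) yields a tiny palette.
block = np.zeros((8, 8, 3), dtype=np.uint8)
block[2:6, 2:6] = 255
palette, idx = palette_candidate(block)
print(len(palette), "colors")    # -> 2 colors
```
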
MPEG standards for coding multiple cameras

- A long history, starting from MPEG-2 (mid 1990s)
- MPEG standards (existing and under development)
  - Multiview coding: can only display views captured at the source
  - Depth-based coding: can also display a limited number of additional views
- Camera arrangement: cameras are assumed to be linearly arranged

Free viewpoint television (FTV)/1

- Free viewpoint television (FTV): a hypothetical 3D transmission system that enables a viewer to select arbitrary viewpoints, inside and outside a scene
- FTV requires many technologies, not just from MPEG
- A 3D video format supporting the generation of views not already included in the bitstream generated by the encoder would be a major enabler for FTV
- Purpose of the MPEG FTV exploration: to develop the know-how that enables MPEG to develop such a 3D video format

Free viewpoint television (FTV)/2

- Areas considered in the MPEG FTV exploration
  - Compare and evaluate the depth quality attainable for general camera arrangements
  - Evaluate view synthesis algorithms and improve their performance
  - Investigate the coding efficiency of the most promising coding technologies currently available
  - Investigate the influence of mis-registration on view synthesis performance
  - Investigate the representation capability of BIFS to clarify the elements that need to be standardized

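View synthesis is central to the exploration above. A minimal depth-image-based rendering sketch for the linear camera arrangements assumed earlier (illustrative only, not an MPEG reference algorithm): each pixel is warped horizontally by its disparity d = f·b/Z.

```python
import numpy as np

def synthesize_view(image, depth, f, baseline):
    """Forward-warp a view by per-pixel disparity d = f * baseline / Z.

    image: 2D array (grayscale); depth: 2D array of depths Z > 0;
    f: focal length in pixels; baseline: distance to the virtual viewpoint.
    """
    h, w = depth.shape
    out = np.zeros_like(image)
    zbuf = np.full((h, w), np.inf)      # z-buffer: nearer pixels win conflicts
    disparity = np.round(f * baseline / depth).astype(int)
    for y in range(h):
        for x in range(w):
            xs = x + disparity[y, x]
            if 0 <= xs < w and depth[y, x] < zbuf[y, xs]:
                out[y, xs] = image[y, x]
                zbuf[y, xs] = depth[y, x]
    return out   # disocclusions remain as holes, to be inpainted separately
```
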
FTV Seminar: A Viewing Revolution in the Making

Date: 8 July 2014, 14:00–18:00
Venue: Main Hall B, Sapporo Convention Center, Sapporo, Japan
Exhibition of FTV demos: Room 101, 10:00–17:00, July 1 to 4

3D Audio: NHK Loudspeaker Array Frame

[Photo of the NHK loudspeaker array frame used for 22.2-channel audio]

Parallel worlds

- For centuries humans have been building two different types of worlds: the Physical and the Informational
[Diagram: Books, Films, Knowledge, and Music placed between the Physical and Informational worlds]

Immersion

- A definition of immersion: a state in which a human’s connections with the
  - Physical world are severed
  - Informational world are activated

How far is immersion progressing?

Fairly…
…or too far?

Can we reconnect the two worlds?

- Smartphones
  - Enable universal access to the Informational world while also sensing the Physical world
  - Enhance the history and meaning of the real world with powerful digital elements
- Let’s create two-way bridges between the Physical and Informational worlds
  - Extend reality to the virtual
  - Add reality to the virtual

Functions of an Augmented Reality browser

- Retrieve the scenario from the internet
- Start video acquisition and track objects
- Recognise objects and recover the camera pose
- Get streamed 3D graphics and compose new scenes
- Get input from various sensors
- Access interaction possibilities and objects from a remote server
- Adapt to offer an optimal AR experience

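Taken together, these functions amount to a sense-recognise-augment loop. A schematic, runnable toy (every name here is a hypothetical placeholder, not the ARAF API):

```python
# Schematic AR browser loop: acquire frames, recognise the target named in
# the scenario, recover a (stubbed) pose, and compose the overlay.
def ar_browser(scenario, frames):
    for frame in frames:                              # video acquisition
        if scenario["target"] in frame:               # object recognition (stubbed)
            pose = frame.index(scenario["target"])    # camera pose recovery (stubbed)
            print(f"frame '{frame}': overlay {scenario['overlay']} at pose {pose}")
        else:
            print(f"frame '{frame}': nothing recognised, show as-is")

ar_browser({"target": "poster", "overlay": "streamed 3D model"},
           ["street", "street poster", "poster close-up"])
```
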
The AR technology chain

[Diagram: the MPEG ARAF (Augmented Reality Application Format) Browser sits between the User and the Local Real World Environment, connected to Local and Remote Sensors & Actuators, the Remote Real World Environment, Authoring Tools, Media Servers, and Service Servers]

Augmented Reality Application Format

- A set of MPEG-4 scene graph nodes
  - Audio, image, video, graphics, programming, communication, user interactivity, animation
  - Map, MapMarker, Overlay, ReferenceSignal, ReferenceSignalLocation, CameraCalibration, AugmentedRegion
- Connection to sensors defined in MPEG-V
  - Orientation, Position, Angular Velocity, Acceleration, GPS, Geomagnetic, Altitude, local camera(s)
- Compressed media
  - Image, (3D) sound, (3D) video, 2D/3D graphics

The whole used to be the message

Classic books: the value is in the content as a whole

Today the link adds value to the message

On-line knowledge: the value is in the link

The video used to be the message

Classic video content: the value is in the content as a whole

Next the link will add value to the video message

New video content: the value is in the link
(From the EU FP7 BRIDGET project)

An unequal fight

- Many new services, all more demanding in bandwidth
- Compression improves, but cannot cope with all the demands by itself
  - UHD has 4 times the uncompressed bitrate of HD, but HEVC “only” compresses twice as well as AVC
  - And we have HDR, WCG, SCC, FTV…
- At prime time, 30% of USA internet traffic is taken by Netflix
- We need more tools to solve the problem

The mobile industry perspective

10× more spectrum × 10× better spectrum utilisation × 10× more base stations = 1000× more capacity

Making the network smarter

- Video has the lion’s share of internet traffic, all the more as we add more dimensions to the user experience
- We need to cope with (human and vehicle) mobility
  - More and more of human life happens on the move
- We need new, smarter approaches instead of just throwing more network capacity at the problem, beyond
  - Digital video recording (on premises or networked)
  - Peer-to-Peer (P2P) overlays
  - Content Distribution Networks (CDNs)

Video and Information Centric Networking

[Diagram: migration path from today’s IP infrastructure to publish/subscribe support for ICN; the same content is available at different network locations; client, content, and network mobility under energy-consumption constraints]
(From the FP7/NICT EU-Japan GreenICN project)

Green MPEG

[Diagram: at the sender, a Media Pre-processor and Media Encoder, each with a Green Metadata Generator, output Encoded Media plus Green Metadata under power control; at the receiver, a Media Decoder with a Green Metadata Extractor feeds the Presentation Subsystem, both governed by power-optimization modules, with Green Feedback returned to the sender]

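A minimal sketch of the receiver side of this diagram. The metadata field and the control policy are assumptions for illustration, not the Green Metadata syntax: the decoder reads a per-segment complexity hint and the power-optimization module picks the lowest adequate operating point.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    media: bytes
    complexity: float           # hypothetical green-metadata hint in [0, 1]

def cpu_level(hint: float, levels=(0.25, 0.5, 0.75, 1.0)) -> float:
    """Lowest operating point with ~10% headroom above the decoding hint."""
    return next((lvl for lvl in levels if lvl >= hint * 1.1), levels[-1])

for seg in [Segment(b"", 0.2), Segment(b"", 0.6), Segment(b"", 0.9)]:
    print(f"hint {seg.complexity:.1f} -> run decoder at {cpu_level(seg.complexity):.2f}")
# -> 0.2 runs at 0.25, 0.6 at 0.75, 0.9 at full speed
```
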
http://mpeg.chiariglione.org/