Improving Performance for Enterprise Applications
Presenter: David C Stewart, Software Engineering Manager
March 2016
Legal Information
Intel technologies, features and benefits depend on system configuration and may require enabled hardware, software or service
activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your
system manufacturer or retailer or learn more at intel.com.
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel
microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee
the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel
microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more
information regarding the specific instruction sets covered by this notice. Notice Revision #20110804
No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document.
Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a
particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in
trade.
This document contains information on products, services and/or processes in development. All information provided here is subject to
change without notice. Contact your Intel representative to obtain the latest forecast, schedule, specifications and roadmaps.
The products and services described may contain defects or errors known as errata which may cause deviations from published
specifications. Current characterized errata are available on request.
Copies of documents which have an order number and are referenced in this document may be obtained by calling 1-800-548-4725 or by
visiting www.intel.com/design/literature.htm.
Intel, the Intel logo, Intel vPro, Look Inside., the Look Inside. logo, Intel Xeon Phi, and Xeon are trademarks of Intel Corporation in the
U.S. and/or other countries.
*Other names and brands may be claimed as the property of others.
© 2016 Intel Corporation.
INTEL CONFIDENTIAL
Executive Summary
• Software development is changing dramatically with new apps being built
for the cloud and the software stack center of gravity shifting to runtimes
• Intel is optimizing the top-tier runtimes to impact leading data center growth
segments… runtimes teams are delivering significant gains
• Key Take-aways
– Understand how we optimize Runtimes
– Learn about Intel tools for optimizing code
– Discover ways to engage on requirements, tools and techniques
* Runtimes: Java, .NET, PHP, HHVM, Python, Node.js, Go, R
Agenda
• Runtimes Environment
• How We Optimize Runtimes
• Runtimes Languages
• Innovations
• Guidance
• Summary
Software Development is Changing
New apps being developed for the Cloud
• Accelerate development
• Operate at scale
• Continuously deploy
Software stack shifting from compiled
to runtime code
Platform-as-a-Service is becoming the
preferred development model
[Chart: Languages. Series: total of top runtimes, C, C++, C#, Go, Java, JavaScript, Perl, PHP, Python, Ruby]
[Figure: PaaS Solution Providers]
Source: openhub tracking of open source application development www.openhub.net/languages
Language Adoption is Application Domain Specific
[Figure: runtimes grouped by application domain. Domains: Application Server, Web Hosting; Tech Computing; Software Defined Infrastructure; Analytics, Machine Learning, Big Data. Runtimes shown: Java, Go, Python, .NET, R, PHP, HHVM, Node.js]
Runtimes Scope and Impact
Ensure runtimes run best on Intel
• Broad impact across apps and
infrastructure, fundamental to Intel data
center strategy
• Runtimes provide broad portability
between platforms and operating systems
• Runtimes critical to Enterprise and leading
data center growth areas: Cloud, Big Data,
IoT, HPC
[Figure: Intel-defined SDI Software Stack. Labels, in extraction order: Developer, Services, Orchestration, Cloud Services: PaaS, Languages, IaaS, Self-service Portal, Hypervisor, SDI, Libraries, Virtualization, Controller, Intel, Compute, OS, Infrastructure, Technologies, Applications, Server, Silicon]
SDI – Software Defined Infrastructure
Intel® Software Developer Tools
Technical, Enterprise & Cloud Computing
Faster Code Faster for Cluster,
Workstation and Server Computing.
Standards Based, Performance
Driven Tools and Libraries
Visual Computing and Media
Create Visually Stunning Media and
Graphics Intensive Applications
on Intel® Architecture
Unleash Your Code
Embedded Systems & IoT
Deep System Wide Insights for
Smart Code & Fast Time to Market
for Embedded Systems & IoT
Development
Mobile Client
One Code Base - All-in-One
Cross-Platform IDE for Mobile
Applications Development
How Do We Optimize Runtimes? (example: Python)
• Find a workload: representative, repeatable, 100% CPU (ssbench)
• Hot function analysis with perf / VTune (main dispatch loop)
• Instrumenting opcodes to find the hot ones (not conclusive)
• Use emon to determine behavior
Processor front-end
• Heavy front-end load (50%)
• Due to large code footprint
• What contributes to high pathlength?
• Interpreted languages, dynamic typing are inefficient
• Branch mispredicts, i-cache misses
[Figure: processor pipeline stages: Fetch, Decode, Execute, Memory, Commit]
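The hot function analysis step above can be sketched with Python's standard-library profiler. This is only a stand-in: the slide names perf and VTune, while cProfile is swapped in here because it needs no extra tooling. The `dispatch` and `workload` functions are hypothetical placeholders for a real, repeatable workload such as ssbench.

```python
import cProfile
import io
import pstats


def dispatch(op, value):
    # Hypothetical stand-in for one step of an interpreter's dispatch loop.
    if op == "inc":
        return value + 1
    return value * 2


def workload(iterations=50_000):
    # Hypothetical CPU-bound, repeatable workload.
    total = 0
    for i in range(iterations):
        total = dispatch("inc" if i % 2 else "dbl", total) % 1_000_003
    return total


def hottest_functions(func, limit=5):
    """Profile func() and return a text report sorted by self time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("tottime").print_stats(limit)
    return buffer.getvalue()


report = hottest_functions(workload)
print(report)
```

Sorting by `tottime` ranks functions by time spent in their own body, which is the view that points at a hot dispatch loop rather than at its callers.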
Tough Slog? Or Radical Turn?
• Tough slog: optimize existing CPython code
• PGO (profile-guided optimization): train the compiler to do your optimization for you (2-10% on Python)
• Determine root of poor i-cache packing and branch mispredicts
(developed tool and open sourced it) (10-12% on HHVM)
• Radical turn: Drive PyPy and MKL adoption in the industry
Python Performance Boost on
Select Numerical Functions
Python
Loose typing + large ecosystem = popularity
• SDI: 70% of OpenStack written in Python
• “Access language” for Big Data, Machine Learning, HPC
1. Upstream optimizations and enabling
• OpenStack Swift: 70% of cycles spent in Python interpreter
• PyPy yields up to 111% gain on ssbench requests/sec;
POC with SwiftStack planned
[Figure: OpenStack Swift test setup. Labels: ssbench, PROXY, AUTH, Swift, STORAGE NODES]
2. Intel® Distribution for Python (avail May 2016)
• Targeted at machine learning, data science, and scientific
computing; Windows*, Linux*, OS X*; Python 2.7 & 3.5
• Optimized for Phi & Xeon by leveraging MKL & DAAL:
includes scikit-learn, NumPy, SciPy, pandas, Matplotlib,
IPython, SymPy, NumExpr, distarray, mpi4py, pyDAAL &
HDF5 support
• Compatible with Continuum* Anaconda Python Distribution
• Up to 4x on single thread and nearly 100x faster on 32
threads
What to use?
• Enterprise application: use PyPy
• Scientific compute using NumPy/SciPy: use Intel® Distribution for Python
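The "What to use?" guidance on the Python slide can be written down as a small helper, shown here as a sketch only. The function names `recommended_runtime` and `describe_current_interpreter` are hypothetical. The one real API used is `platform.python_implementation()`, which reports whether the code is currently running on CPython, PyPy, or another implementation.

```python
import platform


def recommended_runtime(uses_numpy_scipy):
    """Mirror the slide's two-way choice (hypothetical helper)."""
    if uses_numpy_scipy:
        # Scientific compute using NumPy/SciPy
        return "Intel Distribution for Python"
    # Enterprise application
    return "PyPy"


def describe_current_interpreter():
    """Report which Python implementation and version is running."""
    return f"{platform.python_implementation()} {platform.python_version()}"


print(describe_current_interpreter())
print(recommended_runtime(uses_numpy_scipy=False))
```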
PHP – the “P” in LAMP
PHP has strong adoption as a language:
• Most popular language for websites
• 80% of the 1.2B websites globally
• 5M PHP developers worldwide
Similar performance challenges to Python
• No JIT, strong compatibility concerns for major
changes
Zend Collaboration
• Analysis showed function hotspots in allocating /
freeing
• Zend addressed these hotspots, resulting in significant improvement
Use PHP 7!
Zend Infographic: http://www.zend.com/en/resources/php7_infographic
HipHop VM (HHVM)
• Developed by Facebook to accelerate
their PHP code
• Several assembly-level optimizations
have resulted in generous improvements
in real customer workloads
• Facebook performance Lockdown in June
– Linker change to load hot functions together
in the cache (2% improvement)
• Memory operations are 40% faster than
glibc in some cases, drive improvements
Impact: Facebook accelerated Haswell (HSW) upgrade
WordPress improvement observed on Haswell
Optimization                          Improvement
Compile with AVX2 enabled             5%
memset() assembly tuning              1.8%
memcpy() assembly tuning (generic)    3%
Node.js
Leading runtime for web, mobile, IoT back-end, APIs, foundation of
the emerging MEAN stack
• Customers: Walmart, PayPal, LinkedIn, Yahoo, Netflix, IBM, MSFT, Uber, GE,
eBay, ADP, Citi, Fidelity, Goldman Sachs, Wells Fargo, Red Hat, CA, Oracle
• Intel’s contribution to V8 for client – Power, performance, 439 patches to date
• Intel joins the board of the new Node.js Foundation
• Microsoft submitted patch to integrate Node with Chakra
New in 2016: Optimize Node.js for Servers
• Create an industry standard benchmark (with IBM & Node project)
• Improve core Node libraries: compile using PGO/LTO, libuv
• HTTP parser: DPDK, SO_REUSEPORT
• Garbage collector
• Key NPM modules: Websockets
Can Use Help – We’re Hiring, Looking for Workloads
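The Node.js slide lists SO_REUSEPORT without describing it. As a minimal sketch of what the socket option does, shown in Python rather than Node.js: several listening sockets can bind the same port, and the kernel spreads incoming connections across them. The helper name `reuseport_listener` is hypothetical, and the option is assumed to be available (Linux 3.9+ and some BSD-derived systems; it is absent on Windows).

```python
import socket


def reuseport_listener(port, host="127.0.0.1", backlog=128):
    """Create a listening TCP socket that shares its port via SO_REUSEPORT."""
    if not hasattr(socket, "SO_REUSEPORT"):
        raise OSError("SO_REUSEPORT is not available on this platform")
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The option must be set before bind() on every socket sharing the port.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen(backlog)
    return sock


if __name__ == "__main__" and hasattr(socket, "SO_REUSEPORT"):
    first = reuseport_listener(0)        # port 0: let the OS pick a free port
    port = first.getsockname()[1]
    second = reuseport_listener(port)    # second listener on the same port
    print("both listeners bound to port", port)
    first.close()
    second.close()
```

In a server this would typically be one listener per worker process, which avoids funnelling every accept through a single socket.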
Java
Dominant in Enterprise; strong growth in
cloud and Big Data
Long Collaboration with BEA/Sun/Oracle
• IA optimized for Java JIT compiler and libraries
• Lead/influence Java Specification Evolution
through OpenJDK Project Participation
• Optimizations span multiple verticals and
platforms; enterprise, cloud, BigData, IoT
Major focus areas 2016+
• Continue driving adoption of new IA technologies
• SIMD/vector support in Java
• Garbage-collection Optimizations
[Chart: SPECjbb*2005 Performance, y-axis 0x to 35x, HW and HW+SW bars per platform: Xeon X5160 / JDK5u7; Xeon X5535 / JDK6; Xeon X5470 / JDK6u5p; Xeon X5570 / JDK6u14p; Xeon X5680 / JDK6u21p; Xeon E5-2690 / JDK6u29; Xeon E5-2697 / JDK6u29]
Performance gain over 7 Xeon generations
12X Hardware
32X Hardware + Software
Customer/developer feedback critical to drive IA Feature Adoption
* Other names and brands may be claimed as the property of others.
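One way to read the two headline numbers on the Java slide: if hardware and software gains are assumed to compose multiplicatively (an assumption, not something the slide states), the share attributable to software is the ratio of the two. The function name `software_multiplier` is hypothetical.

```python
def software_multiplier(hw_gain, hw_sw_gain):
    """Software-only factor, assuming total gain = hardware gain x software gain."""
    return hw_sw_gain / hw_gain


# 12X from hardware alone, 32X from hardware plus software (figures from the slide).
print(round(software_multiplier(12, 32), 2))  # 2.67
```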
.NET Server Runtimes
Microsoft has open sourced .NET Framework
Latest version is RC2, with 1.0 targeted for late Q1 2016
• .NET Proprietary: Windows only
• .NET Open source: Windows, Linux, Mac OS
.NET Core consists of:
Runtime (CoreCLR)
– RyuJIT compiler, GC, base library called mscorlib etc.
Fundamental Libs (CoreFX)
– Foundational libraries, classes for collections, file systems,
console, XML etc.
Contributions
• Already contributed Intel optimized compression library
delivering up to ~30% gain*
[Figure: .NET stack. Labels: .NET Application, ASP.NET, CoreFX, Native, CoreCLR, OS]
Open source .NET is the path going forward!
* Performance improvement over compression workload: http://links.uwaterloo.ca/Repository.html https://en.wikipedia.org/wiki/Calgary_corpus
Go
Core SDI programming language
• CoreOS, Docker, Kubernetes, Cloud Foundry, etc.
Enabling in progress
• 13 patches in Go 1.6 release (Feb 2016): more AVX/AVX2 instructions,
TSX, memory/string function improvements of up to 200%
• SHA, AES and other new instruction support in Go toolchain
• Hashing/Encryption optimization in Go runtime libraries using SHA/AES
• etcd: several workloads improved by up to 15%;
Kubernetes: identified hotspots
• Go support accepted into VTune 2016 Update 2 (experimental feature)
• Check out Amplifier XE 2017!
Some focus areas for 2016
• Vectorization support (AVX, AVX2), acceleration of crypto, hashing,
DPDK, Docker, JSON parsing, garbage collector and a lot more.
R
R is a language and environment for statistical computing and
visualization
Use of R
• Extracting or Data Mining Information from Large Data Sets
• Data Exploration, Cleaning, Visualization, Statistics and Analysis
• Used in Finance, Bio-Science, Retail, Manufacturing, etc
Status of R at Intel
• Exploring the market and doing deeper dives to understand where we can optimize
for the runtimes
• Developing R interfaces into Intel® Data Analytics Acceleration Library (Intel® DAAL)
– Targeted Technical Preview late 2016
Innovations
0-Day Infrastructure
• Master code branch changes daily, performance can swing
5%-10% daily
• “0-day” lab builds Master every night, runs performance
benchmarks (on latest IA), reports impact to open source
communities
• “This is a nice canary in the coal mine” – Python’s
originator
• “This makes a lot of sense for runtimes” – Cloud Foundry
CTO
0-Day dashboard – daily performance results
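The nightly comparison described above can be sketched as follows. This is an illustrative assumption about how such a check might work, not the actual 0-day implementation: the function `flag_swings`, the benchmark names, and the scores are all hypothetical. The 5% threshold is taken from the low end of the daily swing range quoted on the slide.

```python
def flag_swings(baseline, latest, threshold=0.05):
    """Return {benchmark: relative_change} for changes at or beyond the threshold.

    Scores are assumed to be higher-is-better; a negative change is a regression.
    """
    flagged = {}
    for name, old_score in baseline.items():
        new_score = latest.get(name)
        if new_score is None or old_score == 0:
            continue  # benchmark missing from the latest run, or no usable baseline
        change = (new_score - old_score) / old_score
        if abs(change) >= threshold:
            flagged[name] = round(change, 4)
    return flagged


# Hypothetical scores from two consecutive nightly builds of Master.
yesterday = {"django": 100.0, "json_dump": 250.0, "regex": 80.0}
today = {"django": 93.0, "json_dump": 252.0, "regex": 88.0}
print(flag_swings(yesterday, today))  # {'django': -0.07, 'regex': 0.1}
```

Reporting both regressions and improvements matches the slide's point that performance can swing in either direction from one day to the next.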
VTune™ for Python, Node.js & Go
• Unmatched capabilities. Low overhead, line level
information, support for threaded code, unified view of
Python and native extensions in C
• Support for Go and Python available as of Intel® Parallel
Studio XE 2016 Update 2. Mixed Python/C profiling coming
in Parallel Studio XE 2017 Beta (April)
• Customer success: a large multinational financial institution
used the Python support in VTune and found a logging
bottleneck, which they were able to pinpoint and resolve for greatly
increased performance!
Profiling Chromium gyp script using VTune Python Profiler
Guidance For Working With Developers Using Runtimes
• Use standard optimization methodology
• For all Runtimes use the latest release
• Track the Intel 0-day report for the latest performance results…
a newer, intermediate version may be a superior option
• Use PyPy or Intel Python distribution
• Intel Software Developer Zone for Developer Runtime Languages is
a resource with latest white papers, technical content, BKMs,
software downloads, blogs
• Escalate to SSG AE Services if you have a significant customer/ISV
engagement requiring Runtimes engineering support
In Closing…
Software Development is
changing: cloud native apps, a shift
to runtimes, the use of PaaS
Intel is making significant
investment and contributions to
runtime languages
As an application developer you should
• Get involved: engage with us on pain points, workloads, opportunities
• Apply: guidance when optimizing runtime-based applications
• Feedback: where we can improve our runtime support
Source: Global Developer Population and Demographics Survey: Volume II © 2015 Evans Data
Thank You
Q&A
Resources
• Developer Runtime Language Software Developer Zone
• Intel® Distribution for Python
• Intel® VTune Amplifier Profiler
• Intel 0-Day Runtimes Performance
Intel® Software Developer Tools & Programs
Open Source Programs
• Intel® Open Source Community – 01.org
• Developer Runtimes Language Software
Developer Zone – software.intel.com/runtimes
Commercial Software
• Intel® Software Development Tools software.intel.com/intel-sdp-home
• Download, Evaluate & Buy
• Fully Supported
Free Software Tools
• Intel® Software Development Tools – Supporting Qualified Students, Educators, Academic
Researchers and Open Source Contributors - software.intel.com/qualify-for-free-software
Get Started Today – Visit, Download & Optimize