Application of Python in Big Data
B.Balamurugan,
B.Balaji
Dept. of EEE
Velammal Institute of Technology,
Chennai.
ABSTRACT
Increasingly large datasets of processes in space and
time demand statistical models and methods that
can cope with such data. It is shown that the
solution of a stochastic advection-diffusion partial
differential equation provides a flexible
model class for spatio-temporal processes that is
computationally feasible even for large data sets.
The Gaussian process defined through this stochastic partial
differential equation generally has a non-separable
covariance structure.
In addition, its parameters can be physically interpreted as
explicitly modeling phenomena such as transport and
diffusion that occur in many natural processes in
diverse fields ranging from environmental sciences to
ecology. To obtain computationally efficient statistical
algorithms, we use spectral methods to solve the stochastic partial
differential equation. This has the advantage that
approximation errors do not accumulate over time and that,
in spectral space, the computational cost grows linearly
with the dimension; the total cost of Bayesian or
frequentist inference is dominated by the fast Fourier transform.
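To make the spectral idea concrete, here is a minimal sketch of our own, not the algorithm of the work summarised above. It propagates only the deterministic part of a one-dimensional advection-diffusion equation, ∂ξ/∂t = −v ∂ξ/∂x + D ∂²ξ/∂x² − ζξ, on a periodic grid; the stochastic forcing term is left out. In Fourier space every mode evolves independently, so one time step is an exact multiplication between a forward and an inverse FFT, and the FFTs dominate the cost. The function name `spectral_step`, the periodic domain and the parameter names are assumptions made for illustration.

```python
import numpy as np


def spectral_step(field, dt, velocity, diffusion, damping=0.0, length=1.0):
    """Advance a periodic 1-D field by dt in Fourier space (illustrative sketch).

    Solves d(xi)/dt = -velocity * d(xi)/dx + diffusion * d2(xi)/dx2 - damping * xi
    exactly for each Fourier mode; no stochastic forcing term is included.
    """
    n = field.size
    # Angular wavenumbers in the ordering used by np.fft.fft
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=length / n)
    # Exact propagator of each mode over one time step
    propagator = np.exp((-1j * k * velocity - diffusion * k**2 - damping) * dt)
    return np.real(np.fft.ifft(propagator * np.fft.fft(field)))


# An odd number of points avoids an unpaired Nyquist mode, so dropping the
# (rounding-level) imaginary part loses nothing
x = np.linspace(0.0, 1.0, 63, endpoint=False)
xi0 = np.exp(-100.0 * (x - 0.5) ** 2)  # Gaussian bump centred at x = 0.5
xi1 = spectral_step(xi0, dt=0.1, velocity=0.2, diffusion=1e-3)
```

Because each mode's update is exact, two steps of length dt/2 reproduce one step of length dt up to rounding, which is the sense in which time-stepping errors do not build up.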
BIG DATA ANALYTICS

Big data analytics is the process of examining
large data sets containing a variety of data types (i.e., big data) to uncover hidden patterns,
unknown correlations, market trends, customer
preferences and other useful business
information. The analytical findings can lead to
more effective marketing, new revenue
opportunities, better customer service, improved
operational efficiency, competitive advantages
over rival organizations and other business
benefits.
ABOUT BIG DATA ANALYTICS

Big data analytics is the use of advanced analytic techniques
against very large, diverse data sets that include different
types such as structured/unstructured and streaming/batch,
and different sizes from terabytes to zettabytes. Big data is a
term applied to data sets whose size or type is beyond the
ability of traditional relational databases to capture,
manage, and process the data with low latency. It has
one or more of the following characteristics: high volume,
high velocity, or high variety. Big data comes from sensors,
devices, video/audio, networks, log files, transactional
applications, the web, and social media, much of it generated in
real time and on a very large scale.
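Streaming data often cannot be stored in full before it is analysed. One standard technique for this, shown as a minimal sketch assuming records arrive one at a time as plain numbers, is Welford's online algorithm: it keeps a running count, mean and variance in constant memory and updates them as each value arrives. The class name `RunningStats` and the sample readings are hypothetical.

```python
class RunningStats:
    """Online mean and variance (Welford's algorithm) for a stream of numbers."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        # The second factor uses the updated mean, which keeps this numerically stable
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self):
        # Sample variance; undefined for fewer than two values
        return self._m2 / (self.count - 1) if self.count > 1 else float("nan")


def sensor_stream():
    # Stand-in for an unbounded source such as a socket or message queue
    for reading in [21.0, 21.5, 22.1, 21.8, 22.4]:
        yield reading


stats = RunningStats()
for reading in sensor_stream():
    stats.update(reading)
print(stats.count, round(stats.mean, 3), round(stats.variance, 4))
```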

Analyzing big data allows analysts, researchers,
and business users to make better and faster
decisions using data that was previously
inaccessible or unusable. Using advanced
analytics techniques such as text analytics,
machine learning, predictive analytics, data
mining, statistics, and natural language
processing, businesses can analyze previously
untapped data sources independently or together
with their existing enterprise data to gain new
insights resulting in significantly better and faster
decisions.
Enterprises demand access to huge volumes of
data and rely on powerful insights from that
data to produce better business outcomes.
 IBM big data solutions, featuring enterprise
Hadoop solutions, enable users to store,
manage and analyze data across numerous
sources while making data accessible to
business analysts, data scientists and IT users.

FEATURED BIG DATA SOLUTIONS
Hadoop system
Use distributed storage and processing of large
amounts of structured and unstructured data
to gain business insight.
 Stream computing
Harness data streams, including the Internet of
Things, for context-aware, near real-time data
processing and analytics.
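Hadoop's processing model is MapReduce: a map step turns each input record into key-value pairs, a shuffle groups the pairs by key, and a reduce step aggregates each group. The sketch below imitates those three phases in plain Python on a single machine with a word count; it does not use Hadoop itself, and the function names are illustrative.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Emit (word, 1) for every word in one input record
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    # Group all emitted values by key, as the framework does between map and reduce
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Aggregate the values collected for one key
    return key, sum(values)


lines = ["big data needs big tools", "Python is a big help"]
mapped = chain.from_iterable(map_phase(line) for line in lines)
counts = dict(reduce_phase(k, v) for k, v in shuffle_phase(mapped).items())
```

On a cluster the map and reduce functions would run in parallel on many nodes (for example through Hadoop Streaming), while the framework performs the shuffle.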

FEDERATED DISCOVERY AND NAVIGATION

Help organizations access and analyze
information across the enterprise.
 Big data analytics is the process of collecting,
organizing and analyzing large sets of data
(called big data) to discover patterns and other
useful information.
BIG DATA REQUIRES HIGH-PERFORMANCE
ANALYTICS

To analyze such a large volume of data, big data
analytics is typically performed using specialized
software tools and applications for predictive
analytics, data mining, text mining, forecasting and
data optimization. Collectively, these processes are
separate but highly integrated functions of high-performance analytics. Using big data tools and
software enables an organization to process
extremely large volumes of data that a business
has collected to determine which data is relevant
and can be analyzed to drive better business
decisions in the future.
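Forecasting is one of the functions listed above. As a deliberately small illustration (real workloads would use dedicated libraries and far more data), a linear trend can be fitted to a series with NumPy's least-squares polynomial fit and extrapolated forward. The monthly figures are made up for the example.

```python
import numpy as np

# Hypothetical monthly sales figures (illustrative data only)
months = np.arange(1, 9)
sales = np.array([102.0, 108.0, 113.0, 121.0, 126.0, 131.0, 139.0, 144.0])

# Fit sales ≈ slope * month + intercept by least squares (highest degree first)
slope, intercept = np.polyfit(months, sales, deg=1)

# Extrapolate the fitted trend three months ahead
future_months = np.arange(9, 12)
forecast = slope * future_months + intercept
print(np.round(forecast, 1))
```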
THE CHALLENGES OF BIG DATA ANALYTICS


For most organizations, big data analysis is a challenge.
Consider the sheer volume of data and the different formats
of the data (both structured and unstructured data) that is
collected across the entire organization and the many
different ways different types of data can be combined,
contrasted and analyzed to find patterns and other useful
business information.
The first challenge is in breaking down data silos to access
all data an organization stores in different places and often
in different systems. A second big data challenge is in
creating platforms that can pull in unstructured data as
easily as structured data. This massive volume of data is
typically so large that it's difficult to process using
traditional database and software methods.
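Both challenges can be illustrated with pandas, assuming two hypothetical sources: a structured customer table exported from one system and free-text log lines from another. A regular expression turns the unstructured lines into rows, and `DataFrame.merge` joins them on a shared key; `indicator=True` adds a `_merge` column showing which customers have no matching log entries. The log format and column names are assumptions made for the example.

```python
import re

import pandas as pd

# Structured silo: e.g. an export from a CRM database
customers = pd.DataFrame(
    {"customer_id": [1, 2, 3], "region": ["North", "South", "East"]}
)

# Unstructured silo: free-text application log lines (hypothetical format)
log_lines = [
    "2024-01-05 INFO customer=1 purchased amount=120.50",
    "2024-01-05 WARN customer=4 payment retry",
    "2024-01-06 INFO customer=2 purchased amount=75.00",
]

# Pull structure out of the text with a regular expression
pattern = re.compile(r"customer=(\d+) purchased amount=([\d.]+)")
purchases = pd.DataFrame(
    [
        {"customer_id": int(m.group(1)), "amount": float(m.group(2))}
        for line in log_lines
        if (m := pattern.search(line))
    ]
)

# Join the two sources; _merge shows which customers had no purchase lines
combined = customers.merge(purchases, on="customer_id", how="left", indicator=True)
print(combined)
```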
HOW BIG DATA ANALYTICS IS USED TODAY


As the technology that helps an organization to break down data silos and
analyze data improves, businesses can be transformed in all sorts of ways.
According to Datamation, today's advances in analyzing big data allow
researchers to decode human DNA in minutes, predict where terrorists plan
to attack, determine which gene is most likely to be responsible for certain
diseases and, of course, which ads you are most likely to respond to on
Facebook.
Another example comes from one of the biggest mobile carriers in the world.
France's Orange launched its Data for Development project by releasing
subscriber data for customers in the Ivory Coast. The 2.5 billion records,
which were made anonymous, included details on calls and text messages
exchanged between 5 million users. Researchers accessed the data and
sent Orange proposals for how the data could serve as the foundation for
development projects to improve public health and safety. Proposed projects
included one that showed how to improve public safety by tracking cell
phone data to map where people went after emergencies; another showed
how to use cellular data for disease containment.
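The text says the Orange records were made anonymous before release but does not say how. One common building block, shown here only as a sketch under our own assumptions, is keyed hashing: each subscriber ID is replaced by an HMAC-SHA256 token under a secret key, so the same subscriber maps to the same token across records, but the original number cannot be read back without the key. On its own this is pseudonymization rather than full anonymization, since movement patterns can still identify people. The records and phone numbers are made up.

```python
import hashlib
import hmac
import secrets

# Secret key held by the data owner and never released with the data
SECRET_KEY = secrets.token_bytes(32)


def pseudonymize(subscriber_id: str, key: bytes = SECRET_KEY) -> str:
    """Replace an identifier with a keyed, deterministic token."""
    digest = hmac.new(key, subscriber_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]  # shortened token for readability


# Made-up call records: (caller, callee, timestamp)
call_records = [
    ("+000 01 23 45 67", "+000 07 65 43 21", "T08:15"),
    ("+000 01 23 45 67", "+000 05 11 22 33", "T09:40"),
]

released = [
    (pseudonymize(caller), pseudonymize(callee), timestamp)
    for caller, callee, timestamp in call_records
]
```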
THE BENEFITS OF BIG DATA ANALYTICS



Enterprises are increasingly looking to find actionable insights into
their data. Many big data projects originate from the need to answer
specific business questions. With the right big data analytics
platforms in place, an enterprise can boost sales, increase efficiency,
and improve operations, customer service and risk management.
Webopedia's parent company, QuinStreet, surveyed 540 enterprise
decision-makers involved in big data purchases to learn in which
business areas companies plan to use big data analytics to improve
operations. About half of all respondents said they were applying big
data analytics to improve customer retention, help with product
development and gain a competitive advantage.
Notably, the business area getting the most attention relates to
increasing efficiency and optimizing operations. Specifically, 62
percent of respondents said that they use big data analytics to
improve speed and reduce complexity.

Python is a programming language that lets
you work more quickly and integrate your
systems more effectively.
WHY PYTHON FOR DATA ANALYSIS?

Python has gathered a lot of interest recently as a choice of language
for data analysis. I compared it against SAS and R some time back.
Here are some reasons in favour of learning Python:
Open Source – free to install
Awesome online community
Very easy to learn
Can become a common language for data science and for building
web-based analytics products.
Needless to say, it still has a few drawbacks:
It is an interpreted rather than a compiled language, so it
may take up more CPU time. However, given the savings in
programmer time (due to ease of learning), it might still be a good
choice.
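A common way to reduce the CPU cost of interpretation, offered here as a sketch rather than a benchmark, is to move inner loops into compiled libraries such as NumPy. Both functions below compute the same sum of squares; the vectorized one performs the loop inside NumPy's compiled code. Timings depend on the machine and the data size, so none are quoted.

```python
import timeit

import numpy as np


def sum_of_squares_loop(values):
    # Every iteration runs through the Python interpreter
    total = 0.0
    for v in values:
        total += v * v
    return total


def sum_of_squares_vectorized(array):
    # The loop runs inside NumPy's compiled code
    return float(np.dot(array, array))


data = np.random.default_rng(0).random(100_000)
values = data.tolist()

loop_time = timeit.timeit(lambda: sum_of_squares_loop(values), number=10)
vector_time = timeit.timeit(lambda: sum_of_squares_vectorized(data), number=10)
print(f"loop: {loop_time:.3f}s  vectorized: {vector_time:.3f}s")
```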
QUOTES ABOUT PYTHON

Python is used successfully in thousands of
real-world business applications around the
world, including many large and mission critical
systems. Here are some quotes from happy
Python users:

YouTube.com
"Python is fast enough for our site and allows us to produce maintainable features in
record times, with a minimum of developers," said Cuong Do, Software
Architect, YouTube.com.
Industrial Light & Magic
"Python plays a key role in our production pipeline. Without it a project the size of
Star Wars: Episode II would have been very difficult to pull off. From crowd rendering
to batch processing to compositing, Python binds all things together," said Tommy
Burnette, Senior Technical Director, Industrial Light & Magic.
"Python is everywhere at ILM. It's used to extend the capabilities of our applications,
as well as providing the glue between them. Every CG image we create has involved
Python somewhere in the process," said Philip Peterson, Principal Engineer,
Research & Development, Industrial Light & Magic.
Google
"Python has been an important part of Google since the beginning, and remains so
as the system grows and evolves. Today dozens of Google engineers use Python, and
we're looking for more people with skills in this language," said Peter Norvig, director
of search quality at Google, Inc.
APPLICATIONS FOR PYTHON

Python is used in many application domains. Here's a
sampling.
The Python Package Index lists thousands of third-party
modules for Python.
Web and Internet Development
Python offers many choices for web development:
Frameworks such as Django and Pyramid.
Micro-frameworks such as Flask and Bottle.
Advanced content management systems such
as Plone and django CMS.
Python's standard library supports many Internet protocols:
HTML and XML
JSON
E-mail processing.
Support for FTP, IMAP, and other Internet protocols.
Easy-to-use socket interface.
And the Package Index has yet more libraries:
Requests, a powerful HTTP client library.
BeautifulSoup, an HTML parser that can handle all sorts of oddball HTML.
Feedparser for parsing RSS/Atom feeds.
Paramiko, implementing the SSH2 protocol.
Twisted Python, a framework for asynchronous network programming.
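The standard-library formats listed above can be tried without installing anything. The sketch below parses a small JSON payload with `json`, reads an XML fragment with `xml.etree.ElementTree`, and builds an e-mail message with `email.message.EmailMessage`; the payloads and addresses are made up, and nothing is sent over the network.

```python
import json
import xml.etree.ElementTree as ET
from email.message import EmailMessage

# JSON: decode a (made-up) API response into Python objects
payload = json.loads('{"site": "example", "visits": [120, 95, 143]}')
total_visits = sum(payload["visits"])

# XML: extract attributes and text from a small document
root = ET.fromstring(
    "<servers><server name='web1'>up</server><server name='web2'>down</server></servers>"
)
status = {node.get("name"): node.text for node in root.findall("server")}

# E-mail: build a message object (sending it would need smtplib and a server)
message = EmailMessage()
message["From"] = "reports@example.com"
message["To"] = "team@example.com"
message["Subject"] = f"Daily report: {total_visits} visits"
message.set_content(json.dumps(status, indent=2))
```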
SCIENTIFIC AND NUMERIC
Python is widely used in scientific and
numeric computing:
 SciPy is a collection of packages for mathematics,
science, and engineering.
 Pandas is a data analysis and modeling library.
 IPython is a powerful interactive shell that features
easy editing and recording of a work session, and
supports visualizations and parallel computing.
 The Software Carpentry Course teaches basic
skills for scientific computing, running bootcamps
and providing open-access teaching materials.
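As a small taste of Pandas for data analysis, the sketch below summarises a made-up table of sensor readings per site with `groupby` and `agg`. The column names and values are illustrative only.

```python
import pandas as pd

readings = pd.DataFrame(
    {
        "site": ["A", "A", "B", "B", "B"],
        "temperature": [21.0, 23.0, 18.5, 19.5, 20.0],
    }
)

# One row per site with the count, mean and maximum temperature
summary = readings.groupby("site")["temperature"].agg(["count", "mean", "max"])
print(summary)
```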

EDUCATION
Python is a superb language for teaching
programming, both at the introductory level and
in more advanced courses.
 Books such as How to Think Like a Computer
Scientist, Python Programming: An Introduction
to Computer Science, and Practical
Programming teach programming with Python.
 The Education Special Interest Group is a good
place to discuss teaching issues.

SOFTWARE DEVELOPMENT

Python is often used as a support language for software
developers, for build control and management, testing,
and in many other ways.
SCons for build control.
Buildbot and Apache Gump for automated continuous
compilation and testing.
Roundup or Trac for bug tracking and project
management.
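Testing is one of the support roles mentioned above, and continuous-integration systems such as Buildbot typically run a project's tests after each change. A minimal sketch with the standard library's `unittest` module follows; the function under test, `normalize_region`, is hypothetical.

```python
import unittest


def normalize_region(name):
    """Hypothetical helper: tidy a free-text region name."""
    return " ".join(name.split()).title()


class NormalizeRegionTest(unittest.TestCase):
    def test_collapses_whitespace_and_capitalizes(self):
        self.assertEqual(normalize_region("  north   east "), "North East")

    def test_already_clean_name_is_unchanged(self):
        self.assertEqual(normalize_region("South"), "South")


# Run the test case programmatically and keep the result object
suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizeRegionTest)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```

In a project, the last two lines would usually be replaced by running `python -m unittest`, which discovers test cases automatically.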
Python Success Stories
Python is part of the winning formula for productivity,
software quality, and maintainability at many companies
and institutions around the world.

Doing Math with Python shows you how to use Python to delve into high
school–level math topics like statistics, geometry, probability, and calculus.
You’ll start with simple projects, like a factoring program and a quadratic-equation solver, and then create more complex projects once you’ve gotten
the hang of things.
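A quadratic-equation solver of the kind mentioned above fits in a few lines. This version is our own sketch, not the book's code: it applies the quadratic formula x = (−b ± √(b² − 4ac)) / 2a and uses `cmath` so that a negative discriminant yields complex roots instead of an error.

```python
import cmath


def solve_quadratic(a, b, c):
    """Return both roots of a*x**2 + b*x + c = 0 (a must be non-zero)."""
    if a == 0:
        raise ValueError("coefficient a must be non-zero for a quadratic")
    # cmath.sqrt returns a complex number, so negative discriminants work too
    root_disc = cmath.sqrt(b * b - 4 * a * c)
    return (-b + root_disc) / (2 * a), (-b - root_disc) / (2 * a)


print(solve_quadratic(1, -3, 2))  # roots 2 and 1, returned as complex numbers
print(solve_quadratic(1, 0, 1))   # roots i and -i
```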
Along the way, you’ll discover new ways to explore math and gain valuable
programming skills that you’ll use throughout your study of math and
computer science. Learn how to:
Describe your data with statistics, and visualize it with line graphs, bar
charts, and scatter plots
Explore set theory and probability with programs for coin flips, dicing, and
other games of chance
Solve algebra problems using Python’s symbolic math functions
Draw geometric shapes and explore fractals like the Barnsley fern, the
Sierpinski triangle, and the Mandelbrot set
Write programs to find derivatives and integrate functions
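Alongside the symbolic approach mentioned above, derivatives and integrals can also be approximated numerically with nothing but the standard library. The sketch below is our illustration: it uses a central difference for f′(x) and the composite trapezoidal rule for the integral of f over [a, b]; the step size and the number of intervals are arbitrary choices.

```python
import math


def derivative(f, x, h=1e-5):
    # Central difference: the truncation error shrinks roughly like h**2
    return (f(x + h) - f(x - h)) / (2 * h)


def integrate(f, a, b, n=1000):
    # Composite trapezoidal rule with n equal sub-intervals
    width = (b - a) / n
    inner = sum(f(a + i * width) for i in range(1, n))
    return width * (0.5 * f(a) + inner + 0.5 * f(b))


print(derivative(math.sin, 0.0))          # close to cos(0) = 1
print(integrate(math.sin, 0.0, math.pi))  # close to 2
```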


Creative coding challenges and applied examples help
you see how you can put your new math and coding
skills into practice. You’ll write an inequality solver, plot
gravity’s effect on how far a bullet will travel, shuffle a
deck of cards, estimate the area of a circle by throwing
100,000 “darts” at a board, explore the relationship
between the Fibonacci sequence and the golden ratio,
and more.
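The dart-throwing estimate mentioned above is a Monte Carlo method. A minimal sketch, assuming darts land uniformly on a square that encloses a circle of radius r: the fraction landing inside the circle approximates πr² / (2r)², so the circle's area is estimated as that fraction times the square's area. The seed and radius are arbitrary.

```python
import random


def estimate_circle_area(radius, darts=100_000, seed=42):
    rng = random.Random(seed)  # seeded so the run is repeatable
    inside = 0
    for _ in range(darts):
        # Throw a dart uniformly at the square [-r, r] x [-r, r]
        x = rng.uniform(-radius, radius)
        y = rng.uniform(-radius, radius)
        if x * x + y * y <= radius * radius:
            inside += 1
    square_area = (2 * radius) ** 2
    return square_area * inside / darts


estimate = estimate_circle_area(radius=2.0)  # true area is 4 * pi
```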
Whether you’re interested in math but have yet to dip
into programming or you’re a teacher looking to bring
programming into the classroom, you’ll find that Python
makes programming easy and practical. Let Python
handle the grunt work while you focus on the math.
WHY PYTHON FOR BIG DATA?

Accelerating Time to Value, Connecting Dots in
the Data, Empowering Everyone
Python has gained popularity in recent
years, for good reasons: it is quick and easy to
write and use for small tasks ("prototypes"), since
there is no need to declare variables explicitly
or to run a separate compilation step. It is
freely available for most computer platforms, and
comes with a huge repository of packages that
cover a wide range of applications. Python also
has features that facilitate the development, and
encourage the documentation, of large, well-structured
program systems.
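The point about documentation can be made concrete with docstrings, which Python keeps attached to functions, and the standard `doctest` module, which runs the examples written inside them. The function `moving_average` is a made-up example; note that its parameters need no type declarations and the code runs without a separate compilation step.

```python
import doctest


def moving_average(values, window):
    """Return the simple moving averages of ``values`` over ``window`` items.

    >>> moving_average([1, 2, 3, 4, 5], 2)
    [1.5, 2.5, 3.5, 4.5]
    >>> moving_average([10, 20, 30], 3)
    [20.0]
    """
    return [
        sum(values[i : i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Collect and run the examples embedded in this function's docstring
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner()
for test in finder.find(moving_average, globs={"moving_average": moving_average}):
    runner.run(test)
results = runner.summarize(verbose=False)
```

In a standalone module the same check is usually written as `doctest.testmod()` under an `if __name__ == "__main__":` guard.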
NEW METHODS
Here we describe a method for making some of
these routines even easier to use
for a range of applications, namely the numerical
solution of partial differential equations
discretized on a rectangular grid (or a subdomain of
such a grid). As a reference problem, one may consider the
solution of the wave equation in the
frequency domain.
However, the classes used to solve this problem
have been designed with additional topologies,
geometries and applications in mind. These
classes are Lattice (the mesh), Lattice Function and Lattice
Operator.
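The wave equation in the frequency domain is the Helmholtz equation, u″ + k²u = f. The sketch below is ours and one-dimensional for brevity, whereas the classes described here target more general grids: it discretizes the equation on a uniform grid with the standard three-point second-difference stencil and zero (Dirichlet) boundary values, then solves the resulting linear system with NumPy. The function name `solve_helmholtz_1d` is hypothetical, and the manufactured solution u = sin(πx) is used only to check the result.

```python
import numpy as np


def solve_helmholtz_1d(f, k, n):
    """Approximate u'' + k**2 * u = f on (0, 1) with u(0) = u(1) = 0.

    Uses n interior grid points and second-order central differences;
    returns the interior points and the approximate solution there.
    """
    h = 1.0 / (n + 1)
    x = np.linspace(h, 1.0 - h, n)  # interior points; boundary values are zero
    # Tridiagonal matrix for (u[i-1] - 2*u[i] + u[i+1]) / h**2 + k**2 * u[i]
    main = np.full(n, -2.0 / h**2 + k**2)
    off = np.full(n - 1, 1.0 / h**2)
    A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    return x, np.linalg.solve(A, f(x))


k = 1.0


def rhs(x):
    # Chosen so that the exact solution is u(x) = sin(pi * x)
    return (k**2 - np.pi**2) * np.sin(np.pi * x)


x, u = solve_helmholtz_1d(rhs, k, n=200)
max_error = np.max(np.abs(u - np.sin(np.pi * x)))
```

The dense matrix keeps the sketch short; for large grids a sparse solver would be the usual choice, and k must avoid the resonant wavenumbers of the discrete problem, where the matrix becomes singular.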
2. CLASS LATTICE
This class is designed to manage the most basic
properties and operations of a discrete model.
We divide them into topological and
geometrical aspects of the model. The most
basic properties of a discrete model are the
dimensionality of space and how we approximate
a continuous space with a number of sites in
each direction (referred to as its shape).
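Purely as an illustration of the properties just listed (dimensionality, number of sites per direction, and the geometry that maps sites to points), a hypothetical `Lattice` class might look like the sketch below. The class name, attributes and methods are our assumptions, not the API described in this work.

```python
import numpy as np


class Lattice:
    """Hypothetical regular lattice: topology (shape) plus geometry (origin, spacing)."""

    def __init__(self, shape, spacing=1.0, origin=0.0):
        self.shape = tuple(int(s) for s in shape)  # sites per direction (topology)
        self.ndim = len(self.shape)                 # dimensionality of space
        # Broadcast scalar spacing/origin to one value per direction (geometry)
        self.spacing = np.broadcast_to(np.asarray(spacing, dtype=float), (self.ndim,))
        self.origin = np.broadcast_to(np.asarray(origin, dtype=float), (self.ndim,))

    @property
    def size(self):
        # Total number of sites
        return int(np.prod(self.shape))

    def coordinates(self):
        # One coordinate array per axis, each with the lattice's shape
        axes = [o + h * np.arange(n) for o, h, n in zip(self.origin, self.spacing, self.shape)]
        return np.meshgrid(*axes, indexing="ij")


grid = Lattice(shape=(4, 3), spacing=(0.5, 1.0))
X, Y = grid.coordinates()
```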
The code snippet
Key words
Boundary conditions
Subdomains and slices
Index arrays and broadcasting
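The three keywords can be tied together in a short NumPy sketch (ours, not taken from this work): broadcasting builds a 2-D field from a column and a row of coordinates, shifted slices of the interior subdomain apply the five-point Laplacian stencil without Python loops, and a boolean index array picks out the boundary sites where a Dirichlet condition is imposed. The grid size and test function are arbitrary.

```python
import numpy as np

n = 6
h = 1.0 / (n - 1)  # grid spacing on the unit square

# Broadcasting: a column (n, 1) combined with a row (1, n) yields an (n, n) grid
x = (np.arange(n) * h)[:, np.newaxis]
y = (np.arange(n) * h)[np.newaxis, :]
u = x**2 + y**2


def laplacian_interior(u, h):
    # Five-point stencil on the interior subdomain, built from shifted slices
    return (
        u[:-2, 1:-1] + u[2:, 1:-1] + u[1:-1, :-2] + u[1:-1, 2:] - 4.0 * u[1:-1, 1:-1]
    ) / h**2


# Boolean index array marking the boundary sites of the grid
boundary = np.zeros(u.shape, dtype=bool)
boundary[[0, -1], :] = True
boundary[:, [0, -1]] = True

# The stencil is exact for quadratics, so every entry is 4 up to rounding
lap = laplacian_interior(u, h)

# Impose a homogeneous Dirichlet condition on a copy of the field
v = u.copy()
v[boundary] = 0.0
```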

3. CONCLUSION

In this paper, we introduced the Python
programming language as a suitable choice for
learning and real-world programming. To this end,
the history and philosophy behind the language
were discussed, followed by its definition and
distinguishing characteristics. Based on these
characteristics, we found Python to be a fast,
powerful, portable, simple and open-source
language that supports other technologies,
numerical programming and other programming
applications. Finally, some of the corporations that use Python
to develop their products were introduced.
THANK YOU