Week 4 - Wednesday




What did we talk about last time?
Data storage
Databases
Shapes in Visual Python
Data mining means looking for patterns in massive amounts of data
• These days, governments and companies collect huge amounts of data
• No human being could sift through it all
• We have to write computer programs to analyze it
• It is sort of a buzzword, and people argue about whether some of these activities should simply be called data analysis or analytics



Data mining is a form of machine learning or artificial intelligence
At the most general level, it includes:
• Cluster analysis: Find a group of records that are probably related
▪ Like using cell phone records to find a group of drug dealers
• Anomaly detection: Find an unusual record
▪ Maybe someone who fits the profile of a serial killer
• Association rule mining: Find dependencies
▪ If people buy gin, they are also likely to buy tonic
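
The gin-and-tonic idea can be sketched in a few lines of Python. This is only an illustration: the baskets below are invented, and the variable names are hypothetical. It counts how often tonic shows up in baskets that contain gin, a ratio that association rule mining usually calls the confidence of the rule.

```python
# Invented shopping baskets, for illustration only (not real sales data)
baskets = [
    ["gin", "tonic", "limes"],
    ["gin", "tonic"],
    ["beer", "chips"],
    ["gin", "chips"],
    ["tonic", "limes"],
]

gin_count = 0            # baskets that contain gin
gin_and_tonic_count = 0  # baskets that contain both gin and tonic

for basket in baskets:
    if "gin" in basket:
        gin_count = gin_count + 1
        if "tonic" in basket:
            gin_and_tonic_count = gin_and_tonic_count + 1

# Fraction of gin baskets that also contain tonic
confidence = gin_and_tonic_count / gin_count
print(confidence)
```

With real data there would be millions of baskets and every pair of items would need checking, which is why this work is left to programs.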
Seeing patterns in the warehouses can help improve their inventory management
• Seeing patterns in sales can help them prepare for changing demands from customers
• The items a customer buys can prompt them to rearrange the store to encourage other purchases

• People make 20 million trips to Walmart each day
• Lots of data to analyze

Given information about potential customers,
businesses can contact those that are most likely
to be interested
What do Walmart, hurricanes, and Pop-Tarts have to do with one another?
• A 2004 NY Times article says that Walmart's analysis shows the demand for strawberry Pop-Tarts goes up by a factor of 7 before a hurricane makes landfall
• But the top-selling item is beer

Social media providers have access to lots of data
• Facebook alone has details about over a billion people
• Can they find hidden patterns about your life?
• Should they inform the police if they think they can reliably predict crime?
• What about data the government has?
• For research purposes, some sets of "anonymized" data are made public

• But researchers often discover that the people involved can be identified anyway
Walmart did some data mining and discovered that, on Friday afternoons, young American males who buy diapers are more likely to buy beer
• It's an appealing story because the result isn't expected
• But it's also not true:

• Walmart may have done analysis along those lines, but no story was published
• Its roots go back to a story in 1992 when Osco Drugs did analysis that showed that all customers who shopped from 5-7pm were more likely to buy diapers and beer
Even if it were true, data mining is statistical in nature
• Correlation is not causation

Looking for patterns in DNA can help find links between genetics and diseases like cancer
• Data mining can find trends in negative reactions to prescription drugs
• It has even been applied to college education to see why some students learn better than others (and even why some drop out)

The free Internet dating website okcupid.com was built by people who wanted to use good computer science to match people
• They stopped updating their blog (blog.okcupid.com) in 2011, but they published some interesting analysis:

• If you want to know if someone will have sex on their first date, ask:
▪ Do you like the taste of beer?
• If you want to know their politics, ask:
▪ Do you prefer life to be simple or complex?
• If you want to know if they are religious, ask:
▪ Do spelling and grammar mistakes annoy you?

You can read more at your own risk, noting that the subject matter is mature
If you want to do something repeatedly, you will use a loop in most languages
• The loops in Scratch were:

• The repeat loop which runs a fixed number of times
• The forever if loop which runs forever if a condition is true
• The forever loop which runs forever
• The repeat until loop which runs until a condition becomes true

We only used the forever loop and the repeat loop


Good news: There are only two loops to learn in
Python
The while loop runs as long as a condition is
true
countdown = 10
while countdown > 0:
    print(countdown)
    countdown = countdown - 1
print("Blast-off!")

In this case, when countdown becomes 0, the loop stops (since 0 is not greater than 0)
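
A while loop is also handy when you don't know ahead of time how many repetitions you need. Here is a small sketch (the variable names and the limit of 1000 are made up for illustration) that counts how many times 1 must be doubled before it passes 1000:

```python
value = 1
doublings = 0
# Keep doubling as long as the value has not passed 1000
while value <= 1000:
    value = value * 2
    doublings = doublings + 1
print(doublings)
print(value)
```

The condition is checked before every pass, so the loop ends the first time value is bigger than 1000.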

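Perhaps the most common loop in Python is the for loop, which will loop over everything in a sequence of things, such as a list or the characters in a piece of text
for letter in "Help me!":
    print(letter)

This code prints each character in the text (including the space and the exclamation point) on a separate line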
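
The same kind of loop works on a list. In this sketch, the list contents and variable names are only examples; the loop prints each item and adds up how many characters the items contain:

```python
shopping_list = ["Pop-Tarts", "beer", "tonic"]  # example items only
total_letters = 0
for item in shopping_list:
    print(item)
    # len() gives the number of characters in the text
    total_letters = total_letters + len(item)
print(total_letters)
```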


If you want to do something, say, 10 times in Python, you can use a for loop
But you need to call the range() function to generate a sequence of numbers for you
for i in range(10):
    print(i)

In this case, the loop will run 10 times, and the numbers 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9 will print out
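
To see what range() produces, you can turn it into a list. The sketch below also adds up the numbers from the loop and shows that range() accepts a start, a stop, and a step, which gives another way to count down. The variable names are just for illustration.

```python
# The numbers a loop over range(10) will visit
print(list(range(10)))

# Add up 0 through 9
total = 0
for i in range(10):
    total = total + i
print(total)

# range(start, stop, step): start at 10, stop before 0, step by -1
countdown_values = list(range(10, 0, -1))
print(countdown_values)
```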



The first line of every loop ends with a colon (:)
If you don't put it there, Python will not run your code
The stuff inside of a loop needs to be indented with a tab (or a consistent number of spaces)
• That's how you know what should be repeated and what is after the loop

for i in range(10):    # colon at the end of the first line
    print(i)           # indented, so it is inside the loop

If you type the colon, most Python editors automatically indent the next line for you
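
A quick way to check your understanding of indentation is to put one line inside a loop and one line after it. This is a small sketch with made-up messages and a hypothetical counter variable:

```python
inside_count = 0
for i in range(3):
    # Indented lines belong to the loop and repeat 3 times
    print("inside the loop:", i)
    inside_count = inside_count + 1
# Back at the left margin: this runs once, after the loop finishes
print("after the loop")
```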

What if we wanted to display 10 red balls on the Visual Python screen?
x = -5
for column in range(10):
    sphere(pos=vector(x,0,0), radius=.25, color=color.red)
    x = x + 1



Each time the loop runs, a ball is created
If they were all in the same position, we would
only see one
To fix that problem, we start with x at -5 and
increase it by 1 every time
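
You can check the positions without drawing anything. This sketch runs in plain Python (no sphere() needed) and computes x from the loop variable instead of adding 1 each time. The list xs is a hypothetical helper used only to collect the values.

```python
xs = []
for column in range(10):
    # column goes 0, 1, ..., 9, so x goes -5, -4, ..., 4
    x = -5 + column
    xs.append(x)
print(xs)
```

Either approach gives the same ten positions; computing x from column just means there is no separate variable to update.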


We can repeat while repeating something else
What if we wanted to display 4 rows of 10 red balls on
the Visual Python screen?
y = -2
for row in range(4):
    x = -5
    for column in range(10):
        sphere(pos=vector(x,y,0), radius=.25,
               color=color.red)
        x = x + 1
    y = y + 1


The inner loop is like the previous slide, but we run it
inside of an outer loop
We had to add a y variable so that each row is on a
different line
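
To see how many times the inner loop body runs, here is a sketch in plain Python that records the (x, y) center of each ball instead of drawing it. The list named centers is a hypothetical helper for illustration.

```python
centers = []
for row in range(4):
    y = -2 + row
    for column in range(10):
        x = -5 + column
        # In Visual Python, this is where sphere() would be called
        centers.append((x, y))

print(len(centers))   # 4 rows times 10 columns
print(centers[0])     # first ball: bottom-left corner
print(centers[-1])    # last ball: top-right corner
```

The inner loop starts over for every row, so its body runs 4 x 10 = 40 times in total.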



In Scratch there was a green block that allowed us to create random numbers
In Python, there are functions that give us random values
Unfortunately, GlowScript only supports one of these functions, random(), which gives a random floating point value between 0 and 1
value = random()

If you want a value between a and b, you can do the following:
value = (b - a)*random() + a

These are always floating point values, but you can use the int() conversion function to make integers
value = int((b - a)*random() + a)
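
Here is a sketch of both formulas wrapped in functions so you can try them in plain Python. The function names are made up, and the import line is an assumption for running outside GlowScript, where random() is already available.

```python
from random import random  # needed in plain Python; assumed unnecessary in GlowScript

def random_between(a, b):
    # Stretch a 0-to-1 value to the width (b - a), then shift it up by a
    return (b - a) * random() + a

def random_whole_between(a, b):
    # int() drops the fractional part, so for positive a the result
    # is a whole number from a up to b - 1
    return int((b - a) * random() + a)

print(random_between(2, 5))
print(random_whole_between(1, 7))  # like rolling a six-sided die
```

Because int() chops off the fraction instead of rounding, you ask for 1 to 7 in order to get die rolls from 1 to 6.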
Recall that we represent color as a group of red, green, and blue intensities
• In Visual Python, those intensities are between 0 and 1

color1 = vector(1.0, 0, 0)      # bright red
color2 = vector(1.0, 1.0, 1.0)  # white
color3 = vector(0, 0, 0)        # black
We can create a sphere with a random color as follows
red = random()
green = random()
blue = random()
sphere(pos=vector(0,0,0), radius=1,
color=vector(red, green, blue))
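
If you want several randomly colored balls, the three random() calls can go in a function. This is a sketch for plain Python, so it returns an ordinary (red, green, blue) tuple; the function name is hypothetical, and in Visual Python you would pass the three values to vector() instead.

```python
from random import random  # needed in plain Python; assumed unnecessary in GlowScript

def random_color():
    # Each intensity is a random value between 0 and 1
    red = random()
    green = random()
    blue = random()
    return (red, green, blue)

# Print three random colors, one per pass through the loop
for ball in range(3):
    print(random_color())
```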


Review
Lab 4

Finish Project 1
• Due this Friday before midnight



Review Python Chapters 1 – 3
Review Blown to Bits Chapter 1
Study for Exam 1 next Monday