Statistics project

STATISTICS
BY NICK TANG AND NICK YU
HOW BIAS, USE OF LANGUAGE, ETHICS, COST, TIME AND TIMING,
PRIVACY, AND CULTURAL SENSITIVITY MAY INFLUENCE THE RESULTS
Bias, use of language, ethics, cost, time and timing, privacy, and cultural sensitivity may
influence both how data is collected and how it is represented. Results can be distorted
during collection, because the people collecting the data can manipulate the collection
method. They can also be distorted during presentation, because the way the data is
displayed may not match the actual numbers.
BIAS
Influences the collection of data because the wording of a question can be manipulated to make one
side more or less appealing. This misleads buyers, users, and customers and makes them feel
obligated to choose one side.
Example: Over 90% of Canadians today choose Tide over the leading
detergent.
This is biased because it leads customers towards picking Tide
by telling them that the majority of Canadians use Tide.
USE OF LANGUAGE
Influences the collection of data by leaving out certain information and replacing it with a biased word.
Example: Most customers pick Campbell’s Chicken Noodle Soup over
President’s Choice Chicken Noodle Soup.
This is misleading because the word "most" can make buyers believe
that over 80% of customers choose it, when in reality only 54% of customers
do.
ETHICS
Influences the collection of data through the way a question is asked. Bad ethics means asking a
question in an inappropriate way, or crossing a line that could make the customer feel
uncomfortable. Good ethics means letting customers make their own decisions and speak on their
own terms.
Example: A salesperson calls you and you say you are not interested. The salesperson
calls back later that day and keeps calling throughout the week.
This is unethical because the salesperson crosses the line by repeatedly calling after
being told you were not interested.
COST
Influences the collection of data because it may cost a lot of money to ask a large number of people. Because
of this, a smaller share of the population is asked, which affects the data collected. A small
group of people may share similar opinions that do not reflect the rest of
the population.
Example: McDonald's is asking its customers which new smoothie is their favourite. They
asked 10 customers instead of 100 to cut the cost of the questionnaires needed. 8 of 10
people liked the blueberry one, and 2 of 10 people liked the strawberry one. Their results
concluded that 80% of people asked like the new blueberry one.
This is misleading because customers don’t know how many people were polled,
so they assume it was a larger number than 10. This tricks customers into thinking
that the results are completely reliable, when in reality very few people were asked and their
opinions may not reflect everyone else's.
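To see why asking only 10 people matters, here is a rough sketch that estimates how uncertain an "80%" result is for different sample sizes. It assumes the people were picked at random and uses the common normal-approximation formula for a 95% margin of error, which is only a rough guide for very small samples. The function name margin_of_error is made up for this illustration.

```python
import math


def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a sample proportion.

    Assumes a simple random sample and the normal approximation,
    which is only a rough guide when n is as small as 10.
    """
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


# Same 80% result, different numbers of people asked.
for n in (10, 100, 1000):
    moe = margin_of_error(0.8, n)
    print(f"n={n}: 80% plus or minus {moe * 100:.1f} percentage points")
```

With 10 people the uncertainty is about 25 percentage points either way, while with 100 people it drops to about 8, so the same "80%" says much less when only 10 people were asked.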
TIME AND TIMING
Influences the collection of data because the time at which a question is asked affects what people
are more likely to choose. Answers can also vary with the month or time of year in which the question is asked.
Example: Asking someone what their favourite drink from Starbucks is.
This is misleading because the answer depends on the month: in winter they will most likely choose a warm
drink instead of a cold one.
PRIVACY
Influences the collection of data because, when the information stays private, anyone could have
given the answers. This can make the data seem unreliable, because other people who view it cannot
tell how it was obtained.
Example: Confidential surveys where others don't know who took the survey or
when the data was collected.
This is misleading because people viewing the results would not know how the data was obtained.
CULTURAL SENSITIVITY
Influences the collection of data because people from a particular culture may answer a question
differently from people outside that culture.
Example: Asking people whose culture does not permit eating ham what brand of ham
they would choose.
This is misleading because people who do not eat ham will not prefer one
brand of ham over another.
THE DIFFERENCE BETWEEN A POPULATION AND A SAMPLE
The difference between a population and a sample is who is
surveyed. A population means everyone in the group of interest is surveyed
the same way, and the results are worked out from the data
that was collected. A sample means only a portion of the population is
surveyed the same way, and the results are worked out from
the data that was collected.
EXAMPLE
Population – all of Canada would be surveyed
Sample – a province of Canada would be surveyed
DIFFERENT TYPES OF SAMPLING METHODS
CONVENIENCE SAMPLE
• A convenience sample is one of the main types of non-probability sampling methods. A
convenience sample is made up of people who are easy to reach.
• Example: A pollster interviews shoppers at a local mall. If the mall was chosen because it
was a convenient site from which to survey participants and/or because it was close to
the interviewer’s home or business, then this would be a convenience sample.
• Convenient for the interviewer
RANDOM SAMPLE
• Random sampling is a procedure for sampling from a population in which the selection
of a sample unit is based on chance and every element in the population has a “known,
non-zero” probability of being selected.
• Random sampling helps produce representative samples by eliminating voluntary
response bias and guarding against undercoverage bias. All good sampling methods rely on
random sampling.
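A simple random sample can be sketched in a few lines of Python. This is only an illustration: the function name simple_random_sample and the list of customer ID numbers are made up, and it relies on the standard library's random.sample, which picks items without repeats so that every member has the same chance of being chosen.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Pick n distinct members purely by chance (equal probability each)."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(population, n)


# Hypothetical population: customer ID numbers 1 to 1000.
customers = list(range(1, 1001))
print(simple_random_sample(customers, 10, seed=1))
```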
STRATIFIED SAMPLE
• “Stratified” sampling refers to a type of sampling method. With stratified sampling, the
researcher divides the population into several groups, called strata. Then a simple
random sample is drawn from each group.
• Using stratified sampling, it may be possible to reduce the sample size required to
achieve a given precision. Or it may be possible to increase the precision with the same
sample size.
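The two steps of stratified sampling (split into strata, then draw a simple random sample from each) can be sketched as below. The function name, the grade-based strata, and the choice to take the same fraction from every stratum are assumptions made for this illustration; other ways of deciding how many to take from each group are possible.

```python
import random


def stratified_sample(strata, fraction, seed=None):
    """Draw a simple random sample of the given fraction from each stratum.

    strata maps a group name to the list of its members.
    """
    rng = random.Random(seed)
    sample = {}
    for name, members in strata.items():
        k = max(1, round(len(members) * fraction))  # at least one per group
        sample[name] = rng.sample(members, k)
    return sample


# Hypothetical school split into two strata by grade.
school = {
    "grade 9": [f"g9-{i}" for i in range(30)],
    "grade 10": [f"g10-{i}" for i in range(70)],
}
for grade, chosen in stratified_sample(school, 0.1, seed=0).items():
    print(grade, len(chosen), chosen)
```

Taking 10% from each group gives 3 students from grade 9 and 7 from grade 10, so each grade appears in the sample in the same proportion as in the school.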
SYSTEMATIC SAMPLE
• With systematic random sampling, we create a list of every member of the population.
From the list, we randomly select the first sample element from the first k elements on
the population list. Afterwards, we select every kth element on the population list.
• This is different from simple random sampling since not every possible sample of n
elements is equally likely.
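The systematic procedure described above can be sketched like this. The function name and the numbered population list are made up for the example; the random starting point is chosen from the first k positions and then every kth element is taken.

```python
import random


def systematic_sample(population, k, seed=None):
    """Pick a random start among the first k elements, then every kth element."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # index 0 to k-1
    return population[start::k]


# Hypothetical population list of 100 numbered members, taking every 10th.
members = list(range(1, 101))
print(systematic_sample(members, 10, seed=3))
```

Only 10 different samples can ever come out of this list (one for each starting point), which is why not every possible group of 10 members is equally likely.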
VOLUNTARY RESPONSE SAMPLE
• A voluntary response sample is one of the main types of non-probability sampling methods.
A voluntary sample is made up of people who self-select into the survey. Often, these people
have a strong interest in the main topic of the survey.
• Example: A news show asks viewers to participate in an online poll. This would be a
voluntary sample. The sample is chosen by the viewers, not by the survey administrator.
DIFFERENCE BETWEEN THEORETICAL AND EXPERIMENTAL PROBABILITY
Theoretical probability is the probability that is calculated using math formulas. This is the probability
based on math theory.
Experimental probability is calculated when the actual situation or problem is performed as an
experiment. In this case, you would perform the experiment, and use the actual results to
determine the probability.
Example: The chance of flipping heads or tails on a coin is 50/50. That is theoretical probability. But
when you actually run the test (experimental probability), you will often not get exactly the same number
of heads as tails.
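The coin example can be tried out with a small simulation. This is a sketch that assumes a fair coin and uses Python's random number generator in place of real flips; the function name is made up for the illustration.

```python
import random


def experimental_probability_of_heads(flips, seed=None):
    """Simulate fair coin flips and return the fraction that came up heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(flips))
    return heads / flips


theoretical = 0.5  # from math theory: one favourable outcome out of two
for flips in (10, 100, 10_000):
    experimental = experimental_probability_of_heads(flips)
    print(f"{flips} flips: experimental = {experimental:.3f}, theoretical = {theoretical}")
```

The experimental value usually differs a little from 0.5, and it tends to get closer to the theoretical value as the number of flips grows.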
3 EXAMPLES OF MISLEADING STATISTICS
EXAMPLE 1
Vitamin water is very misleading because the
word "vitamin" in its name leads
consumers to believe it is healthy. But our bodies
only need so much of each vitamin, and the
rest gets flushed out of our
bodies. Along with the vitamins, we also
consume a large amount of sugar, which is not
healthy for our bodies.
EXAMPLE 2
• Misleading statistics can be found
on the news. Broadcasters hope you won't notice and
often just slip in a graph with exaggerated data, or
data that is sometimes not accurate at all.
For example, a chart in a documentary
showed lines on a graph going up, but there
weren’t any numbers on the graph at all.
EXAMPLE 3
• One classic example involves false positives while testing for rare events.
• Suppose there is a test for tuberculosis and it’s given to every schoolkid in the US. Then it’s found that 99% of the
positive results were false positives; the kid was fine but the test said they were sick.
• Many people would interpret this to mean that the test has low accuracy. Not so - its accuracy is still pretty high,
but there are simply very few kids who actually do have tuberculosis out there, so the number of correct positive
results is very low compared with the number of false positives.
• For example, if the test is given to 10 million kids and it returns the correct result 99.9% of the time (regardless
of the kid's health) and 100 kids in the country have tuberculosis, the test will have about 100 (100 or 99 or 98)
correct positive results and about 10,000 incorrect positive results. About 99% of positive results will be wrong
results, even though the test was actually very accurate.
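The arithmetic in this example can be checked with a short calculation. The sketch below uses the numbers from the example (10 million kids, 100 with tuberculosis, 99.9% accuracy) and assumes, as the example does, that the test is equally accurate for sick and healthy kids. The function name is made up for the illustration.

```python
def positive_result_breakdown(population, sick, accuracy):
    """Return expected true positives, false positives, and the share of
    positive results that are false, assuming the same accuracy for
    sick and healthy people."""
    healthy = population - sick
    true_positives = sick * accuracy            # sick kids correctly flagged
    false_positives = healthy * (1 - accuracy)  # healthy kids wrongly flagged
    share_false = false_positives / (true_positives + false_positives)
    return true_positives, false_positives, share_false


tp, fp, share = positive_result_breakdown(10_000_000, 100, 0.999)
print(f"correct positives: about {tp:.0f}")
print(f"incorrect positives: about {fp:.0f}")
print(f"share of positives that are wrong: about {share:.0%}")
```

The healthy group is so much larger than the sick group that even a 0.1% error rate produces roughly 100 times more wrong positives than correct ones.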