3 - Marriott School
HOMEWORK #3
CHAPTER 4: DESCRIBING NUMERICAL DATA
CHAPTER 4 PROBLEM 54
Cars. One column in this data file gives the engine displacement, in
liters, of 509 vehicles sold in the United States. These vehicles are
2012 models, are not hybrids, have automatic transmissions, and
lack turbochargers. Another column gives the rated combined fuel
economy (in miles per gallon) for these 509 vehicles.
CHAPTER 4 PROBLEM 54 (A)
Produce a histogram of these data. Describe and interpret the
histogram.
• The histogram of rated mileage runs from 10 to 37.5 mpg with bins
2.5 mpg wide. It peaks in the 20-22.5 mpg bin and is right-skewed:
typical vehicles get around 20 mpg, while a few get much higher mileage.
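As a sketch of how the binning works, the Python below counts observations in bins 2.5 mpg wide from 10 to 37.5 mpg and prints a text histogram. The helper `bin_counts` and the sample mileages are hypothetical, not the 509 vehicles in the data file.

```python
def bin_counts(values, start=10.0, width=2.5, stop=37.5):
    """Count values in half-open bins [start, start+width), ... up to stop.

    Hypothetical helper for illustration; a value equal to `stop` is
    placed in the last bin so the right edge is included.
    """
    n_bins = round((stop - start) / width)
    counts = [0] * n_bins
    for v in values:
        if v < start or v > stop:
            continue  # outside the plotted range
        i = min(int((v - start) // width), n_bins - 1)
        counts[i] += 1
    edges = [start + k * width for k in range(n_bins + 1)]
    return edges, counts

# Hypothetical mileages (mpg), not the actual data file
mpg = [15, 18, 20, 21, 21.5, 22, 24, 26, 30, 37]
edges, counts = bin_counts(mpg)
for lo, hi, c in zip(edges, edges[1:], counts):
    print(f"{lo:5.1f}-{hi:5.1f}: {'*' * c}")
```

With these made-up values the tallest bar falls in the 20-22.5 bin and the tail stretches to the right, the same pattern described above.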
CHAPTER 4 PROBLEM 54 (B)
Compare the histogram to the boxplot. What does the histogram
tell you that the boxplot does not, and vice versa?
• The boxplot shows the median, the quartiles (and so the IQR), and
flags one outlier. The histogram shows more about the shape of the
distribution: where observations cluster, how the tail trails off, and
whether there are gaps.
CHAPTER 4 PROBLEM 54 (C)
Find the mean and standard deviation of the rated mileages. How
are these related to the histogram, if at all?
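• Mean = 20.03 mpg and SD = 4.81 mpg. The mean is the balance point
of the histogram and falls in its peak bin; the SD measures the typical
distance of the mileages from the mean, so it summarizes how wide the
histogram is.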
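For reference, assuming the reported SD is the usual sample standard deviation (divisor n - 1), both summaries are computed from the n mileages y_1, ..., y_n as:

```latex
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i,
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^{2}}
```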
CHAPTER 4 PROBLEM 54 (D)
Find the coefficient of variation and briefly interpret its value.
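CV = SD/mean ≈ 4.81/20.03 ≈ 0.24 (24.03%)
• The SD is only about a quarter of the mean. A CV near or above 1
(100%) would indicate variation as large as the mean itself, so these
mileages are not very spread out relative to their typical value.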
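A minimal check of the arithmetic, using the rounded mean and SD reported for this problem:

```python
# Summary values reported above for the rated combined mileages (mpg)
mean_mpg = 20.03
sd_mpg = 4.81

# Coefficient of variation: SD divided by mean (unit-free)
cv = sd_mpg / mean_mpg
print(f"CV = {cv:.4f} ({cv:.1%})")
```

Because the mpg units cancel, the CV is a pure number, which is what makes it comparable across variables measured in different units.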
CHAPTER 4 PROBLEM 54 (E)
Identify any unusual values (outliers). Do you think that these are
coding errors?
• There is one outlier at 37 mpg: the Scion iQ. This is probably not a
coding error (a blunder or rogue value) but a genuine outlier; the iQ is
a very small car, so its high mileage is plausible.
CHAPTER 4 PROBLEM 54 (F)
Government standards call for cars to get 27.5 MPG. What
percentage of these vehicles meet this goal? (Are all of these
vehicles cars?)
• We created an indicator variable equal to 1 when the mileage is at
least 27.5 mpg and 0 otherwise. Summing this variable and dividing by
the sample size gives the proportion of vehicles that meet the goal:
approximately 8%.
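The indicator approach can be sketched in a few lines of Python. The mileages here are hypothetical stand-ins for the 509 values in the data file, so the resulting share will not match the 8% above.

```python
# Hypothetical mileages (mpg); the real column has 509 vehicles
mpg = [18.0, 22.5, 27.5, 31.0, 24.0, 19.5, 28.0, 21.0]

# Indicator variable: 1 if the vehicle meets the 27.5 mpg standard, else 0
meets_goal = [1 if m >= 27.5 else 0 for m in mpg]

# Proportion meeting the goal = sum of the indicator / sample size
share = sum(meets_goal) / len(meets_goal)
print(f"{share:.1%} meet the 27.5 mpg standard")
```

The mean of a 0/1 variable is exactly the proportion of 1s, which is why summing and dividing by n answers the question.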
CHAPTER 4 PROBLEM 57
Information Industry. This data table includes several
characteristics of 428 companies classified as being in the
information industry in 2010. One column gives the total revenue
of the company, in millions of dollars.
CHAPTER 4 PROBLEM 57(A)
Find the median, mean, and standard deviation of the total revenue
of these companies. What units do these summary statistics share?
• Mean = 2240.16
• Median = 219.81
• SD = 9622.30
• All three share the units of the data: millions of dollars.
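A sketch of computing these summaries with Python's `statistics` module. The revenues below are hypothetical (millions of dollars), chosen only to mimic a variable with one very large company; they are not the 428 companies in the data.

```python
import statistics

# Hypothetical total revenues (millions of dollars) with one huge value
revenue = [12.0, 45.0, 80.0, 150.0, 220.0, 400.0, 900.0, 60000.0]

mean_rev = statistics.mean(revenue)
median_rev = statistics.median(revenue)
sd_rev = statistics.stdev(revenue)  # sample SD (divisor n - 1)

print(f"mean={mean_rev:.2f} median={median_rev:.2f} sd={sd_rev:.2f}")
```

As in the real data, a single very large value pulls the mean far above the median and inflates the SD, while the median stays with the bulk of the companies.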
CHAPTER 4 PROBLEM 57(B)
Describe the shape of the histogram and boxplot. What does the
White Space Rule have to say about the histogram?
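• The distribution is extremely right-skewed. The histogram is almost
all white space, so it fails the White Space Rule: the data are
concentrated at the low end, and a few very large outliers stretch the
axis and hide the detail in most of the data. The boxplot is likewise
squeezed into a narrow box near zero.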
CHAPTER 4 PROBLEM 57(C)
Do the data have any extreme outliers? Identify the company if
there’s an extreme outlier.
• AT&T, Verizon, and Microsoft are all extreme outliers.
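A sketch of the boxplot outlier rule, assuming the common convention that points more than 1.5 IQRs beyond the quartiles are outliers and points more than 3 IQRs beyond are extreme; the textbook or software may use a slightly different quartile definition. The helper `outliers_beyond` and the data are hypothetical.

```python
import statistics

def outliers_beyond(values, k):
    """Return values more than k IQRs below Q1 or above Q3 (hypothetical helper)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lo or v > hi]

# Hypothetical revenues (millions of dollars), not the 428 companies in the data
revenue = [12, 45, 80, 150, 220, 400, 900, 2500, 60000]

print("outliers (1.5 IQR):", outliers_beyond(revenue, 1.5))
print("extreme  (3 IQR):  ", outliers_beyond(revenue, 3.0))
```

Raising k from 1.5 to 3 keeps only the most distant points, which is how a value can be an outlier without being extreme.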
CHAPTER 4 PROBLEM 57(D)
What do these graphs of the distribution of net sales tell you
about this industry? Is this industry dominated by a few
companies, or is there a level playing field with many comparable
rivals?
• The industry is dominated by a few very large companies (such as
AT&T, Verizon, and Microsoft), while most companies are far smaller.
It is not a level playing field of many comparable rivals.
CHAPTER 4 PROBLEM 59
Tech Stocks. These data give the monthly returns on stocks in
three technology companies: Dell, IBM, and Microsoft. For each
month from January 1990 through the end of 2005 (192 months),
the data give the return earned by owning a share of stock in each
company. The return is the percentage change in the price, divided
by 100.
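To make the definition concrete, here is a minimal sketch that turns a series of month-end prices into returns. The prices are hypothetical, and dividends are ignored (an assumption for illustration).

```python
# Hypothetical month-end prices for one stock (dollars); dividends ignored
prices = [50.0, 55.0, 44.0, 46.2]

# Return for month t = (P_t - P_{t-1}) / P_{t-1}, i.e. percent change / 100
returns = [(curr - prev) / prev for prev, curr in zip(prices, prices[1:])]
print([round(r, 4) for r in returns])
```

A return of 0.10 therefore means the price rose 10% that month, and -0.2 means it fell 20%.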
CHAPTER 4 PROBLEM 59(A)
Describe and contrast histograms of the three companies. Be
sure to use a common scale for the data axes of the histograms
to make the comparison easier and more reliable.
• The histograms, boxplots, and violin plots show that Dell has the highest
median and IQR, Microsoft the second highest, and IBM the lowest median
and IQR. Microsoft has the most outliers.
CHAPTER 4 PROBLEM 59(B)
Find the mean, SD, and coefficient of variation for each set of returns.
Are means and SDs useful summaries of variables such as these?
• Means and standard deviations are routinely used to characterize the
expected returns and risk of stocks. Because returns often depart from
the assumptions of a normal distribution, we should interpret these
summaries with care.
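A sketch of computing all three summaries for each stock. The helper `summarize` and the four-month return series are hypothetical; the real data have 192 months per company.

```python
import statistics

def summarize(r):
    """Return (mean, sample SD, CV) for a list of returns (hypothetical helper)."""
    mean_r = statistics.mean(r)
    sd_r = statistics.stdev(r)  # sample SD, divisor n - 1
    return mean_r, sd_r, sd_r / mean_r  # CV is undefined when the mean is 0

# Hypothetical monthly returns (percent change / 100)
returns = {
    "Dell": [0.10, -0.05, 0.08, 0.02],
    "IBM": [0.01, -0.02, 0.03, 0.00],
    "Microsoft": [0.05, -0.03, 0.04, 0.02],
}

for name, r in returns.items():
    m, s, cv = summarize(r)
    print(f"{name:10s} mean={m:.4f} sd={s:.4f} cv={cv:.2f}")
```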
CHAPTER 4 PROBLEM 59(C)
What does comparison of the coefficients of variation tell you
about these three stocks?
• The CVs suggest that Dell varies least relative to its mean, then
Microsoft, and IBM the most. Because the means are so close to zero,
however, the CVs are not good indicators of risk or scale here. The CV
is meaningful only when the mean is positive and well away from zero.
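A small numeric illustration of why the CV breaks down near a zero mean: holding a hypothetical SD fixed, tiny changes in the mean produce wildly different CVs.

```python
sd = 0.08  # hypothetical monthly SD of returns, held fixed
means = [0.02, 0.002, -0.002]  # hypothetical mean returns near zero

cvs = [sd / m for m in means]
for m, cv in zip(means, cvs):
    print(f"mean={m:+.3f}  CV={cv:+.1f}")
```

Moving the mean from 0.02 to 0.002 multiplies the CV by ten, and a slightly negative mean flips its sign, so the CV no longer ranks the stocks by risk in any sensible way.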
CHAPTER 4 PROBLEM 59(D)
Investors prefer stocks that grow steadily. In that case, what values
are ideal for the mean and SD of the returns? For the coefficient
of variation?
• Ideally, the mean return would be large and positive and the SD small,
so the CV (SD/mean) would be small: steady growth with little variability.
Investors may also favor positively skewed returns, whose long right tail
reflects occasional large gains. As with part (C), though, these means are
so close to zero that the CVs are not good indicators of risk here.
CHAPTER 4 PROBLEM 59(E)
It is common to find that stocks that have a high average return
also tend to be more volatile, with larger swings in price. Is that
true for these three stocks?
• Yes. The stocks with the highest mean and median returns also have the
highest SDs.