Entropy: Measure of Diversity?
David Lee Baker
David E. Booth
William Acar
Management & Information Systems Working Paper MIS2007-08:1
(Do not cite without permission)
Entropy—In Management Strategy I
• Managers want to know how diversified the firm’s lines of business are
• It has been argued (and widely discussed and hotly debated) that more diversified businesses are more profitable
– This may not always be the case, though
Entropy—In Management Strategy II
• In diversification we need to consider related
(similar to the firm’s core business) versus
unrelated (dissimilar) diversification
• In related diversification the firm’s several lines
of business, even though distinct, still possess
some kind of “fit”
• In unrelated diversification there is no common
linkage or element of fit among the firm’s lines of
business
– In this sense, unrelated diversification may be considered “pure” diversification
Diversity—An Example
• If we take a beaker of water (H2O) and add a single drop of a very concentrated solution of red food coloring, we will see the red color diffuse throughout the water: we go from concentration to diversification
• Economists, as well as chemists and physicists, want to define concentration versus diversification
• Concentration and diversification are opposite ends of the same spectrum
Entropy—Background & History I
• Introduction of a mysterious entity called
the H-function, or statistical entropy, by
Ludwig Boltzmann (1896)
– Defined as the mean value of the logarithm of
a probability density function (p.d.f.)
• Measured the amount of uncertainty about the possible states of a physical system
• Because the existence of atoms was still disputed, his statistical entropy generated much debate
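In modern notation, and assuming a continuous density f over the system's states (a sketch, not Boltzmann's original notation), "the mean value of the logarithm of a p.d.f." reads:

$$H = \mathbb{E}\left[\log f(X)\right] = \int f(x)\,\log f(x)\,dx$$

This is the negative of the differential entropy -∫ f log f dx, so a sharply peaked f (little uncertainty about the state) gives a large H, and a spread-out f gives a small one.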
Entropy—Background & History II
• Claude E. Shannon (1948) generalized Boltzmann’s
entropy to information theory and proved that it had
the properties that allow it to be taken as the
average amount of information conveyed by a
discrete random variable about another
• Mathematicians have further refined the Shannon entropy, and new tools, such as the relative and conditional entropies, have been developed
• Norbert Wiener and Claude E. Shannon, along with others, extended Boltzmann’s earlier theories to more general cases
– Shannon had studied under Wiener at MIT in the late 1930s, earning a master’s degree in electrical engineering and a doctorate in mathematics
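The relative and conditional entropies mentioned above can be sketched in a few lines of Python. This assumes discrete distributions stored as plain lists (or a nested list for a joint distribution) and natural logarithms; the function names are illustrative rather than taken from any library.

```python
import math


def entropy(probs):
    """Shannon entropy -sum p*log(p), natural log; zero probabilities add nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def relative_entropy(p, q):
    """Relative entropy (Kullback-Leibler divergence) D(p || q) = sum p_i*log(p_i/q_i).

    Assumes q_i > 0 wherever p_i > 0; otherwise the divergence is infinite.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def conditional_entropy(joint):
    """Conditional entropy H(Y | X) = H(X, Y) - H(X).

    joint[x][y] holds P(X = x, Y = y); rows are indexed by x.
    """
    h_xy = entropy(p for row in joint for p in row)
    h_x = entropy(sum(row) for row in joint)
    return h_xy - h_x


p = [0.5, 0.25, 0.25]
uniform = [1 / 3] * 3
print(relative_entropy(p, uniform))   # equals log(3) - entropy(p)

joint = [[0.25, 0.25],   # X = 0: Y behaves like a fair coin
         [0.50, 0.00]]   # X = 1: Y is fixed
print(conditional_entropy(joint))     # equals 0.5 * log(2)
```

Relative entropy against the uniform distribution is log n minus the entropy, so it measures how far a distribution falls short of maximum diversity.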
Entropy—Background & History III
• Mathematically, this reduces to the amount of uncertainty contained in a probabilistic experiment A, as measured by the function:
$$H_m(p_1, \ldots, p_m) = -\sum_{i=1}^{m} p_i \log p_i$$
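As a quick worked example, assuming a three-outcome experiment with p = (1/2, 1/4, 1/4) and base-2 logarithms (so that the result is in bits):

$$H_3\left(\tfrac{1}{2}, \tfrac{1}{4}, \tfrac{1}{4}\right) = -\left(\tfrac{1}{2}\log_2\tfrac{1}{2} + \tfrac{1}{4}\log_2\tfrac{1}{4} + \tfrac{1}{4}\log_2\tfrac{1}{4}\right) = \tfrac{1}{2} + \tfrac{1}{2} + \tfrac{1}{2} = 1.5 \text{ bits}$$

Equal probabilities of 1/3 each would instead give log_2 3 ≈ 1.585 bits, the maximum for three outcomes.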
Entropy—Background & History IV
• The Entropy (inverse) measure of industry concentration weights each p_i by the logarithm (log) of 1/p_i, i.e.:
$$E = \sum_{i=1}^{n} p_i \log \frac{1}{p_i}$$
Notice that we have replaced H by E and used the fact that -log(A) = log(1/A)
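To see why E works as an inverse measure of concentration, the sketch below evaluates it for three hypothetical firms, reading p_i as the share of a firm's sales in line of business i (an assumption for illustration) and using natural logarithms. E is 0 for a single-line firm and grows as sales spread more evenly; `entropy_index` is a hypothetical name.

```python
import math


def entropy_index(shares):
    """Entropy measure E = sum p_i * log(1/p_i), natural log.

    shares: fractions of a firm's total sales by line of business (must sum to 1).
    Zero shares are skipped, since p*log(1/p) -> 0 as p -> 0.
    """
    total = sum(shares)
    if not math.isclose(total, 1.0):
        raise ValueError(f"shares must sum to 1, got {total}")
    return sum(p * math.log(1 / p) for p in shares if p > 0)


# Hypothetical firms, from fully concentrated to evenly spread over four lines.
firms = {
    "single line": [1.0],
    "one dominant line": [0.9, 0.05, 0.05],
    "four equal lines": [0.25, 0.25, 0.25, 0.25],
}
for name, shares in firms.items():
    print(f"{name}: E = {entropy_index(shares):.4f}")

# With n equal shares E reaches its maximum, log(n).
print(math.isclose(entropy_index([0.25] * 4), math.log(4)))
```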
Herfindahl’s Measure
• Herfindahl’s contribution to diversification measures was the suggestion that the share of each firm be weighted by itself, i.e. (using H for Herfindahl):
$$H = \sum_{i=1}^{n} p_i \, p_i = \sum_{i=1}^{n} p_i^2$$
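A minimal sketch of the Herfindahl measure on plain Python lists of shares. H equals 1 for a single line of business and 1/n for n equal shares, so, unlike E, it rises with concentration; 1 - H is shown as one common way (an assumption here, not stated on the slide) of turning it into a diversity score. Both function names are hypothetical.

```python
def herfindahl(shares):
    """Herfindahl measure H = sum p_i * p_i: each share weighted by itself."""
    return sum(p * p for p in shares)


def herfindahl_diversity(shares):
    """1 - H: zero for a single line of business, rising toward 1 as shares spread out."""
    return 1 - herfindahl(shares)


for shares in ([1.0], [0.5, 0.5], [0.25] * 4, [0.9, 0.05, 0.05]):
    print(shares, round(herfindahl(shares), 4), round(herfindahl_diversity(shares), 4))
```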
Decomposability of Entropy
• Entropy is a decomposable measure (Khinchin’s Decomposition Theorem)
• Herfindahl is decomposable because it is, in fact, an approximation to Entropy
• The 2- and 4-digit SIC codes are compatible with these decompositions, but is NAICS?
• Further research is needed
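The entropy decomposition can be sketched as follows, assuming the related/unrelated split usually attributed to Jacquemin and Berry (1979): total entropy over 4-digit segments equals the entropy of the 2-digit group shares (the unrelated part) plus the share-weighted entropies within each group (the related part). The firm, its codes, and its shares are hypothetical, and grouping by the first two digits is an assumption that mirrors the SIC hierarchy.

```python
import math
from collections import defaultdict


def entropy(shares):
    """sum p * log(1/p) over positive shares (natural log)."""
    return sum(p * math.log(1 / p) for p in shares if p > 0)


def decompose(segment_shares, group_of):
    """Split total entropy into related (within-group) and unrelated (between-group) parts.

    segment_shares: {4-digit code: share of firm sales}, shares summing to 1.
    group_of: maps a 4-digit code to its 2-digit group.
    Returns (total, related, unrelated) with total == related + unrelated.
    """
    groups = defaultdict(list)
    for code, p in segment_shares.items():
        groups[group_of(code)].append(p)

    total = entropy(segment_shares.values())
    group_totals = {g: sum(ps) for g, ps in groups.items()}
    unrelated = entropy(group_totals.values())
    # Within each 2-digit group: entropy of the renormalized 4-digit shares,
    # weighted by the group's share of total sales.
    related = sum(
        p_g * entropy(p / p_g for p in groups[g])
        for g, p_g in group_totals.items()
        if p_g > 0
    )
    return total, related, unrelated


# Hypothetical firm with four 4-digit segments in two 2-digit groups.
firm = {"2011": 0.30, "2015": 0.20, "2834": 0.25, "2836": 0.25}
total, related, unrelated = decompose(firm, group_of=lambda code: code[:2])
print(f"total={total:.4f} related={related:.4f} unrelated={unrelated:.4f}")
print(math.isclose(total, related + unrelated))
```

The identity holds for any nested grouping of codes, which is the property a column check like "related + unrelated = total" tests.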
Diversification-Score Anomalies
Based on the Entropy Decomposition proposed in this paper.
These scores probably violate the Decomposition Theorem: note that columns 17 and 18 do not sum to column 16.
Source: Ragunathan (1995), Journal of Management, 21(5), June, excerpts of p. 992.
Corporate Diversification—Correct Totals
Note that columns 8 & 9 add up to column 7, as they should.
Source: Jacquemin & Berry (1979), The Journal of Industrial Economics, 27(4), June, p. 362.
Triangular Numbers I
These are the first 100 triangular numbers:
Source: http://www.mathematische-basteleien.de/triangularnumber.htm
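The list can be regenerated from the closed form T_n = n(n + 1)/2. The sketch below (plain Python, no external data) also checks the closed form against the running sums 1, 1 + 2, 1 + 2 + 3, and so on; `triangular` is a hypothetical helper name.

```python
from itertools import accumulate


def triangular(n):
    """n-th triangular number, T_n = n(n + 1)/2, using exact integer arithmetic."""
    return n * (n + 1) // 2


first_100 = [triangular(n) for n in range(1, 101)]

# Cross-check the closed form against running sums 1, 1+2, 1+2+3, ...
assert first_100 == list(accumulate(range(1, 101)))

print(first_100[:10])   # [1, 3, 6, 10, 15, 21, 28, 36, 45, 55]
print(first_100[-1])    # 5050
```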
Triangular Numbers II
The name “triangular number” can be illustrated by the following drawing:
Source: http://www.mathematische-basteleien.de/triangularnumber.htm
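As a text stand-in for the drawing, the hypothetical helper `triangle_rows` below stacks rows of 1, 2, ..., n marks; counting the marks in n rows gives T_n, which is where the name comes from.

```python
def triangle_rows(n, mark="o"):
    """Rows of a centered triangle: row k holds k marks, so n rows hold T_n marks."""
    return [" " * (n - k) + " ".join(mark * k) for k in range(1, n + 1)]


for n in range(1, 5):
    rows = triangle_rows(n)
    print("\n".join(rows))
    print(f"T_{n} = {sum(row.count('o') for row in rows)}")
    print()
```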
Sample Triangular Load Distribution—Graph
[Figure: Triangular Load, share by position: 1: 0.3333; 2: 0.2667; 3: 0.2000; 4: 0.1333; 5: 0.0667]
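The plotted shares match k/15 for k = 5, 4, ..., 1: each position's load divided by the triangular number T_5 = 15, the same values as sample 3.1 in the analyses tables. A sketch that generates such a triangular load for any n, using exact fractions; `triangular_load` is a hypothetical name.

```python
from fractions import Fraction


def triangular_load(n):
    """Shares n/T_n, (n-1)/T_n, ..., 1/T_n with T_n = n(n + 1)/2; they sum to 1."""
    t_n = n * (n + 1) // 2
    return [Fraction(k, t_n) for k in range(n, 0, -1)]


shares = triangular_load(5)
print([str(s) for s in shares])             # fractions are reduced, e.g. 5/15 -> 1/3
print([round(float(s), 4) for s in shares])
print(sum(shares) == 1)
```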
Triangular Distributions—Examples
Position in Pascal's Triangle
[Figure: Pascal's triangle with the triangular numbers highlighted in red]
Pascal's triangle contributes to many fields of number theory.
The red numbers are triangular numbers.
The sum of the triangular numbers can also be found easily in the triangle.
Example: 1+3+6+10+15=35
The triangular numbers can be expressed as binomial coefficients.
Source: http://www.mathematische-basteleien.de/triangularnumber.htm
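Both observations on this slide can be checked with Python's `math.comb` (available from Python 3.8): T_n is the binomial coefficient C(n + 1, 2), and the sum of the first m triangular numbers is C(m + 2, 3), the hockey-stick pattern in Pascal's triangle. The helper names are illustrative.

```python
from math import comb


def triangular(n):
    """T_n as a binomial coefficient, C(n + 1, 2): one diagonal of Pascal's triangle."""
    return comb(n + 1, 2)


def sum_of_triangulars(m):
    """T_1 + T_2 + ... + T_m."""
    return sum(triangular(n) for n in range(1, m + 1))


# The running sum lands on the next diagonal of Pascal's triangle:
# T_1 + ... + T_m = C(m + 2, 3) (hockey-stick identity; these are the tetrahedral numbers).
for m in range(1, 50):
    assert sum_of_triangulars(m) == comb(m + 2, 3)

print([triangular(n) for n in range(1, 6)])  # [1, 3, 6, 10, 15]
print(sum_of_triangulars(5))                 # 35, matching 1+3+6+10+15
```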
Triangular Distributions—Analyses I
Triangular Samples (Figure #) | Range of Values | n
3.1 | 5/15, 4/15, 3/15, 2/15, 1/15 | 5
3.2 | 10/55, 9/55, 8/55, 7/55, . . . 1/55 | 10
Triangular Distributions—Analyses II
Triangular Samples (Figure #) | Range of Values | n | Uncalibrated Herfindahl | Calibrated Herfindahl | Uncalibrated Entropy | Calibrated Entropy | Calibrated A1, Acar-Bhatnagar | Calibrated A2, Acar-Troutt Single-Sum Formula
3.1 | 5/15, 4/15, 3/15, 2/15, 1/15 | 5 | 0.75556 | 0.94444 | 1.48975 | 0.92563 | 0.50000 | 0.66667
3.2 | 10/55, 9/55, 8/55, 7/55, . . . 1/55 | 10 | 0.87273 | 0.96970 | 2.15128 | 0.93429 | 0.51279 | 0.66667
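The four Herfindahl and Entropy columns can be reproduced under two assumptions that fit the reported values: the "Herfindahl" columns use the diversity form 1 - Σp_i², calibrated by dividing by its maximum 1 - 1/n, and the "Entropy" columns use Σp_i ln(1/p_i), calibrated by dividing by its maximum ln n. The Acar-Bhatnagar and Acar-Troutt measures are not defined in this deck, so the sketch leaves them out; all function names are hypothetical.

```python
import math


def triangular_sample(n):
    """Shares n/T_n, ..., 1/T_n with T_n = n(n + 1)/2 (the 'Range of Values' column)."""
    t_n = n * (n + 1) / 2
    return [k / t_n for k in range(n, 0, -1)]


def herfindahl_diversity(p):
    """Assumed 'uncalibrated Herfindahl': 1 - sum p_i^2."""
    return 1 - sum(x * x for x in p)


def entropy(p):
    """Assumed 'uncalibrated Entropy': sum p_i * ln(1/p_i)."""
    return sum(x * math.log(1 / x) for x in p if x > 0)


def calibrated(value, max_value):
    """Scale by the maximum over n categories, so 1 means perfectly even shares."""
    return value / max_value


for fig, n in (("3.1", 5), ("3.2", 10)):
    p = triangular_sample(n)
    h = herfindahl_diversity(p)
    e = entropy(p)
    row = (h, calibrated(h, 1 - 1 / n), e, calibrated(e, math.log(n)))
    print(fig, n, " ".join(f"{v:.5f}" for v in row))
```

Run as is, the printed values agree with the table's Herfindahl and Entropy columns to five decimal places.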
Concluding Remarks and Future Directions II
• In their seminal article, Jacquemin and Berry (1979) specified how the decomposition of the Entropy measure can be related to SIC codes by breaking down the diversity measurement between the 2-digit and the 4-digit codes. We now need to see whether this still holds for NAICS
– We will be following up on their work and further examining the measures’ statistical properties
Triangular Number Theory
A triangular number is the sum of the n natural numbers from 1 to n.
Triangular numbers are so called because they describe numbers of
balls that can be arranged in a triangle. The nth triangular number is
given by the following formula:
$$T_n = \sum_{k=1}^{n} k = 1 + 2 + 3 + \cdots + (n-2) + (n-1) + n = \frac{n(n+1)}{2} = \frac{n^2+n}{2} = \binom{n+1}{2}$$
As shown in the rightmost term of this formula, every triangular number
is a binomial coefficient: the nth triangular number is the number of distinct
pairs that can be selected from n + 1 objects. In this form it solves the
'handshake problem' of counting the number of handshakes if each
person in a room shakes hands once with each other person.
The sequence of triangular numbers (sequence A000217 in OEIS) for
n = 1, 2, 3... is:
1, 3, 6, 10, 15, 21, 28, 36, 45, 55, ...
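The handshake interpretation can be checked by brute force: enumerate every pair of people with `itertools.combinations` and compare the count with n(n + 1)/2 for n + 1 people. The function name `handshakes` is illustrative.

```python
from itertools import combinations


def handshakes(people):
    """Count handshakes when every pair of people shakes hands exactly once."""
    return sum(1 for _ in combinations(range(people), 2))


for n in range(1, 11):
    # n + 1 people produce the n-th triangular number of handshakes.
    assert handshakes(n + 1) == n * (n + 1) // 2

print([handshakes(n + 1) for n in range(1, 11)])
```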
Thermodynamics
• The first law of Thermodynamics, which states that energy is neither created nor destroyed, points to a world where no energy is lost
• The second law says that entropy always tends to increase in an isolated system, forecasting a universe that is constantly winding down
• The tension between the first and second laws runs as a recurring theme through turn-of-the-century cultural formations
NAICS vs. SIC Codes
• The North American Industry Classification
System (NAICS) has replaced the U.S.
Standard Industrial Classification (SIC)
system. NAICS will reshape the way we
view our changing economy.
• NAICS was developed jointly by the U.S.,
Canada, and Mexico to provide new
comparability in statistics about business
activity across North America.
NAICS
• The official 2007 US NAICS Manual North
American Industry Classification System--United
States, 2007 includes definitions for each
industry, tables showing correspondence
between 2007 NAICS and 2002 NAICS for
codes that changed, and a comprehensive
index; these features are also available online.
To order the 1400-page 2007 Manual, in print,
call NTIS at (800) 553-6847 or (703) 605-6000,
or check the NTIS web site. The 2002 Manual,
showing correspondence between 2002 NAICS
and 1997 NAICS, and the 1997 Manual,
showing correspondence between 1997 NAICS
and 1987 SIC, are also available.