A REVIEW OF OCCUPANCY PROBLEMS
AND THEIR APPLICATIONS WITH A MATLAB
DEMO
Samuel Khuvis, Undergraduate
Nagaraj Neerchal, Professor of Statistics
Department of Mathematics and Statistics, University of Maryland Baltimore County,
1000 Hilltop Circle, Baltimore, MD 21250
Acknowledgments: I would like to thank Andrew Raim and all of the members of
CIRC for their help.
Abstract
Consider an experiment of randomly distributing r balls into n cells. One can
conceive several easily described probability problems related to this
experiment. Obtaining the probability that no two adjacent cells are empty,
finding the distribution of the number of balls occupying a given cell, and
deriving the distribution of the smallest number of balls over all cells are
a few examples of such problems, which are collectively referred to as
occupancy problems. Solutions to some of these problems are non-trivial, and
some naturally give rise to well-known probability distributions such as the
binomial and multinomial distributions. Occupancy problems have found
important applications in many areas. The Bose-Einstein and Fermi-Dirac
statistics are the most celebrated examples of such applications. More
recently, questions from genetics, involving non-randomness of occurrence of
mutagen-induced mutations across loci, have also been connected to this
general topic. In this poster, we provide a glimpse of the probability
calculations underlying occupancy problems, and demonstrate them using an
interactive MATLAB program.
Applications
Statistical Mechanics
-We have r indistinguishable particles and a phase space subdivided into n small regions (cells), with the particles randomly distributed among these cells
-It would seem that all arrangements are equally probable; however, physicists have shown that this is not the case. Instead, two statistics describe the behavior of particles:
-Fermi-Dirac Statistics
  -In this realization, no two particles may be in the same cell and all distinguishable arrangements have equal probabilities
  -This means that r ≤ n, so any of the $\binom{n}{r}$ arrangements can be chosen by randomly selecting which r cells contain a particle. Each arrangement has a probability of $\binom{n}{r}^{-1}$. This describes the behavior of electrons, protons, and neutrons.
-Bose-Einstein Statistics
  -In this realization, each distinguishable arrangement is given a probability of $\binom{n+r-1}{r}^{-1}$
  -This has been shown experimentally to describe the behavior of photons, nuclei, and atoms that have an even number of elementary particles
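The counting behind these two statistics can be checked numerically. The following is a minimal Python sketch, not part of the MATLAB demo; the function name `arrangement_counts` is hypothetical, and the "Maxwell-Boltzmann" label for the naive case in which all n^r placements of distinguishable particles are equally likely is the standard textbook name rather than a term used on this poster.

```python
from math import comb

def arrangement_counts(r: int, n: int) -> dict:
    """Number of equally likely arrangements of r particles in n cells per model."""
    return {
        # Naive model: particles treated as distinguishable, any occupancy allowed.
        "maxwell_boltzmann": n ** r,
        # Indistinguishable particles, any number per cell.
        "bose_einstein": comb(n + r - 1, r),
        # Indistinguishable particles, at most one per cell (zero ways if r > n).
        "fermi_dirac": comb(n, r),
    }

counts = arrangement_counts(r=3, n=5)
# Probability of any single arrangement under each model.
probabilities = {model: 1 / count for model, count in counts.items() if count > 0}
```

For r = 3 particles and n = 5 cells this gives 125, 35, and 10 arrangements respectively, so a single arrangement has probability 1/35 under Bose-Einstein statistics and 1/10 under Fermi-Dirac statistics.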
Population Genetics
-Since genetic data are often analyzed through categorical observations, the computation of expected frequencies under different genetic models can be described using occupancy problems
-These are important in genetics when testing the non-randomness of mutagen-induced mutations across loci
-The occupancy problem is applied to these analyses to address, combinatorially, the problem of an inadequate sample size
-In this application, r is the size of the random sample and n is the number of classes being analyzed in the sample
Fig. 1: A screenshot of the MATLAB demo used to visualize the occupancy problems. Six different operations may be selected on the right.
Basic Calculations Concerning the Occupancy Problem
Each of the r balls can land in any of the n cells, so the sample space S of all placements has size
|S| = (n)(n)…(n) = $n^r$
In the program, realizations were generated by:
For i = 1 to number of balls
    Generate a random number from 1 to the number of cells, with each number having a uniform probability of occurring
End
Examples
Fig. 2: Four realizations (panels A-D), generated by the MATLAB demo, of an experiment in which 5 balls are thrown into 4 cells.
Multinomial Distribution
-A multinomial distribution is similar to a binomial distribution, except that instead of 2 possible outcomes there are more than 2
-Let $X_1$ = number of balls in cell 1 and $X_2$ = number of balls in cell 2
-The third outcome is a ball going into a cell other than cell 1 or cell 2
-$(X_1, X_2) \sim \text{multinomial}(r;\ 1/n,\ 1/n,\ 1 - 2/n)$
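The multinomial model for the counts in cells 1 and 2, described in the Multinomial Distribution section, can be illustrated with a short simulation. This is a minimal Python sketch under stated assumptions: the names `multinomial_pmf` and `simulate_joint`, and the labels x1 and x2 for the two counts, are chosen here for illustration and do not come from the MATLAB demo.

```python
import random
from math import factorial

def multinomial_pmf(x1: int, x2: int, r: int, n: int) -> float:
    """P(x1 balls in cell 1 and x2 balls in cell 2) for r balls thrown uniformly into n cells."""
    rest = r - x1 - x2  # balls landing in any other cell (the third outcome)
    if rest < 0:
        return 0.0
    coeff = factorial(r) // (factorial(x1) * factorial(x2) * factorial(rest))
    return coeff * (1 / n) ** (x1 + x2) * ((n - 2) / n) ** rest

def simulate_joint(x1: int, x2: int, r: int, n: int,
                   trials: int = 100_000, seed: int = 0) -> float:
    """Empirical estimate of the same probability from repeated realizations."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        cells = [rng.randint(1, n) for _ in range(r)]  # cell chosen by each ball
        if cells.count(1) == x1 and cells.count(2) == x2:
            hits += 1
    return hits / trials

exact = multinomial_pmf(1, 2, r=5, n=4)
estimate = simulate_joint(1, 2, r=5, n=4)
```

With 5 balls and 4 cells, the exact probability of one ball in cell 1 and two in cell 2 is 30/256, and the simulated estimate should land close to that value.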
MATLAB Demo
Function 1: Generates one realization at a time for a chosen number of balls and cells.
Function 2: Simulates a large number of realizations and computes probabilities empirically.
Function 3: Allows the user to change the number of balls and cells.
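The first two functions can be sketched outside MATLAB as well. The Python below is an illustrative approximation of what the demo is described as doing, not its actual source; the names `realization` and `empirical_probability` are hypothetical, and the event checked (cell 1 is empty) is just one example of a probability that can be estimated this way.

```python
import random

def realization(balls: int, cells: int, rng: random.Random) -> list:
    """Throw each ball into a uniformly chosen cell; return the count in each cell."""
    counts = [0] * cells
    for _ in range(balls):
        counts[rng.randrange(cells)] += 1
    return counts

def empirical_probability(event, balls: int, cells: int,
                          trials: int = 10_000, seed: int = 0) -> float:
    """Fraction of simulated realizations for which event(counts) is true."""
    rng = random.Random(seed)
    hits = sum(bool(event(realization(balls, cells, rng))) for _ in range(trials))
    return hits / trials

# One realization of 5 balls in 4 cells.
one = realization(5, 4, random.Random(1))

# Estimated probability that the first cell stays empty.
p_empty_first = empirical_probability(lambda c: c[0] == 0, balls=5, cells=4)
```

For this event the exact answer is (3/4)^5, roughly 0.237, which gives a quick sanity check on the estimate.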
Minimum Calculations
Y = minimum number of balls occurring in any cell
So, Y = 0 exactly when at least one cell is empty.
Binomial Distributions
-A binomial distribution describes the distribution of results of an experiment in which:
1. There is a sequence of n trials, where n is fixed in advance
2. Each trial results in one of two possible outcomes, denoted as either a success or a failure
3. The trials are independent, so the outcome of any particular trial does not influence the outcome of any other trial
4. The probability of success (p) is constant from trial to trial
-The probability of x successes is $P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}$, for x = 0, 1, …, n
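The binomial probabilities are easy to tabulate directly. The sketch below is a minimal Python illustration; the function name `binomial_pmf` is hypothetical. The example values (5 trials, success probability 1/4) are chosen to match throwing 5 balls into 4 cells and counting a "success" whenever a ball lands in one particular cell, which is how the poster treats the first cell.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n independent trials with success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Distribution of the number of successes for n = 5 trials and p = 1/4.
pmf = [binomial_pmf(x, n=5, p=0.25) for x in range(6)]
```

The six values sum to 1, and the first entry equals (3/4)^5, the chance that no trial succeeds.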
Output 1: One arrangement of 50 balls and 25 cells
Output 2: Randomly generates birthdays of 50 people
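The birthday output is the occupancy experiment with people as balls and days as cells. The Python sketch below is an assumption-laden illustration rather than the demo's code: the function name `birthday_occupancy` is hypothetical, and a 365-day year with equally likely birthdays is assumed. It also tabulates how many days share a given number of birthdays, as the demo's later output is described as doing.

```python
import random
from collections import Counter

def birthday_occupancy(people: int = 50, days: int = 365, seed: int = 0):
    """Draw birthdays uniformly at random and count days by number of birthdays."""
    rng = random.Random(seed)
    birthdays = [rng.randint(1, days) for _ in range(people)]
    per_day = Counter(birthdays)            # day -> number of people born on it
    days_by_count = Counter(per_day.values())  # k -> number of days with exactly k birthdays
    days_by_count[0] = days - len(per_day)  # days on which nobody was born
    return birthdays, dict(days_by_count)

birthdays, days_by_count = birthday_occupancy()
```

Here `days_by_count[2]`, when present, is the number of days shared by exactly two people.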
For values y > 0 of the minimum Y, it is non-trivial to calculate P(Y = y) without the use of a simulation.
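The contrast between the two cases can be sketched in Python. The case Y = 0 (at least one empty cell) has a closed form by inclusion-exclusion, a standard result that is supplied here and is not taken from the poster, while the rest of the distribution of Y is estimated by simulation. The function names `prob_min_zero` and `simulate_min_distribution` are hypothetical.

```python
import random
from collections import Counter
from math import comb

def prob_min_zero(r: int, n: int) -> float:
    """Exact P(Y = 0): at least one of n cells is empty after r uniform throws."""
    # Inclusion-exclusion gives the probability that no cell is empty.
    p_none_empty = sum((-1) ** k * comb(n, k) * (1 - k / n) ** r
                       for k in range(n + 1))
    return 1 - p_none_empty

def simulate_min_distribution(r: int, n: int,
                              trials: int = 10_000, seed: int = 0) -> dict:
    """Estimate P(Y = y) for every observed y by repeated realizations."""
    rng = random.Random(seed)
    tally = Counter()
    for _ in range(trials):
        counts = [0] * n
        for _ in range(r):
            counts[rng.randrange(n)] += 1
        tally[min(counts)] += 1
    return {y: c / trials for y, c in sorted(tally.items())}

exact_zero = prob_min_zero(r=5, n=4)
estimated = simulate_min_distribution(r=5, n=4)
```

For 5 balls and 4 cells the exact value is 0.765625, and `estimated[0]` should be close to it; with more balls per cell, the entries for y > 0 are the ones that simulation makes accessible.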
Binomial Distribution of 1st Cell
T = number of balls in cell 1, where T is a random variable
T takes the values t = 0, 1, 2, …, r, where r is the total number of balls thrown
{T = t} = {$(c_1, \dots, c_r)$ : exactly t of the $c_i$'s are 1 and the others are not 1}, where $c_i$ is the cell receiving ball i
Number of outcomes in {T = t} = $\binom{r}{t}(n-1)^{r-t}$
Dividing by the total number of outcomes, $n^r$, gives $P(T = t) = \binom{r}{t}\left(\frac{1}{n}\right)^{t}\left(1 - \frac{1}{n}\right)^{r-t}$
So, T follows a binomial distribution: $T \sim \text{Bin}(r, 1/n)$
Output 3: Displays the number of balls in each cell over 1,000 realizations
Output 4: Displays the minimum number of balls for each of 10,000 realizations when the user selects 50 balls and 25 cells
Output 5: Displays the distribution of the number of balls in the first cell over 1,000 realizations
Output 6: Shows how many days have a given number of births in common
Conclusion
This has only been a basic introduction to occupancy problems; many other
calculations can be done based on the experiment of throwing r balls into n
cells. These problems have many applications in the natural sciences,
especially in physics. More complex calculations are able to explain the
behaviors of elementary particles. Using the simulation method, we can begin
to understand the probability distributions that arise from these models.