As defined in Wikipedia (https://en.wikipedia.org/wiki/Probability_distribution), a probability distribution is a mathematical function that, stated in simple terms, can be thought of as providing the probabilities of occurrence of different possible outcomes in an experiment.
There are different distributions:
- Binomial Distribution
- Poisson Distribution
- Normal Distribution
- Exponential Distribution
The binomial distribution is a discrete probability distribution.
It describes the outcome of n independent trials in an experiment. Each trial has only two outcomes, either success or failure (e.g., tossing a coin).
Suppose a binomial experiment consists of n trials and results in x successes. If the probability of success on an individual trial is p, then the binomial probability is:
bdist(x, n, p) = nCx * p^x * (1 - p)^(n - x)
bdist(x, n, p) = (n! / (x! * (n - x)!)) * p^x * (1 - p)^(n - x)
In an engineering entrance test for chemistry, there are 60 multiple-choice questions with 4 choices each, only 1 of which is correct. I am very poor in chemistry and want to randomly mark an answer for each question and complete the test.
For each question (trial), the probability of success is 1/4 = 0.25 (4 choices, only 1 of them correct).
Hence the probability of getting exactly 10 correct answers is bdist(10, 60, 0.25):
bdist(10, 60, 0.25) = 60C10 * 0.25^10 * (1 - 0.25)^(60 - 10)
= (60! / (10! * 50!)) * 0.25^10 * 0.75^50
Below are a few lines in Python:
import scipy.stats as ss

# exactly 10 correct answers, 60 questions attempted, probability of a correct answer 1/4
pmf = ss.binom.pmf(10, 60, 0.25)
print(pmf)
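As a cross-check, the binomial formula can also be evaluated with only the Python standard library. This is a minimal sketch, not a replacement for SciPy: the function name bdist simply mirrors the notation used above, and math.comb assumes Python 3.8 or later.

```python
import math

def bdist(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    # math.comb(n, x) is nCx = n! / (x! * (n - x)!)
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

# Exactly 10 correct answers out of 60 questions, each guessed with p = 0.25
print(bdist(10, 60, 0.25))

# Sanity check: the probabilities of all possible outcomes (0..60) add up to 1
total = sum(bdist(x, 60, 0.25) for x in range(61))
print(total)
```

The first printed value should agree with the SciPy result to within floating-point rounding.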
Suppose I want to know the probability of getting at least 10 answers correct. Then I need to calculate the cumulative binomial probability, i.e., bdist(x >= 10, 60, 0.25), which is:
the sum of bdist(10, 60, 0.25), bdist(11, 60, 0.25), bdist(12, 60, 0.25), bdist(13, 60, 0.25), ..., bdist(60, 60, 0.25)
This is calculated as 0.95483251
Below are a few lines in Python:
import scipy.stats as ss

# at least 10 correct answers, 60 questions attempted, probability of a correct answer 1/4
spmf = 0.0
for x in range(10, 61):  # the upper bound 61 is exclusive, so x = 60 is included
    spmf += ss.binom.pmf(x, 60, 0.25)
print(spmf)
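Since the probabilities of all outcomes sum to 1, "at least 10" can also be computed as one minus "at most 9", which needs fewer terms. The sketch below uses only the standard library; the helper name bdist is an assumption that mirrors the notation above.

```python
import math

def bdist(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 60, 0.25

# Direct sum over x = 10, 11, ..., 60
direct = sum(bdist(x, n, p) for x in range(10, n + 1))

# Complement: 1 - P(X <= 9), summing only x = 0, 1, ..., 9
complement = 1.0 - sum(bdist(x, n, p) for x in range(0, 10))

print(direct, complement)
```

Both values should match up to rounding. In SciPy, the survival function ss.binom.sf(9, 60, 0.25) computes the same quantity, P(X > 9).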
Poisson Distribution: As per Investopedia, a Poisson distribution is a statistical distribution showing the likely number of times that an event will occur within a specified period of time. It is used for independent events which occur at a constant rate within a given interval of time.
Suppose we conduct an experiment several times and the average number (mean) of successes is represented by m. The Poisson probability of observing exactly x successes is:
pdist(x, m) = (e^(-m) * m^x) / x!
I have noticed that, on average, I get 7 spam calls a week (banks offering credit cards, advertising, people offering investment services, etc.). This usually happens during weekends. Suppose I want to calculate the probability of getting exactly 4 spam calls next week. I could use the Poisson distribution formula: pdist(4, 7) = (e^(-7) * 7^4) / 4!, which calculates to 0.09122619.
I also want to calculate the probability of 4 or fewer spam calls. This is the cumulative probability of 0, 1, 2, 3, or 4 spam calls, which should be 0.17299160.
Below are a few lines in Python:
import scipy.stats as ss

# 7 spam calls a week on average, probability of exactly 4 spam calls
pmf = ss.poisson.pmf(4, 7)
print(pmf)

# 7 spam calls a week on average, probability of at most 4 spam calls
spmf = 0.0
for x in range(0, 5):
    spmf += ss.poisson.pmf(x, 7)
print(spmf)
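The Poisson formula is simple enough to evaluate directly, which is a useful way to confirm the two numbers quoted for the spam-call example. This is a minimal standard-library sketch; the function name pdist is taken from the notation above and is not a library function.

```python
import math

def pdist(x, m):
    """Probability of exactly x events when the mean number of events is m."""
    return math.exp(-m) * m**x / math.factorial(x)

# Exactly 4 spam calls when the weekly average is 7
exactly_4 = pdist(4, 7)

# 4 or fewer spam calls: sum the probabilities for x = 0, 1, 2, 3, 4
at_most_4 = sum(pdist(x, 7) for x in range(0, 5))

print(exactly_4)
print(at_most_4)
```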
Normal Distribution: Unlike the Binomial and Poisson distributions, the normal distribution is a continuous distribution (the random variable can take on any value within its range, not just integers). Here the probability values are expressed in terms of an area under a curve (bell-shaped and symmetrical) that represents the continuous distribution.
The probability density function is f(x) = (1 / (σ * sqrt(2π))) * e^(-(x - μ)^2 / (2σ^2)), where μ is the mean or expectation of the distribution (and also its median and mode), σ is the standard deviation, and σ^2 is the variance.
About 68% of the area under the curve falls within 1 standard deviation of the mean. 95% of the area under the curve falls within 2 standard deviations of the mean and 99.7% of the area under the curve falls within 3 standard deviations of the mean.
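These percentages can be reproduced from the cumulative distribution function of a standard normal distribution (mean 0, standard deviation 1). The sketch below uses statistics.NormalDist from the standard library (Python 3.8 or later); the helper name area_within is illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

def area_within(k):
    """Area under the curve within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
```

Because the area depends only on the distance from the mean measured in standard deviations, the same percentages hold for any mean and standard deviation.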
Suppose I know that the average score in a chemistry test in a class is 68.5% and that the standard deviation is 10.8%. To find the probability of a score <= 80%, we can use the normal distribution. This calculates to 0.85652013.
import scipy.stats as ss

# cdf at 80 for a normal distribution with mean 68.5 and standard deviation 10.8
cdf = ss.norm.cdf(80.0, 68.5, 10.8)
print(cdf)
Please note that we are using the cumulative distribution function (cdf), which gives the probability of a value less than or equal to the given score.
Suppose we need to know the probability of a score between 80% and 90%. We calculate the cdf at 90% and at 80% and subtract one from the other.
The cdf at 90% is calculated to be 0.97674530.
Probability of a score between 80% and 90% = 0.97674530 - 0.85652013 = 0.12022517
import scipy.stats as ss

# P(80 < score <= 90) = cdf(90) - cdf(80)
cdf80 = ss.norm.cdf(80.0, 68.5, 10.8)
cdf90 = ss.norm.cdf(90.0, 68.5, 10.8)
print(cdf90 - cdf80)
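The same calculation can be sketched without SciPy using statistics.NormalDist (Python 3.8 or later). The example also converts the score to a z-score first, which shows why only the distance from the mean in standard deviations matters. The variable names are illustrative.

```python
from statistics import NormalDist

mean, sd = 68.5, 10.8
scores = NormalDist(mu=mean, sigma=sd)

# Probability of a score <= 80
p_le_80 = scores.cdf(80.0)

# Probability of a score between 80 and 90
p_80_to_90 = scores.cdf(90.0) - scores.cdf(80.0)

# Standardizing first: z = (x - mean) / sd, then use the standard normal cdf
z80 = (80.0 - mean) / sd
p_le_80_via_z = NormalDist().cdf(z80)

print(p_le_80, p_80_to_90, p_le_80_via_z)
```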
Exponential distribution: As defined in Wikipedia, the exponential distribution is the probability distribution that describes the time between events in a process in which events occur continuously and independently at a constant average rate. It is often used to model the time elapsed between events.
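The probability density function (pdf) of an exponential distribution is
f(x) = (1 / λ) * e^(-x / λ), for x >= 0
The cumulative distribution function is given by
F(x) = 1 - e^(-x / λ), for x >= 0
where:
- λ is the mean time between events (note that many references, including Wikipedia, use λ for the rate instead, which is the reciprocal of this mean)
- x is the value taken by the random variable (the elapsed time)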
While driving on Bangalore roads, suppose you usually drive over 5 potholes per hour. To compute the probability of hitting a pothole within the next half hour, note that 5 potholes per hour means we expect one pothole every 1/5 hour, so λ = 0.2. We can then compute this as follows:
P(0 <= X <= 0.5) = expcdf(0.5, 0.2) = 1 - e^(-0.5 / 0.2) = 0.917915
import scipy.stats as ss

# The mean time between events goes in the `scale` keyword.
# The second positional argument of expon.cdf is `loc` (a shift), not the mean.
cdfexp = ss.expon.cdf(0.5, scale=0.2)
print(cdfexp)
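The cumulative probability of the exponential distribution has a simple closed form, so the pothole example can be checked with the standard library alone. The sketch below assumes the rate of 5 potholes per hour from the example; the names rate, mean_gap and exp_cdf are illustrative. It also shows that describing the process by its rate or by its mean gap (the reciprocal of the rate) gives the same number.

```python
import math

rate = 5.0             # potholes per hour
mean_gap = 1.0 / rate  # expected hours between potholes (0.2)

def exp_cdf(x, mean):
    """P(X <= x) for an exponential distribution with the given mean."""
    if x < 0:
        return 0.0
    return 1.0 - math.exp(-x / mean)

# Probability of hitting a pothole within the next half hour
p_half_hour = exp_cdf(0.5, mean_gap)

# Same probability written in terms of the rate: 1 - e^(-rate * x)
p_via_rate = 1.0 - math.exp(-rate * 0.5)

print(p_half_hour, p_via_rate)
```

In SciPy, the mean gap corresponds to the scale keyword of expon, so ss.expon.cdf(0.5, scale=0.2) should agree with these values.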