罐子裡有100顆球,紅色的球佔40%,綠色的球佔60%。從罐子裡取出10顆球,希望出現最多1顆紅色球的機率是多少?
這個問題,可以用下圖表示:
此問題,有兩種解法:
- 不放回取樣 (Sampling without Replacement)
- 放回取樣 (Sampling with Replacement)
Sampling without Replace
在上一篇,A Question of Hoeffding's Inequality,計算機率的方式,是用Sampling without Replace。本篇,用Python程式觀察,當母體(Population)數量變大的情況下,所要球的機率是否有極限值?
Bin3.py (from Gist)
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
# | |
# Peak balls from a bin (Sampling wihtout Replacement) | |
# | |
import math | |
import matplotlib.pyplot as plt | |
import numpy as np | |
# | |
# n Choice r | |
# | |
def combination (n, r): | |
f = math.factorial | |
return f(n) // f(r) // f(n-r) | |
# | |
# Probability of the question. | |
# | |
def prob (n, mu): | |
n1 = int (n * mu) | |
n2 = n - n1 | |
p1 = combination (n1, 1) * combination (n2, 9) / combination (n, 10) | |
p2 = combination (n1, 0) * combination (n2, 10) / combination (n, 10) | |
p = p1 + p2 | |
return p | |
if __name__ == '__main__': | |
print ("prob (100, 0.4) = ", prob (100, 0.4)) | |
print ("prob (1000, 0.4) = ", prob (1000, 0.4)) | |
print ("prob (10000, 0.4) = ", prob (10000, 0.4)) | |
largestY = prob (100000, 0.4) | |
print ("prob (100000, 0.4) = ", largestY) | |
x = np.arange (100, 3000, 10) | |
y = [prob (x, 0.4) for x in x] | |
plt.plot (x, y, label = 'y = prob (x, 0.4)') | |
plt.axhline (y = largestY, color='r', linestyle='-', label='y = %f' % largestY) | |
plt.legend () | |
plt.show () |
執行結果:
prob (100, 0.4) = 0.038515760543954836
prob (1000, 0.4) = 0.045571401459545594
prob (10000, 0.4) = 0.04627879605450445
prob (100000, 0.4) = 0.04634954100183593
上圖看來,當Population夠大的情況下,似乎 prob = 0.0463...是極限了。我們先觀察有這個現象,如何用數學證明,待我研究完後,再給大家參考。
Sampling with Replacements
如從罐子每取出一顆,馬上放回,那麼這個機率,就很好計算,解釋如下:
因為取球再放回去,每次都是獨立事件,所以計算10次都是綠色球的機率,就是將每次取球為綠色的機率0.6自乘10次:
P [# = 0] = 0.6^10 = 0.006
10次裡面,只有1次為紅色球的機率,就是先球9次為為綠色的機率 (0.6^9),再乘上1次為紅色的機率 (0.4)。因為不確定紅色的球會出現在第幾次,但有10個可能出現的位置,所以再乘上10:
P [# = 1] = 0.6^9 * 0.4 * 10
最後將兩個機率加起來,就得到最後的值
P [v <= 0.1] = P [# = 0] + P [# = 1] = 0.046…
用Sampling with Replacements方法的好處是:
- 不需要考慮母體的大小
- 計算方式簡單
- 當母體數量夠大,其結果接近Sampling without Replacement
下一篇,Use Python to Emulate Sampling with Replacement,用Python模擬這個問題。
-Count
沒有留言:
張貼留言