问题类型1:参数估计

真实值是否等于X?

给出数据,对于参数,可能的值的概率分布是多少?

例子1:抛硬币问题

硬币扔了n次,正面朝上是h次。

参数问题

想知道 p 的可能性。给定 n 扔的次数和 h 正面朝上次数,p 的值很可能接近 0.5,比如说在 [0.48,0.52]?

说明

* 参数的先验信念:p∼Uniform(0,1)
* 似然函数:data∼Bernoulli(p)

import pymc3 as pm import numpy.random as npr import numpy as np import
matplotlib.pyplot as plt import matplotlib as mpl from collections import
Counter import seaborn as sns sns.set_style('white') sns.set_context('poster')
%load_ext autoreload %autoreload 2 %matplotlib inline %config
InlineBackend.figure_format = 'retina' import warnings
warnings.filterwarnings('ignore') from random import shuffle total = 30 n_heads
= 11 n_tails = total - n_heads tosses = [1] * n_heads + [0] * n_tails
shuffle(tosses)
可视化数据:
def plot_coins(): fig = plt.figure() ax = fig.add_subplot(1,1,1)
ax.bar(list(Counter(tosses).keys()), list(Counter(tosses).values()))
ax.set_xticks([0, 1]) ax.set_xticklabels(['tails', 'heads']) ax.set_ylim(0, 20)
ax.set_yticks(np.arange(0, 21, 5)) return fig fig = plot_coins() plt.show()

 建立模型:
with pm.Model() as coin_model: # Specify prior using Uniform object. p_prior =
pm.Uniform('p', 0, 1) # Specify likelihood using Bernoulli object. like =
pm.Bernoulli('likelihood', p=p_prior, observed=tosses) # "observed=data" is key
for likelihood.
MCMC Inference Button 
with coin_model: step = pm.Metropolis() # focus on this, the Inference Button:
coin_trace = pm.sample(2000, step=step)
结果:
pm.traceplot(coin_trace) plt.show()

pm.plot_posterior(coin_trace[100:], color='#87ceeb',rope=[0.48,0.52],
point_estimate='mean', ref_val=0.5) plt.show()
 

 

例子2:化学活性问题

我有一个新开发的分子X; X在阻止流感方面的效果有多好?

实验

* 测试X的浓度范围,测量流感活动
* 根据实验结果计算 IC50:导致病毒复制率减半的X浓度。
数据

import numpy as np import pandas as pd chem_data = [(0.00080, 99), (0.00800,
91), (0.08000, 89), (0.40000, 89), (0.80000, 79), (1.60000, 61), (4.00000, 39),
(8.00000, 25), (80.00000, 4)] chem_df = pd.DataFrame(chem_data) chem_df.columns
= ['concentration', 'activity'] chem_df['concentration_log'] =
chem_df['concentration'].apply(lambda x:np.log10(x))
参数问题

给出数据,化学品的IC50 值是多少, 以及其周围的不确定性?

说明

可视化数据
def plot_chemical_data(log=True): fig = plt.figure(figsize=(10,6)) ax =
fig.add_subplot(1,1,1) if log: ax.scatter(x=chem_df['concentration_log'],
y=chem_df['activity']) ax.set_xlabel('log10(concentration (mM))', fontsize=20)
else: ax.scatter(x=chem_df['concentration'], y=chem_df['activity'])
ax.set_xlabel('concentration (mM)', fontsize=20) ax.set_xticklabels([int(i) for
i in ax.get_xticks()], fontsize=18) ax.set_yticklabels([int(i) for i in
ax.get_yticks()], fontsize=18) plt.hlines(y=50, xmin=min(ax.get_xlim()),
xmax=max(ax.get_xlim()), linestyles='--',) return fig fig =
plot_chemical_data(log=True) plt.show()

with pm.Model() as ic50_model: beta = pm.HalfNormal('beta', sd=100**2)
ic50_log10 = pm.Flat('IC50_log10') # Flat prior # MATH WITH DISTRIBUTION
OBJECTS! measurements = beta / (1 + np.exp(chem_df['concentration_log'].values
- ic50_log10)) y_like = pm.Normal('y_like', mu=measurements,
observed=chem_df['activity']) ic50 = pm.Deterministic('IC50', np.power(10,
ic50_log10)) # MCMC Inference Button with ic50_model: step = pm.Metropolis()
ic50_trace = pm.sample(10000, step=step) pm.traceplot(ic50_trace[2000:],
varnames=['IC50_log10', 'IC50']) # live: sample from step 2000 onwards.
plt.show()

 
pm.plot_posterior(ic50_trace[4000:], varnames=['IC50'], color='#87ceeb',
point_estimate='mean') plt.show()

 该化学物质的IC50在约 [2mM,2.4mM](95%HPD)

问题类型2:实验组之间的比较

实验组和对照组的不同

例子1:药物IQ问题

药物治疗是否影响 IQ Scores

数据(包括对照实验数据)
drug = [ 99., 110., 107., 104., 省略] placebo = [ 95., 105., 103., 99., 省略] def
ECDF(data): x = np.sort(data) y = np.cumsum(x) / np.sum(x) return x, ydef
plot_drug(): fig = plt.figure() ax = fig.add_subplot(1,1,1) x_drug, y_drug =
ECDF(drug) ax.plot(x_drug, y_drug, label='drug, n={0}'.format(len(drug)))
x_placebo, y_placebo = ECDF(placebo) ax.plot(x_placebo, y_placebo,
label='placebo, n={0}'.format(len(placebo))) ax.legend() ax.set_xlabel('IQ
Score') ax.set_ylabel('Cumulative Frequency') ax.hlines(0.5, ax.get_xlim()[0],
ax.get_xlim()[1], linestyle='--') return fig fig = plot_drug() plt.show()

 
from scipy.stats import ttest_ind ttest_ind(drug, placebo) #
Ttest_indResult(statistic=2.2806701634329549, pvalue=0.025011500508647616)
 

实验

* 随机将参与者分配给两个实验组:
* +drug vs. -drug
* 测量每个参与者的 IQ Scores
说明

建模: 
y_vals = np.concatenate([drug, placebo]) labels = ['drug'] * len(drug) +
['placebo'] * len(placebo) data = pd.DataFrame([y_vals, labels]).T data.columns
= ['IQ', 'treatment'] with pm.Model() as kruschke_model: # Linking Distribution
Objects together is done by # passing objects into other objects' parameters.
mu_drug = pm.Normal('mu_drug', mu=0, sd=100**2) mu_placebo =
pm.Normal('mu_placebo', mu=0, sd=100**2) sigma_drug =
pm.HalfCauchy('sigma_drug', beta=100) sigma_placebo =
pm.HalfCauchy('sigma_placebo', beta=100) nu = pm.Exponential('nu', lam=1/29) +
1 drug_like = pm.StudentT('drug', nu=nu, mu=mu_drug, sd=sigma_drug,
observed=drug) placebo_like = pm.StudentT('placebo', nu=nu, mu=mu_placebo,
sd=sigma_placebo, observed=placebo) diff_means = pm.Deterministic('diff_means',
mu_drug - mu_placebo) pooled_sd = pm.Deterministic('pooled_sd',
np.sqrt(np.power(sigma_drug, 2) + np.power(sigma_placebo, 2) / 2)) effect_size
= pm.Deterministic('effect_size', diff_means / pooled_sd) with kruschke_model:
kruschke_trace = pm.sample(10000, step=pm.Metropolis())

 结果:
pm.traceplot(kruschke_trace[2000:], varnames=['mu_drug', 'mu_placebo'])
plt.show()

 
pm.plot_posterior(kruschke_trace[2000:], color='#87ceeb',varnames=['mu_drug',
'mu_placebo', 'diff_means']) plt.show()

 

技术
©2019-2020 Toolsou All rights reserved,
hive大量小文件处理方法总结最优化方法总结:公式解、数值优化、求解思想JavaScript中的 Call 和 Apply内存溢出和内存泄漏的区别、产生原因以及解决方案创建数据mysql库流程微信小程序(uni-app)url参数传递对象蝗灾虫群上亿只很少发生碰撞 蝗虫要成自动驾驶功臣vue 监听 Treeselect 选择项的改变第十一届蓝桥杯C/C++ 大学 B 组大赛软件类省赛服务器价格有什么差异?