Tags: ssi, random, sum, problem, str, group, combination, cat, sig
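When we want to infer a parameter of a large population from a small sample, the method we use is called a parametric test. What is a parameter? Quantities we use all the time, such as the mean or the error (for example the standard deviation), are parameters. For example:
Suppose we want to know the average age (the parameter) of first-year junior high school students (the population). We can test a claim about it using the observed average age of a sampled group of students (the sample).
Parametric tests fall into two classes depending on how much is known about the population: 1) part of the population's parameters is known, for example the population variance (Z-test), and 2) nothing is known about the population parameters, so the variance has to be estimated from the sample (t-test).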
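To make the two cases concrete, here is a minimal sketch in R. The ages, the hypothesised mean `mu0`, and the "known" population standard deviation `sigma` are made-up values for illustration only; they are not data from this post.

```r
# Hypothetical sample of student ages (illustrative values only)
ages  <- c(12.8, 13.1, 13.4, 12.9, 13.0, 13.3, 12.7, 13.2)
mu0   <- 13.0   # hypothesised population mean (assumption)
sigma <- 0.3    # population sd, assumed known for the Z-test (assumption)
n     <- length(ages)

# Case 1: population sd known -> Z-test, reference distribution N(0, 1)
z   <- (mean(ages) - mu0) / (sigma / sqrt(n))
p_z <- 2 * pnorm(-abs(z))

# Case 2: population sd unknown -> t-test, sd estimated from the sample
t_res <- t.test(ages, mu = mu0)
t_man <- (mean(ages) - mu0) / (sd(ages) / sqrt(n))  # same statistic by hand

cat("z =", round(z, 3), " p =", round(p_z, 3), "\n")
cat("t =", round(t_man, 3), " p =", round(t_res$p.value, 3), "\n")
```

The only difference between the two statistics is the denominator: the Z-test plugs in the known `sigma`, while the t-test uses `sd(ages)` and pays for that estimate with the heavier-tailed t distribution on `n - 1` degrees of freedom.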
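Identifying parameters and statistics
In statistics, summary measures of a population are collectively called parameters, and the corresponding measures computed from a sample are called statistics. For example, to study the mean pulse rate (beats per minute) of adult men in some region, we draw 1000 adult men from that region and measure them; the resulting sample mean is a statistic.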
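The pulse-rate example can be sketched with simulated data. The population below (normal, mean 72, sd 8) is purely an assumption chosen for illustration; in practice the population is not observed and only the sample statistic is available.

```r
set.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population of pulse rates in beats/min (mean and sd are assumptions)
population <- rnorm(100000, mean = 72, sd = 8)
parameter  <- mean(population)          # population-level quantity, normally unknown

samp      <- sample(population, 1000)   # draw 1000 individuals
statistic <- mean(samp)                 # sample-level estimate of the parameter

cat("parameter:", round(parameter, 2), " statistic:", round(statistic, 2), "\n")
```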
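p-value
The significance level is the probability threshold below which we treat an outcome as a small-probability event; it is usually set to 0.05. The p-value is the probability, computed assuming the null hypothesis is true, of seeing data at least as extreme as what was observed. When p > 0.05 the data are consistent with the null hypothesis and we fail to reject it (this does not prove it true); when p < 0.05 we reject the null hypothesis, accepting at most a 5% risk of rejecting a null that is actually true.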
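As a small sketch of this decision rule, the helper below (a hypothetical function written for this illustration, not part of R) maps a p-value to a decision. The two p-values are the ones reported by the t-tests on the house price and salary data in this post.

```r
# Hypothetical helper: turn a p-value into a decision at significance level alpha
decide <- function(p_value, alpha = 0.05) {
  if (p_value < alpha) "reject H0" else "fail to reject H0"
}

decide(0.5069)    # one-sample t-test on house prices
decide(0.001392)  # two-sample t-test on salaries
```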
Sample design | Distribution | Test type | Test |
---|---|---|---|
One sample | Normal, identically distributed | Parametric | One-sample t-test |
One sample | Unknown | Nonparametric | Wilcoxon signed-rank test |
Matched pairs | Normal, identically distributed | Parametric | Paired t-test |
Matched pairs | Unknown | Nonparametric | Wilcoxon signed-rank test |
Two independent samples | Normal, identically distributed | Parametric | Two-sample t-test |
Two independent samples | Unknown | Nonparametric | Wilcoxon rank sum test (or Mann-Whitney test) |
Q:
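Suppose a sample drawn from a single distribution F is randomly split into two groups. Test whether the two groups still follow the same distribution.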
A = (1.3, 3.4), nA = 2, A~F
B = (4.9,10.3,3.3), nB = 3, B~F
A:
> term <- c(0.80, 0.83, 1.89, 1.04, 1.45,1.38, 1.91, 1.64, 0.73, 1.46)
> mid <- c(1.15, 0.88, 0.90, 0.74,1.21)
> rank(c(term,mid))
[1] 3 4 14 7 11 10 15 13 1 12 8 5 6 2 9
> sum(rank(c(term,mid))[1:10])
[1] 90
> sum(rank(c(term,mid))[1:10])-(10*11/2)
[1] 35
> 1-pwilcox(34,10,5) # Beware this is a discrete random variable....
[1] 0.1272061
Output from the test:
> wilcox.test(term, mid, alternative = "g") # greater
Wilcoxon rank sum test
data: term and mid
W = 35, p-value = 0.1272
alternative hypothesis: true mu is greater than 0
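The same steps can be applied to the samples A and B from the question. This is only a sketch: it repeats the rank-sum calculation by hand and then checks it against `wilcox.test()`. The first line also spells out the discreteness warning: because W takes integer values, P(W >= 35) equals 1 - P(W <= 34).

```r
# Upper tail for the term/mid example: P(W >= 35) = 1 - P(W <= 34)
p_upper_35 <- pwilcox(34, 10, 5, lower.tail = FALSE)

# Samples from the question
A  <- c(1.3, 3.4)
B  <- c(4.9, 10.3, 3.3)
nA <- length(A)
nB <- length(B)

r <- rank(c(A, B))                          # ranks in the pooled sample
W <- sum(r[1:nA]) - nA * (nA + 1) / 2       # rank sum of A minus its minimum value

p_lower <- pwilcox(W, nA, nB)               # exact P(W <= observed) under H0
p_two   <- min(1, 2 * p_lower)              # two-sided p (W lies in the lower tail here)

res <- wilcox.test(A, B)                    # exact test, since n is small and there are no ties
cat("W =", W, " two-sided p =", p_two, " wilcox.test p =", res$p.value, "\n")
```

With only five observations there are just `choose(5, 2) = 10` equally likely ways to assign ranks to A, so the smallest attainable two-sided p-value is 0.2; a sample this small cannot reach significance at the 0.05 level.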
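If boys and girls are equally likely to be born, the probability that a child is of a given sex is 0.5. The number of children of that sex among n births is then binomial, and for large n it is well approximated by a normal distribution. In the same way, under the null hypothesis each paired difference is equally likely to be positive or negative, so testing whether two matched samples differ significantly can be recast as testing whether the standardized statistic computed from the differences follows the standard normal distribution N(0, 1).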
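A minimal sketch of this idea is the sign test, which uses only the signs of the paired differences. The counts below (20 non-zero differences, 6 of them positive) are hypothetical values chosen for illustration and are not taken from any dataset in this post.

```r
# Hypothetical counts: 20 non-zero paired differences, 6 of them positive
n <- 20
k <- 6

# Exact sign test: under H0 the number of positive signs is Binomial(n, 0.5)
p_exact <- binom.test(k, n, p = 0.5)$p.value

# Normal approximation: standardise k using mean n/2 and sd sqrt(n)/2
z        <- (k - n / 2) / (sqrt(n) / 2)
p_approx <- 2 * pnorm(-abs(z))

cat("exact p =", round(p_exact, 4), " z =", round(z, 3),
    " approx p =", round(p_approx, 4), "\n")
```

The approximation here omits a continuity correction, so for small n it tends to give a smaller p-value than the exact binomial calculation.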
Q:
Is there a significant difference between the mercury (Hg) levels in snakehead fish as measured by two different assay methods?
A:
Null hypothesis: the paired differences D are centered at 0.
> pnorm(-1.27)
[1] 0.1020423
Two-sided P: 0.203
> wilcox.test(Hg~way,paired=T)
Wilcoxon signed rank test with continuity correction
data: Hg by way
V = 107, p-value = 0.2242
alternative hypothesis: true mu is not equal to 0
> pt(-1.745,24)
[1] 0.04688927
Two-sided P: 0.094
> t.test(Hg~way,paired=T)
Paired t-test
data: Hg by way
t = -1.7448, df = 24, p-value = 0.0938
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.088189837 0.007389837
sample estimates:
mean of the differences
-0.0404
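The raw Hg measurements are not listed in this post, but the reported numbers are enough to show how the pieces of the paired t-test output fit together. The sketch below uses only the t statistic, degrees of freedom, and mean difference copied from the output above, so its results agree with the printed values only up to rounding.

```r
# Values copied from the paired t-test output
t_stat    <- -1.7448
df        <- 24
mean_diff <- -0.0404

se     <- mean_diff / t_stat            # standard error implied by t = mean_diff / se
p_two  <- 2 * pt(-abs(t_stat), df)      # two-sided p-value
margin <- qt(0.975, df) * se            # half-width of the 95% confidence interval
ci     <- c(mean_diff - margin, mean_diff + margin)

cat("p =", round(p_two, 4), " CI =", round(ci, 4), "\n")
```

Because the interval contains 0, the paired t-test fails to reject the null hypothesis at the 0.05 level, which matches the two-sided p-value of about 0.094.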
House price
price <-c(120, 110, 108, 100, 150, 106, 100, 100, 114, 130, 122, 100, 120, 130, 115, 112, 126, 110, 120, 128)
hist(price)
Use the following command to perform a one-sample t-test, testing the null hypothesis that the population mean is 118.
t.test(price, mu=118)
##
## One Sample t-test
##
## data: price
## t = -0.67654, df = 19, p-value = 0.5069
## alternative hypothesis: true mean is not equal to 118
## 95 percent confidence interval:
## 110.0172 122.0828
## sample estimates:
## mean of x
## 116.05
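As a check on what `t.test()` is doing, the statistic, p-value, and confidence interval can be rebuilt by hand from the sample mean and standard deviation. This is a sketch of the textbook one-sample formulas, not a description of R's internal code.

```r
price <- c(120, 110, 108, 100, 150, 106, 100, 100, 114, 130,
           122, 100, 120, 130, 115, 112, 126, 110, 120, 128)
mu0 <- 118
n   <- length(price)

se       <- sd(price) / sqrt(n)                     # standard error of the mean
t_manual <- (mean(price) - mu0) / se                # t statistic
p_manual <- 2 * pt(-abs(t_manual), df = n - 1)      # two-sided p-value
ci       <- mean(price) + c(-1, 1) * qt(0.975, n - 1) * se  # 95% CI for the mean

cat("t =", round(t_manual, 5), " p =", round(p_manual, 4),
    " CI =", round(ci, 4), "\n")
```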
Use the wilcox.test() command to compare the results with a Wilcoxon signed-rank test. For this test, the null hypothesis is that the population median is 118.
wilcox.test(price, mu=118)
## Warning in wilcox.test.default(price, mu = 118): cannot compute exact p-
## value with ties
##
## Wilcoxon signed rank test with continuity correction
##
## data: price
## V = 80, p-value = 0.3594
## alternative hypothesis: true location is not equal to 118
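The statistic V is the sum of the ranks of the positive differences from the hypothesised median. A short sketch of that calculation follows; tied absolute differences receive average ranks, which is why the exact p-value cannot be computed and R warns about ties.

```r
price <- c(120, 110, 108, 100, 150, 106, 100, 100, 114, 130,
           122, 100, 120, 130, 115, 112, 126, 110, 120, 128)

d <- price - 118          # differences from the hypothesised median
d <- d[d != 0]            # zero differences carry no sign information and are dropped
r <- rank(abs(d))         # rank the absolute differences (ties get average ranks)
V <- sum(r[d > 0])        # sum of ranks belonging to positive differences

cat("V =", V, "\n")
```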
Salary data
Salary <- c(18.9,10.5, 17.5, 13.1, 13.0, 18.2, 22.0, 13.0, 25.0, 12.2, 10.3,15.5, 24.4, 11.8, 15.0, 25.6, 11.8, 22.8, 19.4, 12.3, 22.7, 27.3, 16.0, 11.0, 12.6, 17.7, 17.2, 20.2, 34.0, 36.4, 11.3, 24.0, 17.6, 26.0, 25.7, 17.2, 14.1, 22.0, 17.2, 20.9, 16.8, 19.3, 15.8, 27.0, 20.4, 25.5, 30.1, 28.3, 29.5, 31.6)
Sector<- c(rep(0,25), rep(1,25))
hist(Salary[Sector==0])
hist(Salary[Sector==1])
Use the following command to perform the two-sample t-test. This assumes that the population variances for each population are equal. We are testing the null hypothesis that the population mean for group 1 = population mean for group 2.
t.test(Salary~Sector, var.equal=TRUE)
##
## Two Sample t-test
##
## data: Salary by Sector
## t = -3.3933, df = 48, p-value = 0.001392
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -9.16664 -2.34536
## sample estimates:
## mean in group 0 mean in group 1
## 16.876 22.632
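The equal-variance assumption enters through the pooled variance. The sketch below rebuilds the t statistic by hand from that pooled estimate and, for comparison, also runs Welch's test (the `t.test()` default, `var.equal = FALSE`), which drops the assumption. The variable names `x`, `y`, and `sp2` are introduced here for illustration.

```r
Salary <- c(18.9, 10.5, 17.5, 13.1, 13.0, 18.2, 22.0, 13.0, 25.0, 12.2,
            10.3, 15.5, 24.4, 11.8, 15.0, 25.6, 11.8, 22.8, 19.4, 12.3,
            22.7, 27.3, 16.0, 11.0, 12.6, 17.7, 17.2, 20.2, 34.0, 36.4,
            11.3, 24.0, 17.6, 26.0, 25.7, 17.2, 14.1, 22.0, 17.2, 20.9,
            16.8, 19.3, 15.8, 27.0, 20.4, 25.5, 30.1, 28.3, 29.5, 31.6)
Sector <- c(rep(0, 25), rep(1, 25))

x  <- Salary[Sector == 0]
y  <- Salary[Sector == 1]
nx <- length(x)
ny <- length(y)

# Pooled variance: weighted average of the two sample variances
sp2      <- ((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2)
t_manual <- (mean(x) - mean(y)) / sqrt(sp2 * (1 / nx + 1 / ny))
p_manual <- 2 * pt(-abs(t_manual), df = nx + ny - 2)

welch <- t.test(Salary ~ Sector)   # Welch's test: no equal-variance assumption

cat("pooled t =", round(t_manual, 4), " p =", round(p_manual, 6), "\n")
cat("Welch  t =", round(unname(welch$statistic), 4),
    " df =", round(unname(welch$parameter), 2), "\n")
```

With equal group sizes the pooled and Welch t statistics coincide and only the degrees of freedom differ, so the choice matters most when the groups are unbalanced.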
Since the normality assumption looks dubious from the histograms, use the wilcox.test() command to perform a Wilcoxon rank sum test (Mann-Whitney test) and compare the results with the two-sample t-test.
wilcox.test(Salary~Sector)
## Warning in wilcox.test.default(x = c(18.9, 10.5, 17.5, 13.1, 13, 18.2,
## 22, : cannot compute exact p-value with ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: Salary by Sector
## W = 156.5, p-value = 0.002547
## alternative hypothesis: true location shift is not equal to 0
Original article: https://www.cnblogs.com/daiweifan/p/12455872.html