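Dummy variables, also known as indicator or nominal variables, are artificial variables used to represent a qualitative attribute. They are quantified independent variables that usually take the value 0 or 1. Introducing dummy variables makes a linear regression model more complex, but it describes the problem more concisely: a single equation can do the work of two, and the model is closer to reality.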
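To illustrate how one equation can do the work of two, here is a minimal sketch in R. The data are made up and noise-free, and the names `x`, `d`, and `y` are assumptions chosen only for this example. A single fit of `y ~ x + d` yields one line for the group with `d = 0` and a parallel, shifted line for the group with `d = 1`.

```r
# Made-up, noise-free data: y = 1 + 2*x when d = 0, and y = 4 + 2*x when d = 1
x <- 1:10
d <- rep(c(0, 1), times = 5)   # dummy variable taking the value 0 or 1
y <- 1 + 2 * x + 3 * d

# One equation, y = b0 + b1*x + b2*d, covers both groups
fit <- lm(y ~ x + d)
b <- coef(fit)

# Intercepts of the two group-specific lines implied by the single fit
intercept_d0 <- unname(b["(Intercept)"])
intercept_d1 <- unname(b["(Intercept)"] + b["d"])
```

The coefficient on `d` is the vertical shift between the two groups, so both group equations can be read off a single set of estimates.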
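3. They improve the precision of the model. This is equivalent to pooling samples with different attributes, which enlarges the sample size (it increases the error degrees of freedom and thereby reduces the error variance).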
'data.frame':	1000 obs. of  21 variables:
 $ V1 : Factor w/ 4 levels "A11","A12","A13",..: 1 2 4 1 1 4 4 2 4 2 ...
 $ V2 : int  6 48 12 42 24 36 24 36 12 30 ...
 $ V3 : Factor w/ 5 levels "A30","A31","A32",..: 5 3 5 3 4 3 3 3 3 5 ...
 $ V4 : Factor w/ 10 levels "A40","A41","A410",..: 5 5 8 4 1 8 4 2 5 1 ...
 $ V5 : int  1169 5951 2096 7882 4870 9055 2835 6948 3059 5234 ...
 $ V6 : Factor w/ 5 levels "A61","A62","A63",..: 5 1 1 1 1 5 3 1 4 1 ...
 $ V7 : Factor w/ 5 levels "A71","A72","A73",..: 5 3 4 4 3 3 5 3 4 1 ...
 $ V8 : int  4 2 2 2 3 2 3 2 2 4 ...
 $ V9 : Factor w/ 4 levels "A91","A92","A93",..: 3 2 3 3 3 3 3 3 1 4 ...
 $ V10: Factor w/ 3 levels "A101","A102",..: 1 1 1 3 1 1 1 1 1 1 ...
 $ V11: int  4 2 3 4 4 4 4 2 4 2 ...
 $ V12: Factor w/ 4 levels "A121","A122",..: 1 1 1 2 4 4 2 3 1 3 ...
 $ V13: int  67 22 49 45 53 35 53 35 61 28 ...
 $ V14: Factor w/ 3 levels "A141","A142",..: 3 3 3 3 3 3 3 3 3 3 ...
 $ V15: Factor w/ 3 levels "A151","A152",..: 2 2 2 3 3 3 2 1 2 2 ...
 $ V16: int  2 1 1 1 2 1 1 1 1 2 ...
 $ V17: Factor w/ 4 levels "A171","A172",..: 3 3 2 3 3 2 3 4 2 4 ...
 $ V18: int  1 1 2 2 2 2 1 1 1 1 ...
 $ V19: Factor w/ 2 levels "A191","A192": 2 1 1 1 1 2 1 2 1 1 ...
 $ V20: Factor w/ 2 levels "A201","A202": 1 1 1 1 1 1 1 1 1 1 ...
 $ V21: int  1 2 1 1 2 1 1 1 1 2 ...
The variable V1 is a factor with four levels: A11, A12, A13, and A14.
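The integers that the structure listing shows for V1 are the factor's internal level codes, not data values. A small sketch of that relationship follows. The vector `v1` is a hypothetical stand-in that copies only the first six V1 values from the listing, since the full data set is not reproduced here.

```r
# Hypothetical stand-in: the first six V1 values from the listing (codes 1 2 4 1 1 4)
v1 <- factor(c("A11", "A12", "A14", "A11", "A11", "A14"),
             levels = c("A11", "A12", "A13", "A14"))

# The four level labels
lv <- levels(v1)

# Integer codes stored for each observation: 1 2 4 1 1 4
codes <- as.integer(v1)

# Mapping the codes back through the levels recovers the labels
labels_back <- lv[codes]
```

These codes only index the levels, so they should not be used as numeric predictors directly. That is why the factor is expanded into 0/1 columns instead.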
In R, there are two ways to convert a factor like this into numeric dummy variables.
Method 1:
Use the neuralnet package.
     A11 A12 A13 A14
[1,]   1   0   0   0
[2,]   0   1   0   0
[3,]   0   0   0   1
[4,]   1   0   0   0
[5,]   1   0   0   0
[6,]   0   0   0   1
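The post does not show the call that produced this matrix, so the following is only a sketch of one way to get the same 0/1 layout, using base R's `model.matrix` rather than the package named above. The vector `v1` is a hypothetical stand-in holding the first six V1 values from the structure listing. If the nnet package is installed, `nnet::class.ind(v1)` should give a matrix of the same shape, but that this was the author's call is an assumption.

```r
# Hypothetical stand-in for the first six observations of V1
v1 <- factor(c("A11", "A12", "A14", "A11", "A11", "A14"),
             levels = c("A11", "A12", "A13", "A14"))

# One indicator column per level; "- 1" drops the intercept so that
# no level is absorbed into a baseline
ind <- model.matrix(~ v1 - 1)

# Rename the columns from "v1A11", ... to the bare level names
colnames(ind) <- levels(v1)

print(ind)
```

Each row contains exactly one 1, in the column of that observation's level. The A13 column is all zeros here because none of the six sample rows takes that level. For use in a regression with an intercept, one column would normally be dropped to avoid perfect collinearity.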
Original article: http://www.cnblogs.com/anny-1980/p/3822936.html