Tags: ima pandas port col read imp color blog uniq
import pandas

titanic = pandas.read_csv("titanic_train.csv")  # read the data
# titanic.head()
print(titanic.describe())  # summary statistics for each column
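A small sketch of how missing values appear in describe(). It uses a hypothetical five-row DataFrame instead of titanic_train.csv, so the values are illustrative only. The "count" row covers only non-missing entries, and isnull().sum() gives the number of gaps per column directly.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature stand-in for titanic_train.csv (not real data)
df = pd.DataFrame({
    "Age": [22.0, 38.0, np.nan, 35.0, np.nan],
    "Fare": [7.25, 71.28, 7.93, 53.10, 8.05],
})

stats = df.describe()
# describe() counts only non-missing values, so a low "count" flags gaps
print(stats.loc["count"])

# A more direct check: number of missing values per column
missing = df.isnull().sum()
print(missing)
```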
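The describe() output above shows that the Age column has missing values: its count is lower than that of the other columns. Fill them with the column's median:

titanic["Age"] = titanic["Age"].fillna(titanic["Age"].median())
print(titanic.describe())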
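A minimal sketch of the median fill on hypothetical ages. These are made-up values, chosen to include one outlier. median() skips NaN, so the fill value comes only from the observed ages.

```python
import numpy as np
import pandas as pd

# Hypothetical ages with two missing entries (illustrative values only)
ages = pd.Series([22.0, 38.0, np.nan, 35.0, np.nan, 80.0])

median_age = ages.median()  # NaN skipped: median of 22, 35, 38, 80 is 36.5
filled = ages.fillna(median_age)
print(median_age)
print(filled.tolist())
```

For comparison, the mean of the same observed ages is 43.75, pulled upward by the single 80. This is one reason the median is often preferred as a fill value for skewed columns.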
Next, convert the Sex column into integers that can be used in calculations: male becomes 0 and female becomes 1.
print(titanic["Sex"].unique())  # check which values occur
titanic.loc[titanic["Sex"] == "male", "Sex"] = 0    # male -> 0
titanic.loc[titanic["Sex"] == "female", "Sex"] = 1  # female -> 1
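The .loc assignments above work, but the column typically stays at object dtype because it started as strings. One alternative is Series.map, which is a different technique from the post's .loc approach. It encodes every value in a single pass and, when all values are in the dictionary, produces an integer column. The data below is a hypothetical sample.

```python
import pandas as pd

# Hypothetical sample of the Sex column (illustrative values only)
df = pd.DataFrame({"Sex": ["male", "female", "female", "male"]})

# Map each category to an integer in one pass;
# any value not in the dict would become NaN
df["Sex"] = df["Sex"].map({"male": 0, "female": 1})
print(df["Sex"].tolist())
print(df["Sex"].dtype)
```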
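The Embarked column is handled the same way. Fill its missing values with "S", the most frequent port in this dataset, and then encode S, C and Q as 0, 1 and 2.

print(titanic["Embarked"].unique())  # check which values occur
titanic["Embarked"] = titanic["Embarked"].fillna("S")
titanic.loc[titanic["Embarked"] == "S", "Embarked"] = 0
titanic.loc[titanic["Embarked"] == "C", "Embarked"] = 1
titanic.loc[titanic["Embarked"] == "Q", "Embarked"] = 2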
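The code above hard-codes "S" as the fill value. A sketch under the assumption that you want the most frequent value computed from the data instead: mode() ignores NaN by default, and its first entry can serve as the fill value. The encoding then uses Series.map instead of .loc. The six-row sample is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical Embarked values with one missing entry (illustrative only)
df = pd.DataFrame({"Embarked": ["S", "C", np.nan, "S", "Q", "S"]})

# Most frequent port, computed instead of hard-coded
most_common = df["Embarked"].mode()[0]
df["Embarked"] = df["Embarked"].fillna(most_common)

# Same encoding as in the post: S -> 0, C -> 1, Q -> 2
df["Embarked"] = df["Embarked"].map({"S": 0, "C": 1, "Q": 2})
print(most_common)
print(df["Embarked"].tolist())
```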
Original post: http://www.cnblogs.com/panjie123pakho/p/7878355.html