Tags: desc local else unless host attribute already further http
First, download a data file to your local machine, for example crx.data.
We will process crx.data using pure Python.
The task is as follows:
input: the crx.data file
output: a 2-D list
It should look like this:
>>> output
[[data_0], [data_1], [data_2], ...]
An individual data entry looks like this:
>>> output[0]
['b', 30.83, 0, 'u', 'g', 'w', 'v', 1.25, 't', 't', '01', 'f', 'g', '00202', 0, '+']
Mind the data types: don't turn every value into a string.
My code is as follows, for reference only:
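# Wrapped in a function (the name load_crx is arbitrary) so the final return is valid.
def load_crx(file_name):
    # "with" closes the file even if parsing fails
    with open(file_name, 'r') as data_file:
        lines = data_file.readlines()
    output = []
    # never use built-in names (list, str, ...) as variables unless you mean to replace them
    for list_str in lines:
        list_str = list_str.strip()
        if not list_str:
            # skip blank lines, e.g. a trailing empty line at the end of the file
            continue
        str_list = list_str.split(",")
        data = []
        for substr in str_list:
            if substr.isdigit():
                if len(substr) > 1 and substr.startswith('0'):
                    # zero-padded values such as '01' or '00202' stay strings
                    data.append(substr)
                else:
                    data.append(int(substr))
            else:
                try:
                    data.append(float(substr))
                except ValueError:
                    # not a number: a categorical value, or '?' for a missing value
                    if substr == '?':
                        substr = 'missing'
                    data.append(substr)
        output.append(data)
    return output

# raw string, so the backslashes in the Windows path are taken literally
file_name = r"E:\data\crx.data"
output = load_crx(file_name)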
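The per-field rule in the code above can also be pulled out into a small function and checked on its own. The following is only a sketch: the name convert_field is made up for illustration, and the sample line is the example entry shown earlier, joined back into comma-separated text.

```python
def convert_field(text):
    """Convert one comma-separated field to int, float, or str."""
    if text.isdigit():
        # Zero-padded values such as '01' or '00202' are kept as strings.
        if len(text) > 1 and text.startswith("0"):
            return text
        return int(text)
    try:
        return float(text)
    except ValueError:
        # Categorical value; '?' marks a missing value in this dataset.
        return "missing" if text == "?" else text


# The example entry from above, as it would appear in the file.
sample = "b,30.83,0,u,g,w,v,1.25,t,t,01,f,g,00202,0,+"
row = [convert_field(field) for field in sample.split(",")]
print(row)
```

Testing the rule on a single line like this makes it easy to confirm the data types before running it over the whole file.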
After the steps above, it already feels like we are doing real work with data.
The importance of data types
It is important for you to at least have a rough idea of what kind of data you are dealing with. For instance, if you have read through all the files in the data folder and the description on the website, you should at least know that:
This dataset consists of 690 credit card applicants' personal information and whether or not they are approved for the credit card.
Each data entry has 15 attributes, and the data type of each attribute is listed on the website
A2, A3, A8, A11, A14 and A15 are continuous (numeric)
All others are categorical (choices)
37 cases (5%) have one or more missing values
This dataset has 2 classes, positive and negative, meaning approved and declined
If you haven't already read through all this information, go back and try to understand your dataset first
Here is the link:
https://archive.ics.uci.edu/ml/datasets/Credit+Approval
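Some of the facts listed above can be checked against the parsed output. The sketch below assumes rows in the format produced earlier: '?' already replaced by 'missing', and the class label ('+' or '-') in the last column. The function name summarize and the three-column demo rows are made up for illustration. On the real file you would expect 690 rows, 37 of them with at least one missing value.

```python
from collections import Counter


def summarize(rows):
    """Return (row count, rows with a missing value, class counts)."""
    with_missing = sum(1 for row in rows if "missing" in row)
    classes = Counter(row[-1] for row in rows)
    return len(rows), with_missing, classes


# Tiny made-up rows, shortened to three columns for illustration only.
demo = [
    ["b", 30.83, "+"],
    ["a", "missing", "-"],
    ["missing", 1.5, "-"],
]
total, with_missing, classes = summarize(demo)
print(total, with_missing, dict(classes))  # 3 2 {'+': 1, '-': 2}
```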
From the data files and the description on the website,
we now understand what this data is actually used for,
and we also know which attribute and class each entry parsed by Python corresponds to.
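One way to make that correspondence explicit is to label each parsed entry with its attribute names. This is only a sketch: the names A1 to A15 follow the numbering used in the dataset description, while "class" is a label chosen here for the final +/- column.

```python
# A1..A15 follow the dataset description; "class" is our own label
# for the last column (+ means approved, - means declined).
names = [f"A{i}" for i in range(1, 16)] + ["class"]

# The example entry shown earlier in this post.
row = ['b', 30.83, 0, 'u', 'g', 'w', 'v', 1.25, 't', 't',
       '01', 'f', 'g', '00202', 0, '+']
record = dict(zip(names, row))

# Attributes the description lists as continuous.
continuous = ["A2", "A3", "A8", "A11", "A14", "A15"]
print({name: record[name] for name in continuous})
print(record["class"])
```

Note that with the parsing rule used in this post, zero-padded values such as '01' (A11) and '00202' (A14) stay strings even though those attributes are listed as continuous, so convert them with int() if you need numbers.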
Original article: https://www.cnblogs.com/jcjc/p/10234540.html