A problem encountered when processing big data with numpy

Posted: 2019-08-17 18:30:49

While using numpy to read a .csv file with more than four million rows, the following exception was raised:

numpy.core._exceptions.MemoryError: Unable to allocate array with shape (4566386, 23) and data type <U20
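
The size of the requested allocation can be checked by hand. NumPy stores each character of a `<U20` string in 4 bytes, so every element takes 80 bytes regardless of how short the actual text is. The following is a small arithmetic sketch using only the shape and dtype from the message above; it comes to about 7.8 GiB for this single array.

```python
import numpy as np

# Shape and dtype taken from the error message.
rows, cols = 4566386, 23
itemsize = np.dtype("<U20").itemsize  # 20 characters * 4 bytes each

# Bytes NumPy must allocate in one contiguous block for this array.
total_bytes = rows * cols * itemsize

print(itemsize)
print(total_bytes)
print(round(total_bytes / 1024**3, 2), "GiB")
```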

Here is my source code:

import numpy as np
import matplotlib.pyplot as mp
import sklearn.ensemble as se
import sklearn.metrics as sm
headers = None
data = []
with open('/home/tarena/桌面/i-80.csv', 'r') as f:
    for i, line in enumerate(f.readlines()):
        if i == 0:
            headers = line.split(',')[2:]  # header row, first two columns dropped
        else:
            data.append(line.split(',')[2:])  # data rows kept as lists of strings
headers = np.array(data)  # raises MemoryError; np.array(headers) was probably intended
data = np.array(data)
print(headers.shape)
print(data.shape)

Here is the output:

Traceback (most recent call last):
  File "/home/tarena/桌面/read_forest.py", line 13, in <module>
    headers = np.array(data)
numpy.core._exceptions.MemoryError: Unable to allocate array with shape (4566386, 23) and data type <U20

Process finished with exit code 1

Although an error was raised, I still got a result.

Does anyone have a solution?
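
One possible direction, offered as a sketch rather than a confirmed fix: the post does not say what the 23 kept columns contain, but if they are all numeric, they can be parsed straight into a compact numeric dtype instead of being stored as 20-character strings. The helper name `load_numeric_csv` and its parameters are hypothetical; `np.loadtxt` and its `delimiter`, `skiprows`, `usecols`, `dtype` and `ndmin` arguments are existing NumPy API.

```python
import numpy as np


def load_numeric_csv(path, skip_cols=2, dtype=np.float32):
    """Hypothetical helper: return (headers, data) with the first
    `skip_cols` columns dropped. Assumes every kept column is numeric."""
    # Read only the header line to get the column names and the column count.
    with open(path, "r", encoding="utf-8") as f:
        header = f.readline().rstrip("\r\n").split(",")
    headers = header[skip_cols:]

    # Parse the remaining rows directly as numbers, skipping the header row
    # and the leading columns, so no array of strings is ever built.
    data = np.loadtxt(
        path,
        delimiter=",",
        skiprows=1,
        usecols=tuple(range(skip_cols, len(header))),
        dtype=dtype,
        ndmin=2,
    )
    return headers, data
```

With `float32`, an array of shape (4566386, 23) needs 4566386 × 23 × 4 = 420,107,512 bytes, roughly 0.39 GiB, compared with 80 bytes per element for `<U20`. How much temporary memory `np.loadtxt` uses while parsing depends on the NumPy version, and a non-numeric column would make this call fail, so both points would need checking against the real file.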

Original post: https://www.cnblogs.com/bitrees/p/11369327.html