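This article is adapted from: Machine Learning Plus
NumPy is the most fundamental and powerful package for scientific computing and data manipulation in Python.
NumPy provides the excellent ndarray object, short for "n-dimensional array".
An 'ndarray' object, also simply called an 'array', stores multiple items of the same data type. The tools built around this array object are what make NumPy so convenient for mathematical and data operations.
You might wonder: 'I can store numbers and other objects in a Python list and do all kinds of computations and manipulations with list comprehensions, for loops and so on. What do I need a NumPy array for?'
Well, NumPy arrays have very significant advantages over lists.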
How to create a NumPy array
1. Creating from existing data:
import numpy as np

# create from a list
a = np.array([1, 2, 3, 4])
print('a is:', a)

# create from a tuple
# (a tuple can't be changed after initialization;
#  note that (1,) is a one-element tuple, while (1) is just the number 1)
b = np.array((1, 2, 3, 4))
print('b is:', b)

# load from a file
# np.save writes a binary .npy file, which is not human-readable
origin_array = np.array([1, 2, 3, 4])
np.save('/tmp/array', origin_array)   # the ".npy" extension is added automatically
array_from_file = np.load('/tmp/array.npy')
print(array_from_file)

# np.savetxt writes a human-readable text file
origin_array = np.array([1, 2, 3, 4])
np.savetxt('/tmp/array.txt', origin_array)
array_from_file = np.loadtxt('/tmp/array.txt')   # values are read back as float64
print(array_from_file)

# read from a string
# best practice is to indicate the dtype explicitly
array = np.fromstring('1 2 3 4', dtype=float, sep=' ')
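The snippet above writes to `/tmp`, which only exists on Unix-like systems. A minimal, more portable sketch of the same save/load round trip, assuming a throwaway directory from Python's standard `tempfile` module is acceptable (the file names `array.npy` and `array.txt` are arbitrary):

```python
import os
import tempfile

import numpy as np

origin_array = np.array([1, 2, 3, 4])

with tempfile.TemporaryDirectory() as tmp_dir:
    # binary .npy format: keeps the dtype and shape exactly
    npy_path = os.path.join(tmp_dir, "array.npy")
    np.save(npy_path, origin_array)
    from_npy = np.load(npy_path)

    # plain-text format: human-readable, but loadtxt returns float64 by default
    txt_path = os.path.join(tmp_dir, "array.txt")
    np.savetxt(txt_path, origin_array)
    from_txt = np.loadtxt(txt_path)

print(from_npy, from_npy.dtype)
print(from_txt, from_txt.dtype)
```

The `.npy` route is the better choice when the file is only read back by NumPy; the text route trades exact dtype preservation for readability.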
2. Creating matrices:
import numpy as np

# Arrays filled with a constant value
# create an array of the given shape with every element set to 1
print(np.ones((3, 4)))
# create an array of the given shape with every element set to 0
print(np.zeros((3, 4)))
# create an array of the given shape without initializing it;
# the element values are arbitrary (whatever happens to be in memory)
print(np.empty((3, 4)))
# create an array of the given shape with every element set to the given value
print(np.full((3, 4), 17))

# Arrays from a numerical range
# 1-D array: np.arange(start, stop[, step]); stop itself is excluded
print(np.arange(10))          # 0, 1, ..., 9
print(np.arange(9, -1, -1))   # 9, 8, ..., 0
# arithmetic sequence over an interval:
# np.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)
#   endpoint: whether stop is included as the last value
#   retstep:  whether the spacing between values is returned as well
print(np.linspace(0, 10, num=5))   # 0, 2.5, 5, 7.5, 10
# geometric sequence over an interval:
# np.logspace(start, stop, num=50, endpoint=True, base=10.0, dtype=None)
# the sequence starts at base ** start and ends with base ** stop (see endpoint)
print(np.logspace(0, 3, num=4))    # 1, 10, 100, 1000

# Matrices (2-D arrays)
# np.eye(N, M=None, k=0, dtype=float): ones on the k-th diagonal, zeros elsewhere
#   k=0 (the default) is the main diagonal, k>0 an upper diagonal, k<0 a lower diagonal
print(np.eye(3, k=1))
# np.identity(n, dtype=None): the n x n identity matrix
print(np.identity(3))
# np.diag(v, k=0) differs from eye in that:
#   - the diagonal values are specified by hand instead of all being 1
#   - the matrix shape is not given; it is determined by the diagonal values v
# If v is a 1-D array, return a 2-D array with v on the k-th diagonal.
# If v is a 2-D array, return a copy of its k-th diagonal.
#   k>0 selects diagonals above the main diagonal, k<0 those below it.
print(np.diag([1, 2, 3]))
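To make the `endpoint` and `retstep` parameters of `np.linspace`, and the two modes of `np.diag`, more concrete, here is a small sketch; the interval and diagonal values are arbitrary choices for illustration:

```python
import numpy as np

# retstep=True returns the spacing together with the values
values, step = np.linspace(0, 1, num=5, retstep=True)
print(values, step)          # spacing is (1 - 0) / (5 - 1) = 0.25

# endpoint=False leaves the stop value out, so the spacing becomes 1 / 5
no_end = np.linspace(0, 1, num=5, endpoint=False)
print(no_end)

# eye: ones on the first upper diagonal of a 3 x 3 matrix
upper = np.eye(3, k=1)
print(upper)

# diag with a 1-D input builds a matrix; its 3 x 3 shape follows from len(v)
d = np.diag([4, 5, 6])
print(d)
# diag with a 2-D input extracts the diagonal instead
diagonal = np.diag(d)
print(diagonal)              # [4 5 6]
```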
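The main differences between an ndarray and a Python list are:
- Arrays support vectorized operations, while lists do not.
lst + 2   # error: TypeError, a list can't be added to an int
arr + 2   # adds 2 to every element
- Once an array has been created, its size cannot be changed. You have to create a new array or overwrite the existing one.
- Every array has one and only one dtype, and all items in it must be of that dtype.
# how to change the data type (astype returns a new array)
arr.astype('int')
- An equivalent NumPy array takes up much less space than a Python list of lists.
If you are unsure what data type an array will hold, or want to keep numbers and strings in the same array, set the dtype to 'object':
# Create an object array to hold numbers as well as strings
arr1d_obj = np.array([1, 'a'], dtype='object')
# Convert an array back to a list
arr1d_obj.tolist()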
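The differences listed above can be checked directly. The sketch below is illustrative only: the element count of 1000 and the explicit `int64` dtype are arbitrary choices, and the list size is estimated with `sys.getsizeof` on the list plus each element, which only approximates Python's real memory usage:

```python
import sys

import numpy as np

numbers = list(range(1000))
arr = np.array(numbers, dtype=np.int64)

# vectorized: one expression adds 2 to every element
plus_two = arr + 2
# the list equivalent needs an explicit loop or comprehension
plus_two_list = [x + 2 for x in numbers]

# the size is fixed: np.append returns a NEW, larger array and leaves arr unchanged
bigger = np.append(arr, 1000)

# one dtype per array: astype returns a converted copy
as_float = arr.astype(float)

# approximate memory: 8 bytes per int64 element vs. a list of pointers to int objects
array_bytes = arr.nbytes
list_bytes = sys.getsizeof(numbers) + sum(sys.getsizeof(x) for x in numbers)
print(array_bytes, list_bytes)
```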
Getting the attributes of an ndarray
Attributes of an ndarray:
Attribute | Description
T | Same as self.transpose(), except that self is returned if self.ndim < 2.
data | Python buffer object pointing to the start of the array's data.
dtype | Data-type of the array's elements.
flags | Information about the memory layout of the array.
flat | A 1-D iterator over the array.
imag | The imaginary part of the array.
real | The real part of the array.
size | Number of elements in the array (a single integer).
itemsize | Length of one array element in bytes.
nbytes | Total bytes consumed by the elements of the array.
ndim | Number of array dimensions (e.g. 2 for a matrix).
shape | Tuple of array dimensions, e.g. (3, 2).
strides | Tuple of bytes to step in each dimension when traversing an array.
ctypes | An object to simplify the interaction of the array with the ctypes module.
base | Base object if memory is from some other object.
# example: arr is any ndarray instance (not the ndarray class itself)
print(arr.shape)
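A small sketch of reading the most commonly used attributes, assuming a 3 x 2 `float64` array (the values are arbitrary):

```python
import numpy as np

arr = np.array([[1, 2], [3, 4], [5, 6]], dtype=np.float64)

print(arr.shape)      # (3, 2)
print(arr.ndim)       # 2
print(arr.size)       # 6
print(arr.dtype)      # float64
print(arr.itemsize)   # 8 bytes per float64 element
print(arr.nbytes)     # 6 elements * 8 bytes = 48
print(arr.T.shape)    # (2, 3)
print(list(arr.flat)) # elements in row-major order
```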
Extracting specific elements
import numpy as np
arr2 = np.array([[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8]], dtype='float')
arr2
#> array([[ 1.,  2.,  3.,  4.],
#>        [ 3.,  4.,  5.,  6.],
#>        [ 5.,  6.,  7.,  8.]])

# Extract the first 2 rows and columns
arr2[:2, :2]
#> array([[ 1.,  2.],
#>        [ 3.,  4.]])

# Get the boolean output by applying the condition to each element.
b = arr2 > 4
b
#> array([[False, False, False, False],
#>        [False, False,  True,  True],
#>        [ True,  True,  True,  True]], dtype=bool)
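The boolean array produced by a comparison can itself be used as an index to pull out the matching elements. A minimal sketch, rebuilding the same `arr2` values as above:

```python
import numpy as np

arr2 = np.array([[1, 2, 3, 4],
                 [3, 4, 5, 6],
                 [5, 6, 7, 8]], dtype=float)

mask = arr2 > 4
# boolean indexing returns a 1-D array of the selected elements, in row-major order
selected = arr2[mask]
print(selected)   # [5. 6. 5. 6. 7. 8.]

# conditions are combined with & and |, each wrapped in parentheses
between = arr2[(arr2 > 2) & (arr2 < 6)]
print(between)    # [3. 4. 3. 4. 5. 5.]
```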
# Reverse only the row positions
arr2[::-1, ]
# Reverse the row and column positions
arr2[::-1, ::-1]
# Insert a nan and an inf
arr2[1, 1] = np.nan  # not a number
arr2[1, 2] = np.inf  # infinite
arr2
#> array([[  1.,   2.,   3.,   4.],
#>        [  3.,  nan,  inf,   6.],
#>        [  5.,   6.,   7.,   8.]])

# Replace nan and inf with -1. Don't use arr2 == np.nan
missing_bool = np.isnan(arr2) | np.isinf(arr2)
arr2[missing_bool] = -1
arr2
#> array([[ 1.,  2.,  3.,  4.],
#>        [ 3., -1., -1.,  6.],
#>        [ 5.,  6.,  7.,  8.]])
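The reason for avoiding `arr2 == np.nan` is that NaN compares unequal to everything, including itself, so that comparison never matches. A short sketch demonstrating this, with arbitrary values; it also uses `np.isfinite`, which is `False` for nan, inf and -inf alike, as a one-call alternative to combining `np.isnan` and `np.isinf`:

```python
import numpy as np

arr = np.array([1.0, np.nan, np.inf, 4.0])

print(np.nan == np.nan)   # False
print(arr == np.nan)      # all False: this comparison finds nothing
print(np.isnan(arr))      # True only at index 1

# negating isfinite flags both nan and inf in one step
missing = ~np.isfinite(arr)
arr[missing] = -1
print(arr)                # [ 1. -1. -1.  4.]
```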
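Copying an ndarray
If you simply assign a portion of an array to another array, the new array actually refers to the parent array in memory.
This means that any change made to the new array is also reflected in the parent array.
So to avoid disturbing the parent array, copy it with copy(). Every NumPy array comes with a copy() method.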
# Assign a portion of arr2 to arr2a. This doesn't really create a new array.
arr2a = arr2[:2, :2]
arr2a[:1, :1] = 100  # 100 will be reflected in arr2
arr2
#> array([[ 100.,    2.,    3.,    4.],
#>        [   3.,   -1.,   -1.,    6.],
#>        [   5.,    6.,    7.,    8.]])

# Copy a portion of arr2 to arr2b
arr2b = arr2[:2, :2].copy()
arr2b[:1, :1] = 101  # 101 will not be reflected in arr2
arr2
#> array([[ 100.,    2.,    3.,    4.],
#>        [   3.,   -1.,   -1.,    6.],
#>        [   5.,    6.,    7.,    8.]])
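Whether two arrays share memory can be checked with `np.shares_memory`. A minimal sketch with a fresh 3 x 4 array (the values are arbitrary):

```python
import numpy as np

parent = np.arange(12, dtype=float).reshape(3, 4)

view = parent[:2, :2]            # basic slicing returns a view
copied = parent[:2, :2].copy()   # .copy() allocates new memory

print(np.shares_memory(parent, view))     # True
print(np.shares_memory(parent, copied))   # False

view[0, 0] = 100     # visible through parent
copied[0, 0] = 101   # parent is untouched
print(parent[0, 0])  # 100.0
```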
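Reshaping and flattening multidimensional arrays
The difference between ravel and flatten is that ravel returns a reference to the parent array (a view) whenever possible, while flatten always returns a copy. So any change to the array returned by ravel will usually affect the parent as well. But since no copy is created, ravel is very memory-efficient.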
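A minimal sketch contrasting `flatten` and `ravel`, plus `reshape` for going back to more dimensions. It assumes a contiguous array, which is the case in which `ravel` can return a view; for non-contiguous input it falls back to a copy:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # [[0 1 2], [3 4 5]]

flat_copy = a.flatten()   # always a copy
flat_copy[0] = 100
print(a[0, 0])            # 0: parent unchanged

flat_view = a.ravel()     # a view here, because a is contiguous
flat_view[0] = 101
print(a[0, 0])            # 101: parent changed

# reshape back to 2-D; -1 lets NumPy infer that dimension
b = flat_view.reshape(3, -1)
print(b.shape)            # (3, 2)
```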
Generating data
np.tile repeats a whole list or array n times, whereas np.repeat repeats each element n times.
import numpy as np

a = [1, 2, 3]

# Repeat the whole of 'a' two times
print('Tile: ', np.tile(a, 2))

# Repeat each element of 'a' two times
print('Repeat: ', np.repeat(a, 2))

#> Tile:  [1 2 3 1 2 3]
#> Repeat:  [1 1 2 2 3 3]
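Both functions also work on multidimensional arrays. A short sketch, assuming a small 2 x 2 array: `np.tile` accepts a tuple of repetition counts per axis, and `np.repeat` takes an `axis` argument (without it, the input is flattened first):

```python
import numpy as np

m = np.array([[1, 2],
              [3, 4]])

# the whole block repeated 2 times down and 3 times across -> shape (4, 6)
tiled = np.tile(m, (2, 3))
# each row repeated twice -> shape (4, 2)
rows_repeated = np.repeat(m, 2, axis=0)
# no axis: flattened first, then each element repeated
flat_repeated = np.repeat(m, 2)

print(tiled)
print(rows_repeated)
print(flat_repeated)   # [1 1 2 2 3 3 4 4]
```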
Generating random numbers:
import numpy as np

# Random numbers between [0,1) of shape 2,2
print(np.random.rand(2, 2))

# Normal distribution with mean=0 and variance=1 of shape 2,2
print(np.random.randn(2, 2))

# Random integers between [0, 10) of shape 2,2
print(np.random.randint(0, 10, size=[2, 2]))

# One random number between [0,1)
print(np.random.random())

# Random numbers between [0,1) of shape 2,2
print(np.random.random(size=[2, 2]))

# Pick 10 items from a given list, with equal probability
print(np.random.choice(['a', 'e', 'i', 'o', 'u'], size=10))

# Pick 10 items from a given list with a predefined probability 'p'
print(np.random.choice(['a', 'e', 'i', 'o', 'u'], size=10, p=[0.3, .1, 0.1, 0.4, 0.1]))  # picks more o's

# Sample output (the values differ on every run):
#> [[ 0.84  0.7 ]
#>  [ 0.52  0.8 ]]
#> [[-0.06 -1.55]
#>  [ 0.47 -0.04]]
#> [[4 0]
#>  [8 7]]
#> 0.08737272424956832
#> [[ 0.45  0.78]
#>  [ 0.03  0.74]]
#> ['i' 'a' 'e' 'e' 'a' 'u' 'o' 'e' 'i' 'u']
#> ['o' 'a' 'e' 'a' 'a' 'o' 'o' 'o' 'a' 'o']
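The functions above draw from NumPy's legacy global random state, so the output changes on every run. Since NumPy 1.17 the recommended interface is a `Generator` created with `np.random.default_rng`. A minimal sketch of rough equivalents, where the seed `42` is an arbitrary choice that makes the results reproducible (the exact numbers drawn differ from the legacy functions even with the same seed):

```python
import numpy as np

rng = np.random.default_rng(42)

uniform = rng.random((2, 2))              # floats in [0, 1), like np.random.rand(2, 2)
normal = rng.standard_normal((2, 2))      # mean 0, variance 1, like np.random.randn(2, 2)
ints = rng.integers(0, 10, size=(2, 2))   # integers in [0, 10), like np.random.randint
vowels = rng.choice(['a', 'e', 'i', 'o', 'u'], size=10,
                    p=[0.3, 0.1, 0.1, 0.4, 0.1])

# re-creating the generator with the same seed reproduces the same numbers
again = np.random.default_rng(42).random((2, 2))
print(np.array_equal(uniform, again))     # True
```

Passing an explicit `rng` object around, rather than relying on global state, also keeps separate parts of a program from disturbing each other's random streams.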