1. Reading data into NumPy
NumPy is a Python module that has a lot of functions for working with data. If you want to do serious work with data in Python, you'll be using a lot of NumPy. We'll work through importing NumPy and loading in a CSV file.
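As a rough sketch of this step, the snippet below loads CSV text with np.genfromtxt. To stay self-contained it reads from an in-memory string instead of a real file. The header names and rows are made up for illustration; only the position of the type column (fourth) and the liters column (fifth) follows the text.

```python
import io
import numpy as np

# Hypothetical stand-in for the CSV file: a header row plus made-up rows.
csv_text = (
    "Year,Region,Country,Type,Liters\n"
    "1985,Africa,Algeria,Beer,0.5\n"
    "1985,Africa,Algeria,Wine,1.25\n"
)

# genfromtxt accepts a file path or a file-like object.
# With no dtype given, it tries to read every field as a float.
world_alcohol = np.genfromtxt(io.StringIO(csv_text), delimiter=",")
print(world_alcohol)
```

With a real file, the first argument would simply be the path to the CSV file.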
2. Fixing the data types
If you looked at the data you read in on the last screen, you may have noticed that it looked very strange. This is because genfromtxt reads the data into a NumPy array, and every element in an array has to be the same data type: everything is a string, or everything is an integer, and so on. By default, NumPy tried to convert all of our data to floats, so every value that isn't a number came out as nan. We'll need to specify the data type when we read our data in so we can avoid that.
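One way to specify the data type is sketched below. The choices dtype="U75" (Unicode strings of up to 75 characters) and skip_header=1 are assumptions made for this example, as are the header names and the made-up rows.

```python
import io
import numpy as np

# Hypothetical stand-in for the CSV file: a header row plus made-up rows.
csv_text = (
    "Year,Region,Country,Type,Liters\n"
    "1985,Africa,Algeria,Beer,0.5\n"
    "1985,Africa,Algeria,Wine,1.25\n"
)

# Read every field as a string and skip the header line.
world_alcohol = np.genfromtxt(
    io.StringIO(csv_text),
    delimiter=",",
    dtype="U75",
    skip_header=1,
)
print(world_alcohol)
```

Every value, including the years and the liters, is now a string, which is why later screens convert a column to floats.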
3. Indexing the data
Now that we know how to read in a file, let's start pulling values out. Remember how all elements in a matrix have an index? Indexes start at 0, so we can print the item at row 1, column 2, by typing print(world_alcohol[0,1]).
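Here is a small sketch of that lookup. The matrix is built by hand from made-up rows so the example runs on its own; the column order (year, region, country, type, liters) is an assumption.

```python
import numpy as np

# Made-up rows, assumed column order: year, region, country, type, liters.
world_alcohol = np.array([
    ["1985", "Africa", "Algeria", "Beer", "0.5"],
    ["1985", "Africa", "Algeria", "Wine", "1.25"],
    ["1986", "Africa", "Algeria", "Beer", "0.25"],
    ["1985", "Europe", "Exampleland", "Beer", "2.0"],
])

# Row 1, column 2 in everyday counting is index [0, 1].
item = world_alcohol[0, 1]
print(item)  # Africa
```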
4. Vectors
When we grab a whole row or column from the matrix, we actually end up with a vector. Just like a matrix is a 2-dimensional array because it has rows and columns, a vector is a 1-dimensional array. Vectors are similar to Python lists in that they can be indexed with only one number. Think of a vector as just a single row, or a single column.
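To illustrate, the sketch below pulls one column and one row out of a hand-built matrix of made-up rows (assumed column order: year, region, country, type, liters) and checks that each result is 1-dimensional.

```python
import numpy as np

# Made-up rows, assumed column order: year, region, country, type, liters.
world_alcohol = np.array([
    ["1985", "Africa", "Algeria", "Beer", "0.5"],
    ["1985", "Africa", "Algeria", "Wine", "1.25"],
    ["1986", "Africa", "Algeria", "Beer", "0.25"],
    ["1985", "Europe", "Exampleland", "Beer", "2.0"],
])

countries = world_alcohol[:, 2]   # every row, third column
first_row = world_alcohol[0, :]   # first row, every column

# Both results are vectors, so one index is enough.
print(countries[3])   # Exampleland
print(first_row[3])   # Beer
```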
5. Array shape
All arrays, whether they are 1-dimensional (vectors), 2-dimensional (matrices), or even larger, have a number of elements in each dimension. For example, a matrix may have 200 rows and 10 columns. We can use the shape attribute to find these dimensions.
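A quick sketch of shape on a matrix and on a vector, using a hand-built matrix of made-up rows:

```python
import numpy as np

# Made-up rows, assumed column order: year, region, country, type, liters.
world_alcohol = np.array([
    ["1985", "Africa", "Algeria", "Beer", "0.5"],
    ["1985", "Africa", "Algeria", "Wine", "1.25"],
    ["1986", "Africa", "Algeria", "Beer", "0.25"],
    ["1985", "Europe", "Exampleland", "Beer", "2.0"],
])

matrix_shape = world_alcohol.shape        # (rows, columns)
vector_shape = world_alcohol[:, 2].shape  # a single dimension

print(matrix_shape)  # (4, 5)
print(vector_shape)  # (4,)
```

Note that shape is read without parentheses, because it is an attribute and not a function call.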
6. Boolean elements
We can also use boolean statements on arrays to get truth values. The interesting part about this is that the booleans are computed elementwise.
A comparison such as world_alcohol[:,3] == "Beer" will actually compare each element of the fourth column of world_alcohol, check if it equals "Beer", and create a new vector with the True/False values.
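The elementwise comparison can be sketched as follows, again on made-up rows. The variable name beer matches the name used on the following screens.

```python
import numpy as np

# Made-up rows, assumed column order: year, region, country, type, liters.
world_alcohol = np.array([
    ["1985", "Africa", "Algeria", "Beer", "0.5"],
    ["1985", "Africa", "Algeria", "Wine", "1.25"],
    ["1986", "Africa", "Algeria", "Beer", "0.25"],
    ["1985", "Europe", "Exampleland", "Beer", "2.0"],
])

# One True/False value per element of the fourth column.
beer = world_alcohol[:, 3] == "Beer"
print(beer.tolist())  # [True, False, True, True]
```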
7. Subsets of vectors
We can subset vectors based on boolean vectors like the ones we generated in the last screen.
An expression such as world_alcohol[:,3][beer] will select only the elements in the fourth column whose value is "Beer". It goes through each position in the fourth column vector (from 0 to the last index), and checks if the beer vector is True at the same position. If the beer vector is True, it assigns the element of the fourth column at that position to the subset. If the beer vector is False, the element is skipped.
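A minimal sketch of subsetting a vector with a boolean vector, on made-up rows. The second selection (countries) is added only to show that the same boolean vector can filter any column of matching length.

```python
import numpy as np

# Made-up rows, assumed column order: year, region, country, type, liters.
world_alcohol = np.array([
    ["1985", "Africa", "Algeria", "Beer", "0.5"],
    ["1985", "Africa", "Algeria", "Wine", "1.25"],
    ["1986", "Africa", "Algeria", "Beer", "0.25"],
    ["1985", "Europe", "Exampleland", "Beer", "2.0"],
])

beer = world_alcohol[:, 3] == "Beer"

# Keep only the positions where beer is True.
beer_types = world_alcohol[:, 3][beer]
beer_countries = world_alcohol[:, 2][beer]

print(beer_types.tolist())      # ['Beer', 'Beer', 'Beer']
print(beer_countries.tolist())  # ['Algeria', 'Algeria', 'Exampleland']
```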
8. Subsets of matrices
We can subset a matrix in the same way that we can subset a vector.
Indexing with world_alcohol[beer,:] will select all of the rows in world_alcohol where the "Type" column equals "Beer". Note how, because matrices are indexed using two numbers, we are substituting the boolean vector beer for the first number. We can alter the second number to select different columns.
For example, world_alcohol[beer,1] would select the second column for the rows where the "Type" column equals "Beer".
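Both forms of matrix subsetting described above can be sketched like this, on made-up rows:

```python
import numpy as np

# Made-up rows, assumed column order: year, region, country, type, liters.
world_alcohol = np.array([
    ["1985", "Africa", "Algeria", "Beer", "0.5"],
    ["1985", "Africa", "Algeria", "Wine", "1.25"],
    ["1986", "Africa", "Algeria", "Beer", "0.25"],
    ["1985", "Europe", "Exampleland", "Beer", "2.0"],
])

beer = world_alcohol[:, 3] == "Beer"

beer_rows = world_alcohol[beer, :]     # whole rows where the type is Beer
beer_regions = world_alcohol[beer, 1]  # only the second column of those rows

print(beer_rows.shape)        # (3, 5)
print(beer_regions.tolist())  # ['Africa', 'Africa', 'Europe']
```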
9. Subsets with multiple conditions
So now we can find all of the rows that correspond to "Algeria", for example. But what if what we really want is to find all the rows for "Algeria" in "1985"?
We'll have to use multiple conditions to generate our vector.
A boolean vector that uses multiple conditions is built by wrapping each comparison in parentheses and joining them with the & symbol. The parentheses make sure the two component vectors are generated first (order of operations). Then the two vectors are compared index by index. If both vectors are True at a given index, the resulting vector will be True at that index. If either vector is False at that index, the result will be False there. For example, combining [True, True, False] with [True, False, False] gives [True, False, False].
We can add more than 2 conditions if we want -- we just have to put an & symbol between each one. The resulting vector will contain True in the position corresponding to rows where all conditions are True, and False for rows where any condition is False.
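Here is a sketch of combining two conditions. The text does not say which columns hold the country and the year, so placing the year in column 0 and the country in column 2 is an assumption, and the rows are made up.

```python
import numpy as np

# Made-up rows, assumed column order: year, region, country, type, liters.
world_alcohol = np.array([
    ["1985", "Africa", "Algeria", "Beer", "0.5"],
    ["1985", "Africa", "Algeria", "Wine", "1.25"],
    ["1986", "Africa", "Algeria", "Beer", "0.25"],
    ["1985", "Europe", "Exampleland", "Beer", "2.0"],
])

is_algeria = world_alcohol[:, 2] == "Algeria"  # assumed country column
is_1985 = world_alcohol[:, 0] == "1985"        # assumed year column

# Parentheses force each comparison to run before & combines them.
algeria_1985 = (world_alcohol[:, 2] == "Algeria") & (world_alcohol[:, 0] == "1985")

print(is_algeria.tolist())    # [True, True, True, False]
print(is_1985.tolist())       # [True, True, False, True]
print(algeria_1985.tolist())  # [True, True, False, False]
```

The years are compared against the string "1985" because every value in the matrix is a string at this point.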
10. Convert a column to floats
We now know almost everything we need to compute how much alcohol the people in a country drank in a given year! But there are a couple of things we need to work through first. First, we need to convert the "Liters of alcohol drunk" column (the fifth one) to floats. We need to do this because the values are strings now, and we can't meaningfully add strings together. We can use the astype method on the array to do this.
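A minimal sketch of the conversion, assuming every value in the column is already a clean numeric string (the rows are made up):

```python
import numpy as np

# Made-up rows, assumed column order: year, region, country, type, liters.
world_alcohol = np.array([
    ["1985", "Africa", "Algeria", "Beer", "0.5"],
    ["1985", "Africa", "Algeria", "Wine", "1.25"],
    ["1986", "Africa", "Algeria", "Beer", "0.25"],
    ["1985", "Europe", "Exampleland", "Beer", "2.0"],
])

# astype returns a new array; the original matrix still holds strings.
liters = world_alcohol[:, 4].astype(float)
print(liters.tolist())  # [0.5, 1.25, 0.25, 2.0]
```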
11. Replace values in an array
There are values in our alcohol consumption column that are preventing us from converting the column from strings to floats. In order to fix this, we first have to learn how to replace values. We can replace values in a NumPy array by just assigning to them with the equals sign.
For example, world_alcohol[:,4][world_alcohol[:,4] == '0'] = '10' will replace any item in the alcohol consumption column that equals '0' (remember that the world alcohol matrix is all string values) with '10'.
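The replacement can be sketched as below, on made-up rows. Selecting a column with a plain slice gives a view of the matrix, so assigning through it changes the original matrix as well.

```python
import numpy as np

# Made-up rows, assumed column order: year, region, country, type, liters.
world_alcohol = np.array([
    ["1985", "Africa", "Algeria", "Beer", "0"],
    ["1985", "Africa", "Algeria", "Wine", "1.25"],
    ["1986", "Africa", "Algeria", "Beer", "0"],
])

# Assign '10' to every position in the fifth column that equals '0'.
world_alcohol[:, 4][world_alcohol[:, 4] == "0"] = "10"

print(world_alcohol[:, 4].tolist())  # ['10', '1.25', '10']
```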
12. Convert the alcohol consumption column to floats
Now that you know what the bad value is, we can replace it and then convert the column to floats.
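The text does not name the bad value, so the sketch below assumes it is an empty string and replaces it with '0' before converting. Both the bad value and the replacement are assumptions for illustration, and the rows are made up.

```python
import numpy as np

# Made-up rows, assumed column order: year, region, country, type, liters.
# The empty string in the second row is the assumed bad value.
world_alcohol = np.array([
    ["1985", "Africa", "Algeria", "Beer", "0.5"],
    ["1985", "Africa", "Algeria", "Wine", ""],
    ["1986", "Africa", "Algeria", "Beer", "0.25"],
])

liters = world_alcohol[:, 4]  # a view of the fifth column

# An empty string cannot be converted to a float, so replace it first.
liters[liters == ""] = "0"

liters_float = liters.astype(float)
print(liters_float.tolist())  # [0.5, 0.0, 0.25]
```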
13. Compute the total alcohol consumption
We can compute the total value of a column using the sum method.
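A short sketch of summing the converted column, on made-up values chosen so the total is exact:

```python
import numpy as np

# Made-up rows, assumed column order: year, region, country, type, liters.
world_alcohol = np.array([
    ["1985", "Africa", "Algeria", "Beer", "0.5"],
    ["1985", "Africa", "Algeria", "Wine", "1.25"],
    ["1986", "Africa", "Algeria", "Beer", "0.25"],
    ["1985", "Europe", "Exampleland", "Beer", "2.0"],
])

# Convert to floats first, then add up the whole column.
total = world_alcohol[:, 4].astype(float).sum()
print(total)  # 4.0
```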
14. Finding how much alcohol a person in a country drank in a year
We can subset a vector with another vector, as we learned earlier. This means that we can find the total alcohol consumed by any given country in any given year now.
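Putting the earlier pieces together, one possible sketch looks like this. The year and country column positions (0 and 2) are assumptions, and the rows are made up.

```python
import numpy as np

# Made-up rows, assumed column order: year, region, country, type, liters.
world_alcohol = np.array([
    ["1985", "Africa", "Algeria", "Beer", "0.5"],
    ["1985", "Africa", "Algeria", "Wine", "1.25"],
    ["1986", "Africa", "Algeria", "Beer", "0.25"],
    ["1985", "Europe", "Exampleland", "Beer", "2.0"],
])

# Boolean vector: True for rows about Algeria in 1985.
algeria_1985 = (world_alcohol[:, 2] == "Algeria") & (world_alcohol[:, 0] == "1985")

# Subset the float column with the boolean vector, then sum it.
liters = world_alcohol[:, 4].astype(float)
total_algeria_1985 = liters[algeria_1985].sum()
print(total_algeria_1985)  # 1.75
```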
15. A function to sum yearly alcohol consumption
Now that we know how to find the total alcohol consumption of the average person in a country in a given year, we can make a function out of it. A function will make it easier for us to calculate the alcohol consumption for all countries.
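A function along these lines might look like the sketch below. The name total_consumption and its parameters are hypothetical, the column positions are assumed, and the rows are made up.

```python
import numpy as np

def total_consumption(data, country, year):
    """Sum the liters column for the rows matching country and year.

    Assumes year in column 0, country in column 2, liters in column 4,
    and that the liters values are clean numeric strings.
    """
    matches = (data[:, 2] == country) & (data[:, 0] == year)
    return data[matches, 4].astype(float).sum()

# Made-up rows, assumed column order: year, region, country, type, liters.
world_alcohol = np.array([
    ["1985", "Africa", "Algeria", "Beer", "0.5"],
    ["1985", "Africa", "Algeria", "Wine", "1.25"],
    ["1986", "Africa", "Algeria", "Beer", "0.25"],
    ["1985", "Europe", "Exampleland", "Beer", "2.0"],
])

print(total_consumption(world_alcohol, "Algeria", "1985"))      # 1.75
print(total_consumption(world_alcohol, "Exampleland", "1985"))  # 2.0
```

If no row matches, the subset is empty and the sum is 0.0.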
16. Finding the country that drinks the least
We can now loop over our dictionary keys to find the country with the lowest amount of alcohol consumed per person in 1989.
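The loop could be sketched as follows. How the dictionary is built is not shown in the text, so filling it with one total per country for 1989 is an assumption, as are the helper name total_consumption, the column positions, and the made-up rows.

```python
import numpy as np

def total_consumption(data, country, year):
    # Hypothetical helper: year in column 0, country in 2, liters in 4.
    matches = (data[:, 2] == country) & (data[:, 0] == year)
    return data[matches, 4].astype(float).sum()

# Made-up rows, assumed column order: year, region, country, type, liters.
world_alcohol = np.array([
    ["1989", "Africa", "Algeria", "Beer", "0.5"],
    ["1989", "Africa", "Algeria", "Wine", "0.25"],
    ["1989", "Europe", "Exampleland", "Beer", "0.5"],
    ["1988", "Europe", "Exampleland", "Wine", "3.0"],
])

# Build a dictionary of country -> total for 1989.
totals = {}
for country in np.unique(world_alcohol[:, 2]):
    totals[str(country)] = total_consumption(world_alcohol, country, "1989")

# Loop over the keys and remember the smallest total seen so far.
lowest_country = None
lowest_total = None
for country in totals:
    if lowest_total is None or totals[country] < lowest_total:
        lowest_country = country
        lowest_total = totals[country]

print(lowest_country, lowest_total)  # Exampleland 0.5
```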
Data Analysis with Pandas-(1)-Getting started with matrices
Original article: http://www.cnblogs.com/yuehq/p/4937906.html