
Python Data Analysis: A Data Filtering Example [Moutai]

Date: 2019-09-30 12:49:01


import tushare as ts
import pandas as pd
from pandas import DataFrame,Series
In [3]:
df = ts.get_k_data('600519',start='1999-01-01')
In [5]:
# Save df to a local file
df.to_csv('./maotai.csv')
In [9]:
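# Parse the date column as datetimes (a time series) and use it as the row index of the data
df = pd.read_csv('./maotai.csv',index_col='date',parse_dates=['date'])
# Drop the leftover column that holds the old integer index written by to_csv
df.drop(labels='Unnamed: 0',axis=1,inplace=True)
df.head(5)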
Out[9]:
date open close high low volume code
2001-08-27 5.392 5.554 5.902 5.132 406318.00 600519
2001-08-28 5.467 5.759 5.781 5.407 129647.79 600519
2001-08-29 5.777 5.684 5.781 5.640 53252.75 600519
2001-08-30 5.668 5.796 5.860 5.624 48013.06 600519
2001-08-31 5.804 5.782 5.877 5.749 23231.48 600519
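The `Unnamed: 0` column dropped above comes from `to_csv` writing the default integer row index as an unnamed first column. As a minimal sketch (the two-row frame and the in-memory buffer standing in for `./maotai.csv` are assumptions for illustration only), writing with `index=False` avoids that column, so no `drop` is needed after reading the file back:

```python
import io
import pandas as pd

# Tiny illustrative frame; the values are the first two rows shown above
raw = pd.DataFrame({
    "date": ["2001-08-27", "2001-08-28"],
    "open": [5.392, 5.467],
    "close": [5.554, 5.759],
})

buf = io.StringIO()            # in-memory stand-in for './maotai.csv'
raw.to_csv(buf, index=False)   # index=False: do not write the 0..n-1 row index
buf.seek(0)

# Same read_csv arguments as in the cell above
demo = pd.read_csv(buf, index_col="date", parse_dates=["date"])
print(list(demo.columns))      # no 'Unnamed: 0' column to drop
print(type(demo.index).__name__)
```

Whether to keep the `drop` call or write without the index is a matter of taste; both leave the same frame.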
In [26]:
# List every date on which the stock closed more than 3% above its open.
s = (df['close'] - df['open'])/df['open'] > 0.03
df.loc[s].index
Out[26]:
DatetimeIndex(['2001-08-27', '2001-08-28', '2001-09-10', '2001-12-21',
               '2002-01-18', '2002-01-31', '2003-01-14', '2003-10-29',
               '2004-01-05', '2004-01-14',
               ...
               '2018-08-27', '2018-09-18', '2018-09-26', '2018-10-19',
               '2018-10-31', '2018-11-13', '2018-12-28', '2019-01-15',
               '2019-02-11', '2019-03-01'],
              dtype='datetime64[ns]', name='date', length=294, freq=None)
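The filter above works in two steps: the comparison produces a boolean Series aligned with the rows, and `.loc` keeps only the rows where it is `True`. A small sketch of the same idea, using the five opening rows printed earlier (the names `sample`, `intraday_return`, `mask`, and `up_days` are illustrative, not from the original notebook):

```python
import pandas as pd

# First five rows of the table shown earlier (open and close only)
sample = pd.DataFrame(
    {
        "open": [5.392, 5.467, 5.777, 5.668, 5.804],
        "close": [5.554, 5.759, 5.684, 5.796, 5.782],
    },
    index=pd.to_datetime(
        ["2001-08-27", "2001-08-28", "2001-08-29", "2001-08-30", "2001-08-31"]
    ),
)

# Same-day return: (close - open) / open
intraday_return = (sample["close"] - sample["open"]) / sample["open"]

# Boolean mask, one True/False per row
mask = intraday_return > 0.03

# Keep only the rows where the mask is True and read off their dates
up_days = sample.loc[mask].index
print(up_days.strftime("%Y-%m-%d").tolist())  # ['2001-08-27', '2001-08-28']
```

The two dates match the first two entries of the full result above.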
In [20]:
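# List every date on which the stock opened more than 2% below the previous day's close.
# shift(1) moves each close down one row, so every row sees the previous trading day's close
s1 = (df['open'] - df['close'].shift(1))/df['close'].shift(1) < -0.02
s1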
Out[20]:
date
2001-08-27    False
2001-08-28    False
2001-08-29    False
2001-08-30    False
2001-08-31    False
2001-09-03    False
2001-09-04    False
2001-09-05    False
2001-09-06    False
2001-09-07    False
2001-09-10    False
2001-09-11    False
2001-09-12     True
2001-09-13    False
2001-09-14    False
2001-09-17    False
2001-09-18    False
2001-09-19    False
2001-09-20    False
2001-09-21    False
2001-09-24    False
2001-09-25    False
2001-09-26    False
2001-09-27    False
2001-09-28    False
2001-10-08    False
2001-10-09    False
2001-10-10    False
2001-10-11    False
2001-10-12    False
              ...  
2019-01-16    False
2019-01-17    False
2019-01-18    False
2019-01-21    False
2019-01-22    False
2019-01-23    False
2019-01-24    False
2019-01-25    False
2019-01-28    False
2019-01-29    False
2019-01-30    False
2019-01-31    False
2019-02-01    False
2019-02-11    False
2019-02-12    False
2019-02-13    False
2019-02-14    False
2019-02-15    False
2019-02-18    False
2019-02-19    False
2019-02-20    False
2019-02-21    False
2019-02-22    False
2019-02-25    False
2019-02-26    False
2019-02-27    False
2019-02-28    False
2019-03-01    False
2019-03-04    False
2019-03-05    False
Length: 4174, dtype: bool
In [23]:
df.loc[s1].index
Out[23]:
DatetimeIndex(['2001-09-12', '2002-06-26', '2002-12-13', '2004-07-01',
               '2004-10-29', '2006-08-21', '2006-08-23', '2007-01-25',
               '2007-02-01', '2007-02-06', '2007-03-19', '2007-05-21',
               '2007-05-30', '2007-06-05', '2007-07-27', '2007-09-05',
               '2007-09-10', '2008-03-13', '2008-03-17', '2008-03-25',
               '2008-03-27', '2008-04-22', '2008-04-23', '2008-04-29',
               '2008-05-13', '2008-06-10', '2008-06-13', '2008-06-24',
               '2008-06-27', '2008-08-11', '2008-08-19', '2008-09-23',
               '2008-10-10', '2008-10-15', '2008-10-16', '2008-10-20',
               '2008-10-23', '2008-10-27', '2008-11-06', '2008-11-12',
               '2008-11-20', '2008-11-21', '2008-12-02', '2009-02-27',
               '2009-03-25', '2009-08-13', '2010-04-26', '2010-04-30',
               '2011-08-05', '2012-03-27', '2012-08-10', '2012-11-22',
               '2012-12-04', '2012-12-24', '2013-01-16', '2013-01-25',
               '2013-09-02', '2014-04-25', '2015-01-19', '2015-05-25',
               '2015-07-03', '2015-07-08', '2015-07-13', '2015-08-24',
               '2015-09-02', '2015-09-15', '2017-11-17', '2018-02-06',
               '2018-02-09', '2018-03-23', '2018-03-28', '2018-07-11',
               '2018-10-11', '2018-10-24', '2018-10-25', '2018-10-29',
               '2018-10-30'],
              dtype='datetime64[ns]', name='date', freq=None)
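The gap-down filter relies on `shift(1)` to line up each day's open with the previous trading day's close. A short sketch on made-up prices (the dates and values below are hypothetical, chosen only to make the arithmetic easy to follow) shows what the shifted column looks like and why the first row can never match:

```python
import pandas as pd

# Hypothetical three-day series, not real market data
toy = pd.DataFrame(
    {"open": [100.0, 97.0, 99.5], "close": [100.0, 100.0, 98.0]},
    index=pd.to_datetime(["2020-01-02", "2020-01-03", "2020-01-06"]),
)

# Previous trading day's close; the first row has no predecessor, so it is NaN
prev_close = toy["close"].shift(1)

# Overnight gap relative to the previous close
gap = (toy["open"] - prev_close) / prev_close

# A comparison against NaN is False, so the first row is never selected
gap_down = gap < -0.02
print(gap_down.tolist())  # [False, True, False]
```

Note that `shift(1)` moves by rows, not by calendar days, so weekends and holidays in the index are handled automatically as long as the rows are in date order.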
In [33]:
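# Suppose that, starting from January 1, 2010, I buy 1 lot of the stock on the first trading day
# of every month and sell all holdings on the last trading day of every year.
# What is my return as of today?
data = df.loc['2010':'2019']
data.head()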
Out[33]:
date open close high low volume code
2010-01-04 109.760 108.446 109.760 108.044 44304.88 600519
2010-01-05 109.116 108.127 109.441 107.846 31513.18 600519
2010-01-06 107.840 106.417 108.165 106.129 39889.03 600519
2010-01-07 106.417 104.477 106.691 103.302 48825.55 600519
2010-01-08 104.655 103.379 104.655 102.167 36702.09 600519
In [35]:
# Resample by month and take each month's first row (the row labels are month-end dates)
monthes = data.resample('M').first()
monthes
Out[35]:
date open close high low volume code
2010-01-31 109.760 108.446 109.760 108.044 44304.88 600519
2010-02-28 107.769 107.776 108.216 106.576 29655.94 600519
2010-03-31 106.219 106.085 106.857 105.925 21734.74 600519
2010-04-30 101.324 102.141 102.422 101.311 23980.83 600519
2010-05-31 81.676 82.091 82.678 80.974 23975.16 600519
2010-06-30 84.075 84.637 85.166 83.278 23525.57 600519
2010-07-31 81.586 81.057 81.586 80.725 7449.69 600519
2010-08-31 89.296 92.465 93.567 89.296 42965.73 600519
2010-09-30 102.288 101.052 103.834 100.420 25589.00 600519
2010-10-31 108.858 111.776 113.045 108.858 31608.00 600519
2010-11-30 105.122 105.483 106.217 104.478 49658.00 600519
2010-12-31 130.759 130.372 133.980 128.839 38016.00 600519
2011-01-31 120.388 119.487 121.033 117.619 60462.00 600519
2011-02-28 114.656 116.124 117.136 114.334 17758.00 600519
2011-03-31 115.113 115.068 115.899 114.527 23059.00 600519
2011-04-30 115.931 115.596 116.813 115.010 18227.00 600519
2011-05-31 117.876 119.718 120.324 117.232 41270.00 600519
2011-06-30 131.467 135.197 135.255 130.772 39093.00 600519
2011-07-31 137.895 137.272 137.895 136.033 15810.00 600519
2011-08-31 148.519 145.689 148.519 145.274 18572.00 600519
2011-09-30 153.798 151.491 154.357 150.789 25689.00 600519
2011-10-31 136.735 134.672 137.394 134.407 14283.00 600519
2011-11-30 145.546 148.418 148.640 145.066 27147.00 600519
2011-12-31 152.724 152.466 154.149 150.625 33785.00 600519
2012-01-31 137.179 132.716 138.089 132.523 33878.00 600519
2012-02-29 133.382 133.347 135.044 132.437 24824.00 600519
2012-03-31 145.789 145.116 147.186 144.980 16920.00 600519
2012-04-30 140.474 147.566 147.852 140.474 40960.00 600519
2012-05-31 161.893 161.878 162.301 159.027 35339.00 600519
2012-06-30 170.489 172.187 174.071 169.565 38504.00 600519
... ... ... ... ... ... ...
2016-10-31 289.945 294.268 295.142 289.945 21103.00 600519
2016-11-30 308.828 307.594 309.847 306.496 19512.00 600519
2016-12-31 310.819 311.810 315.277 309.537 27941.00 600519
2017-01-31 324.689 324.961 327.331 323.261 20763.00 600519
2017-02-28 336.073 336.898 339.162 335.102 20936.00 600519
2017-03-31 344.912 346.378 347.923 344.815 21944.00 600519
2017-04-30 374.595 378.480 383.657 373.954 40789.00 600519
2017-05-31 399.791 400.257 403.093 397.440 23291.00 600519
2017-06-30 429.804 436.390 437.040 428.357 40604.00 600519
2017-07-31 460.595 447.161 460.595 444.762 49628.00 600519
2017-08-31 474.266 473.517 476.238 470.716 27501.00 600519
2017-09-30 483.337 488.248 489.332 483.170 31451.00 600519
2017-10-31 517.196 520.381 528.052 512.819 34891.00 600519
2017-11-30 612.188 614.288 622.698 610.551 43435.00 600519
2017-12-31 629.078 613.637 630.044 611.448 47192.00 600519
2018-01-31 690.200 693.996 700.218 680.232 49612.00 600519
2018-02-28 756.262 747.122 756.558 742.379 50582.00 600519
2018-03-31 717.808 731.582 736.394 713.637 44794.00 600519
2018-04-30 670.480 670.539 681.326 664.673 32039.00 600519
2018-05-31 650.760 658.480 659.624 636.029 70259.00 600519
2018-06-30 740.614 734.679 744.410 728.417 36177.00 600519
2018-07-31 734.520 711.550 739.330 703.000 37558.00 600519
2018-08-31 731.400 714.940 732.300 714.110 25237.00 600519
2018-09-30 652.000 666.210 667.670 650.800 30179.00 600519
2018-10-31 715.410 686.150 719.000 686.150 82745.00 600519
2018-11-30 555.000 563.000 585.500 551.250 98106.00 600519
2018-12-31 589.000 601.200 605.000 584.770 83414.00 600519
2019-01-31 609.980 598.980 612.000 595.010 62286.00 600519
2019-02-28 697.040 692.670 699.000 689.610 30520.00 600519
2019-03-31 761.500 789.300 790.000 761.000 63840.00 600519

111 rows × 6 columns
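In the monthly table above, each row is labelled with the month-end date (for example 2010-01-31), even though the values come from the first trading day of that month (2010-01-04 in the earlier output). The following is a sketch of one way to recover the actual first trading dates, assuming grouping by calendar-month period is acceptable; the four-row frame and the names `toy`, `month_key`, `first_rows`, and `first_dates` are made up for illustration. Grouping this way also sidesteps the `'M'` resample alias, which recent pandas releases (2.2 and later) deprecate in favor of `'ME'`.

```python
import pandas as pd

# Hypothetical prices: two trading days in each of two months
idx = pd.to_datetime(["2010-01-04", "2010-01-05", "2010-02-01", "2010-02-02"])
toy = pd.DataFrame({"open": [10.0, 11.0, 12.0, 13.0]}, index=idx)

# One calendar-month label (a Period such as 2010-01) per row
month_key = toy.index.to_period("M")

# First row of each month, labelled by the month itself rather than its last day
first_rows = toy.groupby(month_key).first()

# The real date of each month's first trading day
first_dates = toy.index.to_series().groupby(month_key).min()

print(first_rows)
print(first_dates.dt.strftime("%Y-%m-%d").tolist())  # ['2010-01-04', '2010-02-01']
```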

In [36]:
# Resample by year and take each year's last row
years = data.resample('Y').last()
years
Out[36]:
date open close high low volume code
2010-12-31 117.103 118.469 118.701 116.620 46084.0 600519
2011-12-31 138.039 138.468 139.600 136.105 29460.0 600519
2012-12-31 155.208 152.087 156.292 150.144 51914.0 600519
2013-12-31 93.188 96.480 97.179 92.061 57546.0 600519
2014-12-31 157.642 161.056 161.379 157.132 46269.0 600519
2015-12-31 207.487 207.458 208.704 207.106 19673.0 600519
2016-12-31 317.239 324.563 325.670 317.239 34687.0 600519
2017-12-31 707.948 687.725 716.329 681.918 76038.0 600519
2018-12-31 563.300 590.010 596.400 560.000 63678.0 600519
2019-12-31 785.000 779.780 789.550 775.880 44830.0 600519 
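The notebook as reproduced here stops before answering the question of how much the strategy earns. The following is only a sketch of how the calculation might be finished, not the original author's solution. It rests on several assumptions: one lot is 100 shares, purchases are made at the opening price, sales are made at the closing price, fees and dividends are ignored, and the index is sorted by date. The function name `simulate` and the toy prices are hypothetical. Because the series printed earlier ends on 2019-03-05, the row labelled 2019-12-31 is presumably just the most recent trading day, so the final year's holdings are valued at the latest close rather than actually sold.

```python
import pandas as pd


def simulate(prices: pd.DataFrame, lot: int = 100) -> float:
    """Profit from buying one lot at each month's first open and
    selling everything at each year's last close (assumed rules)."""
    profit = 0.0
    # Process one calendar year at a time
    for _, chunk in prices.groupby(prices.index.year):
        # Opening price on the first trading day of each month in this year
        first_opens = chunk["open"].groupby(chunk.index.month).first()
        shares = lot * len(first_opens)
        cost = lot * first_opens.sum()
        # Last close of the year; for a partial final year this is a
        # mark-to-market value, not a realized sale
        proceeds = shares * chunk["close"].iloc[-1]
        profit += proceeds - cost
    return float(profit)


# Hypothetical prices, not real market data
toy = pd.DataFrame(
    {
        "open": [10.0, 12.0, 14.0, 16.0, 17.0],
        "close": [11.0, 12.0, 15.0, 16.0, 18.0],
    },
    index=pd.to_datetime(
        ["2010-01-04", "2010-02-01", "2010-12-31", "2011-01-04", "2011-01-05"]
    ),
)

profit = simulate(toy)
print(profit)  # 1100.0
```

On the toy data, 2010 has three purchases (cost 3600, sold for 4500) and 2011 has one (cost 1600, valued at 1800), giving 900 + 200 = 1100. Applying the same function to `data` would give a figure for the real series under the stated assumptions.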


Original article: https://www.cnblogs.com/bilx/p/11611899.html
