
Python Dictionary Implementation

Date: 2014-09-05 22:21:42


  • Python dictionaries are implemented as hash tables.
  • Hash tables must allow for hash collisions, i.e. even if two keys have the same hash value, the implementation of the table must have a strategy to insert and retrieve the key and value pairs unambiguously.
  • Python dict uses open addressing to resolve hash collisions (explained below) (see dictobject.c:296-297).
  • A Python hash table is just a contiguous block of memory (sort of like an array, so you can do an O(1) lookup by index).
  • Each slot in the table can store one and only one entry. This is important.
  • Each entry in the table is actually a combination of three values: <hash, key, value>. This is implemented as a C struct (see dictobject.h:51-56).
  • The figure below is a logical representation of a Python hash table. In it, 0, 1, ..., i, ... on the left are the indices of the slots in the hash table (they are shown for illustration only and are not stored along with the table).

    # Logical model of Python Hash table
    -+-----------------+
    0| <hash|key|value>|
    -+-----------------+
    1|      ...        |
    -+-----------------+
    .|      ...        |
    -+-----------------+
    i|      ...        |
    -+-----------------+
    .|      ...        |
    -+-----------------+
    n|      ...        |
    -+-----------------+
  • When a new dict is initialized, it starts with 8 slots (see dictobject.h:49).

  • When adding entries to the table, we start with some slot, i, that is based on the hash of the key. CPython initially uses i = hash(key) & mask, where mask is the table size minus one (PyDict_MINSIZE - 1 for a new dict), but that's not really important. Just note that the initial slot, i, that is checked depends on the hash of the key.
  • If that slot is empty, the entry is added to the slot (by entry, I mean <hash|key|value>). But what if that slot is occupied!? Most likely that is because another entry hashes to the same slot (a hash collision!).
  • If the slot is occupied, CPython (and even PyPy) compares the hash AND the key (by compare I mean an == comparison, not an is comparison) of the entry in the slot against the hash and key of the entry to be inserted (dictobject.c:337,344-345). If both match, it considers the entry to already exist, so it updates the stored value instead of taking a new slot and moves on to the next entry to be inserted. If either the hash or the key doesn't match, it starts probing.
  • Probing just means it searches slot by slot to find an empty slot. Technically we could just go one by one, i+1, i+2, ... and use the first available one (that's linear probing). But for reasons explained beautifully in the comments (see dictobject.c:33-126), CPython uses random probing. In random probing, the next slot is picked in a pseudo-random order. The entry is added to the first empty slot. For this discussion, the actual algorithm used to pick the next slot is not really important (see dictobject.c:33-126 for the probing algorithm). What is important is that the slots are probed until the first empty slot is found.
  • The same thing happens for lookups: it just starts with the initial slot i (where i depends on the hash of the key). If the hash and the key don't both match the entry in the slot, it starts probing until it finds a slot with a match. If it reaches an empty slot without finding a match, it reports a failure.
  • BTW, the dict will be resized if it is two-thirds full. This avoids slowing down lookups (see dictobject.h:64-65).
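
The insert and lookup steps described in the list above can be sketched as a toy table in pure Python. This is only an illustration, not CPython's actual code: the names ToyDict, _probe and _find are hypothetical, the probe recurrence and its constants are assumptions loosely modelled on the source comments referenced above, deletion is not supported, and growing by a factor of four is taken from the resize rule quoted in this article.

```python
class ToyDict:
    """Toy open-addressing hash table (illustrative, not CPython's code)."""

    MINSIZE = 8  # a new table starts with 8 slots, as stated in the text

    def __init__(self):
        # Each slot holds either None (empty) or one (hash, key, value) entry.
        self.slots = [None] * self.MINSIZE
        self.used = 0

    def _probe(self, h):
        """Yield the slot indices to try for hash h, starting at h & mask."""
        mask = len(self.slots) - 1
        perturb = h & 0xFFFFFFFFFFFFFFFF  # treat the hash as unsigned
        i = h & mask
        while True:
            yield i
            # Assumed pseudo-random step; the constants are illustrative.
            perturb >>= 5
            i = (5 * i + perturb + 1) & mask

    def _find(self, key, h):
        """Return the index of the slot holding key, or of the first empty slot."""
        for i in self._probe(h):
            entry = self.slots[i]
            if entry is None:
                return i  # empty slot: the key is not in the table
            if entry[0] == h and entry[1] == key:
                return i  # hash and key both match

    def __setitem__(self, key, value):
        h = hash(key)
        i = self._find(key, h)
        if self.slots[i] is None:
            self.used += 1
        self.slots[i] = (h, key, value)  # new entry, or replaced value
        if self.used * 3 >= len(self.slots) * 2:  # two-thirds full
            self._resize()

    def __getitem__(self, key):
        entry = self.slots[self._find(key, hash(key))]
        if entry is None:
            raise KeyError(key)
        return entry[2]

    def _resize(self):
        """Grow the table and reinsert every entry at its new position."""
        old = [e for e in self.slots if e is not None]
        self.slots = [None] * (len(self.slots) * 4)
        for h, key, value in old:
            self.slots[self._find(key, h)] = (h, key, value)


d = ToyDict()
d[0] = "a"
d[8] = "b"  # 8 & 7 == 0, so key 8 collides with key 0 and is probed elsewhere
d[0] = "c"  # same hash and equal key: the stored value is replaced
print(d[0], d[8])
```

Because the table is resized as soon as it is two-thirds full, at least one slot is always empty, so the probe loop in _find is guaranteed to stop.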

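The rules in Python's implementation are:
Initially, the dict's hash table has 8 slots (the PyDict_MINSIZE constant). When usage of the hash table reaches 2/3, the dict is resized to keep index collisions rare. When the number of keys is below 50k, the size is multiplied by 4; when the number of keys is above 50k, the size is multiplied by 2. Note that on every resize all keys are reinserted (in terms of the probing algorithm above, i changes, so each index has to be recomputed), so the order of the keys is quite likely to change again.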
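
To make the growth rule concrete, here is a small sketch that follows the rule exactly as stated in the paragraph above (8 initial slots, resize at 2/3 usage, ×4 below 50k keys, ×2 otherwise). The function name table_sizes and its parameters are hypothetical, and the sketch does not reproduce CPython's actual resize code.

```python
def table_sizes(n_keys, minsize=8, threshold=50_000):
    """Return the successive table sizes while inserting n_keys distinct keys."""
    size = minsize
    sizes = [size]
    for used in range(1, n_keys + 1):
        # Resize once usage reaches two-thirds of the table.
        if used * 3 >= size * 2:
            factor = 4 if used < threshold else 2
            size *= factor
            sizes.append(size)
    return sizes


print(table_sizes(100))  # → [8, 32, 128, 512]
```

Under this rule the first resize happens on the 6th key (6/8 exceeds 2/3), the next on the 22nd, and so on, so resizes become rarer as the table grows.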

English part: http://stackoverflow.com/questions/327311/how-are-pythons-built-in-dictionaries-implemented

Chinese part: http://zhoutall.com/archives/497

Implementation details: http://www.laurentluce.com/posts/python-dictionary-implementation/


Original article: http://www.cnblogs.com/zxpgo/p/3958700.html
