[Repost] hash_map: Introduction and Usage

Posted: 2014-07-18 19:20:07

Reposted from: http://blog.csdn.net/holybin/article/details/26050897

0 Overview

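Although hash_map and map are both part of the STL, the current C++ standard (C++11) includes map but not hash_map; C++11 standardized its hashed container under the name unordered_map instead. In other words, the STL is only partly contained in the C++ standard. The mainstream GNU C++ and MSVC++ libraries provide hash_map as a compiler extension, SGI has its own hash_map implementation, Boost offers unordered_map, which is similar to hash_map, and Google provides two implementations, dense_hash_map and sparse_hash_map (the former optimizes for speed, the latter for memory). They are listed below: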

(1) SGI's hash_map: http://www.sgi.com/tech/stl/hash_map.html. This article uses SGI's hash_map for its examples.

(2) GNU C++'s hash_map:

https://gcc.gnu.org/onlinedocs/libstdc++/libstdc++-html-USERS-4.1/class____gnu__cxx_1_1hash__map.html

(3) MSVC++'s hash_map: msdn.microsoft.com/en-us/library/bb398039.aspx (this is the one that ships with Visual Studio)

(4) Boost's unordered_map (header: <boost/unordered_map.hpp>):

http://www.boost.org/doc/libs/1_55_0/doc/html/boost/unordered_map.html

(5) Google's sparsehash and densehash: http://code.google.com/p/sparsehash/source/browse/trunk/src/google/?r=98

1 Comparison with map: definition

Similarities: both are associative containers in the STL, and both have the following two properties:

a. Key-value form: the element type is pair<const Key, Data>;

b. Unique keys: no two elements have the same key.

Differences:

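hash_map is a Hashed Associative Container that associates objects of type Data with keys of type Key, whereas map is a Sorted Associative Container that associates them in the same way. A hashed associative container is one implemented with a hash table, and it differs from a sorted associative container in two respects: its elements are in no particular order, and most operations take constant time on average but O(n) in the worst case. So in applications that need fast insertion and lookup but no ordering, a hashed associative container is usually considerably faster than a sorted one.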
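To make the ordering difference concrete, here is a small sketch in standard C++11. It uses std::map as the sorted container and std::unordered_map as the hashed one; unordered_map is the C++11 standard's hashed container and serves here only as a stand-in for hash_map, whose SGI headers are not available on current compilers. The helper function names are made up for illustration.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Collects the keys of any map-like container in its iteration order.
template <class Map>
std::vector<std::string> keys_in_iteration_order(const Map& m) {
    std::vector<std::string> keys;
    for (const auto& kv : m) keys.push_back(kv.first);
    return keys;
}

// Sorted associative container: iteration follows operator< on the keys.
std::vector<std::string> sorted_container_keys() {
    std::map<std::string, int> m{{"march", 31}, {"january", 31}, {"february", 28}};
    return keys_in_iteration_order(m);
}

// Hashed associative container: iteration order depends on the hash values and
// bucket layout, so it is unspecified; the keys are sorted afterwards only so
// that the contents can be compared.
std::vector<std::string> hashed_container_keys_sorted() {
    std::unordered_map<std::string, int> m{{"march", 31}, {"january", 31}, {"february", 28}};
    std::vector<std::string> keys = keys_in_iteration_order(m);
    std::sort(keys.begin(), keys.end());
    return keys;
}
```

With std::map the keys always come back as february, january, march; with std::unordered_map the raw iteration order should not be relied on.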

2 Comparison with map: implementation

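map is implemented with a red-black tree, so its operations take O(log n) time; hash_map is implemented with a hash table, so its operations take constant time on average (O(n) in the worst case, when many keys collide).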

3 Comparison with map: usage

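When the number of elements is large and speed matters most, use hash_map. Note, however, that although hash_map operations are faster than map's, computing the hash function and resolving collisions both cost extra time, and constructing a hash_map is slower than constructing a map. Moreover, because hash_map is built on a hash table, it trades space for time, so it consumes more memory than map. The choice therefore requires weighing three factors: speed, amount of data, and memory.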

4 hash_map in detail

(1) How hash_map works

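hash_map is implemented with a hash table. It first allocates memory to form a number of buckets that hold elements, then uses a hash function to map each element's key to a bucket, where the element is stored. The hash function locates the bucket, and a separate comparison function resolves collisions. Insertion can be described as: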

a. Take the element's key.

b. Map the key through the hash function (commonly by taking a modulus) to get a hash value, which is the index of the corresponding bucket.

c. Store the element's key and data in that bucket.
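The insertion steps above can be observed on a standard container. This sketch assumes std::unordered_map as a stand-in for hash_map and uses its bucket interface (bucket(), plus the local iterators begin(n) and end(n)). toy_bucket_index is a hypothetical helper that spells out the modulo rule from step b; a real library may map hash values to buckets differently.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Step b as described above: hash the key, then take it modulo the bucket count.
std::size_t toy_bucket_index(const std::string& key, std::size_t bucket_count) {
    return std::hash<std::string>{}(key) % bucket_count;
}

// The standard container reports which bucket a key belongs to.
bool stored_in_reported_bucket() {
    std::unordered_map<std::string, int> months;
    months["january"] = 31;                     // steps a-c happen inside operator[]
    std::size_t b = months.bucket("january");   // index of the bucket for this key
    // Walk that bucket with its local iterators and check the element is there.
    for (auto it = months.begin(b); it != months.end(b); ++it)
        if (it->first == "january") return true;
    return false;
}
```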

Lookup proceeds correspondingly:

a. Take the key being looked up.

b. Map the key through the hash function (commonly by taking a modulus) to get a hash value, which is the index of the corresponding bucket.

c. Compare the key of each element in that bucket with the given key; if none is equal, the key is not found.

d. If one is equal, return that element's data.
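Both the insertion and the lookup steps fit in a few lines of separate chaining. The class below is a hypothetical teaching sketch, not SGI's implementation: it uses std::hash as the hash function and operator== as the comparison function, and its put overwrites an existing key, unlike hash_map::insert.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Minimal separate-chaining hash table (hypothetical ToyHashMap).
class ToyHashMap {
public:
    explicit ToyHashMap(std::size_t n = 16) : buckets_(n) {}  // n must be > 0

    void put(const std::string& key, int data) {
        auto& bucket = buckets_[index(key)];            // steps a-b: hash the key, pick a bucket
        for (auto& kv : bucket)
            if (kv.first == key) { kv.second = data; return; }  // key present: overwrite
        bucket.emplace_back(key, data);                 // step c: store key and data
    }

    // Returns a pointer to the data, or nullptr if the key is absent.
    const int* find(const std::string& key) const {
        const auto& bucket = buckets_[index(key)];      // steps a-b
        for (const auto& kv : bucket)                   // step c: compare keys in the bucket
            if (kv.first == key) return &kv.second;     // step d: equal, return the data
        return nullptr;                                 // no equal key: not found
    }

private:
    std::size_t index(const std::string& key) const {
        return std::hash<std::string>{}(key) % buckets_.size();
    }
    std::vector<std::list<std::pair<std::string, int>>> buckets_;
};
```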

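So the two most important ingredients of a hash_map are the hash function and the comparison function. The following discussion again uses SGI's hash_map as the example.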

(2) The hash_map class definition

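Constructing a map requires only a comparison function (a less-than function), while constructing a hash_map requires a hash function and a comparison function (an equality function). In SGI's STL, hash_map is defined in stl_hash_map.h as: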

// Forward declaration of equality operator; needed for friend declaration.

template <class _Key, class _Tp,
          class _HashFcn  __STL_DEPENDENT_DEFAULT_TMPL(hash<_Key>),
          class _EqualKey __STL_DEPENDENT_DEFAULT_TMPL(equal_to<_Key>),
          class _Alloc =  __STL_DEFAULT_ALLOCATOR(_Tp) >
class hash_map;

......

template <class _Key, class _Tp, class _HashFcn, class _EqualKey,
          class _Alloc>
class hash_map
{
......
};

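Template parameters 1 and 2 are the key type and the value type, and parameters 3 and 4 are the hash function and the comparison function. The STL wraps these two functions in structs (function objects); you can define these structs yourself or use the defaults provided. Parameter 5 is the hash_map's allocator, used for internal memory management.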
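The same five-parameter shape survives in the standard library as std::unordered_map<Key, T, Hash, KeyEqual, Allocator>. The sketch below, assuming a C++11 compiler, checks at compile time that writing out every default gives the same type. One difference from the SGI declaration above: the standard default allocator is allocator<pair<const Key, T>> rather than an allocator of the mapped type.

```cpp
#include <functional>
#include <memory>
#include <type_traits>
#include <unordered_map>
#include <utility>

// Spelling out every default template argument yields the very same type.
using Short = std::unordered_map<int, double>;
using Spelled = std::unordered_map<int, double,
                                   std::hash<int>,      // hash functor (like _HashFcn)
                                   std::equal_to<int>,  // key equality (like _EqualKey)
                                   std::allocator<std::pair<const int, double>>>;  // allocator
static_assert(std::is_same<Short, Spelled>::value, "defaults differ");

// The functor types are also exposed as member typedefs.
static_assert(std::is_same<Short::hasher, std::hash<int>>::value, "unexpected hasher");
static_assert(std::is_same<Short::key_equal, std::equal_to<int>>::value, "unexpected key_equal");
```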

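The rest of this section covers three cases of using these two functions: the default hash and comparison functions, a custom hash function, and a custom comparison function.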

(3) Default hash and comparison functions

// SGI hash_map definition
#include "hash_map.h"   // SGI STL header

int main()
{
    // uses the default hash<const char*> and equal_to<const char*>
    hash_map<const char*, int> months;

    months["january"] = 31;
    months["february"] = 28;
    months["march"] = 31;
    months["april"] = 30;
    months["may"] = 31;
    months["june"] = 30;
    months["july"] = 31;
    months["august"] = 31;
    months["september"] = 30;
    months["october"] = 31;
    months["november"] = 30;
    months["december"] = 31;

    return 0;
}

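As the hash_map definition above shows, this example uses the default hash function (hash<_Key>) and the default comparison function (equal_to<_Key>). For this example,

hash_map<const char*, int> months;

is equivalent to

hash_map<const char*, int, hash<const char*>, equal_to<const char*> > months;

Note, however, that equal_to<const char*> compares pointer values, not the strings they point to. The example works only because each key is inserted once and never looked up; looking up "january" through a different character buffer would not find the entry. Section (5) below therefore defines a strcmp-based comparison function for const char* keys.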
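The pointer-comparison pitfall can be reproduced with std::unordered_map<const char*, int>. Standard C++ is even stricter here than SGI's STL: std::hash<const char*> hashes the pointer value itself, whereas SGI's hash<const char*> hashes the characters. found_via_copy is a hypothetical name used only for this sketch.

```cpp
#include <cstring>
#include <unordered_map>

// With const char* keys, std::hash<const char*> and std::equal_to<const char*>
// both work on the pointer value, not on the characters it points to.
bool found_via_copy() {
    std::unordered_map<const char*, int> months;
    months["january"] = 31;           // the stored key is the address of this literal
    char buf[16];
    std::strcpy(buf, "january");      // same text, different address
    return months.find(buf) != months.end();
}
```

The function returns false: the stored key is the address of the string literal, and buf lives at a different address.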

(4) Custom hash function

First, SGI's STL provides the following default hash functions, all defined in stl_hash_fun.h:

// default hash functions
struct hash<char*>  
struct hash<const char*>  
struct hash<char>   
struct hash<unsigned char>   
struct hash<signed char>  
struct hash<short>  
struct hash<unsigned short>   
struct hash<int>   
struct hash<unsigned int>  
struct hash<long>   
struct hash<unsigned long>

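Second, to define a custom hash function, define a struct (any name will do) that overloads operator(), taking a const reference to an object of the custom key type and returning size_t. Then pass that struct as the third template argument when defining the hash_map. Assuming the custom key type is KeyClass: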

#include "hash_map.h"

struct my_hash
{
    size_t operator()(const KeyClass& x) const
    {
        ......
    }
};

// hash_map definition
hash_map<KeyClass, ..., my_hash, ...> my_hash_map;
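Here is a complete, compilable version of that skeleton, assuming std::unordered_map in place of hash_map. Point stands in for KeyClass and PointHash for my_hash; both are hypothetical, and combining the two field hashes by multiplying by 31 is an arbitrary illustrative choice. The key type still needs an equality comparison, supplied here by operator==.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_map>

// Hypothetical key type standing in for KeyClass.
struct Point {
    int x;
    int y;
    // Used by the default std::equal_to<Point>.
    bool operator==(const Point& o) const { return x == o.x && y == o.y; }
};

// Hash functor: a const operator() taking the key by const reference.
struct PointHash {
    std::size_t operator()(const Point& p) const {
        std::size_t h = std::hash<int>()(p.x);
        // Mix in y; 31 is an arbitrary odd multiplier chosen for illustration.
        return h * 31 + std::hash<int>()(p.y);
    }
};

// The hash functor goes in the third template slot, as with hash_map.
using PointMap = std::unordered_map<Point, int, PointHash>;
```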

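Building on the default example above, here is a custom hash function for the string type (SGI's STL provides no hash<string>, as the list above shows):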

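#include <string>
#include "hash_map.h"
using std::string;

// Option 1: call the library's own string hash function __stl_hash_string
// (an SGI internal helper declared in stl_hash_fun.h):
struct str_hash
{
    size_t operator()(const string& str) const
    {
        return __stl_hash_string(str.c_str());
    }
};

// Option 2: write it yourself (use this instead of option 1, not in addition to it):
struct str_hash
{
    size_t operator()(const string& str) const
    {
        unsigned long h = 0;
        for (size_t i = 0; i < str.size(); i++)
            h = 5 * h + str[i];
        return size_t(h);
    }
};

// The months map in the default example can then be written as:
hash_map<string, int, str_hash> my_months;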
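On a current compiler, the hand-written str_hash can be used unchanged with std::unordered_map, as sketched below. __stl_hash_string is an SGI internal and not part of standard C++, so only the self-written variant appears here. In practice std::hash<std::string> already exists, so std::unordered_map<std::string, int> works without any custom hash at all. make_months is a hypothetical helper.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// The hand-written string hash (h = 5*h + c), used with std::unordered_map.
struct str_hash {
    std::size_t operator()(const std::string& str) const {
        unsigned long h = 0;
        for (char c : str) h = 5 * h + c;
        return static_cast<std::size_t>(h);
    }
};

std::unordered_map<std::string, int, str_hash> make_months() {
    std::unordered_map<std::string, int, str_hash> m;
    m["january"] = 31;
    m["february"] = 28;
    m["march"] = 31;
    return m;
}
```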

(5) Custom comparison function

First, SGI's STL provides a default comparison function, defined in stl_function.h:

// default comparison function
template <class _Tp>
struct equal_to : public binary_function<_Tp, _Tp, bool>
{
    bool operator()(const _Tp& __x, const _Tp& __y) const { return __x == __y; }
};

// definition of binary_function
template <class _Arg1, class _Arg2, class _Result>
struct binary_function {
    typedef _Arg1 first_argument_type;
    typedef _Arg2 second_argument_type;
    typedef _Result result_type;
};

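Second, there are two ways to customize the comparison: overload operator== on the key type, so that the default equal_to can compare keys; or define a comparison struct that overloads operator() (similar to a custom hash function).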

Method 1: suppose the key type has two fields, iID and len:

struct my_element
{
    int iID;
    int len;
    bool operator==(const my_element& e) const
    {
        return (iID == e.iID) && (len == e.len);
    }
};
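In standard C++, a key type that already has operator== can also be used with both defaults by specializing std::hash for it, so that neither the third nor the fourth template argument has to be spelled out. This is an alternative technique available in standard C++, not something SGI's hash_map requires, and the XOR-and-shift combination of the two fields is an arbitrary choice for illustration.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_map>

struct my_element {
    int iID;
    int len;
    bool operator==(const my_element& e) const { return iID == e.iID && len == e.len; }
};

// Specializing std::hash lets std::unordered_map use both defaults:
// std::hash<my_element> for hashing and std::equal_to<my_element>
// (which calls operator==) for comparison.
namespace std {
template <>
struct hash<my_element> {
    size_t operator()(const my_element& e) const {
        return hash<int>()(e.iID) ^ (hash<int>()(e.len) << 1);
    }
};
}  // namespace std

using ElementMap = std::unordered_map<my_element, int>;  // no extra template arguments
```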

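Method 2: define a struct whose operator() takes const references to two objects of the custom key type and returns whether they are equal, then pass that struct as the fourth template argument when defining the hash_map. Assuming the custom key type is KeyClass, the custom hash struct is my_hash, and the custom comparison struct is my_compare: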

#include "hash_map.h"

struct my_compare
{
    bool operator()(const KeyClass& x, const KeyClass& y) const
    {
        // compare x and y; return true if they are equal
    }
};

// hash_map definition
hash_map<KeyClass, ..., my_hash, my_compare> my_hash_map;
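A comparison functor only works together with a hash that agrees with it: any two keys the functor considers equal must produce the same hash value, otherwise they may land in different buckets and never be compared. The hypothetical case-insensitive key below, written for std::unordered_map, shows a matching pair of functors.

```cpp
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical example: string keys that compare equal ignoring ASCII case.
static std::string lower(const std::string& s) {
    std::string r(s);
    for (char& c : r) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return r;
}

// Equality functor: the fourth template argument.
struct ci_equal {
    bool operator()(const std::string& a, const std::string& b) const {
        return lower(a) == lower(b);
    }
};

// Keys that compare equal must hash equally, so hash the lower-cased form.
struct ci_hash {
    std::size_t operator()(const std::string& s) const {
        return std::hash<std::string>()(lower(s));
    }
};

using CiMap = std::unordered_map<std::string, int, ci_hash, ci_equal>;
```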

Likewise, building on the default example above, here is a custom comparison function written with method 2:

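#include <cstring>
#include <string>
#include "hash_map.h"
using std::string;

// 1. Using const char* as the key type.
// Note: the default hash<const char*> can be used directly here,
// because SGI's hash<const char*> hashes the characters of the string.

// comparison function for const char*
struct str_compare
{
    bool operator()(const char* s1, const char* s2) const
    {
        return std::strcmp(s1, s2) == 0;
    }
};

// The months map in the default example can then be written as:
hash_map<const char*, int, hash<const char*>, str_compare> my_months;

/////////////////////////////////////////////////////////////////

// 2. Using string as the key type (a separate alternative: its structs
// would clash with the names above if both parts were in one file).
// Note: a hash function for string must be defined as well.
struct str_hash
{
    size_t operator()(const string& str) const
    {
        return __stl_hash_string(str.c_str());
    }
};

// comparison function for string
struct str_compare
{
    bool operator()(const string& s1, const string& s2) const
    {
        return s1 == s2;   // same result as strcmp on the c_str() forms, but also handles embedded '\0'
    }
};

// The months map in the default example can then be written as:
hash_map<string, int, str_hash, str_compare> my_months;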
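Porting case 1 to standard C++ takes one more step. Because std::hash<const char*> hashes the pointer value, keeping the default hash alongside str_compare would break lookups, so a content-based hash is needed as well. The sketch below pairs a hypothetical cstr_hash (the same multiply-by-5 scheme as the hand-written str_hash) with the strcmp-based str_compare; found_via_copy_fixed is also a hypothetical name.

```cpp
#include <cstddef>
#include <cstring>
#include <unordered_map>

// Content-based hash for C strings; std::hash<const char*> would hash the pointer.
struct cstr_hash {
    std::size_t operator()(const char* s) const {
        std::size_t h = 0;
        for (; *s; ++s) h = 5 * h + static_cast<unsigned char>(*s);
        return h;
    }
};

// Equality on the characters, as in the article.
struct str_compare {
    bool operator()(const char* s1, const char* s2) const {
        return std::strcmp(s1, s2) == 0;
    }
};

bool found_via_copy_fixed() {
    std::unordered_map<const char*, int, cstr_hash, str_compare> months;
    months["january"] = 31;
    char buf[16];
    std::strcpy(buf, "january");      // same text, different address
    return months.find(buf) != months.end();
}
```

The map stores only the pointers, so the character arrays they point to must outlive the map.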

(6) Other hash_map member functions

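hash_map's member functions are much like map's. For the full parameter lists and descriptions, see SGI's hash_map documentation; here are a few commonly used ones: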

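hash_map(size_type n): set this parameter if efficiency matters. n sets the number of hash buckets in the container. The more buckets there are, the lower the probability of hash collisions and the less often the table has to reallocate memory, so a larger n generally means faster operations at the cost of more memory.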
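Standard C++ keeps both ways of pre-sizing a hashed container. In this sketch, assuming std::unordered_map in place of hash_map, the constructor argument is a minimum bucket count, and reserve(n) allocates enough buckets to hold n elements at the current max_load_factor() without rehashing; presized_ok is a hypothetical name.

```cpp
#include <unordered_map>

bool presized_ok() {
    // Constructor argument: a minimum number of buckets, as with hash_map(size_type n).
    std::unordered_map<int, int> a(1024);

    // reserve(n): enough buckets for n elements at the current max_load_factor(),
    // so inserting up to n elements does not trigger a rehash.
    std::unordered_map<int, int> b;
    b.reserve(1000);

    return a.bucket_count() >= 1024 &&
           b.bucket_count() * b.max_load_factor() >= 1000.0f;
}
```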

const_iterator find(const key_type& k) const: lookup. Takes a key and returns an iterator to the matching element, or end() if there is none.

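data_type& operator[](const key_type& k): accesses an element like an array. Note: if the container has no element with key k, using [k] inserts one (with a default-constructed value), which amounts to an insertion. So if you only want to know whether the container contains k, use find instead.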
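The difference between find and operator[] shows up directly in the container's size. A sketch with std::unordered_map standing in for hash_map; sizes_after_lookups is a hypothetical helper.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>

// Returns {size after a lookup with find, size after a lookup with operator[]}.
std::pair<std::size_t, std::size_t> sizes_after_lookups() {
    std::unordered_map<std::string, int> months{{"may", 31}};
    bool has_june = months.find("june") != months.end();  // lookup only, no insertion
    (void)has_june;
    std::size_t after_find = months.size();               // still 1
    int days = months["june"];                             // inserts {"june", 0}
    (void)days;
    std::size_t after_index = months.size();              // now 2
    return {after_find, after_index};
}
```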

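insert: when the container does not yet contain the key, insert does roughly the same thing as operator[]. (If the key is already present, insert leaves the existing value unchanged, whereas assigning through [] overwrites it.) As the container grows and each bucket holds more elements, hash_map automatically allocates more memory to create more buckets in order to stay efficient. Therefore, iterators obtained before an insert may no longer be valid afterwards.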
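The sketch below shows how insert and operator[] differ once the key already exists, with std::unordered_map as a stand-in for hash_map; insert_vs_index is a hypothetical name. For std::unordered_map specifically, a rehash triggered by an insertion invalidates iterators but not references or pointers to elements.

```cpp
#include <string>
#include <unordered_map>

// insert() never overwrites; its .second reports whether the element was added.
// operator[] returns a reference, so assigning through it overwrites.
bool insert_vs_index() {
    std::unordered_map<std::string, int> months;
    bool added = months.insert({"february", 28}).second;  // true: new key
    bool again = months.insert({"february", 29}).second;  // false: key exists, value kept
    int kept = months["february"];                        // 28
    months["february"] = 29;                              // overwrites
    return added && !again && kept == 28 && months["february"] == 29;
}
```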

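erase: during insertion, hash_map may automatically grow its memory when buckets hold too many elements, but in SGI's STL erase does not automatically release memory. Therefore, after calling erase, iterators to the other elements remain valid.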
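A common use of erase is removing elements while iterating. With std::unordered_map standing in for hash_map, erase(it) returns an iterator to the following element and invalidates only iterators to the erased element, so the loop below is safe; erase_30_day_months is a hypothetical helper.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Removes every month with 30 days and returns how many were removed.
std::size_t erase_30_day_months(std::unordered_map<std::string, int>& months) {
    std::size_t removed = 0;
    for (auto it = months.begin(); it != months.end();) {
        if (it->second == 30) {
            it = months.erase(it);   // continue from the element after the erased one
            ++removed;
        } else {
            ++it;
        }
    }
    return removed;
}
```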

The remaining details of using hash_map are skipped here; see the series 详细解说STL hash_map (A detailed explanation of STL hash_map) [blog.163.com/liuruigong_lrg/blog/static/27370306200711334341781].

5 Other hash containers

These include hash_set, hash_multimap, and hash_multiset. They differ from set, multimap, and multiset in the same way that hash_map differs from map.
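C++11 standardized these containers too, as unordered_set, unordered_multimap, and unordered_multiset. The sketch below shows the duplicate-key and set behaviour with those standard containers; the function names are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>

// unordered_multimap allows several elements with the same key.
std::size_t months_with_days(int days) {
    std::unordered_multimap<int, std::string> by_days{
        {31, "january"}, {28, "february"}, {31, "march"}, {30, "april"}};
    auto range = by_days.equal_range(days);   // all elements whose key equals days
    std::size_t n = 0;
    for (auto it = range.first; it != range.second; ++it) ++n;
    return n;
}

// unordered_set keeps each value once, like hash_set.
bool set_keeps_unique() {
    std::unordered_set<int> s{31, 28, 31, 30};
    return s.size() == 3;   // the duplicate 31 is stored once
}
```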

6 hash_map performance test

[Quoted from: 各类 C++ hashmap 性能测试总结 (A summary of performance tests of various C++ hashmaps), 雨之絮语, Baidu Space]

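The beginning of this article listed several hash_map implementations. This test measures two kinds of operations in each of them, inserting data and looking up data, using int for both the key and the value, i.e. the equivalent of map<int, int>. Inserting 10 million records and performing 10,000 lookups gave the following results: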

[Figure: benchmark results for each implementation. Red circles: time to insert 10 million records; green circles: time for 10,000 lookups.]

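As the figure shows, insertion took roughly 1 to 4 seconds, while lookups took much less time. Comparing the implementations, boost::unordered_map had the best overall performance. google::dense_hash_map was excellent in memory usage and extremely fast at lookups, but slower than boost::unordered_map at insertion. As for the std::map and std::hash_map implementations shipped with Visual Studio 2010, their performance can only be described as dismal.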
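For readers who want to rerun a comparison of this kind, here is a minimal timing harness in standard C++. It is a hypothetical sketch, not the program used in the quoted test, and it reports no figures of its own: bench inserts n int keys and then performs m lookups on any map-like type that provides operator[] and count().

```cpp
#include <chrono>
#include <cstddef>
#include <map>
#include <unordered_map>

struct BenchResult {
    double insert_s;    // seconds spent on n insertions
    double lookup_s;    // seconds spent on m lookups
    std::size_t hits;   // lookups that found their key
};

// Times n insertions of int keys followed by m lookups; assumes n > 0.
template <class Map>
BenchResult bench(int n, int m) {
    using Clock = std::chrono::steady_clock;
    Map table;
    auto t0 = Clock::now();
    for (int i = 0; i < n; ++i) table[i] = i;
    auto t1 = Clock::now();
    std::size_t hits = 0;
    for (int i = 0; i < m; ++i) hits += table.count(i * 7 % n);  // keys spread over [0, n)
    auto t2 = Clock::now();
    std::chrono::duration<double> ins = t1 - t0, look = t2 - t1;
    return {ins.count(), look.count(), hits};
}
```

For example, bench<std::unordered_map<int, int>>(10000000, 10000) runs the same workload size as described above. Timings depend heavily on the compiler, optimization flags, and hardware.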

[Notes]

Test environment:

CPU: Duo T6600

Memory: 4GB

Software versions:

Visual Studio 2010

boost 1.48.0

google sparsehash 1.11


Original post: http://www.cnblogs.com/dy-techblog/p/3850772.html