
How much memory do malloc and free really handle? A look inside GNU glibc

Posted: 2015-04-17 23:54:51

Tags: malloc   free   glib   c   c++

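Most of us know that the memory malloc actually sets aside is always larger than the size the caller asked for. That raises the obvious questions: how much larger, and how much does malloc really allocate?

We know the size must be recorded in some "magic" place, yet, like your own thoughts, you cannot seem to perceive it directly. That is an illusion: we are simply used to calling malloc without ever reading its source. Here I will lift the veil and make the mechanism visible.

Note: the source is taken from the files under the malloc directory of the GNU C library (glibc), version 2.7.

Note also: different C libraries do not necessarily implement this the same way. This is glibc; if you want to know how Windows or anything else does it, press Alt + F4.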


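Summary

The opening comment of malloc.c makes a point: the algorithm here is not necessarily the best, but it should be broadly applicable.

The comment also lists the functions the file implements, together with the vital statistics, alignment, and minimum/maximum allocated size, and closes by stating that the allocator is thread-safe, so go ahead and call it! @_@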

/*
* Why use this malloc?

  This is not the fastest, most space-conserving, most portable, or
  most tunable malloc ever written. However it is among the fastest
  while also being among the most space-conserving, portable and tunable.
  Consistent balance across these factors results in a good general-purpose
  allocator for malloc-intensive programs.

  The main properties of the algorithms are:
  * For large (>= 512 bytes) requests, it is a pure best-fit allocator,
    with ties normally decided via FIFO (i.e. least recently used).
  * For small (<= 64 bytes by default) requests, it is a caching
    allocator, that maintains pools of quickly recycled chunks.
  * In between, and for combinations of large and small requests, it does
    the best it can trying to meet both goals at once.
  * For very large requests (>= 128KB by default), it relies on system
    memory mapping facilities, if supported.

  For a longer but slightly out of date high-level description, see
     http://gee.cs.oswego.edu/dl/html/malloc.html

  You may already by default be using a C library containing a malloc
  that is  based on some version of this malloc (for example in
  linux). You might still want to use the one in this file in order to
  customize settings or to avoid overheads associated with library
  versions.

* Contents, described in more detail in "description of public routines" below.

  Standard (ANSI/SVID/...)  functions:
    malloc(size_t n);
    calloc(size_t n_elements, size_t element_size);
    free(void* p);
    realloc(void* p, size_t n);
    memalign(size_t alignment, size_t n);
    valloc(size_t n);
    mallinfo()
    mallopt(int parameter_number, int parameter_value)

  Additional functions:
    independent_calloc(size_t n_elements, size_t size, void* chunks[]);
    independent_comalloc(size_t n_elements, size_t sizes[], void* chunks[]);
    pvalloc(size_t n);
    cfree(void* p);
    malloc_trim(size_t pad);
    malloc_usable_size(void* p);
    malloc_stats();

* Vital statistics:

  Supported pointer representation:       4 or 8 bytes
  Supported size_t  representation:       4 or 8 bytes
       Note that size_t is allowed to be 4 bytes even if pointers are 8.
       You can adjust this by defining INTERNAL_SIZE_T

  Alignment:                              2 * sizeof(size_t) (default)
       (i.e., 8 byte alignment with 4byte size_t). This suffices for
       nearly all current machines and C compilers. However, you can
       define MALLOC_ALIGNMENT to be wider than this if necessary.

  Minimum overhead per allocated chunk:   4 or 8 bytes
       Each malloced chunk has a hidden word of overhead holding size
       and status information.

  Minimum allocated size: 4-byte ptrs:  16 bytes    (including 4 overhead)
			  8-byte ptrs:  24/32 bytes (including, 4/8 overhead)

       When a chunk is freed, 12 (for 4byte ptrs) or 20 (for 8 byte
       ptrs but 4 byte size) or 24 (for 8/8) additional bytes are
       needed; 4 (8) for a trailing size field and 8 (16) bytes for
       free list pointers. Thus, the minimum allocatable size is
       16/24/32 bytes.

       Even a request for zero bytes (i.e., malloc(0)) returns a
       pointer to something of the minimum allocatable size.

       The maximum overhead wastage (i.e., number of extra bytes
       allocated than were requested in malloc) is less than or equal
       to the minimum size, except for requests >= mmap_threshold that
       are serviced via mmap(), where the worst case wastage is 2 *
       sizeof(size_t) bytes plus the remainder from a system page (the
       minimal mmap unit); typically 4096 or 8192 bytes.

  Maximum allocated size:  4-byte size_t: 2^32 minus about two pages
			   8-byte size_t: 2^64 minus about two pages

       It is assumed that (possibly signed) size_t values suffice to
       represent chunk sizes. `Possibly signed' is due to the fact
       that `size_t' may be defined on a system as either a signed or
       an unsigned type. The ISO C standard says that it must be
       unsigned, but a few systems are known not to adhere to this.
       Additionally, even when size_t is unsigned, sbrk (which is by
       default used to obtain memory from system) accepts signed
       arguments, and may not be able to handle size_t-wide arguments
       with negative sign bit.  Generally, values that would
       appear as negative after accounting for overhead and alignment
       are supported only via mmap(), which does not have this
       limitation.

       Requests for sizes outside the allowed range will perform an optional
       failure action and then return null. (Requests may also
       also fail because a system is out of memory.)

  Thread-safety: thread-safe
*/

The implementation of malloc

void * __libc_malloc (size_t bytes)
{
	mstate ar_ptr;
	void *victim;

	void *(*hook) (size_t, const void *)
		= atomic_forced_read (__malloc_hook);
	if (__builtin_expect (hook != NULL, 0))
		return (*hook)(bytes, RETURN_ADDRESS (0));

	arena_get (ar_ptr, bytes);

	if (!ar_ptr)
		return 0;

	victim = _int_malloc (ar_ptr, bytes);
	if (!victim)
	{
		LIBC_PROBE (memory_malloc_retry, 1, bytes);
		ar_ptr = arena_get_retry (ar_ptr, bytes);
		if (__builtin_expect (ar_ptr != NULL, 1))
		{
			victim = _int_malloc (ar_ptr, bytes);
			(void) mutex_unlock (&ar_ptr->mutex);
		}
	}
	else
		(void) mutex_unlock (&ar_ptr->mutex);
	assert (!victim || chunk_is_mmapped (mem2chunk (victim)) ||
			ar_ptr == arena_for_chunk (mem2chunk (victim)));
	return victim;
}
libc_hidden_def (__libc_malloc)

Leaving the details aside, only two lines of this function matter:

victim = _int_malloc (ar_ptr, bytes);
return victim;
In other words, __libc_malloc is just a wrapper; the function that actually performs the allocation is _int_malloc.

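_int_malloc() is a very large function, well over a hundred lines; interested readers can go through it themselves.

Its main idea is to choose a different allocation strategy depending on the size the caller requested. How each strategy works can wait; let's settle the main question first. The code is boundless, but the key point is the shore ^_^

Four lines in it matter here: one declaration, plus three statements that its allocation paths have in common: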

mchunkptr victim; 
void *p = chunk2mem (victim); 
alloc_perturb (p, bytes);
return p;

victim has type mchunkptr, which is a pointer type:

typedef struct malloc_chunk* mchunkptr;
struct malloc_chunk is defined as follows:

struct malloc_chunk {
	size_t prev_size;			/* Size of previous chunk (if free).  */
	size_t size;				/* Size in bytes, including overhead. */

	struct malloc_chunk *fd;	/* double links -- used only if free. */
	struct malloc_chunk *bk;

	/* Only used for large blocks: pointer to next larger size.  */
	struct malloc_chunk *fd_nextsize;	/* double links -- used only if free. */
	struct malloc_chunk *bk_nextsize;
};

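victim is the address of the chunk selected by the allocation algorithm, and the chunk header already holds the size of the allocation. alloc_perturb does not zero the memory: it calls memset only when a perturb byte has been configured (for example through mallopt(M_PERTURB, ...)), and does nothing by default, so memory returned by malloc is uninitialized. chunk2mem is a macro: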

#define chunk2mem(p)   ((void*)((char*)(p) + 2*SIZE_SZ))
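SIZE_SZ is itself a macro that ultimately expands to sizeof(size_t). Why add the width of two size_t values?
Look at the struct malloc_chunk definition above: the offset skips the prev_size and size members (see the comments in the structure for what they mean).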

In addition, this comment inside _int_malloc is also worth reading:

/*
     Convert request size to internal form by adding SIZE_SZ bytes
     overhead plus possibly more to obtain necessary alignment and/or
     to obtain a size of at least MINSIZE, the smallest allocatable
     size. Also, checked_request2size traps (returning 0) request sizes
     that are so large that they wrap around zero when padded and
     aligned.
  */

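Finally, _int_malloc returns the offset-adjusted p to its caller __libc_malloc, which then executes return victim; and that is the address the user gets from malloc.

So the size information sits just in front of the address malloc returns, in the size member of struct malloc_chunk.

Layout of the structure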

/*
   malloc_chunk details:

    (The following includes lightly edited explanations by Colin Plumb.)

    Chunks of memory are maintained using a `boundary tag' method as
    described in e.g., Knuth or Standish.  (See the paper by Paul
    Wilson ftp://ftp.cs.utexas.edu/pub/garbage/allocsrv.ps for a
    survey of such techniques.)  Sizes of free chunks are stored both
    in the front of each chunk and at the end.  This makes
    consolidating fragmented chunks into bigger chunks very fast.  The
    size fields also hold bits representing whether chunks are free or
    in use.

    An allocated chunk looks like this:


    chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	    |             Size of previous chunk, if allocated            | |
	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	    |             Size of chunk, in bytes                       |M|P|
      mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	    |             User data starts here...                          .
	    .                                                               .
	    .             (malloc_usable_size() bytes)                      .
	    .                                                               |
nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	    |             Size of chunk                                     |
	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


    Where "chunk" is the front of the chunk for the purpose of most of
    the malloc code, but "mem" is the pointer that is returned to the
    user.  "Nextchunk" is the beginning of the next contiguous chunk.

    Chunks always begin on even word boundaries, so the mem portion
    (which is returned to the user) is also on an even word boundary, and
    thus at least double-word aligned.

    Free chunks are stored in circular doubly-linked lists, and look like this:

    chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	    |             Size of previous chunk                            |
	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    `head:' |             Size of chunk, in bytes                         |P|
      mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	    |             Forward pointer to next chunk in list             |
	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	    |             Back pointer to previous chunk in list            |
	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	    |             Unused space (may be 0 bytes long)                .
	    .                                                               .
	    .                                                               |
nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    `foot:' |             Size of chunk, in bytes                           |
	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    The P (PREV_INUSE) bit, stored in the unused low-order bit of the
    chunk size (which is always a multiple of two words), is an in-use
    bit for the *previous* chunk.  If that bit is *clear*, then the
    word before the current chunk size contains the previous chunk
    size, and can be used to find the front of the previous chunk.
    The very first chunk allocated always has this bit set,
    preventing access to non-existent (or non-owned) memory. If
    prev_inuse is set for any given chunk, then you CANNOT determine
    the size of the previous chunk, and might even get a memory
    addressing fault when trying to do so.

    Note that the `foot' of the current chunk is actually represented
    as the prev_size of the NEXT chunk. This makes it easier to
    deal with alignments etc but can be very confusing when trying
    to extend or adapt this code.

    The two exceptions to all this are

     1. The special chunk `top' doesn't bother using the
	trailing size field since there is no next contiguous chunk
	that would have to index off it. After initialization, `top'
	is forced to always exist.  If it would become less than
	MINSIZE bytes long, it is replenished.

     2. Chunks allocated via mmap, which have the second-lowest-order
	bit M (IS_MMAPPED) set in their size fields.  Because they are
	allocated one-by-one, each must contain its own trailing size field.

*/

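The implementation of free

free is the reverse of malloc, so let's look at its outline. Common sense says free must somehow know how much memory to release, so the opening question gets its final answer here.

First, here is the implementation: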

void __libc_free(void *mem)
{
	mstate ar_ptr;
	mchunkptr p;
	void (*hook) (void *, const void *)
		= atomic_forced_read(__free_hook);
	if (__builtin_expect(hook != NULL, 0)) {
		(*hook) (mem, RETURN_ADDRESS(0));
		return;
	}
	if (mem == 0)
		return;
	p = ((mchunkptr) ((char *) (mem) - 2 * (sizeof(size_t))));
	if (chunk_is_mmapped(p)) {
		if (!mp_.no_dyn_threshold
			&& p->size > mp_.mmap_threshold
			&& p->size <= DEFAULT_MMAP_THRESHOLD_MAX) {
			mp_.mmap_threshold = ((p)->size & ~(SIZE_BITS));
			mp_.trim_threshold = 2 * mp_.mmap_threshold;
			LIBC_PROBE(memory_mallopt_free_dyn_thresholds, 2,
					   mp_.mmap_threshold, mp_.trim_threshold);
		}
		munmap_chunk(p);
		return;
	}
	ar_ptr = (((p)->size & 0x4) ?
			((heap_info *) ((unsigned long) (p) & ~(HEAP_MAX_SIZE - 1)))->ar_ptr : &main_arena);
	_int_free(ar_ptr, p, 0);
}
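Outline:

0. The hook mechanism for dynamic allocation: see the GNU documentation (search for __free_hook). It takes up a good chunk of the code but has nothing to do with the question, so skip it.
1. A null pointer (free(NULL)) returns immediately; this is familiar from the manual pages.
2. p = mem2chunk (mem); is the key line (the listing above shows it already expanded).
mem2chunk is a macro: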

#define mem2chunk(mem) ((mchunkptr)((char*)(mem) - 2*SIZE_SZ))
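Look familiar? It is the inverse of chunk2mem from the malloc section above: that one adds (+), this one subtracts (-).
3. Memory that was allocated through mmap is released by unmapping it. Not our concern here; skip it.

4. The last two statements.

4.1 The second-to-last statement is a macro that expands into further macros. Expanded, it looks like this: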

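ar_ptr = (((p)->size & 0x4) ?
             ((heap_info *) ((unsigned long) (p) & ~(HEAP_MAX_SIZE - 1)))->ar_ptr : &main_arena);
Roughly: following the layout of the structure, it tests bit 2 (counting from 0) of size, the NON_MAIN_ARENA flag. If the bit is set, the arena is read from the heap_info header found by rounding the chunk address down; otherwise the chunk belongs to main_arena. I did not dig deeper, since it does not affect the question. The expansion is shown here to save curious readers from unrolling the macros layer by layer.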

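4.2 The last statement, _int_free, does the actual release, so __libc_free is a wrapper too.
_int_free is not a small function either. Skipping everything unrelated to the question leaves a single statement, and it reveals the answer.
Near the top of the function there is this line: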

size = chunksize (p);

chunksize is a macro:

#define PREV_INUSE 0x1
#define IS_MMAPPED 0x2
#define NON_MAIN_ARENA 0x4
#define SIZE_BITS (PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)
#define chunksize(p)         ((p)->size & ~(SIZE_BITS))
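Together these five lines mask off the low 3 bits of the structure's size member, which yields the size of the chunk. A chunk is simply a block of memory, and its size is what malloc really allocated. See the chunk diagrams in the malloc_chunk details comment above.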


Test

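#include <stdio.h>
#include <stdlib.h>

/* Local copy of the glibc chunk header layout, used only to read the size
 * field. Peeking in front of a malloc'ed pointer relies on that layout and
 * is not portable to other C libraries. */
struct malloc_chunk {
	size_t prev_size;
	size_t size;
	struct malloc_chunk *fd;
	struct malloc_chunk *bk;
	struct malloc_chunk *fd_nextsize;
	struct malloc_chunk *bk_nextsize;
};

typedef struct malloc_chunk *mchunkptr;

/* Allocate request bytes and print the chunk size stored just before mem. */
static void show_chunk_size(size_t request)
{
	void *mem = malloc(request);
	mchunkptr p;

	if (mem == NULL)
		return;

	/* Same arithmetic as mem2chunk: step back over prev_size and size. */
	p = (mchunkptr) ((char *) mem - 2 * sizeof(size_t));
	/* Mask off the three flag bits; %zu is the format for size_t. */
	printf("malloc size : %zu; chunk size : %zu\n",
		   request, p->size & ~(size_t) 0x7);

	free(mem);
}

int main(void)
{
	int i;

	/* Small requests: 0 to 9 bytes. */
	for (i = 0; i < 10; ++i)
		show_chunk_size((size_t) i);

	/* Pseudo-random requests below 1024 bytes, reseeded each round. */
	for (i = 0; i < 10; ++i) {
		srand(i);
		show_chunk_size((size_t) (rand() % 1024));
	}
	return 0;
}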

Output

[Screenshot of the program's output in the original post; image not available.]


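If malloc and free are the performance bottleneck in your product, you can implement your own; reportedly many companies have done so.

Done!

Finally, a reminder once more that this is glibc. How malloc and free work on Windows is left for the interested reader.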



Original article: http://blog.csdn.net/cwcmcw/article/details/45100661
