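Simon Cooke, USA (original author)
Chen Gang (陈罡), Beijing Institute of Technology 20981 (translator)
Translator's preface:
The circular buffer is a very widely used storage structure, common in applications that store or transmit continuous, streaming data. The traditional approach is to set aside one contiguous block of memory and keep writing into it; when the write position reaches the end of the block, writing resumes from the start, and repeating this indefinitely is what makes the buffer circular. I have written many circular buffers and read many written by others, but Simon Cooke's article (original title: The Bip Buffer - The Circular Buffer with a Twist) struck me as genuinely different, so I wanted to introduce it to developers in China. "Twist" here literally means "to wind or braid together" and carries a sense of close coupling; the author uses the word to convey what is distinctive about this buffer, but a literal translation would puzzle many readers. Based on my own understanding, I have therefore rendered the title as the "two-stage" (两段式) circular buffer. Below I give the English original together with my own reading so that interested readers can compare them. If I have mistranslated anything, corrections are very welcome!
-- The translator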
1. Introduction
The Bip-Buffer is like a circular buffer, but slightly different. Instead of keeping one head and tail pointer to the data in the buffer, it maintains two revolving regions, allowing for fast data access without having to worry about wrapping at the end of the buffer. Buffer allocations are always maintained as contiguous blocks, allowing the buffer to be used in a highly efficient manner with API calls, and also reducing the amount of copying which needs to be performed to put data into the buffer. Finally, a two-phase allocation system allows the user to pessimistically reserve an area of buffer space, and then trim back the buffer to commit to only the space which was used.
Let's cover a little history first. If you don't already know why a circular buffer can be implemented really efficiently in hardware, or why that makes them the buffer of choice in most electronics, here's why.
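The Bip-Buffer is used much like a circular buffer, but its structure is slightly different. Internally it uses two revolving regions (rather than maintaining a head pointer and a tail pointer) to give fast data access, and its users never have to worry about a write reaching the end of the buffer and having to restart from the beginning. The storage the Bip-Buffer hands out is always contiguous, so it can be used very efficiently with API calls, and copying with functions such as memcpy() and memmove() can be kept to a minimum (for an ordinary circular buffer, frequent calls to such memory functions often become the bottleneck). Finally, the Bip-Buffer's two-phase allocation lets the user reserve a generously sized block, then use Commit to confirm how much was actually needed and give back the unused remainder.
Once upon a time, computers were much simpler. They didn't have 64 bit data buses. Heck, they didn't even have real 16 bit registers - although you could occasionally convince a pair of them to sub in for that purpose. These were simpler times, where Real Men programmed in assembly language, and laughed at anyone who didn't know how to use the carry flag for all kinds of nefarious purposes.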
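Long ago, computers were much simpler. They had no 64-bit buses and not even real 16-bit registers, although you could sometimes get a pair of registers to stand in for one. (Translator: this sentence is a mouthful; the author seems to be bantering with the reader, and I am not sure I have understood it. My reading was that the author is hinting that computers have no real subtraction, since subtraction is just addition in two's complement, yet because assembly instruction sets include a SUB instruction, some novice developers may take it for granted that the registers are subtracting.) In that "stone age", the masters wrote their programs in assembly language (translator: stone knives and stone axes?) and laughed at developers who did not know how to program with the carry flag.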
With simpler times came elegant hacks to eke the most power out of every
instruction cycle available. Take, for example, a simple terminal
communications program. Newer RS232 serial controllers had things like
automatic handling of RTS and CTS signal lines
to control the flow of data - but this came at a cost. Namely, the
connection would be stopping and starting all the time, instead of
streaming along. So in between the controller card and the system, would
often be found a FIFO. This simple circular buffer
was often no more than a couple of bytes long, but it meant that the
system could run smoothly along without polling to see if data had
arrived, or being hammered by constant interrupts from the serial
controller.
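In those days, first-rate hackers did everything they could to squeeze the most out of every instruction cycle. Take a simple terminal communications program as an example (translator: terminal communications here means RS232 serial communication; in the era the author describes, there was probably nothing like the Internet yet). Newer RS232 serial controllers could control the flow of data by handling the RTS and CTS signal lines automatically (but this wasted a certain amount of bandwidth). As the names RTS and CTS suggest, the serial connection had to keep stopping and starting rather than carrying data continuously like a stream. So between the controller card and the system you would usually find something called a FIFO (translator: think of the first-in, first-out queue from data structures). This was perhaps the earliest and simplest form of circular buffer. It was usually only a few bytes to a few dozen bytes long, but it meant the whole system could run smoothly, without checking in real time (translator: that is, polling) whether new data had arrived, and without the application being constantly interrupted by hardware interrupts from the serial controller (translator: these interrupts are much like the ones taught in microcomputer principles courses).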
Most FIFOs started out on-chip, but people also added their own in their code - the idea being that if you had some really gnarly dancing that you had to do on the incoming data, you may as well batch it all up into one lump and do it infrequently... giving spare time to the system to do other things. Like scroll the console, or decode GIFs.
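Most FIFOs were implemented on-chip, but developers also brought the idea into their own code, especially where the incoming data needed a lot of awkward processing: the program receives data several times, then reads it all out and processes it in one go, and for that many people turned to a circular buffer. With its help, the computer can do other things in the meantime, such as scrolling the console output or decoding GIF images.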
As I said, a FIFO is a very simple circular buffer. Most are implemented very simply as well; they're typically 2^n bytes in size, which allows the pointers to simply overflow to get back around to the other end of the buffer. The FIFO logic can tell if the FIFO is empty because the head and tail values are the same, and it's full if the head is one greater than the tail.
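As I mentioned, a FIFO is a very simple circular buffer, and most are implemented very simply as well. Their length is generally 2 to the power n, which lets the pointers simply overflow to wrap back to the start of the buffer. The FIFO logic can easily tell from the head and tail pointers whether the buffer is empty or full: equal values mean the buffer is empty, and a head one greater than the tail means it is full.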
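The power-of-two trick and the empty/full test described above can be sketched in a few lines of C++. This is only an illustration, not code from the original article: the class name ByteFifo and its members are hypothetical, and it assumes that "head" is the read position and "tail" is the write position, which is the convention under which "full when the head is one greater than the tail" holds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical byte FIFO. N must be a power of two so that
// "index & (N - 1)" wraps around without a branch or a division.
template <std::size_t N>
class ByteFifo {
    static_assert(N >= 2 && (N & (N - 1)) == 0, "N must be a power of two");
    std::uint8_t data_[N];
    std::size_t head_ = 0;  // assumed convention: next slot to read
    std::size_t tail_ = 0;  // assumed convention: next slot to write
public:
    // Empty: head and tail are the same.
    bool empty() const { return head_ == tail_; }
    // Full: head is one greater than tail (modulo N). One slot stays
    // unused so that "full" and "empty" remain distinguishable.
    bool full() const { return ((tail_ + 1) & (N - 1)) == head_; }

    bool push(std::uint8_t b) {
        if (full()) return false;
        data_[tail_] = b;
        tail_ = (tail_ + 1) & (N - 1);  // wrap by masking
        return true;
    }
    bool pop(std::uint8_t& out) {
        if (empty()) return false;
        out = data_[head_];
        head_ = (head_ + 1) & (N - 1);  // wrap by masking
        return true;
    }
};
```

With this layout a FIFO declared as ByteFifo<4> holds at most three bytes, because the fourth slot is the one sacrificed to tell full from empty.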
Implementing these in software was easy on the old 8 bit systems. Take a 16 bit register pair. Decide on a location in memory (a multiple of 256) to store the FIFO data in. Then, after setting the register to the start of the buffer, don't touch the high register - just increment the low register. This gives you a 256-byte long buffer which you can walk through in one (in the case of the Zilog Z80, 4 cycles - the smallest execution unit available on that system) instruction. You can never go out of the bounds of your buffer, because the low register acts as an index with a value from 0 to 255. When you hit what would have been index 256, the register overflows and clocks back over to zero.
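Implementing such a FIFO on the old 8-bit systems was very easy. Take two 8-bit registers to form a 16-bit register pair, set aside a stretch of memory (at an address that is a multiple of 256) to hold the FIFO data, and point the register pair at the start of that memory. (Translator: an 8-bit system generally has a 16-bit address space, so the author means using two 8-bit registers to hold a 16-bit address. The multiple of 256 is an alignment requirement: with it, only the high 8 bits of the pair carry a value and the low 8 bits start from 0.) Take care not to touch the high register; just let it keep its part of the address. Then work on the low register alone to write FIFO data to the address the pair points to, treating the low register as an index from 0 to 255. Each time a byte is written, increment the low register; once it passes 255 it overflows and starts again from 0, so the hardware adjusts the circular buffer pointer automatically. This gives you a 256-byte circular buffer that you can step through with a single instruction (on the Zilog Z80 that takes 4 cycles, the smallest unit of execution available on that system). Because the pointer is wrapped by hardware overflow, there is no need to worry about running past the buffer and writing into other memory. (Translator: this is probably the most efficient and safest circular buffer that can be implemented in software.)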
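The same "let the low register overflow" idea can be imitated in C++ with an 8-bit unsigned index, whose arithmetic is defined to wrap from 255 back to 0. The sketch below is an analogy rather than a reproduction of the Z80 technique: Ring256 and its members are made-up names, and it indexes into an array instead of incrementing half of an address, so no 256-byte alignment is needed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 256-byte ring. The 8-bit index plays the role of the
// low register: it can only ever hold 0..255, so it cannot leave the buffer.
struct Ring256 {
    std::uint8_t data[256];
    std::uint8_t write_ix = 0;

    void put(std::uint8_t b) {
        data[write_ix] = b;
        ++write_ix;  // unsigned 8-bit value wraps 255 -> 0, no bounds check
    }
};
```

As in the register-pair version, there is no comparison against the end of the buffer anywhere; the width of the index type does the wrapping.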
3. The Modern Day (the "Empire" era)
Unfortunately, there is no solution quite as elegant available to Windows programmers today as that simple old 8-bit solution. Sure, you can dive down into assembly language (provided you can work out how the compiler maps registers to values... something I've never seen a good enough explanation of to get my head around), but most people don't have time for assembly language any more. And besides, we're dealing with 32 bit registers now - incrementing just one low-order byte from inside that register isn't really all that kosher any more. It can lead to cache flushing, pipeline stalling, printer fires, rains of frogs, etc.
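Unfortunately, today's Windows programmers can no longer find a circular buffer solution as efficient as that old 8-bit FIFO. Of course, you can dig down into assembly language (provided you can work out how the compiler maps registers to the values in your program, something I have never seen explained well enough to understand), but most people no longer have time for assembly. Besides, we now work with 32-bit registers, and incrementing just the low-order byte of a register so that hardware overflow wraps you around the buffer is no longer really practical. It can lead to cache flushes, pipeline stalls, printer fires, rains of frogs and so on. (Translator: I do not know what the last two items mean; I would welcome a hint from anyone who does.)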
If you can't just clock the low-order register to walk through the buffer, you have to start worrying about things like checking to see how much buffer you have filled before the end, making sure that you remember to copy the rest of the data from the start of the buffer, and all kinds of other bookkeeping headaches.
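If you cannot wrap around the buffer simply by incrementing the low-order register, you have to deal with headaches such as tracking how much data has already been written to the buffer and making sure that, when a write reaches the end of the buffer, the remaining data goes in from the start.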
My first attempt at implementing something like this relied on the vague hope that the virtual memory system could be tricked into setting things up in such a way that you could set up a mirror of a section of memory right next to the original. The idea being that you could still use the rotating allocation of data; a copy operation could go at full speed without any checking to see if you'd walked off the end of the buffer - because as far as your process's address space is concerned, the end of your buffer is also the beginning of your buffer.
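My first idea for implementing an efficient FIFO circular buffer on a modern operating system rested on a vague hope: that the virtual memory system could be tricked into placing a mirror of the buffer directly after it, so that any write running past the end of the buffer would automatically land at its start. The storage would still behave as a circular region, and operations such as memory copies would not need to check whether the pointer had reached the end of the memory, because in the process's address space the byte after the end of the buffer is exactly the first byte of the buffer. (Translator: this is an interesting idea, but I suspect modern operating systems are not open enough to allow it; on Linux it could probably be done by modifying some kernel code.)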
Now, this mirroring technique may actually work. Due to some restrictions, I decided not to implement it myself (yet - I'm sure I'll find a use for it some day). The idea behind it is that first one reserves two areas of virtual memory, side by side. One then maps the same temporary file into both virtual memory sections. Voila! Instant mirroring, and a nice large buffered expanse one can copy data from willy-nilly.
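This mirroring technique may actually work, but because of some restrictions I decided not to implement it myself (not yet, anyway; I am sure I will find a use for it some day). The idea is that the program first reserves two areas of virtual memory side by side and then maps the same temporary file into both. The result is instant mirroring (translator: perhaps a name the author coined for the idea in a moment of excitement...), which lets the user write data to and read data from the buffer without restriction.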
Unfortunately, while it should (again, I've not tried it) indeed work, there is another problem - namely, that files can only be mapped on 64kb boundaries (possibly larger on larger memory systems). This means that your buffer has to be a minimum of 64kb in size, and will take up 128kb of your virtual address space. Depending on your application, this may be a valid technique. However, I don't see writing a server application with 1000's of sockets being a valid prospect here.
So what to do? If mirroring won't work, how close can we get to using a circular buffer in our code? Heck, even if we can get close, why would we want to?
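Unfortunately, although it should indeed work (again, I have not tried it), there is another problem: files can only be mapped on 64 KB boundaries (possibly larger on systems with more memory). This means your buffer must be at least 64 KB in size and will occupy 128 KB of virtual address space. Depending on the scale of your application, this may be a workable technique, but I do not see it being a realistic option for a server application handling thousands of sockets. (Translator: the author is resigned here; the idea is good, but real server development needs reliability, stability and efficiency, and nobody wants to gamble their schedule on testing a technique that is still only an idea :P)
So what should we do? If mirroring will not work, how close can we get to a circular buffer in our code? And even if we can get close, why would we want to?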
The biggest difference from an implementation standpoint between a regular circular buffer and the Bip Buffer is the fact that it only returns contiguous blocks. With a circular buffer, you need to worry about wrapping at the end of the buffer area - which is why for example if you look at Larry Antram's Fast Ring Buffer implementation, you'll see that you pass data into the buffer as a pointer and a length, the data from which is then copied byte by byte into the buffer to take into account the wrapping at the edges.
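Compared with a regular circular buffer, the biggest difference in the Bip-Buffer is that it returns only contiguous blocks. With a regular circular buffer you have to think about wrapping at the end of the buffer. That is why, if you look at Larry Antram's Fast Ring Buffer implementation, you pass data into the buffer as a pointer and a length, and the data is then copied byte by byte into the ring buffer to account for wrapping at the edges.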
Another possibility which was brought up in the bulletin board (and the person who brought it up shall remain nameless, if just because they... erm...are nameless) was that of just splitting the calls across wraps. Well, this is one way of working around the wrapping problem, but it has the unfortunate side-effect that as your buffer fills, the amount of free space which you pass out to any calls always decreases to 1 byte at the minimum - even if you've got another 128kb of free space at the beginning of your buffer, at the end of it you're still going to have to deal with ever shrinking block sizes. The Bip-Buffer neatly sidesteps this issue by just leaving that space alone if the amount you request is larger than the remaining space at the end of the buffer. When writing networking code, this is very useful; you always want to try to receive as much data as possible, but you never can guarantee how much you're going to get. (For best results, I'd recommend allocating a buffer which is some multiple of your MTU size).
Yes, you are going to lose some of what would have been free space at the end of the buffer. It's a small price to pay for playing nicely with the API.
Use of this buffer does require that one checks twice to see if the buffer has been emptied; as one has to deal with the possibility that there are two regions currently in use. However, the flexibility and performance gains outweigh this minor inconvenience.
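To make the "ever shrinking block sizes" point concrete, the two policies can be compared as plain arithmetic. The functions below are hypothetical helpers written for this comparison, not part of the BipBuffer class. Both assume the stored data currently occupies the unwrapped range [readIx, writeIx) with writeIx less than buflen. The Bip-style rule shown here (use the tail space only if it is at least as large as the space at the front) is one plausible reading of the behaviour described above and is an assumption.

```cpp
#include <cassert>
#include <cstddef>

// Largest block a split-at-the-wrap ring buffer can hand out in one call.
// Precondition (assumed): readIx <= writeIx < buflen.
// One slot is kept unused so that full and empty stay distinguishable.
inline std::size_t ringContiguousFree(std::size_t buflen,
                                      std::size_t readIx,
                                      std::size_t writeIx) {
    std::size_t toEnd = buflen - writeIx;      // bytes left before the wrap
    return readIx == 0 ? toEnd - 1 : toEnd;
}

// Bip-style choice under the same state, with no second region yet:
// skip the tail and start again at the front if the front has more room.
inline std::size_t bipContiguousFree(std::size_t buflen,
                                     std::size_t readIx,
                                     std::size_t writeIx) {
    std::size_t after = buflen - writeIx;      // free space after the data
    std::size_t before = readIx;               // free space before the data
    return after >= before ? after : before;
}
```

For a 1024-byte buffer with data in [1000, 1023), the split-call ring can offer only 1 byte, while the Bip-style rule abandons that last byte and offers the 1000 bytes at the front instead.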
class BipBuffer
{
private:
BYTE* pBuffer;
int ixa, sza, ixb, szb, buflen, ixResrv, szResrv;
public:
BipBuffer();
The constructor initializes the internal variables for tracking regions, and memory pointers to null; it does not allocate any memory for the buffer, in case one needs to use the class in an environment where exception handling cannot be used.
~BipBuffer();
The destructor simply frees any memory which has been allocated to the buffer.
bool AllocateBuffer(int buffersize = 4096);
AllocateBuffer allocates a buffer from virtual memory. The size of the buffer is rounded up to the nearest full page size. The function returns true if successful, or false if the buffer cannot be allocated.
void FreeBuffer();
FreeBuffer frees any memory allocated to the buffer by the call to AllocateBuffer, and releases any regions allocated within the Bip-Buffer.
bool IsInitialized() const;
IsInitialized returns true if the buffer has had memory allocated to it (by calling AllocateBuffer), or false if there is no memory allocated to the buffer.
int GetBufferSize() const;
GetBufferSize returns the total size (in bytes) of the buffer. This may be greater than the value passed into AllocateBuffer, if that value was not a multiple of the system's page size.
void Clear();
Clear ... well... clears the buffer. It does not free any memory allocated to the buffer; it merely resets the region pointers back to null, making the full buffer usable for new data again.
BYTE* Reserve(int size, OUT int& reserved);
Now to the nitty-gritty. Allocating data in the Bip-Buffer is a two-phase operation. First an area is reserved by calling the Reserve function; then, that area is Committed by calling the Commit function. This allows one to, say, reserve memory for an IO call, and when that IO call fails, pretend it never happened. Or alternatively, in a call to an overlapped WSARecv() function, it allows one to advertise how much memory is available to the network stack to use for incoming data, and then adjust the amount of space used based on how much data was actually read in (which may be less than the requested amount).
To use Reserve, pass in the size of block requested. The function will return the size of the largest free block available which is less than or equal to size in length in the reserved parameter you passed in. It will also return a BYTE* pointer to the area of the buffer which you have reserved.
In the case where the buffer has no space available, Reserve will return a NULL pointer, and reserved will be set to zero.
Note: you cannot nest calls to Reserve and Commit; after calling Reserve you must call Commit before calling Reserve again.
void Commit(int size);
Here's the other half of the allocation. Commit takes a size parameter, which is the number of bytes (starting at the BYTE* you were passed back from Reserve) which you have actually used and want to keep in the buffer. If you pass in zero for this size, the reservation will be completely released, as if you had never reserved any space at all. Alternatively, in a debug build, if one passes in a value greater than the original reservation, an assert will fire. (In a release build, the original reservation size will be used, and no one will be any the wiser). Committing data to the buffer makes it available for routines which take data back out of the buffer.
The diagram above shows how Reserve and Commit work. When you call Reserve, it will return a pointer to the beginning of the gray area above (fig. 1). Say you then only use as much of that buffer as the blue section (fig. 2). It'd be a shame to leave this area allocated and going to waste, so you can call Commit with only as much data as you used, which gives you fig. 3 - namely, the committed space extends to fill just the part you needed, leaving the rest free.
int GetReservationSize() const;
If at any time you need to find out if you have a pending reservation, or need to find out that reservation's size, you can call GetReservationSize to find the amount reserved. No reservation? You'll get a zero back.
BYTE* GetContiguousBlock(OUT int& size);
Well, after all this work to put stuff into the buffer, we'd better have a way of getting it out again.
First of all, what if you need to work out how much data (total) is available to be read from the buffer?
int GetCommittedSize() const;
One method is to call GetCommittedSize, which will return the total length of data in the buffer - that's the total size of both regions combined. I would not recommend relying on this number, because it's very easy to forget that you have two regions in the Bip-Buffer if you do. And that would be a bad thing (as several weeks of painful debugging experience has proved to me). As an alternative, you can call:
BYTE* GetContiguousBlock(OUT int& size);
... which will return a BYTE* pointer to the first (as in FIFO, not left-most) contiguous region of committed data in the buffer. The size parameter is also updated with the length of the block. If no data is available, the function returns NULL (and the size parameter is set to zero).
In order to fully empty the buffer, you may wish to loop around, calling GetContiguousBlock until it returns NULL. If you're feeling miserly, you can call it only twice. However, I'd recommend the former; it means you can forget that there's two regions, and just remember that there's more than one.
void DecommitBlock(int size);
So what do you do after you've consumed data from the buffer? Well, in keeping with the spirit of the aforementioned Reserve and Commit calls, you then call DecommitBlock to release data from it. Data is released in FIFO order, from the first contiguous block only - so if you're going to call DecommitBlock, you should do it pretty shortly after calling GetContiguousBlock. If you pass in a size of greater than the length of the contiguous block, then the entire block is released - but none of the other block (if present) is released at all. This is a deliberate design choice to remind you that there is more than one block and you should act accordingly. (If you really need to be able to discard data from blocks you've not read yet, it's not too difficult to copy the DecommitBlock function and modify it so that it operates on both blocks; just unwrap the if statement, and adjust the size parameter after the first clause. Implementation of this is left as the dreaded youknowwhat).
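The member functions described above can be tied together in a compact sketch. This is not the author's BipBuffer implementation: MiniBip is a hypothetical, simplified class that uses std::vector instead of virtual memory, std::size_t instead of int, and always starts its second region at offset 0. The rule for choosing where a reservation lands is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// MiniBip: hypothetical, simplified illustration of the two-region scheme.
class MiniBip {
    std::vector<std::uint8_t> buf_;
    std::size_t ixa_ = 0, sza_ = 0;  // region A: oldest committed data
    std::size_t szb_ = 0;            // region B: starts at offset 0 (assumed)
    std::size_t ixr_ = 0, szr_ = 0;  // pending reservation
public:
    explicit MiniBip(std::size_t n) : buf_(n) {}

    // Phase 1: hand out up to `want` contiguous bytes; `got` is the size granted.
    std::uint8_t* Reserve(std::size_t want, std::size_t& got) {
        std::size_t start, room;
        if (szb_ > 0) {                        // B exists: grow B toward A
            start = szb_;
            room = ixa_ - szb_;
        } else {
            std::size_t after = buf_.size() - (ixa_ + sza_);
            if (after >= ixa_) { start = ixa_ + sza_; room = after; }
            else               { start = 0;           room = ixa_;  }
        }
        got = want < room ? want : room;
        if (got == 0) return nullptr;          // no space available
        ixr_ = start;
        szr_ = got;
        return buf_.data() + start;
    }

    // Phase 2: keep only the first `used` bytes of the reservation.
    void Commit(std::size_t used) {
        if (used > szr_) used = szr_;          // clamp to the reservation
        if (used > 0) {
            if (sza_ == 0 && szb_ == 0) { ixa_ = ixr_; sza_ = used; }
            else if (ixr_ == ixa_ + sza_) sza_ += used;   // extends A
            else szb_ += used;                            // extends B
        }
        ixr_ = 0;
        szr_ = 0;
    }

    // First block in FIFO order, or nullptr (and size 0) if empty.
    std::uint8_t* GetContiguousBlock(std::size_t& size) {
        size = sza_;
        return sza_ ? buf_.data() + ixa_ : nullptr;
    }

    // Release data from the first block only; B is never touched directly.
    void DecommitBlock(std::size_t size) {
        if (size >= sza_) {                    // A is used up: B becomes A
            ixa_ = 0;
            sza_ = szb_;
            szb_ = 0;
        } else {
            ixa_ += size;
            sza_ -= size;
        }
    }
};
```

A reader drains such a buffer by calling GetContiguousBlock and DecommitBlock in a loop until nullptr comes back. When the data has wrapped, the loop runs twice: once for region A and once for what used to be region B.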
And that's the Bip-Buffer implementation done. A short example of how to use it is provided below.
#include <stdio.h>
#include "BipBuffer.h"

BipBuffer buffer;
SOCKET s;
bool read_EOF;
FILE* outputfile;   // assumed to be opened elsewhere before Bar() runs

bool StartUp()
{
    // Allocate a buffer 8192 bytes in length
    if (!buffer.AllocateBuffer(8192)) return false;
    read_EOF = false;
    s = socket(...
    ... do something else ...
}

void Foo()
{
    _ASSERTE(buffer.IsInitialized());
    // Reserve as much space as possible in the buffer:
    int space;
    BYTE* pData = buffer.Reserve(buffer.GetBufferSize(), space);
    // We now have *space* amount of room to play with.
    if (pData == NULL) return;
    // Obviously we've not emptied the buffer recently
    // because there isn't any room in it if we return.

    // Let's use the buffer!
    int recvcount = recv(s, (char*)pData, space, 0);
    if (recvcount == SOCKET_ERROR)
    {
        // Release the reservation (Commit(0)) so Reserve can be called again.
        buffer.Commit(0);
        return;
    }
    // heh... that's some kind of error handling...

    // We now have data in the buffer (or, if the
    // connection was gracefully closed, we don't have any)
    buffer.Commit(recvcount);
    if (recvcount == 0) read_EOF = true;
}

void Bar()
{
    _ASSERTE(buffer.IsInitialized());
    // Let's empty the buffer.
    int allocated;
    BYTE* pData;
    // Parenthesize the assignment so pData gets the pointer, not the comparison result.
    while ((pData = buffer.GetContiguousBlock(allocated)) != NULL)
    {
        // Let's do something with the data.
        fwrite(pData, allocated, 1, outputfile);
        // (again, lousy error handling)
        buffer.DecommitBlock(allocated);
    }
}
Original post: http://www.cnblogs.com/UnGeek/p/5800913.html