Tags: gcc uda round har wap cache null col glibc
References
https://gcc.gnu.org/onlinedocs/gcc-4.3.2/gcc/Other-Builtins.html#Other-Builtins
https://en.wikipedia.org/wiki/Find_first_set#CTZ
| Tool/library | Name | Type | Input type(s) | Notes | Result for zero input |
|---|---|---|---|---|---|
| POSIX.1 compliant libc, 4.3BSD libc, OS X 10.3 libc[2][21] | ffs | Library function | int | Includes glibc. POSIX does not supply the complementary log base 2 / clz. | 0 |
| FreeBSD 5.3 libc, OS X 10.4 libc[22] | ffsl, fls, flsl | Library function | int, long | fls ("find last set") computes (log base 2) + 1. | 0 |
| FreeBSD 7.1 libc[23] | ffsll, flsll | Library function | long long | | 0 |
| GCC | __builtin_ffs[l,ll,imax] | Built-in functions | unsigned int, unsigned long, unsigned long long, uintmax_t | | 0 |
| GCC 3.4.0[24][25] | __builtin_clz[l,ll,imax], __builtin_ctz[l,ll,imax] | Built-in functions | unsigned int, unsigned long, unsigned long long, uintmax_t | | undefined |
| Visual Studio 2005 | _BitScanForward[28], _BitScanReverse[29] | Compiler intrinsics | unsigned long, unsigned __int64 | Separate return value to indicate zero input | 0 |
| Visual Studio 2008 | __lzcnt[30] | Compiler intrinsic | unsigned short, unsigned int, unsigned __int64 | Relies on x64-only lzcnt instruction | Input size in bits |
| Intel C++ Compiler | _bit_scan_forward, _bit_scan_reverse[31] | Compiler intrinsics | int | | undefined |
| NVIDIA CUDA[32] | __clz | Functions | 32-bit, 64-bit | Compiles to fewer instructions on the GeForce 400 Series | 32 |
| NVIDIA CUDA[32] | __ffs | Functions | 32-bit, 64-bit | | 0 |
| LLVM | llvm.ctlz.*, llvm.cttz.*[33] | Intrinsic | 8, 16, 32, 64, 256 | LLVM assembly language | Input size if arg 2 is 0, else undefined |
| GHC 7.10 (base 4.8), in Data.Bits | countLeadingZeros, countTrailingZeros | Library function | FiniteBits b => b | Haskell programming language | Input size in bits |
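
The "Result for zero input" column matters in practice: the GCC clz/ctz built-ins are undefined for zero, so callers usually guard that case themselves. The following is a minimal sketch of such guards, assuming GCC or a compatible compiler. The helper names (`ctz_safe`, `clz_safe`, `fls_sketch`) and the `UINT_BITS` macro are hypothetical and chosen only for illustration; only the `__builtin_*` calls come from GCC.

```c
#include <limits.h>   /* CHAR_BIT */

/* Width of unsigned int in bits. Assumes the type has no padding bits,
   which holds on common platforms. */
#define UINT_BITS ((int)(sizeof(unsigned int) * CHAR_BIT))

/* Count trailing zeros. Returns UINT_BITS for x == 0 instead of calling
   __builtin_ctz, whose result is undefined for a zero argument. */
static int ctz_safe(unsigned int x)
{
    return x ? __builtin_ctz(x) : UINT_BITS;
}

/* Count leading zeros, using the same convention for zero input. */
static int clz_safe(unsigned int x)
{
    return x ? __builtin_clz(x) : UINT_BITS;
}

/* 1-based index of the highest set bit, or 0 for x == 0.
   This mirrors the "(log base 2) + 1" description of fls in the table. */
static int fls_sketch(unsigned int x)
{
    return x ? UINT_BITS - __builtin_clz(x) : 0;
}
```

Returning the input width for zero matches the convention the table lists for `__lzcnt` and the GHC functions. `__builtin_ffs` needs no guard, since it is defined to return 0 for zero input.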
Branch prediction
You may use __builtin_expect, declared as long __builtin_expect (long exp, long c), to provide the compiler with branch prediction information. In general, you should prefer to use actual profile feedback for this (-fprofile-arcs), as programmers are notoriously bad at predicting how their programs actually perform. However, there are applications in which this data is hard to collect.

The return value is the value of exp, which should be an integral expression. The semantics of the built-in are that it is expected that exp == c. For example:

    if (__builtin_expect (x, 0))
      foo ();

would indicate that we do not expect to call foo, since we expect x to be zero. Since you are limited to integral expressions for exp, you should use constructions such as

    if (__builtin_expect (ptr != NULL, 1))
      error ();

when testing pointer or floating-point values.
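
In real code the built-in is often wrapped in a pair of macros so the intent reads clearly at the call site. The sketch below shows one common way to do that. `LIKELY`, `UNLIKELY`, and `sum_or_error` are hypothetical names used for illustration and are not provided by GCC itself.

```c
#include <stddef.h>  /* size_t, NULL */

/* Hypothetical convenience macros (not part of GCC).
   The !! collapses any non-zero value to 1, so the argument is an
   integral expression that can equal the expected constant exactly. */
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/* Sum an array, treating a NULL pointer as a rare error path. */
static long sum_or_error(const int *values, size_t n)
{
    if (UNLIKELY(values == NULL))
        return -1;              /* hinted as the uncommon branch */

    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += values[i];
    return total;
}
```

The hint only influences code layout and optimization. The function returns the same results whether or not the prediction turns out to be right.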
Prefetching
__builtin_prefetch, declared as void __builtin_prefetch (const void *addr, ...), is used to minimize cache-miss latency by moving data into a cache before it is accessed. You can insert calls to __builtin_prefetch into code for which you know addresses of data in memory that is likely to be accessed soon. If the target supports them, data prefetch instructions will be generated. If the prefetch is done early enough before the access then the data will be in the cache by the time it is accessed.

The value of addr is the address of the memory to prefetch. There are two optional arguments, rw and locality. The value of rw is a compile-time constant one or zero; one means that the prefetch is preparing for a write to the memory address and zero, the default, means that the prefetch is preparing for a read. The value locality must be a compile-time constant integer between zero and three. A value of zero means that the data has no temporal locality, so it need not be left in the cache after the access. A value of three means that the data has a high degree of temporal locality and should be left in all levels of cache possible. Values of one and two mean, respectively, a low or moderate degree of temporal locality. The default is three.

    for (i = 0; i < n; i++)
      {
        a[i] = a[i] + b[i];
        __builtin_prefetch (&a[i+j], 1, 1);
        __builtin_prefetch (&b[i+j], 0, 1);
        /* ... */
      }

Data prefetch does not generate faults if addr is invalid, but the address expression itself must be valid. For example, a prefetch of p->next will not fault if p->next is not a valid address, but evaluation will fault if p is not a valid address.

If the target does not support data prefetch, the address expression is evaluated if it includes side effects but no other code is generated and GCC does not issue a warning.
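
The p->next case can be made concrete with a linked-list walk. This is a minimal sketch under the assumption of a simple singly linked list. `struct node` and `sum_list` are hypothetical names introduced for illustration, and whether the prefetch actually helps depends on the target and on how much work is done per node.

```c
#include <stddef.h>  /* NULL */

/* Hypothetical node type, for illustration only. */
struct node {
    int value;
    struct node *next;
};

/* Sum a singly linked list, prefetching the next node while the
   current one is being processed. */
static long sum_list(const struct node *p)
{
    long total = 0;
    while (p != NULL) {
        /* p is known to be valid here, so evaluating p->next is safe.
           At the tail p->next is NULL; the prefetch itself does not fault. */
        __builtin_prefetch(p->next, 0, 1);  /* read access, low temporal locality */
        total += p->value;
        p = p->next;
    }
    return total;
}
```

The loop condition is what keeps the address expression valid: the prefetch argument is only evaluated after p has been checked against NULL.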
Original article: http://www.cnblogs.com/justinh/p/7754655.html