堆和栈的速度

今天被问到一个问题,堆和栈的速度比较。

当时我并没有仔细了解过堆栈的速度比较,之前也没看过相关的博客技术,所以并不知道标准答案,但我还是猜想出来了。

堆和栈最后都是在物理内存DDRM上的,对于上层的应用来说,由于页表的机制,是看不到物理内存的,而且无论是堆还是栈,都是被打散分布在DDRAM上的,所以,从物理上来说,堆和栈应该没有差异。但是既然有堆和栈的速度比较,那肯定是有差异的,那么差异不来自于物理的DDRM,那就只可能来自于Cache,一定是栈的cache的命中率比较高,导致了这样一个速度的差异。

以上内容是面试的时候推导出来的,面试后我又查了相关的知识,发现确实有这样的原因。

  • On the stack, temporal allocation locality (allocations made close together in time) implies spatial locality (storage that is close together in space). In turn, when temporal allocation locality implies temporal access locality (objects allocated together are accessed together), the sequential stack storage tends to perform better with respect to CPU caches and operating system paging systems.

  • Memory density on the stack tends to be higher than on the heap because of the reference type overhead (discussed later in this chapter). Higher memory density often leads to better performance, e.g., because more objects fit in the CPU cache.

  • Thread stacks tend to be fairly small – the default maximum stack size on Windows is 1MB, and most threads tend to actually use only a few stack pages. On modern systems, the stacks of all application threads can fit into the CPU cache, making typical stack object access extremely fast. (Entire heaps, on the other hand, rarely fit into CPU caches.)
    《Pro .NET Performance》

总体上来说:

  1. 栈内存的分配具有空间上的局部性(分配时间的局部性),而我们知道cache设计的基础理论就是程序的局部性,因此这是利于cache的
  2. 栈的内存密度比堆更高,这是利于cache的
  3. 线程栈往往比较小,可以完全cache到缓冲里面
  4. 此外,申请堆内存需要malloc,需要经历一些内存空间的寻找,还可能会引发系统调用,而栈内存是自动分配的,在内存的分配速度上,栈就比堆快