

Is accessing data in the heap faster than from the stack?

On every architecture I've ever worked on, all the process "memory" can be expected to operate at the same set of speeds, based on which level of CPU cache / RAM / swap file is holding the current data, and on any hardware-level synchronisation delays that operations on that memory may trigger to make it visible to other processes, to incorporate changes made by other processes or CPU cores, etc. (With multi-CPU-socket motherboards using Non-Uniform Memory Access (NUMA), the time for one CPU to access memory that's "closer" to the other CPU tends to differ though, but that's a bit outside the scope of this question.)

The OS (which is responsible for page faulting / swapping), and the hardware (CPU) trapping on accesses to not-yet-accessed or swapped-out pages, would not even be tracking which pages are "global" vs "stack" vs "heap".

While the global vs stack vs heap usage to which memory is put is unknown to the OS and hardware, and all are backed by the same type of memory with the same performance characteristics, there are other subtle considerations (described in detail after this list):

allocation - time the program spends "allocating" and "deallocating" memory, including occasional sbrk (or similar) virtual address allocation as the heap usage grows.
access - differences in the CPU instructions used by the program to access globals vs stack vs heap, and extra indirection via a runtime pointer when using heap-based data.
layout - certain data structures ("containers" / "collections") are more cache-friendly (hence faster), while general purpose implementations of some require heap allocations and may be less cache-friendly.

For global data (including C++ namespace data members), the virtual address will typically be calculated and hardcoded at compile time (possibly in absolute terms, or as an offset from a segment register; occasionally it may need tweaking as the process is loaded by the OS).

For stack-based data, the stack-pointer-register-relative address can also be calculated and hardcoded at compile time. Then the stack-pointer-register may be adjusted by the total size of function arguments, local variables, return addresses and saved CPU registers as the function is entered and returns (i.e. at run time).
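To make the global vs stack vs heap distinction concrete, here is a minimal C++ sketch. The names `g_counter` and `sum_three_ways` are hypothetical, chosen only for illustration, and the comments describe what a typical unoptimised build does; an optimising compiler is free to keep values in registers or remove the heap allocation altogether.

```cpp
#include <cassert>
#include <memory>

// Global: its address is typically fixed by the time the process is loaded.
int g_counter = 0;

// Hypothetical helper that touches one int in each kind of storage.
int sum_three_ways() {
    // Stack: addressed at a fixed offset from the stack (or frame) pointer.
    int local = 1;

    // Heap: needs an allocator call at run time, and the int is reached
    // through a pointer, so reading it is pointer load + data load.
    std::unique_ptr<int> heap = std::make_unique<int>(2);

    g_counter = 3;

    int total = g_counter + local + *heap;
    assert(total == 6);  // all three reads see ordinary process memory
    return total;        // unique_ptr releases the heap int here
}
```

Calling `sum_three_ways()` returns 6. The point is not the result but the three different ways the operands are located: a fixed address, a stack-relative offset, and an extra indirection through a run-time pointer.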

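The layout point can be illustrated with two standard containers. This is a rough sketch, with `is_contiguous` a hypothetical helper written for this example: it checks whether each element sits immediately after its predecessor in memory, which is the property that tends to make sequential traversal cache-friendly.

```cpp
#include <cassert>
#include <list>
#include <numeric>
#include <vector>

// Hypothetical helper: true if every element is stored directly after
// the previous one (as in an array), false otherwise.
template <typename Container>
bool is_contiguous(const Container& c) {
    const typename Container::value_type* prev = nullptr;
    for (const auto& x : c) {
        if (prev != nullptr && &x != prev + 1) {
            return false;  // gap between consecutive elements
        }
        prev = &x;
    }
    return true;
}

// Same values, same traversal code, different memory layout.
inline int sum_vector(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0);
}

inline int sum_list(const std::list<int>& l) {
    return std::accumulate(l.begin(), l.end(), 0);
}
```

For `std::vector<int>{1, 2, 3}` the helper reports contiguous storage, while for `std::list<int>{1, 2, 3}` it does not, because each list node is a separate allocation that also holds link pointers. Both sums are 6; any speed difference comes from how the elements are laid out, not from the memory being "heap" as such.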