Microsoft SQL Server is non-uniform memory access (NUMA) aware, and performs well on NUMA hardware without special configuration. As clock speed and the number of processors increase, it becomes increasingly difficult to reduce the memory latency required to use this additional processing power. To circumvent this, hardware vendors provide large L3 caches, but this is only a limited solution. NUMA architecture provides a scalable solution to this problem. SQL Server has been designed to take advantage of NUMA-based computers without requiring any application changes.
The trend in hardware has been towards more than one system bus, each serving a small set of processors.
A NUMA node is a group of processors that has its own memory and possibly its own I/O channels. However, each CPU can access memory associated with the other groups in a coherent way.
The number of CPUs within a NUMA node depends on the hardware vendor. It is faster to access local memory than the memory associated with other NUMA nodes. This is the reason for the name, non-uniform memory access architecture.
Local memory is the memory that is on the same node as the CPU currently running the thread.
Foreign memory (also called remote memory) is memory that does not belong to the node on which the thread is currently running.
The NUMA ratio is the ratio of the cost of accessing foreign memory to the cost of accessing local memory.
A NUMA ratio of 1 corresponds to symmetric multiprocessing (SMP). The greater the ratio, the more it costs to access the memory of other nodes. Windows applications that are not NUMA aware (including SQL Server 2000 SP3 and earlier) sometimes perform poorly on NUMA hardware.
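The NUMA ratio defined above is a simple quotient, which the following minimal sketch works through. The function name `numa_ratio` and the latency figures are hypothetical, chosen only for illustration; they are not measurements from any particular hardware.

```python
def numa_ratio(foreign_cost_ns: float, local_cost_ns: float) -> float:
    """Return the cost of a foreign-memory access relative to a local one."""
    if local_cost_ns <= 0:
        raise ValueError("local access cost must be positive")
    return foreign_cost_ns / local_cost_ns


# Hypothetical latencies in nanoseconds (assumed values, not measurements).
smp_like = numa_ratio(100.0, 100.0)   # uniform access cost: ratio is 1.0
numa_like = numa_ratio(150.0, 100.0)  # foreign access costs 1.5x local

print(smp_like, numa_like)  # prints "1.0 1.5"
```

A ratio of 1.0 means every CPU pays the same price for every memory access, which is the SMP case; anything above 1.0 is the penalty a thread pays when its data lives on another node.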
The main benefit of NUMA is scalability. The NUMA architecture was designed to surpass the scalability limits of the SMP architecture. With SMP, all memory access is posted to the same shared memory bus. This works well for a relatively small number of CPUs, but not when dozens or even hundreds of CPUs compete for access to the shared memory bus. NUMA alleviates these bottlenecks by limiting the number of CPUs on any one memory bus and connecting the nodes by means of a high-speed interconnect.
Computers with hardware NUMA have more than one system bus, each serving a small set of processors. Your hardware manufacturer can tell you whether your computer supports hardware NUMA.
If you have hardware NUMA, it may be configured to use interleaved memory instead of NUMA. In that case, Windows and therefore SQL Server will not recognize it as NUMA. Run the following query to find the number of memory nodes available to SQL Server:
SELECT DISTINCT memory_node_id FROM sys.dm_os_memory_clerks
If SQL Server returns only a single memory node (node 0), either you do not have hardware NUMA, or the hardware is configured as interleaved (non-NUMA). If you think your hardware NUMA is configured incorrectly, contact your hardware vendor to enable NUMA. SQL Server ignores NUMA configuration when hardware NUMA has four or fewer CPUs and at least one node has only one CPU.
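The two checks described above can be sketched as small predicates. This is an illustration under stated assumptions, not SQL Server's actual logic: the function names are hypothetical, "four or fewer CPUs" is read as the total across all nodes, and the first helper only counts distinct node ids without accounting for any special-purpose nodes a given SQL Server version might report.

```python
def reports_multiple_memory_nodes(memory_node_ids: list[int]) -> bool:
    """Interpret the result of the memory_node_id query: more than one
    distinct id suggests that NUMA is visible to SQL Server."""
    return len(set(memory_node_ids)) > 1


def numa_config_ignored(cpus_per_node: list[int]) -> bool:
    """Apply the rule from the text, assuming 'four or fewer CPUs' means
    the total across all nodes: NUMA is ignored when that total is at
    most four and at least one node has exactly one CPU."""
    total_cpus = sum(cpus_per_node)
    return total_cpus <= 4 and any(count == 1 for count in cpus_per_node)


# Hypothetical query results and node layouts.
print(reports_multiple_memory_nodes([0]))           # False: single node 0
print(reports_multiple_memory_nodes([0, 1, 1, 0]))  # True: nodes 0 and 1
print(numa_config_ignored([3, 1]))                  # True: 4 CPUs, one node has 1
print(numa_config_ignored([2, 2]))                  # False: no single-CPU node
print(numa_config_ignored([4, 4]))                  # False: more than 4 CPUs
```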
- SQL Server allows you to group CPUs into nodes referred to as soft-NUMA.
- You usually configure soft-NUMA when you have many CPUs and do not have hardware NUMA, but you can also use soft-NUMA to subdivide hardware NUMA nodes into smaller groups.
- Only the SQL Server scheduler and SQL Server Network Interface (SNI) are soft-NUMA aware.
- Memory nodes are created based on hardware NUMA and are therefore not affected by soft-NUMA. For example, if you have an SMP computer with eight CPUs and you create four soft-NUMA nodes with two CPUs each, you will still have only one memory node serving all four soft-NUMA nodes. Soft-NUMA does not provide memory-to-CPU affinity.
- The benefits of soft-NUMA include reducing I/O and lazy writer bottlenecks on computers with many CPUs and no hardware NUMA.
- There is a single I/O thread and a single lazy writer thread for each NUMA node.
- Depending on the usage of the database, these single threads may be a significant performance bottleneck. Configuring four soft-NUMA nodes provides four I/O threads and four lazy writer threads, which could increase performance.
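The eight-CPU example from the list above can be worked through in a short sketch. Representing each soft-NUMA node as a CPU bitmask is an assumption made here for illustration, and `soft_numa_masks` is a hypothetical helper; the text does not describe how soft-NUMA nodes are actually configured, so consult the SQL Server documentation for the real procedure.

```python
def soft_numa_masks(cpu_count: int, cpus_per_node: int) -> list[int]:
    """Split cpu_count CPUs into equal groups and return one bitmask per
    group, where bit i set means CPU i belongs to that group."""
    if cpus_per_node <= 0 or cpu_count % cpus_per_node != 0:
        raise ValueError("cpu_count must be a positive multiple of cpus_per_node")
    node_mask = (1 << cpus_per_node) - 1  # e.g. 0b11 for two CPUs
    return [node_mask << start for start in range(0, cpu_count, cpus_per_node)]


# Eight CPUs on an SMP machine, grouped into four soft-NUMA nodes of two.
masks = soft_numa_masks(cpu_count=8, cpus_per_node=2)
print([hex(m) for m in masks])  # prints ['0x3', '0xc', '0x30', '0xc0']

# Per the text: one I/O thread and one lazy writer thread per NUMA node,
# but memory nodes follow hardware NUMA, so an SMP machine still has one.
io_threads = len(masks)
lazy_writer_threads = len(masks)
memory_nodes = 1
print(io_threads, lazy_writer_threads, memory_nodes)  # prints "4 4 1"
```

The point of the grouping is the thread count, not memory locality: four nodes yield four I/O threads and four lazy writers, while all four nodes continue to share the single memory node.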