Why are there size limits for hardware transactional memory (HTM)?
Here are a few possible reasons: HTM implementations typically buffer transactional state in on-chip structures such as caches and store buffers, and the limited capacity and associativity of those structures bound the size of a transaction that can succeed. A further difficulty is that HTM cannot handle debugging printf() statements gracefully.
On the one hand, unbuffered debugging print statements abort the
transaction, while on the other hand, buffered debugging print
statements are of no help if the transaction aborts.
Furthermore, buffered I/O increases the size of the transaction, which increases the probability of abort due to transaction size limitations.
HTM implementations based on unbounded transactional memory (UTM) might eventually offer significantly larger transaction size limits, though some form of associativity limitation would likely still be in force. Use of high-associativity victim caches could help alleviate associativity limits.
Debugging support might be provided via emulators, but the low performance of typical emulators is likely to be a significant problem for a number of workloads. Alternatively, although software transactional memory (STM) could be used while debugging, there are subtle differences between HTM and STM that could prove problematic in some cases. For but one example, consider a program that uses both locking and transactions running on a lock-based STM implementation. Testing on STM could result in false-positive deadlocks involving the locks used by the STM implementation. These deadlocks would not occur while running on the HTM.
Adaptive tickless kernels might help as well by reducing the frequency of scheduling-clock interrupts. However, this reduction requires that there be only one runnable user thread on a given CPU at any given time, which will not be the case for all workloads. Therefore, although adaptive tickless kernels would greatly increase the probability of HTM transaction success on some workloads (for example, high performance computing (HPC)), they will not be helpful on others. This should be no surprise: To the best of my knowledge, adaptive tickless kernels were not designed with HTM in mind.
Although we should expect continued HTM innovation, transaction sizes are likely to remain limited. However, the question is whether the limits will grow to the point beyond which no one will care, and if so, when. In the meantime, it will continue to be very important to combine HTM with other synchronization mechanisms that are less subject to size limitations. Failing to do so will result in HTM techniques that work extremely well on toy problems, but that are subject to embarrassing failures when applied to large real-world applications.