Low-cost cache coherency for accelerators

Embodiments of the invention provide methods and systems for reducing the consumption of inter-node bandwidth by communications maintaining coherence between accelerators and CPUs. The CPUs and the accelerators may be clustered on separate nodes in a multiprocessing environment. Each node that conta...

Full description

Saved in:
Bibliographic Details
Main Authors Clark, Scott Douglas, Wottreng, Andrew Henry
Format Patent
LanguageEnglish
Published 24.01.2012
Online AccessGet full text

Cover

Loading…
More Information
Summary:Embodiments of the invention provide methods and systems for reducing the consumption of inter-node bandwidth by communications maintaining coherence between accelerators and CPUs. The CPUs and the accelerators may be clustered on separate nodes in a multiprocessing environment. Each node that contains a shared memory device may maintain a directory to track blocks of shared memory that may have been cached at other nodes. Therefore, commands and addresses may be transmitted to processors and accelerators at other nodes only if a memory location has been cached outside of a node. Additionally, because accelerators generally do not access the same data as CPUs, only initial read, write, and synchronization operations may be transmitted to other nodes. Intermediate accesses to data may be performed non-coherently. As a result, the inter-chip bandwidth consumed for maintaining coherence may be reduced.