To Cross, or Not to Cross Pages for Prefetching?

Although processor vendors report that cache prefetchers operating on virtual addresses are permitted to cross page boundaries, academic work has focused on optimizing cache prefetching for patterns within page boundaries. This work reveals that page-cross prefetching at the first-level data cache (L1D) is seldom beneficial across different execution phases and workloads, and shows that state-of-the-art L1D prefetchers are not very accurate at prefetching across page boundaries. In response, we propose MOKA, a holistic framework for designing Page-Cross Filters, i.e., microarchitectural schemes that ensure effective and accurate prefetching across page boundaries. MOKA combines (i) hashed perceptron predictors that use prefetcher-independent program features, (ii) predictors that adapt decisions based on the system state (e.g., TLB pressure), and (iii) a scheme to dynamically optimize predictions across different execution phases and workload types. We use the MOKA framework to prototype a Page-Cross Filter, named DRIPPER, for three relevant L1D prefetchers (Berti [60], IPCP [61], BOP [57]). We show that DRIPPER accurately enables page-cross prefetching only when it is beneficial for performance. For instance, Berti [60] (a state-of-the-art prefetcher) combined with DRIPPER improves single-core geomean performance by 1.7% (1.2%) over Berti that always permits page-cross prefetches and by 2.5% (2.1%) over Berti that always discards page-cross prefetches, across 218 seen (178 unseen) workloads. Across 300 8-core mixes, the corresponding geomean speedups are 2.0% and 3.3%. Finally, we show that DRIPPER provides consistent benefits when both 4KB pages and 2MB large pages are used.
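
As a rough illustration of idea (i), the following Python sketch shows how a hashed perceptron could gate page-cross prefetches. It is not DRIPPER's actual design: the feature set (load PC, block delta, PC XOR page offset), the table sizes, the weight range, the thresholds, and the names `crosses_page` and `HashedPerceptronFilter` are all assumptions made for illustration. The system-state inputs (ii) and the dynamic adaptation scheme (iii) are omitted.

```python
PAGE_SHIFT = 12  # assumption: 4KB pages; use 21 for 2MB large pages


def crosses_page(trigger_vaddr, prefetch_vaddr, page_shift=PAGE_SHIFT):
    """Return True if the prefetch target lies in a different virtual page."""
    return (trigger_vaddr >> page_shift) != (prefetch_vaddr >> page_shift)


class HashedPerceptronFilter:
    """Hypothetical page-cross filter: one weight table per program feature."""

    def __init__(self, num_features=3, table_bits=8,
                 weight_min=-16, weight_max=15,
                 threshold=0, train_margin=8):
        self.table_bits = table_bits
        self.mask = (1 << table_bits) - 1
        self.tables = [[0] * (1 << table_bits) for _ in range(num_features)]
        self.weight_min = weight_min
        self.weight_max = weight_max
        self.threshold = threshold        # allow the prefetch if sum >= threshold
        self.train_margin = train_margin  # keep training while confidence is low

    def _indices(self, features):
        # Fold each feature value into a table index with a simple XOR hash.
        return [(f ^ (f >> self.table_bits)) & self.mask for f in features]

    def predict(self, features):
        """Return (weight_sum, allow) for one page-cross prefetch candidate."""
        total = sum(table[i]
                    for table, i in zip(self.tables, self._indices(features)))
        return total, total >= self.threshold

    def train(self, features, useful):
        """Update weights once the outcome of the prefetch is known."""
        total, allowed = self.predict(features)
        # Train on a misprediction or when the sum is below the margin.
        if allowed != useful or abs(total) < self.train_margin:
            delta = 1 if useful else -1
            for table, i in zip(self.tables, self._indices(features)):
                w = table[i] + delta
                table[i] = max(self.weight_min, min(self.weight_max, w))


def make_features(pc, trigger_vaddr, prefetch_vaddr):
    """Hypothetical prefetcher-independent features for one candidate."""
    delta = ((prefetch_vaddr >> 6) - (trigger_vaddr >> 6)) & 0xFFFF  # 64B blocks
    page_offset = trigger_vaddr & ((1 << PAGE_SHIFT) - 1)
    return [pc, delta, pc ^ page_offset]
```

In such a scheme, the filter would be consulted only when `crosses_page` returns True for a prefetch candidate; prefetches that stay within the page would bypass it. Training would occur when the prefetched block is either demanded (useful) or evicted unused (useless).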