Roofline modeling is an insightful approach for describing the attainable upper bounds of a micro-architecture (e.g., upper bounds on performance, power, energy or efficiency). From the micro-architecture perspective, it is based on the observation that the overall execution can be limited either by computations or by memory operations. As such, Roofline models have two distinct modeling regions (i.e., the compute region and the memory region) that intersect at a single point (i.e., the ridge point). The x-axis represents the intensity, whose definition differs across Roofline models; it is usually expressed as the ratio between compute operations (flops) and memory operations (data traffic in bytes).
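As a minimal sketch of the two regions and the ridge point, restricted to the performance case, the attainable performance at a given intensity can be taken as the smaller of the compute roof and the memory roof (bandwidth times intensity). The function names and the peak values below are hypothetical and serve only as an illustration; they do not describe any particular machine.

```python
def ridge_point(peak_flops, peak_bandwidth):
    """Intensity (flops/byte) at which the memory and compute roofs intersect."""
    return peak_flops / peak_bandwidth


def attainable_performance(intensity, peak_flops, peak_bandwidth):
    """Upper bound on performance (flops/s) at the given intensity (flops/byte)."""
    if intensity <= 0:
        raise ValueError("intensity must be positive")
    # Memory region: the bound grows linearly with intensity.
    # Compute region: the bound is flat at the peak compute throughput.
    return min(peak_flops, peak_bandwidth * intensity)


def limiting_region(intensity, peak_flops, peak_bandwidth):
    """Name the region that bounds execution at the given intensity."""
    if intensity < ridge_point(peak_flops, peak_bandwidth):
        return "memory"
    return "compute"


if __name__ == "__main__":
    # Hypothetical machine parameters, chosen only for illustration.
    PEAK_FLOPS = 100e9      # flops/s
    PEAK_BANDWIDTH = 25e9   # bytes/s

    for intensity in (0.5, 4.0, 16.0):
        bound = attainable_performance(intensity, PEAK_FLOPS, PEAK_BANDWIDTH)
        region = limiting_region(intensity, PEAK_FLOPS, PEAK_BANDWIDTH)
        print(f"{intensity:5.1f} flops/byte -> {bound / 1e9:6.1f} Gflops/s ({region} region)")
```

With these assumed values the ridge point sits at 4 flops/byte: points to its left fall in the memory region, and points at or to its right fall in the compute region.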
Roofline models are used to simplify the detection of the main application bottlenecks (i.e., application characterization) and to provide optimization guidelines. For this purpose, the application is plotted in the model (usually as a single point). By observing the position of the application point relative to the modeled rooflines, one can derive:
For performance Roofline modeling, the overall execution is considered to be limited either by the processor compute capabilities (e.g., peak FP performance in flops/s) or by the memory subsystem capabilities (i.e., memory bandwidth in bytes/s). To date, there are two main approaches to performance Roofline modeling: the Original Roofline Model (ORM) [1] and the Cache-aware Roofline Model (CARM) [2]. The ORM and CARM are different models: they differ in how memory traffic is considered and in how intensity (i.e., the x-axis in the plots) is defined.
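To illustrate how the choice of memory traffic changes the x-axis, the sketch below computes two intensities for the same kernel. It assumes, following the usual descriptions of the two models in [1] and [2], that the ORM counts only the bytes transferred between the last-level cache and DRAM, while the CARM counts all bytes requested by the core across the complete memory hierarchy. The class, the function names and the counter values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KernelCounts:
    """Hypothetical measurements for one kernel run."""
    flops: float        # floating-point operations performed
    core_bytes: float   # bytes requested by the core (all memory levels); assumed CARM traffic
    dram_bytes: float   # bytes moved between last-level cache and DRAM; assumed ORM traffic


def _ratio(flops, traffic_bytes):
    if traffic_bytes <= 0:
        raise ValueError("memory traffic must be positive")
    return flops / traffic_bytes


def orm_intensity(counts):
    """Intensity in flops/byte, using DRAM traffic only (assumption for the ORM)."""
    return _ratio(counts.flops, counts.dram_bytes)


def carm_intensity(counts):
    """Intensity in flops/byte, using core-side traffic (assumption for the CARM)."""
    return _ratio(counts.flops, counts.core_bytes)


if __name__ == "__main__":
    # Illustrative values only: a kernel that reuses most of its data from cache.
    kernel = KernelCounts(flops=2e9, core_bytes=8e9, dram_bytes=1e9)
    print("ORM intensity :", orm_intensity(kernel), "flops/byte")
    print("CARM intensity:", carm_intensity(kernel), "flops/byte")
```

Under these assumptions the same kernel lands at different x-axis positions in the two models, so a point plotted in one model cannot simply be read against the rooflines of the other.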
These fundamental differences have direct repercussions on how the two models are constructed, experimentally validated, and used for application characterization and optimization (for more details see [2,4]). Most notably:
The Roofline methodology is also applied to power consumption, energy and energy-efficiency modeling, relying on both ORM [3] and CARM [4] principles. These models inherit all the differences between the CARM and ORM from the performance domain, and thus offer fundamentally different architecture modeling (for more details see [2,4]).
As stated by Intel® [1]: "The Intel Advisor will soon offer a great step forward in memory performance optimization with a new vivid Advisor “Roofline” bounds and bottlenecks analysis.
This new feature provides insights beyond vectorization, such as memory usage and the quality of algorithm implementation.
The Intel Advisor implemented "Cache-aware roofline" model (...).
It provides additional insight by addressing all levels of memory / cache hierarchy:
Intel Advisor places a dot for every loop in the Roofline plot. Consider the Intel Advisor roofline plot in the figure above. Most of loops require extra cache use optimizations. Loops to the right of the plotted blue data point fall below the scalar execution roofline and therefore require vectorization."
Aleksandar Ilic, Frederico Pratas and Leonel Sousa.
Beyond the Roofline: Cache-aware Power and Energy-Efficiency Modeling for Multi-cores,
IEEE Transactions on Computers, vol. 66, n. 1, pp. 52-58, January 2017.
doi: 10.1109/TC.2016.2582151
André Lopes, Frederico Pratas, Leonel Sousa and Aleksandar Ilic. Exploring GPU performance, power and energy-efficiency bounds with Cache-aware Roofline Modeling, In Proceedings of the International Symposium on Performance Analysis of Systems and Software (ISPASS'17), San Francisco Bay Area, California, USA, April 2017.
Aleksandar Ilic, Frederico Pratas and Leonel Sousa.
CARM: Cache-Aware Performance, Power and Energy-Efficiency Roofline Modeling,
In Compiler, Architecture and Tools Conference (CATC 2015),
Intel, Haifa, Israel, November 2015.
Aleksandar Ilic, Frederico Pratas and Leonel Sousa.
Cache-aware Roofline model: Upgrading the loft,
IEEE Computer Architecture Letters,
vol. 13, n. 1, pp. 21-24, January 2014.
doi: 10.1109/L-CA.2013.6
Luís Taniça, Aleksandar Ilic, Pedro Tomás and Leonel Sousa.
SchedMon: A Performance and Energy Monitoring Tool for Modern Multi-cores,
In Proceedings of the International Workshop on Multi/Many-Core Computing Systems (MuCoCoS/Euro-Par 2014),
Porto, Portugal, Springer International Publishing, v. 8806, pp. 230-241, August 2014.
doi: 10.1007/978-3-319-14313-2_20
Aleksandar Ilic. Heterogeneous Systems: Load Balancing and Performance Modeling, Ph.D. Thesis, Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal, February 2014.
Diogo Antão, Luís Taniça, Aleksandar Ilic, Frederico Pratas, Pedro Tomás and Leonel Sousa.
Monitoring Performance and Power for Application Characterization with Cache-aware Roofline Model,
In Proceedings of the International Conference on Parallel Processing and Applied Mathematics (PPAM 2013),
Warsaw, Poland, Springer Berlin Heidelberg, v. 8384/5, pp. 693–703, September 2013.
doi: 10.1007/978-3-642-55224-3_70
Leonel Sousa, Aleksandar Ilic, and Frederico Pratas (in collaboration with Intel®).
Cache-aware Roofline Model: Performance, Power and Energy-Efficiency Modeling of Multi-Cores,
in High Performance and Embedded Architecture and Compilation Conference (HiPEAC), Tutorial,
Stockholm, Sweden, January 2017.
url: CARM@HiPEAC'17
Leonel Sousa, Aleksandar Ilic, and Frederico Pratas (in collaboration with Intel®).
Cache-aware Roofline Model: Performance, Power and Energy-Efficiency Modeling of Multi-Cores,
in NESUS Winter School and PhD Symposium, Tutorial,
Vibo Valentia, Calabria, Italy, February 2017.
url: CARM@Nesus'17
Leonel Sousa, Aleksandar Ilic, and Frederico Pratas (in collaboration with Intel®).
Performance, Power and Energy-Efficiency Insightful Modeling of Multi-Cores,
in IEEE International Conference on Computer Design (ICCD), Tutorial,
Phoenix, AZ, USA, October 2016.
url: ICCD'16 Program
Leonel Sousa, Aleksandar Ilic, and Frederico Pratas.
CARM: Cache-aware Roofline model for Multicores,
Computer Architecture Lab, Carnegie Mellon University,
Pittsburgh, PA, USA, September 2016.
url: CMU'16 Seminar
Leonel Sousa, Aleksandar Ilic, and Frederico Pratas. Balancing Performance, Power and Energy-Efficiency on Multi-cores towards Exascale Computing, University of Tokyo, Tokyo, Japan, July 2016.
Leonel Sousa, Aleksandar Ilic, and Frederico Pratas. Cache-aware Modeling of Multi-cores: Performance, Power and Energy-Efficiency, Kyushu Institute of Technology, Colloquium, Kyushu, Japan, June 2016.
Aleksandar Ilic, Frederico Pratas and Leonel Sousa.
Cache-Aware Roofline Model: Performance, Power and Energy-Efficiency,
in IEEE Latin American Symposium on Circuits and Systems (LASCAS), Tutorial,
Florianopolis, SC, Brazil, March 2016.
url: gse.ufsc.br/lascas2016
Aleksandar Ilic, Frederico Pratas and Leonel Sousa.
Cache-Aware Roofline Model: Performance, Power and Energy-Efficiency,
in Avancées sur les modèles de performance pour les nouvelles architectures HPC (Seminaire Exceptionnel),
CMLA, ENS Cachan, Université Paris-Saclay, France, November 2015.
url: teratec.eu/.../Annonce_Sem_Intel_HP.pdf
Leonel Sousa, Frederico Pratas, Svetislav Momcilovic and Aleksandar Ilic.
Coping with Complexity: CPUs, GPUs and Real-world Applications,
in Scheduling for Large Scale Systems Workshop,
Lyon, France, July 2014.
url: scheduling2014.sciencesconf.org/../presentation_leonel_lyon.pdf
Leonel Sousa, Svetislav Momcilovic, Frederico Pratas and Aleksandar Ilic.
Modeling and Load Balancing for Multicore Systems,
University of Auckland,
New Zealand, June 2014.
url: ece.auckland.ac.nz/../events-2014/modelling-and-load-balancing.html
Leonel Sousa, Aleksandar Ilic, Svetislav Momcilovic and Frederico Pratas.
Overhauling Multicores Performance: Modeling and Load Balancing,
In International Conference on Parallel and Distributed Computing and Networks (PDCN 2014),
Innsbruck, Austria, February 2014.
url: iasted.org/conferences/speaker1-811.html | invited keynote
Diogo Antão, Luís Taniça, Aleksandar Ilic, Frederico Pratas, Pedro Tomás and Leonel Sousa.
Monitoring Performance and Power for Application Characterization with Cache-aware Roofline Model,
In International Conference on Parallel Processing and Applied Mathematics (PPAM 2013),
Warsaw, Poland, June 2013.
url: ppam.pl/download/presentations/Leonel_Sousa_ppam_2013.pdf
Aleksandar Ilic, Frederico Pratas and Leonel Sousa. Cache-aware Roofline Model: Upgrading the Loft, In Joint European COST IC0804/IC0805 Meeting, Madrid, Spain, April 2013.
Aleksandar Ilic, Frederico Pratas and Leonel Sousa.
Multicores: Performance, Power and Energy Modeling,
In Joint European COST IC0804/IC0805 Meeting,
La Laguna, Spain, February 2013.
url: vega.deioc.ull.es/../ilic_fcpp_las_cost.pdf
Intel® Advisor "Roofline model",
Intel®, May 2016.
Register for the early access program at:
software.intel.com/en-us/articles/intelr-advisor-roofline-model-early-access-program
SchedMon: An Open Source Software Tool for Accurate Performance and Energy Monitoring in Modern Multi-cores,
Luís Taniça, Aleksandar Ilic, Pedro Tomás and Leonel Sousa.
SiPS Group, INESC-ID, August 2014.
available at sips.inesc-id.pt/tools/schedmon/
Copyright © 2008–2014, Aleksandar Ilic [sips.inesc-id.pt/~ilic]