Join us at the PACT tutorial: Saturday, November 3 at 14h00
Using CARM for research? Share your findings with us and we will gladly disseminate them here.

Cache-Aware Roofline Model (CARM): Performance, Power, Energy and Energy-Efficiency

Roofline Modeling [overview]

Roofline modeling is an insightful approach for describing the attainable upper bounds of a micro-architecture (e.g., upper bounds on performance, power, energy or efficiency). From the micro-architecture perspective, it is based on the observation that the overall execution can be limited either by computations or by memory operations. As such, Roofline models have two distinct modeling regions (i.e., the compute region and the memory region) that intersect at a single point (i.e., the ridge point). The x-axis represents intensity, whose exact definition differs across Roofline models; it is usually expressed as the ratio between compute operations (flops) and memory operations (data traffic in bytes).
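As a minimal sketch of the two regions (an illustration, not code from the cited works), the attainable performance at intensity I can be modeled as min(F_peak, B · I), where F_peak is the peak compute throughput (the horizontal roof) and B is the memory bandwidth (the slope). The ridge point lies at I = F_peak / B. The function names and the machine numbers below are hypothetical.

```python
def attainable_gflops(intensity, peak_gflops, bandwidth_gbs):
    """Roofline upper bound: the lower of the compute roof and the memory slope.

    intensity     -- flops per byte (x-axis)
    peak_gflops   -- peak compute throughput in GFLOP/s (horizontal roof)
    bandwidth_gbs -- memory bandwidth in GB/s (slope of the memory roof)
    """
    return min(peak_gflops, bandwidth_gbs * intensity)


def ridge_point(peak_gflops, bandwidth_gbs):
    """Intensity (flops/byte) at which the memory slope meets the compute roof."""
    return peak_gflops / bandwidth_gbs


# Hypothetical machine: 100 GFLOP/s peak, 25 GB/s bandwidth.
print(ridge_point(100.0, 25.0))             # 4.0 flops/byte
print(attainable_gflops(1.0, 100.0, 25.0))  # 25.0 (memory region)
print(attainable_gflops(8.0, 100.0, 25.0))  # 100.0 (compute region)
```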

Roofline models are used to simplify the detection of the main application bottlenecks (i.e., application characterization) and to provide optimization guidelines. For this purpose, the application is plotted in the model (usually as a single point). By observing the position of the application point with respect to the modeled rooflines, one can determine:

  • whether the application is memory- or compute-bound -- by checking whether the application intensity (on the x-axis) lies to the left or to the right of the ridge point, and
  • how far the application is from fully exploiting the architecture capabilities -- by comparing the point position (on the y-axis) with the modeled maximum (the rooflines).
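The two checks listed above can be sketched with the same single-roof model (one peak compute value and one bandwidth). The helper name `characterize` and the sample application point are hypothetical; a real characterization relies on measured flops, bytes and execution time.

```python
def characterize(app_intensity, app_gflops, peak_gflops, bandwidth_gbs):
    """Place an application point relative to the rooflines.

    Returns the bound ('memory' or 'compute'), decided by the side of the
    ridge point, and the fraction of the roofline the application attains.
    """
    ridge = peak_gflops / bandwidth_gbs
    # Left of the ridge point -> memory-bound; at or right of it -> compute-bound.
    bound = "memory" if app_intensity < ridge else "compute"
    # Height of the roofline directly above the application point.
    roof = min(peak_gflops, bandwidth_gbs * app_intensity)
    return bound, app_gflops / roof


# Hypothetical application point: 0.5 flops/byte, 6 GFLOP/s measured,
# on a hypothetical 100 GFLOP/s, 25 GB/s machine.
print(characterize(0.5, 6.0, 100.0, 25.0))  # ('memory', 0.48)
```

Points exactly at the ridge are reported as compute-bound here; that tie-breaking choice is arbitrary.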
Performance Roofline Models

For Performance Roofline modeling, the overall execution is considered to be limited either by the processor compute capabilities (e.g., peak FP performance in flops/s) or by the memory subsystem capabilities (i.e., memory bandwidth in bytes/s). To date, there are two main approaches for performance Roofline modeling: the Original Roofline Model (ORM) [1] and the Cache-aware Roofline Model (CARM) [2]. The ORM and the CARM are different models: they differ in how memory traffic is considered and in how intensity (i.e., the x-axis of the plots) is defined.

  • ORM [1] considers the data traffic between two consecutive memory levels; thus, the x-axis refers to Operational Intensity (OI), e.g., flops/DRAMbytes, where DRAMbytes is the amount of data traffic between the LLC and DRAM [1].
  • CARM [2] considers Arithmetic Intensity (AI) on the x-axis, i.e., flops/bytes, where bytes reflect data traffic at the memory ports of the processor pipeline (i.e., as seen by the cores) [2].
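To make the two x-axes concrete, the sketch below counts flops and bytes for a hypothetical double-precision kernel y[i] = a*x[i] + y[i] over n elements, repeated `reps` times. The DRAM-traffic rule (the working set is read from DRAM only once if it fits in the LLC, and on every pass otherwise) is a simplifying assumption, not a measurement; real traffic depends on the cache hierarchy and its replacement policy.

```python
DOUBLE_BYTES = 8


def kernel_counts(n, reps, fits_in_llc):
    """Flop and byte counts for the hypothetical kernel y[i] = a*x[i] + y[i].

    Per element: 2 flops (multiply + add) and 3 memory accesses as seen by
    the core (load x[i], load y[i], store y[i]).
    Assumption: if x and y fit in the LLC, DRAM is touched only on the first
    pass; otherwise every pass streams x and y from DRAM and writes y back.
    """
    flops = 2 * n * reps
    core_bytes = 3 * DOUBLE_BYTES * n * reps
    dram_passes = 1 if fits_in_llc else reps
    dram_bytes = 3 * DOUBLE_BYTES * n * dram_passes
    return flops, core_bytes, dram_bytes


def arithmetic_intensity(flops, core_bytes):
    """CARM x-axis: flops per byte at the core's memory ports."""
    return flops / core_bytes


def operational_intensity(flops, dram_bytes):
    """ORM x-axis (DRAM variant): flops per byte moved between LLC and DRAM."""
    return flops / dram_bytes


for fits in (False, True):
    f, cb, db = kernel_counts(n=10_000, reps=100, fits_in_llc=fits)
    print(fits, round(arithmetic_intensity(f, cb), 4),
          round(operational_intensity(f, db), 4))
# False 0.0833 0.0833
# True 0.0833 8.3333
```

Under these assumptions the AI stays at 1/12 flops/byte, while the DRAM OI grows with `reps` once the data fits in the LLC.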

These fundamental differences have direct repercussions on how the two models are constructed, experimentally validated, and used for application characterization and optimization (for more details, see [2,4]). Most notably:

  • CARM [2] is a single-plot model, where all memory levels (i.e., caches and DRAM) are included in a single chart. There is no need to construct a separate model for each memory level (as in ORM [1]).
  • It is easier to assess the application AI analytically and experimentally in CARM [2] (e.g., by simple code analysis, as in Intel® Advisor).
  • CARM provides consistent and more intuitive application characterization: the application does not shift towards the compute-bound region whenever its working data set fits in a higher memory level (as it does in ORM [1]).
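The single-plot property can be illustrated by evaluating one sloped roof per memory level against the same AI axis, all capped by the same compute roof. The peak and bandwidth values below are invented for illustration; on a real system they would be measured experimentally, as seen by the cores.

```python
# Hypothetical core-to-level bandwidths (GB/s) and peak FP throughput (GFLOP/s).
# Illustrative numbers only, not measurements of any real CPU.
PEAK_GFLOPS = 200.0
LEVEL_BANDWIDTH_GBS = {
    "L1": 800.0,
    "L2": 400.0,
    "L3": 200.0,
    "DRAM": 50.0,
}


def carm_roofs(ai):
    """Attainable GFLOP/s at arithmetic intensity `ai` for every memory level.

    All levels share the same x-axis (AI, flops per byte at the core), so
    they fit in a single plot: one sloped roof per level, all capped by
    the same horizontal compute roof.
    """
    return {level: min(PEAK_GFLOPS, bw * ai)
            for level, bw in LEVEL_BANDWIDTH_GBS.items()}


def ridge_points():
    """AI at which each level's slope meets the compute roof."""
    return {level: PEAK_GFLOPS / bw for level, bw in LEVEL_BANDWIDTH_GBS.items()}


print(carm_roofs(0.5))
# {'L1': 200.0, 'L2': 200.0, 'L3': 100.0, 'DRAM': 25.0}
print(ridge_points())
# {'L1': 0.25, 'L2': 0.5, 'L3': 1.0, 'DRAM': 4.0}
```

In this hypothetical machine, a loop with AI = 0.5 is bounded at 25 GFLOP/s when its data comes from DRAM, while the same loop with L1-resident data could reach the compute roof.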

Roofline: Power, Energy, Efficiency

The Roofline methodology has also been applied to power consumption, energy and energy-efficiency modeling, relying on either ORM [3] or CARM [4] principles. These models inherit all the differences between CARM and ORM from the performance domain; thus, they model the architecture in fundamentally different ways (for more details, see [2,4]).

  • In [3], the ORM principles are applied to energy modeling of processors with a two-level memory hierarchy, assuming that the energy of FP and memory operations cannot be overlapped. Power consumption and energy-efficiency ORMs are derived from the energy ORM, and their scope is the complete CPU package.
  • CARM-based power, energy and energy-efficiency models for multi-cores with complex memory hierarchies are proposed in [4]. Power CARMs consider the power consumption of several micro-architecture domains (cores, uncore and package) and the impact of accessing different memory levels. The energy and energy-efficiency CARMs are derived by combining the fundamental models, i.e., the performance and power CARMs [2,4].
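The non-overlapped energy assumption described for [3] can be sketched as an additive energy model on top of a performance roofline, from which average power (energy / time) and energy efficiency (flops / energy) follow. This is a simplified illustration, not the exact formulation of [3] or [4]: the constant-power term, the function name and all parameter values are assumptions of this sketch, and the per-domain, per-memory-level terms of the power CARMs in [4] are not modeled.

```python
def energy_bounds(flops, dram_bytes, e_flop_j, e_byte_j, const_power_w,
                  peak_flops_s, bandwidth_b_s):
    """Simplified time, energy, power and efficiency estimates for one kernel.

    Returns (time_s, energy_j, avg_power_w, efficiency_flops_per_j).
    """
    # Execution time from the performance roofline: compute- or memory-bound.
    time_s = max(flops / peak_flops_s, dram_bytes / bandwidth_b_s)
    # Additive (non-overlapped) energy of flops and memory traffic, plus a
    # constant-power term paid for the whole run (assumption of this sketch).
    energy_j = flops * e_flop_j + dram_bytes * e_byte_j + const_power_w * time_s
    avg_power_w = energy_j / time_s
    efficiency = flops / energy_j  # flops per joule
    return time_s, energy_j, avg_power_w, efficiency


# Hypothetical kernel and machine parameters (not measured values):
# 1e9 flops, 5e8 DRAM bytes, 100 pJ/flop, 500 pJ/byte, 10 W constant power,
# 10 GFLOP/s peak, 10 GB/s bandwidth.
t, e, p, eff = energy_bounds(1e9, 5e8, 1e-10, 5e-10, 10.0, 1e10, 1e10)
print(f"{t:.3f} s, {e:.3f} J, {p:.2f} W, {eff / 1e9:.4f} GFLOP/J")
# 0.100 s, 1.350 J, 13.50 W, 0.7407 GFLOP/J
```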

  1. S. Williams, A. Waterman, and D. Patterson, "Roofline: An Insightful Visual Performance Model for Multicore Architectures", Communications of the ACM, vol. 52, no. 4, pp. 65–76, Apr. 2009. [ PDF ]
  2. A. Ilic, F. Pratas, and L. Sousa, "Cache-aware Roofline model: Upgrading the loft", IEEE Computer Architecture Letters, vol. 13, no. 1, pp. 21–24, Jan. 2014. [ PDF ]
  3. J. W. Choi, D. Bedard, R. Fowler, and R. Vuduc, "A Roofline Model of Energy", in IEEE IPDPS, pp. 661–672, 2013. [ PDF ]
  4. A. Ilic, F. Pratas, and L. Sousa, "Beyond the Roofline: Cache-aware Power and Energy-Efficiency Modeling for Multi-cores", IEEE Transactions on Computers, vol. 66, no. 1, pp. 52–58, Jan. 2017. [ PDF ]


Intel® Advisor with Cache-aware Roofline Model

Intel® Advisor

As stated by Intel® [1]: "The Intel Advisor will soon offer a great step forward in memory performance optimization with a new vivid Advisor “Roofline” bounds and bottlenecks analysis. This new feature provides insights beyond vectorization, such as memory usage and the quality of algorithm implementation.
The Intel Advisor implemented "Cache-aware roofline" model (...). It provides additional insight by addressing all levels of memory / cache hierarchy:

  • Slope rooflines illustrate performance levels, if all the data fits into respective cache.
  • Horizontal lines show the peak achievable performance levels if vectorization and other CPU resources are used effectively.

Intel Advisor places a dot for every loop in the Roofline plot. Consider the Intel Advisor roofline plot in the figure above. Most of loops require extra cache use optimizations. Loops to the right of the plotted blue data point fall below the scalar execution roofline and therefore require vectorization."
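One generic way to read such a plot is to ask, for each loop's dot, which roof lies closest above it: a memory slope hints at cache or memory optimizations, a horizontal roof at compute-side ones such as vectorization. The sketch below does exactly that. It is not Intel Advisor's implementation or API; the roof names, numbers and loops are hypothetical.

```python
# Hypothetical ceilings: sloped memory roofs (GB/s) and horizontal compute roofs (GFLOP/s).
SLOPES_GBS = {"DRAM": 50.0, "L3": 200.0, "L2": 400.0, "L1": 800.0}
FLATS_GFLOPS = {"scalar": 25.0, "vector": 200.0}


def next_ceiling(ai, gflops):
    """Return (name, value) of the lowest roof lying above a loop's dot.

    Sloped roofs are capped by the highest compute roof, mirroring how the
    memory rooflines meet the top horizontal roof in the plot.
    Returns None if the dot is at or above every modeled roof.
    """
    top = max(FLATS_GFLOPS.values())
    roofs = {name: min(top, bw * ai) for name, bw in SLOPES_GBS.items()}
    roofs.update(FLATS_GFLOPS)
    above = {name: value for name, value in roofs.items() if value > gflops}
    if not above:
        return None
    name = min(above, key=above.get)
    return name, above[name]


# Hypothetical loops: (arithmetic intensity in flops/byte, measured GFLOP/s).
loops = {"loop_a": (0.125, 4.0), "loop_b": (2.0, 20.0), "loop_c": (2.0, 60.0)}
for loop, (ai, perf) in loops.items():
    print(loop, next_ceiling(ai, perf))
# loop_a ('DRAM', 6.25)
# loop_b ('scalar', 25.0)
# loop_c ('DRAM', 100.0)
```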

  1. Intel® Advisor "Roofline model", Intel®, May 2016. Register for the early access program at:
    software.intel.com/en-us/articles/intelr-advisor-roofline-model-early-access-program

Publications

(2017)

Aleksandar Ilic, Frederico Pratas and Leonel Sousa. Beyond the Roofline: Cache-aware Power and Energy-Efficiency Modeling for Multi-cores, IEEE Transactions on Computers, vol. 66, n. 1, pp. 52-58, January 2017.
doi: 10.1109/TC.2016.2582151

André Lopes, Frederico Pratas, Leonel Sousa and Aleksandar Ilic. Exploring GPU performance, power and energy-efficiency bounds with Cache-aware Roofline Modeling, In Proceedings of the International Symposium on Performance Analysis of Systems and Software (ISPASS'17), San Francisco Bay Area, California, USA, April 2017.

(2015)

Aleksandar Ilic, Frederico Pratas and Leonel Sousa. CARM: Cache-Aware Performance, Power and Energy-Efficiency Roofline Modeling, In Compiler, Architecture and Tools Conference (CATC 2015), Intel, Haifa, Israel, November 2015.

(2014)

Aleksandar Ilic, Frederico Pratas and Leonel Sousa. Cache-aware Roofline model: Upgrading the loft, IEEE Computer Architecture Letters, vol. 13, n. 1, pp. 21-24, January 2014.
doi: 10.1109/L-CA.2013.6 | [ PDF ]

Luís Taniça, Aleksandar Ilic, Pedro Tomás and Leonel Sousa. SchedMon: A Performance and Energy Monitoring Tool for Modern Multi-cores, In Proceedings of the International Workshop on Multi/Many-Core Computing Systems (MuCoCoS/Euro-Par 2014), Porto, Portugal, Springer International Publishing, v. 8806, pp. 230-241, August 2014.
doi: 10.1007/978-3-319-14313-2_20 | [ PDF ]

Aleksandar Ilic. Heterogeneous Systems: Load Balancing and Performance Modeling, Ph.D. Thesis, Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal, February 2014. [ bibtex ]

(2013)

Diogo Antão, Luís Taniça, Aleksandar Ilic, Frederico Pratas, Pedro Tomás and Leonel Sousa. Monitoring Performance and Power for Application Characterization with Cache-aware Roofline Model, In Proceedings of the International Conference on Parallel Processing and Applied Mathematics (PPAM 2013), Warsaw, Poland, Springer Berlin Heidelberg, v. 8384/5, pp. 693–703, September 2013.
doi: 10.1007/978-3-642-55224-3_70 | [ PDF ]


Invited Talks, Tutorials and Keynotes

(2017)

Leonel Sousa, Aleksandar Ilic, and Frederico Pratas (in collaboration with Intel®). Cache-aware Roofline Model: Performance, Power and Energy-Efficiency Modeling of Multi-Cores, in High Performance and Embedded Architecture and Compilation Conference (HiPEAC), Tutorial, Stockholm, Sweden, January 2017.
url: CARM@HiPEAC'17

Leonel Sousa, Aleksandar Ilic, and Frederico Pratas (in collaboration with Intel®). Cache-aware Roofline Model: Performance, Power and Energy-Efficiency Modeling of Multi-Cores, in NESUS Winter School and PhD Symposium, Tutorial, Vibo Valentia, Calabria, Italy, February 2017.
url: CARM@Nesus'17

(2016)

Leonel Sousa, Aleksandar Ilic, and Frederico Pratas (in collaboration with Intel®). Performance, Power and Energy-Efficiency Insightful Modeling of Multi-Cores, in IEEE International Conference on Computer Design (ICCD), Tutorial, Phoenix, AZ, USA, October 2016.
url: ICCD'16 Program

Leonel Sousa, Aleksandar Ilic, and Frederico Pratas. CARM: Cache-aware Roofline model for Multicores, Computer Architecture Lab, Carnegie Mellon University, Pittsburgh, PA, USA, September 2016.
url: CMU'16 Seminar

Leonel Sousa, Aleksandar Ilic, and Frederico Pratas. Balancing Performance, Power and Energy-Efficiency on Multi-cores towards Exascale Computing, University of Tokyo, Tokyo, Japan, July 2016.

Leonel Sousa, Aleksandar Ilic, and Frederico Pratas. Cache-aware Modeling of Multi-cores: Performance, Power and Energy-Efficiency, Kyushu Institute of Technology, Colloquium, Kyushu, Japan, June 2016.

Aleksandar Ilic, Frederico Pratas and Leonel Sousa. Cache-Aware Roofline Model: Performance, Power and Energy-Efficiency, in IEEE Latin American Symposium on Circuits and Systems (LASCAS), Tutorial, Florianopolis, SC, Brazil, March 2016.
url: gse.ufsc.br/lascas2016

(2015)

Aleksandar Ilic, Frederico Pratas and Leonel Sousa. Cache-Aware Roofline Model: Performance, Power and Energy-Efficiency, in Avancées sur les modèles de performance pour les nouvelles architectures HPC (Seminaire Exceptionnel), CMLA, ENS Cachan, Université Paris-Saclay, France, November 2015.
url: teratec.eu/.../Annonce_Sem_Intel_HP.pdf

(2014)

Leonel Sousa, Frederico Pratas, Svetislav Momcilovic and Aleksandar Ilic. Coping with Complexity: CPUs, GPUs and Real-world Applications, in Scheduling for Large Scale Systems Workshop, Lyon, France, July 2014.
url: scheduling2014.sciencesconf.org/../presentation_leonel_lyon.pdf

Leonel Sousa, Svetislav Momcilovic, Frederico Pratas and Aleksandar Ilic. Modeling and Load Balancing for Multicore Systems, University of Auckland, New Zealand, June 2014.
url: ece.auckland.ac.nz/../events-2014/modelling-and-load-balancing.html

Leonel Sousa, Aleksandar Ilic, Svetislav Momcilovic and Frederico Pratas. Overhauling Multicores Performance: Modeling and Load Balancing, In International Conference on Parallel and Distributed Computing and Networks (PDCN 2014), Innsbruck, Austria, February 2014.
url: iasted.org/conferences/speaker1-811.html | invited keynote

(2013)

Diogo Antão, Luís Taniça, Aleksandar Ilic, Frederico Pratas, Pedro Tomás and Leonel Sousa. Monitoring Performance and Power for Application Characterization with Cache-aware Roofline Model, In International Conference on Parallel Processing and Applied Mathematics (PPAM 2013), Warsaw, Poland, June 2013.
url: ppam.pl/download/presentations/Leonel_Sousa_ppam_2013.pdf

Aleksandar Ilic, Frederico Pratas and Leonel Sousa. Cache-aware Roofline Model: Upgrading the Loft, In Joint European COST IC0804/IC0805 Meeting, Madrid, Spain, April 2013.

Aleksandar Ilic, Frederico Pratas and Leonel Sousa. Multicores: Performance, Power and Energy Modeling, In Joint European COST IC0804/IC0805 Meeting, La Laguna, Spain, February 2013.
url: vega.deioc.ull.es/../ilic_fcpp_las_cost.pdf


Software / Tools

(2016)

Intel® Advisor "Roofline model", Intel®, May 2016. Register for the early access program at:
software.intel.com/en-us/articles/intelr-advisor-roofline-model-early-access-program

(2014)

SchedMon: An Open Source Software Tool for Accurate Performance and Energy Monitoring in Modern Multi-cores, Luís Taniça, Aleksandar Ilic, Pedro Tomás and Leonel Sousa. SiPS Group, INESC-ID, August 2014.
available at sips.inesc-id.pt/tools/schedmon/
