
maestro's Introduction

MAESTRO: An Open-source Infrastructure for Modeling Dataflows within Deep Learning Accelerators

License: MIT

What is MAESTRO?

MAESTRO is an open-source tool for modeling and evaluating the performance and energy-efficiency of different dataflows. MAESTRO is actively developed by the Synergy Lab at Georgia Institute of Technology. For more details about MAESTRO, please visit the following links.

Codebase

Updates

May 26th, 2021

We updated the hardware description file format and added off-chip bandwidth as a constraint.

We added a validation folder with data for Eyeriss and MAERI from the MICRO 2019 paper.

Oct 13th, 2020

We added direct support for GEMM layers. For more information, please take a look here.

May 13th, 2020

We updated the naming convention of the mappings and the directory structure of the data folder.

Oct 14th, 2019

Latest codebase released along with the MAESTRO MICRO 2019 paper.

Maintainers

Technical Contributors

  • Hyoukjun Kwon (Georgia Tech, now at Facebook Reality Labs): Main developer (core framework and functionalities)
  • Prasanth Chatarasi (Georgia Tech, now at IBM Research): APIs + interface to mapping optimizers.
  • Felix (Sheng-Chun) Kao (Georgia Tech): Pytorch frontend + updates to cost-model/interface + GAMMA mapper
  • Geonhwa Jeong (Georgia Tech): Keras frontend + debugging + website maintainer.
  • Saurabh Malik (Georgia Tech, now at Microsoft): Jupyter Notebooks demo + website.

Citations

@inproceedings{maestro_micro2019,
  author    = {Hyoukjun Kwon and
               Prasanth Chatarasi and
               Michael Pellauer and
               Angshuman Parashar and
               Vivek Sarkar and
               Tushar Krishna},
  title     = {Understanding Reuse, Performance, and Hardware Cost of {DNN} Dataflow:
               {A} Data-Centric Approach},
  booktitle = {Proceedings of the 52nd Annual {IEEE/ACM} International Symposium
               on Microarchitecture, {MICRO}},
  pages     = {754--768},
  publisher = {{ACM}},
  year      = {2019},
}

@article{maestro_toppicks2020,
  author    = {Hyoukjun Kwon and
               Prasanth Chatarasi and
               Vivek Sarkar and
               Tushar Krishna and
               Michael Pellauer and
               Angshuman Parashar},
  title     = {{MAESTRO:} {A} Data-Centric Approach to Understand Reuse, Performance,
               and Hardware Cost of {DNN} Mappings},
  journal   = {{IEEE} Micro},
  volume    = {40},
  number    = {3},
  pages     = {20--29},
  year      = {2020},
}


maestro's Issues

MAESTRO merge-h5 out of memory

Hi developers,

Thanks for creating MAESTRO!

We encountered an issue when trying to merge six samples (HDF5 files) with the merge-h5 command: it runs out of memory very quickly, even with 320 GB of RAM. Merging two samples seems to work fine. Would you suggest a two-round merge to work around the memory issue?

Thanks!
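If it helps, the two-round idea generalizes to a hierarchical pairwise merge that only ever holds two samples' worth of data per merge call. Below is a sketch with plain Python lists standing in for HDF5 samples and a hypothetical merge() standing in for whatever one merge-h5 invocation does; it illustrates the strategy, not the tool's actual behavior:

```python
def merge(a, b):
    # Stand-in for one merge-h5 invocation on two samples.
    return a + b

def pairwise_merge(samples):
    """Repeatedly merge samples two at a time until one remains,
    so peak memory stays at roughly two samples per merge step."""
    while len(samples) > 1:
        merged = []
        for i in range(0, len(samples), 2):
            if i + 1 < len(samples):
                merged.append(merge(samples[i], samples[i + 1]))
            else:
                merged.append(samples[i])  # odd one out carries over to the next round
        samples = merged
    return samples[0]

print(pairwise_merge([[1], [2], [3], [4], [5], [6]]))  # [1, 2, 3, 4, 5, 6]
```

For six samples this takes three rounds (3 merges, then 2, then 1), each bounded by the size of two inputs.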

L1, L2, NOC_BW constraints not met

Hi,

I am new to this tool. After running run_example.sh, I noticed that the hardware file "accelerator_1.m" specifies constraints (cstr) for L1, L2, NoC_BW, and Offchip_BW.
But in the results.csv file, the reported NoC BW Req (Elements/cycle), Offchip BW Req (Elements/cycle), L2 SRAM Size Req (Bytes), and L1 SRAM Size Req (Bytes) do not respect the constraints given in the hardware file.

Does this mean that MAESTRO reports the optimal numbers for these hardware specifications per layer?
If so, how are we evaluating the given accelerator when the evaluation does not conform to our hardware constraints?

Another question concerns the definition of PEs along the x-axis and y-axis.
If we want to evaluate the exact Eyeriss architecture (a 12 x 14 PE array), how can we define that in the hardware file? It seems we can only provide the total number of PEs.

I covered the tutorials and understood that Cluster takes a single size parameter giving the number of PEs in a cluster.
In the run_example.sh mapping file, what does Cluster(64, P) mean?
If 64 means 64 PEs per cluster, does that imply that with 256 PEs in total there are 4 clusters?
And what does the P define?

Thanks.
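For context, the hardware description files in data/hw/ are short key-value lists. A minimal sketch in the style of accelerator_1.m follows; the key names are taken from the constraints discussed above (num_pes, l1_size_cstr, l2_size_cstr, noc_bw_cstr, offchip_bw_cstr), but their exact spelling and units are assumptions here and should be checked against the shipped example:

```
num_pes: 256
l1_size_cstr: 512
l2_size_cstr: 108000
noc_bw_cstr: 64
offchip_bw_cstr: 50
```

Note that MAESTRO appears to model the PE array as a flat pool that the mapping partitions with Cluster directives, so a 12 x 14 Eyeriss-style array would presumably be approximated as num_pes: 168 plus a Cluster(14, P) level in the mapping, rather than as explicit x/y dimensions.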

Question about calculation of throughput.

Q1

Here the reported throughput (522.605) is much greater than the number of PEs (256).
How is that possible, given that the upper bound on throughput should be the number of PEs (each PE produces at most one result per cycle)?

Q2

Could you please explain how the runtime is calculated, since it is critical to the throughput calculation?
What are the meanings of num_case_occurrences and outstanding_delay?

results->UpdateRuntime(results->GetRuntime(CA::EstimationType::Exact) + num_case_occurrences * outstanding_delay, CA::EstimationType::Exact);
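As background for Q2, the throughput column is presumably derived from the runtime as in the sketch below. This is an assumed relationship (total MACs divided by estimated runtime cycles), not MAESTRO's actual code, but it shows why an underestimated runtime would inflate throughput past the PE count:

```python
def avg_throughput(num_macs: int, runtime_cycles: int) -> float:
    """Average MACs completed per cycle: total work / estimated runtime."""
    return num_macs / runtime_cycles

# With 256 PEs the physical bound is 256 MACs/cycle, so a reported value
# above 256 implies runtime_cycles < num_macs / 256 (illustrative numbers):
print(avg_throughput(1_000_000, 1_913))  # ~522.7 MACs/cycle
```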

hardware accelerator definition file EXAMPLES

How are hardware accelerator files defined?
./data/hw/accelerator_1.m

Are there any docs/pptx/tutorials on how to define PE clusters and/or other chip architectures?
I'd like to write a MAESTRO definition file (xxx.m) that closely matches the custom-made hardware accelerator I'm currently using.
A sample showing how to write the definition files would be great.

Note: I've checked the docs/pptx from the tutorials (MICRO 2020) and the papers, but I need more detail on how the parameters map to my (or other) hardware accelerator architectures.

Thanks in advance, and thanks for the great job!
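Until official samples land, here is a hedged sketch of what a layer-plus-mapping definition file (xxx.m) generally looks like, pieced together from the directives mentioned in these issues (SpatialMap, TemporalMap, Cluster). The dimension names and tile sizes are illustrative assumptions, not a verified schema:

```
Network my_model {
  Layer CONV1 {
    Type: CONV
    Dimensions { K: 64, C: 3, R: 3, S: 3, Y: 56, X: 56 }
    Dataflow {
      // Top level: distribute output channels (K) spatially
      SpatialMap(1, 1) K;
      TemporalMap(64, 64) C;
      TemporalMap(Sz(R), Sz(R)) R;
      TemporalMap(Sz(S), Sz(S)) S;
      // Split the PE array into sub-clusters of 64 PEs each
      Cluster(64, P);
      // Inside each cluster: distribute input channels (C) across PEs
      SpatialMap(1, 1) C;
    }
  }
}
```

Directives above a Cluster line describe the mapping across clusters; directives below it describe the mapping within a cluster. Please treat the exact keywords as placeholders to be checked against data/mapping/ examples in the repo.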

Changing NoC parameters

Hi,
I was playing around with the NoC parameters and noticed that when I change noc_hops from 1 to 1024 and/or noc_hop_latency from 1 to 1024 (or any other values) in the run_example.sh file, my results don't change at all.
Can you help me understand why that is? My intuition is that once I increase the hop latency by a large amount, at least the runtime cycle count should increase, but I did not see that.
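One possible explanation, offered purely as a hypothesis about the cost model (not confirmed from the code): if runtime is taken as the maximum of overlapping compute and communication phases, and hop latency is paid only once per pipelined transfer, then even a 1024x change in noc_hop_latency can be hidden behind the compute time:

```python
def runtime_cycles(compute: int, transfer_elems: int, noc_hops: int, hop_latency: int) -> int:
    """Hypothetical overlap model: communication is pipelined, so hop latency
    is a one-time fill cost; compute and communication run in parallel."""
    comm = transfer_elems + noc_hops * hop_latency
    return max(compute, comm)

# Same runtime despite a 1024x increase in hops and hop latency,
# because compute dominates in both cases:
print(runtime_cycles(2_000_000, 5_000, 1, 1))        # 2000000
print(runtime_cycles(2_000_000, 5_000, 1024, 1024))  # 2000000
```

Under such a model the NoC parameters would only show up in the runtime once communication becomes the bottleneck.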

How do we print the help option?

I am attempting to see all the available options (not just those listed in the example). However, I am unable to print the help output.

I attempted the following:
./maestro --help or ./maestro -help or ./maestro help

Each requests an input file, so I included them:
./maestro --HW_file='<my hardware file>.m' --Mapping_file='<some mapping file>.m' --help

However, it still does not recognize the help flag. I see that there is code for this feature (I believe in option.hpp).

Computation order

Hi,
When I swap the order of the SpatialMap and TemporalMap directives, the estimated results are the same.
Could you please tell me the reason?
Thank you!

question about hw.parameter

I saw that accelerator_1.m has 5 parameters, but the web page seems to list 8. How can I configure the others in accelerator_1.m? Also, does offchip_bw_cstr refer to the read and write bandwidth of the L2 buffer?

A question about buffer inputs in Cacti tools

In the file BASE_constants.hpp (directory maestro-stable-master\lib\include\base), the variable l1_energy_multiplier is set to 1.68; it seems this variable covers the read and write energy of the L1 buffer. According to your paper, this value was calculated with the CACTI tool.
Since we are working on a similar project and would like to compare our results with yours, could you kindly share your CACTI assumptions for the L1 buffer?

design space exploration

Hi,

I have a question about design space exploration. When using the MAESTRO tool I passed the --do_dse flag, but I don't really understand the logic behind it. Can someone explain how to use it, or point me to documentation on this? Thank you for your time.

Segmentation fault occurs when executing `data/model/dnn_model.m`

Hello,

I'm currently trying to run my custom model with MAESTRO.
However, I get a Segmentation Fault when executing my custom model (the mapping file was generated by the frontend).
(I only modified the Mapping_file argument in run_example.sh, like below.)

./maestro --HW_file='data/hw/accelerator_1.m' \
          --Mapping_file='data/model/custom_model.m' \
          --print_res=true \
          --print_res_csv_file=true \
          --print_log_file=false

So I tried another example mapping file, data/model/dnn_model.m, and it also produces a Segmentation Fault:

./maestro --HW_file='data/hw/accelerator_1.m' \
          --Mapping_file='data/model/dnn_model.m' \
          --print_res=true \
          --print_res_csv_file=true \
          --print_log_file=false

My questions are:
(1) How can I execute data/model/*.m with MAESTRO? Many of them just produce a Segmentation Fault.
(2) How can I run my custom model? It also produces a Segmentation Fault.

Thank you so much for your consideration!

Compile errors, e.g., error: use of deleted function 'boost::filesystem3::...'

Hi,
When I try to compile with 'scons', I get errors.
Below is part of the compiler output:
....
....
/usr/local/include/boost/type_traits/is_convertible.hpp:135:75: error: use of deleted function 'boost::filesystem3::directory_iterator::directory_iterator(const boost::filesystem3::directory_iterator&)'
static bool const value = sizeof( boost::detail::checker::_m_check(_m_from, 0) )
....
....
/usr/local/include/boost/filesystem/v3/operations.hpp:581:9: error: use of deleted function 'boost::shared_ptr<boost::filesystem3::detail::dir_itr_imp>::shared_ptr(const boost::shared_ptr<boost::filesystem3::detail::dir_itr_imp>&)'
...
...
/usr/include/c++/5/ext/new_allocator.h:120:4: error: use of deleted function 'boost::filesystem3::directory_iterator::directory_iterator(const boost::filesystem3::directory_iterator&)'
{ ::new((void *)__p) _Up(std::forward<_Args>(__args)...); }
....
....
/usr/include/c++/5/bits/stl_construct.h:75:7: error: use of deleted function 'boost::filesystem3::directory_iterator::directory_iterator(const boost::filesystem3::directory_iterator&)'
{ ::new(static_cast<void *>(__p)) _T1(std::forward<_Args>(__args)...); }
.....
.....

My OS is Ubuntu 16.04.7 LTS.
I have tried changing the C++ standard (11, 17) and the libboost(-all, 1.58)-dev version, but the errors still exist.
Do you have any ideas how to solve this?
Thanks.

Some questions about hardware constraint

Hello,
Thanks for open-sourcing this great tool!

I am comparing a hardware simulation against the tool, but I see big discrepancies in the results.
I suspect the tool makes some assumptions, so I would like to ask a few questions about the hardware constraints and the code to confirm them.

  1. The MAESTRO docs say it supports any number of hierarchy levels (https://maestro.ece.gatech.edu/docs/build/html/hw_supported.html). When the level is 2, does it assume the noc_bw of NoC level 1 and NoC level 2 are the same?
  2. There are many places in the code that assume the hierarchy is at most 2 levels deep (e.g., the loop iteration assumes level 2). Do you have any tutorials for running an architecture with more than 2 levels of hierarchy?
  3. The multicast parameter doesn't seem to be used anywhere. Does the tool assume multicast from L2 memory to L1 is always available?
  4. Does it assume that the word size of a partial sum is the same as that of input/filter data?

Thanks again

L2, L1 Definition and read/write account

Thank you for taking the time to help us with these questions.

Q1

What is the definition of L2 and L1 in your memory hierarchy? Does L2 represent the global buffer and L1 the register file?

Q2

By running the first layer of vgg16_nlr.m, we observed that for the weight tensor the L2 buffer write count is larger than the L2 buffer read count. Since the weight tensor is never modified, a write count larger than the read count does not make much sense. Could you kindly explain why this happens?

Thank you again for your time!!

Setting up MAESTRO on python

Hi Everyone,

I'm new to MAESTRO and still trying to figure out how to install and run it for my project. I would really appreciate any help with setting up MAESTRO with Python. Thanks!
