
maestro's Introduction

MAESTRO: An Open-source Infrastructure for Modeling Dataflows within Deep Learning Accelerators

License: MIT

What is MAESTRO?

MAESTRO is an open-source tool for modeling and evaluating the performance and energy-efficiency of different dataflows. MAESTRO is actively developed by the Synergy Lab at Georgia Institute of Technology. For more details about MAESTRO, please visit the following links.

Codebase

Updates

May 26th, 2021

We updated the hardware description file format and added off-chip bandwidth as a constraint.

We added a validation folder with data for Eyeriss and MAERI from the MICRO 2019 paper.

Oct 13th, 2020

We added direct support for GEMM layers. For more information, please take a look here.

May 13th, 2020

We updated the naming convention of the mappings and the directory structure of the data folder.

Oct 14th, 2019

Latest codebase released along with the MAESTRO MICRO 2019 paper.

Maintainers

Technical Contributors

  • Hyoukjun Kwon (Georgia Tech, now at Facebook Reality Labs): Main developer (core framework and functionalities)
  • Prasanth Chatarasi (Georgia Tech, now at IBM Research): APIs + interface to mapping optimizers.
  • Felix (Sheng-Chun) Kao (Georgia Tech): Pytorch frontend + updates to cost-model/interface + GAMMA mapper
  • Geonhwa Jeong (Georgia Tech): Keras frontend + debugging + website maintainer.
  • Saurabh Malik (Georgia Tech, now at Microsoft): Jupyter Notebooks demo + website.

Citations

@inproceedings{maestro_micro2019,
  author    = {Hyoukjun Kwon and
               Prasanth Chatarasi and
               Michael Pellauer and
               Angshuman Parashar and
               Vivek Sarkar and
               Tushar Krishna},
  title     = {Understanding Reuse, Performance, and Hardware Cost of {DNN} Dataflow:
               {A} Data-Centric Approach},
  booktitle = {Proceedings of the 52nd Annual {IEEE/ACM} International Symposium
               on Microarchitecture, {MICRO}},
  pages     = {754--768},
  publisher = {{ACM}},
  year      = {2019},
}

@article{maestro_toppicks2020,
  author    = {Hyoukjun Kwon and
               Prasanth Chatarasi and
               Vivek Sarkar and
               Tushar Krishna and
               Michael Pellauer and
               Angshuman Parashar},
  title     = {{MAESTRO:} {A} Data-Centric Approach to Understand Reuse, Performance,
               and Hardware Cost of {DNN} Mappings},
  journal   = {{IEEE} Micro},
  volume    = {40},
  number    = {3},
  pages     = {20--29},
  year      = {2020},
}


maestro's Issues

MAESTRO merge-h5 out of memory

Hi developers,

Thanks for creating MAESTRO!

We encountered an issue when trying to merge six samples (HDF5 files) with the merge-h5 command: it runs out of memory very quickly, even with 320 GB of RAM. Merging two samples seems to work fine. Would you suggest a two-round merge to work around the memory issue?

Thanks!
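If it helps, the two-round idea generalizes to a hierarchical pairwise merge that only ever holds two samples' worth of data per merge call. Below is a sketch with plain Python lists standing in for HDF5 samples and a hypothetical merge() standing in for whatever one merge-h5 invocation does; it illustrates the strategy, not the tool's actual behavior:

```python
def merge(a, b):
    # Stand-in for one merge-h5 invocation on two samples.
    return a + b

def pairwise_merge(samples):
    """Repeatedly merge samples two at a time until one remains,
    so peak memory stays at roughly two samples per merge step."""
    while len(samples) > 1:
        merged = []
        for i in range(0, len(samples), 2):
            if i + 1 < len(samples):
                merged.append(merge(samples[i], samples[i + 1]))
            else:
                merged.append(samples[i])  # odd one out carries over to the next round
        samples = merged
    return samples[0]

print(pairwise_merge([[1], [2], [3], [4], [5], [6]]))  # [1, 2, 3, 4, 5, 6]
```

For six samples this takes three rounds (3 merges, then 2, then 1), each bounded by the size of two inputs.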

L1, L2, NOC_BW constraints not met

Hi,

I am new to this tool. After running run_example.sh, I noticed that the hardware file "accelerator_1.m" specifies constraints (cstr) for L1, L2, NoC_BW, and Offchip_BW.
But in the results.csv file, the reported NoC BW Req (Elements/cycle), Offchip BW Req (Elements/cycle), L2 SRAM Size Req (Bytes), and L1 SRAM Size Req (Bytes) do not respect the constraints given in the hardware file.

Does this mean that MAESTRO reports the optimal numbers for these hardware specifications per layer?
If so, how are we evaluating the given accelerator when the evaluation does not conform to our hardware constraints?

Another question concerns the definition of PEs along the x-axis and y-axis.
If we want to evaluate the exact Eyeriss architecture (a 12 x 14 PE array), how can we define that in the hardware file? It seems we can only provide the total number of PEs.

I covered the tutorials and understood that Cluster takes a single size parameter giving the number of PEs in a cluster.
In the run_example.sh mapping file, what does Cluster(64, P) mean?
If 64 means 64 PEs per cluster, does that imply that with 256 PEs in total there are 4 clusters?
And what does the P define?

Thanks.
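For context, the hardware description files in data/hw/ are short key-value lists. A minimal sketch in the style of accelerator_1.m follows; the key names are taken from the constraints discussed above (num_pes, l1_size_cstr, l2_size_cstr, noc_bw_cstr, offchip_bw_cstr), but their exact spelling and units are assumptions here and should be checked against the shipped example:

```
num_pes: 256
l1_size_cstr: 512
l2_size_cstr: 108000
noc_bw_cstr: 64
offchip_bw_cstr: 50
```

Note that MAESTRO appears to model the PE array as a flat pool that the mapping partitions with Cluster directives, so a 12 x 14 Eyeriss-style array would presumably be approximated as num_pes: 168 plus a Cluster(14, P) level in the mapping, rather than as explicit x/y dimensions.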

Question about calculation of throughput.

Q1

Here the reported throughput (522.605) is much greater than the number of PEs (256).
How is that possible, given that the upper bound on throughput should be the number of PEs (each PE produces at most one result per cycle)?

Q2

Could you please explain how the runtime is calculated, since it is critical to the throughput calculation?
What are the meanings of num_case_occurrences and outstanding_delay?

results->UpdateRuntime(results->GetRuntime(CA::EstimationType::Exact) + num_case_occurrences * outstanding_delay, CA::EstimationType::Exact);
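As background for Q2, the throughput column is presumably derived from the runtime as in the sketch below. This is an assumed relationship (total MACs divided by estimated runtime cycles), not MAESTRO's actual code, but it shows why an underestimated runtime would inflate throughput past the PE count:

```python
def avg_throughput(num_macs: int, runtime_cycles: int) -> float:
    """Average MACs completed per cycle: total work / estimated runtime."""
    return num_macs / runtime_cycles

# With 256 PEs the physical bound is 256 MACs/cycle, so a reported value
# above 256 implies runtime_cycles < num_macs / 256 (illustrative numbers):
print(avg_throughput(1_000_000, 1_913))  # ~522.7 MACs/cycle
```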

hardware accelerator definition file EXAMPLES

How are hardware accelerator files defined?
./data/hw/accelerator_1.m

Are there any docs/pptx/tutorials on how to define PE clusters and/or other chip architectures?
I'd like to write a MAESTRO definition file (xxx.m) that closely matches the custom-made hardware accelerator I'm currently using.
A sample showing how to write the definition files would be great.

Note: I've checked the docs/pptx from the tutorials (MICRO 2020) and the papers, but I need more detail on how the parameters map to my (or other) hardware accelerator architectures.

Thanks in advance, and thanks for the great job!
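Until official samples land, here is a hedged sketch of what a layer-plus-mapping definition file (xxx.m) generally looks like, pieced together from the directives mentioned in these issues (SpatialMap, TemporalMap, Cluster). The dimension names and tile sizes are illustrative assumptions, not a verified schema:

```
Network my_model {
  Layer CONV1 {
    Type: CONV
    Dimensions { K: 64, C: 3, R: 3, S: 3, Y: 56, X: 56 }
    Dataflow {
      // Top level: distribute output channels (K) spatially
      SpatialMap(1, 1) K;
      TemporalMap(64, 64) C;
      TemporalMap(Sz(R), Sz(R)) R;
      TemporalMap(Sz(S), Sz(S)) S;
      // Split the PE array into sub-clusters of 64 PEs each
      Cluster(64, P);
      // Inside each cluster: distribute input channels (C) across PEs
      SpatialMap(1, 1) C;
    }
  }
}
```

Directives above a Cluster line describe the mapping across clusters; directives below it describe the mapping within a cluster. Please treat the exact keywords as placeholders to be checked against data/mapping/ examples in the repo.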

Changing NoC parameters

Hi,
I was playing around with the NoC parameters and noticed that when I change noc_hops from 1 to 1024 and/or noc_hop_latency from 1 to 1024 (or any other values) in the run_example.sh file, my results don't change at all.
Can you help me understand why that is? My intuition is that once I increase the hop latency by a large amount, at least the runtime cycle count should increase, but I did not see that.
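One possible explanation, offered purely as a hypothesis about the cost model (not confirmed from the code): if runtime is taken as the maximum of overlapping compute and communication phases, and hop latency is paid only once per pipelined transfer, then even a 1024x change in noc_hop_latency can be hidden behind the compute time:

```python
def runtime_cycles(compute: int, transfer_elems: int, noc_hops: int, hop_latency: int) -> int:
    """Hypothetical overlap model: communication is pipelined, so hop latency
    is a one-time fill cost; compute and communication run in parallel."""
    comm = transfer_elems + noc_hops * hop_latency
    return max(compute, comm)

# Same runtime despite a 1024x increase in hops and hop latency,
# because compute dominates in both cases:
print(runtime_cycles(2_000_000, 5_000, 1, 1))        # 2000000
print(runtime_cycles(2_000_000, 5_000, 1024, 1024))  # 2000000
```

Under such a model the NoC parameters would only show up in the runtime once communication becomes the bottleneck.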

How do we print the help option?

I am attempting to see all the available options (not just those listed in the example). However, I am unable to print the help output.

I attempted the following:
./maestro --help or ./maestro -help or ./maestro help

Each requests an input file, so I included them:
./maestro --HW_file='<my hardware file>.m' --Mapping_file='<some mapping file>.m' --help

However, it still does not recognize the help flag. I see that there is code for this feature (I believe in option.hpp).

Computation order

Hi,
When I swap the order of the SpatialMap and TemporalMap directives, the estimated results are the same.
Could you please tell me the reason?
Thank you!

question about hw.parameter

I saw that accelerator_1.m has 5 parameters, but the web page seems to list 8. How can I configure the others in accelerator_1.m? Also, does offchip_bw_cstr refer to the read and write bandwidth of the L2 buffer?

A question about buffer inputs in Cacti tools

In the file BASE_constants.hpp (directory maestro-stable-master\lib\include\base), the variable l1_energy_multiplier is set to 1.68; it seems this variable covers the read and write energy of the L1 buffer. According to your paper, this value was calculated with the CACTI tool.
Since we are working on a similar project and would like to compare our results with yours, could you kindly share your CACTI assumptions for the L1 buffer?

design space exploration

Hi,

I have a question about design space exploration. When using the MAESTRO tool I passed the --do_dse flag, but I don't really understand the logic behind it. Can someone explain how to use it, or point me to documentation on this? Thank you for your time.

Segmentation fault occurs when executing `data/model/dnn_model.m`

Hello,

I'm currently trying to run my custom model with MAESTRO.
However, I get a Segmentation Fault when executing my custom model (the mapping file was generated by the frontend).
(I only modified the Mapping_file argument in run_example.sh, like below.)

./maestro --HW_file='data/hw/accelerator_1.m' \
          --Mapping_file='data/model/custom_model.m' \
          --print_res=true \
          --print_res_csv_file=true \
          --print_log_file=false

So I tried another example mapping file, data/model/dnn_model.m, and it also produces a Segmentation Fault:

./maestro --HW_file='data/hw/accelerator_1.m' \
          --Mapping_file='data/model/dnn_model.m' \
          --print_res=true \
          --print_res_csv_file=true \
          --print_log_file=false

My questions are:
(1) How can I execute data/model/*.m with MAESTRO? Many of them just produce a Segmentation Fault.
(2) How can I run my custom model? It also produces a Segmentation Fault.

Thank you so much for your consideration!

Compile errors, e.g., error: use of deleted function 'boost::filesystem3::...'

Hi,
When I try to compile with 'scons', I get errors.
Below is part of the compiler output:
....
....
/usr/local/include/boost/type_traits/is_convertible.hpp:135:75: error: use of deleted function 'boost::filesystem3::directory_iterator::directory_iterator(const boost::filesystem3::directory_iterator&)'
static bool const value = sizeof( boost::detail::checker::_m_check(_m_from, 0) )
....
....
/usr/local/include/boost/filesystem/v3/operations.hpp:581:9: error: use of deleted function 'boost::shared_ptr<boost::filesystem3::detail::dir_itr_imp>::shared_ptr(const boost::shared_ptr<boost::filesystem3::detail::dir_itr_imp>&)'
...
...
/usr/include/c++/5/ext/new_allocator.h:120:4: error: use of deleted function 'boost::filesystem3::directory_iterator::directory_iterator(const boost::filesystem3::directory_iterator&)'
{ ::new((void *)__p) _Up(std::forward<_Args>(__args)...); }
....
....
/usr/include/c++/5/bits/stl_construct.h:75:7: error: use of deleted function 'boost::filesystem3::directory_iterator::directory_iterator(const boost::filesystem3::directory_iterator&)'
{ ::new(static_cast<void *>(__p)) _T1(std::forward<_Args>(__args)...); }
.....
.....

My OS is Ubuntu 16.04.7 LTS.
I have tried changing the C++ standard (11, 17) and the libboost(-all, 1.58)-dev version, but the errors still exist.
Do you have any ideas how to solve this?
Thanks.

Some questions about hardware constraint

Hello,
Thanks for open-sourcing this great tool!

I am comparing a hardware simulation against the tool, but I see big discrepancies in the results.
I suspect the tool makes some assumptions, so I would like to ask a few questions about the hardware constraints and the code to confirm them.

  1. The MAESTRO docs say it supports any number of hierarchy levels (https://maestro.ece.gatech.edu/docs/build/html/hw_supported.html). When the level is 2, does it assume the noc_bw of NoC level 1 and NoC level 2 are the same?
  2. There are many places in the code that assume the hierarchy is at most 2 levels deep (e.g., the loop iteration assumes level 2). Do you have any tutorials for running an architecture with more than 2 levels of hierarchy?
  3. The multicast parameter doesn't seem to be used anywhere. Does the tool assume multicast from L2 memory to L1 is always available?
  4. Does it assume that the word size of a partial sum is the same as that of input/filter data?

Thanks again

L2, L1 Definition and read/write account

Thank you for taking the time to help us with these questions.

Q1

What is the definition of L2 and L1 in your memory hierarchy? Does L2 represent the global buffer and L1 the register file?

Q2

By running the first layer of vgg16_nlr.m, we observed that for the weight tensor the L2 buffer write count is larger than the L2 buffer read count. Since the weight tensor is never modified, a write count larger than the read count does not make much sense. Could you kindly explain why this happens?

Thank you again for your time!!

Setting up MAESTRO on python

Hi Everyone,

I'm new to MAESTRO and still trying to figure out how to install and run it for my project. I would really appreciate any help with setting up MAESTRO with Python. Thanks!
