neurosim / DNN_NeuroSim_V2.1
Benchmark framework of compute-in-memory based accelerators for deep neural network (on-chip training chip focused)
In the function ChipFloorPlan in Training_pytorch/NeuroSIM/Chip.cpp, the double vector tileLocaEachLayer is calculated. I presume that tileLocaEachLayer records the location of the first tile that stores each layer. The following code that calculates tileLocaEachLayer seems wrong:
```cpp
for (int i=0; i<netStructure.size(); i++) {
    if (i==0) {
        tileLocaEachLayerRow.push_back(0);
        tileLocaEachLayerCol.push_back(0);
    } else {
        // original code here
        // thisTileTotal += numTileEachLayer[0][i]*numTileEachLayer[1][i];
        tileLocaEachLayerRow.push_back((int)thisTileTotal/(*numTileRow));
        tileLocaEachLayerCol.push_back((int)thisTileTotal%(*numTileRow)-1);
    }
    // I think it should be moved here.
    thisTileTotal += numTileEachLayer[0][i]*numTileEachLayer[1][i];
}
```
I think the calculation of thisTileTotal should be moved from the else clause to outside.
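With that move, thisTileTotal becomes an exclusive prefix sum of the tile counts of earlier layers, so each layer's first tile is placed right after the previous layer's tiles. A minimal Python sketch of that placement logic (variable names are illustrative, not the simulator's):

```python
def tile_locations(tiles_per_layer, num_tile_row):
    """Map each layer to the (row, col) of its first tile, assuming
    layers are packed row-major into a grid num_tile_row tiles wide."""
    locations = []
    this_tile_total = 0  # exclusive prefix sum: tiles used by earlier layers
    for n in tiles_per_layer:
        locations.append((this_tile_total // num_tile_row,
                          this_tile_total % num_tile_row))
        this_tile_total += n  # accumulate AFTER recording the location
    return locations

# Layers needing 3, 2, and 4 tiles in a grid 4 tiles wide:
print(tile_locations([3, 2, 4], 4))  # [(0, 0), (0, 3), (1, 1)]
```

Accumulating before recording (as in the original else branch) would instead point each layer at the tile after its own block, which matches the bug being described.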
Hi @neurosim,
Has anyone tried using DenseNet40 for on-chip training using DNN_NeuroSim_V2.1?
I ran into this error: "ERROR: SubArray Size is too large, which break the chip hierarchey, please decrease the SubArray size!"
Because of this error, I am unable to get the circuit-level performance metrics. Any suggestions on how to fix it?
Thanks!
When I set param->batchSize to 1, the resulting weight-update latency becomes inf ns. I don't understand the relation between batch size and weight update. Could anyone please help me and explain? Thanks.
Hi,
I have tried to set up and install the tool according to the user manual, but when running it with the default values provided in the manual, the tool gives no output after the Floorplan stage; it just keeps running. I have tried reducing the number of epochs and the batch size, but that did not work either.
I am using the latest CUDA version, i.e. 11.6. Can you please suggest something?
Thank you.
Why can't I use DNN NeuroSim as an MLP by removing the convolutional layers, max pooling, and activation functions? Is there any advantage of using MLP NeuroSim over the altered DNN NeuroSim?
I did try using DNN NeuroSim as an MLP, but in the hardware-performance estimation it gets stuck at Layer 1. May I know the cause of this?
I'm attempting to look at the effects of certain hardware parameters (cellBit, ADCPrecision, etc.) on accuracy and energy. I set "--inference 1" on a relatively unchanged clone of the repository and my GPU ran out of memory. After reducing the size of the layers but leaving everything else generally unchanged (apart from fixing a few errors), I keep getting a "QE Blow" assertion error. Using print statements, I found that the assertion fails during the second run of "backward" for WAGERounding. Changing grad_scale hasn't helped, nor has adjusting the network architecture. Adding a small value to "x", since it is zero, also doesn't help. Is there a possible explanation for why this error occurs?
As shown below, only input_matrix[0, :] is converted into filled_matrix_bin and then saved into the CSV file. Does that mean not all of the data in input_matrix is saved to the file? Then the simulator cannot read the whole input data, which leads to wrong results.
```python
def write_matrix_activation_conv(input_matrix, fill_dimension, length, filename):
    filled_matrix_b = np.zeros([input_matrix.shape[2], input_matrix.shape[1]*length], dtype=np.str)
    filled_matrix_bin, scale = dec2bin(input_matrix[0, :], length)
    for i, b in enumerate(filled_matrix_bin):
        filled_matrix_b[:, i::length] = b.transpose()
    activity = np.sum(filled_matrix_b.astype(np.float), axis=None) / np.size(filled_matrix_b)
    np.savetxt(filename, filled_matrix_b, delimiter=",", fmt='%s')
    return activity
```
The code comes from hook.py. Also, what exactly does the function dec2bin do, and what does each parameter mean?
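For intuition, a dec2bin-style routine generally quantizes decimal values to a fixed number of bits and splits them into per-bit binary planes, returning the planes plus a scale factor. A generic sketch of that idea (an illustration only, not the repo's actual dec2bin):

```python
import numpy as np

def dec2bin_sketch(x, n_bits):
    """Quantize non-negative values in [0, 1) to n_bits and return a list
    of bit planes (MSB first). A generic sketch, not NeuroSim's dec2bin."""
    x = np.asarray(x, dtype=float)
    ints = np.minimum(np.floor(x * 2**n_bits), 2**n_bits - 1).astype(int)
    planes = [(ints >> (n_bits - 1 - k)) & 1 for k in range(n_bits)]
    scale = 2.0 ** (n_bits - 1)  # integer weight of the most significant bit
    return planes, scale

planes, scale = dec2bin_sketch([0.0, 0.5, 0.875], 3)
# 0.0 -> 000, 0.5 -> 100, 0.875 -> 111 (reading one column per input value)
```

Under this reading, `length` in the hook would be the bit precision and the returned planes are what get interleaved into the CSV columns.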
Hi! May I ask why the latency used in MultilevelSenseAmp::CalculatePower is not the same as the readLatency calculated in MultilevelSenseAmp::CalculateLatency? In CalculateLatency, readLatency = LatencyCol*numColMuxed, while CalculatePower only uses a single LatencyCol. What about the energy consumed during the remaining (numColMuxed-1)*LatencyCol? I'd appreciate it a lot if someone sheds a little light on it. :)
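The arithmetic behind this question can be made concrete, assuming the sense amplifier draws the same power during every multiplexed column cycle (an assumption about the hardware, not necessarily what the simulator intends to model):

```python
# Hypothetical numbers, purely to illustrate the discrepancy being asked about
latency_col = 1e-9     # latency of sensing one column group (s)
num_col_muxed = 8      # bitlines sharing one sense amplifier
read_power = 2e-6      # sense-amp power while active (W)

energy_one_cycle = read_power * latency_col                   # one LatencyCol, as in CalculatePower
energy_all_cycles = read_power * latency_col * num_col_muxed  # all numColMuxed cycles

print(energy_all_cycles / energy_one_cycle)  # 8.0
```

If the amplifier is genuinely active for all numColMuxed cycles, energy based on a single LatencyCol undercounts by that factor; if it is power-gated between cycles, the single-cycle figure could be intentional.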
If I set args.nonlinearityLTP or args.nonlinearityLTD greater than 1.96, 'nan's keep appearing in the loss during the training phase, and an error is reported while converting the decimal data to binary data in hook.py.
First I tried adjusting the learning rate, which didn't work. Then I tried adding normalization before the conversion, but that didn't work either. I don't know where the 'nan's come from or how to fix them.
After running make, it shows no errors, but the output CSV files are not being generated anywhere. I am a beginner at this. Am I missing something? Please help. Thank you.
It seems that the energy consumed by the memory cells themselves, typically RRAM, is not taken into account when calculating dynamic energy. I'm wondering whether this influences the accuracy of the energy calculation.
Hi, I'm using your DNN_NeuroSim 2.1 for my research. Could you kindly tell me whether the "writeLatency of Weight Update" includes both the accumulation latency of the weight gradients and the latency of writing them to the PEs?
It may sound stupid, but I found plenty of code that appears after a function returns. For example, in ProcessingUnit.cpp, lines 700-712:
```cpp
vector<vector<double> > CopySubArray(const vector<vector<double> > &orginal, int positionRow, int positionCol, int numRow, int numCol) {
    vector<vector<double> > copy;
    for (int i=0; i<numRow; i++) {
        vector<double> copyRow;
        for (int j=0; j<numCol; j++) {
            copyRow.push_back(orginal[positionRow+i][positionCol+j]);
        }
        copy.push_back(copyRow);
        copyRow.clear();
    }
    return copy;
    copy.clear();
}
```
It seems that code after return (copy.clear() in this example) is useless. Does such code do any job? Why add such code?
I'm so confused. Can anyone explain this to me? Thank you so much.
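For what it's worth, a statement after an unconditional return is simply unreachable and has no effect (in the C++ example, the local vector is destroyed on return anyway). A minimal Python sketch of the same pattern:

```python
def copy_with_dead_code(data):
    result = list(data)
    return result
    result.clear()  # unreachable: execution never gets past the return above

print(copy_with_dead_code([1, 2, 3]))  # [1, 2, 3]
```

Most compilers and linters flag such statements as dead code; leaving them in is harmless but serves no purpose.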
Does DNN_NeuroSim_V2.1 support ResNet training? I saw from the user manual that it should be supported, but how should the network structure be defined in NetWork.csv? Could you add an example to the user manual showing how to define ResNet in NetWork.csv, and explain how the residual structure of the network is reflected?
May I ask why uniform random noise is added at the end of the non-ideal weight-gradient calculation, in wage_quantizer.py lines 74-75?
It seems that the gradient calculated in line 73 is already quantized to the desired resolution. Why is the process of adding uniform random noise and re-quantizing the gradient necessary?
Can someone please explain the reason for this? Thank you.
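One common reason for this pattern (offered here as a plausible explanation, not a statement of the authors' intent): adding uniform noise of one quantization step before rounding implements stochastic rounding, so small gradients are rounded up with probability proportional to their magnitude instead of being deterministically truncated to zero, which keeps the update unbiased in expectation. A sketch of the idea, not the exact wage_quantizer.py code:

```python
import numpy as np

def stochastic_round(x, step, rng):
    """Round x to multiples of `step`, rounding up with probability
    proportional to the fractional remainder (stochastic rounding)."""
    noise = rng.uniform(-0.5, 0.5, size=np.shape(x))
    return np.round(np.asarray(x) / step + noise) * step

rng = np.random.default_rng(0)
# A gradient of 0.3 with step 1.0 deterministically rounds to 0, but
# stochastically rounds to 1.0 about 30% of the time:
samples = stochastic_round(np.full(100_000, 0.3), 1.0, rng)
print(samples.mean())  # close to 0.3 in expectation
```

Deterministic rounding of the same input would give a mean of exactly 0, losing the gradient signal entirely.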
Newer torch versions do not come with the @weak_script_method decorator, which is used in modules/quantization_cpu_np_infer.py.
Is there a workaround?
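One common workaround (sketched under the assumption that the decorator is only needed as a marker, which is how older torch releases treated it) is to fall back to a no-op decorator when torch no longer exports it:

```python
try:
    from torch._jit_internal import weak_script_method  # present in old torch versions
except ImportError:
    def weak_script_method(fn):
        # No-op fallback for newer torch releases that removed the decorator
        return fn

@weak_script_method
def forward_stub(x):  # hypothetical method, stands in for the repo's forward()
    return x * 2

print(forward_stub(21))  # 42
```

Placing this try/except at the top of quantization_cpu_np_infer.py (replacing the failing import) should let the module load on both old and new torch versions.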
Could you please provide a conda environment list, so we can run the code?
Hello,
In ProcessingUnit.cpp, the area of the ADCs per subarray is calculated using the equation below:

```cpp
areaResults.push_back(subArray->areaADC*(numSubArrayRow*numSubArrayCol));
```

However, if numColMuxed is set to a value greater than 1 (e.g. set to 8, meaning 8 bitlines share 1 ADC), the total area of the ADCs should be divided by numColMuxed, as below:

```cpp
areaResults.push_back(subArray->areaADC*(numSubArrayRow*numSubArrayCol/numColMuxed));
```

Am I missing some pieces?
Thanks
/T
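The arithmetic behind the suggestion, assuming areaADC represents the area of one ADC per bitline before mux sharing is applied (an assumption; the simulator may already fold the mux factor into areaADC elsewhere):

```python
# Hypothetical numbers, purely to illustrate the proposed scaling
area_per_adc = 50.0   # um^2 per ADC, illustrative
num_cols = 128        # bitlines per subarray
num_col_muxed = 8     # bitlines sharing one ADC

adcs_per_subarray = num_cols // num_col_muxed  # 16 ADCs instead of 128
adc_area = adcs_per_subarray * area_per_adc
print(adcs_per_subarray, adc_area)  # 16 800.0
```

If areaADC is instead already the per-subarray total after sharing, the original expression would be correct and no division is needed.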