
Comments (16)

shivamsaboo17 commented on May 3, 2024

@karanchahal @williamFalcon I ported the pure Python code to Cython and got significant speedups. My experiments use a 3x64x64 input tensor and a 256x3x3x3 filter bank:

| Weight sparsity | Pure Python | Cython-optimized |
| --- | --- | --- |
| 50% | 45 s | 13 ms |
| 90% | 11 s | 5 ms |
| 100% | 60 ms | 661 µs |

For reference: PyTorch's conv2d took 1.9 ms on my machine (CPU); the previous results were measured on Colab (CPU).

Google Drive links to the .pyx and .ipynb files:
https://drive.google.com/open?id=1gnrbFNWJBZbyPH6KKnCLmrPBNqOFtKUD
https://drive.google.com/open?id=1--_B89H4iSZuJuj9QKqBRrB5Tlr7DMnH

Link to compiled C file:
https://drive.google.com/open?id=1nCGKRmM4AGcmepEJCkWAl_SBZc2l-rrA

I am now looking at more ways to optimize the Cython code.
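
For anyone who wants to reproduce this, the .pyx can be compiled with a standard setup script along these lines (the filename `sparse_conv.pyx` is an assumption; the actual sources are in the Drive links above):

```python
# setup.py -- minimal build script; "sparse_conv.pyx" is an assumed name,
# the actual sources are in the Drive links above.
from setuptools import setup
from Cython.Build import cythonize

setup(ext_modules=cythonize("sparse_conv.pyx", language_level=3))
```

Then `python setup.py build_ext --inplace` produces the importable extension module.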


williamFalcon commented on May 3, 2024

@shivamsaboo17 @karanchahal https://gitter.im/PyTorch-Lightning/community?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge


williamFalcon commented on May 3, 2024

Super excited about this feature!


karanchahal commented on May 3, 2024


shivamsaboo17 commented on May 3, 2024

Great! Will start reading these papers.


shivamsaboo17 commented on May 3, 2024

The paper actually mentions using the CSR format, since row slicing is very fast. I'm not sure the COO format would be as efficient, but we can try. Converting from COO to CSR should be possible with small computational overhead (though I'm not sure how).
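
For what it's worth, scipy can do that conversion directly; a minimal sketch (shapes and values are made up for illustration):

```python
import numpy as np
from scipy.sparse import coo_matrix

# Build a small weight matrix in COO format (illustrative values).
rows = np.array([0, 1, 2])
cols = np.array([2, 0, 1])
vals = np.array([1.0, 2.0, 3.0])
w_coo = coo_matrix((vals, (rows, cols)), shape=(3, 3))

# Conversion is a single call and roughly O(nnz), so the overhead is small.
w_csr = w_coo.tocsr()
print(w_csr[1])  # fast row slicing is now available
```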


williamFalcon commented on May 3, 2024

@karanchahal this sounds great. Let's add both, and we can switch to the official PyTorch version when it's ready!

The first as a Trainer option: `Trainer(quantize_bits=4)`

The second as a method called on the Module after training:

`trainer.fit(model)`

`model.quantize(bits=8)`

@karanchahal submit a PR and we can walk through the implementation!
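
As a rough idea of what such a `model.quantize(bits=8)` hook could wrap, here is a minimal sketch using PyTorch's dynamic post-training quantization (the free-standing `quantize` helper is hypothetical; `torch.quantization.quantize_dynamic` is a real API):

```python
import torch
import torch.nn as nn

def quantize(model: nn.Module, bits: int = 8) -> nn.Module:
    # Dynamic post-training quantization: weights are stored as int8 and
    # activations are quantized on the fly at inference time.
    assert bits == 8, "quantize_dynamic targets int8"
    return torch.quantization.quantize_dynamic(
        model, {nn.Linear}, dtype=torch.qint8
    )

model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
qmodel = quantize(model, bits=8)
```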


shivamsaboo17 commented on May 3, 2024

@karanchahal can you please check the link you provided for the pruning notebook? I think it's the same link as the quantization notebook.
Also, regarding the implementation of neural network pruning: I found that masking the weights we want to prune is very simple to implement, but if we keep the weight tensors in the same dense datatype as before, we still have to do the entire matrix multiplication. Multiplications by zero may be cheap, but this is really inefficient when you prune 90% of the weights and still perform the full matrix multiplication. Are you familiar with a way to handle sparse weights more efficiently in PyTorch, or some other way to restructure the network based on the pruned weights (assuming unstructured pruning)?
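
For concreteness, a minimal sketch of the masking approach just described (the `MaskedLinear` name and magnitude-threshold choice are my own illustration); note the dense matmul still runs in full:

```python
import torch
import torch.nn as nn

class MaskedLinear(nn.Linear):
    """Linear layer whose pruned weights are zeroed by a fixed boolean mask."""

    def __init__(self, in_features, out_features, sparsity=0.9):
        super().__init__(in_features, out_features)
        # Keep the largest-magnitude weights, mask out the rest.
        k = max(1, int(sparsity * self.weight.numel()))
        threshold = self.weight.abs().flatten().kthvalue(k).values
        self.register_buffer("mask", self.weight.abs() > threshold)

    def forward(self, x):
        # The weight stays dense, so this is still a full matmul.
        return nn.functional.linear(x, self.weight * self.mask, self.bias)

layer = MaskedLinear(64, 32, sparsity=0.9)
out = layer(torch.randn(8, 64))
```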


karanchahal commented on May 3, 2024


shivamsaboo17 commented on May 3, 2024

Thanks for the reply! I too was unaware of so many of the challenges of working with sparse tensors.
But I am really interested in implementing custom layers in PyTorch for inference only (writing just the forward pass, perhaps using the torch.sparse API) once we have all the boolean masks. Would you be interested in collaborating on implementing such layers? Perhaps we can start specifically with linear layers and then extend to other layer types.
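
A minimal sketch of what such an inference-only layer might look like with the torch.sparse API (the `SparseLinear` name and structure are assumptions, not a settled design):

```python
import torch
import torch.nn as nn

class SparseLinear(nn.Module):
    """Inference-only linear layer holding its pruned weight as a sparse tensor."""

    def __init__(self, dense_weight, mask, bias=None):
        super().__init__()
        # Store only the surviving weights in sparse COO format.
        self.weight = (dense_weight * mask).to_sparse()
        self.bias = bias

    @torch.no_grad()
    def forward(self, x):
        # torch.sparse.mm(sparse, dense): (out, in) @ (in, batch) -> (out, batch)
        y = torch.sparse.mm(self.weight, x.t()).t()
        return y if self.bias is None else y + self.bias

w = torch.randn(32, 64)
mask = w.abs() > 1.0            # keep roughly the top third of the weights
layer = SparseLinear(w, mask, bias=torch.zeros(32))
out = layer(torch.randn(8, 64))
```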


shivamsaboo17 commented on May 3, 2024

I read through the ICLR '17 paper and implemented their algorithm in Python (link to colab). It is not the most efficient implementation, since I used Python loops, but the key takeaway is that it gets faster as weight sparsity increases, whereas PyTorch's conv2d takes almost the same time at every sparsity level (even with all-zero weights). I will try to implement the algorithm using PyTorch's C++ extension functionality (I haven't worked with it before), but first I need to figure out how to use CSR sparse matrices in PyTorch (currently I am using scipy).
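
For reference, the core idea can be sketched as a CSR weight matrix multiplied against im2col patches; this is a simplification (the paper itself avoids im2col and iterates directly over the nonzeros), using scipy since PyTorch has no CSR type here:

```python
import numpy as np
from scipy.sparse import csr_matrix

def sparse_conv2d(x, w, k=3):
    """x: (C, H, W) input; w: (F, C, k, k) mostly-zero filters; stride 1, no pad."""
    c, h, wd = x.shape
    f = w.shape[0]
    oh, ow = h - k + 1, wd - k + 1
    # im2col: gather every k x k patch into a column of shape (C*k*k,).
    cols = np.empty((c * k * k, oh * ow), dtype=x.dtype)
    for i in range(oh):
        for j in range(ow):
            cols[:, i * ow + j] = x[:, i:i + k, j:j + k].ravel()
    # Flatten the filters to (F, C*k*k) and keep only the nonzeros in CSR;
    # the multiply then skips pruned weights entirely.
    w_csr = csr_matrix(w.reshape(f, -1))
    return np.asarray(w_csr @ cols).reshape(f, oh, ow)

x = np.random.randn(3, 64, 64).astype(np.float32)
w = np.random.randn(256, 3, 3, 3).astype(np.float32)
w[np.abs(w) < 1.5] = 0.0        # prune roughly 87% of the weights
out = sparse_conv2d(x, w)       # (256, 62, 62)
```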
If you have any suggestions please let me know!


karanchahal commented on May 3, 2024


shivamsaboo17 commented on May 3, 2024

I used the numba jit decorator for the sparse convolution function and it runs on the CPU (implemented using scipy sparse arrays). I expected it to compile the Python loops down to C speed, but when I use nopython=True to compile the entire function I get an error, because numba cannot recognize scipy's sparse matrix format and treats it as a regular Python object.
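
One workaround I believe applies here is to unpack the CSR matrix into its three raw numpy arrays, which numba does understand, before entering the jitted function:

```python
import numpy as np
from numba import njit
from scipy.sparse import random as sparse_random

@njit  # nopython mode compiles: only numpy arrays cross the boundary
def csr_row_dot(data, indices, indptr, row, x):
    # Dot product of one CSR row with a dense vector x.
    acc = 0.0
    for p in range(indptr[row], indptr[row + 1]):
        acc += data[p] * x[indices[p]]
    return acc

w = sparse_random(256, 27, density=0.1, format="csr")
x = np.random.randn(27)
val = csr_row_dot(w.data, w.indices, w.indptr, 0, x)
```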

I also think I should first try to make the implementation work with Cython and Numba before attempting a C++ implementation.

Regarding PyTorch's conv: I think it uses im2col, but I'm not sure. I also believe that if we can implement this paper's algorithm using torch's built-in functions and/or optimize the loops, we can get a faster layer.

Will try out a few things this weekend and let you know if I get any improvements.


karanchahal commented on May 3, 2024


williamFalcon commented on May 3, 2024

@sidhanthholalkere @karanchahal I spoke with @soumith about this. I think this is better added to core PyTorch. Check out this issue.

Once it's merged and live there we can do whatever we need to do to support it.

Closing to move this work to the PyTorch issue.


gottbrath commented on May 3, 2024

Note that we have a notebook with a preview tutorial on eager-mode post-training quantization in core PyTorch over in pytorch/pytorch#18318 ... please check it out and leave feedback.
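
For readers who want the flavor of that eager-mode workflow before opening the notebook, a minimal sketch (the module and calibration data are illustrative; the linked notebook is the authoritative reference):

```python
import torch
import torch.nn as nn

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()      # float -> int8 at entry
        self.conv = nn.Conv2d(3, 16, 3)
        self.relu = nn.ReLU()
        self.dequant = torch.quantization.DeQuantStub()  # int8 -> float at exit

    def forward(self, x):
        return self.dequant(self.relu(self.conv(self.quant(x))))

model = M().eval()
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")
prepared = torch.quantization.prepare(model)
for _ in range(8):                   # calibrate observers with sample batches
    prepared(torch.randn(1, 3, 32, 32))
quantized = torch.quantization.convert(prepared)
```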
