Hello!
I was wondering whether this code can be used to transform a tensor, say of shape (32, 128, 128), into a smaller tensor of shape (8, 64, 64).
Basically, reduce the size of the LLM layer by layer.
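(Not an answer from the repo authors, just a generic illustration: SVD-based compression usually doesn't shrink a weight tensor's shape directly; it replaces each matrix with two thin low-rank factors. A minimal NumPy sketch of that idea, with made-up shapes matching the question, might look like this. `truncate_svd` is a hypothetical helper, not a function from this repository.)

```python
import numpy as np

def truncate_svd(W: np.ndarray, rank: int):
    """Return thin factors (A, B) such that A @ B approximates W.

    Storing A (d x rank) and B (rank x d) instead of W (d x d)
    is how SVD-style compression reduces parameter count.
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # absorb singular values into the left factor
    B = Vt[:rank, :]
    return A, B

rng = np.random.default_rng(0)
W = rng.standard_normal((128, 128))   # one (128, 128) slice of a (32, 128, 128) tensor
A, B = truncate_svd(W, rank=64)
print(A.shape, B.shape)               # (128, 64) (64, 128)
```

Applying this to every slice of a (32, 128, 128) tensor would give factor tensors of shape (32, 128, 64) and (32, 64, 128), rather than a single (8, 64, 64) tensor.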
I tried to use the provided scripts to compress LLaMA 2 with a 0.2 compression ratio. The model evaluation script reports a perplexity of 7.2 on WikiText, but the model's responses are mostly incoherent. I am getting responses like:
Instruction: tell me about you==\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ selecting\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
whereas the original model gives decent responses.
Is there any modification needed to the inference script or the tokenizer after model compression? Is there an inference script in the repository?
Firstly, I want to express my gratitude for the fascinating work you've been doing. It's been inspiring.
I've recently come across your paper where you describe the integration of SVD-LLM with GPTQ, and I'm eager to explore the implementation further.
Could you please share the code where you've integrated SVD-LLM with GPTQ as described in the paper?
Your assistance in providing access to this code would be appreciated. Thank you for your time and consideration.
Hello, I have some trouble reproducing the results on LLaMA-13B. An error occurs on line 203, in the whitening function: "scaling_matrix_inv = torch.linalg.inv(scaling_diag_matrix) torch._C._LinAlgError: linalg.inv: The diagonal element 6940 is zero, the inversion could not be completed because the input matrix is singular".
How can I solve this problem? Thanks.
Hi, thank you for your reply. But I still get the same problem as mentioned before.
Traceback (most recent call last):
File "/home/xxx/SVD-LLM/SVDLLM_new.py", line 193, in whitening
scaling_matrix_inv = torch.linalg.inv(scaling_diag_matrix)
torch._C._LinAlgError: linalg.inv: The diagonal element 6940 is zero, the inversion could not be completed because the input matrix is singular.
My Python environment is built from requirements.txt, and I run the code on two RTX 3090 GPUs.
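(A common workaround for this class of error, sketched here in NumPy rather than as a patch to the repo's `whitening` function: if the scaling matrix is singular, add a small damping term to its diagonal before inverting, or fall back to the Moore-Penrose pseudo-inverse. `safe_inv` and the `damp` value are my own hypothetical choices, not part of SVD-LLM.)

```python
import numpy as np

def safe_inv(M: np.ndarray, damp: float = 1e-6) -> np.ndarray:
    """Invert M, falling back to diagonal damping and then pinv if singular."""
    try:
        return np.linalg.inv(M)
    except np.linalg.LinAlgError:
        try:
            # Damping the diagonal makes a singular matrix invertible
            # at the cost of a small approximation error.
            return np.linalg.inv(M + damp * np.eye(M.shape[0]))
        except np.linalg.LinAlgError:
            # Last resort: Moore-Penrose pseudo-inverse.
            return np.linalg.pinv(M)

M = np.diag([1.0, 2.0, 0.0])      # singular: one zero diagonal element
M_inv = safe_inv(M)
print(np.isfinite(M_inv).all())   # True
```

The equivalent change in the traceback above would be to replace the bare `torch.linalg.inv(scaling_diag_matrix)` call with a similarly guarded version (torch has matching `torch.linalg.inv` and `torch.linalg.pinv` functions).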