Comments (11)

pablomazo commented on June 20, 2024

That was the solution, thank you.

I managed to install it and it seems to be working. The GPU is recognized by PyTorch, and I was able to install functorch and pass a simple test.
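For anyone wanting a similar smoke test, here is a minimal sketch (not my exact test; note that in recent PyTorch releases the same vmap API also ships as torch.func.vmap):

```python
import torch

# functorch's vmap; recent PyTorch ships the same API as torch.func.vmap.
try:
    from functorch import vmap
except ImportError:
    from torch.func import vmap

# Check that PyTorch sees the GPU.
print("CUDA available:", torch.cuda.is_available())

# Smoke test: vmap a per-sample dot product over a batch and
# compare against the manually batched computation.
x = torch.randn(8, 3)
batched_dot = vmap(torch.dot)
assert torch.allclose(batched_dot(x, x), (x * x).sum(dim=1))
print("vmap smoke test passed")
```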

In case it is useful for someone else, the only change with respect to the PyTorch documentation for installing from source is to change

conda install -c pytorch magma-cuda110

to

conda install -c pytorch magma-cuda101

from functorch.

zou3519 commented on June 20, 2024

Thanks for the report, @pablomazo. I think we need to bump the nightly version mentioned in the installation instructions. I was able to install functorch using the following:

Step 1: Install the latest PyTorch nightly binary (pick one)

# For CUDA 10.2
pip install --pre torch -f https://download.pytorch.org/whl/nightly/cu102/torch_nightly.html
# For CUDA 11.1
pip install --pre torch -f https://download.pytorch.org/whl/nightly/cu111/torch_nightly.html
# For CPU-only build
pip install --pre torch -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html

Step 2: Install functorch:

pip install --user "git+https://github.com/zou3519/functorch.git"

Could you let us know if that resolves your problem?

pablomazo commented on June 20, 2024

It worked nicely, thank you very much.

May I also ask one more thing? Sadly, the machine I'm using has CUDA 10.1 installed, and installing functorch for that version does not seem to work. I install PyTorch through:

pip install --pre torch -f https://download.pytorch.org/whl/nightly/cu101/torch_nightly.html

but functorch is not able to build against this version. The error log is quite large, but it ends with RuntimeError: Error compiling objects for extension.
I don't really think this is a big issue (it's just me having an old CUDA version), but I'm reporting it in case it is not the expected behavior.

zou3519 commented on June 20, 2024

@pablomazo Would it be possible for you to send us the full log? That would be helpful for debugging.

An easy way to get the log is to pipe all output to a single file:
pip install --user "git+https://github.com/zou3519/functorch.git" > build_log.txt 2>&1

pablomazo commented on June 20, 2024

This is the log file I get:

build_log.txt

zou3519 commented on June 20, 2024

build_log.txt

Can you run the following script and paste its output? https://github.com/pytorch/pytorch/blob/master/torch/utils/collect_env.py

My guess right now is that you have a local version of PyTorch installed that is interfering with the build. One way to resolve this is:

# Repeat until you're certain that there is no PyTorch left
pip uninstall torch
pip uninstall torch
pip uninstall torch

And then following the install instructions again:

pip install --pre torch -f https://download.pytorch.org/whl/nightly/cu101/torch_nightly.html
pip install --user "git+https://github.com/zou3519/functorch.git"

pablomazo commented on June 20, 2024

This is what I get running https://github.com/pytorch/pytorch/blob/master/torch/utils/collect_env.py

/srv/hdd/pablom93/miniconda3/envs/functorch/lib/python3.8/site-packages/torch/package/_mock_zipreader.py:17: UserWarning: Failed to initialize NumPy: numpy.core.multiarray failed to import (Triggered internally at  /pytorch/torch/csrc/utils/tensor_numpy.cpp:66.)
  _dtype_to_storage = {data_type(0).dtype: data_type for data_type in _storages}
Collecting environment information...
PyTorch version: 1.9.0.dev20210415+cu101
Is debug build: False
CUDA used to build PyTorch: 10.1
ROCM used to build PyTorch: N/A

OS: Debian GNU/Linux 10 (buster) (x86_64)
GCC version: (Debian 8.3.0-6) 8.3.0
Clang version: 7.0.1-8+deb10u2 (tags/RELEASE_701/final)
CMake version: version 3.13.4

Python version: 3.8 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: GPU 0: GeForce GTX 1080
Nvidia driver version: 418.74
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] torch==1.9.0.dev20210415+cu101
[conda] cudatoolkit               10.1.243             h6bb024c_0  
[conda] torch                     1.9.0.dev20210415+cu101          pypi_0    pypi

I also ran the uninstall commands; at the second pip uninstall torch I already got:

WARNING: Skipping torch as it is not installed.

so I think I only had one version installed. Reinstalling both PyTorch and functorch led to the same error.

zou3519 commented on June 20, 2024

Interesting. It looks like the PyTorch CUDA 10.1 nightly binaries aren't getting updated anymore (the latest is from 4/15). There are only CUDA 10.2 and CUDA 11.1 binaries (and I think neither of those work with 10.1).

You might need to build PyTorch from source locally before installing functorch.

EDIT: I'm not sure if one can actually build PyTorch with CUDA 10.1 anymore, let me go ask around
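For completeness, a source build roughly follows the steps in the PyTorch README. This is only a sketch (the exact dependency set and whether CUDA 10.1 still builds may differ); the magma package must match the local CUDA version, e.g. magma-cuda101 for CUDA 10.1:

```shell
# Sketch of a local source build, following the PyTorch README.
# Pick the magma package matching your CUDA version (10.1 -> magma-cuda101).
conda install -c pytorch magma-cuda101
git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
python setup.py install
# Then install functorch against the local build:
pip install --user "git+https://github.com/zou3519/functorch.git"
```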

zou3519 commented on June 20, 2024

@pablomazo -- If you're willing to share, what is your use case for functorch and how did the tests with vmap go?

pablomazo commented on June 20, 2024

@zou3519 -- we finally managed to write a function that solves our problem without using vmap. Still, I am happy to show an example of what we were trying and some of the results of the tests we performed.

In particular, we had a function that receives a matrix, diagonalizes it, and after some other operations returns the eigenvalues and eigenvectors:

def eig(A):
    eigval, eigvec = torch.linalg.eigh(A)
    # Some work with eigval and eigvec
    return eigval, eigvec

What pushed us to try writing a batched version of this function were the execution times we measured for it (the code used in this test is at the end of the post):

# Batch size,  vmap/no vmap execution times (s)
1    3.7324e-04    9.7635e-05
2    4.8938e-04    1.0503e-04
3    7.3761e-04    1.1439e-04
4    5.7082e-04    1.2108e-04
5    6.7674e-04    1.3086e-04
6    7.6906e-04    1.4241e-04
7    8.6485e-04    1.5229e-04
8    9.6456e-04    1.6277e-04
9    1.0655e-03    1.7103e-04

It seems like vmap adds extra time that grows with the batch size faster than the version without vmap does.

The second test we ran was a multiplication of a batch of matrices. Here we got the following times:

# Batch size,  vmap/no vmap execution times (s)
1    6.4454e-05    1.4212e-05
2    5.5112e-05    1.3865e-05
3    5.4205e-05    1.3789e-05
4    5.4581e-05    1.4083e-05
5    5.3946e-05    1.3917e-05
6    5.4087e-05    1.3832e-05
7    5.3648e-05    1.3880e-05
8    5.4571e-05    1.4019e-05
9    5.3989e-05    1.4117e-05

I see this version of vmap is actually better than the one currently in PyTorch, since this one keeps the computation time for the matrix product constant with the batch size. Still, there is a big difference with respect to not using vmap. So I really see the potential of this for more complex functions, but ours turned out to be doable without vmap, so we went with that solution.

This is the code I used to get these results:

import time
import torch
from functorch import vmap

def eig(A):
    eigval, eigvec = torch.linalg.eigh(A)
    # Some work with eigval and eigvec
    return (eigval, eigvec)

def prod(A):
    return A @ A

def test_fn(fn):
    # Time fn on random batches of increasing batch size.
    for batch in range(1, 10):
        A = torch.rand((batch, ndim, ndim), device='cuda')

        elapsed = 0.0
        for i in range(nruns):
            t1 = time.perf_counter()
            _ = fn(A)
            t2 = time.perf_counter()

            # Skip the first `skip` runs as warm-up.
            if i >= skip:
                elapsed += t2 - t1

        print(f"{batch}  {elapsed / (nruns - skip):.4e}")
    print()

batched_eig = vmap(eig)
batched_prod = vmap(prod)

nruns = 10   # timed repetitions per batch size
skip = 1     # warm-up runs excluded from the average
ndim = 10    # matrix dimension

print("test eig...")
print("vmap version")
test_fn(batched_eig)

print("No vmap")
test_fn(eig)

print("test prod...")
print("vmap version")
test_fn(batched_prod)

print("No vmap")
test_fn(prod)

zou3519 commented on June 20, 2024

@pablomazo thank you for your feedback!

I can reproduce your numbers. For the first case, it looks like we didn't implement the batching rule for linalg.eigh. After implementing that, the performance numbers look comparable:

vmap version
1  2.3085e-04
2  3.2712e-04
3  4.3248e-04
4  5.4156e-04
5  6.4610e-04
6  7.5422e-04
7  8.5887e-04
8  9.6214e-04
9  1.0669e-03

No vmap
1  1.5184e-04
2  2.5819e-04
3  3.5842e-04
4  4.8089e-04
5  5.6922e-04
6  6.6924e-04
7  7.7815e-04
8  8.8299e-04
9  1.0003e-03
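Part of why a batching rule helps here: torch.linalg.eigh natively accepts batched (..., n, n) inputs, so the rule can hand the whole batch to the operator in one call rather than looping per sample. A small check of that native batch support (a sketch of the property relied on, not the functorch internals):

```python
import torch

# torch.linalg.eigh accepts a batch of symmetric matrices directly,
# returning batched eigenvalues and eigenvectors.
A = torch.randn(5, 4, 4)
A = A + A.transpose(-1, -2)  # symmetrize each matrix in the batch

eigval, eigvec = torch.linalg.eigh(A)   # one batched call
eigval0, _ = torch.linalg.eigh(A[0])    # per-sample call on the first matrix

assert eigval.shape == (5, 4)
assert torch.allclose(eigval[0], eigval0, atol=1e-6)
```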

For the second case, I suspect something is wrong with our matmul batching rule.

In theory, vmap should be able to give you code that is similar in performance to manually batched code (modulo some overhead that becomes negligible as the batch size increases), but we haven't built out all of the batching rules necessary to make it performant.
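For concreteness, "manually batched" for the second case means letting matmul broadcast over the leading batch dimension, which is what vmap(prod) should ideally reduce to. A minimal sketch comparing it against a per-sample loop:

```python
import torch

def prod(A):
    return A @ A

batch, ndim = 4, 3
A = torch.rand(batch, ndim, ndim)

# Manually batched: matmul broadcasts over the leading batch dimension,
# performing one batched matrix multiply.
manual = prod(A)

# Per-sample reference loop.
looped = torch.stack([A[i] @ A[i] for i in range(batch)])

assert torch.allclose(manual, looped)
```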
