
Comments (20)

sharvil commented on May 29, 2024

Doesn't look like it's the CUDA Toolkit version AFAICT. I've built with 10.1 in a fresh conda environment without any problems. Here's the entire package list for my conda env:

# packages in environment at /home/sharvil/.miniconda2/envs/haste_cuda10.1:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main
binutils_impl_linux-64    2.31.1               h6176602_1
binutils_linux-64         2.31.1               h6176602_9
ca-certificates           2020.1.1                      0
certifi                   2020.4.5.1               py37_0
cudatoolkit-dev           10.1.243             h516909a_3    conda-forge
gcc_impl_linux-64         7.3.0                habb00fd_1
gcc_linux-64              7.3.0                h553295d_9
gxx_impl_linux-64         7.3.0                hdf63c60_1
gxx_linux-64              7.3.0                h553295d_9
libedit                   3.1.20181209         hc058e9b_0
libffi                    3.2.1                hd88cf55_4
libgcc-ng                 9.1.0                hdf63c60_0
libstdcxx-ng              9.1.0                hdf63c60_0
ncurses                   6.2                  he6710b0_1
openssl                   1.0.2u               h7b6447c_0
pip                       20.0.2                   py37_3
python                    3.7.0                h6e4f718_3
python_abi                3.7                     1_cp37m    conda-forge
readline                  7.0                  h7b6447c_5
setuptools                46.4.0                   py37_0
sqlite                    3.31.1               h62c20be_1
tk                        8.6.8                hbc83047_0
wheel                     0.34.2                   py37_0
xz                        5.2.5                h7b6447c_0
zlib                      1.2.11               h7b6447c_3

These packages came from running:

conda create -n haste_cuda10.1
conda activate haste_cuda10.1
conda install python==3.7
conda install -c conda-forge cudatoolkit-dev
conda install gxx_linux-64

I then built haste by setting the include path to conda's include directory in the Makefile. I've also built successfully against CUDA Toolkit 10.0.

What's the host compiler and version that you're using? I'm using gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0.

amurashov commented on May 29, 2024

First of all - thanks for your help (and the package, which - based on the description - is awesome).

1.a I have executed your list of commands (to create haste_cuda10.1 etc.) and the result for me is the same. I amended the -I flag to point at the anaconda include directory and the same error appears (actually, my anaconda include dir does not contain cublas_v2.h at all, so it seems the compiler is picking up the headers from somewhere else).

gcc / g++ is gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36)

But since it is nvcc that fails, not gcc, I think the gcc version is not relevant (does nvcc use gcc under the hood?).

NVCC shows version: Cuda compilation tools, release 10.1, V10.1.243

1.b WORKAROUND: I have managed to compile haste by amending blas.h to directly declare the needed functions and call the appropriate H/S/D cuBLAS methods (see the attached file).
blas.txt

Maybe it is a good idea to replace blas.h with my version? I suspect I am not the only one running into this mysterious error -- it is not particularly fancy or elegant code, but it compiles.

2. I ran into another issue - once I had compiled everything using the 1.b workaround, importing haste_tf in TensorFlow results in:

NotFoundError: /home/amur/anaconda2/envs/tf-2-gpu-py3p6/lib/python3.6/site-packages/haste_tf/libhaste_tf.so: undefined symbol: _ZN10tensorflow12OpDefBuilder4AttrESs

Which is very weird, as both the compilation and the notebook where I now try the import show the exact same TF configuration options:

compiling with:

g++ -std=c++11 -c frameworks/tf/lstm.cc -o frameworks/tf/lstm.o -I/usr/include/eigen3 -I/usr/local/cuda-10.1/include -Ilib -O3 -I/home/amur/anaconda2/envs/tf-2-gpu-py3p6/lib/python3.6/site-packages/tensorflow_core/include -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC

[... other files compiled with same flags ...]

linking with:

g++ -shared frameworks/tf/*.o libhaste.a -o frameworks/tf/libhaste_tf.so -L/usr/local/cuda-10.1/lib64 -L. -lcudart -lcublas -L/home/amur/anaconda2/envs/tf-2-gpu-py3p6/lib/python3.6/site-packages/tensorflow_core -l:libtensorflow_framework.so.2 -fPIC

when I try to import haste_tf and it fails:

tf.sysconfig.get_compile_flags()

['-I/home/amur/anaconda2/envs/tf-2-gpu-py3p6/lib/python3.6/site-packages/tensorflow_core/include',
'-D_GLIBCXX_USE_CXX11_ABI=1']

tf.sysconfig.get_link_flags()

['-L/home/amur/anaconda2/envs/tf-2-gpu-py3p6/lib/python3.6/site-packages/tensorflow_core',
'-l:libtensorflow_framework.so.2']

So everything matches, but still - unresolved symbols. Any ideas?

amurashov commented on May 29, 2024

UPDATE:

Problem solved - it now imports without error. Apparently the GCC 4.8 I was using ignores the -D_GLIBCXX_USE_CXX11_ABI=1 flag, and that flag is needed.
(a) installing gcc_linux-64 / gxx_linux-64 in my environment
(b) amending the Makefile to use x86_64-conda_cos6-linux-gnu-g++ as $CXX
and
(c) adding -L/lib64 to $LOCAL_LDFLAGS (otherwise it was failing to find -lcublas)

solved the issue.

PS Please note that it STILL does not compile as-is -- I had to use the 1.b workaround, rewriting blas.h as mentioned, to make it compile on my system.

sharvil commented on May 29, 2024

Thanks for the update. I'm glad you have it running.

nvcc splits source code into CUDA kernels and non-CUDA code, and then delegates compilation of non-CUDA code to the host compiler. My guess is that gcc 4.8.5 is rejecting the program due to a bug in the compiler. The code is valid C++11 so you might need to update gcc (you can use a newer gcc from conda).

You shouldn't need to rewrite the Makefile to set x86_64-conda_cos6-linux-gnu-g++ as $CXX. If you deactivate and reactivate your conda environment after installing gxx_linux-64, conda will set $CXX correctly and the provided Makefile will use it automatically.

I'm surprised that cuBLAS is installed in /lib64. That's a non-standard (and quite privileged) directory for a user library.

sharvil commented on May 29, 2024

Well, it looks like nvcc doesn't honor the $CXX environment variable and invokes gcc blindly. In your case, it's probably picking up the system gcc even though you've installed a newer gcc from conda, so you're running into the gcc 4.8.5 compilation bug.

I've updated the Makefile to force nvcc to use gcc from $CXX. I'd appreciate it if you could pull the latest code and try compiling without your changes to blas.h.

amurashov commented on May 29, 2024

Thanks, I will check your fixes tomorrow with the original blas.h, as it is already quite late here.

On a separate note - not a bug, most likely just a gap in my TF knowledge - any ideas on how to wrap haste.LSTM in a Keras layer so I can reuse other parts of the model that are native to Keras? The naive solution of creating a simple wrapper did not work, as the Keras layer wrapper does not pick up the trainable_variables created by haste.LSTM as trainable_weights.

If there is no easy solution, I will just write my own Keras versions of the TF Haste classes; that should not be too difficult.

sharvil commented on May 29, 2024

All of the RNN layers in Haste inherit from tf.Module so you do have access to the trainable_variables, variables and submodules properties. The variables are typically created on-demand as part of your first call into the layer. If you need the variables to be defined before first use, you can call build(shape).

For example:

import haste_tf as haste
import tensorflow as tf

N, T, in_channels, out_channels = 5, 10, 15, 20
x = tf.random.normal([N, T, in_channels])

lstm = haste.LSTM(out_channels)
lstm.build(x.shape)

# rest of your code

If you know the shape but don't have an input tensor handy, you would do something like this:

import haste_tf as haste
import tensorflow as tf

N, T, in_channels, out_channels = 5, 10, 15, 20

lstm = haste.LSTM(out_channels)
lstm.build(tf.TensorShape([N, T, in_channels]))

# rest of your code

Hope that helps.

amurashov commented on May 29, 2024

Hello!

(a) I have checked: your corrections to the Makefile solved the compile issue.

(b) I am still struggling to get a TF model to learn anything. Do you have any example TF code that successfully fits something? I am still running my experiments, so I can't report a specific bug yet, but I'm curious whether you have any working TF training code you could share as an example.

amurashov commented on May 29, 2024

I am sure something is wrong with the code, as I replaced the LSTM with your implementation and training basically broke down; the LSTM bias weights are stuck at exactly zero.

Have you done training comparisons on any toy examples in TF? For example, Keras has multiple LSTM toy examples - have you tried replacing their LSTM with your implementation?

If not, I can try that and report back, but I don't want to duplicate work, so if you already have a working training example I would appreciate it if you could throw it my way!

sharvil commented on May 29, 2024

Thanks for checking the build issue.

The validation directory has code that verifies the activations and gradients produced by Haste are the same as what's produced by the official TensorFlow and PyTorch implementations. Also, all of our internal models are using Haste now and we were simply able to replace native RNN layers with Haste RNN layers without any issue.

What you're observing may be an issue with how Keras and Haste interoperate. Haste doesn't do anything special to support Keras, so maybe some additional work is required. We don't use Keras internally so we wouldn't have observed what you're observing.

If you can point me at a simple Keras example where swapping out their RNNs with Haste doesn't work, I can dig into it.

amurashov commented on May 29, 2024

Below is an example of my non-working interoperability with Keras.

I had to use raw TF gradient logic to demonstrate it, but please see the 'additional issue' below (after the code).

import haste_tf as haste
import numpy as np  # used below for the dummy targets and for inspecting gradients
import tensorflow as tf
from tensorflow.python.keras import layers as L
from tensorflow.python.keras import backend as K

embedding_size = 100 #n_channels
lstm_nunits = 200
ntimestamps = 300
batch_size = 16

class HasteLSTM(tf.keras.layers.Layer):
    def __init__(self, num_units, dropout, zoneout, shape):
        super(HasteLSTM, self).__init__()
        self.haste_lstm = haste.LSTM(num_units=num_units, dropout=dropout, zoneout=zoneout, direction='unidirectional')
        self.haste_lstm.build(shape)

    def call(self, inputs, training):
        return self.haste_lstm(inputs, training=training)

haste_lstm = HasteLSTM(lstm_nunits, 0.00, 0.00, [batch_size, ntimestamps, embedding_size])

#not really a CuDNN but a normal LSTM, so number of parameters matches
cudnn_lstm = L.LSTM(lstm_nunits, return_sequences = True, unit_forget_bias = False)



dummy_input  = tf.random.normal([batch_size, ntimestamps, embedding_size])
dummy_target = np.zeros(shape=(batch_size, ntimestamps, lstm_nunits))

for i in range(dummy_target.shape[0]):
    for j in range(dummy_target.shape[1]):
        dummy_target[i,j,np.random.randint(0, lstm_nunits)] = 1 #one in random position for each timestamp


input_ = L.Input(shape = [ntimestamps, embedding_size])
model_ = haste_lstm(input_, training = True)
if isinstance(model_, tuple): model_ = model_[0] #take only output, no states
model_ = K.softmax(model_) #simple classification task

model_haste = tf.keras.Model(inputs=input_, outputs=model_, name='haste_model')

input_ = L.Input(shape = [ntimestamps, embedding_size])
model_ = cudnn_lstm(input_, training = True)
if isinstance(model_, tuple): model_ = model_[0] #take only output, no states
model_ = K.softmax(model_) #simple classification task

model_cudnn = tf.keras.Model(inputs=input_, outputs=model_, name='cudnn_model')

total_trainable = 0
haste_trainable = []
for w in haste_lstm.haste_lstm.trainable_variables:
    K.set_value(w, np.zeros_like(w.numpy()))
    haste_trainable.append(w)
    total_trainable += w.numpy().flatten().shape[0]
print("HASTE has total %d trainable variables!" % total_trainable)

total_trainable = 0
cudnn_trainable = []
for w in cudnn_lstm.trainable_weights:
    K.set_value(w, np.zeros_like(w.numpy()))
    cudnn_trainable.append(w)
    total_trainable += w.numpy().flatten().shape[0]
print("CuDNN has total %d trainable variables!" % total_trainable)


#check HASTE gradients on the dummy example
with tf.GradientTape() as tape:
    prediction = model_haste(dummy_input, training=True)
    loss = tf.keras.losses.categorical_crossentropy(dummy_target, prediction)
    
gradients = tape.gradient(loss, haste_trainable)

print("HASTE maxabs of each grad:")
for grad in gradients:
    print (np.max(np.abs(grad)))
    

print("Non-HASTE maxabs of each grad:")
#check CuDNN (actually - plain LSTM) gradients on the dummy example
with tf.GradientTape() as tape:
    prediction = model_cudnn(dummy_input, training=True)
    loss = tf.keras.losses.categorical_crossentropy(dummy_target, prediction)
    
gradients = tape.gradient(loss, cudnn_trainable)
for grad in gradients:
    print (np.max(np.abs(grad)))

The additional issue is that even if the gradients are correct and you write your own Keras layer (as I did with HasteLSTM), you can't add self.haste_lstm.trainable_variables to self.trainable_weights (it is read-only, and weights are supposed to land there automatically when you create them, but with my wrapping of Haste they don't). That means you can't really use the Keras fitting API, since it only computes gradients w.r.t. the weights in model.trainable_weights, and as you can check yourself, model_haste.trainable_weights is empty.

Of course, raw-TF-style fitting with tf.GradientTape() should work, since you can manually ask TF to compute gradients w.r.t. all the necessary parameters; it is just not very convenient (and, as described above, even that currently does not work for me, as the gradients appear to be wrong in this case too, unless I am doing something wrong).
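
For reference, here is a rough sketch of the kind of Keras-tracking workaround I have in mind. It is untested and rests on an assumption: that tf.keras's attribute-based variable tracking will register tf.Variable objects assigned as layer attributes, so they end up in model.trainable_weights. The HasteKerasLSTM class name and the _haste_var_* attribute names are purely illustrative.

import haste_tf as haste
import tensorflow as tf

class HasteKerasLSTM(tf.keras.layers.Layer):
    def __init__(self, num_units, dropout, zoneout, shape):
        super(HasteKerasLSTM, self).__init__()
        self.haste_lstm = haste.LSTM(num_units=num_units, dropout=dropout, zoneout=zoneout, direction='unidirectional')
        self.haste_lstm.build(shape)
        # Assumption: re-assigning each tf.Variable as a direct attribute lets
        # Keras's automatic variable tracking pick it up as a trainable weight.
        for i, v in enumerate(self.haste_lstm.trainable_variables):
            setattr(self, '_haste_var_%d' % i, v)

    def call(self, inputs, training):
        result = self.haste_lstm(inputs, training=training)
        return result[0] if isinstance(result, tuple) else result

If that tracking assumption doesn't hold on this TF version, the raw tf.GradientTape() route above remains the fallback.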

sharvil commented on May 29, 2024

Okay, so there are 2 separate issues that we're discussing here. First, you're seeing a discrepancy between gradients produced by Haste and gradients produced by TensorFlow. Second, Haste RNN layers can't be used as-is with the Keras fitting API.

For the first issue, I see correct output from Haste when I run the script you provided:

HASTE has total 240800 trainable variables!
CuDNN has total 240800 trainable variables!
HASTE maxabs of each grad:
7.5397606
5.6501727
Non-HASTE maxabs of each grad:
5.650175
0.0
7.539993

Note that the printout isn't in the same order but the values are approximately the same (there will always be a small difference between any two implementations that aren't identical due to non-associativity and limited precision of floating point operations).
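
To illustrate that point with a generic Python example (nothing Haste-specific, just regrouping the same float additions):

a, b, c = 0.1, 0.2, 0.3
print((a + b) + c == a + (b + c))  # False: same numbers, different grouping
print((a + b) + c, a + (b + c))    # 0.6000000000000001 vs 0.6

Two LSTM implementations that accumulate their sums in a different order will differ in the same way, just in the last few digits.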

As for the second issue, yes, I understand the problem. Unfortunately, it looks like the Keras API in TensorFlow (tf.keras.layers.Layer) is incompatible with other parts of the TensorFlow API (tf.Module). This is not surprising; TensorFlow has a history of being inconsistent and incompatible with itself. I'll need some time to come up with a good solution here.

amurashov commented on May 29, 2024

Interesting! I get different output from this code (note that the absolute gradient values differ a bit, but that's due to the random nature of the dummy inputs / targets):

HASTE has total 240800 trainable variables!
CuDNN has total 240800 trainable variables!
HASTE maxabs of each grad:
0.0
0.0
Non-HASTE maxabs of each grad:
6.444291
0.0
7.539972

amurashov commented on May 29, 2024

The full code being run after a restart of the kernel is:

haste_test.txt

TF is 2.1.0 and the GPU is quite old but above 6.0 CC --> I have a K80 here for experiments. I can try on a V100, for example; which GPUs do you use for tests?

sharvil commented on May 29, 2024

Hmm, that's funny. I'm using TensorFlow 1.14 with eager execution enabled (Python 3.6.9). Which versions are you running?

sharvil commented on May 29, 2024

Ah, we posted at the same time. I'm using 2080Ti and 1080Ti locally. We can also experiment with K80 and T4 on Colab if needed.

amurashov commented on May 29, 2024

How about you test it on a K80 with Colab, and I test on a 2080Ti / Titan RTX 8000 / V100?

sharvil commented on May 29, 2024

These are the configurations I've tried so far:
TF 1.14 - 1080Ti
TF 1.14 - 2080Ti
TF 2.0 - 2080Ti
TF 2.2 - P100

All of these configurations produce correct output. I wasn't able to get a K80 from Colab; looks like they're providing P100s now. Here's a link to the Colab notebook I used.

amurashov commented on May 29, 2024

I will test on 2080Ti over the weekend and report back.
Thanks for your help though!

If the 2080Ti works while the K80 doesn't: K80s are still available on Amazon's cloud. I know the K80 is quite an old chip, and we use it purely for experiments because we have a huge surplus of them, but I feel people still use K80s a lot (for the same reason - there are lots of them), so if it is a K80 issue it is probably worth fixing.

amurashov commented on May 29, 2024

Confirmed that it is a K80 issue; see the newly opened issue. I am closing this one, as your amendments to the Makefile solved the original installation issue. Thank you!
