Doesn't look like it's the CUDA Toolkit version AFAICT. I've built with 10.1 in a fresh conda environment without any problems. Here's the entire package list for my conda env:
# packages in environment at /home/sharvil/.miniconda2/envs/haste_cuda10.1:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
binutils_impl_linux-64 2.31.1 h6176602_1
binutils_linux-64 2.31.1 h6176602_9
ca-certificates 2020.1.1 0
certifi 2020.4.5.1 py37_0
cudatoolkit-dev 10.1.243 h516909a_3 conda-forge
gcc_impl_linux-64 7.3.0 habb00fd_1
gcc_linux-64 7.3.0 h553295d_9
gxx_impl_linux-64 7.3.0 hdf63c60_1
gxx_linux-64 7.3.0 h553295d_9
libedit 3.1.20181209 hc058e9b_0
libffi 3.2.1 hd88cf55_4
libgcc-ng 9.1.0 hdf63c60_0
libstdcxx-ng 9.1.0 hdf63c60_0
ncurses 6.2 he6710b0_1
openssl 1.0.2u h7b6447c_0
pip 20.0.2 py37_3
python 3.7.0 h6e4f718_3
python_abi 3.7 1_cp37m conda-forge
readline 7.0 h7b6447c_5
setuptools 46.4.0 py37_0
sqlite 3.31.1 h62c20be_1
tk 8.6.8 hbc83047_0
wheel 0.34.2 py37_0
xz 5.2.5 h7b6447c_0
zlib 1.2.11 h7b6447c_3
These packages came from running:
conda create -n haste_cuda10.1
conda activate haste_cuda10.1
conda install python==3.7
conda install -c conda-forge cudatoolkit-dev
conda install gxx_linux-64
I then built haste by setting the include path to conda's include directory in the Makefile. I've also built successfully against CUDA Toolkit 10.0.
What's the host compiler and version that you're using? I'm using gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0.
First of all, thanks for your help (and for the package, which, based on the description, is awesome).
1.a I ran your list of commands (to create haste_cuda10.1, etc.) and the result for me is the same. I pointed the -I flag at the anaconda include directory and the same error appears (actually, my anaconda include directory does not contain cublas_v2.h at all, so the compiler apparently finds the headers somewhere else).
My gcc / g++ is gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36),
but since it is nvcc that fails, not gcc, I don't think the gcc version is relevant (does nvcc use gcc under the hood?).
nvcc reports the version: Cuda compilation tools, release 10.1, V10.1.243
1.b WORKAROUND: I managed to compile haste by amending blas.h to directly declare the needed functions and call the appropriate H/S/D methods of cuBLAS (see attachment).
blas.txt
Maybe it is a good idea to replace blas.h with my version? I might not be the only one running into this mysterious error; the code is not very elegant, but it compiles.
2. I ran into another issue: once I compiled everything using the 1.b workaround, importing haste_tf in TensorFlow results in:
NotFoundError: /home/amur/anaconda2/envs/tf-2-gpu-py3p6/lib/python3.6/site-packages/haste_tf/libhaste_tf.so: undefined symbol: _ZN10tensorflow12OpDefBuilder4AttrESs
This is very weird, as both the compilation and the notebook where I try to import it show exactly the same TF configuration options:
compiling with:
g++ -std=c++11 -c frameworks/tf/lstm.cc -o frameworks/tf/lstm.o -I/usr/include/eigen3 -I/usr/local/cuda-10.1/include -Ilib -O3 -I/home/amur/anaconda2/envs/tf-2-gpu-py3p6/lib/python3.6/site-packages/tensorflow_core/include -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC
[... other files compiled with same flags ...]
linking with:
g++ -shared frameworks/tf/*.o libhaste.a -o frameworks/tf/libhaste_tf.so -L/usr/local/cuda-10.1/lib64 -L. -lcudart -lcublas -L/home/amur/anaconda2/envs/tf-2-gpu-py3p6/lib/python3.6/site-packages/tensorflow_core -l:libtensorflow_framework.so.2 -fPIC
And in the notebook where importing haste_tf fails:
tf.sysconfig.get_compile_flags()
['-I/home/amur/anaconda2/envs/tf-2-gpu-py3p6/lib/python3.6/site-packages/tensorflow_core/include',
'-D_GLIBCXX_USE_CXX11_ABI=1']
tf.sysconfig.get_link_flags()
['-L/home/amur/anaconda2/envs/tf-2-gpu-py3p6/lib/python3.6/site-packages/tensorflow_core',
'-l:libtensorflow_framework.so.2']
So everything matches, but still - unresolved symbols. Any ideas?
UPDATE:
Problem solved; it now imports without error. Apparently the GCC 4.8 I was using ignores the -D_GLIBCXX_USE_CXX11_ABI=1 flag, which is needed.
(a) installing gcc_linux-64 / gxx_linux-64 in my environment,
(b) amending the Makefile to use x86_64-conda_cos6-linux-gnu-g++ as $CXX, and
(c) adding -L /lib64 to $LOCAL_LDFLAGS (otherwise it failed to find -lcublas)
solved the issue.
PS: Please note that it STILL does not compile with the original code; I had to use the 1.b workaround of rewriting blas.h, as mentioned, to make it compile on my system.
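A minimal Python sketch of a pre-build sanity check for this kind of mismatch. The helper names (`gcc_supports_cxx11_abi`, `check_abi_flags`) are hypothetical and not part of Haste or TensorFlow; it assumes you pass in the host GCC version string and the flag list that `tf.sysconfig.get_compile_flags()` returns. The check rests on the fact that libstdc++'s dual ABI, which `-D_GLIBCXX_USE_CXX11_ABI=1` selects, first shipped in GCC 5.1, so GCC 4.8 silently ignores the macro. The undefined symbol above shows the same thing: its trailing `Ss` is the Itanium mangling of the old-ABI `std::string`, whereas a library built with the new ABI would reference `std::__cxx11::basic_string`.

```python
import re

CXX11_ABI_FLAG = "-D_GLIBCXX_USE_CXX11_ABI=1"


def gcc_supports_cxx11_abi(version: str) -> bool:
    """True if this GCC understands _GLIBCXX_USE_CXX11_ABI.

    The libstdc++ dual ABI arrived in GCC 5.1, the first GCC 5 release,
    so any major version >= 5 qualifies; GCC 4.x ignores the macro.
    """
    match = re.match(r"\s*(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized GCC version: {version!r}")
    return int(match.group(1)) >= 5


def check_abi_flags(gcc_version: str, compile_flags: list) -> list:
    """Return a list of problems (empty if none) for a TF custom-op build."""
    problems = []
    if CXX11_ABI_FLAG in compile_flags and not gcc_supports_cxx11_abi(gcc_version):
        problems.append(
            f"TensorFlow was built with {CXX11_ABI_FLAG}, but GCC {gcc_version} "
            "ignores it; use GCC >= 5 (e.g. gxx_linux-64 from conda)"
        )
    return problems


if __name__ == "__main__":
    # Flags as reported by tf.sysconfig.get_compile_flags() in this thread.
    tf_flags = [
        "-I/home/amur/anaconda2/envs/tf-2-gpu-py3p6/lib/python3.6/site-packages/tensorflow_core/include",
        CXX11_ABI_FLAG,
    ]
    print(check_abi_flags("4.8.5", tf_flags))  # one problem reported
    print(check_abi_flags("7.3.0", tf_flags))  # []
```

Passing a major-only version such as `7` (which `g++ -dumpversion` prints on some distributions) also works, since only the major number is inspected.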
Thanks for the update. I'm glad you have it running.
nvcc splits source code into CUDA kernels and non-CUDA code, and then delegates compilation of the non-CUDA code to the host compiler. My guess is that gcc 4.8.5 is rejecting the program due to a bug in the compiler. The code is valid C++11, so you might need to update gcc (you can use a newer gcc from conda).
You shouldn't need to rewrite the Makefile to set x86_64-conda_cos6-linux-gnu-g++ as $CXX. If you deactivate and reactivate your conda environment after installing gxx_linux-64, conda will set $CXX correctly and the provided Makefile will use it automatically.
I'm surprised that cuBLAS is installed in /lib64. That's a non-standard (and quite privileged) directory for a user library.
Well, it looks like nvcc doesn't honor the $CXX environment variable and invokes gcc blindly. In your case, it's probably picking up the system gcc even though you've installed a newer gcc from conda, so you're running into the gcc 4.8.5 compilation bug.
I've updated the Makefile to force nvcc to use gcc from $CXX. I'd appreciate it if you could pull the latest code and try compiling without your changes to blas.h.
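nvcc chooses its host compiler through the `-ccbin` option (long form `--compiler-bindir`). A minimal Python sketch of passing `$CXX` through explicitly; `nvcc_compile_command` and the `kernel.cu` file name are hypothetical, this is not the actual change made to Haste's Makefile, and falling back to `g++` when `$CXX` is unset is an assumption.

```python
import os
import shlex


def nvcc_compile_command(source, output, extra_flags=(), env=None):
    """Build an nvcc command line that pins the host compiler to $CXX.

    Hypothetical helper: as observed in this thread, nvcc does not pick up
    $CXX by itself, so the compiler is passed explicitly via -ccbin.
    """
    env = os.environ if env is None else env
    host_cxx = env.get("CXX", "g++")  # assumption: fall back to g++ if $CXX is unset
    return ["nvcc", "-ccbin", host_cxx, "-c", source, "-o", output, *extra_flags]


if __name__ == "__main__":
    cmd = nvcc_compile_command(
        "kernel.cu",
        "kernel.o",
        extra_flags=["-std=c++11", "-Xcompiler", "-fPIC"],
        env={"CXX": "x86_64-conda_cos6-linux-gnu-g++"},
    )
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

In a Makefile the equivalent is typically to add `-ccbin $(CXX)` to the nvcc invocation; once the conda environment has been reactivated, `$CXX` points at conda's compiler rather than the system GCC 4.8.5.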
Thanks, I will check your fixes with the original blas.h tomorrow, as it is already quite late here.
On a separate note, not a bug (most likely just a lack of TF knowledge on my part): any ideas how to wrap haste.LSTM in a Keras layer so I can reuse other elements of the model that are native to Keras? A naive wrapper did not work, as the Keras wrapper class does not pick up the trainable_variables created by haste.LSTM as trainable_weights.
If there is no easy solution, I will just write my own Keras version of the TF Haste classes; it should not be too difficult.
All of the RNN layers in Haste inherit from tf.Module, so you do have access to the trainable_variables, variables and submodules properties. The variables are typically created on demand as part of your first call into the layer. If you need the variables to be defined before first use, you can call build(shape).
For example:
import haste_tf as haste
import tensorflow as tf
N, T, in_channels, out_channels = 5, 10, 15, 20
x = tf.random.normal([N, T, in_channels])
lstm = haste.LSTM(out_channels)
lstm.build(x.shape)
# rest of your code
If you know the shape but don't have an input tensor handy, you would do something like this:
import haste_tf as haste
import tensorflow as tf
N, T, in_channels, out_channels = 5, 10, 15, 20
lstm = haste.LSTM(out_channels)
lstm.build(tf.TensorShape([N, T, in_channels]))
# rest of your code
Hope that helps.
Hello!
(a) I have checked: your corrections to the Makefile solved the compile issue.
(b) I am still struggling to make the TF model learn anything. Do you have any example TF code that successfully fits something? I am still running experiments, so I can't report a particular bug yet, but I am curious whether you have any working TF code you can share as an example.
I am sure something is wrong with the code: I replaced the LSTM with your implementation and training basically broke down; the bias weight of the LSTM is stuck at exactly zero.
Have you compared training on any toy examples in TF? (For example, Keras has multiple LSTM toy examples; have you tried replacing their LSTM with your implementation?)
If not, I can try that and report back, but I don't want to duplicate work, so if you already have a working training example, I would appreciate it if you could share it!
Thanks for checking the build issue.
The validation directory has code that verifies the activations and gradients produced by Haste are the same as those produced by the official TensorFlow and PyTorch implementations. Also, all of our internal models now use Haste, and we were able to simply replace native RNN layers with Haste RNN layers without any issues.
What you're observing may be an issue with how Keras and Haste interoperate. Haste doesn't do anything special to support Keras, so maybe some additional work is required. We don't use Keras internally so we wouldn't have observed what you're observing.
If you can point me at a simple Keras example where swapping out their RNNs with Haste doesn't work, I can dig into it.
Below is an example of my non-working interoperability with Keras.
I had to use raw TF gradient logic to demonstrate it, but please see the 'additional issue' below (after the code).
import numpy as np
import haste_tf as haste
import tensorflow as tf
from tensorflow.python.keras import layers as L
from tensorflow.python.keras import backend as K
embedding_size = 100  # n_channels
lstm_nunits = 200
ntimestamps = 300
batch_size = 16
class HasteLSTM(tf.keras.layers.Layer):
    def __init__(self, num_units, dropout, zoneout, shape):
        super(HasteLSTM, self).__init__()
        self.haste_lstm = haste.LSTM(num_units=num_units, dropout=dropout, zoneout=zoneout, direction='unidirectional')
        self.haste_lstm.build(shape)
    def call(self, inputs, training):
        return self.haste_lstm(inputs, training=training)
haste_lstm = HasteLSTM(lstm_nunits, 0.00, 0.00, [batch_size, ntimestamps, embedding_size])
# not really a CuDNN LSTM but a plain Keras LSTM, so the number of parameters matches
cudnn_lstm = L.LSTM(lstm_nunits, return_sequences=True, unit_forget_bias=False)
dummy_input = tf.random.normal([batch_size, ntimestamps, embedding_size])
dummy_target = np.zeros(shape=(batch_size, ntimestamps, lstm_nunits))
for i in range(dummy_target.shape[0]):
    for j in range(dummy_target.shape[1]):
        dummy_target[i, j, np.random.randint(0, lstm_nunits)] = 1  # a one in a random position for each timestep
input_ = L.Input(shape=[ntimestamps, embedding_size])
model_ = haste_lstm(input_, training=True)
if isinstance(model_, tuple): model_ = model_[0]  # take only the output, not the states
model_ = K.softmax(model_)  # simple classification task
model_haste = tf.keras.Model(inputs=input_, outputs=model_, name='haste_model')
input_ = L.Input(shape=[ntimestamps, embedding_size])
model_ = cudnn_lstm(input_, training=True)
if isinstance(model_, tuple): model_ = model_[0]  # take only the output, not the states
model_ = K.softmax(model_)  # simple classification task
model_cudnn = tf.keras.Model(inputs=input_, outputs=model_, name='cudnn_model')
# zero-initialize all trainable variables so both models start from identical weights
total_trainable = 0
haste_trainable = []
for w in haste_lstm.haste_lstm.trainable_variables:
    K.set_value(w, np.zeros_like(w.numpy()))
    haste_trainable.append(w)
    total_trainable += w.numpy().flatten().shape[0]
print("HASTE has total %d trainable variables!" % total_trainable)
total_trainable = 0
cudnn_trainable = []
for w in cudnn_lstm.trainable_weights:
    K.set_value(w, np.zeros_like(w.numpy()))
    cudnn_trainable.append(w)
    total_trainable += w.numpy().flatten().shape[0]
print("CuDNN has total %d trainable variables!" % total_trainable)
# check HASTE gradients on the dummy example
with tf.GradientTape() as tape:
    prediction = model_haste(dummy_input, training=True)
    loss = tf.keras.losses.categorical_crossentropy(dummy_target, prediction)
gradients = tape.gradient(loss, haste_trainable)
print("HASTE maxabs of each grad:")
for grad in gradients:
    print(np.max(np.abs(grad)))
print("Non-HASTE maxabs of each grad:")
# check CuDNN (actually plain LSTM) gradients on the dummy example
with tf.GradientTape() as tape:
    prediction = model_cudnn(dummy_input, training=True)
    loss = tf.keras.losses.categorical_crossentropy(dummy_target, prediction)
gradients = tape.gradient(loss, cudnn_trainable)
for grad in gradients:
    print(np.max(np.abs(grad)))
The additional issue is that even if the gradients are correct and you write your own Keras layer (as I did with HasteLSTM), you can't add self.haste_lstm.trainable_variables to self.trainable_weights: it is read-only, and weights are supposed to get there automatically when you create them, but with my wrapping of Haste they don't. This means you can't really use the Keras fitting API, as it only computes gradients w.r.t. the weights in model.trainable_weights, and, as you can check yourself, model_haste.trainable_weights is empty.
Of course, raw-TF-style fitting with tf.GradientTape() should work, since you can manually ask TF to compute gradients w.r.t. all necessary parameters; it is just not very convenient (and, as described above, even that currently does not work, as the gradients appear to be wrong, or I am doing something wrong).
Okay, so there are 2 separate issues that we're discussing here. First, you're seeing a discrepancy between gradients produced by Haste and gradients produced by TensorFlow. Second, Haste RNN layers can't be used as-is with the Keras fitting API.
For the first issue, I see correct output from Haste when I run the script you provided:
HASTE has total 240800 trainable variables!
CuDNN has total 240800 trainable variables!
HASTE maxabs of each grad:
7.5397606
5.6501727
Non-HASTE maxabs of each grad:
5.650175
0.0
7.539993
Note that the printout isn't in the same order but the values are approximately the same (there will always be a small difference between any two implementations that aren't identical due to non-associativity and limited precision of floating point operations).
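A small Python illustration of both points. Floating-point addition is not associative, so summing the same terms in a different order (as two different kernels may do) can change the last bits. The sketch then compares the max-abs gradients printed above order-insensitively with a relative tolerance. The helper `grads_match` and the 1e-4 tolerance are arbitrary choices for illustration, not what Haste's validation scripts use, and the plain LSTM's extra `0.0` entry is left out so the two lists line up.

```python
import math


def grads_match(a, b, rel_tol=1e-4):
    """Compare two collections of gradient statistics, ignoring print order."""
    if len(a) != len(b):
        return False
    return all(math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(sorted(a), sorted(b)))


# Floating-point addition is not associative: the grouping changes the rounding.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left, right, left == right)  # 0.6000000000000001 0.6 False

# Max-abs gradients from the run above (nonzero entries only).
haste_grads = [7.5397606, 5.6501727]
reference_grads = [5.650175, 7.539993]
print(grads_match(haste_grads, reference_grads))  # True
print(grads_match(haste_grads, reference_grads, rel_tol=1e-9))  # False
```

By the same check, a run in which every Haste gradient is exactly 0.0 would not match the reference values.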
As for the second issue, yes, I understand the problem. Unfortunately, it looks like the Keras API in TensorFlow (tf.keras.layers.Layer) is incompatible with other parts of the TensorFlow API (tf.Module). This is not surprising; TensorFlow has a history of being inconsistent and incompatible with itself. I'll need some time to come up with a good solution here.
Interesting! I get different output from this code (note that the absolute gradient values are a bit different, but that's due to the random dummy inputs and targets):
HASTE has total 240800 trainable variables!
CuDNN has total 240800 trainable variables!
HASTE maxabs of each grad:
0.0
0.0
Non-HASTE maxabs of each grad:
6.444291
0.0
7.539972
The full code, run after restarting the kernel, is:
TF is 2.1.0, and the GPU is quite old but above compute capability 6.0: I have a K80 for experiments here. I can try on a V100, for example; which GPUs do you use for tests?
Hmm, that's funny. I'm using TensorFlow 1.14 with eager execution enabled (Python 3.6.9). Which versions are you running?
Ah, we posted at the same time. I'm using 2080Ti and 1080Ti locally. We can also experiment with K80 and T4 on Colab if needed.
Could you test it on a K80 with Colab, while I test on a 2080Ti / Titan RTX 8000 / V100?
These are the configurations I've tried so far:
TF 1.14 - 1080Ti
TF 1.14 - 2080Ti
TF 2.0 - 2080Ti
TF 2.2 - P100
All of these configurations produce correct output. I wasn't able to get a K80 from Colab; looks like they're providing P100s now. Here's a link to the Colab notebook I used.
I will test on 2080Ti over the weekend and report back.
Thanks for your help though!
If the 2080Ti works while the K80 doesn't: K80s are still available on Amazon's cloud. I know the K80 is quite an old chip, and we use it purely for experiments since we have a huge surplus of them, but I feel like people still use K80s a lot (for the same reason: there are lots of them), so if it is a K80 issue, it is probably worth fixing.
Confirmed that it is a K80 issue; see the newly opened issue. I am closing this one, as your amendments to the Makefile solved the original installation issue. Thank you!