
Comments (13)

juckeltour commented on September 4, 2024

Yeah, but I would like to use non-English embeddings from ELMoForManyLangs and need to be able to retrain them with custom training data. tfhub is not an option, I think.

from elmoformanylangs.

juckeltour commented on September 4, 2024

Thanks for answering! I think we can use the Lambda layer or create a custom one, but I don't know how to handle the tensors that Keras passes in and return a data structure that Keras accepts.

import tensorflow as tf
from elmoformanylangs import Embedder
from keras.layers import Lambda

e = Embedder('...')
sess = tf.Session()

def ElmoEmbedding(x):
    with sess.as_default():
        # x is a symbolic tensor; x.eval() has no value at graph-construction time
        return tf.convert_to_tensor(e.sents2elmo(x.eval())[0])  # this does not work

...
embedding = Lambda(ElmoEmbedding, output_shape=(None, max_len, 1024))(input_text)


bazzmx commented on September 4, 2024

So, I have a workaround, but it is somewhat impractical: obtain the vocabulary of your dataset, then create an embedding file similar to a word2vec or GloVe file (i.e. one word and its 1024-dim weights per line), and then implement a custom weight embedding layer. It worked well for Spanish.


nhatsmrt commented on September 4, 2024

You can create a Keras Sequence in which you apply the embedder to the input sequences, and then use model.fit_generator/predict_generator.
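A minimal sketch of that idea, without a hard Keras dependency: the class below just mimics the `keras.utils.Sequence` interface (`__len__` and `__getitem__` returning one batch), and `embed_fn` is a stand-in for elmoformanylangs' `Embedder.sents2elmo`. In real use you would subclass `keras.utils.Sequence` and pass the real embedder.

```python
import math

class ElmoSequence:
    """Batch generator mimicking the keras.utils.Sequence interface.

    embed_fn stands in for elmoformanylangs' Embedder.sents2elmo,
    so embeddings are computed lazily, one batch at a time.
    """
    def __init__(self, sentences, labels, embed_fn, batch_size=2):
        self.sentences = sentences
        self.labels = labels
        self.embed_fn = embed_fn
        self.batch_size = batch_size

    def __len__(self):
        # number of batches per epoch
        return math.ceil(len(self.sentences) / self.batch_size)

    def __getitem__(self, idx):
        start = idx * self.batch_size
        batch = self.sentences[start:start + self.batch_size]
        # embed only the current batch, so nothing is precomputed
        x = self.embed_fn(batch)
        y = self.labels[start:start + self.batch_size]
        return x, y

# toy embedder: one 4-dim vector per token (word length repeated)
fake_embed = lambda sents: [[[len(w)] * 4 for w in s] for s in sents]
seq = ElmoSequence([["ein", "satz"], ["noch", "einer"], ["kurz"]],
                   [0, 1, 0], fake_embed, batch_size=2)
```

With a real `keras.utils.Sequence` subclass, `model.fit_generator(seq)` would pull batches through `__getitem__` and the embedder would only ever see one batch of raw sentences at a time.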


muximus3 commented on September 4, 2024

How about `hub.Module("https://tfhub.dev/google/elmo/2", trainable=True)`?


Rusiecki commented on September 4, 2024

Hey juckeltour, did you solve the problem?

I found a tutorial that comes close to a task I'm trying to solve. Let me know if you find a solution.

Maybe this will help you out:

http://hunterheidenreich.com/blog/elmo-word-vectors-in-keras/


marinkreso95 commented on September 4, 2024

Hi @juckeltour, have you found a way to use ELMoForManyLangs embeddings with Keras?


[deleted] commented on September 4, 2024

can anyone find any solution?


ai-nlp commented on September 4, 2024

> can anyone find any solution?

Hi guys, anyone found a solution yet?


juckeltour commented on September 4, 2024

No, I didn't solve this problem.

We switched to BERT (and PyTorch)...


erk4n commented on September 4, 2024

@bazzmx do you have a code snippet?


bazzmx commented on September 4, 2024

This is based on this blog post

First, generate a list of the unique words in your vocabulary and obtain their corresponding ELMo embeddings with elmoformanylangs' sents2elmo, then save them to a file that contains one word and its 1024-dim weights per line.
In this example the file is emb_table.txt, and the words that compose my vocabulary are the set of lemmas and words that will be used during training and testing.
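A small sketch of the file-writing step. The vectors here are placeholders (4 dims instead of 1024, hard-coded values) standing in for per-word vectors you would actually obtain from `Embedder.sents2elmo`; only the file format (one word followed by its weights per line, word2vec-style) matters.

```python
# assumed precomputed per-word vectors; in practice these would come
# from elmoformanylangs' Embedder.sents2elmo (1024 dims for ELMo;
# 4 dims here to keep the example small)
word_vectors = {
    "casa": [0.1, 0.2, 0.3, 0.4],
    "perro": [0.5, 0.6, 0.7, 0.8],
}

with open("emb_table.txt", "w", encoding="utf8") as f:
    for word, vec in word_vectors.items():
        # one word and its weights per line, space-separated
        f.write(word + " " + " ".join(str(v) for v in vec) + "\n")
```

The loading code further down then splits each line back into a word and a float32 array.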

vocab = list(set(list(words_train) + list(lemmas_train)))  # unique words in your vocabulary
index_labels = dict(enumerate(sorted(vocab), 1))  # index -> word, starting at 1
labels_index = {word: i for i, word in index_labels.items()}  # reversed dict: word -> index

With these dictionaries you then create the index-to-embedding matrix. Since you are now working with indexes, remember to convert all your words (strings) to these indexes (int32).

embeddings_index = {}
with open("./emb_table.txt", encoding="utf8") as f:  # generated elmo embeddings file
    for line in f:
        values = line.split()
        word = values[0]  # first field is the word
        coefs = np.asarray(values[1:], dtype='float32')  # the rest is the embedding
        embeddings_index[word] = coefs

Now we generate an embedding matrix putting all the pieces together:

EMBEDDING_DIM = 1024 # elmo's default size
embedding_matrix = np.zeros((len(vocab) + 1, EMBEDDING_DIM)) # (n+1) x 1024 matrix; row 0 stays zero for padding
for word, i in labels_index.items():
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:
        embedding_matrix[i] = embedding_vector

With all these elements you then add an input layer and an embedding layer to your model. The input will be int32 because you have to convert each word to its index; the embedding layer then assigns the corresponding weights to each index:

input_words = Input(shape=(max_len,), dtype="int32", name="input_words")
EMB = Embedding(len(vocab) + 1,
                EMBEDDING_DIM,
                weights=[embedding_matrix],
                input_length=max_len,
                trainable=False, name="embedding")

emb_text = EMB(input_words)

Your input words should be arrays of indexes that you can convert back to words using the dictionaries created above.
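That conversion can be sketched like this. The `labels_index` dict here is a toy stand-in for the one built above (word to row index in the embedding matrix, starting at 1), and `to_indexes` is a hypothetical helper name; index 0 is used for padding and unknown words, matching the zero row of the embedding matrix.

```python
# toy word -> index dict of the kind built earlier; 0 is reserved
# for padding/unknown words (the all-zeros embedding row)
labels_index = {"el": 1, "perro": 2, "ladra": 3}
max_len = 5

def to_indexes(sentence, labels_index, max_len):
    # map each word to its matrix row, 0 for unknown, then pad to max_len
    ids = [labels_index.get(w, 0) for w in sentence][:max_len]
    return ids + [0] * (max_len - len(ids))
```

Applying `to_indexes(["el", "perro", "ladra"], labels_index, max_len)` yields a fixed-length int array suitable as input to the `input_words` layer.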

This is an impractical workaround, but it does the job for now. The key limitation is that you don't have access to the embeddings in the same way that you would using tfhub.

I tried the Lambda layer approach, but I ended up getting errors related to tensors, map_fn, etc.


DuyguA commented on September 4, 2024

Has anyone been able to do text classification in Keras successfully with sentence vectors?

