Comments (6)
@mohammedayub44 ah, ok. In that case, you can export the model as described in #107 and reload it in Tensorflow 2 within your Streamlit app. Here's sample code (caveat: I haven't run this in a Streamlit app, but I have confirmed it works in Tensorflow 2):
import tensorflow as tf
from bilm import Batcher
# reload the model
loaded = tf.saved_model.load("/path/to/saved/model") # this is a directory. Don't include the file itself in the path.
infer = loaded.signatures["serving_default"]
# get the char ids for your documents
vocab_file = '/path/to/my_vocab.txt'
batcher = Batcher(vocab_file, 50)
char_ids = batcher.batch_sentences([["Hello", "world"]])
char_ids = char_ids.astype('int32') # must be cast to int32 before feeding to model
# get embeddings
embs = infer(tf.constant(char_ids))['import/concat_3:0']
Don't be alarmed if you see this message: "INFO:tensorflow:Saver not created because there are no variables in the graph to restore." This is expected.
Regarding the output size, you'll get a 3 x 1024 tensor for every token in your input. So long documents or large batches can both cause large outputs.
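For a rough sense of scale, here's a quick check you can run after the snippet above (a sketch; it assumes the output has shape (n_sentences, 3, max_tokens, 1024) in float32, which is what the default ELMo configuration produces, but verify against your own export):
embs_np = embs.numpy()               # convert the eager tensor to a numpy array
print(embs_np.shape)                 # expected: (n_sentences, 3, max_tokens, 1024)
print(embs_np.nbytes / 1e6, "MB")    # roughly 3 * 1024 * 4 bytes, i.e. ~12 KB, per token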
Issue #193 contains an explanation of the outputs you can get from this implementation. Unlike the TF Hub implementation, the bilm-tf implementation can't directly give you a weighted sum of the three output layers. You can, however, weight the three output layers yourself, for example by including a keras WeightedAverage layer in a model that's consuming the ELMo embeddings. Note that the first of the three output layers from lm_embeddings contains the character-based representations you wanted.
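For example, here's a minimal sketch of such a weighting layer. The class below is illustrative, not something shipped by bilm-tf or core Keras; it assumes the ELMo embeddings arrive with shape (batch, 3, n_tokens, 1024) and learns softmax-normalized layer weights plus a scalar gamma, similar to the weighting described in the ELMo paper:
import tensorflow as tf

class WeightedAverage(tf.keras.layers.Layer):
    def __init__(self, num_layers=3, **kwargs):
        super().__init__(**kwargs)
        self.num_layers = num_layers
    def build(self, input_shape):
        # one trainable weight per ELMo layer, plus a global scale
        self.layer_weights = self.add_weight(
            name="layer_weights", shape=(self.num_layers,),
            initializer="zeros", trainable=True)
        self.gamma = self.add_weight(
            name="gamma", shape=(), initializer="ones", trainable=True)
    def call(self, inputs):
        # inputs: (batch, num_layers, n_tokens, dim) -> (batch, n_tokens, dim)
        weights = tf.nn.softmax(self.layer_weights)
        return self.gamma * tf.einsum("l,blth->bth", weights, inputs)
A downstream model could then start with something like WeightedAverage()(tf.keras.Input(shape=(3, None, 1024))).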
Here's a code snippet for getting the three output layers. Also note that per my comment on issue #107, this code requires the model saved in Step 1, not the final model with TF serving tags. When you're ready to deploy the model in TF serving, use the model saved in Step 2.
import tensorflow as tf
from bilm import Batcher, BidirectionalLanguageModel
# load the saved model
frozen_graph = '/path/to/my_saved_model.pb'
with tf.gfile.GFile(frozen_graph, "rb") as f:
    restored_graph_def = tf.GraphDef()
    restored_graph_def.ParseFromString(f.read())
with tf.Graph().as_default() as graph:
    tf.import_graph_def(
        restored_graph_def,
        input_map=None,
        return_elements=None,
        name="")
output_node = graph.get_tensor_by_name("concat_3:0")
input_node = graph.get_tensor_by_name("Placeholder:0")
# generate character ids for your input documents
vocab_file = '/path/to/my_vocab.txt'
batcher = Batcher(vocab_file, 50)
char_ids = batcher.batch_sentences([["Hello", "world"]])
# get embeddings
sess = tf.Session(graph=graph)
my_feed_dict = {input_node: char_ids}
embs = sess.run(output_node, feed_dict=my_feed_dict)
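Tying this back to the note above about the three output layers, the array returned by sess.run can be sliced per layer (a sketch, assuming the output shape is (n_sentences, 3, n_tokens, 1024)):
char_layer = embs[:, 0, :, :]    # layer 0: the character-based, context-independent representations
lstm_layer_1 = embs[:, 1, :, :]  # layer 1: first biLSTM layer
lstm_layer_2 = embs[:, 2, :, :]  # layer 2: second (top) biLSTM layer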
Also, one more note: this model produces very large outputs. When you deploy the model in TF serving, the embeddings have to be serialized to be returned to you, and if you're then feeding them to another model, they have to be de-serialized again. Those steps are time-consuming, so it would be faster to deploy the models in native tensorflow rather than via TF serving, letting the embeddings pass directly to the downstream model as numpy arrays and skipping the serialization/deserialization entirely.
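To make that concrete, in the native setup the array from sess.run (or one of the per-layer slices above) is handed to the next model in the same Python process, for example:
predictions = downstream_model.predict(lstm_layer_2)  # downstream_model is a hypothetical keras model; no REST call or serialization involved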
@carolmanderson Great. Thanks for the detailed code snippet.
I was using Streamlit to build my prototype app. All my other word embedding models use Tensorflow 2 and are loaded natively from checkpoints. Since this repo doesn't support TF 2.0, I had to go down this route of including them as REST endpoints.
Good point about output size. I'm passing independent sentences. Does that depend on the batch size or the number of sentences? My guess is simple Python pickling should work?
@carolmanderson Thanks. Could you verify the lines to be commented out in #107? The link did not work, unfortunately.
Sorry about that. The lines are:
Lines 587 to 593 at commit 7cffee2
No problem. It works smoothly in Tensorflow 2. I guess I will skip the serving part for now, as loading natively works better for me with Streamlit.