
urialon avatar urialon commented on September 26, 2024

Hi @vijayantajain ,
Thank you for your interest in code2seq!

I hypothesize that it's just an issue of the first iteration - the model is not trained enough to produce good results, so sometimes the empty hypothesis is the best.
We can manually prevent the model from predicting end-of-sequence at the first step.
But before that, can you please try disabling Rouge (by always returning a dummy score instead of computing it), so that your model can keep training for additional epochs? Then we can see whether the problem resolves itself in later training iterations.

from code2seq.

vijayantajain avatar vijayantajain commented on September 26, 2024

Hi @urialon,

While going through the logs from previous runs, I came across a very similar problem. Here is the log for that run.

Finished 1 epochs
Done testing, epoch reached
Evaluation time: 0h0m3s
Accuracy after 1 epochs: 0.00000
After 1 epochs: Precision: 0.99527, recall: 0.68522, F1: 0.81165
Rouge:  {'rouge-1': {'f': 0.012882575672040321, 'p': 0.010102335345053785, 'r': 0.02283571109591501}, 'rouge-2': {'f': 0.0002206531098050254, 'p': 0.00026968716289104636, 'r': 0.00018670649738610905}, 'rouge-l': {'f': 0.03772484042499432, 'p': 0.1407766990291262, 'r': 0.022670375460831883}}
Saved after 1 epochs in: models/DEBUG/funcom-modified-test-3-code2seq-large-test-2-ignore-empty-true/model_iter1
Finished 1 epochs
Done testing, epoch reached
Evaluation time: 0h0m1s
Accuracy after 2 epochs: 0.00000
After 2 epochs: Precision: 0.86817, recall: 0.67134, F1: 0.75717
Rouge:  {'rouge-1': {'f': 0.08981957268154325, 'p': 0.10148185998978035, 'r': 0.10669313639812034}, 'rouge-2': {'f': 0.00018318371095680936, 'p': 0.00025549310168625444, 'r': 0.00014277555682467162}, 'rouge-l': {'f': 0.12804680996155146, 'p': 0.7378640776699029, 'r': 0.07233086263065558}}
Finished 1 epochs
Done testing, epoch reached
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1365, in _do_call
    return fn(*args)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1350, in _run_fn
    target_list, run_metadata)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1443, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.OutOfRangeError: 2 root error(s) found.
  (0) Out of range: End of sequence
	 [[{{node IteratorGetNext}}]]
  (1) Out of range: End of sequence
	 [[{{node IteratorGetNext}}]]
	 [[model/gradients/model/embedding_lookup_grad/Size/_33]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/vijayantajain/code/experiments/code2seq/model.py", line 96, in train
    _, batch_loss = self.sess.run([optimizer, train_loss])
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 956, in run
    run_metadata_ptr)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1180, in _run
    feed_dict_tensor, options, run_metadata)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1359, in _do_run
    run_metadata)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/client/session.py", line 1384, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: 2 root error(s) found.
  (0) Out of range: End of sequence
	 [[node IteratorGetNext (defined at /opt/conda/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
  (1) Out of range: End of sequence
	 [[node IteratorGetNext (defined at /opt/conda/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
	 [[model/gradients/model/embedding_lookup_grad/Size/_33]]
0 successful operations.
0 derived errors ignored.

Original stack trace for 'IteratorGetNext':
  File "code2seq.py", line 39, in <module>
    model.train()
  File "/home/vijayantajain/code/experiments/code2seq/model.py", line 77, in train
    config=self.config)
  File "/home/vijayantajain/code/experiments/code2seq/reader.py", line 43, in __init__
    self.output_tensors = self.compute_output()
  File "/home/vijayantajain/code/experiments/code2seq/reader.py", line 192, in compute_output
    return self.iterator.get_next()
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/data/ops/iterator_ops.py", line 426, in get_next
    name=name)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/ops/gen_dataset_ops.py", line 2518, in iterator_get_next
    output_shapes=output_shapes, name=name)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
    op_def=op_def)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
    attrs, op_def, compute_device)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
    op_def=op_def)
  File "/opt/conda/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
    self._traceback = tf_stack.extract_stack()


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "code2seq.py", line 39, in <module>
    model.train()
  File "/home/vijayantajain/code/experiments/code2seq/model.py", line 108, in train
    results, precision, recall, f1, rouge = self.evaluate()
  File "/home/vijayantajain/code/experiments/code2seq/model.py", line 230, in evaluate
    hyp_path=predicted_file_name, ref_path=ref_file_name, avg=True, ignore_empty=True)
  File "/opt/conda/lib/python3.7/site-packages/rouge/rouge.py", line 47, in get_scores
    ignore_empty=ignore_empty)
  File "/opt/conda/lib/python3.7/site-packages/rouge/rouge.py", line 98, in get_scores
    hyps, refs = zip(*hyps_and_refs)
ValueError: not enough values to unpack (expected 2, got 0)

In these iterations, pred.txt is completely empty.

I tried commenting out rouge as you suggested, and I still get the same result. During training, the batch loss is actually nan.

vijayantajain avatar vijayantajain commented on September 26, 2024

Hi,

Sorry for the confusion. The previous stack trace is not from after I commented out rouge; it is from one of the other experiments I tried, and it shows a slightly different error from the first stack trace.

Following is the stack trace after I commented out the rouge part.

Average loss at batch 100: nan, 	throughput: 175 samples/sec
Average loss at batch 200: nan, 	throughput: 206 samples/sec
Average loss at batch 300: nan, 	throughput: 206 samples/sec
Average loss at batch 400: nan, 	throughput: 206 samples/sec
Average loss at batch 500: nan, 	throughput: 206 samples/sec
Average loss at batch 600: nan, 	throughput: 206 samples/sec
Average loss at batch 700: nan, 	throughput: 206 samples/sec
Average loss at batch 800: nan, 	throughput: 206 samples/sec
Average loss at batch 900: nan, 	throughput: 206 samples/sec
Average loss at batch 1000: nan, 	throughput: 205 samples/sec
Average loss at batch 1100: nan, 	throughput: 206 samples/sec
...
Prediction throughput: 525
Prediction throughput: 540
Prediction throughput: 545
Prediction throughput: 547
Prediction throughput: 548
Prediction throughput: 548
Prediction throughput: 549
Prediction throughput: 549
Prediction throughput: 549
Prediction throughput: 550
Prediction throughput: 547
Prediction throughput: 548
Prediction throughput: 548
Prediction throughput: 548
Prediction throughput: 549
Prediction throughput: 549
Prediction throughput: 549
Prediction throughput: 549
Prediction throughput: 549
Prediction throughput: 549
Prediction throughput: 550
Prediction throughput: 549
Prediction throughput: 550
Prediction throughput: 550
Prediction throughput: 550
Prediction throughput: 550
Prediction throughput: 550
Prediction throughput: 550
Prediction throughput: 550
Prediction throughput: 550
Prediction throughput: 550
Prediction throughput: 550
Prediction throughput: 550
Done testing, epoch reached
Evaluation time: 0h6m35s
Accuracy after 1 epochs: 0.00000
After 1 epochs: Precision: 0.00000, recall: 0.00000, F1: 0.00000
Rouge:  1.0
Average loss at batch 13600: nan, 	throughput: 27 samples/sec
Average loss at batch 13700: nan, 	throughput: 207 samples/sec
Average loss at batch 13800: nan, 	throughput: 207 samples/sec
Average loss at batch 13900: nan, 	throughput: 206 samples/sec
Average loss at batch 14000: nan, 	throughput: 206 samples/sec
Average loss at batch 14100: nan, 	throughput: 206 samples/sec
Average loss at batch 14200: nan, 	throughput: 206 samples/sec
Average loss at batch 14300: nan, 	throughput: 205 samples/sec
Average loss at batch 14400: nan, 	throughput: 205 samples/sec
Average loss at batch 14500: nan, 	throughput: 206 samples/sec
Average loss at batch 14600: nan, 	throughput: 205 samples/sec
Average loss at batch 14700: nan, 	throughput: 205 samples/sec
Average loss at batch 14800: nan, 	throughput: 205 samples/sec
Average loss at batch 14900: nan, 	throughput: 205 samples/sec
Average loss at batch 15000: nan, 	throughput: 205 samples/sec
Average loss at batch 15100: nan, 	throughput: 205 samples/sec
...

This goes on for 3 epochs and then training stops (I set config.PATIENCE to 3). I checked the pred.txt file in the models directory, and it is empty.

urialon avatar urialon commented on September 26, 2024

Where did you comment out rouge?
Rouge should only run after the first training iteration, at test time,
so the training loss should not be affected by this change (rouge happens later).

vijayantajain avatar vijayantajain commented on September 26, 2024

I commented those lines out in model.py as follows:

        elapsed = int(time.time() - eval_start_time)
        precision, recall, f1 = self.calculate_results(true_positive, false_positive, false_negative)
#       files_rouge = FilesRouge()
#       rouge = files_rouge.get_scores(
#           hyp_path=predicted_file_name, ref_path=ref_file_name, avg=True, ignore_empty=True)
        print("Evaluation time: %sh%sm%ss" % ((elapsed // 60 // 60), (elapsed // 60) % 60, elapsed % 60))
        return num_correct_predictions / total_predictions, \
               precision, recall, f1, 1.0

urialon avatar urialon commented on September 26, 2024

Are you sure that:

  1. This is the only change
    and
  2. Without this change, the loss is not nan?

Because this code has nothing to do with the loss; it is only reached after training.

vijayantajain avatar vijayantajain commented on September 26, 2024

You are right, it's not rouge, it's the dataset. Originally I had punctuation in the dataset, but after removing it, I no longer get the error. Any idea why that might be happening?

urialon avatar urialon commented on September 26, 2024

Oh, I suspect it's the comma (",") symbol.
Since we use a comma to separate the path from the tokens of each "context" (see the format here), a token that contains commas breaks the input.

Can you keep the punctuation that you need, but remove all commas?
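
If it helps, that cleanup could be sketched like this (a minimal illustration, not code from the repo; the function names are mine):

```python
# Hypothetical sketch: strip commas from tokens before they are joined
# into a .c2s line, since a context is written as "token,path,token"
# and a comma inside a token would shift the field boundaries.
def sanitize_token(token: str) -> str:
    """Remove commas from a token; other punctuation is kept."""
    return token.replace(",", "")

def sanitize_tokens(tokens):
    """Sanitize a list of tokens, dropping any that become empty."""
    return [sanitize_token(t) for t in tokens if sanitize_token(t)]
```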

vijayantajain avatar vijayantajain commented on September 26, 2024

Makes sense! I forgot about the format of the input. I will need to try this.

vijayantajain avatar vijayantajain commented on September 26, 2024

I tried with only periods and no other punctuation, and it didn't work; I am still getting nan as the loss. Training runs fine only when there is no punctuation at all.

urialon avatar urialon commented on September 26, 2024

Can you try to find a single example that produces the nan loss?
It shouldn't be too hard, because it seems that you are getting a nan loss starting from the first batch.

If you create a dataset consisting of only the first line of your data (e.g., cat my_data.train.c2s | head -1 > my_data_one_line.train.c2s), do you still get a nan?

vijayantajain avatar vijayantajain commented on September 26, 2024

No, with one example the loss was not nan; training worked well, and I did not even have any trouble with rouge. The model only predicted a sequence of periods, but it sort of worked.

vijayantajain avatar vijayantajain commented on September 26, 2024

OK, so the plan is to find the individual sample(s) that might be causing the error. Sure. Regarding removing the pipe symbols, you mean removing them from the target part of the data, right? If so, should I then separate the tokens with spaces?

urialon avatar urialon commented on September 26, 2024

Hi,
Regarding the pipe symbol: I meant removing it from the "tokens", if it appears there.
But I guess that is very rare; it would only happen inside strings, such as:

String s = "hello | world";

If you find a single example that causes nan and paste it here, I will probably be able to see what's wrong with it.
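
As a side note, a quick sanity check over a .c2s file could look like the sketch below, assuming the space-separated layout with comma-separated token,path,token contexts described above (the function is illustrative, not part of the code2seq codebase):

```python
def find_malformed_contexts(line: str):
    """Return the contexts in a .c2s line that do not contain exactly
    two commas, i.e. do not split cleanly into token,path,token.

    Assumes the line layout discussed in this thread: a target label
    followed by space-separated contexts.
    """
    contexts = line.strip().split(" ")[1:]  # first field is the target label
    return [c for c in contexts if c.count(",") != 2]
```

Running it over every line of the training file would flag examples where a stray comma in a token has shifted the field boundaries.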

urialon avatar urialon commented on September 26, 2024

Closing due to inactivity, feel free to re-open.

vijayantajain avatar vijayantajain commented on September 26, 2024

Hi @urialon,

Sorry for the delayed response, I got caught up with other things.

I tried to find individual offending samples, but it turns out the problem only shows up at larger scales.

Here's what I tried:

  1. I checked with the first 1000 lines (0-1000): no error.
  2. Then I doubled the number of lines and checked lines 0-2048. This caused the error, and pred.txt was empty. So I halved the range to find the sample causing the error.
  3. I tried lines 1000-2048: no error. This was confusing, because neither lines 0-1000 nor lines 1000-2048 give an error on their own, yet lines 0-2048 do.
  4. Just to check, I then tried lines 1000-1501: no error.
  5. Finally, lines 1501-2048: no error.

My guess is that it is not an individual sample causing this error, but something about the dataset as a whole.
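
For reference, the slicing I used for this bisection can be sketched as a small Python helper, under the assumption that the data is one example per line (function and file names are placeholders):

```python
def write_line_range(src_path, dst_path, start, stop):
    """Copy the half-open line range [start, stop) (0-based) from
    src_path to dst_path, so a candidate slice of the training data
    can be retrained in isolation.
    """
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for i, line in enumerate(src):
            if i >= stop:
                break  # no need to read past the requested range
            if i >= start:
                dst.write(line)
```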

urialon avatar urialon commented on September 26, 2024

Let's try this:

  1. Keep a copy of the original training data of lines 0-2048, and verify again that it causes a nan loss.
  2. Use the following commands to remove hidden characters from the data (the data right before running the script preprocess.py):
sed -i 's/\x0//g' my_file.txt
sed -i 's/\x0b//g' my_file.txt
sed -i 's/\x1d//g' my_file.txt
sed -i 's/\x1db//g' my_file.txt
tr -cd "[:print:]\n" < my_file.txt > my_file_new.txt

Then check whether the new file my_file_new.txt differs from the old file my_file.txt, using diff or md5sum.
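
If a Python alternative is easier to work with than the shell pipeline, something like the following sketch reports which lines contain characters outside printable ASCII (roughly what the tr command above strips; the path and function name are placeholders):

```python
def lines_with_nonprintable(path):
    """Yield (line_number, chars) for lines containing characters
    outside the printable ASCII range (ord 32-126), so the offending
    examples can be inspected before re-running preprocessing.
    """
    with open(path, encoding="utf-8", errors="replace") as f:
        for lineno, line in enumerate(f, start=1):
            bad = sorted({ch for ch in line.rstrip("\n")
                          if not 32 <= ord(ch) <= 126})
            if bad:
                yield lineno, bad
```

Note that, like the tr command, this also flags legitimate non-ASCII text (e.g. the Japanese comments above), so it is a way to locate suspect lines rather than an automatic filter.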

vijayantajain avatar vijayantajain commented on September 26, 2024

So I ran those commands on the training data with 2k lines and found some lines with non-alphanumeric characters; these were also present in the original data. Here are some examples:

Original train.c2s file:
trueをセットするとjs|engine.receive||がsent-byのポートで待ち受け、|falseにするとjs|engine.receive||がlocalポートで待ち受ける動作モードとなる。|.|true|sent-by|false|local|. void,Void0|Mth|Nm1,METHOD_NAME void,Void0|Mth|Prm|VDID0,b ...

Modified:
truejs|engine.receive||sent-by|falsejs|engine.receive||local|.|true|sent-by|false|local|. void,Void0|Mth|Nm1,METHOD_NAME void,Void0|Mth|Prm|VDID0,b void,Void0|Mth|Prm|Prim1,boolean METHOD_NAME,Nm1|Mth|Prm|VDID0,b ...

Original:
scroll��|off|image��|�׷�|���ƾ�|�ѵ�|. void,Void0|Mth|Nm1,METHOD_NAME void,Void0|Mth|Prm|VDID0,e ...

Modified:
scroll|off|image||||. void,Void0|Mth|Nm1,METHOD_NAME void,Void0|Mth|Prm|VDID0,e ...

Original:
este|m�todo|inicializa|o|txt|modelos|disponiveis|.|return|javax.swing.jtext|field|. j|text|field,Cls0|Mth|Nm1,METHOD_NAME j|text|field,Cls0|Mth|Bk|If|Eq|Nm0 ...

Modified:
este|mtodo|inicializa|o|txt|modelos|disponiveis|.|return|javax.swing.jtext|field|. j|text|field,Cls0|Mth|Nm1,METHOD_NAME j|text|field,Cls0|Mth|Bk|If|Eq|Nm0 ...

However, it did not work: I trained again with the modified data and am still getting a nan loss.

Maybe I can send you the data so you can get a better idea of what is going on?

vijayantajain avatar vijayantajain commented on September 26, 2024

Hi @urialon,

I removed all the redundant pipes (|| or |||) from the dataset, but I am still getting nan when training with the cleaned dataset.
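
For what it's worth, the pipe cleanup I applied was roughly the following (a sketch of the transformation, not code from the repo):

```python
import re

def collapse_pipes(target: str) -> str:
    """Collapse runs of two or more '|' separators (e.g. '||', '|||')
    in a target sequence into a single one, and trim any leading or
    trailing separators left behind by empty tokens.
    """
    collapsed = re.sub(r"\|{2,}", "|", target)
    return collapsed.strip("|")
```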
