Comments (2)
I did some more research on this and wrote a test script to find out what your code is doing:
import math
import numpy as np
import pandas as pd
import tensorflow as tf
sorted_data_labels = [((1, ), 2 ) for _ in range(5000)]
processed_dataset = tf.data.Dataset.from_generator(lambda: sorted_data_labels, output_types=(tf.int32, tf.int32))
BATCH_SIZE = 32
batched_dataset = processed_dataset.padded_batch(BATCH_SIZE, padded_shapes=((None, ), ()))
TOTAL_BATCHES = math.ceil(len(sorted_data_labels) / BATCH_SIZE)
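# Note: x // x is always 1 for a positive integer, so this is a count of one batch, not a ratio.
TEST_BATCHES = TOTAL_BATCHES // TOTAL_BATCHES
# Note: shuffle() returns a new dataset; the result is discarded here, so this line has no effect.
batched_dataset.shuffle(TOTAL_BATCHES)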
test_data = batched_dataset.take(TEST_BATCHES)
train_data = batched_dataset.skip(TEST_BATCHES)
# print("list(train_data): {0}".format(list(train_data)))
# print("list(test_data): {0}".format(list(test_data)))
print("len(list(train_data)): {0}".format(len(list(train_data))))
print("len(list(test_data)): {0}".format(len(list(test_data))))
The output is:
len(list(train_data)): 156
len(list(test_data)): 1
So, as suspected, TEST_BATCHES evaluates to 1 (which is not a ratio / percentage but an absolute number), so exactly one batch is allocated to test_data, and all others to train_data. test_data is not being used in the code. So I guess you don't want to skip this one batch, as you have already split the dataset into train and test before. Is that correct?
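The arithmetic behind this result can be checked without TensorFlow. The following is only a minimal sketch: `split_batch_counts` and `test_fraction` are hypothetical names that do not appear in the notebook, and the ratio-based variant is an assumption about what a percentage split might look like, not the authors' intended behavior.

```python
import math

NUM_SAMPLES = 5000
BATCH_SIZE = 32

# Number of batches produced by batching 5000 samples (the last batch is partial).
total_batches = math.ceil(NUM_SAMPLES / BATCH_SIZE)

# Any positive integer floor-divided by itself is 1, so this is an
# absolute count of one batch, not a ratio.
test_batches = total_batches // total_batches
train_batches = total_batches - test_batches


def split_batch_counts(total, test_fraction):
    """Hypothetical helper: return (train, test) batch counts for a ratio split."""
    if not 0.0 <= test_fraction <= 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    test = int(total * test_fraction)  # round down to whole batches
    return total - test, test


print(total_batches, test_batches, train_batches)
print(split_batch_counts(total_batches, 0.1))
print(split_batch_counts(total_batches, 0.0))
```

With `test_fraction=0.0` the helper assigns every batch to training, which corresponds to not skipping any batch at all.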
Best regards and thanks
Elias Hohl
from bert-based-tag-recommendation.
Thank you for deep-diving into our code. Unfortunately, the main developer is currently unavailable and I have to make temporary assumptions. This code was changed over and over during development and seems to have some issues or work-in-progress parts.
But I think you are absolutely correct. For this issue, we can allocate all TOTAL_BATCHES to train_data. I updated the notebook.
Thank you and best regards
Issa