mryimings / nucleus

Python 18.69% Shell 0.07% CSS 2.71% JavaScript 1.58% HTML 76.95%

nucleus's Issues

models/r_net/prepro.py line 93 unsafe indexing

bad operand type for unary -: NoneType (invalid-unary-operand-type)

word = ''.join(array[0:-vec_size])

The length of array is not checked before slicing, which is unsafe, and the slice bound is written as the negation of vec_size, a non-standard way to express the index.
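A minimal sketch of a length-checked version of this parsing step. The helper name parse_embedding_line and its error messages are hypothetical; only the two slicing expressions are taken from prepro.py.

```python
def parse_embedding_line(line, vec_size):
    """Split one embedding-file line into (word, vector).

    Hypothetical helper: validates vec_size and the token count
    before any slicing happens.
    """
    if not isinstance(vec_size, int) or vec_size <= 0:
        raise ValueError("vec_size must be a positive integer")
    array = line.split()
    if len(array) <= vec_size:
        raise ValueError("expected a word followed by %d floats" % vec_size)
    split_at = len(array) - vec_size  # explicit index instead of -vec_size
    word = ''.join(array[:split_at])
    vector = list(map(float, array[split_at:]))
    return word, vector
```

With this shape, a missing vec_size or a short line fails with a clear ValueError instead of a TypeError or a silently wrong slice.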

Change Data Download Directory

Right now, the R-Net model code saves training data in the ~/data directory no matter where the code lives on the system. It would probably be worth changing the code so that the data is saved within the project itself, because the current approach is unconventional and can lead to unexpected errors if another program modifies that directory.
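A minimal sketch of one way to derive the data directory from the location of the code rather than from the home directory. The helper name project_data_dir and the folder name "data" are assumptions, not names from the project.

```python
from pathlib import Path


def project_data_dir(anchor_file, name="data"):
    """Return <directory containing anchor_file>/<name> without creating it."""
    return Path(anchor_file).resolve().parent / name


# Typical use inside a module of the project (hypothetical):
#   DATA_DIR = project_data_dir(__file__)
#   DATA_DIR.mkdir(parents=True, exist_ok=True)
```

Anchoring on the module's own path keeps the downloaded files next to the code wherever the repository is cloned.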

Unused argument

'context_char_idxs' (unused-argument)
In file models/r_net/util.py after line 47

        def key_func(context_idxs, ques_idxs, context_char_idxs, ques_char_idxs, y1, y2, qa_id):
            c_len = tf.reduce_sum(
                tf.cast(tf.cast(context_idxs, tf.bool), tf.int32))
            buckets_min = [np.iinfo(np.int32).min] + buckets
            buckets_max = buckets + [np.iinfo(np.int32).max]
            conditions_c = tf.logical_and(
                tf.less(buckets_min, c_len), tf.less_equal(c_len, buckets_max))
            bucket_id = tf.reduce_min(tf.where(conditions_c))
            return bucket_id

It would be better to delete the unused argument. By the way, why didn't you use char embeddings?
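If key_func is a callback that receives every field of a dataset element positionally (an assumption about how it is called), the parameters cannot simply be removed. A hedged alternative is to keep the slots and prefix the unused ones with an underscore, which pylint's default settings treat as intentionally unused. The sketch below is a pure-Python analogue of the bucketing logic, not the TensorFlow code itself, and the BUCKETS values are hypothetical.

```python
import bisect

BUCKETS = [40, 80]  # hypothetical bucket boundaries


def key_func(context_idxs, _ques_idxs, _context_char_idxs, _ques_char_idxs,
             _y1, _y2, _qa_id):
    """Pure-Python analogue of the bucketing key: only context_idxs is used."""
    # Count non-padding token ids (ids equal to 0 are treated as padding).
    c_len = sum(1 for idx in context_idxs if idx != 0)
    # Index of the first bucket i with lower_bound < c_len <= BUCKETS[i];
    # len(BUCKETS) when c_len exceeds every boundary.
    return bisect.bisect_left(BUCKETS, c_len)
```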

Too many local variables

Pylint reports too-many-locals when a function or method has more local variables than the configured limit (15 here).

************* Module func.py
R: 8, 4: Too many local variables (16/15) (too-many-locals)
R: 29, 4: Too many local variables (19/15) (too-many-locals)
R: 55, 4: Too many local variables (16/15) (too-many-locals)
R: 77, 4: Too many local variables (19/15) (too-many-locals)
R:167, 0: Too many local variables (18/15) (too-many-locals)

************* Module main.py
R: 11, 0: Too many local variables (35/15) (too-many-locals)
R: 93, 0: Too many local variables (19/15) (too-many-locals)
R:115, 0: Too many local variables (23/15) (too-many-locals)

************* Module inference.py
R: 65, 4: Too many local variables (30/15) (too-many-locals)
R:149, 4: Too many local variables (19/15) (too-many-locals)

************* Module model.py
R: 57, 4: Too many local variables (33/15) (too-many-locals)

************* Module prepro.py
R: 31, 0: Too many local variables (34/15) (too-many-locals)
R: 83, 0: Too many local variables (23/15) (too-many-locals)
R:121, 0: Too many local variables (31/15) (too-many-locals)
R:204, 0: Too many local variables (20/15) (too-many-locals)
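One common way to bring the count down is to move a group of related intermediate values into a helper and return them as a single object. The sketch below is illustrative only: Metrics, summarize and evaluate are hypothetical names, not functions from the project.

```python
from collections import namedtuple

Metrics = namedtuple("Metrics", ["loss", "exact_match", "f1"])


def summarize(losses, em_scores, f1_scores):
    """Helper that owns the intermediate variables needed for averaging."""
    count = len(losses)
    return Metrics(sum(losses) / count,
                   sum(em_scores) / count,
                   sum(f1_scores) / count)


def evaluate(batches):
    """The caller keeps a handful of locals; the details live in summarize()."""
    losses = [b["loss"] for b in batches]
    em_scores = [b["em"] for b in batches]
    f1_scores = [b["f1"] for b in batches]
    return summarize(losses, em_scores, f1_scores)
```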

Unnecessary else statement

In application.py at line 76 there is an if-else statement. The else branch is unnecessary: since the if branch returns, you can simply return after the if block.
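A small sketch of the pattern, assuming the if branch ends in a return. The function and its return values are hypothetical and not taken from application.py.

```python
def landing_page(logged_in):
    """Hypothetical view helper; the names are illustrative only."""
    if logged_in:
        return "dashboard"
    # No else needed: this line runs only when the if branch did not return.
    return "login"
```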

No-name-in-module

The file config.py does not define 'cognito_userpool_id' or 'cognito_app_client_id', but application.py imports both of them, which can raise an ImportError. These two variables need to be defined in config.py before they are imported.
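A hedged sketch of what the missing definitions in config.py could look like. The two variable names come from this issue; reading them from environment variables, the environment variable names, and the empty-string defaults are all assumptions.

```python
import os

# Hypothetical config.py contents: values are read from the environment so
# that the identifiers are not committed to the repository.
cognito_userpool_id = os.environ.get("COGNITO_USERPOOL_ID", "")
cognito_app_client_id = os.environ.get("COGNITO_APP_CLIENT_ID", "")
```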

wrong-import-order

Module/func.py, line: 3
standard import "from logging.handlers import RotatingFileHandler" should be placed before "from flask import Flask, render_template, redirect, url_for, flash, session, request" (wrong-import-order)

Invalid unary operand type

************* Module prepro.py
E: 93,39: bad operand type for unary -: NoneType (invalid-unary-operand-type)
E: 94,47: bad operand type for unary -: NoneType (invalid-unary-operand-type)

...
word = ''.join(array[0:-vec_size])
vector = list(map(float, array[-vec_size:]))
...

too general exception

In application.py at line 65 there is a try/except block that catches too general an exception. Catch the specific exceptions you expect instead.
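A minimal sketch of narrowing a handler to the exceptions the guarded code can actually raise. The function read_user_id and the 'user_id' field are hypothetical, since the body of the try block in application.py is not shown here.

```python
import json


def read_user_id(payload):
    """Return the 'user_id' field of a JSON string, or None if unavailable."""
    try:
        return json.loads(payload)["user_id"]
    except json.JSONDecodeError:
        return None  # payload was not valid JSON
    except (KeyError, TypeError):
        return None  # field missing, or payload has the wrong shape or type
```

Any other exception now propagates, so unexpected failures are no longer hidden.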

application.py Too general exception

W:113,11: Catching too general exception Exception (broad-except)
The exception handling is not specific enough.

bad operand type for unary -: NoneType

E: 93,39: bad operand type for unary -: NoneType (invalid-unary-operand-type)

def get_embedding(counter, data_type, limit=-1, emb_file=None, size=None, vec_size=None, token2idx_dict=None):
    ...
    for line in tqdm(fh, total=size):
        array = line.split()
        word = ''.join(array[0:-vec_size])
        vector = list(map(float, array[-vec_size:]))
    ...

vec_size is an argument with a default value of None. Applying unary - to a possibly-None value in an array slice is not good code. Although there is an assert before this point, the argument list should be reworked and reformatted.
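One possible rework, sketched under assumptions: vec_size becomes a required keyword-only parameter, so it no longer defaults to None, and an explicit check replaces the assert. load_vectors is a hypothetical, reduced stand-in for get_embedding that keeps only the two slicing lines.

```python
def load_vectors(lines, *, vec_size):
    """Hypothetical reduced form of get_embedding with a required vec_size."""
    if vec_size <= 0:
        raise ValueError("vec_size must be positive")
    embedding = {}
    for line in lines:
        array = line.split()
        word = ''.join(array[0:-vec_size])
        vector = list(map(float, array[-vec_size:]))
        embedding[word] = vector
    return embedding
```

Calling load_vectors(lines) without vec_size now fails immediately with a TypeError at the call site rather than inside the loop.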

Unused Argument

Many arguments are never used after being defined.

************* Module util.py
W: 47,46: Unused argument 'context_char_idxs' (unused-argument)
W: 47,89: Unused argument 'qa_id' (unused-argument)
W: 47,65: Unused argument 'ques_char_idxs' (unused-argument)
W: 47,81: Unused argument 'y1' (unused-argument)
W: 47,85: Unused argument 'y2' (unused-argument)
W: 47,35: Unused argument 'ques_idxs' (unused-argument)
W: 57,24: Unused argument 'key' (unused-argument)

************* Module prepro.py
W:127,29: Unused argument 'is_test' (unused-argument)

************* Module func.py
W: 8,100: Unused argument 'scope' (unused-argument)

Reimport

In file: Module/inference.py, line: 13
Reimport 'dirname' (imported line 4) (reimported)

Update README and Include Trained Model

It would probably be worth updating the root README to provide more specific steps as to how to train and test the R-Net model to avoid any confusion. Additionally, including the trained model that you used would allow us to reproduce your test results with complete certainty and would save us from training the model ourselves. You can do so in a release like the original author did here. If you'd rather include a link to the original author's trained model, you could add that to the root README and include steps as to how to use it within the project. To that end, including a link to the original model repo is probably a good idea too.

Import twice

dirname and abspath have been imported twice.
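A short sketch of the usual fix: import both names once at the top of the module and reuse them. The helper parent_of is hypothetical and only shows the imports in use.

```python
from os.path import abspath, dirname


# Both names are imported once at the top of the module and reused below.
def parent_of(path):
    """Return the absolute directory that contains path."""
    return dirname(abspath(path))
```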

redefined-outer-name

Module/inference.py, line: 66
Redefining name 'd' from outer scope (line 9) (redefined-outer-name)
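A minimal illustration of the fix, which is to give the inner variable a distinct name. The module-level value of d and the function count_tokens are hypothetical; only the name 'd' comes from the pylint message.

```python
d = {"model": "r_net"}  # module-level name, standing in for the outer 'd'


def count_tokens(tokens):
    """Use a distinct local name so the module-level 'd' is not shadowed."""
    counts = {}  # a local also named 'd' would trigger redefined-outer-name
    for token in tokens:
        counts[token] = counts.get(token, 0) + 1
    return counts
```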

Line too long

According to the pylint results, many lines in the project exceed the 100-character limit.

************* Module db_update_class.py
C: 8, 0: Line too long (113/100) (line-too-long)
C: 44, 0: Line too long (103/100) (line-too-long)
C: 54, 0: Line too long (119/100) (line-too-long)
C: 83, 0: Line too long (120/100) (line-too-long)
C: 93, 0: Line too long (103/100) (line-too-long)
C: 98, 0: Line too long (103/100) (line-too-long)

************* Module test_inference.py
C: 11, 0: Line too long (108/100) (line-too-long)

************* Module test_database.py
C:105, 0: Line too long (116/100) (line-too-long)

************* Module util.py
C: 17, 0: Line too long (102/100) (line-too-long)

************* Module prepro.py
C: 73, 0: Line too long (124/100) (line-too-long)
C: 77, 0: Line too long (102/100) (line-too-long)
C: 83, 0: Line too long (110/100) (line-too-long)
C:121, 0: Line too long (103/100) (line-too-long)
C:128, 0: Line too long (102/100) (line-too-long)
C:182, 0: Line too long (131/100) (line-too-long)
C:183, 0: Line too long (125/100) (line-too-long)
C:184, 0: Line too long (141/100) (line-too-long)
C:185, 0: Line too long (135/100) (line-too-long)
C:186, 0: Line too long (111/100) (line-too-long)
C:187, 0: Line too long (111/100) (line-too-long)
C:188, 0: Line too long (110/100) (line-too-long)
C:223, 0: Line too long (133/100) (line-too-long)
C:230, 0: Line too long (126/100) (line-too-long)

************* Module model.py
C: 59, 0: Line too long (153/100) (line-too-long)
C:105, 0: Line too long (105/100) (line-too-long)

************* Module inference.py
C: 48, 0: Line too long (104/100) (line-too-long)
C: 50, 0: Line too long (104/100) (line-too-long)
C: 60, 0: Line too long (107/100) (line-too-long)
C: 61, 0: Line too long (107/100) (line-too-long)
C: 66, 0: Line too long (109/100) (line-too-long)
C: 71, 0: Line too long (101/100) (line-too-long)
C: 72, 0: Line too long (101/100) (line-too-long)
C: 75, 0: Line too long (112/100) (line-too-long)
C: 78, 0: Line too long (112/100) (line-too-long)
C: 92, 0: Line too long (106/100) (line-too-long)
C: 98, 0: Line too long (107/100) (line-too-long)
C:103, 0: Line too long (109/100) (line-too-long)
C:114, 0: Line too long (119/100) (line-too-long)
C:141, 0: Line too long (105/100) (line-too-long)
C:143, 0: Line too long (120/100) (line-too-long)
C:197, 0: Line too long (1308/100) (line-too-long)

************* Module main.py
C: 65, 0: Line too long (104/100) (line-too-long)
C: 70, 0: Line too long (110/100) (line-too-long)

************* Module func.py
C: 8, 0: Line too long (112/100) (line-too-long)
C: 55, 0: Line too long (120/100) (line-too-long)
C: 86, 0: Line too long (104/100) (line-too-long)
C:167, 0: Line too long (101/100) (line-too-long)

************* Module application.py
C: 27, 0: Line too long (109/100) (line-too-long)
C: 45, 0: Line too long (102/100) (line-too-long)
C: 46, 0: Line too long (116/100) (line-too-long)
C: 61, 0: Line too long (115/100) (line-too-long)
C: 85, 0: Line too long (105/100) (line-too-long)
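Most of these can be shortened without changing behavior by breaking inside parentheses, where Python continues the line implicitly. The sketch below uses a hypothetical function and message, not code from the project.

```python
def describe(counter_size, data_type, vec_size):
    """Build a status message using wrapped string literals."""
    # Implicit continuation inside parentheses keeps each line short;
    # adjacent string literals are concatenated by the parser.
    message = (
        "{} embeddings of size {} loaded "
        "for the {} vocabulary"
    ).format(counter_size, vec_size, data_type)
    return message
```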

Duplicate code

The files evaluate_v1.1.py:10 and util.py:102 contain functions that are completely duplicated. It would be good to keep them in one place and import them from there. This is not an error now, but it might cause one in the future if the two copies drift apart.

R:  1, 0: Similar lines in 2 files
==evaluate_v1.1:10
==util:102
def normalize_answer(s):
    """Lower text and remove punctuation, articles and extra whitespace."""
    def remove_articles(text):
        return re.sub(r'\b(a|an|the)\b', ' ', text)

    def white_space_fix(text):
        return ' '.join(text.split())

    def remove_punc(text):
        exclude = set(string.punctuation)
        return ''.join(ch for ch in text if ch not in exclude)

    def lower(text):
        return text.lower()

    return white_space_fix(remove_articles(remove_punc(lower(s))))


def f1_score(prediction, ground_truth):
    prediction_tokens = normalize_answer(prediction).split()
    ground_truth_tokens = normalize_answer(ground_truth).split()
    common = Counter(prediction_tokens) & Counter(ground_truth_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0
    precision = 1.0 * num_same / len(prediction_tokens)
    recall = 1.0 * num_same / len(ground_truth_tokens)
    f1 = (2 * precision * recall) / (precision + recall)
    return f1


def exact_match_score(prediction, ground_truth):
    return (normalize_answer(prediction) == normalize_answer(ground_truth))


def metric_max_over_ground_truths(metric_fn, prediction, ground_truths):
    scores_for_ground_truths = []
    for ground_truth in ground_truths:
        score = metric_fn(prediction, ground_truth)
        scores_for_ground_truths.append(score)
    return max(scores_for_ground_truths) (duplicate-code)

Include Version Numbers in "requirements.txt" Files

Because packages in deep learning projects (especially TensorFlow) are sensitive to version numbers, I would highly recommend pinning version numbers in the requirements.txt files so that everything keeps working in the future. The author of the R-Net implementation that you're using even states: "There have been a lot of known problems caused by using different software versions." For reference, the TensorFlow installed via the root requirements.txt on my machine is 1.12.0, while the original author appears to have tested the implementation with 1.5.0, so there is a high probability that some of that code will be deprecated in a newer TensorFlow version in the near future.
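A small sketch for collecting the versions to pin from a working environment. The helper pin_lines is hypothetical, and the package list is an assumption based on the libraries that appear in the code quoted in these issues; pip freeze prints the same kind of lines for every installed package.

```python
from importlib import metadata


def pin_lines(packages):
    """Return 'name==version' for each named package that is installed."""
    lines = []
    for name in packages:
        try:
            lines.append("{}=={}".format(name, metadata.version(name)))
        except metadata.PackageNotFoundError:
            continue  # not installed in this environment; nothing to pin
    return lines


if __name__ == "__main__":
    print("\n".join(pin_lines(["tensorflow", "numpy", "tqdm"])))
```

The printed lines can be pasted into requirements.txt once the environment is known to reproduce the reported results.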

Unused import

In application.py at line 2, logging is imported but never used.