
udsmprot's People

Contributors

nstrodt

udsmprot's Issues

Using new dataset for classification

Hi, thank you for the amazing work!
If I want to use your model architecture for protein classification on my own dataset, what would be the best way to do that?

Also, I cannot find the enzyme classification datasets in the repository that are referenced in the code:
path_ec_knna = git_data_path/"suppa.txt"
path_ec_knnb = git_data_path/"suppb.txt"

Thanks.

Error when trying to run the benchmarks

Hello, I tried to run the benchmarks from the Jupyter notebook, which for various reasons I had to port to a local script, and received the following error message:
2023-12-23 18:19:26.336247: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2023-12-23 18:19:26.359764: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2023-12-23 18:19:26.359816: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2023-12-23 18:19:26.360542: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2023-12-23 18:19:26.364383: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2023-12-23 18:19:26.364542: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-12-23 18:19:28.150936: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
14945 training set records, 1661 validation set records, 4152 test set records.
Traceback (most recent call last):
  File "/home/gradwan/protein_bert/sally1.py", line 37, in <module>
    pretrained_model_generator, input_encoder = load_pretrained_model()
  File "/home/gradwan/protein_bert/proteinbert/existing_model_loading.py", line 54, in load_pretrained_model
    return load_pretrained_model_from_dump(dump_file_path, create_model_function, create_model_kwargs = create_model_kwargs, optimizer_class = optimizer_class, lr = lr,
  File "/home/gradwan/protein_bert/proteinbert/model_generation.py", line 159, in load_pretrained_model_from_dump
    n_annotations, model_weights, optimizer_weights = pickle.load(f)
_pickle.UnpicklingError: invalid load key, 'v'.

Could you please help? Many thanks!
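
A possible diagnostic, offered as a sketch rather than a confirmed cause: pickle raises "invalid load key, 'v'" when the file it is given starts with the character 'v' instead of pickle data. One common way this happens is that the dump on disk is a Git LFS pointer file, a small text file whose first line is "version https://git-lfs.github.com/spec/v1". Whether that applies here is an assumption. The helper names below are hypothetical and are not part of proteinbert or UDSMProt; the temporary files only stand in for a real dump.

```python
import pickle
import tempfile
from pathlib import Path

# Git LFS pointer files are small text files that begin with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def looks_like_lfs_pointer(path):
    """Return True if the file starts with the Git LFS pointer header."""
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def describe_dump(path):
    """Hypothetical helper: classify a dump file by its first bytes."""
    with open(path, "rb") as f:
        head = f.read(64)
    if head.startswith(LFS_POINTER_PREFIX):
        return "git-lfs pointer (the real file was not downloaded)"
    if head[:1] == b"\x80":
        # Pickle protocol 2 and later start with the PROTO opcode 0x80.
        return "binary pickle (protocol 2 or later)"
    return "unknown content starting with %r" % head[:16]


# Demo with two temporary files standing in for a real model dump.
tmp_dir = Path(tempfile.mkdtemp())
pointer = tmp_dir / "pointer.pkl"
pointer.write_bytes(LFS_POINTER_PREFIX + b"\noid sha256:0000\nsize 123\n")
real = tmp_dir / "real.pkl"
real.write_bytes(pickle.dumps((1, [0.5], None)))

print(describe_dump(pointer))
print(describe_dump(real))
```

If the real dump file turns out to be a pointer or some other text file (for example an HTML error page), downloading the actual binary again would be the next thing to try.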

Recreate same EC dataset

Hello,
I am trying to recreate the EC prediction datasets (EC40 and EC50) from your UDSMProt paper, but I have run into some difficulties.

First, line 27 of your script code/create_datasets.sh reads:
python proteomics_preprocessing.py clas_ec --drop_ec7=True --working_folder=datasets/clas_ec/clas_ec_ec50_level1 --pretrained_folder=datasets/lm/lm_sprot_uniref --level=2 --include_NoEC=False --dataset="uniprot" --sampling_method_train=1 --sampling_method_valtest=3 --ignore_pretrained_clusters=True --sampling_ratio=[.8,.1,.1] --save_prev_ids=True
I think it should be:
python proteomics_preprocessing.py clas_ec --drop_ec7=True --working_folder=datasets/clas_ec/clas_ec_ec50_level2 --pretrained_folder=datasets/lm/lm_sprot_uniref --level=2 --include_NoEC=False --dataset="uniprot" --sampling_method_train=1 --sampling_method_valtest=3 --ignore_pretrained_clusters=True --sampling_ratio=[.8,.1,.1] --save_prev_ids=True
The working folder should be named clas_ec_ec50_level2, not clas_ec_ec50_level1, because the command uses --level=2.

Second, when I run the script create_datasets.sh, I get the following error:
../tmp_data/cdhit04_uniprot_sprot_2017_03.pkl not found.
I think this may be because the link you provide contains two files built from two different versions of Swiss-Prot: "cdhit04_uniprot_sprot_2016_07.pkl", which uses the 07/2016 release, and "uniref50_2017_03_uniprot_sprot_2017_03.pkl", which uses the 03/2017 release.

So I am a little confused: I don't know whether I need to download the 03/2017 or the 07/2016 Swiss-Prot release, together with the files from your link, to replicate exactly the dataset used in your UDSMProt paper. And even with the correct cdhit04 file, will I get exactly the same test set? If not, it would be kind of you to provide a link to download the exact train/dev/test datasets for ECpred.

Thanks a lot in advance

incompatible packages in the proteomics.yml

Hi,
I'm trying to install UDSMProt with conda env create -f proteomics.yml, but conda reports incompatible packages. Could you provide an updated proteomics.yml file?
Thanks!

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions.

Hi,

I tried to replicate the data using ./create_datasets.sh but I received the following error:

  File "/UDSMProt/code/utils/dataset_utils.py", line 268, in prepare_dataset
    tok_num = np.array([[tok_stoi[o] for o in p] for p in tok])
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (556825,) + inhomogeneous part.

Could you please help me fix it? Thanks.
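
For context, NumPy 1.24 and later raise this ValueError when np.array is given nested sequences of different lengths without an explicit dtype; older versions built an object array and only emitted a deprecation warning. A minimal sketch of one workaround is to request dtype=object explicitly. The tok and tok_stoi values below are toy stand-ins for the variables named in the traceback, not the real data, and whether the rest of prepare_dataset works with an object array is an assumption that would need checking.

```python
import numpy as np

# Toy stand-ins for the `tok` and `tok_stoi` names in the traceback.
tok = [["M", "K", "V"], ["M", "A"]]          # tokenized sequences of different lengths
tok_stoi = {"M": 0, "K": 1, "V": 2, "A": 3}  # token -> integer id

# Map every token to its integer id; the inner lists stay ragged.
ids = [[tok_stoi[o] for o in p] for p in tok]

# dtype=object tells NumPy to store each inner list as one array element
# instead of trying to build a rectangular 2-D array.
tok_num = np.array(ids, dtype=object)

print(tok_num.shape)  # one entry per sequence
print(tok_num[0], tok_num[1])
```

The other common route is to pin an older NumPy release in the environment, which changes no code but ties the setup to a version that still accepts ragged input.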
