
Comments (7)

jason-fries commented on June 13, 2024

Hi @elsirdavid
Thanks for the detailed debugging information! Let's test a few things first (using the dev branch).

1. Can you confirm that the UMLS zip file isn't corrupted?

Test this from the command line: md5 umls-2020AB-metathesaurus.zip should print 69d2929e0902e7e42af0b2cb74d5005a.
Alternatively, use the use_checksum flag: UMLS.init_from_nlm_zip(NLM_ZIPFILE_PATH, use_checksum=True)

2. Try creating a new conda env using the environment.yml file

You can init from scratch using conda env create -f environment.yml

If neither of these fixes the UMLS issue, we can dive deeper into debugging.

from trove.

DavidLikesLearning commented on June 13, 2024

Hi @jason-fries (and Happy New Year!!)

Thank you for your help.

I couldn't use the md5 command from the command line. I did use the suggested use_checksum flag, and used other code to get an MD5 hash of the file.

The checksum flag was added inline; the hash is below the list of Python libraries in the environment. The UMLS code seems to have a problem with the declaration of the 'release' variable.

1_Installing_the_UMLS_md5_checksum.pdf

For the creation of a new environment, I used the 'requirements.txt' file as directed by the README. This manages to install some libraries but crashes when collecting scipy (an error in preparing metadata regarding pyproject.toml).

troveDistUtilsFail

I installed msgpack and pandas by hand. The results were the same and are below:

1_Installing_the_UMLS-Copy-trove_env_md5_checksum.pdf


jason-fries commented on June 13, 2024

Hi @elsirdavid

Two issues: (1) For your MD5 hash check, your provided code

import hashlib
md5 = hashlib.md5(b'umls-2020AB-metathesaurus.zip')
print(md5, '\n',md5.digest()) 

generates a hash of the string literal (the file name), not the contents of the UMLS zip file. You'll want to use

hashlib.md5(open("umls-2020AB-metathesaurus.zip", "rb").read()).hexdigest()

to generate a hash of the contents of the zip file. The above code snippet should return 69d2929e0902e7e42af0b2cb74d5005a for the 2020AB release. If you get a different hash, your file is corrupted and should be redownloaded from the NLM.
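The one-liner above reads the whole zip into memory at once. As a minimal sketch of the same check that reads the file in chunks instead (the helper names file_md5 and matches_expected are hypothetical and not part of trove; the expected digest is the one quoted in this thread):

```python
import hashlib

# Expected MD5 for the 2020AB release, as quoted in this thread.
EXPECTED_MD5 = "69d2929e0902e7e42af0b2cb74d5005a"


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file's contents, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps calling fh.read until it returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_MD5):
    """Return True if the file at `path` hashes to the expected digest."""
    return file_md5(path) == expected.lower()
```

Calling matches_expected("umls-2020AB-metathesaurus.zip") should then return True for an intact 2020AB download, assuming the zip sits in the current working directory.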

(2) Trove is only tested with Python 3.7.x. From your PDF it looks like your environment is 3.9.7. If you create a fresh env using conda env create -f environment.yml it should install the correct Python version.
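A quick way to confirm which interpreter a notebook is actually running is to check it from inside the notebook. This is only a sketch: is_tested_python is a hypothetical helper, not something trove provides, and it assumes "3.7.x" means any 3.7 patch release.

```python
import sys

# Assumption: "3.7.x" covers any patch release of Python 3.7.
TESTED_VERSION = (3, 7)


def is_tested_python(version_info=None):
    """Return True if major.minor matches the version trove is tested with."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == TESTED_VERSION
```

Running is_tested_python() in the first notebook cell returns False under 3.9.7, which would point at the environment rather than the UMLS file.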

On my machine installing from the latest trove dev branch commit using a fresh conda env works, so let's see if any of the above are the source of your issues.

Also make certain to wipe your temp directory (~/.trove/umls2022AB in your code) if the installation of the UMLS bombs out.
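Wiping that directory can be done by hand or from Python. A minimal sketch, assuming the cache lives at the path used in your code (wipe_cache_dir is a hypothetical helper, not a trove function):

```python
import shutil
from pathlib import Path


def wipe_cache_dir(path):
    """Delete a partially built cache directory.

    Returns True if a directory was removed, False if nothing was there.
    """
    target = Path(path).expanduser()  # resolves a leading "~"
    if not target.is_dir():
        return False
    shutil.rmtree(target)
    return True
```

For example, wipe_cache_dir("~/.trove/umls2022AB") removes the directory named above. Double-check the path first, since rmtree deletes everything under it.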


DavidLikesLearning commented on June 13, 2024

Could you point me to that environment.yml file? I can't find it on GitHub or in any of the folders I've searched. The README from trove suggests using requirements.txt, but as I mentioned earlier, that fails too. I'm not certain how to make this environment, then.


DavidLikesLearning commented on June 13, 2024

Also, thanks for fixing my hash code. The file is indeed not corrupted; thankfully, I do get the right hash.


DavidLikesLearning commented on June 13, 2024

Thank you for the idea of changing branches. I have now tried to use the relevant yml file. The creation fails with the output in the included txt file. I am going to try to install the relevant libraries and Python version by hand.
create_env.txt


DavidLikesLearning commented on June 13, 2024

I ended up installing Python 3.7, msgpack and pandas as the yml file directed, and the resulting notebook is here:
1_Installing_the_UMLS_013123.pdf
