Comments (7)
Hi @elsirdavid
Thanks for the detailed debugging information! Let's test a few things first (using the dev branch).
1. Can you confirm that the UMLS zip file isn't corrupted?
Test this from the command line with md5 umls-2020AB-metathesaurus.zip
--> 69d2929e0902e7e42af0b2cb74d5005a
or by passing the use_checksum flag, as in UMLS.init_from_nlm_zip(NLM_ZIPFILE_PATH, use_checksum=True)
2. Try creating a new conda env using the environment.yml file.
You can init from scratch using conda env create -f environment.yml
If neither of these fixes the UMLS issue, we can dive deeper into debugging.
from trove.
Hi @jason-fries (and Happy New Year!!)
Thank you for your help.
I couldn't use the md5 command from the command line. I did use the checksum flag as suggested and used other code to get an MD5 hash of the file.
The checksum was added inline; the hash is below the list of Python libraries in the environment. The UMLS code seems to have a problem with the declaration of the 'release' variable.
1_Installing_the_UMLS_md5_checksum.pdf
For the creation of a new environment, I used the 'requirements.txt' file as directed by the README. This manages to install some libraries but crashes when collecting scipy (an error in preparing metadata regarding pyproject.toml).
I installed msgpack and pandas by hand. The results were the same and are below:
1_Installing_the_UMLS-Copy-trove_env_md5_checksum.pdf
from trove.
Hi @elsirdavid
Two issues: (1) For your MD5 hash check, your provided code
import hashlib
md5 = hashlib.md5(b'umls-2020AB-metathesaurus.zip')
print(md5, '\n',md5.digest())
generates a hash of the string literal, not the contents of the UMLS zip file. You'll want to use
hashlib.md5(open("umls-2020AB-metathesaurus.zip", "rb").read()).hexdigest()
to generate a hash of the contents of the zip file. The above code snippet should return 69d2929e0902e7e42af0b2cb74d5005a for the 2020AB release. If you get a different value, your file is corrupted and should be redownloaded from the NLM.
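Since the UMLS zip is large, reading it in one call can use a lot of memory. Below is a minimal sketch of the same check that reads the file in chunks. The helper names file_md5 and is_intact are hypothetical (they are not part of trove), and the expected hash is the one quoted above for the 2020AB release.

```python
import hashlib

# Expected MD5 for the 2020AB release, as quoted in this thread.
EXPECTED_MD5 = "69d2929e0902e7e42af0b2cb74d5005a"


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path, expected=EXPECTED_MD5):
    """Hypothetical helper: True if the file's MD5 matches the expected hash."""
    return file_md5(path) == expected
```

Calling is_intact("umls-2020AB-metathesaurus.zip") from the directory that holds the download should return True for an uncorrupted 2020AB file.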
(2) Trove is only tested with Python 3.7.x. From your PDF it looks like your environment is 3.9.7. If you create a fresh env using conda env create -f environment.yml it should install the correct Python version.
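A quick way to confirm which interpreter a notebook is actually running is to check sys.version_info. This is only a sketch; is_supported_python is a hypothetical helper, and the 3.7 requirement is taken from the statement above.

```python
import sys


def is_supported_python(version_info=sys.version_info, required=(3, 7)):
    """Hypothetical check: True if the major.minor version equals `required`."""
    return tuple(version_info[:2]) == required


if __name__ == "__main__":
    print(sys.version.split()[0], "supported:", is_supported_python())
```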
On my machine, installing from the latest trove dev branch commit using a fresh conda env works, so let's see if any of the above are the source of your issues.
Also make certain to wipe your temp directory (~/.trove/umls2022AB in your code) if the installation of the UMLS bombs out.
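Wiping the temp directory can be done from Python as well as from a file manager. The sketch below is an assumption about how one might do it, not something trove provides; wipe_cache is a hypothetical helper, and the directory to pass is whatever path your own code uses.

```python
import shutil
from pathlib import Path


def wipe_cache(cache_dir):
    """Delete `cache_dir` and its contents; return True if a directory was removed."""
    path = Path(cache_dir).expanduser()  # expands a leading ~ to the home directory
    if not path.is_dir():
        return False
    shutil.rmtree(path)
    return True
```

For the path mentioned above, that would be wipe_cache("~/.trove/umls2022AB"), run before retrying the UMLS installation.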
from trove.
Could you point me to that environment.yml file? I can't find it on GitHub or in any of the folders I've searched. The README from trove suggests using requirements.txt but, as I mentioned earlier, that fails too. I'm not certain how to make this environment, then.
from trove.
Also, thanks for fixing my hash code. The file is indeed not corrupted; thankfully, I do get the right hash.
from trove.
Thank you for the idea of changing branches. I have now tried to use the relevant yml file. The creation fails with the output in the included txt file. I am going to try to install the relevant libraries and Python version by hand.
create_env.txt
from trove.
I ended up installing Python 3.7, msgpack, and pandas as the yml file directed, and the resulting notebook is here:
1_Installing_the_UMLS_013123.pdf
from trove.