drcandacemakedamoore / cleanx
Python library for exploring, cleaning, normalizing, and augmenting large datasets of radiological data.
License: GNU General Public License v3.0
Image comparison for copies function is too slow and memory intensive at present. Maybe we can implement something with the numpy library that is faster.
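A minimal sketch of one NumPy-based approach, assuming the images are already loaded as arrays and that only exact pixel-for-pixel copies are of interest. The function name `group_exact_copies` is hypothetical and is not part of cleanX. Hashing each array once avoids comparing every pair of images, so memory use stays at one image plus one digest per file.

```python
import hashlib
from collections import defaultdict

import numpy as np


def group_exact_copies(images):
    """Group the names of arrays that are exact copies of each other.

    `images` maps a name (for example a file name) to a NumPy array.
    Two arrays count as copies only if shape, dtype and every pixel match.
    """
    groups = defaultdict(list)
    for name, pixels in images.items():
        arr = np.ascontiguousarray(pixels)
        # One digest per image replaces the pairwise pixel comparison.
        digest = hashlib.sha1(arr.tobytes()).hexdigest()
        groups[(arr.shape, arr.dtype.str, digest)].append(name)
    # Keep only the groups that actually contain duplicates.
    return [names for names in groups.values() if len(names) > 1]


if __name__ == "__main__":
    blank = np.zeros((4, 4), dtype=np.uint8)
    demo = {"a.png": blank, "b.png": blank.copy(), "c.png": blank + 1}
    print(group_exact_copies(demo))
```

Near-duplicates (resized or recompressed copies) would need a different technique, such as a perceptual hash; this sketch does not cover them.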
I can't test the Anaconda version on my machine as the environment has too many conflicts right now. I need other people to test it.
The version in the documentation still says 0.0.4.
Debugged, ready for starting the code-free workflow.
For a couple of days, I haven't been able to make an updated build on my machine. Salient log below:
ModuleNotFoundError: No module named 'conda.vendor.auxlib'
Traceback (most recent call last):
File "setup.py", line 648, in
zip_safe=False,
File "D:\bin\anaconda3\envs\cleanX\lib\site-packages\setuptools\__init__.py", line 153, in setup
return distutils.core.setup(**attrs)
File "D:\bin\anaconda3\envs\cleanX\lib\distutils\core.py", line 148, in setup
dist.run_commands()
File "D:\bin\anaconda3\envs\cleanX\lib\distutils\dist.py", line 966, in run_commands
self.run_command(cmd)
File "D:\bin\anaconda3\envs\cleanX\lib\distutils\dist.py", line 985, in run_command
cmd_obj.run()
File "setup.py", line 476, in run
raise RuntimeError('Couldn't build {} package'.format(name))
RuntimeError: Couldn't build cleanX package
We seem to have put in an older version of zero_to_twofivefive_simplest_norming(), one with a small bug. All image normalization functions should be tested and updated tonight (24/1/2022).
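For reference while testing, here is a minimal sketch of min-max scaling to the 0 to 255 range with NumPy. It is an illustration of the general technique, not the cleanX implementation; the function name `rescale_to_0_255` and the choice to return zeros for a constant image are assumptions.

```python
import numpy as np


def rescale_to_0_255(image):
    """Linearly rescale pixel values so the minimum maps to 0 and the maximum to 255."""
    arr = np.asarray(image, dtype=np.float64)
    lo, hi = arr.min(), arr.max()
    if hi == lo:
        # A constant image has no range to stretch; avoid dividing by zero.
        return np.zeros(arr.shape, dtype=np.uint8)
    scaled = (arr - lo) / (hi - lo) * 255.0
    return np.round(scaled).astype(np.uint8)


if __name__ == "__main__":
    sample = np.array([[10, 20], [30, 40]])
    print(rescale_to_0_255(sample))
```

A unit test for any normalization function of this kind can check that the output minimum is 0, the maximum is 255, and that a constant input does not raise.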
This is a list of some improvements/suggestions or issues that may need clarifications.
Is the file GNU GENERAL PUBLIC LICENSE.txt needed?
Include Conda badges https://anaconda.org/doctormakeda/cleanx/badges
Make sure that the test badges link to the test builds. Currently, they link to the image of the badge.
Create a paper folder for the paper files and include a copy of the LICENSE file.
Include some examples on how to get started in the readme file. The same applies to the documentation. I would expect at least some sort of getting started guide.
Since version v0.1.9 was released, I would expect the changes currently in development to carry the version v0.2.0.dev, later to be released as v0.2.0. But if you prefer to keep the current pattern, that's fine.
Move all document files to a docs folder. I think readthedocs could also enable the docs to have two versions, the stable and the latest.
In the Jupyter notebooks we have paths like 'D:/projects/cleanX'.
It would be nice to start by getting the current project's directory and then use relative paths with join.
For example:
# Before
dicomfile_directory1 = 'D:/projects/cleanX/test/dicom_example_folder'
example = pd.read_csv("D:/projects/cleanX/workflow_demo/martians_2051.csv")
# After
import os
import pandas as pd
# Assumes the notebook is started from the project home; otherwise set this path explicitly.
working_dir = os.getcwd()
example_path = os.path.normpath(os.path.join(working_dir, "workflow_demo", "martians_2051.csv"))
example = pd.read_csv(example_path)
It would be nice to normalize the paths. This will help Windows users, who have a hard time with / and \ characters.
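As a sketch of how this could look with the standard library's pathlib, assuming a hypothetical helper named `project_path` (not part of cleanX): path parts are passed separately, so no separator character is ever typed by hand.

```python
from pathlib import Path, PureWindowsPath


def project_path(project_root, *parts):
    """Join path parts under the project root without hard-coding a separator."""
    return Path(project_root).joinpath(*parts)


if __name__ == "__main__":
    # Relative to wherever the project lives on this machine.
    csv_path = project_path(Path.cwd(), "workflow_demo", "martians_2051.csv")
    print(csv_path)

    # PureWindowsPath shows how a Windows-style path is handled on any OS.
    win = PureWindowsPath("D:/projects/cleanX") / "test" / "dicom_example_folder"
    print(win)             # backslash-separated form
    print(win.as_posix())  # forward-slash form
```

`Path` objects can be passed directly to `pd.read_csv` and to `open`, so the notebooks would not need to convert them back to strings.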
Describe the bug
I had some issues with DCM images.
While there's a stop-gap solution for the text formatting right now, it could definitely use some better formatting, such as indentation, emphasis on paragraph headers, and so on. @drcandacemakedamoore
The function seems to work in Jupyter, maybe the unit test needs to be rewritten, or an inverted picture must be pushed up to the folder it looks for images in? (This is a reminder to myself to investigate)
The workflow on-tag.yml references the action s-weigand/setup-conda at v1. However, this reference is missing commit a30654e576ab9e21a25825bf7a5d5f2a9b95b202, which may contain a fix for a vulnerability.
The fix missing from this version of the action could be related to:
(1) a CVE fix
(2) an upgrade of a vulnerable dependency
(3) a fix for a secret leak, among others.
Please consider updating the reference to the action.
It would be nice if the builds were tested on Windows and Mac. One can do that using GitHub Actions: https://docs.github.com/en/actions/using-jobs/using-a-matrix-for-your-jobs#example-adding-configurations
https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idruns-on
Describe the bug
No package 'tesseract' found
Screenshots
Using legacy 'setup.py install' for tesserocr, since package 'wheel' is not installed.
Installing collected packages: tesserocr, opencv-python, matplotlib, cleanX
Running setup.py install for tesserocr ... error
ERROR: Command errored out with exit status 1:
command: /Users/henrykironde/Documents/GitHub/testenv/bin/python3.9 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/d5/vm7jfw2x7q550xrltd8ss5440000gn/T/pip-install-d2ki3n8m/tesserocr_9008e8f2373142109467f5269239caa5/setup.py'"'"'; __file__='"'"'/private/var/folders/d5/vm7jfw2x7q550xrltd8ss5440000gn/T/pip-install-d2ki3n8m/tesserocr_9008e8f2373142109467f5269239caa5/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /private/var/folders/d5/vm7jfw2x7q550xrltd8ss5440000gn/T/pip-record-zx09fpub/install-record.txt --single-version-externally-managed --compile --install-headers /Users/henrykironde/Documents/GitHub/testenv/include/site/python3.9/tesserocr
cwd: /private/var/folders/d5/vm7jfw2x7q550xrltd8ss5440000gn/T/pip-install-d2ki3n8m/tesserocr_9008e8f2373142109467f5269239caa5/
Complete output (20 lines):
pkg-config failed to find tesseract/leptonica libraries: Package tesseract was not found in the pkg-config search path.
Perhaps you should add the directory containing `tesseract.pc'
to the PKG_CONFIG_PATH environment variable
No package 'tesseract' found
Failed to extract tesseract version from executable: [Errno 2] No such file or directory: 'tesseract'
Supporting tesseract v3.04.00
Tesseract major version 3
Building with configs: {'libraries': ['tesseract', 'lept'], 'compile_time_env': {'TESSERACT_MAJOR_VERSION': 3, 'TESSERACT_VERSION': 50593792}}
WARNING: The wheel package is not available.
running install
running build
running build_ext
Detected compiler: unix
building 'tesserocr' extension
creating build
creating build/temp.macosx-11-x86_64-3.9
clang -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX.sdk -I/usr/local/include -I/usr/local/opt/openssl@1.1/include -I/usr/local/opt/sqlite/include -I/Users/henry/Documents/GitHub/testenv/include -I/usr/local/opt/python@3.9/Frameworks/Python.framework/Versions/3.9/include/python3.9 -c tesserocr.cpp -o build/temp.macosx-11-x86_64-3.9/tesserocr.o
clang: error: invalid version number in 'MACOSX_DEPLOYMENT_TARGET=11'
error: command '/usr/bin/clang' failed with exit code 1
----------------------------------------
ERROR: Command errored out with exit status 1: /Users/henrykironde/Documents/GitHub/testenv/bin/python3.9 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/private/var/folders/d5/vm7jfw2x7q550xrltd8ss5440000gn/T/pip-install-d2ki3n8m/tesserocr_9008e8f2373142109467f5269239caa5/setup.py'"'"'; __file__='"'"'/private/var/folders/d5/vm7jfw2x7q550xrltd8ss5440000gn/T/pip-install-d2ki3n8m/tesserocr_9008e8f2373142109467f5269239caa5/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /private/var/folders/d5/vm7jfw2x7q550xrltd8ss5440000gn/T/pip-record-zx09fpub/install-record.txt --single-version-externally-managed --compile --install-headers /Users/henrykironde/Documents/GitHub/testenv/include/site/python3.9/tesserocr Check the logs for full command output.
WARNING: You are using pip version 21.1.3; however, version 21.2.4 is available.
(testenv) ➜ cleanX git:(docs) ✗
Your computer environment info: (please complete the following information):
OS: [MacOSX]
Python V=version [3.9]
It is not clear where the datasets in additional_demos and workflow_demo are coming from and what license they have.
Installing tesserocr through pip doesn't work. You should install it with conda:
conda install -c conda-forge tesserocr
!pip install git+https://github.com/drcandacemakedamoore/cleanX.git
Some of our users are applying this to color images, i.e. endoscopic images. This is by chance, and it could just as well have been pathology images. We should add functions explicitly for this, starting with finding color outliers. I will attack this once the JOSS review completes.
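One possible reading of "color outliers" is images that carry real color inside a mostly grayscale dataset. Under that assumption, a first step could be a check like the sketch below. The function name `is_color_image`, the `tolerance` parameter, and the channels-last layout are all assumptions for illustration and are not part of cleanX.

```python
import numpy as np


def is_color_image(pixels, tolerance=0):
    """Return True if any pixel's first three channels differ by more than `tolerance`.

    Assumes a channels-last array (height, width, channels). A grayscale image
    stored as RGB has identical channels, so it returns False.
    """
    arr = np.asarray(pixels)
    if arr.ndim != 3 or arr.shape[2] < 3:
        return False  # single-channel images cannot carry color
    rgb = arr[..., :3].astype(np.float64)
    spread = rgb.max(axis=2) - rgb.min(axis=2)
    return bool((spread > tolerance).any())


if __name__ == "__main__":
    gray = np.stack([np.arange(12).reshape(3, 4)] * 3, axis=2)
    tinted = gray.copy()
    tinted[0, 0, 0] = 200
    print(is_color_image(gray), is_color_image(tinted))
```

A small positive `tolerance` may help with grayscale images that picked up slight channel differences from lossy compression.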
cleanX.image_work.show_images_in_df() sometimes repeats images in the matrix of pictures
Expected behavior
Should put each image in matrix once, not more, not less
OS: Windows
Python V=version 3.8 (or 3.9)
It appears that neither of the two Sphinx extensions that supposedly accomplish this actually works:
setuptools integration
So, it looks like this is going to be a lot of manual labor involving the CSS and HTML of Sphinx templates...
test_find_tiny_image_differences (run from my Windows machine) is currently not working on the most advanced branch; it is unclear why.
Need tests for pipeline functions: Normalize, Crop, etc.
It would be nice to have a list of alternative software. Mentioning how cleanX differs from the alternatives will help users decide which option is better for them.
Mamba is a good alternative to conda which has a faster dependency resolver.
https://github.com/mamba-org/mamba
I changed the index.rst.txt to an rst file (and others in the Sphinx area), but still not sure how to get an automatic documentation build (it seems to fail).
Apparently has a new bug, probably created by a dependency change. It now breaks in the classes_workflow notebook (commented out). Must investigate the break.
Need a sanity check with the optional dependencies installed as well, but make sure the rest runs without those dependencies present.
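A minimal sketch of one common pattern for this, assuming a hypothetical helper named `optional_import` (not part of cleanX): the import is attempted once, and code or tests that need the dependency check the result instead of failing at import time.

```python
import importlib


def optional_import(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# pydicom is used here only as an example of an optional dependency.
pydicom = optional_import("pydicom")
HAVE_PYDICOM = pydicom is not None

if __name__ == "__main__":
    if HAVE_PYDICOM:
        print("pydicom available: DICOM features can run")
    else:
        print("pydicom missing: DICOM features are skipped, the rest still runs")
```

In a pytest suite, `pytest.importorskip("pydicom")` gives a similar effect for individual tests: the test is skipped rather than failed when the module is absent.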
ERROR conda.cli.main_run:execute(33): Subprocess for 'conda run ['python', 'setup.py', 'bdist_egg']' command failed (see the actions tab)
CleanX uses some insensitive language that may offend some users. I would recommend that you remove words like idiots, since this is against the code of conduct for JOSS.
There are typos in the docstrings, like """This class allows normalization by throwing off exxtreme values on""". It would be nice to look through the docstrings and remove the typos.
Note: I am still failing to install CleanX, but I think it is some complications with my Conda setup. I will keep you updated. My target is to finish with the review and final decision in 14 days.
Can you add documentation in the following files?
Conda build is failing after I updated the bib document.
@drcandacemakedamoore 👍🏿 for getting this to finally install smoothly. Some issues that I have are detailed below.
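README.md example:
Check whether the cleaned directory exists before creating it, or always delete it first and then make a new one.
import os
import shutil
# Option 1: create the directory only if it is missing
dst = 'cleaned'
if not os.path.exists(dst):
    os.mkdir(dst)
# Option 2: always start from an empty directory
dst = 'cleaned'
shutil.rmtree(dst, ignore_errors=True)  # os.rmdir fails on a missing or non-empty directory
os.mkdir(dst)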
Improve this README.md example; I had to install SimpleITK and PyDICOM. You could add these to the required dependencies.
(cleanx) henrysenyondo ~/Downloads/cleanX [main] $ python examplecleanX.py
WARNING:root:Don't know how to find Tesseract library version
/Users/henrysenyondo/Downloads/cleanX/cleanX/dicom_processing/__init__.py:37: UserWarning:
Neither SimpleITK nor PyDICOM are installed.
Will not be able to extract information from DICOM files.
warnings.warn(
Traceback (most recent call last):
File "examplecleanX.py", line 36, in <module>
from cleanX.dicom_processing import DicomReader
ImportError: cannot import name 'DicomReader' from 'cleanX.dicom_processing' (/Users/henrysenyondo/Downloads/cleanX/cleanX/dicom_processing/__init__.py)
(cleanx) henrysenyondo ~/Downloads/cleanX [main] $
Use a path that actually exists in the repo instead of src = GlobSource('/images/to/clean/*.jpg').
workflow_demo examples:
Can we add template matching functions for certain equipment, or will it only work if we have a pixel-based match exactly? I think we can try it in notebooks first.
CI needs work: we need to cover more ground in testing and releases. Specifically, we need to test on Python 3.9 and build conda packages for this version. We also need to have SimpleITK, pydicom, and click installed for the tests that need them.
I think this file was added in response to the previous review. It is somewhat out of date at this point, and it only covers a fraction of what's actually supported. Also, I think the script it mentions in the beginning might need some work.
In retrospect, using https://www.sphinx-doc.org/en/master/man/sphinx-apidoc.html was a bad idea. The code it generates is awful and impossible to control. In particular, there's no way to disable or enable special methods on per-class basis. Similarly for inheritance etc.
Apparently, we need to replace this with something else that would generate sensible documentation pages. There's no hope that sphinx-apidoc will ever improve.
There may be another installation problem. Let's follow up. (see new JOSS review)
This lone combination takes most of the CI time, and is responsible for 90% of bad builds. We may need to rewrite the code on this one combination.
Also, our cli dicom extract-images does something unrelated right now: it generates a report instead...
class Mean(Aggregate): needs precise documentation inserted
Thank you for the nice changes PatriceJada ...we need to figure out how to contact this person to see if he or she wants to be an author on the associated paper, which is still under review. Are you there @PatriceJada ?
The continuous integration seems stuck. It has runs in the queue from 6 hours ago. Is this something I introduced with the new folders? Why is it all failing?
Describe the bug
tesserocr
To Reproduce
Steps to reproduce the behavior:
pip install cleanx
Expected behavior
A clear and concise description of what you expected to happen.
ERROR: Failed building wheel for tesserocr
Running setup.py clean for tesserocr
Screenshots
If applicable, add screenshots to help explain your problem.
Your computer environment info: (please complete the following information):
Ubuntu 16.
OS: [e.g. Linux]
Python V=version [e.g. 3.7]
I think you should add the minimum requirements to the README file.