Code Monkey home page Code Monkey logo

Comments (34)

jpivarski avatar jpivarski commented on June 18, 2024

I would guess that it's not complaining about remote_datasets.yml being a directory (confusing file for directory); I would guess that it's not finding the skhep_testdata directory.

Does _default_data_dir depend on the existence of a user directory (something that ~ expands to)? That's a fairly strong assumption to make about the site where this will be running; it's just the sort of thing that services like Travis might not have. (I'm not saying it doesn't—I don't know—but it's suspicious.)

Why does should a package like this be configurable, anyway? As a user, I wouldn't tell this package where the files are, it needs to tell me.

from scikit-hep-testdata.

jpivarski avatar jpivarski commented on June 18, 2024

Explicitly pip-installing it fixes the issue in Travis, so the difference is between getting the package by manually pip-installing versus getting it automatically through test_requires in setup.py. Try creating a dummy package that depends on scikit-hep-testdata at the testing level and see what files are left out and why.

Explicitly pip-installing fixed one issue in AppVeyor (Windows), too, but then there's another one:

____________________ ERROR collecting tests/test_jagged.py ____________________
tests\test_jagged.py:40: in <module>
    class Test(unittest.TestCase):
tests\test_jagged.py:42: in Test
    sample = uproot.open(skhep_testdata.data_path("uproot-sample-6.10.05-uncompressed.root"))["sample"]
uproot\rootio.py:83: in open
    raise ValueError("URI scheme not recognized: {0}".format(path))
E   ValueError: URI scheme not recognized: C:\Python27\lib\site-packages\skhep_testdata\data\uproot-sample-6.10.05-uncompressed.root

(Windows is an example of a system that doesn't have ~ users in the same sense as Linux, one I hadn't thought of but really obvious in hindsight...)

from scikit-hep-testdata.

jpivarski avatar jpivarski commented on June 18, 2024

Actually, this last one is an uproot failure: you gave me an absolute path on Windows, and I didn't parse it right. I wonder why that's never happened before...?

from scikit-hep-testdata.

jpivarski avatar jpivarski commented on June 18, 2024

Oddly, this error also applies to PyPy, even when scikit-hep-testdata is explicitly pip-installed. There's nothing in this package that ought to behave differently in PyPy as opposed to CPython.

from scikit-hep-testdata.

jpivarski avatar jpivarski commented on June 18, 2024

Uproot's use of scikit-hep-testdata (scikit-hep/uproot3#237) with have to wait for this to be fixed. Explicitly pip-installing is a workaround, but that workaround doesn't even work with PyPy (oddly). See this.

But in general, implicitly installing through tests_require ought to be enough. I suspect that when that's fixed, my PyPy test will pass.

from scikit-hep-testdata.

benkrikler avatar benkrikler commented on June 18, 2024

Thanks for flagging this and for digging so much into it @jpivarski! I'll take a look as soon as I can, but unfortunately it might not be until late next week since I've got a few deadlines coming up. Perhaps @eduardo-rodrigues or @mayou36 could take a look before then?

I suppose the CI testing for this package should include a dummy package which depends and installs this one through test_requires and so on.

from scikit-hep-testdata.

eduardo-rodrigues avatar eduardo-rodrigues commented on June 18, 2024

Hi all, I won't really have time today to dig much but has still had a quick look at the package files. I did spot a missing future import statement, see #17.

@jpivarski, by chance were you getting trouble specifically with Python 2? The fix above might then help. Anything more subtle will require more time ...

from scikit-hep-testdata.

jpivarski avatar jpivarski commented on June 18, 2024

No, I had the same issue on all versions of Python. Once I pip-installed explicitly (shouldn't be necessary), I had the issue on PyPy only (not the standard CPython), but PyPy for both Python language versions 2 and 3.

I don't want to try to debug the CPython/PyPy difference, but I suspect that it will go away once the missing configuration files are correctly installable through tests_requires.

from scikit-hep-testdata.

eduardo-rodrigues avatar eduardo-rodrigues commented on June 18, 2024

Understood.

Indeed I see the line _default_data_dir = "~/.local/share/skhep_testdata/" in remote_files.py, which is a no-go. That needs to be fixed asap.

from scikit-hep-testdata.

jpivarski avatar jpivarski commented on June 18, 2024

I'll be able to try it out once it's on PyPI. That's how Travis gets it in the uproot tests.

Thanks!

from scikit-hep-testdata.

eduardo-rodrigues avatar eduardo-rodrigues commented on June 18, 2024

Sounds fair :-). @benkrikler, do you have 5 minutes - no need for more - to review my 2 PRs or shall I go ahead, merge, push to PyPI, and @jpivarski then makes a full-scale test?

from scikit-hep-testdata.

benkrikler avatar benkrikler commented on June 18, 2024

Sure, I'll take a quick look at things

from scikit-hep-testdata.

benkrikler avatar benkrikler commented on June 18, 2024

Latest version is on pypi now. Really need to fix the auto-deployment...

from scikit-hep-testdata.

jpivarski avatar jpivarski commented on June 18, 2024

Testing now:

from scikit-hep-testdata.

eduardo-rodrigues avatar eduardo-rodrigues commented on June 18, 2024

Hmm, things are done on Windows: works for Py3 but not for Py2 :-(.

from scikit-hep-testdata.

jpivarski avatar jpivarski commented on June 18, 2024

Nope, just failed in Windows:

____________________ ERROR collecting tests/test_jagged.py ____________________
tests\test_jagged.py:40: in <module>
    class Test(unittest.TestCase):
tests\test_jagged.py:42: in Test
    sample = uproot.open(skhep_testdata.data_path("uproot-sample-6.10.05-uncompressed.root"))["sample"]
build\bdist.win32\egg\skhep_testdata\local_files.py:13: in data_path
    ???
build\bdist.win32\egg\skhep_testdata\remote_files.py:93: in is_known_remote
    ???
build\bdist.win32\egg\skhep_testdata\remote_files.py:51: in is_known
    ???
build\bdist.win32\egg\skhep_testdata\remote_files.py:37: in load_remote_configs
    ???
E   IOError: [Errno 2] No such file or directory: 'c:\\projects\\uproot\\.eggs\\scikit_hep_testdata-0.1.3-py2.7.egg\\skhep_testdata\\remote_datasets.yml'

and I was waiting on Linux before sending this comment...

(I cancelled the Windows manually when I saw that it was failing. Python 2 was merely first.)

((Dang, it's slow. Maybe I should do that Azure thing.))

from scikit-hep-testdata.

jpivarski avatar jpivarski commented on June 18, 2024

There we go: fails in Linux, too.

____________________ ERROR collecting tests/test_jagged.py _____________________
tests/test_jagged.py:40: in <module>
    class Test(unittest.TestCase):
tests/test_jagged.py:42: in Test
    sample = uproot.open(skhep_testdata.data_path("uproot-sample-6.10.05-uncompressed.root"))["sample"]
build/bdist.linux-x86_64/egg/skhep_testdata/local_files.py:13: in data_path
    ???
build/bdist.linux-x86_64/egg/skhep_testdata/remote_files.py:93: in is_known_remote
    ???
build/bdist.linux-x86_64/egg/skhep_testdata/remote_files.py:51: in is_known
    ???
build/bdist.linux-x86_64/egg/skhep_testdata/remote_files.py:37: in load_remote_configs
    ???
E   IOError: [Errno 20] Not a directory: '/home/travis/build/scikit-hep/uproot/.eggs/scikit_hep_testdata-0.1.3-py2.7.egg/skhep_testdata/remote_datasets.yml'

I'm going to try it with an explicit pip-install of the latest version. Before, the only thing that failed in that was PyPy, so I've correspondingly put the PyPy tests first.

from scikit-hep-testdata.

eduardo-rodrigues avatar eduardo-rodrigues commented on June 18, 2024

The .yml file is definitely in the distribution, as quickly tested with python setup.py sdist. So it seems the code in remove_files.py is the culprit?

from scikit-hep-testdata.

jpivarski avatar jpivarski commented on June 18, 2024

Trying again:

from scikit-hep-testdata.

eduardo-rodrigues avatar eduardo-rodrigues commented on June 18, 2024

I do not understand the "???" in the stack trace. I will continue digging in my "free time" ... Very frustrating.

from scikit-hep-testdata.

eduardo-rodrigues avatar eduardo-rodrigues commented on June 18, 2024

@benkrikler, I had a closer look at https://github.com/scikit-hep/scikit-hep-testdata/blob/master/skhep_testdata/local_files.py and do not understand the logic in this

def data_path(filename, raise_missing=True):
    if remote_files.is_known_remote(filename):
        return remote_files.remote_file(filename, raise_missing=raise_missing)

    path = os.path.join(__data_directory__, filename)
    if not os.path.isfile(path) and raise_missing:
        raise RuntimeError("Unknown or missing file: %s" % filename)
    return path

function: it seems to me that the 2 blocks of code should be swapped, meaning, if the file is known in the data directory, just return that and do not even bother looking remotely, no?

I do realise that this does not solve what seems to be a bug in the remote_files module. But at least it should get rid of the error @jpivarski is seeing above, which is triggered by the call in if remote_files.is_known_remote(filename):.

from scikit-hep-testdata.

eduardo-rodrigues avatar eduardo-rodrigues commented on June 18, 2024

Folks, I will have to be offline as usual. I will check things back later in the evening ...

from scikit-hep-testdata.

jpivarski avatar jpivarski commented on June 18, 2024

Yeah, we're in the same situation. Explicit pip-install makes everything work except PyPy. To get a release out, I'm going to cherry-pick my PR but keep it around, waiting for a correction later.

Backing away from these details, why do you need a user-editable configuration file, anyway? Why not just data files (ROOT) and lookups (auxiliary text or right in the Python code itself)? Then you would have no dependence on any placement—let Python decide where to put your egg/wheel/tarball.

You know about __file__, to get the location of the Python file from which it's called, right?

from scikit-hep-testdata.

eduardo-rodrigues avatar eduardo-rodrigues commented on June 18, 2024

@jpivarski, I don't think there any such configuration now. Where are you looking? Yep, the __file__ trick is being used, see for example https://github.com/scikit-hep/scikit-hep-testdata/blob/master/skhep_testdata/local_files.py.

from scikit-hep-testdata.

jpivarski avatar jpivarski commented on June 18, 2024

(By the way, I don't use any remote files. It's failing on setup.)

from scikit-hep-testdata.

jpivarski avatar jpivarski commented on June 18, 2024

I thought remote_datasets.yml was a configuration file because it's YAML. My misunderstanding.

from scikit-hep-testdata.

benkrikler avatar benkrikler commented on June 18, 2024

I wanted the description of the data to be separated from code that actually handles this, since I imagine the dictionary of files being updated more often than the code that handles it. At some point I was going to swap the YAML file for just a file containing a python dictionary, but I went for YAML because it's so much easier to read; I hoped it would eventually double up as the documentation of what's actually in the datasets (possibly adding a description section, for example).

from scikit-hep-testdata.

eduardo-rodrigues avatar eduardo-rodrigues commented on June 18, 2024

Yeah, I'm really convinced that the swap of code blocks I mention above will "fix" the problem (meaning it will hide it, since the remove_files module still needs to be reorganised/fixed) seen e.g. on Travis, so this:
``

==================================== ERRORS ====================================

_____________________ ERROR collecting tests/test_http.py ______________________

tests/test_http.py:41: in

LOCAL = skhep_testdata.data_path("uproot-{FILE}.root".format(FILE=FILE))

/home/travis/build/scikit-hep/uproot/.eggs/scikit_hep_testdata-0.1.3-py3.7.egg/skhep_testdata/local_files.py:13: in data_path

if remote_files.is_known_remote(filename):

/home/travis/build/scikit-hep/uproot/.eggs/scikit_hep_testdata-0.1.3-py3.7.egg/skhep_testdata/remote_files.py:93: in is_known_remote

return RemoteDatasetList.is_known(filename)

/home/travis/build/scikit-hep/uproot/.eggs/scikit_hep_testdata-0.1.3-py3.7.egg/skhep_testdata/remote_files.py:51: in is_known

cls.load_remote_configs()

/home/travis/build/scikit-hep/uproot/.eggs/scikit_hep_testdata-0.1.3-py3.7.egg/skhep_testdata/remote_files.py:37: in load_remote_configs

with open(_remote_dataset_cfg, "r") as infile:

E NotADirectoryError: [Errno 20] Not a directory: '/home/travis/build/scikit-hep/uproot/.eggs/scikit_hep_testdata-0.1.3-py3.7.egg/skhep_testdata/remote_datasets.yml'

!!!!!!!!!!!!!!!!!!! Interrupted: 1 errors during collection !!!!!!!!!!!!!!!!!!!!

=========================== 1 error in 1.29 seconds ============================
``

This happens with a call that is not al all necessary for any of the calls of testdate Jim is doing, which only use files under data/.

Now I really go offline ;-).

from scikit-hep-testdata.

benkrikler avatar benkrikler commented on June 18, 2024

@eduardo-rodrigues could you put that test somewhere (eg open a PR with it)? It's hard to know what you're looking at otherwise

from scikit-hep-testdata.

benkrikler avatar benkrikler commented on June 18, 2024

I suggest agree with Eduardo's idea of swapping the remote-file and local-file precedence, that at least will bury bugs in the remote file module for cases of local files, which should suit Jims needs for now, I guess. Otherwise I don't think the precedence of local vs remote should be a concern anyway: two files with the same name but one considered "remote" the other "local" should probably be considered a bug in any case.

from scikit-hep-testdata.

benkrikler avatar benkrikler commented on June 18, 2024

@eduardo-rodrigues could you put that test somewhere (eg open a PR with it)? It's hard to know what you're looking at otherwise

Ah, apologies, that test is an Uproot test, I see it now.

from scikit-hep-testdata.

jpivarski avatar jpivarski commented on June 18, 2024

You're starting to create a virtual namespace (merging local and remote) that could potentially get large. It's worthwhile to think about how to organize it.

If you decide to move files that I'm using (in order to implement a grand hierarchy), just give me a heads-up and I'll adapt. We need to coordinate versions: I have to restrict uproot versions to particular scikit-hep-testdata versions (much as I have to restrict awkward and uproot-methods). There's a standard way to do this: if you introduce a breaking change in your namespace, create a x.y.0-rc1 version in which y is incremented. pip won't pick that up automatically (saving old uproot versions from breaking for new users). Then tell me about it, I'll update uproot to require >=x.y.0 and that uproot version will use the release candidates. pip treats release candidates as opt-in. When that works, you can release a x.y.0 without the "rc-N" and things move smoothly from there.

from scikit-hep-testdata.

henryiii avatar henryiii commented on June 18, 2024

v0.2.0 is out, @jpivarski can you check to see if this is now better? Also test_requires is deprecated (since python setup.py test is being deprecated/discouraged), you should use [test] and add items to the test options, and run directly with pytest.

from scikit-hep-testdata.

eduardo-rodrigues avatar eduardo-rodrigues commented on June 18, 2024

Folks, let me close this at this stage since a fair bit changed in the package in the last year. If new issues show up with using the package, then better open a new issue.

from scikit-hep-testdata.

Related Issues (19)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.