
Comments (12)

jcwchen commented on July 20, 2024 2

Yes, we should remove the actual model path from onnx.hub (updated by #5267) instead of model_name, as @justinchuby suggested, since the CI will easily break if it runs out of space.

However, all 184 models seemed workable 2 weeks ago: https://github.com/onnx/onnx/actions/runs/8776485339/job/24080049585, so there might be another issue. Still, we can help @ramkrishna2910 add the "test ONNX Model Zoo" label to his PR to test the fix in advance. I can help review the PR as well. Thanks!

from onnx.

justinchuby commented on July 20, 2024 1

> I can submit a fix for that.

Much appreciated! I think running the tests on all models is OK, given that the CI is triggered weekly. I find using tempfile (https://github.com/microsoft/onnxscript/blob/03b55e3cd2aeb5603b4d880a6beb02333af3974a/tools/ir/model_zoo_test/model_zoo_test.py#L28-L32) and multiprocessing (https://github.com/microsoft/onnxscript/blob/03b55e3cd2aeb5603b4d880a6beb02333af3974a/tools/ir/model_zoo_test/model_zoo_test.py#L105) helpful.
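As a rough illustration of how those two pieces can fit together (this is a sketch under assumptions, not the code at the links above): each model is handled inside its own temporary directory, which is deleted as soon as the check finishes, and the per-model work is spread over a pool of worker processes. The names `check_one` and `run_all` are hypothetical, and the bytes written to disk are a placeholder for a real model download.

```python
import multiprocessing
import os
import tempfile


def check_one(model_name):
    """Process one model inside a private temporary directory.

    The bytes written here are a hypothetical stand-in for a real download.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        model_path = os.path.join(tmp_dir, "model.onnx")
        with open(model_path, "wb") as f:
            f.write(b"\0" * 16)  # placeholder payload
        size = os.path.getsize(model_path)
    # Leaving the `with` block deletes tmp_dir and everything in it,
    # so no model file outlives its own check.
    return model_name, size, tmp_dir


def run_all(model_names, processes=2):
    """Fan the per-model work out over a pool of worker processes."""
    with multiprocessing.Pool(processes) as pool:
        return pool.map(check_one, model_names)
```

On platforms where worker processes are spawned rather than forked, `run_all` should be called from under an `if __name__ == "__main__":` guard.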


ramkrishna2910 commented on July 20, 2024

Hi @justinchuby
Thanks for pointing this out. Let me take a look at this.


justinchuby commented on July 20, 2024

The lines

            # remove the model to save space in CIs
            if os.path.exists(model_name):
                os.remove(model_name)

may not be functioning correctly, because I don't believe model_name is the actual path to the downloaded model.
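To make the concern concrete, here is a minimal sketch with no ONNX dependency. If a downloader stores the file under a cache directory, calling `os.path.exists` on the bare model name only looks in the current working directory, so the removal is silently skipped. Everything here is hypothetical: `fake_download`, the cache layout, and the model name are stand-ins and do not reflect how onnx.hub actually names or places files.

```python
import os
import tempfile


def fake_download(model_name, cache_dir):
    """Hypothetical stand-in for a hub download.

    Writes a small file into cache_dir and returns the path it used.
    """
    path = os.path.join(cache_dir, f"{model_name}.onnx")
    with open(path, "wb") as f:
        f.write(b"\0" * 16)
    return path


def cleanup_by_name(model_name):
    """Mirrors the quoted lines: treats the display name as a path."""
    if os.path.exists(model_name):
        os.remove(model_name)
        return True
    return False


def cleanup_by_path(model_path):
    """Removes the file at the path where it was actually stored."""
    if os.path.exists(model_path):
        os.remove(model_path)
        return True
    return False


def demo():
    """Return (removed_by_name, file_still_present, removed_by_path)."""
    with tempfile.TemporaryDirectory() as cache_dir:
        path = fake_download("hypothetical-model-xyz", cache_dir)
        removed_by_name = cleanup_by_name("hypothetical-model-xyz")
        still_there = os.path.exists(path)
        removed_by_path = cleanup_by_path(path)
        return removed_by_name, still_there, removed_by_path
```

In this sketch the name-based cleanup reports that nothing was removed while the file is still on disk, which is the kind of quiet no-op that would let a cache grow across many models.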


ramkrishna2910 commented on July 20, 2024

I am running the test locally and I don't see any issues so far after running through 20 models. I will let the test run all the way to see where it fails.
I don't see an exception captured on CI for this failure, which is surprising.


justinchuby commented on July 20, 2024

It could be the disk running out of space if the models are not properly cleared.
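One way to check this hypothesis would be to log the free disk space between models, so that a steady decline shows up in the CI output even when no exception is captured. A minimal sketch using the standard library's `shutil.disk_usage`; the helper names `free_gib` and `report` are hypothetical and not part of the existing test.

```python
import shutil


def free_gib(path="."):
    """Return the free space, in GiB, on the filesystem containing path."""
    return shutil.disk_usage(path).free / 2**30


def report(stage, path="."):
    """Build a one-line log message for the given stage."""
    return f"[{stage}] free disk: {free_gib(path):.2f} GiB"
```

Printing `report(model_name)` after each model is processed would be enough to tell whether the cleanup step is actually freeing space.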


ramkrishna2910 commented on July 20, 2024

Yeah, that's likely. We are downloading 184 models.


justinchuby commented on July 20, 2024

The test can be updated using this example: https://github.com/microsoft/onnxscript/pull/1489/files


ramkrishna2910 commented on July 20, 2024

I was able to run the entire test on my local machine (it took a while). The test completed without any failures but, as you suspected, the cache is not being deleted.
I can submit a fix for that. Also, do we want to run this test on all 184 models? I believe we can reduce the number of models in this flow to a few representative ones. Thoughts?


justinchuby commented on July 20, 2024

Forgot to add the label


justinchuby commented on July 20, 2024

It now runs fine. Thanks @ramkrishna2910! There are four failures:

--------------Time used: 0.31854987144470215 secs-------------
In all 184 models, 4 models failed, 25 models were skipped
ResNet-preproc failed because: Field 'type' of 'value_info' is required but missing.
VGG 16-bn failed because: /Users/runner/work/onnx/onnx/onnx/version_converter/adapters/transformers.h:35: operator(): Assertion `node->i(attr) == value` failed: Attribute spatial must have value 1
VGG 19-bn failed because: /Users/runner/work/onnx/onnx/onnx/version_converter/adapters/transformers.h:35: operator(): Assertion `node->i(attr) == value` failed: Attribute spatial must have value 1
SSD-MobilenetV1 failed because: [ShapeInferenceError] Inference error(s): (op_type:Loop, node name: generic_loop_Loop__48): [ShapeInferenceError] Inference error(s): (op_type:If, node name: Postprocessor/BatchMultiClassNonMaxSuppression/map/while/PadOrClipBoxList/cond_If__115): [ShapeInferenceError] Inference error(s): (op_type:Concat, node name: Postprocessor/BatchMultiClassNonMaxSuppression/map/while/PadOrClipBoxList/cond/concat): [ShapeInferenceError] All inputs to Concat must have same rank. Input 1 has rank 1 != 2


justinchuby commented on July 20, 2024

I will create a separate issue for these failures.

