Comments (17)

corneliusboehm avatar corneliusboehm commented on June 9, 2024

Hi @tjb-tech! First of all, is your GPU properly set up with CUDA? You can verify that by running nvidia-smi and checking the reported CUDA version.

If you enable the --use_gpu option on the train_classifier.py script, it will automatically use the GPU at index 0 for training and nvidia-smi should list a new python process.

from sense.

tjb-tech avatar tjb-tech commented on June 9, 2024

First of all, thank you very much for your timely reply. Following your suggestion, we confirmed that nvidia-smi does list a new Python process. However, CPU utilization is close to 100%, while GPU utilization is only around 1%. Could you tell me why? Thank you very much.

corneliusboehm avatar corneliusboehm commented on June 9, 2024

The problem with video datasets is that loading and decoding of the videos can get expensive. So it can happen that the update step of the model on the GPU is done faster than the preparation of the next batch, which leads to the CPU being utilized more than the GPU.
However, a utilization of 1% is really low. Could you verify with the following command during training if the utilization is constantly that low or if there are at least some periodic spikes?

watch -n 1 nvidia-smi
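
If watching the output by eye is inconvenient, the same check can be scripted. The following is only a rough sketch: it assumes nvidia-smi is on the PATH and uses its CSV query mode, and the helper names and the 20% spike threshold are made up for illustration.

```python
import subprocess
import time
from typing import List

# Query only the GPU utilization, one bare integer (percent) per GPU.
QUERY = ["nvidia-smi", "--query-gpu=utilization.gpu", "--format=csv,noheader,nounits"]


def parse_utilization(output: str) -> List[int]:
    """Parse one utilization percentage per GPU from the query output."""
    return [int(line.strip()) for line in output.splitlines() if line.strip()]


def summarize(samples: List[int], spike_threshold: int = 20) -> str:
    """Say whether utilization stayed low or showed spikes (threshold is arbitrary)."""
    if not samples:
        return "no samples"
    spikes = sum(1 for s in samples if s >= spike_threshold)
    if spikes == 0:
        return f"constantly low (max {max(samples)}%)"
    return f"{spikes}/{len(samples)} samples at or above {spike_threshold}%"


def sample_gpu0(seconds: int = 30) -> List[int]:
    """Record the utilization of the GPU at index 0 once per second."""
    samples = []
    for _ in range(seconds):
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
        samples.append(parse_utilization(out)[0])
        time.sleep(1)
    return samples
```

Calling print(summarize(sample_gpu0())) while training is running gives a one-line answer to the "constantly low or periodic spikes" question.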

And could you send over your CPU and GPU specs?
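
To make the bottleneck argument above concrete, here is a deliberately simplified model of the training loop, not a measurement of sense itself. It assumes batch preparation on the CPU overlaps perfectly with the GPU update step; the function name and all timings are hypothetical.

```python
def estimated_gpu_utilization(load_s: float, step_s: float, workers: int = 1) -> float:
    """Rough fraction of wall-clock time the GPU spends computing.

    Simplified model (an assumption, not measured behaviour): `workers`
    parallel CPU workers each need `load_s` seconds to prepare one batch,
    and the GPU needs `step_s` seconds for one update step.
    """
    if load_s < 0 or step_s <= 0 or workers < 1:
        raise ValueError("expected load_s >= 0, step_s > 0 and workers >= 1")
    # With perfect overlap, a new batch becomes ready every load_s / workers seconds.
    batch_interval = load_s / workers
    # Each iteration lasts as long as the slower of the two stages.
    iteration = max(batch_interval, step_s)
    return step_s / iteration


if __name__ == "__main__":
    # Hypothetical numbers: 2 s to decode a batch of videos, 0.05 s per GPU step.
    for workers in (1, 4, 8):
        util = estimated_gpu_utilization(load_s=2.0, step_s=0.05, workers=workers)
        print(f"{workers} worker(s): ~{util:.1%} GPU utilization")
```

Under these made-up numbers, a single loading worker keeps the GPU busy only a few percent of the time, which is the kind of pattern described above: the CPU is saturated with decoding while the GPU mostly waits.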

tjb-tech avatar tjb-tech commented on June 9, 2024

First of all, thank you very much for taking the time to answer my questions. My CPU is an i5-9300H and my GPU is a GTX 1650. Screenshots of nvidia-smi before and after starting training, and of my CPU and GPU utilization, are below. Thank you again for your prompt reply. Looking forward to hearing from you soon.
[three screenshots: nvidia-smi before and after starting training, CPU and GPU utilization]

corneliusboehm avatar corneliusboehm commented on June 9, 2024

Thanks for the info! It looks like the Python process is allocating some memory on the GPU, which is a good sign. Do you see any other output on the console of the epochs being finished? And do you get a resulting checkpoint?

Generally, does training in PyTorch work for you in other projects?
One more thing you could check is the following:

import torch
torch.cuda.is_available()

I must admit that we haven't tested our code on Windows in a while, so there might also be a platform-related issue.
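
A slightly more detailed version of that check could look like the following. It is a sketch that only uses standard PyTorch calls; the function name describe_cuda and the dictionary keys are chosen here for illustration.

```python
def describe_cuda() -> dict:
    """Collect basic facts about how PyTorch sees the GPU."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    info = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }
    if info["cuda_available"]:
        info["device_0_name"] = torch.cuda.get_device_name(0)
        # Run a tiny computation on GPU 0 to confirm that work actually executes there.
        x = torch.ones(8, device="cuda:0")
        info["test_sum"] = float(x.sum().item())
    return info


if __name__ == "__main__":
    for key, value in describe_cuda().items():
        print(f"{key}: {value}")
```

If cuda_available is True and test_sum is 8.0, PyTorch can reach the GPU, and low utilization is more likely caused by data loading than by the CUDA setup.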

corneliusboehm avatar corneliusboehm commented on June 9, 2024

Hey @tjb-tech, have you been able to resolve your issue?

tjb-tech avatar tjb-tech commented on June 9, 2024

Thank you very much for following up, and I'm sorry for not replying sooner. We ran torch.cuda.is_available() and it returns True, but the GPU and CPU usage are still the same as before, so I think it may be a platform compatibility issue that you may want to look into. By the way, my system is Windows 10.

corneliusboehm avatar corneliusboehm commented on June 9, 2024

Thanks for the update. Do you still get a checkpoint after training and how long does it take? Because if that generally works, I would go ahead and close this issue for now.

tjb-tech avatar tjb-tech commented on June 9, 2024

Thanks for your reply. That's all about the GPU problem for the time being. I'm now trying to run your sense_studio code, but we ran into a first problem:

* Serving Flask app "sense_studio" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: on
 * Restarting with stat
 * Debugger is active!
 * Debugger PIN: 105-309-328
 * Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)

I tried the following workaround:

from gevent import pywsgi

# `app` is the Flask application object (assumed to be defined earlier in the script)
if __name__ == '__main__':
    server = pywsgi.WSGIServer(('0.0.0.0', 5000), app)
    server.serve_forever()

With that change the server starts, but then I encountered the following problem:
[screenshot]

[2021-04-06 13:19:30,225] ERROR in app: Exception on / [GET]
Traceback (most recent call last):
  File "D:\Anaconda\envs\sense\lib\site-packages\flask\app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "D:\Anaconda\envs\sense\lib\site-packages\flask\app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "D:\Anaconda\envs\sense\lib\site-packages\flask\app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "D:\Anaconda\envs\sense\lib\site-packages\flask\_compat.py", line 39, in reraise
    raise value
  File "D:\Anaconda\envs\sense\lib\site-packages\flask\app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "D:\Anaconda\envs\sense\lib\site-packages\flask\app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "D:/MyDocuments/Service Outsourcing/sense/tools/sense_studio/sense_studio.py", line 46, in projects_overview
    project['exists'] = os.path.exists(project['path'])
TypeError: 'bool' object is not subscriptable
127.0.0.1 - - [2021-04-06 13:19:30] "GET / HTTP/1.1" 500 490 0.004986
127.0.0.1 - - [2021-04-06 13:19:30] "GET /favicon.ico HTTP/1.1" 404 420 0.000999

corneliusboehm avatar corneliusboehm commented on June 9, 2024

The Flask startup messages you posted first (including the development server warning) are regular output and not relevant when running this application locally. You don't need to worry about setting up a WSGI server.

If the second error (the TypeError) persists, I will have to take a look at that though 😕 In the meantime, can you send me the contents of your sense/tools/sense_studio/projects_config.json?

tjb-tech avatar tjb-tech commented on June 9, 2024

Of course, my file looks like this:
[screenshot of projects_config.json]

corneliusboehm avatar corneliusboehm commented on June 9, 2024

Very interesting. This is either a very outdated format or an error occurred. Anyway, I would recommend deleting this file and trying again. Also you might want to pull our latest master branch, as we've recently added a few improvements.
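
For anyone hitting the same TypeError: 'bool' object is not subscriptable, the traceback suggests that each project entry is expected to be a dictionary with a 'path' key, and that the file contained a boolean where such an entry was expected. The sketch below shows one way to spot such entries before deleting the file. It assumes the config is a JSON object mapping project names to entries; the real file format is not shown in this thread, and the function names are hypothetical.

```python
import json
import os


def split_projects(projects: dict):
    """Separate usable project entries from malformed ones.

    Assumption: a usable entry is a dict containing a 'path' key, as implied
    by the line `project['exists'] = os.path.exists(project['path'])`.
    """
    valid, invalid = {}, []
    for name, project in projects.items():
        if isinstance(project, dict) and "path" in project:
            # Copy the entry and record whether its path exists on disk.
            valid[name] = dict(project, exists=os.path.exists(project["path"]))
        else:
            invalid.append(name)
    return valid, invalid


def load_projects(config_path: str):
    """Read the config file and report which entries look malformed."""
    with open(config_path, encoding="utf-8") as f:
        return split_projects(json.load(f))
```

For example, split_projects({"demo": {"path": "."}, "broken": True}) keeps "demo" and reports "broken" as invalid, which is the situation the traceback points to.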

tjb-tech avatar tjb-tech commented on June 9, 2024

Thank you so much for your timely help. We have opened Sense Studio, created our own project, and uploaded our own data. However, we can't click the Training button; the browser only shows javascript:void(0); as shown in the screenshot below.
[screenshot]

I would appreciate it very much if you could answer my questions

corneliusboehm avatar corneliusboehm commented on June 9, 2024

Yes, the training module was only added a few days ago, so this feature should be enabled for you after pulling our latest updates.

tjb-tech avatar tjb-tech commented on June 9, 2024

I am very sorry that I have not been able to continue this discussion recently because I have been busy. Your last suggestion worked very well, thank you. Over the past two days I reviewed your project again and carefully read the blog on the 20BN website. I noticed the following test screen in your demo video, which was very impressive.
[screenshot, 2021-04-17 10:42 AM]
I want to achieve this effect on my computer. Could you please tell me how this test page was built, and could you share that part? My heartfelt thanks in advance! Once again, I would like to express my admiration for your open source spirit.

guillaumebrg avatar guillaumebrg commented on June 9, 2024

Hey @tjb-tech, thank you for the kind words!

That specific demo which you found on our website is kind of old and wasn't obtained using sense. We haven't released this exact model with this exact set of classes. However, we've recently been working on providing a gesture control demo within sense which might do what you need. It's still work in progress (model weights haven't been released yet) but you can already have a look here: #149.

corneliusboehm avatar corneliusboehm commented on June 9, 2024

It looks like the original issue has been solved, so I'm going to close this thread now.
We're happy to keep supporting you, if more questions come up!
