
Comments (11)

cyberpython avatar cyberpython commented on May 14, 2024

Duplicate of bentoml/BentoML#3985

from openllm.

aarnphm avatar aarnphm commented on May 14, 2024

Quick question: can you check whether the path /opt/rocm/libexec/rocm_smi exists on your Linux machine?


cyberpython avatar cyberpython commented on May 14, 2024

Yep (it looks like I have two ROCm installations for some reason, one under /opt/rocm and another under /opt/rocm-5.5.0, with the latter being the one in use):

~ $ ls -al /opt/rocm/libexec/rocm_smi
total 176
drwxr-xr-x 2 root root   4096 Jun 23 00:50 .
drwxr-xr-x 6 root root   4096 Jun 24 14:28 ..
-rwxrwxr-x 1 root root 148963 Apr 19 03:03 rocm_smi.py
-rw-r--r-- 1 root root  19229 Apr 19 03:03 rsmiBindings.py
~ $ ls -al /opt/rocm-5.5.0/libexec/rocm_smi/
total 176
drwxr-xr-x 2 root root   4096 Jun 23 00:50 .
drwxr-xr-x 6 root root   4096 Jun 24 14:28 ..
-rwxrwxr-x 1 root root 148963 Apr 19 03:03 rocm_smi.py
-rw-r--r-- 1 root root  19229 Apr 19 03:03 rsmiBindings.py


aarnphm avatar aarnphm commented on May 14, 2024

Is /opt/rocm-5.5.0 symlinked back to /opt/rocm?
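One way to answer the symlink question without reading `ls` output by eye is a small Python check. This is only a sketch: the helper name `describe_rocm_path` is made up for illustration, and the two paths are simply the ones mentioned in this thread.

```python
import os


def describe_rocm_path(path):
    """Report whether `path` exists, whether it is a symlink, and where it resolves."""
    return {
        "path": path,
        "exists": os.path.exists(path),
        "is_symlink": os.path.islink(path),
        # realpath follows every symlink in the chain and returns the final target.
        "resolved": os.path.realpath(path),
    }


if __name__ == "__main__":
    # Paths taken from the discussion above; adjust them for your own install.
    for candidate in ("/opt/rocm", "/opt/rocm-5.5.0"):
        print(describe_rocm_path(candidate))
```

If both entries report the same `resolved` value, the two directories are the same installation reached through a symlink rather than two separate copies.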


aarnphm avatar aarnphm commented on May 14, 2024

I have added preliminary support for AMD GPUs on the main branch. You can try it out:

pip install git+https://github.com/bentoml/openllm@main


aarnphm avatar aarnphm commented on May 14, 2024

This has been included in 0.1.19.


cyberpython avatar cyberpython commented on May 14, 2024

Awesome!! Thanks!

I installed 0.1.19 from PyPI and tried to run the StarCoder model, but I still get the error "No GPU available, therefore this command is disabled." I was not expecting my GPU to be able to load StarCoder (not enough VRAM), but it should at least try.

Installation & execution steps:

~$ python3 -m venv env
~$ source env/bin/activate
~$ pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.4.2
~$ pip3 install openllm==0.1.19
~$ pip install "openllm[starcoder]"
~$ openllm start starcoder

PyTorch reports that the GPU is available:

~$ python3 -c "import torch;print(torch.cuda.is_available())"
True
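A slightly more detailed version of the one-liner above can help confirm that the installed wheel really is a ROCm build and which device PyTorch sees. This is a minimal sketch, not something from OpenLLM: the function name `gpu_report` is hypothetical, and it returns `None` when PyTorch is not installed so that it can run anywhere.

```python
def gpu_report():
    """Return a small dict describing what PyTorch sees, or None if torch is missing."""
    try:
        import torch
    except ImportError:
        return None
    available = torch.cuda.is_available()
    return {
        "available": available,
        # On ROCm builds this is a version string; on CUDA or CPU builds it is None.
        "hip_version": getattr(torch.version, "hip", None),
        "device_count": torch.cuda.device_count(),
        # Only query the device name when a device is actually usable.
        "device_name": torch.cuda.get_device_name(0) if available else None,
    }


if __name__ == "__main__":
    print(gpu_report())
```

On ROCm builds PyTorch exposes the GPU through the `torch.cuda` namespace, so `available` being `True` only shows that PyTorch itself can use the device; it says nothing about how another tool detects GPUs.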

rocminfo output:

~$ rocminfo 
ROCk module is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen 7 5825U with Radeon Graphics
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen 7 5825U with Radeon Graphics
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   2000                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            16                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    12351708(0xbc78dc) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    12351708(0xbc78dc) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    12351708(0xbc78dc) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    gfx90c                             
  Uuid:                    GPU-XX                             
  Marketing Name:          AMD Radeon Graphics                
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
    L2:                      1024(0x400) KB                     
  Chip ID:                 5607(0x15e7)                       
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   2000                               
  BDFID:                   1792                               
  Internal Node ID:        1                                  
  Compute Unit:            8                                  
  SIMDs per CU:            4                                  
  Shader Engines:          1                                  
  Shader Arrs. per Eng.:   1                                  
  WatchPts on Addr. Ranges:4                                  
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          64(0x40)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        40(0x28)                           
  Max Work-item Per CU:    2560(0xa00)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    4194304(0x400000) KB               
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx90c:xnack-   
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***
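Since the full rocminfo dump is long, the part that matters here (which agents are GPUs) can be pulled out with a short parser. The following is a sketch under assumptions: the function names `list_agents` and `gpu_agents` are hypothetical, the parsing relies only on the "Agent N", "Name:" and "Device Type:" lines as they appear in the output above, and the sample text is an abbreviated copy of that output.

```python
import re


def list_agents(rocminfo_text):
    """Return one dict per HSA agent holding its 'Name' and 'Device Type' fields."""
    agents = []
    current = None
    for line in rocminfo_text.splitlines():
        stripped = line.strip()
        # A line such as "Agent 2" starts a new agent section.
        if re.fullmatch(r"Agent \d+", stripped):
            current = {}
            agents.append(current)
            continue
        if current is None or ":" not in stripped:
            continue
        key, _, value = stripped.partition(":")
        key, value = key.strip(), value.strip()
        # Keep only the first 'Name' so the nested ISA name does not overwrite it.
        if key in ("Name", "Device Type") and key not in current:
            current[key] = value
    return agents


def gpu_agents(rocminfo_text):
    """Filter the parsed agents down to those reported as GPUs."""
    return [a for a in list_agents(rocminfo_text) if a.get("Device Type") == "GPU"]


# Abbreviated sample based on the output posted above.
SAMPLE = """\
*******
Agent 1
*******
  Name:                    AMD Ryzen 7 5825U with Radeon Graphics
  Device Type:             CPU
*******
Agent 2
*******
  Name:                    gfx90c
  Device Type:             GPU
  ISA Info:
    ISA 1
      Name:                    amdgcn-amd-amdhsa--gfx90c:xnack-
"""

if __name__ == "__main__":
    print(gpu_agents(SAMPLE))
```

For a live system, the text could instead come from running `rocminfo` and capturing its standard output. For the output above, the only GPU agent is gfx90c, the integrated Radeon graphics of the Ryzen 7 5825U.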


aarnphm avatar aarnphm commented on May 14, 2024

Can you set CUDA_VISIBLE_DEVICES=0?
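When trying this, it can help to confirm what the process actually sees, because the variable has to be set in the environment before the command starts. Below is a small sketch for printing the relevant variables. The helper name `visible_device_settings` is hypothetical; `HIP_VISIBLE_DEVICES` and `ROCR_VISIBLE_DEVICES` are included on the assumption that they are the ROCm-side counterparts worth checking, and nothing here reflects how OpenLLM itself reads them.

```python
import os

# CUDA_VISIBLE_DEVICES is the variable suggested in this thread; the other two
# are ROCm-side variables included here as an assumption about what to check.
VISIBILITY_VARS = ("CUDA_VISIBLE_DEVICES", "HIP_VISIBLE_DEVICES", "ROCR_VISIBLE_DEVICES")


def visible_device_settings(environ=None):
    """Map each visibility variable to its value, or None when it is unset."""
    env = os.environ if environ is None else environ
    return {name: env.get(name) for name in VISIBILITY_VARS}


if __name__ == "__main__":
    for name, value in visible_device_settings().items():
        print(f"{name}={value!r}")
```

Running it as `CUDA_VISIBLE_DEVICES=0 python3 check_env.py` (the file name is arbitrary) should show `'0'` for the first variable and `None` for any that are unset.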


cyberpython avatar cyberpython commented on May 14, 2024

Just tried it, no luck. Are there any logs or output I could provide that would help?

In any case, it could be an issue with my setup. If it works on another AMD setup, I suggest closing this issue.


aarnphm avatar aarnphm commented on May 14, 2024

I don't have my hands on an AMD GPU right now, so I'm just going off the assumption that ROCm is set up correctly.

amd_gpus = get_resource(resource_request, "amd.com/gpu")
This is where we handle the model.


aarnphm avatar aarnphm commented on May 14, 2024

This is only tentatively supported, since I don't really have access to the hardware to test it.

