
maxas's Introduction

DISCONTINUATION OF PROJECT

This project will no longer be maintained by Intel. Intel has ceased development and contributions including, but not limited to, maintenance, bug fixes, new releases, or updates, to this project. Intel no longer accepts patches to this project.

MaxAs

Assembler for NVIDIA Maxwell architecture

To install (system-wide):

sudo cpanm git://github.com/NervanaSystems/maxas.git

or

perl Makefile.PL
make
sudo make install
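
A typical round trip (the file names are only examples; the same commands appear in the issues below) is to extract the sass from an nvcc-built cubin, edit it, and insert it back:

nvcc -arch sm_50 -cubin microbench.cu
maxas.pl -e microbench.cubin > microbench.sass
# edit microbench.sass, then write it back into the cubin
maxas.pl -i microbench.sass microbench.cubin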

See the wiki pages for more information.

There is also related work with lots of additional shader assembly (sass) examples.

This project is released under the MIT License.

-- Scott Gray

maxas's People

Contributors

khosra, scttl, sfblackl-intel


maxas's Issues

TLDS RGBA instruction broken

The TLDS RGBA instruction has been broken since the March 4 check-in b744f88.

This breaks maxDNN.

The problem is that the following lines were removed from MaxasGrammar.pm:

TLDS: chnls
0x0010000000000000 RGBA

Problems with the MOV instruction and the "enhanced cubin support" commit

Hi,

I've encountered two problems. Both can be reproduced with the microbench application code by extracting the sass code and then trying to insert the unmodified code back into the kernel:

$ maxas.pl -e microbench.cubin microbench.sass
$ maxas.pl -i microbench.sass microbench.cubin

  1. When doing the insertion step, I get this:
    Unable to encode instruction: MOV R1, c[0x0];

It seems like something goes wrong in the extraction step here, because the second argument should have two address fields. This is what nvdisasm returns: MOV R1, c[0x0][0x20];
The error is removed by adding [0x20].

I've seen this problem in every kernel I've tried to re-insert after extracting it with maxas.

  2. After modifying the MOV instruction, this is the output from the insertion step:

Unknown Code 0x10 (size:0)

Unknown Code 0x18 (size:0)

Unknown Code 0x00 (size:0)

Unknown Code 0x00 (size:0)

Modified microbench CTAID Offsets: '' => '0018,0028'

Modified microbench Exit Offsets: '00e8,0000,0000' => '0138'

Modified microbench ParamSecSize: 100 => 88

Kernel: microbench, Instructions: 0, Register Count: 10, Bank Conflicts: 0, Reuse: 21.1% (4/19)

It seems strange that all these parameters are modified when the kernel is unchanged. And indeed, when running the application with the modified cubin I get a segmentation fault. I've tried different kernels with the same result. This doesn't happen if I roll back maxas to the commit before the enhanced cubin support was added and try the exact same thing, so I guess something goes wrong when the offsets are modified.

Thanks!

missing copyright on sgemm64.sass

I am releasing a derivative work of sgemm64.sass, but the file has no copyright information. Please add it so that proper attribution can be made.

Thanks!

microbench.cpp checks that minor is >= 2. Is this normal?

Per the message in microbench.cpp, minor should be >=0:

printf("No compute 5.0 device found, exiting.\n");

But the check is for >= 2:

if (major >= 5 && minor >= 2)

On a 940M (Maxwell), this fails to run, since the 940M is compute capability 5.0 only (I think?). But after modifying the if statement to accept minor >= 0, it runs ok:

$ ./microbench major 5 minor 0
Using: Id:0 GeForce 940M (5.0)

b:00 w:000 t:0000 l:00 clocks:00000034 out:00000000
b:00 w:001 t:0032 l:00 clocks:00000031 out:00000020
b:00 w:002 t:0064 l:00 clocks:00000032 out:00000040
b:00 w:003 t:0096 l:00 clocks:00000034 out:00000060
average: 32.750, min 31, max: 34
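
For reference, here is a minimal sketch of the loosened device check using the CUDA driver API; the function name and surrounding structure are illustrative, not the exact microbench.cpp code:

#include <cstdio>
#include <cuda.h>

// Pick the first Maxwell-class device (compute capability 5.x) instead of
// requiring 5.2 or newer. Returns the device ordinal, or -1 if none is found.
int findMaxwellDevice()
{
    int count = 0;
    cuInit(0);
    cuDeviceGetCount(&count);
    for (int i = 0; i < count; ++i)
    {
        CUdevice dev;
        int major = 0, minor = 0;
        cuDeviceGet(&dev, i);
        cuDeviceGetAttribute(&major, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR, dev);
        cuDeviceGetAttribute(&minor, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR, dev);
        if (major >= 5 && minor >= 0)   // was: minor >= 2
            return i;
    }
    printf("No compute 5.0 device found, exiting.\n");
    return -1;
}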

What is Max64-8?

Hi! Thank you for this repo! It is very helpful to me! I have a question: in the wiki, the last comparison figure has a Max64-8 or Max64-16. I am not sure what those mean. Could you tell me the meaning? Thank you!

for newbies: filling the missing bits in the documentation

So this is not an issue per se, but some added documentation for newbies to the CUDA assembler.

Somehow I found it challenging to follow the docs and code, so I first started by looking at the tid/address shifts and xors via a small C program, also printing the addresses in binary as mentioned.
Then I figured out that matrix A is stored non-transposed and B transposed, so the loading is the same for both, and how the FFMAs are actually accumulated, which is surprisingly missing from the documentation.
Finally I wrote three pages with some explanations so I don't forget them, and which could be helpful to others.

Greetings from Munich, Germany
Stefan

Question about broadcast of shared memory in SGEMM wiki

Hi:
I read through the SGEMM document and found that the following statement differs from Nvidia's documentation.

In the wiki: https://github.com/NervanaSystems/maxas/wiki/SGEMM
It was said:
How do you load from shared using quad vectors without bank conflicts? Well, according to the documentation, so long as all the accesses are within 32 words (128 bytes), we're fine.

from CUDA C PROGRAMMING GUIDE V7.5 =>section G.5.3. Shared Memory
It was said:
A shared memory request for a warp does not generate a bank conflict between two
threads that access any address within the same 32-bit word (even though the two
addresses fall in the same bank): In that case, for read accesses, the word is broadcast to
the requesting threads and for write accesses, each address is written by only one of the
threads (which thread performs the write is undefined).

The difference is between 32 words and a single 32-bit word. Which one is right?
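
The two statements describe different cases: the programming guide's broadcast rule is about every thread in a warp reading the same single 32-bit word, while the wiki's claim is about 128-bit (float4 / LDS.128) shared loads whose addresses all fall within the same 128 bytes. A small CUDA sketch of the two access patterns (this example is mine, not from the wiki):

extern "C" __global__ void bank_demo(const float *in, float *out)
{
    __shared__ __align__(16) float smem[256];
    int tid = threadIdx.x;
    smem[tid] = in[tid];
    __syncthreads();

    // Programming-guide broadcast case: every thread in the warp reads the
    // same 32-bit word, so the word is broadcast and no bank conflict occurs.
    float a = smem[0];

    // SGEMM-wiki case: each thread issues a 128-bit load. Here the 32 threads
    // of a warp touch only 8 distinct float4 values (128 bytes total), the
    // pattern the wiki claims is serviced without bank conflicts.
    float4 b = reinterpret_cast<float4 *>(smem)[(tid % 32) / 4];

    out[tid] = a + b.x + b.y + b.z + b.w;
}

Whether the wiki's 128-byte claim holds on the hardware is exactly the question being asked; the guide's broadcast rule itself is only stated per 32-bit word.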

"It is illegal to set a Read-After-Write dependency on a memory store op", on unmodified sass

Using microbench, with the .cu modified to be simply:

extern "C" __global__ void  microbench(int *out, int *clocks, int *in)
{
  out[0] = 7.0f;
  out[2] = 5.0f;
  clocks[1] = 9.0f;
}

The .cpp is modified slightly, since I'm on a 5.0 device, so I hacked microbench.cpp to accept it. Could this be the root cause of the issue?

        if (major >= 5 && minor >= 0)
        {

Then I do:

set -e

nvcc -l cuda -o microbench microbench.cpp
nvcc -arch sm_50 -cubin microbench.cu
maxas.pl -e microbench.cubin > microbench.sass

maxas.pl -i microbench.sass microbench.cubin
./microbench

... however it fails on the maxas.pl -i line with:

It is illegal to set a Read-After-Write dependency on a memory store op (store ops don't write to a register)
STG.E [R4], R6;

The .sass generated by maxas.pl -e looks like:

--:-:-:-:6      MOV R1, c[0x0][0x20];
--:-:-:-:1      MOV R0, param_1[0];
--:-:-:-:1      MOV R2, param_0[0];
--:-:-:-:4      MOV R3, param_0[1];
--:-:-:-:1      IADD32I R4.CC, R0, 0x4;
--:-:-:-:2      MOV32I R0, 0x7;
20:3:1:Y:7      IADD.X R5, RZ, param_1[1];
--:1:1:Y:4      STG.E [R4], R6;
--:-:-:Y:0      NOP;
--:-:-:Y:0      NOP;
--:-:-:Y:0      NOP;

Is this:

  • because I'm using a 5.0 system, and maxas only supports 5.2?
  • a bug in maxas.pl -e (since I can't see the 9.0f constant anywhere in the sass, I'm immediately suspicious)?
  • because I'm not using the latest version of maxas, which is e.g. in the neon repo?
  • something I'm doing wrong (other than the cc 5.0 thing)?
  • something else?
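
As for a workaround: if the only problem is the extracted control flags, one guess (assuming the wait:read:write:yield:stall field order described on the maxas Control Codes wiki page) would be to hand-edit the offending line so that only the read-dependency barrier stays set on the store, since a store never writes a register:

--:1:-:Y:4      STG.E [R4], R6;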
