
maxas's Introduction

DISCONTINUATION OF PROJECT

This project will no longer be maintained by Intel. Intel has ceased development and contributions including, but not limited to, maintenance, bug fixes, new releases, or updates, to this project. Intel no longer accepts patches to this project.

MaxAs

Assembler for NVIDIA Maxwell architecture

To install (system-wide):

sudo cpanm git://github.com/NervanaSystems/maxas.git

or

perl Makefile.PL
make
sudo make install
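
A typical round trip (the file names are only examples; the same commands appear in the issues below) is to extract the sass from an nvcc-built cubin, edit it, and insert it back:

nvcc -arch sm_50 -cubin microbench.cu
maxas.pl -e microbench.cubin > microbench.sass
# edit microbench.sass, then write it back into the cubin
maxas.pl -i microbench.sass microbench.cubin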

See the wiki pages for more information.

There is also related work with lots of additional shader assembly (sass) examples.

This project is released under the MIT License.

-- Scott Gray

maxas's People

Contributors

khosra, scttl, sfblackl-intel


maxas's Issues

TLDS RGBA instruction broken

The TLDS RGBA instruction has been broken since the March 4 check-in b744f88.

This breaks maxDNN.

The problem is that the following lines were removed from MaxasGrammar.pm:

TLDS: chnls
0x0010000000000000 RGBA

Problems with the MOV instruction and the "enhanced cubin support" commit

Hi,

I've encountered two problems. Both can be reproduced with the microbench application code by extracting the sass code and then trying to insert the unmodified code back into the kernel:

$ maxas.pl -e microbench.cubin microbench.sass
$ maxas.pl -i microbench.sass microbench.cubin

  1. When doing the insertion step, I get this:
    Unable to encode instruction: MOV R1, c[0x0];

It seems like something goes wrong in the extraction step here, because the second argument should have two address fields. This is what nvdisasm returns: MOV R1, c[0x0][0x20];
The error is removed by adding [0x20].

I've seen this problem in every kernel I've tried to re-insert after extracting it with maxas.

  2. After modifying the MOV instruction, this is the output from the insertion step:

Unknown Code 0x10 (size:0)

Unknown Code 0x18 (size:0)

Unknown Code 0x00 (size:0)

Unknown Code 0x00 (size:0)

Modified microbench CTAID Offsets: '' => '0018,0028'

Modified microbench Exit Offsets: '00e8,0000,0000' => '0138'

Modified microbench ParamSecSize: 100 => 88

Kernel: microbench, Instructions: 0, Register Count: 10, Bank Conflicts: 0, Reuse: 21.1% (4/19)

It seems strange that all these parameters are modified when the kernel is unchanged. And indeed, when running the application with the modified cubin I get a segmentation fault. I've tried different kernels with the same result. This doesn't happen if I roll back maxas to the commit before the enhanced cubin support was added and try the exact same thing, so I guess something goes wrong when the offsets are modified.

Thanks!

missing copyright on sgemm64.sass

I am releasing a derivative work of sgemm64.sass, but the file has no copyright information. Please add it so that proper attribution can be made.

Thanks!

microbench.cpp checks that minor is >= 2. Is this normal?

Per the message in microbench.cpp, minor should be >=0:

printf("No compute 5.0 device found, exiting.\n");

But the check is for >= 2:

if (major >= 5 && minor >= 2)

On a 940M (Maxwell), this fails to run, since the 940M is compute capability 5.0 only (I think?). But after modifying the if statement to accept minor >= 0, it runs ok:

$ ./microbench major 5 minor 0
Using: Id:0 GeForce 940M (5.0)

b:00 w:000 t:0000 l:00 clocks:00000034 out:00000000
b:00 w:001 t:0032 l:00 clocks:00000031 out:00000020
b:00 w:002 t:0064 l:00 clocks:00000032 out:00000040
b:00 w:003 t:0096 l:00 clocks:00000034 out:00000060
average: 32.750, min 31, max: 34
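
For reference, here is a minimal sketch of the loosened device check using the CUDA driver API; the function name and surrounding structure are illustrative, not the exact microbench.cpp code:

#include <cstdio>
#include <cuda.h>

// Pick the first Maxwell-class device (compute capability 5.x) instead of
// requiring 5.2 or newer. Returns the device ordinal, or -1 if none is found.
int findMaxwellDevice()
{
    int count = 0;
    cuInit(0);
    cuDeviceGetCount(&count);
    for (int i = 0; i < count; ++i)
    {
        CUdevice dev;
        int major = 0, minor = 0;
        cuDeviceGet(&dev, i);
        cuDeviceGetAttribute(&major, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR, dev);
        cuDeviceGetAttribute(&minor, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR, dev);
        if (major >= 5 && minor >= 0)   // was: minor >= 2
            return i;
    }
    printf("No compute 5.0 device found, exiting.\n");
    return -1;
}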

What is Max64-8?

Hi! Thank you for this repo! It is very helpful to me! I have a question: in the wiki, the last comparison figure has a Max64-8 or Max64-16. I am not sure what those mean. Could you tell me the meaning? Thank you!

for newbies: filling the missing bits in the documentation

So this is not an issue per se, but some added documentation for newbies to the CUDA assembler.

Somehow I found it challenging to follow the docs and code, so I first started by looking at the tid/address shifts and xors via a small C program, also printing the addresses in binary as mentioned.
Then I figured out that matrix A is stored non-transposed and B transposed, so the loading is the same for both, and how the FFMAs are actually accumulated, which is surprisingly missing from the documentation.
Finally I wrote three pages with some explanations so I don't forget them, and which could be helpful to others.

Greetings from Munich, Germany
Stefan

Question about broadcast of shared memory in SGEMM wiki

Hi:
I read through the SGEMM document and found that the following statement differs from Nvidia's documentation.

In the wiki: https://github.com/NervanaSystems/maxas/wiki/SGEMM
It was said:
How do you load from shared using quad vectors without bank conflicts? Well, according to the documentation, so long as all the accesses are within 32 words (128 bytes), we're fine.

from CUDA C PROGRAMMING GUIDE V7.5 =>section G.5.3. Shared Memory
It was said:
A shared memory request for a warp does not generate a bank conflict between two
threads that access any address within the same 32-bit word (even though the two
addresses fall in the same bank): In that case, for read accesses, the word is broadcast to
the requesting threads and for write accesses, each address is written by only one of the
threads (which thread performs the write is undefined).

The difference is between 32 words and a single 32-bit word. Which one is right?
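
The two statements describe different cases: the programming guide's broadcast rule is about every thread in a warp reading the same single 32-bit word, while the wiki's claim is about 128-bit (float4 / LDS.128) shared loads whose addresses all fall within the same 128 bytes. A small CUDA sketch of the two access patterns (this example is mine, not from the wiki):

extern "C" __global__ void bank_demo(const float *in, float *out)
{
    __shared__ __align__(16) float smem[256];
    int tid = threadIdx.x;
    smem[tid] = in[tid];
    __syncthreads();

    // Programming-guide broadcast case: every thread in the warp reads the
    // same 32-bit word, so the word is broadcast and no bank conflict occurs.
    float a = smem[0];

    // SGEMM-wiki case: each thread issues a 128-bit load. Here the 32 threads
    // of a warp touch only 8 distinct float4 values (128 bytes total), the
    // pattern the wiki claims is serviced without bank conflicts.
    float4 b = reinterpret_cast<float4 *>(smem)[(tid % 32) / 4];

    out[tid] = a + b.x + b.y + b.z + b.w;
}

Whether the wiki's 128-byte claim holds on the hardware is exactly the question being asked; the guide's broadcast rule itself is only stated per 32-bit word.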

"It is illegal to set a Read-After-Write dependency on a memory store op", on unmodified sass

Using microbench, with the .cu modified to be simply:

extern "C" __global__ void  microbench(int *out, int *clocks, int *in)
{
  out[0] = 7.0f;
  out[2] = 5.0f;
  clocks[1] = 9.0f;
}

The .cpp is modified slightly, since I'm on a 5.0 device, so I hacked microbench.cpp to accept it. Could this be the root cause of the issue?

        if (major >= 5 && minor >= 0)
        {

Then I do:

set -e

nvcc -l cuda -o microbench microbench.cpp
nvcc -arch sm_50 -cubin microbench.cu
maxas.pl -e microbench.cubin > microbench.sass

maxas.pl -i microbench.sass microbench.cubin
./microbench

... however it fails on the maxas.pl -i line with:

It is illegal to set a Read-After-Write dependency on a memory store op (store ops don't write to a register)
STG.E [R4], R6;

The .sass generated by maxas.pl -e looks like:

--:-:-:-:6      MOV R1, c[0x0][0x20];
--:-:-:-:1      MOV R0, param_1[0];
--:-:-:-:1      MOV R2, param_0[0];
--:-:-:-:4      MOV R3, param_0[1];
--:-:-:-:1      IADD32I R4.CC, R0, 0x4;
--:-:-:-:2      MOV32I R0, 0x7;
20:3:1:Y:7      IADD.X R5, RZ, param_1[1];
--:1:1:Y:4      STG.E [R4], R6;
--:-:-:Y:0      NOP;
--:-:-:Y:0      NOP;
--:-:-:Y:0      NOP;

Is this:

  • because I'm using a 5.0 system, and maxas only supports 5.2?
  • a bug in maxas.pl -e (since I can't see the 9.0f constant anywhere in the sass, I'm immediately suspicious)?
  • because I'm not using the latest version of maxas, which is e.g. in the neon repo?
  • something I'm doing wrong (other than the cc 5.0 thing)?
  • something else?
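
As for a workaround: if the only problem is the extracted control flags, one guess (assuming the wait:read:write:yield:stall field order described on the maxas Control Codes wiki page) would be to hand-edit the offending line so that only the read-dependency barrier stays set on the store, since a store never writes a register:

--:1:-:Y:4      STG.E [R4], R6;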
