tensor-compiler / taco Goto Github PK


1.2K 57.0 183.0 6.93 MB

The Tensor Algebra Compiler (taco) computes sparse tensor expressions on CPUs and GPUs

Home Page: http://tensor-compiler.org

License: Other

CMake 0.32% C++ 95.36% Ruby 0.06% C 0.07% Makefile 0.02% Python 3.82% Shell 0.36%

tensor tensor-algebra linear-algebra library code-generator sparse tensor-algebra-compiler

taco's People

shoaibkamil lugatod liviust stephenchouca willow-ahrens xuanhan863 hal2001 strategist922 lefromage bittdy khan-faiz benjamesbabala hbcbh1999 syoyo emrul c8pan stillwater-sc ggxxding eqqlyz backyes jamienoss menuka94 praveenmunagapati formulas-and-numbers neo4reo codeaudit happybaoliang leiqu007 ml-lab vseledkin francesco-bongiovanni jonike rsenapps lmd1993 wolf1981 limin2021 oicirtap johirbuet denglly yochju tmarkovich hungpham2017 gkanwar fredrikbk reiisky liuguoyou preejackie keithc-ca ychen306 pecanjk marek-sezemsky mbrukman sergeco brandonzhong jcschaffabm suzmue chandra1123 suganth1997 stelleg dongxiao92 denmak1 trevol eadonpeng seanbaxter luoyujun batermj yujunfeng yzh119 architectureofthings mtaillefumier xichuang penpornk yangdegang ct-clmsn shaolinbit amaleewilson kulinseth infinoid dcbdan jjolly liutongxuan jwshii vtlvs hochawa learncv abaaac advaypal mattlkf tewk guilhermeleobas quansight-labs roastduck rohany shawn-peng dbd64 changwan0000 nidhalrahali zyzyzy3zy heptagonhust wangyuyue

taco's Issues

Read/write test suite does not clean up temporary files

After running the test suite the following temporary files are left in the file system

diffcommand.tac
diffresult
rua_32.rb.csc

Any temporary files should be created in a temporary directory retrieved from the operating system, the same way Shoaib's C backend does it, and they should be deleted afterwards.

Perhaps a good way to do this is to have the file writers write to an ostream object? This is more flexible for a user, and it also allows the test suite to insert a listener that does not require writing the data to disk.
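A minimal sketch of the ostream idea, assuming a hypothetical writer `writeVectorCoordinates` (not an existing taco function) that emits one "index value" line per stored component. Because the writer only depends on `std::ostream`, a real caller could pass a `std::ofstream`, while a test passes a `std::ostringstream` and inspects the text without creating any file.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical writer: emits one "index value" line per stored component.
// It never opens files itself, so the caller decides where the bytes go.
void writeVectorCoordinates(std::ostream& out,
                            const std::vector<int>& idx,
                            const std::vector<double>& vals) {
  for (size_t k = 0; k < idx.size(); ++k) {
    out << idx[k] << " " << vals[k] << "\n";
  }
}

// What a test could do: capture the output in memory instead of on disk.
std::string writeToString(const std::vector<int>& idx,
                          const std::vector<double>& vals) {
  std::ostringstream buffer;
  writeVectorCoordinates(buffer, idx, vals);
  return buffer.str();
}
```

With this design nothing is left behind in the file system, because the test never touches it.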

No Examples Printed When Running Taco

License and comments for taco-tool

Generate a license comment for each use of taco-tool, probably something short referring to the MIT license and the taco URL.
The taco-tool could also have a "-nolicense" option to hide this comment.

Variable Declaration IR node

It would be nice to either add a VarDecl node or to be able to tag a VarAssign with whether it is a declaration or not.

This way we can emit more readable code, code that is easier to copy/modify, and it will also help us detect when we emit code that accesses variables in the wrong scope using fuzz testing.

Example:

for (int iB = 0; iB < 3; iB += 1) {
  B1_ptr = ((0 * 3) + iB);
  a1_ptr = ((0 * 3) + iB);

  tj = 0;
  for (int jB = 0; jB < 3; jB += 1) {
    B2_ptr = ((B1_ptr * 3) + jB);
    c1_ptr = ((0 * 3) + jB);

    tj = (tj + (B.vals[B2_ptr] * c.vals[c1_ptr]));
  }
  a.vals[a1_ptr] = tj;
}

// becomes

for (int iB = 0; iB < 3; iB += 1) {
  int B1_ptr = ((0 * 3) + iB);
  int a1_ptr = ((0 * 3) + iB);

  double tj = 0.0;
  for (int jB = 0; jB < 3; jB += 1) {
    int B2_ptr = ((B1_ptr * 3) + jB);
    int c1_ptr = ((0 * 3) + jB);

    tj = (tj + (B.vals[B2_ptr] * c.vals[c1_ptr]));
  }
  a.vals[a1_ptr] = tj;
}
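One way to picture the "tag a VarAssign" option is a flag on the assignment node that the C emitter consults. This is a simplified, hypothetical node (not taco's actual `ir::VarAssign`), with the right-hand side already printed to a string to keep the sketch short.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified IR node; not taco's real ir::VarAssign.
struct VarAssign {
  std::string type;  // C type emitted for declarations, e.g. "int"
  std::string name;  // variable name, e.g. "B1_ptr"
  std::string rhs;   // already-printed right-hand side expression
  bool isDecl;       // true: this assignment introduces the variable
};

// Emit C code for the node; only declarations get the type prefix,
// which keeps each variable scoped to the block that declares it.
std::string emit(const VarAssign& op) {
  std::string code = op.isDecl ? op.type + " " : "";
  return code + op.name + " = " + op.rhs + ";";
}
```

A declaration then prints as `int B1_ptr = ((0 * 3) + iB);`, while a later update such as the accumulation into `tj` prints without the type.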

Support for parsing parentheses

taco "a(i) = (b(i) + c(i)) * d(i)"
Expected tensor name
Segmentation fault: 11

Scalar variable and literal parsing

Add support for scalar literals in the parser.

Non-templated Tensor

The Tensor class should allow non-templated usage, e.g. Tensor A(taco::Double, {2,3}, csr);. This is because we only need template code for insertion, not for compute.

With this API we might be able to deprecate internal::Tensor, which would clean up our code.

Cyclic shared_ptr in TensorPath

The class TensorPath has a shared_ptr to a class that contains TensorPathStep instances. However, these instances have a copy of the TensorPath, which creates a shared_ptr cycle.
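A common way to break such a cycle is to make the back-reference a `std::weak_ptr`. The sketch below uses simplified stand-ins named after the classes in this issue (hypothetical, not taco's real definitions): the path owns its content through a `shared_ptr`, each step refers back through a `weak_ptr`, so the content is freed when the last `TensorPath` goes away.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Simplified, hypothetical stand-ins for taco's TensorPath classes.
struct PathContent;

struct TensorPathStep {
  std::weak_ptr<PathContent> path;  // non-owning back-reference: no cycle
  int step;
};

struct PathContent {
  std::vector<TensorPathStep> steps;
};

struct TensorPath {
  std::shared_ptr<PathContent> content;  // the only owning reference
};

TensorPath makePath(int numSteps) {
  TensorPath p{std::make_shared<PathContent>()};
  for (int i = 0; i < numSteps; ++i) {
    // Constructing a weak_ptr does not increase the use count.
    p.content->steps.push_back(TensorPathStep{p.content, i});
  }
  return p;
}
```

A step that needs its path calls `path.lock()`, which yields a temporary `shared_ptr` (or null if the path is already gone).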

taco "a(i) = b(i) + c(i)" -f=b:s -f=a:s

The generated code is incorrect: a1_ptr is not reset to zero after the first loop and before the second loop, so it can go out of bounds of a.vals:

for (int b1_ptr = b.d1.ptr[0]; b1_ptr < b.d1.ptr[(0 + 1)]; b1_ptr += 1) {
  ib = b.d1.idx[b1_ptr];
  c1_ptr = ((0 * 3) + ib);

  a.vals[a1_ptr] = (a.vals[a1_ptr] + (b.vals[b1_ptr] + c.vals[c1_ptr]));
  a1_ptr = (a1_ptr + 1);
}
for (int ic = 0; ic < 3; ic += 1) {
  c1_ptr = ((0 * 3) + ic);

  a.vals[a1_ptr] = (a.vals[a1_ptr] + c.vals[c1_ptr]);
  a1_ptr = (a1_ptr + 1);
}

Different generated code on different environment

There are two different versions of the generated code.
The first one is from my Dell Linux laptop with gcc540.
The second one is from Lanka with gcc540.

https://gist.github.com/Lugatod/3d8ddf73d1aedfaea15878b61558143a

To generate the second one on Lanka:

export CC=/data/scratch/lugato/gcc540/bin/gcc
export CXX=/data/scratch/lugato/gcc540/bin/g++
export LD_LIBRARY_PATH=/data/scratch/lugato/gcc540/lib64
compile taco with cmake ..
./bin/taco-test vector_add/alloc.storage/0

Common Iterator Elimination

We should merge common iterators before invoking the lowering machinery. That is, two iterators that belong to the same index variable and iterate over identical dimensions should be replaced with a single iterator.

Examples

for (int ib = 0; ib < 3; ib += 1) {
  b1_ptr = ((0 * 3) + ib);
  c1_ptr = ((0 * 3) + ib);
  a1_ptr = ((0 * 3) + ib);

  a.vals[a1_ptr] = (b.vals[b1_ptr] + c.vals[c1_ptr]);
}

// becomes

for (int i = 0; i < 3; i += 1) {
  i_ptr = ((0 * 3) + i);
  a.vals[i_ptr] = (b.vals[i_ptr] + c.vals[i_ptr]);
}

b1_ptr = b.d1.ptr[0];
ic = 0;
while (b1_ptr < b.d1.ptr[(0 + 1)]) {
  ib = b.d1.idx[b1_ptr];
  c1_ptr = ((0 * 3) + ic);
  a1_ptr = ((0 * 3) + ic);

  if (ib == ic) {
    a.vals[a1_ptr] = (b.vals[b1_ptr] + c.vals[c1_ptr]);
  }
  else {
    a.vals[a1_ptr] = c.vals[c1_ptr];
  }

  if (ib == ic)
    b1_ptr = (b1_ptr + 1);
  ic = (ic + 1);
}
while (ic < 3) {
  c1_ptr = ((0 * 3) + ic);
  a1_ptr = ((0 * 3) + ic);

  a.vals[a1_ptr] = c.vals[c1_ptr];

  ic = (ic + 1);
}

// becomes

b1_ptr = b.d1.ptr[0];
i = 0;
while (b1_ptr < b.d1.ptr[(0 + 1)]) {
  ib = b.d1.idx[b1_ptr];
  i_ptr = ((0 * 3) + i);

  if (ib == i) {
    a.vals[i_ptr] = (b.vals[b1_ptr] + c.vals[i_ptr]);
  }
  else {
    a.vals[i_ptr] = c.vals[i_ptr];
  }

  if (ib == i)
    b1_ptr = (b1_ptr + 1);
  i = (i + 1);
}
while (i < 3) {
  i_ptr = ((0 * 3) + i);
  a.vals[i_ptr] = c.vals[i_ptr];
  i = (i + 1);
}
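A sketch of how the merging decision might be made before lowering, assuming a hypothetical `Iterator` description (not taco's real iterator class) and treating only dense dimensions of equal size as identical, since those are the only set relationships taco currently knows about. Each iterator is mapped to a representative; iterators that share a representative can share one pointer variable, like `i_ptr` in the examples.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical description of one iterator; not taco's real Iterator class.
struct Iterator {
  std::string indexVar;  // index variable being iterated, e.g. "i"
  std::string tensor;    // tensor whose dimension is iterated, e.g. "b"
  bool dense;            // only dense dimensions are merged in this sketch
  int size;              // dimension size
};

// Returns, for each iterator, the position of the iterator it is merged
// into. Dense iterators with the same index variable and size share the
// first such iterator; sparse iterators always keep their own.
std::vector<int> commonIterators(const std::vector<Iterator>& its) {
  std::map<std::tuple<std::string, int>, int> firstSeen;
  std::vector<int> representative(its.size());
  for (int k = 0; k < (int)its.size(); ++k) {
    representative[k] = k;
    if (!its[k].dense) continue;
    auto key = std::make_tuple(its[k].indexVar, its[k].size);
    auto found = firstSeen.find(key);
    if (found == firstSeen.end()) {
      firstSeen[key] = k;
    } else {
      representative[k] = found->second;
    }
  }
  return representative;
}
```

For the first example (b, c and a all dense over i) all three iterators collapse into one; for the second (b sparse) only the c and a iterators merge, matching the rewritten code.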

Output dimension order wrong

taco "A(i,j) = B(i,k,l) * C(k,j) * D(l,j)" -f=B:sss -f=A:ss:10 -f=C:ss:10 -f=D:ss:10 -a

The generated assembly loop does not take into account the output dimension ordering.

taco tool prints empty assembly loops when the output is dense

It would be better if it emitted an empty function, or reported that the output is dense and so no assembly is needed.

Perhaps we have to actually emit code to allocate memory for the dense value arrays...

MTTKRP slower with dense output than with sparse output

The compute time of MTTKRP with dense output is approximately 10% higher than with sparse output.

Use Einstein convention by default

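Right now, you have to declare a variable as a summation variable, e.g. k("k", Var::Sum). It would be nice if that were the default, inferred from the expression: in A(i,j) = X(i,j,k) * v(k), k appears on the right-hand side but not on the left-hand side, so it should be summed over.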
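The inference rule can be sketched independently of taco's `Var` API: any index variable used on the right-hand side that does not appear on the left-hand side is a summation variable. Here index variables are plain strings and each right-hand-side tensor access is a list of them; `inferSumVars` is a hypothetical helper name.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Einstein convention: index variables that appear on the right-hand side
// but not on the left-hand side are summed over.
std::set<std::string> inferSumVars(
    const std::vector<std::string>& lhsVars,
    const std::vector<std::vector<std::string>>& rhsAccesses) {
  std::set<std::string> lhs(lhsVars.begin(), lhsVars.end());
  std::set<std::string> sum;
  for (const auto& access : rhsAccesses)
    for (const auto& v : access)
      if (!lhs.count(v)) sum.insert(v);
  return sum;
}
```

For A(i,j) = X(i,j,k) * v(k) this yields {k}, so k would no longer need an explicit Var::Sum.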

Taco should generate dimension-size-independent code

Currently, the code generation uses dimension sizes in the code. We should (perhaps optionally) also generate dimension-size-independent code, so that the same code can be used to e.g. do spmv() for a 10x25 matrix as well as a 100x25 matrix, etc.
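For comparison, this is roughly what a dimension-size-independent kernel looks like when written by hand: a C++ CSR SpMV where the row count is a runtime argument rather than a baked-in constant like the 3s in the generated code in these issues. The function name and argument order are assumptions for illustration, not taco output.

```cpp
#include <cassert>
#include <vector>

// Hand-written CSR SpMV, y = A * x, with the number of rows passed in at
// run time, so the same code serves a 10x25 and a 100x25 matrix.
void spmv(int numRows,
          const std::vector<int>& rowPtr,   // size numRows + 1
          const std::vector<int>& colIdx,   // column of each nonzero
          const std::vector<double>& vals,  // value of each nonzero
          const std::vector<double>& x,
          std::vector<double>& y) {
  for (int i = 0; i < numRows; ++i) {
    double sum = 0.0;
    for (int p = rowPtr[i]; p < rowPtr[i + 1]; ++p) {
      sum += vals[p] * x[colIdx[p]];
    }
    y[i] = sum;
  }
}
```

The column count never appears explicitly, because it is implied by the column indices and the size of x.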

Compute code emission incorrect

We need proper compute code emission at and below the last free variable. The current compute code emission scheme emits code into the wrong loop, causing excessive computation. In the code below, the value a(i) is added too many times.

taco "A(i,j) = B(i,j) * c(j) + a(i)"
for (int iB = 0; iB < 3; iB += 1) {
  B1_ptr = ((0 * 3) + iB);
  a1_ptr = ((0 * 3) + iB);
  A1_ptr = ((0 * 3) + iB);

  t11 = a.vals[a1_ptr];
  for (int jB = 0; jB < 3; jB += 1) {
    B2_ptr = ((B1_ptr * 3) + jB);
    c1_ptr = ((0 * 3) + jB);
    A2_ptr = ((A1_ptr * 3) + jB);

    A.vals[A2_ptr] = (A.vals[A2_ptr] + ((B.vals[B2_ptr] * c.vals[c1_ptr]) + t11));
  }
}

It should be something like this:

for (int iB = 0; iB < 3; iB += 1) {
  B1_ptr = ((0 * 3) + iB);
  a1_ptr = ((0 * 3) + iB);
  A1_ptr = ((0 * 3) + iB);

  ti = a.vals[a1_ptr];
  tk = 0.0;
  for (int jB = 0; jB < 3; jB += 1) {
    B2_ptr = ((B1_ptr * 3) + jB);
    c1_ptr = ((0 * 3) + jB);
    A2_ptr = ((A1_ptr * 3) + jB);

    tk += B.vals[B2_ptr] * c.vals[c1_ptr];
  }
  A.vals[A2_ptr] = tk + ti;
}

This means implementing all three cases of compute code emission (above, at, and below the last free variable).

Kronecker products

Make sure we support Kronecker products of matrices, and add them to the linear algebra API.
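For reference, the Kronecker product of an m x n matrix A and a p x q matrix B is the (m*p) x (n*q) matrix with C[i*p + k][j*q + l] = A[i][j] * B[k][l]. In index notation that is the order-4 outer product C(i,k,j,l) = A(i,j) * B(k,l) with (i,k) and (j,l) fused into single row and column indices. A dense, row-major C++ sketch of the definition (not a taco API):

```cpp
#include <cassert>
#include <vector>

// Dense Kronecker product with row-major storage.
// A is m x n, B is p x q, and the result is (m*p) x (n*q).
std::vector<double> kron(const std::vector<double>& A, int m, int n,
                         const std::vector<double>& B, int p, int q) {
  std::vector<double> C(m * p * n * q);
  int cols = n * q;  // row stride of the result
  for (int i = 0; i < m; ++i)
    for (int j = 0; j < n; ++j)
      for (int k = 0; k < p; ++k)
        for (int l = 0; l < q; ++l)
          C[(i * p + k) * cols + (j * q + l)] = A[i * n + j] * B[k * q + l];
  return C;
}
```

Supporting this in taco would mostly be a question of whether the fused row and column indices can be expressed, since the arithmetic itself is a plain product.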

a(i) = b(i) * b(i)

$taco "a(i) = b(i) * b(i)"
/var/folders/90/tqlq4g3j7t1cpscxp2rn62nr0000gn/T/mvfeyknabhu5.c:8:10: error: redefinition of 'b'
  void** b = &(inputPack[4]);
         ^
/var/folders/90/tqlq4g3j7t1cpscxp2rn62nr0000gn/T/mvfeyknabhu5.c:7:10: note: previous definition is here
  void** b = &(inputPack[2]);
         ^
/var/folders/90/tqlq4g3j7t1cpscxp2rn62nr0000gn/T/mvfeyknabhu5.c:30:10: error: redefinition of 'b'
  void** b = &(inputPack[4]);
         ^
/var/folders/90/tqlq4g3j7t1cpscxp2rn62nr0000gn/T/mvfeyknabhu5.c:29:10: note: previous definition is here
  void** b = &(inputPack[2]);
         ^
2 errors generated.
Error in compile in file /Users/fred/Dropbox/projects/tensor-compiler/taco/src/backends/backend_c.cpp:617
 Compilation command failed:
cc -O3 -ffast-math -std=c99 -shared -fPIC /var/folders/90/tqlq4g3j7t1cpscxp2rn62nr0000gn/T/mvfeyknabhu5.c -o /var/folders/90/tqlq4g3j7t1cpscxp2rn62nr0000gn/T/mvfeyknabhu5.so
returned 256
Abort trap: 6

Integers with `Tensor::operator()`

We should be able to use integers to index into Tensor::operator(). Ideally, A(0,1) reads coordinate (0,1) and A(0,i) is an index expression that holds the first dimension fixed. (This might not work out; it needs to be explored.)

So in addition to writing index expressions you can also use integers (and eventually a mix):

IndexVar i, j;
A(i,j) = B(i,j); // Index expression
A(0,1) = 1;      // Insertion
A(0,i);          // Read the first row?

Additional element-wise operations

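Perhaps users could supply an operation along with whether it is disjunctive (computed over the union of the operands' nonzeros) or conjunctive (computed over their intersection).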

Equality testing between two tensors
Powers of every element of one tensor
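A sketch of what the disjunctive/conjunctive flag would control, on sparse vectors stored as sorted (index, value) pairs. The function `elementwise` and its signature are hypothetical. A conjunctive operation (like multiplication) only produces results where both operands are stored; a disjunctive one (like addition) produces results where either is stored, treating the missing side as zero.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Sparse vector as (index, value) pairs sorted by index.
using SparseVec = std::vector<std::pair<int, double>>;

// Merge two sparse vectors with a user-supplied operation.
// disjunctive == true: iterate over the union of stored indices.
// disjunctive == false: iterate over their intersection.
SparseVec elementwise(const SparseVec& a, const SparseVec& b,
                      const std::function<double(double, double)>& op,
                      bool disjunctive) {
  SparseVec out;
  size_t i = 0, j = 0;
  while (i < a.size() || j < b.size()) {
    bool takeA = j == b.size() || (i < a.size() && a[i].first < b[j].first);
    bool takeB = i == a.size() || (j < b.size() && b[j].first < a[i].first);
    if (!takeA && !takeB) {  // same index stored in both operands
      out.push_back({a[i].first, op(a[i].second, b[j].second)});
      ++i;
      ++j;
    } else if (takeA) {      // index stored only in a
      if (disjunctive) out.push_back({a[i].first, op(a[i].second, 0.0)});
      ++i;
    } else {                 // index stored only in b
      if (disjunctive) out.push_back({b[j].first, op(0.0, b[j].second)});
      ++j;
    }
  }
  return out;
}
```

Equality testing and element-wise powers would each be one more `op` plus the right flag.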

Support adding tensor of different orders

What is the most reasonable semantics for a(i) = b(i) + c? Does it mean:

that we add c to every non-zero component in b, or
that we add c to every component in b?

This is the same question as what is the most reasonable semantics for A(i,j) = B(i,j) + c(i).

If it is the latter, we need to stop treating scalars as a special case that is ignored in the merge lattices and iteration schedules and simply multiplied in during compute. Instead, we must merge with the scalar, treating it as a dense space.
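The two candidate semantics differ only in the components b does not store. A small sketch with b stored as (index, value) pairs over a dimension of size n and a dense result (the helper names are hypothetical):

```cpp
#include <cassert>
#include <utility>
#include <vector>

using SparseVec = std::vector<std::pair<int, double>>;

// Semantics 1: add c only to the stored (non-zero) components of b.
std::vector<double> addToNonzeros(const SparseVec& b, int n, double c) {
  std::vector<double> a(n, 0.0);  // unstored components stay zero
  for (const auto& entry : b) a[entry.first] = entry.second + c;
  return a;
}

// Semantics 2: add c to every component of b, i.e. c behaves like a
// dense vector that is merged with b.
std::vector<double> addToEveryComponent(const SparseVec& b, int n, double c) {
  std::vector<double> a(n, c);    // every unstored component becomes 0 + c
  for (const auto& entry : b) a[entry.first] = entry.second + c;
  return a;
}
```

The second version is what merging with the scalar as a dense space would produce; note that its result is dense even though b is sparse.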

a(i) = B(i,j)*c(j) + d(i)

taco "a(i) = B(i,j)*c(j) + d(i)" -f=B:ss -f=c:s
Compiler bug at /Users/fred/Dropbox/projects/tensor-compiler/taco/src/lower/merge_lattice.cpp:183 in make
Please report it to developers
 Condition failed: lattice.getSize() > 0
 Every merge lattice should have at least one lattice point
Abort trap: 6

User-defined dimension set relationships

It could be useful to let the user specify set relationships between non-dense dimensions. This means specifying whether two sparse dimensions are the same set, or whether one is a subset (non-strict) of the other.

We use such properties to optimize, but at the moment we only do these optimizations for dense dimensions (same sets) and dense-sparse dimensions (superset/subset relationship).

Permission issue on Lanka

When I run taco on Lanka I get a permission error.

$ taco "a(i) = b(i) * c(i)"
/usr/bin/ld: cannot open output file /tmp/38jtrdwqmvy7.so: Permission denied
collect2: error: ld returned 1 exit status
Error in compile in file /afs/csail.mit.edu/u/f/fred/projects/taco/src/backends/backend_c.cpp:617
 Compilation command failed:
cc -O3 -ffast-math -std=c99 -shared -fPIC /tmp/38jtrdwqmvy7.c -o /tmp/38jtrdwqmvy7.so
returned 256
Aborted

Eclipse C++ analysis errors

3 errors in lower.cpp

'lower' is ambiguous '
Candidates are:
std::vector<taco::ir::Stmt,std::allocator<taco::ir::Stmt>> lower(const taco::Expr &, const taco::Var &, taco::lower::Context &)
taco::ir::Stmt lower(const taco::internal::Tensor &, std::__cxx11::basic_string<char,std::char_traits<char>,std::allocator<char>>, const std::set<enum taco::lower::Property,std::less<enum taco::lower::Property>,std::allocator<enum taco::lower::Property>> &)
' lower.cpp /Taco/src/lower line 220 Semantic Error

Invalid arguments '
Candidates are:
void append(std::vector<#0,std::allocator<#0>> &, const #1 &)
void append(std::vector<#0,std::allocator<#0>> &, const std::initializer_list<#0> &)
' lower.cpp /Taco/src/lower line 221 Semantic Error

Values array zero-initialization

Currently, the values array is zero-initialized as part of the assembly step. This gives incorrect results if compute is run multiple times without redoing assembly. Instead, zero-initialization should be done as part of the compute step.
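A sketch of the proposed behavior on a hand-written kernel for a(i) = B(i,j) * c(j) with dense, row-major B (the `compute` function here is hypothetical, not generated code). Because the kernel accumulates into the output, it zeroes the output itself at the start of compute, so running compute twice without re-assembly gives the same result.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical compute kernel for a(i) = sum_j B(i,j) * c(j).
void compute(std::vector<double>& aVals, const std::vector<double>& B,
             const std::vector<double>& c, int rows, int cols) {
  // Zero-initialize in the compute step, not in assembly.
  std::fill(aVals.begin(), aVals.end(), 0.0);
  for (int i = 0; i < rows; ++i)
    for (int j = 0; j < cols; ++j)
      aVals[i] += B[i * cols + j] * c[j];  // accumulates into the output
}
```

Without the `std::fill`, the second call would return twice the correct values, which is the accumulation problem described in "Tensor values not zeroed out".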

ASSERT_TENSOR_EQ

We want an ASSERT_TENSOR_EQ function in test.h, something like:

template <typename T>
void ASSERT_TENSOR_EQ(const Tensor<T>& expect, const Tensor<T>& actual) {
  SCOPED_TRACE("expected: {" + util::join(expect) + "}");
  SCOPED_TRACE("  actual: {" + util::join(actual) + "}");
  // ...
}

Inferred dimensions

If the user sets the dimensions of one tensor, the default sizes of other dimensions indexed by the same index variables should be set to match. E.g. taco "a(i) = B(i,j) * c(i)" -d=B:4,3 should set the default size of dimensions indexed by i to 4 and of those indexed by j to 3.

If the user gives incompatible options, those options should still be applied, which should then lead to an error due to #37.
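The inference can be sketched as two steps: record a size for each index variable from the tensor whose dimensions were given explicitly (e.g. -d=B:4,3 for B(i,j)), then use those sizes as defaults for every dimension indexed by the same variable. The helper names and the `defaultSize` fallback are hypothetical, not the taco tool's actual code.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Step 1: map each index variable of the explicitly sized tensor to its size.
std::map<std::string, int> inferSizes(const std::vector<std::string>& knownAccess,
                                      const std::vector<int>& knownDims) {
  std::map<std::string, int> sizes;
  for (size_t k = 0; k < knownAccess.size(); ++k)
    sizes[knownAccess[k]] = knownDims[k];
  return sizes;
}

// Step 2: default dimensions of another access from the inferred sizes,
// falling back to defaultSize for variables with no known size.
std::vector<int> dimsFor(const std::vector<std::string>& access,
                         const std::map<std::string, int>& sizes,
                         int defaultSize) {
  std::vector<int> dims;
  for (const auto& var : access) {
    auto it = sizes.find(var);
    dims.push_back(it != sizes.end() ? it->second : defaultSize);
  }
  return dims;
}
```

In the example above, both c(i) and a(i) would default to size 4. Explicit options that disagree would override these defaults and then fail the consistency check.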

Mixing Dense and Sparse in 'a(i) = b(i) + c(i)'

The generated code for taco "a(i) = b(i) + c(i)" -f=a:d -f=b:d -f=c:s is wrong.

ib = (ib + 1);
if (ic == ib)
   c1_ptr = (c1_ptr + 1);

should be

if (ic == ib)
   c1_ptr = (c1_ptr + 1);
ib = (ib + 1);

Repeated use of an index variable in a tensor access

taco "a(i) = A(i,i)"
Compiler bug at /Users/fred/Dropbox/projects/tensor-compiler/taco/src/lower/iteration_schedule_forest.cpp:90 in IterationScheduleForest
Please report it to developers
 Condition failed: levels.size() == vertices.size()

Abort trap: 6

Want the ability to print merge lattices

If the taco tool could print merge lattices that would help development.

Result ptr variable not initialized properly

Compiling the generated code with a debug option leads to a crash in the test suite, e.g.:
export TACO_CFLAGS="-g"   (or "-O0")

regex_error on lanka

Can't run taco-test on Lanka as we get a regex error. Upgrade Lanka or fix taco?

taco-test
[==========] Running 216 tests from 40 test cases.
[----------] Global test environment set-up.
[----------] 14 tests from BackendCTests
[ RUN      ] BackendCTests.GenEmptyFunction
regex_error
[  FAILED  ] BackendCTests.GenEmptyFunction
[ RUN      ] BackendCTests.GenMin
regex_error
[  FAILED  ] BackendCTests.GenMin
[ RUN      ] BackendCTests.GenPrint
regex_error
[  FAILED  ] BackendCTests.GenPrint
[ RUN      ] BackendCTests.GenCommentAndBlankLine
regex_error
[  FAILED  ] BackendCTests.GenCommentAndBlankLine
[ RUN      ] BackendCTests.GenEmptyFunctionWithOutput
regex_error
[  FAILED  ] BackendCTests.GenEmptyFunctionWithOutput
[ RUN      ] BackendCTests.GenStore
regex_error
[  FAILED  ] BackendCTests.GenStore
[ RUN      ] BackendCTests.GenVarAssign
regex_error
[  FAILED  ] BackendCTests.GenVarAssign
[ RUN      ] BackendCTests.GenFor
regex_error
[  FAILED  ] BackendCTests.GenFor
[ RUN      ] BackendCTests.GenCase
regex_error
[  FAILED  ] BackendCTests.GenCase
[ RUN      ] BackendCTests.GenWhile
regex_error
[  FAILED  ] BackendCTests.GenWhile
[ RUN      ] BackendCTests.BuildModule
/usr/bin/ld: cannot open output file /tmp/38jtrdwqmvy7.so: Permission denied
collect2: error: ld returned 1 exit status
Error in compile in file /afs/csail.mit.edu/u/f/fred/projects/taco/src/backends/backend_c.cpp:617
 Compilation command failed:
cc -O3 -ffast-math -std=c99 -shared -fPIC /tmp/38jtrdwqmvy7.c -o /tmp/38jtrdwqmvy7.so
returned 256
Aborted

Filling bug

Test the expression taco "a(i) = b(i) + c(i)" -g=b:d -g=c:d in the branch filling-bug.

Tensor values not zeroed out

Values are not zeroed out before each compute, so in kernels that increment into the output, repeated compute calls keep accumulating into the result.

Tensor broadcast

taco "A(i,j) = b(i)"
Error at /Users/fred/Dropbox/projects/tensor-compiler/taco/src/tensor.cpp:524 in setExpr:
 All variables must appear on the right-hand-side of an assignment. This restriction will be removed in the future.
Expression: A(i,j) = b(i)
Abort trap: 6

llvm backend (online)

We need to develop an llvm backend in the same style as Simit/Halide. This backend should support compiling the taco IR to llvm IR and then have llvm compile the llvm IR to a function pointer.

SpMV performance with dense-sparse matrices

For some of the matrices in the SuiteSparse Matrix Collection, SpMV performance with dense-sparse matrices might be improved.

Build Failure When Spaces In Path Name

If there are spaces in the path name, the build fails.

Header file taco.h

The taco build system should make a taco.h header file that contains all the files that are needed by the user (e.g. tensor.h, format.h, operator.h, expr.h, var.h).

One option is to write a small utility to do this along the lines of Halide's build_halide_h that also removes #include lines.
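A sketch of the core of such a utility, in the spirit of Halide's build_halide_h but not that tool's code: concatenate header texts and drop `#include "..."` lines, since those refer to headers that are themselves being inlined, while keeping system includes. It assumes the headers are passed in dependency order; a real tool would also have to work out that order.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Concatenate headers (assumed to be in dependency order) and remove local
// #include "..." lines. System includes such as #include <vector> are kept.
std::string amalgamate(const std::vector<std::string>& headers) {
  std::ostringstream out;
  for (const auto& text : headers) {
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
      auto start = line.find_first_not_of(" \t");
      bool localInclude = start != std::string::npos &&
                          line.compare(start, 10, "#include \"") == 0;
      if (!localInclude) out << line << "\n";
    }
  }
  return out.str();
}
```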

Repeated use of tensors in an index expression fails in the backend

taco "a(i) = b(i) + b(i)"
/var/folders/90/tqlq4g3j7t1cpscxp2rn62nr0000gn/T/mvfeyknabhu5.c:11:10: error: redefinition of 'b'
  void** b = &(inputPack[4]);
         ^
/var/folders/90/tqlq4g3j7t1cpscxp2rn62nr0000gn/T/mvfeyknabhu5.c:10:10: note: previous definition is here
  void** b = &(inputPack[2]);
         ^
/var/folders/90/tqlq4g3j7t1cpscxp2rn62nr0000gn/T/mvfeyknabhu5.c:37:10: error: redefinition of 'b'
  void** b = &(inputPack[4]);
         ^
/var/folders/90/tqlq4g3j7t1cpscxp2rn62nr0000gn/T/mvfeyknabhu5.c:36:10: note: previous definition is here
  void** b = &(inputPack[2]);
         ^
2 errors generated.
Error in compile in file /Users/fred/Dropbox/projects/tensor-compiler/taco/src/backends/module.cpp:66
 Compilation command failed:
cc -O3 -ffast-math -std=c99 -shared -fPIC /var/folders/90/tqlq4g3j7t1cpscxp2rn62nr0000gn/T/mvfeyknabhu5.c -o /var/folders/90/tqlq4g3j7t1cpscxp2rn62nr0000gn/T/mvfeyknabhu5.so
returned 256
Abort trap: 6

Repeated use of an index variable in a tensor access

We should support diagonal accesses:

taco "A(i,i) = b(i) * c(i)"
Compiler bug at /Users/fred/Dropbox/projects/tensor-compiler/taco/src/lower/iteration_schedule_forest.cpp:88 in IterationScheduleForest
Please report it to developers
 Condition failed: levels.size() == vertices.size()

Abort trap: 6

taco "a(i) = A(i,i)"
Compiler bug at /Users/fred/Dropbox/projects/tensor-compiler/taco/src/lower/iteration_schedule_forest.cpp:90 in IterationScheduleForest
Please report it to developers
 Condition failed: levels.size() == vertices.size()

Abort trap: 6

Tensor Value printing

We can't print the values returned by dereferencing Tensor iterators. For instance, the following doesn't work:

cout << util::join(tensor) << endl;
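A likely fix is to give the iterator's value type an `operator<<`. The sketch assumes, hypothetically, that dereferencing a tensor iterator yields a (coordinate, value) pair, modeled here as `Component`, and uses a minimal stand-in for `util::join`; neither is taco's actual definition.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Assumed iterator value type: coordinates plus the stored value.
using Component = std::pair<std::vector<int>, double>;

// Prints a component as "(i,j):value".
std::ostream& operator<<(std::ostream& os, const Component& c) {
  os << "(";
  for (size_t k = 0; k < c.first.size(); ++k)
    os << (k ? "," : "") << c.first[k];
  return os << "):" << c.second;
}

// Minimal stand-in for util::join: elements separated by ", ".
template <typename Container>
std::string join(const Container& items, const std::string& sep = ", ") {
  std::ostringstream out;
  bool first = true;
  for (const auto& item : items) {
    if (!first) out << sep;
    out << item;  // requires operator<< for the element type
    first = false;
  }
  return out.str();
}
```

The operator has to be declared before the join template (or be findable by argument-dependent lookup) for `out << item` to compile.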

a(i) = B(i,j) + c(j), where B is dense-sparse

Segmentation fault.

backtrace:

    frame #1: 0x0000000100114306 libtaco.dylib`taco::Var::operator=(taco::Var const&) [inlined] std::__1::shared_ptr<taco::Var::Content>::swap(this=0x00007fff5fbfb840, __r=nullptr) + 12 at memory:4713
    frame #2: 0x00000001001142fa libtaco.dylib`taco::Var::operator=(taco::Var const&) [inlined] std::__1::shared_ptr<taco::Var::Content>::operator=(this=0x0000001102011e50, __r=std::__1::shared_ptr<taco::Var::Content>::element_type @ 0x00000001004005f0 strong=3 weak=1) + 151 at memory:4601
    frame #3: 0x0000000100114263 libtaco.dylib`taco::Var::operator=(this=0x0000001102011e50, (null)=0x00000001004008b0) + 67 at var.h:10
    frame #4: 0x0000000100136c41 libtaco.dylib`taco::lower::IterationSchedule::make(this=0x00007fff5fbfbe20, op=0x0000000100400860)::CollectTensorPaths::visit(taco::internal::Read const*) + 465 at iteration_schedule.cpp:72
    frame #5: 0x000000010019ef9e libtaco.dylib`taco::internal::Read::accept(this=0x0000000100400860, v=0x00007fff5fbfbe20) const + 30 at expr_nodes.h:26
    frame #6: 0x00000001000e97b3 libtaco.dylib`taco::Expr::accept(this=0x0000000100400ac0, v=0x00007fff5fbfbe20) const + 51 at expr.cpp:22
    frame #7: 0x00000001000f19eb libtaco.dylib`taco::internal::ExprVisitor::visit(this=0x00007fff5fbfbe20, op=0x0000000100400ab0) + 43 at expr_visitor.cpp:60
    frame #8: 0x00000001000f187f libtaco.dylib`taco::internal::ExprVisitor::visit(this=0x00007fff5fbfbe20, op=0x0000000100400ab0) + 47 at expr_visitor.cpp:28
    frame #9: 0x00000001000f0d5e libtaco.dylib`taco::internal::Add::accept(this=0x0000000100400ab0, v=0x00007fff5fbfbe20) const + 30 at expr_nodes.h:103
    frame #10: 0x00000001000e97b3 libtaco.dylib`taco::Expr::accept(this=0x00007fff5fbfbec8, v=0x00007fff5fbfbe20) const + 51 at expr.cpp:22
    frame #11: 0x0000000100134a37 libtaco.dylib`taco::lower::IterationSchedule::make(tensor=0x00007fff5fbfe828) + 1191 at iteration_schedule.cpp:81
    frame #12: 0x0000000100169fb5 libtaco.dylib`taco::lower::lower(tensor=0x00007fff5fbfe828, funcName="assemble", properties=size=1) + 6901 at lower.cpp:445
    frame #13: 0x00000001000fcb16 libtaco.dylib`taco::internal::Tensor::compile(this=0x00007fff5fbfe828) + 2422 at internal_tensor.cpp:400
    frame #14: 0x0000000100003c5f taco`main(argc=3, argv=0x00007fff5fbff488) + 7919 at taco.cpp:397
    frame #15: 0x00007fffc0b0e255 libdyld.dylib`start + 1
    frame #16: 0x00007fffc0b0e255 libdyld.dylib`start + 1

Linear Algebra API

The tensor algebra compiler supports tensor index notation. Tensor index notation can be used to do linear algebra, but for convenience we ought to have a linear algebra API as well.

This API (linalg.h) can be built on top of the current functionality (tensor.h). It should define types (Scalar, Vector, Matrix), facilities for converting between these types and tensors, and the usual linear algebra operations (addition, subtraction, multiplication with scalars, vectors and matrices). The goal is to provide something roughly as convenient as Eigen.

It should also support blocked linear algebra, where the user can define and compute with blocked vectors and matrices (see also #59).

for (int b1_ptr = b.d1.ptr[0]; b1_ptr < b.d1.ptr[(0 + 1)]; b1_ptr += 1) {
  ib = b.d1.idx[b1_ptr];
  c1_ptr = ((0 * 3) + ib);
  a1_ptr = ((0 * 3) + ib);

  a.vals[a1_ptr] = (a.vals[a1_ptr] + (b.vals[b1_ptr] + c.vals[c1_ptr]));
}
for (int ic = 0; ic < 3; ic += 1) {
  c1_ptr = ((0 * 3) + ic);
  a1_ptr = ((0 * 3) + ic);

  a.vals[a1_ptr] = (a.vals[a1_ptr] + c.vals[c1_ptr]);
}