ginkgo-project / ginkgo
Numerical linear algebra software package
Home Page: https://ginkgo-project.github.io/
License: BSD 3-Clause "New" or "Revised" License
It almost never makes sense to pass a null pointer to one of Ginkgo's methods. There's usually no way to continue and the best we can do is just throw an exception. However, we don't ensure this behavior anywhere yet, so passing a null pointer will result in a crash when we try to dereference it.
The proposed C++ way (Sutter's Mill, also suggested by @mhoemmen) is to use a reference (`&`) instead of a pointer for objects that cannot be null (assuming we are not giving away any ownership).
I must say I don't like this approach, as it has some drawbacks (listed in order of importance, so if you are convinced by statement n, you can skip statements m > n):

1. With a `unique_ptr`, a `shared_ptr`, or just a plain pointer, we access members with `->`; with a reference, we need to start using `.`. This non-uniformity makes it more difficult to write code that doesn't depend on the kind of ownership we have (and may be confusing for some inexperienced users).
2. The `gko::clone` utility proposed in #30 only needs a single implementation, which internally calls `obj->clone()` independent of what kind of ownership the object has. Supporting references would require writing another version of the method that calls `obj.clone()` instead.
3. Consider the following code:

```cpp
std::unique_ptr<LinOp> x = /* something */;
// do 1st thing with x (multiple statements, using x->something())
// do 2nd thing with x (multiple statements, using x->something())
```

Suppose I want to extract the first thing into `void f1(LinOp &x)`, and the second thing into `void f2(LinOp &y)`. I didn't change the functionality, but I have to find and replace all `x->something()` with `x.something()`, just because I want to assert that `x` cannot be null.
4. Consider the following code:

```cpp
void f(LinOp &x) {
    // do thing 1 with x
    // do thing 2 with x
}
```

Suppose I realize that "thing 1" should not modify `x`, and should instead work on a temporary copy of `x`. I decide to add `auto tmp = gko::clone(x)` and use `tmp` instead. Now I certainly have to replace every occurrence of `x` with `tmp`, but I also have to change how I call things (`.` with `->`), which can be a bit confusing.

Rather than mixing references/pointers just to express whether the argument can be null, I would suggest adding a decorator that throws a runtime error if the object is null. Here is the pseudocode (it would probably have to be refined a bit to handle special cases, e.g. plain pointers):
```cpp
template <typename Pointer>
class non_null {
public:
    template <typename... Args>
    non_null(Args &&... args) : p(std::forward<Args>(args)...)
    {
        if (p == nullptr) {
            throw std::runtime_error("Expected a non-null pointer");
        }
    }
    auto get() const { return p.get(); }  // has to be specialized for plain pointers
    auto operator*() const { return *p; }
    auto operator->() const { return p; }
    operator Pointer &() const { return p; }  // maybe explicit?

private:
    mutable Pointer p;
};
```
We could also publicly inherit from `Pointer` instead, to provide all the same methods as `Pointer` does (we have to be careful not to inherit from plain pointers).
Then we could use:
```cpp
std::unique_ptr<LinOp> LinOp::generate(non_null<std::shared_ptr<const LinOp>> op);
void LinOp::apply(non_null<const LinOp *> b, non_null<LinOp *> x);
// we currently don't have a function in Ginkgo that takes a unique pointer, so making it up:
void my_sink(non_null<std::unique_ptr<LinOp>> op);
```
I think it's pretty self-documenting that all of these expect a non-null pointer, maybe even more so than passing by reference, and we get a run-time check that the pointer is actually not null.
In `examples/simple_solver.cpp`, should

```cpp
auto x = gko::read<vec>("data/x.mtx", exec);
```

instead be

```cpp
auto x = gko::read<vec>("data/x0.mtx", exec);
```

since the `data` subdirectory contains `x0.mtx` but not `x.mtx`?
COO spmv kernels need to work on the output vector `c` first.
The COO CUDA kernels only compute `c = A*b + c`. Thus, in spmv (`c = A*b = A*b + 0`), we set `c` to zero first to ensure that the kernel works correctly.
In the Hybrid format, the spmv function is composed of the ELL and COO spmv functions:

```cpp
ell_mtx->apply(b, x);        // x = (ELL) * b
coo_mtx->apply(1, b, 1, x);  // x = 1 * x + 1 * (COO) * b
```

The second command spends time multiplying `x` by 1, which is redundant.
Does it cause any performance issue?
If it does, maybe we need an axpy-like function `c = A*b + c` instead of calling the advanced spmv:

```cpp
ell_mtx->apply(b, x);  // x = (ELL) * b
coo_mtx->axpy(b, x);   // x = x + (COO) * b
```
- `gko::matrix::Hybrid` class template in `core/matrix/hybrid.hpp`
- `core/matrix/hybrid.cpp`
- `NOT_IMPLEMENTED` stubs
- `core/test/matrix/hybrid.cpp`
The SpMV comparison benchmark causes warnings due to destructors throwing exceptions:
```
/home/gflegar/Sources/ginkgo/public/repo/benchmark/spmv_comparison_cuda/spmv_comparison_cuda.cpp: In destructor ‘CuspCsrmp::~CuspCsrmp()’:
/home/gflegar/Sources/ginkgo/public/repo/core/base/exception_helpers.hpp:195:64: warning: throw will always call terminate() [-Wterminate]
::gko::CusparseError(__FILE__, __LINE__, __func__, _errcode)
```
This is because destructors are marked `noexcept` by default. To change the default, resolve the warnings, and prevent the code from crashing if an exception is thrown, the destructor should be marked with `noexcept(false)`. For example:
```cpp
struct A {
    ~A() noexcept(false) { throw std::runtime_error("No warnings here"); }
};
```
(Original issue posted by @hartwiganzt on private repo)
Now that we already have dense and ELL matrices, I conclude from a discussion with Mark that the name "padding" is probably inappropriate. Padding is what you add to make things fit nicely; what we use it for is something different. It is traditionally called the "leading dimension"; Mark suggests the name "stride".
> *--* \
> |||| |
> |||| |
> |||| | stride
> *--* \ |
> : : | padding |
> *--* / /
EDIT: In more detail, the extra amount of memory added to a multidimensional array (unfolded as a series of vectors of size n) to ensure aligned access to each vector is usually called *padding*. The distance (usually called *stride*) between the first elements of two consecutive vectors is then exactly `stride = n + padding`.
Current Ginkgo naming is a bit misleading, as it calls the distance between the first elements of two consecutive vectors `padding` (instead of `stride`).
The proposal is therefore to replace `padding` with `stride`.
@tcojean could we modify the solver generator script to automatically comment out these placeholders and add `NOT_IMPLEMENTED` specifications instead?
It shouldn't be that difficult: every line containing just `{` (with no indent) is bound to be the start of a function body (classes and namespaces have `{` on the same line as the name), and every line containing just `}` is bound to be the end of a function (classes have `};` and namespaces `}  // namespace <name>`).
Ginkgo currently supports only left preconditioning.
However, it seems that some use cases require right and two-sided preconditioning. We started this as an off-topic (or maybe vaguely-on-topic) discussion in thread #17, so I'll collect the related comments here, and we can continue the discussion.
[...] we have to decide between left preconditioning ( MAx = Mb ), right preconditioning ( AMy = b, x = My ), and mixed preconditioning. Matlab only supports left preconditioning (afaik), and the same holds for MAGMA-sparse (that I should know). Is it valid to stick with that? I am not sure what Trilinos/SuperLU/PETSc do. I would appreciate input from the community.
-- @hartwiganzt
Iterative solvers may
- choose to accept left, right, or both (split) preconditioners.
- have this option without overhead, if the solver implementation dispatches to different algorithms based on whether or not the (left,right,both) preconditioner(s) exist(s).
[...]
- Most users want right preconditioning because it preserves the residual norm (e.g., makes GMRES' "implicit" residual norm the same as its "explicit" residual norm `norm(b-A*x, 2)`).
- However, some users want left preconditioning (optimization folks, for example; I can fish around for details if you're interested).
-- @mhoemmen
I know some people who use left preconditioning and others who use right. Mathematically they give different results, so I think both options should be supported. However, split/both is much less common, so lower priority though it is the most general approach.
-- @egboman (upvoted by @mhoemmen, @drittelhacker)
We talked a little bit about LSQR over the phone today. Matlab actually right-preconditions LSQR:
https://www.mathworks.com/help/matlab/ref/lsqr.html?s_tid=gn_loc_drop
That makes sense because LSQR solves a (constrained) least-squares problem. A "left preconditioner" would actually change the function to minimize. (That's true for GMRES as well, but GMRES aims to solve the linear system and a good solution to the linear system will minimize that function regardless. LSQR works even when there is no x such that Ax=b.)
Our users tend to prefer right preconditioners for GMRES etc. for this reason. However, it's not hard to make GMRES, etc. achieve the tolerance for the actual ("explicit") residual `norm(b-A*x,2)`: first iterate until the "implicit residual" (the thing that GMRES computes from the little upper Hessenberg system) achieves the desired tolerance, then iterate some more until `norm(b-A*x,2)` does as well. Thus, it's not much different to use a left preconditioner.
-- @mhoemmen
[...] if we support both left and right, we can also support two-sided preconditioning without significant effort.
-- @gflegar
There are some small problems with the `create_new_algorithm.sh` script.
This is what I did:

```shell
./create_new_algorithm.sh gmres solver cg
```

I wanted to create a new algorithm gmres with cg as the model.
The script adds false test targets in the `CMakeLists.txt` files under the `test/solver` directories.
In this case, the false test targets `bigmresstab`, `fgmres`, and `gmress` are generated (presumably from replacing the `cg` substring inside `bicgstab`, `fcg`, and `cgs`).
Another small issue: `omp/test/solver/gmres_kernels.cpp` is not generated.
The following is the output message at generation:

```
Creating temporary files:
./tmp_1532434475/core/solver/gmres.cpp
./tmp_1532434475/core/solver/gmres.hpp
./tmp_1532434475/core/solver/gmres_kernels.hpp
./tmp_1532434475/reference/solver/gmres_kernels.cpp
./tmp_1532434475/omp/solver/gmres_kernels.cpp
./tmp_1532434475/cuda/solver/gmres_kernels.cu
./tmp_1532434475/core/test/solver/gmres.cpp
./tmp_1532434475/reference/test/solver/gmres_kernels.cpp
./tmp_1532434475/cuda/test/solver/gmres_kernels.cpp
Renaming and distributing files
cleaning up temporary files.
Modifiying CMakeLists.txt and common_kernels.inc.cpp
head: cannot open '259' for reading: No such file or directory
./create_new_algorithm.sh: line 326: 178
259: syntax error in expression (error token is "259")
```

The error message can be spotted in the last three lines.
```
Summary:
Created solver file
core/solver/gmres.cpp
This is where the gmres algorithm needs to be implemented.
Created class header
core/solver/gmres.hpp
This is where the gmres class functions need to be implemented.
Created kernel header
core/solver/gmres_kernels.hpp
This is where the algorithm-specific kernels need to be added.
Created kernel file
reference/solver/gmres_kernels.cpp
Reference kernels for gmres need to be implemented here.
Created kernel file
omp/solver/gmres_kernels.cpp
OMP kernels for gmres need to be implemented here.
Created kernel file
cuda/solver/gmres_kernels.cu
CUDA kernels for gmres need to be implemented here.
Created unit tests for gmres
core/test/solver/gmres.cpp
Created unit tests for gmres reference kernels
reference/test/solver/gmres_kernels.cpp
Created unit tests for gmres CUDA kernels
cuda/test/solver/gmres_kernels.cpp
Modified core/CMakeLists.txt
Modified reference/CMakeLists.txt
Modified omp/CMakeLists.txt
Modified cuda/CMakeLists.txt
Modified core/test/solver/CMakeLists.txt
Modified reference/test/solver/CMakeLists.txt
Modified cuda/test/solver/CMakeLists.txt
Modified core/device_hooks/common_kernels.inc.cpp
In all of the previous files cg_kernels.cpp was automatically replaced into gmres. Ensure there is no inconsistency.
All the imported code was commented and TODO items were generated in the new files.
Check all the modified files for '// TODO (script):' items
e.g. by using grep -HR '// TODO (script):' /home/u/yanjen/ginkgo/dev_tools/scripts/../..
A summary of the required next steps has been written to:
todo_gmres.txt
```
This issue is the summary of a discussion which arose from MR #46, but is more general to Ginkgo. I will try to lay out the current state of things and the potential problems. If you find any mistake here, feel free to correct it or notify me.
Macros are very useful for generating code. They make it possible to completely remove boilerplate code from many places, and therefore shrink the code base by quite an amount of lines. One such current usage is `GKO_REGISTER_OPERATION(xxx, kernel)`, which does the following:
- defines a class `xxx_operation` containing `run(Executor)` specializations that launch the correct kernel;
- defines a method `make_xxx_operation`, which is an interface to build an object of the previously defined class.

In this case, the macro usage is at the top of the kernel cpp file, so you can at least go there and inspect the generated code in a relatively easy way.
But that is not always the case, and it's particularly true with the factory macros of MR #46, and probably some macros we already use elsewhere in Ginkgo.
There is a very old saying, as old as macros, that macros are evil incarnate and should be killed with fire. There are multiple classic reasons for that: macros don't have scope, don't ensure type safety, and are very hard to debug (you need to run the preprocessor to know what code your macro actually generates in order to debug it), and much more.
I'm actually kind of fine with the previous points (not so much the debugging), but for me one big problem is when macros add class members or class/method definitions, in particular when the generated name does not appear directly in the parameters. Then you have in your code something such as:
```cpp
// [...]
make_xxx_operation(...);
// [...]
```
But if you grep for this symbol, or use more advanced tools such as tags, then you will never find this particular symbol in your source code. The symbol appears from nowhere in the code, because a macro generates it. To find it, you need to go through the macro and pretty much navigate through the code by hand, which makes code navigation and understanding problematic.
My main question is the following: I personally perceive this to be a pretty serious problem; is this also the case for you? It may not be the case for everyone; it really depends on habits and usage, I guess.
Supposing this is a problem, multiple solutions can be thought of. All of these solutions involve some sort of compromise between doing more boiler plate code and keeping the code generation capacity that macros provide.
This may be hard to implement in practice, so it's not really a true solution. In theory we would never use macros, but in practice there are always some cases where you will need them. Maybe a combination of templated classes and namespaces could completely remove the need for some macros, though; this would still lead to some more boilerplate code.
This solution is to let the user declare their own members, so they are properly defined and embedded in the code. The main downside is that this clearly increases the amount of code that has to be written. In addition, you may get weird user/macro interactions, such as a macro declaring a type `foo_type` and expecting the user to declare `foo_type foo_var;` themselves, so the user needs to know what the macro does, which is not a great design.
The main idea of this solution is that if your macro declares a type `foo_type`, then you pass `foo_type` directly as a parameter to the macro. That is, instead of

```cpp
MACRO(foo)
```

you write

```cpp
MACRO(foo_type, foo_var, ...)
```

We have a clear addition of boilerplate code (names), but at least we can grep the code for `foo_type` and find it very easily. Therefore, there are no magically appearing names anywhere, and most usual navigation tools should work properly. The downside I can see is that for some macros, this parameter list may get fairly long.
I'm not sure we can do this properly, or with which tool, but an idea would be to add anchors/bookmarks at places in the code, and when we use a symbol which comes from nowhere, provide a direct link to the precise macro call which generates it.
We could just keep macros and have tools which pretty much always call the compiler's preprocessor, so we can view every particular macro call if needed. A problem is that the tools I have seen doing this are too simple and not very usable, though.
The `ByInteraction` stopping criterion in the `asynchronous_stopping_criterion` example is implemented inside the `gko::stop` namespace. We should change this, as it might cause the user to think that custom stopping criteria have to be implemented in that namespace.
In fact, the exact opposite is true. User-defined stopping criteria should be implemented in the namespace of the application using them, and the gko
namespace should be used exclusively for Ginkgo code. Otherwise, the user-defined criterion might have name clashes with future versions of Ginkgo.
Currently we use `std::move`, `.get()`, and `->clone()` in combination with different objects to express who owns what, who shares what, etc. This can get a bit confusing for users who are not so proficient in C++, and the code can get a bit strange.
For example:
```cpp
using Mtx = gko::matrix::Dense<>;
auto unique = Mtx::create(exec);  // a matrix with unique ownership (i.e. it cannot be passed to methods that want shared ownership)
std::shared_ptr<Mtx> shared = Mtx::create(exec);  // a matrix with shared ownership
auto unique_clone = unique->clone();  // a clone of "unique" with unique ownership; note that the static type of unique_clone is LinOp, not Mtx
std::shared_ptr<gko::LinOp> unique_clone2 = unique->clone();  // a clone of "unique" with shared ownership
auto shared_clone = shared->clone();  // a clone of "shared" with unique ownership
std::shared_ptr<gko::LinOp> shared_clone2 = shared->clone();  // a clone of "shared" with shared ownership
auto unique_ref = unique.get();  // a "reference" to unique with no ownership - i.e. they are the same object, unique_ref doesn't "own" the object, "unique" does
auto shared_ref = shared.get();  // same as above, just that shared can share ownership with someone else
```
Now if we want to call some functions (consider all the lines as stand-alone examples, they don't work as a block of code, since some lines invalidate the objects):
```cpp
auto cg_f = gko::solver::CgFactory::create(...);
cg_f->generate(unique);  // error, "unique" has unique ownership of object
cg_f->generate(unique_ref);  // error, "unique_ref" doesn't have ownership of object, and generate() expects it
cg_f->generate(std::move(unique));  // o.k., transferring ownership of unique to generate
cg_f->generate(unique->clone());  // o.k., creating a copy of the object to give to generate
cg_f->generate(shared);  // o.k., sharing ownership with generate()
cg_f->generate(shared_ref);  // error, "shared_ref" doesn't have ownership of object
cg_f->generate(std::move(shared));  // o.k., giving my ownership of shared to generate() (but there might be others who still have ownership)
cg_f->generate(shared->clone());  // o.k., creating a private clone for generate()

auto cg = cg_f->generate(/* something */);
cg->apply(unique, unique_clone);  // error, apply doesn't want ownership
cg->apply(unique_ref, unique_clone.get());  // o.k., passing "references" without ownership
cg->apply(shared, shared_clone2);  // error, apply doesn't want ownership
cg->apply(shared_ref, shared_clone2.get());  // o.k., passing "references" without ownership
```
There are of course more combinations, but this is enough to make it confusing for some users: where to put `.get()`, where to put `->clone()`, where to use `std::move`. To make it worse, all three are called in radically different ways: as a method, as a method on a dereferenced object, and as a top-level function. There is also the problem of having to repeat the type twice when creating a shared object, and the problem that `clone()` doesn't return the same type, but a super-type, which leads to clumsy syntax like:
```cpp
std::unique_ptr<Mtx> clone = static_cast<Mtx *>(unique->clone().release());
```
So, I recommend the following wrappers:
| Wrapper | Semantics | Implementation |
|---|---|---|
| `gko::clone` | Clones any object, unique/shared/no ownership | Calls `->clone()` and preserves the type |
| `gko::share` | Shares a unique/shared object* | Moves a `unique_ptr`/`shared_ptr` into a `shared_ptr` |
| `gko::give` | Gives the ownership of a unique/shared object to a function/operator* | Same as `std::move` |
| `gko::lend` / `gko::loan` | Returns a reference to the object, without changing ownership | Either calls `.get()`, or just returns the input |
* The original input becomes invalid
> take and use (something belonging to someone else) with the intention of returning it.
> "he had borrowed a car from one of his colleagues"
> synonyms: take, take for oneself, help oneself to, use as one's own, abscond with, carry off, appropriate, commandeer, abstract; steal, purloin, shoplift, rob, swipe, nab, rip off, lift, liberate, snaffle, snitch, pinch, half-inch, whip, knock off, nobble, bone, scrump, bag, blag, glom, hook
> "his workmates had ‘borrowed’ all his tools"
If anyone has a better name, I'd appreciate it.
I think we figured out we want either `gko::lend` or `gko::loan`, which better describe what is happening.
With the new wrappers, the examples would look like this:
```cpp
using Mtx = gko::matrix::Dense<>;
auto unique = Mtx::create(exec);  // a matrix with unique ownership (i.e. it cannot be passed to methods that want shared ownership)
auto shared = gko::share(Mtx::create(exec));  // a matrix with shared ownership
auto unique_clone = gko::clone(unique);  // a clone of "unique" with unique ownership; the static type is now Mtx
auto unique_clone2 = gko::share(gko::clone(unique));  // a clone of "unique" with shared ownership
auto shared_clone = gko::clone(shared);  // a clone of "shared" with unique ownership
auto shared_clone2 = gko::share(gko::clone(shared));  // a clone of "shared" with shared ownership
auto unique_ref = gko::lend(unique);  // a "reference" to unique with no ownership - i.e. they are the same object, unique_ref doesn't "own" the object, "unique" does
auto shared_ref = gko::lend(shared);  // same as above, just that shared can share ownership with someone else
```
```cpp
auto cg_f = gko::solver::CgFactory::create(...);
cg_f->generate(unique);  // error, "unique" has unique ownership of object
cg_f->generate(unique_ref);  // error, "unique_ref" doesn't have ownership of object, and generate() expects it
cg_f->generate(gko::give(unique));  // o.k., transferring ownership of unique to generate
cg_f->generate(gko::clone(unique));  // o.k., creating a copy of the object to give to generate
cg_f->generate(shared);  // o.k., sharing ownership with generate()
cg_f->generate(shared_ref);  // error, "shared_ref" doesn't have ownership of object
cg_f->generate(gko::give(shared));  // o.k., giving my ownership of shared to generate() (but there might be others who still have ownership)
cg_f->generate(gko::clone(shared));  // o.k., creating a private clone for generate()

auto cg = cg_f->generate(/* something */);
cg->apply(unique, unique_clone);  // error, apply doesn't want ownership
cg->apply(unique_ref, gko::lend(unique_clone));  // o.k., passing "references" without ownership
cg->apply(shared, shared_clone2);  // error, apply doesn't want ownership
cg->apply(shared_ref, gko::lend(shared_clone2));  // o.k., passing "references" without ownership
```
@ginkgo-project/developers what do you think about this?
Sometimes it is beneficial to have a method that computes the norm of a matrix (or linear operator). (Currently we only have a function which computes the column-wise dot product of two dense matrices, which is enough for use in Krylov solvers, but not necessarily in all scenarios.)
There are a few questions, though, on how we implement this: do we add such a method to the generic `LinOp` interface (like `transpose`), or do we just put it as a non-virtual member of the concrete linear operators implementing it?

Currently the only supported format for reading/writing matrices is the matrix market format (see PRs #94 and #98).
There are other, more general, data interchange formats (JSON, YAML, etc.) which could be used to store linear operators on disk to enable better interoperability with other software / languages (especially web-based software, which almost exclusively uses JSON for data exchange).
Existing open-source C++ libraries that support this should be investigated (maybe Boost has something?) and integrated into Ginkgo. It should also be investigated whether there is a standardized way to represent sparse and dense matrices (or even linear operators) in these formats, and, if so, they should be represented the same way in Ginkgo.
Ginkgo is supposed to be designed to work with user-provided data types. However, due to the use of external libraries (cuBLAS and cuSPARSE), which do not support this, that is not entirely true.
PR #49 fixes Ginkgo so it can be compiled for such data types, but it doesn't provide alternatives for all external library calls, so parts of Ginkgo will report a non-implemented error for these types.
This issue tracks the progress of these implementations:
- `matrix::Dense::compute_dot()` (needs a custom `dot` kernel)
- `matrix::Dense::apply()` (needs a custom `gemm` kernel)
- `matrix::Dense::scale()` (needs a custom `scal` kernel)
- `matrix::Dense::add_scaled()` (needs a custom `axpy` kernel)
- `matrix::Dense::transpose()` (needs a custom `geam` kernel)
- `matrix::Dense::conj_transpose()` (needs a custom `geam` kernel)
- `matrix::Csr::apply()` (needs a custom `spmv` kernel)
- `matrix::Csr::transpose()` (needs a custom `transpose` kernel)
- `matrix::Csr::conj_transpose()` (needs a custom `transpose` kernel)
In addition to the scripts for generating solvers/preconditioners/matrices, it would be nice to have a script that generates general source files. For example, typing something like:
```shell
ginkgo-cli generate source "module/path/to/file"
```

should create the following files:
`module/path/to/file.cpp`:

```cpp
/* Copy of the licence */

#include "module/path/to/file.hpp"

namespace gko {

// Your code goes here

}  // namespace gko
```
`module/path/to/file.hpp`:

```cpp
/* Copy of the licence */

#ifndef MODULE_PATH_TO_FILE_HPP_
#define MODULE_PATH_TO_FILE_HPP_

namespace gko {

// Your code goes here

}  // namespace gko

#endif  // MODULE_PATH_TO_FILE_HPP_
```
`module/test/path/to/file.cpp`:

```cpp
/* Copy of the licence */

#include <module/path/to/file.hpp>

#include <gtest/gtest.h>

namespace {

class File : public ::testing::Test {
    // your test setup goes here
};

TEST_F(File, YourTestName) {
    // Your test goes here
}

// Other tests go here

}  // namespace
```
In addition:
- The closest `CMakeLists.txt` should have `file.cpp` added to its `SOURCES` variable (e.g. if there is no `CMakeLists.txt` in `module/path/to/`, but there is one in `module/path`, the `SOURCES` variable in `module/path/CMakeLists.txt` should be modified to include `to/file.cpp`).
- A `create_test(file)` line should be added to the closest test `CMakeLists.txt`.

We need to check the functionality of all GPU kernels also for problems where more than a single thread block is launched.
The matrix `read` function computes the matrix parameters by itself. For example, we cannot set a custom stride for ELL matrices when reading an mtx file. However, when we convert a Dense matrix to another matrix format, we can set custom initial parameters for the matrix. How about also allowing custom initial parameters for the `read` function?
- `gpu/matrix/sellp_kernels.cu`
- `gpu/test/matrix/sellp_kernels.cpp`
Here are some changes that may be worth considering.
To document functions which are standalone (e.g. `gko::transpose`), we would have to document the enclosing `gko` namespace here.
In addition, if we want to include some functions inside the documentation of a class, such as the operators and `transpose` for `dim`, I think the only solution is the following:

```cpp
\memberof dim @{
/** @} */
```
Finally, at some point we may also want to document the `master` branch separately. Do we already have a location prepared for that, and should we add `master` as a target for documentation deployment?
With the current version of the GPU detection script, CMake configuration fails for new CUDA versions which have not yet been added to the script.
While fixing this is not a lot of work, it's not very pleasant for the user. For example, someone may be happy with an old version of Ginkgo and want to compile it with a newer CUDA version. This will of course sometimes be impossible if the CUDA API has changed, but if it didn't, there is no reason why it wouldn't compile.
Thus, removing this restriction from the detection script would be a good idea. It should be possible to do so by extracting the supported architectures directly from the `nvcc --help` output instead of relying on hardcoded values. We still won't get support for new generation names (`Fermi`, `Maxwell`, etc.), but at least the user will be able to compile Ginkgo by specifying the architectures explicitly.
Having converted `reference/test/solver/bicgstab_kernels.cpp` to use single precision by copying it to `reference/test/solver/bicgstab_kernels_float.cpp` and making changes as per the diff below:
```diff
51c51
< using Mtx = gko::matrix::Dense<>;
---
> using Mtx = gko::matrix::Dense<float>;
57c57
< gko::solver::BicgstabFactory<>::create(exec, 8, 1e-15))
---
> gko::solver::BicgstabFactory<float>::create(exec, 8, 1e-7))
62c62
< std::unique_ptr<gko::solver::BicgstabFactory<>> bicgstab_factory;
---
> std::unique_ptr<gko::solver::BicgstabFactory<float>> bicgstab_factory;
74c74
< ASSERT_MTX_NEAR(x, l({-4.0, -1.0, 4.0}), 1e-8);
---
> ASSERT_MTX_NEAR(x, l({-4.0, -1.0, 4.0}), 1e-4);
87c87
< ASSERT_MTX_NEAR(x, l({{-4.0, 1.0}, {-1.0, 2.0}, {4.0, -1.0}}), 1e-8);
---
> ASSERT_MTX_NEAR(x, l({{-4.0, 1.0}, {-1.0, 2.0}, {4.0, -1.0}}), 1e-4);
102c102
< ASSERT_MTX_NEAR(x, l({-8.5, -3.0, 6.0}), 1e-8);
---
> ASSERT_MTX_NEAR(x, l({-8.5, -3.0, 6.0}), 1e-4);
118c118
< ASSERT_MTX_NEAR(x, l({{-8.5, 1.0}, {-3.0, 2.0}, {6.0, -5.0}}), 1e-8);
---
> ASSERT_MTX_NEAR(x, l({{-8.5, 1.0}, {-3.0, 2.0}, {6.0, -5.0}}), 1e-4);
```
the following tests fail by not attaining the required accuracy:
```
[ RUN ] Bicgstab.SolvesMultipleDenseSystems
.../ginkgo/reference/test/solver/bicgstab_kernels_float.cpp:87: Failure
Relative error between x and {{-4.0, 1.0}, {-1.0, 2.0}, {4.0, -1.0}} is 0.11072188922929171
which is larger than 1e-4 (which is 0.0001)
x is:
-4.0000228881835938 0.54083889722824097
-1.0000052452087402 1.858891487121582
4.000030517578125 -0.50263333320617676
{{-4.0, 1.0}, {-1.0, 2.0}, {4.0, -1.0}} is:
-4 1
-1 2
4 -1
component-wise relative error is:
5.7220131568155855e-06 0.45916110277175903
5.2451812281639528e-06 0.070554256439208984
7.6293363240331724e-06 0.49736666679382324
[ FAILED ] Bicgstab.SolvesMultipleDenseSystems (1 ms)
[ RUN ] Bicgstab.SolvesDenseSystemUsingAdvancedApply
.../ginkgo/reference/test/solver/bicgstab_kernels_float.cpp:102: Failure
Relative error between x and {-8.5, -3.0, 6.0} is 0.0014036462767795599
which is larger than 1e-4 (which is 0.0001)
x is:
-8.4900836944580078
-2.9963669776916504
5.9890694618225098
{-8.5, -3.0, 6.0} is:
-8.5
-3
6
component-wise relative error is:
0.0011666241814108455
0.0012110074361165364
0.0018217563629150391
[ FAILED ] Bicgstab.SolvesDenseSystemUsingAdvancedApply (0 ms)
[ RUN ] Bicgstab.SolvesMultipleDenseSystemsUsingAdvancedApply
.../ginkgo/reference/test/solver/bicgstab_kernels_float.cpp:118: Failure
Relative error between x and {{-8.5, 1.0}, {-3.0, 2.0}, {6.0, -5.0}} is 0.0015665962953109148
which is larger than 1e-4 (which is 0.0001)
x is:
-8.4900836944580078 0.99250507354736328
-2.9963669776916504 1.9976270198822021
5.9890694618225098 -4.9917192459106445
{{-8.5, 1.0}, {-3.0, 2.0}, {6.0, -5.0}} is:
-8.5 1
-3 2
6 -5
component-wise relative error is:
0.0011666241814108455 0.0074949264526367188
0.0012110074361165364 0.0011864900588989258
0.0018217563629150391 0.0016561508178710937
[ FAILED ] Bicgstab.SolvesMultipleDenseSystemsUsingAdvancedApply (1 ms)
```
It is often required to create a linear operator as a composition of two operators without explicitly multiplying their coefficient matrices.
This is especially useful for composing factorization-based preconditioners `M = U * V`, where the preconditioner application is performed in two stages as `M^-1 * x = V^-1 * (U^-1 * x)`.
In general, given a type `ComposedLinOp` which contains the two operators as members, every factorization `F`, which decomposes a matrix `A` into factors `B` and `C`, can be viewed as a higher-order operation `F : A -> (B*C)` which returns an object of `ComposedLinOp` type.
Pseudocode:
class ComposedLinOp : public BasicLinOp<ComposedLinOp> {
public:
LinOp *get_first() { return first_; }
LinOp *get_second() { return second_; }
void apply(const LinOp *b, LinOp *x) const {
        auto tmp = x->clone(); // returns a unique_ptr; should optimize this by allocating only once
second_->apply(b, tmp.get());
first_->apply(tmp.get(), x);
}
unique_ptr<LinOp> transpose() const {
return ComposedLinOp::create(second_->transpose(), first_->transpose());
}
protected:
ComposedLinOp(shared_ptr<LinOp> first, shared_ptr<LinOp> second);
private:
shared_ptr<LinOp> first_;
shared_ptr<LinOp> second_;
};
I always had trouble finding good and up-to-date tutorials about CMake, and how to properly do stuff there (especially the things that seem to be possible to do in multiple ways).
I think I finally managed to find a good collection of resources here.
Just by taking a quick look I already figured out that we're doing some things the wrong way, which will prevent us from playing nicely with other software.
Someone should go over the materials at the link and update our build system to follow the rules.
Excerpt from @venovako's e-mail:
`SET_CUDA_HOST_COMPILER=ON` and manually setting `CMAKE_CUDA_HOST_COMPILER` are mutually exclusive.
Here's what happens with the latest GitHub Ginkgo + CMake 3.11.4 if both are set:
Building CUDA object gpu/CMakeFiles/ginkgo_gpu.dir/matrix/coo_kernels.cu.o
cd /home/novakoni/src/GitHub/build/gpu && /state/partition1/soft/cuda-9.1/bin/nvcc -ccbin=/state/partition1/soft/gnu/gcc-6.3.0/bin/g++ -Dginkgo_gpu_EXPORTS -I/home/novakoni/src/GitHub/build/include -I/home/novakoni/src/GitHub/flx_ginkgo/include -isystem=/state/partition1/soft/cuda-9.1/include -I/home/novakoni/src/GitHub/flx_ginkgo -Xcompiler=-mno-fma -I/home/novakoni/src/GitHub/FloatX/src --compiler-bindir /state/partition1/soft/gnu/gcc-6.3.0/bin/g++ -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_60,code=compute_60 --expt-relaxed-constexpr -O3 -DNDEBUG -Xcompiler=-fPIC -std=c++11 -x cu -c /home/novakoni/src/GitHub/flx_ginkgo/gpu/matrix/coo_kernels.cu -o CMakeFiles/ginkgo_gpu.dir/matrix/coo_kernels.cu.o
nvcc fatal : redefinition of argument 'compiler-bindir'
If you look at the compile line, it specifies `-ccbin` and then again `--compiler-bindir`. `CMAKE_CUDA_HOST_COMPILER` is manually set to `/state/partition1/soft/gnu/gcc-6.3.0/bin/g++`.
The message is: if the user sets `CMAKE_CUDA_HOST_COMPILER` manually, and your system of setting the compiler also kicks in, `nvcc` dies. It doesn't bother to check whether the two definitions of the host compiler are in fact identical or not.
So, you might want to check if `CMAKE_CUDA_HOST_COMPILER` and `SET_CUDA_HOST_COMPILER` are both set and complain about it.
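The suggested check could look something like this (a sketch; option names as they appear in the e-mail):

```cmake
if(SET_CUDA_HOST_COMPILER AND CMAKE_CUDA_HOST_COMPILER)
    message(FATAL_ERROR
        "SET_CUDA_HOST_COMPILER=ON and a manually set CMAKE_CUDA_HOST_COMPILER "
        "are mutually exclusive: nvcc would receive both -ccbin and "
        "--compiler-bindir and abort with a redefinition error.")
endif()
```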
Some operations on operators in Ginkgo assume that the executors of all arguments are the same. This does not have to be true, and with the addition of a `gko::make_temporary_clone` helper it can usually be solved easily.
BLAS-1 operations on dense matrices are an example of this problem.
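The idea behind such a helper can be sketched as follows (a simplified host-only analogue where an integer `location` stands in for the Executor; names and structure are assumptions, not Ginkgo's actual implementation):

```cpp
#include <cassert>
#include <memory>

// Stand-in for an object that knows where its data lives. The `location`
// field plays the role of a Ginkgo Executor in this simplified sketch.
struct Vec {
    int location;  // e.g. 0 = host, 1 = device
    double value;
};

// RAII wrapper: hands kernels a Vec on `target_location`, copying only when
// needed, and writes modifications back to the original on destruction.
class TemporaryClone {
public:
    TemporaryClone(Vec *original, int target_location) : original_(original)
    {
        if (original_->location == target_location) {
            handle_ = original_;  // already in the right place: no copy
        } else {
            copy_ = std::make_unique<Vec>(Vec{target_location, original_->value});
            handle_ = copy_.get();
        }
    }
    ~TemporaryClone()
    {
        if (copy_) original_->value = copy_->value;  // write modifications back
    }
    Vec *get() { return handle_; }

private:
    Vec *original_;
    std::unique_ptr<Vec> copy_;
    Vec *handle_;
};
```

A kernel can then unconditionally wrap its arguments: when the executors already match, the wrapper degenerates to a plain pointer and no data is moved.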
I've been redesigning some of Ginkgo's core classes, and one thing I'm not sure about is whether we want to expose the `num_stored_elements` property in the `LinOp` interface. We currently do, but I think that's just for historical reasons (from before we had a unified interface for all linear operators). I think we should have this (or an equivalent) parameter only in classes where it makes sense (like matrices, and some preconditioners).
The cons of having it in the base class are:
- It complicates creating `LinOp`s, as we have to pass this extra parameter.
- What about operators like `(A^+) A + (A^T) A` (`A^+` is the Moore-Penrose inverse), which could be implemented by just keeping a reference to `A` and computing the required operations (multiplication, transposed multiplication, solving an under/overdetermined system) on the fly? We can immediately say that the number of rows and columns of such an operator is the same as the number of rows of `A`, but we cannot really say what its number of stored elements is (0, the same as that of `A`, something else?).

The only pro I can see is that users can get the number of stored elements even without knowing the type of the operator they have. However, is this a realistic use case? Can they do anything useful with such information, even though they have no idea whether they have a solver, a preconditioner, or a matrix, and which kind it is?
@ginkgo-project/developers what do you think?
I would like to include the conclusions of this discussion in #46, as I've already changed the way we handle operator sizes there, so it makes no sense to change it twice.
For unbalanced matrices, a pre-sorting step can accumulate rows with similar nonzero counts in the same matrix slice. Scheduling the largest slices first should then give good performance.
For a sorting ecosystem, we need:
The reordering of the matrix can happen in the conversion process from CSR to SELLP.
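The pre-sorting step itself is just a row permutation ordered by nonzero count; a sketch (hypothetical helper, not Ginkgo code):

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Order rows by decreasing nonzero count, so rows with similar counts end up
// in the same SELL-P slice and the largest slices come first for scheduling.
std::vector<int> sort_rows_by_nnz(const std::vector<int> &row_nnz)
{
    std::vector<int> permutation(row_nnz.size());
    std::iota(permutation.begin(), permutation.end(), 0);
    // stable_sort keeps the original order among rows with equal counts
    std::stable_sort(permutation.begin(), permutation.end(),
                     [&](int a, int b) { return row_nnz[a] > row_nnz[b]; });
    return permutation;
}
```

The resulting permutation would be applied (and recorded, so results can be permuted back) during the CSR-to-SELL-P conversion.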
The current implementation of atomic operations (introduced in #57 and #64) does not support custom value types. Ginkgo will compile, but attempts to use it will result in assertion failures.
We should explore the possibility of general mutex functionality for GPUs, which would allow us to implement these atomics. They won't be extremely fast, but they will at least work correctly.
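For value types that fit into a machine word, the usual fallback is a compare-and-swap loop over the value's bit pattern (on the device this would be built on `atomicCAS`; larger custom types are where the mutex approach becomes necessary). A host-side sketch of the loop, using `std::atomic` in place of the device primitive:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

// CAS-loop atomic add on a double stored as its 64-bit pattern. On a GPU the
// compare_exchange would be atomicCAS on an unsigned long long.
double atomic_add(std::atomic<std::uint64_t> &target, double value)
{
    std::uint64_t expected = target.load();
    std::uint64_t desired;
    double old_value;
    do {
        std::memcpy(&old_value, &expected, sizeof(double));  // bit-cast to double
        double new_value = old_value + value;
        std::memcpy(&desired, &new_value, sizeof(double));   // bit-cast back
        // on failure, `expected` is refreshed with the current value and we retry
    } while (!target.compare_exchange_weak(expected, desired));
    return old_value;  // CUDA atomics return the previous value
}
```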
Compilation with gcc 5.5 fails when compiling range.cpp:
[ 64%] Building CXX object core/test/base/CMakeFiles/core_test_base_range.dir/range.cpp.o
This is part of the error message:
In file included from /home/u/yanjen/ginkgo/build/third_party/gtest/src/googletest/include/gtest/gtest.h:370:0,
                 from /home/u/yanjen/ginkgo/core/test/base/range.cpp:37:
/home/u/yanjen/ginkgo/core/test/base/range.cpp: In member function ‘virtual void {anonymous}::Range_ComputesBitwiseNot_Test::TestBody()’:
/home/u/yanjen/ginkgo/core/test/base/range.cpp:243:26: error: no match for call to ‘(gko::range<gko::accessor::bitwise_not<{anonymous}::dummy_accessor> >) (int, int, int)’
   EXPECT_EQ(res(1, 2, 3), ~(2 + 6 + 3));
                          ^
In file included from /home/u/yanjen/ginkgo/core/test/base/range.cpp:34:0:
/home/u/yanjen/ginkgo/core/base/range.hpp:331:35: note: candidate: template<class ... DimensionTypes> constexpr decltype (declval<gko::range<Accessor>::accessor>()((forward<DimensionTypes>)(gko::range::operator()::dimensions)...)) gko::range<Accessor>::operator()(DimensionTypes&& ...) const [with DimensionTypes = {DimensionTypes ...}; Accessor = gko::accessor::bitwise_not<{anonymous}::dummy_accessor>]
   GKO_ATTRIBUTES constexpr auto operator()(DimensionTypes &&... dimensions)
The `std_extensions.hpp` file contains implementations of some useful standard utilities that are part of the C++14 and C++17 standards, but not of the C++11 standard (though they can be implemented in C++11).
For now, we are not documenting any of them, as they are documented in numerous online references about C++ (the standard draft, cplusplus.com, cppreference.com).
However, there is a proposal that we should still provide some documentation for them, to make our documentation self-contained and not require implementers of Ginkgo to look up the documentation online.
The purpose of this issue is to discuss whether we should do this. Please use the thumbs up / thumbs down buttons to vote, and write comments if you want to clarify your position.
After already reviewing a couple of other implementations of Krylov solvers, today I looked into the TFQMR pull request (#23). After taking a closer look, I realized that the same problems I found here could have easily slipped through the review process of all the other Krylov methods (o.k. I do believe CG is sound). And I'm not only talking about design and implementation issues, I'm pretty sure the code itself is wrong (unless there's supposed to be a complicated calculation in there that doesn't affect the final result at all 🤔).
I believe this can happen because we ignore a lot of problems:
- Variables have cryptic names (`x` is the solution, `r` is the residual, and `tau` is the residual norm; maybe you can figure out that `d` is the update direction, but that's about it). They are taken from some book / implementation which we do not reference anywhere, and we have no idea what each variable represents.
- The update formulas are just as opaque (`x` is updated by `d` multiplied by some scalar, and `r` by `A * d` multiplied by the same scalar, but everything else is a mystery).
Here is what I propose (this is marked with "help wanted", and "question", so I do want to hear other people's opinion):
- Use descriptive variable names: `solution`, `residual`, `residual_norm`, `update_direction`, `update_step_size` instead of `x`, `r`, `tau`, `d`, `<solver-dependent Greek letter>`. Doing a bit more typing won't kill us, but it will do a lot for code readability. I realize that the semantics of some variables are hard to express in 2-3 words, so we should add a more detailed description of what every variable is in the documentation.

What follows is the example for CG (and should be included in the doxygen for the Cg class):
The conjugate gradient method (CG) solves a linear system `Ax = b` by minimizing the quadratic form `f(x) = 1/2 x^T A x - b^T x`. If `A` is symmetric positive definite, the unique minimum is achieved at the point `x` which satisfies `Ax = b` [reference]. Throughout this documentation and the implementation we will also refer to the linear operator `A` as `system_matrix`, to `b` as `input_vector`, and to the solution `x` as `result_vector`. [note: exact notation is still to be discussed, but keep in mind that `b` and `x` should have meaningful names in the context of any linear operator, not just a solver]
The minimum is computed in a series of iterations: starting from an initial guess `result_vector(0)`, each iteration `i` constructs the next guess `result_vector(i+1)` by making a step of size `update_step_size(i)` from the current guess `result_vector(i)` along the update direction vector `update_direction(i)`. That is, the next solution is constructed as:

result_vector(i + 1) = result_vector(i) + update_step_size(i) * update_direction(i)    (1)
Each new update direction is chosen to be `A`-orthogonal (i.e. orthogonal with respect to the dot product `p(x, y) = x^T * system_matrix * y`) to the vector subspace `search_subspace(i)` spanned by all previous search directions `search_direction(0), ..., search_direction(i-1)`. [Optional: (Note that `A`-orthogonal vectors are sometimes called conjugate, hence the "C" in the method name.)]
The size of the step is chosen such that the next error `error(i + 1) = true_result - result_vector(i + 1)` is `A`-orthogonal to the new search subspace `search_subspace(i + 1)` (i.e. `span(search_subspace(i) U { update_direction(i) })`).
[Optional:
An interesting side-effect of this choice is that the residual `residual(i + 1) = input_vector - system_matrix * result_vector(i + 1)` is orthogonal to `search_subspace(i + 1)`.
Thus (assuming exact arithmetic), after `n` steps of the method (`n` being the size of the problem), the current search subspace will span the entire solution space, so the residual will surely be 0, i.e. the exact solution will be obtained.]
To reduce the amount of computation, the residuals can be obtained using the recurrence:

residual(i + 1) = residual(i) - update_step_size(i) * mapped_update_direction(i)    (2)

where `mapped_update_direction(i)` is the update direction mapped by the linear operator `A` to its range:

mapped_update_direction(i) = system_matrix * update_direction(i)    (3)

This quantity is already available, as it is required to compute the update step size.
[Optional: The method explained above is often called the method of conjugate directions (CD). The CG method has the additional requirement that the previous residual (`residual(i)`) is included in the new search subspace `search_subspace(i + 1)`.
It can be shown [reference] that this, together with (2) and (3), implies that the search subspace is a Krylov subspace:

search_subspace(i+1) = span{residual(0), system_matrix * residual(0), ..., system_matrix^i * residual(0)}

] [Can be replaced by a single statement: Finally, the new search direction is chosen such that the previous residual (`residual(i)`) is included in the new search subspace `search_subspace(i + 1)`.]
With these restrictions, the next update direction can be computed by `A`-orthogonalizing the last residual [reference]:

residual_norm(i) = residual(i)^T * residual(i)
orthogonalization_factor = residual_norm(i) / residual_norm(i-1)
update_direction(i) = residual(i) + orthogonalization_factor * update_direction(i - 1)    (4)

and the update step size as [reference]:

update_step_size(i) = residual_norm(i) / (update_direction(i)^T * mapped_update_direction(i))    (5)

[Finally, a section on how to apply the preconditioner should come here, since this is not trivial with CG - we have to implicitly factorize it, so we cannot just say we apply the method to M^-1 A.]
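To illustrate the naming proposal, here is a hypothetical, framework-free sketch of CG written with the descriptive names above (not Ginkgo code; the comments point back to equations (1)-(5)):

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Vector = std::array<double, 3>;
using Matrix = std::array<Vector, 3>;

// mapped_update_direction = system_matrix * update_direction, equation (3)
Vector apply(const Matrix &system_matrix, const Vector &v)
{
    Vector out{};
    for (int row = 0; row < 3; ++row) {
        for (int col = 0; col < 3; ++col) {
            out[row] += system_matrix[row][col] * v[col];
        }
    }
    return out;
}

double dot(const Vector &a, const Vector &b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

Vector solve_cg(const Matrix &system_matrix, const Vector &input_vector)
{
    Vector result_vector{};              // initial guess result_vector(0) = 0
    Vector residual = input_vector;      // residual(0) = input_vector - A * 0
    Vector update_direction = residual;  // first direction is the residual
    double residual_norm = dot(residual, residual);  // squared norm, as in (4)
    for (int i = 0; i < 3; ++i) {        // at most n steps in exact arithmetic
        if (residual_norm < 1e-30) break;
        Vector mapped_update_direction = apply(system_matrix, update_direction);
        double update_step_size =        // equation (5)
            residual_norm / dot(update_direction, mapped_update_direction);
        for (int k = 0; k < 3; ++k) {
            result_vector[k] += update_step_size * update_direction[k];    // (1)
            residual[k] -= update_step_size * mapped_update_direction[k];  // (2)
        }
        double new_residual_norm = dot(residual, residual);
        double orthogonalization_factor = new_residual_norm / residual_norm;
        for (int k = 0; k < 3; ++k) {    // equation (4)
            update_direction[k] =
                residual[k] + orthogonalization_factor * update_direction[k];
        }
        residual_norm = new_residual_norm;
    }
    return result_vector;
}
```

Every line of the loop can be matched to a numbered equation without knowing any solver-specific Greek letters, which is exactly the point of the proposal.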
Note that this is a bit long, since we are skipping 2 methods that logically come before CG (steepest descent and conjugate directions). We can make this shorter by removing some of the interesting facts I included here (e.g. reference to conjugate directions, the fact it is a Krylov subspace). In general it should be explained what the method does and where to get more information. My guess is that other methods like BiCG will have something in common, so we can reference CG from it - i.e they all have the concept of residual, update direction, update step size, residual norm and such, so we don't have to define it from scratch.
The main point I wanted to make here is that the code can be written in a way such that all variables have some meaning. E.g. you can immediately see that (4) constructs the new search direction from the residual by orthogonalizing it to the previous direction.
You cannot see that from a formula like `d_{i+1} = r_{i+1} + beta_{i+1} d_{i}`. You may figure out what `d` and `r` are, but you sure cannot figure out what `beta` is.
I've written this description using CG without the agonizing pain (if you look at it as 64 pages condensed into 1-2, I feel I did a decent job), so all [reference] annotations should reference a specific formula/chapter in that book (I was just too lazy to write that).
EDIT: I edited the above text to remove all of my useless and offensive ranting from the first version of the text, so at least anyone reading this later doesn't have to go through it. I would also like to apologize to anyone that read the first version and felt offended, that was not my intention. Finally, I added the tags [Optional: content] to all the optional text in the proposed Cg documentation, to make it clearer what can be removed if we want to be more concise.
Currently we only test the default configuration of `ValueType = double`, `IndexType = int32`.
We should modify our tests to use type-parameterized tests to check other combinations.
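Such tests could be structured along these lines (a framework-free sketch with a made-up test body; with googletest this would become a `TYPED_TEST` over a `::testing::Types<...>` list):

```cpp
#include <cassert>
#include <complex>
#include <cstdint>

// Hypothetical test body, identical for every (ValueType, IndexType) combination.
template <typename ValueType, typename IndexType>
void test_scale_row()
{
    ValueType values[3] = {ValueType(1), ValueType(2), ValueType(3)};
    IndexType row_ptrs[2] = {0, 3};
    for (IndexType i = row_ptrs[0]; i < row_ptrs[1]; ++i) {
        values[i] *= ValueType(2);  // scale one CSR-style row
    }
    assert(values[2] == ValueType(6));
}

// Instantiating the body for each combination catches interface breakage
// (e.g. a missing operator on a custom ValueType) at compile time.
void run_all_combinations()
{
    test_scale_row<float, std::int32_t>();
    test_scale_row<double, std::int32_t>();
    test_scale_row<std::complex<double>, std::int32_t>();
    test_scale_row<double, std::int64_t>();
}
```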
- `gpu/matrix/hybrid_kernels.cu`
- `gpu/test/matrix/hybrid_kernels.cpp`
Some Krylov methods, such as the Quasi-Minimal Residual (templates, page 22), are designed for preconditioners of the form `M = U * V`.
The implementation of the method assumes that the preconditioner can be decomposed, and the `U` and `V` factors applied separately.
How should we implement this in Ginkgo? Currently, the only assumption is that preconditioners can be applied to a vector (there's nothing about decomposing them).
- `gko::matrix::Ell` class template in `core/matrix/ell.hpp`
- `core/matrix/ell.cpp` (`NOT_IMPLEMENTED`)
- `core/test/matrix/ell.cpp`
- `contributors.txt` 😄

Ginkgo's build system doesn't specify the GPU architectures on nvcc's command line. For this reason it just compiles the kernels to intermediate code, which requires a JIT pass when the code is first run (maybe even every time the system is rebooted?), and even fails to build with CUDA 8 or lower, as it tries to build for c.c. 2.x, which is not supported by Ginkgo. To fix this, the user has to manually modify the `CMAKE_CUDA_FLAGS` variable to set the correct architectures (this is awfully verbose when setting multiple architectures).
To simplify the process, we could provide the user with a CMake variable which lets them set the architectures in a simpler way; the build system could then check that all the specified architectures are supported by Ginkgo (and emit a clear error if not), and even detect the hardware and populate the variable automatically if the user doesn't set it themselves.
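One possible shape for such a variable (the name `GINKGO_CUDA_ARCHITECTURES` and the minimum supported capability are assumptions for illustration):

```cmake
set(GINKGO_CUDA_ARCHITECTURES "60;70" CACHE STRING
    "Semicolon-separated list of CUDA compute capabilities to build for")
foreach(arch IN LISTS GINKGO_CUDA_ARCHITECTURES)
    if(arch LESS 30)
        message(FATAL_ERROR
            "Compute capability ${arch} is not supported by Ginkgo")
    endif()
    string(APPEND CMAKE_CUDA_FLAGS
        " -gencode=arch=compute_${arch},code=sm_${arch}")
endforeach()
```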
Currently we are only compiling Ginkgo for the default value types (`float`, `double`, and `std::complex<>` of those) when checking the correctness of our implementation. (We're also testing only for `double`; see #32 for discussions on that.)
This means that the interface that `ValueType` needs to have may change after a PR without anyone noticing (e.g. someone used a part of the interface of the default types which user-defined types may not implement).
I propose we create a dummy `ValueType` implementation (we would create a concept instead, if TS concepts were part of C++11). It could just be a `double` wrapped in a class that defines exactly the interface Ginkgo's `ValueType` has to satisfy. This gives us two benefits:
- a single, compile-checked specification of `ValueType`'s interface
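A sketch of what such a dummy type could look like (the exact operation set below is an assumption for illustration, not Ginkgo's actual requirement list):

```cpp
#include <cassert>
#include <cmath>

// A double wrapped in a class that exposes only the operations the dummy
// ValueType is supposed to guarantee. The explicit constructor is deliberate:
// code relying on implicit conversion from double will fail to compile.
class DummyValue {
public:
    DummyValue() = default;
    explicit DummyValue(double v) : value_(v) {}
    DummyValue operator+(DummyValue o) const { return DummyValue(value_ + o.value_); }
    DummyValue operator-(DummyValue o) const { return DummyValue(value_ - o.value_); }
    DummyValue operator*(DummyValue o) const { return DummyValue(value_ * o.value_); }
    DummyValue operator/(DummyValue o) const { return DummyValue(value_ / o.value_); }
    bool operator==(DummyValue o) const { return value_ == o.value_; }
    friend DummyValue abs(DummyValue v) { return DummyValue(std::fabs(v.value_)); }
    double unwrap() const { return value_; }  // for test inspection only

private:
    double value_{};
};
```

Compiling the kernels against `DummyValue` would immediately flag any operation used in a PR that is not part of the agreed-upon interface.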
- `core/matrix/ell.cpp`
- `core/base/ell_kernels.hpp`
- `NOT_COMPILED` in `core/base/device_hooks/common_kernels.inc.cpp`
- `NOT_IMPLEMENTED` in `cpu/matrix/ell_kernels.cpp` and `gpu/matrix/ell_kernels.cu`
- `reference/matrix/ell_kernels.cpp`
- `reference/test/matrix/ell_kernels.cpp`
- `core/matrix/sellp.cpp`
- `core/base/sellp_kernels.hpp`
- `NOT_COMPILED` in `core/base/device_hooks/common_kernels.inc.cpp`
- `NOT_IMPLEMENTED` in `cpu/matrix/sellp_kernels.cpp` and `gpu/matrix/sellp_kernels.cu`
- `reference/matrix/sellp_kernels.cpp`
- `reference/test/matrix/sellp_kernels.cpp`
- `gpu/matrix/ell_kernels.cu`
- `gpu/test/matrix/ell_kernels.cpp`
- `gko::matrix::Sellp` class template in `core/matrix/sellp.hpp`
- `core/matrix/sellp.cpp` (`NOT_IMPLEMENTED`)
- `core/test/matrix/sellp.cpp`
Using CMake 3.12.0 produces the error `nvcc fatal : Unknown option 'pthread'`.
Here is the difference between CMake 3.10.1 and 3.12.0:
// cmake-3.10.1
/usr/local/cuda-8.0/bin/nvcc -gencode=arch=compute_35,code=sm_35 -gencode=arch=compute_35,code=compute_35 --expt-relaxed-constexpr -Xcompiler=-fPIC -Wno-deprecated-gpu-targets -shared -dlink CMakeFiles/cuda_test_base_cuda_executor.dir/cuda_executor.cu.o -o CMakeFiles/cuda_test_base_cuda_executor.dir/cmake_device_link.o
// cmake-3.12.0
/usr/local/cuda-8.0/bin/nvcc -gencode=arch=compute_35,code=sm_35 -gencode=arch=compute_35,code=compute_35 --expt-relaxed-constexpr -Xcompiler=-fPIC -Wno-deprecated-gpu-targets -shared -dlink CMakeFiles/cuda_test_base_cuda_executor.dir/cuda_executor.cu.o -o CMakeFiles/cuda_test_base_cuda_executor.dir/cmake_device_link.o -Xnvlink ../../../core/libginkgo.so -Xnvlink ../../../third_party/gtest/build/googlemock/gtest/libgtest_main.so -Xnvlink ../../../omp/libginkgo_omp.so -Xnvlink ../../libginkgo_cuda.so -Xnvlink ../../../reference/libginkgo_reference.so -Xnvlink ../../../third_party/gtest/build/googlemock/gtest/libgtest.so -pthread
The `-pthread` flag comes from gtest. The change may come from this CMake commit:
https://gitlab.kitware.com/cmake/cmake/commit/41eab150a8ef42bbebff18ff84652e9da1ef4e75
The `GpuExecutor` currently ignores the `device_id` field, and just uses the currently active device.
To fix this, several of `GpuExecutor`'s methods have to be modified:
- The `GpuExecutor` class in `/core/base/executor.hpp` has to override the `run()` method defined in `Executor`.
- A `run()` method equivalent to the one in `ExecutorBase` should be put in `/core/device_hooks/gpu_hooks.cpp` (this is the one that will be used if the GPU module is not compiled).
- Another `run()` method should be put into `/gpu/base/executor.cpp` (this is the one used when the GPU module is compiled). This implementation should do the following:
  - make `device_id_` the active device (e.g. via `cudaSetDevice`)
  - call `op.run()` in the same way as in `ExecutorBase::run()`
- All methods in `/gpu/base/executor.cpp` (except `get_num_devices` and `GpuExecutor::raw_copy_to(GpuExecutor,...)`) should be modified to include these device manipulation steps.
- `GpuExecutor::raw_copy_to(GpuExecutor,...)` should be implemented using `cudaMemcpyPeer` instead of `cudaMemcpy`.
Testing freezes in `gpu/test/preconditioner/block_jacobi_kernels` when using CUDA 9.0 and CUDA 9.1 with a Titan V.
There are test cases in block_jacobi_kernels that freeze:
running these tests is fine when using CUDA 8.0, 9.0, and 9.1 with a K40, but they freeze when using CUDA 9.0 and 9.1 with a Titan V.
- `core/matrix/hybrid.cpp`
- `core/base/hybrid_kernels.hpp`
- `NOT_COMPILED` in `core/base/device_hooks/common_kernels.inc.cpp`
- `NOT_IMPLEMENTED` in `cpu/matrix/hybrid_kernels.cpp` and `gpu/matrix/hybrid_kernels.cu`
- `reference/matrix/hybrid_kernels.cpp`
- `reference/test/matrix/hybrid_kernels.cpp`
Note: this issue was moved from Gitlab: https://gitlab.com/ginkgo-project/ginkgo/issues/74
It is often required to obtain a sub-range of an array or a matrix, for example when designing algorithms that work with matrix blocks. Formalizing this in a `range` concept and defining classes which implement it would have several benefits:
- (e.g. splitting `s|eeeeeeeeeee|mmmmmmmmmmmmmmmmmmmm||mmmmmmmmmmmmmmmmmmmmmmmmmmmmmmmm` and storing the two halves separately)

How I imagine the interface to look is something like this (inspired by the Armadillo library):
template <typename Range>
void range_example(Range r)
{
constexpr int dims = Range::num_dimensions; // get the number of dimensions in the range (1 for vec, 2 for matrix, should also support tensors)
constexpr auto order = Range::storage_order; // get the storage order of the range - row-major/column-major (1)
static_assert(dims >= 2, "This only works for ranges with at least 2 dimensions");
auto elem = r(3, 4); // get element at position (3, 4), if the range has more dimensions all the trailing dimensions are set to 0s
auto sub_range = r(span(end - 3, end), span()); // get subrange (end - 3:end, 0:end, 0, ...) of the range (2), (3)
r(span(0, 3), span()) = sub_range; // copy sub-range, range implementation can optimize the copy for the storage scheme
auto r2 = r.reshape(4, 5, 2); // reshape the range to size (4, 5, 2);
auto s = size(r2); // get the size of a range, in this case (4, 5, 2);
}
(1) Maybe we could also have this as a list of dimension indexes - the implementation could be easier, and it is a more flexible solution for multidimensional ranges.
(2) `span` is the name they use in Armadillo; maybe we can think of something better - it should be both clear and short, as we'll use it all the time, and it shouldn't clash with any other "usual" names.
(3) `end` could cause a lot of name clashes - not sure how to solve this; it has to be descriptive and short, but I am worried about `end`. Maybe `npos`, as this will be familiar to anyone who has used `std::string`?
I feel that defining tons of vectors like this in every solver is getting nonsensical. How about we define some helper macros for that:
namespace detail {

template <typename T>
struct add_unique_ptr {
    using type = std::unique_ptr<T>;
};

template <typename T>
struct add_unique_ptr<T *> {
    using type = std::unique_ptr<T>;
};

template <typename T, typename Deleter>
struct add_unique_ptr<std::unique_ptr<T, Deleter>> {
    using type = std::unique_ptr<T, Deleter>;
};

template <typename SourceType>
inline void initialize_named_vectors(const SourceType &) {}

template <typename SourceType, typename VectorType, typename... OtherVectorTypes>
inline void initialize_named_vectors(const SourceType &source, VectorType &vector,
                                     OtherVectorTypes &... rest)
{
    // source may be a raw or smart pointer, so get the pointee type
    using source_type = typename std::decay<decltype(*source)>::type;
    vector = source_type::create(source->get_executor(), source->get_num_rows(),
                                 source->get_num_cols(), source->get_stride());
    initialize_named_vectors(source, rest...);  // note: the pack must be expanded
}

template <typename SourceType, typename VectorType, typename... OtherVectorTypes>
inline void initialize_named_scalars(const SourceType &source, VectorType &vector,
                                     OtherVectorTypes &... rest)
{
    using source_type = typename std::decay<decltype(*source)>::type;
    vector = source_type::create(source->get_executor(), 1, source->get_num_cols());
    // the remaining scalars have the same dimensions as the first one
    initialize_named_vectors(vector.get(), rest...);
}

}  // namespace detail

#define INITIALIZE_NAMED_VECTORS(_source, ...)                                          \
    typename detail::add_unique_ptr<typename std::decay<decltype(_source)>::type>::type \
        __VA_ARGS__;                                                                    \
    detail::initialize_named_vectors(_source, __VA_ARGS__);

#define INITIALIZE_NAMED_SCALARS(_source, ...)                                          \
    typename detail::add_unique_ptr<typename std::decay<decltype(_source)>::type>::type \
        __VA_ARGS__;                                                                    \
    detail::initialize_named_scalars(_source, __VA_ARGS__);
Note that I didn't test this code, so there could be some bugs in it.
But once we have it set up, we should be able to do:
INITIALIZE_NAMED_VECTORS(dense_b, r, r_tld, p, q, u, u_hat, v_hat, t);
INITIALIZE_NAMED_SCALARS(dense_b, alpha, beta, gamma, rho_prev, rho, tau);
Once we have #26, adding factorization-based preconditioners will be straightforward.
We only need to implement factorizations (I propose we use the `factorization` namespace for those): a special kind of LinOp factory that returns a `ComposedLinOp`.
E.g. pseudocode for ILU:
template <typename FactorType = matrix::Csr<>>
class Ilu : public LinOpFactory {
    unique_ptr<LinOp> generate(shared_ptr<const LinOp> op) {
        // calculate size, memory requirement
        auto l_factor = FactorType::create(/* size, memory requirements */);
        auto u_factor = FactorType::create(/* size, memory requirements */);
        launch_factorization_kernel(op.get(), l_factor.get(), u_factor.get());
return ComposedLinOp::create(std::move(l_factor), std::move(u_factor));
}
};
`fact = ilu->generate(A)` gives us a linear operator `fact = L*U`, from which we can get `L` and `U` by calling `fact->get_first()` and `fact->get_second()`.
We can then use these factors to create an ILU preconditioner (i.e. an operator that solves the pair of systems `Ly = b`, `Ux = y`) by simply doing:
auto A = /* initialize matrix */;
auto upper_triangular_solver = /* initialize upper triangular solver factory */;
auto lower_triangular_solver = /* initialize lower triangular solver factory */;
auto factorization = /* initialize factorization factory */;
auto factorized_A = factorization->generate(A);
auto precond = ComposedLinOp::create(upper_triangular_solver->generate(factorized_A->get_second()), lower_triangular_solver->generate(factorized_A->get_first()));
// use precond in a solver
Of course, it's easier for the user if they don't have to compose the preconditioner manually (they could also compose it the wrong way - reversing the order, or forgetting to add the solvers), so we can add a utility `FactorizedPreconditionerFactory` that does the correct composition for the user (it also has the advantage that it can be used in place of any other `LinOpFactory`):
auto A = /* initialize matrix */;
auto upper_triangular_solver = /* initialize upper triangular solver factory */;
auto lower_triangular_solver = /* initialize lower triangular solver factory */;
auto factorization = /* initialize factorization factory */;
auto precond_factory = FactorizedPreconditionerFactory::create(factorization, lower_triangular_solver, upper_triangular_solver);
auto precond = precond_factory->generate(A);
// use precond in a solver
This is a summary of a small discussion which arose during MR #46 but is more general in Ginkgo. The purpose of this Issue is simply to discuss it and collect everyone's opinion on the subject.
The current class naming convention for Ginkgo is detailed here and can be summed up as the following (correct me if I'm wrong in any of this post):
- non-polymorphic classes are named in lower case with underscores: `class_name`
- polymorphic classes are named in capitalized camel case: `ClassName`
The reason for the distinction is that polymorphic classes should be used with care and are "heavier" in a way (and capital case can be seen as heavy).
There are some problems, though, in deciding what counts as polymorphic behavior. For example, we have called the array class `Array`: it has executors, so it can be seen as having polymorphic behavior, but it has no virtual methods and all of its methods are quite cheap.
I also think this naming convention can tend to be confusing, and at the minimum the convention should be made very clear somewhere in the code. The reason is that the convention itself serves as a form of documentation, and it's not useful unless it is embedded in the documentation: `ClassName` documents that this class is polymorphic and could imply significant overhead depending on what you do with it or which function you use.
I also personally like to be able to distinguish classes from functions, variable names, etc., in all cases, so I'm in favor of a uniform `ClassName` convention. In addition, it's true that some classes (`LinOp`s, for example) are heavily polymorphic and should be used with care. I think we should have a standard way of adding a `@warning` in the documentation of those classes.
What do you think, @ginkgo-project/developers?