Comments (9)
Tagging @csiefer2 and @mperego for visibility
I reverted the weaver dashboard to the old flag for the time being so we don't lose coverage.
from albany.
@cgcgcg FYI, we have seen these errors when enabling UVM with the new flags. Hopefully they will be caught by the new PR trilinos/Trilinos#12767.
from albany.
Are you sure Albany is correctly set up to use the Tpetra shared spaces flag? I see this (Albany/src/Albany_TpetraTypes.hpp, lines 52 to 62 in 93c7e76) and this (Albany/src/Albany_KokkosTypes.hpp, line 29 in 93c7e76), but nothing in Phalanx is using the Tpetra shared space stuff.
from albany.
@rppawlo Christian helped me understand this issue a bit better.
Our goal is to use CUDA UVM without using the deprecated -D Kokkos_ENABLE_CUDA_UVM:BOOL=ON option.
Tpetra and Kokkos Kernels allow the use of CUDA UVM by setting
-D KokkosKernels_INST_MEMSPACE_CUDAUVMSPACE=ON
-D Tpetra_ALLOCATE_IN_SHARED_SPACE=ON
However, Phalanx does not allow us to do so. At the moment, in Albany we use PHX::Device everywhere, and we run into issues because our Tpetra vectors, maps, and matrices use PHX::Device, which doesn't use CUDA UVM, whereas Tpetra expects us to use CUDA UVM.
The most natural options for us would be for Phalanx to use Cuda UVM when Tpetra is, or, alternatively, to allow Albany to set a DeviceType for Phalanx (I know that you have worked on a branch that is supposed to do so, and it's probably the best solution for Trilinos).
Let us know what you think. This is not urgent, but we would like to have this addressed at some point and we can probably put some resources into it.
from albany.
I don't think we should make Phalanx look at what Tpetra uses, b/c PHX has no dependence on Tpetra.
Probably the cleanest solution would be to add a -DPhalanx_ENABLE_SHARED_SPACE=ON cmake option for phalanx, cmakedefine it in Phalanx_config.hpp.in, and then in Phalanx_KokkosDeviceTypes.hpp do
using exec_space = PHX::Device::execution_space;
using ExecSpace = PHX::Device::execution_space;
#ifdef PHALANX_ENABLE_SHARED_SPACE
using MemSpace = Kokkos::SharedSpace;
using mem_space = Kokkos::SharedSpace;
#else
using MemSpace = PHX::Device::memory_space;
using mem_space = PHX::Device::memory_space;
#endif
It is then up to the downstream app to ensure that Tpetra and PHX use compatible spaces, if they so desire. There is no a priori need to have PHX and Tpetra use the same mem space (though it is probably convenient).
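The compile-time switch proposed above can be sketched in a way that builds without Kokkos. This is only an illustration: the `mock` namespace holds stand-in structs in place of the real `Kokkos::Cuda`, `Kokkos::CudaSpace`, `Kokkos::SharedSpace`, and `PHX::Device`; the `phx_sketch` namespace and the `uses_shared_space()` helper are hypothetical; and the macro is defined by hand here, whereas in the proposal it would come from the generated Phalanx_config.hpp.

```cpp
#include <type_traits>

// Stand-ins so the sketch compiles without Kokkos (assumption: the real code
// would use Kokkos::CudaSpace, Kokkos::SharedSpace, Kokkos::Cuda, PHX::Device).
namespace mock {
struct CudaSpace {};
struct SharedSpace {};
struct Cuda { using memory_space = CudaSpace; };
struct Device {  // plays the role of PHX::Device
  using execution_space = Cuda;
  using memory_space = Cuda::memory_space;
};
}  // namespace mock

// Defined by hand for the sketch; the proposal has CMake generate this
// define when -DPhalanx_ENABLE_SHARED_SPACE=ON is passed.
#define PHALANX_ENABLE_SHARED_SPACE

namespace phx_sketch {  // hypothetical namespace, not part of Phalanx
using ExecSpace = mock::Device::execution_space;
#ifdef PHALANX_ENABLE_SHARED_SPACE
using MemSpace = mock::SharedSpace;            // shared (UVM-like) memory
#else
using MemSpace = mock::Device::memory_space;   // default device memory
#endif

// Hypothetical helper: true when the shared memory space was selected.
constexpr bool uses_shared_space() {
  return std::is_same<MemSpace, mock::SharedSpace>::value;
}
}  // namespace phx_sketch
```

With the macro defined, only the memory space changes; the execution space is still the one taken from the device type. A downstream app could use a check like `uses_shared_space()` to confirm that its Phalanx and Tpetra configurations agree.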
from albany.
@bartgol's comment is probably the fastest way to get UVM support. There would have to be some changes to the memory allocators as well (make sure Phalanx uses MemSpace consistently and still allows users to override when needed). We do not want to introduce a dependency between Phalanx and Tpetra just to get a default type. There is a branch that makes PHX::Device a true Kokkos::Device (my preferred long-term solution), but that causes a mess of backwards-incompatible changes and severely impacts the downstream apps. The branch is in my personal fork of Trilinos, rppawlo/phalanx-device-object-refactor, if you want to take a look. I also have some refactoring tools that get you the 90% solution. We should have a quick meeting about this.
from albany.
I think that setting the Device would not solve the problem. The issue is the memory space that PHX uses. Without using the deprecated Kokkos_ENABLE_CUDA_UVM, the mem space of the Cuda device will always be CudaSpace. It is up to the Kokkos customer (PHX in this case) to use Kokkos::SharedSpace as a memory space, rather than DeviceType::memory_space (and this can be done on a per-view basis as well!).
from albany.
Sounds good. I didn't think of the fact that Phalanx doesn't depend on Tpetra.
@bartgol The "true" Kokkos::Device, unlike the current PHX::Device, lets you set both the execution space and the memory space, e.g. Kokkos::Device<Kokkos::Cuda,Kokkos::CudaUVMSpace>. But I might be misunderstanding what you are saying.
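To illustrate the point about a device type carrying both spaces, here is a small sketch that mirrors the two-parameter shape of Kokkos::Device using stand-in types, so it compiles without Kokkos. Everything in the `mock` namespace and the `DefaultDevice` and `UvmDevice` aliases are assumptions made for the example, not Phalanx or Albany code.

```cpp
#include <type_traits>

// Stand-ins for the Kokkos types named in the discussion (assumption).
namespace mock {
struct CudaSpace {};
struct CudaUVMSpace {};
struct Cuda { using memory_space = CudaSpace; };  // exec space with its default mem space

// Same shape as Kokkos::Device<ExecutionSpace, MemorySpace>: the pair
// fixes both where kernels run and where data is allocated.
template <class ExecutionSpace, class MemorySpace>
struct Device {
  using execution_space = ExecutionSpace;
  using memory_space = MemorySpace;
};
}  // namespace mock

// Hypothetical aliases: same execution space, different memory spaces.
using DefaultDevice = mock::Device<mock::Cuda, mock::Cuda::memory_space>;
using UvmDevice = mock::Device<mock::Cuda, mock::CudaUVMSpace>;

static_assert(std::is_same<DefaultDevice::execution_space,
                           UvmDevice::execution_space>::value,
              "both devices run on the same execution space");
static_assert(!std::is_same<DefaultDevice::memory_space,
                            UvmDevice::memory_space>::value,
              "only the memory space differs");
```

The execution space alone implies a default memory space, while the device pair lets the memory space be chosen independently, which is the distinction being discussed here.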
@rppawlo, sounds good, let's have a quick meeting when you have time.
from albany.
Ah, you're right. I confused Kokkos::Device with the exec space. Makes sense then.
from albany.