Comments (4)
Hi @dorooddorood606 , you can achieve such initialization with the following code:
from benchmarks.utils import set_seed_all
from phc.hypercomplex.layers import PHMLinear
import torch
# Initialize the final weight matrix following a certain distribution
device = "cuda:0" if torch.cuda.is_available() else "cpu"
set_seed_all(seed=43)
phm_lin1 = PHMLinear(in_features=128 // 2, out_features=256 // 2, phm_dim=4, w_init="phm", c_init="standard").to(device)
for w in phm_lin1.W:
    w.data.normal_(mean=0.0, std=0.01)
for w in phm_lin1.W:
    print(w.std())
# tensor(0.0100, device='cuda:0', grad_fn=<StdBackward0>)
# tensor(0.0101, device='cuda:0', grad_fn=<StdBackward0>)
# tensor(0.0099, device='cuda:0', grad_fn=<StdBackward0>)
# tensor(0.0099, device='cuda:0', grad_fn=<StdBackward0>)
If you want to modify the phm-rules, you can iterate over phm_lin1.phm_rule and modify the data attribute, like:
for w in phm_lin1.phm_rule:
    w.data.normal_(mean=0.5, std=0.1)
for w in phm_lin1.phm_rule:
    print(w)
# Parameter containing:
# tensor([[0.6034, 0.5514, 0.4601, 0.7307],
# [0.5802, 0.4613, 0.4960, 0.6374],
# [0.6922, 0.5066, 0.5063, 0.4360],
# [0.5713, 0.3694, 0.5513, 0.4803]], device='cuda:0', requires_grad=True)
# Parameter containing:
# tensor([[0.3592, 0.5751, 0.5850, 0.5287],
# [0.4716, 0.4622, 0.5230, 0.5109],
# [0.4808, 0.3467, 0.5735, 0.5904],
# [0.4408, 0.5532, 0.5885, 0.5192]], device='cuda:0', requires_grad=True)
# Parameter containing:
# tensor([[0.3816, 0.6542, 0.3359, 0.4211],
# [0.6865, 0.3759, 0.5291, 0.5276],
# [0.6018, 0.5565, 0.4768, 0.6355],
# [0.5029, 0.5969, 0.6655, 0.3873]], device='cuda:0', requires_grad=True)
# Parameter containing:
# tensor([[0.5919, 0.5583, 0.3676, 0.5180],
# [0.5897, 0.3686, 0.4941, 0.6941],
# [0.6832, 0.6234, 0.3679, 0.2792],
# [0.4790, 0.4572, 0.4511, 0.5616]], device='cuda:0', requires_grad=True)
from phc-gnn.
Hi,
Thank you for the response, and sorry for the misunderstanding. What I meant was: can we initialize the components of phm_rule and W in the PHM layers such that the final weight matrix, which approximates the linear layer, is close to a normal(mean=0, std=0.01) initialization? So, assuming we compute H = \sum_i (phm_i \otimes W_i), how can we have H initialized as normal by initializing the phm_i and W_i elements? Thanks a lot in advance for any suggestions.
Hi @dorooddorood606 , I need to think more about how to formulate this problem to get a precise initialization scheme, but you could start with the following code and test out different std values for the W tensor, i.e., the weight matrices.
import torch
from benchmarks.utils import set_seed_all
from phc.hypercomplex.layers import PHMLinear
from phc.hypercomplex.kronecker import kronecker_product_einsum_batched
set_seed_all(42)
phm_dim = 4
in_feats = 256
out_feats = 256
in_feats_axis = in_feats // phm_dim
out_feats_axis = out_feats // phm_dim
# keep this fixed (corresponds to the phm-rules, i.e., the C_i in the paper)
C = torch.randn(phm_dim, phm_dim, phm_dim).normal_(0, 0.1)
# try out different std values here
W = torch.randn(phm_dim, in_feats_axis, out_feats_axis).normal_(0, 0.05)
H = kronecker_product_einsum_batched(C, W)
HH = H.sum(0)
print(HH.mean())
print(HH.std())
# tensor(2.9075e-06)
# tensor(0.0087)
If you find an approximate std for initializing the W_i matrices, you can then use the code I sent you earlier to init the W matrices. As of now, the phm-rules (C_i) are initialized with a fixed standard deviation of 0.1.
Generally, the final standard deviation of the H matrix (after the sum of Kronecker products, i.e., the HH object in the code) can be computed as the standard deviation of the vectorized version of the sum of Kronecker products. But I need to think more about it and write down the equations. I hope this helps you, so you can at least experiment, and perhaps even derive the exact answer from this hint.
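As a rough starting point for that derivation: if the entries of the C_i and W_i are assumed i.i.d. zero-mean Gaussians with standard deviations sigma_C and sigma_W, each entry of H = \sum_i C_i \otimes W_i is a sum of phm_dim independent products of two such Gaussians, so Var(H) = phm_dim * sigma_C^2 * sigma_W^2, i.e. std(H) = sqrt(phm_dim) * sigma_C * sigma_W. A self-contained sketch checking this numerically (using torch.kron in place of the repo's kronecker_product_einsum_batched; the dimensions here are just example values):

```python
import math
import torch

torch.manual_seed(0)

phm_dim, n, m = 4, 64, 64
sigma_c, sigma_w = 0.1, 0.05

# i.i.d. zero-mean Gaussian entries for the phm-rules C_i and weights W_i
C = torch.randn(phm_dim, phm_dim, phm_dim) * sigma_c
W = torch.randn(phm_dim, n, m) * sigma_w

# H = sum_i C_i (Kronecker) W_i, the effective weight matrix
H = sum(torch.kron(C[i], W[i]) for i in range(phm_dim))

predicted = math.sqrt(phm_dim) * sigma_c * sigma_w
print(H.std().item(), predicted)  # empirical std should be close to predicted
```

Solving for sigma_W gives sigma_W = target_std / (sqrt(phm_dim) * sigma_C); with phm_dim=4, sigma_C=0.1 and a target std of 0.01, that yields sigma_W = 0.05, consistent with the empirical std of roughly 0.009 in the code above.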
Thanks a lot!