Comments (4)
Hi @dorooddorood606 , you can achieve such initialization with the following code:
from benchmarks.utils import set_seed_all
from phc.hypercomplex.layers import PHMLinear
import torch
# Initialize the final weight matrix following a certain distribution
device = "cuda:0" if torch.cuda.is_available() else "cpu"
set_seed_all(seed=43)
phm_lin1 = PHMLinear(in_features=128 // 2, out_features=256 // 2, phm_dim=4, w_init="phm", c_init="standard").to(device)
for w in phm_lin1.W:
    w.data.normal_(mean=0.0, std=0.01)
for w in phm_lin1.W:
    print(w.std())
# tensor(0.0100, device='cuda:0', grad_fn=<StdBackward0>)
# tensor(0.0101, device='cuda:0', grad_fn=<StdBackward0>)
# tensor(0.0099, device='cuda:0', grad_fn=<StdBackward0>)
# tensor(0.0099, device='cuda:0', grad_fn=<StdBackward0>)
If you want to modify the phm-rules, you can iterate over phm_lin1.phm_rule and modify the data attribute, like:
for w in phm_lin1.phm_rule:
    w.data.normal_(mean=0.5, std=0.1)
for w in phm_lin1.phm_rule:
    print(w)
# Parameter containing:
# tensor([[0.6034, 0.5514, 0.4601, 0.7307],
# [0.5802, 0.4613, 0.4960, 0.6374],
# [0.6922, 0.5066, 0.5063, 0.4360],
# [0.5713, 0.3694, 0.5513, 0.4803]], device='cuda:0', requires_grad=True)
# Parameter containing:
# tensor([[0.3592, 0.5751, 0.5850, 0.5287],
# [0.4716, 0.4622, 0.5230, 0.5109],
# [0.4808, 0.3467, 0.5735, 0.5904],
# [0.4408, 0.5532, 0.5885, 0.5192]], device='cuda:0', requires_grad=True)
# Parameter containing:
# tensor([[0.3816, 0.6542, 0.3359, 0.4211],
# [0.6865, 0.3759, 0.5291, 0.5276],
# [0.6018, 0.5565, 0.4768, 0.6355],
# [0.5029, 0.5969, 0.6655, 0.3873]], device='cuda:0', requires_grad=True)
# Parameter containing:
# tensor([[0.5919, 0.5583, 0.3676, 0.5180],
# [0.5897, 0.3686, 0.4941, 0.6941],
# [0.6832, 0.6234, 0.3679, 0.2792],
# [0.4790, 0.4572, 0.4511, 0.5616]], device='cuda:0', requires_grad=True)
from phc-gnn.
Hi,
Thank you for the response, and sorry for the misunderstanding. What I meant was: can we initialize the components of phm_rule and W in the PHM layers such that the final weight matrix, which approximates the linear layer, is close to a normal(mean=0, std=0.01) initialization? So, assuming we compute H = \sum_i (phm_i \otimes W_i), how can we have H initialized as normal by initializing the phm_i and W_i elements? Thanks a lot in advance for any suggestions.
Hi @dorooddorood606 , I need to think more about how to formulate this problem to get a precise initialization scheme, but you could start with the following code and test out different std values for the W tensor, i.e., the weight matrices.
import torch
from benchmarks.utils import set_seed_all
from phc.hypercomplex.layers import PHMLinear
from phc.hypercomplex.kronecker import kronecker_product_einsum_batched
set_seed_all(42)
phm_dim = 4
in_feats = 256
out_feats = 256
in_feats_axis = in_feats // phm_dim
out_feats_axis = out_feats // phm_dim
# keep this fixed (corresponds to the phm-rules, i.e., the C_i in the paper)
C = torch.randn(phm_dim, phm_dim, phm_dim).normal_(0, 0.1)
# try out different std values here
W = torch.randn(phm_dim, in_feats_axis, out_feats_axis).normal_(0, 0.05)
H = kronecker_product_einsum_batched(C, W)
HH = H.sum(0)
print(HH.mean())
print(HH.std())
# tensor(2.9075e-06)
# tensor(0.0087)
If you find an approximate std for initializing the W_i matrices, you can then use the code I sent you earlier to init the W matrices. As of now, the phm-rules (C_i) are initialized with a fixed standard deviation of 0.1.
Generally, the final standard deviation of the H matrix (after the sum of Kronecker products, i.e., the HH object in the code) can be computed as the standard deviation of the vectorized version of the sum of Kronecker products. But I need to think more about it and write down the equations. I hope this helps you, so you can at least experiment, and perhaps even derive the exact answer from this hint.
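As a rough starting point for that derivation: if the entries of the C_i and W_i are assumed i.i.d. zero-mean Gaussians with standard deviations sigma_C and sigma_W, each entry of H = \sum_i C_i \otimes W_i is a sum of phm_dim independent products of two such Gaussians, so Var(H) = phm_dim * sigma_C^2 * sigma_W^2, i.e. std(H) = sqrt(phm_dim) * sigma_C * sigma_W. A self-contained sketch checking this numerically (using torch.kron in place of the repo's kronecker_product_einsum_batched; the dimensions here are just example values):

```python
import math
import torch

torch.manual_seed(0)

phm_dim, n, m = 4, 64, 64
sigma_c, sigma_w = 0.1, 0.05

# i.i.d. zero-mean Gaussian entries for the phm-rules C_i and weights W_i
C = torch.randn(phm_dim, phm_dim, phm_dim) * sigma_c
W = torch.randn(phm_dim, n, m) * sigma_w

# H = sum_i C_i (Kronecker) W_i, the effective weight matrix
H = sum(torch.kron(C[i], W[i]) for i in range(phm_dim))

predicted = math.sqrt(phm_dim) * sigma_c * sigma_w
print(H.std().item(), predicted)  # empirical std should be close to predicted
```

Solving for sigma_W gives sigma_W = target_std / (sqrt(phm_dim) * sigma_C); with phm_dim=4, sigma_C=0.1 and a target std of 0.01, that yields sigma_W = 0.05, consistent with the empirical std of roughly 0.009 in the code above.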
Thanks a lot!