
Comments (4)

tuanle618 commented on June 14, 2024

Hi @dorooddorood606, you can achieve such an initialization with the following code:

from benchmarks.utils import set_seed_all
from phc.hypercomplex.layers import PHMLinear
import torch

# Initialize the final weight matrix following a certain distribution
device = "cuda:0" if torch.cuda.is_available() else "cpu"

set_seed_all(seed=43)
phm_lin1 = PHMLinear(in_features=128 // 2, out_features=256 // 2, phm_dim=4, w_init="phm", c_init="standard").to(device)

for w in phm_lin1.W:
    w.data.normal_(mean=0.0, std=0.01)

for w in phm_lin1.W:
    print(w.std())

# tensor(0.0100, device='cuda:0', grad_fn=<StdBackward0>)
# tensor(0.0101, device='cuda:0', grad_fn=<StdBackward0>)
# tensor(0.0099, device='cuda:0', grad_fn=<StdBackward0>)
# tensor(0.0099, device='cuda:0', grad_fn=<StdBackward0>)

If you want to modify the phm-rules, you can iterate over phm_lin1.phm_rule and modify the data attribute, like:

for w in phm_lin1.phm_rule:
    w.data.normal_(mean=0.5, std=0.1)

for w in phm_lin1.phm_rule:
    print(w)

# Parameter containing:
# tensor([[0.6034, 0.5514, 0.4601, 0.7307],
#         [0.5802, 0.4613, 0.4960, 0.6374],
#         [0.6922, 0.5066, 0.5063, 0.4360],
#         [0.5713, 0.3694, 0.5513, 0.4803]], device='cuda:0', requires_grad=True)
# Parameter containing:
# tensor([[0.3592, 0.5751, 0.5850, 0.5287],
#         [0.4716, 0.4622, 0.5230, 0.5109],
#         [0.4808, 0.3467, 0.5735, 0.5904],
#         [0.4408, 0.5532, 0.5885, 0.5192]], device='cuda:0', requires_grad=True)
# Parameter containing:
# tensor([[0.3816, 0.6542, 0.3359, 0.4211],
#         [0.6865, 0.3759, 0.5291, 0.5276],
#         [0.6018, 0.5565, 0.4768, 0.6355],
#         [0.5029, 0.5969, 0.6655, 0.3873]], device='cuda:0', requires_grad=True)
# Parameter containing:
# tensor([[0.5919, 0.5583, 0.3676, 0.5180],
#         [0.5897, 0.3686, 0.4941, 0.6941],
#         [0.6832, 0.6234, 0.3679, 0.2792],
#         [0.4790, 0.4572, 0.4511, 0.5616]], device='cuda:0', requires_grad=True)
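If you also want to inspect the effective combined weight matrix that these components produce, here is a minimal sketch (assuming phm_rule and W iterate over the phm_dim components, as in the loops above, and using kronecker_product_einsum_batched from phc.hypercomplex.kronecker):

from phc.hypercomplex.kronecker import kronecker_product_einsum_batched

# Stack the phm-rule and weight components and build the combined weight matrix
C = torch.stack([c.detach() for c in phm_lin1.phm_rule])
W = torch.stack([w.detach() for w in phm_lin1.W])
H = kronecker_product_einsum_batched(C, W).sum(0)
print(H.shape, H.mean(), H.std())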


dorooddorood606 commented on June 14, 2024

Hi,
Thank you for the response, and sorry for the misunderstanding. What I meant was: can we initialize the components phm_i and W_i in the PHM layers such that the final weight matrix, which approximates the linear layer, is close to a normal(mean=0, std=0.01) initialization? So, assuming we compute H = \sum_i (phm_i \otimes W_i), how can we get H initialized as normal by initializing the phm_i and W_i elements? Thanks a lot for any suggestions in advance.


tuanle618 commented on June 14, 2024

Hi @dorooddorood606, I need to think more about how to formulate this problem to get a precise initialization scheme, but you could start with the following code and try out different std values for the W tensor, i.e., the weight matrices.

import torch

from benchmarks.utils import set_seed_all
from phc.hypercomplex.layers import PHMLinear
from phc.hypercomplex.kronecker import kronecker_product_einsum_batched


set_seed_all(42)
phm_dim = 4
in_feats = 256
out_feats = 256
in_feats_axis = in_feats // phm_dim
out_feats_axis = out_feats // phm_dim

# keep this fixed (corresponds to the phm-rules, i.e., the C_i in the paper)
C = torch.randn(phm_dim, phm_dim, phm_dim).normal_(0, 0.1)

# try out different std values here
W = torch.randn(phm_dim, in_feats_axis, out_feats_axis).normal_(0, 0.05)

H = kronecker_product_einsum_batched(C, W)
HH = H.sum(0)
print(HH.mean())
print(HH.std())
# tensor(2.9075e-06)
# tensor(0.0087)

If you find an approximate std for initializing the W_i matrices, you can use the code I sent you earlier to initialize the W matrices. As of now, the phm-rules (C_i) are initialized with a fixed standard deviation of 0.1.
Generally, the final standard deviation of the H matrix (after the sum of Kronecker products, i.e., the HH object in the code) can be obtained as the standard deviation of the vectorized sum of Kronecker products, but I need to think more about it and write down the equations. I hope this helps you try things out, and perhaps even derive the exact answer from this hint.
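In the meantime, here is a rough back-of-the-envelope sketch (my assumption: the entries of C and W are independent and zero-mean, which holds for the random normal initialization in the snippet above, but not necessarily for other phm-rule schemes). Each entry of HH is a sum of phm_dim products c * w, so std(HH) ≈ sqrt(phm_dim) * std(C) * std(W), and you can solve for the std to use for the W_i:

import math

# Assumption: entries of C and W are independent and zero-mean.
# Each entry of HH is then a sum of phm_dim products c * w, hence
#   Var(HH) ≈ phm_dim * Var(C) * Var(W)
#   std(HH) ≈ sqrt(phm_dim) * std(C) * std(W)
phm_dim = 4
std_c = 0.1           # std used for the phm-rules (C_i) above
target_std_h = 0.01   # desired std for the combined weight matrix H

std_w = target_std_h / (math.sqrt(phm_dim) * std_c)
print(round(std_w, 4))
# 0.05 -> matches the std tried for W above, which gave std(HH) of roughly 0.0087

That value can then be plugged into the w.data.normal_(mean=0.0, std=...) call from the first snippet to initialize the W components of PHMLinear.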


dorooddorood606 commented on June 14, 2024

thanks a lot

