
wsl-images's Introduction

WSL-Images

This project provides models pre-trained in a weakly-supervised fashion on 940 million public images with 1.5K hashtags matching 1,000 ImageNet-1K synsets, followed by fine-tuning on the ImageNet-1K dataset. Please refer to "Exploring the Limits of Weakly Supervised Pretraining" (https://arxiv.org/abs/1805.00932), presented at ECCV 2018, for the details of model training.

We provide four models of different capacities.

Model               #Parameters  FLOPs  Top-1 Acc.  Top-5 Acc.
ResNeXt-101 32x8d   88M          16B    82.2        96.4
ResNeXt-101 32x16d  193M         36B    84.2        97.2
ResNeXt-101 32x32d  466M         87B    85.1        97.5
ResNeXt-101 32x48d  829M         153B   85.4        97.6

Our models significantly improve accuracy on ImageNet compared to models trained from scratch. Our ResNeXt-101 32x48d model achieves a state-of-the-art top-1 accuracy of 85.4% on ImageNet.

Loading models with torch.hub

The models are available via torch.hub. For example, to load the ResNeXt-101 32x16d model, simply run:

import torch
model = torch.hub.load('facebookresearch/WSL-Images', 'resnext101_32x16d_wsl')

Please refer to the torch.hub documentation for a full example of using the model to classify an image.
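For quick reference, here is a minimal classification sketch in the same spirit (assumes torchvision is installed; 'dog.jpg' stands in for any test image):

import torch
from PIL import Image
from torchvision import transforms

model = torch.hub.load('facebookresearch/WSL-Images', 'resnext101_32x16d_wsl')
model.eval()  # inference mode: BatchNorm uses its running statistics

# Standard ImageNet preprocessing: resize, center-crop, normalize.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
batch = preprocess(Image.open('dog.jpg')).unsqueeze(0)  # (1, 3, 224, 224)

with torch.no_grad():
    probs = torch.nn.functional.softmax(model(batch)[0], dim=0)
print(probs.topk(5))  # top-5 ImageNet class probabilities and indices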

Citing WSL-Images

If you use the WSL-Images models, please cite the following publication.

@inproceedings{wslimageseccv2018,
  title={Exploring the Limits of Weakly Supervised Pretraining},
  author={Dhruv Kumar Mahajan and Ross B. Girshick and Vignesh Ramanathan and Kaiming He and Manohar Paluri and Yixuan Li and Ashwin Bharambe and Laurens van der Maaten},
  booktitle={ECCV},
  year={2018}
}

License

WSL-Images models are released under the CC-BY-NC 4.0 license. See LICENSE for additional details.


wsl-images's Issues

Cannot load model from hub

When I run model = torch.hub.load('facebookresearch/WSL-Images', 'resnext101_32x16d_wsl')
the following exception appears:
RuntimeError: invalid hash value (expected "c6f796b0", got "bf1a019059d34afa1def237e9455f1789853814a1dec32db242f03a49f4823b4")
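This error usually means the cached checkpoint is corrupted or was only partially downloaded, so its SHA256 no longer starts with the expected prefix. A hedged workaround sketch (the cache path and file pattern are the usual defaults, not guaranteed): delete the cached weight file and reload.

import glob
import os

import torch

# Default torch.hub checkpoint cache; adjust if TORCH_HOME is customized.
cache_dir = os.path.expanduser('~/.cache/torch/hub/checkpoints')
for f in glob.glob(os.path.join(cache_dir, '*resnext101_32x16*.pth')):
    os.remove(f)  # drop the (presumably corrupted) download

# force_reload refreshes the repo; the weights re-download because the
# cached file is gone, and the hash is re-checked on the fresh copy.
model = torch.hub.load('facebookresearch/WSL-Images',
                       'resnext101_32x16d_wsl', force_reload=True)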

Get "TypeError: __init__() got an unexpected keyword argument 'groups'" during ResNet init

I'm using the starter code.

import torch
model = torch.hub.load('facebookresearch/WSL-Images', 'resnext101_32x8d_wsl')
model.eval()

Error is

Using cache found in /root/.cache/torch/hub/facebookresearch_WSL-Images_master
Traceback (most recent call last):
  File "test.py", line 3, in <module>
    model = torch.hub.load('facebookresearch/WSL-Images', 'resnext101_32x8d_wsl')
  File "/root/.local/lib/python3.6/site-packages/torch/hub.py", line 363, in load
    model = entry(*args, **kwargs)
  File "/root/.cache/torch/hub/facebookresearch_WSL-Images_master/hubconf.py", line 39, in resnext101_32x8d_wsl
    return _resnext('resnext101_32x8d', Bottleneck, [3, 4, 23, 3], True, progress, **kwargs)
  File "/root/.cache/torch/hub/facebookresearch_WSL-Images_master/hubconf.py", line 23, in _resnext
    model = ResNet(block, layers, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'groups'
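The 'groups' keyword was only added to torchvision's ResNet when ResNeXt support landed (torchvision 0.3.0, to my knowledge), so older installations raise this TypeError. A small version guard, as a sketch:

import torchvision

# Parse only the major.minor components; tolerates suffixes like '0.2.2.post3'.
major, minor = (int(x) for x in torchvision.__version__.split('.')[:2])
if (major, minor) < (0, 3):
    raise RuntimeError(
        f"torchvision {torchvision.__version__} predates ResNeXt support; "
        "upgrade it (e.g. pip install -U torchvision) before loading the "
        "WSL models."
    )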

default branch name breaking torch.hub

Was the default branch recently renamed from master to main? This seems to break torch.hub loading unless you have a very recent build of PyTorch installed. For example, if you're on 1.9.1 Stable or 1.8.2 LTS and attempt to load a model such as MiDaS, it reports the following:

    self.netNetwork = torch.hub.load('intel-isl/MiDaS', 'MiDaS')
  File "/<removed>/lib/python3.8/site-packages/torch/hub.py", line 382, in load
    model = _load_local(repo_or_dir, model, *args, **kwargs)
  File "/<removed>/lib/python3.8/site-packages/torch/hub.py", line 411, in _load_local
    model = entry(*args, **kwargs)
  File "/<removed>/.cache/torch/hub/intel-isl_MiDaS_master/hubconf.py", line 61, in MiDaS
    model = MidasNet()
  File "/<removed>/.cache/torch/hub/intel-isl_MiDaS_master/midas/midas_net.py", line 30, in __init__
    self.pretrained, self.scratch = _make_encoder(backbone="resnext101_wsl", features=features, use_pretrained=use_pretrained)                       
  File "/<removed>/.cache/torch/hub/intel-isl_MiDaS_master/midas/blocks.py", line 37, in _make_encoder
    pretrained = _make_pretrained_resnext101_wsl(use_pretrained)
  File "/<removed>/.cache/torch/hub/intel-isl_MiDaS_master/midas/blocks.py", line 115, in _make_pretrained_resnext101_wsl                             
     resnet = torch.hub.load("facebookresearch/WSL-Images", "resnext101_32x8d_wsl")
  File "/<removed>/lib/python3.8/site-packages/torch/hub.py", line 380, in load
    repo_or_dir = _get_cache_or_reload(repo_or_dir, force_reload, verbose, skip_validation)
  File "/<removed>/lib/python3.8/site-packages/torch/hub.py", line 173, in _get_cache_or_reload
    _validate_not_a_forked_repo(repo_owner, repo_name, branch)
  File "/<removed>/lib/python3.8/site-packages/torch/hub.py", line 143, in _validate_not_a_forked_repo
    raise ValueError(f'Cannot find {branch} in https://github.com/{repo_owner}/{repo_name}. '
ValueError: Cannot find master in https://github.com/facebookresearch/WSL-Images. If it's a commit from a forked repo, please call hub.load() with forked repo directly.

Would it make sense to postpone renaming until after at least Stable has rolled out the required changes?

Here's some background on the issue.
pytorch/pytorch#63753
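Until that lands everywhere, two workarounds that may help, depending on your torch version (a hedged sketch; skip_validation appears in the hub.py shown in the traceback, but not in all releases):

import torch

# Option 1: pin the ref explicitly so torch.hub does not assume 'master'.
model = torch.hub.load('facebookresearch/WSL-Images:main',
                       'resnext101_32x8d_wsl')

# Option 2: skip the forked-repo validation entirely, where supported.
model = torch.hub.load('facebookresearch/WSL-Images',
                       'resnext101_32x8d_wsl', skip_validation=True)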

Significant numerical differences with torch.amp.autocast() compared with stock pretrained resnext101

When running inference with torch.amp.autocast(), the forward results show significant numerical differences compared with the pretrained resnext101_32x8d from torchvision, as shown by the sample outputs below for the same input batch:

Output from WSL pretrained resnext101_32x8d_wsl shows significant differences:

actual(w/ amp.autocast) = tensor([ 1.0162e-01,  3.0859e-01, -1.9760e-03,  3.7750e-02, -4.5996e-01,
        -7.2510e-01, -8.7402e-02, -9.4727e-01...698e-02, -1.4392e-01,
        -2.1533e-01, -5.7666e-01, -8.1787e-02,  1.8103e-01,  2.3596e-01],
       device='cuda:0')
expected(w/o amp.autocast) = tensor([ 1.2948e-01,  4.0339e-01,  6.4677e-02,  6.7963e-02, -3.6953e-01,
        -6.0408e-01, -1.5742e-01, -8.2637e-01...613e-02, -1.7330e-01,
        -2.1253e-01, -5.2314e-01, -1.2327e-01,  1.0499e-01,  1.7262e-01],
       device='cuda:0')

Output from the torchvision pretrained resnext101_32x8d shows closely matching values:

actual(w/ amp.autocast) = tensor([-2.9844e+00, -6.8945e-01,  5.9668e-01, -1.2510e+00, -7.2168e-01,
        -2.1992e+00, -1.2686e+00, -5.0879e-01...953e+00, -4.4453e+00,
        -4.8984e+00, -3.2617e+00, -2.6641e+00, -2.2344e+00,  5.4922e+00],
       device='cuda:0')
expected(w/o amp.autocast) = tensor([-2.9887e+00, -6.8953e-01,  5.9514e-01, -1.2496e+00, -7.2139e-01,
        -2.2008e+00, -1.2737e+00, -5.1238e-01...971e+00, -4.4485e+00,
        -4.9002e+00, -3.2653e+00, -2.6683e+00, -2.2359e+00,  5.5001e+00],
       device='cuda:0')

Is it because the pretrained resnext101 from torchvision was already trained in mixed precision, or is it something else?
Any clarification would be appreciated.
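For what it's worth, a relative-error check makes the drift easier to quantify than eyeballing raw logits; a small hypothetical helper (not from the repo):

import torch

def max_rel_err(actual: torch.Tensor, expected: torch.Tensor) -> float:
    """Largest elementwise relative error between two output tensors."""
    denom = expected.abs().clamp_min(1e-6)  # avoid dividing by near-zero logits
    return ((actual - expected).abs() / denom).max().item()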

PS: sample pytest code to load the models and run the tests:

import logging

import pytest
import torch as th

@pytest.fixture
def batch_size():
    return 2

@pytest.fixture
def shape():
    return 3, 720, 1280

@pytest.fixture
def dev():
    return th.device('cuda') if th.cuda.is_available() else th.device('cpu')

@pytest.fixture
def batch(batch_size, shape):
    return th.rand(batch_size, *shape)

@pytest.fixture
def x101_32x8d(dev):
    from torchvision.models.resnet import _resnet
    from torchvision.models.resnet import Bottleneck
    from torchvision.ops.misc import FrozenBatchNorm2d
    kwargs = {}
    frozen = True
    kwargs['groups'] = gs = kwargs.get('groups', 32)
    kwargs['width_per_group'] = gw = kwargs.get('width_per_group', 8)
    kwargs['norm_layer'] = kwargs.get('norm_layer', FrozenBatchNorm2d if frozen else None)
    arch = f"resnext101_{gs}x{gw}d"
    model = _resnet(arch, Bottleneck, [3, 4, 23, 3], True, True, **kwargs)
    model.to(dev).eval()
    return model

@pytest.fixture
def x101_32x8d_wsl(dev):
    from torchvision.ops.misc import FrozenBatchNorm2d
    kwargs = {}
    frozen = True
    kwargs['groups'] = gs = kwargs.get('groups', 32)
    kwargs['width_per_group'] = gw = kwargs.get('width_per_group', 8)
    kwargs['norm_layer'] = kwargs.get('norm_layer', FrozenBatchNorm2d if frozen else None)
    model = th.hub.load('facebookresearch/WSL-Images', 'resnext101_32x8d_wsl', **kwargs)
    model.to(dev).eval()
    return model

@pytest.mark.parametrize("B", [2])
def test_x101_amp(benchmark, x101_32x8d, dev, batch, B):
    model = x101_32x8d
    with th.no_grad():
        with th.cuda.amp.autocast(enabled=False):
            outputs_fp32 = model(batch[:B].to(dev)).float()
        with th.cuda.amp.autocast():
            outputs_amp = model(batch[:B].to(dev)).float()

    for i, (output_fp32, output_amp) in enumerate(zip(outputs_fp32, outputs_amp)):
        logging.info(f"output[{i}] shape={tuple(output_fp32.shape)}, norm_fp32={output_fp32.norm()}, norm_amp={output_amp.norm()}")
        th.testing.assert_allclose(output_amp, output_fp32, rtol=1e-03, atol=3e-04)

@pytest.mark.parametrize("B", [2])
def test_x101_wsl_amp(benchmark, x101_32x8d_wsl, dev, batch, B):
    model = x101_32x8d_wsl
    with th.no_grad():
        with th.cuda.amp.autocast(enabled=False):
            outputs_fp32 = model(batch[:B].to(dev)).float()
        with th.cuda.amp.autocast():
            outputs_amp = model(batch[:B].to(dev)).float()
    
    for i, (output_fp32, output_amp) in enumerate(zip(outputs_fp32, outputs_amp)):
        logging.info(f"output[{i}] shape={tuple(output_fp32.shape)}, norm_fp32={output_fp32.norm()}, norm_amp={output_amp.norm()}")
        th.testing.assert_allclose(output_amp, output_fp32, rtol=1e-03, atol=3e-04)

About performance on CUB-200 fine-tuning.

Thank you for open-sourcing these pretrained models.
I fine-tuned the 32x32, 32x16, and 32x8 models on the CUB-200 dataset with two 1080Ti GPUs, but only achieved 87.5% accuracy on the test set with image size 448x448; with image size 224x224 I only got 84.43% test accuracy. I find a bigger batch size gives better performance, but I only have 2 GPUs.
In the paper, the 32x16 pretrained model reached 89.2% accuracy.
Could you please share more details about the fine-tuning?
Thanks.
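For comparison, a generic fine-tuning baseline (hedged: this is not the authors' recipe, just a common starting point, and the hyperparameters are illustrative):

import torch
import torch.nn as nn

model = torch.hub.load('facebookresearch/WSL-Images', 'resnext101_32x16d_wsl')
model.fc = nn.Linear(model.fc.in_features, 200)  # CUB-200 has 200 classes

# Small LR for the pretrained trunk; standard momentum/weight decay.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()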

Pretrained hashtag prediction model

Hello,

These models are the ones ultimately used for ImageNet classification. I am looking for the pre-trained models used to predict hashtags. From the paper:

"In our experiments, we pre-train convolutional networks for hashtag prediction [...]"
"Full network finetuning is performed by removing the hashtag-specific fully
connected classification layer from the network"

I am looking for this hashtag prediction model for an open-source project which needs to clean and improve (by adding more) hashtags from Flickr.

At the very least, it would help a lot to know which layers of the model available on PyTorch Hub correspond to this underlying hashtag prediction model.

Thanks!
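If the released trunks do share weights with the hashtag network (my assumption, based on the paper's description of swapping only the final fc layer), you can at least reuse everything below the head as a feature extractor; a sketch:

import torch
import torch.nn as nn

model = torch.hub.load('facebookresearch/WSL-Images', 'resnext101_32x8d_wsl')
trunk = nn.Sequential(*list(model.children())[:-1])  # everything except model.fc
trunk.eval()

with torch.no_grad():
    feats = trunk(torch.rand(1, 3, 224, 224)).flatten(1)  # shape (1, 2048)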

how can I do the final hashtag prediction using the pretrained model?

import torch
model = torch.hub.load('facebookresearch/WSL-Images', 'resnext101_32x8d_wsl')
model.eval()  # inference mode (use BatchNorm running statistics)
# sample execution (requires torchvision)
# Download an example image from the pytorch website
import urllib
url, filename = ("https://github.com/pytorch/hub/raw/master/dog.jpg", "dog.jpg")
try: urllib.URLopener().retrieve(url, filename)
except: urllib.request.urlretrieve(url, filename)
from PIL import Image
from torchvision import transforms
# filename='/home/lalit/notebooks/Lalit/image_caption/pytorch-tutorial/tutorials/03-advanced/image_captioning/png/5e96fea74a01a44750ff4d36.png'
input_image = Image.open(filename)
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
input_tensor = preprocess(input_image)
input_batch = input_tensor.unsqueeze(0) # create a mini-batch as expected by the model

# move the input and model to GPU for speed if available
if torch.cuda.is_available():
    input_batch = input_batch.to('cuda')
    model.to('cuda')

with torch.no_grad():
    output = model(input_batch)
# Tensor of shape 1000, with confidence scores over Imagenet's 1000 classes
# print(output[0])
# The output has unnormalized scores. To get probabilities, you can run a softmax on it.
print(torch.nn.functional.softmax(output[0], dim=0))
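Note that the released head is the ImageNet classifier (the models were fine-tuned on ImageNet-1K), not the hashtag one, so this produces ImageNet predictions. To inspect them in a readable form, continuing from the output above:

probs = torch.nn.functional.softmax(output[0], dim=0)
top5 = torch.topk(probs, 5)
print(top5.indices.tolist())  # ImageNet class indices
print(top5.values.tolist())   # their probabilities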

Will ResNeXt-101 32x4d be available?

I'm currently using ResNeXt-101 32x4d (pretrained on ImageNet) for research on object detection. I want to try replacing the ImageNet pretrained model with the WSL-Images pretrained one, so I am wondering where I can obtain a ResNeXt-101 32x4d WSL model. Thanks.

cuda out of memory

Hi,
I am working on this code on Kaggle and am facing this issue.
Please suggest a solution.

About training set

Hi,
Is there any way to access the training data used for WSL, for instance train-IG-940M-1.5k?
Jeff

404 error when pulling model from hub

I'm trying to pull resnext101_32x48d with the following code:

model = torch.hub.load('facebookresearch/WSL-Images', 'resnext101_32x48d_wsl')

but I am receiving the following error:

Downloading: "https://github.com/facebookresearch/WSL-Images/archive/master.zip" to /tmp/cache/torch/hub/master.zip
...
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

Passing 'pretrained' parameter to torch.hub.load raises error

๐Ÿ› Bug

I tried to load the FB-released Instagram pre-trained model via PyTorch Hub. It works without passing the 'pretrained' parameter, but if the parameter is passed it raises an error. I think this causes an inconsistency with how other models are loaded.

To Reproduce

Steps to reproduce the behavior:

model = torch.hub.load('facebookresearch/WSL-Images', 'resnext101_32x8d_wsl', pretrained=True)

TypeError: _resnext() got multiple values for argument 'pretrained'

Expected behavior

Load the architecture with random weights if the 'pretrained' parameter is False, and with the pretrained weights if it is True.
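The collision comes from the hub entrypoints already passing pretrained positionally; a simplified sketch of the pattern in hubconf.py (per the traceback in the issue above):

from torchvision.models.resnet import ResNet, Bottleneck

def _resnext(arch, block, layers, pretrained, progress, **kwargs):
    model = ResNet(block, layers, **kwargs)
    # (weight loading elided)
    return model

def resnext101_32x8d_wsl(progress=True, **kwargs):
    # 'True' below already fills the pretrained slot, so calling
    # torch.hub.load(..., pretrained=True) passes it a second time
    # through **kwargs -> "got multiple values for argument 'pretrained'".
    return _resnext('resnext101_32x8d', Bottleneck, [3, 4, 23, 3], True,
                    progress, **kwargs)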

Error

ValueError: Cannot find master in https://github.com/facebookresearch/WSL-Images. If it's a commit from a forked repo, please call hub.load() with forked repo directly.

We are facing the above error. Can anyone help? We are working on Kaggle.

Details on performing inference

I am trying to run a variant of your model from Pytorch Hub.

I couldn't find any details in your paper (or the papers you mention) on exactly how I should prepare a standard image to get meaningful results.

I know the batch shape must be (batch_size, 3, 224, 224), and I also read in one of your linked papers:

The network input image is a 224×224 pixel random crop from an augmented image or its horizontal flip. The input image is normalized by the per-color mean and standard deviation, as in [12].

However, even after applying this to my basic images, I am getting a consistently incorrect prediction for a simple case (a German Shepherd).

Is there any more information or code you can point to that will help?
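One thing worth ruling out (a guess, not a diagnosis): consistently wrong predictions often come from running inference in training mode, where BatchNorm uses batch statistics instead of its stored running averages. A minimal check:

import torch

model = torch.hub.load('facebookresearch/WSL-Images', 'resnext101_32x8d_wsl')
model.eval()  # must come before inference

batch = torch.rand(1, 3, 224, 224)  # stand-in for your normalized image batch
with torch.no_grad():
    logits = model(batch)
print(logits.argmax(dim=1))  # predicted ImageNet class index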

OSError: [Errno 22] Invalid argument

OS: Windows 7
Python: v3.7.4
PyTorch: v1.2.0
Model: resnext101_32x48d_wsl

      1 import torch
----> 2 model = torch.hub.load('facebookresearch/WSL-Images', 'resnext101_32x48d_wsl')
      3 model.eval()

c:\dev\python37\lib\site-packages\torch\hub.py in load(github, model, *args, **kwargs)
    361     entry = _load_entry_from_hubconf(hub_module, model)
    362 
--> 363     model = entry(*args, **kwargs)
    364 
    365     sys.path.remove(repo_dir)

~/.cache\torch\hub\facebookresearch_WSL-Images_master/hubconf.py in resnext101_32x48d_wsl(progress, **kwargs)
     76     kwargs['groups'] = 32
     77     kwargs['width_per_group'] = 48
---> 78     return _resnext('resnext101_32x48d', Bottleneck, [3, 4, 23, 3], True, progress, **kwargs)

~/.cache\torch\hub\facebookresearch_WSL-Images_master/hubconf.py in _resnext(arch, block, layers, pretrained, progress, **kwargs)
     22 def _resnext(arch, block, layers, pretrained, progress, **kwargs):
     23     model = ResNet(block, layers, **kwargs)
---> 24     state_dict = load_state_dict_from_url(model_urls[arch], progress=progress)
     25     model.load_state_dict(state_dict)
     26     return model

c:\dev\python37\lib\site-packages\torch\hub.py in load_state_dict_from_url(url, model_dir, map_location, progress)
    461         hash_prefix = HASH_REGEX.search(filename).group(1)
    462         _download_url_to_file(url, cached_file, hash_prefix, progress=progress)
--> 463     return torch.load(cached_file, map_location=map_location)

c:\dev\python37\lib\site-packages\torch\serialization.py in load(f, map_location, pickle_module, **pickle_load_args)
    384         f = f.open('rb')
    385     try:
--> 386         return _load(f, map_location, pickle_module, **pickle_load_args)
    387     finally:
    388         if new_fd:

c:\dev\python37\lib\site-packages\torch\serialization.py in _load(f, map_location, pickle_module, **pickle_load_args)
    578     for key in deserialized_storage_keys:
    579         assert key in deserialized_objects
--> 580         deserialized_objects[key]._set_from_file(f, offset, f_should_read_directly)
    581         if offset is not None:
    582             offset = f.tell()

OSError: [Errno 22] Invalid argument

Maybe it's because this is the biggest pre-trained model (more than 2 GB), and maybe Python's pickle has bugs with such large files on Windows?

How to solve ZeroDivisionError: float division by zero?

Why do I get a ZeroDivisionError while trying resnext101_32x16d_wsl?

Here is my model:

import torch
import torch.optim as optim
from apex import amp  # NVIDIA Apex mixed-precision utilities

device = torch.device("cuda:0")
model = torch.hub.load('facebookresearch/WSL-Images', 'resnext101_32x16d_wsl')
model.fc = torch.nn.Linear(2048, n_classes)  # n_classes: number of target classes, defined elsewhere

model.to(device)

criterion = torch.nn.BCEWithLogitsLoss()
plist = [{'params': model.parameters(), 'lr': 2e-5}]
optimizer = optim.Adam(plist, lr=2e-5)

model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

Here is my training log:

Epoch 0/0

Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 32768.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 16384.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8192.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4096.0
[... the same message repeats on every step, with the loss scale halving each time ...]
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 9.016580681431383e-131
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.5082903407156913e-131
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.2541451703578456e-131
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.1270725851789228e-131
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5.635362925894614e-132
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.817681462947307e-132
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.4088407314736535e-132
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 7.044203657368268e-133
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.522101828684134e-133
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.761050914342067e-133
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8.805254571710335e-134
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.4026272858551673e-134
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.2013136429275836e-134
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.1006568214637918e-134
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5.503284107318959e-135
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.7516420536594796e-135
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.3758210268297398e-135
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 6.879105134148699e-136
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.4395525670743494e-136
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.7197762835371747e-136
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8.598881417685874e-137
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.299440708842937e-137
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.1497203544214684e-137
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.0748601772107342e-137
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5.374300886053671e-138
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.6871504430268355e-138
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.3435752215134178e-138
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 6.717876107567089e-139
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.3589380537835444e-139
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.6794690268917722e-139
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8.397345134458861e-140
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.1986725672294305e-140
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.0993362836147152e-140
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.0496681418073576e-140
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5.248340709036788e-141
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.624170354518394e-141
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.312085177259197e-141
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 6.560425886295985e-142
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.2802129431479926e-142
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.6401064715739963e-142
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8.200532357869981e-143
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.100266178934991e-143
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.0501330894674953e-143
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.0250665447337477e-143
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5.1253327236687384e-144
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.5626663618343692e-144
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.2813331809171846e-144
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 6.406665904585923e-145
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.2033329522929615e-145
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.6016664761464807e-145
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8.008332380732404e-146
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.004166190366202e-146
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.002083095183101e-146
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.0010415475915505e-146
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5.0052077379577523e-147
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.5026038689788762e-147
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.2513019344894381e-147
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 6.256509672447191e-148
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.1282548362235952e-148
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.5641274181117976e-148
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 7.820637090558988e-149
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.910318545279494e-149
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.955159272639747e-149
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 9.775796363198735e-150
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.887898181599368e-150
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.443949090799684e-150
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.221974545399842e-150
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 6.10987272699921e-151
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.054936363499605e-151
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.5274681817498023e-151
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 7.637340908749012e-152
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.818670454374506e-152
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.909335227187253e-152
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 9.546676135936265e-153
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.7733380679681323e-153
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.3866690339840662e-153
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.1933345169920331e-153
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5.966672584960166e-154
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.983336292480083e-154
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.4916681462400413e-154
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 7.458340731200207e-155
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.7291703656001034e-155
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.8645851828000517e-155
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 9.322925914000258e-156
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.661462957000129e-156
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.3307314785000646e-156
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.1653657392500323e-156
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5.826828696250162e-157
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.913414348125081e-157
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.4567071740625404e-157
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 7.283535870312702e-158
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.641767935156351e-158
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.8208839675781755e-158
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 9.104419837890877e-159
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.552209918945439e-159
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.2761049594727193e-159
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.1380524797363597e-159
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5.6902623986817984e-160
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.8451311993408992e-160
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.4225655996704496e-160
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 7.112827998352248e-161
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.556413999176124e-161
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.778206999588062e-161
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8.89103499794031e-162
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.445517498970155e-162
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.2227587494850775e-162
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.1113793747425387e-162
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5.556896873712694e-163
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.778448436856347e-163
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.3892242184281734e-163
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 6.946121092140867e-164
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.4730605460704336e-164
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.7365302730352168e-164
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8.682651365176084e-165
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.341325682588042e-165
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.170662841294021e-165
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.0853314206470105e-165
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5.426657103235053e-166
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.7133285516175262e-166
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.3566642758087631e-166
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 6.783321379043816e-167
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.391660689521908e-167
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.695830344760954e-167
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8.47915172380477e-168
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.239575861902385e-168
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.1197879309511924e-168
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.0598939654755962e-168
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5.299469827377981e-169
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.6497349136889905e-169
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.3248674568444952e-169
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 6.624337284222476e-170
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.312168642111238e-170
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.656084321055619e-170
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8.280421605278095e-171
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.140210802639048e-171
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.070105401319524e-171
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.035052700659762e-171
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5.17526350329881e-172
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.587631751649405e-172
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.2938158758247024e-172
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 6.469079379123512e-173
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.234539689561756e-173
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.617269844780878e-173
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8.08634922390439e-174
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.043174611952195e-174
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.0215873059760975e-174
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.0107936529880487e-174
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5.053968264940244e-175
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.526984132470122e-175
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.263492066235061e-175
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 6.317460331175305e-176
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.1587301655876523e-176
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.5793650827938261e-176
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 7.896825413969131e-177
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.9484127069845653e-177
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.9742063534922827e-177
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 9.871031767461413e-178
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.935515883730707e-178
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.4677579418653533e-178
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.2338789709326767e-178
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 6.169394854663383e-179
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.084697427331692e-179
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.542348713665846e-179
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 7.71174356832923e-180
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.855871784164615e-180
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.9279358920823073e-180
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 9.639679460411536e-181
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.819839730205768e-181
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.409919865102884e-181
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.204959932551442e-181
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 6.02479966275721e-182
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.012399831378605e-182
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.5061999156893026e-182
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 7.530999578446513e-183
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.7654997892232564e-183
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.8827498946116282e-183
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 9.413749473058141e-184
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.706874736529071e-184
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.3534373682645353e-184
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.1767186841322676e-184
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5.883593420661338e-185
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.941796710330669e-185
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.4708983551653345e-185
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 7.354491775826673e-186
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.6772458879133364e-186
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.8386229439566682e-186
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 9.193114719783341e-187
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.5965573598916705e-187
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.2982786799458352e-187
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.1491393399729176e-187
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5.745696699864588e-188
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.872848349932294e-188
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.436424174966147e-188
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 7.182120874830735e-189
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.5910604374153675e-189
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.7955302187076838e-189
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8.977651093538419e-190
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.4888255467692094e-190
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.2444127733846047e-190
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.1222063866923024e-190
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5.611031933461512e-191
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.805515966730756e-191
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.402757983365378e-191
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 7.01378991682689e-192
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.506894958413445e-192
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.7534474792067224e-192
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8.767237396033612e-193
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.383618698016806e-193
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.191809349008403e-193
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.0959046745042015e-193
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5.479523372521008e-194
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.739761686260504e-194
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.369880843130252e-194
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 6.84940421565126e-195
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.42470210782563e-195
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.712351053912815e-195
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8.561755269564074e-196
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.280877634782037e-196
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.1404388173910186e-196
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.0702194086955093e-196
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5.351097043477547e-197
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.6755485217387732e-197
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.3377742608693866e-197
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 6.688871304346933e-198
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.3444356521734666e-198
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.6722178260867333e-198
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8.361089130433666e-199
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.180544565216833e-199
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.0902722826084166e-199
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.0451361413042083e-199
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5.225680706521042e-200
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.612840353260521e-200
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.3064201766302604e-200
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 6.532100883151302e-201
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.266050441575651e-201
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.6330252207878255e-201
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8.165126103939127e-202
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.082563051969564e-202
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.041281525984782e-202
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.020640762992391e-202
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5.103203814961955e-203
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.5516019074809773e-203
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.2758009537404886e-203
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 6.379004768702443e-204
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.1895023843512216e-204
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.5947511921756108e-204
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 7.973755960878054e-205
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.986877980439027e-205
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.9934389902195135e-205
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 9.967194951097568e-206
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.983597475548784e-206
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.491798737774392e-206
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.245899368887196e-206
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 6.22949684443598e-207
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.11474842221799e-207
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.557374211108995e-207
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 7.786871055544975e-208
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.8934355277724873e-208
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.9467177638862437e-208
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 9.733588819431218e-209
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.866794409715609e-209
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.4333972048578046e-209
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.2166986024289023e-209
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 6.083493012144512e-210
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.041746506072256e-210
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.520873253036128e-210
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 7.60436626518064e-211
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.80218313259032e-211
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.90109156629516e-211
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 9.5054578314758e-212
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.7527289157379e-212
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.37636445786895e-212
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.188182228934475e-212
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5.940911144672375e-213
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.9704555723361872e-213
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.4852277861680936e-213
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 7.426138930840468e-214
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.713069465420234e-214
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.856534732710117e-214
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 9.282673663550585e-215
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.641336831775293e-215
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.3206684158876463e-215
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.1603342079438231e-215
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5.801671039719116e-216
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.900835519859558e-216
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.450417759929779e-216
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 7.252088799648895e-217
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.6260443998244473e-217
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.8130221999122236e-217
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 9.065110999561118e-218
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.532555499780559e-218
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.2662777498902796e-218
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.1331388749451398e-218
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5.665694374725699e-219
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.8328471873628494e-219
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.4164235936814247e-219
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 7.082117968407124e-220
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.541058984203562e-220
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.770529492101781e-220
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8.852647460508905e-221
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.4263237302544523e-221
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.2131618651272261e-221
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.1065809325636131e-221
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5.5329046628180653e-222
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.7664523314090327e-222
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.3832261657045163e-222
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 6.916130828522582e-223
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.458065414261291e-223
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.7290327071306454e-223
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8.645163535653227e-224
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.322581767826614e-224
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.161290883913307e-224
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.0806454419566534e-224
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5.403227209783267e-225
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.7016136048916335e-225
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.3508068024458167e-225
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 6.754034012229084e-226
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.377017006114542e-226
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.688508503057271e-226
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8.442542515286355e-227
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.2212712576431773e-227
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.1106356288215886e-227
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.0553178144107943e-227
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5.276589072053972e-228
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.638294536026986e-228
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.319147268013493e-228
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 6.595736340067465e-229
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.2978681700337323e-229
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.6489340850168661e-229
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8.244670425084331e-230
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.1223352125421653e-230
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.0611676062710827e-230
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.0305838031355413e-230
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5.152919015677707e-231
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.5764595078388533e-231
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.2882297539194267e-231
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 6.441148769597133e-232
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.220574384798567e-232
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.6102871923992833e-232
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8.051435961996417e-233
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.0257179809982083e-233
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.0128589904991042e-233
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.0064294952495521e-233
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5.0321474762477604e-234
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.5160737381238802e-234
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.2580368690619401e-234
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 6.290184345309701e-235
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.1450921726548502e-235
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.5725460863274251e-235
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 7.862730431637126e-236
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.931365215818563e-236
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.9656826079092814e-236
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 9.828413039546407e-237
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.914206519773204e-237
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.457103259886602e-237
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.228551629943301e-237
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 6.142758149716505e-238
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.0713790748582522e-238
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.5356895374291261e-238
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 7.678447687145631e-239
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.8392238435728152e-239
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.9196119217864076e-239
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 9.598059608932038e-240
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.799029804466019e-240
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.3995149022330095e-240
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.1997574511165048e-240
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5.998787255582524e-241
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.999393627791262e-241
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.499696813895631e-241
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 7.498484069478155e-242
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.7492420347390774e-242
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.8746210173695387e-242
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 9.373105086847693e-243
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.686552543423847e-243
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.3432762717119234e-243
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.1716381358559617e-243
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5.858190679279809e-244
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.9290953396399042e-244
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.4645476698199521e-244
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 7.322738349099761e-245
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.6613691745498803e-245
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.8306845872749401e-245
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 9.153422936374701e-246
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.5767114681873503e-246
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.2883557340936752e-246
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.1441778670468376e-246
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5.720889335234188e-247
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.860444667617094e-247
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.430222333808547e-247
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 7.151111669042735e-248
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.5755558345213674e-248
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.7877779172606837e-248
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8.938889586303419e-249
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.4694447931517093e-249
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.2347223965758547e-249
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.1173611982879273e-249
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5.586805991439637e-250
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.7934029957198183e-250
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.3967014978599092e-250
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 6.983507489299546e-251
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.491753744649773e-251
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.7458768723248864e-251
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8.729384361624432e-252
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.364692180812216e-252
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.182346090406108e-252
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.091173045203054e-252
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5.45586522601527e-253
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.727932613007635e-253
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.3639663065038175e-253
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 6.819831532519088e-254
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.409915766259544e-254
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.704957883129772e-254
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8.52478941564886e-255
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.26239470782443e-255
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.131197353912215e-255
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.0655986769561075e-255
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5.327993384780537e-256
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.6639966923902686e-256
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.3319983461951343e-256
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 6.659991730975672e-257
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.329995865487836e-257
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.664997932743918e-257
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8.32498966371959e-258
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.162494831859795e-258
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.0812474159298974e-258
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.0406237079649487e-258
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5.2031185398247434e-259
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.6015592699123717e-259
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.3007796349561859e-259
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 6.503898174780929e-260
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.2519490873904646e-260
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.6259745436952323e-260
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8.129872718476162e-261
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.064936359238081e-261
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.0324681796190404e-261
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.0162340898095202e-261
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5.081170449047601e-262
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.5405852245238005e-262
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.2702926122619002e-262
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 6.351463061309501e-263
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.1757315306547506e-263
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.5878657653273753e-263
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 7.939328826636877e-264
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.9696644133184383e-264
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.9848322066592191e-264
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 9.924161033296096e-265
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.962080516648048e-265
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.481040258324024e-265
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.240520129162012e-265
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 6.20260064581006e-266
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.10130032290503e-266
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.550650161452515e-266
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 7.753250807262575e-267
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.8766254036312874e-267
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.9383127018156437e-267
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 9.691563509078218e-268
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.845781754539109e-268
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.4228908772695546e-268
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.2114454386347773e-268
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 6.057227193173887e-269
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.0286135965869433e-269
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.5143067982934716e-269
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 7.571533991467358e-270
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.785766995733679e-270
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.8928834978668395e-270
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 9.464417489334198e-271
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.732208744667099e-271
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.3661043723335494e-271
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.1830521861667747e-271
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5.915260930833874e-272
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.957630465416937e-272
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.4788152327084684e-272
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 7.394076163542342e-273
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.697038081771171e-273
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.8485190408855855e-273
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 9.242595204427927e-274
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.621297602213964e-274
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.310648801106982e-274
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.155324400553491e-274
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5.776622002767455e-275
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.8883110013837273e-275
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.4441555006918637e-275
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 7.220777503459318e-276
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.610388751729659e-276
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.8051943758648296e-276
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 9.025971879324148e-277
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.512985939662074e-277
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.256492969831037e-277
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.1282464849155185e-277
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5.641232424577593e-278
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.8206162122887962e-278
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.4103081061443981e-278
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 7.051540530721991e-279
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.5257702653609953e-279
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.7628851326804976e-279
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8.814425663402488e-280
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.407212831701244e-280
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.203606415850622e-280
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.101803207925311e-280
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5.509016039626555e-281
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.7545080198132776e-281
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.3772540099066388e-281
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 6.886270049533194e-282
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.443135024766597e-282
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.7215675123832985e-282
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8.607837561916492e-283
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.303918780958246e-283
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.151959390479123e-283
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.0759796952395615e-283
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5.379898476197808e-284
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.689949238098904e-284
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.344974619049452e-284
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 6.72487309524726e-285
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.36243654762363e-285
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.681218273811815e-285
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8.406091369059075e-286
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.2030456845295373e-286
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.1015228422647686e-286
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.0507614211323843e-286
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5.253807105661922e-287
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.626903552830961e-287
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.3134517764154804e-287
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 6.567258882077402e-288
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.283629441038701e-288
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.6418147205193505e-288
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8.209073602596753e-289
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.1045368012983762e-289
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.0522684006491881e-289
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.0261342003245941e-289
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5.1306710016229703e-290
[... the same message repeats for over a hundred consecutive steps, halving the loss scale each time ...]
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1e-323
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5e-324
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 0.0

ZeroDivisionError                         Traceback (most recent call last)
in <module>()
     23
     24 with amp.scale_loss(loss, optimizer) as scaled_loss:
---> 25     scaled_loss.backward()
     26
     27 tr_loss += loss.item()

4 frames
/usr/local/lib/python3.6/dist-packages/apex/amp/scaler.py in unscale_with_stashed(self, model_grads, stashed_master_grads, master_grads, scale_override)
    174                 self._overflow_buf,
    175                 [model_grads, stashed_master_grads, master_grads],
--> 176                 out_scale/grads_have_scale,    # 1./scale,
    177                 out_scale/stashed_have_scale,  # 1.0,
    178                 0)  # check only arg 0, aka the incoming model grads, for infs

ZeroDivisionError: float division by zero
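
Each "Gradient overflow" line above is apex's dynamic loss scaler halving its scale after a step with non-finite gradients. Because every step here overflows, the scale keeps halving until it underflows past the smallest subnormal double (about 5e-324) to 0.0, and the next unscale then divides by zero. A run of overflows this long usually means the loss itself is producing NaN or Inf gradients, so that is worth checking first. As a workaround, apex's amp.initialize accepts a min_loss_scale argument that floors the scale, and PyTorch's native torch.cuda.amp skips bad steps without reaching this code path. A minimal sketch under those assumptions (the toy model, optimizer, and loop are illustrative, not taken from the original report):

import torch
from torch import nn
from torch.cuda.amp import autocast, GradScaler

model = nn.Linear(10, 2).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# Workaround 1 (apex): floor the dynamic loss scale so it cannot reach 0.0.
# from apex import amp
# model, optimizer = amp.initialize(model, optimizer, opt_level="O1",
#                                   min_loss_scale=1.0)

# Workaround 2: native AMP. GradScaler skips the optimizer step when
# gradients are non-finite and adjusts the scale on its own.
scaler = GradScaler()
for _ in range(10):
    x = torch.randn(8, 10, device="cuda")
    y = torch.randint(0, 2, (8,), device="cuda")
    optimizer.zero_grad()
    with autocast():
        loss = nn.functional.cross_entropy(model(x), y)
    scaler.scale(loss).backward()  # backward pass on the scaled loss
    scaler.step(optimizer)         # skipped automatically on inf/NaN grads
    scaler.update()                # rescales for the next iteration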

Model performance

Hi, I have a question: the resnext101_32x48d model seems slow; extracting features from a video takes roughly as long as the video itself. Is there any way to run it faster?
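
Slow extraction is expected with this variant: ResNeXt-101 32x48d is the largest model in the family, at 829M parameters and roughly 153 GFLOPs per image. The usual remedies are to batch frames, run under torch.no_grad() in half precision on a GPU, or switch to a smaller variant such as resnext101_32x8d. A minimal sketch under those assumptions (the frame list and batching helper are illustrative, not from the original question):

import torch
from torchvision import transforms

# Smaller WSL variant; use 'resnext101_32x48d_wsl' if accuracy matters more than speed.
model = torch.hub.load('facebookresearch/WSL-Images', 'resnext101_32x8d_wsl')
model = model.eval().half().cuda()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def extract_features(frames, batch_size=32):
    # frames: a list of PIL.Image frames decoded from the video.
    outputs = []
    for i in range(0, len(frames), batch_size):
        batch = torch.stack([preprocess(f) for f in frames[i:i + batch_size]])
        outputs.append(model(batch.half().cuda()).float().cpu())
    return torch.cat(outputs)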
