
Comments (9)

niujinshuchong commented on August 18, 2024

@HantingChen
Please create train_pkl or val_pkl manually. Otherwise, the first run cannot save the *.pkl file in train_pkl or val_pkl because it cannot find those folders.

Thank you very much for sharing the error.
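
A minimal sketch of how a script could avoid this failure by creating the cache folder before writing to it. The helper name `save_cache` and the sample payload are hypothetical; only the folder and file names come from this thread, and this is not the repository's code.

```python
import os
import pickle


def save_cache(samples, path):
    """Pickle `samples` to `path`, creating the parent folder if needed."""
    folder = os.path.dirname(path)
    if folder:
        # exist_ok=True keeps this safe when the folder already exists,
        # for example when several ranks start at the same time.
        os.makedirs(folder, exist_ok=True)
    with open(path, "wb") as handle:
        pickle.dump(samples, handle)
```

Calling `save_cache(samples, "./train_pkl/samples_bytes_0.pkl")` would then work whether or not `train_pkl` was created by hand.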

from as-mlp.

HantingChen commented on August 18, 2024

> @HantingChen
> Please create train_pkl or val_pkl manually. Otherwise, the first run cannot save the *.pkl file in train_pkl or val_pkl because it cannot find those folders.
>
> Thank you very much for sharing the error.

I have created the train_pkl folder, and the first run did save the *.pkl file.
You can see that the second run did not try to produce the pkl file.
However, the size of the pkl file is 0. There may be something wrong when saving the pkl file.


niujinshuchong commented on August 18, 2024

@HantingChen
Also, please note that if you create the *.pkl files using 8 GPUs and then want to train a model with a different number of GPUs, you should regenerate those *.pkl files.
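
To illustrate why a cache could depend on the GPU count, here is a minimal sketch that assumes each rank caches the samples whose index modulo the world size equals its rank. That sharding rule and the function name `indices_for_rank` are assumptions made for illustration, not taken from the repository.

```python
def indices_for_rank(num_samples, rank, world_size):
    """Indices a rank would cache under round-robin (index % world_size) sharding."""
    return [i for i in range(num_samples) if i % world_size == rank]


# Rank 0 holds different samples when the number of processes changes,
# so a cache file written for one world size does not match another.
shard_8 = indices_for_rank(16, rank=0, world_size=8)
shard_4 = indices_for_rank(16, rank=0, world_size=4)
print(shard_8)  # [0, 8]
print(shard_4)  # [0, 4, 8, 12]
```

Under this assumption, a `samples_bytes_0.pkl` written with 8 processes would contain only part of what rank 0 needs with 4 processes.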


niujinshuchong commented on August 18, 2024

> > @HantingChen
> > Please create train_pkl or val_pkl manually. Otherwise, the first run cannot save the *.pkl file in train_pkl or val_pkl because it cannot find those folders.
> > Thank you very much for sharing the error.
>
> I have created the train_pkl folder, and the first run did save the *.pkl file.
> You can see that the second run did not try to produce the pkl file.
> However, the size of the pkl file is 0. There may be something wrong when saving the pkl file.

@HantingChen I just cloned the code and tested it. It can create *.pkl files with cache-mode part. (PS. my pickle version is 4.0)

Would you please try it again and attach the full log?


HantingChen commented on August 18, 2024

The full log is attached below. My pickle version is also 4.0.

Log from the first run:

./train_pkl/samples_bytes_0.pkl
global_rank 0 cached 0/1281167 takes 0.00s per block
global_rank 0 cached 128116/1281167 takes 21.24s per block
global_rank 0 cached 256232/1281167 takes 19.01s per block
global_rank 0 cached 384348/1281167 takes 18.36s per block
global_rank 0 cached 512464/1281167 takes 29.66s per block
global_rank 0 cached 640580/1281167 takes 35.94s per block
global_rank 0 cached 768696/1281167 takes 36.32s per block
global_rank 0 cached 896812/1281167 takes 35.54s per block
global_rank 0 cached 1024928/1281167 takes 37.19s per block
global_rank 0 cached 1153044/1281167 takes 46.55s per block
global_rank 0 cached 1281160/1281167 takes 50.39s per block
Traceback (most recent call last):
  File "/home/ma-user/anaconda3/envs/Pytorch-1.4.0/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/ma-user/anaconda3/envs/Pytorch-1.4.0/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ma-user/anaconda3/envs/Pytorch-1.4.0/lib/python3.6/site-packages/torch/distributed/launch.py", line 260, in <module>
    main()
  File "/home/ma-user/anaconda3/envs/Pytorch-1.4.0/lib/python3.6/site-packages/torch/distributed/launch.py", line 256, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/ma-user/anaconda3/envs/Pytorch-1.4.0/bin/python', '-u', 'main.py', '--local_rank=0', '--cfg', 'configs/as_base_patch4_shift5_224.yaml', '--data-path', '/cache/imagenet/imagenet/', '--eval', '--resume', '/cache/model/asmlp_base_patch4_shift5_224.pth', '--moxfile', '0']' died with <Signals.SIGKILL: 9>.

Log from the second run:

./train_pkl/samples_bytes_0.pkl
Traceback (most recent call last):
  File "main.py", line 349, in <module>
    main(config)
  File "main.py", line 78, in main
    dataset_train, dataset_val, data_loader_train, data_loader_val, mixup_fn = build_loader(config)
  File "/home/ma-user/work/AS-MLP-main/data/build.py", line 17, in build_loader
    dataset_train, config.MODEL.NUM_CLASSES = build_dataset(is_train=True, config=config)
  File "/home/ma-user/work/AS-MLP-main/data/build.py", line 80, in build_dataset
    cache_mode=config.DATA.CACHE_MODE if is_train else 'part')
  File "/home/ma-user/work/AS-MLP-main/data/cached_image_folder.py", line 250, in __init__
    cache_mode=cache_mode)
  File "/home/ma-user/work/AS-MLP-main/data/cached_image_folder.py", line 122, in __init__
    self.init_cache()
  File "/home/ma-user/work/AS-MLP-main/data/cached_image_folder.py", line 137, in init_cache
    self.samples = pickle.load(handle)
EOFError: Ran out of input
Traceback (most recent call last):
  File "/home/ma-user/anaconda3/envs/Pytorch-1.4.0/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/ma-user/anaconda3/envs/Pytorch-1.4.0/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ma-user/anaconda3/envs/Pytorch-1.4.0/lib/python3.6/site-packages/torch/distributed/launch.py", line 260, in <module>
    main()
  File "/home/ma-user/anaconda3/envs/Pytorch-1.4.0/lib/python3.6/site-packages/torch/distributed/launch.py", line 256, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/ma-user/anaconda3/envs/Pytorch-1.4.0/bin/python', '-u', 'main.py', '--local_rank=0', '--cfg', 'configs/as_base_patch4_shift5_224.yaml', '--data-path', '/cache/imagenet/imagenet/', '--eval', '--resume', '/cache/model/asmlp_base_patch4_shift5_224.pth', '--moxfile', '0']' returned non-zero exit status 1.
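
One reading of the two logs, offered as a hypothesis rather than a diagnosis: the first run was killed (SIGKILL) before the pickle was fully written, leaving a zero-byte file, and the second run found that file and tried to load it, which raises `EOFError: Ran out of input`. The sketch below shows a generic way to guard against this: write to a temporary file and rename it into place, and treat an empty or truncated file as a cache miss. The function names are hypothetical and this is not the repository's code.

```python
import os
import pickle
import tempfile


def atomic_pickle_dump(obj, path):
    """Write the pickle to a temp file, then rename it over `path`."""
    folder = os.path.dirname(path) or "."
    os.makedirs(folder, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=folder, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            pickle.dump(obj, handle)
        # os.replace is atomic on the same filesystem, so readers see
        # either the old file or the complete new one, never a partial one.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the partial temp file if writing or renaming failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_cache_or_none(path):
    """Return the cached object, or None if the file is missing or unusable."""
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        return None
    try:
        with open(path, "rb") as handle:
            return pickle.load(handle)
    except (EOFError, pickle.UnpicklingError):
        return None  # truncated or corrupt: the caller should rebuild the cache
```

With a loader like this, a leftover zero-byte `samples_bytes_0.pkl` would trigger a rebuild instead of a crash. Deleting the empty file by hand before the next run has the same effect.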


niujinshuchong commented on August 18, 2024

I also tested the code with 1 GPU. The output looks like this:

`
CUDA_VISIBLE_DEVICES=9 python -m torch.distributed.launch --nproc_per_node 1 --master_port 12345 main.py --cfg configs/as_tiny_patch4_shift5_224.yaml --data-path /root/fake_data/ImageNet-Zip/ --batch-size 64 --cache-mode part --accumulation-steps 2

=> merge config from configs/as_tiny_patch4_shift5_224.yaml
RANK and WORLD_SIZE in environ: 0/1
libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
[2021-07-22 09:01:08 asmlp_tiny_patch4_shift5_224](main.py 340): INFO Full config saved to output/asmlp_tiny_patch4_shift5_224/default/config.json
[2021-07-22 09:01:08 asmlp_tiny_patch4_shift5_224](main.py 343): INFO AMP_OPT_LEVEL: O1
AUG:
  AUTO_AUGMENT: rand-m9-mstd0.5-inc1
  COLOR_JITTER: 0.4
  CUTMIX: 1.0
  CUTMIX_MINMAX: null
  MIXUP: 0.8
  MIXUP_MODE: batch
  MIXUP_PROB: 1.0
  MIXUP_SWITCH_PROB: 0.5
  RECOUNT: 1
  REMODE: pixel
  REPROB: 0.25
BASE:
- ''
DATA:
  BATCH_SIZE: 64
  CACHE_MODE: part
  DATASET: imagenet
  DATA_PATH: /root/fake_data/ImageNet-Zip/
  IMG_SIZE: 224
  INTERPOLATION: bicubic
  NUM_WORKERS: 8
  PIN_MEMORY: true
  ZIP_MODE: false
EVAL_MODE: false
LOCAL_RANK: 0
MODEL:
  ASMLP:
    DEPTHS:
    - 2
    - 2
    - 6
    - 2
    EMBED_DIM: 96
    IN_CHANS: 3
    MLP_RATIO: 4.0
    PATCH_NORM: true
    PATCH_SIZE: 4
    SHIFT_SIZE: 3
  DROP_PATH_RATE: 0.2
  DROP_RATE: 0.0
  LABEL_SMOOTHING: 0.1
  NAME: asmlp_tiny_patch4_shift5_224
  NUM_CLASSES: 1000
  RESUME: ''
  TYPE: asmlp
OUTPUT: output/asmlp_tiny_patch4_shift5_224/default
PRINT_FREQ: 10
SAVE_FREQ: 1
SEED: 0
TAG: default
TEST:
  CROP: true
THROUGHPUT_MODE: false
TRAIN:
  ACCUMULATION_STEPS: 2
  AUTO_RESUME: true
  BASE_LR: 0.000125
  CLIP_GRAD: 5.0
  EPOCHS: 300
  LR_SCHEDULER:
    DECAY_EPOCHS: 30
    DECAY_RATE: 0.1
    NAME: cosine
  MIN_LR: 1.25e-06
  OPTIMIZER:
    BETAS:
    - 0.9
    - 0.999
    EPS: 1.0e-08
    MOMENTUM: 0.9
    NAME: adamw
  START_EPOCH: 0
  USE_CHECKPOINT: false
  WARMUP_EPOCHS: 20
  WARMUP_LR: 1.25e-07
  WEIGHT_DECAY: 0.05

in part /root/fake_data/ImageNet-Zip/
./train_pkl/samples_bytes_0.pkl
global_rank 0 cached 0/50000 takes 0.00s per block
global_rank 0 cached 5000/50000 takes 1.76s per block
global_rank 0 cached 10000/50000 takes 1.71s per block
global_rank 0 cached 15000/50000 takes 1.76s per block
global_rank 0 cached 20000/50000 takes 1.77s per block
global_rank 0 cached 25000/50000 takes 1.88s per block
global_rank 0 cached 30000/50000 takes 1.87s per block
global_rank 0 cached 35000/50000 takes 1.83s per block
global_rank 0 cached 40000/50000 takes 1.80s per block
global_rank 0 cached 45000/50000 takes 1.86s per block
local rank 0 / global rank 0 successfully build train dataset
in part /root/fake_data/ImageNet-Zip/
./val_pkl/samples_bytes_0.pkl
global_rank 0 cached 0/50000 takes 0.00s per block
global_rank 0 cached 5000/50000 takes 1.54s per block
global_rank 0 cached 10000/50000 takes 1.60s per block
global_rank 0 cached 15000/50000 takes 1.39s per block
global_rank 0 cached 20000/50000 takes 1.48s per block
global_rank 0 cached 25000/50000 takes 1.40s per block
global_rank 0 cached 30000/50000 takes 1.24s per block
global_rank 0 cached 35000/50000 takes 1.53s per block
global_rank 0 cached 40000/50000 takes 1.52s per block
global_rank 0 cached 45000/50000 takes 1.46s per block
local rank 0 / global rank 0 successfully build val dataset
[2021-07-22 09:02:18 asmlp_tiny_patch4_shift5_224](main.py 76): INFO Creating model:asmlp/asmlp_tiny_patch4_shift5_224
[2021-07-22 09:02:18 asmlp_tiny_patch4_shift5_224](main.py 79): INFO AS_MLP(
(patch_embed): PatchEmbed(
(proj): Conv2d(3, 96, kernel_size=(4, 4), stride=(4, 4))
(norm): GroupNorm(1, 96, eps=1e-05, affine=True)
)
(pos_drop): Dropout(p=0.0, inplace=False)
(layers): ModuleList(
(0): BasicLayer(
dim=96, input_resolution=(56, 56), depth=2
(blocks): ModuleList(
(0): AxialShiftedBlock(
dim=96, input_resolution=(56, 56), shift_size=3, mlp_ratio=4.0
(norm1): GroupNorm(1, 96, eps=1e-05, affine=True)
(axial_shift): AxialShift(
dim=96, shift_size=3
(conv1): Conv2d(96, 96, kernel_size=(1, 1), stride=(1, 1))
(conv2_1): Conv2d(96, 96, kernel_size=(1, 1), stride=(1, 1))
(conv2_2): Conv2d(96, 96, kernel_size=(1, 1), stride=(1, 1))
(conv3): Conv2d(96, 96, kernel_size=(1, 1), stride=(1, 1))
(actn): GELU()
(norm1): GroupNorm(1, 96, eps=1e-05, affine=True)
(norm2): GroupNorm(1, 96, eps=1e-05, affine=True)
(shift_dim2): Shift()
(shift_dim3): Shift()
)
(drop_path): Identity()
(norm2): GroupNorm(1, 96, eps=1e-05, affine=True)
(mlp): Mlp(
(fc1): Conv2d(96, 384, kernel_size=(1, 1), stride=(1, 1))
(act): GELU()
(fc2): Conv2d(384, 96, kernel_size=(1, 1), stride=(1, 1))
(drop): Dropout(p=0.0, inplace=False)
)
)
(1): AxialShiftedBlock(
dim=96, input_resolution=(56, 56), shift_size=3, mlp_ratio=4.0
(norm1): GroupNorm(1, 96, eps=1e-05, affine=True)
(axial_shift): AxialShift(
dim=96, shift_size=3
(conv1): Conv2d(96, 96, kernel_size=(1, 1), stride=(1, 1))
(conv2_1): Conv2d(96, 96, kernel_size=(1, 1), stride=(1, 1))
(conv2_2): Conv2d(96, 96, kernel_size=(1, 1), stride=(1, 1))
(conv3): Conv2d(96, 96, kernel_size=(1, 1), stride=(1, 1))
(actn): GELU()
(norm1): GroupNorm(1, 96, eps=1e-05, affine=True)
(norm2): GroupNorm(1, 96, eps=1e-05, affine=True)
(shift_dim2): Shift()
(shift_dim3): Shift()
)
(drop_path): DropPath()
(norm2): GroupNorm(1, 96, eps=1e-05, affine=True)
(mlp): Mlp(
(fc1): Conv2d(96, 384, kernel_size=(1, 1), stride=(1, 1))
(act): GELU()
(fc2): Conv2d(384, 96, kernel_size=(1, 1), stride=(1, 1))
(drop): Dropout(p=0.0, inplace=False)
)
)
)
(downsample): PatchMerging(
input_resolution=(56, 56), dim=96
(reduction): Conv2d(384, 192, kernel_size=(1, 1), stride=(1, 1), bias=False)
(norm): GroupNorm(1, 384, eps=1e-05, affine=True)
)
)
(1): BasicLayer(
dim=192, input_resolution=(28, 28), depth=2
(blocks): ModuleList(
(0): AxialShiftedBlock(
dim=192, input_resolution=(28, 28), shift_size=3, mlp_ratio=4.0
(norm1): GroupNorm(1, 192, eps=1e-05, affine=True)
(axial_shift): AxialShift(
dim=192, shift_size=3
(conv1): Conv2d(192, 192, kernel_size=(1, 1), stride=(1, 1))
(conv2_1): Conv2d(192, 192, kernel_size=(1, 1), stride=(1, 1))
(conv2_2): Conv2d(192, 192, kernel_size=(1, 1), stride=(1, 1))
(conv3): Conv2d(192, 192, kernel_size=(1, 1), stride=(1, 1))
(actn): GELU()
(norm1): GroupNorm(1, 192, eps=1e-05, affine=True)
(norm2): GroupNorm(1, 192, eps=1e-05, affine=True)
(shift_dim2): Shift()
(shift_dim3): Shift()
)
(drop_path): DropPath()
(norm2): GroupNorm(1, 192, eps=1e-05, affine=True)
(mlp): Mlp(
(fc1): Conv2d(192, 768, kernel_size=(1, 1), stride=(1, 1))
(act): GELU()
(fc2): Conv2d(768, 192, kernel_size=(1, 1), stride=(1, 1))
(drop): Dropout(p=0.0, inplace=False)
)
)
(1): AxialShiftedBlock(
dim=192, input_resolution=(28, 28), shift_size=3, mlp_ratio=4.0
(norm1): GroupNorm(1, 192, eps=1e-05, affine=True)
(axial_shift): AxialShift(
dim=192, shift_size=3
(conv1): Conv2d(192, 192, kernel_size=(1, 1), stride=(1, 1))
(conv2_1): Conv2d(192, 192, kernel_size=(1, 1), stride=(1, 1))
(conv2_2): Conv2d(192, 192, kernel_size=(1, 1), stride=(1, 1))
(conv3): Conv2d(192, 192, kernel_size=(1, 1), stride=(1, 1))
(actn): GELU()
(norm1): GroupNorm(1, 192, eps=1e-05, affine=True)
(norm2): GroupNorm(1, 192, eps=1e-05, affine=True)
(shift_dim2): Shift()
(shift_dim3): Shift()
)
(drop_path): DropPath()
(norm2): GroupNorm(1, 192, eps=1e-05, affine=True)
(mlp): Mlp(
(fc1): Conv2d(192, 768, kernel_size=(1, 1), stride=(1, 1))
(act): GELU()
(fc2): Conv2d(768, 192, kernel_size=(1, 1), stride=(1, 1))
(drop): Dropout(p=0.0, inplace=False)
)
)
)
(downsample): PatchMerging(
input_resolution=(28, 28), dim=192
(reduction): Conv2d(768, 384, kernel_size=(1, 1), stride=(1, 1), bias=False)
(norm): GroupNorm(1, 768, eps=1e-05, affine=True)
)
)
(2): BasicLayer(
dim=384, input_resolution=(14, 14), depth=6
(blocks): ModuleList(
(0): AxialShiftedBlock(
dim=384, input_resolution=(14, 14), shift_size=3, mlp_ratio=4.0
(norm1): GroupNorm(1, 384, eps=1e-05, affine=True)
(axial_shift): AxialShift(
dim=384, shift_size=3
(conv1): Conv2d(384, 384, kernel_size=(1, 1), stride=(1, 1))
(conv2_1): Conv2d(384, 384, kernel_size=(1, 1), stride=(1, 1))
(conv2_2): Conv2d(384, 384, kernel_size=(1, 1), stride=(1, 1))
(conv3): Conv2d(384, 384, kernel_size=(1, 1), stride=(1, 1))
(actn): GELU()
(norm1): GroupNorm(1, 384, eps=1e-05, affine=True)
(norm2): GroupNorm(1, 384, eps=1e-05, affine=True)
(shift_dim2): Shift()
(shift_dim3): Shift()
)
(drop_path): DropPath()
(norm2): GroupNorm(1, 384, eps=1e-05, affine=True)
(mlp): Mlp(
(fc1): Conv2d(384, 1536, kernel_size=(1, 1), stride=(1, 1))
(act): GELU()
(fc2): Conv2d(1536, 384, kernel_size=(1, 1), stride=(1, 1))
(drop): Dropout(p=0.0, inplace=False)
)
)
(1): AxialShiftedBlock(
dim=384, input_resolution=(14, 14), shift_size=3, mlp_ratio=4.0
(norm1): GroupNorm(1, 384, eps=1e-05, affine=True)
(axial_shift): AxialShift(
dim=384, shift_size=3
(conv1): Conv2d(384, 384, kernel_size=(1, 1), stride=(1, 1))
(conv2_1): Conv2d(384, 384, kernel_size=(1, 1), stride=(1, 1))
(conv2_2): Conv2d(384, 384, kernel_size=(1, 1), stride=(1, 1))
(conv3): Conv2d(384, 384, kernel_size=(1, 1), stride=(1, 1))
(actn): GELU()
(norm1): GroupNorm(1, 384, eps=1e-05, affine=True)
(norm2): GroupNorm(1, 384, eps=1e-05, affine=True)
(shift_dim2): Shift()
(shift_dim3): Shift()
)
(drop_path): DropPath()
(norm2): GroupNorm(1, 384, eps=1e-05, affine=True)
(mlp): Mlp(
(fc1): Conv2d(384, 1536, kernel_size=(1, 1), stride=(1, 1))
(act): GELU()
(fc2): Conv2d(1536, 384, kernel_size=(1, 1), stride=(1, 1))
(drop): Dropout(p=0.0, inplace=False)
)
)
(2): AxialShiftedBlock(
dim=384, input_resolution=(14, 14), shift_size=3, mlp_ratio=4.0
(norm1): GroupNorm(1, 384, eps=1e-05, affine=True)
(axial_shift): AxialShift(
dim=384, shift_size=3
(conv1): Conv2d(384, 384, kernel_size=(1, 1), stride=(1, 1))
(conv2_1): Conv2d(384, 384, kernel_size=(1, 1), stride=(1, 1))
(conv2_2): Conv2d(384, 384, kernel_size=(1, 1), stride=(1, 1))
(conv3): Conv2d(384, 384, kernel_size=(1, 1), stride=(1, 1))
(actn): GELU()
(norm1): GroupNorm(1, 384, eps=1e-05, affine=True)
(norm2): GroupNorm(1, 384, eps=1e-05, affine=True)
(shift_dim2): Shift()
(shift_dim3): Shift()
)
(drop_path): DropPath()
(norm2): GroupNorm(1, 384, eps=1e-05, affine=True)
(mlp): Mlp(
(fc1): Conv2d(384, 1536, kernel_size=(1, 1), stride=(1, 1))
(act): GELU()
(fc2): Conv2d(1536, 384, kernel_size=(1, 1), stride=(1, 1))
(drop): Dropout(p=0.0, inplace=False)
)
)
(3): AxialShiftedBlock(
dim=384, input_resolution=(14, 14), shift_size=3, mlp_ratio=4.0
(norm1): GroupNorm(1, 384, eps=1e-05, affine=True)
(axial_shift): AxialShift(
dim=384, shift_size=3
(conv1): Conv2d(384, 384, kernel_size=(1, 1), stride=(1, 1))
(conv2_1): Conv2d(384, 384, kernel_size=(1, 1), stride=(1, 1))
(conv2_2): Conv2d(384, 384, kernel_size=(1, 1), stride=(1, 1))
(conv3): Conv2d(384, 384, kernel_size=(1, 1), stride=(1, 1))
(actn): GELU()
(norm1): GroupNorm(1, 384, eps=1e-05, affine=True)
(norm2): GroupNorm(1, 384, eps=1e-05, affine=True)
(shift_dim2): Shift()
(shift_dim3): Shift()
)
(drop_path): DropPath()
(norm2): GroupNorm(1, 384, eps=1e-05, affine=True)
(mlp): Mlp(
(fc1): Conv2d(384, 1536, kernel_size=(1, 1), stride=(1, 1))
(act): GELU()
(fc2): Conv2d(1536, 384, kernel_size=(1, 1), stride=(1, 1))
(drop): Dropout(p=0.0, inplace=False)
)
)
(4): AxialShiftedBlock(
dim=384, input_resolution=(14, 14), shift_size=3, mlp_ratio=4.0
(norm1): GroupNorm(1, 384, eps=1e-05, affine=True)
(axial_shift): AxialShift(
dim=384, shift_size=3
(conv1): Conv2d(384, 384, kernel_size=(1, 1), stride=(1, 1))
(conv2_1): Conv2d(384, 384, kernel_size=(1, 1), stride=(1, 1))
(conv2_2): Conv2d(384, 384, kernel_size=(1, 1), stride=(1, 1))
(conv3): Conv2d(384, 384, kernel_size=(1, 1), stride=(1, 1))
(actn): GELU()
(norm1): GroupNorm(1, 384, eps=1e-05, affine=True)
(norm2): GroupNorm(1, 384, eps=1e-05, affine=True)
(shift_dim2): Shift()
(shift_dim3): Shift()
)
(drop_path): DropPath()
(norm2): GroupNorm(1, 384, eps=1e-05, affine=True)
(mlp): Mlp(
(fc1): Conv2d(384, 1536, kernel_size=(1, 1), stride=(1, 1))
(act): GELU()
(fc2): Conv2d(1536, 384, kernel_size=(1, 1), stride=(1, 1))
(drop): Dropout(p=0.0, inplace=False)
)
)
(5): AxialShiftedBlock(
dim=384, input_resolution=(14, 14), shift_size=3, mlp_ratio=4.0
(norm1): GroupNorm(1, 384, eps=1e-05, affine=True)
(axial_shift): AxialShift(
dim=384, shift_size=3
(conv1): Conv2d(384, 384, kernel_size=(1, 1), stride=(1, 1))
(conv2_1): Conv2d(384, 384, kernel_size=(1, 1), stride=(1, 1))
(conv2_2): Conv2d(384, 384, kernel_size=(1, 1), stride=(1, 1))
(conv3): Conv2d(384, 384, kernel_size=(1, 1), stride=(1, 1))
(actn): GELU()
(norm1): GroupNorm(1, 384, eps=1e-05, affine=True)
(norm2): GroupNorm(1, 384, eps=1e-05, affine=True)
(shift_dim2): Shift()
(shift_dim3): Shift()
)
(drop_path): DropPath()
(norm2): GroupNorm(1, 384, eps=1e-05, affine=True)
(mlp): Mlp(
(fc1): Conv2d(384, 1536, kernel_size=(1, 1), stride=(1, 1))
(act): GELU()
(fc2): Conv2d(1536, 384, kernel_size=(1, 1), stride=(1, 1))
(drop): Dropout(p=0.0, inplace=False)
)
)
)
(downsample): PatchMerging(
input_resolution=(14, 14), dim=384
(reduction): Conv2d(1536, 768, kernel_size=(1, 1), stride=(1, 1), bias=False)
(norm): GroupNorm(1, 1536, eps=1e-05, affine=True)
)
)
(3): BasicLayer(
dim=768, input_resolution=(7, 7), depth=2
(blocks): ModuleList(
(0): AxialShiftedBlock(
dim=768, input_resolution=(7, 7), shift_size=3, mlp_ratio=4.0
(norm1): GroupNorm(1, 768, eps=1e-05, affine=True)
(axial_shift): AxialShift(
dim=768, shift_size=3
(conv1): Conv2d(768, 768, kernel_size=(1, 1), stride=(1, 1))
(conv2_1): Conv2d(768, 768, kernel_size=(1, 1), stride=(1, 1))
(conv2_2): Conv2d(768, 768, kernel_size=(1, 1), stride=(1, 1))
(conv3): Conv2d(768, 768, kernel_size=(1, 1), stride=(1, 1))
(actn): GELU()
(norm1): GroupNorm(1, 768, eps=1e-05, affine=True)
(norm2): GroupNorm(1, 768, eps=1e-05, affine=True)
(shift_dim2): Shift()
(shift_dim3): Shift()
)
(drop_path): DropPath()
(norm2): GroupNorm(1, 768, eps=1e-05, affine=True)
(mlp): Mlp(
(fc1): Conv2d(768, 3072, kernel_size=(1, 1), stride=(1, 1))
(act): GELU()
(fc2): Conv2d(3072, 768, kernel_size=(1, 1), stride=(1, 1))
(drop): Dropout(p=0.0, inplace=False)
)
)
(1): AxialShiftedBlock(
dim=768, input_resolution=(7, 7), shift_size=3, mlp_ratio=4.0
(norm1): GroupNorm(1, 768, eps=1e-05, affine=True)
(axial_shift): AxialShift(
dim=768, shift_size=3
(conv1): Conv2d(768, 768, kernel_size=(1, 1), stride=(1, 1))
(conv2_1): Conv2d(768, 768, kernel_size=(1, 1), stride=(1, 1))
(conv2_2): Conv2d(768, 768, kernel_size=(1, 1), stride=(1, 1))
(conv3): Conv2d(768, 768, kernel_size=(1, 1), stride=(1, 1))
(actn): GELU()
(norm1): GroupNorm(1, 768, eps=1e-05, affine=True)
(norm2): GroupNorm(1, 768, eps=1e-05, affine=True)
(shift_dim2): Shift()
(shift_dim3): Shift()
)
(drop_path): DropPath()
(norm2): GroupNorm(1, 768, eps=1e-05, affine=True)
(mlp): Mlp(
(fc1): Conv2d(768, 3072, kernel_size=(1, 1), stride=(1, 1))
(act): GELU()
(fc2): Conv2d(3072, 768, kernel_size=(1, 1), stride=(1, 1))
(drop): Dropout(p=0.0, inplace=False)
)
)
)
)
)
(norm): GroupNorm(1, 768, eps=1e-05, affine=True)
(avgpool): AdaptiveAvgPool2d(output_size=1)
(head): Linear(in_features=768, out_features=1000, bias=True)
)
Selected optimization level O1: Insert automatic casts around Pytorch functions and Tensor methods.

Defaults for this optimization level are:
enabled : True
opt_level : O1
cast_model_type : None
patch_torch_functions : True
keep_batchnorm_fp32 : None
master_weights : None
loss_scale : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled : True
opt_level : O1
cast_model_type : None
patch_torch_functions : True
keep_batchnorm_fp32 : None
master_weights : None
loss_scale : dynamic
Warning: multi_tensor_applier fused unscale kernel is unavailable, possibly because apex was installed without --cuda_ext --cpp_ext. Using Python fallback. Original ImportError was: ModuleNotFoundError("No module named 'amp_C'")
[2021-07-22 09:02:19 asmlp_tiny_patch4_shift5_224](main.py 88): INFO number of params: 28282696
[2021-07-22 09:02:19 asmlp_tiny_patch4_shift5_224](main.py 91): INFO number of GFLOPs: 4.3585536
All checkpoints founded in output/asmlp_tiny_patch4_shift5_224/default: []
[2021-07-22 09:02:19 asmlp_tiny_patch4_shift5_224](main.py 116): INFO no checkpoint found in output/asmlp_tiny_patch4_shift5_224/default, ignoring auto resume
[2021-07-22 09:02:19 asmlp_tiny_patch4_shift5_224](main.py 129): INFO Start training
[2021-07-22 09:02:24 asmlp_tiny_patch4_shift5_224](main.py 219): INFO Train: [0/300][0/781] eta 1:08:59 lr 0.000000 time 5.3000 (5.3000) loss 3.4992 (3.4992) grad_norm 2.2539 (2.2539) mem 8882MB
[2021-07-22 09:02:28 asmlp_tiny_patch4_shift5_224](main.py 219): INFO Train: [0/300][10/781] eta 0:10:30 lr 0.000000 time 0.3332 (0.8183) loss 3.4760 (3.4774) grad_norm 2.2970 (2.6936) mem 8882MB
[2021-07-22 09:02:31 asmlp_tiny_patch4_shift5_224](main.py 219): INFO Train: [0/300][20/781] eta 0:07:34 lr 0.000000 time 0.3261 (0.5978) loss 3.4740 (3.4767) grad_norm 2.3040 (2.6371) mem 8882MB
[2021-07-22 09:02:35 asmlp_tiny_patch4_shift5_224](main.py 219): INFO Train: [0/300][30/781] eta 0:06:33 lr 0.000000 time 0.3421 (0.5244) loss 3.4737 (3.4762) grad_norm 2.6535 (2.6483) mem 8882MB
[2021-07-22 09:02:38 asmlp_tiny_patch4_shift5_224](main.py 219): INFO Train: [0/300][40/781] eta 0:05:59 lr 0.000000 time 0.3265 (0.4854) loss 3.4857 (3.4764) grad_norm 2.1506 (2.6657) mem 8882MB
[2021-07-22 09:02:42 asmlp_tiny_patch4_shift5_224](main.py 219): INFO Train: [0/300][50/781] eta 0:05:37 lr 0.000001 time 0.3316 (0.4612) loss 3.4687 (3.4765) grad_norm 2.1028 (2.6646) mem 8882MB
[2021-07-22 09:02:46 asmlp_tiny_patch4_shift5_224](main.py 219): INFO Train: [0/300][60/781] eta 0:05:21 lr 0.000001 time 0.3295 (0.4454) loss 3.4673 (3.4756) grad_norm 2.2019 (2.6883) mem 8882MB
[2021-07-22 09:02:49 asmlp_tiny_patch4_shift5_224](main.py 219): INFO Train: [0/300][70/781] eta 0:05:09 lr 0.000001 time 0.3385 (0.4347) loss 3.4785 (3.4754) grad_norm 2.2327 (2.6915) mem 8882MB
[2021-07-22 09:02:53 asmlp_tiny_patch4_shift5_224](main.py 219): INFO Train: [0/300][80/781] eta 0:04:59 lr 0.000001 time 0.3303 (0.4267) loss 3.4808 (3.4752) grad_norm 2.3357 (2.6958) mem 8882MB
[2021-07-22 09:02:57 asmlp_tiny_patch4_shift5_224](main.py 219): INFO Train: [0/300][90/781] eta 0:04:50 lr 0.000001 time 0.3215 (0.4198) loss 3.4
`


niujinshuchong commented on August 18, 2024

@HantingChen Would you please try with a smaller dataset? You can replace the train data with the val data by running
mv train train_backup
ln -s val train
in the imagenet folder.


HantingChen commented on August 18, 2024

> @HantingChen Would you please try with a smaller dataset? You can replace the train data with the val data by running
> mv train train_backup
> ln -s val train
> in the imagenet folder.

I am using --eval mode, so the run already uses the val data. I think this error may be caused by my environment. I will test it on another machine.

Thanks for your reply!


dongzelian commented on August 18, 2024

@HantingChen Hi, if you store the ImageNet dataset on an SSD, you can also use --cache-mode no; the training speed is similar.

