# Paper Reproduction: Low-level Algorithm SwinIR (Denoising)

SwinIR: Image Restoration Using Swin Transformer, a strong baseline model for image restoration built on the Swin Transformer.





## 1. Introduction
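SwinIR stacks Swin Transformer blocks, which compute self-attention inside non-overlapping local windows and alternate between regular and cyclically shifted windows in consecutive blocks. The network printout in the training log of section 5.1 shows `window_size=8` with `shift_size` alternating between 0 and 4. The NumPy sketch below illustrates only the window partitioning step under those settings. It is not the repository's implementation: the function names are hypothetical, and it omits the attention mask that shifted windows need at the wrapped-around borders.

```python
import numpy as np


def window_partition(x: np.ndarray, window_size: int) -> np.ndarray:
    """Split an (H, W, C) feature map into (num_windows, window_size**2, C) token groups."""
    h, w, c = x.shape
    assert h % window_size == 0 and w % window_size == 0, "H and W must be multiples of window_size"
    x = x.reshape(h // window_size, window_size, w // window_size, window_size, c)
    x = x.transpose(0, 2, 1, 3, 4)  # -> (rows of windows, cols of windows, ws, ws, C)
    return x.reshape(-1, window_size * window_size, c)


def shifted_window_partition(x: np.ndarray, window_size: int, shift_size: int) -> np.ndarray:
    """Cyclically shift the map up-left by shift_size, then partition (shifted-window block)."""
    shifted = np.roll(x, shift=(-shift_size, -shift_size), axis=(0, 1))
    return window_partition(shifted, window_size)


if __name__ == "__main__":
    feat = np.zeros((128, 128, 180), dtype=np.float32)  # img_size 128, embed_dim 180
    print(window_partition(feat, 8).shape)  # 256 windows of 64 tokens each
```

With a 128 x 128 input and 8 x 8 windows, attention is computed over 64 tokens per window instead of 16,384 tokens globally; the shift lets information cross window borders between consecutive blocks.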



## 2. Reproduction Accuracy

Tested on the CBSD68 test set, the reproduced model reaches the minimum acceptance threshold of 34.32 dB PSNR:

SwinIR, noise level σ = 15
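The acceptance figure is presumably the PSNR averaged over the 68 CBSD68 images. A minimal sketch of that metric, assuming 8-bit RGB images compared on all channels with no border cropping; the repository's own evaluation in main_test_swinir.py may differ in details such as channel handling or rounding, and the helper names here are hypothetical.

```python
import numpy as np


def psnr(img1: np.ndarray, img2: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two images of identical shape."""
    if img1.shape != img2.shape:
        raise ValueError("images must have the same shape")
    mse = np.mean((img1.astype(np.float64) - img2.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return float(20.0 * np.log10(max_val / np.sqrt(mse)))


def average_psnr(pairs) -> float:
    """Mean PSNR over (restored, ground_truth) pairs, e.g. one pair per CBSD68 image."""
    return float(np.mean([psnr(restored, gt) for restored, gt in pairs]))
```

For example, two images that differ by exactly 1 gray level everywhere give an MSE of 1 and a PSNR of 20·log10(255) ≈ 48.13 dB.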

Note: the original code trains for 1,600,000 iterations on 8 GPUs; our 4-GPU run reached only 426,000 iterations before it was stopped by the time limit.
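To put the early stop in context, the sketch below evaluates the learning-rate schedule at the stopping point, using the values printed in the training config of section 5.1 (`G_optimizer_lr: 0.0002`, `G_scheduler_type: MultiStepLR`, `G_scheduler_gamma: 0.5`, and the listed milestones). It assumes the scheduler is stepped once per iteration, as the iteration-scale milestones suggest, and that the rate is multiplied by gamma once a milestone is reached; the helper names are hypothetical.

```python
import bisect
import math

# Values copied from the training config printed in the log of this run.
BASE_LR = 2e-4                                                      # G_optimizer_lr
MILESTONES = [800_000, 1_200_000, 1_400_000, 1_500_000, 1_600_000]  # G_scheduler_milestones
GAMMA = 0.5                                                         # G_scheduler_gamma


def multistep_lr(iteration: int) -> float:
    """Learning rate after `iteration` steps: multiplied by GAMMA at each milestone reached."""
    n_decays = bisect.bisect_right(MILESTONES, iteration)
    return BASE_LR * GAMMA ** n_decays


def iters_per_epoch(num_images: int, batch_size: int) -> int:
    """Batches per pass over the data when the last partial batch is kept."""
    return math.ceil(num_images / batch_size)


if __name__ == "__main__":
    stop_iter = 426_000
    per_epoch = iters_per_epoch(8_694, 2)  # 4347, the value reported in the log
    print("LR at stop:", multistep_lr(stop_iter))
    print("epochs completed: %.1f" % (stop_iter / per_epoch))
    print("fraction of original schedule: %.3f" % (stop_iter / MILESTONES[-1]))
```

Under these assumptions the run stopped after roughly 98 epochs and before the first milestone at 800,000, so the learning rate never decayed from 2e-4 and none of the later low-learning-rate phases were reached.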

## 3. Dataset, Pretrained Models, and File Structure

### 3.1 Dataset

Training data: DIV2K (800 training images) + Flickr2K (2,650 images) + BSD500 (400 training and testing images) + WED (4,744 images)

The preprocessed training data has been uploaded to AI Studio.



The test data is CBSD68, also uploaded to AI Studio.
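Only clean images are needed: the training log in section 5.1 reports `dataset_type: dncnn`, `H_size: 128`, `sigma: 15` and the message "Only dataroot_H is needed", i.e. noisy inputs are synthesized on the fly with additive white Gaussian noise (AWGN). A minimal sketch of that idea, assuming a random 128 x 128 crop and noise with standard deviation sigma/255 on images scaled to [0, 1]; the function name is hypothetical, and the repository's loader may additionally apply flips or rotations, which this sketch leaves out. The noisy patch is not clipped here.

```python
from typing import Optional, Tuple

import numpy as np


def make_training_pair(img_h: np.ndarray, patch_size: int = 128, sigma: float = 15.0,
                       rng: Optional[np.random.Generator] = None) -> Tuple[np.ndarray, np.ndarray]:
    """Crop a clean patch from an (H, W, C) uint8 image and add AWGN with a fixed sigma.

    Returns (noisy, clean) float32 patches; the clean patch is scaled to [0, 1].
    """
    rng = rng if rng is not None else np.random.default_rng()
    h, w = img_h.shape[:2]
    if h < patch_size or w < patch_size:
        raise ValueError("image is smaller than the requested patch")
    top = int(rng.integers(0, h - patch_size + 1))
    left = int(rng.integers(0, w - patch_size + 1))
    clean = img_h[top:top + patch_size, left:left + patch_size].astype(np.float32) / 255.0
    # Noise level is expressed on the 0-255 scale, so divide by 255 for [0, 1] images.
    noise = rng.standard_normal(clean.shape).astype(np.float32) * (sigma / 255.0)
    return clean + noise, clean
```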


# Unzip the datasets
!cd data && unzip -oq -d testsets/ data147756/CBSD68.zip
!cd data && unzip -oq -d trainsets/ data149405/trainH.zip
# Create symbolic links
!cd work && ln -s ../data/trainsets trainsets && ln -s ../data/testsets testsets
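After unzipping and linking, a quick sanity check is to count the images the training and test folders expose. CBSD68 consists of 68 images, and the training log in section 5.1 reports 8,694 training images. This sketch assumes it is run from the notebook root (the parent of `work/` and `data/`) and that images may sit in nested subfolders; the helper name and the extension list are assumptions.

```python
from pathlib import Path

IMG_EXTS = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


def count_images(folder) -> int:
    """Count image files recursively under `folder`, matching extensions case-insensitively."""
    return sum(1 for p in Path(folder).rglob("*") if p.is_file() and p.suffix.lower() in IMG_EXTS)


if __name__ == "__main__":
    for d in ("work/trainsets/trainH", "work/testsets/CBSD68"):
        print(d, count_images(d) if Path(d).exists() else "missing")
```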

### 3.2 Pretrained Models

Both models are in the work/pretrained_models folder:

  1. The official pretrained model, converted to Paddle format: 005_colorDN_DFWB_s128w8_SwinIR-M_noise15.pdparams.
  2. The reproduced model: SwinIR_paddle.pdparams.
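The official checkpoint name encodes its training setup. The decoding below is our reading of SwinIR's naming convention, not something stated in this project: `colorDN` as color denoising, `DFWB` as the four training sets listed above (DIV2K, Flickr2K, WED, BSD500), `s128w8` as patch size 128 and window size 8, `SwinIR-M` as the model variant, and `noise15` as σ = 15. It is consistent with the printed config (`H_size: 128`, `window_size: 8`, `sigma: 15`). The parser is a hypothetical helper that only handles denoising names of this shape.

```python
import re

# Hypothetical pattern for SwinIR denoising checkpoint names.
PATTERN = re.compile(
    r"(?P<task_id>\d{3})_(?P<task>[A-Za-z]+)_(?P<data>[A-Z]+)_s(?P<patch>\d+)w(?P<window>\d+)"
    r"_(?P<model>SwinIR-[A-Z]+)_noise(?P<sigma>\d+)\.pdparams$"
)
# Assumed meaning of each letter in the training-set code.
DATASET_CODES = {"D": "DIV2K", "F": "Flickr2K", "W": "WED", "B": "BSD500"}


def parse_checkpoint_name(filename: str) -> dict:
    """Split a checkpoint filename into its (assumed) training-setup fields."""
    m = PATTERN.match(filename)
    if m is None:
        raise ValueError(f"unrecognized checkpoint name: {filename}")
    info = m.groupdict()
    return {
        "task_id": info["task_id"],
        "task": info["task"],
        "train_sets": [DATASET_CODES.get(c, c) for c in info["data"]],
        "patch_size": int(info["patch"]),
        "window_size": int(info["window"]),
        "model": info["model"],
        "sigma": int(info["sigma"]),
    }


if __name__ == "__main__":
    print(parse_checkpoint_name("005_colorDN_DFWB_s128w8_SwinIR-M_noise15.pdparams"))
```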

### 3.3 File Structure

    |-- data                        # Data-related files
    |-- models                      # Model-related files
    |-- options                     # Training configuration files
    |-- trainsets
        |-- trainH                  # Training data
    |-- testsets
        |-- CBSD68                  # Test data
    |-- test_tipc                   # TIPC: basic Linux GPU/CPU training and inference tests
    |-- pretrained_models           # Pretrained models
    |-- utils                       # Utility code
    |-- config.py                   # Configuration file
    |-- generate_patches_SIDD.py    # Generates data patches
    |-- infer.py                    # Model inference code
    |-- LICENSE                     # License file
    |-- main_test_swinir.py         # Model testing code
    |-- main_train_psnr.py          # Model training code
    |-- main_train_tipc.py          # TIPC training code
    |-- README.md                   # README file
    |-- train.log                   # Training log

## 4. Dependencies

PaddlePaddle >= 2.3.2

scikit-image == 0.19.3

## 5. Quick Start


### 5.1 Model Training
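The config printed in the training log of this section sets `G_lossfn_type: charbonnier` with `G_charbonnier_eps: 1e-09`, and `E_decay: 0.999` together with the message "Copying model for E ...", which suggests an exponential moving average (EMA) copy of the generator is kept for evaluation. A NumPy sketch of both ideas, assuming eps is added inside the square root as-is (some implementations square it instead) and treating parameters as a plain dict of arrays; neither function is the repository's code.

```python
import numpy as np


def charbonnier_loss(pred: np.ndarray, target: np.ndarray, eps: float = 1e-9) -> float:
    """Mean Charbonnier loss sqrt((pred - target)^2 + eps): a smooth approximation of L1."""
    diff = pred.astype(np.float64) - target.astype(np.float64)
    return float(np.mean(np.sqrt(diff * diff + eps)))


def ema_update(ema_params: dict, params: dict, decay: float = 0.999) -> None:
    """In-place EMA of parameters: ema = decay * ema + (1 - decay) * current."""
    for name, value in params.items():
        ema_params[name] = decay * ema_params[name] + (1.0 - decay) * value
```

The tiny eps keeps the gradient defined when the prediction equals the target, while the loss otherwise behaves like L1; with a decay of 0.999 the EMA weights average over roughly the last 1,000 updates.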


# Single machine, single GPU
!cd work && python main_train_psnr.py --opt options/train_swinir_multi_card_32.json
number of GPUs is: 4
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/distributed/parallel.py:159: UserWarning: Currently not a parallel execution environment, `paddle.distributed.init_parallel_env` will not do anything.
  "Currently not a parallel execution environment, `paddle.distributed.init_parallel_env` will not do anything."
LogHandlers setup!
22-10-17 16:47:13.877 :   task: swinir_denoising_color_15
  model: plain
  gpu_ids: [0, 1, 2, 3]
  dist: True
  n_channels: 3
    root: denoising
    pretrained_netG: None
    pretrained_netE: None
    task: denoising/swinir_denoising_color_15
    log: denoising/swinir_denoising_color_15
    options: denoising/swinir_denoising_color_15/options
    models: denoising/swinir_denoising_color_15/models
    images: denoising/swinir_denoising_color_15/images
    pretrained_optimizerG: None
      name: train_dataset
      dataset_type: dncnn
      dataroot_H: trainsets/trainH
      dataroot_L: None
      H_size: 128
      sigma: 15
      sigma_test: 15
      dataloader_shuffle: True
      dataloader_num_workers: 8
      dataloader_batch_size: 2
      phase: train
      scale: 1
      n_channels: 3
      name: test_dataset
      dataset_type: dncnn
      dataroot_H: testsets/CBSD68
      dataroot_L: None
      sigma: 15
      sigma_test: 15
      phase: test
      scale: 1
      n_channels: 3
    net_type: swinir
    upscale: 1
    in_chans: 3
    img_size: 128
    window_size: 8
    img_range: 1.0
    depths: [6, 6, 6, 6, 6, 6]
    embed_dim: 180
    num_heads: [6, 6, 6, 6, 6, 6]
    mlp_ratio: 2
    upsampler: None
    resi_connection: 1conv
    init_type: default
    scale: 1
    G_lossfn_type: charbonnier
    G_lossfn_weight: 1.0
    G_charbonnier_eps: 1e-09
    E_decay: 0.999
    G_optimizer_type: adam
    G_optimizer_lr: 0.0002
    G_optimizer_wd: 0
    G_optimizer_clipgrad: None
    G_optimizer_reuse: True
    G_scheduler_type: MultiStepLR
    G_scheduler_milestones: [800000, 1200000, 1400000, 1500000, 1600000]
    G_scheduler_gamma: 0.5
    G_regularizer_orthstep: None
    G_regularizer_clipstep: None
    G_param_strict: True
    E_param_strict: True
    manual_seed: 42
    checkpoint_test: 2000
    checkpoint_save: 2000
    checkpoint_print: 400
    F_feature_layer: 34
    F_weights: 1.0
    F_lossfn_type: l1
    F_use_input_norm: True
    F_use_range_norm: False
    G_optimizer_betas: [0.9, 0.999]
    G_scheduler_restart_weights: 1
  opt_path: options/train_swinir_multi_card_32.json
  is_train: True
  merge_bn: False
  merge_bn_startpoint: -1
  scale: 1
  find_unused_parameters: True
  use_static_graph: False
  num_gpu: 4
  nranks: 1

Random seed: 42
Dataset: Denosing on AWGN with fixed sigma. Only dataroot_H is needed.
Dataset [DatasetDnCNN - train_dataset] is created.
22-10-17 16:47:13.913 : Number of train images: 8,694, iters: 4,347
Dataset: Denosing on AWGN with fixed sigma. Only dataroot_H is needed.
Dataset [DatasetDnCNN - test_dataset] is created.
W1017 16:47:14.995301  3546 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 11.2
W1017 16:47:14.999406  3546 gpu_resources.cc:91] device: 0, cuDNN Version: 8.2.
Pass this initialization! Initialization was done during network definition!
Pass this initialization! Initialization was done during network definition!
Training model [ModelPlain] is created.
Copying model for E ...
22-10-17 16:47:15.620 : 
Networks name: SwinIR
Params number: Tensor(shape=[1], dtype=int64, place=Place(gpu:0), stop_gradient=False,
Net structure:
  (conv_first): Conv2D(3, 180, kernel_size=[3, 3], padding=1, data_format=NCHW)
  (patch_embed): PatchEmbed(
    (norm): LayerNorm(normalized_shape=[180], epsilon=1e-05)
  (patch_unembed): PatchUnEmbed()
  (pos_drop): Dropout(p=0.0, axis=None, mode=upscale_in_train)
  (layers): LayerList(
    (0): RSTB(
      (residual_group): BasicLayer(dim=180, input_resolution=(128, 128), depth=6
        (blocks): LayerList(
          (0): SwinTransformerBlock(dim=180, input_resolution=(128, 128), num_heads=6, window_size=8, shift_size=0, mlp_ratio=2
            (norm1): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (attn): WindowAttention(dim=180, window_size=(8, 8), num_heads=6
              (qkv): Linear(in_features=180, out_features=540, dtype=float32)
              (attn_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (proj): Linear(in_features=180, out_features=180, dtype=float32)
              (proj_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (softmax): Softmax(axis=-1)
            (drop_path): Identity()
            (norm2): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (mlp): Mlp(
              (fc1): Linear(in_features=180, out_features=360, dtype=float32)
              (fc2): Linear(in_features=360, out_features=180, dtype=float32)
              (act): GELU(approximate=False)
              (dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
          (1): SwinTransformerBlock(dim=180, input_resolution=(128, 128), num_heads=6, window_size=8, shift_size=4, mlp_ratio=2
            (norm1): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (attn): WindowAttention(dim=180, window_size=(8, 8), num_heads=6
              (qkv): Linear(in_features=180, out_features=540, dtype=float32)
              (attn_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (proj): Linear(in_features=180, out_features=180, dtype=float32)
              (proj_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (softmax): Softmax(axis=-1)
            (drop_path): DropPath()
            (norm2): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (mlp): Mlp(
              (fc1): Linear(in_features=180, out_features=360, dtype=float32)
              (fc2): Linear(in_features=360, out_features=180, dtype=float32)
              (act): GELU(approximate=False)
              (dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
          (2): SwinTransformerBlock(dim=180, input_resolution=(128, 128), num_heads=6, window_size=8, shift_size=0, mlp_ratio=2
            (norm1): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (attn): WindowAttention(dim=180, window_size=(8, 8), num_heads=6
              (qkv): Linear(in_features=180, out_features=540, dtype=float32)
              (attn_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (proj): Linear(in_features=180, out_features=180, dtype=float32)
              (proj_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (softmax): Softmax(axis=-1)
            (drop_path): DropPath()
            (norm2): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (mlp): Mlp(
              (fc1): Linear(in_features=180, out_features=360, dtype=float32)
              (fc2): Linear(in_features=360, out_features=180, dtype=float32)
              (act): GELU(approximate=False)
              (dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
          (3): SwinTransformerBlock(dim=180, input_resolution=(128, 128), num_heads=6, window_size=8, shift_size=4, mlp_ratio=2
            (norm1): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (attn): WindowAttention(dim=180, window_size=(8, 8), num_heads=6
              (qkv): Linear(in_features=180, out_features=540, dtype=float32)
              (attn_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (proj): Linear(in_features=180, out_features=180, dtype=float32)
              (proj_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (softmax): Softmax(axis=-1)
            (drop_path): DropPath()
            (norm2): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (mlp): Mlp(
              (fc1): Linear(in_features=180, out_features=360, dtype=float32)
              (fc2): Linear(in_features=360, out_features=180, dtype=float32)
              (act): GELU(approximate=False)
              (dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
          (4): SwinTransformerBlock(dim=180, input_resolution=(128, 128), num_heads=6, window_size=8, shift_size=0, mlp_ratio=2
            (norm1): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (attn): WindowAttention(dim=180, window_size=(8, 8), num_heads=6
              (qkv): Linear(in_features=180, out_features=540, dtype=float32)
              (attn_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (proj): Linear(in_features=180, out_features=180, dtype=float32)
              (proj_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (softmax): Softmax(axis=-1)
            (drop_path): DropPath()
            (norm2): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (mlp): Mlp(
              (fc1): Linear(in_features=180, out_features=360, dtype=float32)
              (fc2): Linear(in_features=360, out_features=180, dtype=float32)
              (act): GELU(approximate=False)
              (dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
          (5): SwinTransformerBlock(dim=180, input_resolution=(128, 128), num_heads=6, window_size=8, shift_size=4, mlp_ratio=2
            (norm1): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (attn): WindowAttention(dim=180, window_size=(8, 8), num_heads=6
              (qkv): Linear(in_features=180, out_features=540, dtype=float32)
              (attn_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (proj): Linear(in_features=180, out_features=180, dtype=float32)
              (proj_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (softmax): Softmax(axis=-1)
            (drop_path): DropPath()
            (norm2): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (mlp): Mlp(
              (fc1): Linear(in_features=180, out_features=360, dtype=float32)
              (fc2): Linear(in_features=360, out_features=180, dtype=float32)
              (act): GELU(approximate=False)
              (dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
      (conv): Conv2D(180, 180, kernel_size=[3, 3], padding=1, data_format=NCHW)
      (patch_embed): PatchEmbed()
      (patch_unembed): PatchUnEmbed()
    (1): RSTB(
      (residual_group): BasicLayer(dim=180, input_resolution=(128, 128), depth=6
        (blocks): LayerList(
          (0): SwinTransformerBlock(dim=180, input_resolution=(128, 128), num_heads=6, window_size=8, shift_size=0, mlp_ratio=2
            (norm1): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (attn): WindowAttention(dim=180, window_size=(8, 8), num_heads=6
              (qkv): Linear(in_features=180, out_features=540, dtype=float32)
              (attn_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (proj): Linear(in_features=180, out_features=180, dtype=float32)
              (proj_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (softmax): Softmax(axis=-1)
            (drop_path): DropPath()
            (norm2): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (mlp): Mlp(
              (fc1): Linear(in_features=180, out_features=360, dtype=float32)
              (fc2): Linear(in_features=360, out_features=180, dtype=float32)
              (act): GELU(approximate=False)
              (dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
          (1): SwinTransformerBlock(dim=180, input_resolution=(128, 128), num_heads=6, window_size=8, shift_size=4, mlp_ratio=2
            (norm1): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (attn): WindowAttention(dim=180, window_size=(8, 8), num_heads=6
              (qkv): Linear(in_features=180, out_features=540, dtype=float32)
              (attn_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (proj): Linear(in_features=180, out_features=180, dtype=float32)
              (proj_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (softmax): Softmax(axis=-1)
            (drop_path): DropPath()
            (norm2): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (mlp): Mlp(
              (fc1): Linear(in_features=180, out_features=360, dtype=float32)
              (fc2): Linear(in_features=360, out_features=180, dtype=float32)
              (act): GELU(approximate=False)
              (dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
          (2): SwinTransformerBlock(dim=180, input_resolution=(128, 128), num_heads=6, window_size=8, shift_size=0, mlp_ratio=2
            (norm1): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (attn): WindowAttention(dim=180, window_size=(8, 8), num_heads=6
              (qkv): Linear(in_features=180, out_features=540, dtype=float32)
              (attn_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (proj): Linear(in_features=180, out_features=180, dtype=float32)
              (proj_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (softmax): Softmax(axis=-1)
            (drop_path): DropPath()
            (norm2): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (mlp): Mlp(
              (fc1): Linear(in_features=180, out_features=360, dtype=float32)
              (fc2): Linear(in_features=360, out_features=180, dtype=float32)
              (act): GELU(approximate=False)
              (dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
          (3): SwinTransformerBlock(dim=180, input_resolution=(128, 128), num_heads=6, window_size=8, shift_size=4, mlp_ratio=2
            (norm1): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (attn): WindowAttention(dim=180, window_size=(8, 8), num_heads=6
              (qkv): Linear(in_features=180, out_features=540, dtype=float32)
              (attn_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (proj): Linear(in_features=180, out_features=180, dtype=float32)
              (proj_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (softmax): Softmax(axis=-1)
            (drop_path): DropPath()
            (norm2): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (mlp): Mlp(
              (fc1): Linear(in_features=180, out_features=360, dtype=float32)
              (fc2): Linear(in_features=360, out_features=180, dtype=float32)
              (act): GELU(approximate=False)
              (dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
          (4): SwinTransformerBlock(dim=180, input_resolution=(128, 128), num_heads=6, window_size=8, shift_size=0, mlp_ratio=2
            (norm1): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (attn): WindowAttention(dim=180, window_size=(8, 8), num_heads=6
              (qkv): Linear(in_features=180, out_features=540, dtype=float32)
              (attn_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (proj): Linear(in_features=180, out_features=180, dtype=float32)
              (proj_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (softmax): Softmax(axis=-1)
            (drop_path): DropPath()
            (norm2): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (mlp): Mlp(
              (fc1): Linear(in_features=180, out_features=360, dtype=float32)
              (fc2): Linear(in_features=360, out_features=180, dtype=float32)
              (act): GELU(approximate=False)
              (dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
          (5): SwinTransformerBlock(dim=180, input_resolution=(128, 128), num_heads=6, window_size=8, shift_size=4, mlp_ratio=2
            (norm1): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (attn): WindowAttention(dim=180, window_size=(8, 8), num_heads=6
              (qkv): Linear(in_features=180, out_features=540, dtype=float32)
              (attn_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (proj): Linear(in_features=180, out_features=180, dtype=float32)
              (proj_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (softmax): Softmax(axis=-1)
            (drop_path): DropPath()
            (norm2): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (mlp): Mlp(
              (fc1): Linear(in_features=180, out_features=360, dtype=float32)
              (fc2): Linear(in_features=360, out_features=180, dtype=float32)
              (act): GELU(approximate=False)
              (dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
      (conv): Conv2D(180, 180, kernel_size=[3, 3], padding=1, data_format=NCHW)
      (patch_embed): PatchEmbed()
      (patch_unembed): PatchUnEmbed()
    (2): RSTB(
      (residual_group): BasicLayer(dim=180, input_resolution=(128, 128), depth=6
        (blocks): LayerList(
          (0): SwinTransformerBlock(dim=180, input_resolution=(128, 128), num_heads=6, window_size=8, shift_size=0, mlp_ratio=2
            (norm1): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (attn): WindowAttention(dim=180, window_size=(8, 8), num_heads=6
              (qkv): Linear(in_features=180, out_features=540, dtype=float32)
              (attn_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (proj): Linear(in_features=180, out_features=180, dtype=float32)
              (proj_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (softmax): Softmax(axis=-1)
            (drop_path): DropPath()
            (norm2): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (mlp): Mlp(
              (fc1): Linear(in_features=180, out_features=360, dtype=float32)
              (fc2): Linear(in_features=360, out_features=180, dtype=float32)
              (act): GELU(approximate=False)
              (dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
          (1): SwinTransformerBlock(dim=180, input_resolution=(128, 128), num_heads=6, window_size=8, shift_size=4, mlp_ratio=2
            (norm1): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (attn): WindowAttention(dim=180, window_size=(8, 8), num_heads=6
              (qkv): Linear(in_features=180, out_features=540, dtype=float32)
              (attn_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (proj): Linear(in_features=180, out_features=180, dtype=float32)
              (proj_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (softmax): Softmax(axis=-1)
            (drop_path): DropPath()
            (norm2): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (mlp): Mlp(
              (fc1): Linear(in_features=180, out_features=360, dtype=float32)
              (fc2): Linear(in_features=360, out_features=180, dtype=float32)
              (act): GELU(approximate=False)
              (dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
          (2): SwinTransformerBlock(dim=180, input_resolution=(128, 128), num_heads=6, window_size=8, shift_size=0, mlp_ratio=2
            (norm1): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (attn): WindowAttention(dim=180, window_size=(8, 8), num_heads=6
              (qkv): Linear(in_features=180, out_features=540, dtype=float32)
              (attn_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (proj): Linear(in_features=180, out_features=180, dtype=float32)
              (proj_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (softmax): Softmax(axis=-1)
            (drop_path): DropPath()
            (norm2): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (mlp): Mlp(
              (fc1): Linear(in_features=180, out_features=360, dtype=float32)
              (fc2): Linear(in_features=360, out_features=180, dtype=float32)
              (act): GELU(approximate=False)
              (dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
          (3): SwinTransformerBlock(dim=180, input_resolution=(128, 128), num_heads=6, window_size=8, shift_size=4, mlp_ratio=2
            (norm1): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (attn): WindowAttention(dim=180, window_size=(8, 8), num_heads=6
              (qkv): Linear(in_features=180, out_features=540, dtype=float32)
              (attn_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (proj): Linear(in_features=180, out_features=180, dtype=float32)
              (proj_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (softmax): Softmax(axis=-1)
            (drop_path): DropPath()
            (norm2): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (mlp): Mlp(
              (fc1): Linear(in_features=180, out_features=360, dtype=float32)
              (fc2): Linear(in_features=360, out_features=180, dtype=float32)
              (act): GELU(approximate=False)
              (dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
          (4): SwinTransformerBlock(dim=180, input_resolution=(128, 128), num_heads=6, window_size=8, shift_size=0, mlp_ratio=2
            (norm1): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (attn): WindowAttention(dim=180, window_size=(8, 8), num_heads=6
              (qkv): Linear(in_features=180, out_features=540, dtype=float32)
              (attn_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (proj): Linear(in_features=180, out_features=180, dtype=float32)
              (proj_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (softmax): Softmax(axis=-1)
            (drop_path): DropPath()
            (norm2): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (mlp): Mlp(
              (fc1): Linear(in_features=180, out_features=360, dtype=float32)
              (fc2): Linear(in_features=360, out_features=180, dtype=float32)
              (act): GELU(approximate=False)
              (dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
          (5): SwinTransformerBlock(dim=180, input_resolution=(128, 128), num_heads=6, window_size=8, shift_size=4, mlp_ratio=2
            (norm1): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (attn): WindowAttention(dim=180, window_size=(8, 8), num_heads=6
              (qkv): Linear(in_features=180, out_features=540, dtype=float32)
              (attn_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (proj): Linear(in_features=180, out_features=180, dtype=float32)
              (proj_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (softmax): Softmax(axis=-1)
            (drop_path): DropPath()
            (norm2): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (mlp): Mlp(
              (fc1): Linear(in_features=180, out_features=360, dtype=float32)
              (fc2): Linear(in_features=360, out_features=180, dtype=float32)
              (act): GELU(approximate=False)
              (dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
      (conv): Conv2D(180, 180, kernel_size=[3, 3], padding=1, data_format=NCHW)
      (patch_embed): PatchEmbed()
      (patch_unembed): PatchUnEmbed()
    (3): RSTB(
      (residual_group): BasicLayer(dim=180, input_resolution=(128, 128), depth=6
        (blocks): LayerList(
          (0): SwinTransformerBlock(dim=180, input_resolution=(128, 128), num_heads=6, window_size=8, shift_size=0, mlp_ratio=2
            (norm1): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (attn): WindowAttention(dim=180, window_size=(8, 8), num_heads=6
              (qkv): Linear(in_features=180, out_features=540, dtype=float32)
              (attn_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (proj): Linear(in_features=180, out_features=180, dtype=float32)
              (proj_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (softmax): Softmax(axis=-1)
            (drop_path): DropPath()
            (norm2): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (mlp): Mlp(
              (fc1): Linear(in_features=180, out_features=360, dtype=float32)
              (fc2): Linear(in_features=360, out_features=180, dtype=float32)
              (act): GELU(approximate=False)
              (dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
          (1): SwinTransformerBlock(dim=180, input_resolution=(128, 128), num_heads=6, window_size=8, shift_size=4, mlp_ratio=2
            (norm1): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (attn): WindowAttention(dim=180, window_size=(8, 8), num_heads=6
              (qkv): Linear(in_features=180, out_features=540, dtype=float32)
              (attn_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (proj): Linear(in_features=180, out_features=180, dtype=float32)
              (proj_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (softmax): Softmax(axis=-1)
            (drop_path): DropPath()
            (norm2): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (mlp): Mlp(
              (fc1): Linear(in_features=180, out_features=360, dtype=float32)
              (fc2): Linear(in_features=360, out_features=180, dtype=float32)
              (act): GELU(approximate=False)
              (dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
          (2): SwinTransformerBlock(dim=180, input_resolution=(128, 128), num_heads=6, window_size=8, shift_size=0, mlp_ratio=2
            (norm1): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (attn): WindowAttention(dim=180, window_size=(8, 8), num_heads=6
              (qkv): Linear(in_features=180, out_features=540, dtype=float32)
              (attn_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (proj): Linear(in_features=180, out_features=180, dtype=float32)
              (proj_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (softmax): Softmax(axis=-1)
            (drop_path): DropPath()
            (norm2): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (mlp): Mlp(
              (fc1): Linear(in_features=180, out_features=360, dtype=float32)
              (fc2): Linear(in_features=360, out_features=180, dtype=float32)
              (act): GELU(approximate=False)
              (dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
          (3): SwinTransformerBlock(dim=180, input_resolution=(128, 128), num_heads=6, window_size=8, shift_size=4, mlp_ratio=2
            (norm1): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (attn): WindowAttention(dim=180, window_size=(8, 8), num_heads=6
              (qkv): Linear(in_features=180, out_features=540, dtype=float32)
              (attn_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (proj): Linear(in_features=180, out_features=180, dtype=float32)
              (proj_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (softmax): Softmax(axis=-1)
            (drop_path): DropPath()
            (norm2): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (mlp): Mlp(
              (fc1): Linear(in_features=180, out_features=360, dtype=float32)
              (fc2): Linear(in_features=360, out_features=180, dtype=float32)
              (act): GELU(approximate=False)
              (dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
          (4): SwinTransformerBlock(dim=180, input_resolution=(128, 128), num_heads=6, window_size=8, shift_size=0, mlp_ratio=2
            (norm1): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (attn): WindowAttention(dim=180, window_size=(8, 8), num_heads=6
              (qkv): Linear(in_features=180, out_features=540, dtype=float32)
              (attn_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (proj): Linear(in_features=180, out_features=180, dtype=float32)
              (proj_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (softmax): Softmax(axis=-1)
            (drop_path): DropPath()
            (norm2): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (mlp): Mlp(
              (fc1): Linear(in_features=180, out_features=360, dtype=float32)
              (fc2): Linear(in_features=360, out_features=180, dtype=float32)
              (act): GELU(approximate=False)
              (dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
          (5): SwinTransformerBlock(dim=180, input_resolution=(128, 128), num_heads=6, window_size=8, shift_size=4, mlp_ratio=2
            (norm1): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (attn): WindowAttention(dim=180, window_size=(8, 8), num_heads=6
              (qkv): Linear(in_features=180, out_features=540, dtype=float32)
              (attn_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (proj): Linear(in_features=180, out_features=180, dtype=float32)
              (proj_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (softmax): Softmax(axis=-1)
            (drop_path): DropPath()
            (norm2): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (mlp): Mlp(
              (fc1): Linear(in_features=180, out_features=360, dtype=float32)
              (fc2): Linear(in_features=360, out_features=180, dtype=float32)
              (act): GELU(approximate=False)
              (dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
      (conv): Conv2D(180, 180, kernel_size=[3, 3], padding=1, data_format=NCHW)
      (patch_embed): PatchEmbed()
      (patch_unembed): PatchUnEmbed()
    (4): RSTB(
      (residual_group): BasicLayer(dim=180, input_resolution=(128, 128), depth=6
        (blocks): LayerList(
          (0): SwinTransformerBlock(dim=180, input_resolution=(128, 128), num_heads=6, window_size=8, shift_size=0, mlp_ratio=2
            (norm1): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (attn): WindowAttention(dim=180, window_size=(8, 8), num_heads=6
              (qkv): Linear(in_features=180, out_features=540, dtype=float32)
              (attn_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (proj): Linear(in_features=180, out_features=180, dtype=float32)
              (proj_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (softmax): Softmax(axis=-1)
            (drop_path): DropPath()
            (norm2): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (mlp): Mlp(
              (fc1): Linear(in_features=180, out_features=360, dtype=float32)
              (fc2): Linear(in_features=360, out_features=180, dtype=float32)
              (act): GELU(approximate=False)
              (dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
          (1): SwinTransformerBlock(dim=180, input_resolution=(128, 128), num_heads=6, window_size=8, shift_size=4, mlp_ratio=2
            (norm1): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (attn): WindowAttention(dim=180, window_size=(8, 8), num_heads=6
              (qkv): Linear(in_features=180, out_features=540, dtype=float32)
              (attn_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (proj): Linear(in_features=180, out_features=180, dtype=float32)
              (proj_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (softmax): Softmax(axis=-1)
            (drop_path): DropPath()
            (norm2): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (mlp): Mlp(
              (fc1): Linear(in_features=180, out_features=360, dtype=float32)
              (fc2): Linear(in_features=360, out_features=180, dtype=float32)
              (act): GELU(approximate=False)
              (dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
          (2): SwinTransformerBlock(dim=180, input_resolution=(128, 128), num_heads=6, window_size=8, shift_size=0, mlp_ratio=2
            (norm1): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (attn): WindowAttention(dim=180, window_size=(8, 8), num_heads=6
              (qkv): Linear(in_features=180, out_features=540, dtype=float32)
              (attn_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (proj): Linear(in_features=180, out_features=180, dtype=float32)
              (proj_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (softmax): Softmax(axis=-1)
            (drop_path): DropPath()
            (norm2): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (mlp): Mlp(
              (fc1): Linear(in_features=180, out_features=360, dtype=float32)
              (fc2): Linear(in_features=360, out_features=180, dtype=float32)
              (act): GELU(approximate=False)
              (dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
          (3): SwinTransformerBlock(dim=180, input_resolution=(128, 128), num_heads=6, window_size=8, shift_size=4, mlp_ratio=2
            (norm1): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (attn): WindowAttention(dim=180, window_size=(8, 8), num_heads=6
              (qkv): Linear(in_features=180, out_features=540, dtype=float32)
              (attn_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (proj): Linear(in_features=180, out_features=180, dtype=float32)
              (proj_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (softmax): Softmax(axis=-1)
            (drop_path): DropPath()
            (norm2): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (mlp): Mlp(
              (fc1): Linear(in_features=180, out_features=360, dtype=float32)
              (fc2): Linear(in_features=360, out_features=180, dtype=float32)
              (act): GELU(approximate=False)
              (dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
          (4): SwinTransformerBlock(dim=180, input_resolution=(128, 128), num_heads=6, window_size=8, shift_size=0, mlp_ratio=2
            (norm1): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (attn): WindowAttention(dim=180, window_size=(8, 8), num_heads=6
              (qkv): Linear(in_features=180, out_features=540, dtype=float32)
              (attn_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (proj): Linear(in_features=180, out_features=180, dtype=float32)
              (proj_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (softmax): Softmax(axis=-1)
            (drop_path): DropPath()
            (norm2): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (mlp): Mlp(
              (fc1): Linear(in_features=180, out_features=360, dtype=float32)
              (fc2): Linear(in_features=360, out_features=180, dtype=float32)
              (act): GELU(approximate=False)
              (dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
          (5): SwinTransformerBlock(dim=180, input_resolution=(128, 128), num_heads=6, window_size=8, shift_size=4, mlp_ratio=2
            (norm1): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (attn): WindowAttention(dim=180, window_size=(8, 8), num_heads=6
              (qkv): Linear(in_features=180, out_features=540, dtype=float32)
              (attn_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (proj): Linear(in_features=180, out_features=180, dtype=float32)
              (proj_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (softmax): Softmax(axis=-1)
            (drop_path): DropPath()
            (norm2): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (mlp): Mlp(
              (fc1): Linear(in_features=180, out_features=360, dtype=float32)
              (fc2): Linear(in_features=360, out_features=180, dtype=float32)
              (act): GELU(approximate=False)
              (dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
      (conv): Conv2D(180, 180, kernel_size=[3, 3], padding=1, data_format=NCHW)
      (patch_embed): PatchEmbed()
      (patch_unembed): PatchUnEmbed()
    (5): RSTB(
      (residual_group): BasicLayer(dim=180, input_resolution=(128, 128), depth=6
        (blocks): LayerList(
          (0): SwinTransformerBlock(dim=180, input_resolution=(128, 128), num_heads=6, window_size=8, shift_size=0, mlp_ratio=2
            (norm1): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (attn): WindowAttention(dim=180, window_size=(8, 8), num_heads=6
              (qkv): Linear(in_features=180, out_features=540, dtype=float32)
              (attn_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (proj): Linear(in_features=180, out_features=180, dtype=float32)
              (proj_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (softmax): Softmax(axis=-1)
            (drop_path): DropPath()
            (norm2): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (mlp): Mlp(
              (fc1): Linear(in_features=180, out_features=360, dtype=float32)
              (fc2): Linear(in_features=360, out_features=180, dtype=float32)
              (act): GELU(approximate=False)
              (dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
          (1): SwinTransformerBlock(dim=180, input_resolution=(128, 128), num_heads=6, window_size=8, shift_size=4, mlp_ratio=2
            (norm1): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (attn): WindowAttention(dim=180, window_size=(8, 8), num_heads=6
              (qkv): Linear(in_features=180, out_features=540, dtype=float32)
              (attn_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (proj): Linear(in_features=180, out_features=180, dtype=float32)
              (proj_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (softmax): Softmax(axis=-1)
            (drop_path): DropPath()
            (norm2): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (mlp): Mlp(
              (fc1): Linear(in_features=180, out_features=360, dtype=float32)
              (fc2): Linear(in_features=360, out_features=180, dtype=float32)
              (act): GELU(approximate=False)
              (dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
          (2): SwinTransformerBlock(dim=180, input_resolution=(128, 128), num_heads=6, window_size=8, shift_size=0, mlp_ratio=2
            (norm1): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (attn): WindowAttention(dim=180, window_size=(8, 8), num_heads=6
              (qkv): Linear(in_features=180, out_features=540, dtype=float32)
              (attn_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (proj): Linear(in_features=180, out_features=180, dtype=float32)
              (proj_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (softmax): Softmax(axis=-1)
            (drop_path): DropPath()
            (norm2): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (mlp): Mlp(
              (fc1): Linear(in_features=180, out_features=360, dtype=float32)
              (fc2): Linear(in_features=360, out_features=180, dtype=float32)
              (act): GELU(approximate=False)
              (dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
          (3): SwinTransformerBlock(dim=180, input_resolution=(128, 128), num_heads=6, window_size=8, shift_size=4, mlp_ratio=2
            (norm1): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (attn): WindowAttention(dim=180, window_size=(8, 8), num_heads=6
              (qkv): Linear(in_features=180, out_features=540, dtype=float32)
              (attn_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (proj): Linear(in_features=180, out_features=180, dtype=float32)
              (proj_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (softmax): Softmax(axis=-1)
            (drop_path): DropPath()
            (norm2): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (mlp): Mlp(
              (fc1): Linear(in_features=180, out_features=360, dtype=float32)
              (fc2): Linear(in_features=360, out_features=180, dtype=float32)
              (act): GELU(approximate=False)
              (dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
          (4): SwinTransformerBlock(dim=180, input_resolution=(128, 128), num_heads=6, window_size=8, shift_size=0, mlp_ratio=2
            (norm1): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (attn): WindowAttention(dim=180, window_size=(8, 8), num_heads=6
              (qkv): Linear(in_features=180, out_features=540, dtype=float32)
              (attn_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (proj): Linear(in_features=180, out_features=180, dtype=float32)
              (proj_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (softmax): Softmax(axis=-1)
            (drop_path): DropPath()
            (norm2): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (mlp): Mlp(
              (fc1): Linear(in_features=180, out_features=360, dtype=float32)
              (fc2): Linear(in_features=360, out_features=180, dtype=float32)
              (act): GELU(approximate=False)
              (dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
          (5): SwinTransformerBlock(dim=180, input_resolution=(128, 128), num_heads=6, window_size=8, shift_size=4, mlp_ratio=2
            (norm1): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (attn): WindowAttention(dim=180, window_size=(8, 8), num_heads=6
              (qkv): Linear(in_features=180, out_features=540, dtype=float32)
              (attn_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (proj): Linear(in_features=180, out_features=180, dtype=float32)
              (proj_dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
              (softmax): Softmax(axis=-1)
            (drop_path): DropPath()
            (norm2): LayerNorm(normalized_shape=[180], epsilon=1e-05)
            (mlp): Mlp(
              (fc1): Linear(in_features=180, out_features=360, dtype=float32)
              (fc2): Linear(in_features=360, out_features=180, dtype=float32)
              (act): GELU(approximate=False)
              (dropout): Dropout(p=0.0, axis=None, mode=upscale_in_train)
      (conv): Conv2D(180, 180, kernel_size=[3, 3], padding=1, data_format=NCHW)
      (patch_embed): PatchEmbed()
      (patch_unembed): PatchUnEmbed()
  (norm): LayerNorm(normalized_shape=[180], epsilon=1e-05)
  (conv_after_body): Conv2D(180, 180, kernel_size=[3, 3], padding=1, data_format=NCHW)
  (conv_last): Conv2D(180, 3, kernel_size=[3, 3], padding=1, data_format=NCHW)

22-10-17 16:47:15.819 : 
 |  mean  |  min   |  max   |  std   || shape               
 | -0.002 | -0.941 |  1.199 |  0.274 | (180, 3, 3, 3) || conv_first.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || conv_first.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || patch_embed.norm.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || patch_embed.norm.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.0.residual_group.blocks.0.norm1.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.0.residual_group.blocks.0.norm1.bias
 | -0.000 | -0.040 |  0.040 |  0.017 | (225, 6) || layers.0.residual_group.blocks.0.attn.relative_position_bias_table
 | 112.000 |  0.000 | 224.000 | 48.713 | (64, 64) || layers.0.residual_group.blocks.0.attn.relative_position_index
 | -0.000 | -0.040 |  0.040 |  0.018 | (180, 540) || layers.0.residual_group.blocks.0.attn.qkv.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (540,) || layers.0.residual_group.blocks.0.attn.qkv.bias
 | -0.000 | -0.040 |  0.040 |  0.018 | (180, 180) || layers.0.residual_group.blocks.0.attn.proj.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.0.residual_group.blocks.0.attn.proj.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.0.residual_group.blocks.0.norm2.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.0.residual_group.blocks.0.norm2.bias
 |  0.000 | -0.105 |  0.105 |  0.061 | (180, 360) || layers.0.residual_group.blocks.0.mlp.fc1.weight
 |  0.000 | -0.000 |  0.000 |  0.000 | (360,) || layers.0.residual_group.blocks.0.mlp.fc1.bias
 | -0.000 | -0.105 |  0.105 |  0.061 | (360, 180) || layers.0.residual_group.blocks.0.mlp.fc2.weight
 | -0.000 | -0.000 |  0.000 |  0.000 | (180,) || layers.0.residual_group.blocks.0.mlp.fc2.bias
 | -6.152 | -100.000 | -0.000 | 24.029 | (256, 64, 64) || layers.0.residual_group.blocks.1.attn_mask
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.0.residual_group.blocks.1.norm1.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.0.residual_group.blocks.1.norm1.bias
 | -0.001 | -0.040 |  0.040 |  0.018 | (225, 6) || layers.0.residual_group.blocks.1.attn.relative_position_bias_table
 | 112.000 |  0.000 | 224.000 | 48.713 | (64, 64) || layers.0.residual_group.blocks.1.attn.relative_position_index
 | -0.000 | -0.040 |  0.040 |  0.018 | (180, 540) || layers.0.residual_group.blocks.1.attn.qkv.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (540,) || layers.0.residual_group.blocks.1.attn.qkv.bias
 |  0.000 | -0.040 |  0.040 |  0.018 | (180, 180) || layers.0.residual_group.blocks.1.attn.proj.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.0.residual_group.blocks.1.attn.proj.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.0.residual_group.blocks.1.norm2.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.0.residual_group.blocks.1.norm2.bias
 | -0.000 | -0.105 |  0.105 |  0.061 | (180, 360) || layers.0.residual_group.blocks.1.mlp.fc1.weight
 |  0.000 | -0.000 |  0.000 |  0.000 | (360,) || layers.0.residual_group.blocks.1.mlp.fc1.bias
 | -0.000 | -0.105 |  0.105 |  0.061 | (360, 180) || layers.0.residual_group.blocks.1.mlp.fc2.weight
 |  0.000 | -0.000 |  0.000 |  0.000 | (180,) || layers.0.residual_group.blocks.1.mlp.fc2.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.0.residual_group.blocks.2.norm1.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.0.residual_group.blocks.2.norm1.bias
 |  0.001 | -0.040 |  0.040 |  0.018 | (225, 6) || layers.0.residual_group.blocks.2.attn.relative_position_bias_table
 | 112.000 |  0.000 | 224.000 | 48.713 | (64, 64) || layers.0.residual_group.blocks.2.attn.relative_position_index
 | -0.000 | -0.040 |  0.040 |  0.018 | (180, 540) || layers.0.residual_group.blocks.2.attn.qkv.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (540,) || layers.0.residual_group.blocks.2.attn.qkv.bias
 | -0.000 | -0.040 |  0.040 |  0.018 | (180, 180) || layers.0.residual_group.blocks.2.attn.proj.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.0.residual_group.blocks.2.attn.proj.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.0.residual_group.blocks.2.norm2.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.0.residual_group.blocks.2.norm2.bias
 |  0.001 | -0.105 |  0.105 |  0.061 | (180, 360) || layers.0.residual_group.blocks.2.mlp.fc1.weight
 |  0.000 | -0.000 |  0.000 |  0.000 | (360,) || layers.0.residual_group.blocks.2.mlp.fc1.bias
 |  0.000 | -0.105 |  0.105 |  0.061 | (360, 180) || layers.0.residual_group.blocks.2.mlp.fc2.weight
 |  0.000 | -0.000 |  0.000 |  0.000 | (180,) || layers.0.residual_group.blocks.2.mlp.fc2.bias
 | -6.152 | -100.000 | -0.000 | 24.029 | (256, 64, 64) || layers.0.residual_group.blocks.3.attn_mask
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.0.residual_group.blocks.3.norm1.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.0.residual_group.blocks.3.norm1.bias
 | -0.000 | -0.040 |  0.040 |  0.018 | (225, 6) || layers.0.residual_group.blocks.3.attn.relative_position_bias_table
 | 112.000 |  0.000 | 224.000 | 48.713 | (64, 64) || layers.0.residual_group.blocks.3.attn.relative_position_index
 | -0.000 | -0.040 |  0.040 |  0.018 | (180, 540) || layers.0.residual_group.blocks.3.attn.qkv.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (540,) || layers.0.residual_group.blocks.3.attn.qkv.bias
 |  0.000 | -0.040 |  0.040 |  0.018 | (180, 180) || layers.0.residual_group.blocks.3.attn.proj.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.0.residual_group.blocks.3.attn.proj.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.0.residual_group.blocks.3.norm2.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.0.residual_group.blocks.3.norm2.bias
 | -0.000 | -0.105 |  0.105 |  0.061 | (180, 360) || layers.0.residual_group.blocks.3.mlp.fc1.weight
 |  0.000 | -0.000 |  0.000 |  0.000 | (360,) || layers.0.residual_group.blocks.3.mlp.fc1.bias
 |  0.000 | -0.105 |  0.105 |  0.061 | (360, 180) || layers.0.residual_group.blocks.3.mlp.fc2.weight
 |  0.000 | -0.000 |  0.000 |  0.000 | (180,) || layers.0.residual_group.blocks.3.mlp.fc2.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.0.residual_group.blocks.4.norm1.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.0.residual_group.blocks.4.norm1.bias
 |  0.000 | -0.039 |  0.040 |  0.018 | (225, 6) || layers.0.residual_group.blocks.4.attn.relative_position_bias_table
 | 112.000 |  0.000 | 224.000 | 48.713 | (64, 64) || layers.0.residual_group.blocks.4.attn.relative_position_index
 | -0.000 | -0.040 |  0.040 |  0.018 | (180, 540) || layers.0.residual_group.blocks.4.attn.qkv.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (540,) || layers.0.residual_group.blocks.4.attn.qkv.bias
 |  0.000 | -0.040 |  0.040 |  0.018 | (180, 180) || layers.0.residual_group.blocks.4.attn.proj.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.0.residual_group.blocks.4.attn.proj.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.0.residual_group.blocks.4.norm2.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.0.residual_group.blocks.4.norm2.bias
 | -0.000 | -0.105 |  0.105 |  0.061 | (180, 360) || layers.0.residual_group.blocks.4.mlp.fc1.weight
 | -0.000 | -0.000 |  0.000 |  0.000 | (360,) || layers.0.residual_group.blocks.4.mlp.fc1.bias
 |  0.000 | -0.105 |  0.105 |  0.061 | (360, 180) || layers.0.residual_group.blocks.4.mlp.fc2.weight
 | -0.000 | -0.000 |  0.000 |  0.000 | (180,) || layers.0.residual_group.blocks.4.mlp.fc2.bias
 | -6.152 | -100.000 | -0.000 | 24.029 | (256, 64, 64) || layers.0.residual_group.blocks.5.attn_mask
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.0.residual_group.blocks.5.norm1.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.0.residual_group.blocks.5.norm1.bias
 | -0.000 | -0.040 |  0.039 |  0.018 | (225, 6) || layers.0.residual_group.blocks.5.attn.relative_position_bias_table
 | 112.000 |  0.000 | 224.000 | 48.713 | (64, 64) || layers.0.residual_group.blocks.5.attn.relative_position_index
 |  0.000 | -0.040 |  0.040 |  0.018 | (180, 540) || layers.0.residual_group.blocks.5.attn.qkv.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (540,) || layers.0.residual_group.blocks.5.attn.qkv.bias
 |  0.000 | -0.040 |  0.040 |  0.018 | (180, 180) || layers.0.residual_group.blocks.5.attn.proj.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.0.residual_group.blocks.5.attn.proj.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.0.residual_group.blocks.5.norm2.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.0.residual_group.blocks.5.norm2.bias
 |  0.000 | -0.105 |  0.105 |  0.061 | (180, 360) || layers.0.residual_group.blocks.5.mlp.fc1.weight
 | -0.000 | -0.000 |  0.000 |  0.000 | (360,) || layers.0.residual_group.blocks.5.mlp.fc1.bias
 |  0.000 | -0.105 |  0.105 |  0.061 | (360, 180) || layers.0.residual_group.blocks.5.mlp.fc2.weight
 | -0.000 | -0.000 |  0.000 |  0.000 | (180,) || layers.0.residual_group.blocks.5.mlp.fc2.bias
 | -0.000 | -0.155 |  0.158 |  0.035 | (180, 180, 3, 3) || layers.0.conv.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.0.conv.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.1.residual_group.blocks.0.norm1.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.1.residual_group.blocks.0.norm1.bias
 | -0.000 | -0.040 |  0.039 |  0.018 | (225, 6) || layers.1.residual_group.blocks.0.attn.relative_position_bias_table
 | 112.000 |  0.000 | 224.000 | 48.713 | (64, 64) || layers.1.residual_group.blocks.0.attn.relative_position_index
 |  0.000 | -0.040 |  0.040 |  0.018 | (180, 540) || layers.1.residual_group.blocks.0.attn.qkv.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (540,) || layers.1.residual_group.blocks.0.attn.qkv.bias
 |  0.000 | -0.040 |  0.040 |  0.018 | (180, 180) || layers.1.residual_group.blocks.0.attn.proj.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.1.residual_group.blocks.0.attn.proj.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.1.residual_group.blocks.0.norm2.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.1.residual_group.blocks.0.norm2.bias
 |  0.000 | -0.105 |  0.105 |  0.061 | (180, 360) || layers.1.residual_group.blocks.0.mlp.fc1.weight
 | -0.000 | -0.000 |  0.000 |  0.000 | (360,) || layers.1.residual_group.blocks.0.mlp.fc1.bias
 |  0.000 | -0.105 |  0.105 |  0.061 | (360, 180) || layers.1.residual_group.blocks.0.mlp.fc2.weight
 |  0.000 | -0.000 |  0.000 |  0.000 | (180,) || layers.1.residual_group.blocks.0.mlp.fc2.bias
 | -6.152 | -100.000 | -0.000 | 24.029 | (256, 64, 64) || layers.1.residual_group.blocks.1.attn_mask
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.1.residual_group.blocks.1.norm1.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.1.residual_group.blocks.1.norm1.bias
 |  0.000 | -0.040 |  0.040 |  0.018 | (225, 6) || layers.1.residual_group.blocks.1.attn.relative_position_bias_table
 | 112.000 |  0.000 | 224.000 | 48.713 | (64, 64) || layers.1.residual_group.blocks.1.attn.relative_position_index
 |  0.000 | -0.040 |  0.040 |  0.018 | (180, 540) || layers.1.residual_group.blocks.1.attn.qkv.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (540,) || layers.1.residual_group.blocks.1.attn.qkv.bias
 |  0.000 | -0.040 |  0.040 |  0.018 | (180, 180) || layers.1.residual_group.blocks.1.attn.proj.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.1.residual_group.blocks.1.attn.proj.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.1.residual_group.blocks.1.norm2.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.1.residual_group.blocks.1.norm2.bias
 |  0.000 | -0.105 |  0.105 |  0.061 | (180, 360) || layers.1.residual_group.blocks.1.mlp.fc1.weight
 | -0.000 | -0.000 |  0.000 |  0.000 | (360,) || layers.1.residual_group.blocks.1.mlp.fc1.bias
 | -0.000 | -0.105 |  0.105 |  0.061 | (360, 180) || layers.1.residual_group.blocks.1.mlp.fc2.weight
 |  0.000 | -0.000 |  0.000 |  0.000 | (180,) || layers.1.residual_group.blocks.1.mlp.fc2.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.1.residual_group.blocks.2.norm1.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.1.residual_group.blocks.2.norm1.bias
 | -0.001 | -0.040 |  0.040 |  0.018 | (225, 6) || layers.1.residual_group.blocks.2.attn.relative_position_bias_table
 | 112.000 |  0.000 | 224.000 | 48.713 | (64, 64) || layers.1.residual_group.blocks.2.attn.relative_position_index
 |  0.000 | -0.040 |  0.040 |  0.018 | (180, 540) || layers.1.residual_group.blocks.2.attn.qkv.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (540,) || layers.1.residual_group.blocks.2.attn.qkv.bias
 |  0.000 | -0.040 |  0.040 |  0.018 | (180, 180) || layers.1.residual_group.blocks.2.attn.proj.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.1.residual_group.blocks.2.attn.proj.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.1.residual_group.blocks.2.norm2.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.1.residual_group.blocks.2.norm2.bias
 | -0.000 | -0.105 |  0.105 |  0.061 | (180, 360) || layers.1.residual_group.blocks.2.mlp.fc1.weight
 | -0.000 | -0.000 |  0.000 |  0.000 | (360,) || layers.1.residual_group.blocks.2.mlp.fc1.bias
 |  0.000 | -0.105 |  0.105 |  0.061 | (360, 180) || layers.1.residual_group.blocks.2.mlp.fc2.weight
 | -0.000 | -0.000 |  0.000 |  0.000 | (180,) || layers.1.residual_group.blocks.2.mlp.fc2.bias
 | -6.152 | -100.000 | -0.000 | 24.029 | (256, 64, 64) || layers.1.residual_group.blocks.3.attn_mask
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.1.residual_group.blocks.3.norm1.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.1.residual_group.blocks.3.norm1.bias
 |  0.001 | -0.040 |  0.040 |  0.017 | (225, 6) || layers.1.residual_group.blocks.3.attn.relative_position_bias_table
 | 112.000 |  0.000 | 224.000 | 48.713 | (64, 64) || layers.1.residual_group.blocks.3.attn.relative_position_index
 |  0.000 | -0.040 |  0.040 |  0.018 | (180, 540) || layers.1.residual_group.blocks.3.attn.qkv.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (540,) || layers.1.residual_group.blocks.3.attn.qkv.bias
 | -0.000 | -0.040 |  0.040 |  0.018 | (180, 180) || layers.1.residual_group.blocks.3.attn.proj.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.1.residual_group.blocks.3.attn.proj.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.1.residual_group.blocks.3.norm2.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.1.residual_group.blocks.3.norm2.bias
 | -0.000 | -0.105 |  0.105 |  0.061 | (180, 360) || layers.1.residual_group.blocks.3.mlp.fc1.weight
 | -0.000 | -0.000 |  0.000 |  0.000 | (360,) || layers.1.residual_group.blocks.3.mlp.fc1.bias
 |  0.000 | -0.105 |  0.105 |  0.061 | (360, 180) || layers.1.residual_group.blocks.3.mlp.fc2.weight
 |  0.000 | -0.000 |  0.000 |  0.000 | (180,) || layers.1.residual_group.blocks.3.mlp.fc2.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.1.residual_group.blocks.4.norm1.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.1.residual_group.blocks.4.norm1.bias
 |  0.000 | -0.040 |  0.040 |  0.017 | (225, 6) || layers.1.residual_group.blocks.4.attn.relative_position_bias_table
 | 112.000 |  0.000 | 224.000 | 48.713 | (64, 64) || layers.1.residual_group.blocks.4.attn.relative_position_index
 |  0.000 | -0.040 |  0.040 |  0.018 | (180, 540) || layers.1.residual_group.blocks.4.attn.qkv.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (540,) || layers.1.residual_group.blocks.4.attn.qkv.bias
 | -0.000 | -0.040 |  0.040 |  0.018 | (180, 180) || layers.1.residual_group.blocks.4.attn.proj.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.1.residual_group.blocks.4.attn.proj.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.1.residual_group.blocks.4.norm2.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.1.residual_group.blocks.4.norm2.bias
 | -0.000 | -0.105 |  0.105 |  0.061 | (180, 360) || layers.1.residual_group.blocks.4.mlp.fc1.weight
 |  0.000 | -0.000 |  0.000 |  0.000 | (360,) || layers.1.residual_group.blocks.4.mlp.fc1.bias
 |  0.000 | -0.105 |  0.105 |  0.061 | (360, 180) || layers.1.residual_group.blocks.4.mlp.fc2.weight
 |  0.000 | -0.000 |  0.000 |  0.000 | (180,) || layers.1.residual_group.blocks.4.mlp.fc2.bias
 | -6.152 | -100.000 | -0.000 | 24.029 | (256, 64, 64) || layers.1.residual_group.blocks.5.attn_mask
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.1.residual_group.blocks.5.norm1.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.1.residual_group.blocks.5.norm1.bias
 |  0.000 | -0.039 |  0.040 |  0.018 | (225, 6) || layers.1.residual_group.blocks.5.attn.relative_position_bias_table
 | 112.000 |  0.000 | 224.000 | 48.713 | (64, 64) || layers.1.residual_group.blocks.5.attn.relative_position_index
 |  0.000 | -0.040 |  0.040 |  0.018 | (180, 540) || layers.1.residual_group.blocks.5.attn.qkv.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (540,) || layers.1.residual_group.blocks.5.attn.qkv.bias
 | -0.000 | -0.040 |  0.040 |  0.018 | (180, 180) || layers.1.residual_group.blocks.5.attn.proj.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.1.residual_group.blocks.5.attn.proj.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.1.residual_group.blocks.5.norm2.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.1.residual_group.blocks.5.norm2.bias
 |  0.000 | -0.105 |  0.105 |  0.061 | (180, 360) || layers.1.residual_group.blocks.5.mlp.fc1.weight
 | -0.000 | -0.000 |  0.000 |  0.000 | (360,) || layers.1.residual_group.blocks.5.mlp.fc1.bias
 |  0.000 | -0.105 |  0.105 |  0.061 | (360, 180) || layers.1.residual_group.blocks.5.mlp.fc2.weight
 | -0.000 | -0.000 |  0.000 |  0.000 | (180,) || layers.1.residual_group.blocks.5.mlp.fc2.bias
 |  0.000 | -0.155 |  0.147 |  0.035 | (180, 180, 3, 3) || layers.1.conv.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.1.conv.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.2.residual_group.blocks.0.norm1.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.2.residual_group.blocks.0.norm1.bias
 | -0.000 | -0.040 |  0.040 |  0.018 | (225, 6) || layers.2.residual_group.blocks.0.attn.relative_position_bias_table
 | 112.000 |  0.000 | 224.000 | 48.713 | (64, 64) || layers.2.residual_group.blocks.0.attn.relative_position_index
 |  0.000 | -0.040 |  0.040 |  0.018 | (180, 540) || layers.2.residual_group.blocks.0.attn.qkv.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (540,) || layers.2.residual_group.blocks.0.attn.qkv.bias
 |  0.000 | -0.040 |  0.040 |  0.018 | (180, 180) || layers.2.residual_group.blocks.0.attn.proj.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.2.residual_group.blocks.0.attn.proj.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.2.residual_group.blocks.0.norm2.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.2.residual_group.blocks.0.norm2.bias
 |  0.000 | -0.105 |  0.105 |  0.061 | (180, 360) || layers.2.residual_group.blocks.0.mlp.fc1.weight
 | -0.000 | -0.000 |  0.000 |  0.000 | (360,) || layers.2.residual_group.blocks.0.mlp.fc1.bias
 |  0.001 | -0.105 |  0.105 |  0.061 | (360, 180) || layers.2.residual_group.blocks.0.mlp.fc2.weight
 |  0.000 | -0.000 |  0.000 |  0.000 | (180,) || layers.2.residual_group.blocks.0.mlp.fc2.bias
 | -6.152 | -100.000 | -0.000 | 24.029 | (256, 64, 64) || layers.2.residual_group.blocks.1.attn_mask
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.2.residual_group.blocks.1.norm1.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.2.residual_group.blocks.1.norm1.bias
 |  0.000 | -0.039 |  0.039 |  0.017 | (225, 6) || layers.2.residual_group.blocks.1.attn.relative_position_bias_table
 | 112.000 |  0.000 | 224.000 | 48.713 | (64, 64) || layers.2.residual_group.blocks.1.attn.relative_position_index
 |  0.000 | -0.040 |  0.040 |  0.018 | (180, 540) || layers.2.residual_group.blocks.1.attn.qkv.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (540,) || layers.2.residual_group.blocks.1.attn.qkv.bias
 | -0.000 | -0.040 |  0.040 |  0.018 | (180, 180) || layers.2.residual_group.blocks.1.attn.proj.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.2.residual_group.blocks.1.attn.proj.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.2.residual_group.blocks.1.norm2.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.2.residual_group.blocks.1.norm2.bias
 | -0.000 | -0.105 |  0.105 |  0.061 | (180, 360) || layers.2.residual_group.blocks.1.mlp.fc1.weight
 | -0.000 | -0.000 |  0.000 |  0.000 | (360,) || layers.2.residual_group.blocks.1.mlp.fc1.bias
 |  0.000 | -0.105 |  0.105 |  0.061 | (360, 180) || layers.2.residual_group.blocks.1.mlp.fc2.weight
 |  0.000 | -0.000 |  0.000 |  0.000 | (180,) || layers.2.residual_group.blocks.1.mlp.fc2.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.2.residual_group.blocks.2.norm1.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.2.residual_group.blocks.2.norm1.bias
 | -0.000 | -0.040 |  0.040 |  0.018 | (225, 6) || layers.2.residual_group.blocks.2.attn.relative_position_bias_table
 | 112.000 |  0.000 | 224.000 | 48.713 | (64, 64) || layers.2.residual_group.blocks.2.attn.relative_position_index
 | -0.000 | -0.040 |  0.040 |  0.018 | (180, 540) || layers.2.residual_group.blocks.2.attn.qkv.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (540,) || layers.2.residual_group.blocks.2.attn.qkv.bias
 | -0.000 | -0.040 |  0.040 |  0.018 | (180, 180) || layers.2.residual_group.blocks.2.attn.proj.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.2.residual_group.blocks.2.attn.proj.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.2.residual_group.blocks.2.norm2.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.2.residual_group.blocks.2.norm2.bias
 |  0.000 | -0.105 |  0.105 |  0.061 | (180, 360) || layers.2.residual_group.blocks.2.mlp.fc1.weight
 |  0.000 | -0.000 |  0.000 |  0.000 | (360,) || layers.2.residual_group.blocks.2.mlp.fc1.bias
 |  0.000 | -0.105 |  0.105 |  0.061 | (360, 180) || layers.2.residual_group.blocks.2.mlp.fc2.weight
 |  0.000 | -0.000 |  0.000 |  0.000 | (180,) || layers.2.residual_group.blocks.2.mlp.fc2.bias
 | -6.152 | -100.000 | -0.000 | 24.029 | (256, 64, 64) || layers.2.residual_group.blocks.3.attn_mask
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.2.residual_group.blocks.3.norm1.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.2.residual_group.blocks.3.norm1.bias
 | -0.000 | -0.039 |  0.039 |  0.018 | (225, 6) || layers.2.residual_group.blocks.3.attn.relative_position_bias_table
 | 112.000 |  0.000 | 224.000 | 48.713 | (64, 64) || layers.2.residual_group.blocks.3.attn.relative_position_index
 |  0.000 | -0.040 |  0.040 |  0.018 | (180, 540) || layers.2.residual_group.blocks.3.attn.qkv.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (540,) || layers.2.residual_group.blocks.3.attn.qkv.bias
 | -0.000 | -0.040 |  0.040 |  0.018 | (180, 180) || layers.2.residual_group.blocks.3.attn.proj.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.2.residual_group.blocks.3.attn.proj.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.2.residual_group.blocks.3.norm2.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.2.residual_group.blocks.3.norm2.bias
 | -0.001 | -0.105 |  0.105 |  0.061 | (180, 360) || layers.2.residual_group.blocks.3.mlp.fc1.weight
 |  0.000 | -0.000 |  0.000 |  0.000 | (360,) || layers.2.residual_group.blocks.3.mlp.fc1.bias
 | -0.000 | -0.105 |  0.105 |  0.061 | (360, 180) || layers.2.residual_group.blocks.3.mlp.fc2.weight
 |  0.000 | -0.000 |  0.000 |  0.000 | (180,) || layers.2.residual_group.blocks.3.mlp.fc2.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.2.residual_group.blocks.4.norm1.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.2.residual_group.blocks.4.norm1.bias
 | -0.000 | -0.040 |  0.040 |  0.017 | (225, 6) || layers.2.residual_group.blocks.4.attn.relative_position_bias_table
 | 112.000 |  0.000 | 224.000 | 48.713 | (64, 64) || layers.2.residual_group.blocks.4.attn.relative_position_index
 | -0.000 | -0.040 |  0.040 |  0.018 | (180, 540) || layers.2.residual_group.blocks.4.attn.qkv.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (540,) || layers.2.residual_group.blocks.4.attn.qkv.bias
 | -0.000 | -0.040 |  0.040 |  0.018 | (180, 180) || layers.2.residual_group.blocks.4.attn.proj.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.2.residual_group.blocks.4.attn.proj.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.2.residual_group.blocks.4.norm2.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.2.residual_group.blocks.4.norm2.bias
 |  0.000 | -0.105 |  0.105 |  0.061 | (180, 360) || layers.2.residual_group.blocks.4.mlp.fc1.weight
 | -0.000 | -0.000 |  0.000 |  0.000 | (360,) || layers.2.residual_group.blocks.4.mlp.fc1.bias
 |  0.000 | -0.105 |  0.105 |  0.061 | (360, 180) || layers.2.residual_group.blocks.4.mlp.fc2.weight
 |  0.000 | -0.000 |  0.000 |  0.000 | (180,) || layers.2.residual_group.blocks.4.mlp.fc2.bias
 | -6.152 | -100.000 | -0.000 | 24.029 | (256, 64, 64) || layers.2.residual_group.blocks.5.attn_mask
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.2.residual_group.blocks.5.norm1.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.2.residual_group.blocks.5.norm1.bias
 |  0.000 | -0.040 |  0.040 |  0.018 | (225, 6) || layers.2.residual_group.blocks.5.attn.relative_position_bias_table
 | 112.000 |  0.000 | 224.000 | 48.713 | (64, 64) || layers.2.residual_group.blocks.5.attn.relative_position_index
 |  0.000 | -0.040 |  0.040 |  0.018 | (180, 540) || layers.2.residual_group.blocks.5.attn.qkv.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (540,) || layers.2.residual_group.blocks.5.attn.qkv.bias
 | -0.000 | -0.040 |  0.040 |  0.018 | (180, 180) || layers.2.residual_group.blocks.5.attn.proj.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.2.residual_group.blocks.5.attn.proj.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.2.residual_group.blocks.5.norm2.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.2.residual_group.blocks.5.norm2.bias
 |  0.000 | -0.105 |  0.105 |  0.061 | (180, 360) || layers.2.residual_group.blocks.5.mlp.fc1.weight
 | -0.000 | -0.000 |  0.000 |  0.000 | (360,) || layers.2.residual_group.blocks.5.mlp.fc1.bias
 | -0.000 | -0.105 |  0.105 |  0.061 | (360, 180) || layers.2.residual_group.blocks.5.mlp.fc2.weight
 | -0.000 | -0.000 |  0.000 |  0.000 | (180,) || layers.2.residual_group.blocks.5.mlp.fc2.bias
 |  0.000 | -0.159 |  0.157 |  0.035 | (180, 180, 3, 3) || layers.2.conv.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.2.conv.bias
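Every block also logs a `relative_position_index` of shape (64, 64) with mean 112, range 0 to 224 and std 48.713, next to a `relative_position_bias_table` of shape (225, 6). A minimal sketch of the standard Swin index construction reproduces those numbers, assuming window size 8. With that window there are 2*8-1 = 15 offsets per axis, giving 15*15 = 225 table rows, and the 6 columns are read here as one per attention head. The helper names are illustrative, and the std is taken as a population std because that is what matches the logged value.

```python
import math


def relative_position_index(window_size=8):
    """Swin-style relative position index for one window of flattened tokens."""
    coords = [(h, w) for h in range(window_size) for w in range(window_size)]
    span = 2 * window_size - 1  # distinct offsets per axis (15 for ws=8)
    index = []
    for hi, wi in coords:
        row = []
        for hj, wj in coords:
            dh = hi - hj + window_size - 1  # shift offsets so they start at 0
            dw = wi - wj + window_size - 1
            row.append(dh * span + dw)      # row-major position in the bias table
        index.append(row)
    return index


def population_stats(values):
    """Mean, min, max and population std of a flat list."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return mean, min(values), max(values), std


idx = relative_position_index(8)
flat = [v for row in idx for v in row]
print(population_stats(flat))
```

Each entry of the index selects one row of the bias table, so the 225 x 6 learned biases are shared by all 64 x 64 token pairs in a window.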
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.3.residual_group.blocks.0.norm1.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.3.residual_group.blocks.0.norm1.bias
 | -0.000 | -0.040 |  0.040 |  0.018 | (225, 6) || layers.3.residual_group.blocks.0.attn.relative_position_bias_table
 | 112.000 |  0.000 | 224.000 | 48.713 | (64, 64) || layers.3.residual_group.blocks.0.attn.relative_position_index
 | -0.000 | -0.040 |  0.040 |  0.018 | (180, 540) || layers.3.residual_group.blocks.0.attn.qkv.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (540,) || layers.3.residual_group.blocks.0.attn.qkv.bias
 |  0.000 | -0.040 |  0.040 |  0.018 | (180, 180) || layers.3.residual_group.blocks.0.attn.proj.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.3.residual_group.blocks.0.attn.proj.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.3.residual_group.blocks.0.norm2.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.3.residual_group.blocks.0.norm2.bias
 | -0.000 | -0.105 |  0.105 |  0.061 | (180, 360) || layers.3.residual_group.blocks.0.mlp.fc1.weight
 | -0.000 | -0.000 |  0.000 |  0.000 | (360,) || layers.3.residual_group.blocks.0.mlp.fc1.bias
 |  0.000 | -0.105 |  0.105 |  0.061 | (360, 180) || layers.3.residual_group.blocks.0.mlp.fc2.weight
 |  0.000 | -0.000 |  0.000 |  0.000 | (180,) || layers.3.residual_group.blocks.0.mlp.fc2.bias
 | -6.152 | -100.000 | -0.000 | 24.029 | (256, 64, 64) || layers.3.residual_group.blocks.1.attn_mask
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.3.residual_group.blocks.1.norm1.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.3.residual_group.blocks.1.norm1.bias
 | -0.001 | -0.039 |  0.040 |  0.018 | (225, 6) || layers.3.residual_group.blocks.1.attn.relative_position_bias_table
 | 112.000 |  0.000 | 224.000 | 48.713 | (64, 64) || layers.3.residual_group.blocks.1.attn.relative_position_index
 |  0.000 | -0.040 |  0.040 |  0.018 | (180, 540) || layers.3.residual_group.blocks.1.attn.qkv.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (540,) || layers.3.residual_group.blocks.1.attn.qkv.bias
 | -0.000 | -0.040 |  0.040 |  0.018 | (180, 180) || layers.3.residual_group.blocks.1.attn.proj.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.3.residual_group.blocks.1.attn.proj.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.3.residual_group.blocks.1.norm2.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.3.residual_group.blocks.1.norm2.bias
 | -0.000 | -0.105 |  0.105 |  0.061 | (180, 360) || layers.3.residual_group.blocks.1.mlp.fc1.weight
 |  0.000 | -0.000 |  0.000 |  0.000 | (360,) || layers.3.residual_group.blocks.1.mlp.fc1.bias
 |  0.001 | -0.105 |  0.105 |  0.061 | (360, 180) || layers.3.residual_group.blocks.1.mlp.fc2.weight
 |  0.000 | -0.000 |  0.000 |  0.000 | (180,) || layers.3.residual_group.blocks.1.mlp.fc2.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.3.residual_group.blocks.2.norm1.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.3.residual_group.blocks.2.norm1.bias
 | -0.001 | -0.040 |  0.040 |  0.017 | (225, 6) || layers.3.residual_group.blocks.2.attn.relative_position_bias_table
 | 112.000 |  0.000 | 224.000 | 48.713 | (64, 64) || layers.3.residual_group.blocks.2.attn.relative_position_index
 | -0.000 | -0.040 |  0.040 |  0.018 | (180, 540) || layers.3.residual_group.blocks.2.attn.qkv.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (540,) || layers.3.residual_group.blocks.2.attn.qkv.bias
 |  0.000 | -0.040 |  0.040 |  0.018 | (180, 180) || layers.3.residual_group.blocks.2.attn.proj.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.3.residual_group.blocks.2.attn.proj.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.3.residual_group.blocks.2.norm2.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.3.residual_group.blocks.2.norm2.bias
 |  0.000 | -0.105 |  0.105 |  0.061 | (180, 360) || layers.3.residual_group.blocks.2.mlp.fc1.weight
 |  0.000 | -0.000 |  0.000 |  0.000 | (360,) || layers.3.residual_group.blocks.2.mlp.fc1.bias
 |  0.000 | -0.105 |  0.105 |  0.061 | (360, 180) || layers.3.residual_group.blocks.2.mlp.fc2.weight
 |  0.000 | -0.000 |  0.000 |  0.000 | (180,) || layers.3.residual_group.blocks.2.mlp.fc2.bias
 | -6.152 | -100.000 | -0.000 | 24.029 | (256, 64, 64) || layers.3.residual_group.blocks.3.attn_mask
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.3.residual_group.blocks.3.norm1.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.3.residual_group.blocks.3.norm1.bias
 |  0.000 | -0.040 |  0.039 |  0.018 | (225, 6) || layers.3.residual_group.blocks.3.attn.relative_position_bias_table
 | 112.000 |  0.000 | 224.000 | 48.713 | (64, 64) || layers.3.residual_group.blocks.3.attn.relative_position_index
 |  0.000 | -0.040 |  0.040 |  0.018 | (180, 540) || layers.3.residual_group.blocks.3.attn.qkv.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (540,) || layers.3.residual_group.blocks.3.attn.qkv.bias
 |  0.000 | -0.040 |  0.040 |  0.018 | (180, 180) || layers.3.residual_group.blocks.3.attn.proj.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.3.residual_group.blocks.3.attn.proj.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.3.residual_group.blocks.3.norm2.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.3.residual_group.blocks.3.norm2.bias
 | -0.000 | -0.105 |  0.105 |  0.061 | (180, 360) || layers.3.residual_group.blocks.3.mlp.fc1.weight
 | -0.000 | -0.000 |  0.000 |  0.000 | (360,) || layers.3.residual_group.blocks.3.mlp.fc1.bias
 |  0.000 | -0.105 |  0.105 |  0.061 | (360, 180) || layers.3.residual_group.blocks.3.mlp.fc2.weight
 |  0.000 | -0.000 |  0.000 |  0.000 | (180,) || layers.3.residual_group.blocks.3.mlp.fc2.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.3.residual_group.blocks.4.norm1.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.3.residual_group.blocks.4.norm1.bias
 |  0.000 | -0.040 |  0.040 |  0.018 | (225, 6) || layers.3.residual_group.blocks.4.attn.relative_position_bias_table
 | 112.000 |  0.000 | 224.000 | 48.713 | (64, 64) || layers.3.residual_group.blocks.4.attn.relative_position_index
 |  0.000 | -0.040 |  0.040 |  0.018 | (180, 540) || layers.3.residual_group.blocks.4.attn.qkv.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (540,) || layers.3.residual_group.blocks.4.attn.qkv.bias
 | -0.000 | -0.040 |  0.040 |  0.018 | (180, 180) || layers.3.residual_group.blocks.4.attn.proj.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.3.residual_group.blocks.4.attn.proj.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.3.residual_group.blocks.4.norm2.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.3.residual_group.blocks.4.norm2.bias
 | -0.000 | -0.105 |  0.105 |  0.061 | (180, 360) || layers.3.residual_group.blocks.4.mlp.fc1.weight
 |  0.000 | -0.000 |  0.000 |  0.000 | (360,) || layers.3.residual_group.blocks.4.mlp.fc1.bias
 |  0.000 | -0.105 |  0.105 |  0.061 | (360, 180) || layers.3.residual_group.blocks.4.mlp.fc2.weight
 | -0.000 | -0.000 |  0.000 |  0.000 | (180,) || layers.3.residual_group.blocks.4.mlp.fc2.bias
 | -6.152 | -100.000 | -0.000 | 24.029 | (256, 64, 64) || layers.3.residual_group.blocks.5.attn_mask
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.3.residual_group.blocks.5.norm1.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.3.residual_group.blocks.5.norm1.bias
 |  0.000 | -0.040 |  0.040 |  0.018 | (225, 6) || layers.3.residual_group.blocks.5.attn.relative_position_bias_table
 | 112.000 |  0.000 | 224.000 | 48.713 | (64, 64) || layers.3.residual_group.blocks.5.attn.relative_position_index
 | -0.000 | -0.040 |  0.040 |  0.018 | (180, 540) || layers.3.residual_group.blocks.5.attn.qkv.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (540,) || layers.3.residual_group.blocks.5.attn.qkv.bias
 | -0.000 | -0.040 |  0.040 |  0.018 | (180, 180) || layers.3.residual_group.blocks.5.attn.proj.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.3.residual_group.blocks.5.attn.proj.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.3.residual_group.blocks.5.norm2.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.3.residual_group.blocks.5.norm2.bias
 |  0.000 | -0.105 |  0.105 |  0.061 | (180, 360) || layers.3.residual_group.blocks.5.mlp.fc1.weight
 | -0.000 | -0.000 |  0.000 |  0.000 | (360,) || layers.3.residual_group.blocks.5.mlp.fc1.bias
 | -0.000 | -0.105 |  0.105 |  0.061 | (360, 180) || layers.3.residual_group.blocks.5.mlp.fc2.weight
 | -0.000 | -0.000 |  0.000 |  0.000 | (180,) || layers.3.residual_group.blocks.5.mlp.fc2.bias
 | -0.000 | -0.166 |  0.169 |  0.035 | (180, 180, 3, 3) || layers.3.conv.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.3.conv.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.4.residual_group.blocks.0.norm1.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.4.residual_group.blocks.0.norm1.bias
 |  0.001 | -0.040 |  0.040 |  0.017 | (225, 6) || layers.4.residual_group.blocks.0.attn.relative_position_bias_table
 | 112.000 |  0.000 | 224.000 | 48.713 | (64, 64) || layers.4.residual_group.blocks.0.attn.relative_position_index
 | -0.000 | -0.040 |  0.040 |  0.018 | (180, 540) || layers.4.residual_group.blocks.0.attn.qkv.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (540,) || layers.4.residual_group.blocks.0.attn.qkv.bias
 | -0.000 | -0.040 |  0.040 |  0.018 | (180, 180) || layers.4.residual_group.blocks.0.attn.proj.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.4.residual_group.blocks.0.attn.proj.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.4.residual_group.blocks.0.norm2.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.4.residual_group.blocks.0.norm2.bias
 | -0.000 | -0.105 |  0.105 |  0.061 | (180, 360) || layers.4.residual_group.blocks.0.mlp.fc1.weight
 | -0.000 | -0.000 |  0.000 |  0.000 | (360,) || layers.4.residual_group.blocks.0.mlp.fc1.bias
 | -0.000 | -0.105 |  0.105 |  0.061 | (360, 180) || layers.4.residual_group.blocks.0.mlp.fc2.weight
 | -0.000 | -0.000 |  0.000 |  0.000 | (180,) || layers.4.residual_group.blocks.0.mlp.fc2.bias
 | -6.152 | -100.000 | -0.000 | 24.029 | (256, 64, 64) || layers.4.residual_group.blocks.1.attn_mask
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.4.residual_group.blocks.1.norm1.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.4.residual_group.blocks.1.norm1.bias
 | -0.001 | -0.040 |  0.040 |  0.018 | (225, 6) || layers.4.residual_group.blocks.1.attn.relative_position_bias_table
 | 112.000 |  0.000 | 224.000 | 48.713 | (64, 64) || layers.4.residual_group.blocks.1.attn.relative_position_index
 |  0.000 | -0.040 |  0.040 |  0.018 | (180, 540) || layers.4.residual_group.blocks.1.attn.qkv.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (540,) || layers.4.residual_group.blocks.1.attn.qkv.bias
 | -0.000 | -0.040 |  0.040 |  0.018 | (180, 180) || layers.4.residual_group.blocks.1.attn.proj.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.4.residual_group.blocks.1.attn.proj.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.4.residual_group.blocks.1.norm2.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.4.residual_group.blocks.1.norm2.bias
 |  0.000 | -0.105 |  0.105 |  0.061 | (180, 360) || layers.4.residual_group.blocks.1.mlp.fc1.weight
 | -0.000 | -0.000 |  0.000 |  0.000 | (360,) || layers.4.residual_group.blocks.1.mlp.fc1.bias
 |  0.000 | -0.105 |  0.105 |  0.061 | (360, 180) || layers.4.residual_group.blocks.1.mlp.fc2.weight
 | -0.000 | -0.000 |  0.000 |  0.000 | (180,) || layers.4.residual_group.blocks.1.mlp.fc2.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.4.residual_group.blocks.2.norm1.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.4.residual_group.blocks.2.norm1.bias
 | -0.000 | -0.040 |  0.039 |  0.018 | (225, 6) || layers.4.residual_group.blocks.2.attn.relative_position_bias_table
 | 112.000 |  0.000 | 224.000 | 48.713 | (64, 64) || layers.4.residual_group.blocks.2.attn.relative_position_index
 |  0.000 | -0.040 |  0.040 |  0.018 | (180, 540) || layers.4.residual_group.blocks.2.attn.qkv.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (540,) || layers.4.residual_group.blocks.2.attn.qkv.bias
 | -0.000 | -0.040 |  0.040 |  0.018 | (180, 180) || layers.4.residual_group.blocks.2.attn.proj.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.4.residual_group.blocks.2.attn.proj.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.4.residual_group.blocks.2.norm2.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.4.residual_group.blocks.2.norm2.bias
 |  0.000 | -0.105 |  0.105 |  0.061 | (180, 360) || layers.4.residual_group.blocks.2.mlp.fc1.weight
 | -0.000 | -0.000 |  0.000 |  0.000 | (360,) || layers.4.residual_group.blocks.2.mlp.fc1.bias
 | -0.000 | -0.105 |  0.105 |  0.061 | (360, 180) || layers.4.residual_group.blocks.2.mlp.fc2.weight
 |  0.000 | -0.000 |  0.000 |  0.000 | (180,) || layers.4.residual_group.blocks.2.mlp.fc2.bias
 | -6.152 | -100.000 | -0.000 | 24.029 | (256, 64, 64) || layers.4.residual_group.blocks.3.attn_mask
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.4.residual_group.blocks.3.norm1.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.4.residual_group.blocks.3.norm1.bias
 | -0.001 | -0.040 |  0.040 |  0.018 | (225, 6) || layers.4.residual_group.blocks.3.attn.relative_position_bias_table
 | 112.000 |  0.000 | 224.000 | 48.713 | (64, 64) || layers.4.residual_group.blocks.3.attn.relative_position_index
 | -0.000 | -0.040 |  0.040 |  0.018 | (180, 540) || layers.4.residual_group.blocks.3.attn.qkv.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (540,) || layers.4.residual_group.blocks.3.attn.qkv.bias
 |  0.000 | -0.040 |  0.040 |  0.018 | (180, 180) || layers.4.residual_group.blocks.3.attn.proj.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.4.residual_group.blocks.3.attn.proj.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.4.residual_group.blocks.3.norm2.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.4.residual_group.blocks.3.norm2.bias
 | -0.000 | -0.105 |  0.105 |  0.061 | (180, 360) || layers.4.residual_group.blocks.3.mlp.fc1.weight
 | -0.000 | -0.000 |  0.000 |  0.000 | (360,) || layers.4.residual_group.blocks.3.mlp.fc1.bias
 | -0.000 | -0.105 |  0.105 |  0.061 | (360, 180) || layers.4.residual_group.blocks.3.mlp.fc2.weight
 | -0.000 | -0.000 |  0.000 |  0.000 | (180,) || layers.4.residual_group.blocks.3.mlp.fc2.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.4.residual_group.blocks.4.norm1.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.4.residual_group.blocks.4.norm1.bias
 |  0.000 | -0.040 |  0.040 |  0.018 | (225, 6) || layers.4.residual_group.blocks.4.attn.relative_position_bias_table
 | 112.000 |  0.000 | 224.000 | 48.713 | (64, 64) || layers.4.residual_group.blocks.4.attn.relative_position_index
 | -0.000 | -0.040 |  0.040 |  0.018 | (180, 540) || layers.4.residual_group.blocks.4.attn.qkv.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (540,) || layers.4.residual_group.blocks.4.attn.qkv.bias
 |  0.000 | -0.040 |  0.040 |  0.018 | (180, 180) || layers.4.residual_group.blocks.4.attn.proj.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.4.residual_group.blocks.4.attn.proj.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.4.residual_group.blocks.4.norm2.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.4.residual_group.blocks.4.norm2.bias
 |  0.000 | -0.105 |  0.105 |  0.061 | (180, 360) || layers.4.residual_group.blocks.4.mlp.fc1.weight
 |  0.000 | -0.000 |  0.000 |  0.000 | (360,) || layers.4.residual_group.blocks.4.mlp.fc1.bias
 | -0.000 | -0.105 |  0.105 |  0.061 | (360, 180) || layers.4.residual_group.blocks.4.mlp.fc2.weight
 |  0.000 | -0.000 |  0.000 |  0.000 | (180,) || layers.4.residual_group.blocks.4.mlp.fc2.bias
 | -6.152 | -100.000 | -0.000 | 24.029 | (256, 64, 64) || layers.4.residual_group.blocks.5.attn_mask
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.4.residual_group.blocks.5.norm1.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.4.residual_group.blocks.5.norm1.bias
 |  0.000 | -0.040 |  0.039 |  0.017 | (225, 6) || layers.4.residual_group.blocks.5.attn.relative_position_bias_table
 | 112.000 |  0.000 | 224.000 | 48.713 | (64, 64) || layers.4.residual_group.blocks.5.attn.relative_position_index
 | -0.000 | -0.040 |  0.040 |  0.018 | (180, 540) || layers.4.residual_group.blocks.5.attn.qkv.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (540,) || layers.4.residual_group.blocks.5.attn.qkv.bias
 | -0.000 | -0.040 |  0.040 |  0.018 | (180, 180) || layers.4.residual_group.blocks.5.attn.proj.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.4.residual_group.blocks.5.attn.proj.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.4.residual_group.blocks.5.norm2.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.4.residual_group.blocks.5.norm2.bias
 |  0.000 | -0.105 |  0.105 |  0.061 | (180, 360) || layers.4.residual_group.blocks.5.mlp.fc1.weight
 |  0.000 | -0.000 |  0.000 |  0.000 | (360,) || layers.4.residual_group.blocks.5.mlp.fc1.bias
 | -0.000 | -0.105 |  0.105 |  0.061 | (360, 180) || layers.4.residual_group.blocks.5.mlp.fc2.weight
 | -0.000 | -0.000 |  0.000 |  0.000 | (180,) || layers.4.residual_group.blocks.5.mlp.fc2.bias
 | -0.000 | -0.148 |  0.169 |  0.035 | (180, 180, 3, 3) || layers.4.conv.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.4.conv.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.5.residual_group.blocks.0.norm1.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.5.residual_group.blocks.0.norm1.bias
 |  0.001 | -0.040 |  0.040 |  0.018 | (225, 6) || layers.5.residual_group.blocks.0.attn.relative_position_bias_table
 | 112.000 |  0.000 | 224.000 | 48.713 | (64, 64) || layers.5.residual_group.blocks.0.attn.relative_position_index
 | -0.000 | -0.040 |  0.040 |  0.018 | (180, 540) || layers.5.residual_group.blocks.0.attn.qkv.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (540,) || layers.5.residual_group.blocks.0.attn.qkv.bias
 |  0.000 | -0.040 |  0.040 |  0.018 | (180, 180) || layers.5.residual_group.blocks.0.attn.proj.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.5.residual_group.blocks.0.attn.proj.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.5.residual_group.blocks.0.norm2.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.5.residual_group.blocks.0.norm2.bias
 |  0.000 | -0.105 |  0.105 |  0.061 | (180, 360) || layers.5.residual_group.blocks.0.mlp.fc1.weight
 |  0.000 | -0.000 |  0.000 |  0.000 | (360,) || layers.5.residual_group.blocks.0.mlp.fc1.bias
 |  0.000 | -0.105 |  0.105 |  0.061 | (360, 180) || layers.5.residual_group.blocks.0.mlp.fc2.weight
 | -0.000 | -0.000 |  0.000 |  0.000 | (180,) || layers.5.residual_group.blocks.0.mlp.fc2.bias
 | -6.152 | -100.000 | -0.000 | 24.029 | (256, 64, 64) || layers.5.residual_group.blocks.1.attn_mask
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.5.residual_group.blocks.1.norm1.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.5.residual_group.blocks.1.norm1.bias
 |  0.001 | -0.039 |  0.040 |  0.018 | (225, 6) || layers.5.residual_group.blocks.1.attn.relative_position_bias_table
 | 112.000 |  0.000 | 224.000 | 48.713 | (64, 64) || layers.5.residual_group.blocks.1.attn.relative_position_index
 | -0.000 | -0.040 |  0.040 |  0.018 | (180, 540) || layers.5.residual_group.blocks.1.attn.qkv.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (540,) || layers.5.residual_group.blocks.1.attn.qkv.bias
 |  0.000 | -0.040 |  0.040 |  0.018 | (180, 180) || layers.5.residual_group.blocks.1.attn.proj.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.5.residual_group.blocks.1.attn.proj.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.5.residual_group.blocks.1.norm2.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.5.residual_group.blocks.1.norm2.bias
 | -0.000 | -0.105 |  0.105 |  0.061 | (180, 360) || layers.5.residual_group.blocks.1.mlp.fc1.weight
 | -0.000 | -0.000 |  0.000 |  0.000 | (360,) || layers.5.residual_group.blocks.1.mlp.fc1.bias
 |  0.000 | -0.105 |  0.105 |  0.061 | (360, 180) || layers.5.residual_group.blocks.1.mlp.fc2.weight
 |  0.000 | -0.000 |  0.000 |  0.000 | (180,) || layers.5.residual_group.blocks.1.mlp.fc2.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.5.residual_group.blocks.2.norm1.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.5.residual_group.blocks.2.norm1.bias
 |  0.000 | -0.039 |  0.040 |  0.018 | (225, 6) || layers.5.residual_group.blocks.2.attn.relative_position_bias_table
 | 112.000 |  0.000 | 224.000 | 48.713 | (64, 64) || layers.5.residual_group.blocks.2.attn.relative_position_index
 | -0.000 | -0.040 |  0.040 |  0.018 | (180, 540) || layers.5.residual_group.blocks.2.attn.qkv.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (540,) || layers.5.residual_group.blocks.2.attn.qkv.bias
 |  0.000 | -0.040 |  0.040 |  0.018 | (180, 180) || layers.5.residual_group.blocks.2.attn.proj.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.5.residual_group.blocks.2.attn.proj.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.5.residual_group.blocks.2.norm2.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.5.residual_group.blocks.2.norm2.bias
 |  0.000 | -0.105 |  0.105 |  0.061 | (180, 360) || layers.5.residual_group.blocks.2.mlp.fc1.weight
 | -0.000 | -0.000 |  0.000 |  0.000 | (360,) || layers.5.residual_group.blocks.2.mlp.fc1.bias
 |  0.000 | -0.105 |  0.105 |  0.061 | (360, 180) || layers.5.residual_group.blocks.2.mlp.fc2.weight
 |  0.000 | -0.000 |  0.000 |  0.000 | (180,) || layers.5.residual_group.blocks.2.mlp.fc2.bias
 | -6.152 | -100.000 | -0.000 | 24.029 | (256, 64, 64) || layers.5.residual_group.blocks.3.attn_mask
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.5.residual_group.blocks.3.norm1.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.5.residual_group.blocks.3.norm1.bias
 | -0.000 | -0.040 |  0.040 |  0.018 | (225, 6) || layers.5.residual_group.blocks.3.attn.relative_position_bias_table
 | 112.000 |  0.000 | 224.000 | 48.713 | (64, 64) || layers.5.residual_group.blocks.3.attn.relative_position_index
 | -0.000 | -0.040 |  0.040 |  0.018 | (180, 540) || layers.5.residual_group.blocks.3.attn.qkv.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (540,) || layers.5.residual_group.blocks.3.attn.qkv.bias
 | -0.000 | -0.040 |  0.040 |  0.018 | (180, 180) || layers.5.residual_group.blocks.3.attn.proj.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.5.residual_group.blocks.3.attn.proj.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.5.residual_group.blocks.3.norm2.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.5.residual_group.blocks.3.norm2.bias
 |  0.000 | -0.105 |  0.105 |  0.061 | (180, 360) || layers.5.residual_group.blocks.3.mlp.fc1.weight
 |  0.000 | -0.000 |  0.000 |  0.000 | (360,) || layers.5.residual_group.blocks.3.mlp.fc1.bias
 |  0.000 | -0.105 |  0.105 |  0.061 | (360, 180) || layers.5.residual_group.blocks.3.mlp.fc2.weight
 |  0.000 | -0.000 |  0.000 |  0.000 | (180,) || layers.5.residual_group.blocks.3.mlp.fc2.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.5.residual_group.blocks.4.norm1.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.5.residual_group.blocks.4.norm1.bias
 |  0.000 | -0.040 |  0.040 |  0.017 | (225, 6) || layers.5.residual_group.blocks.4.attn.relative_position_bias_table
 | 112.000 |  0.000 | 224.000 | 48.713 | (64, 64) || layers.5.residual_group.blocks.4.attn.relative_position_index
 | -0.000 | -0.040 |  0.040 |  0.018 | (180, 540) || layers.5.residual_group.blocks.4.attn.qkv.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (540,) || layers.5.residual_group.blocks.4.attn.qkv.bias
 | -0.000 | -0.040 |  0.040 |  0.018 | (180, 180) || layers.5.residual_group.blocks.4.attn.proj.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.5.residual_group.blocks.4.attn.proj.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.5.residual_group.blocks.4.norm2.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.5.residual_group.blocks.4.norm2.bias
 |  0.000 | -0.105 |  0.105 |  0.061 | (180, 360) || layers.5.residual_group.blocks.4.mlp.fc1.weight
 |  0.000 | -0.000 |  0.000 |  0.000 | (360,) || layers.5.residual_group.blocks.4.mlp.fc1.bias
 | -0.000 | -0.105 |  0.105 |  0.061 | (360, 180) || layers.5.residual_group.blocks.4.mlp.fc2.weight
 |  0.000 | -0.000 |  0.000 |  0.000 | (180,) || layers.5.residual_group.blocks.4.mlp.fc2.bias
 | -6.152 | -100.000 | -0.000 | 24.029 | (256, 64, 64) || layers.5.residual_group.blocks.5.attn_mask
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.5.residual_group.blocks.5.norm1.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.5.residual_group.blocks.5.norm1.bias
 |  0.000 | -0.040 |  0.040 |  0.018 | (225, 6) || layers.5.residual_group.blocks.5.attn.relative_position_bias_table
 | 112.000 |  0.000 | 224.000 | 48.713 | (64, 64) || layers.5.residual_group.blocks.5.attn.relative_position_index
 | -0.000 | -0.040 |  0.040 |  0.018 | (180, 540) || layers.5.residual_group.blocks.5.attn.qkv.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (540,) || layers.5.residual_group.blocks.5.attn.qkv.bias
 | -0.000 | -0.040 |  0.040 |  0.018 | (180, 180) || layers.5.residual_group.blocks.5.attn.proj.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.5.residual_group.blocks.5.attn.proj.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || layers.5.residual_group.blocks.5.norm2.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.5.residual_group.blocks.5.norm2.bias
 |  0.000 | -0.105 |  0.105 |  0.061 | (180, 360) || layers.5.residual_group.blocks.5.mlp.fc1.weight
 |  0.000 | -0.000 |  0.000 |  0.000 | (360,) || layers.5.residual_group.blocks.5.mlp.fc1.bias
 |  0.000 | -0.105 |  0.105 |  0.061 | (360, 180) || layers.5.residual_group.blocks.5.mlp.fc2.weight
 |  0.000 | -0.000 |  0.000 |  0.000 | (180,) || layers.5.residual_group.blocks.5.mlp.fc2.bias
 | -0.000 | -0.152 |  0.160 |  0.035 | (180, 180, 3, 3) || layers.5.conv.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || layers.5.conv.bias
 |  1.000 |  1.000 |  1.000 |  0.000 | (180,) || norm.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || norm.bias
 | -0.000 | -0.166 |  0.159 |  0.035 | (180, 180, 3, 3) || conv_after_body.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (180,) || conv_after_body.bias
 |  0.000 | -0.120 |  0.133 |  0.035 | (3, 180, 3, 3) || conv_last.weight
 |  0.000 |  0.000 |  0.000 |  0.000 | (3,) || conv_last.bias
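Most rows in the parameter dump above are trainable weights. Two kinds of entries are fixed buffers whose statistics can be derived by hand: `relative_position_index`, and `attn_mask`. The mask appears only on the odd-numbered blocks (1, 3, 5), because only those blocks use shifted windows. The NumPy sketch below recomputes both buffers. It assumes a window size of 8, a shift of 4 (half the window, the usual Swin convention) and a 128x128 training patch. These values are inferred from the shapes in the dump (256 windows of 64 tokens, and a 225-row bias table, which equals (2*8-1)^2 rows) and from the `s128w8` tag in the official checkpoint name. They were not read from this project's config. The function names are illustrative, not the project's own.

```python
import numpy as np


def relative_position_index(ws):
    """Index into the (2*ws-1)**2-row bias table for every token pair in a ws x ws window."""
    coords = np.stack(np.meshgrid(np.arange(ws), np.arange(ws), indexing="ij")).reshape(2, -1)
    rel = coords[:, :, None] - coords[:, None, :]  # (2, N, N) offsets in [-(ws-1), ws-1]
    rel = rel + (ws - 1)                            # shift offsets to [0, 2*ws-2]
    return rel[0] * (2 * ws - 1) + rel[1]           # (N, N) flat table index


def shifted_window_mask(h, w, ws, shift):
    """Attention mask for shifted windows: 0 within a region, -100 across regions."""
    img_mask = np.zeros((h, w), dtype=np.int64)
    regions = (slice(0, -ws), slice(-ws, -shift), slice(-shift, None))
    label = 0
    for hs in regions:
        for wsl in regions:
            img_mask[hs, wsl] = label
            label += 1
    # Partition the label map into non-overlapping ws x ws windows -> (num_windows, ws*ws)
    windows = (img_mask.reshape(h // ws, ws, w // ws, ws)
               .transpose(0, 2, 1, 3)
               .reshape(-1, ws * ws))
    diff = windows[:, None, :] - windows[:, :, None]  # (num_windows, N, N)
    return np.where(diff != 0, -100.0, 0.0)


if __name__ == "__main__":
    idx = relative_position_index(8)
    mask = shifted_window_mask(128, 128, 8, 4)
    print(idx.shape, idx.mean(), idx.min(), idx.max(), round(float(idx.std()), 3))
    print(mask.shape, round(float(mask.mean()), 3), mask.min(), mask.max(), round(float(mask.std()), 3))
```

Under these assumptions the statistics line up with the dump. The index has mean 112 and range 0 to 224, with a population std of about 48.713. The mask has mean about -6.152, range -100 to 0, and std about 24.029.

Why the mask mean is -6.152: of the 256 windows, 30 straddle one shifted boundary and have half of their token pairs masked. One corner window straddles both boundaries and has three quarters of its pairs masked. The masked fraction is therefore (30*0.5 + 0.75)/256, about 0.0615.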
# Single machine, four GPUs
# !cd work && python -m paddle.distributed.launch main_train_psnr.py --opt options/train_swinir_multi_card_32.json

During training, model parameters are saved to the work/denoising/swinir_denoising_color_15/models/ folder.
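To resume training or to evaluate the most recent weights, you need the highest-iteration file in that folder. The helper below is a sketch of how to find it. It assumes KAIR-style file names of the form `<iteration>_G.pdparams` (for example `426000_G.pdparams`). That naming pattern is an assumption, so check it against what actually appears in the models/ folder. `latest_checkpoint` is a hypothetical helper, not part of this project.

```python
import os
import re


def latest_checkpoint(filenames, suffix="_G.pdparams"):
    """Return (iteration, filename) for the highest-iteration '<iter><suffix>' file, or None.

    The '<iter>_G.pdparams' naming is an assumption; pass a different suffix if needed.
    """
    pattern = re.compile(r"^(\d+)" + re.escape(suffix) + r"$")
    best = None
    for name in filenames:
        match = pattern.match(name)
        if match:
            iteration = int(match.group(1))
            if best is None or iteration > best[0]:
                best = (iteration, name)
    return best


if __name__ == "__main__":
    model_dir = "work/denoising/swinir_denoising_color_15/models"
    if os.path.isdir(model_dir):
        found = latest_checkpoint(os.listdir(model_dir))
        print(found if found else "no checkpoint found")
```

The helper takes a plain list of names rather than a directory, so it can be tested without touching the file system.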

The training log is saved to work/denoising/swinir_denoising_color_15/models/train.log.

The training log from my own single-machine, four-GPU run is work/train.log.

5.2 Model Evaluation and Prediction

Test on the CBSD68 test set with noise of level 15 added. The results are saved in the work/results/swinir_color_dn_noise15/ folder.
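For reference, the noisy input is presumably built as in the upstream SwinIR test script. That is, the clean image plus zero-mean Gaussian noise whose standard deviation is the noise level divided by 255, for images scaled to [0, 1]. Quality is then reported as PSNR on RGB and on the Y channel of YCbCr. The NumPy sketch below illustrates those three pieces. It uses NumPy's `default_rng`, so its noise will not match the exact noise realization of `main_test_swinir.py`. The BT.601 Y coefficients are the commonly used ones and are assumed rather than copied from this project's utils. All function names are illustrative.

```python
import numpy as np


def add_gaussian_noise(img, noise_level=15, seed=0):
    """Add zero-mean Gaussian noise; img is float in [0, 1], noise_level on the 0-255 scale."""
    rng = np.random.default_rng(seed)
    return img + rng.normal(0.0, noise_level / 255.0, img.shape)


def rgb_to_y(img):
    """ITU-R BT.601 luma on the 0-255 scale for an RGB float image in [0, 1]."""
    return 16.0 + img @ np.array([65.481, 128.553, 24.966])


def psnr(a, b, max_val=1.0):
    """Peak signal-to-noise ratio in dB; returns inf for identical inputs."""
    mse = np.mean((np.asarray(a, dtype=np.float64) - np.asarray(b, dtype=np.float64)) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(max_val ** 2 / mse))


if __name__ == "__main__":
    clean = np.full((64, 64, 3), 0.5)
    noisy = add_gaussian_noise(clean, noise_level=15)
    # Noise alone gives roughly 20*log10(255/15), i.e. about 24.6 dB on RGB
    print("RGB PSNR:", round(psnr(clean, noisy), 2))
    print("Y PSNR:", round(psnr(rgb_to_y(clean), rgb_to_y(noisy), max_val=255.0), 2))
```

As a baseline, an un-denoised noisy input at level 15 scores only about 24.6 dB. This is why an average of 34.32 dB after denoising is a large gain.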

!cd work && python main_test_swinir.py --task color_dn --noise 15 --model_path pretrained_models/SwinIR_paddle.pdparams --folder_gt testsets/CBSD68/
loading model from pretrained_models/SwinIR_paddle.pdparams
W1016 16:39:45.440834 13414 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 11.2
W1016 16:39:45.444900 13414 gpu_resources.cc:91] device: 0, cuDNN Version: 8.2.
Testing 0 101085               - PSNR: 31.32 dB; SSIM: 0.9118; PSNR_Y: 32.88 dB; SSIM_Y: 0.9210; PSNR_B: 0.00 dB.
Testing 1 101087               - PSNR: 35.39 dB; SSIM: 0.9517; PSNR_Y: 37.14 dB; SSIM_Y: 0.9603; PSNR_B: 0.00 dB.
Testing 2 102061               - PSNR: 35.21 dB; SSIM: 0.9348; PSNR_Y: 36.97 dB; SSIM_Y: 0.9477; PSNR_B: 0.00 dB.
Testing 3 103070               - PSNR: 35.86 dB; SSIM: 0.9448; PSNR_Y: 37.79 dB; SSIM_Y: 0.9562; PSNR_B: 0.00 dB.
Testing 4 105025               - PSNR: 33.11 dB; SSIM: 0.9426; PSNR_Y: 34.87 dB; SSIM_Y: 0.9509; PSNR_B: 0.00 dB.
Testing 5 106024               - PSNR: 37.57 dB; SSIM: 0.9560; PSNR_Y: 39.35 dB; SSIM_Y: 0.9652; PSNR_B: 0.00 dB.
Testing 6 108005               - PSNR: 33.86 dB; SSIM: 0.9337; PSNR_Y: 35.73 dB; SSIM_Y: 0.9470; PSNR_B: 0.00 dB.
Testing 7 108070               - PSNR: 31.96 dB; SSIM: 0.9222; PSNR_Y: 33.71 dB; SSIM_Y: 0.9353; PSNR_B: 0.00 dB.
Testing 8 108082               - PSNR: 34.37 dB; SSIM: 0.9420; PSNR_Y: 36.06 dB; SSIM_Y: 0.9520; PSNR_B: 0.00 dB.
Testing 9 109053               - PSNR: 34.67 dB; SSIM: 0.9309; PSNR_Y: 36.52 dB; SSIM_Y: 0.9428; PSNR_B: 0.00 dB.
Testing 10 119082               - PSNR: 34.75 dB; SSIM: 0.9570; PSNR_Y: 36.56 dB; SSIM_Y: 0.9655; PSNR_B: 0.00 dB.
Testing 11 12084                - PSNR: 33.08 dB; SSIM: 0.9168; PSNR_Y: 35.40 dB; SSIM_Y: 0.9340; PSNR_B: 0.00 dB.
Testing 12 123074               - PSNR: 34.70 dB; SSIM: 0.9363; PSNR_Y: 36.45 dB; SSIM_Y: 0.9465; PSNR_B: 0.00 dB.
Testing 13 126007               - PSNR: 35.77 dB; SSIM: 0.9334; PSNR_Y: 37.60 dB; SSIM_Y: 0.9484; PSNR_B: 0.00 dB.
Testing 14 130026               - PSNR: 32.89 dB; SSIM: 0.9189; PSNR_Y: 34.51 dB; SSIM_Y: 0.9319; PSNR_B: 0.00 dB.
Testing 15 134035               - PSNR: 33.90 dB; SSIM: 0.9480; PSNR_Y: 35.66 dB; SSIM_Y: 0.9559; PSNR_B: 0.00 dB.
Testing 16 14037                - PSNR: 37.21 dB; SSIM: 0.9387; PSNR_Y: 38.95 dB; SSIM_Y: 0.9533; PSNR_B: 0.00 dB.
Testing 17 143090               - PSNR: 37.57 dB; SSIM: 0.9513; PSNR_Y: 39.62 dB; SSIM_Y: 0.9639; PSNR_B: 0.00 dB.
Testing 18 145086               - PSNR: 33.79 dB; SSIM: 0.9310; PSNR_Y: 35.46 dB; SSIM_Y: 0.9461; PSNR_B: 0.00 dB.
Testing 19 147091               - PSNR: 34.33 dB; SSIM: 0.9289; PSNR_Y: 36.20 dB; SSIM_Y: 0.9468; PSNR_B: 0.00 dB.
Testing 20 148026               - PSNR: 32.43 dB; SSIM: 0.9511; PSNR_Y: 34.22 dB; SSIM_Y: 0.9619; PSNR_B: 0.00 dB.
Testing 21 148089               - PSNR: 32.71 dB; SSIM: 0.9363; PSNR_Y: 34.36 dB; SSIM_Y: 0.9462; PSNR_B: 0.00 dB.
Testing 22 157055               - PSNR: 34.56 dB; SSIM: 0.9495; PSNR_Y: 36.59 dB; SSIM_Y: 0.9616; PSNR_B: 0.00 dB.
Testing 23 159008               - PSNR: 34.26 dB; SSIM: 0.9460; PSNR_Y: 35.95 dB; SSIM_Y: 0.9578; PSNR_B: 0.00 dB.
Testing 24 160068               - PSNR: 35.17 dB; SSIM: 0.9682; PSNR_Y: 36.88 dB; SSIM_Y: 0.9758; PSNR_B: 0.00 dB.
Testing 25 16077                - PSNR: 33.49 dB; SSIM: 0.9201; PSNR_Y: 35.28 dB; SSIM_Y: 0.9347; PSNR_B: 0.00 dB.
Testing 26 163085               - PSNR: 34.90 dB; SSIM: 0.9267; PSNR_Y: 36.75 dB; SSIM_Y: 0.9421; PSNR_B: 0.00 dB.
Testing 27 167062               - PSNR: 37.95 dB; SSIM: 0.9449; PSNR_Y: 39.81 dB; SSIM_Y: 0.9601; PSNR_B: 0.00 dB.
Testing 28 167083               - PSNR: 31.24 dB; SSIM: 0.9596; PSNR_Y: 32.78 dB; SSIM_Y: 0.9645; PSNR_B: 0.00 dB.
Testing 29 170057               - PSNR: 34.72 dB; SSIM: 0.9203; PSNR_Y: 36.58 dB; SSIM_Y: 0.9371; PSNR_B: 0.00 dB.
Testing 30 175032               - PSNR: 30.79 dB; SSIM: 0.9515; PSNR_Y: 32.49 dB; SSIM_Y: 0.9566; PSNR_B: 0.00 dB.
Testing 31 175043               - PSNR: 32.28 dB; SSIM: 0.9442; PSNR_Y: 34.06 dB; SSIM_Y: 0.9519; PSNR_B: 0.00 dB.
Testing 32 182053               - PSNR: 33.90 dB; SSIM: 0.9614; PSNR_Y: 35.50 dB; SSIM_Y: 0.9692; PSNR_B: 0.00 dB.
Testing 33 189080               - PSNR: 37.19 dB; SSIM: 0.9205; PSNR_Y: 38.89 dB; SSIM_Y: 0.9385; PSNR_B: 0.00 dB.
Testing 34 19021                - PSNR: 33.28 dB; SSIM: 0.9276; PSNR_Y: 34.99 dB; SSIM_Y: 0.9394; PSNR_B: 0.00 dB.
Testing 35 196073               - PSNR: 31.92 dB; SSIM: 0.8469; PSNR_Y: 33.31 dB; SSIM_Y: 0.8625; PSNR_B: 0.00 dB.
Testing 36 197017               - PSNR: 33.46 dB; SSIM: 0.9228; PSNR_Y: 35.10 dB; SSIM_Y: 0.9352; PSNR_B: 0.00 dB.
Testing 37 208001               - PSNR: 34.05 dB; SSIM: 0.9204; PSNR_Y: 36.01 dB; SSIM_Y: 0.9396; PSNR_B: 0.00 dB.
Testing 38 210088               - PSNR: 38.00 dB; SSIM: 0.9620; PSNR_Y: 40.78 dB; SSIM_Y: 0.9756; PSNR_B: 0.00 dB.
Testing 39 21077                - PSNR: 34.18 dB; SSIM: 0.9032; PSNR_Y: 35.83 dB; SSIM_Y: 0.9194; PSNR_B: 0.00 dB.
Testing 40 216081               - PSNR: 33.35 dB; SSIM: 0.9367; PSNR_Y: 35.04 dB; SSIM_Y: 0.9494; PSNR_B: 0.00 dB.
Testing 41 219090               - PSNR: 34.88 dB; SSIM: 0.9243; PSNR_Y: 36.50 dB; SSIM_Y: 0.9377; PSNR_B: 0.00 dB.
Testing 42 220075               - PSNR: 35.54 dB; SSIM: 0.9521; PSNR_Y: 37.61 dB; SSIM_Y: 0.9650; PSNR_B: 0.00 dB.
Testing 43 223061               - PSNR: 33.88 dB; SSIM: 0.9557; PSNR_Y: 35.49 dB; SSIM_Y: 0.9627; PSNR_B: 0.00 dB.
Testing 44 227092               - PSNR: 37.97 dB; SSIM: 0.9423; PSNR_Y: 39.80 dB; SSIM_Y: 0.9547; PSNR_B: 0.00 dB.
Testing 45 229036               - PSNR: 32.15 dB; SSIM: 0.9262; PSNR_Y: 33.76 dB; SSIM_Y: 0.9353; PSNR_B: 0.00 dB.
Testing 46 236037               - PSNR: 32.16 dB; SSIM: 0.9350; PSNR_Y: 33.94 dB; SSIM_Y: 0.9431; PSNR_B: 0.00 dB.
Testing 47 24077                - PSNR: 35.38 dB; SSIM: 0.9678; PSNR_Y: 37.68 dB; SSIM_Y: 0.9761; PSNR_B: 0.00 dB.
Testing 48 241004               - PSNR: 35.61 dB; SSIM: 0.9070; PSNR_Y: 37.23 dB; SSIM_Y: 0.9253; PSNR_B: 0.00 dB.
Testing 49 241048               - PSNR: 32.33 dB; SSIM: 0.9273; PSNR_Y: 33.99 dB; SSIM_Y: 0.9364; PSNR_B: 0.00 dB.
Testing 50 253027               - PSNR: 33.92 dB; SSIM: 0.9310; PSNR_Y: 35.58 dB; SSIM_Y: 0.9421; PSNR_B: 0.00 dB.
Testing 51 253055               - PSNR: 36.10 dB; SSIM: 0.9168; PSNR_Y: 37.78 dB; SSIM_Y: 0.9341; PSNR_B: 0.00 dB.
Testing 52 260058               - PSNR: 35.37 dB; SSIM: 0.8999; PSNR_Y: 36.95 dB; SSIM_Y: 0.9184; PSNR_B: 0.00 dB.
Testing 53 271035               - PSNR: 34.40 dB; SSIM: 0.9299; PSNR_Y: 36.10 dB; SSIM_Y: 0.9428; PSNR_B: 0.00 dB.
Testing 54 285079               - PSNR: 33.00 dB; SSIM: 0.9271; PSNR_Y: 34.77 dB; SSIM_Y: 0.9383; PSNR_B: 0.00 dB.
Testing 55 291000               - PSNR: 30.12 dB; SSIM: 0.9457; PSNR_Y: 31.99 dB; SSIM_Y: 0.9523; PSNR_B: 0.00 dB.
Testing 56 295087               - PSNR: 33.49 dB; SSIM: 0.9405; PSNR_Y: 35.32 dB; SSIM_Y: 0.9515; PSNR_B: 0.00 dB.
Testing 57 296007               - PSNR: 34.75 dB; SSIM: 0.8944; PSNR_Y: 36.36 dB; SSIM_Y: 0.9144; PSNR_B: 0.00 dB.
Testing 58 296059               - PSNR: 34.66 dB; SSIM: 0.9062; PSNR_Y: 36.29 dB; SSIM_Y: 0.9223; PSNR_B: 0.00 dB.
Testing 59 299086               - PSNR: 35.32 dB; SSIM: 0.9082; PSNR_Y: 37.03 dB; SSIM_Y: 0.9262; PSNR_B: 0.00 dB.
Testing 60 300091               - PSNR: 35.18 dB; SSIM: 0.9137; PSNR_Y: 36.80 dB; SSIM_Y: 0.9295; PSNR_B: 0.00 dB.
Testing 61 302008               - PSNR: 38.71 dB; SSIM: 0.9649; PSNR_Y: 40.80 dB; SSIM_Y: 0.9749; PSNR_B: 0.00 dB.
Testing 62 304034               - PSNR: 32.27 dB; SSIM: 0.9455; PSNR_Y: 34.11 dB; SSIM_Y: 0.9549; PSNR_B: 0.00 dB.
Testing 63 304074               - PSNR: 31.85 dB; SSIM: 0.9043; PSNR_Y: 33.38 dB; SSIM_Y: 0.9154; PSNR_B: 0.00 dB.
Testing 64 306005               - PSNR: 33.75 dB; SSIM: 0.9265; PSNR_Y: 35.72 dB; SSIM_Y: 0.9444; PSNR_B: 0.00 dB.
Testing 65 3096                 - PSNR: 42.69 dB; SSIM: 0.9768; PSNR_Y: 44.83 dB; SSIM_Y: 0.9854; PSNR_B: 0.00 dB.
Testing 66 33039                - PSNR: 30.98 dB; SSIM: 0.9585; PSNR_Y: 32.50 dB; SSIM_Y: 0.9619; PSNR_B: 0.00 dB.
Testing 67 351093               - PSNR: 32.45 dB; SSIM: 0.9607; PSNR_Y: 34.08 dB; SSIM_Y: 0.9672; PSNR_B: 0.00 dB.

-- Average PSNR/SSIM(RGB): 34.32 dB; 0.9344
-- Average PSNR_Y/SSIM_Y: 36.10 dB; 0.9465
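The two summary lines are the means of the per-image values above. The sketch below re-derives them from a saved copy of the log. The regular expression is written against the line format shown here, and it is an assumption that other logs use the same format. The printed per-image values are rounded to two or four decimals, so the recomputed means can differ from the script's own averages in the last digit. `PSNR_B` stays at 0.00 dB throughout this run, so the sketch ignores it. `average_metrics` is a hypothetical helper.

```python
import re
from statistics import mean

# Matches lines such as:
# "Testing 0 101085 - PSNR: 31.32 dB; SSIM: 0.9118; PSNR_Y: 32.88 dB; SSIM_Y: 0.9210; PSNR_B: 0.00 dB."
METRIC_RE = re.compile(
    r"PSNR: ([\d.]+) dB; SSIM: ([\d.]+); PSNR_Y: ([\d.]+) dB; SSIM_Y: ([\d.]+)"
)


def average_metrics(log_text):
    """Return (PSNR, SSIM, PSNR_Y, SSIM_Y) averaged over all per-image lines."""
    rows = [tuple(float(v) for v in m.groups()) for m in METRIC_RE.finditer(log_text)]
    if not rows:
        raise ValueError("no per-image metric lines found")
    return tuple(mean(column) for column in zip(*rows))


if __name__ == "__main__":
    sample = (
        "Testing 0 101085 - PSNR: 31.32 dB; SSIM: 0.9118; PSNR_Y: 32.88 dB; SSIM_Y: 0.9210; PSNR_B: 0.00 dB.\n"
        "Testing 1 101087 - PSNR: 35.39 dB; SSIM: 0.9517; PSNR_Y: 37.14 dB; SSIM_Y: 0.9603; PSNR_B: 0.00 dB.\n"
    )
    print(average_metrics(sample))
```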




5.3 Single-Image Denoising Test


# First, upload an image
import os.path as osp
from IPython.display import display
from PIL import Image
img_path = 'bird.png'  # change this to the name of the image you uploaded
full_img_path = osp.join(osp.abspath('work/test_images/'), img_path)
img = Image.open(full_img_path).convert('RGB')
display(img)  # show the clean input image in the notebook



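predict_single.py can be used in three ways:

  1. Given a noisy image: set the noisy_img argument, and the denoised image is output directly.

  2. Given a clean image: set the clean_img and noisyL arguments. noisyL is the noise level and defaults to 15. Both the noisy image and the denoised image are output.

  3. Given both a noisy image and a clean image: the denoised image is output directly.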
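The three modes amount to a small dispatch on which inputs are provided. The sketch below shows one way to express it with argparse. The flag names (`--noisy_img`, `--clean_img`, `--noisyL`, `--model_path`, `--save_images`) are taken from this section. The functions `build_parser` and `select_mode`, the mode labels, the default model path and the control flow are hypothetical, and none of them is taken from predict_single.py itself.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags used with predict_single.py in this section."""
    parser = argparse.ArgumentParser(description="Single-image SwinIR denoising (sketch)")
    parser.add_argument("--noisy_img", type=str, default=None, help="path to a noisy image")
    parser.add_argument("--clean_img", type=str, default=None, help="path to a clean image")
    parser.add_argument("--noisyL", type=int, default=15,
                        help="noise level used when only a clean image is given")
    parser.add_argument("--model_path", type=str,
                        default="pretrained_models/SwinIR_paddle.pdparams")
    parser.add_argument("--save_images", action="store_true")
    return parser


def select_mode(args):
    """Map the provided inputs to one of the three usage modes (hypothetical logic)."""
    if args.noisy_img and args.clean_img:
        return "noisy_and_clean"  # mode 3: denoise the given noisy image
    if args.noisy_img:
        return "noisy_only"       # mode 1: denoise directly
    if args.clean_img:
        return "clean_only"       # mode 2: add noise at level args.noisyL, then denoise
    raise ValueError("provide --noisy_img and/or --clean_img")


if __name__ == "__main__":
    args = build_parser().parse_args(["--clean_img", "bird.png", "--save_images"])
    print(select_mode(args), args.noisyL)
```

Checking the inputs before loading the model lets a missing-argument mistake fail fast, without waiting for the network to initialize.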

# Only a clean image is given; the noise level is 15
!cd work && python predict_single.py --clean_img $full_img_path --save_images --noisyL 15 --model_path pretrained_models/SwinIR_paddle.pdparams
loading model from pretrained_models/SwinIR_paddle.pdparams
W1016 17:20:03.374689 17743 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 11.2
W1016 17:20:03.378355 17743 gpu_resources.cc:91] device: 0, cuDNN Version: 8.2.
only clean image provided, noise level is 15

PSNR on test data 34.9994
# View the denoising results
import glob
from IPython.display import display
from PIL import Image

imgs = sorted(glob.glob('work/test_images/*'))
for path in imgs:
    print(path)  # label each image with its file name
    img = Image.open(path)
    display(img)






6. Reproduction Notes

Here I am at the reproduction challenge yet again Σ(っ°Д°;)っ


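Overall, reproducing SwinIR took quite a bit of work, and the KAIR project is fairly complex. The porting effort was especially substantial after acceptance, when the code had to be submitted as a PR to PaddleGAN. Special thanks to KeyK, who "doesn't like doing research", for the help~ Sending love!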


7. About the Author





