
Dynamic_Tanh

Experiments on Dynamic Tanh (paper: Transformers without Normalization)

Paper: Transformers without Normalization

Experiment 1: Reconstruction Experiment on 'What Do Normalization Layers Do?'

  1. Reconstruction Experiment: checks the input and output of LayerNorm in ViT
  2. Comparison Test: checks the input and output of BatchNorm in ResNet50 to see whether it shows the same S-shaped pattern (a hook sketch follows this list)
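A minimal sketch of how the input/output capture could be done with a PyTorch forward hook on a torchvision ViT; the vit_b_16 variant, layer index, and random input here are illustrative assumptions, not necessarily the repo's actual script:

```python
import torch
import torchvision.models as models

# Capture the input and output of one LayerNorm inside a torchvision ViT with
# a forward hook, so the per-element input-to-output mapping can be
# scatter-plotted and checked for the S-shaped (tanh-like) curve.
vit = models.vit_b_16(weights=models.ViT_B_16_Weights.DEFAULT).eval()

captured = {}

def save_io(module, inputs, output):
    captured["x"] = inputs[0].detach().flatten()
    captured["y"] = output.detach().flatten()

# ln_1 of an arbitrary encoder block; any LayerNorm in the model would do.
handle = vit.encoder.layers[5].ln_1.register_forward_hook(save_io)

with torch.no_grad():
    vit(torch.randn(1, 3, 224, 224))  # real images would be used in practice
handle.remove()

# captured["x"] vs. captured["y"] is the data behind the LayerNorm plot.
```

The same hook can be registered on a BatchNorm2d layer of ResNet50 for the comparison test.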

Results

  • ViT: LayerNorm
    • (figure: input vs. output of LayerNorm in ViT)
  • ResNet50: BatchNorm
    • (figure: input vs. output of BatchNorm in ResNet50)

Experiment 2: Reconstruction Experiment on Training ResNet50 with DyT in Place of BatchNorm

  • According to the paper, a limitation of DyT is that it struggles to fully replace BatchNorm

    • (figure: the paper's result illustrating this limitation)
  • Experiment Environment

    • Uses torchvision
      • torchvision.transforms
      • torchvision.models
    • Model: ResNet50
      • with BatchNorm
      • with DyT (see the DyT sketch after this list)
    • Data: mini version of ImageNet-1K
    • Initialization of α:
      • The paper uses α=0.5; however, it did not work well for ResNet50-DyT
      • In this experiment, α is initialized to 1.0
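A minimal DyT sketch following the paper's definition DyT(x) = γ · tanh(αx) + β, broadcast over channels to stand in for BatchNorm2d; the helper name replace_bn_with_dyt is illustrative, not necessarily the repo's code:

```python
import torch
import torch.nn as nn

class DyT(nn.Module):
    # α is a learnable scalar, γ and β are learnable per-channel vectors.
    # α is initialized to 1.0 as in this experiment (the paper's default is 0.5).
    def __init__(self, num_features, alpha_init=1.0):
        super().__init__()
        self.alpha = nn.Parameter(torch.full((1,), alpha_init))
        self.gamma = nn.Parameter(torch.ones(num_features))
        self.beta = nn.Parameter(torch.zeros(num_features))

    def forward(self, x):
        x = torch.tanh(self.alpha * x)
        # Broadcast γ and β over the (N, C, H, W) layout of ResNet feature maps.
        return x * self.gamma.view(1, -1, 1, 1) + self.beta.view(1, -1, 1, 1)

def replace_bn_with_dyt(module):
    # Recursively swap every BatchNorm2d for a DyT of matching channel width.
    for name, child in module.named_children():
        if isinstance(child, nn.BatchNorm2d):
            setattr(module, name, DyT(child.num_features))
        else:
            replace_bn_with_dyt(child)
```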

2-1: Using Pretrained ResNet50
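A short sketch of how this variant could be set up, using torchvision's pretrained weights and the replace_bn_with_dyt helper sketched above (the exact flow is an assumption):

```python
from torchvision.models import resnet50, ResNet50_Weights

# Start from ImageNet-pretrained weights, then swap BatchNorm2d for DyT.
model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V2)
replace_bn_with_dyt(model)
```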

Results

  • BatchNorm
    • (figure: training results with BatchNorm)
  • DyT
    • (figure: training results with DyT)
    • Learnable Param: alpha
      • (figure: learned alpha values)

2-2: Using an Untrained ResNet50

  • Due to limited GPU availability, training ran for only 30 epochs (a minimal training sketch follows).
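A hedged sketch of the 30-epoch run, assuming the untrained model, the DyT swap from Experiment 2, and an ImageFolder-style mini-ImageNet directory; the dataset path and hyperparameters are illustrative, only the 30-epoch budget comes from this README:

```python
import torch
import torch.nn as nn
from torchvision import models, transforms
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader

transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.ToTensor(),
])
train_set = ImageFolder("data/mini_imagenet/train", transform=transform)
loader = DataLoader(train_set, batch_size=64, shuffle=True, num_workers=4)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = models.resnet50(weights=None)  # untrained, unlike 2-1
replace_bn_with_dyt(model)             # DyT sketch from Experiment 2 above
model = model.to(device)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
criterion = nn.CrossEntropyLoss()

for epoch in range(30):  # GPU budget caps training at 30 epochs
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```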

Results

  • BatchNorm
    • (figure: training results with BatchNorm)
  • DyT
    • (figure: training results with DyT)
    • Learnable Param: alpha
      • (figure: learned alpha values)
