A minimal ViT implementation in PyTorch. The implementation is largely self-contained, with only minimal use of timm. Most of the core modules are implemented in a simple way, which should make them hackable and transparent, albeit less configurable than their timm counterparts.
It is meant for simple copy-paste use, or as a starting point for model design and experimentation.