Improving Sub-billion Scale LLM Design Experiments

Some of the techniques used in the LLM pretraining design include:

- Embedding sharing
- Grouped query attention
- SwiGLU activations for the multi-layer perceptron (MLP) layers
- Immediate block-wise weight sharing
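To make two of these concrete, here is a minimal NumPy sketch of embedding sharing (one table used for both the input lookup and the output logit projection), a SwiGLU feed-forward layer, and immediate block-wise weight sharing (the same block weights applied to consecutive layers). The dimensions, function names, and weight shapes are illustrative assumptions, not taken from any specific model.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_hidden, vocab = 8, 16, 32  # toy sizes, chosen for illustration

# Embedding sharing: one table serves as input lookup AND output projection.
embedding = rng.normal(size=(vocab, d_model))

def silu(x):
    # SiLU(x) = x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def swiglu_mlp(x, w_gate, w_up, w_down):
    # SwiGLU feed-forward: SiLU-gated up-projection, then down-projection.
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down

w_gate = rng.normal(size=(d_model, d_hidden))
w_up = rng.normal(size=(d_model, d_hidden))
w_down = rng.normal(size=(d_hidden, d_model))

token_ids = np.array([3, 7, 1])
h = embedding[token_ids]  # input: embedding lookup, shape (3, d_model)

# Immediate block-wise weight sharing: reuse the SAME block weights for
# two consecutive (residual) layers instead of allocating fresh parameters.
h = h + swiglu_mlp(h, w_gate, w_up, w_down)
h = h + swiglu_mlp(h, w_gate, w_up, w_down)

# Output: reuse (share) the embedding table as the unembedding matrix.
logits = h @ embedding.T
print(logits.shape)  # (3, 32)
```

Embedding sharing and weight sharing both cut parameter count without changing the per-token compute, which is why they are attractive at sub-billion scale where the embedding table is a large fraction of total parameters. (Attention, including grouped query attention, is omitted here to keep the sketch short.)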