[Preprint, GitHub, Motivation]
  • 🔍 Explored the ability of a custom-configured GPT-2 transformer architecture (22M parameters) to in-context learn a noisy decision-tree algorithm (a model sketch follows below)
  • 🧱 Built on the research of garg2023transformers by incorporating:
    a) Random Quadrant Prompting, b) Train-Test Overlapping Prompting, applied to checkpoints trained noise-free on decision trees (a sketch of both schemes follows below)
  • 🧪 Additionally, tested the model's robustness by training it with i.i.d. Gaussian noise at standard deviations of 0, 1, and 3 (see the task sketch below)
  • Choice of Parameters (wired together in the training-step sketch below):
    • Batch Size: 64
    • Learning Rate: 1e-4
    • Tree Depth: 4
    • Number of Dimensions: 8
    • In-Context Examples: 40
  • ⏳ This remains an ongoing project, given the computational demands of training a transformer from scratch and the novelty of the problem; readers are encouraged to review the preprint for preliminary results.
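
Below is a minimal sketch of how a ~22M-parameter GPT-2 backbone might be configured with Hugging Face `transformers`, modeled on the standard architecture of garg2023transformers. The layer, head, and embedding sizes, and the linear read-in/read-out wrapper, are assumptions, not this repo's exact implementation.

```python
# A minimal sketch (assumptions, not the repo's exact code): a ~22M-parameter
# GPT-2 backbone for in-context learning, with linear read-in/read-out layers
# mapping between 8-dimensional inputs and the transformer's embedding space.
# Note: GPT2Model's default 50257-token vocabulary embedding likely accounts
# for most of the ~22M parameter count.
import torch
import torch.nn as nn
from transformers import GPT2Config, GPT2Model

config = GPT2Config(
    n_positions=128,  # room for 40 interleaved (x, y) pairs (80 tokens)
    n_embd=256,
    n_layer=12,
    n_head=8,
)

class ICLTransformer(nn.Module):
    """Predicts y for each x in a prompt of interleaved (x, y) pairs."""

    def __init__(self, config, n_dims=8):
        super().__init__()
        self.backbone = GPT2Model(config)
        self.read_in = nn.Linear(n_dims, config.n_embd)
        self.read_out = nn.Linear(config.n_embd, 1)

    def forward(self, xs, ys):
        # xs: (batch, n_points, n_dims); ys: (batch, n_points)
        batch, n_points, n_dims = xs.shape
        # Pad each scalar y to an n_dims vector so x and y tokens interleave.
        ys_wide = torch.cat(
            [ys.unsqueeze(-1),
             torch.zeros(batch, n_points, n_dims - 1, device=xs.device)],
            dim=-1,
        )
        seq = torch.stack([xs, ys_wide], dim=2).reshape(batch, 2 * n_points, n_dims)
        hidden = self.backbone(inputs_embeds=self.read_in(seq)).last_hidden_state
        return self.read_out(hidden)[:, ::2, 0]  # predictions at x positions
```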
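The noisy decision-tree task itself could look like the sketch below, using the parameters listed above (tree depth 4, 8 dimensions, 40 in-context examples). The tree-sampling scheme (sign splits on random coordinates, Gaussian leaf values) is an assumption modeled on garg2023transformers.

```python
# Sketch: a random depth-4 decision-tree task with additive i.i.d. Gaussian
# noise. Sampling details are assumptions, not confirmed by this repo.
import torch

def sample_tree(depth=4, n_dims=8):
    """A full binary tree: each internal node splits on the sign of a
    random coordinate; each leaf holds a value drawn from N(0, 1)."""
    split_dims = torch.randint(0, n_dims, (2 ** depth - 1,))
    leaf_values = torch.randn(2 ** depth)
    return split_dims, leaf_values

def evaluate_tree(tree, xs, noise_std=0.0, depth=4):
    """Route each x through the tree (right if x[dim] >= 0) and add noise."""
    split_dims, leaf_values = tree
    n_points = xs.shape[0]
    node = torch.zeros(n_points, dtype=torch.long)  # start at the root
    for _ in range(depth):
        go_right = (xs[torch.arange(n_points), split_dims[node]] >= 0).long()
        node = 2 * node + 1 + go_right  # heap-style child indexing
    ys = leaf_values[node - (2 ** depth - 1)]  # leaves follow internal nodes
    return ys + noise_std * torch.randn(n_points)

# One prompt: 40 in-context examples in 8 dimensions, noise sigma = 1.
xs = torch.randn(40, 8)
ys = evaluate_tree(sample_tree(), xs, noise_std=1.0)
```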
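The two prompting schemes are sketched below under assumed semantics: Random Quadrant Prompting is read as drawing every prompt input from one randomly chosen orthant of R^d, and Train-Test Overlapping Prompting as forcing the query to share coordinates with an in-context example. Both readings are guesses; the preprint is the authoritative definition.

```python
# Sketch of the two prompting schemes under ASSUMED semantics; the actual
# definitions may differ from these readings.
import torch

def random_quadrant_xs(n_points=40, n_dims=8):
    """Draw all prompt inputs from a single random orthant of R^d."""
    signs = torch.sign(torch.randn(n_dims))  # one sign pattern per prompt
    return torch.randn(n_points, n_dims).abs() * signs

def overlapping_query(xs, n_shared=4):
    """Build a query that copies `n_shared` coordinates from a random
    in-context example, so train and test inputs overlap (assumption)."""
    query = torch.randn(xs.shape[1])
    donor = xs[torch.randint(0, xs.shape[0], (1,)).item()]
    shared = torch.randperm(xs.shape[1])[:n_shared]
    query[shared] = donor[shared]
    return query
```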
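Finally, a training step tying the sketches above together with the listed hyperparameters (batch size 64, learning rate 1e-4). The Adam optimizer and squared-error objective are assumptions modeled on garg2023transformers.

```python
# Sketch: one training step with the hyperparameters listed above, reusing
# ICLTransformer, config, sample_tree, and evaluate_tree from the sketches
# above. Optimizer and loss are assumptions.
import torch

model = ICLTransformer(config, n_dims=8)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(noise_std=1.0):
    xs = torch.randn(64, 40, 8)  # batch of 64 prompts, 40 examples each
    ys = torch.stack([
        evaluate_tree(sample_tree(), x, noise_std) for x in xs
    ])
    preds = model(xs, ys)
    loss = (preds - ys).square().mean()  # squared error over all positions
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```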