- 🔍 Explored the ability of a custom-configured GPT-2 transformer architecture (22M parameters) to learn noisy decision tree algorithms in context
- 🧱 Built upon the research of Garg et al. (2023) by incorporating: a) random-quadrant prompting and b) train-test overlapping prompting, evaluated on checkpoints trained noise-free on decision trees
- 🧪 Additionally tested the model's robustness by training it with i.i.d. Gaussian label noise at standard deviations of 0, 1, and 3 (a data-generation sketch follows this list)
- Choice of Parameters (a configuration sketch follows this sub-list):
- Batch Size: 64
- Learning Rate: 1e-4
- Tree Depth: 4
- Number of Dimensions: 8
- In-Context Examples: 40
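
A minimal configuration sketch, assuming Hugging Face `transformers` as the backbone library. The width, depth, and head counts below are assumptions chosen to land near the stated 22M parameters (the exact architecture settings are not listed here); the batch size and learning rate come from the list above:

```python
# Hedged sketch (not the repository's exact code) of a ~22M-parameter
# GPT-2 backbone for in-context regression on vector inputs.
import torch
from transformers import GPT2Config, GPT2Model

config = GPT2Config(
    n_positions=128,  # covers 2 * 40 + 1 interleaved (x, y) tokens per prompt
    n_embd=256,       # assumed embedding width
    n_layer=12,       # assumed number of transformer blocks
    n_head=8,         # assumed number of attention heads
)
backbone = GPT2Model(config)
read_in = torch.nn.Linear(8, config.n_embd)   # 8-dim inputs -> model width
read_out = torch.nn.Linear(config.n_embd, 1)  # hidden state -> scalar prediction
print(f"backbone parameters: {backbone.num_parameters() / 1e6:.1f}M")

optimizer = torch.optim.Adam(
    list(backbone.parameters())
    + list(read_in.parameters())
    + list(read_out.parameters()),
    lr=1e-4,  # learning rate from the list above
)
BATCH_SIZE = 64  # batch size from the list above
```

In this style of setup, prompts are passed to the backbone as continuous vectors via `inputs_embeds` rather than token ids, so the vocabulary embedding table goes unused.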
- ⏳ Ongoing project, owing to the computational demands of training a transformer from scratch and the novelty of the problem. Readers are encouraged to review the preprint for preliminary results.
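
For concreteness, a hedged sketch of the data generation described above: sample a random depth-4 decision tree over 8-dimensional inputs, build a 40-example prompt, optionally restrict in-context inputs to a random quadrant, and add i.i.d. Gaussian label noise at standard deviations 0, 1, and 3. All names are illustrative, not taken from the repository:

```python
# Illustrative data generation for noisy decision-tree in-context learning.
import numpy as np

DIM, DEPTH, N_EXAMPLES = 8, 4, 40  # dimensions, tree depth, prompt length above
rng = np.random.default_rng(0)

def sample_tree(rng):
    """A full binary tree of depth DEPTH: each internal node thresholds a
    random coordinate at 0; each of the 2**DEPTH leaves holds a random label."""
    n_internal = 2 ** DEPTH - 1
    split_dims = rng.integers(0, DIM, size=n_internal)
    leaf_values = rng.standard_normal(2 ** DEPTH)
    return split_dims, leaf_values

def tree_label(x, tree):
    split_dims, leaf_values = tree
    node = 0
    for _ in range(DEPTH):  # walk from the root to a leaf (heap indexing)
        node = 2 * node + (1 if x[split_dims[node]] > 0 else 2)
    return leaf_values[node - (2 ** DEPTH - 1)]

def make_prompt(rng, noise_std=0.0, quadrant=None):
    xs = rng.standard_normal((N_EXAMPLES, DIM))
    if quadrant is not None:        # one reading of random-quadrant prompting:
        xs = np.abs(xs) * quadrant  # every in-context x shares a sign pattern
    tree = sample_tree(rng)
    ys = np.array([tree_label(x, tree) for x in xs])
    ys += noise_std * rng.standard_normal(N_EXAMPLES)  # i.i.d. Gaussian label noise
    return xs, ys

quadrant = rng.choice([-1.0, 1.0], size=DIM)
for std in (0.0, 1.0, 3.0):  # the three noise levels studied
    xs, ys = make_prompt(rng, noise_std=std, quadrant=quadrant)
    print(std, xs.shape, ys[:3].round(2))
```

Drawing leaf labels from a standard normal and splitting each coordinate at zero mirrors the decision-tree function class used in Garg et al.'s setup; the quadrant restriction is the assumed mechanism behind the random-quadrant prompting named above.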
Adapting to Context: A Case Study on In-Context Learning of Decision Tree Algorithms by Large Language Models
[Preprint, GitHub, Motivation]