- Completed the final capstone project in collaboration with Refiberd, working alongside an exceptional team: Isidora Rollan, Erin Jones, Mustafa Hameed, and Prashant Sharma.
- Prototyped an image-to-text captioning system for Refiberd, projected to cut the workforce required for label collection by 50% and save roughly 1,500 work hours per year.
- Conceptualized, tested, and finalized the ML model, then fine-tuned it and prepared it for deployment.
- Optimized the state-of-the-art multimodal encoder-decoder Donut model, reaching a normalized Levenshtein distance of 0.05 (see the metric sketch at the end of this section).
- Built a custom dataset by sourcing raw images of vendor tags and converting them into the Hugging Face Apache Arrow format to make model training more efficient (see the dataset-packaging sketch at the end of this section).
- Train split: 469 images
- Validation split: 112 images
- Test split: 66 images
- Wrote a custom PyTorch training loop that logs key metrics such as train loss, validation accuracy, and test accuracy, and synchronizes model states and checkpoints with the Hugging Face Hub (see the training-loop sketch at the end of this section).
- Explored and tuned hyperparameters, experimenting with several optimizers, including SGD, SGD with momentum, and the adaptive optimizers Adam and AdamW, to improve model performance (the candidate optimizers appear in the training-loop sketch below).
- Ran training on NVIDIA L4 GPUs, managing GPU memory and resources to keep the training loops efficient.
- Achieved an exact match on 47 of the 66 test images, underscoring the model's robustness to out-of-distribution data.
- Reached a cross-entropy loss of just 0.0005 on the training set for the decoder's next-token prediction task, indicating a very close fit to the training data.
- Currently building attention visualizations for the decoder using the model's output attention states and BertViz, to gain deeper insight into the model's attention mechanisms (a sketch follows at the end of this section).
Fine-Tuning an OCR-Free Document Understanding Transformer for Image-to-Text Captioning
[ Presentation, Demo, Github, Website, Report ]
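A minimal sketch of how the vendor-tag images and their text labels can be packaged into the Hugging Face Apache Arrow format. The directory layout, the `labels.json` file, and the output paths are illustrative assumptions, not the project's actual files.

```python
import json
from pathlib import Path
from datasets import Dataset, DatasetDict, Image

def build_split(split_dir: str) -> Dataset:
    """Pair each image with its ground-truth caption from a labels.json file
    and cast the column to the datasets Image feature (stored as Arrow)."""
    root = Path(split_dir)
    labels = json.loads((root / "labels.json").read_text())  # {"img_001.jpg": "100% cotton ...", ...}
    paths = sorted(str(p) for p in root.glob("*.jpg"))
    ds = Dataset.from_dict({
        "image": paths,
        "text": [labels[Path(p).name] for p in paths],
    })
    return ds.cast_column("image", Image())  # decoded lazily, backed by Arrow files

# Assemble the 469 / 112 / 66 image splits used in the project.
dataset = DatasetDict({
    "train": build_split("data/train"),
    "validation": build_split("data/validation"),
    "test": build_split("data/test"),
})
dataset.save_to_disk("vendor_tags_arrow")               # Arrow files on local disk
# dataset.push_to_hub("your-org/vendor-tag-captions")   # optional: mirror to the Hub (placeholder repo id)
```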
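A sketch of the kind of custom training loop described above: cross-entropy loss on the decoder's next-token prediction, per-epoch metric logging, and checkpoint sync to the Hugging Face Hub. The `train_loader`/`val_loader` DataLoaders, batch format, Hub repo id, epoch count, and learning rates are illustrative assumptions.

```python
import torch
from transformers import DonutProcessor, VisionEncoderDecoderModel

device = "cuda" if torch.cuda.is_available() else "cpu"
processor = DonutProcessor.from_pretrained("naver-clova-ix/donut-base")
model = VisionEncoderDecoderModel.from_pretrained("naver-clova-ix/donut-base").to(device)

# Optimizers tried in the hyperparameter sweep; AdamW is wired in below as one example.
optimizer_options = {
    "sgd":          lambda p: torch.optim.SGD(p, lr=1e-3),
    "sgd_momentum": lambda p: torch.optim.SGD(p, lr=1e-3, momentum=0.9),
    "adam":         lambda p: torch.optim.Adam(p, lr=3e-5),
    "adamw":        lambda p: torch.optim.AdamW(p, lr=3e-5, weight_decay=0.01),
}
optimizer = optimizer_options["adamw"](model.parameters())

def run_epoch(loader, train: bool) -> float:
    """One pass over a DataLoader yielding dicts with 'pixel_values' and 'labels'
    (label token ids, with padding positions set to -100 so they are ignored)."""
    model.train(train)
    total = 0.0
    for batch in loader:
        pixel_values = batch["pixel_values"].to(device)
        labels = batch["labels"].to(device)
        with torch.set_grad_enabled(train):
            loss = model(pixel_values=pixel_values, labels=labels).loss  # next-token cross entropy
        if train:
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        total += loss.item()
    return total / len(loader)

for epoch in range(10):
    train_loss = run_epoch(train_loader, train=True)    # train_loader: assumed DataLoader
    val_loss = run_epoch(val_loader, train=False)       # val_loader: assumed DataLoader
    print(f"epoch {epoch}: train loss {train_loss:.4f} | val loss {val_loss:.4f}")
    # Mirror the current model weights and processor state to the Hub.
    model.push_to_hub("your-org/donut-vendor-tags")      # repo id is a placeholder
    processor.push_to_hub("your-org/donut-vendor-tags")
```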
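A minimal sketch of the normalized Levenshtein metric used to score predicted captions against their labels (0.0 means an exact match); the example strings are made up.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,        # insertion
                               previous[j - 1] + cost))   # substitution
        previous = current
    return previous[-1]

def normalized_levenshtein(pred: str, target: str) -> float:
    """Edit distance scaled to [0, 1] by the length of the longer string."""
    if not pred and not target:
        return 0.0
    return levenshtein(pred, target) / max(len(pred), len(target))

# Example: average the metric over (prediction, label) pairs.
pairs = [("100% cotton", "100% cotton"),
         ("60% cotton 40% poly", "60% cotton 40% polyester")]
score = sum(normalized_levenshtein(p, t) for p, t in pairs) / len(pairs)
print(f"mean normalized Levenshtein distance: {score:.3f}")
```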
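A sketch of one way to inspect the decoder's attention with BertViz: run a forward pass with attention outputs enabled and pass the decoder self-attention tensors to `head_view`. The checkpoint name, image path, and example caption are placeholders; in the project this would run on the fine-tuned checkpoint inside a notebook.

```python
import torch
from PIL import Image
from bertviz import head_view
from transformers import DonutProcessor, VisionEncoderDecoderModel

processor = DonutProcessor.from_pretrained("naver-clova-ix/donut-base")
model = VisionEncoderDecoderModel.from_pretrained("naver-clova-ix/donut-base")
model.eval()

# Placeholder inputs: a vendor-tag image and a caption to feed through the decoder.
image = Image.open("example_tag.jpg").convert("RGB")
pixel_values = processor(image, return_tensors="pt").pixel_values
decoder_inputs = processor.tokenizer("60% cotton 40% polyester", return_tensors="pt")

with torch.no_grad():
    outputs = model(
        pixel_values=pixel_values,
        decoder_input_ids=decoder_inputs.input_ids,
        output_attentions=True,
    )

# outputs.decoder_attentions: one (batch, heads, seq, seq) tensor per decoder layer.
tokens = processor.tokenizer.convert_ids_to_tokens(decoder_inputs.input_ids[0])
head_view(outputs.decoder_attentions, tokens)  # renders an interactive view in a Jupyter notebook
```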