[ Presentation, Demo, GitHub, Website, Report ]
  • 🎓 Completed a final capstone project in collaboration with Refiberd, working alongside an exceptional team: Isidora Rollan, Erin Jones, Mustafa Hameed, and Prashant Sharma.
  • 🔧🧩 Prototyped an image-to-text captioning system for Refiberd, projected to cut the workforce required for label collection by 50% and save an estimated 1,500 work hours annually.
  • 💡 Conceptualized, tested, and finalized the ML model, then fine-tuned it and prepared it for deployment.
  • 🧠💻 Optimized the state-of-the-art multimodal encoder-decoder DONUT model, achieving a normalized Levenshtein distance of 0.05 (the metric is sketched after this list).
  • 🗄️📊 Built a custom dataset by sourcing raw images of vendor tags and converting them into Hugging Face's Apache Arrow format, speeding up data loading during training (see the dataset sketch after this list).
  • ๐Ÿ—๏ธ๐Ÿ“Š Assembled Train Dataset: 469 Images
  • ๐Ÿ”ง๐Ÿ“Š Tuned Validation Dataset: 112 Images
  • ๐ŸŽฏ๐Ÿ“Š Evaluated Test Dataset: 66 Images.
  • ๐Ÿ” Developed a custom PyTorch training loop, logging key metrics such as train loss, validation accuracy, and test accuracy, while also synchronizing model states and checkpoints with the Hugging Face Hub.
  • โš™๏ธ Explored and tuned hyperparameters, experimenting with various optimizers including SGD, SGD with momentum, and second-order optimizers like Adam and AdamW to enhance model performance.
  • ๐Ÿ–ฅ๏ธ Leveraged L4 GPU architecture to enhance training loops, efficiently managing GPU resources for improved model training workflows.
  • ๐Ÿ† Achieved a perfect match on 47 out of 66 test images underscoring the model's robustness to out-of-distribution data.
  • โš–๏ธ Attained a Cross Entropy Loss of just 0.0005 on the training dataset for the decoder's next-token-prediction task, indicating high predictive accuracy.
  • ๐Ÿ” Currently developing an attention mask using output_state and BertViz for the decoder side to enhance insight into the attention mechanisms within the model.