[ Preprint, Poster, GitHub ]

  • 🗄️📊 Dataset: MPII, 1463 images with an 80/20 training/validation split (see the split sketch after this list)
  • 🤖 Model: BLIP (Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation); loading sketch below
  • 🤗 Source Code
  • 📉 Training Loss: language-modelling loss from the text decoder (loss sketch below)
  • ๐Ÿ“ Evaluation Metric: Mean Absolute Error (MAE) [PyTorch]
  • ⚖️ Validation Thresholds: 1, 5, and 25 pixels
  • ⚙️ Hyperparameters (training-step sketch below):
    • Batch size: 4
    • Learning rate: 2e-5
    • Optimizer: AdamW
  • 🎯 Validation Accuracy: 92.5% at the 25-pixel threshold
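
The sketches below illustrate each step under stated assumptions; they are minimal illustrations, not the project's actual code. First, the 80/20 split. This assumes a hypothetical `MPIIImages` dataset class and a fixed seed, neither of which is taken from the repo:

```python
import torch
from torch.utils.data import Dataset, random_split

class MPIIImages(Dataset):
    """Placeholder for the repo's MPII dataset class (name assumed)."""
    def __init__(self, samples):
        self.samples = samples
    def __len__(self):
        return len(self.samples)
    def __getitem__(self, idx):
        return self.samples[idx]

full_dataset = MPIIImages(list(range(1463)))   # stand-in for the 1463 images

# 80/20 split: 1170 training / 293 validation images.
n_train = int(0.8 * len(full_dataset))         # 1170
n_val = len(full_dataset) - n_train            # 293
generator = torch.Generator().manual_seed(42)  # seed value is an assumption
train_set, val_set = random_split(full_dataset, [n_train, n_val], generator=generator)
```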
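
Loading BLIP through the Hugging Face `transformers` API. The specific checkpoint name is an assumption; the project may fine-tune a different BLIP variant:

```python
from transformers import BlipProcessor, BlipForConditionalGeneration

# Checkpoint is an assumption; any BLIP captioning checkpoint loads the same way.
checkpoint = "Salesforce/blip-image-captioning-base"
processor = BlipProcessor.from_pretrained(checkpoint)
model = BlipForConditionalGeneration.from_pretrained(checkpoint)
```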
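
The training signal is the text decoder's language-modelling (cross-entropy) loss. In `transformers`, passing `labels` to `BlipForConditionalGeneration` makes the forward pass return that loss directly; the target string format used here is purely illustrative:

```python
from PIL import Image

# A blank image stands in for an MPII sample.
image = Image.new("RGB", (384, 384), color="white")
target_text = "x=120 y=87"  # illustrative target format, not the repo's

inputs = processor(images=image, text=target_text, return_tensors="pt")
# With labels supplied, the text decoder returns its LM (cross-entropy) loss.
outputs = model(**inputs, labels=inputs["input_ids"])
print(outputs.loss)  # scalar language-modelling loss
```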
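
A training step wiring the listed hyperparameters together (batch size 4, AdamW at 2e-5), continuing the sketches above. It assumes each batch is already a dict of processor tensors; the epoch count and any learning-rate schedule are not stated in the summary and are omitted:

```python
from torch.optim import AdamW
from torch.utils.data import DataLoader

# batch_size=4 as listed above; the collate step that turns raw MPII samples
# into processor tensors is repo-specific and assumed to happen in the Dataset.
train_loader = DataLoader(train_set, batch_size=4, shuffle=True)
optimizer = AdamW(model.parameters(), lr=2e-5)

model.train()
for batch in train_loader:
    outputs = model(
        pixel_values=batch["pixel_values"],
        input_ids=batch["input_ids"],
        attention_mask=batch["attention_mask"],
        labels=batch["input_ids"],   # LM loss from the text decoder, as above
    )
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```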
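
Finally, evaluation: MAE in pixels plus accuracy at the 1/5/25-pixel thresholds (the reported 92.5% is at 25 pixels). Whether the threshold applies to Euclidean distance or per-axis error is not stated; Euclidean distance is assumed here, and the repo-specific step that parses coordinates out of the decoder's generated text is omitted:

```python
import torch

def pixel_metrics(pred_xy: torch.Tensor, true_xy: torch.Tensor, thresholds=(1, 5, 25)):
    """pred_xy / true_xy: (N, 2) predicted and ground-truth pixel coordinates."""
    # Mean Absolute Error over both coordinates, in pixels.
    mae = (pred_xy - true_xy).abs().mean().item()
    # Per-sample Euclidean error (assumption: thresholds apply to this distance).
    dist = (pred_xy - true_xy).float().norm(dim=1)
    accuracy = {t: (dist <= t).float().mean().item() for t in thresholds}
    return mae, accuracy

# Toy usage with two predictions:
pred = torch.tensor([[120.0, 87.0], [40.0, 200.0]])
true = torch.tensor([[118.0, 90.0], [60.0, 210.0]])
mae, acc = pixel_metrics(pred, true)
print(mae, acc)  # acc[25] is the fraction of samples within 25 pixels
```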