
[ Preprint, Poster, GitHub ]
- Dataset: MPII, 1,463 images with an 80/20 training/validation split
- Model: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation (BLIP)
- Source Code
- Training Loss: language modelling loss from the BLIP text decoder
- Evaluation Metric: Mean Absolute Error (MAE) [PyTorch] (see the evaluation sketch below)
- Validation Thresholds: [1, 5, 25] pixels
- Hyperparameters (see the fine-tuning sketch below):
  - Batch size: 4
  - Learning rate: 2e-5
  - Optimizer: AdamW
- Result: validation accuracy of 92.5% at the 25-pixel threshold
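
Putting the loss and hyperparameters together, here is a minimal fine-tuning sketch. It assumes the Hugging Face `transformers` BLIP implementation; the checkpoint name, the dummy image-text batch, and the coordinate-as-text caption are illustrative placeholders, not the project's actual pipeline.

```python
from PIL import Image
from torch.optim import AdamW
from transformers import BlipProcessor, BlipForConditionalGeneration

# Hypothetical starting checkpoint; the project may fine-tune from different BLIP weights.
checkpoint = "Salesforce/blip-image-captioning-base"
processor = BlipProcessor.from_pretrained(checkpoint)
model = BlipForConditionalGeneration.from_pretrained(checkpoint)

optimizer = AdamW(model.parameters(), lr=2e-5)  # AdamW at 2e-5, as listed above

# Stand-in loader yielding one batch of 4 image-text pairs (batch size 4, as listed).
dummy = Image.new("RGB", (384, 384))
train_loader = [([dummy] * 4, ["x=120, y=240"] * 4)]  # assumed coordinate-style target text

model.train()
for images, texts in train_loader:
    inputs = processor(images=images, text=texts, return_tensors="pt", padding=True)
    # With labels supplied, the text decoder returns its language modelling loss.
    outputs = model(**inputs, labels=inputs["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```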
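
And a sketch of the evaluation, assuming generated text is parsed back into (x, y) pixel coordinates before scoring; the tensor shapes and dummy values below are illustrative, not project data.

```python
import torch

def keypoint_mae(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # Mean absolute error in pixels between predicted and ground-truth coordinates.
    return (pred - target).abs().mean()

def threshold_accuracy(pred: torch.Tensor, target: torch.Tensor, threshold: float) -> torch.Tensor:
    # Fraction of coordinate errors that fall within `threshold` pixels.
    return ((pred - target).abs() <= threshold).float().mean()

# Dummy (N, 2) tensors of (x, y) coordinates stand in for parsed model outputs.
pred = torch.tensor([[110.0, 205.0], [330.0, 412.0]])
target = torch.tensor([[112.0, 200.0], [335.0, 410.0]])

print(f"MAE: {keypoint_mae(pred, target):.2f} px")
for t in (1, 5, 25):  # the listed validation thresholds
    print(f"accuracy @ {t} px: {threshold_accuracy(pred, target, t):.3f}")
```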