MSCOCO

doi:10.57702/xriudzva

Human Pose Estimation (HPE) aims to estimate the position of each joint (keypoint) of the human body in a given image. HPE supports a wide range of downstream tasks, such as activity recognition and motion capture. Since the Vision Transformer (ViT) was shown to be effective on many visual tasks, many transformer-based methods have achieved excellent performance on HPE.
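As a concrete illustration of the per-joint output that HPE targets, the following minimal sketch decodes one person's keypoint annotation, assuming the COCO person-keypoints layout (17 joints, each stored as a flat (x, y, v) triple, where v = 0 means not labeled, 1 labeled but occluded, 2 labeled and visible), into named joint coordinates. The annotation values are hypothetical, and `decode_keypoints` is an illustrative helper, not part of any official COCO API.

```python
from typing import Dict, List, Tuple

# Joint order of the COCO person-keypoint annotations (17 keypoints).
COCO_KEYPOINT_NAMES: List[str] = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]


def decode_keypoints(flat: List[float]) -> Dict[str, Tuple[float, float]]:
    """Hypothetical helper: turn [x1, y1, v1, x2, y2, v2, ...] into {joint: (x, y)}.

    Joints whose visibility flag v is 0 (not labeled) are skipped;
    occluded (v == 1) and visible (v == 2) joints are kept.
    """
    expected = 3 * len(COCO_KEYPOINT_NAMES)
    if len(flat) != expected:
        raise ValueError(f"expected {expected} values, got {len(flat)}")
    joints: Dict[str, Tuple[float, float]] = {}
    for i, name in enumerate(COCO_KEYPOINT_NAMES):
        x, y, v = flat[3 * i : 3 * i + 3]
        if v > 0:
            joints[name] = (x, y)
    return joints


# Hypothetical annotation: only the nose and the left shoulder are labeled.
example = [0.0] * 51
example[0:3] = [120.0, 80.0, 2]     # nose, labeled and visible
example[15:18] = [100.0, 140.0, 1]  # left_shoulder, labeled but occluded
print(decode_keypoints(example))
# → {'nose': (120.0, 80.0), 'left_shoulder': (100.0, 140.0)}
```

A pose estimator's prediction for one person can be read the same way: one (x, y) position per named joint, which is what downstream tasks such as activity recognition consume.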

BibTex:

@dataset{Marc_Tanti_and_Albert_Gatt_and_Adrian_Muscat_2024,
    abstract = {Human Pose Estimation (HPE) aims to estimate the position of each joint (keypoint) of the human body in a given image. HPE supports a wide range of downstream tasks, such as activity recognition and motion capture. Since the Vision Transformer (ViT) was shown to be effective on many visual tasks, many transformer-based methods have achieved excellent performance on HPE.},
    author = {Marc Tanti and Albert Gatt and Adrian Muscat},
    doi = {10.57702/xriudzva},
    institution = {No Organization},
    keyword = {'Captioning', 'Image Captioning', 'Image Description', 'Image Retrieval', 'Image-Text Matching', 'Images', 'Instance Segmentation', 'Japanese', 'MSCOCO', 'Multimodal Learning', 'Object Detection', 'Semantic Segmentation', 'Visual Question Answering', 'Weakly Supervised Learning', 'computer vision', 'dataset', 'diffusion models', 'gender', 'human attention', 'human pose estimation', 'image captioning', 'image classification', 'image dataset', 'image description', 'image generation', 'image processing', 'image-text matching', 'image-text pairs', 'image-text retrieval', 'key-point based object detection', 'large-scale dataset', 'natural descriptions', 'natural language processing', 'neural networks', 'non-autoregressive', 'object detection', 'paraphrase generation', 'sequence generation', 'single-shot detectors', 'skin-tone', 'synthetic pairs', 'text-only', 'text-to-image', 'vision-and-language models', 'vision-language pre-training', 'visual language grounding', 'visual question answering', 'visual relationship detection', 'visual-textual embedding'},
    month = {nov},
    publisher = {TIB},
    title = {MSCOCO},
    url = {https://service.tib.eu/ldmservice/dataset/mscoco},
    year = {2024}
}