This project uses canonical correlation analysis (CCA) to map images and text into a shared space, and then uses this shared space to map unseen images to their corresponding captions. We use the Abstract Scenes dataset, developed at Microsoft; a couple of example pictures from it are shown below.
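The shared-space idea can be illustrated with a minimal, generic regularized CCA in NumPy, followed by ranking candidate captions for an unseen image by cosine similarity in the shared space. This is a sketch under stated assumptions, not the paper's inference procedure: the helper names `fit_cca` and `rank_captions`, the regularizer `reg`, and the feature matrices are hypothetical, and rows of `x` (image features) and `y` (caption features) are assumed to be paired.

```python
import numpy as np


def _inv_sqrt(c):
    # Inverse square root of a symmetric positive-definite matrix.
    vals, vecs = np.linalg.eigh(c)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def fit_cca(x, y, k, reg=1e-3):
    """Generic regularized CCA (hypothetical helper, not the paper's method).

    x: (n, dx) image features, y: (n, dy) caption features, rows paired.
    Returns means, projection matrices onto a k-dim shared space, and the
    top-k canonical correlations.
    """
    mx, my = x.mean(axis=0), y.mean(axis=0)
    xc, yc = x - mx, y - my
    n = x.shape[0]
    # Ridge term `reg` keeps the covariance matrices invertible.
    cxx = xc.T @ xc / (n - 1) + reg * np.eye(x.shape[1])
    cyy = yc.T @ yc / (n - 1) + reg * np.eye(y.shape[1])
    cxy = xc.T @ yc / (n - 1)
    cxx_is, cyy_is = _inv_sqrt(cxx), _inv_sqrt(cyy)
    # SVD of the whitened cross-covariance gives the canonical directions.
    u, s, vt = np.linalg.svd(cxx_is @ cxy @ cyy_is)
    wx = cxx_is @ u[:, :k]
    wy = cyy_is @ vt.T[:, :k]
    return mx, my, wx, wy, s[:k]


def rank_captions(image, captions, model):
    """Return caption indices, best first, by cosine similarity in CCA space."""
    mx, my, wx, wy, _ = model
    zi = (image - mx) @ wx
    zc = (captions - my) @ wy
    zi = zi / np.linalg.norm(zi)
    zc = zc / np.linalg.norm(zc, axis=1, keepdims=True)
    return np.argsort(-(zc @ zi))
```

Cosine ranking is only one simple decision rule over the projected vectors; the ridge term matters in practice because text and image features are often high-dimensional and collinear.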
Click here for the paper describing this work:
@inproceedings{papasarantopoulos-18,
title={Canonical Correlation Inference for Mapping Abstract Scenes to Text},
author={Nikos Papasarantopoulos and Helen Jiang and Shay B. Cohen},
booktitle={Proceedings of {AAAI}},
year={2018}
}
Click here to download the ranked captions.
The file has 300 lines, one per ranked image. Each line consists of fields separated by the ^ character. The fields are as follows:
Not all images have eight gold-standard captions, so some of these fields may be empty.
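Because some caption fields can be empty, a parser for this file should keep empty fields in place rather than collapse them, so that field positions stay aligned across lines. A minimal sketch, assuming the file is UTF-8 text; the helper names are hypothetical and no meaning is assumed for individual fields:

```python
def parse_ranked_lines(lines):
    """Split each non-blank line on '^'; str.split('^') keeps empty fields."""
    return [line.rstrip("\r\n").split("^") for line in lines if line.strip()]


def load_ranked_captions(path, expected=300):
    # UTF-8 is an assumption; adjust the encoding if the file differs.
    with open(path, encoding="utf-8") as f:
        rows = parse_ranked_lines(f)
    # The file is described as having one line per ranked image (300 lines).
    if len(rows) != expected:
        raise ValueError(f"expected {expected} lines, found {len(rows)}")
    return rows
```

Using `split("^")` instead of a bare `split()` is the important choice here: the latter would drop empty caption fields and shift every later field.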
Click here to download the splits we used for the training, development, tuning and test sets. These are the same splits used by Ortiz et al. Each file in this gzipped tarball lists pointers to the scenes used for the corresponding set. The human-ranked images are the first 300 images of the test set.
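Reading the splits tarball can be sketched with Python's standard `tarfile` module. This assumes each member is a UTF-8 text file with one scene pointer per line; `read_splits` is a hypothetical helper, and the member names inside the archive are not assumed:

```python
import tarfile


def read_splits(path):
    """Map each file name in the gzipped tarball to its list of scene pointers."""
    splits = {}
    with tarfile.open(path, "r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories and other non-regular entries
            data = tar.extractfile(member).read()
            # One pointer per line is an assumption about the file layout.
            lines = data.decode("utf-8").splitlines()
            splits[member.name] = [ln.strip() for ln in lines if ln.strip()]
    return splits
```

Reading members with `extractfile` avoids writing anything to disk, and the order of pointers within each file is preserved, which matters if, as stated above, the first 300 test images are the human-ranked ones.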