Canonical Correlation Inference for Mapping Abstract Scenes to Text


This project focuses on the use of canonical correlation analysis (CCA) to map images and text into a shared space, which is then used to map unseen images to corresponding captions. The dataset we use is the Abstract Scenes dataset, developed at Microsoft; a couple of example scenes are shown below.
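
As a rough illustration of the pipeline (not the paper's exact inference procedure), the sketch below uses scikit-learn's CCA on placeholder feature matrices. The feature dimensions, the number of components, and the cosine-similarity ranking are assumptions made only for this example.

import numpy as np
from sklearn.cross_decomposition import CCA

# Placeholder features: rows are scenes/captions, columns are feature dimensions.
# The two views stand in for visual features of abstract scenes and textual
# features of their captions; random matrices are used here only for illustration.
rng = np.random.default_rng(0)
X_img = rng.normal(size=(500, 100))    # 500 training scenes, 100 visual features
X_txt = rng.normal(size=(500, 300))    # matching captions, 300 textual features

# Fit CCA so that both views are projected into a shared 10-dimensional space.
cca = CCA(n_components=10)
cca.fit(X_img, X_txt)

# Project an unseen image and a pool of candidate captions into the shared space.
img_new = rng.normal(size=(1, 100))
cand_txt = rng.normal(size=(20, 300))
img_proj = cca.transform(img_new)                               # shape (1, 10)
_, txt_proj = cca.transform(np.zeros((len(cand_txt), 100)), cand_txt)

# Rank candidate captions by cosine similarity to the projected image.
sims = (txt_proj @ img_proj.T).ravel()
sims /= np.linalg.norm(txt_proj, axis=1) * np.linalg.norm(img_proj) + 1e-12
ranking = np.argsort(-sims)            # indices of best-matching captions first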

Click here for the paper, which can be cited as follows:


@techreport{jiang-16,
  title={Canonical Correlation Inference for Mapping Abstract Scenes to Text},
  author={Helen Jiang and Nikos Papasarantopoulos and Shay B. Cohen},
  eprint={arXiv},
  year={2016}
}


Click here to download the ranked captions. The format of the file is as follows: there are 300 lines, one line per ranked image, and each line contains fields separated by ^. Not all images have 8 gold-standard captions, so some fields may be empty.
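
A minimal sketch for reading the file is shown below; the filename ranked_captions.txt is an assumption, as the actual filename comes with the download.

# A minimal sketch, assuming the downloaded file is named "ranked_captions.txt".
with open("ranked_captions.txt", encoding="utf-8") as f:
    rows = [line.rstrip("\n").split("^") for line in f]

print(len(rows))        # 300 lines, one per human-ranked image
for fields in rows[:3]:
    # Empty strings appear where an image has fewer than 8 gold-standard captions.
    print([field for field in fields if field])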

Click here to download the splits that we used for the training/development/tuning/test sets. These are the same splits used by Ortiz et al. Each file in this gzipped tarball contains a list of pointers to the scenes used for the relevant set. The human-ranked images were taken from the test set (the first 300 images).
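
A minimal sketch for listing the contents of the splits archive is shown below; the tarball name splits.tar.gz is an assumption, as the actual filename comes with the download.

import tarfile

# A minimal sketch, assuming the archive is saved as "splits.tar.gz".
with tarfile.open("splits.tar.gz", "r:gz") as tar:
    for member in tar.getmembers():
        if not member.isfile():
            continue
        # Each member lists pointers to the scenes used in that split.
        pointers = tar.extractfile(member).read().decode("utf-8").splitlines()
        print(member.name, len(pointers), "scene pointers")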