This archive contains word embeddings extracted from Wikipedia using canonical correlation analysis with prior knowledge encoding from external resources such as FrameNet, WordNet and PPDB.
@article{osborne-16, author = "D. Osborne and S. Narayan and S. B. Cohen", title = "Encoding Prior Knowledge with Eigenword Embeddings", journal = "Transactions of the Association for Computational Linguistics", year = "2016" }
The files in this directory are:
The format for each file is
[word] [vector]
where vector is a space-separated list of 300 real numbers.
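As a sketch of how these files can be read, the following Python snippet parses the "[word] [vector]" format described above into a dictionary mapping each word to its vector. The function names and the use of a plain dict are illustrative, not part of the release.

```python
def parse_embedding_line(line):
    # One line is: word, then space-separated real numbers (300 per the format above).
    parts = line.rstrip("\n").split(" ")
    word = parts[0]
    vector = [float(x) for x in parts[1:]]
    return word, vector

def load_embeddings(path):
    # Build a dict: word -> list of floats, one entry per line of the file.
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():  # skip any blank lines
                word, vector = parse_embedding_line(line)
                embeddings[word] = vector
    return embeddings
```

For very large vocabularies (the files here cover 200k words), loading into NumPy arrays instead of Python lists would be more memory-efficient, but the plain-Python version above keeps the example dependency-free.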
The words used are the top 200k most frequent words from the first 5 gigabytes of Wikipedia. More details about alpha and the external sources of prior knowledge are in the paper.