To compare two alternative AMR (Abstract Meaning Representation; http://amr.isi.edu) graphs for the same sentence, previous work relied on a single metric that assesses their overall similarity, called Smatch (http://amr.isi.edu/evaluation.html).
AMR is a complex meaning representation that embeds many traditional NLP problems, such as word sense disambiguation, named entity recognition, coreference resolution, and semantic role labeling. AMR parsers (automatic tools that convert natural language sentences into AMR graphs) therefore need to handle all of them. When we then compare two AMR parsers to decide which one works better on a specific dataset or domain, we need to take into account that a parser may perform well on one subtask and poorly on another.
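As a concrete illustration (ours, not from the paper), consider the standard AMR example for "The boy wants to go". The sketch below writes it as the relation triples that Smatch-style tools operate on, and notes which subtask each piece touches; the sense numbers are shown for illustration.

```python
# AMR for "The boy wants to go", as (source, relation, target) triples --
# the view that Smatch-style evaluation tools take of a graph.
# PENMAN form: (w / want-01 :ARG0 (b / boy) :ARG1 (g / go-01 :ARG0 b))

amr_triples = [
    ("w", "instance", "want-01"),  # sense-tagged concept -> word sense disambiguation
    ("b", "instance", "boy"),
    ("g", "instance", "go-01"),
    ("w", "ARG0", "b"),            # labelled argument edges -> semantic role labeling
    ("w", "ARG1", "g"),
    ("g", "ARG0", "b"),            # "b" is reused (a reentrancy) -> coreference/control
]
```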
In our EACL paper (https://arxiv.org/abs/1608.06111) we proposed a set of fine-grained metrics that can be used in addition to the traditional Smatch score, each focusing on a specific aspect of the parsing task (for example named entities, reentrancies, or semantic role labeling).
Source code and instructions on how to compute these metrics are available at: https://github.com/mdtux89/amr-evaluation.
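To make the idea concrete, here is a minimal sketch (not the code from the repository) of one such metric, a "no word sense disambiguation" score: concept senses are stripped before matching, and an F-score is computed over the resulting triples. The real tool builds on Smatch, which also searches for the best variable alignment between the two graphs; this sketch assumes the variable names are already aligned.

```python
import re

def strip_senses(triples):
    """Remove PropBank-style sense suffixes (e.g. want-01 -> want) from concepts."""
    out = set()
    for src, rel, tgt in triples:
        if rel == "instance":
            tgt = re.sub(r"-\d+$", "", tgt)
        out.add((src, rel, tgt))
    return out

def f_score(predicted, gold):
    """F1 over triple sets (assumes variable names are already aligned)."""
    matched = len(predicted & gold)
    p = matched / len(predicted) if predicted else 0.0
    r = matched / len(gold) if gold else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

# Toy example: the parser picked a different sense of "go" than the gold annotation.
gold = {("w", "instance", "want-01"), ("g", "instance", "go-02"), ("w", "ARG1", "g")}
pred = {("w", "instance", "want-01"), ("g", "instance", "go-01"), ("w", "ARG1", "g")}

print(f_score(pred, gold))                              # penalised for the wrong sense
print(f_score(strip_senses(pred), strip_senses(gold)))  # "no WSD" score: senses ignored
```

Running the sketch gives 0.67 for the plain score and 1.0 once senses are ignored, showing how the fine-grained view separates sense errors from structural errors.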
Also check out our AMREager parser demo.
References
@inproceedings{damonte-17,
  title = {An Incremental Parser for Abstract Meaning Representation},
  author = {Marco Damonte and Shay B. Cohen and Giorgio Satta},
  booktitle = {Proceedings of {EACL}},
  year = {2017}
}