View on GitHub

FAR

Facet-Aware Evaluation for Extractive Summarization, ACL 2020

Facet-Aware Evaluation for Extractive Summarization

We provide the annotated data and self-contained evaluation scripts for the paper “Facet-Aware Evaluation for Extractive Summarization” [Paper] [Slides & Talk], ACL 2020.

Abstract

Commonly adopted metrics for extractive summarization focus on lexical overlap at the token level. In this paper, we present a facet-aware evaluation setup for better assessment of the information coverage in extracted summaries. Specifically, we treat each sentence in the reference summary as a facet, identify the sentences in the document that express the semantics of each facet as support sentences of the facet, and automatically evaluate extractive summarization methods by comparing the indices of extracted sentences and support sentences of all the facets in the reference summary. To facilitate this new evaluation setup, we construct an extractive version of the CNN/Daily Mail dataset and perform a thorough quantitative investigation, through which we demonstrate that facet-aware evaluation manifests better correlation with human judgment than ROUGE, enables fine-grained evaluation as well as comparative analysis, and reveals valuable insights of state-of-the-art summarization methods.

Example

example

Each facet (reference sentence) is mapped to a list of support groups, where each group consists of one or multiple support sentences. The semantics of one facet is covered as long as one of the support groups is covered.

In the example above, Facet 1 (r1) is covered because document sentence d1/d3 is extracted, but r2 is not covered since d4 is not extracted.

Category Breakdown

We analyzed 150 samples on the CNN/Daily Mail dataset and found that around 1/4 contain noise, and 60% are of low abstraction (suitable for extraction).

cat

Code Instructions

facet_aware_eval.ipynb illustrates how to use the annotated data for facet-aware evaluation. The raw extracted summaries are first mapped back to sentence indices and then compared with the indices of support sentences.

data/* contains files used in the notebook demo.

output/* contains human-readable plain text of the annotation results.

Citation

Please cite the following paper if you use our data. Thank you!

@inproceedings{mao-etal-2020-facet,
    title = "Facet-Aware Evaluation for Extractive Summarization",
    author = "Mao, Yuning  and
      Liu, Liyuan  and
      Zhu, Qi  and
      Ren, Xiang  and
      Han, Jiawei",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-main.445",
    pages = "4941--4957",
}