Using Sentence Embedding Techniques for Enhancing Terms-of-Service Text Summarization

Authors: Peach, H., Rusnachenko, N., Baraskar, M. and Liang, H.

Journal: Lecture Notes in Networks and Systems

Volume: 1230 LNNS

Pages: 55-64

eISSN: 2367-3389

ISSN: 2367-3370

DOI: 10.1007/978-3-031-78943-4_7

Abstract:

Summarization is useful for extracting salient information from linguistically complex texts. This is especially relevant in the legal domain, where it can make content more accessible to lay readers. A simplified representation can help foster transparency and trust between an organization and individuals. We examine the background of the latest advances in extractive and abstractive summarization approaches. The recent appearance of the transformer architecture with a self-attention mechanism has had a substantial impact on abstractive summarization performance. However, a major limitation of abstractive summarization is its constraint on input size. To address this limitation, we propose a target-oriented sentence embedding classification (SEC) architecture. It is designed specifically for Terms-of-Service (ToS) document summarization and is intended to serve as a preliminary text-processing step for abstractive summarization. Experiments conducted on a collection of ToS documents from the service TOS;DR show that the SEC model yields an average 11% improvement across all ROUGE metrics (F-measure) compared with other extractive summarizers for significantly short summaries. Applying SEC in general-purpose abstractive summarizers yields models that improve ROUGE-2 by 11-12% and achieve equal or better ROUGE-L. We accompany the proposed architecture with an annotation service and complex word simplification modules, combined into a publicly available system (https://github.com/HarryPeach/simplifying-legal-content).

Source: Scopus