Discourse Segmentation of Judicial Decisions for Fact Extraction

Lou, Andrés (2021) Discourse Segmentation of Judicial Decisions for Fact Extraction. Masters thesis, Concordia University.

Text (application/pdf)
main (1).pdf - Accepted Version
1MB

Abstract

To justify their rulings, legal documents must present facts as well as an analysis built on them. However, identifying what is a Fact and what is a non-Fact, and how the two relate to form a coherent legal document, is often not an easy task. For a number of reasons, such as the domain-specific definition of the term Fact and the technical vocabulary that permeates the documents in which facts are found, extracting the case-related facts from a legal document is usually time-consuming and expensive.

In this thesis, we present two methods to automatically extract case-relevant facts in French-language legal documents pertaining to tenant-landlord disputes. We base our approaches on the assumption that the text of a decision will follow the structural convention of first stating the facts and then performing an analysis based on them. This assumption is itself based on the widespread application of the IRAC legal document writing model and its many variants (Beazley, 2018). Given a legal document, we perform text segmentation to extract the parts that contain the case-relevant Facts using two different approaches based on neural methods commonly used in Natural Language Processing and a novel heuristic method based on the density of Facts in a segment of the text.

Our two approaches are based on the representation of legal texts as binary strings, where contiguous subsequences of 1s represent sentences containing Facts and, conversely, contiguous subsequences of 0s correspond to sentences containing non-Facts. The first approach consists of classifying each sentence in the document as either Facts or non-Facts using an ensemble model of independent word embeddings (Mikolov et al., 2013), GRU networks (Cho et al., 2014) and Convolutional Neural Networks (Kim, 2014). The second approach consists of a contextual classification of sentences as either class using recurrent architectures to create the binary string representation of the document; we experiment using LSTM networks (Hochreiter and Schmidhuber, 1997), GRU networks, and Attention Encoder-Decoder models (Bahdanau et al., 2014). The segmentation is carried out by introducing the concept of purity, a measure of the density of facts in a given subsequence in the binary string; the facts are extracted by maximising both the length and the purity of the substring containing the facts.
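
The binary-string representation and the idea of purity can be illustrated with a minimal sketch. The abstract does not give the exact objective that combines length and purity, so the code below makes a loud assumption: it treats purity as the fraction of 1s in a span and picks the longest prefix whose purity stays at or above a threshold. The function names `purity` and `split_point`, the parameter `min_purity`, its default of 0.75, and the toy label sequence are all hypothetical and not taken from the thesis.

```python
from typing import List


def purity(bits: List[int]) -> float:
    """Fraction of sentences labelled 1 (Facts) in a span; 0.0 for an empty span."""
    return sum(bits) / len(bits) if bits else 0.0


def split_point(bits: List[int], min_purity: float = 0.75) -> int:
    """Return the length k of the longest prefix bits[:k] with purity >= min_purity.

    ASSUMPTION: this thresholded rule is only one way to trade length against
    purity; the thesis may define the objective differently.
    """
    best = 0
    ones = 0
    for k, bit in enumerate(bits, start=1):
        ones += bit
        if ones / k >= min_purity:
            best = k  # a longer prefix that is still pure enough
    return best


# Toy sentence-level predictions: 1 = Facts, 0 = non-Facts (made-up data).
labels = [1, 1, 0, 1, 1, 0, 0, 0, 1, 0]
k = split_point(labels)
facts_part, analysis_part = labels[:k], labels[k:]
print(k, facts_part, analysis_part)
```

Under these assumptions, an isolated 0 inside the opening run of 1s does not end the Facts segment, while a stray 1 late in the document is not enough to pull the segmentation point toward the end.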

Extrinsic evaluations of both approaches show that the second approach outperforms the first, producing a greater number of documents whose predicted segmentation point falls within a single sentence of the one indicated by the gold standard. Nevertheless, a significant percentage of segmentation points are underestimated, being predicted more than four sentences away from the point determined by the gold standard.
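
The kind of comparison described above can be sketched as a distance, in sentences, between each predicted segmentation point and its gold-standard counterpart. This is only an illustration under assumptions: the function name `offset_shares`, the three bucket names, the use of the absolute (unsigned) distance, and the toy inputs are hypothetical, and the numbers are not results from the thesis.

```python
from collections import Counter
from typing import Dict, List


def offset_shares(predicted: List[int], gold: List[int]) -> Dict[str, float]:
    """Share of documents per distance bucket between predicted and gold points.

    ASSUMPTION: the buckets mirror the two thresholds mentioned in the abstract
    (within one sentence, more than four sentences); the middle bucket and the
    use of absolute distance are choices made for this sketch.
    """
    if len(predicted) != len(gold) or not gold:
        raise ValueError("need two non-empty lists of equal length")
    counts: Counter = Counter()
    for p, g in zip(predicted, gold):
        distance = abs(p - g)
        if distance <= 1:
            counts["within_1"] += 1
        elif distance <= 4:
            counts["2_to_4"] += 1
        else:
            counts["more_than_4"] += 1
    total = len(gold)
    return {name: counts[name] / total
            for name in ("within_1", "2_to_4", "more_than_4")}


# Made-up segmentation points (sentence indices) for four documents.
shares = offset_shares(predicted=[5, 7, 12, 3], gold=[5, 9, 4, 4])
print(shares)
```

Keeping the signed difference instead of the absolute value would additionally show whether a point was predicted too early or too late.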

Divisions: Concordia University > Gina Cody School of Engineering and Computer Science > Computer Science and Software Engineering
Item Type: Thesis (Masters)
Authors: Lou, Andrés
Institution: Concordia University
Degree Name: M. Comp. Sc.
Program: Computer Science
Date: 18 April 2021
Thesis Supervisor(s): Kosseim, Leila
ID Code: 989071
Deposited By: ANDRES LOU
Deposited On: 29 Nov 2021 17:01
Last Modified: 29 Nov 2021 17:01
All items in Spectrum are protected by copyright, with all rights reserved. The use of items is governed by Spectrum's terms of access.
