Nguyen Lab's Legal AI Research

Legal AI refers to the application of Artificial Intelligence techniques to automate and enhance various aspects of legal practice, from document analysis and legal research to predictive analytics and decision support systems. We actively participate in COLIEE, a pioneering legal AI competition, addressing the tasks of legal retrieval and legal entailment for both case law and statute law.

Publications


Papers

Pushing the Boundaries of Legal Information Processing with Integration of Large Language Models

Authors: Chau Nguyen, Thanh Tran, Khang Le, Hien Nguyen, Truong Do, Trang Pham, Son T. Luu, Trung Vo, Le-Minh Nguyen

New Frontiers in Artificial Intelligence

Abstract: The legal domain presents unique challenges in information processing, given the complexity and specificity of legal texts. Addressing these challenges, this work leverages breakthroughs in Large Language Models (LLMs) to push the boundaries in legal information extraction and entailment. Our approaches involve the integration of LLMs in the COLIEE 2024 competition across four tasks: Legal Case Retrieval (Task 1), Legal Case Entailment (Task 2), Statute Law Retrieval (Task 3), and Legal Textual Entailment (Task 4). In Task 1, we employ a two-stage strategy that combines keyword-based retrieval using BM25 with a sophisticated MonoT5 reranker fine-tuned on legal datasets. For Task 2, we further adapt MonoT5, incorporating hard negative sampling. For Task 3, we introduce a novel strategy that utilizes LLMs to enhance the performance of high-recall predictions from smaller language models, an approach we also adapt for Task 2. To address Task 4, we employ an ensemble of LLMs’ predictions, adjudicated via majority voting and the Dawid-Skene label model. Our strategies take advantage of the strengths of each model, with prompting techniques and constraints applied to exploit ensemble advantages. Consequently, we have achieved the top-ranked performance in Task 3 and secured promising outcomes in Tasks 1, 2, and 4. This paper describes our methodologies, offering insights into how integrating LLMs into legal information systems can significantly enhance their efficacy in tackling complex legal documents.
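The two-stage retrieve-then-rerank idea from the abstract above can be sketched in a few lines. This is an illustrative toy, not the competition system: the BM25 below is a compact reimplementation (the paper's second stage is a fine-tuned MonoT5 cross-encoder, represented here only as an optional `rerank` callback).

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    N = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / N
    df = Counter()
    for d in tokenized:
        df.update(set(d))  # document frequency of each term
    scores = []
    for d in tokenized:
        tf = Counter(d)
        s = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((N - df[term] + 0.5) / (df[term] + 0.5) + 1)
            denom = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[term] * (k1 + 1) / denom
        scores.append(s)
    return scores

def retrieve(query, docs, k=3, rerank=None):
    """Stage 1: BM25 shortlist of k candidates; Stage 2: rerank them
    (in the paper, the reranker is a fine-tuned MonoT5 model)."""
    scores = bm25_scores(query, docs)
    shortlist = sorted(range(len(docs)), key=scores.__getitem__, reverse=True)[:k]
    if rerank is not None:
        shortlist.sort(key=lambda i: rerank(query, docs[i]), reverse=True)
    return shortlist
```

The design point is that the cheap lexical stage prunes the collection so the expensive neural reranker only scores a handful of candidates.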

CAPTAIN at COLIEE 2024: Large Language Model for Legal Text Retrieval and Entailment

Authors: Phuong Nguyen, Cong Nguyen, Hiep Nguyen, Minh Nguyen, An Trieu, Dat Nguyen, Le-Minh Nguyen

New Frontiers in Artificial Intelligence

Abstract: Recently, Large Language Models (LLMs) have made great contributions to a wide range of Natural Language Processing (NLP) tasks. This year, our team, CAPTAIN, utilizes the power of LLMs for the legal information extraction tasks of the COLIEE competition. To this end, the LLMs are used to understand the complex meaning of legal documents, summarize the important points of legal statute law articles as well as legal case documents, and find the relations between them and specific legal cases. Using various prompting techniques, we explore the hidden relation between the legal query case and its relevant statute law as supplementary information for test cases. The experimental results show the promise of our approach, with first place in the task of legal statute law entailment and performance competitive with State-of-the-Art (SOTA) methods on the tasks of legal statute law retrieval and legal case entailment in the COLIEE 2024 competition.

CAPTAIN at COLIEE 2023: Efficient methods for legal information retrieval and entailment tasks

Authors: Chau Nguyen, Phuong Nguyen, Thanh Tran, Dat Nguyen, An Trieu, Tin Pham, Anh Dang, Le-Minh Nguyen

Proceedings of the International Competition on Legal Information Extraction/Entailment (JURISIN 2023)

Abstract: The Competition on Legal Information Extraction/Entailment (COLIEE) is held annually to encourage advancements in the automatic processing of legal texts. Processing legal documents is challenging due to the intricate structure and meaning of legal language. In this paper, we outline our strategies for tackling Task 2, Task 3, and Task 4 in the COLIEE 2023 competition. Our approach involved utilizing appropriate state-of-the-art deep learning methods, designing methods based on domain characteristics observation, and applying meticulous engineering practices and methodologies to the competition. As a result, our performance in these tasks has been outstanding, with first places in Task 2 and Task 3, and promising results in Task 4.

AIEPU at ALQAC 2023: Deep Learning Methods for Legal Information Retrieval and Question Answering

Authors: Long Hoang, Tung Bui, Chau Nguyen, Le-Minh Nguyen

International Conference on Knowledge and Systems Engineering (KSE 2023)

Abstract: Despite the widespread integration of Artificial Intelligence across various domains, its applications in the legal field remain understudied, emphasizing the critical need for the development of effective deep learning approaches. This paper describes our approaches to legal information retrieval (Task 1) and legal question answering (Task 2) in the Automated Legal Question Answering Competition (ALQAC 2023). Specifically, we employed the Paraformer model for information retrieval and leveraged large language models for answering questions. In the competition, we achieved first place in Task 2. For this task, our findings demonstrate that tuning appropriate prompts helps the large language models achieve better performance.

The Impact of Large Language Modeling on Natural Language Processing in Legal Texts: A Comprehensive Survey

Authors: Dang Hoang Anh, Dinh-Truong Do, Vu Tran, Nguyen Le Minh

International Conference on Knowledge and Systems Engineering (KSE 2023)

Abstract: Natural Language Processing (NLP) has witnessed significant advancements in recent years, particularly with the emergence of large language models. These models, such as GPT-3.5 and its variants, have revolutionized various domains, including legal text processing (LTP). This survey explores the impact of large language modeling on NLP in the context of legal texts. By analyzing the latest research and developments, we seek to understand the benefits, challenges, and potential applications of large language models in the field of legal language processing.

SM-BERT-CR: a deep learning approach for case law retrieval with supporting model

Authors: Yen Thi-Hai Vuong, Quan Minh Bui, Ha-Thanh Nguyen, Thi-Thu-Trang Nguyen, Vu Tran, Xuan-Hieu Phan, Ken Satoh, Le-Minh Nguyen

Artificial Intelligence and Law (AI and Law)

Abstract: Case law retrieval is the task of locating truly relevant legal cases given an input query case. Unlike information retrieval for general texts, this task is more complex with two phases (legal case retrieval and legal case entailment) and much harder due to a number of reasons. First, both the query and candidate cases are long documents consisting of several paragraphs. This makes it difficult to model with representation learning that usually has a restriction on input length. Second, the concept of relevancy in this domain is defined based on the legal relation that goes beyond lexical or topical relevance. This is a real challenge because normal text matching will not work. Third, building a large and accurate legal case dataset requires a lot of effort and expertise. This is obviously an obstacle to creating enough data for training deep retrieval models. In this paper, we propose a novel approach called supporting model that can deal with both phases. The underlying idea is the case–case supporting relation and the paragraph–paragraph as well as the decision–paragraph matching strategy. In addition, we propose a method to automatically create a large weak-labeling dataset to overcome the lack of data. The experiments showed that our solution has achieved the state-of-the-art results for both case retrieval and case entailment phases.
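The paragraph–paragraph matching strategy above can be illustrated with a minimal lexical sketch: score a candidate case by matching each query paragraph against its best-matching candidate paragraph, then aggregate. The bag-of-words cosine used here is only a stand-in for the BERT-based supporting model in the paper; all example data and names are illustrative.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def case_score(query_paras, cand_paras):
    """Case-case score from paragraph-paragraph matches: each query
    paragraph takes its best-matching candidate paragraph, and the
    best-match scores are averaged over the query paragraphs."""
    bows_q = [Counter(p.lower().split()) for p in query_paras]
    bows_c = [Counter(p.lower().split()) for p in cand_paras]
    best = [max(cosine(q, c) for c in bows_c) for q in bows_q]
    return sum(best) / len(best)
```

The max-then-average aggregation captures the intuition that a candidate case supports a query case if each part of the query finds at least one strongly matching paragraph in the candidate.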

Attentive deep neural networks for legal document retrieval

Authors: Ha-Thanh Nguyen, Manh-Kien Phi, Xuan-Bach Ngo, Vu Tran, Le-Minh Nguyen, Minh-Phuong Tu

Artificial Intelligence and Law (AI and Law)

Abstract: Legal text retrieval serves as a key component in a wide range of legal text processing tasks such as legal question answering, legal case entailment, and statute law retrieval. The performance of legal text retrieval depends, to a large extent, on the representation of text, both query and legal documents. Based on good representations, a legal text retrieval model can effectively match the query to its relevant documents. Because legal documents often contain long articles and only some parts are relevant to queries, it is quite a challenge for existing models to represent such documents. In this paper, we study the use of attentive neural network-based text representation for statute law document retrieval. We propose a general approach using deep neural networks with attention mechanisms. Based on it, we develop two hierarchical architectures with sparse attention to represent long sentences and articles, and we name them Attentive CNN and Paraformer. The methods are evaluated on datasets of different sizes and characteristics in English, Japanese, and Vietnamese. Experimental results show that: (i) Attentive neural methods substantially outperform non-neural methods in terms of retrieval performance across datasets and languages; (ii) Pretrained transformer-based models achieve better accuracy on small datasets at the cost of high computational complexity, while the lighter-weight Attentive CNN achieves better accuracy on large datasets; and (iii) Our proposed Paraformer outperforms state-of-the-art methods on the COLIEE dataset, achieving the highest recall and F2 scores in the top-N retrieval task.
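The core idea behind such attentive representations of long articles can be reduced to a tiny sketch: pool a variable-length list of paragraph vectors into one document vector, weighting each paragraph by its relevance to the query. This dot-product attention pooling is a generic simplification, not the actual Attentive CNN or Paraformer architecture.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attentive_pool(paragraph_vecs, query_vec):
    """Pool paragraph vectors into a single document vector, with each
    paragraph weighted by its dot-product relevance to the query, so
    irrelevant parts of a long article contribute little."""
    scores = [sum(q * v for q, v in zip(query_vec, vec)) for vec in paragraph_vecs]
    weights = softmax(scores)
    dim = len(paragraph_vecs[0])
    return [sum(w * vec[i] for w, vec in zip(weights, paragraph_vecs))
            for i in range(dim)]
```

With query-dependent weights, the same long article yields different pooled representations for different queries, which is exactly what fixed truncation or mean pooling cannot do.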

A Legal Information Retrieval System for Statute Law

Authors: Chau Nguyen, Nguyen-Khang Le, Dieu-Hien Nguyen, Phuong Nguyen, Le-Minh Nguyen

Asian Conference on Intelligent Information and Database Systems (ACIIDS 2022)

Abstract: The information retrieval task for statute law requires a system to retrieve the relevant legal articles given a legal bar exam query. The Transformer-based approaches have demonstrated robustness over traditional machine learning and information retrieval methods for legal documents. However, those approaches are mainly domain adaptation without attempting to tackle the challenges in the characteristics of the legal queries and the legal documents. This paper specifies two challenges related to the characteristics of the two legal materials and proposes methods to tackle them effectively. Specifically, the challenge of different language used (while the articles use abstract language, the queries may use the language to describe a specific scenario) is addressed by a specialized model. Besides, another specialized model can overcome the challenge of long articles and queries. As shown in the experimental results, our proposed system achieved a state-of-the-art F2 score of 76.87%, with an improvement of 3.85% compared to the previous best system.
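One simple way to combine two specialized models like those described above is linear score fusion after normalization. The paper does not specify its combination mechanism, so the following is purely a hypothetical sketch of the general technique; `alpha` and the min-max normalization are assumptions.

```python
def fuse_scores(scores_a, scores_b, alpha=0.5):
    """Fuse the per-article scores of two specialized retrieval models.
    Each score list is min-max normalized first so that neither model's
    raw scale dominates, then combined with weight alpha."""
    def norm(xs):
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]
    na, nb = norm(scores_a), norm(scores_b)
    return [alpha * a + (1 - alpha) * b for a, b in zip(na, nb)]
```

An article that only one specialized model handles well (e.g. a scenario-style query, or a very long article) can still surface near the top of the fused ranking.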

Abstract meaning representation for legal documents: an empirical research on a human-annotated dataset

Authors: Sinh Trong Vu, Minh Le Nguyen, Ken Satoh

Artificial Intelligence and Law (AI and Law)

Abstract: Natural language processing techniques have recently contributed more and more to analyzing legal documents, which supports the implementation of laws and rules using computers. Previous approaches to representing a legal sentence were often based on logical patterns that illustrate the relations between concepts in the sentence, which often consist of multiple words. Those representations lack semantic information at the word level. In our work, we aim to tackle such shortcomings by representing legal texts in the form of abstract meaning representation (AMR), a graph-based semantic representation that has recently gained much popularity in the NLP community. We present our study in AMR Parsing (producing AMR from natural language) and AMR-to-text Generation (producing natural language from AMR) specifically for the legal domain. We also introduce JCivilCode, a human-annotated legal AMR dataset which was created and verified by a group of linguistic and legal experts. We conduct an empirical evaluation of various approaches in parsing and generating AMR on our own dataset and show the current challenges. Based on our observations, we propose a domain adaptation method applied in the training phase and decoding phase of a neural AMR-to-text generation model. Our method improves the quality of text generated from AMR graphs compared to the baseline model.

Contact Us

We are seeking students passionate about Natural Language Processing (NLP) and Deep Learning.

Location:

IS Building Ⅲ 7F, 1 Chome-1 Asahidai, Nomi, Ishikawa, Japan

Email:

nguyenml[at]jaist.ac.jp

Call:

+81 761-51-1221