Prof. NGUYEN's team, Computing Science Research Area, achieved the highest performance in COLIEE 2023
The team led by Professor NGUYEN, Minh Le of the Computing Science Research Area achieved the highest performance in Task2, Task3, and Task4 of the Competition on Legal Information Extraction/Entailment (COLIEE) 2023, an international competition for information extraction and entailment recognition in legal documents.
COLIEE 2023 is an international legal document processing competition that aims to find AI techniques for addressing legal case retrieval problems. This was the 10th time the competition has been held. There were four tasks in total, two on statute law and two on case law. Participants solved the problems using AI and presented papers describing their methods and the results of their experiments with automated solutions. Nguyen's team took first place, the highest ranking, in three of those tasks.
COLIEE 2023 was held in conjunction with ICAIL 2023, the leading conference in the field of AI and law.
※Reference：Competition on Legal Information Extraction/Entailment (COLIEE) 2023
June 19, 2023
■Team Name and Members
CAPTAIN：Chau Nguyen, Phuong Nguyen, Thanh Tran, Dat Nguyen, An Trieu, Tin Pham, Anh Dang and Le-Minh Nguyen
JNLP：Quan Bui, Truong Do, Khang Le, Hien Nguyen, Nguyen Hiep, Trang Pham, and Le-Minh Nguyen
Efficient Methods for Legal Information Retrieval and Entailment Tasks
Data Augmentation and Large Language Model for Legal Case Retrieval and Entailment.
The Competition on Legal Information Extraction/Entailment (COLIEE) is held annually to encourage advancements in the automatic processing of legal texts. Processing legal documents is challenging due to legal language's intricate structure and meaning. This paper outlines our strategies for tackling Task2, Task3, and Task4 in the COLIEE 2023 competition. Our approach involved utilizing appropriate state-of-the-art deep learning methods, designing methods based on domain characteristics observation, and applying meticulous engineering practices and methodologies to the competition. As a result, our performance in these tasks has been outstanding, with first places in Task2 and Task3 and promising developments in Task4.
Legal case retrieval involves identifying relevant cases that share similarities with a given case, while entailment requires assessing whether a legal statement can logically follow from another. These tasks are challenging due to the intricate nature of legal language and the vast quantity of legal documents. To overcome these difficulties, we propose implementing data augmentation techniques to produce additional training data and employing a large language model such as BART or T5 to capture the nuances of legal language. Specifically, we augment the provided dataset by generating synthetic cases that exhibit similar attributes to the original cases. We subsequently train a large language model on the augmented dataset and employ it to retrieve pertinent cases and determine entailment. Our findings also reveal that specific large language generative models, such as the Flan model, have demonstrated potential for performing exceptionally well on the COLIEE task4 dataset. Notably, the Flan model achieved state-of-the-art results on the COLIEE2023 and 2022 task4 test sets.
COLIEE 2023 is a prestigious competition in the field of Legal AI. This year, the Nguyen Lab achieved the highest overall performance in the competition, winning first place in three of the four tasks. We are incredibly honored to receive this award and would like to express our deep gratitude to JAIST for providing our team with an excellent research environment. This recognition is an encouragement to our students and all members of the lab, motivating us to continue our research and strive for more significant achievements in the future.
June 30, 2023