
Deep Learning, Natural Language Understanding, Legal Text Processing
NGUYEN Laboratory
Professor: NGUYEN, Minh Le
E-mail:
[Research areas]
Artificial Intelligence, Natural Language Processing, Machine Learning
[Keywords]
Natural Language Understanding, Text Summarization, Deep Learning,
Knowledge Representation
Skills and background we are looking for in prospective students
Mathematics, programming (C++, Java, Python), statistical models, and a background in artificial intelligence (search algorithms, machine learning models). A background in natural language processing is a plus.
What you can expect to learn in this laboratory
We expect students to acquire the following through research activities in the lab: skills in finding problems and reading papers, and background knowledge of machine learning (deep learning) and natural language processing. We expect Ph.D. students to become independent researchers after graduation who know how to write a scientific journal paper and how to present their work at an international conference. We expect master's students to gain skills in exploiting machine learning models on semi-structured data (big data) and in formulating a problem using machine learning models. They will also obtain fundamental knowledge of machine learning and knowledge representation.
【Job category of graduates】 communication industry, software industry, service industry
Research outline

Logical parts in legal paragraph

Text Summarization: Sentence Reduction
Research Overview
Structured representations and machine learning models play a key role in artificial intelligence (AI). Our research focuses on how structured representations and machine learning can be used to formulate problems in AI, ranging from text summarization and natural language understanding to legal engineering and machine reading.
Machine Learning
Our research directions address fundamental problems in machine learning. We particularly study structured prediction models, which are used to recognize structured representations such as sequences, trees, and graphs. Designing feature spaces for machine learning is difficult and requires much human effort, so we are also concerned with how feature representations can be learned automatically from data; deep learning appears well suited to this goal. We also study reinforcement learning, which learns by interacting with an environment.
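As a concrete illustration of structured prediction over sequences, the sketch below decodes the best label sequence with the Viterbi algorithm. It is a generic textbook example, not the lab's own model: the two states, the toy probability tables, and the function name `viterbi` are all assumptions made for illustration.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the highest-scoring state sequence for the observations."""
    # best[t][s]: best log-score of any path that ends in state s at position t
    first = observations[0]
    best = [{s: log_start[s] + log_emit[s].get(first, -math.inf) for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            # Pick the predecessor state that maximizes the path score into s.
            prev, score = max(
                ((p, best[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + log_emit[s].get(observations[t], -math.inf)
            back[t][s] = prev
    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


def to_log(table):
    """Convert a (possibly nested) table of probabilities to log space."""
    return {
        k: to_log(v) if isinstance(v, dict) else math.log(v)
        for k, v in table.items()
    }


# Hypothetical toy model: N = noun, V = verb (illustrative numbers only).
STATES = ["N", "V"]
LOG_START = to_log({"N": 0.8, "V": 0.2})
LOG_TRANS = to_log({"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}})
LOG_EMIT = to_log({
    "N": {"dogs": 0.6, "cats": 0.3, "bark": 0.1},
    "V": {"bark": 0.7, "chase": 0.3},
})

print(viterbi(["dogs", "chase", "cats"], STATES, LOG_START, LOG_TRANS, LOG_EMIT))
```

Working in log space avoids numerical underflow on long sequences; words missing from an emission table are treated as impossible for that state.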
Natural Language Understanding
One of the ultimate goals of AI is to enable computers to converse with humans in human languages. To achieve this goal, we pay particular attention to semantic computation, which helps computers understand natural language. Our initial work showed how synchronous grammars can be combined with structured learning models to transform a natural language sentence into a logical form representation [1]. We also want to investigate how natural language generation (NLG) can help computers produce a human-understandable sentence from its meaning representation. One research topic we pursue is how probabilistic models can be applied to generate natural language sentences from their underlying semantics expressed in typed lambda calculus.
For legal engineering, our mission is to support people in reading legal documents. The first task is to recognize the logical parts of law sentences in a paragraph and then group related logical parts into logical structures of formulas, which describe the logical relations between logical parts [2].
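One common way to frame the recognition of logical parts is as sequence labeling, where each token receives a BIO tag and contiguous tags are then merged into labeled spans. The sketch below shows only that final merging step. The BIO scheme, the labels `A` (antecedent) and `C` (consequent), and the function name `bio_to_spans` are assumptions for illustration and are not taken from [2].

```python
def bio_to_spans(tokens, tags):
    """Merge BIO tags into (label, text) spans, in order of appearance."""
    spans = []
    label, start = None, None
    # A trailing "O" acts as a sentinel that closes any span still open.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and tag[2:] == label
        if label is not None and not continues:
            spans.append((label, " ".join(tokens[start:i])))
            label = None
        # "B-X" always opens a span; a stray "I-X" opens one as a fallback.
        if tag.startswith("B-") or (tag.startswith("I-") and label is None):
            label, start = tag[2:], i
    return spans


# Hypothetical example sentence and tags (illustrative only).
tokens = ["If", "payment", "fails", ",", "the", "contract", "ends"]
tags = ["B-A", "I-A", "I-A", "O", "B-C", "I-C", "I-C"]
print(bio_to_spans(tokens, tags))
```

The resulting spans could then serve as the units that a second phase groups into logical structures.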
Machine Reading
One direction in our lab is to study the fundamental problems of how to extract useful information from texts and how to build knowledge from texts. First, we are interested in text summarization [3], which extracts the gist of text documents. We also study machine reading, which automatically extracts knowledge from a large number of documents by reading texts. Communication between humans and machines in reading text is also of interest to us. A question answering system like IBM Watson is our expected outcome.
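To make the idea of extracting the gist of a document concrete, the sketch below implements a simple frequency-based extractive summarizer. It is a generic baseline, not the method of [3]: the stopword list, the scoring rule, and the function names are assumptions chosen for illustration.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "an", "the", "to", "of", "in", "and", "is", "was", "are"}


def split_sentences(text):
    """Split on whitespace that follows sentence-final punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and keep non-stopword alphabetic tokens."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]


def summarize(text, k=1):
    """Return the k highest-scoring sentences in their original order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        # Average document frequency of the words in the sentence.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(
        range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True
    )[:k]
    return [sentences[i] for i in sorted(ranked)]


text = (
    "The law protects citizens. "
    "The court applies the law to protect citizens. "
    "The weather was nice."
)
print(summarize(text, k=1))
```

Averaging word frequencies, rather than summing them, keeps long sentences from winning on length alone.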
Key publications
[1] M.L. Nguyen, A. Shimazu, “A Semi-Supervised Learning Model for Mapping Sentences to Logical Forms with Ambiguous Supervision”, Data Knowl. Eng., Volume 90, pp. 1–12, 2014
[2] B.X. Ngo, M.L. Nguyen, T.T. Oanh, A. Shimazu, “A Two-Phase Framework for Learning Logical Structures of Paragraphs in Legal Articles”, ACM TALIP, Volume 12(1), 2013
[3] M.T. Nguyen, M.L. Nguyen, “SoRTESum: A Social Context Framework for Single-Document Summarization”, ECIR 2016, LNCS 9626, pp. 1–12, 2016
Equipment
Mac Server (64 GB)
Windows Server (64 GB RAM)
Teaching policy
Our primary teaching goal is to help students develop the ability to learn on their own. In supervising graduate students, we think one of the most important things is learning how to find problems worth studying. To support this, we discuss with students as much as possible to help them choose a research topic and discover problems. Reading skills are important for enriching students' knowledge, and they help students choose topics and identify problems. For this reason, our lab organizes seminar courses covering state-of-the-art results. We think that reading and discussing state-of-the-art work improves not only students' knowledge but also their skills in writing papers. We also organize seminar courses covering background knowledge in both machine learning and linguistics.
[Website] URL: http://www.jaist.ac.jp/~nguyenml/