Eighth International Workshop on SCIentific DOCument Analysis
(SCIDOCA 2024)
associated with JSAI-isAI 2024

Workshop: May 28 - 29, 2024

Venue: ACT CITY Hamamatsu, Hamamatsu, Shizuoka, Japan

Aims and Scope

The recent proliferation of scientific papers and technical documents has become an obstacle to the efficient acquisition of new information in various fields. It is almost impossible for individual researchers to check and read all related documents, and even retrieving relevant documents is becoming increasingly difficult. This workshop gathers researchers and experts who work on scientific document analysis from various perspectives, and invites technical paper presentations and system demonstrations that cover any aspect of scientific document analysis.

Important Dates (Time zone: AOE (Anywhere on Earth))

Submission Deadline for Long Papers: February 29, 2024 (Firm deadline; extended from January 31, 2024)
Submission Deadline for Short Papers: February 29, 2024 (Firm deadline; extended from January 31, 2024)
Notification of Acceptance (Long+Short Papers): March 15, 2024
Camera-Ready due (Long+Short Papers): March 22, 2024
Hard deadline for LNAI proceedings: March 25, 2024


Please register for the workshop at the registration page of the JSAI International Symposia on AI 2024.


Day 1 (May 28, 2024) (Place: ROOM S (52))

  • 14:20: Workshop opening
  • 14:30-18:00: Session 1 (SC: Dr. Vu Tran)
    • 14:30: Texylon: Dataset of Log-to-Description and Description-to-Log Generation for Text Analytics Tools
      Masato Nakata, Kosuke Morita, Hirotaka Kameko, and Shinsuke Mori
    • 15:00: Tokenization Adaptation for Robust Chemical Named Entity Recognition with Limited Domain-Specific Text
      Tuan An Dao, Hiroki Teranishi, Yuji Matsumoto, Akiko Aizawa
    • Coffee break
    • 15:30: A Practical and Customizable System for Named Entity Recognition and Relation Extraction in Materials Science Publications
      Van-Thuy Phi and Yuji Matsumoto
    • 16:00: From Vietnamese to English: Advancing VQA with Cross-Linguistic Mapping (online)
      Tung Le, Dung Vo, and Huy Tien Nguyen
  • Coffee break
  • 17:00: Invited talk 1 (SC: Prof. Le-Minh Nguyen)
    Speaker: Prof. Long Tran-Thanh (Deputy-Head and Director of Research at the Department of Computer Science, University of Warwick, UK)

    Title: Do LLM Agents Behave Strategically?

    Abstract: The rapid advancement of large language model (LLM) based AI systems has been disruptive to many sectors of our lives, and soon they will act as autonomous agents on our behalf in dealing with complex real-world problems while interacting with other agents and humans. However, it is unclear how multiple LLM agents would influence each other’s behaviour when they interact. This is especially true if they are programmed to be strategic (i.e., selfish or malicious) on behalf of their human owners/creators. These strategic behaviours, if not mitigated efficiently, will cause societal, financial, ethical, and safety disasters. In this talk, I will discuss a number of key research challenges and problems that need to be addressed to avoid such disasters. These include: (i) last round/last iterate convergence in non-cooperative multi-agent learning; (ii) efficient learning with limited verifications against strategic manipulators; and (iii) truthful machine learning. While it is still unknown how these challenges should be addressed in the multi-LLM-agent setting, I will demonstrate how they have been addressed within the multiagent research community for simpler agent models, drawing some intuitions for future work on LLM agents.

    Short bio: Long is currently the Deputy-Head and the Director of Research at the Department of Computer Science, University of Warwick, UK. He is also the university’s Chair of Digital Research Spotlight. Long has been doing active research in a number of key areas of Artificial Intelligence and multi-agent systems, mainly focusing on multi-armed bandits, game theory, and incentive engineering, and their applications to AI for Social Good. He has published more than 80 papers at peer-reviewed A* conferences in AI/ML (including AAAI, AAMAS, CVPR, ECAI, IJCAI, NeurIPS, UAI) and journals (JAAMAS, AIJ), and has received a number of prestigious national/international awards, including 2 best paper honourable mention awards at top-tier AI conferences (AAAI, ECAI), 2 Best PhD Thesis Awards (one in the UK and one in Europe), and, as a co-recipient, the 2021 AIJ Prominent Paper Award (for one of the 2 most influential papers published in the Artificial Intelligence Journal between 2014 and 2021).
  • 18:00: Day 1 closing

Day 2 (May 29, 2024) (Place: ROOM S (52))

  • 11:00: Invited talk 2 (SC: Prof. Yuji Matsumoto)
    Speaker: Assoc. Prof. Mori Junichiro (Graduate School of Information Science and Technology, University of Tokyo)

    Title: Large-scale scholarly data analysis: network analysis perspective

    Abstract: I was involved as a co-researcher in the JST・CREST project "Knowledge discovery from large-scale literature information based on structural understanding" under the leadership of Prof. Yuji Matsumoto. Therein, our particular focus was on "Knowledge discovery based on the structural relationships of large-scale citation networks and literature texts." In this talk, I will present the latest research outcomes of several ongoing follow-up projects related to large-scale literature data analysis. Specifically, in the current JST・CREST project "Human Computation foundations for collaboration between humans and AI," we are working on supporting scientific activities based on literature data analysis. Additionally, in the NEDO project "Forecasting scientific and technological trends based on the fusion of pre-trained Language models and network models," we are working on predicting the impact of scientific research using large-scale literature data, with the aim of supporting science and technology policy making. Lastly, in our university's collaborative project with industry, "Technology Informatics," we are working on knowledge extraction from literature data to support R&D activities. Drawing on this recent research and its findings, I would like to share insights, particularly on the foundational techniques and applications of large-scale literature data analysis based on network analysis approaches.

    Short bio: Junichiro Mori is an Associate Professor at the Graduate School of Information Science and Technology, the University of Tokyo. He obtained his doctoral and master's degrees in Information Science and Technology from the University of Tokyo. His research has focused on Artificial Intelligence, particularly in the fields of user modeling, information extraction, and social network analysis. His current research interests include data mining with graphs, social network analysis, and representation learning.
  • 12:00: Lunch break
  • 14:00-15:25: Session 2 (SC: Dr. Vu Tran)
    • 14:00: Hallucination for large language modeling: a comprehensive review
      Dang Hoang Anh, Vu Tran, and Nguyen Le Minh
    • 14:30: Vietnamese Elementary Math Reasoning using Large Language Model with Refined Translation and Dense-retrieved Chain-of-thought
      Nguyen-Khang Le, Dieu-Hien Nguyen, Dinh-Truong Do, Chau Nguyen, and Minh Le Nguyen
    • 15:00: A Framework for Enhancing Statute Law Retrieval using Large Language Models
      Trang Ngoc Anh Pham, Dinh-Truong Do, and Minh Le Nguyen
  • Coffee break
  • 15:30-17:00: Session 3 (SC: Dr. Vu Tran)
    • 15:30: Improving LLM Prompting with Ensemble of Instructions: A Case Study on Sentiment Analysis
      Vu Tran and Tomoko Matsui
    • 16:00: Enhancing Document Retrieval in COVID-19 Research: Leveraging Large Language Models for Hidden Relation Extraction
      Hoang-An Trieu, Dinh-Truong Do, Chau Nguyen, Vu Tran, and Minh Le Nguyen
    • 16:30: Semantic Parsing for Question and Answering Over DBLP Database with Large Language Models
      Le-Minh Nguyen, Le-Nguyen Khang, Kieu Que Anh, Nguyen Dieu Hien, and Yukari Nagai
  • 17:00: Workshop closing


Relevant topics include, but are not limited to, the following:

  • text analysis
  • document structure analysis
  • logical structure analysis
  • figure and table analysis
  • citation analysis of scientific and technical documents
  • scientific information assimilation
  • summarization and visualization
  • knowledge discovery/mining from scientific papers and data
  • similar document retrieval
  • entity and relation linking between documents and knowledge base
  • survey generation
  • resources for scientific document analysis
  • document understanding in general
  • NLP systems for scientific documents, including tagging, parsing, coreference resolution, etc.


There are two classes of submissions:
  • Long paper on original and completed work, including concrete evaluation and analysis wherever appropriate; and
  • Short paper on a small, focused contribution, work in progress, a negative result, or an opinion piece.

The page limits are 14 pages including references for long papers and 7 pages including references for short papers. (Reviewers will be told that there is no penalty for writing a shorter submission.)

All submissions should be written in English, formatted according to the Springer Verlag LNCS style, and submitted as a PDF. The paper should be anonymized. If you use a Word file, please follow the format instructions, then convert the file to PDF and submit it at the paper submission page.

For both classes, in addition to original unpublished work, we also accept papers that have already been published or presented at other venues. Such submissions should also be anonymized and will be reviewed by the program committee.

You can submit your paper at https://easychair.org/conferences/?conf=scidoca2024 . If you are unable to submit your paper through the EasyChair system, please send an email to "nguyenml[at]jaist.ac.jp".

If a paper is accepted, at least one author of the paper must register for the workshop and present it. Please register for the workshop at the registration page.

Workshop Chairs

Minh Le Nguyen, Japan Advanced Institute of Science and Technology
Yuji Matsumoto, RIKEN Center for Advanced Intelligence Project (Advisor)

Program Committee Members

Nguyen Le Minh, Japan Advanced Institute of Science and Technology
Noriki Nishida, RIKEN Center for Advanced Intelligence Project
Vu Tran, The Institute of Statistical Mathematics
Yusuke Miyao, The University of Tokyo
Yuji Matsumoto, RIKEN Center for Advanced Intelligence Project
Yoshinobu Kano, Shizuoka University
Akiko Aizawa, National Institute of Informatics
Ken Satoh, National Institute of Informatics and Sokendai
Junichiro Mori, The University of Tokyo
Kentaro Inui, Tohoku University
Nguyen Ha Thanh, National Institute of Informatics
Nguyen Minh Phuong, Japan Advanced Institute of Science and Technology

Please send any inquiries concerning the workshop to "nguyenml[at]jaist.ac.jp".

SCIDOCA 2024 home page https://www.jaist.ac.jp/event/SCIDOCA/2024/
