Fifth International Workshop on SCIentific DOCument Analysis
associated with JSAI-isAI 2021

Workshop: November 13 - 15, 2021

Raiosha, Hiyoshi Campus, Keio University, Yokohama, Kanagawa, Japan (or online)

Aims and Scope

The recent proliferation of scientific papers and technical documents has become an obstacle to the efficient acquisition of new information in many fields. It is almost impossible for individual researchers to check and read all related documents, and even retrieving relevant documents is becoming increasingly difficult. This workshop brings together researchers and experts who approach scientific document analysis from various perspectives, and invites technical paper presentations and system demonstrations covering any aspect of scientific document analysis.

Important Dates

Paper submission deadline: September 29, 2021 (revised! originally September 15, 2021)
Camera-ready due: October 20, 2021 (11:59pm PST; UTC-8)
Workshop: November 13-15, 2021


Please register for the workshop via the registration page of JSAI International Symposia on AI 2021.


Relevant topics include, but are not limited to, the following:

  • text analysis
  • document structure analysis
  • logical structure analysis
  • figure and table analysis
  • citation analysis of scientific and technical documents
  • scientific information assimilation
  • summarization and visualization
  • knowledge discovery/mining from scientific papers and data
  • similar document retrieval
  • entity and relation linking between documents and knowledge bases
  • survey generation
  • resources for scientific document analysis
  • document understanding in general
  • NLP systems aiming for scientific documents including tagging, parsing, coreference, etc.

Invited Speaker

Prof. Iryna Gurevych, UKP Lab, Technical University of Darmstadt, Germany

Title: Towards consent-driven, ethically sound NLP for peer reviews

Abstract: Peer review is the major way to determine the status and importance of research outputs in science. The explosive growth in publications over the past decades puts a strain on traditional peer review. Peer reviews are text, and thus make a promising target for natural language processing, from simple reviewer assistance to end-to-end review generation. However, peer review data available for NLP research is scarce: existing datasets come from limited domains and are associated with a range of ethical and privacy-related challenges. What data does a single peer reviewing campaign produce? Who owns this data, and who should have a say in making it public? How can this data be redistributed and built upon? How can a research community transition to semi-open peer review, if it decides to do so? UKP Lab leads the discussion on making data from the ACL community available for peer review research. In this talk, we will discuss the major challenges of peer review as an NLP data type, and present our past and ongoing work on analyzing peer review data, providing secure access to sensitive review texts, and building sustainable workflows for continuous peer review data collection. Our efforts contribute to consent-driven, ethically sound NLP for peer reviews in the ACL community and beyond.

Bio: Iryna Gurevych (PhD 2003, U. Duisburg-Essen, Germany) is a professor of Computer Science and director of the Ubiquitous Knowledge Processing (UKP) Lab at the Technical University (TU) of Darmstadt in Germany. She joined TU Darmstadt in 2005 (tenured as full professor in 2009). Her main research interests are machine learning for large-scale language understanding, text semantics, and scientific literature mining. Iryna's work has received numerous awards, e.g., being named an ACL Fellow in 2020 and receiving the first Hessian LOEWE Distinguished Chair (2.5 million euros) in 2021. Currently, Iryna is the SIGDAT president and the co-director of the ELLIS NLP program. She was PC co-chair of ACL 2018 and has been elected future president (2023) of the international Association for Computational Linguistics (ACL).


There are two classes of submissions:
  • Long paper on original and completed work, including concrete evaluation and analysis wherever appropriate; and
  • Short paper on a small, focused contribution, work in progress, a negative result, or an opinion piece.

Long papers may be up to 14 pages and short papers up to 7 pages, in both cases including references. (Reviewers will be told that there is no penalty for writing a shorter submission.)

All submissions must be written in English, formatted according to the Springer Verlag LNCS style (which can be obtained here), and submitted as a PDF. Submissions must be anonymized. If you use Word, please follow the format instructions, convert the file to PDF, and submit it via the paper submission page.

For both classes, in addition to original unpublished work, we also accept papers that have already been published or presented at other venues. Such submissions must also be anonymized and will be reviewed by the program committee.

In general, accepted papers will not be archived. They will be distributed to workshop participants on a USB flash drive. If the authors wish to make their paper publicly available, we will also provide a link to the PDF on this webpage; otherwise, we will not upload the papers to the web. Unpublished submissions to both the long and short paper tracks will be considered as candidates for the LNAI post-proceedings (authors may decline the invitation if they wish). Papers will be archived only through these post-proceedings.

You can submit your paper at . If you have trouble submitting via the EasyChair system, please send an email to "nguyenml[at]"

If a paper is accepted, at least one author must register for the workshop and present the paper. Please register via the registration page.

Post Proceedings

Selected papers will be published as post-proceedings in the Springer Verlag "Lecture Notes in Artificial Intelligence" (LNAI) series, following a second round of review after the workshop.

SCIDOCA 2021 Program (November 13, 2021)

  • 09:50-10:00: Opening
  • 10:00-12:00: Session 1 (SC: Prof. Yuji Matsumoto)
  • 10:00-10:30:
    • Ha-Thanh Nguyen, Vu Tran, Binh Dang, Minh Quan Bui, Phuong Nguyen and Nguyen Le Minh. HYDRA - Hyper Dependency Representation Attentions
  • 10:30-11:00:
    • Nguyen Huy Xuan, Le Minh Nguyen and Long H. Trieu. Investigating the Effects of Pre-trained BERT to Improve Sparse Data Recommender Systems
  • 11:00-11:30:
    • Swayatta Daw and Vikram Pudi. Long Tailed Entity Extraction of Model Names using Distant Supervision
  • 11:30-12:00:
    • Chau Nguyen, Minh-Phuong Nguyen, Tung Le and Le-Minh Nguyen. Cybersecurity Text Analysis: Identification of Token Labels in Cybersecurity Texts
  • 12:00-13:30: Lunch
  • 13:30-14:00: Session 2 (SC: Assistant Prof. Tran Duc Vu):
  • 13:30-14:00:
    • An Dao and Akiko Aizawa. Domain Adaptation for Named Entity Recognition: An Analysis on Letter-case
  • 14:00-16:00: Session 3 (SC: Prof. Akiko Aizawa):
  • 14:00-14:30:
    • Hong Son Nguyen, Minh-Tien Nguyen, Tuan Anh Nguyen Dang and Minh Hieu Vu. Jointly Learning Span Extraction and Sequence Labeling for Information Extraction from Business Documents
  • 14:30-15:00:
    • Shanshan Liu, Tatsuya Ishigaki, Yui Uehara, Hiroya Takamura, Chowdhury Mohammad Mahir Asef, Mutsunori Uenuma, Hiroyuki Shindo and Yuji Matsumoto. A Generative Approach for End-to-End Relation Extraction of Synthesis Procedure
  • 15:00-15:30:
    • Nguyen-Khang Le, Dieu-Hien Nguyen, Thi-Thu-Trang Nguyen, Minh Phuong Nguyen, Tung Le and Minh Le Nguyen. An Improvement of Geometric Representation in Sentence Embedding of Vietnamese Question-Answering system
  • 15:30-16:00:
    • Hiroki Teranishi and Yuji Matsumoto. Coordination Augmentation using Language Models
  • 16:00-17:00: Invited Talk (SC: Prof. Nguyen Le Minh and Assistant Prof. Tran Duc Vu):
    • Iryna Gurevych. Towards consent-driven, ethically sound NLP for peer reviews
  • 17:00-17:10: Closing

Workshop Chairs

Minh Le Nguyen, Japan Advanced Institute of Science and Technology
Yuji Matsumoto, RIKEN Center for Advanced Intelligence Project (Advisor)

Program Committee Members

Nguyen Le Minh, Japan Advanced Institute of Science and Technology
Noriki Nishida, RIKEN Center for Advanced Intelligence Project
Vu Tran, The Institute of Statistical Mathematics
Yusuke Miyao, The University of Tokyo
Yuji Matsumoto, RIKEN Center for Advanced Intelligence Project
Yoshinobu Kano, Shizuoka University
Akiko Aizawa, National Institute of Informatics
Ken Satoh, National Institute of Informatics and Sokendai
Junichiro Mori, The University of Tokyo
Kentaro Inui, Tohoku University

For any inquiries about the workshop, please send an email to "nguyenml[at]"
