
"Data scientist: The sexiest job of the 21st century"
(Harvard Business Review, October 2012)

HO Laboratory
Professor: HO TU BAO (Ho Tu Bao)

E-mail:baojaist.ac.jp
[Research fields]
machine learning, data mining, big data analytics, biomedical informatics
[Keywords]
finding knowledge from data, sparse modeling, dimensionality reduction, electronic medical records

Knowledge and skills required to begin research

The ability to analyze and use data effectively has become crucial, especially when data are huge and complex. Whatever your previous background, you can learn and train to become a data scientist. For general data analytics work, there are no prerequisites. For advanced research on data analytics, you need knowledge of statistics and some programming skill, both of which you can acquire at JAIST.

Skills gained through this research

Through study and research activities, students are expected to acquire the following qualities. The first is an understanding of the principles and process of data analytics, the nature of data, and the different problem settings in data analytics. The second is the ability to understand and use data analytics tools to analyze datasets. The third, and most important, is knowing how to choose the most appropriate method for each concrete problem and how to explain the results. Overall, students are expected to be able to formulate and solve significant problems that involve exploiting data.

[Employers and job types] Our graduates often work at information technology companies, at organizations that use data analytics, or at universities.

Research content

Outline of laboratory

Our research and education focus on data analytics, with emphasis on machine learning and data mining. Data analytics refers to the process of collecting, organizing, and analyzing large sets of complex data (big data) to discover patterns and other useful information. It helps organizations better understand the information most important to their activities and business decisions. Our activities range from fundamental research, such as computational models in life science, to applied research, such as methods for analyzing electronic medical records (EMRs).

Research areas

Our research activities are divided into three groups:
Basic research in machine learning and data mining: Research on machine learning and data mining aims to automatically induce knowledge (patterns and models) from data. Our main research challenge is to develop algorithms that can deal with structured data (sequences, text, web, graphs, etc.) stored in very large databases. We have worked on several 'tough' problems such as learning from imbalanced data, similarity measures for heterogeneous data, kernel methods, and privacy-preserving data mining, and more recently on sparse modeling and dimensionality reduction.
Text and Web data mining: On the one hand, we develop statistical learning methods for text and Web data processing. On the other hand, we apply text and Web mining in other fields such as economics, services, and technology management. Our current focus is on methods to analyze clinical text from EMRs.
Scientific data mining: We aim to develop new methods for exploiting data to solve problems in medicine, biology, and materials science. The research includes studies on hepatitis using temporal abstraction and text mining, protein-protein interaction networks, disease-related genes, computational methods to discover microRNA functions in the human genome, and EMRs. Our current focus is on establishing a methodology and tools for exploiting EMRs. This includes preprocessing EMRs and transforming them into computable forms, diagnosis support systems, and the study of disease-drug relationships based on EMRs.

Selected publications

  1. Bui, N.T., Ho, T.B., Kanda, T. (2015). A semi-supervised tensor regression model for siRNA efficacy prediction, BMC Bioinformatics (in press).
  2. Than, K., Ho, T.B., Nguyen, D.K. (2014). An effective framework for supervised dimension reduction, Neurocomputing, Vol. 139, 397-407.
  3. Than, K., Ho, T.B. (2014). Modeling the diversity and log-normality of data, Intelligent Data Analysis, Vol. 18(6), 1067-1088.

Equipment

Supercomputers (Cray XC30, SGI Altix UV1000)
Cluster systems

Laboratory supervision policy

Data analytics is changing fast and can be used in almost every field, so our main policy is to develop students' capacity for self-study and adaptation. To this end, we aim to provide solid knowledge and a balance between theory and practice.

[Laboratory website] URL: http://www.jaist.ac.jp/ks/labs/ho/
