Bio-IT-Station


HK Tsai Lab of Bioinformatics

Institute of Information Science, Academia Sinica

News

Lab Lunch & Trip: Riddle City

We had a wonderful lab gathering with intern presentations and a farewell & welcome lunch! A puzzle game “Riddle City - 捷運踩地

Summer: Welcome New Friends

Welcoming new interns and friends to our lab—let’s explore, learn, and grow together!

A Night of Reunions and Connections

One lab group photo over a delicious duck dinner. Good luck, Jeff! Wishing you all the best on your new path. We were also delighted to welcome back 詠晴

Projects

TFAS

A bioinformatics-based exploration of promoter occupancy and alternative splicing in the human genome

LncRNA

Revealing the function of lncRNA in transcriptional and epigenetic regulation

Lab Members

Principal Investigator

Huai-Kuang Tsai

Research Fellow/Professor

Evolutionary Algorithm, Bioinformatics, Regulatory Mechanism, Metagenomics, Computational Biology

Researchers

Yu-Hsuan Huang

Postdoctoral Researcher

Machine Learning, Genomics, Virology

Bing-Shiun Tsai

Research Assistant

Machine Learning, Bioinformatics

Kai-Ze Zhu

Research Assistant

Statistical Computation, Machine Learning, Variable Selection in High-Dimensional Data, Genomics

Shu-Qi Yu

Research Assistant

Bioinformatics, Network, Graph Theory, Algorithm

Ting-Yu Yeh

Research Assistant

Machine Learning, Network Biology, Genomics

Grad Students

Ru-Yin Jian

Doctoral Student

Machine Learning, Bioinformatics, Cancer

Shang-Kok NG

Doctoral Student

Bioinformatics, Cancer

Administration

Visiting Scholars

Jia-Hsin Huang

Assistant Professor

Insect Physiology, Bioinformatics, Genomics

Wong Jin Yung

Assistant Professor

Evolution, Genomics, Machine Learning, Biomechanics

Alumni

Recent Publications

A large language model framework for literature-based disease–gene association prediction

With the exponential growth of biomedical literature, leveraging Large Language Models (LLMs) for automated medical knowledge understanding has become increasingly critical for advancing precision medicine. However, current approaches face significant challenges in reliability, verifiability, and scalability when extracting complex biological relationships from scientific literature using LLMs. To overcome the obstacles of LLM development in biomedical literature understanding, we propose LORE, a novel unsupervised two-stage reading methodology with LLMs that models literature as a knowledge graph of verifiable factual statements and, in turn, as semantic embeddings in Euclidean space. LORE captured essential gene pathogenicity information when applied to PubMed abstracts for large-scale understanding of disease–gene relationships. We demonstrated that modeling a latent pathogenic flow in the semantic embedding with supervision from the ClinVar database led to a 90% mean average precision in identifying relevant genes across 2097 diseases. This work provides a scalable and reproducible approach for leveraging LLMs in biomedical literature analysis, offering new opportunities for researchers to identify therapeutic targets efficiently.

Discovery and prioritization of genetic determinants of kidney function in 297,355 individuals from Taiwan and Japan

Current genome-wide association studies (GWAS) for kidney function lack ancestral diversity, limiting their applicability to broader populations. The East-Asian population is especially under-represented, despite having the highest global burden of end-stage kidney disease. We conducted a meta-analysis of multiple GWASs (n = 244,952) on estimated glomerular filtration rate and a replication dataset (n = 27,058) from Taiwan and Japan. This study identified 111 lead SNPs in 97 genomic risk loci. Functional enrichment analyses revealed that variants associated with the F12 gene and a missense mutation in ABCG2 may contribute to chronic kidney disease (CKD) by influencing inflammation, coagulation, and urate metabolism pathways. In independent cohorts from Taiwan (n = 25,345) and the United Kingdom (n = 260,245), polygenic risk scores (PRSs) for CKD significantly stratified the risk of CKD (p < 0.0001). Further research is required to evaluate the clinical effectiveness of the PRS for CKD in the early prevention of kidney disease.

Predicting splicing patterns from the transcription factor binding sites in the promoter with deep learning

Alternative splicing is a pivotal mechanism of post-transcriptional modification that contributes to transcriptome plasticity and proteome diversity in metazoan cells. Although many splicing regulations around the exon/intron regions are known, the relationship between promoter-bound transcription factors and downstream alternative splicing remains largely unexplored. In this study, we present computational approaches to unravel the regulatory relationship between promoter-bound transcription factor binding sites (TFBSs) and splicing patterns. We curated a fine dataset that includes DNase I hypersensitive site sequencing and transcriptomes across fifteen human tissues from ENCODE. Specifically, we proposed different representations of TF binding context and splicing patterns to examine the associations between the promoter and downstream splicing events. While machine learning models demonstrated potential in predicting splicing patterns based on TFBS occupancies, limitations in generalizing to the splicing forms of singleton genes across diverse tissues were observed under careful examination with different cross-validation methods. We further investigated the association between alterations in individual TFBSs at promoters and shifts in exon splicing efficiency. Our results demonstrate that convolutional neural network (CNN) models, trained on TF binding changes in the promoters, can predict the changes in splicing patterns. Furthermore, a systematic in silico substitution analysis on the CNN models highlighted several potential splicing regulators. Notably, through empirical validation with K562 CTCFL shRNA knock-down data, we showed the significant role of CTCFL in splicing regulation. In conclusion, our findings highlight the potential role of promoter-bound TFBSs in influencing the regulation of downstream splicing patterns and provide insights for discovering alternative splicing regulations.

Contact