Publication Date

Spring 2025

Degree Type

Master's Project

Degree Name

Master of Science in Computer Science (MSCS)

Department

Computer Science

First Advisor

Chris Pollett

Second Advisor

Wendy Lee

Third Advisor

Robert Chun

Keywords

Human Cell Atlas, Tabula Sapiens, Single-cell RNA sequencing, Elbow method, k-means, Gene expression, Shannon’s Diversity Index, Pielou’s Evenness Index

Abstract

The Human Cell Atlas (HCA) aims to create a reference map of all human cells. My project uses the Tabula Sapiens dataset, developed under HCA and based on single-cell RNA sequencing data, to explore cell type and tissue diversity. To determine the number of clusters, I ran experiments with the Elbow method and with a formula derived from observations of the dataset, then applied k-means clustering to two representative subsets of the All Cells dataset. Clusters were selected for analysis using Shannon’s Diversity Index and Pielou’s Evenness Index. A novel algorithm based on the cell differentiation tree was used to validate the biological coherence of the clusters. Cell type emerged as the most informative feature for interpreting cluster structure. The report concludes with a summary of results and recommendations for future research.
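
For readers unfamiliar with the two indices named in the abstract, the following is a minimal sketch of their standard textbook definitions, applied to the cell-type labels inside a single cluster. It is not the project's code: the function names, the toy labels, and the choice to return 0.0 when a cluster holds fewer than two cell types are assumptions made for illustration, and the abstract does not say how the indices were thresholded to select clusters.

```python
import math
from collections import Counter


def shannon_diversity(labels):
    """Shannon's Diversity Index H = -sum(p_i * ln p_i) over label proportions."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # assumption: an empty cluster has zero diversity
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def pielou_evenness(labels):
    """Pielou's Evenness Index J = H / ln(S), where S is the number of distinct labels."""
    n_types = len(set(labels))
    if n_types < 2:
        return 0.0  # assumption: J is undefined for S < 2, so report 0.0
    return shannon_diversity(labels) / math.log(n_types)


# Hypothetical clusters: lists of cell-type labels, one entry per cell.
mixed_cluster = ["T cell"] * 50 + ["B cell"] * 50
skewed_cluster = ["T cell"] * 95 + ["B cell"] * 5

for name, cluster in [("mixed", mixed_cluster), ("skewed", skewed_cluster)]:
    print(name, round(shannon_diversity(cluster), 3), round(pielou_evenness(cluster), 3))
```

Under these definitions, a cluster split evenly between cell types has evenness close to 1, while a cluster dominated by one cell type scores closer to 0.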

Available for download on Monday, May 25, 2026
