Publication Date

Spring 5-22-2019

Degree Type

Master's Project

Degree Name

Master of Science (MS)


Computer Science

First Advisor

Suneuy Kim

Second Advisor

Christopher Pollett

Third Advisor

Robert Chun


Geospatial data now appears in applications ranging from simple ones, such as booking a taxi ride, to complex ones, such as autonomous driving. Although this attention is recent, substantial research on geospatial processing has been under way for years. The rise of NoSQL databases has given geospatial processing a new dimension in terms of its applications and capabilities. The most popular NoSQL database for geospatial processing is MongoDB, followed by Cassandra. Irrespective of the type of database, it is the indexing process that matters most for the data at hand. Some of the most common indexes used for geospatial processing are the R-tree, R*-tree, B-tree, and Z-curve. The R*-tree is the focus of our study because it is one of the widely used indexes for geospatial querying. The database of interest is Cassandra, one of the widely used NoSQL databases that does not have native support for geospatial query processing. To support geospatial workloads, Cassandra must interact with external libraries such as GeoMesa and Solr. In particular, we are interested in GeoMesa because it uses the Z-curve as its indexing mechanism for geospatial processing. The R*-tree is a dynamic structure capable of representing multi-dimensional data, whereas the Z-curve maps multi-dimensional data onto a single dimension. In this study, we compare and contrast the performance of the R*-tree and the Z-curve for various geospatial operations in Cassandra.
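
To illustrate how a Z-curve can represent two-dimensional data in a single dimension, the following is a minimal Python sketch of bit interleaving (Morton encoding). It is a generic illustration and not GeoMesa's actual implementation; the function names `interleave_bits` and `z_key`, the 16-bit grid resolution, and the linear quantization of longitude and latitude are all assumptions made for this example.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Z-order key."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # bits of x go to even positions
        z |= ((y >> i) & 1) << (2 * i + 1)   # bits of y go to odd positions
    return z


def z_key(lon: float, lat: float, bits: int = 16) -> int:
    """Quantize a lon/lat pair onto a grid and return its Z-order key.

    Assumes lon in [-180, 180] and lat in [-90, 90] (hypothetical helper).
    """
    cells = (1 << bits) - 1
    x = int((lon + 180.0) / 360.0 * cells)   # column index on the grid
    y = int((lat + 90.0) / 180.0 * cells)    # row index on the grid
    return interleave_bits(x, y, bits)
```

Because the key is a single integer, it can be stored and range-scanned in an ordinary one-dimensional sorted index, and points that are close on the grid often, though not always, receive nearby keys.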