Publication Date

2008

Degree Type

Master's Project

Degree Name

Master of Science (MS)

Department

Computer Science

Abstract

Grid-based visualization portals help scientists explore and visualize data that is distributed across the globe. Visualization lets scientists explore data effectively and gain further insight into it. We developed a visualization grid portal whose main aim is to store large data sets in a distributed fashion across the machines of a cluster and to let users of the portal visualize those data sets effectively. The portal uses HADOOP, a grid platform that provides flexible distributed data storage and also supports distributed computation. The main goal of the portal is to accept positional datasets from the user, process them efficiently on the grid, and produce the visualization. Input to the grid is partitioned into multiple pieces, and the partitions are processed concurrently. The current implementation of HADOOP does not consider any boundaries when it partitions the input, which limits the kinds of applications that can run on the grid. Our aim is to implement boundary-based input partitioning and to enable online job submission to the grid. We discuss the advantages and disadvantages of the boundary-based input split. Finally, we compare the performance of grid processing with that of standalone machine processing on the same dataset to determine which approach is faster and more efficient.
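
The idea of a boundary-based input split can be illustrated with a minimal sketch. It is not the project's implementation and does not use the HADOOP API. It assumes, purely for illustration, that the input is a byte buffer of newline-delimited records; the function name `boundary_aligned_splits` and its parameters are hypothetical. Each partition is extended to the next record delimiter so that no record straddles two partitions.

```python
from typing import List, Tuple


def boundary_aligned_splits(
    data: bytes, target_size: int, delimiter: bytes = b"\n"
) -> List[Tuple[int, int]]:
    """Return (start, end) byte ranges of roughly target_size bytes.

    Hypothetical helper: every range except possibly the last ends
    immediately after a delimiter, so records are never cut in half.
    """
    if target_size <= 0:
        raise ValueError("target_size must be positive")
    if not delimiter:
        raise ValueError("delimiter must be non-empty")

    splits: List[Tuple[int, int]] = []
    start, n = 0, len(data)
    while start < n:
        tentative_end = min(start + target_size, n)
        if tentative_end < n:
            # Search from just before the tentative end, so a cut that
            # already sits right after a delimiter is kept unchanged.
            search_from = max(start, tentative_end - len(delimiter))
            idx = data.find(delimiter, search_from)
            end = n if idx == -1 else idx + len(delimiter)
        else:
            end = n
        splits.append((start, end))
        start = end
    return splits


# Four positional records of 4 bytes each (assumed sample data).
records = b"1,2\n3,4\n5,6\n7,8\n"
splits = boundary_aligned_splits(records, target_size=5)
chunks = [records[s:e] for s, e in splits]
```

A fixed-size split with a target of 5 bytes would cut the second record after its first byte; the boundary-aligned version instead grows each partition to the next newline. The trade-off is that partitions are no longer equal in size, which can unbalance the work when record lengths vary widely.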
