Even though cloud server has been around for a few years, most of the web hosts today have not converted to cloud yet. If the purpose of the cloud server is distributing and storing files on the internet, FTP servers were much earlier than the cloud. FTP server is sufficient to distribute content on the internet. Therefore, is it worth to shift from FTP server to cloud server? The cloud storage provider declares high durability and availability for their users, and the ability to scale up for more storage space easily could save users tons of money. However, does it provide higher performance and better security features? Hadoop is a very popular platform for cloud computing. It is free software under Apache License. It is written in Java and supports large data processing in a distributed environment. Characteristics of Hadoop include partitioning of data, computing across thousands of hosts, and executing application computations in parallel. Hadoop Distributed File System allows rapid data transfer up to thousands of terabytes, and is capable of operating even in the case of node failure. GridFTP supports high-speed data transfer for wide-area networks. It is based on the FTP and features multiple data channels for parallel transfers. This report describes the technology behind HDFS and enhancement to the Hadoop security features with Kerberos. Based on data transfer performance and security features of HDFS and GridFTP server, we can decide if we should replace GridFTP server with HDFS. According to our experiment result, we conclude that GridFTP server provides better throughput than HDFS, and Kerberos has minimal impact to HDFS performance. We proposed a solution which users authenticate with HDFS first, and get the file from HDFS server to the client using GridFTP.
Liu, Wei-Li, "Cloud Storage Performance and Security Analysis with Hadoop and GridFTP" (2013). Master's Projects. 289.