Publication Date
Spring 2020
Degree Type
Master's Project
Degree Name
Master of Science (MS)
Department
Computer Science
First Advisor
Chris Pollett
Second Advisor
Robert Chun
Third Advisor
Thomas Austin
Keywords
Hash2vec, Word2Vec, Sequence-to-Sequence Translation, Deep Learning, Recurrent Neural Network (RNN), Principal Component Analysis (PCA).
Abstract
Machine Translation is the study of computer translation of text written in one human language into text in a different language. Within this field, a word embedding is a mapping from terms in a language to low-dimensional vectors that can be processed using mathematical operations. Two traditional word embedding approaches are word2vec, which uses a neural network, and hash2vec, which is based on a simpler hashing algorithm. In this project, we explored the relative suitability of each approach to sequence-to-sequence text translation using a Recurrent Neural Network (RNN). We also carried out experiments to test whether we can directly compute a mapping from word embeddings in one language to word embeddings in another language using Linear Regression followed by Principal Component Analysis (PCA).
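The hashing idea behind hash2vec can be illustrated with a minimal sketch: each context word is hashed to a fixed vector index and a sign, and a word's embedding is the signed count of its hashed context words accumulated in a single pass over the corpus, with no training. The dimension, window size, and md5-based sign hash below are illustrative assumptions, not the project's actual implementation.

```python
import hashlib
from collections import defaultdict

DIM = 8  # tiny embedding size for illustration; real runs use hundreds of dimensions

def bucket(word, dim=DIM):
    # deterministically hash a word to a vector index and a +/-1 sign
    # (md5 is an arbitrary choice here; any stable hash works)
    h = int(hashlib.md5(word.encode()).hexdigest(), 16)
    return h % dim, 1 if (h // dim) % 2 == 0 else -1

def hash2vec(tokens, window=2, dim=DIM):
    # accumulate signed hashed context counts: a single corpus pass, no training
    vecs = defaultdict(lambda: [0.0] * dim)
    for i, word in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j == i:
                continue
            idx, sign = bucket(tokens[j], dim)
            vecs[word][idx] += sign
    return dict(vecs)

corpus = "the cat sat on the mat the dog sat on the rug".split()
emb = hash2vec(corpus)
```

Because the embedding is a deterministic function of the corpus, identical runs produce identical vectors, which is what makes the approach so much cheaper than training word2vec.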
We trained the word2vec model for 24 hours on Google Colab with default settings. When applied to sentence translation, this word2vec model achieved 85% accuracy. Surprisingly, the hash2vec model performed relatively well, reaching 60% accuracy while requiring only 6 hours of processing time, a substantial saving over training word2vec. Further research could apply the hash2vec technique to larger datasets and to different machine learning models.
Recommended Citation
Gaikwad, Neha, "Comparison of Word2vec with Hash2vec for Machine Translation" (2020). Master's Projects. 919.
DOI: https://doi.org/10.31979/etd.e7pz-uhqh
https://scholarworks.sjsu.edu/etd_projects/919