Publication Date

Fall 2012

Degree Type

Master's Project

Department

Computer Science

Abstract

Yioop is an open-source search engine written in PHP. Yioop can be used for personal crawls of predefined URLs or, like a traditional search engine, to crawl the entire Web. This project added to Yioop the ability to crawl and index various resources that can be considered part of the Invisible Web. The Invisible Web refers to information on the Web such as database content, non-text files, JavaScript links, password-protected sites, and URL-shortening services. A user may often want to crawl and index kinds of data that traditional search engines do not commonly index. Mining log files and converting them into a readable format often helps in system management. In this project, the format of log files was studied, and it was observed that they contain predefined fields; the user is given an interface for specifying the names and types of these fields. Indexing databases is another feature that is helpful to a user. With this project, a user can index database records by entering a simple query into an interface created in Yioop; the matching records are then crawled and indexed, so the user can search them for the desired keywords. The goal of this project was to embed these features in Yioop using Yioop's existing capabilities.
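The abstract describes log files as having predefined fields whose names and types the user supplies before indexing. The following is a minimal sketch of that idea in Python, not Yioop's PHP implementation. The field list `FIELD_SPEC`, the type names `"str"`, `"int"`, and `"date"`, the function `parse_log_line`, and the assumption that logs use the Apache Common Log Format are all hypothetical choices made for illustration.

```python
import re
from datetime import datetime

# Hypothetical user-supplied field specification: (name, type) pairs in the
# order the captured fields appear on each log line.
FIELD_SPEC = [
    ("ip", "str"),
    ("timestamp", "date"),
    ("request", "str"),
    ("status", "int"),
    ("bytes", "int"),
]

# Assumed line layout: Apache Common Log Format. The identity and user
# columns are matched but not captured.
CLF_PATTERN = re.compile(
    r'(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) (\d+|-)'
)

# Converters keyed by the hypothetical type names a user could choose.
CONVERTERS = {
    "str": str,
    "int": lambda v: 0 if v == "-" else int(v),  # CLF writes "-" for no bytes
    "date": lambda v: datetime.strptime(v, "%d/%b/%Y:%H:%M:%S %z"),
}


def parse_log_line(line, spec=FIELD_SPEC):
    """Split one log line into a typed record, or return None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    return {
        name: CONVERTERS[ftype](raw)
        for (name, ftype), raw in zip(spec, match.groups())
    }


if __name__ == "__main__":
    sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
              '"GET /apache_pb.gif HTTP/1.0" 200 2326')
    record = parse_log_line(sample)
    print(record["status"])  # prints 200
```

Keeping the field names and types as data rather than hard-coding them mirrors the abstract's point that the user, not the crawler, describes the log format; each resulting record could then be turned into a document for the indexer.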
