Publication Date

Spring 5-18-2020

Degree Type

Master's Project

Degree Name

Master of Science (MS)

Department

Computer Science

First Advisor

Robert Chun

Second Advisor

Thomas Austin

Third Advisor

Shobhit Saxena

Abstract

As Internet usage has grown, the Internet has become a major platform for advertising and marketing. Digital marketing now plays an important role in the economy: many major publicly available Internet services are free to users because digital advertising funds them. It has also allowed the publisher ecosystem to flourish by providing significant monetary incentives for creating quality public content, helping to usher in the information age. Digital advertising, however, comes with its own set of challenges, and one of the biggest is ad fraud. Malicious parties and software have proliferated, seeking to undermine the ecosystem and causing monetary harm to digital advertisers and ad networks. Pay-per-click advertising, in which each click is highly valuable, is especially susceptible to click fraud. Click fraud causes advertisers to lose money and ad networks to lose credibility, hurting the overall ecosystem. Much fraud detection is done in offline data pipelines, which compute fraud/non-fraud labels for clicks long after they happened. This is because click fraud detection usually depends on complex machine learning models that use a large number of features over huge datasets, and such models can be very costly to train and to query. This thesis hypothesizes that low-cost ad click fraud classifiers with reasonable precision and recall exist. A set of simple heuristics and basic machine learning models (with associated simplified feature spaces) is compared with complex machine learning models on performance and classification accuracy. Through research and experimentation, a performant classifier is identified that can be deployed for real-time fraud detection.
