Faculty Research, Scholarly, and Creative Activity

A note on the price of bandit feedback for mistake-bounded online learning

Jesse Geneson, San Jose State University

Publication Date

6-12-2021

Document Type

Article

Publication Title

Theoretical Computer Science

Volume

874

DOI

10.1016/j.tcs.2021.05.009

First Page

42

Last Page

45

Abstract

The standard model and the bandit model are two generalizations of the mistake-bound model to online multiclass classification. In both models the learner guesses a classification in each round, but in the standard model the learner receives the correct classification after each guess, while in the bandit model the learner is only told whether or not their guess is correct in each round. For any set F of multiclass classifiers, define opt_std(F) and opt_bandit(F) to be the optimal worst-case number of prediction mistakes in the standard and bandit models, respectively. Long (Theoretical Computer Science, 2020) claimed that for all M>2 and infinitely many k, there exists a set F of functions from a set X to a set Y of size k such that opt_std(F)=M and opt_bandit(F)≥(1−o(1))(|Y| ln |Y|) opt_std(F). The proof of this result depended on the following lemma, which is false, e.g., for all primes p≥5, s=1 (the all-1 vector), t=2 (the all-2 vector), and all z. Lemma: Fix n≥2 and prime p, and let u be chosen uniformly at random from {0,…,p−1}^n. For any s,t∈{1,…,p−1}^n with s≠t and for any z∈{0,…,p−1}, we have [Formula presented]. We show that this lemma is false precisely when s and t are multiples of each other mod p. Then, using a new lemma, we fix Long's proof.
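The lemma's formula is not reproduced in this record, so the following Python sketch does not test the lemma itself. It only illustrates, by brute force, why the case singled out in the abstract is special: when t is a multiple of s mod p, the value of t·u mod p is completely determined by s·u mod p. The function name `joint_counts` and the example vectors are assumptions chosen for illustration, not taken from the paper.

```python
from itertools import product

def joint_counts(s, t, p):
    """Count occurrences of each pair (s.u mod p, t.u mod p)
    over all vectors u in {0, ..., p-1}^n (hypothetical helper)."""
    counts = {}
    for u in product(range(p), repeat=len(s)):
        a = sum(si * ui for si, ui in zip(s, u)) % p
        b = sum(ti * ui for ti, ui in zip(t, u)) % p
        counts[(a, b)] = counts.get((a, b), 0) + 1
    return counts

p = 5
# t = 2s mod p: the case named in the abstract (all-1 and all-2 vectors).
dependent = joint_counts((1, 1), (2, 2), p)
# s and t are not multiples of each other mod p (illustrative choice).
independent = joint_counts((1, 2), (1, 3), p)

print(len(dependent), len(independent))
```

With p=5 and n=2, the first pair of vectors produces only 5 distinct value pairs, each of the form (a, 2a mod 5), while the second produces all 25 pairs equally often. This matches the abstract's statement that the problematic case is exactly when s and t are multiples of each other mod p.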

Keywords

Bandit feedback, Learning theory, Mistake-bound model, Online learning

Department

Mathematics and Statistics

Recommended Citation

Jesse Geneson. "A note on the price of bandit feedback for mistake-bounded online learning" Theoretical Computer Science (2021): 42-45. https://doi.org/10.1016/j.tcs.2021.05.009
