Faculty Publications
Copyright (c) 2015 San Jose State University. All rights reserved.
http://scholarworks.sjsu.edu/math_pub
Recent documents in Faculty Publications
Fri, 27 Feb 2015 18:08:17 PST

Evidence Contrary to the Statistical View of Boosting
http://scholarworks.sjsu.edu/math_pub/5
Wed, 18 Sep 2013 17:20:35 PDT
The statistical perspective on boosting algorithms focuses on optimization, drawing parallels with maximum likelihood estimation for logistic regression. In this paper we present empirical evidence that raises questions about this view. Although the statistical perspective provides a theoretical framework within which it is possible to derive theorems and create new algorithms in general contexts, we show that there remain many important unanswered questions. Furthermore, we provide examples that reveal crucial flaws in many of the practical suggestions and new methods that are derived from the statistical view. We perform carefully designed experiments using simple simulation models to illustrate some of these flaws and their practical consequences.
David Mease et al.
Refereed Articles

Enhancing the Communication Competency of Business Undergraduates: A Consumer Socialization Perspective
http://scholarworks.sjsu.edu/math_pub/4
Wed, 18 Sep 2013 17:20:33 PDT
Socialization Theory is often used to explain how individuals acquire the skills and knowledge they need to participate effectively in society. We investigate numerous socialization agents and their relationship with the communication competency of university business majors. Communication competency (reading, writing, and verbal) was measured via both a standardized skill test and self-report. Exploratory analysis was conducted on high and low communication competency groups that were identified via cluster analysis. Our findings generally indicate that the most important socialization agents operate through personal interaction, whereas the least important ones exert their influence primarily through electronic or media-based channels.
K. C. Gehrt et al.
Refereed Articles

Analysis of the convergence history of fluid flow through nozzles with shocks
http://scholarworks.sjsu.edu/math_pub/3
Wed, 18 Sep 2013 17:20:30 PDT
Convergence of iterative methods for the solution of the steady quasi-one-dimensional nozzle problem with shocks is considered. The finite-difference algorithms obtained from implicit schemes are used to approximate both the Euler and Navier-Stokes equations. These algorithms are investigated for stability and convergence characteristics. The numerical methods are broken down into their matrix-vector components and then analyzed by examining a subset of the eigensystem using a method based on the Arnoldi process. Compared with the eigenvalues obtained using the full Jacobian, the eigenvalues obtained by this method are accurate to within 5 digits for the largest ones and to within 2 digits for the ones smaller in magnitude. In the analysis we examine the functional relationship between the numerical parameters and the rate of convergence of the iterative scheme. Acceleration techniques for iterative methods, such as Wynn's ε-algorithm, are also applied to these systems of difference equations in order to accelerate their convergence. This acceleration translates into savings in the total number of iterations and thus the total amount of computer time required to obtain a converged solution. The rate of convergence of the accelerated system is found to agree with the prediction based on the eigenvalues of the original iteration matrix. The ultimate goal of this study is to extend this eigenvalue analysis to multi-dimensional problems and to quantitatively estimate the effects of different parameters on the rate of convergence.
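As a rough illustration of the acceleration idea mentioned in the abstract above, the sketch below applies Wynn's ε-algorithm to the iterates of a toy scalar fixed-point iteration. The function name `wynn_epsilon` and the example iteration x ← 0.5x + 1 are assumptions made purely for illustration; they are not taken from the paper, whose iterations come from implicit finite-difference schemes for the nozzle problem.

```python
def wynn_epsilon(seq):
    """Estimate the limit of `seq` with Wynn's epsilon algorithm.

    Recurrence: eps_{k+1}^{(n)} = eps_{k-1}^{(n+1)} + 1 / (eps_k^{(n+1)} - eps_k^{(n)}),
    with eps_{-1} = 0 and eps_0^{(n)} = seq[n]. Only even columns are limit estimates.
    """
    n = len(seq)
    if n < 3:
        raise ValueError("need at least three terms")
    prev = [0.0] * (n + 1)           # column k-1, initially eps_{-1} = 0
    curr = [float(s) for s in seq]   # column k, initially eps_0 = the iterates
    best = curr[-1]                  # fall back to the last raw iterate
    for k in range(1, n):
        nxt = []
        for j in range(len(curr) - 1):
            diff = curr[j + 1] - curr[j]
            if diff == 0.0:
                # Neighbouring entries agree exactly: treat as converged.
                return best
            nxt.append(prev[j + 1] + 1.0 / diff)
        prev, curr = curr, nxt
        if k % 2 == 0:
            best = curr[-1]          # newest entry of an even column
    return best


# Toy linear iteration x <- 0.5 * x + 1 (hypothetical example), whose limit is 2.
iterates = [0.0]
for _ in range(4):
    iterates.append(0.5 * iterates[-1] + 1.0)

# The last raw iterate is 1.875; the accelerated estimate is 2.0.
print(iterates[-1], wynn_epsilon(iterates))
```

When the error decays exactly geometrically, as in this toy case, the second column of the ε-table already reproduces the limit. For an iteration governed by a matrix, the dominant eigenvalues play the role of the factor 0.5 here.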
Mohammad Saleem et al.
Publications / Abstracts

Comment: Boosting Algorithms: Regularization, Prediction and Model Fitting
http://scholarworks.sjsu.edu/math_pub/2
Wed, 18 Sep 2013 17:20:26 PDT
The authors are doing the readers of Statistical Science a true service with a well-written and up-to-date overview of boosting that originated with the seminal algorithms of Freund and Schapire. Equally, we are grateful for high-level software that will permit a larger readership to experiment with, or simply apply, boosting-inspired model fitting. The authors show us a world of methodology that illustrates how a fundamental innovation can penetrate every nook and cranny of statistical thinking and practice. They introduce the reader to one particular interpretation of boosting and then give a display of its potential with extensions from classification (where it all started) to least squares, exponential family models, survival analysis, to base-learners other than trees such as smoothing splines, to degrees of freedom and regularization, and to fascinating recent work in model selection. The uninitiated reader will find that the authors did a nice job of presenting a certain coherent and useful interpretation of boosting. The other reader, though, who has watched the business of boosting for a while, may have quibbles with the authors over details of the historic record and, more importantly, over their optimism about the current state of theoretical knowledge. In fact, as much as "the statistical view" has proven fruitful, it has also resulted in some ideas about why boosting works that may be misconceived, and in some recommendations that may be misguided.
A. Buja et al.
Refereed Articles

Boosted Classification Trees and Class Probability/Quantile Estimation
http://scholarworks.sjsu.edu/math_pub/1
Wed, 18 Sep 2013 17:20:24 PDT
The standard by which binary classifiers are usually judged, misclassification error, assumes equal costs of misclassifying the two classes or, equivalently, classifying at the 1/2 quantile of the conditional class probability function P[y = 1 | x]. Boosted classification trees are known to perform quite well for such problems. In this article we consider the use of standard, off-the-shelf boosting for two more general problems: 1) classification with unequal costs or, equivalently, classification at quantiles other than 1/2, and 2) estimation of the conditional class probability function P[y = 1 | x]. We first examine whether the latter problem, estimation of P[y = 1 | x], can be solved with LogitBoost, and with AdaBoost when combined with a natural link function. The answer is negative: both approaches are often ineffective because they overfit P[y = 1 | x] even though they perform well as classifiers. A major negative point of the present article is the disconnect between class probability estimation and classification. Next we consider the practice of over/under-sampling of the two classes. We present an algorithm that uses AdaBoost in conjunction with Over/Under-Sampling and Jittering of the data ("JOUS-Boost"). This algorithm is simple, yet successful, and it preserves the advantage of relative protection against overfitting, but for arbitrary misclassification costs and, equivalently, arbitrary quantile boundaries. We then use collections of classifiers obtained from a grid of quantiles to form estimators of class probabilities. The estimates of the class probabilities compare favorably to those obtained by a variety of methods across both simulated and real data sets.
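The equivalence stated in the abstract above, between unequal misclassification costs and classification at a quantile other than 1/2, can be made concrete with a small sketch. The names `cost_threshold`, `classify`, `cost_fp`, and `cost_fn` are hypothetical, and the rule is the standard expected-cost argument rather than code from the article. With p = P[y = 1 | x], predicting class 1 costs (1 - p) * cost_fp in expectation and predicting class 0 costs p * cost_fn, so class 1 is preferred when p > cost_fp / (cost_fp + cost_fn).

```python
def cost_threshold(cost_fp: float, cost_fn: float) -> float:
    """Quantile q such that predicting class 1 iff p > q minimises expected cost.

    cost_fp: cost of predicting 1 when the true class is 0 (assumed name).
    cost_fn: cost of predicting 0 when the true class is 1 (assumed name).
    """
    if cost_fp <= 0 or cost_fn <= 0:
        raise ValueError("costs must be positive")
    return cost_fp / (cost_fp + cost_fn)


def classify(p: float, cost_fp: float = 1.0, cost_fn: float = 1.0) -> int:
    """Return 1 if the class-1 probability p exceeds the cost-implied quantile."""
    return 1 if p > cost_threshold(cost_fp, cost_fn) else 0


# Equal costs recover the usual 1/2 boundary.
print(cost_threshold(1.0, 1.0))
# Making a missed class-1 case three times as costly lowers the boundary.
print(cost_threshold(1.0, 3.0), classify(0.3, cost_fp=1.0, cost_fn=3.0))
```

A point with p = 0.3 is assigned to class 0 under equal costs but to class 1 once false negatives are weighted three times as heavily, which is the sense in which the two formulations coincide.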
David Mease et al.
Refereed Articles