Logistic regression models to predict solvent accessible residues using sequence- and homology-based qualitative and quantitative descriptors applied to a domain-complete X-ray structure learning set

Reecha Nepal, San Jose State University
Joanna Spencer, San Jose State University
Guneet Bhogal, San Jose State University
Amulya Nedunuri, San Jose State University
Thomas Poelman, Cal Poly San Luis Obispo
Thejas Kamath, University of California, San Diego
Edwin Chung, San Jose State University
Katherine Kantardjieff, California State University - San Marcos
Andrea Gottlieb, San Jose State University
Brooke Lustig, San Jose State University

Document Type

Article

Publication Date

12-2015

Publication Title

Journal of Applied Crystallography

Volume

48

Issue Number

6

First Page

1976

Last Page

1984

DOI

10.1107/S1600576715018531

Keywords

relative solvent accessibility, logistic regression, Lobanov–Galzitskaya descriptor

Disciplines

Chemistry

Abstract

A working example of relative solvent accessibility (RSA) prediction for proteins is presented. Novel logistic regression models with various qualitative descriptors that include amino acid type and quantitative descriptors that include 20- and six-term sequence entropy have been built and validated. A domain-complete learning set of over 1300 proteins is used to fit initial models with various sequence homology descriptors as well as query residue qualitative descriptors. Homology descriptors are derived from BLASTp sequence alignments, whereas the RSA values are determined directly from the crystal structure. The logistic regression models are fitted using dichotomous responses indicating buried or accessible solvent, with binary classifications obtained from the RSA values. The fitted models determine binary predictions of residue solvent accessibility with accuracies comparable to other less computationally intensive methods using the standard RSA threshold criteria 20 and 25% as solvent accessible. When an additional non-homology descriptor describing Lobanov–Galzitskaya residue disorder propensity is included, incremental improvements in accuracy are achieved with 25% threshold accuracies of 76.12 and 74.45% for the Manesh-215 and CASP(8+9) test sets, respectively. Moreover, the described software and the accompanying learning and validation sets allow students and researchers to explore the utility of RSA prediction with simple, physically intuitive models in any number of related applications.
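
The pipeline outlined in the abstract (alignment-derived entropy descriptors, RSA thresholding to a dichotomous response, and a logistic fit) can be illustrated with a minimal Python sketch. This is not the authors' software. The six-class residue grouping, the use of ">=" at the threshold, the function names, and the plain gradient-descent fit are all assumptions made for illustration, and the alignment columns and RSA values are invented toy data, not results from the paper.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# ASSUMPTION: one common six-class grouping of residues; the abstract does
# not state which grouping the six-term entropy actually uses.
SIX_GROUPS = {
    "aliphatic": "AVLIMC", "aromatic": "FWYH", "polar": "STNQ",
    "positive": "KR", "negative": "DE", "special": "GP",
}
GROUP_OF = {aa: g for g, members in SIX_GROUPS.items() for aa in members}


def shannon_entropy(symbols):
    """Shannon entropy (bits) of the symbol frequencies."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def column_entropies(column):
    """Return (20-term, six-term) entropy for one alignment column.

    Gap characters and non-standard residues are skipped.
    """
    residues = [aa for aa in column if aa in AMINO_ACIDS]
    return (shannon_entropy(residues),
            shannon_entropy(GROUP_OF[aa] for aa in residues))


def is_accessible(rsa_percent, threshold=25.0):
    """Dichotomous response: 1 = accessible, 0 = buried.

    ASSUMPTION: a residue exactly at the threshold counts as accessible.
    """
    return 1 if rsa_percent >= threshold else 0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.3, epochs=2000):
    """Fit [bias, w1, w2, ...] by batch gradient descent on the log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def predict(w, xi):
    p = sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], xi)))
    return 1 if p >= 0.5 else 0


# Toy data (invented for illustration): alignment column, RSA in percent.
toy = [("LLLLLLLL", 3.0), ("IIVLLIVL", 8.0), ("KERDQNST", 62.0),
       ("GASTNDKE", 45.0), ("FFFFYFFF", 12.0), ("DEKRQSNT", 71.0)]

features = [list(column_entropies(col)) for col, _ in toy]
labels = [is_accessible(rsa, threshold=25.0) for _, rsa in toy]
weights = fit_logistic(features, labels)
predicted = [predict(weights, x) for x in features]
print("labels:   ", labels)
print("predicted:", predicted)
```

In this sketch, conserved columns (low entropy) end up on the buried side of the fitted boundary and variable columns on the accessible side. A full model in the spirit of the abstract would add qualitative descriptors such as query residue type, which are omitted here for brevity.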

Comments

This article was published in the Journal of Applied Crystallography, volume 48, issue 6, 2015. It is also available at https://doi.org/10.1107/S1600576715018531.

This work is licensed under a Creative Commons Attribution 4.0 License.

Recommended Citation

Reecha Nepal, Joanna Spencer, Guneet Bhogal, Amulya Nedunuri, Thomas Poelman, Thejas Kamath, Edwin Chung, Katherine Kantardjieff, Andrea Gottlieb, and Brooke Lustig. "Logistic regression models to predict solvent accessible residues using sequence- and homology-based qualitative and quantitative descriptors applied to a domain-complete X-ray structure learning set." Journal of Applied Crystallography 48, no. 6 (2015): 1976-1984. https://doi.org/10.1107/S1600576715018531

Data_Preparation.zip (105424 kB)
Logistic_Regression_Modeling.zip (9416 kB)
Prediction_Results.zip (27 kB)

Faculty Publications, Chemistry
