VIRTUAL SCREENING OF CANDIDATE DRUG COMPOUNDS USING MOLECULAR DESCRIPTORS, PROTEIN PHYSICOCHEMICAL PROPERTIES, BINDING AFFINITY, AND MACHINE LEARNING-BASED ACTIVITY PREDICTION
Abstract
Virtual screening of candidate drug compounds is an important computational strategy for accelerating early-stage drug discovery by identifying compounds with predicted biological activity before experimental validation. This study evaluated candidate drug compounds using molecular descriptors, protein physicochemical properties, binding affinity, and machine learning-based activity prediction. The dataset contained 2,000 compound–protein interaction records and 17 variables, including molecular weight, LogP, hydrogen bond donors and acceptors, rotatable bonds, polar surface area, protein length, protein isoelectric point (pI), hydrophobicity, binding site size, engineered interaction features, binding affinity, and binary activity status. Data preprocessing involved missing value treatment, duplicate assessment, feature standardization, and stratified data splitting. Exploratory analysis showed that active and inactive compounds differed mainly in binding affinity, LogP, protein pI, and the LogP–pI interaction. Multiple supervised learning models were developed, including Logistic Regression, Random Forest, Support Vector Machine, Gradient Boosting, and k-Nearest Neighbors. Random Forest and Gradient Boosting produced the strongest classification performance, while feature importance analysis identified binding affinity as the dominant predictor, followed by the LogP–pI interaction and LogP. The findings indicate that integrated molecular, protein, and interaction-based descriptors can support accurate activity prediction and candidate prioritization, although external validation remains necessary before biological interpretation.
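The workflow summarized in the abstract (missing value treatment, duplicate assessment, standardization, a stratified split, five classifiers, and feature importance ranking) can be sketched roughly as follows with scikit-learn. This is a minimal illustration and not the study's code: the data are synthetic, and the column names, the labeling rule, the 80/20 split, the hyperparameters, and the use of ROC AUC as the metric are all assumptions made here for the example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data. Column names, distributions, and the labeling
# rule below are assumptions for illustration, not the study's dataset.
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "mol_weight": rng.normal(350, 60, n),
    "logp": rng.normal(2.5, 1.0, n),
    "h_donors": rng.integers(0, 6, n),
    "h_acceptors": rng.integers(0, 11, n),
    "protein_pi": rng.normal(7.0, 1.5, n),
    "binding_affinity": rng.normal(-7.0, 1.5, n),
})
# Engineered interaction feature (LogP multiplied by protein pI).
df["logp_x_pi"] = df["logp"] * df["protein_pi"]
# Hypothetical rule: stronger (more negative) affinity makes activity likelier.
score = -(df["binding_affinity"] + 7.0) + 0.02 * (df["logp_x_pi"] - 17.5)
df["active"] = (score + rng.normal(0, 0.5, n) > 0).astype(int)

# Duplicate assessment and simple median imputation of missing values.
df = df.drop_duplicates()
X = df.drop(columns="active")
X = X.fillna(X.median())
y = df["active"]

# Stratified split keeps the active/inactive ratio equal in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scale-sensitive models get a StandardScaler inside a pipeline, so the
# scaler is fitted on the training data only.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "Support Vector Machine": make_pipeline(
        StandardScaler(), SVC(probability=True, random_state=42)),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "k-Nearest Neighbors": make_pipeline(
        StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name}: ROC AUC = {results[name]:.3f}")

# Impurity-based importances from the fitted Random Forest, highest first.
importances = pd.Series(
    models["Random Forest"].feature_importances_, index=X.columns
).sort_values(ascending=False)
print(importances)
```

Because the synthetic labels are generated mostly from the affinity column, the ranking printed here reflects how the toy data were built and says nothing about the study's results. Placing the scaler inside each pipeline is a design choice that avoids leaking test-set statistics into training.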
License
Copyright (c) 2024 International Journal For Research In Biology & Pharmacy

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.



