VIRTUAL SCREENING OF CANDIDATE DRUG COMPOUNDS USING MOLECULAR DESCRIPTORS, PROTEIN PHYSICOCHEMICAL PROPERTIES, BINDING AFFINITY, AND MACHINE LEARNING-BASED ACTIVITY PREDICTION

Dr. Habiba Alsafar; Dr. Basma AlBlooshi; Dr. Amina Al Kaabi; Dr. Mohammed Y. Ali

Abstract

Virtual screening of candidate drug compounds is an important computational strategy for accelerating early-stage drug discovery by identifying compounds with predicted biological activity before experimental validation. This study evaluated candidate drug compounds using molecular descriptors, protein physicochemical properties, binding affinity, and machine learning-based activity prediction. The dataset contained 2,000 compound–protein interaction records and 17 variables, including molecular weight, LogP, hydrogen bond donors and acceptors, rotatable bonds, polar surface area, protein length, protein isoelectric point, hydrophobicity, binding site size, engineered interaction features, binding affinity, and binary activity status. Data preprocessing involved missing value treatment, duplicate assessment, feature standardization, and stratified data splitting. Exploratory analysis showed that active and inactive compounds differed mainly in binding affinity, LogP, protein pI, and LogP–pI interaction. Multiple supervised learning models were developed, including Logistic Regression, Random Forest, Support Vector Machine, Gradient Boosting, and k-Nearest Neighbors. Random Forest and Gradient Boosting produced the strongest classification performance, while feature importance analysis identified binding affinity as the dominant predictor, followed by LogP–pI interaction and LogP. The findings indicate that integrated molecular, protein, and interaction-based descriptors can support accurate activity prediction and candidate prioritization, although external validation remains necessary before biological interpretation.
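
The preprocessing steps named in the abstract (an engineered LogP–pI interaction feature, stratified splitting, and feature standardization) can be illustrated with a minimal, dependency-free sketch. It is not the authors' pipeline: the abstract does not state how the interaction feature was computed, so a simple product is assumed here, and the field names (`logp`, `protein_pi`, `binding_affinity`, `active`), the 80/20 split, and the randomly generated records are all hypothetical placeholders rather than values from the study.

```python
import random
import statistics

FEATURES = ["logp", "protein_pi", "binding_affinity", "logp_pi"]  # hypothetical names


def make_synthetic(n=200, seed=1):
    """Generate placeholder records; values and labels carry no chemical meaning."""
    rng = random.Random(seed)
    return [
        {
            "logp": rng.uniform(-1.0, 6.0),
            "protein_pi": rng.uniform(4.0, 10.0),
            "binding_affinity": rng.uniform(-12.0, -3.0),
            "active": int(i % 4 == 0),  # every fourth record is labelled active
        }
        for i in range(n)
    ]


def add_interaction(records):
    """Add an assumed LogP x pI product as the engineered interaction feature."""
    return [{**r, "logp_pi": r["logp"] * r["protein_pi"]} for r in records]


def stratified_split(records, label_key="active", test_fraction=0.2, seed=0):
    """Split so that each class contributes the same fraction to the test set."""
    rng = random.Random(seed)
    by_class = {}
    for r in records:
        by_class.setdefault(r[label_key], []).append(r)
    train, test = [], []
    for group in by_class.values():
        shuffled = group[:]
        rng.shuffle(shuffled)
        n_test = round(len(shuffled) * test_fraction)
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return train, test


def fit_scaler(train, features):
    """Estimate mean and standard deviation from the training split only."""
    params = {}
    for f in features:
        values = [r[f] for r in train]
        sd = statistics.pstdev(values)
        params[f] = (statistics.fmean(values), sd if sd > 0 else 1.0)
    return params


def standardize(records, params):
    """Apply z-score scaling with previously fitted parameters."""
    return [
        {**r, **{f: (r[f] - mean) / sd for f, (mean, sd) in params.items()}}
        for r in records
    ]


records = add_interaction(make_synthetic())
train, test = stratified_split(records)
params = fit_scaler(train, FEATURES)
train_std = standardize(train, params)
test_std = standardize(test, params)
```

Fitting the scaler on the training split and reusing its parameters on the test split keeps test-set statistics out of model fitting, which is the usual reason for splitting before standardizing.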

International Journal for Research in Biology & Pharmacy