Some ‘Learning’ in Cheminformatics, QSAR and Generative AI
Published:
Introduction
In 2025, I published two posts exploring molecular modelling through AI co-folding and physics-based simulations for Structure-Based Drug Discovery (SBDD), drawing on my background in physical organic chemistry and biophysics. However, two pillars of modern computational drug discovery, Cheminformatics and Machine Learning (ML), have yet to be discussed here. These are the fields in which I have grown the most professionally during my time in industry.
In this post, I will share insights from my journey moving between the roles of cheminformatician and ML engineer in biotech. To demonstrate a real-world application of data science in drug discovery, I will present a virtual screening (VS) workflow for covalent drug discovery in the Cereblon (CRBN) chemical space (Figure 1). We will walk through a rigorous pipeline, from chemical database mining and library enumeration to molecular docking, QSAR modelling, and AI-driven generation, all guided by the rational constraints of organic and medicinal chemistry.

