ChemFusion-SGK: A Multi-Modal Approach to Molecular Prediction
Keywords:
Molecular property prediction; ADMET
Abstract
Accurate prediction of absorption, distribution, metabolism, excretion and toxicity (ADMET) properties from molecular structure is a central bottleneck in early-stage drug discovery. Current deep-learning approaches commit prematurely to a single molecular view: graph neural networks privilege topology, sequence models privilege connectivity expressed as text, and fingerprint models privilege pre-computed substructural features. Each view carries a distinct inductive bias, yet combining them so that the model decides, molecule by molecule, how much to trust each view remains an open problem. We propose ChemFusion-SGK, a triple-view architecture in which a SMILES Transformer, an edge-conditioned Message Passing Graph Attention Network, and a Morgan fingerprint multi-layer perceptron are fused by a learned Cross-Modal Gate and passed to a Chebyshev-basis Kolmogorov–Arnold Network (KAN) output head. The gate produces per-molecule softmax weights that reveal which view is driving the prediction, and the Chebyshev-KAN head replaces the standard linear output layer with a learnable nonlinear activation on every edge. On five MoleculeNet ADMET benchmarks (BBBP, BACE, ClinTox, ESOL, FreeSolv) under scaffold splitting, ChemFusion-SGK achieves a state-of-the-art ROC-AUC of 0.996 on ClinTox, is within 0.01 AUC of the best single-view model on BACE and BBBP, and reaches R² = 0.743 (RMSE = 1.071 log mol/L) on ESOL solubility regression. An ablation study shows that removing the KAN head, the Cross-Modal Gate, or the fingerprint branch degrades performance on at least one dataset, and the learned gate weights quantify the dataset-specific reliance on sequence (0.88 on ClinTox), graph topology (0.44 on BBBP) or composite features (0.57 on ESOL). The framework runs entirely on free-tier Google Colab, uses only public datasets, and is released as reproducible notebooks.
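To make the two central mechanisms concrete, the following is a minimal pure-Python sketch of softmax-gated fusion of three view embeddings followed by a single Chebyshev-basis KAN layer. It is an illustration under stated assumptions, not the released implementation: the function names (`gate_fuse`, `cheb_kan_layer`), the weighted-sum form of the fusion, the `tanh` squashing of inputs into [-1, 1], and all numeric values are hypothetical. In a trained model the gate scores and the Chebyshev coefficients would be learned; here they are passed in by hand.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gate_fuse(views, gate_scores):
    """Fuse equally sized view embeddings as a softmax-weighted sum.

    Assumption: fusion is a convex combination of the views; the
    gate_scores stand in for the output of a learned gating network.
    """
    weights = softmax(gate_scores)
    dim = len(views[0])
    fused = [sum(w * v[i] for w, v in zip(weights, views)) for i in range(dim)]
    return fused, weights

def chebyshev_basis(x, degree):
    """Return [T_0(x), ..., T_degree(x)] via T_k = 2x*T_{k-1} - T_{k-2}."""
    t = [1.0, x]
    for _ in range(2, degree + 1):
        t.append(2.0 * x * t[-1] - t[-2])
    return t[: degree + 1]

def cheb_kan_layer(inputs, coeffs):
    """One KAN layer: each edge (input i -> output j) applies its own
    univariate function, expressed as a Chebyshev series.

    coeffs[j][i][k] is the coefficient of T_k on the edge i -> j.
    Assumption: inputs are squashed with tanh so they lie in [-1, 1],
    the domain on which Chebyshev polynomials are well behaved.
    """
    degree = len(coeffs[0][0]) - 1
    bases = [chebyshev_basis(math.tanh(x), degree) for x in inputs]
    return [
        sum(c * t for edge, basis in zip(row, bases) for c, t in zip(edge, basis))
        for row in coeffs
    ]

# Hypothetical 2-dimensional embeddings from the three views.
smiles_view = [0.2, -0.1]
graph_view = [0.5, 0.3]
fingerprint_view = [-0.4, 0.8]

fused, weights = gate_fuse(
    [smiles_view, graph_view, fingerprint_view],
    gate_scores=[2.0, 0.5, 0.1],  # hypothetical per-molecule gate logits
)

# One output unit, two inputs, degree-2 Chebyshev series on each edge.
head_coeffs = [[[0.1, 0.7, -0.2], [0.0, 0.3, 0.5]]]
prediction = cheb_kan_layer(fused, head_coeffs)
```

The `weights` list returned by `gate_fuse` corresponds to the per-molecule view weights described above: averaging it over a dataset would give the kind of dataset-level reliance figures reported for ClinTox, BBBP and ESOL.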
