# NOSE: Neural Olfactory Sensing and Evaluation
Repository for Introduction to Machine Learning Course Project, 2025 Spring.
This project provides code for neural-based olfactory (smell) sensing and evaluation using various machine learning models.
## Folder Structure

### `train/`
Contains scripts for model training and experiments:
- `classification.py`: Classification tasks on olfactory data.
- `regression.py`: Regression tasks for scent-related properties.
- `fine-tuned MolFormer.py`: Fine-tuning the MoLFormer model.
- `finetune_multitask.py`: Multitask fine-tuning.
- `OpenPoM.py`: OpenPoM-related training.
### `utils/`
Helper functions, dataset preparation, and visualization:
- `prepare_datasets.py`: Dataset preparation utilities.
- `gs_lf.py`: Latent factor helper.
- `helper_methods.py`: Miscellaneous helper functions.
- `mol_loss.py`: Molecular loss calculations.
- `util_alignment.py`: Alignment utilities.
- `visualization_helper.py`: Visualization tools.
- `test_gs_lf.ipynb`: Notebook for testing the latent factor code.
### `custom_utils/`
Custom utilities for argument parsing, data handling, encoding, and configuration:
- `args.py`, `args_finetune.py`: Argument parsing.
- `data_utils.py`: Data loading and preprocessing.
- `pubchem_encoder.py`: PubChem encoding utilities.
- `hparams.yaml`: Hyperparameter configuration.
- `train_pubchem_light.py`: Training on PubChem-light.
- `pubchem_canon_zinc_final_vocab_sorted.pth`: Precomputed vocabulary (PyTorch format).
- Folders: `rotate_attention/`, `tokenizer/`

(For the full file listing, see the `custom_utils` folder in the repository.)
## How to Run

### Install Requirements
Setting up the environment for this project can be tricky. We encourage you to follow the exact steps from IBM's MoLFormer repository; you can find the instructions here.
**Warning:** The environment includes `apex`, which may fail to build under certain CUDA versions. If you run into issues, try a different CUDA version or switch the optimizer to Adam in the training scripts.
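As a minimal sketch of the Adam fallback (the model and learning rate below are placeholders, not the project's actual values), swapping out the apex optimizer might look like:

```python
import torch

# Placeholder model standing in for the fine-tuning network.
model = torch.nn.Linear(768, 1)

# If apex fails to build, replace its fused optimizer with plain Adam, e.g.:
# optimizer = apex.optimizers.FusedLAMB(model.parameters(), lr=1e-4)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```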
### Prepare Datasets
We use the curated GS-LF dataset. You can download it from here.
For the Keller 2016 dataset, which we use as the test set, you can download it from here. We also apply an extra binarization step to this dataset.
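In spirit, the binarization step thresholds continuous perceptual ratings into 0/1 labels. A minimal pure-Python sketch (the threshold value and rating format here are assumptions, not the project's exact procedure):

```python
def binarize(ratings, threshold=0.5):
    """Map continuous ratings to binary labels: 1 if rating > threshold, else 0."""
    return [1 if r > threshold else 0 for r in ratings]

# Example: hypothetical perceptual ratings for one odor descriptor.
print(binarize([0.1, 0.7, 0.5, 0.9]))  # -> [0, 1, 0, 1]
```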
### Train Models
Before running the training scripts, make sure the datasets are prepared and placed in the correct directories. You will also need to download the MoLFormer_Pretrained model from here. Note that the checkpoint files are vital for the fine-tuning process; make sure you have them before running the training scripts.
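Since fine-tuning fails without the pretrained checkpoint, a quick sanity check before launching can save a wasted run. A small sketch (the checkpoint path is a placeholder; use wherever you downloaded the files):

```python
from pathlib import Path

def check_checkpoint(path):
    """Return True if the pretrained checkpoint file exists at the given path."""
    ckpt = Path(path)
    if not ckpt.is_file():
        print(f"Missing checkpoint: {ckpt} - download MoLFormer_Pretrained first.")
        return False
    return True

# Hypothetical location for the downloaded checkpoint.
check_checkpoint("checkpoints/MoLFormer_Pretrained.ckpt")
```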
For fine-tuning specific models:

```shell
python train/finetune_multitask.py
python train/fine-tuned\ MolFormer.py
```

For training classification or regression models:

```shell
python train/classification.py
python train/regression.py
```

### Customize Arguments and Hyperparameters
- Edit the YAML config in `custom_utils/hparams.yaml` for hyperparameters.
- Use `custom_utils/args.py` or `custom_utils/args_finetune.py` for advanced argument parsing.
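For quick experiments you can mirror the repository's argument-parsing pattern on the command line. A hypothetical minimal version (the flag names below are illustrative; check `custom_utils/args.py` for the real ones):

```python
import argparse

# Hypothetical subset of the parser in custom_utils/args.py.
parser = argparse.ArgumentParser(description="NOSE fine-tuning arguments (sketch)")
parser.add_argument("--lr", type=float, default=1e-4, help="learning rate")
parser.add_argument("--batch_size", type=int, default=64, help="mini-batch size")
parser.add_argument("--config", default="custom_utils/hparams.yaml",
                    help="path to the YAML hyperparameter file")

# Parse an example command line instead of sys.argv for demonstration.
args = parser.parse_args(["--lr", "3e-5", "--batch_size", "32"])
print(args.lr, args.batch_size)  # -> 3e-05 32
```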