Muhammad Faizan Khan


PhD Programme: Bioinformatics
Research group: MIL@b – Metabolomics Interdisciplinary Laboratory
Supervisor: Óscar Yanes Torrado 


Bio

Muhammad Faizan Khan, from Pakistan, holds a BS in Software Engineering from Comsats University Attock. After graduating, he joined the Aviation Design Institute (AvDI) in Pakistan as a design engineer, working on several machine learning projects. He was then selected as a Graduate Assistant at the Ghulam Ishaq Khan Institute of Engineering Sciences and Technology (GIKI), where he also completed a Master's in Computer Science. His thesis was titled "Fixed-Point optimization of fully convolutional networks", and the work is under review at the journal "IEEE Transactions on Pattern Analysis and Machine Intelligence" (Impact factor 16.9). After his master's, he joined the Pakistan National Radio & Telecommunication Corporation as a Team Lead in the Artificial Intelligence Lab.

Project: A deep learning approach for identifying metabolites by mass spectrometry-based metabolomics

This project aims to overcome one of the main barriers in metabolomics: the structural identification of metabolites and the characterization of complete metabolomes. Identifying metabolites in eukaryotic and prokaryotic (e.g., microbiota) organisms and in environmental samples is the next frontier in metabolomics research. Just as protein search algorithms drove the progress of proteomics in the 1990s, the approach proposed here will lay the basis for characterizing the large number of metabolites that lack annotated tandem MS spectra or whose molecular structures are not listed in chemical databases, filling the gap that has prevented metabolomics from evolving as fast as the other omic sciences.
Identifying metabolites from mass spectrometry data requires annotating both MS1 and MS2 data. In MS1, redundant signals must be reduced (they arise mostly from in-source phenomena such as cation adduction); in MS2, observed MSn (n ≥ 2) spectra are matched against experimental spectra available in curated databases (e.g., HMDB, NIST). A complication of this strategy is that experimental spectra cover only a small part of the primary and secondary metabolites (i.e., natural products) that may be detected in biological and environmental samples. For example, only about 10% of the known small molecules in compound-centric databases (e.g., METLIN and HMDB) have experimental spectral data from pure standards, although these databases currently contain >900,000 compounds (METLIN) and >110,000 compounds (HMDB). In addition, many metabolites are entirely unknown and are extremely difficult to characterize, because neither their chemical structures nor their annotated tandem MS spectra are available.
This project aims to develop the first integrated computational workflow for non-targeted mass spectrometry-based metabolomics, covering the annotation of both MS1 and MS2 (or MSn) data.
The MS1 approach will use the methodology described in patent P202030061, which will be coupled to a deep learning approach for MS2 annotation. The deep learning approach will be based on using separate variational autoencoders for representing MS/MS spectral data and for representing the chemical structure of known metabolites. The integrated workflow will be made publicly available for the wider scientific community to utilize in the form of user-friendly software.
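As an illustration of the conventional library-matching step that the project description contrasts with its deep learning approach, the following is a minimal sketch of scoring an observed MS2 spectrum against reference spectra with a binned cosine similarity. It is not the project's method: the function names (`bin_spectrum`, `cosine_similarity`, `best_match`), the 1.0 m/z bin width, and the toy peak lists are all assumptions made for this example.

```python
import math


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins (bin width is an assumption)."""
    binned = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=1.0):
    """Cosine similarity between two spectra given as lists of (m/z, intensity)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty spectrum matches nothing
    return dot / (norm_a * norm_b)


def best_match(query, library, bin_width=1.0):
    """Return (name, score) of the highest-scoring reference spectrum."""
    scored = (
        (name, cosine_similarity(query, reference, bin_width))
        for name, reference in library.items()
    )
    return max(scored, key=lambda item: item[1])


if __name__ == "__main__":
    # Toy peak lists, invented for illustration only.
    query = [(100.0, 10.0), (150.2, 5.0)]
    library = {
        "reference_a": [(100.1, 9.0), (150.4, 6.0)],
        "reference_b": [(300.0, 1.0)],
    }
    name, score = best_match(query, library)
    print(name, round(score, 3))
```

A matcher of this kind can only return compounds whose experimental spectra are already in the library, which is the coverage limitation described above.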

Outreach activities

  • Science Week 2021, Escola Internacional del Camp: “Mass spectrometry”. 
  • European Researchers' Night 2022: “Nanotechnology for image-guided cancer therapies”.