In our paper "Cyclo(tetrahydroxybutyrate) production is sufficient to distinguish between Xenorhabdus and Photorhabdus isolates in Thailand", we train a gradient boosting model to classify bacterial metabolite data obtained from soil samples as corresponding to either Photorhabdus or Xenorhabdus. We also discuss the role played by the host nematodes, compared with abiotic factors, in providing a living environment for the bacteria.
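The model's implementation and hyperparameters are not described on this page, so what follows is only a minimal from-scratch sketch of gradient boosting for a two-class problem. It assumes binary log-loss and depth-1 regression trees (stumps) as weak learners. Each round fits a stump to the negative gradient of the log-loss, y - p, and adds a shrunken copy of it to a running log-odds score. The helper names (`fit_gbm`, `predict_proba`, `fit_stump`), the settings and the toy data are all hypothetical, and the 0/1 labels merely stand in for the two genera.

```python
import math


def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_stump(X, residuals):
    """Least-squares depth-1 regression tree: returns (feature, threshold, left_value, right_value)."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # candidate thresholds: midpoints between consecutive distinct values of feature j
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    if best is None:
        # every feature is constant: fall back to predicting the mean residual
        mean = sum(residuals) / len(residuals)
        return (0, math.inf, mean, mean)
    return best[1:]


def predict_stump(stump, row):
    j, t, lv, rv = stump
    return lv if row[j] <= t else rv


def fit_gbm(X, y, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for binary log-loss with stumps as weak learners (hypothetical helper)."""
    p0 = sum(y) / len(y)
    if not 0 < p0 < 1:
        raise ValueError("both classes must be present")
    base = math.log(p0 / (1 - p0))  # start from the prior log-odds
    scores = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # negative gradient of the log-loss w.r.t. the raw score is y - sigmoid(score)
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        # shrink each stump's contribution by the learning rate
        scores = [s + learning_rate * predict_stump(stump, row) for s, row in zip(scores, X)]
    return base, learning_rate, stumps


def predict_proba(model, row):
    """Probability of class 1 for one sample."""
    base, lr, stumps = model
    return sigmoid(base + lr * sum(predict_stump(s, row) for s in stumps))


# Made-up toy data, NOT from the paper: two hypothetical metabolite intensities per isolate,
# label 1 for one genus and 0 for the other.
X_toy = [[0.1, 5.0], [0.3, 4.0], [0.2, 6.0], [2.5, 5.5], [3.0, 4.5], [2.8, 6.5]]
y_toy = [0, 0, 0, 1, 1, 1]
model = fit_gbm(X_toy, y_toy)
print(predict_proba(model, [2.9, 5.0]), predict_proba(model, [0.15, 5.0]))
```

In practice an established implementation such as scikit-learn's `GradientBoostingClassifier` would normally be used instead of this loop. The sketch only shows the mechanics of boosting on the log-loss gradient.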
The interactive visualisations below show the metabolites clustered by their correlation across samples, at different values of the correlation threshold ρ. For each threshold, one metabolite from each cluster was kept as a model feature and the others were discarded. After identifying the best predictor of bacterial genus, we examined the individual members of its cluster at the lowest correlation threshold. One of these compounds was isolated and its chemical structure determined.
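The page does not state the exact clustering rule or how each cluster's representative was chosen. The following minimal sketch therefore makes several assumptions. It links two metabolites when the absolute Pearson correlation of their abundances across samples is at least ρ, and it takes clusters to be the connected components of that graph (single linkage, via union-find). As a purely hypothetical placeholder, it keeps the first-listed member of each cluster as the feature. The function names and toy profiles are illustrative only; the `<= 250` filter mirrors the display cutoff mentioned in the note below.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length abundance profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    if sa == 0 or sb == 0:
        return 0.0  # a constant profile is treated as uncorrelated
    return cov / (sa * sb)


def cluster_by_correlation(profiles, rho):
    """Single-linkage clusters: metabolites joined when |r| >= rho (assumed rule)."""
    names = list(profiles)
    parent = {m: m for m in names}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path halving keeps the trees shallow
            m = parent[m]
        return m

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(profiles[a], profiles[b])) >= rho:
                parent[find(a)] = find(b)  # merge the two clusters

    clusters = {}
    for m in names:
        clusters.setdefault(find(m), []).append(m)
    return list(clusters.values())


def select_representatives(clusters):
    """One feature per cluster; taking the first member is a hypothetical placeholder rule."""
    return [c[0] for c in clusters]


# Made-up abundances of four hypothetical metabolites across four samples.
profiles = {
    "m1": [1.0, 2.0, 3.0, 4.0],
    "m2": [2.0, 4.1, 6.0, 8.2],
    "m3": [4.0, 1.0, 3.0, 2.0],
    "m4": [0.5, 0.4, 0.1, 0.3],
}
for rho in (0.9, 0.6):
    clusters = cluster_by_correlation(profiles, rho)
    displayed = [c for c in clusters if len(c) <= 250]  # the page's display cutoff
    print(rho, displayed, select_representatives(clusters))
```

Under this rule, lowering ρ can only merge clusters, never split them. The cluster containing a given metabolite is therefore largest at the lowest threshold. The all-pairs loop is quadratic in the number of metabolites, which is acceptable for a sketch.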
Note: Only clusters with 250 or fewer members are displayed.
The full code for the paper can be found here.