The Annals of Applied Statistics
- Ann. Appl. Stat.
- Volume 5, Number 4 (2011), 2603-2629.
Bayesian matching of unlabeled marked point sets using random fields, with an application to molecular alignment
Statistical methodology is proposed for comparing unlabeled marked point sets, with an application to aligning steroid molecules in chemoinformatics. Methods from statistical shape analysis are combined with techniques for predicting random fields in spatial statistics in order to define a suitable measure of similarity between two marked point sets. Bayesian modeling of the predicted field overlap between pairs of point sets is proposed, and posterior inference of the alignment is carried out using Markov chain Monte Carlo simulation. By representing the fields in reproducing kernel Hilbert spaces, the degree of overlap can be computed without expensive numerical integration. Superimposing entire fields rather than the configuration matrices of point coordinates thereby avoids the problem that there is usually no clear one-to-one correspondence between the points. In addition, mask parameters are introduced in the model, so that partial matching of the marked point sets can be carried out. We also propose an adaptation of the generalized Procrustes analysis algorithm for the simultaneous alignment of multiple point sets. The methodology is illustrated with a simulation study and then applied to a data set of 31 steroid molecules, where the relationship between shape and binding activity to the corticosteroid binding globulin receptor is explored.
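The claim that overlap can be computed without numerical integration can be illustrated with a small sketch. If two fields are written as finite kernel expansions, f = sum_i a_i k(x_i, .) and g = sum_j b_j k(y_j, .), the reproducing property gives their inner product as the double sum of a_i b_j k(x_i, y_j), so no grid or quadrature is needed and the two point sets need not have the same number of points. The code below is only a sketch under stated assumptions and is not the authors' model: the Gaussian kernel, the bandwidth, the use of the marks directly as expansion coefficients, and the function names are all choices made for illustration.

```python
import math

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2)).
    The kernel and bandwidth are assumptions made for this illustration."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def rkhs_inner(points_a, coef_a, points_b, coef_b, bandwidth=1.0):
    """Inner product of f = sum_i a_i k(x_i, .) and g = sum_j b_j k(y_j, .),
    evaluated in closed form as a double sum over kernel values."""
    return sum(
        a * b * gaussian_kernel(x, y, bandwidth)
        for x, a in zip(points_a, coef_a)
        for y, b in zip(points_b, coef_b)
    )

def field_similarity(points_a, coef_a, points_b, coef_b, bandwidth=1.0):
    """Normalized overlap <f, g> / (||f|| ||g||), which lies in [-1, 1]."""
    fg = rkhs_inner(points_a, coef_a, points_b, coef_b, bandwidth)
    ff = rkhs_inner(points_a, coef_a, points_a, coef_a, bandwidth)
    gg = rkhs_inner(points_b, coef_b, points_b, coef_b, bandwidth)
    return fg / math.sqrt(ff * gg)

# Two made-up marked point sets with different numbers of points.
# Using the marks directly as coefficients is a simplifying assumption.
set_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 1.5, 0.0)]
marks_a = [1.0, -0.5, 0.8]
set_b = [(0.1, 0.0, 0.0), (1.4, 0.1, 0.0)]
marks_b = [0.9, -0.4]

self_similarity = field_similarity(set_a, marks_a, set_a, marks_a)
cross_similarity = field_similarity(set_a, marks_a, set_b, marks_b)

# Translating one set far away destroys the overlap between the fields.
shifted_b = [(x + 10.0, y, z) for x, y, z in set_b]
far_similarity = field_similarity(set_a, marks_a, shifted_b, marks_b)
```

In this toy setting the similarity depends on the relative pose of the two point sets, which is the kind of quantity an alignment procedure would seek to make large over rotations and translations.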
First available in Project Euclid: 20 December 2011
Czogiel, Irina; Dryden, Ian L.; Brignell, Christopher J. Bayesian matching of unlabeled marked point sets using random fields, with an application to molecular alignment. Ann. Appl. Stat. 5 (2011), no. 4, 2603--2629. doi:10.1214/11-AOAS486. https://projecteuclid.org/euclid.aoas/1324399608
- Supplementary material A: R programs for Bayesian molecule alignment. The zip file contains R programs for molecular alignment using random fields. The main R program is fields8.r, which carries out a Bayesian MCMC procedure. The programs were written by Irina Czogiel, with some later edits by Ian Dryden. There are two options in the program: a simulation study (as in Section 4.4 of the paper) or a comparison of two molecules using steric information (as in Section 5).
- Supplementary material B: Steroids data. The zip file contains the data set of steroids first analyzed by Dryden, Hirst and Melville (2007). The data set of (x, y, z) atom co-ordinates and partial charges was constructed by Jonathan Hirst and James Melville (School of Chemistry, University of Nottingham).