Pre- and Post-publication Verification for Reproducible Data Mining in Macromolecular Crystallography.

Journal: Methods in molecular biology (Clifton, N.J.)
PMID:

Abstract

Just as an article's narrative must be judged by an editor and referees to be worthy of becoming a version of record on acceptance for publication, so must the underpinning data be scrutinized before they are accepted as a version of record. Indeed, without the underpinning data, a study and its conclusions cannot be reproduced at any stage of evaluation, pre- or post-publication. Likewise, an independent study without its own underpinning data cannot be reproduced, let alone be considered a replicate of the first study. The Protein Data Bank (PDB) is a modern marvel of achievement. It provides organized, open access to the data it holds for depositors and users alike, enabling numerous applications. Methods for modeling protein structures and for determining them are still improving in precision, and method-specific artifacts exist. The accuracy of a structure is therefore established only when it is reproduced by other methods. It is on such foundations that reproducible data mining is based. Data rates are expanding considerably, whether at synchrotrons, X-ray free-electron lasers (XFELs), electron cryomicroscopes (cryoEM), or neutron facilities. In future, the work of a human referee or user in evaluating a narrative and its underpinning data may well be complemented by artificial intelligence and machine learning: the former for specific refereeing and the latter for more general validation, both ideally before publication. The examples described involve rhenium theranostics, the anticancer platins, and the SARS-CoV-2 main protease.

Authors

  • John R Helliwell
    Department of Chemistry, University of Manchester, Manchester, UK. john.helliwell@manchester.ac.uk.