State-of-the-art protein crystallization is a numbers game:
because the conditions under which a given macromolecule will crystallize
are unlikely to be deducible a priori, they
must instead be found by experimentation.
Crystallization is a time-dependent, trial-and-error sampling of the
extremely large space of possible crystallization conditions: a large number of
conditions are tested, and each experiment is observed (often by imaging) at
several time points. The ultimate goal is to assign each image a consistent,
machine-generated score describing the outcome, and then to correlate image
similarity with condition similarity, building up an accurate picture of the
phase diagram for any system. This would allow crystallization conditions to be
located even when the initial set of experiments did not sample the
appropriate region of the space of all possible conditions.
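One simple way to picture "locating conditions the initial screen did not sample" is to estimate the outcome score at an untested condition from the scores of nearby tested conditions. The sketch below is illustrative only and is not the authors' method: it swaps in plain inverse-distance-weighted k-nearest-neighbour regression, assumes each condition is already encoded as a numeric vector (the axes pH, precipitant % w/v and salt molarity are hypothetical choices), and assumes every image has already been reduced to a machine-generated score in [0, 1]. The helper names `distance` and `predict_score` and the per-axis `scale` factors are likewise hypothetical.

```python
import math

# A condition is a numeric vector, e.g. (pH, precipitant % w/v, salt M).
# The choice and encoding of axes is a hypothetical assumption.
Condition = tuple[float, ...]


def distance(a: Condition, b: Condition, scale: Condition) -> float:
    """Euclidean distance after dividing each axis by a per-axis scale,
    so that pH units and concentrations contribute comparably."""
    return math.sqrt(sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, scale)))


def predict_score(
    query: Condition,
    observed: list[tuple[Condition, float]],
    scale: Condition,
    k: int = 3,
) -> float:
    """Inverse-distance-weighted k-nearest-neighbour estimate of the outcome
    score at an untested condition, using scored, tested conditions."""
    if not observed:
        raise ValueError("need at least one scored condition")
    # Keep the k tested conditions closest to the query.
    neighbours = sorted(
        ((distance(query, c, scale), s) for c, s in observed),
        key=lambda pair: pair[0],
    )[:k]
    # If the query was itself tested, return its observed score directly.
    for d, s in neighbours:
        if d == 0.0:
            return s
    weights = [1.0 / d for d, _ in neighbours]
    return sum(w * s for w, (_, s) in zip(weights, neighbours)) / sum(weights)


if __name__ == "__main__":
    # Made-up illustrative scores, not experimental data.
    observed = [
        ((6.5, 20.0, 0.2), 0.9),
        ((7.0, 20.0, 0.2), 0.8),
        ((8.5, 10.0, 0.0), 0.1),
        ((5.0, 30.0, 0.5), 0.0),
    ]
    scale = (1.0, 10.0, 0.5)
    print(round(predict_score((6.8, 20.0, 0.2), observed, scale), 3))
```

Inverse-distance weighting lets the closest tested conditions dominate the estimate; a real phase-diagram model would also need to account for the time points at which each image was taken, which this sketch ignores.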
Currently, automation is used routinely to miniaturize the
experiments and to capture their results, but not to interpret those results.
We are interested in different approaches to using machine learning to
interpret the results of crystallization experiments: what tools have already
been developed, and how can they best be implemented in a practical and timely
way? We will discuss progress in implementation and compare and contrast
existing approaches to automated scoring. Finally, we will discuss the steps we are
taking to find relationships between the experimental conditions and the
outcomes of those experiments.