DeepMind’s protein folding research journey – Part 2


When studying at UCL and later at MIT, Hassabis found that interdisciplinary collaboration was a hot topic. He recalls that workshops would be organised involving different disciplines – neuroscience, psychology, mathematics and philosophy, for instance. There would be a couple of days of talks and debates before the academics returned to their departments, vowing to gather more regularly and find ways to collaborate. The next meeting would come a year later – grant applications, teaching assignments and the churn of research and scholarly life would get in the way of co-operation.

“Interdisciplinary research is hard,” Hassabis says. “Say you get two world-leading experts, in maths and genomics – there obviously could be some crossover. But who is going to do the work to understand the other person’s field, their jargon, what their real problem is?”

Identifying the right question to ask, why it hasn’t yet been answered and what makes it so tricky may seem, to outsiders, relatively straightforward. But scientists, even in the same discipline, don’t always see their work in the same way. And it’s notoriously hard for researchers to add value to other disciplines. It’s even harder for them to find a joint question that they might answer together.

The current DeepMind headquarters – two floors of Google’s King’s Cross building – has become increasingly populous in the past couple of years. There are six or seven disciplines represented in the company’s AI research alone, and it has been hiring specialists in mathematics, physics, neuroscience, psychology, biology and philosophy as it broadens its remit.

“Some of the most interesting areas of science are in the gaps between, the confluences between subjects,” Hassabis says. “What I’ve tried to do in building DeepMind is to find ‘glue people’, those who are world class in multiple domains, who possess the creativity to find analogies and points of contact between different subjects. Generally speaking, when that happens, the magic happens.”

One such glue person is Pushmeet Kohli. A former director at Microsoft Research, he leads the science team at DeepMind. There is much talk in artificial intelligence circles of the “AI winter” – a period in which there was little tangible progress – having ended during the past decade. The same sense of movement is now also true of protein folding, the science of predicting the shape of what biologists consider to be the building blocks of life.

Kohli has brought together a team of structural biologists, machine-learning experts and physicists to address this challenge, widely recognised as one of the most important questions in science. Proteins are fundamental to all mammalian life – they provide much of the structure and function of tissues and organs at a molecular level. Each is made up of chains of amino acids. The sequence of these amino acids determines the shape of the protein, which in turn determines its function.

“Proteins are the most spectacular machines ever created for moving atoms at the nanoscale and often do chemistry orders of magnitude more efficiently than anything that we’ve built,” says John Jumper, a research scientist at DeepMind who specialises in protein folding. “And they’re also somewhat inscrutable, these self-assembling machines.”

Proteins arrange atoms at the angstrom scale (an angstrom is a unit of length equal to one ten-billionth of a metre); understanding them more deeply would give scientists a much more substantial grasp of structural biology. This matters because proteins are necessary for virtually every function within a cell, and incorrectly folded proteins are thought to contribute to diseases such as Parkinson’s, Alzheimer’s and diabetes.

“If we can learn about the proteins that nature has made, we can learn to build our own,” Jumper says. “It’s about getting a really concrete view into this complex, microscopic world.”

What has made protein folding an attractive puzzle for the DeepMind team has been the widespread availability of genomic data sets. Since 2006 there has been an explosion in DNA data acquisition, storage, distribution and analysis. Researchers estimate that by 2025 two billion genomic data sets may have been analysed, requiring 40 exabytes of storage capacity.

“It’s a nice problem from a deep learning perspective, because at enormous expense, enormous complexity and enormous time [commitment], people have generated this amazing resource of proteins that we already understand,” Jumper says.

While progress is being made, scientists caution against false exuberance. The esteemed American molecular biologist Cyrus Levinthal expressed the complexity of the challenge in bracing terms, noting that it would take longer than the age of the universe to enumerate all the possible configurations of a typical protein before reaching the right 3D structure. “The search space is huge,” says Rich Evans, a research scientist at DeepMind. “It’s bigger than Go.”
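Levinthal’s argument is, at heart, a back-of-the-envelope calculation. The Python sketch below reproduces its shape under textbook-style simplifications that are assumptions, not figures from the article or from Levinthal himself: a 100-residue chain, three possible conformations per residue, and one conformation sampled every 0.1 picoseconds (10^13 per second).

```python
import math

# Illustrative assumptions (not from the article, not Levinthal's own numbers):
# a 100-residue protein, 3 conformations per residue, and one conformation
# sampled every 0.1 picoseconds, i.e. 10**13 conformations per second.
RESIDUES = 100
CONFORMATIONS_PER_RESIDUE = 3
SAMPLES_PER_SECOND = 10**13

# Age of the universe, roughly 13.8 billion years, expressed in seconds.
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_S = 13.8e9 * SECONDS_PER_YEAR


def levinthal_estimate(residues: int, per_residue: int, rate: float) -> tuple[int, float]:
    """Return (number of conformations, seconds needed to try every one)."""
    total = per_residue ** residues  # exact integer count of chain conformations
    seconds = total / rate           # time for a brute-force, one-at-a-time search
    return total, seconds


total, seconds = levinthal_estimate(RESIDUES, CONFORMATIONS_PER_RESIDUE, SAMPLES_PER_SECOND)
print(f"conformations: ~10^{math.log10(total):.0f}")
print(f"enumeration time: ~10^{math.log10(seconds):.0f} s")
print(f"age of universe: ~10^{math.log10(AGE_OF_UNIVERSE_S):.1f} s")
print(f"ratio: ~10^{math.log10(seconds / AGE_OF_UNIVERSE_S):.0f} universe lifetimes")
```

Under these assumptions, a brute-force search would take on the order of 10^17 times the age of the universe. The count grows exponentially with chain length: in this simplified model, each extra residue multiplies the number of conformations by three, which is why exhaustive enumeration is hopeless and why real proteins, which fold in fractions of a second, cannot be searching blindly.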

Nevertheless, a milestone in the protein-folding journey was reached in December 2018 at the CASP (Critical Assessment of Techniques for Protein Structure Prediction) competition in Cancun, Mexico – a biennial challenge that provides an independent way of plotting researchers’ progress. Competing teams must predict the structures of proteins from their amino-acid sequences; the 3D shapes of these proteins are already known but have not yet been made public. Independent assessors verify the predictions.

The protein-folding team at DeepMind entered as a way of benchmarking AlphaFold, the algorithm it had developed over the previous two years. In the months leading up to the conference, the organisers sent data sets to the team members in King’s Cross, who sent back their predictions with no sense of how they would fare. In total, there were 90 protein structures to predict: some were template-based targets, which use previously solved proteins as guidance, while others had to be modelled from scratch. Shortly before the conference, the team received the results: AlphaFold was, on average, more accurate than the other teams. On some metrics it was significantly ahead. Of the 43 targets (out of 90) that had to be modelled from scratch, AlphaFold made the most accurate prediction for 25. The winning margin was striking: its nearest rival managed three.

A “ribbon diagram” visualisation of a protein’s backbone, folded into a 3D structure predicted by the AlphaFold algorithm for the CASP13 competition.

Mohammed AlQuraishi, a fellow at the Laboratory of Systems Pharmacology and the Department of Systems Biology at Harvard Medical School, attended the event, and learned about the DeepMind approach before the results were published. “Reading the abstract, I didn’t think ‘Oh, this is completely new’,” he says. “I accepted they would do pretty well, but I wasn’t expecting them to do as well as they did.”

According to AlQuraishi, the approach was similar to that of other labs, but what distinguished the DeepMind process was that they were able to “execute better”. He points to the strength of the DeepMind team on the engineering side.

“I think they can work better than academic groups, because academic groups tend to be very secretive in this field,” AlQuraishi says. “And so, even though the ideas DeepMind had in their algorithm were out there and people were trying them independently, no one has brought it all together.”

AlQuraishi draws a parallel with the academic community in machine learning, which has undergone something of an exodus in recent years to companies like Google Brain, DeepMind and Facebook, where organisational structures are more efficient, compensation packages are generous, and there are computational resources that don’t necessarily exist at universities.

“Machine learning computer science communities have sort of really experienced that over the last four or five years,” he says. “Computational biology is just now waking up to this new reality.”

This echoes the explanation given by the founders of DeepMind when they sold the company to Google in January 2014. The sheer scale of Google’s computational network would enable the company to move research forward much more quickly than if it had to scale organically, and the £400 million cheque enabled the startup to hire world-class talent. Hassabis describes a strategy of targeting individuals who have been identified as a good fit for specific research areas. “We’ve our roadmap that informs what subject areas, sub-fields of AI or neuroscience will be important,” he says. “And then we go and find the world’s best person who fits in culturally as well.”

“So far as a company like DeepMind can make a dent, I think protein folding is a very good place to start, because it’s a problem that’s very well defined, there’s useable data, you can almost treat it like a pure computer science problem,” AlQuraishi says. “That’s probably not true in other areas of biology. It’s a lot messier. So, I don’t necessarily think that the success that DeepMind has had with protein folding will translate automatically to other areas.”

DeepMind staffers pictured on the roof of Google’s offices in King’s Cross.

To be continued

Inside DeepMind’s epic mission to solve science’s trickiest problem – Part 1