The early development of computers and artificial intelligence (AI) is inseparably linked to the emerging concept of the living cell as a kind of “molecular machine.”
More recent biological research points in a very different direction, however. Far from being mere gears in a machine, proteins possess a kind of “intelligence” of their own. There is reason to believe that “smart proteins” function as a substrate for human cognitive processes on the subcellular level.
I expect that proteins will someday also be utilized as analog elements in AI systems of a new type. For that to happen, however, AI must overcome the digital bias which has accompanied its development up to the present time.
A number of the pioneers of artificial intelligence took a deep interest in the functioning of living organisms and their possible replication by man-made devices. These include especially John von Neumann and Alan Turing and, in a somewhat different way, Norbert Wiener and Claude Shannon.
In the 1940s von Neumann began working on a “General Theory of Automata.” Here he intended the term “automaton” to apply to living organisms as well as machines created to imitate them. The human brain would be included in the category of automata. Von Neumann’s preoccupation with living and artificial automata was essential to his contributions to the development of modern computer systems.
Von Neumann was particularly interested in the possibility of creating machines that would be able to reproduce themselves and even to evolve – thereby matching the most essential property of living, “natural automata.”
Again and again, von Neumann came back to the challenge of developing a mathematical theory that would embrace both. He apparently regarded it as the most important subject of his life’s work.
He set the theme in his presentation for the 1948 Hixon Symposium, entitled “The General and Logical Theory of Automata.” He took it up again in a 1953 presentation on “Machines and Organisms” and in his last work, “The Computer and the Brain,” which appeared in 1958, a year after his death. A collection of his publications and unpublished manuscripts on these subjects was published in 1966 under the title The Theory of Self-Reproducing Automata.
Computer pioneer Alan Turing contributed to the concept of a living organism as an automaton in a 1952 work entitled, “The Chemical Basis of Morphogenesis.” Here Turing develops a mathematical theory of a developing embryo, setting forth “a possible mechanism by which the genes of a zygote [a fertilized egg cell] may determine the anatomical structure of the resulting organism.”
Certainly nothing in the domain of living organisms suggests the idea of a genetically-programmed algorithmic process more strikingly than embryogenesis: the generation of a complete adult organism starting from a single cell via a succession of developmental stages, which are repeated in all individuals of a species.
Turing’s paper puts forward a theory for how the geometrical forms of the organism are generated by a physico-chemical process involving the production and diffusion of chemical substances he called morphogens. The cell’s genes serve as chemical catalysts for the synthesis of the morphogens.
The details are largely obsolete today, but they anticipate elements of the picture that emerged in the subsequent development of molecular biology.
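The core of Turing’s mechanism, that diffusion can destabilize an otherwise stable chemical equilibrium, can be illustrated with a short stability check. The following Python sketch is only an illustration: the reaction rates for the two substances and the diffusion constants are made-up values chosen for clarity, not numbers from Turing’s paper, and the function names are hypothetical.

```python
import cmath

# Hypothetical linearized reaction rates near a uniform steady state
# (illustrative values only): substance u promotes itself, v inhibits u.
A, B = 1.0, -2.0
C, D = 3.0, -4.0

def growth_rate(q, du, dv):
    """Largest real part of the eigenvalues of J - q * diag(du, dv),
    where q is the squared spatial wavenumber of a small perturbation."""
    a, d = A - du * q, D - dv * q
    trace = a + d
    det = a * d - B * C
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return max(((trace + root) / 2).real, ((trace - root) / 2).real)

def max_growth(du, dv, q_max=5.0, steps=500):
    """Scan squared wavenumbers from 0 to q_max; return the fastest growth rate."""
    return max(growth_rate(q_max * i / steps, du, dv) for i in range(steps + 1))

uniform = growth_rate(0.0, 1.0, 20.0)   # no spatial variation at all
equal_diffusion = max_growth(1.0, 1.0)  # both substances diffuse equally fast
fast_inhibitor = max_growth(1.0, 20.0)  # inhibitor diffuses 20 times faster
print(uniform, equal_diffusion, fast_inhibitor)
```

With these assumed values the uniform state is stable on its own and under equal diffusion (negative growth rates), but perturbations of a certain wavelength start to grow once the inhibiting substance diffuses much faster than the activating one. That is the kind of spontaneous pattern formation Turing proposed.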
Norbert Wiener’s classical book Cybernetics: Or Control and Communication in the Animal and the Machine (1948) deeply influenced both molecular biology and the development of artificial intelligence.
The same is true of Claude Shannon’s 1949 book The Mathematical Theory of Communication, which set forth what became known as “information theory.”
Claude Shannon’s concept of “information” served as a cornerstone of artificial intelligence and – in a less rigorous form – established itself in biology through the expression “genetic information.” Interestingly, Shannon wrote his PhD thesis on genetics. In 1950 the same Claude Shannon built an electromechanical mouse that moved around in a maze, carrying out one of the first experiments with AI.
His mathematical theory of information proved extremely useful as a technical tool, e.g. in the design of communications systems. But in my opinion the subsequent hegemony of his concept of information in virtually every field led to a one-sided over-emphasis on the “discrete” – the combinatorial side of reality – at the expense of continuity.
Not everything in the world breaks up neatly into pieces that one can arrange on a chessboard. A line is more than a collection of points; airplanes don’t move through the air in sequences of little jerks; beauty is not equal to an arrangement of pixels; and meaning is not an arrangement of letters on a page.
As I shall argue later in this series, efforts to impose the concept of “information” on the human use of language have exacerbated the stupidity of AI, as well as the stupidity problem of human society today.
Your brain is not digital
It is interesting to note that John von Neumann – who was more brilliant than the others – was initially somewhat cautious concerning the apparent digital character of the human nervous system. In his 1948 paper von Neumann stated:
“The neuron transmits an impulse. This appears to be its primary function, even if the last word about this function and its exclusive or non-exclusive character is far from having been said. The nerve impulse seems in the main to be an all-or-none affair, comparable to a binary digit … but it is equally evident that this is not the entire story… The living organisms are very complex – part digital and part analog mechanisms. The computing machines, at least in their recent forms to which I am referring in this discussion, are purely digital… Although I am well aware of the analog component in living organisms, and it would be absurd to deny its importance, I shall, nevertheless, for the sake of the simpler discussion, disregard that part. I shall consider the living organisms as if they were purely digital automata.”
Unfortunately, von Neumann held on to this digital “simplification” of living organisms, and especially of the brain and nervous system, in most of his later work.
From the standpoint of what we know in neurobiology today, it is nonsense to try to understand the function of neurons and the nervous system by supposed analogies to digital computers.
The reign of discreteness and combinatorics in biology – its virtual “digitalization,” we could say – was cemented by the 1953 discovery of the double-helix structure of DNA, by Francis Crick’s enunciation of “The Central Dogma of Molecular Biology” in 1957 and by the deciphering of the genetic code in the early 1960s.
According to the Central Dogma and its systematic elaboration, the DNA sequences contain the basic information and “rules” for the functioning of the cell; these remain unchanged during the cell’s life, except for rare chance mutations; and the genetic code, contained in the DNA, determines the structures of the proteins that control the chemical machinery of the cell.
In particular, the fundamental act of cell division, by which living organisms grow and multiply, proceeds step by step in a precisely determined sequence of events, through the successive activation of genes contained in the DNA.
A living cell would thus be a special type of Turing machine, realized on a molecular basis. Put in popularized terms: cells operate like digital computers, with their DNA as the computer program.
Nobel Prize-winning physiologist Sydney Brenner summed up the doctrine of molecular biology most succinctly in a 2002 essay in honor of Alan Turing, entitled “Life’s Code Script”:
“Biologists ask only three questions of a living organism: how does it work? How is it built? And how did it get that way? They are problems embodied in the classical fields of physiology, embryology and evolution. And at the core of everything are the tapes containing the descriptions to build these special Turing machines.”
Genetic code holy doctrine
Students today all learn to recite the Doctrine of the Genetic Code. It goes very roughly as follows: Proteins are the organizers and agents of cell activity, each with its area of specialization. They are formed from linear sequences of amino acids, of which there are 20 in all.
The information specifying the sequence of amino acids for a given protein is encoded in the cell’s DNA, through the sequence of nucleotide molecules holding the two strands of the DNA double helix together.
There are four different nucleotides, defining a four-letter code. The genes correspond to sequences written in the code. These are transcribed by molecular machinery in the cell nucleus from DNA to RNA molecules which function as carriers of information.
After some editing, the RNA molecules are somehow transported out of the nucleus and fed into structures called ribosomes. Moving along the RNA like a tape reader head, a ribosome produces the corresponding string of amino acids making up the desired specific protein.
It does so according to a preset coding schema, whereby each successive set of three letters (triplet) of the four-letter code corresponds to a specific amino acid. As there are 64 possible triplets but only 20 amino acids, the code is redundant.
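The coding scheme described above can be written down in a few lines. The following Python sketch uses the standard genetic code table with the usual one-letter abbreviations for the amino acids and “*” for a stop signal. The function name translate and the short example RNA string are invented for illustration, and the sketch ignores everything the cell does besides reading triplets.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"  # the four RNA letters
# Standard genetic code, one letter per amino acid, "*" = stop signal.
# Codons are listed in the order UUU, UUC, UUA, UUG, UCU, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(rna):
    """Read the RNA three letters at a time, like a tape, until a stop signal."""
    chain = []
    for i in range(0, len(rna) - 2, 3):
        aa = CODON_TABLE[rna[i:i + 3]]
        if aa == "*":
            break
        chain.append(aa)
    return "".join(chain)

codons_per_amino_acid = Counter(AMINO_ACIDS)
print(len(CODON_TABLE))                       # 64 possible triplets
print(len(set(AMINO_ACIDS) - {"*"}))          # 20 amino acids
print(codons_per_amino_acid["L"], codons_per_amino_acid["M"])  # 6 1
print(translate("AUGGCCUUAUGGUAA"))           # MALW
```

The counts make the redundancy concrete: leucine (L) is specified by six different triplets, while methionine (M) has only one.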
Protein folding problem
The Doctrine of the Genetic Code, as commonly advertised, misses several crucial points. These include, among others, the so-called protein folding problem (see below), the existence of inheritable epigenetic (non-DNA-coded) changes in organisms, and the unresolved role of the 99% of the DNA that seems not to have a coding function.
Here I shall address only the protein folding problem since it relates most directly to the present and future of artificial intelligence.
What comes out of the process of DNA transcription and translation is just a chain of amino acids. But to perform its function in the cell – e.g., as an enzyme, a membrane receptor, an antibody, etc. – the protein must first transform itself from this linear chain into a precise three-dimensional form, unique to each protein. This so-called “native conformation” can be extraordinarily complicated, including multiply knotted topologies.
One should bear in mind that protein molecules in the human body consist on average of 480 amino acids, containing a total of nearly 10,000 atoms. The largest of these proteins, titin, is formed from a chain of 35,350 amino acids, and has over 600,000 atoms. Titin makes up about 10% of our muscle tissue.
What is the problem? The DNA code contains no indications about what the right conformation for a given protein should be, nor how to generate it from the original linear chain.
All the DNA tells us is the sequence of amino acids along the protein’s chain. How does the protein know what shape to become and how to get there? Where does the additional information come from?
This has come to be known as the “folding problem” (although the protein’s actual motions include twisting, stretching, etc.). The folding problem is not only fundamental for molecular biology; it also has far-reaching implications for medicine.
The list of “protein misfolding diseases” (protein conformational disorders) includes Alzheimer’s disease and other forms of dementia, Parkinson’s disease, cystic fibrosis, sickle cell disease and most likely also type 2 diabetes. In the wrong conformation, proteins not only become dysfunctional; they can also disrupt normal cell functioning.
In 1968-1969 the molecular biologist Cyrus Levinthal posed what became known as the “Levinthal Paradox.” After being stretched out into roughly linear form, proteins in solution revert to their precise native conformations within at most a few seconds.
On the other hand, given the number of variable bond angles even in a small protein, the number of possible conformations is astronomically large. It’s estimated that if a protein were to try them all out, one at a time, at a rate of a trillion per second, it would on average take longer than the estimated age of the Universe to find the right one!
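A back-of-the-envelope version of this estimate is easy to reproduce. The Python sketch below rests on illustrative assumptions that are not taken from Levinthal or from this article: a small protein of 100 amino acids, only three possible local conformations per amino acid, and an age of the Universe of roughly 13.8 billion years. Only the search rate of a trillion per second comes from the text.

```python
# Illustrative assumptions, not figures from Levinthal or the article:
RESIDUES = 100                   # a small protein
STATES_PER_RESIDUE = 3           # coarse count of local conformations per amino acid
AGE_OF_UNIVERSE_YEARS = 1.38e10
SECONDS_PER_YEAR = 3.156e7

RATE_PER_SECOND = 1e12           # a trillion conformations per second, as in the text

conformations = STATES_PER_RESIDUE ** RESIDUES
# On average, a blind search hits the target after trying half the possibilities.
average_search_seconds = conformations / 2 / RATE_PER_SECOND
age_of_universe_seconds = AGE_OF_UNIVERSE_YEARS * SECONDS_PER_YEAR
ratio = average_search_seconds / age_of_universe_seconds

print(f"{conformations:.2e} possible conformations")
print(f"blind search takes about {ratio:.1e} times the age of the Universe")
```

Even under these deliberately modest assumptions, the blind search would outlast the Universe by many orders of magnitude, whereas real proteins fold in seconds or less.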
In light of the rapidity and precision with which proteins assume their correct conformation, Levinthal concluded that the folding process cannot be random, but must follow a more or less well-defined pathway.
Evidently the physical interactions between the various parts of the molecule, with their electric charges and bond angles and so forth, as well as the surrounding medium, guide the protein in its process of folding and twisting into the right shape.
This problem takes us into a completely different world from combinatorics, digital computations and Turing machines. We are dealing with hard-core physics. For half a century, scientists have been struggling to solve Levinthal’s Paradox. What would it mean to have a solution?
First, to explain how proteins in general are able to “find” their native conformations so rapidly and reliably.
Second, to be able to determine precisely the course of events whereby a given protein transforms itself into its correct three-dimensional conformation, starting from its linear form.
Third, to be able to forecast the three-dimensional conformation of a protein, given nothing more than its DNA code.
Some scientists regard Levinthal’s Paradox as “essentially solved,” while others do not agree. As far as structure forecasting is concerned, a 2019 paper in the International Journal of Modern Physics remarks:
“Predicting the 3D structure of a protein from its amino acid sequence is one of the most important unsolved problems in biophysics and computational biology…. At present, some of the most successful methods have a reasonable probability of predicting the folds of small, single-domain proteins.”
The mathematical equations, describing the behavior of a protein according to the laws of quantum physics, are hopelessly complicated to solve – even with drastic simplifications and even using the fastest and largest supercomputers.
Predicting protein structure has been a major selling point for the development of next-generation supercomputers such as IBM’s Blue Gene series. But attempts to solve the problem by “brute force” calculations have yielded disappointing results.
Instead, structure forecasting today employs mixed strategies, using large data banks of molecules with known 3D structures, computer simulations, and extensive knowledge from experimental and theoretical protein science in an effort to determine the most probable shape.
AI, especially deep learning systems, is finding an ever-growing field of application here. The reader can find a useful presentation on this subject in a DeepMind blog entry.
As an afterthought, I would like to suggest a potentially revolutionary idea: To apply deep learning, the AI system must be trained on a large database with information on the known behavior of proteins.
De facto the proteins are teaching the supercomputers! Proteins are evidently more “intelligent” than our digital systems. They don’t need any computations to fold into the right configuration. They just do it naturally.
Why not replace the stupid transistors in our computer chips with proteins or other smart molecules? Why not replace laborious computations with natural physical events?
“Biological computing” is already an established area of research. It appears still dominated by algorithmic mindsets, but the future is open-ended. More on this in future articles.
Jonathan Tennenbaum received his PhD in mathematics from the University of California in 1973 at age 22. Also a physicist, linguist and pianist, he’s a former editor of FUSION magazine. He lives in Berlin and travels frequently to Asia and elsewhere, consulting on economics, science and technology.