During the Second World War, the US Army was in considerable haste to induct men en masse, but it was important to weed out those with syphilis. This necessitated rapid screening of the recruits, yet blood tests at the time were time-consuming and expensive.
In early 1943, the economist Robert Dorfman, later a professor of political economy at Harvard University, devised a strategy that reduced the overall testing time drastically. It’s worth pondering why it took so long for someone to think of so accessible and lucid a strategy, one whose core idea is now familiar from the ubiquitous binary search algorithm.
Since Dorfman’s paper, group testing has seen widespread application in statistics, computing and laboratories, from the Human Genome Project to cybersecurity.
Prompt and rigorous testing is widely regarded as the most effective approach to tackling the current Covid-19 pandemic. Governments have been urged to adhere to the “test, test, test!” maxim. Countries that tested early, swiftly, and extensively have shown promise in containing the epidemic and “flattening the curve.”
However, Covid-19 tests still consume precious time relative to how fast the disease spreads. Often, by the time testing of a suspected area or demographic segment is completed, the subjects have already dispersed enough to make it frustratingly hard to trace successive contacts and potential transmissions. The result is a dismaying lag, an unravelling spool spiralling out of hand. Pooled testing could be of great help here.
First off, one can exclude symptomatic candidates from this form of testing. Pooling would still work well with them included, but erring on the side of caution never hurts. Samples from asymptomatic subjects should be segregated and analyzed using this algorithm.
This algorithm affords maximum benefit in populations where the spread is in its early stages, or is otherwise sparse, and helps nip it in the bud. Nonetheless, unless the infection rate is overwhelmingly high, the strategy will prove significantly more time- and cost-effective than individual tests.
The first step is to estimate the extent of infection in the population, that is, the probability that a sample drawn from it tests positive. The smaller this probability, the more pronounced the benefits of group testing. Even when playing it unrelentingly safe, accounting for worst-case scenarios and not letting a single infection slip through, the method proves quite effective.
Consider the case of India in April, where, on average, 1 in 24 samples tested positive. That translates to roughly 4 in every 100 test subjects having the disease. To allow for time, error, and caution, let us round it up to 5 in 100.
So we take roughly 5% of the samples we are testing to be positive. We won’t treat this number as an absolute parameter. Doing so would have further enhanced the efficiency of group testing (we could stop and ignore the remaining batches at whatever stage five positive batches are found), but only by slightly compromising accuracy and letting a few positives slip by. A single spark neglected can rekindle the wildfire.
Before delving into the algorithm, a simple illustration demonstrates how much effort and time pool testing can save. On average we expect 5 positives per 100 samples, or 1 positive per 20 samples tested. Say we have a thousand samples to test, and we make pools of 20. Realistically, for random pooling, many pools would contain a single positive, roughly as many would contain none, and the rest would have two or, less likely, three, four, or five positives.
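To make that arithmetic concrete, here is a small illustrative sketch in Python (assuming, as above, that each sample is independently positive with probability 0.05) of how positives would scatter across random pools of 20.

```python
from math import comb

p = 0.05          # assumed probability that any one sample is positive
pool_size = 20

# Chance that a randomly assembled pool of 20 holds exactly k positives
# (a binomial distribution, assuming samples land in pools independently).
for k in range(6):
    chance = comb(pool_size, k) * p ** k * (1 - p) ** (pool_size - k)
    print(f"{k} positives in a pool: {chance:.1%}")
```

A little over a third of the pools can thus be expected to come back clean and be discarded after a single test each, which is where the savings begin.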
Each pool is tested, and the ones that test negative are eliminated. The entire pool can be discarded because if the markers of infection are absent from the pooled sample, they are absent from each of its constituent samples.
The remaining pools are each divided into two. For most of them, one of the two size-10 sub-pools will test negative and need no further sub-testing – that is, three tests (one on the pool, two on its halves) will clear 10 samples. The rest will take varying numbers of further tests.
Now, let us apply this binary splitting to pools of 100, where the estimated count of positives per pool is 5. We split each pool into two halves of 50 members and test both. If a half tests negative, we discard it and declare all its members negative.
We repeat the halving, iterating this test-discard-divide process: the number of pools grows and their size shrinks. Once the pool size starts falling below the expected number of positives it may contain, it is prudent to sweep the residue and simply test every remaining sample individually, to cover the worst case; overall, the binary splitting still prevails in efficiency.
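The test-discard-divide routine is easy to sketch in code. What follows is an illustrative simulation only, not a laboratory protocol: the function pool_is_positive stands in for a real assay, and the cutoff of four members, below which the residue is simply swept with individual tests, is an assumption made for the sketch.

```python
import random

def pool_is_positive(samples):
    """Stand-in for a real assay: a pooled test comes back positive
    if any sample in the pool is infected."""
    return any(samples)

def binary_splitting(samples, cutoff=4):
    """Run the test-discard-divide strategy on one pool and return
    (number of tests used, positions within the pool found positive)."""
    tests, positives = 0, []

    def process(pool):                    # pool is a list of (index, infected) pairs
        nonlocal tests
        tests += 1
        if not pool_is_positive([infected for _, infected in pool]):
            return                        # whole pool cleared with a single test
        if len(pool) <= cutoff:           # small residue: sweep it individually
            for index, infected in pool:
                tests += 1
                if infected:
                    positives.append(index)
            return
        mid = len(pool) // 2              # otherwise halve and recurse
        process(pool[:mid])
        process(pool[mid:])

    process(list(enumerate(samples)))
    return tests, positives

# Demo: 1,000 samples, each positive with probability 0.05, in starting pools of 100.
random.seed(1)
samples = [random.random() < 0.05 for _ in range(1000)]
total_tests = 0
for start in range(0, len(samples), 100):
    used, _found = binary_splitting(samples[start:start + 100])
    total_tests += used
print(f"{total_tests} pooled tests instead of {len(samples)} individual ones")
```

Re-running the sketch with different prevalences and starting pool sizes gives a feel for how the savings grow as infection becomes sparser.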
When the estimated rate is 1 case in 100, this conservative, full-safety approach proves at least seven times as efficient as testing one by one, even in its worst sub-case.
There’s another method that can be used where the infection is known to be sparse. Say a laboratory has only eight test kits and needs to test 50 individuals. This can be accomplished in a single round if it is known that there will be only a few positives. Each individual sample is divided into half as many parts as there are test kits – in this case, four.
Let’s say the test kits are labelled A, B, C, D, E, F, G and H. The four portions of each sample are distributed among a unique combination of four test kits. Since there are 70 ways of choosing four kits out of eight, each of the 50 samples can be assigned its own combination of letters.
For example, sample No 7 gets ABDF, meaning portions of sample 7 go only into kits A, B, D, and F, and not into kits C, E, G, and H. Now if exactly those four kits – A, B, D, and F, and no others – test positive, we know that sample 7, and only sample 7, was positive. If more than four kits yield positive results, one can’t always ascertain which samples were positive, but any sample whose portion went into a kit that tested negative can be affirmatively declared uninfected.
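The bookkeeping behind this scheme fits in a few lines. The sketch below uses the same eight kit labels and four-way split as the example; the decoding rule is the conservative one just described – a sample is cleared as soon as any kit it fed tests negative, and only samples all of whose kits light up remain under suspicion.

```python
from itertools import combinations

kits = "ABCDEFGH"                           # the eight test kits
# Give each of the 50 samples its own combination of 4 kits.
# There are C(8, 4) = 70 such combinations, so 50 samples fit comfortably.
assignments = list(combinations(kits, 4))[:50]

def decode(positive_kits):
    """Given the set of kits that tested positive, split the samples into
    those that are definitely clear and those that cannot be ruled out."""
    cleared, suspects = [], []
    for sample_no, combo in enumerate(assignments, start=1):
        if all(kit in positive_kits for kit in combo):
            suspects.append(sample_no)      # every kit it touched lit up
        else:
            cleared.append(sample_no)       # it fed at least one negative kit
    return cleared, suspects

# Example: exactly the four kits that sample No 7 went into come back positive.
print("Sample 7 went into kits", "".join(assignments[6]))   # ABDF
cleared, suspects = decode(set(assignments[6]))
print(f"{len(cleared)} samples cleared, still suspect: {suspects}")
```

Feeding in a larger set of positive kits – five or six of them, say – can leave more than one sample on the suspect list, which is exactly the ambiguity noted above.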
In essence, choosing pool sizes is a trade-off between breadth and depth – the number of pools versus the number of members in each. Pools can also be natural – based on geography, demography, or some other category, such as occupation or travel history.
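One way to put numbers on this trade-off is through the simplest two-stage version of Dorfman’s original scheme (pool once, then retest every member of each positive pool individually). Under that scheme, with prevalence p and pool size k, the expected number of tests per person works out to 1/k + 1 - (1 - p)^k; the small sketch below, assuming the 5% prevalence of the running example, finds the pool size that minimizes it.

```python
# Expected tests per person under two-stage Dorfman pooling:
# one pooled test shared by k people, plus k individual retests
# whenever the pool is positive, which happens with probability 1 - (1 - p)**k.
p = 0.05                                  # assumed prevalence, as above

def tests_per_person(k):
    return 1 / k + 1 - (1 - p) ** k

best = min(range(2, 21), key=tests_per_person)
print(f"Best pool size: {best}, about {tests_per_person(best):.2f} tests per person "
      f"(versus 1.00 when testing individually)")
```

At a prevalence of 1%, the same formula favours pools of ten or eleven and pushes the expected cost below 0.2 tests per person – the sense in which sparser infection makes pooling more rewarding.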
One caution must be exercised: if group sizes are too large, the indicators (say antigens, antibodies, or byproduct biochemical species) get heavily diluted, and their concentration might fall below the sensitivity threshold of the detection tools and instruments, leading to false negatives. Hence pool testing with very large pools is advisable only with sufficiently sensitive apparatus.
Even with these caveats, the average efficiency of group testing is so great that at least some degree of pooling is almost always worthwhile. Batch eliminations will let large swaths of the workforce return quickly to their workplaces, and could prove instrumental in saving the worst-ailing of all the pandemic’s victims – the economy.