|Ananda Theertha Suresh|
By way of explanation, Suresh provides a theoretical example: “Imagine you’re trying to calculate the best possible vehicle to buy, one that will move quickly over all types of terrain -- snow, desert, roads, etc. You will only be buying one vehicle, so you need to optimize for a wide distribution of probabilities.
“With the prior computational approach, known as the MinMax algorithm, you would look at a range of 100 miles and minimize the time it would take to travel on all types of terrain by different types of vehicles. In other words," he continues, "you’re trying to minimize how long it will take in a worst-case scenario. Eventually, you’d calculate that the best possible vehicle would be a tank, since a tank can travel on these worst-case scenario terrains, but a tank is not practical because it can’t travel well on the road, which is where most people are driving. Thus, with the MinMax approach, you lose functionality.”
|Prof. Arlon Orlitsky|
In their paper, Suresh and Orlitsky showed that instead of favoring the MinMax approach, where one optimizes for the worst-case distribution, the Good-Turing frequency estimation can be used to optimize for every distribution, not the just worst-case. “So if the MinMax estimator calculates you should have a tank, the Good-Turing estimator is like having a Transformer, where whatever situation you’re in, it adapts to it.”
Applications for the Good-Turing estimator include ecology (estimating the probability of animal populations with a certain physical trait, for example) and a field known as natural language processing, which is a branch of computer science concerned with interactions between computers and human (natural) languages. The “autocorrect” function on a smartphone is one example of a probability estimator at work, where the computer tries to estimate which word the user is trying to type, given a set of past observations.
“Good and Turing came up with this estimator in World War II and people have been using it for more than 60 years,” says Orlitsky, who holds dual appointments in the departments of Electrical and Computer Engineering and Computer Science and Engineering. “This paper provides perhaps the clearest explanation to date as to why it works.”
Watch Suresh's presentation at the NIPS conference in this video: