The Pastry A.I. That Learned to Fight Cancer – The New Yorker
One morning in the spring of 2019, I entered a pastry shop in the Ueno train station, in Tokyo. The shop worked cafeteria-style. After taking a tray and tongs at the front, you browsed, plucking what you liked from heaps of baked goods. What first struck me was the selection, which seemed endless: there were croissants, turnovers, Danishes, pies, cakes, and open-faced sandwiches piled up everywhere, sometimes in dozens of varieties. But I was most surprised when I got to the register. At the urging of an attendant, I slid my items onto a glowing rectangle on the counter. A nearby screen displayed an image, shot from above, of my doughnuts and Danish. I watched as a set of jagged, neon-green squiggles appeared around each item, accompanied by its name in Japanese and a price. The system had apparently recognized my pastries by sight. It calculated what I owed, and I paid.
I tried to gather myself while the attendant wrapped and bagged my items. I was still stunned when I got outside. The bakery system had the flavor of magic, a feat seemingly beyond the possible, made to look inevitable. I had often imagined that, someday, I'd be able to point my smartphone camera at a peculiar flower and have it identified, or at a chess board, to study the position. Eventually, the tech would get to the point where one could do such things routinely. Now it appeared that we were in this world already, and that the frontier was pastry.
Computers learned to see only recently. For decades, image recognition was one of the grand challenges in artificial intelligence. As I write this, I can look up at my shelves: they contain books, and a skein of yarn, and a tangled cable, all inside a cabinet whose glass enclosure is reflecting leaves in the trees outside my window. I can't help but parse this scene: about a third of the neurons in my cerebral cortex are implicated in processing visual information. But, to a computer, it's a mess of color and brightness and shadow. A computer has never untangled a cable, doesn't get that glass is reflective, doesn't know that trees sway in the wind. A.I. researchers used to think that, without some kind of model of how the world worked and all that was in it, a computer might never be able to distinguish the parts of complex scenes. The field of computer vision was a zoo of algorithms that made do in the meantime. The prospect of seeing like a human was a distant dream.
All this changed in 2012, when Alex Krizhevsky, a graduate student in computer science, released AlexNet, a program that approached image recognition using a technique called deep learning. AlexNet was a neural network, "deep" because its simulated neurons were arranged in many layers. As the network was shown new images, it guessed what was in them; inevitably, it was wrong, but after each guess it was made to adjust the connections between its layers of neurons, until it learned to output a label matching the one that researchers provided. (Eventually, the interior layers of such networks can come to resemble the human visual cortex: early layers detect simple features, like edges, while later layers perform more complex tasks, such as picking out shapes.) Deep learning had been around for years, but was thought impractical. AlexNet showed that the technique could be used to solve real-world problems, while still running quickly on cheap computers. Today, virtually every A.I. system you've heard of (Siri, AlphaGo, Google Translate) depends on the technique.
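The guess-and-adjust loop described above can be sketched in a few lines. This is a toy under stated assumptions, not AlexNet: it trains a single simulated neuron rather than many layers, and the task (the logical OR of two inputs), the function names, and the learning rate are all hypothetical choices made for illustration.

```python
import math

def train_neuron(examples, epochs=2000, learning_rate=0.5):
    """Train one simulated neuron by repeated guessing and adjusting."""
    weights = [0.0, 0.0]  # the "connections" the neuron will adjust
    bias = 0.0
    for _ in range(epochs):
        for inputs, label in examples:
            # Guess: a weighted sum squashed into the range 0..1.
            z = sum(w * x for w, x in zip(weights, inputs)) + bias
            guess = 1.0 / (1.0 + math.exp(-z))
            # Compare the guess with the label that was provided.
            error = guess - label
            # Adjust each connection in the direction that shrinks the error.
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias -= learning_rate * error
    return weights, bias

def predict(weights, bias, inputs):
    """Return 1 if the neuron's weighted sum is positive, else 0."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if z > 0 else 0

# Hypothetical labelled data: the logical OR of two inputs.
OR_EXAMPLES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

weights, bias = train_neuron(OR_EXAMPLES)
predictions = [predict(weights, bias, inputs) for inputs, _ in OR_EXAMPLES]
```

A deep network stacks many such units in layers and passes the error backward through all of them, but the rhythm is the same: guess, compare with the label, adjust.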
The drawback of deep learning is that it requires large amounts of specialized data. A deep-learning system for recognizing faces might have to be trained on tens of thousands of portraits, and it won't recognize a dress unless it's also been shown thousands of dresses. Deep-learning researchers, therefore, have learned to collect and label data on an industrial scale. In recent years, we've all joined in the effort: today's facial recognition is particularly good because people tag themselves in pictures that they upload to social networks. Google asks users to label objects that its A.I.s are still learning to identify: that's what you're doing when you take those "Are you a bot?" tests, in which you select all the squares containing bridges, crosswalks, or streetlights. Even so, there are blind spots. Self-driving cars have been known to struggle with unusual signage, such as the blue stop signs found in Hawaii, or signs obscured by dirt or trees. In 2017, a group of computer scientists at the University of California, Berkeley, pointed out that, on the Internet, almost all the images tagged as "bedrooms" are clearly staged and depict a made bed from two to three meters away. As a result, networks have trouble recognizing real bedrooms.
It's possible to fill in these blind spots through focussed effort. A few years ago, I interviewed for a job at a company that was using deep learning to read X-rays, starting with bone fractures. The programmers asked surgeons and radiologists from some of the best hospitals in the U.S. to label a library of images. (The job I interviewed for wouldn't have involved the deep-learning system; instead, I'd help improve the Microsoft Paint-like program that the doctors used for labelling.) In Tokyo, outside the bakery, I wondered whether the pastry recognizer could possibly be relying on a similar effort. But it was hard to imagine a team of bakers assiduously photographing and labelling each batch as it came out of the oven, tens of thousands of times, for all the varieties on offer. My partner suggested that the bakery might be working with templates, such that every pain au chocolat would have precisely the same shape. An alternative, suggested by the machine's retro graphics but perplexing given the system's uncanny performance, was that it wasn't using deep learning. Maybe someone had gone down the old road of computer vision. Maybe, by really considering what pastry looked like, they had taught their software to see it.
Hisashi Kambe, the man behind the pastry A.I., grew up in Nishiwaki City, a small town that sits at Japan's geographic center. The city calls itself "Japan's navel"; surrounded by mountains and rice fields, it's best known for airy, yarn-dyed cotton fabrics woven in intricate patterns, which have been made there since the eighteenth century. As a teen-ager, Kambe planned to take over his father's lumber business, which supplied wood to homes built in the traditional style. But he went to college in Tokyo and, after graduating, in 1974, took a job in Osaka at Matsushita Electric Works, which later became Panasonic. There, he managed the company's relationship with I.B.M. Finding himself in over his head, he took computer classes at night and fell in love with the machines.
In his late twenties, Kambe came home to Nishiwaki, splitting his time between the lumber mill and a local job-training center, where he taught computer classes. Interest in computers was soaring, and he spent more and more time at the school; meanwhile, more houses in the area were being built in a Western style, and traditional carpentry was in decline. Kambe decided to forgo the family business. Instead, in 1982, he started a small software company. In taking on projects, he followed his own curiosity. In 1983, he began working with NHK, one of Japan's largest broadcasters. Kambe, his wife, and two other programmers developed a graphics system for displaying the score during baseball games and exchange rates on the nightly news. In 1984, Kambe took on a problem of special significance in Nishiwaki. Textiles were often woven on looms controlled by planning programs; the programs, written on printed cards, looked like sheet music. A small mistake on a planning card could produce fabric with a wildly incorrect pattern. So Kambe developed SUPER TEX-SIM, a program that allowed textile manufacturers to simulate the design process, with interactive yarn and color editors. It sold poorly until, in 1985, a series of breaks led to a distribution deal with Mitsubishi's fabric division. Kambe formally incorporated as BRAIN Co., Ltd.
For twenty years, BRAIN took on projects that revolved, in various ways, around seeing. The company made a system for rendering kanji characters on personal computers, a tool that helped engineers design bridges, systems for onscreen graphics, and more textile simulators. Then, in 2007, BRAIN was approached by a restaurant chain that had decided to spin off a line of bakeries. Bread had always been an import in Japan (the Japanese word for it, "pan," comes from Portuguese), and the country's rich history of trade had left consumers with ecumenical tastes. Unlike French boulangeries, which might stake their reputations on a handful of staples, its bakeries emphasized range. (In Japan, even Kit Kats come in more than three hundred flavors, including yogurt sake and cheesecake.) New kinds of baked goods were being invented all the time: the "carbonara," for instance, takes the Italian pasta dish and turns it into a kind of breakfast sandwich, with a piece of bacon, slathered in egg, cheese, and pepper, baked open-faced atop a roll; the "ham corn" pulls a similar trick, but uses a mixture of corn and mayo for its topping. Every kind of baked good was an opportunity for innovation.
Analysts at the new bakery venture conducted market research. They found that a bakery sold more the more varieties it offered; a bakery offering a hundred items sold almost twice as much as one selling thirty. They also discovered that "naked" pastries, sitting in open baskets, sold three times as well as pastries that were individually wrapped, because they appeared fresher. These two facts conspired to create a crisis: with hundreds of pastry types, but no wrappers (and, therefore, no bar codes), new cashiers had to spend months memorizing what each variety looked like, and its price. The checkout process was difficult and error-prone (the cashier would fumble at the register, handling each item individually) and also unsanitary and slow. Lines in pastry shops grew longer and longer. The restaurant chain turned to BRAIN for help. Could they automate the checkout process?
AlexNet was five years in the future; even if Kambe and his team could have photographed thousands of pastries, they couldn't have pulled a neural network off the shelf. Instead, the state of the art in computer vision involved piecing together a pipeline of algorithms, each charged with a specific task. Suppose that you wanted to build a pedestrian-recognition system. You'd start with an algorithm that massaged the brightness and colors in your image, so that you weren't stymied by someone's red shirt. Next, you might add algorithms that identified regions of interest, perhaps by noticing the zebra pattern of a crosswalk. Only then could you begin analyzing image features: patterns of gradients and contrasts that could help you pick out the distinctive curve of someone's shoulders, or the "A" made by a torso and legs. At each stage, you could choose from dozens if not hundreds of algorithms, and ways of combining them.
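A pipeline of this kind can be sketched as a chain of small functions, one per stage. What follows is a toy under stated assumptions, not BRAIN's system or any real pedestrian detector: the image is a grid of brightness values, and the stage functions, the thresholds, and the two shape labels are all hypothetical.

```python
def normalize(image):
    """Stage 1: rescale brightness to 0..1 so overall lighting matters less."""
    flat = [pixel for row in image for pixel in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1  # avoid dividing by zero on a uniform image
    return [[(pixel - lo) / span for pixel in row] for row in image]

def region_of_interest(image, threshold=0.5):
    """Stage 2: bounding box around every pixel brighter than the threshold."""
    bright = [(r, c)
              for r, row in enumerate(image)
              for c, pixel in enumerate(row)
              if pixel > threshold]
    if not bright:
        return None
    rows = [r for r, _ in bright]
    cols = [c for _, c in bright]
    return min(rows), min(cols), max(rows), max(cols)

def extract_features(box):
    """Stage 3: describe the region with a couple of simple numbers."""
    top, left, bottom, right = box
    height = bottom - top + 1
    width = right - left + 1
    return {"area": height * width, "aspect": width / height}

def classify(features):
    """Stage 4: a hand-written rule standing in for a trained classifier."""
    return "elongated" if features["aspect"] >= 2 else "round"

def pipeline(image):
    """Run the stages in order; each consumes the previous stage's output."""
    box = region_of_interest(normalize(image))
    if box is None:
        return "empty"
    return classify(extract_features(box))

# Hypothetical 8-bit brightness grids: a long bright shape and a square one.
LONG_SHAPE = [[10, 10, 10, 10, 10, 10],
              [10, 200, 210, 205, 200, 10],
              [10, 10, 10, 10, 10, 10]]
SQUARE_SHAPE = [[10, 10, 10, 10],
                [10, 200, 200, 10],
                [10, 200, 200, 10],
                [10, 10, 10, 10]]
```

Here, `pipeline(LONG_SHAPE)` yields "elongated" and `pipeline(SQUARE_SHAPE)` yields "round." Every stage is a separate decision, and swapping any one of them for another algorithm changes what the later stages see, which is why such systems took so much tuning.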
For the BRAIN team, progress was hard-won. They started by trying to get the cleanest picture possible. A document outlining the company's early R. & D. efforts contains a triptych of pastries: a carbonara sandwich, a ham corn, and a "minced potato." This trio of lookalikes was one of the system's early nemeses: "As you see," the text below the photograph reads, "the bread is basically brown and round." The engineers confronted two categories of problem. The first they called "similarity among different kinds": a bacon pain d'épi, for instance (a sort of braided baguette with bacon inside), has a complicated knotted structure that makes it easy to mistake for sweet-potato bread. The second was "difference among same kinds": even a croissant came in many shapes and sizes, depending on how you baked it; a cream doughnut didn't look the same once its powdered sugar had melted.
In 2008, the financial crisis dried up BRAIN's other business. Kambe was alarmed to realize that he had bet his company, which was having to make layoffs, on the pastry project. The situation lent the team a kind of maniacal focus. The company developed ten BakeryScan prototypes in two years, with new image preprocessors and classifiers. They tried out different cameras and light bulbs. By combining and rewriting numberless algorithms, they managed to build a system with ninety-eight per cent accuracy across fifty varieties of bread. (At the office, they were nothing if not well fed.) But this was all under carefully controlled conditions. In a real bakery, the lighting changes constantly, and BRAIN's software had to work no matter the season or the time of day. Items would often be placed on the device haphazardly: two pastries that touched looked like one big pastry. A subsystem was developed to handle this scenario. Another subsystem, called "Magnet," was made to address the opposite problem of a pastry that had been accidentally ripped apart.