Researchers have created an artificial neural network that identifies the activities in a user’s average day through a catalogue of images.
New research in neural networks may let computers identify our daily actions more accurately than the apps on the market that track things like GPS location and heart rate. A new computer model has achieved about 83 percent accuracy in identifying the activities it sees in real-life images—and with just a bit of training it could do this for any user it encounters.
Led by Georgia Tech graduate students Daniel Castro and Steven Hickson, researchers have created an artificial neural network designed to identify scenes in so-called “egocentric” photographs taken from the user’s point of view. These usually come from wearable cameras like Narrative Clip, MeCam, Google Glass, and GoPro, but regular cell-phone photos often work as well. The team gave the network its skill by training it with a set of about 40,000 images taken by a single individual over a six-month period. This dedicated volunteer manually associated each image with an activity, and naturally settled on using 19 basic activity labels. These labels include driving, watching TV, family time, and hygiene.
A separate learning algorithm combines the neural network’s guesses with metadata about the day and time at which the image was captured. This allows the network to learn common associations between activities and even make predictions about the user’s upcoming schedule.
“It’s this ensemble-like method, where we trained on top of a deep learning method,” says Hickson. “So it can leverage the deep learning, and the basic contextual information on daily activities.” (See “10 Breakthrough Technologies 2013: Deep Learning.”)
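To make the ensemble idea concrete, here is a minimal sketch of how a classifier's per-image guesses might be combined with day-and-time context. It is not the researchers' method: the article does not say which learning algorithm they used, so this example swaps in a simple count-based prior multiplied against the image probabilities. The class `TimeContextPrior`, the functions `fuse` and `predict`, the sample history, and the probability values are all hypothetical. Only the four activity names come from the article.

```python
from collections import Counter, defaultdict

# Four of the 19 labels named in the article; the rest are not listed there.
ACTIVITIES = ["driving", "watching TV", "family time", "hygiene"]


class TimeContextPrior:
    """Hypothetical helper: counts which activities occur at each (weekday, hour)."""

    def __init__(self, activities, smoothing=1.0):
        self.activities = list(activities)
        self.smoothing = smoothing          # add-one style smoothing for unseen slots
        self.counts = defaultdict(Counter)  # (weekday, hour) -> Counter of activities

    def fit(self, records):
        """records: iterable of (weekday, hour, activity) from labeled photos."""
        for weekday, hour, activity in records:
            self.counts[(weekday, hour)][activity] += 1
        return self

    def prior(self, weekday, hour):
        """Smoothed probability of each activity at the given day and hour."""
        seen = self.counts.get((weekday, hour), Counter())
        total = sum(seen.values()) + self.smoothing * len(self.activities)
        return {a: (seen[a] + self.smoothing) / total for a in self.activities}


def fuse(image_probs, prior):
    """Multiply image-based probabilities by the time prior and renormalize."""
    scores = {a: image_probs.get(a, 0.0) * p for a, p in prior.items()}
    norm = sum(scores.values())
    if norm == 0:
        return dict(prior)  # fall back to context alone
    return {a: s / norm for a, s in scores.items()}


def predict(image_probs, prior):
    """Return the most probable activity after fusion."""
    fused = fuse(image_probs, prior)
    return max(fused, key=fused.get)


# Made-up history: weekday 0 is Monday.
history = [(0, 8, "driving")] * 3 + [(0, 20, "watching TV")] * 3 + [(0, 20, "family time")]
context = TimeContextPrior(ACTIVITIES).fit(history)

# An ambiguous photo: the (hypothetical) network cannot tell driving from TV.
ambiguous = {"driving": 0.4, "watching TV": 0.4, "family time": 0.1, "hygiene": 0.1}

morning_guess = predict(ambiguous, context.prior(0, 8))
evening_guess = predict(ambiguous, context.prior(0, 20))
print(morning_guess, "|", evening_guess)
```

In this toy setup the same ambiguous photo resolves differently depending on when it was taken. Calling `context.prior` for a future time slot also gives a rough guess at what the user is likely to be doing then, which is the flavor of schedule prediction the researchers describe.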
Wearable technology developers could offer much more insightful services with this technology. The researchers imagine an app that notices a user’s eating or exercise habits and suggests possible adjustments. And since it can learn your schedule, it could make intelligent suggestions on the fly, like leaving early for work due to a traffic report. Castro says it might even let an app reorganize your activities throughout the day so you can get through them more efficiently.
Microsoft researcher Gordon Bell has worked on so-called e-memory, which aims to assist human recall with computers. He says that the key is giving machines the ability to recognize the content of photos. “Every one of these steps forward [for machine learning] is incredibly valuable,” says Bell. “I’d look at [this indexing ability] as something that will enhance your long-term memory by being able to find things in earlier situations.” He says that in the future, e-memory algorithms could search a wide variety of photos from more than just the egocentric viewpoint, “so it’s got a wide range of applicability.”
Happily, not every user has to compile a 40,000-image database to take advantage of this technology. When the team tested its machine-learning ensemble on two new volunteers, the model initially struggled with the differences in lifestyle. Hickson says they “did just a quick study” on the effect of fine-tuning the model, training it with just a single day’s worth of egocentric photos from the two new volunteers. The accuracy of the results increased dramatically, he says.
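The general idea of fine-tuning can be illustrated with a toy example: start from weights learned on one person's data, then run a few more training steps on a small batch from someone new. The sketch below uses a tiny logistic classifier as a stand-in for the far larger network in the study. The weights, the two-number "features," and the four labeled samples are all invented for illustration, and nothing here reflects the team's actual training setup.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mean_log_loss(weights, bias, samples):
    """Average cross-entropy of a logistic classifier over (features, label) pairs."""
    total = 0.0
    for features, label in samples:
        p = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
        p = min(max(p, 1e-12), 1 - 1e-12)  # guard against log(0)
        total -= label * math.log(p) + (1 - label) * math.log(1 - p)
    return total / len(samples)


def fine_tune(weights, bias, samples, learning_rate=0.5, steps=50):
    """Continue gradient descent from existing weights on a small new data set."""
    weights = list(weights)
    n = len(samples)
    for _ in range(steps):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, label in samples:
            error = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias) - label
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


# Hypothetical weights "pretrained" on the original wearer's photos.
pretrained_w, pretrained_b = [2.0, -2.0], 0.0

# Hypothetical single day from a new user whose habits look different:
# for this person the second feature, not the first, signals the activity.
new_user_day = [([0.9, 0.1], 0), ([0.8, 0.2], 0), ([0.1, 0.9], 1), ([0.2, 0.8], 1)]

loss_before = mean_log_loss(pretrained_w, pretrained_b, new_user_day)
tuned_w, tuned_b = fine_tune(pretrained_w, pretrained_b, new_user_day)
loss_after = mean_log_loss(tuned_w, tuned_b, new_user_day)
print(f"loss before: {loss_before:.3f}, after: {loss_after:.3f}")
```

The point of the toy is only that a model which fits one person poorly can be adapted with a small amount of that person's data, rather than retrained from scratch.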
As always with wearable cameras, however, there are complex issues of privacy and user trust. Point-of-view photography (“egography”) allows insights that can be extremely useful when put to work for users, but it can also create a very desirable target for criminal hackers and nosy advertising companies. The practice is even becoming political through the spreading use of police body cameras to automatically record interactions with suspects (see “Controlling When the Cameras Record”).
Some of these issues could evaporate once consumer-grade mobile devices have hardware capable of running intensive machine-learning algorithms. If data no longer has to travel over the Internet for processing, the researchers say, security becomes a lot more manageable. Castro says the challenge is whether we can “figure out what these privacy issues are now so we don’t run into problems later, say five years down the line, when these devices are available.”
The researchers do examine the possibility of an image-analysis algorithm that could complement theirs by identifying and removing private information from images automatically—a casual request of the machine-learning community that would have seemed far too aspirational just a few short years ago.