NYC Machine Learning
On Thursday, December 20, 2012, OLC attended the NYC Machine Learning Meetup at Pivotal Labs, featuring Joseph Turian, head of MetaOptimize. Turian consults on machine learning, natural language processing and predictive analytics. He talked about crowd-programming patterns, their current state and where the tech market is heading.
Joseph Turian is a postdoctoral research fellow at the Université de Montréal, where he studies deep learning methods. He has 20 years of coding experience and focuses on applying sophisticated machine learning techniques to large-scale problems in natural language. His firm, MetaOptimize, is a consultancy working on big-data, healthcare, finance and other models.
Turian talked about the "ghosts of work," which he divided into three buckets: past, present and future. "In the past," Turian said, "work was delegated to other people." In the present, "work is delegated to both people and computers." In the future, "we think that work will be delegated to artificial intelligence." Artificial intelligence can be measured using the Turing Test, which gauges a machine's ability to "successfully imitate humans." The new age for AI "is driven by research interested in methods that move us to AI space," Turian said. "Deep machine learning is a trick to do AI."
The task of AI is to create a computer that successfully imitates a human. "The trick is for the computer to pass the work to the human and pass it off as AI," Turian said. "We don't have true AI right now, but we want it. We have to settle for artificial-artificial intelligence or, to leave out the double negative, natural intelligence. This means person to computer to people. It's crowd programming [crowd labor]."
"Crowd labor is basically crowd-sourced information work for pay. It's not volunteer crowdsourcing, like Wikipedia. It's not on-demand errands or virtual outsourcing or even traditional BPO [business process outsourcing]," Turian said. "Crowd-labor sourcing is very helpful: a virtual assembly line can be created and work can be delegated to specific people. Multiple eyes per task mean a reliable result. With the in-house approach, by contrast, costly training and one person per task mean the company can't scale to meet demand, and the results are less than acceptable."
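Turian's point that multiple eyes per task yield a reliable result usually comes down to redundancy plus aggregation. The sketch below is a minimal illustration, not something shown in the talk: it assumes each task is answered by several workers and resolved by simple majority vote. The names `majority_vote` and `aggregate` and the sample reviews are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def majority_vote(answers: List[str]) -> Tuple[str, float]:
    """Return the most common answer and the share of workers who gave it.

    Ties are broken in favor of the answer seen first, because
    Counter.most_common orders equal counts by first occurrence.
    """
    if not answers:
        raise ValueError("a task needs at least one answer")
    label, count = Counter(answers).most_common(1)[0]
    return label, count / len(answers)


def aggregate(tasks: Dict[str, List[str]]) -> Dict[str, Tuple[str, float]]:
    """Resolve every task (task id -> list of worker answers) by majority vote."""
    return {task_id: majority_vote(answers) for task_id, answers in tasks.items()}


if __name__ == "__main__":
    # Hypothetical sentiment labels from three workers per review.
    tasks = {
        "review-1": ["positive", "positive", "negative"],
        "review-2": ["negative", "negative", "negative"],
    }
    for task_id, (label, agreement) in aggregate(tasks).items():
        print(task_id, label, f"{agreement:.0%}")
```

The agreement ratio returned alongside each label is a cheap signal: low-consensus tasks can be routed to additional workers before their answers are accepted.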
Crowd programming combines human and computer power. It allows for more accurate data entry, and crowd labor comes with automatic quality assurance (QA). By Turian's numbers, temp workers or interns deliver about 86% accuracy, compared with crowd labor's 98%. "Crowd labor allows for an increase in scalability and accuracy," Turian summed up.
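Why redundancy raises accuracy can be made concrete with a small calculation. Assuming a binary task, an odd number of workers, and errors that are independent (an idealization, since real workers' mistakes are often correlated), the chance that the majority is right is a binomial tail sum. The function name is hypothetical, and the 86% input is borrowed from the talk purely as an example; this is not how Turian's figures were derived.

```python
from math import comb


def majority_accuracy(p: float, n: int) -> float:
    """Probability that a strict majority of n workers is correct on a binary task.

    Assumes each worker is independently correct with probability p, and
    that n is odd so a vote can never tie.
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd number")
    needed = n // 2 + 1  # smallest strict majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(needed, n + 1))


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(f"{n} worker(s) at 86%: majority correct {majority_accuracy(0.86, n):.3f}")
```

Under these assumptions, three workers who are each 86% accurate produce a majority that is right about 94.7% of the time. Correlated errors would shrink that gain, which is one reason platforms combine voting with other QA signals.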
"Sentiment analysis is error-prone," Turian said. "Crowd labor, though, gives accurate results. We can take output from people and use it as training data for computers."
"It's proven that demand for products increases if the reviews are written with good grammar," he said. "It's hard to develop a grammar checker, so interns are hired, but that's time-consuming and costly. If you use crowd labor, you save money and increase profit margins."
"When we're talking about intelligent applications that we want versus intelligent applications that we have, what we have right now are commoditized crowd-labor applications and enterprise crowd-labor applications," Turian said. "Cloud computing and crowd labor both stack up in the same layers: app, platform and infrastructure. Cloud is abundant (app), robust (platform) and unreliable (infrastructure). Crowd labor is broken up into two stacks: commoditized and enterprise. The commoditized side is few (app), immature (platform) and unreliable (infrastructure). The enterprise side is some (app), generic (platform) and unreliable (infrastructure)."
"If you want to build a crowd-labor platform, you have the following options," Turian said. "Build your own code: you must build auto QA for each module in your business process. Use CrowdFlower: it's good for one-shot annotation but has no native support for business processes, so you need to code that yourself, and it needs gold-standard data to do auto QA. Use MobileWorks: it feels like they actually care about you as a third-party developer, and this is what you actually want. Its advantages are native support for business processes and no need for gold-standard data; the disadvantage is that it's an immature platform. And if you have little capital, say $500 to spend, use oDesk for labor."
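Gold-standard QA, which Turian says CrowdFlower depends on, generally means seeding the work with items whose answers are already known and scoring workers on them. The sketch below assumes that scheme; it is not CrowdFlower's API, and `Judgment`, `gold_scores`, `trusted_judgments`, the 0.8 accuracy threshold and the two-item minimum are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Judgment:
    worker_id: str
    item_id: str
    answer: str


def gold_scores(judgments: List[Judgment], gold: Dict[str, str]) -> Dict[str, Tuple[int, float]]:
    """Return {worker_id: (gold items answered, accuracy on those items)}."""
    seen: Dict[str, int] = defaultdict(int)
    correct: Dict[str, int] = defaultdict(int)
    for j in judgments:
        if j.item_id in gold:
            seen[j.worker_id] += 1
            if j.answer == gold[j.item_id]:
                correct[j.worker_id] += 1
    return {w: (n, correct[w] / n) for w, n in seen.items()}


def trusted_judgments(judgments: List[Judgment], gold: Dict[str, str],
                      threshold: float = 0.8, min_gold: int = 2) -> List[Judgment]:
    """Keep answers to real (non-gold) items only from workers who answered
    at least `min_gold` gold items with accuracy >= `threshold`."""
    scores = gold_scores(judgments, gold)
    trusted = {w for w, (n, acc) in scores.items() if n >= min_gold and acc >= threshold}
    return [j for j in judgments if j.worker_id in trusted and j.item_id not in gold]
```

Workers who never see a gold item are excluded by default, which is a conservative choice; a platform could instead give them a probationary weight until enough gold answers accumulate.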
Turian asked how a robust crowd-labor platform could be developed. "Well," he said, "there are two ways: curation and technology. Curation gives you reliable workers, whereas technology has to work with unreliable workers."
He also listed a few platforms to watch in 2013. "Know these crowd-computing systems," Turian said. "These are best positioned to dominate enterprise. MobileWorks is the most ambitious yet the least mature. CloudFactory has BPO down to an art, and it's the fastest at building new apps. CrowdFlower is the best self-serve platform, and its biggest app will be its sentiment tool. Servio, I think, will be a quiet juggernaut. It has the best enterprise app: a product-merchandising app."
The technology for auto QA involves reputation, grading, gold-standard data, voting and patterns. "Intelligent dispatching and rating is a hard, valuable open problem. Worker communication is necessary," Turian said. Today, business processes such as invoice or bill processing are built manually. "But can we do this automatically?" Turian asked. "The answer is to program the crowd. You start with one module, and then the crowd splits it into two steps: locate each field, then transcribe each field. Program the crowd to program the crowd!"
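The "locate each field, transcribe each field" decomposition can be sketched as a two-stage pipeline in which each stage is answered redundantly and resolved by vote. This is a minimal illustration under stated assumptions: workers are modeled as plain Python callables rather than real crowd tasks, a field's location is an exact (x, y, width, height) tuple, and `process_invoice`, `vote`, `Locator` and `Transcriber` are hypothetical names. A real system would cluster nearby boxes instead of requiring exact agreement.

```python
from collections import Counter
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

# Hypothetical region on a scanned page: (x, y, width, height).
Box = Tuple[int, int, int, int]
# A "worker" is modeled as a function; in practice each call would be a crowd task.
Locator = Callable[[str, str], Box]        # (document_id, field_name) -> region
Transcriber = Callable[[str, Box], str]    # (document_id, region) -> text


def vote(answers: Sequence[Hashable]):
    """Return the most common answer (ties go to the first one seen)."""
    return Counter(answers).most_common(1)[0][0]


def process_invoice(doc_id: str, fields: List[str],
                    locators: List[Locator],
                    transcribers: List[Transcriber]) -> Dict[str, str]:
    """Step 1: several workers locate each field; the region is chosen by vote.
    Step 2: several workers transcribe that region; the text is chosen by vote."""
    result: Dict[str, str] = {}
    for field in fields:
        box = vote([locate(doc_id, field) for locate in locators])
        result[field] = vote([transcribe(doc_id, box) for transcribe in transcribers])
    return result
```

Splitting the job this way keeps each crowd task small and checkable: a worker who mislocates a field is outvoted before transcription starts, so transcribers only ever see the agreed-upon region.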
To sum up, he reiterated that crowd labor is emerging. "It will be game-changing," he said. "There will be dramatic changes in AI apps. It will all be ubiquitous soon."