November 14th, 2013 Social News Mining

On Thursday, November 14, 2013, OLC attended Social News Mining, presented by Columbia Journalism School, which featured Carlos Castillo, a Social Scientist at the Qatar Computing Research Institute in Doha. He previously worked at Yahoo! Research before moving to his current research role. He is an active researcher with more than 40 publications. Castillo focuses on the application of web mining methods.

Carlos Castillo presented Social News Mining and Automatic Content Analysis of News. He compared communication scholars to computer scientists, explaining that media and communications scholars start with high-level questions, whereas computer scientists start from low-level observations. “We need to find a middle ground!” Castillo said.

Regarding predictive analytics using social media, Castillo noted that people on Twitter react immediately to news, while Facebook is slower to catch up. “Traffic is affected by bursts in attention and it’s most prevalent in 0 to 1 hour of news,” he said. According to Castillo, news and in-depth news serve different behaviors. News is time sensitive and intense in its first hour. In-depth news, however, has a longer shelf life and a longer burn. Castillo presented four types of news visitation profiles: decreasing (78% of content in 12 hours), steady (9%), increasing (3%), and rebounding (10%).
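To make the four visitation profiles concrete, here is a minimal sketch of how an hourly visit series might be sorted into them. The function name `classify_profile`, the 20% tolerance, and the thirds-based comparison are all assumptions made for illustration; the talk recap does not describe how Castillo assigned articles to profiles.

```python
def classify_profile(visits, tolerance=0.2):
    """Label an hourly visit series with a rough, hypothetical heuristic.

    The series is split into three parts (start, middle, end) and the
    average visits in each part are compared. This is an illustrative
    rule of thumb, not the method used in the research described above.
    """
    if len(visits) < 3:
        raise ValueError("need at least three observations")

    third = len(visits) // 3
    start = sum(visits[:third]) / third
    middle = sum(visits[third:2 * third]) / third
    end = sum(visits[2 * third:]) / (len(visits) - 2 * third)

    # A dip in the middle followed by a recovery counts as a rebound.
    if middle < start * (1 - tolerance) and end > middle * (1 + tolerance):
        return "rebounding"
    if end < start * (1 - tolerance):
        return "decreasing"
    if end > start * (1 + tolerance):
        return "increasing"
    return "steady"


# Hypothetical hourly visit counts, invented for this example.
print(classify_profile([100, 80, 60, 40, 20, 10]))
print(classify_profile([100, 90, 20, 10, 60, 70]))
```

With these made-up numbers, the first series falls steadily and the second dips before recovering, which is the kind of distinction the profiles are meant to capture.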

Castillo talked about prediction of events, and explained that short-term traffic is to a large extent correlated with long-term traffic. Social media signals are correlated with traffic and shelf life; that is, more reactions means more traffic and more discussion means more shelf life. “Don’t remove overachievers; promote underachievers,” Castillo said.
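The claim that short-term traffic is correlated with long-term traffic can be illustrated with a simple Pearson correlation. The sketch below is only an assumption about how one might check such a relationship; the helper `pearson` and the visit counts are hypothetical and do not come from Castillo's data.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2 items)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-article counts, invented for this example:
# visits in the first hour versus total visits over several days.
first_hour_visits = [120, 300, 80, 500, 220]
total_visits = [900, 2100, 700, 3600, 1500]

r = pearson(first_hour_visits, total_visits)
print(f"correlation between first-hour and total visits: {r:.3f}")
```

A coefficient close to 1 would mean that articles with a strong first hour also tend to draw more visits overall, which is the pattern described in the talk.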

He talked about news crowds, news curators in social media, and transient news crowds. “The crowd gathers around a news article, but later, they gather again. That part is probably related to the first story,” he said. Castillo observed that most crowds disperse quickly. “Just because someone ReTweeted an article, it doesn’t mean they read it.” Instead, he said to focus on articles that focus on users.

Another of Castillo’s research projects involved analyzing closed captioning on television networks. He found that sentiment scores on TV captions range from neutral to positive. “Strong positive words are used more than negative words,” he said. He also found that networks with more resources can cover more stories, but some prefer to cover only prominent ones, and others prefer to focus on niche content.