April 25th, 2013 Datakind - Data in the Big City


OLC attended Datakind’s event, Data in the Big City, held at ThoughtWorks on Thursday, April 25, 2013. Chris Corcoran from the analytics branch of the Mayor’s Office, along with Jacqueline Lu of the NYC Parks Department and Brian D’Alessandro from media6degrees, gave an informed talk about using data to help New York City and us, its inhabitants.


Chris Corcoran opened by saying that the presenters were there to talk about data, but he first gave a short account of how he came to join the analytics team of the Mayor’s Office. Corcoran first worked for a management consultant before moving to a software company based in Washington, D.C. and San Francisco. The company focused on giving people different motivations. “We quickly found out that money has no effect on motivation,” Corcoran said. “Instead telling people how much electricity their neighbor is using cut their electric use significantly. Relation of data is very powerful,” he said. Corcoran also worked with Fab.com, where he and his team worked with a lot of data. “It was almost overwhelming,” he admitted. Corcoran said that much of the time he didn’t know what to do with the data. He soon found himself working for Mayor Michael Bloomberg in his office of analytics. “[The office] is about touching people’s lives every day,” Corcoran said.

“We’re not a big data office,” Corcoran said. “There’s about one million buildings in New York City, but the data is pretty much static. We’re taking in stock, base level data and monitoring how it changes every day. We started off with a problem and data grew over time with more and more agencies we’re working with.”

Corcoran talked about illegal conversions as his first example. “An illegal conversion is breaking up your apartment or any building to add more rooms for housing. AirBnB breaks New York state code, actually. The Department of Buildings takes this seriously and remediates them,” he said. The population of NYC is growing, but the head count of city employees is static, or decreasing as divisions downsize in response to economic conditions. “We only have a set number of city inspectors, so the question is how can we find these apartments that are hazardous?” The answer is to use data.

“We know that Mayor Bloomberg has strict laws on smoking. There are people that go down to Kentucky and load up their vans with cigarettes to sell them here in New York. They sell them to bodegas and they sell them to people here below market price. That means that the city loses out on the tax. How do we identify illegal cigarette sales? Well, data doesn’t have to be complicated. We find these sales through other means. We find them when they commit other crimes, like selling cigarettes to minors. Once you make illegal sales, you’re more likely to cross that boundary more often. It’s also viewing EBT stamps fraud. The point here is that we’re aggressively pursuing public health policy,” Corcoran said.

The last example was about the “dirtiest” neighborhood in New York City. “We want to do a project with the Department of Sanitation. Actually, they’re the ones who came up with it. They want to identify the dirtiest blocks in NYC and one way is through 311 calls. Believe it or not, everything south of Houston Street has been identified as the dirtiest neighborhoods. Maybe it’s because people there are more in tune with technology and are more comfortable calling 311 about trash. We’re going through a neighborhood-by-neighborhood comparison. It’s all about resource allocation, but it’s a real challenge to the city,” he concluded.


Next, Jacqueline Lu and Brian D’Alessandro tried to answer the question, “Does pruning trees reduce future tree hazards?”—the answer is yes, but the two tackled the question with in-depth analysis.

“New York City parks take up about 29,000 acres throughout the boroughs. That’s 14 percent of NYC’s land area. We take care of 600,000 trees, according to the last data taken in 2006. The Parks Department does more than just count trees; we take care of them too. We pruned 32,000 trees in 2011, planted 10,000, removed 12,000 dead trees and handled 22,000 emergencies,” Lu said. She explained that storm events in NYC are increasing in severity. “We’re being tasked with being first responders. We’re the ones removing fallen trees so emergencies can be tended to as quickly as possible. A question that was asked frequently was, ‘Can tree pruning reduce tree hazards?’ People have never quantified this to see what the data would tell us, so we took it upon ourselves to find out,” she said.


Brian D’Alessandro was tasked with explaining the theory and mathematics behind the data analysis. “It turns out that this problem is familiar to me. They wanted to quantify what effect block pruning has on problems, meaning emergencies. Well, here are the benefits of reducing tree hazards, first. It promotes public safety. It allows for better resource management and makes NYC more beautiful,” he said. “Well, if you can intelligently allocate your resources, when a storm happens, you’re prepared. This is the motivation behind the pruning program. Pruning helps to keep the tree from catching wind like a sail.” D’Alessandro looked at trees in storms to create a hypothesis. It was here that data came into the question and furthered his involvement in the project.

“It turns out that this problem is a causal estimation project. It’s used a lot in advertising agencies. It’s a particular problem that involves my professional background, so I jumped on this project. A causal estimation is if I change X, does it change Y? The gold standard here is controlled experimentation,” D’Alessandro said. He gave an example of pruning 100 trees in a certain block and not pruning 100 trees in another. “This would be an example of A/B testing. The Parks Department has data about every single tree. There’s a census for every one of them. Observational methods allow for us to statistically create an A/B test.”

The ingredient for this is data. The subjects are city blocks; the treatment is pruning; the outcome is future tree hazards; and the confounders are the variables that influence both which blocks get pruned and how many hazards they have. “Naturally, low-lying areas are more prone to hazards,” D’Alessandro reminded the audience. He carefully explained causal experimentation and A/B testing in detail. “The issue with confounding is that there are certain aspects that affect treatment and the outcome,” he said. “For this particular problem, we’re trying to estimate the number of tree hazards. What causes a tree hazard? Naturally, the number of trees increases hazards. Blocks are also affected by locations. Tree pruning is not exactly random.”

For the results, he showed two graphs. One was titled “Naive Result,” which did not account for confounding. “There is a 16 percent increase in tree hazards. So basically, tree pruning causes hazards. This is from a naive result using bad statistics. This doesn’t take into account the environment that the trees are in,” he said. By contrast, a doubly robust method showed that pruning reduced hazards by 22 percent. “If you intelligently prune, you have the chance to see results instead of randomly pruning.”
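To make the gap between a naive and an adjusted estimate concrete, here is a minimal Python sketch. It is not the Parks Department's data or D’Alessandro's actual method: the block counts and hazard numbers are invented for illustration, the single confounder (tree density) is an assumption, and the adjustment is plain stratification rather than the more sophisticated estimator described in the talk. The function names `make_blocks`, `naive_effect` and `stratified_effect` are hypothetical.

```python
from statistics import mean

def make_blocks():
    """Build a hypothetical toy data set of city blocks.

    Each block is (tree_density, pruned, hazards). All numbers are made up:
    dense blocks have more hazards AND are pruned more often, which is
    exactly the kind of confounding described in the talk.
    """
    blocks = []
    blocks += [("high", True, 8)] * 80    # dense blocks, pruned
    blocks += [("high", False, 10)] * 20  # dense blocks, not pruned
    blocks += [("low", True, 3)] * 20     # sparse blocks, pruned
    blocks += [("low", False, 4)] * 80    # sparse blocks, not pruned
    return blocks

def naive_effect(blocks):
    """Mean hazards on pruned blocks minus mean hazards on unpruned blocks."""
    pruned = [h for _, p, h in blocks if p]
    unpruned = [h for _, p, h in blocks if not p]
    return mean(pruned) - mean(unpruned)

def stratified_effect(blocks):
    """Compare pruned vs. unpruned within each density stratum,
    then average the differences weighted by stratum size."""
    total = len(blocks)
    effect = 0.0
    for stratum in sorted({d for d, _, _ in blocks}):
        in_stratum = [b for b in blocks if b[0] == stratum]
        effect += (len(in_stratum) / total) * naive_effect(in_stratum)
    return effect

blocks = make_blocks()
print(f"naive difference:      {naive_effect(blocks):+.2f} hazards per block")
print(f"stratified difference: {stratified_effect(blocks):+.2f} hazards per block")
```

In this toy data the naive comparison comes out positive (pruned blocks appear to have more hazards) simply because pruning is concentrated on dense blocks, while the within-stratum comparison comes out negative. The sign flip mirrors the pattern in the two graphs, though the magnitudes here have nothing to do with the 16 and 22 percent figures from the talk.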

To conclude, D’Alessandro left the audience with words of wisdom. “Controlled experiment is ideal, but not always feasible. Also, when relying on data, naive estimation can give you the wrong results. And remember that strict assumptions should be met and the results are as good as the validity of the assumptions,” he said.