May 20th, 2013 NY MySQL to Cassandra

On Monday, May 20, 2013, OLC attended New York MySQL’s meetup, “MySQL to Cassandra: Big Data, High Scale, Data Migration... Oh My!”, held at AOL. The featured speakers were Scott Bonneau, CTO and Executive VP of Engineering at Bazaarvoice, and RC Johnson, NYC Engineering Manager at Bazaarvoice.

Scott Bonneau, as the CTO and EVP of Engineering, is responsible for building critical infrastructure systems at Bazaarvoice, including a data storage system that serves over 10 billion interactions per month. Bonneau worked at Google for four years and has held leadership positions at RGM Advisors, Lombardi Software, MessageOne and Trilogy Software.

Bazaarvoice was founded in 2005, and its basic thesis was based on Amazon’s data-capturing practices. Teams at Amazon would analyze user-generated data, and Scott Bonneau felt that he should be able to do that too.

“The value proposition of Bazaarvoice was that nobody reads ads,” Bonneau declared.

“Okay, that’s not really 100 percent true, but ads don’t get read. We place more trust in other people’s reviews than we do in advertisements. Feedback from people is read differently than ads.”

Bonneau gave a brief breakdown of Bazaarvoice’s thesis.

1. Help businesses capture, display, share and analyze user-generated content

2. Connect brands with the point-of-purchase conversation

3. Enable retailers and brands to participate in shopper marketing media

“We aggregate for these brands and give them a dashboard to help brands operate with their own,” he said.

And this is how Bazaarvoice does it:

• Hosted and managed SaaS solutions

• Full service

• White label clients use JavaScript

• Serves all of the necessary content, including images, CSS and JS

• Serves content any time one of our clients is involved

“Bazaarvoice currently serves over 2,500 brands,” Bonneau said.

In terms of scaling, Bonneau said, “scaling is our middle name.” Bazaarvoice serves more than 100 million page views, displays 800 million pieces of content and receives 95,000 new pieces of content a month.

Regarding its technologies, Bazaarvoice’s data store is made up of MySQL, Hadoop and ElasticSearch, among others. Its application layer uses Java, Tornado, nginx and Node.js, and the UI is composed of jQuery, FreeMarker, Ember.js and others.

“We did find some cracks in the foundation, though,” Bonneau confessed. “We found that it had some slow rendering, hard to recover content—we decided to revisit everything that we had done at the beginning of the strategy and worked back from there.”

“Things like latency matter a lot. We need to send out content as fast as we can. We need to be able to get to throw data at any time. We do a lot of machine learning,” Bonneau said.

RC Johnson took over for Bonneau from here.

“There are three different things we created:”

• Emo – an emotions database

• Databus – a bus that carries change events from Emo

• Pollui – built on ElasticSearch; reads Databus events and builds the search index.

Johnson talked at length about breaking Emo apart. He listed its key properties as horizontal scaling on reads and writes, fault tolerance, and a multi-master, multi-document design. On structure, Johnson said that there would be one ring for product data and another one for content. Elsewhere, the team had been exposed to MySQL, MongoDB and HBase.

Bazaarvoice’s SQL side is designed for writes and can scale with changing plans. Its NoSQL side is designed for rendering.

On Pollui, Johnson described it as a declarative, rule-based, denormalized search index.
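To make “declarative, rule-based, denormalized” a little more concrete, here is a minimal sketch in Python. It is not Bazaarvoice’s code: the `Rule` class, the `denormalize` function and the field names (`product_id`, `status`, `rating`) are all hypothetical, chosen only to illustrate the idea that rules are plain data describing which documents to index and which fields to copy, and that each indexed document carries the product details it needs so no join is required at read time.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    """A declarative rule (hypothetical): which documents match, which fields to keep."""
    name: str
    matches: Callable[[dict], bool]
    fields: Tuple[str, ...]


def denormalize(review: dict, products: Dict[str, dict], rules: List[Rule]) -> Optional[dict]:
    """Build a flat index document from the first rule that matches the review.

    Returns None when no rule matches, meaning the review is not indexed.
    """
    for rule in rules:
        if rule.matches(review):
            product = products.get(review.get("product_id"), {})
            # Copy only the fields the rule asks for.
            doc = {f: review[f] for f in rule.fields if f in review}
            # Denormalize: embed the product name directly in the index document.
            doc["product_name"] = product.get("name")
            doc["rule"] = rule.name
            return doc
    return None


# Example data (made up for illustration).
rules = [
    Rule(
        name="approved_reviews",
        matches=lambda d: d.get("status") == "approved",
        fields=("id", "rating", "text"),
    )
]
products = {"p1": {"name": "Blender"}}
review = {"id": "r1", "product_id": "p1", "status": "approved", "rating": 5, "text": "Great"}

index_doc = denormalize(review, products, rules)
```

In a real pipeline the resulting document would be sent to a search cluster; here it is just a dictionary, which keeps the sketch runnable on its own.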

“Bazaarvoice uses ElasticSearch clusters, the backbone of the system.” It can be customized with the AWS Cloud plugins and with discoverability, SOA and monitoring tools.