Data as complete, clean, contextual, consumable information

NEW YORK—When we’re drowning in data but still thirsting for information, what does that say about data’s role out there? Prakash Nanduri, founder and CEO of Paxata, thinks there is a way for business users and information workers to understand data more as information: “when it’s complete, clean, contextual and consumable.”

 

http://www.meetup.com/DataDrivenNYC/events/229759269/

 

Nanduri spoke at the Data-Driven meetup on April 11 at the AXA Equitable Center alongside the heads of three other data companies: Haoyuan Li, CEO of Alluxio, on next-generation storage; Florian Douetteau, founder and CEO of Dataiku, on its data science platform; and Sri Ambati, founder and CEO of H2O.ai, on its machine learning API for smarter applications.

 

Using Paxata’s self-service data preparation software, Nanduri showed how data becomes information in several phases. Through visual guidance and a library of tools that help everyone make educated assumptions, Paxata proactively guides you through raw or messy data based on your history of data preparation.

 

From its library, Paxata recommends improvements based on crowdsourced answers. Lastly, it automatically transforms data for immediate consumption as it continuously learns from user interactions. The result, he said, is a visual paradigm.

 

Can you make a data analyst and a data engineer work together? Dataiku’s Douetteau thinks two mindsets can coexist: the clickers and the coders. “You have to make those two work together.”

 

Dataiku is the developer of DSS, an integrated development platform that data professionals use to turn raw data into predictions. The new integrated visual environment in DSS3 includes a dedicated production node feature, which addresses the problem of development environments that are typically disconnected from, and incompatible with, production environments.

 

Teams can now deploy, test and roll back instances of data applications as part of the data engineering process, which lets them build, run and improve data products.

 

Haoyuan Li, CEO of Alluxio (formerly Tachyon), flew in from California to talk about the company’s memory-speed virtual distributed storage, whose memory-centric architecture is designed for memory I/O.

 

Li described how far Alluxio, which was renamed a month ago, has come: it started in summer 2012 at UC Berkeley’s AMPLab, became open source in 2013 and is now deployed at 100 companies. It has raised $7.5 million from Andreessen Horowitz, a leading VC firm based in Silicon Valley.

 

“We power up your workloads,” he said, citing how Baidu now queries data 30 times faster. “We enable new workloads across storage systems. We work with the frameworks of your choice and scale storage and compute independently.”

 

Ambati of H2O.ai said the company scales statistics, machine learning and math over Big Data. It develops predictive analytics applications for such tasks as detecting fraudulent transactions, forecasting online customer purchases and predicting the best time to run ads, among others.