August 15th, 2013 To Be or Not To Be IID: That Is The Question

On Thursday, August 15, 2013, OLC attended Machine Learning’s event, To Be or Not To Be IID: That Is The Question, with William M. Pottenger as the guest speaker. 

Pottenger opened with a question about higher-order information. “What is it?” he asked. “There are things you can discover by looking at importance of connections.” He cited link mining and collective classification as related work. In short, Pottenger boiled it down to: “What are LSIs?” Latent Semantic Indexing (LSI) is based on singular value decomposition (SVD), a technique that is also used heavily in image processing. The values take on a different geometry after truncation: most of the zeroes in the original matrix are no longer zero once it is truncated to two dimensions.
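The point about zeroes can be illustrated with a minimal sketch. The term-document matrix, the term names, and the helper `truncate_svd` below are made up for illustration and are not from the talk; the sketch only shows how a rank-2 truncation fills in entries that were zero in the original matrix.

```python
import numpy as np

# Hypothetical term-document matrix: rows are terms, columns are documents.
# "car" and "auto" never appear in the same document, but both co-occur
# with "engine".
terms = ["car", "auto", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 0, 0],  # car
    [0, 1, 0, 0],  # auto
    [1, 1, 0, 0],  # engine
    [0, 0, 1, 1],  # flower
    [0, 0, 1, 0],  # petal
], dtype=float)


def truncate_svd(matrix, k):
    """Return the rank-k approximation of matrix from its SVD."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]


A_2 = truncate_svd(A, 2)

# Weight of "car" in the second document, before and after truncation.
print(A[0, 1])
print(round(A_2[0, 1], 3))
```

In this toy matrix, "car" has weight 0 in the second document, yet the rank-2 approximation gives it a weight of about 0.5, because "car" and "auto" are linked through "engine". That is the kind of higher-order co-occurrence the talk refers to.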

LSI is used to leverage intellectual property in a number of ways. “We have basically proved and demonstrated empirically that LSI is based on the use of higher order co-occurrence relations.”

Pottenger used the multinomial and multivariate event models to define chain subgraphs, which in turn yield higher-order paths. There are patterns of connectivity between features, and he said that users can leverage second-order paths, which are more frequent, to build a product. “You can take a labeled training dataset, break it down and get a count of the frequency of words that occurred in the high order paths. It’s a very simple idea.”
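As a rough illustration of that counting idea, the sketch below splits a tiny labeled dataset by class and counts second-order paths of the form a, b, c, where a and b co-occur in one document and b and c co-occur in a different one. The documents, the labels, and the function `second_order_paths` are hypothetical, and this simplification is not necessarily the exact procedure Pottenger described.

```python
from collections import Counter, defaultdict

# Hypothetical labeled training set: (class label, set of terms in the document).
labeled_docs = [
    ("vehicles", {"car", "engine"}),
    ("vehicles", {"auto", "engine"}),
    ("plants", {"flower", "petal"}),
    ("plants", {"petal", "stem"}),
]


def second_order_paths(docs):
    """Count ordered term pairs (a, c) linked through a shared term b
    that appears in two different documents."""
    counts = Counter()
    for i, first in enumerate(docs):
        for j, second in enumerate(docs):
            if i == j:
                continue  # the two links must come from different documents
            for b in first & second:  # shared intermediate term
                for a in first - {b}:
                    for c in second - {b}:
                        if a != c:
                            counts[(a, c)] += 1
    return counts


# Break the dataset down by class, then count paths within each class.
docs_by_class = defaultdict(list)
for label, doc_terms in labeled_docs:
    docs_by_class[label].append(doc_terms)

paths_by_class = {
    label: second_order_paths(docs) for label, docs in docs_by_class.items()
}
print(paths_by_class["vehicles"][("car", "auto")])
```

Here "car" and "auto" never share a document, but they are connected by one second-order path through "engine" within the "vehicles" class. Counts like these could then feed a classifier's per-class statistics.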

Higher Order Naive Bayes (HONB) achieved statistically significantly better performance than Naive Bayes (NB), based on four t-test results. HONB can distinguish bases and consistently outperformed NB. Surprisingly, the higher-order SVM (HOSVM) also outperformed standard SVMs.