David Robinson wrote about his experience after a year as a Data Scientist at Stack Overflow. I haven't read it yet, but it the opening paragraph turned me on to his series about analyzing baseball statistics using the beta distribution.
- Understanding the beta distribution
- Understanding empirical Bayes estimation
- Understanding credible intervals
- Understanding the Bayesian approach to false discovery rates
- Understanding Bayesian A/B testing
- Understanding beta binomial regression
This is a great series of posts -- I've been learning how to use the beta distribution to estimate uncertainties in clickthrough rates, so finding David Robinson's blog was like stumbling into the lecture hall for a class I didn't know I needed. And not getting immediately kicked out, I guess?
Aziz Ansari, on idle internet browsing:
Like, here's a test, OK. Take, like, your nightly or morning browse of the Internet, right? Your Facebook feed, Instagram feed, Twitter, whatever. OK if someone every morning was like, I'm gonna print this and give you a bound copy of all this stuff you read so you don’t have to use the Internet. You can just get a bound copy of it. Would you read that book? No! You'd be like, this book sucks. There's a link to some article about a horse that found its owner somehow. It's not that interesting.
Greg Borenstein believes the cutting edge of AI will actually be in designing interaction patterns for machine learning systems. Supervised learning algorithms need human-labeled examples to train generalized models. Google and Facebook are leaders in the field first and foremost because of their massive systems for collecting labeled data. Google crawls the web and uses statistical text analysis to turn the entire web into examples. Facebook's users volunteer labeled data whenever they post. If you want to build a powerful AI system, solve the collection problem first -- and that's the territory of user experience and interaction design.
Once models generate predictions, what will users do with them? If the learning algorithm can show us its reasoning, we can critique that reasoning and generalize our own learning.
Users trust learning systems more when they can understand how they arrive at their decisions. And they are better able to correct and improve these systems when they can see the internals of their operation.
Of course, just like in arguments with other humans, we're more likely to uncritically accept conclusions that we agree with -- which is how machine learning can drift into "money laundering for bias".
Contra Aziz Ansari, I come across this article in my early-morning skim of Twitter. Yes, if it had been printed as a book it would suck... but there is still wheat in that chaff. ↩︎