The Happy Technologist Interesting Geekdom


R Slopegraphs and the HHP Leaderboard

I'm still working on my visualization-fu, so when the Heritage Health Prize finally got announced, the final scores provided a simple source of data that I wanted to investigate.

I've written about the HHP before.  After spending three years with the competition, the winners were announced at Health Datapalooza just a few days ago.  Prior to the announcement, the teams had been ranked based on a 30% sample of the final data, so it was of some interest to see what happened to the scores against the full 100%.  For one thing, I personally dropped from 80th place to 111th, and the winners of the $500,000 prize jumped from 4th place to take the prize... not an unheard of jump, but given the apparent lead of the top 3 teams it was somewhat unexpected.  The results were published on the HHP site, but I scraped them manually into a .csv format for a little simpler manipulation.  An Excel file with the raw and manipulated data is attached here:  HHP Final Standings for convenience.

A decent visualization for this before-and-after style information is the slopegraph.    Here's an example:

hhp_slopegraph_top_50 (720)

Top 50 Teams


Two Years with the Heritage Health Prize

ED: I spoke to a reporter yesterday for a half hour or so, discussing the final stretch of the Heritage Health Prize data mining competition I've been a competitor in for the past couple of years. Her article came out today and is posted here: 3-Million-Health-Puzzler-Draws-to-a-Close. I'm quoted as saying only: "They set the bar too high". I probably said that; I said a lot of things, and I don't want to accuse Cheryl of misquoting me (she was quite nice, and her article is helpful, well written, and correct), but I feel like a lot of context was missed on my comment, so I'm just going to write an article of my own that helps explain my perspective... I've been meaning to blog more anyway. 🙂

On April 4th 2011, a relatively unknown company called "Kaggle" opened a competition with a $3 Million bounty to the public. The competition was called the "Heritage Health Prize", and it was designed to help healthcare providers determine which patients would benefit most from preventive care, hopefully saving the patients from a visit to the hospital, and saving money at the same time. And not just a little money either ... the $3 Million in prize money pales in comparison to the billions of dollars that could be saved by improving preventive care. The Trust for America's Health estimates that spending $10 in preventive care per person could save $16 billion per year, which is still just the tip of the iceberg for soaring health care prices in the United States.


Big Data: It’s not the size that matters

Many an article has been spent defining "Big Data"... everyone agrees that "Big Data" must be, well, large, and made up of data. There may be seemingly new ways of handling big data:tools such as Hadoop and R (my personal favorite) and concepts like No-SQL databases, and an explosion of data due to new collection tools: faster and more prolific sensors, higher quality video, and social websites. Large companies with the wherewithal to build petabyte and larger data centers are learning to collect and mine this data fairly effectively, and that's all very exciting -- there's a wealth of knowledge to be gleaned from all this data. But what about the rest of us?

The thing is, it's not really a matter of collecting and hoarding a large amount of data yourself. It's how you use and take advantage of the data that you do have available that is at the core of these new trends.