The Happy Technologist Interesting Geekdom


A Moment for the HTC EVO 3D

I've been a Sprint user for over 10 years, at least according to Amanda, who cheerfully explained to me why my cell phone bill never makes sense but that they appreciate my loyalty anyway as she sold me my new phone a couple of weeks ago.

My new phone is the HTC EVO 3D, but enough about that for now. First, it's important to talk about my PREVIOUS phone, which was the very underrated Samsung Moment.

The Moment was a very early generation Android phone which managed to hit just about all the design elements I wanted. Despite being a bit too slow (Angry Birds never played quite right) and missing out on simple things like multi-touch, which Samsung apparently left out to get it to market quickly and inexpensively, it was one of my favorite phones. My Moment was a replacement for a Palm Treo which I dutifully kept by my side for several years (forever in smartphone land).

Samsung Moment
The Moment was a wide slide-out keyboard styled phone. If anyone reading this designs phone keyboards, go pick this up and play with it -- it's the best. The keys are clearly separated and slightly raised so that touch-typing, such as it is on a teensy-weensy keyboard, is actually possible. I wasn't quite able to bang out entire novellas without looking, but I could get pretty far into a decent text message with minimal mistakes while watching Netflix. The keyboard rocked.

Moreover, the Moment set aside the typical 4-button Android interface (Home, Menu, Back, Search) that seems ubiquitous, instead opting for the three required buttons (Home, Menu, Back) and two buttons dedicated to phone operation (Pickup and Hangup, where the Hangup button also acted as a power button for the overall phone). Most importantly, though, the phone had a tiny touchpad that depressed as a select button. I haven't seen better cursor control on any smartphone, although the dedicated rollerballs and rockers on the Palm and the BlackBerry are fairly close.

The HTC EVO 3D with which I now entertain myself boasts none of this coolness. The more-than-4-inch screen is gorgeous and responsive (the phone is wicked fast), and I've whittled down the on-screen keyboard options to a few that I like (I'm currently using SwiftKey X, which has a curious habit of predicting words when nothing has been typed -- it currently assumes that I want to say "I am a beautiful person." if I don't give it any other starting letters). But it's not as cute or cuddly as the Samsung Moment.

But, and this is very important:



Marbled Rye and Evolutionary Algorithms

Since no one has yet taken it upon themselves to write my unauthorized biography, it falls to me to make the following piece of information available to the public: I like to bake.

Breads and pies mostly -- I've got a couple of recipes posted here, including a pie crust that I'm pretty happy about, and a few things I've borrowed from other people. I've made a few rhubarbs lately that really turned out quite well.

One thing I recently attempted was a Marbled Rye. This isn't a terribly difficult bread to make -- there are recipes everywhere. I was mostly pleased with it -- I didn't have any caraway seeds, which add a lot of flavor, but the bread looked nice and better than a lot that I've made lately.
Chip's First Marbled Rye
One thing I experimented with, though, was yeast.

Yeast is one of those things I don't really understand. This is because the most I remember about the biological classification taxonomy was that everything was an "Animal", "Vegetable", or "Mineral" -- I have no idea which one a yeast would be. This was a problem for biologists as well, so in 1990 they changed the top three domains to be "Archaea," "Bacteria," and "Eukaryota," which has helped me in no way whatsoever because not only do I still not know which one yeast would be, but I no longer know which one I'm supposed to be, and I much preferred back when I was an Animal and the world made sense.

Anyway, yeast are largely responsible for the existence of Bourbon, which automatically qualifies them as A Good Thing™ no matter what biologists call them. Baker's yeast, which makes us happy, is "Saccharomyces cerevisiae" (note the interesting comment in Wikipedia about Crohn's and Colitis on that page -- I never knew that), and lives everywhere, so it's pretty easy to get hold of. You can leave potato-starch filled water out for a while and yeast will just show up. All it does, really, is convert sugar into bubbles and alcohol. In breads, the bubbles (carbon dioxide) make the breads rise... in alcohols, the alcohol, well, makes the alcohol alcoholic. Yeast is glorious.


Computing on the cheap with Amazon EC2

We here at the Happy Technologist tend to host our own servers, because we like it, and because we can. (We also speak in the third person when there's just one of us, but there's no accounting for some people.) Nothing fancy, mind you... for now, a handful of websites are running on an Ubuntu virtual machine through VirtualBox on a Windows 7 (or maybe Vista, I forget) box that otherwise serves as a Media Center. It's actually simpler than it sounds.

Lately, though, what with the Heritage Health Prize and a lot of hours spent learning and playing with data mining techniques, the poor little server has been called upon to do much more intensive work. It's routinely running simulations and calculations all night long and it's really not built for that. The fan has started humming heroically (i.e. loudly), which isn't always best for a media center.

No one wants their media center to hate them, or to catch fire.

Enter Amazon EC2. That stands for Elastic Compute Cloud. See how clever that is -- what they did with that 2 there? Rather than go "ECC", they just counted the C twice and made it like a math or a chemistry equation. These Amazon guys are some serious funny. I'm actually very impressed with the setup they have. There's a wealth of options for configuring the virtual servers -- public AMIs (preconfigured images) are available for most major software vendor platforms, from the expected Oracle, Microsoft, and Linux offerings to MicroStrategy, R, Elastic Bamboo, Citrix, and even Bitcoin-configured software. Public data sets are available should you need them, along with advanced storage, database, failover, clustering, networking, identity management, queuing, notification, and probably a million other things, all at pennies-per-hour prices.

At the moment, I'm running a simulation on a 20-CPU 1.6 Terabyte beast of a machine for $0.228 per hour. This is the sort of thing that infuses me with glee. It's easily outperforming my media center by 30:1.
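
For the curious, the arithmetic behind the glee works out roughly as follows. This is only a back-of-the-envelope sketch: the hourly rate and the 30:1 ratio are the ones quoted above, while the eight-hour overnight run and the function names are assumptions made up for illustration, and any storage or data-transfer charges are ignored.

```python
# Rough cost of renting the big instance versus waiting on the media center.
# HOURLY_RATE_USD comes from the post; the 8-hour run length is an assumption.
HOURLY_RATE_USD = 0.228


def run_cost(hours, hourly_rate=HOURLY_RATE_USD):
    """Dollars spent keeping one instance up for `hours` hours (compute only)."""
    return hours * hourly_rate


def equivalent_local_hours(cloud_hours, speedup=30):
    """Hours the media center would need for the same work, if the 30:1 ratio holds."""
    return cloud_hours * speedup


if __name__ == "__main__":
    overnight = 8  # assumed length of one overnight simulation run
    print(f"{overnight} h on EC2 costs about ${run_cost(overnight):.2f}")
    print(f"Same work at home: about {equivalent_local_hours(overnight)} h")
```

Under those assumptions, one overnight run costs a couple of dollars and replaces roughly ten days of the media center's fan humming heroically.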


Unhappy Bits about Bitcoins

I wandered across bitcoin not too long ago, during some random web crawling, and downloaded it in May. I installed it, ran it, realized I was behind a firewall, killed it, uninstalled it and forgot about it for a couple of weeks until this Wired article came out and sent the whole world a'twitter about bitcoin again.

The Wired article, in short, talks about an underground website that sells illicit drugs and whose sole allowable currency is the Bitcoin. The website itself is shrouded in anonymity in the Tor network, which is itself an excellent little piece of technology that I'm planning on running out of space to describe here just now, but you should look into it.

The Bitcoin spiked in popularity. You can buy and sell Bitcoins in open marketplaces such as Mt Gox (whatever that means) or Lillion Transfer if you're using some more international currencies, or you can use them directly on sites that take them, such as this Alpaca sock store. Prices quickly went from a few dollars to around $30, although they've now backed off a bit to around $20/BTC (Bitcoin).

Ok, so where are we? We can buy cocaine and alpaca socks with Bitcoins. Great. But what ARE they, again? How can you get some, and should you care?


Mosaics and More Algorithm Love

Photomosaic of Dad and Mom

My mom (whose website I should update) recently celebrated her birthday. My mom is an avid shutterbug, and abuses the digital camera we got for her a handful of Christmases ago to the tune of 10,000 photos a year, give or take. Our basement is piled with boxes of images just begging to be scanned, cataloged, sorted, and all that from back when mom was a film user (you all remember film, right?). I daydream of software that will actually be useful to that endeavor -- to say nothing of how fun it would be to digitize the piles of super-8mm movie film that go along with it -- but so far we haven't made much of a dent.

In the meantime, though, I have a hard drive which contains about 150,000 photos from my mom's library... basically a full off-site backup in case horror happens to her computer. (ObNote: This is good practice, boys-and-girls, you should all go about giving hard drives away, with complete backups of your stuff in case of a disaster... you'll thank me when the revolution comes). Anyway, I was looking around for a way to leverage this wealth of digital media for a birthday present and decided to go with a photomosaic.


Hey, Happy Technologist got Published!

I'm rather excited about the fact that, after only six actual posts on the new blog, one got picked up for publication. The slightly altered version is available at Technology First, and you can get the full newspaper in .pdf too.

Let's hope this will be motivation to write more!


Musings from a few weeks of data mining

Ok, I'm still no expert data miner by a large margin, but I've learned a LOT in just a few weeks of playing with the Heritage Health Prize data. The folks on the Kaggle/HHP Chat Board are pretty helpful, and the internet is full of useful information. I've taken to using Excel and MySQL far more than any mining-specific tools. I have been interested in R and RapidMiner, and I've been able to set up a few basic models with those tools. One thing I've been very happy with is the wealth of online tutorials available for just about everything. My resident 16 year old has been using them for a while to pick up piano and guitar songs, but I haven't had much use for them until now; I'm pleased to report that the quality of these free online video or web tutorials is pretty high. I have a list started as a tag set if you want to see what I've been watching.

I've made 9 submissions (the first two or three of which I don't count -- let's call those 'test' submissions). The 9th actually had a worse score than the 8th. Now that interests me. On my tests, which include several different sampling and "cross validation" methods on the two years of available data, my score on each submission improved from the last... not much in this last case, but enough for me to feel reasonable in submitting the algorithm. Why, then, did my result against the real data using the same algorithm go backwards? One possibility is that I've been overfitting the data. Basically, my algorithm makes assumptions that are either unnecessary or are only applicable to the sample data, and don't hold true for the final data. At the tolerances we're dealing with, it's still possible that this is just a random selection bias issue, but it's still interesting, and a common and very important problem in statistical data mining: how can you know when you've overfit? When do you know that you're "trying too hard," as it were? 🙂
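
To make the "trying too hard" idea concrete, here is a minimal sketch of k-fold cross validation on made-up data. Nothing in it comes from the Heritage Health Prize data or from my actual submissions: the noisy straight-line data, the two toy models, and every function name are assumptions chosen purely for illustration. One model is a modest least-squares line; the other simply memorizes its training points.

```python
import random


def make_data(n, seed):
    """Noisy samples of y = 2x; a made-up stand-in for real competition data."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 2 * x + rng.gauss(0, 3)) for x in xs]


def fit_line(train):
    """Least-squares straight line: the modest model."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    sxx = sum((x - mx) ** 2 for x, _ in train)
    sxy = sum((x - mx) * (y - my) for x, y in train)
    slope = sxy / sxx
    return lambda x: my + slope * (x - mx)


def fit_memorizer(train):
    """1-nearest-neighbour: repeats the y of the closest training point."""
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]


def rmse(model, data):
    """Root mean squared error of `model` on a list of (x, y) pairs."""
    return (sum((model(x) - y) ** 2 for x, y in data) / len(data)) ** 0.5


def cross_validate(fit, data, k=5):
    """Average training RMSE and held-out RMSE over k folds."""
    folds = [data[i::k] for i in range(k)]
    train_err, held_out_err = 0.0, 0.0
    for i in range(k):
        held_out = folds[i]
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        model = fit(train)
        train_err += rmse(model, train) / k
        held_out_err += rmse(model, held_out) / k
    return train_err, held_out_err


if __name__ == "__main__":
    data = make_data(200, seed=1)
    for name, fit in [("line", fit_line), ("memorizer", fit_memorizer)]:
        tr, ho = cross_validate(fit, data)
        print(f"{name:10s} train RMSE {tr:.2f}   held-out RMSE {ho:.2f}")
```

The memorizer scores perfectly on the points it has already seen and falls apart on the ones it has not, which is the signature to watch for: a widening gap between training error and held-out error. Of course, my own cross validation kept improving while the leaderboard score got worse, so a check like this is a hint rather than a guarantee.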


Dayton Technology Landscape Conference

Technology First is a local IT Trade Group, and their second annual "Technology Landscape Conference" was yesterday, so I dutifully (duty = I'm dating their intern) attended.

Ok, so there was some more duty... one of the companies presenting was ExpeData, a Dayton, Ohio (which is "local" for us folk) company that has a digital writing capture technology. We've been working with them for a few months to find some suitable applications and to discuss some security issues and requirements. It's a fairly interesting technology, although I have some trouble finding its killer app.

Another interesting company whose presentation I attended was Persistent Surveillance Systems -- these guys have a 190+ megapixel camera array that they fly over the Cincinnati area (among others), taking pictures about once per second. When they hear about a crime, typically a murder, after the fact, they can go back and assign analysts to review the captured images to track people in the vicinity. Their software allows analysts to assign colored tracks and markers to people, vehicles, and anything else of interest -- they initially track suspects, then go back and track anyone they interacted with, anyone nearby (possible witnesses/accomplices), and whatnot. The large pixel view of the city and long video times allow them to watch people drive all the way to their destination -- a home, hideout, friend's house, or whatever -- where they can then work with police to get a warrant and follow up as appropriate. Their metadata is even good enough that they can apparently cross-reference locations to find that, for example, the getaway driver from murder A may have lived next door to the suspect from murder B, which may help detectives tie together previously unrelated crimes.


Heritage Health Prize: On Algorithms, Rights, Patents and Patients

So my goal for the weekend was to submit an entry to the Heritage Health Prize. It took me until Monday night (have to work off-hours, this isn't a work-sponsored event), but our team (the Data Monkeys, with Jeremi and Chris at this point) is now entered and somewhat amazingly NOT in last place! Yay!

But I'm ahead of myself... the Heritage Health Prize is a data competition run through Kaggle, which runs these sorts of things. It's a $3-Million prize competition for a method of predicting which patients will spend time in a hospital given their prior years' medical history.

I've been wanting to enter something like this for a while. I don't harbor any real hopes of winning (I have some fake hopes, of course); this sort of money attracts teams with far more depth of experience in data mining algorithms than I have -- our team leans more towards data management than analytics. Still, this is an opportunity to head in that direction, so I'm going to take it.


Data Ethics, Privacy, and Responsibility

We've seen a lot of high profile data privacy and data leak issues in the news lately.

And I'm sure the list could go on and on and on...