Hadoop Day (Seattle)

Attended Hadoop Day today at the Amazon PacMed building. Free “conference” setup – and it was very worthwhile.

The morning was panels and general discussions – the brightest spot being learning about the Whirr project. I’m afraid the link has 100% suck to it because it says nothing about what the project is. I haven’t read through all the code, but reports from others at the conference say that it is “nice tools to work with Hadoop”. Hopefully so. It’s a Maven/Java project with zippo documentation – not even a README that has an overview of any use.

The afternoon I sat in the “intermediate” track and picked up some really interesting pieces. I got the in-depth scoop on what’s happening with Hadoop and adding security from Jakob Homan, got a great introduction to Mahout from Jake Mannix (about to be a search geek employed at Twitter), learned about Prezi, which I’m thinking I’ll inflict on my coworkers some time, and was amused and interested in cascalog.

Cascalog = Clojure + Datalog + Cascading

It turned out to be surprisingly (to me – I’m being unfair to Clojure really) expressive and readable for making interesting and complex queries from Hadoop data structures – a very nice abstraction setup. I snorted at the thought of handing it to someone who had trouble with SQL though – it’s for programmers, not business analysts. I do wish I’d been able to get more skinny and in-depth viewing of Cascading – it looks really effective at making queries and processing Hadoop-based data. I would have also liked to get some real meat and details on Oozie, which is Yahoo’s workflow engine for submitting MapReduce jobs into their Hadoop clusters.

I took off a little early from the conference, but it was very definitely worthwhile. I wish the amazon environment had better wifi connectivity (rather sucked for guests), but in the end I didn’t really need it for what I gathered.

Posted in Geekstuff | Leave a comment

Hacking on OpenStack’s Nova

Like quite a number of other folks, I’ve been lurking on the OpenStack mailing lists since I saw the announcements. Friday, Eric Day put out a call for help getting the code into shape against PEP8 and pylint.

“Ahh!”, I thought – an easy intro to getting into the project and it’ll give me an excuse to really read the code. So this weekend I started taking a stab at doing a little light buff and puff on the code to get the PEP8 and pylint code scores up a bit.

What I found is that it took some work to get everything from the codebase ready to really do some work on it. And the notes aren’t all in the same places on how to do that – what notes are there are all written mostly for Ubuntu. I was pretty sure most of this could be done on a Mac too – at least based on the dependency documentation, so I cobbled up some notes on getting rolling with Launchpad, this code base, and being able to run the tests to verify that my cleaning didn’t really break anything.

I put the resulting notes on the OpenStack wiki page http://wiki.openstack.org/HackingNovaMacOSX.

The big thing that I’m not sure about is the testing. Just a stock install is failing on one unit test, and it’s in an area I’m not very familiar with (auth & creating certificates). If any of the OpenStack folk are reading this, here’s the error I’m seeing:

[ERROR]: nova.tests.auth_unittest.AuthTestCase.test_209_can_generate_x509
 
Traceback (most recent call last):
  File "/Users/heckj/Documents/code/nova/nova/test.py", line 222, in run
    d = self._maybeInlineCallbacks(testMethod)
  File "/Users/heckj/Documents/code/nova/nova/test.py", line 182, in _maybeInlineCallbacks
    g = f()
  File "/Users/heckj/Documents/code/nova/nova/tests/auth_unittest.py", line 162, in test_209_can_generate_x509
    signed_cert = X509.load_cert_string(cert_str)
  File "/Users/heckj/Documents/code/nova/.nova-venv/lib/python2.6/site-packages/M2Crypto/X509.py", line 655, in load_cert_string
    return load_cert_bio(bio, format)
  File "/Users/heckj/Documents/code/nova/.nova-venv/lib/python2.6/site-packages/M2Crypto/X509.py", line 639, in load_cert_bio
    raise X509Error(Err.get_error())
M2Crypto.X509.X509Error: 140735090166816:error:0906D06C:PEM routines:PEM_read_bio:no start line:/SourceCache/OpenSSL098/OpenSSL098-32/src/crypto/pem/pem_lib.c:650:Expecting: CERTIFICATE

Update: Turns out the error was directly related to the version of OpenSSL installed on my laptop. I had version 1.0.0a from MacPorts installed and in my default path, which caused the error. Version 0.9.8l (the base install in Mac OS X) works fine.

sudo port deactivate openssl

did the trick and the tests are all running now. I updated the bug against Nova with those details, leaving it open – it ought to at least fail reasonably.
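If you want to catch this before running the test suite, a quick check of which openssl is first on your PATH helps. Here’s a minimal sketch, written for a current Python 3 rather than the 2.6 in the traceback. The function names are mine, and the “0.9.8 series is the good one” rule is only what I saw on my laptop, not anything Nova documents:

```python
import re
import shutil
import subprocess


def parse_openssl_version(banner):
    """Pull the version token out of `openssl version` output.

    A banner like "OpenSSL 0.9.8l 5 Nov 2009" yields "0.9.8l".
    Returns None if the banner doesn't look like OpenSSL output.
    """
    match = re.match(r"OpenSSL\s+(\d+\.\d+\.\d+[a-z]*)", banner.strip())
    return match.group(1) if match else None


def is_098_series(version):
    """True for 0.9.8, 0.9.8l, etc. (assumption: the series that worked for me)."""
    return version is not None and re.match(r"0\.9\.8[a-z]*$", version) is not None


def openssl_on_path():
    """Return (path, version) for the first openssl on PATH, or (None, None)."""
    path = shutil.which("openssl")
    if path is None:
        return None, None
    banner = subprocess.run(
        [path, "version"], capture_output=True, text=True
    ).stdout
    return path, parse_openssl_version(banner)


if __name__ == "__main__":
    path, version = openssl_on_path()
    print(path, version, "ok" if is_098_series(version) else "check this")
```

If the path printed is under the MacPorts tree rather than the system location, that’s the same situation I hit.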

Posted in Geekstuff, devops, openstack | 4 Comments

breaking the private network oasis addiction with OAuth

Most reasonably (and larger) sized operations organizations have a pretty standard networking setup – or at least some close variation on the theme. ARIN wants public IP addresses behind load balancers, so most organizations front up their services through software or hardware load balancers. From there, it goes back to a highly responsive “web tier” – the spewing of content and the CDN source systems. Those back in to application tiers, and behind those are data persistence tiers (typically, your classic RDBMS). The only thing on the internet is often those load balancers. The networks are often segmented between app and data as well, often with firewalls, to “reduce potential intrusion”. It’s a good plan and pattern – it generally works. And while you own it all, you own your own network oasis of happiness.

Economic pressures being what they are, it is getting more effective to own what you absolutely need of infrastructure, and rent the rest. Shoot – if you’re small enough, owning anything at all for infrastructure may not make sense.

The problem comes when you want to start moving between those network oases of happiness. Like, oh say, to a cloud provider. The end result is our services have all become addicted to this concept of secure, high-bandwidth, reliable access between tiers – that happy oasis. And that addiction is one that isn’t healthy – at least not in a world of elastic scaling architectures. We’re getting used to breaking that addiction when we integrate with external services – facebook, twitter, etc. The mashups are breaking the mold of how this has been done in the past. And we need to break our addiction!

Why am I asserting this? As I look at a large number of applications, they don’t fit a “cloud provider” very well – at least not when you start to get into the realm of dynamic or elastic scaling. Most providers have something akin to an “internal network” which we can leverage as consumers. As we get to the logical conclusion we will find ourselves wanting to shift work between one “oasis of network happiness” to another. The solution today? Expand to fill all available resources and then buy some more at the same place.

That ain’t gonna cut it.

For us as consumers of infrastructure as a service, we want one provider to be as good and flexible as another. That means commoditization of the infrastructure (which I believe we’re starting to see now, although the infrastructure providers will fight it tooth and nail). That means we need to be able to shift from one to the other at a moment’s notice.

Here’s an example:

You have enough resources for 100 units at provider A, and you are running up to a high level usage that looks like it’ll exceed that. You call up Mr Infrastructure A, who reluctantly informs you “Sorry, all sold out – it’ll take 6 weeks to add capacity”. You also happen to have a not-quite-as-good deal with provider B that costs a little more. All sounds good so far – your uber-cool software provisioning system smacks down a couple of VM images and spins them up…. except it’s got a problem: connecting the stuff running at provider A to the new stuff you’ve just spun up.

And that’s assuming that you’ve solved all sorts of somewhat evil configuration problems: telling each component what it is and who it should be talking to to get its work done. That’s a topic for a different post though.

I don’t have THE answer, but I do have AN answer. Follow the mashup leaders: take the OAuth & REST pattern back into your office, workspace, colleagues, whatever. That’s what we’re doing after all – mashing up some services. It’s just that we normally think mashup means “with someone else’s stuff”.

OAuth and REST work together beautifully, and with some coordination can answer the security question “Should I allow this external request to access these resources?” The OAuth 2.0 spec has a segment in it – section 1.4.4 (version 10) – that walks through the flow for this to happen. Routing over SSL does a pretty reasonable job of getting the data encrypted as well. The cost: running an Authorization server.
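To make that a little more concrete, here’s a rough sketch of what one tier asking for a token and then calling another tier might look like. Big caveats: I’m assuming the machine-to-machine flow maps onto what the final OAuth 2.0 spec (RFC 6749) calls the client credentials grant, with the token carried as a bearer credential, and those parameter names postdate the draft cited above. The URLs, client names, and function names are all made up for illustration, and the sketch only builds the requests rather than sending them:

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(token_url, client_id, client_secret, scope=None):
    """Build (but don't send) a client-credentials token request.

    The client authenticates with HTTP Basic and asks the authorization
    server for a token using a form-encoded POST body.
    """
    form = {"grant_type": "client_credentials"}
    if scope:
        form["scope"] = scope
    body = urllib.parse.urlencode(form).encode("ascii")
    creds = f"{client_id}:{client_secret}".encode("utf-8")
    headers = {
        "Authorization": "Basic " + base64.b64encode(creds).decode("ascii"),
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return urllib.request.Request(
        token_url, data=body, headers=headers, method="POST"
    )


def build_service_request(url, access_token):
    """Build a request to another tier, carrying the token as a bearer credential."""
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {access_token}"}
    )


# Hypothetical usage: the web tier at provider B calling the app tier at provider A.
token_req = build_token_request(
    "https://auth.example.internal/token", "web-tier", "s3cret"
)
app_req = build_service_request(
    "https://app.example.internal/orders", "token-from-auth-server"
)
```

The point is that the receiving tier decides based on the token, not on which network the request came from, so it no longer matters whether the caller sits in the same oasis.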

(You can pull off this same trick with OAuth 1.0: the not-quite-standard-but-defacto “2 legged OAuth” routing. The problem: not everyone and all the libraries agree on how to make that work seamlessly.)

In order to allow ourselves to start treating everything – infrastructure as well – as software with its speed of change we need to be able to dynamically allocate resources. Once we have services that are all chatting across OAuth authorized links (encrypted or not), we’ve removed a huge impediment to being able to elastically scale our services.

Yeah, configuration. Kind of a bitch, ain’t it. Another post, eh?

Posted in Geekstuff, Ranting and Reflections, devops | Leave a comment

the CMDB is dead, long live the CMDB

I work in an environment that has an existing CMDB. Over the past year, I’ve spent a fair number of man hours from my team and an equal number of hours of my own thinking about what it is, what we want from it, and how so much of what’s available today just doesn’t cut it.

The thing that we’ll label as a “CMDB”, to me, isn’t. It isn’t a configuration management database. For us, it’s an inventory of assets – digital and physical. It’s a metadata store that allows us to assign ownership, and an ancillary data set that makes categorizing incidents, requests, and changes in the classic service management sense a bit easier across a very wide organization. If any small company came to me and said “yeah, gotta have a CMDB!” I’d be looking very closely at how potentially insane they were. Most small orgs and companies just don’t need it. It’s honestly only useful when you breach some amount of scale.

The worst part is the ITIL definition of what should be in a CMDB has been effectively unachievable because of the costs associated with it. The classic ITIL world of CMDB has this data repository being updated with process (typically manual) as changes are approved – it’s meant to represent the “desired state” of an operational world. Only it doesn’t. Really, it never has. And even with the highest priced tools on the market today, it never will. At best it’s an audit-against tool where you can see “yeah, it matched or didn’t match when we ran that scan a few days ago”.

It doesn’t have to be this way though. What most of us want from a CMDB is what we get implicitly, to some degree, from many of our monitoring solutions – a digital map of our environment. The monitoring pieces created their own version – typically configured by hand, or sometimes configured automatically with a hand to help guide (Zenoss and Hyperic both do a pretty good job at this). The monitoring systems then use that data model to know who to alert when something goes wrong, or if they’re really good – to share some set of analysis around “service X is down because the component A that it relies on is down”.
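That “service X is down because component A is down” analysis is really just a walk over the dependency map. A minimal sketch, assuming the map is a plain dictionary of who-relies-on-whom plus an up/down status per component (the names and the data shape are mine, not lifted from any particular monitoring product):

```python
def root_causes(service, depends_on, is_up):
    """Return the failed components behind `service` that have no failed
    dependencies of their own, i.e. the likely root causes.

    depends_on: dict mapping a component name to the names it relies on
    is_up:      dict mapping a component name to True/False
                (components with no status are assumed to be up)
    """
    seen, stack, causes = set(), [service], []
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        deps = depends_on.get(name, [])
        down_deps = [d for d in deps if not is_up.get(d, True)]
        # A component is a root cause if it is down and nothing it
        # relies on is down to explain the failure.
        if not is_up.get(name, True) and not down_deps:
            causes.append(name)
        stack.extend(deps)
    return sorted(causes)


# Hypothetical environment: a storefront backed by a rails app,
# which in turn relies on a database and a memcache server.
depends_on = {
    "storefront": ["rails-app"],
    "rails-app": ["mysql-01", "memcache-01"],
}
is_up = {
    "storefront": False,
    "rails-app": False,
    "mysql-01": False,
    "memcache-01": True,
}
print(root_causes("storefront", depends_on, is_up))
```

With that data the storefront and the rails app are both down, but only the database gets reported, which is the alert you actually want.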

Virtualization is pushing this all right over a critical tipping point. The “old” CMDB is dead, let’s jump to the new. We need a model of our environment that maps our physical and digital assets. We need it to show us dependencies in an ever increasing world, and we need it to help inform us – especially in a larger organization – who to contact if there’s an issue with a service. If we have to fill out all this data and information by hand, we’re lost. The rate of change is increasing, and we *want it* to increase. Look to the model of continuous deployment, the natural successor to the software development process that is continuous integration. Now in classic #devops style, let’s apply that right on through and into operations and running our services.

What’s missing today, in our collective musings about a DevOps toolchain, are the tools to feed the knowledge that the deployment tools have into a digital asset model. Even the deployment tools don’t know all the dependencies (e.g. what database is this rails app using, which memcache server/port combo is being used, etc) – but that information is there, just slightly under the covers.

The other place where we want/need this knowledge stashed? Our monitoring systems. The continuous tests against our live services to assert they’re OK. Many open source systems include some level of a model just implicitly in their configurations: Nagios, Munin, Zenoss, Hyperic, etc. I am still struggling to find monitoring systems that have the concept of dynamic configuration through API access built in to the base of them. Still – much of their configuration is something that we might naturally want to store in a map of our services.

The way to get this data? Drive it from the tools that are implementing the changes. Have it as a service that can be updated and modified through simple APIs that Buildout, Func, Capistrano, Fabric, ControlTier, or whatever can access and inform. Use the manifests and details that the system configuration tiers (BCFG, cfengine, Chef, Puppet, SmartFrog) have been built with to populate this map as they deploy and invoke their services.
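Here’s roughly the shape I have in mind, as a toy in-memory version. To be clear, this is a sketch of the idea and not an existing tool: the class, method, and field names are all invented, and a real one would sit behind an HTTP API and a real datastore rather than a dictionary:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ServiceRecord:
    """One entry in the map, as last reported by a deployment tool."""
    name: str
    owner: str
    host: str
    depends_on: list = field(default_factory=list)
    reported_by: str = ""
    updated_at: str = ""


class ServiceMap:
    """In-memory stand-in for the service that deploy tools would call."""

    def __init__(self):
        self._records = {}

    def report_deploy(self, tool, name, owner, host, depends_on=()):
        """Called by the deploy tool as it makes a change; the newest report wins."""
        record = ServiceRecord(
            name=name,
            owner=owner,
            host=host,
            depends_on=list(depends_on),
            reported_by=tool,
            updated_at=datetime.now(timezone.utc).isoformat(),
        )
        self._records[name] = record
        return record

    def owner_of(self, name):
        """Who to contact about a service, or None if it was never reported."""
        record = self._records.get(name)
        return record.owner if record else None

    def dependents_of(self, name):
        """Which reported services declared a dependency on `name`."""
        return sorted(
            r.name for r in self._records.values() if name in r.depends_on
        )


# Hypothetical reports from two different tools.
service_map = ServiceMap()
service_map.report_deploy("fabric", "storefront", "web-team", "vm-17", ["mysql-01"])
service_map.report_deploy("puppet", "mysql-01", "dba-team", "db-01")
```

Nobody fills in a form here. The map is only ever as stale as the last deploy, which is the whole point of driving it from the tools.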

This is all a step to moving all of our infrastructure, historically so very physical, into the digital world. There are tremendous efficiencies to be gained – both financially (using our physical assets more effectively, or just using what you need from an existing infrastructure provider) and from a service perspective (being able to reconfigure and deploy your services to match the market needs).

Much of devops is focused on deployment, because that’s where we spend most of our time today. That’s good – but we cannot forget that it is just one small part of the overall process for these services from inception to retirement.

And before any of the classic CMDB folks find me and start shooting, yes – I’m very aware of the work that is going on at CMDBf around federating CMDBs. The idea there is good – at a 10,000 foot view they’re heading in the right direction with their standard. The foundation that it is built on is – in my opinion – now outdated and needs to be revisited. The implementation needs to be simplified. I would recommend you look at the labels of the members that are coming together around the federated CMDB concept. Do you see anything in there that shouts of “open, adaptable, flexible”? I don’t. I see the same kind of collaboration that led to J2EE standards and the WS-* standards around “web services”. What is needed is something simpler, more open, and with publicly available implementations. I would never expect BMC, CA, HP, IBM, or Microsoft to help provide that – it’s just not in their best interests when they have revenue tied to the services and software they provide in this space today.

Posted in Geekstuff, Ranting and Reflections, devops | 7 Comments

Django 1.1 Testing and Debugging

I borrowed a copy of Django 1.1 Testing and Debugging from a friend today. Spent the bus ride home from the office flipping through the pages, and I’ve got to say – it’s a pretty darned good book!

If you’re new or newish to Django development, it’s a book that I think would be good to have in your stable. Even with Django 1.2 now released, everything in the book is super relevant, and Karen Tracey has done a really wonderful job of explaining and then showing with details the soup to nuts run of how to do and deal with testing a django application … and more importantly how to debug when things go awry.

The book covers unit tests, doctests, testing through the WSGI interface with the Django unit test extensions, and even driving a website test set with twill. It’s thorough and chock full of examples and walk through goodness.

The debugging is even more detailed – going to a lot of trouble to explain tracebacks, the format of the Django error pages, how to get convenient debug data on your django project (using Django Debug Toolbar), and finally – how to even use the python debugger to step through execution. She goes to a lot of trouble to set up real-world scenarios that are slightly and subtly broken and then walk through the whole process of solving the issue with the tools at hand.

I’ll be keeping John’s copy for another day or two, and then getting my own copy to stack into the reference shelf.

Posted in Geekstuff, django | 1 Comment

Ubuntu 10.04 (Desktop)

I’ve been using Ubuntu as my distribution of choice for VMs and server instances, and on a lark I took a swag at installing Ubuntu desktop onto a VM yesterday. I’ve got to say, it’s a pretty usable setup.

I still completely prefer the Mac, but the installation was relatively painless, the browser setup pretty good – and installing all the tidbits that I wanted to fiddle with for development was very easy.

I don’t even really know what I’ll use it for – other than something to experiment with and try out. I don’t really need another VM with a desktop interface, but I thought it would be interesting to see where it’d gone.

I recall Mark Shuttleworth (for whom I’ve a great deal of respect) making a comment earlier this year about “this is when Linux takes the desktop”. He might be right now. I recall at the time thinking, “Dude – Apple’s doing this iPad thing and you’d better change your sights!”. Well, based on what I see in his blog with the “Unity” interface, they’re taking a pretty interesting stab at it. One of the things that I think Apple really has going for it in the tablet space is the multi-touch programming interfaces. They cooked them for several years with the iPhone, and now they’re solid and beautiful to work with from a programmer’s and designer’s point of view. I don’t know what Ubuntu or any of the Linux distros are doing with the interface space there, but I hope they’re paying close attention to the programming paradigms – that’s what is making the platform so damn powerful.

Now that I spent an afternoon installing it, I think I’ll probably nuke it and get back that 20GB of space, but it was fun to play with…

Posted in Geekstuff, django | 3 Comments

WordPress 3.0 upgrade

New theme, and updated back end – moved everything this afternoon to WordPress version 3.0. Upgrade worked like a champ – and I took the time to twiddle the theme into something new and switched around the various widgets.

Since I started using WordPress several years ago, it’s really come along solidly. I upgrade it these days directly from subversion. My hosting solution doesn’t have FTP access, which makes upgrading the plugins a bit trickier, but all in all it’s been really slick and clean for the using.

Posted in Geekstuff, Ranting and Reflections | Leave a comment

WWDC 2010 Wrap-up

I’m leaving San Francisco, and Apple’s WWDC, this year more introspective than inspired. The conference puts us under NDA, so while there was some new stuff shown and talked about there, I can’t pass it all along. Not yet anyway, there should be some interesting new stuff to talk about once iOS 4.0 is out and in the wild.

It’s no secret that most of the conference focused on the iPhone and iPad – what they’re now calling “iOS” (and yeah, I thought “Hey, doesn’t Cisco own that brand name…” when I first heard it). The desktop/laptop Mac OS X didn’t get much play time, the iOS operating system instead getting the red carpet treatment.

The engineers at Apple have managed to build, update, and ship an entire OS for specific platforms once a year for the past 3-4 years. I’ve got to imagine that they’re nearly, if not completely, mentally bust at this point. I’m not at all surprised that “iOS 4” won’t be out until Fall for the iPad – I suspect they’ll call it iOS 4.1 or 4.2. Apple’s not growing like mad (at least that I can tell), which means to me that they have been pushing, and continue to push, at a fevered pitch for the accomplishments they’re making. I hope they don’t push so hard they fry their best and brightest that make up the core of the culture that’s doing all this impressive lifting.

The “other focus” of the conference was around new developer tools – Xcode 4 (developer preview) is quite a bit of engineering and accomplishment. To me it’s centrally a fulfillment of an idea that Chris Lattner presented and posed for possible futures in an LLVM session two years ago. It is more than that, and I’d love to wax a bit more enthusiastic about it, but I’m afraid that runs into NDA covered territory, so I’ll stop here on that topic. I did come away from the conference having submitted more enhancement requests than ever before around that tool chain.

Dr. Michael Johnson (@drwave) gave a talk in the middle of the conference, impressive in display and content, and really talking to the “why” and “how” of making tools. The message I took away this year from his talk was akin to an artist extolling “know your medium”. He didn’t outline the pros/cons of each medium that he works in per se, but just expressed what he could do in each and why knowing that medium was important. I’m still reflecting this year’s talk off the meme that he was touting a year or more ago (and was reflected a bit in this year’s talk too): “using tools to remove the tension in the room”. I guess it was two years ago at WWDC, we talked a bit. He related a viewpoint that “tools shift power” and was being very mindful of wanting to shift it in a way that was ultimately positive, not negative. Being a bit of a tools guy myself, I find myself reflecting on that quite a bit. The real bitch is that it’s often hard to predict which way the “right way” is between organizational politics and getting our jobs done.

For all the cool and amazing stuff that I saw at WWDC, I’m finding that I’m not walking away from this one inspired, scheming, and planning for something new, cool, etc. Oh, there’s the idea for the iPhone or iPad client application to this or that, but really I find myself instead thinking about my day job.

This week has given me some much needed time to step back and away from the office. I deleted my mail accounts from my iPad and iPhone to make sure I wasn’t even tempted to check them (those red badges can be like a red flag to a bull when you’re a somewhat compulsive person). What are we really focusing on and what difference is it going to make? How do we push or pull the whole organization forward to make it better? I have ideas. Some “maybe 60-70% right” ideas but no definitive “I’m 100% sure of it” answers right now.

Posted in Ranting and Reflections | Leave a comment

Rites of Passage

I saw on the news last night a story about a girl (Abby Sunderland) being rescued while on a solo around-the-world sailing trip. She was in the Indian Ocean, and the rough seas had taken their toll on her boat, dismasting her and leaving her drifting. She’d been on a sat-call apparently when it happened, and the sat call (which used the mast as an antenna) was abruptly cut off. Her emergency beacons went off and Search and Rescue out of Australia came roaring out, found her and verified that she was OK and stable (the boat wasn’t taking water) – they should be picking her up today, if they haven’t already.

Abby is 16, and was going for a record: youngest solo around the world. Pretty damned impressive.

At the article in the Seattle PI online, the first comment was made by someone calling themselves “Bobert”. He wrote:

Her dad’s a moron, no need for a 16 year old to sail solo around the world.

I don’t know who this person is, but whoever they are – they symbolize something that is very, very wrong with our society today. Preserve and protect at all costs. What the fuck have we come to? Our rites of passage are down to “driving”, “graduating from high school”, “getting laid for the first time”, and sometimes “graduating college”. What happened to Thomas Jefferson’s American ideals – where the hell did we lose them? Maybe what passes for mainline culture thinks a rite of passage is a barbaric, untidy thing. It’s “dangerous”, and not for those who haven’t “proven themselves responsible”. They might get hurt.

How the hell do you prove you can be responsible in this day and age? By looking down, fitting in, not making waves, and getting by? We need to be looking up and out, trying things, and reaching out for our limits. We need to think, plan, and move ourselves forward, not just follow along.

I don’t know about being in the Indian Ocean during the winter months down there, but to my mind Abby was taking a rite of passage. Also trying to get the youngest-solo-around-the-world record is neat, but to me somewhat irrelevant. She was going out and doing something that I’m sure has touched her in some significant ways, none of them externally visible.

One of the best things I did around college was go on a four-month walk-about in Europe. By myself. Right after college, I got a backpack and my parents got me the tickets and a Eurail pass. I was off. I think that was my most significant rite of passage. Being by myself for months, in countries where I didn’t even speak the language, teaches you something about yourself.

So to the Boberts out there – wake up and fucking smell the coffee. We are not going to make a difference in the world by being sheep. Going on a rite of passage, by whatever you want to call it, should be praised for the effort, not belittled.

Posted in Ranting and Reflections | 6 Comments

WWDC 2010

Ticketing system replaced and launched, I’m ready for a decompress. There’s pretty much no better place to do that, in my mind, than at WWDC in San Francisco this coming week.

I used to send myself every other year. Strangely, I’ve been attending for over 10 years now. Seems odd that it’s been so long. It’s been wild watching the changes in the conference over the years. This year is no different. There’s hardly any “just” MacOS X sessions now – iPhone and iPad (not surprisingly) crowd out the whole schedule. The time to selling out the conference is shorter this year than before, and I rather expect the crowds to be as insane as ever. Fortunately, many of my friends from Seattle Xcoders will be there – even if they love taking digs at each other over sax interface xml parsing code and “unnatural love”.

Best of all, the really big project weighing on my mind for the past month is out – I can go play, hack, drink, laugh with my friends, and not worry about deadlines for a while.

Posted in Geekstuff, Ranting and Reflections, iPad, iPhone, mac | 1 Comment