Nearly at the top of that first hill

I’ve been thinking about the past week at the OpenStack Design Summit (Bexar) solidly from last night (flying home from San Antonio, TX) through the various errands I’ve been running today. This morning Rick Clark tweeted “A question about OpenStack”. As I think about it, this shouldn’t be about what is going right and wrong, but where the project is and what will provide the most benefit by improving it.

I’m saying all this after a week with the OpenStack guys – both in design sessions and just chillin’ out. Focused, intelligent, demanding conversations scattered through the week with an amazing “no-ego” attitude presenting itself. Not that there weren’t some good ole technical “best way to do it” or “which is better” fights, but given the breadth of this project and the open nature with vendors lurking all around the corners – well, frankly I expected a lot more “special interest” to be clearly showing itself. Everyone at that conference was interested in making OpenStack better at every turn.

250 people, 12 countries, 90 companies/organizations – all that after 3 months from being publicly announced. And they’re going it without any prior structure – building up an OpenStack foundation, doing all the legal and community building, right from scratch. And yeah – that’s showing right now.

The first thing I see that will provide the biggest gains:

  • “How do we all work together?”

Some of the best sessions were around “What does the status X of bugs mean” and talking through the development and release process. At this point I’m convinced the core folks are reasonably comfortable with Launchpad (the platform the system is hosted on) – and being at the conference really taught me a great deal about how OpenStack is effectively using it. Before the summit, it wasn’t comfortable or familiar to me at all. The object store and compute (swift and nova, respectively) core groups are really quite separate teams, all trying to figure out how to get some common ground in re-using code, libraries, and even setting up documentation.

The second:

  • “Show me it’s workin’, again and again”

OpenStack is quickly heading to be the kernel or core of a platform. You could see it in the twinkle of Eucalyptus’ eye when they talked about Swift (the object store), or chatting with the folks from Scalr or RightScale. The whole system is being built with API in mind from the ground up, and while there is some pretty good unit testing in play and continuous integration, it was clear that installing this sucker was a PITA – and the documentation to really pull that all together started coming together in the documentation sprint and install fest at the summit. One of the “blueprints” of the design summit (i.e. “Things we want to do, and how we want to do it for the next release”) is to get some fully automated integration testing as well as track the metrics on how the system is operating. There were a lot of folks that have some cross over into the Drizzle project, and the idea of running and tracking benchmark data on every revision is darned powerful.

Add to that the benefits of a constant flow of functional testing against a couple of pre-defined clusters of both compute and object store, and you have a powerful engine to make sure trouble is spotted early and can be resolved quickly.
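To make the "benchmark every revision" idea concrete, here is a minimal sketch in Python. It is not how OpenStack or Drizzle do it; the `history` store, the `record` and `regressions` helpers, and the 1.5x tolerance are all made up for illustration.

```python
import time
from collections import defaultdict

# Hypothetical in-memory store: benchmark name -> list of (revision, seconds).
history = defaultdict(list)


def _time_once(func):
    """Return how long a single call to func() takes, in seconds."""
    start = time.perf_counter()
    func()
    return time.perf_counter() - start


def record(name, revision, func, repeat=3):
    """Time func() a few times and keep the best run for this revision."""
    best = min(_time_once(func) for _ in range(repeat))
    history[name].append((revision, best))
    return best


def regressions(name, tolerance=1.5):
    """Return revisions that ran more than `tolerance` times slower than the one before."""
    runs = history[name]
    return [rev for (_, prev), (rev, cur) in zip(runs, runs[1:])
            if cur > prev * tolerance]
```

A CI job could call `record("boot-instance", revision, some_callable)` after every commit and complain when `regressions("boot-instance")` comes back non-empty.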

The third:

  • “How’s this thing tick?”

One of the admitted weak points is that some small, damned effective core teams have done most of the work – and if you want to understand the system, well… you’ve just got to read the code. That is a huge investment – and frankly a barrier to entry into the project that can be avoided with some effort towards docs and discussion. Again, great progress was made there (I learned what the “project” concept was in Nova at the summit) – but the interactions between components, what the components are responsible for, and what they’re *not* responsible for, are all kind of tricky to learn right now.

This extends down into digging into the code, where docstrings could be better (and are getting better!) so that if you wanted to go help with something specific, you didn’t have to grok a broad codebase to get a handle on what the impacts are of the changes you’ll need to make.

And the last thing I’ll throw in here:

  • “What OpenStack isn’t, or won’t do…”

The project is still in a lot of flux. There were some great components that were shown off at the summit that ride over the top of the infrastructure, or work with it through APIs. Should those be a part of OpenStack, or on the side? Some service providers were very interested in more platform-kind of elements – a common logging infrastructure, a common authentication, ID, and authorization infrastructure. Should that be a part, or on the side? How tightly or loosely do we want to couple some of these elements? The philosophy is there and forming up, but the real truth of it all will be over the next 6-12 months of the project when decisions are made, reviewed, and a core forms out of it. There have been a few architectural decisions made early: “Don’t mandate anything in the client”, “If a feature would restrict scale, it MUST be optional”, etc. that I absolutely applaud. I think it will form up more as projects apply to join the OpenStack umbrella and either make it or don’t. It will become clear what’s common, and what isn’t, pretty darn quickly.

I’m pumped about this project, the people, and its future. The core openstackers have clearly been driving up a steep hill to get to where they can level out a bit and move into more of a marathon mode. Really, it feels like we’re nearly at the top of that first hill.

Posted in Geekstuff, openstack, Ranting and Reflections | 1 Comment

more always on applications … in the cloud

I took the “always on application” and ran through some numbers this evening. I was curious – with the hosting options available today, what would it cost someone to run an “always on” application.

The way I’m thinking about it, GAE probably isn’t the right thing. They don’t easily let you run repeated cron jobs, making an agent style application (i.e. proactive, rather than reactive) not quite a good fit. So what would be? A VM. Sure, you could slice it down farther and get a slicehost style thing – they’re excellent, but many of them have restrictions about “long running processes” – and that is exactly what I want.

So what’s it take to run a VM? There’s an interesting question. The cheapest I found was Rackspace – that offered a 256MB Linux hosting solution for $10/mo. Linode starts at 512MB for $19.95. Of course you need to at least make a vague attempt to compare to an EC2 instance. The always on really bites you there – the cheapest I could get my numbers was around $6/mo (a single micro instance, 1 yr commitment – minimal to no bandwidth).
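For the curious, the back-of-the-envelope math looks like this in Python. The monthly prices are the ones quoted above (the EC2 number is my rough estimate, not a list price); the `yearly_costs` helper is just something made up for this sketch.

```python
# Monthly prices quoted above, in US dollars. The EC2 figure is the rough
# estimate for one micro instance on a 1 year commitment, minimal bandwidth.
monthly = {
    "Rackspace 256MB": 10.00,
    "Linode 512MB": 19.95,
    "EC2 micro (1 yr, approx.)": 6.00,
}


def yearly_costs(prices):
    """Return (name, cost per year) pairs, cheapest first."""
    pairs = ((name, round(price * 12, 2)) for name, price in prices.items())
    return sorted(pairs, key=lambda item: item[1])


for name, cost in yearly_costs(monthly):
    print(f"{name}: ${cost:.2f}/yr")
```

Memory sizes and bandwidth are left out on purpose, so this compares sticker prices only.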

I have to wonder – how well would something like that be received? Say you could offer a private agent/proxy service for $10/mo. I suspect you could get it cheaper – the compute needed for this kind of service would seem to be smaller than the minimum you can purchase as a slice of compute. But people aren’t generally buying like this – at least not today.

I think most folks who have a desire for something like this would be more likely to have a desktop computer at home they could dedicate to leaving on. That’s assuming they have an internet service to their house too – so the real cost is more than the hosted compute, but since people often have it anyway… I’m guessing they’d be more likely to want to just use it.

The closest you can get to paying as you go for an application is Platform as a Service – Google App Engine or Microsoft Azure. Frankly, I don’t particularly want either, because they both represent a sort of technology lock in. In development now is OpenStack – Rackspace & NASA doing some of that base line commoditization work to drive down the costs of running small VM slices. We’re already on the sweet side of Moore’s Law for the commoditization of compute resources – it’s just a few more years until it’s a complete no brainer.

Posted in Geekstuff, Ranting and Reflections | 1 Comment

The “always on” for desktop apps…

I’ve been thinking about a problem I’m having with twitter – that I want to see more than I can easily watch unless I’m checking on it every hour or so. This gets especially problematic when I’m gone for a long weekend or *gasp* on vacation and completely off the grid for a week. I don’t want to see it while I’m gone – but I would like to be able to review through what happened after the fact.

I also like using the desktop and iOS applications for watching twitter. They’re generally really good – give me a nice sliding window of news. But they only “go back” so far. The essential problem is that twitter has a limit to the history they’ll let you easily interrogate for your feed. My feed is wide enough that I’m starting to miss some pieces, especially as more of it is coming from around the globe and different time zones.

In thinking about it, what I think I need architecturally is something that’s “always on” on the Internet that can grab and watch my feed. Or, more efficiently, accept a twitter stream as events if they have a pub/sub model that is user specific (I don’t think so, but I didn’t look).
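A rough sketch of the "grab and watch my feed" half, in Python. The `fetch(since_id)` callable is hypothetical: it stands in for whatever Twitter client call returns items newer than a given id, and I have not checked this against the real API. The table layout is also just an assumption.

```python
import sqlite3


def open_archive(path=":memory:"):
    """Open (or create) the local archive; pass a file path to keep it between runs."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS tweets "
        "(id INTEGER PRIMARY KEY, author TEXT, text TEXT)"
    )
    return db


def newest_id(db):
    """Return the highest id stored so far, or 0 for an empty archive."""
    row = db.execute("SELECT MAX(id) FROM tweets").fetchone()
    return row[0] or 0


def poll_once(db, fetch):
    """Ask fetch() for items newer than what is stored, save them, return how many came back."""
    items = fetch(newest_id(db))
    with db:  # commit on success, roll back on error
        db.executemany(
            "INSERT OR IGNORE INTO tweets (id, author, text) VALUES (?, ?, ?)",
            [(item["id"], item["author"], item["text"]) for item in items],
        )
    return len(items)
```

Something would still have to call `poll_once` on a schedule (cron, or a loop with a sleep), which is exactly the "always on" part this post is fretting about.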

To me, that means leaving a desktop or something on all the time – kind of a pain, really. It’s way more compute than this little task needs. And my head is into cloud computing and various infrastructures – so what would it take to have something running – application as a service if you will – that I could cobble to do this for me? I think it would be pretty easy to arrange this with Google App Engine – and that feels roughly about right. A platform as a service engine – I can write and lay down the code on top of it, and then access it from the web through a browser or specifically out of an iOS application. I suppose someone could do this on a slicehost or other thin virtualization platform as well. This could all go into a dedicated JEOS VM and run on Amazon too.

What we’re down to is what’s it cost to run this sort of thing on a regular basis. Always on applications – can you do it cheaper “in the cloud” than you can at home?

And this whole thing is getting darned close to the concept of a personal agent – dealing with (aggregating, filtering) higher speed feeds on the internet than I want to dedicate my attention to. If you go beyond twitter, there’s a whole pile of social networking feeds that I review and look at with a variety of tools today – Netnewswire, browser, other dedicated applications… but all about consuming, filtering, and aggregating information I’m interested in. I get a heady feeling looking forward at how we might use it all.

Posted in Geekstuff, Ranting and Reflections | 1 Comment

Vacation

Last couple of weeks have been spent on vacation – something I’ve always wanted to do – cruise up the Inside Passage to Alaska. We left from Seattle, WA, stopped at Juneau, Skagway, Ketchikan (all in Alaska) and were dropped off in Vancouver, BC. Karen, her parents, her aunt and uncle, and mom all went – one of the last carnival cruises for the season. Pat (Karen’s aunt) has a great photo album of it on Facebook with 35 chosen pics. We came home with something like 2.5GB of photos from our collected point and shoot cameras.

The amazing natural beauty is what I went for, and definitely what I received. The bass note for the tour was news that my grandmother, Lela, was going downhill fast and was in the hospital (received just after we left Seattle), and passed away when we were in Juneau.

Lela Pietzsch

She was an inspiration to me in so many ways – grew up in Burlington, Iowa, stopped at a 6th grade education, lived through the depression, World War II, survived two husbands, and I think that shows she’s the most adaptable – bought her first computer 10 years ago in 2000, even had a facebook page and presence. I abused people complaining about “being too old” for computing with her accomplishments. Her passing wasn’t really unexpected – not at 97 years old, and when I visited her this past June she was definitely looking very frail. Aside from being very tired, and the various physical complications of just being old, she was sharp minded and had a great laugh through it all.

I got back from this all last week, had a day off at home, and then back to work for a couple of days. Now the weekend, where I’m writing this and reflecting a bit on the past two weeks. It was a really good vacation, definitely well timed, as the upcoming weeks and months are going to be darned busy.

Posted in Ranting and Reflections | 2 Comments

Hadoop Day (Seattle)

Attended Hadoop Day today at the Amazon PacMed building. Free “conference” setup – and it was very worthwhile.

The morning was panels and general discussions – the brightest spot being learning about the whirr project. I’m afraid the link has 100% suck to it because it says nothing about what the project is. I haven’t read through all the code, but reports from others at the conference say that it is “nice tools to work with Hadoop”. Hopefully so. It’s a maven/java project with zippo documentation – not even a README that has an overview of any use.

The afternoon I sat in the “intermediate” track and picked up some really interesting pieces. I got the in-depth scoop on what’s happening with Hadoop and adding security from Jakob Homan, got a great introduction to Mahout from Jake Mannix (about to be a search geek employed at Twitter), learned about Prezi, which I’m thinking I’ll inflict on my coworkers some time, and was amused and interested in cascalog.

Cascalog = Clojure + datalog + cascading

It turned out to be surprisingly (to me – I’m being unfair to Clojure really) expressive and readable for making interesting and complex queries from Hadoop data structures – a very nice abstraction setup. I snorted at the thought of handing it to someone who had trouble with SQL though – it’s for programmers, not business analysts. I do wish I’d been able to get more skinny and in-depth viewing of cascading – it looks really effective at making queries and processing hadoop based data. I would have also liked to get some real meat and details on Oozie, which is Yahoo’s workflow engine for submitting mapreduce jobs into their hadoop clusters.

I took off a little early from the conference, but it was very definitely worthwhile. I wish the amazon environment had better wifi connectivity (rather sucked for guests), but in the end I didn’t really need it for what I gathered.

Posted in Geekstuff | Leave a comment

Hacking on OpenStack’s Nova

Like quite a number of other folks, I’ve been lurking on the OpenStack mailing lists since I saw the announcements. Friday, Eric Day put out a call to help with the “get this code into shape” against PEP8 and pylint.

“Ahh!”, I thought – an easy intro to getting into the project and it’ll give me an excuse to really read the code. So this weekend I started taking a stab at doing a little light buff and puff on the code to get the PEP8 and pylint code scores up a bit.

What I found is that it took some work to get everything from the codebase ready to really do some work on it. And the notes aren’t all in the same places on how to do that – what notes are there are all written mostly for Ubuntu. I was pretty sure most of this could be done on a Mac too – at least based on the dependency documentation, so I cobbled up some notes on getting rolling with Launchpad, this code base, and being able to run the tests to verify that my cleaning didn’t really break anything.

I put the resulting notes on the OpenStack wiki page http://wiki.openstack.org/HackingNovaMacOSX.

The big thing that I’m not sure about is the testing. Just a stock install is failing on one unit test, and it’s in an area I’m not very familiar with (auth & creating certificates). If any of the OpenStack folk are reading this, here’s the error I’m seeing:

[ERROR]: nova.tests.auth_unittest.AuthTestCase.test_209_can_generate_x509
 
Traceback (most recent call last):
  File "/Users/heckj/Documents/code/nova/nova/test.py", line 222, in run
    d = self._maybeInlineCallbacks(testMethod)
  File "/Users/heckj/Documents/code/nova/nova/test.py", line 182, in _maybeInlineCallbacks
    g = f()
  File "/Users/heckj/Documents/code/nova/nova/tests/auth_unittest.py", line 162, in test_209_can_generate_x509
    signed_cert = X509.load_cert_string(cert_str)
  File "/Users/heckj/Documents/code/nova/.nova-venv/lib/python2.6/site-packages/M2Crypto/X509.py", line 655, in load_cert_string
    return load_cert_bio(bio, format)
  File "/Users/heckj/Documents/code/nova/.nova-venv/lib/python2.6/site-packages/M2Crypto/X509.py", line 639, in load_cert_bio
    raise X509Error(Err.get_error())
M2Crypto.X509.X509Error: 140735090166816:error:0906D06C:PEM routines:PEM_read_bio:no start line:/SourceCache/OpenSSL098/OpenSSL098-32/src/crypto/pem/pem_lib.c:650:Expecting: CERTIFICATE

update: Turns out the error was directly related to the version of OpenSSL installed on my laptop. I had version 1.0.0a from MacPorts installed and in my default path, which caused the error. Version 0.9.8l (base install in MacOS X) works fine.

sudo port deactivate openssl

did the trick and the tests are all running now. I updated the bug against Nova with those details, leaving it open – it ought to at least fail reasonably.
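In the spirit of "fail reasonably", a small Python sketch for reporting which OpenSSL is in play. Two assumptions to flag: `ssl.OPENSSL_VERSION` needs a newer Python than the 2.6 in the traceback above, and it reports what the `ssl` module was built against, which is not necessarily what M2Crypto links to. The helper names are made up.

```python
import re
import ssl


def parse_openssl_version(banner):
    """Pull (major, minor, patch, letter) out of a banner such as 'OpenSSL 0.9.8l'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)([a-z]*)", banner)
    if match is None:
        raise ValueError("unrecognised OpenSSL banner: %r" % banner)
    major, minor, patch, letter = match.groups()
    return int(major), int(minor), int(patch), letter


def describe_linked_openssl():
    """Report the OpenSSL version this Python's ssl module was built against."""
    return parse_openssl_version(ssl.OPENSSL_VERSION)
```

A test setup step could compare the parsed tuple against versions known to work and print a clear message instead of the PEM error above.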

Posted in devops, Geekstuff, openstack | 4 Comments

breaking the private network oasis addiction with OAuth

Most reasonably (and larger) sized operations organizations have a pretty standard networking setup – or at least some close variation on the theme. ARIN wants public IP addresses behind load balancers, so most organizations front up their services through software or hardware load balancers. From there, it goes back to a highly responsive “web tier” – the spewing of content and the CDN source systems. Those back in to application tiers, and behind those are data persistence tiers (typically, your classic RDBMS). The only thing on the internet is often those load balancers. The networks are often segmented between app and data as well, often with firewalls, to “reduce potential intrusion”. It’s a good plan and pattern – it generally works. And while you own it all, you own your own network oasis of happiness.

Economic pressures being what they are, it is getting more effective to own what you absolutely need of infrastructure, and rent the rest. Shoot – if you’re small enough, owning anything at all for infrastructure may not make sense.

The problem comes when you want to start moving between those network oases of happiness. Like, oh say, to a cloud provider. The end result is our services have all become addicted to this concept of secure, high-bandwidth, reliable access between tiers – that happy oasis. And that addiction is one that isn’t healthy – at least not in a world of elastic scaling architectures. We’re getting used to breaking that addiction when we integrate with external services – facebook, twitter, etc. The mashups are breaking the mold of how this has been done in the past. And we need to break our addiction!

Why am I asserting this? As I look at a large number of applications, they don’t fit a “cloud provider” very well – at least not when you start to get into the realm of dynamic or elastic scaling. Most providers have something akin to an “internal network” which we can leverage as consumers. As we get to the logical conclusion we will find ourselves wanting to shift work between one “oasis of network happiness” to another. The solution today? Expand to fill all available resources and then buy some more at the same place.

That ain’t gunna cut it.

For us as consumers of infrastructure as a service, we want one provider to be as good and flexible as another. That means commoditization of the infrastructure (which I believe we’re starting to see now, although the infrastructure providers will fight it tooth and nail). That means we need to be able to shift from one to the other at a moment’s notice.

Here’s an example:

You have enough resources for 100 units at provider A, and you are running up to a high level usage that looks like it’ll exceed that. You call up Mr Infrastructure A, who reluctantly informs you “Sorry, all sold out – it’ll take 6 weeks to add capacity”. You also happen to have a not-quite-as-good deal with provider B that costs a little more. All sounds good so far – your uber-cool software provisioning system smacks down a couple of VM images and spins them up…. except it’s got a problem: connecting the stuff running at provider A to the new stuff you’ve just spun up.

And that’s assuming that you’ve solved all sorts of somewhat evil configuration problems around telling a component what it is and who it should be talking to to get its work done. That’s a topic for a different post though.

I don’t have THE answer, but I do have AN answer. Follow the mashup leaders: take the OAuth & REST pattern back into your office, workspace, colleagues, whatever. That’s what we’re doing after all – mashing up some services. It’s just that we normally think mashup means “with someone else’s stuff”.

OAuth and REST work together beautifully and with some coordination can answer the security question “Should I allow this external request to access these resources?” The OAuth 2.0 spec has a segment in it – section 1.4.4 (version 10) – that walks through the flow for this to happen. Routing over SSL does a pretty reasonable job of getting the data encrypted as well. The cost: running an Authorization server.
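Here is roughly what that service-to-service flow looks like from the client side, sketched in Python. Big caveat: the parameter names (`grant_type=client_credentials`, HTTP Basic for the client, a `Bearer` header for the token) follow the later, final OAuth 2.0 documents and may not match draft 10 exactly. The URLs and credentials in the usage note are made up, and nothing here actually sends a request.

```python
import base64
import urllib.parse
import urllib.request


def build_token_request(token_url, client_id, client_secret):
    """Build (but do not send) a client-credentials token request."""
    body = urllib.parse.urlencode({"grant_type": "client_credentials"}).encode("ascii")
    basic = base64.b64encode(
        f"{client_id}:{client_secret}".encode("utf-8")
    ).decode("ascii")
    return urllib.request.Request(
        token_url,
        data=body,
        headers={
            "Authorization": "Basic " + basic,
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


def authorized_request(resource_url, access_token):
    """Build a request to a protected resource carrying the issued token."""
    return urllib.request.Request(
        resource_url, headers={"Authorization": "Bearer " + access_token}
    )
```

An app tier at provider B would build the first request against something like `https://auth.example.com/token`, send it over SSL, pull the access token out of the response, and then use the second helper for every call back to the data tier at provider A.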

(You can pull off this same trick with OAuth 1.0: the not-quite-standard-but-defacto “2 legged OAuth” routing. The problem: not everyone and all the libraries agree on how to make that work seamlessly.)

In order to allow ourselves to start treating everything – infrastructure as well – as software with its speed of change we need to be able to dynamically allocate resources. Once we have services that are all chatting across OAuth authorized links (encrypted or not), we’ve removed a huge impediment to being able to elastically scale our services.

Yeah, configuration. Kind of a bitch, ain’t it. Another post, eh?

Posted in devops, Geekstuff, Ranting and Reflections | Leave a comment

the CMDB is dead, long live the CMDB

I work in an environment that has an existing CMDB. Over the past year, I’ve spent a fair number of man hours from my team and an equal number of hours of my own thinking about what it is, what we want from it, and how so much of what’s available today just doesn’t cut it.

The thing that we’ll label as a “CMDB”, to me, isn’t. It isn’t a configuration management database. For us, it’s an inventory of assets – digital and physical. It’s a metadata store that allows us to assign ownership, and an ancillary data set that makes categorizing incidents, requests, and changes in the classic service management sense a bit easier across a very wide organization. If any small company came to me and said “yeah, gotta have a CMDB!” I’d be looking very closely at how potentially insane they were. Most small orgs and companies just don’t need it. It’s honestly only useful when you breach some amount of scale.

The worst part is the ITIL definition of what should be in a CMDB has been effectively unachievable because of the costs associated with it. The classic ITIL world of CMDB has this data repository being updated with process (typically manual) as changes are approved – it’s meant to represent the “desired state” of an operational world. Only it doesn’t. Really, it never has. And even with the highest priced tools on the market today never will. At best it’s an audit-against tool that you can see “yeah, it matched or didn’t match when we ran that scan a few days ago”.

It doesn’t have to be this way though. What most of us want from a CMDB is what we get implicitly, to some degree, from many of our monitoring solutions – a digital map of our environment. The monitoring pieces created their own version – typically configured by hand, or sometimes configured automatically with a hand to help guide (Zenoss and Hyperic both do a pretty good job at this). The monitoring systems then use that data model to know who to alert when something goes wrong, or if they’re really good – to share some set of analysis around “service X is down because the component A that it relies on is down”.
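The “service X is down because component A is down” analysis is easy to sketch once the dependencies are written down. This Python snippet is an illustration only, not how Zenoss or Hyperic do it; the `depends_on` map and its service names are invented.

```python
# Hypothetical map: each service lists the components it relies on.
depends_on = {
    "storefront": ["web-tier"],
    "web-tier": ["app-tier"],
    "app-tier": ["database", "memcache"],
    "database": [],
    "memcache": [],
}


def root_causes(service, down, graph=depends_on):
    """Walk what `service` relies on and return the down components
    that have no down direct dependency of their own."""
    causes, seen, stack = set(), set(), [service]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        deps = graph.get(node, [])
        if node in down and not any(dep in down for dep in deps):
            causes.add(node)
        stack.extend(deps)
    return sorted(causes)
```

With everything from the storefront down to the database marked as failing, the function points at the database rather than paging four teams at once.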

Virtualization is pushing this all right over a critical tipping point. The “old” CMDB is dead, let’s jump to the new. We need a model of our environment that maps our physical and digital assets. We need it to show us dependencies in an ever increasing world, and we need it to help inform us – especially in a larger organization, who to contact if there’s an issue with a service. If we have to fill out all this data and information by hand, we’re lost. The rate of change is increasing, and we *want it* to increase. Look to the model of continuous deployment, the natural successor to the software development process that is continuous integration. Now in classic #devops style, let’s apply that right on through and into operations and running our services.

What doesn’t exist today, in our collective musings about a DevOps toolchain, are (currently) the tools to integrate the knowledge that the deployment tools have into updating a digital asset model. Even these tools don’t know the dependencies (i.e. what database is this rails app using, which memcache server/port combo is being used, etc) – but it’s there, just slightly under the covers.

The other place where we want/need this knowledge stashed? Our monitoring systems. The continuous tests against our live services to assert they’re OK. Many open source systems include some level of a model just implicitly in their configurations. Nagios, Munin, Zenoss, Hyperic, etc. I am still struggling to find monitoring systems that have the concept of dynamic configuration through an API access built in to the base of them. Still – much of their configuration is something that we might naturally want to store in a map of our services.

The way to get this data? Drive it from the tools that are implementing the changes. Have it as a service that can be updated and modified through simple API’s that Buildout, Func, Capistrano, Fabric, ControlTier, or whatever can access and inform. Use the manifests and details that the system configuration tiers (BCFG, cfengine, Chef, Puppet, SmartFrog) have been built with to populate this map as they deploy and invoke their services.
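As a sketch of what "driven from the tools" could mean, here is a tiny in-memory stand-in in Python for the kind of service a deploy script would call. Everything in it is hypothetical (the `ServiceMap` class, its method names, the record fields); a real one would sit behind an HTTP API and none of the tools named above ship anything like this as far as I know.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceRecord:
    owner: str
    version: str
    hosts: set = field(default_factory=set)
    uses: set = field(default_factory=set)


class ServiceMap:
    """In-memory stand-in for a map that deployment tools update as they run."""

    def __init__(self):
        self.records = {}

    def record_deploy(self, service, host, version, owner, uses=()):
        """Called by the deploy tool each time it pushes a service to a host."""
        rec = self.records.setdefault(
            service, ServiceRecord(owner=owner, version=version)
        )
        rec.owner, rec.version = owner, version
        rec.hosts.add(host)
        rec.uses.update(uses)
        return rec

    def contact_for(self, service):
        """Who to call when this service has an issue."""
        return self.records[service].owner

    def dependents_of(self, component):
        """Which services said they use this component."""
        return sorted(
            name for name, rec in self.records.items() if component in rec.uses
        )
```

A Fabric or Capistrano task would end with one `record_deploy(...)` call, passing along the database and memcache endpoints it just wrote into the app’s config, and the map stays current without anyone filling in a form.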

This is all a step to moving all of our infrastructure, historically so very physical, into the digital world. There are tremendous efficiencies to be gained – both financially (using our physical assets more effectively, or just using what you need from an existing infrastructure provider) and from a service perspective (being able to reconfigure and deploy your services to match the market needs).

Much of devops is focused on deployment, because that’s where we spend most of our time today. That’s good – but we can not forget that it is just one small part of the overall process for these services from inception to retirement.

And before any of the classic CMDB folks find me and start shooting, yes – I’m very aware of the work that is going on at CMDBf around federating CMDBs. The idea there is good – at a 10,000 foot view they’re heading in the right direction with their standard. The foundation that it is built on is – in my opinion – now outdated and needs to be revisited. The implementation needs to be simplified. I would recommend you look at the labels of the members that are coming together around the federated CMDB concept. Do you see anything in there that shouts of “open, adaptable, flexible”? I don’t. I see the same kind of collaboration that led to J2EE standards and the WS-* standards around “web services”. What is needed is something simpler, more open, and with publicly available implementations. I would never expect BMC, CA, HP, IBM, or Microsoft to help provide that – it’s just not in their best interests when they have revenue tied to the services and software they provide in this space today.

Posted in devops, Geekstuff, Ranting and Reflections | 8 Comments

Django 1.1 Testing and Debugging

I borrowed a copy of Django 1.1 Testing and Debugging from a friend today. Spent the bus ride home from the office flipping through the pages, and I’ve got to say – it’s a pretty darned good book!

If you’re new or newish to Django development, it’s a book that I think would be good to have in your stable. Even with Django 1.2 now released, everything in the book is super relevant, and Karen Tracey has done a really wonderful job of explaining and then showing with details the soup to nuts run of how to do and deal with testing a django application … and more importantly how to debug when things go awry.

The book covers unit tests, doctests, testing through the WSGI interface with the Django unit test extensions, and even driving a website test set with twill. It’s thorough and chock full of examples and walk through goodness.

The debugging is even more detailed – going to a lot of trouble to explain tracebacks, the format of the Django error pages, how to get convenient debug data on your django project (using Django Debug Toolbar), and finally – how to even use the python debugger to step through execution. She goes to a lot of trouble to set up real-world scenarios that are slightly and subtly broken and then walk through the whole process of solving the issue with the tools at hand.
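For anyone who hasn’t seen the two styles side by side, here is a plain-Python sketch of a doctest and a unit test for the same function. It is not taken from the book and doesn’t need Django installed; the `slugify` helper and `run_all` are made up for the example.

```python
import doctest
import unittest


def slugify(title):
    """Lower-case a title and join its words with hyphens.

    >>> slugify("Hello Django World")
    'hello-django-world'
    """
    return "-".join(title.lower().split())


class SlugifyTests(unittest.TestCase):
    def test_collapses_extra_whitespace(self):
        self.assertEqual(
            slugify("  Testing   and Debugging "), "testing-and-debugging"
        )


def run_all():
    """Run the doctest and the unit test; return the total number of failures."""
    example = doctest.DocTestParser().get_doctest(
        slugify.__doc__, {"slugify": slugify}, "slugify", None, 0
    )
    doc_result = doctest.DocTestRunner(verbose=False).run(example)
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SlugifyTests)
    unit_result = unittest.TextTestRunner(verbosity=0).run(suite)
    return doc_result.failed + len(unit_result.failures) + len(unit_result.errors)
```

When one of these fails and the traceback isn’t enough, dropping `import pdb; pdb.set_trace()` just before the suspect line lets you step through from there.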

I’ll be keeping John’s copy for another day or two, and then getting my own copy to stack into the reference shelf.

Posted in django, Geekstuff | 1 Comment

Ubuntu 10.04 (Desktop)

I’ve been using Ubuntu as my distribution of choice for VM’s and server instances, and on a lark I took a swag at installing Ubuntu desktop onto a VM yesterday. I’ve got to say, it’s a pretty usable setup.

I still completely prefer the Mac, but the installation was relatively painless, the browser setup pretty good – and installing all the tidbits that I wanted to fiddle with for development was very easy.

I don’t even really know what I’ll use it for – other than something to experiment with and try out. I don’t really need another VM with a desktop interface, but I thought it would be interesting to see where it’d gone.

I recall Mark Shuttleworth (for whom I’ve a great deal of respect) making a comment earlier this year about “this is when Linux takes the desktop”. He might be right now. I recall at the time thinking, “Dude – Apple’s doing this iPad thing and you’d better change your sights!”. Well, based on what I see in his blog with the “Unity” interface, they’re taking a pretty interesting stab at it. One of the things that I think Apple really has going for it in the tablet space is the multi-touch programming interfaces. They cooked them for several years with the iPhone, and now they’re solid and beautiful to work with from a programmer’s and designer’s point of view. I don’t know what Ubuntu or any of the linux distros are doing with the interface space there, but I hope they’re paying close attention to the programming paradigms – that’s what is making the platform so damn powerful.

Now that I spent an afternoon installing it, I think I’ll probably nuke it and get back that 20GB of space, but it was fun to play with…

Posted in django, Geekstuff | 3 Comments