Inside the Nova service framework

In my previous spelunking article, I went over the basic pieces needed to get a nova service stood up. Well, okay, I skipped logging; maybe another article for that later…

Quick recap: The service framework in nova is set up to make it easy to write your own services that interact with any other nova services (such as nova-network, nova-scheduler, etc). The service framework includes all the pieces to communicate with these other services (using a module called rpc.py that abstracts away the communications), a database connection for looking up data from the nova persistence store, and leaves the rest to you.

The service module expects to be told which manager class to load, and the framework uses that class to do its work. There are only two methods you are expected to override: init_host() and periodic_tasks().

There are two kinds of services – I’m going to focus on the stand-alone (not WSGI) service that expects to communicate and respond entirely through the message queue system in Nova.

So how does this service critter work? Well, when it’s initialized, it gets a number of attributes assigned to it. This is typically done from a class method on the Service class in service.py:

from nova.service import Service
my_service = Service.create()

The create() method has a number of parameters:

  • host – a string with the host this service is running on
  • binary – a string with the binary name of this program
  • topic – a string with a subset of the binary name, used to set up a message exchange in AMQP
  • manager – a string with the class name to be loaded to do the ‘work’
  • report_interval – an interval set from Flags, triggering a regular reporting loop
  • periodic_interval – an interval set from Flags, triggering a periodic loop to do repeating tasks

If you don’t provide them, all these values get populated with defaults and from configuration detail (the flags) in the service.serve() method. Once serve() has everything configured, it calls start() on each service to kick it into gear.

start() is where things really get moving. This is where the service manager class gets loaded, registered with the nova database (if it isn’t already), and the RPC mechanisms get spun up with Eventlet greenthreads to accept messages to this service. The topic (which is the name of the binary, minus any “nova-” in front of it) is used as an exchange. Through this mechanism, any service can talk to any other service (or set of services). Here’s how that works:

rpc.py has two methods: call() and cast() that do all the heavy lifting. When you use these, they take a “context” (i.e. authorization for who’s doing the call), a topic (the name of the service you’re calling), and a message. call() sends this message and waits for a response. cast() sends the message entirely asynchronously, not expecting a response.

The message is a JSON structure – a dictionary, and it’s expected that the dictionary will have a key “method” and another key “args”. method is expected to be a string, and args is expected to be another dictionary. The rpc module does the work of using that method string to look up and invoke the method on your manager.
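To make the message convention concrete, here is a minimal, in-process sketch of a topic consumer that turns a {"method": ..., "args": {...}} message into a method call on a manager, with cast() returning immediately and call() waiting for a reply. This is not nova’s rpc.py: the `Broker` class (one queue.Queue per topic standing in for AMQP) and `NetworkManagerStub` are hypothetical stand-ins; only the message shape and the look-up-the-method-by-name dispatch mirror what the text describes.

```python
import json
import queue
import threading


class NetworkManagerStub:
    """Hypothetical stand-in for a service manager class."""

    def associate_floating_ip(self, context, floating_address, fixed_address):
        return "%s -> %s" % (floating_address, fixed_address)


class Broker:
    """Hypothetical in-process stand-in for the AMQP message queue."""

    def __init__(self):
        self.queues = {}

    def consume(self, topic, manager):
        # One queue per topic; a background thread plays the consumer role.
        q = self.queues.setdefault(topic, queue.Queue())

        def loop():
            while True:
                raw, reply_to = q.get()
                msg = json.loads(raw)  # messages travel as JSON
                # Look up the method named in the message on the manager and
                # pass the "args" dictionary through as keyword arguments.
                method = getattr(manager, msg["method"])
                result = method(context=msg.get("context"), **msg["args"])
                if reply_to is not None:
                    reply_to.put(result)

        threading.Thread(target=loop, daemon=True).start()

    def cast(self, context, topic, msg):
        # Fire and forget: enqueue the message and return immediately.
        payload = dict(msg, context=context)
        self.queues[topic].put((json.dumps(payload), None))

    def call(self, context, topic, msg, timeout=5):
        # Enqueue with a private reply queue, then block for the answer.
        reply_to = queue.Queue()
        payload = dict(msg, context=context)
        self.queues[topic].put((json.dumps(payload), reply_to))
        return reply_to.get(timeout=timeout)


if __name__ == "__main__":
    broker = Broker()
    broker.consume("network", NetworkManagerStub())
    message = {"method": "associate_floating_ip",
               "args": {"floating_address": "10.0.0.5",
                        "fixed_address": "192.168.0.2"}}
    print(broker.call({"user": "admin"}, "network", message))
    # 10.0.0.5 -> 192.168.0.2
    broker.cast({"user": "admin"}, "network", message)  # returns None at once
```

The private reply queue is the one design choice that separates call() from cast(): both go through the same topic queue and the same dispatch, and only call() waits for something to come back.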

An example of this operating is right in the code. In the nova-network API, there’s an rpc.cast():

rpc.cast(context,
         self.db.queue_get_for(context, FLAGS.network_topic, host),
         {'method': 'associate_floating_ip',
          'args': {'floating_address': floating_ip['address'],
                   'fixed_address': fixed_ip['address']}})

Through the service framework and the RPC mechanisms, this is calling associate_floating_ip() on the network service manager class.

The other nifty thing about the service is that it keeps and manages a number of Eventlet greenthreads to do its work. The basic bits are all encapsulated in the rpc.py mechanism: when it sets up connections to the message queue service to receive communications, that starts a greenthread that watches for, pulls, and processes any inbound messages. The two periodic interval pieces are also spun up on their own greenthreads, looping every “interval” (specified by the flags --report_interval and --periodic_interval, set at 10 and 60 seconds by default, respectively). These run continuously until the service is terminated.
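The two interval loops boil down to “run a function, sleep, repeat until told to stop.” Below is a minimal sketch of that pattern. It deliberately swaps Eventlet greenthreads for standard-library threads (threading.Thread plus threading.Event) so it runs without Eventlet installed; `LoopingTask`, `report_state`, and `periodic_tasks` are illustrative names, not nova’s classes.

```python
import threading
import time


class LoopingTask:
    """Call `func` every `interval` seconds until stop() is called.

    Sketch only: nova runs these loops on Eventlet greenthreads; this uses
    a plain thread and an Event so it works with the standard library alone.
    """

    def __init__(self, func, interval):
        self.func = func
        self.interval = interval
        self._stopped = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while not self._stopped.is_set():
            self.func()
            # Event.wait doubles as a sleep that stop() can interrupt.
            self._stopped.wait(self.interval)

    def start(self):
        self._thread.start()
        return self

    def stop(self):
        self._stopped.set()
        self._thread.join()


if __name__ == "__main__":
    counts = {"report": 0, "periodic": 0}

    def report_state():
        counts["report"] += 1

    def periodic_tasks():
        counts["periodic"] += 1

    # The flag defaults would be 10 and 60 seconds; shrunk so the demo is quick.
    loops = [LoopingTask(report_state, 0.01).start(),
             LoopingTask(periodic_tasks, 0.06).start()]
    time.sleep(0.2)
    for loop in loops:
        loop.stop()
    print(counts)
```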

Posted in Geekstuff, openstack | 1 Comment

Benchmarking Celery

Before you even go there, I’ll preface this with YMMV.

This little post documents a benchmark that I did for an internal use case, in the hopes that it’ll be helpful for others. As a benchmark, I wasn’t attempting to fully characterize the performance of a specific system; I just wanted a pole against which to measure changes in the environment or underlying infrastructure. I was curious about Celery’s performance using RabbitMQ. The tests I ran were pretty much straight sample code from the project (a simple “add” task) and a client making multiple requests and combining the results with the venerable Python “timeit”.

The code

All the code for this test (and more, I got excited…) is stashed up on Github: https://github.com/heckj/openstack-benchmarks/tree/CeleryBenchmark. Really, the parts you’re likely to be interested in are the worker class: tasks.py, the configuration: celeryconfig.py, and the actual benchmarking code: celery-benchmark.py.

And before you ask, no: the OpenStack project doesn’t currently use Celery; in fact, it uses Carrot right now. I just intend to add more benchmarks and profiling tools around the OpenStack project to this codebase in the future.

The config

The configuration was held constant – a stock Ubuntu 10.10 server with all the various dependencies installed. Since I’m sure someone will want to know about the versions:

  • rabbitmq-server 1.8.0-1ubuntu2
  • python2.6 2.6.6-5ubuntu1
  • python-amqplib 0.6.1-1
  • celery 2.2.7
  • kombu 1.1.6
  • anyjson 0.3.1

The host was a Shuttle PC with 8GB RAM and Core 2 Duo processors. The host was never heavily burdened by the processing that took place (load < 1.0, no significant swapping). The benchmarking was done on the same host as RabbitMQ to remove any network latency effects.

The results


I was totally abusing MS Excel’s “stock graph” to show variability in the results (of which there wasn’t a hell of a lot). In the graph, the thin line represents the range (min to max) and the thicker box in the middle is the average result +/- the standard deviation. The gist: the round-trip time was pretty much straight up at 160ms per request, sampled over 1,000,000 requests. The image above shows a portion of the sequence. The relevant code:

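from benchmark.celerybench.tasks import add

# queue the task on the broker; this returns an AsyncResult right away
result = add.apply_async(args=[4, 4])
# block until the worker sends back the answer
result.get()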

(As I mentioned earlier, you can see the whole code on github).
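The graph summarizes each batch of timings as a min/max range plus the average +/- the standard deviation. Here is a minimal sketch of collecting and summarizing per-request timings that way with timeit and the statistics module. `fake_round_trip` is a hypothetical stand-in for `add.apply_async(args=[4, 4]).get()`, used so the sketch runs without Celery or RabbitMQ, and the sample and batch counts are arbitrary.

```python
import statistics
import timeit


def fake_round_trip():
    # Hypothetical stand-in for: add.apply_async(args=[4, 4]).get()
    return sum([4, 4])


def summarize(func, samples=100):
    # repeat(..., number=1) times each call on its own,
    # giving one duration in seconds per request.
    times = timeit.repeat(func, repeat=samples, number=1)
    mean = statistics.mean(times)
    stdev = statistics.stdev(times)
    return {
        "min": min(times),          # bottom of the thin line
        "max": max(times),          # top of the thin line
        "box_low": mean - stdev,    # bottom of the thick box
        "box_high": mean + stdev,   # top of the thick box
        "mean": mean,
    }


if __name__ == "__main__":
    for batch in range(3):
        stats = summarize(fake_round_trip)
        # report each batch in milliseconds, one row per point on the chart
        print(batch, {k: round(v * 1000, 4) for k, v in stats.items()})
```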

I did more tests, but I need to keep some of those to myself, as they’re testing variations of configurations for my job.

Random side notes

I hadn’t done anything in depth with Celery before. I’d heard about it from friends, and in the community in general. The author pinged me a couple of times with help as well. Overall, I found Celery incredibly easy to set up, with a very straightforward API (always nice). There were lots of options available, but everything was set with very usable defaults from the start. I’m totally looking forward to using Celery in some projects, as well as taking advantage of Kombu, a drop-in/compatibility layer for Carrot.

Update:

Ask (Celery’s author) mentioned some suggestions for optimizing on Twitter; this seemed a good place to put them. Try:

  • CELERYD_PREFETCH_MULTIPLIER=0
  • CELERY_DISABLE_RATE_LIMITS=True
  • and BROKER_TRANSPORT="librabbitmq" to use the pylibrabbitmq C library
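For reference, here is roughly where those settings would sit in a Celery 2.x-style celeryconfig.py. This is a hypothetical sketch, not the benchmark’s actual config (that lives in the GitHub repository linked above): the broker host, credentials, and task module name are placeholders, and only the last three settings come from Ask’s suggestions. The librabbitmq transport also needs that C-backed library installed.

```python
# celeryconfig.py (hypothetical sketch; connection values are placeholders)

# Broker connection: RabbitMQ on the same host as the benchmark client.
BROKER_HOST = "localhost"
BROKER_PORT = 5672
BROKER_USER = "guest"
BROKER_PASSWORD = "guest"
BROKER_VHOST = "/"

# Send task results back over AMQP so result.get() can fetch them.
CELERY_RESULT_BACKEND = "amqp"

# Module(s) the worker imports to find the `add` task (placeholder name).
CELERY_IMPORTS = ("tasks",)

# Ask's tuning suggestions:
CELERYD_PREFETCH_MULTIPLIER = 0    # 0 removes the prefetch limit
CELERY_DISABLE_RATE_LIMITS = True  # skip the rate-limit machinery
BROKER_TRANSPORT = "librabbitmq"   # use the C-based librabbitmq client
```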
Posted in devops, django, Geekstuff | 1 Comment

Spelunking Nova – flags and services

I’ve been doing a lot of spelunking into the nova codebase, digging around and trying to learn some of the underpinnings. Some of these pieces were a bit confusing to me, so I’m stashing them up here for Google to find and share with others in the future.

Before I dive into the gritty details, it’s worth getting a high level overview so that some of this (hopefully) makes sense. OpenStack’s service architecture is made up of services that all talk with each other to get things done: nova-network, nova-scheduler, etc. There’s a lot of underpinning in the nova codebase to make those services relatively easy to write and work together; I was mostly curious about how they passed messages back and forth. As I dove in, the two pieces that stood out as needing to be understood first were the unified service framework in nova and configuration using flags (which it heavily depends upon).

Configuration – using the flags

The configuration for nova services, global or specific to a service, is all done with configuration files that can be overridden on the command line, taking advantage of python-gflags to make it all work nicely. I didn’t know much about the flags system, so I dug around in the python-gflags project. They have the documentation for how to use gFlags in the code itself: http://python-gflags.googlecode.com/svn/trunk/gflags.py.

To summarize it up:

Python modules in the codebase can define and use flags, and there is a general nova flags file that holds the cross-service (common) configuration settings. Nova defaults to looking for its configuration in a nova.conf file in the local directory. Failing that, it looks for /etc/nova/nova.conf. Where it looks for the configuration file can be overridden (typically on the command line) by passing --flagfile with the location of a config file. The code that makes this happen is nova.utils.default_flagfile().
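Here is a simplified sketch of the lookup order just described: honor an explicit --flagfile, otherwise try ./nova.conf, then /etc/nova/nova.conf, and inject whichever exists as a --flagfile argument. It is written from the description, not copied from nova.utils.default_flagfile(), so the function name `add_default_flagfile` and the injectable `exists` parameter are illustrative assumptions.

```python
import os


def add_default_flagfile(argv, candidates=("nova.conf", "/etc/nova/nova.conf"),
                         exists=os.path.exists):
    """Return a copy of argv with a --flagfile argument added if none was given.

    Hypothetical sketch of the behavior described in the text.
    """
    # An explicit --flagfile on the command line always wins.
    if any(arg.startswith("--flagfile") for arg in argv[1:]):
        return list(argv)
    for path in candidates:
        if exists(path):
            # Insert right after the program name, like a user-supplied flag.
            return [argv[0], "--flagfile=%s" % path] + list(argv[1:])
    return list(argv)  # no config file found; rely on built-in defaults


if __name__ == "__main__":
    only_etc = lambda path: path == "/etc/nova/nova.conf"
    print(add_default_flagfile(["nova-network", "--verbose"], exists=only_etc))
    # ['nova-network', '--flagfile=/etc/nova/nova.conf', '--verbose']
```

Passing `exists` in as a parameter is only there so the lookup can be exercised without real files on disk.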

To use the configuration from within code, you typically instantiate the global flags, add any flag definitions (with default values) that you care to add, and then use ‘em! Here’s a code snippet example:

from nova import flags
# import the nova wrapper around python-gflags
#  .. there's some interesting wrapping for taking in arguments
#     and passing along extra values to your code
#  .. and it's where the global flag definitions reside
FLAGS = flags.FLAGS
# get the global instance
#  .. the config file itself is read when FLAGS(sys.argv) is called
#     with a --flagfile argument (see the script example below)
#
# You can define an additional flag here if you need to...
flags.DEFINE_string('my_flag', 'default_value',
                    'human readable description of your flag')
# there's also flags.DEFINE_bool, flags.DEFINE_integer and more...
#
# And then you can use the flags
#  .. the flags you defined show up as attributes
#     on that FLAGS object, alongside global ones like my_ip
print(FLAGS.my_flag)
print(FLAGS.my_ip)

If you happened to be writing a command-line script that took in flags and worked with them, you might do something like:

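import sys

from nova import flags
from nova import utils

# point the flags at the default nova.conf unless --flagfile was given
utils.default_flagfile()
# parse the command line (and any flagfile) into the global FLAGS
flags.FLAGS(sys.argv)
GLOBAL_FLAGS = flags.FLAGS
# ... and on to the rest of your code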

There is some good end-user documentation on how to find the flags. The gist: if you want to know what flags are available, the easiest way is to pass “--help” or “--helpshort” on the command line. That is how the gFlags library is set up to tell you about the flags.

Update:

After a little digging down a side passage, I noticed that service.py had some debugging code in it that iterated through all the set flags. You iterate directly on FLAGS (treating it as an iterable thing) and use FLAGS.get() to retrieve the set values.

    logging.debug(_('Full set of FLAGS:'))
    for flag in FLAGS:
        flag_get = FLAGS.get(flag, None)
        logging.debug('%(flag)s : %(flag_get)s' % locals())

Services

There are two types of services in Nova: system services and web services. The code to use and launch them is basically the same, and Nova has this all bundled into a general service architecture and code base. The reason that configuration is so important is that the nova service framework has a convention of knowing how to run a service based on flags from the framework.

Here’s a bit of example code of a service to illustrate what I’m talking about.

nova-exampleservice:

import eventlet
eventlet.monkey_patch()
 
import sys
from nova import flags
from nova import service
from nova import utils
 
if __name__ == '__main__':
    utils.default_flagfile()
    flags.FLAGS(sys.argv)
    service.serve()
    service.wait()

The convention starts off by using the name of the script invoked, in this case “nova-exampleservice”. The scripts in bin/ (like nova-network) use this mechanism. This convention can be overridden, of course, but it does make things pretty straightforward once you know it. The key is that the code in nova.service looks in the configuration for a class to instantiate (expected to be a subclass of nova.manager.Manager) named after the service that was just invoked (this convention lives in the Service.create() class method in nova.service).

For our example of nova-exampleservice, the service is going to look in the configuration for exampleservice_manager, expecting the value to be a class that it can load that will be a subclass of nova.manager.Manager and will be responsible for running the service.

This code is invoked from service.serve() in our example above. Again, it looks for the flag “exampleservice_manager” and tries to load that class to do the work.
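The naming convention comes down to a few string operations plus a dynamic import. Here is a minimal sketch, assuming a plain dict in place of FLAGS; `service_names` and `import_class` are hypothetical helpers written for illustration (nova has its own import helper, whose details may differ), and collections.OrderedDict stands in for a real manager class so the example runs anywhere.

```python
import importlib
import os


def service_names(script_path):
    """Derive binary, topic, and manager-flag name from the script path."""
    binary = os.path.basename(script_path)        # e.g. nova-exampleservice
    topic = binary.rpartition("nova-")[2]         # e.g. exampleservice
    return binary, topic, "%s_manager" % topic    # e.g. exampleservice_manager


def import_class(dotted_path):
    """Load a class from a 'package.module.ClassName' string."""
    module_name, _, class_name = dotted_path.rpartition(".")
    return getattr(importlib.import_module(module_name), class_name)


if __name__ == "__main__":
    binary, topic, flag_name = service_names("/usr/bin/nova-exampleservice")
    print(binary, topic, flag_name)
    # nova-exampleservice exampleservice exampleservice_manager

    # Stand-in for FLAGS; a real deployment would name a Manager subclass.
    flags = {flag_name: "collections.OrderedDict"}
    manager_class = import_class(flags[flag_name])
    print(manager_class.__name__)  # OrderedDict
```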

An updated example that sets a default manager that will attempt to load the class mymodule.exampleservice.ExampleServiceManager by default:

nova-exampleservice:

import eventlet
eventlet.monkey_patch()
 
import sys
from nova import flags
from nova import service
from nova import utils
 
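# Default manager class for this service; it can still be overridden
# with --exampleservice_manager on the command line or in nova.conf
flags.DEFINE_string('exampleservice_manager',
        'mymodule.exampleservice.ExampleServiceManager',
        'Default manager for the nova-exampleservice')

if __name__ == '__main__':
    utils.default_flagfile()
    flags.FLAGS(sys.argv)
    service.serve()
    service.wait()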

The manager class has two methods that you override to get your stuff done:

  • init_host
  • periodic_tasks
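Here is a sketch of the shape of a manager subclass and how a service might drive it. Because nova.manager.Manager needs a full nova install, this uses a small hypothetical `Manager` base with the two hook methods named above, plus a hypothetical `run_service` helper that calls init_host() once at startup and periodic_tasks() on each tick. In the real framework the ticks come from a timer; here they are just a loop count.

```python
class Manager:
    """Hypothetical stand-in for nova.manager.Manager."""

    def __init__(self, host=None):
        self.host = host

    def init_host(self):
        """Hook: one-time setup when the service starts."""

    def periodic_tasks(self, context=None):
        """Hook: work to repeat every periodic interval."""


class ExampleServiceManager(Manager):
    def __init__(self, host=None):
        super().__init__(host)
        self.ready = False
        self.ticks = 0

    def init_host(self):
        # e.g. check local resources before accepting messages
        self.ready = True

    def periodic_tasks(self, context=None):
        # e.g. poll or clean up something on each interval
        self.ticks += 1

    def ping(self, context, value):
        # A method other services could invoke by name over RPC.
        return value


def run_service(manager, ticks):
    """Sketch of the framework's side: init once, then tick periodically."""
    manager.init_host()
    for _ in range(ticks):
        manager.periodic_tasks(context=None)
    return manager


if __name__ == "__main__":
    m = run_service(ExampleServiceManager(host="node1"), ticks=3)
    print(m.ready, m.ticks)  # True 3
```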

There are also some conventions around adding methods to your manager and invoking them using the service framework’s RPC mechanism, which I’ll dig into with another post.

Ref: Nova Developer Documentation
Ref: OpenStack Compute (Nova) Administration Manual
Ref: Openstack Wiki: Unified Service Architecture

Posted in Geekstuff, openstack | 5 Comments

Sunny summer mornings

It’s been ages since I wrote here, and it’s time to get back into that a bit. Since January I’ve switched jobs, which I have found to be immensely refreshing. My teams at Disney accomplished truly amazing things, including an incredibly innovative internal cloud hosting architecture, production support of a central Hadoop instance, and breaking through the learning curve on operationalizing a multi-tenant Hadoop cluster.

April started showing me a need to change things up. There’s not much to say about the whys and wherefores of deciding to leave Disney that is relevant (or appropriate) for a public forum like my blog. Suffice it to say the impetus hit. In late April I attended the second OpenStack design summit in Santa Clara, and then took a vacation up to Alaska to settle down and reset. At the end of that, I was ready, and I took on a new position at a new company where I can combine a number of things that I’m passionate about: DevOps and OpenStack. I guess that misses out a bit on the Mac/iPhone development, but it would be an immensely rare gig that could cover all three of those.

I can’t yet speak of the new company, save to say that it is exciting and challenging, and I really enjoy the crew that I’m working with. We are a top-secret, stealth startup at the moment… although the word should be coming out at OSCON 2011 with some pretty amazing announcements. In the meantime, I’m very happy to be back in the middle of actively working on open source projects, most specifically OpenStack.

Posted in Geekstuff | Leave a comment

javascript everywhere

There’s an interesting fast iteration in language efforts happening around javascript. It’s been pretty much dominating the client side arena for ages and started to make some interesting headway into the server side with Node.js, but now I am starting to see even greater depth in the language’s toolchain.

It all revolves around the development toolchain, and the base components to cobble together full desktop-UI quality applications. This happens to overlap extensively with the server side world too, but the details started really becoming clear after digging a bit into Sencha, SproutCore, and JavascriptMVC. And it’s JavascriptMVC that stands out to me.

If you look at the home page for the site, it presents itself as a collection of tools and mechanisms to do a full-bore standard application:

  • code generators
  • dependency management
  • build scripts
  • testing
  • templating
  • code cleaning and linting

and the list goes on. All of those components are in javascript, for javascript. It’s still incredibly fragmented as a language, but there are parallel efforts moving forward to drive automated testing, including unit testing on the server side. Node has its own built-in stuff, but there’s also a lot of effort to bring QUnit from the jQuery world into that same server side world.

Python and Ruby still have a dominant hold on the server side development, but I wonder for how much longer…

Posted in Geekstuff, Ranting and Reflections | 2 Comments

A week with the fitbit

I first heard about fitbit from a coworker who used it to track his daily running and activity. He showed it to me: it’s a pedometer with a bit more. It syncs wirelessly to a base station (laptop or desktop computer) and tracks your activity level as you wear it. It’s small and easy to tuck away; I’ve been wearing it under the collar of my shirt, and it’s been pretty effective.

I had a previous pedometer (it kept running out of battery), but after a week with the fitbit, I’m finding the big difference is the dashboard that it automatically keeps and maintains. You can sign up for the dashboard even without having a fitbit, although without one, what it really amounts to is an online journal for tracking food intake, weight, and other common measurements. With the fitbit, and some of the data going in on a regular basis, it’s really become more of a personal dashboard.

I’ve been trying to track food intake (i.e. calories) and weight manually. That’s going reasonably well, although I find it rather tricky to track calories for some of the food I eat; there isn’t an easy lookup. Getting breakfast at a health club cafe last weekend was the hardest: they couldn’t tell me anything about the nutritional value of what I ate. Ironic, given that it was a health club…

The fitbit cost $99, and frankly I bought it on a whim because I saw other people saying great things about it. I wanted to check it out, having liked having a pedometer before, and I’ve got to say, this particular gadget purchase has really worked out. I’m now much more aware of how much I’m walking (and how much I’m not walking), and it even estimates numbers like “how many calories I’ve burned,” so my attempts to track how much I’m taking in with food have something nice to balance against.

I’m pretty happy with the first week, and I’d recommend it to anyone wanting to keep track of their day to day activity. Now we’ll see what happens after a few weeks…

Posted in Geekstuff | 2 Comments

Site moved and updated

I have a little bit of additional work left before I can call it 100% complete, but the site is now available (as you’ve likely noticed) and updated internally. Still WordPress, but shifted to a new hosting location. My previous host decided to disable PHP without a whole lot of notice, and it took me a while to get everything back in order.

Posted in Ranting and Reflections | 2 Comments

jquerymobile and having design constraints

I’ve been working on this idea/project called Eyes for the past year on and off. Maybe for the past 4 to 6 months I’ve been stymied by what I want the visual representation to look like, struggling with the options of an open HTML design canvas. I’m afraid the sheer potential of anything there left me struggling – there were too many ideas of what I could do for a visual presentation and interaction, and I wasn’t able to really narrow it down all that well.

I’ve been following SproutCore for a while, sort of thinking about that as an interesting mechanism, and then more recently I saw the announcements about the jQueryMobile framework setup. Having done some iPhone and iPad development, the common metaphors of list views, tableviews, etc. felt very familiar. jQueryMobile isn’t all that to be perfectly honest, but the basic design feel is reasonably close.

I made a branch on Eyes and started laying it out, and I’ve been going gangbusters since. Laying in the constraints of the framework has really made the choices much easier to work out. Not that I haven’t run down some dead ends and had to back out, but I feel like I have a better sense of how the pages flow and work together.

And best of all, I have something that I’m comfortable will work on mobile devices, while not looking like crap on a larger view.

I’m still learning the various tidbits of the framework CSS stylings, and I’m already seeing some elements where I’m going to want to dig deeper. The only real struggle I’ve had is working with Ajax based forms submissions and debugging them when I bork it up. Doing so much javascript work and debugging is still pretty unfamiliar to me. I’m ending up with a lot of print(“..”) debugging on the back end, and alert(…) debugging in the javascript itself.

All in all, I’m very pleased with the effects and results of using jQueryMobile. I do rather wish I hadn’t left my jQuery book at work for the holiday weekend, though. I’m certainly not a jQuery expert, and it would have come in very handy more than once already. Thank god for excellent online documentation.

Posted in django, Geekstuff, Ranting and Reflections | 2 Comments

Eyes – a new monitoring system

It was over a year ago that I started getting really annoyed at the state of monitoring systems. They all do what you sort of expect a monitoring system to do: watch (poll) systems and alert you when something’s gone off. Pretty much anyone who’s done much system administration work knows the obvious critters: Nagios, Munin, Cacti, Zenoss. Zenoss is also where you start moving into the paid realm: GroundWorks, BMC Patrol, HP’s SiteScope, etc.

So here’s my annoyance: all of these systems, even the open source ones, are really set up to be managed by people, not other systems. They’re not built with APIs to create, update, check on, and delete monitors. Some of them come darned close (Zenoss, SiteScope, etc.). Others have this sort of worked around from the back side, i.e. Puppet or Chef recipes that generate Nagios configuration files.

So a year ago I decided that I could probably put something together that does the same basics, but has APIs built in from scratch. I did a lot of noodling on the idea, scratching ideas in notebooks and such, before I really kicked things off. Then I decided to see what I could do to wrap up a new system. I built it around Nagios plugins, mostly because there are a lot of them and they do some pretty good stuff right off the bat. And that’s all open source as well. And I wrapped that with a web application based on Django, because, well, I know Django reasonably well. The result is a basic system that I’m calling “Eyes”.
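Nagios plugins follow a simple contract that makes them easy to wrap: the exit code carries the state (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), and the first line of output is a human-readable status, optionally followed by “|” and performance data. A minimal sketch of running one from Python is below. `run_check` is a hypothetical helper, not code from Eyes, and the demo uses a tiny inline Python script in place of a real plugin.

```python
import subprocess
import sys

STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_check(command, timeout=30):
    """Run a Nagios-style plugin and return (state, message, perfdata)."""
    proc = subprocess.run(command, capture_output=True, text=True,
                          timeout=timeout)
    lines = proc.stdout.splitlines()
    first_line = lines[0] if lines else ""
    # Text before '|' is the status message; text after it is perfdata.
    message, _, perfdata = first_line.partition("|")
    # Any exit code outside the contract is treated as UNKNOWN.
    state = STATES.get(proc.returncode, "UNKNOWN")
    return state, message.strip(), perfdata.strip()


if __name__ == "__main__":
    fake_plugin = [sys.executable, "-c",
                   "print('HTTP OK - 0.05s response | time=0.05s'); "
                   "raise SystemExit(0)"]
    print(run_check(fake_plugin))
    # ('OK', 'HTTP OK - 0.05s response', 'time=0.05s')
```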

The whole source, from the very beginning, is on Bitbucket at http://bitbucket.org/heckj/eyes. And after poking at it on and off for nearly 12 months, I decided to wrap this up a bit and get it out there, so as of today I have my first release on PyPI, with documentation. And I created a mailing list, eyes-monitoring, on Google Groups to have some place to talk about it.

I very intentionally included documentation for the project from the very beginning, embedded with the project. I also very intentionally wrapped a large number of tests around the project, and have been checking and watching test coverage as the project grows and changes. The state today? Basically functional, but you have to know the innards to do much more with it.

Today it doesn’t have anything related to alerting or escalation within it. It doesn’t have much of anything around a user interface, and the data is stored all in RRD files. It’s at the point now where the basic framework is in place – and there’s a lot more to go: design of the user interface, blocking in alternate mechanisms to do monitoring, and fleshing out higher level features like notifications and alerts – both via email and alternative mechanisms (like web-hooks, or XMPP event streams).

What it is ready for is outside eyes. If you’re interested in contributing, or even sharing some ideas about what to do with Eyes, I’d love to hear from you. Fork the code and try it out, or join the mailing list (http://groups.google.com/group/eyes-monitoring) and share some of your ideas or thoughts.

Posted in devops, Geekstuff | 2 Comments

Nearly at the top of that first hill

I’ve been thinking about the past week at the OpenStack Design Summit (Bexar) solidly from last night (flying home from San Antonio, TX) through the various errands I’ve been running today. This morning Rick Clark tweeted “A question about OpenStack”. As I think about it, this shouldn’t be about what is going right and wrong, but where the project is and what will provide the most benefit by improving it.

I’m saying all this after a week with the OpenStack guys – both in design sessions and just chillin’ out. Focused, intelligent, demanding conversations scattered through the week with an amazing “no-ego” attitude presenting itself. Not that there weren’t some good ole technical “best way to do it” or “which is better” fights, but given the breadth of this project and the open nature with vendors lurking all around the corners – well, frankly I expected a lot more “special interest” to be clearly showing itself. Everyone at that conference was interested in making OpenStack better at every turn.

250 people, 12 countries, 90 companies/organizations, all that 3 months after being publicly announced. And they’re doing it without any prior structure: building up an OpenStack foundation, doing all the legal and community building, right from scratch. And yeah, that’s showing right now.

The first thing I see that will provide the biggest gains:

  • “How do we all work together?”

Some of the best sessions were around “What does the status X of bugs mean” and talking through the development and release process. At this point I’m convinced the core folks are reasonably comfortable with Launchpad (the platform the project is hosted on), and being at the conference really taught me a great deal about how OpenStack is effectively using it. Before that, it wasn’t comfortable or familiar to me at all. The object store and compute (swift and nova, respectively) core groups are really quite separate teams, all trying to figure out how to find some common ground in re-using code and libraries, and even in setting up documentation.

The second:

  • “Show me it’s workin’, again and again”

OpenStack is quickly heading toward being the kernel or core of a platform. You could see it in the twinkle of Eucalyptus’ eye when they talked about Swift (the object store), or chatting with the folks from Scalr or RightScale. The whole system is being built with APIs in mind from the ground up, and while there is some pretty good unit testing and continuous integration in play, it was clear that installing this sucker was a PITA, and the documentation to really pull that all together started coming together in the documentation sprint and install fest at the summit. One of the “blueprints” of the design summit (i.e. “Things we want to do, and how we want to do it for the next release”) is to get some fully automated integration testing as well as track metrics on how the system is operating. There were a lot of folks with some cross over into the Drizzle project, and the idea of running and tracking benchmark data on every revision is darned powerful.

Add to that the benefits of a constant flow of functional testing against a couple of pre-defined clusters of both compute and object store, and you have a powerful engine to make sure trouble is spotted early and can be resolved quickly.

The third:

  • “How’s this thing tick?”

One of the admitted weak points is that some small, damned effective core teams have done most of the work – and if you want to understand the system, well… you’ve just got to read the code. That is a huge investment – and frankly a barrier to entry into the project that can be avoided with some effort towards docs and discussion. Again, great progress was made there (I learned what the “project” concept was in Nova at the summit) – but the interactions between components, what the components are responsible for, and what they’re *not* responsible for, are all kind of tricky to learn right now.

This extends down into digging into the code, where docstrings could be better (and are getting better!) so that if you wanted to go help with something specific, you didn’t have to grok a broad codebase to get a handle on what the impacts are of the changes you’ll need to make.

And the last thing I’ll throw in here:

  • “What OpenStack isn’t, or won’t do…”

The project is still in a lot of flux. There were some great components that were shown off at the summit that ride over the top of the infrastructure, or work with it through APIs. Should those be a part of OpenStack, or on the side? Some service providers were very interested in more platform-kind of elements – a common logging infrastructure, a common authentication, ID, and authorization infrastructure. Should that be a part, or on the side? How tightly or loosely do we want to couple some of these elements? The philosophy is there and forming up, but the real truth of it all will be over the next 6-12 months of the project when decisions are made, reviewed, and a core forms out of it. There have been a few architectural decisions made early: “Don’t mandate anything in the client”, “If a feature would restrict scale, it MUST be optional”, etc. that I absolutely applaud. I think it will form up more as projects apply to join the OpenStack umbrella and either make it or don’t. It will become clear what’s common, and what isn’t, pretty darn quickly.

I’m pumped about this project, the people, and its future. The core openstackers have clearly been driving up a steep hill to get to where they can level out a bit and move into more of a marathon mode. Really, it feels like we’re nearly at the top of that first hill.

Posted in Geekstuff, openstack, Ranting and Reflections | 1 Comment