Planet Linux Australia


Clinton Roy: clintonroy

Tue, 2015-01-13 22:28


First day of miniconfs, I spent some of my time at the kernel miniconf and some at the Debian miniconf.

That night the ghosts dinner was on, caught up with a couple of Melbourne friends.

Filed under: diary

Simon Lyall: 2015 – Day 2 – Session 3 – Sysadmin

Tue, 2015-01-13 15:28

Alerting Husbandry – Julien Goodwin

  • Obsolete alerts
    • New staff members won’t have context to know what is obsolete and should have been removed (or ignored)
  • Unactionable alerts – It is managed by another team but thought you’d like to be woken up
  • SLA Alerts – can I do something about that?
  • Bad thresholds ( server with 32 cores had load of 4 , that is not load ), Disk space alerts either too much or not enough margin
  • Thresholds only redone after complete monitoring rebuilds
  • Hair trigger alerts ( once at 51ms not 50ms )
  • Not impacting redundancy ( only one of 8 web servers is down )
  • Spamming alerts, thing is down for the 2925379857th time. Even if important you’ve stopped caring
  • Alerts for something nobody cares about, eg test servers
  • Most of earlier items end up in “don’t care” bucket
  • Emails bad, within a few weeks the entire team will have a filter to ignore it.
  • Undocumented alerts – If it is broken, what am I supposed to do about it?
  • Document actions to take in  “playbook”
  • Alert acceptance practice, only oncallers should be accepting alerts
  • Need a way to silence it
  • Production by Fiat
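
One way to avoid the hair-trigger case above (a single sample at 51ms against a 50ms threshold) is to require several consecutive breaches before paging anyone. This is only a minimal sketch and was not part of the talk; the class name, the 50ms threshold and the count of 3 samples are all assumptions made for illustration.

```python
from collections import deque


class DebouncedAlert:
    """Fire only after `required` consecutive samples breach `threshold`."""

    def __init__(self, threshold, required=3):
        self.threshold = threshold
        self.required = required
        # Remember only the most recent `required` breach results.
        self.recent = deque(maxlen=required)

    def observe(self, value):
        """Record one sample; return True if the alert should fire."""
        self.recent.append(value > self.threshold)
        return len(self.recent) == self.required and all(self.recent)


# Hypothetical latency samples in milliseconds.
latency_alert = DebouncedAlert(threshold=50, required=3)
samples_ms = [48, 51, 49, 52, 53, 55]
fired = [latency_alert.observe(s) for s in samples_ms]
```

With these samples the isolated 51ms reading never pages; the alert fires only on the last sample, once three readings in a row are over the threshold.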



Managing microservices effectively – Daniel Hall

  • Step one – write your own apps
  • keep state outside apps
  • not nanoservices, not milliservices
  • Each should be replaceable, independently deployable, have a single capability
  • think about dependencies, especially circular
  • Packaging
    • small
    • multiple versions on same machine
    • in dev and prod
    • maybe use docker, have local registry
    • Small performance hit compared to VMs
    • Docker is a little immature
  • Step 3 deployment
    • Fast in and out
    • Minimal human interaction
    • Recovery from failures
    • Less overhead requires less overhead
    • We use Mesos and Marathon
    • Marathon handles switches from old app to new, task failure and recovery
    • Early on the Hype Cycle
  • Extra Credit: Scheduling
    • Chronos within Mesos
    • A bit newish


Corralling logs with ELK – Mark Walkom

  • You don’t want to be your boss’s grep
  • Cluster Elasticsearch, single master at any point
  • Sizing best to determine with single machine, see how much it can handle. Keep Java heap under 31GB
  • Lots of plugins and clients
  • APIs return JSON. ?pretty makes it look nicer. The “_cat/*” API is more command-line oriented
  • Adding a new node scales the cluster, which auto-balances and grows automatically
  • Logstash. lots of filters, handles just about any format, easy to setup.
  • Kibana – graphical front end for Elasticsearch
  • Curator, logstash-forwarder, grokdebugger

FAI — the universal deployment tool – Thomas Lange

  • From power off to applications running
  • It is all about installing software packages
  • Central administration and control
  • no master or golden image
  • can be expanded by hooks
  • plan your installation and FAI installs the plan
  • Boot up diskless client via PXE/tftp
  • creates partitions, file systems, installs, reboots
  • groups hosts by classes, multiple classes per host etc
  • Classes can be executables, writing to standard output, can be in shell, pass variables
  • partitioning, can handle LVM, RAID
  • Project started in 1999
  • Supports Debian based distributions including Ubuntu
  • Supports bare metal, VM, chroot, LiveCD, Golden image
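
The point that classes can be executables writing to standard output could look roughly like the sketch below. The talk notes mention shell; Python is used here only for illustration. The class names, the hostname prefixes and the one-name-per-line output are assumptions and are not taken from the FAI documentation.

```python
import socket


def classes_for(hostname, has_raid=False):
    """Return hypothetical class names for a host, most general first."""
    classes = ["DEFAULT"]
    if hostname.startswith("web"):
        classes.append("WEBSERVER")
    if hostname.startswith("db"):
        classes.append("DATABASE")
    if has_raid:
        classes.append("RAID")
    return classes


if __name__ == "__main__":
    # Emit the class names on standard output, one per line (assumed format).
    print("\n".join(classes_for(socket.gethostname())))
```

A host can land in several classes at once, which matches the "multiple classes per host" note above.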


Documentation made complicated – Eric Burgueno

  • Incomplete, out of date, inconsistent
  • Tools – Word, LibreOffice  -> Sharepoint
  • Sharepoint = lets put this stuff over here so nobody will read it ever again
  • txt , markdown, html. Need to track changes
  • Files can be put in version control.
  • Mediawiki
  • Wiki – uncontrolled proliferation of pages, duplicate pages
  • Why can’t documentation be mixed in with the configuration management
  • Documentation snippets
    • Same everywhere (mostly)
    • Reusable
  • Transclusion in MediaWiki (include one page inside another)
  • Modern versions of MediaWiki have parser functions, which display different content depending on a condition

Simon Lyall: 2015 – Day 2 – Session 2 – Sysadmin Miniconf

Tue, 2015-01-13 13:28

Mass automatic roll out of Linux with Windows as a VM guest – Steven Sykes

  • Was late and missed the start of the talk

etcd: distributed locking and service discovery – Brandon Philips

  • /etc distributed
  • open source, failure tolerant, durable, watchable, exposed via http, runtime configurable
  • API – get/put/del  basics plus some extras
  • Applications
    • Locksmith, distributed locks used when machines update
    • Vulcan http load balancer
  • Leader Election
    • TTL and atomic operations
    • Magical stuff explained faster than I can type it.
    • Just one leader cluster-wide
  • Aims for consistency ahead of raw performance
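
The "TTL and atomic operations" idea behind leader election can be sketched with a toy in-memory store. This is not the etcd API; the class, its method names and the explicit `now` argument are all invented for illustration, and real atomicity across machines is exactly the hard part that etcd provides and this sketch does not.

```python
class TinyStore:
    """In-memory stand-in for a key-value store with TTLs (not the etcd API)."""

    def __init__(self):
        self._data = {}  # key -> (value, expiry time)

    def _live(self, key, now):
        """Return the value if it has not expired, dropping it otherwise."""
        entry = self._data.get(key)
        if entry is None or entry[1] <= now:
            self._data.pop(key, None)
            return None
        return entry[0]

    def get(self, key, now):
        return self._live(key, now)

    def create_if_absent(self, key, value, ttl, now):
        """Succeed only when no live value exists for the key."""
        if self._live(key, now) is not None:
            return False
        self._data[key] = (value, now + ttl)
        return True

    def refresh_if_equal(self, key, expected, ttl, now):
        """Compare-and-swap style refresh: extend the TTL only for the holder."""
        if self._live(key, now) != expected:
            return False
        self._data[key] = (expected, now + ttl)
        return True


store = TinyStore()
a_wins = store.create_if_absent("leader", "node-a", ttl=10, now=0)
b_wins = store.create_if_absent("leader", "node-b", ttl=10, now=1)
a_renews = store.refresh_if_equal("leader", "node-a", ttl=10, now=5)
# node-a stops renewing, so its lease runs out at t=15.
b_takes_over = store.create_if_absent("leader", "node-b", ttl=10, now=16)
leader = store.get("leader", now=17)
```

Only one node holds the key at a time, and a leader that stops refreshing loses it once the TTL expires, which is how "just one leader cluster-wide" falls out of the two primitives.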


Linux at the University – Randy Appleton

  • No numbers on how many students use Linux
  • Peninsula Michigan
  • 3 schools
  • Michigan Tech
    • research, 7k students, 200 CS students, Sysadmin Majors in biz school
    • Linux used in Sysadmin courses, one of two main subjects
    • Research use Linux “alot”
    • Inactive LUG
    • Scripting languages. Python, perl etc
  • Northern Michigan
    • 9k students, 140 CS Majors
    • Growing CIS program
    • No Phd Programs
    • Required for sophomore and senior network programming course
    • Optional Linux sysadmin course
    • Inactive LUG
    • Sysadmin course: One teacher, app of the week (Apache, nfs, email ), shell scripting at end, big project at the end
    • No problem picking distributions, No problem picking topics, huge problem with disparate incoming knowledge
    • Kernel hacking. Difficult to do, difficult to teach, best students do great. Hard to teach the others
  • Lake Superior State
    • 2600 students
    • 70 CS Majors
    • One professor teaches Sysadmin and PHP/MySQL
    • No LUG
    • Not a lot of research
  • What is missing
    • Big power Universities
    • High Schools – None really
    • Community college – None really
  • Usage for projects
    • Sometimes, not for video games
  • Usage for infrastructure
    • Web sites, ALL
    • Beowulf Clusters
    • Databases – Mostly
  • Obstacles
    • Not in High Schools
    • Not on laptops, not supported by Uni
    • Need to attract liberal studies students
    • Is Sysadmin a core concept – not academic enough
  • What would make it better
    • Servers but not desktops
    • Not an edu distribution
    • Easier than Eclipse, better than Visual Studio

Untangling the strings: Scaling Puppet with inotify – Steven McDonald

  • Around 1000 nodes at site
  • Lots of small changes, specific to one node that we want to happen quickly
  • Historically restarting the puppet master after each update
  • Problem is the master gets slow as you scale up
  • 1300 manifests, takes at least a minute to read each startup
  • Puppet internal caching very coarse, per environment basis (and they have only one prod one)
  • Multiple environments doesn’t work well at site
  • Ideas – tell puppet exactly what files have changed with each rollout (via git, inotify). But puppet doesn’t support this
  • I missed the explanation of exactly how puppet parses the change. I think it is “import” which is getting removed in the future
  • Inotify seemed to be more portable and simpler
  • Speed up of up to 5 minutes for nodes with complex catalogs, 70 seconds off average agent run
  • implementation doesn’t support the future parser, re-opening the class in a separate file is not supported
  • Available on github. Doesn’t work with current ruby-inotify ( in current master branch )



Russell Coker: Systemd Notes

Tue, 2015-01-13 11:26

A few months ago I gave a lecture about systemd for the Linux Users of Victoria. Here are some of my notes reformatted as a blog post:

Scripts in /etc/init.d can still be used, they work the same way as they do under sysvinit for the user. You type the same commands to start and stop daemons.

To get a result similar to changing runlevel use the “systemctl isolate” command. Runlevels were never really supported in Debian (unlike Red Hat where they were used for starting and stopping the X server) so for Debian users there’s no change here.

The command systemctl with no params shows a list of loaded services and highlights failed units.

The command “journalctl -u UNIT-PATTERN” shows journal entries for the unit(s) in question. The pattern uses wildcards not regexs.

The systemd journal includes the stdout and stderr of all daemons. This solves the problem of daemons that don’t log all errors to syslog and leave the sysadmin wondering why they don’t work.

The command “systemctl status UNIT” gives the status and last log entries for the unit in question.

A program can use ioctl(fd, TIOCSTI, …) to push characters into a tty buffer. If the sysadmin runs an untrusted program with the same controlling tty then it can cause the sysadmin shell to run hostile commands. The system call setsid() to create a new terminal session is one solution but managing which daemons can be started with it is difficult. The way that systemd manages start/stop of all daemons solves this. I am glad to be rid of the run_init program we used to use on SE Linux systems to deal with this.

Systemd has a mechanism to ask for passwords for SSL keys and encrypted filesystems etc. There have been problems with that in the past but I think they are all fixed now. While there is some difficulty during development the end result of having one consistent way of managing this will be better than having multiple daemons doing it in different ways.

The commands “systemctl enable” and “systemctl disable” enable/disable daemon start at boot which is easier than the SysVinit alternative of update-rc.d in Debian.

Systemd has built in seat management, which is not more complex than consolekit which it replaces. Consolekit was installed automatically without controversy so I don’t think there should be controversy about systemd replacing consolekit.

Systemd improves performance by parallel start and autofs style fsck.

The command systemd-cgtop shows resource use for cgroups it creates.

The command “systemd-analyze blame” shows what delayed the boot process and “systemd-analyze critical-chain” shows the critical path in boot delays.

Systemd also has security features such as service private /tmp and restricting service access to directory trees.


For basic use things just work, you don’t need to learn anything new to use systemd.

It provides significant benefits for boot speed and potentially security.

It doesn’t seem more complex than other alternative solutions to the same problems.


Simon Lyall: – Day 2 – Session 1 – Sysadmin Miniconf

Tue, 2015-01-13 10:28

Configuration Management – A love Story – Javier Turegano

  • June 2008 – Devs want to deploy fast
  • June 2009 – git -> jenkins -> Puppet master
  • But things got pretty complicated and hard to maintain
  • Remove puppet master, puppet noop, but only happens now and then lots of changes but a couple of errors
  • Now doing manual changes
  • June 2010 – Things turned into a mess.
  • June 2011 – Devs want prod-like development
  • Cloud! Tooling! Chef! – each dev have their own environment
  • June 2012 – dev environments for all working in ec2
  • dev no longer prod-like. cloud vs datacentre, puppet vs chef , debian vs centos, etc
  • June 2013 – More into cloud, teams re-arranged
  • Build EC2 images and deploy out of jenkins. Either as AMI or as rpm
  • Each team fairly separate, doing things different ways. Had guilds to share skills and procedures and experience
  • June 2014 – Cloudformation, Ansible used by some groups, random

Healthy Operations – Phil Ingram

  • Acquia – Enterprise Drupal as a service. GovCMS Australian Federal Government. 1/4 are remote
  • Went from working in office to working from home
  • Every week had phone call with boss
  • Talk about things other than work, ask how people are going, talk to people.
  • Not sleeping, waking up at night, not exercising, quick to anger and negative thinking, inability to concentrate
  • Hadn’t taken more than 1 week off work, let exercise work, hobbies were computer stuff
  • In general being in Ops not as much of an option to take time off. Things stay broken until fixed
  • Unable to learn via Osmosis, Timing of handing over between shifts
  • People do not understand that computers are run by people not robots
  • Methods: Turn work off at the end of the day, Rubber Ducking, exercise

Developments in PCP (Performance Co-Pilot) : Nathan Scott

  • See my slides from yesterday for intro to PCP
  • Stuff in last 12 months
    • Included and supported in RHEL 6.6 and RHEL 7
    • Regular stable releases
    • Better out of the box experience
    • Tackling some long-standing problems
  • JSON access – pmwebd , interactive web charts ( Graphite, grafana )
  • zero-install look-inside containers
  • Docker support but written to allow use by others
  • Collectors
    • Lots of new kernel metrics additions
    • New applications from web devs (memcached, DNS, web )
    • DB server additions
    • Python PMDA interfaces
  • Monitor work
    • Reporting tools
    • Web tools, GUIs
  • Also improving ease of setup
  • Getting historical data from sar, iostat

Security options for container implementations – Jay Coles

  • What doesn’t work: rlimits, quotas, blacklisting via ACLs
  • Capabilities: Big list that containers probably shouldn’t have
  • Cgroups – Accounting, Limiting resource usage, tracking of processes, preventing/allowing device access
  • App Armor vs selinux – Use at least one, selinux a little more featured

Michael Still: Kilo Nova deploy recommendations

Tue, 2015-01-13 09:28
What would a Nova developer tell a deployer to think about before their first OpenStack install? This was the question I wanted to answer for my OpenStack miniconf talk, and writing this essay seemed like a reasonable way to take the bullet point list of ideas we generated and turn it into something that was a cohesive story. Hopefully this essay is also useful to people who couldn't make the conference talk.

Please understand that none of these are hard rules -- what I seek is for you to consider your options and make informed decisions. It's really up to you how you deploy Nova.

Operating environment

  • Consider what base OS you use for your hypervisor nodes if you're using Linux. I know that many environments have standardized on a given distribution, and that many have a preference for a long term supported release. However, Nova is at its most basic level a way of orchestrating tools packaged by your distribution via APIs. If those underlying tools are buggy, then your Nova experience will suffer as well. Sometimes we can work around known issues in older versions of our dependencies, but often those work-arounds are hard to implement (and therefore likely to be less than perfect) or have performance impacts. There are many problems you can encounter; hypervisor kernel panics and disk image corruption are just two examples. We are trying to work with distributions on ensuring they back port fixes, but the distributions might not always be willing to do that. Sometimes upgrading the base OS on your hypervisor nodes might be a better call.
  • The version of Python you use matters. The OpenStack project only tests with specific versions of Python, and there can be bugs between releases. This is especially true for very old versions of Python (anything older than 2.7) and new versions of Python (Python 3 is not supported for example). Your choice of base OS will affect the versions of Python available, so this is related to the previous point.
  • There are existing configuration management recipes for most configuration management systems. I'd avoid reinventing the wheel here and use the community supported recipes. There are definitely resources available for chef, puppet, juju, ansible and salt. If you're building a very large deployment from scratch consider triple-o as well. Please please please don't fork the community recipes. I know it's tempting, but contribute to upstream instead. Invariably upstream will continue developing their stuff, and if you fork you'll spend a lot of effort keeping in sync.
  • Have a good plan for log collection and retention at your intended scale. The hard reality at the moment is that diagnosing Nova often requires that you turn on debug logging, which is very chatty. Whilst we're happy to take bug reports where we've gotten the log level wrong, we haven't had a lot of success at systematically fixing this issue. Your log infrastructure therefore needs to be able to handle the demands of debug logging when it's turned on. If you're using central log servers think seriously about how much disk they require. If you're not doing centralized syslog logging, perhaps consider something like logstash.
  • Pay attention to memory usage on your controller nodes. OpenStack python processes can often consume hundreds of megabytes of virtual memory space. If you run many controller services on the same node, make sure you have enough RAM to deal with the number of processes that will, by default, be spawned for the many service endpoints. After a day or so of running a controller node, check in on the VMM used for python processes and make any adjustments needed to your "workers" configuration settings.

  • Estimate your final scale now. Sure, you're building a proof of concept, but these things have a habit of becoming entrenched. If you are planning a deployment that is likely to end up being thousands of nodes, then you are going to need to deploy with cells. This is also possibly true if you're going to have more than one hypervisor or hardware platform in your deployment -- it's very common to have a cell per hypervisor type or per hardware platform. Cells is relatively cheap to deploy for your proof of concept, and it helps when that initial deploy grows into a bigger thing. Should you be deploying cells from the beginning? It should be noted however that not all features are currently implemented in cells. We are working on this at the moment though.
  • Consider carefully what SQL database to use. Nova supports many SQL databases via sqlalchemy, but some are better tested and more widely deployed than others. For example, the Postgres back end is rarely deployed and is less tested. I'd recommend a variant of MySQL for your deployment. Personally I've seen good performance on Percona, but I know that many use the stock MySQL as well. There are known issues at the moment with Galera as well, so show caution there. There is active development happening on the select-for-update problems with Galera at the moment, so that might change by the time you get around to deploying in production. You can read more about our current Galera problems on Jay Pipes' blog.
  • We support read only replicas of the SQL database. Nova supports offloading read only SQL traffic to read only replicas of the main SQL database, but I do not believe this is widely deployed. It might be of interest to you though.
  • Expect a lot of SQL database connections. While Nova has the nova-conductor service to control the number of connections to the database server, other OpenStack services do not, and you will quickly out pace the number of default connections allowed, at least for a MySQL deployment. Actively monitor your SQL database connection counts so you know before you run out. Additionally, there are many places in Nova where a user request will block on a database query, so if your SQL back end isn't keeping up this will affect performance of your entire Nova deployment.
  • There are options with message queues as well. We currently support rabbitmq, zeromq and qpid. However, rabbitmq is the original and by far the most widely deployed. rabbitmq is therefore a reasonable default choice for deployment.

  • Not all hypervisor drivers are created equal. Let's be frank here -- some hypervisor drivers just aren't as actively developed as others. This is especially true for drivers which aren't in the Nova code base -- at least the ones the Nova team manage are updated when we change the internals of Nova. I'm not a hypervisor bigot -- there is a place in the world for many different hypervisor options. However, the start of a Nova deploy might be the right time to consider what hypervisor you want to use. I'd personally recommend drivers in the Nova code base with active development teams and good continuous integration, but ultimately you have to select a driver based on its merits in your situation. I've included some more detailed thoughts on how to evaluate hypervisor drivers later in this post, as I don't want to go off on a big tangent during my nicely formatted bullet list.
  • Remember that the hypervisor state is interesting debugging information. For example with the libvirt hypervisor, the contents of /var/lib/instances are super useful for debugging misbehaving instances. Additionally, all of the existing libvirt tools work, so you can use those to investigate as well. However, I strongly recommend you only change instance state via Nova, and not go directly to the hypervisor.

  • Avoid new deployments of nova-network. nova-network has been on the deprecation path for a very long time now, and we're currently working on the final steps of a migration plan for nova-network users to neutron. If you're a new deployment of Nova and therefore don't yet depend on any of the features of nova-network, I'd start with Neutron from the beginning. This will save you a possible troublesome migration to Neutron later.

Testing and upgrades
  • You need a test lab. For a non-trivial deployment, you need a realistic test environment. It's expected that you test all upgrades before you do them in production, and rollbacks can sometimes be problematic. For example, some database migrations are very hard to roll back, especially if new instances have been created in the time it took you to decide to roll back. Perhaps consider turning off API access (or putting the API into a read only state) while you are validating a production deploy post upgrade, that way you can restore a database snapshot if you need to undo the upgrade. We know this isn't perfect and are working on a better upgrade strategy for information stored in the database, but we will always expect you to test upgrades before deploying them.
  • Test database migrations on a copy of your production database before doing them for real. Another reason to test upgrades before doing them in production is because some database migrations can be very slow. It's hard for the Nova developers to predict which migrations will be slow, but we do try to test for this and minimize the pain. However, aspects of your deployment can affect this in ways we don't expect -- for example if you have large numbers of volumes per instance, then that could result in database tables being larger than we expect. You should always test database migrations in a lab and report any problems you see.
  • Think about your upgrade strategy in general. While we now support having the control infrastructure running a newer release than the services on hypervisor nodes, we only support that for one release (so you could have your control plane running Kilo for example while you are still running Juno on your hypervisors, you couldn't run Icehouse on the hypervisors though). Are you going to upgrade every six months? Or are you going to do it less frequently but step through a series of upgrades in one session? I suspect the latter option is more risky -- if you encounter a bug in a previous release we would need to back port a fix, which is a much slower process than fixing the most recent release. There are also deployments which choose to "continuously deploy" from trunk. This gets them access to features as they're added, but means that the deployments need to have more operational skill and a closer association with the upstream developers. In general continuous deployers are larger public clouds as best as I can tell.
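
The one-release rule for mixed control plane and hypervisor versions can be written down as a small check. This is only an illustrative sketch; the function name is hypothetical, the release list covers only the three releases named above, and nothing like this ships with Nova as far as this post says.

```python
# Releases named in this post, oldest to newest.
RELEASES = ["icehouse", "juno", "kilo"]


def skew_supported(control_plane, hypervisor):
    """True if hypervisors run the control plane's release or exactly one older."""
    gap = RELEASES.index(control_plane) - RELEASES.index(hypervisor)
    return gap in (0, 1)
```

For example, `skew_supported("kilo", "juno")` is true while `skew_supported("kilo", "icehouse")` is false, matching the Kilo, Juno and Icehouse case described in the bullet.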

libvirt specific considerations
  • For those intending to run the libvirt hypervisor driver, not all libvirt hypervisors are created equal. libvirt implements pluggable hypervisors, so if you select the Nova libvirt hypervisor driver, you then need to select what hypervisor to use with libvirt as well. It should be noted however that some hypervisors work better than others, with kvm being the most widely deployed.
  • There are two types of storage for instances. There is "instance storage", which is block devices that exist for the life of the instance and are then cleaned up when the instance is destroyed. There is also block storage provided by Cinder, which is persistent and arguably easier to manage than instance storage. I won't discuss storage provided by Cinder any further however, because it is outside the scope of this post. Instance storage is provided by a plug in layer in the libvirt hypervisor driver, which presents you with another set of deployment decisions.
  • Shared instance storage is attractive, but it comes at a cost. Shared instance storage is an attractive option, but isn't required for live migration of instances using the libvirt hypervisor. Think about the costs of shared storage though -- for example putting everything on network attached storage is likely to be expensive, especially if most of your instances don't need the facility. There are other options such as Ceph, but the storage interface layer in libvirt is one of the areas of code where we need to improve testing so be wary of bugs before relying on those storage back ends.

Thoughts on how to evaluate hypervisor drivers

As promised, I also have some thoughts on how to evaluate which hypervisor driver is the right choice for you. First off, if your organization has a lot of experience with a particular hypervisor, then there is always value in that. If that is the case, then you should seriously consider running the hypervisor you already have experience with, as long as that hypervisor has a driver for Nova which meets the criteria below.

What's important is to be looking for a driver which works well with Nova, and a good measure of that is how well the driver development team works with the Nova development team. The obvious best case here is where both teams are the same people -- which is true for drivers that are in the Nova code base. I am aware there are drivers that live outside of Nova's code repository, but you need to remember that the interface these drivers plug into isn't a stable or versioned interface. The risk of those drivers being broken by the ongoing development of Nova is very high. Additionally, only a very small number of those "out of tree" drivers contribute to our continuous integration testing. That means that the Nova team also doesn't know when those drivers are broken. The breakages can also be subtle, so if your vendor isn't at the very least doing tempest runs against their out of tree driver before shipping it to you then I'd be very worried.

You should also check out how many bugs are open in LaunchPad for your chosen driver (this assumes the Nova team is aware of the existence of the driver I suppose). Here's an example link to the libvirt driver bugs currently open. As well as total bug count, I'd be looking for bug close activity -- it's nice if there is a very small number of bugs filed, but perhaps that's because there aren't many users. It doesn't necessarily mean the team for that driver is super awesome at closing bugs. The easiest way to look into bug close rates (and general code activity) would be to check out the code for Nova and then look at the log for your chosen driver. For example for the libvirt driver again:

$ git clone
$ cd nova/nova/virt/driver/libvirt
$ git log .

That will give you a report on all the commits ever for that driver. You don't need to read the entire report, but it will give you an idea of what the driver authors have recently been thinking about.

Another good metric is the specification activity for your driver. Specifications are the formal design documents that Nova adopted for the Juno release, and they document all the features that we're currently working on. I write summaries of the current state of Nova specs regularly, which you can see posted at with this being the most recent summary at the time of writing this post. You should also check how much your driver authors interact with the core Nova team. The easiest way to do that is probably to keep an eye on the Nova team meeting minutes, which are posted online.

Finally, the OpenStack project believes strongly in continuous integration testing. Testing has clear value in the number of bugs it finds in code before our users experience them, and I would be very wary of driver code which isn't continuously integrated with Nova. Thus, you need to ensure that your driver has well maintained continuous integration testing. This is easy for "in tree" drivers, as we do that for all of them. For out of tree drivers, continuous integration testing is done with a thing called "third party CI".

How do you determine if a third party CI system is well maintained? First off, I'd start by determining if a third party CI system actually exists by looking at OpenStack's list of known third party CI systems. If the third party isn't listed on that page, then that's a very big warning sign. Next you can use Joe Gordon's lastcomment tool to see when a given CI system last reported a result:

$ git clone
$ ./ --name "DB Datasets CI"
last 5 comments from 'DB Datasets CI'
[0] 2015-01-07 00:46:33 (1:35:13 old) 'Ignore 'dynamic' addr flag on gateway initialization'
[1] 2015-01-07 00:37:24 (1:44:22 old) 'Use session with neutronclient'
[2] 2015-01-07 00:35:33 (1:46:13 old) 'libvirt: Expanded test libvirt driver'
[3] 2015-01-07 00:29:50 (1:51:56 old) 'ephemeral file names should reflect fs type and mkfs command'
[4] 2015-01-07 00:15:59 (2:05:47 old) 'Support for ext4 as default filesystem for ephemeral disks'

You can see here that the most recent run is 1 hour 35 minutes old when I ran this command. That's actually pretty good given that I wrote this while most of America was asleep. If the most recent run is days old, that's another warning sign. If you're left in doubt, then I'd recommend appearing in the OpenStack IRC channels on freenode and asking for advice. OpenStack has a number of requirements for third party CI systems, and I haven't discussed many of them here. There is more detail on what OpenStack considers a "well run CI system" on the OpenStack Infrastructure documentation page.
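
If you wanted to automate the "days old is a warning sign" check, a rough sketch could parse the ages out of a report like the one above. This is not part of the lastcomment tool. The function names and the one day limit are hypothetical, and the "N days, H:MM:SS" form for older results is an assumption, since the output shown here only contains ages under a day.

```python
import re
from datetime import timedelta

# Matches "(1:35:13 old)" and, by assumption, "(3 days, 2:00:00 old)".
AGE = re.compile(r"\((?:(\d+) days?, )?(\d+):(\d{2}):(\d{2}) old\)")


def newest_age(report):
    """Return the smallest age found in the report, or None if nothing matches."""
    ages = [
        timedelta(days=int(d or 0), hours=int(h), minutes=int(m), seconds=int(s))
        for d, h, m, s in AGE.findall(report)
    ]
    return min(ages) if ages else None


def looks_stale(report, limit=timedelta(days=1)):
    """Treat a report with no results, or only old results, as a warning sign."""
    age = newest_age(report)
    return age is None or age > limit


sample = """\
[0] 2015-01-07 00:46:33 (1:35:13 old) 'Ignore 'dynamic' addr flag on gateway initialization'
[1] 2015-01-07 00:37:24 (1:44:22 old) 'Use session with neutronclient'
"""
```

Run against the two sample lines, the newest result is 1 hour 35 minutes old, so `looks_stale(sample)` is false; an empty report is treated as stale.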

General operational advice

Finally, I have some general advice for operators of OpenStack. There is an active community of operators who discuss their use of the various OpenStack components at the openstack-operators mailing list. If you're deploying Nova you should consider joining that mailing list. While you're welcome to ask questions about deploying OpenStack at that list, you can also ask questions at the more general OpenStack mailing list if you want to.

There are also many companies now which will offer to operate an OpenStack cloud for you. For some organizations engaging a subject matter expert will be the right decision. Probably the most obvious way to evaluate which of those companies to use is to look at their track record of successful deployments, as well as their overall involvement in the OpenStack community. You need a partner who can advocate for you with the OpenStack developers, as well as keeping an eye on what's happening upstream to ensure it meets your needs.


Thanks for reading so far! I hope this document is useful to someone out there. I'd love to hear your feedback -- are there other things we wish deployers considered before committing to a plan? Am I simply wrong somewhere? Finally, this is the first time that I've posted an essay form of a conference talk instead of just the slide deck, and I'd be interested in whether people find this format more useful than a YouTube video post conference. Please drop me a line and let me know if you find this useful!

Tags for this post: openstack nova

Related posts: One week of Nova Kilo specifications; Specs for Kilo; Juno nova mid-cycle meetup summary: nova-network to Neutron migration; Juno Nova PTL Candidacy; Juno nova mid-cycle meetup summary: scheduler; Juno nova mid-cycle meetup summary: ironic


Simon Lyall: – Day 2 – Keynote by Eben Moglen

Tue, 2015-01-13 08:28

Last spoke 10 years ago in Canberra

Things have improved in the last ten years

  • $10s of billions of value have been lost in software patent war
  • But things have been so bad that some help was acquired, so worst laws have been pushed back  a little
  • “Fear of God” in industry was enough to push open Patent pools
  • Judges determined that Patent law was getting pathological, 3 wins in Supreme court
  • Likelihood worst patent laws will be applied against free software devs has decreased
  • “The Nature of the problem has altered because the world has altered”

The Next 10 years

  • Most important Patent system will be China’s
  • Lack of rule of law in China will cause problems in environment of patents
  • Too risky for somebody to try and stop a free software project. We have “our own baseball bat” to spring back at them

The last 10 years

  • Changes in Society more important than changes in software
  • 21st century vs 20th century social organisations
    • Less need for hierarchy and secrecy
    • Transparency, Participation, non-hierarchical interaction
  • OS invented that organisation structure
  • Technology we made has taken over the creation of software
  • “Where is BitKeeper now?” – Eben Moglen
  • Even Microsoft recognises that our way of software making won
  • Long term the organisation structure change everywhere will be more important than just its application in Software
  • If there has been good news about politics = “we did it”, bad news = “we tried”

Our common Values

  • “Bridge entire environment between vi and emacs”


  • Without PGP and free software then things could have been worse
  • The world would be a far more despotic place if PGP was driven underground back in 1993. Imagine today’s Net without HTTPS or SSH!
  • “We now live in the world we are afraid of”
  • “What stands between them and us is our inventions”
  • “Freedom itself depends on how we make use of the technologies we are creating.” – Eben Moglen
  • “You can’t trust what you can’t read”
  • Big power in the wrong is committed against the first law of robotics; they want technology to work for them.
  • From guy in twitter – “You can’t trust what you can’t read.” True, but if OpenSSL teaches us anything you can’t necessarily trust what you can
  • Attitudes in under-18s are a lot more positive towards him than those who are older (not just cause he looks like Harry Potter)
  • GNU Project is 30 years old, almost the same age as Snowden


  • We can’t control the net but opportunity to prevent others from controlling it
  • Opportunity to prevent failure of freedom
  • Society is changing, demographics under control
  • But 1.6 billion people live in China, America is committed to spying, consumer companies are committed to collecting consumer information
  • Collecting everything is not the way we want the net to work
  • We are playing for keeps now.

News: Tuesday Keynote Speaker - Professor Eben Moglen

Tue, 2015-01-13 05:28

Today we have our first Keynote speaker - Professor Eben Moglen, Executive Director of the Software Freedom Law Center and professor of Law and Legal History at Columbia University Law School.

Professor Moglen's presentation is scheduled for 09:00 am Tuesday, 13 January 2015 so don't be late.

Professor Moglen has represented many of the world's leading free software developers. He earned his PhD in History and his law degree at Yale University during what he sometimes calls his “long, dark period” in New Haven.

After law school he clerked for Judge Edward Weinfeld of the United States District Court in New York City and for Justice Thurgood Marshall of the United States Supreme Court. He has taught at Columbia Law School since 1987 and has held visiting appointments at Harvard University, Tel Aviv University and the University of Virginia.

In 2003 he was given the Electronic Frontier Foundation's Pioneer Award for efforts on behalf of freedom in the electronic society.

The LCA 2015 Auckland Team

Jeremy Visser: Floppy drive music

Mon, 2015-01-12 21:55

Some time in 2013 I set up a rig to play music with a set of floppy drives. At 2015 in Auckland I gave a brief lightning talk about this, and here is a set of photos and some demo music to accompany.

The hardware consists of six 3.5″ floppy drives connected to a LeoStick (Arduino) via custom vero board that connects the direction and step pins (18 and 20, respectively) as well as permanently grounding the select pin A (14).

The LeoStick is then connected via USB to a laptop, where the Moppy software (not written by me) is loaded onto the LeoStick and its companion Java software is run on the laptop.

The Moppy software expects MIDI format music as input. For me, I use Rosegarden to create and edit the MIDI files.

Music must be specially arranged for use with the floppies. Floppy drives have a useful range of only one octave, and notes must typically be shortened to be distinguished. As a general rule, halve the length of the note that would otherwise be played. In some cases if a particular part is not loud enough, I double it with a second floppy drive (usually playing an octave above/below rather than in unison, as balance issues are usually fundamental to the particular octave).
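
The two arranging rules above (stay within one octave, halve the note lengths) can be sketched mechanically as a first pass before arranging by ear. This is only an illustration: the note objects, the tick units, and the idea of folding out-of-range pitches by whole octaves are assumptions, not part of Moppy or Rosegarden.

```javascript
// Fold a MIDI pitch into the single octave starting at lowestPitch.
// ASSUMPTION: moving a note by whole octaves is acceptable for the piece.
function foldIntoOctave(pitch, lowestPitch) {
  const offset = (((pitch - lowestPitch) % 12) + 12) % 12;
  return lowestPitch + offset;
}

// notes: array of { pitch, start, duration }, with times in MIDI ticks
// (hypothetical shape). Each note keeps its start time but sounds for
// half as long, so repeated notes remain distinguishable.
function arrangeForFloppy(notes, lowestPitch) {
  return notes.map((n) => ({
    pitch: foldIntoOctave(n.pitch, lowestPitch),
    start: n.start,
    duration: Math.max(1, Math.floor(n.duration / 2)),
  }));
}
```

Which octave a given drive actually plays well is something to find by experiment, so `lowestPitch` is left as a parameter.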

Each MIDI channel means a separate floppy drive. In Rosegarden, even though it is possible to play multiple notes simultaneously on a single MIDI channel, this is not supported by Moppy. MIDI channels must be numbered sequentially (this is a drop-down option in Rosegarden) unless my patch is used.

It is possible to arrange music by starting with a ready-made MIDI file obtained from public domain sources such as Mutopia Project or IMSLP. In this case it is vital to reorchestrate the music.

Rosegarden’s “Split by pitch” feature makes it much more efficient to transcribe piano music (or music that contains multiple parts in one MIDI channel) by allowing you to separate the top-most or bottom-most notes into separate channels. Rinse and repeat until you have filled all your floppy MIDI channels.

And here are the recordings:

Please be aware that I had a very limited recording setup, and the above recordings are (a) full of noise, and (b) incredibly soft. Crank that volume up!

Simon Lyall: – Day 1 – Session 3 – Containers

Mon, 2015-01-12 15:29

Building a PaaS with Docker, Kubernetes, and Hard Work – Steven Pousty

  • Slides –
  • All about Openshift
  • So why a new Paas?
  • Project Atomic – stripped down RHEL install, everything else as a container. ostree file system, same kernel as RHEL
  • Kubernetes intro
    • Kubernetes Daemon – Routing for services
    • Scheduler etc
  • Openshift
    • Built-in software defined networking – Open vSwitch, HAProxy load balancing etc
  • Takeaway
    • PAAS seems to be cool again


Galera with Docker: How Synchronous Replication and Linux Containers Mesh Together – Raghavendra Prabhu

  • I got lost in the talk


Cloud, Containers, and Orchestration Panel -  Katie Miller

  • Steven Pousty, Brandon Philips, Tycho Andersen

  • Standard is Docker’s to lose and they might manage it
  • 3-4 years before we should standardise them. Need to experiment first.
  • The kernel API imposes some limits on diversity
  • Lots of other stuff


Simon Lyall: 2015 – Day 1 – Session 2 – Containers

Mon, 2015-01-12 13:28

AWS OpsWorks Orchestration War Stories – Andrew Boag

  • Autoscaling too slow since running build-from-scratch every time
  • Communications dependencies
  • Full stack rebuild in 20-40 minutes to use data currently in production
  • A bit longer in a different region
  • Great for load testing
  • If we were doing it again
    • AMI-based better
    • OpsWorks not suitable for all AWS stacks
    • Golden master for flexibility
  • Auto-Scaling
    • Not every AMI instance is Good to Go upon provisioning
    • Not a magic bullet, you can’t broadly under-provision
    • needs to be thoroughly load-tested
  • Tips
    • Dual factor authentication
    • No single person / credentials should be able to delete all cloud-hosted copies of your data
  • Looked at Cloudformation at start, seemed to be more work
  • Fallen out of love with OpsWorks
  • Nice distinction by Andrew Boag: he doesn’t talk about “lock-in” to cloud providers, but about “cost to exit”.   – Quote from Paul


Slim Application Containers from Source – Sven Dowideit

  • Choose a base image and make a local version (so all your stuff uses the same one)
  • I’d pick debian (a little smaller) unless you can make do with busybox or scratch
  • Do I need these files? (check through the Dockerfile) eg remove docs files, manpages, timezones
  • Then build, export, import and it comes all clean with just one layer.
  • If all your images use same base, only on the disk once
  • Use related images with all your tools, related to deployment image but with the extra dev, debug, network tools
  • Version the dev images
  • Minimise to 2 layers
    • look at docker-squash
    • Get rid of all the source code from your image, just end up with what’s needed, not junk hidden in layers
  • Static micro-container nginx
    • Build as container
    • export as tar , reimport
    • It crashes
    • Use inotifywait to find what extra files (like shared libraries) it needs
    • Create new tarball with those extra files and “docker import” again
    • Just 21MB instead of 1.4GB with all the build fragments and random system stuff
    • Use docker build as last stage rather than docker import and you can run nginx from docker command line
    • Make 2 tar files, one for each image, one in libs/etc, second is nginx
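
As a rough sketch of the "find the extra files, then build a new tarball" step in the nginx example above: given the paths that were touched while the container ran, produce a clean member list for the new tarball. The input is assumed to be one full path per line, for instance as printed by inotifywait with `--format '%w%f'`. The function name and the filtering rules are hypothetical, not taken from the talk.

```javascript
// Turn a list of accessed paths into a sorted, de-duplicated list of
// tar member names relative to `root`.
// ASSUMPTION: entries ending in "/" are directories and can be skipped,
// because tar creates parent directories for the files it includes.
function tarMembers(accessedPaths, root = '/') {
  const prefix = root.endsWith('/') ? root : root + '/';
  const seen = new Set();
  for (const raw of accessedPaths) {
    const p = raw.trim();
    if (p === '' || !p.startsWith(prefix) || p.endsWith('/')) continue;
    seen.add(p.slice(prefix.length));
  }
  return [...seen].sort();
}
```

The resulting list could be written to a file and handed to tar, and the tarball then fed to “docker import” as described in the notes.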


Containers and PCP (Performance Co-Pilot) -  Nathan Scott

  • Been around for 20+ years, 11 years open source, Not a big mindshare
  • What is PCP?
    • Toolkit, System level analysis, live and historical, Extensible, distributed
    • pmcd daemon on each server, plus for various functions (a bit like the collectd model)
    • pmlogger, pmchart, pmie, etc talk (pull or poll) to pmcd to get data
  • With Containers
    • Use --container=  to grab info inside a container/namespace
    • Lots of work still needed. Metrics inside containers limited compared to native OS


The Challenges of Containerizing your Datacenter – Daniel Hall

  • Goals at LIFX
    • Apps all stateless, easy to dockerize
    • Using mesos, zookeeper, marathon, chronos
    • Databases and other stuff outside that cloud
  • Mesos slave launches docker containers
  • Docker Security
    • chroot < Docker < KVM
    • Running untrusted Docker containers is a BAD IDEA
    • Don’t run apps as root inside container
    • Use a recent kernel
    • Run as little as possible in container
    • Single static app if possible
    • Run SELinux on the host
  • Finding things
    • Lots of microservices, marathon/mesos moves things all over the place
    • Whole machines going up and down
    • Marathon comes with a tool that pushes its state into HAProxy, works fairly well, apps talk to localhost on each machine and haproxy forwards
    • Use custom script for this
  • Collecting Logs
    • Not a good solution
    • can mount /dev/log but don’t restart syslog
    • Mesos collects stdout/stderr , hard to work with and no timestamps
    • Centralized logs
    • rsyslog log to -> haproxy -> central machine
    • Sometimes needs to queue/drop if things take a little while to start
    • rsyslog -> logstash
    • elasticsearch on mesos
    • nginx tasks running kibana
  • Troubleshooting
    • Similar to service discovery problem
    • Easier to get into a container than getting out
    • Find a container in marathon
    • Use docker exec to run a shell, doesn’t work so well on really thin containers
    • So debugging tools can work from outside, pprof or jconsole can connect to exposed port/pid of container
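
The “Finding things” approach above (apps talk to localhost, haproxy forwards to wherever the task is currently running) boils down to regenerating a HAProxy config whenever tasks move. A minimal sketch of that rendering step follows. The input shape (`name`, `servicePort`, `tasks`) is a made-up structure for illustration, not Marathon's API format, and this is not the script used at LIFX.

```javascript
// Render one HAProxy "listen" section per app. Each app is reachable on
// 127.0.0.1:<servicePort> and is forwarded to whichever hosts and ports
// its tasks currently occupy.
function renderHaproxy(apps) {
  const out = [];
  for (const app of apps) {
    out.push(`listen ${app.name}`);
    out.push(`  bind 127.0.0.1:${app.servicePort}`);
    out.push('  mode tcp');
    app.tasks.forEach((task, i) => {
      // "check" enables HAProxy health checks for this backend server
      out.push(`  server ${app.name}-${i} ${task.host}:${task.port} check`);
    });
    out.push('');
  }
  return out.join('\n');
}
```

A wrapper would then write the result to the HAProxy config file and reload HAProxy only when the rendered text has changed.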

BlueHackers: Is depression a kind of allergic reaction? | The Guardian

Mon, 2015-01-12 11:12
“A growing number of scientists are suggesting that depression is a result of inflammation caused by the body’s immune system” An interesting idea, worth further research I think. The article content is clear that the scientists don’t regard it as a possible single cause, which is sensible as physical and psychological factors as well as environmental ones are likely to play a role.

Simon Lyall: 2015 – Day 1 – Session 1 – Containers

Mon, 2015-01-12 10:28

Clouds, Containers, and Orchestration Miniconf


Cloud Management and ManageIQ – John Mark Walker

  • Who needs management – Needs something to tie it all together
  • New Technology -> Adoption -> Proliferation -> chaos -> Control -> New Technology
  • Many technologies follow this, flies under the radar, becomes a problem to control, management tools created, management tools follow the same pattern
  • Large number of customers using hybrid cloud environment ( 70% )
  • Huge potential complexity, lots of requirements, multiple vendors/systems to interact with
  • ManageIQ
    • Many vendor managed open source products fail – open core, runt products
    • Better way – give more leeway to upstream developers
    • Article about taking it opensource on Took around a year from when decision was made
    • Lots of work to create a good open source project that will grow
    • Release named after Chess Grandmasters
    • Rails App


LXD: The Container-Based Hypervisor That Isn’t -  Tycho Andersen

  • Part of Openstack
  • Based on LXC , container based hypervisor
  • Secure by default: user namespaces, cgroups, Apparmor, etc
  • A daemon that doesn’t hypervisory things
  • A framework for maintaining container based applications
  • It Isn’t
    • No network configuration
    • No storage management – But storage aware
    • Not an application container tool
    • handwavy difference between it and docker, I’m sure it makes sense to some people. Something about running an init/systemd rather than the app directly.
  • Features
    • Snapshotting – eg something that is slow to start, snapshot after just starts and deploy it in that state
    • Injection – add files into the container for app to work on.
    • Migration – designed to go fairly fast with low downtime
  • Image
    • Public and private images
    • can be published
  • Roadmap
    • MVP 0.1 released late January 2015
    • container management only


Rocket and the App Container Spec – Brandon Philips

  • Single binary – rkt – runs everywhere, systemd not required
  • rkt fetch – downloads and discovers images ( can run as non-root user )
  • bash -> rkt -> application
  • upstart -> rkt -> application
  • rkt run
  • multiple processes in container common. Multiple can be run from command line or specified in json file of spec.
  • Steps in launch
    • stage 0 – downloads images, checks it
    • Stage 1 – Exec as root, setup namespaces and cgroups, run systemd container
    • Stage 2 – runs actual app in container. Things like policy to restart the app
    • rocket-gc garbage collects stuff , runs periodically. no management daemon
  • App Container spec is work in progress
    • images, files, compressed, meta-data, dependencies on other images
    • runtime , restarts processes, run multiple processes, run extra procs under specified conditions
    • metadata server
    • Intended to be built with test suite to verify

News: 2015 is now live #BeHere #lca2015

Mon, 2015-01-12 10:28

Registration is downstairs (level 0) at the Owen G Glenn Business School at the University of Auckland.

We look forward to seeing you there, and we even have a coffee cart in the vicinity to keep those caffeine levels up.

Map of Owen G Glenn Building (OGGB)

Look for redshirts around the University - they will be happy to point you in the right direction!

Sridhar Dhanapalan: Twitter posts: 2015-01-05 to 2015-01-11

Mon, 2015-01-12 01:27

Clinton Roy: clintonroy

Sun, 2015-01-11 22:28


Spent the better part of the morning buying shorts that somehow didn’t make the trip with me…

Registered for, looks like I might get two t-shirts if the second batch of corrected printing comes in on time.

Dinner with old friends at an Indian place here. Managed to thank Paul McKenney for librcu.

Filed under: diary

Clinton Roy: clintonroy

Sun, 2015-01-11 22:28


Spent most of the day at the Auckland War memorial/museum. There’s an emphasis on natural history, Maori culture and conflict history. There’s a lot of stuff in the museum and I was running out of puff by the end of it.

Filed under: diary

Clinton Roy: clintonroy

Sun, 2015-01-11 22:28


Flew to Auckland for my fourteenth

Somehow ended up with a ticket that didn’t give me a meal during the flight.

Filed under: diary

News: Conference registration opens 2pm today #BeRegistered

Sun, 2015-01-11 10:28

Registration this year will be downstairs (level 0) at the Owen G Glenn Business School at the University of Auckland.

We look forward to seeing you there, and we even have a coffee cart in the vicinity to keep those caffeine levels up.

Map of Owen G Glenn Building (OGGB)

Follow the yellow path starting on Level 1 (see map below) to the rego desk which is on level 0 (downstairs).

The map below shows the level 0 (downstairs) of the OGGB building

Look for redshirts around the University - they will be happy to point you in the right direction!

Ben Martin: Terry 2.0: The ROS armada begins!

Fri, 2015-01-09 23:58
It all started with wanting to use a Kinect or other RGBD (Depth sensing) camera to do navigation... Things ended up slowly but surely with moving from a BeagleBone Black and custom nodejs script that I created as the heart to a quad core atom running ROS and many ROS nodes that I created ;)

The main gain to ROS is the nodes that other people have written. If you want to convert RGBD to a simulated laser scan in order to do 2d navigation then that's already available. If you want to make a map and then use it then that code is already there for you. And the visualization for these things. I'm not sure I'd have the time to write from scratch a 3d robot viewer and visualize my cut down 'fake' 2d laser scan data from the Kinect in OpenGL. But with ROS I got the joy of seeing the scan change in real time as Terry sensed me move in front of it.

I now have 3d control of the robot arm happening, including optional sinusoidal encoding of movements. The fun part is that the use of sinusoidal can be enabled or disabled without any code changes. I wrote that part as a JointTrajectory shim. For something to use smoother movement all it has to do is publish to that shim instead of directly to the servo controller itself. The publish and subscribe parts of the IPC that ROS has are very easy to get used to and allow breaking up the functionality into rather small pieces if desired.
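
To give a feel for what sinusoidal encoding of a movement means, here is a small sketch of the interpolation such a shim might apply between a joint's current and target positions. It is an illustration only, not the JointTrajectory shim itself; the function names and the step count are made up.

```javascript
// Map progress t in [0, 1] onto a half cosine wave, so the motion starts
// and ends slowly and moves fastest in the middle.
function sinusoidalEase(t) {
  const clamped = Math.min(1, Math.max(0, t));
  return (1 - Math.cos(Math.PI * clamped)) / 2;
}

// Produce steps + 1 joint positions from start to end (steps >= 1).
function easedTrajectory(start, end, steps) {
  const points = [];
  for (let i = 0; i <= steps; i++) {
    points.push(start + (end - start) * sinusoidalEase(i / steps));
  }
  return points;
}
```

Sending these intermediate positions to the servo at a fixed rate gives the smoother start and stop, compared with commanding the end position directly.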

The arm is one area that is ROS controlled, but not quite the way I want. It seems that using MoveIt is indicated for arm control but I didn't manage to get that to work as yet. The wizard only produced an arm that would articulate on one joint, so more tinkering is needed in that area. Instead I wrote my own ROS node to control the arm. It's all fairly basic trig to get the gripper at an x,y,z relative to the base of the arm. And an easy carry over to fix the gripper at a horizontal to the base no matter what position the arm is moved to. But in the future the option to MoveIt will be considered, can't hurt to have two codepaths to choose from for arm control.
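
For readers curious about the trig, the following is a generic two-link inverse kinematics sketch with a rotating base and a wrist that is kept horizontal. It assumes an idealised arm (a base yaw joint, then shoulder, elbow and wrist pitch joints, with link lengths `upperLen` and `foreLen`). Terry's real geometry and joint conventions may well differ, so treat every name here as hypothetical.

```javascript
// Solve joint angles (radians) that place the wrist at x, y, z relative to
// the shoulder. Returns null when the target is out of reach.
// Angles are measured counter-clockwise from the horizontal.
function solveArm(x, y, z, upperLen, foreLen) {
  const yaw = Math.atan2(y, x);       // rotate the base to face the target
  const r = Math.hypot(x, y);         // horizontal distance to the target
  const d2 = r * r + z * z;           // squared straight-line distance
  const cosElbow =
    (d2 - upperLen * upperLen - foreLen * foreLen) / (2 * upperLen * foreLen);
  if (cosElbow < -1 || cosElbow > 1) return null;
  const elbow = Math.acos(cosElbow);  // picks one of the two possible bends
  const shoulder =
    Math.atan2(z, r) -
    Math.atan2(foreLen * Math.sin(elbow), upperLen + foreLen * Math.cos(elbow));
  const wrist = -(shoulder + elbow);  // cancels the pitch so the gripper is level
  return { yaw, shoulder, elbow, wrist };
}

// Forward kinematics, handy for checking a solution.
function wristPosition(a, upperLen, foreLen) {
  const r = upperLen * Math.cos(a.shoulder) + foreLen * Math.cos(a.shoulder + a.elbow);
  const z = upperLen * Math.sin(a.shoulder) + foreLen * Math.sin(a.shoulder + a.elbow);
  return { x: r * Math.cos(a.yaw), y: r * Math.sin(a.yaw), z };
}
```

Feeding the result of `solveArm` back through `wristPosition` should return the requested point, which makes a convenient sanity check before sending anything to real servos.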

As part of the refresh I updated the pan unit for the camera platform. Previously I used a solid 1/4 inch shaft with the load taken by a bearing and the gearmotor turning the shaft directly from below it. Unfortunately that setup has many drawbacks; no ability to use a slip ring, no torque multiplication, difficulty using an axle end rotary encoder IC to gain real world position feedback. The updated setup uses a 6 rpm gearmotor offset with a variable motor mount to drive a 24 tooth brass gear. That mates with an 80 tooth gear which is affixed to a hollow 1/2 inch alloy tube. As you can see at the top of the image, I've fed the tilt servo cable directly into the inside of that tube. No slip ring right now, but it is all set to allow the USB cable to slip through to the base and enable continuous rotation of the pan subsystem. The Kinect can then sweep around radar style. One interesting aside is that you can no longer manually rotate the pan system because the gearmotor, even unpowered, will stop you. The grub screw will slip before the axle turns.
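
As a quick back-of-the-envelope check on what the 24 tooth to 80 tooth pairing does, here is a small calculation. It assumes an ideal gear pair with no friction losses and the 24 tooth gear on the motor side. The function name is made up for the example.

```javascript
// Ideal single-stage gear reduction: the driven gear turns slower than the
// driving gear by the tooth ratio, and torque rises by the same factor.
function gearTrain(motorRpm, drivingTeeth, drivenTeeth) {
  const ratio = drivenTeeth / drivingTeeth;
  return {
    ratio,                          // 80 / 24 is roughly 3.33
    outputRpm: motorRpm / ratio,    // pan speed at the tube
    torqueMultiplier: ratio,        // ignoring losses
  };
}

const pan = gearTrain(6, 24, 80);
console.log(pan.outputRpm.toFixed(2) + ' rpm at the pan tube');
```

With the numbers from the text that works out to about 1.8 rpm, or one full pan rotation roughly every 33 seconds, with a little over three times the motor's torque.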

As shown below, the gearmotor is driven by an Arduino which is itself connected to a SparkFun breakout of the TB6612FNG HBridge IC. This combo is attached using double sided 3M tape to a flat bit of channel. Then the flat bit of channel is bolted to Terry. I've used this style a few times now and quite like it. A single unit and all its wires can be attached and moved fairly quickly.

At first I thought the Arduino gearmotor control and the Web interface would be a bit outside the bounds of ROS. But there is an API for Arduino which gives the nice publish and subscribe with messages that one would expect on the main ROS platform. A little bit of Python glue takes the ttyUSB right out of your view and you are left with a little extension from the main ROS right into the MCU. I feel that my 328 screen multiplexer will be updated to use this ROS message API. Reimplementing packeting and synchronization at the serial port level becomes a little less exciting after a while, and not having to even think about that with ROS is certainly welcome.

Below is the motherboard setup for all this. Unfortunately many of the things I wanted to attach used TTL serial, so I needed a handful of USB to TTL bridges. The IMU uses I2C, so it's another matter of shoving a 328 into the mix to publish the ROS messages with the useful information for the rest of the ROS stack on the main machine to use at its will.

The web interface has been resurrected and extended from the old BBB driven Terry. This is the same Bootstrap/jQuery style interface but now using roslibjs to communicate from the browser to Terry. I'm using WebSockets to talk back, which is what I was doing manually from the BBB, but with ROS that is an implementation choice that gets hidden away and you again get a nice API to talk ROS like things such as publishing and subscribing standard and custom messages.

The below javascript code sends an array of 4 floats back to Terry to tell it where you want the arm (x,y,z,claw) to be located. The 4th number allows you to open and close the claw in the same command. The wrist is held horizontal to the ground for you. Notice that this message is declared to be a Float32MultiArray which is a standard message type. The msg and topic can be reused, so an update is just a prod to an array and a publish call. You can fairly easily publish these messages from the command line too for brute force testing.

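// Topic that carries the arm target back to Terry
var topic_arm_xyz = new ROSLIB.Topic({
   ros  : ros,
   name : '/arm/xyzc',
   messageType : 'std_msgs/Float32MultiArray'
});

// x, y, z position for the arm plus the claw opening
var msg = new ROSLIB.Message({
  data : [ x,y,z, claw ]
});

topic_arm_xyz.publish( msg );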

The learning curve is a bit sharp for some parts of ROS. Navigation requires many subsystems to be brought up, and at first I had a case that the robot model was visualized 90 degrees out of phase to reality. Most of the stuff is already there, but you need to have a robot base controller that is compatible. It is also a trap for the new players not to have a simple robot model urdf file. Without a model some parts of the system didn't work for me. I'd have liked to have won with the MoveIt control, and will get back to trying to do just that in the future. I think I'll dig around for shoe string examples, something like building a very basic three servo arm with ice cream sticks and $5 servos would make for an excellent example of MoveIt for hobby ROS folk. Who knows, maybe that example will appear here in a future post.