Planet Linux Australia

Planet Linux Australia - http://planet.linux.org.au

Josh Stewart: Channel 9, Catch Up, Hiro and linux

Thu, 2016-07-28 23:03

About 6 months ago, Channel 9 launched their ‘Catch Up’ service. Basically this is their way of fighting piracy and allowing people to download Australian made TV shows to watch on their PC. Now, of course, no ‘old media’ service would possibly do this without the wonders of DRM. Channel 9 though, are taking a slightly different approach.

Instead of the normal style of DRM that prevents you copying the file, Channel 9 employs technology from a company called Hiro. Essentially you install the Hiro player, download the file and watch it. The player will insert unskippable ads throughout the video, supposedly even targeted at your demographic. This is actually a fairly neat system; Channel 9 even encourage you to share the video files over bittorrent etc! The problem, as I’m sure you can guess, is that there’s no player for linux.

So, just to skip to the punchline, yes it IS possible to get these files working on free software (completely legally & without the watermark)! If you just want to know how to do it, jump to the end as I’m going to explain a bit of background first.

Hiro

The Hiro technology is interesting in that it isn’t simply some custom player. The files you download from Channel 9 are actually xvid encoded, albeit a bastard in-bred cousin of what xvid should be. If you simply download the file and play it with vlc or mplayer, it will run, however you will get a nasty watermark over the top of the video and it will almost certainly crash about 30s in when it hits the first advertising blob. There is also some trickiness going on with the audio as, even if you can get the video to keep playing, the audio will jump back to the beginning at this point. Of course, the watermark isn’t just something that’s placed over the top in post-processing like a subtitle, it’s in the video data itself. To remove it you actually need to filter the video to modify the area covered by the watermark to darken/lighten the pixels affected. Sounds crazy and a tremendous amount of work, right? Well thankfully it’s already been done, by Hiro themselves.

When you install Hiro, you don’t actually install a media player, you install either a DirectShow filter or a Quicktime component depending on your platform. This has the advantage that you can use reasonably standard software to play the files. It’s still not much help for linux though.

Before I get onto how to create a ‘normal’ xvid file, I just want to mention something I think should be a concern for free software advocates. As you might know, xvid is an open codec, both for encoding and decoding. Due to the limitations of Quicktime and Windows Media Player, Hiro needs to include an xvid decoder as part of their filter. I’m sure it’s no surprise to anyone though that they have failed to release any code for this filter, despite it being based on a GPL’d work. IA(definitely)NAL, but I suspect there’s probably some dodginess going on here.

Using Catchup with free software

Very basically, the trick to getting the video working is that it needs to be passed through the filter provided by Hiro. I tried a number of methods to get the files converted for use with mplayer or vlc and in the end, unfortunately, I found that I needed to be using either Windows or OSX to get it done. Smarter minds than mine might be able to get the DirectShow filter (HiroTransform.ax) working with mplayer in a similar manner to how CoreAVC on linux works, but I had no luck.

But, if you have access to OSX, here’s how to do it:

  1. Download and install the Hiro software for Mac. You don’t need to register or anything, in fact, you can delete the application the moment you finish the install. All you need is the Quicktime component it added.
  2. Grab any file from the Catch Up Service (http://video.ninemsn.com.au/catchuptv). I’ve tested this with Underbelly, but all videos should work.
  3. Install ffmpegx (http://ffmpegx.com)
  4. Grab the following little script: CleanCatch.sh
  5. Run:
    chmod +x CleanCatch.sh
    ./CleanCatch.sh <filename>
  6. Voila. Output will be a file called ‘<filename>.clean.MP4’ and should be playable in both VLC and mplayer

Distribution

So, I’m the first to admit that the above is a right royal pain to do, particularly the whole requiring OSX part. To save everyone the hassle though, I believe it’s possible to simply distribute the modified file. Now again, IANAL, but I’ve gone over the Channel 9 website with a fine-tooth comb and can see nothing that forbids me from distributing this newly encoded file. I agreed to no EULA when I downloaded the original video and their site even has the following on it:

You can share the episode with your friends and watch it as many times as you like – online or offline – with no limitations

That whole ‘no limitations’ part is the bit I like. Not only have Channel 9 given me permission to distribute the file, they’ve given it to me unrestricted. I’ve not broken any locks and in fact have really only used the software provided by Channel 9 and a standard transcoding package.

This being the case, I am considering releasing modified versions of Channel 9’s files over bittorrent. I’d love to hear people’s opinions about this before doing so though in case they know more than I (not a hard thing) about such matters.

Josh Stewart: The rallyduino lives

Thu, 2016-07-28 23:03

[Update: The code for the rallyduino can be found at: http://code.google.com/p/rallyduino/]

Amidst a million other things (Today is T-1 days until our little bub’s technical due date! No news yet though) I finally managed to get the rallyduino into the car over the weekend. It actually went in a couple of weeks ago, but had to come out again after a few problems were found.

So, the good news, it all works! I wrote an automagic calibration routine that does all the clever number crunching and comes up with the calibration number, so after a quick drive down the road, everything was up and running. My back of the envelope calculations for what the calibration number for our car would be also turned out to be pretty accurate, only about 4mm per revolution out. The unit, once calibrated, is at least as accurate as the commercial devices and displays considerably more information. It’s also more flexible as it can switch between modes dynamically and has full control from the remote handset. All in all I was pretty happy, even the instantaneous speed function worked first time.

To give a little background, the box uses a hall effect sensor mounted next to a brake caliper that pulses each time a wheel stud rotates past. This is a fairly common method for rally computers to use and to make life simpler, the rallyduino is pin compatible with another common commercial product, the VDO Minicockpit. As we already had a Minicockpit in the car, all we did was ‘double adapt’ the existing plug and pop in the rallyduino. This means there’s 2 computers running off the 1 sensor, in turn making it much simpler (assuming 1 of them is known to be accurate) to determine if the other is right.
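For the curious, the arithmetic behind that calibration number is straightforward. Here’s a quick sketch (hypothetical figures, not the rallyduino firmware; I’m assuming one pulse per stud on a 4-stud wheel) of how sensor pulses translate into distance:

# Back-of-the-envelope sketch (hypothetical figures, not the rallyduino
# firmware): the hall effect sensor pulses once per wheel stud, so distance
# per pulse is simply the rolling circumference divided by the stud count.
import math

STUDS_PER_WHEEL = 4            # assumed: one pulse per stud, 4-stud wheel
TYRE_DIAMETER_MM = 583.0       # assumed rolling diameter, illustrative only

circumference_mm = math.pi * TYRE_DIAMETER_MM       # mm travelled per wheel revolution
mm_per_pulse = circumference_mm / STUDS_PER_WHEEL   # mm travelled per sensor pulse

def distance_metres(pulse_count):
    """Distance covered for a given number of sensor pulses, in metres."""
    return pulse_count * mm_per_pulse / 1000.0

print(round(mm_per_pulse, 1), "mm per pulse")
print(round(distance_metres(10000), 1), "m over 10,000 pulses")

The automagic calibration routine presumably arrives at much the same number empirically, by counting pulses over a known stretch of road rather than trusting assumed figures like these.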

After taking the car for a bit of a drive though, a few problems became apparent. To explain them, a pic is required:

The 4 devices in the pic are:

  1. Wayfinder electronic compass
  2. Terratrip (The black box)
  3. Rallyduino (Big silver box with the blue screen)
  4. VDO Minicockpit (Sitting on top of the rallyduino)

The major problem should be immediately obvious. The screen is completely unsuitable. It’s both too small and has poor readability in daylight. I’m currently looking at alternatives and it seems like the simplest thing to do is get a 2×20 screen the same physical size as the existing 4×20. This, however, means that there would only be room for a single distance tracker rather than the 2 currently in place. The changeover is fairly trivial as the code, thankfully, is nice and flexible and the screen dimensions can be configured at compile time (From 2×16 to 4×20). Daylight readable screens are also fairly easily obtainable (http://www.crystalfontz.com is the ultimate small screen resource) so it’s just a matter of ordering one. In the long term I’d like to look at simply using a larger 4×20 screen but, as you can see, real estate on the dash is fairly tight.

Speaking of screens, I also found the most amazing little LCD controller from web4robot.com. It has both serial and I2C interfaces, a keypad decoder (with inbuilt debounce) and someone has gone to all the hard work of writing a common arduino interface library for it and other I2C LCD controllers (http://www.wentztech.com/radio/arduino/projects.html). If you’re looking for such an LCD controller and you are working with an arduino, I cannot recommend these units enough. They’re available on eBay for about $20 AU delivered too. As much as I loved the unit from Phil Anderson, it simply doesn’t have the same featureset as this one, nor is it as robust.

So that’s where things are at. Apologies for the brain dump nature of this post, I just needed to get everything down while I could remember it all.

Josh Stewart: 1 + 1 = 3

Thu, 2016-07-28 23:03

No updates to this blog in a while I’m afraid, things have just been far too busy to have had anything interesting (read geeky) to write about. That said, I am indulging and making this post purely to show off.

On Monday 25th May at 8:37am, after a rather long day/night, Mel and I welcomed Spencer Bailey Stewart into the world. There were a few little issues throughout the night (Particularly towards the end) and he had some small hurdles to get over in his first hour, but since then he has gone from strength to strength and both he and Mum are now doing amazingly well.
Obligatory Pic:

Spencer Bailey Stewart

He’s a very placid little man and would quite happily sleep through an earthquake, much to our delight. And yes, that is a little penguin he’s holding on to in that pic

So that’s all really. I am very conscious of not becoming a complete baby bore so unless something actually worth writing about happens, this will hopefully be my only baby post for the sake of a baby post.

Josh Stewart: Boxee iView back online

Thu, 2016-07-28 23:03

Just a quick post to say the ABC iView Boxee app has been updated to version 0.7 and should now be fully functional once again. I apologise to anyone who has been using the app for how long this update has taken and I wish I could say I’ve been off solving world hunger or something, but in reality I’ve just been flat out with work and family. I’ve also got a few other projects on the go that have been keeping me busy. These may or may not ever reach a stage of public consumption, but if they do it’ll be cool stuff.

For anyone using Boxee, you may need to remove the app from My Apps and wait for Boxee to refresh its repository index, but eventually version 0.7 should pop up. It’s a bit rough in places so I hope to do another cleanup within the next few weeks, but at least everything is functional again.

Josh Stewart: Going into business

Thu, 2016-07-28 23:03

Lately my toying around with media centers has created some opportunities for commercial work in this area, which has been a pleasant change. As a result of this I’ve formed Noisy Media, a company specialising in the development of media centre apps for systems such as Boxee as well as the creation of customised media applications using XBMC (and similar) as a base.

Whilst I don’t expect this venture to ever be huge, I can see the market growing hugely in the future as products such as Google TV (Something I will be looking at VERY closely) and the Boxee Box are released and begin bringing streaming Video on Demand to the loungeroom. Given this is something I have a true passion for, the ability to turn work in this area into something profitable is very appealing, and exciting.

Here’s to video on demand!

Josh Stewart: ASX RSS down

Thu, 2016-07-28 23:03

Just a quick note to advise that the ASX RSS feed at http://noisymime.org/asx is currently not functional due to a change in the source data format.  I am in the process of performing a rewrite on this now and should have it back up and running (Better and more robust than ever) within the next few days.

Apologies for the delay in getting things back up and running.

Josh Stewart: Cortina Fuel Injection – Part 1 The Electronics

Thu, 2016-07-28 23:03

Besides being your run of the mill computer geek, I’ve always been a bit of a car geek as well. This often solicits down-the-nose looks from others who associate such people with V8 Supercar lovin’ petrolheads, which has always surprised me a little because the most fun parts of working on a car are all just testing physics theories anyway. With that in mind, I’ll do this writeup from the point of view of the reader being a non-car, but scientifically minded person. First a bit of background…

Background

For the last 3 years or so my dad and I have been working on a project to fuel inject our race car. The car itself is a 1968 Mk2 Cortina and retains the original 40 year old 1600 OHV engine. This engine was originally carbureted, meaning that it has a device that uses the vacuum created by the engine to mix fuel and air. This mixture is crucial to the running of an engine as the ratio of fuel to air dramatically alters power, response and economy. Carburetors were used for this function for a long time and whilst they achieve the basics very well, they are, at best, a compromise for most engines. To overcome these limitations, car manufacturers started moving to fuel injection in the 80’s, which allowed precise control of the amount of fuel added through the use of electronic signals. Initially these systems were horrible however, being driven by analog or very basic digital computers that did not have the power or inputs needed to accurately perform this function. These evolved into something useful throughout the 90’s and by the 00’s cars were getting full sequential systems (more on this later) that could deliver both good performance and excellent economy. It was our plan to fit something like the late 90’s type systems (oh, how this changed by the end though) to the Cortina with the aims of improving the power and drivability of the old engine. In this post I’m going to run through the various components needed from the electrical side to make this all happen, as well as a little background on each. Our starting point was this:

The System

To have a computer control when and how much fuel to inject, it requires a number of inputs:

  • A crank sensor. This is the most important thing and tells the computer where in the 4-stroke cycle (HIGHLY recommended reading if you don’t know about the 4 strokes an engine goes through) the engine is and therefore WHEN to inject the fuel. Typically this is some form of toothed wheel that is on the end of the crankshaft with a VR or Hall effect sensor that pulses each time a tooth goes past it. The more teeth the wheel has, the more precisely the computer knows where the engine is (Assuming it can keep up with all the pulses). By itself the toothed wheel is not enough however as the computer needs a reference point to say when the cycle is beginning. This is typically done by either a 2nd sensor that only pulses once every 4 strokes, or by using what’s known as a missing tooth wheel, which is the approach we have taken (there’s a small decoding sketch after this list). This works by having a wheel that would ordinarily have, say, 36 teeth, but has then had one of them removed. This creates an obvious gap in the series of pulses which the computer can use as a reference once it is told where in the cycle the missing tooth appears. The photos below show the standard Cortina crankshaft end and the wheel we made to fit onto the end

    Standard Crankshaft pulley

    36-1 sensor wheel added. Pulley is behind the wheel

    To read the teeth, we initially fitted a VR sensor, which sat about 0.75mm from the teeth, however due to issues with that item, we ended up replacing it with a Hall Effect unit.

  • Some way of knowing how much air is being pulled into the engine so that it knows HOW MUCH fuel to inject. In earlier fuel injection systems this was done with a Mass Air Flow (MAF) sensor, a device which heated a wire that was in the path of the incoming air. By measuring the drop in temperature of the wire, the amount of air flowing over it could be determined (Although guessed is probably a better word as most of these systems tended to be fairly inaccurate). More recent systems (from the late 90’s onwards) have used Manifold Absolute Pressure (MAP) sensors to determine the amount of air coming in. Computationally these are more complex as there are a lot more variables that need to be known by the computer, but they tend to be much more accurate for the purposes of fuel injection. Nearly all aftermarket computers now use MAP and given how easy it is to set up (just a single vacuum hose going from the manifold to the ECU) this is the approach we took.
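To make the missing tooth idea from the crank sensor point above a little more concrete, here’s a rough sketch (illustrative only, not the actual ECU code) of how a stream of pulse timestamps from a 36-1 wheel can be decoded back into a tooth number and crank angle. The double-length gap is what re-synchronises the counter every revolution:

# Illustrative only: decoding a 36-1 missing tooth wheel from crank
# sensor pulse timestamps.
TEETH = 36                      # nominal tooth count, one tooth removed
DEG_PER_TOOTH = 360.0 / TEETH   # 10 degrees of crank rotation per tooth

def decode(timestamps):
    """Yield (tooth_number, crank_angle_degrees) for each pulse.

    timestamps: pulse arrival times in seconds. The missing tooth shows up
    as a gap roughly twice the previous one, which is used to re-sync the
    tooth counter once per revolution.
    """
    last_time = None
    last_gap = None
    tooth = None                # unknown until the first missing-tooth gap is seen
    for t in timestamps:
        if last_time is not None:
            gap = t - last_time
            if last_gap is not None and gap > 1.5 * last_gap:
                tooth = 0       # first tooth after the missing one: the reference point
            elif tooth is not None:
                tooth += 1
            last_gap = gap
        last_time = t
        if tooth is not None:
            yield tooth, tooth * DEG_PER_TOOTH

A real ECU does this in interrupt handlers with integer timer counts and also has to cope with electrical noise and rapid acceleration, but the gap-detection idea is the same.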

The above are the bare minimum inputs required for a computer to control the injection, however typically more sensors are needed in order to make the system operate smoothly. We used:

  • Temperature sensors: As the density of air changes with temperature, the ECU needs to know how hot or cold the incoming air is. It also needs to know the temperature of the water in the coolant system to know whether it is running too hot or cold so it can make adjustments as needed.
  • Throttle position sensor: The ratio of fuel to air is primarily controlled by the MAF or MAP sensor described above, however as changes in these sensors are not instantaneous, the ECU needs to know when the accelerator is pressed so it can add more fuel for the car to be able to accelerate quickly. These sensors are typically just a variable resistor fitted to the accelerator.
  • Camshaft sensor: I’ll avoid getting too technical here, but injection can essentially work in 2 modes, batched or sequential. In the 4 strokes an engine goes through, the crankshaft will rotate through 720 degrees (ie 2 turns). With just a crank sensor, the ECU can only know where the shaft is in the 0-360 degree range. To overcome this, most fuel injection systems up to about the year 2000 ran in ‘batched’ mode, meaning that the fuel injectors would fire in pairs, twice (or more) per 720 degrees. This is fine and cars can run very smoothly in this mode, however it means that after being injected, some fuel mixture sits in the intake manifold before being sucked into the chamber. During this time, the mixture starts to condense back into a liquid which does not burn as efficiently, leading to higher emissions and fuel consumption. To improve the situation, car manufacturers started moving to sequential injection, meaning that the fuel is only ever injected at the time it can go straight into the combustion chamber. To do this, the ECU needs to know where in 720 degrees the engine is rather than just in 360 degrees. As the camshaft in a car runs at half the crankshaft speed, all you need to do this is place a similar sensor on it that produces 1 pulse every revolution (The equivalent of 1 pulse every 2 turns of the crank), as shown in the sketch after this list. In our case, we decided to remove the distributor (which is driven at half crank speed) and convert it to provide this pulse. I’ll provide a picture of this shortly, but it uses a single tooth that passes through a ‘vane’ type hall effect sensor, so that the signal goes high when the tooth enters the sensor and low when it leaves.
  • Oxygen sensor (O2) – In order to give some feedback to the ECU about how the engine is actually running, most cars these days run a sensor in the exhaust system to determine how much of the fuel going in is actually being burned. Up until very recently, virtually all of these sensors were what is known as narrowband, which in short means that they can determine whether the fuel/air mix is too lean (Not enough fuel) or too rich (Too much fuel), but not actually by how much. The upshot of this is that you can only ever know EXACTLY what the fuel/air mixture is when it switches from one state to the other. To overcome this problem, there is a different version of the sensor, known as wideband, that (within a certain range) can determine exactly how rich or lean the mixture is. If you ever feel like giving yourself a headache, take a read through http://www.megamanual.com/PWC/LSU4.htm which is the theory behind these sensors. They are complicated! Thankfully despite all the complication, they are fairly easy to use and allow much easier and quicker tuning once the ECU is up and running.
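Following on from the camshaft sensor point above, here’s a tiny sketch (again illustrative only; which half of the cycle counts as ‘first’ depends on where the cam tooth is physically fitted) of how the single cam pulse lets the ECU place a 0-360 degree crank angle within the full 720 degree cycle:

# Illustrative only: using the once-per-cam-revolution pulse to turn a
# 0-360 degree crank angle into a 0-720 degree cycle position.
def cycle_angle(crank_angle_deg, cam_pulse_seen_this_rev):
    """Return the engine position within the 720 degree four-stroke cycle.

    crank_angle_deg: 0-360 position from the crank trigger wheel.
    cam_pulse_seen_this_rev: True if the cam tooth has passed its sensor
    since the last crank reference (missing tooth). Which revolution that
    corresponds to depends on where the cam tooth is physically fitted.
    """
    if cam_pulse_seen_this_rev:
        return crank_angle_deg          # first crank revolution of the cycle
    return crank_angle_deg + 360.0      # second crank revolution

print(cycle_angle(90.0, True))    # -> 90.0
print(cycle_angle(90.0, False))   # -> 450.0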

So with all of the above, pretty much the complete electronics system is covered. Of course, this doesn’t even start to cover off the wiring, fusing, relaying etc that has to go into making all of it work in the terribly noisy environment of a car, but that’s all the boring stuff

ECU

Finally the part tying everything together is the ECU (Engine Control Unit) itself. There are many different types of programmable ECUs available and they vary significantly in both features and price, ranging from about $400 to well over $10,000. Unsurprisingly there’s been a lot of interest in this area from enthusiasts looking to make their own and despite there having been a few of these to actually make it to market, the most successful has almost certainly been Megasquirt. When we started with this project we had planned on using the 2nd generation Megasquirt which, whilst not having some of the capabilities of the top end systems, provided some great bang for buck. As we went along though, it became apparent that the Megasquirt 3 would be coming out at about the right time for us and so I decided to go with one of them instead. I happened to fluke one of the first 100 units to be produced and so we had it in our hands fairly quickly.

Let me just say that this is an AMAZING little box. From what I can see it has virtually all the capabilities of the (considerably) more expensive commercial units including first class tuning software (Multi platform, Win, OSX, linux) and a very active developer community. Combined with the Extra I/O board (MS3X) the box can do full sequential injection and ignition (With direct coil driving of ‘smart’ coils), launch control, traction control, ‘auto’ tuning in software, generic I/O of basically any function you can think of (including PID closed loop control), full logging via onboard SD slot and has a built in USB adaptor to boot!

Megasquirt 3 box and unassembled components

In the next post I’ll go through the hardware and setup we used to make all this happen. I’ll also run through the ignition system that we switched over to ECU control.

Josh Stewart: Command line Skype

Thu, 2016-07-28 23:03

Despite risking the occasional dirty look from a certain type of linux/FOSS supporter, I quite happily run the (currently) non-free Skype client on my HTPC. I have a webcam sitting on top of the TV and it works flawlessly for holding video chats with family and friends.

The problem I initially faced however, was that my HTPC is 100% controlled by a keyboard only. Unlike Windows, the linux version of Skype has no built in shortcut keys (user defined or otherwise) for basic tasks such as answering and terminating calls. This makes it virtually impossible to use out of the box. On the upside though, the client does have an API and some wonderful person out there has created a python interface layer for it, aptly named Skype4Py.

A little while ago when I still had free time on weekends, I sat down and quickly hacked together a python script for answering and terminating calls, as well as maximising video chats from the command line. I then setup a few global shortcut keys within xfce to run the script with the appropriate arguments.

I won’t go into the nitty gritty of the script as it really is a hack in some places (Particularly the video maximising), but I thought I’d post it up in case it is of use to anyone else. I’ve called it mythSkype, simply because the primary function of the machine I run it on is MythTV, but it has no dependencies on Myth at all.

The dependencies are:

  • Python – Tested with 2.6, though 2.5 should work
  • Skype4Py –  any version
  • xdotool – Only required for video maximising

To get the video maximising to work you’ll need to edit the file and set the screen_width and screen_height variables to match your resolution.
Make sure you have Skype running, then simply execute one of the following:

  • ./mythSkype -a (Answer any ringing calls)
  • ./mythSkype -e (End active calls)
  • ./mythSkype -m (Maximise the current video)

The first time you run the script, you will get a prompt from Skype asking if you are happy with Skype4Py having access to the application. Obviously you must give your assent or nothing will work.
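For anyone wondering what driving Skype this way looks like, here’s a minimal sketch using Skype4Py (this isn’t mythSkype itself, just the general idea; the call status constant name is from Skype4Py’s enums as I remember them, and the video maximising/xdotool part is left out):

#!/usr/bin/env python
# Minimal sketch of answering/ending Skype calls from the command line with
# Skype4Py. Not the actual mythSkype script, just an illustration of the idea.
import sys
import Skype4Py

def main():
    skype = Skype4Py.Skype()
    skype.Attach()  # triggers the one-off authorisation prompt in the Skype client

    if '-a' in sys.argv:
        # Answer any ringing calls
        for call in skype.ActiveCalls:
            if call.Status == Skype4Py.clsRinging:
                call.Answer()
    elif '-e' in sys.argv:
        # End (finish) all active calls
        for call in skype.ActiveCalls:
            call.Finish()

if __name__ == '__main__':
    main()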

It’s nothing fancy, but I hope it’s of use to others out there.

Download: mythSkype

Josh Stewart: ABC iView on Boxee

Thu, 2016-07-28 23:03

A few months ago I switched from using the standard mythfrontend to Boxee, a web enhanced version of the popular XBMC project. Now Boxee has a LOT of potential and the upcoming Boxee Box looks very promising, but its fantastic web capabilities are let down here in Australia as things such as Hulu and Netflix streaming are not available here.

What we do have though is a national broadcaster with a reasonably good online facility. The ABC’s iView has been around for some time and has a really great selection of current programs available on it. Sounds like the perfect candidate for a Boxee app to me.

So with the help of Andy Botting and using Jeremy Visser’s Python iView download app for initial guidance, I put together a Boxee app for watching iView programs fullscreen. For those wishing to try it out, just add the following repository within Boxee:
http://noisymime.org/boxee

It’s mostly feature complete although there are a few things that still need to be added. If you have any suggestions or find a bug either leave a comment or put in a ticket at http://code.google.com/p/xbmc-boxee-iview/issues/list (A Google code site by Andy that I am storing this project at)

So that’s the short version of the story. Along the way however there have been a few hiccups and I want to say something about what the ABC (and more recently the BBC) have done to ‘protect’ their content.

The ABC have enabled a function called SWF Verification on their RTMP stream. This is something that Adobe offer on top of their RTMP products despite the fact that they omitted it from the public RTMP spec. That wouldn’t be so bad, except that Adobe are now going after open source products that implement this, threatening them with cease and desist notices. Going slightly technical for a minute, SWF Verification is NOT a real protection system. It does not encrypt the content being sent nor does it actually prevent people copying it. The system works by requesting a ‘ping’ every 60-90 seconds. If the player can’t provide the correct response (which is made up of things such as the date and time and the phrase “Genuine Adobe Flash Player 001”) then the server stops the streaming. Hardly high tech stuff.

In my opinion the ABC has made a huge mistake in enabling this as it achieves nothing in stopping piracy or restricting the service to a certain platform and serves only to annoy and frustrate the audience. There is a patch available at http://code.google.com/p/xbmc-boxee-iview that allows Boxee to read these streams directly, however until such time as this is included in Boxee/XBMC mainline (Watch this space: http://trac.xbmc.org/ticket/8971) or the ABC come to their senses and disable this anti-feature, this Boxee app will use the flash interface instead (boo!)

So that’s it. Hope the app is useful to people and, as stated above, if there are any problems, please let me know.

[EDIT]
I should’ve mentioned this originally, Andy and I did actually contact the iView team at the ABC regarding SWF verification. They responded with the following:

Thanks for contacting us re-Boxee. We agree that it’s a great platform and ultimately appropriate for iView iteration. Currently we’re working out our priorities this year for our small iView team, in terms of extended content offerings, potential platforms and general enhancements to the site.

Just some background on our security settings. We have content agreements with various content owners (individuals, production companies, US TV networks etc) a number require additional security, such as SWF hashing. Our content owners also consider non-ABC rendering of that content as not in the spirit of those agreements.

We appreciate the effort you have put into the plug-in and your general interest in all things iView. So once we are on our way with our development schedule for “out of the browser” iView, we’ll hopefully be in a position to share our priorities a little more. We would like to keep in touch with you this year and if you have any questions or comments my direct email is ********@abc.net.au.

[STATUS]
The app is currently in a WORKING state. If you are experiencing any problems, please send me a copy of your Boxee log file and I will investigate the issue.

Linux Users of Victoria (LUV) Announce: LUV Main August 2016 Meeting: M.2 / CRCs

Wed, 2016-07-27 17:02
Start: Aug 2 2016 18:30 End: Aug 2 2016 20:30 Location:

6th Floor, 200 Victoria St. Carlton VIC 3053

Link:  http://luv.asn.au/meetings/map

Speakers:

  • Russell Coker, M.2
  • Rodney Brown, CRCs

200 Victoria St. Carlton VIC 3053

Late arrivals, please call (0490) 049 589 for access to the venue.

Before and/or after each meeting those who are interested are welcome to join other members for dinner. We are open to suggestions for a good place to eat near our venue. Maria's on Peel Street in North Melbourne is currently the most popular place to eat after meetings.

LUV would like to acknowledge Red Hat and Infoxchange for their help in obtaining the meeting venues.

Linux Users of Victoria Inc. is an incorporated association, registration number A0040056C.

Linux Users of Victoria (LUV) Announce: LUV Beginners August Meeting: File Sharing in Linux

Wed, 2016-07-27 17:02
Start: Aug 20 2016 12:30 End: Aug 20 2016 16:30 Location:

Infoxchange, 33 Elizabeth St. Richmond

Link:  http://luv.asn.au/meetings/map

This hands-on presentation and tutorial with Wen Lin will introduce the various types of file sharing in Linux - from the more traditional NFS & Samba to the newer cloud-based ones like Dropbox, Google Drive and OwnCloud. The primary audience of the talk will be beginners (newbies), but hopefully some of you who are familiar with Linux will get something out of it as well.

The meeting will be held at Infoxchange, 33 Elizabeth St. Richmond 3121 (enter via the garage on Jonas St.)

Late arrivals, please call (0490) 049 589 for access to the venue.

LUV would like to acknowledge Infoxchange for the venue.

Linux Users of Victoria Inc. is an incorporated association, registration number A0040056C.

sthbrx - a POWER technical blog: Get off my lawn: separating Docker workloads using cgroups

Wed, 2016-07-27 13:30

On my team, we do two different things in our Continuous Integration setup: build/functional tests, and performance tests. Build tests simply test whether a project builds, and, if the project provides a functional test suite, that the tests pass. We do a lot of MySQL/MariaDB testing this way. The other type of testing we do is performance tests: we build a project and then run a set of benchmarks against it. Python is a good example here.

Build tests want as much grunt as possible. Performance tests, on the other hand, want a stable, isolated environment. Initially, we set up Jenkins so that performance and build tests never ran at the same time. Builds would get the entire machine, and performance tests would never have to share with anyone.

This, while simple and effective, has some downsides. In POWER land, our machines are quite beefy. For example, one of the boxes I use - an S822L - has 4 sockets, each with 4 cores. At SMT-8 (an 8 way split of each core) that gives us 4 x 4 x 8 = 128 threads. It seems wasteful to lock this entire machine - all 128 threads - just so as to isolate a single-threaded test.[1]

So, can we partition our machine so that we can be running two different sorts of processes in a sufficiently isolated way?

What counts as 'sufficiently isolated'? Well, my performance tests are CPU bound, so I want CPU isolation. I also want memory, and in particular memory bandwidth, to be isolated. I don't particularly care about IO isolation as my tests aren't IO heavy. Lastly, I have a couple of tests that are very multithreaded, so I'd like to have enough of a machine for those test results to be interesting.

For CPU isolation we have CPU affinity. We can also do something similar with memory. On a POWER8 system, memory is connected to individual P8s, not to some central point. This is a 'Non-Uniform Memory Architecture' (NUMA) setup: the directly attached memory will be very fast for a processor to access, and memory attached to other processors will be slower to access. An accessible guide (with very helpful diagrams!) is the relevant RedBook (PDF), chapter 2.

We could achieve the isolation we want by dividing up CPUs and NUMA nodes between the competing workloads. Fortunately, all of the hardware NUMA information is plumbed nicely into Linux. Each P8 socket gets a corresponding NUMA node. lscpu will tell you what CPUs correspond to which NUMA nodes (although what it calls a CPU we would call a hardware thread). If you install numactl, you can use numactl -H to get even more details.

In our case, the relevant lscpu output is thus:

NUMA node0 CPU(s):   0-31
NUMA node1 CPU(s):   96-127
NUMA node16 CPU(s):  32-63
NUMA node17 CPU(s):  64-95

Now all we have to do is find some way to tell Linux to restrict a group of processes to a particular NUMA node and the corresponding CPUs. How? Enter control groups, or cgroups for short. Processes can be put into a cgroup, and then a cgroup controller can control the resources allocated to the cgroup. Cgroups are hierarchical, and there are controllers for a number of different ways you could control a group of processes. Most helpfully for us, there's one called cpuset, which can control CPU affinity, and restrict memory allocation to a NUMA node.

We then just have to get the processes into the relevant cgroup. Fortunately, Docker is incredibly helpful for this![2] Docker containers are put in the docker cgroup. Each container gets its own cgroup under the docker cgroup, and fortunately Docker deals well with the somewhat broken state of cpuset inheritance.[3] So it suffices to create a cpuset cgroup for docker, and allocate some resources to it, and Docker will do the rest. Here we'll allocate the last 3 sockets and NUMA nodes to Docker containers:

cgcreate -g cpuset:docker
echo 32-127 > /sys/fs/cgroup/cpuset/docker/cpuset.cpus
echo 1,16-17 > /sys/fs/cgroup/cpuset/docker/cpuset.mems
echo 1 > /sys/fs/cgroup/cpuset/docker/cpuset.mem_hardwall

mem_hardwall prevents memory allocations under docker from spilling over into the one remaining NUMA node.
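As a quick sanity check before launching anything, the settings can simply be read back. This is just a sketch, assuming the cgroup v1 cpuset hierarchy is mounted at /sys/fs/cgroup/cpuset as in the commands above:

# Read back the docker cpuset settings created above (cgroup v1 layout assumed).
import os

docker_cpuset = "/sys/fs/cgroup/cpuset/docker"

for name in ("cpuset.cpus", "cpuset.mems", "cpuset.mem_hardwall"):
    with open(os.path.join(docker_cpuset, name)) as f:
        print(name + ": " + f.read().strip())

# Expected with the settings above:
#   cpuset.cpus: 32-127
#   cpuset.mems: 1,16-17
#   cpuset.mem_hardwall: 1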

So, does this work? I created a container with sysbench and then ran the following:

root@0d3f339d4181:/# sysbench --test=cpu --num-threads=128 --max-requests=10000000 run

Now I've asked for 128 threads, but the cgroup only has CPUs/hwthreads 32-127 allocated. So if I run htop, I shouldn't see any load on CPUs 0-31. What do I actually see?

It works! Now, we create a cgroup for performance tests using the first socket and NUMA node:

cgcreate -g cpuset:perf-cgroup
echo 0-31 > /sys/fs/cgroup/cpuset/perf-cgroup/cpuset.cpus
echo 0 > /sys/fs/cgroup/cpuset/perf-cgroup/cpuset.mems
echo 1 > /sys/fs/cgroup/cpuset/perf-cgroup/cpuset.mem_hardwall

Docker conveniently lets us put new containers under a different cgroup, which means we can simply do:

dja@p88 ~> docker run -it --rm --cgroup-parent=/perf-cgroup/ ppc64le/ubuntu bash
root@b037049f94de:/# # ... install sysbench
root@b037049f94de:/# sysbench --test=cpu --num-threads=128 --max-requests=10000000 run

And the result?

It works! My benchmark results also suggest this is sufficient isolation, and the rest of the team is happy to have more build resources to play with.

There are some boring loose ends to tie up: if a build job does anything outside of docker (like clone a git repo), that doesn't come under the docker cgroup, and we have to interact with systemd. Because systemd doesn't know about cpuset, this is quite fiddly. We also want this in a systemd unit so it runs on start up, and we want some code to tear it down. But I'll spare you the gory details.

In summary, cgroups are surprisingly powerful and simple to work with, especially in conjunction with Docker and NUMA on Power!

  1. It gets worse! Before the performance test starts, all the running build jobs must drain. If we have 8 Jenkins executors running on the box, and a performance test job is the next in the queue, we have to wait for 8 running jobs to clear. If they all started at different times and have different runtimes, we will inevitably spend a fair chunk of time with the machine at less than full utilisation while we're waiting. 

  2. At least, on Ubuntu 16.04. I haven't tested if this is true anywhere else. 

  3. I hear this is getting better. It is also why systemd hasn't done cpuset inheritance yet. 

Pia Waugh: Personal submission to the Productivity Commission Review on Public Sector Data

Wed, 2016-07-27 11:01

My name is Pia Waugh and this is my personal submission to the Productivity Commission Review on Public Sector Data. It does not reflect the priorities or agenda of my employers past, present or future, though it does draw on my expertise and experience in driving the open data agenda and running data portals in the ACT and Commonwealth Governments from 2011 till 2015. I was invited by the Productivity Commission to do a submission and thought I could provide some useful ideas for consideration. I note I have been on maternity leave since January 2016 and am not employed by the Department of Prime Minister and Cabinet or working on data.gov.au at the time of writing this submission. This submission is also influenced by my work and collaboration with other Government jurisdictions across Australia, overseas and various organisations in the private and community sectors. I’m more than happy to discuss these ideas or others if useful to the Productivity Commission.

I would like to thank all those program and policy managers, civic hackers, experts, advocates, data publishers, data users, public servants and vendors whom I have had the pleasure to work with and have contributed to my understanding of this space. I’d also like to say a very special thank you to the Australian Government Chief Technology Officer, John Sheridan, who gave me the freedom to do what was needed with data.gov.au, and to Allan Barger who was my right hand man in rebooting the agenda in 2013, supporting agencies and helping establish a culture of data publishing and sharing across the public sector. I think we achieved a lot in only a few years with a very small but highly skilled team. A big thank you also to Alex Sadleir and Steven De Costa who were great to work with and made it easy to have an agile and responsive approach to building the foundation for an important piece of data infrastructure for the Australian Government.

Finally, this is a collection of some of my ideas and feedback for use by the Productivity Commission however, it doesn’t include everything I could possibly have to say on this topic because, frankly, we have a small baby who is taking most of my time at the moment. Please feel free to add your comments, criticisms or other ideas to the comments below! It is all licensed as Creative Commons 4.0 By Attribution, so I hope it is useful to others working in this space.

The Importance of Vision

Without a vision, we stumble blindly in the darkness. Without a vision, the work and behaviours of people and organisations are inevitably driven by other competing and often short term priorities. In the case of large and complex organisms like the Australian Public Service, if there is no cohesive vision, no clear goal to aim for, then each individual department is going to do things their own way, driven by their own priorities, budgets, Ministerial whims and you end up with what we largely have today: a cacophony of increasingly divergent approaches driven by tribalism that make collaboration, interoperability, common systems and data reuse impossible (or prohibitively expensive).

If however, you can establish a common vision, then even a strongly decentralised system can converge on the goal. If we can establish a common vision for public data, then the implementation of data programs and policies across the APS should become naturally more consistent and common in practice, with people naturally motivated to collaborate, to share expertise, and to reuse systems, standards and approaches in pursuit of the same goal.

My vision for public data is two-pronged and a bit of a paradigm shift: data by design and gov as an API! “Data by design” is about taking a data driven approach to the business of government and “gov as an API” is about changing the way we use, consume, publish and share data to properly enable a data driven public service and a broader network of innovation. The implementation of these ideas would create mashable government that could span departments, jurisdictions and international boundaries. In a heavily globalised world, no government is in isolation and it is only by making government data, content and services API enabled and reusable/interfacable, that we, collectively, can start to build the kind of analysis, products and services that meet the necessarily cross jurisdictional needs of all Australians, of all people.

More specifically, my vision is a data driven approach to the entire business of government that supports:

  • evidence based and iterative policy making and implementation;

  • transparent, accountable and responsible Government;

  • an open competitive marketplace built on mashable government data, content and services; and

  • a more efficient, effective and responsive public service.

What this requires is not so simple, but is utterly achievable if we could embed a more holistic whole of government approach in the work of individual departments, and then identify and fill the gaps through a central capacity that is responsible for driving a whole of government approach. Too often we see the data agenda oversimplified into what outcomes are desired (data visualisations, dashboards, analysis, etc) however, it is only in establishing multipurpose data infrastructure which can be reused for many different purposes that we will enable the kind of insights, innovation, efficiencies and effectiveness that all the latest reports on realising the value of data allude to. Without actual data, all the reports, policies, mission statements, programs and governance committees are essentially wasting time. But to get better government data, we need to build capacity and motivation in the public sector. We need to build a data driven culture in government. We also need to grow consumer confidence because a) demand helps drive supply, and b) if data users outside the public sector don’t trust that they can find, use and rely upon at least some government data, then we won’t ever see serious reuse of government data by the private sector, researchers, non-profits, citizens or the broader community.

Below is a quick breakdown of each of these priorities, followed by specific recommendations for each:

data infrastructure that supports multiple types of reuse. Ideally all data infrastructure developed by all government entities should be built in a modular, API enabled way to support data reuse beyond the original purpose to enable greater sharing, analysis, aggregation (where required) and publishing. It is often hard for agencies to know what common infrastructure already exists and it is easy for gaps to emerge, so another part of this is to map the data infrastructure requirements for all government data purposes, identify where solutions exist and any gaps. Where whole of government approaches are identified, common data infrastructure should be made available for whole of government use, to reduce the barrier to publishing and sharing data for departments. Too often, large expensive data projects are implemented in individual agencies as single purpose analytics solutions that don’t make the underlying data accessible for any other purpose. If such projects separated the data infrastructure from the analytics solutions, then the data infrastructure could be built to support myriad reuse including multiple analytics solutions, aggregation, sharing and publishing. If government data infrastructure was built like any other national infrastructure, it should enable a competitive marketplace of analysis, products and service delivery both domestically and globally. A useful analogy to consider is the example of roads. Roads are not typically built just from one address to another and are certainly not made to only support certain types of vehicles. It would be extremely inefficient if everyone built their own custom roads and then had to build custom vehicles for each type of road. It is more efficient to build common roads to a minimum technical standard that any type of vehicle can use to support both immediate transport needs, but also unknown transport needs into the future. Similarly we need to build multipurpose data infrastructure to support many types of uses.

greater publisher capacity and motivation to share and publish data. Currently the range of publishing capacity across the APS is extremely broad, from agencies that do nothing to agencies that are prolific publishers. This is driven primarily by different cultures and responsibilities of agencies and if we are to improve the use of data, we need to improve the supply of data across the entire public sector. This means education and support for agencies to help them understand the value to their BAU work. The time and money saved by publishing data, opportunities to improve data quality, the innovation opportunities and the ability to improve decision making are all great motivations once understood, but generally the data agenda is only pitched in political terms that have little to no meaning to data publishers. Otherwise there is no natural motivation to publish or share data, and the strongest policy or regulation in the world does not create sustainable change or effective outcomes if you cannot establish a motivation to comply. Whilst ever publishing data is seen as merely a compliance issue, it will be unlikely for agencies to invest the time and skills to publish data well, that is, to publish the sort of data that consumers want to use.

greater consumer confidence to improve the value realised from government data. Supply is nothing without demand and currently there is a relatively small (but growing) demand for government data, largely because people won’t use what they don’t trust. In the current landscape it is difficult to find data and even if one can find it, it is often not machine readable or not freely available, is out of date and generally hard to use. There is not a high level of consumer confidence in what is provided by government so many people don’t even bother to look. If they do look, they find myriad data sources of ranging quality and inevitably waste many hours trying to get an outcome. There is a reasonable demand for data for research and the research community tends to jump through hoops – albeit reluctantly and at great cost – to gain access to government data. However, the private and civic sectors are yet to seriously engage apart from a few interesting outliers. We need to make finding and using useful data easy, and start to build consumer confidence or we will never even scratch the surface of the billions of dollars of untapped potential predicted by various studies. The data infrastructure section is obviously an important part of building consumer confidence as it should make it easier for consumers to find and have confidence in what they need, but it also requires improving the data culture across the APS, better outreach and communications, better education for public servants and citizens on how to engage in the agenda, and targeted programs to improve the publishing of data already in demand. What we don’t need is yet another “tell us what data you want” because people want to see progress.

a data driven culture that embeds in all public servants an understanding of the role of data in the every day work of the public service, from program management, policy development, regulation and even basic reporting. It is important to take data from being seen as a specialist niche delegated only to highly specialised teams and put data front and centre as part of the responsibilities of all public servants – especially management – in their BAU activities. Developing this culture requires education, data driven requirements for new programs and policies, some basic skills development but mostly the proliferation of an awareness of what data is, why it is important, and how to engage appropriate data skills in the BAU work to ensure a data driven approach. Only with data can a truly evidence driven approach to policy be taken, and only with data can a meaningful iterative approach be taken over time.

Finally, obviously the approach above requires an appropriately skilled team to drive policy, coordination and implementation of the agenda in collaboration with the broader APS. This team should reside in a central agency to have whole of government imprimatur, and needs a mix of policy, commercial, engagement and technical data skills. The experience of data programs around the world shows that when you split policy and implementation, you inevitably get both a policy team lacking in the expertise to drive meaningful policy and an implementation team paralysed by policy indecision and an unclear mandate. This space is changing so rapidly that policy and implementation need to be agile and mutually reinforcing with a strong focus on getting things done.

As we examine the interesting opportunities presented by new developments such as blockchain and big data, we need to seriously understand the shift in paradigm from scarcity to surplus, from centralised to distributed systems, and from pre-planned to iterative approaches, if we are to create an effective public service for the 21st century.

There is already a lot of good work happening, so the recommendations in this submission are meant to improve and augment the landscape, not replicate. I will leave areas of specialisation to the specialists, and have tried to make recommendations that are supportive of a holistic approach to developing a data-driven public service in Australia.

Current Landscape

There has been progress in recent years towards a more data driven public sector however, these initiatives tend to be done by individual teams in isolation from the broader public service. Although we have seen some excellent exemplars of big data and open data, and some good work to clarify and communicate the intent of a data driven public service through policy and reviews, most projects have simply expanded upon the status quo thinking of government as a series of heavily fortified castles that take the extraordinary effort of letting in outsiders (including other departments) only under strictly controlled conditions and with great reluctance and cost. There is very little sharing at the implementation level (though an increasing amount of sharing of ideas and experience) and very rarely are new initiatives consulted across the APS for a whole of government perspective. Very rarely are actual data and infrastructure experts encouraged or supported to work directly together across agency or jurisdiction lines, which is a great pity. Although we have seen the idea of the value of data start to be realised and prioritised, we still see the implementation of data projects largely delegated to small, overworked and highly specialised internal teams that are largely not in the habit of collaborating externally and thus there is a lot of reinvention and diversity in what is done.

If we are to realise the real benefits of data in government and the broader economy, we need to challenge some of the status quo thinking and approaches towards data. We need to consider government (and the data it collects) as a platform for others to build upon rather than the delivery mechanism for all things to all people. We also need to better map what is needed for a data-driven public service rather than falling victim to the attractive (and common, and cheap) notion of simply identifying existing programs of work and claiming them to be sufficient to meet the goals of the agenda.

Globally this is still a fairly new space. Certain data specialisations have matured in government (eg. census/statistics, some spatial, some science data) but there is still a lack of a cohesive approach to data in any one agency. Even specialist data agencies tend to not look beyond the specialised data to have a holistic data driven approach to everything. In this way, it is critical to develop a holistic approach to data at all levels of the public service to embed the principles of data driven decision making in everything we do. Catalogues are not enough. Specialist data projects are not enough. Publishing data isn’t enough. Reporting number of datasets quickly becomes meaningless. We need to measure our success in this space by how well data is helping the public service to make better decisions, build better services, develop and iterate responsive and evidence based policy agendas, measure progress and understand the environment in which we operate.

Ideally, government agencies need to adopt a dramatic shift in thinking to assume in the first instance that the best results will be discovered through collaboration, through sharing, through helping people help themselves. There also needs in the APS to be a shift away from thinking that a policy, framework, governance structure or other artificial constructs are sufficient outcomes. Such mechanisms can be useful, but they can also be a distraction from getting anything tangible done. Such mechanisms often add layers of complexity and cost to what they purport to achieve. Ultimately, it is only what is actually implemented that will drive an outcome and I strongly believe an outcomes driven approach must be applied to the public data agenda for it to achieve its potential.

References

In recent years there has been a lot of progress. Below is a quick list to ensure they are known and built upon for the future. It is also useful to recognise the good work of the government agencies to date.

  • Public Data Toolkit – the data.gov.au team have pulled together a large repository of information, guidance and reports over the past 3 years on our open data toolkit at http://toolkit.data.gov.au. There are also some useful contributions from the Department of Communications Spatial Policy Branch. The Toolkit has links to various guidance from different authoritative agencies across the APS as well as general information about data management and publishing which would be useful to this review.

  • The Productivity Commission is already aware of the Legislative and other Barriers Workshop I ran at PM&C before going on maternity leave, and I commend the outcomes of that session to the Review.

  • The Financial Sector Inquiry (the “Murray Inquiry”) has some excellent recommendations regarding the use of data-driven approaches to streamline the work and reporting of the public sector which, if implemented, would generate cost and time savings as well as the useful side effect of putting in place data driven practices and approaches which can be further leveraged for other purposes.

  • Gov 2.0 Report and the Ahead of the Game Report – these are hard to find copies of online now, but have some good recommendations and ideas about a more data centric and evidence based public sector and I commend them both to the Review. I’m happy to provide copies if required.

  • There are many notable APS agency efforts which I recommend the Productivity Commission engage with, if they haven’t already. Below are a few I have come across to date, and it is far from an exhaustive list:

    • PM&C (Public Data Management Report/Implementation & Public Data Policy Statement)

    • Finance (running and rebooting data.gov.au, budget publishing, data integration in GovCMS)

    • ABS (multi agency arrangement, ABS.Stat)

    • DHS (analytics skills program, data infrastructure and analysis work)

    • Immigration (analytics and data publishing)

    • Social Services (benefits of data publishing)

    • Treasury (Budget work)

    • ANDS (catalogue work and upskilling in research sector)

    • NDI (super computer functionality for science)

    • ATO (smarter data program, automated and publications data publishing, service analytics, analytics, dev lab, innovationspace)

    • Industry (Lighthouse data integration and analysis, energy ratings data and app)

    • CrimTRAC and AUSTRAC (data collection, consolidation, analysis, sharing)

  • Other jurisdictions in Australia have done excellent work as well and you can see a list (hopefully up to date) of portals and policies on the Public Data Toolkit. I recommend the Productivity Commission engage with the various data teams for their experiences and expertise in this matter. There are outstanding efforts in all the State and Territory Governments involved as well as many Local Councils with instructive success stories, excellent approaches to policy, implementation and agency engagement/skills and private sector engagement projects.

Some current risks/issues

There are a number of issues and risks that exist in pursuing the current approach to data in the APS. Below are some considerations to take into account with any new policies or agendas to be developed.

  • There is significant duplication of infrastructure and investment from building bespoke analytics solutions rather than reusable data infrastructure that could support multiple analytics solutions. Agencies build multiple bespoke analytics projects without making the underpinning data available for other purposes resulting in duplicated efforts and under-utilised data across government.

  • Too much focus on pretty user interfaces without significant investment in, or focus on, data delivery.

  • Discovery versus reuse – too many examples of catalogues linking to dead data. Without the data, discovery is less than useful.

  • Limitations of tech in agencies by the ICT Department – often the ICT Department in an agency is reluctant to expand the standard operating environment beyond the status quo, limiting the tools and new technologies available to staff.

  • Copyright and legislation – particularly old interpretations of each and other excuses to not share.

  • Blockers to agencies publishing data (skills, resources, time, legislation, tech, competing priorities, e.g. the assumption that only specialists can work with data).

  • Often activities in the public sector are designed to maintain the status quo (budgets, responsibilities, staff count) and there is very little motivation to do things more efficiently or effectively. We need to establish these motivations for any chance to be sustainable.

  • Public perceptions about the roles and responsibilities of government change over time and it is important to stay engaged when governments want to try something new that the public might be uncertain about. There has been a lot of media attention about how data is used by government, with concerns aired about privacy. Australians are concerned about what Government plans to do with their data. Broadly, the Government needs to understand and engage with the public about what data it holds and how it is used. There needs to be trust built, both to improve the benefits from data and to ensure citizen privacy and rights are protected. Where government wants to use data in new ways, it needs to prosecute the case with the public and ensure there are appropriate limitations on use in place to avoid misuse of the data. Generally, where Australians can directly see the benefit of their data being used and where appropriate limitations are in place, they will probably react positively. For example, tax submissions are easier now that data auto-fills from employers and health providers when completing Online Tax. People appreciate the concept of having to only update their details once with government.

Benefits

I agree with the benefits identified by the Productivity Commission discussion paper however I would add the following:

  • Publishing government data, if published well, enables a competitive marketplace of service and product delivery, the ability to better leverage public and academic analysis for government use and more broadly, taps into the natural motivation of the entire community to innovate, solve problems and improve life.

  • Establishing authoritative data – often government is the authoritative source of the information it naturally collects as part of the function of government. When this data is not publicly available (through anonymised APIs if necessary), people will use whatever data they can get access to, reducing the authority of the data collected by Government.

  • A data-driven approach to collecting, sharing and publishing data enables true iterative approaches to policy and services. Without data, any changes to policy are difficult to justify and their impact impossible to track, so data provides a means to support change and to quickly identify what is working. Such feedback loops enable iterative improvements to policies and programs that can respond to the changing financial and social environment they operate in.

  • Publishing information in a data driven way can dramatically streamline reporting, government processes and decision making, freeing up resources that can be used for more high value purposes.

Public Sector Data Principles

The Public Data Statement provides a good basis of principles for this agenda. Below are some principles I think are useful to highlight with a brief explanation of each.

Principles:

  • build for the future - legacy systems will always be harder to deal with so agencies need to draw a line in the sand and ensure new systems are designed with data principles, future reuse and this policy agenda in mind. Otherwise we will continue to build legacy systems into the future. Meanwhile, just because a legacy system doesn’t natively support APIs or improved access doesn’t mean you can’t affordably build middleware solutions to extract, transform, share and publish data in an automated way.

  • data first - wherever data is used to achieve an outcome, publish the data along with the outcome. This will improve public confidence in government outcomes and will also enable greater reuse of government data. For example, where graphs or analysis are published also publish the data. Where a mobile app is using data, publish the data API. Where a dashboard is set up, also provide access to the underpinning data.

  • use existing data, from the source where possible - this may involve engaging with or even paying for data from private sector or NGOs, negotiating with other jurisdictions or simply working with other government entities to gain access.

  • build reusable data infrastructure first - wherever data is part of a solution, the data should be accessible through APIs so that other outcomes and uses can be realised, even if the APIs are only used for internal access in the first instance.

  • data driven decision making to support iterative and responsive policy and implementations approaches – all decisions should be evidence based, all projects, policies and programs should have useful data indicators identified to measure and monitor the initiative and enable iterative changes backed by evidence.

  • consume your own data and APIs - agencies should consider how they can better use their own data assets and build access models for their own use that can be used publicly where possible. In consuming their own data and APIs, there is a better chance the data and APIs will be designed and maintained to support reliable reuse. This could include raw or aggregate data APIs for analytics, dashboards, mobile apps, websites, publications, data visualisations or any other purpose.

  • developer empathy - if government agencies start to prioritise the needs of data users when publishing data, there is a far greater likelihood the data will be published in a way developers can use. For instance, no developer likes to use PDFs, so why would an agency publish data in a PDF (hint: there is no valid reason. PDF does not make your data more secure!).

  • standardise where beneficial but don’t allow the perfect to delay the good - often the focus on data jumps straight to standards and then multi year/decade standards initiatives are stood up which creates huge delays to accessing actual data. If data is machine readable, it can often be used and mapped to some degree which is useful, more useful than having access to nothing.

  • automate, automate, automate! – where human effort is required, tasks will always be inefficient and prone to error. Data collection, sharing and publishing should be automated where possible. For example, when data is regularly requested, agencies should automate the publishing of data and updates which both reduces the work for the agency and improves the quality for data users.

  • common platforms - where possible agencies should use existing common platforms to share and publish data. Where they need to develop new infrastructure, efforts should be made to identify where new platforms might be useful in a whole of government or multi agency context and built to be shared. This will support greater reuse of infrastructure as well as data.

  • a little less conversation a little more action – the public service needs to shift from talking about data to doing more in this space. Pilot projects, experimentation, collaboration between implementation teams and practitioners, and generally a greater focus on getting things done.

Recommendations for the Public Data agenda

Strategic
  1. Strong Recommendation: Develop a holistic vision and strategy for a data-driven APS. This could perhaps be part of a broader digital or ICT strategy, but there needs to be a clear goal that all government entities are aiming towards. Otherwise each agency will continue to do whatever they think makes sense just for them with no convergence in approach and no motivation to work together.

  2. Strong Recommendation: Develop and publish work program and roadmap with meaningful measures of progress and success regularly reported publicly on a public data agenda dashboard. NSW Government already have a public roadmap and dashboard to report progress on their open data agenda.

Whole of government data infrastructure
  1. Strong Recommendation: Grow the data.gov.au technical team to at least 5 people to grow the whole of government catalogue and cloud based data hosting infrastructure, to grow functionality in response to data publisher and data user requirements, to provide free technical support and training to agencies, and to regularly engage with data users to grow public confidence in government data. The data.gov.au experience demonstrated that even just a small motivated technical team could greatly assist agencies to start on their data publishing journey to move beyond policy hypothesising into practical implementation. This is not something that can be efficiently or effectively outsourced in my experience.

  • I note that in the latest report from PM&C, Data61 have been engaged to improve the infrastructure (which looks quite interesting) however, there still needs to be an internal technical capability to work collaboratively with Data61, to support agencies, to ensure what is delivered by contractors meets the technical needs of government, to understand and continually improve the technical needs and landscape of the APS, to contribute meaningfully to programs and initiatives by other agencies, and to ensure the policies and programs of the Public Data Branch are informed by technical realities.

  1. Recommendation: Establish/extend a data infrastructure governance/oversight group with representatives from all major data infrastructure provider agencies including the central public data team to improve alignment of agendas and approaches for a more holistic whole of government approach to all major data infrastructure projects. The group would assess new data functional requirements identified over time, would identify how to best collectively meet the changing data needs of the public sector and would ensure that major data projects apply appropriate principles and policies to enable a data driven public service. This work would also need to be aligned with the work of the Data Champions Network.

  2. Recommendation: Map out, publish and keep up to date the data infrastructure landscape to assist agencies in finding and using common platforms.

  3. Recommendation: Identify on an ongoing basis publisher needs and provide whole of government solutions where required to support data sharing and publishing (eg – data.gov.au, ABS infrastructure, NationalMap, analytics tools, github and code for automation, whole of gov arrangements for common tools where they provide cost benefits).

  4. Recommendation: Create a requirement for New Policy Proposals that any major data initiatives (particularly analytics projects) also make the data available via accessible APIs to support other uses or publishing of the data.

  5. Recommendation: Establish (or build upon existing efforts) an experimental data playground or series of playgrounds for agencies to freely experiment with data, develop skills, trial new tools and approaches to data management, sharing, publishing, analysis and reuse. There are already some sandbox environments available and these could be mapped and updated over time for agencies to easily find and engage with such initiatives.

Grow consumer confidence
  1. Strong Recommendation: Build automated data quality indicators into data.gov.au. Public quality indicators provide an easy way to identify quality data, thus reducing the time and effort required by data users to find something useful. This could also support a quality search interface, for instance data users could limit searches to “high quality government data” or choose granular options such as “select data updated this year”. See my earlier blog post (written while at PM&C) for a draft of basic technical quality indicators which could be implemented quickly, giving data users a basic indication of how usable and useful data is in a consistent automated way. Additional quality indicators, including domain specific quality indicators, could be implemented in a second or subsequent iteration of the framework.

  2. Strong Recommendation: Establish regular public communications and engagement to improve relations with data users, improve perception of agenda and progress and identify areas of data provision to prioritise. Monthly blogging of progress, public access to the agenda roadmap and reporting on progress would all be useful. Silence is generally assumed to mean stagnation, so it is imperative for this agenda to have a strong public profile, which in part relies upon people increasingly using government data.

  3. Strong Recommendation: Establish a reasonable funding pool for agencies to apply for when establishing new data infrastructure, when trying to make existing legacy systems more data friendly, and for responding to public data requests in a timely fashion. Agencies should also be able to apply for specialist resource sharing from the central and other agencies for such projects. This will create the capacity to respond to public needs faster and develop skills across the APS.

  4. Strong Recommendation: The Australian Government undertake an intensive study to understand the concerns Australians hold relating to the use of their data and develop a new social pact with the public regarding the use and limitations of data.

  5. Recommendation: establish a 1-2 year project to support Finance in implementing the data driven recommendations from the Murray Inquiry with 2-3 dedicated technical resources working with relevant agency teams. This will result in regulatory streamlining, improved reporting and analysis across the APS, reduced cost and effort in the regular reporting requirements of government entities and greater reuse of the data generated by government reporting.

  6. Recommendation: Establish a short program to focus on publishing and reporting progress on some useful high value datasets, applying the Public Data Policy Statement requirements for data publishing. The list of high value datasets could be drawn from the Data Barometer, the Murray Inquiry, existing requests from data.gov.au, and work from PM&C. The effort of determining the MOST high value data to publish has potentially got in the way of actual publishing, so it would be better to use existing analysis to prioritise some datasets and, more importantly, to establish a data by default approach across government that enables the kinds of serendipitous use of data which lead to truly innovative outcomes.

  7. Recommendation: Citizen driven privacy – give citizens the option to share data for benefits and simplified services, and a way to access data about themselves.

Grow publisher capacity and motivation
  1. Strong Recommendation: Document the benefits for agencies to share data and create better guidance for agencies. There has been a lot of work since the reboot of data.gov.au to educate agencies on the value of publishing data. The value of specialised data sharing and analytics projects is often evident to those driving those kinds of projects, but traditionally there have not been many natural motivations for agencies to publish data, which had the unfortunate result of low levels of data publishing. There is a lot of anecdotal evidence that agencies have saved time and money by publishing data publicly, which has in turn driven greater engagement and improvements in data publishing by agencies. If these examples were better documented (now that there are more resources) and if agencies were given more support in developing holistic public data strategies, we would likely see more data published by agencies.

  2. Strong Recommendation: Implement an Agency League Table to show agency performance on publishing or otherwise making government data publicly available. I believe such a league table needs to be carefully designed to include measures that will drive better behaviours in this space. I have previously mapped out a draft league table which ranks agency performance by quantity (number of data resources, weighted by type), quality (see previous note on quality metrics), efficiency (the time and/or money saved in publishing data) and value (a weighted measure of usage and reuse case studies) and would be happy to work with others in re-designing the best approach if useful.

  3. Recommendation: Establish regular internal hackfests with tools for agencies to experiment with new approaches to data collection, sharing, publishing and analysis – build on ATO lab, cloud tools, ATO research week, etc.

  4. Recommendation: Require a data reporting component for New Policy Proposals and new tech projects wherein meaningful data and metrics are identified that will provide intelligence on the progress of the initiative throughout the entire process, not just at the end of the project.

  5. Recommendation: Add data principles and API driven and automated data provision to the digital service standard and APSC training.

  6. Recommendation: Require public APIs for all government data, appropriately aggregated where required, leveraging common infrastructure where possible.

  7. Recommendation: Establish a “policy difference engine” – a policy dashboard that tracks the top 10 or 20 policy objectives for the government of the day which includes meaningful metrics for each policy objective over time. This will enable the discovery of trends, the identification of whether policies are meeting their objectives, and supports an evidence based iterative approach to the policies because the difference made by any tweaks to the policy agenda will be evident.

  8. Recommendation: all publicly funded research data to be published publicly, and discoverable on central research data hub with free hosting available for research institutions. There has been a lot of work by ANDS and various research institutions to improve discovery of research data, but a large proportion is still only available behind a paywall or with an education logon. A central repository would reduce the barrier for research organisations to publicly publish their data.

  9. Recommendation: Require that major ICT and data initiatives consider cloud environments for the provision, hosting or analysis of data.

  10. Recommendation: Identify and then extend or provide commonly required spatial web services to support agencies in spatially enabling data. Currently individual agencies have to run their own spatial services but it would be much more efficient to have common spatial web services that all agencies could leverage.

Build data driven culture across APS
  1. Strong Recommendation: Ensure data approaches are considered in all major government investments. For example, if data sensors were built into major infrastructure projects it would create more intelligence about how the infrastructure is used over time. If all major investments included data reporting then perhaps it would be easier to keep projects on time and on budget.

  2. Recommendation: Establish a whole of government data skills program, not just for specialist skills, but to embed an understanding of data-driven approaches across the entire APS. This would ideally include mandatory data training for management (in the same way OH&S and procurement are mandatory training). At C is a draft approach that could be taken.

  3. Recommendation: Require that all government contracts which create new data make that data available to the contracting government entity under a Creative Commons By Attribution only licence, so that government funded data is able to be published publicly according to government policy. I have seen cases of contracts leaving ownership with companies and the data then not being reusable by government.

  4. Recommendation: Real data driven indicators required for all new policies, signed off by data champions group, with data for KPIs publicly available on data.gov.au for public access and to feed policy dashboards. Gov entities must identify existing data to feed KPIs where possible from gov, private sector, community and only propose new data collection where new data is clearly required.

  • Note: it was good to see a new requirement to include evidence based on data analytics for new policy proposals and to consult with the Data Champions about how data can support new proposals in the recently launched implementation report on the Public Data Management Report. However, I believe it needs to go further and require data driven indicators be identified up front and reported against throughout as per the recommendation above. Evidence to support a proposal does not necessarily provide the ongoing evidence to ensure implementation of the proposal is successful or has the intended effect, especially in a rapidly changing environment.

  1. Recommendation: Establish relationships with the private sector to identify aggregate data points already used in the private sector that could be leveraged by the public sector. This would be more efficient and accurate than new data collection.

  2. Recommendation: Establish or extend a cross agency senior management data champions group with specific responsibilities to oversee the data agenda, sign off on data indicators for NPPs as realistic, provide advice to Government and Finance on data infrastructure proposals across the APS.

  3. Recommendation: Investigate the possibilities for improving or building data sharing environments for better sharing data between agencies.

  4. Recommendation: Take a distributed and federated approach to linking unit record data. Secure API access to sensitive data would avoid creating a honey pot.

  5. Recommendation: Establish data awards as part of annual ICT Awards to include: most innovative analytics, most useful data infrastructure, best data publisher, best data driven policy.

  6. Recommendation: Extend the whole of government service analytics capability started at the DTO and provide access to all agencies to tap into a whole of government view of how users interact with government services and websites. This function and intelligence, if developed as per the original vision, would provide critical evidence of user needs as well as the impact of changes and useful automated service performance metrics.

  7. Recommendation: Support data driven publishing including an XML platform for annual reports and budgets, a requirement for data underpinning all graphs and datavis in gov publications to be published on data.gov.au.

  8. Recommendation: develop a whole of government approach to unit record aggregation of sensitive data to get consistency of approach and aggregation.

Implementation recommendations
  1. Move the Public Data Branch to an implementation agency – Currently the Public Data Branch sits in the Department of Prime Minister and Cabinet. Considering this Department is a policy entity, the question arises as to whether it is the right place in the longer term for an agenda which requires a strong implementation capability and focus. Public data infrastructure needs to be run like other whole of government infrastructure and would be better served as part of a broader online services delivery team. Possible options would include one of the shared services hubs, a data specialist agency with a whole of government mandate, or the office of the CTO (Finance) which runs a number of other whole of government services.

Downloadable copy

Pia Waugh: Gather-ing some thoughts on societal challenges

Tue, 2016-07-26 15:01

On the weekend I went to the GatherNZ event in Auckland, an interesting unconference. I knew there were going to be some pretty awesome people hanging out, which gave me a chance to catch up with and introduce the family to some friends, hear some interesting ideas, and road test some ideas I’ve been having about where we are all heading in the future. I ran a session I called “Choose your own adventure, please” and it was packed! Below is a bit of a write up of what was discussed, as there was a lot of interest in how to keep the conversation going. I confess, I didn’t expect so much interest as to be asked where the conversation could be continued, but this is a good start I think. I was particularly chuffed when a few attendees said the session blew their minds.

I’m going to be blogging a fair bit over the coming months on this topic in any case as it relates to a book I’m in the process of researching and writing, but more on that next week!

Choose your own adventure, please

We are at a significant tipping point in history. The world and the very foundations our society were built on have changed, but we are still largely stuck in the past in how we think and plan for the future. If we don’t make some active decisions about how we live, think and prioritise, then we will find ourselves subconsciously reinforcing the status quo at every turn and not in a position to genuinely create a better future for all. I challenge everyone to consider how they think and to actively choose their own adventure, rather than just doing what was done before.

How has the world changed? Well many point to the changes in technology and science, and the impact these have had on our quality of life. I think the more interesting changes are in how power and perspectives have changed, which created the environment for scientific and technological progress in the first instance, but also created the ability for many many more individuals to shape the world around them. We have seen traditional paradigms of scarcity, centralisation and closed systems be outflanked and outdated by modern shifts to surplus, distribution and open systems. When you were born a peasant and died one, what power did you have to affect your destiny? Nowadays individuals are more powerful than ever in our collective history, with the traditionally centralised powers of publishing, property, communications, monitoring and even enforcement now distributed internationally to anyone with access to a computer and the internet, which is over a third of the world’s population and growing. I blogged about this idea more here. Of course, these shifts are proving challenging for traditional institutions and structures to keep up, but individuals are simply routing around these dinosaurs, putting such organisations in the uncomfortable position of either adapting or rendering themselves irrelevant.

Choices, choices, choices

We discussed a number of specific premises or frameworks that underpinned the development of much of the world we know today, but are now out of touch with the changing world we live in. It was a fascinating discussion, so thank you to everyone who came and contributed. Although I think we only scratched the surface, it gave a lot of people food for thought.

  • Open vs closed – open systems (open knowledge, data, government, source, science) are outperforming closed ones in almost everything from science, technology, business models, security models, government and political systems, human knowledge and social models. Open systems enable rapid feedback loops that support greater iteration and improvements in response to the world, and open systems create a natural motivation for the players involved to perform well and gain the benefits of a broader knowledge, experience and feedback base. Open systems also support a competitive collaborative environment, where organisations can collaborate on the common, but compete on their specialisation. We discussed how security by obscurity was getting better understood as a largely false premise and yet, there are still so many projects, decisions, policies or other initiatives where closed is the assumed position, in contrast to the general trend towards openness across the board.
  • Central to distributed – many people and organisations still act like kings in castles, protecting their stuff from the masses and only collaborating with walls and moats in place to keep out the riff raff. The problem is that everything is becoming more distributed, and the smartest people will never all be in the one castle, so if you want the best outcomes, be it for a policy, product, scientific discovery, service or anything else, you need to consider what is out there and how you can be a part of a broader ecosystem. Building on the shoulders of giants and being a shoulder for others to build upon. Otherwise you will always be slower than those who know how to be a node in the network. Although deeply hierarchical systems still exist, individuals are learning how to route around the hierarchy (which is only an imaginary construct in any case). There will always be specialists and the need for central controls over certain things however, if whatever you do is done in isolation, it will only be effective in isolation. Everything and everyone is more and more interconnected so we need to behave more in this way to gain the benefits, and to ensure what we do is relevant to those we do it for. By tapping into the masses, we can also tap into much greater capacity and feedback loops to ensure how we iterate is responsive to the environment we operate in. Examples of the shift included media, democracy, citizen movements, ideology, security, citizen science, gov as an API, transnational movements and the likely impact of blockchain technologies on the financial sector.
  • Scarcity to surplus – the shift from scarcity to surplus is particularly interesting because so much of our laws, governance structures, business models, trade agreements and rules for living are based around antiquated ideas of scarcity and property. We now apply the idea of ownership to everything and I shared a story of a museum claiming ownership on human remains taken from Australia. How can you own that and then refuse to repatriate the remains to that community? Copyright was developed when the ability to copy something was costly and hard. Given digital property (including a lot of “IP”) is so easily replicated with low/zero cost, it has wrought havoc with how we think about IP and yet we have continued to duplicate this antiquated thinking in a time of increasing surplus. This is a problem because new technologies could genuinely create surplus in physical properties, especially with the developments in nano-technologies and 3D printing, but if we bind up these technologies to only replicate the status quo, we will never realise the potential to solve major problems of scarcity, like hunger or poverty.
  • Nationalism and tribalism – because of global communications, more people feel connected with their communities of interest, which can span geopolitical, language, disability and other traditional barriers to forming groups. This will also have an impact on loyalties because people will have an increasingly complex relationship with the world around them. Citizens can and will increasingly jurisdiction shop for a nation that supports their lifestyle and ideological choices, the same way that multinational corporates have jurisdiction shopped for low tax, low regulation environments for some time. On a more micro level, individuals engage in us vs them behaviours all the time, and it gets in the way of working together.
  • Human augmentation and (dis)ability – what it means to look and be human will start to change as more human augmentation starts to become mainstream. Not just cosmetic augmentations, but functional. The body hacking movement has been playing with human abilities and has discovered that the human brain can literally adapt to and start to interpret foreign neurological inputs, which opens up the path to not just augmenting existing human abilities, but expanding and inventing new human abilities. If we consider the Olympics have pretty much found the limit of natural human sporting achievement and have become arguably a bit boring, perhaps we could lift the limitations on the Paralympics and start to see rocket powered 100m sprints, or cyborg Judo competitions. As we start to explore what we can do with ourselves physically, neurologically and chemically, it will challenge a lot of views on what it means to be human. But why should we limit ourselves?
  • Outsourcing personal responsibility – with advances in technology, many have become lazy about how far their personal responsibility extends. We outsource small tasks, then larger ones, then strategy, then decision making, and we end up having no personal responsibility for major things in our world. Projects can fail, decisions become automated, ethics get buried in code, but individuals can keep their noses clean. We need to stop trying to avoid risk to the point where we don’t do anything and we need to ensure responsibility for human decisions are not automated beyond human responsibility.
  • Unconscious bias of privileged views, including digital colonialism – the need to be really aware of our assumptions and try to not simply reinvent the status quo or reinforce “structural white supremacy” as it was put by the contributor. Powerful words worth pondering! Explicit inclusion was put forward as something to prioritise.
  • Work – how we think about work! If we are moving into a more automated landscape, perhaps how we think about work will fundamentally change which would have enormous ramifications for the social and financial environment. Check out Tim Dunlop’s writing on this
  • Facts to sensationalism – the flow of information and communications are now so rapid that people, media and organisations are motivated to ever more sensationalism rather than considered opinions or facts. Definitely a shift worth considering!

Other feedback from the room included:

  • The importance of considering ethics, values and privilege in making decisions.
  • The ability to route around hierarchy, but the inevitable push back of established powers on the new world.
  • The idea that we go in cycles of power from centralised to distributed and back again. I confess, this idea is new to me and I’ll be pondering on it more.

Any feedback, thinking or ideas are welcome in the comments below. It was a fun session.

Simon Lyall: Gather Conference 2016 – Afternoon

Sat, 2016-07-23 15:02

The Gathering

Chloe Swarbrick

  • Whose responsibility is it to disrupt the system?
  • Maybe try and engage with the system we have for a start before writing it off.
  • You disrupt the system yourself or you hold the system accountable

Nick McFarlane

  • He wrote a book
  • Rock Stars are dicks to work with

So you want to Start a Business

  • Hosted by Reuben and Justin (the accountant)
  • Things you need to know in your first year of business
  • How serious is the business, what sort of structure
    • If you are serious, you have to do things properly
    • Have you got paying customers yet
    • Could just be an idea or a hobby
  • Sole Trader vs Incorporated company vs Trust vs Partnership
  • Incorporated
    • Directors and Shareholders needed to be decided on
    • Can take just half an hour
  • when to get a GST number?
    • If over $60k turnover a year
    • If you have lots of stuff you plan to claim back.
  • Have an accounting system from Day 1 – Xero is pretty good
  • Get an advisor or mentor that is not emotionally invested in your company
  • If partnership then split up responsibilities so you can hold each other accountable for specific items
  • If you are using Xero then your accountant should be using Xero directly not copying it into a different system.
  • Remuneration
    • Should have a shareholders agreement
    • PAYE possibility from drawings or put 30% aside
    • Even if only a small hobby company, you will need to declare income to IRD, especially at a non-trivial level.
  • What Level to start at Xero?
    • Probably from the start if the business is intended to be serious
    • A bit of pain to switch over later
  • Don’t forget about ACC
  • Remember you are due provisional tax once you get over the $2500 for the previous year.
  • Home Office expense claim – claim percentage of home rent, power etc
  • Get in professionals to help

Diversity in Tech

  • Diversity is important
    • Why is it important?
    • Does it mean the same for everyone
  • If we have people with different “ways of thinking” then we will have diverse views and therefore wider and better solutions
  • example “A Polish engineer could analyse a Polish specific character input error”
  • example “Controlling a robot in Samoan”, robots are not just in english
  • Stereotypes for some groups to specific jobs, eg “Indians in tech support”
  • Example: All hires went though University of Auckland so had done the same courses etc
  • How do you fix it when people innocently hire everyone from the same background? How do you break the pattern? Does being the first different hire mean you have to represent everybody in that group?
  • I didn’t want to be a trail-blazer
  • Wow’ed out at “Women in tech” event, first time saw “majority of people are like me” in a bar.
  • “If he is a white male and I’m going to hire him on the team that is already full of white men he better be exception”
  • Worried about implication that “diversity” vs “Meritocracy” and that diverse candidates are not as good
  • Usual over-representation of white-males in the discussion even in topics like this.
  • Notion that somebody was only hired to represent diversity is very harmful especially for that person
  • If you are hiring for a tech position then 90% of your candidates will be white males; try to focus your diversity effort on getting a more diverse group applying for the jobs rather than tilting the actual hiring.
  • Even in maker spaces where anyone is welcome, there are a lot fewer women. One theory: men’s mags show things unfinished while in women’s mags everything is perfect, so women don’t want to show off something that is unfinished.
  • Need to make the workforce diverse now to match the younger people coming into it
  • Need to cover “power income” people who are not exposed to tech
  • Even a small number are role models for the future for the young people today
  • Also need to address the problem of women dropping out of tech in the 30s and 40s. We can’t push girls into an “environment filled with acid”
  • Example: taking “cocky arrogant males” out of classes into an “advanced stream” meant the remaining class saw women graduating and staying in at a much higher rate.

Podcasting

  • Paul Spain from Podcast New Zealand organising
  • Easiest to listen to when doing manual stuff or in car or bus
  • Need to avoid overload of commercials, eg interview people from the company about the topic of interest rather than about their product
  • Big firms putting money into podcasting
  • In the US 21% of the market are listening every single month. In NZ perhaps more like 5% since not a lot of awareness or local content
  • Some radio shows are being re-cut and published as podcasts
  • Not a good directory of NZ podcasts
  • Advise people use proper equipment if possible if more than a once-off. Bad sound quality is very noticeable.
  • One person: 5 part series on immigration and immigrants in NZ
  • Making the charts is a big exposure
  • Apple’s “new and noteworthy” list
  • Domination by traditional personalities and existing broadcasters at present. But that only helps traction within New Zealand

 

 


Francois Marier: Replacing a failed RAID drive

Sat, 2016-07-23 14:55

Here's the complete procedure I followed to replace a failed drive from a RAID array on a Debian machine.

Replace the failed drive

After seeing that /dev/sdb had been kicked out of my RAID array, I used smartmontools to identify the serial number of the drive to pull out:

smartctl -a /dev/sdb

Armed with this information, I shut down the computer, pulled the bad drive out and put the new blank one in.

Initialize the new drive

After booting with the new blank drive in, I copied the partition table using parted.

First, I took a look at what the partition table looks like on the good drive:

$ parted /dev/sda unit s print

and created a new empty one on the replacement drive:

$ parted /dev/sdb unit s mktable gpt

then I ran mkpart for all 4 partitions and made them all the same size as the matching ones on /dev/sda.

Finally, I ran toggle 1 bios_grub (boot partition) and toggle X raid (where X is the partition number) for all RAID partitions, before verifying using print that the two partition tables were now the same.
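
As a rough sketch, those mkpart and toggle steps look something like the following when run non-interactively. The sector values here are placeholders only; the real start/end sectors come from the print output of /dev/sda above:

# Placeholder sector values -- use the real Start/End figures from the /dev/sda listing
$ parted /dev/sdb unit s mkpart primary 2048s 4095s
$ parted /dev/sdb unit s mkpart primary 4096s 976562175s
# ...repeat mkpart for partitions 3 and 4 with matching sizes...
$ parted /dev/sdb toggle 1 bios_grub
$ parted /dev/sdb toggle 2 raid
$ parted /dev/sdb unit s print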

Resync/recreate the RAID arrays

To sync the data from the good drive (/dev/sda) to the replacement one (/dev/sdb), I ran the following on my RAID1 partitions:

mdadm /dev/md0 -a /dev/sdb2
mdadm /dev/md2 -a /dev/sdb4

and kept an eye on the status of this sync using:

watch -n 2 cat /proc/mdstat

In order to speed up the sync, I used the following trick:

blockdev --setra 65536 "/dev/md0"
blockdev --setra 65536 "/dev/md2"
echo 300000 > /proc/sys/dev/raid/speed_limit_min
echo 1000000 > /proc/sys/dev/raid/speed_limit_max
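
Once the resync has finished, those limits can be put back. The values below are the stock kernel defaults, so adjust them if your system normally uses something different:

echo 1000 > /proc/sys/dev/raid/speed_limit_min      # default minimum resync speed
echo 200000 > /proc/sys/dev/raid/speed_limit_max    # default maximum resync speed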

Then, I recreated my RAID0 swap partition like this:

mdadm /dev/md1 --create --level=0 --raid-devices=2 /dev/sda3 /dev/sdb3
mkswap /dev/md1

Because the swap partition is brand new (you can't restore a RAID0, you need to re-create it), I had to update two things:

  • replace the UUID for the swap mount in /etc/fstab, with the one returned by mkswap (or running blkid and looking for /dev/md1)
  • replace the UUID for /dev/md1 in /etc/mdadm/mdadm.conf with the one returned for /dev/md1 by mdadm --detail --scan
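
A rough sketch of how to pull out the new UUIDs before editing those two files by hand; the update-initramfs call is an extra step not mentioned above, but it is commonly needed on Debian so the rebuilt initramfs picks up the edited mdadm.conf:

blkid /dev/md1                # UUID for the swap entry in /etc/fstab
mdadm --detail --scan         # ARRAY lines (including UUIDs) for /etc/mdadm/mdadm.conf
update-initramfs -u           # regenerate the initramfs so the updated mdadm.conf is used at boot
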
Ensuring that I can boot with the replacement drive

In order to be able to boot from both drives, I reinstalled the grub boot loader onto the replacement drive:

grub-install /dev/sdb

before rebooting with both drives to first make sure that my new config works.

Then I booted without /dev/sda to make sure that everything would be fine should that drive fail and leave me with just the new one (/dev/sdb).

This test obviously gets the two drives out of sync, so I rebooted with both drives plugged in and then had to re-add /dev/sda to the RAID1 arrays:

mdadm /dev/md0 -a /dev/sda2
mdadm /dev/md2 -a /dev/sda4

Once that finished, I rebooted again with both drives plugged in to confirm that everything is fine:

cat /proc/mdstat

Then I ran a full SMART test over the new replacement drive:

smartctl -t long /dev/sdb
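
The long test runs inside the drive in the background; its progress and final result can be checked with standard smartctl options, for example:

smartctl -a /dev/sdb            # "Self-test execution status" shows the percentage remaining
smartctl -l selftest /dev/sdb   # self-test log shows the result once the long test completes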

Simon Lyall: Gather Conference 2016 – Morning

Sat, 2016-07-23 09:02

At the Gather Conference again for about the 6th time. It is a 1-day tech-orientated unconference held in Auckland every year.

The day is split into seven streamed sessions each 40 minutes long (of about 8 parallel rooms of events that are each scheduled and run by attendees) plus an opening and a keynote session.

How to Steer your own career – Shirley Tricker

  • Asked people hands up on their current job situation, FT vs PT, single vs multiple jobs
  • Alternatives to traditional careers of work; it is possible to craft your own career
  • Recommended Blog – Free Range Humans
  • Job vs Career
    • Job – something you do for somebody else
    • Career – Unique to you, your life’s work
    • Career – What you do to make a contribution
  • Predicted that a greater number of people will not stay with one (or even 2 or 3) employers through their career
  • Success – defined by your goals, lifestyle wishes
  • What are your strengths – Know how you are valuable, what you can offer people/employers, ways you can branch out
  • Hard and Soft Skills (soft skills defined broadly, things outside a regular job description)
  • Develop soft skills
    • List skills and review ways to develop and improve them
    • Look at people you admire and copy them
    • Look at job descriptions
  • Skills you might need for a portfolio career
    • Good at organising, marketing, networking
    • flexible, work alone, negotiation
    • Financial literacy (handle your accounts)
  • Getting started
    • Start small ( don’t give up your day job overnight)
    • Get training via work or independently
    • Develop you strengths
    • Fix weaknesses
    • Small experiments
    • cheap and fast (start a blog)
    • Don’t have to start out as an expert, you can learn as you go
  • Just because you are in control doesn’t make it easy
  • Resources
    • Careers.govt.nz
    • Seth Goden
    • Tim Ferris
    • eg outsources her writing.
  • Tools
    • Xero
    • WordPress
    • Canva for images
    • Meetup
    • Odesk and other freelance websites
  • Feedback from Audience
    • Have somebody to report to, eg meet with friend/adviser monthly to chat and bounce stuff off
    • Cultivate Women’s mentoring group
    • This doesn’t seem to filter through to young people, they feel they have to pick a career at 18 and go to university to prep for that.
    • Give advice to people and this helps you define
    • Try and make the world a better place: enjoy the work you are doing, be happy and proud of the outcome of what you are doing and be happy that it is making the world a bit better
    • How do I “motivate myself” without a push from an employer?
      • Do something that you really want to do so you won’t need external motivation
      • Find someone who is doing something right and see what they did
      • Awesome for introverts
    • If you want to start a startup then work for one to see what it is like and learn skills
    • You don’t have to have a startup in your 20s, you can learn your skills first.
    • Sometimes you have to do a crappy job at the start to get onto the cool stuff later. You have to look at the goal or path sometimes

Books and Podcasts – Tanya Johnson

Stuff people recommend

  • Intelligent disobedience – Ira
  • Hamilton the revolution – based on the musical
  • Never Split the difference – Chris Voss (ex hostage negotiator)
  • The Three Body Problem – Liu Cixin – Sci Fi series
  • Lucky Peach – Food and fiction
  • Unlimited Memory
  • The Black Swan and Fooled by Randomness
  • The Setup (usesthis.com) website
  • Tim Ferris Podcast
  • Freakonomics Podcast
  • Moonwalking with Einstein
  • Clothes, Music, Boy – Viv Albertine
  • TIP: Amazon Whispersync for Kindle App (audiobook across various platforms)
  • TIP: Blinkist – 15 minute summaries of books
  • An Intimate History of Humanity – Theodore Zeldin
  • How to Live – Sarah Bakewell
  • TIP: Pocketcasts is a good podcast app for Android.
  • Tested Podcast from Mythbusters people
  • Trumpcast podcast from Slate
  • A Fighting Chance – Elizabeth Warren
  • The Choice – Og Mandino
  • The Good life project Podcast
  • The Ted Radio Hour Podcast (on 1.5 speed)
  • This American Life
  • How to be a Woman by Caitlin Moran
  • The Hard thing about Hard things books
  • Flashboys
  • The Changelog Podcast – Interview people doing Open Source software
  • The Art of Opportunity – Roseland Zander
  • Red Rising Trilogy by Pierce Brown
  • On the Rag podcast by the Spinoff
  • Hamish and Andy podcast
  • Radiolab podcast
  • Hardcore History podcast
  • Car Talk podcast
  • Ametora – Story of Japanese menswear since WW2
  • .net rocks podcast
  • How not to be wrong
  • Savage Love Podcast
  • Friday Night Comedy from the BBC (especially the News Quiz)
  • Answer me this Podcast
  • Back to work podcast
  • Reply All podcast
  • The Moth
  • Serial
  • American Blood
  • The Productivity podcast
  • Keeping it 1600
  • Ruby Rogues Podcast
  • Game Change – John Heilemann
  • The Road less Travelled – M Scott Peck
  • The Power of Now
  • Snow Crash – Neal Stephenson

My Journey to becoming a Change Agent – Suki Xiao

  • Start of 2015 was a policy adviser at Ministry
  • Didn’t feel connected to the job or to the people she was making policies for
  • Outside of work was a Youthline counsellor
  • Wanted to make a difference, organised some internal talks
  • Wanted to make changes, got told had to be a manager to make changes (10 years away)
  • Found out about R9 accelerator. Startup accelerator looking at Govt/Business interaction and pain points
  • Get seconded to it
  • First month was very hard.
  • Speed of change was difficult, “Lean into the discomfort” – Team motto
  • Be married to the problem
    • Specific problem was making sure there were enough seasonal workers; the team came up with a solution but customers didn’t like it. It was not solving the actual problem customers had.
    • Team was married to the problem, not married to the solution
  • When went back to old job, found slower pace hard to adjust back
  • Got offered a job back at the accelerator, coaching up to 7 teams.
    • Very hard work, lots of work, burnt out
    • 50% pay cut
    • Worked out she wasn’t “Agile” herself
    • Started doing personal Kanban boards
    • Cut back number of teams coaching, higher quality
  • Spring Board
    • Place can work at sustainable pace
    • Working at Nomad 8 as an independent Agile consultant
    • Work on separate companies but some support from colleagues
  • Find my place
    • Joined Xero as a Agile Team Facilitator
  • Takeaways
    • Anybody can be a change agent
    • An environment that supports and empowers
    • Look for support
  • Conversation on how you overcome the “Everest” big huge goal
    • Hard to get past the first step for some – speaker found she tended to do first, think later. Others over-thought beforehand
    • It seems hard but think of the hard things you have done in your life and it is usually not as bad
    • Motivate yourself by having no money and having no choice
    • Point all the bad things out in the open, visualise them all and feel better cause they will rarely happen
    • Learn to recognise your bad patterns of thoughts
    • “The War of Art” – Steven Pressfield (skip the Angels chapter)
  • Are places Serious about Agile instead of just placing lip-service?
    • Questioner was older and found places wanted younger Agile coaches
    • Companies have to completely change their organisation, e.g. replace project managers
    • eg CEO is still waterfall but people lower down are into Agile. Not enough management buy-in.
    • Speaker left one client that wasn’t serious about changing
  • Went through an Agile process, made “Putting Agile into the Org” the product
  • Show customers what the value is
  • Certification advice: all sorts of options. The Nomad8 course is recommended

 


Russell Coker: 802.1x Authentication on Debian

Fri, 2016-07-22 17:02

I recently had to setup some Linux workstations with 802.1x authentication (described as “Ethernet authentication”) to connect to a smart switch. The most useful web site I found was the Ubuntu help site about 802.1x Authentication [1]. But it didn’t describe exactly what I needed so I’m writing a more concise explanation.

The first thing to note is that the authentication mechanism works the same way as 802.11 wireless authentication, so it’s a good idea to have the wpasupplicant package installed on all laptops just in case you need to connect to such a network.

The first step is to create a wpa_supplicant config file, I named mine /etc/wpa_supplicant_SITE.conf. The file needs contents like the following:

network={
    key_mgmt=IEEE8021X
    eap=PEAP
    identity="USERNAME"
    anonymous_identity="USERNAME"
    password="PASS"
    phase1="auth=MD5"
    phase2="auth=CHAP password=PASS"
    eapol_flags=0
}

The first difference between what I use and the Ubuntu example is that I’m using “eap=PEAP”; that depends on how the network is configured, and whoever runs your switch can tell you the correct settings for it. The next difference is that I’m using “auth=CHAP” and the Ubuntu example has “auth=PAP”. The difference between those protocols is that CHAP has a challenge-response and PAP just has the password sent (maybe encrypted) over the network. If whoever runs the network says that they “don’t store unhashed passwords” or makes any similar claim then they are almost certainly using CHAP.

Change USERNAME and PASS to your user name and password.

wpa_supplicant -c /etc/wpa_supplicant_SITE.conf -D wired -i eth0

The above command can be used to test the operation of wpa_supplicant.

Successfully initialized wpa_supplicant
eth0: Associated with 00:01:02:03:04:05
eth0: CTRL-EVENT-EAP-STARTED EAP authentication started
eth0: CTRL-EVENT-EAP-PROPOSED-METHOD vendor=0 method=25
TLS: Unsupported Phase2 EAP method 'CHAP'
eth0: CTRL-EVENT-EAP-METHOD EAP vendor 0 method 25 (PEAP) selected
eth0: CTRL-EVENT-EAP-PEER-CERT depth=0 subject=''
eth0: CTRL-EVENT-EAP-PEER-CERT depth=0 subject=''
EAP-MSCHAPV2: Authentication succeeded
EAP-TLV: TLV Result - Success - EAP-TLV/Phase2 Completed
eth0: CTRL-EVENT-EAP-SUCCESS EAP authentication completed successfully
eth0: CTRL-EVENT-CONNECTED - Connection to 00:01:02:03:04:05 completed [id=0 id_str=]

Above is the output of a successful test with wpa_supplicant. I replaced the MAC of the switch with 00:01:02:03:04:05. Strangely it doesn’t like “CHAP” but is automatically selecting “MSCHAPV2” and working, maybe anything other than “PAP” would do.

auto eth0
iface eth0 inet dhcp
    wpa-driver wired
    wpa-conf /etc/wpa_supplicant_SITE.conf

Above is a snippet of /etc/network/interfaces that works with this configuration.
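
To apply that configuration without rebooting, something like the following should work, assuming ifupdown (and not another network manager) is in charge of eth0:

ifdown eth0   # take the interface down, stopping any wpa_supplicant started by ifupdown
ifup eth0     # bring it back up; the wpa-* options trigger 802.1x authentication before DHCP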


OpenSTEM: Conversations on Collected Health Data

Thu, 2016-07-21 15:04

There are more and more wearable devices that collect a variety of health data, and other health records are kept electronically. More often than not, the people whose data it is don’t actually have access. There are very important issues to consider, and you could use this for a conversation with your students, and in assignments.

On the individual level, questions such as

  • Who should own your health data?
  • Should you be able to get an overview of who has what kind of your data?  (without fuzzy vague language)
  • Should you be able to access your own data? (directly out of a device, or online service where a device sends its data)
  • Should you be able to request a company to completely remove data from their records?

For society, questions like

  • Should a company be allowed to hoard data, or should they be required to make it accessible (open data) for other researchers?

A comment piece in this week’s Nature entitled “Lift the blockade on health data” could be used as a starting point for a conversation and for additional information:

http://nature.com/articles/doi:10.1038/535345a

Technology titans, such as Google and Apple, are moving into health. For all the potential benefits, the incorporation of people’s health data into algorithmic ‘black boxes’ could harm science and exacerbate inequalities, warn John Wilbanks and Eric Topol in a Comment piece in this week’s Nature. “When it comes to control over our own data, health data must be where we draw the line,” they stress.

Cryptic digital profiling is already shaping society; for example, online adverts are tailored to people’s age, location, spending and browsing habits. Wilbanks and Topol envision a future in which “companies are able to trade people’s disease profiles, unbeknown to them” and where “health decisions are abstruse and difficult to challenge, and advances in understanding are used to aggressively market health-related services to people — regardless of whether those services actually benefit their health.”

The authors call for a campaigning movement similar to the environmental one to break open how people’s data are being used, and to illuminate how such information could be used in the future. In their view, “the creation of credible competitors that are open source is the most promising way to regulate” corporations that have come to “resemble small nations in their own right”.

 

Binh Nguyen: Social Engineering/Manipulation, Rigging Elections, and More

Tue, 2016-07-19 02:03
We recently had an election locally and I noticed how they were handing out 'How To Vote' cards, which made me wonder. How much social engineering and manipulation do we experience each day/throughout our lives (please note that all of the results are basically from the first few pages of any publicly available search engine)? - think about the education system and the way we're mostly taught to