Planet Linux Australia


Simon Lyall: 2016 – Wednesday – Session 2

Wed, 2016-02-03 13:28

Welcoming Everyone: Five Years of Inclusion and Outreach Programmes at PyCon Australia by Christopher Neugebauer

  • How to bring more people to community run events
  • Talk is not about diversity in tech
  • Talk is about “Outreach and Inclusion in Events”
  • Outreach = getting them in, Inclusion = making them feel welcome
  • About funding programmes for events
  • FOSS happens over the Internet; face-to-face is less common than in other areas/communities
  • Events are where you can see the community
  • BUT: Going to a conference costs money – travel, rego, parking, leave from job
  • Events have an equality-of-access problem
  • Inequity of access is a problem with diversity
  • Solution: Run outreach programmes
  • Money can reduce the barriers; just spending money can help solve the problem
  • PyCon Australia has had outreach programmes for the last 5 years
  • FOSS vs other outreach programmes
    • Events have easy goals: define people/numbers to target, exact things to spend on, a defined time period
    • Similar every year, similar result each year
    • Long-term results are ill-defined
    • Engagement is hard to track
  • PyCon Australia
    • Fairly independent of the Python Software Foundation
    • Biggest PyCon within 9 hours of flying
    • PyCon US – 2500 attendees, $200k on financial assistance
    • PyCon Aus 2015 – 450 attendees, 5-8% of budget on funding
  • 2011
    • Harassment and Codes of Conduct were a big thing
    • Gender diversity policy, code of conduct, 20% of speakers were women, first gender diversity grants
    • 2 grants – 1 ticket and 1 ticket + $500 – funded out of general conf budget
    • 7 strong applicants at time when numbers were looking low (later picked up)
    • Sponsor found and funded all 7 applicants
  • 2012
    • 1st of 2 years running conf in Hobart
    • Moving from Sydney is hard. Australia is big and people have to fly between cities (especially to Hobart)
    • Hobart is a long way away for many people and has a small number of locals
    • Sponsor increased funding to $700, funded 10 people for $500 + ticket
    • Previous grant recipient from 2011 was speaking in 2012
  • 2013
    • Finding more speakers from more places
    • Outreach and speaker support run out of the same budget; cap removed on grants so international travel was possible.
    • The gender-only limit was removed, so anyone who needed funding could apply, e.g. students, teachers, geographic minorities
    • $12,500 allocated
    • As more signups and more money came in, more could go to the assistance budget
    • If you remove gender targeting, then what happens to diversity?
    • Got groups like GeekGirlDinners to target people that needed grants rather than directly chasing people to apply.
    • Over half the aid budget went to women
    • Teachers are good force multipliers
  • 2014
    • Lost previous diversity Sponsor
    • Previously $5k from Sponsor + $7k from general fund.
    • Pycon US – Everybody pays to attend ( See Essay by Jesse Noller – Everybody Pays )
    • Most speakers have FOSS-friendly employers or can claim money
    • Argument: Some confs make everybody pay no matter their ability.
    • Told speakers that by default they would be charged, but that the charge would be waived just by asking. Also said where the money was going and prioritised speakers for assistance. Also all organisers paid
    • Extra money from about $7000
    • Simplified structure of grants, less paperwork, just gave people a budget. Worked well since many people went with good deals.
    • Caters better for diverse needs
    • Also had an Education Miniconf, covered under the teacher training budget. Offered to underwrite the cost of substitute teachers for schools, since that is not covered by the normal school professional-development budget
  • Results
    • Every time at least one funding recipient has spoken at the next conference
    • Many fundees come back when they get professional jobs
    • Evangelize to their friends
  • Discovery
    • Expanding the fund gets people you might not expect
    • Diverse people have diverse needs
    • Avoid making people do paperwork, just give them money
    • Sponsors can make bootstrapping a programme easier
    • Don’t expect 100% success
    • Budget liberally, disburse conservatively
    • Watch out for immigration scams
    • Decline requests compassionately
  • Questions
    • Weekend hard for Childcare – Not heavily targeted
    • Targeting Speakers for funding rather than giving all of them means it gets to go a lot further. Better Bang for buck

Sentrifarm – open hardware telemetry system for Australian farming conditions by Andrew McDonnell

  • Great time to be a maker, everybody is able to make something
  • Neighbour had problem with having to measure grass fire danger in each paddock before going out with machinery during summer
  • Needs Wind Speed, temperature, humidity
  • Sentrifarm
    • Low power, solar
    • distributed
    • Works in areas with slow internet; SIM card expenses add up, however
    • Easy to use for farmer, access via their farm.
    • Data should not be owned by cloud provider
  • Hackaday Prize
    • Build “something that matters”
    • Prizes just for participating
    • Document progress, produce a video
  • Our Goals
    • Cheap and Cheerful
    • Aussie “bush mechanic” ethos
    • Enjoy the adventure
  • Used stuff from 24+ other opensource projects
  • Prototyping
    • Tried out various micro-controllers and other equipment
    • Most could be bought for only a few dollars
    • Tools – Bus Pirate
  • Radio links
    • ISM-band radio module using “LoRa” technology
    • SPI interface, well documented SX1276
    • $20 for the module
    • Proprietary radio protocol, long range, low power, but open interface on top of it
  • Eagle used (alternative is KiCad) to design circuit
    • Built own shields to plug sensors and various controllers into
  • – run one command, creates an Arduino project and builds with one command for multiple micro-controllers
  • MQTT-SN – communications protocol for low-bw links.
  • Breakdown of his stack, see his slides for details
  • Backend Software
    • Ubuntu
    • Docker
    • Carbon + Whisper + Graphite, Grafana
    • “Great time to be a hacker, using who knows how many lines of code and only had to write 7 to get it to work together”
  • Grafana hard to setup but found a nice docker container
  • Data kept separately from the container
  • Goal to get power down
  • Used 3D-printer to create some parts for mounting bits.
    • OpenSCAD – Language to design the parts
  • Range of LoRa: 5km un-elevated, 9km up a tower with a simple home-built antenna
  • Won a top-100 prize at Hackaday: a t-shirt
  • You can do it
  • Questions
    • How does it survive weather? – Not a lot of experience yet, some options
    • How likely are others to use it? – Maybe, but the main goal was building it
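
The backend notes above list Carbon + Whisper + Graphite. As a rough sketch of how a sensor reading could reach that stack, Carbon accepts a plaintext line of the form `<metric path> <value> <unix timestamp>`. The metric names and paddock below are made up for illustration and are not taken from the talk.

```python
import time

def carbon_line(path, value, timestamp=None):
    """Format one metric in Carbon's plaintext protocol: '<path> <value> <timestamp>'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"

# Hypothetical paddock reading; names and values are illustrative only.
reading = {"wind_speed_kmh": 23.5, "temperature_c": 38.2, "humidity_pct": 12.0}
lines = [carbon_line(f"farm.paddock1.{name}", value, 1454470080)
         for name, value in sorted(reading.items())]
payload = "".join(lines)
print(payload, end="")
```

The payload would normally be written to Carbon's plaintext listener over TCP (port 2003 by default); the sketch stops before the network step so it can run anywhere.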


Craige McWhirter: Usable formal methods - are we there yet? - Stefan Götz - LCA2016

Wed, 2016-02-03 11:17

Stefan Götz

  • Software reliability is often defined by industry standards.
  • Software analysis can be divided into three parts:
    • Static analysis
      • Examines code, no compilation or execution.
      • Share input with compilers
      • Example static analysers:
        • BLAST, Cppcheck, Eclipse, Frama-C
        • LLVM/Clang
        • Sparse
        • Splint - used by eChronos
    • Proof systems
    • Model checking
  • Performs pattern matching
  • Understands C types
  • Language model and rule matching
  • Control and data flow analysis
  • Similar to compiler setup
  • Run against entire application code.
  • Improved auto generated code and readability.
  • Found incorrect character conversion.
  • Discovered signal sets unintentionally returning a boolean.
  • Some false positives with unused code.
  • Some macros were not picked up.
  • Some variable initialisation not picked up.
  • Works very well overall.

Felt the time invested in using Splint was well spent and brings a lot of peace of mind to the project.

Model Checking
  • Uses CBMC
  • Requires a little plumbing code and training.
  • Made them reconsider and improve execution timing.
  • Scalability requires improvements.
  • Being integrated with eChronos.
Are we there yet?
  • Open Source static analysis tools need improving, along with established best practices.
  • Model checking is not yet out of the box
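
To make "examines code, no compilation or execution" and "language model and rule matching" concrete, here is a toy rule written against Python's own syntax tree. It is not Splint and does not analyse C; the rule (flag functions that mix `return <value>` with a bare `return`) is only an illustrative stand-in for the kind of unintended return type the notes mention.

```python
import ast

SOURCE = '''
def lookup(table, key):
    if key in table:
        return table[key]
    return

def size(table):
    return len(table)
'''

def mixed_returns(source):
    """Return names of functions that mix 'return <value>' with a bare 'return'."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            # Simplification: returns inside nested functions are counted too.
            returns = [n for n in ast.walk(node) if isinstance(n, ast.Return)]
            with_value = any(r.value is not None for r in returns)
            bare = any(r.value is None for r in returns)
            if with_value and bare:
                flagged.append(node.name)
    return flagged

print(mixed_returns(SOURCE))
```

The source is parsed but never run, which is the defining property of static analysis; real tools add type and control-flow models on top of this kind of tree walk.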

Craige McWhirter: CloudABI - Ed Schouten - LCA2016

Wed, 2016-02-03 11:17

Ed Schouten provided a detailed tour of Capsicum and CloudABI.

  • AppArmor is an afterthought
  • Puts the burden back on users
  • Not linked to security policies.
  • Capsicum is a FreeBSD method that sandboxes software
  • Works well with small applications but doesn't scale.
  • Questions why UNIX can't run third party binaries safely.
What is CloudABI?
  • CloudABI is a POSIX-like runtime environment based on Capsicum.
  • Capability based security with less foot shooting.
  • Global namespaces are entirely absent
    • By default can only perform actions with no global impact.
  • Symbiosis, not assimilation as it can run side by side with traditional applications.
  • File descriptors are used to provide additional rights.
  • Provided an example of using CloudABI to provide a secure web service.
  • You can use wrappers to provide features missing from CloudABI.
  • Only has 58 system calls. Incredibly compact.
  • Working towards having support for more POSIX operating systems.
  • Allows reuse of binaries without compilation.
  • Provided an example of a simple CloudABI ls program.
  • How to execute it via the shell
  • Feels there are scalability problems with CloudABI.
  • Wrote cloudabi-run to make it feel less clunky to run.
  • Replace CLI arguments with a YAML file.
  • Easy to configure.
  • Impossible to invoke programs with the wrong file
  • Reduces start-up complexity.
  • Gave an example of CloudABI as the basis of a cluster management suite.
  • Provides a 100% accurate dependency graph.
  • Gave an example of "CloudABI as a Service".
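
The idea that "file descriptors are used to provide additional rights" can be approximated on an ordinary Unix system by opening files relative to a directory descriptor instead of through a global path. The sketch below is only an analogy in Python, assuming a platform that supports `dir_fd`; unlike Capsicum or CloudABI, nothing here stops a name such as `../secret` from escaping the directory.

```python
import os
import tempfile

def read_within(dir_fd, name):
    """Open 'name' relative to an already-open directory descriptor and read it."""
    fd = os.open(name, os.O_RDONLY, dir_fd=dir_fd)
    with os.fdopen(fd, "rb") as handle:
        return handle.read()

with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "index.html"), "wb") as f:
        f.write(b"hello")
    # The directory descriptor is the only handle the reader is given.
    dir_fd = os.open(root, os.O_RDONLY)
    try:
        content = read_within(dir_fd, "index.html")
    finally:
        os.close(dir_fd)

print(content)
```

In a capability system the kernel enforces that the descriptor is the only way in; here that discipline is merely a convention of the calling code.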

Simon Lyall: 2016 – Wednesday – Session 1

Wed, 2016-02-03 10:28

Going Faster: Continuous Delivery for Firefox by Laura Thomson

  • Works for Cloud services web operations team
  • Web dev and continuous delivery lover
  • “Continuous delivery is for webapps” – Maybe not just Webapps? Maybe Firefox too
  • But Firefox is complicated
  • Process very complicated – “down from 5 source control systems to 3”
  • But plenty of web apps are very complicated (eg Netflix)
  • How do we continuously deliver Firefox?
  • How it works currently
    • Release every 6 weeks
    • 4 channels – Nightly -> Aurora -> Beta -> release
    • Mercurial Repo for each channel
  • Release Models
    • Critical Mass – When enough is done and it is stable
    • Single Hard deadline – eg for games being mass released
    • Train Model – fixed intervals
    • Continuous Delivery
  • Deployment Maturity Model
  • Updates
    • New build -> generate a diff -> FF calls back -> downloads and updates
    • Hotfixes
    • Addons automatically updated
  • Currently pipeline around 12 hours long, lots of tests and gatekeeping
  • “Go Faster”
    • System add-ons
    • Test Pilot
    • Data Separate from code
    • Downloadable content
    • Features delivered as web apps
  • System addons
    • Part of core FF, modularized into an add-on
    • Build/test against existing FF build, a lot smaller test
    • Updated up to daily (for now) on any release channel
    • signed and trusted
    • Restartless updates
      • install or update without a browser restart
      • Restarts suck
      • Restartless coming soon for system add-ons
    • Good for rapid iteration, particularly on the front-end
    • Wrappers for services
    • Replacing hotfixes
  • Problems with add-ons
    • Localisation
    • Optimizing UX: better browser faster vs update fatigue
    • Upfront telemetry requirements
    • Dependency management on Firefox
    • Dependency management between system add-ons (coming soon)
  • Add-ons in flight
    • Firefox Hello is already an add-on
    • Currently in beta in 45
    • First beta updates before 46
  • Test Pilot
    • Release channel users opt in to new features
    • Release channel users different from pre-release ones
    • Developed as regular add-ons (not system add-ons)
    • Can graduate to system add-ons by flipping a bit
  • Data should be separate from code
    • Sec policy
    • blocklists
    • tracking protection list
    • dictionaries
    • fonts
  • Many times data update == release; this is broken
  • Also some have their own updaters
  • Kinto
    • Lightweight JSON storage with sync, sharing, signing
    • Native JSON over HTTP
    • Niceties of CouchDB, backed by PostgreSQL
  • How Kinto Works
    • pings for updates
    • balrog supplies link to kinto
    • signed data downloaded, checked, applied
  • Kinto good for
    • Add-ons block list
  • Downloadable Content
    • Some parts of the browser may not be needed frequently
    • May not be needed on startup
    • eg languages packs, fonts for Firefox on Android
  • Features delivered remotely
    • Browser features delivered as web apps
    • Pull in content from the server
    • in an early stage
  • Futures
    • Easy for projects to implement
    • Better “knobs and dials” (canaries, A/B, data viz)
    • Push-based updates
    • Simpler localisation
  • Questions
    • They support rollbacks
    • Worst case: Firefox has a startup crash
    • Not sure how Iceweasel would fit in.
    • How will this affect the ESR channel? – Won’t change, they will stay security-only
    • Bad add-ons – Hate ones that report user data, crashers (e.g. Skype toolbar at one point), and ones that hijack your browser and change settings
    • There is much collaboration between [open source] browsers
    • You are avoiding the release cycle; are you planning to speed it up? – Lots of tests, can’t get rid of them all; working on it but not a simple thing to solve.
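
The "How Kinto Works" notes describe signed data being downloaded, checked and then applied. A minimal sketch of that check-before-apply step follows. It uses an HMAC with a shared demo key purely as a stand-in; the talk notes do not say which signature scheme is used, and every name here is hypothetical.

```python
import hashlib
import hmac
import json

def sign(payload: bytes, key: bytes) -> str:
    """Stand-in signature: hex HMAC-SHA256 of the payload."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def apply_update(current, payload, signature, key):
    """Return the new data if the signature checks out, else keep the current data."""
    if not hmac.compare_digest(sign(payload, key), signature):
        return current
    return json.loads(payload)

KEY = b"demo-key"  # hypothetical shared key, illustration only
blocklist = {"blocked": ["old-addon"]}
update = json.dumps({"blocked": ["old-addon", "bad-toolbar"]}).encode()

good = apply_update(blocklist, update, sign(update, KEY), KEY)
tampered = apply_update(blocklist, update + b" ", sign(update, KEY), KEY)
print(good, tampered)
```

The point of the shape is that a data update which fails verification leaves the browser on its existing data, with no code release involved either way.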


Craige McWhirter: LCA2016 Wednesday Keynote - Catarina Mota

Wed, 2016-02-03 09:40

Catarina Mota spoke about how open source software changed her life.

  • Discussed the spilling over of online communities in the real world, with 3D printing as an example.
  • In particular covered the RepRap printer and community.
  • RepRap printer provided the ability to short circuit traditional manufacturing.

"think of RepRap as a China on your desktop" - Chris DiBona

  • Open Source (GNU GPL) is core to the RepRap mission and its success.
  • Catarina described her house as an Open Source house.
  • Followed Open Source principles to design and build her home.
  • The goal was more affordable and sustainable housing.
Why Does it Matter?
  • Machines are not neutral.
  • Talked about obsolescence and e-waste.
  • Phones are a big contributor.
  • Phonebloks is a phone designed to be a phone worth keeping, with entirely replaceable components.
  • Users are co-creators.
  • Meant to be repaired, transformed, adapted and appropriated.

Catarina gave a wonderful talk on open source software, hardware and the way it's changing the world. Great talk.

Lev Lafayette: Reviving a Downed Compute Node in TORQUE/MOAB

Wed, 2016-02-03 09:29

The following describes a procedure for bringing up a compute node in TORQUE that's marked as 'Down'. The procedure is relatively simple once known, but working it out required some research; this document may save others the time.

1. Determine whether the node is really down.

Following an almighty NFS outage quite a number of compute nodes were marked as "down". However the two standard tools, `mdiag -n | grep "Down"` and `pbsnodes -ln` gave significantly different results.
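
One way to see where the two tools disagree is to reduce each listing to a set of node names and compare the sets. The sketch below assumes only that each output line starts with the node name; the sample text is made up and the real output of either command may be laid out differently.

```python
def node_names(listing):
    """Take the first whitespace-separated field of each non-empty line as a node name."""
    return {line.split()[0] for line in listing.splitlines() if line.strip()}

def compare(moab_listing, torque_listing):
    """Report which nodes each scheduler view marks as down."""
    moab, torque = node_names(moab_listing), node_names(torque_listing)
    return {
        "both": sorted(moab & torque),
        "moab_only": sorted(moab - torque),
        "torque_only": sorted(torque - moab),
    }

# Made-up sample text standing in for the two command outputs.
moab_sample = "node001 Down\nnode002 Down\nnode003 Down\n"
torque_sample = "node002 down\nnode004 down\n"
report = compare(moab_sample, torque_sample)
print(report)
```

Nodes that appear in only one of the two sets are the ones worth checking by hand first.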


Colin Charles: Donating to an opensource project when you download it

Tue, 2016-02-02 19:25

Apparently I’ve always thought that donating to opensource software that you use would be a good idea — I found this about Firefox add-ons. I suggested that the MariaDB Foundation do this for downloads to the MariaDB Server, and it looks like most people seem to think that it is an OK thing to do.

I see it being done with Ubuntu, LibreOffice, and more recently: elementary OS. The reasoning seems sound, though there was some controversy before they changed the language of the post. Though I’m not sure that I’d force the $0 figure. 

For something like MariaDB Server, this is mostly going to probably hit Microsoft Windows users; Linux users have repositories configured or use it from their distribution of choice! 

Simon Lyall: 2016 – Sysadmin Miniconf – Session 3

Tue, 2016-02-02 16:28

The life of a Sysadmin in a research environment – Eric Burgueno

  • Everything must be reproducible
  • Keeping systems up as long as possible, rather than having an overall uptime percentage
  • One person needs to cover lots of roles rather than specialise
  • 2 Servers with 2TB of RAM. Others smaller according to need
  • Lots of varied tools mostly bioinformatics software
  • 90TB to over 200TB of data over 2 years. Lots of large files. Big files, big servers.
  • Big job using 2TB of RAM taking 8 days to run.
  • The 2*2TB servers can be joined together to create a single 4TB server
  • Have to customize the environment for each tool, which is hard when there are lots of tools and you also want to compare/collaborate with other places where the software is being run.
  • Reproducible(?) Research

Creating bespoke logging systems and dashboards with Grafana, in fifteen minutes – Andrew McDonnell

Live Demo

Order in the chaos: or lessons learnt on planning in operations – Peter Hall

  • Lead of an Ops team at REA group. Looks after dev teams for 10-15 applications
  • Ops is not a project, but works with many projects
  • Many sources of work, dev, security, incidents, infrastructure improvement
  • Understand the work
    • Document your work
    • Talk about it, 15min standup
  • Schedule things
    • and prepare for the unplanned
    • Perhaps 2 weeks
    • Leave lots of slack
  • Interruptions
    • Assign team members to each ops teams
    • Rotating “ops goal keeper”
    • Developers on pager
  • Review Often
  • Longer term goals for your team
  • Failure demand vs value demand.
    • Make sure [at least some of] what you are doing is adding value to the environment


From Commit to Cloud – Daniel Hall

  • Deployments should be:
    • fast – 10 minutes
    • small – only one feature change and person doing should be aware of all of what is changing
    • easy – as little human work as possible, simple to understand
  • We believe this because
    • less to break
    • devs should focus on dev
    • each project should be really easy to learn, devs can switch between projects easily
    • Don’t want anyone to be afraid to deploy
  • Able to rollback
    • 30 microservices
    • 2 devs plus some work from others
  • How to do it
    • Microservices arch (optional but helps)
    • git , build agent, packaging format with dependencies
    • something to run your stuff
  • code -> git -> build -> auto test -> package -> staging -> test -> deploy to prod
  • Application build is triggered by git
    • script in each repo
  • Auto test after build; don’t do end-to-end testing there, do that in staging
  • Package app – they use docker – push to internal docker repo
  • Deploy to staging – they use curl to push JSON to Mesos/Marathon, which pulls the container. Testing is run there
  • Single Click approval to deploy to staging
  • Deploy to prod – should be same as how you deploy to staging.
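
The notes say the deploy step pushes JSON to Mesos/Marathon and that production should be deployed the same way as staging. A sketch of what such a payload could look like is below. The talk did not show the actual JSON, so the service name, registry host and resource numbers are all hypothetical; only the general shape of a Marathon Docker app definition is assumed.

```python
import json

def marathon_app(name, image, instances=2, cpus=0.25, mem=128):
    """Build a minimal Marathon app definition for a Docker image."""
    return {
        "id": f"/{name}",
        "instances": instances,
        "cpus": cpus,
        "mem": mem,
        "container": {"type": "DOCKER", "docker": {"image": image}},
    }

def deploy_payload(name, registry, commit, **kwargs):
    """Tag the image with the commit so every environment runs the same build."""
    image = f"{registry}/{name}:{commit}"
    return json.dumps(marathon_app(name, image, **kwargs), sort_keys=True)

# Hypothetical service and registry names.
staging = deploy_payload("billing", "registry.example.internal", "3f2a1c9")
prod = deploy_payload("billing", "registry.example.internal", "3f2a1c9")
print(staging)
```

Because both payloads come from the same function and the same commit tag, the only thing that differs between staging and production is which Marathon endpoint receives the JSON.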

LNAV – Paul Wayper

  • Point at a dir. read all the files. sort all the lines together in timestamp order
  • Colour codes machines and different facilities (daemons). Highlights IP addresses
  • Error lines in red, warning lines in yellow
  • Regular expressions highlighted. Fully PCRE compatible
  • Able to move back and forth an hour or a day at a time with special keys
  • Histogram of error lines, number per minute etc
  • more complete (SQL like) queries
  • compiles as a static binary
  • Ability to add your own log file formats
  • Ability to share format filters with others
  • Doesn’t deal with journald logs
  • Available for EPEL, Fedora, Debian but under a lot of active development.
  • acts like tail -f to spot updates to logs.
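
The core idea of reading several files and sorting all the lines together in timestamp order can be sketched with a merge of already-sorted streams. This is not how lnav itself is implemented; the timestamp format and sample lines below are assumptions chosen to keep the example short.

```python
import heapq
from datetime import datetime

def parse(line):
    """Split 'YYYY-MM-DDTHH:MM:SS message' into (timestamp, line)."""
    stamp, _, _ = line.partition(" ")
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S"), line

def merge_logs(*logs):
    """Merge logs that are each already in time order into one ordered list."""
    streams = [map(parse, log) for log in logs]
    return [line for _, line in heapq.merge(*streams)]

# Made-up log lines from two hypothetical files.
auth = ["2016-02-02T10:00:01 sshd: accepted key",
        "2016-02-02T10:00:09 sshd: session closed"]
web = ["2016-02-02T10:00:05 httpd: GET /"]
merged = merge_logs(auth, web)
print("\n".join(merged))
```

`heapq.merge` only ever holds one pending line per file, so the same approach scales to logs far larger than memory.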


Russell Coker: Compatibility and a Linux Community Server

Tue, 2016-02-02 16:26

Compatibility/interoperability is a good thing. It’s generally good for systems on the Internet to be capable of communicating with as many systems as possible. Unfortunately it’s not always possible as new features sometimes break compatibility with older systems. Sometimes you have systems that are simply broken, for example all the systems with firewalls that block ICMP so that connections hang when the packet size gets too big. Sometimes to take advantage of new features you have to potentially trigger issues with broken systems.

I recently added support for IPv6 to the Linux Users of Victoria server. I think that adding IPv6 support is a good thing due to the lack of IPv4 addresses even though there are hardly any systems that are unable to access IPv4. One of the benefits of this for club members is that it’s a platform they can use for testing IPv6 connectivity with a friendly sysadmin to help them diagnose problems. I recently notified a member by email that the callback that their mail server used as an anti-spam measure didn’t work with IPv6 and was causing mail to be incorrectly rejected. It’s obviously a benefit for that user to have the problem with a small local server than with something like Gmail.

In spite of the fact that at least one user had problems and others potentially had problems I think it’s clear that adding IPv6 support was the correct thing to do.

SSL Issues

Ben wrote a good post about SSL security [1] which links to a test suite for SSL servers [2]. I tested the LUV web site and got A-.

This blog post describes how to set up PFS (Perfect Forward Secrecy) [3]; after following its advice I got a score of B!

From the comments on this blog post about RC4 etc [4] it seems that the only way to have PFS and not be vulnerable to other issues is to require TLS 1.2.

So the issue is what systems can’t use TLS 1.2.
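
For readers who want to test clients against such a policy, here is a minimal sketch of what "require TLS 1.2" means in code, using Python's standard `ssl` module (Python 3.7 or later). It is an illustration only and not the configuration of the LUV server; certificate loading is left out.

```python
import ssl

def tls12_only_server_context():
    """Create a server-side context that refuses anything older than TLS 1.2."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context

context = tls12_only_server_context()
print(context.minimum_version)
```

A client limited to TLS 1.0 or 1.1 would fail the handshake against a socket wrapped with this context, which is exactly the compatibility cost weighed in the sections that follow.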

TLS 1.2 Support in Browsers

This Wikipedia page has information on SSL support in various web browsers [5]. If we require TLS 1.2 we break support of the following browsers:

The default Android browser before Android 5.0. Admittedly that browser always sucked badly and probably has lots of other security issues and there are alternate browsers. One problem is that many people who install better browsers on Android devices (such as Chrome) will still have their OS configured to use the default browser for URLs opened by other programs (EG email and IM).

Chrome versions before 30 didn’t support it. But version 30 was released in 2013 and Google does a good job of forcing upgrades. A Debian/Wheezy system I run is now displaying warnings from the google-chrome package saying that Wheezy is too old and won’t be supported for long!

Firefox before version 27 didn’t support it (the Wikipedia page is unclear about versions 27-31). 27 was released in 2014. Debian/Wheezy has version 38, Debian/Squeeze has Iceweasel 3.5.16 which doesn’t support it. I think it is reasonable to assume that anyone who’s still using Squeeze is using it for a server given its age and the fact that LTS is based on packages related to being a server.

IE version 11 supports it and runs on Windows 7+ (all supported versions of Windows). IE 10 doesn’t support it and runs on Windows 7 and Windows 8. Are the free upgrades from Windows 7 to Windows 10 going to solve this problem? Do we want to support Windows 7 systems that haven’t been upgraded to the latest IE? Do we want to support versions of Windows that MS doesn’t support?

Windows mobile doesn’t have enough users to care about.

Opera supports it from version 17. This is noteworthy because Opera used to be good for devices running older versions of Android that aren’t supported by Chrome.

Safari supported it from iOS version 5, I think that’s a solved problem given the way Apple makes it easy for users to upgrade and strongly encourages them to do so.

Log Analysis

For many servers the correct thing to do before even discussing the issue is to look at the logs and see how many people use the various browsers. One problem with that approach on a Linux community site is that the people who visit the site most often will be more likely to use recent Linux browsers but older Windows systems will be more common among people visiting the site for the first time. Another issue is that there isn’t an easy way of determining who is a serious user, unlike for example a shopping site where one could search for log entries about sales.

I did a quick search of the Apache logs and found many entries about browsers that purport to be IE6 and other versions of IE before 11. But most of those log entries were from other countries. While some people from other countries visit the club web site, it’s not very common. Most access from outside Australia would be from bots, and the bots probably fake their user agent.
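
A quick search of this kind can be sketched as follows, assuming the logs use Apache's combined format, where the user agent is the last quoted field. The sample lines are invented and use documentation IP addresses. The `MSIE` token appears in the user agent of IE 10 and earlier, which is what makes it a usable marker for versions before 11.

```python
import re
from collections import Counter

LAST_QUOTED = re.compile(r'"([^"]*)"\s*$')   # user agent: final quoted field
OLD_IE = re.compile(r"MSIE (\d+)")           # token used by IE 10 and earlier

def old_ie_counts(lines):
    """Count log lines per claimed IE major version (pre-11 only)."""
    counts = Counter()
    for line in lines:
        agent = LAST_QUOTED.search(line)
        if not agent:
            continue
        match = OLD_IE.search(agent.group(1))
        if match:
            counts[f"IE {match.group(1)}"] += 1
    return counts

sample = [
    '192.0.2.1 - - [02/Feb/2016:16:00:00 +1100] "GET / HTTP/1.1" 200 512 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
    '192.0.2.2 - - [02/Feb/2016:16:00:02 +1100] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; Trident/6.0)"',
    '192.0.2.3 - - [02/Feb/2016:16:00:03 +1100] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Firefox/38.0"',
]
counts = old_ie_counts(sample)
print(counts)
```

Since bots can claim any user agent, counts like these are best read as an upper bound on real users of old browsers.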

Should We Do It?

Is breaking support for Debian/Squeeze, the built in Android browser on Android <5.0, and Windows 7 and 8 systems that haven’t upgraded IE as a web browsing platform a reasonable trade-off for implementing the best SSL security features?

For the LUV server as a stand-alone issue the answer would be no as the only really secret data there is accessed via ssh. For a general web infrastructure issue it seems that the answer might be yes.

I think that it benefits the community to allow members to test against server configurations that will become more popular in the future. After implementing changes in the server I can advise club members (and general community members) about how to configure their servers for similar results.

Does this outweigh the problems caused by some potential users of ancient systems?

I’m blogging about this because I think that the issues of configuration of community servers have a greater scope than my local LUG. I welcome comments about these issues, as well as about the SSL compatibility issues.


Craige McWhirter: Haskell is Not For Production and Other Tales - Katie Miller - LCA2016

Tue, 2016-02-02 16:17

Katie Miller gave an excellent talk about Haskell covering:

  • Haskell is at the heart of Sigma at Facebook handling more than 1 million requests / second
  • Haxl is a Haskell framework in Sigma and is used for fighting malicious activity on Facebook
  • Haxl is open source.
  • Haskell's origins are academic.
  • Haskell is used widely in industry. Katie listed quite a few.
Why Haskell?
  • The ability to reason about code, from:
    • Purity
    • Strong static typing
    • Abstract away from concurrency
  • Haskell performs:
    • As much as 3x as fast as its predecessor
    • 30 times as fast for the user experience.
The Myth of Haskell Being Difficult
  • Not necessarily difficult, but it is different and that results in friction.
  • Expectations:
    • Concepts don't map neatly to familiar languages.
    • It's a journey teaching you a new way to think.
  • Abstractions:
    • Abstract concepts can take time to get a handle on.
  • Type Errors
    • Can be a little difficult to explain to new users
    • Worth mastering due to increase of productivity.
Teaching Haskell
  • Didn't mention monads.
  • Stuck to notations and implicit concurrency.
  • Created conversation space to discuss issues.
  • Results exceeded expectations.
Hiring Difficulty
  • There are more people wanting to work in Haskell than there are Haskell jobs.
  • An embarrassment of talent riches.
Haskell Community Difficulty
  • Community may have forgotten how to communicate to new people.
  • Keep in mind what it was like to be new.
  • Documentation for libraries needs to improve.
  • Work to create a diverse and welcoming community.
  • Be technically brutal but personally respectful.
  • Haskell is not immune to bad code.

"Using functional programming is a conclusion from a goal, not the goal itself" - Brian McKenna

  • The legacy of Haskell is spreading good ideas and this was an original goal.
  • Haskell's difference brings both benefits and challenges.

"Open Source = opportunity" - Katie Miller

Conclusion: Haskell is for production.

Simon Lyall: 2016 – Sysadmin Miniconf – Session 2

Tue, 2016-02-02 15:28

Site Reliability Engineering at Dropbox – Tammy Butow

  • Having an SLA and measuring against it. Cap ops work, blameless post mortems, always be coding
  • 400M customers, a billion files every day
  • Very hard to find people to scale, so build tools to scale instead
  • Team looks at 6,000 DB machines, look after whole machines not just the app
  • Build a lot of tools in python and go
  • PygerDuty – python library for pagerduty API
    • Easy to find the top things paging, write tools to reduce these
    • Regular weekly meeting to review those problems and make them better
    • If work is happening on machines then turn off monitoring on them so people don’t get woken up for things they don’t need to.
    • Going for days without any pages
  • Self-Healing and auto-remediation scripts
  • Hermes
    • Allocate and track tasks to groups
  • Automation of DB tasks
  • Bot copies pagerduty alerts in slack
  • Aim Higher
    • Create a roadmap for next 12 months
    • Building a rocketship while it is flying through the sky
  • Follow the Sun so people are working days
  • Post Mortem for every page
  • Frequent DR testing
  • Take time out to celebrate
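
Finding "the top things paging" is, at its simplest, a frequency count over alert records. The sketch below does not use PygerDuty or the PagerDuty API; the alert records and field names are hypothetical and stand in for whatever an export of recent pages would contain.

```python
from collections import Counter

def top_pagers(alerts, limit=3):
    """Return the most frequent alert checks as (check, count) pairs."""
    return Counter(alert["check"] for alert in alerts).most_common(limit)

# Hypothetical alert records for illustration.
alerts = [
    {"check": "disk_full", "host": "db-101"},
    {"check": "replication_lag", "host": "db-207"},
    {"check": "disk_full", "host": "db-330"},
    {"check": "disk_full", "host": "db-101"},
    {"check": "replication_lag", "host": "db-412"},
    {"check": "ssh_unreachable", "host": "db-015"},
]
print(top_pagers(alerts, limit=2))
```

A list like this gives the weekly review meeting a concrete order in which to write remediation tools.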

I missed out writing up the next couple of talks due to technical problems



Tim Serong: A Gentle Introduction to Ceph

Tue, 2016-02-02 14:28

I told a little story about Ceph today in the sysadmin miniconf at 2016. Simon and Ewen have already put a copy of my slides up on the miniconf web site, but for bonus posterity you can get them from here also. For some more detail, check out Lars’ SUSE Enterprise Storage – Design and Performance for Ceph slides from SUSECon 2015 and Software-defined Storage: Sizing and Performance.

Update (2016-02-07): The video of the talk can be found here.

Craige McWhirter: Internet Archive: Universal Access, Open APIs - VM Brasseur - LCA2016

Tue, 2016-02-02 14:07

VM Brasseur gave the most revealing and informative talk I've attended thus far.

A brief tour:
  • Anonymise IP addresses
  • Run the Wayback Machine
  • Archive for paying customers
  • Donated crawled content
  • Deep crawl on popular sites
  • Broad shallow crawl of known domains
  • Targeted crawls.
  • On demand crawling from the webpage - so you can archive before it's pulled from the net.
  • Scan 1000 books per day
  • 30 scanning centres around the world.
  • Runs the Open Library for books, video and TV.
  • TV is now a citable resource
  • Live music archive
    • Includes every live show of the Grateful Dead.
  • Video game archive of classic games playable in the browser.
  • Will save anything digital, for free, publicly.
Open Library API
  • Returns JSON (XML also available)
  • Evergreen and Koha are FOSS library applications that use the Open Library API.
  • "Do We Want It?" API to see what books the library needs.
  • There is an advanced search API too.
  • Drop in replacement for the Amazon S3 API.
  • NASA use the API. All the audio, video and photos are available on the Internet Archive.
  • ia-wrapper is a Perl client to access the API.
  • There is a Python client: internetarchive.
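
As a rough illustration of the "returns JSON" point, here is a minimal sketch that builds a request URL for the public Open Library Books API and pulls titles out of a response body. It does not touch the network: the ISBN and the sample body are hand-written placeholders, not real API output, and the helper names are my own.

```python
import json
from urllib.parse import urlencode

# Public Open Library Books API endpoint.
OPEN_LIBRARY_BOOKS = "https://openlibrary.org/api/books"


def books_api_url(isbn):
    """Build a Books API URL that asks for JSON about one ISBN."""
    params = {"bibkeys": f"ISBN:{isbn}", "format": "json", "jscmd": "data"}
    return f"{OPEN_LIBRARY_BOOKS}?{urlencode(params)}"


def titles_from_response(body):
    """Map each bibkey in a JSON response body to its title (None if absent)."""
    return {key: record.get("title") for key, record in json.loads(body).items()}


# Hand-written sample body for illustration only, not real API output.
SAMPLE_BODY = '{"ISBN:0000000000": {"title": "Example Title"}}'

if __name__ == "__main__":
    print(books_api_url("0000000000"))
    print(titles_from_response(SAMPLE_BODY))
```

Fetching the URL with any HTTP client and passing the body to titles_from_response would be the obvious next step.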

This is the talk that has blown my mind so far. This ought to have been a keynote. So very glad I made it. I highly recommend watching the video when it comes out. This was an amazing talk.

Simon Lyall: 2016 – Sysadmin Miniconf – Session 1

Tue, 2016-02-02 12:28

Is that a Cloud in your pocket – Steven Ellis

  • What if you could have a demo of a stack on a phone
  • or on a memory stick or a mini raspberry-pi type PC
  • Nested Virtualisation
  • Hardware
    • Using Linux as host env, not so good on Win and Mac
    • Thinkpad, fedora or Centos, 128GB SSD
  • Nested Virtualisation
    • Huge performance boost over qemu
    • Use SSD
    • enable options in modules kvm-intel or kvm-amd
    • Confirm SSD perf 1st – hdparm -t /dev/sdX
    • Create base env for VMs, enable vmx in features
    • Make sure it uses a different network so doesn’t badly interact with ones further out
  • Think LVM
    • Create thin pool for all envs
    • Think on lvm “issue_discards = 1”
  • Base image
    • Doesn’t have to be minimal
    • update the base regularly
    • How do you build your base image?
      • Thin may go weirdly wrong
      • Always use kickstart to re-create it.
    • Think of your use case, don’t skimp on the disk (eg 40G disk for image)
    • ssh keys, Enable yum cache
    • Patch once kicked
    • keep a content cache, maybe with rsync or mrepo
  • Turn off VM and then use fstrim and partx to make it nice and smaller.
  • virt-manager can’t manage thin volumes, DON’T manually add the path
  • use virsh to manually add the path.
  • snapshots of snapshots – great performance on SSD
  • Thin no longer activates automatically on distros
  • packstack simple way to install simple openstack setup
  • LVM vs QCOW
    • qcow okay for smaller images
    • cloud-init with atomic
    • do not snapshot a qcow image when it is running
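
A small sketch for the "enable options in modules kvm-intel or kvm-amd" step: on Linux hosts the KVM modules usually expose a "nested" parameter under /sys/module, so a script can check whether nested virtualisation is actually on before building VMs. The function names here are my own, and this only reads state; turning it on is typically done with a line such as "options kvm-intel nested=1" in a modprobe.d file.

```python
from pathlib import Path

# sysfs locations where the KVM modules expose their "nested" parameter.
NESTED_PARAMS = {
    "kvm_intel": Path("/sys/module/kvm_intel/parameters/nested"),
    "kvm_amd": Path("/sys/module/kvm_amd/parameters/nested"),
}


def nested_enabled(value):
    """Interpret a sysfs value: kernels report 'Y'/'N' or '1'/'0'."""
    return value.strip() in {"Y", "y", "1"}


def nested_status():
    """Return {module: bool} for every KVM module loaded on this host."""
    status = {}
    for module, path in NESTED_PARAMS.items():
        if path.exists():
            status[module] = nested_enabled(path.read_text())
    return status


if __name__ == "__main__":
    print(nested_status() or "no kvm_intel/kvm_amd module loaded")
```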

Revisiting Unix principles for modern system automation – Martin Krafft

  • SSH Botnet
  • OSI of System Automation
  • Transport unix style, both push and pull
  • uses socat for low level data moving
  • autossh <- restarts ssh connection automatically
  • creates control socket

A Gentle Introduction to Ceph – Tim Serong

  • Ceph gives a storage cluster that is self healing and self managed
  • 3 interfaces, object, block, distributed fs
  • OSD with files on them, monitor nodes
  • OSD will forward writes to other replicas of the data
  • clients can read from any OSD
  • Software defined storage vs legacy appliances
  • Network
    • Fastest you can, separate public and cluster networks
    • cluster faster than public
  • Nodes
    • 1-2G ram per TB of storage
    • read recommendations
  • SSD journals to cache writes
  • Redundancy
    • Replications – capacity impact but usually good performance
    • Erasure coding – Like raid – better space efficiency but impact in most other areas
  • Adding more nodes
    • tends to work
    • temp impact during rebalancing
  • How to size
    • understand your workload
    • make a guess
    • Build a 10% pilot
    • refine until perf is achieved
    • scale up the pilot
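
To put numbers on the "make a guess" step, here is a back-of-the-envelope sketch using the rules of thumb from the notes above: 1-2G of RAM per TB of storage, and the capacity trade-off between replication and erasure coding. The default of 3 copies and the 4+2 erasure coding profile are assumptions I picked for illustration, not recommendations from the talk, and real clusters lose some extra space to overhead.

```python
def replicated_usable_tb(raw_tb, copies=3):
    """Usable capacity when every object is stored `copies` times."""
    return raw_tb / copies


def erasure_coded_usable_tb(raw_tb, k=4, m=2):
    """Usable capacity with k data chunks plus m coding chunks per object."""
    return raw_tb * k / (k + m)


def ram_range_gb(raw_tb, low=1, high=2):
    """RAM estimate across the cluster using the 1-2G per TB rule of thumb."""
    return (raw_tb * low, raw_tb * high)


if __name__ == "__main__":
    raw = 120  # hypothetical raw capacity in TB
    print("replicated usable TB:", replicated_usable_tb(raw))
    print("erasure coded usable TB:", erasure_coded_usable_tb(raw))
    print("RAM range in GB:", ram_range_gb(raw))
```

A guess like this is only the starting point; the pilot is what tells you whether the performance is actually there.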

Keeping Pinterest running – Joe Gordon

  • Software vs service
    • No stable versions
    • Only one version is live
    • Devs support their own service – aligns incentives, eg monitoring built in
    • Testing against production traffic
  • SRE at Pinterest
    • Like a pit crew in F1
    • firefighting at scale
    • changing tires while moving
  • Operation Maturity
  • Operation Excellence
    • Have the best practices, docs, process, improvements
    • Repeatable deploys
  • Visibility
    • data driven company
    • Lots of Time series data – TSDB
    • Using ELK
  • Deployments
    • no impact to end user
    • easy to do, every few minutes
  • Canary vs Staging
    • Send dark (copies) of traffic to canary box without sending anything back to user
    • Bounce back to starting if problems
  • Teletraan
    • Rollback, hotfix, rolling deploy, starting and testing, visibility and usability
    • client-server model
    • pre/post download, restart, etc scripts included with every deployment
    • pause/resume various testing
  • Postmortems and Production readiness reviews
  • Cloud is not infinite, often will hit AWS capacity limits or even no available stuff in the region
  • Need to be able to make sure you know what you are running and if it is efficiently used
  • Open sourced tools
    • mysql_utils – lots of tools to manage many DBs
    • Thrift tools
    • Teletraan – open sourced in Feb 2016
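
The "dark traffic" idea from the Canary vs Staging notes can be sketched in a few lines: the user is always answered by production, while a copy of the request goes to the canary and its reply is thrown away. This is a toy, not how Pinterest or Teletraan implement it; the handler names are hypothetical and a real setup would mirror asynchronously rather than inline.

```python
def handle(request, production, canary, errors):
    """Serve from production; mirror a copy to the canary and discard its reply."""
    response = production(request)
    try:
        # Pass a copy so the canary cannot mutate the original request.
        canary(dict(request))
    except Exception as exc:
        # Canary failures are recorded for review but never reach the user.
        errors.append((request["path"], repr(exc)))
    return response


def production_handler(request):
    """Hypothetical stable build."""
    return f"ok:{request['path']}"


def canary_handler(request):
    """Hypothetical new build with a bug on one path."""
    if request["path"] == "/new":
        raise ValueError("bug in new build")
    return f"ok:{request['path']}"


if __name__ == "__main__":
    errors = []
    print(handle({"path": "/new"}, production_handler, canary_handler, errors))
    print(errors)
```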


Craige McWhirter: Functional Programming in Python - Juan Nunez-Iglesias - LCA2016

Tue, 2016-02-02 11:31

Juan Nunez-Iglesias gave an informative talk on using toolz to write Python using functional programming methodologies.

  • There was an excellent real world example using streaming.
  • There were a couple of other examples using k-mer and a genome markov model.
  • Python's function call overhead caps throughput at about half a megabyte per second, which is a hard limit for these applications.

Juan provided an excellent introduction to using the libraries that enable you to write Python in the manner of a functional programming language.
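
To give a flavour of the streaming style, here is a small k-mer counting sketch. It is not Juan's code and it uses only the standard library as a stand-in for toolz; the sliding_window helper below is my own version, similar in spirit to toolz.sliding_window. The point is that every stage is lazy, so the whole sequence never has to sit in memory.

```python
from collections import Counter
from itertools import islice


def read_bases(lines):
    """Lazily yield one base at a time, skipping FASTA header lines."""
    for line in lines:
        if not line.startswith(">"):
            yield from line.strip()


def sliding_window(n, seq):
    """Yield overlapping n-tuples from any iterable."""
    it = iter(seq)
    window = tuple(islice(it, n))
    if len(window) == n:
        yield window
    for item in it:
        window = window[1:] + (item,)
        yield window


def count_kmers(lines, k):
    """Count k-mers across a stream of lines without materialising the sequence."""
    kmers = ("".join(w) for w in sliding_window(k, read_bases(lines)))
    return Counter(kmers)


if __name__ == "__main__":
    sample = [">seq1\n", "ACGT\n", "ACG\n"]  # made-up input
    print(count_kmers(sample, 3))
```

Passing an open file handle instead of the sample list works the same way, since a file is just an iterable of lines.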

Craige McWhirter: Functional Programming, Parametricity, Types - Tony Morris - LCA2016

Tue, 2016-02-02 10:50

Tony Morris gave an excellent and informative talk that was mostly well beyond my current skill sets. Some key points were:

  • Believes that functional programming has already won "the war".
  • Parametricity is a tool of high reward.
  • Every programme terminates in a "total language".
  • Gave an excellent and detailed example of a fictional parametric language.
  • Covered the limits of parametricity using Haskell as an example.

However my understanding of parametricity is greater than when I entered the room :-)

Sitting next to Brendan O'Dea for his post talk rants was definitely a good move.

Craige McWhirter: LCA2016 Tuesday Keynote - George Fong

Tue, 2016-02-02 09:42

George Fong opened covering an extraordinary time in history, the birth of the Internet and in particular its origins in Australia.

George made some interesting points about Linux and the community:

  • 96.3% of the top 1 million websites run on Linux
  • Open Source and Free Software refer to a different set of views based on different values.

There was some coverage of the common foundations of Internet communities and the rough consensus.

"Be conservative in what you send and liberal in what you accept" - Jon Postel

Discussed the impact and challenges of mobile devices, the Internet of things and the reliability of maintenance.

Covered the innovation of the hacker / maker community doing a better job in some cases than the manufacturers.

Made some interesting points about how the Internet became a scary place, the rift between the tech community and commercial users.

Discussed the erosion of civil digital liberties and the interference of the state in devices.

The frustrations about metadata laws were touched on, quoting Michael Cordover.

George wrapped up covering the loss of integrity and trust. A heightened awareness of the human impact.

"The cavalry are not coming. We are the cavalry" - George Fong

George gave an interesting and thought provoking keynote. Well worth watching when the video becomes available.

Simon Lyall: 2016 – Tuesday – Keynote: George Fong

Tue, 2016-02-02 09:28

George Fong – Chair of Internet Australia

The Challenges of the Changing Social Significance of the Nerd

  • “This is the first conference I’ve been to where there’s an extremely high per capita number of ponytails”
  • Linux not just running web servers and other servers but also network devices
  • Linux and the Web aren’t the same thing, but they’ve grown symbiotically and neither would be the same without the other
  • “One of the lessons we’ve learned in Australia is that when you mix technology with politics, you get into trouble”
  • “We have proof in Australia that if you take guns away from people, people stop getting killed”



Simon Lyall: 2016 – Monday – Session 3

Mon, 2016-02-01 16:28

Cloud Anti-Patterns – Casey West

  • The 5 stages of Cloud Native
  • Deploying my apps to the cloud is painful – why?
  • Denial
    • “Containers are like tiny VMs”
    • Anti-Pattern 1 – do not assume what you have now is what you want to put into the cloud or a container
    • “We don’t need to automate continuous delivery”
    • We shouldn’t automate what we have until it is perfect. Automate to make things consistent (not always perfect at least at the start)
  • Anger
    • “works on my machine”
    • Dev is just pushing straight from dev boxes to production
    • Not about making worse code go to production faster
    • Aim for repeatable, testable builds, just faster
  • Bargaining
    • “We crammed the monolith into a container and called it a microservice”
    • Anti-Pattern: Critically think on what you need to re-factor (or “re-platforming” )
    • “Bi-modal IT”
    • Some stuff on fast lane, some stuff on old-way slow lane
    • Anti-pattern: legacy products put into slow lane, these are often the ones that really need to be fixed.
    • “Micro-services” talking to same data-source, not APIs
  • Depression
    • “200 microservices but forgot to setup Jenkins”
    • “We have an automated build pipeline but only release twice per year”
  • Acceptance
    • All software sucks, even the stuff we write
    • Respect CAP theorem
    • Respect Conway’s Law
    • Small batch sizes work for replatforming too
  • Microservices architecture, Devops culture, Continuous delivery – Pick all three

Cloud Crafting – Public / Private / Hybrid  – Steven Ellis

  • What does Hybrid mean to you?
  • What is private Cloud (IAAS)
  • Hybrid – communicate to public cloud and manage local stuff
  • ManageIQ – single pane of glass for hardware, vms, clouds, containers
  • What does it do?
    • Brownfields as well as Greenfields, gathers current setup
    • Discovery, API presentations, control and detect when env non-compliant (eg not fully patched)
    • Premise or public cloud
    • Supplied as a virtual appliance, HA, scale out
    • Platform – Centos 7, rails, postgres, gui, some dashboards out of the box.
  • Get involved
    • Online, roadmap is public
    • Various contributors
  • DEMO
  • Just put in credentials to allow access and then it can gather the data straight away

Live Migration of Linux Containers by Tycho Andersen

  • LXC / LXD
  • LXD is a REST API that you use to control the container system
  • tool -> REST -> Daemon -> lxc -> Kernel
  • “lxc move host1:c1 host2:” – Live migrations
    • Needs a bit of work since lots moving, lots of ways it could fail
    • 3 channels created, control, filesystem, container processes state
  • CRIU
    • 5 years of check-pointing
    • Lots based off open-VZ initial work
    • All sorts of things need to support check-pointing and moving (eg selinux)
    • Iterative migration added
    • Lots of hooks needed for very privileged kernel features
  • Filesystems
    • btrfs, lvm, zfs, (swift, nfs), have special support for migration that it hooks into
    • rsync between incompatible hosts
  • Memory State
    • Stop the world and move it all
    • Iterative incremental transfer (via p.haul) being worked on.
  • LXC + LXD 2.0 should be in Ubuntu 16.04 LTS
  • Need to use latest versions and latest kernels for best results.
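
The difference between "stop the world and move it all" and iterative transfer is easier to see with a toy model. The sketch below is not CRIU or p.haul code; it just simulates the general pre-copy idea, assuming a fixed percentage of the pages sent in each round get dirtied again while the container keeps running. All the numbers and names are made up for illustration.

```python
def precopy_rounds(total_pages, dirty_percent, threshold, max_rounds=10):
    """Return pages sent per round; the last entry is the stop-the-world round."""
    sent = []
    pending = total_pages
    for _ in range(max_rounds):
        if pending <= threshold:
            break
        # Copy the pending pages while the container is still running.
        sent.append(pending)
        # Assumed: a fixed share of those pages is dirtied during the copy.
        pending = pending * dirty_percent // 100
    # Freeze the container and copy whatever is still pending.
    sent.append(pending)
    return sent


if __name__ == "__main__":
    print(precopy_rounds(1000, 10, 20))    # iterative transfer
    print(precopy_rounds(1000, 10, 1000))  # plain stop-the-world
```

The final entry in each list is the number of pages copied while the container is frozen, which is the part that iterative migration tries to shrink.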


Craige McWhirter: ESPlant - Open Hardware Miniconf - LCA2016

Mon, 2016-02-01 15:25

Today at Linux Conference Australia 2016 or "LCA by the Bay" in Geelong, I attended the Open Hardware miniconf for the first time after a number of years of wanting to attend but being unable to.

This year they were making an ESPlant (Environment Sensor Plant), which is kind of timely.

The solar powered plant we're building has the following sensors:

  • BME280 Temperature/Humidity/Barometric Pressure sensor
  • 2 soil moisture sensors
  • DS18B20 temperature sensor
  • PIR (infrared motion) sensor (in case your plants are running away from something)
  • ADXL345 accelerometer (to see how fast your plants are running away)
  • WS2812B LED strip
  • UV sensor SEN0162 from DF Robot.

It only took about 30 minutes to assemble, and it looked like this on my desk:

This was an enjoyable and extremely productive hardware hacking session. I am going to have some of these all over our gardens and they will look something like this but with added weather shields:

I intend to hook the moisture sensors into the watering system so that water is delivered when soil moisture levels reach a pre-determined low point and stopped once they reach a desired saturation point. I want to maximise the efficiency of water consumption and I think this should help.
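
A minimal sketch of that watering logic, which is really just a thermostat-style hysteresis loop. I have not wired this up yet: the thresholds are invented, and it assumes a higher moisture reading means wetter soil, which would need checking against the actual sensors.

```python
def next_valve_state(valve_open, moisture, low, high):
    """Open the valve at or below `low`, close it at or above `high`, else hold."""
    if moisture <= low:
        return True
    if moisture >= high:
        return False
    return valve_open


def run(readings, low, high):
    """Replay a series of moisture readings and return the valve state after each."""
    states = []
    valve_open = False
    for moisture in readings:
        valve_open = next_valve_state(valve_open, moisture, low, high)
        states.append(valve_open)
    return states


if __name__ == "__main__":
    # Made-up readings and thresholds, purely for illustration.
    print(run([50, 30, 45, 70, 60], low=35, high=65))
```

Keeping a gap between the two thresholds stops the valve from chattering on and off when the reading hovers around a single set point.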

Sample raw serial output from the ESPlant:

temp = 26.22
pressure = 1005.65
humidity = 43.34
acc/x = 0.04
acc/y = -0.35
acc/z = 10.04
adc/uv_sensor = 0
adc/soil_1 = 91
adc/soil_2 = 38
adc/input_voltage = 4398
adc/internal_temp = 1656
external/temp_sensor = 24.00
chip/free_heap = 47120
chip/vcc = 3431
pir = low
led = 6
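
Output in this "key = value" shape is easy to turn into a dictionary. Here is a rough sketch of a parser (my own helper, not part of the ESPlant firmware or tooling) that keeps numeric readings as floats and leaves anything else, such as the PIR state, as text.

```python
import re

# One reading looks like "name = value", with no spaces inside either part.
READING = re.compile(r"(\S+) = (\S+)")


def parse_readings(text):
    """Parse 'key = value' pairs into a dict, converting numbers to float."""
    readings = {}
    for key, raw in READING.findall(text):
        try:
            readings[key] = float(raw)
        except ValueError:
            readings[key] = raw  # non-numeric values such as "low"
    return readings


if __name__ == "__main__":
    sample = "temp = 26.22 pressure = 1005.65 adc/soil_1 = 91 pir = low"
    print(parse_readings(sample))
```

Because the pattern only cares about whitespace, it works whether the readings arrive on one line or one per line.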

This data could also be fed into some long term graphing on my server via its WiFi chip.