Planet Linux Australia
Catarina Mota spoke about how open source software changed her life.
- Discussed the spilling over of online communities in the real world, with 3D printing as an example.
- In particular covered the RepRap printer and community.
- RepRap printer provided the ability to short circuit traditional manufacturing.
"think of RepRap as a China on your desktop" - Chris de Bona
- Open Source (GNU GPL) is core the RepRap mission and it's success.
- Catarina describe sher house as an Open Source house.
- Followed Open Source principles to design and build her home.
- The goal was more affordable and sustainable housing.
- Machines are not neutral.
- Talked about obsolescence and e-waste.
- Phones are a big contributor.
- Phonebloks is a phone designed to be worth keeping, with entirely replaceable components.
- Users are co-creators.
- Meant to be repaired, transformed, adapted and appropriated.
Catarina gave a wonderful talk on open source software, hardware and the way they are changing the world.
The following describes a procedure for bringing up a compute node in TORQUE that's marked as 'Down'. While the procedure, once known, is relatively simple, getting to this point required some research, so this document is written to save others that time.
1. Determine whether the node is really down.
Following an almighty NFS outage, quite a number of compute nodes were marked as "down". However, the two standard tools, `mdiag -n | grep "Down"` and `pbsnodes -ln`, gave significantly different results.
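The comparison of the two tools can be scripted. This is a sketch only: it assumes both tools print the node name in the first column, and the node names below are made up stand-ins (the real cluster commands are shown in the comments):

```shell
# On the cluster you would capture each tool's list of down nodes, e.g.:
#   pbsnodes -ln | awk '{print $1}' | sort > /tmp/pbs_down.txt
#   mdiag -n | awk 'tolower($0) ~ /down/ {print $1}' | sort > /tmp/moab_down.txt
# Hypothetical captures standing in for those commands:
cat > /tmp/pbs_down.txt <<'EOF'
node03
node07
EOF
cat > /tmp/moab_down.txt <<'EOF'
node03
node11
EOF
# comm -3 prints only the nodes the two tools disagree about:
comm -3 /tmp/pbs_down.txt /tmp/moab_down.txt
```

For a disputed node, `momctl -d 3 -h <node>` shows whether the node's MOM daemon is actually responding, and `pbsnodes -c <node>` clears a node that was only marked offline.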
I've long thought that donating to open source software that you use is a good idea; I first found this being done with Firefox add-ons. I suggested that the MariaDB Foundation do the same for MariaDB Server downloads, and most people seem to think that it is an OK thing to do.
I see it being done with Ubuntu, LibreOffice, and more recently: elementary OS. The reasoning seems sound, though there was some controversy before they changed the language of the post. Though I’m not sure that I’d force the $0 figure.
For something like MariaDB Server, this is probably going to hit mostly Microsoft Windows users; Linux users have repositories configured or use it from their distribution of choice!
The life of a Sysadmin in a research environment – Eric Burgueno
- Everything must be reproducible
- Keeping systems up for as long as possible matters more than an overall uptime percentage
- One person needs to cover lots of roles rather than specialise
- 2 Servers with 2TB of RAM. Others smaller according to need
- Lots of varied tools mostly bioinformatics software
- 90TB to over 200TB of data over 2 years. Lots of large files. Big files, big servers.
- Big job using 2TB of RAM taking 8 days to run.
- The 2x2TB servers can be joined together to create a single 4TB server
- Have to customise the environment for each tool, which is hard when you have lots of tools and also want to compare/collaborate with other sites where the software is being run.
- Reproducible(?) Research
Creating bespoke logging systems and dashboards with Grafana, in fifteen minutes – Andrew McDonnell
Order in the chaos: or lessons learnt on planning in operations – Peter Hall
- Lead of an Ops team at REA group. Looks after dev teams for 10-15 applications
- Ops is not a project, but works with many projects
- Many sources of work, dev, security, incidents, infrastructure improvement
- Understand the work
- Document your work
- Talk about it, 15min standup
- Schedule things
- and prepare for the unplanned
- Perhaps 2 weeks
- Leave lots of slack
- Assign ops team members to each dev team
- Rotating “ops goal keeper”
- Developers on pager
- Review Often
- Longer term goals for your team
- Failure demand vs value demand.
- Make sure [at least some of] what you are doing is adding value to the environment
From Commit to Cloud – Daniel Hall
- Deployments should be:
- fast – 10 minutes
- small – only one feature change, and the person deploying should be aware of all of what is changing
- easy – as little human work as possible, simple to understand
- We believe this because
- less to break
- devs should focus on dev
- each project should be really easy to learn, so devs can switch between projects easily
- Don’t want anyone to be afraid to deploy
- Able to rollback
- 30 microservices
- 2 devs plus some work from others
- How to do it
- Microservices arch (optional but helps)
- git , build agent, packaging format with dependencies
- something to run your stuff
- code -> git -> built -> auto test -> package -> staging -> test -> deploy to prod
- Application build is triggered by git
- build.sh script in each repo
- Auto test after build, don’t do end-to-end testing, do that in staging
- Package app – they use docker – push to internal docker repo
- Deploy to staging – they use curl to push JSON to Mesos/Marathon, which pulls the container. Testing runs there
- Single click approval to deploy to prod
- Deploy to prod – should be same as how you deploy to staging.
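The staging deploy described above can be sketched as pushing an app definition at Marathon's REST API. The app id, image name and Marathon host below are invented for illustration:

```shell
# Minimal Marathon app definition (id, image and resources are examples):
cat > /tmp/app.json <<'EOF'
{
  "id": "/myapp",
  "container": {
    "type": "DOCKER",
    "docker": { "image": "registry.internal/myapp:latest" }
  },
  "cpus": 0.5,
  "mem": 256,
  "instances": 2
}
EOF
# PUTting the definition makes Marathon pull the container onto the Mesos
# agents and (re)start the app:
#   curl -X PUT -H 'Content-Type: application/json' \
#        -d @/tmp/app.json http://marathon.staging.internal:8080/v2/apps/myapp
```

Deploying to prod is then the same PUT against the production Marathon, which is exactly the "prod should be the same as staging" point above.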
LNAV – Paul Wayper
- Point it at a directory; it reads all the files and sorts all the lines together in timestamp order
- Colour codes machines and different facilities (daemons). Highlights IP addresses
- Error lines in red, warning lines in yellow
- Regular expressions highlighted. Fully PCRE compatible
- Able to move back and forth an hour or a day at a time with special keys
- Histogram of error lines, number per minute etc
- more complete (SQL like) queries
- compiles as a static binary
- Ability to add your own log file formats
- Ability share format filters with others
- Doesn’t deal with journald logs
- Available for EPEL, Fedora and Debian, but under a lot of active development.
- Acts like tail -f to spot updates to logs.
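A quick sketch of the workflow; the SQL query assumes lnav's built-in Apache access_log format, and lnav itself is interactive so the real commands are shown as comments:

```shell
# lnav merges every file under a directory into one timestamp-ordered,
# colour-coded view (run it on a real terminal):
#   lnav /var/log
# Inside lnav, ';' opens the SQL prompt over the parsed log lines, e.g.:
#   ;SELECT c_ip, count(*) AS hits FROM access_log GROUP BY c_ip ORDER BY hits DESC
# Custom log formats are JSON files dropped under ~/.lnav/formats/.
command -v lnav >/dev/null && lnav -V || echo "lnav not installed here"
```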
Compatibility/interoperability is a good thing. It’s generally good for systems on the Internet to be capable of communicating with as many systems as possible. Unfortunately it’s not always possible as new features sometimes break compatibility with older systems. Sometimes you have systems that are simply broken, for example all the systems with firewalls that block ICMP so that connections hang when the packet size gets too big. Sometimes to take advantage of new features you have to potentially trigger issues with broken systems.
I recently added support for IPv6 to the Linux Users of Victoria server. I think that adding IPv6 support is a good thing due to the lack of IPv4 addresses even though there are hardly any systems that are unable to access IPv4. One of the benefits of this for club members is that it’s a platform they can use for testing IPv6 connectivity with a friendly sysadmin to help them diagnose problems. I recently notified a member by email that the callback that their mail server used as an anti-spam measure didn’t work with IPv6 and was causing mail to be incorrectly rejected. It’s obviously a benefit for that user to have the problem with a small local server than with something like Gmail.
In spite of the fact that at least one user had problems and others potentially had problems I think it's clear that adding IPv6 support was the correct thing to do.

SSL Issues
This blog post describes how to set up PFS (Perfect Forward Secrecy); after following its advice I got a score of B!
From the comments on this blog post about RC4 etc it seems that the only way to have PFS and not be vulnerable to other issues is to require TLS 1.2.
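For Apache with mod_ssl, the directives involved look roughly like this. The cipher list is illustrative rather than a recommendation; check current guidance before copying it:

```apache
# Allow only TLS 1.2 - this is what locks out the older clients:
SSLProtocol         -all +TLSv1.2
SSLHonorCipherOrder on
# ECDHE suites provide the forward secrecy; non-PFS suites are excluded:
SSLCipherSuite      ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:!aNULL:!RC4
```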
So the issue is what systems can’t use TLS 1.2.

TLS 1.2 Support in Browsers
This Wikipedia page has information on SSL support in various web browsers . If we require TLS 1.2 we break support of the following browsers:
The default Android browser before Android 5.0. Admittedly that browser always sucked badly and probably has lots of other security issues and there are alternate browsers. One problem is that many people who install better browsers on Android devices (such as Chrome) will still have their OS configured to use the default browser for URLs opened by other programs (EG email and IM).
Chrome versions before 30 didn’t support it. But version 30 was released in 2013 and Google does a good job of forcing upgrades. A Debian/Wheezy system I run is now displaying warnings from the google-chrome package saying that Wheezy is too old and won’t be supported for long!
Firefox before version 27 didn’t support it (the Wikipedia page is unclear about versions 27-31). 27 was released in 2014. Debian/Wheezy has version 38, Debian/Squeeze has Iceweasel 3.5.16 which doesn’t support it. I think it is reasonable to assume that anyone who’s still using Squeeze is using it for a server, given its age and the fact that LTS is based on packages relevant to servers.
IE version 11 supports it and runs on Windows 7+ (all supported versions of Windows). IE 10 doesn’t support it and runs on Windows 7 and Windows 8. Are the free upgrades from Windows 7 to Windows 10 going to solve this problem? Do we want to support Windows 7 systems that haven’t been upgraded to the latest IE? Do we want to support versions of Windows that MS doesn’t support?
Windows mobile doesn’t have enough users to care about.
Opera supports it from version 17. This is noteworthy because Opera used to be good for devices running older versions of Android that aren’t supported by Chrome.
Safari supported it from iOS version 5. I think that’s a solved problem given the way Apple makes it easy for users to upgrade and strongly encourages them to do so.

Log Analysis
For many servers the correct thing to do before even discussing the issue is to look at the logs and see how many people use the various browsers. One problem with that approach on a Linux community site is that the people who visit the site most often will be more likely to use recent Linux browsers but older Windows systems will be more common among people visiting the site for the first time. Another issue is that there isn’t an easy way of determining who is a serious user, unlike for example a shopping site where one could search for log entries about sales.
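For an Apache "combined" format log the user agent is the sixth double-quote-delimited field, so a rough browser census is a one-liner. The sample entries below are fabricated; point the awk at the real access.log instead:

```shell
# Fabricated sample entries in Apache combined log format:
cat > /tmp/access.log <<'EOF'
1.2.3.4 - - [01/Feb/2016:10:00:00 +1100] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko"
5.6.7.8 - - [01/Feb/2016:10:00:01 +1100] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/38.0"
9.8.7.6 - - [01/Feb/2016:10:00:02 +1100] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/38.0"
EOF
# Splitting on '"', field 6 is the User-Agent; count and rank them:
awk -F'"' '{print $6}' /tmp/access.log | sort | uniq -c | sort -rn
```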
I did a quick search of the Apache logs and found many entries about browsers that purport to be IE6 and other versions of IE before 11. But most of those log entries were from other countries; while some people from other countries visit the club web site it’s not very common. Most access from outside Australia would be from bots, and the bots probably fake their user agent.

Should We Do It?
Is breaking support for Debian/Squeeze, the built in Android browser on Android <5.0, and Windows 7 and 8 systems that haven’t upgraded IE as a web browsing platform a reasonable trade-off for implementing the best SSL security features?
For the LUV server as a stand-alone issue the answer would be no as the only really secret data there is accessed via ssh. For a general web infrastructure issue it seems that the answer might be yes.
I think that it benefits the community to allow members to test against server configurations that will become more popular in the future. After implementing changes in the server I can advise club members (and general community members) about how to configure their servers for similar results.
Does this outweigh the problems caused by some potential users of ancient systems?
I’m blogging about this because I think that the issues of configuration of community servers have a greater scope than my local LUG. I welcome comments about these issues, as well as about the SSL compatibility issues.
-  https://www.decadent.org.uk/ben/blog/securing-wwwdecadentorguk.html
-  https://www.ssllabs.com/ssltest/
-  https://blog.qualys.com/ssllabs/2013/08/05/configuring-apache-nginx-and-openssl-for-forward-secrecy
-  https://blog.qualys.com/ssllabs/2013/03/19/rc4-in-tls-is-broken-now-what
-  https://en.wikipedia.org/wiki/Template:TLS/SSL_support_history_of_web_browsers
Katie Miller gave an excellent talk about Haskell covering:
- Haskell is at the heart of Sigma at Facebook handling more than 1 million requests / second
- Haxl is a Haskell framework in Sigma and is used for fighting malicious activity on Facebook
- Haxl is open source.
- Haskell's origins are academic.
- Haskell is used widely in industry. Katie listed quite a few.
- The ability to reason about code, from:
- Strong static typing
- Abstract away from concurrency
- Haskell performs:
- As much as 3x as fast as its predecessor
- 30 times as fast for the user experience.
- Not necessarily difficult, but it is different and that results in friction.
- Concepts don't map neatly onto familiar languages.
- It's a journey teaching you a new way to think.
- Abstract concepts can take time to get a handle on.
- Type Errors
- Can be a little difficult to explain to new users
- Worth mastering due to increase of productivity.
- Didn't mention monads.
- Stuck to notations and implicit concurrency.
- Created conversation space to discuss issues.
- Results exceeded expectations.
- There are more people wanting to work in Haskell than there are Haskell jobs.
- An embarrassment of talent riches.
- Community may have forgotten how to communicate to new people.
- Keep in mind what it was like to be new.
- Documentation for libraries needs to improve.
- Work to create diverse and welcoming community.
- Be technically brutal but personally respectful.
- Haskell is not immune to bad code.
"Using functional programming is a conclusion from a goal, not the goal itself" - Brian McKenna
- The legacy of Haskell is spreading good ideas and this was an original goal.
- Haskell's difference brings both benefits and challenges.
"Open Source = opportunity" - Katie Miller
Conclusion: Haskell is for production.
Site Reliability Engineering at Dropbox – Tammy Butow
- Having an SLA and measuring against it. Capping ops work, blameless post mortems, always be coding
- 400M customers, a billion files every day
- Very hard to find people to scale, so build tools to scale instead
- Team looks at 6,000 DB machines, look after whole machines not just the app
- Build a lot of tools in python and go
- pygerduty – Python library for the PagerDuty API
- Easy to find the top things paging, write tools to reduce these
- Regular weekly meeting to review those problems and make them better
- If work is happening on machines then turn off monitoring on them, so people don’t get woken up for things they don’t need to be.
- Going for days without any pages
- Self-Healing and auto-remediation scripts
- Allocate and track tasks to groups
- Automation of DB tasks
- Bot copies pagerduty alerts in slack
- Aim Higher
- Create a roadmap for next 12 months
- Building a rocketship while it is flying through the sky
- Follow the Sun so people are working days
- Post Mortem for every page
- Frequent DR testing
- Take time out to celebrate
I missed out writing up the next couple of talks due to technical problems
I told a little story about Ceph today in the sysadmin miniconf at linux.conf.au 2016. Simon and Ewen have already put a copy of my slides up on the miniconf web site, but for bonus posterity you can get them from here also. For some more detail, check out Lars’ SUSE Enterprise Storage – Design and Performance for Ceph slides from SUSECon 2015 and Software-defined Storage: Sizing and Performance.
VM Brasseur gave the most revealing and informative talk I've attended thus far.

A brief tour:
- Anonymise IP addresses
- Run the Wayback Machine
- Archive for paying customers
- Donated crawled content
- Deep crawl on popular sites
- Broad shallow crawl of known domains
- Targeted crawls.
- On demand crawling from the webpage - so you can archive before it's pulled from the net.
- Scan 1000 books per day
- 30 scanning centres around the world.
- Runs the Open Library for books, video and TV.
- TV is now a citable resource
- Live music archive
- Includes every live show of the Grateful Dead.
- Video game archive of classic games playable in the browser.
- Will save anything digital, for free, publicly.
- Returns JSON (XML also available)
- Evergreen and Koha are FOSS library applications that use the Open Library API.
- "Do We Want It?" API to see what books the library needs.
- There is an advanced search API too.
- Drop in replacement for the Amazon S3 API.
- NASA use the API. All the audio, video and photos are available on the Internet Archive.
- ia-wrapper is a Perl client to access the API.
- There is a Python client: internetarchive.
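A taste of the API surface. The item identifier below is only an example, and the network-touching commands are shown as comments:

```shell
# The metadata endpoint returns JSON for any item identifier:
#   curl -s "https://archive.org/metadata/principleofrelat00eins"
# The Python client also installs an 'ia' command-line tool:
#   pip install internetarchive
#   ia metadata principleofrelat00eins
#   ia download principleofrelat00eins --glob='*.pdf'
identifier="principleofrelat00eins"   # example item identifier
echo "https://archive.org/metadata/${identifier}"
```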
This is the talk that has most blown my mind so far. It ought to have been a keynote. So very glad I made it; I highly recommend watching the video when it comes out.
Is that a Cloud in your pocket – Steven Ellis
- What if you could have a demo of a stack on a phone
- or on a memory stick or a mini raspberry-pi type PC
- Nested Virtualisation
- Using Linux as host env, not so good on Win and Mac
- Thinkpad, Fedora or CentOS, 128GB SSD
- Nested Virtualisation
- Huge performance boost over plain qemu
- Use SSD
- enable options in modules kvm-intel or kvm-amd
- Confirm SSD performance first – hdparm -t /dev/sdX
- Create base env for VMs, enable vmx in features
- Make sure it uses a different network so doesn’t badly interact with ones further out
- Think LVM
- Create a thin pool for all envs
- For thin on LVM, set " issue_discards = 1 " in lvm.conf
- Base image
- Doesn’t have to be minimal
- update the base regularly
- How do you build your base image?
- Thin may go weirdly wrong
- Always use kickstart to re-create it.
- Think of your use case, don’t skimp on the disk (eg 40G disk for image)
- ssh keys, Enable yum cache
- Patch once kickstarted
- keep a content cache, maybe with rsync or mrepo
- Turn off the VM and then use fstrim and partx to make it nice and small.
- virt-manager can’t manage thin volumes, so DON'T manually add the path there
- use virsh to manually add the path instead.
- Snapshots of snapshots get great performance on SSD
- Thin no longer activates automatically on some distros
- packstack is a simple way to install a simple OpenStack setup
- LVM vs QCOW
- qcow okay for smaller images
- cloud-init with atomic
- do not snapshot a qcow image when it is running
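The nested-virtualisation and thin-pool steps above can be sketched as follows. The volume group and sizes are invented, the module option needs a module reload (or reboot) to take effect, and the privileged commands are shown as comments:

```shell
# Enable nested virtualisation for Intel (use kvm-amd / kvm_amd on AMD):
#   echo 'options kvm-intel nested=1' > /etc/modprobe.d/kvm-nested.conf
#   modprobe -r kvm_intel && modprobe kvm_intel
# Confirm the SSD is performing before blaming virtualisation:
#   hdparm -t /dev/sdX
# Thin pool for all the VM environments, then a thin volume per image:
#   lvcreate --type thin-pool -L 100G -n envpool vg_vms
#   lvcreate -V 40G --thinpool envpool -n base vg_vms
# Check whether nesting is already on (prints Y or 1 when enabled):
nested=$(cat /sys/module/kvm_intel/parameters/nested 2>/dev/null || echo "module not loaded")
echo "kvm-intel nested: $nested"
```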
Revisiting Unix principles for modern system automation – Martin Krafft
- SSH Botnet
- OSI of System Automation
- Transport unix style, both push and pull
- uses socat for low level data moving
- autossh <- restarts ssh connection automatically
- creates control socket
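The autossh-plus-control-socket pattern from the talk looks roughly like this. Host, user and socket path are examples, and the connection-making commands are shown as comments:

```shell
# autossh restarts the ssh transport automatically if it dies; with -M 0 it
# relies on ssh's own keepalives instead of a separate monitor port.
sock='/tmp/automation.%r@%h.sock'   # example ControlPath
#   autossh -M 0 -f -N \
#     -o "ServerAliveInterval 30" -o "ServerAliveCountMax 3" \
#     -o "ControlMaster yes" -o "ControlPath $sock" \
#     admin@node.example.com
# Later automation steps reuse the persistent control socket and skip
# re-authentication:
#   ssh -o "ControlPath $sock" admin@node.example.com uptime
echo "control socket: $sock"
```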
A Gentle Introduction to Ceph – Tim Serong
- Ceph gives a storage cluster that is self healing and self managed
- 3 interfaces, object, block, distributed fs
- OSD with files on them, monitor nodes
- OSD will forward writes to other replicas of the data
- clients can read from any OSD
- Software defined storage vs legacy appliances
- Fastest network you can; separate public and cluster networks
- cluster faster than public
- 1-2G ram per TB of storage
- Read the recommendations
- SSD journals to cache writes
- Replications – capacity impact but usually good performance
- Erasure coding – Like raid – better space efficiency but impact in most other areas
- Adding more nodes
- tends to work
- temp impact during rebalancing
- How to size
- understand your workload
- make a guess
- Build a 10% pilot
- refine until performance is achieved
- scale up the pilot
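The RAM guideline above turns part of the sizing into simple arithmetic; the cluster size here is just an example:

```shell
# 1-2GB of RAM per TB of storage; take the upper end for headroom:
storage_tb=90
ram_per_tb_gb=2
echo "OSD RAM budget: $((storage_tb * ram_per_tb_gb)) GB across the cluster"
# Prints: OSD RAM budget: 180 GB across the cluster
```

On a running pilot, `ceph -s` and `ceph osd df` are the usual commands for checking health and how full the OSDs actually are.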
Keeping Pinterest running – Joe Gordon
- Software vs service
- No stable versions
- Only one version is live
- Devs support their own service – aligns incentives, eg monitoring built in
- Testing against production traffic
- SRE at Pinterest
- Like a pit crew in F1
- firefighting at scale
- changing tires while moving
- Operation Maturity
- Operation Excellence
- Have the best practices, docs, process, improvements
- Repeatable deploys
- data driven company
- Lots of Time series data – TSDB
- Using ELK
- no impact to end user
- easy to do, every few minutes
- Canary vs Staging
- Send dark traffic (copies) to the canary box without sending anything back to the user
- Bounce back to the start if problems
- Rollback, hotfix, rolling deploy, staging and testing, visibility and usability
- client-server model
- pre/post download, restart, etc scripts included with every deployment
- pause/resume various testing
- Postmortems and production readiness reviews
- Cloud is not infinite, often will hit AWS capacity limits or even no available capacity in the region
- Need to be able to make sure you know what you are running and whether it is efficiently used
- Open sourced tools
- mysql_utils – lots of tools to manage many DBs
- Thrift tools
- Teletraan – open sourced in Feb 2016
- There was an excellent real world example using streaming.
- There were a couple of other examples user k-mer and a genome markov model.
- Python's function call overhead caps throughput at around half a megabyte per second, which is a hard limit for these applications.
Juan provided an excellent introduction to the libraries that enable writing Python in the manner of a functional programming language.
Tony Morris gave an excellent and informative talk that was mostly well beyond my current skill set. Some key points were:
- Believes that functional programming has already won "the war".
- Parametricity is a tool of high reward.
- Every programme terminates in a "total language".
- Gave an excellent and detailed example of a fictional parametric language.
- Covered the limits of parametricity using Haskell as an example.
However my understanding of parametricity is greater than when I entered the room :-)
Sitting next to Brendan O'Dea for his post talk rants was definitely a good move.
George Fong opened covering an extraordinary time in history, the birth of the Internet and in particular its origins in Australia.
George made some interesting points about Linux and the community:
- 96.3% of the top 1 million websites run on Linux
- Open Source and Free Software refer to a different set of views based on different values.
There was some coverage of the common foundations of Internet communities and the rough consensus.
"Be conservative in what you send and liberal in what you accept" - Jon Postel
Discussed the impact and challenges of mobile devices, the Internet of things and the reliability of maintenance.
Covered the innovation of the hacker / maker community doing a better job in some cases than the manufacturers.
Made some interesting points about how the Internet became a scary place, the rift between the tech community and commercial users.
Discussed the erosion of civil digital liberties and the interference of the state in devices.
The frustrations about metadata laws were touched on, quoting Michael Cordover.
George wrapped up covering the loss of integrity and trust, and a heightened awareness of the human impact.
"The cavalry are not coming. We are the cavalry" - George Fong
George gave an interesting and thought provoking keynote. Well worth watching when the video becomes available.
George Fong – Chair of Internet Australia
The Challenges of the Changing Social Significance of the Nerd
- “This is the first conference I’ve been to where there’s an extremely high per capita number of ponytails”
- Linux not just running web server and other servers but also network devices
- Linux and the Web aren’t the same thing, but they’ve grown symbiotically and neither would be the same without the other
- “One of the lessons we’ve learned in Australia is that when you mix technology with politics, you get into trouble”
- “We have proof in Australia that if you take guns away from people, people stop getting killed”
Cloud Anti-Patterns – Casey West
- The 5 stages of Cloud Native
- Deploying my apps to the cloud is painful – why?
- “Containers are like tiny VMs”
- Anti-Pattern 1 – do not assume what you have now is what you want to put into the cloud or a container
- “We don’t need to automate continuous delivery”
- Don't wait until what you have is perfect before automating. Automate to make things consistent (not always perfect, at least at the start)
- “works on my machine”
- Devs just push straight from dev boxes to production
- Not about making worse code go to production faster
- Aim for repeatable, testable builds, just faster
- “We crammed the monolith into a container and called it a microservice”
- Anti-Pattern: Critically think on what you need to re-factor (or “re-platforming” )
- “Bi-modal IT”
- Some stuff on fast lane, some stuff on old-way slow lane
- Anti-pattern: legacy products put into the slow lane; these are often the ones that really need to be fixed.
- “Micro-services” talking to the same data-source, not APIs
- “200 microservices but forgot to setup Jenkins”
- “We have an automated build pipeline but only release twice per year”
- All software sucks, even the stuff we write
- Respect CAP theorem
- Respect Conway’s Law
- Small batch sizes works for replatforming too
- Microservices architecture, Devops culture, Continuous delivery – Pick all three
Cloud Crafting – Public / Private / Hybrid – Steven Ellis
- What does Hybrid mean to you?
- What is private Cloud (IAAS)
- Hybrid – communicate to public cloud and manage local stuff
- ManageIQ – single pane of glass for hardware, VMs, clouds, containers
- What does it do?
- Brownfields as well as Greenfields, gathers current setup
- Discovery, API presentations, control, and detecting when an env is non-compliant (eg not fully patched)
- On-premise or public cloud
- Supplied as a virtual appliance, HA, scale out
- Platform – CentOS 7, Rails, PostgreSQL, GUI, some dashboards out of the box.
- Get involved
- Online, roadmap is public
- Various contributors
- Just put in credentials to allow access and then it can gather the data straight away
Live Migration of Linux Containers by Tycho Andersen
- LXC / LXD
- LXD is a REST API that you use to control the container system
- tool -> REST -> daemon -> lxc -> kernel
- “lxc move host1:c1 host2: ” – Live migrations
- Needs a bit of work since lots moving, lots of ways it could fail
- 3 channels created, control, filesystem, container processes state
- 5 years of check-pointing
- Lots based off OpenVZ's initial work
- All sorts of things need to support check-pointing and moving (eg selinux)
- Iterative migration added
- Lots of hooks needed for very privileged kernel features
- btrfs, lvm, zfs, (swift, nfs), have special support for migration that it hooks into
- rsync between incompatible hosts
- Memory State
- Stop the world and move it all
- Iterative incremental transfer (via p.haul) being worked on.
- LXC + LXD 2.0 should be in Ubuntu 16.04 LTS
- Need to use latest versions and latest kernels for best results.
Today at Linux Conference Australia 2016 or "LCA by the Bay" in Geelong, I attended the Open Hardware miniconf for the first time after a number of years of wanting to attend but being unable to.
This year they were making an ESPlant (Environment Sensor Plant), which is kind of timely.
The solar powered plant we're building has the following sensors:
- BME280 Temperature/Humidity/Barometric Pressure sensor
- 2 soil moisture sensors
- DS18B20 temperature sensor
- PIR (infrared motion) sensor (in case your plants are running away from something)
- ADXL345 accelerometer (to see how fast your plants are running away)
- WS2812B LED strip
- UV sensor SEN0162 from DF Robot.
It only took about 30 minutes to assemble, and it looked like this on my desk:
This was an enjoyable and extremely productive hardware hacking session. I am going to have some of these all over our gardens and they will look something like this but with added weather shields:
I intend to hook the moisture sensors into the watering system so that water is delivered when soil moisture levels reach a pre-determined low point and stopped once they reach a desired saturation point. I want to maximise the efficiency of water consumption and I think this should help.
Sample raw serial output from the ESPlant:
temp = 26.22 pressure = 1005.65 humidity = 43.34 acc/x = 0.04 acc/y = -0.35 acc/z = 10.04 adc/uv_sensor = 0 adc/soil_1 = 91 adc/soil_2 = 38 adc/input_voltage = 4398 adc/internal_temp = 1656 external/temp_sensor = 24.00 chip/free_heap = 47120 chip/vcc = 3431 pir = low led = 6
This data could also be fed into some long term graphing on my server via its WiFi chip.
If your school or homeschool group is based in or around the Brisbane area, we can visit with our robotic caterpillar and other critters as part of our FREE Robotics Incursion.
The caterpillar has quickly become our main mascot, as students, teachers and parents take a liking to it! It is an autonomous robot, with a 3D printed frame and Arduino controlled electronics.
The OpenSTEM caterpillar design and code are fully open and also serve as a good example of how subjects such as robotics can be explored at relatively low cost – that is, without expensive branded kits. This can be a real enabler for both school and home.
Open Cloud Miniconf – Continuous Delivery using blue-green deployments and immutable infrastructure by Ruben Rubio Rey
- Lots of things can go wrong in a deployment
- Often hard to do rollbacks once upgrade happens
- Blue-Green deployment is running several envs at the same time, each potentially with different versions
- Immutable infrastructure , split between data (which changes) and everything else only gets replaced fully by deployments, not changed
- When you use Docker, don’t store data in the container; that makes it immutable. But containers are not required to do this.
- Rule 1 – Never modify the infrastructure
- Rule 2 – Instead of modifying – always create from ground up everything that is not data.
- Rollbacks easy
- Avoid Configuration drift
- Updated and accurate infrastructure documentation
- Split things up
- No State – LBs, Web servers, App Servers
- Temp data , Volatile State – message queues, email servers
- Persistent data – Databases, Filesystems, slow warming cache
- In case of temp data you have to be able to drain
- Use LBs and multiple servers to split up the infrastructure; more bits give more room to split up the upgrades.
- If pending jobs require the old/new version of the app then route them to servers that have/have not been upgraded yet.
- Put toy rocket launcher in devs office, shoots person who broke the build.
- Need a “use activity script” to bleed traffic off a section of the “temp data” layer of the infrastructure, determine when it is empty and then re-create it.
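One concrete way to do the bleeding-off, assuming an HAProxy in front with its admin socket enabled (the socket path and the backend/server names are made up):

```shell
# Stop new sessions going to the server; existing ones finish naturally:
#   echo "disable server temp_be/queue1" | socat stdio /var/run/haproxy.sock
# Poll "show stat" until the server's current-session count (scur, the 5th
# CSV field) reaches zero, then replace the server:
#   echo "show stat" | socat stdio /var/run/haproxy.sock | \
#     awk -F, '$1=="temp_be" && $2=="queue1" {print $5}'
# Re-enable once the new instance is up:
#   echo "enable server temp_be/queue1" | socat stdio /var/run/haproxy.sock
echo "drain pattern: disable -> wait for scur=0 -> recreate -> enable"
```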
In less sanguine news, it became apparent that the heartless hypocrite has further isolated one of their emotional slaves.
A far too venerable cluster (Scientific Linux release 6.2, 2.6.32 kernel, Opteron 6212 processors) with more than 800 user accounts makes use of NFS-v4 to access storage directories. It is a typical architecture, with a management and login node and a number of compute nodes. The directory /usr/local is on the management node and mounted across to the login and compute nodes. User and project directories are distributed across two storage arrays appropriately named storage1 and storage2.
[root@edward-m ~]# cat /etc/fstab