Planet Linux Australia

Planet Linux Australia - http://planet.linux.org.au

Simon Lyall: DevOps Days Auckland 2017 – Wednesday Session 3

Wed, 2017-10-04 15:02

Sanjeev Sharma – When DevOps met SRE: From Apollo 13 to Google SRE

  • Author of two DevOps books
  • Apollo 13
    • Who were the real heroes? The guys back at mission control. The astronauts just had to keep breathing and not die
  • Best Practice for Incident management
    • Prioritize
    • Prepare
    • Trust
    • Introspect
    • Consider Alternatives
    • Practice
    • Change it around
  • Big Hurdles to adoption of DevOps in Enterprise
    • Literature is Only looking at one delivery platform at a time
    • Big enterprises have hundreds of platforms with completely different technologies, maturity levels, speeds. All interdependent
    • He divides platforms into:
      • Industrialised Core – Value High, Risk Low, MTBF
      • Agile/Innovation Edge – Value Low, Risk High, Rapid change and delivery, MTTR
      • Need normal distribution curve of platforms across this range
      • Need to be able to maintain products at both ends in one IT organisation
  • 6 capabilities needed in IT Organisation
    • Planning and architecture.
      • Your Delivery pipeline will be as fast as the slowest delivery pipeline it is dependent on
    • APIs
      • Modernizing to Microservices based architecture: Refactoring code and data and defining the APIs
    • Application Deployment Automation and Environment Orchestration
      • Devs are paid to write code, not to maintain deployment and config scripts
      • Ops must provide environments that require zero setup scripting from devs
    • Test Service and Environment Virtualisation
      • If you are doing 2-week sprints but it takes 3 weeks to get a test server, how long are your sprints really?
    • Release Management
      • No good if 99% of software works but last 1% is vital for the business function
    • Operational Readiness for SRE
      • Shift between MTBF to MTTR
      • MTTR = Mean time to detect + Mean time to triage + Mean time to restore (a toy breakdown is sketched after these notes)
      • + Mean time to pass blame
    • Antifragile Systems
      • Things that are neither fragile nor robust, but rather thrive on chaos
      • Cattle not pets
      • Servers may go red, but services are always green
    • DevOps: “Everybody is responsible for delivery to production”
    • SRE: “(Everybody) is responsible for delivering Continuous Business Value”
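
A toy illustration of that MTTR breakdown (my own sketch, not from the talk; the incident timestamps are invented example data):

#!/usr/bin/python3
# Sketch of the MTTR split mentioned in the notes above:
# MTTR = mean time to detect + mean time to triage + mean time to restore.
# The incident data is made up for the example.

import datetime


def mean_seconds(deltas):
    return sum(d.total_seconds() for d in deltas) / len(deltas)


def mttr_breakdown(incidents):
    return {
        'mean time to detect': mean_seconds(
            [i['detected'] - i['started'] for i in incidents]),
        'mean time to triage': mean_seconds(
            [i['triaged'] - i['detected'] for i in incidents]),
        'mean time to restore': mean_seconds(
            [i['restored'] - i['triaged'] for i in incidents]),
        'MTTR': mean_seconds(
            [i['restored'] - i['started'] for i in incidents]),
    }


if __name__ == '__main__':
    ts = datetime.datetime
    incidents = [{
        'started': ts(2017, 10, 4, 9, 0),
        'detected': ts(2017, 10, 4, 9, 5),
        'triaged': ts(2017, 10, 4, 9, 20),
        'restored': ts(2017, 10, 4, 9, 50),
    }]
    for name, seconds in sorted(mttr_breakdown(incidents).items()):
        print('%s: %.0f minutes' % (name, seconds / 60))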

Simon Lyall: DevOps Days Auckland 2017 – Wednesday Session 2

Wed, 2017-10-04 11:02

Marcus Bristol (Pushpay) – Moving fast without crashing

  • Low tolerance for errors in production due to being in finance
  • Deploy twice per day
  • Just Culture – Balance safety and accountability
    • What rule?
    • Who did it?
    • How bad was the breach?
    • Who gets to decide?
  • Example of Retributive Culture
    • KPIs reflect incidents.
    • If more than 10% deploys bad then affect bonus
    • Reduced number of deploys
  • Restorative Culture
  • Blameless post-mortem
    • Can give a detailed account of what happened without fear of retribution
    • Happens after every incident or near-incident
    • Written Down in Wiki Page
    • So everybody has the chance to have a say
    • Summary, Timeline, impact assessment, discussion, Mitigations
    • Mitigations become highest-priority work items
  • Our Process
    • Feature Flags
    • Science
    • Lots of small PRs
    • Code Review
    • Testers paired to devs so bugs can be fixed as soon as found
    • Automated testing
    • Pollination (reviews of code between teams)
    • Bots
      • Posts to Slack when feature flag has been changed
      • Nags about feature flags that seem to be hanging around in QA (see the sketch after this list)
      • Nags about flags that have been good in prod for 30+ days
      • Every merge
      • PRs awaiting reviews for long time (days)
      • Missing post-mortem mitigations
      • Status of builds in build farm
      • When deploy has been made
      • Health of API
      • Answer queries on team member list
      • Create ship train of PRs into a build and user can tell bot to deploy to each environment
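
A rough sketch of the flag-nagging rules above (my own illustration; the field names and thresholds are invented, not Pushpay's):

#!/usr/bin/python3
# Sketch of the bot's nagging logic: flags sitting in QA too long, and
# flags that have been on in production for 30+ days and can probably be
# removed. The record format is made up for the example.

import datetime


def flags_to_nag(flags, today, qa_limit_days=14, prod_limit_days=30):
    nags = []
    for flag in flags:
        age = (today - flag['since']).days
        if flag['state'] == 'qa' and age > qa_limit_days:
            nags.append('%s has been in QA for %d days' % (flag['name'], age))
        elif flag['state'] == 'prod' and age > prod_limit_days:
            nags.append('%s has been good in prod for %d days, remove it?'
                        % (flag['name'], age))
    return nags


if __name__ == '__main__':
    flags = [
        {'name': 'new-billing-page', 'state': 'qa',
         'since': datetime.date(2017, 9, 1)},
        {'name': 'fast-exports', 'state': 'prod',
         'since': datetime.date(2017, 8, 20)},
    ]
    for message in flags_to_nag(flags, today=datetime.date(2017, 10, 4)):
        print(message)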

Simon Lyall: DevOps Days Auckland 2017 – Wednesday Session 1

Wed, 2017-10-04 09:02

Michael Coté – Not actually a DevOps Talk

Digital Transformation

  • Goal: deliver value weekly, reliably, with small patches
  • Management must be the first to fail and transform
  • Standardize on a platform: special snowflakes are slow, expensive and error-prone (see his slide, a good list of stuff that should be standardized)
  • Ramping up: “Pilot low-risk apps, and ramp-up”
  • Pair programming/working
    • Half the advantage is people spend less time on reddit “research”
  • Don’t go to meetings
  • Automate compliance: have what you do automatically get logged and generate compliance docs, rather than building them manually.
  • Crafting Your Cloud-Native Strategy

Sajeewa Dayaratne – DevOps in an Embedded World

  • Challenges on Embedded
    • Hardware – resource constrained
    • Debugging – OS bugs, Hardware Bugs, UFO Bugs – Oscilloscopes and JTAG connectors are your friend.
    • Environment – Thermal, Moisture, Power consumption
    • Deploy to product – multi-month cycle, hard or impossible to send updates to ships at sea.
  • Principles of DevOps apply equally to embedded
    • High Frequency
    • Reduce overheads
    • Improve defect resolution
    • Automate
    • Reduce response times
  • Navico
    • Small Sonar, Navigation for medium boats, Displays for sail (eg Americas cup). Navigation displays for large ships
    • Dev around world, factory in Mexico
  • Codebase
    • 5 million lines of code
    • 61 Hardware Products supported – Increasing steadily, very long lifetimes for hardware
    • Complex network of products – lots of products on boat all connected, different versions of software and hardware on the same boat
  • Architecture
    • Old codebase
    • Backward compatible with old hardware
    • Needs to support new hardware
    • Desire new features on all products
  • What does this mean
    • Defects were found too late
    • Very high cost of bugs found late
    • Software stabilization taking longer
    • Manual test couldn’t keep up
    • Cost increasing , including opportunity cost
  • Does CI/CD provide answer?
    • But will it work here?
    • Case Study from HP. Large-Scale Agile Development by Gary Gruver
  • Our Plan
    • Improve tools and architecture
    • Build Speeds
    • Automated testing
    • Code quality control
  • Previous VCS
    • Proprietary tool with limit support and upgrades
    • Limited integration
    • Lack of CI support
    • No code review capacity
  • Move to git
    • Code reviews
    • Integrated CI
    • Supported by tools
  • Architecture
    • Had a configurable codebase already
    • Fairly common hardware platform (only 9 variations)
    • Had runtime feature flags
    • But
      • Cyclic dependencies – 1.5 years to clean these up
      • Singletons – cut down
      • Promote unit testability – worked on
      • Many branches – long lived – mega merges
  • Went to a single-branch model: feature flags (a minimal flag-gate sketch follows these notes), smaller batch sizes, testing focused on the single branch
  • Improve build speed
    • Started at 8 hours to build the Linux platform, 2 hours for each app, 14+ hours to build and package a release
    • Options
      • Increase speed
      • Parallel Builds
    • What they did
      • ccache / clcache
      • IncrediBuild
      • distcc
    • 4-5 hours down to 1 hour
  • Test automation
    • Existing tests used mock-ups of the hardware, so not typical of the real thing
    • Started with micro-tests
      • Unit testing (simulator)
      • Unit testing (real hardware)
    • Build Tools
      • Software tools (n2k simulator, remote control)
      • Hardware tools (mimic real-world data, repurpose existing stuff)
    • UI Test Automation
      • Build or Buy
      • Functional testing vs API testing
      • HW Test tools
      • Took 6 hours to do full test on hardware.
  • PipeLine
    • Commit -> pull request
    • Automated Build / Unit Tests
    • Daily QA Build
  • Next?
    • Configuration as code
    • Code Quality tools
    • Simulate more hardware
    • Increase analytics and reporting
    • Fully simulated test env for dev (so the devs don’t need the hardware)
    • Scale – From internal infrastructure to the cloud
    • Grow the team
  • Lessons Learnt
    • Culture!
    • Collect Data
    • Get Executive Buy in
    • Change your tools and processes if needed
    • Test automation is the key
      • Invest in HW
      • Simulate
      • Virtualise
    • Focus on good software design for Everything
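
A minimal sketch of the runtime feature-flag gating mentioned in these notes (my own illustration; the flag name and configuration shape are invented, not Navico's):

#!/usr/bin/python3
# Minimal runtime feature-flag gate for a single-branch model: unfinished
# code ships disabled and is switched on per product or build through
# configuration. Flag names here are made up for the example.


class FeatureFlags(object):
    def __init__(self, flags):
        self._flags = dict(flags)

    def enabled(self, name, default=False):
        return bool(self._flags.get(name, default))


def render_chart(flags):
    # The new rendering path stays dark until the flag is flipped.
    if flags.enabled('new-sonar-rendering'):
        return 'new rendering path'
    return 'old rendering path'


if __name__ == '__main__':
    print(render_chart(FeatureFlags({'new-sonar-rendering': False})))
    print(render_chart(FeatureFlags({'new-sonar-rendering': True})))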

Ben Martin: Ikea wireless charger in CNC mahogany case

Tue, 2017-10-03 21:41
I notice that Ikea sell their wireless chargers without a shell for insertion into desks. The "desk" I chose is a curve cut profile in mahogany that just happens to have the same fit as an LG G3/4/5 type phone. The design changed along the way to a more upright one which then required a catch to stop the phone sliding off.


This was done in Fusion360 which allows bringing in STL files of things like phones and cutting those out of another body. It took a while to work out the ball end toolpath but I finally worked out how to get something that worked reasonably well. The chomps in the side allow fingers to securely lift the phone off the charger.

It will be interesting to play with sliced objects in wood. Layering 3D cuts to build up objects that are 10cm (or about 4 layers) tall.

Simon Lyall: DevOps Days Auckland 2017 – Tuesday Session 3

Tue, 2017-10-03 15:02

Mirror, mirror, on the wall: testing Conway’s Law in open source communities – Lindsay Holmwood

  • The mapping between the organisational structure and the technical structure.
  • Easy to find who owns something, don’t have to keep two maps in your head
  • Needs flexibility of the organisation structure in order to support flexibility in a technical design
  • Conway’s “Law” is really just an adage
  • Complexity frequently takes the form of hierarchy
  • Organisations that mirror perform badly in rapidly changing and innovative environments

Metrics that Matter – Alison Polton-Simon (Thoughtworks)

  • Metrics Mania – Lots of focus on it everywhere ( fitbits, google analytics, etc)
  • How to help teams improve CD process
  • Define CD
    • Software consistently in a deployable state
    • Get fast, automated feedback
    • Do push-button deployments
  • Identifying metrics that mattered
    • Talked to people
    • Contextual observation
    • Rapid prototyping
    • Pilot offering
  • 4 big metrics
    • Deploy ready builds
    • Cycle time
    • Mean time between failures
    • Mean time to recover
  • Number of Deploy-ready builds
    • How many builds are ready for production?
    • Routine commits
    • Testing you can trust
    • Product + Development collaboration
  • Cycle Time
    • Time it takes to go from a commit to a deploy (a toy calculation is sketched after these notes)
    • Efficient testing (test subset first, faster testing)
    • Appropriate parallelization (lots of build agents)
    • Optimise build resources
  • Case Study
    • Monolithic Codebase
    • Hand-rolled build system
    • Unreliable environments ( tests and builds fail at random )
    • Validating a Pull Request can take 8 hours
    • Coupled code: isolated teams
    • Wide range of maturity in testing (some no test, some 95% coverage)
    • No understanding of the build system
    • Releases routinely delayed (10 months!) or done “under the radar”
  • Focus in case study
    • Reducing cycle time, increasing reliability
    • Extracted services from monolith
    • Pipelines configured as code
    • Build infrastructure provisioned as docker and ansible
    • Results:
      • Cycle time for one team 4-5h -> 1:23
      • Deploy ready builds 1 per 3-8 weeks -> weekly
  • Mean time between failures
    • Quick feedback early on
    • Robust validation
    • Strong local builds
    • Should not be done by reducing number of releases
  • Mean time to recover
    • How long back to green?
    • Monitoring of production
    • Automated rollback process
    • Informative logging
  • Case Study 2
    • 1.27 million lines of code
    • High cyclomatic complexity
    • Tightly coupled
    • Long-running but frequently failing testing
    • Isolated teams
    • Pipeline run duration 10h -> 15m
    • MTTR Never -> 50 hours
    • Cycle time 18d -> 10d
    • Created a dashboard for the metrics
  • Meaningless Metrics
    • The company will build whatever the CEO decides to measure
    • Lines of code produced
    • Number of Bugs resolved. – real life duplicates Dilbert
    • Developers Hours / Story Points
    • Problems
      • Lack of team buy-in
      • Easy to game
      • Unintended consequences
      • Measuring inputs, not impacts
  • Make your own metrics
    • Map your path to production
    • Highlights pain points
    • collaborate
    • Experiment
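
A toy illustration of the cycle-time and deploy-ready-builds metrics (my own sketch, not from the talk; the record format is invented):

#!/usr/bin/python3
# Sketch of two of the metrics above: cycle time (commit to deploy) and a
# simple count of deploy-ready builds. The records are made up.

import datetime


def cycle_times(deploys):
    return [d['deployed'] - d['committed'] for d in deploys]


def summarise(deploys, builds):
    times = cycle_times(deploys)
    mean = sum(times, datetime.timedelta()) / len(times)
    ready = sum(1 for b in builds if b['tests_passed'] and b['deployable'])
    return mean, ready


if __name__ == '__main__':
    ts = datetime.datetime
    deploys = [
        {'committed': ts(2017, 10, 2, 10, 0), 'deployed': ts(2017, 10, 2, 14, 30)},
        {'committed': ts(2017, 10, 3, 9, 0), 'deployed': ts(2017, 10, 3, 10, 23)},
    ]
    builds = [
        {'tests_passed': True, 'deployable': True},
        {'tests_passed': True, 'deployable': False},
    ]
    mean, ready = summarise(deploys, builds)
    print('Mean cycle time: %s' % mean)
    print('Deploy-ready builds: %d' % ready)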

 

Simon Lyall: DevOps Days Auckland 2017 – Tuesday Session 2

Tue, 2017-10-03 11:02

Using Bots to Scale incident Management – Anthony Angell (Xero)

  • Who we are
    • Single Team
    • Just a platform Operations team
  • SRE team is formed
    • Ops teams plus performance Engineering team
  • Incident Management
    • In the bad old days – 600 people on a single chat channel
    • Created Framework
    • what do incidents look like, post mortems, best practices,
    • How to make incident management easy for others?
  • ChatOps (Based on Hubot)
    • Automated tour guide
    • Multiple integrations – anything with Rest API
    • Reducing time to restore
    • Flexibility
  • Release register – API hook to when changes are made
  • Issue report form
    • Summary
    • URL
    • User-ids
    • how many users & location
    • when started
    • anyone working on it already
    • Anything else to add.
  • Chat Bot for incident
    • Populates form and pushes to production channel, creates PagerDuty alert (a minimal sketch of this step follows these notes)
    • Creates new slack channel for incident
    • Can automatically update status page from chat and page senior managers
    • Can Create “status updates” which record things (eg “restarted server”), or “yammer updates” which get pushed to social media team
    • Creates a task list automatically for the incident
    • Page people from within chat
    • At the end: Gives time incident lasted, archives channel
    • Post Mortem
  • More integrations
    • Report card
    • Change tracking
    • Incident / Alert portal
  • High Availability – dockerisation
  • Caching
    • PagerDuty
    • AWS
    • Datadog
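
A minimal sketch of the first bot step above, turning the issue report form into a message pushed to a channel (my own illustration, not Xero's Hubot code; the webhook URL and field names are placeholders):

#!/usr/bin/python3
# Sketch only: build an incident summary from the issue report form and
# post it to a Slack incoming webhook. The URL and field names are
# placeholders, not Xero's actual integration.

import json
import urllib.request

WEBHOOK_URL = 'https://hooks.slack.com/services/XXX/YYY/ZZZ'  # placeholder


def incident_message(report):
    return '\n'.join([
        'Incident: %s' % report['summary'],
        'URL: %s' % report['url'],
        'Users affected: %s (%s)' % (report['user_count'], report['location']),
        'Started: %s' % report['started'],
        'Already being worked on: %s' % report['working_on_it'],
    ])


def post_to_slack(report):
    payload = json.dumps({'text': incident_message(report)}).encode('utf-8')
    request = urllib.request.Request(
        WEBHOOK_URL, data=payload,
        headers={'Content-Type': 'application/json'})
    with urllib.request.urlopen(request) as response:
        return response.status


if __name__ == '__main__':
    # Print the message only; posting needs a real webhook URL.
    print(incident_message({
        'summary': 'Invoices page returning 500s',
        'url': 'https://example.com/invoices',
        'user_count': 'unknown', 'location': 'NZ',
        'started': '09:40 NZDT', 'working_on_it': 'no',
    }))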

 

Simon Lyall: DevOps Days Auckland 2017 – Tuesday Session 1

Tue, 2017-10-03 09:02

DevSecOps – Anthony Rees

“When Anthrax and Public Enemy came together, It was like Developers and Operations coming together”

  • Everybody is trying to get things out fast, sometimes we forget about security
  • Structural efficiency and optimised flow
  • Compliance putting roadblock in flow of pipeline
    • Even worse scanning in production after deployment
  • Compliance guys using Excel, Security using shell scripts, Developers and Operations using code
  • Chef security compliance language – InSpec
    • Insert Sales stuff here
  • inspec.io
  • Lots of pre-written configs available

Immutable SQL Server Clusters – John Bowker (from Xero)

  • Problem
    • Pet Based infrastructure
    • Not in cloud, weeks to deploy new server
    • Hard to update base infrastructure code
  • 110 Prod Servers (2 regions).
  • 1.9PB of Disk
  • Octopus Deploy: SQL Schemas, Also server configs
  • Half of team in NZ, Half in Denver
    • Data Engineers, Infrastructure Engineers, Team Lead, Product Owner
  • Where we were – The Burning Platform
    • Changed mid-Migration from dedicated instances to dedicated Hosts in AWS
    • Big saving on software licensing
  • Advantages
    • Already had Clustered HA
    • Existing automation
    • 6 day team, 15 hours/day due to multiple locations of team
  • Migration had to have no downtime
    • Went with node swaps in cluster
  • Split team. Half doing migration, half creating code/system for the node swaps
  • We learnt
    • Dedicated hosts are cheap
    • Dedicated host automation not so good for Windows
    • Discovery service not so good.
    • Syncing data took up to 24h due to large dataset
    • Powershell debugging is hard (moving away from powershell a bit, but powershell has lots of SQL server stuff built in)
    • AWS services can timeout, allow for this.
  • Things we Built
    • Lots of Step Templates in Octopus Deploy
    • Metadata Store for SQL servers – Dynamite (Python, Lambda, Flask, DynamoDB) – Hope to open source
    • Lots of PowerShell Modules
  • Node Swaps going forward
    • Working towards making this completely automated
    • New AMI -> Node swap onto that
    • Avoid upgrade in place or running on old version

James Morris: Linux Security Summit 2017 Roundup

Mon, 2017-10-02 13:01

The 2017 Linux Security Summit (LSS) was held last month in Los Angeles over the 14th and 15th of September.  It was co-located with Open Source Summit North America (OSSNA) and the Linux Plumbers Conference (LPC).

LSS 2017

Once again we were fortunate to have general logistics managed by the Linux Foundation, allowing the program committee to focus on organizing technical content.  We had a record number of submissions this year and accepted approximately one third of them.  Attendance was very strong, with ~160 attendees — another record for the event.

LSS 2017 Attendees

On the day prior to LSS, attendees were able to access a day of LPC, which featured two tracks with a security focus:

Many thanks to the LPC organizers for arranging the schedule this way and allowing LSS folk to attend the day!

Realtime notes were made of these microconfs via etherpad:

I was particularly interested in the topic of better integrating LSM with containers, as there is an increasingly common requirement for nesting of security policies, where each container may run its own apparently independent security policy, and also a potentially independent security model.  I proposed the approach of introducing a security namespace, where all security interfaces within the kernel are namespaced, including LSM.  It would potentially solve the container use-cases, and also the full LSM stacking case championed by Casey Schaufler (which would allow entirely arbitrary stacking of security modules).

This would be a very challenging project, to say the least, and one which is further complicated by containers not being a first class citizen of the kernel.   This leads to security policy boundaries clashing with semantic functional boundaries e.g. what does it mean from a security policy POV when you have namespaced filesystems but not networking?

Discussion turned to the idea that it is up to the vendor/user to configure containers in a way which makes sense for them, and similarly, they would also need to ensure that they configure security policy in a manner appropriate to that configuration.  I would say this means that semantic responsibility is pushed to the user with the kernel largely remaining a set of composable mechanisms, in relation to containers and security policy.  This provides a great deal of flexibility, but requires those building systems to take a great deal of care in their design.

There are still many issues to resolve, both upstream and at the distro/user level, and I expect this to be an active area of Linux security development for some time.  There were some excellent followup discussions in this area, including an approach which constrains the problem space. (Stay tuned)!

A highlight of the TPMs session was an update on the TPM 2.0 software stack, by Philip Tricca and Jarkko Sakkinen.  The slides may be downloaded here.  We should see a vastly improved experience over TPM 1.x with v2.0 hardware capabilities, and the new software stack.  I suppose the next challenge will be TPMs in the post-quantum era?

There were further technical discussions on TPMs and container security during subsequent days at LSS.  Bringing the two conference groups together here made for a very productive event overall.

TPMs microconf at LPC with Philip Tricca presenting on the 2.0 software stack.

This year, due to the overlap with LPC, we unfortunately did not have any LWN coverage.  There are, however, excellent writeups available from attendees:

There were many awesome talks.

The CII Best Practices Badge presentation by David Wheeler was an unexpected highlight for me.  CII refers to the Linux Foundation’s Core Infrastructure Initiative , a preemptive security effort for Open Source.  The Best Practices Badge Program is a secure development maturity model designed to allow open source projects to improve their security in an evolving and measurable manner.  There’s been very impressive engagement with the project from across open source, and I believe this is a critically important effort for security.

CII Badge Project adoption (from David Wheeler’s slides).

During Dan Cashman’s talk on SELinux policy modularization in Android O,  an interesting data point came up:

Interesting data from the talk: 44% of Android kernel vulns blocked by SELinux due to attack surface reduction. https://t.co/FnU544B3XP

— James Morris (@xjamesmorris) September 15, 2017

We of course expect to see application vulnerability mitigations arising from Mandatory Access Control (MAC) policies (SELinux, Smack, and AppArmor), but if you look closely this refers to kernel vulnerabilities.   So what is happening here?  It turns out that a side effect of MAC policies, particularly those implemented in tightly-defined environments such as Android, is a reduction in kernel attack surface.  It is generally more difficult to reach such kernel vulnerabilities when you have MAC security policies.  This is a side-effect of MAC, not a primary design goal, but nevertheless appears to be very effective in practice!

Another highlight for me was the update on the Kernel Self Protection Project lead by Kees, which is now approaching its 2nd anniversary, and continues the important work of hardening the mainline Linux kernel itself against attack.  I would like to also acknowledge the essential and original research performed in this area by grsecurity/PaX, from which this mainline work draws.

From a new development point of view, I’m thrilled to see the progress being made by Mickaël Salaün, on Landlock LSM, which provides unprivileged sandboxing via seccomp and LSM.  This is a novel approach which will allow applications to define and propagate their own sandbox policies.  Similar concepts are available in other OSs such as OSX (seatbelt) and BSD (pledge).  The great thing about Landlock is its consolidation of two existing Linux kernel security interfaces: LSM and Seccomp.  This ensures re-use of existing mechanisms, and aids usability by utilizing already familiar concepts for Linux users.

Mickaël Salaün from ANSSI talking about his Landlock LSM work at #linuxsecuritysummit 2017 pic.twitter.com/wYpbHuLgm2

— LinuxSecuritySummit (@LinuxSecSummit) September 14, 2017

Overall I found it to be an incredibly productive event, with many new and interesting ideas arising and lots of great collaboration in the hallway, lunch, and dinner tracks.

Slides from LSS may be found linked to the schedule abstracts.

We did not have a video sponsor for the event this year, and we’ll work on that again for next year’s summit.  We have discussed holding LSS again next year in conjunction with OSSNA, which is expected to be in Vancouver in August.

We are also investigating a European LSS in addition to the main summit for 2018 and beyond, as a way to help engage more widely with Linux security folk.  Stay tuned for official announcements on these!

Thanks once again to the awesome event staff at LF, especially Jillian Hall, who ensured everything ran smoothly.  Thanks also to the program committee who review, discuss, and vote on every proposal, ensuring that we have the best content for the event, and who work on technical planning for many months prior to the event.  And of course thanks to the presenters and attendees, without whom there would literally and figuratively be no event :)

See you in 2018!

 

OpenSTEM: Stone Axes and Aboriginal Stories from Victoria

Mon, 2017-10-02 11:04
In the most recent edition of Australian Archaeology, the journal of the Australian Archaeological Association, there is a paper examining the exchange of stone axes in Victoria and correlating these patterns of exchange with Aboriginal stories in the 19th century. This paper is particularly timely with the passing of legislation in the Victorian Parliament on […]

Russell Coker: Process Monitoring

Fri, 2017-09-29 02:02

Since forking the Mon project to etbemon [1] I’ve been spending a lot of time working on the monitor scripts. Actually monitoring something is usually quite easy, deciding what to monitor tends to be the hard part. The process monitoring script ps.monitor is the one I’m about to redesign.

Here are some of my ideas for monitoring processes. Please comment if you have any suggestions for how to do things better.

For people who don’t use mon, the monitor scripts return 0 if everything is OK and 1 if there’s a problem along with using stdout to display an error message. While I’m not aware of anyone hooking mon scripts into a different monitoring system that’s going to be easy to do. One thing I plan to work on in the future is interoperability between mon and other systems such as Nagios.

Basic Monitoring

ps.monitor tor:1-1 master:1-2 auditd:1-1 cron:1-5 rsyslogd:1-1 dbus-daemon:1- sshd:1- watchdog:1-2

I’m currently planning some sort of rewrite of the process monitoring script. The current functionality is to have a list of process names on the command line with minimum and maximum numbers for the instances of the process in question. The above is a sample of the configuration of the monitor. There are some limitations to this, the “master” process in this instance refers to the main process of Postfix, but other daemons use the same process name (it’s one of those names that’s wrong because it’s so obvious). One obvious solution to this is to give the option of specifying the full path so that /usr/lib/postfix/sbin/master can be differentiated from all the other programs named master.
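
As a rough illustration of that configuration format (my own sketch, not the real ps.monitor script), a minimal check could parse the name:min-max arguments, count processes from /proc and follow the mon convention of exit status 0 or 1 with a message on stdout:

#!/usr/bin/python3
# Sketch of a mon-style process count check. Arguments look like
# "sshd:1-" or "cron:1-5" (minimum and optional maximum instances).
# Exit 0 if every count is in range, otherwise print the failures and
# exit 1. Illustration only, not the actual ps.monitor script.

import os
import sys


def process_counts():
    counts = {}
    for pid in os.listdir('/proc'):
        if not pid.isdigit():
            continue
        try:
            # Note: /proc/<pid>/comm truncates names to 15 characters.
            with open('/proc/%s/comm' % pid) as f:
                name = f.read().strip()
        except IOError:
            continue  # process exited while we were looking
        counts[name] = counts.get(name, 0) + 1
    return counts


def main(args):
    counts = process_counts()
    failures = []
    for spec in args:
        name, limits = spec.split(':')
        low, _, high = limits.partition('-')
        count = counts.get(name, 0)
        if count < int(low) or (high and count > int(high)):
            failures.append('%s has %d instances (want %s)'
                            % (name, count, limits))
    if failures:
        print(', '.join(failures))
        return 1
    return 0


if __name__ == '__main__':
    sys.exit(main(sys.argv[1:]))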

The next issue is processes that may run on behalf of multiple users. With sshd there is a single process to accept new connections running as root and a process running under the UID of each logged in user. So the number of sshd processes running as root will be one greater than the number of root login sessions. This means that if a sysadmin logs in directly as root via ssh (which is controversial and not the topic of this post – merely something that people do which I have to support) and the master process then crashes (or the sysadmin stops it either accidentally or deliberately) there won’t be an alert about the missing process. Of course the correct thing to do is to have a monitor talk to port 22 and look for the string “SSH-2.0-OpenSSH_”. Sometimes there are multiple instances of a daemon running under different UIDs that need to be monitored separately. So obviously we need the ability to monitor processes by UID.

In many cases process monitoring can be replaced by monitoring of service ports. So if something is listening on port 25 then it probably means that the Postfix “master” process is running regardless of what other “master” processes there are. But for my use I find it handy to have multiple monitors, if I get a Jabber message about being unable to send mail to a server immediately followed by a Jabber message from that server saying that “master” isn’t running I don’t need to fully wake up to know where the problem is.

SE Linux

One feature that I want is monitoring SE Linux contexts of processes in the same way as monitoring UIDs. While I’m not interested in writing tests for other security systems I would be happy to include code that other people write. So whatever I do I want to make it flexible enough to work with multiple security systems.

Transient Processes

Most daemons have a second process of the same name running during the startup process. This means if you monitor for exactly 1 instance of a process you may get an alert about 2 processes running when “logrotate” or something similar restarts the daemon. Also you may get an alert about 0 instances if the check happens to run at exactly the wrong time during the restart. My current way of dealing with this on my servers is to not alert until the second failure event with the “alertafter 2” directive. The “failure_interval” directive allows specifying the time between checks when the monitor is in a failed state, setting that to a low value means that waiting for a second failure result doesn’t delay the notification much.

To deal with this I’ve been thinking of making the ps.monitor script automatically check again after a specified delay. I think that solving the problem with a single parameter to the monitor script is better than using 2 configuration directives to mon to work around it.

CPU Use

Mon currently has a loadavg.monitor script that checks the load average. But that won’t catch the case of a single process using too much CPU time but not enough to raise the system load average. Also it won’t catch the case of a CPU hungry process going quiet (EG when the SETI at Home server goes down) while another process goes into an infinite loop. One way of addressing this would be to have the ps.monitor script have yet another configuration option to monitor CPU use, but this might get confusing. Another option would be to have a separate script that alerts on any process that uses more than a specified percentage of CPU time over its lifetime or over the last few seconds, unless it’s in a whitelist of processes and users who are exempt from such checks. Probably every regular user would be exempt from such checks because you never know when they will run a file compression program. Also there is a short list of daemons that are excluded (like BOINC) and system processes (like gzip which is run from several cron jobs).

Monitoring for Exclusion

A common programming mistake is to call setuid() before setgid() which means that the program doesn’t have permission to call setgid(). If return codes aren’t checked (and people who make such rookie mistakes tend not to check return codes) then the process keeps elevated permissions. Checking for processes running as GID 0 but not UID 0 would be handy. As an aside a quick examination of a Debian/Testing workstation didn’t show any obvious way that a process with GID 0 could gain elevated privileges, but that could change with one chmod 770 command.
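
A rough sketch of such a check (my own illustration, not an etbemon script), scanning /proc for processes whose real GID is 0 while their real UID is not:

#!/usr/bin/python3
# Sketch of a check for the setuid()-before-setgid() mistake: report any
# process whose real GID is 0 while its real UID is not 0. Illustration
# only, not an actual monitor script.

import os
import sys


def main():
    suspects = []
    for pid in os.listdir('/proc'):
        if not pid.isdigit():
            continue
        uid = gid = None
        try:
            with open('/proc/%s/status' % pid) as f:
                for line in f:
                    if line.startswith('Uid:'):
                        uid = int(line.split()[1])  # real UID
                    elif line.startswith('Gid:'):
                        gid = int(line.split()[1])  # real GID
        except IOError:
            continue  # process exited while we were looking
        if gid == 0 and uid not in (0, None):
            suspects.append(pid)
    if suspects:
        print('GID 0 without UID 0: %s' % ' '.join(suspects))
        return 1
    return 0


if __name__ == '__main__':
    sys.exit(main())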

On a SE Linux system there should be only one process running with the domain init_t. Currently that doesn’t happen in Stretch systems running daemons such as mysqld and tor due to policy not matching the recent functionality of systemd as requested by daemon service files. Such issues will keep occurring so we need automated tests for them.

Automated tests for configuration errors that might impact system security is a bigger issue, I’ll probably write a separate blog post about it.

Related posts:

  1. Monitoring of Monitoring I was recently asked to get data from a computer...
  2. When to Use SE Linux Recently someone asked on IRC whether they should use SE...
  3. Health and Status Monitoring via Smart Phone Health Monitoring Eric Topol gave an interesting TED talk about...

Michael Still: I think I found a bug in python's unittest.mock library

Thu, 2017-09-28 18:00
Mocking is a pretty common thing to do in unit tests covering OpenStack Nova code. Over the years we've used various mock libraries to do that, with the flavor du jour being unittest.mock. I must say that I strongly prefer unittest.mock to the old mox code we used to write, but I think I just accidentally found a fairly big bug.

The problem is that Python mocks are magical. It's an object where you can call any method name, and the mock will happily pretend it has that method, and return None. You can then later ask what "methods" were called on the mock.

However, you use the same mock object later to make assertions about what was called. Herein is the problem -- the mock object doesn't know if you're the code under test, or the code that's making assertions. So, if you fat finger the assertion in your test code, the assertion will just quietly map to a non-existent method which returns None, and your code will pass.

Here's an example:

    #!/usr/bin/python3

    from unittest import mock


    class foo(object):
        def dummy(a, b):
            return a + b


    @mock.patch.object(foo, 'dummy')
    def call_dummy(mock_dummy):
        f = foo()
        f.dummy(1, 2)

        print('Asserting a call should work if the call was made')
        mock_dummy.assert_has_calls([mock.call(1, 2)])
        print('Assertion for expected call passed')
        print()

        print('Asserting a call should raise an exception if the call wasn\'t made')
        mock_worked = False
        try:
            mock_dummy.assert_has_calls([mock.call(3, 4)])
        except AssertionError as e:
            mock_worked = True
            print('Expected failure, %s' % e)
        if not mock_worked:
            print('*** Assertion should have failed ***')
        print()

        print('Asserting a call where the assertion has a typo should fail, but '
              'doesn\'t')
        mock_worked = False
        try:
            mock_dummy.typo_assert_has_calls([mock.call(3, 4)])
        except AssertionError as e:
            mock_worked = True
            print('Expected failure, %s' % e)
        print()
        if not mock_worked:
            print('*** Assertion should have failed ***')
        print(mock_dummy.mock_calls)
        print()


    if __name__ == '__main__':
        call_dummy()


If I run that code, I get this:

    $ python3 mock_assert_errors.py
    Asserting a call should work if the call was made
    Assertion for expected call passed

    Asserting a call should raise an exception if the call wasn't made
    Expected failure, Calls not found.
    Expected: [call(3, 4)]
    Actual: [call(1, 2)]

    Asserting a call where the assertion has a typo should fail, but doesn't
    *** Assertion should have failed ***
    [call(1, 2), call.typo_assert_has_calls([call(3, 4)])]


So, we should have been told that typo_assert_has_calls isn't a thing, but we didn't notice because it silently failed. I discovered this when I noticed an assertion with a (smaller than this) typo in its call in a code review yesterday.

I don't really have a solution to this right now (I'm home sick and not thinking straight), but it would be interesting to see what other people think.
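
One partial guard, sketched below purely as an illustration and definitely not a complete fix: a mock created with a spec (for example via mock.create_autospec) rejects attribute names that don't exist on the spec'd object, so this particular class of typo raises AttributeError instead of silently passing.

#!/usr/bin/python3
# Illustration only: spec'd mocks refuse unknown attribute names, so a
# typo'd assertion helper blows up instead of quietly returning a new
# child mock.

from unittest import mock


def dummy(a, b):
    return a + b


def demo():
    # Spec the mock against the real function it replaces.
    mock_dummy = mock.create_autospec(dummy)
    mock_dummy(1, 2)

    # Real assertion helpers still work as normal.
    mock_dummy.assert_has_calls([mock.call(1, 2)])

    # A typo'd helper is now an AttributeError rather than a silent no-op.
    try:
        mock_dummy.typo_assert_has_calls([mock.call(3, 4)])
    except AttributeError as e:
        print('Typo caught: %s' % e)


if __name__ == '__main__':
    demo()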

Tags for this post: python unittest.mock mock testing
Related posts: mbot: new hotness in Google Talk bots; On syncing with Google Contacts; Rsyncing everything but the data; Domain name lookup helper for python?; More coding club; Twisted conch


OpenSTEM: What Makes Humans Different From Most Other Mammals?

Mon, 2017-09-25 10:04
Well, there are several things that makes us different from other mammals – although perhaps fewer than one might think. We are not unique in using tools, in fact we discover more animals that use tools all the time – even fish! We pride ourselves on being a “moral animal”, however fairness, reciprocity, empathy and […]

Dave Hall: Drupal Puppies

Sun, 2017-09-24 22:02

Over the years Drupal distributions, or distros as they're more affectionately known, have evolved a lot. We started off passing around database dumps. Eventually we moved on to using installation profiles and features to share par-baked sites.

There are some signs that distros aren't working for people using them. Agencies often hack a distro to meet client requirements. This happens because it is often difficult to cleanly extend a distro. A content type might need extra fields or the logic in an alter hook may not be desired. This makes it difficult to maintain sites built on distros. Other times maintainers abandon their distributions. This leaves site owners with an unexpected maintenance burden.

We should recognise how people are using distros and try to cater to them better. My observations suggest there are 2 types of Drupal distributions; starter kits and targeted products.

Targeted products are easier to deal with. Increasingly monetising targeted distro products is done through a SaaS offering. The revenue can fund the ongoing development of the product. This can help ensure the project remains sustainable. There are signs that this is a viable way of building Drupal 8 based products. We should be encouraging companies to embrace a strategy built around open SaaS. Open Social is a great example of this approach. Releasing the distros demonstrates a commitment to the business model. Often the secret sauce isn't in the code, it is the team and services built around the product.

Many Drupal 7 based distros struggled to articulate their use case. It was difficult to know if they were a product, a demo or a community project that you extend. Open Atrium and Commerce Kickstart are examples of distros with an identity crisis. We need to reconceptualise most distros as "starter kits" or as I like to call them "puppies".

Why puppies? Once you take a puppy home it becomes your responsibility. Starter kits should be the same. You should never assume that a starter kit will offer an upgrade path from one release to the next. When you install a starter kit you are responsible for updating the modules yourself. You need to keep track of security releases. If your puppy leaves a mess on the carpet, no one else will clean it up.

Sites built on top of a starter kit should diverge from the original version. This shouldn't only be an expectation, it should be encouraged. Installing a starter kit is the starting point of building a unique fork.

Project pages should clearly state that users are buying a puppy. Prospective puppy owners should know if they're about to take home a little lap dog or one that will grow to the size of a pony that needs daily exercise. Puppy breeders (developers) should not feel compelled to do anything once releasing the puppy. That said, most users would like some documentation.

I know of several agencies and large organisations that are making use of starter kits. Let's support people who are adopting this approach. As a community we should acknowledge that distros aren't working. We should start working out how best to manage the transition to puppies.

Tim Serong: On Equal Rights

Sun, 2017-09-24 06:04

This is probably old news now, but I only saw it this morning, so here we go:

WOW. JILL POULSEN’S COLUMN TOMORROW. pic.twitter.com/5kmPa8LHiC

— The NT News (@TheNTNews) September 15, 2017

In case that embedded tweet doesn’t show up properly, that’s an editorial in the NT News which says:

Voting papers have started to drop through Territory mailboxes for the marriage equality postal vote and I wanted to share with you a list of why I’ll be voting yes.

1. I’m not an arsehole.

This resulted in predictable comments along the lines of “oh, so if I don’t share your views, I’m an arsehole?”

I suppose it’s unlikely that anyone who actually needs to read and understand what I’m about to say will do so, but just in case, I’ll lay this out as simply as I can:

  • A personal belief that marriage is a thing that can only happen between a man and a woman does not make you an arsehole (it might make you on the wrong side of history, or a lot of other things, but it does not necessarily make you an arsehole).
  • Voting “no” to marriage equality is what makes you an arsehole.

The survey says “Should the law be changed to allow same-sex couples to marry?” What this actually means is, “Should same-sex couples have the same rights under law as everyone else?”

If you believe everyone should have the same rights under law, you need to vote yes regardless of what you, personally, believe the word “marriage” actually means – this is to make sure things like “next of kin” work the way the people involved in a relationship want them to.

If you believe that there are minorities that should not have the same rights under law as everyone else, then I’m sorry, but you’re an arsehole.

(Personally I think the Marriage Act should be ditched entirely in favour of a Civil Unions Act – that way the word “marriage” could go back to simply meaning whatever it means to the individuals being married, and to their god(s) if they have any – but this should in no way detract from the above. Also, this vote shouldn’t have happened in the first place; our elected representatives should have done their bloody jobs and fixed the legislation already.)

Russell Coker: Converting Mbox to Maildir

Sat, 2017-09-23 16:02

MBox is the original and ancient format for storing mail on Unix systems, it consists of a single file per user under /var/spool/mail that has messages concatenated. Obviously performance is very poor when deleting messages from a large mail store as the entire file has to be rewritten. Maildir was invented for Qmail by Dan Bernstein and has a single message per file giving fast deletes among other performance benefits. An ongoing issue over the last 20 years has been converting Mbox systems to Maildir. The various ways of getting IMAP to work with Mbox only made this more complex.

The Dovecot Wiki has a good page about converting Mbox to Maildir [1]. If you want to keep the same message UIDs and the same path separation characters then it will be a complex task. But if you just want to copy a small number of Mbox accounts to an existing server then it’s a bit simpler.

Dovecot has a mb2md.pl script to convert folders [2].

cd /var/spool/mail
mkdir -p /mailstore/example.com
for U in * ; do
  ~/mb2md.pl -s $(pwd)/$U -d /mailstore/example.com/$U
done

To convert the inboxes shell code like the above is needed. If the users don’t have IMAP folders (EG they are just POP users or use local Unix MUAs) then that’s all you need to do.

cd /home
for DIR in */mail ; do
  U=$(echo $DIR| cut -f1 -d/)
  cd /home/$DIR
  for FOLDER in * ; do
    ~/mb2md.pl -s $(pwd)/$FOLDER -d /mailstore/example.com/$U/.$FOLDER
  done
  cp .subscriptions /mailstore/example.com/$U/subscriptions
done

Some shell code like the above will convert the IMAP folders to Maildir format. The end result is that the users will have to download all the mail again as their MUA will think that every message had been deleted and replaced. But as all servers with significant amounts of mail or important mail were probably converted to Maildir a decade ago this shouldn’t be a problem.

Related posts:

  1. Why Cyrus Sucks I’m in the middle of migrating a mail server away...
  2. Moving a Mail Server Nowadays it seems that most serious mail servers (IE mail...
  3. Mail Server Training Today I ran a hands-on training session on configuring a...

Linux Users of Victoria (LUV) Announce: LUV Main October 2017 Meeting: The Tor software and network

Sat, 2017-09-23 14:02
Start: Oct 3 2017 18:30 End: Oct 3 2017 20:30 Location: Mail Exchange Hotel, 688 Bourke St, Melbourne VIC 3000 Link: http://mailexchangehotel.com.au/

PLEASE NOTE NEW LOCATION

Tuesday, October 3, 2017
6:30 PM to 8:30 PM
Mail Exchange Hotel
688 Bourke St, Melbourne VIC 3000

Speakers:

  • Russell Coker, Tor

Tor is free software and an open network that helps you defend against traffic analysis, a form of network surveillance that threatens personal freedom and privacy, confidential business activities and relationships, and state security.

Russell Coker has done lots of Linux development over the years, mostly involved with Debian.

Mail Exchange Hotel, 688 Bourke St, Melbourne VIC 3000

Food and drinks will be available on premises.

Linux Users of Victoria is a subcommittee of Linux Australia.

October 3, 2017 - 18:30

Linux Users of Victoria (LUV) Announce: LUV October 2017 Workshop: Hands-on with Tor

Sat, 2017-09-23 14:02
Start: Oct 21 2017 12:30 End: Oct 21 2017 16:30 Location: Infoxchange, 33 Elizabeth St. Richmond Link: http://luv.asn.au/meetings/map

Hands-on with Tor

Following on from Russell Coker's well-attended Tor presentation at the October main meeting, he will cover torbrowser-launcher, torchat, ssh (as an example of a traditionally non-tor app that can run with it), and how to write a basic torchat in shell.

The meeting will be held at Infoxchange, 33 Elizabeth St. Richmond 3121 (enter via the garage on Jonas St.) Late arrivals, please call (0421) 775 358 for access to the venue.

LUV would like to acknowledge Infoxchange for the venue.

Linux Users of Victoria is a subcommittee of Linux Australia.

October 21, 2017 - 12:30


sthbrx - a POWER technical blog: Stupid Solutions to Stupid Problems: Hardcoding Your SSH Key in the Kernel

Sat, 2017-09-23 04:00
The "problem"

I'm currently working on firmware and kernel support for OpenCAPI on POWER9.

I've recently been allocated a machine in the lab for development purposes. We use an internal IBM tool running on a secondary machine that triggers hardware initialisation procedures, then loads a specified skiboot firmware image, a kernel image, and a root file system directly into RAM. This allows us to get skiboot and Linux running without requiring the usual hostboot initialisation and gives us a lot of options for easier tinkering, so it's super-useful for our developers working on bringup.

When I got access to my machine, I figured out the necessary scripts, developed a workflow, and started fixing my code... so far, so good.

One day, I was trying to debug something and get logs off the machine using ssh and scp, when I got frustrated with having to repeatedly type in our ultra-secret, ultra-secure root password, abc123. So, I ran ssh-copy-id to copy over my public key, and all was good.

Until I rebooted the machine, when strangely, my key stopped working. It took me longer than it should have to realise that this is an obvious consequence of running entirely from an initrd that's reloaded every boot...

The "solution"

I mentioned something about this to Jono, my housemate/partner-in-stupid-ideas, one evening a few weeks ago. We decided that clearly, the best way to solve this problem was to hardcode my SSH public key in the kernel.

This would definitely be the easiest and most sensible way to solve the problem, as opposed to, say, just keeping my own copy of the root filesystem image. Or asking Mikey, whose desk is three metres away from mine, whether he could use his write access to add my key to the image. Or just writing a wrapper around sshpass...

One Tuesday afternoon, I was feeling bored...

The approach

The SSH daemon looks for authorised public keys in ~/.ssh/authorized_keys, so we need to have a read of /root/.ssh/authorized_keys return a specified hard-coded string.

I did a bit of investigation. My first thought was to put some kind of hook inside whatever filesystem driver was being used for the root. After some digging, I found out that the filesystem type rootfs, as seen in mount, is actually backed by the tmpfs filesystem. I took a look around the tmpfs code for a while, but didn't see any way to hook in a fake file without a lot of effort - the tmpfs code wasn't exactly designed with this in mind.

I thought about it some more - what would be the easiest way to create a file such that it just returns a string?

Then I remembered sysfs, the filesystem normally mounted at /sys, which is used by various kernel subsystems to expose configuration and debugging information to userspace in the form of files. The sysfs API allows you to define a file and specify callbacks to handle reads and writes to the file.

That got me thinking - could I create a file in /sys, and then use a bind mount to have that file appear where I need it in /root/.ssh/authorized_keys? This approach seemed fairly straightforward, so I decided to give it a try.

First up, creating a pseudo-file. It had been a while since the last time I'd used the sysfs API...

sysfs

The sysfs pseudo file system was first introduced in Linux 2.6, and is generally used for exposing system and device information.

Per the sysfs documentation, sysfs is tied in very closely with the kobject infrastructure. sysfs exposes kobjects as directories, containing "attributes" represented as files. The kobject infrastructure provides a way to define kobjects representing entities (e.g. devices) and ksets which define collections of kobjects (e.g. devices of a particular type).

Using kobjects you can do lots of fancy things such as sending events to userspace when devices are hotplugged - but that's all out of the scope of this post. It turns out there's some fairly straightforward wrapper functions if all you want to do is create a kobject just to have a simple directory in sysfs.

#include <linux/kobject.h>

static int __init ssh_key_init(void)
{
        struct kobject *ssh_kobj;
        ssh_kobj = kobject_create_and_add("ssh", NULL);
        if (!ssh_kobj) {
                pr_err("SSH: kobject creation failed!\n");
                return -ENOMEM;
        }
}

late_initcall(ssh_key_init);

This creates and adds a kobject called ssh. And just like that, we've got a directory in /sys/ssh/!

The next thing we have to do is define a sysfs attribute for our authorized_keys file. sysfs provides a framework for subsystems to define their own custom types of attributes with their own metadata - but for our purposes, we'll use the generic bin_attribute attribute type.

#include <linux/sysfs.h>

const char key[] = "PUBLIC KEY HERE...";

static ssize_t show_key(struct file *file, struct kobject *kobj,
                        struct bin_attribute *bin_attr, char *to,
                        loff_t pos, size_t count)
{
        return memory_read_from_buffer(to, count, &pos, key,
                                       bin_attr->size);
}

static const struct bin_attribute authorized_keys_attr = {
        .attr = { .name = "authorized_keys", .mode = 0444 },
        .read = show_key,
        .size = sizeof(key)
};

We provide a simple callback, show_key(), that copies the key string into the file's buffer, and we put it in a bin_attribute with the appropriate name, size and permissions.

To actually add the attribute, we put the following in ssh_key_init():

int rc;

rc = sysfs_create_bin_file(ssh_kobj, &authorized_keys_attr);
if (rc) {
        pr_err("SSH: sysfs creation failed, rc %d\n", rc);
        return rc;
}

Woo, we've now got /sys/ssh/authorized_keys! Time to move on to the bind mount.

Mounting

Now that we've got a directory with the key file in it, it's time to figure out the bind mount.

Because I had no idea how any of the file system code works, I started off by running strace on mount --bind ~/tmp1 ~/tmp2 just to see how the userspace mount tool uses the mount syscall to request the bind mount.

execve("/bin/mount", ["mount", "--bind", "/home/ajd/tmp1", "/home/ajd/tmp2"], [/* 18 vars */]) = 0

...

mount("/home/ajd/tmp1", "/home/ajd/tmp2", 0x18b78bf00, MS_MGC_VAL|MS_BIND, NULL) = 0

The first and second arguments are the source and target paths respectively. The third argument, looking at the signature of the mount syscall, is a pointer to a string with the file system type. Because this is a bind mount, the type is irrelevant (upon further digging, it turns out that this particular pointer is to the string "none").

The fourth argument is where we specify the flags bitfield. MS_MGC_VAL is a magic value that was required before Linux 2.4 and can now be safely ignored. MS_BIND, as you can probably guess, signals that we want a bind mount.

(The final argument is used to pass file system specific data - as you can see it's ignored here.)

Now, how is the syscall actually handled on the kernel side? The answer is found in fs/namespace.c.

SYSCALL_DEFINE5(mount, char __user *, dev_name, char __user *, dir_name,
                char __user *, type, unsigned long, flags, void __user *, data)
{
        int ret;

        /* ... copy parameters from userspace memory ... */

        ret = do_mount(kernel_dev, dir_name, kernel_type, flags, options);

        /* ... cleanup ... */
}

So in order to achieve the same thing from within the kernel, we just call do_mount() with exactly the same parameters as the syscall uses:

rc = do_mount("/sys/ssh", "/root/.ssh", "sysfs", MS_BIND, NULL);
if (rc) {
        pr_err("SSH: bind mount failed, rc %d\n", rc);
        return rc;
}

...and we're done, right? Not so fast:

SSH: bind mount failed, rc -2

-2 is ENOENT - no such file or directory. For some reason, we can't find /sys/ssh... of course, that would be because even though we've created the sysfs entry, we haven't actually mounted sysfs on /sys.

rc = do_mount("sysfs", "/sys", "sysfs", MS_NOSUID | MS_NOEXEC | MS_NODEV, NULL);

At this point, my key worked!

Note that this requires that your root file system has an empty directory created at /sys to be the mount point. Additionally, in a typical Linux distribution environment (as opposed to my hardware bringup environment), your initial root file system will contain an init script that mounts your real root file system somewhere and calls pivot_root() to switch to the new root file system. At that point, the bind mount won't be visible from children processes using the new root - I think this could be worked around but would require some effort.

Kconfig

The final piece of the puzzle is building our new code into the kernel image.

To allow us to switch this important functionality on and off, I added a config option to fs/Kconfig:

config SSH_KEY
        bool "Andrew's dumb SSH key hack"
        default y
        help
          Hardcode an SSH key for /root/.ssh/authorized_keys.

          This is a stupid idea. If unsure, say N.

This will show up in make menuconfig under the File systems menu.

And in fs/Makefile:

obj-$(CONFIG_SSH_KEY) += ssh_key.o

If CONFIG_SSH_KEY is set to y, obj-$(CONFIG_SSH_KEY) evaluates to obj-y and thus ssh-key.o gets compiled. Conversely, obj-n is completely ignored by the build system.

I thought I was all done... then Andrew suggested I make the contents of the key configurable, and I had to oblige. Conveniently, Kconfig options can also be strings:

config SSH_KEY_VALUE
        string "Value for SSH key"
        depends on SSH_KEY
        help
          Enter in the content for /root/.ssh/authorized_keys.

Including the string in the C file is as simple as:

const char key[] = CONFIG_SSH_KEY_VALUE;

And there we have it, a nicely configurable albeit highly limited kernel SSH backdoor!

Conclusion

I've put the full code up on GitHub for perusal. Please don't use it, I will be extremely disappointed in you if you do.

Thanks to Jono for giving me stupid ideas, and the rest of OzLabs for being very angry when they saw the disgusting things I was doing.

Comments and further stupid suggestions welcome!

sthbrx - a POWER technical blog: NCSI - Nice Network You've Got There

Fri, 2017-09-22 11:08

A neat piece of kernel code dropped into my lap recently, and as a way of processing having to inject an entire network stack into my brain in less-than-ideal time I thought we'd have a look at it here: NCSI!

NCSI - Not the TV Show

NCSI stands for Network Controller Sideband Interface, and put most simply it is a way for a management controller (eg. a BMC like those found on our OpenPOWER machines) to share a single physical network interface with a host machine. Instead of two distinct network interfaces you plug in a single cable and both the host and the BMC have network connectivity.

NCSI-capable network controllers achieve this by filtering network traffic as it arrives and determining if it is host- or BMC-bound. To know how to do this the BMC needs to tell the network controller what to look out for, and from a Linux driver perspective this is the focus of the NCSI protocol.

Hi My Name Is 70:e2:84:14:24:a1

The major components of what NCSI helps facilitate are:

  • Network Controllers, known as 'Packages' in this context. There may be multiple separate packages which contain one or more Channels.
  • Channels, most easily thought of as the individual physical network interfaces. If a package is the network card, channels are the individual network jacks. (Somewhere a pedant's head is spinning in circles).
  • Management Controllers, or our BMC, with their own network interfaces. Hypothetically there can be multiple management controllers in a single NCSI system, but I've not come across such a setup yet.

NCSI is the medium and protocol via which these components communicate.

The interface between Management Controller and one or more Packages carries both general network traffic to/from the Management Controller as well as NCSI traffic between the Management Controller and the Packages & Channels. Management traffic is differentiated from regular traffic via the inclusion of a special NCSI tag inserted in the Ethernet frame header. These management commands are used to discover and configure the state of the NCSI packages and channels.

If a BMC's network interface is configured to use NCSI, as soon as the interface is brought up NCSI gets to work finding and configuring a usable channel. The NCSI driver at first glance is an intimidating combination of state machines and packet handlers, but with enough coffee it can be represented like this:

Without getting into the nitty gritty details the overall process for configuring a channel enough to get packets flowing is fairly straightforward:

  • Find available packages.
  • Find each package's available channels.
  • (At least in the Linux driver) select a channel with link.
  • Put this channel into the Initial Config State. The Initial Config State is where all the useful configuration occurs. Here we find out what the selected channel is capable of and its current configuration, and set it up to recognise the traffic we're interested in. The first and most basic way of doing this is configuring the channel to filter traffic based on our MAC address.
  • Enable the channel and let the packets flow.

At this point NCSI takes a back seat to normal network traffic, transmitting a "Get Link Status" packet at regular intervals to monitor the channel.

AEN Packets

Changes can occur from the package side too; the NCSI package communicates these back to the BMC with Asynchronous Event Notification (AEN) packets. As the name suggests these can occur at any time and the driver needs to catch and handle these. There are different types but they essentially boil down to changes in link state, telling the BMC the channel needs to be reconfigured, or to select a different channel. These are only transmitted once and no effort is made to recover lost AEN packets - another good reason for the NCSI driver to periodically monitor the channel.

Filtering

Each channel can be configured to filter traffic based on MAC address, broadcast traffic, multicast traffic, and VLAN tagging. Associated with each of these filters is a filter table which can hold a finite number of entries. In the case of the VLAN filter each channel could match against 15 different VLAN IDs for example, but in practice the physical device will likely support less. Indeed the popular BCM5718 controller supports only two!

This is where I dived into NCSI. The driver had a lot of the pieces for configuring VLAN filters but none of it was actually hooked up in the configure state, and didn't have a way of actually knowing which VLAN IDs were meant to be configured on the interface. The bulk of that work appears in this commit where we take advantage of some useful network stack callbacks to get the VLAN configuration and set them during the configuration state. Getting to the configuration state at some arbitrary time and then managing to assign multiple IDs was the trickiest bit, and is something I'll be looking at simplifying in the future.

NCSI! A neat way to give physically separate users access to a single network controller, and if it works right you won't notice it at all. I'll surely be spending more time here (fleshing out the driver's features, better error handling, and making the state machine a touch more readable to start, and I haven't even mentioned HWA), so watch this space!

OpenSTEM: Those Dirty Peasants!

Mon, 2017-09-18 10:04
It is fairly well known that many Europeans in the 17th, 18th and early 19th centuries did not follow the same routines of hygiene as we do today. There are anecdotal and historical accounts of people being dirty, smelly and generally unhealthy. This was particularly true of the poorer sections of society. The epithet “those […]