Planet Linux Australia

Planet Linux Australia - http://planet.linux.org.au

Binh Nguyen: Trump Background, Random Stuff, and More

Fri, 2017-02-03 21:16
Given his recent inauguration, I thought it would be interesting to take a look at the background of the new US president, Donald Trump:
https://www.bloomberg.com/politics/articles/2017-01-21/merkel-said-to-scour-trump-archive-for-clues-on-how-to-read-him
https://www.rt.com/viral/374666-twitter-gifts-trump-followers/?utm_source=rss&utm_medium=rss&utm_campaign=RSS - well known background,

Michael Still: Nova vendordata deployment, an excessively detailed guide

Fri, 2017-02-03 15:00
Nova presents configuration information to instances it starts via a mechanism called metadata. This metadata is made available via either a configdrive, or the metadata service. These mechanisms are widely used via helpers such as cloud-init to specify things like the root password the instance should use. There are three separate groups of people who need to be able to specify metadata for an instance.

User provided data

The user who booted the instance can pass metadata to the instance in several ways. For authentication keypairs, the keypairs functionality of the Nova APIs can be used to upload a key and then specify that key during the Nova boot API request. For less structured data, a small opaque blob of data may be passed via the user-data feature of the Nova API. Examples of such unstructured data would be the puppet role that the instance should use, or the HTTP address of a server to fetch post-boot configuration information from.

Nova provided data

Nova itself needs to pass information to the instance via its internal implementation of the metadata system. Such information includes the network configuration for the instance, as well as the requested hostname for the instance. This happens by default and requires no configuration by the user or deployer.

Deployer provided data

There is however a third type of data. It is possible that the deployer of OpenStack needs to pass data to an instance. It is also possible that this data is not known to the user starting the instance. An example might be a cryptographic token to be used to register the instance with Active Directory post boot -- the user starting the instance should not have access to Active Directory to create this token, but the Nova deployment might have permissions to generate the token on the user's behalf.

Nova supports a mechanism to add "vendordata" to the metadata handed to instances. This is done by loading named modules, which must appear in the nova source code. We provide two such modules:

  • StaticJSON: a module which can include the contents of a static JSON file loaded from disk. This can be used for things which don't change between instances, such as the location of the corporate puppet server.
  • DynamicJSON: a module which will make a request to an external REST service to determine what metadata to add to an instance. This is how we recommend you generate things like Active Directory tokens which change per instance.


Tell me more about DynamicJSON

Having said all that, this post is about how to configure the DynamicJSON plugin, as I think it's the most interesting bit here.

To use DynamicJSON, you configure it like this:

  • Add "DynamicJSON" to the vendordata_providers configuration option. This can also include "StaticJSON" if you'd like.
  • Specify the REST services to be contacted to generate metadata in the vendordata_dynamic_targets configuration option. There can be more than one of these, but note that they will be queried once per metadata request from the instance, which can mean a fair bit of traffic depending on your configuration and the configuration of the instance.


The format for an entry in vendordata_dynamic_targets is like this:

<name>@<url>

Where name is a short string not including the '@' character, and where the URL can include a port number if so required. An example would be:

testing@http://127.0.0.1:125

Metadata fetched from this target will appear in the metadata service at a new file called vendor_data2.json, with a path (either in the metadata service URL or in the configdrive) like this:

openstack/2016-10-06/vendor_data2.json

For each dynamic target, there will be an entry in the JSON file named after that target. For example:

{ "testing": { "value1": 1, "value2": 2, "value3": "three" } }

Do not specify the same name more than once. If you do, we will ignore subsequent uses of a previously used name.

The following data is passed to your REST service as a JSON encoded POST:

  • project-id: the UUID of the project that owns the instance
  • instance-id: the UUID of the instance
  • image-id: the UUID of the image used to boot this instance
  • user-data: as specified by the user at boot time
  • hostname: the hostname of the instance
  • metadata: as specified by the user at boot time
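
For illustration, the JSON body your REST service receives might look something like this (all of the values here are invented):

{
    "project-id": "039d104b7a5c4631b4ba6524d0b9e981",
    "instance-id": "31c79a6c-9bf9-4e24-8f9d-51d4a1e85bc7",
    "image-id": "2f6e96ca-9f58-4832-9136-21ed6c1e3b1f",
    "user-data": null,
    "hostname": "foo",
    "metadata": {}
}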


Deployment considerations

Nova provides authentication to external metadata services in order to provide some level of certainty that the request came from nova. This is done by providing a service token with the request -- you can then just deploy your metadata service with the keystone authentication WSGI middleware. This is configured using the keystone authentication parameters in the vendordata_dynamic_auth configuration group.

This behavior is optional, however: if you do not configure a service user, nova will not authenticate with the external metadata service.
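
As a rough sketch (the exact option names are provided by keystoneauth1 and may differ between releases, so treat this as an assumption to verify against your release's documentation), the configuration might look something like:

[vendordata_dynamic_auth]
auth_type = password
auth_url = http://keystone.example.com:35357
username = nova
password = example_password
project_name = service
user_domain_id = default
project_domain_id = default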

Deploying the sample vendordata service

There is a sample vendordata service that is meant to model what a deployer would use for their custom metadata at http://github.com/mikalstill/vendordata. Deploying that service is relatively simple:

$ git clone http://github.com/mikalstill/vendordata
$ cd vendordata
$ apt-get install virtualenvwrapper
$ . /etc/bash_completion.d/virtualenvwrapper  (only needed if virtualenvwrapper wasn't already installed)
$ mkvirtualenv vendordata
$ pip install -r requirements.txt

We need to configure the keystone WSGI middleware to authenticate against the right keystone service. There is a sample configuration file in git, but it's configured to work with an openstack-ansible all-in-one install that I set up for my private testing, which probably isn't what you're using:

[keystone_authtoken]
insecure = False
auth_plugin = password
auth_url = http://172.29.236.100:35357
auth_uri = http://172.29.236.100:5000
project_domain_id = default
user_domain_id = default
project_name = service
username = nova
password = 5dff06ac0c43685de108cc799300ba36dfaf29e4
region_name = RegionOne

Per the README file in the vendordata sample repository, you can test the vendordata server in a stand alone manner by generating a token manually from keystone:

$ curl -d @credentials.json -H "Content-Type: application/json" http://172.29.236.100:5000/v2.0/tokens > token.json
$ token=`cat token.json | python -c "import sys, json; print json.loads(sys.stdin.read())['access']['token']['id'];"`

We then include that token in a test request to the vendordata service:

curl -H "X-Auth-Token: $token" http://127.0.0.1:8888/

Configuring nova to use the external metadata service

Now we're ready to wire up the sample metadata service with nova. You do that by adding something like this to the nova.conf configuration file:

[api]
vendordata_providers=DynamicJSON
vendordata_dynamic_targets=testing@http://metadatathingie.example.com:8888

Where metadatathingie.example.com is the IP address or hostname of the server running the external metadata service. Now if we boot an instance like this:

nova boot --image 2f6e96ca-9f58-4832-9136-21ed6c1e3b1f --flavor tempest1 --nic net-name=public --config-drive true foo

We end up with a config drive which contains the information our external metadata service returned (in the example case, handy Carrie Fisher quotes):

# cat openstack/latest/vendor_data2.json | python -m json.tool
{
    "testing": {
        "carrie_says": "I really love the internet. They say chat-rooms are the trailer park of the internet but I find it amazing."
    }
}

Tags for this post: openstack nova metadata vendordata configdrive cloud-init
Related posts: One week of Nova Kilo specifications; Specs for Kilo; Juno nova mid-cycle meetup summary: nova-network to Neutron migration; Juno Nova PTL Candidacy; Juno nova mid-cycle meetup summary: scheduler; Juno nova mid-cycle meetup summary: ironic


Linux Users of Victoria (LUV) Announce: LUV Beginners February Meeting: Static websites with Jekyll, Hugo and Forestry

Wed, 2017-02-01 23:02
Start: Feb 25 2017 12:30
End: Feb 25 2017 16:30
Location: Infoxchange, 33 Elizabeth St. Richmond
Link: http://luv.asn.au/meetings/map

PLEASE NOTE CHANGE OF DATE THIS MONTH ONLY

Static websites with Jekyll, Hugo and Forestry

Andrew Pam will demonstrate a new way to make websites complete with content management that doesn't require software running on a web server. This technique enhances both performance and security.

The meeting will be held at Infoxchange, 33 Elizabeth St. Richmond 3121 (enter via the garage on Jonas St.) Late arrivals, please call (0421) 775 358 for access to the venue.

LUV would like to acknowledge Infoxchange for the venue.

Linux Users of Victoria Inc., is an incorporated association, registration number A0040056C.

February 25, 2017 - 12:30


Linux Users of Victoria (LUV) Announce: LUV Main February 2017 Meeting: OpenStack Barcelona Summit / Data Structures and Algorithms

Wed, 2017-02-01 23:02
Start: Feb 7 2017 18:30
End: Feb 7 2017 20:30
Location: 6th Floor, 200 Victoria St. Carlton VIC 3053
Link: http://luv.asn.au/meetings/map

Speakers:

• Lev Lafayette, OpenStack and the OpenStack Barcelona Summit
• Jacinta Richardson, Data Structures and Algorithms in the 21st Century

200 Victoria St. Carlton VIC 3053 (the EPA building)

Late arrivals needing access to the building and the sixth floor please call 0490 049 589.

Before and/or after each meeting those who are interested are welcome to join other members for dinner. We are open to suggestions for a good place to eat near our venue. Maria's on Peel Street in North Melbourne is currently the most popular place to eat after meetings.

LUV would like to acknowledge Red Hat for their help in obtaining the venue.

Linux Users of Victoria Inc. is an incorporated association, registration number A0040056C.

February 7, 2017 - 18:30


Michael Still: Giving serial devices meaningful names

Wed, 2017-02-01 09:00
This is a hack I've been using for ages, but I thought it deserved a write up.

I have USB serial devices. Lots of them. I use them for home automation things, as well as for talking to devices such as the console ports on switches and so forth. For the permanently installed serial devices, one of the challenges is having them show up in predictable places so that the scripts which know how to drive each device are talking to the right place.

For the trivial case, this is pretty easy with udev:

$ cat /etc/udev/rules.d/60-local.rules
KERNEL=="ttyUSB*", \
    ATTRS{idVendor}=="0403", ATTRS{idProduct}=="6001", \
    ATTRS{serial}=="A8003Ye7", \
    SYMLINK+="radish"

This says for any USB serial device that is discovered (either inserted post boot, or at boot), if the USB vendor ID, product ID and serial number match the relevant values, to symlink the device to "/dev/radish".

You find out the vendor and product ID from lsusb like this:

$ lsusb
Bus 003 Device 003: ID 0624:0201 Avocent Corp.
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 007 Device 002: ID 0665:5161 Cypress Semiconductor USB to Serial
Bus 007 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 006 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 005 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 004 Device 002: ID 0403:6001 Future Technology Devices International, Ltd FT232 Serial (UART) IC
Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Bus 009 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 008 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

You can play with inserting and removing the device to determine which of these entries is the device you care about.
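
Alternatively, udevadm can walk the attributes of an already-attached device for you; assuming the device is currently at /dev/ttyUSB0, something like this shows the values udev rules can match on:

$ udevadm info --attribute-walk --name=/dev/ttyUSB0 | grep -E 'idVendor|idProduct|serial'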

So that's great, until you have more than one device with the same USB serial vendor and product id. Then things are a bit more... difficult.

It turns out that you can have udev execute a command on device insert to help you determine what symlink to create. So for example, I have this entry in the rules on one of my machines:

KERNEL=="ttyUSB*", \ ATTRS{idVendor}=="067b", ATTRS{idProduct}=="2303", \ PROGRAM="/usr/bin/usbtest /dev/%k", \ SYMLINK+="%c"

This results in /usr/bin/usbtest being run with the path of the device file on its command line for every device detection (of a matching device). The stdout of that program is then used as the name of a symlink in /dev.

So, that script attempts to talk to the device and determine what it is -- in my case either a currentcost or a solar panel inverter.
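
As a rough sketch of the idea (this is not the actual usbtest script, and the probe command here is invented for illustration), such a helper just needs to print the desired symlink name on stdout:

#!/bin/bash
# Hypothetical sketch: probe the serial device handed to us by udev and
# print the symlink name for it on stdout.
DEVICE="$1"

# "query-device" stands in for whatever protocol poke identifies the hardware.
if query-device --currentcost "$DEVICE" >/dev/null 2>&1; then
    echo "currentcost"
else
    echo "inverter"
fi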

Tags for this post: linux udev serial usb usbserial
Related posts: SMART and USB storage; Video4Linux, ov511, and RGB24 palettes; ov511 hackery; Ubuntu, Dapper Drake, and that difficult Dell e310; Roomba serial cables; Via M10000, video, and a Belkin wireless USB thing


sthbrx - a POWER technical blog: NAMD on NVLink

Wed, 2017-02-01 08:32

NAMD is a molecular dynamics program that can use GPU acceleration to speed up its calculations. Recent OpenPOWER machines like the IBM Power Systems S822LC for High Performance Computing (Minsky) come with a new interconnect for GPUs called NVLink, which offers extremely high bandwidth to a number of very powerful Nvidia Pascal P100 GPUs. So they're ideal machines for this sort of workload.

Here's how to set up NAMD 2.12 on your Minsky, and how to debug some common issues. We've targeted this script for CentOS, but we've successfully compiled NAMD on Ubuntu as well.

Prerequisites

GPU Drivers and CUDA

Firstly, you'll need CUDA and the NVidia drivers.

You can install CUDA by following the instructions on NVidia's CUDA Downloads page.

yum install epel-release
yum install dkms
# download the rpm from the NVidia website
rpm -i cuda-repo-rhel7-8-0-local-ga2-8.0.54-1.ppc64le.rpm
yum clean expire-cache
yum install cuda  # this will take a while...

Then, we set up a profile file to automatically load CUDA into our path:

cat > /etc/profile.d/cuda_path.sh <<'EOF'
# From http://developer.download.nvidia.com/compute/cuda/8.0/secure/prod/docs/sidebar/CUDA_Quick_Start_Guide.pdf - 4.4.2.1
export PATH=/usr/local/cuda-8.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
EOF

Now, open a new terminal session and check to see if it works:

cuda-install-samples-8.0.sh ~
cd ~/NVIDIA_CUDA-8.0_Samples/1_Utilities/bandwidthTest
make && ./bandwidthTest

If you see a figure of ~32GB/s, that means NVLink is working as expected. A figure of ~7-8GB/s indicates that only PCI is working, and more debugging is required.

Compilers

You need a C++ compiler:

yum install gcc-c++

Building NAMD

Once CUDA and the compilers are installed, building NAMD is reasonably straightforward. The one hitch is that because we're using CUDA 8.0, and the NAMD build scripts assume CUDA 7.5, we need to supply an updated Linux-POWER.cuda file. (We also enable code generation for the Pascal in this file.)

We've documented the entire process as a script which you can download. We'd recommend executing the commands one by one, but if you're brave you can run the script directly.

The script will fetch NAMD 2.12 and build it for you, but won't install it. It will look for the CUDA override file in the directory you are running the script from, and will automatically move it into the correct place so it is picked up by the build system.

The script compiles for a single multicore machine setup, rather than for a cluster. However, it should be a good start for an Ethernet or Infiniband setup.

If you're doing things by hand, you may see some errors during the compilation of charm - as long as charm++ builds successfully at the end, you should be OK.

Testing NAMD

We have been testing NAMD using the STMV files available from the NAMD website:

cd NAMD_2.12_Source/Linux-POWER-g++
wget http://www.ks.uiuc.edu/Research/namd/utilities/stmv.tar.gz
tar -xf stmv.tar.gz
sudo ./charmrun +p80 ./namd2 +pemap 0-159:2 +idlepoll +commthread stmv/stmv.namd

This binds a namd worker thread to every second hardware thread. This is because hardware threads share resources, so using every hardware thread costs overhead and doesn't give us access to any more physical resources.

You should see messages about finding and using GPUs:

Pe 0 physical rank 0 binding to CUDA device 0 on <hostname>: 'Graphics Device' Mem: 4042MB Rev: 6.0

This should be significantly faster than on non-NVLink machines - we saw a gain of about 2x in speed going from a machine with Nvidia K80s to a Minsky. If things aren't faster for you, let us know!

Other notes

NAMD requires some libraries, some of which they supply as binary downloads on their website. Make sure you get the ppc64le versions, not the ppc64 versions, otherwise you'll get errors like:

/bin/ld: failed to merge target specific data of file .rootdir/tcl/lib/libtcl8.5.a(regfree.o)
/bin/ld: .rootdir/tcl/lib/libtcl8.5.a(regerror.o): compiled for a big endian system and target is little endian
/bin/ld: failed to merge target specific data of file .rootdir/tcl/lib/libtcl8.5.a(regerror.o)
/bin/ld: .rootdir/tcl/lib/libtcl8.5.a(tclAlloc.o): compiled for a big endian system and target is little endian

The script we supply should get these right automatically.
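
If you want to check a downloaded archive by hand, objdump can report the target format of its members; a quick sketch (the path here is just an example):

objdump -f .rootdir/tcl/lib/libtcl8.5.a | grep -o 'file format.*' | sort -u
# elf64-powerpcle is what you want; elf64-powerpc is the big endian version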

sthbrx - a POWER technical blog: linux.conf.au 2017 review

Tue, 2017-01-31 16:07

I recently attended LCA 2017, where I gave a talk at the Linux Kernel miniconf (run by fellow sthbrx blogger Andrew Donnellan!) and a talk at the main conference.

I received some really interesting feedback so I've taken the opportunity to write some of it down to complement the talk videos and slides that are online. (And to remind me to follow up on it!)

Miniconf talk: Sparse Warnings

My kernel miniconf talk was on sparse warnings (pdf slides, 23m video).

The abstract read (in part):

sparse is a semantic parser for C, and is one of the static analysis tools available to kernel devs.

Sparse is a powerful tool with good integration into the kernel build system. However, we suffer from warning overload - there are too many sparse warnings to spot the serious issues amongst the trivial. This makes it difficult to use, both for developers and maintainers.

Happily, I received some feedback that suggests it's not all doom and gloom like I had thought!

  • Dave Chinner told me that the xfs team uses sparse regularly to make sure that the file system is endian-safe. This is good news - we really would like that to be endian-safe!

  • Paul McKenney let me know that the 0day bot does do some sparse checking - it would just seem that it's not done on PowerPC.

Main talk: 400,000 Ephemeral Containers

My main talk was entitled "400,000 Ephemeral Containers: testing entire ecosystems with Docker". You can read the abstract for full details, but it boils down to:

What if you want to test how all the packages in a given ecosystem work in a given situation?

My main example was testing how many of the Ruby packages successfully install on Power, but I also talk about other languages and other cool tests you could run.

The 44m video is online. I haven't put the slides up yet but they should be available on GitHub soonish.

Unlike with the kernel talk, I didn't catch the names of most of the people with feedback.

Docker memory issues

One of the questions I received during the talk was about running into memory issues in Docker. I attempted to answer that during the Q&A. The person who asked the question then had a chat with me afterwards, and it turns out I had completely misunderstood the question. I thought it was about memory usage of running containers in parallel. It was actually about memory usage in the docker daemon when running lots of containers in serial. Apparently the docker daemon doesn't free memory during the life of the process, and the question was whether or not I had observed that during my runs.

I didn't have a good answer for this at the time other than "it worked for me", so I have gone back and looked at the docker daemon memory usage.

After a full Ruby run, the daemon is using about 13.9G of virtual memory, and 1.975G of resident memory. If I restart it, the memory usage drops to 1.6G of virtual and 43M of resident memory. So it would appear that the person asking the question was right, and I'm just not seeing it have an effect.
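
If you want to check this yourself, resident and virtual sizes are easy to pull out of ps; note that depending on your Docker version the daemon process may be named docker or dockerd, so adjust accordingly:

$ ps -o vsz,rss,comm -C dockerd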

Other interesting feedback
  • Someone was quite interested in testing on Sparc, once they got their Go runtime nailed down.

  • A Rackspacer was quite interested in Python testing for OpenStack - this has some intricacies around Py2/Py3, but we had an interesting discussion around just testing to see if packages that claim Py3 support provide Py3 support.

  • A large jobs site mentioned using this technique to help them migrate their dependencies between versions of Go.

  • I was 'gently encouraged' to try to do better with how long the process takes to run - if for no other reason than to avoid burning more coal. This is a fair point. I did not explain very well what I meant with diminishing returns in the talk: there's lots you could do to make the process faster, it just comes at the cost of the simplicity that I really wanted when I first started the project. I am working (on and off) on better ways to deal with this by considering the dependency graph.

Binh Nguyen: Linux BASH CLI RSS Reader, Explaining Prophets 4, and More

Tue, 2017-01-31 05:00
- built my own RSS feed reader yesterday. It actually took a lot less time than going out to search for one that suited my needs. It's based on someone else's code (credit given in the code, but since that code was so buggy that it wouldn't work, I guess it's mine now?)
https://sites.google.com/site/dtbnguyen/rssread-1.11.tar.gz
https://sites.google.com/site/dtbnguyen/ - code to extract from

Tim Serong: My Personal Travel Ban

Mon, 2017-01-30 21:04

I plan to avoid any and all travel to the USA for the foreseeable future due to the complete mess unfolding there with Trump’s executive orders banning immigration from some Muslim-majority countries, related protests, illegal detainment, etc. etc. (the list goes on, and I expect it to get longer).

It’s not that I’m from one of the blacklist countries, and I’m not a Muslim. I’m even white. But I no longer consider travel to the USA safe (especially bearing in mind my ridiculous beard and long hair), and even if I did, I’d want to stand in solidarity with the people who are currently being screwed. The notion of banning entire groups of people based on a single shared trait (in this case, probable adherence to a particular religion) is abhorrent; it demonizes our fellow humans, divides us and builds walls – whether metaphorical or physical – between our various communities. The fact that this immigration ban will impact refugees and asylum seekers just makes matters worse. I am deeply ashamed by Australia’s record on that front too, and concerned that our government will not do much better.

So I won’t be putting in any talks for Cephalocon - which is a damn shame, as I’m working on Ceph – or for any other US-based tech conference unless and until the situation over there changes.

I realise this post may not make much difference in the grander scheme of things, but one more voice is one more voice.

Michael Still: A pythonic example of recording metrics about ephemeral scripts with prometheus

Mon, 2017-01-30 21:00
In my previous post we talked about how to record information from short lived scripts (I call them ephemeral scripts by the way) with prometheus. The example there was a script which checked the SMART status of each of the disks in a machine and reported that via pushgateway. I now want to work through a slightly more complicated example.

I think you hit the limits of reporting simple values in shell scripts via curl requests fairly quickly. For example with the SMART monitoring script, SMART is capable of returning a whole heap of metrics about the performance of a disk, but we boiled that down to a single "health" value. This is largely because writing a parser for all the other values that smartctl returns would be inefficient and fragile in shell. So for this post, we're going to work through an example of how to report a variety of values from a python script. Those values could be the parsed output of smartctl, but to mix things up a bit, I'm going to use a different script I wrote recently.

This new script uses the Weather Underground API to lookup weather stations near my house, and then generate graphics of the weather forecast. These graphics are displayed on the various Cisco SIP phones I already had around the house. The forecasts look like this:

[images: weather forecast graphics as displayed on the SIP phones]

The script to generate these weather forecasts is relatively simple python, and you can see the source code on github.

My cunning plan here is to use prometheus' time series database and alert capabilities to drive home automation around my house. The first step for that is to start gathering some simple facts about the home environment so that we can do trending and decision making on them. The code to do this isn't all that complicated. First off, we need to add the python prometheus client to our python environment, which is hopefully a venv:

pip install prometheus_client
pip install six

That second dependency isn't a strict requirement for prometheus, but the script I'm working on needs it (because it needs to work out what's a text value, and python 3 is bonkers).

Next we import the prometheus client in our code and setup the counter registry. At the same time I record when the script was run:

from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

registry = CollectorRegistry()
Gauge('job_last_success_unixtime', 'Last time the weather job ran',
      registry=registry).set_to_current_time()

And then we just add gauges for any values we want to add to the pushgateway:

Gauge('_'.join(field), '', registry=registry).set(value)
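
To make that concrete, here's a sketch of the surrounding loop, continuing from the snippet above; the field tuples and values are invented for illustration rather than taken from the real script:

# Hypothetical readings; the real script parses these from the Weather
# Underground API response.
readings = [
    (('weather', 'temperature'), 23.5),
    (('weather', 'humidity'), 61.0),
]
for field, value in readings:
    # Builds metric names like weather_temperature from the field tuple.
    Gauge('_'.join(field), '', registry=registry).set(value)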

Finally, the values don't exist in the pushgateway until we actually push them there, which we do like this:

push_to_gateway('localhost:9091', job='weather', registry=registry)

You can see the entire patch I wrote to add prometheus support on github if you're interested in an example with more context.

Now we can have pretty graphs of temperature and stuff!

Tags for this post: prometheus monitoring python pushgateway
Related posts: Recording performance information from short lived processes with prometheus; Basic prometheus setup; Implementing SCP with paramiko; Mona Lisa Overdrive; Packet capture in python; mbot: new hotness in Google Talk bots


Francois Marier: Creating a home music server using mpd

Mon, 2017-01-30 17:21

I recently setup a music server on my home server using the Music Player Daemon, a cross-platform free software project which has been around for a long time.

Basic setup

Start by installing the server and the client package:

apt install mpd mpc

then open /etc/mpd.conf and set these:

music_directory   "/path/to/music/"
bind_to_address   "192.168.1.2"
bind_to_address   "/run/mpd/socket"
zeroconf_enabled  "yes"
password          "Password1"

before replacing the alsa output:

audio_output { type "alsa" name "My ALSA Device" }

with a pulseaudio one:

audio_output { type "pulse" name "Pulseaudio Output" }

In order for the automatic detection (zeroconf) of your music server to work, you need to prevent systemd from creating the network socket:

systemctl stop mpd.service
systemctl stop mpd.socket
systemctl disable mpd.socket

otherwise you'll see this in /var/log/mpd/mpd.log:

zeroconf: No global port, disabling zeroconf

Once all of that is in place, start the mpd daemon:

systemctl start mpd.service

and create an index of your music files:

MPD_HOST=Password1@/run/mpd/socket mpc update

while watching the logs to notice any files that the mpd user doesn't have access to:

tail -f /var/log/mpd/mpd.log

Enhancements

I also added the following in /etc/logcheck/ignore.server.d/local-mpd to silence unnecessary log messages in logcheck emails:

^\w{3} [ :0-9]{11} [._[:alnum:]-]+ systemd\[1\]: Started Music Player Daemon.$
^\w{3} [ :0-9]{11} [._[:alnum:]-]+ systemd\[1\]: Stopped Music Player Daemon.$
^\w{3} [ :0-9]{11} [._[:alnum:]-]+ systemd\[1\]: Stopping Music Player Daemon...$

and created a cronjob in /etc/cron.d/mpd-francois to update the database daily and stop the music automatically in the evening:

# Refresh DB once a day
5 1 * * * mpd MPD_HOST=Password1@/run/mpd/socket /usr/bin/mpc --quiet update
# Think of the neighbours
0 22 * * 0-4 mpd MPD_HOST=Password1@/run/mpd/socket /usr/bin/mpc --quiet stop
0 23 * * 5-6 mpd MPD_HOST=Password1@/run/mpd/socket /usr/bin/mpc --quiet stop

Clients

To let anybody on the local network connect, I opened port 6600 on the firewall (/etc/network/iptables.up.rules since I'm using Debian's iptables-apply):

-A INPUT -s 192.168.1.0/24 -p tcp --dport 6600 -j ACCEPT

Then I looked at the long list of clients on the mpd wiki.

Desktop

The official website suggests two clients which are available in Debian and Ubuntu:

  • gmpc
  • Ario

Both of them work well, but haven't had a release since 2011, even though there is some activity in 2013 and 2015 in their respective source control repositories.

Ario has a simpler user interface but gmpc has cover art download working out of the box, which is why I might stick with it.

In both cases, it is possible to configure a polipo proxy so that any external resources are fetched via Tor.
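
For example, polipo can hand requests off to Tor's default SOCKS port with a configuration along these lines (a sketch to verify against your polipo and Tor setup):

socksParentProxy = "localhost:9050"
socksProxyType = socks5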

Android

On Android, I got these two to work:

I picked M.A.L.P. since it includes a nice widget for the homescreen.

iOS

On iOS, these are the most promising clients I found:

since MPoD and MPaD don't appear to be available on the AppStore anymore.

sthbrx - a POWER technical blog: Extracting Early Boot Messages in QEMU

Mon, 2017-01-30 16:47

Be me, you're a kernel hacker, you make some changes to your kernel, you boot test it in QEMU, and it fails to boot. Even worse is the fact that it just hangs without any failure message, no stack trace, no nothing. "Now what?" you think to yourself.

You probably do the first thing you learnt in debugging101 and add abundant print statements all over the place to try and make some sense of what's happening and where it is that you're actually crashing. So you do this, you recompile your kernel, boot it in QEMU and lo and behold, nothing... What happened? You added all these shiny new print statements, where did the output go? The kernel still failed to boot (obviously), but where you were hoping to get some clue to go on, you were again left with an empty screen. "Maybe I didn't print early enough" or "maybe I got the code paths wrong" you think, "maybe I just need more prints" even. So let's delve a bit deeper: why didn't you see those prints, where did they go, and how can you get at them?

__log_buf

So what happens when you call printk()? Well what normally happens is, depending on the log level you set, the output is sent to the console or logged so you can see it in dmesg. But what happens if we haven't registered a console yet? Well then we can't print the message can we, so it's logged in a buffer, the kernel log buffer to be exact, helpfully named __log_buf.

Console Registration

So how come I eventually see print statements on my screen? Well at some point during the boot process a console is registered with the printk system, and any buffered output can now be displayed. On ppc it happens that this occurs in register_early_udbg_console() called in setup_arch() from start_kernel(), which is the generic kernel entry point. From this point forward when you print something it will be displayed on the console, but what if you crash before this? What are you supposed to do then?

Extracting Early Boot Messages in QEMU

And now the moment you've all been waiting for, how do I extract those early boot messages in QEMU if my kernel crashes before the console is registered? Well it's quite simple really, QEMU is nice enough to allow us to dump guest memory, and we know the log buffer is in there some where, so we just need to dump the correct part of memory which corresponds to the log buffer.

Locating __log_buf

Before we can dump the log buffer we need to know where it is. Luckily for us this is fairly simple, we just need to dump all the kernel symbols and look for the right one.

> nm vmlinux > tmp; grep __log_buf tmp
c000000000f5e3dc b __log_buf

We use the nm tool to list all the kernel symbols and output this into a temporary file, we can then grep this for the log buffer (which we know to be named __log_buf), and presto we are told that it's at kernel virtual address c000000000f5e3dc. On ppc64 the kernel's linear mapping starts at 0xc000000000000000, so that corresponds to physical address 0xf5e3dc, which is what we'll feed to QEMU below.

Dumping Guest Memory

It's then simply a case of dumping guest memory from the QEMU console. So first we press ^a+c to get us to the QEMU console, then we can use the aptly named dump-guest-memory.

> help dump-guest-memory
dump-guest-memory [-p] [-d] [-z|-l|-s] filename [begin length] -- dump guest memory into file 'filename'.
-p: do paging to get guest's memory mapping.
-d: return immediately (do not wait for completion).
-z: dump in kdump-compressed format, with zlib compression.
-l: dump in kdump-compressed format, with lzo compression.
-s: dump in kdump-compressed format, with snappy compression.
begin: the starting physical address.
length: the memory size, in bytes.

We just give it a filename for where we want our output to go, we know the starting address, we just don't know the length. We could choose some arbitrary length, but inspection of the kernel code shows us that:

#define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT)
static char __log_buf[__LOG_BUF_LEN] __aligned(LOG_ALIGN);

Looking at the pseries_defconfig file shows us that LOG_BUF_SHIFT is set to 18, and thus we know that the buffer is 2^18 bytes, or 256KB (262144 bytes). So now we run:

> dump-guest-memory tmp 0xf5e3dc 262144

And we now get our log buffer in the file tmp. This can simply be viewed with:

> hexdump -C tmp

This gives a readable, if poorly formatted output. I'm sure you can find something better but I'll leave that as an exercise for the reader.
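
One simple option (my suggestion, not from the original post) is strings, which pulls just the printable log text out of the dump:

> strings tmp | less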

Conclusion

So if like me your kernel hangs somewhere early in the boot process and you're left without your console output you are now fully equipped to extract the log buffer in QEMU and hopefully therein lies the answer to why you failed to boot.

Chris Smart: Git hook to help with OpenStack development

Mon, 2017-01-30 15:02

I wrote a small Git hook which may be useful in helping OpenStack devs run tests (and any script they like) before a commit is made (see Superuser magazine article).

This way we can save everyone time in the review process by fixing simple issues before they break in the check-pipeline.

Installation is easy (see the GitHub page) and all prompts default to no, so that the dev can easily just hit Enter to skip and continue (but still be reminded).
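
For a flavour of the approach, a pre-commit hook along these lines prompts before running a test environment and defaults to skipping; this is a simplified sketch, not the actual hook from the repository:

#!/bin/bash
# Simplified sketch of a .git/hooks/pre-commit that offers to run tests.
# Hooks don't get a terminal on stdin, so read the answer from /dev/tty.
read -r -p "Run pep8 tests before committing? [y/N] " answer < /dev/tty
if [ "$answer" = "y" ]; then
    tox -e pep8 || exit 1    # a non-zero exit aborts the commit
fi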

sthbrx - a POWER technical blog: Installing CentOS 7.2 on IBM Power Systems S822LC for High Performance Computing (Minsky) with USB device

Mon, 2017-01-30 08:54
Introduction

If you are installing Linux on your IBM Power Systems S822LC server, the instructions in this article will help you to start and run your system. These instructions are specific to installing CentOS 7 on an IBM Power System S822LC for High Performance Computing (Minsky), but also work for RHEL 7 - just swap CentOS for RHEL.

Prerequisites

Before you power on the system, ensure that you have the following items:

  • Ethernet cables;
  • USB storage device of 7G or greater;
  • An installed ethernet network with a DHCP server;
  • Access to the DHCP server's logs;
  • Power cords and outlet for your system;
  • PC or notebook that has IPMItool level 1.8.15 or greater; and
  • a VNC client.

Download the CentOS ISO file from the CentOS mirror. Select the "Everything" ISO file.

Note: You must use the 1611 release (dated 2016-12-22) or later due to Linux Kernel support for the server hardware.

Step 1: Preparing to power on your system

Follow these steps to prepare your system:

  1. If your system belongs in a rack, install your system into that rack. For instructions, see IBM POWER8 Systems information.
  2. Connect an Ethernet cable to the left embedded Ethernet port next to the serial port on the back of your system and the other end to your network. This Ethernet port is used for the BMC/IPMI interface.
  3. Connect another Ethernet cable to the right Ethernet port for network connection for the operating system.
  4. Connect the power cords to the system and plug them into the outlets.

At this point, your firmware is booting.

Step 2: Determining the BMC firmware IP address

To determine the IP address of the BMC, examine the latest DHCP server logs for the network connected to the server. The IP address will be requested approximately 2 minutes after being powered on.
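
For example, on a DHCP server that logs to syslog (the log location is an assumption; yours may differ), something like this will show the new lease:

grep -i dhcpack /var/log/syslog | tail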

It is possible to set the BMC to a static IP address by following the IBM documentation on IPMI.
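
If you prefer to set it directly with ipmitool, the commands look something like the following; the channel number (1) and the example addresses are assumptions to adjust for your environment:

ipmitool -I lanplus -H server_ip_address -U ipmi_user -P ipmi_password lan set 1 ipsrc static
ipmitool -I lanplus -H server_ip_address -U ipmi_user -P ipmi_password lan set 1 ipaddr 192.168.1.50
ipmitool -I lanplus -H server_ip_address -U ipmi_user -P ipmi_password lan set 1 netmask 255.255.255.0
ipmitool -I lanplus -H server_ip_address -U ipmi_user -P ipmi_password lan set 1 defgw ipaddr 192.168.1.1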

Step 3: Connecting to the BMC firmware with IPMItool

After you have a network connection set up for your BMC firmware, you can connect using Intelligent Platform Management Interface (IPMI). IPMI is the default console to use when connecting to the Open Power Abstraction Layer (OPAL) firmware.

The default authentication for servers over IPMI is:

  • Default user: ADMIN
  • Default password: admin

To power on your server from a PC or notebook that is running Linux®, follow these steps:

Open a terminal program on your PC or notebook and activate Serial-Over-LAN using IPMI, as described below. Use the other commands here as needed.

For the following ipmitool commands, server_ip_address is the IP address of the BMC from Step 2, and ipmi_user and ipmi_password are the default user ID and password for IPMI.

Power On using IPMI

If your server is not powered on, run the following command to power the server on:

ipmitool -I lanplus -H server_ip_address -U ipmi_user -P ipmi_password chassis power on

Activate Serial-Over-Lan using IPMI

Activate your IPMI console by running this command:

ipmitool -I lanplus -H server_ip_address -U ipmi_user -P ipmi_password sol activate

After powering on your system, the Petitboot interface loads. If you do not interrupt the boot process by pressing any key within 10 seconds, Petitboot automatically boots the first option. At this point the IPMI console will be connected to the operating system's serial console. If you get to this stage accidentally, you can deactivate and reboot as per the following two commands.

Deactivate Serial-Over-Lan using IPMI

If you need to power off or reboot your system, deactivate the console by running this command:

ipmitool -I lanplus -H server_ip_address -U ipmi_user -P ipmi_password sol deactivate

Reboot using IPMI

If you need to reboot the system, run this command:

ipmitool -I lanplus -H server_ip_address -U ipmi_user -P ipmi_password chassis power reset

Step 4: Creating a USB device and booting

At this point, your IPMI console should show the Petitboot bootloader menu, and you are ready to install CentOS 7 on your server.

Use one of the following USB devices:

  • USB attached DVD player with a single USB cable to stay under 1.0 Amps, or
  • 7 GB (or more) 2.0 (or later) USB flash drive.

Follow these instructions:

  1. To create the bootable USB device, follow the instructions in the CentOS wiki How to Set Up a USB to Install CentOS.
  2. Insert your bootable USB device into the front USB port. CentOS AltArch installer will automatically appear as a boot option on the Petitboot main screen. If the USB device does not appear select Rescan devices. If your device is not detected, you might have to try a different type.
  3. Arrow up to select the CentOS boot option. Press e (Edit) to open the Petitboot Option Editor window.
  4. Move the cursor to the Boot arguments section and include the following information: ro inst.stage2=hd:LABEL=CentOS_7_ppc64le:/ console=hvc0 ip=dhcp (if using RHEL, the LABEL will be similar to RHEL-7.3\x20Server.ppc64le:/)

Notes about the boot arguments:

  • ip=dhcp to ensure network is started for VNC installation.
  • console=hvc0 is needed as this is not the default.
  • inst.stage2 is needed as the boot process won't automatically find the stage2 install on the install disk.
  • append inst.proxy=URL where URL is the proxy URL if installing in a network that requires a proxy to connect externally.

You can find additional options at Anaconda Boot Options.

  5. Select OK to save your options and return to the Main menu.
  6. On the Petitboot main screen, select the CentOS AltArch option and then press Enter.
Step 5: Complete your installation

After you select to boot the CentOS installer, the installer wizard walks you through the steps.

  1. If the CentOS installer was able to obtain a network address via DHCP, it will present an option to enable the VNC. If no option is presented check your network cables.
  2. Select the Start VNC option and it will provide an OS server IP address. Note that this will be different to the BMC address previously obtained.
  3. Run a VNC client program on your PC or notebook and connect to the OS server IP address.

During the install over VNC, there are a couple of consoles active. To switch between them in the ipmitool terminal, press ctrl-b and then between 1-4 as indicated.

Using the VNC client program:

  1. Select "Install Destination"
  2. Select a device from "Local Standard Disks"
  3. Select "Full disk summary and boot device"
  4. Select the device again from "Selected Disks" with the Boot enabled
  5. Select "Do not install boot loader" from device. which results in .

Without disabling the boot loader, the installer complains about an invalid stage1 device. I suspect it needs a manual PReP partition of 10M to make the installer happy.

If you have a local CentOS repository you can set this by selecting "Install Source" - the directories at this URL should look like CentOS's Install Source for ppc64le.

Step 6: Before reboot and using the IPMI Serial-Over-LAN

Before reboot, generate the grub.cfg file as Petitboot uses this to generate its boot menu:

  1. Using ipmitool's shell (ctrl-b 2):
  2. Enter the following commands to generate a grub.cfg file:

chroot /mnt/sysimage
rm /etc/grub.d/30_os-prober
grub2-mkconfig -o /boot/grub2/grub.cfg
exit

/etc/grub.d/30_os-prober is removed as Petitboot probes the other devices anyway so including it would create lots of duplicate menu items.

The last step is to restart your system.

Note: While your system is restarting, remove the USB device.

After the system restarts, Petitboot displays the option to boot CentOS 7.2. Select this option and press Enter.

Conclusion

After you have booted CentOS, your server is ready to go!

Tim Serong: Random Test Subject

Mon, 2017-01-30 01:04

Almost every time I fly, it seems like I get pulled aside for the random explosives trace detection test. I always assumed it was because I usually look like a crazy mountain man (see photo). But, if you google around for “airport random explosives test”, you’ll find forum posts from security staff assuring everyone they’re not doing profiling, and even a helpful FAQ from Newcastle Airport (“Why are you always chosen for ‘explosive testing’?“) which says the process is “as the officer finishes screening one person, they are required to ask the next person walking through screening to undertake the ETD test”.

So maybe it’s just bad luck. Except possibly for that time at Hobart airport last week, where I was seeing off a colleague after linux.conf.au 2017. As far as I could tell, we were the only two people approaching security, and my colleague was in front. He was waved through to the regular security screening, and I was asked over for an explosives test, to which I replied “you’re most welcome to test me if you like, but I’m not actually going through security into departures”. The poor guy looked a bit nonplussed at this, then moved on to the next traveler who’d since appeared in line behind us.

What to do about this in future? Obviously, I need a new t-shirt, with text something like one of these:
[t-shirt design images]
If anyone else would like a t-shirt along these lines, the images above conveniently link to my Redbubble store. Or, if you’d rather DIY, there’s PNGs here, here and here (CC-BY-SA as usual, and no, they’re not broken, it’s white text on a transparent background).

Finally, for some real randomness, check out Keith Packard’s ChaosKey To Production presentation. I’m not actually affiliated with Keith, but the ChaosKey sure looks nifty.

Clinton Roy: South Coast Track Report

Sun, 2017-01-29 23:00

Please note this is a work in progress

I had previously stated my intention to walk the South Coast Track. I have now completed this walk and now want a space where I can collect all my thoughts.

Photos: Google Photos album

The sections I’m referring to here come straight from the guide book. Due to the walking weather and tides all being in our favour, we managed to do the walk in six days. We flew in late on the first day and did not finish section one of the walk; on the second day we finished section one and then completed sections two and three. On day three it was just the Ironbound range. On day four it was just section five. Day five we completed section six and the tiny section seven. Day six was section eight and day seven was Cockle Creek (TODO something’s not adding up here)

The hardest day, not surprisingly, was day three where we tackled the Ironbound range, 900m up, then down. The surprising bit was how easy the ascent was and how god damn hard the descent was. The guide book says there are three rest camps on the descent, with one just below the peak, a perfect spot for lunch. Either this camp is hidden (e.g. you have to look behind you) or it’s overgrown, as we all missed it. This meant we ended up skipping lunch and were slipping down the wet, muddy, awful descent side for hours. When we came across the mid rest camp stop, because we’d been walking for so long, everyone assumed we were at the lower camp stop and that we were therefore only an hour or so away from camp. Another three hours later or so we actually came across the lower camp site, and by that time all sense of proportion was lost and I was starting to get worried that somehow we’d gotten lost and were not on the right trail and that we’d run out of light. In the end I got into camp about an hour before sundown (approx eight) and B&R got in about half an hour before sundown. I was utterly exhausted, got some water, pitched the tent, collapsed in it and fell asleep. Woke up close to midnight, realised I hadn’t had any lunch or dinner, still wasn’t actually feeling hungry. I forced myself to eat a hot meal, then collapsed in bed again.

TODO: very easy to follow trail.
TODO: just about everything worked.
TODO: spork
TODO: solar panel
TODO: not eating properly
TODO: needing more warmth

I could not have asked for better walking companions, Richard and Bec.


Filed under: camping, Uncategorized

Julien Goodwin: Charging my ThinkPad X1 Gen4 using USB-C

Sun, 2017-01-29 01:02
As with many massive time-sucking rabbit holes in my life, this one starts with one of my silly ideas getting egged on by some of my colleagues in London (who know full well who they are), but for a nice change, this is something I can talk about.

I have a rather excessive number of laptops, at the moment my three main ones are a rather ancient Lenovo T430 (personal), a Lenovo X1 Gen4, and a Chromebook Pixel 2 (both work).

At the start of last year I had a T430s in place of the X1, and was planning on replacing both it and my personal ThinkPad mid-year. However both of those older laptops used Lenovo's long-held (back to the IBM days) barrel charger, which led to me having a heap of them in various locations at home and work, but all the newer machines switched to their newer rectangular "slim" style power connector and while adapters exist, I decided to go in a different direction.

One of the less-touted features of USB-C is USB-PD[1], which allows devices to be fed up to 100W of power, and can do so while using the port for data (or the other great feature of USB-C, alternate modes, such as DisplayPort, great for docks), which is starting to be used as a way to charge laptops, such as the Chromebook Pixel 2, various models of the Apple MacBook line, and more.

Instead of buying a heap of slim-style Lenovo chargers, or a load of adapters (which would inevitably disappear over time) I decided to bridge towards the future by making an adapter to allow me to charge slim-type ThinkPads (at least the smaller ones, not the portable workstations which demand 120W or more).

After doing some research on what USB-PD platforms were available at the time I settled on the TI TPS65986 chip, which, with only an external flash chip, would do all that I needed.

Devkits were ordered to experiment with, and prove the concept, which they did very quickly, so I started on building the circuit, since just reusing the devkit boards would lead to an adapter larger than would be sensible. As the TI chip is a many-pin BGA, and breaking it out on 2-layers would probably be too hard for my meager PCB design skills, I needed a 4-layer board, so I decided to use KiCad for the project.

It took me about a week of evenings to get the schematic fully sorted, with much of the time spent reading the chip datasheet, or digging through the devkit schematic to see what they did there for some cases that weren't clear, then almost a month for the actual PCB layout, with much of the time being sucked up learning a tool that was brand new to me, and also fairly obtuse.

By mid-June I had a PCB which should (but, spoiler, wouldn't) work, however as mentioned the TI chip is a 96-ball 3x3mm BGA, something I had no hope of manually placing for reflow, and of course, no hope of hand soldering, so I would need to get these manufactured commercially. Luckily there are several options for small scale assembly at very reasonable prices, and I decided to try a new company still (at the time of ordering) in closed trials, PCB.NG. They have a nice simple procedure to upload board files, and a slightly custom pick & place file that includes references to the exact component I want by Digikey part number. Best of all the pricing was completely reasonable, with a first test run of six boards only costing me US$30 each.

Late in June I received a mail from PCB.NG telling me that they'd built my boards, but that I had made a mistake with the footprint I'd used for the USB-C connector and they were posting my boards along with the connectors. As I'd had them ship the order to California (at the time they didn't seem to offer international shipping) it took a while for them to arrive in Sydney, courtesy of a coworker.

I tried to modify a connector by removing all through hole board locks, keeping just the surface mount pins, however I was unsuccessful, and that's where the project stalled until mid-October when I was in California myself, and was able to get help from a coworker who can perform miracles of surface mount soldering (while they were working on my board they were also dead-bug mounting a BGA). Sadly, while I now had a board I could test, it simply dropped off my priority list for months.

At the start of January another of my colleagues (a US-based teammate of the London rabble-rousers) asked for a status update, which prompted me to get off my butt and perform the testing. The next day I added some reinforcement to the connector which was only really held on by the surface mount pins, and was highly likely to rip off the board, so I covered it in epoxy. Then I knocked up some USB A plug/socket to bare wires test adapters using some stuff from the junk bin we have at the office maker space for just this sort of occasion (the socket was actually a front panel USB port from an old IBM x-series server). With some trepidation I plugged the board into my newly built & tested adapter, and powered the board from a lab supply set to limit current in case I had any shorts in the board. It all came up straight away, and even lit the LEDs I'd added for some user feedback.

Next was to load a firmware for the chip. I'd previously used TI's tool to create a firmware image, and after some messing around with the SPI flash programmer I'd purchased managed to get the board programmed. However the behaviour of the board didn't change with (what I thought was) real firmware, I used an oscilloscope to verify the flash was being read, and a twinkie to sniff the PD negotiation, which confirmed that no request for 20v was being sent. This was where I finished that day.

Over the weekend that followed I dug into what I'd seen and determined that either I'd killed the SPI MISO port (the programmer I used was 5v, not 3.3v), or I just had bad firmware and the chip had some good defaults. I created a new firmware image from scratch, and loaded that.

Sure enough it worked first try. Once I confirmed 20v was coming from the output ports I attached it to my recently acquired HP 6051A DC load where it happily sank 45W for a while, then I attached the cable part of a Lenovo barrel to slim adapter and plugged it into my X1 where it started charging right away.

At linux.conf.au last week I gave (part of) a hardware miniconf talk about USB-C & USB-PD, which open source hardware folk might be interested in. Over the last few days while visiting my dad down in Gippsland I made the edits to fix the footprint and sent a new rev to the manufacturer for some new experiments.

Of course at CES Lenovo announced that this years ThinkPads would feature USB-C ports and allow charging through them, and due to laziness I never got around to replacing my T430, so I'm planning to order a T470 as soon as they're available, making my adapter obsolete.





Rough timeline:
  • April 21st 2016, decide to start working on the project
  • April 28th, devkits arrive
  • May 8th, schematic largely complete, work starts on PCB layout
  • June 14th, order sent to CM
  • ~July 6th, CM ships order to me (to California, then hand carried to me by a coworker)
  • early August, boards arrive from California
  • Somewhere here I try, and fail, to reflow a modified connector onto a board
  • October 13th, California coworker helps to (successfully) reflow a USB-C connector onto a board for testing
  • January 6th 2017, finally got around to reinforce the connector with epoxy and started testing, try loading firmware but no dice
  • January 10th, redo firmware, it works, test on DC load, then modify a ThinkPad-slim adapter and test on a real ThinkPad
  • January 25/26th, fixed USB-C connector footprint, made one more minor tweak, sent order for rev2 to CM, then some back & forth over some tolerance issues they're now stricter on.


1: There's a previous variant of USB-PD that works on the older A/B connector, but, as far as I'm aware, was never implemented in any notable products.

Michael Still: Recording performance information from short lived processes with prometheus

Sat, 2017-01-28 17:00
Now that I'm recording basic statistics about the behavior of my machines, I now want to start tracking some statistics from various scripts I have lying around in cron jobs. In order to make myself sound smarter, I'm going to call these short lived scripts "ephemeral scripts" throughout this document. You're welcome.

The promethean way of doing this is to have a relay process. Prometheus really wants to know where to find web servers to learn things from, and my ephemeral scripts are both not permanently around and also not running web servers. Luckily, prometheus has a thing called the pushgateway which is designed to handle this situation. I can run just one of these, and then have all my little scripts just tell it things to add to its metrics. Then prometheus regularly scrapes this one process and learns things about those scripts. It's like a game of Telephone, but for processes really.

First off, let's get the pushgateway running. This is basically the same as the node_exporter from last time:

$ wget https://github.com/prometheus/pushgateway/releases/download/v0.3.1/pushgateway-0.3.1.linux-386.tar.gz
$ tar xvzf pushgateway-0.3.1.linux-386.tar.gz
$ cd pushgateway-0.3.1.linux-386
$ ./pushgateway

Let's assume once again that we're all adults and did something nicer than that involving configuration management and init scripts.

The pushgateway implements a relatively simple HTTP protocol to add values to the metrics that it reports. Note that once set, values won't change until you change them again -- they're not garbage collected or aged out or anything fancy. Here's a trivial example of adding a value to the pushgateway:

echo "some_metric 3.14" | curl --data-binary @- http://pushgateway.example.org:9091/metrics/job/some_job

This is stolen straight from the pushgateway README of course. The above command will have the pushgateway start to report a metric called "some_metric" with the value "3.14", for a job called "some_job". In other words, we'll get this in the pushgateway metrics URL:

# TYPE some_metric untyped
some_metric{instance="",job="some_job"} 3.14

You can see that this isn't perfect because the metric is untyped (what types exist? we haven't covered that yet!), and has these confusing instance and job labels. One tangent at a time, so let's explain instances and jobs first.

On jobs and instances

Prometheus is built for a universe a little bit unlike my home lab. Specifically, it expects there to be groups of processes doing a thing instead of just one. It also doesn't really expect things like the pushgateway to be proxying your metrics for you, because the assumption is that every process runs its own metrics server. This leads to some warts, which I'll explain in a second. Let's start by explaining jobs and instances.

For a moment, assume that we're running the world's most popular wordpress site. The basic architecture for our site is web frontends which run wordpress, and database servers which store the content that wordpress is going to render. When we first started our site it was all easy, as both could live on the same machine or cloud instance. As we grew, we were first forced to split the frontend and the database onto separate instances, and then forced to scale those two independently -- perhaps we have reasonable database performance, so we ended up with more web frontends than database servers.

So, we go from something like this:



To an architecture which looks a bit like this:



Now, in prometheus (i.e. google) terms, there are three jobs here. We have web frontends, database masters (the top one which is getting all the writes), and database slaves (the bottom one which everyone is reading from). For one of the jobs, the frontends, there is more than one instance of the job. To put that into pictures:



So, the topmost frontend job would be job="fe" and instance="0". Google also had a cool way to look up jobs and instances via DNS, but that's a story for another day.

To harp on a point here, all of these processes would be running a web server exporting metrics in google land -- that means that prometheus would know it's monitoring a frontend job because it would be listed in the configuration file as such. You can see this in the configuration file from the previous post. Here's the relevant snippet again:

- job_name: 'node'
  static_configs:
    - targets: ['molokai:9100', 'dell:9100', 'eeebox:9100']

The job "node" runs on three targets (instances), named "molokai:9100", "dell:9100", and "eeebox:9100".

However, we live in the ghetto for these ephemeral scripts and want to use the pushgateway for more than one such script, so we have to tell lies via the pushgateway. So for my simple ephemeral scripts, we'll tell the pushgateway that the job is the script name and the instance can be an empty string. If we don't do that, then prometheus will think the metric relates to the pushgateway process itself, instead of the ephemeral process.

We tell the pushgateway what job and instance to use like this:

echo "some_metric 3.14" | curl --data-binary @- http://localhost:9091/metrics/job/frontend/instance/0

Now we'll get this at the metrics URL:

# TYPE some_metric untyped
some_metric{instance="",job="some_job"} 3.14
some_metric{instance="0",job="frontend"} 3.14

The first metric there is from our previous attempt (remember when I said that values are never cleared out?), and the second one is from our second attempt. To clear out values you'll generally need to restart the pushgateway process. For simple ephemeral scripts, I think it's ok to leave the instance empty and just set a job name -- as long as that job name is globally unique.
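
As an aside, depending on your pushgateway version you might not need a full restart: there's an HTTP DELETE endpoint for removing a whole group of metrics. A sketch against the same local pushgateway (check the README for your version before relying on this):

curl -X DELETE http://localhost:9091/metrics/job/frontend/instance/0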

We also need to tell prometheus to believe our lies about the job and instance for things reported by the pushgateway. The scrape configuration for the pushgateway therefore ends up looking like this:

- job_name: 'pushgateway'
  honor_labels: true
  static_configs:
    - targets: ['molokai:9091']

Note the honor_labels there; that's the believing-the-lies bit.

There is one thing to remember here before we can move on: job names from our reporting are blindly trusted, so it's now up to us to keep them unique. If we export a metric on every machine, we might want to keep the job name specific to the machine, as in the sketch below. That said, it really depends on what you're trying to do -- just pay attention when picking job and instance names.
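
As a concrete sketch, one way to keep job names machine-specific is to bake the short hostname into the job name when pushing -- this is a convention of mine, not something the pushgateway mandates:

# Strip the domain, then push under a per-machine job name.
hostname=`hostname | cut -f 1 -d "."`
echo "some_metric 3.14" | curl --data-binary @- http://localhost:9091/metrics/job/backup_$hostname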

On metric types

Prometheus supports a few different types for the metrics which are exported. We'll discuss two of them properly now, and save the third for a later post. The types are:

  • Gauge: a value which goes up and down over time, like the fuel gauge in your car. Non-motoring examples would include the amount of free disk space on a given partition, the amount of CPU in use, and so forth.
  • Counter: a value which always increases. This might be something like the number of bytes sent by a network card -- the value only resets when the network card is reset (probably by a reboot). These only-increasing types are valuable because it's easier to do maths on them in the monitoring system.
  • Histograms: a set of values broken into buckets. For example, the response time for a given web page would probably be reported as a histogram. We'll discuss histograms in more detail in a later post.


I don't want to dig too deeply into the metric types right now, apart from noting that our previous examples didn't specify a type for the metrics being provided, and that this is undesirable. For now we just need to decide if the value goes up and down (a gauge) or only up (a counter). You can read more about prometheus types at https://prometheus.io/docs/concepts/metric_types/ if you want to.

A typed example

So now we can go back and do the same thing as before, but we can do it with typing like adults would. Let's assume that the value of pi is a gauge, and goes up and down depending on the vagaries of space time. Let's also show that we can add a second metric at the same time because we're fancy like that. We'd therefore need to end up doing something like (again heavily based on the contents of the README):

cat <<EOF | curl --data-binary @- http://pushgateway.example.org:9091/metrics/job/frontend/instance/0
# TYPE some_metric gauge
# HELP some_metric approximate value of pi in the current space time continuum
some_metric 3.14
# TYPE another_metric counter
# HELP another_metric Just an example.
another_metric 2398
EOF

And we'd end up with values like this in the pushgateway metrics URL:

# HELP some_metric approximate value of pi in the current space time continuum
# TYPE some_metric gauge
some_metric{instance="0",job="frontend"} 3.14
# HELP another_metric Just an example.
# TYPE another_metric counter
another_metric{instance="0",job="frontend"} 2398

A tangible example

So that's a lot of talking. Let's deploy this in my home lab for something actually useful. The node_exporter does not report any SMART health details for disks, and that's probably a thing I'd want to alert on. So I wrote this simple script:

#!/bin/bash

# Strip the domain from the hostname, we only want the short name.
hostname=`hostname | cut -f 1 -d "."`

for disk in /dev/sd[a-z]
do
  disk=`basename $disk`

  # Is this a USB thumb drive? smartctl can't assess those, so report
  # them as healthy rather than failing forever.
  if [ `/usr/sbin/smartctl -H /dev/$disk | grep -c "Unknown USB bridge"` -gt 0 ]
  then
    result=1
  else
    result=`/usr/sbin/smartctl -H /dev/$disk | grep -c "overall-health self-assessment test result: PASSED"`
  fi

  # Push one gauge per disk, with the machine's short name as the job
  # and the disk as the instance.
  cat <<EOF | curl --data-binary @- http://localhost:9091/metrics/job/$hostname/instance/$disk
# TYPE smart_health_passed gauge
# HELP smart_health_passed whether or not a disk passed a "smartctl -H /dev/sdX"
smart_health_passed $result
EOF
done

Now, that's not perfect and I am sure I'll re-write this in python later, but it is actually quite useful already. It will report if a SMART health check failed, and now I could write an alerting rule which looks for disks with a health value of 0 and sends me an email to go to the hard disk shop. Once your pushgateways are being scraped by prometheus, you'll end up with something like this in the console:



I'll explain how to turn this into alerting later.
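
In the meantime, for the impatient, here's a rough sketch of what such a rule might look like in the prometheus 1.x rule language -- the rule name and the 10 minute window are my own invention, so treat this as a starting point rather than the final rule:

ALERT SmartHealthCheckFailed
  IF smart_health_passed == 0
  FOR 10m
  ANNOTATIONS {
    summary = "SMART health check failing on {{ $labels.job }} ({{ $labels.instance }})"
  }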

Tags for this post: prometheus monitoring ephemeral_script pushgateway
Related posts: A pythonic example of recording metrics about ephemeral scripts with prometheus; Basic prometheus setup; Mona Lisa Overdrive; The Diamond Age ; Buying Time; The System of the World


Michael Still: Basic prometheus setup

Fri, 2017-01-27 17:00
I've been playing with prometheus for monitoring. It feels quite familiar to me because it's based on an internal google technology called borgmon, but I suspect that means it feels really weird to everyone else.

The first thing to realize is that everything at google is a web server. Your short lived tool that copies some files around probably runs a web server. All of these web servers have built in URLs which report the progress and status of the task at hand. Prometheus is built to: scrape those web servers; aggregate the data; store the data into a time series database; and then perform dashboarding, trending and alerting on that data.

The most basic example is to just export metrics for each machine on my home network. This is the easiest first step, because we don't need to build any software to do this. First off, let's install node_exporter on each machine. node_exporter is the tool which runs a web server to export metrics for each node. Everything in prometheus land is written in go, which is new to me. However, it does make running node exporter easy -- just grab the relevant binary from https://prometheus.io/download/, untar, and run. Let's do it in a command line script example thing:

$ wget https://github.com/prometheus/node_exporter/releases/download/v0.14.0-rc.1/node_exporter-0.14.0-rc.1.linux-386.tar.gz
$ tar xvzf node_exporter-0.14.0-rc.1.linux-386.tar.gz
$ cd node_exporter-0.14.0-rc.1.linux-386
$ ./node_exporter

That's all it takes to run the node_exporter. This runs a web server at port 9100, which exposes the following metrics:

$ curl -s http://localhost:9100/metrics | grep filesystem_free | grep 'mountpoint="/data"'
node_filesystem_free{device="/dev/mapper/raidvg-srvlv",fstype="xfs",mountpoint="/data"} 6.811044864e+11

Here you can see that the system I'm running on is exporting a filesystem_free value for the filesystem mounted at /data. There's a lot more than that exported, and I'd encourage you to poke around at that URL a little before continuing on.
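
One quick way to do that poking is to skim the HELP lines, which list each exported metric along with a one line description (this assumes you're on the machine running the node_exporter):

$ curl -s http://localhost:9100/metrics | grep '^# HELP' | head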

So that's lovely, but we really want to record those values over time. So let's assume that you have one of those running on each of your machines, and that you have it set up to start on boot. I'll leave the details of that out of this post, but let's just say I used my existing puppet infrastructure.

Now we need the central process which collects and records the values. That's the actual prometheus binary. Installation is again trivial:

$ wget https://github.com/prometheus/prometheus/releases/download/v1.5.0/prometheus-1.5.0.linux-386.tar.gz
$ tar xvzf prometheus-1.5.0.linux-386.tar.gz
$ cd prometheus-1.5.0.linux-386

Now we need to move some things around to install this nicely. I did the puppet equivalent of:

  • Moving the prometheus file to /usr/bin
  • Creating an /etc/prometheus directory and moving console_libraries and consoles into it
  • Creating a /etc/prometheus/prometheus.yml config file, more on the contents of this one in a second
  • And creating an empty data directory, in my case at /data/prometheus


The config file needs to list all of your machines. I am sure this could be generated with puppet templating or something like that, but for now here's my simple hard coded one:

# my global config
global:
  scrape_interval: 15s     # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: 'stillhq'

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first.rules"
  # - "second.rules"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['molokai:9090']

  - job_name: 'node'
    static_configs:
      - targets: ['molokai:9100', 'dell:9100', 'eeebox:9100']

Here you can see that I want to scrape each of my web servers which exports metrics every 15 seconds, and I also want to calculate values (such as firing alerts) every 15 seconds too. This might not scale if you have bajillions of processes or machines to monitor. I also label all of my values as coming from my domain, so that if I ever aggregate these values with another prometheus from somewhere else the origin will be clear.

The other interesting bit for now is the scrape configuration. This lists the metrics exporters to monitor. In this case it's prometheus itself (molokai:9090), and then each of my machines in the home lab (molokai, dell, and eeebox -- all on port 9100). Remember, port 9090 is the prometheus binary itself and port 9100 is that node_exporter binary we now have running on all of our machines.

Now if we start prometheus, it will do its thing. There is some configuration which needs to be passed on the command line (instead of in the configuration file), so my command line looks like this:

/usr/bin/prometheus -config.file=/etc/prometheus/prometheus.yml \
    -web.console.libraries=/etc/prometheus/console_libraries \
    -web.console.templates=/etc/prometheus/consoles \
    -storage.local.path=/data/prometheus

Prometheus also presents an interactive user interface on port 9090, which is handy. Here's an example of it graphing the load average on each of my machines (it was something which caused a nice jaggy line):



You can see here that the user interface has a drop down for selecting values that are known, and that the key at the bottom tells you things about each time series in the graph. So for example, if we added {instance="eeebox:9100"} to the end of the value in the text box at the top, then we'd be filtering for values with that label set, and would as a result only show one value in the graph (the one for eeebox).
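
For example, node_load1 is one of the load average metrics the node_exporter exposes, so the full filtered expression would be:

node_load1{instance="eeebox:9100"}

Conversely, node_load1{job="node"} would keep all three machines in the graph.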

If you're interested in very simple dashboarding of basic system metrics, that's actually all you need to do. In my next post about prometheus I'm going to show how to write your own binary which exports values to be graphed. In my case, the temperature outside my house.

Tags for this post: prometheus monitoring node_exporter
Related posts: Recording performance information from short lived processes with prometheus; A pythonic example of recording metrics about ephemeral scripts with prometheus; Mona Lisa Overdrive; The Diamond Age ; Buying Time; The System of the World
