Planet Linux Australia
Had the first orchestral rehearsal with the orchestra for the Berlioz last night.
At one point, the hairs on my neck stood up and my knees began to melt. A truly special musical moment.
Any folk in Melbourne who miss the performance tonight in the Town Hall are going you feel sorry fit themselves. (More details about tickets at http://www.miv.aicsa.org.au/
Anyhow, off to the dress rehearsal.
I’ve been back in Perth for about a month now and am only starting to re-adjust to it. After having travelled around the world, I still feel very much like a visitor in town (admittedly, one who knows how to get around)
I’ve been extremely lax about writing up my international adventures. It was hard during the trip due to constantly being on the move with sucky internet access and a case of writers block. I’ve decided to write up some of the international sections as vignettes so based on requests I’ll write something about those cities first. In order, Vancouver, San Francisco, Chicago, Grand Rapids, Boston, New York, London, Amsterdam (Haarlem), Oslo and Hong Kong (with a bonus Brisbane and Sydney in there too)
I still have to complete my other journey that requires leaving a very familiar place after 19 years, but I suspect that will happen in the new year. (ask me in person and I’ll be less cryptic)
Anyhow, comments and requests welcome.
- From toilet to table, overcoming the ‘yuk’ factor
- Snowden’s Leaks Didn’t Help Terrorists
- Italy: High Court shoots down Microsoft Windows tax as “a commercial policy of forced distribution” http://t.co/0oIpqpkjTz 20:32:10, 2014-09-18
- How to Pull an All-Nighter: What the sleep doctors won’t tell you http://t.co/5cxCbkOmv6 16:33:03, 2014-09-18
- New website fills healthy food ratings vacuum
- Steve Jobs was a low-tech parent http://t.co/dzXEi6HjMS 18:27:00, 2014-09-17
- World hunger eases, but 805 million still chronically undernourished, report from food agencies finds
- Sea level rises will cost Australia billions: report
- The first letter of your surname apparently can influence how you respond to opportunities.
- Find the Perfect Sleep Position http://t.co/W5xj2k0r8q http://t.co/jGupuv52uf 14:19:12, 2014-09-16
- Foil chip packets turned into life-saving water purifier
- Plastic pollution choking Australian waters: study
- John Howard questions Coalition’s two royal commissions
Taking Zoe bowling has been on my to do list for a while, and yesterday the schedule was pretty open, so I thought I'd try doing it first thing in the morning.
Who would have thought that a bowling alley would be booked solid with senior citizens from 9am onwards? I certainly didn't. So I was a bit disappointed when we rocked up at 9:30am and were told that there weren't any lanes available until 11am.
We stuck around and watched for a little bit anyway, so I could explain stuff to Zoe. There were three lanes that weren't in use the entire time we were there, so I don't know what was up with that.
We had to be home at 11am for a vet visit for Smudge, and so after a brief stop off at the Garage to grab some tomatoes for lunch, we got home just as the vet arrived.
Dr Anthony is the most lovely and charismatic vet you could ever hope to have in your home, and Zoe took to him straight away. She wanted to show him all sorts of random stuff, and had to show him her bedroom before he left. It was very cute, and he was very nice and played along.
I'd asked him to come out to take a look at Smudge because I'd noticed excessive fur missing on the insides of her back legs. Turns out she's just over-grooming, which is probably a stress reaction. Not sure if it's related to reducing her food portions or maybe some random construction in the neighbourhood. The prescription is some Feliway, which like everything else in Australia is exorbitantly expensive. $11.99 on Amazon, $56.70 at Petbarn. Absolutely outrageous.
After lunch, we returned to the bowling alley for our bowling game. Zoe found a 6 pound ball a bit heavy to swing, and didn't really have enough oompf to get it down the lane. We ended up with quite a few balls stuck in the gutter. She did much better with the ball ramp.
I was disappointed that the technician couldn't take us out the back to see the behind the scenes workings, as I thought Zoe would have liked seeing that. I also forgot to get a printout of our score.
She had a great time "playing" on all the car racing arcade games, and we had a game of air hockey, which she was a good height for playing before returning home.
I had a few things that I had to get done, so I let Zoe watch some TV while I did them, and then Sarah arrived to pick her up. I then spent the rest of the night furiously trying to finish off all the written exams I needed to do for my weekend rock climbing course.
OpenStack is Open Source software for building public and private clouds through a series of Infrastructure as a Service building blocks. OpenStack offers virtualized infrastructure -- compute, networking and storage -- as well as orchestration and management tools. Built with the support of a large number of organizations, many of whom sponsor linux.conf.au, there are now thousands of installations around the world. You quite possibly know someone running OpenStack right now.
This year's linux.conf.au miniconf is going to try taking a new approach -- we're going to cover the issues that are important to a System Admin in deploying OpenStack into their environment, while attempting to tell the story of a hypothetical OpenStack deployment from end to end. How does OpenStack integrate with your existing LDAP or Active Directory? What choices need to be made about how to configure storage on your compute nodes? How does scaling your object storage work? What are the networking options you might like to consider? What hypervisor is the right choice for your needs?
We'll also cover the existing configuration management options, including puppet, chef and HP's deployment system tripleo.
The focus of this year's miniconf is explicitly on the deployers of OpenStack, rather than the developers of it. We won't cover developer-centric issues like the latest tweaks to our CI system, or what the state of development is with the Kilo release. We pinky swear there will be no talks on the governance of the OpenStack Foundation.
So, are you interested in deploying cloud infrastructure at your organization? If so, the OpenStack miniconf is the event for you. Also, we'll have stickers. Just sayin'.
People often complain about the quality of open source project documentation. At the same time, documentation is a great place to get started contributing to an open source community.
This miniconf will explore practical aspects of Open Source documentation, with an eye to applying them right away.
We will look at:
- popular markup languages (Docbook, DITA, markdown, etc)
- version control systems for writers (SVN, git, etc)
- getting started as a contributor (how to pick a project, getting an account, meeting the community, your first commit, etc)
- documentation skills and methodologies (topic-based authoring, single sourcing, minimalism, etc)
We'll then be able to start contributing documentation. The pacing of this session will be largely driven by participant interest. It might be that we fly through the concepts straight to a frenetic docs hack fest. It might be that we get a lively argument about the best markup language, or whether minimalism is all hype.
Tim Hildred was a technical writer at Red Hat. Before that a barista at the Linuxcaffe in Toronto
In my last post I covered how I setup an OpenWRT build, to examine a small subset of indicators of security of the firmware.
In this follow-up post we will examine in detail the analysis results of one of the indicators: specifically, the RELRO flag.A first look – what are the defaults?
The analysis here is specific to the Barrier Breaker release of OpenWRT, but it should be noted that during experiments with the OpenWRT development trunk the results are much the same.
Before diving into RELRO, lets take a look at the overall default situation.
Here is the checksec report for the Carambola2 device (MIPS processor) build.
It is a sea of red…
The ‘run as root’ errors can be ignored: those programs are actually absolute symbolic links which do not resolve in the host system. Relative symbolic links resolve correctly but are filtered out of the analysis.
The x86 build paints a similar picture:
Note, the rest of this post describes how to modify OpenWRT to enable RELRO. There may be perfectly valid reasons to not enable the flag (for example, using RELRO may have a performance impact, and for a given system the adverse security risk may be judged low), so I have ensured that the suggested mitigation if applied remains a choice in the configuration menu of the system. For the moment my patch also retains backward compatibility by defaulting to off.Inside the OpenWRT build system
After a brief look at the build logs, the reason is obvious: the typical gcc linker command is missing the flags needed to enable RELRO: -Wl,-z,relro -Wl,-z,now (or the direct linker equivalents, -z relro -z now)
What could be done to address this?
OpenWRT provides a hook for appending to the global compiler CFLAGS but there is no similar hook for the linker stage. We could add those flags to the global CFLAGS and they can in fact flow through to the linker for many programs, but that would also be redundant as the flags are irrelevant to the compiler. In the end I decided I would modify the OpenWRT source to add a new global CONFIG option, which adds -Wl,-z,relro -Wl,-z,now to the global LDFLAGS instead.
The following patch achieves that (note, I have left out some of the help for brevity):diff --git a/rules.mk b/rules.mk index c9efb9e..e9c58d8 100644 --- a/rules.mk +++ b/rules.mk @@ -177,6 +177,10 @@ else endif endif +ifeq ($(CONFIG_SECURITY_USE_RELRO_EVERYWHERE),y) + TARGET_LDFLAGS+= -Wl,-z,relro -Wl,-z,now +endif + export PATH:=$(TARGET_PATH) export STAGING_DIR export SH_FUNC:=. $(INCLUDE_DIR)/shell.sh; diff --git a/toolchain/Config.in b/toolchain/Config.in index 7257f1d..964200d 100644 --- a/toolchain/Config.in +++ b/toolchain/Config.in @@ -38,6 +38,19 @@ menuconfig TARGET_OPTIONS Most people will answer N. + config SECURITY_USE_RELRO_EVERYWHERE + bool "Enable RELRO and NOW for binaries and libraries" if TARGET_OPTIONS + default n + help + Apply -z relro -z now flag to the linker stage for all ELF binaries and libraries. menuconfig EXTERNAL_TOOLCHAIN bool
Having attched OpenWRT, and enabled the new flag, lets rebuild everything again and run another checksec scan.
The results shown above are for x86, the picture is similar for the Carambola2 MIPS image.
The new results indicate that the RELRO flag is present on some binaries but not all of them. From this we can predict that some packages do not fully honour the global OpenWRT build system linker flags. I soon confirmed this**; the implication is that the new flag CONFIG_SECURITY_FORCE_RELRO is useful, however, a caveat in the Kconfig help is required. In particular, a statement to the effect that the efficacy depends on proper coding of OpenWRT packages (with ideally all packages maintained by the project being fixed to honour the flag.)
** For example: the package that builds libnl-tiny.so does not pass LDFLAGS through to the linker; this and some other base system packages needed patching to get complete coverage. it is likely that there are other packages that I did not have selected that may also need tweaking.
Another notable package is busybox. Busybox it turns out uses ld directly for linking, instead of indirectly via gcc, and thus requires the flags in the pure form -z relro -z now. (The busybox OpenWRT package Makefile also happens to treat the global TARGET_LDFLAGS differently from the TARGET_CFLAGS although I am unsure if this is a bug; but that turned out to be a red-herring.) Oddly, this solution worked for MIPS when I tried it previously, but is presently not successful for the x86 build, so further investigation is needed here; possibly I incorrectly noted the fix in previous experiments.Fun and Games with uClibc and busybox
The other recalcitrant is the uClibc library. I spent quite a bit time trying to work out why this was not working, especially having confirmed with verbose logging that the flags are being applied as expected. Along the way I learned that uClibc already has its own apply RELRO config item, which was already enabled. Even more oddly, RELRO is present on some uClibc libraries and not others, that as far as I could tell were being linked with identical linker flag sets.
After some digging I discovered hints of bugs related to RELRO in various versions of binutils, so I further patched OpenWRT to use the very latest binutils release. However that made no difference. At this point I took a big diversion and spent some time building the latest uClibc externally, where I discovered that it built fine using the native toolchain of Debian Wheezy (including a much older binutils!) After some discussion on the uClibc mailing list I have come to the conclusion that there may be a combination of problems, including the fact that uClibc in OpenWRT is a couple of years old (and additionally has a set of OpenWRT specific patches.) I could go further and patch OpenWRT to use the trunk uClibc but then I would have to work through refreshing the set of patches which I really don’t have time or inclination to do, so for the moment I have deferred working on resolving this conundrum. Eventually someone at OpenWRT may realise that uClibc has undergone a flurry of development in recent times and may bump to the more recent version.Comments
Along the way, I discovered that Debian actually runs security scans across all packages in the distribution – take a look at https://lintian.debian.org/tags/hardening-no-relro.html.
It is worth noting that whenever changing any build-related flag it is worth cleaning and rebuilding the toolchain as well as the target packages and kernel; I found without doing this, flag changes such as the RELRO flag don’t fully take effect as expected.
For maximum verboseness, run with make V=csw although I had to dig through the code to find this out.
I was going to repeat all the testing against a third target, another MIPS-based SOC the RALINK 3530 but at this point I don’t really have the time or inclination, I am sure the results will be quite similar. It would probably be useful to try with an ARM-based target as well.
I should also try repeating this experiment with MUSL, which is an alternative C library that OpenWRT can be built with.Conclusion
Out of the box, OpenWRT has very limited coverage of the RELRO security mitigation in a standard firmware build. By applying the suggested patches it is possible to bring OpenWRT up to a level of coverage, for RELRO, to that approaching a hardened Gentoo or Ubuntu distribution, with only a small subset of binaries missing the flag.References
My Github account includes the repository openwrt-barrier-breaker-hardening. The following branch include the completed series of patches mentioned above: owrt_analysis_relro_everywhere I hope it will remain possible to apply these changes against the official release for a while yet.
The patch that enables the latest binutils is not in that branch, but in this commit.
The battery at the end of that was at 121.8V, which is about 3.2V per cell, and the lowest cell was 3.16V. Watching the voltage as I was riding (taking quick glances) showed that even under load it wasn't dipping below 110V, so there's a good chance that most cells were still running just fine. It's possible that there's a cell with lower capacity, but I think as I ride it and the battery gets more chance to level out I'm actually improving its range.
Unfortunately the battery meter on the bike thinks that power is leaking out when it isn't, so that doesn't tell me much. The meter on the BMS read "0%" at 38Km (the last time I read it) and "0%" before I started charging it, so I have no idea whether that's being caused by a low cell or some other random error. Either way, it's still trial and error to see how much distance I can actually get out of the battery.
According to the meter on the wall socket, that was half a kilowatt-hour. According to my calculations, that was about 11 amps at 120 volts over four hours, so about five kilowatt-hours. Either the decimal point is wrong on the meter and it's an order of magnitude too low, I'm reading it wrong, or it's just plain incorrect. Five kilowatt-hours is in the right ball park. At $0.17 per kilowatt-hour I just paid 89 cents to fill up the bike. So it's about 1.2 cents per kilometre, at this rough guess.
The other amusing thing is that I'm having to get used to coming back to the bike to find people peering intently at it. Fluoro-clad workers, motorbike enthusiasts, general passers by - I get all types. There are still lots of people who walk on by, so I don't think I've really changed the planet. But it's still fun to explain it and to see people's different reactions, all without exception positive. That's pretty cool.
Steven Hanley: [mtb/events] The Surf Coast Century 2014 - Anglesea, Victoria - Surf Coast track and Otway ranges
Near Bells beach around 18km into the 100km run fullsize)
Earlier this year I competed in The North Face 100 km run in the Blue Mountains, it has a fairly large amount of climbing (over 4000 metres) and though it is remarkably pretty it is a tough day out. When a friend suggested doing the Surf Coast Centruy I thought why the heck not, less than half the climbing and I have never spent much time in this part of Victoria.
Report and photos are online. I was happy with my time, had a good day out and definately have to hand it to the locals, they live in a beautiful place. I also got a PB time for the distance by a fairly large amount (almost 3 hours faster than my TNF100 time)
Steven Hanley: [mtb/events] The Coastal Classic 2014 - Otford to Bundeena in the Royal National Park
Running along a beach 10km in to the event fullsize)
I had the opportunity to run in the Coastal Classic two weeks ago, Max Adventure do a great event here by getting access to the coastal walkiing track for this run and I recommend a visit to this track by anyone looking for a great coastal wlak or run.
Report and photos are online. It was a really pretty run and I had never been along the track so it was a great day out.
So, I’ve been looking around for a while (and a few times now) for any good resources that cover a bunch of MySQL architecture and technical details aimed towards the technically proficient but not MySQL literate audience. I haven’t really found anything. I mean, there’s the (huge and very detailed) MySQL manual, there’s the MySQL Internals manual (which is sometimes only 10 years out of date) and there’s various blog entries around the place. So I thought I’d write something explaining roughly how it all fits together and what it does to your system (processes, threads, IO etc).(Basically, I’ve found myself explaining this enough times in the past few years that I should really write it down and just point people to my blog). I’ve linked to things for more reading. You should probably read them at some point.
Years ago, there were many presentations on MySQL Architecture. I went to try and find some on YouTube and couldn’t. We were probably not cool enough for YouTube and the conferences mostly weren’t recorded. So, instead, I’ll just link to Brian on NoSQL – because it’s important to cover NoSQL as well.
So, here is a quick overview of executing a query inside a MySQL Server and all the things that can affect it. This isn’t meant to be complete… just a “brief” overview (of a few thousand words).
MySQL is an open source relational database server, the origins of which date back to 1979 with MySQL 1.0 coming into existence in 1995. It’s code that has some history and sometimes this really does show. For a more complete history, see my talk from linux.conf.au 2014: Past, Present and Future of MySQL (YouTube, Download).
The MySQL Server runs as a daemon (mysqld). Users typically interact with it over a TCP or UNIX domain socket through the MySQL network protocol (of which multiple implementations exist under various licenses). Each connection causes the MySQL Server (mysqld) to spawn a thread to handle the client connection.
There are now several different thread-pool plugins that instead of using one thread per connection, multiplex connections over a set of threads. However, these plugins are not commonly deployed and we can largely ignore them. For all intents and purposes, the MySQL Server spawns one thread per connection and that thread alone performs everything needed to service that connection. Thus, parallelism in the MySQL Server is gotten from executing many concurrent queries rather than one query concurrently.
The MySQL Server will cache threads (the amount is configurable) so that it doesn’t have to have the overhead of pthread_create() for each new connection. This is controlled by the thread_cache_size configuration option. It turns out that although creating threads may be a relatively cheap operation, it’s actually quite time consuming in the scope of many typical MySQL Server connections.
Because the MySQL Server is a collection of threads, there’s going to be thread local data (e.g. connection specific) and shared data (e.g. cache of on disk data). This means mutexes and atomic variables. Most of the more advanced ways of doing concurrency haven’t yet made it into MySQL (e.g. RCU hasn’t yet and is pretty much needed to get 1 million TPS), so you’re most likely going to see mutex contention and contention on cache lines for atomic operations and shared data structures.
There are also various worker threads inside the MySQL Server that perform various functions (e.g. replication).
Until sometime in the 2000s, more than one CPU core was really uncommon, so the fact that there were many global mutexes in MySQL wasn’t really an issue. These days, now that we have more reliable async networking and disk IO system calls but MySQL has a long history, there’s global mutexes still and there’s no hard and fast rule about how it does IO.
Over the past 10 years of MySQL development, it’s been a fight to remove the reliance on global mutexes and data structures controlled by them to attempt to increase the number of CPU cores a single mysqld could realistically use. The good news is that it’s no longer going to max out on the number of CPU cores you have in your phone.
So, you have a MySQL Client (e.g. the mysql client or something) connecting to the MySQL Server. Now, you want to enter a query. So you do that, say “SELECT 1;”. The query is sent to the server where it is parsed, optimized, executed and the result returns to the client.
Now, you’d expect this whole process to be incredibly clean and modular, like you were taught things happened back in university with several black boxes that take clean input and produce clean output that’s all independent data structures. At least in the case of MySQL, this isn’t really the case. For over a decade there’s been lovely architecture diagrams with clean boxes – the code is not like this at all. But this probably only worries you once you’re delving into the source.
The parser is a standard yacc one – there’s been attempts to replace it over the years, none of which have stuck – so we have the butchered yacc one still. With MySQL 5.0, it exploded in size due to the addition of things like SQL2003 stored procedures and it is of common opinion that it’s rather bloated and was better in 4.1 and before for the majority of queries that large scale web peeps execute.
There is also this thing called the Query Cache – protected by a single global mutex. It made sense in 2001 for a single benchmark. It is a simple hash of the SQL statement coming over the wire to the exact result to send(2) over the socket back to a client. On a single CPU system where you ask the exact same query again and again without modifying the data it’s the best thing ever. If this is your production setup, you probably want to think about where you went wrong in your life. On modern systems, enabling the query cache can drop server performance by an order of magnitude. A single global lock is a really bad idea. The query cache should be killed with fire – but at least in the mean time, it can be disabled.
Normally, you just have the SQL progress through the whole process of parse, optimize, execute, results but the MySQL Server also supports prepared statements. A prepared statement is simply this: “Hi server, please prepare this statement for execution leaving the following values blank: X, Y and Z” followed by “now execute that query with X=foo, Y=bar and Z=42″. You can call execute many times with different values. Due to the previously mentioned not-quite-well-separated parse, optimize, execute steps, prepared statements in MySQL aren’t as awesome as in other relational databases. You basically end up saving parsing the query again when you execute it with new parameters. More on prepared statements (from 2006) here. Unless you’re executing the same query many times in a single connection, server side prepared statements aren’t worth the network round trips.
The absolute worst thing in the entire world is MySQL server side prepared statements. It moves server memory allocation to be the responsibility of the clients. This is just brain dead stupid and a reason enough to disable prepared statements. In fact, just about every MySQL client library for every programming language ever actually fakes prepared statements in the client rather than trust every $language programmer to remember to close their prepared statements. Open many client connections to a MySQL Server and prepare a lot of statements and watch the OOM killer help you with your DoS attack.
So now that we’ve connected to the server, parsed the query (or done a prepared statement), we’re into the optimizer. The optimizer looks at a data structure describing the query and works out how to execute it. Remember: SQL is declarative, not procedural. The optimizer will access various table and index statistics in order to work out an execution plan. It may not be the best execution plan, but it’s one that can be found within reasonable time. You can find out the query plan for a SELECT statement by prepending it with EXPLAIN.
The MySQL optimizer is not the be all and end all of SQL optimizers (far from it). A lot of MySQL performance problems are due to complex SQL queries that don’t play well with the optimizer, and there’s been various tricks over the years to work around deficiencies in it. If there’s one thing the MySQL optimizer does well it’s making quick, pretty good decisions about simple queries. This is why MySQL is so popular – fast execution of simple queries.
To get table and index statistics, the optimizer has to ask the Storage Engine(s). In MySQL, the actual storage of tables (and thus the rows in tables) is (mostly) abstracted away from the upper layers. Much like a VFS layer in an OS kernel, there is (for some definition) an API abstracting away the specifics of storage from the rest of the server. The API is not clean and there are a million and one layering violations and exceptions to every rule. Sorry, not my fault.
Table definitions are in FRM files on disk, entirely managed by MySQL (not the storage engines) and for your own sanity you should not ever look into the actual file format. Table definitions are also cached by MySQL to save having to open and parse a file.
Originally, there was MyISAM (well, and ISAM before it, but that’s irrelevant now). MyISAM was non-transactional but relatively fast, especially for read heavy workloads. It only allowed one writer although there could be many concurrent readers. MyISAM is still there and used for system tables. The current default storage engine is called InnoDB. It’s all the buzzwords like ACID and MVCC. Just about every production environment is going to be using InnoDB. MyISAM is effectively deprecated.
InnoDB originally was its own independent thing and has (to some degree) been maintained as if it kind of was. It is, however, not buildable outside a MySQL Server anymore. It also has its own scalability issues. A recent victory was splitting the kernel_mutex, which was a mutex that protected far too much internal InnoDB state and could be a real bottleneck where NRCPUs > 4.
So, back to query execution. Once the optimizer has worked out how to execute the query, MySQL will start executing it. This probably involves accessing some database tables. These are probably going to be InnoDB tables. So, MySQL (server side) will open the tables, looking up the MySQL Server table definition cache and creating a MySQL Server side table share object which is shared amongst the open table instances for that table. See here for scalability hints on these (from 2009). The opened table objects are also cached – table_open_cache. In MySQL 5.6, there is table_open_cache_instances, which splits the table_open_cache mutex into table_open_cache_instances mutexes to help reduce lock contention on machines with many CPU cores (> 8 or >16 cores, depending on workload).
Once tables are opened, there are various access methods that can be employed. Table scans are the worst (start at the start and examine every row). There’s also index scans (often seeking to part of the index first) and key lookups. If your query involves multiple tables, the server (not the storage engine) will have to do a join. Typically, in MySQL, this is a nested loop join. In an ideal world, this would all be really easy to spot when profiling the MySQL server, but in reality, everything has funny names like rnd_next.
As an aside, any memory allocated during query execution is likely done as part of a MEM_ROOT – essentially a pool allocator, likely optimized for some ancient libc on some ancient linux/Solaris and it just so happens to still kinda work okay. There’s some odd server configuration options for (some of the) MEM_ROOTs that get exactly no airtime on what they mean or what changing them will do.
InnoDB has its own data dictionary (separate to FRM files) which can also be limited in current MySQL (important when you have tens of thousands of tables) – which is separate to the MySQL Server table definitions and table definition cache.
But anyway, you have a number of shared data structures about tables and then a data structure for each open table. To actually read/write things to/from tables, you’re going to have to get some data to/from disk.
InnoDB tables can be stored either in one giant table space or file-per-table. (Even though it’s now configurable), InnoDB database pages are 16kb. Database pages are cached in the InnoDB Buffer Pool, and the buffer-pool-size should typically be about 80% of system memory. InnoDB will use a (configurable) method to flush. Typically, it will all be O_DIRECT (it’s configurable) – which is why “just use XFS” is step 1 in IO optimization – the per inode mutex in ext3/ext4 just doesn’t make IO scale.
InnoDB will do some of its IO in the thread that is performing the query and some of it in helper threads using native linux async IO (again, that’s configurable). With luck, all of the data you need to access is in the InnoDB buffer pool – where database pages are cached. There exists innodb_buffer_pool_instances configuration option which will split the buffer pool into several instances to help reduce lock contention on the InnoDB buffer pool mutex.
All InnoDB tables have a clustered index. This is the index by which the rows are physically sorted by. If you have an INT PRIMARY KEY on your InnoDB table, then a row with that primary key value of 1 will be physically close to the row with primary key value 2 (and so on). Due to the intricacies of InnoDB page allocation, there may still be disk seeks involved in scanning a table in primary key order.
Every page in InnoDB has a checksum. There was an original algorithm, then there was a “fast” algorithm in some forks and now we’re converging on crc32, mainly because Intel implemented CPU instructions to make that fast. In write heavy workloads, this used to show up pretty heavily in profiles.
InnoDB has both REDO and UNDO logging to keep both crash consistency and provide consistent read views to transactions. These are also stored on disk, the redo logs being in their own files (size and number are configurable). The larger the redo logs, the longer it may take to run recovery after a crash. The smaller the redo logs, the more trouble you’re likely to run into with large or many concurrent transactions.
If your query performs writes to database tables, those changes are written to the REDO log and then, in the background, written back into the table space files. There exists configuration parameters for how much of the InnoDB buffer pool can be filled with dirty pages before they have to be flushed out to the table space files.
In order to maintain Isolation (I in ACID), InnoDB needs to assign a consistent read view to a new transaction. Transactions are either started explicitly (e.g. with BEGIN) or implicitly (e.g. just running a SELECT statement). There has been a lot of work recently in improving the scalability of creating read views inside InnoDB. A bit further in the past there was a lot of work in scaling InnoDB for greater than 1024 concurrent transactions (limitations in UNDO logging).
Fancy things that make InnoDB generally faster than you’d expect are the Adaptive Hash Index and change buffering. There are, of course, scalability challenges with these too. It’s good to understand the basics of them however and (of course), they are configurable.
If you end up reading or writing rows (quite likely) there will also be a translation between the InnoDB row format(s) and the MySQL Server row format(s). The details of which are not particularly interesting unless you’re delving deep into code or wish to buy me beer to hear about them.
Query execution may need to get many rows from many tables, join them together, sum things together or even sort things. If there’s an index with the sort order, it’s better to use that. MySQL may also need to do a filesort (sort rows, possibly using files on disk) or construct a temporary table in order to execute the query. Temporary tables are either using the MEMORY (formerly HEAP) storage engine or the MyISAM storage engine. Generally, you want to avoid having to use temporary tables – doing IO is never good.
Once you have the results of a query coming through, you may think that’s it. However, you may also be part of a replication hierarchy. If so, any changes made as part of that transaction will be written to the binary log. This is a log file maintained by the MySQL Server (not the storage engines) of all the changes to tables that have occured. This log can then be pulled by other MySQL servers and applied, making them replication slaves of the master MySQL Server.
We’ll ignore the differences between statement based replication and row based replication as they’re better discussed elsewhere. Being part of replication means you get a few extra locks and an additional file or two being written. The binary log (binlog for short) is a file on disk that is appended to until it reaches a certain size and is then rotated. Writes to this file vary in size (along with the size of transactions being executed). The writes to the binlog occur as part of committing the transaction (and the crash safety between writing to the binlog and writing to the storage engine are covered elsewhere – basically: you don’t want to know).
If your MySQL Server is a replication slave, then you have a thread reading the binary log files from another MySQL Server and then another thread (or, in newer versions, threads) applying the changes.
If the slow query log or general query log is enabled, they’ll also be written to at various points – and the current code for this is not optimal, there be (yes, you guess it) global mutexes.
Once the results of a query have been sent back to the client, the MySQL Server cleans things up (frees some memory) and prepares itself for the next query. You probably have many queries being executed simultaneously, and this is (naturally) a good thing.
There… I think that’s a mostly complete overview of all the things that can go on during query execution inside MySQL.
Linux.conf.au is pleased to announce that an Open Hardware Miniconf will be run the Linux.conf.au 2015 conference in Auckland, New Zealander during January 2015 .
The concept of Free / Open Source Software, already well understood by LCA attendees, is complemented by a rapidly growing community focused around Open Hardware and "maker culture". One of the drivers of the popularity of the Open Hardware community is easy access to cheap devices such as Arduino, which is a microcontroller development board originally intended for classroom use but now a popular building block in all sorts of weird and wonderful hobbyist and professional projects.
Interest in Open Hardware is high among FOSS enthusiasts but there is also a barrier to entry with the perceived difficulty and dangers of dealing with hot soldering irons, unknown components and unfamiliar naming schemes. The Miniconf will use the Arduino microcontroller board as a stepping stone to help ease software developers into dealing with Open Hardware. Topics will cover both software and hardware issues, starting with simpler sessions suitable for Open Hardware beginners and progressing through to more advanced topics.
The day will run in two distinct halves. The first part of the day will be a hands-on assembly session where participants will have the chance to solder together a special hardware project developed for the miniconf. Instructors will be on hand to assist with soldering and the other mysteries of hardware assembly. The second part of the day will be presentations about Open Hardware topics, including information on software to run on the hardware project built earlier in the day.
Please see www.openhardwareconf.org for more info.Miniconf organiser Jon Oxer Jon Oxer has been hacking on both hardware and software since he was a little tacker. Most recently he's been focusing more on the Open Hardware side, co-founding Freetronics as a result of organising the first Arduino Miniconf at LCA2010 and designing the Arduino-based payloads that were sent into orbit in 2013 on board satellites ArduSat-X and ArduSat-1. His books include "Ubuntu Hacks" and "Practical Arduino".
It’s been a little while since I blogged on MySQL on POWER (last time was thinking that new releases would be much better for running on POWER). Well, I recently grabbed the MySQL 5.6.20 source tarball and had a go with it on a POWER8 system in the lab. There is good news: I now only need one patch to have it function pretty flawlessly (no crashes). Unfortunately, there’s still a bit of an odd thing with some of the InnoDB mutex code (bug filed at some point soon).
But, with this one patch applied, I was getting okay sysbench results and things are looking good.
Now just to hope the MySQL team applies my other patches that improve things on POWER. To be honest, I’m a bit disappointed many of them have sat there for this long… it doesn’t help build a development community when patches can sit for months without either “applied” or “fix these things first”. That being said, just as I write this, Bug 72809 which I filed has been closed as the fix has been merged into 5.6.22 and 5.7.6, so there is hope… it’s just that things can just be silent for a while. It seems I go back and forth on how various MySQL variants are going with fostering an external development community.
linux.conf.au News: Developer, Testing, Release and Continuous Integration Automation Miniconf at Linux.conf.au 2015
Linux.conf.au is pleased to announce that the Developer, Testing, Release and Continuous Integration Automation Miniconf will be part of Linux.conf.au in Auckland, New Zealand during January 2015.
This miniconf is all about improving the way we produce, collaborate, test and release software.
We want to cover tools and techniques to improve the way we work together to produce higher quality software:
- code review tools and techniques (e.g. gerrit)
- continuous integration tools (e.g. jenkins)
- CI techniques (e.g. gated trunk, zuul)
- testing tools and techniques (e.g. subunit, fuzz testing tools)
- release tools and techniques: daily builds, interacting with distributions, ensuring you test the software that you ship.
- applying CI in your workplace/project
Organiser: Stewart Smith
Stewart currently works for IBM in the Linux Technology Center on KVM on POWER, giving him a job that is even harder to explain to non-Linux geek people than ever before. Previously he worked for Percona as Director of Server Development where he oversaw development of many of Percona’s software products. He comes from many years experience in databases and free and open source software development. He’s often found hacking on the Drizzle database server, taking photos, running, brewing beer and cycling (yes, all at the same time).
I have recently spent some time pondering the state of embedded system security.
I have been a long time user of OpenWRT, partly based on the premise that it should be “more secure” out of the box (provided attention is paid to routine hardening); but as the Infosec world keeps on evolving, I thought I would take a closer look at the state of play.
OpenWRT is a Linux distribution targeted as a replacement firmware for consumer home routers, having been named after the venerable Linksys WRT54g, but it is gaining a growing user base in the so-called Internet of Things space, as evidenced by devices such as the Carambola2, VoCore and WRTnode.
Now there are many, many areas that could be the focus of security related attention. One useful reference is the Arch Linux security guide , which covers a broad array of mitigations, but for some reason I felt like diving into the deeper insides of the operating system. I decided it would be worthwhile to compare the security performance of the core of OpenWRT, specifically the Linux kernel plus the userspace toolchain, against other “mainstream” distributions.
The theory is that ensuring a baseline level of protection in the toolchain and the kernel will cut of a large number of potential root escalation vulnerabilities before they can get started. To get started, take a look at the Gentoo Linux toolchain hardening guide. This lists mitigations such as: Stack Smashing Protection (SSP), used to make it much harder for malware to take advantage of stack overrun situations; position independent code (PIC or PIE) that randomises the address of a program in memory when it is loaded; and so-on to other features such as Address Space Layout Randomisation (ASLR) in the kernel, all of these designed to halt entire classes of exploits in their tracks.
I wont repeat all the pros and cons of the various methods here. It should be noted that these methods are not foolproof; new techniques such as Return Oriented Programming (ROP) can be used to circumvent these mitigations. But defense-in-depth would appear to be an increasingly important strategy. If a device can be made secure using these techniques it is probably ahead of 95% of the embedded market and more likely to avoid the larger number of automated attacks; perhaps this a bit like the old joke:
The first zebra said to the second zebra: “Why are you bothering to run? The cheetah is faster than us!”
To which the second zebra replied, “I only have to run faster than you!”Analysis of just a few aspects.
There is an excellent tool “checksec.sh”, that can scan binaries in a Linux system and report their status against a variety of security criteria.
The original author has retained his website but the script was last updated in November 2011. I found a more recently maintained version on GitHub featuring various enhancements (the maintainer even accepted a patch from me addressing a cosmetic bugfix.)
The basic procedure for using the tool against an OpenWRT build is straightforward enough:
- Configure the OpenWRT build and ensure the tar.gz target is enabled ( CONFIG_TARGET_ROOTFS_TARGZ=y )
- To expedite the testing I build OpenWRT with a pretty minimal set of packages
- Build the image as a tar.gz
- Unpack the tar.gz image to a temporary directory
- Run the tool
When repeating a test I made sure I cleaned out the toolchain and the target binaries, because I ahve been bitten by spuriuos results caused by changes to the configuration not propagating through all components. This includes manually removing the staging_dir which may be missed by make clean. This made each build take around 20+ minutes on my quad core Phenom machine.
I used the following base configuration, only changing the target:CONFIG_DEVEL=y CONFIG_DOWNLOAD_FOLDER="/path/to/downloads" CONFIG_BUILD_LOG=y CONFIG_TARGET_ROOTFS_TARGZ=y
I repeated the test for three platform configurations initially (note, these are mutually exclusive choices) – the Carambola2 and rt305x are MIPS platforms.CONFIG_TARGET_ramips_rt305x_MPRA1=y CONFIG_TARGET_x86_kvm_guest=y CONFIG_TARGET_ar71xx_generic_CARAMBOLA2=y
There are several directories most likely to have binaries of interest, so of course I scanned them using a script, essentially consisting of:for p in lib usr/lib sbin usr/sbin bin usr/bin ; do checksec.sh --dir $p ; done
The results were interesting.
(to be continued)Further reading
Mum and Dad had gotten Zoe a family pass to Australia Zoo for her birthday, but we hadn't gotten around to being able to use it until today.
As the zoo opened at 9am, and we had to pick up Mum and Dad along the way, it was an exercise in getting going early. Much to my pleasant surprise, we managed to do it, and arrived at the zoo as they were throwing open the front doors.
Patronage levels were quite low, so it made for pretty easy getting around. As it was the shuttle that runs around the zoo was horribly backed up. It can't scale well at all when the zoo is actually busy.
We caught most of the timed shows, except for the koala one (because of the aforementioned shuttle, mostly). Zoe enjoyed the Wildlife Warriors show in the Crocoseum. It's been quite a few years since I've been to Australia Zoo, and that show in particular has only gotten better.
It's really funny the stuff that Zoe is brave about and afraid of. She'll happily climb all over crocodile sculptures, and touch any animal she can, and even ask questions of the zoo keepers during Q&A, but a fake cave with a dinosaur head baring teeth had her freaking out. To her credit, she wanted to come back later on to face her fears and see what it was she'd seen the first time.
It was a really good day out. The weather was about as good as yesterday, and Zoe handled all the walking around very well. As I predicted, she fell asleep in the car about 8 minutes after we departed, and stayed asleep while we dropped of Mum and Dad and until we arrived back at home. She was still pretty wrecked after that, and bedtime was a bit messy, but we got there in the end.
We decided to upgrade our tickets to an annual pass, as we completely failed to properly look at all the newer stuff on the other side of Zoo Road, so we'll come back again another time.
I've always been appalled at the price tag on entry to Australia Zoo. Zoos in the US are so cheap compared to zoos in Australia. I'll have to write a blog post some time comparing the prices. That said, Australia Zoo is a pretty good quality zoo, even if the range of animals isn't comparable. The Crocoseum is an amazing stadium. I'd hate to be there with all 5000 seats full. The Feeding Frenzy Food Court is also pretty impressive in its capacity. I'd just hate to be in the zoo when either of these were working to capacity, because of the rest of the zoo isn't that big, really.
I have to wonder what it's like for the Irwin family, particularly the kids, growing up there with so much of Steve still front and centre. They had Bindi's songs, like My Daddy The Crocodile Hunter piped all over the place. My personal opinion of the man changed from disdain for someone who seemed so ocker and over the top, to admiration for his passion after watching his interview with Andrew Denton on Enough Rope. I remember where I was when I heard of his passing. He was a big man, who's left a big legacy. I think that's why I'm always happy to go to Australia Zoo, despite the price tag.
Inspired by the Community Leadership Summit run each year before OSCON, Donna Benjamin will be running an event to bring together community leaders, organizers and managers and the projects and organizations that are interested in growing and empowering a strong community.
The event pulls together the leading minds in community management, relations and online collaboration to discuss, debate and continue to refine the art of building an effective and capable community.
The event will be run in a similar manner to the parent event:
as an open unconference-style event in which everyone who attends is welcome to lead and contribute sessions on any topic that is relevant. These sessions are very much discussion sessions: the participants can interact directly, offer thoughts and experience, and share ideas and questions. These unconference sessions are also augmented with a series of presentations from leaders in the field, panel debates and networking opportunities.
Donna Benjamin currently chairs the Drupal community working group, sits on the board of the Drupal Association, and works as community engagement director with PreviousNext. She's also been an advisor to councils of Linux Australia, and was conference director for LCA2008 in Melbourne. Donna has also served as President of Linux Users of Victoria, and as a Director of Open Source Industry Australia.
At the receiver, we need to “sample” the PSK modem signal at just the right time. A timing estimator looks at the received signal and works out the best time to sample. Timing estimation is a major part of a PSK demodulator. if you get the timing estimate wrong, the scatter diagram looks worse and you get more bit errors.
The basic idea is we pass the modem signal through a non-linearity. The non-linearity could be the absolute function (i.e. a rectifier), a square law, or in analog-land a diode. This strips the phase information from the signal leaving the amplitude (envelope of the original signal) bobbing up and down at the symbol rate. Turns out that the phase of this envelope signal is related to the timing offset of the PSK signal. The phase can be extracted using a single point Discrete Fourier Transform (DFT).
Here’s an example using some high school trig functions. Consider a simple BPSK signal that consists of alternating symbols ….,-1,+1,-1,+1….. Once filtered, this will look something like a sine wave at half the symbol rate, r(t)=cos(Rs(t-T)/2), where T is the timing offset, and Rs is the symbol rate. So if Rs=50 symbols/s (50 baud), r(t) would be a sine wave at 25 Hz, with some time offset T.
If we square r(t) we get s(t)=r(t)*r(t) = 0.5 + 0.5*cos(Rs(t-T)), using the trig identify cosacosb=(cos(a-b)+cos(a+b))/2. The second term of s(t) is a sine wave of frequency Rs, with phase = RsT. So if we perform a single point DFT at frequency Rs on s(t), the phase will be related to the timing offset.
That’s pretty much what happens in rx_est_timing(). We use the parallel QPSK signals, and a nice long window of modem samples to get a good estimate in the presence of noise and frequency selective fading.
Linux.conf.au is pleased to announce that the Open Source for Humanitarian Tech Miniconf will be part of Linux.conf.au for the first time in Auckland, New Zealand during January 2015.
Technology is increasingly important in humanitarian response. Now responders are better connected to digital volunteers, more advanced tools such as unmanned-aerial vehicles give a better review of post disaster situations and great quantities of data can be collected and analyzed. Often these solutions are not expendable and are based on expensive proprietary solutions.
The Humanitarian Tech Miniconf will focus on two main audiences:
- Existing technologists who are interested in ways they can assist with technology in humanitarian response.
- Allowing existing projects and participants to share what they are working on and look for ways to integrate.
Technologists who work on UAVs, mesh networks, data collection platforms and content management will be invited to speak, and humanitarians to give a background in humanitarian response to those not familiar.
Kate Chapman is the Executive Director at the Humanitarian OpenStreetMap Team. Her most recent work has been in Indonesia working on a pilot program over the past year analyzing the feasibility utilizing OpenStreetMap for collection of exposure data. This project has hosted a OpenStreetMap mapping competition, a month long event to map critical infrastructure in Jakarta and assisting community facilitators in moving from hand-drawn maps to digital maps. Previous to working at HOT Kate was involved in development of multiple web-GIS applications including GeoCommons and iMapData.
Today was a glorious day, and the universe was very kind to me.
I couldn't quite get myself motivated to go for a run before my chiropractic adjustment, so I had a nice relaxed start to the day instead.
I decided to get the cleaners back for a regular clean, and it was such a much better use of my time. I used the time I recouped to clean out my email inbox, and sort out a whole bunch of things that had been piling up on my to do list.
I biked over for my massage, and my massage therapist said he'd book a Thermomix demo with me, that was pretty awesome of him. Biking home, I decided to phone in an order for a burger from Grill'D and just continue biking over to pick it up. On the way down Oxford Street, I found a $20 bill on the road, so that paid for lunch for me. I was pretty stoked.
On the way home, I ran into my yoga teacher. I love my little village.
I had a little bit of time before I had to head to Kindergarten early to chair the monthly PAG meeting. Because the weather was so nice, I decided to bike in for it. It's hard to believe that Term 3 has ended now. The year is flying by.
We biked home, and only had about 20 minutes before we had to turn around and bike to swim class. After swim class, on a whim, I decided we should go out for dinner, so we biked down to Oxford Street and had some gyoza for dinner and biked home.
Tomorrow's going to be an early start and a big day, and apparently Zoe woke up early this morning, so I got her to bed a little early tonight. It's pretty hot tonight, so hopefully she'll sleep okay.