Planet Linux Australia

Syndicate content
Planet Linux Australia -
Updated: 7 sec ago

OpenSTEM: How to Help Your Child Become a Maker | MakerKids

Tue, 2015-10-20 13:30!How-To-Help-Your-Child-Become-a-Maker/dcsov/562094010cf2c3a4a7109d92

Let’s say your child is currently a classic consumer – they love watching TV, reading books, but they don’t really enjoy making things themselves. Or maybe they are making some things but it’s not really technological. We think any kind of making is awesome, but one of our favourite kinds is the kind where kids realize that they can build and influence the world around them. There’s an awesome Steve Jobs quote that I love, which says:

“When you grow up you tend to get told that the world is the way it is and you’re life is just to live your life inside the world. Try not to bash into the walls too much. Try to have a nice family life, have fun, save a little money.

That’s a very limited life. Life can be much broader once you discover one simple fact: Everything around you that you call life was made up by people that were no smarter than you. And you can change it, you can influence it…

Once you learn that, you’ll never be the same again.”

Imagine if you can figure this out as a child.

Rusty Russell: ccan/mem’s memeqzero iteration

Tue, 2015-10-20 11:28

On Thursday I was writing some code, and I wanted to test if an array was all zero.  First I checked if ccan/mem had anything, in case I missed it, then jumped on IRC to ask the author (and overall CCAN co-maintainer) David Gibson about it.

We bikeshedded around names: memallzero? memiszero? memeqz? memeqzero() won by analogy with the already-extant memeq and memeqstr. Then I asked:

rusty: dwg: now, how much time do I waste optimizing?

dwg: rusty, in the first commit, none

Exactly five minutes later I had it implemented and tested.

The Naive Approach: Times: 1/7/310/37064 Bytes: 50 bool memeqzero(const void *data, size_t length) { const unsigned char *p = data; while (length) { if (*p) return false; p++; length--; } return true; }

As a summary, I’ve give the nanoseconds for searching through 1,8,512 and 65536 bytes only.

Another 20 minutes, and I had written that benchmark, and an optimized version.

128-byte Static Buffer: Times: 6/8/48/5872 Bytes: 108

Here’s my first attempt at optimization; using a static array of 128 bytes of zeroes and assuming memcmp is well-optimized for fixed-length comparisons.  Worse for small sizes, much better for big.

const unsigned char *p = data; static unsigned long zeroes[16]; while (length > sizeof(zeroes)) { if (memcmp(zeroes, p, sizeof(zeroes))) return false; p += sizeof(zeroes); length -= sizeof(zeroes); } return memcmp(zeroes, p, length) == 0; Using a 64-bit Constant: Times: 12/12/84/6418 Bytes: 169

dwg: but blowing a cacheline (more or less) on zeroes for comparison, which isn’t necessarily a win

Using a single zero uint64_t for comparison is pretty messy:

bool memeqzero(const void *data, size_t length) {     const unsigned char *p = data;     const unsigned long zero = 0;     size_t pre;     pre = (size_t)p % sizeof(unsigned long);     if (pre) {         size_t n = sizeof(unsigned long) - pre;         if (n > length)             n = length;         if (memcmp(p, &zero, n) != 0)             return false;         p += n;         length -= n;     }     while (length > sizeof(zero)) {         if (*(unsigned long *)p != zero)             return false;         p += sizeof(zero);         length -= sizeof(zero);     }     return memcmp(&zero, p, length) == 0; }

And, worse in every way!

Using a 64-bit Constant With Open-coded Ends: Times: 4/9/68/6444 Bytes: 165

dwg: rusty, what colour is the bikeshed if you have an explicit char * loop for the pre and post?

That’s slightly better, but memcmp still wins over large distances, perhaps due to prefetching or other tricks.

Epiphany #1: We Already Have Zeroes: Times 3/5/92/5801 Bytes: 422

Then I realized that we don’t need a static buffer: we know everything we’ve already tested is zero!  So I open coded the first 16 byte compare, then memcmp()ed against the previous bytes, doubling each time.  Then a final memcmp for the tail.  Clever huh?

But it no faster than the static buffer case on the high end, and much bigger.

dwg: rusty, that is brilliant. but being brilliant isn’t enough to make things work, necessarily :p

Epiphany #2: memcmp can overlap: Times 3/5/37/2823 Bytes: 307

My doubling logic above was because my brain wasn’t completely in phase: unlike memcpy, memcmp arguments can happily overlap!  It’s still worth doing an open-coded loop to start (gcc unrolls it here with -O3), but after 16 it’s worth memcmping with the previous 16 bytes.  This is as fast as naive with as little as 2 bytes, and the fastest solution by far with larger numbers:

const unsigned char *p = data; size_t len; /* Check first 16 bytes manually */ for (len = 0; len < 16; len++) { if (!length) return true; if (*p) return false; p++; length--; } /* Now we know that's zero, memcmp with self. */ return memcmp(data, p, length) == 0;

You can find the final code in CCAN (or on Github) including the benchmark code.

Finally, after about 4 hours of random yak shaving, it turns out lightning doesn’t even want to use memeqzero() any more!  Hopefully someone else will benefit.

Stewart Smith: TianoCore (UEFI) ported to OpenPower

Tue, 2015-10-20 11:26

Recently, there’s been (actually two) ports of TianoCore (the reference implementation of UEFI firmware) to run on POWER on top of OPAL (provided by skiboot) – and it can be run in the Qemu PowerNV model.

More details: