Planet Linux Australia


sthbrx - a POWER technical blog: SROP Mitigation

Fri, 2016-05-13 15:23
What is SROP?

Sigreturn Oriented Programming - a general technique that can be used as an exploit, or as a backdoor to exploit another vulnerability.

Okay, but what is it?

Yeah... Let me take you through some relevant background info, where I skimp on the details and give you the general picture.

In Linux, software interrupts are called signals. More about signals here! Generally a signal conveys some information from the kernel, so most signals will have a specific signal handler (some code that deals with the signal) set up.

Signals are asynchronous, i.e. they can be sent to a process/program at any time. When a signal arrives for a process, the kernel suspends the process. The kernel then saves the 'context' of the process (all the general purpose registers (GPRs), the stack pointer, the next-instruction pointer etc.) into a structure called a 'sigframe'. The sigframe is stored on the stack, and then the kernel runs the signal handler. At the very end of the signal handler, it calls a special system call called 'sigreturn', indicating to the kernel that the signal has been dealt with. The kernel then grabs the sigframe from the stack, restores the process's context and resumes the execution of the process.

This is the rough mental picture you should have:

Okay... but you still haven't explained what SROP is..?

Well, if you insist...

The above process was designed so that the kernel does not need to keep track of what signals it has delivered. The kernel assumes that the sigframe it takes off the stack was legitimately put there by the kernel because of a signal. This is where we can trick the kernel!

If we can construct a fake sigframe, put it on the stack, and call sigreturn, the kernel will assume that the sigframe is one it put there before and will load the contents of the fake context into the CPU's registers and 'resume' execution from where the fake sigframe tells it to. And that is what SROP is!

Well that sounds cool, show me!

Firstly we have to set up a (valid) sigframe:

By valid sigframe, I mean a sigframe that the kernel will not reject. Luckily most architectures only examine a few parts of the sigframe to determine its validity. Unluckily, you will have to dive into the source code to find out which parts of the sigframe you need to set up for your architecture. Have a look in the function which deals with the sigreturn syscall (probably something like sys_sigreturn()).

For a real time signal on a little endian powerpc 64bit machine, the sigframe looks something like this:

struct rt_sigframe {
    struct ucontext uc;
    unsigned long _unused[2];
    unsigned int tramp[TRAMP_SIZE];
    struct siginfo __user *pinfo;
    void __user *puc;
    struct siginfo info;
    unsigned long user_cookie;
    /* New 64 bit little-endian ABI allows redzone of 512 bytes below sp */
    char abigap[USER_REDZONE_SIZE];
} __attribute__ ((aligned (16)));

The most important part of the sigframe is the context or ucontext as this contains all the register values that will be written into the CPU's registers when the kernel loads in the sigframe. To minimise potential issues we can copy valid values from the current GPRs into our fake ucontext:

register unsigned long r1 asm("r1");
register unsigned long r13 asm("r13");
struct ucontext ctx = { 0 };

/* We need a system thread id so copy the one from this process */
ctx.uc_mcontext.gp_regs[PT_R13] = r13;

/* Set the context's stack pointer to where the current stack pointer is pointing */
ctx.uc_mcontext.gp_regs[PT_R1] = r1;

We also need to tell the kernel where to resume execution from. As this is just a test to see if we can successfully get the kernel to resume execution from a fake sigframe we will just point it to a function that prints out some text.

/* Set the next instruction pointer (NIP) to the code that we want executed */
ctx.uc_mcontext.gp_regs[PT_NIP] = (unsigned long) test_function;

For some reason the sys_rt_sigreturn() on little endian powerpc 64bit checks the endianness bit of the ucontext's MSR register, so we need to set that:

/* Set MSR bit if LE */
ctx.uc_mcontext.gp_regs[PT_MSR] = 0x01;

Fun fact: not doing this or setting it to 0 results in the CPU switching from little endian to big endian! For a powerpc machine sys_rt_sigreturn() only examines ucontext, so we do not need to set up a full sigframe.

Secondly we have to put it on the stack:

/* Set current stack pointer to our fake context */
r1 = (unsigned long) &ctx;

Thirdly, we call sigreturn:

/* Syscall - NR_rt_sigreturn */
asm("li 0, 172\n");
asm("sc\n");

When the kernel receives the sigreturn call, it looks at the userspace stack pointer for the ucontext and loads this in. As we have put valid values in the ucontext, the kernel assumes that this is a valid sigframe that it set up earlier, loads the contents of the ucontext into the CPU's registers and 'resumes' execution of the process from the address we pointed the NIP to.

Obviously, you need something worth executing at this address, but sadly that next part is not in my job description. This is a nice gateway into the kernel though and would pair nicely with another kernel vulnerability. If you are interested in some more in depth examples, have a read of this paper.

So how can we mitigate this?

Well, I'm glad you asked. We need some way of distinguishing between sigframes that were put there legitimately by the kernel and 'fake' sigframes. The current idea that is being thrown around is cookies, and you can see the x86 discussion here.

The proposed solution is to give every sighand struct a randomly generated value. When the kernel constructs a sigframe for a process, it stores a 'cookie' with the sigframe. The cookie is a hash of the cookie's location and the random value stored in the sighand struct for the process. When the kernel receives a sigreturn, it hashes the location where the cookie should be with the randomly generated number in sighand struct - if this matches the cookie, the cookie is zeroed, the sigframe is valid and the kernel will restore this context. If the cookies do not match, the sigframe is not restored.

Potential issues:

Multithreading: Originally the random number was suggested to be stored in the task struct. However, this would break multi-threaded applications as every thread has its own task struct. As the sighand struct is shared by threads, this should not adversely affect multithreaded applications.

Cookie location: At first I put the cookie on top of the sigframe. However some code in userspace assumed that all the space between the signal handler and the sigframe was essentially up for grabs and would zero the cookie before I could read the cookie value. Putting the cookie below the sigframe was also a no-go due to the ABI-gap (a gap below the stack pointer that signal code cannot touch) being a part of the sigframe. Putting the cookie inside the sigframe, just above the ABI gap has been fine with all the tests I have run so far!

Movement of sigframe: If you move the sigframe on the stack, the cookie value will no longer be valid... I don't think that this is something that you should be doing, and have not yet come across a scenario that does this.

For a more in-depth explanation of SROP, click here.

sthbrx - a POWER technical blog: Doubles in hex and why Kernel addresses ~= -2

Thu, 2016-05-12 22:22

It started off a regular Wednesday morning when I heard from my desk a colleague muttering about doubles and their hex representation. "But that doesn't look right", "How do I read this as a float", and "redacted you're the engineer, you do it". My interest piqued, I headed over to his desk to enquire about the great un-solvable mystery of the double and its hex representation. The number which would consume me for the rest of the morning: 0xc00000001568fba0.

That's a Perfectly Valid hex Number!

I hear you say. And you're right, if we were to treat this as a long it would simply be 13835058055641365408 (or -4611686018068186208 if we assume a signed value). But we happen to know that this particular piece of data which we have printed is supposed to represent a double (-2 to be precise). "Well print it as a double" I hear from the back, and once again we should all know that this can be achieved rather easily by using the %f/%e/%g specifiers in our print statement. The only problem is that in kernel land (where we use printk) we are limited to printing fixed point numbers, which is why our only easy option was to print our double in its raw hex format.

This is the point where we all think back to that university course where number representations were covered in depth, and terms like 'mantissa' and 'exponent' surface in our minds. Of course as we rack our brains we realise there's no way that we're going to remember exactly how a double is represented and bring up the IEEE 754 Wikipedia page.

What is a Double?

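Taking a step back for a second, a double (or double-precision floating-point number) is a number format used to represent floating-point numbers (those with a fractional component). A double is 64 bits wide and is made up of a sign bit, an 11-bit exponent and a 52-bit fraction (or mantissa):

sign (bit 63) | exponent (bits 62-52) | fraction (bits 51-0)

Where the number they represent is defined by:

(-1)^sign x 1.fraction x 2^(exponent - 1023)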

So this means that a 1 in the MSB (sign bit) represents a negative number, and we have some decimal component (the fraction) which we multiply by some power of 2 (as determined by the exponent) to get our value.

Alright, so what's 0xc00000001568fba0?

The reason we're all here to begin with: what's 0xc00000001568fba0 if we treat it as a double? We can first split it into the three components:


Sign bit: 1 -> Negative
Exponent: 0x400 -> 2^(1024 - 1023)
Fraction: 0x1568fba0 -> 1.something

And then use the formula above to get our number:

(-1)^1 x 1.something x 2^(1024 - 1023)

But there's a much easier way! Just write ourselves a little program in userspace (where we are capable of printing floats) and we can save ourselves most of the trouble.

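#include <stdio.h>

int main(void)
{
    /* The raw 64-bit value we saw printed in hex */
    long val = 0xc00000001568fba0;

    /* Reinterpret the same bits as a double (a quick hack that ignores strict aliasing) */
    printf("val: %lf\n", *((double *) &val));
    return 0;
}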

So all we're doing is taking our hex value and storing it in a long (val), then getting a pointer to val, casting it to a double pointer, dereferencing it and printing it as a float. Drum roll... and the answer is?

"val: -2.000000"

"Wait a minute, that doesn't quite sound right". You're right, it does seem a bit strange that this is exactly -2. Well it may be that we are not printing enough decimal places to see the full result, so update our print statement to:

printf("val: %.64lf\n", *((double *) &val));

And now we get:

"val: -2.0000001595175973534423974342644214630126953125000000"

Much better... But still where did this number come from and why wasn't it the -2 that we were expecting?

Kernel Pointers

At this point suspicions had been raised that what was being printed by my colleague was not what he expected and that this was in fact a Kernel pointer. How do you know? Let's take a step back for a second...

In the PowerPC architecture, the address space which can be seen by an application is known as the effective address space. We can take this and translate it into a virtual address which when mapped through the HPT (hash page table) gives us a real address (or the hardware memory address).

The effective address space is divided into 5 regions:

As you may notice, Kernel addresses begin with 0xc. This has the advantage that we can map a virtual address without the need for a table by simply masking the top nibble.

Thus it would be reasonable to assume that our value (0xc00000001568fba0) was indeed a pointer to a Kernel address (and further code investigation confirmed this).

But What is -2 as a Double in hex?

Well let's modify the above program and find out:

#include <stdio.h>

int main(void)
{
    double val = -2;

    /* Reinterpret the bits of the double as an integer and print them in hex */
    printf("val: 0x%lx\n", *((unsigned long *) &val));
    return 0;
}


"val: 0xc000000000000000"

Now that sounds much better. Let's take a closer look:


Sign Bit: 1 -> Negative
Exponent: 0x400 -> 2^(1024 - 1023)
Fraction: 0x0 -> Zero

So if you remember from above, we have:

(-1)^1 x 1.0 x 2^(1024 - 1023) = -2

What about -1? -3?

-1: 0xbff0000000000000:

Sign Bit: 1 -> Negative
Exponent: 0x3ff -> 2^(1023 - 1023)
Fraction: 0x0 -> Zero

(-1)^1 x 1.0 x 2^(1023 - 1023) = -1

-3: 0xc008000000000000:

Sign Bit: 1 -> Negative
Exponent: 0x400 -> 2^(1024 - 1023)
Fraction: 0x8000000000000 -> 0.5

(-1)^1 x 1.5 x 2^(1024 - 1023) = -3

So What Have We Learnt?

Firstly, make sure that what you're printing is what you think you're printing.

Secondly, if it looks like a Kernel pointer then you're probably not printing what you think you're printing.

Thirdly, all Kernel pointers ~= -2 if you treat them as a double.

And Finally, with my morning gone, I can say for certain that if we treat it as a double, 0xc00000001568fba0 = -2.0000001595175973534423974342644214630126953125.

Chris Smart: TRIM on LVM on LUKS on SSD, revisited

Thu, 2016-05-12 15:03

A few years ago I wrote about enabling trim on an SSD that was running with LVM on top of LUKS. Since then things have changed slightly, a few times.

With Fedora 24 you no longer need to edit the /etc/crypttab file and rebuild your initramfs. Now systemd supports a kernel boot argument rd.luks.options=discard which is the only thing you should need to do to enable trim on your LUKS device.

Edit /etc/default/grub and add the rd.luks.options=discard argument to the end of GRUB_CMDLINE_LINUX, e.g.:
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_CMDLINE_LINUX="rd.luks.uuid=luks-de023401-ccec-4455-832bf-e5ac477743dc rd.luks.uuid=luks-a6d344739a-ad221-4345-6608-e45f16a8645e rhgb quiet rd.luks.options=discard"

Next, rebuild your grub config file:
sudo grub2-mkconfig -o /boot/grub2/grub.cfg

If you’re using LVM, the setting is the same as the previous post. Edit the /etc/lvm/lvm.conf file and enable the issue_discards option:
issue_discards = 1

If using LVM you will need to rebuild your initramfs so that the updated lvm.conf is in there.
sudo dracut -f

Reboot and try fstrim:
sudo fstrim -v /

Now also thanks to systemd, you can just enable the fstrim timer (cron) to do this automatically:
sudo systemctl enable fstrim.timer