Support for zero-copy shared memory buffers for bpf(4) has now been committed to the FreeBSD 8-CURRENT tree by Christian Peron.
procstat(1) and supporting kernel changes have now been committed to the FreeBSD 8-CURRENT tree.
With the creation of a new 8-CURRENT tree, I've started putting together an initial personal todo list for FreeBSD 8.0. Right now it's all a bit pie-in-the-sky, but with 18 months to put together a .0 release, here are some of the areas I hope to work in.
After spending several years on network locking, we now have an entirely Giant-free and significantly parallel network stack. We are continuing to refine the locking, not to mention the algorithms, of the network stack for increased scalability. One significant concern with current locking is the pcbinfo lock, a per-protocol global lock that protects the global lists of connections (PCBs). It is held across TCP timer events and in-bound TCP and UDP processing in order to prevent connections from disappearing while being used. Two strategies have been discussed by Mohan, myself, and others: moving to true reference management in the in-bound and timer paths (i.e., refcounting the inpcb), and decomposing pcbinfo into a series of locks, where a hash of the connection tuple determines which instance to use. The latter is a lower-risk strategy, as it largely maintains the current model while offering most of the benefits of a true reference-counted model.
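As a rough userspace sketch of the second strategy, the following shows how a hash of the connection tuple could select one lock out of a fixed array. None of this is kernel code: struct conn_tuple, PCB_LOCK_STRIPES, the hash mixing, and the use of pthread mutexes are all assumptions made purely for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical number of lock instances; must be a power of two. */
#define PCB_LOCK_STRIPES 16

/* Hypothetical IPv4 connection 4-tuple. */
struct conn_tuple {
	uint32_t laddr, faddr;
	uint16_t lport, fport;
};

/* One lock per stripe instead of a single global lock. */
struct pcb_lock_table {
	pthread_mutex_t locks[PCB_LOCK_STRIPES];
};

static void
pcb_lock_table_init(struct pcb_lock_table *t)
{
	assert(t != NULL);
	for (int i = 0; i < PCB_LOCK_STRIPES; i++)
		pthread_mutex_init(&t->locks[i], NULL);
}

/* Map a tuple to a stripe index; equal tuples always map to the same index. */
static unsigned
pcb_stripe(const struct conn_tuple *ct)
{
	uint32_t h;

	h = ct->laddr ^ ct->faddr ^ (((uint32_t)ct->lport << 16) | ct->fport);
	h ^= h >> 16;
	h *= 0x45d9f3bU;	/* arbitrary odd multiplier to mix the bits */
	h ^= h >> 16;
	return (h & (PCB_LOCK_STRIPES - 1));
}

/* Return the lock instance responsible for this connection. */
static pthread_mutex_t *
pcb_lock_for(struct pcb_lock_table *t, const struct conn_tuple *ct)
{
	return (&t->locks[pcb_stripe(ct)]);
}
```

In-bound processing for two different connections would then only contend when their tuples happen to hash to the same stripe, while operations that walk every connection would still need to take all of the locks.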
With the adoption of the TrustedBSD MAC Framework by Apple in Mac OS X Leopard, a significant number of refinements were made to the framework that have not yet been adopted back into the base FreeBSD CVS repository. This includes significant sanitization and cleanup of MAC policy methods, for example. We had hoped to include these in FreeBSD 7.0, but this did not get done in time. We will instead ship this in 8.0. One result of this work will be that it will be much easier to share security policy modules between Mac OS X and FreeBSD.
Most MAC policy modules have the following form:
Rather than storing these all in a single file, we should probably break them out into a series of files, with a common internal include file so that functions are visible between them.
The OpenBSM library and kernel code manipulate byte sequences to build up a token-based event log. Manipulating strings of bytes is a classic "risky" activity. This task rewrites the generation code in OpenBSM to use the sbuf library, which should clean up the code as well as reduce the riskiness of the code.
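To illustrate why a checked buffer abstraction is less risky than hand-managed byte strings, here is a small stand-in written for this note. It is not the sbuf API and not OpenBSM code: struct tokbuf and the tokbuf_* functions are hypothetical names, and the big-endian encoding is an assumption. The point is only that every append is bounds-checked in one place and that overflow is recorded rather than written past the end of the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-capacity byte builder with a sticky overflow flag. */
struct tokbuf {
	uint8_t	*data;
	size_t	 cap;
	size_t	 len;
	int	 overflowed;
};

static void
tokbuf_init(struct tokbuf *tb, uint8_t *storage, size_t cap)
{
	assert(tb != NULL && storage != NULL);
	tb->data = storage;
	tb->cap = cap;
	tb->len = 0;
	tb->overflowed = 0;
}

/* The single place where bytes are copied; refuses to write out of bounds. */
static void
tokbuf_put(struct tokbuf *tb, const void *src, size_t n)
{
	if (tb->overflowed || n > tb->cap - tb->len) {
		tb->overflowed = 1;
		return;
	}
	memcpy(tb->data + tb->len, src, n);
	tb->len += n;
}

static void
tokbuf_put_u8(struct tokbuf *tb, uint8_t v)
{
	tokbuf_put(tb, &v, 1);
}

/* Append a 16-bit value, most significant byte first. */
static void
tokbuf_put_be16(struct tokbuf *tb, uint16_t v)
{
	uint8_t b[2] = { (uint8_t)(v >> 8), (uint8_t)(v & 0xff) };

	tokbuf_put(tb, b, sizeof(b));
}

/* Returns 0 if every append fit, -1 if any append was rejected. */
static int
tokbuf_finish(const struct tokbuf *tb)
{
	return (tb->overflowed ? -1 : 0);
}
```

Callers append fields without checking each step and test the result once at the end, which is the same style of use that motivates moving the token generation code to sbuf.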
Accept filters allow user applications to change the "connection established" criteria for in-bound connections on listening TCP sockets. This allows them to avoid additional system calls and wakeups until, for example, a complete HTTP request is received on the socket. However, this code was implemented at a time when the kernel didn't run in parallel and wasn't preemptive, and it employs recursion into the socket and protocol code in a risky way. It's not yet clear what exactly the right solution is, but a cleanup is definitely necessary. Colin Percival has suggested changing the model from an "arbitrary code runs" model to one in which accept filters simply become predicate functions, in a bid to reduce complexity and reentrance. Another possibility that may need to be considered is processing filters in a different thread from in-bound network processing to reduce reentrance and recursion.
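A minimal sketch of what the predicate model might look like, assuming the filter is handed a read-only view of the bytes received so far and simply answers "ready or not". The accf_predicate_t type and http_request_complete() are hypothetical names invented here; the check for a blank line is a deliberate simplification and is not how the existing HTTP accept filter is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical predicate type: given the data buffered on a connection,
 * report whether it should be handed to accept().  The predicate only
 * inspects its arguments and never calls back into socket code.
 */
typedef bool (*accf_predicate_t)(const char *buf, size_t len);

/* Ready once the end of the HTTP request headers ("\r\n\r\n") is seen. */
static bool
http_request_complete(const char *buf, size_t len)
{
	assert(buf != NULL || len == 0);
	for (size_t i = 0; i + 4 <= len; i++) {
		if (memcmp(buf + i, "\r\n\r\n", 4) == 0)
			return (true);
	}
	return (false);
}
```

Because such a function has no side effects, the caller stays in full control of locking and of when the connection is moved to the completed queue, which is where the reduction in reentrance would come from.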
Currently, the kernel audit system supports a single global trail and a series of application-configured audit pipes. This task would allow multiple independent audit trails to be configured, either globally scoped or scoped to a specific jail. Trails could have different preselection properties, and specific policies regarding the right to submit events to the trail. For example, a trail could be intended for use with specific applications running as specific users, providing a "lockbox" for audit records from those applications.
In FreeBSD 7.0, the suser(9) KPI is almost entirely eliminated, replaced with the priv(9) KPI. In this model, specific named privileges are requested, allowing central determination of privileges in a more fine-grained manner. This facilitates future changes in privilege model, auditing of privilege, global configuration of privilege, etc. However, a few locations still exist in the kernel where the old suser(9) KPI appears -- mostly ifdef'd for older FreeBSD versions, but a few not. These last few uses need to be eliminated.
This task would cause the use of kernel privileges to be audited, likely in a style similar to Solaris (although our privilege model is much more fine-grained).
The root-has-all-privilege model has proven remarkably effective over time, but has significant limitations--not least, that root privilege is often significantly in excess of the actual privilege required for an application, forcing excess privilege to be granted to applications unnecessarily. Many systems have approached alternative privilege models--we need to analyze past attempts and implement an adequate but conservative model addressing this requirement. We have past experience implementing POSIX.1e privileges on FreeBSD, and have determined that most approaches carry a significant amount of risk--managing and minimizing that risk will be the tricky part. A starting point here will be to re-read and review POSIX.1e, the Linux model derived from POSIX.1e, and the Solaris model.
Currently, the set of privileges allowed in a jail is hard-coded in kern_jail.c based on an a priori determination of what is "safe". An alternative model would allow that set of privileges to be specified, so that specific jails could have additional privileges enabled or disabled.
FreeBSD 7 ships with significantly improved security regression testing, but much more work needs to be done. Among other things, we need to continue to flesh out the privilege regression suite, and add testing of DAC for various IPC models. We also need to check that P_SUGID and other process protections are properly and fully implemented.
Currently there is inadequate synchronization of interface address lists in the network stack--multicast address lists are locked, but unicast address lists are not. In practice this proves not to be much of an issue, since they change only very infrequently and hardly ever concurrently, but it does need to be fixed. With the advent of rmlocks in 8.0, this should not have a significant performance impact.
FreeBSD currently supports POSIX.1e ACLs, but with support for NFSv4 and ZFS, there is a strong desire to also support NFSv4 ACLs. This requires upgrading the ACL framework in the kernel to support the expression of both ACL types, implementation of NFSv4 ACL logic, and adaptation of the user library and tools. A test suite is also required.
The kernel audit code requires a worker thread, audit_worker, to manage asynchronous BSM conversion and disk writing. Right now, the audit_worker thread is created unconditionally if audit support is compiled into the kernel, but it should really be created and destroyed dynamically as required if audit is enabled and disabled, so that it doesn't consume memory in the event audit isn't enabled.
Currently, the TrustedBSD MAC Framework has basic support for IPv6 in the form of TCP/UDP labeling and access control checks. However, it doesn't have explicit support for some other types of IPv6 traffic, such as nd6 and mld6, and IPv6 packet reassembly. This needs to be fleshed out.
FreeBSD 7.0 ships with ptmx/pts support but it is disabled by default as not enough applications had been updated by ship date. In 8.x, we should enable it by default, and sweep applications to make sure they properly support it.
procfs(4) has historically provided a debugging interface to dump process information -- with the deprecation of procfs in FreeBSD, some of its functions are no longer available, such as the ability to easily inspect the VM layout of a process. This task creates a new tool and supporting sysctls to extract process features, such as VM layout, current state of each thread, and file descriptor information, from the command line or from a core dump.
Kernel dumps and minidumps, while very useful, don't address all debugging needs; likewise, live debugging via serial ports isn't always available, and debugging on the system console has serious limitations. Implement a new dump mechanism to capture the output of automated DDB scripts, allowing the creation of a "textdump" suitable for analysis without access to kernel source and symbols, and appropriate for filing as a bug report.
Earlier this year, I implemented zero-copy BPF buffers under contract to Seccuris. It works, but it needs more formal performance evaluation, design and implementation review, and optimization before it can be merged to the FreeBSD CVS repository for inclusion in 8.0 and possibly later 7.x releases. We also need to feed libpcap changes back to tcpdump.org after cleaning them up.