Posted over 7 years ago by Kamil Rytarowski
Sanitization is a process of detecting potential issues during the execution process.
Sanitizers instrument the program (embedding checks into the generated code) and interact with
the runtime linked into an executable, either statically or dynamically.
In the past month, I've finished functional support of MKSANITIZER with Address Sanitizer and Undefined Behavior Sanitizer.
MKSANITIZER uses the default compiler runtime shipped with Clang and GCC and ported to NetBSD.
Over the past month, I've implemented from scratch a clean-room version of the UBSan runtime.
The initial motivation was the need of developing one for the purposes of catching undefined behavior reports
(unspecified code semantics in a compiled executable) in the NetBSD kernel.
However, since we need to write a new runtime,
I've decided to go two steps further and design code that will be usable inside libc and as a
standalone library (linked .c source code) for the use of ATF regression tests.
The µUBSan (micro-UBSan) design and implementation
The original Clang/LLVM runtime is written in C++ with features that are not available in libc and in the NetBSD kernel.
The Linux kernel version of an UBSan runtime is written natively in C, and mostly without additional unportable dependencies,
however, it's GPL, and the number of features is beyond the code generation support in the newest version of Clang/LLVM from trunk (7svn).
The implementation of µUBSan is located in
common/lib/libc/misc/ubsan.c.
The implementation is mostly Machine Independent; however, it assumes a typical 32-bit or 64-bit CPU with support for typical floating point types.
Unlike the other implementations that I know, µUBSan is implemented without triggering Undefined Behavior.
The whole implementation inside a single C file
I've decided to write the whole µUBSan runtime as a single self-contained .c source-code file,
as it makes it easier for it to be reused by every interested party.
This runtime can be either inserted inline or linked into the program.
The runtime is written in C, because C is more portable, it's the native language of libc and the kernel, and additionally
it's easier to match the symbols generated by the compilers (Clang and GCC).
According to the C++ ABI, C++ symbols are mangled,
and in order to match the naming requested by the compiler instrumentation
I would need to partially tag the code as C anyway (extern "C").
Additionally, going the C++ way without C++ runtime features is not a typical way to use C++,
and unless someone is a C++ enthusiast it does not buy much.
Additionally, the programming language used for the runtime is almost orthogonal to the instrumented programming language
(although it must have at minimum the C-level properties to work on pointers and elementary types).
A set of supported reporting features
µUBSan supports all report types except -fsanitize=vptr.
For vptr there is a need for low-level C++ routines to introspect and validate the low-level parts of the C++ code
(like the vtable, compatibility of dynamic types etc).
While all other UBSan checks are done directly in the instrumented and inlined code, the vptr one is performed in the runtime.
This means that most of the work done by a minimal UBSan runtime is about deserializing reports into verbose messages and printing them out.
Furthermore, there is an option to configure a compiler to inject crashes once a UB issue is detected, in which case the runtime might not be needed at all;
however, this mode would be difficult to deal with, as the sanitized code would have to be executed with the aid of a debugger to extract any useful information.
Lack of a runtime would make UBSan almost unusable in the internals of base libraries such as libc or inside the kernel.
These Clang/LLVM arguments for UBSan are documented as follows in the
official documentation:
Additionally the following flags can be used:
The GCC runtime is a downstream copy of the Clang/LLVM runtime, and it has a reduced number of checks, since it's behind upstream.
GCC developers sync the Clang/LLVM code from time to time.
The first portion of merged NetBSD support for UBSan and ASan landed in GCC 8.x (NetBSD-8.0 uses GCC 5.x, NetBSD-current as of today uses GCC 6.x).
This version of GCC also contains useful compiler attributes to mark certain parts of the code and disable sanitization of certain functions or files.
Format of the reports
I've decided to design the policy for reporting issues differently from the Linux kernel one.
UBSan in the Linux kernel prints out messages in a multiline format with a stack trace:
================================================================================
UBSAN: Undefined behaviour in ../include/linux/bitops.h:110:33
shift exponent 32 is to large for 32-bit type 'unsigned int'
CPU: 0 PID: 0 Comm: swapper Not tainted 4.4.0-rc1+ #26
0000000000000000 ffffffff82403cc8 ffffffff815e6cd6 0000000000000001
ffffffff82403cf8 ffffffff82403ce0 ffffffff8163a5ed 0000000000000020
ffffffff82403d78 ffffffff8163ac2b ffffffff815f0001 0000000000000002
Call Trace:
[
Multiline output requires locking to prevent multiple reports from being interleaved,
as several threads might be printing them out at the same time.
There is no way to perform locking in a portable way that is functional inside libc and the kernel,
across all supported CPUs and, more importantly, within all contexts.
Certain parts of the kernel must not block or delay execution, and in certain parts of the booting
process (either kernel or libc) locking or atomic primitives might be unavailable.
I've decided that it is enough to print a single-line message stating where a problem occurred and what it was,
assuming that printing routines are available and functional.
A typical UBSan report looks this way:
Undefined Behavior in /public/netbsd-root/destdir.amd64/usr/include/ufs/lfs/lfs_accessors.h:747:1, member access within misaligned address 0x7f7ff7934444 for type 'union FINFO' which requires 8 byte alignment
These reports are pretty much self-contained and similar to the ones from the Clang/LLVM runtime:
test.c:4:14: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
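The single-line policy can be sketched in C as follows. This is only an illustration, not the code from ubsan.c: the struct source_location type and the format_report() helper are hypothetical names. The idea shown is that the whole report is assembled into one buffer, so it can be handed to a printing routine in a single call without any locking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical description of where the problem occurred. */
struct source_location {
	const char *filename;
	unsigned line;
	unsigned column;
};

/*
 * Build a complete single-line report in buf.
 * Returns the length the full report needs (as snprintf does), so the
 * caller can detect truncation by comparing it against buflen.
 */
int
format_report(char *buf, size_t buflen, const struct source_location *loc,
    const char *what)
{
	assert(buf != NULL && loc != NULL && what != NULL);

	return snprintf(buf, buflen, "Undefined Behavior in %s:%u:%u, %s\n",
	    loc->filename, loc->line, loc->column, what);
}
```

A caller would fill a struct source_location from the data emitted by the compiler instrumentation and pass the finished buffer to whichever output routine is available in the given context.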
Not implementing __ubsan_on_report()
The Clang/LLVM runtime ships with a callback API for debuggers, which can be notified of sanitizer reports.
A debugger has to define the __ubsan_on_report() function and call __ubsan_get_current_report_data() to collect the report's information.
As an illustration of usage, there is test code shipped with compiler-rt for this feature
(test/ubsan/TestCases/Misc/monitor.cpp):
// Override the definition of __ubsan_on_report from the runtime, just for
// testing purposes.
void __ubsan_on_report(void) {
  const char *IssueKind, *Message, *Filename;
  unsigned Line, Col;
  char *Addr;
  __ubsan_get_current_report_data(&IssueKind, &Message, &Filename, &Line, &Col,
                                  &Addr);
  std::cout << "Issue: " << IssueKind << "\n"
            << "Location: " << Filename << ":" << Line << ":" << Col << "\n"
            << "Message: " << Message << std::endl;
  (void)Addr;
}
Unfortunately, this API is not thread-aware, and guaranteeing thread safety in the implementation
would require excessively complicated code shared between the kernel and libc.
The usability is in any case restricted to debuggers (like an LLDB plugin for UBSan), and
there is already an alternative plugin for the use cases where it would matter.
I've documented the __ubsan_get_current_report_data() routine with the following comment:
/*
* Unimplemented.
*
* The __ubsan_on_report() feature is non trivial to implement in a
* shared code between the kernel and userland. It's also opening
* new sets of potential problems as we are not expected to slow down
* execution of certain kernel subsystems (synchronization issues,
* interrupt handling etc).
*
* A proper solution would need probably a lock-free bounded queue built
* with atomic operations with the property of multiple consumers and
* multiple producers. Maintaining and validating such code is not
* worth the effort.
*
* A legitimate user - besides testing framework - is a debugger plugin
* intercepting reports from the UBSan instrumentation. For such
* scenarios it is better to run the Clang/GCC version.
*/
Reporting channels
The basic reporting channel for kernel messages is the dmesg(8) buffer.
As an implementation detail I'm using variadic output routines (va_list) such as vprintf() ones.
Depending on the type of a report there are two types of calls used in the kernel:
printf(9) - for non-fatal reports, when a kernel can continue execution.
panic(9) - for fatal reports stopping the kernel execution with a panic string.
The userland version has three reporting channels:
standard output (stdout),
standard error (stderr),
syslog (LOG_DEBUG | LOG_USER)
Additionally, a user can configure at runtime whether non-fatal reports are turned into fatal ones.
Fatal reports stop the execution of a process and raise the abort signal (SIGABRT).
The dynamic options in uUBSan can be changed with the LIBC_UBSAN environment variable.
The variable accepts options specified as single characters that either enable or disable a given property.
There are the following options supported:
a - abort on any report,
A - do not abort on any report,
e - output report to stderr,
E - do not output report to stderr,
l - output report to syslog,
L - do not output report to syslog,
o - output report to stdout,
O - do not output report to stdout.
The default configuration is "AeLO".
The flags are parsed from left to right and supersede previous options for the same property.
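The left-to-right parsing rule can be sketched as below. This is a minimal illustration under stated assumptions, not the parser from ubsan.c: struct ubsan_opts and parse_ubsan_opts() are hypothetical names, and silently ignoring unknown characters is a choice made for this sketch only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container for the four runtime properties. */
struct ubsan_opts {
	bool abort_on_report;
	bool to_stderr;
	bool to_syslog;
	bool to_stdout;
};

/*
 * Apply a LIBC_UBSAN-style flag string on top of the documented
 * default "AeLO". Flags are read left to right, so a later flag
 * supersedes an earlier one for the same property.
 */
void
parse_ubsan_opts(struct ubsan_opts *o, const char *flags)
{
	assert(o != NULL);

	/* Default "AeLO": report to stderr only, do not abort. */
	o->abort_on_report = false;
	o->to_stderr = true;
	o->to_syslog = false;
	o->to_stdout = false;

	if (flags == NULL)
		return;

	for (const char *p = flags; *p != '\0'; p++) {
		switch (*p) {
		case 'a': o->abort_on_report = true; break;
		case 'A': o->abort_on_report = false; break;
		case 'e': o->to_stderr = true; break;
		case 'E': o->to_stderr = false; break;
		case 'l': o->to_syslog = true; break;
		case 'L': o->to_syslog = false; break;
		case 'o': o->to_stdout = true; break;
		case 'O': o->to_stdout = false; break;
		default: break;	/* assumption: unknown flags are ignored */
		}
	}
}
```

With this sketch, the string "aEl" yields abort-on-report with output to syslog only, and "aA" ends up with aborting disabled because the later flag wins. In a real program the flag string would come from getenv("LIBC_UBSAN").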
Differences between µUBSan in the kernel, libc and as a standalone library
There are three contexts of operation of µUBSan, and conditional compilation is needed in a few places.
I've been trying to keep the differences to an absolute minimum; they are as follows:
kUBSan uses kernel-specific headers only.
uUBSan uses userland-specific headers, with a slight difference between libc ("namespace.h" internal header usage) and standalone userspace usage (in ATF tests).
uUBSan defines a fallback definition of kernel-specific macros for the ISSET(9) API.
kUBSan does not build and does not handle floating point routines.
kUBSan outputs reports with either printf(9) or panic(9).
uUBSan outputs reports to either stdout, stderr or syslog (or to a combination of them).
kUBSan does not contain any runtime switches and is configured with build options (like whether certain reports are fatal or not) using
the CFLAGS argument and upstream compiler flags.
uUBSan does contain runtime dynamic configuration of the reporting channel and whether a report is turned into a fatal error.
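The fallback for the ISSET(9) macros mentioned in the list above could look roughly like this. The macro bodies follow the conventional definitions of SET, ISSET and CLR; the CHAN_* bits and default_channels() are hypothetical names introduced only for this sketch and are not taken from ubsan.c.

```c
#include <assert.h>

/* Fallback definitions, used only when no header provided them. */
#ifndef SET
#define SET(t, f)	((t) |= (f))
#endif
#ifndef ISSET
#define ISSET(t, f)	((t) & (f))
#endif
#ifndef CLR
#define CLR(t, f)	((t) &= ~(f))
#endif

/* Hypothetical flag bits for the userland reporting channels. */
#define CHAN_STDOUT	0x1u
#define CHAN_STDERR	0x2u
#define CHAN_SYSLOG	0x4u

/* Return the default channel mask: stderr only. */
unsigned
default_channels(void)
{
	unsigned flags = 0;

	SET(flags, CHAN_STDERR);
	assert(ISSET(flags, CHAN_STDERR));
	return flags;
}
```

Guarding each macro with #ifndef lets the same source file build in the kernel, where the macros already exist, and in userland, where they may not.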
MKLIBCSANITIZER
I've implemented a global build option of the distribution: MKLIBCSANITIZER.
A user can build the whole userland including libc, libm, librt, libpthread with a dedicated sanitizer implemented inside libc.
Right now, there is only support for the Undefined Behavior sanitizer with the µUBSan runtime.
I've documented this feature in share/mk/bsd.README with the following text:
MKLIBCSANITIZER If "yes", use the selected sanitizer inside libc to compile
userland programs and libraries as defined in
USE_LIBCSANITIZER, which defaults to "undefined".
The undefined behavior detector is currently the only supported
sanitizer in this mode. Its runtime differs from the UBSan
available in MKSANITIZER, and it is reimplemented from scratch
as micro-UBSan in the user mode (uUBSan). Its code is shared
with the kernel mode variation (kUBSan). The runtime is
stripped down from C++ features, in particular -fsanitize=vptr
is not supported and explicitly disabled. The only runtime
configuration is restricted to the LIBC_UBSAN environment
variable, that is designed to be safe for hardening.
The USE_LIBCSANITIZER value is passed to the -fsanitize=
argument to the compiler in CFLAGS and CXXFLAGS, but not in
LDFLAGS, as the runtime part is located inside libc.
Additional sanitizer arguments can be passed through
LIBCSANITIZERFLAGS.
Default: no
This means that a user can build the distribution with the following command:
./build.sh -V MKLIBCSANITIZER=yes distribution
The number of issues detected is overwhelming.
The Clang/LLVM toolchain - as mentioned above - reports many more potential bugs than GCC,
but with both compilers there are thousands of reports during the execution of ATF tests.
Most of them are reported multiple times, and the number of distinct potential code flaws is around 100.
An example log of execution of the ATF tests with MKLIBCSANITIZER (GCC):
atf-mklibcsanitizer-2018-07-25.txt.
I've also prepared a version that is preprocessed with identical lines removed, and reduced to UBSan reports only:
atf-mklibcsanitizer-2018-07-25-processed.txt.
I've fixed a selection of reported issues, mostly the low-hanging fruit ones.
Part of the reports, especially the misaligned pointer usage ones
(for variables it means that their address has to be a multiple of their required alignment),
might be controversial.
Popular CPU architectures such as X86 are tolerant to misaligned pointer usage
and most programmers are not aware of potential issues in other environments.
I defer further discussion on this topic to other resources, such as the kernel misaligned data pointer policy in other kernels.
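As a small illustration of how such a report can be avoided, the sketch below reads a 32-bit value from a possibly misaligned position by copying bytes instead of dereferencing a cast pointer. The load_u32() helper is a hypothetical name, and this is one common approach rather than the fix applied to any particular NetBSD source file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Read a 32-bit value from a buffer position that may be misaligned.
 * Casting buf + off to uint32_t * and dereferencing it is the kind of
 * access UBSan reports; copying the bytes with memcpy() is well
 * defined for any alignment.
 */
uint32_t
load_u32(const unsigned char *buf, size_t off)
{
	uint32_t v;

	assert(buf != NULL);
	memcpy(&v, buf + off, sizeof(v));
	return v;
}
```

On architectures that tolerate misaligned access, optimizing compilers typically turn the fixed-size memcpy() into a plain load, so the portable form usually costs little.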
Kernel Undefined Behavior Sanitizer
As already noted, kUBSan uses the same runtime as uUBSan with minimal conditional switches.
µUBSan can be enabled in a kernel config with the KUBSAN option.
Although the feature is Machine Independent, I've been testing it with the NetBSD/amd64 kernel.
The sanitizer can be enabled in the kernel configuration with the following diff:
Index: sys/arch/amd64/conf/GENERIC
===================================================================
RCS file: /cvsroot/src/sys/arch/amd64/conf/GENERIC,v
retrieving revision 1.499
diff -u -r1.499 GENERIC
--- sys/arch/amd64/conf/GENERIC 3 Aug 2018 04:35:20 -0000 1.499
+++ sys/arch/amd64/conf/GENERIC 7 Aug 2018 00:10:44 -0000
@@ -111,7 +111,7 @@
#options KGDB # remote debugger
#options KGDB_DEVNAME="\"com\"",KGDB_DEVADDR=0x3f8,KGDB_DEVRATE=9600
makeoptions DEBUG="-g" # compile full symbol table for CTF
-#options KUBSAN # Kernel Undefined Behavior Sanitizer (kUBSan)
+options KUBSAN # Kernel Undefined Behavior Sanitizer (kUBSan)
#options SYSCALL_STATS # per syscall counts
#options SYSCALL_TIMES # per syscall times
#options SYSCALL_TIMES_HASCOUNTER # use 'broken' rdtsc (soekris)
As a reminder, the command to build a kernel is as follows:
./build.sh kernel=GENERIC
A number of issues have been detected and a selection of them has already been fixed.
Some of the fixes change undefined behavior into implementation-specific behavior,
which might be treated as appeasing the sanitizer,
e.g. casting a variable to an unsigned type, shifting bits and casting back to signed.
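The cast-shift-cast pattern described above can be sketched as follows. The shl_int() helper is a hypothetical name used for illustration and is not one of the committed fixes.

```c
#include <assert.h>
#include <limits.h>

/*
 * Left-shift a signed int without undefined behavior: do the shift
 * on the unsigned counterpart, then convert back. The conversion
 * back to int is implementation-defined when the result does not
 * fit, which is exactly the trade-off mentioned in the text.
 */
int
shl_int(int value, unsigned n)
{
	/* Shifting by the type width or more stays undefined. */
	assert(n < sizeof(int) * CHAR_BIT);

	return (int)((unsigned)value << n);
}
```

The sanitizer no longer reports the shift, but the result for values that do not fit in an int still depends on the implementation, so this is better seen as documenting the intent than as removing every portability concern.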
ATF tests
I've implemented 38 test scenarios verifying various types of Undefined Behavior that can be caught by the sanitizer.
There are two sets of tests, C and C++ ones, located in tests/lib/libc/misc/t_ubsan.c and tests/lib/libc/misc/t_ubsanxx.cpp.
Some of the issues apply to both C and C++, others to only one of the two languages.
I've decided to achieve the following purposes of the tests:
Validation of µUBSan.
Validation of compiler instrumentation part (independent from the default compiler runtime correctness).
The following tests have been implemented:
add_overflow_signed
add_overflow_unsigned
builtin_unreachable
cfi_bad_type
cfi_check_fail
divrem_overflow_signed_div
divrem_overflow_signed_mod
dynamic_type_cache_miss
float_cast_overflow
function_type_mismatch
invalid_builtin_ctz
invalid_builtin_ctzl
invalid_builtin_ctzll
invalid_builtin_clz
invalid_builtin_clzl
invalid_builtin_clzll
load_invalid_value_bool
load_invalid_value_enum
missing_return
mul_overflow_signed
mul_overflow_unsigned
negate_overflow_signed
negate_overflow_unsigned
nonnull_arg
nonnull_assign
nonnull_return
out_of_bounds
pointer_overflow
shift_out_of_bounds_signednessbit
shift_out_of_bounds_signedoverflow
shift_out_of_bounds_negativeexponent
shift_out_of_bounds_toolargeexponent
sub_overflow_signed
sub_overflow_unsigned
type_mismatch_misaligned
vla_bound_not_positive
integer_divide_by_zero
float_divide_by_zero
The tests have all been verified to work with the following configurations:
amd64 and i386,
Clang/LLVM (started with 3.8, later switched to 7svn) and GCC 6.x,
C and C++.
Changes merged with the NetBSD sources
Avoid unportable signed integer left shift in intr_calculatemasks()
Avoid unportable signed integer left shift in fd_used()
Try to appease KUBSan in sys/sys/wait.h in W_EXITCODE()
Avoid unportable signed integer left shift in fd_isused()
Avoid unportable signed integer left shift in fd_copy()
Avoid unportable signed integer left shift in fd_unused()
Paper over Undefined Behavior in in6_control1()
Avoid undefined operation in signed integer shift in MAP_ALIGNED()
Avoid Undefined Behavior in pr_item_notouch_get()
Avoid Undefined Behavior in ffs_clusteracct()
Avoid undefined behavior in pr_item_notouch_put()
Avoid undefined behavior in pciiide macros
Avoid undefined behavior in scsipiconf.h in _4ltol() and _4btol()
Avoid undefined behavior in mq_recv1()
Avoid undefined behavior in mq_send1()
Avoid undefined behavior in lwp_ctl_alloc()
Avoid undefined behavior in lwp_ctl_free()
Remove UB from definition of symbols in i915_reg.h
Correct unportable signed integer left shift in i386/amd64 tss code
Remove unaligned access to mpbios_page[] (reverted)
Try to avoid signed integer overflow in callout_softclock()
Avoid undefined behavior of signedness bit shift in ahcisata_core.c
Disable profile and compat 32-bit tests cc sanitizer tests
Disable profile and compat 32-bit c++ sanitizer tests
Use __uint128_t conditionally in aarch64 reg.h
TODO.sanitizers: Remove a finished item
Avoid potential undefined behavior in bta2dpd(8)
Appease GCC in hci_filter_test()
Document the default value of MKSANITIZER in bsd.README
Avoid undefined behavior in ecma167-udf.h
Avoid undefined behavior in left bit shift in jemalloc(3)
Avoid undefined behavior in an ATF test: t_types
Avoid undefined behavior in an ATF test: t_bitops
Avoid undefined behavior semantics in msdosfs_fat.c
Document MKLIBCSANITIZER in bsd.README
Introduce MKLIBCSANITIZER in the share/mk rules
Introduce a new option -S in crunchgen(1)
Specify NOLIBCSANITIZER in x86 bootloader-like code under sys/arch/
Specify NOLIBCSANITIZER for rescue
Avoid undefined behavior in the definition of LAST_FRAG in xdr_rec.c
Avoid undefined behavior in ftok(3)
Avoid undefined behavior in an cpuset.c
Avoid undefined behavior in an inet_addr.c
Avoid undefined behavior in netpgpverify
Avoid undefined behavior in netpgpverify/sha2.c
Avoid undefined behavior in snprintb.c
Specify NOLIBCSANITIZER in lib/csu
Import micro-UBSan (ubsan.c)
Fix build failure in dhcpcd under uUBSan
Fix dri7 build with Clang/LLVM
Fix libGLU build with Clang/LLVM
Fix libXfont2 build with Clang/LLVM on i386
Fix xf86-video-wsfb build with Clang/LLVM
Disable sanitization of -fsanitize=function in libc
Allow to overwrite sanitizer flags for userland
Tidy up the comment in ubsan.c
Register a new directory in common/lib/libc/misc
Import micro-UBSan ATF tests
Register micro-UBSan ATF tests in the distribution
Add a support to build ubsan.c in libc
Appease GCC in the openssh code when built with UBSan
Register kUBSan in the GENERIC amd64 kernel config
Fix distribution lists with MKCATPAGES=yes
Restrict -fno-sanitize=function to Clang/LLVM only
Try to fix the evbppc-powerpc64 build
Summary
The NetBSD community has acquired a new clean-room Undefined Behavior sanitizer runtime, µUBSan,
which is ready for use by the community of developers.
There are three modes of µUBSan:
kUBSan - kernelmode UBSan,
uUBSan - usermode UBSan - as MKLIBCSANITIZER inside libc,
uUBSan - usermode UBSan - as a standalone .c library for use with ATF tests.
A new set of bugs can be detected with a new development tool, ensuring better quality of the NetBSD Operating System.
It's worth noting that a selection of the fixes has been ported and/or pushed to other projects.
Among them, FreeBSD developers merged some of the patches into their sources.
The new runtime is designed to be portable and reasonably licensed (BSD-2-clause) and can be
reused by other operating systems, improving the overall quality in them.
Plan for the next milestone
The Google Summer of Code programming period is over and I intend to finish two leftover tasks:
Port the ptrace(2) attach functionality in honggfuzz to NetBSD.
It will allow catching crash signals more effectively during the fuzzing process.
Resume the porting process (together with the student) of Address Sanitizer to the NetBSD kernel.
This work was sponsored by The NetBSD Foundation.
The NetBSD Foundation is a non-profit organization and welcomes any donations to help us continue funding projects and services to the open-source community. Please consider visiting the following URL, and chip in what you can:
http://netbsd.org/donations/#how-to-donate
Posted over 7 years ago by snj
The NetBSD release engineering team is announcing a new support policy for our release branches. This affects NetBSD 8.0 and subsequent major releases (9.0, 10.0, etc.). All currently supported releases (6.x and 7.x) will keep their existing support policies.
Beginning with NetBSD 8.0, there will be no more teeny branches (e.g., netbsd-8-0).
This means that netbsd-8 will be the only branch for 8.x and there will be only one category of releases derived from 8.0: update releases. The first update release after 8.0 will be 8.1, the next will be 8.2, and so on. Update releases will contain security and bug fixes, and may contain new features and enhancements that are deemed safe for the release branch.
With this simplification of our support policy, users can expect:
More frequent releases
Better long-term support (example: quicker fixes for security issues, since there is only one branch to fix per major release)
New features and enhancements to make their way to binary releases faster (under our current scheme, no major release has received more than two feature updates in its life)
We understand that users of teeny branches may be concerned about the increased number of changes that update releases will bring. Historically, NetBSD stable branches (e.g., netbsd-7) have been managed very conservatively. Under this new scheme, the release engineering team will be even more strict in what changes we allow on the stable branch. Changes that would create issues with backwards compatibility are not allowed, and any changes made that prove to be problematic will be promptly reverted.
The support policy we've had until now was nice in theory, but it has not worked out in practice. We believe that this change will benefit
the vast majority of NetBSD users.
Posted over 7 years ago by martin
The NetBSD Project is pleased to announce NetBSD 8.0, the sixteenth
major release of the NetBSD operating system. It represents many bug fixes, additional hardware support and
new security features.
If you are running an earlier release of NetBSD, we strongly suggest updating to 8.0.
For more details, please see the release notes.
Complete source and binaries for NetBSD are available for download at
many sites around the world and our CDN.
A list of download sites providing FTP,
AnonCVS, and other services may be found at the list of mirrors.
Posted over 7 years ago by Leonardo Taccari
Prepared by Keivan Motavalli as part of GSoC 2018.
Packages may install code (both machine executable code and
interpreted programs), documentation and manual pages, source
headers, shared libraries and other resources such as graphic
elements, sounds, fonts, document templates, translations and
configuration files, or a combination of them.
Configuration files are usually the means through which the behaviour
of software without a user interface is specified. This covers
parts of the operating systems, network daemons and programs in
general that don't come with an interactive graphical or textual
interface as the principal means for setting options.
System wide configuration for operating system software tends to
be kept under /etc, while configuration for software installed via
pkgsrc ends up under LOCALBASE/etc
(e.g., /usr/pkg/etc).
Software packaged as part of pkgsrc provides example configuration
files, if any, which usually get extracted to
LOCALBASE/share/examples/PKGBASE/.
After a package has been extracted pre-pending the
PREFIX(/LOCALBASE?)
to relative file paths as listed in the PLIST file, metadata entries
(such as +BUILD_INFO, +DESC, etc) get extracted to
PKG_DBDIR/PKGNAME-PKGVERSION (creating files under
/usr/pkg/pkgdb/tor-0.3.2.10, as an example).
Some shell scripts also get extracted there, such as +INSTALL and
+DEINSTALL. These incorporate further snippets that get copied
out to distinct files after pkg_add executes the +INSTALL script
with UNPACK as argument.
Two main frameworks exist taking care of installation and deinstallation
operations: pkgtasks, still experimental, is structured as a library
of POSIX-compliant shell scripts implementing functions that get
included from LOCALBASE/share/pkgtasks-1 and called by the
+INSTALL and +DEINSTALL scripts upon execution.
Currently pkgsrc defaults to using the pkginstall
framework, which as mentioned copies out from the main file separate,
monolithic scripts handling the creation and removal of directories
on the system outside the PKGBASE, user accounts, shells, the setup
of fonts... Among these and other duties, +FILES ADD, as
called by +INSTALL, copies with correct permissions files from the
PKGBASE to the system, if required by parts of the package such as init
scripts and configuration files.
Files to be copied are added as comments to the script at package
build time, here's an example:
# FILE: /etc/rc.d/tor cr share/examples/rc.d/tor 0755
# FILE: etc/tor/torrc c share/examples/tor/torrc.sample 0644
"c" indicates that LOCALBASE/share/examples/rc.d/tor
is to be copied in place to /etc/rc.d/tor with permissions 755,
"r" that it is to be handled as an rc.d script.
LOCALBASE/share/examples/tor/torrc.sample, the example file coming
with default configuration options for the tor network daemon, is
to be copied to LOCALBASE/etc/tor/torrc.
As of today, this only happens if the package has never been
installed before and said configuration file doesn't already exist
on the system, this to avoid overwriting explicit option changes
made by the user (or site administrator) when upgrading or reinstalling
packages.
Let's see how it's done.
Actions are defined under case switches:
case $ACTION in
ADD)
        ${SED} -n "/^\# FILE: /{s/^\# FILE: //;p;}" ${SELF} | ${SORT} -u |
        while read file f_flags f_eg f_mode f_user f_group; do
                …
                case "$f_flags:$_PKG_CONFIG:$_PKG_RCD_SCRIPTS" in
                *f*:*:*|[!r]:yes:*|[!r][!r]:yes:*|[!r][!r][!r]:yes:*|*r*:yes:yes)
                        if ${TEST} -f "$file"; then
                                ${ECHO} "${PKGNAME}: $file already exists"
                        elif ${TEST} -f "$f_eg" -o -c "$f_eg"; then
                                ${ECHO} "${PKGNAME}: copying $f_eg to $file"
                                ${CP} $f_eg $file
                        [...]
[...]
Programs and commands are called using variables set in the script
and replaced with platform specific paths at build time, using the
FILES_SUBST facility (see mk/pkginstall/bsd.pkginstall.mk) and
platform tools definitions under mk/tools.
In order to also store revisions of example configuration files in
a version control system, +FILES needs to be modified to always
store revisions in a VCS, and to attempt merging changes non
interactively when a configuration file is already installed on
the system.
In order to avoid breakage, installed configuration is backed up
first in the VCS, separating user-modified files from files that
have been already automatically merged in the past, in order to
allow the administrator to easily restore the last manually edited
file in case of breakage.
Branches are deliberately not used, since not everyone may wish to
get familiar with version control systems technicalities when
attempting to make a broken system work again.
Here's what the modified pkginstall +FILES script does when installing spamd:
case "$f_flags:$_PKG_CONFIG:$_PKG_RCD_SCRIPTS" in
*f*:*:*|[!r]:yes:*|[!r][!r]:yes:*|[!r][!r][!r]:yes:*|*r*:yes:yes)
if ${TEST} "$_PKG_RCD_SCRIPTS" = "no" -a ! -n "$NOVCS"; then
VCS functionality only applies to configuration files, not to rc.d
scripts, and only if the environment variable $NOVCS
is unset. Set it to any value - yes will work :) - to disable the
handling of configuration file revisions.
A small note: these options could, in the future, be parsed by
pkg_add from some configuration file and passed calling
setenv before executing +INSTALL, without the need to
pass them as arguments and thus minimizing code path changes.
$VCSDIR is used to set a working directory for VCS
functionality different from the default one, VARBASE/confrepo.
VCSDIR/automergedfiles
is a textual list made by the absolute paths of installed configuration
files already automatically merged in the past during package
upgrades.
Manually remove entries from the list when you make manual
configuration changes after a package has been automatically merged!
And don't worry: automatic merging is disabled by default, set
$VCSAUTOMERGE to enable it.
When a configuration file already exists on the system, if it is
absent from VCSDIR/automergedfiles, it is assumed to be user
edited and is copied to
VCSDIR/user/path/to/installed/file, a working file that is
REGISTERed (added and committed) to the version control system.
Check it out and restore it from there in case of breakage!
If the file is about to get automatically merged, and the operation
already succeeded in the past, then you can find automatically
merged revisions of installed configuration files under
VCSDIR/automerged/path/to/installed/file;
check out the required revision!
A new script, +VERSIONING, handles operations such as
PREPARE (checks that a vcs repository is initialized),
REGISTER (adds a configuration file from the working directory to the repo),
COMMIT (commit multiple REGISTER actions after all configuration
has been handled by the +FILES script, for VCSs that support atomic
transactions), CHECKOUT (checks out the last revision of a file to
the working directory) and CHECKOUT-FIRST (checks out the first
revision of a file).
The version control system to be used as a backend can be set
through $VCS. It defaults to RCS, the Revision Control System, which
works only locally and doesn't support atomic transactions.
It will get set up as a tool when bootstrapping pkgsrc on platforms
that don't already come with it.
Other backends such as CVS are supported and more will come; these,
being used at the explicit request of the administrator, need to
be already installed and placed in a directory part of $PATH.
Let's see what happens with rcs when NOVCS is unset, installing
spamd (for the first time).
cd pkgsrc/mail/spamd
# bmake
=> Bootstrap dependency digest>=20010302: found digest-20160304
===> Skipping vulnerability checks.
> Fetching spamd-20060330.tar.gz
[...]
bmake install
===> Installing binary package of spamd-20060330nb2
spamd-20060330nb2: Creating group ``_spamd''
spamd-20060330nb2: Creating user ``_spamd''
useradd: Warning: home directory `/var/chroot/spamd' doesn't exist, and -m was not specified
rcs: /var/confrepo/defaults//usr/pkg/etc/RCS/spamd.conf,v: No such file or directory
/var/confrepo/defaults//usr/pkg/etc/spamd.conf,v <-- /var/confrepo/defaults//usr/pkg/etc/spamd.conf
initial revision: 1.1
done
REGISTER /var/confrepo/defaults//usr/pkg/etc/spamd.conf
spamd-20060330nb2: copying /usr/pkg/share/examples/spamd/spamd.conf to /usr/pkg/etc/spamd.conf
===========================================================================
The following files should be created for spamd-20060330nb2:
/etc/rc.d/pfspamd (m=0755)
[/usr/pkg/share/examples/rc.d/pfspamd]
===========================================================================
===========================================================================
$NetBSD: MESSAGE,v 1.1.1.1 2005/06/28 12:43:57 peter Exp $
Don't forget to add the spamd ports to /etc/services:
spamd 8025/tcp # spamd(8)
spamd-cfg 8026/tcp # spamd(8) configuration
===========================================================================
/usr/pkg/etc/spamd.conf didn't already exist, so in the end,
as usual, the example/default configuration
/usr/pkg/share/examples/spamd/spamd.conf gets copied to
PKG_SYSCONFDIR/spamd.conf.
The modified +FILES script also copied the example
file under the VCS working directory at
/var/confrepo/default/share/examples/spamd/spamd.conf
and then REGISTERed this (initial) revision of the default configuration with RCS.
When installing an updated (ouch!) spamd package, the installed
configuration at /usr/pkg/etc/spamd.conf won't get touched, but a
new revision of share/examples/spamd/spamd.conf will get stored
using the revision control system.
For VCSs that support them, remote repositories can also be used via $REMOTEVCS.
From the +VERSIONING comment:
REMOTEVCS, if set, must contain a string that the chosen VCS understands as
an URI to a remote repository, including login credentials if not specified
through other means. This is non standard across different backends, and
additional environment variables and cryptographic material
may need to be provided.
So, when using CVS to access a remote repository over ssh, one should set up keys on the systems,
then set and export
VCS=cvs
CVS_RSH=/usr/bin/ssh
REMOTEVCS=user@hostname:/path/to/existing/repo
Remember to initialize (e.g., mkdir -p /path/to/repo; cd /path/to/repo;
cvs init) the repository on the remote system before attempting to
install new packages.
Let's try to make a configuration change to spamd.conf and reinstall it:
I will enable whitelists by uncommenting
#whitelist:\
# :white:\
# :method=file:\
# :file=/var/mail/whitelist.txt:
...and enable automerge:
export VCSAUTOMERGE=yes
bmake install
[...]
merged with no conflict. installing it to /usr/pkg/etc/spamd.conf!
No conflicts get reported, diff shows no output since the installed
file is already identical to the automerged one, which is installed
again and contains the whitelisting options uncommented:
more /usr/pkg/etc/spamd.conf
# Whitelists are done like this, and must be added to "all" after each
# blacklist from which you want the addresses in the whitelist removed.
#
whitelist:\
:white:\
:method=file:\
:file=/var/mail/whitelist.txt:
Let's simulate instead the addition of a new configuration option
in a new package revision: this shouldn't generate conflicts!
bmake extract
===> Extracting for spamd-20060330nb2
vi work/spamd-20060330/etc/spamd.conf
# spamd config file, read by spamd-setup(8) for spamd(8)
#
# See spamd.conf(5)
# this is a new comment!
#
save, run bmake; bmake install:
===> Installing binary package of spamd-20060330nb2
RCS file: /var/confrepo/defaults//usr/pkg/etc/spamd.conf,v
done
/var/confrepo/defaults//usr/pkg/etc/spamd.conf,v <-- /var/confrepo/defaults//usr/pkg/etc/spamd.conf
new revision: 1.9; previous revision: 1.8
done
REGISTER /var/confrepo/defaults//usr/pkg/etc/spamd.conf
spamd-20060330nb2: /usr/pkg/etc/spamd.conf already exists
spamd-20060330nb2: attempting to merge /usr/pkg/etc/spamd.conf with new defaults!
saving the currently installed revision to /var/confrepo/automerged//usr/pkg/etc/spamd.conf
RCS file: /var/confrepo/automerged//usr/pkg/etc/spamd.conf,v
done
/var/confrepo/automerged//usr/pkg/etc/spamd.conf,v <-- /var/confrepo/automerged//usr/pkg/etc/spamd.conf
file is unchanged; reverting to previous revision 1.1
done
/var/confrepo/defaults//usr/pkg/etc/spamd.conf,v --> /var/confrepo/defaults//usr/pkg/etc/spamd.conf
revision 1.1
done
merged with no conflict. installing it to /usr/pkg/etc/spamd.conf!
--- /usr/pkg/etc/spamd.conf 2018-07-09 22:21:47.310545283 +0200
+++ /var/confrepo/defaults//usr/pkg/etc/spamd.conf.automerge 2018-07-09 22:29:16.597901636 +0200
@@ -5,6 +5,7 @@
# See spamd.conf(5)
#
# Configures whitelists and blacklists for spamd
+# this is a new comment!
#
# Strings follow getcap(3) convention escapes, other than you
# can have a bare colon (:) inside a quoted string and it
revert from the last revision of /var/confrepo/automerged//usr/pkg/etc/spamd.conf if needed
===========================================================================
The following files should be created for spamd-20060330nb2:
/etc/rc.d/pfspamd (m=0755)
[/usr/pkg/share/examples/rc.d/pfspamd]
===========================================================================
===========================================================================
$NetBSD: MESSAGE,v 1.1.1.1 2005/06/28 12:43:57 peter Exp $
Don't forget to add the spamd ports to /etc/services:
spamd 8025/tcp # spamd(8)
spamd-cfg 8026/tcp # spamd(8) configuration
===========================================================================
more /usr/pkg/etc/spamd.conf
[...]
# See spamd.conf(5)
#
# Configures whitelists and blacklists for spamd
# this is a new comment!
#
# Strings follow getcap(3) convention escapes, other than you
[...]
# Whitelists are done like this, and must be added to "all" after each
# blacklist from which you want the addresses in the whitelist removed.
#
whitelist:\
:white:\
:method=file:\
:file=/var/mail/whitelist.txt:
We're set for now. In case of merge conflicts, the user is
notified, the installed configuration file is not replaced and the
conflict can be manually resolved by opening the file (as an example,
/var/confrepo/defaults/usr/pkg/etc/spamd.conf.automerge)
in a text editor.
|
|
Posted
over 7 years
ago
by
Leonardo Taccari
On July 7th and 8th there was
pkgsrcCon 2018
in Berlin,
Germany. It was my first
pkgsrcCon
and it was really really nice... So, let's share a report about it,
what we have done, the talks presented and everything else!
Friday (06/07): Social Event
I arrived by plane at
Berlin Tegel Airport
in the middle of the afternoon. TXL buses were pretty full, but after waiting
for 3 of them I was finally on my way to
Berlin Hauptbahnhof
(a nice thing about the buses is that once they get
too full they start to arrive minute after minute!)
and then took the S7 to Berlin
Jannowitzbrücke station, just a couple of minutes on foot from
republik-berlin (for the Friday
social event).
At 18:00 we met at
republik-berlin for the social event.
We had good burgers there and one^Wtwo^Wsome beers
together!
The place was a bit noisy because of the Belgium vs Brazil World Cup match, but we
still had nice discussions together (and without losing too many
people to the cheering! :))
There was also a table tennis table and spz, maya,
youri and I played (I'm a terrible table tennis player
but it was a lot of fun to play wild-west style, without any rules! :)).
Saturday (07/07): Talks session
Meet & Greet -- Pierre Pronchery (khorben), Thomas Merkel (tm)
Pierre and Thomas welcomed us (aliens! :)) to
c-base. c-base is a space station under
Berlin (or probably one of the oldest hackerspaces, at least old enough that the
word "hackerspace" didn't even exist!).
Slides
(PDF) are available!
Keynote: Beautiful Open Source -- Hugo Teso
Hugo talked about his experience as an open source developer and focused in
particular on how important the user interface is.
He discussed this by examining some projects he worked on (Inguma, Bokken,
Iaitö and Cutter) and
extracting patterns from his experience.
Slides
(PDF) are available!
The state of desktops in pkgsrc -- Youri Mouton (youri)
Youri discussed the state of desktop environments (DE) in pkgsrc, starting
with xfce,
MATE,
LXDE,
KDE and
Defora.
He then discussed the WIP desktop environments:
Cinnamon,
LXQT,
Gnome 3 and
CDE, hardware support and
login managers.
Especially for the WIP desktop environments, help is more than welcome, so if
you're interested in any of that and would like to help (that's also a great way to
get involved in pkgsrc!), please get in touch with youri and/or
take a look at the wip/*/TODO files in pkgsrc-wip!
NetBSD & Mercurial: One year later -- Jörg Sonnenberger (joerg)
Jörg started by discussing Git (citing
High-level Problems with Git and How to Fix Them - Gregory Szorc) and then
discussed why to use Mercurial.
Then he announced the latest changes: hgmaster.NetBSD.org and
anonhg.NetBSD.org, which permit experimenting with Mercurial, and the
source-changes-hg@ and pkgsrc-changes-hg@ mailing
lists.
The talk ended by describing the missing/TODO steps.
Slides (HTML) are available!
Maintaining qmail in 2018 -- Amitai Schleier (schmonz)
Amitai shared his long experience in maintaining
qmail.
A lot of lessons learned in doing that were shared, and it was also fun to see
how, starting as the package MAINTAINER, he became more and more involved
and ended up writing patches and tools for qmail.
Slides (HTML)
are available!
A beginner's introduction to GCC -- Maya Rashish (maya)
Maya discussed GCC. First she gave an overview of the toolchain
(in general) and the corresponding GCC projects, how to pass flags to each of
them and how to stop the compilation process at each of them.
Then she talked about the black magic that happens in the preprocessor, for example, what
happens when a program does an #include and why
__NetBSD__ is defined.
We then saw that with -save-temps it is possible to save all
intermediate results and how helpful this is for debugging possible problems.
Compiler, assembler and linker were then discussed. We also looked at
specfiles, readelf and other GCC internals.
Slides (HTML)
are available!
Handling the workflow of pkgsrc-security -- Leonardo Taccari (leot)
I discussed the workflow of the pkgsrc Security Team (pkgsrc-security).
I gave a brief introduction to the nmh
(new MH) message handling system.
Then I talked about the mission, tasks and workflow of pkgsrc-security.
For the last part of the talk, I tried to put everything together
and showed how to automate some parts of the pkgsrc-security workflow
with nmh and some shell scripting.
Slides (PDF)
are available!
Preaching for releng-pkgsrc -- Benny Siegert (bsiegert)
Benny discussed the pkgsrc Releng team (releng-pkgsrc).
The talk started with the pkgsrc Quarterly Releases.
Since 2003Q4, a new pkgsrc release has been made every quarter. Stable releases are
the basis for binary packages. Security, build and bug fixes get applied over
the lifetime of the release via pullups, until the next quarterly release.
The release procedure and freeze period were also discussed.
Then we examined the life of a pullup. Benny first introduced what a
pullup is, the rules for requesting them and a practical example of how to file
a good pullup request.
Under-the-hood parts of releng were also discussed, for example how tickets are
handled with req, a helper script to ease the pullup, etc.
The talk concluded with the importance of releng-pkgsrc and also a call for
volunteers to join releng-pkgsrc! (Although they're doing really great work,
at the moment there is a shortage of members in releng-pkgsrc, so if you are
interested and would like to join them, please get in touch with them!)
Something old, something new, something borrowed -- Sevan Janiyan (sevan)
Sevan discussed the state of the
NetBSD/macppc port.
Lots of improvements and news happened (particular kudos to
macallan for doing amazing work on the macppc port!)!
HEAD-llvm builds for macppc were added;
awacs(4),
Bluetooth support, IPsec support and Veriexec support are all enabled
by default now.
radeonfb(4) and the XCOFF boot loader had several improvements and now DVI is
supported on the G4 Mac Mini.
The other big news in macppc land is the G5 support, which will
probably also be interesting for possible pkgsrc bulk builds.
Sevan also discussed some current problems (and workarounds!): bulk builds
take time and no modern browser with JavaScript support is easily available right
now, but he also showed how using the macppc port helped to spot several bugs.
Then he discussed Upspin (please also
take a look at the corresponding package in
wip/go-upspin!)
Slides (PDF)
are available!
Magit -- Christoph Badura (bad)
Christoph's talk was a live introduction to Magit, a
Git interface for Emacs.
The talk started by quoting James
Mickens' It Was Never Going to Work, So
Let's Have Some Tea talk, presented at
USENIX LISA15, in which James
Mickens talked about a high-level picture of how Git works.
We then saw how to clone a repository inside Magit, how to navigate the
commits, how to create a new branch, edit a file and look at unstaged changes,
stage just some hunks of a change and commit them and how to rebase them
(everything is just one or two keystrokes away!).
Post conf dinner
After the talks we had some burgers and beers together at
Spud Bencer.
We formed several groups to go there from c-base and I was actually in the
group that went there on foot so it was also a nice chance to sightsee Berlin
(thanks to khorben for being a very nice guide! :)).
Sunday (08/07): Hacking session
An introduction to Forth -- Valery Ushakov (uwe)
On Sunday morning Valery talked about
Forth
from the ground up.
We saw how to implement a Forth interpreter step by step and discussed
threaded code.
Unfortunately the talk was not recorded... However, if
you are curious I suggest taking a look at the
nbuwe/forth Bitbucket repository.
Its internals.txt file also contains a lot of interesting resources
about Forth.
Hacking session
After Valery's talk there was the hacking session where we hacked on pkgsrc,
discussed together, etc.
Late in the afternoon some of us visited the
Computerspielemuseum.
More than 50 years of computer games were covered there and it was fun to also
play several historical and also more recent video games.
We then met again for dinner together at
Potsdamer Platz.
Conclusion
pkgsrcCon 2018 was
really really great!
First of all I would like to thank all the pkgsrcCon organizers:
khorben and tm. It was very well organized and
everything went well, thank you Pierre and Thomas!
A big thank you also to
wiedi: after just a few hours all the recordings of the talks were
shared and that's really impressive!
Thanks also to youri and
Gilberto for photographs.
Last, but not least, thanks to
The NetBSD Foundation for
supporting three developers to attend the conference.
c-base for kindly providing a very nice location
for the pkgsrcCon.
Our sponsors: Defora
Networks for sponsoring the t-shirts and badges for the conference and
SkyLime for sponsoring the catering on
Saturday.
Thank you!
|
|
Posted
over 7 years
ago
by
Kamil Rytarowski
Prepared by Yang Zheng (tomsun.0.7 AT Gmail DOT com) as part of GSoC 2018
This is the second part of the project
of integrating
libFuzzer for the userland applications; you can
learn about the first part of this project
in this
post.
After the
preparation in the first part, I started to fuzz
userland programs with libFuzzer. We chose
five programs:
expr(1)
sed(1)
sh(1)
file(1)
ping(8)
After we fuzzed them with libFuzzer, we also tried
other fuzzers,
namely American Fuzzy
Lop
(AFL), honggfuzz
and Radamsa.
Fuzz Userland Programs with libFuzzer
"LLVM Logo" by
Teresa Chang / All Right Retained by Apple
In this section, I'll introduce how to fuzz the five programs
with libFuzzer. libFuzzer is an
in-process, coverage-guided fuzzing engine. It defines several
interfaces that users can implement:
LLVMFuzzerTestOneInput: fuzzing target
LLVMFuzzerInitialize: initialization function to
access argc and argv
LLVMFuzzerCustomMutator: user-provided custom mutator
LLVMFuzzerCustomCrossOver: user-provided custom cross-over function
Of the above functions, only LLVMFuzzerTestOneInput
must be implemented by every fuzzing program. This function
takes a buffer and the buffer length as input; it is the target that gets
fuzzed again and again. When users want to do some
initialization work with the argc and argv
parameters, they also need to
implement LLVMFuzzerInitialize. With LLVMFuzzerCustomMutator
and LLVMFuzzerCustomCrossOver, users can also change
how a new input buffer is produced from one or two old input
buffers. For more details, you can refer
to this document.
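To make the interfaces above more concrete, here is a minimal sketch of a fuzz target. The two LLVMFuzzer* signatures are the ones libFuzzer expects; count_digits is a hypothetical stand-in for the code under test and is not taken from any of the five programs.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical code under test: counts ASCII digits in a buffer. */
static size_t
count_digits(const uint8_t *buf, size_t len)
{
    size_t n = 0;

    for (size_t i = 0; i < len; i++)
        if (buf[i] >= '0' && buf[i] <= '9')
            n++;
    return n;
}

/* Optional: called once before fuzzing starts, with access to argc/argv. */
int
LLVMFuzzerInitialize(int *argc, char ***argv)
{
    (void)argc;
    (void)argv;
    return 0;
}

/* Required: called again and again, each time with a new input buffer. */
int
LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    /* A violated invariant should crash so that libFuzzer reports it. */
    if (count_digits(data, size) > size)
        abort();
    return 0;
}
```

A file like this would typically be built with something like clang -fsanitize=fuzzer,address,undefined target.c; libFuzzer then supplies its own main function.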
Fuzz Userland Programs with Sanitizers
libFuzzer can be used with different sanitizers. It is
quite simple to use sanitizers together with libFuzzer:
you just need to add the sanitizer names to the option,
like -fsanitize=fuzzer,address,undefined. However, the
memory sanitizer seems to be an exception. When we tried
to use it together with libFuzzer, we got some runtime
errors. The official
document mentions that "using MemorySanitizer (MSAN) with
libFuzzer is possible too, but tricky", but it doesn't explain how to
use it properly.
In the rest of this article, you can assume that we
used the address and undefined sanitizers
together with the fuzzers unless stated otherwise.
Fuzz expr(1) with libFuzzer
expr(1) takes some parameters from the command line as
input and then treats the command line as a whole expression to be
evaluated. An example usage of expr(1) would be
like this:
$ expr 1 + 1
2
This program is relatively easy to fuzz: all we need to do is
transform the original main function into the form
of LLVMFuzzerTestOneInput. Since the implementation of
the parser in expr(1) takes the argc
and argv parameters as input, we need to transform the
buffer provided by LLVMFuzzerTestOneInput into the
format needed by the parser. In the implementation, I assume the
buffer is composed of several strings separated by whitespace
characters (i.e.: ' ', '\t'
and '\n'). Then, we can split the buffer into separate
strings and organize them into the form of the argc
and argv parameters.
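The splitting step could look roughly like the sketch below. This is not the code from the actual harness: split_args, MAX_ARGS and the fixed argv[0] are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ARGS 64 /* hypothetical upper bound, including the NULL slot */

static char progname[] = "expr";

static int
is_sep(char c)
{
    return c == ' ' || c == '\t' || c == '\n';
}

/*
 * Copies the fuzz buffer into a NUL-terminated heap buffer and splits it
 * in place on ' ', '\t' and '\n'. argv must have room for MAX_ARGS
 * entries. Returns argc, or -1 on allocation failure; the caller frees
 * *storage once argv is no longer needed.
 */
static int
split_args(const uint8_t *data, size_t size, char **argv, char **storage)
{
    char *copy = malloc(size + 1);
    size_t i = 0;
    int argc = 0;

    *storage = copy;
    if (copy == NULL)
        return -1;
    if (size > 0)
        memcpy(copy, data, size);
    copy[size] = '\0';

    argv[argc++] = progname;
    while (i < size && argc < MAX_ARGS - 1) {
        while (i < size && is_sep(copy[i]))
            i++;                    /* skip leading separators */
        if (i == size)
            break;
        argv[argc++] = &copy[i];    /* start of the next token */
        while (i < size && !is_sep(copy[i]))
            i++;
        if (i < size)
            copy[i++] = '\0';       /* terminate the token */
    }
    argv[argc] = NULL;
    return argc;
}
```

With the input "1 +\t1\n" this yields argc == 4 and the argument vector expr, 1, +, 1, which is what a command line like expr 1 + 1 would produce.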
However, the first problem appeared as soon as I started to
fuzz expr(1) with this modification. Since
libFuzzer treats
every exit
as an error while fuzzing, there will be a lot of false
positives. Fortunately, the implementation of expr(1)
is simple, so we only need to replace exit(3)
with a return statement. When describing the fuzzing of the other
programs, I'll introduce how to handle exit(3)
and other error handling interfaces more elegantly.
You can also pass a fuzzing dictionary file (to provide keywords)
and initial input cases to libFuzzer, so that it
can produce test cases more smartly. For expr(1), the
dictionary file looks like this:
min="-9223372036854775808"
max="9223372036854775807"
zero="0"
one="1"
negone="-1"
div="/"
mod="%"
add="+"
sub="-"
or="|"
and="&"
And there is only one initial test case:
1 / 2
With this setting, we can quickly reproduce an existing bug which
has been fixed by Kamil
Rytarowski in
this patch: when you feed one of the
-9223372036854775808 / -1 or -9223372036854775808
% -1 expressions to expr(1), you will get
a SIGFPE. After applying the fix for this bug, libFuzzer also
detected an integer overflow bug when feeding expr(1)
with 9223372036854775807 * -3. This bug was detected
with the help of the undefined behavior sanitizer
(UBSan). It has been fixed
in this
commit. The fuzzing of expr(1) can be reproduced
with this
script.
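Both bugs come down to signed 64-bit arithmetic whose result cannot be represented. The sketch below shows one way to guard against them; it is not the fix that was committed to expr(1), the function names are made up, and __builtin_mul_overflow is a GCC/Clang builtin rather than standard C.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Division that refuses the two inputs that trap or are undefined:
 * division by zero and INT64_MIN / -1 (the result does not fit).
 * The same check applies to the % operator.
 */
static bool
checked_div(int64_t a, int64_t b, int64_t *out)
{
    if (b == 0 || (a == INT64_MIN && b == -1))
        return false;
    *out = a / b;
    return true;
}

/* Multiplication that reports overflow instead of invoking UB. */
static bool
checked_mul(int64_t a, int64_t b, int64_t *out)
{
    return !__builtin_mul_overflow(a, b, out);
}
```

A caller would print an error and bail out when one of these returns false, instead of performing the raw operation.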
Fuzz sed(1) with libFuzzer
sed(1) reads from files or standard input
(stdin) and modifies the input as specified by a list
of commands. It is more complicated to fuzz than expr(1)
because it can receive input from several sources, including
command line parameters (commands), standard input (text to be
operated on) and files (both commands and text). After reading the
source code of
sed(1), I have two findings:
The commands are added by the add_compunit
function
The input files (including standard input) are organized by
the s_flist structure and the mf_fgets
function
With these observations, we can manually parse
the libFuzzer buffer and feed it to the interfaces above. So I
organized the buffer as below:
command #1
command #2
...
command #N
// an empty line
text strings
The first several lines are the commands, one line per
command. Then there is an empty line to mark the end of
the command list. Finally, the remaining part of the buffer is the
text to be operated on. After parsing the buffer like this, we can
add the commands one by one with the add_compunit
interface. For the text, since the whole text is already available
in memory, I re-implemented
the mf_fgets interface to get the input directly from
the buffer provided by libFuzzer.
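As an illustration of this layout, the sketch below splits a buffer at the first empty line and reads the text part line by line from memory. The names split_at_empty_line, mem_input and mem_gets are hypothetical; the real harness plugs into add_compunit and mf_fgets, whose signatures are not reproduced here.

```c
#include <stddef.h>

/*
 * Finds the first empty line. The commands occupy [0, *cmd_len) and the
 * return value is the offset where the text starts. Without an empty
 * line, the whole buffer is treated as commands and the text is empty.
 */
static size_t
split_at_empty_line(const char *buf, size_t size, size_t *cmd_len)
{
    if (size > 0 && buf[0] == '\n') {   /* empty command list */
        *cmd_len = 0;
        return 1;
    }
    for (size_t i = 0; i + 1 < size; i++) {
        if (buf[i] == '\n' && buf[i + 1] == '\n') {
            *cmd_len = i + 1;
            return i + 2;
        }
    }
    *cmd_len = size;
    return size;
}

/* In-memory input source, playing the role of a FILE for the text. */
struct mem_input {
    const char *cur;
    const char *end;
};

/*
 * fgets-like reader: copies one line (including its '\n', if any) into
 * dst and NUL-terminates it. Returns 1 on success, 0 at end of input.
 */
static int
mem_gets(struct mem_input *in, char *dst, size_t dstsize)
{
    size_t n = 0;

    if (in->cur == in->end || dstsize < 2)
        return 0;
    while (in->cur < in->end && n + 1 < dstsize) {
        char c = *in->cur++;
        dst[n++] = c;
        if (c == '\n')
            break;
    }
    dst[n] = '\0';
    return 1;
}
```

The command part would then be handed over one line at a time, and the text part is served on demand through the reader.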
As mentioned before in the fuzzing
of expr(1), exit(3) will result in false
positives with libFuzzer. Replacing
exit(3) with a return statement
solves this problem in expr(1), but it will not work
in sed(1) due to the deeper function call
stack. The exit(3) interface is usually used to handle
unexpected cases in programs, so it would be a good idea to
replace it with exceptions. Unfortunately, the programs we fuzzed
are all implemented in C instead
of C++. In the end, I chose to
use the setjmp/longjmp
interfaces to handle it: use the setjmp interface to
define an exit point in the LLVMFuzzerTestOneInput
function, and use longjmp to jump to this point whenever
the original implementation wants to call exit(3).
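A minimal sketch of this pattern is shown below. fuzz_exit and program_main are hypothetical names: the first stands for whatever replaces the exit(3) calls in the fuzzed program, the second for its real entry point.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdint.h>

static jmp_buf fuzz_exit_env;   /* exit point set up by the fuzz target */
static int last_status;         /* status passed to the last fuzz_exit() */

/* Hypothetical replacement for exit(3) inside the fuzzed program. */
static void
fuzz_exit(int status)
{
    last_status = status;
    longjmp(fuzz_exit_env, 1);
}

/* Hypothetical program logic that bails out deep in the call stack. */
static void
program_main(const uint8_t *data, size_t size)
{
    if (size == 0 || data[0] == '!')
        fuzz_exit(1);           /* was: exit(1) */
    last_status = 0;
}

int
LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    /* setjmp returns 0 when called directly, non-zero after longjmp. */
    if (setjmp(fuzz_exit_env) == 0)
        program_main(data, size);
    /* Reached both on normal return and after fuzz_exit(). */
    return 0;
}
```

One caveat of this approach is that memory allocated before the longjmp is not released automatically, so a real harness may need extra cleanup to avoid leak reports.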
The dictionary file for it is like this:
newline="\x0A"
"a\\"
"b"
"c\\"
"d"
"D"
"g"
"G"
"h"
"H"
"i\\"
"l"
"n"
"N"
"p"
"P"
"q"
"t"
"x"
"y"
"!"
":"
"="
"#"
"/"
And here is an initial test case:
s/hello/hi/g
hello, world!
which replaces "hello"
with "hi" in the text "hello,
world!". The fuzzing script of sed(1) can be
found here.
Fuzz sh(1) with libFuzzer
sh(1) is the standard command interpreter for the
system. I chose the evalstring function as the fuzzing
entry point for sh(1). This function takes a string containing the
commands to be executed, so we can directly pass
the libFuzzer input buffer to this function to start
fuzzing. The dictionary file we used looks like this:
"echo"
"ls"
"cat"
"hostname"
"test"
"["
"]"
We can also add other commands and shell script syntax to this
file to trigger other conditions. An initial test case is also
provided:
echo "hello, world!"
You can also reproduce the fuzzing of sh(1)
with this
script.
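One detail worth keeping in mind with a string-based entry point: libFuzzer buffers are not NUL-terminated, so a harness would presumably copy the input before treating it as a C string. The sketch below shows that step only; eval_stub is a hypothetical stand-in and not the real evalstring.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

static size_t last_len; /* length seen by the stub, for illustration */

/* Hypothetical stand-in for the interpreter entry point. */
static void
eval_stub(const char *cmds)
{
    last_len = strlen(cmds);
}

int
LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    char *cmds = malloc(size + 1);

    if (cmds == NULL)
        return 0;
    if (size > 0)
        memcpy(cmds, data, size);
    cmds[size] = '\0';          /* make it a valid C string */
    eval_stub(cmds);
    free(cmds);
    return 0;
}
```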
Fuzz file(1) with libFuzzer
The fuzzing of file(1) has already been done by Christos Zoulas
in this
project. The difference between this program and other programs
from the list is that the main functionality is provided by
the libmagic library. As a result, we can directly fuzz
the important functions (e.g.: magic_buffer) from this
library.
Fuzz ping(8) with libFuzzer
ping(8) is quite different from all of the programs
mentioned above: its main input source is the network instead of
the command line, standard input or files. This is a real challenge
because network data is usually received through the socket interface,
which makes it more complex to map a single buffer onto
the socket model.
Fortunately, ping(8) organizes all the network
interfaces in the form of hooks registered in a structure. So
I re-implemented all the necessary interfaces
(including socket(2), recvfrom(2), sendto(2), poll(2),
etc.) for ping(8). These re-implemented interfaces
take the data from the libFuzzer buffer and
transform it into the data to be accessed through the network
interfaces. After that, we can use libFuzzer to
fuzz the network data for ping(8). The script to
reproduce this can be
found here.
Fuzz Userland Programs with Other Fuzzers
To compare libFuzzer with other fuzzers from different
aspects, including the effort to modify, performance and
functionalities, we also fuzzed these five programs
with AFL, honggfuzz
and radamsa.
Fuzz Programs with AFL and honggfuzz
The AFL and honggfuzz can fuzz the input
from standard input and file. They both provide specific compilers
(such
as afl-cc, afl-clang, hfuzz-cc, hfuzz-clang
and etc.) to fuzz programs with coverage information. So, the basic
process to fuzz programs with them is to:
Use the specific compilers to compile programs with necessary
sanitizers
Run the fuzzed programs with proper command line
parameters
For detailed parameters, you can refer to the scripts
for expr(1), sed(1), sh(1), file(1)
and ping(8).
"Miniature
Lop" (A kind of fuzzy lop) from Wikipedia
/ CC BY-SA
3.0
There is no need to do any modification to
fuzz sed(1), sh(1)
and file(1) with AFL
and honggfuzz, because these programs mainly get input
from standard input or files. But this doesn't mean that they can
achieve the same functionalities as libFuzzer. For
example, to fuzz the sed(1), you may also need to pass
the commands in the command line parameters. This means that you
need to manually specify the commands in the command line and you
cannot fuzz them with AFL and honggfuzz,
because they can only fuzz input from standard input and
files. There is an option of reusing the modifications from the
fuzzing process with libFuzzer, but we need to further
add a main function for the fuzzed program.
"Höngg"
(A quarter in district 10 in Zürich)
by Ikiwaner
/ CC BY-SA
3.0
For expr(1) and ping(8), we even need
more modifications than the libFuzzer solution, because
expr(1) mainly gets input from command line parameters
and ping(8) mainly gets input from the network.
During this period, I have also prepared a package to install
honggfuzz for the pkgsrc-wip
repository. To make it compatible with NetBSD, we have also
contributed to improving the code in the official repository, for more
details, you can refer to
this pull
request.
Fuzz Programs with Radamsa
Radamsa is a test case generator, it works by reading
sample files and generating different interesting
outputs. Radamsa is not dependant on the fuzzed programs,
it is only dependant on the input sample, which means it will not
record the coverage information.
"The
Moomins" ("Radamsa" is a word spoken by a creature in Moomins)
from the comic book cover
by Tove
Jansson
With Radamsa, we can use scripts to fuzz different
programs with different input sources. For
the expr(1),
we can generate the mutated string and store it to a variable in the
shell script and then feed it to the expr(1) in
command line parameters. For
the sed(1),
we can generate both command strings and text
by Radamsa and then feed them by command line
parameters and file separately. For
both sh(1)
and file(1),
we can generate the needed input file by Radamsa in the
shell scripts.
It seems that the shell script and Radamsa combination
can fuzz any kinds of programs, but it encounters some problems
with ping(8). Although Radamsa supports
generating input cases as a network server or client, it doesn't
support the ICMP protocol. This means that we can not
fuzz ping(8) with modifications or help from other
applications.
Comparison Among Different Fuzzers
In this project, we have tried four different
fuzzers: libFuzzer, AFL, honggfuzz
and Radamsa. In this section, I will introduce a
comparison from different aspects.
Modification of Fuzzing
For the programs we mentioned above, here I list the lines of code
we need to modify as a factor of porting difficulties:
expr(1)
sed(1)
sh(1)
file(1)
ping(8)
libFuzzer
128
96
60
48
582
AFL/honggfuzz
142
0
0
0
590
Radamsa
0
0
0
0
N/A
As mentioned before, the libFuzzer needs to modify more
lines for programs who mainly get input from standard input and
files. However, for other programs (i.e.: expr(1)
and ping(8)), the AFL
and honggfuzz need to add more lines of code to get
input from these sources. As for Radamsa, since it only
needs the sample input data to generate outputs, it can fuzz all
programs without modifications except ping(8).
Binary Sizes
The binary sizes for these fuzzers should also be considered if we
want to ship them with NetBSD. The following binary sizes are based
on the NetBSD-current with the nearly newest LLVM
(compiled from source) as an external toolchain:
Dependency
Compilers
Fuzzer
Tools
Total
libFuzzer
0
56MB
N/A
0
56MB
AFL
0
24KB
292KB
152KB
468KB
honggfuzz
36KB
840KB
124KB
0
1000KB
Radamsa
588KB
0
608KB
0
1196KB
The above table shows the space needed to install different
fuzzers. The "Dependency" column shows the size of dependant
library; the "Compilers" column shows the size of compilers used for
re-compiling fuzzed programs; the "Fruzzer" column shows the size of
fuzzer itself and the "Tools" column shows the size of analysis
tools.
For the libFuzzer, if the system has already included
the LLVM together with compiler-rt as the
toolchain, we don't need extra space to import it. The fuzzer
of libFuzzer is compiled together with the user's
program, so the size is not counted. The compiler size shown above
in this table is the size of statically compiled
compiler clang. If we compile it dynamically, then
there will be a plenty of dependant libraries should be considered.
For the AFL, there is no dependant library
except libc, so the size is zero. It will also
introduce some tools
like afl-analyze, afl-cmin and
etc. The honggfuzz is dependant on
the libBlocksRuntime library whose size
is 36KB. This library is also included in
the compiler-rt of LLVM. So, if you have
already installed it, this size can be ignored. As for
the Radamsa, it needs
the Owl Lisp
during the building process. So the size of the dependency is the
size of Owl Lisp interpreter.
Compiler Compatibility
All these fuzzers except libFuzzer are compatible with
both GCC and clang. The AFL
and honggfuzz provide a wrapper for the native compiler,
and the Radamsa does not care about the compilers. As
for the libFuzzer, it is implemented in
the compiler-rt of LLVM, so it cannot
support the GCC compiler.
Support for Sanitizers
All these fuzzers can work together with sanitizers, but only
the libFuzzer can provide a relatively strong guarantee
that it can provide them. The AFL
and honggfuzz, as I mentioned above, provide some
wrappers for the underlying compiler. This means that it is
dependant on the native compiler to decide whether they can fuzz the
programs with the support of sanitizers. The Radamsa
can only fuzz the binary directly, so the programs should be
compiled with the sanitizers first. However, since the sanitizers
are in the compiler-rt together
with libFuzzer, you can directly add some flags of
sanitizers while compiling the fuzzed programs.
Performance
At last, you may wonder how fast are those fuzzers to find an
existing bug. For the above programs we have fuzzed in NetBSD,
only libFuzzer can find two bugs for
the expr(1). However, we cannot assert that
the libFuzzer performs well than others. To further
evaluate the performance of different fuzzers we have used, I choose
some simple functions with bugs to measure how fast they can find
them out. Here is a table to show the time for them to find the
first bug:
libFuzzer
AFL
honggfuzz
Radamsa
DivTest+S
<1s
7s
1s
7s
DivTest
>10min
>10min
2s
>10min
SimpleTest+S
<1s
>10min
1s
>10min
SimpleTest
<1s
>10min
1s
>10min
CxxStringEqTest+S
<1s
>10min
2s
>10min
CxxStringEqTest
>10min
>10min
2s
>10min
CounterTest+S
1s
5min
1s
7min
CounterTest
1s
4min
1s
7min
SimpleHashTest+S
<1s
3s
1s
2s
The "+S" symbol means the version with sanitizers (in this
evaluation, I used address and undefined
sanitizers). In this table, we can observe
that libFuzzer and honggfuzz perform
better than others in most cases. And another point is that
fuzzers can work better with sanitizers. For example, in the case
of DivTest, the primary goal of this test is to
trigger a "divide-by-zero" error, however, when working with
the undefined sanitizer, all these fuzzers will
trigger the "integer overflow" error more quickly. I only present
a part of the interesting results of this evaluation here. You can
refer to
this
script to reproduce some results or do more evaluation by
yourself.
Summary
In the past one month, I mainly contributed to:
Porting the libFuzzer to NetBSD
Preparing a pkgsrc-wip package
for honggfuzz
Fuzzing some userland programs with libFuzzer and other three
different fuzzers
Evaluating different fuzzers from different aspects
Regarding the third contribution, I tried to use different methods
to handle them according to their features. During this period, I
have fortunately found two bugs for the expr(1).
I'd like to thank my mentor Kamil Rytarowski and Christos Zoulas for
their suggestions and proposals. I also want to thank Kamil
Frankowicz for his advice on fuzzing and playing
with AFL. At last, thanks to Google and the NetBSD
community for giving me a good opportunity to work on this project.
[Less]
|
|
Posted over 7 years ago by Kamil Rytarowski
Prepared by Yang Zheng (tomsun.0.7 AT Gmail DOT com) as part of GSoC 2018
This is the second part of the project to integrate libFuzzer with
the userland applications. You can learn about the first part of
this project in this post.
After the preparation done in the first part, I started to fuzz
userland programs with libFuzzer. We chose five programs:
expr(1)
sed(1)
sh(1)
file(1)
ping(8)
After fuzzing them with libFuzzer, we also tried three other
fuzzers: American Fuzzy Lop (AFL), honggfuzz and Radamsa.
Fuzz Userland Programs with libFuzzer
[Image: "LLVM Logo" by Teresa Chang / All Rights Retained by Apple]
In this section, I'll describe how to fuzz the five programs
with libFuzzer. libFuzzer is an in-process, coverage-guided
fuzzing engine. It defines several interfaces to be implemented by
the users:
LLVMFuzzerTestOneInput: fuzzing target
LLVMFuzzerInitialize: initialization function to
access argc and argv
LLVMFuzzerCustomMutator: user-provided custom mutator
LLVMFuzzerCustomCrossOver: user-provided custom cross-over function
Of these functions, only LLVMFuzzerTestOneInput must be implemented
by every fuzzing program. It takes a buffer and the buffer length as
input, and it is the target that gets fuzzed again and again. Users
who need to do some initialization with the argc and argv parameters
also need to implement LLVMFuzzerInitialize. With
LLVMFuzzerCustomMutator and LLVMFuzzerCustomCrossOver, users can
also change how a new input buffer is produced from one or two old
input buffers. For more details, you can refer to this document.
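As a rough illustration of the two most commonly used interfaces, a minimal target could look like the sketch below. The inputs_seen counter is a hypothetical addition that only makes the behavior observable; a real target would hand the buffer to the code under test instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counter, present only so the example has a visible effect. */
static size_t inputs_seen;

/* Optional: called once before fuzzing starts, with pointers to argc/argv. */
int
LLVMFuzzerInitialize(int *argc, char ***argv)
{
    (void)argc;
    (void)argv;
    inputs_seen = 0;
    return 0;
}

/* Required: called once for every input the fuzzer generates. */
int
LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    assert(data != NULL || size == 0);
    inputs_seen++;
    /* ... pass data[0..size) to the code under test here ... */
    return 0;
}
```

When such a file is built with -fsanitize=fuzzer, libFuzzer supplies the main function and drives these callbacks itself.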
Fuzz Userland Programs with Sanitizers
libFuzzer can be used with different sanitizers. Using them
together is quite simple: you just need to add the sanitizer names
to the option, like -fsanitize=fuzzer,address,undefined. However,
the memory sanitizer seems to be an exception. When we tried to use
it together with libFuzzer, we got some runtime errors. The official
document mentions that "using MemorySanitizer (MSAN) with libFuzzer
is possible too, but tricky", but it doesn't explain how to use it
properly.
In the rest of this article, you can assume that we used the address
and undefined sanitizers together with the fuzzers unless stated
otherwise.
Fuzz expr(1) with libFuzzer
expr(1) takes its parameters from the command line and treats the
whole command line as an expression to be evaluated. An example
usage of expr(1) looks like this:
$ expr 1 + 1
2
This program is relatively easy to fuzz: all we need to do is
transform the original main function into the form of
LLVMFuzzerTestOneInput. Since the parser in expr(1) takes the argc
and argv parameters as input, we need to convert the buffer provided
to LLVMFuzzerTestOneInput into the format needed by the parser. In
the implementation, I assume the buffer is composed of several
strings separated by whitespace characters (i.e.: ' ', '\t' and
'\n'). Then, we can split the buffer into separate strings and
organize them into the form of the argc and argv parameters.
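The splitting step could be sketched roughly as follows. The helper name buffer_to_argv, its ownership convention and the fake argv[0] are assumptions made for this illustration; the actual modification to expr(1) may be organized differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

static int
is_separator(char c)
{
    return c == ' ' || c == '\t' || c == '\n';
}

/*
 * Hypothetical helper: copy the fuzzer buffer, cut it in place at every
 * ' ', '\t' and '\n', and build a NULL-terminated argv whose first slot
 * is a fake program name. Returns argc, or -1 if allocation fails.
 * The caller frees *argvp and *storage.
 */
static int
buffer_to_argv(const uint8_t *data, size_t size, char ***argvp, char **storage)
{
    /* At most (size + 1) / 2 tokens, plus argv[0] and the NULL slot. */
    char **argv = calloc(size / 2 + 3, sizeof(*argv));
    char *copy = malloc(size + 1);
    size_t i = 0;
    int argc = 0;

    assert(argvp != NULL && storage != NULL);
    if (argv == NULL || copy == NULL) {
        free(argv);
        free(copy);
        return -1;
    }
    if (size > 0)
        memcpy(copy, data, size);
    copy[size] = '\0';

    argv[argc++] = (char *)"expr";
    while (i < size) {
        /* Turn separators into terminators for the previous token. */
        while (i < size && is_separator(copy[i]))
            copy[i++] = '\0';
        if (i == size)
            break;
        argv[argc++] = &copy[i];
        while (i < size && !is_separator(copy[i]))
            i++;
    }
    argv[argc] = NULL;

    *argvp = argv;
    *storage = copy;
    return argc;
}
```

Copying the buffer first matters because the fuzzer input is read-only and is not NUL-terminated.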
However, the first problem appeared as soon as I started to fuzz
expr(1) with this modification. Since libFuzzer treats every exit
as an error while fuzzing, there were a lot of false positives.
Fortunately, the implementation of expr(1) is simple, so we only
need to replace exit(3) with a return statement. In the sections on
the other programs, I'll describe how to handle exit(3) and other
error handling interfaces more elegantly.
You can also pass a fuzzing dictionary file (to provide keywords)
and initial input cases to libFuzzer, so that it can produce test
cases more intelligently. For expr(1), the dictionary file looks
like this:
min="-9223372036854775808"
max="9223372036854775807"
zero="0"
one="1"
negone="-1"
div="/"
mod="%"
add="+"
sub="-"
or="|"
and="&"
And there is only one initial test case:
1 / 2
With this setting, we can quickly reproduce an existing bug which
has been fixed by Kamil Rytarowski in this patch: when you feed
either the -9223372036854775808 / -1 or the
-9223372036854775808 % -1 expression to expr(1), you get a SIGFPE.
After adopting the fix for this bug, the fuzzer also detected an
integer overflow bug when feeding expr(1) with
9223372036854775807 * -3. This bug was detected with the help of the
undefined behavior sanitizer (UBSan) and has been fixed in this
commit. The fuzzing of expr(1) can be reproduced with this script.
Fuzz sed(1) with libFuzzer
sed(1) reads from files or standard input (stdin) and modifies the
input as specified by a list of commands. It is more complicated to
fuzz than expr(1), as it can receive input from several sources,
including command line parameters (commands), standard input (text
to be operated on) and files (both commands and text). After reading
the source code of sed(1), I made two observations:
The commands are added by the add_compunit
function
The input files (including standard input) are organized by
the s_flist structure and the mf_fgets
function
With these observations, we can manually parse
the libFuzzer buffer with the interfaces above. So I
organized the buffer as below:
command #1
command #2
...
command #N
// an empty line
text strings
The first several lines are the commands, one command per line. An
empty line then marks the end of the command list. Finally, the
remaining part of the buffer is the text to be operated on. After
parsing the buffer like this, we can add the commands one by one
with the add_compunit interface. As for the text, since the whole
text is already available in a buffer, I re-implemented the mf_fgets
interface to get the input directly from the buffer provided by
libFuzzer.
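The buffer layout described above can be parsed along these lines. The struct sed_input type and the functions split_sed_input and next_line are hypothetical names for this sketch; next_line only mimics the idea of a buffer-backed line reader and does not reproduce the real mf_fgets signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical view of the two parts of one fuzzer input. */
struct sed_input {
    const uint8_t *cmds;    /* command lines, one per line */
    size_t cmds_len;
    const uint8_t *text;    /* text to be operated on */
    size_t text_len;
};

/* Split at the first empty line; without one, everything is commands. */
static struct sed_input
split_sed_input(const uint8_t *data, size_t size)
{
    struct sed_input in;
    size_t i;

    assert(data != NULL);
    in.cmds = data;
    in.cmds_len = size;
    in.text = data + size;
    in.text_len = 0;

    if (size > 0 && data[0] == '\n') {
        /* The input starts with the empty line: no commands at all. */
        in.cmds_len = 0;
        in.text = data + 1;
        in.text_len = size - 1;
        return in;
    }
    for (i = 0; i + 1 < size; i++) {
        if (data[i] == '\n' && data[i + 1] == '\n') {
            in.cmds_len = i + 1;    /* keep the newline of the last command */
            in.text = data + i + 2;
            in.text_len = size - (i + 2);
            break;
        }
    }
    return in;
}

/* Copy the next line (including its '\n') into line; returns its length,
 * or 0 once the buffer is exhausted. */
static size_t
next_line(const uint8_t **pos, size_t *left, char *line, size_t cap)
{
    size_t n = 0;

    assert(cap > 0);
    while (*left > 0 && n + 1 < cap) {
        char c = (char)**pos;
        (*pos)++;
        (*left)--;
        line[n++] = c;
        if (c == '\n')
            break;
    }
    line[n] = '\0';
    return n;
}
```

Each line returned for the command part would be handed to add_compunit, while the text part would feed the replacement line reader.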
As mentioned before in the fuzzing of expr(1), exit(3) results in
false positives with libFuzzer. Replacing exit(3) with a return
statement solves this problem in expr(1), but it does not work in
sed(1) due to the deeper function call stack. The exit(3) interface
is usually used to handle unexpected cases in a program, so it would
be a good idea to replace it with exceptions. Unfortunately, the
programs we fuzzed are all implemented in C instead of C++. In the
end, I chose the setjmp/longjmp interfaces to handle it: use setjmp
to define an exit point in the LLVMFuzzerTestOneInput function, and
use longjmp to jump to this point whenever the original
implementation wants to call exit(3).
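A minimal sketch of this pattern is shown below. The names fuzz_exit, run_target and last_exit_status are hypothetical, and run_target merely stands in for the real program logic that would normally call exit(3) somewhere deep in its call stack.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdint.h>

static jmp_buf fuzz_exit_point;
static int last_exit_status = -1;   /* hypothetical, for inspection only */

/* Hypothetical replacement for exit(3) inside the fuzzed program. */
static void
fuzz_exit(int status)
{
    last_exit_status = status;
    longjmp(fuzz_exit_point, 1);
}

/* Stand-in for code deep in the call stack that gives up on bad input. */
static void
deep_helper(const uint8_t *data, size_t size)
{
    if (size == 0 || data[0] == '!')
        fuzz_exit(1);
}

static void
run_target(const uint8_t *data, size_t size)
{
    deep_helper(data, size);
}

int
LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    assert(data != NULL || size == 0);
    if (setjmp(fuzz_exit_point) != 0) {
        /* We arrive here through longjmp: treat it as a normal return. */
        return 0;
    }
    last_exit_status = -1;
    run_target(data, size);
    return 0;
}
```

One caveat of this approach is that memory allocated between setjmp and longjmp is not released automatically, so it may show up as leak reports unless it is cleaned up separately.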
The dictionary file for sed(1) looks like this:
newline="\x0A"
"a\\\"
"b"
"c\\\"
"d"
"D"
"g"
"G"
"h"
"H"
"i\\\"
"l"
"n"
"N"
"p"
"P"
"q"
"t"
"x"
"y"
"!"
":"
"="
"#"
"/"
And here is an initial test case:
s/hello/hi/g
hello, world!
This test case replaces "hello" with "hi" in the text "hello,
world!". The fuzzing script for sed(1) can be found here.
Fuzz sh(1) with libFuzzer
sh(1) is the standard command interpreter for the system. I chose
the evalstring function as the fuzzing entry point for sh(1). This
function takes a string containing the commands to be executed, so
we can pass the libFuzzer input buffer directly to this function to
start fuzzing. The dictionary file we used looks like this:
"echo"
"ls"
"cat"
"hostname"
"test"
"["
"]"
We can also add other commands and shell script syntax to this file
to trigger other conditions. An initial test case is also provided:
echo "hello, world!"
You can also reproduce the fuzzing of sh(1) with this script.
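One detail worth keeping in mind when passing the buffer to a string-based entry point is that libFuzzer inputs are not NUL-terminated. The sketch below copies the input and terminates it before the call; evalstring_stub is a hypothetical stand-in, since the real evalstring lives inside sh(1) and its exact signature is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical observation point for this example. */
static size_t last_command_length;

/* Stand-in for the shell's string evaluation entry point. */
static void
evalstring_stub(const char *cmd)
{
    last_command_length = strlen(cmd);
}

int
LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    char *cmd;

    assert(data != NULL || size == 0);
    cmd = malloc(size + 1);
    if (cmd == NULL)
        return 0;
    if (size > 0)
        memcpy(cmd, data, size);
    cmd[size] = '\0';   /* make the fuzzer bytes a valid C string */

    evalstring_stub(cmd);
    free(cmd);
    return 0;
}
```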
Fuzz file(1) with libFuzzer
The fuzzing of file(1) has been done by Christos Zoulas in this
project. The difference between this program and the other programs
on the list is that its main functionality is provided by the
libmagic library. As a result, we can directly fuzz the important
functions (e.g.: magic_buffer) of this library.
Fuzz ping(8) with libFuzzer
ping(8) is quite different from all of the programs mentioned
above: its main input source is the network instead of the command
line, standard input or files. This is a real challenge, because
network data is usually received through the socket interface, which
makes it more complex to map a single buffer onto the socket model.
Fortunately, ping(8) organizes all of its network interfaces as
hooks registered in a structure. So I re-implemented all the
necessary interfaces (including socket(2), recvfrom(2), sendto(2),
poll(2), etc.) for ping(8). These re-implemented interfaces take the
data from the libFuzzer buffer and transform it into the data
accessed through the network interfaces. After that, we can use
libFuzzer to fuzz the network data for ping(8). The script to
reproduce this can be found here.
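The hook idea can be sketched as follows. Everything here is hypothetical (struct net_hooks, fuzz_recv_packet, receive_loop and the 64-byte packet size); the real structure in ping(8) has its own names and more members, and the real replacements mirror the signatures of the system calls they stand in for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical hook table with a single receive hook. */
struct net_hooks {
    long (*recv_packet)(void *buf, size_t len);
};

static const uint8_t *fuzz_data;
static size_t fuzz_left;
static size_t packets_handled;

/* Fake receive: hand out the next bytes of the fuzzer input as a packet. */
static long
fuzz_recv_packet(void *buf, size_t len)
{
    size_t n = fuzz_left < len ? fuzz_left : len;

    assert(buf != NULL);
    if (n == 0)
        return -1;      /* nothing left: behave like a failed receive */
    memcpy(buf, fuzz_data, n);
    fuzz_data += n;
    fuzz_left -= n;
    return (long)n;
}

static const struct net_hooks fuzz_hooks = { fuzz_recv_packet };

/* Stand-in for the program's receive loop, which only sees the hooks. */
static void
receive_loop(const struct net_hooks *hooks)
{
    uint8_t packet[64];

    while (hooks->recv_packet(packet, sizeof(packet)) > 0)
        packets_handled++;
}

int
LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    fuzz_data = data;
    fuzz_left = size;
    packets_handled = 0;
    receive_loop(&fuzz_hooks);
    return 0;
}
```

Because the program only talks to the network through the hook table, swapping the table is enough to redirect all input to the fuzzer buffer.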
Fuzz Userland Programs with Other Fuzzers
To compare libFuzzer with other fuzzers in several respects,
including the modification effort, performance and functionality,
we also fuzzed these five programs with AFL, honggfuzz and Radamsa.
Fuzz Programs with AFL and honggfuzz
AFL and honggfuzz can fuzz input from standard input and files.
They both provide dedicated compilers (such as afl-cc, afl-clang,
hfuzz-cc, hfuzz-clang, etc.) to build programs with coverage
information. So, the basic process to fuzz programs with them is
to:
Use the specific compilers to compile programs with necessary
sanitizers
Run the fuzzed programs with proper command line
parameters
For detailed parameters, you can refer to the scripts
for expr(1), sed(1), sh(1), file(1) and ping(8).
[Image: "Miniature Lop" (a kind of fuzzy lop), from Wikipedia / CC BY-SA 3.0]
No modifications are needed to fuzz sed(1), sh(1) and file(1) with
AFL and honggfuzz, because these programs mainly get their input
from standard input or files. But this doesn't mean that they can
achieve the same functionality as libFuzzer. For example, to fuzz
sed(1), you may also need to pass the commands as command line
parameters. These commands have to be specified manually on the
command line and cannot be fuzzed by AFL and honggfuzz, because
they can only fuzz input from standard input and files. One option
is to reuse the modifications made for fuzzing with libFuzzer, but
then we additionally need to add a main function to the fuzzed
program.
[Image: "Höngg" (a quarter in district 10 in Zürich) by Ikiwaner / CC BY-SA 3.0]
For expr(1) and ping(8), we need even more modifications than with
the libFuzzer solution, because expr(1) mainly gets its input from
command line parameters and ping(8) mainly gets its input from the
network.
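The extra main function needed to reuse a libFuzzer-style target with AFL or honggfuzz mostly has to read one input and call the target once. The sketch below shows that reading step; run_one_input and the trivial LLVMFuzzerTestOneInput are hypothetical, and the added main would then be as small as "return run_one_input(stdin);".

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical observation point for this example. */
static size_t last_size;

/* Stand-in for the target already written for libFuzzer. */
int
LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    (void)data;
    last_size = size;
    return 0;
}

/* Read the whole stream into memory and run the target on it once.
 * Returns 0 on success and 1 on a read or allocation failure. */
static int
run_one_input(FILE *in)
{
    size_t cap = 4096, len = 0, n;
    uint8_t *buf, *tmp;

    assert(in != NULL);
    buf = malloc(cap);
    if (buf == NULL)
        return 1;
    for (;;) {
        n = fread(buf + len, 1, cap - len, in);
        len += n;
        if (n == 0)
            break;
        if (len == cap) {
            /* Buffer full: double it before reading more. */
            tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return 1;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    if (ferror(in)) {
        free(buf);
        return 1;
    }
    LLVMFuzzerTestOneInput(buf, len);
    free(buf);
    return 0;
}
```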
During this period, I have also prepared a package to install
honggfuzz for the pkgsrc-wip repository. To make it compatible with
NetBSD, we have also contributed improvements to the code in the
official repository. For more details, you can refer to this pull
request.
Fuzz Programs with Radamsa
Radamsa is a test case generator. It works by reading sample files
and generating different interesting outputs from them. Radamsa does
not depend on the fuzzed program, only on the input sample, which
means it does not record any coverage information.
[Image: "The Moomins" ("Radamsa" is a word spoken by a creature in Moomins), from the comic book cover by Tove Jansson]
With Radamsa, we can use scripts to fuzz different programs with
different input sources. For expr(1), we can generate the mutated
string, store it in a variable in the shell script and then feed it
to expr(1) as command line parameters. For sed(1), we can generate
both the command strings and the text with Radamsa and then feed
them through command line parameters and a file, respectively. For
both sh(1) and file(1), we can generate the needed input file with
Radamsa in the shell scripts.
It seems that the combination of shell scripts and Radamsa can
fuzz any kind of program, but it runs into problems with ping(8).
Although Radamsa supports generating input cases as a network
server or client, it doesn't support the ICMP protocol. This means
that we cannot fuzz ping(8) without modifications or help from
other applications.
Comparison Among Different Fuzzers
In this project, we have tried four different fuzzers: libFuzzer,
AFL, honggfuzz and Radamsa. In this section, I will compare them
from different aspects.
Modification of Fuzzing
For the programs mentioned above, the following table lists the
number of lines of code we needed to modify, as an indicator of the
porting difficulty:
               expr(1)  sed(1)  sh(1)  file(1)  ping(8)
libFuzzer      128      96      60     48       582
AFL/honggfuzz  142      0       0      0        590
Radamsa        0        0       0      0        N/A
As mentioned before, libFuzzer requires more modified lines for
programs that mainly get their input from standard input and files.
However, for the other programs (i.e.: expr(1) and ping(8)), AFL
and honggfuzz need additional lines of code to get input from these
sources. As for Radamsa, since it only needs sample input data to
generate outputs, it can fuzz all programs except ping(8) without
modifications.
Binary Sizes
The binary sizes of these fuzzers should also be considered if we
want to ship them with NetBSD. The following binary sizes are based
on NetBSD-current with a nearly up-to-date LLVM (compiled from
source) as an external toolchain:
           Dependency  Compilers  Fuzzer  Tools  Total
libFuzzer  0           56MB       N/A     0      56MB
AFL        0           24KB       292KB   152KB  468KB
honggfuzz  36KB        840KB      124KB   0      1000KB
Radamsa    588KB       0          608KB   0      1196KB
The table above shows the space needed to install the different
fuzzers. The "Dependency" column shows the size of the libraries the
fuzzer depends on; the "Compilers" column shows the size of the
compilers used for re-compiling fuzzed programs; the "Fuzzer" column
shows the size of the fuzzer itself; and the "Tools" column shows
the size of the analysis tools.
For libFuzzer, if the system already includes LLVM together with
compiler-rt as the toolchain, we don't need extra space to import
it. The fuzzer part of libFuzzer is compiled together with the
user's program, so its size is not counted. The compiler size shown
in the table is the size of the statically linked clang compiler. If
we link it dynamically, there are plenty of dependent libraries that
should be considered as well.
AFL has no library dependencies except libc, so the size is zero.
It also comes with some tools such as afl-analyze, afl-cmin, etc.
honggfuzz depends on the libBlocksRuntime library, whose size is
36KB. This library is also included in the compiler-rt of LLVM, so
if you have already installed it, this size can be ignored. As for
Radamsa, it needs Owl Lisp during the build process, so the size of
the dependency is the size of the Owl Lisp interpreter.
Compiler Compatibility
All of these fuzzers except libFuzzer are compatible with both GCC
and clang. AFL and honggfuzz provide a wrapper for the native
compiler, and Radamsa does not care about the compiler. As for
libFuzzer, it is implemented in the compiler-rt of LLVM, so it
cannot support the GCC compiler.
Support for Sanitizers
All of these fuzzers can work together with sanitizers, but only
libFuzzer can give a relatively strong guarantee that they are
available. AFL and honggfuzz, as I mentioned above, provide
wrappers for the underlying compiler, so it depends on the native
compiler whether they can fuzz programs with the support of
sanitizers. Radamsa can only fuzz the binary directly, so the
programs should be compiled with the sanitizers first. For
libFuzzer, however, the sanitizers live in compiler-rt together with
the fuzzer itself, so you can directly add the sanitizer flags
while compiling the fuzzed programs.
Performance
Finally, you may wonder how fast these fuzzers are at finding an
existing bug. Among the above programs we fuzzed on NetBSD, only
libFuzzer found bugs: the two in expr(1). However, this alone does
not let us assert that libFuzzer performs better than the others. To
further evaluate the performance of the fuzzers we used, I chose
some simple functions with bugs and measured how fast each fuzzer
can find them. The following table shows the time each fuzzer needed
to find the first bug:
                   libFuzzer  AFL     honggfuzz  Radamsa
DivTest+S          <1s        7s      1s         7s
DivTest            >10min     >10min  2s         >10min
SimpleTest+S       <1s        >10min  1s         >10min
SimpleTest         <1s        >10min  1s         >10min
CxxStringEqTest+S  <1s        >10min  2s         >10min
CxxStringEqTest    >10min     >10min  2s         >10min
CounterTest+S      1s         5min    1s         7min
CounterTest        1s         4min    1s         7min
SimpleHashTest+S   <1s        3s      1s         2s
The "+S" suffix denotes the version built with sanitizers (in this
evaluation, I used the address and undefined sanitizers). In this
table, we can observe that libFuzzer and honggfuzz perform better
than the others in most cases. Another point is that fuzzers can
work better with sanitizers. For example, the primary goal of
DivTest is to trigger a "divide-by-zero" error; however, when
working with the undefined sanitizer, all of these fuzzers trigger
an "integer overflow" error more quickly. I only present a part of
the interesting results of this evaluation here. You can refer to
this script to reproduce some of the results or do more evaluation
yourself.
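One plausible way to picture the DivTest observation is a target that divides a constant by (k - a), where a comes from the input. This shape, the constant k and the function names below are assumptions for illustration and are not taken from the test itself. The sketch classifies an input without executing any undefined behavior.

```c
#include <assert.h>
#include <limits.h>

enum outcome { OUTCOME_OK, OUTCOME_DIVIDE_BY_ZERO, OUTCOME_SIGNED_OVERFLOW };

/* Classify what evaluating c / (k - a) would do for int operands. */
static enum outcome
classify(int k, int a)
{
    /* k - a leaves the int range when it exceeds INT_MAX or drops
     * below INT_MIN; both checks are written so they cannot overflow. */
    if ((a < 0 && k > INT_MAX + a) || (a > 0 && k < INT_MIN + a))
        return OUTCOME_SIGNED_OVERFLOW;
    if (k - a == 0)
        return OUTCOME_DIVIDE_BY_ZERO;
    return OUTCOME_OK;
}
```

Under these assumptions, exactly one value of a reaches the division by zero, while a whole range of very negative values makes the subtraction overflow first. That would be consistent with the sanitizer reporting the overflow sooner.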
Summary
Over the past month, my main contributions were:
Porting the libFuzzer to NetBSD
Preparing a pkgsrc-wip package
for honggfuzz
Fuzzing some userland programs with libFuzzer and three other
fuzzers
Evaluating different fuzzers from different aspects
Regarding the third contribution, I tried to handle each program
with a method suited to its characteristics. During this period, I
was fortunate to find two bugs in expr(1).
I'd like to thank my mentor Kamil Rytarowski and Christos Zoulas for
their suggestions and proposals. I also want to thank Kamil
Frankowicz for his advice on fuzzing and playing
with AFL. Finally, thanks to Google and the NetBSD
community for giving me a good opportunity to work on this project.
|