
News

Posted over 7 years ago by Kamil Rytarowski
Sanitization is the process of detecting potential issues during program execution. Sanitizers instrument the program (embedding checks into the generated code) and interact with a runtime that is linked into the executable, either statically or dynamically.

In the past month, I've finished functional support of MKSANITIZER with Address Sanitizer and Undefined Behavior Sanitizer. MKSANITIZER uses the default compiler runtime shipped with Clang and GCC and ported to NetBSD.

Over the past month, I've also implemented from scratch a clean-room version of the UBSan runtime. The initial motivation was the need to develop one for catching undefined behavior reports (unspecified code semantics in a compiled executable) in the NetBSD kernel. However, since a new runtime had to be written anyway, I decided to go two steps further and design code that is also usable inside libc and as a standalone library (a linked .c source file) for use in ATF regression tests.

The µUBSan (micro-UBSan) design and implementation

The original Clang/LLVM runtime is written in C++ and uses features that are not available in libc or in the NetBSD kernel. The Linux kernel version of the UBSan runtime is written natively in C, mostly without additional unportable dependencies; however, it is GPL-licensed, and its set of features goes beyond the code generation support in the newest version of Clang/LLVM from trunk (7svn).

The implementation of µUBSan is located in common/lib/libc/misc/ubsan.c. The implementation is mostly Machine Independent; however, it assumes a typical 32-bit or 64-bit CPU with support for typical floating point types. Unlike the other implementations that I know, µUBSan is implemented without triggering Undefined Behavior.

The whole implementation inside a single C file

I've decided to write the whole µUBSan runtime as a single self-contained .c source-code file, as this makes it easier for every interested party to reuse it.
This runtime can be either inserted inline or linked into the program.

The runtime is written in C because C is more portable, it is the native language of libc and the kernel, and it makes it easier to match the symbols generated by the compilers (Clang and GCC). Under the C++ ABI, C++ symbols are mangled, so in order to match the names requested by the compiler instrumentation I would have to mark part of the code as C anyway (extern "C"). Moreover, using C++ without its runtime features is not a typical way to use C++, and unless someone is a C++ enthusiast it does not buy much. Finally, the programming language used for the runtime is almost orthogonal to the instrumented programming language (although it must have at minimum the C-level ability to work with pointers and elementary types).

A set of supported reporting features

µUBSan supports all report types except -fsanitize=vptr. Supporting vptr requires low-level C++ routines to introspect and validate the low-level parts of the C++ code (such as the vtable, compatibility of dynamic types, etc.). While all other UBSan checks are performed directly in the instrumented, inlined code, the vptr check is performed in the runtime. This means that most of the work done by a minimal UBSan runtime is deserializing reports into verbose messages and printing them out. There is also an option to configure the compiler to inject a crash once a UB issue is detected, in which case the runtime might not be needed at all; however, this mode would be difficult to work with, as the sanitized code would have to be executed under a debugger to extract any useful information. The lack of a runtime would make UBSan almost unusable in the internals of base libraries such as libc or inside the kernel.
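To make the "deserialize a report and print it" role of a minimal runtime more concrete, here is a small sketch in C. It is not the code from ubsan.c: the struct layouts are simplified, and every name starting with demo_ is hypothetical. In a real runtime the handler would carry an unmangled name that the compiler instrumentation calls (for example __ubsan_handle_shift_out_of_bounds) and would follow the compiler's actual data layout.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, simplified static data emitted next to an instrumented check. */
struct demo_source_location {
	const char *filename;
	uint32_t line;
	uint32_t column;
};

struct demo_type_descriptor {
	const char *type_name;
	uint16_t bit_width;
};

struct demo_shift_data {
	struct demo_source_location loc;
	struct demo_type_descriptor lhs_type;
};

/*
 * Turn the static data plus the offending run-time value into one
 * human-readable line. The message wording is illustrative only.
 */
int
demo_format_shift_report(char *buf, size_t len,
    const struct demo_shift_data *data, unsigned long exponent)
{
	return snprintf(buf, len,
	    "Undefined Behavior in %s:%u:%u, shift exponent %lu is too large "
	    "for %u-bit type '%s'",
	    data->loc.filename, (unsigned)data->loc.line,
	    (unsigned)data->loc.column, exponent,
	    (unsigned)data->lhs_type.bit_width, data->lhs_type.type_name);
}

/* Stand-in for a report handler: format the message, then print it. */
void
demo_handle_shift_out_of_bounds(const struct demo_shift_data *data,
    unsigned long exponent)
{
	char buf[256];

	demo_format_shift_report(buf, sizeof(buf), data, exponent);
	fprintf(stderr, "%s\n", buf);
}
```

The instrumented code does the actual checking; a handler of this kind only has to decode what it is handed, which is why the runtime can stay small.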
These Clang/LLVM arguments for UBSan are documented as follows in the official documentation: Additionally the following flags can be used:

The GCC runtime is a downstream copy of the Clang/LLVM runtime, and it has a reduced number of checks, since it lags behind upstream. GCC developers sync the Clang/LLVM code from time to time. The first portion of merged NetBSD support for UBSan and ASan landed in GCC 8.x (NetBSD-8.0 uses GCC 5.x, NetBSD-current as of today uses GCC 6.x). This version of GCC also contains useful compiler attributes to mark certain parts of the code and disable sanitization of certain functions or files.

Format of the reports

I've decided to design the policy for reporting issues differently from the Linux kernel one. UBSan in the Linux kernel prints out messages in a multiline format with a stack trace:

    ================================================================================
    UBSAN: Undefined behaviour in ../include/linux/bitops.h:110:33
    shift exponent 32 is to large for 32-bit type 'unsigned int'
    CPU: 0 PID: 0 Comm: swapper Not tainted 4.4.0-rc1+ #26
     0000000000000000 ffffffff82403cc8 ffffffff815e6cd6 0000000000000001
     ffffffff82403cf8 ffffffff82403ce0 ffffffff8163a5ed 0000000000000020
     ffffffff82403d78 ffffffff8163ac2b ffffffff815f0001 0000000000000002
    Call Trace:
     [

Multiline output has the drawback of requiring locking to prevent the interleaving of multiple reports, since several threads might be printing them at the same time. There is no way to perform locking portably such that it works inside libc and the kernel, across all supported CPUs and, more importantly, in all contexts. Certain parts of the kernel must not block or delay execution, and in certain phases of the boot process (of either the kernel or libc) locking or atomic primitives might be unavailable.
I've decided that it is enough to print a single-line message stating where a problem occurred and what it was, assuming that printing routines are available and functional. A typical UBSan report looks like this:

    Undefined Behavior in /public/netbsd-root/destdir.amd64/usr/include/ufs/lfs/lfs_accessors.h:747:1, member access within misaligned address 0x7f7ff7934444 for type 'union FINFO' which requires 8 byte alignment

These reports are pretty much self-contained and similar to the ones from the Clang/LLVM runtime:

    test.c:4:14: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'

Not implementing __ubsan_on_report()

The Clang/LLVM runtime ships with a callback API for debuggers, which can be notified about sanitizer reports. A debugger has to define the __ubsan_on_report() function and call __ubsan_get_current_report_data() to collect the report's information. As an illustration of its usage, compiler-rt ships test code for this feature (test/ubsan/TestCases/Misc/monitor.cpp):

    // Override the definition of __ubsan_on_report from the runtime, just for
    // testing purposes.
    void __ubsan_on_report(void) {
      const char *IssueKind, *Message, *Filename;
      unsigned Line, Col;
      char *Addr;
      __ubsan_get_current_report_data(&IssueKind, &Message, &Filename, &Line, &Col,
                                      &Addr);

      std::cout << "Issue: " << IssueKind << "\n"
                << "Location: " << Filename << ":" << Line << ":" << Col << "\n"
                << "Message: " << Message << std::endl;

      (void)Addr;
    }

Unfortunately this API is not thread-aware, and guaranteeing thread safety in the implementation would require excessively complicated code shared between the kernel and libc. Its usability is in any case restricted to debuggers (such as an LLDB plugin for UBSan), and an alternative already exists for the use cases where this matters. I've documented the __ubsan_get_current_report_data() routine with the following comment:

    /*
     * Unimplemented.
     *
     * The __ubsan_on_report() feature is non trivial to implement in a
     * shared code between the kernel and userland. It's also opening
     * new sets of potential problems as we are not expected to slow down
     * execution of certain kernel subsystems (synchronization issues,
     * interrupt handling etc).
     *
     * A proper solution would need probably a lock-free bounded queue built
     * with atomic operations with the property of multiple consumers and
     * multiple producers. Maintaining and validating such code is not
     * worth the effort.
     *
     * A legitimate user - besides testing framework - is a debugger plugin
     * intercepting reports from the UBSan instrumentation. For such
     * scenarios it is better to run the Clang/GCC version.
     */

Reporting channels

The basic reporting channel for kernel messages is the dmesg(8) buffer. As an implementation detail, I'm using variadic output routines (va_list) such as the vprintf() family. Depending on the type of a report, two kinds of calls are used in the kernel:

    printf(9) - for non-fatal reports, when the kernel can continue execution.
    panic(9) - for fatal reports, stopping kernel execution with a panic string.

The userland version has three reporting channels:

    standard output (stdout),
    standard error (stderr),
    syslog (LOG_DEBUG | LOG_USER).

Additionally, a user can tune at run time whether non-fatal reports are turned into fatal ones. Fatal messages stop the execution of the process and raise the abort signal (SIGABRT).

The dynamic options in uUBSan can be changed with the LIBC_UBSAN environment variable. The variable accepts options specified as single characters that either enable or disable a given option. The following options are supported:

    a - abort on any report,
    A - do not abort on any report,
    e - output report to stderr,
    E - do not output report to stderr,
    l - output report to syslog,
    L - do not output report to syslog,
    o - output report to stdout,
    O - do not output report to stdout.
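The single-character option scheme is simple enough to sketch in a few lines of C. The following is not taken from ubsan.c; the struct, its field names and the function name are hypothetical, and silently ignoring unknown characters is an assumption of this sketch. It starts from the "AeLO" defaults that the post gives for LIBC_UBSAN and lets later characters override earlier ones.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical option set; the field names are illustrative only. */
struct demo_ubsan_opts {
	bool abort_on_report;
	bool to_stderr;
	bool to_syslog;
	bool to_stdout;
};

/*
 * Parse a LIBC_UBSAN-style string. A lowercase letter enables an option,
 * the matching uppercase letter disables it, and characters are applied
 * from left to right so that a later one supersedes an earlier one.
 */
void
demo_parse_ubsan_opts(struct demo_ubsan_opts *o, const char *env)
{
	/* Defaults equivalent to "AeLO": no abort, stderr only. */
	o->abort_on_report = false;
	o->to_stderr = true;
	o->to_syslog = false;
	o->to_stdout = false;

	if (env == NULL)
		return;

	for (; *env != '\0'; env++) {
		switch (*env) {
		case 'a': o->abort_on_report = true; break;
		case 'A': o->abort_on_report = false; break;
		case 'e': o->to_stderr = true; break;
		case 'E': o->to_stderr = false; break;
		case 'l': o->to_syslog = true; break;
		case 'L': o->to_syslog = false; break;
		case 'o': o->to_stdout = true; break;
		case 'O': o->to_stdout = false; break;
		default:
			/* Assumption of this sketch: ignore unknown characters. */
			break;
		}
	}
}
```

A caller would typically pass the result of getenv("LIBC_UBSAN") from <stdlib.h>, which is NULL when the variable is unset and therefore leaves the defaults in place.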
The default configuration is "AeLO". The flags are parsed from left to right and supersede previous options for the same property.

Differences between µUBSan in the kernel, libc and as a standalone library

There are three contexts in which µUBSan operates, and conditional compilation is needed in a few places. I've tried to keep the differences to an absolute minimum; they are as follows:

    kUBSan uses kernel-specific headers only.
    uUBSan uses userland-specific headers, with a slight difference between libc ("namespace.h" internal header usage) and standalone userspace usage (in ATF tests).
    uUBSan defines a fallback definition of kernel-specific macros for the ISSET(9) API.
    kUBSan does not build and does not handle floating point routines.
    kUBSan outputs reports with either printf(9) or panic(9).
    uUBSan outputs reports to either stdout, stderr or syslog (or to a combination of them).
    kUBSan does not contain any runtime switches and is configured with build options (like whether certain reports are fatal or not) using the CFLAGS argument and upstream compiler flags.
    uUBSan does contain runtime dynamic configuration of the reporting channel and of whether a report is turned into a fatal error.

MKLIBCSANITIZER

I've implemented MKLIBCSANITIZER, a global build option of the distribution. A user can build the whole userland, including libc, libm, librt and libpthread, with a dedicated sanitizer implemented inside libc. Right now, there is only support for the Undefined Behavior sanitizer with the µUBSan runtime.

I've documented this feature in share/mk/bsd.README with the following text:

    MKLIBCSANITIZER If "yes", use the selected sanitizer inside libc to compile userland programs and libraries as defined in USE_LIBCSANITIZER, which defaults to "undefined". The undefined behavior detector is currently the only supported sanitizer in this mode. Its runtime differs from the UBSan available in MKSANITIZER, and it is reimplemented from scratch as micro-UBSan in the user mode (uUBSan). Its code is shared with the kernel mode variation (kUBSan). The runtime is stripped down from C++ features, in particular -fsanitize=vptr is not supported and explicitly disabled. The only runtime configuration is restricted to the LIBC_UBSAN environment variable, that is designed to be safe for hardening. The USE_LIBCSANITIZER value is passed to the -fsanitize= argument to the compiler in CFLAGS and CXXFLAGS, but not in LDFLAGS, as the runtime part is located inside libc. Additional sanitizer arguments can be passed through LIBCSANITIZERFLAGS. Default: no

This means that a user can build the distribution with the following command:

    ./build.sh -V MKLIBCSANITIZER=yes distribution

The number of issues detected is overwhelming. The Clang/LLVM toolchain, as mentioned above, reports many more potential bugs than GCC, but with both compilers there are thousands of reports during the execution of the ATF tests. Most of them are reported multiple times, and the number of distinct potential code flaws is around 100.

An example log of an execution of the ATF tests with MKLIBCSANITIZER (GCC): atf-mklibcsanitizer-2018-07-25.txt. I've also prepared a version that is preprocessed with identical lines removed and reduced to UBSan reports only: atf-mklibcsanitizer-2018-07-25-processed.txt.

I've fixed a selection of the reported issues, mostly the low-hanging fruit. Some of the reports, especially the misaligned pointer usage ones (for a variable, alignment means that its address has to be a multiple of its size), might be controversial. Popular CPU architectures such as x86 tolerate misaligned pointer usage, and most programmers are not aware of the potential issues in other environments. I defer further discussion of this topic to other resources, such as the misaligned data pointer policies of other kernels.
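As a rough illustration of the misalignment reports, the sketch below shows an alignment test matching the "address is a multiple of the size" description, plus a common way to read a value from a possibly misaligned address without dereferencing a misaligned pointer. The function names are hypothetical, and memcpy() is one widely used technique, not necessarily what any of the NetBSD fixes used.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Return non-zero if p is a multiple of alignment. Converting the pointer
 * to uintptr_t for this purpose is common practice on mainstream platforms.
 */
int
demo_is_aligned(const void *p, size_t alignment)
{
	return ((uintptr_t)p % alignment) == 0;
}

/*
 * Read a 32-bit value from an arbitrary byte address. Copying through
 * memcpy() avoids dereferencing a misaligned uint32_t pointer, which is
 * the pattern a misaligned-access report complains about.
 */
uint32_t
demo_load_u32(const unsigned char *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}
```

By contrast, an expression such as *(const uint32_t *)(buf + 1) is what an instrumented build would flag, even though it usually appears to work on x86.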
Kernel Undefined Behavior Sanitizer

As already noted, kUBSan uses the same runtime as uUBSan with a minimal set of conditional switches. µUBSan can be enabled in a kernel config with the KUBSAN option. Although the feature is Machine Independent, I've been testing it with the NetBSD/amd64 kernel.

The sanitizer can be enabled in the kernel configuration with the following diff:

    Index: sys/arch/amd64/conf/GENERIC
    ===================================================================
    RCS file: /cvsroot/src/sys/arch/amd64/conf/GENERIC,v
    retrieving revision 1.499
    diff -u -r1.499 GENERIC
    --- sys/arch/amd64/conf/GENERIC 3 Aug 2018 04:35:20 -0000 1.499
    +++ sys/arch/amd64/conf/GENERIC 7 Aug 2018 00:10:44 -0000
    @@ -111,7 +111,7 @@
     #options KGDB # remote debugger
     #options KGDB_DEVNAME="\"com\"",KGDB_DEVADDR=0x3f8,KGDB_DEVRATE=9600
     makeoptions DEBUG="-g" # compile full symbol table for CTF
    -#options KUBSAN # Kernel Undefined Behavior Sanitizer (kUBSan)
    +options KUBSAN # Kernel Undefined Behavior Sanitizer (kUBSan)
     #options SYSCALL_STATS # per syscall counts
     #options SYSCALL_TIMES # per syscall times
     #options SYSCALL_TIMES_HASCOUNTER # use 'broken' rdtsc (soekris)

As a reminder, the command to build a kernel is as follows:

    ./build.sh kernel=GENERIC

A number of issues have been detected, and a selection of them has already been fixed. Some of the fixes change undefined behavior into implementation-specific behavior, which might be regarded as appeasing the sanitizer, e.g. casting a variable to an unsigned type, shifting bits and casting back to signed.

ATF tests

I've implemented 38 test scenarios verifying various types of Undefined Behavior that can be caught by the sanitizer. There are two sets of tests, C and C++ ones, and they are located in tests/lib/libc/misc/t_ubsan.c and tests/lib/libc/misc/t_ubsanxx.cpp. Some of the issues are common to C and C++, others are specific to just C or just C++. The tests serve the following purposes:

    Validation of µUBSan.
    Validation of the compiler instrumentation part (independent of the correctness of the default compiler runtime).

The following tests have been implemented:

    add_overflow_signed
    add_overflow_unsigned
    builtin_unreachable
    cfi_bad_type
    cfi_check_fail
    divrem_overflow_signed_div
    divrem_overflow_signed_mod
    dynamic_type_cache_miss
    float_cast_overflow
    function_type_mismatch
    invalid_builtin_ctz
    invalid_builtin_ctzl
    invalid_builtin_ctzll
    invalid_builtin_clz
    invalid_builtin_clzl
    invalid_builtin_clzll
    load_invalid_value_bool
    load_invalid_value_enum
    missing_return
    mul_overflow_signed
    mul_overflow_unsigned
    negate_overflow_signed
    negate_overflow_unsigned
    nonnull_arg
    nonnull_assign
    nonnull_return
    out_of_bounds
    pointer_overflow
    shift_out_of_bounds_signednessbit
    shift_out_of_bounds_signedoverflow
    shift_out_of_bounds_negativeexponent
    shift_out_of_bounds_toolargeexponent
    sub_overflow_signed
    sub_overflow_unsigned
    type_mismatch_misaligned
    vla_bound_not_positive
    integer_divide_by_zero
    float_divide_by_zero

The tests have all been verified to work with the following configurations: amd64 and i386, Clang/LLVM (started with 3.8, later switched to 7svn) and GCC 6.x, C and C++.
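The kind of fix described above for signed shifts (cast to unsigned, shift, cast back) can be sketched as follows. The helper name is hypothetical and the snippet is not taken from any of the NetBSD commits; it only illustrates the pattern.

```c
#include <stdint.h>

/*
 * Left-shift a signed 32-bit value without undefined behavior.
 * Shifting the signed value directly (e.g. 1 << 31 on a 32-bit int) is
 * what the shift_out_of_bounds checks flag. Doing the shift on the
 * unsigned type is fully defined for count < 32; the conversion back to
 * int32_t is implementation-defined for results above INT32_MAX, and
 * GCC and Clang document it as a two's complement wrap.
 */
int32_t
demo_shl_s32(int32_t value, unsigned int count)
{
	/* The caller must still guarantee count < 32. */
	return (int32_t)((uint32_t)value << count);
}
```

This trades undefined behavior for implementation-defined behavior, which is why the post notes that such changes might be seen as merely appeasing the sanitizer.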
Changes merged with the NetBSD sources

    Avoid unportable signed integer left shift in intr_calculatemasks()
    Avoid unportable signed integer left shift in fd_used()
    Try to appease KUBSan in sys/sys/wait.h in W_EXITCODE()
    Avoid unportable signed integer left shift in fd_isused()
    Avoid unportable signed integer left shift in fd_copy()
    Avoid unportable signed integer left shift in fd_unused()
    Paper over Undefined Behavior in in6_control1()
    Avoid undefined operation in signed integer shift in MAP_ALIGNED()
    Avoid Undefined Behavior in pr_item_notouch_get()
    Avoid Undefined Behavior in ffs_clusteracct()
    Avoid undefined behavior in pr_item_notouch_put()
    Avoid undefined behavior in pciiide macros
    Avoid undefined behavior in scsipiconf.h in _4ltol() and _4btol()
    Avoid undefined behavior in mq_recv1()
    Avoid undefined behavior in mq_send1()
    Avoid undefined behavior in lwp_ctl_alloc()
    Avoid undefined behavior in lwp_ctl_free()
    Remove UB from definition of symbols in i915_reg.h
    Correct unportable signed integer left shift in i386/amd64 tss code
    Remove unaligned access to mpbios_page[] (reverted)
    Try to avoid signed integer overflow in callout_softclock()
    Avoid undefined behavior of signedness bit shift in ahcisata_core.c
    Disable profile and compat 32-bit tests cc sanitizer tests
    Disable profile and compat 32-bit c++ sanitizer tests
    Use __uint128_t conditionally in aarch64 reg.h
    TODO.sanitizers: Remove a finished item
    Avoid potential undefined behavior in bta2dpd(8)
    Appease GCC in hci_filter_test()
    Document the default value of MKSANITIZER in bsd.README
    Avoid undefined behavior in ecma167-udf.h
    Avoid undefined behavior in left bit shift in jemalloc(3)
    Avoid undefined behavior in an ATF test: t_types
    Avoid undefined behavior in an ATF test: t_bitops
    Avoid undefined behavior semantics in msdosfs_fat.c
    Document MKLIBCSANITIZER in bsd.README
    Introduce MKLIBCSANITIZER in the share/mk rules
    Introduce a new option -S in crunchgen(1)
    Specify NOLIBCSANITIZER in x86 bootloader-like code under sys/arch/
    Specify NOLIBCSANITIZER for rescue
    Avoid undefined behavior in the definition of LAST_FRAG in xdr_rec.c
    Avoid undefined behavior in ftok(3)
    Avoid undefined behavior in an cpuset.c
    Avoid undefined behavior in an inet_addr.c
    Avoid undefined behavior in netpgpverify
    Avoid undefined behavior in netpgpverify/sha2.c
    Avoid undefined behavior in snprintb.c
    Specify NOLIBCSANITIZER in lib/csu
    Import micro-UBSan (ubsan.c)
    Fix build failure in dhcpcd under uUBSan
    Fix dri7 build with Clang/LLVM
    Fix libGLU build with Clang/LLVM
    Fix libXfont2 build with Clang/LLVM on i386
    Fix xf86-video-wsfb build with Clang/LLVM
    Disable sanitization of -fsanitize=function in libc
    Allow to overwrite sanitizer flags for userland
    Tidy up the comment in ubsan.c
    Register a new directory in common/lib/libc/misc
    Import micro-UBSan ATF tests
    Register micro-UBSan ATF tests in the distribution
    Add a support to build ubsan.c in libc
    Appease GCC in the openssh code when built with UBSan
    Register kUBSan in the GENERIC amd64 kernel config
    Fix distribution lists with MKCATPAGES=yes
    Restrict -fno-sanitize=function to Clang/LLVM only
    Try to fix the evbppc-powerpc64 build

Summary

The NetBSD community has acquired a new clean-room Undefined Behavior sanitizer runtime, µUBSan, which is ready for use by the community of developers.

There are three modes of µUBSan:

    kUBSan - kernelmode UBSan,
    uUBSan - usermode UBSan - as MKLIBCSANITIZER inside libc,
    uUBSan - usermode UBSan - as a standalone .c library for use with ATF tests.

A new set of bugs can be detected with this new development tool, ensuring better quality of the NetBSD Operating System.

It is worth noting that a selection of the fixes has been ported and/or pushed to other projects. Among them, FreeBSD developers merged some of the patches into their sources.

The new runtime is designed to be portable and is reasonably licensed (BSD-2-clause), so it can be reused by other operating systems, improving their overall quality.
Plan for the next milestone

The Google Summer of Code programming period is over and I intend to finish two leftover tasks:

    Port the ptrace(2) attach functionality in honggfuzz to NetBSD. It will allow catching crash signals more effectively during the fuzzing process.
    Resume the porting process (together with the student) of Address Sanitizer to the NetBSD kernel.

This work was sponsored by The NetBSD Foundation.

The NetBSD Foundation is a non-profit organization and welcomes any donations to help us continue funding projects and services to the open-source community. Please consider visiting the following URL, and chip in what you can: http://netbsd.org/donations/#how-to-donate
Posted over 7 years ago by snj
The NetBSD release engineering team is announcing a new support policy for our release branches. This affects NetBSD 8.0 and subsequent major releases (9.0, 10.0, etc.). All currently supported releases (6.x and 7.x) will keep their existing support policies.

Beginning with NetBSD 8.0, there will be no more teeny branches (e.g., netbsd-8-0). This means that netbsd-8 will be the only branch for 8.x and there will be only one category of releases derived from 8.0: update releases. The first update release after 8.0 will be 8.1, the next will be 8.2, and so on. Update releases will contain security and bug fixes, and may contain new features and enhancements that are deemed safe for the release branch.

With this simplification of our support policy, users can expect:

    More frequent releases
    Better long-term support (example: quicker fixes for security issues, since there is only one branch to fix per major release)
    New features and enhancements to make their way to binary releases faster (under our current scheme, no major release has received more than two feature updates in its life)

We understand that users of teeny branches may be concerned about the increased number of changes that update releases will bring. Historically, NetBSD stable branches (e.g., netbsd-7) have been managed very conservatively. Under this new scheme, the release engineering team will be even stricter about which changes we allow on the stable branch. Changes that would create issues with backwards compatibility are not allowed, and any changes that prove to be problematic will be promptly reverted.

The support policy we've had until now was nice in theory, but it has not worked out in practice. We believe that this change will improve the situation for the vast majority of NetBSD users.
Posted over 7 years ago by martin
The NetBSD Project is pleased to announce NetBSD 8.0, the sixteenth major release of the NetBSD operating system. It represents many bug fixes, additional hardware support and new security features. If you are running an earlier release of NetBSD, we strongly suggest updating to 8.0. For more details, please see the release notes.

Complete source and binaries for NetBSD are available for download at many sites around the world and our CDN. A list of download sites providing FTP, AnonCVS, and other services may be found at the list of mirrors.
Posted over 7 years ago by Leonardo Taccari
Prepared by Keivan Motavalli as part of GSoC 2018.

Packages may install code (both machine executable code and interpreted programs), documentation and manual pages, source headers, shared libraries and other resources such as graphic elements, sounds, fonts, document templates, translations and configuration files, or a combination of them.

Configuration files are usually the means through which the behaviour of software without a user interface is specified. This covers parts of the operating system, network daemons and programs in general that don't come with an interactive graphical or textual interface as the principal means of setting options.

System-wide configuration for operating system software tends to be kept under /etc, while configuration for software installed via pkgsrc ends up under LOCALBASE/etc (e.g., /usr/pkg/etc).

Software packaged as part of pkgsrc provides example configuration files, if any, which usually get extracted to LOCALBASE/share/examples/PKGBASE/.

After a package has been extracted, pre-pending the PREFIX(/LOCALBASE?) to relative file paths as listed in the PLIST file, metadata entries (such as +BUILD_INFO, +DESC, etc.) get extracted to PKG_DBDIR/PKGNAME-PKGVERSION (creating files under /usr/pkg/pkgdb/tor-0.3.2.10, as an example).

Some shell scripts also get extracted there, such as +INSTALL and +DEINSTALL. These incorporate further snippets that get copied out to distinct files after pkg_add executes the +INSTALL script with UNPACK as argument.

Two main frameworks exist to take care of installation and deinstallation operations: pkgtasks, still experimental, is structured as a library of POSIX-compliant shell scripts implementing functions that get included from LOCALBASE/share/pkgtasks-1 and called by the +INSTALL and +DEINSTALL scripts upon execution.
Currently pkgsrc defaults to using the pkginstall framework, which, as mentioned, copies separate, monolithic scripts out of the main file; these handle the creation and removal of directories on the system outside the PKGBASE, user accounts, shells, the setup of fonts... Among these and other duties, +FILES ADD, as called by +INSTALL, copies files with the correct permissions from the PKGBASE to the system, if required by parts of the package such as init scripts and configuration files.

Files to be copied are added as comments to the script at package build time. Here's an example:

    # FILE: /etc/rc.d/tor cr share/examples/rc.d/tor 0755
    # FILE: etc/tor/torrc c share/examples/tor/torrc.sample 0644

"c" indicates that LOCALBASE/share/examples/rc.d/tor is to be copied in place to /etc/rc.d/tor with permissions 755, "r" that it is to be handled as an rc.d script.

LOCALBASE/share/examples/tor/torrc.sample, the example file that comes with default configuration options for the tor network daemon, is to be copied to LOCALBASE/etc/tor/torrc.

As of today, this only happens if the package has never been installed before and said configuration file doesn't already exist on the system. This avoids overwriting explicit option changes made by the user (or site administrator) when upgrading or reinstalling packages.

Let's see how it's done... actions are defined under case switches:

    case $ACTION in
    ADD)
        ${SED} -n "/^\# FILE: /{s/^\# FILE: //;p;}" ${SELF} | ${SORT} -u |
        while read file f_flags f_eg f_mode f_user f_group; do
        …
        case "$f_flags:$_PKG_CONFIG:$_PKG_RCD_SCRIPTS" in
        *f*:*:*|[!r]:yes:*|[!r][!r]:yes:*|[!r][!r][!r]:yes:*|*r*:yes:yes)
            if ${TEST} -f "$file"; then
                ${ECHO} "${PKGNAME}: $file already exists"
            elif ${TEST} -f "$f_eg" -o -c "$f_eg"; then
                ${ECHO} "${PKGNAME}: copying $f_eg to $file"
                ${CP} $f_eg $file
                [...]
        [...]
Programs and commands are called using variables set in the script and replaced with platform-specific paths at build time, using the FILES_SUBST facility (see mk/pkginstall/bsd.pkginstall.mk) and the platform tools definitions under mk/tools.

In order to also store revisions of example configuration files in a version control system, +FILES needs to be modified to always store revisions in a VCS, and to attempt merging changes non-interactively when a configuration file is already installed on the system.

In order to avoid breakage, the installed configuration is backed up first in the VCS, separating user-modified files from files that have already been automatically merged in the past, so that the administrator can easily restore the last manually edited file in case of breakage.

Branches are deliberately not used, since not everyone may wish to get familiar with the technicalities of version control systems when attempting to make a broken system work again.

Here's what the modified pkginstall +FILES script does when installing spamd:

    case "$f_flags:$_PKG_CONFIG:$_PKG_RCD_SCRIPTS" in
    *f*:*:*|[!r]:yes:*|[!r][!r]:yes:*|[!r][!r][!r]:yes:*|*r*:yes:yes)
        if ${TEST} "$_PKG_RCD_SCRIPTS" = "no" -a ! -n "$NOVCS"; then

VCS functionality only applies to configuration files, not to rc.d scripts, and only if the environment variable $NOVCS is unset. Set it to any value - yes will work :) - to disable the handling of configuration file revisions.

A small note: these options could, in the future, be parsed by pkg_add from some configuration file and passed by calling setenv before executing +INSTALL, without the need to pass them as arguments, thus minimizing code path changes.

$VCSDIR is used to set a working directory for VCS functionality different from the default one, VARBASE/confrepo.

VCSDIR/automergedfiles is a textual list of the absolute paths of installed configuration files that have already been automatically merged in the past during package upgrades.
Manually remove entries from the list when you make manual configuration changes after a package has been automatically merged! And don't worry: automatic merging is disabled by default; set $VCSAUTOMERGE to enable it.

When a configuration file already exists on the system and is absent from VCSDIR/automergedfiles, it is assumed to be user-edited and is copied to VCSDIR/user/path/to/installed/file, a working file that is REGISTERed (added and committed) to the version control system. Check it out and restore it from there in case of breakage!

If the file is about to get automatically merged, and the operation already succeeded in the past, then you can find automatically merged revisions of installed configuration files under VCSDIR/automerged/path/to/installed/file; check out the required revision!

A new script, +VERSIONING, handles operations such as PREPARE (checks that a VCS repository is initialized), REGISTER (adds a configuration file from the working directory to the repo), COMMIT (commits multiple REGISTER actions after all configuration has been handled by the +FILES script, for VCSs that support atomic transactions), CHECKOUT (checks out the last revision of a file to the working directory) and CHECKOUT-FIRST (checks out the first revision of a file).

The version control system to be used as a backend can be set through $VCS. It defaults to RCS, the Revision Control System, which works only locally and doesn't support atomic transactions. It will get set up as a tool when bootstrapping pkgsrc on platforms that don't already come with it.

Other backends such as CVS are supported and more will come; these, being used at the explicit request of the administrator, need to be already installed and placed in a directory that is part of $PATH.

Let's see what happens with rcs when NOVCS is unset, installing spamd (for the first time):

    cd pkgsrc/mail/spamd
    # bmake
    => Bootstrap dependency digest>=20010302: found digest-20160304
    ===> Skipping vulnerability checks.
    > Fetching spamd-20060330.tar.gz
    [...]
    bmake install
    ===> Installing binary package of spamd-20060330nb2
    spamd-20060330nb2: Creating group ``_spamd''
    spamd-20060330nb2: Creating user ``_spamd''
    useradd: Warning: home directory `/var/chroot/spamd' doesn't exist, and -m was not specified
    rcs: /var/confrepo/defaults//usr/pkg/etc/RCS/spamd.conf,v: No such file or directory
    /var/confrepo/defaults//usr/pkg/etc/spamd.conf,v <-- /var/confrepo/defaults//usr/pkg/etc/spamd.conf
    initial revision: 1.1
    done
    REGISTER /var/confrepo/defaults//usr/pkg/etc/spamd.conf
    spamd-20060330nb2: copying /usr/pkg/share/examples/spamd/spamd.conf to /usr/pkg/etc/spamd.conf
    ===========================================================================
    The following files should be created for spamd-20060330nb2:
    /etc/rc.d/pfspamd (m=0755) [/usr/pkg/share/examples/rc.d/pfspamd]
    ===========================================================================
    ===========================================================================
    $NetBSD: MESSAGE,v 1.1.1.1 2005/06/28 12:43:57 peter Exp $
    Don't forget to add the spamd ports to /etc/services:
    spamd 8025/tcp # spamd(8)
    spamd-cfg 8026/tcp # spamd(8) configuration
    ===========================================================================

/usr/pkg/etc/spamd.conf didn't already exist, so in the end, as usual, the example/default configuration /usr/pkg/share/examples/spamd/spamd.conf gets copied to PKG_SYSCONFDIR/spamd.conf.

The modified +FILES script also copied the example file under the VCS working directory at /var/confrepo/default/share/examples/spamd/spamd.conf, and then REGISTERed this (initial) revision of the default configuration with RCS.

When installing an updated (ouch!) spamd package, the installed configuration at /usr/pkg/etc/spamd.conf won't get touched, but a new revision of share/examples/spamd/spamd.conf will get stored using the revision control system.

For VCSs that support them, remote repositories can also be used via $REMOTEVCS.
From the +VERSIONING comment:

    REMOTEVCS, if set, must contain a string that the chosen VCS understands as an URI to a remote repository, including login credentials if not specified through other means. This is non standard across different backends, and additional environment variables and cryptographic material may need to be provided.

So, if using CVS to access a remote repository over ssh, one should set up keys on the systems, then set and export

    VCS=cvs
    CVS_RSH=/usr/bin/ssh
    REMOTEVCS=user@hostname:/path/to/existing/repo

Remember to initialize the repository on the remote system (e.g., mkdir -p /path/to/repo; cd /path/to/repo; cvs init) before attempting to install new packages.

Let's try to make a configuration change to spamd.conf and reinstall it: I will enable whitelists by uncommenting

    #whitelist:\
    # :white:\
    # :method=file:\
    # :file=/var/mail/whitelist.txt:

...and enable automerge:

    export VCSAUTOMERGE=yes
    bmake install
    [...]
    merged with no conflict. installing it to /usr/pkg/etc/spamd.conf!

No conflicts get reported, and diff shows no output since the installed file is already identical to the automerged one, which is installed again and contains the whitelisting options uncommented:

    more /usr/pkg/etc/spamd.conf
    # Whitelists are done like this, and must be added to "all" after each
    # blacklist from which you want the addresses in the whitelist removed.
    #
    whitelist:\
        :white:\
        :method=file:\
        :file=/var/mail/whitelist.txt:

Let's instead simulate the addition of a new configuration option in a new package revision: this shouldn't generate conflicts!

    bmake extract
    ===> Extracting for spamd-20060330nb2
    vi work/spamd-20060330/etc/spamd.conf
    # spamd config file, read by spamd-setup(8) for spamd(8)
    #
    # See spamd.conf(5)
    # this is a new comment!
    # save, run bmake; bmake install:
    ===> Installing binary package of spamd-20060330nb2
    RCS file: /var/confrepo/defaults//usr/pkg/etc/spamd.conf,v
    done
    /var/confrepo/defaults//usr/pkg/etc/spamd.conf,v <-- /var/confrepo/defaults//usr/pkg/etc/spamd.conf
    new revision: 1.9; previous revision: 1.8
    done
    REGISTER /var/confrepo/defaults//usr/pkg/etc/spamd.conf
    spamd-20060330nb2: /usr/pkg/etc/spamd.conf already exists
    spamd-20060330nb2: attempting to merge /usr/pkg/etc/spamd.conf with new defaults!
    saving the currently installed revision to /var/confrepo/automerged//usr/pkg/etc/spamd.conf
    RCS file: /var/confrepo/automerged//usr/pkg/etc/spamd.conf,v
    done
    /var/confrepo/automerged//usr/pkg/etc/spamd.conf,v <-- /var/confrepo/automerged//usr/pkg/etc/spamd.conf
    file is unchanged; reverting to previous revision 1.1
    done
    /var/confrepo/defaults//usr/pkg/etc/spamd.conf,v --> /var/confrepo/defaults//usr/pkg/etc/spamd.conf
    revision 1.1
    done
    merged with no conflict. installing it to /usr/pkg/etc/spamd.conf!
    --- /usr/pkg/etc/spamd.conf 2018-07-09 22:21:47.310545283 +0200
    +++ /var/confrepo/defaults//usr/pkg/etc/spamd.conf.automerge 2018-07-09 22:29:16.597901636 +0200
    @@ -5,6 +5,7 @@
     # See spamd.conf(5)
     #
     # Configures whitelists and blacklists for spamd
    +# this is a new comment!
     #
     # Strings follow getcap(3) convention escapes, other than you
     # can have a bare colon (:) inside a quoted string and it
    revert from the last revision of /var/confrepo/automerged//usr/pkg/etc/spamd.conf if needed
    ===========================================================================
    The following files should be created for spamd-20060330nb2:
    /etc/rc.d/pfspamd (m=0755)
    [/usr/pkg/share/examples/rc.d/pfspamd]
    ===========================================================================
    ===========================================================================
    $NetBSD: MESSAGE,v 1.1.1.1 2005/06/28 12:43:57 peter Exp $
    Don't forget to add the spamd ports to /etc/services:
    spamd 8025/tcp # spamd(8)
    spamd-cfg 8026/tcp # spamd(8) configuration
    ===========================================================================

    more /usr/pkg/etc/spamd.conf
    [...]
    # See spamd.conf(5)
    #
    # Configures whitelists and blacklists for spamd
    # this is a new comment!
    #
    # Strings follow getcap(3) convention escapes, other than you
    [...]
    # Whitelists are done like this, and must be added to "all" after each
    # blacklist from which you want the addresses in the whitelist removed.
    #
    whitelist:\
        :white:\
        :method=file:\
        :file=/var/mail/whitelist.txt:

We're set for now. If the merge produces conflicts, the user is notified, the installed configuration file is not replaced, and the conflict can be resolved manually by opening the file (as an example, /var/confrepo/defaults/usr/pkg/etc/spamd.conf.automerge) in a text editor.
Posted over 7 years ago by Leonardo Taccari
On July 7th and 8th, pkgsrcCon 2018 took place in Berlin, Germany. It was my first pkgsrcCon and it was really, really nice... So, let's share a report about it: what we did, the talks presented and everything else!

Friday (06/07): Social Event

I arrived by plane at Berlin Tegel Airport in the middle of the afternoon. The TXL buses were pretty full, but after waiting for 3 of them I was finally on my way to Berlin Hauptbahnhof (the nice thing about the buses is that once many of them get too full, they start to arrive minute after minute!). I then took the S7 to Berlin Jannowitzbrücke station, just a couple of minutes on foot from republik-berlin (for the Friday social event).

At 18:00 we met in republik-berlin for the social event. We had good burgers there and one^Wtwo^Wsome beers together! The place was a bit noisy because of the Belgium vs Brazil World Cup match, but we still had nice discussions together (and also without losing a lot of people cheering on! :)). There was also a table tennis table, and spz, maya, youri and I played (I'm a terrible table tennis player, but it was a lot of fun to play wild-west style, without any rules! :)).

Saturday (07/07): Talks session

Meet & Greet -- Pierre Pronchery (khorben), Thomas Merkel (tm)

Pierre and Thomas welcomed us (aliens! :)) in c-base. c-base is a space station under Berlin (or probably one of the oldest hackerspaces, old enough that the word "hackerspace" didn't even exist yet!). Slides (PDF) are available!

Keynote: Beautiful Open Source -- Hugo Teso

Hugo talked about his experience as an open source developer and focused in particular on how important the user interface is. He illustrated this by examining some projects he has worked on (Inguma, Bokken, Iaitö and Cutter) and extracting patterns from his experience. Slides (PDF) are available!
The state of desktops in pkgsrc -- Youri Mouton (youri)

Youri discussed the state of desktop environments (DE) in pkgsrc, starting with xfce, MATE, LXDE, KDE and Defora. He then discussed the WIP desktop environments (Cinnamon, LXQT, Gnome 3 and CDE), hardware support and login managers. Help is more than welcome, especially for the WIP desktop environments, so if you're interested in any of that and would like to help (that's also a great way to get involved in pkgsrc!), please get in touch with youri and/or take a look at the wip/*/TODO files in pkgsrc-wip!

NetBSD & Mercurial: One year later -- Jörg Sonnenberger (joerg)

Jörg started by discussing Git (citing High-level Problems with Git and How to Fix Them - Gregory Szorc) and then explained why to use Mercurial. He then announced the latest changes: hgmaster.NetBSD.org and anonhg.NetBSD.org, which make it possible to experiment with Mercurial, and the source-changes-hg@ and pkgsrc-changes-hg@ mailing lists. The talk ended with a description of the missing/TODO steps. Slides (HTML) are available!

Maintaining qmail in 2018 -- Amitai Schleier (schmonz)

Amitai shared his long experience in maintaining qmail. He shared a lot of lessons learned along the way, and it was also fun to see how, starting as MAINTAINER, he became more and more involved and ended up writing patches and tools for qmail. Slides (HTML) are available!

A beginner's introduction to GCC -- Maya Rashish (maya)

Maya talked about GCC. She first gave an overview of the toolchain (in general) and the corresponding GCC projects, how to pass flags to each of them and how to stop the compilation process at each of them. She then talked about the black magic that happens in the preprocessor, for example what an #include does and why __NetBSD__ is defined. We then saw that with -save-temps it is possible to save all intermediate results, and how helpful this is for debugging possible problems.
Compiler, assembler and linker were then discussed. We also saw specfiles, readelf and other GCC internals. Slides (HTML) are available!

Handling the workflow of pkgsrc-security -- Leonardo Taccari (leot)

I discussed the workflow of the pkgsrc Security Team (pkgsrc-security). I gave a brief introduction to the nmh (new MH) message handling system, then talked about the mission, tasks and workflow of pkgsrc-security. In the last part of the talk, I tried to put everything together and showed how to automate some parts of the pkgsrc-security work with nmh and some shell scripting. Slides (PDF) are available!

Preaching for releng-pkgsrc -- Benny Siegert (bsiegert)

Benny discussed the pkgsrc Releng team (releng-pkgsrc). The talk started with the pkgsrc Quarterly Releases. Since 2003Q4, a new pkgsrc release has been made every quarter. Stable releases are the basis for binary packages. Security, build and bug fixes get applied over the lifetime of the release via pullups, until the next quarterly release. The release procedure and freeze period were also discussed.

Then we examined the life of a pullup. Benny first introduced what a pullup is, the rules for requesting one and a practical example of how to file a good pullup request. The under-the-hood parts of releng were also discussed, for example how tickets are handled with req, the helper scripts that ease pullups, etc. The talk concluded with the importance of releng-pkgsrc and a call for volunteers to join releng-pkgsrc! (Although they're doing really great work, at the moment there is a shortage of members in releng-pkgsrc, so if you are interested and would like to join them, please get in touch with them!)

Something old, something new, something borrowed -- Sevan Janiyan (sevan)

Sevan discussed the state of the NetBSD/macppc port. Lots of improvements and news happened (particular kudos to macallan for doing amazing work on the macppc port!)!
HEAD-llvm builds for macppc were added; awacs(4), Bluetooth support, IPsec support and Veriexec support are all enabled by default now. radeonfb(4) and the XCOFF boot loader received several improvements, and DVI is now supported on the G4 Mac Mini. The other big news in macppc land is G5 support, which will probably also be interesting for possible pkgsrc bulk builds. Sevan also discussed some current problems (and workarounds!): bulk builds take time and no modern browser with JavaScript support is easily available right now. He also described how using the macppc port helped to spot several bugs. He then talked about Upspin (please also take a look at the corresponding package in wip/go-upspin!). Slides (PDF) are available!

Magit -- Christoph Badura (bad)

Christoph's talk was a live introduction to Magit, a Git interface for Emacs. The talk started by quoting James Mickens' It Was Never Going to Work, So Let's Have Some Tea talk, presented at USENIX LISA15, in which James Mickens gave a high-level picture of how Git works. We then saw how to clone a repository inside Magit, how to navigate the commits, how to create a new branch, edit a file and look at unstaged changes, how to stage just some hunks of a change and commit them, and how to rebase them (everything is just one or two keystrokes away!).

Post conf dinner

After the talks we had some burgers and beers together at Spud Bencer. We formed several groups to go there from c-base, and I was in the group that went on foot, so it was also a nice chance to see some of Berlin (thanks to khorben for being a very nice guide! :)).

Sunday (08/07): Hacking session

An introduction to Forth -- Valery Ushakov (uwe)

On Sunday morning Valery talked about Forth from the ground up. We saw how to implement a Forth interpreter step by step and discussed threaded code. Unfortunately the talk was not recorded... However, if you are curious, I suggest taking a look at the nbuwe/forth BitBucket repository.
The internals.txt file also contains a lot of interesting resources about Forth.

Hacking session

After Valery's talk there was the hacking session, where we hacked on pkgsrc, discussed together, etc. Late in the afternoon some of us visited the Computerspielemuseum. It covers more than 50 years of computer games, and it was fun to also play several historical as well as more recent video games. We then met again for dinner together at Potsdamer Platz.

Conclusion

pkgsrcCon 2018 was really, really great! First of all I would like to thank all the pkgsrcCon organizers: khorben and tm. It was very well organized and everything went well; thank you, Pierre and Thomas! A big thank you also to wiedi: within just a few hours, all the recordings of the talks were shared, and that's really impressive! Thanks also to youri and Gilberto for the photographs. Last, but not least, thanks to The NetBSD Foundation for supporting three developers in attending the conference, to c-base for kindly providing a very nice location for pkgsrcCon, and to our sponsors: Defora Networks for sponsoring the t-shirts and badges for the conference and SkyLime for sponsoring the catering on Saturday. Thank you!
Posted over 7 years ago by Kamil Rytarowski
Prepared by Yang Zheng (tomsun.0.7 AT Gmail DOT com) as part of GSoC 2018

This is the second part of the project to integrate libFuzzer for userland applications; you can learn about the first part in this post. After the preparation described in the first part, I started to fuzz userland programs with libFuzzer. We chose five programs: expr(1), sed(1), sh(1), file(1) and ping(8). After fuzzing them with libFuzzer, we also tried other fuzzers: American Fuzzy Lop (AFL), honggfuzz and Radamsa.

Fuzz Userland Programs with libFuzzer

[Image: "LLVM Logo" by Teresa Chang / All Right Retained by Apple]

In this section, I'll describe how to fuzz the five programs with libFuzzer. libFuzzer is an in-process, coverage-guided fuzzing engine. It defines several interfaces for users to implement:

    LLVMFuzzerTestOneInput: fuzzing target
    LLVMFuzzerInitialize: initialization function to access argc and argv
    LLVMFuzzerCustomMutator: user-provided custom mutator
    LLVMFuzzerCustomCrossOver: user-provided custom cross-over function

Of these functions, only LLVMFuzzerTestOneInput must be implemented by every fuzzing program. It takes a buffer and the buffer length as input, and it is the target that gets fuzzed again and again. Users who need to do some initialization with the argc and argv parameters must also implement LLVMFuzzerInitialize. With LLVMFuzzerCustomMutator and LLVMFuzzerCustomCrossOver, users can also change how a new input buffer is produced from one or two old input buffers. For more details, you can refer to this document.

Fuzz Userland Programs with Sanitizers

libFuzzer can be used with different sanitizers. Using them together is quite simple: you just add the sanitizer names to the option, as in -fsanitize=fuzzer,address,undefined. However, memory sanitizer seems to be an exception.
When we tried to use it together with libFuzzer, we got some runtime errors. The official document mentions that "using MemorySanitizer (MSAN) with libFuzzer is possible too, but tricky", but it doesn't explain how to use it properly. In the rest of this article, you can assume that we used the address and undefined sanitizers together with the fuzzers unless stated otherwise.

Fuzz expr(1) with libFuzzer

expr(1) takes some parameters from the command line as input and then treats the command line as a whole expression to be calculated. An example usage of expr(1) looks like this:

    $ expr 1 + 1
    2

This program is relatively easy to fuzz: all we need to do is transform the original main function into the form of LLVMFuzzerTestOneInput. Since the parser in expr(1) takes the argc and argv parameters as input, we need to transform the buffer provided by LLVMFuzzerTestOneInput into the format needed by the parser. In the implementation, I assume the buffer is composed of several strings separated by whitespace characters (i.e.: ' ', '\t' and '\n'). We can then split the buffer into separate strings and organize them into the form of argc and argv parameters.

However, the first problem appeared when I started to fuzz expr(1) with this modification. Since libFuzzer treats every exit as an error while fuzzing, there were a lot of false positives. Fortunately, the implementation of expr(1) is simple, so we only need to replace exit(3) with a return statement. When discussing the other programs, I'll show how to handle exit(3) and other error handling interfaces elegantly.

You can also pass a fuzzing dictionary file (to provide keywords) and initial input cases to libFuzzer, so that it can produce test cases more smartly.
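As an illustration of the expr(1) conversion described above, the following sketch splits the fuzzer buffer on ' ', '\t' and '\n' and builds argc and argv from the pieces. It is not the code from the project: split_args, MAX_ARGS and the placeholder expr_main are names assumed for this example, and a real harness would call the parser of expr(1) instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ARGS 64 /* arbitrary limit chosen for this sketch */

/*
 * Split a NUL-terminated copy of the fuzzer input into argv-style tokens,
 * using ' ', '\t' and '\n' as separators. The tokens point into `copy`,
 * which is modified in place. Returns the number of tokens found.
 */
static int
split_args(char *copy, char *argv[], int max_args)
{
	int argc = 0;
	char *p = copy;

	assert(copy != NULL && argv != NULL);
	while (argc < max_args) {
		p += strspn(p, " \t\n");	/* skip leading separators */
		if (*p == '\0')
			break;
		argv[argc++] = p;
		p += strcspn(p, " \t\n");	/* advance to the end of the token */
		if (*p == '\0')
			break;
		*p++ = '\0';			/* terminate the token */
	}
	return argc;
}

/* Hypothetical stand-in for the parser entry point of the fuzzed program. */
static int
expr_main(int argc, char *argv[])
{
	(void)argv;
	return argc;	/* a real target would parse and evaluate here */
}

int
LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
	char *argv[MAX_ARGS + 2];
	char *copy;
	int argc;

	/* The fuzzer buffer is not NUL-terminated, so work on a copy. */
	copy = malloc(size + 1);
	if (copy == NULL)
		return 0;
	if (size > 0)
		memcpy(copy, data, size);
	copy[size] = '\0';

	argv[0] = "expr";
	argc = 1 + split_args(copy, argv + 1, MAX_ARGS);
	argv[argc] = NULL;

	/* Return instead of calling exit(3): libFuzzer treats exits as errors. */
	(void)expr_main(argc, argv);
	free(copy);
	return 0;
}
```

Because the tokens are found with C string functions, anything after an embedded NUL byte in the input is ignored; whether that matters depends on the target.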
For expr(1), the dictionary file looks like this:

    min="-9223372036854775808"
    max="9223372036854775807"
    zero="0"
    one="1"
    negone="-1"
    div="/"
    mod="%"
    add="+"
    sub="-"
    or="|"
    add="&"

And there is only one initial test case:

    1 / 2

With this setting, we can quickly reproduce an existing bug that was fixed by Kamil Rytarowski in this patch: when you feed either of the expressions -9223372036854775808 / -1 or -9223372036854775808 % -1 to expr(1), you get a SIGFPE. After applying the fix for this bug, the fuzzer also detected an integer overflow bug when feeding expr(1) with 9223372036854775807 * -3. This bug was detected with the help of the undefined behavior sanitizer (UBSan) and has been fixed in this commit. The fuzzing of expr(1) can be reproduced with this script.

Fuzz sed(1) with libFuzzer

sed(1) reads from files or standard input (stdin) and modifies the input as specified by a list of commands. It is more complicated to fuzz than expr(1) because it can receive input from several sources, including command line parameters (commands), standard input (the text to be operated on) and files (both commands and text). After reading the source code of sed(1), I made two observations:

    The commands are added by the add_compunit function
    The input files (including standard input) are organized by the s_flist structure and the mf_fgets function

With these observations, we can manually parse the libFuzzer buffer using the interfaces above. So I organized the buffer as below:

    command #1
    command #2
    ...
    command #N
    // an empty line
    text strings

The first several lines are the commands, one command per line. An empty line then marks the end of the command list. Finally, the remaining part of the buffer is the text to be operated on. After parsing the buffer like this, we can add the commands one by one with the add_compunit interface.
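The boundary in the buffer layout above can be located with a short scan for the first empty line. The sketch below is only an assumption of how such a split might look: split_sed_input and struct sed_input are illustrative names, and a real harness would go on to hand each command line to add_compunit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative view of one fuzzer input; not a structure from sed(1). */
struct sed_input {
	const uint8_t *cmds;	/* command lines, one command per line */
	size_t cmds_len;
	const uint8_t *text;	/* text to be operated on, NULL if absent */
	size_t text_len;
};

/*
 * Split the buffer at the first empty line. Everything before it is the
 * command list, everything after it is the text. If there is no empty
 * line, the whole buffer is treated as commands and the text is empty.
 */
static void
split_sed_input(const uint8_t *data, size_t size, struct sed_input *out)
{
	size_t start = 0;	/* offset of the current line start */

	assert(out != NULL);
	out->cmds = data;
	out->cmds_len = size;
	out->text = NULL;
	out->text_len = 0;

	while (start < size) {
		const uint8_t *nl;

		if (data[start] == '\n') {
			/* An empty line: the command list ends here. */
			out->cmds_len = start;
			out->text = data + start + 1;
			out->text_len = size - (start + 1);
			return;
		}
		/* Move to the start of the next line, if there is one. */
		nl = memchr(data + start, '\n', size - start);
		if (nl == NULL)
			break;
		start = (size_t)(nl - data) + 1;
	}
}
```

Working with explicit lengths rather than C strings keeps the split correct even when the text part contains NUL bytes, which a fuzzer will readily produce.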
For the text, since the whole text is already available as a buffer, I re-implemented the mf_fgets interface to read the input directly from the buffer provided by libFuzzer.

As mentioned in the section on expr(1), exit(3) results in false positives with libFuzzer. Replacing exit(3) with a return statement solves this problem in expr(1), but it does not work in sed(1) because of the deeper function call stack. The exit(3) interface is usually used to handle unexpected cases in a program, so it would be a good idea to replace it with exceptions. Unfortunately, the programs we fuzzed are all implemented in C rather than C++. In the end, I chose the setjmp/longjmp interfaces to handle it: use setjmp to define an exit point in the LLVMFuzzerTestOneInput function, and use longjmp to jump to this point whenever the original implementation wants to call exit(3).

The dictionary file for sed(1) looks like this:

    newline="\x0A"
    "a\\\"
    "b"
    "c\\\"
    "d"
    "D"
    "g"
    "G"
    "h"
    "H"
    "i\\\"
    "l"
    "n"
    "N"
    "p"
    "P"
    "q"
    "t"
    "x"
    "y"
    "!"
    ":"
    "="
    "#"
    "/"

And here is an initial test case:

    s/hello/hi/g

    hello, world!

which means replacing "hello" with "hi" in the text "hello, world!". The fuzzing script for sed(1) can be found here.

Fuzz sh(1) with libFuzzer

sh(1) is the standard command interpreter for the system. I chose the evalstring function as the fuzzing entry point for sh(1). This function takes a string containing the commands to be executed, so we can pass the libFuzzer input buffer directly to it to start fuzzing. The dictionary file we used looks like this:

    "echo"
    "ls"
    "cat"
    "hostname"
    "test"
    "["
    "]"

We can also add other commands and shell script syntax to this file to exercise other conditions. An initial test case is provided as well:

    echo "hello, world!"

You can also reproduce the fuzzing of sh(1) with this script.

Fuzz file(1) with libFuzzer

The fuzzing of file(1) has been done by Christos Zoulas in this project.
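Before going further into file(1): the setjmp/longjmp replacement for exit(3) described for sed(1) above can be sketched as follows. The names fuzz_exit and process_input are hypothetical and stand in for the patched exit path and the fuzzed code; this shows the general pattern, not the project's actual patch.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdint.h>

static jmp_buf fuzz_exit_point;	/* where "exit" lands during fuzzing */
static int exit_point_armed;	/* non-zero while the jump target is valid */
static int last_exit_status;	/* recorded for illustration only */

/* Replacement for exit(3) in the fuzzed code: jump back to the harness. */
static void
fuzz_exit(int status)
{
	/* Jumping to a jmp_buf that was never set up would be undefined. */
	assert(exit_point_armed);
	last_exit_status = status;
	longjmp(fuzz_exit_point, 1);
}

/* Hypothetical stand-in for deeply nested code in the fuzzed program. */
static void
process_input(const uint8_t *data, size_t size)
{
	if (size > 0 && data[0] == '!')
		fuzz_exit(1);	/* error path that originally called exit(3) */
}

int
LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
	if (setjmp(fuzz_exit_point) != 0) {
		/* Reached through longjmp(): the target wanted to exit. */
		exit_point_armed = 0;
		return 0;
	}
	exit_point_armed = 1;
	process_input(data, size);
	exit_point_armed = 0;
	return 0;
}
```

Memory or file descriptors acquired before the jump are not released automatically, so a real harness also needs some cleanup after the jump to keep long fuzzing runs from leaking.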
The difference between file(1) and the other programs on the list is that its main functionality is provided by the libmagic library. As a result, we can directly fuzz the important functions (e.g.: magic_buffer) of this library.

Fuzz ping(8) with libFuzzer

ping(8) is quite different from all of the programs mentioned above: its main input source is the network rather than the command line, standard input or files. This is a real challenge, because network data is usually received through the socket interface, which makes it more complex to map a single buffer onto the socket model.

Fortunately, ping(8) organizes all the network interfaces as hooks registered in a structure. So I re-implemented all the necessary interfaces (including socket(2), recvfrom(2), sendto(2), poll(2), etc.) for ping(8). These re-implemented interfaces take the data from the libFuzzer buffer and transform it into the data accessed through the network interfaces. After that, we can use libFuzzer to fuzz the network data for ping(8). The script to reproduce this can be found here.

Fuzz Userland Programs with Other Fuzzers

To compare libFuzzer with other fuzzers in several respects, including modification effort, performance and functionality, we also fuzzed these five programs with AFL, honggfuzz and Radamsa.

Fuzz Programs with AFL and honggfuzz

AFL and honggfuzz can fuzz input from standard input and files. They both provide specific compilers (such as afl-cc, afl-clang, hfuzz-cc, hfuzz-clang, etc.) to fuzz programs with coverage information. So, the basic process for fuzzing programs with them is:

    Use the specific compilers to compile the programs with the necessary sanitizers
    Run the fuzzed programs with the proper command line parameters

For detailed parameters, you can refer to the scripts for expr(1), sed(1), sh(1), file(1) and ping(8).

[Image: "Miniature Lop" (a kind of fuzzy lop) from Wikipedia / CC BY-SA 3.0]

No modification is needed to fuzz sed(1), sh(1) and file(1) with AFL and honggfuzz, because these programs mainly get their input from standard input or files. But this doesn't mean that they achieve the same functionality as libFuzzer. For example, to fuzz sed(1), you may also need to pass the commands as command line parameters. This means that you have to specify the commands manually on the command line, and you cannot fuzz them with AFL and honggfuzz, because these tools can only fuzz input from standard input and files. One option is to reuse the modifications made for libFuzzer, but then we also need to add a main function to the fuzzed program.

[Image: "Höngg" (a quarter in district 10 in Zürich) by Ikiwaner / CC BY-SA 3.0]

For expr(1) and ping(8), we need even more modifications than with the libFuzzer solution, because expr(1) mainly gets its input from command line parameters and ping(8) mainly gets its input from the network.

During this period, I also prepared a package that installs honggfuzz for the pkgsrc-wip repository. To make it compatible with NetBSD, we also contributed improvements to the code in the official repository; for more details, you can refer to this pull request.

Fuzz Programs with Radamsa

Radamsa is a test case generator: it works by reading sample files and generating different interesting outputs. Radamsa does not depend on the fuzzed program, only on the input samples, which means it does not record coverage information.

[Image: "The Moomins" ("Radamsa" is a word spoken by a creature in Moomins) from the comic book cover by Tove Jansson]

With Radamsa, we can use scripts to fuzz different programs with different input sources. For expr(1), we can generate the mutated string, store it in a variable in the shell script and then feed it to expr(1) as command line parameters.
For the sed(1), we can generate both command strings and text by Radamsa and then feed them by command line parameters and file separately. For both sh(1) and file(1), we can generate the needed input file by Radamsa in the shell scripts. It seems that the shell script and Radamsa combination can fuzz any kinds of programs, but it encounters some problems with ping(8). Although Radamsa supports generating input cases as a network server or client, it doesn't support the ICMP protocol. This means that we can not fuzz ping(8) with modifications or help from other applications. Comparison Among Different Fuzzers In this project, we have tried four different fuzzers: libFuzzer, AFL, honggfuzz and Radamsa. In this section, I will introduce a comparison from different aspects. Modification of Fuzzing For the programs we mentioned above, here I list the lines of code we need to modify as a factor of porting difficulties: expr(1) sed(1) sh(1) file(1) ping(8) libFuzzer 128 96 60 48 582 AFL/honggfuzz 142 0 0 0 590 Radamsa 0 0 0 0 N/A As mentioned before, the libFuzzer needs to modify more lines for programs who mainly get input from standard input and files. However, for other programs (i.e.: expr(1) and ping(8)), the AFL and honggfuzz need to add more lines of code to get input from these sources. As for Radamsa, since it only needs the sample input data to generate outputs, it can fuzz all programs without modifications except ping(8). Binary Sizes The binary sizes for these fuzzers should also be considered if we want to ship them with NetBSD. The following binary sizes are based on the NetBSD-current with the nearly newest LLVM (compiled from source) as an external toolchain: Dependency Compilers Fuzzer Tools Total libFuzzer 0 56MB N/A 0 56MB AFL 0 24KB 292KB 152KB 468KB honggfuzz 36KB 840KB 124KB 0 1000KB Radamsa 588KB 0 608KB 0 1196KB The above table shows the space needed to install different fuzzers. 
The "Dependency" column shows the size of dependant library; the "Compilers" column shows the size of compilers used for re-compiling fuzzed programs; the "Fruzzer" column shows the size of fuzzer itself and the "Tools" column shows the size of analysis tools. For the libFuzzer, if the system has already included the LLVM together with compiler-rt as the toolchain, we don't need extra space to import it. The fuzzer of libFuzzer is compiled together with the user's program, so the size is not counted. The compiler size shown above in this table is the size of statically compiled compiler clang. If we compile it dynamically, then there will be a plenty of dependant libraries should be considered. For the AFL, there is no dependant library except libc, so the size is zero. It will also introduce some tools like afl-analyze, afl-cmin and etc. The honggfuzz is dependant on the libBlocksRuntime library whose size is 36KB. This library is also included in the compiler-rt of LLVM. So, if you have already installed it, this size can be ignored. As for the Radamsa, it needs the Owl Lisp during the building process. So the size of the dependency is the size of Owl Lisp interpreter. Compiler Compatibility All these fuzzers except libFuzzer are compatible with both GCC and clang. The AFL and honggfuzz provide a wrapper for the native compiler, and the Radamsa does not care about the compilers. As for the libFuzzer, it is implemented in the compiler-rt of LLVM, so it cannot support the GCC compiler. Support for Sanitizers All these fuzzers can work together with sanitizers, but only the libFuzzer can provide a relatively strong guarantee that it can provide them. The AFL and honggfuzz, as I mentioned above, provide some wrappers for the underlying compiler. This means that it is dependant on the native compiler to decide whether they can fuzz the programs with the support of sanitizers. 
The Radamsa can only fuzz the binary directly, so the programs should be compiled with the sanitizers first. However, since the sanitizers are in the compiler-rt together with libFuzzer, you can directly add some flags of sanitizers while compiling the fuzzed programs. Performance At last, you may wonder how fast are those fuzzers to find an existing bug. For the above programs we have fuzzed in NetBSD, only libFuzzer can find two bugs for the expr(1). However, we cannot assert that the libFuzzer performs well than others. To further evaluate the performance of different fuzzers we have used, I choose some simple functions with bugs to measure how fast they can find them out. Here is a table to show the time for them to find the first bug: libFuzzer AFL honggfuzz Radamsa DivTest+S &lt1s 7s 1s 7s DivTest &gt10min &gt10min 2s &gt10min SimpleTest+S &lt1s &gt10min 1s &gt10min SimpleTest &lt1s &gt10min 1s &gt10min CxxStringEqTest+S &lt1s &gt10min 2s &gt10min CxxStringEqTest &gt10min &gt10min 2s &gt10min CounterTest+S 1s 5min 1s 7min CounterTest 1s 4min 1s 7min SimpleHashTest+S &lt1s 3s 1s 2s The "+S" symbol means the version with sanitizers (in this evaluation, I used address and undefined sanitizers). In this table, we can observe that libFuzzer and honggfuzz perform better than others in most cases. And another point is that fuzzers can work better with sanitizers. For example, in the case of DivTest, the primary goal of this test is to trigger a "divide-by-zero" error, however, when working with the undefined sanitizer, all these fuzzers will trigger the "integer overflow" error more quickly. I only present a part of the interesting results of this evaluation here. You can refer to this script to reproduce some results or do more evaluation by yourself. 
Summary In the past one month, I mainly contributed to: Porting the libFuzzer to NetBSD Preparing a pkgsrc-wip package for honggfuzz Fuzzing some userland programs with libFuzzer and other three different fuzzers Evaluating different fuzzers from different aspects Regarding the third contribution, I tried to use different methods to handle them according to their features. During this period, I have fortunately found two bugs for the expr(1). I'd like to thank my mentor Kamil Rytarowski and Christos Zoulas for their suggestions and proposals. I also want to thank Kamil Frankowicz for his advice on fuzzing and playing with AFL. At last, thanks to Google and the NetBSD community for giving me a good opportunity to work on this project. [Less]
Posted over 7 years ago by Kamil Rytarowski
Prepared by Yang Zheng (tomsun.0.7 AT Gmail DOT com) as part of GSoC 2018

This is the second part of the project of integrating libFuzzer for the userland applications; you can learn about the first part of this project in this post. After the preparation done in the first part, I started to fuzz userland programs with libFuzzer. We chose five programs:

- expr(1)
- sed(1)
- sh(1)
- file(1)
- ping(8)

After fuzzing them with libFuzzer, we also tried other fuzzers: American Fuzzy Lop (AFL), honggfuzz and Radamsa.

Fuzz Userland Programs with libFuzzer

"LLVM Logo" by Teresa Chang / All Right Retained by Apple

In this section, I'll introduce how to fuzz the five programs with libFuzzer. libFuzzer is an in-process, coverage-guided fuzzing engine. It defines some interfaces to be implemented by users:

- LLVMFuzzerTestOneInput: fuzzing target
- LLVMFuzzerInitialize: initialization function to access argc and argv
- LLVMFuzzerCustomMutator: user-provided custom mutator
- LLVMFuzzerCustomCrossOver: user-provided custom cross-over function

Of these functions, only LLVMFuzzerTestOneInput must be implemented by every fuzzing program. It takes a buffer and the buffer length as input, and it is the target that is fuzzed again and again. Users who need to do some initialization with the argc and argv parameters also need to implement LLVMFuzzerInitialize. With LLVMFuzzerCustomMutator and LLVMFuzzerCustomCrossOver, users can also change how a new input buffer is produced from one or two old input buffers. For more details, you can refer to this document.

Fuzz Userland Programs with Sanitizers

libFuzzer can be used with different sanitizers. Using them together is simple: you just add the sanitizer names to the option, as in -fsanitize=fuzzer,address,undefined. However, memory sanitizer seems to be an exception.
When we tried to use it together with libFuzzer, we got some runtime errors. The official document mentions that "using MemorySanitizer (MSAN) with libFuzzer is possible too, but tricky", but it doesn't explain how to use it properly. In the rest of this article, you can assume that we used the address and undefined sanitizers together with the fuzzers unless stated otherwise.

Fuzz expr(1) with libFuzzer

expr(1) takes some parameters from the command line and treats the whole command line as one expression to be calculated. An example usage of expr(1) looks like this:

    $ expr 1 + 1
    2

This program is relatively easy to fuzz: all we need to do is transform the original main function into the form of LLVMFuzzerTestOneInput. Since the parser in expr(1) takes the argc and argv parameters as input, we need to transform the buffer provided to LLVMFuzzerTestOneInput into the format needed by the parser. In the implementation, I assume the buffer is composed of several strings separated by whitespace characters (i.e. ' ', '\t' and '\n'). We can then split the buffer into separate strings and organize them into the form of argc and argv parameters.

However, the first problem appeared as soon as I started to fuzz expr(1) with this modification. Since libFuzzer treats every exit as an error while fuzzing, there were a lot of false positives. Fortunately, the implementation of expr(1) is simple, so we only need to replace exit(3) with a return statement. In the sections on fuzzing the other programs, I'll introduce how to handle exit(3) and other error handling interfaces more elegantly.

You can also pass a fuzzing dictionary file (to provide keywords) and initial input cases to libFuzzer, so that it can produce test cases more smartly.
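The buffer-to-argv step described above for expr(1) could be sketched roughly as follows. This is an illustration under assumptions, not the harness from the project: `split_args`, `expr_main`, `last_argc` and the 64-argument limit are hypothetical, and `expr_main` merely stands in for the original main function after exit(3) has been replaced by return.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the refactored expr(1) main(); a real harness
 * would run the expression parser here and return instead of calling exit(3). */
static int last_argc;
static int expr_main(int argc, char **argv)
{
    (void)argv;
    last_argc = argc;
    return 0;
}

/* Split a NUL-terminated, writable string on ' ', '\t' and '\n'.
 * Tokens are stored in argv[1..]; argv[0] is a fixed program name.
 * Returns the resulting argc; argv[argc] is set to NULL. */
static int split_args(char *copy, char **argv, int max_args)
{
    static char progname[] = "expr";
    const char *ws = " \t\n";
    int argc = 0;
    char *p = copy;

    assert(max_args >= 2);
    argv[argc++] = progname;
    while (argc < max_args - 1) {
        p += strspn(p, ws);         /* skip leading separators */
        if (*p == '\0')
            break;
        argv[argc++] = p;
        p += strcspn(p, ws);        /* advance to the end of the token */
        if (*p != '\0')
            *p++ = '\0';            /* terminate token, step past separator */
    }
    argv[argc] = NULL;
    return argc;
}

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    char *argv[64];
    char *copy = malloc(size + 1);

    if (copy == NULL)
        return 0;
    memcpy(copy, data, size);
    copy[size] = '\0';              /* the fuzz input is not NUL-terminated */

    int argc = split_args(copy, argv, 64);
    (void)expr_main(argc, argv);
    free(copy);
    return 0;
}
```

Because the copy is treated as a C string, anything after an embedded NUL byte in the fuzz input is ignored in this sketch. When built with clang and the -fsanitize=fuzzer,address,undefined option mentioned earlier, libFuzzer supplies the main function itself.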
For expr(1), the dictionary file looks like this:

    min="-9223372036854775808"
    max="9223372036854775807"
    zero="0"
    one="1"
    negone="-1"
    div="/"
    mod="%"
    add="+"
    sub="-"
    or="|"
    add="&"

And there is only one initial test case:

    1 / 2

With this setting, we can quickly reproduce an existing bug which has been fixed by Kamil Rytarowski in this patch: when you feed either of the expressions -9223372036854775808 / -1 or -9223372036854775808 % -1 to expr(1), you get a SIGFPE. After adopting the fix for this bug, the fuzzer also detected an integer overflow bug when feeding expr(1) with 9223372036854775807 * -3. This bug was detected with the help of the undefined behavior sanitizer (UBSan) and has been fixed in this commit. The fuzzing of expr(1) can be reproduced with this script.

Fuzz sed(1) with libFuzzer

sed(1) reads from files or standard input (stdin) and modifies the input as specified by a list of commands. It is more complicated to fuzz than expr(1) because it can receive input from several sources, including command line parameters (commands), standard input (the text to be operated on) and files (both commands and text). After reading the source code of sed(1), I had two findings:

- The commands are added by the add_compunit function
- The input files (including standard input) are organized by the s_flist structure and the mf_fgets function

With these observations, we can manually parse the libFuzzer buffer with the interfaces above. So I organized the buffer as below:

    command #1
    command #2
    ...
    command #N
    // an empty line
    text strings

The first several lines are the commands, one command per line. Then an empty line marks the end of the command list. Finally, the remaining part of the buffer is the text to be operated on. After parsing the buffer like this, we can add the commands one by one with the add_compunit interface.
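The buffer layout just described for sed(1) can be illustrated with a small parser. This is only a sketch: `struct sed_input`, `split_sed_buffer`, `command_cb` and `for_each_command` are hypothetical names, and the callback is a placeholder for the place where a real harness would hand each command line to add_compunit, whose signature is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view of a fuzz buffer split into commands and text. */
struct sed_input {
    const char *cmds;    /* start of the command lines */
    size_t cmds_len;     /* length of the command lines, excluding the empty line */
    const char *text;    /* start of the text to be operated on */
    size_t text_len;
};

typedef void (*command_cb)(const char *line, size_t len, void *arg);

/* Split the buffer at the first empty line. If there is no empty line,
 * the whole buffer is treated as commands and the text is empty. */
static struct sed_input split_sed_buffer(const char *buf, size_t len)
{
    struct sed_input in = { buf, len, buf + len, 0 };

    assert(buf != NULL);
    if (len > 0 && buf[0] == '\n') {    /* empty first line: no commands */
        in.cmds_len = 0;
        in.text = buf + 1;
        in.text_len = len - 1;
        return in;
    }
    for (size_t i = 0; i + 1 < len; i++) {
        if (buf[i] == '\n' && buf[i + 1] == '\n') {
            in.cmds_len = i + 1;        /* keep the newline ending the last command */
            in.text = buf + i + 2;
            in.text_len = len - (i + 2);
            break;
        }
    }
    return in;
}

/* Call cb once per non-empty command line and return the number of lines.
 * The lines are not NUL-terminated; cb receives an explicit length. */
static size_t for_each_command(const struct sed_input *in, command_cb cb, void *arg)
{
    size_t count = 0, start = 0;

    for (size_t i = 0; i <= in->cmds_len; i++) {
        /* a line ends at '\n' or at the end of the command region */
        if (i == in->cmds_len || in->cmds[i] == '\n') {
            if (i > start) {
                if (cb != NULL)
                    cb(in->cmds + start, i - start, arg);
                count++;
            }
            start = i + 1;
        }
    }
    return count;
}
```

Keeping the two regions as pointer-plus-length pairs avoids copying the fuzz input and leaves the buffer itself untouched.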
For the text part of the sed(1) input, since we can directly get the whole text as a buffer, I re-implemented the mf_fgets interface to get the input directly from the buffer provided by libFuzzer.

As mentioned before in the fuzzing of expr(1), exit(3) results in false positives with libFuzzer. Replacing exit(3) with a return statement solves this problem in expr(1), but it does not work in sed(1) because of the deeper function call stack. The exit(3) interface is usually used to handle unexpected cases in a program, so it would be a good idea to replace it with exceptions. Unfortunately, the programs we fuzzed are all implemented in C instead of C++. In the end, I chose to use the setjmp/longjmp interfaces to handle it: use setjmp to define an exit point in the LLVMFuzzerTestOneInput function, and use longjmp to jump to this point whenever the original implementation wants to call exit(3).

The dictionary file for it looks like this:

    newline="\x0A"
    "a\\\"
    "b"
    "c\\\"
    "d"
    "D"
    "g"
    "G"
    "h"
    "H"
    "i\\\"
    "l"
    "n"
    "N"
    "p"
    "P"
    "q"
    "t"
    "x"
    "y"
    "!"
    ":"
    "="
    "#"
    "/"

And here is an initial test case:

    s/hello/hi/g

    hello, world!

which means replacing "hello" with "hi" in the text "hello, world!". The fuzzing script of sed(1) can be found here.

Fuzz sh(1) with libFuzzer

sh(1) is the standard command interpreter for the system. I chose the evalstring function as the fuzzing entry for sh(1). This function takes a string as the commands to be executed, so we can directly pass the libFuzzer input buffer to this function to start fuzzing. The dictionary file we used looks like this:

    "echo"
    "ls"
    "cat"
    "hostname"
    "test"
    "["
    "]"

We can also add other commands and shell script syntax to this file to reproduce other conditions. An initial test case is also provided:

    echo "hello, world!"

You can also reproduce the fuzzing of sh(1) with this script.

Fuzz file(1) with libFuzzer

The fuzzing of file(1) has been done by Christos Zoulas in this project.
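As a side note on the exit(3) handling described for sed(1) above, the setjmp/longjmp pattern might look like the sketch below. It is not the code used in the project: `fuzz_exit`, `parse_deep`, `program_body` and the status variables are hypothetical, and `program_body` only imitates a program that bails out from deep inside its call stack.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdint.h>

static jmp_buf fuzz_exit_point;   /* where fuzz_exit() jumps back to */
static int fuzz_exit_armed;       /* non-zero while a fuzz run is active */
static int last_exit_status;      /* status passed to the last fuzz_exit() */

/* Replacement for exit(3) in the fuzzed code: unwind to the harness
 * instead of terminating the process. */
static void fuzz_exit(int status)
{
    assert(fuzz_exit_armed);
    last_exit_status = status;
    longjmp(fuzz_exit_point, 1);
}

/* Hypothetical program logic that gives up deep in the call stack. */
static void parse_deep(const uint8_t *data, size_t size)
{
    if (size > 0 && data[0] == '!')
        fuzz_exit(2);             /* the original code would call exit(2) here */
}

static void program_body(const uint8_t *data, size_t size)
{
    parse_deep(data, size);
}

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    last_exit_status = 0;
    fuzz_exit_armed = 1;
    if (setjmp(fuzz_exit_point) == 0)
        program_body(data, size); /* normal path */
    /* otherwise control arrived here through longjmp() in fuzz_exit() */
    fuzz_exit_armed = 0;
    return 0;
}
```

One caveat of this approach is that longjmp does not release anything allocated between the setjmp call and the jump, so a harness built this way has to clean up such resources itself, otherwise the address sanitizer may report leaks.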
The difference between file(1) and the other programs on the list is that its main functionality is provided by the libmagic library. As a result, we can directly fuzz the important functions (e.g. magic_buffer) of this library.

Fuzz ping(8) with libFuzzer

ping(8) is quite different from all of the programs mentioned above: its main input source is the network instead of the command line, standard input or files. This is a challenge because network data is usually received through the socket interface, which makes it more complex to map a single buffer onto the socket model.

Fortunately, ping(8) organizes all the network interfaces as hooks registered in a structure. So I re-implemented all the necessary interfaces (including socket(2), recvfrom(2), sendto(2), poll(2), etc.) for ping(8). These re-implemented interfaces take the data from the libFuzzer buffer and transform it into the data accessed through the network interfaces. After that, we can use libFuzzer to fuzz the network data for ping(8). The script to reproduce this can be found here.

Fuzz Userland Programs with Other Fuzzers

To compare libFuzzer with other fuzzers in different aspects, including the effort to modify programs, performance and functionality, we also fuzzed these five programs with AFL, honggfuzz and Radamsa.

Fuzz Programs with AFL and honggfuzz

AFL and honggfuzz can fuzz input from standard input and files. They both provide specific compilers (such as afl-cc, afl-clang, hfuzz-cc, hfuzz-clang, etc.) to fuzz programs with coverage information. So, the basic process to fuzz programs with them is to:

- Use the specific compilers to compile programs with the necessary sanitizers
- Run the fuzzed programs with proper command line parameters

For detailed parameters, you can refer to the scripts for expr(1), sed(1), sh(1), file(1) and ping(8).
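Since AFL and honggfuzz deliver test cases through standard input or a file, a target written in the LLVMFuzzerTestOneInput form can be driven by a small wrapper that reads a stream and calls the target once. The following is a minimal sketch, assuming a hypothetical helper named `run_target_from_stream`; the target shown here is a stand-in that only records the input size in the hypothetical variable `last_size`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in fuzz target; in practice this would be the function already
 * written for libFuzzer. Here it only remembers the input size. */
static size_t last_size;
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    (void)data;
    last_size = size;
    return 0;
}

/* Read the whole stream into memory and run the target on it once.
 * Returns the target's return value, or 1 if memory runs out. */
static int run_target_from_stream(FILE *in)
{
    size_t cap = 4096, len = 0, n;
    uint8_t *buf;

    assert(in != NULL);
    buf = malloc(cap);
    if (buf == NULL)
        return 1;
    while ((n = fread(buf + len, 1, cap - len, in)) > 0) {
        len += n;
        if (len == cap) {               /* buffer full: double its size */
            uint8_t *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return 1;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    int rc = LLVMFuzzerTestOneInput(buf, len);
    free(buf);
    return rc;
}
```

A main function for such a build would then simply return run_target_from_stream(stdin), or open the file named on the command line and pass that stream instead.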
"Miniature Lop" (A kind of fuzzy lop) from Wikipedia / CC BY-SA 3.0

No modification is needed to fuzz sed(1), sh(1) and file(1) with AFL and honggfuzz, because these programs mainly get input from standard input or files. But this doesn't mean that they can achieve the same functionality as libFuzzer. For example, to fuzz sed(1), you may also need to pass commands as command line parameters. These commands have to be specified manually on the command line and cannot be fuzzed by AFL and honggfuzz, because they can only fuzz input from standard input and files. One option is to reuse the modifications made for fuzzing with libFuzzer, but then we need to add a main function for the fuzzed program.

"Höngg" (A quarter in district 10 in Zürich) by Ikiwaner / CC BY-SA 3.0

For expr(1) and ping(8), we need even more modifications than for the libFuzzer solution, because expr(1) mainly gets input from command line parameters and ping(8) mainly gets input from the network.

During this period, I have also prepared a package to install honggfuzz for the pkgsrc-wip repository. To make it compatible with NetBSD, we have also contributed improvements to the code in the official repository; for more details, you can refer to this pull request.

Fuzz Programs with Radamsa

Radamsa is a test case generator. It works by reading sample files and generating different interesting outputs. Radamsa does not depend on the fuzzed programs, only on the input samples, which means it does not record coverage information.

"The Moomins" ("Radamsa" is a word spoken by a creature in Moomins) from the comic book cover by Tove Jansson

With Radamsa, we can use scripts to fuzz different programs with different input sources. For expr(1), we can generate a mutated string, store it in a variable in the shell script and then feed it to expr(1) as command line parameters.
For sed(1), we can generate both the command strings and the text with Radamsa and then feed them through command line parameters and a file respectively. For both sh(1) and file(1), we can generate the needed input file with Radamsa in the shell scripts.

It seems that the combination of shell scripts and Radamsa can fuzz any kind of program, but it runs into problems with ping(8). Although Radamsa supports generating input cases as a network server or client, it doesn't support the ICMP protocol. This means that we cannot fuzz ping(8) without modifications or help from other applications.

Comparison Among Different Fuzzers

In this project, we have tried four different fuzzers: libFuzzer, AFL, honggfuzz and Radamsa. In this section, I will compare them in different aspects.

Modification of Fuzzing

For the programs mentioned above, here are the lines of code we needed to modify, as an indicator of porting difficulty:

                   expr(1)  sed(1)  sh(1)  file(1)  ping(8)
    libFuzzer      128      96      60     48       582
    AFL/honggfuzz  142      0       0      0        590
    Radamsa        0        0       0      0        N/A

As mentioned before, libFuzzer needs more modified lines for programs that mainly get input from standard input and files. However, for the other programs (i.e. expr(1) and ping(8)), AFL and honggfuzz need more lines of code to get input from these sources. As for Radamsa, since it only needs sample input data to generate outputs, it can fuzz all programs without modifications except ping(8).

Binary Sizes

The binary sizes of these fuzzers should also be considered if we want to ship them with NetBSD. The following binary sizes are based on NetBSD-current with a nearly latest LLVM (compiled from source) as an external toolchain:

               Dependency  Compilers  Fuzzer  Tools  Total
    libFuzzer  0           56MB       N/A     0      56MB
    AFL        0           24KB       292KB   152KB  468KB
    honggfuzz  36KB        840KB      124KB   0      1000KB
    Radamsa    588KB       0          608KB   0      1196KB

The above table shows the space needed to install the different fuzzers.
The "Dependency" column shows the size of the dependent libraries; the "Compilers" column shows the size of the compilers used for re-compiling fuzzed programs; the "Fuzzer" column shows the size of the fuzzer itself; and the "Tools" column shows the size of the analysis tools.

For libFuzzer, if the system already includes LLVM together with compiler-rt as the toolchain, no extra space is needed to import it. The fuzzer part of libFuzzer is compiled together with the user's program, so its size is not counted. The compiler size shown in the table is the size of a statically linked clang. If it is linked dynamically, plenty of dependent libraries have to be considered as well.

AFL has no dependent library except libc, so the size is zero. It also comes with some tools such as afl-analyze, afl-cmin, etc.

honggfuzz depends on the libBlocksRuntime library, whose size is 36KB. This library is also included in the compiler-rt of LLVM, so if you have already installed it, this size can be ignored.

As for Radamsa, it needs Owl Lisp during the build process, so the size of the dependency is the size of the Owl Lisp interpreter.

Compiler Compatibility

All these fuzzers except libFuzzer are compatible with both GCC and clang. AFL and honggfuzz provide wrappers for the native compiler, and Radamsa does not care about the compiler. libFuzzer is implemented in the compiler-rt of LLVM, so it cannot support GCC.

Support for Sanitizers

All these fuzzers can work together with sanitizers, but only libFuzzer can provide a relatively strong guarantee that they are available. AFL and honggfuzz, as mentioned above, provide wrappers for the underlying compiler. This means that whether they can fuzz programs with sanitizer support depends on the native compiler.
Radamsa can only fuzz the binary directly, so the programs have to be compiled with the sanitizers first. With libFuzzer, however, since the sanitizers live in compiler-rt together with it, you can directly add the sanitizer flags while compiling the fuzzed programs.

Performance

Finally, you may wonder how fast these fuzzers find an existing bug. For the programs we fuzzed in NetBSD, only libFuzzer found the two bugs in expr(1). However, we cannot conclude from this that libFuzzer performs better than the others. To further evaluate the performance of the fuzzers we used, I chose some simple functions with bugs and measured how fast each fuzzer finds them. Here is a table showing the time each of them needs to find the first bug:

                       libFuzzer  AFL     honggfuzz  Radamsa
    DivTest+S          <1s        7s      1s         7s
    DivTest            >10min     >10min  2s         >10min
    SimpleTest+S       <1s        >10min  1s         >10min
    SimpleTest         <1s        >10min  1s         >10min
    CxxStringEqTest+S  <1s        >10min  2s         >10min
    CxxStringEqTest    >10min     >10min  2s         >10min
    CounterTest+S      1s         5min    1s         7min
    CounterTest        1s         4min    1s         7min
    SimpleHashTest+S   <1s        3s      1s         2s

The "+S" suffix marks the version with sanitizers (in this evaluation, I used the address and undefined sanitizers). From this table, we can observe that libFuzzer and honggfuzz perform better than the others in most cases. Another point is that fuzzers can work better with sanitizers. For example, the primary goal of DivTest is to trigger a "divide-by-zero" error; however, when working with the undefined sanitizer, all these fuzzers trigger the "integer overflow" error more quickly. I only present some of the interesting results of this evaluation here. You can refer to this script to reproduce some results or do more evaluation yourself.
Summary

Over the past month, I mainly contributed to:

- Porting libFuzzer to NetBSD
- Preparing a pkgsrc-wip package for honggfuzz
- Fuzzing some userland programs with libFuzzer and three other fuzzers
- Evaluating the different fuzzers in different aspects

Regarding the third contribution, I tried different methods to handle the programs according to their features. During this period, I was fortunate to find two bugs in expr(1).

I'd like to thank my mentor Kamil Rytarowski and Christos Zoulas for their suggestions and proposals. I also want to thank Kamil Frankowicz for his advice on fuzzing and playing with AFL. Finally, thanks to Google and the NetBSD community for giving me a good opportunity to work on this project.