Project Summary
A collection of string sorting algorithm implementationsThis collection features several string sorting algorithm implementations, that have been tuned to take better advantage of modern hardware. Classic implementations tend to optimize instruction counts, but when sorting large collections of strings, we also need to focus on memory issues. All algorithms are implemented using C and C++, and are released under the MIT license.
Technical details:
All of the implementations sort the strings by raw byte values. This means that they are mainly intended for research use. Includes several variants of known and efficient (string) sorting algorithms, such as MSD radix sort, burstsort and multi-key-quicksort. Emphasis on reducing cache misses and memory stalls. Includes the tools to create an extensive HTML based report, that can be used to compare the different implementations. The report includes technical details, such as TLB, L1 and L2 cache misses, run times and memory peak usage. Supports Linux huge pages. For more information, see below.
RequirementsLinux Boost CMake Requirements to create the HTML based reportOProfile for most measurements, probably also requires root privileges. The default settings use Intel Core 2 specific events. When profiling on other platforms, you will most likely need to modify the scripts in the report/ directory. /usr/bin/memusage for measuring the memory peak usage. This is a GNU libc utility. CompilationCompiling with GCC $ svn checkout http://string-sorting.googlecode.com/svn/trunk/ string-sorting-read-only
$ mkdir string-sorting-read-only-build
$ cd string-sorting-read-only-build
$ cmake -DCMAKE_BUILD_TYPE=release ../string-sorting-read-only
$ make
$ ./sortstringCompiling with ICCThe option USE_ICC can be used to select the Intel C/C++ compiler (ICC) easily. This sets up the proper compiler for C and C++ files, and selects some ICC specific compiler options.
$ svn checkout http://string-sorting.googlecode.com/svn/trunk/ string-sorting-read-only
$ mkdir icc-build
$ cd icc-build
$ cmake -DCMAKE_BUILD_TYPE=release -DUSE_ICC=true ../string-sorting-read-only
$ make
$ ./sortstringDebug buildA separate binary can be built for easier debugging. In this build, runtime assertions from the algorithm implementations and libraries (ie. Boost) are enabled. Some implementations also print out lots of debugging messages.
You can use either GCC (default) or ICC (-DUSE_ICC=true).
$ svn checkout http://string-sorting.googlecode.com/svn/trunk/ string-sorting-read-only
$ mkdir debug-build
$ cd debug-build
$ cmake -DCMAKE_BUILD_TYPE=debug ../string-sorting-read-only
$ make
$ ./sortstringHuge pagesThe default page size on x86 processors is quite small, something like four kilobytes. When working with large data sets, this means that the input is spread to thousands of memory pages. Unfortunately random access in thousands of pages can be slow (see e.g. http://en.wikipedia.org/wiki/Translation_lookaside_buffer). Using a larger page size can alleviate the problem. If the page size is in the order of megabytes or even gigabytes, most data sets can fit into a much smaller amount of memory pages.
In this program, huge pages can be enabled using the --hugetlb-text and --hugetlb-ptrs options. The former option places the input data (i.e. the actual strings from the given file) into huge pages, and the latter option places the string pointer array into huge pages. Using huge pages in Linux requires a recent kernel version, CPU support, and properly adjusted kernel settings.
libhugetlbfs (http://libhugetlbfs.ozlabs.org/) can be used to replace all calls to malloc to use huge pages.