0
I Use This!
Inactive
Analyzed 1 day ago. based on code collected 1 day ago.

Project Summary

A collection of string sorting algorithm implementationsThis collection features several string sorting algorithm implementations, that have been tuned to take better advantage of modern hardware. Classic implementations tend to optimize instruction counts, but when sorting large collections of strings, we also need to focus on memory issues. All algorithms are implemented using C and C++, and are released under the MIT license.

Technical details:

All of the implementations sort the strings by raw byte values. This means that they are mainly intended for research use. Includes several variants of known and efficient (string) sorting algorithms, such as MSD radix sort, burstsort and multi-key-quicksort. Emphasis on reducing cache misses and memory stalls. Includes the tools to create an extensive HTML based report, that can be used to compare the different implementations. The report includes technical details, such as TLB, L1 and L2 cache misses, run times and memory peak usage. Supports Linux huge pages. For more information, see below.

RequirementsLinux Boost CMake Requirements to create the HTML based reportOProfile for most measurements, probably also requires root privileges. The default settings use Intel Core 2 specific events. When profiling on other platforms, you will most likely need to modify the scripts in the report/ directory. /usr/bin/memusage for measuring the memory peak usage. This is a GNU libc utility. CompilationCompiling with GCC $ svn checkout http://string-sorting.googlecode.com/svn/trunk/ string-sorting-read-only
$ mkdir string-sorting-read-only-build
$ cd string-sorting-read-only-build
$ cmake -DCMAKE_BUILD_TYPE=release ../string-sorting-read-only
$ make
$ ./sortstringCompiling with ICCThe option USE_ICC can be used to select the Intel C/C++ compiler (ICC) easily. This sets up the proper compiler for C and C++ files, and selects some ICC specific compiler options.

$ svn checkout http://string-sorting.googlecode.com/svn/trunk/ string-sorting-read-only
$ mkdir icc-build
$ cd icc-build
$ cmake -DCMAKE_BUILD_TYPE=release -DUSE_ICC=true ../string-sorting-read-only
$ make
$ ./sortstringDebug buildA separate binary can be built for easier debugging. In this build, runtime assertions from the algorithm implementations and libraries (ie. Boost) are enabled. Some implementations also print out lots of debugging messages.

You can use either GCC (default) or ICC (-DUSE_ICC=true).

$ svn checkout http://string-sorting.googlecode.com/svn/trunk/ string-sorting-read-only
$ mkdir debug-build
$ cd debug-build
$ cmake -DCMAKE_BUILD_TYPE=debug ../string-sorting-read-only
$ make
$ ./sortstringHuge pagesThe default page size on x86 processors is quite small, something like four kilobytes. When working with large data sets, this means that the input is spread to thousands of memory pages. Unfortunately random access in thousands of pages can be slow (see e.g. http://en.wikipedia.org/wiki/Translation_lookaside_buffer). Using a larger page size can alleviate the problem. If the page size is in the order of megabytes or even gigabytes, most data sets can fit into a much smaller amount of memory pages.

In this program, huge pages can be enabled using the --hugetlb-text and --hugetlb-ptrs options. The former option places the input data (i.e. the actual strings from the given file) into huge pages, and the latter option places the string pointer array into huge pages. Using huge pages in Linux requires a recent kernel version, CPU support, and properly adjusted kernel settings.

libhugetlbfs (http://libhugetlbfs.ozlabs.org/) can be used to replace all calls to malloc to use huge pages.

Tags

sorting

In a Nutshell, string-sorting...

Quick Reference

MIT License
Permitted

Commercial Use

Modify

Distribute

Sub-License

Private Use

Forbidden

Hold Liable

Required

Include Copyright

Include License

These details are provided for information only. No information here is legal advice and should not be used as such.

This Project has No vulnerabilities Reported Against it

Did You Know...

  • ...
    55% of companies leverage OSS for production infrastructure
  • ...
    anyone with an Open Hub account can update a project's tags
  • ...
    65% of companies leverage OSS to speed application development in 2016
  • ...
    data presented on the Open Hub is available through our API

Languages

C++
70%
C
24%
5 Other
6%

30 Day Summary

Jan 26 2026 — Feb 25 2026

12 Month Summary

Feb 25 2025 — Feb 25 2026

Ratings

Be the first to rate this project
Click to add your rating
  
Review this Project!