Inactive

Commits: Listings

Analyzed 1 day ago, based on code collected 1 day ago.
Jun 07, 2024 — Jun 07, 2025
Commit Message | Date
consistently use <http://url.example.net> with <> but not (<>) | about 13 years ago
rename README -> README.rst for GitHub formatting | about 13 years ago
Reformat README as Markdown/reStructuredText | about 13 years ago
Added .gitignore | about 13 years ago
Removed the feature that counts downloaded files | almost 14 years ago
svn path=/src/trunk/corpuscatcher/; revision=17530 | about 14 years ago
Tries to align files in two folders - src and tgt language - using html structure, numbers and url correspondence | about 14 years ago
Improved pattern matching for urls by using regexes | about 14 years ago
Added an option (-e) to specify a pattern to be matched in the URLs to be downloaded. | about 14 years ago
Fixed a bug related to selecting encodings for html files | about 14 years ago
Assume that immediately consecutive lines are part of the same paragraph and join them. Split paragraphs in our outputs by two newlines. | about 14 years ago
Some cleanup, simplification, reordering | about 14 years ago
Better support for non-list output (output as running text) | about 14 years ago
Suppress unnecessary warning about having the browser handle gzipped data | over 16 years ago
Don't convert pages if there's nothing to convert. | over 16 years ago
- Moved browser object initialization to a separate method (so that it's available to importing clients). - Added a "browser" parameter to download_url(). | over 16 years ago
Fixed a bug where only the last crawled URL (and its connections) is converted to text. | over 16 years ago
Make corpuscatcher an importable module. | over 16 years ago
Added support for handling more encodings. | almost 17 years ago
- Added -V/--version command-line argument - Added more specific settings to the mechanize.Browser object used for crawling | almost 17 years ago
Added -V/--version command-line argument. | almost 17 years ago
Documentation updated: - Added LICENSE and __version__.py - README points the reader to the README on the wiki. | almost 17 years ago
Fix copyright date | almost 17 years ago
Correct copyright dates | almost 17 years ago
Initial version of CorpusCatcher tools. | almost 17 years ago