Forums : Suggestions for Ohloh 2.0

Dear Open Hub Users,

We’re excited to announce that we will be moving the Open Hub Forum to https://community.synopsys.com/s/black-duck-open-hub. Beginning immediately, users can head over, register, get technical help, and discuss issues pertinent to the Open Hub. Registered users can also subscribe to Open Hub announcements here.


On May 1, 2020, we will be freezing https://www.openhub.net/forums and users will not be able to create new discussions. If you have any questions or concerns, please email us at [email protected]


Your source control analysis doesn't seem to be able to deal well with projects that have been moved within a single Subversion repository. Looking at Solr, for example...

http://www.ohloh.net/projects/5455/analyses/latest

...ohloh indicates that the project has only been around since Jan 2007, but that's just when it graduated from incubation and the files were svn mv'd to their current location. The Subversion history shows the files actually go back to Jan 2006...

http://svn.apache.org/viewvc/lucene/solr/trunk/README.txt?view=log

It actually looks like ohloh would be able to handle this if the code had been completely moved from one repository to another (by adding a second enlistment) but since it was moved within the same repository there is no old repository URL to add.

Anonymous Coward over 16 years ago
 

Yes, this is a known limitation with our Subversion importer -- we don't follow any file copies or renames.

I'd desperately love to get this fixed.

Robin Luckey over 16 years ago
 

okie, since my head isn't working, remind me again why the importer needs --stop-on-copy

Daniel / Nazca ... over 16 years ago
 

OK, this always makes my head hurt, but here we go:

Our Subversion importer is brute-force. You give us a URL, and we check out every revision in that URL. Between revisions, we look for diffs and put the diffs in our database.

For example, suppose you tell us to download http://myproject.com/trunk.

Also suppose that at some point in history, the admins had the great idea to reorganize the source tree, and so in the early history all the code lived in a different directory, for example http://myproject.com/myproject.

So at some point in the Subversion log, there's going to be a massive event which looks something like this:

A /trunk
D /myproject

Everything in the log prior to this event is going to refer only to files in the /myproject directory.

And here's the problem: We're only checking out and looking for diffs in /trunk. To us, the /myproject directory is a completely different URL which we are not downloading. So all of the log entries for activity in the /myproject directory are irrelevant to us, because they don't change the /trunk directory at all.

So rather than waste our time checking out revisions that don't affect /trunk, we suppress these revisions from the start by passing --stop-on-copy to svn log.
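The effect of --stop-on-copy can be sketched in a few lines of Python (a toy model, not the actual importer). Given log entries newest-first, as svn log prints them, we keep revisions only back to the point where the watched path was created by a copy; everything older is suppressed:

```python
# Toy model of `svn log --stop-on-copy` (not Ohloh's real importer).
# Each entry is (revision, [(action, path, copyfrom_path), ...]),
# newest first; the data below is made up for illustration.

def stop_on_copy(entries, watched_path):
    """Keep entries from newest down to (and including) the revision
    where watched_path was created by a copy; drop everything older."""
    kept = []
    for rev, changes in entries:
        kept.append(rev)
        for action, path, copyfrom in changes:
            if action == "A" and path == watched_path and copyfrom:
                return kept  # history before the copy is suppressed
    return kept

# Example: /trunk was copied from /myproject in r100.
log = [
    (120, [("M", "/trunk/solr/README.txt", None)]),
    (100, [("A", "/trunk", "/myproject"), ("D", "/myproject", None)]),
    (50,  [("M", "/myproject/README.txt", None)]),  # pre-copy history, dropped
]
print(stop_on_copy(log, "/trunk"))  # -> [120, 100]
```

The pre-copy revision (r50) never makes it into the result, which is exactly why the Solr history appears to start in Jan 2007.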

I freely admit that this is a very crude (but very reliable) way to do downloads, and for a long time we've wanted to reimplement our Subversion importer (the new svn clone command features prominently in this plan). This should let us download Subversion history more quickly, with better fidelity, and open the door to proper handling of copying and branching. But life has a way of continually interrupting these plans....

Robin Luckey over 16 years ago
 

*nod* it's a scope problem.

The question is, should branching actually be supported? When someone adds an enlistment, they give the trunk directory ... and the history of that URL and the history of the code sitting at it are two different things.

As a thought... you're doing svn log, then starting at the beginning and stepping through each revision in the log, right? Would it be a lot of work to change it about a little so it looks like this:

  • svn log the enlistment (without --stop-on-copy)
  • grep out files outside the enlistment url
  • grep out empty revisions (could possibly be combined with the above step)
  • start checking out as before
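The pipeline above could be sketched like this (a hypothetical filter over already-parsed svn log entries, not Ohloh's code): take the full log, grep out paths outside the enlistment URL, and drop revisions left empty:

```python
# Sketch of the proposed filtering: full `svn log` (no --stop-on-copy),
# then drop paths outside the enlistment and skip revisions left empty.
# Entries are (revision, [changed_path, ...]); all names illustrative.

def filter_log(entries, enlistment):
    prefix = enlistment.rstrip("/") + "/"
    filtered = []
    for rev, paths in entries:
        inside = [p for p in paths if p == enlistment or p.startswith(prefix)]
        if inside:                      # skip revisions with no relevant paths
            filtered.append((rev, inside))
    return filtered

log = [
    (3, ["/trunk/src/main.c"]),
    (2, ["/branches/exp/foo.c"]),       # outside the enlistment: dropped
    (1, ["/myproject/README", "/trunk/README"]),
]
print(filter_log(log, "/trunk"))
# -> [(3, ['/trunk/src/main.c']), (1, ['/trunk/README'])]
```

Note that this still only sees changes under the enlistment path itself; the pre-copy history living under the old directory stays invisible unless paths are also mapped across the copy.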

It's not as pretty as the current method, but it might work?

Though as I say, it raises the question of whether the parser should stick to the URL it's given, or follow the history of the folder at that URL :/ I can see arguments for both approaches.

(the new svn clone command features prominently in this plan).

svn clone? That is new. I suppose if you can clone the repo, that simplifies the problem greatly, because you then have the dump filter to play with.

Edit: grrr, bug in formatting code: before I added the escaping, the asterisks around nod made italics, but the ones around is didn't

Daniel / Nazca ... over 16 years ago