Forums : Ohloh General Discussion

Dear Open Hub Users,

We’re excited to announce that we will be moving the Open Hub Forum to https://community.blackduck.com/s/black-duck-open-hub. Beginning immediately, users can head over, register, get technical help and discuss issues pertinent to the Open Hub. Registered users can also subscribe to Open Hub announcements here.

On May 1, 2020, we will be freezing https://www.openhub.net/forums and users will not be able to create new discussions. If you have any questions and concerns, please email us at [email protected]

missing contributors

I have recently added an enlistment to a project. Its status says it has been processed. However, some people who contributed to the code are not listed in the contributors list...

Further, the project report does not show the early history (six months are missing).

Is there a delay before the report and the contributors list are updated?

Godefroid Chapelle over 18 years ago

I also wonder about this because I am not listed for the Drupal (contributed modules) project, although I have been contributing for the last few months. I can't even find commits for the project I'm working on there, and those commits go to HEAD, not just a branch.

This project has been listed on ohloh since November 1, 2006, so it ought not to be a delay issue.

Andrew Sterling... over 18 years ago

Godefroid, specifically which enlistment are you looking at?

Which contributor names are missing?

Robin Luckey over 18 years ago

Hi Andrew,

I spent a long and complicated day trying to figure out why your name does not appear in our report. From the CVS logs, it's clear that your name should appear. I suspect this is a one-time data corruption issue, or perhaps an unusual CVS failure occurred at the time we downloaded your first commit.

While I continue to investigate, I've started a new, fresh download of the Drupal contributions module. I'm optimistic that this will clear up the report problems and your name will be correctly listed. This module is pretty large, so I would expect this to take a few days at least to complete.

Robin Luckey over 18 years ago

gchapelle
tziade
vbaumann

All found in http://svn.nuxeo.org/z3lab/azax/trunk/, one of the KSS enlistments.

Godefroid Chapelle over 18 years ago

Hi Godefroid,

I have bad news.

Our Subversion parser does not follow the history of branches. We only import activity which occurred in the directory specified in the enlistment, in this case, /azax/trunk.

/azax/trunk was created on 2006-05-02, so we do not show any history before this date.

The work by tziade occurred in /azax/branches/plugin, a directory which is ignored by our importer.

A lot of projects are impacted by this problem. This is a priority fix for us.

Robin Luckey over 18 years ago

I found a way around this by enlisting each subdirectory as a different repository; the parser merged all the projects together into one for the code tab and contributors.

In fact, I had to do it this way, as there are folders within trunk/ that are not part of my project, and I would not want them included in the project's source tree on ohloh.

brain over 18 years ago

Hi Robin,

I understand that you are currently not parsing branches. Because many projects have contributors working on branches and other people doing the merge on trunk later, the work done by the original contributors is currently not tracked. I suppose this is the problem you mention.

However, I just checked the repository:

svn log http://svn.nuxeo.org/z3lab/azax/trunk/demos

This shows code by tziade on 2005-06-25. This is the code I am surprised to not see in your report.

Godefroid Chapelle over 18 years ago

/z3lab/azax/trunk/demos is in a different directory.

/azax/trunk is the directory that was parsed to create the Ohloh report, and only files whose paths begin with this prefix are in the Ohloh report.

Complication: yes, it's true that these are really the same directory, because on 2005-11-14 /z3lab/azax was renamed to /azax. However, for our purposes, this is functionally the same as creating a new branch in Subversion, and our system can't follow the history beyond the rename.
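The prefix rule described above can be sketched as a simple path filter. This is a hypothetical illustration, not Ohloh's actual importer: the function name `in_enlistment` is an assumption, and the sample paths are taken from this thread (tziade's file before and after the rename).

```python
def in_enlistment(changed_path: str, enlisted_prefix: str) -> bool:
    """Return True if a changed path lies inside the enlisted directory."""
    prefix = enlisted_prefix.rstrip("/")
    # Match the directory itself or anything beneath it, but not siblings
    # such as /azax/trunk2 that merely share the same leading characters.
    return changed_path == prefix or changed_path.startswith(prefix + "/")


enlisted = "/azax/trunk"  # repository-relative path of the enlistment
changes = [
    "/azax/trunk/demos/azaxdemo/browser/azax_demo.pt",        # name after the rename
    "/z3lab/azax/trunk/demos/azaxdemo/browser/azax_demo.pt",  # name before the rename
]
for path in changes:
    print(in_enlistment(path, enlisted), path)
```

Both paths name the same file, but only the post-rename one passes a pure prefix match, which is why work committed under the old name drops out of the report.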

Robin Luckey over 18 years ago

http://svn.nuxeo.org/z3lab/azax/trunk is one of the enlistments for KSS project.
The enlistment page says it was completed 4 days ago...
/z3lab/azax/trunk/demos should be parsed, if I correctly understand what you told me in your post.

What is the /azax/trunk you mention? Is it http://codespeak.net/svn/kukit/azax/trunk, which is also enlisted?

Godefroid Chapelle over 18 years ago

Hi Godefroid,

Sorry for the confusion. Sometimes I get carried away after reading thousands of Subversion logs. The log for this repository is very confusing because of all the branching. I'll try to explain myself better, but the sheer length of this post might make our brains explode. :-)

The /azax/trunk directory I am talking about is the directory of the trunk since 2006-05-02. The full URL of this directory is http://svn.nuxeo.org/z3lab/azax/trunk. This is the URL you supplied to Ohloh. This directory did not exist before 2006-05-02.

Before 2005-11-14, this directory was named /z3lab/azax/trunk, and its URL was http://svn.nuxeo.org/z3lab/z3lab/azax/trunk. Note carefully the double /z3lab/z3lab. This is the old URL for this project. It has not existed since 2005-11-14. You can confirm that it used to exist:

svn log -r573:573 http://svn.nuxeo.org/z3lab/z3lab/azax/trunk@573

But it no longer exists:

svn log -rHEAD http://svn.nuxeo.org/z3lab/z3lab/azax/trunk

Because this directory no longer exists, Ohloh does not see it. This old directory is where tziade did his work.

You can see tziade's contribution in the log you described because by default, Subversion follows the complete chain of all branches that led to tziade's contribution. Ohloh does not follow this branching history. We use the --stop-on-copy flag when we retrieve the svn log.

If you do not use the --stop-on-copy flag, as you did, you will see a lot of branching activity:

At revision 573, tziade makes his last contribution:

r573 | tziade | 2005-06-25 10:21:22 -0700 (Sat, 25 Jun 2005) | 1 line

M /z3lab/azax/trunk/demos/azaxdemo/browser/azax_demo.pt

At revision 1804, the /z3lab/azax directory is renamed to simply /azax (this eliminates the double /z3lab/z3lab from the URL)

r1804 | root | 2005-11-14 09:36:48 -0800 (Mon, 14 Nov 2005) | 1 line

D /z3lab/azax

A /azax (from /z3lab/azax:1803)

At revision 2247, /azax/trunk is branched to /azax/branches/snowsprint:

r2247 | bree | 2006-01-30 03:13:48 -0800 (Mon, 30 Jan 2006) | 2 lines

A /azax/branches/snowsprint (from /azax/trunk:2246)

At revision 2566, /azax/branches/snowsprint is branched to /azax/branches/plugin:

r2566 | bree | 2006-03-09 22:55:41 -0800 (Thu, 09 Mar 2006) | 2 lines

A /azax/branches/plugin (from /azax/branches/snowsprint:2565)

Finally, at revision 3023, /azax/branches/plugin is branched one more time, replacing /azax/trunk:

r3023 | bree | 2006-05-02 07:15:49 -0700 (Tue, 02 May 2006) | 1 line

D /azax/branches/plugin

A /azax/trunk (from /azax/branches/plugin:3022)

Ohloh sees everything after revision 3023, and nothing before it, because we don't follow branches. tziade's contribution is hidden by 4 separate branch events!
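The chain of copies above can be replayed with a short sketch that follows copy events backwards, roughly what a plain `svn log` does and what `--stop-on-copy` declines to do. The `Copy` records are transcribed from the log excerpts in this post; `trace_history` is a hypothetical model, not Subversion's real algorithm, and it ignores deletions.

```python
from typing import List, NamedTuple, Tuple


class Copy(NamedTuple):
    rev: int      # revision in which the copy was committed
    dest: str     # path created by the copy
    src: str      # copy source path
    src_rev: int  # copy source revision


# Copy events transcribed from the log excerpts above (deletions omitted).
COPIES = [
    Copy(1804, "/azax", "/z3lab/azax", 1803),
    Copy(2247, "/azax/branches/snowsprint", "/azax/trunk", 2246),
    Copy(2566, "/azax/branches/plugin", "/azax/branches/snowsprint", 2565),
    Copy(3023, "/azax/trunk", "/azax/branches/plugin", 3022),
]


def covers(dest: str, path: str) -> bool:
    """True if `path` is `dest` itself or lies underneath it."""
    return path == dest or path.startswith(dest + "/")


def trace_history(path: str, rev: int) -> Tuple[List[Tuple[str, int]], List[Copy]]:
    """Follow copy events backwards from (path, rev), like a plain svn log."""
    locations = [(path, rev)]
    followed: List[Copy] = []
    while True:
        # The most recent copy at or before `rev` that created `path`
        # or one of its parent directories.
        candidates = [c for c in COPIES if c.rev <= rev and covers(c.dest, path)]
        if not candidates:
            return locations, followed
        copy = max(candidates, key=lambda c: c.rev)
        # Rewrite the path into its pre-copy name; src_rev < rev, so this terminates.
        path = copy.src + path[len(copy.dest):]
        rev = copy.src_rev
        locations.append((path, rev))
        followed.append(copy)


locations, followed = trace_history("/azax/trunk", 3023)
for p, r in locations:
    print(f"r{r}: {p}")
print("copy events crossed:", len(followed))
print("a --stop-on-copy log reaches back only to r%d" % followed[0].rev)
```

The trace crosses four copy events and ends at /z3lab/azax/trunk as of r1803, the directory where tziade's r573 commit lives.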

OK, if you are still reading and your brain has not exploded, I hope this has helped answer your question.

Robin Luckey over 18 years ago

Hi Robin,

Wow, thanks for taking the time to write such a long answer.

Now I understand!

What is the reason for using the --stop-on-copy flag?
To limit the amount of data? I cannot think of anything else.

Godefroid Chapelle over 18 years ago

I don't know who set up the plone project here, but was there any reason to put only the individual trunks of sub-projects in there? I mean, http://svn.plone.org/svn/archetypes would catch every branch and every subproject. Adding individual http://svn.plone.org/svn/archetypes/ArchGenXML/trunk items seems like a lot of work (and apparently has the kind of drawbacks that Godefroid encountered).

As I barely had any commits in the plone project, I assumed (without checking) that plone meant only the http://svn.plone.org/svn/plone repository and not the (related) http://svn.plone.org/svn/archetypes and http://svn.plone.org/svn/collective repositories where I did do a lot of work. So I registered them both two days ago as separate projects.

Any suggestions?

Remove the archetypes and collective projects and fold them under the plone project (which seems a good idea)?
Remove all the individual small bits and pieces of repository and just add the three full svn repositories (plone, collective, archetypes)?
I mean, I'm not even contemplating adding the 20 collective projects I'm working on (each probably including a trunk and two branches or so), which would make some 60 new submits. Yuck.

Reinout

Reinout van Rees over 18 years ago

I (and others) set up the Plone project. I think there has been an unspoken consensus that the plone project would be the code delivered in the Plone tgz.

I think it would be nice to have a Plone community project where collective and archetypes repositories would be reported about.

However, this is a LOT of code (among others because it has all branches). I am not sure Ohloh team wants to import so much code.

Godefroid Chapelle over 18 years ago

Well, I started the archetypes project here and just added the whole repository. The download is slow and is currently at 66%, but it'll get there. The collective gives an unspecified error. So eventually the collective and archetypes repositories will get there.

I actually wonder how good a picture the statistics of the plone project give now. The code delivered in a plone .tgz sounds OK, but only specific branches or trunk versions are listed: are those all the trunks/branches ever included in a plone release? Or just the versions in the current releases?

The same style of cherry-picking branches and trunks also happens in the zope project and there you see a very strange graph. 20% of the code disappears somewhere in early 2005, for instance. It can be a code cleanup, but in this case it is probably a branch that got moved (and subsequently ignored in the statistics).

Does ohloh have an opinion on this? Do you want to be clobbered with full repositories yes/no? Is that preferred? Not?

Reinout van Rees over 18 years ago

I guess I am the one to blame for the missing contributors, then. At some point development moved entirely to branches, and instead of merging it into the trunk, I moved the entire branch to trunk.

Needless to say, this is a perfectly valid operation in svn, and it also keeps the history. I did not know that at some point in the future it would confuse ohloh.

I will avoid doing this from now, but it would be good if there were a way to resolve the situation for the past.

Balázs Reé over 18 years ago

Regarding huge (and/or many repositories):

Ohloh's preference is for completeness and accuracy. We encourage people to specify full, historical repositories. Initial crawls can take a while but otherwise we should be able to handle it.

Jason Allen over 18 years ago

I want to clarify Jason's comment, because it's important not to misunderstand.

A single Ohloh report should include only a single line of development, either the trunk or a single, specific branch. There are two reasons for this:

First, your report will be incorrect if you include multiple branches. If your trunk has 10,000 lines of code, and you create a branch, you will now have 20,000 lines of code. The Ohloh report will show double the correct number. Also, the person who created the branch will be credited with writing 10,000 lines of new code.
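The double counting can be reproduced with a toy model of a repository as a mapping from file path to line count. The file names and the 6,000/4,000 split are made up for illustration; only the 10,000-line total comes from the example above.

```python
def total_loc(files: dict, prefix: str) -> int:
    """Sum line counts for every file path under `prefix`."""
    return sum(n for path, n in files.items() if path.startswith(prefix))


# Hypothetical trunk with 10,000 lines in total.
repo = {"/trunk/a.c": 6_000, "/trunk/b.c": 4_000}
before = total_loc(repo, "/")

# Branching in Subversion copies the directory, so every file now
# exists twice under different paths.
repo.update({p.replace("/trunk/", "/branches/b1/", 1): n
             for p, n in repo.items() if p.startswith("/trunk/")})

after = total_loc(repo, "/")
print("whole repository:", after)
print("trunk only:", total_loc(repo, "/trunk/"))
print("lines credited to the branch creator:", after - before)
```

Counting the whole repository reports 20,000 lines and credits 10,000 "new" lines to whoever made the branch, while counting a single line of development (the trunk) stays at 10,000.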

Second, the extra work can be very hard on the source control server and our own servers. It might take a very long time to download. Some projects have hundreds of branches (sometimes creating a new branch every time they build). This is one of the great features of Subversion, but it also can result in a directory with millions of lines of redundant code, and it will crush our service.

We really want to cover all the code we can, and all the history that it is possible to import -- just one branch per report. If you have an additional branch besides the trunk that you want to import, we recommend creating a new Ohloh project report for that. You can make as many of these additional reports as you'd like, and it's probable that someday we'll implement a grouping feature that lets you merge several such reports together. We just want to avoid mega-projects that include multiple branches in a single report.

Because we are interfacing remotely with a public Subversion server, we are limited in our import tactics, and we have to be good net citizens in our downloads. As time and resources allow, I'd love to improve and optimize our imports, but for Version 1.0 we have to be happy with one branch per report.

Robin Luckey over 18 years ago

Robin, translating this to the plone project:

It is right that the plone project didn't include just the full repository.
But adding all the 40 or so individual smaller projects inside one project is suboptimal.
Plone ought to be split into 40 small projects.
Don't add more than one branch/trunk to a project.

To me, it starts to feel pretty cumbersome. And you're more or less guaranteed to get some black holes that aren't counted. I'll kill off the two full-repository-projects that I've submitted and await how it works out in practice.

Reinout van Rees over 18 years ago

Hi Reinout,

You're correct, but it's not necessarily true that Plone ought to be 40 smaller projects. For example, the GNOME project has a very similar repository structure to Plone's, with a lot of subprojects that make up a single whole. In that case, I used a script to insert the 500+ GNOME trunk directories into Ohloh. This gives us a very accurate count of the total lines of code, but you are correct that we have some black holes where work that was done on a branch is hidden from the Ohloh report.

It looks like the Plone collective also has over 500 subprojects in its repository. If you can generate a text file listing of the directories you'd like to see in the Ohloh report, I can run a similar script to do an automated insert into our system.
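A listing like the one requested above could be produced by filtering a directory listing of the repository, such as the one `svn list -R` prints. This is a sketch under assumptions: it expects the conventional `<subproject>/trunk/` layout, relies on `svn list` marking directories with a trailing slash, and uses hypothetical subproject names; subprojects without a `trunk` directory are reported separately rather than guessed at.

```python
from typing import Iterable, List, Tuple


def select_trunks(base_url: str, entries: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (trunk URLs, subprojects without a trunk) from an svn listing."""
    subprojects = set()
    with_trunk = set()
    for entry in entries:
        if not entry.endswith("/"):
            continue  # skip files; directories end with "/"
        parts = entry.rstrip("/").split("/")
        if len(parts) == 1:
            subprojects.add(parts[0])
        elif len(parts) == 2 and parts[1] == "trunk":
            with_trunk.add(parts[0])
    trunks = [f"{base_url.rstrip('/')}/{name}/trunk" for name in sorted(with_trunk)]
    missing = sorted(subprojects - with_trunk)
    return trunks, missing


listing = [  # hypothetical `svn list -R` output
    "ProjectA/", "ProjectA/trunk/", "ProjectA/trunk/setup.py", "ProjectA/branches/",
    "ProjectB/", "ProjectB/setup.py",
    "ProjectC/", "ProjectC/trunk/", "ProjectC/tags/",
]
trunks, missing = select_trunks("http://svn.plone.org/svn/collective", listing)
print("\n".join(trunks))
print("no trunk:", missing)
```

The second list is the part a script cannot settle on its own: someone still has to decide what counts as the line of development for those subprojects.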

Robin Luckey over 18 years ago

I'm not certain that such a one-time list of trunk directories is really valuable. Sure, I can spend 30 minutes writing a script that selects all trunk directories (not every subproject will be divided into trunk/branches/tags). But:

It will have limited historical value (quite a number will have their history truncated because a branch got copied to trunk).
It is a one-time snapshot, so a maintenance nightmare if projects are added.
You're guaranteed to miss stuff (as you already mentioned) on branches.

I think something like the collective ought to wait for a later version of ohloh. All the data and knowledge about branches is inside the svn repository. Trying to do part of the work inside ohloh and part of the work outside ohloh (figuring out which branches to feed to ohloh) sounds error-prone to me. One tool should do it. What is counted? What not? What if there's an upgrade later on that does it in a different way? Must the list of 142 trunks be changed, then? Etc.

It is not my intention to be negative. I already like the few statistics that I got out of ohloh. I'm just very much in favour of automating things so that they're reliable. So a non-maintainable one-time listing of trunks based on guesswork... no.

Reinout van Rees over 18 years ago

Hi Robin,

Could you answer the question below, which I asked earlier in the thread?
(Reinout diverged the conversation ;-)

What is the reason for using the --stop-on-copy flag? To limit the amount of data?

I cannot think of anything else.

Godefroid Chapelle over 18 years ago

Godefroid,

Our system doesn't handle branches. Branches result in duplicate code in our reports, inaccurate lines-of-code totals, and work attributed to the wrong people. Because Subversion represents a branch simply as a directory, it's impossible for us to easily know what is a branch and what is not. So to keep branches out of our system, we use the --stop-on-copy flag when fetching the log.
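The effect of `--stop-on-copy` can be emulated client-side on the XML output of `svn log --xml --verbose`, where copied paths carry `copyfrom-path` and `copyfrom-rev` attributes. This is an illustration under assumptions, not Ohloh's code: the sample log is abridged and transcribed from the excerpts earlier in this thread (authors and paths only, no dates or messages), and `stop_on_copy` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import List

# Abridged from the log excerpts earlier in this thread, newest entry first.
SAMPLE_LOG = """<log>
<logentry revision="3023">
<author>bree</author>
<paths>
<path action="D">/azax/branches/plugin</path>
<path action="A" copyfrom-path="/azax/branches/plugin" copyfrom-rev="3022">/azax/trunk</path>
</paths>
</logentry>
<logentry revision="2566">
<author>bree</author>
<paths>
<path action="A" copyfrom-path="/azax/branches/snowsprint" copyfrom-rev="2565">/azax/branches/plugin</path>
</paths>
</logentry>
<logentry revision="573">
<author>tziade</author>
<paths>
<path action="M">/z3lab/azax/trunk/demos/azaxdemo/browser/azax_demo.pt</path>
</paths>
</logentry>
</log>"""


def covers(dest: str, path: str) -> bool:
    """True if `path` is `dest` itself or lies underneath it."""
    return path == dest or path.startswith(dest + "/")


def stop_on_copy(log_xml: str, tracked: str) -> List[int]:
    """Revisions kept when the walk stops at the copy that created `tracked`."""
    kept = []
    for entry in ET.fromstring(log_xml).iter("logentry"):  # newest first
        kept.append(int(entry.get("revision")))
        for node in entry.iter("path"):
            # Copied paths carry copyfrom-* attributes in verbose XML logs.
            if node.get("copyfrom-path") and covers(node.text, tracked):
                return kept  # keep the copy revision itself, then stop
    return kept


print(stop_on_copy(SAMPLE_LOG, "/azax/trunk"))
```

For /azax/trunk the walk keeps only r3023; tziade's r573 entry is never reached, which matches what the importer sees.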

Reinout,

I think you understand the problem well and can see what we're up against.

Even if we master the art of branching and can track all activity in the entire repository, it will still be impossible for us to give a total lines of code for the project unless someone manually tells us what is in the trunk (that is, what is part of the build and is used by developers) and what is not (forgotten branches, nightly build tags, version drops, etc). There are two separate problems here: what is the historical activity (which we aren't very good at yet), and what is the net total lines of code of a particular snapshot (which we are very good at).

I'm eager to improve the works. It's a worthy challenge.

Robin Luckey over 18 years ago

Hi Robin,

Thanks for answering; please feel free to stop this conversation if you want...

I think I understand the issue with branches: you currently have no way to figure out whether code in a branch is duplicated or not...

However, I still do not understand why you use --stop-on-copy on the history of a trunk: I do not see how the fact that a trunk is a copy of a previous branch adds any risk of importing duplicate code...

Godefroid Chapelle over 18 years ago

Well, this is a tough question to answer. It's technically complicated, and I'm not sure I exactly understand what you're asking. But I'm happy to keep trying....

When you tell Ohloh to import a particular url, for instance http://myproject.net/trunk, Ohloh does a very simple, brute-force operation: we do a svn checkout of the oldest revision of http://myproject.net/trunk. Then we check out the next revision and look for changes. We keep checking out the repository over and over until we have captured every revision.
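The revision-by-revision approach described above amounts to comparing consecutive snapshots of the enlisted directory. In this sketch, each snapshot is a hypothetical dictionary from file path to file contents standing in for a real checkout of one revision; it models the idea and is not Ohloh's importer.

```python
from typing import Dict, List, Tuple

Snapshot = Dict[str, str]  # file path -> file contents at one revision


def diff_snapshots(old: Snapshot, new: Snapshot) -> List[Tuple[str, str]]:
    """Return (action, path) pairs using svn-style A/M/D codes."""
    changes = []
    for path in sorted(old.keys() | new.keys()):
        if path not in old:
            changes.append(("A", path))
        elif path not in new:
            changes.append(("D", path))
        elif old[path] != new[path]:
            changes.append(("M", path))
    return changes


def import_history(snapshots: List[Snapshot]) -> List[List[Tuple[str, str]]]:
    """Walk revisions oldest to newest, diffing each against the previous one."""
    history = []
    previous: Snapshot = {}  # nothing exists before the first revision
    for snapshot in snapshots:
        history.append(diff_snapshots(previous, snapshot))
        previous = snapshot
    return history


revisions = [  # hypothetical contents of three consecutive revisions
    {"hello.c": "int main(void) { return 0; }\n"},
    {"hello.c": "int main(void) { return 1; }\n", "README": "hi\n"},
    {"README": "hi\n"},
]
for rev, changes in enumerate(import_history(revisions), start=1):
    print(rev, changes)
```

In this model a file copied in from elsewhere simply shows up as an `A` with its new contents; whatever happened to it before the copy is invisible, because the source directory is never checked out.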

If you copy a file from another branch into the trunk, we have a problem, and here's why:

Suppose you copied a file from http://myproject.net/branches/helloworld.c to http://myproject.net/trunk/helloworld.c.

Because our engine will check out every revision of http://myproject.net/trunk, we can see this file after it was copied.

However, our engine never checks out http://myproject.net/branches. We will never see the contents of any of those files.

So although we can tell from the log that you copied the file, we are unable to get the contents of the file before it was copied.

Since we can't get the file contents anyway, it's simpler for us if the log excludes the copy.

I'm not sure if this helps answer your question -- feel free to follow up.

Robin Luckey over 18 years ago