Forums : Ohloh General Discussion

Dear Open Hub Users,

We’re excited to announce that we will be moving the Open Hub Forum to https://community.synopsys.com/s/black-duck-open-hub. Beginning immediately, users can head over, register, get technical help and discuss issues pertinent to the Open Hub. Registered users can also subscribe to Open Hub announcements here.

On May 1, 2020, we will be freezing https://www.openhub.net/forums and users will not be able to create new discussions. If you have any questions or concerns, please email us at [email protected]

Criticism

This is criticism in the constructive sense.

It looks like all the attention was applied to making a pretty site, and little to the backend, the most important part of such a project. Or at least, the frontend was made first.

Anyway, your statistics are so far off for the project I submitted that it's ridiculous: I'm listed as the top contributor with 25 KLOC modified. I've contributed a few thousand lines at most, but I was the one who did the full directory restructure. I'm sure other projects have similar issues. For instance, I'm pretty sure the top contributor for MonoDevelop, Daniel Gruenwald (I sort of know him), didn't pump out ~200 KLOC of C# in 2 years.

Your stuff needs to split the concepts of moving/deleting/adding/modifying.

It also seriously needs to learn how to use code revisioning systems... I just checked the stats for the tracker and you did as many checkouts/updates as there are revisions. Good lord. No wonder your lab has problems with analysis.

Anyway, without heeding such issues this website will be essentially pointless. Sure, you'll need to reanalyze all your stuff, but I'm sure it'll go much faster once you do it properly.

If you're looking for help, I might be interested: I'm looking for an internship and would rather not go to MS - Newcastle, WA

PS: Why not make the backend OSS?

mgsloan over 17 years ago

Yeah. We have the same problem:

http://ohloh.net/projects/3159 <-- beewee hasn't modified that many lines but moved some files
http://ohloh.net/projects/3301 <-- we renamed the project and therefore the location in the svn, now it looks like everything was written by gbrandl

But I think that's fixable :D

Armin Ronacher over 17 years ago

It's fixable, sure, but it's also a dire problem. Most people move a few files around. This is on top of the issue of Ohloh indexing the root of the svn repository (indexing tags and branches, which multiplies the LOC count by quite a lot). I'm not even worrying about the fact that LOC is a bad marker of productivity. Another thing I am worrying about is that the types of line modification are not divided up: lines deleted, lines modified, lines created, and lines initial (which solves the issue of people branching another project).

I'm worried that they assumed code analytics would be a quick task, and that the real project is the nice webpage. Or maybe their coder(s) are just good at web dev.

mgsloan over 17 years ago

Thanks for your feedback!

I would have honestly preferred a friendlier tone but frankly I appreciate feedback in almost every form.

It looks like all the attention was applied to making a pretty site...

I didn't find this comment to be very constructive. I guess I'm flattered about your appreciation for the UI, but I believe your main point is that our backend is underdeveloped. I can't argue that our backend needs help: my only counterpoint is that I feel our frontend sucks just as much! We don't support IE very well, we have tons of wacky rendering issues, etc...

Anyway, your statistics are so far off for the project I submitted...

I understand your main complaint to be that we don't interpret checkins correctly: we assume every checkin represents that single developer's contributions, when in truth some checkins are really an aggregate of many others' work. The most common example of this is the initial checkin of a project: it's usually a copy of a pre-existing codebase, and crediting that developer with all that code is wrong/misleading. We are aware of this problem and are planning on addressing it soon.

Your stuff needs to split the concepts of moving/deleting/adding/modifying.

We do some of that already - for example, we don't credit indentation changes. There's more to do here - no doubt. If you have some concrete ideas on how to interpret file diffs into moving/deleting/adding/modifying primitives, let me know.

It also seriously needs to learn how to use code revisioning systems... No wonder your lab has problems with analysis.

I'm actually flattered that you bothered to investigate the activity we had on your source control system. Source control systems rarely have a "duplicate the history" command available. The only way for Ohloh to accurately know the history of checkins is to replay the history of checkins one by one. I'd be overjoyed to discover a shortcut; if you know of any, please forward them on.

Anyway, without heeding such issues this website will be essentially pointless...

Since when has being pointless hurt a popular website's traffic? ;-). On a serious note though, we're psyched by the response so far and are continuing to improve the site. Specific bugs/feature ideas help out a lot - please consider contributing more.

If you're looking for help, I might be interested: I'm looking for an internship and would rather not go to MS - Newcastle, WA

We don't offer internships at this point, but we'll contact you if we do. Meanwhile, feel free to contact me ([email protected]) if you want some help with MS interview preparation; I've performed a few hundred development interviews while working there.

PS: Why not make the backend OSS?

Great question: we keep revisiting it, and the simple answer is that we need to make a buck and aren't sure how to both make the backend OSS and make money.

Jason Allen over 17 years ago

mitsuhiko,

Thanks for the specific examples. We're currently designing new features to help rectify these scenarios. We'll discuss them on our blog soon.

Jason Allen over 17 years ago

mgsloan,

We don't index Subversion from the root: we attempt to identify the trunk (although one can point Ohloh to a branch/tag on purpose).

I am wholly in agreement that LOC is a poor marker of productivity - but it is ubiquitous and easily accessible. We are beginning to look at more metrics and welcome suggestions - I am very eager to move beyond LOCs.

We never assumed source analytics to be quick or easy - and it didn't disappoint. Constructive metrics / feature suggestions are again very welcome - please get concrete!

Now for your suggestion regarding the classification of lines (deleted, modified, created, initial). We plan on enabling a feature where users could indicate a checkin/commit as a port or initial checkin (and as a result we wouldn't credit the developer with those lines). This should handle a large subset of your criticisms above. The next step is to identify modification vs adding/deleting. How do you propose classifying a line as modified vs added/deleted?

-jay

Jason Allen over 17 years ago

Yeah, sorry for the negative tone. Anyway, a few days ago I sent a much more detailed email with methods etc., and got a very detailed email back (and you/they have already tried the same basic things). The main reason for this was that I wanted to use Ohloh to get an idea of how much I've done on a project overall.

I guess I jumped to conclusions as far as your attention to the backend goes. My main problem with the system is that moving a file gives you a modify count of 2x the lines in the file. Other things are rather negligible in comparison. Perhaps cumulative LOC modifications could be displayed on the main screen, while the detailed view shows finer subdivisions.
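One way to avoid that 2x count can be sketched as follows, under stated assumptions: within a single commit, a deleted file and an added file whose contents match (after stripping trailing whitespace) are treated as a move, and none of their lines are credited. The `deleted`/`added` dictionaries mapping path to file contents are a hypothetical input format, not how Ohloh actually stores commits, and this only catches pure moves, not a move combined with edits.

```python
import hashlib


def content_key(text: str) -> str:
    # Hash the content with trailing whitespace stripped from each line,
    # so trivial whitespace differences don't hide a move.
    normalized = "\n".join(line.rstrip() for line in text.splitlines())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def classify_changeset(deleted: dict, added: dict):
    """deleted/added: hypothetical maps of path -> file contents for one commit.

    Returns (moves, lines_added, lines_deleted), where moves is a list of
    (old_path, new_path) pairs whose lines are not credited to anyone.
    """
    by_key = {}
    for path, text in deleted.items():
        by_key.setdefault(content_key(text), []).append(path)

    moves, unmatched_added = [], []
    for path, text in added.items():
        candidates = by_key.get(content_key(text))
        if candidates:
            # Identical content deleted elsewhere in the same commit: a move.
            moves.append((candidates.pop(), path))
        else:
            unmatched_added.append(path)

    moved_old = {old for old, _ in moves}
    lines_added = sum(len(added[p].splitlines()) for p in unmatched_added)
    lines_deleted = sum(
        len(text.splitlines()) for p, text in deleted.items() if p not in moved_old
    )
    return moves, lines_added, lines_deleted
```

With this scheme, moving `src/a.py` to `lib/a.py` and adding a one-line `lib/b.py` would be credited as one added line, rather than as the moved file's lines counted twice plus the new one.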

Anyway, as for metrics other than LOC, raw file size is often better. I suppose you could filter out whitespace/comments. The best technical metric would be to have parsers for all the languages and count the nodes in the AST.
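Both ideas can be sketched for a single language, assuming Python source and using only Python's standard `ast` and `tokenize` modules (a real tool would need a parser per language, as suggested above). Comments, blank lines and formatting produce no AST nodes and are skipped by the token filter, so neither metric rewards reformatting.

```python
import ast
import io
import tokenize

# Token types that carry no "code" in the sense of this metric.
_SKIP = {
    tokenize.COMMENT,
    tokenize.NL,
    tokenize.NEWLINE,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENDMARKER,
}


def ast_node_count(source: str) -> int:
    """Count every node in the parsed syntax tree of Python source."""
    tree = ast.parse(source)
    return sum(1 for _ in ast.walk(tree))


def significant_chars(source: str) -> int:
    """File size with whitespace, comments and indentation filtered out."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return sum(len(tok.string) for tok in tokens if tok.type not in _SKIP)
```

For example, `x = 1` and `x   =   1   # trailing`, preceded by a comment line, give the same node count and the same significant size.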

If you have some sort of representation similar to an svn diff, then you should be able to locate runs of -/+ lines and match them up by similarity (numerous algorithms exist); the matched pairs are modifications.

If you wanted to catch moves within a single file, you might use a similarity method I think I invented (there are probably better ones like it), which has a relatively costly setup but a very cheap scan (perhaps tried on the first line of a block of removes, starting the search from that point for the other lines). Basically, you create a map (aka hash table) with characters as keys and, as values, the locations of all occurrences of that character. When scanning, start a score at a certain amount (probably varying with line width). Skip a certain number of letters (which could also depend on line width), and check whether the next letter matches any in the current character's list. If so, require that subsequent letters are equal to the text line you are searching for. As soon as differences are found, decrement the score and skip forward a bit. It probably sounds a bit incoherent in text form; ah well, you'll probably get the idea.
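The run-matching idea can be sketched as below. Note that this swaps in the standard library's `difflib.SequenceMatcher.ratio()` as the similarity measure rather than the character-index scan described above, and the 0.6 threshold is an arbitrary assumption to tune. Each run of `-` lines directly followed by `+` lines is one block; every removed line is paired with its most similar unused added line, and leftovers count as plain deletions or additions.

```python
from difflib import SequenceMatcher


def classify_hunk(diff_lines, threshold=0.6):
    """Count modified/added/deleted lines in unified-diff body lines."""
    counts = {"modified": 0, "added": 0, "deleted": 0}
    removed, inserted = [], []

    def flush():
        # Pair each removed line with the most similar unused inserted line.
        used = set()
        for old in removed:
            best, best_score = None, threshold
            for i, new in enumerate(inserted):
                if i in used:
                    continue
                score = SequenceMatcher(None, old, new).ratio()
                if score >= best_score:
                    best, best_score = i, score
            if best is None:
                counts["deleted"] += 1
            else:
                used.add(best)
                counts["modified"] += 1
        counts["added"] += len(inserted) - len(used)
        removed.clear()
        inserted.clear()

    for line in diff_lines:
        if line.startswith(("---", "+++", "@@")):  # file/hunk headers
            flush()
        elif line.startswith("-"):
            if inserted:  # a new '-' run after '+' lines starts a new block
                flush()
            removed.append(line[1:])
        elif line.startswith("+"):
            inserted.append(line[1:])
        else:  # an unchanged context line ends the current block
            flush()
    flush()
    return counts
```

On a hunk that rewrites `return 3.14 * r * r` as `return math.pi * r * r`, adds a comment line and deletes an unrelated line, this should report one modified, one added and one deleted line, instead of two deletions and two additions. A real implementation would also need to tell a `---` header apart from a removed line that happens to start with `--`.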

I thought of the marking as port/branch thing - would probably work. Still, automatic is better.

Thanks for the interview advice offer as well, I'll keep it in mind if I get that far.

mgsloan over 17 years ago

Hi Elpie,

Thanks for informing us about the Mambo stats. I would like to fix this asap. Could you please educate me about what exact aspects of the report are suspicious/wrong?

Jason Allen over 17 years ago

Elpie posted details here:

http://www.ohloh.net/projects/23/reviews

bombguy over 17 years ago

Thanks bombguy!

Jason Allen over 17 years ago

Hi Elpie,

There is still confusion as to where the project resides.

Ohloh attempts to automate the software audit process. We believe that source control history is a powerful indicator of a project's health. Confusion around the project history reflects poorly on any audit, whether automated or manual.

We used the repository indicated at SourceForge to create our report. This repository is less than two years old and has not seen any activity since May 2006.

If there are other repositories we should be using that reflect either older or more recent activity, please send us a link and we'll be happy to update our report.

Robin

Robin Luckey over 17 years ago

It looks like all the attention was applied to making a pretty site, and little to the backend, the most important part of such a project. Or at least, the frontend was made first.

The funny thing is that a completely functional backend with a much poorer UI would have put you off even sooner, because you might not even have tried it. As the OSS mantra says, release early, release often. I must admit I got hooked on those small charts when I saw them, and yes, I quickly wanted more, but I think the Ohloh team has done a wonderful job of finding the right balance to launch their service and attract people so that they wanted more. In a way, your reaction actually proves them right, because you didn't say "That's crap, I'm off." No, you tried to be constructive (a tiny bit rude, but that's frustration I guess ;)). So hats off to the team, and keep pushing, folks.

Note that I don't belong to the Ohloh team and I only discovered the service two weeks ago.

Personally the main downside I see for now is the name. It sounds pretty but it's a nightmare to type right without looking it up first in my address bar.

Sylvain

Lawouach about 17 years ago

I also think that one of the editor features ought to be the ability to mark two developers as the same person. For instance if aoliver and acoliver are really the same person.
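As a minimal sketch of what such a feature might do with the statistics, assuming a hypothetical editor-maintained map from an alternate committer name to a canonical one, per-author line counts can simply be folded together before display:

```python
from collections import defaultdict


def merge_aliases(lines_by_author: dict, aliases: dict) -> dict:
    """Fold per-author line counts under canonical names.

    aliases: hypothetical map of alternate name -> canonical name,
    e.g. {"aoliver": "acoliver"}; names not in the map are kept as-is.
    """
    merged = defaultdict(int)
    for author, lines in lines_by_author.items():
        merged[aliases.get(author, author)] += lines
    return dict(merged)
```

With made-up counts of 120 lines for `aoliver` and 300 for `acoliver`, the merged report would show a single `acoliver` entry with 420 lines.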

acoliver about 17 years ago