Excluding Code/Markup

Hi there

Is it possible to exclude code from the statistics?
There's a lot of generated code, which leads to a 4 person-year estimate even though the project is only a few months old and uses a third-party library.

Greez

Michel Jung almost 17 years ago
 

I have a similar problem. I have a project with a 3rd party library in the same repository, so it lists 3 person-years instead of the correct 1-4 months.

Nicolas A. Barriga almost 17 years ago
 

Sorry, this is not currently possible.

For a long time, there has been a good idea floating around: Ohloh should support some kind of robots.txt-like file that would allow you to instruct Ohloh to ignore or give special treatment to certain directories.

I think that's a great idea, but we've simply never had the development resources required to get it done.

If you are using Subversion, there may be a workaround, but it's a lot of work: rather than enlisting your entire trunk in Ohloh, you can individually add every directory except the directory containing the 3rd party library. If you have a lot of directories, I can appreciate that this may not be a realistic option.

Thanks,
Robin

Robin Luckey almost 17 years ago
 

A few of the projects I manage show as mostly written in XSLT because we have the DocBook stylesheets in our SVN repositories. We'd also appreciate a way to exclude a particular directory.

jfuerth almost 17 years ago
 

I have added a ticket: http://labs.ohloh.net/ohcount/ticket/317; unfortunately I am not a Ruby coder (yet); anyone else up for it?

(see also thread https://www.ohloh.net/topics/3356?page=1#post_10651)

tpokorra almost 17 years ago
 

I'm hitting this with my project as well. I've just added an app I'm writing called WarFoundry. It has a System.Windows.Forms (Microsoft .Net Windows native) front end. All of the .resx (resource) files are bumping our XML line count way out of proportion. The Glade files for the GTK# front-end are probably doing the same thing. Both are all auto-generated files.

Unfortunately the multiple directories idea won't work because resx files are in the same place as .cs and Glade files are in a folder below the main code. What would be great for that situation would be an ignore file name pattern option :)

IBBoard almost 17 years ago
 

A better workaround for Subversion is to make the 3rd party code an svn external. Last time I checked, Ohloh doesn't traverse externals.

You can do this by adding the 3rd party code outside the regular code tree, for example at /3rdparty instead of /trunk/3rdparty, and making /trunk/3rdparty an external pointing to /3rdparty.

Peter Bex almost 17 years ago
 

OK, you all have some ideas, but I can't agree with most of them. I'd say none of us wants to bend our project structure just to suit the Ohloh statistics.

The only good solution I see is to set paths/patterns to exclude in the ohloh control panel.

Michel Jung almost 17 years ago
 

Looks like there's a minor false-alarm with my project :) While digging around I found that I'd included the Log4Net documentation as well as the DLL (I hadn't paid attention to what was in the .xml file). It appears that although .resx files are XML, Ohloh doesn't pick them up as such.

Still, the general idea of filterable paths for when a project does include code or files that are being picked up but aren't wanted in the count is a good one :)

IBBoard almost 17 years ago
 

This is really necessary, as it is currently what holds me back from adding my primary git repositories to Ohloh. As it is now, I'm manually syncing the changes into a separate Subversion repo where I only enlisted the src directory.

But I have one issue with the include/exclude idea: why don't we just classify paths into categories like sourcecode, data, docs, external, etc.?
For example:

/lib/ > external

/docs/ > documentation

data/ > data

* > sourcecode

It shouldn't be too hard to match each path against this during counting and assign the score to the appropriate category.
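
To illustrate the idea above, here is a minimal sketch in Python. It is not Ohloh's or Ohcount's implementation; the rule table, the `classify` and `count_by_category` names, and the first-match-wins ordering are all assumptions made for the example.

```python
from fnmatch import fnmatchcase

# Hypothetical rule table mirroring the example above. Rules are checked in
# order, the first match wins, and "*" is the catch-all for ordinary source.
RULES = [
    ("lib/*", "external"),
    ("docs/*", "documentation"),
    ("data/*", "data"),
    ("*", "sourcecode"),
]


def classify(path, rules=RULES):
    """Return the category of the first rule whose pattern matches the path."""
    normalized = path.lstrip("/")  # treat "/lib/x" and "lib/x" the same way
    for pattern, category in rules:
        if fnmatchcase(normalized, pattern):
            return category
    return "uncategorized"


def count_by_category(line_counts, rules=RULES):
    """Sum per-file line counts into per-category totals."""
    totals = {}
    for path, lines in line_counts.items():
        category = classify(path, rules)
        totals[category] = totals.get(category, 0) + lines
    return totals
```

With these assumed rules, a bundled library under /lib/ would be reported under "external" rather than inflating the "sourcecode" total.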

okinsey over 16 years ago
 

Hi okinsey,

I agree with your idea -- I'd always visualized this as more of a tagging system than an exclude/include system.

Initially, we might only honor the ignore tag, but as time goes on we might allow code to be tagged in all kinds of interesting ways.

Robin Luckey over 16 years ago
 

@robin, good to hear that, but the main question still remains - is this feature ever going to be implemented?

The topic has existed for quite some time, and most of the solutions presented would be quite easy to implement.

okinsey over 16 years ago
 

Sorry, I can't make any estimate when we will get to this.

We are currently focused on performance and reliability issues. We're physically moving to a new data center, and we are redesigning our source control processing for better scalability. I can't guess how long it will be before we have free cycles to add new features.

And while I agree that the solutions on this thread are good ones in principle, when you think about actually implementing them, they turn out to be surprisingly complicated.

Does a robots.txt-style file apply to all revisions of a repository, or just particular revisions? If I rename or move code, I'll need to change my robots.txt. How does the time axis of robots.txt work? How do I know which revisions are covered by which robots.txt?

How do I confirm that Ohloh processed my robots.txt correctly? How will Ohloh explain that some code is ignored intentionally? Currently, Ohloh doesn't even let me browse the code at all. How can I debug the reason for missing/extra code?

Finally, what happens after someone makes a change to the robots.txt? One small change might require Ohloh to do a full recalculation across the full history of the project, which might take a week of server time on a large project. How will we avoid that?

Those are some reasons why this feature still does not exist. I'd really like to get it done, but it's messier than it seems at first. Maybe after we've hired some more help... :-)

Robin Luckey over 16 years ago
 

The easy answer would be not to use a file present IN the repository, but rather to require the person enlisting the repository to supply the patterns for categorizing before any processing takes place. These patterns could be immutable, and hence would not lead to any extra processing. In fact, an ignore tag that actually caused the parser to skip the files would free CPU cycles for more important work.

okinsey over 16 years ago
 

Just to go back and correct one of my previous statements, it appears that Ohcount does count .resx (.Net's resources wrapped in XML) files as XML - https://www.ohloh.net/p/WarFoundry/commits/48803516?page=4 - as well as .manifest files (which are XML, but are also auto-generated) - https://www.ohloh.net/p/WarFoundry/commits/48803516?page=5 - and a few others.

So, from a C# project point of view, filtering based on extensions to remove auto-generated code from the count would be useful :) Personally, I'd prefer an Ohloh-based solution that checks file paths against a pattern rather than some extra file to put in the repo (which contaminates it with unnecessary junk).

The recalculation problem could be an issue, unless it just never gets applied retrospectively (much like a commit - once you make it then it is always there, which is why my line count spiked like crazy because of some XML docs!)
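
As a rough sketch of what extension-based filtering could look like (an illustration only, not how Ohloh or Ohcount works), the hypothetical `IGNORE_PATTERNS` list below stands in for patterns entered on the site, and `is_ignored` and `countable_files` are made-up helper names:

```python
from fnmatch import fnmatchcase
from posixpath import basename

# Hypothetical ignore patterns entered on the site rather than stored in the
# repository. They are matched against the file name only, so generated
# files are skipped even when they sit next to hand-written code.
IGNORE_PATTERNS = ["*.resx", "*.manifest"]


def is_ignored(path, patterns=IGNORE_PATTERNS):
    """Return True if the file name matches any ignore pattern."""
    name = basename(path)
    return any(fnmatchcase(name, pattern) for pattern in patterns)


def countable_files(paths, patterns=IGNORE_PATTERNS):
    """Keep only the paths that should still be counted."""
    return [path for path in paths if not is_ignored(path, patterns)]
```

Matching on the file name rather than the directory covers the case where .resx files live in the same folder as the .cs files.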

IBBoard over 16 years ago
 

Hello

hero6 over 16 years ago
 

@robin, any news on this feature?
I'm betting this is quite a big showstopper for git users...

okinsey over 16 years ago
 

What needs to happen is for someone to add an ignore setting to ohcount which takes a folder path as input. Then robin can come along and just add it to ohloh.

Andrew Fenn about 16 years ago
 

Not to be contrary, but this is a bit more difficult than just implementing ignore features in Ohcount.

There's the whole question of time specificity -- if someone changes the ignore settings for a project, does that apply only to the code moving forward, or will it apply to all of old source control history as well? Ohloh doesn't have the processing power to re-calculate from scratch the line counts for the entire project history every time the ignore settings are changed.

What happens if the code that needs to be ignored changes its location in the source tree over time?

There's also the question of how to communicate to the users which project contents are or aren't being ignored by Ohloh, and whether the settings entered by the users are working correctly.

This problem is a bit trickier -- and a lot more expensive computationally-- than it seems at first glance, which is why we have been dragging our feet on an implementation.
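
The time-specificity question can be made concrete with a small sketch. Everything here is hypothetical (the `HISTORY` data, the `total_lines` function, and the idea of an `effective_from` revision); it only illustrates the difference between applying an ignore rule going forward and applying it to the whole history.

```python
from fnmatch import fnmatchcase

# Hypothetical history: (revision, {path: lines added in that revision}).
HISTORY = [
    (1, {"src/main.c": 100, "3rdparty/lib.c": 5000}),
    (2, {"src/main.c": 20}),
    (3, {"3rdparty/lib.c": 300, "src/util.c": 40}),
]


def total_lines(history, pattern, effective_from):
    """Sum added lines, skipping matching paths from effective_from onward."""
    total = 0
    for revision, changes in history:
        for path, added in changes.items():
            if revision >= effective_from and fnmatchcase(path, pattern):
                continue  # ignored under the rule in force at this revision
            total += added
    return total


# Rule added at revision 3 and applied going forward only: the 5000 lines
# already counted at revision 1 stay in the total.
forward_only = total_lines(HISTORY, "3rdparty/*", effective_from=3)

# Rule applied retroactively: every revision has to be visited again.
retroactive = total_lines(HISTORY, "3rdparty/*", effective_from=1)
```

In this toy history the forward-only total is 5160 lines and the retroactive total is 160. The retroactive figure is the one most users would expect, but it is also the one that means walking the full history again after every settings change.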

Robin Luckey about 16 years ago
 

There will always be things that Ohcount cannot catch, no matter how much effort you put in.

In my opinion it would make more sense to provide an option to project owners/managers in the Ohloh web-interface. Such an option should be on a per-project basis and allow project owners/managers to flag certain files in the project for non-standard treatment.

This would then allow files to be flagged meta-data or ignore and Ohcount could then be made not to count them.

It could also allow files to be flagged embedded library and Ohcount could count the lines separately and list them in the statistics under embedded libraries. If there was also an option to specify another Ohloh tracked project as the origin for such an embedded library, then that would make it possible to automatically increment the use count for that project.

Last but not least, it would also allow non-standard use of file extensions to be fixed by the project owners without having to add disambiguation code to Ohcount and without an old project requiring renaming of filenames and trickery to hide the history from Ohcount.
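
A minimal sketch of how such per-project flags might be represented, assuming a hypothetical `FileFlag` record and `summarize` function (neither exists in Ohloh or Ohcount as far as this thread shows):

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass
class FileFlag:
    """A hypothetical per-project flag set by a project manager."""
    pattern: str                          # path pattern the flag applies to
    treatment: str                        # "ignore", "meta-data" or "embedded-library"
    origin_project: Optional[str] = None  # tracked project an embedded library comes from


def summarize(line_counts, flags):
    """Return (own_lines, embedded_lines_by_origin) for a set of files."""
    own_lines = 0
    embedded = {}
    for path, lines in line_counts.items():
        # The first flag whose pattern matches the path decides the treatment.
        flag = next((f for f in flags if fnmatchcase(path, f.pattern)), None)
        if flag is None:
            own_lines += lines
        elif flag.treatment == "embedded-library":
            origin = flag.origin_project or "unknown"
            embedded[origin] = embedded.get(origin, 0) + lines
        # Files flagged "ignore" or "meta-data" are left out of both totals.
    return own_lines, embedded
```

Keeping the origin project on the flag is what would allow the use count of the embedded library's own project to be incremented.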

trijezdci about 16 years ago
 

I just added my latest project, which uses git as its repository. In the git repository itself I've bundled another open source project, since anything else would make it a pain in the ass to handle (other types of repositories as a submodule? Forget it!). It would be really nice if I could exclude that directory from my stats, since it adds a ridiculous amount of work by someone else to my project. It also falsifies the license stats (my project is MIT licensed, the bundled one BSD-style).

ppetermann almost 16 years ago
 

I would love to see this feature added too; it is a common problem in many projects I work on. A simple system where we could ignore specific paths would work for 90% of cases, I suspect. I would want it to go back to the start in general though, not just from now on...

I appreciate the resources that this might require. Maybe an extra checkbox to request such an action? I don't know how you do scheduling. Thanks for all of the work you guys put into Ohloh!

Marcus D. Hanwell almost 16 years ago
 

I agree, would be a very nice option. Good luck including it!

svenn almost 16 years ago
 

Hi all,

Just to make sure everyone is aware, we recently deployed a feature to tell Ohloh which files to ignore. See more at https://www.ohloh.net/blog/LatestUpdatesToIgnoringFilesandDirectories

Sara Ford about 15 years ago