Forums : Technical Issue Help

Dear Open Hub Users,

We’re excited to announce that we will be moving the Open Hub Forum to https://community.synopsys.com/s/black-duck-open-hub. Beginning immediately, users can head over, register, get technical help and discuss issues pertinent to the Open Hub. Registered users can also subscribe to Open Hub announcements here.


On May 1, 2020, we will be freezing https://www.openhub.net/forums and users will not be able to create new discussions. If you have any questions or concerns, please email us at [email protected].

exclude files by type?

Hi all,

I recently added a project to Ohloh whose source code contains a lot of data, e.g., PDF files or numerical test files.
I tried explicitly listing them all on the ignore list, but quickly hit the 1000-character limit there.
Is it possible to exclude files, e.g., based on the file extension?

--Nico

Nico Schlömer about 11 years ago
 

Nico,

The scheme we are using is mainly based on directories, so if all that data is in a few directories (like /doc/), it would be easier to exclude those directories. We don't have a facility to leave out particular extensions, and we don't support wildcard or regex exclusions either.

Also, it may not make much difference, since we don't scan most non-source files anyway. We do retrieve them, but unless they change we won't retrieve them again.

Not sure if this helps but let me know if you have additional questions.

Thanks!

ssnow-blackduck about 11 years ago
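
Given the directory-based scheme ssnow describes above, one way to keep an ignore list short is to look for directories that hold nothing but data files, so that a single directory entry can stand in for many file entries. The following is a minimal sketch under stated assumptions: the function name `data_only_dirs` and the extension set are hypothetical, and nothing here is an Ohloh tool or API.

```python
import os

# Extensions treated as "data"; an illustrative choice, not an Ohloh setting.
DATA_EXTENSIONS = {".pdf", ".dat", ".mtx"}

def data_only_dirs(root, data_exts=DATA_EXTENSIONS):
    """Return the topmost directories under root whose files are all data files."""
    pure = {}  # directory path -> True if everything below it is data
    # Walk bottom-up so subdirectories are judged before their parents.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        files_ok = all(
            os.path.splitext(name)[1].lower() in data_exts for name in filenames
        )
        subs_ok = all(pure.get(os.path.join(dirpath, d), False) for d in dirnames)
        has_content = bool(filenames) or bool(dirnames)
        # The root itself is never reported, so it is never marked as pure.
        pure[dirpath] = (
            dirpath != root and has_content and files_ok and subs_ok
        )
    result = []
    for path, ok in pure.items():
        # Report a directory only if its parent is not already data-only.
        if ok and not pure.get(os.path.dirname(path), False):
            result.append(os.path.relpath(path, root).replace(os.sep, "/") + "/")
    return sorted(result)
```

Calling `data_only_dirs(".")` from the top of a checkout returns candidate directory entries; files scattered among real source files still have to be listed one by one.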
 

Hi,

the data items are somewhat scattered throughout the source tree unfortunately, but I'll do my best to single out the files.
Wildcard handling would indeed have helped, so here's a +1 from me on that feature request.

--Nico

Nico Schlömer about 11 years ago
 

Nico,

I figured that might be the case. I'll pass on your recommendation. Let me know if you have success or failure since the info is valuable either way. Also let me know why you want the ignore order in the first place. It may make a difference to how to structure it.

Thanks!

ssnow-blackduck about 11 years ago
 

We're talking about https://www.ohloh.net/p/trilinos: Ohloh says that it's written mostly in Objective-C which is not actually the case. My suspicion is that the file parser misinterprets the test data, the PDFs, or whatever else as Objective-C. I don't know of a way to confirm this though.
I now searched for the top 30ish files by file size, and put them in the ignore list. Let's see if this changes anything.

Nico Schlömer about 11 years ago
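
The size-based selection Nico describes above could be scripted roughly as follows. This is a sketch, not a record of what was actually run: the function names are hypothetical, and it assumes that the 1000-character limit counts the newline-separated list of paths, which the thread does not spell out.

```python
import os

def largest_files(root, min_bytes=0):
    """Return (size, relative_path) pairs under root, largest first."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip version-control metadata; it is not part of the source tree.
        dirnames[:] = [d for d in dirnames if d not in {".git", ".svn", ".hg"}]
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue  # ignore symlinks rather than measuring their targets
            size = os.path.getsize(full)
            if size >= min_bytes:
                rel = os.path.relpath(full, root).replace(os.sep, "/")
                found.append((size, rel))
    # Largest first; ties are broken alphabetically for a stable result.
    return sorted(found, key=lambda item: (-item[0], item[1]))

def fit_to_budget(paths, max_chars=1000):
    """Keep leading paths while the newline-joined list stays within max_chars."""
    kept, used = [], 0
    for path in paths:
        cost = len(path) + (1 if kept else 0)  # +1 for the separating newline
        if used + cost > max_chars:
            break
        kept.append(path)
        used += cost
    return kept
```

For example, `fit_to_budget([p for _, p in largest_files(".", min_bytes=1_000_000)])` would list files of at least 1 MB, largest first, until the character budget runs out. The size filter does not distinguish source from non-source files, so the result still needs a manual check.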
 

Nico,

It looks as if there might have been some change. Currently I'm seeing C++ at 59% on the summary and Objective-C at only 6%. I looked at the code and I can see where there may be some confusion, since Objective-C and Matlab code share an extension (.m) and at least some of your code looks perhaps vaguely like Objective-C with the semicolon at the end of the line. Is there any actual Objective-C in the code?

Also I seemed to notice that there was a lot of code in the packages directory and subs. Is any of that dependencies or is it all work product of your project? Could be a prime candidate for an ignore order if you haven't already added it.

Thanks!

ssnow-blackduck about 11 years ago
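
To illustrate why a shared `.m` extension makes detection hard, here is a toy heuristic that guesses the language of a `.m` file from a few characteristic tokens. It is only a sketch: the token lists are illustrative guesses, the function name is hypothetical, and this is not Ohloh's recognition algorithm, which the thread does not describe.

```python
import re

# Tokens that are common in one language and rare in the other.
# These patterns are illustrative guesses, not Ohloh's actual rules.
OBJC_HINTS = re.compile(r"@(?:interface|implementation|end|property)\b|#import\b")
MATLAB_HINTS = re.compile(r"^\s*(?:function\b|%)|\bend\s*$", re.MULTILINE)

def guess_m_language(text):
    """Guess whether the contents of a .m file are Objective-C or MATLAB."""
    objc = len(OBJC_HINTS.findall(text))
    matlab = len(MATLAB_HINTS.findall(text))
    if objc == matlab:
        return "unknown"  # no evidence, or evidence that cancels out
    return "objective-c" if objc > matlab else "matlab"
```

A file with none of these tokens, such as a plain numerical data file or a script consisting only of semicolon-terminated assignments, comes back as "unknown", which is the situation where a real classifier has to fall back on weaker signals.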
 

If you check out the exclude list you'll find I already added a bunch of files. Much of the data that's already excluded now was interpreted as Objective-C. I'll extend the list to at least contain every non-source file that's larger than 1MB.

Is there any actual Objective-C in the code?

Not to my knowledge. The code may contain a bit of MATLAB though.

packages directory

The packages form the actual content of Trilinos. :)

Nico Schlömer about 11 years ago
 

Nico,

Good. Looks like you're getting somewhere. It may be impossible to get all the Objective-C content out of the listing if there's a common style between that and Matlab code. Our current recognition algorithms sometimes misidentify closely related code styles where there is common syntax between them. Sometimes only your compiler knows for sure...

Thanks!

ssnow-blackduck about 11 years ago
 

For sure. We don't need a perfect count there either, but something approximate would be nice.
I'm now struggling with the 1000-character limit on the exclude list. Any reason why there's a limit at all? It'd be great for me if that constraint could be lifted to, e.g., 10k.

Nico Schlömer about 11 years ago
 

Nico,

I imagine it's arbitrary but it must be related to the common database that stores all this info. I can imagine that it might be expensive in terms of storage space or compute time to have a large limit on the number of lines of ignore entries we can accommodate. It was conceived as a directory-based system and not necessarily as a one-file-per-line entry though that's not outlawed. I remember that it's related to the robots.txt syntax now that I think of it and that's pretty limited in scope since it needs to be quickly interpreted by many different web servers.

Thanks!

ssnow-blackduck about 11 years ago
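
The robots.txt comparison above helps explain why directory entries work and extension entries do not: classic robots.txt rules are plain path prefixes. The sketch below uses Python's standard `urllib.robotparser` to show that prefix behaviour. Whether Ohloh evaluates its ignore entries exactly this way is an assumption, and the paths are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

rules = RobotFileParser()
rules.parse([
    "User-agent: *",
    "Disallow: /doc/",
    # Hypothetical path, used only to show a nested directory entry.
    "Disallow: /packages/example/testdata/",
])

def is_ignored(path):
    """Return True when a Disallow prefix matches the given path."""
    # can_fetch() returns False for paths covered by a Disallow rule.
    return not rules.can_fetch("*", path)
```

With these rules, `is_ignored("/doc/manual.pdf")` is true because the path starts with `/doc/`, while `is_ignored("/src/main.cpp")` is false. A prefix rule has no way to express "every file ending in .pdf".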
 

I'd like to add a vote to this issue. We work on a project (Amber Smalltalk) where we compile Smalltalk files into JavaScript files; but we have some hand-written .js files as well, which implement the core of the system. Now, the statistics show that the project is primarily JavaScript and show lots of uncommented JavaScript LoCs, skewing lots of stats.

It would be very helpful if I could switch generated files off, so a wildcard would help (in this case, excluding files by dir & extension is enough; but having only one of them is not working).

If possible, please add a way to exclude files with certain extension from certain directories.

Thank you very much, Herby

Herbert Vojčík about 10 years ago
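
Until an exclusion by directory and extension exists, a pattern of that kind can be expanded locally into explicit file entries. The helper below is a hypothetical sketch, not an Open Hub feature, and the resulting list still counts against the character limit discussed earlier in the thread.

```python
from pathlib import Path

def expand_pattern(root, directory, extension):
    """List files directly inside `directory` (relative to root) with `extension`."""
    base = Path(root) / directory
    return sorted(
        p.relative_to(root).as_posix()
        for p in base.glob(f"*{extension}")
        if p.is_file()
    )
```

For a layout where generated files live in a directory named `js`, `expand_pattern(".", "js", ".js")` returns the explicit paths to paste into the ignore list, leaving hand-written `.js` files in other directories untouched.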
 

We have some CSS dependencies for a help page that distort our percentages as well. Is there a way to ignore a file type? I cannot find it in the settings!

tresf over 7 years ago