Posted about 10 years ago by Soeren Sonnenburg, Gunnar Raetsch, Sergey Lisitsyn, Heiko Strathmann, Viktor Gal, Fernando Iglesias
Overview
The SHOGUN machine learning toolbox focuses on large-scale kernel methods,
especially Support Vector Machines (SVMs). It comes with a generic interface
for kernel machines and features 15 different SVM implementations that all
access features in a unified way via a general kernel framework or, in the
case of linear SVMs, via so-called "DotFeatures", i.e., features providing a
minimal set of operations (such as the dot product).
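The "DotFeatures" idea above — a linear SVM only needs a handful of dot-product-style operations from its feature representation — can be illustrated with a minimal plain-Python sketch. This is not SHOGUN's actual C++ interface; all names here are illustrative:

```python
# Minimal sketch of the "DotFeatures" idea: a linear learner only needs a few
# dot-product-style operations, so dense and sparse representations can be
# swapped freely behind the same interface.

class DenseDotFeatures:
    """Dense features: one list of floats per example."""
    def __init__(self, vectors):
        self.vectors = vectors

    def dot(self, idx, w):
        # <x_idx, w>
        return sum(x * wi for x, wi in zip(self.vectors[idx], w))

    def add_to_dense(self, idx, alpha, w):
        # w += alpha * x_idx  (the update used by SVM/perceptron-style solvers)
        for j, x in enumerate(self.vectors[idx]):
            w[j] += alpha * x

class SparseDotFeatures:
    """Sparse features: one {index: value} dict per example."""
    def __init__(self, vectors, dim):
        self.vectors = vectors
        self.dim = dim

    def dot(self, idx, w):
        return sum(v * w[j] for j, v in self.vectors[idx].items())

    def add_to_dense(self, idx, alpha, w):
        for j, v in self.vectors[idx].items():
            w[j] += alpha * v

# A linear learner written against this interface works with either backend:
dense = DenseDotFeatures([[1.0, 2.0], [0.0, 3.0]])
sparse = SparseDotFeatures([{0: 1.0, 1: 2.0}, {1: 3.0}], dim=2)
w = [0.5, -1.0]
assert dense.dot(0, w) == sparse.dot(0, w) == -1.5
```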
Features
SHOGUN includes the LinAdd accelerations for string kernels and the COFFIN
framework for on-demand computation of features for the contained linear SVMs.
In addition, it contains more advanced Multiple Kernel Learning, Multi-Task
Learning, and Structured Output learning algorithms, as well as other linear
methods. SHOGUN digests input feature objects of essentially any known type,
e.g., dense, sparse, or variable-length (string) features of any type:
char/byte/word/int/long int/float/double/long double.
The toolbox provides efficient implementations of 35 different kernels, among
them the Linear, Polynomial, Gaussian, and Sigmoid kernels, and also a number
of recent string kernels such as the Locality Improved, Fisher, TOP, Spectrum,
and Weighted Degree (with shifts) kernels.
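For reference, the textbook definitions of a few of the kernels named above can be sketched in plain Python (this is not SHOGUN's kernel-class API; the `width` parameterization of the Gaussian kernel is an assumption based on common convention):

```python
import math

# Textbook definitions of some kernels listed above, as plain functions
# k(x, y) on two feature vectors. Illustrative sketch, not SHOGUN's API.

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def linear_kernel(x, y):
    return dot(x, y)

def polynomial_kernel(x, y, degree=2, c=1.0):
    return (dot(x, y) + c) ** degree

def gaussian_kernel(x, y, width=1.0):
    # exp(-||x - y||^2 / width); the "width" parameterization is assumed here.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / width)

def sigmoid_kernel(x, y, gamma=1.0, c=0.0):
    return math.tanh(gamma * dot(x, y) + c)

x, y = [1.0, 0.0], [0.0, 1.0]
assert linear_kernel(x, y) == 0.0
assert polynomial_kernel(x, y) == 1.0          # (0 + 1)^2
assert abs(gaussian_kernel(x, x) - 1.0) < 1e-12  # k(x, x) = 1
```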
For the latter, the efficient LINADD optimizations are implemented. SHOGUN
also offers the freedom of working with custom pre-computed kernels. One of
its key features is the combined kernel, which can be constructed as a
weighted linear combination of a number of sub-kernels, each of which need not
work on the same domain. An optimal sub-kernel weighting can be learned using
Multiple Kernel Learning. Currently, one-class, two-class, and multi-class SVM
classification and regression problems are supported. In addition, SHOGUN
implements a number of linear methods such as Linear Discriminant Analysis
(LDA), the Linear Programming Machine (LPM), and Perceptrons, and provides
algorithms to train Hidden Markov Models.
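The combined kernel described above is, at its core, a weighted sum of sub-kernel evaluations, where each sub-kernel may look at a different part (domain) of the example. A plain-Python sketch (not SHOGUN's actual classes; the toy spectrum kernel here is a simplified stand-in):

```python
from collections import Counter

# Sketch of the combined-kernel idea: k(x, y) = sum_i beta_i * k_i(x_i, y_i),
# where sub-kernels may work on different domains (here: numeric + string).
# Illustrative only; MKL would learn the weights beta_i.

def linear_k(x, y):
    return sum(a * b for a, b in zip(x, y))

def spectrum_k(s, t, k=2):
    # Toy spectrum kernel: inner product of k-mer count vectors.
    kmers = lambda u: [u[i:i + k] for i in range(len(u) - k + 1)]
    cs, ct = Counter(kmers(s)), Counter(kmers(t))
    return sum(cs[m] * ct[m] for m in cs)

def combined_kernel(betas, kernels, x_parts, y_parts):
    # Weighted linear combination over sub-kernels.
    return sum(b * k(xp, yp)
               for b, k, xp, yp in zip(betas, kernels, x_parts, y_parts))

# One example with a numeric part and a string part:
x = ([1.0, 2.0], "ACGT")
y = ([0.5, 1.0], "CGTA")
val = combined_kernel([0.7, 0.3], [linear_k, spectrum_k], x, y)
# linear part: 2.5, spectrum part: 2 shared 2-mers (CG, GT)
assert abs(val - (0.7 * 2.5 + 0.3 * 2)) < 1e-9
```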
The input feature objects can be read from plain ASCII files (tab-separated
values for dense matrices; LibSVM/SVMLight format for sparse matrices), from
an efficient native binary format, and from the HDF5-based format, all
supporting dense, sparse, or string features of various types, which can often
be converted into each other. Chains of preprocessors (e.g., subtracting the
mean) can be attached to each feature object, allowing on-the-fly
pre-processing.
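The preprocessor-chain mechanism just described can be sketched as follows (illustrative Python, not SHOGUN's preprocessor API; class names are made up for the example):

```python
# Sketch of chained, on-the-fly preprocessing attached to a feature object.
# Each preprocessor transforms a feature matrix; the chain applies in order.

class SubtractMean:
    def apply(self, vectors):
        n, dim = len(vectors), len(vectors[0])
        means = [sum(v[j] for v in vectors) / n for j in range(dim)]
        return [[x - m for x, m in zip(v, means)] for v in vectors]

class Scale:
    def __init__(self, factor):
        self.factor = factor
    def apply(self, vectors):
        return [[self.factor * x for x in v] for v in vectors]

class Features:
    def __init__(self, vectors):
        self.vectors = vectors
        self.preprocessors = []
    def add_preprocessor(self, p):
        self.preprocessors.append(p)
    def get_vectors(self):
        # Preprocessing happens on access ("on the fly"), in chain order.
        out = self.vectors
        for p in self.preprocessors:
            out = p.apply(out)
        return out

feats = Features([[1.0, 4.0], [3.0, 0.0]])
feats.add_preprocessor(SubtractMean())
feats.add_preprocessor(Scale(2.0))
assert feats.get_vectors() == [[-2.0, 4.0], [2.0, -4.0]]
```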
Structure and Interfaces
SHOGUN's core is implemented in C++ and is provided as a library, libshogun,
readily usable by C++ application developers. Its common interface functions
are encapsulated in libshogunui, such that only minimal code (like setting or
getting a double matrix to/from the target language) is necessary. This
allowed us to easily create interfaces to Matlab(tm), R, Octave, and Python.
(Note that both modular object-oriented and static interfaces are provided: r,
octave, matlab, python, python_modular, r_modular, octave_modular, cmdline,
libshogun.)
Application
We have successfully applied SHOGUN to several problems from computational
biology, such as Super Family classification, Splice Site Prediction,
Interpreting the SVM Classifier, Splice Form Prediction, Alternative Splicing,
and Promoter Prediction. Some of these come with no fewer than 10 million
training examples, others with 7 billion test examples.
Documentation
We use Doxygen for both user and developer documentation which may be read
online here. More than 600 documented examples for the interfaces
python_modular, octave_modular, r_modular, static python, static matlab and
octave, static r, static command line and C++ libshogun developer interface
can be found in the documentation.
|
Posted over 10 years ago by sanuj sharma
dot_prod_expensive_unsorted (to linalg)
Sorry, that was a mistake. I think we should move it to linalg.
I'll remove "fequal" entirely or move it to CMath as the situation demands.
On 12 Jan 2015 16:56, "Fernando J. Iglesias García" wrote:
|
Posted over 10 years ago by Rahul De
The idea is, wherever we are currently managing without relying on any
third-party backends, we should keep it that way (provide native
implementations for _at least_ those many methods).
On Mon, Jan 12, 2015 at 5:05 PM, Rahul De wrote:
|
Posted over 10 years ago by Rahul De
Hi Fernando,
To me, moving SGMatrix and SGVector methods to linalg makes a lot of sense.
Please note that some of them are already there in linalg (IIRC multiply,
scale, and the like) and we can get rid of them as soon as we replace their
usage with the linalg ones. This might take some time - some of them may have
native implementations while others won't. I think it's best to have a native
implementation with the default backend set to eigen3; in its absence, things
fall back to the native implementation. Then it's a lot safer when we get rid
of these methods.
Not sure about the sparse stuff. IMO, since we are not using any external libs
for sparse, linalg is not the right place for those.
On Mon, Jan 12, 2015 at 4:56 PM, Fernando J. Iglesias García wrote:
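The backend scheme discussed in this thread — prefer eigen3 when it is available, otherwise fall back transparently to a native implementation behind one linalg entry point — could look roughly like this (an illustrative Python sketch, not the actual SHOGUN linalg C++ design):

```python
# Sketch of the backend-with-fallback idea: a linalg facade tries registered
# external backends first (e.g. "eigen3") and falls back to a native
# implementation when none provides the operation. Illustrative only.

def native_dot(x, y):
    return sum(a * b for a, b in zip(x, y))

class Linalg:
    def __init__(self, backends):
        # backends: name -> {op_name: callable}, tried in registration order.
        self.backends = backends

    def dot(self, x, y):
        for name, ops in self.backends.items():
            if "dot" in ops:
                return ops["dot"](x, y)
        return native_dot(x, y)  # guaranteed native fallback

# With an "eigen3" backend registered, its implementation is used:
la = Linalg({"eigen3": {"dot": native_dot}})
assert la.dot([1.0, 2.0], [3.0, 4.0]) == 11.0
# Without it, the native implementation takes over transparently:
assert Linalg({}).dot([1.0, 2.0], [3.0, 4.0]) == 11.0
```

Callers then never need to know which backend served the call, which is what makes it safe to later remove the duplicated SGVector/SGMatrix methods.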
|
Posted over 10 years ago by Fernando J. Iglesias García
Hi Sanuj,
Thanks for going through the SG* classes and pointing out the methods that
should be taken out of them.
On 9 January 2015 at 16:04, sanuj sharma wrote:
To me, this method makes no sense in SGVector since it is not doing anything
related to a vector. Can you please check where this method is used? If it is
used nowhere, remove it. If it is used, then use an equivalent one in CMath
(there may already be one) or move this one there.
All the previous methods related to dot products you have suggested moving to
linalg. Why this one to CMath?
I agree with moving all these methods out of the SG* classes. @Heiko,
@lambday, is linalg the right place for most of them, as Sanuj suggests?
|
Posted over 10 years ago by Fernando J. Iglesias García
Dear Oana,
I think that the class in Shogun that corresponds to the multitask SVM you
reference is LibLinearMTL.
You can find that class along with other multitask SVMs at
https://github.com/shogun-toolbox/shogun/tree/develop/src/shogun/transfer/multitask
Cheers,
Fernando.
On 12 January 2015 at 05:57, Oana wrote:
|
Posted over 10 years ago by Oana
Hi,
I am interested in using multitask SVMs. This paper
(http://www.cs.bris.ac.uk/~flach/ECMLPKDD2012papers/1125558.pdf) suggests
that there is an implementation of multitask SVMs in Shogun, and I am
writing to ask which function corresponds to the multitask SVMs.
Thank you
|
Posted over 10 years ago by Xiaoxue Zhao
Also, when I did
$ cmake -DCMAKE_INSTALL_PREFIX="$HOME/shogun-install" ..
I got a bunch of warnings such as:
CMake Warning at examples/undocumented/libshogun/CMakeLists.txt:116 (add_executable):
  Cannot generate a safe runtime search path for target
  transfer_multitasklogisticregression because files in some directories may
  conflict with libraries in implicit directories:
    runtime library [libxml2.so.2] in /usr/lib/x86_64-linux-gnu may be
    hidden by files in:
      /home/sheryl/anaconda/lib
    runtime library [libcurl.so.4] in /usr/lib/x86_64-linux-gnu may be
    hidden by files in:
      /home/sheryl/anaconda/lib
    runtime library [libz.so.1] in /usr/lib/x86_64-linux-gnu may be hidden
    by files in:
      /home/sheryl/anaconda/lib
  Some of these libraries may not be found correctly.
Does it mean that there was some conflict between anaconda and shogun that
caused the problem? Please advise. Thanks!
|
Posted over 10 years ago by Xiaoxue Zhao
Yes, now I got it to work up to
$ make -j5 all
$ make install
$ make -j5 all took a while and seemed to be mostly fine, but with
$ make install
I got two errors:
collect2: error: ld returned 1 exit status
make[2]: *** [examples/undocumented/libshogun/balanced_conditional_probability_tree] Error 1
make[1]: *** [examples/undocumented/libshogun/CMakeFiles/balanced_conditional_probability_tree.dir/all] Error 2
make: *** [all] Error 2
Then I tried
$ export LD_LIBRARY_PATH="$HOME/shogun-install/lib:$LD_LIBRARY_PATH"
$ cd "$HOME/shogun-install/share/shogun/examples/libshogun"
$ chmod +x ./so_multiclass_BMRM && ./so_multiclass_BMRM
just to find that in my home directory there was no directory called
'shogun-install'.
Do you have any clue why?
Thank you very much!
--
Xiaoxue Zhao
MPhil/PhD in Computer Science,
University College London
On 18 November 2014 09:30, Heiko Strathmann wrote:
|