Tags : Browse Projects

Apache Spark

Claimed by Apache Software Foundation

Analyzed about 1 hour ago

Apache Spark is an open source cluster computing system that aims to make data analytics fast, both fast to run and fast to write. To run programs faster, Spark provides primitives for in-memory cluster computing: your job can load data into memory and query it repeatedly, more rapidly than with disk-based systems like Hadoop. To make programming faster, Spark offers high-level APIs in Scala, Java, and Python, letting you manipulate distributed datasets like local collections. You can also use Spark interactively to query big data from the Scala or Python shells. Spark integrates closely with Hadoop: it can run inside Hadoop clusters and access any existing Hadoop data source.

1.3M lines of code

374 current contributors

about 11 hours since last commit

56 users on Open Hub

Very High Activity
5.0

Apache Hive

Claimed by Apache Software Foundation

No analysis available

Hive is a data warehouse infrastructure built on top of Hadoop that provides tools for easy data summarization, ad hoc querying, and analysis of large datasets stored in Hadoop files. It provides a mechanism to put structure on this data, along with a simple query language called Hive QL, which is based on SQL and lets users familiar with SQL query this data. At the same time, the language allows traditional map/reduce programmers to plug in their custom mappers and reducers for more sophisticated analysis that the built-in capabilities of the language may not support.

0 lines of code

0 current contributors

0 since last commit

23 users on Open Hub

Activity Not Available
5.0
Mostly written in language not available
Licenses: apache_2

Apache Storm

Claimed by Apache Software Foundation

Analyzed 1 day ago

Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm is simple and can be used with any programming language. Storm is fast: a benchmark clocked it at over a million tuples processed per second per node. It is scalable, fault-tolerant, guarantees your data will be processed, and is easy to set up and operate. Storm integrates with the queueing and database technologies you already use. A Storm topology consumes streams of data and processes those streams in arbitrarily complex ways, repartitioning the streams between each stage of the computation however needed.

353K lines of code

45 current contributors

3 days since last commit

6 users on Open Hub

Moderate Activity
5.0

Apache Airavata

Claimed by Apache Software Foundation

Analyzed about 4 hours ago

Apache Airavata is a software toolkit currently used to build science gateways, but it has much wider potential use. It provides features to compose, manage, execute, and monitor small to large scale applications and workflows on computational resources ranging from local clusters to national grids and computing clouds. Gadget interfaces to Airavata back-end services can be deployed in OpenSocial containers such as Apache Rave and modified to suit users' needs. Airavata builds on general concepts of service-oriented computing, distributed messaging, and workflow composition and orchestration.

2.78M lines of code

15 current contributors

15 days since last commit

4 users on Open Hub

Moderate Activity
0.0

TORQUE Resource Manager

  Analyzed about 8 hours ago

TORQUE is an open source resource manager providing control over batch jobs and distributed compute nodes. It is a community effort based on the original *PBS project and, with more than 1,200 patches, has incorporated significant advances in the areas of scalability, fault tolerance, and feature extensions contributed by NCSA, OSC, USC, the U.S. Dept of Energy, Sandia, PNNL, U of Buffalo, TeraGrid, and many other leading edge HPC organizations.

334K lines of code

1 current contributor

almost 4 years since last commit

3 users on Open Hub

Inactive
4.0

GridGain

  Analyzed 1 day ago

GridGain is an open-source Java-based grid computing platform that is changing the world of grid computing in the same way that JBoss and Spring Framework reshaped the J2EE market.

1.81M lines of code

155 current contributors

3 days since last commit

3 users on Open Hub

High Activity
4.66667

ROOT-Sim

  Analyzed about 21 hours ago

The ROme OpTimistic Simulator: Multithreaded Parallel Discrete Event Simulator

7.26K lines of code

2 current contributors

12 months since last commit

2 users on Open Hub

Very Low Activity
0.0

GC3Pie

  Analyzed about 23 hours ago

gc3pie is a suite of Python classes (and command-line tools built upon them) that help submit and control batch jobs on clusters and grid resources seamlessly. gc3pie aims to provide the building blocks for quickly developing Python scripts that combine several applications in a dynamic workflow. The gc3pie suite comprises two main components:

* gc3libs: a Python package for controlling the life cycle of a grid or batch computational job
* gc3utils: command-line tools exposing the main functionality provided by gc3libs

83.6K lines of code

5 current contributors

over 2 years since last commit

2 users on Open Hub

Inactive
0.0

Crossdata

  Analyzed about 6 hours ago

Easy access to big things. A library for Apache Spark that extends and improves its capabilities.

30.7K lines of code

2 current contributors

over 4 years since last commit

2 users on Open Hub

Inactive
5.0

gridscale

  Analyzed about 11 hours ago

GridScale provides access to remote job and storage services and manages the life cycle of files and jobs. It supports EGI Grid, PBS / SGE clusters, SSH, HTTP, local filesystem...

3.32K lines of code

2 current contributors

about 1 month since last commit

2 users on Open Hub

Very Low Activity
0.0
Licenses: No declared licenses