Tags : Browse Projects

Select a tag to browse associated projects and drill deeper into the tag cloud.

Apache Spark

Claimed by Apache Software Foundation. Analyzed 5 months ago.

Apache Spark is an open source cluster computing system that aims to make data analytics fast, both to run and to write. To run programs faster, Spark provides primitives for in-memory cluster computing: a job can load data into memory and query it repeatedly, much more rapidly than with disk-based systems like Hadoop. To make programming faster, Spark offers high-level APIs in Scala, Java, and Python, letting you manipulate distributed datasets like local collections. You can also use Spark interactively to query big data from the Scala or Python shells. Spark integrates closely with Hadoop: it runs inside Hadoop clusters and can access any existing Hadoop data source.
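The "manipulate distributed datasets like local collections" style can be illustrated without a Spark installation. The sketch below is a toy, in-memory stand-in (the `LocalRDD` class is invented for illustration, not part of Spark); in PySpark the same shape would be `sc.parallelize(...).filter(...).map(...).collect()`.

```python
# Minimal local sketch of the RDD-style programming model described above:
# a dataset loaded into memory once, then queried repeatedly with chained,
# collection-like transformations. Plain Python only; no Spark assumed.

class LocalRDD:
    """A toy, in-memory stand-in for a distributed dataset."""
    def __init__(self, data):
        self.data = list(data)          # held in memory once

    def map(self, f):
        return LocalRDD(f(x) for x in self.data)

    def filter(self, pred):
        return LocalRDD(x for x in self.data if pred(x))

    def collect(self):
        return self.data

numbers = LocalRDD(range(10))
evens_squared = numbers.filter(lambda x: x % 2 == 0).map(lambda x: x * x)
print(evens_squared.collect())          # [0, 4, 16, 36, 64]
```

In real Spark the transformations are lazy and execute across a cluster; here they run eagerly on one machine, but the chained-call shape is the same.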

1.02M lines of code

374 current contributors

5 months since last commit

56 users on Open Hub

Activity Not Available

Apache Hive

Claimed by Apache Software Foundation. Analyzed almost 2 years ago.

Hive is a data warehouse infrastructure built on top of Hadoop that provides tools for easy data summarization, ad hoc querying, and analysis of large datasets stored in Hadoop files. It provides a mechanism to put structure on this data, along with a simple query language called HiveQL, which is based on SQL and lets users familiar with SQL query the data. The language also allows traditional map/reduce programmers to plug in custom mappers and reducers for more sophisticated analysis than its built-in capabilities support.
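Because HiveQL reuses familiar SQL shapes (CREATE TABLE, SELECT ... GROUP BY), a user who knows SQL can summarize data in Hive immediately. No Hive deployment is assumed here; as a local stand-in, the same query shape runs against Python's built-in sqlite3, with illustrative table and column names.

```python
import sqlite3

# Local stand-in for the SQL subset HiveQL draws on. In Hive, the same
# SELECT would be compiled into map/reduce jobs over files in HDFS rather
# than executed against a local database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (page TEXT, views INTEGER)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?)",
    [("home", 3), ("docs", 5), ("home", 2)],
)

# An ad hoc summarization query, the core Hive use case described above.
rows = conn.execute(
    "SELECT page, SUM(views) FROM page_views GROUP BY page ORDER BY page"
).fetchall()
print(rows)  # [('docs', 5), ('home', 5)]
```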

1.77M lines of code

0 current contributors

about 3 years since last commit

23 users on Open Hub

Activity Not Available

Apache Storm

Claimed by Apache Software Foundation. Analyzed almost 3 years ago.

Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm is simple and can be used with any programming language. Storm is fast: one benchmark clocked it at over a million tuples processed per second per node. It is scalable and fault-tolerant, guarantees your data will be processed, and is easy to set up and operate. Storm integrates with the queueing and database technologies you already use. A Storm topology consumes streams of data and processes them in arbitrarily complex ways, repartitioning the streams between each stage of the computation as needed.
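The topology idea — a source ("spout") feeding a chain of processing stages ("bolts") — can be sketched conceptually with plain Python generators. This is not Storm's API; the real system runs each stage on many nodes and repartitions tuples between them, while this sketch chains the stages in one process.

```python
# Conceptual sketch of a Storm-style topology: a spout emits a stream of
# tuples, and each bolt consumes one stream and produces another.

def spout():
    """Emit a stream of sentences (bounded here, unbounded in Storm)."""
    for line in ["the quick fox", "the lazy dog"]:
        yield line

def split_bolt(stream):
    """Stage 1: split each sentence tuple into word tuples."""
    for sentence in stream:
        yield from sentence.split()

def count_bolt(stream):
    """Stage 2: maintain running word counts as tuples arrive."""
    counts = {}
    for word in stream:
        counts[word] = counts.get(word, 0) + 1
    return counts

counts = count_bolt(split_bolt(spout()))
print(counts)  # {'the': 2, 'quick': 1, 'fox': 1, 'lazy': 1, 'dog': 1}
```

The word-count topology is Storm's canonical example; in Storm, the stream between the split and count stages would be partitioned by word so that each counter node sees a consistent subset.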

349K lines of code

45 current contributors

about 3 years since last commit

6 users on Open Hub

Activity Not Available

Chapel

  Analyzed 5 months ago

Chapel is an emerging programming language designed for productive parallel computing at scale. Chapel's design and implementation have been undertaken with portability in mind, permitting Chapel to run on multicore desktops and laptops, commodity clusters, and the cloud, in addition to the high-end supercomputers for which it was designed. Chapel's design and development are being led by Cray Inc. in collaboration with academia, computing centers, and industry.

4.95M lines of code

42 current contributors

5 months since last commit

5 users on Open Hub

Activity Not Available

Apache Airavata

Claimed by Apache Software Foundation. Analyzed about 23 hours ago.

Apache Airavata is a software toolkit currently used to build science gateways, though it has much wider potential use. It provides features to compose, manage, execute, and monitor small- to large-scale applications and workflows on computational resources ranging from local clusters to national grids and computing clouds. Gadget interfaces to Airavata back-end services can be deployed in OpenSocial containers such as Apache Rave and modified to suit users' needs. Airavata builds on general concepts of service-oriented computing, distributed messaging, and workflow composition and orchestration.

2.74M lines of code

15 current contributors

5 days since last commit

4 users on Open Hub

High Activity

GridGain

  Analyzed about 7 hours ago

GridGain is an open-source, Java-based grid computing platform that aims to change the world of grid computing the way JBoss and the Spring Framework reshaped the J2EE market.

1.8M lines of code

155 current contributors

2 months since last commit

3 users on Open Hub

Moderate Activity

TORQUE Resource Manager

  Analyzed almost 3 years ago

TORQUE is an open source resource manager providing control over batch jobs and distributed compute nodes. It is a community effort based on the original PBS project and, with more than 1,200 patches, has incorporated significant advances in scalability, fault tolerance, and feature extensions contributed by NCSA, OSC, USC, the U.S. Department of Energy, Sandia, PNNL, the University at Buffalo, TeraGrid, and many other leading-edge HPC organizations.

328K lines of code

1 current contributor

almost 4 years since last commit

3 users on Open Hub

Activity Not Available

ROOT-Sim

  Analyzed 1 day ago

The ROme OpTimistic Simulator: a multithreaded parallel discrete event simulator.

7.14K lines of code

2 current contributors

2 months since last commit

2 users on Open Hub

Moderate Activity

gridscale

  Analyzed 1 day ago

GridScale provides access to remote job and storage services and manages the life cycle of files and jobs. It supports the EGI grid, PBS and SGE clusters, SSH, HTTP, the local filesystem, and more.

3.24K lines of code

2 current contributors

3 months since last commit

2 users on Open Hub

Low Activity
Licenses: No declared licenses

GC3Pie

  Analyzed about 9 hours ago

GC3Pie is a suite of Python classes (and command-line tools built upon them) that helps submit and control batch jobs on clusters and grid resources seamlessly. GC3Pie aims to provide building blocks from which Python scripts that combine several applications into a dynamic workflow can be quickly developed. The suite has two main components:

* gc3libs: a Python package for controlling the life cycle of a grid or batch computational job
* gc3utils: command-line tools exposing the main functionality provided by gc3libs

83.6K lines of code

5 current contributors

about 1 year since last commit

2 users on Open Hub

Very Low Activity