Tags : Browse Projects

Apache Spark

Claimed by Apache Software Foundation. Analyzed about 5 hours ago.

Apache Spark is an open source cluster computing system that aims to make data analytics fast: both fast to run and fast to write. To run programs faster, Spark provides primitives for in-memory cluster computing: your job can load data into memory and query it repeatedly more rapidly than with disk-based systems like Hadoop. To make programming faster, Spark offers high-level APIs in Scala, Java and Python, letting you manipulate distributed datasets like local collections. You can also use Spark interactively to query big data from the Scala or Python shells. Spark integrates closely with Hadoop to run inside Hadoop clusters and can access any existing Hadoop data source.

1.38M lines of code

374 current contributors

about 18 hours since last commit

56 users on Open Hub

Very High Activity
5.0
 

Apache Hive

Claimed by Apache Software Foundation. No analysis available.

Hive is a data warehouse infrastructure built on top of Hadoop that provides tools for easy data summarization, ad hoc querying, and analysis of large datasets stored in Hadoop files. It provides a mechanism to put structure on this data, along with a simple query language called Hive QL, which is based on SQL and lets users familiar with SQL query this data. At the same time, this language allows traditional map/reduce programmers to plug in their custom mappers and reducers for more sophisticated analysis that the built-in capabilities of the language may not support.

0 lines of code

0 current contributors

0 since last commit

23 users on Open Hub

Activity Not Available
5.0
 
Mostly written in language not available
Licenses: apache_2

Apache Avro

Claimed by Apache Software Foundation. No analysis available.

Avro is a serialization system.

0 lines of code

75 current contributors

0 since last commit

8 users on Open Hub

Activity Not Available
0.0
 
Mostly written in language not available
Licenses: apache_2

AppScale

Analyzed 2 days ago.

AppScale is an open-source implementation of the Google AppEngine (GAE) cloud computing interface. AppScale enables execution of GAE applications on virtualized cluster systems. In particular, AppScale enables users to execute GAE applications using their own clusters with greater scalability and reliability than the GAE SDK provides. Moreover, AppScale executes automatically and transparently over cloud infrastructures such as the Amazon Web Services (AWS) Elastic Compute Cloud (EC2) and Eucalyptus, the open-source implementation of the AWS interfaces.

1.23M lines of code

10 current contributors

over 4 years since last commit

7 users on Open Hub

Inactive
5.0
 

StreamSets Data Collector

Claimed by StreamSets. No analysis available.

Open source software for the rapid development and reliable operation of complex data flows.

0 lines of code

60 current contributors

0 since last commit

4 users on Open Hub

Activity Not Available
5.0
 
Mostly written in language not available
Licenses: apache_2

Apache Flume

Claimed by Apache Software Foundation. Analyzed 2 days ago.

Apache Flume is a system for reliably collecting high-throughput data from streaming data sources like logs.

72.8K lines of code

3 current contributors

about 1 month since last commit

4 users on Open Hub

Very Low Activity
0.0
 

Apache Hama

Claimed by Apache Software Foundation. Analyzed about 1 hour ago.

Hama is a distributed computing framework based on BSP (Bulk Synchronous Parallel) computing techniques for massive scientific computations. It is currently in incubation at the Apache Software Foundation.

107K lines of code

0 current contributors

over 6 years since last commit

2 users on Open Hub

Inactive
0.0
 

Apache Whirr

Claimed by Apache Software Foundation. Analyzed 3 days ago.

Apache Whirr is a set of libraries for running cloud services. Whirr provides:
* A cloud-neutral way to run services. You don't have to worry about the idiosyncrasies of each provider.
* A common service API. The details of provisioning are particular to the service.
* Smart defaults for services. You can get a properly configured system running quickly, while still being able to override settings as needed.
You can also use Whirr as a command line tool for deploying clusters.

26.9K lines of code

0 current contributors

over 9 years since last commit

2 users on Open Hub

Inactive
0.0
 

DevOps Perl Tools

Analyzed about 2 hours ago.

DevOps CLI Tools for Hadoop, Hive, HDFS file/snapshot age out, Solr / SolrCloud CLI, Ambari FreeIPA Kerberos, Config / Log Anonymizer, URL watcher for load balanced web farms, SQL ReCaser (Hive, Impala, Cassandra CQL, Couchbase N1QL, MySQL, PostgreSQL, Apache Drill, Microsoft SQL Server, Oracle, Pig Latin, Neo4j, InfluxDB, Dockerfiles), Nginx stats watcher, Datameer, Linux tools...

5.89K lines of code

1 current contributor

about 2 months since last commit

1 user on Open Hub

Very Low Activity
0.0
 
Licenses: No declared licenses

archon

No analysis available.

An OSGi-based distributed system controller used to build and manage Linux boxes.

0 lines of code

0 current contributors

0 since last commit

1 user on Open Hub

Activity Not Available
5.0
 
Mostly written in language not available
Licenses: apache_2