Projects tagged ‘hdfs’

Apache Spark

Claimed by Apache Software Foundation Analyzed about 6 hours ago

Apache Spark is an open source cluster computing system that aims to make data analytics fast, both fast to run and fast to write. To run programs faster, Spark provides primitives for in-memory cluster computing: your job can load data into memory and query it repeatedly more rapidly than with ...

1.46M lines of code

374 current contributors

about 14 hours since last commit

56 users on Open Hub

Very High Activity

0 Reviews

Mostly written in Scala

Licenses: apache_2

Apache Hive

Claimed by Apache Software Foundation No analysis available

Hive is a data warehouse infrastructure built on top of Hadoop that provides tools for easy data summarization, ad hoc querying, and analysis of large datasets stored in Hadoop files. It provides a mechanism to put structure on this data and it also provides a simple query language called ...

0 lines of code

0 current contributors

0 since last commit

23 users on Open Hub

Activity Not Available

0 Reviews

Mostly written in language not available

Licenses: apache_2

Tags apache bigdata cluster clustercomputing distributed_computing hadoop hdfs java mapreduce orc spark sql 4 more...

Apache Avro

Claimed by Apache Software Foundation No analysis available

Avro is a serialization system.

0 lines of code

75 current contributors

0 since last commit

8 users on Open Hub

Activity Not Available

0 Reviews

Mostly written in language not available

Licenses: apache_2

Tags hadoop hdfs serialization

AppScale

Analyzed about 10 hours ago

AppScale is an open-source implementation of the Google AppEngine (GAE) cloud computing interface. AppScale enables execution of GAE applications on virtualized cluster systems. In particular, AppScale enables users to execute GAE applications using their own clusters with greater scalability and ...

1.23M lines of code

10 current contributors

almost 5 years since last commit

7 users on Open Hub

Inactive

0 Reviews

Mostly written in Python

Licenses: apache_2, bsd

Tags appengine appscale cassandra cloudcomputing hadoop hbase hdfs hypertable memcachedb mongodb mysql platform-as-a-service 1 more...

StreamSets Data Collector

Claimed by StreamSets No analysis available

Open source software for the rapid development and reliable operation of complex data flows.

0 lines of code

60 current contributors

0 since last commit

4 users on Open Hub

Activity Not Available

0 Reviews

Mostly written in language not available

Licenses: apache_2

Tags azure bigdata cassandra cluster dataflow ec2 etl hadoop hdfs ingest jdbc kafka 5 more...

Apache Flume

Claimed by Apache Software Foundation Analyzed 1 day ago

Apache Flume is a system for reliably collecting high-throughput data from streaming data sources like logs.

72.8K lines of code

3 current contributors

7 months since last commit

4 users on Open Hub

Very Low Activity

0 Reviews

Mostly written in Java

Licenses: apache_2

Tags apache apache-software-foundation bigdata data hadoop hdfs java mapreduce streamingdata

Apache Hama

Claimed by Apache Software Foundation Analyzed about 15 hours ago

Hama is a distributed computing framework based on BSP (Bulk Synchronous Parallel) computing techniques for massive scientific computations. It is currently being incubated as an incubator project by the Apache Software Foundation.

107K lines of code

0 current contributors

almost 7 years since last commit

2 users on Open Hub

Inactive

0 Reviews

Mostly written in Java

Licenses: apache_2

Tags bsp bulk distributed google graphdb hdfs large_scale matrix message_passing pagerank parallel pregel 1 more...

Apache Whirr

Claimed by Apache Software Foundation Analyzed about 11 hours ago

Apache Whirr is a set of libraries for running cloud services. Whirr provides: * A cloud-neutral way to run services. You don't have to worry about the idiosyncrasies of each provider. * A common service API. The details of provisioning are particular to the service. * Smart defaults for ...

26.9K lines of code

0 current contributors

almost 10 years since last commit

2 users on Open Hub

Inactive

0 Reviews

Mostly written in Java

Licenses: apache_2

Tags amazon apache apache-software-foundation aws bigdata cassandra chef cloudcomputing cloudservers data ec2 hadoop 14 more...

DevOps Perl Tools

Analyzed about 15 hours ago

DevOps CLI Tools for Hadoop, Hive, HDFS file/snapshot age out, Solr / SolrCloud CLI, Ambari FreeIPA Kerberos, Config / Log Anonymizer, URL watcher for load balanced web farms, SQL ReCaser (Hive, Impala, Cassandra CQL, Couchbase N1QL, MySQL, PostgreSQL, Apache Drill, Microsoft SQL Server, Oracle, Pig ...

5.9K lines of code

1 current contributor

2 months since last commit

1 user on Open Hub

Very Low Activity

0 Reviews

Mostly written in Perl

Licenses: No declared licenses

Tags Ambari anonymization anonymizer bigdata hadoop hdfs hive hortonworks http linux loadbalancing nginx 3 more...

archon

No analysis available

It is an OSGi-based distributed system controller used to build and manage Linux boxes.

0 lines of code

0 current contributors

0 since last commit

1 user on Open Hub

Activity Not Available

0 Reviews

Mostly written in language not available

Licenses: apache_2

Tags cassandra cloudera hadoop hbase hdfs hypertable
