Tags : Browse Projects

Apache Spark

Claimed by Apache Software Foundation Analyzed 5 months ago

Apache Spark is an open source cluster computing system that aims to make data analytics fast: both fast to run and fast to write. To run programs faster, Spark provides primitives for in-memory cluster computing: your job can load data into memory and query it repeatedly more rapidly than with disk-based systems like Hadoop. To make programming faster, Spark offers high-level APIs in Scala, Java and Python, letting you manipulate distributed datasets like local collections. You can also use Spark interactively to query big data from the Scala or Python shells. Spark integrates closely with Hadoop to run inside Hadoop clusters and can access any existing Hadoop data source.

1.02M lines of code

374 current contributors

5 months since last commit

56 users on Open Hub

Activity Not Available
5.0
 
Apache Apex

Claimed by Apache Software Foundation Analyzed about 17 hours ago

Apache Apex is an enterprise-grade big data-in-motion platform that unifies stream and batch processing. Apex was built for scalability and low-latency processing, high availability and operability. The Apex engine is supplemented by Malhar, the library of pre-built operators, including connectors that integrate with many existing technologies as sources and destinations, like message buses, databases, files or social media feeds.

284K lines of code

1 current contributor

over 1 year since last commit

6 users on Open Hub

Very Low Activity
0.0
 
Apache Storm

Claimed by Apache Software Foundation Analyzed almost 3 years ago

Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm is simple and can be used with any programming language. Storm is fast: a benchmark clocked it at over a million tuples processed per second per node. It is scalable, fault-tolerant, guarantees your data will be processed, and is easy to set up and operate. Storm integrates with the queueing and database technologies you already use. A Storm topology consumes streams of data and processes those streams in arbitrarily complex ways, repartitioning the streams between each stage of the computation however needed.

349K lines of code

45 current contributors

almost 3 years since last commit

6 users on Open Hub

Activity Not Available
5.0
 
Apache Flume

Claimed by Apache Software Foundation Analyzed about 11 hours ago

Apache Flume is a system for reliably collecting high-throughput data from streaming data sources like logs.

103K lines of code

3 current contributors

6 days since last commit

4 users on Open Hub

Moderate Activity
0.0
 
StreamSets Data Collector

Claimed by StreamSets Analyzed over 1 year ago

Open source software for the rapid development and reliable operation of complex data flows.

1.04M lines of code

60 current contributors

over 1 year since last commit

4 users on Open Hub

Activity Not Available
5.0
 
Crossdata

Analyzed about 1 hour ago

Easy access to big things. A library for Apache Spark that extends and improves its capabilities.

30.7K lines of code

2 current contributors

almost 3 years since last commit

2 users on Open Hub

Inactive
5.0
 
Sip Tools

Analyzed almost 3 years ago

Sip Tools is a composite project including several toolkits to enhance JAIN-SIP, Java Media Framework, and similar tools centering on SIP and RTP Media. Iced Java is a Java implementation of RFC 5245 ICE, RFC 5389 STUN and RFC 5766 TURN. The goal of this project is to be as all-encompassing of use cases as possible, while imposing a minimal burden on the users of the library to modify their code. RTP Streaming is the most obvious use case, though any P2P datagram-based service is a good candidate for using Iced Java to reduce the programming burden imposed by NATs.

11.5K lines of code

0 current contributors

about 11 years since last commit

2 users on Open Hub

Activity Not Available
5.0
 
SIEGate

Claimed by Grid Protection Alliance (GPA) Analyzed 5 months ago

SIEGate (the Secure Information Exchange Gateway, pronounced Psy-gate) exists to (1) improve the security posture and minimize the external cyber-attack surface of electric utility control centers, and (2) reduce the cost of maintaining current control-room-to-control-room information exchange. SIEGate implements a true publish-subscribe architecture where the sending gateway owner authorizes data as available for subscription by specific consuming gateways. Once authorized, the consuming gateway discovers the data that has been made available to it by other SIEGate nodes and allows selective subscription. Data made available for publication and subscription by SIEGate includes measurements, such as SCADA or synchrophasor data, files, notifications and alarms.

383K lines of code

3 current contributors

5 months since last commit

1 user on Open Hub

Activity Not Available
0.0
 
action-core

Analyzed 1 day ago

HDFS browser over HTTP that deserializes files on the fly (Thrift, Avro, ...).

3.68K lines of code

0 current contributors

over 10 years since last commit

1 user on Open Hub

Inactive
0.0
 
collector-core

Analyzed 1 day ago

HDFS endpoint collecting and aggregating data flows.

21K lines of code

0 current contributors

almost 9 years since last commit

1 user on Open Hub

Inactive
0.0
 