Tags : Browse Projects

Apache Spark

Claimed by Apache Software Foundation | Analyzed 1 day ago

Apache Spark is an open source cluster computing system that aims to make data analytics fast, both fast to run and fast to write. To run programs faster, Spark provides primitives for in-memory cluster computing: your job can load data into memory and query it repeatedly more rapidly than with disk-based systems like Hadoop. To make programming faster, Spark offers high-level APIs in Scala, Java and Python, letting you manipulate distributed datasets like local collections. You can also use Spark interactively to query big data from the Scala or Python shells. Spark integrates closely with Hadoop to run inside Hadoop clusters and can access any existing Hadoop data source.

1.3M lines of code

374 current contributors

3 days since last commit

56 users on Open Hub

Very High Activity
5.0
 

Apache Apex

Claimed by Apache Software Foundation | Analyzed 1 day ago

Apache Apex is an enterprise-grade big data-in-motion platform that unifies stream and batch processing. Apex was built for scalability, low-latency processing, high availability, and operability. The Apex engine is supplemented by Malhar, the library of pre-built operators, including connectors that integrate with many existing technologies as sources and destinations, like message buses, databases, files or social media feeds.

284K lines of code

1 current contributor

about 3 years since last commit

6 users on Open Hub

Inactive
0.0
 

Apache Storm

Claimed by Apache Software Foundation | Analyzed 1 day ago

Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm is simple and can be used with any programming language. Storm is fast: a benchmark clocked it at over a million tuples processed per second per node. It is scalable, fault-tolerant, guarantees your data will be processed, and is easy to set up and operate. Storm integrates with the queueing and database technologies you already use. A Storm topology consumes streams of data and processes those streams in arbitrarily complex ways, repartitioning the streams between each stage of the computation however needed.

353K lines of code

45 current contributors

5 days since last commit

6 users on Open Hub

Moderate Activity
5.0
 

Apache Flume

Claimed by Apache Software Foundation | Analyzed 1 day ago

Apache Flume is a system for reliably collecting high-throughput data from streaming data sources like logs.

83.7K lines of code

3 current contributors

22 days since last commit

4 users on Open Hub

Low Activity
0.0
 

StreamSets Data Collector

Claimed by StreamSets | No analysis available

Open source software for the rapid development and reliable operation of complex data flows.

0 lines of code

60 current contributors

0 since last commit

4 users on Open Hub

Activity Not Available
5.0
 
Licenses: apache_2

Crossdata

Analyzed about 2 hours ago

Easy access to big things. A library for Apache Spark that extends and improves its capabilities.

30.7K lines of code

2 current contributors

over 4 years since last commit

2 users on Open Hub

Inactive
5.0
 

Sip Tools

No analysis available

Sip Tools is a composite project including several toolkits to enhance JAIN-SIP, Java Media Framework, and similar tools centering on SIP and RTP media. Iced Java is a Java implementation of RFC 5245 ICE, RFC 5389 STUN and RFC 5766 TURN. The goal of this project is to cover as many use cases as possible while imposing a minimal burden on the users of the library to modify their code. RTP streaming is the most obvious use case, though any P2P datagram-based service is a good candidate for using Iced Java to reduce the programming burden imposed by NATs.

0 lines of code

0 current contributors

0 since last commit

2 users on Open Hub

Activity Not Available
5.0
 
Licenses: lgpl3

SIEGate

Claimed by Grid Protection Alliance (GPA) | Analyzed about 9 hours ago

SIEGate (the Secure Information Exchange Gateway, pronounced Psy-gate) exists to (1) improve the security posture and minimize the external cyber-attack surface of electric utility control centers, and (2) reduce the cost of maintaining current control-room-to-control-room information exchange. SIEGate implements a true publish-subscribe architecture where the sending gateway owner authorizes data as available for subscription by specific consuming gateways. Once authorized, the consuming gateway discovers the data that has been made available to it by other SIEGate nodes and allows selective subscription. Data made available for publication and subscription by SIEGate includes measurements, such as SCADA or synchrophasor data, files, notifications and alarms.

1.11M lines of code

3 current contributors

21 days since last commit

1 user on Open Hub

Moderate Activity
0.0
 

Cipherpack

Analyzed about 17 hours ago

Cipherpack is a secure stream processor that uses public-key signatures to authenticate the sender and public-key encryption of a symmetric key for multiple receivers, ensuring their privacy and high-performance message encryption. Cipherpack securely streams messages through any media; streaming via file using ByteInStream_File and via all libcurl network protocols using ByteInStream_URL is built in and supported. Note: libcurl must be enabled via -DUSE_LIBCURL=ON at build. A user may use the media-agnostic ByteInStream_Feed to produce the input stream by injecting data off-thread and a CipherpackListener to receive the processed output stream. Cipherpack is implemented using C++17 or C++20 and is accessible via C++ and Java.

308K lines of code

0 current contributors

about 1 month since last commit

1 user on Open Hub

Very Low Activity
0.0
 

collector-core

Analyzed about 9 hours ago

HDFS endpoint collecting and aggregating data flows.

21K lines of code

0 current contributors

over 10 years since last commit

1 user on Open Hub

Inactive
0.0
 