Projects tagged ‘bigdata’

python pandas

Analyzed 1 day ago

pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python. Additionally, it ...

487K lines of code

449 current contributors

2 days since last commit

89 users on Open Hub

Very High Activity

1 Review

Mostly written in Python

Licenses: BSD-3-Clause

Apache Spark

Claimed by Apache Software Foundation

Analyzed about 8 hours ago

Apache Spark is an open source cluster computing system that aims to make data analytics fast, both fast to run and fast to write. To run programs faster, Spark provides primitives for in-memory cluster computing: your job can load data into memory and query it repeatedly more rapidly than with ...

1.8M lines of code

374 current contributors

2 days since last commit

56 users on Open Hub

Very High Activity

0 Reviews

Mostly written in Scala

Licenses: apache_2

Tags apache bigdata cluster clustercomputing distributed distributed_computing ec2 graph_computing hadoop hdfs in_memory java 8 more...

Apache Hive

Claimed by Apache Software Foundation

No analysis available

Hive is a data warehouse infrastructure built on top of Hadoop that provides tools to enable easy data summarization, ad hoc querying, and analysis of large datasets stored in Hadoop files. It provides a mechanism to put structure on this data, and it also provides a simple query language called ...

0 lines of code

0 current contributors

Time since last commit not available

22 users on Open Hub

Activity Not Available

0 Reviews

Primary language not available

Licenses: apache_2

Tags apache bigdata cluster clustercomputing distributed_computing hadoop hdfs java mapreduce orc spark sql 4 more...

rasdaman

Claimed by rasdaman

Analyzed about 8 hours ago

rasdaman is the pioneer Array DBMS enabling Big Datacube Analytics on spatio-temporal sensor, image, simulation, and statistics data through a high-level declarative datacube query language. Frontends supported include Leaflet, QGIS, ArcGIS, Python, R, openEO, and many more. Client-side parts are ...

735K lines of code

11 current contributors

9 days since last commit

20 users on Open Hub

Moderate Activity

1 Review

Mostly written in C++

Licenses: gpl3_or_l...

Tags analytics array_processing arrayquery bigdata database scientificdataanalysis

JasperReports Server

Analyzed over 2 years ago

JasperReports Server is part of the Jaspersoft Business Intelligence Suite, an open source business intelligence platform from Jaspersoft, providing common services like security and metadata management, and the capability to easily add additional functionality. JasperReports Server is built by the ...

563K lines of code

0 current contributors

over 2 years since last commit

14 users on Open Hub

Activity Not Available

0 Reviews

Mostly written in Java

Licenses: AGPL3

Tags analytics bigdata business_intelligence charting charts csv dashboards data embedded excel ireport jasper 8 more...

Apache Flink

Claimed by Apache Software Foundation

Analyzed about 3 hours ago

Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. Flink has been designed to run in all common cluster environments, perform computations at in-memory speed and at any scale. Learn more about Flink at http://flink.apache.org/

2.28M lines of code

323 current contributors

21 days since last commit

8 users on Open Hub

Very High Activity

0 Reviews

Mostly written in Java

Licenses: apache_2

Tags apache bigdata cluster distributed hadoop java machinelearning mapreduce scala streaming

Jaspersoft Studio

Analyzed over 2 years ago

Eclipse-based JasperReports Designer

1.04M lines of code

0 current contributors

almost 3 years since last commit

7 users on Open Hub

Activity Not Available

0 Reviews

Mostly written in Java

Licenses: eclipse

Tags bigdata business_intelligence charting charts dashboards data database ide jasper jasperreports java jdbc 7 more...

Apache Storm

Claimed by Apache Software Foundation

Analyzed 2 days ago

Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm is simple and can be used with any programming language. Storm is fast: a benchmark ...

351K lines of code

45 current contributors

4 days since last commit

6 users on Open Hub

High Activity

0 Reviews

Mostly written in Java

Licenses: apache_2

Tags bigdata cloud cluster clustercomputing datastreams distributed distributed_computing distributedsystem ec2 fault_tolerant java json 8 more...

Alluxio (formerly Tachyon): A Memory Speed Virtual Distributed Stor...

Analyzed about 1 hour ago

Alluxio, formerly Tachyon, is a memory-speed virtual distributed storage system that enables reliable data sharing at memory speed across cluster jobs. Alluxio lies between computation frameworks, such as Spark, MapReduce, or Flink, and various storage systems, such as Amazon S3, OpenStack Swift, GlusterFS ...

349K lines of code

210 current contributors

about 1 year since last commit

6 users on Open Hub

Very Low Activity

0 Reviews

Mostly written in Java

Licenses: apache_2

Tags bigdata distributedsystems memorycentric storage virtual

Apache Apex

Claimed by Apache Software Foundation

Analyzed about 15 hours ago

Apache Apex is an enterprise-grade big data-in-motion platform that unifies stream and batch processing. Apex was built for scalability, low-latency processing, high availability, and operability. The Apex engine is supplemented by Malhar, the library of pre-built operators, including connectors ...

284K lines of code

1 current contributor

over 5 years since last commit

6 users on Open Hub

Inactive

0 Reviews

Mostly written in Java

Licenses: apache_2

Tags analytics apache bigdata cluster datastream distributed fault_tolerant hadoop high_performance java opensource realtime 4 more...