Tags: Browse Projects

python pandas

Analyzed about 2 hours ago

pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with "relational" or "labeled" data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real-world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis/manipulation tool available in any language. pandas is well suited for many different kinds of data:

* Tabular data, as in an SQL table or Excel spreadsheet
* Ordered and unordered (not necessarily fixed-frequency) time series data
* Matrix data, potentially with row and column labels
* Any other form of common structured data

456K lines of code

449 current contributors

1 day since last commit

89 users on Open Hub

Very High Activity
Rating: 4.9375

Apache Spark

Claimed by Apache Software Foundation

Analyzed 1 day ago

Apache Spark is an open source cluster computing system that aims to make data analytics fast: both fast to run and fast to write. To run programs faster, Spark provides primitives for in-memory cluster computing: your job can load data into memory and query it repeatedly, more rapidly than with disk-based systems like Hadoop. To make programming faster, Spark offers high-level APIs in Scala, Java, and Python, letting you manipulate distributed datasets like local collections. You can also use Spark interactively to query big data from the Scala or Python shells. Spark integrates closely with Hadoop to run inside Hadoop clusters and can access any existing Hadoop data source.

1.28M lines of code

374 current contributors

3 days since last commit

56 users on Open Hub

Very High Activity
Rating: 5.0

Apache Hive

Claimed by Apache Software Foundation

No analysis available

Hive is a data warehouse infrastructure built on top of Hadoop that provides tools for easy data summarization, ad hoc querying, and analysis of large datasets stored in Hadoop files. It provides a mechanism to impose structure on this data, along with a simple SQL-based query language called Hive QL that lets users familiar with SQL query the data. At the same time, the language allows traditional MapReduce programmers to plug in custom mappers and reducers for more sophisticated analysis that the language's built-in capabilities may not support.

0 lines of code

0 current contributors

Time since last commit: not available

23 users on Open Hub

Activity Not Available
Rating: 5.0
Primary language: not available
Licenses: Apache License 2.0

rasdaman

Claimed by rasdaman

Analyzed about 16 hours ago

rasdaman is the pioneer Array DBMS enabling Big Data Analytics on massive n-D arrays ("datacubes") such as spatio-temporal sensor, image, simulation, and statistics data. Its declarative query language extends SQL with high-level array operators. Supported frontends include Python, R, GDAL, QGIS, OpenLayers, NASA WorldWind, and many more. Client-side parts are LGPL, allowing free use of rasdaman in open and closed applications; the server itself (a separate process) is GPL, implementing the paradigm of "life is a give and take". rasdaman was released as open source by rasdaman GmbH and is now co-led by Jacobs University, the makers of several Big Data standards: ISO Array SQL, OGC WCS and WCPS. rasdaman is the official OGC and INSPIRE WCS Reference Implementation and part of OSGeo Live (live.osgeo.org).

744K lines of code

11 current contributors

2 months since last commit

20 users on Open Hub

Moderate Activity
Rating: 5.0

JasperReports Server

Analyzed 2 months ago

JasperReports Server is part of the Jaspersoft Business Intelligence Suite, an open source business intelligence platform from Jaspersoft. It provides common services such as security and metadata management, along with the ability to easily add further functionality. JasperReports Server is built by the developers of JasperReports, the leading open source reporting engine. See the JasperReports Server community at http://community.jaspersoft.com.

563K lines of code

0 current contributors

5 months since last commit

14 users on Open Hub

Activity Not Available
Rating: 4.44444

Apache Flink

Claimed by Apache Software Foundation

Analyzed about 21 hours ago

Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. Flink has been designed to run in all common cluster environments, perform computations at in-memory speed and at any scale. Learn more about Flink at http://flink.apache.org/

2.1M lines of code

323 current contributors

4 days since last commit

9 users on Open Hub

Very High Activity
Rating: 5.0

Jaspersoft Studio

Analyzed 2 months ago

Eclipse-based JasperReports Designer

1.04M lines of code

0 current contributors

7 months since last commit

7 users on Open Hub

Activity Not Available
Rating: 5.0

Apache Storm

Claimed by Apache Software Foundation

Analyzed about 13 hours ago

Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm is simple and can be used with any programming language. Storm is fast: a benchmark clocked it at over a million tuples processed per second per node. It is scalable and fault-tolerant, guarantees your data will be processed, and is easy to set up and operate. Storm integrates with the queueing and database technologies you already use. A Storm topology consumes streams of data and processes those streams in arbitrarily complex ways, repartitioning the streams between each stage of the computation however needed.

353K lines of code

45 current contributors

7 days since last commit

6 users on Open Hub

Moderate Activity
Rating: 5.0

Alluxio (formerly Tachyon): A Memory Speed Virtual Distributed Stor...

No analysis available

Alluxio, formerly Tachyon, is a memory-speed virtual distributed storage system that enables reliable data sharing at memory speed across cluster jobs. Alluxio lies between computation frameworks, such as Spark, MapReduce, or Flink, and various storage systems, such as Amazon S3, OpenStack Swift, GlusterFS, HDFS, or Ceph. Alluxio brings significant performance improvements and bridges new workloads with data stored in traditional storage. Alluxio is Hadoop compatible: existing Spark and MapReduce programs can run on top of it without code changes. The project is deployed at multiple companies. With less than three years of open source history, Alluxio has attracted more than 600 contributors from over 200 institutions, including Alibaba, Alluxio, Baidu, IBM, Intel, Red Hat, and UC Berkeley.

0 lines of code

210 current contributors

Time since last commit: not available

6 users on Open Hub

Activity Not Available
Rating: 5.0
Primary language: not available
Licenses: Apache License 2.0

Apache Apex

Claimed by Apache Software Foundation

Analyzed about 16 hours ago

Apache Apex is an enterprise-grade big data-in-motion platform that unifies stream and batch processing. Apex was built for scalability, low-latency processing, high availability, and operability. The Apex engine is supplemented by Malhar, a library of pre-built operators, including connectors that integrate with many existing technologies as sources and destinations, such as message buses, databases, files, or social media feeds.

284K lines of code

1 current contributor

almost 3 years since last commit

6 users on Open Hub

Inactive
Rating: 0.0