Posted about 8 years ago by Jonathan Natkins
Many Roads Lead to Rome
Over the last ten years, the data management landscape has changed dramatically. On that, I think we can all agree. The rise of big data and the new data management ecosystem has created an abundance of new patterns and
|
Posted about 8 years ago by Pat Patterson
Graph databases represent and store data in terms of nodes, edges and properties, allowing quick, easy retrieval of complex hierarchical structures that may be difficult to model in traditional relational databases. Neo4j is an open source graph
|
Posted about 8 years ago by Pat Patterson
The Script Evaluators in StreamSets Data Collector (SDC) allow you to manipulate data in pretty much any way you please. I've already written about how you can call external Java code from your scripts – compiled Java code has great performance, but
|
Posted about 8 years ago by Pat Patterson
I run StreamSets Data Collector on my MacBook Pro. In fact, I have about a dozen different versions installed – the latest, greatest 2.5.0.0, older versions, release candidates, and, of course, a development 'master' build that I hack on. Preparing
|
Posted over 8 years ago by Pat Patterson
One of the most powerful features in StreamSets Data Collector (SDC) is support for Expression Language, or 'EL' for short. EL was introduced in JavaServer Pages (JSP) 2.0 as a mechanism for accessing Java code from JSP. The Expression Evaluator and
|
Posted over 8 years ago by Pat Patterson
Multithreaded Pipelines, introduced a couple of releases back in StreamSets Data Collector (SDC) 2.3.0.0, enable a single pipeline instance to process high volumes of data, taking full advantage of all available CPUs on the machine. In this blog
|
Posted over 8 years ago by Pat Patterson
There has been an explosion of innovation in open source stream processing over the past few years. Frameworks such as Apache Spark and Apache Storm give developers stream abstractions on which they can develop applications; Apache Beam provides an
|
Posted over 8 years ago by Kirit Basu
We’re thrilled to announce version 2.5 of StreamSets Data Collector, a major release which includes important functionality related to the Internet of Things (IoT), high-performance database ingest, integration with Apache Spark and integration into
|
Posted over 8 years ago by Pat Patterson
Mike Fuller, a consultant at Red Pill Analytics, recently wrote Stream Me Up (to the Cloud), Scotty, a tutorial on installing StreamSets Data Collector (SDC) on Amazon Web Services EC2. Mike's article takes you all the way from logging in to
|
Posted over 8 years ago by Pat Patterson
I've written quite a bit over the past few months about the more advanced aspects of data manipulation in StreamSets Data Collector (SDC) – writing custom processors, calling Java libraries from JavaScript, Groovy & Python, and even using Java
|