
News

Posted about 8 years ago by Jonathan Natkins
Many Roads Lead to Rome. Over the last ten years, the data management landscape has changed dramatically; on that, I think we can all agree. The rise of big data and the new data management ecosystem has created an abundance of new patterns and tools, each of which is more specialized than the last. The post Embrace Diversity in Your Data Architecture appeared first on StreamSets.
Posted about 8 years ago by Pat Patterson
Graph databases represent and store data in terms of nodes, edges and properties, allowing quick, easy retrieval of complex hierarchical structures that may be difficult to model in traditional relational databases. Neo4j is an open source graph database widely deployed in the community; in this blog entry I'll show you how to use StreamSets Data ... The post Visualizing and Analyzing Salesforce Data with Neo4j appeared first on StreamSets.
Posted about 8 years ago by Pat Patterson
The Script Evaluators in StreamSets Data Collector (SDC) allow you to manipulate data in pretty much any way you please. I've already written about how you can call external Java code from your scripts – compiled Java code has great performance, but sometimes the code you need isn't available in a JAR. Today I'll show you how to call an external JavaScript ... The post Calling External Libraries from the JavaScript Evaluator appeared first on StreamSets.
Posted about 8 years ago by Pat Patterson
I run StreamSets Data Collector on my MacBook Pro. In fact, I have about a dozen different versions installed – the latest, greatest 2.5.0.0, older versions, release candidates, and, of course, a development ‘master’ build that I hack on. Preparing for tonight's St Louis Hadoop User Group Meetup, I downloaded Cloudera's CDH 5.10 Quickstart VM so I ... The post Quick Tip: Resolving ‘minReplication’ Hadoop FS Error appeared first on StreamSets.
Posted over 8 years ago by Pat Patterson
One of the most powerful features in StreamSets Data Collector (SDC) is support for Expression Language, or ‘EL’ for short. EL was introduced in JavaServer Pages (JSP) 2.0 as a mechanism for accessing Java code from JSP. The Expression Evaluator and Stream Selector stages rely heavily on EL, but you can use EL in configuring almost ... The post Create a Custom Expression Language Function for StreamSets Data Collector appeared first on StreamSets.
Posted over 8 years ago by Pat Patterson
Multithreaded Pipelines, introduced a couple of releases back, in StreamSets Data Collector (SDC) 2.3.0.0, enable a single pipeline instance to process high volumes of data, taking full advantage of all available CPUs on the machine. In this blog entry I'll explain a little about how multithreaded pipelines work, and how you can implement your own multithreaded pipeline ... The post Creating a Custom Multithreaded Origin for StreamSets Data Collector appeared first on StreamSets.
Posted over 8 years ago by Pat Patterson
There has been an explosion of innovation in open source stream processing over the past few years. Frameworks such as Apache Spark and Apache Storm give developers stream abstractions on which they can develop applications; Apache Beam provides an API abstraction, enabling developers to write code independent of the underlying framework, while tools such as ... The post Making Sense of Stream Processing appeared first on StreamSets.
Posted over 8 years ago by Kirit Basu
We’re thrilled to announce version 2.5 of StreamSets Data Collector, a major release which includes important functionality related to the Internet of Things (IoT), high-performance database ingest, integration with Apache Spark and integration into your enterprise infrastructure. You can download the latest open source release here. This release has over 22 new features, 95 improvements ... The post StreamSets Data Collector v2.5 Adds IoT, Spark, Performance and Scale appeared first on StreamSets.
Posted over 8 years ago by Pat Patterson
Mike Fuller, a consultant at Red Pill Analytics, recently wrote Stream Me Up (to the Cloud), Scotty, a tutorial on installing StreamSets Data Collector (SDC) on Amazon Web Services EC2. Mike's article takes you all the way from logging in to a fresh EC2 instance to seeing your first pipeline in action. We're reposting it here courtesy of ... The post Installing StreamSets Data Collector on Amazon Web Services EC2 appeared first on StreamSets.
Posted over 8 years ago by Pat Patterson
I've written quite a bit over the past few months about the more advanced aspects of data manipulation in StreamSets Data Collector (SDC) – writing custom processors, calling Java libraries from JavaScript, Groovy & Python, and even using Java and Scala with the Spark Evaluator. As a developer, it's always great fun to break out ... The post Transform Data in StreamSets Data Collector appeared first on StreamSets.