
News

Posted over 8 years ago by Pat Patterson
New in StreamSets Data Collector (SDC) 2.2.0.0 is the Spark Evaluator, a processor stage that allows you to run an Apache Spark application, termed a Spark Transformer, as part of an SDC pipeline. With the Spark Evaluator, you can build a pipeline to ingest data from any supported origin, apply transformations, such as filtering and lookups, using existing SDC ... The post Running Apache Spark Code in StreamSets Data Collector appeared first on StreamSets.
Posted over 8 years ago by Kirit Basu
And here it is, folks, the last release of 2016 – StreamSets Data Collector version 2.2.0.0. We’ve put in a host of important new features and resolved 120+ bugs. We’re gearing up for a solid roadmap in 2017, enabling exciting new use cases and bringing in some great contributions from customers and our community. Please take this out for ... The post Announcing Data Collector ver 2.2.0.0 appeared first on StreamSets.
Posted over 8 years ago by Pat Patterson
Apache Flume “is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data”. The typical use case is collecting log data and pushing it to a destination such as the Hadoop Distributed File System. In this blog entry we’ll look at a couple of Flume use cases, and ... The post Upgrading From Apache Flume to StreamSets Data Collector appeared first on StreamSets.
Posted almost 9 years ago by Rick Bilodeau
It’s been a little over a year (9/24/15) since we launched StreamSets Data Collector as an open source project. For those of you unfamiliar with the product, it’s any-to-any big data ingestion software through which you can build and place into production complex batch and streaming pipelines using built-in processors for all sorts of data ... The post More Than One Third of the Fortune 100 Have Downloaded StreamSets Data Collector appeared first on StreamSets.
Posted almost 9 years ago by Pat Patterson
As you likely already know, StreamSets Data Collector (SDC) is open source, made available via the Apache 2.0 license. The entire source code for the product is hosted in a GitHub project and the binaries are always available for download. As well as being part of our engineering culture, open source gives us a number ... The post Contributing to the StreamSets Data Collector Community appeared first on StreamSets.
Posted almost 9 years ago by Rick Bilodeau
Reposted from the Cloudera Vision blog. What do Sony, Target and the Democratic Party have in common? Besides being well-respected brands, they’ve all been subject to some very public and embarrassing hacks over the past 24 months. Because cybercrime is no longer driven by angst-ridden teenagers but rather professional criminal organizations and state-sponsored hacker groups, the ... The post The Challenge of Fetching Data for Apache Spot (incubating) appeared first on StreamSets.
Posted almost 9 years ago by Pat Patterson
Back in March, I wrote a tutorial showing how to create a custom destination for StreamSets Data Collector (SDC). Since then I’ve been looking for a good sample use case for a custom processor. It’s tricky to find one, since the set of out-of-the-box processors is pretty extensive now! In particular, the scripting processors make ... The post Creating a Custom Processor for StreamSets Data Collector appeared first on StreamSets.
Posted almost 9 years ago by Kirit Basu
We’re happy to announce a new release of the Data Collector. This minor release has more than 30 bug fixes, a number of improvements, and a few new features: A Package Manager that allows you to install new Stage Libraries (Origins, Processors, Destinations) right from the User Interface. With this feature you can download ... The post Announcing Data Collector ver 2.1.0.0 appeared first on StreamSets.
Posted almost 9 years ago by Pat Patterson
Sandish Kumar, a Solutions Engineer at phData, builds and manages solutions for phData customers. In this article, reposted from the phData blog, he explains how to generate simulated NetFlow data, read it into StreamSets Data Collector via the UDP origin, then buffer it in Apache Kafka before sending it to Apache Kudu. A true big data enthusiast, Sandish spends ... The post Visualizing NetFlow Data with StreamSets Data Collector, Kudu, Impala and D3 appeared first on StreamSets.
Posted almost 9 years ago by Kirit Basu
Last October, we publicly announced StreamSets Data Collector version 1.0. Over the last 12 months we have seen an awesome (a word we don’t use lightly) amount of adoption of our first product – from individual developers simplifying their day-to-day work, to small startups building the next big thing, to the very largest companies building ... The post Announcing StreamSets Data Collector version 2.0 appeared first on StreamSets.