
News

Posted almost 9 years ago by Pat Patterson
Today’s post is from Raphaël Velfre, a senior data engineer at MapR. Raphaël has spent some time working with StreamSets Data Collector (SDC) and MapR’s Converged Data Platform. In this blog entry, originally published on the MapR Converge blog, Raphaël explains how to use SDC to extract data from MySQL and write it to MapR Streams, and then move … The post MySQL Database Change Capture with MapR Streams, Apache Drill, and StreamSets appeared first on StreamSets.
Posted almost 9 years ago by Arvind Prabhakar
Apache Kudu and Open Source StreamSets Data Collector Simplify Batch and Real-Time Processing. Originally posted on the Cloudera VISION Blog. At StreamSets, we come across dataflow challenges for a variety of applications. Our product, StreamSets Data Collector, is an open-source, any-to-any dataflow system that ensures that all your data is safely delivered in the … The post Creating a Post-Lambda World with Apache Kudu appeared first on StreamSets.
Posted almost 9 years ago by Girish Pancha
Friends of StreamSets, today I am delighted to announce our new product, StreamSets Dataflow Performance Manager (DPM), the industry’s first solution for managing operations of a company’s end-to-end dataflows within a single pane of glass. The result of a year’s worth of innovative engineering and collaboration with key customers, DPM will be generally available … The post Introducing StreamSets DPM – Operational Control of Your Data in Motion appeared first on StreamSets.
Posted almost 9 years ago by Pat Patterson
After Guglielmo Iozzia, a big data infrastructure engineer on the Ethical Hacking Team at IBM Ireland, recently spoke about building data pipelines using StreamSets Data Collector at Hadoop User Group Ireland, I invited him to contribute a blog post outlining how he discovered StreamSets Data Collector (SDC) and the kinds of problems he and his team … The post StreamSets Data Collector in Action at IBM Ireland appeared first on StreamSets.
Posted almost 9 years ago by Pat Patterson
Importing data into Apache Hive is one of the most common use cases in big data ingest, but it gets tricky when data sources ‘drift’, changing the schema or semantics of incoming data. Introduced in StreamSets Data Collector (SDC) 1.5.0.0, the Hive Drift Solution monitors the structure of incoming data, detecting schema drift and updating the … The post Ingesting Drifting Data into Hive and Impala appeared first on StreamSets.
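The excerpt describes drift handling only at a high level. As a rough illustration of the idea, and not SDC's actual implementation, the Python sketch below compares each incoming record's fields against a known table schema and builds a Hive `ALTER TABLE ... ADD COLUMNS` statement for any new fields. The function names, the Python-to-Hive type mapping, and the fallback to `STRING` are assumptions made for this example; the sketch only detects added fields, not changed types.

```python
from __future__ import annotations

from typing import Any

# Hypothetical mapping from Python value types to Hive column types.
HIVE_TYPES = {bool: "BOOLEAN", int: "BIGINT", float: "DOUBLE", str: "STRING"}


def detect_drift(schema: dict[str, str], record: dict[str, Any]) -> dict[str, str]:
    """Return {column: hive_type} for fields in the record that the schema lacks."""
    new_cols: dict[str, str] = {}
    for name, value in record.items():
        if name not in schema:
            # Exact type lookup, so bool maps to BOOLEAN rather than BIGINT;
            # unknown types fall back to STRING (an assumption).
            new_cols[name] = HIVE_TYPES.get(type(value), "STRING")
    return new_cols


def alter_statement(table: str, new_cols: dict[str, str]) -> str | None:
    """Build a Hive ALTER TABLE ... ADD COLUMNS statement, or None if nothing drifted."""
    if not new_cols:
        return None
    cols = ", ".join(f"{name} {hive_type}" for name, hive_type in new_cols.items())
    return f"ALTER TABLE {table} ADD COLUMNS ({cols})"


if __name__ == "__main__":
    schema = {"id": "BIGINT", "temp": "DOUBLE"}
    record = {"id": 7, "temp": 21.5, "humidity": 40.2, "sensor": "pi-1"}
    print(alter_statement("readings", detect_drift(schema, record)))
    # ALTER TABLE readings ADD COLUMNS (humidity DOUBLE, sensor STRING)
```

Returning `None` when nothing has drifted lets a caller skip the metadata update entirely for the common case where records match the existing table.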
Posted almost 9 years ago by Kirit Basu
It’s been a busy summer here at StreamSets: we’ve been enabling some exciting use cases for our customers, partners, and the community of open-source users all over the world. We are excited to announce the newest version of StreamSets Data Collector, which has a host of new features and over 100 bug fixes. Download it now. The post Announcing Data Collector ver 1.6.0.0 appeared first on StreamSets.
Posted almost 9 years ago by Pat Patterson
A key aspect of StreamSets Data Collector (SDC) is its ability to parse incoming data, giving you unprecedented flexibility in processing data flows. Sometimes, though, you don’t need to see ‘inside’ files – you just need to move them from a source to one or more destinations. Breaking news – the upcoming StreamSets Data Collector 1.6.0.0 release … The post Whole File Transfer with StreamSets Data Collector appeared first on StreamSets.
Posted almost 9 years ago by Pat Patterson
This blog post concludes a short series building up an IoT sensor testbed with StreamSets Data Collector (SDC), a Raspberry Pi, and Apache Cassandra. Previously, I covered: Part 1: Ingesting Sensor Data on the Raspberry Pi with StreamSets Data Collector; Part 2: Retrieving Metrics via the StreamSets Data Collector REST API; Part 3: Standard Deviations on … The post Dynamic Outlier Detection with StreamSets and Cassandra appeared first on StreamSets.
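The excerpt doesn't spell out how outliers are detected. One common approach consistent with the series' focus on standard deviations is to flag a reading that lies more than k standard deviations from the mean of recent readings, so the threshold adapts as the data changes. The sketch below is a hypothetical Python version of that idea; the `OutlierDetector` class, the window size, `k = 3`, and the minimum sample count are illustrative choices, not taken from the post.

```python
import statistics
from collections import deque


class OutlierDetector:
    """Hypothetical detector: a reading is an outlier if it lies more than
    k population standard deviations from the mean of the recent window."""

    def __init__(self, window: int = 20, k: float = 3.0, min_samples: int = 5):
        self.history = deque(maxlen=window)  # keeps only the most recent readings
        self.k = k
        self.min_samples = min_samples  # don't judge until there is a baseline

    def check(self, value: float) -> bool:
        is_outlier = False
        if len(self.history) >= self.min_samples:
            mean = statistics.fmean(self.history)
            stdev = statistics.pstdev(self.history)
            # With zero spread, any different value would be flagged, so skip it.
            is_outlier = stdev > 0 and abs(value - mean) > self.k * stdev
        self.history.append(value)
        return is_outlier


if __name__ == "__main__":
    detector = OutlierDetector()
    readings = [20.0, 20.5, 19.8, 20.2, 20.1, 35.0]
    print([detector.check(r) for r in readings])
    # [False, False, False, False, False, True]
```

Here every reading, outliers included, is added to the window; excluding flagged values would keep the baseline cleaner but could hide a genuine shift in the sensor's level.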
Posted about 9 years ago by Pat Patterson
If you’ve been following the StreamSets blog over the past few weeks, you’ll know that I’ve been building an Internet of Things testbed on the Raspberry Pi. First, I got StreamSets Data Collector (SDC) running on the Pi, ingesting sensor data and sending it to Apache Cassandra, and then I wrote a Python app to display SDC … The post Standard Deviations on Cassandra – Rolling Your Own Aggregate Function appeared first on StreamSets.
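Cassandra builds a user-defined aggregate from a state function (`SFUNC`), which folds each row into an accumulator, and an optional final function (`FINALFUNC`), which turns the accumulator into the result; in CQL those functions are typically written in Java. The Python sketch below only models that split for a standard deviation so the arithmetic is easy to follow. It is not the post's code: the (count, mean, sum of squared deviations) state and Welford's online update are assumptions chosen for numerical stability, and it returns the sample standard deviation.

```python
import math
from functools import reduce
from typing import Iterable, Tuple

# Accumulator: (count, running mean, sum of squared deviations from the mean).
State = Tuple[int, float, float]
INIT_STATE: State = (0, 0.0, 0.0)


def state_func(state: State, value: float) -> State:
    """Fold one value into the accumulator using Welford's online update."""
    count, mean, m2 = state
    count += 1
    delta = value - mean
    mean += delta / count
    m2 += delta * (value - mean)
    return (count, mean, m2)


def final_func(state: State) -> float:
    """Turn the accumulator into a sample standard deviation."""
    count, _, m2 = state
    if count < 2:
        return float("nan")  # undefined for fewer than two values
    return math.sqrt(m2 / (count - 1))


def stdev(values: Iterable[float]) -> float:
    """Apply the state function to every value, then the final function."""
    return final_func(reduce(state_func, values, INIT_STATE))


if __name__ == "__main__":
    print(round(stdev([2, 4, 4, 4, 5, 5, 7, 9]), 3))  # 2.138
```

For a population standard deviation, the final function would divide `m2` by `count` instead of `count - 1`.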
Posted about 9 years ago by Kirit Basu
We’re happy to announce a new release of StreamSets Data Collector. This is a relatively minor mid-term update with a number of important bug fixes, yet it packs in a couple of fun features. Support for Azure Blob storage using the WASB protocol: customers can now use Data Collector to write directly to Azure HDInsight. Support … The post Announcing Data Collector ver 1.5.1.0 appeared first on StreamSets.