StreamSets Data Collector

News

Managing Data Operations on the Edge Posted over 7 years ago by Pat Patterson Together, StreamSets Control Hub (SCH) and StreamSets Data Collector Edge (SDC Edge) allow you to create, deploy and run dataflow pipelines in an unprecedented variety of environments. In this short series of videos, I'll show you how to install SDC ... [More]
Using StreamSets Control Hub for Scalable Deployment via Kubernetes Posted over 7 years ago by Hari Nayak In my previous blog entry, I explained how to spin up Data Collectors as Kubernetes deployments along with Dataflow Performance Manager. I recommended using a deployment with one replica as the design environment and a deployment with many replicas ... [More]
Streaming Data from Twitter for Analysis in Spark Posted over 7 years ago by Pat Patterson Happy New Year! Our first blog entry of 2018 is a guest post from Josh Janzen, a data scientist based in Minnesota. Josh wanted to ingest tweets referencing NFL games into Spark, then run some analysis to look for a correlation between ... [More]
Speed up Hive Data Retrieval using Spark, StreamSets and Predera Posted over 7 years ago by Pat Patterson In this guest blog, Predera’s Kiran Krishna Innamuri (Data Engineer) and Nazeer Hussain (Head of Platform Engineering and Services) focus on building a data pipeline to perform lookups or run queries on Hive tables with the Spark execution engine ... [More]
Introducing StreamSets Control Hub Posted over 7 years ago by Clarke Control. We always want it, regularly don’t get it, yet in business it’s a must-have to ensure things run as expected. Control is particularly critical when it comes to moving data around your company. Without it, it’s difficult to know where data is ... [More]
Generate your Avro Schema – Automatically! Posted over 7 years ago by Pat Patterson In a previous blog post, I explained how StreamSets Data Collector (SDC) can work with Apache Kafka and Confluent Schema Registry to handle data drift via Avro schema evolution. In that blog post, I mentioned SDC's Schema Generator processor; today ... [More]
Announcing StreamSets Data Collector Edge Posted over 7 years ago by Kirit Basu Today an increasing amount of data is being generated from outside the data center or cloud – it isn’t always easy to get this data out of source systems or perform analytics right where it’s generated. Furthermore, getting this data into central big ... [More]
