Posted about 6 years ago by Pat Patterson
Together, StreamSets Control Hub (SCH) and StreamSets Data Collector Edge (SDC Edge) allow you to create, deploy and run dataflow pipelines in an unprecedented variety of environments. In this short series of videos, I'll show you how to install SDC
|
Posted over 6 years ago by Hari Nayak
In my previous blog entry, I explained how to spin up Data Collectors as Kubernetes deployments along with Dataflow Performance Manager. I recommended using a deployment with one replica as the design environment and a deployment with many replicas
|
Posted over 6 years ago by Pat Patterson
Happy New Year! Our first blog entry of 2018 is a guest post from Josh Janzen, a data scientist based in Minnesota. Josh wanted to ingest tweets referencing NFL games into Spark, then run some analysis to look for a correlation between
|
Posted over 6 years ago by Pat Patterson
In this guest blog, Predera’s Kiran Krishna Innamuri (Data Engineer) and Nazeer Hussain (Head of Platform Engineering and Services) focus on building a data pipeline to perform lookups or run queries on Hive tables with the Spark execution engine
|
Posted over 6 years ago by Clarke
Control. We always want it, regularly don’t get it, yet in business it’s a must-have to ensure things run as expected. Control is particularly critical when it comes to moving data around your company. Without it, it’s difficult to know where data is
|
Posted over 6 years ago by Pat Patterson
In a previous blog post, I explained how StreamSets Data Collector (SDC) can work with Apache Kafka and Confluent Schema Registry to handle data drift via Avro schema evolution. In that blog post, I mentioned SDC's Schema Generator processor; today
|
Posted over 6 years ago by Kirit Basu
Today an increasing amount of data is being generated from outside the data center or cloud – it isn’t always easy to get this data out of source systems or perform analytics right where it’s generated. Furthermore, getting this data into central big
|