News

Posted about 6 years ago by Pat Patterson
Together, StreamSets Control Hub (SCH) and StreamSets Data Collector Edge (SDC Edge) allow you to create, deploy and run dataflow pipelines in an unprecedented variety of environments. In this short series of videos, I'll show you how to install SDC Edge on a Raspberry Pi, how to get started building edge pipelines with SCH's Pipeline […] The post Managing Data Operations on the Edge appeared first on StreamSets.
Posted over 6 years ago by Hari Nayak
In my previous blog entry, I explained how to spin up Data Collectors as Kubernetes deployments along with Dataflow Performance Manager. I recommended using a deployment with one replica as the design environment and a deployment with many replicas for execution. We recently announced StreamSets Control Hub which makes the Kubernetes integration way smoother! StreamSets Control Hub adds […] The post Using StreamSets Control Hub for Scalable Deployment via Kubernetes appeared first on StreamSets.
Posted over 6 years ago by Pat Patterson
Happy New Year! Our first blog entry of 2018 is a guest post from Josh Janzen, a data scientist based in Minnesota. Josh wanted to ingest tweets referencing NFL games into Spark, then run some analysis to look for a correlation between Twitter activity and game winners. Josh originally posted this entry on his personal blog, […] The post Streaming Data from Twitter for Analysis in Spark appeared first on StreamSets.
Posted over 6 years ago by Pat Patterson
In this guest blog, Predera’s Kiran Krishna Innamuri (Data Engineer) and Nazeer Hussain (Head of Platform Engineering and Services) focus on building a data pipeline to perform lookups or run queries on Hive tables with the Spark execution engine using StreamSets Data Collector and Predera’s custom Hive-JDBC lookup processor. Introduction Why run Hive on Spark? Since […] The post Speed up Hive Data Retrieval using Spark, StreamSets and Predera appeared first on StreamSets.
Posted over 6 years ago by Clarke
Control. We always want it, regularly don’t get it, yet in business it’s a must-have to ensure things run as expected. Control is particularly critical when it comes to moving data around your company. Without it, it’s difficult to know where data is coming from, where it’s going and how it’s been manipulated (and […] The post Introducing StreamSets Control Hub appeared first on StreamSets.
Posted over 6 years ago by Pat Patterson
In a previous blog post, I explained how StreamSets Data Collector (SDC) can work with Apache Kafka and Confluent Schema Registry to handle data drift via Avro schema evolution. In that blog post, I mentioned SDC's Schema Generator processor; today I'll explain how you can use the Schema Generator to automatically create Avro schemas. We'll […] The post Generate your Avro Schema – Automatically! appeared first on StreamSets.
Posted over 6 years ago by Kirit Basu
Today an increasing amount of data is being generated from outside the data center or cloud – it isn’t always easy to get this data out of source systems or perform analytics right where it’s generated. Furthermore, getting this data into central big data systems managed by the enterprise is an arduous task involving a […] The post Announcing StreamSets Data Collector Edge appeared first on StreamSets.