News

Posted over 7 years ago by Kirit Basu
Version 3.0 marks an important new milestone for StreamSets. With close to a million downloads and a strong community and customer base, we are very excited to offer a host of powerful new capabilities within the product. This release has greater connectivity with cloud services, deeper integration with Hadoop distributions, new data aggregations and an ... The post Announcing StreamSets Data Collector version 3.0 appeared first on StreamSets.
Posted over 7 years ago by Pat Patterson
Mike Fuller, a consultant at Red Pill Analytics, has been busy integrating an Oracle RDS database with Snowflake's cloud data warehouse via StreamSets Data Collector. His blog post on bulk loading data into Snowflake is a great description of a real-world data integration use case. We're reposting it here courtesy of Mike and Red Pill. Whether ... The post Bulk Loading Data into Snowflake Data Warehouse appeared first on StreamSets.
Posted almost 8 years ago by Pat Patterson
As well as parsing incoming data into records, many StreamSets Data Collector (SDC) origins can be configured to ingest Whole Files. The blog entry Whole File Transfer with StreamSets Data Collector provides a basic introduction to the concept. Although the initial release of the Whole File feature did not allow file content to be accessed in the pipeline, we ... The post Fun with FileRefs – Manipulating Whole File Data appeared first on StreamSets.
Posted almost 8 years ago by Clarke
When it comes to loading data into Apache Hadoop™, the de facto choice for bulk loads of data from leading relational databases is Apache Sqoop™. After initially entering Apache Incubator status in 2011, it quickly saw widespread adoption and development, eventually graduating to a Top-Level Project (TLP) in 2012. In StreamSets Data Collector (SDC) ... The post How to Convert Apache Sqoop™ Commands Into StreamSets Data Collector Pipelines appeared first on StreamSets.
Posted almost 8 years ago by Pat Patterson
Apache Avro is widely used in the Hadoop ecosystem for efficiently serializing data so that it may be exchanged between applications written in a variety of programming languages. Avro allows data to be self-describing; when data is serialized via Avro, its schema is stored with it. Applications reading Avro-serialized data at a later time read ... The post Evolving Avro Schemas with Apache Kafka and StreamSets Data Collector appeared first on StreamSets.
Posted almost 8 years ago by Rick Bilodeau
This post was originally published on the Cloudera VISION blog by Sam Heywood. StreamSets configurations and images of Apache Spot Open Data Model ingest pipelines can be found here on GitHub. A quick conversation with most Chief Information Security Officers (CISOs) reveals they understand they need to modernize their security architecture and the correct answer ... The post Getting Started with Cloudera’s Cybersecurity Solution (feat. StreamSets, Arcadia Data and Centrify) appeared first on StreamSets.
Posted almost 8 years ago by Clarke
Three months into my journey here at StreamSets, I’ve had a chance to talk with many of our customers and prospects to understand how they are using the open source StreamSets Data Collector (SDC) across a number of different use cases. As it turns out, behind solving technical problems in areas such as cybersecurity, IoT ... The post Straight from Our Customers: The Benefits of Modern Ingestion appeared first on StreamSets.
Posted almost 8 years ago by Pat Patterson
It's fair to say that most developers are familiar with Stack Overflow and the Stack Exchange network of question and answer sites. Q&A sites such as Stack Overflow serve communities of users focused around a particular topic or discipline – in the case of Stack Overflow, programming. Today, we're launching Ask StreamSets, a Q&A site ... The post Ask StreamSets: Questions and Answers for the StreamSets Community appeared first on StreamSets.
Posted almost 8 years ago by Kirti Velankar
‘Simplicity is the ultimate sophistication.’ – Leonardo da Vinci As a recent hire on the Engineering Productivity team here at StreamSets, my early days at the company were marked by efforts to dive head-first into StreamSets Data Collector (SDC). As it turns out, the Docker images we publish for SDC were the easiest way to ... The post Getting Started with StreamSets Data Collector on Docker appeared first on StreamSets.
Posted almost 8 years ago by Kirit Basu
We are happy to announce version 2.7.1.0 of StreamSets Data Collector. This release has a number of new features, improvements and bug fixes. For a list of all our new features, please see What's New. For a list of bug fixes and known issues, see the Release Notes. In the event that you have already ... The post Announcing Data Collector v2.7.1.0 appeared first on StreamSets.