News

Posted about 12 years ago by Sally
The Apache Software Foundation (ASF) welcomes Citrix to the roster of sponsors at the Platinum level. "We are pleased to welcome Citrix to our individual and corporate sponsors whose generosity helps advance the day-to-day operations of The Apache Software Foundation," said ASF Chairman Doug Cutting. "This support helps us successfully shepherd more than 100 top-level projects, incubate dozens of open source innovations, broaden community outreach, and enhance the lives of countless users and developers The Apache Way."

Citrix joins the following Sponsors:
- Platinum level: Facebook, Google, Microsoft, and Yahoo!
- Gold level: AMD, Hortonworks, HP, and IBM
- Silver level: Basis Technology, Cloudera, Matt Mullenweg, PSW GROUP, and SpringSource
- Bronze level: AirPlus International, BlueNog, Digital Primates, FuseSource, Intuit, Joost, Liip AG SA Ltd, Lucid Imagination, Talend, Two Sigma Investments, and WANdisco

For more information on becoming a Sponsor of the ASF, please see http://apache.org/foundation/sponsorship.html

# # #
Posted about 12 years ago by arvind
Apache Sqoop is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases. You can use Sqoop to import data from external structured datastores into the Hadoop Distributed File System or related systems like Hive and HBase. Conversely, Sqoop can be used to extract data from Hadoop and export it to external structured datastores such as relational databases and enterprise data warehouses.

In its monthly meeting in March of 2012, the board of the Apache Software Foundation (ASF) resolved to grant Top-Level Project status to Apache Sqoop, thus graduating it from the Incubator. This is a significant milestone in the life of Sqoop, which has come a long way since its inception almost three years ago. The following figure offers a brief overview of what has happened in the life of Sqoop so far:

[Figure 1: A timeline of the Sqoop project]

Sqoop started as a contrib module for Apache Hadoop in May of 2009, first submitted as a patch to HADOOP-5815 by Aaron Kimball. Over the course of the next year, it saw about 56 patches submitted towards its development. Given the inertia of large projects, Aaron decided to decouple it from Hadoop and host it elsewhere to facilitate faster development and release cycles. Consequently, in April of 2010 Sqoop was split out of Hadoop via MAPREDUCE-1644 and hosted on GitHub by Cloudera as an Apache-licensed project. Over the course of the next year, Sqoop saw wide adoption along with four releases and 191 patches. An extension API was introduced early in Sqoop that allowed the development of high-speed third-party connectors for rapid data transfer from specialized systems such as enterprise data warehouses. As a result, multiple connectors were developed by various vendors that plugged into Sqoop. To bolster this fledgling community of users and third-party connector vendors, Cloudera decided to propose it for incubation in Apache. Sqoop was accepted for incubation by the Apache Incubator in June of 2011.

Inside the Incubator, Sqoop saw healthy growth in its community and gained four new committers. With an active community and committers, Sqoop made two incubating releases. The focus of its first release was the migration of code from the com.cloudera.sqoop namespace to org.apache.sqoop while preserving backward compatibility. Thanks to phenomenal work by Bilung Lee, the release manager of the first incubating release, this release met all of its expectations. The second incubating release of Sqoop focused on its interoperability with various versions of Hadoop. The release manager of this release, Jarek Jarcec Cecho, was instrumental in making sure that it delivered on this requirement and could work with Hadoop versions 0.20, 0.23, and 1.0. Along with the stated goals of these incubating releases, Sqoop saw steady growth, with 116 patches by various contributors and committers. With excellent mentorship from Patrick Hunt, the other mentors of the project, and Incubator PMC members, Sqoop acquired the ability to self-govern, follow the ASF policies and guidelines, and foster and grow its community.

Sqoop successfully graduated from the Incubator in March of 2012 and is now a Top-Level Apache project. You can download its latest release artifacts by visiting http://sqoop.apache.org/. While Sqoop has no doubt delivered significant value to the community of users, it is fair to say that it is in the early stages of fulfilling the data-integration requirements around Hadoop.
Work has started towards the development of the next major revision of Sqoop, which will address more of these requirements than before. Along the way, we look forward to growing the community manyfold, getting more committers on board, and solving some genuinely challenging problems of data movement between Hadoop and external systems. We sincerely hope you will join us in taking Sqoop towards fulfilling all these goals and becoming a standard component in Hadoop deployments everywhere.
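For readers who have not yet tried Sqoop, here is a minimal sketch of the import/export round trip described above, using the Sqoop 1.x command-line client. The JDBC URL, credentials, table names, and HDFS paths are hypothetical; adapt them to your environment.

    # Import a table from a relational database into HDFS
    sqoop import \
      --connect jdbc:mysql://db.example.com/sales \
      --username analyst -P \
      --table orders \
      --target-dir /user/analyst/orders

    # Export aggregated results from HDFS back to the database
    sqoop export \
      --connect jdbc:mysql://db.example.com/sales \
      --username analyst -P \
      --table order_totals \
      --export-dir /user/analyst/order_totals

The -P flag prompts for the database password interactively rather than taking it on the command line.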
Posted about 12 years ago by cos
Just a couple months ago, Apache Hadoop fans and committers alike were cheered by the news that the Hadoop community had released the long-awaited version 1.0 of the famous Hadoop data-crunching platform. It is a pleasure to be writing this blog because, while I am doing it, Apache mirrors are synchronizing the first ever release of a Hadoop 1.0-based data analytics stack that has been fully built, validated, and packaged by the Apache Bigtop project (incubating).

Ladies and gentlemen: we are proud to present a collaborative effort of many teams and individuals that allows Bigtop to put together the first ever 100% open-source Apache Hadoop big data stack. Just a few highlights:
- Bigtop 0.3.0 (incubating) includes 10 major data analytic components, including Apache HBase, Pig, Hive, Mahout, etc.
- out-of-the-box native packaging for all major Linux distributions: Ubuntu, Fedora, CentOS, SUSE
- the complete set of source code (including all packaging specs, the validation framework iTest, etc.) is available under the Apache License 2.0 from the Apache SVN repository
- Puppet recipes for fully automated configuration of cluster nodes
- the release comes with a significant number of improvements and fixed issues

What's in the release? You will be pleased to find:
- Hadoop 1.0.1
- HBase 0.92
- Hive 0.8.1
- Mahout 0.6.1
- Oozie 3.1.3
- Pig 0.9.2
- Sqoop 1.4.1
- Whirr 0.7
- ZooKeeper 3.4.3
- Flume 1.0.0

Together, these components provide a complete data collection and analytics pipeline that has been thoroughly validated to work together and be fully compatible. Make sure to evaluate and upgrade to the latest official Hadoop 1.0 data analytics stack from Bigtop! Whether you're a seasoned data analyst, a BOFH Hadoop DevOps, or a curious open source developer, make sure to check out the Bigtop 0.3.0 distribution. Most of all, though, consider joining our community and help us build the most reliable, 100% Apache big data analytics stack that commercial vendors can envy!
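To give a flavor of what the native packaging means in practice, here is a minimal sketch of an install on an apt-based system. It assumes the Bigtop 0.3.0 (incubating) package repository for your distribution has already been configured, and the package names are illustrative; consult the Bigtop documentation for the exact repository setup and package list.

    # Refresh package metadata, then install stack components as
    # ordinary distro packages (names are illustrative)
    sudo apt-get update
    sudo apt-get install hadoop hbase hive pig sqoop flume zookeeper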
Posted about 12 years ago by Sally
Open Source big data tool used for efficient bulk transfer between Apache Hadoop and structured datastores.

Forest Hill, MD --The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of nearly 150 Open Source projects and initiatives, today announced that Apache Sqoop has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the Project's community and products have been well-governed under the ASF's meritocratic process and principles.

Designed to efficiently transfer bulk data between Apache Hadoop and structured datastores such as relational databases, Apache Sqoop allows the import of data from external datastores and enterprise data warehouses into the Hadoop Distributed File System or related systems like Apache Hive and HBase.

"The Sqoop Project has demonstrated its maturity by graduating from the Apache Incubator," explained Arvind Prabhakar, Vice President of Apache Sqoop. "With jobs transferring data on the order of billions of rows, Sqoop is proving its value as a critical component of production environments."

Building on the Hadoop infrastructure, Sqoop parallelizes data transfer for fast performance and the best utilization of system and network resources. In addition, Sqoop allows fast copying of data from external systems to Hadoop to make data analysis more efficient, and mitigates the risk of excessive load on external systems.

"Connectivity to other databases and warehouses is a critical component for the evolution of Hadoop as an enterprise solution, and that's where Sqoop plays a very important role," said Deepak Reddy, Hadoop Manager at Coupons.com. "We use Sqoop extensively to store and exchange data between Hadoop and other warehouses like Netezza. The power of Sqoop also comes in the ability to write free-form queries against structured databases and pull that data into Hadoop."

"Sqoop has been an integral part of our production data pipeline," said Bohan Chen, Director of the Hadoop Development and Operations team at Apollo Group. "It provides a reliable and scalable way to import data from relational databases and export the aggregation results to relational databases."

Since entering the Apache Incubator in June 2011, Sqoop was quickly embraced as an ideal SQL-to-Hadoop data transfer solution. The Project provides connectors for popular systems such as MySQL, PostgreSQL, Oracle, SQL Server, and DB2, and also allows for the development of drop-in connectors that provide high-speed connectivity with specialized systems like enterprise data warehouses.

Craig Ling, Director of Business Systems at Tsavo Media, said "We adopted the use of Sqoop to transfer data into and out of Hadoop with our other systems over a year ago. It is straightforward and easy to use, which has opened the door to allow team members to start consuming data autonomously, maximizing the analytical value of our data repositories."

Availability and Oversight
Apache Sqoop software is released under the Apache License v2.0, and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. Apache Sqoop source code, documentation, mailing lists, and related resources are available at http://sqoop.apache.org/. A timeline of the project's history through graduation from the Apache Incubator is also available.
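As an illustration of the free-form query capability mentioned by Deepak Reddy above, a Sqoop import can be driven by an arbitrary SQL statement instead of a table name. The sketch below uses hypothetical connection details and paths; note that Sqoop requires the literal $CONDITIONS token in the WHERE clause so it can partition the query across parallel import tasks, and --split-by names the column used for that partitioning.

    sqoop import \
      --connect jdbc:mysql://db.example.com/sales \
      --username analyst -P \
      --query 'SELECT o.id, o.total FROM orders o WHERE o.total > 100 AND $CONDITIONS' \
      --split-by o.id \
      --target-dir /user/analyst/big_orders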
About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees nearly one hundred fifty leading Open Source projects, including Apache HTTP Server -- the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 350 individual Members and 3,000 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) not-for-profit charity, funded by individual donations and corporate sponsors including AMD, Basis Technology, Cloudera, Facebook, Google, IBM, HP, Hortonworks, Matt Mullenweg, Microsoft, PSW Group, SpringSource/VMware, and Yahoo!. For more information, visit http://www.apache.org/.

"Apache", "Apache Sqoop", and "ApacheCon" are trademarks of The Apache Software Foundation. All other brands and trademarks are the property of their respective owners.

# # #
Posted about 12 years ago by stack
The bulk of the HBase Project Management Committee (PMC) met at the StumbleUpon offices in San Francisco on March 27th, 2012, ahead of the HBase Meetup that happened later in the evening. Below we post the agenda and minutes to let you all know what was discussed, but also to solicit input and comment.

Agenda

A suggested agenda had been put on the PMC mailing list the day before the meeting, soliciting that folks add their own items ahead of the meeting. The hope was that we'd put the agenda out on the dev mailing list before the meeting started to garner more suggestions, but that didn't happen. The agenda items below sort of followed on from the contributor pow-wow we had at Salesforce a while back -- pow-wow agenda, pow-wow minutes (or see the summarized version) -- but it was thought that we could go into more depth on a few of the topics raised then, in particular, project existential matters. Here were the items put up for discussion:

- Where do we want HBase to be in two years? What will success look like? Discuss. Make a short list.
- How do we achieve the just-made short list? What resources do we have to hand? What can we deploy in the short term to help achieve our just-stated objectives? What do we need to prioritize in the near future? Make a short list.
- How do we exchange best practices/design decisions when developing new features? Sometimes there may be more things that can be shared if everyone follows the same best practices, and fewer features need to be implemented.

Minutes

Attendees: Todd Lipcon, Ted Yu, Jean-Daniel Cryans, Jon Hsieh, Karthik Ranganathan, Kannan Muthukkaruppan, Andrew Purtell, Jon Gray, Nicolas Spiegelberg, Gary Helmling, Mikhail Bautin, Lars Hofhansl, Michael Stack

Todd, the secretary, took notes. St.Ack summarized his notes into the below minutes. The meeting started at about 4:20 (after the requisite 20 minutes dicking w/ A/V and dial-in setup).

"Where do we want HBase to be in two years? What will success look like? Discuss. Make a short (actionable) list."

Secondary indexes and transactions were suggested, as were operations on a parity w/ MySQL and rock-solid stability so HBase could be used as a primary copy of data. It was also suggested that we expend effort making HBase more usable out of the box (auto-tuning/defaults). Then followed discussion of who HBase is for. Big companies? Or new users, or startups? Is our goal stimulating and creating demand? Or is it to be reactive to what problems people are actually hitting? A dose of reality had it that while it would be nice to make all possible users happy, and even to talk about doing this, in actuality we are mostly going to focus on what our employers need rather than prospective 'customers'.

After this detour, the list-making became more blunt and basic. It was suggested that we build a rock-solid db which other people might be able to build on top of for higher-level stuff. The core engine needs to work reliably -- let's do this first -- and then talk of new features and add-ons. Meantime, we can point users to coprocessors for building extensions and features w/o their needing to touch core hbase (it was noted that we are open to extending CPs if we have to, to extend the 'control surface' exposed, but that coverage has been pretty much sufficient up to this point). Usability is important, but operability is more important. We don't need to target n00bs. First nail ops being able to understand what's going on.
After further banter, we arrived at a list: reliability, operability (insight into the running application, dynamic config changes, usability improvements that make it easier on a clueful ops person), and performance (in this order). It was offered that we are not too bad on performance -- especially in 0.94 -- and that use cases will drive the performance improvements, so focus should be on the first two items in the list.

"How do we achieve the just-made short list?"

To improve reliability, testing has to be better. This has been said repeatedly in the past. It was noted that progress has been made at least on our unit test story (tests have been parallelized, and more of hbase is now testable because of refactorings). Progress on integration tests and/or contributions to Apache Bigtop has not advanced. As is, Bigtop is a little "cryptic"; it's a different harness with shim layers to insulate against version changes. We should help make it easier. We should add being able to do fault injection. Should hbase integration tests be done out in the open, continuously running on something like an EC2 cluster that all can observe? This is the Bigtop goal, but it's not yet there. Of note, EC2 can't be used for validating performance. Too erratic. Bottom line, improving testing requires bodies. Resources such as hardware, etc., are relatively easy to come by. Hard is getting an owner and bodies to write the tests and test infrastructure. Each of us has our own hack testing setup. Doing a general test infrastructure, whether on Bigtop or not, is a bit of a chicken-and-egg problem. Let's just make sure that whoever hits the need to do this general test infrastructure tooling first does it out in the open, and that we all pile on and help. Meantime we'll keep up w/ our custom test tools.

Regards the current state of reliability, it's thought that, as is, we can't run a cluster continuously w/ a chaos monkey operating. There are probably still "hundreds of issues" to be fixed before we're at a state where we can run for days under "chaos monkey" conditions. For example, a recent experiment killed a rack in a large cluster and left it down for an hour. This turned up lots of bugs (on the other hand, we learned that an experiment done by another company recently, where the downtime was shorter, 'recovered' without problems). Areas to explore improving reliability include testing network failure scenarios, better MTTR, and better tolerance of network partitions.

Also on reliability, what core fixes are outstanding? There are still races in the master, and issues where bulk cluster operations -- enable/disable -- fail in the middle. We need a zookeeper-intent log (or something like Accumulo FATE) and table read/write locks. Meantime, kudos to the recently checked-in hbck work, because this can do fixup until the missing fixes are put in place.

Regards operability, we need more metrics -- e.g. 95th/99th percentile metrics (some of these just got checked in) -- and better stories around backups, point-in-time recovery, and cross-column-family snapshots. We need to encourage/facilitate/deploy the move to HA NN. Talk was about more metrics client-side rather than on the server. On metrics, what else do we need to expose? Log messages should be 'actionable'; include help on what you need to do to solve an issue. Dynamic config is going to help operability (here soon); we need to make sure that the important configs are indeed changeable on the fly.

It's thought that performance is taking care of itself (witness the suite of changes in 0.94.0).
"How do we exchange best practices, etc., when developing new features?"

Many are solving similar problems. We don't always need to build new features. Maybe ops tricks are enough in many cases (e.g. run two clusters if two applications need isolation, rather than build a bunch of multi-tenancy code). People often go deep into design/code, then post the design + code only after they have already spent a long time. It was suggested that discussion with the community should come first, before writing code and designs. Regards any feature proposal, it's thought that the 'why' is the most important thing that needs justifying, not necessarily the technical design. Also, testability and disruption of core need to be considered when proposing a new feature. Designs or any document need to be short. Two pages at max. Hard for folks to spend the time if it goes on longer (Hadoop's HEP process was mentioned as a proposal that failed).

General Discussion

A general discussion went around on what to do about features not wanted, whether because of the justification, the design, or the code, and how the latter is hardest to deal with, especially if a feature is large (code review takes time). It was noted that we don't have to commit everything, that we can revert stuff, and that it's ok to throw something away even if a bunch of work has been done (a recent Facebook example around reuse of block cache blocks was cited, where a bunch of code and review resulted in the conclusion that the benefits were inconclusive, so the project was abandoned). It was noted that the onus is on us to help contributors better understand what would be good things to work on to move the project forward. It was suggested that rather than a 'sea of jiras', we'd instead surface a short list of what we think are the important things to work on. General assent that roadmaps don't work, but it should be easy to put up a list of what's important in the near- and long-term future for prospective contributors to pick up on. It was noted, though, that we also need to remain open to new features, etc., that don't fit near-or-far-term project themes. This is open source, after all. It was mentioned that we should work on high-level guidelines for how best to contribute. These need to be made public. Make a start, any kind of start, on defining the project focus / design rationale. It doesn't have to be perfect - just put a few words on the page, maybe even up on the home page.

Meeting was adjourned around 5:45pm.
Posted about 12 years ago by Sally
Open Source mashup platform provides easy-to-use infrastructure for building and integrating with social media standards including Activity Streams, OpenSocial, W3C Widgets, and more.

Forest Hill, MD -- The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of nearly 150 Open Source projects and initiatives, today announced that Apache Rave has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the Project's community and products have been well-governed under the ASF's meritocratic process and principles.

Apache Rave is an Open Source software mashup platform that allows developers to build and engage with an array of social network technologies such as OpenSocial, Activity Streams, and W3C Widgets. Rave's lightweight and extensible approach to robust personalization and collaboration capabilities supports a simple model for integration across other platforms, services, and solutions.

"Internet social platforms, such as Facebook, Google+, and Twitter have shaped the expectations of today's users, creating an onslaught of demand for pervasive social integration within both consumer and enterprise applications," said Matthew B. Franklin, Vice President of Apache Rave and Lead Software Engineer at The MITRE Corporation. "Developers today are constantly faced with the need to deliver low-cost, scalable, modularized applications with deep-rooted social capabilities. Apache Rave is the first open source project chartered to deliver a lightweight, flexible, widget-based platform to meet these demands."

Apache Rave bundles the efforts of several independent Open Source initiatives that address similar functionality and requirements into a single, enterprise-grade platform that easily scales across federated application integrations, social intranets, and multi-channel social communities with enhanced personalization and customized content delivery. Developed on open standards, Apache Rave is collaboratively supported by individuals from a wide range of corporations, non-commercial organizations, and institutes from around the world. Seeded by code donations from The MITRE Corporation, Indiana University Pervasive Technology Institute, SURFnet, OSS Watch, Hippo, and numerous individual developers, interest in Apache Rave continues, and the Project welcomes new participants to its growing community.

"Apache Rave takes the good bits from traditional Portals, leaves out whatever made them so heavy-weight, and adds modern web technologies like OpenSocial, Widgets, Social Networking, Mobile delivery, and Content Services. Rave has already proven to be a platform not to be underestimated. Hippo is proud to be an initiator and participant of this project, and plans to make Rave an integral part of its context-aware content delivery platform," said Ate Douma, Apache Rave incubating Champion and Chief Architect for open-source CMS vendor Hippo.

"The Apache Rave project delivers a perfect platform for our personalized University Portal. Together with SURFnet we hope to develop, integrate, and use all possible OpenSocial aspects to benefit our Academic community to the fullest," said Sander Liemberg, Project Manager at the University of Groningen, The Netherlands.

"As participants in Apache Rave, we are very interested in applying its capabilities to managing scientific collaborations and access to computing resources.
Rave is also interesting because of its capacity to be extended by developers: Rave provides a packaged, out-of-the-box experience, but we are also trying to ensure it can serve as a starting point for developers who wish to extend its capabilities," said Marlon Pierce, Science Gateway Group Lead at Indiana University and Apache Rave PMC Member. "In particular, we at Indiana University are taking on the specialized requirements of the National Science Foundation XSEDE Science Gateway program."

Since entering the Apache Incubator in March 2011, the Apache Rave project has successfully produced several code releases in preparation for its first production-ready, v1.0 release. In addition, Apache Rave recently received an honorable mention in the 2011 Open Source Rookies of the Year awards.

"We are pleased to have been a founding member of the Apache Rave community and are excited for the future of the project. Rave will be a cornerstone capability for our internal and external users, and we look forward to the continued collaboration and co-development with the community," said Joel Jacobs, Chief Information Officer, The MITRE Corporation.

Availability and Oversight
Apache Rave software is released under the Apache License v2.0, and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. Apache Rave source code, documentation, mailing lists, and related resources are available at http://rave.apache.org/.

About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees nearly one hundred fifty leading Open Source projects, including Apache HTTP Server -- the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 350 individual Members and 3,000 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) not-for-profit charity, funded by individual donations and corporate sponsors including AMD, Basis Technology, Cloudera, Facebook, Google, IBM, HP, Hortonworks, Matt Mullenweg, Microsoft, PSW Group, SpringSource/VMware, and Yahoo!. For more information, visit http://www.apache.org/.

"Apache", "Apache Rave", and "ApacheCon" are trademarks of The Apache Software Foundation. All other brands and trademarks are the property of their respective owners.

# # #
Posted about 12 years ago by sebawagner
In addition to the recently shown Confluence plugin, there will also be a plugin to integrate OpenMeetings conference rooms into Atlassian Jira. A Jira test instance is available at http://85.214.108.167:8080/. Well done, Eugen! The demo video is available at: https://vimeo.com/37419990
Posted about 12 years ago by Sally
The first Apache BarCamp Washington, D.C. will be held on May 19, 2012. If you will be in or around the Washington, D.C. area at that time, do sign up and join us! As with all Apache BarCamps, there will be a mix of Apache and non-Apache talks given based on who comes and the topics that interest them - with an emphasis on sharing knowledge and having a fun time. For more details and/or to sign up, please take a look at our event site [1]. If you have specific questions, feel free to email the planning group <[email protected]>.

We are also looking for a few more sponsors, so if the D.C. market is interesting to your company and you'd like to find out some sponsoring details, contact us at <[email protected]>. Otherwise, if you can't join us in the Washington, D.C. area but you're interested in helping run an Apache BarCamp or Hackathon in your home town, find out more about getting involved in small events at <[email protected]>.

Thanks!
--the Apache Conference Committee

[1] http://events.apache.org/event/2012/barcamp-dc/
Posted about 12 years ago by uli
The Apache Software Foundation will be participating in the Google Summer of Code again this year as a mentoring organization. Google Summer of Code is a program where students work on open source projects, backed by a stipend granted by Google. The Apache Software Foundation has been participating in the program since its inception in 2005. Each year, 30-40 students are guided by volunteer mentors from various Apache communities. During the summer they learn what it means to participate in a diverse open source community and develop open source software "The Apache Way". Many past students are now active contributors to our projects.

This year we hope to build on our previous successes and again build student interest and enthusiasm in The Apache Software Foundation. Our list of project ideas (at http://s.apache.org/gsoc2012tasks) already contains over 100 ideas spanning more than 25 Apache projects. But that's only what we have come up with. A lot of students have their very own itch to scratch and approach our projects with their own ideas.

If you are enrolled as a university student and always wanted to get involved with Apache, here's your chance. Browse our ideas list, approach the projects you are most interested in, discuss your ideas, get involved, code the summer away, and at the end, get a nice paycheck from Google!
Posted about 12 years ago by sebawagner
Eugen from the Apache OpenMeetings team shows his first version of the Confluence plugin. It enables the integration of Apache OpenMeetings web-conferencing rooms into any Confluence wiki, and lets you manage the conference rooms from inside Confluence. The demo video is available at: https://vimeo.com/38767730