
News

Posted over 13 years ago by Christopher Currens
We finally got it out the door; it took a lot longer than we expected. However, we have a ton of bug fixes rolled into this release as well as a number of new features. Some of the bug fixes include concurrency issues, Mono compilation issues, and memory leaks. A lot of work has been done to clean up the code base, refactoring the code and project files and providing build scripts. A couple of new features: Search.Regex, Simple Faceted Search, and simple phrase analysis in the Fast Vector Highlighter. Download it now from our downloads page. Just around the corner is a 2.9.4g release (early January) that has been substantially refactored and uses generics across the board.
Posted over 13 years ago by arvind
Over 30 people attended the inaugural Sqoop Meetup on the eve of Hadoop World in NYC. Faces were put to names, troubleshooting tips were swapped, and stories were topped, with the table-to-end-all-tables weighing in at 28 billion rows.

I started off the scheduled talks by discussing "Habits of Effective Sqoop Users." One tip for making your next debugging session more effective: provide more information up front on the mailing list, such as the versions used, and run with the --verbose flag enabled. I also pointed out workarounds to common MySQL and Oracle errors.

Next up was Eric Hernandez's "Sqooping 50 Million Rows a Day from MySQL," where he displayed battle scars from creating a single data source for analysts to mine. Key lessons learned were: (1) develop an incremental import when sqooping large, active tables; (2) limit the number of parts the data is stored in on HDFS; and (3) compress data in HDFS.

The final talk of the night was given by Joey Echeverria on "Scratching Your Own Itch." Joey methodically stepped future Sqoop committers through the process: finding a Sqoop bug, filing a JIRA, coding a patch, submitting it for review, revising accordingly, and finally earning the "ship it" +1 approval.

With the conclusion of the scheduled talks, the hallway talks commenced and went well into the night. Sqoop committer Aaron Kimball was even rumored to have shed a tear over the healthy turnout and the impending momentum barreling towards the next Sqoop Meetup on the Left Coast. See you there!

Guest post by Kate Ting. Photos from Masatake Iwasaki and Kate Ting.
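To make those tips concrete, here is a minimal sketch of an incremental, compressed import driven from Java. It is illustrative only: it assumes the Sqoop 1.x command-line options (--incremental, --check-column, --last-value, --compress, --verbose) and the Sqoop.runTool entry point (which may live under com.cloudera.sqoop in older releases), and the connection URL, table, and column names are invented.

    import org.apache.sqoop.Sqoop;

    public class IncrementalOrdersImport {
        public static void main(String[] args) {
            // Hypothetical MySQL source and check column; adjust for your schema.
            String[] sqoopArgs = {
                "import",
                "--connect", "jdbc:mysql://dbhost/sales",
                "--table", "orders",
                "--incremental", "append",   // only pull rows added since the last run
                "--check-column", "id",
                "--last-value", "123456",    // highest id imported so far
                "--num-mappers", "4",        // bound the number of output parts in HDFS
                "--compress",                // store the imported data compressed
                "--verbose"                  // extra logging, handy when asking for help
            };
            int exitCode = Sqoop.runTool(sqoopArgs);
            System.exit(exitCode);
        }
    }

The same options can of course be passed to the sqoop command-line client directly; the point is that each run only moves the new rows and keeps the HDFS output bounded and compressed.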
Posted over 13 years ago by orcmid
The OpenOffice.org Community Forums have been successfully migrated to operation under the Apache OpenOffice.org podling. Forum operation, location, and resources are intact. For users and the community that has grown the Forums into a valuable resource, it seems nothing changed. It wasn't so simple. Here's what it took and what was gained.

Community Forums on the move

Cut-over of the Community Forums completed on Friday morning, October 28. There were few disruptions during Internet propagation of the new hosting-site location. The migrated site is now accessed by the original web addresses. A staging server holding the necessary software was tested using backups of the data from the Oracle-hosted Forum services. Staging preparations started in July. It was the first-ever introduction of a Forum system at Apache.

The last backup of the "live Forums" happened on October 27. The Forums backup was restored to the Apache staging system. The new "live Forums" stepped in, just like the old Forums. The transplant succeeded.

Adjustments will continue. There will be alignment with remaining migrations of OpenOffice.org web properties, and further integration into the Apache OpenOffice.org podling operation. Throughout the remodeling, the Forums will be alive and well.

Community Forums legacy

The OpenOffice.org Community Forums originally went live on November 28, 2007. By September 20, 2011, the English-language Forums had accumulated 200,000 posts, contributed by 45,000 Forum registrants, on 40,000 topics (threads). At any point in time there appear to be 10-20 times as many unregistered users browsing the Forums as registered users. The thrust is having a setting where users with questions find users with answers. Experienced users also provide guidance to where the questions have already been asked and either answered or are under discussion. The Forums are a customization of phpBB, a prevalent implementation of Internet forums. The Spanish and French forums are next in size and activity, with most other forums of intermediate size. The entire Forum base is preserved on-line, and Forum content is indexed by the major web search services.

Always open, browsing welcome

Visiting any of the Forum entry pages and exploring any topic of interest reveals characteristic Forum features. It is easy to see what the variety of topics and degree of activity has been in each subject area. Threads are organized and presented so that recent, active topics are located quickly; other viewing options, including of one's own posts, are selected with a single click. There is integrated search for any topic and content. Images and code samples can be included in posts, and all posts can be quoted, cross-referenced, and reached via web locations. The Forums provide links to extended topics on the Community Wiki, another migrated service.

There are tutorials on all components of the OpenOffice.org suite. Special topics include the programmability features of OpenOffice.org, including writing macros and using and creating extensions. The Forums embrace all of the descendants of the original StarOffice/OpenOffice.org that have become siblings in the OpenOffice.org galaxy. Tips and solutions in the use of one release are often useful to users of a peer product having the same feature.

Supporting a global community

The Forums were originated by a group of independent volunteers. The entire content of the Forums is created and curated by individual users and volunteers. With migration, the volunteer structure is supplemented by arrangements for oversight as required by policies concerning properties in ASF custodianship. Day-to-day operations and volunteer activities are unchanged. User peer-support grows by inviting frequent contributors to serve as volunteers. Volunteers review Forum activity, point out where moderation is required, and participate in privacy-sensitive discussions about Forum operation. More-experienced volunteer Moderators intervene where appropriate to provide special assistance or to curate threads and subscriptions.

The OpenOffice.org Community Forums are one way that the Web connects users of OpenOffice.org-related products. There are additional communities across the Internet with similar concerns as well as different specialties. These can employ mailing lists, Internet news groups, and other web-based forums. The Web and search engines bring the different resources of these communities into the reach of each other and of users everywhere. The OpenOffice.org Community Forums continue as a substantial resource of that extended community.

Moving complex web properties

The OpenOffice.org web site is a complex structure of services, web pages, and downloadable content. The openoffice.org Internet domain lease is moving as part of the grant from Oracle Corporation to the Apache Software Foundation (ASF). Migrating the various properties that constitute the web site is complicated; considerable effort is required to make the migration appear effortless and smooth.

Some services housed under the OpenOffice.org web locations are rather independent. Apparent integration as an OpenOffice.org web location is accomplished by splicing the service into an openoffice.org sub-domain. That is the case with http://user.services.openoffice.org/ and its ten native-language Community Forums. The English-language Forum location, http://user.services.openoffice.org/en/forum/, illustrates the pattern for individual languages. There is also a consistent appearance and other features that blend the Forums into the overall OpenOffice.org site. Maintaining this structure is important so that users can find materials where they recall them, including in bookmarks and links from other materials (including other forum posts). Search services that have already indexed the forum pages will continue to refer seekers to those same, still-correct locations.

Developed in Forum discussion collaboration among acknack, FJCC, floris v, Hagar Delest, kingfisher, mriisv, MrProgrammer, orcmid, RGB, RoryOF, and vasa1 on behalf of the Community Forum Volunteers, with additional ooo-dev suggestions by Donald Whytock and Dave Fisher.
Posted over 13 years ago by Sally
The Apache Software Foundation Announces Apache Geronimo v3.0-beta-1 – Leading Open Source Application Server Now Certified Java EE 6 Full- and Web-Profile Compatible

Flexible, modular, and easy to manage, Apache Geronimo is the ideal platform for everything from lightweight server deployments to full-scale enterprise environments, with complete support for Java EE 6 and OSGi programming models.

16 November 2011 --FOREST HILL, MD-- The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of nearly 150 Open Source projects and initiatives, today announced that Apache Geronimo has obtained certification as a compatible implementation of both the Java EE 6 Full and Web Profiles. Apache Geronimo v3.0-beta-1 joins the Java EE 6 Reference Implementation as the only Open Source application server to be compatible with both the Full and Web Profiles.

"We're very happy to announce this significant milestone for the project," said Kevan Miller, Vice President of Apache Geronimo. "In addition to the Java EE 6 capabilities we've added to the product, Geronimo is now restructured to run on an OSGi kernel. Plus, we've added support for an enterprise OSGi application programming model -- a key enhancement for enterprise application developers wishing to take advantage of the modularity, dynamism, and versioning capabilities offered by OSGi."

Apache Geronimo integrates a number of ASF projects into an easy-to-manage, flexible, and modular application server. Java EE technologies utilized by Apache Geronimo include Apache Tomcat, Apache OpenJPA, Apache OpenEJB, Apache MyFaces, Apache OpenWebBeans, Apache ActiveMQ, Apache Axis, Apache Wink, and Apache Bean Validation. OSGi technologies contained within Apache Geronimo include Apache Aries, Apache Felix, and Apache Karaf. This wide array of Apache projects illustrates the breadth and depth of the software solutions developed at the Apache Software Foundation.

"Our move to OSGi has represented a significant amount of internal restructuring, but this restructuring leaves us well positioned for future developments," explained Miller. "The Apache Aries, Apache Karaf, and Apache Felix projects have provided us a great base for our Geronimo 3.0 OSGi enhancements. The same is true for the Java EE technologies developed at the ASF: we couldn't have accomplished this without them."

Availability and Oversight

As with all Apache products, Apache Geronimo v3.0-beta-1 is released under the Apache License v2.0, and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. Apache Geronimo source code, documentation, and related resources are available at http://geronimo.apache.org/.

About The Apache Software Foundation (ASF)

Established in 1999, the all-volunteer Foundation oversees nearly one hundred fifty leading Open Source projects, including Apache HTTP Server — the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 350 individual Members and 3,000 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) not-for-profit charity, funded by individual donations and corporate sponsors including AMD, Basis Technology, Cloudera, Facebook, Google, IBM, HP, Matt Mullenweg, Microsoft, PSW Group, SpringSource/VMware, and Yahoo!. For more information, visit http://www.apache.org/.

"Apache" and "Apache Geronimo" are trademarks of The Apache Software Foundation. All other brands and trademarks are the property of their respective owners.

# # #

Media Contact:
Sally Khudairi
The Apache Software Foundation
+1 617 921 8656
[email protected]
Posted over 13 years ago by rhirsch
Recently, we've had a few questions about the concept of pools in ESME. I spent some time reading old threads from our mailing lists to collect the motivations behind our design decisions. This post is a collection of tidbits from those mail threads.

First of all, "pools" are not interchangeable with "groups"; they mean different things. A pool is about the messages. A group is about the people. Groups are personal things where I assign different people into different groups, and the meaning of a group is individual to me; it's all about my view of the world. This keeps to the "opt in" mechanism that we absolutely must preserve in ESME. If we do this type of group in the future, that's way cool, but once again, it's a personal thing that has nothing to do with access control or "sending". Using the term "group" might lead people to think they are sending a message to a group of people, whereas they will actually be making it *available* to a group of people, should anyone in that group choose to look in the pool.

Pools are collections of messages that can only be read by people who have been granted access to that pool. A person who has access to a pool is able to see messages put into that pool that otherwise meet the person's criteria (who they are following, what their filter rules are). There is no "send to a pool" concept. It's "place a message in a pool", and all messages are placed in one and only one pool; by default, that pool is the server-local public pool. ESME is opt-in. A user has a relationship with a pool. That relationship is read, read-write, or administer (which implies read-write).

So, how do you get a message into a pool? You define your default pool. This is the pool that your messages get put into unless you specify otherwise. This means that the CEO can choose to put things in the "c-level" pool. Most people will post to the public pool by default. If a pool is deleted, the messages in the users' timelines stay, but it is as if all the users were deleted from the pool. A message may only be in one pool. There is no way for a message to escape the pool (e.g., resend cannot change the pool), and any replies are in the pool of the original message (this is for performance and security purposes).

We are using "groups" and "pools" to mean something different from what people are used to. ESME is a different medium than people are used to. That gives ESME its power. ESME is powerful because it is a dynamic, opt-in, social medium rather than a point-to-point communications medium. There are different concepts in ESME than in point-to-point mediums. Let's do the extra work now to make sure we understand those differences, celebrate those differences, and get others excited about those differences so that ESME can thrive for what it is... a social tool for social animals.
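A rough way to picture the model described above is the following Java sketch. It is illustrative only: ESME itself is written in Scala, and these class, field, and method names are invented for the example rather than taken from the ESME code base.

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;

    // Illustrative model of pool semantics: every message lives in exactly one
    // pool, and visibility is governed by a user's relationship to that pool.
    enum PoolRole { READ, READ_WRITE, ADMIN }   // ADMIN implies READ_WRITE

    final class Message {
        final String author;
        final String text;
        final String pool;   // one and only one pool per message

        Message(String author, String text, String pool) {
            this.author = author;
            this.text = text;
            this.pool = pool;
        }

        Message reply(String replyAuthor, String replyText) {
            // Replies always stay in the pool of the original message.
            return new Message(replyAuthor, replyText, this.pool);
        }
    }

    final class PoolDirectory {
        static final String PUBLIC_POOL = "public";   // server-local public pool

        // pool name -> (user -> role); absence means no access
        private final Map<String, Map<String, PoolRole>> access = new HashMap<>();

        void grant(String pool, String user, PoolRole role) {
            access.computeIfAbsent(pool, p -> new HashMap<>()).put(user, role);
        }

        boolean canRead(String user, Message m) {
            if (PUBLIC_POOL.equals(m.pool)) {
                return true;   // everyone can read the public pool
            }
            Map<String, PoolRole> members =
                    access.getOrDefault(m.pool, Collections.emptyMap());
            return members.containsKey(user);   // any role grants read access
        }
    }

The two properties the sketch tries to capture are that a reply can never escape the pool of its original message, and that visibility is decided per user per pool rather than by any notion of "sending to a group".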
Posted over 13 years ago by Sally
Standards-based Content and Metadata Detection and Analysis Toolkit Powers Large-scale, Multi-lingual, Multi-format Repositories at Adobe, the Internet Archive, NASA Jet Propulsion Laboratory, and more.

9 November 2011 —FOREST HILL, MD— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of nearly 150 Open Source projects and initiatives, today announced Apache Tika v1.0, an embeddable, lightweight toolkit for content detection and analysis.

"The Apache Tika v1.0 release is five years in the making, providing numerous improvements and new parsing formats," said Chris Mattmann, Apache Tika Vice President, Senior Computer Scientist at NASA Jet Propulsion Laboratory, and University of Southern California Adjunct Assistant Professor of Computer Science. "From a toolkit perspective, it's easy to integrate, and provides maximum functionality with little configuration."

With the increasing amount of information available on the Internet today, automatic information processing and retrieval is urgently needed to understand content across cultures, languages, and continents. Apache Tika is a one-stop shop for identifying, retrieving, and parsing text and metadata from over 1,200 file formats, including HTML, XML, Microsoft Office, OpenOffice/OpenDocument, PDF, images, ebooks/EPUB, Rich Text, compression and packaging formats, text/audio/image/video, Java class files and archives, email/mbox, and more. Tika entered the Apache Incubator in 2007, became a sub-project of Apache Lucene in 2008, and graduated as an ASF Top-level Project (TLP) in April 2010. Apache Tika has been tested extensively in repositories exceeding 500 million documents across a variety of applications in industry, academia, and government labs.

"At NASA, we leverage Apache Tika on several of our Earth science data system projects," explained Dan Crichton, Program Manager and Principal Computer Scientist, NASA Jet Propulsion Laboratory. "Tika helps us process hundreds of terabytes of scientific data in myriad formats and their associated metadata models. Using Tika with other Apache technologies such as OODT, Lucene, and Solr, we are able to automate, virtualize, and increase the efficiency of NASA's science data processing pipeline."

Users and software applications use Apache Tika to explore the information landscape through flexible interfaces in Java, from the command line, through RESTful Web services, and by consuming its functionality directly from a multitude of programming languages, including Python, .NET, and C++. Tika defines a standard application programming interface (API) and makes use of existing parser libraries such as Apache POI and Apache PDFBox to detect and extract metadata and structured text content from various documents.

"We've used Apache Tika extensively for a wide range of content extraction tasks, including parsing almost 600 million pages and documents from a large web crawl," said Ken Krugler, Founder and President of Scale Unlimited. "It's proven invaluable as a simple yet robust solution to the challenges of extracting text and metadata from the jungle of formats you find on the web."

"Hippo CMS 7 uses Apache Jackrabbit to index content repositories containing as many as 500,000 documents," explained Arjé Cahn, CTO of Hippo. "We are exploring ways that Apache Tika can enhance access to metadata in our faceted navigation feature, which may result in a possible future patch."
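To give a flavor of the toolkit-style API described above, here is a minimal sketch using Tika's Java facade; the file name is invented and error handling is trimmed for brevity.

    import java.io.File;

    import org.apache.tika.Tika;

    public class TikaQuickStart {
        public static void main(String[] args) throws Exception {
            Tika tika = new Tika();
            File doc = new File("report.pdf");   // any of the supported formats

            // Detect the media type from the file name and its contents.
            String mediaType = tika.detect(doc);

            // Extract plain text, regardless of the underlying format.
            String text = tika.parseToString(doc);

            System.out.println(mediaType);
            System.out.println(text.substring(0, Math.min(200, text.length())));
        }
    }

The same detection and parsing is also reachable from the tika-app command-line jar and from other languages over its network interfaces, as the announcement notes.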
Availability and Oversight

As with all Apache products, Apache Tika software is released under the Apache License v2.0, and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. Apache Tika source code, documentation, and related resources are available at http://tika.apache.org/.

Apache Tika in Action!

Apache Tika v1.0 will be featured in ApacheCon's Content Technologies track on 10 November 2011. PMC Chair Mattmann will describe the modern genesis of the project and its ecosystem, as well as the newly launched Manning Publications book, "Tika in Action", co-authored by Mattmann and Zitting.

About The Apache Software Foundation (ASF)

Established in 1999, the all-volunteer Foundation oversees nearly one hundred fifty leading Open Source projects, including Apache HTTP Server — the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 350 individual Members and 3,000 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) not-for-profit charity, funded by individual donations and corporate sponsors including AMD, Basis Technology, Cloudera, Facebook, Google, IBM, HP, Matt Mullenweg, Microsoft, PSW Group, SpringSource/VMware, and Yahoo!. For more information, visit http://www.apache.org/.

"Apache", "Apache Tika", and "ApacheCon" are trademarks of The Apache Software Foundation. All other brands and trademarks are the property of their respective owners.

# # #
Posted over 13 years ago by dblevins
This week we unveiled a new website driven by the Apache CMS! Last year at ApacheCon 2010, the Infrastructure team announced they had developed a new CMS using plain old Markdown and backed by SVN. This new system is driven entirely by commits and generates and publishes content instantly. Finally, it is easier to just write the documentation and immediately publish it than to draft up a big long email and create a TODO to someday log into Confluence and paste in the content. When you live, eat, and breathe on the command line and in your IDE, being able to edit your documentation there is a dream.

A major advantage of this new system is the ability to freely mix docs and code in all sorts of creative ways, never having to wait for publishing delays to deliver answers to users in the form of fresh documentation, and the simplicity of plain old text editing in any way you might want to do it. So far we've generated content using Perl, Java, Bash, and a heavy amount of just plain editing in Emacs or IntelliJ. It's been quite nice. You hardly need any "plugins" when you have direct access to the documentation source on a plain old file system.

We're rather excited about some of the new content. Some items of note: the Documentation Index, Configuration Settings, the Examples Index, the Simple MDB example, the @AccessTimeout example, and the @AccessTimeout with Meta-Annotations example.

While the site overall looks great, there is still some content that is badly formatted. If you find any such content, please point it out and we'll fix it, or better yet, send a patch!
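For a taste of the examples mentioned above, here is a minimal @AccessTimeout sketch of the kind those pages cover. It is illustrative only and not copied from the site; it relies on the standard javax.ejb annotations, and the bean and method names are invented.

    import java.util.concurrent.TimeUnit;

    import javax.ejb.AccessTimeout;
    import javax.ejb.Lock;
    import javax.ejb.LockType;
    import javax.ejb.Singleton;

    // Callers waiting for the write lock give up after 30 seconds instead of
    // blocking indefinitely; the container raises a concurrent-access timeout
    // exception when the limit is exceeded.
    @Singleton
    @Lock(LockType.WRITE)
    @AccessTimeout(value = 30, unit = TimeUnit.SECONDS)
    public class CounterBean {

        private long count;

        public long increment() {
            return ++count;
        }

        @Lock(LockType.READ)                               // reads may run concurrently
        @AccessTimeout(value = 5, unit = TimeUnit.SECONDS) // but wait at most 5 seconds
        public long current() {
            return count;
        }
    }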
Posted over 13 years ago by dblevins
Our own Jonathan Gallimore presented "Apache TomEE – Java EE Web Profile on Tomcat" at JAX London last week. It was a 50-minute presentation with a mix of slides and demos, met by a very enthusiastic band of Tomcat lovers. Slides can be found here. Also, check out some photos of Jon in action! You'd probably never guess it's only his second time presenting and his first time presenting solo! He makes us quite proud, indeed.

We'd like to give a special thanks to JAX London for their wonderful support of Apache TomEE. TomEE debuted at JAX London Spring 2011. At that point we had just started to heavily pursue certification. We were honored to be able to come back to our friends at JAX London in the fall and say, "we made it!" Hats off to JAX for being the kind of conference that seeks out and supports growing projects like Apache TomEE. Aside from ApacheCon, JAX is Apache TomEE's second home.

See also this interview with David Blevins on JAXEnter, "Be Small, Be Certified, Be Tomcat."
Posted over 13 years ago by dblevins
It was a big year for Apache TomEE and OpenEJB at JavaOne. Many thanks to everyone on all sides who helped get us there, all the wonderful people who attended, and of course everyone in the community that makes this project tick.

First of all, we were very excited and honored to announce Apache TomEE as Java EE 6 Web Profile certified. The announcement went out on the Tuesday of that week and set the stage for some very exciting presentations and panels throughout the week.

We had three presentations in total: "EJB with Meta Annotations", "Fun with EJB 3.1 and OpenEJB", and "Apache TomEE Java EE 6 Web Profile". We also participated in three panels: "Meet the Experts: EJB 3.2 Expert Group", "The Road to Java EE 7: Is It All About the Cloud?", and "CDI Today and Tomorrow".

The Apache TomEE talk was quite full with 134 attendees, only 5 fewer than the "CDI Today and Tomorrow" panel. All in all a very full week and one that will not soon be forgotten!
Posted almost 14 years ago by dpharbison
The Apache OpenOffice.org project is currently in the incubation phase. We're a 'podling'. It's where all new Apache projects begin, regardless of how mature the source code base is. In this post I'll attempt to explain a bit about incubation, a bit about the 'Apache Way', and our current effort to meet the requirements for third-party code review and clearance. In future posts, I'll attempt to tackle other aspects of the project. If we all have a better understanding of how the work is becoming organized, those of you interested in volunteering will have a better idea of where to start, and those interested in following our progress will have an easier way to check up on things.

First off, a podling is not from 'Invasion of the Body Snatchers' – a human being wrapped up to look like a large vegetable – or one of the furry cute puppets from the Dark Crystal cave of Jim Henson's imagination. It's the term we use here at Apache to describe the first phase of a prospective project; a podling is a project that is 'incubating'. Egg, podling, new thing with promise needing special care and attention. I think you get the idea.

It's that special care and attention part that is consuming the efforts of the PPMC, or "Podling Project Management Committee", at the moment. If we are going to hatch – 'graduate' to a TLP or "Top Level Project" in Apache-speak – we are required to meet certain criteria evolved out of deep experience accumulated through Apache's 12-year history and its involvement with many other successful projects. Apache defines a podling as "a codebase and its community while in the process of being incubated." You can find the details of the complete Apache Incubation Policy here.

OK, so we have the code base, thanks to Oracle's decision, and we have a community signed on to the project already: 75 committers and growing. So where are we with the process? When do podlings hatch and become Apache TLPs, or Top Level Projects? The abbreviated answer requires the podling to: deliver an official Apache release; demonstrate it has successfully created an open and diverse community; and follow the 'Apache Way' through the process, documenting status, conducting ballots, maintaining a fully open and transparent process, and so on.

OpenOffice is a very large chunk of code, many millions of lines. The PPMC has now successfully migrated all the source files into the Apache infrastructure, nestled into their new nest within the Apache Subversion repository environment. We've run a build test on Linux, and we know we've got the code we need to begin to build a release. But wait: before we can meet the requirement of producing an official release, Apache requires that we conduct a thorough IP, or Intellectual Property, review and clearance process, so that the resulting Apache release may be licensed under the Apache License 2.0. It requires that all "incoming code is fully signed off before any release. This simply reinforces the Apache requirements: all code must have appropriate licenses.... The process of preparing an Apache release should include an audit of the code to ensure that all files have appropriate headers and that all dependencies comply with Apache policy." This means that the resulting Apache OpenOffice release(s) will provide the maximum opportunity for the development of a broader spectrum of OpenOffice derivatives than we see today.
The OpenOffice of the past will look very different in the future as more developers become familiar with the code and see new opportunities not previously available.

Right now, our immediate task is to resolve the licensing incompatibilities for third-party code modules used by OpenOffice. Since Oracle did not possess the copyright for these modules, they were not included in the original Oracle Software Grant Agreement, and therefore we are working either to deprecate each module or to find a replacement that may be used, whether as a binary file or as an alternative source file that fills the function needed. We're confident that the process will be concluded in the next few weeks, but it is detail-oriented work and must be done thoroughly and correctly in order to clear the path for an official podling release of Apache OpenOffice.

Before we can produce an Apache release, we must complete the code clearance step, ensuring that appropriate license headers and the License and Notice files are in place for all artifacts in the build, to the satisfaction of the PPMC and the Incubator PMC, which governs the Apache OpenOffice podling. This will clear the way forward to develop a realistic target date for issuing our first 'Apache OpenOffice.org' release.

In future posts, I'll sketch out how the project is being organized, mapping out the areas that offer interesting and exciting opportunities for new volunteers to step up and take on.

- Don Harbison, PPMC Member, Apache OpenOffice.org
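For reference, the header audit mentioned above checks that each source file carries the standard ASF license header, which (shown here as a Java comment) commonly reads as follows:

    /*
     * Licensed to the Apache Software Foundation (ASF) under one or more
     * contributor license agreements.  See the NOTICE file distributed with
     * this work for additional information regarding copyright ownership.
     * The ASF licenses this file to You under the Apache License, Version 2.0
     * (the "License"); you may not use this file except in compliance with
     * the License.  You may obtain a copy of the License at
     *
     *     http://www.apache.org/licenses/LICENSE-2.0
     *
     * Unless required by applicable law or agreed to in writing, software
     * distributed under the License is distributed on an "AS IS" BASIS,
     * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
     * See the License for the specific language governing permissions and
     * limitations under the License.
     */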