Posted over 13 years ago by Christopher Currens
We finally got it out the door; it took a lot longer than we expected. However, we have a ton of bug fixes rolled into this release, as well as a number of new features.
Some of the bug fixes include concurrency issues, Mono compilation issues, and memory leaks.
A lot of work has been done to clean up the code base, refactoring the code and project files and providing build scripts.
A couple of new features: Search.Regex, Simple Faceted Search, and simple phrase analysis in the Fast Vector Highlighter.
Download it now from our downloads page.
Just around the corner is a 2.9.4g release (early January) that has been substantially refactored and uses generics across the board.
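Faceted search, one of the new features above, means grouping the documents that match a query by a field value and reporting a count per value. A minimal, library-free sketch of the idea (illustrative names only, not the Lucene.Net API):

```python
from collections import Counter

def facet_counts(docs, matches, field):
    """Count the values of `field` across documents that match the query."""
    return Counter(d[field] for d in docs if matches(d) and field in d)

books = [
    {"title": "Lucene in Action", "format": "paper"},
    {"title": "Tika in Action", "format": "ebook"},
    {"title": "Hadoop Guide", "format": "ebook"},
]

# Facet the documents whose title contains "Action" by their format field.
counts = facet_counts(books, lambda d: "Action" in d["title"], "format")
```

A real faceting implementation caches the field values per document so the counting step stays fast across millions of documents, but the contract is the same: a query plus a field in, a value-to-count map out.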
|
Posted over 13 years ago by arvind
Over 30 people attended the inaugural Sqoop Meetup on the eve of Hadoop World in NYC. Faces were put to names, troubleshooting tips were swapped, and stories were topped - with the table-to-end-all-tables weighing in at 28 billion rows.
I started off the scheduled talks by discussing "Habits of Effective Sqoop Users." One tip for making your next debugging session more effective: provide more information up front on the mailing list, such as the versions used, and run with the --verbose flag enabled. I also pointed out workarounds to common MySQL and Oracle errors.
Next up was Eric Hernandez's "Sqooping 50 Million Rows a Day from MySQL," where he displayed battle scars from creating a single data source for analysts to mine. Key lessons learned: (1) develop an incremental import when sqooping large, active tables; (2) limit the number of part files the data is stored in on HDFS; (3) compress data in HDFS.
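The incremental-import lesson boils down to remembering a high-water mark for a check column and importing only rows above it on each run. A toy sketch of that bookkeeping (plain Python for illustration, not the Sqoop CLI; the names are hypothetical):

```python
def incremental_import(rows, check_column, last_value):
    """Return the rows newer than last_value plus the new high-water mark."""
    new_rows = [r for r in rows if r[check_column] > last_value]
    high = max((r[check_column] for r in new_rows), default=last_value)
    return new_rows, high

table = [{"id": 1}, {"id": 2}, {"id": 3}]

# First run starting from mark 1 picks up ids 2 and 3; the repeat run
# with the updated mark imports nothing, so the big table is read once.
batch, mark = incremental_import(table, "id", 1)
batch2, mark2 = incremental_import(table, "id", mark)
```

Sqoop stores the equivalent of `mark` between runs so each job only touches rows added since the last import, which is what makes daily loads from a large, active table tractable.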
The final talk of the night was given by Joey Echeverria on "Scratching Your Own Itch." Joey methodically stepped future Sqoop committers through the process: finding a Sqoop bug, filing a JIRA, coding a patch, submitting it for review, revising accordingly, and finally receiving "ship it" +1 approval.
With the conclusion of the scheduled talks, the hallway talks commenced and went well into the night. Sqoop Committer Aaron Kimball was even rumored to have shed a tear over the healthy turnout and impending momentum barreling towards the next Sqoop Meetup on the Left Coast. See you there!
Guest post by Kate Ting. Photos from Masatake Iwasaki and Kate Ting.
|
Posted over 13 years ago by orcmid
The OpenOffice.org Community Forums have been successfully migrated to operation under the Apache OpenOffice.org podling. Forum operation, location, and resources are intact. For users and the community that has grown the Forums into a valuable resource, it seems nothing has changed. It wasn’t so simple. Here’s what it took and what was gained.
Community Forums on the move
Cut-over of the Community Forums completed on the morning of Friday, October 28. There were few disruptions while the new hosting-site location propagated across the Internet, and the migrated site is now reached at the original web addresses. Staging preparations started in July: a staging server holding the necessary software was tested using backups of the data from the Oracle-hosted Forum services, the first-ever introduction of a Forum system at Apache. The last backup of the “live Forums” was taken on October 27 and restored to the Apache staging system. The new “live Forums” stepped in, just like the old ones. The transplant succeeded.
Adjustments will continue. There will be alignment with remaining migrations of OpenOffice.org web properties. There will be further integration into the Apache OpenOffice.org podling operation. Throughout remodeling, the Forums will be alive and well.
Community Forums legacy
The OpenOffice.org Community Forums originally went live on November 28, 2007. By September 20, 2011, the English-language Forums had accumulated 200,000 posts, contributed by 45,000 Forum registrants, on 40,000 topics (threads). At any point in time there appear to be 10 to 20 times as many unregistered users browsing the Forums as registered users. The thrust is having a setting where users with questions find users with answers. Experienced users also provide guidance to where questions have already been asked and either answered or are under discussion. The Forums are a customization of phpBB, a prevalent implementation of Internet forums.
The Spanish and French forums are next in size and activity, with most other forums of intermediate size. The entire Forum base is preserved on-line. Forum content is indexed by the major web search services.
Always open, browsing welcome
Visiting any of the Forum entry pages and exploring any topic of interest reveals characteristic Forum features:
It is easy to see what the variety of topics and degree of activity have been in each subject area.
Threads are organized and presented so that recent, active topics are located quickly; other viewing options, including a view of one's own posts, are selected with a single click.
There is integrated search for any topic and content.
Images and code samples can be included in posts, and all posts can be quoted, cross-referenced, and reached via web locations.
The Forums provide links to extended topics on the Community Wiki, another migrated service.
There are tutorials on all components of the OpenOffice.org suite.
Special topics include the programmability features of OpenOffice.org, including writing macros and using/creating extensions.
The Forums embrace all of the descendants of the original StarOffice/OpenOffice.org that have become siblings in the OpenOffice.org galaxy. Tips and solutions in the use of one release are often useful to users of a peer product having the same feature.
Supporting global community
The Forums were originated by a group of independent volunteers, and the entire content of the Forums is created and curated by individual users and volunteers. With migration, the volunteer structure is supplemented by arrangements for oversight, as required by policies concerning properties in ASF custodianship. Day-to-day operations and volunteer activities are unchanged.
User peer-support grows by inviting frequent contributors to serve as volunteers. Volunteers review Forum activity, point out where moderation is required, and participate in privacy-sensitive discussions about Forum operation. More-experienced volunteer Moderators intervene where appropriate to provide special assistance or curate threads and subscriptions.
The OpenOffice.org Community Forums are one way that the Web connects users of OpenOffice.org-related products. There are additional communities across the Internet with similar concerns as well as different specialties. These can employ mailing lists, Internet news groups, and other web-based forums. The Web and search engines bring the different resources of these communities into the reach of each other and users everywhere. The OpenOffice.org Community Forums are now continuing as a substantial resource of that extended community.
Moving complex web properties
The OpenOffice.org web site is a complex structure of services, web pages, and downloadable content. The openoffice.org Internet domain lease is moving as part of the grant from Oracle Corporation to the Apache Software Foundation (ASF). Migrating the various properties that constitute the web site is complicated. Considerable effort is required to have migration appear effortless and smooth.
Some services housed under the OpenOffice.org web locations are rather independent. Apparent integration as an OpenOffice.org web location is accomplished by splicing the service into an openoffice.org sub-domain. That is the case with http://user.services.openoffice.org/ and its ten native-language Community Forums. The English-language Forum location, http://user.services.openoffice.org/en/forum/, illustrates the pattern for individual languages. There is also consistent appearance and other features that blend the forums into the overall OpenOffice.org site. Maintaining this structure is important so that users can find materials where they recall them, including in bookmarks and links from other materials (including other forum posts). Search services that have already indexed the forum pages will continue to refer seekers to those same still-correct locations.
Developed in Forum discussion collaboration among acknack, FJCC, floris v, Hagar Delest, kingfisher, mriisv, MrProgrammer, orcmid, RGB, RoryOF, and vasa1 on behalf of the Community Forum Volunteers; additional ooo-dev suggestions by Donald Whytock and Dave Fisher.
|
Posted over 13 years ago by Sally
The Apache Software Foundation Announces Apache Geronimo v3.0-beta-1 – Leading Open Source Application Server Now Certified Java EE 6 Full- and Web Profile Compatible
Flexible, modular, and easy to manage, Apache Geronimo is the ideal platform for everything from lightweight server deployments to full-scale enterprise environments, with complete support for the Java EE 6 and OSGi programming models
16 November 2011 --FOREST HILL, MD-- The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of nearly 150 Open Source projects and initiatives, today announced that Apache Geronimo has obtained certification as a compatible implementation of both the Java EE 6 Full and Web Profiles. Apache Geronimo v3.0-beta-1 joins the Java EE 6 Reference Implementation as the only Open Source application server to be compatible with both Full and Web Profiles support.
"We're very happy to announce this significant milestone for the project," said Kevan Miller, Vice President of Apache Geronimo. "In addition to the Java EE 6 capabilities we've added to the product, Geronimo is now restructured to run on an OSGi kernel. Plus, we've added support for an enterprise OSGi application programming model -- a key enhancement for enterprise application developers wishing to take advantage of the modularity, dynamism, and versioning capabilities offered by OSGi".
Apache Geronimo integrates a number of ASF projects into an easy to manage, flexible, and modular application server. Java EE technologies utilized by Apache Geronimo include: Apache Tomcat, Apache OpenJPA, Apache OpenEJB, Apache MyFaces, Apache OpenWebBeans, Apache ActiveMQ, Apache Axis, Apache Wink, and Apache Bean Validation. OSGi technologies which are contained within Apache Geronimo include: Apache Aries, Apache Felix, and Apache Karaf. This wide array of Apache projects illustrates the breadth and depth of the software solutions developed at the Apache Software Foundation.
"Our move to OSGi has represented a signficant amount of internal restructuring, but this restructuring leaves us well positioned for future developments," explained Miller. "The Apache Aries, Apache Karaf, and Apache Felix projects have provided us a great base for our Geronimo 3.0 OSGi enhancments. The same is true for the Java EE technologies developed at the ASF: we couldn't have accomplished this without them".
Availability and Oversight
As with all Apache products, Apache Geronimo v3.0-beta-1 is released under the Apache License v2.0, and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project's day-to-day operations, including community development and product releases. Apache Geronimo source code, documentation, and related resources are available at http://geronimo.apache.org/.
About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees nearly one hundred fifty leading Open Source projects, including Apache HTTP Server — the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 350 individual Members and 3,000 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) not-for-profit charity, funded by individual donations and corporate sponsors including AMD, Basis Technology, Cloudera, Facebook, Google, IBM, HP, Matt Mullenweg, Microsoft, PSW Group, SpringSource/VMware, and Yahoo!. For more information, visit http://www.apache.org/.
"Apache" and "Apache Geronimo" are trademarks of The Apache Software Foundation. All other brands and trademarks are the property of their respective owners.
# # #
Media Contact:
Sally Khudairi
The Apache Software Foundation
+1 617 921 8656
[email protected] [Less]
|
Posted over 13 years ago by rhirsch
Recently, we've had a few questions about the concept of pools in ESME. I spent some time reading old threads from our mailing lists to collect the motivations behind our design decisions. This blog post is a collection of tidbits from those mail threads.
First of all, "pools" are not interchangeable with "groups". They mean different things. A pool is about the messages. A group is about the people.
Groups are personal things where I assign different people into different groups and the meaning of a group is individual to me and it's all about my view of the world. This keeps to the "opt in" mechanism that we absolutely must preserve in ESME. If we do this type of group in the future, that's way cool, but once again, it's a personal thing that has nothing to do with access control or "sending".
Using the term "group" might lead people to think they are sending a message to a group of people, whereas they will actually be making it *available* to a group of people, should anyone in that group choose to look in the pool.
Pools are collections of messages that can only be read by people who have been granted access to that pool. A person who has access to a pool is able to see messages put into that pool that otherwise meet the person's criteria (who they are following, what their filter rules are.) There is no "send to a pool" concept. It's "place a message in a pool" and all messages are placed in one and only one pool and by default, that pool is the server-local public pool. ESME is opt-in.
A user has a relationship with a pool. That relationship is read/read-write/administer (which implies read-write).
So, how do you get a message into a pool? You will define your default pool. This is the pool that your messages get put into unless you specify otherwise. This means that the CEO can choose to put things in the "c-level" pool. Most people will post to the public pool by default.
If a pool is deleted, the messages in the users’ timeline stay, but it is as if all the users were deleted from the pool.
A message may only be in one pool. There is no way for a message to escape the pool (e.g., resend cannot change the pool), and any replies are in the pool of the original message (this is for performance and security purposes).
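The rules above (read/read-write/admin access levels, one pool per message, replies pinned to the original pool) can be modeled in a few lines. This is a toy sketch for illustration only, not ESME's actual implementation, and all the names are invented:

```python
READ, READ_WRITE, ADMIN = 1, 2, 3  # ADMIN implies READ_WRITE implies READ

class Pool:
    def __init__(self, name):
        self.name = name
        self.access = {}    # user -> access level
        self.messages = []

    def grant(self, user, level):
        self.access[user] = level

    def place(self, user, text, reply_to=None):
        """Place a message; replies always stay in the original pool."""
        if self.access.get(user, 0) < READ_WRITE:
            raise PermissionError(f"{user} cannot write to {self.name}")
        msg = {"text": text, "pool": self.name, "reply_to": reply_to}
        self.messages.append(msg)
        return msg

    def visible_to(self, user):
        """Only users granted access to the pool can read its messages."""
        if self.access.get(user, 0) >= READ:
            return list(self.messages)
        return []
```

The opt-in nature falls out of `visible_to`: nothing is "sent" to anyone, and a message in a pool is simply invisible to users who were never granted access to that pool.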
We are using groups and pools to mean something different from what people are used to. ESME is a different medium from what people are used to, and that gives ESME its power. ESME is powerful because it is a dynamic, opt-in, social medium rather than a point-to-point communications medium. There are different concepts in ESME than in point-to-point mediums. Let's do the extra work now to make sure we understand those differences, celebrate them, and get others excited about them, so that ESME can thrive for what it is... a social tool for social animals.
|
Posted over 13 years ago by Sally
Standards-based Content and Metadata Detection and Analysis Toolkit Powers Large-scale, Multi-lingual, Multi-format Repositories at Adobe, the Internet Archive, NASA Jet Propulsion Laboratory, and more.
9 November 2011 —FOREST HILL, MD— The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of nearly 150 Open Source projects and initiatives, today announced Apache Tika v1.0, an embeddable, lightweight toolkit for content detection and analysis.
"The Apache Tika v1.0 release is five
years in the making, providing numerous improvements and new parsing
formats," said Chris Mattmann, Apache Tika Vice President, Senior
Computer Scientist at NASA Jet Propulsion Laboratory, and University
of Southern California Adjunct Assistant Professor of Computer
Science. "From a toolkit perspective, it's easy to integrate, and
provides maximum functionality with little configuration."
With the increasing amount of information available on the Internet today, automatic information processing and retrieval is urgently needed to understand content across cultures, languages, and continents.
Apache Tika is a one-stop shop for identifying, retrieving, and parsing text and metadata from over 1,200 file formats, including HTML, XML, Microsoft Office, OpenOffice/OpenDocument, PDF, images, ebooks/EPUB, Rich Text, compression and packaging formats, text/audio/image/video, Java class files and archives, email/mbox, and more.
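Format identification of this kind typically starts with magic bytes: the first bytes of a file often reveal its type regardless of extension. A stripped-down illustration with a handful of real signatures (Tika itself covers far more formats and layers extension and content heuristics on top of this):

```python
# A few well-known magic-byte prefixes and their MIME types.
MAGIC = [
    (b"%PDF-", "application/pdf"),
    (b"PK\x03\x04", "application/zip"),   # also OOXML, EPUB, ODF containers
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
]

def detect(data, default="application/octet-stream"):
    """Return the MIME type whose magic-byte prefix matches, else a default."""
    for prefix, mime in MAGIC:
        if data.startswith(prefix):
            return mime
    return default
```

Note the ZIP entry: several document formats are ZIP containers, which is why a real detector like Tika's must look inside the archive before deciding between a plain ZIP, an Office document, and an EPUB.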
Tika entered the Apache Incubator in 2007, became a sub-project of Apache Lucene in 2008, and graduated as an ASF Top-level Project (TLP) in April 2010. Apache Tika has been tested extensively in repositories exceeding 500 million documents across a variety of applications in industry, academia, and government labs.
"At NASA, we leverage Apache Tika
on several of our Earth science data system projects," explained
Dan Crichton, Program Manager and Principal Computer Scientist, NASA
Jet Propulsion Laboratory. "Tika helps us processes hundreds of
terabytes of scientific data in myriad formats and their associated
metadata models. Using Tika with other Apache technologies such as
OODT, Lucene, and Solr, we are able to automate, virtualize and
increase the efficiency of NASA's
science data processing pipeline."
Users and software applications use Apache Tika to explore the information landscape through flexible interfaces in Java, from the command line, through REST-ful Web services, and by consuming its functionality directly from a multitude of programming languages, including Python, .NET, and C++. Tika defines a standard application programming interface (API) and makes use of existing parser libraries such as Apache POI and PDFBox to detect and extract metadata and structured text content from various documents.
"We've used Apache Tika
extensively for a wide range of content extraction tasks, including
parsing almost 600 million pages and documents from a large web
crawl," said Ken Krugler, Founder and President of Scale
Unlimited. "It's proven invaluable as a simple yet robust
solution to the challenges of extracting text and metadata from the
jungle of formats you find on the web."
"Hippo CMS 7 uses Apache Jackrabbit
to index content repositories containing as many as 500,000
documents," explained Arjé Cahn, CTO of Hippo. "We are exploring
ways that Apache Tika can enhance access to metadata in our faceted
navigation feature, which may result in a possible future patch."
Availability and Oversight
As with all Apache products, Apache Tika software is released under the Apache License v2.0, and is overseen by a self-selected team of active contributors to the project. A Project Management Committee (PMC) guides the Project’s day-to-day operations, including community development and product releases. Apache Tika source code, documentation, and related resources are available at http://tika.apache.org/.
Apache Tika in Action!
Apache Tika v1.0 will be featured at ApacheCon's Content Technologies track on 10 November 2011. PMC Chair Mattmann will describe the modern genesis of the project and its ecosystem, as well as the newly-launched Manning Publications book, "Tika in Action," co-authored by Mattmann and Zitting.
About The Apache Software Foundation (ASF)
Established in 1999, the all-volunteer Foundation oversees nearly one hundred fifty leading Open Source projects, including Apache HTTP Server — the world's most popular Web server software. Through the ASF's meritocratic process known as "The Apache Way," more than 350 individual Members and 3,000 Committers successfully collaborate to develop freely available enterprise-grade software, benefiting millions of users worldwide: thousands of software solutions are distributed under the Apache License; and the community actively participates in ASF mailing lists, mentoring initiatives, and ApacheCon, the Foundation's official user conference, trainings, and expo. The ASF is a US 501(c)(3) not-for-profit charity, funded by individual donations and corporate sponsors including AMD, Basis Technology, Cloudera, Facebook, Google, IBM, HP, Matt Mullenweg, Microsoft, PSW Group, SpringSource/VMware, and Yahoo!. For more information, visit http://www.apache.org/.
"Apache", "Apache Tika",
and "ApacheCon" are trademarks of The Apache Software
Foundation. All other brands and trademarks are the property of their
respective owners.
# # # [Less]
|
Posted over 13 years ago by dblevins
This week we unveiled a new website driven by the Apache CMS!
Last year at ApacheCon 2010, the Infrastructure team announced they had developed a new CMS using plain old Markdown and backed by SVN. This new system is entirely driven by commits and generates and publishes content instantly. Finally, it is easier to just write the documentation and immediately publish it than to draft up a big long email and create a TODO to someday log into Confluence and paste in the content. When you live, eat, and breathe on the command line and in your IDE, being able to edit your documentation there is a dream.
A major advantage of this new system is being able to freely mix docs and code in all sorts of creative ways, never having to wait for publishing delays to deliver answers to users in the form of fresh documentation, and the simplicity of plain old text editing in any way you might want to do it. So far we've generated content using Perl, Java, Bash, and a heavy amount of just plain editing in Emacs or IntelliJ. It's been quite nice. You hardly need any "plugins" when you have direct access to the documentation source on a plain old file system.
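The commit-driven pipeline amounts to rendering changed Markdown files to HTML and publishing the result. As a toy illustration of the rendering step only (a tiny Markdown subset with invented names; the actual Apache CMS uses a full Markdown processor):

```python
def render_markdown(text):
    """Convert a tiny Markdown subset (#/## headings, paragraphs) to HTML."""
    html = []
    for block in text.strip().split("\n\n"):  # blank lines separate blocks
        block = block.strip()
        if block.startswith("## "):
            html.append(f"<h2>{block[3:]}</h2>")
        elif block.startswith("# "):
            html.append(f"<h1>{block[2:]}</h1>")
        else:
            html.append(f"<p>{block}</p>")
    return "\n".join(html)
```

Hooking a renderer like this to a post-commit trigger is what makes "publish" the same action as "commit": the source of truth stays in version control, and the HTML is just a derived artifact.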
We're rather excited about some of the new content. Some items of note:
Documentation Index
Configuration Settings
Examples Index
Simple MDB Example
@AccessTimeout Example
@AccessTimeout with Meta-Annotations Example
While the site overall looks great, there is still some content that is badly formatted. If you find any such content, please point it out and we'll fix it, or better yet, send a patch!
|
Posted over 13 years ago by dblevins
Our own Jonathan Gallimore presented "Apache TomEE – Java EE Web Profile on Tomcat" at JAX London this last week. It was a 50-minute presentation with a mix of slides and demos, met by a very enthusiastic band of Tomcat lovers.
Slides can be found here. Also, check out some photos of Jon in action! You'd probably never guess it's only his second time presenting and his first time presenting solo! He makes us quite proud, indeed.
We'd like to give special thanks to JAX London for their wonderful support of Apache TomEE. TomEE debuted at JAX London Spring 2011. At that point we had just started to heavily pursue certification. We were honored to be able to come back to our friends at JAX London in the fall and say, "we made it!"
Hats off to JAX for being the kind of conference that seeks out and supports growing projects like Apache TomEE. Aside from ApacheCon, JAX is Apache TomEE's second home.
See also this interview with David Blevins on JAXEnter, Be Small, Be Certified, Be Tomcat.
|
Posted over 13 years ago by dblevins
It was a big year for Apache TomEE and OpenEJB at JavaOne this year. Many thanks to everyone on all sides who helped get us there, all the wonderful people who attended and of course everyone in the community that makes this project tick.
First of all, we were very excited and honored to announce Apache TomEE as Java EE 6 Web Profile certified. The announcement went out on the Tuesday of that week and set the stage for some very exciting presentations and panels throughout the week.
We had three presentations total:
EJB with Meta Annotations
Fun with EJB 3.1 and OpenEJB
Apache TomEE Java EE 6 Web Profile
And participated in three panels:
Meet the Experts: EJB 3.2 Expert Group
The Road to Java EE 7: Is It All About the Cloud?
CDI Today and Tomorrow
The Apache TomEE talk was quite full with 134 attendees, only 5 fewer than the "CDI Today and Tomorrow" panel. All in all, a very full week and one that will not soon be forgotten!
|
Posted almost 14 years ago by dpharbison
The Apache OpenOffice.org project is currently in the incubation phase. We're a 'podling'. It's where all new Apache projects begin, regardless of how mature the source code base is. In this post I'll attempt to explain a bit about incubation, a bit about the 'Apache Way', and our current effort to meet the requirements for third-party code review and clearance. In future posts, I'll attempt to tackle other aspects of the project. If we all have a better understanding of how the work is being organized, those of you interested in volunteering will have a better idea of where to start, and those interested in following our progress will have an easier way to check up on things.
First off, a podling is not from 'Invasion of the Body Snatchers' – a human being wrapped up to look like a large vegetable – nor one of the furry cute puppets from the Dark Crystal cave of Jim Henson's imagination. It's the term we use here at Apache to describe the first phase of a prospective project; a podling is a project that is 'incubating'. Egg, podling, new thing with promise needing special care and attention. I think you get the idea.
It's that special care and attention part that is consuming the efforts of the PPMC, or "Podling Project Management Committee", at the moment. If we are going to hatch, that is, 'graduate' to a TLP or "Top Level Project" in Apache-speak, we are required to meet certain criteria evolved out of deep experience accumulated through Apache's 12-year history and its involvement with many other successful projects.
Apache defines a podling as “A codebase and its community while in the process of being incubated.” You can find the details on the complete Apache Incubation Policy here.
OK, so we have the code base, thanks to Oracle's decision, and we have a community signed in to the project already: 75 committers and growing. So where are we with the process? When do podlings hatch and become Apache TLPs, or Top Level Projects?
The abbreviated answer requires the podling to:
Deliver an official Apache release
Demonstrate that it has successfully created an open and diverse community
Follow the 'Apache Way' through the process: documenting status, conducting ballots, maintaining a fully open and transparent process, etc.
OpenOffice is a very large chunk of code, many millions of lines. The PPMC has now successfully migrated all the source files into the Apache infrastructure, nestled into their new home in the Apache Subversion repository environment. We've run a build test on Linux, and we know we've got the code we need to begin to build a release.
But wait: before we can meet the requirement of producing an official release, Apache requires that we conduct a thorough IP, or Intellectual Property, review and clearance process, so that the resulting Apache release may be licensed under the Apache License 2.0. It requires that all...
“incoming code is fully signed off before any release. This simply reinforces the Apache requirements: all code must have appropriate licenses....The process of preparing an Apache release should include an audit of the code to ensure that all files have appropriate headers and that all dependencies comply with Apache policy.”
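The audit described in that policy can be mechanized: walk the source tree and flag files whose opening bytes lack the expected license text. A simplified sketch (the real clearance also covers NOTICE files and dependency licensing; the marker string and extension list here are illustrative):

```python
import os

# First line of the standard ASF source header.
LICENSE_MARK = "Licensed to the Apache Software Foundation"

def audit_headers(root, exts=(".py", ".java", ".c"), head_bytes=1024):
    """Return paths under root whose leading bytes lack the license marker."""
    missing = []
    for dirpath, _, files in os.walk(root):
        for name in files:
            if not name.endswith(exts):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "r", errors="ignore") as f:
                if LICENSE_MARK not in f.read(head_bytes):
                    missing.append(path)
    return sorted(missing)
```

In practice Apache projects run tools of this kind (e.g. Apache Rat) as part of the release checklist, so a release candidate with an unlicensed file fails before it ever reaches a vote.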
This means that the resulting Apache OpenOffice release(s) will provide the maximum opportunity for the development of a broader spectrum of OpenOffice derivatives than we see today. The OpenOffice of the past will look very different in the future as more developers become familiar with the code and see new opportunities not previously available.
Right now, our immediate task is to resolve the licensing incompatibilities for third-party code modules used by OpenOffice. Since Oracle did not possess the copyright for these modules, they were not included in the original Oracle Software Grant Agreement, so we are working either to deprecate each one or to find a replacement, whether a binary file or an alternative source file, that fills the needed function. We're confident that the process will be concluded in the coming weeks, but it is detail-oriented work and must be done thoroughly and correctly in order to clear the path for an official podling release of Apache OpenOffice.
Before we can produce an Apache release, we must complete the code clearance step, ensuring that license headers, and LICENSE and NOTICE files for all artifacts in the build, are done to the satisfaction of the PPMC and the Incubator PMC, which governs the Apache OpenOffice podling. This will clear the way forward to develop a realistic target date for issuing our first 'Apache OpenOffice.org' release.
In future posts, I'll sketch out how the project is being organized, mapping out the areas that offer interesting and exciting opportunities for new volunteers to step up and take on.
- Don Harbison, PPMC Member, Apache OpenOffice.org
|