
News

Posted about 14 years ago
While I was at the MySQL UC, the Xeround database came to my attention. It bills itself as a database-as-a-service for MySQL systems and a seamless replacement for standard MySQL. Of course, since I am a QA Engineer, I could not resist the urge to try to break it >:) As my friend and former MySQL colleague, Kostja, says, "QA Engineers are a unique breed…they like to push all the buttons" : ) I would say that the QA mindset goes a bit further than that, but it is something I will delve into in another post. I will only say that there is a reason that Microsoft recognizes QA software engineering as a distinct and specialized discipline.

So, let's get back to Xeround. It was the first database-as-a-service that caught my eye and I just had to test it! They are currently offering a free beta. It is remarkably easy and fast to get set up with a test database, and the web-based dashboard they provide is pretty interesting and offers some good information (though some of it is confusing…more on that in a bit).

It was my intent to run a small handful of tests with the mighty, mighty randgen! My tests were as follows:

outer_join grammar – creates seriously nasty JOIN queries that can use up to 20 tables
transactional grammar – a grammar that creates a variety of transactions. Some good, some bad, with lots of ROLLBACKs and SAVEPOINTs sprinkled in for spice.
subquery grammar – the nastiest grammar I have created, and as I have mentioned elsewhere, it is also part of why we are just now seeing optimizer features like index condition pushdown (ICP) being reintroduced to MySQL >:)

My thoughts were that these could be quickly executed and point out any serious problems in basic functionality. MySQL and Drizzle both use these grammars as part of their testing. Drizzle must survive these tests on every push to trunk, so they seem like reasonable stressors for a new engine >:) It should be noted that I had to modify the test grammars to accommodate some Xeround limitations; the modified randgen branch I used is here. It can be branched via:

bzr branch lp:~patrick-crews/randgen/randgen_drizzle_exp

Each grammar would be run with the randgen's --debug option. This is because the user is presented with a nice report at the end of the run which indicates row_count => query_count (i.e. how many queries returned how many rows):

# 2011-04-27T20:40:18 Rows returned:
$VAR1 = {
  '    0' => 59,
  '    1' => 2,
  '    4' => 1,
  '    9' => 1,
  '   -1' => 35,
  '>100' => 1
};

I would use this as a comparison point against MySQL 5.1. Granted, I could use the --Validator=ResultsetComparatorSimplify option, but then I would have an actual bug report that I would feel compelled to file, and this would feel less like fun and more like work ; ) However, I have been in contact with engineers from Xeround and have shared my findings with them.

For the transactional grammar, I would run the grammar on each system and then do a diff of mysqldump files from each database. As Xeround is a MySQL engine, this could cause some differences, but the data in the tables should be consistent.

Before I get into the testing results, I'll provide some overall impressions. As I said, the web interface is pretty nice and provides you with a lot of useful information. It allows you to easily create a new named database instance and provides you with data such as status, scale, uptime, CPU utilization, memory utilization, number of connections, ops/sec, and message count.
Scale refers to the autoscale capabilities that Xeround advertises. For the beta, you are allowed to scale from 3 to 4 servers. 3 servers is considered 100%; adding the extra server (when certain user-specified CPU or memory limits are hit) equates to 133%. Interestingly enough, I observed that there were always 6 active connections when the database was idle (probably some of the Xeround 'secret sauce' working…).

The control panel also allows the user to set the CPU, memory, and connections limits that will trigger scale-up (and possibly scale-down). In my experiments, I never seemed to tax memory or connections, but CPU limits were hit and auto-scale did trigger, though I will admit that I didn't observe any noticeable change in the test execution. There are also tabs for backup (not available in the free beta, though mysqldump does work against a Xeround instance), general monitoring, which provides real-time information about CPU, memory and connections, and an events (messages) tab. The one thing I noted about the events tab was that I received a number of warning messages about the health of my database during times I wasn't using it. However, it is a beta service for general evaluation and certain oddities are to be expected.

Here is what I found with my tests:

1) Xeround is a MySQL engine. They do advertise this, but the main reason I noticed that all of my created test tables were now 'Engine=Xeround' was that I was unable to create an indexed varchar_1024 column. Xeround is limited to 255 characters max:

# 2011-04-27T19:50:27 key (`col_char_1024_key` ))  failed: 1074 Column length too big for column 'col_char_1024' (max = 255); use BLOB or TEXT instead

This limitation required modification of the randgen grammars and gendata files to limit char columns to 255. As noted above, you can find the modified version of the randgen here.

2) Tables with an ENGINE=$engine_name argument are processed without an issue (i.e. you should be able to use a dumpfile without problems) and are converted to Xeround tables. One thing to note is that dumpfiles *from* Xeround have ENGINE=Xeround for the CREATE TABLE statements:

create table t1 (a int not null auto_increment, primary key(a)) engine=innodb;
Query OK, 0 rows affected, 2 warnings (0.702761 sec)

drizzle> show create table t1;
+-------+---------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                          |
+-------+---------------------------------------------------------------------------------------------------------------------------------------+
| t1    | CREATE TABLE `t1` ( `a` int(11) NOT NULL AUTO_INCREMENT, PRIMARY KEY (`a`) ) ENGINE=Xeround DEFAULT CHARSET=utf8 COLLATE=utf8_bin    |
+-------+---------------------------------------------------------------------------------------------------------------------------------------+

3) outer_join grammar. I used the following command line:

./gentest.pl --gendata=conf/drizzle/outer_join_drizzle.zz --grammar=conf/drizzle/outer_join_drizzle.yy --queries=100 --threads=1 --dsn=dbi:mysql:host=00.00.00.00:port=9999:user=USER:password=PASSWORD:database=test --sqltrace --debug

The test is designed to generate queries with large numbers of tables (up to ~20). The test ran without much incident.
The Xeround server monitor indicated that the CPU was hovering near 80% for most of the time, but again…beta test setup, so I'll give them some leeway.

The big trouble is what follows. Remember those randgen summary reports I mentioned earlier? Below is a comparison of Xeround vs. MySQL for the same command line. The values are row_count => number_of_queries_returning_said_row_count. What this means is that for the same set of queries, Xeround and MySQL do not always return the same result sets. I did not note any differences in query failures, so this simply indicates that results processing is differing somewhere : ( To elaborate, Xeround had 56 queries that returned 0 rows; for the same workload, MySQL only had 39. A row count of -1 indicates that there was an error with the query, such as referencing a table or column that doesn't exist. Somehow, Xeround hit fewer errors than MySQL, though that is also worrisome – why do they register errors differently?

Xeround:
# 2011-04-27T20:11:05 Rows returned:
$VAR1 = {
  '    0' => 56,
  '    1' => 16,
  '    2' => 6,
  '    3' => 2,
  '    5' => 1,
  '    6' => 1,
  '    7' => 1,
  '    8' => 1,
  '   -1' => 13,
  '   10' => 2,
  '>10' => 1
};

MySQL 5.1:
$VAR1 = {
  '    0' => 39,
  '    1' => 15,
  '    2' => 2,
  '    3' => 2,
  '    4' => 1,
  '    7' => 2,
  '    8' => 1,
  '   -1' => 32,
  '   10' => 1,
  '>10' => 5
};

4) transactional grammar. I used the following command line:

./gentest.pl --gendata=conf/drizzle/translog_drizzle.zz --grammar=conf/drizzle/translog_concurrent1.yy --queries=100 --threads=1 --dsn=dbi:mysql:host=00.00.00.00:port=9999:user=USER:password=PASSWORD:database=test --sqltrace --debug

This grammar generates a variety of transactions and standalone queries. The queries generated consist of both good and invalid SQL, with lots of ROLLBACKs and SAVEPOINTs here and there. Unfortunately, I noticed a large number of differences. We'll start with the easiest one:

< DROP TABLE IF EXISTS `A`;
< CREATE TABLE `A` (
---
> DROP TABLE IF EXISTS `a`;
> CREATE TABLE `a` (
50c50
< ) ENGINE='InnoDB' AUTO_INCREMENT=105 COLLATE='utf8_general_ci';
---
> ) ENGINE='Xeround' COLLATE='utf8_bin';

It isn't huge, but Xeround apparently auto-converts table names to lower-case. The randgen attempts to create table `A`, but it is stored as table `a`. This could be an issue for some people, but Xeround does say that the beta is for people to evaluate the system's suitability for their purposes.

The big issue is that Xeround appears to not have registered a lot of the transactions issued by the randgen (the comparison here is a diff of mysqldump output from each server; a rough sketch of that workflow is below).
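As an aside, here is a minimal sketch of that dump-and-diff comparison. This is just one way to script it, not the exact commands I used – the host names, ports, and credentials are placeholders:

import subprocess

# Hypothetical helper: dump the `test` schema from a given server.
def dump(host, port, outfile):
    with open(outfile, 'w') as f:
        subprocess.check_call(
            ['mysqldump', '--host=%s' % host, '--port=%d' % port,
             '--user=USER', '--password=PASSWORD', 'test'],
            stdout=f)

dump('xeround.example.com', 9999, 'xeround.sql')  # Xeround beta instance
dump('127.0.0.1', 3306, 'mysql51.sql')            # local MySQL 5.1
# Any output here is a divergence between the two servers' data
# (modulo expected differences like the ENGINE= clause).
subprocess.call(['diff', 'xeround.sql', 'mysql51.sql'])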
The Xeround dumpfile only contained the original 10 rows from table `a`, while the MySQL 5.1 version I ran locally had 94 rows by the end of the randgen run : ( Further research of the randgen logs indicates the following issue:

# 2011-04-27T20:06:56 Query:  INSERT INTO `d` ( `col_char_10` , `col_char_10_key` , `col_char_10_not_null` , `col_char_10_not_null_key` , `col_char_255` , `col_char_255_key` , `col_char_255_not_null` , `col_char_255_not_null_key` , `col_int` , `col_int_key` , `col_int_not_null` , `col_int_not_null_key` , `col_bigint` , `col_bigint_key` , `col_bigint_not_null` , `col_bigint_not_null_key` , `col_enum` , `col_enum_key` , `col_enum_not_null` , `col_enum_not_null_key` , `col_text` , `col_text_key` , `col_text_not_null` , `col_text_not_null_key` )
SELECT `col_char_10` , `col_char_10_key` , `col_char_10_not_null` , `col_char_10_not_null_key` , `col_char_255` , `col_char_255_key` , `col_char_255_not_null` , `col_char_255_not_null_key` , `col_int` , `col_int_key` , `col_int_not_null` , `col_int_not_null_key` , `col_bigint` , `col_bigint_key` , `col_bigint_not_null` , `col_bigint_not_null_key` , `col_enum` , `col_enum_key` , `col_enum_not_null` , `col_enum_not_null_key` , `col_text` , `col_text_key` , `col_text_not_null` , `col_text_not_null_key` FROM `bb`
ORDER BY `col_bigint`,`col_bigint_key`,`col_bigint_not_null`,`col_bigint_not_null_key`,`col_char_10`,`col_char_10_key`,`col_char_10_not_null`,`col_char_10_not_null_key`,`col_char_255`,`col_char_255_key`,`col_char_255_not_null`,`col_char_255_not_null_key`,`col_enum`,`col_enum_key`,`col_enum_not_null`,`col_enum_not_null_key`,`col_int`,`col_int_key`,`col_int_not_null`,`col_int_not_null_key`,`col_text`,`col_text_key`,`col_text_not_null`,`col_text_not_null_key`,`pk` LIMIT 50 /*Generated by THREAD_ID 1*/
failed: 1038 Out of sort memory; increase server sort buffer size

So, it would appear that transactions are failing for some reason or another. However, I repeat the disclaimer about this being a beta and not a production deployment. It could have something to do with the resources allocated for each beta user.

5) Subquery grammar. This was the initial test I ran, but I have saved it for last. First of all, the command line:

./gentest.pl --gendata=conf/drizzle/drizzle.zz --grammar=conf/drizzle/optimizer_subquery_drizzle.yy --queries=100 --threads=1 --dsn=dbi:mysql:host=00.00.00.00:port=9999:user=USER:password=PASSWORD:database=test --sqltrace --debug

This test generates some very nasty subquery-laden queries (see below). The first thing I noticed on the single-threaded run was that Xeround seemed to not like this query very much at all:

SELECT table2 . `col_int` AS field1 FROM ( CC AS table1 STRAIGHT_JOIN ( ( CC AS table2 STRAIGHT_JOIN CC AS table3 ON (table3 . `col_bigint_key` = table2 . `col_int_not_null_key` ) ) ) ON (table3 . `col_text_not_null_key` = table2 . `col_char_10_key` ) ) WHERE ( table1 . `col_int` NOT IN ( SELECT SUBQUERY1_t1 . `col_int_not_null_key` AS SUBQUERY1_field1 FROM ( BB AS SUBQUERY1_t1 INNER JOIN ( CC AS SUBQUERY1_t2 INNER JOIN BB AS SUBQUERY1_t3 ON (SUBQUERY1_t3 . `col_char_10_key` = SUBQUERY1_t2 . `col_char_10_key` ) ) ON (SUBQUERY1_t3 . `col_char_10_not_null_key` = SUBQUERY1_t2 . `col_char_10` ) ) WHERE SUBQUERY1_t2 . `col_bigint` != table1 . `pk` OR SUBQUERY1_t2 . `pk` >= table2 . `pk` ) ) OR ( table1 . `col_int_key` BETWEEN 48 AND ( 48 + 183 ) OR table1 . `pk` BETWEEN 48 AND ( 48 + 104 ) ) GROUP BY field1 ;

Now, it is quite nasty, but standard MySQL executes it with a minimum of fuss (though it does take a moment to handle this monster as well). The other thing is that Xeround took an exceedingly long time to execute this workload. While the other grammars executed in moderate amounts of time (my testing was from a hotel room in Santa Clara while the instance is in Chicago), the subquery test was noticeably slow. I was able to walk down to the lobby, buy something, and return to my room while it was dealing with the nasty query above : ( For some context, running the same command line on my laptop took 8 seconds; Xeround took 14 minutes. But again…beta test setup and hardware, so YMMV.

Finally, we have the dreaded row count report:

Xeround:
# 2011-04-27T20:45:19 Rows returned:
$VAR1 = {
  '    0' => 59,
  '    1' => 2,
  '    4' => 1,
  '   -1' => 35,
  '>10' => 1,
  '>100' => 1
};

MySQL 5.1:
# 2011-04-27T20:40:18 Rows returned:
$VAR1 = {
  '    0' => 59,
  '    1' => 2,
  '    4' => 1,
  '    9' => 1,
  '   -1' => 35,
  '>100' => 1
};

As we can see, there is 1 query out of the 100 issued where the result sets differed (returning 9 rows in MySQL vs. >10 rows in Xeround).

I also tried using --threads=10 to really stress the Xeround system (I didn't bother with MySQL, it handles 10 threads of nasty subqueries like a champ…incidentally, so does Drizzle) ; ) Xeround was able to handle the workload and did so in 27 minutes. Since single-threaded took 14 minutes, perhaps Xeround doesn't really begin to shine until we start hitting large numbers of concurrent connections?

So what can I say from the results of these informal tests? Personally, I would hesitate to say that Xeround is a drop-in replacement. The limitations on column sizes, changes in table naming, and differing result sets are a bit worrisome. However, I will say that the Xeround engineers I met at the UC were very engaged and interested in my findings and have made significant strides in subquery processing since my initial tests. I believe that with time these issues will be fixed and that not every customer will run into them (I know I'm beating this into the ground, but I was using a beta test system). Behavior may be different on a production machine, not every MySQL user will generate such workloads, and every customer should perform their own careful testing and evaluation before making any changes to their systems.

My personal interest ends here. The UC introduced me to a number of interesting new storage engines and I was mainly curious about ways of evaluating them. This was a quick and dirty bit of testing just to see if I could produce any interesting pyrotechnics ; ) Go go randgen!

I really want this picture to be shown when anyone searches for 'randgen' ; )

In all seriousness, I highly recommend adoption of the random query generator. It offers a modular and customizable system for creating evaluation tools (like result set comparison, execution time comparison, replication validation, etc.) and has been used in production-level testing for MySQL, MariaDB and Drizzle for some time. It also plays with PostgreSQL and Java DB (kind of scary that 40% of that list is owned by Oracle…), so please give it a spin and see what kinds of pretty explosions you can make…who knows, testing might actually become fun for non-QA folks >; ) Additionally, these tests only took me about half an hour to set up and execute.
Granted, I have been using the tool for some time, but 30 minutes to identify a number of potential problem areas seems pretty awesome to me. But then again, I am a QA Engineer, and we live for such things.
Posted about 14 years ago
The last few weeks have been particularly quiet from me on the blogging front. Behind the scenes, things have been quite the opposite, so here is a summary of things past, present and future.

Rackspace and Drizzle

If you have read my last 'Last Week in Drizzle' post you will know that Rackspace are no longer supporting Drizzle. They have done a fantastic job so far and have decided to pass the baton to other companies. As for the staff, they wished to redeploy us to other teams, which is something I personally was not keen on. I would rather remain within the MySQL/Drizzle sphere, which I would no longer have been able to do effectively inside Rackspace. Drizzle itself will go on to do great things without Rackspace; there are a number of companies that announced support for Drizzle during the O'Reilly MySQL Conference and Expo, and Google Summer of Code is still going ahead as planned.

MySQL Conference

For me personally it was the busiest conference I have ever attended, mostly down to the three talks I had to give on top of booth duty, meetings and the Drizzle Developer Day. I had some fantastic feedback from people whilst there on many subjects, such as Drizzle and the MySQL 5.1 Plugins Development book. It was great to meet up with old friends and make some new ones, and I hope that the conference will continue for many years to come.

SkySQL

The day after returning from the conference I started my new role as Senior Sustaining Engineer at SkySQL (very jetlagged, and in hindsight I should have given myself a day or two to recover!). In this role I not only go back to supporting customers but also develop tools around the MySQL/Drizzle sphere. I feel very honoured to be working with the team (many of whom I am working with for a second time); they have really done a great job of capturing the traditional MySQL spirit. One of the first things I have been working on is a new version of mydumper; once this is ready I will create a separate blog post about it. I think it is a fantastic tool and hope that it will be able to help many users in the future.

Google Summer of Code

SkySQL have encouraged me to continue my work on Drizzle, which I have also been doing. As part of this I am a mentor for Google Summer of Code; a student called Olaf van der Spek will be working on improving the libdrizzle client API under my guidance. Something I am very much looking forward to.

The Return of the Jedi

So, I am back in a support-type role whilst also developing useful tools and patches to enhance the usability of MySQL. I will also be blogging more and getting involved in the community/ecosystem in other ways. This is very similar to what I was doing at Sun/Oracle, but for a company designed from the ground up to be much better for the staff and customers. I am looking forward to the bright future of SkySQL.
Posted about 14 years ago
Drizzle source tarball, version 2011.04.16, has been released. This is a release of the Fremont series and is a development release. Our stable GA release can be found here. Future releases to the Elliott series will be announced appropriately. This release contains various bug fixes; we are still getting back up to speed following the O'Reilly MySQL Conference and Expo : )

- fixes to xtrabackup
- fixes to replication
- fixes to logging stats
- various code refactoring
- InnoDB patches from Percona

The Drizzle download file can be found here.
Posted about 14 years ago
I see that a lot of people just don't get it when they start talking about high availability, redundancy, failover, etc. This is probably not going to change, but maybe I can try anyway.

Let's think about how you can survive a massive Amazon AWS failure. You build your application to automatically move services to another part of the infrastructure that's still up. Great! Now assume that everyone else is smart, too. Their applications move, too. What happens next? The whole AWS cloud melts to the ground. Have you never seen this happen, where one instance of something fails and others pick up the load and fail in turn? I have.

OK, so let's say that you're really smart, and you also have the ability to move to an entirely different provider. Now suppose that other people are smart too. Next stop — Rackspace Cloud is down, and so is Joyent, and so on.

You can't just pretend that "the cloud" is infinite. It isn't. Stop trying! In "the cloud," you still have to do capacity planning, even though it's hard or impossible, and you still have to think about the possibility that the resources you assume are there aren't.

Let's think about cloud computing's older name — utility computing. Can you think of any utilities that have had capacity shortages, brownouts, or even cascading failures? I worked a bunch of case studies on them in my engineering classes, but I also lived through some of them myself.

This is why some old-fashioned, stupid, clueless people still own their own hardware. Those dumb clod-jumpers aren't hip enough to move into the cloud where everything is magical. I bet they have kerosene lanterns for when the lights go out, too.

With economies of scale come failures at scale. You can't have it both ways.

Related posts:
- Risks of running in the cloud
- Under-provisioning: the curse of the cloud
- Failure scenarios and solutions in master-master replication
- Drizzle stops the rain
- A review of Cloud Application Architectures by George Reese
Posted about 14 years ago by Marcus Eriksson
Just pushed up Drizzle JDBC 1.1 with 2 quite big new features:

- SSL support – add ?useSSL=true to the connection string (and do the usual ssl-in-java magic to make it work; check the MySQL documentation for more information, it is set up exactly the same way here)
- Multi queries – add ?allowMultiQueries=true to your connection string to be able to send several queries in one round trip to the server.

Note that these features are only supported when using the driver against a MySQL server – Drizzle does not have these features (yet).

Download it from the central maven repo (should be synced within an hour or so).
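For illustration, a connection string enabling both features against a MySQL server might look something like the line below. The host, port, and database are placeholders, and the jdbc:mysql:thin:// scheme is my recollection of the driver's MySQL-protocol URL format – verify against the driver's documentation:

jdbc:mysql:thin://user:password@db.example.com:3306/test?useSSL=true&allowMultiQueries=true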
Posted about 14 years ago
My presentation from the MySQL UC didn't give a lot of detail on the actual tool I have hacked up, nor did it go into how to play with it / try it out. I figured I should rectify that (at least one person seemed interested in trying it out <g>)

To begin with, you should have the random query generator installed (see the docs for handling that). Besides being *the* cutting-edge, production-ready testing tool in the open-source dbms world, it comes with a handy data generator.

One of the key features of kewpie is that it can easily generate test queries against any test bed. A standard randgen practice is to develop grammars and gendata files (which generate a user-specified test bed) that are designed to work together. By knowing which tables and columns will be available to the randgen at runtime, the grammar writer can tune the randgen to produce a higher percentage of valid queries. It is possible to just use the built-in _field element, which will randomly retrieve some available field; however, being able to match columns by data type (for both joins and comparisons) results in much more interesting queries >:) At some point, the randgen will likely be improved to overcome this, but it is a sad fact of QA that we often spend more time producing tests than working on beefing up our testing infrastructure.

At any rate, the kewpie demos are designed to work with the random data generator. It is a very cool tool, and one can also use it with --start-and-exit to have a populated test server.

Requirements:
- randgen
- dbd::drizzle (see randgen docs)
- MySQLDB (Drizzle speaks the MySQL protocol. MySQLDB enables us to play well with SQLAlchemy too!)
- the demo branch from launchpad: bzr branch lp:~patrick-crews/drizzle/dbqp_kewpie_demo

It is important to remember that kewpie is more of a science project than something you'll use for testing any time soon. It is meant to help illustrate the power behind large-scale automated query generation and analysis, but it pales in comparison to the mighty, mighty randgen. However, if you are interested, please read on : )

Config files

kewpie runs are controlled via a .cnf file. Python has a very nice ConfigParser library and it seems like a solid way of organizing this information and getting at it. Also, the very well-designed drizzle-automation uses similar files. I'll just digress a bit here to give big props to Jay Pipes of the fu for his work here. It has informed a lot of the infrastructure work I've been doing for Drizzle. : )

test_info section:

[test_info]
comment = basic test of kewpie seeding
test_schema = test
init_query_count = 2
init_table_count = 3
init_column_count = 4
init_where_count = 0
# limits for various values
max_table_count = 10
max_column_count = 25
max_where_count = 10
max_mutate_count = 3

This section seeds the initial query population. In the example above, we produce 2 queries that each have 4 columns and use 3 tables (and no WHERE clause). It is an eventual dream to have more fine-grained control over such things, but this was a proof-of-concept as much as anything.

Next we have limits. We don't want to go over 10 tables, 25 columns (in the SELECT clause), or 10 conditions in the WHERE clause. We also set max_mutate_count so that only 3 mutant queries will ever be produced from a seed. Setting it higher = more variants that are possible from each query.
mutators section:

[mutators]
add_table = 5
add_column = 1
add_where = 3

At the moment, kewpie only has 3 ways to manipulate a query – add_table, add_column, and add_where. These should be fairly self-explanatory ; ) The vision is that these will eventually have a variety of parameters that can be set, so that we can one day ask that we only add conditions to the WHERE clause that use an integer column, for example. The numeric value following each mutator name is how we stack the deck in favor of one mutator over another. When we evaluate this section, we create a python list object that contains N occurrences of each mutator name; when it comes time to mutate a query, we randomly choose one mutator from the list and then call that method against the query (there is a small sketch of this mechanism further below).

test_servers section:

[test_servers]
servers = [[--innodb.replication-log]]

As we do in other dbqp tests, we provide a list of python lists. Each sublist represents the server options we want to use for the test server. At present, there is no need to start more than 1 server, though there may be value in altering certain options.

evaluators section:

[evaluators]
row_count = True
explain_output = False

Currently, we only have the row_count evaluator. This ensures that at least one row of data was returned for a given query. It is surprising how valuable just this tiny filter can be. In Microsoft's research, they found that purely random systems only produced valid queries 50% of the time; the remainder tended to short out at the parser level. The evaluator is what helps us produce useful queries; the mutators are what help the system hit its evaluation targets (whatever they may be). Future evaluators can measure code coverage, server variable effect, log file effect, pretty much anything. We want testing to be flexible and have borrowed heavily from the modular Validator and Reporter design of the randgen.

Now to see it in action! We are going to take our join.cnf file and seed it so we create 2 initial queries, with 4 columns and 3 tables each. We run this in conjunction with the conf/drizzle/drizzle.zz gendata file (sort of our go-to test bed for the randgen).

./dbqp --mode=kewpie --randgen-path=$RANDGEN_PATH --gendata=$RANDGEN_PATH/conf/drizzle/drizzle.zz join --verbose
Setting --no-secure-file-priv=True for randgen mode...
21 Apr 2011 11:38:59 VERBOSE: Initializing system manager...
21 Apr 2011 11:38:59 VERBOSE: Processing source tree under test...
21 Apr 2011 11:38:59 INFO: Using Drizzle source tree:
21 Apr 2011 11:38:59 INFO: basedir: /home/user/repos/kewpie_demo
<snip>
21 Apr 2011 11:39:00 INFO: Taking clean db snapshot...
21 Apr 2011 11:39:00 VERBOSE: Starting executor: bot0
21 Apr 2011 11:39:00 VERBOSE: Executor: bot0 beginning test execution...
21 Apr 2011 11:39:00 VERBOSE: Restoring from db snapshot
21 Apr 2011 11:39:00 VERBOSE: Starting server: bot0.s0
21 Apr 2011 11:39:00 INFO: bot0 server:
21 Apr 2011 11:39:00 INFO: NAME: s0
21 Apr 2011 11:39:00 INFO: MASTER_PORT: 9306
21 Apr 2011 11:39:00 INFO: DRIZZLE_TCP_PORT: 9307
21 Apr 2011 11:39:00 INFO: MC_PORT: 9308
21 Apr 2011 11:39:00 INFO: PBMS_PORT: 9309
21 Apr 2011 11:39:00 INFO: RABBITMQ_NODE_PORT: 9310
21 Apr 2011 11:39:00 INFO: VARDIR: /home/user/repos/kewpie_demo/tests/workdir/bot0/s0/var
21 Apr 2011 11:39:00 INFO: STATUS: 1
# 2011-04-21T11:39:00 Default schema: test
# 2011-04-21T11:39:00 Executor initialized, id GenTest::Executor::Drizzle 2011.03.14.2269 ()
# 2011-04-21T11:39:00 # Creating Drizzle table: test.A; engine: ; rows: 0 .
# 2011-04-21T11:39:00 # Creating Drizzle table: test.B; engine: ; rows: 0 .
# 2011-04-21T11:39:00 # Creating Drizzle table: test.C; engine: ; rows: 1 .
# 2011-04-21T11:39:00 # Creating Drizzle table: test.D; engine: ; rows: 1 .
# 2011-04-21T11:39:00 # Creating Drizzle table: test.AA; engine: ; rows: 10 .
# 2011-04-21T11:39:00 # Creating Drizzle table: test.BB; engine: ; rows: 10 .
# 2011-04-21T11:39:00 # Creating Drizzle table: test.CC; engine: ; rows: 100 .
# 2011-04-21T11:39:00 # Creating Drizzle table: test.DD; engine: ; rows: 100 .
21 Apr 2011 11:39:01 INFO: Executing query: SELECT table_1.col_char_1024_not_null_key AS column_1, table_3.col_char_1024 AS column_2, table_3.col_enum AS column_3, table_1.pk AS column_4 FROM AA AS table_1 RIGHT JOIN D AS table_2 ON table_1.col_char_1024_not_null_key = table_2.col_char_10_not_null_key LEFT OUTER JOIN D AS table_3 ON table_2.col_text_key = table_3.col_text
21 Apr 2011 11:39:01 INFO: EVALUATOR: row_count STATUS: True EXTRA: 1
21 Apr 2011 11:39:01 VERBOSE: ORIG QUERY:  SELECT table_1.col_char_1024_not_null_key AS column_1, table_3.col_char_1024 AS column_2, table_3.col_enum AS column_3, table_1.pk AS column_4 FROM AA AS table_1 RIGHT JOIN D AS table_2 ON table_1.col_char_1024_not_null_key = table_2.col_char_10_not_null_key LEFT OUTER JOIN D AS table_3 ON table_2.col_text_key = table_3.col_text
21 Apr 2011 11:39:01 VERBOSE: USING ADD_TABLE mutation
21 Apr 2011 11:39:01 VERBOSE: MUTANT QUERY: SELECT table_1.col_char_1024_not_null_key AS column_1, table_3.col_char_1024 AS column_2, table_3.col_enum AS column_3, table_1.pk AS column_4 FROM AA AS table_1 RIGHT JOIN D AS table_2 ON table_1.col_char_1024_not_null_key = table_2.col_char_10_not_null_key LEFT OUTER JOIN D AS table_3 ON table_2.col_text_key = table_3.col_text RIGHT JOIN B AS table_4 ON table_3.col_text_key = table_4.col_text_not_null
<snip>

From this output we can see how the query was executed, evaluated, and mutated. As we wanted, we have 4 columns and 3 tables in the original query, and we add extra tables to queries that evaluate well.

Now let's see what happens when we use a different gendata file. We'll use one called varchar_drizzle.zz which, surprisingly enough, only uses varchars:

./dbqp --mode=kewpie --randgen-path=$RANDGEN_PATH --gendata=$RANDGEN_PATH/conf/drizzle/varchar_drizzle.zz join --verbose
Setting --no-secure-file-priv=True for randgen mode...
21 Apr 2011 11:44:20 VERBOSE: Initializing system manager...
21 Apr 2011 11:44:20 VERBOSE: Processing source tree under test...
21 Apr 2011 11:44:20 INFO: Using Drizzle source tree:
21 Apr 2011 11:44:20 INFO: basedir: /home/user/repos/kewpie_demo
<snip>
21 Apr 2011 11:44:20 INFO: Taking clean db snapshot...
21 Apr 2011 11:44:20 VERBOSE: Starting executor: bot0
21 Apr 2011 11:44:20 VERBOSE: Executor: bot0 beginning test execution...
21 Apr 2011 11:44:20 VERBOSE: Restoring from db snapshot
21 Apr 2011 11:44:20 VERBOSE: Starting server: bot0.s0
21 Apr 2011 11:44:20 INFO: bot0 server:
21 Apr 2011 11:44:20 INFO: NAME: s0
21 Apr 2011 11:44:20 INFO: MASTER_PORT: 9306
21 Apr 2011 11:44:20 INFO: DRIZZLE_TCP_PORT: 9307
21 Apr 2011 11:44:20 INFO: MC_PORT: 9308
21 Apr 2011 11:44:20 INFO: PBMS_PORT: 9309
21 Apr 2011 11:44:20 INFO: RABBITMQ_NODE_PORT: 9310
21 Apr 2011 11:44:20 INFO: VARDIR: /home/user/repos/kewpie_demo/tests/workdir/bot0/s0/var
21 Apr 2011 11:44:20 INFO: STATUS: 1
# 2011-04-21T11:44:20 Default schema: test
# 2011-04-21T11:44:20 Executor initialized, id GenTest::Executor::Drizzle 2011.03.14.2269 ()
# 2011-04-21T11:44:20 # Creating Drizzle table: test.table0_varchar_150_not_null; engine: ; rows: 0 .
# 2011-04-21T11:44:20 # Creating Drizzle table: test.table1_varchar_150_not_null; engine: ; rows: 1 .
# 2011-04-21T11:44:20 # Creating Drizzle table: test.table2_varchar_150_not_null; engine: ; rows: 2 .
# 2011-04-21T11:44:20 # Creating Drizzle table: test.table10_varchar_150_not_null; engine: ; rows: 10 .
# 2011-04-21T11:44:20 # Creating Drizzle table: test.table100_varchar_150_not_null; engine: ; rows: 100 .
21 Apr 2011 11:44:20 INFO: Executing query: SELECT table_1.col_varchar_1024 AS column_1, table_3.pk AS column_2, table_2.col_varchar_1024_key AS column_3, table_3.col_varchar_1024_not_null AS column_4 FROM table10_varchar_150_not_null AS table_1 RIGHT OUTER JOIN table2_varchar_150_not_null AS table_2 ON table_1.col_varchar_1024 = table_2.col_varchar_1024_not_null LEFT JOIN table10_varchar_150_not_null AS table_3 ON table_1.col_varchar_1024_key = table_3.pk
21 Apr 2011 11:44:20 INFO: EVALUATOR: row_count STATUS: True EXTRA: 2
21 Apr 2011 11:44:20 VERBOSE: ORIG QUERY:  SELECT table_1.col_varchar_1024_not_null_key AS column_1, table_3.pk AS column_2, table_3.col_varchar_1024_not_null_key AS column_3, table_1.col_varchar_1024 AS column_4 FROM table0_varchar_150_not_null AS table_1 RIGHT JOIN table2_varchar_150_not_null AS table_2 ON table_1.col_varchar_1024_not_null_key = table_2.col_varchar_1024 LEFT OUTER JOIN table1_varchar_150_not_null AS table_3 ON table_2.col_varchar_1024_not_null_key = table_3.col_varchar_1024_not_null
21 Apr 2011 11:44:20 VERBOSE: USING ADD_TABLE mutation
21 Apr 2011 11:44:21 VERBOSE: MUTANT QUERY: SELECT table_1.col_varchar_1024_not_null_key AS column_1, table_3.pk AS column_2, table_3.col_varchar_1024_not_null_key AS column_3, table_1.col_varchar_1024 AS column_4 FROM table0_varchar_150_not_null AS table_1 RIGHT JOIN table2_varchar_150_not_null AS table_2 ON table_1.col_varchar_1024_not_null_key = table_2.col_varchar_1024 LEFT OUTER JOIN table1_varchar_150_not_null AS table_3 ON table_2.col_varchar_1024_not_null_key = table_3.col_varchar_1024_not_null RIGHT JOIN table100_varchar_150_not_null AS table_4 ON table_3.col_varchar_1024_key = table_4.col_varchar_1024_not_null

As you can see, the test bed (the created / populated tables) has changed. As a result, the generated queries have changed as well. Allowing this kind of flexibility will let QA engineers look not only for good queries, but also for interesting query / test bed combinations (sometimes optimizations and code paths executed rely on both) in an easy and automated manner.

Next, we'll take a look at how to add other things into the mix.
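Before that, a quick illustration of the weighted mutator choice described in the mutators section above. This is only a sketch of the mechanism, not kewpie's actual code – the function names and mutator stubs here are hypothetical:

import random

# Hypothetical stand-ins for kewpie's mutators; each takes a query
# string and returns a mutated version of it.
def add_table(query):
    return query + ' /* join in another table here */'

def add_column(query):
    return query + ' /* add a column to the SELECT list here */'

def add_where(query):
    return query + ' /* add a WHERE condition here */'

# Weights as they appear in join.cnf's [mutators] section.
mutator_weights = {add_table: 5, add_column: 1, add_where: 3}

# Build a list containing N occurrences of each mutator;
# heavier weights mean more copies, so a higher chance of selection.
mutator_pool = []
for mutator, weight in mutator_weights.items():
    mutator_pool.extend([mutator] * weight)

def mutate_query(query):
    # Randomly choose one mutator from the pool and call it.
    return random.choice(mutator_pool)(query)

print(mutate_query('SELECT table_1.pk FROM AA AS table_1'))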
Suppose you want to also add WHERE conditions to your generated queries – it is as simple as tweaking the following section in join.cnf from:

[mutators]
add_table = 5
add_column = 0
add_where = 0

to:

[mutators]
add_table = 2
add_column = 0
add_where = 4

We are now twice as likely to add a WHERE condition as we are to add a table to a query. Let's see what happens from the exact same command line:

./dbqp --mode=kewpie --randgen-path=$RANDGEN_PATH --gendata=$RANDGEN_PATH/conf/drizzle/varchar_drizzle.zz join --verbose
Setting --no-secure-file-priv=True for randgen mode...
21 Apr 2011 11:50:16 VERBOSE: Initializing system manager...
21 Apr 2011 11:50:16 VERBOSE: Processing source tree under test...
21 Apr 2011 11:50:16 INFO: Using Drizzle source tree:
21 Apr 2011 11:50:16 INFO: basedir: /home/user/repos/kewpie_demo
<snip>
21 Apr 2011 11:50:16 INFO: Taking clean db snapshot...
21 Apr 2011 11:50:16 VERBOSE: Starting executor: bot0
21 Apr 2011 11:50:16 VERBOSE: Executor: bot0 beginning test execution...
21 Apr 2011 11:50:16 VERBOSE: Restoring from db snapshot
21 Apr 2011 11:50:16 VERBOSE: Starting server: bot0.s0
21 Apr 2011 11:50:16 INFO: bot0 server:
21 Apr 2011 11:50:16 INFO: NAME: s0
21 Apr 2011 11:50:16 INFO: MASTER_PORT: 9306
21 Apr 2011 11:50:16 INFO: DRIZZLE_TCP_PORT: 9307
21 Apr 2011 11:50:16 INFO: MC_PORT: 9308
21 Apr 2011 11:50:16 INFO: PBMS_PORT: 9309
21 Apr 2011 11:50:16 INFO: RABBITMQ_NODE_PORT: 9310
21 Apr 2011 11:50:16 INFO: VARDIR: /home/user/repos/kewpie_demo/tests/workdir/bot0/s0/var
21 Apr 2011 11:50:16 INFO: STATUS: 1
# 2011-04-21T11:50:16 Default schema: test
# 2011-04-21T11:50:16 Executor initialized, id GenTest::Executor::Drizzle 2011.03.14.2269 ()
# 2011-04-21T11:50:16 # Creating Drizzle table: test.table0_varchar_150_not_null; engine: ; rows: 0 .
# 2011-04-21T11:50:16 # Creating Drizzle table: test.table1_varchar_150_not_null; engine: ; rows: 1 .
# 2011-04-21T11:50:16 # Creating Drizzle table: test.table2_varchar_150_not_null; engine: ; rows: 2 .
# 2011-04-21T11:50:16 # Creating Drizzle table: test.table10_varchar_150_not_null; engine: ; rows: 10 .
# 2011-04-21T11:50:16 # Creating Drizzle table: test.table100_varchar_150_not_null; engine: ; rows: 100 .
<snip>
21 Apr 2011 11:50:17 INFO: Executing query: SELECT table_1.col_varchar_1024 AS column_1, table_3.pk AS column_2, table_2.col_varchar_1024_key AS column_3, table_3.col_varchar_1024_not_null AS column_4 FROM table10_varchar_150_not_null AS table_1 RIGHT OUTER JOIN table2_varchar_150_not_null AS table_2 ON table_1.col_varchar_1024 = table_2.col_varchar_1024_not_null LEFT JOIN table10_varchar_150_not_null AS table_3 ON table_1.col_varchar_1024_key = table_3.pk
21 Apr 2011 11:50:17 INFO: EVALUATOR: row_count STATUS: True EXTRA: 2
21 Apr 2011 11:50:17 VERBOSE: ORIG QUERY:  SELECT table_1.col_varchar_1024_not_null_key AS column_1, table_3.pk AS column_2, table_3.col_varchar_1024_not_null_key AS column_3, table_1.col_varchar_1024 AS column_4 FROM table0_varchar_150_not_null AS table_1 RIGHT JOIN table2_varchar_150_not_null AS table_2 ON table_1.col_varchar_1024_not_null_key = table_2.col_varchar_1024 LEFT OUTER JOIN table1_varchar_150_not_null AS table_3 ON table_2.col_varchar_1024_not_null_key = table_3.col_varchar_1024_not_null
21 Apr 2011 11:50:17 VERBOSE: USING ADD_WHERE mutation
21 Apr 2011 11:50:17 VERBOSE: MUTANT QUERY: SELECT table_1.col_varchar_1024_not_null_key AS column_1, table_3.pk AS column_2, table_3.col_varchar_1024_not_null_key AS column_3, table_1.col_varchar_1024 AS column_4 FROM table0_varchar_150_not_null AS table_1 RIGHT JOIN table2_varchar_150_not_null AS table_2 ON table_1.col_varchar_1024_not_null_key = table_2.col_varchar_1024 LEFT OUTER JOIN table1_varchar_150_not_null AS table_3 ON table_2.col_varchar_1024_not_null_key = table_3.col_varchar_1024_not_null WHERE table_1.pk >= 'W'

As I said, it is still beta software ; ) However, in all seriousness, we want to be able to generate 'bad' queries, but to have the option of not using them and filtering them out of a test if they serve no purpose.

Hopefully, this will give anyone who is interested a better idea of how to play with the code. Development will likely continue, but this is still more of a prototype of how things could be. If you *really* want to test a database, I still highly recommend the amazing random query generator – it is good for blowing things up!

I really want this picture to be shown when anyone searches for 'randgen' ; )
Posted about 14 years ago
So… I had another one of those "hrrm… this shouldn't be hard to hack up a proof-of-concept" moments. Web apps are increasingly speaking JSON all around the place. Why can't we speak JSON to/from the database? Why? Seriously, why not?

One reason why MongoDB has found users is that JSON is very familiar to people. It has gained popularity in spite of having pure disregard for the integrity and safety of your data.

So I started with a really simple idea: an HTTP server in the database server. Thanks to the simple code to do that with libevent, I got that going fairly quickly. Finding a rather nice C++ library to create and parse JSON was the next challenge. I found JSONcpp, a public domain library with a nice API, and proceeded to bring it into the tree (it's not much code). I then created a simple way to find out the version of the Drizzle server you were speaking to:

$ curl http://localhost:8765/0.1/version
{
   "version" : "2011.04.15.2285"
}

But that wasn't nearly enough… I also wanted to be able to issue arbitrary queries. Thanks to the supporting code we have in the Drizzle server for EXECUTE() (also used by the replication slave), this was also pretty easy. I created a way to execute the content of an HTTP POST request as if you had done so with EXECUTE() – all nicely wrapped in a transaction.

I created a simple table using the drizzle client, connecting over a normal TCP socket speaking the MySQL protocol, and inserted a row into it:

$ ../client/drizzle --port 9306 test
Welcome to the Drizzle client. Commands end with ; or \g.
Your Drizzle connection id is 4
Connection protocol: mysql
Server version: 2011.04.15.2285 Source distribution (json-interface)

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

drizzle> show create table t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `a` INT NOT NULL AUTO_INCREMENT,
  `b` VARCHAR(100) COLLATE utf8_general_ci DEFAULT NULL,
  PRIMARY KEY (`a`)
) ENGINE=InnoDB COLLATE = utf8_general_ci
1 row in set (0.001209 sec)

drizzle> insert into t1 (b) values ("from mysql protocol");
Query OK, 1 row affected (0.00207 sec)

Now to select rows from it via HTTP and get a JSON object back with the result set:

$ curl http://localhost:8765/0.1/sql --data 'select * from t1;'
{
   "query" : "select * from t1;",
   "result_set" : [
      [ "1", "from mysql protocol" ],
      [ "", "" ]
   ],
   "sqlstate" : "00000"
}

I can also insert more rows using the HTTP interface and then select them from the MySQL protocol interface:

$ curl http://localhost:8765/0.1/sql --data 'insert into t1 values (NULL, \"from HTTP!\");'
{
   "query" : "insert into t1 values (NULL, \\\"from HTTP!\\\");",
   "sqlstate" : "00000"
}

drizzle> select * from t1;
+---+---------------------+
| a | b                   |
+---+---------------------+
| 1 | from mysql protocol |
| 2 | from HTTP!          |
+---+---------------------+
2 rows in set (0.000907 sec)

So what does this get us? With the addition of proper authentication, you could start doing some really quite neat and nifty things. I imagine we could add interfaces to avoid SQL and directly do key lookups, table scans and index range scans, giving really quite sweet performance. We could start building web tools to manage and manipulate the database speaking the native language of the web. But… there's more!
Since we have a web server and a way to execute queries via HTTP, along with getting the result set as JSON, why can't we have a simple Web UI for monitoring the database server and running queries built into the database server? Yes we can. If you wanted a WHERE condition or anything else, easy – change the query, hit execute. No TCP connection, no parsing the MySQL protocol, nothing. Just HTTP requests straight to the database server from the browser, with a bit of client-side javascript producing the HTML for the table.

Proof of concept code is up on launchpad in lp:~stewart/drizzle/json-interface
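As a usage sketch, here is roughly what a tiny client of that /0.1/sql endpoint could look like in Python. The endpoint and port come from the examples above; the helper itself (name, error handling) is my own illustration, not part of the proof of concept:

import json
import urllib.request

# Minimal client for the proof-of-concept HTTP interface: the SQL
# text is simply the body of the POST request, and the response
# body is a JSON object.
def execute_sql(query, host='localhost', port=8765):
    url = 'http://%s:%d/0.1/sql' % (host, port)
    req = urllib.request.Request(url, data=query.encode('utf-8'))
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode('utf-8'))

result = execute_sql('select * from t1;')
print(result['sqlstate'], result.get('result_set'))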
Posted about 14 years ago
This year marked my fifth year at the MySQL Conference. With some distance between the Oracle acquisition, this year's show provided an interesting glimpse into the status of MySQL, both the project and the ecosystem. Let's get to the questions.

Q: Before we begin, do you have anything to disclose?
A: Yes. Prior to its acquisition by Oracle, Sun was a RedMonk client. And prior to its acquisition by Sun, MySQL was a RedMonk client. In addition, multiple entities that compete directly or indirectly with MySQL are RedMonk clients, including Akiban, Basho, IBM, Lucid Imagination, Membase, and Microsoft.

Q: With that out of the way, how did the show do, logistically?
A: I'm not aware of what the actual attendance figures were, but they were reported to be down from last year. The show floor, at least, was sparsely populated. In general, the show is clearly down from the height of its popularity.

Q: What is this a symptom of, do you think?
A: Many things. Market fragmentation, scheduling conflicts, but Oracle probably played the most direct role.

Q: How so? What did Oracle do, or not do, to impact the MySQL Conference?
A: As with the prior year, Oracle's commitment to the show was anemic. To begin with, their Collaborate conference was scheduled directly opposite the MySQL conference. Financially, they were not even the lead sponsor for a conference focused on their product; that spot went instead to a MySQL competitor, EnterpriseDB. More broadly, Oracle's relationship with its ecosystem is more complicated than it used to be. MySQL partners deemed competitive with Oracle, for example, have had their relationships terminated.

Q: So Oracle is then a poor steward of MySQL?
A: Of the community and ecosystem, perhaps. With respect to the product itself, however, Oracle appears to be living up to its EU commitments: 5.6 looks like an excellent release, and reactions have been very positive.

Q: What did this conference say about the future of MySQL?
A: First, that the community remains large and vibrant. Second, that MySQL is potentially facing an Android-like future of multiple implementations. Last, that MySQL is an option these days, rather than the option.

Q: Let's take those in order. How did this conference validate the size of the MySQL community, particularly if attendance was down?
A: Through the sheer variety and scope of sessions and attendees. I spoke with services people, hardware people, software product people, developers, engineers, DBAs, and sysadmins…and all in the space of a few hours. The speakers included the usual suspects: members of the MySQL family (Data Differential, Monty Program, MySQL, Percona, SkySQL, etc) and long time web users (Craigslist, Facebook, Google, Twitter, Yahoo, Zynga, etc). But it also featured talks from AOL, Amazon, Blizzard, BYU, Canonical, Metric Insights, Rackspace, and Recorded Future, not to mention erstwhile competitors such as Akiban, Aster Data, Cloudera, Membase, Mongo, and PostgreSQL. That's a reasonably diverse base.

Q: And what about the fragmentation?
A: MySQL users increasingly face an interesting question: where do I buy MySQL support? The obvious answer, and certainly the safe one for larger businesses, is Oracle. The difficulty, as it usually is with Oracle, is in pricing. As in, it goes up. Regularly. At some point, for some customers, Oracle's offerings become less relevant as a function of their price point. Fortunately for these customers, there is no shortage of alternative options.
You can purchase MySQL, or something very like it, from Amazon. Or you could buy support from SkySQL, launched late last year and home to many of the original MySQL staffers. Percona will support its flavor of MySQL for you. Or canonical MySQL, Drizzle, MariaDB, or RDS. Pythian provides MySQL support services. And so on. If you're a customer, which do you choose, and why? It was easy – or easier – when MySQL was an independent. As part of Oracle, the support calculus for some users will change. But the plethora of available support options is counterintuitively a negative for customers as they contemplate the array of options available to them. This is a natural consequence of the relatively recent acquisition of Sun/MySQL by Oracle; roiled markets take time to settle.

But the issue of fragmentation remains. Henrik created MepSQL for a reason, and that reason is that there is a high volume of decentralized development occurring around the codebase. This is a positive for functional development, obviously, but it poses challenges from a customer adoption standpoint. Centralization would be useful, but under what mechanism? A commercial vendor? Neither is likely, though you should watch Amazon here [coverage].

Q: And lastly, how is MySQL just an option now, rather than the option?
A: As we've documented previously [coverage], the wider industry trend is away from general purpose and towards specialization. The database market is no exception: we've seen an accelerating acceptance of specialized datastores within businesses of all types and industries. While these typically non-relational technologies are not generally displacing traditional databases, they are absorbing substantial volumes of new workloads that were once serviced by RDBMS systems. Rather than bending MySQL into a key-value store, users instead are selecting built-for-purpose persistence mechanisms such as Bitcask, Membase, memcached, Redis or Voldemort. Or column databases. Or document databases. Pick your datastore type of choice. The end result of this process is a more diverse, competitive marketplace. One in which MySQL is a popular option for certain workloads, but no longer the de facto only option.

Q: Bigger picture, what do you think of the state of MySQL, circa 2011? What would you tell a MySQL user?
A: Like Brian Aker, I'm generally positive. Whatever you think about the fragmentation, the increased market competition, or Oracle's lack of support for community-oriented events, the fact is that MySQL remains an immense ecosystem under active development. Support options are varied, the code is being improved, and the ubiquity of the database is a massive advantage. As such, I see no reason to recommend against current or future MySQL usage.

Q: And for developers?
A: Most developers have already made the adjustment to Oracle's acquisition. Many were already leveraging MySQL side-by-side with specialized or alternative datastores – the interest in NoSQL predates the Oracle acquisition – and those that had issues with Oracle's support have moved on to the likes of Percona and SkySQL.

Q: What about the idea that it won't be called the MySQL Conference anymore, but the open source database conference? Will the 2011 MySQL Conference be the last one?
A: That's more a function of conference logistics than the health of MySQL. MySQL AB, while it was independent, outsourced the conference organization to O'Reilly in a partnership that worked well for both parties.
For better or for worse, Oracle appears to have no interest in similar community-focused events, and as a result it would be impossible to blame O'Reilly if they retired the MySQL branding, considering that the product owner is only peripherally involved. And if you factor in market context, the timing is appropriate: the open source database space is exploding. Frankly, whatever O'Reilly calls the show next year – assuming they have one, of course – I'll be there. The ecosystem was always bigger than just MySQL, and if the naming reflects that, so be it.
Posted about 14 years ago
A number of years ago I coined the term "the MySQL ecosystem". I did it at the time to express a view that MySQL had moved beyond being just what MySQL AB defined "MySQL" as being. It was a radical thought at the time, in part because when I expressed it, I did it not only outwardly to the world, but inwardly to the company as well. Many at the time thought that the ecosystem danced at the whim of the MySQL AB entity. When Peter Zaitsev left to form Percona, I remember very clearly a management meeting where there was a hubris that his business would amount to nothing, and that he was missing his opportunity to be a part of something greater. History is of course writing a very different story.

So how is the ecosystem? It turns out it is pretty healthy. I wasn't sure if that was the case up until a couple of weeks ago. I was having lunch with Moshe Shadmon of ScaleDB and I asked him, "Do you think the market is collapsing?" His response to me was one of enthusiasm. He pointed out the obvious indicators: the growth Amazon has created with its relational database service, and the continued growth in applications that support the MySQL interface. The conversation put me into a really positive mindset about the community.

What did I find at the O'Reilly MySQL Conference? I found a lot of happy people. I saw adoption numbers which show positive growth. What didn't I find? The overwhelming negativism of the previous two years that I have sensed in the community was not to be found. That negativism has at times made me question not only my involvement, but the involvement of Drizzle* in the ecosystem; I personally don't wake up every day wanting to welcome that into my life. But what was the vibe of the community this year, and that of the conference? This year the negative vibe was seen not only as ugly, but as an aberration – an evolutionary path that the ecosystem seems to not be taking. That is pretty awesome.

What are the big questions facing the ecosystem?

Oracle. I watch the MySQL trees, and I see that they are having an overall positive influence on the codebase. They are making good decisions, none of which appear to be malicious in nature. I hear from people who are using it, and I get an overall positive view of the work. The people I ask? They aren't shills trying to gain favor with Oracle; these are people who have 24x7 needs, who don't have the time to write blog entries, and who see MySQL as just one piece of their overall architecture. If you are using MySQL today, and you need a solid path forward on it as a platform? I'd stick with what Oracle is creating.

Oracle will be Oracle, though. They have a giant marketing machine that will not want, and by policy not allow, events to occur which favor a product like MySQL over other products. Oracle Open World will not be a MySQL conference. MySQL will be a track in that conference, a booth at best. Oracle will push for venues that they control. Oracle will push for users to adopt their stack, and MySQL is just another cog in their system. A vector to attack Microsoft? A product to keep at bay the growth of an open source database? It might be all of that and more, but it will not be a crown jewel. The company is too large to focus its attention on MySQL, and the money that it obtains from MySQL is not enough for it to ever take center stage. In the end? The attention span of large companies is quite small, and at some point it will fade.
Will Oracle have a MySQL Sunday again this year at Oracle World? If it does, will it have one the year after? There is nothing wrong with this; it is just the nature of large companies.

Percona. Percona is impressive. They do excellent work based on an excellent reputation which they have grown by doing the right thing. I've been asked before if they will become the next MySQL. I don't believe they will. Percona looks to be the next Electronic Data Systems. Do they have a server product? Yes. Will Percona Server be the next MySQL server? No. Is that because it is inferior? No. It is because Percona Server is about delivering on their ability to be the best at MySQL consulting. It is not going to go away, but I will be surprised if Percona decides that it is their one and only product that they service. Percona Server is an asset for them, but they show no evidence of being singularly focused on their own product.

SkySQL. SkySQL has a great feeling to it. It has the exciting feel that MySQL once had, but I see no signs of the baggage that MySQL AB gained in later years. The people they are hiring are excellent. In the MySQL world they could very easily take the dominant position in the next year.

Monty Program. I don't feel like I can really say much here, but I don't want to say anything by leaving it off the list either.

Amazon. They were a sponsor of the conference this year. They are certainly a player in the ecosystem, though for the most part a silent one. From an engineering standpoint I believe they have one hell of a challenge: how do they continue to provide MySQL services without a deep technical bench and a roadmap that will allow them to adopt new versions of MySQL? They don't shape the MySQL universe in the ways that others do. They do not provide code, and they do not influence the direction of the product in any manner that allows them to influence beyond the scope of their own service. Their service though? Amazon could be setting a stage where we see the MySQL interface solidified. If a large portion of MySQL apps are shaped by the question "will this app work in the Amazon cloud?" then they will have their say.

Are there others? There are plenty of others. Canonical and Red Hat will shape the Linux distributions, and that in turn will shape what users first see. There are players like Infobright who will shape the analytical market. Postgres continues to make progress. When I ask folks who study the market how they see Postgres, I never get a response that it is on their radar. But when I ask operations folks? There I hear about its growth. At some point an application is going to come along that will change the view of the market.

The MySQL codebase? It is GPL. Nothing has changed about that, and nothing that we are seeing, or that is talked about in private conversation, leads me to believe that is changing. There was some hubbub at the conference about Oracle removing the FLOSS exception from the codebase. There was talk that this created a situation where at any moment Oracle could change the exception and squeeze someone via a license gotcha. When it was brought up, it made me suspicious as well. The thing is? It's up on the website still, and the page has been recently updated. It has also been cached and stored by Google. Removing it from the source code doesn't mean much. It's good to be suspicious, but I suspect that the removal was simply a mistake made under a blanket policy about communication.
Oracle's open source behavior – its table manners – is haphazard. I don't believe you can expect anything else.

In the end? The MySQL ecosystem is doing just fine. There are challenges, but there have always been challenges.

*Drizzle: I leave Drizzle out of the discussion both because I feel it is inappropriate to mention it given my own involvement, and because I actively debate our involvement in the MySQL ecosystem. I'd rather push for our own environment.
Posted about 14 years ago
My very first Drizzle trunk merge: kind of silly, it's literally a one-character change for a drizzle command line shortcut. I found it in the low-hanging-fruit Drizzle bug list and figured I'd do it to get a sense for the branch -> merge process. Hopefully this is only the beginning, but we'll see how much I can get back into C++ with a day job doing something else…