News

Posted over 11 years ago
DataStax Enterprise's heavy use of Cassandra's innate datacenter concepts is important because it allows multiple workloads to be run across multiple datacenters, on near real-time data, without ETL processes or any other manual operations. In most setups, this is handled via "virtual" datacenters that follow Cassandra's internals for datacenters, while the actual hardware exists in the same physical datacenter. Over the course of this blog post, we will cover this and a couple of other use cases for multiple datacenters.

Workload Separation

By implementing datacenters as the divisions between varying workloads, DataStax Enterprise allows a natural distribution of data from real-time datacenters to near real-time analytics and search datacenters. Whenever a write comes in via a client application, it hits the main Cassandra datacenter and returns the acknowledgment at the current consistency level (typically less than LOCAL_QUORUM, to allow for high throughput and low latency). In parallel and asynchronously, these writes are sent off to the Analytics and Solr datacenters based on the replication strategy for active keyspaces. A typical replication strategy would look similar to {Cassandra: 3, Analytics: 2, Solr: 1}, depending on use cases and throughput requirements. Once these asynchronous hints are received on the additional clusters, they undergo the normal write procedures and are assimilated into that datacenter. This way, any analytics jobs that are running can easily and simply access this new data without an ETL process. For DSE's Solr nodes, these writes are introduced into the memtables and additional Solr processes are triggered to incorporate this data.

Live Backup Scenario

Some Cassandra use cases instead use different datacenters as a live backup that can quickly be used as a fallback cluster. We will cover the most common use case, on Amazon Web Services' Elastic Compute Cloud (EC2), in the following example.

Specific Use Case

Allow your application to have multiple fallback patterns across multiple consistency levels and datacenters.

Example

A client application was created and currently sends requests to EC2's US-East-1 region at a consistency level (CL) of QUORUM. Later, another datacenter is added in EC2's US-West-1 region to serve as a live backup. The replication strategy can be a full live backup ({US-East-1: 3, US-West-1: 3}) or a smaller live backup ({US-East-1: 3, US-West-1: 2}) to save costs and disk usage in this regional outage scenario. All clients continue to write to the US-East-1 nodes by ensuring that the clients' connection pools are restricted to just those nodes, to minimize cross-datacenter latency.

To implement a better cascading fallback, the client's connection pool will initially only be aware of the nodes in the US-East-1 region. In the event of client errors, all requests will retry at a CL of QUORUM, X times, then decrease to a CL of ONE while escalating the appropriate notifications. If the requests are still unsuccessful, then, using a new connection pool consisting of nodes from the US-West-1 datacenter, requests should begin contacting US-West-1 at a higher CL before ultimately dropping down to a CL of ONE. Meanwhile, any writes to US-West-1 should be asynchronously tried on US-East-1 via the client, without waiting for confirmation and instead logging any errors separately. In cases like this, natural events and other failures can be prevented from affecting your live applications.
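As a minimal sketch, the smaller live backup replication above could be expressed with the CQL3 (Cassandra 1.2+) CREATE KEYSPACE syntax roughly as follows; the keyspace name is made up, and the datacenter names must match whatever your snitch actually reports for the two regions:

CREATE KEYSPACE app_data
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'US-East-1': 3,
    'US-West-1': 2
  };

With NetworkTopologyStrategy the per-datacenter numbers are independent replica counts, so each datacenter keeps its own full copy of the data with the specified number of replicas.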
Recovery

As long as the original datacenter is restored within gc_grace_seconds (10 days by default), perform a rolling repair (without the -pr option) on all of its nodes once they come back online. If, however, the nodes will only come back up and complete their repairs after gc_grace_seconds has passed, you will need to take the following steps to ensure that deleted records are not reinstated: remove all the offending nodes from the ring using `nodetool removetoken`, clear all their data (data directories, commit logs, snapshots, and system logs), decrement all tokens in this region, disable auto_bootstrap, start up the entire region, and finally run a rolling repair (without the -pr option) on all nodes in the other region. After these nodes are up to date, you can restart your applications and continue using your primary datacenter.

Geographical Location Scenario

There are certain use cases where data should be housed in different datacenters depending on the user's location, in order to provide more responsive exchanges. The use case we will cover refers to datacenters in different countries, but the same logic and procedures apply to datacenters in different regions. The logic that defines which datacenter a user will connect to resides in the application code.

Specific Use Case

Have users connect to datacenters based on their geographic location, but ensure this data is available cluster-wide for backup, for analytics, and to account for users traveling across regions.

Example

The required end result is for users in the US to contact one datacenter while UK users contact another, to lower end-user latency. An additional requirement is for both of these datacenters to be part of the same cluster, to lower operational costs. This can be handled using the following rules: when reading and writing from each datacenter, ensure that the clients the users connect to can only see one datacenter, based on the list of IPs provided to the client. If doing reads at QUORUM, ensure that LOCAL_QUORUM is being used and not EACH_QUORUM, since the latter's latency will affect the end user's performance experience (see the cqlsh sketch after the caveats below). Depending on how consistent you want your datacenters to be, you may choose to run repair operations (without the -pr option) more frequently than the required once per gc_grace_seconds.

Benefits

When setting up clusters in this manner, you ensure faster performance for each end user. You also gain multi-region live backups for free, just as mentioned in the section above; granted, the performance of requests across the US and UK will not be as fast, but your application does not have to come to a complete standstill in the event of catastrophic losses. Finally, users can travel across regions, and in the time taken to travel, the user's information should have finished replicating asynchronously across regions.

Caveats

Always use the NetworkTopologyStrategy when using multiple datacenters. SimpleStrategy will work across datacenters, but it has the following disadvantages: it is harder to maintain, it does not provide any of the features mentioned above, and it introduces latency on each write (depending on the distance and latency between datacenters). Defining one rack for the entire cluster is the simplest and most common implementation. Multiple racks should be avoided for the following reasons. Most users tend to ignore or forget the rack requirement that racks should be in an alternating order to allow the data to be distributed safely and appropriately.
Many users also do not use the rack information effectively, for instance by using a setup with as many racks as they have nodes, or similar non-beneficial scenarios. When using racks correctly, each rack should typically have the same number of nodes. And in a scenario that requires a cluster expansion while using racks, the expansion procedure can be tedious, since it typically involves several node moves and has to ensure that racks will be distributing data correctly and evenly. At times when clusters need immediate expansion, racks should be the last thing to worry about.
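To illustrate the LOCAL_QUORUM advice from the geographical example above, here is a rough cqlsh sketch, assuming a user_data keyspace replicated to both the US and UK datacenters with NetworkTopologyStrategy; the keyspace, table, and column names are made up, and in a real application the consistency level would normally be set on the client driver rather than in cqlsh:

-- only wait on a quorum of replicas in the local datacenter;
-- EACH_QUORUM would wait on a quorum in every datacenter and add cross-region latency
CONSISTENCY LOCAL_QUORUM;
SELECT first_name, last_name FROM user_data.profiles WHERE user_id = 'tbomba';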
Posted over 11 years ago
Schema design is the cornerstone of making awesome products. If you do not understand your data, do not understand what users need, and do not understand the limitations of hardware and software, you cannot effectively design a schema. To understand schema design in Cassandra, I think you should start with what Cassandra is. From http://wiki.apache.org/cassandra/:

Cassandra is a highly scalable, eventually consistent, distributed, structured key-value store. Cassandra brings together the distributed systems technologies from Dynamo and the data model from Google's BigTable. Like Dynamo, Cassandra is eventually consistent. Like BigTable, Cassandra provides a ColumnFamily-based data model richer than typical key/value systems.

And from http://research.google.com/archive/bigtable.html:

A Bigtable is a sparse, distributed, persistent multi-dimensional sorted map. The map is indexed by a row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes.

(row:string, column:string, time:int64) -> string

Table *T = OpenOrDie("/bigtable/web/webtable");
RowMutation r1(T, "com.cnn.www");
r1.Set("anchor:www.c-span.org", "CNN");
r1.Delete("anchor:www.abc.com");
Operation op;
Apply(&op, &r1);
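For comparison, here is a rough, non-authoritative sketch of how the webtable example above could be expressed as a Cassandra wide-row table in CQL3; the table and column names are made up, and the BigTable timestamp dimension is left out (Cassandra keeps a write timestamp per cell internally):

CREATE TABLE webtable (
  row_key text,       -- e.g. 'com.cnn.www'
  column_name text,   -- e.g. 'anchor:www.c-span.org'
  value text,
  PRIMARY KEY (row_key, column_name)
);

INSERT INTO webtable (row_key, column_name, value) VALUES ('com.cnn.www', 'anchor:www.c-span.org', 'CNN');
DELETE FROM webtable WHERE row_key = 'com.cnn.www' AND column_name = 'anchor:www.abc.com';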
Posted over 11 years ago
One of the interesting subplots of "Hadoop Strata World" aka "Cassandra Denial World" was the battle for mindshare over who the HDCIC (Head Dremel Clone In Charge) is. The two contenders are https://github.com/cloudera/impal and Apache Drill (of course it's hard to say Drill is a contender, since it is nothing but a proposal). Both are attempting to produce the best Dremel clone.

If I had to sum up Dremel, I would sum it up in one sentence: "Make protobufs query-able". Dremel does some clever stuff to break row-oriented protobufs into a columnar form and store it in BigTable (for optimization on the read side as a more real-time query system). I had not read the Dremel paper in a while, but after all the hoopla it was getting I decided to read it again. I came to a funny conclusion: there is another open source Dremel implementation. Like many of my "taco bell" exploits, my system does not require anything other than Hadoop and Hive. I know, I'm sorry, my system does not have 17 components and it's not 5,000,000 lines of source code. So it's not sexy, I won't have 50 people in my company retweeting and calling it the most innovative thing in the last thousand years, and I will not be unveiling it at a super huge conference with rap stars or anything like that.

Anyway, here it is:
1) You write protobuf to a sequence file
2) You make a hive table declaring what protobuf objects are in the file
3) The protobuf is converted to nested struct types
4) You query the hive table using familiar hive syntax and lateral views

Let us see how this works by making some protobuf objects. Cars make great examples.

message TireMaker {
  optional string maker = 1;
  optional int32 price = 2;
}
message Tire {
  optional int32 tirePressure = 1;
  optional TireMaker tireMaker = 2;
}
message Accessory {
  optional string name = 1;
  optional string value = 2;
}
message Car {
  repeated Tire tires = 1;
  repeated Accessory accessories = 2;
}

Now write those to a sequence file.

SequenceFile.Writer w = SequenceFile.createWriter(this.getFileSystem(),
    new Configuration(), p, BytesWritable.class, BytesWritable.class);
TireMaker.Builder maker = TireMaker.newBuilder();
maker.setMaker("badyear");
Car.Builder car = Car.newBuilder();
Tire.Builder tire = Tire.newBuilder();
car.addTires(tire.setTireMaker(maker.build()).build());
BytesWritable key = new BytesWritable();
BytesWritable value = new BytesWritable();
ByteArrayOutputStream s = new ByteArrayOutputStream();
car.build().writeTo(s);
ByteArrayOutputStream t = new ByteArrayOutputStream();
key.set(s.toByteArray(), 0, s.size());
value.set(t.toByteArray(), 0, t.size());
w.append(key, value);
w.close();

Then we create a hive table on top of this sequence file.

client.execute("create table " + table + " "
    + " ROW FORMAT SERDE '" + ProtobufDeserializer.class.getName() + "'"
    + " WITH SERDEPROPERTIES ('KEY_SERIALIZE_CLASS'='" + Ex.Car.class.getName() + "' )"
    + " STORED AS INPUTFORMAT '" + KVAsVSeqFileBinaryInputFormat.class.getName() + "'"
    + " OUTPUTFORMAT '" + SequenceFileOutputFormat.class.getName() + "'");
client.execute("load data local inpath '" + p.toString() + "' into table " + table + " ");

Then we query it.
client.execute("SELECT key FROM " + table);
List<String> results = client.fetchAll();
String expected = "{\"accessoriescount\":0,\"accessorieslist\":[],\"tirescount\":1,\"tireslist\":[{\"tiremaker\":{\"maker\":\"badyear\",\"price\":null},\"tirepressure\":null}]}";

Wow, look at that! A system to query protobuf objects directly! Why is this nice? Well, it works. You can query it. You do not need a stack of 40 components to make it work. It's not a work in progress.

Dremel and its clones "normalize" the object on insert by writing it to multiple tables/column families/what have you. Hive-protobuf takes the other approach: we write the entire row as-is to Hadoop. This is a trade-off we are all familiar with; we are trading insertion speed for query speed. Both approaches have merit, and possibly both can be used together.

The inspiration for hive-protobuf came from the hive-avro work recently added to hive trunk, which one could argue is another Dremel clone since it lets you query complex structured avro docs.
Posted over 11 years ago
DataStax Community Edition 1.1.6, which includes Apache Cassandra 1.1.6, is now available on our downloads page. All changes can be found in the CHANGES.txt file in the main installation directory.
Posted over 11 years ago
Forewords

This guide only describes version 3 of the CQL language. To avoid confusion, we will try to use the term CQL3 rather than CQL most of the time, but even when we don't, this is implicit. Furthermore, CQL3 is beta in 1.1 (the language itself is beta, not just its implementation) and will only be final in Cassandra 1.2 (itself in beta at the time of this writing), and there are a few breaking syntax changes and added features between CQL3 beta and its final version. This guide describes the final version of CQL3 and in particular may describe or use features that are not available in Cassandra 1.1.

Introduction

CQL3 (the Cassandra Query Language) provides a new API to work with Cassandra. Where the legacy thrift API exposes the internal storage structure of Cassandra pretty much directly, CQL3 provides a thin abstraction layer over this internal structure. This is A Good Thing, as it allows hiding from the API a number of distracting and useless implementation details (such as range ghosts) and allows providing native syntaxes for common encodings/idioms (like the CQL3 collections, as we'll discuss below), instead of letting each client or client library reimplement them in their own, different and thus incompatible, way.

However, the fact that CQL3 provides a thin layer of abstraction means that thrift users will have to understand the basics of this abstraction if they want to move existing applications to CQL3. This is what this post tries to address: it explains how to translate thrift to CQL3. In doing so, this post also explains the basics of the implementation of CQL3 and can thus be of interest to those who want to understand that.

But before getting to the crux of the matter, let us have a word about when one should use CQL3. As described above, we believe that CQL3 is a simpler and overall better API for Cassandra than the thrift API is. Therefore, new projects/applications are encouraged to use CQL3 (though remember that CQL3 is not final yet, and so this statement will only be fully valid with Cassandra 1.2). But the thrift API is not going anywhere. Existing applications do not have to upgrade to CQL3. Internally, both CQL3 and thrift use the same storage engine, so all future improvements to this engine will impact both of them equally. Thus, this guide is for those that 1) have an existing application using thrift and 2) want to port it to CQL3.

Finally, let us remark that CQL3 does not claim to fundamentally change how to model applications for Cassandra. The main modeling principles are the same as they have always been: efficient modeling is still based on collocating data that is accessed together through denormalization and on the use of the ordering the storage engine provides, and is thus largely driven by the queries. If anything, CQL3 claims to make it easier to model along those principles by providing a simpler and more intuitive syntax to implement a number of idioms that this kind of modeling requires.

Vocabulary used in this post

The thrift API has historically confused people coming from the relational world because it uses the terms "rows" and "columns", but with a different meaning than in SQL. CQL3 fixes that: in the model it exposes, rows and columns have the same meaning as in SQL.
We believe this to be an improvement for newcomers, but unfortunately, in doing so, it creates some temporary confusion when you want to switch from thrift to CQL3, as a "thrift" row doesn't always map to a "CQL3" row, and a "CQL3" column doesn't always map to a "thrift" column. To avoid that confusion, we will use the following conventions (which have already been used in previous posts too):

We will always say "internal row" when we want to talk of a row as in thrift. We use the term "internal" because this corresponds to the definition of a row in the internal implementation (which thrift exposes directly). Hence the term "row" alone will describe a CQL3 row, though we'll sometimes use "CQL3 row" to recall that fact. We will use the term "cell" instead of "column" for thrift/internal columns, and thus "column" (or "CQL3 column") will always designate a CQL3 column. We will prefer the term "column family" for thrift and the term "table" for CQL3, though both terms can be considered synonymous.

To sum up, for the purpose of this document, an internal row contains cells while a CQL3 row contains columns, and these two notions do not always map directly; this document will explain when these concepts match and when they do not.

Standard column families

Static Column family

In thrift, a static column family is one where each internal row will have more or less the same set of cell names, and that set is finite. A typical example is user profiles. There is a finite number of properties (though those may evolve over time) in the profiles the application uses, and each concrete profile will have some subset of those properties. Such a static column family would typically be defined in thrift (using the cli) by the following definition (to which we will refer later as definition 1):

create column family user_profiles
  with key_validation_class = UTF8Type
  and comparator = UTF8Type
  and column_metadata = [
    {column_name: first_name, validation_class: UTF8Type},
    {column_name: last_name, validation_class: UTF8Type},
    {column_name: email, validation_class: UTF8Type},
    {column_name: year_of_birth, validation_class: IntegerType}
  ]

Each profile will then be stored internally as a single internal row keyed by the user id, with one cell per property that is set. The functionally equivalent definition for such a column family in CQL3 is (definition 2):

CREATE TABLE user_profiles (
  user_id text PRIMARY KEY,
  first_name text,
  last_name text,
  email text,
  year_of_birth int
) WITH COMPACT STORAGE

This CQL definition will store data in exactly the same way as the thrift definition above (we will come back to the reason for the use of the WITH COMPACT STORAGE option later, but suffice to say that it is required for the previous statement to be true). So for static column families, an internal/thrift row does correspond exactly to a CQL3 row. But while each thrift cell has a corresponding CQL3 column, CQL3 defines one column (user_id) that doesn't map to a cell, but rather maps to the thrift row key (and obviously the PRIMARY KEY annotation is what makes that happen).

Now, the attentive reader may have noticed that the CQL definition above has a little more information than the thrift one had. Namely, the CQL definition provides a name for the row key (i.e. user_id) that doesn't exist in the thrift definition. In other words, CQL3 uses more metadata than thrift, and that metadata won't be there at first if you try to access the thrift column family created with definition 1. The way CQL3 deals with this is twofold: CQL3 picks a default name for the row key if none exists.
This will be the case if you try to access (from CQL3) the column family created with definition 1. For the row key, that default name is key. In other words, definition 1 is strictly equivalent to the following CQL definition:

CREATE TABLE user_profiles (
  key text PRIMARY KEY,
  first_name text,
  last_name text,
  email text,
  year_of_birth int
) WITH COMPACT STORAGE

And by equivalent, we mean that if you create user_profiles using definition 1, then from cqlsh you will have:

cqlsh:test> SELECT * FROM user_profiles;

 key    | email         | first_name | last_name | year_of_birth
--------+---------------+------------+-----------+---------------
 tbomba | [email protected] | Tom        | Bombadil  |          1954

If you want to declare a more user friendly name than the default one, you can do so using:

ALTER TABLE user_profiles RENAME key TO user_id;

Such a statement declares the missing metadata to CQL3. It has no impact on the thrift side, but after that statement you will be able to access the table from CQL3 as if it had been declared using definition 2 above.

Dynamic Column family

A dynamic column family (or column family with wide rows) is one where each internal row may contain completely different sets of cells. The typical example of such a column family is time series, for example keeping, for each user, a timeline of all the links on which the user has clicked. Such a column family would be defined in thrift by (definition 3):

create column family clicks
  with key_validation_class = UTF8Type
  and comparator = DateType
  and default_validation_class = UTF8Type

For a given user, all the urls clicked by that user will be stored in a single internal row. And because internal rows are sorted according to the comparator, and because the comparator in this case is DateType, the clicks will be in chronological order, allowing very efficient requests for all the clicks of a given user in a given time interval. In CQL3, the functionally equivalent definition would be (definition 4):

CREATE TABLE clicks (
  user_id text,
  time timestamp,
  url text,
  PRIMARY KEY (user_id, time)
) WITH COMPACT STORAGE

That definition will store data in exactly the same way as definition 3 does. The difference with the static case is the compound primary key. The way CQL3 works, it will map the first component (user_id) of the primary key to the internal row key and the second component (time) to the internal cell name. And the last CQL3 column (url) will be mapped to the cell value. That is how CQL3 allows access to wide rows: by transposing one internal wide row into multiple CQL3 rows, one per cell of the wide row. This is however just a different way to view the same information.

Now, as in the static case, this definition has more information than its thrift counterpart (definition 3), as it provides user friendly names for the row key (user_id), the cell names (time) and the cell values (url). And in that case too, CQL will pick default names if those don't exist: key for the row key as before, column1 for the cell name, and value for the cell value. In other words, the clicks column family defined in thrift by definition 3 is in fact strictly equivalent to the following CQL3 table:

CREATE TABLE clicks (
  key text,
  column1 timestamp,
  value text,
  PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE

Again, "equivalent" means that you can access the column family created from thrift in CQL3 as if it had been defined with the preceding definition.
So for instance, you can retrieve a time slice of clicks (making use of the internal sorting of wide rows) with:

cqlsh:test> SELECT column1, value FROM clicks WHERE key = 'tbomba' AND column1 >= '2012-10-25 14:31:00' AND column1 < '2012-10-25 18:00:00';

 column1                  | value
--------------------------+-------------------------
 2012-10-25 14:33:14+0000 | http://www.amazon.fr
 2012-10-25 17:47:05+0000 | http://www.datastax.com

And this query will internally translate to a simple get_slice call. But of course you can define more user friendly names here again:

ALTER TABLE clicks RENAME key TO user_id AND column1 TO time AND value TO url;

after which the query above can be written as:

cqlsh:test> SELECT time, url FROM clicks WHERE user_id = 'tbomba' AND time >= '2012-10-25 14:31:00' AND time < '2012-10-25 18:00:00';

 time                     | url
--------------------------+-------------------------
 2012-10-25 14:33:14+0000 | http://www.amazon.fr
 2012-10-25 17:47:05+0000 | http://www.datastax.com

Mixing static and dynamic

In the majority of cases, column families are used in either the static or the dynamic way described above, both of which CQL3 handles natively as we've seen. In a few cases however, it can be useful to have a bit of dynamicity in otherwise static column families. A typical example is if you want to add tags to our user_profiles example above. There are two main ways to model this. You can create a separate, dynamic tags column family that stores, for each user, its tags; in that case, this column family can simply be accessed through CQL3 as explained in the preceding section. Or you can directly add the tags to the user_profiles column family. That second technique has the advantage that reading an entire user profile with its tags requires only one read instead of the two required by the two-column-family solution.

Technically, if you want to add the tags 'good guy' and 'friendly' to a user profile, you would insert a cell named 'tag:good guy' and one named 'tag:friendly' (both with empty values) into that user profile; internally, the profile would then contain these tag cells alongside the declared property cells.

As explained in a previous blog post, CQL3 provides a native way to perform that kind of technique with its support for collections. In fact, CQL3 sets are implemented with exactly the same technique as the tag example above: each element of the set is internally a separate cell with an empty value. The only difference is that with thrift you have to prepend the string 'tag:' to each tag cell manually, and regroup all tag cells together on read, while CQL3 handles all of that transparently for you when you use a set. Note that this is why "exposing collections to Thrift" doesn't really make sense. Collections are exposed to Thrift; it's just that Thrift directly exposes the internal storage engine and thus exposes each element of the collection in its bare format. Another way to put it is that collections in CQL3 are just syntactic sugar for a (very) useful pattern, and such sugar can only be provided because the API is an abstraction over the storage engine.

So, people starting a new project directly in CQL3 should use collections when they need to mix static and dynamic behavior. Collections are a better interface for such things than doing it manually à la Thrift. However, if you have used something similar to the thrift tag example above, the upgrade to CQL3 will be less straightforward than in the case of pure static or pure dynamic column families.
Namely, CQL3 still considers the column family static because the definition of user_profiles is still the one of definition 1. But since the tag columns themselves are not declared (and cannot be, since they are dynamically created), you will not be able to access them from CQL3 by default. The only solution to be able to access the column family fully is to remove the declared columns from the thrift schema altogether, i.e. to update the thrift schema with:

update column family user_profiles
  with key_validation_class = UTF8Type
  and comparator = UTF8Type
  and column_metadata = []

Once you have done this, you will be able to access the column family as in the dynamic case, and so, after that update, you will have:

cqlsh> SELECT * FROM user_profiles WHERE key = 'tbomba';

 key    | column1       | value
--------+---------------+---------------
 tbomba | email         | [email protected]
 tbomba | first_name    | Tom
 tbomba | last_name     | Bombadil
 tbomba | year_of_birth | \x07\xa2

This does have some downsides however. From an API perspective, a user profile will be exposed as multiple CQL rows (again, this is only the view the API exposes; internally the layout hasn't changed). This is mainly cosmetic, but it is arguably uglier than having one CQL row per profile. Also, some server side validation of the static column values has been lost in the operation. Since the static column value validation types have been dropped, they are not available to your client library anymore. In particular, as can be seen in the output above, cqlsh displays some values in a non human-readable format. And unless the client library exposes an easy way to force the deserialization format for a value, such deserialization will have to be done manually in client code.

It is clear that upgrading to CQL3 from thrift is a tad inconvenient if you have lots of existing column families that mix static and dynamic behavior (in the same column family). This is unfortunate, but let's recall that Thrift isn't going anywhere. What CQL3 brings is a much more user-friendly API, but CQL3 doesn't do anything that is fundamentally unachievable with thrift. Therefore, if you do have a lot of existing column families in which you mix static and dynamic behavior, it might be simpler to stick to thrift for those (you won't be worse off than if CQL3 did not exist) and keep CQL3 for new applications/projects. If you really want to upgrade to CQL3 and can afford it, a data migration from your existing column families to a standard (non-compact, see below) CQL3 table with the use of collections is obviously always possible.

Composites

There is one sub-case of dynamic column families that is worth discussing: the case where the comparator is a CompositeType. Say for instance that you need to store some events for users and you are interested in grouping said events by day (the day number of the year) and, for each day, by the minute within the day at which the event occurred. This is a basic time series example, so you could model it as the clicks column family from earlier on, but allow me this slightly artificial example and say you want to separate the day number from the minute within that day. You might define the following thrift column family:

create column family events
  with key_validation_class = UTF8Type
  and comparator = 'CompositeType(IntegerType, IntegerType)'
  and default_validation_class = UTF8Type

For one user, the events will then be stored internally in a single row whose cell names are composites such as 4:120, where 4:120 represents the composite cell name composed of the two components 4 and 120.
If you query this column family from CQL3, you will get:

cqlsh:test> SELECT * FROM events;

 key    | column1 | column2 | value
--------+---------+---------+---------
 tbomba |       4 |     120 | event 1
 tbomba |       4 |    2500 | event 2
 tbomba |       9 |     521 | event 3
 tbomba |      10 |    3525 | event 4

In other words, as far as CQL3 is concerned, this column family is equivalent to:

CREATE TABLE events (
  key text,
  column1 int,
  column2 int,
  value text,
  PRIMARY KEY (key, column1, column2)
) WITH COMPACT STORAGE

As you can see, the composite type is handled natively by CQL3, which maps one CQL3 column to each component of the composite cell name. Of course, as in the previous examples, you can rename those default names to more user friendly ones:

cqlsh:test> ALTER TABLE events RENAME key TO id AND column1 TO day AND column2 TO minute_in_day AND value TO event;
cqlsh:test> SELECT * FROM events;

 id     | day | minute_in_day | event
--------+-----+---------------+---------
 tbomba |   4 |           120 | event 1
 tbomba |   4 |          2500 | event 2
 tbomba |   9 |           521 | event 3
 tbomba |  10 |          3525 | event 4

Do note that in this case it is advised to rename all the columns in one ALTER TABLE statement, as done above. Due to a technical limitation, if you rename the columns one by one, you could end up in a wrong situation.

Non compact tables

All the CQL3 definitions above have used the COMPACT STORAGE option. Indeed, column families created from Thrift will always map to a so-called compact table. To explain how non-compact CQL3 tables differ from compact ones, consider the following non-compact CQL3 table:

CREATE TABLE comments (
  article_id uuid,
  posted_at timestamp,
  author text,
  karma int,
  content text,
  PRIMARY KEY (article_id, posted_at)
)

This table stores comments made on articles. Each comment is a CQL row that is identified by the article_id of the article it comments on and the time at which it was posted, and each such comment is composed of an author name, a content text and a 'karma' integer (a value representing how much users like this comment). As in the dynamic column family example from the previous section, we have a composite primary key, and here again article_id will be mapped to the internal row key and posted_at will be mapped to (the first part of) the cell name. However, in the previous section, the definition had just one CQL3 column not part of the primary key (and it would have been a bug to declare more than one column, due to the use of compact storage), which was mapped internally to the cell value. But here we have 3 CQL3 columns that are not part of the primary key. The way this is handled is that internally this comments table will use a CompositeType comparator whose first component will be mapped to posted_at, while the second one will be the name of the column the cell represents. That is, for a given article, each comment will be stored internally as a small slice of consecutive cells whose names combine the comment's posted_at value with one of the column names (author, content, karma).

So this comments table uses wide rows internally, but each CQL3 row actually maps to a small slice of internal cells (note that in that internal layout, the first "empty" cell of each CQL3 row's slice is not a mistake, but is there for implementation details). In other words, non-compact CQL3 tables map rows to static slices within wide, ordered internal rows. This is very useful for things like the materialized views that are common in Cassandra.
And yes, this is very similar to what super columns give you in thrift (see below for a discussion of super columns), but it is somewhat better than super columns in that, in this example, some of the CQL3 columns are text (author, content) while others are integers (karma), which you cannot do with super columns (you would have to use BytesType as the sub-comparator or store the karma integer as a string). This also allows updating and deleting each individual column, since they are broken out into individual storage engine cells.

But what about truly static tables? How does definition 2 from the first section (which uses COMPACT STORAGE) differ from:

CREATE TABLE user_profiles (
  user_id text PRIMARY KEY,
  first_name text,
  last_name text,
  email text,
  year_of_birth int
)

i.e. the same definition but without the COMPACT STORAGE option? The difference is that the definition above will internally use a CompositeType comparator with a single UTF8Type component, instead of a UTF8Type comparator. This may seem wasteful (technically, the use of CompositeType adds 2 bytes of overhead per internal cell, hence the compact/non-compact terms), but the reason for it is collections support. Internally, collections require the use of CompositeType. In other words, with the definition above you can do:

ALTER TABLE user_profiles ADD tags set<text>

but you cannot do so if COMPACT STORAGE is used. Please do note that while non-compact tables are less "compact" internally, we highly suggest using them for new developments. The extra possibility of being able to evolve your table with the use of collections later (even if you don't think you will need it at first) is worth the slight storage overhead -- an overhead that is further diminished by sstable compression, which is enabled by default since Cassandra 1.1.0.

A word on super columns

As we've seen in the previous section, CQL3 natively supports the same use cases that super columns are able to support, and the CQL3 way has fewer limitations than super columns have. However, existing super column families cannot, at the time of this writing, be accessed from CQL3. CASSANDRA-3237 is open to internally replace super columns with an equivalent encoding using composite columns. That encoding will be the encoding of the non-compact CQL3 tables described in the previous section. Hence, as soon as that patch lands, super columns will be fully available through CQL3 as normal CQL3 tables. This is on the roadmap for Cassandra 1.3.
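As a short, hedged recap of the collections discussion above, here is what a non-compact user_profiles table with a tags set could look like once final CQL3 collections are available; the column set simply mirrors the examples above, and the tag values are the ones from the tag example:

CREATE TABLE user_profiles (
  user_id text PRIMARY KEY,
  first_name text,
  last_name text,
  email text,
  year_of_birth int,
  tags set<text>
);

-- adding tags to a profile; each element becomes its own internal cell with an empty value
UPDATE user_profiles SET tags = tags + { 'good guy', 'friendly' } WHERE user_id = 'tbomba';

Reading the profile back with a SELECT then returns the tags as a single set-valued column rather than as separate cells.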
Posted over 11 years ago
This article is one in a series of quick-hit interviews with companies using Apache Cassandra and/or DataStax Enterprise for key parts of their business. For this interview, we spoke with Gary Ogasawara, who is VP of Engineering at Cloudian.

DataStax: Gary, we know you're busy and we appreciate you taking the time to tell us what you guys are doing at Cloudian.

Gary: Sure.