Friday, March 2, 2012

Cassandra Sample Application

Book Review and Excerpt
Cassandra: The Definitive Guide by Eben Hewitt
Chapter 4


My previous post on Cassandra gave you an overview of Cassandra’s design goals, data model, and some general behavior characteristics.

Let's write some code now!

The book's sample application was tested against the Cassandra 0.7 beta 1 release. I strongly recommend using this version to get started, unless you are eager to rewrite the sample application for Cassandra 1.0.7 or later.

Please note that the author of Cassandra: The Definitive Guide has posted the book's code online.

For this example, we'll use a hotel that wants to let guests make reservations.

Unlike relational modeling, in Cassandra you don’t start your application with the data model; you start with the query model.
So first, determine your queries:
• Find hotels in a given area.
• Find information about a given hotel, such as its name and location.
• Find points of interest near a given hotel.
• Find an available room in a given date range.
• Find the rate and amenities for a room.
• Book the selected room by entering guest information.
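To make the query-first idea concrete, here is a minimal plain-Java sketch (not the book's code) that serves the first query, "find hotels in a given area". The class `QueryFirstSketch` is a hypothetical in-memory stand-in for the HotelByCity column family: each row key maps to hotel names kept in sorted order. The `City:STATE` row key format is borrowed from the book's `findHotelByCity` method; everything else is an illustrative assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// HYPOTHETICAL in-memory stand-in for the HotelByCity column family,
// used only to illustrate query-first design.
class QueryFirstSketch {
    // row key "City:STATE" -> column names (hotel names) kept in sorted order;
    // as in HotelByCity, the column name is the data and the value is empty
    private final Map<String, TreeMap<String, byte[]>> hotelByCity = new HashMap<>();

    // same key format as the book's findHotelByCity
    private static String rowKey(String city, String state) {
        return city + ":" + state.toUpperCase();
    }

    // write path: saving a hotel also writes the row the area query will read
    void indexHotel(String city, String state, String hotelName) {
        hotelByCity.computeIfAbsent(rowKey(city, state), k -> new TreeMap<>())
                .put(hotelName, new byte[0]);
    }

    // read path: "find hotels in a given area" is a single row lookup
    List<String> findHotelsInArea(String city, String state) {
        TreeMap<String, byte[]> row = hotelByCity.get(rowKey(city, state));
        if (row == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(row.keySet());
    }
}
```

The write path does extra work (one more row per hotel) so that the read path is a single lookup; that trade replaces the join you would write in a relational model.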

Cassandra Design


 

Loading the schema 
Put your schema definition in cassandra.yaml:
 

keyspaces:
    - name: Hotelier
      replica_placement_strategy: org.apache.cassandra.locator.RackUnawareStrategy
      replication_factor: 1
      column_families:
        - name: Hotel
          compare_with: UTF8Type
        - name: HotelByCity
          compare_with: UTF8Type
        - name: Guest
          compare_with: BytesType
        - name: Reservation
          compare_with: TimeUUIDType
        - name: PointOfInterest
          column_type: Super
          compare_with: UTF8Type
          compare_subcolumns_with: UTF8Type
        - name: Room
          column_type: Super
          compare_with: BytesType
          compare_subcolumns_with: BytesType
        - name: RoomAvailability
          column_type: Super
          compare_with: BytesType
          compare_subcolumns_with: BytesType
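The `compare_with` setting decides the order in which a row's columns are stored and returned by slices. As a rough approximation in plain Java (not Cassandra's comparator classes), a BytesType-like comparator orders column names by their raw bytes, reading each byte as unsigned. The hypothetical `ComparatorSketch` shows that a non-ASCII name such as "é" (UTF-8 bytes 0xC3 0xA9) sorts after every ASCII name.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;

// HYPOTHETICAL approximation of how a BytesType column family orders
// column names; not Cassandra's own comparator code.
class ComparatorSketch {
    // lexicographic comparison of raw bytes, each byte read as unsigned (0..255);
    // when one array is a prefix of the other, the shorter one sorts first
    static final Comparator<byte[]> UNSIGNED_BYTES = (a, b) -> {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    };

    // stores names as UTF-8 column names in a comparator-ordered row and
    // returns them in the order a full-row slice would return them
    static List<String> sortedColumnNames(List<String> names) {
        TreeMap<byte[], String> row = new TreeMap<>(UNSIGNED_BYTES);
        for (String name : names) {
            row.put(name.getBytes(StandardCharsets.UTF_8), name);
        }
        return new ArrayList<>(row.values());
    }
}
```

Java's `byte` is signed, so comparing bytes directly would put 0x80..0xFF before 0x00..0x7F; masking with `& 0xFF` is what produces unsigned order.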


Once you have the schema defined in YAML, you need to load it:
• Open a console.
• Start the jconsole application.
• Connect to Cassandra via JMX.
• Execute the operation loadSchemaFromYAML, which is part of the org.apache.cassandra.service.StorageService MBean.
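The same operation can also be invoked from code through the standard `javax.management` API instead of jconsole. This is a minimal sketch resting on two assumptions you should verify for your install: that Cassandra 0.7 listens for JMX on port 8080 (`JMX_PORT` in `conf/cassandra-env.sh`), and that the StorageService MBean is registered as `org.apache.cassandra.db:type=StorageService` (jconsole's MBeans tab shows the exact name). The class `LoadSchemaSketch` is hypothetical.

```java
import java.net.MalformedURLException;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// HYPOTHETICAL helper: calls loadSchemaFromYAML over JMX, as jconsole does.
class LoadSchemaSketch {
    // ASSUMPTION: Cassandra 0.7's default JMX port (JMX_PORT in cassandra-env.sh)
    static final int DEFAULT_JMX_PORT = 8080;

    // ASSUMPTION: the name StorageService is registered under; confirm in jconsole
    static final String STORAGE_SERVICE = "org.apache.cassandra.db:type=StorageService";

    // the usual RMI connector URL for a host:port JMX endpoint
    static JMXServiceURL serviceUrl(String host, int port) {
        try {
            return new JMXServiceURL("service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("bad host or port: " + host + ":" + port, e);
        }
    }

    static ObjectName storageServiceName() {
        try {
            return new ObjectName(STORAGE_SERVICE);
        } catch (MalformedObjectNameException e) {
            throw new IllegalStateException(e);
        }
    }

    // connects, invokes the no-argument operation, and closes the connection
    static void loadSchemaFromYaml(String host, int port) throws Exception {
        try (JMXConnector connector = JMXConnectorFactory.connect(serviceUrl(host, port))) {
            MBeanServerConnection mbeans = connector.getMBeanServerConnection();
            mbeans.invoke(storageServiceName(), "loadSchemaFromYAML", new Object[0], new String[0]);
        }
    }
}
```

Usage would be a single call such as `LoadSchemaSketch.loadSchemaFromYaml("localhost", LoadSchemaSketch.DEFAULT_JMX_PORT)` against a running node.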
 

Now Cassandra knows about your schema and you can start using it. You can also use the API itself to create keyspaces and column families.

Getting a Connection
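import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.InvalidRequestException;
import org.apache.thrift.TException;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.protocol.TProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;
import org.apache.thrift.transport.TTransportException;

// raw socket to Cassandra's Thrift port (9160 by default)
TTransport tr = new TSocket("localhost", 9160);

// returns a new connection to our keyspace
public Cassandra.Client connect() throws TTransportException,
        TException, InvalidRequestException {
    TFramedTransport tf = new TFramedTransport(tr);
    TProtocol proto = new TBinaryProtocol(tf);
    Cassandra.Client client = new Cassandra.Client(proto);
    tr.open();
    client.set_keyspace(KEYSPACE);
    return client;
}

[...]
// close the socket when you are done with the client
tr.close();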
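The connection wraps the socket in `TFramedTransport`. Framed transport sends each Thrift message as a 4-byte big-endian length followed by the message bytes, so client and server must use the same transport; otherwise each side misreads the other's bytes. A stdlib-only sketch of that frame layout, using a hypothetical `FramingSketch` class rather than Thrift's implementation:

```java
import java.nio.ByteBuffer;

// HYPOTHETICAL illustration of the framed-transport wire layout; not Thrift's code.
class FramingSketch {
    // one frame = 4-byte big-endian payload length, then the payload bytes
    static byte[] frame(byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(4 + payload.length); // ByteBuffer is big-endian by default
        buf.putInt(payload.length);
        buf.put(payload);
        return buf.array();
    }

    // reads the length prefix and returns exactly that many payload bytes
    static byte[] unframe(byte[] frame) {
        ByteBuffer buf = ByteBuffer.wrap(frame);
        if (buf.remaining() < 4) {
            throw new IllegalArgumentException("missing length prefix");
        }
        int length = buf.getInt();
        if (length < 0 || length > buf.remaining()) {
            throw new IllegalArgumentException("truncated frame");
        }
        byte[] payload = new byte[length];
        buf.get(payload);
        return payload;
    }
}
```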

Prepopulating the Database 

// Insert in Cassandra
// insertByCityIndex(String rowKey, String hotelName)
// note: nanoTime() has an arbitrary origin; microseconds since the epoch
// (System.currentTimeMillis() * 1000) is the usual timestamp convention
Clock clock = new Clock(System.nanoTime());
Column nameCol = new Column(hotelName.getBytes(UTF8), new byte[0], clock);
ColumnOrSuperColumn nameCosc = new ColumnOrSuperColumn();
nameCosc.column = nameCol;
Mutation nameMut = new Mutation();
nameMut.column_or_supercolumn = nameCosc;
// set up the batch (built here for illustration; the write below uses client.insert)
Map<String, Map<String, List<Mutation>>> mutationMap =
    new HashMap<String, Map<String, List<Mutation>>>();
Map<String, List<Mutation>> muts =
    new HashMap<String, List<Mutation>>();
List<Mutation> cols = new ArrayList<Mutation>();
cols.add(nameMut);
String columnFamily = "HotelByCity";
muts.put(columnFamily, cols);
// outer map key is a row key
// inner map key is the column family name
mutationMap.put(rowKey, muts);
// create representation of the column (not needed by the insert below)
ColumnPath cp = new ColumnPath(columnFamily);
cp.setColumn(hotelName.getBytes(UTF8));
ColumnParent parent = new ColumnParent(columnFamily);
// here, the column name IS the value (there's no value)
Column col = new Column(hotelName.getBytes(UTF8), new byte[0], clock);
client.insert(rowKey.getBytes(UTF8), parent, col, CL);
LOG.debug("Inserted HotelByCity index for " + hotelName);
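The comments in `insertByCityIndex` describe the shape `batch_mutate` expects: row key, then column family name, then a list of mutations. A minimal plain-Java sketch of filling that nested map with `computeIfAbsent`, where `ColumnWrite` is a hypothetical stand-in for Thrift's `Mutation` and String row keys stand in for the byte arrays the real API takes:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// HYPOTHETICAL sketch of the batch_mutate map shape; not Thrift's API.
class BatchShapeSketch {
    // stand-in for Thrift's Mutation: a single column name/value write
    static final class ColumnWrite {
        final String name;
        final String value;

        ColumnWrite(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    // outer key: row key; inner key: column family name; value: writes for that row and family
    static void add(Map<String, Map<String, List<ColumnWrite>>> batch,
                    String rowKey, String columnFamily, ColumnWrite write) {
        batch.computeIfAbsent(rowKey, k -> new HashMap<>())
                .computeIfAbsent(columnFamily, k -> new ArrayList<>())
                .add(write);
    }
}
```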

// Batch Mutate
// insertAllHotels()
String columnFamily = "Hotel";
//row keys
String cambriaKey = "AZC_043";
String clarionKey = "AZS_011";
String wKey = "CAS_021";
String waldorfKey = "NYN_042";
//conveniences
Map<byte[], Map<String, List<Mutation>>> cambriaMutationMap =
 createCambriaMutation(columnFamily, cambriaKey);
Map<byte[], Map<String, List<Mutation>>> clarionMutationMap =
 createClarionMutation(columnFamily, clarionKey);
Map<byte[], Map<String, List<Mutation>>> waldorfMutationMap =
 createWaldorfMutation(columnFamily, waldorfKey);
Map<byte[], Map<String, List<Mutation>>> wMutationMap =
 createWMutation(columnFamily, wKey);
client.batch_mutate(cambriaMutationMap, CL);
LOG.debug("Inserted " + cambriaKey);
client.batch_mutate(clarionMutationMap, CL);
LOG.debug("Inserted " + clarionKey);
client.batch_mutate(wMutationMap, CL);
LOG.debug("Inserted " + wKey);
client.batch_mutate(waldorfMutationMap, CL);
LOG.debug("Inserted " + waldorfKey);
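`insertAllHotels` sends one `batch_mutate` call per hotel. Because the outer map is keyed by row key, the four maps could also be folded into one map and sent in a single call. Here is a sketch of that merge (hypothetical `BatchMergeSketch`, with a type parameter `M` standing in for `Mutation`). It deliberately uses String row keys: Java arrays use identity-based `equals` and `hashCode`, so in a `Map<byte[], ...>` two arrays with the same bytes are different keys and would never be combined.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// HYPOTHETICAL merge of batch_mutate-shaped maps; M stands in for Thrift's Mutation.
class BatchMergeSketch {
    // row key -> column family -> mutations; lists for the same row and family are concatenated
    static <M> Map<String, Map<String, List<M>>> merge(List<Map<String, Map<String, List<M>>>> batches) {
        Map<String, Map<String, List<M>>> merged = new HashMap<>();
        for (Map<String, Map<String, List<M>>> batch : batches) {
            for (Map.Entry<String, Map<String, List<M>>> row : batch.entrySet()) {
                Map<String, List<M>> families = merged.computeIfAbsent(row.getKey(), k -> new HashMap<>());
                for (Map.Entry<String, List<M>> cf : row.getValue().entrySet()) {
                    // copy into a fresh list so the input maps are left untouched
                    families.computeIfAbsent(cf.getKey(), k -> new ArrayList<>()).addAll(cf.getValue());
                }
            }
        }
        return merged;
    }
}
```

Whether one larger call is preferable to several small ones depends on batch size; the sketch only shows that the map shape allows it.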

The Search Application
HotelApp.java shows how to use the Materialized View pattern along with get, get_range_slices, and key slices. Have a look yourself:
 

// Use column slice to get from Super Column
// findPOIByHotel(String hotel)
SlicePredicate predicate = new SlicePredicate();
SliceRange sliceRange = new SliceRange();
// start == finish: select only the super column named after this hotel
sliceRange.setStart(hotel.getBytes());
sliceRange.setFinish(hotel.getBytes());
predicate.setSlice_range(sliceRange);
String scFamily = "PointOfInterest";
ColumnParent parent = new ColumnParent(scFamily);
// empty start and end keys: scan every row in the column family
KeyRange keyRange = new KeyRange();
keyRange.start_key = "".getBytes();
keyRange.end_key = "".getBytes();
List<POI> pois = new ArrayList<POI>();
// get_range_slices returns one KeySlice per row: the row key
// plus the columns the predicate selected in that row
Connector cl = new Connector();
Cassandra.Client client = cl.connect();
List<KeySlice> slices = client.get_range_slices(
    parent, predicate, keyRange, CL);
[...]
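In `findPOIByHotel`, the slice range's start and finish are both the hotel name. Slice bounds are inclusive, so that predicate selects at most one super column per row, while the empty start and end keys let `get_range_slices` walk every row. A rough in-memory model of that behavior, assuming String names in place of byte arrays and using a hypothetical `SuperColumnSliceSketch` class; for simplicity the sketch skips rows with no matching super column.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// HYPOTHETICAL in-memory model of a super column family slice; not Cassandra code.
// Column family: row key -> super column name -> (subcolumn name -> value).
class SuperColumnSliceSketch {
    // slice bounds are inclusive, so start == finish selects at most one super column
    static SortedMap<String, SortedMap<String, String>> sliceRow(
            TreeMap<String, SortedMap<String, String>> row, String start, String finish) {
        return row.subMap(start, true, finish, true);
    }

    // visits every row (like a KeyRange with empty start and end keys)
    // and keeps the rows whose slice is not empty
    static Map<String, SortedMap<String, SortedMap<String, String>>> scan(
            TreeMap<String, TreeMap<String, SortedMap<String, String>>> columnFamily,
            String start, String finish) {
        Map<String, SortedMap<String, SortedMap<String, String>>> result = new LinkedHashMap<>();
        for (Map.Entry<String, TreeMap<String, SortedMap<String, String>>> row : columnFamily.entrySet()) {
            SortedMap<String, SortedMap<String, String>> slice = sliceRow(row.getValue(), start, finish);
            if (!slice.isEmpty()) {
                result.put(row.getKey(), slice);
            }
        }
        return result;
    }
}
```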
// Use key range
// findHotelByCity(String city, String state)
LOG.debug("Searching for hotels in " + city + ", " + state);
String key = city + ":" + state.toUpperCase();
// query
SlicePredicate predicate = new SlicePredicate();
SliceRange sliceRange = new SliceRange();
sliceRange.setStart(new byte[0]);
sliceRange.setFinish(new byte[0]);
predicate.setSlice_range(sliceRange);
// read all columns in the row (empty start and finish are unbounded)
String columnFamily = "HotelByCity";
ColumnParent parent = new ColumnParent(columnFamily);
KeyRange keyRange = new KeyRange();
keyRange.setStart_key(key.getBytes());
// key + 1 appends the character '1': an end key just outside the lexical range
keyRange.setEnd_key((key + 1).getBytes());
keyRange.count = 5;
Connector cl = new Connector();
Cassandra.Client client = cl.connect();
List<KeySlice> keySlices =
    client.get_range_slices(parent, predicate, keyRange, CL);
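`findHotelByCity` bounds the key range with `key` and `key + 1`. In Java, `key + 1` appends the character `1`, so `Scottsdale:AZ` gets the end key `Scottsdale:AZ1`. Key ranges only come back in key order under an order-preserving partitioner, which this sketch assumes by keeping row keys in a `TreeSet`. It also treats both bounds as inclusive, an assumption that hardly matters here because a real row key is unlikely to end in `AZ1`. The class `KeyRangeSketch` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

// HYPOTHETICAL model of findHotelByCity's key range over sorted row keys.
class KeyRangeSketch {
    // same row-key format as findHotelByCity: "City:STATE"
    static String startKey(String city, String state) {
        return city + ":" + state.toUpperCase();
    }

    // String + int concatenates, so this appends the character '1'
    static String endKey(String city, String state) {
        return startKey(city, state) + 1;
    }

    // returns at most `count` row keys between the start and end keys (both inclusive),
    // assuming row keys are stored in sorted order as under an order-preserving partitioner
    static List<String> keysInRange(NavigableSet<String> rowKeys, String city, String state, int count) {
        List<String> result = new ArrayList<>();
        for (String key : rowKeys.subSet(startKey(city, state), true, endKey(city, state), true)) {
            if (result.size() == count) {
                break;
            }
            result.add(key);
        }
        return result;
    }
}
```

The `count` parameter plays the same role as `keyRange.count = 5`: it caps how many rows come back.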
 

Note: This sample application uses the Thrift API. When the book was written, Thrift was expected to be replaced by Avro, so although the basic ideas here work, you shouldn't follow this example in a real application. Instead, use one of the many available third-party clients for Cassandra (we'll come back to these options in a later post).

Twissandra
When you start thinking about how to design for Cassandra, take a look at Twissandra, written by Eric Florenzano.
Visit http://www.twissandra.com to see a fully working Twitter clone that you can download and try out. The source is all in Python, and it has a few dependencies on Django and a JSON library to sort out, but it’s a great place to start. You can use what’s likely a familiar data model (Twitter’s) and see how users, time lines, and tweets all fit into a simple Cassandra data model.

There is also a helpful post by Eric Evans explaining how to use Twissandra, available at http://www.rackspacecloud.com/blog/2010/05/12/cassandra-by-example.
 

By now you should have a good idea of what a complete, working Cassandra application looks like.

If this post has whetted your appetite for the subject, you can find much more information in the book itself: Cassandra: The Definitive Guide.

The excerpt is from the book 'Cassandra: The Definitive Guide', authored by Eben Hewitt, published November 2010 by O’Reilly Media, Copyright 2011 Eben Hewitt.
