LearningPatterns: Your Global Source for Java Training, Mentoring, and Consulting

Course Description:

Cassandra Next Generation for Developers with CQL3 and the new Java Driver

The Cassandra (C*) NoSQL database is one of the most powerful and widely used non-relational databases available today. It is a fault-tolerant, highly scalable database with tunable consistency that meets the demanding requirements of the "can't fail, must scale" systems driving growth for many of today's most successful enterprises. However, along with that capability comes a new data and programming model that many organizations lack the expertise to use well.

This course provides a technical introduction to all the conceptual and practical areas needed to use Cassandra successfully. It is written expressly for the new capabilities in the recent versions of C*, including CQL3 and the new DataStax Java driver. The course provides a solid foundation in the architecture and data model of C* and how to work with it. It covers CQL3 in detail, as well as important data modeling techniques to optimize your usage of the database. It includes in-depth coverage of the new Java driver for C*, as well as a full-scale application based on a stock-trading system (StockWatcher) that uses the driver.

After taking this course, you will be ready to work with Cassandra in an informed and productive manner, including using CQL3 and the new Java driver. You will be aware of some common pitfalls as well as best practices for creating your data model and applications. You will gain a clear understanding of how C* works, and be fully prepared to use it in production systems.

Course Information

Duration: 3 days

Hands-on: Minimum 50% hands-on

Prerequisites: Reasonable Java experience for the Java driver labs, and some knowledge of databases

Supported Platforms: Cassandra 1.2 on Linux Operating Systems (VM provided for labs)

Learning Objectives (Not necessarily in presentation sequence)


  • Understand the motivation for non-relational data stores
    • Be familiar with the needs/challenges of modern applications
    • Understand what Big Data is
    • Understand the types of use cases that non-relational data stores are designed to address
    • Understand why relational databases don't support big data applications well
    • Be familiar with how non-relational databases handle Big Data (short)
  • Be familiar with Cassandra at a high-level
    • Use cases
    • Strengths (scalability, robustness, linear performance with scale-out, etc.)
    • High Level Structure
      • Fully Distributed, Peer-to-Peer organization, no single point of failure
      • Data Replication
      • Multi-Data Center support
      • Durable, log-structured storage engine
      • Tight integration with storage engine and locally managed storage
    • Scalability on commodity hardware
    • Features
    • Benchmarks

Basic Cassandra Architecture

  • Understand the basic Cassandra architecture, including
    • The cluster structure - Nodes, virtual nodes, and ring topology
    • Tokens, Partitioners, and data distribution
    • Data replication and the Replication Factor
    • The role a keyspace plays
    • Consistency and the CAP theorem
    • The Gossip protocol
    • Seed nodes
  • Be familiar with basic installation / setup of Cassandra, and how an installation is structured
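
To make the ideas of tokens, ring topology, and the Replication Factor more concrete, here is a minimal conceptual sketch in plain Java. It is not Cassandra's implementation: the class name `ToyTokenRing`, the CRC32-based stand-in partitioner, and the "walk clockwise and take the next distinct nodes" placement are all assumptions chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Toy token ring (hypothetical, for illustration only).
class ToyTokenRing {
    // token -> node that owns the range ending at that token
    private final TreeMap<Long, String> ring = new TreeMap<>();

    /** Register a node under an explicitly chosen token. */
    void addNode(long token, String node) {
        ring.put(token, node);
    }

    /** Stand-in partitioner: hashes a partition key to a token with CRC32 (an assumption). */
    static long tokenFor(String partitionKey) {
        CRC32 crc = new CRC32();
        crc.update(partitionKey.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    /** Walk clockwise from the token, wrapping around, until rf distinct nodes are collected. */
    List<String> replicasForToken(long token, int rf) {
        List<String> clockwise = new ArrayList<>(ring.tailMap(token, true).values());
        clockwise.addAll(ring.headMap(token, false).values());
        Set<String> replicas = new LinkedHashSet<>();
        for (String node : clockwise) {
            if (replicas.size() == rf) {
                break;
            }
            replicas.add(node);
        }
        return new ArrayList<>(replicas);
    }

    /** Replicas for a partition key: hash it to a token, then place it on the ring. */
    List<String> replicasFor(String partitionKey, int rf) {
        return replicasForToken(tokenFor(partitionKey), rf);
    }
}
```

With four nodes at tokens 0, 100, 200, and 300 and a replication factor of 3, a key whose token is 150 lands on the nodes at 200, 300, and (wrapping around) 0.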

Data Modeling

  • Learn the basics of the Cassandra Data Model
    • Single primary key tables and how to define them using CQL
    • Defining Columns and their Data Types
    • Primary key / Partition key
  • Learn CQL Basics
    • Creating / using / dropping a keyspace (CREATE / USE / DROP KEYSPACE and associated options)
    • Creating / using a table (column family) with a single Primary Key (CREATE TABLE and associated options)
    • Inserting / mutating / retrieving data (INSERT, UPDATE, SELECT)
    • Understand limitations on WHERE clauses
      • For primary key columns, and for non-primary, non-indexed columns
    • Import and export data
  • Learn and use cqlsh
    • Execute CQL statements and connect to a particular node
    • Use command history
    • Execute commands from a file
    • Use tracing within cqlsh
  • Know the standard CQL data types
    • Standard numerical and text types
    • Timestamps and timeuuid
    • Collections
    • Other types - uuid, boolean, blob, counter, etc.
  • Understand and use compound primary keys
    • CQL table definition
    • The partition key and clustering columns
    • CQL mapping of data to rowsets
    • Internal storage view
    • Clustering order, ORDER BY, and CLUSTERING ORDER BY
    • Using multiple clustering columns
    • Filtering results and ALLOW FILTERING
  • Understand and use composite partition keys
    • Motivation and uses
    • CQL definition with composite partition key
    • Effect on partitioning
    • Internal storage view
  • Understand and use collections
    • Motivation and uses
    • CQL definition (set, list, and map)
    • Inserting, updating, deleting with a collection
    • Limitations (no indexing, retrieve complete collection, etc.)
    • Internal storage view
  • Understand and use expiring columns / Time To Live (TTL)
    • Definition in CQL
    • Effects
  • Understand and use secondary indexes
    • Motivation and uses
    • Defining secondary index in CQL
    • Known Patterns and Anti-patterns for secondary index usage
    • Known limitations of secondary indexes, including practical guidelines
  • Understand and use counters
    • Motivation and uses
    • Structure, characteristics, and usage
    • Defining and using counters in CQL
    • Limitations
  • Understand and use batches
    • Atomic and UNLOGGED batches
    • Counter batches
  • Schema changes in Cassandra
    • Altering tables using CQL (1.2 and 2.0 capabilities)
    • Impact on existing data
  • Be familiar with aggregate data in Cassandra
    • What is supported
    • Strategies for implementing other aggregates
  • Be aware of best practices for data modeling
    • Query Driven
    • Understand denormalization / materialized views and their usage
    • Impact of internal storage structure on queries
    • Optimizing data model for queries
  • Be familiar with typical Cassandra data models and their characteristics
    • "Twissandra" and other social-networking type models
    • Time series type models
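
As a rough picture of the "internal storage view" for compound primary keys listed above, the following plain-Java sketch models one partition per partition-key value, with the rows inside each partition kept sorted by the clustering column. The `trades` table in the comment, the class name `ToyPartitionStore`, and its methods are hypothetical; this is a mental model, not how the storage engine is coded.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Conceptual model only. Mirrors a hypothetical table such as:
//   CREATE TABLE trades (symbol text, trade_time bigint, price double,
//                        PRIMARY KEY (symbol, trade_time));
// symbol is the partition key; trade_time is the clustering column.
class ToyPartitionStore {
    // partition key -> rows of that partition, sorted by clustering column
    private final Map<String, TreeMap<Long, Double>> partitions = new HashMap<>();

    /** Upsert: writing the same (symbol, tradeTime) again overwrites the earlier price. */
    void insert(String symbol, long tradeTime, double price) {
        partitions.computeIfAbsent(symbol, k -> new TreeMap<>()).put(tradeTime, price);
    }

    /** Range slice inside one partition, in clustering order: from <= tradeTime < to. */
    List<Double> pricesBetween(String symbol, long from, long to) {
        TreeMap<Long, Double> rows = partitions.get(symbol);
        if (rows == null) {
            return new ArrayList<>();
        }
        return new ArrayList<>(rows.subMap(from, true, to, false).values());
    }
}
```

The sketch suggests why queries that name the partition key and slice on the clustering column are a natural fit: the rows are already grouped and ordered that way.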

Cassandra Architecture

    • Be familiar with the cluster structure
      • Tokens, nodes and rings
      • Virtual nodes
      • Snitches
    • Understand how Cassandra partitions/replicates data in a ring
      • Partition key / token mapping
      • Partitioners
      • Replication, Replication Factor (RF), Replication Strategies
      • Multi-Data Center support and partitioning
    • Understand Eventual Consistency and how it's implemented in Cassandra
      • The CAP theorem and its ramifications
      • Strong / weak / eventual consistency
      • Synchronous / asynchronous replication
      • Consistency levels and their uses
      • Querying at different consistency levels
      • Lightweight transactions (Cassandra 2.0)
      • Repair Mechanisms
    • Be familiar with the internal storage structure of Cassandra
      • Internal cell structure (name/value/timestamp)
      • Wide row structure
      • Compound primary keys and clustering column - internal storage view
    • Be familiar at a high level with how writes work in the Cassandra storage engine, including:
      • Commit log
      • No read-before-write
      • Memtable / SSTable structure
      • Compression
      • Compaction
      • Deletes, tombstones and their implications
    • Be familiar at a high level with how reads work in Cassandra
      • Bloom Filters
      • Fragments
      • Row and key caches (and when we should use each of them)
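
The relationship between consistency levels and the Replication Factor can be summarized with a little arithmetic: a quorum is a majority of the replicas, and a read is guaranteed to overlap the most recent successful write when the number of replicas read plus the number of replicas written exceeds the Replication Factor. The sketch below expresses this in plain Java; the class and method names are illustrative only.

```java
// Illustrative helper (hypothetical names), showing the arithmetic behind tunable consistency.
class ConsistencyMath {
    /** Number of replicas in a quorum: floor(replicationFactor / 2) + 1. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** True when any set of read replicas must overlap any set of written replicas. */
    static boolean readOverlapsWrite(int readReplicas, int writeReplicas, int replicationFactor) {
        return readReplicas + writeReplicas > replicationFactor;
    }
}
```

For example, with a Replication Factor of 3, reading and writing at quorum (2 replicas each) overlaps, while reading and writing a single replica each does not.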

Java API (New DataStax Java Driver)

    • Understand the structure of the Java driver
      • High-level architecture
      • Features (connection pooling, node discovery, automatic failover, load balancing)
      • Query capabilities (synchronous, asynchronous, prepared statements, query builder, etc.)
      • Fluent interface for query and cluster building
      • Binary protocol, capabilities and configuration on server
    • Library requirements and environment setup
      • Library downloads and dependencies
      • Environment setup
    • Use the basic API to work with Cassandra
      • Cluster and Cluster.Builder: Configuring a cluster connection
      • Session: Connecting to a cluster and executing queries
      • ResultSet and Row: Getting query results
      • PreparedStatement and BoundStatement: Reusing statements for queries
      • CQL3 to Java type mapping
      • Working with UUIDs
    • Using QueryBuilder to build queries "fluently"
      • QueryBuilder overview
      • Building queries
      • Executing Queries
    • Asynchronous Queries
      • Using executeAsync()
      • Future and ResultSetFuture
      • Retrieving asynchronous query results
    • Debugging
      • Enabling and using tracing
      • Monitoring metrics
    • Connection configuration
      • Connection Options (port, compression)
      • Pooling Options
      • Socket Options
      • Tuning Policies (load balancing, reconnection, retry policies)
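
To give a feel for what a reconnection policy decides, here is a small conceptual sketch of an exponential backoff schedule: the delay before each reconnection attempt doubles until it reaches a cap. It is written in plain Java and does not use the driver's own classes; the name `ToyReconnectionSchedule` and its parameters are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual sketch only (hypothetical class, not part of the driver).
class ToyReconnectionSchedule {
    /** Delays in milliseconds for the given number of attempts: base, 2x, 4x, ... capped at maxDelayMs. */
    static List<Long> delays(long baseDelayMs, long maxDelayMs, int attempts) {
        List<Long> result = new ArrayList<>();
        long delay = baseDelayMs;
        for (int i = 0; i < attempts; i++) {
            result.add(Math.min(delay, maxDelayMs));
            // Double the delay for the next attempt, but stop growing once the cap is reached.
            delay = delay >= maxDelayMs ? maxDelayMs : delay * 2;
        }
        return result;
    }
}
```

With a base delay of 1000 ms and a cap of 8000 ms, five attempts would wait 1000, 2000, 4000, 8000, and 8000 ms.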

Cassandra in Practice / Patterns / Anti-Patterns

    • Data Modeling Techniques
    • Anti-Patterns
    • Consistency Levels

Working with a larger application

    • Run the StockWatcher application - accessing C* data and visualizing via a Spring MVC Web app and graphing controls
    • Run the StockWatcher simulator engine that inserts real stock trading data into the C* database
    • Examine and work with the source code for both (part of Java driver labs)