Cassandra 4.0 RC lands: a major milestone for the open source distributed database
Lots of important extras including Full Query Logging...
Apache Cassandra 4.0 RC* has landed, with the latest iteration of the widely used distributed database bringing a range of eagerly anticipated features, including a claimed 5X improvement on data streaming speed during scaling operations, enterprise-grade auditing, live query logging, Java 11 support and more.
As DataStax VP of Developer Relations Patrick McFadin put it: "This is a big milestone for the project. The amount of time that has gone into building, testing and verifying a distributed database like this is unprecedented."
Apache Cassandra is widely used in the mission-critical systems of some huge companies, from major banks to Netflix and eBay. The database is favoured for demanding, internet-facing applications running at scale, and can be reliably deployed on utility servers. Getting a stable release rich with new features out of the door has taken some heroic efforts all around by the community, which is celebrating the imminent GA release with an "Apache Cassandra World Party" (three sessions on April 28: 6am-7am BST, noon-1pm BST, and 10-11pm BST).
Among the additions landing with Apache Cassandra 4.0 are features to support live query logging. FQL (Full Query Logging) is safe for production use, Cassandra's contributors say, with configurable limits on heap memory and disk space to prevent out-of-memory errors. The feature is designed to support live traffic capture and traffic replay, and can also be used for debugging query traffic and for migration.
(New nodetool options have also been added to enable, disable or reset FQL, along with a new tool to read and replay the binary logs. The full query logging (FQL) capability uses Chronicle-Queue to rotate a log of queries.)
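As a rough illustration of the nodetool workflow described above, the Python sketch below only assembles the command lines an operator might run; it does not execute anything. The `nodetool enablefullquerylog`, `disablefullquerylog` and `resetfullquerylog` subcommands and the `fqltool dump` reader are the ones shipped with Cassandra 4.0, while the helper function names and the log directory are hypothetical and chosen purely for this example.

```python
from typing import List, Optional


def enable_fql_command(
    log_dir: str,
    max_log_size_bytes: Optional[int] = None,
    max_queue_weight_bytes: Optional[int] = None,
) -> List[str]:
    """Build the argument list that turns on full query logging on a node."""
    cmd = ["nodetool", "enablefullquerylog", "--path", log_dir]
    if max_log_size_bytes is not None:
        # Caps how much log data is kept on disk before old segments are dropped.
        cmd += ["--max-log-size", str(max_log_size_bytes)]
    if max_queue_weight_bytes is not None:
        # Caps how many bytes of query data may queue in memory awaiting the disk.
        cmd += ["--max-queue-weight", str(max_queue_weight_bytes)]
    return cmd


def disable_fql_command() -> List[str]:
    """Build the argument list that stops full query logging."""
    return ["nodetool", "disablefullquerylog"]


def reset_fql_command() -> List[str]:
    """Build the argument list that stops logging and cleans up the log files."""
    return ["nodetool", "resetfullquerylog"]


def dump_fql_command(log_dir: str) -> List[str]:
    """Build the argument list that prints the binary logs in readable form."""
    return ["fqltool", "dump", log_dir]


if __name__ == "__main__":
    # Hypothetical log directory, used only for illustration.
    fql_dir = "/var/log/cassandra/fql"
    print(" ".join(enable_fql_command(fql_dir, max_log_size_bytes=1 << 30)))
    print(" ".join(dump_fql_command(fql_dir)))
```

The lists can be handed to `subprocess.run` on a host where the Cassandra tools are on the path. The two size limits correspond to the disk and heap caps that the contributors cite as the reason FQL is safe in production.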
Also new in Cassandra 4.0
Cassandra 4.0 makes several improvements to streaming: i.e. how Cassandra cluster nodes exchange data in the form of SSTables -- the immutable data files that Cassandra uses for persisting data on disk.
Streaming of SSTables is performed for several operations, including host replacement, rebuilds, and cluster expansion. Before Cassandra 4.0, Cassandra reified SSTables into objects during streaming.
Per the release notes, this "creates unnecessary garbage and slows down the whole streaming process as some SSTables can be transferred as a whole file rather than individual partitions... Cassandra 4.0 has added support for streaming entire SSTables when possible for faster Streaming using ZeroCopy APIs. If enabled, Cassandra will use ZeroCopy for eligible SSTables significantly speeding up transfers and increasing throughput… Zero copy streaming is hardware bound; only limited by the hardware limitations (Network and Disk IO )."
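To make the idea of zero-copy transfer more concrete, here is a minimal Python sketch of the general operating-system technique: handing a whole file to the kernel to push down a socket, rather than reading it into application objects first. It is not Cassandra's implementation, which is written in Java. The function names and the stand-in payload are assumptions made for this example, and a local socket pair plays the role of two cluster nodes.

```python
import os
import socket
import tempfile


def send_whole_file(sock: socket.socket, path: str) -> int:
    """Send the file at `path` over a connected stream socket; return bytes sent."""
    with open(path, "rb") as f:
        # socket.sendfile() uses os.sendfile() where the platform supports it,
        # letting the kernel move the bytes without building Python objects;
        # otherwise it falls back to ordinary send() calls.
        return sock.sendfile(f)


def demo() -> bool:
    """Stream a small stand-in file across a local socket pair and verify it."""
    payload = b"stand-in for SSTable bytes\n" * 100
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "data.db")  # hypothetical file name
        with open(path, "wb") as f:
            f.write(payload)

        sender, receiver = socket.socketpair()
        with sender, receiver:
            sent = send_whole_file(sender, path)
            # Signal end-of-stream so the receiving loop terminates.
            sender.shutdown(socket.SHUT_WR)
            received = bytearray()
            while chunk := receiver.recv(65536):
                received += chunk
    return sent == len(payload) and bytes(received) == payload


if __name__ == "__main__":
    print("whole-file transfer intact:", demo())
```

The contrast with the older approach is that nothing here parses the file into per-partition objects, which is why the release notes describe throughput as bounded by network and disk rather than by garbage creation.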
*RC = release candidate: a pre-GA (not yet full-fat) release, typically made once critical bugs in the project's issue queue are reported fixed. It should be stable at this point.