Integration¶
Integration with an existing Cassandra cluster¶
Elassandra includes a modified version of Cassandra, available in the strapdata-cassandra repository, so all nodes of a cluster must run Elassandra binaries. However, you can start a node with or without Elasticsearch support. Note that all nodes of a datacenter should run either Cassandra only or Cassandra with Elasticsearch.
Rolling upgrade from Cassandra to Elassandra¶
Before starting any Elassandra node with Elasticsearch enabled, do a rolling replace of the Cassandra binaries with the Elassandra ones. For each node:
- Install Elassandra.
- Replace the Elassandra configuration files (cassandra.yaml and snitch configuration file) with the ones from your existing cluster.
- Bind the Elassandra data folder to the existing Cassandra data folder.
- Stop your Cassandra node.
- Restart the node as Cassandra only:
bin/cassandra
or as Cassandra with Elasticsearch enabled:
bin/cassandra -e
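When copying the configuration over, the settings carried from the existing cluster typically include the following cassandra.yaml keys. This is an illustrative fragment only; the values shown are placeholders, not defaults, and must match your own cluster.

```yaml
# Illustrative cassandra.yaml keys usually copied from the existing cluster.
# All values below are placeholders; use your cluster's actual settings.
cluster_name: 'MyCluster'
listen_address: 192.168.1.10
endpoint_snitch: GossipingPropertyFileSnitch
data_file_directories:
    - /var/lib/cassandra/data
commitlog_directory: /var/lib/cassandra/commitlog
saved_caches_directory: /var/lib/cassandra/saved_caches
```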
Create a new Elassandra datacenter¶
The overall procedure is similar to the Cassandra one described in Adding a datacenter to a cluster.
For each node in your new datacenter:
- Install Elassandra.
- Set auto_bootstrap: false in your conf/cassandra.yaml.
- Start Cassandra-only nodes in your new datacenter and check that all nodes join the cluster.
bin/cassandra
- Restart all nodes in your new datacenter with Elasticsearch enabled. You should see started shards but empty indices.
bin/cassandra -e
- Set the replication factor of indexed keyspaces to one or more in your new datacenter.
- Pull data from your existing datacenter.
nodetool rebuild <source-datacenter-name>
After rebuilding all of your new nodes, you should see the same number of documents for each index in your new and existing datacenters.
- Set auto_bootstrap: true (the default value) in your conf/cassandra.yaml.
- Create a new Elasticsearch index or map some existing Cassandra tables.
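The replication-factor step above is done with a CQL statement along these lines. The keyspace and datacenter names here are hypothetical; adapt them to your own cluster, and keep the replication settings of your existing datacenters unchanged.

```cql
-- Hypothetical keyspace (twitter) and datacenter names (DC1, NewDC);
-- adapt to your cluster before running in cqlsh.
ALTER KEYSPACE twitter WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'DC1': 2,
  'NewDC': 1
};
```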
Tip
If you need to replay this procedure for a node:
- stop your node
- nodetool removenode <id-of-node-to-remove>
- clear the data, commitlog and saved_caches directories.
Installing Elasticsearch plugins¶
Elasticsearch plugin installation remains unchanged, see Elasticsearch plugin installation.
- bin/plugin install <url>
Running Kibana with Elassandra¶
Kibana can run with Elassandra, providing a visualization tool for Cassandra and Elasticsearch data.
- If you want to load the sample data from the Kibana Getting started tutorial, apply the following changes to logstash.jsonl with a sed command (index and field names containing dashes, dots, or colons must be rewritten for Cassandra).
s/logstash-2015.05.18/logstash_20150518/g
s/logstash-2015.05.19/logstash_20150519/g
s/logstash-2015.05.20/logstash_20150520/g
s/article:modified_time/articleModified_time/g
s/article:published_time/articlePublished_time/g
s/article:section/articleSection/g
s/article:tag/articleTag/g
s/og:type/ogType/g
s/og:title/ogTitle/g
s/og:description/ogDescription/g
s/og:site_name/ogSite_name/g
s/og:url/ogUrl/g
s/og:image:width/ogImageWidth/g
s/og:image:height/ogImageHeight/g
s/og:image/ogImage/g
s/twitter:title/twitterTitle/g
s/twitter:description/twitterDescription/g
s/twitter:card/twitterCard/g
s/twitter:image/twitterImage/g
s/twitter:site/twitterSite/g
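The substitutions above can be applied in a single sed pass. A minimal sketch follows; two sample lines stand in for the real logstash.jsonl from the Kibana tutorial, and in practice you would run the same sed command against the full downloaded file.

```shell
# Two sample lines standing in for the Kibana tutorial's logstash.jsonl.
printf '%s\n' \
  '{"_index":"logstash-2015.05.18","og:image:width":200}' \
  '{"_index":"logstash-2015.05.20","twitter:card":"summary"}' > logstash.jsonl

# Apply all substitutions in one pass; expression order matters, e.g.
# og:image:width is rewritten before the shorter og:image pattern.
sed -i.bak \
  -e 's/logstash-2015.05.18/logstash_20150518/g' \
  -e 's/logstash-2015.05.19/logstash_20150519/g' \
  -e 's/logstash-2015.05.20/logstash_20150520/g' \
  -e 's/article:modified_time/articleModified_time/g' \
  -e 's/article:published_time/articlePublished_time/g' \
  -e 's/article:section/articleSection/g' \
  -e 's/article:tag/articleTag/g' \
  -e 's/og:type/ogType/g' \
  -e 's/og:title/ogTitle/g' \
  -e 's/og:description/ogDescription/g' \
  -e 's/og:site_name/ogSite_name/g' \
  -e 's/og:url/ogUrl/g' \
  -e 's/og:image:width/ogImageWidth/g' \
  -e 's/og:image:height/ogImageHeight/g' \
  -e 's/og:image/ogImage/g' \
  -e 's/twitter:title/twitterTitle/g' \
  -e 's/twitter:description/twitterDescription/g' \
  -e 's/twitter:card/twitterCard/g' \
  -e 's/twitter:image/twitterImage/g' \
  -e 's/twitter:site/twitterSite/g' \
  logstash.jsonl

cat logstash.jsonl
```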
JDBC Driver sql4es + Elassandra¶
The Elasticsearch JDBC driver sql4es can be used with Elassandra. Here is a code example:
import java.sql.*;

// Register the sql4es driver and connect to the Elasticsearch transport port (9300).
Class.forName("nl.anchormen.sql4es.jdbc.ESDriver");
Connection con = DriverManager.getConnection("jdbc:sql4es://localhost:9300/twitter?cluster.name=Test%20Cluster");
Statement st = con.createStatement();
// Aggregate tweets per user through the SQL layer.
ResultSet rs = st.executeQuery("SELECT user,avg(size),count(*) FROM tweet GROUP BY user");
ResultSetMetaData rsmd = rs.getMetaData();
int nrCols = rsmd.getColumnCount();
while (rs.next()) {
    for (int i = 1; i <= nrCols; i++) {
        System.out.println(rs.getObject(i));
    }
}
rs.close();
st.close();
con.close();
Running Spark with Elassandra¶
For Elassandra 5.5, a modified version of the elasticsearch-hadoop connector is available in the strapdata repository. This connector works with Spark as described in the Elasticsearch documentation available at elasticsearch/hadoop.
For example, in order to submit a Spark job in client mode:
bin/spark-submit --driver-class-path <yourpath>/elasticsearch-spark_2.10-2.2.0.jar --master spark://<sparkmaster>:7077 --deploy-mode client <application.jar>