Integration¶
Integration with an existing Cassandra cluster¶
Elassandra includes a modified version of Cassandra, available in the strapdata-cassandra repository, so all nodes of a cluster must run Elassandra binaries. However, you can start a node with or without Elasticsearch support. Note that all nodes of a datacenter should run either Cassandra only or Cassandra with Elasticsearch.
Rolling upgrade from Cassandra to Elassandra¶
Before starting any Elassandra node with Elasticsearch enabled, do a rolling replace of the Cassandra binaries with the Elassandra ones. For each node:
- Install Elassandra.
- Replace the Elassandra configuration files (cassandra.yaml and snitch configuration file) with the ones from your existing cluster.
- Bind the Elassandra data folder to the existing Cassandra data folder.
- Stop your Cassandra node.
- Restart the node as Cassandra only:
bin/cassandra
or as Cassandra with Elasticsearch enabled:
bin/cassandra -e
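When copying the configuration over, the settings carried from the existing cluster typically include the following cassandra.yaml keys. This is an illustrative fragment only; the values shown are placeholders, not defaults, and must match your own cluster.

```yaml
# Illustrative cassandra.yaml keys usually copied from the existing cluster.
# All values below are placeholders; use your cluster's actual settings.
cluster_name: 'MyCluster'
listen_address: 192.168.1.10
endpoint_snitch: GossipingPropertyFileSnitch
data_file_directories:
    - /var/lib/cassandra/data
commitlog_directory: /var/lib/cassandra/commitlog
saved_caches_directory: /var/lib/cassandra/saved_caches
```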
Create a new Elassandra datacenter¶
The overall procedure is similar to the Cassandra one described in Adding a datacenter to a cluster.
For each node in your new datacenter:
- Install Elassandra.
- Set auto_bootstrap: false in your conf/cassandra.yaml.
- Start Cassandra-only nodes in your new datacenter and check that all nodes join the cluster.
bin/cassandra
- Restart all nodes in your new datacenter with Elasticsearch enabled. You should see started shards but empty indices.
bin/cassandra -e
- Set the replication factor of indexed keyspaces to one or more in your new datacenter.
- Pull data from your existing datacenter.
nodetool rebuild <source-datacenter-name>
After rebuilding all of your new nodes, you should see the same number of documents for each index in your new and existing datacenters.
- Set auto_bootstrap: true (the default value) in your conf/cassandra.yaml.
- Create a new Elasticsearch index or map some existing Cassandra tables.
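The replication-factor step above is done with a CQL statement along these lines. The keyspace and datacenter names here are hypothetical; adapt them to your own cluster, and keep the replication settings of your existing datacenters unchanged.

```cql
-- Hypothetical keyspace (twitter) and datacenter names (DC1, NewDC);
-- adapt to your cluster before running in cqlsh.
ALTER KEYSPACE twitter WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'DC1': 2,
  'NewDC': 1
};
```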
Tip
If you need to replay this procedure for a node:
- stop your node
- nodetool removenode <id-of-node-to-remove>
- clear the data, commitlog and saved_caches directories.
Installing Elasticsearch plugins¶
Elasticsearch plugin installation remains unchanged, see Elasticsearch plugin installation.
- bin/plugin install <url>
Running Kibana with Elassandra¶
Kibana can run with Elassandra, providing a visualization tool for Cassandra and Elasticsearch data.
- If you want to load the sample data from the Kibana Getting started tutorial, apply the following changes to logstash.jsonl with a sed command (index and field names containing dashes, dots, or colons must be rewritten for Cassandra).
s/logstash-2015.05.18/logstash_20150518/g
s/logstash-2015.05.19/logstash_20150519/g
s/logstash-2015.05.20/logstash_20150520/g
s/article:modified_time/articleModified_time/g
s/article:published_time/articlePublished_time/g
s/article:section/articleSection/g
s/article:tag/articleTag/g
s/og:type/ogType/g
s/og:title/ogTitle/g
s/og:description/ogDescription/g
s/og:site_name/ogSite_name/g
s/og:url/ogUrl/g
s/og:image:width/ogImageWidth/g
s/og:image:height/ogImageHeight/g
s/og:image/ogImage/g
s/twitter:title/twitterTitle/g
s/twitter:description/twitterDescription/g
s/twitter:card/twitterCard/g
s/twitter:image/twitterImage/g
s/twitter:site/twitterSite/g
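The substitutions above can be applied in a single sed pass. A minimal sketch follows; two sample lines stand in for the real logstash.jsonl from the Kibana tutorial, and in practice you would run the same sed command against the full downloaded file.

```shell
# Two sample lines standing in for the Kibana tutorial's logstash.jsonl.
printf '%s\n' \
  '{"_index":"logstash-2015.05.18","og:image:width":200}' \
  '{"_index":"logstash-2015.05.20","twitter:card":"summary"}' > logstash.jsonl

# Apply all substitutions in one pass; expression order matters, e.g.
# og:image:width is rewritten before the shorter og:image pattern.
sed -i.bak \
  -e 's/logstash-2015.05.18/logstash_20150518/g' \
  -e 's/logstash-2015.05.19/logstash_20150519/g' \
  -e 's/logstash-2015.05.20/logstash_20150520/g' \
  -e 's/article:modified_time/articleModified_time/g' \
  -e 's/article:published_time/articlePublished_time/g' \
  -e 's/article:section/articleSection/g' \
  -e 's/article:tag/articleTag/g' \
  -e 's/og:type/ogType/g' \
  -e 's/og:title/ogTitle/g' \
  -e 's/og:description/ogDescription/g' \
  -e 's/og:site_name/ogSite_name/g' \
  -e 's/og:url/ogUrl/g' \
  -e 's/og:image:width/ogImageWidth/g' \
  -e 's/og:image:height/ogImageHeight/g' \
  -e 's/og:image/ogImage/g' \
  -e 's/twitter:title/twitterTitle/g' \
  -e 's/twitter:description/twitterDescription/g' \
  -e 's/twitter:card/twitterCard/g' \
  -e 's/twitter:image/twitterImage/g' \
  -e 's/twitter:site/twitterSite/g' \
  logstash.jsonl

cat logstash.jsonl
```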
JDBC Driver sql4es + Elassandra¶
The Elasticsearch JDBC driver sql4es can be used with Elassandra. Here is a code example:
import java.sql.*;

// Register the sql4es driver and connect to the Elasticsearch transport port (9300).
Class.forName("nl.anchormen.sql4es.jdbc.ESDriver");
Connection con = DriverManager.getConnection("jdbc:sql4es://localhost:9300/twitter?cluster.name=Test%20Cluster");
Statement st = con.createStatement();
// Aggregate tweets per user through the SQL layer.
ResultSet rs = st.executeQuery("SELECT user,avg(size),count(*) FROM tweet GROUP BY user");
ResultSetMetaData rsmd = rs.getMetaData();
int nrCols = rsmd.getColumnCount();
while (rs.next()) {
    for (int i = 1; i <= nrCols; i++) {
        System.out.println(rs.getObject(i));
    }
}
rs.close();
st.close();
con.close();
Running Spark with Elassandra¶
For Elassandra 5.5, a modified version of the elasticsearch-hadoop connector is available in the strapdata repository. This connector works with Spark as described in the Elasticsearch documentation available at elasticsearch/hadoop.
For example, in order to submit a Spark job in client mode:
bin/spark-submit --driver-class-path <yourpath>/elasticsearch-spark_2.10-2.2.0.jar --master spark://<sparkmaster>:7077 --deploy-mode client <application.jar>