Local installation of standalone HBase and Apache Storm simple cluster

We mainly use Apache Storm for stream processing and Apache HBase as our NoSQL wide-column database.

Although Apache Cassandra is a great NoSQL database, we mostly prefer HBase because it ships with the Cloudera distribution and because it offers stronger consistency guarantees than Cassandra by default (see the CAP theorem).

HBase normally runs on top of HDFS, but it can easily be installed in standalone mode for testing purposes. You just need to download the latest version, extract the archive, start the standalone node, and then open an HBase shell and play.

$> tar zxvf hbase-1.1.2-bin.tar.gz
$> cd hbase-1.1.2/bin/
$> ./start-hbase.sh
$> ./hbase shell
hbase(main):001:0> create 'DummyTable', 'cf'
hbase(main):002:0> scan 'DummyTable'

When you start HBase in standalone mode, it automatically starts a local ZooKeeper node as well (listening on the default port 2181).

$> netstat -anp|grep 2181
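
A plain grep for 2181 also matches established client connections and any other line that happens to contain those digits. A slightly stricter check could look like the sketch below. The helper name has_listener is hypothetical, and the pattern assumes netstat output where the local address ends in ":PORT" or ".PORT" and the state column reads LISTEN.

```shell
#!/bin/sh
# Hypothetical helper: reads netstat output on stdin and succeeds only if
# some socket is in LISTEN state on the given port.
has_listener() {
    grep -Eq "[:.]$1[[:space:]].*LISTEN"
}

if netstat -an 2>/dev/null | has_listener 2181; then
    echo "ZooKeeper port 2181 is listening"
else
    echo "nothing is listening on port 2181"
fi
```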

ZooKeeper is used by both HBase and Storm as a distributed coordination service. Now that a local ZooKeeper node is already running, you are ready to configure and run a local Storm cluster.

  • Download the latest Storm release
  • Extract the archive
  • Configure “STORM_HOME/conf/storm.yaml” (see below)
  • Start the local cluster:
    • $> cd STORM_HOME/bin
    • $> ./storm nimbus
    • $> ./storm supervisor
    • $> ./storm ui
  • Logs are written to the “STORM_HOME/logs/” directory
  • Check the local Storm UI at: localhost:8080
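
Each of the three storm commands above runs in the foreground, so you need either one terminal per daemon or a way to push them to the background. Here is a minimal sketch of the latter, assuming a POSIX shell. The function name start_storm_daemons, the ".out" file names and the example path are hypothetical choices, not something Storm provides.

```shell
#!/bin/sh
# Hypothetical helper: start nimbus, supervisor and ui in the background.
# $1 is STORM_HOME, i.e. the directory where Storm was extracted.
start_storm_daemons() {
    storm_home="$1"
    mkdir -p "$storm_home/logs"
    for daemon in nimbus supervisor ui; do
        # Detach the daemon and capture its console output in a
        # separate file next to Storm's own log files.
        nohup "$storm_home/bin/storm" "$daemon" \
            < /dev/null > "$storm_home/logs/$daemon.out" 2>&1 &
        echo "started $daemon (pid $!)"
    done
}

# Usage (hypothetical path):
#   start_storm_daemons /opt/apache-storm
```

The printed process ids can be passed to kill later to stop the daemons.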

Contents of the new “storm.yaml” configuration file:

storm.zookeeper.servers:
- "localhost"

nimbus.host: "localhost"

supervisor.slots.ports:
- 6701
- 6702

You can also set the “worker.childopts” parameter to pass JVM options to each Worker (processing node). Here is a simple example for my local JVMs, where I set the min/max heap size and the garbage collection strategy, and enable JMX and GC logs.

worker.childopts: "-server -Xms512m -Xmx2560m -XX:PermSize=128m -XX:MaxPermSize=512m -XX:+UseParallelOldGC -XX:ParallelGCThreads=3 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -verbose:gc -Xloggc:/tmp/gc-storm-worker-%ID%.log -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.port=1%ID% -XX:+PrintFlagsFinal -Djava.awt.headless=true -Djava.net.preferIPv4Stack=true"

The “worker.childopts” parameter is applied to every Worker JVM. The “%ID%” variable is replaced with the port (6701 or 6702) assigned to each Worker. As you can see, I have used it to give each Worker a different JMX port and a different GC log file.
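
To make the “%ID%” substitution concrete, here is a minimal shell sketch that mimics it with sed. The function name expand_worker_opts is hypothetical and not part of Storm; Storm performs this substitution itself when it launches a Worker.

```shell
#!/bin/sh
# Hypothetical helper (not part of Storm): imitates how "%ID%" in
# worker.childopts is replaced with the port of each Worker slot.
expand_worker_opts() {
    template="$1"    # option string containing %ID% placeholders
    slot_port="$2"   # one of the ports listed in supervisor.slots.ports
    printf '%s\n' "$template" | sed "s/%ID%/$slot_port/g"
}

opts='-Xloggc:/tmp/gc-storm-worker-%ID%.log -Dcom.sun.management.jmxremote.port=1%ID%'
for port in 6701 6702; do
    expand_worker_opts "$opts" "$port"
done
```

With the two slots configured above, the Workers expose JMX on ports 16701 and 16702 and write their GC logs to /tmp/gc-storm-worker-6701.log and /tmp/gc-storm-worker-6702.log.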

We run Storm on JDK 7, but JDK 8 seems to be compatible too. The latest Storm has switched from Logback to Log4j2 (see the full release notes).

Using the above instructions, you will be able to run an HBase and Storm mini cluster on your laptop without any problem.

Regards,
Adrianos Dadis.

Real Democracy requires Free Software

About Adrianos Dadis

Building Big Data & Stream processing solutions in many business domains. Interested in distributed systems and enterprise integration.