Tuesday, September 13, 2011

Simple HBase Fixing Steps

HBase/Hadoop, like any other system, is not perfect. This is especially true here because it is a relatively new technology, and things can go completely wrong from even a small problem such as an accidental shutdown. Re-installing is a lot of trouble and can even make the situation worse. Hadoop is relatively stable, so simple mistakes like an accidental shutdown rarely cause Hadoop problems. HBase, however, is notorious for this. So these steps are only for cases where Hadoop is working perfectly fine (check with sudo -u hdfs hadoop dfsadmin -report) but HBase is broken (running status in the HBase shell gives you nothing and hangs, or reports problems).

The idea is simple: empty the HBase data, including its system files, so that you get a working (but empty) HBase back. Everything stored in HBase is lost, so you need regular backups for this to be an option. The steps are as follows.
First, shut down HBase (all servers, including ZooKeeper, master and regionservers).
1. delete /hbase directory
sudo -u hdfs hadoop fs -rmr /hbase
2. create empty /hbase directory
sudo -u hdfs hadoop fs -mkdir /hbase
3. change ownership to hbase user from default hdfs
sudo -u hdfs hadoop fs -chown hbase /hbase
4. delete zookeeper cache
sudo rm -r /var/zookeeper/version-2/
5. create cache folder back
sudo mkdir /var/zookeeper/version-2/
6. change its ownership to zookeeper
sudo chown zookeeper /var/zookeeper/version-2/
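
The six steps above can be collected into one small script. This is only a sketch under the assumptions of this post: the same paths and users (/hbase, /var/zookeeper/version-2, hdfs, hbase, zookeeper) and HBase already shut down. The DRY_RUN switch, the ZK_DIR variable and the run and reset_hbase helpers are names made up for this example, not part of HBase or Hadoop. By default the script only prints what it would do.

```shell
#!/bin/sh
# Sketch of steps 1-6. HBase and ZooKeeper must already be stopped.
# DRY_RUN is a hypothetical safety switch: it defaults to 1, which only
# prints each command. Set DRY_RUN=0 to really execute them.
DRY_RUN="${DRY_RUN:-1}"
# ZooKeeper path used in this post; adjust it if your setup differs.
ZK_DIR="${ZK_DIR:-/var/zookeeper/version-2}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Run the steps in order and stop at the first one that fails.
reset_hbase() {
    run sudo -u hdfs hadoop fs -rmr /hbase &&
    run sudo -u hdfs hadoop fs -mkdir /hbase &&
    run sudo -u hdfs hadoop fs -chown hbase /hbase &&
    run sudo rm -r "$ZK_DIR" &&
    run sudo mkdir "$ZK_DIR" &&
    run sudo chown zookeeper "$ZK_DIR"
}

reset_hbase
```

Run it once as is to read the printed commands, and only then run it again with DRY_RUN=0. Chaining the steps with && means a failed delete does not go on to recreate directories on top of a half-removed state.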

Now start HBase and try running status. It may not hang this time, but it may still give you odd results (if you are unlucky, as I was). If so, give it another try: stop HBase and start it again. Now run status as usual; it should work. Then start your map/reduce jobs again to reload the data.
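
The status check can also be run without opening the interactive shell, by piping the command into hbase shell. The sketch below is an assumption-heavy example, not a definitive health check: HBASE_CMD and hbase_status_ok are names invented here, and the script assumes that a healthy status summary contains text like "1 servers, 0 dead", which may differ between HBase versions. If HBase is still broken, the command may hang, just like the interactive one.

```shell
#!/bin/sh
# Sketch: run "status" through the HBase shell non-interactively.
# HBASE_CMD is a hypothetical override for the hbase executable.
HBASE_CMD="${HBASE_CMD:-hbase}"

hbase_status_ok() {
    # Feed the single command "status" to the HBase shell and keep
    # everything it prints, including error messages.
    out="$(echo "status" | "$HBASE_CMD" shell 2>&1)"
    echo "$out"
    # Assumption: a healthy summary looks like "N servers, M dead, ...".
    echo "$out" | grep -q "[0-9][0-9]* servers, [0-9][0-9]* dead"
}

if hbase_status_ok; then
    echo "hbase status: ok"
else
    echo "hbase status: not ok, stop hbase, start it again and retry"
fi
```

The check looks at the printed summary rather than the exit code, because this sketch does not assume the shell reports a failed command through its exit status.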

This may not be the best fix, but it is a quick one. I have experimented with other methods, but I messed things up every time. Still, as usual, analyse the problem first. The HBase problem may be a simple one that can be fixed without removing all the data, so look at the log files first. And back up frequently and regularly; don't wait for problems. They are very common in HBase. That's why they say it's NoSQL, viz. you don't have the stability of SQL servers. It can break very easily.

Note: We have not shut down Hadoop or made any changes to it, except removing the /hbase folder.