Thursday, May 24, 2012

Brightness Control Finally Enabled on My HP Pavilion System

I never knew that a problem which haunted me for years would be related to something I had never thought about. As a software engineer based in Nepal, load-shedding is a very common problem for me. Long hours of load-shedding demand long battery life, and controlling the monitor's brightness is very important for prolonging it. I have an HP Pavilion dv3, which I bought two years ago. My first step after buying it was to replace the pre-installed Vista with Ubuntu. But I noticed immediately afterwards that Ubuntu could not change the screen brightness. Searching for a solution, I found it to be a general problem, but I was unable to solve it even after trying various means. I tried again every time I updated Ubuntu to a newer version, right up to now (12.04).

So, if you face the same problem, here are the steps you need to fix it. It is related to GRUB; the resulting configuration lines are shown right after the list.
  1. gksudo gedit /etc/default/grub
  2. Change GRUB_CMDLINE_LINUX from "" to "acpi_osi=Linux"
  3. Change GRUB_CMDLINE_LINUX_DEFAULT from "" (or "quiet splash") to "quiet splash acpi_backlight=vendor"
  4. Close gedit
  5. Run sudo update-grub
  6. Reboot your system
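
For reference, after steps 2 and 3 the two lines in /etc/default/grub should read as follows (everything else in the file stays unchanged):

GRUB_CMDLINE_LINUX="acpi_osi=Linux"
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash acpi_backlight=vendor"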
I got it working after doing this (I couldn't believe it did!), and here are the links that gave me the solution. Hope it works for you too.
http://livinginjava.blogspot.de/2010/11/ubuntu-1010-brightness-problem-in-acer.html
http://askubuntu.com/questions/139796/cannot-adjust-brightness-on-my-packard-bell-easynote-ts11hr-127ge-laptop

Monday, March 26, 2012

Field and Field Descriptor pattern


Software, simply speaking, is logic applied to data. For applying the logic, the data needs to be defined consistently, so that the many people involved can work together over a long time without continuous training or feedback. This also helps in applying the logic consistently. A configurable system, in which the data definition is not hard-coded but defined outside in a configuration file, is much more flexible to change.

So, the pattern I am talking about applies wherever you have data in some format, and that format is supplied through a configuration file. The system thus has two inputs: the data, and the format of the data. The system uses the latter to apply logic to the former. The logic may be hard-coded or may itself be configurable, but that is not the focus of this pattern; it focuses only on the two inputs just described and their representation.

The configuration must be known before the data is available, when the system starts. Every definition in the configuration is represented by a FieldDescriptor, which defines the format of that field. It may contain values such as the name of the field, its expected type, the file the data is expected from, and the location of the field within the data; the exact contents are problem specific. It is an immutable class that never changes its values at run time once created, and is therefore thread safe. While loading, the system parses the configuration file and converts each definition in it into a FieldDescriptor object; any creational pattern can be used to build it. It must, however, never hold actual data. Descriptor objects should be comparable, and the comparison must take all fields into account when testing equality; what counts as higher or lower is problem specific.

Holding the value is the job of the other class, Field. Once the FieldDescriptors are loaded and the data is available, the system walks through the descriptor list and uses each descriptor to extract data from the source in the format it specifies. It then creates the corresponding Field object. A Field primarily holds two things, its descriptor and its value, and can carry any other values the problem demands. If a descriptor is not available when a Field is created but the individual description values are, they must be passed to the Field, which then creates and stores the descriptor itself. A Field must be able to provide both its value and its descriptor when asked.

Here is the basic scenario in which this pattern is applied, followed by a small sketch in Java.
[Diagram: scenario in which this pattern is applied]
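
To make the two roles concrete, here is a minimal sketch in Java. The particular descriptor fields shown (name, expected type, position) are only illustrative examples; as noted above, the actual contents are problem specific.

// FieldDescriptor: immutable description of one field's format (thread safe).
public final class FieldDescriptor {
    private final String name;            // name of the field
    private final Class<?> expectedType;  // expected type of the value
    private final int position;           // location of the field in the data

    public FieldDescriptor(String name, Class<?> expectedType, int position) {
        this.name = name;
        this.expectedType = expectedType;
        this.position = position;
    }

    public String getName() { return name; }
    public Class<?> getExpectedType() { return expectedType; }
    public int getPosition() { return position; }

    @Override
    public boolean equals(Object o) {     // equality takes all fields into account
        if (!(o instanceof FieldDescriptor)) return false;
        FieldDescriptor that = (FieldDescriptor) o;
        return name.equals(that.name)
                && expectedType.equals(that.expectedType)
                && position == that.position;
    }

    @Override
    public int hashCode() {
        return 31 * (31 * name.hashCode() + expectedType.hashCode()) + position;
    }
}

// Field: pairs a descriptor with the actual extracted value.
public final class Field {
    private final FieldDescriptor descriptor;
    private final Object value;

    public Field(FieldDescriptor descriptor, Object value) {
        this.descriptor = descriptor;
        this.value = value;
    }

    public FieldDescriptor getDescriptor() { return descriptor; }
    public Object getValue() { return value; }
}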

Wednesday, March 14, 2012

Installing Protocol Buffer 2.3.0 in newer Ubuntu versions


Protocol Buffers is an important API for storing data internally. But it has one severe limitation: it requires you to stay on one specific version. So, when you move to a newer version of Ubuntu where the older Protocol Buffers version is no longer supported, you have to uninstall the newer Protocol Buffers and replace it with the old one, e.g. 2.3.0.

Here are the steps you need to do that:

# remove existing version
sudo apt-get remove libprotobuf*  # * matches whatever version you have, e.g. libprotobuf5, libprotobuf7
sudo apt-get remove protobuf-compiler
sudo apt-get remove python-protobuf 

# time to install the new one
sudo apt-get install g++ # if you don't have g++ installed

# download protocol buffer
wget your-protocol-buffer.tar.gz
tar xvf your-protocol-buffer.tar.gz
cd protobuf-2.3.0
./configure
make
make check
sudo make install
cd python
sudo python setup.py install
# Note: setup.py attempts to download setuptools-0.6c9-py2.7.egg, which is no
# longer present at the expected location. Download the version available at
# http://pypi.python.org/packages/2.7/s/setuptools/ into the current directory,
# rename it to setuptools-0.6c9-py2.7.egg, and re-run the script.
sudo ldconfig
protoc --version
libprotoc 2.3.0

Hope it was installed successfully :)

Monday, January 16, 2012

Worse and Better Ways of Writing Software Architecture


Architecture is the foundation stone of any engineering discipline. Everybody knows this truth; I learned it every semester, but only realized it when I entered the professional world. There it becomes very important: it is the 20% of the project that takes about 80% of the development time. It is both the cement and the base-stone of a project and must not be taken lightly. It is, simply put, the body of code that defines the entire system and upon which feature code is written. Once it stabilizes, feature code can be stacked on top of it, as in house building, and new developers can work on it without needing intense imagination to understand things.

All that said, writing architecture is itself a very difficult task. It requires a highly experienced person with great passion and dedication, a great deal of imagination and technical know-how, and, above all, vision. It is the critical part of the project that can define the system's future, and the architect must be very careful while writing it, since it sets the boundary of the overall project. Especially in a team effort this becomes immensely important, and communication becomes the key to success. The problem lies not just in technical excellence but also in readability by the team, since multiple people are bound to be involved, and the project may even be outsourced someday to a different team in a different nation, with the current architect not available at all by then!

So, here are the bad ways in which you can develop software architecture. I work in a team that faced exactly these challenges, which forced me to reflect on the practices we were following. We currently have a complete re-write of our architecture underway, done in a new way that tries to support the same protocols the feature code already uses. In short, we are aiming to replace the old architecture stack, on which we had written workable feature code, with a generic architecture. I imagine it as Linus Torvalds planning a complete re-write of the Linux kernel in a microkernel style, replacing the existing monolithic code without changing the application software layer. Our manager proposed this and we liked the idea: we had just had our beta release and were in a position to make the change, and it gave the backend team a challenging atmosphere, which is always fun. A bunch of us are involved in the task, and my manager, who proposed it, had already written a few lines of code reflecting his idea.

Here starts the bad way of development. The idea as discussed was very general, without any concrete definition. There was no use-case or sequence-diagram discussion; there was code with lengthy documentation, cross-linked like an HTML document. I read it for a whole day trying to find a clue, all in vain. Now comes the worse way of writing architecture: the well-documented code was just a set of interfaces, and we couldn't understand the connections between them. Which interface was expected to call which? I tried to follow my intuition based on the naming, but still got confused. Perhaps a simple sequence diagram would have solved the entire problem without the lengthy documentation. Now comes the worst part: I was expected to write a complete implementation of those interfaces and support two major feature stacks on top of them without changing the feature code. Each interface had long documentation of what was expected to be done by each method, but only in a generic way; one method, for example, just listed a set of expected actions without any concrete mechanism. I couldn't apply my own intuition here either, since I was bound to follow every interaction as written in the documentation, even though some things inevitably turn out differently during implementation.

So, here is what I think an architect must do when writing architecture in a team. Let's start with the good way. Suppose an architect has a vision of the architecture but it is not yet completely clear, and he doesn't have time to write it himself, relying on other people to do so. In that case he can simply draw a diagram, perhaps a flowchart or a module diagram, showing the interactions between the expected modules, and use that same diagram to explain the expected responsibilities to the team. Diagrams are always easier to remember and understand than text. Once the idea is understood, the team can code it into a workable solution, if not a perfect one. That is what I call a good way of writing architecture.

Now the better way. Here the architect writes his own code first, based on his idea and imagination. The code contains only a few classes, but the really important ones, and covers one important use-case scenario. Remember: there are no interfaces! The key patterns are implemented. Once it is written, he calls the team and explains each class, clearly showing the interactions using top-level sequence or interaction diagrams in common UML notation. He then leaves the team with the task of expanding the architecture.

The best way is similar to the better way, except that the team is involved in the idea discussion from the very beginning. The architect implements the idea as before, resolving the technical difficulties of implementation along the way; these may include which configuration mechanism or even which technology to use. As before, no interfaces are presented, and the classes are minimal but important. Standard documentation is provided as an example of how and what the team must document later while developing. The important patterns are implemented, and the code and documentation follow what was discussed in the diagrams. For the team, the architecture code is then not like an arranged marriage, where everything is strange and new (you don't know whether looking at or away from the bride is appropriate); it is like a love marriage, where you know your bride very well and know how she is expected to behave. The architect acts as a guardian of the architecture code: for a few months he works directly with the team, showing them the expected way of expanding it.

It may seem I have acted harshly against the use of interfaces, which is not the case. I am a strong supporter of interfaces (even in my second semester, working in C++, I used pure abstract classes, because C++ has no interfaces). But their use only becomes clear after the architecture is written and we have different implementations. Only when the module interactions are fixed and the responsibilities of each module are crystal clear should interfaces come in. They help in expanding the architecture without changing existing code; they do not help while building it, which is why I argue against them in the starting phase. While expanding, interfaces are much better than simple extension: sub-classing makes code very hard to understand.

To summarize: an architect must be involved in writing the initial architecture. The initial architecture must implement one important use case, using the technology and patterns that will form the basis of the entire system; while expanding it, there should be no big change in technology or pattern unless necessary. The use of interfaces has to be avoided at the start of architecture building. Only once the initial architecture is complete and tested for workability should interfaces be introduced, one by one, to help expand it.

Tuesday, January 10, 2012

MockHTable: HTable Simulator with filter problem


While working with HBase, and doing TDD in particular, we found it a real difficulty to need a running HBase backend before we could start running unit tests. In such cases the MockHTable class comes in pretty handy. It implements the HTableInterface interface and thus provides all the services an HBase table would, but in memory. For unit tests this is a very efficient and completely transparent way of testing; in production we simply replace it with a regular HBase table. I found it a silver bullet for my daily HBase problems, as our HBase used to go down for some reason in the middle of the day, halting development.

But every convenience comes at a price. I was using filters, especially PrefixFilter, for range scans, but PrefixFilter was not being applied during scanning; instead, all rows of the table were returned. The mysterious thing was that scanning worked perfectly with non-row-based filters like SingleColumnValueFilter or QualifierFilter. After a look at the getScanner method of MockHTable, I realized it applies filterKeyValue to each KeyValue of each row. Some searching showed this was a deliberate change from the original code, which had used filterRow. That seemed reasonable at first, but the problem lies in how the Filter class works: filterRow simply deletes the columns of a row that do not meet the filter condition, so filters that operate on the row id go unnoticed by that call. Since filterRow had other problems too, it had been replaced earlier by filterKeyValue, which was a better use of the original API.

As you can see, for every KeyValue of the row, the code checks whether that KeyValue is allowed by the filter. This works fine for non-row filters, but where the filter must be applied to the row id it fails and returns all rows. My problem was with the PrefixFilter class, so the fix is to detect this case and apply a row-id check as well. I used the filterRowKey method, passing the row id so it can be checked against the prefix defined when the filter was created. By its definition, filterRowKey returns true when the row has to be filtered out and false when it is to be included. Applying this fixed my problem with PrefixFilter in MockHTable.
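
For illustration, here is a small self-contained check of the filterRowKey semantics using the real HBase classes; the MockHTable change itself amounts to calling filterRowKey with each row id inside getScanner and skipping the row when it returns true. This is a sketch of the idea, not the actual MockHTable code:

import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.PrefixFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class PrefixFilterCheck {
    public static void main(String[] args) {
        Filter filter = new PrefixFilter(Bytes.toBytes("row-1"));
        byte[] matching = Bytes.toBytes("row-123");
        byte[] other = Bytes.toBytes("xyz");
        // filterRowKey returns true when the row should be filtered OUT:
        System.out.println(filter.filterRowKey(matching, 0, matching.length)); // false -> keep row
        System.out.println(filter.filterRowKey(other, 0, other.length));       // true  -> skip row
    }
}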

Date Format and its value in Data Search

The date data type can be a very important field in data-analytics systems, especially in the reporting part. If, in addition, your application offers online data search, as Makalu does, then specifying the right format becomes even more important.

Date values normally arrive as strings in the YYYY-MM-DD format that SQL follows. But storing dates as strings is very problematic, especially when different formats come into play and each must be supported for exact search or reporting. So we chose the long type instead: milliseconds since the epoch (00:00:00 GMT on January 1, 1970). It allows date comparison, which is key in search activities, especially when the date fields are indexed.

There is, however, one catch in the time conversion. It arises from the default TimeZone of the processing system: the millisecond value of the same date string differs by timezone (remember, the epoch is defined with respect to GMT). The date value '2018-10-12 11:12:11' is equivalent to 1539322031000 in NPT and 1539342731000 in UTC. This works fine as long as client and server are in the same timezone. But if the requesting client and the serving server are in different timezones, the variance can be large, possibly extending to a whole day! In data search, if a client in Nepal asks a US-based server for all records with a service date less than '2018-10-12 11:12:11', it means that instant in Nepal time; but since the stored data is interpreted in the server's zone, the result set can be off by hours from what was intended. This scenario arises easily in the cloud, where systems are location-transparent and may sit in different US states in different timezones. So this issue is very important to deal with.
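
The two millisecond values above can be checked with a few lines of Java; Asia/Kathmandu is the tz database name for Nepal time (NPT, UTC+5:45):

import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

public class EpochByZone {
    public static void main(String[] args) throws ParseException {
        DateFormat f = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        f.setTimeZone(TimeZone.getTimeZone("Asia/Kathmandu")); // NPT
        System.out.println(f.parse("2018-10-12 11:12:11").getTime()); // 1539322031000
        f.setTimeZone(TimeZone.getTimeZone("GMT"));
        System.out.println(f.parse("2018-10-12 11:12:11").getTime()); // 1539342731000
    }
}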

There is, however, an easy way to deal with it. The DateFormat class that parses a string into a Date object has a simple method for parsing in a fixed common TimeZone. Using this, we convert all dates with respect to GMT while processing data; the client likewise converts its request dates to GMT, so the two sides are always compatible with each other.

Here are two ways to do it:
1.    // Tell the DateFormat itself to parse in GMT:
    DateFormat sqlDateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    sqlDateFormat.setTimeZone(TimeZone.getTimeZone("GMT"));
    Date d = sqlDateFormat.parse(getDate(a));
    return d.getTime();
2.    // Give the DateFormat a GMT Calendar before parsing. (Note: parsing first
    // and only then copying the Date into a GMT Calendar changes nothing,
    // because a Date's millisecond value is timezone-independent.)
    Calendar c = Calendar.getInstance(TimeZone.getTimeZone("GMT"));
    DateFormat sqlDateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    sqlDateFormat.setCalendar(c);
    Date d = sqlDateFormat.parse(getDate(a));
    return d.getTime();

References:
http://stackoverflow.com/questions/2627992/force-java-timezone-as-gmt-utc
http://docs.oracle.com/javase/1.4.2/docs/api/java/util/Date.html
http://docs.oracle.com/javase/1.4.2/docs/api/java/util/Calendar.html
http://docs.oracle.com/javase/1.4.2/docs/api/java/text/DateFormat.html#setTimeZone(java.util.TimeZone)
http://docs.oracle.com/javase/1.4.2/docs/api/java/util/TimeZone.html 

Tuesday, September 13, 2011

Simple Hbase Fixing Steps

HBase/Hadoop is not perfect, like any other system. This is especially true here because it is a relatively new technology, and things can go completely wrong from even a small problem like an accidental shutdown. Re-installing is real trouble and can even worsen the situation in such cases. Hadoop itself is relatively stable, so simple mistakes like an accidental shutdown will rarely break it; HBase, however, is notorious for such things. So these steps are only for the case where Hadoop is working perfectly fine (check with sudo -u hdfs hadoop dfsadmin -report) but HBase is screwed up (running status in the hbase shell gives you nothing, hangs, or reports problems).

The idea is simple: empty the HBase data, including the system files, so that you get a working HBase back. This is also why you need regular backups, to avoid losing everything. The steps are as follows.
Shut down HBase first (all servers, including ZooKeeper, the master and the regionservers), then:
1. Delete the /hbase directory
sudo -u hdfs hadoop fs -rmr /hbase
2. Create an empty /hbase directory
sudo -u hdfs hadoop fs -mkdir /hbase
3. Change its ownership from the default hdfs to the hbase user
sudo -u hdfs hadoop fs -chown hbase /hbase
4. Delete the ZooKeeper cache
sudo rm -r /var/zookeeper/version-2/
5. Create the cache folder back
sudo mkdir /var/zookeeper/version-2/
6. Change its ownership to the zookeeper user
sudo chown zookeeper /var/zookeeper/version-2/

Now start HBase and try status. It may not get stuck this time but may still give odd results (if you are unlucky like me). In that case, give it one more try: stop HBase and restart it. Now status should work as usual. Then start your map/reduce jobs again to reload the data.

This may not be the best fix, but it is a quick one. I have experimented with other methods and have screwed things up every time. As usual, though, analyse the problem first: it may be simple and fixable without removing all the data, so check the log files before anything else. And back up frequently and regularly; don't wait for problems, because they are very common in HBase. That's why they call it NoSQL: you don't get the stability of SQL servers, and it can break very easily.

Note: we have not shut down Hadoop or made any changes to it, apart from removing the /hbase folder.