Sunday 29 July 2012

Coding Week 7: Building Entry Point

The feature that makes everyone crazy about any building, structure, code or anything else is its first impression. For a building, the entrance, main door and outer look play that role; for us, it is our outward appearance and presentation. I don't mean that we should judge a person or a building by these characteristics alone, but saying that the first impression is mostly the last impression isn't wrong either.


So for any organization, person or piece of code, making the first impression worthwhile is the biggest aim. So it is (and was) for me.


This week was about designing, building and adjusting the outermost module of my project, the query parser. Any user other than a developer won't see the bi-gram implementation in the index; what they will experience is queries parsed with bi-grams. So for me this is the entry point, and an exciting module to code.
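To make "queries parsed with bi-grams" concrete, here is a minimal sketch of pairing up adjacent query terms. The function name `make_bigrams` and the space used as a separator are assumptions made for illustration only; this is not how Xapian's query parser is written.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of Xapian): joins each query term with the
// term that follows it, so "a b c" yields the bi-grams "a b" and "b c".
std::vector<std::string> make_bigrams(const std::vector<std::string>& terms,
                                      const std::string& separator = " ") {
    assert(!separator.empty());  // an empty separator would make bi-grams ambiguous
    std::vector<std::string> bigrams;
    for (std::size_t i = 0; i + 1 < terms.size(); ++i) {
        bigrams.push_back(terms[i] + separator + terms[i + 1]);
    }
    return bigrams;
}
```

A query with n terms gives n - 1 bi-grams, so a single-word query produces none and falls back to plain uni-gram matching.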


Implementing the query parser module started with understanding the current code, which was very difficult and needed a lot of patience. But I had a savior, Sehaj Singh Kalra. He is working on improving the query parser this year as a GSOC project, and he gave me the docs he created as part of his project. Those docs are among the most useful I have read in my life. Thanks for such great help, Sehaj.


I tried my hand at the code with the help of ssk's docs and learned where I needed to attack, but there were still some grey areas, so I decided to discuss them in my review meeting. Dan Colish, James Aylett and I discussed the parser and decided to make the easiest changes and move on to the evaluation module. The aim was to make minimal changes now; once the evaluation module exists, further changes will make more sense because we will know whether a change actually benefits us.


I want to share one of my experiences with parth (mentor @ xapian). Once I was working in the same lab as parth and discussing one of my projects with him. He told me to make a module or script that I could play with and see numerical results, saying "It's always easy if you can see your results in numbers, and it motivates you" (not exact, but it carries the gist of what he said). So we decided to build this evaluation module and later improve all of our modules using its results.


This week we changed the NEAR, PHRASE, ADJ and GROUP queries to include bi-grams, and refactored the weighting scheme UnigramLMWeight into LMWeight. LMWeight handles the uni-gram and bi-gram language models based on its parameters. And obviously, buckle up for the evaluation module.

Coding Week 5 & 6: Cleaning the Mess

At this point I had completed the back-end work to store bi-grams and retrieve them for use. But for good software, the most important thing is to make it work correctly and as expected. As they say in the software world, "Actual work starts when you complete the implementation". This held true in my GSOC project too.
Generally any piece of software has a set of pre-made tests, written to catch unexpected behavior and check the code. I ran these tests and a lot of them were failing with my code, so it was time to fix these "pain in the ass" bugs. Most of the time they are such common bugs that you forget to check for them, or don't expect them to happen.

These weeks were mostly spent sleeping with these bugs, discussing comments from my mentor and watching "Burn Notice" (one of the most awesome series I have seen).

In week 6 my mentor (jaylett) commented on my work, and I was totally impressed by his way of finding deep problems that are generally overlooked. I learned a lot from each and every one of his comments, so these weeks were awesome.

Saturday 28 July 2012

Confession for Week 5 to Week 9(Not Blogging)

It's the 10th week, but I proudly want to write the Week 5 post, as it's better late than never. After the appreciation from Dan I should have put in more effort, but I kind of became complacent. In the past few days I ruined my routine: I started sleeping at bad times, watching series and doing unnecessary things alongside work, and just barely managed to complete my GSOC work, to be frank. I really felt, and still feel, bad about that :( . One morning while I was sleeping, the phrase "Let it Go" bedazzled me.
One more event which left a similar impression on me was watching This Week in Startups,
Jason Calacanis's show. In the news section he had a discussion about Facebook and
bars. I think you know what the verdict would be ;) and for those who don't: "It's just a waste of time".

That was, and this is, my moment of Eureka. I plan to be a little more careful about what I do
and what I don't. I will use the question "Should I let it go?" to evoke my self-consciousness
and take control.

Blabbering:

Almost every second we run at the pace of our life (slow or fast) and don't
take time to think about what we do, or don't think seriously enough to let our self-consciousness
take possession of the situation.
We kind of know "I shouldn't do this",
but
we restrain our self-consciousness from taking possession.

This happens to me very often, so having a helper like "Let it Go", or
anything else that bedazzles you, is the best way to help yourself where you are the culprit.

Tuesday 19 June 2012

Coding Week 4:Chess,Game of Strategies

Yes, you read it correctly, chess it is !!!


Oh, I am in no way comparing myself to the Indian prodigy Viswanathan Anand. It's just that looking up chess and India reminded me of our prodigy, and I thought his picture would depict chess well.

Without a lot of explanation, I will start describing how the week was:

The week started with a lot of frustration at discarding the previous week's implementation, and a resolution not to repeat the previous week's mistakes. After all, we have to learn from our mistakes.

As a formal procedure, jotting down what was done in the week:

1. Removed the previous implementation and adjusted the back-end to treat bi-grams as just another kind of term.
2. Storing bi-grams ruined the document statistics, so the various statistics are now stored separately: document length, bi-gram document length, number of unique terms and number of unique bi-grams.
3. Fixed bugs caused by the bi-gram statistics in the posting list (found during regression testing).

The week started with cleaning up the previous week's mess: removing the implementation and treating bi-grams as terms. This was followed by a discussion on how to store the bi-gram statistics. I learned from my mistakes and first discussed the foundational idea of how to store the document statistics in the back-end.

Since per-document statistics are accessed while matching, they are stored in an easily accessible location, the post-list table. So now the per-document statistics reside as 4 different posting entries in the post-list table.
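To show what the four per-document numbers mean, here is a small sketch that computes them from a tokenised document. The `DocStats` struct and `compute_doc_stats` function are names I made up for illustration, and the space-joined bi-gram is an assumption; the sketch says nothing about how the back-end lays these values out in the post-list table.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical container for the four per-document statistics.
struct DocStats {
    std::size_t doc_length = 0;         // total number of terms
    std::size_t bigram_doc_length = 0;  // total number of adjacent term pairs
    std::size_t unique_terms = 0;       // number of distinct terms
    std::size_t unique_bigrams = 0;     // number of distinct bi-grams
};

// Walks the tokens once, counting terms and adjacent pairs, and uses sets
// to find how many of each are distinct.
DocStats compute_doc_stats(const std::vector<std::string>& tokens) {
    DocStats stats;
    std::set<std::string> seen_terms;
    std::set<std::string> seen_bigrams;
    for (std::size_t i = 0; i < tokens.size(); ++i) {
        ++stats.doc_length;
        seen_terms.insert(tokens[i]);
        if (i + 1 < tokens.size()) {
            ++stats.bigram_doc_length;
            seen_bigrams.insert(tokens[i] + " " + tokens[i + 1]);
        }
    }
    stats.unique_terms = seen_terms.size();
    stats.unique_bigrams = seen_bigrams.size();
    return stats;
}
```

Keeping the bi-gram counts separate from the term counts is what stops bi-grams from inflating the ordinary document length.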

Following this came the most interesting thing of the week, playing chess. While regression testing the back-end changes I found all the tests of the test suite failing in front of me. Believe me, this is not something you want to see, at least not for the back-end, but you always end up here. After investigation there turned out to be two important bugs. Please refer here for details. Then started the game of chess (bug identification) with Xapian.

I was right there on the playing field, setting checkpoints, looking at the estimation problem and making strategies about what could come next: if the code gave this error (played this move), then I would do this, or the possible reason could be that. Finally I solved the bug, giving it check and mate (winning the game of chess, so much relief for the mind).

Coding Week 3: Working Holiday

The beginning of the week was really exciting, as in the previous week I had my Trekking Exploration with the Xapian code and prepared the so-called Bi-gram Integration Proposal. I also asked the community for feedback on the proposal.

So it was time to get back to coding again and implement the proposal.

I started working as I had proposed, since no one responded to the proposal. And this was my biggest mistake!!!!

What I did (implemented):
  1.  Bigram Iterator.
  2.  Bigram PostList.
  3.  Stored the bi-grams of a document in the termlist table with a different key than normal terms, so that the back-end treats bi-grams differently from uni-grams and we can access them fairly easily.
  4.  Left the postlist storage unchanged: the back-end postlist table uses the term as the key, so bi-grams don't require a different key.
Didn't understand what I did? Referring to this might help: Bigrams
After completing the work I informally told Olly Betts (my saviour) what I had done. He told me it's better to treat bi-grams as terms, but I was convinced that we should keep them separate, mainly for two reasons:

  1.  I thought it was best to keep them separate as it would be easy to access them later.
  2.  Keeping them as terms would make my whole implementation a big waste, and I thought that if we already have them separate, why mix them together and make a mess.

I spent a full day arguing with Olly Betts that we should keep them separate. But finally Olly said:

"olly: I guess we are going over a circle without quite understanding each other"

So then I decided to discard my implementation and treat them as terms. After some more discussion to understand why we store them together, Olly told me that we don't really access these frequently, and moreover the back-end doesn't need to know which are bi-grams and which are uni-grams. The application can easily check which are bi-grams, and it won't hurt performance because opening the termlist of a document is a rarely performed operation.

So it seemed a good reason to treat them the same, and I discarded my whole implementation and started afresh. All in all I learned a lot and had fun coding this week, but produced no useful work, making it a Working Holiday.

Free Advice for readers(if any) :P:

1. Never start implementing the foundations before discussing every bit with your mentor or the concerned members.
2. Try to understand the other person and ask questions wisely (to resolve doubts) rather than circling around a problem.

Friday 8 June 2012

Coding Week 2: Trekking Exploration

A trek is a long, adventurous journey undertaken on foot in areas where common means of transport are generally not available.

Yes, as the title suggests, week 2 was completely dedicated to Xapian exploration. The Bigram Integration proposal was in the most procrastinated stage, so I decided to explore and create a plan for the integration of bi-grams, which is a major part of my work in GSOC 2012.

I remember I once went on a small trip to Rishikesh that had a little trekking adventure in the package. Trekking is like discovering new things around you in a difficult environment. The most important part of trekking is when you stand at a distance from a beautiful waterfall or landscape and can feel the wonderful place around you, which creates an eagerness to run and visit it.

Ah! Strangely, or luckily I must say, code exploration of Xapian also followed the lines of trekking.

How Come Trekking Exploration ==== Code Exploration ??
Wanting to write the Bi-gram Integration proposal for my GSOC project, I started at the entrance of Xapian, which is the user API classes and functions. Since the first step would be indexing, I started from how indexing works. This can be considered the entry point from where exploration starts, where people are really excited about going on the excavation. So was I :) !!!

Then I followed the trail: which functions the API calls, and which infrastructure in the architecture will support the bi-gram integration. You see a lot of exciting code and components in between, like the sight-seeing locations in trekking.
I was not very aware of how the TermList is accessed by the API user, and I found the TermIterator, which is basically a pointer into the termlist. So while heading towards the final goal, the "Bigram Integration Proposal", you get plenty of exciting experiences on the journey: the code components, of course, are the sight-seeing locations we admire. I would also like to quote a proverb from the film Peaceful Warrior.
The journey is more important than the destination itself - Peaceful Warrior

In a program, when you locate the place where the code could reside, you rush to the class to find the code and feel satisfied about how the thing works. Don't say you can't relate that to trekking: you hear water falling and you rush to the location of the waterfall. And finally there is a (happy) moment when you actually find out "this is how the terms are stored, so I can create a parallel infrastructure for bi-grams", or how the surroundings of the waterfall look. So, all in all, you complete your exploration and feel happy about it.

There were discouraging moments when I couldn't find out how something works, and I am glad I wasn't disheartened; I re-searched the trails and finally created the Bigram Integration Proposal. These discouraging and happy moments during exploration filled the journey to the proposal. I really enjoyed it: the experience during the journey matters more than the proposal itself (and will help in coding too).

All in all, trekking in Week 2 was real fun.

P.S: I have restrained myself from including a bulk of technical details; if you like, please visit the so-called Bigram Integration Proposal at http://goo.gl/Noifa

Tuesday 29 May 2012

Coding Week 1: Roller Coaster Ride,C++

Fun-filled and exciting ride with C++

A roller coaster is a popular amusement ride that briefly turns the rider upside down. Roller coasters are not self-powered; rather, they run on the principle of conservation of energy. Potential energy from height is converted to kinetic energy, and vice versa, throughout the ride.

My experience with the first week of coding was similarly thrilling, so I was inclined towards choosing roller coaster ride as the title. While coding, there were areas where I had experience and worked fairly easily and fast; those areas enriched me with kinetic energy to tackle the grey areas of difficulty and inexperience. And obviously, solving the difficult ones gave me potential energy to work even faster in the easy areas. This shift between potential and kinetic energy gave me the experience of a roller coaster ride.

What more fun can one ask for in summer ??

Okay, I must not forget that this blog is primarily meant to share what I did with people interested in my work, with less emphasis on how I felt. So here is how I spent the first coding week.

This week I extended the prototype of the UnigramLMWeight class that I built as a patch for GSOC selection. Four types of smoothing were implemented, with the ability for the API user to select the type of smoothing they want. A brief discussion on the smoothing techniques to be implemented took place, and we finally decided on implementing four smoothing schemes: Dirichlet prior smoothing, absolute discount smoothing, Jelinek-Mercer smoothing and two-stage smoothing.
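For readers who haven't met these four schemes, here is a minimal sketch of their usual textbook forms. The `TermStats` struct, the function names and the parameter names (`lambda`, `mu`, `delta`) are my own choices for illustration, and the code is not taken from UnigramLMWeight; the actual class may combine or scale things differently.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical container for the statistics the formulas need.
struct TermStats {
    double tf;            // occurrences of the term in the document
    double doc_length;    // total number of terms in the document
    double unique_terms;  // number of distinct terms in the document
    double p_collection;  // probability of the term in the whole collection
};

// Jelinek-Mercer: fixed linear mix of document and collection models.
double jelinek_mercer(const TermStats& s, double lambda) {
    assert(s.doc_length > 0);
    return (1.0 - lambda) * (s.tf / s.doc_length) + lambda * s.p_collection;
}

// Dirichlet prior: adds mu pseudo-counts drawn from the collection model.
double dirichlet_prior(const TermStats& s, double mu) {
    assert(s.doc_length + mu > 0);
    return (s.tf + mu * s.p_collection) / (s.doc_length + mu);
}

// Absolute discount: subtracts delta from every seen count and gives the
// freed probability mass to the collection model.
double absolute_discount(const TermStats& s, double delta) {
    assert(s.doc_length > 0);
    double discounted = std::max(s.tf - delta, 0.0) / s.doc_length;
    double reserved_mass = delta * s.unique_terms / s.doc_length;
    return discounted + reserved_mass * s.p_collection;
}

// Two-stage: Dirichlet smoothing first, then a Jelinek-Mercer style mix.
double two_stage(const TermStats& s, double lambda, double mu) {
    return (1.0 - lambda) * dirichlet_prior(s, mu) + lambda * s.p_collection;
}
```

Note that absolute discounting is the one that needs the number of unique terms in the document, which is why that statistic has to be available to the weighting class.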

So the main task of the week was to include smoothing in the UnigramLMWeight class and update the prototype with the techniques discussed: handling negative values from the log, bounds for optimization, and the ability for the user to select the type of smoothing used.

First I started with handling negative values of the log and the bounds for optimization, which was pretty easy based on the discussion on IRC and the mailing list.
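Why is the log negative at all? A probability is at most 1, so its log is at most 0. One possible way to keep the weight non-negative, sketched here under my own assumptions, is to multiply the probability by a scale factor before taking the log and clamp whatever is still negative to zero. The function name `clamped_log_weight` and the `scale` parameter are hypothetical, and this is not necessarily what the class does.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: scales the probability before taking the log and
// clamps the result at zero, so the returned weight is never negative.
double clamped_log_weight(double probability, double scale) {
    assert(probability > 0.0 && scale > 0.0);  // log needs a positive argument
    return std::max(0.0, std::log(probability * scale));
}
```

With this shape, any probability below 1 / scale contributes a weight of exactly zero instead of a negative number.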

The other major task was to parameterize the constructor so that the user can select the smoothing and the parameter used to clamp negative values of the log. Adding a constructor is a fairly simple task and shouldn't have taken me long to get working.

But after implementing it, when I was looking at the constructors and troubleshooting them, I found that the default constructor was always called, whatever parameters I gave. So I tried fiddling with the parameters and, going by experience, changing them in various ways. But no luck: the default was always the final call, and I was stuck on the issue. Then I posted the question on Stack Overflow, and from the discussion there I came to know that when I create an object of the weight class and pass it to set_weighting_method, the weight object is cloned, and the clone calls the default constructor at the end.
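The pitfall can be reproduced in a few lines. The classes below are simplified stand-ins I wrote for illustration, not Xapian's real Weight hierarchy, and the free function `set_weighting_method` only mimics the "store a clone" behaviour described above.

```cpp
#include <cassert>
#include <memory>

// Simplified stand-in for a weighting-scheme base class (not Xapian's).
class Weight {
public:
    virtual ~Weight() = default;
    virtual std::unique_ptr<Weight> clone() const = 0;
    virtual double smoothing_param() const = 0;
};

// clone() builds the copy with the default constructor, so the parameter
// the user passed in is silently lost.
class BuggyLMWeight : public Weight {
    double param_;
public:
    explicit BuggyLMWeight(double param = 0.5) : param_(param) {}
    std::unique_ptr<Weight> clone() const override {
        return std::unique_ptr<Weight>(new BuggyLMWeight());
    }
    double smoothing_param() const override { return param_; }
};

// clone() forwards the stored parameter, so the copy matches the original.
class FixedLMWeight : public Weight {
    double param_;
public:
    explicit FixedLMWeight(double param = 0.5) : param_(param) {}
    std::unique_ptr<Weight> clone() const override {
        return std::unique_ptr<Weight>(new FixedLMWeight(param_));
    }
    double smoothing_param() const override { return param_; }
};

// Mimics an API that keeps its own copy of the weight it is given.
std::unique_ptr<Weight> set_weighting_method(const Weight& weight) {
    return weight.clone();
}
```

The general lesson is that every constructor parameter has to be passed along inside clone(), otherwise the object that actually does the weighting is a default-constructed one.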

The next major hurdle was adding an extra per-document statistic to the Xapian architecture, the number of unique terms in the document, to be accessed in the Weight class. Initially I got confused between Rset and Mset, and after some discussion I tried getting this statistic in the weightinternal class, where we get the document id for the Rset. But I needed it for the Mset, so the whole implementation was wasted. I had to reset to the last commit, add nouniqterm for documents in the database, and add a new version of the get_sumpart function in the Weight class that takes the extra per-document statistic.

Then switching to the Brass backend created some hurdles. I thought it would require disabling Chert at configure time, and since Chert is the default backend, disabling it caused problems. But believe it or not, you just need to change the namespace of the open function to Brass:: when constructing the database, and it will automatically start using the Brass backend.

So this week, working around some of these problems, caused by my inexperience in programming, gave me fun-filled joy rides.

I hope I will pick up speed and cover the lagging parts by the end of next week.