Tuesday 19 June 2012

Coding Week 4: Chess, Game of Strategies

Yes, you read that correctly: chess it is!


Oh, I am in no way comparing myself to the Indian prodigy Viswanathan Anand. A bit of information retrieval on chess and India simply reminded me of our prodigy, and I thought his picture would be a good choice to depict chess.

Without a lot of explanation, here is how the week went:

The week started with a lot of frustration at discarding the previous week's implementation, and with a resolution not to repeat the previous week's mistakes. After all, we have got to learn from our mistakes.

As a formal procedure, here is what was done this week:

1. Removed the previous implementation and adjusted the back-end to treat bi-grams as just another kind of term.
2. Storing bi-grams as terms ruined the document statistics, so the back-end now stores the various statistics separately: document length, bi-gram document length, number of unique terms, and number of unique bi-grams.
3. Fixed bugs caused by the bi-gram statistics in the posting list (found during regression testing).
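
To make the four statistics from item 2 concrete, here is a minimal Python sketch. It is not Xapian code: the whitespace tokenizer, the space used to join two words into a bi-gram term, and the names index_terms and document_stats are all my own assumptions for illustration.

```python
# Hypothetical separator used to join two adjacent words into one bi-gram term.
BIGRAM_SEP = " "


def index_terms(text):
    """Return (unigrams, bigrams) for a whitespace-tokenised document."""
    words = text.lower().split()
    # Every pair of adjacent words becomes one bi-gram "term".
    bigrams = [a + BIGRAM_SEP + b for a, b in zip(words, words[1:])]
    return words, bigrams


def document_stats(text):
    """Compute the four per-document statistics kept apart for uni-grams and bi-grams."""
    unigrams, bigrams = index_terms(text)
    return {
        "doc_length": len(unigrams),          # total uni-gram occurrences
        "bigram_doc_length": len(bigrams),    # total bi-gram occurrences
        "unique_terms": len(set(unigrams)),   # distinct uni-grams
        "unique_bigrams": len(set(bigrams)),  # distinct bi-grams
    }


stats = document_stats("the quick fox saw the quick dog")
```

If bi-grams were simply counted together with uni-grams, the document length of this seven-word example would come out as 13, which is the kind of skew that made separate counters necessary.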

The week started with cleaning up the previous week's mess: removing the old implementation and treating bi-grams as terms. This was followed by a discussion on how to store the bi-gram statistics. I had learned from my mistakes, so this time I first discussed the foundational idea of how to store the document statistics in the back-end.

Since the per-document statistics are accessed while matching, they are stored in an easily accessible location: the postlist table. So the per-document statistics now reside as four different posting entries in the postlist table.
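
As a rough picture of "four posting entries per document", here is a toy Python model. The PostlistTable class, the key names, and the "\x00" prefix that keeps statistic keys apart from real terms are hypothetical; this is not Xapian's actual on-disk format.

```python
# Hypothetical names for the four per-document statistics.
STAT_KEYS = ("doclen", "bigram_doclen", "unique_terms", "unique_bigrams")

# Hypothetical prefix so that statistic keys never collide with real terms.
STAT_PREFIX = "\x00"


class PostlistTable:
    """Toy stand-in for a postlist table: key -> {docid: value}."""

    def __init__(self):
        self.entries = {}

    def add_posting(self, key, docid, value):
        self.entries.setdefault(key, {})[docid] = value

    def get_posting(self, key, docid):
        return self.entries[key][docid]


def store_document_stats(table, docid, stats):
    """Write one posting entry per statistic for the given document."""
    for key in STAT_KEYS:
        table.add_posting(STAT_PREFIX + key, docid, stats[key])


def load_document_stats(table, docid):
    """Read the four statistics back, as a matcher would need them."""
    return {key: table.get_posting(STAT_PREFIX + key, docid) for key in STAT_KEYS}


table = PostlistTable()
store_document_stats(
    table, 1,
    {"doclen": 7, "bigram_doclen": 6, "unique_terms": 5, "unique_bigrams": 5},
)
# An ordinary term posting lives in the same table (value = within-document frequency).
table.add_posting("quick", 1, 2)
loaded = load_document_stats(table, 1)
```

The point of the sketch is only that the statistics sit next to the ordinary postings, so the matcher can fetch them from the table it is already reading.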

After this came the most interesting thing of the week: playing chess. While regression testing the back-end changes, I found every test in the test suite failing in front of me. Believe me, this is not something you want to see, at least not for the back-end, but you always end up here anyway. After the investigation, there turned out to be two important bugs. Please refer here for details. Then started the Game of Chess (bug identification) with Xapian.

I was right there on the playing field: setting checkpoints, estimating the problem, working out strategies for what could come next, and planning that if the code gave this error (played this move), I would do that, or that the possible reason could be such and such. Finally I solved the bugs, giving them check and mate (winning the Game of Chess, which was such a relief for the mind).

Coding Week 3: Working Holiday

The beginning of the week was really exciting, as in the previous week I had my Trekking Exploration with the Xapian code and prepared the so-called Bi-gram Integration Proposal. I had also asked the community for feedback on the proposal.

So it was time to get back to coding again and implement the proposal.

I started working as laid out in my proposal, since no one had responded to it. And this was my biggest mistake!

What I did (implemented):
  1. Bigram Iterator.
  2. Bigram PostList.
  3. Stored the terms of a document in the termlist table with a different key than normal terms, to treat bi-grams differently from uni-grams in the back-end, so that we can access the terms of a document fairly easily.
  4. Adjusted the PostList to store bi-grams without any changes: the back-end postlist table storage already has the term as its key, so it doesn't require a different key.

Didn't understand what I did? Referring to this might help: Bigrams

After completing the work, I informally told Olly Betts (my saviour) what I had done. He told me it is better to treat bi-grams as terms, but I was convinced that we should keep them separate, mainly for two reasons:

  1. I thought it was best to keep them separate, as they would be easy to access later.
  2. Keeping them as terms would make the whole implementation a big waste, and I thought that if we already have them separate, why make a mess by keeping them together.

I spent a full day arguing with Olly Betts that we should keep them separate. But finally Olly said:

"olly: I guess we are going over a circle without quite understanding each other"

So I then decided to discard my implementation and treat them as terms. After some more discussion to understand why we store them together, Olly told me that we don't really use these frequently and, moreover, the back-end doesn't need to know which terms are bi-grams and which are uni-grams. The application can easily check which are bi-grams, and this won't decrease performance, since opening the termlist of a document is a rarely performed operation.
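
Olly's point that the application can tell bi-grams apart by itself can be sketched in a few lines of Python. This assumes a convention I am making up for illustration: a bi-gram term is two words joined by a space, so any term containing the separator is a bi-gram. The termlist here is a plain Python list, not a real Xapian termlist.

```python
# Hypothetical convention: a bi-gram term is two words joined by this separator.
BIGRAM_SEP = " "


def is_bigram(term):
    """Application-side check; the back-end never needs to know the difference."""
    return BIGRAM_SEP in term


def split_termlist(termlist):
    """Partition a document's termlist into uni-grams and bi-grams."""
    unigrams = [t for t in termlist if not is_bigram(t)]
    bigrams = [t for t in termlist if is_bigram(t)]
    return unigrams, bigrams


# A document's termlist as the back-end would return it: one flat, sorted list.
termlist = ["dog", "fox", "quick", "quick fox", "the", "the quick"]
unigrams, bigrams = split_termlist(termlist)
```

The check is a single pass over one document's terms, which fits the argument that it costs little for an operation that is rarely performed.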

So that seemed a good reason to treat them the same, and I discarded my whole implementation and started afresh. All in all, I learned a lot and had fun coding this week but produced no useful work, making it a Working Holiday.

Free advice for readers (if any) :P

1. Never start implementing the foundation before discussing every bit of it with your mentor or the concerned members.
2. Try to understand the other person and ask questions wisely (to resolve doubts) rather than circling around a problem.

Friday 8 June 2012

Coding Week 2: Trekking Exploration

A trek is a long, adventurous journey undertaken on foot in areas where common means of transport are generally not available.

Yes, as the title suggests, week 2 was completely dedicated to Xapian exploration. The Bigram Integration Proposal was in its most procrastinated stage, so I decided to explore and create a plan for the integration of bi-grams, which is a major part of my work in GSOC2012.

I remember once going on a small trip to Rishikesh that had a little trekking adventure in the package. Trekking is like discovering new things around you in a difficult environment. The most important part of trekking is when you stand at a distance from a beautiful waterfall or landscape and can feel the wonderful place around you, which creates an eagerness to run over and visit it.

Ah! Strangely, or luckily I must say, my code exploration of Xapian also followed the lines of trekking.

How come Trekking Exploration == Code Exploration?
In my desire to write the Bi-gram Integration Proposal for the GSOC project, I started at the entrance of Xapian: the user API classes and functions. Since the first step would be indexing, I started from what indexing is. This can be considered the entrance point from where the exploration starts, and people are really excited about going on an excavation. So was I :) !!!

Then I followed the trail: what functions the API calls, and what infrastructure in the architecture will support the bi-gram integration. You see a lot of exciting code and components along the way, or sight-seeing locations in the case of trekking.
I was not very aware of how the TermList is accessed by the API user, and I found the TermIterator, which is basically a pointer to the termlist. So on the way to the final goal, the "Bigram Integration Proposal", you get plenty of exciting experiences on your journey: code components, of course, which are the sight-seeing locations we admire. I would also like to quote a proverb from the film Peaceful Warrior.
The journey is more important than the destination itself - Peaceful Warrior

In a program, when you locate the place where the code could reside, you rush to the class to find the code and feel satisfied once you see how the thing works. Don't say you can't relate this to trekking: obviously, when you hear water falling, you rush to the location of the waterfall. And finally there is a (happy) moment when you actually find out that this is how the terms are stored, so you can create a parallel infrastructure for bi-grams, or see how the surroundings of the waterfall look. So, all in all, you complete your exploration and feel happy about it.

There were discouraging moments when I couldn't find out how something works, and I am glad I wasn't disheartened: I re-searched the trails and finally created the Bigram Integration Proposal. These discouraging and happy moments during the exploration filled the gaps in the journey to the proposal. I really enjoyed this journey; the experience along the way matters more than the proposal itself (and will help in coding too).

All in all, trekking in Week 2 was real fun.

P.S.: I have restrained myself from including a bulk of technical details. If you like, please visit the so-called Bigram Integration Proposal at http://goo.gl/Noifa