Anand

pygami at infogami

blog

Technology in IITs

Technology in IITs is really, really bad. Today they are supposed to announce the GATE results, so they are going to receive millions of hits. Are they prepared for this? Not at all. Let me describe my experience with these sites.

GATE results will be announced on the websites of all the IITs and IISc.

I started looking at all the sites and found that IIT Roorkee has a link to GATE Results 2007. I opened it and entered the number I wanted to check. It said that number had not qualified. I was puzzled! I knew that couldn't be right. I tried some random numbers and got the same response. I thought it must be a bug in their program, so I wrote a program to confirm it. I queried 100 registration numbers and it gave the same response for all of them. What a pity!

After some time IITK also put up the results, and the response was the same. After a lot of waiting and a lot of frustration I went back to the IIT Roorkee site. Now it shows a different error message:

fopen("qual.txt", "r") - No such file or directory in /var/www/html/gate/query/result07/nappquery.php on line 78
Can't Open data.dat for reading.

Now it is clear what they are doing: for every request they open a text file containing all the numbers and do a sequential search to find the result. Let's do some quick calculations. Suppose there are 1 million registration numbers. For each number they must store a score, so a record takes 6 chars for the number + 5 chars for the score + one space + one newline, 13 chars in total. That is about 13 MB per request. If they get 1000 requests at a time, that needs 13 GB of RAM and the system starts thrashing.
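A rough Python sketch of the two pieces above: a per-request sequential scan (only a guess at what nappquery.php does, based on the error message; the "regno score" line format is an assumption) and the back-of-the-envelope memory estimate, under the same assumption that every request holds the whole file in memory.

```python
# Assumed record layout, matching the estimate above: "NNNNNN SSSSS\n".
REGNO_CHARS = 6      # registration number
SCORE_CHARS = 5      # score
SEPARATOR_CHARS = 2  # one space + one newline


def lookup_by_scan(lines, regno):
    """Guess at the current approach: walk the whole file until the number matches."""
    for line in lines:
        number, score = line.split()
        if number == regno:
            return score
    return None  # presumably what ends up shown as "not qualified"


def memory_estimate(candidates=1_000_000, concurrent_requests=1000):
    """Return (bytes per record, bytes per file, bytes if every request loads the file)."""
    per_record = REGNO_CHARS + SCORE_CHARS + SEPARATOR_CHARS
    per_file = candidates * per_record
    return per_record, per_file, per_file * concurrent_requests


if __name__ == "__main__":
    per_record, per_file, total = memory_estimate()
    print(per_record, "bytes per record")
    print(per_file / 1e6, "MB per request")
    print(total / 1e9, "GB for 1000 concurrent requests")
```

With the default numbers this works out to 13 bytes per record, 13 MB per file and 13 GB for 1000 simultaneous requests.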

There is a simple alternative: write a small mod_python or mod_perl program and keep all the data in memory. Then every request can be handled without any disk I/O. In fact the HTML pages can also be kept in memory to make it even better. I am very confident that such a system would handle thousands of requests very easily, if not millions.
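A minimal sketch of that idea in Python. It swaps mod_python/mod_perl for a plain WSGI app (runnable with the standard library's wsgiref), and the sample data, the regno query parameter and the port are all made up for illustration. The results are parsed into a dict once, at startup, so each request is just a dictionary lookup with no disk I/O.

```python
from urllib.parse import parse_qs
from wsgiref.simple_server import make_server

# Hypothetical sample data in an assumed "regno score" format; a real server
# would read the full results file once, at startup.
SAMPLE_RESULTS = """\
100001 45.67
100002 38.00
"""


def load_results(text):
    """Parse 'regno score' lines into a dict that stays in memory."""
    results = {}
    for line in text.splitlines():
        if line.strip():
            regno, score = line.split()
            results[regno] = score
    return results


RESULTS = load_results(SAMPLE_RESULTS)  # loaded once per process, not per request


def app(environ, start_response):
    """Answer ?regno=NNNNNN with a dictionary lookup: no disk I/O per request."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    regno = query.get("regno", [""])[0]
    score = RESULTS.get(regno)
    if score is None:
        status, text = "404 Not Found", "No result for %s\n" % regno
    else:
        status, text = "200 OK", "%s: %s\n" % (regno, score)
    body = text.encode("utf-8")
    start_response(status, [("Content-Type", "text/plain; charset=utf-8"),
                            ("Content-Length", str(len(body)))])
    return [body]


def serve(port=8000):
    """Run with wsgiref's development server (for local testing only)."""
    make_server("", port, app).serve_forever()
```

Calling serve() and opening http://localhost:8000/?regno=100001 should return the sample score. Because the dict lives for the whole life of the process, the data costs memory once rather than once per request; a real deployment would sit behind Apache or a similar server rather than wsgiref.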

People at IITK have done some kind of replication, which is really stupid: they have put the results in three directories on the same machine, so the copies add no extra capacity.

How confusing. GATE results are available in the JMET 2006 and JMET 2007 directories.

After a while all servers stopped responding.

  • The server www.iitk.ac.in is taking too long to respond.
  • The server at gate.iisc.ernet.in is taking too long to respond.

Surprisingly, IITM's site is working, and I got what I wanted.

itch

I have started scratching my personal itch: porting web.py to Scheme. I have been thinking for a while about porting web.py to Scheme and Haskell, but the motivation for doing it now came from here.

browsing history

I started looking for tools to capture my browsing history. These are my findings.

  • TrailBlazer gives a nice graphical view of the history.
  • BHV lets you export browser history. It supports Safari and Mozilla and is written in Python.
  • Mork.pm can parse mozilla history files.
  • slogger

tracking browsing history

I explore many websites every day. I want to keep track of the important ones, with notes, and be able to go back and read them later. delicious is good, but it is not meant for that: there I bookmark interesting websites. Here I want to capture just the browsing history. Sites I visit today would be displayed at the top, and sites I visit often would be shown in bold or something like that. If I visit several pages on the same site, they would be grouped together. If I see an old page again, it would come back to the top. Somehow it should also capture my whole browsing graph.
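A rough sketch of those ordering rules, assuming the history is just a list of (timestamp, url) visits; the bold threshold of 5 visits is an arbitrary, made-up number. Pages are grouped by host, each group is ordered by its most recent visit, and revisiting an old page bumps it and its site back to the top.

```python
from urllib.parse import urlparse

FREQUENT_THRESHOLD = 5  # hypothetical: sites with at least this many visits are shown in bold


def build_view(visits):
    """Group (timestamp, url) visits by site, most recently visited site first."""
    last_seen = {}  # url -> most recent visit time
    hits = {}       # site -> total number of visits
    for ts, url in visits:
        last_seen[url] = max(ts, last_seen.get(url, ts))
        site = urlparse(url).netloc
        hits[site] = hits.get(site, 0) + 1

    pages_by_site = {}
    for url, ts in last_seen.items():
        pages_by_site.setdefault(urlparse(url).netloc, []).append((ts, url))

    view = []
    for site, pages in pages_by_site.items():
        pages.sort(reverse=True)  # newest page first within a site
        view.append({
            "site": site,
            "latest": pages[0][0],
            "bold": hits[site] >= FREQUENT_THRESHOLD,
            "pages": [url for _, url in pages],
        })
    # Revisiting an old page updates its timestamp, so it and its site rise to the top.
    view.sort(key=lambda entry: entry["latest"], reverse=True)
    return view
```

Capturing the browsing graph would additionally need the referrer of each visit, which this sketch does not model.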

Maybe this is a very good idea for a Web 2.0 startup. Using it you could capture people's browsing patterns and find out which sites are popular, which sites are hot, etc.

missing markdown!

It's painful to blog with WordPress. I really miss Markdown. I stopped blogging because inserting links in WordPress is so painful, so I am back here again.

new blog

I have started blogging on WordPress now.

Check it out: http://anandology.wordpress.com.

fortune

Not feeling like working now, so I'm looking at fortune.

The clash of ideas is the sound of freedom.

Life would be so much easier if we could just look at the source code.
    -- Dave Olson

A student who changes the course of history is probably taking an exam.


You have taken yourself too seriously.


Sight is a faculty; seeing is an art.

gumstix

Recently I was exploring low-power computers and found a very interesting one, gumstix. It is a very small computer that you can keep in your pocket, and it runs Linux. It has a full-powered web server too!

Somebody has already ported Asterisk to it; see this link or the Google cached copy.

I am thinking of buying one and trying it out at Timbaktu. Unfortunately they don't deliver to India, so I should see how to get one.

svn + trac on ubuntu

I was trying to set up svn + trac on Ubuntu 5.10. Installation was really smooth.

I got all the required information from the following 2 links:

* http://www.ubuntuforums.org/showthread.php?t=51753
* http://projects.edgewall.com/trac/wiki/TracOnUbuntu

Web.py on apache

I am now able to run my web.py application on top of Apache. It wasn't as difficult as I thought.

I downloaded and installed the FastCGI module and followed the instructions given at http://webpy.org/install.

But it refused to work. I looked at the Apache error log, which showed the following error:

 ImportError: No module named flup.server.fcgi

Then I tried locate flup and, surprisingly, found it already in my downloads. Installing flup was very simple.

That's all; I now have web.py running on top of Apache.
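The ImportError above is the clue: web.py's FastCGI mode imports flup.server.fcgi. As a sketch (not web.py's own code), this is roughly what a FastCGI entry point looks like using flup directly. The app here is a stand-in WSGI function, and main() reports a missing flup with a clear message instead of failing inside Apache.

```python
import sys


def app(environ, start_response):
    """Stand-in WSGI application; in practice this would be the web.py app."""
    body = b"Hello from behind Apache\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def flup_available():
    """True if flup.server.fcgi (the module named in the ImportError) can be imported."""
    try:
        import flup.server.fcgi  # noqa: F401
    except ImportError:
        return False
    return True


def main():
    """Entry point that Apache's FastCGI module would run."""
    if not flup_available():
        print("flup is not installed; cannot serve over FastCGI", file=sys.stderr)
        return 1
    from flup.server.fcgi import WSGIServer
    WSGIServer(app).run()  # speaks FastCGI to Apache over the inherited socket
    return 0
```

A real FastCGI script would end with sys.exit(main()); it is left out here so the sketch can be imported without starting a server.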

I am working on setting up svn and trac at my workplace. I will try to use web.py to automate some tasks there.

In the process I looked at lighttpd, the preferred web server for web.py. reddit runs on it! I should install it and try it on my machine.

The Pragmatic Programmer

Just read The Pragmatic Programmer Quick Reference Guide. It looks really interesting. I think I should get a copy of The Pragmatic Programmer.

twill

twill is a simple scripting language for Web browsing. It uses mechanize.

It seems a better idea for mechanizer (twiller?) to generate a twill script rather than mechanize code.
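To make that concrete, here is a small sketch of the output side: turning a list of recorded actions into a twill script. The action tuples are a hypothetical format for whatever mechanizer records; go, formvalue and submit are twill commands.

```python
def to_twill(actions):
    """Turn recorded actions into twill script text.

    Hypothetical action format: ("go", url), ("fill", form, field, value), ("submit",).
    Values containing double quotes are not escaped in this sketch.
    """
    lines = []
    for action in actions:
        kind = action[0]
        if kind == "go":
            lines.append("go %s" % action[1])
        elif kind == "fill":
            _, form, field, value = action
            lines.append('formvalue %s %s "%s"' % (form, field, value))
        elif kind == "submit":
            lines.append("submit")
        else:
            raise ValueError("unknown action: %r" % (kind,))
    return "\n".join(lines) + "\n"
```

The script is plain text, so it is easy to read, edit by hand and replay with twill later, which is the main appeal over generated mechanize code.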

YouOS

I am playing with YouOS now and it's pretty cool. It has many applications, including a browser. They have tutorials for developing applications for YouOS. I should get my hands dirty!

Einstein's Puzzle

I solved Einstein's Puzzle (Who owns the fish?) in Python.

Here is the code.
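The original solver is not reproduced in this post. As an independent illustration (not that code), here is a brute-force sketch of the common 15-clue version of the puzzle: every attribute is assigned a house position with itertools.permutations, and each clue is checked as soon as all of its variables are bound, which keeps the search small.

```python
from itertools import permutations

HOUSES = range(5)  # positions 0 (leftmost) to 4; each variable below holds a position


def next_to(a, b):
    return abs(a - b) == 1


def who_owns_the_fish():
    for red, green, white, yellow, blue in permutations(HOUSES):
        if green + 1 != white:  # green is immediately left of white
            continue
        for brit, swede, dane, norwegian, german in permutations(HOUSES):
            # Brit lives in red; Norwegian lives in the first house, next to blue
            if brit != red or norwegian != 0 or not next_to(norwegian, blue):
                continue
            for tea, coffee, milk, beer, water in permutations(HOUSES):
                # Dane drinks tea; green drinks coffee; centre house drinks milk
                if dane != tea or green != coffee or milk != 2:
                    continue
                for pall_mall, dunhill, blends, bluemasters, prince in permutations(HOUSES):
                    # yellow smokes Dunhill; Bluemasters drinks beer;
                    # German smokes Prince; Blends lives next to water
                    if (yellow != dunhill or bluemasters != beer
                            or german != prince or not next_to(blends, water)):
                        continue
                    for dogs, birds, cats, horses, fish in permutations(HOUSES):
                        # Swede keeps dogs; Pall Mall keeps birds;
                        # Blends next to cats; horses next to Dunhill
                        if (swede == dogs and pall_mall == birds
                                and next_to(blends, cats) and next_to(horses, dunhill)):
                            owner = {brit: "Brit", swede: "Swede", dane: "Dane",
                                     norwegian: "Norwegian", german: "German"}
                            return owner[fish]
    return None


if __name__ == "__main__":
    print(who_owns_the_fish())  # → German
```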

thoughts...

I wrote a web crawler using mechanize, and it worked really well for my job. I also wrote a web interface to the downloaded pages using web.py; it saved me a lot of time by making navigation very easy. I want to make the web interface complete by adding some more things.

I have been thinking of learning CSS. web.py, form.py, template.py: I want to play with all of them and make something that looks beautiful. I found a CSS design and downloaded it; I will play with it over the weekend.

Today I saw a new Y Combinator startup, wufoo. Their idea is simple and very nice: make forms online, without effort. It looks really nice, though I couldn't play with it because it wasn't working on my machine :(. But I suspect infogami will soon be able to do many more things than that.

Coming back to the web crawler: I want a generic, web-driven crawler interface that generates the code for doing the crawling. The idea is quite simple. The first page asks you which URL you want to go to. You enter it, and it downloads that page and replaces the links with its own, so that it knows what the user is submitting and can record it as a Python script.
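A minimal sketch of the two tricks this needs, under loud assumptions: links are rewritten with a simple regex on double-quoted href attributes (a real version should use an HTML parser and also handle forms), /go?url= is a made-up path on the recording server, and the recorder emits mechanize calls (Browser().open) as the generated script.

```python
import re
from urllib.parse import quote, urljoin

# Hypothetical path on the recording server that every rewritten link points to.
PROXY_PATH = "/go?url="

HREF_RE = re.compile(r'href="([^"]*)"', re.IGNORECASE)


def rewrite_links(html, base_url):
    """Point every double-quoted href at the proxy so each click comes back to us."""
    def repl(match):
        target = urljoin(base_url, match.group(1))  # resolve relative links first
        return 'href="%s%s"' % (PROXY_PATH, quote(target, safe=""))
    return HREF_RE.sub(repl, html)


class Recorder:
    """Collects visited URLs and emits them as a mechanize script."""

    def __init__(self):
        self.visited = []

    def visit(self, url):
        self.visited.append(url)

    def as_script(self):
        lines = ["import mechanize", "", "br = mechanize.Browser()"]
        lines += ["br.open(%r)" % url for url in self.visited]
        return "\n".join(lines) + "\n"
```

When the proxy serves /go?url=..., it would call Recorder.visit with the decoded URL, fetch the page, pass it through rewrite_links and return it, so the generated script grows one line per click.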

Startup School 2006 is coming up. I wish I could go there....

Hackers box

Quite some time back I read Paul Graham's article Return of the Mac. I use a Mac myself, so I liked that article, but I didn't realize what he meant. I see it now: almost all the founders of Y Combinator startups I have seen use Macs.

TODO: write more..

Web crawling

I have been trying for quite some days to do some web crawling using Python. I was using urllib2. It was quite painful to look at the HTML source and try heuristics to get to the next page, and none of my attempts succeeded. I guess there was some problem with cookies.
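If cookies are the problem, the standard library can handle them: an opener built with a cookie jar stores cookies from responses and sends them back on later requests. This sketch uses the Python 3 names (urllib2 became urllib.request and cookielib became http.cookiejar); the User-Agent string is only an example value.

```python
import http.cookiejar
import urllib.request


def make_cookie_opener():
    """Return (opener, jar): an opener that stores and resends cookies across requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Some sites also reject the default Python User-Agent; this value is just an example.
    opener.addheaders = [("User-Agent", "Mozilla/5.0 (compatible; example-crawler)")]
    return opener, jar
```

After that, opener.open(url).read() fetches pages with the session cookies kept between requests; it still leaves the HTML heuristics to you, which is where a library like mechanize helps.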

After a lot of googling, today I found something interesting: mechanize. I found it through a blog post, which also talks about running tidy on the HTML output to make it more robust, etc. I want to try all of these, but I am in a hurry to use mechanize right now. Hope it works.

Hello, world!

Hello, world!