Month: February 2005

And Elanor was not guilty after all… (a Twilight Zone episode)

Posted by – 23/02/2005

This will surely go into my “book of the strangest things I’ve seen with computers”. As I said before, greyhavens, the Pentium II I use as an intermediary router, was suffering from an unidentified disease that froze it solid, without even letting it respond to SysRq sequences. Well… I found the cause of the illness, but first lemme tell you the whole story…

Overnight, greyhavens, which I built from spare parts I had at home, froze with no further warning. I had updated its kernel about a week before, and I remembered that, but when I rebooted it everything went fine, so I just let it run… for the next 24 hours. Then it froze again… I booted the older kernel (just to be on the safe side) and ran some checks (all passed). It ran for about 3 hours and froze up again. So there was the challenge: what was causing this sudden illness?

I checked everything. Since greyhavens is nothing but a simple router with a 350 MB disk, there was not much to check. I cleaned it, checked the fans and the temperature (it has been really hot these summer days), ran memtest and e2fsck, replaced the CMOS battery (OK, I was desperate), etc., etc., and it just repeated the same behavior over and over again: it ran for a while and then froze.

When I wrote the last blog entry, I was almost sure some piece of hardware was faulty… The question was which one…

With no clue, I just adopted the “standard attitude”: I observed. Then it hit me: Elanor’s LEDs were not blinking. Elanor (named after Samwise Gamgee’s daughter) is a PCI Ethernet card I am very fond of. “She” has been with me for longer than I can remember (I suspect she came with the first PCI computer I had), and she had been a spare card for the last three years. Since she has both a BNC and an RJ-45 connector, she is a very useful spare, and every time there was a computer event, I brought her with me (I used her at the 3rd FISL and at DebConf4). When I built the router, though, she was promoted to first-class citizen, and she has been inhabiting greyhavens ever since.

“What the hell!!! Elanor is dead!” It did not make sense at all. Since greyhavens worked as expected before freezing, the idea that Elanor was dead was just nonsense… unless she was dying.

I’ve seen a lot of NICs die, and there are always some signs: they begin to cause errors, DUPs in pings, fluctuating response times, etc., etc. And Elanor was not showing any of these signs. In fact, she was perfectly healthy. But there it was: the LEDs were off, and that meant only one thing: my Elanor was dead.

I bought a new NIC and replaced Elanor. When I turned the power on, imagine my surprise when I realized that the brand new NIC also had its LEDs off!

What was going on?!? I knew the network switch was good, and I had tested the cable in every connector just to be sure. Besides, it was plain nonsense to believe that a bad switch could freeze a Linux box so badly that not even SysRq sequences did any good. Then, since I was clueless (and had spent R$ 23,00 on the new NIC), I just replaced the cat-5 cable that linked the new NIC to the switch… voilà. Greyhavens was up and running again…

I swapped the new NIC back out for Elanor and rebooted greyhavens. Everything is alive again…

I just cannot believe that a faulty cat-5 cable is able to freeze a Linux box as hard as greyhavens was frozen. I have never heard of anything like that before, and I will not be surprised if none of my readers has had this experience either. Anyway, it really happened… as weird as it may seem.

Yesterday’s install fest pictures and fun with LARTC

Posted by – 20/02/2005

Yesterday’s install fest was great. We had 52 people registered, and although I guess fewer than half showed up (yesterday was really hot! People must have gone to the beach), it surely was a good install fest for this time of the year. Check out the pictures:

Today I decided to upgrade my cable modem connection to 512 kbit. Then, after messing around with LARTC the whole morning, I managed to prioritize my ssh connections by giving them a small, reserved percentage of my uplink (they must always have some room, but they don’t need to be very fast), to share the remaining bandwidth between eriador and valfenda (unevenly, since I use eriador more often), and to add a lot of tweaks to allow p2p connections over a variety of protocols.

I am still tweaking, but it seems to have finally reached a “comfortably useful” state. I messed with tc and HTB long ago, and I am glad the basics have not changed much and that I remembered almost everything: LARTC, although good documentation, is not pleasant reading, and I would hate to have to spend the whole day reading it.
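For anyone who has never touched HTB, here is a rough sketch of what a tree like the one described above might look like. It is not the actual script behind this post: the interface name (`eth0`), the uplink rate (128 kbit, since only the 512 kbit connection speed is mentioned), the per-class rates and the firewall mark `2` for eriador’s traffic are all made-up assumptions, and the iptables rule that would set that mark is not shown. The p2p tweaks are left out. The Python function only builds the `tc` command lines; you would run them as root to apply them.

```python
# Hypothetical sketch: builds tc/HTB commands for a setup similar to the one
# described above. Interface, rates and the fw mark (2 = eriador, assumed to
# be set elsewhere by an iptables MARK rule) are assumptions, not the real setup.

def htb_commands(dev="eth0", uplink_kbit=128, ssh_kbit=16,
                 eriador_kbit=80, valfenda_kbit=32):
    """Return tc commands for a three-class HTB tree on `dev`."""
    # The guaranteed (rate) shares of the children must fit inside the uplink.
    if ssh_kbit + eriador_kbit + valfenda_kbit > uplink_kbit:
        raise ValueError("child rates exceed the uplink rate")
    up = f"{uplink_kbit}kbit"
    return [
        # Root HTB qdisc; packets no filter matches go to class 1:30 (valfenda).
        f"tc qdisc add dev {dev} root handle 1: htb default 30",
        # Parent class at the uplink rate; in practice set it a bit below the
        # real uplink so the queue builds here instead of in the cable modem.
        f"tc class add dev {dev} parent 1: classid 1:1 htb rate {up}",
        # ssh: small reserved rate, may borrow up to the full uplink, top priority.
        f"tc class add dev {dev} parent 1:1 classid 1:10 htb rate {ssh_kbit}kbit ceil {up} prio 0",
        # eriador: the bigger share, since it is used more often.
        f"tc class add dev {dev} parent 1:1 classid 1:20 htb rate {eriador_kbit}kbit ceil {up} prio 1",
        # valfenda: the smaller share (also the default class).
        f"tc class add dev {dev} parent 1:1 classid 1:30 htb rate {valfenda_kbit}kbit ceil {up} prio 1",
        # Outgoing ssh connections (destination port 22) go to the ssh class.
        f"tc filter add dev {dev} parent 1: protocol ip prio 1 u32 match ip dport 22 0xffff flowid 1:10",
        # Packets carrying firewall mark 2 (eriador) go to eriador's class.
        f"tc filter add dev {dev} parent 1: protocol ip prio 2 handle 2 fw flowid 1:20",
    ]


if __name__ == "__main__":
    for cmd in htb_commands():
        print(cmd)
```

Giving every class a `ceil` equal to the whole uplink lets the others borrow whatever an idle class is not using, while `prio 0` makes the ssh class first in line for that spare bandwidth.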

Now, back to the “playground”: I still have to figure out what is happening with greyhavens, a Pentium II I use as an intermediary router. It is freezing up completely, without any error message or core dump, and it does not respond to SysRq sequences, with no obvious explanation. I suspect it has to do with the hardware, since I haven’t made any software changes in a long time (except for the kernel, but booting an older build gave me the same behavior).

I guess greyhavens will have to wait until early afternoon, since LARTC always makes me hungry… 😉

[Update 2005-02-20 18:57:51 GMT] I’ve uploaded more pictures from my vacation.

Sad world

Posted by – 15/02/2005

I am back from vacation, and it seems all the work that was not done because of my vacation is right here, waiting for me. Unbelievable! I come back and have to work twice as hard as before! I hope I can get a vacation after this! 🙂

News from the IT world is also astonishing. It is a sad world where people’s jobs are threatened by a mega-corporation – which has just reported really good earnings – over the software patent issue. This patent madness is going too far! It’s time to decide whether we are going to be governed by our own elected governments or by corporations… if it’s not already too late.