Swiss Tournament in Ruby

Being a chess player (not a very good one), I’ve always been intrigued by Swiss Tournaments. They are so practical, and they ensure that even a lousy player like myself can play the same number of rounds as any other player. That’s being inclusive!

I’ve played in some knock-out tournaments (for me that meant being kicked out in the second or third round), and, given their nature, not-so-good players tend to skip them (since their fun will, almost surely, end before long).

Well, a co-worker suggested we could use a Swiss Tournament system to solve a very similar problem, though not in any game championship. I liked the idea, but since I wasn’t sure it could really solve the problem, I had to quickly implement something to test our data with… so Ruby to the rescue!

In no time we were up and running, and apart from minor issues that were fixed along the way, I guess it’s a pretty good implementation. You can check out the code to get a feel for it. Of course, it doesn’t follow the rules of any Chess or Go association (Wikipedia, after all, was my guide here), but it serves our goal. Since this is proof-of-concept code, feel free to improve it (just tell me about it, will you?).
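If you haven’t seen the Swiss system before, here is a minimal sketch of one pairing round in Ruby. It is not the code from the repository. It assumes a plain hash of player scores and a set of pairs that have already played. It also uses a deliberately simple greedy rule: sort by score, then pair each player with the closest-ranked opponent they haven’t met yet. Real chess federations use far more elaborate rules. This greedy version can also fall back to a rematch when no fresh opponent is left.

```ruby
require 'set'

# One round of Swiss-style pairing (greedy sketch, not FIDE-compliant).
# scores: { "name" => points }
# played: Set of sorted [a, b] pairs that have already met
# Returns [pairings, bye], where bye is the leftover player (or nil).
def swiss_pairings(scores, played)
  # Highest score first; ties broken by name so the result is deterministic.
  pool = scores.keys.sort_by { |name| [-scores[name], name] }
  pairings = []

  until pool.size < 2
    player = pool.shift
    # Prefer the closest-ranked opponent this player has not met yet...
    opponent = pool.find { |other| !played.include?([player, other].sort) }
    # ...but fall back to a rematch rather than leaving both unpaired.
    opponent ||= pool.first
    pool.delete(opponent)
    pairings << [player, opponent]
  end

  [pairings, pool.first]
end

scores = { "ana" => 2, "bob" => 2, "cid" => 1, "dan" => 1, "eve" => 0 }
played = Set[["ana", "bob"]]
pairings, bye = swiss_pairings(scores, played)
# pairings => [["ana", "cid"], ["bob", "dan"]], bye => "eve"
```

Giving the bye to whoever is left at the bottom of the standings mirrors the usual Swiss convention. After each round you would add the new pairs to `played` and update `scores`.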

Looking for a new programming language to learn

I know it has been a long time since my last post. I am sorry about that, but life has its complications every now and then (as you know)... Well, on to the article.

Recently I had to reimplement in C a prefork server I wrote in Ruby for an internal project at Propus. Not that the Ruby version wasn’t good enough (after all, although it was written in Ruby, it relied on Unix plumbing, much in the fashion Ryan describes in his now-famous I like Unicorn because it’s Unix article)... The problem is that, at one of our clients, the only Ruby version available was 1.8.1.

Yeah… I know… But we were not allowed to upgrade and, although it wasn’t apparent at first, the same server showed a nasty memory leak in 1.8.1 that was not present in 1.8.7 or 1.9.1. I still don’t know where the problem is… I suspect some of the C-to-Ruby glue code around TCP sockets might be to blame, but after a couple of days trying to figure it out, I decided it was easier just to reimplement it in C.

It actually took less than a day to get the C version going… nothing fancy and, apart from memory footprint, the same functionality and about the same speed as the Ruby version. But it was enough to remind me that I really don’t like all the scaffolding one has to build in order to make something useful in C. It’s not just a matter of SLOC (of course, the C version was more than 3 times longer than the Ruby one)... I am talking about all the manual memory management, pointer operations and the disgusting experience of dealing with strings in C. I know some people are addicted to that sort of thing like heroin, but to me it just slows development down.
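If the prefork model is new to you, the Unix plumbing boils down to three steps:

1. Open the listening socket once.
2. Fork a fixed number of workers.
3. Let each worker block in `accept` on the shared socket, so the kernel hands each incoming connection to one of them.

Below is a minimal sketch of that idea in Ruby, not the Propus server. The method name `start_prefork` and the upcasing echo handler are made up for the example, and it needs a Unix system with `fork`.

```ruby
require 'socket'

# Fork `workers` children that all accept on the same listening socket.
# Each accepted connection is handed to the block; returns the child PIDs.
def start_prefork(server, workers)
  Array.new(workers) do
    fork do
      # exit! skips at_exit hooks inherited from the parent process.
      trap('TERM') { exit!(0) }
      loop do
        client = server.accept   # the kernel wakes one blocked worker
        begin
          yield client
        ensure
          client.close
        end
      end
    end
  end
end

server = TCPServer.new('127.0.0.1', 0)  # port 0: let the OS pick a free port
port = server.addr[1]

pids = start_prefork(server, 2) do |client|
  line = client.gets
  client.write(line.to_s.upcase)
end

socket = TCPSocket.new('127.0.0.1', port)
socket.write("hello\n")
reply = socket.gets   # "HELLO\n", answered by whichever worker got the connection
socket.close

# Stop the workers and reap them so no zombies are left behind.
pids.each { |pid| Process.kill('TERM', pid) }
pids.each { |pid| Process.wait(pid) }
server.close
```

The parent never touches client connections. That is what makes the design simple: crash isolation and load distribution come from the operating system, not from the language runtime.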

This experience made me think about learning a second compiled programming language. I do some Perl, a lot of Python and (of course) most of my work in Ruby, but those are all interpreted languages. For compiled languages I always resorted to C… So I am officially looking for a language to learn.

So far, the best candidates are OCaml (I got a little excited about JoCaml a few months ago, now I might get serious about it), Haskell, Lisp, Objective-C, Ada, and Vala. Of these, I’ve been reading a lot about OCaml… It seems a fine and expressive language, with decent foundations, object-oriented extension, broad standard library and (with JoCaml) concurrency… Also it might give me the proper excuse to finally wrap my mind around a functional language!

People keep pointing me to Java and Erlang… Well… if I were to use Java, I would much prefer JRuby. Erlang, OTOH, has a weird syntax (at least to me) and it seems much of what makes it great will eventually be part of Ruby (or already is, through libraries); either that or I’ll just wait for Reia to be ready. Besides, neither can be compiled to native code (ok, that argument can be stretched both ways, so just ignore it).

So, what do you think? Any advice?

Ruby versus Python

This is not another rant praising one at the expense of the other (and everybody knows I love Ruby, so it would not be impartial), but sometimes people seem to live in another world and do things for the wrong reasons.

I just read this blog post by Kanwei Li in which he gives 2 or 3 reasons why he ditched Ruby in favor of Python. First of all, both are great languages and, although I favor Ruby, I use Python for some projects and they are not all that different. Of course, everyone is free to choose whichever language they favor, but Kanwei seems to be “ditching” Ruby out of not knowing much about it, or out of preferring one style over the other…

His first “reason” is that in Python whitespace matters. I used to think this was just a matter of style, but every now and then mandatory indentation hurts me (just try to put together a code generator and you’ll notice it). Although my code is always correctly indented, I like that it is so because I want it that way, and not because some language demands it. Rants and more rants have been written about Python’s mandatory indentation (or other languages’ lack of it), and I am not going through all of it… I just don’t think it’s a good reason to ditch Ruby…

Next, he makes a big deal out of Ruby’s ternary if. As he puts it, he prefers

#python
if len(a) > 0:
        v = a[0]
        a = a[1:]
else:
        v = None
over Ruby’s ternary if

#ruby
v = a.empty? ? nil : a.shift
Hey! Come on… Ruby’s ternary if is not mandatory… It was copied from C as syntactic sugar. You can do without it, just as in Python:

#ruby
if ! a.empty?
    v = a.shift
else
    v = nil
end
Better yet, you can use the value returned by if as v’s value:

#ruby
v = if ! a.empty?
    a.shift
else
    nil
end

How beautiful is that!
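It’s also worth knowing that Array#shift already returns nil when the array is empty, so for this particular case no conditional is needed at all. A quick check:

```ruby
# Array#shift removes and returns the first element, or nil if the array is empty.
a = [1, 2, 3]
v = a.shift
# v is 1 and a is now [2, 3]

empty = []
w = empty.shift
# w is nil and empty is still []
```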

Python lacked a ternary if for a long time, and when it finally acquired one via PEP 308 (v = x if cond else y), its syntax was made different from every other language! Although I don’t think that is a problem, some people might think it would have been better not to reinvent the wheel.

Next, Kanwei goes over a famous “problem” of Ruby: the lack of a sum method for Array. I admit it’s strange, but it is completely coherent: Ruby’s Arrays are ordered collections of objects, not mathematical arrays. How do you sum objects that are not numbers? Different people will have different answers to that, so Ruby leaves this decision to the programmer and provides basic methods to deal with collections of anything (which can be used to sum numbers, if desired). So, in Ruby you have to use Array#inject to perform a sum:

[1,2,3].inject(0) { |sum, value| sum + value }

Array#inject (actually Enumerable#inject) was borrowed from Smalltalk and allows you to loop through an array, building up an “accumulator value” as you go. When it’s done, the final value of this accumulator is returned. Very useful for combining array elements, whether by summing them, building up a pretty display string, whatever. In the example above, I am initializing the accumulator with 0.
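To make the “accumulator” idea concrete, here are the two uses mentioned above: summing numbers and building up a display string. There is also the method-name shorthand that newer Rubies (1.8.7 onward) accept. The list of names is just sample data.

```ruby
numbers = [1, 2, 3]

# Explicit accumulator, starting at 0.
total = numbers.inject(0) { |sum, value| sum + value }

# Same thing using the method-name shorthand: call :+ on the accumulator.
total_short = numbers.inject(:+)

# The accumulator does not have to be a number: here it is a String.
names = %w[ana bob cid]
display = names.inject("Players:") { |acc, name| acc + " " + name }
# display is "Players: ana bob cid"
```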

If you use Arrays for mathematical operations and want them to work that way, you can always add a sum method to the Array class:

class Array
    def sum
        self.inject(0) {|sum, value| sum + value}
    end
end

Maybe it would be better to just use Arrays as containers (as they were intended) and implement that sum inside your own class… I completely agree with Reg Braithwaite here.
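Following that advice, a minimal sketch could wrap the values in a small numeric class of your own and define sum there, leaving Array untouched. The class name Measurements is hypothetical. As a side note, Ruby 2.4 and later ship a built-in Array#sum, so on modern versions the monkey patch above would simply replace the standard method.

```ruby
# A small value object that owns its numeric behaviour,
# instead of teaching every Array how to add itself up.
class Measurements
  include Enumerable

  def initialize(values)
    @values = values.dup
  end

  # Enumerable derives inject, map, select, ... from this one method.
  def each(&block)
    return enum_for(:each) unless block_given?
    @values.each(&block)
    self
  end

  # Sum lives here, where "adding up" has a clear numeric meaning.
  def sum
    inject(0) { |total, value| total + value }
  end

  def average
    return nil if @values.empty?
    sum.to_f / @values.size
  end
end

readings = Measurements.new([2, 4, 9])
# readings.sum is 15 and readings.average is 5.0
```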

Kanwei also mentions that Python is faster than Ruby. That is true, but it was “more true” some time ago. First of all, Python is older and has had more time to improve its speed. Ruby, OTOH, has only now acquired a good VM, so improvements to it can finally run in parallel with improvements to the language itself, and I expect this to be less true with every release. Python is no longer getting much faster between releases, unlike Ruby (the differences between 1.8.7 and 1.9.1 are really impressive!). IMHO this is not a good reason to choose one over the other: if you really need speed, go for C :-)

Now here is something interesting Kanwei mentioned: “Python is more production ready”. He argues that Google is using it, so it must be good. Well… I cannot argue against that: Google really is using Python. But IBM, Oracle, EA, Cisco, Siemens, etc. are using Ruby… so that is just a matter of preferring one company or another. Both are production ready… I agree, though, that Ruby 1.9.1 has many differences from 1.8.7, and that this may be seen as some inconsistency, but Python has also changed a lot since version 2.0, for that matter. And the changes to Ruby brought many benefits… I think they were worth it.

Finally, Kanwei compares Python and Ruby docstrings. Here I also have to agree with him: Ruby docstrings suck. Actually, that’s why everybody uses rdoc instead (which is much more powerful than Python’s docstrings). Again, I don’t think that is reason enough to ditch Ruby (actually, the existence of rdoc, rubygems & friends should bring people to Ruby instead), but that is a matter of personal taste.

Surely, Kanwei’s reasons are easy to argue against. There are areas where Python shines much more than Ruby (and vice-versa), but those Kanwei mentioned are not among them.

I think both languages are powerful enough, and both are way better than Perl or PHP, so either one you choose would be fine. Better still if you don’t have to choose and can use both ;). If you do have to, OTOH, pay more attention to how you feel while coding in each one, and not to cheap reasons such as those above. If you are a programmer, what matters most is that you’ll spend a lot of time coding in whichever language you pick… so let that be something pleasant.

Code testing coverage

I like building tests for my code. It’s not an old habit, just something I’ve been developing over recent months, or maybe the last few years. No, I am not doing TDD (although that doesn’t sound like a bad idea): I just write tests after I code as a safeguard, to be sure I haven’t broken anything. I suspect there are more programmers like myself than those using tests as part of a TDD (BDD, SDD, etc.) approach, but that is just an opinion.

Well, I recently became fond of code coverage estimates and tools, and rcov is such a nice tool that sometimes I find myself writing tests just to “please” it. I also suspect there are at least a bunch of people who do the same. Here are the test coverage results for one of my projects:


spectra@rohan:~/work/xmpp4r-observable$ rake rcov
(in /home/spectra/work/xmpp4r-observable)
rm -r coverage
Loaded suite /usr/bin/rcov
Started
........................
Finished in 70.995814 seconds.

24 tests, 97 assertions, 0 failures, 0 errors
+----------------------------------------------------+-------+-------+--------+
|                  File                              | Lines |  LOC  |  COV   |
+----------------------------------------------------+-------+-------+--------+
|lib/xmpp4r-observable.rb                            |   648 |   414 |  61.4% |
|lib/thread_store.rb                                 |    58 |    39 |  87.2% |
|lib/observable_thing.rb                             |   187 |   118 |  91.5% |
+----------------------------------------------------+-------+-------+--------+
|Total                                               |   893 |   571 |  69.4% |
+----------------------------------------------------+-------+-------+--------+
69.4%   3 file(s)   893 Lines   571 LOC
spectra@rohan:~/work/xmpp4r-observable$

Sure it’s tempting to get more of lib/xmpp4r-observable.rb covered, isn’t it?
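rcov targets Ruby 1.8. From Ruby 1.9 onward, the standard library ships a Coverage module that records per-line hit counts. The sketch below uses it to compute a line-coverage percentage, roughly what the COV column reports: lines executed at least once divided by executable lines. It runs against a throwaway file created on the fly. The helper `line_coverage` and the sample `classify` method are illustrative only.

```ruby
require 'coverage'
require 'tempfile'

# Percentage of executable lines hit at least once.
# Coverage reports nil for lines that have no executable code (comments, a bare `end`).
def line_coverage(counts)
  relevant = counts.compact
  return 100.0 if relevant.empty?
  covered = relevant.count { |hits| hits > 0 }
  (100.0 * covered / relevant.size).round(1)
end

# A tiny hypothetical library with two branches.
source = <<~RUBY
  def classify(n)
    if n > 0
      label = "positive"
    else
      label = "non-positive"
    end
    label
  end
RUBY

file = Tempfile.new(['sample', '.rb'])
file.write(source)
file.close

Coverage.start            # must run before the measured file is loaded
load file.path
classify(5)               # our "test suite" only exercises one branch
result = Coverage.result  # stops measuring; returns { path => [hits per line] }

# Look the file up by name to sidestep any path normalisation differences.
_, counts = result.find { |path, _| File.basename(path) == File.basename(file.path) }
puts "#{line_coverage(counts)}% of #{counts.compact.size} executable lines covered"
```

The untested `else` branch shows up as a line with zero hits. That is exactly the kind of gap that makes chasing the last few percent so tempting.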

Introducing XMPP4R-Observable

Just a few days ago I gave a talk at FISL10 about using XMPP PubSub with Ruby, and about a fork of a popular library to which I added rudimentary PubSub support. In that same talk I listed a series of problems with that approach and talked about a roadmap for the future…

It turns out I became convinced that I can’t use PubSub on the XMPP side of the library together with some form of periodic polling on the Ruby side. So I decided to replace the library I had forked with an Observable version, preserving the good parts of XMPP4R-Simple. I called the result XMPP4R-Observable, and I have just published it on GitHub.
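The difference between periodic polling and an Observable design can be illustrated with a hand-rolled sketch of the Observer pattern. This only illustrates the idea and is not XMPP4R-Observable’s actual API. The EventSource and EventLog classes and their method names are hypothetical, and the sketch avoids Ruby’s observer library to stay self-contained.

```ruby
# Minimal Observer pattern: interested parties register once and are
# called back when something happens, instead of polling in a loop.
class EventSource
  def initialize
    @observers = []
  end

  # Register an object responding to #update, or a block.
  def subscribe(observer = nil, &block)
    raise ArgumentError, 'observer or block required' unless observer || block
    @observers << (observer || block)
    self
  end

  # Notify every observer immediately; nobody has to poll.
  def publish(event)
    @observers.each do |obs|
      obs.respond_to?(:update) ? obs.update(event) : obs.call(event)
    end
    event
  end
end

# Example observer object (hypothetical).
class EventLog
  attr_reader :entries

  def initialize
    @entries = []
  end

  def update(event)
    @entries << event
  end
end

log = EventLog.new
seen = []
source = EventSource.new
source.subscribe(log)
source.subscribe { |event| seen << event.upcase }
source.publish("new item on /some/node")
# log.entries is ["new item on /some/node"], seen is ["NEW ITEM ON /SOME/NODE"]
```

Compared with a loop that sleeps and checks for new events, the publisher does the work at the moment the event happens. There is no added latency from the polling interval, and no wasted wake-ups when nothing has arrived.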

A good part of the code is covered by tests (and I “stole” some of the tests from XMPP4R-Simple itself)... I plan to cover the rest over time (contributions are welcome). For now, I have called this first release version 0.5.1 and added a .gemspec so a .gem is built automatically… However, GitHub hasn’t published the .gem yet… Once it does, installing it should be as simple as:


bash# gem sources -a http://gems.github.com
bash# gem install spectra-xmpp4r-observable

Don’t forget to report any bugs. Happy hacking.

Update 2009-09-13 10:29:00: I have just confirmed that GitHub published the .gem.

Update 2009-10-10 20:21:00: Starting today, the XMPP4R-Observable .gem will be maintained on GemCutter.

Pushing to Subversion something that only exists in git

I’m not quite sure why some people think I’m a git expert… Folks: just because I keep a few projects on GitHub doesn’t mean I’ve become an expert. Every week I get some email asking me something about git… Most of them I can answer, since they’re basic (or I point to some documentation and that’s it), but yesterday a rather odd question came in: how do you push to Subversion something that, so far, only exists in git?

That’s an interesting one… Until now I hadn’t needed it: I was only using git to maintain projects that had started in the company’s Subversion… After some research, and adapting things to the way we work at Propus, here is my proposal:


bash$ cd /path/to/projetoX
bash$ svn mkdir https://servidor.svn/projetoX -m "Importing from Git" 
bash$ svn mkdir https://servidor.svn/projetoX/trunk -m "Importing from Git" 
bash$ git checkout -b svn
bash$ git svn init https://servidor.svn/projetoX -s
bash$ git svn fetch
bash$ git rebase trunk
(at this point git may eventually get "lost" and generate a conflict. In the
projects where this happened to me, a "git add file-with-conflict" 
followed by a "git rebase --continue" was enough.)
bash$ git svn dcommit

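This gives you a branch called svn that mirrors what is in Subversion. From then on, just keep maintaining the code on master (or whatever branch you like), merge it into the svn branch and send it upstream with git svn dcommit... (Note that since Git 2.0, git svn uses the origin/ prefix for remote branches by default, so with newer versions the rebase target is origin/trunk rather than trunk.)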

find | while read var; do something "$var"; done

This one goes out to everyone who scripts a lot in bash. It’s the thousandth time I’ve had to repeat this command to someone (on the thousand-and-first I give up and put it on the blog for reference ;-)).

People get stressed out over file names with spaces, or over trying to use xargs with more than one command. Inside the while loop you can put whatever set of commands you want to run on the variable in question:


bash$ find ~/photos -type f | while IFS= read -r foto; do mogrify -resize 800x "$foto"; done

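Simple and efficient. (For file names that contain newlines, a null-delimited variant is needed: find ~/photos -type f -print0 | while IFS= read -r -d '' foto; do ...; done.)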
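The same idea carries over to Ruby. Find.find yields each path as a single String. Passing the command to system as separate arguments bypasses the shell, so spaces in file names never need quoting. In the sketch below, `regular_files` is a hypothetical helper. The mogrify call appears only as a comment, because it requires ImageMagick.

```ruby
require 'find'

# Collect regular files under `root`, whatever characters their names contain.
def regular_files(root)
  files = []
  Find.find(root) do |path|
    files << path if File.file?(path)
  end
  files.sort
end

# Each path arrives intact, so it can go straight to system() as one argument
# (no shell involved, hence no quoting problems):
#
#   regular_files(File.expand_path("~/photos")).each do |photo|
#     system("mogrify", "-resize", "800x", photo)
#   end
```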

rsync logs with restricted ssh

SSH is really the Swiss Army pocket knife of sysadmin tools. When I needed to periodically synchronize log files from an old server (old as in customer-would-never-update-it-or-install-anything-new), I built a simple and secure solution using rsync and ssh. This is what I did:

(I will call “remote” the system that holds the logs I want to retrieve, and “local” the system I want them copied to.) First I created an account with a restricted shell (ideally this should be a system account, but we’ll get there!):


remote# adduser --ingroup nogroup --shell /bin/rbash rlogs

Then, locally, I created a new passphrase-less ssh key pair and copied its public key to the remote system:


local$ ssh-keygen
>>> When asked where to save it, I chose a different name, like .ssh/rlogs
local$ ssh-copy-id -i .ssh/rlogs.pub rlogs@remote
...
>>> You can delete the password of user rlogs, so it, effectively,
>>> cannot log-in with it (almost like a system user).
remote# passwd -d rlogs

Now you should already be able to run rsync without a password (note that I use the -e option to point to a different key):


local$ mkdir logs
local$ rsync -av -e "ssh -i $HOME/.ssh/rlogs" rlogs@remote:"logs/" logs/
receiving file list ... done
./
file1
file2
...
fileN

But even with a restricted shell, I wanted to allow as little as possible. That’s what the command= directive in authorized_keys is for… It allows only that one command to be run in a session started with that key. Since rsync translates a lot of its command-line options, I ran it again with a dirty ps-in-a-loop on the remote host, just to see what running rsync locally causes remotely:


remote$ while true; do ps wp $(pgrep rsync); sleep 1; done
...
local$ rsync -av -e "ssh -i $HOME/.ssh/rlogs" rlogs@remote:"logs/" logs/
>>> in the remote loop you should be able to get the command:
  PID TTY      STAT   TIME COMMAND
 6183 ?        Ss     0:00 /usr/bin/rsync --server --sender -vlogDtpre.i . logs/

Here comes the authorized_keys magic. At the remote host I edited .ssh/authorized_keys to add a command= line with what I found out in my dirty loop. Also, I added a couple of directives to restrict it even further (they are pretty self-explanatory):


rlogs@remote$ cat .ssh/authorized_keys
command="rsync --server --sender -vlogDtpre.i . logs/",no-port-forwarding,no-agent-forwarding,no-X11-forwarding,no-pty ssh-rsa (...) myuser@local

Now everything is set. I just added the rsync command to the local crontab and it’s done.
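If you set up many of these restricted keys, it may help to generate the authorized_keys line rather than type it. The sketch below is a hypothetical Ruby helper; `restricted_key_line` is not part of any tool. It assembles the same options used above. According to the sshd documentation, a double quote inside the command= value must be escaped with a backslash. The public key is a placeholder.

```ruby
# Options that disable everything except running the forced command.
RESTRICTIONS = %w[no-port-forwarding no-agent-forwarding no-X11-forwarding no-pty].freeze

# Build one authorized_keys line that forces `command` for `public_key`.
def restricted_key_line(command, public_key)
  escaped = command.gsub('"') { '\\"' }  # sshd expects \" for quotes inside command="..."
  options = ["command=\"#{escaped}\"", *RESTRICTIONS].join(',')
  "#{options} #{public_key}"
end

line = restricted_key_line('rsync --server --sender -vlogDtpre.i . logs/',
                           'ssh-rsa AAAA... myuser@local')
# line has the same shape as the authorized_keys entry shown above
```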

PubSub with XMPP4R-Simple

As I already told you, I am using XMPP4R-Simple in an internal project. For that, I had to add PubSub capabilities to it, which I did when I decided to get serious about git. Today I committed a version that integrates various patches from other forks I found on GitHub, plus a lot of improvements to the PubSub functions.

With this version it’s simple to use PubSub like this:


require 'xmpp4r-simple'

# Simple function just to parse an event
require 'time'
def parse_event(event)
  item = event.children[0]
  node = item.node
  time = nil; item.each_element("//published") { |e| time = Time.parse(e.text) }
  text = nil; item.each_element("//body") { |e| text = e.text }
  return { :item => item, :node => node, :time => time, :text => text }
end

# Create the clients
im1 = Jabber::Simple.new "im1@example.com", "password" 
im2 = Jabber::Simple.new "im2@example.com", "password" 

# im1 creates a node
im1.create_node("/some/node")

# im2 subscribes to that node
im2.pubsubscribe_to("/some/node")

# We'll start a simple thread to get the events coming from that node to im2
Thread.new { loop {
  sleep 1 while ! im2.received_events?
  im2.received_events { |event|
    h = parse_event(event)
    puts ">>> Got an event from node #{h[:node]} published at #{h[:time]} with text #{h[:text]}" 
  }
}}

# Now im1 just publishes anything to that node.
im1.publish_atom_item("/some/node", "This is my node", "This is the content of my node")

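# The thread should capture the event and puts the hash returned by parse_event.
# Keep the main thread alive for a while so the event has time to arrive
# (the 5-second wait is arbitrary).
sleep 5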

PubSub is great, isn’t it?

NTP-Pool: increasing Brazilian participation

Do you know the NTP Pool? Probably not, but if you are using one of the more recent GNU/Linux distributions, including the major ones, and keeping your clock updated over the network, you are probably using the NTP Pool.

Keeping the clock right is a very common problem, and a crucial one for today’s computing. A lot depends on synchronization, and on keeping trustworthy records of the exact moment a given action took place. What seems like a challenge actually has a very simple solution: the Network Time Protocol, a standard Internet protocol that keeps the local computer’s clock synchronized with that of a remote computer.

Simple, yes, but at a cost: relatively few computers are connected to trustworthy time sources (such as GPS or atomic clocks), so access to one of them used to be a privilege of the few. NTP has an answer to that as well… once a computer is synchronized with one of these central servers (layer 1, or stratum 1, in NTP terminology), it can export that same synchronization one layer down (effectively becoming stratum 2), in a system that scales exponentially without significant loss of precision.

Still, one problem remained open: which NTP server should you use? That’s where the NTP Pool comes in: a conglomerate of public servers, with bandwidth donated by their maintainers, forming an ever-growing time synchronization network, monitored to ensure accuracy and availability.

I have been an NTP user for a long time, keeping an NTP server for internal use in every network I set up. That internal server always points to the NTP Pool. Recently I became interested in learning more about the NTP Pool, and I was quite surprised to find that, of the more than 1700 NTP servers in the Pool, fewer than 20 are in South America (and of those, only 13 are in Brazil!). That’s remarkable! I maintain an NTP server in the Pool myself (ntp.nardol.org); however, since it is physically located in the US, it only adds to the statistics there.

Shall we increase Brazilian participation? If you have a server in Brazil, you can easily install the ntp server. More detailed instructions can be found on the NTP Pool site itself. Traffic goes over UDP, and its bandwidth usage is negligible. To make sure it never exceeds my server’s capacity, for example, I use simple iptables rules like these:


bash$ iptables-save
(...)
-A INPUT -p udp -m udp --dport 123 -j ntp
(...)
-A ntp -m limit --limit 10/sec --limit-burst 6 -j ACCEPT
-A ntp -j LOG --log-prefix "ntp flood: " --log-level 7
-A ntp -j DROP
(...)
bash$ 

Note, however, that the rate-limiting rules on my server have never been hit (at least not since the last reboot :-) ):


bash$ uptime
 15:20:00 up 23 days, 17:32,  1 user,  load average: 0.04, 0.12, 0.18
bash$ iptables -vnL
(...)
Chain ntp (1 references)
 pkts bytes target     prot opt in     out     source               destination 
 233K   18M ACCEPT     0    --  *      *       0.0.0.0/0            0.0.0.0/0         limit: avg 10/sec burst 6
    0     0 LOG        0    --  *      *       0.0.0.0/0            0.0.0.0/0         LOG flags 0 level 7 prefix `ntp flood: '
    0     0 DROP       0    --  *      *       0.0.0.0/0            0.0.0.0/0 
(...)
bash$ 
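The iptables limit match above behaves like a token bucket. The bucket holds up to --limit-burst tokens (6 here) and refills at --limit tokens per time unit (10 per second). Each matching packet spends one token; once the bucket is empty, packets fall through to the LOG and DROP rules. The Ruby sketch below models that behaviour with explicit timestamps so it is deterministic. It is an approximation for intuition, not the kernel’s actual implementation, and the class name RateLimit is made up.

```ruby
# Rough token-bucket model of: -m limit --limit RATE/sec --limit-burst BURST
class RateLimit
  def initialize(rate_per_sec, burst)
    @rate = rate_per_sec.to_f
    @burst = burst.to_f
    @tokens = @burst   # the bucket starts full
    @last = nil
  end

  # true if a packet arriving at `now` (in seconds) would match the ACCEPT rule
  def allow?(now)
    if @last
      # Refill in proportion to elapsed time, capped at the burst size.
      @tokens = [@burst, @tokens + (now - @last) * @rate].min
    end
    @last = now
    return false if @tokens < 1

    @tokens -= 1
    true
  end
end

limit = RateLimit.new(10, 6)
# Ten packets at the same instant: only the first 6 are accepted.
first_burst = Array.new(10) { limit.allow?(0.0) }
# Half a second later, 5 tokens have been refilled.
after_pause = Array.new(6) { limit.allow?(0.5) }
```

This also explains the counters above. Normal NTP clients send only a few packets per minute, so the bucket essentially never runs dry and the LOG and DROP rules stay at zero.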


Happy hacking!

Maceio - took some days off

I finally took some days off. They were much needed, since I spent Carnival on call at the hospital (argh!)... So Brenda and I decided to spend a long-awaited vacation in Maceio, capital of the state of Alagoas. It has plenty of sun and beautiful beaches, enough to fill our week (and get us a bit of a tan, too).

This picture was taken at “Praia do Gunga” (Gunga’s Beach), a charming place with a calm shore, almost like a pool, protected by natural reefs. As you can see, I am having a bad time right now :-)

The food is excellent, and so are the people. But there are some inconveniences (as always). The beaches around downtown are not fit for swimming… They’re fighting a long battle against pollution (and losing, if you ask me)... Also, Alagoas is a poor state… Our guide said the literacy rate is below 70%…

Internet access is also expensive in hotels. Ours charges BRL 1.00 every 5 minutes! And the speed is not the best. They use one of those systems that require web authentication before you can go anywhere. I’ve seen people complain about this kind of system on Planet Debian before (reference, please!) and suggest tunneling over DNS as a “fix”. I noticed it would work in our hotel, but I decided to try another approach I’ve already written about: a quick tunnel over an ssh connection.

I know I said authentication was required, but only for the first connection! Yes: once the connection is established, I could just log out (thus stopping the charges). No new connections could be made, but the tunnel was already up, so I could just put everything through it and be fine, right? Wrong. I got bitten by a drawback of the technique that a commenter had already pointed out when I first wrote about it: on an error-prone network, TCP-in-TCP slowly dies trying to correct itself over and over… and I was on a poorly connected wi-fi (losing almost 30% of the packets!).

So I was left with setting up an unplanned DNS tunnel as the only option… That would take time (and money)... So I opted for a simpler approach: a SOCKS proxy. Yes, everything I needed to do could be done through SOCKS! So a simple:


bash$ ssh -D 8888 my.remote.location

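was all I needed. That, and setting Firefox to use a SOCKS proxy on localhost:8888, and everything went fine. I paid to set up the tunnel; then, once it was established, I logged out and kept using it the whole time. Since ssh -D forwards the application’s byte streams over its own connection instead of encapsulating TCP packets, there is no inner TCP retransmitting on top of the outer one, which is why this held up where the full tunnel did not. Simple and effective, and I still had some time left to blog about it. :-)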
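When Firefox is set to SOCKS v5, it speaks the protocol from RFC 1928 to the local end of ssh -D. First it sends a greeting listing the authentication methods it supports. Then it sends a CONNECT request naming the destination host and port. The sketch below only builds those two client messages as bytes, to show how little is involved. The method names are hypothetical, and no network traffic is generated.

```ruby
# Client side of a SOCKS5 handshake (RFC 1928), built as raw bytes.

# Greeting: version 5, one method offered, method 0x00 = "no authentication".
def socks5_greeting
  [0x05, 0x01, 0x00].pack('C*')
end

# CONNECT request for a host name: VER, CMD=CONNECT, RSV, ATYP=DOMAINNAME,
# then a length-prefixed host name and the port in network byte order.
def socks5_connect(host, port)
  raise ArgumentError, 'host name too long' if host.bytesize > 255
  [0x05, 0x01, 0x00, 0x03, host.bytesize].pack('C*') + host.b + [port].pack('n')
end

request = socks5_connect('example.com', 80)
# request.bytes starts with [5, 1, 0, 3, 11], then "example.com", then [0, 80]
```

Resolving the host name on the far side (ATYP 0x03) is also why, with the proxy in place, even DNS lookups can go through the tunnel instead of the hotel network.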

Git basics: reversing the 'git sucks' effect

I’ve been using git these last few days and I am still working out a workflow for my projects. Unfortunately, as others have noticed, git violates the Principle of Least Surprise (POLS) in so many ways that it ends up being hard to get.

Creating a remote repository seems to be the first thing that bites a developer switching to git (especially one coming from a centralized SCM). I haven’t decided on the best way of doing it, but I’ve been using git-daemon via inetd, with a path on my remote host holding all my repositories for public pulling, and ssh for pushing. Here is how to create one:


local$ ssh spectra@remotehost
remotehost$ mkdir /var/git/myproject.git
remotehost$ cd /var/git/myproject.git
remotehost$ git --bare init 
remotehost$ logout
local$ cd myproject
local$ git remote add remotehost spectra@remotehost:/var/git/myproject.git
local$ git push remotehost master
...
otherlocal$ git clone git://remotehost/myproject.git
otherlocal$ cd myproject
otherlocal$ git remote add remotehost spectra@remotehost:/var/git/myproject.git
(hack hack hack)
otherlocal$ git push remotehost master

Now everybody can pull your repository at git://remotehost/myproject.git, and you can push to and pull from it via ssh. Note that you have to set up git-daemon, which is pretty straightforward; by default it only serves repositories that contain a git-daemon-export-ok file (or all of them, if started with --export-all). I am running it from inetd, but you can also run it standalone. Debian has a package which does just that.

Now, some people think logging in to a remote server just to create an empty repository is too much. Well… repositories are just directories. It turns out you can “push” for the first time by rsyncing a copy of your repository to the remote host:


local$ cd myproject
local$ git clone --bare . ../myproject.git
local$ rsync -a ../myproject.git/ spectra@remotehost:/var/git/myproject.git
local$ git remote add remotehost spectra@remotehost:/var/git/myproject.git
(hack hack hack)
local$ git push remotehost master

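(Apparently you can “push” using rsync every time, but it’s considered wiser to keep your, probably crappy, local commits separate from the public repository… otherwise commit messages like “Please, don’t use this code” are likely to pop up everywhere :) ). One side effect worth knowing: a plain copy of .git keeps core.bare = false in its config, and newer git versions refuse to push into the checked-out branch of a non-bare repository; cloning with --bare first avoids that. Other than that, I don’t know of any side effects, and it works :-)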

Another thing to notice is that git is geared more towards pulling than pushing. This may be because of its designer: Linus works by pulling changes from other people’s repositories, not by letting others push them into his. And this is another violation of POLS for most people, who are used to “committing” their changes into some remote repository. Instead, people using git expose their own repositories so that others can pull from them.

I also agree with most of git’s critics regarding its commands… There are lots of examples (and I am not going deeper into this), but I think it was a bad choice, for instance, to call what git does “checkout”, since it is not what users of centralized tools mean by a checkout. Yes, I know… different tools, different ways of seeing things… but everyone was already used to the names centralized SCMs give their operations, and I think it would only have helped git to adopt the same names for the same operations, and to invent new ones for operations specific to decentralized work. Anyway, once you get it (and I have not completely got it yet), everything seems to flow as expected. If this adaptation fails, there’s still Easy Git to the rescue!

One last thing that I think contributes to the “git sucks” effect: git-svn. It is a great tool, but it was built from git’s point of view… Given that it’s intended as glue for newcomers from Subversion, it would benefit more from being built from svn’s point of view. A fellow developer at my company pointed this out when he just couldn’t understand why one has to git svn rebase instead of doing a simple update. So git-svn also suffers from git’s “bad command naming habit”. Of course, that assumes you came from this environment (I am sure git and git-svn make perfect sense to Linus & co. :-) ). I have not tried it yet, but yap seems to be aimed at providing an alternative to git-svn.

Unix time 1234567890

Fantastic! It’s Unix time == 1234567890:


spectra@erebor:~$ irb
irb(main):001:0> Time.at(1234567890)
=> Fri Feb 13 21:31:30 -0200 2009
irb(main):002:0>

(And it’s also Friday the 13th… brrrrrr).
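The irb output above is shown in the machine’s local time zone (UTC-2 at the time). Pinning the zone makes the result reproducible on any machine. Note that getutc is used rather than utc, because Time#utc converts the receiver in place.

```ruby
moment = Time.at(1234567890)

# Same instant, expressed in UTC and in a fixed -02:00 offset.
utc = moment.getutc.strftime('%a %b %d %H:%M:%S %Y')
brazil = moment.getlocal('-02:00').strftime('%a %b %d %H:%M:%S %z %Y')

# And yes, it really was a Friday the 13th.
friday_13th = moment.getutc.friday? && moment.getutc.day == 13
```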

BTW, the best “rant” about the Unix Epoch I’ve ever seen in a comic strip is from xkcd (please, don’t miss the ‘title’ field of the image: just hold your mouse pointer over it for a few seconds).

Cheers!

GitHub saving space?

I was browsing GitHub, getting to know the system and feeling pretty amazed by it (seriously… I felt like I had discovered an Orkut for developers)... when a thought just struck me: how do they save space?

Yes, I know git is pretty efficient when it comes to saving space. Yes, I also know that storage is getting cheaper over time, but still, they claim to have more than 50k developers hooked up to their servers… if they’re not doing something about space, things will get inefficient quite quickly. Rails alone seems to have 464 forks! If each of them is a full bare clone of the ‘canonical’ repository on GitHub’s side, that is a lot of space wasted on duplicated data…

Git has one amazing feature: it names every object it keeps track of by its hash. It surely doesn’t seem too complicated to design a scheme that avoids keeping two copies of objects with the same hash. So all those forks of Rails, on GitHub’s side, could be just hardlinks to the ‘canonical’ repository…
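To make the hashing argument concrete: git names every blob by the SHA-1 of a short header (`blob <size>` plus a NUL byte) followed by the content. Identical files in different forks therefore get identical names and need to be stored only once. The Ruby sketch below shows that deduplication. The ObjectStore class is hypothetical; it says nothing about how GitHub actually lays out its storage. For what it’s worth, git itself can already share objects between repositories on one machine through its alternates mechanism (`git clone --shared` or `--reference`).

```ruby
require 'digest/sha1'

# Object id computed the way `git hash-object` does it for a blob.
def git_blob_id(content)
  Digest::SHA1.hexdigest("blob #{content.bytesize}\0#{content}")
end

# Toy content-addressed store shared by several "forks".
class ObjectStore
  def initialize
    @objects = {}  # id => content, each stored once
  end

  # Store content and return its id; storing it again is a no-op.
  def put(content)
    id = git_blob_id(content)
    @objects[id] ||= content
    id
  end

  def size
    @objects.size
  end
end

store = ObjectStore.new
canonical = { 'README' => "hello\n", 'Rakefile' => "task :default\n" }
forked = canonical.merge('NOTES' => "my change\n")

# Each fork keeps only a path => id map; the contents live in the shared store.
trees = [canonical, forked].map { |files| files.transform_values { |c| store.put(c) } }
# store.size is 3: the two shared files are stored once, plus the fork's new file
```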

Surely the people working at GitHub are smart enough to have thought about it on their own… Who knows? Maybe that’s exactly what they’re doing already! If so, could they share their scheme, since it could be very useful to the rest of us? If not, can we beat them to it ;-D?

Alright! I surrender: Git rules!

Everybody seems to be using git these days… I am not very fond of hype, as I told you before, but I’ve been evaluating git for some time now. I was happy with svk for quite a long time… Lately, though, I’ve been developing and extending a lot of Ruby libraries, and all of them seem to be hosted at GitHub and use git… So, why not give it a serious try?

I chose a simple task from my TODO list: extending xmpp4r-simple to support XMPP PubSub, so I could use it in an internal project at my company. So I “cloned” its git repo and started developing. It wasn’t a really hard task: all I had to do was use the underlying library (xmpp4r, not surprisingly also on GitHub) and mimic what Blaine had done for the callbacks, and it was ready (and working… although I still need to put more time into the test suite).

I am still exploring… It will take some time to migrate my stuff (and maybe a lot will remain in my company’s Subversion), but that’s what git-svn is for, isn’t it? Right now I am looking for ways to work with git-buildpackage and Debian git tools… Do you have some advice on that?