Refresh Machine in Ruby

Sooner or later we all face this situation: you have an object with “states” that are read from an external source through polling. Usually you just start a thread in the object's constructor and put an infinite loop inside it that checks, at every interval, whether there is new state and, if so, updates the object's state.
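A minimal sketch of that thread-per-object pattern follows. The class name PolledValue, its source callable and the stop method are hypothetical, invented here only for illustration:

```ruby
# Hypothetical example: every instance owns its own polling thread.
class PolledValue
  attr_reader :state

  # source:   any callable returning the current external state (assumed)
  # interval: seconds to sleep between polls
  def initialize(source, interval)
    @source = source
    @state  = source.call
    @thread = Thread.new do
      loop do
        sleep interval
        fresh = @source.call
        # Only touch the object when the external state has changed
        @state = fresh unless fresh == @state
      end
    end
  end

  # Stops the polling thread; otherwise it lives as long as the process.
  def stop
    @thread.kill
    @thread.join
  end
end

counter = 0                                   # stands in for the external source
value = PolledValue.new(-> { counter }, 0.05)
counter = 42                                  # the "external" state changes
sleep 0.3                                     # give the poller time to notice
value.stop
puts value.state
```

Creating a thousand such objects means a thousand sleeping threads, one per instance.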

This is a very simple and widely used construction, but it has a serious scalability problem: for each object you have one thread and one infinite loop. If you have few objects, that is not really a problem… but if you have many, that's another story.

Some will argue that this construction is not mandatory: depending on the data source, the source itself can “update” the object, in an Observer pattern, which is more than correct… But we don't always control the data source, and it isn't always that smart (in fact, when we don't control it, it seems pretty dumb!). That's why I wrote a “Refresh Machine”: a container for objects that need to be updated through polling. It assumes two things: that the object has a refresh method/attribute holding the interval in seconds between updates, and a do_refresh method, which is run each time that interval elapses.
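To make those two assumptions concrete, here is a sketch of an object that satisfies them. The Gauge class and the block it reads from are hypothetical; only the names refresh and do_refresh come from the description above:

```ruby
# Hypothetical class satisfying the two assumptions:
#   refresh    -> interval in seconds between updates
#   do_refresh -> performs one poll of the external source
class Gauge
  attr_accessor :refresh
  attr_reader :value, :updated_at

  def initialize(refresh = 5, &reader)
    @refresh    = refresh
    @reader     = reader   # block standing in for the external source
    @value      = nil
    @updated_at = nil
  end

  def do_refresh
    @value      = @reader.call
    @updated_at = Time.now
  end
end

samples = [0.42, 0.57]                  # made-up readings for the demo
gauge = Gauge.new(2) { samples.shift }
gauge.do_refresh                        # what the container would call
puts gauge.value
```

Note that the object itself starts no thread; deciding when to call do_refresh is left entirely to whatever container holds it.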

Let's get to the code:


require 'thread'
class RefreshMachine

  attr_accessor :wait

  def initialize(wait = false)
    @wait    = wait
    @killall = false
    @queue   = Queue.new
    @removed = Array.new
    @thread  = Thread.new do
      loop do
        ( @removed.clear; @queue.clear; @killall = false ) if @killall
        next if @queue.empty?
        object, last_refresh, thread = @queue.deq

        # Helps the garbage collection
        thread = nil if (! thread.nil?) and (! thread.alive?)

        # Three things can happen with a dequeued object
        if (Time.now < (last_refresh + object.refresh)) or
           (! thread.nil? and @wait)
          # First: It's too early to refresh it or we still have
          #        a refresh thread running and have to 'wait', so we
          #        just put it back in the queue
          @queue.enq [ object, last_refresh, thread ]
        else
          if @removed.include?(object)
            # Second: We have a "remove request" for it, so we
            #         delete the request and avoid queueing it again
            @removed.delete(object)
          else
            # Third: It's time to refresh it, so we
            #        call do_refresh and put it back in the queue
            add(object)
          end
        end
      end
    end
  end

  def add(object)
    @queue.enq [ object, Time.now, Thread.new { object.do_refresh } ]
  end

  def del(object)
    @removed << object unless @removed.include?(object)
  end

  def killall
    @killall = true
  end

end # of class RefreshMachine

The goal is to use as few threads with infinite loops as possible… and it turned out small enough to blog about :-)

Great News: Etch'n'Half

Great to hear about “etch and a half”. I’ve just upgraded all my systems and everything went smoothly. I have now dumped my home-compiled Ruby in favor of Debian’s version, since it fixes the annoying security bug. Thanks for the good work, people!

Exciting new World

I’ve just tested the JavaScript performance improvements in Firefox 3 and WOW! JavaScript in FF3 is really fast. While googling about it I ran across a recent interview with Brendan Eich about the future of JavaScript, and two things about that future got me excited.

The first is something we already have, still in its early days but with a lot of potential: HotRuby. It is really interesting to script a web page in Ruby (which is my favorite language). While it’s not embedded the way JavaScript is, it gets “compiled” on the server side with YARV (the new bytecode compiler for the next version of Ruby, 1.9) and is then served to the browser in the form of JSON objects, so it can be interpreted by the browser’s JavaScript engine. All this is transparent and works with XMLHttpRequest. It’s no coincidence that Eich mentions it as a form of ARAX (swapping the J in AJAX for an R, from Ruby).

I already do a lot of coding in Ruby… not having to deal with JavaScript anymore is surely a plus. OTOH, Eich is talking about improvements to JavaScript that would turn it into a real programming language… Maybe coding in it will not be so painful by then ;-)

The whole interview has to do with Project Tamarin, a “high-performance, open source implementation of the ECMAScript 4th edition (ES4) language specification” [ ECMAScript 4 is the same thing as JavaScript 2 ] by the Mozilla developers. And this is the second thing I got excited about: they plan to glue IronRuby (a Ruby compiler for, argh!, .NET) to it via IronMonkey.

So… exciting news! Either via Tamarin or via HotRuby, we’ll get Ruby browser scripting. My “free mind” tends to favor HotRuby over IronRuby/IronMonkey/Tamarin… But in the end what matters is that all those people now cursed by JavaScript will finally have a taste of what a real programming language feels like… Who knows! They might even like it ;-D

Ruby security advisory and fix

The Debian 4.0 version of Ruby is open to the now widely known Ruby security vulnerabilities. The bug is reported as 487238 in Debian’s BTS and is closed, since the version now in sid (version 1.8.7.22-1) is already fixed. Users of stable can apply the patch provided by Daniel Franke (it doesn’t seem to fix everything, but it goes a long way).

Apparently, this brought up (again) the rants over full disclosure. Indeed, what is vulnerable is not that hard to find, as Zed Shaw showed us, so why not talk about it plainly and boldly? Why just provide the CVE numbers and ask everybody to upgrade? Zed goes deeper into the quality of the C code, but that is not the issue I want to talk about…

As a Free and Open Source Software supporter (and developer), I can see the benefits of full disclosure. As a not-full-time webmaster, I can see the benefits of not having a “proof-of-concept” piece of code attached to the vulnerability report. Of course, there are a lot of things a webmaster can do to keep a machine from being completely compromised when a security advisory is published with proof-of-concept code in it (think chrooting, randomized memory protection, security libraries, grsecurity, SELinux, etc.), and my machines, although vulnerable to the bug, would not be fully compromised if exploited.

I guess one should be prepared for whatever comes from the Internet… Full disclosure, in this sense, has more pros than cons, IMHO. For instance, it was not clear whether Debian 4.0 was vulnerable… There was no security advisory from Debian (and there still isn’t), and it is not immediately obvious whether the packaged version is affected. I know that I, at least, wanted to run a proof-of-concept to check whether my server was vulnerable before going all the way into packaging a fix (or backporting the sid version), and it was not until I read the Matasano Chargen blog that I could test older versions. But different people have different ideas…