sourcify

Workarounds before ruby-core officially supports Proc#to_source (& friends)

3 years after

= Sourcify

== IMPORTANT#1: Sourcify was written in the days of ruby 1.9.x, it should be buggy for anything beyond that. == IMPORTANT#2: Sourcify is no longer maintained, use it at your own risk, & expect no bug fixes.

ParseTree[http://github.com/seattlerb/parsetree] is great, it accesses the runtime AST (abstract syntax tree) and makes it possible to convert any object to ruby code & S-expression, BUT ParseTree doesn't work for 1.9.* & JRuby.

RubyParser[http://github.com/seattlerb/ruby_parser] is great, and it works for any rubies (of course, not 100% compatible for 1.8.7 & 1.9.* syntax yet), BUT it works only with static code.

I truely enjoy using the above tools, but with my other projects, the absence of ParseTree on the different rubies is forcing me to hand-baked my own solution each time to extract the proc code i need at runtime. This is frustrating, the solution for each of them is never perfect, and i'm reinventing the wheel each time just to address a particular pattern of usage (using regexp kungfu).

Enough is enough, and now we have Sourcify, a unified solution to extract proc code. When ParseTree is available, it simply works as a thin wrapper round it, otherwise, it uses a home-baked ragel-generated scanner to extract the proc code. Further processing with RubyParser & Ruby2Ruby to ensure 100% with ParseTree (yup, there is no denying that i really like ParseTree).

== Installing It

The religiously standard way:

$ gem install ParseTree sourcify

Or on 1.9.* or JRuby:

$ gem install ruby_parser file-tail sourcify

== Sourcify adds 4 methods to Proc

=== 1. Proc#to_source

Returns the code representation of the proc:

require 'sourcify'

lambda { x + y }.to_source

>> "proc { (x + y) }"

proc { x + y }.to_source

>> "proc { (x + y) }"

Like it or not, a lambda is represented as a proc when converted to source (exactly the same way as ParseTree). It is possible to only extract the body of the proc by passing in {:strip_enclosure => true}:

lambda { x + y }.to_source(:strip_enclosure => true)

>> "(x + y)"

lambda {|i| i + 2 }.to_source(:strip_enclosure => true)

>> "(i + 2)"

=== 2. Proc#to_sexp

Returns the S-expression of the proc:

require 'sourcify'

x = 1 lambda { x + y }.to_sexp

>> s(:iter,

>> s(:call, nil, :proc, s(:arglist)),

>> nil,

>> s(:call, s(:lvar, :x), :+, s(:arglist, s(:call, nil, :y, s(:arglist)))))

To extract only the body of the proc:

lambda { x + y }.to_sexp(:strip_enclosure => true)

>> s(:call, s(:lvar, :x), :+, s(:arglist, s(:call, nil, :y, s(:arglist)))))

=== 3. Proc#to_raw_source

Unlike Proc#to_source, which returns code that retains only functional aspects, fetching of raw source returns the raw code enclosed within the proc, including fluff like comments:

lambda do |i| i+1 # (blah) end.to_raw_source

>> "proc do |i|

>> i+1 # (blah)

>> end"

NOTE: This is extracting of raw code, it relies on static code scanning (even when running in ParseTree mode), the gotchas for static code scanning always apply.

=== 4. Proc#source_location

By default, this is only available on 1.9., it is added (as a bonus) to provide consistency under 1.8.:

/tmp/test.rb

require 'sourcify'

lambda { x + y }.source_location

>> ["/tmp/test.rb", 5]

== Sourcify adds 3 methods to Method

IMPORTANT: These only work for MRI-1.9.2, as currently, only it supports (1) discovering of the original source location with Method#source_location, and (2) reliably determinig a method's parameters with Method#parameters. Attempting to use these methods on other rubies will raise Sourcify::PlatformNotSupportedError.

NOTE: The following works for methods defined using both def .. end & Module#define_method. However, when a method is defined using the later approach, sourcify uses Proc#to_source to handle the processing, thus, the usual gotchas related to proc source extraction apply.

=== 1. Method#to_source

Returns the code representation of the method:

require 'sourcify'

class MyMath def self.sum(x, y) x + y # (blah) end end

MyMath.method(:sum).to_source

>> "def sum(x, y)

>> (x + y)

>> end"

Just like the Proc#to_source equivalent, u can set :strip_enclosure => true to extract only the body within.

=== 2. Method#to_sexp

Returns the S-expression of the method:

require 'sourcify'

class MyMath def self.sum(x, y) x + y # (blah) end end

MyMath.method(:sum).to_sexp

s(:defn, :sum, s(:args, :x, :y), s(:scope, s(:block, s(:call, s(:lvar, :x), :+, s(:arglist, s(:lvar, :y))))))

Just like the Proc#to_sexp equivalent, u can set :strip_enclosure => true to extract only the body within.

=== 3. Method#to_raw_source

Unlike Method#to_source, which returns code that retains only functional aspects, fetching of raw source returns the method's raw code, including fluff like comments:

require 'sourcify'

class MyMath def self.sum(x, y) x + y # (blah) end end

MyMath.method(:sum).to_raw_source

>> "def sum(x, y)

>> x + y # (blah)

>> end"

Just like the Proc#to_raw_source equivalent, u can set :strip_enclosure => true to extract only the body within.

== Performance

Performance is embarassing for now, benchmarking results for processing 500 procs (in the ObjectSpace of an average rails project) yiels the following:

ruby user system total real ruby-1.8.7-p299 (w ParseTree) 10.270000 0.010000 10.280000 ( 10.311430) ruby-1.8.7-p299 (static scanner) 14.120000 0.080000 14.200000 ( 14.283817) ruby-1.9.1-p376 (static scanner) 17.380000 0.050000 17.430000 ( 17.405966) jruby-1.5.2 (static scanner) 21.318000 0.000000 21.318000 ( 21.318000)

Since i'm still pretty new to ragel[http://www.complang.org/ragel], the code scanner will probably become better & faster as my knowlegde & skills with ragel improve. Also, instead of generating a pure ruby scanner, we can generate native code (eg. C or java, or whatever) instead. As i'm a C & java noob, this will probably take some time to realize.

== Gotchas

Nothing beats ParseTree's ability to access the runtime AST, it is a very powerful feature. The scanner-based (static) implementation suffer the following gotchas:

=== 1. The source code is everything

Since static code analysis is involved, the subject code needs to physically exist within a file, meaning Proc#source_location must return the expected [file, lineno], the following will not work:

def test eval('lambda { x + y }') end

test.source_location

>> ["(eval)", 1]

test.to_source

>> Sourcify::CannotParseEvalCodeError

The same applies to Blah#to_proc & &:blah:

klass = Class.new do def aa(&block); block ; end def bb; 1+2; end end

klass.new.method(:bb).to_proc.to_source

>> Sourcify::CannotHandleCreatedOnTheFlyProcError

klass.new.aa(&:bb).to_source

>> Sourcify::CannotHandleCreatedOnTheFlyProcError

=== 2. Multiple matching procs per line error

Sometimes, we may have multiple procs on a line, Sourcify can handle this as long as the subject proc has arity that is unique from others:

Yup, this works as expected :)

b1 = lambda {|a| a+1 }; b2 = lambda { 1+2 } b2.to_source

>> proc { (1 + 2) }

Nope, this won't work :(

b1 = lambda { 1+2 }; b2 = lambda { 2+3 } b2.to_source

>> raises Sourcify::MultipleMatchingProcsPerLineError

As observed, the above does not work when there are multiple procs having the same arity, on the same line. Furthermore, this bug[http://redmine.ruby-lang.org/issues/show/574] under 1.8.* affects the accuracy of this approach.

To better narrow down the scanning, try:

  • passing in the {:attached_to => ...} option

    x = lambda { proc { :blah } }

    x.to_source

    >> Sourcify::MultipleMatchingProcsPerLineError

    x.to_source(:attached_to => :lambda)

    >> "proc { proc { :blah } }"

  • passing in the {:ignore_nested => ...} option

    x = lambda { lambda { :blah } }

    x.to_source

    >> Sourcify::MultipleMatchingProcsPerLineError

    x.to_source(:ignore_nested => true)

    >> "proc { lambda { :blah } }"

  • attaching a body matcher proc

    x, y = lambda { def secret; 1; end }, lambda { :blah }

    x.to_source

    >> Sourcify::MultipleMatchingProcsPerLineError

    x.to_source{|body| body =~ /^(.*\W|)def\W/ }

    >> 'proc { def secret; 1; end }'

Pls refer to the rdoc for more details.

=== 3. Occasional Racc::ParseError

Under the hood, sourcify relies on RubyParser to yield s-expression, and since RubyParser does not yet fully handle 1.8.7 & 1.9.* syntax, you will get a nasty Racc::ParseError when you have any code that is not compatible with 1.8.6.

=== 4. Lambda operator doesn't work

When a lambda has been created using the lambda operator "->", sourcify can't handle it:

x = ->{ :blah }
x.to_source
# >> Sourcify::NoMatchingProcError

== Is it really working ??

Sourcify spec suite currently passes in the following rubies:

  • MRI-1.8.*, REE-1.8.7 (both ParseTree & static scanner modes)
  • JRuby-1.6., MRI-1.9. (static scanner ONLY)

Besides its own spec suite, sourcify has also been tested to handle:

ObjectSpace.each_object(Proc) {|o| puts o.to_source }

For projects:

(TODO: the more the merrier)

== Projects using it

Projects using sourcify include:

== Additional Resources

Sourcify is heavily inspired by many ideas gathered from the ruby community:

The sad fact that Proc#to_source wouldn't be available in the near future:

== Note on Patches/Pull Requests

  • Fork the project.
  • Make your feature addition or bug fix.
  • Add tests for it. This is important so I don't break it in a future version unintentionally.
  • Commit, do not mess with rakefile, version, or history. (if you want to have your own version, that is fine but bump version in a commit by itself I can ignore when I pull)
  • Send me a pull request. Bonus points for topic branches.

== Copyright

Copyright (c) 2010 NgTzeYang. See LICENSE for details.

Related Repositories

cantango

cantango

CanCan extension with role oriented permission management, rules caching and muc ...

mayday

mayday

Easily add custom warnings and errors to your Xcode project's build process ...

pretentious

pretentious

Tool for writing characterization tests - Generate RSpec or minitest specs based ...

wunderbar

wunderbar

Easy HTML5 applications ...

maroon

maroon

A gem to make pure DCI available in Ruby (was originally named Moby) ...


Top Contributors

ngty tomykaira dekz jmettraux seamusabshere