testing - Optimizing Ruby code for parsing a string to a numeric value

I am working on speed tests for Ruby, and I need to parse text files into numeric values. Because it is so slow, I am wondering whether my code could be optimized, or whether Ruby really is this slow. The code reads from files; these files contain around 1,000,000 randomly generated lines or numbers. I will only display a few lines, so that you know what is being read. The names of the files to read are passed as arguments, and the code is split into separate scripts (just for my own clarity).

First, I want to parse a simple number. The input comes in this format:

type
number

type
number

...

This is how I did it:

incr = 1
File.open(ARGV[0], "r").each_line do |line|
  incr += 1
  if incr % 3 == 0
    line.to_i
  end
end

Second, I need to parse a single list. The input comes in this format:

type
(1,2,3,...)

type
(1,2,3,...)

...

This is how I did it:

incr = 1
File.open(ARGV[0], "r").each_line do |line|
  incr += 1
  if incr % 3 == 0
    line.gsub("(","").gsub(")","").split(",").map{ |s| s.to_i}
  end
end

Finally, I need to parse a list of lists. The input comes in this format:

type
((1,2,3,...),(1,2,3,...),(...))

type
((1,2,3,...),(1,2,3,...),(...))

...

This is how I did it:

incr = 1
File.open(ARGV[0], "r").each_line do |line|
  incr += 1
  if incr % 3 == 0
    line.split("),(").map{ |s| s.gsub("(","").gsub(")","").split(",").map{ |s| s.to_i}}
  end
end

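I do not need to display the results; I am only speed testing, so there is no need for output. I did check the outcome, and the code seems to work correctly. It is just surprisingly slow, and I would like to speed test with the optimum of what Ruby has to offer. I know there are several speed tests out there that I could use, but for my purpose I need to build my own.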

What can I do better? How can this code be optimized? Where did I go wrong, or is this already the best that Ruby can do? Thank you in advance for your tips and ideas.

In the first one, instead of:

File.open(ARGV[0], "r").each_line do |line|

use:

File.foreach(ARGV[0]) do |line|

and instead of:

  incr += 1
  if incr % 3 == 0

use:

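  if $. % 3 == 2

$. is a magic variable holding the line number of the last line read. It is 1 for the first line, whereas your incr counter is already 2 when the first line is tested, so the lines your original code parses (2, 5, 8, ...) are the ones where $. % 3 == 2.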
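Put together, the two changes might look like the sketch below. The sample data and the temporary file are assumptions made only so the example runs on its own; in your script the path would come from ARGV[0]. The test uses `$. % 3 == 2` because, with 1-based line numbers and type/number/blank records, the numbers sit on lines 2, 5, 8, and so on; adjust the remainder if your layout differs.

```ruby
require 'tempfile'

# Hypothetical sample input: a type line, a number line, then a blank line.
sample = "int\n42\n\nint\n7\n\n"

numbers = []
Tempfile.create('numbers') do |file|
  file.write(sample)
  file.flush

  # File.foreach reads the file line by line; $. holds the number of the
  # line most recently read, so no manual counter is needed.
  File.foreach(file.path) do |line|
    numbers << line.to_i if $. % 3 == 2
  end
end

p numbers
```

The results are collected into an array here only so they can be inspected; for a pure speed test you would drop that, as in your original code.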

In the second one, instead of:

line.gsub("(","").gsub(")","").split(",").map{ |s| s.to_i}

use:

line.tr('()', '').split(',').map(&:to_i)
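As a quick sanity check that the replacement preserves behavior, the sketch below runs both versions on a made-up sample line (the line is an assumption, not taken from your files). `String#delete` is shown as a further alternative that I did not benchmark.

```ruby
# Hypothetical sample line in the "single list" format, including the
# newline that each_line/foreach leave on the end.
line = "(1,2,3,10,25)\n"

with_gsub = line.gsub("(", "").gsub(")", "").split(",").map { |s| s.to_i }
with_tr   = line.tr('()', '').split(',').map(&:to_i)

# String#delete removes every listed character, much like tr with an
# empty replacement string.
with_delete = line.delete('()').split(',').map(&:to_i)

p with_gsub
p with_tr
p with_delete
```

The trailing newline is harmless because `to_i` ignores anything after the leading digits.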

In the third one, instead of:

line.split("),(").map{ |s| s.gsub("(","").gsub(")","").split(",").map{ |s| s.to_i}}

use:

line.scan(/(?:\d+,?)+/).map{ |s| s.split(',', 0).map(&:to_i) }

Here's how that line works:

line.scan(/(?:\d+,?)+/) # => ["1,2,3,", "1,2,3,"]
line.scan(/(?:\d+,?)+/).map{ |s| s.split(',',0) } # => [["1", "2", "3"], ["1", "2", "3"]]
line.scan(/(?:\d+,?)+/).map{ |s| s.split(',', 0).map(&:to_i) } # => [[1, 2, 3], [1, 2, 3]]
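The 0 passed to split is what keeps a trailing comma from producing an extra element. A small sketch, using a made-up chunk like the ones scan returns above:

```ruby
chunk = "1,2,3,"  # hypothetical scan result with a trailing comma

# A limit of 0 (which is also the default) drops trailing empty strings.
zero_limit    = chunk.split(',', 0)
default_split = chunk.split(',')

# A negative limit keeps them, which would turn into a stray 0 after to_i.
negative_limit = chunk.split(',', -1)
stray_zero     = negative_limit.map(&:to_i)

p zero_limit
p default_split
p negative_limit
p stray_zero
```

Since 0 is the default limit, `split(',')` alone behaves the same way; passing it explicitly just makes the intent visible.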

I didn't run benchmarks to compare the speed, but the changes should be faster because the gsub calls are gone. The changes I made aren't necessarily the fastest ways to do these things; they're more-optimized versions of your own code.

Trying to compare the speed of Ruby to other languages requires knowledge of the fastest ways of accomplishing each step, based on multiple benchmarks of that step. It also implies that you're running on identical hardware and OS, and that your languages are all compiled to their most efficient-for-speed forms. Languages make tradeoffs of memory use vs. speed, so, while one might be slower than another, it might also be more memory efficient.

Plus, when coding in a production environment, the time it takes to produce code that works correctly has to be factored into the "which is faster" equation. C is extremely fast, but it takes longer to write programs in C than in Ruby for most problems, because C doesn't hold your hand like Ruby does. Which is faster when the C code takes a week to write and debug, vs. Ruby code that took an hour? Just stuff to think about.

I didn't read through @tadman's answer and the comments until I had finished. Using:

map(&:to_i)

used to be slower than:

map{ |s| s.to_i }

The speed difference depends on the version of Ruby you're running. Using &: was first implemented in some monkey-patches, but now it's built into Ruby. When they made that change it sped up a lot:

require 'benchmark'

# 1,000,000 numeric strings to convert
foo = [*('1'..'1000')] * 1000
puts foo.size

n = 10
puts "n=#{n}"

puts RUBY_VERSION
puts

Benchmark.bm(6) do |x|
  x.report('&:to_i') { n.times { foo.map(&:to_i) }}
  x.report('to_i') { n.times { foo.map{ |s| s.to_i } }}
end

which outputs:

1000000
n=10
2.0.0

             user     system      total        real
&:to_i   1.240000   0.000000   1.240000 (  1.250948)
to_i     1.400000   0.000000   1.400000 (  1.410763)

That's going through 10,000,000 elements, which only resulted in a difference of about 0.2 seconds. It's not much of a difference between the two ways of doing the same thing. If you're going to be processing a lot more data, then it matters. For most applications it's a moot point because other things will be the bottlenecks and slow-downs, so write the code whichever way works for you, with that speed difference in mind.
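In case the `&:` shorthand is unfamiliar: `&:to_i` asks Ruby to turn the symbol into a block by way of `Symbol#to_proc`, so the two forms compute the same thing and differ only in how the block is built. A small sketch with made-up data:

```ruby
strings = %w[1 20 300]

# Explicit block.
explicit = strings.map { |s| s.to_i }

# Shorthand: & converts the symbol to a proc via Symbol#to_proc.
shorthand = strings.map(&:to_i)

# Roughly what the shorthand expands to.
to_i_proc = :to_i.to_proc
expanded  = strings.map { |s| to_i_proc.call(s) }

p explicit
p shorthand
p expanded
```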

To show the difference the Ruby version makes, here are the same benchmark results using Ruby 1.8.7:

1000000
n=10
1.8.7

            user     system      total        real
&:to_i  4.940000   0.000000   4.940000 (  4.945604)
to_i    2.390000   0.000000   2.390000 (  2.396693)

As far as gsub vs. tr goes:

require 'benchmark'

# A 1,000,000-character string made only of parentheses
foo = '()' * 500000
puts foo.size

n = 10
puts "n=#{n}"

puts RUBY_VERSION
puts

Benchmark.bm(6) do |x|
  x.report('tr') { n.times { foo.tr('()', '') }}
  x.report('gsub') { n.times { foo.gsub(/[()]/, '') }}
end

with these results:

1000000
n=10
1.8.7

            user     system      total        real
tr      0.010000   0.000000   0.010000 (  0.011652)
gsub    3.010000   0.000000   3.010000 (  3.014059)

and:

1000000
n=10
2.0.0

             user     system      total        real
tr       0.020000   0.000000   0.020000 (  0.017230)
gsub     1.900000   0.000000   1.900000 (  1.904083)

Here's the sort of difference you can see from changing a regex pattern, which in turn forces changes in the processing needed to get the desired result:

require 'benchmark'

line = '((1,2,3),(1,2,3))'

pattern1 = /\([\d,]+\)/
pattern2 = /\(([\d,]+)\)/
pattern3 = /\((?:\d+,?)+\)/
pattern4 = /\d(?:[\d,])+/

# What each pattern captures
line.scan(pattern1) # => ["(1,2,3)", "(1,2,3)"]
line.scan(pattern2) # => [["1,2,3"], ["1,2,3"]]
line.scan(pattern3) # => ["(1,2,3)", "(1,2,3)"]
line.scan(pattern4) # => ["1,2,3", "1,2,3"]

# The post-processing each pattern needs to reach the same result
line.scan(pattern1).map{ |s| s[1..-1].split(',').map(&:to_i) } # => [[1, 2, 3], [1, 2, 3]]
line.scan(pattern2).map{ |s| s[0].split(',').map(&:to_i) }     # => [[1, 2, 3], [1, 2, 3]]
line.scan(pattern3).map{ |s| s[1..-1].split(',').map(&:to_i) } # => [[1, 2, 3], [1, 2, 3]]
line.scan(pattern4).map{ |s| s.split(',').map(&:to_i) }        # => [[1, 2, 3], [1, 2, 3]]

n = 1000000
Benchmark.bm(8) do |x|
  x.report('pattern1') { n.times { line.scan(pattern1).map{ |s| s[1..-1].split(',').map(&:to_i) } }}
  x.report('pattern2') { n.times { line.scan(pattern2).map{ |s| s[0].split(',').map(&:to_i) }     }}
  x.report('pattern3') { n.times { line.scan(pattern3).map{ |s| s[1..-1].split(',').map(&:to_i) } }}
  x.report('pattern4') { n.times { line.scan(pattern4).map{ |s| s.split(',').map(&:to_i) }        }}
end

On Ruby 2.0-p427:

               user     system      total        real
pattern1   5.610000   0.010000   5.620000 (  5.606556)
pattern2   5.460000   0.000000   5.460000 (  5.467228)
pattern3   5.730000   0.000000   5.730000 (  5.731310)
pattern4   5.080000   0.010000   5.090000 (  5.085965)
