testing - Optimizing Ruby code for parsing a string to a numeric value
I am working on some speed tests for Ruby, and I need to parse text files into numeric values. Because it runs so slowly, I was wondering whether my code could be optimized, or whether Ruby really is that slow. The data is read from files; these files contain around 1,000,000 randomly generated lines or numbers. I will only display a few lines, so that you know what is being read. The filenames I need to read are passed as arguments, and the code is in separate scripts (just for my own clarity).
First I want to parse a simple number. The input comes in this format:
type number type number ...
This is how I did it:
    incr = 1
    File.open(ARGV[0], "r").each_line do |line|
      incr += 1
      if incr % 3 == 0
        line.to_i
      end
    end
Second, I need to parse a single list. The input comes in this format:
type (1,2,3,...) type (1,2,3,...) ...
This is how I did it:
    incr = 1
    File.open(ARGV[0], "r").each_line do |line|
      incr += 1
      if incr % 3 == 0
        line.gsub("(","").gsub(")","").split(",").map{ |s| s.to_i}
      end
    end
Finally, I need to parse a list of lists. The input comes in this format:
type ((1,2,3,...),(1,2,3,...),(...)) type ((1,2,3,...),(1,2,3,...),(...)) ...
This is how I did it:
    incr = 1
    File.open(ARGV[0], "r").each_line do |line|
      incr += 1
      if incr % 3 == 0
        line.split("),(").map{ |s| s.gsub("(","").gsub(")","").split(",").map{ |s| s.to_i}}
      end
    end
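I do not need to display the results; I am only speed testing, so there is no need for output. I did check the outcome and the scripts themselves seem to work correctly. They are just surprisingly slow, and I would like to speed test with the optimum of what Ruby has to offer. I know there are several speed tests out there that I could use, but for my purpose I need to build my own.
What can I do better? How can this code be optimized? Where did I go wrong, or is this already the best that Ruby can do? Thank you in advance for your tips and ideas.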
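In the first one, instead of:
    File.open(ARGV[0], "r").each_line do |line|
use:
    File.foreach(ARGV[0]) do |line|
And instead of:
    incr += 1
    if incr % 3 == 0
use:
    if $. % 3 == 0
$. is a magic variable holding the line number of the last line read. (Because the original counter starts at 1 and is incremented before the test, it matches lines 2, 5, 8, ...; use $. % 3 == 2 if you need exactly the same lines.)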
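A minimal sketch of the first script with both changes applied. The method name parse_numbers, its remainder parameter, and the temp-file demo are illustrative assumptions, not part of the original script. The three-line record layout (type, number, blank line) is also assumed, and the default remainder of 2 mirrors the original counter, which matches lines 2, 5, 8, and so on.

```ruby
require 'tempfile'

# Hypothetical helper: returns the integers found on every third line.
# `remainder` selects which line of each 3-line group holds the number
# (2 mirrors the original counter, which started at 1 and was
# incremented before the test).
def parse_numbers(path, remainder = 2)
  numbers = []
  File.foreach(path) do |line|
    # $. holds the 1-based number of the line most recently read
    numbers << line.to_i if $. % 3 == remainder
  end
  numbers
end

# Demo with an assumed record layout: type, number, blank line.
Tempfile.create('numbers') do |f|
  f.write("type\n42\n\ntype\n7\n\n")
  f.flush
  p parse_numbers(f.path)  # prints [42, 7]
end
```

File.foreach opens, reads, and closes the file itself, so there is no file handle left open as there is with File.open(...).each_line without a block.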
In the second one, instead of:
    line.gsub("(","").gsub(")","").split(",").map{ |s| s.to_i}
use:
    line.tr('()', '').split(',').map(&:to_i)
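As a quick check that the replacement gives the same result, the sketch below runs both versions on one sample line. The sample string is an assumption based on the format shown in the question. String#delete is included as a further alternative; it was not part of the original answer.

```ruby
line = "(1,2,3)\n"  # assumed sample line in the question's list format

with_gsub   = line.gsub("(", "").gsub(")", "").split(",").map { |s| s.to_i }
with_tr     = line.tr('()', '').split(',').map(&:to_i)
# String#delete removes every character in the given set; an alternative,
# not something the original answer used.
with_delete = line.delete('()').split(',').map(&:to_i)

p with_gsub    # prints [1, 2, 3]
p with_tr      # prints [1, 2, 3]
p with_delete  # prints [1, 2, 3]
```

The trailing newline is harmless here because to_i ignores anything after the leading digits.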
In the third one, instead of:
    line.split("),(").map{ |s| s.gsub("(","").gsub(")","").split(",").map{ |s| s.to_i}}
use:
    line.scan(/(?:\d+,?)+/).map{ |s| s.split(',', 0).map(&:to_i) }
Here's how that line works:
    line.scan(/(?:\d+,?)+/)
    => ["1,2,3,", "1,2,3,"]
    line.scan(/(?:\d+,?)+/).map{ |s| s.split(',',0) }
    => [["1", "2", "3"], ["1", "2", "3"]]
    line.scan(/(?:\d+,?)+/).map{ |s| s.split(',', 0).map(&:to_i) }
    => [[1, 2, 3], [1, 2, 3]]
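The scan-based approach can be wrapped in a small method to try it on sample input. This is only a sketch: the method name parse_nested and both sample lines are assumptions, since the exact input is not shown. It behaves the same with or without a trailing comma in each inner list, because split drops trailing empty fields.

```ruby
# Hypothetical helper wrapping the scan-based approach for one input line.
def parse_nested(line)
  # Each match is one run of digits and commas, i.e. one inner list.
  line.scan(/(?:\d+,?)+/).map { |s| s.split(',').map(&:to_i) }
end

p parse_nested("((1,2,3),(4,5,6))\n")    # prints [[1, 2, 3], [4, 5, 6]]
p parse_nested("((1,2,3,),(4,5,6,))\n")  # prints [[1, 2, 3], [4, 5, 6]]
```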
I didn't run benchmarks to compare the speed, but the changes should be faster because the gsub calls are gone. The changes I made weren't necessarily the fastest ways to do things; they're more-optimized versions of your own code.
Trying to compare the speed of Ruby to other languages requires knowledge of the fastest ways of accomplishing each step, based on multiple benchmarks of that step. It also implies you're running on identical hardware and OS, and that your languages are all compiled to their most efficient-for-speed forms. Languages make tradeoffs of memory use vs. speed, so, while one might be slower than another, it might also be more memory efficient.
Plus, when coding in a production environment, the time to produce code that works correctly has to be factored into the "which is faster" equation. C is extremely fast, but for most problems it takes longer to write programs in C than in Ruby, because C doesn't hold your hand like Ruby does. Which is faster when the C code takes a week to write and debug, vs. Ruby code that took an hour? Just stuff to think about.
I didn't read through @tadman's answer and the comments until I had finished. Using:
    map(&:to_i)
used to be slower than:
    map{ |s| s.to_i }
The speed difference depends on the version of Ruby you're running. Using &: was originally implemented in some monkey-patches, but now it's built into Ruby. When that change was made it sped up a lot:
    require 'benchmark'

    foo = [*('1'..'1000')] * 1000
    puts foo.size

    n = 10
    puts "n=#{n}"

    puts RUBY_VERSION
    puts

    Benchmark.bm(6) do |x|
      x.report('&:to_i') { n.times { foo.map(&:to_i) }}
      x.report('to_i')   { n.times { foo.map{ |s| s.to_i } }}
    end
Which outputs:
    1000000
    n=10
    2.0.0

                 user     system      total        real
    &:to_i   1.240000   0.000000   1.240000 (  1.250948)
    to_i     1.400000   0.000000   1.400000 (  1.410763)
That's going through 10,000,000 elements, which resulted in a difference of less than 0.2 seconds. It's not much of a difference between the two ways of doing the same thing. If you're going to be processing a lot more data, then it matters. For most applications it's a moot point because other things will be the bottlenecks/slow-downs, so write the code whichever way works for you, with that speed difference in mind.
To show the difference the Ruby version makes, here are the same benchmark results using Ruby 1.8.7:
    1000000
    n=10
    1.8.7

                 user     system      total        real
    &:to_i   4.940000   0.000000   4.940000 (  4.945604)
    to_i     2.390000   0.000000   2.390000 (  2.396693)
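For readers unfamiliar with the shorthand, the two forms benchmarked above produce the same result: the & operator calls Symbol#to_proc on :to_i and passes the resulting proc as the block. A small sketch, with a made-up sample array:

```ruby
strings = %w[1 20 300]  # made-up sample data

short  = strings.map(&:to_i)          # Symbol#to_proc shorthand
long   = strings.map { |s| s.to_i }   # explicit block
manual = strings.map(&:to_i.to_proc)  # the conversion spelled out

p short                             # prints [1, 20, 300]
p short == long && long == manual   # prints true
```

Only the speed differs between the forms, and as the numbers above show, which one is faster depends on the Ruby version.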
As far as gsub vs. tr:
    require 'benchmark'

    foo = '()' * 500000
    puts foo.size

    n = 10
    puts "n=#{n}"

    puts RUBY_VERSION
    puts

    Benchmark.bm(6) do |x|
      x.report('tr')   { n.times { foo.tr('()', '') }}
      x.report('gsub') { n.times { foo.gsub(/[()]/, '') }}
    end
With these results:
    1000000
    n=10
    1.8.7

                 user     system      total        real
    tr       0.010000   0.000000   0.010000 (  0.011652)
    gsub     3.010000   0.000000   3.010000 (  3.014059)
And:
    1000000
    n=10
    2.0.0

                 user     system      total        real
    tr       0.020000   0.000000   0.020000 (  0.017230)
    gsub     1.900000   0.000000   1.900000 (  1.904083)
Here's the sort of difference you can see from changing the regex pattern, which in turn changes the processing needed to get the desired result:
    require 'benchmark'

    line = '((1,2,3),(1,2,3))'

    pattern1 = /\([\d,]+\)/
    pattern2 = /\(([\d,]+)\)/
    pattern3 = /\((?:\d+,?)+\)/
    pattern4 = /\d(?:[\d,])+/

    line.scan(pattern1) # => ["(1,2,3)", "(1,2,3)"]
    line.scan(pattern2) # => [["1,2,3"], ["1,2,3"]]
    line.scan(pattern3) # => ["(1,2,3)", "(1,2,3)"]
    line.scan(pattern4) # => ["1,2,3", "1,2,3"]

    line.scan(pattern1).map{ |s| s[1..-1].split(',').map(&:to_i) } # => [[1, 2, 3], [1, 2, 3]]
    line.scan(pattern2).map{ |s| s[0].split(',').map(&:to_i) }     # => [[1, 2, 3], [1, 2, 3]]
    line.scan(pattern3).map{ |s| s[1..-1].split(',').map(&:to_i) } # => [[1, 2, 3], [1, 2, 3]]
    line.scan(pattern4).map{ |s| s.split(',').map(&:to_i) }        # => [[1, 2, 3], [1, 2, 3]]

    n = 1000000
    Benchmark.bm(8) do |x|
      x.report('pattern1') { n.times { line.scan(pattern1).map{ |s| s[1..-1].split(',').map(&:to_i) } }}
      x.report('pattern2') { n.times { line.scan(pattern2).map{ |s| s[0].split(',').map(&:to_i) } }}
      x.report('pattern3') { n.times { line.scan(pattern3).map{ |s| s[1..-1].split(',').map(&:to_i) } }}
      x.report('pattern4') { n.times { line.scan(pattern4).map{ |s| s.split(',').map(&:to_i) } }}
    end
On Ruby 2.0-p427:
                   user     system      total        real
    pattern1   5.610000   0.010000   5.620000 (  5.606556)
    pattern2   5.460000   0.000000   5.460000 (  5.467228)
    pattern3   5.730000   0.000000   5.730000 (  5.731310)
    pattern4   5.080000   0.010000   5.090000 (  5.085965)