ruby - How to sort and remove duplicates in an array?
I have to compare 2 CSV files populated by an e-commerce site. The files are similar, except the newer ones have a different number of items, because the catalogue changes every week.
Example of a CSV file:

sku_code, description, price, url
001, product one, 100, www.something.com/1
002, product two, 150, www.something.com/2
By comparing 2 files extracted on different days, I want to produce a list of products that have been discontinued and a list of products that have been added.

My index should be sku_code, which is unique inside the catalogue.
I've been using this code from Stack Overflow:

#old file
f1 = IO.readlines("oldfeed.csv").map(&:chomp)
#new file
f2 = IO.readlines("newfeed.csv").map(&:chomp)
#find new products
File.open("new_products.txt","w") { |f| f.write((f2-f1).join("\n")) }
#find old products
File.open("deleted_products.txt","w") { |f| f.write((f1-f2).join("\n")) }
My issue: this works well, except in one case. When one of the fields after sku_code changes, the product is considered "new" (e.g. a change of price) even though it is the same product.

What is the smartest way to compare by sku_code instead of by the whole row?
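To make the failure mode concrete, here is a minimal sketch (with made-up rows, not the real feed) showing that plain Array#- treats a price change as both a "new" and a "deleted" product:

```ruby
# Two snapshots of the same catalogue; only the price of product 002 differs.
old_rows = ["001, product one, 100, www.something.com/1",
            "002, product two, 150, www.something.com/2"]
new_rows = ["001, product one, 100, www.something.com/1",
            "002, product two, 175, www.something.com/2"]  # price changed

added   = new_rows - old_rows  # rows present only in the new feed
deleted = old_rows - new_rows  # rows present only in the old feed

# Product 002 appears in both lists even though it was never added or removed.
puts added.inspect    # ["002, product two, 175, www.something.com/2"]
puts deleted.inspect  # ["002, product two, 150, www.something.com/2"]
```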
There is no need to use the CSV library, because you are not interested in the actual values (except sku_code). I'd put each line into a hash with sku_code as the key, compare the sku_codes, and then retrieve the values from the hashes.
#old file
f1 = IO.readlines("oldfeed.csv").map(&:chomp)
f1_hash = f1[1..-1].inject(Hash.new) {|hash,line| hash[line[/^\d+/]] = line; hash}
#new file
f2 = IO.readlines("newfeed.csv").map(&:chomp)
f2_hash = f2[1..-1].inject(Hash.new) {|hash,line| hash[line[/^\d+/]] = line; hash}
#find new products
new_product_keys = f2_hash.keys - f1_hash.keys
new_products = new_product_keys.map {|sku_code| f2_hash[sku_code] }
#find old products
old_product_keys = f1_hash.keys - f2_hash.keys
old_products = old_product_keys.map {|sku_code| f1_hash[sku_code] }
#write new products file
File.open("new_products.txt","w") do |f|
  f.write "#{f2.first}\n"
  f.write new_products.join("\n")
end
#write old products file
File.open("deleted_products.txt","w") do |f|
  f.write "#{f1.first}\n"
  f.write old_products.join("\n")
end
The first line of each CSV file contains the column names. I skipped the first line of each CSV file (f1[1..-1]) and added it back later when writing the new file (f.write "#{f1.first}\n").
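For example, with a tiny in-memory feed (illustrative data, not the real file), f1.first is exactly the header line and f1[1..-1] is every data row after it:

```ruby
f1 = ["sku_code, description, price, url",
      "001, product one, 100, www.something.com/1",
      "002, product two, 150, www.something.com/2"]

header = f1.first   # the column-names line
data   = f1[1..-1]  # all rows except the header

puts header        # sku_code, description, price, url
puts data.length   # 2
```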
I tested this with 2 imaginary CSV files.
Edit: I had accidentally computed old_products using new_product_keys, a typo. Thanks to those who tried to edit the answer (which was unfortunately rejected).
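As a side note, the inject(Hash.new) idiom in the answer can also be written with each_with_object, which saves the trailing "; hash". A sketch under the same assumptions (header line first, sku_code as leading digits), using in-memory lines instead of files:

```ruby
lines = ["sku_code, description, price, url",
         "001, product one, 100, www.something.com/1",
         "002, product two, 150, www.something.com/2"]

# Key each data line (header skipped) by its leading digits, i.e. the sku_code.
by_sku = lines[1..-1].each_with_object({}) do |line, hash|
  hash[line[/^\d+/]] = line
end

puts by_sku.keys.inspect  # ["001", "002"]
```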