ruby - How to sort and remove duplicates in an array?
I have to compare 2 CSV files populated by an e-commerce site. The files are similar, except the newer ones have a different number of items, because the catalogue changes every week.
Example of a CSV file:

sku_code, description, price, url
001, product one, 100, www.something.com/1
002, product two, 150, www.something.com/2
By comparing 2 files extracted on different days, I want to produce a list of products that have been discontinued and a list of products that have been added.

My index should be sku_code, which is unique inside the catalogue.
I've been using this code from Stack Overflow:

#old file
f1 = IO.readlines("oldfeed.csv").map(&:chomp)
#new file
f2 = IO.readlines("newfeed.csv").map(&:chomp)
#find new products
File.open("new_products.txt","w") { |f| f.write((f2-f1).join("\n")) }
#find old products
File.open("deleted_products.txt","w") { |f| f.write((f1-f2).join("\n")) }
My issue: this works well, except in one case. When one of the fields after sku_code changes, the product is considered "new" (e.g. a change of price) even though it is the same product.

What is the smartest way to compare by sku_code instead of by the whole row?
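To make the failure mode concrete, here is a minimal sketch (with made-up rows, not the real feed) showing that plain Array#- treats a price change as both a "new" and a "deleted" product:

```ruby
# Two snapshots of the same catalogue; only the price of product 002 differs.
old_rows = ["001, product one, 100, www.something.com/1",
            "002, product two, 150, www.something.com/2"]
new_rows = ["001, product one, 100, www.something.com/1",
            "002, product two, 175, www.something.com/2"]  # price changed

added   = new_rows - old_rows  # rows present only in the new feed
deleted = old_rows - new_rows  # rows present only in the old feed

# Product 002 appears in both lists even though it was never added or removed.
puts added.inspect    # ["002, product two, 175, www.something.com/2"]
puts deleted.inspect  # ["002, product two, 150, www.something.com/2"]
```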
There is no need to use the CSV library, because you are not interested in the actual values (except sku_code). I'd put each line into a hash with sku_code as the key, compare the sku_codes, and then retrieve the values from the hashes.
#old file
f1 = IO.readlines("oldfeed.csv").map(&:chomp)
f1_hash = f1[1..-1].inject(Hash.new) {|hash,line| hash[line[/^\d+/]] = line; hash}
#new file
f2 = IO.readlines("newfeed.csv").map(&:chomp)
f2_hash = f2[1..-1].inject(Hash.new) {|hash,line| hash[line[/^\d+/]] = line; hash}
#find new products
new_product_keys = f2_hash.keys - f1_hash.keys
new_products = new_product_keys.map {|sku_code| f2_hash[sku_code] }
#find old products
old_product_keys = f1_hash.keys - f2_hash.keys
old_products = old_product_keys.map {|sku_code| f1_hash[sku_code] }
#write new products file
File.open("new_products.txt","w") do |f|
  f.write "#{f2.first}\n"
  f.write new_products.join("\n")
end
#write old products file
File.open("deleted_products.txt","w") do |f|
  f.write "#{f1.first}\n"
  f.write old_products.join("\n")
end
The first line of each CSV file contains the column names. I skipped the first line of each CSV file (f1[1..-1]) and added it back later when writing the new file (f.write "#{f1.first}\n").
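For example, with a tiny in-memory feed (illustrative data, not the real file), f1.first is exactly the header line and f1[1..-1] is every data row after it:

```ruby
f1 = ["sku_code, description, price, url",
      "001, product one, 100, www.something.com/1",
      "002, product two, 150, www.something.com/2"]

header = f1.first   # the column-names line
data   = f1[1..-1]  # all rows except the header

puts header        # sku_code, description, price, url
puts data.length   # 2
```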
I tested this with 2 imaginary CSV files.
Edit: I had accidentally computed old_products using new_product_keys, a typo. Thanks to those who tried to edit the answer (which was unfortunately rejected).
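As a side note, the inject(Hash.new) idiom in the answer can also be written with each_with_object, which saves the trailing "; hash". A sketch under the same assumptions (header line first, sku_code as leading digits), using in-memory lines instead of files:

```ruby
lines = ["sku_code, description, price, url",
         "001, product one, 100, www.something.com/1",
         "002, product two, 150, www.something.com/2"]

# Key each data line (header skipped) by its leading digits, i.e. the sku_code.
by_sku = lines[1..-1].each_with_object({}) do |line, hash|
  hash[line[/^\d+/]] = line
end

puts by_sku.keys.inspect  # ["001", "002"]
```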