Delete duplicate rows in a large PostgreSQL database table


I have a PostgreSQL database that is 100 GB in size. One of the tables has a little over half a billion entries. For quick data entry, some of the data was repeated and left to be pruned later. One of the columns can be used to identify the rows as unique.

I found this Stack Overflow question, which suggested a solution for MySQL:

alter ignore table table_name add unique (location_id, datetime) 

Is there anything similar for PostgreSQL?

I have tried deleting duplicates both with a GROUP BY and with row_number(), but my computer runs out of memory after a few hours in both cases.
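For reference, a query along these lines is what I mean by the row_number() approach (the column names are only illustrative, since the real schema isn't shown here):

delete from orders
where ctid in (
    select ctid
    from (
        -- number the rows within each duplicate group
        select ctid,
               row_number() over (partition by location_id, datetime
                                  order by ctid) as rn
        from orders
    ) dups
    where rn > 1   -- keep one row per group, delete the rest
);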

This is what I get when I try to estimate the number of rows in the table:

select reltuples from pg_class where relname = 'orders';

  reltuples
-------------
 4.38543e+08
(1 row)

Two solutions come to mind:

1) Create a new table with a SELECT * from the source table, using a clause that determines the unique rows. Add indexes to match the source table, then rename both tables in a transaction (a minimal sketch of this approach follows the example below). Whether or not this will work depends on several factors, including the amount of free disk space, whether the table is in constant use and interruptions to access are permissible, etc. Creating a new table has the benefit of tightly packing the data and indexes, and the table will be smaller than the original because the non-unique rows are omitted.

2) Create a partial unique index on the columns and add a WHERE clause to filter out the non-uniques. For example:

test=# create table t (col1 int, col2 int, is_unique boolean);
CREATE TABLE

test=# insert into t values (1,2,true), (2,3,true), (2,3,false);
INSERT 0 3

test=# create unique index concurrently t_col1_col2_uidx on t (col1, col2) where is_unique = true;
CREATE INDEX

test=# \d t
        Table "public.t"
  Column   |  Type   | Modifiers
-----------+---------+-----------
 col1      | integer |
 col2      | integer |
 is_unique | boolean |
Indexes:
    "t_col1_col2_uidx" UNIQUE, btree (col1, col2) WHERE is_unique = true
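For option 1, a minimal sketch might look like the following, assuming the duplicates are identified by (location_id, datetime) on an orders table; the table, column, and index names are illustrative, not taken from the actual schema:

begin;

-- keep one row per (location_id, datetime) group
create table orders_dedup as
select distinct on (location_id, datetime) *
from orders
order by location_id, datetime;

-- recreate the indexes the source table had, e.g. the uniqueness index
create unique index orders_dedup_loc_dt_uidx
    on orders_dedup (location_id, datetime);

-- swap the tables
alter table orders rename to orders_old;
alter table orders_dedup rename to orders;

commit;

-- once satisfied, drop the old table to reclaim disk space:
-- drop table orders_old;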
