Delete duplicate rows in a large PostgreSQL database table


I have a PostgreSQL database that is 100 GB in size. One of the tables has a little over half a billion entries. For quick data entry, some of the data was repeated and left to be pruned later. One of the columns can be used to identify the rows as unique.

I found this Stack Overflow question, which suggested a solution for MySQL:

alter ignore table table_name add unique (location_id, datetime) 

Is there anything similar for PostgreSQL?

I have tried deleting duplicates both with a GROUP BY and with row_number(), but my computer runs out of memory after a few hours in both cases.
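For reference, a query along these lines is what I mean by the row_number() approach (the column names are only illustrative, since the real schema isn't shown here):

delete from orders
where ctid in (
    select ctid
    from (
        -- number the rows within each duplicate group
        select ctid,
               row_number() over (partition by location_id, datetime
                                  order by ctid) as rn
        from orders
    ) dups
    where rn > 1   -- keep one row per group, delete the rest
);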

This is what I get when I try to estimate the number of rows in the table:

select reltuples from pg_class where relname = 'orders';

  reltuples
-------------
 4.38543e+08
(1 row)

Two solutions come to mind:

1) Create a new table with a SELECT * from the source table, using a clause that determines the unique rows. Add indexes to match the source table, then rename both tables in a transaction (a minimal sketch of this approach follows the example below). Whether or not this will work depends on several factors, including the amount of free disk space, whether the table is in constant use and interruptions to access are permissible, etc. Creating a new table has the benefit of tightly packing the data and indexes, and the table will be smaller than the original because the non-unique rows are omitted.

2) Create a partial unique index on the columns and add a WHERE clause to filter out the non-uniques. For example:

test=# create table t (col1 int, col2 int, is_unique boolean);
CREATE TABLE

test=# insert into t values (1,2,true), (2,3,true), (2,3,false);
INSERT 0 3

test=# create unique index concurrently t_col1_col2_uidx on t (col1, col2) where is_unique = true;
CREATE INDEX

test=# \d t
        Table "public.t"
  Column   |  Type   | Modifiers
-----------+---------+-----------
 col1      | integer |
 col2      | integer |
 is_unique | boolean |
Indexes:
    "t_col1_col2_uidx" UNIQUE, btree (col1, col2) WHERE is_unique = true
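For option 1, a minimal sketch might look like the following, assuming the duplicates are identified by (location_id, datetime) on an orders table; the table, column, and index names are illustrative, not taken from the actual schema:

begin;

-- keep one row per (location_id, datetime) group
create table orders_dedup as
select distinct on (location_id, datetime) *
from orders
order by location_id, datetime;

-- recreate the indexes the source table had, e.g. the uniqueness index
create unique index orders_dedup_loc_dt_uidx
    on orders_dedup (location_id, datetime);

-- swap the tables
alter table orders rename to orders_old;
alter table orders_dedup rename to orders;

commit;

-- once satisfied, drop the old table to reclaim disk space:
-- drop table orders_old;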
