Detection and elimination of duplicate data using token-based method for a data warehouse: a clustering based approach