Optimizing MySQL Distinct Queries with Groups

Here’s some fun technical stuff. If you don’t program SQL, this will make no sense to you. Just move along. I wanted to put this out for the Google to find so it benefits other programmers.

I have some medium sized tables that I’m doing duplicate removal on and I tried a variety of techniques to see what works the fastest. The differences are amazing.

The server is a dual Pentium 1.8, 256 Meg SCRAM single Seagate SCSI 7200rpm drive running Linux and MySQL 4.x using MyISAM. I have a table with 2.8 million phone numbers and tried SELECT DISTINCT and GROUP BY with and without indexes. The phone number field is a BIGINT with NO NULL.

  • SELECT DISTINCT with no index took 9 minutes 3 seconds.

  • Using GROUP BY took 13 minutes flat with no index.

  • The index took 18 seconds to create.

  • SELECT DISTINCT with the index took 8 minutes 25 seconds.

  • Using GROUP BY took 25 seconds flat (zero minutes).

WOW! What a difference the phrasing can make. I would almost consider it a bug that there is such a difference.

Keep in mind that creating this table without an index and doing the inserts unindexed was faster. Making the index once is much less work for the database engine than 2.8 million index updates.

Comments are closed.


Copyright © 2012 -1354585406