none
indexing silly noob question

    Question

  • I am a little confused on indexing.

    I have 10 million rows.  each row has the name of a unique individual and their associated city (there are 1500 cities total):

    personName, cityName

    Many times a day I will do a  lookup per city. How do I set up my table (?) so that the results are returned quickly?

    thanks countryStyle

    Wednesday, July 17, 2013 10:12 PM

Answers

  • >It is a simple question on indexes with a simple set of data and a simple query

    Indexes are one of the big challenges in SS performance tuning. Without detailed info on business critical queries and table/index DDL, nobody can give you any good advice.

    As suggested above you can try putting an index on city and see what happens. I would not get my hopes up high since it does not have a good distribution.  Clustered index on city is not a good idea.

    Can you run the following query and post results?

    WITH CTE AS (SELECT City, Frequency=count(*) 
    FROM Person.Address
    GROUP BY City)
    SELECT TOP (10) * FROM CTE
    ORDER BY Frequency DESC;
    GO
    /*
    City	Frequency
    London	434
    Paris	398
    Burien	215
    Concord	214
    Bellingham	213
    Beaverton	213
    ...
    */
    
    
    


    Kalman Toth Database & OLAP Architect sqlusa.com
    New Book / Kindle: SQL Server 2012 Administration


    Thursday, July 18, 2013 11:24 PM
  • Hi CountryStyle,

    With the current table structure, it is OK to keep the index on “city” column and use it to joint tables. If we have permission to modify the table structure, I suggest moving the city column to another table and give every table an unique ID, in the current tables, we can just references to the city ID as a foreign key, create an index on this city ID and use it to join tables.

    If you have any feedback on our support, please click here.

    Allen Li
    TechNet Community Support


    Friday, July 19, 2013 7:22 AM

All replies

  • Creating an index on cityName would be a good place to start.  After the table is created..
    CREATE INDEX IX_cityName ON MyTable(cityName)
    Then check your execution plan to make sure it is actually used!
    Wednesday, July 17, 2013 10:35 PM
  • >I have 10 million rows.  each row has the name of a unique individual and their associated city (there are 1500 cities total):

    That is not a good distribution by city.  You may have 50K rows for Chicago!

    Can you post the query?

    Optimization:

    http://www.sqlusa.com/articles/query-optimization/


    Kalman Toth Database & OLAP Architect sqlusa.com
    New Book / Kindle: SQL Server 2012 Administration



    Thursday, July 18, 2013 12:38 AM
  • how about:select * from myTable where city = "chicago"
    Thursday, July 18, 2013 2:49 AM
  • how about:select * from myTable where city = "chicago"

    * should be avoided and specify the columns needed from the table.

    When you use *, even the presence of a Non-Clustered index on city , will go for Lookup (RID or Key based on presence of Clustered index).

    Thursday, July 18, 2013 3:19 AM
  • fine.  I don't see that makes a difference when doing a straight up static query:

    so how about:select city, person from myTable where city = "chicago"

    Thursday, July 18, 2013 3:35 AM
  • Assume you get back 50K rows for Chicago. What do you plan to do with it?

    Kalman Toth Database & OLAP Architect sqlusa.com
    New Book / Kindle: SQL Server 2012 Administration


    Thursday, July 18, 2013 8:32 AM
  • left join it to another smaller table where each person will have at least one match in the other table.  The other table will have 20000 rows.
    Thursday, July 18, 2013 11:03 AM
  • Can you post the table/index DDL and the code? 

    It is hard to give good answers for the fragments you are feeding us.

    Thanks.


    Kalman Toth Database & OLAP Architect sqlusa.com
    New Book / Kindle: SQL Server 2012 Administration


    Thursday, July 18, 2013 11:11 AM
  • you mean something like

    create table myTable

    (
    name varchar(50)
    ,city varchar(50)

    )

    ???

    select l.*, r.numberOfStreetsInCity
    from mytable l
    left join myOtherTable r
    on l.city = r.city

    • Edited by CountryStyle Thursday, July 18, 2013 11:26 AM update
    Thursday, July 18, 2013 11:24 AM
  • Not really.  I mean you post the real thing.  The real tables, the real indexes & the real query.

    Kalman Toth Database & OLAP Architect sqlusa.com
    New Book / Kindle: SQL Server 2012 Administration


    Thursday, July 18, 2013 11:37 AM
  • Kalman, I think you are just jerking me around.

    It is a simple question on indexes with a simple set of data and a simple query.  The information posted so far should be good enough for a general answer.

    i just need some general information.  I have not applied and index but if SQL does maybe it is good enough.


    • Edited by CountryStyle Thursday, July 18, 2013 12:01 PM update
    Thursday, July 18, 2013 11:58 AM
  • I am beginning to think that

    CREATE INDEX IX_cityName ON MyTable(cityName)

    is the right answer to try once I get back in a test environment eventhough it has been inferred by 2 others that it is not.
    That or  ask elsewhere or do take the long route of learning the details myself.

    Thursday, July 18, 2013 12:00 PM
  • Hi,

    For best performance, I would suggest you to create clustered index on City. The whole set of rows will be sorted based on City and you will be able to retrieve the persons belonging to City in a quicker way.

    If you have the table with more columns, than you have specified here and have clustered index in other frequently used column, I would suggest you to create Non-clustered index on City with Person as included column for optimal performance of retrieving persons belonging to a City.


    Venkataraman R Be Blessed by Divine

    Thursday, July 18, 2013 12:09 PM
  • >It is a simple question on indexes with a simple set of data and a simple query

    Indexes are one of the big challenges in SS performance tuning. Without detailed info on business critical queries and table/index DDL, nobody can give you any good advice.

    As suggested above you can try putting an index on city and see what happens. I would not get my hopes up high since it does not have a good distribution.  Clustered index on city is not a good idea.

    Can you run the following query and post results?

    WITH CTE AS (SELECT City, Frequency=count(*) 
    FROM Person.Address
    GROUP BY City)
    SELECT TOP (10) * FROM CTE
    ORDER BY Frequency DESC;
    GO
    /*
    City	Frequency
    London	434
    Paris	398
    Burien	215
    Concord	214
    Bellingham	213
    Beaverton	213
    ...
    */
    
    
    


    Kalman Toth Database & OLAP Architect sqlusa.com
    New Book / Kindle: SQL Server 2012 Administration


    Thursday, July 18, 2013 11:24 PM
  • Depending on how many changes you can make to the actual table I would create a table with the Unique city names and Int ID's and the just a simple Foreign key relationship. Once you're working with ints the indexes are a lot smaller and faster to search, sort etc. Also at millions of rows it significantly reduces the size of the table.
    Friday, July 19, 2013 12:56 AM
  • Hi CountryStyle,

    With the current table structure, it is OK to keep the index on “city” column and use it to joint tables. If we have permission to modify the table structure, I suggest moving the city column to another table and give every table an unique ID, in the current tables, we can just references to the city ID as a foreign key, create an index on this city ID and use it to join tables.

    If you have any feedback on our support, please click here.

    Allen Li
    TechNet Community Support


    Friday, July 19, 2013 7:22 AM