[Performance] Walking through billions of records?

    Question

  • Hi Guys,

    I have a single table which stores impression records from the website, as follows:

    CREATE TABLE [dbo].[ImpressDetail]
        (
          [DetailKey] [bigint] IDENTITY(1, 1) NOT NULL ,
          [Time] [smalldatetime] NOT NULL ,
          [DimTime] [smalldatetime] NOT NULL ,
          [DimDate] [smalldatetime] NOT NULL ,
          [InsertTime] [smalldatetime] NOT NULL ,
          [AdKey] [bigint] NOT NULL ,
          [IPAddress] [bigint] NOT NULL ,
          [IPString] [varchar](16) NOT NULL ,
          [Cookie] [nvarchar](64) NOT NULL ,
          [GeoCode] [nvarchar](64) NOT NULL ,
          CONSTRAINT [PK_ImpressDetail] PRIMARY KEY CLUSTERED ( [DimTime] ASC )
        )

    About 30 million records come in every day, arriving continuously every second. We need to hold at least one year of records in the active table, which means billions of records for a single website page. We have 15 pages in total, and 3 billion records already exist.

    I've already done the following:

    1. Separated each page's records into a different database

    2. Created indexes on [DimTime], [DetailKey], and [DimDate]

    3. Partitioned the table on [DimTime] (a sketch of this setup is shown below)
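
    A minimal sketch of the partitioning setup in step 3 (the monthly boundary values and the single filegroup are only illustrative, not our real layout):

    CREATE PARTITION FUNCTION [pfDimTime] (smalldatetime)
    AS RANGE RIGHT FOR VALUES ( '2013-01-01', '2013-02-01',
                                '2013-03-01', '2013-04-01' )

    CREATE PARTITION SCHEME [psDimTime]
    AS PARTITION [pfDimTime] ALL TO ( [PRIMARY] )

    -- the clustered index is then built ON [psDimTime]([DimTime])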

    Question is:

    1. Inserting 30 million rows is painfully slow while the indexes are present, and disabling the indexes and rebuilding them afterwards is also slow because the table already holds so many records.

    2. I need hourly, daily, weekly, and monthly reports on unique visits (distinct by [Cookie]), unique IP visits (distinct by [IPAddress]), and total counts under a given condition; this query is too heavy (an example is shown below).

    I am considering a sliding window, but then the historical data can no longer be referenced, and we still need unique counts over it.
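
    For reference, a typical daily flavor of the query in question 2 looks like the following (the date range and the [AdKey] value are just an example condition):

    SELECT  COUNT(DISTINCT [Cookie]) AS UniqueVisits ,
            COUNT(DISTINCT [IPAddress]) AS UniqueIPVisits ,
            COUNT(*) AS TotalImpressions
    FROM    [dbo].[ImpressDetail]
    WHERE   [DimTime] >= '2014-01-05'
            AND [DimTime] < '2014-01-06'
            AND [AdKey] = 42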

    Any help will be appreciated, guys. Any ideas?

    Thanks

    Alex


    Monday, January 06, 2014 10:17 AM

Answers

  • The size of that detail table is going to be way too large to handle very quickly.  You need to roll the data up into the lowest reporting segment.

    I would suggest a structure like:

    Detail data for 7 days (do you even need detail data? If not, then daily)

    Weekly distinct counts for 2-8 weeks

    Monthly distinct counts for 3-12 months

    The problem with this method is that to do a monthly report, you need to aggregate the detail + weekly + monthly data.
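
    As a minimal sketch, the weekly rollup could look like this (the table and column names are illustrative, and @WeekStart / @WeekEnd stand for whatever parameters you use):

    CREATE TABLE [dbo].[WeeklySummary]
        (
          [WeekStart] [smalldatetime] NOT NULL ,
          [AdKey] [bigint] NOT NULL ,
          [UniqueVisits] [bigint] NOT NULL ,
          [UniqueIPVisits] [bigint] NOT NULL ,
          [TotalImpressions] [bigint] NOT NULL ,
          CONSTRAINT [PK_WeeklySummary]
            PRIMARY KEY CLUSTERED ( [WeekStart] ASC, [AdKey] ASC )
        )

    -- run while the detail rows for the week are still on hand
    INSERT  INTO [dbo].[WeeklySummary]
            SELECT  @WeekStart ,
                    [AdKey] ,
                    COUNT(DISTINCT [Cookie]) ,
                    COUNT(DISTINCT [IPAddress]) ,
                    COUNT(*)
            FROM    [dbo].[ImpressDetail]
            WHERE   [DimTime] >= @WeekStart
                    AND [DimTime] < @WeekEnd
            GROUP BY [AdKey]

    Keep in mind that distinct counts cannot be summed across buckets (the same cookie can show up in several weeks), so each reporting grain's rollup has to be computed from the detail rows before they are purged.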

    • Marked as answer by Lei Chi Tuesday, January 14, 2014 7:45 AM
    Monday, January 06, 2014 1:17 PM

All replies

  • Thanks for your suggestion, I am considering it.

    Another problem: we will have half a billion records in a single day. Do I need another approach to deal with that, such as a different hardware environment or a distributed database?

    Wednesday, January 08, 2014 6:54 AM
  • Thank you again
    Tuesday, January 14, 2014 7:56 AM