none
2 billion rows RRS feed

  • Question

  • Hi experts,

    I have a design/performance question…

    I have this next table…

    CREATE TABLE [dbo].[DW_Visits_2016](

           [VisitId] [int] NOT NULL,

           [UserId] [int] NOT NULL,

           [VisitReferrer] [varchar](512) NULL,

           [VisitFirstRequest] [varchar](255) NOT NULL,

           [VisitAppName] [varchar](255) NULL,

           [VisitCountry] [varchar](50) NULL,

           [VisitDate] [smalldatetime] NOT NULL,

           [VisitMins] [int] NOT NULL,

           [VisitHits] [int] NOT NULL,

           [EntryTag] [varchar](100) NOT NULL,

           [VisitCount] [int] NOT NULL,

           [VisitInitialDate] [datetime] NOT NULL,

           [AggregateType] [varchar](50) NULL,

           [MemberId] [int] NULL,

           [ServerName] [varchar](50) NULL,

           [BrowserUserAgent] [varchar](255) NULL,

           [LastModifiedDate] [smalldatetime] NULL,

           [Guid] [uniqueidentifier] NULL,

           [SessionId] [varchar](100) NULL,

           [IPAddress] [varchar](40) NULL,

     CONSTRAINT [PK_Visits] PRIMARY KEY NONCLUSTERED

    (

           [VisitId] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    )

     

    GO

     

    SET ANSI_PADDING OFF

    GO

     

    ALTER TABLE [dbo].[Visits]  WITH CHECK ADD  CONSTRAINT [CK_Visits_VisitDate] CHECK  (([VisitDate]>='2016-01-01' AND [VisitDate]<'2017-01-01'))

    GO

     

    ALTER TABLE [dbo].[Visits] CHECK CONSTRAINT [CK_Visits_VisitDate]

     

    And this same table for 2015,  2014,  2013,  2012,  2011 and 2010…

    Every table has around 150 million rows… So, combined we are talking about 1,050 million rows…

    I received a requirement… where BI people wants to have this combined on a single view… (Something crazy like select * from all_visits).

    Luckily they gave me some ‘where’ clauses, and some columns they don’t need, so the final result would be 6 columns and 20% of the rows (210 million ROWS). (But nonetheless, a ‘view’ is just a stored query).

    (Even though the box has 60GB of ram, it’s shared with many other databases…)

    Options I see…

    aa-a-      Instead of a view… Creating the views as tables and move them to a dedicated box.

    bb-b-      Create one view per year?

    cc-c-       Switch all of this to mongodb or something like vertica?!

    dd-d-      Any of the previous options combined with column stored indexes?

    It is the first time I come across a challenge like this, I would appreciate some insight/solution with someone with experience on such scenarios.

    Thanks in advance!!

     



    • Edited by maca128 Thursday, February 11, 2016 4:47 PM
    Thursday, February 11, 2016 4:46 PM

Answers

  • If they all have the same structure, why is it split between tables with year names? 

    There is no problem with a table having 1.1 billion rows. I have several with 50 billion rows.  There are many ways to approach any performance problem with indexing or partitioning, instead of creating individual tables.

    Thursday, February 11, 2016 5:23 PM

All replies

  • If the 2010 to 2015 Tables are static then I would look into the idea of creating another database and replicating the required data into this new Database; preferably on a database instance that is not as active.  If you do it this way you are applying most of the load to this new databases and only querying data from the production database for 2016.

    Please click "Mark As Answer" if my post helped. Tony C.

    Thursday, February 11, 2016 4:57 PM
  • If they all have the same structure, why is it split between tables with year names? 

    There is no problem with a table having 1.1 billion rows. I have several with 50 billion rows.  There are many ways to approach any performance problem with indexing or partitioning, instead of creating individual tables.

    Thursday, February 11, 2016 5:23 PM