none
Need help with view that returns last transaction dates RRS feed

  • Question

  • Hi,

    Please find all the necessary scripts at the bottom of this post to create the scenario I'm describing here.

    Scenario:

    I have a "Transactions" table where I store all transactions in a stock market management app. ALL transactions are in there with the pertinent info e.g. TransactionType, TransactionAmount, etc.

    The next two tables are super simple and self explanatory i.e. People and Companies.

    I then have a table called TransactionsForCompanies where I store all transactions created by corporate customers. In this table, I have only two columns: TransactionId -- coming from the Transactions table -- CompanyId -- coming from the Companies table.

    The next table called TransactionsForPeople is almost identical the last one but it's used for identifying transactions created by individuals.

    Please see the DB structure and I think it'll make sense fairly easily.

    Here's what I'm trying to do and where I need help.

    I want to create a view that gives me the "Last Transaction Date" for people. So if a person had multiple transactions in the last month, this view is supposed to give me the last transaction date for that individual who might have had the transaction only 2 hours ago. Of course, the view will return last transaction date for ALL "individual customers".

    To do this, I used the MAX function and the view worked nicely. However, because I'm using the MAX function, I cannot create an index for the view. As a result, I have a performance issue because my "Transactions" table has millions of records.

    Of course the idea is to use the view in another SELECT statement so that I can generate nice reports about most and least active clients. This SELECT statement runs very slowly and when I look at the execution plan, the bottleneck is where I have my JOIN to the view that is using the MAX function.

    QUESTION: How do I capture the last TransactionDate for people who had transactions in a way that gives me good performance?

    Thanks for your help...

    SCRIPTS:

    USE [TestDb]
    GO
    /****** Object:  Table [dbo].[Companies]    Script Date: 4/7/2013 1:48:18 PM ******/
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    CREATE TABLE [dbo].[Companies](
    	[CompanyId] [int] IDENTITY(1,1) NOT NULL,
    	[CompanyName] [nvarchar](50) NOT NULL,
     CONSTRAINT [PK_Companies] PRIMARY KEY CLUSTERED 
    (
    	[CompanyId] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]
    
    GO
    /****** Object:  Table [dbo].[People]    Script Date: 4/7/2013 1:48:18 PM ******/
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    CREATE TABLE [dbo].[People](
    	[PersonId] [int] IDENTITY(1,1) NOT NULL,
    	[FirstName] [nvarchar](20) NOT NULL,
    	[MiddleName] [nvarchar](20) NULL,
    	[LastName] [nvarchar](20) NOT NULL,
     CONSTRAINT [PK_People] PRIMARY KEY CLUSTERED 
    (
    	[PersonId] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]
    
    GO
    /****** Object:  Table [dbo].[Transactions]    Script Date: 4/7/2013 1:48:18 PM ******/
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    CREATE TABLE [dbo].[Transactions](
    	[TransactionId] [int] IDENTITY(1,1) NOT NULL,
    	[TransactionType] [nvarchar](50) NOT NULL,
    	[TransactionTimeStamp] [datetime] NOT NULL,
    	[TransactionAmount] [money] NOT NULL,
    	[Comments] [nvarchar](50) NULL,
     CONSTRAINT [PK_Transactions] PRIMARY KEY CLUSTERED 
    (
    	[TransactionId] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]
    
    GO
    /****** Object:  Table [dbo].[TransactionsForCompanies]    Script Date: 4/7/2013 1:48:18 PM ******/
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    CREATE TABLE [dbo].[TransactionsForCompanies](
    	[TransactionId] [int] NOT NULL,
    	[CompanyId] [int] NOT NULL,
     CONSTRAINT [PK_TransactionsForCompanies] PRIMARY KEY CLUSTERED 
    (
    	[TransactionId] ASC,
    	[CompanyId] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]
    
    GO
    /****** Object:  Table [dbo].[TransactionsForPeople]    Script Date: 4/7/2013 1:48:18 PM ******/
    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    CREATE TABLE [dbo].[TransactionsForPeople](
    	[TransactionId] [int] NOT NULL,
    	[PersonId] [int] NOT NULL,
     CONSTRAINT [PK_TransactionsForPeople] PRIMARY KEY CLUSTERED 
    (
    	[TransactionId] ASC,
    	[PersonId] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    ) ON [PRIMARY]
    
    GO
    SET IDENTITY_INSERT [dbo].[Companies] ON 
    
    GO
    INSERT [dbo].[Companies] ([CompanyId], [CompanyName]) VALUES (1, N'Company A')
    GO
    INSERT [dbo].[Companies] ([CompanyId], [CompanyName]) VALUES (2, N'Company B')
    GO
    INSERT [dbo].[Companies] ([CompanyId], [CompanyName]) VALUES (3, N'Company C')
    GO
    SET IDENTITY_INSERT [dbo].[Companies] OFF
    GO
    SET IDENTITY_INSERT [dbo].[People] ON 
    
    GO
    INSERT [dbo].[People] ([PersonId], [FirstName], [MiddleName], [LastName]) VALUES (1, N'John', NULL, N'Doe')
    GO
    INSERT [dbo].[People] ([PersonId], [FirstName], [MiddleName], [LastName]) VALUES (2, N'Jane', NULL, N'Smith')
    GO
    INSERT [dbo].[People] ([PersonId], [FirstName], [MiddleName], [LastName]) VALUES (3, N'Betsy', NULL, N'Green')
    GO
    SET IDENTITY_INSERT [dbo].[People] OFF
    GO
    SET IDENTITY_INSERT [dbo].[Transactions] ON 
    
    GO
    INSERT [dbo].[Transactions] ([TransactionId], [TransactionType], [TransactionTimeStamp], [TransactionAmount], [Comments]) VALUES (1, N'BUY', CAST(0x0000A1990122F5C4 AS DateTime), 300.0000, N'Got it!')
    GO
    INSERT [dbo].[Transactions] ([TransactionId], [TransactionType], [TransactionTimeStamp], [TransactionAmount], [Comments]) VALUES (2, N'BID', CAST(0x0000A19901236069 AS DateTime), 1753.5000, N'My best offer...')
    GO
    INSERT [dbo].[Transactions] ([TransactionId], [TransactionType], [TransactionTimeStamp], [TransactionAmount], [Comments]) VALUES (3, N'SELL', CAST(0x0000A19901236AF5 AS DateTime), 1753.5000, N'Will take it!')
    GO
    INSERT [dbo].[Transactions] ([TransactionId], [TransactionType], [TransactionTimeStamp], [TransactionAmount], [Comments]) VALUES (4, N'BUY', CAST(0x0000A19901236C21 AS DateTime), 1753.5000, N'Finalize purchase.')
    GO
    SET IDENTITY_INSERT [dbo].[Transactions] OFF
    GO
    INSERT [dbo].[TransactionsForCompanies] ([TransactionId], [CompanyId]) VALUES (3, 1)
    GO
    INSERT [dbo].[TransactionsForPeople] ([TransactionId], [PersonId]) VALUES (1, 2)
    GO
    INSERT [dbo].[TransactionsForPeople] ([TransactionId], [PersonId]) VALUES (2, 3)
    GO
    INSERT [dbo].[TransactionsForPeople] ([TransactionId], [PersonId]) VALUES (4, 3)
    GO
    ALTER TABLE [dbo].[Transactions] ADD  CONSTRAINT [DF_Transactions_TransactionTimeStamp]  DEFAULT (getutcdate()) FOR [TransactionTimeStamp]
    GO
    ALTER TABLE [dbo].[TransactionsForCompanies]  WITH CHECK ADD  CONSTRAINT [FK_TransactionsForCompanies_Companies] FOREIGN KEY([CompanyId])
    REFERENCES [dbo].[Companies] ([CompanyId])
    GO
    ALTER TABLE [dbo].[TransactionsForCompanies] CHECK CONSTRAINT [FK_TransactionsForCompanies_Companies]
    GO
    ALTER TABLE [dbo].[TransactionsForCompanies]  WITH CHECK ADD  CONSTRAINT [FK_TransactionsForCompanies_Transactions] FOREIGN KEY([TransactionId])
    REFERENCES [dbo].[Transactions] ([TransactionId])
    GO
    ALTER TABLE [dbo].[TransactionsForCompanies] CHECK CONSTRAINT [FK_TransactionsForCompanies_Transactions]
    GO
    ALTER TABLE [dbo].[TransactionsForPeople]  WITH CHECK ADD  CONSTRAINT [FK_TransactionsForPeople_People] FOREIGN KEY([PersonId])
    REFERENCES [dbo].[People] ([PersonId])
    GO
    ALTER TABLE [dbo].[TransactionsForPeople] CHECK CONSTRAINT [FK_TransactionsForPeople_People]
    GO
    ALTER TABLE [dbo].[TransactionsForPeople]  WITH CHECK ADD  CONSTRAINT [FK_TransactionsForPeople_Transactions] FOREIGN KEY([TransactionId])
    REFERENCES [dbo].[Transactions] ([TransactionId])
    GO
    ALTER TABLE [dbo].[TransactionsForPeople] CHECK CONSTRAINT [FK_TransactionsForPeople_Transactions]
    GO


    Thanks, Sam





    • Edited by imsam67 Sunday, April 7, 2013 6:36 PM
    Sunday, April 7, 2013 5:59 PM

Answers

  • OK, I just wanted to verify that you had not over-normalised your data model.

    It seems that you will either have to live with the poor performance, or engage in some form if denormalisation. Three options:

    1) Add a table LastTransactionDates which you feed with a trigger. The trigger is more complex to write, but gives the smallest table.

    2) Just add TransactionTimeStamp to the junction table, and add an index.

    3) Here is the same idea as the second, but using an indexed view:

    CREATE VIEW TransactionDates WITH SCHEMABINDING AS
       SELECT TFP.PersonId, T.TransactionId, T.TransactionTimeStamp
       FROM   dbo.TransactionsForPeople TFP
       JOIN   dbo.Transactions T ON T.TransactionId = TFP.TransactionId
    go
    CREATE UNIQUE CLUSTERED INDEX clusterix ON TransactionDates (PersonId, TransactionId)
    CREATE INDEX date_ix ON TransactionDates (PersonId, TransactionTimeStamp)


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    • Marked as answer by imsam67 Sunday, April 7, 2013 9:16 PM
    Sunday, April 7, 2013 9:02 PM

All replies

  • You probably don't actually need to calculate or maintain the max date for all people.

    How about you post the report query that runs slowly, and we can try to help you optimize it?

    Sunday, April 7, 2013 6:13 PM
  • Sure...

    Here's the CREATE statement for you to create the view locally...

    USE [TestDb]
    GO
    
    /****** Object:  View [dbo].[vwLastTransactionDateForPeople]    Script Date: 4/7/2013 2:28:30 PM ******/
    SET ANSI_NULLS ON
    GO
    
    SET QUOTED_IDENTIFIER ON
    GO
    
    CREATE VIEW [dbo].[vwLastTransactionDateForPeople]
    AS
    SELECT TOP (100) PERCENT tp.PersonId, MAX(t.TransactionTimeStamp) AS LastTransactionDate
    FROM     dbo.TransactionsForPeople AS tp INNER JOIN
                      dbo.Transactions AS t ON t.TransactionId = tp.TransactionId
    GROUP BY tp.PersonId
    
    GO


    And here's the SELECT statement that gives me the report. As you can see, this SELECT statement is using the view created in the previous step and when I look at the execution plan for this SELECT statement, the highest cost is where I have the JOIN to the view.

    use TestDb
    
    SELECT lt.PersonId, p.FirstName, p.LastName, lt.LastTransactionDate
    FROM vwLastTransactionDateForPeople AS lt
    	inner join People as p on p.PersonId = lt.PersonId

    To further clarify, if you run the SELECT statement, you'll see that Betsy Green (PersonId: 3) has a last transaction date of 2013-04-07 17:41 which is correct. She had two transactions i.e. one BID transaction and one BUY transaction. The last transaction is the BUY one so we get the time stamp for that one.

    Thanks again...


    Thanks, Sam


    • Edited by imsam67 Sunday, April 7, 2013 6:32 PM
    Sunday, April 7, 2013 6:30 PM
  • I don't understand your tables. Or more precisely, I don't understand TransactionsForPeople and TransactionForCompanies.

    To start with, why are there two tables? Why not have a table Persons, and then have subtables for natural and juridical persons? When it comes to a stock transaction there are no fundamental differences between people and companies. Tax rules etc may be different, but the transactions as such is the same.

    The next question: why do the tables exist at all? The keys are (TransactionId, PersonId), that is the keys of a junction table for a many-to-many relation. But how can several persons be related to the same transaction? Since I work with stock-trading applications myself this just feels wrong from a business perspective.

    I would but PersonId in the Transactions table and scrap the other two. And once you are there, it's a matter of a simple index on (PersonId, TrasactionTimeStamp).


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Sunday, April 7, 2013 6:41 PM
  • Erland,

    First, thank you for your help. I put together this scenario for demonstration purposes and to make it easy to understand. My app is not a stock market app. However, the table structures are identical.

    So, sticking to the example I came up with, a transaction can be made by an individual or a corporate client. For me to identify who initiated the transaction, I'd have to add both PersonId and CompanyId columns to the Transactions table and if the transaction is made by a person, the CompanyId column would have a NULL value and vice versa. I think this is what you're suggesting.

    In my particular app, a transaction can be tied to multiple entities at the same time which is why I took the approach I did. One example of that which is easy to understand is files. I use Amazon S3 as my file storage and as people upload files to S3, I make DB entries to register basic data about the file.

    All file data are kept in Files table. I then have FilesForProjects, FilesForCompanies, FilesForEmployees, etc. tables. In that particular scenario, the same file can be linked to a company, a project and a person simultaneously. Hopefully, this makes more sense.

    Back to my question, is there not a way for me to get the last transaction date without a major performance hit -- at this least in this design?


    Thanks, Sam

    Sunday, April 7, 2013 6:53 PM
  • OK, we are talking about something else than stock transactions and where many-to-many makes sense.

    But looking your example about files: the same file can be tied to multiple entities. But can it be tied a number of entities of the same type? That is, can there be three or thirty-three people tied to the same transaction? Your file example indicates that there can only be one person and one company.


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    Sunday, April 7, 2013 8:22 PM
  • Yes.

    For example, think of a PDF file that contains some information about certain types of employees e.g. all hourly employees. In that case, we linke the same file to those employees -- i.e. persons -- who meet the criteria. The idea is that if/when we update the contents of the file, everyone who sees that file under their documents will have the new version of the PDF.

    I chose this approach for the flexibility it gives me. I may link that file to multiple persons, multiple security groups and multiple departments simultaneously. So linking to entities of different types as well as same type.

    Also, files is not the only table where I use this approach. I also use it in messaging. Think of it as an internal email system. The message contents is in the "Messages" table. I may chose to send the message to certain persons as well as display it in certain department bulletin boards. I create the message in Messages table then create links in MessagesForPeople, MessagesForForums and MessagesForProjects. This way, if I update the message, I edit the record in the Messages table and it's updated everywhere.


    Thanks, Sam


    • Edited by imsam67 Sunday, April 7, 2013 8:41 PM
    Sunday, April 7, 2013 8:37 PM
  • OK, I just wanted to verify that you had not over-normalised your data model.

    It seems that you will either have to live with the poor performance, or engage in some form if denormalisation. Three options:

    1) Add a table LastTransactionDates which you feed with a trigger. The trigger is more complex to write, but gives the smallest table.

    2) Just add TransactionTimeStamp to the junction table, and add an index.

    3) Here is the same idea as the second, but using an indexed view:

    CREATE VIEW TransactionDates WITH SCHEMABINDING AS
       SELECT TFP.PersonId, T.TransactionId, T.TransactionTimeStamp
       FROM   dbo.TransactionsForPeople TFP
       JOIN   dbo.Transactions T ON T.TransactionId = TFP.TransactionId
    go
    CREATE UNIQUE CLUSTERED INDEX clusterix ON TransactionDates (PersonId, TransactionId)
    CREATE INDEX date_ix ON TransactionDates (PersonId, TransactionTimeStamp)


    Erland Sommarskog, SQL Server MVP, esquel@sommarskog.se
    • Marked as answer by imsam67 Sunday, April 7, 2013 9:16 PM
    Sunday, April 7, 2013 9:02 PM
  • If we run with this scenario, what would be the best way to connect the Persons table to the subtables? Some kind of type column?

    Sunday, April 7, 2013 9:11 PM
  • Erland,

    I think the 3rd option works really nicely. Thank you very much!


    Thanks, Sam

    Sunday, April 7, 2013 9:16 PM
  • Oh, so you do want to get the last dates of every single person.

    That would either cost you by calculating it every time -- and indexes wouldn't help that much, since you need to scan all the transactions (might consider using NOLOCK here, if dirty reads are okay), or by maintaining a summary table with a trigger -- which would make every transaction longer, and may cause locking or even deadlocks.

    If you'd be okay with the data being updated once a day or so, you could run a daily job to update the summary records of those with transactions in the last day.

    Sunday, April 7, 2013 9:17 PM
  • Alex,

    I do want to get the last dates for every single person. Using what Erland suggested, I have a view that is indexed. So, I can now run my MAX query against that indexed view. I haven't completely tested this yet but I think Erland's approach will give me these benefits.

    1. The view Erland suggests gives me a subset of the records in the Transactions table because all are now related to people -- not companies. The reduced number of records is benefit number 1.
    2. This view is now indexed -- including the TransactionTimeStamp column. That should increase my performance.
    3. I can now add my MAX(TransactionTimeStamp) in the SELECT statement I'm using for the report and get my data from the indexed view.

    I think all of these should give me the performance improvement I need. I haven't yet tested this on the production server but I think it will help. I'll post an update to let you guys know how it turned out.


    Thanks, Sam




    • Edited by imsam67 Sunday, April 7, 2013 9:26 PM
    Sunday, April 7, 2013 9:24 PM