none
Data analysis help on a log file data with powerpivot

    Question

  • Hi,

    I facing a problem with powerpivot which I don't know how to solve.

    Context: I have a log file, which gives me roughly:
    date|item_id|status

    item_id is a unique identifier of my items. 
    status can be "created", "deleted" or some other possibilities (such as "edited")

    => I want to look at the number of items created during a given week, minus the items which have been deleted (whatever the deletion date).

    I must add that I don't have a dimension table with all item_id available. Moreover each line is not necessarily unique, meaning I have sometimes multiple "deleted" status for the same item.

    I'm not sure how to handle such a problem right now, anyone with a clue?
    Thanks!
    • Edited by lourson Tuesday, October 22, 2013 12:17 PM
    Tuesday, October 22, 2013 7:44 AM

Answers

  • to get the number of items created you can use this calculation:

    [Created Items]:=CALCULATE(DISTINCTCOUNT('LogFile'[item_id]), 'Status'[Status] = "created")

    the number of deleted items is very similar, you just need to remove the current filter on the date-table using ALL()

    [Deleted Items]:=CALCULATE(DISTINCTCOUNT('LogFile'[item_id]), 'Status'[Status] = "deleted", ALL('Date'))

    both calculations would give you the number of distinct item_id's
    to get the number of actual factrows (as you may have duplicate rows) you could also use COUNTROWS('LogFile') instead of DISTINCTCOUNT

    hth,
    gerhard


    - www.pmOne.com -

    • Marked as answer by lourson Wednesday, October 23, 2013 9:39 AM
    Tuesday, October 22, 2013 4:05 PM

All replies

  • to get the number of items created you can use this calculation:

    [Created Items]:=CALCULATE(DISTINCTCOUNT('LogFile'[item_id]), 'Status'[Status] = "created")

    the number of deleted items is very similar, you just need to remove the current filter on the date-table using ALL()

    [Deleted Items]:=CALCULATE(DISTINCTCOUNT('LogFile'[item_id]), 'Status'[Status] = "deleted", ALL('Date'))

    both calculations would give you the number of distinct item_id's
    to get the number of actual factrows (as you may have duplicate rows) you could also use COUNTROWS('LogFile') instead of DISTINCTCOUNT

    hth,
    gerhard


    - www.pmOne.com -

    • Marked as answer by lourson Wednesday, October 23, 2013 9:39 AM
    Tuesday, October 22, 2013 4:05 PM
  • Thanks gerhard, it helps!
    Wednesday, October 23, 2013 9:40 AM
  • I've tried your solution but unfortunately it's not entirely satisfactory, because by removing the date, I end up with having a deleted items every week, and not only the week I'm looking for.

    Result example:

    Date | Created Items | # of deleted Items | Deleted Items | Items available
    W1 | 1 | 0 | 1 | 0
    W2 | 0 | 0 | 1 | -1
    W3 | 0 | 0 | 1 | -1
    W4 | 0 | 1 | 1 | -1

    In that example, one item is created on W1 and deleted on W4. 

    [Items available] = [Created Items] - [Deleted Items]

    I need to graph the [items available] result, but obviously I don't want negative values. I need the [Deleted Items] to be taken into account only for the week of creation of the item

    Thanks a lot

    • Edited by lourson Thursday, October 24, 2013 12:16 PM
    Thursday, October 24, 2013 12:15 PM
  • Hi Lourson,

    actually this was your initial request:
    "=> I want to look at the number of items created during a given week, minus the items which have been deleted (whatever the deletion date)"

    I guess what you really want is to sum up all created items until a given/selected date and subtract all deleted items until that date:

    Date   Created Items   # of deleted Items   Deleted Items   Items available NEW
    W1  1 0 1 0 1
    W2  0 0 1 -1 1
    W3  0 0 1 -1 1
    W4  0 1 1 -1 0

    to get this you need to make use of your Date-Table:

    [Created Items]:=CALCULATE(DISTINCTCOUNT('LogFile'[item_id]), 'Status'[Status] = "created",
    DATESBETWEEN('Date'[Date], BLANK(), MAX('Date'[Date])))

    [Deleted Items]:=CALCULATE(DISTINCTCOUNT('LogFile'[item_id]), 'Status'[Status] = "deleted",
    DATESBETWEEN('Date'[Date], BLANK(), MAX('Date'[Date])))

    hth,
    gerhard


    - www.pmOne.com -

    Thursday, October 24, 2013 1:42 PM
  • Thanks again for your reply. 

    The goal I'm looking at is:

    Week3:
    Date | Created Items | # of deleted Items | Items available
    W1 | 1 | 0 | 1

    W2 | 0 | 0 | 1
    W3 | 0 | 0 | 1

    Week4:
    Date | Created Items | # of deleted Items | Items available

    W1 | 1 | 0 | 0
    W2 | 0 | 0 | 0
    W3 | 0 | 0 | 0
    W4 | 0 | 1 | 0

    Once an item has been deleted, I want to handle this as if it doesn't exist anymore (and never had).

    I think I actually got something to work using a Calculated Column instead on which I look for a deletion log of each item.

    [DeletedItem] = IF(LogFile[itemID]<>"";CALCULATE(countrows(LogFile);LogFile[action]="ITEM_DELETED";ALLEXCEPT(LogFile;LogFile[itemID])))

    This gives me a mean to then filter all my dashboards if I want to. 

    Thanks for your help




    • Edited by lourson Thursday, October 24, 2013 1:51 PM
    Thursday, October 24, 2013 1:49 PM
  • so as soon as you get data for W4 this should also effect older weeks - right?

    [Created Items]:=CALCULATE(DISTINCTCOUNT('LogFile'[item_id]), 'Status'[Status] = "created")
    [Deleted Items]:=CALCULATE(DISTINCTCOUNT('LogFile'[item_id]), 'Status'[Status] = "deleted")

    [Available Items]:=CALCULATE([Created Items] - [Deleted Items], ALL('Date'))

    hth,
    gerhard


    - www.pmOne.com -

    Friday, October 25, 2013 11:09 AM
  • Thanks again so much Gerhard for your help. It's not very easy to convey what I'm looking for exactly, and I apologize for that.

    Although it's using a calculated column, the solution I found (and stated above) seems to work fine and is better suited to my exact problem.

    The reason for that is that it allows me to create a stacked bar chart showing for each week the total number of Created Items + Deleted Items (by creation date but whenever they were deleted).

    If you feel there is a better way to do that than using a calculated column, I'd be happy to hear it.

    Thanks again

    Monday, November 04, 2013 2:49 PM