locked
Combining Rows (or something better?) RRS feed

  • Question

  • I have a bunch of excel files with time clock data. The time clock records when an employee Badges In (Date & Time) and when they badge out (Date & Time). The two transactions are on two different records.  The files have several fields of data, but the key ones are: EmployeeID, EmployeeName, Date, Time, PunchDateTime, & InOut. The "InOut" field has "In" or "Out" values in it for badging in or badging out.

    The end goal, is to combine the "in" and "out" records into one "pair" for each day.  If they badge out for lunch, there would be two records for the employee on the same day. For example: Badge in 8:00 AM, Badge out 12:00 PM and Badge in 1:00 PM, Badge out 5:00 PM.

    I then need to calculate the time in hours (8.5 for example would be 8 and 1/2 hour), in incriments of .25 hours.

    I am stuck on how to combine the records to get BadgeInTime and BadgeOutTime as same fields so I can calculate the time between.

    Screenshot of key data is attached.


    John Thomas

    Thursday, February 23, 2017 3:17 PM

Answers

  • This version adds a new column with the total hours worked per day based on adding the differences between In/Out pairs. The value is repeated in each row for the same person. Uses Number_CeilingMath custom function as mentioned in previous post.

    let
        .........,
        RemovedColumns = Table.RemoveColumns(<previousStepName>, List.Difference(Table.ColumnNames(Source), {"Date", "BadgeID", "Name", "PunchDateTime", "Time", "InOut"})),
        ChangedType1 = Table.TransformColumnTypes(RemovedColumns,{{"Date", type date}, {"BadgeID", type text}}),
        GroupedRows = Table.Group(ChangedType1, {"Date", "BadgeID", "Name"}, {{"Table", each _, type table}}),
        AddedIndex = Table.TransformColumns(GroupedRows, {"Table", each Table.AddIndexColumn(_, "Index", 0, 1)}),
        AddedCustom = Table.TransformColumns(AddedIndex, {"Table", each Table.AddColumn(_, "GroupIndex", each Number.IntegerDivide([Index], 2))}),
        AddedCustom2 = Table.TransformColumns(AddedCustom, {"Table", each Table.AddColumn(_, "TimeMinus", each if [InOut] = "Out" then [Time] else -[Time])}),
        GroupedRows2 = Table.TransformColumns(AddedCustom2, {"Table", each Table.Group(_, {"GroupIndex"}, {{"Hours", each List.Sum([TimeMinus])}})}),
        ExpandedTable = Table.ExpandTableColumn(GroupedRows2, "Table", {"Hours"}),
        GroupedRows3 = Table.Group(ExpandedTable, {"Date", "BadgeID", "Name"}, {{"Total Hours", each Number_CeilingMath(List.Sum([Hours])*24, 0.25), type number}}),
        MergedTables = Table.NestedJoin(ChangedType1, "BadgeID", GroupedRows3, "BadgeID", "Merged"),
        ExpandedMerged = Table.ExpandTableColumn(MergedTables, "Merged", {"Total Hours"}, {"Total Hours"}),
        ChangedType = Table.TransformColumnTypes(ExpandedMerged,{{"Time", type time}})
    in
        ChangedType

    • Proposed as answer by Imke FeldmannMVP Monday, February 27, 2017 6:16 PM
    • Marked as answer by jbt_PwrPvt Tuesday, March 7, 2017 4:34 PM
    Friday, February 24, 2017 3:36 AM
  • Hi John,

    To track the step causing the error, you change the step name after the "in" keyword to a different step name in the script. The code is evaluated up to the name indicated after the "in" keyword. This is an iterative process. You change the step name until you find the one where the error first occurs.

    I understand the problem with shifts crossing over a day. However, what logic would we use to know that we're in the same shift period? For example, someone badges in at 9:00 PM on 2/15/2017. He then badges out at 11:45 PM. He badges in again at 12:15 AM on 2/16/2017, and so on. His shift ends at 5:00 AM on 2/16/2017. He badges in again at 9:00 PM on 2/16/2017 for a new shift.

    I'm also noticing issues with the ceiling function under certain situations. For example, if someone works a single BadgeIn/BadgeOut shift from say 10:00 PM on one day to 6:00 AM the next day, ceiling will calculate 8.25 hours instead of 8 hours. This occurs because the calculated duration is 0.333333333... of a day.

    • Marked as answer by jbt_PwrPvt Tuesday, March 7, 2017 4:40 PM
    Tuesday, February 28, 2017 8:38 PM

All replies

  • John,

    Based on your data, there are some prerequisites that need to be considered.

    How are you going to handle more outs than ins? You data shows multiple outs following one in. This is an oxymoron. There would need to be logic to point to those items and clean up or to treat each punch as an in if the previous was an out or in reverse. Does the source of the data mark as IN/OUT? 

    If the data is clean and sorted in an employee/timestamp order, we could probably look at durations between the current record and previous record with a filter for employee and in/out and add the durations for the date.

    If it were a clean in/out row/next row you perhaps use query to split the file and add the outs as a new column for outs to the ins and hove both in the same row.


    • Edited by geraldartman Thursday, February 23, 2017 6:00 PM
    Thursday, February 23, 2017 5:59 PM
  • Gerald,

    Nice catch.  Yes, it's an issue with which I also have to resolve.  What happens is the time clocks are manually held units, a turn style gate, or wall mounted units in buildings.  The employee can go to any of these to clock in/out.  In the case you are seeing, the employee must have badged out, to go out to the car for something but then badged in with a security guard using a hand-held unit.  The hand-held units have to be manually set to "in" or "out" before scanning.  The security guard didn't do that, so we end up with an "out" that should have been an "in".  All very confusing and quite the PQ puzzle.  For argument's sake, lets assume I can get the data cleaned up to just an "In" record, followed by an "Out" record.

    There is a field named "Event" (not shown in screen shot) that indicates "Access Granted - Anti-Passback Used".  This means this record is an "in" (or "out") and the previous record was also an "in" (or "out").  I tried using an index field to then add a custom column which would "look up" the previous record to find out #1 is the record's time < 15min of the current record's time, if so return a "delete me", or #2 if the "Event" field has "Anti-Passback Used" in its value, to then create the correct times.  Problem is that the index just brought the whole thing to a grinding halt.  I let it run for an hour and it still kept grinding away.  So using an index field doesn't seem to be a viable option.

    Does the source of the data mark as IN/OUT? This is the source data.  I did clean it up a bit by formatting data types, renaming columns, splitting out a few, but for the most part it's the same as the original.

    "If the data is clean and sorted in an employee/timestamp order, we could probably look at durations between the current record and previous record with a filter for employee and in/out and add the durations for the date."  How can I do this exactly?  (Without using index / look up.  I have several years of data to review.)

    "If it were a clean in/out row/next row you perhaps use query to split the file and add the outs as a new column for outs to the ins and hove both in the same row."  Yes, this is what I would like to do.  I have new columns added that use the following code:

    = Table.AddColumn(FilteredOnly_Granted_EventField, "BadgeIn", each if [InOut] = "In" and Text.Contains([Event], "Granted") then [PunchDateTime] else null)

    This creates the "BadgeIn" column and puts 'null' for any "out" fields and [PunchDateTime] for the in's.  I have another for "BadgeOut".  But how do I get them combined to the same record?

    I can send some sample data and my M-Code if it would help...


    John Thomas

    Thursday, February 23, 2017 9:06 PM
  • Perhaps the following can help. It provides a cross-tab table with one row per person per day, so that "TimeIn", "TimeOut", "Hours Worked", etc., is in the same row. 

    let
        ........,
        RemovedColumns = Table.RemoveColumns(<previousStepName>, List.Difference(Table.ColumnNames(Source), {"Date", "BadgeID", "Name", "PunchDateTime", "Time"})),
        ChangedType1 = Table.TransformColumnTypes(RemovedColumns,{{"Date", type date}, {"BadgeID", type text}, {"PunchDateTime", type text}, {"Time", type text}}),
        GroupedRows = Table.Group(ChangedType1, {"Date", "BadgeID", "Name"}, {{"Table", each _, type table}}),
        MinMaxTimes = Table.TransformColumns(GroupedRows, {"Table", each Table.FromRecords({Table.Min(_, "Time"), Table.Max(_, "Time")})}),
        CombinedColumns = Table.TransformColumns(MinMaxTimes, {"Table", each Table.CombineColumns(_,{"PunchDateTime", "Time"},Combiner.CombineTextByDelimiter(","), "Merged")}),
        RemovedColumns1 = Table.TransformColumns(CombinedColumns, {"Table", each Table.RemoveColumns(_, {"Date", "BadgeID", "Name"})}),
        TransposedColumns = Table.TransformColumns(RemovedColumns1,{"Table", each Table.Transpose(_)}),
        ExpandedTable = Table.ExpandTableColumn(TransposedColumns, "Table", {"Column1", "Column2"}),
        SplitColumnByDelimiter = Table.SplitColumn(ExpandedTable,"Column1",Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv),{"PunchDateTimeIn", "TimeIn"}),
        ChangedType2 = Table.TransformColumnTypes(SplitColumnByDelimiter,{{"PunchDateTimeIn", type datetime}, {"TimeIn", type number}}),
        SplitColumnByDelimiter1 = Table.SplitColumn(ChangedType2,"Column2",Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv),{"PunchTimeDateOut", "TimeOut"}),
        ChangedType3 = Table.TransformColumnTypes(SplitColumnByDelimiter1,{{"PunchTimeDateOut", type datetime}, {"TimeOut", type number}}),
        AddedCustom = Table.AddColumn(ChangedType3, "Hours Worked", each Number_CeilingMath(([TimeOut]-[TimeIn])*24, 0.25)),
        ChangedType4 = Table.TransformColumnTypes(AddedCustom,{{"TimeIn", type time}, {"TimeOut", type time}}),
        ReorderedColumns = Table.ReorderColumns(ChangedType4,{"Date", "BadgeID", "Name", "PunchDateTimeIn", "PunchTimeDateOut", "TimeIn", "TimeOut", "Hours Worked"})
    in
        ReorderedColumns

    The AddedCustom step uses a general purpose Ceiling.Math custom function, that works the same as Excel's CEILING.MATH. Unfortunately, M doesn't provide any CEILING, FLOOR, or MROUND functions, so I've had no choice but to create my own.

    Number_CeilingMath:

    (number as number, optional significance as nullable number, optional mode as nullable number) as number=>
    let
        Significance = if significance = null then Number.Sign(number)
                       else significance,
        Mode = if Number.Sign(number) = -1 and mode <> null then -1 else 1,             
        Ceiling = if Significance = 0 then 0
                  else if Number.Mod(number,Significance) = 0 then number
                  else if Number.Sign(number) = 1 or Mode = -1 then
                      Number.Round((number+Mode*Number.Abs(Significance)/2)/Significance,0,RoundingMode.TowardZero)*Significance
                  else
                      Number.Round((number+Mode*Number.Abs(Significance)/2)/Significance,0,RoundingMode.AwayFromZero)*Significance 
    in
        Ceiling

    Thursday, February 23, 2017 11:01 PM
  • Okay, after re-reading the requirements, I don't think my transformations will give you what you need because it only considers first time in and last time out. I missed that any duration between out and back in doesn't count as paid hours, which means that the guy who has to go to his car to retrieve something is actually losing money! :) If I were running the place, I'd make sure that all the coffee stations and washrooms are outside the building!! :D

    Now back to the drawing board. Note that no solution will ever be possible if you can't sort out your matching in/out pairs.

    Thursday, February 23, 2017 11:40 PM
  • This version adds a new column with the total hours worked per day based on adding the differences between In/Out pairs. The value is repeated in each row for the same person. Uses Number_CeilingMath custom function as mentioned in previous post.

    let
        .........,
        RemovedColumns = Table.RemoveColumns(<previousStepName>, List.Difference(Table.ColumnNames(Source), {"Date", "BadgeID", "Name", "PunchDateTime", "Time", "InOut"})),
        ChangedType1 = Table.TransformColumnTypes(RemovedColumns,{{"Date", type date}, {"BadgeID", type text}}),
        GroupedRows = Table.Group(ChangedType1, {"Date", "BadgeID", "Name"}, {{"Table", each _, type table}}),
        AddedIndex = Table.TransformColumns(GroupedRows, {"Table", each Table.AddIndexColumn(_, "Index", 0, 1)}),
        AddedCustom = Table.TransformColumns(AddedIndex, {"Table", each Table.AddColumn(_, "GroupIndex", each Number.IntegerDivide([Index], 2))}),
        AddedCustom2 = Table.TransformColumns(AddedCustom, {"Table", each Table.AddColumn(_, "TimeMinus", each if [InOut] = "Out" then [Time] else -[Time])}),
        GroupedRows2 = Table.TransformColumns(AddedCustom2, {"Table", each Table.Group(_, {"GroupIndex"}, {{"Hours", each List.Sum([TimeMinus])}})}),
        ExpandedTable = Table.ExpandTableColumn(GroupedRows2, "Table", {"Hours"}),
        GroupedRows3 = Table.Group(ExpandedTable, {"Date", "BadgeID", "Name"}, {{"Total Hours", each Number_CeilingMath(List.Sum([Hours])*24, 0.25), type number}}),
        MergedTables = Table.NestedJoin(ChangedType1, "BadgeID", GroupedRows3, "BadgeID", "Merged"),
        ExpandedMerged = Table.ExpandTableColumn(MergedTables, "Merged", {"Total Hours"}, {"Total Hours"}),
        ChangedType = Table.TransformColumnTypes(ExpandedMerged,{{"Time", type time}})
    in
        ChangedType

    • Proposed as answer by Imke FeldmannMVP Monday, February 27, 2017 6:16 PM
    • Marked as answer by jbt_PwrPvt Tuesday, March 7, 2017 4:34 PM
    Friday, February 24, 2017 3:36 AM
  • Hi Colin - Thanks for the posts. Very useful.  I am having trouble with the custom Ceiling Function.  I copied the M Code and pasted it into a blank query and named it "fxCeiling".  The data set I ended up with was one record for employee with a "BadgeIn" and "BadgeOut" date/time data type.  I noticed your function used the "time" field so I used the DateTime.Time([BadgeIn]) function to get the time part of the date/time field.

    = Table.AddColumn(GroupRecords, "Hours", each fxCeiling((DateTime.Time([BadgeOut])-DateTime.Time([BadgeIn]))*24, 0.25)) //Needs to be adjusted for night shift 5:00 AM to 5:00 PM (next day)

    I'm not real good at this and I wish there were a way to check parts of the formula to see where the error is originating.  The function works, but all the values just have "Error".  No message other than that.

    The issue with using just a Time filed is that the night crew shows up at 6:00 PM on one day, then they badge out at 5:00 AM the next day.  So I need to stick with the difference in BadgeIn BadgeOut from day-time to day-time.

    (I did try adding TimeIn & TimeOut fields using DateTime.Time(<date>) and then using these fields in the fxCeiling function and I still got the same error.


    John Thomas

    Tuesday, February 28, 2017 6:18 PM
  • Hi John,

    To track the step causing the error, you change the step name after the "in" keyword to a different step name in the script. The code is evaluated up to the name indicated after the "in" keyword. This is an iterative process. You change the step name until you find the one where the error first occurs.

    I understand the problem with shifts crossing over a day. However, what logic would we use to know that we're in the same shift period? For example, someone badges in at 9:00 PM on 2/15/2017. He then badges out at 11:45 PM. He badges in again at 12:15 AM on 2/16/2017, and so on. His shift ends at 5:00 AM on 2/16/2017. He badges in again at 9:00 PM on 2/16/2017 for a new shift.

    I'm also noticing issues with the ceiling function under certain situations. For example, if someone works a single BadgeIn/BadgeOut shift from say 10:00 PM on one day to 6:00 AM the next day, ceiling will calculate 8.25 hours instead of 8 hours. This occurs because the calculated duration is 0.333333333... of a day.

    • Marked as answer by jbt_PwrPvt Tuesday, March 7, 2017 4:40 PM
    Tuesday, February 28, 2017 8:38 PM
  • Hi John,

    do you want to follow this up or close this thread?


    Imke Feldmann TheBIccountant.com

    Saturday, March 4, 2017 6:52 AM
  • Imke - sorry for the delay.  I marked an answer as "answer" as it was close enough to get me down the road.  Lot's of variables in the data to sort out.  If I can get down to a reliable data set and still need assistance I'll open a new post.

    John Thomas

    Tuesday, March 7, 2017 4:36 PM
  • That's a cool trick.  Thanks for the tip. 


    John Thomas

    Tuesday, March 7, 2017 4:40 PM