none
How to create an new column from ordered pairs RRS feed

  • Question

  • Hi,

    Wondering if anyone can help this problem. I am analyzing network traffic between a PC and a Server. I can easily get a count of packets in each direction, PC1 to Server1 and the return path, Server 1 to PC1. But want I really want is a count of PC1 to OR from Server1, basically a count of packets by IP conversation.

    Each row is a packet with fields like: Time, Source, Destination.

    For example

    2:01:01, PC1, Server1

    2:01:01, Server1,  PC1 

    I would like to create a new column that represents a conversation. So new have columns: Time, Source, Destination, Conv. To get something like this:

    2:01:01, PC1, Server1, Conv1

    2:01:01, Server1,  PC1, Conv1

    2:01:01, PC2, Server1, Conv2

    2:01:01, Server1,  PC2, Conv2

    Thanks,

    Wes

    Tuesday, August 19, 2014 10:58 PM

Answers

  • Wes, I just got around to revisiting your problem. The solution I came up with is twofold. The first part is the creation of a custom function that generates a conversation ID in the form PC1_Server1, PC2_Server1 etc. The second part uses a lookup table for each unique combination of PC and server to return a conversation number (Conv1, Conv2...).

    First of all, the "NetTraffic" table (your original table) generates the following script

    let
        Source = Excel.CurrentWorkbook(){[Name="Table8"]}[Content],
        #"Changed Type" = Table.TransformColumnTypes(Source,{{"Time", type time}}),
    in
        #"Changed Type"

    In the above, the type of the timestamp column is changed to time, but you can alternatively fix it as text.

    I've called the custom function "ConversationIdentifier." It takes four parameters, as described by the parameter names.

    (sourceList as list, sourceValue as text, destinationList as list, destinationValue as text) as text=>
    let
        SourceConversationList = List.Sort(List.Select(sourceList,each (_ = sourceValue or _ = destinationValue))),
        DestinationConversationList = List.Sort(List.Select(destinationList,each (_ = sourceValue or _ = destinationValue)),Order.Descending),
        ConversationID = SourceConversationList{0}&"_"&DestinationConversationList{0}
    in
        ConversationID

    Usage:

    As an added custom column in Power Query, the formula, based on the "NetTraffic" table would be:

    =ConversationIdentifier(#"Changed Type"[Source],[Source],#"Changed Type"[Destination],[Destination])

    As previously mentioned, the custom function generates combinations of PC and server, such that each combination represents a conversation.

    Next, I created a two-column "ConversationLookup" table in a worksheet, with entries like PC1_Server1|Conv1, PC2_Server1|Conv2, and so on.

    In the NetTraffic table, I did a merge with the ConversationLookup table, ensuring that the "Only Include Matching Rows" option is left unchecked (left join).

    Finally, I removed unwanted columns. Note: You don't need to use the lookup step if PC1_Server1, PC2_Server1 etc. serves your need for identifying a conversation.

    Thursday, August 21, 2014 7:22 PM

All replies

  • Wes, is the time column a fixed timestamp that identifies a conversation between a particular PC and server? Just need to understand what distinguishes one conversation from another between the same PC and server.
    Wednesday, August 20, 2014 2:48 PM
  • Colin, the time column is unique for each row and represents the time when the a packet was captured by the network sniffer (packet capture tool).

    I am interesting in IP conversations which is just all packets (rows) between the PC and Server, both directions. In network terms a conversation would between the same PC and Server and the same TCP port numbers but I don't need that level of detail yet.

    Wednesday, August 20, 2014 6:46 PM
  • Wes, I just got around to revisiting your problem. The solution I came up with is twofold. The first part is the creation of a custom function that generates a conversation ID in the form PC1_Server1, PC2_Server1 etc. The second part uses a lookup table for each unique combination of PC and server to return a conversation number (Conv1, Conv2...).

    First of all, the "NetTraffic" table (your original table) generates the following script

    let
        Source = Excel.CurrentWorkbook(){[Name="Table8"]}[Content],
        #"Changed Type" = Table.TransformColumnTypes(Source,{{"Time", type time}}),
    in
        #"Changed Type"

    In the above, the type of the timestamp column is changed to time, but you can alternatively fix it as text.

    I've called the custom function "ConversationIdentifier." It takes four parameters, as described by the parameter names.

    (sourceList as list, sourceValue as text, destinationList as list, destinationValue as text) as text=>
    let
        SourceConversationList = List.Sort(List.Select(sourceList,each (_ = sourceValue or _ = destinationValue))),
        DestinationConversationList = List.Sort(List.Select(destinationList,each (_ = sourceValue or _ = destinationValue)),Order.Descending),
        ConversationID = SourceConversationList{0}&"_"&DestinationConversationList{0}
    in
        ConversationID

    Usage:

    As an added custom column in Power Query, the formula, based on the "NetTraffic" table would be:

    =ConversationIdentifier(#"Changed Type"[Source],[Source],#"Changed Type"[Destination],[Destination])

    As previously mentioned, the custom function generates combinations of PC and server, such that each combination represents a conversation.

    Next, I created a two-column "ConversationLookup" table in a worksheet, with entries like PC1_Server1|Conv1, PC2_Server1|Conv2, and so on.

    In the NetTraffic table, I did a merge with the ConversationLookup table, ensuring that the "Only Include Matching Rows" option is left unchecked (left join).

    Finally, I removed unwanted columns. Note: You don't need to use the lookup step if PC1_Server1, PC2_Server1 etc. serves your need for identifying a conversation.

    Thursday, August 21, 2014 7:22 PM
  • It works. Thanks Colin!

    Tuesday, August 26, 2014 5:27 PM