Sorted wordcount on an input stream

    Question

  • Hi,

    How would one go about writing a query that takes the text string from each incoming event, splits it into words, sorts them by frequency, and returns an array of the 10 most used words per window?

    Thanks in advance.

    Sunday, March 11, 2012 4:26 PM

Answers

  • I'm not completely sure what the stream looks like or what you are trying to do but give this a shot:

    var topWords = from words in twitterStream.Scan(new KeyWordSplit(' '))
                   group words by words into g
                   from w in g.TumblingWindow(TimeSpan.FromSeconds(10), HoppingWindowOutputPolicy.ClipToWindowEnd)
                   select new WordCount
                   {
                       Word = g.Key,
                       // Count the events in each window w, not the group itself.
                       Count = (int)w.Count()
                   };


    DevBiker (aka J Sawyer)
    My Blog
    My Bike - Concours 14


    If I answered your question, please mark as answer.
    If my post was helpful, please mark as helpful.

    • Marked as answer by mortenbpost Monday, March 12, 2012 4:42 PM
    Monday, March 12, 2012 4:08 PM
    Moderator

All replies

  • You would want to use a UDSO (user-defined stream operator) for this. There is a "SplitString" UDSO in the LINQPad samples for StreamInsight 1.2 that is a partial example of what you are trying to do, and it will give you the structure you'd want to use. I strongly recommend LINQPad for StreamInsight work ... it lets you model and test queries before you put them into your application. I've found it invaluable.

    Keep in mind, also, that each word would need to be returned as a separate event. StreamInsight does not support arrays as part of the payload. As the sample UDSO shows, you can return an array of events from the UDSO.
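
    As a rough illustration of the "one word per output item" idea, here is a minimal sketch in plain C#. It does not use the StreamInsight UDSO API at all; `WordSplitter` and `SplitWords` are hypothetical names made up for this example, and the splitting logic is only an assumption about what a splitter such as the sample UDSO might do internally.

```csharp
using System;
using System.Collections.Generic;

public static class WordSplitter
{
    // Hypothetical helper, not part of StreamInsight.
    // Yields each word as its own item instead of returning one array payload,
    // which mirrors emitting one output event per word.
    public static IEnumerable<string> SplitWords(string text, char separator = ' ')
    {
        if (string.IsNullOrEmpty(text))
            yield break;

        // RemoveEmptyEntries drops the empty strings produced by repeated separators.
        var parts = text.Split(new[] { separator }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var word in parts)
            yield return word;
    }
}
```

    Inside a real UDSO, the same loop would produce the output events; the operator base class and its members are deliberately left out here.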


    DevBiker (aka J Sawyer)

    Sunday, March 11, 2012 6:36 PM
    Moderator
  • Hi, thanks for the fast reply!

    I've made a UDSO that takes a tweet as input and outputs a stream of words as strings. Then I do the following:

    var topWords = from words in twitterStream.Scan(new KeyWordSplit(' ')).TumblingWindow(TimeSpan.FromSeconds(10), HoppingWindowOutputPolicy.ClipToWindowEnd)
                   group words by words into g
                   select new WordCount
                   {
                       Word = g.Key,
                       Count = g.Count()
                   };

    but I get the following compile error, and I can't figure out why:

    Error 12 Could not find an implementation of the query pattern for source type 'Microsoft.ComplexEventProcessing.Linq.CepWindowStream<Microsoft.ComplexEventProcessing.Linq.CepWindow<string>>'.  'GroupBy' not found.

    Thanks in advance.

    Monday, March 12, 2012 3:57 PM
  • That was almost what I was looking for, but I got it working by doing this:

    var groupedWords = from words in twitterStream.Scan(new KeyWordSplit(' '))
                       group words by words into g
                       from w in g.TumblingWindow(TimeSpan.FromSeconds(10), HoppingWindowOutputPolicy.ClipToWindowEnd)
                       select new WordCount
                       {
                           Word = g.Key,
                           Count = (int)w.Count()
                       };

    var top5 = (from win in groupedWords.SnapshotWindow(SnapshotWindowOutputPolicy.Clip)
                from e in win
                orderby e.Count descending
                select e).Take(5, e =>
                    new WordCount
                    {
                        Word = e.Payload.Word,
                        Count = e.Payload.Count
                    });

    It's probably overcomplicated...

    But thanks a lot.

    Monday, March 12, 2012 4:42 PM
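
    For checking the expected output of a single window offline, the same group, count, and rank steps can be sketched with plain LINQ to Objects. This is not StreamInsight code: there are no windows or event timestamps here, `TopWords.TopN` is a hypothetical helper, and the `WordCount` class below is an assumed shape (a string `Word` and an int `Count`), since its definition is not shown in the thread.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of the payload type used in the thread.
public class WordCount
{
    public string Word { get; set; }
    public int Count { get; set; }
}

public static class TopWords
{
    // Hypothetical helper: 'words' stands for every word seen in one window.
    public static List<WordCount> TopN(IEnumerable<string> words, int n)
    {
        return words
            .GroupBy(w => w)                                                  // one group per distinct word
            .Select(g => new WordCount { Word = g.Key, Count = g.Count() })  // count occurrences
            .OrderByDescending(wc => wc.Count)                               // most frequent first
            .ThenBy(wc => wc.Word, StringComparer.Ordinal)                   // stable order for ties
            .Take(n)
            .ToList();
    }
}
```

    For example, `TopWords.TopN(new[] { "b", "a", "b", "c", "a", "b" }, 2)` yields "b" with a count of 3 followed by "a" with a count of 2. The tie-break on the word itself is an addition for deterministic output and has no counterpart in the queries above.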