How to reinvent findstr with Powershell?

    Question

  • I reinvented findstr (aka find/grep) in the Groovy programming language on an 8-core machine and saw nearly a 10-fold improvement over Cygwin's find/grep, probably because I used threads (but not thread pools). This was a great surprise because I expected to be I/O bound. Apparently findstr/find/grep are not I/O bound and spend most of their time matching patterns. Hmm... strange...

    After learning about the merits of Win32 thread pools, I would like to reinvent findstr in PowerShell, using a thread pool to run the search on multiple threads.

    Can someone help me write a multithreaded PowerShell program that (recursively?) descends through a directory tree and searches the contents of the files for a regular expression, using multiple threads from a Win32 thread pool?

    I'm not sure how to do this! Should I use the .NET or the Win32 thread pool API?

    Thanks!

    Siegfried


    siegfried heintze

    Friday, February 10, 2012 7:46 PM

Answers

  • Hi,

    Why not use the Select-String cmdlet?

    If you want to create a multithreaded PowerShell command, you should use C# and develop your own cmdlet.


    regards Thomas Paetzold visit my blog on: http://sus42.wordpress.com

    Friday, February 10, 2012 7:59 PM
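
For illustration, the Select-String suggestion above might look like the following minimal, single-threaded sketch. The function name `Find-InTree` and its parameters are hypothetical, chosen only for this example; `Get-ChildItem` and `Select-String` are the built-in cmdlets doing the work.

```powershell
# Hypothetical helper: recursive, single-threaded regex search.
function Find-InTree {
    param(
        [Parameter(Mandatory = $true)][string]$Root,     # directory to descend into
        [Parameter(Mandatory = $true)][string]$Pattern   # .NET regular expression
    )

    Get-ChildItem -LiteralPath $Root -Recurse |
        Where-Object { -not $_.PSIsContainer } |         # keep files, skip directories
        Select-String -Pattern $Pattern                  # emits one MatchInfo per matching line
}
```

A call such as `Find-InTree -Root C:\logs -Pattern 'error\s+\d+' | Select-Object Path, LineNumber, Line` (the path is only an example) lists each matching line with its file and line number.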

All replies

  • Have you made any progress on this?

    I'm also looking to multi-thread a string comparator.

    I need to be able to parse a 500MB log in as short a time as possible. I'm not sure how threading will help resolve this, but it is a path I had thought about.

    Siegfried, please PM me.

    Mods, please stop marking threads as answered when the answers make absolutely no sense.

    Peddy1st, no offense, but you didn't even come close to answering this with any thought.

    I'm going to make a post over at stackoverflow, where the contributors actually think about answers "before" they post them.

    Friday, March 30, 2012 7:04 PM
  • I've been using Groovy instead of PowerShell because (apparently) PowerShell has no way to express multithreading-specific logic in an application. Yes, I could call C# from PowerShell, but at that point, why not write the entire application in C#?

    I cannot remember; I might have marked it as an answer. I think the answer is that all multithreading-specific logic must be done in another language like C#. This is really surprising, since most other scripting languages like Perl, Groovy, and Python (and Ruby?) support threading. Hmm... cmd.exe is not very good at threads either.

    I cannot recommend Perl multithreading, however: it has problems with memory management, according to the response I got a couple of years ago on the beginners-perl email list.


    siegfried heintze

    Friday, March 30, 2012 8:55 PM
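
For readers who land on this thread: one route that stays inside PowerShell script is a .NET runspace pool (not the Win32 thread pool API), which runs several PowerShell pipelines concurrently. The sketch below is a minimal illustration under that assumption, not a tuned implementation. The function name `Search-TreeParallel` and its parameters are hypothetical; it hands each file to its own `Select-String` pipeline and caps concurrency at `$MaxThreads`.

```powershell
# Hypothetical helper: recursive regex search spread over a runspace pool.
function Search-TreeParallel {
    param(
        [Parameter(Mandatory = $true)][string]$Root,
        [Parameter(Mandatory = $true)][string]$Pattern,
        [int]$MaxThreads = [Environment]::ProcessorCount
    )

    # Script text run in each runspace: search one file, return its MatchInfo objects.
    $worker = 'param($path, $pattern) Select-String -LiteralPath $path -Pattern $pattern'

    # The pool limits how many runspaces are active at the same time.
    $pool = [runspacefactory]::CreateRunspacePool(1, $MaxThreads)
    $pool.Open()
    try {
        $files = Get-ChildItem -LiteralPath $Root -Recurse |
            Where-Object { -not $_.PSIsContainer }

        # Queue one asynchronous pipeline per file.
        $pending = @(foreach ($file in $files) {
            $ps = [powershell]::Create()
            $ps.RunspacePool = $pool
            [void]$ps.AddScript($worker).AddArgument($file.FullName).AddArgument($Pattern)
            New-Object psobject -Property @{ Shell = $ps; Handle = $ps.BeginInvoke() }
        })

        # Collect results in submission order; EndInvoke blocks until that pipeline is done.
        foreach ($item in $pending) {
            try { $item.Shell.EndInvoke($item.Handle) }
            finally { $item.Shell.Dispose() }
        }
    }
    finally {
        $pool.Close()
        $pool.Dispose()
    }
}
```

Usage would be along the lines of `Search-TreeParallel -Root C:\logs -Pattern 'error\s+\d+' -MaxThreads 8` (path and thread count are examples). Whether this beats a plain Select-String pipeline depends on the workload: every file pays the cost of setting up its own pipeline, so a tree of many small files may see little or no gain, and batching several files per pipeline is a natural next step to measure.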