Problems with [regex]::matches - different results with array vs string?

  • Question

  • I'm new here and running into an issue when using [regex]::Matches to parse content out of a multipart form request (a file upload). I read the incoming data into a variable using Request.InputStream.Read(), which results in a System.Array object (each line is an array item). When I run [regex]::Matches against it, the array is somehow turned into a string and all of the newlines are removed in the process. My match is found, but it no longer contains any newline characters (`n), so the data ends up mangled into one continuous string. I've tried converting the System.Array into a string using -join with a newline, as well as manipulating the $OFS variable, but then the regex match fails completely. I'm not sure what I am doing wrong and would love some help or a push in the right direction.

    The incoming data I receive looks something like this (i.e. this is what I am parsing):

    -----------------------------2865327977109
    Content-Disposition: form-data; name="file"; filename="test.txt"
    Content-Type: text/plain
    
    This is a simple upload test...
    success or fail?
    
    content....
    ...
    ...
    ..
    .
    #EOF
    -----------------------------2865327977109
    Content-Disposition: form-data; name="updir"
    
    C:\Users\CoolUsernameHere\AppData\Local\Temp\
    -----------------------------2865327977109
    Content-Disposition: form-data; name="uname"
    
    test.tmp
    -----------------------------2865327977109
    Content-Disposition: form-data; name="submitBtn"
    
    Upload File
    -----------------------------2865327977109--

    Here is the code that receives the above data:

    $context = $listener.GetContext();

    $requestContentType = $context.Request.ContentType;

    # Content Type String Received: multipart/form-data; boundary=---------------------------49902129711239
    $boundary = $requestContentType.split("=")[-1]
    $length = $context.Request.contentlength64;
    $buffer = new-object "byte[]" $length;
    [void]$context.Request.inputstream.read($buffer, 0, $length);
    $reqData = [system.text.encoding]::ascii.getstring($buffer);
    # Now $reqData contains a System.Array mirroring the sample content already posted above


    I use the following regex to match my boundary markers and create parsable chunks, from which I later pull the file content and the other parameter/value pairs:

    [regex]::matches($reqData, "$boundary(.+)$boundary(.+)$boundary(.+)$boundary(.+)$boundary" , @('MultiLine','Ignorecase')).Groups[1].Value

    This does find the content I am looking for. However, it merges it into a single string and removes all newlines, which makes it impossible for me to write the uploaded file content to disk: the formatting is lost and there is no clean way to rebuild it. The result ends up looking like this:

     Content-Disposition: form-data; name="file"; filename="test.txt" Content-Type: text/plain  This is a simple upload test... success or fail?  content.... ... ... .. . #EOF

    Is there a way I can use [regex]::Matches and keep the newlines (`r`n) in the results, or properly convert the data to a string and still have the regex match? I feel like I am just missing something here.

    NOTE: I would be open to other options for parsing this upload data, but I was unable to find any real articles or write-ups on handling POST file uploads. I only found how to send them, which is the easier part and not what I am concerned with.

    Thursday, July 13, 2017 3:07 PM

All replies

  • Normally web data has no line breaks or uses only a single LF character.  You will not get an array of lines without splitting the text on the LF.

    Matches will not match as expected with an array.

    Using your method, just split the text into an array before outputting to a file.  The file will then be built with CrLf pairs.
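
    A minimal sketch of why an array and a string give different results. The `$lines` array and the pattern are hypothetical stand-ins, not the poster's data. When an array is passed where a string is expected, PowerShell joins the items with `$OFS` (a space by default), which would explain the space-separated output in the question. A newline-joined string only matches across lines when the `Singleline` option is set.

```powershell
# Hypothetical stand-in for an array of lines (e.g. what Get-Content returns).
$lines = @('first line', 'second line', '', 'fourth line')

# Passing the array where a [string] is expected joins the items with $OFS
# (a single space by default), so the line breaks are already gone.
$fromArray = [regex]::Match($lines, 'first(.+)fourth').Groups[1].Value

# Joining with a newline keeps the breaks, but '.' does not match `n unless
# the Singleline option is set; Multiline only changes how ^ and $ behave.
$text = $lines -join "`n"
$noSingleline = [regex]::Match($text, 'first(.+)fourth', 'Multiline').Success
$withSingleline = [regex]::Match($text, 'first(.+)fourth', 'Singleline').Groups[1].Value
```

    Here `$fromArray` contains no newlines, `$noSingleline` is `$false`, and `$withSingleline` keeps the line breaks between the two markers.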


    \_(ツ)_/

    Thursday, July 13, 2017 3:39 PM
  • The problem with splitting before output is that the newlines in the file content get wiped out, so split is no longer usable for re-joining the data. I ended up writing a loop that walks through the $reqData array and fills my content buckets, so I can retain the proper formatting of the uploaded content. Thanks for the help anyway!

    Here is what I ended up using, for reference:

    $fileContent = "";
    $fileUpDir = "";
    $fileUniqName = "";
    $boundaryCounter = 0;
    $sectionLineCounter = 0;
    $tempBucketContent = "";
    $tempBucketName = "";
    $mainBucket = @{};
    for($i=0; $i -lt $reqData.Length; $i++) {
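    	if($reqData[$i] -like "*$boundary*") { # body delimiters are "--" + boundary (the last one also ends with "--"), so wildcards are needed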
    		$boundaryCounter++;
    		if($tempBucketName -ne "") {
    			$mainBucket.add($tempBucketName, $tempBucketContent);
    			$sectionLineCounter = 0;
    			$tempBucketName = "";
    			$tempBucketContent = "";
    		}
    		Write-Host "BOUNDARY MARKER";
    	} else {
    		$sectionLineCounter++;
    		if($sectionLineCounter -eq 1) {
    			if($reqData[$i] -match 'name="file"') {
    				$tempBucketName = "uploadFileContent";
    			} elseif($reqData[$i] -match 'name="upDir"') {
    				$tempBucketName = "uploadFileDir";
    			} elseif($reqData[$i] -match 'name="uname"') {
    				$tempBucketName = "uploadFileName";
    			}
    		} elseif($sectionLineCounter -eq 2) {
    			continue;
    		} elseif($sectionLineCounter -eq 3) {
    			if($tempBucketName -eq "uploadFileContent") {
    				continue;
    			} else {
    				$tempBucketContent += $reqData[$i];
    			}
    		} elseif($sectionLineCounter -gt 3) {
    			$tempBucketContent += $reqData[$i];
    			if($tempBucketName -eq "uploadFileContent") {
    				$tempBucketContent += "`n";
    			}
    		}
    		Write-Host "[$($sectionLineCounter)] CONTENT: $($reqData[$i])";
    	}
    }
    # $mainBucket now contains all of my desired values & file content properly formatted - just use a .Trim() to clean up any extra newlines we might have added from looping...

    Thursday, July 13, 2017 5:49 PM
  • Not new line. Split on LF only.  RegEx does not remove anything.


    \_(ツ)_/

    Thursday, July 13, 2017 6:39 PM
  • Mistake on my part: I was loading a saved sample of the received data into a shell session to experiment with. Get-Content returns it as an array of lines, while the request data actually comes in as a single string. I flipped things around and now it is working as intended. Thanks for pushing me in the right direction.
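
    For reference, a minimal sketch of parsing the body as a single string, not necessarily what the poster ended up with. The sample body, the `$parts` table and the variable names are assumptions for illustration. It relies on the multipart convention that each delimiter line is "--" followed by the boundary, and that a blank line (CRLF CRLF) separates a part's headers from its content.

```powershell
# Hypothetical sample standing in for the decoded request body ($reqData).
$boundary = '---------------------------2865327977109'
$reqData = @(
    "--$boundary"
    'Content-Disposition: form-data; name="file"; filename="test.txt"'
    'Content-Type: text/plain'
    ''
    'This is a simple upload test...'
    'success or fail?'
    "--$boundary"
    'Content-Disposition: form-data; name="uname"'
    ''
    'test.tmp'
    "--$boundary--"
    ''
) -join "`r`n"

$parts = @{}
$delimiter = [regex]::Escape("--$boundary")
foreach ($chunk in [regex]::Split($reqData, $delimiter)) {
    # Skip the empty preamble and the closing "--" marker: neither has a header/body separator.
    if ($chunk -notmatch '\r\n\r\n') { continue }

    # Everything before the first blank line is headers, the rest is content.
    $headers, $body = $chunk -split '\r\n\r\n', 2

    # The CRLF just before the next delimiter belongs to the delimiter, not the content.
    $body = $body -replace '\r\n\z', ''

    # \b keeps 'filename="..."' from being mistaken for the field name.
    if ($headers -match '\bname="([^"]+)"') {
        $parts[$Matches[1]] = $body
    }
}
```

    After this runs, `$parts['file']` holds the uploaded text with its CRLF line breaks intact and `$parts['uname']` holds `test.tmp`. Note that decoding the buffer as ASCII, as in the original code, only preserves plain-text uploads; bytes above 127 are replaced during decoding.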
    Thursday, July 13, 2017 9:22 PM
  • Hi LearningOnTheFly

    Thanks for your posting here.

    You could mark the useful reply as an answer to help other community members find the helpful reply quickly.

    Best Regards,

    Candy



    Friday, July 14, 2017 8:06 AM