Preventing Duplicate Partition Key and Row Key columns in Stream Analytics output to Table Storage

  • Question

  • I have an input stream of changing item data as follows:

     

    ItemId - varchar(100)

    MilliSecondsTo9999 - varchar(20)

    Value - varchar(MAX)

     

    In Table Storage this groups by item, then orders by timestamp descending. This gives me the best performance for how I need to consume the history data.
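For context, a key like MilliSecondsTo9999 is typically derived by counting down to a far-future date, so that newer events produce smaller values and sort first. The question doesn't show the derivation, so the helper below is only an illustrative sketch of that pattern:

```python
from datetime import datetime, timezone

def milliseconds_to_9999(ts: datetime) -> str:
    """Zero-padded count of milliseconds remaining until year 9999.

    Newer events produce smaller counts, so ascending lexicographic
    order on this string (Table Storage's RowKey order) is newest-first.
    """
    end_of_time = datetime(9999, 12, 31, tzinfo=timezone.utc)
    remaining_ms = int((end_of_time - ts).total_seconds() * 1000)
    return str(remaining_ms).zfill(20)

earlier = milliseconds_to_9999(datetime(2015, 8, 29, 2, 38, 32, tzinfo=timezone.utc))
later = milliseconds_to_9999(datetime(2015, 8, 29, 2, 38, 33, tzinfo=timezone.utc))
# the later event's key is lexicographically smaller, so it sorts first
```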

     

    I configure Azure Table Storage as output, setting the PartitionKey to "ItemId" and the RowKey to "MilliSecondsTo9999".

    If I set the query as...

    "SELECT
        ItemId, MilliSecondsTo9999, Value
    INTO
        output
    FROM
        input"

     

    ...I get SIX resulting table columns...

    PartitionKey, RowKey, TimeStamp, itemid, millisecondsTo9999, value

     

    "Place1_Area1_Item1", "1234567891234", "29/08/2015 02:38:33", "Place1_Area1_Item1", "1234567891234", "RED"
    "Place1_Area1_Item1", "1234567892234", "29/08/2015 02:38:32", "Place1_Area1_Item1", "1234567892234", "BLUE"

    ...but I only wanted FOUR:

    PartitionKey, RowKey, TimeStamp, value

    "Place1_Area1_Item1", "1234567891234", "29/08/2015 02:38:33", "RED"
    "Place1_Area1_Item1", "1234567892234", "29/08/2015 02:38:32", "BLUE"

     

    As the partition key and row key are quite long strings, this is a huge waste of space, almost doubling the size of the table.

     

    But if I remove the extra columns from the query...

    "SELECT
        Value
    INTO
        output
    FROM
        input"

     

    ...the output fails, because the two key columns are no longer available to reference from the output stream.

    What am I doing wrong?

    Thanks,
    Pete


    #PEJL
    Got any nice code? If you invest time in coding an elegant, novel or impressive answer on MSDN forums, why not copy it over to TechNet Wiki, for future generations to benefit from! You'll never get archived again, and you could win weekly awards!

    Have you got what it takes to become this month's TechNet Technical Guru? Join a long list of well known community big hitters, show your knowledge and prowess in your favoured technologies!







    Saturday, August 29, 2015 11:26 AM

Answers

  • Also the following info may help. We will look into adding this feature through the UI in the future.

    The value of ColumnsToRemove is currently separated by '+', and the value for your case is shown below:

      "ColumnsToRemove":"ItemId+MilliSecondsTo9999"

    Your query should also still include the partitionKey and rowKey columns.


    Venkat Chilakala

    • Marked as answer by Zafar Abbas Friday, September 4, 2015 4:29 PM
    Thursday, September 3, 2015 7:09 PM

All replies

  • Please put this in as a feedback suggestion for the ASA team: being able to control the output columns, especially when a column is used as the PartitionKey or RowKey:

    http://feedback.azure.com/forums/270577-azure-stream-analytics


    MVP Program: http://www.microsoft.com/mvp

    Tuesday, September 1, 2015 12:46 PM
  • Thanks for this feedback.

    Currently there is no way in the Portal UX to specify columns to be removed from the output. There is, however, a property on the REST API that you can set on the Output:

    https://msdn.microsoft.com/en-us/library/azure/dn835015.aspx

    Under: Data Source – Azure Table Storage

    columnsToRemove (optional): If specified, each item in the array is the name of a column to remove (if present) from output event entities.

    You can use Azure PowerShell to update your job to remove the unnecessary columns from the output.
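    For illustration only, the edited output definition might then contain a datasource fragment like the following. The property names and shape are assumed from the REST documentation; the account, key, and table values are placeholders, and note that another reply in this thread suggests the value may need to be a '+'-separated string rather than an array, so check what your job version accepts:

```json
{
  "DataSource": {
    "Type": "Microsoft.Storage/Table",
    "Properties": {
      "AccountName": "yourstorageaccount",
      "AccountKey": "yourstoragekey",
      "Table": "ItemHistory",
      "PartitionKey": "ItemId",
      "RowKey": "MilliSecondsTo9999",
      "ColumnsToRemove": [ "ItemId", "MilliSecondsTo9999" ]
    }
  }
}
```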


    Zafar Abbas

    Thursday, September 3, 2015 6:23 PM
  • Thank you!

    http://feedback.azure.com/forums/270577-azure-stream-analytics/suggestions/9640131-preventing-duplicate-partition-key-and-row-key-col




    Saturday, September 5, 2015 6:09 PM
  • This workaround can be done in PowerShell as well by editing the Output JSON.

    Download the Azure PowerShell SDK for Windows using the Web Platform Installer (the "Windows install" link).

    Launch Windows PowerShell ISE from the Start menu. Paste the code into the upper script window, make the necessary edits, then highlight and run one code block at a time.


    #1. Log into Azure
    Login-AzureRmAccount

    #2. Set up variables: the streaming job name, resource group, a working folder, and which output you want to fix
    $RG = "YourResourceGroup"
    $JobName = "YourJobName"
    $folder = "C:\Temp"
    $outputname = "YourOutput"

    #3. Stop the job to make an edit
    Stop-AzureRmStreamAnalyticsJob -ResourceGroupName $RG -Name $JobName

    #4. Copy the existing output definition into a JSON file (and echo it to the console)
    (Get-AzureRmStreamAnalyticsOutput -JobName $JobName -ResourceGroupName $RG -Name $outputname).PropertiesInJson | Out-File $folder\JobOutput.json
    (Get-AzureRmStreamAnalyticsOutput -JobName $JobName -ResourceGroupName $RG -Name $outputname).PropertiesInJson | Write-Host

    #5. Manually add the storage key into the JSON using a text editor
    notepad.exe $folder\JobOutput.json
    #6. Add or edit this property: "ColumnsToRemove": ["column1","column2"]
    #7. Save the JSON file as an ANSI-encoded text file

    #8. Deploy the new output
    New-AzureRmStreamAnalyticsOutput -JobName $JobName -ResourceGroupName $RG -Name $outputname -File $folder\JobOutput.json

    #9. Test the output - TRUE is the good outcome
    Test-AzureRmStreamAnalyticsOutput -JobName $JobName -ResourceGroupName $RG -Name $outputname

    #10. Start the job
    Start-AzureRmStreamAnalyticsJob -ResourceGroupName $RG -Name $JobName -OutputStartMode CustomTime -OutputStartTime 2016-08-22T00:00:00Z

     


    Didn't get enough help here? Submit a case with the Microsoft Customer Support teams for deeper investigation - Azure service support: https://manage.windowsazure.com/?getsupport=true For on Premise software support go here instead: http://support.microsoft.com/select/default.aspx?target=assistance

    Tuesday, August 23, 2016 8:22 PM