Options for accessing HDInsight - all is not what it seems

  • Question

  • Hi,

    I recently attended the PASSBAC summit/conference in Santa Clara and was particularly interested in Power Query's ability to access HDInsight clusters. Broadly, there are three methods for doing so.

    (A) Via the From Azure menu:

    1) Azure Blob Storage

    2) Azure HDInsight

    (B) Via the From Other Sources menu

    3) ODBC

    At the conference it was confirmed that methods (A)1 and (A)2 directly access the Storage Account that backs a running cluster, so you do not actually leverage the compute nodes or Hive MapReduce. The advantage, of course, is that the client (in this case Power Query and Excel) can continue to use the data irrespective of whether the cluster is running.

    My tests show no substantial difference in performance between (A)1 and (A)2 (I tested this both while the cluster was running and after it had been deleted), and it seems that only blobs stored within the Storage Account container can be accessed. The queries generated for (A)1 and (A)2 are also very similar.

    When I compared this to method (B)3, I saw a substantial performance improvement, so I assume the compute nodes and Hive MapReduce are being leveraged. Of course, this method is only available for as long as the cluster is running.
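To make the comparison above concrete, here is a minimal Python sketch of the three methods as described in this post. It is only a model of the reasoning, not a connection to Azure: the class and function names (`AccessMethod`, `usable_methods`) are hypothetical, and the mapping of ODBC to Hive on the compute nodes is the assumption stated above rather than something confirmed here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Backend(Enum):
    # Where the query is actually served from, per the post above.
    STORAGE_ACCOUNT = "blobs read directly from the Storage Account"
    HIVE_ON_CLUSTER = "Hive running on the cluster's compute nodes (assumed)"


@dataclass(frozen=True)
class AccessMethod:
    menu: str
    name: str
    backend: Backend

    def available(self, cluster_running: bool) -> bool:
        # Direct blob reads only need the Storage Account to exist;
        # anything routed through Hive needs a running cluster.
        if self.backend is Backend.STORAGE_ACCOUNT:
            return True
        return cluster_running


# The three options listed in the post; names match the Power Query menus.
METHODS = [
    AccessMethod("From Azure", "Azure Blob Storage", Backend.STORAGE_ACCOUNT),
    AccessMethod("From Azure", "Azure HDInsight", Backend.STORAGE_ACCOUNT),
    AccessMethod("From Other Sources", "ODBC", Backend.HIVE_ON_CLUSTER),
]


def usable_methods(cluster_running: bool) -> List[str]:
    """Return the names of the methods that still work in the given state."""
    return [m.name for m in METHODS if m.available(cluster_running)]
```

Calling `usable_methods(False)` returns only the two From Azure options, which matches the observation that they keep working after the cluster is deleted, while ODBC appears only when `cluster_running` is `True`.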

    Thoughts?


    'If this was helpful, please mark as an answer or as being helpful'

    Monday, June 1, 2015 12:52 AM

All replies

  • Your understanding appears quite complete to me. Is there an option you'd like to see that isn't currently available?
    Monday, June 1, 2015 3:20 PM
  • I suppose I am questioning why the Azure menu has two options, Azure Blob Storage and Azure HDInsight, if they both essentially do the same thing.

    'If this was helpful, please mark as an answer or as being helpful'

    Monday, June 1, 2015 11:20 PM
  • Mostly for end-user discoverability. Some users might not know that clicking Azure Blob Storage is what they need to do to access HDInsight.

    Thanks, Hadeel

    Tuesday, June 2, 2015 7:37 PM