How to Connect Excel to Hadoop on Azure via HiveODBC

How to Connect Excel to Hadoop on Azure via HiveODBC

One key feature of Microsoft’s Big Data Solution is solid integration of Apache Hadoop with the Microsoft Business Intelligence (BI) components. A good example of this is the ability for Excel to connect to the Hive data warehouse framework in the Hadoop cluster. This section walks you through using Excel via the Hive ODBC driver

 Important!
This wiki topic may be obsolete.
The wiki topics on Windows Azure HDInsight Service are no longer updated by Microsoft. We moved the content to windowsazure.com where we keep it current. This topic can be found at How to Connect Excel to Windows Azure HDInsight via HiveODBC.

Install the Hive ODBC Driver

Prerequisites:

  • Ensure that Excel 2010 64-bit is already installed before installing the HiveODBC Driver. 
  • Download the 64-bit Hive ODBC driver MSI file from the Portal by clicking the Download tile after logging into HadoopOnAzure.com.
  • Note, the HiveODBC driver includes both the HiveODBC Driver and the Excel Hive-AddIn.

 

  1. Double click HiveODBCSetupx64.msi to start the installation.
  2. Read the license agreement.
  3. If you agree to the license agreement, click I agree and Install.
  4. Once the installation has completed, click Finish to exit the Setup Wizard.

 

Install the Microsoft Excel Hive Add-In

Prerequisites:

  • Microsoft Excel 2010 64-bit
  • 64bit Hive ODBC driver installed

 

  1. Start Microsoft Excel 2010 64-bit.
  2. You will be asked to install the HiveExcel add-in. Click Install.
  3. Once the add-in has been installed, click the Data tab in Microsoft Excel 2010. You should see the Hive Panel as shown in the screenshot below.

Create a Hive ODBC Data Source to use with Excel

 

  1. Click Start->Control Panel to launch the Control Panel for Microsoft Windows.
  2. In the Control Panel, Click System and Security->Administrative Tools->Data Sources (ODBC). This will launch the ODBC Data Source Administrator dialog.
  3. In the ODBC Data Source Administrator dialog, click the System DSN tab.
  4. Click Add to add a new data source.
  5. Click the HIVE driver in the ODBC driver list. 
  6. Then click Finish. This will launch the ODBC Hive Setup dialog shown in the screenshot below.
  7. Enter a data source a name in the Data Source Name box. For Example, “MyHiveData”.
  8. In the Host box , enter the host name of the cluster you created on the portal. For example, “myhadoopcluster.cloudapp.net”.
  9. Enter the username you used to authenticate on the portal.
  10. Click OK to save the new Hive data source.
  11. Click OK to close the ODBC Data Source Administrator dialog.

 

Using the Excel Hive Add-In

 

  1. Go to HadoopOnAzure.com and click Sign In to sign into your Hadoop cluster.
  2. Click the Open Ports tile to access port configurations for your cluster.
  3. On the Configure Ports page, click the toggle switch for the ODBC Server port to turn it on.
  4. Open Microsoft Excel 2010 64-bit.
  5. In Microsoft Excel 2010 64-bit, click the Data tab.
  6. Click the Hive Panel to open the Hive panel in Excel.
  7. In the drop-down list labeled Select or Enter Hive Connection, select the data source name you previously created.
  8. You will be asked to enter the password to authenticate with the cluster on the portal. Enter the password for the user name.
  9. In the drop-down list labeled Select the Hive Object to Query, select hivesampletable [Table].
  10. Select the checkbox for each of the columns in the table. The Hive Query panel should look similar to the following.
     

11.  Click Execute Query.

 

 

More Information

 

Sort by: Published Date | Most Recent | Most Useful
Comments
  • Cool.. tried it works like a charm

  • will this odbc driver work for a non azure based hadoop cluster?

  • We are planning to implement Proof of Concepts(POC) on using Microsoft BI stack/Hadoop Big data.  We have HadoopOnCloudera cluster is available with two nodes.  Microsoft provided Hive ODBC driver and Hive Add in Excel to using in Powerpivot Excel 2010.  We are planning to install these drivers and connect to HadoopCloudera thru Excel Powerpivot 2010 and publish it in SharePoint.

    I have below questions:

    1. Is this Hive driver is only for HadoopOnAzure or Can we connect to HadoopOnCloudera using Microsoft Hive ODBC drivers from Excel Powerpivot2010.  Can we also connect using SSRS/SSAS.  Are there any plug-ins available from SSIS?

    2. Hive ODBC driver and Add in Excel are Free or do we need to obtain license

    3. Is there any other better use case to implement POC with MS BI stack than what we are thinking

  • Yes, it is work very well with non Azure based Hadoop. We are working with Cloudera Hadoop and Microsoft ODBC driver.

  • Hi Michael,

    May I know what specific setting you have to do/tune on your Cloudera Hadoop in order to make this ODBC driver work?

    In my scenario:

    1. Tested Excel with this ODBC driver with Azure and it was working, so I think the setting on Excel/Windows ODBC side is okay.

    2. When trying to connect to Cloudera Hadoop, we made sure Hive server is working. When connecting from Excel to Cloudera, we chose No Authentication and it seemed to connect, but there was no table shown.

    Thanks in advance.

    Chih-Hua

  • Hi Chih-Hua,

    it could be many things like port,permissions on hadoop .....

    Drop me mail with your details I will try to help you.

    michaeln @ liveperson com

  • I have the same question as RR Netha.

  • 2 questions:

    1) Is a 32 bit release also possible, largely so I can use it in SSDT for SSIS?

    2) Will it support usage by SSIS?

    Cheers, James

  • Hi BI Monkey,  

    For the  32-bit question, check out this forum post:  social.msdn.microsoft.com/.../83862898-b578-4e1a-870d-ab07ad9e699d

    Will it support usage by SSIS, the answer is that there are plans to have an ODBC driver to connect to Hadoop.   You can configure ODBC Connection Manager in SSIS to connect, see this post:  

    alexeikh.wordpress.com/.../moving-data-from-hive-table-in-ms-hadoop-into-sql-server-using-ssis-server

    HTH,

    Michele

  • Hi RR Netha and vyasmanish,

    I believe that most of your questions were answered in the forums.  But just in case:

    Yes, the Hive driver will work with the Cloudera distribution.

    For your 2nd question, visit Denny's blog and read his article on Connecting PowerPivot to Hadoop on Azure -- Self-Service BI to Big Data in the Cloud.

    And, yes, the driver and add-in are free.

Page 1 of 2 (15 items) 12