The Microsoft deployment of Apache Hadoop for Windows lets you set up a private Hadoop cluster on Azure. One of the included administration/deployment tools is an Interactive Console for JavaScript and Hive. This video introduces the Interactive Hive console. Developer Lengning Liu demonstrates running several Hive commands against your Hadoop cluster.

See Also

  • More Videos about Hadoop Services on Windows and Windows Azure
  • Apache Hadoop Services on Windows - wiki Homepage
  • Microsoft's Big Data channel on YouTube


    Transcript (edited for readability)

    Introduction to the Hadoop Services on Azure Hive Console (video)

    Hi, my name is Lengning Liu and I'm a Developer on the Microsoft Hadoop Services team.  In this video I will introduce you to the Interactive Hive console on Hadoop for Windows Azure. I will show you how to use the Interactive Hive console to perform Hive queries against the cluster.

    To enter the console, I start from the home page and click on the Interactive Console tile.
    There are two consoles within this environment. One is the Interactive JavaScript console, which is covered in a separate video. In this demo I select Hive.

    Hive Queries

    The Interactive Hive console opens a simple webpage where I can type in arbitrary Hive queries. Once a query has finished on the server side, the results are shown in the gray area. Let's do a couple of quick demos. I type my queries in the edit pane below the gray box.

    1. show tables;
      This query displays the names of my cluster tables.  In the current cluster I have 2 tables created; one is called hivesampletable and the other is maxquerydwelltime.
    2. describe hivesampletable;
      This query returns the schema of the table.   

      As you can see, this table has the typical dimensions that you would have for a weblog, plus a couple of calculated measurements, such as dwelltime.

    3. select devicemodel, SUM(querydwelltime) as totaldwelltime from hivesampletable group by devicemodel order by totaldwelltime DESC limit 10;
      This query gives me the total dwelltime for each device model. The totals are given in descending order by totaldwelltime for the top 10 device models only.
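
    To make the third query's logic concrete, here is a minimal sketch that runs the same GROUP BY / ORDER BY / LIMIT statement against an in-memory SQLite table from Python. SQLite is only a stand-in for Hive here, and the sample rows, the column types, and the helper name top_device_models are assumptions made for illustration; they are not part of the demo.

```python
import sqlite3

# Hypothetical sample rows (devicemodel, querydwelltime).
# The real hivesampletable has more columns and far more data.
SAMPLE_ROWS = [
    ("PhoneA", 10.0),
    ("PhoneB", 4.5),
    ("PhoneA", 2.5),
    ("PhoneC", 7.0),
    ("PhoneB", 1.0),
]


def top_device_models(rows, limit=10):
    """Return (devicemodel, totaldwelltime) pairs, largest total first."""
    conn = sqlite3.connect(":memory:")
    try:
        # Column types are assumed; only the two columns the query uses are created.
        conn.execute(
            "CREATE TABLE hivesampletable (devicemodel TEXT, querydwelltime REAL)"
        )
        conn.executemany("INSERT INTO hivesampletable VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT devicemodel, SUM(querydwelltime) AS totaldwelltime "
            "FROM hivesampletable "
            "GROUP BY devicemodel "
            "ORDER BY totaldwelltime DESC "
            "LIMIT ?",
            (limit,),
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for model, total in top_device_models(SAMPLE_ROWS):
        print(model, total)
```

    With these made-up rows, each device model appears once in the output with its summed dwell time, and the largest total comes first.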

    Monitor the Job Progress

    Since this query will take some time to run on the server side, let’s go to the remote desktop session we established previously and watch the progress.

    1. Go to the headnode Administration Page to see the job status.  Active MapReduce jobs display under the Running Jobs heading.
    2. Click on the JobId to jump to the webpage for a particular job. I see that this job has 2 mappers and 1 reducer allocated. The job status is displayed as well.
    3. The job is considered complete when the map and reduce tasks show 100% and the Job Cleanup shows Successful.

      For this particular query there will be another stage, a 2nd stage job, created by Hive.

    4. Navigate back to the Administration Page and watch for the 2nd stage job to be created. 
    5. Once it appears, click on the JobId to jump to the webpage for that particular job.
    6. When the job completes successfully, I minimize the remote desktop session and return to the portal page.
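
    The demo does not explain why Hive creates a 2nd stage job. One common explanation, offered here as an assumption rather than something stated in the video, is that the GROUP BY aggregation runs as one MapReduce job and the global ORDER BY with LIMIT runs as a second one. The sketch below mimics that split in plain Python; the function names and sample rows are hypothetical.

```python
from collections import defaultdict


def stage_one_aggregate(rows):
    """Stage 1 (assumed): group rows by devicemodel and sum the dwell times."""
    grouped = defaultdict(list)
    for devicemodel, dwelltime in rows:  # map output, shuffled by key
        grouped[devicemodel].append(dwelltime)
    # Reduce step: one total per device model.
    return [(model, sum(times)) for model, times in grouped.items()]


def stage_two_sort_and_limit(totals, limit=10):
    """Stage 2 (assumed): sort all totals in descending order, keep the first rows."""
    ordered = sorted(totals, key=lambda pair: pair[1], reverse=True)
    return ordered[:limit]


if __name__ == "__main__":
    # Hypothetical input rows (devicemodel, querydwelltime).
    rows = [("PhoneA", 10.0), ("PhoneB", 4.5), ("PhoneA", 2.5), ("PhoneC", 7.0)]
    totals = stage_one_aggregate(rows)
    for model, total in stage_two_sort_and_limit(totals):
        print(model, total)
```

    The second stage needs every total from the first stage before it can sort, which is consistent with the 2nd stage job appearing only after the first one completes.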

    The result is shown on the screen.  I see the top 10 device models that were used to issue queries, and the total query dwell time associated with each.

    Thank you for viewing this demo. I hope you found it helpful.