Introduction

Working with the Azure Data Lake Store can be difficult, especially when performing actions on several items at once. As there is currently no GUI tool for handling this, PowerShell can be used to perform various tasks. The toolkit described in this article contains several scripts that make automation in the Data Lake a little easier.

How To Use

  • Download AzureDataLakeStoreTookit.zip from TechNet Gallery
  • Unzip to a local folder
  • Run the scripts in an Admin PowerShell console. Make sure the PowerShell execution policy is not set to Restricted
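
The steps above can be sketched in PowerShell as follows. This is a minimal sketch, not part of the toolkit: both paths are assumptions and must be replaced with your own download location and target folder. Expand-Archive requires PowerShell 5.0 or later, and the policy change applies to the current console session only.

```powershell
# Hypothetical paths; replace with your own download location and target folder
$zipFile = "C:\Downloads\AzureDataLakeStoreTookit.zip"
$toolkitFolder = "C:\Scripts\AzureDataLakeStoreToolkit"

# Unzip the toolkit to a local folder
Expand-Archive -Path $zipFile -DestinationPath $toolkitFolder

# Show the effective execution policy for every scope
Get-ExecutionPolicy -List

# Allow local scripts to run in this console session only
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope Process
```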

Details

This section lists all the scripts in the toolkit and explains their purpose. Each script is deliberately designed to do one thing only, to avoid complex error handling. Each script includes a comment section with a synopsis, description, example(s) and notes; this has been removed from the code sections below. Each script also includes variables which must be changed to match your environment. The scripts are listed alphabetically.
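
For readers who have not seen such a comment section, the sketch below shows the general shape of PowerShell comment-based help with the four keywords mentioned above. The function Get-ItemCount and its help text are hypothetical and are not taken from the toolkit.

```powershell
function Get-ItemCount
{
    <#
    .SYNOPSIS
        Counts the items passed in.
    .DESCRIPTION
        Hypothetical example that only illustrates the layout of a
        comment-based help section.
    .EXAMPLE
        Get-ItemCount -items @("a", "b")
    .NOTES
        Illustrative only; not part of the toolkit.
    #>
    Param(
        [object[]]$items
    )
    # Return the number of elements in the array
    return $items.Count
}
```

With a section like this in place, Get-Help Get-ItemCount -Full displays the synopsis, description, examples and notes.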

CountFiles.ps1

Count all files in a specific folder

function CountFiles
{
    Param(
        [string]$rootFolder
    )
    $items = Get-AzureRmDataLakeStoreChildItem -Account $dataLakeStoreName -Path $rootFolder
 
    $count = 0
 
    Write-Host "Number of files in $rootFolder :"
 
    foreach ($item in $items)
    {
        if ($item.Type -eq "FILE")
        {
            $count += 1
        }
    }
    return $count
}
 
Login-AzureRmAccount
$dataLakeStoreName = "myDataLakeStore" #Replace value with your own Data Lake Store name
$folder = "/user" #Replace value with the folder you want to count files in
CountFiles $folder


DeleteTMPFiles.ps1

Delete all files with a certain extension (configurable). Default value is TMP

Login-AzureRmAccount
 
$dataLakeStoreName = "myDataLakeStore" #Replace value with your own Data Lake Store name
$rootFolder = "/user/myfolder" #Replace value with the folder you want to delete files from
$count = 0
 
$items = Get-AzureRmDataLakeStoreChildItem -Account $dataLakeStoreName -Path $rootFolder | Where-Object {$_.Type -eq "FILE" -and $_.Name -like "*.TMP"} #Replace .TMP with the extension you want to delete (the match is case-insensitive)
foreach ($item in $items)
{
    $fileName = $item.Name
    Write-Host "Deleting $fileName"
    Remove-AzureRmDataLakeStoreItem -Account $dataLakeStoreName -Path $rootFolder/$fileName -Force
    $count += 1
}
 
Write-Host "`n$count file(s) were deleted"
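
When selecting files by extension, a substring test (the Contains method) and a wildcard test (the -like operator with a leading asterisk) do not pick the same items. The sketch below compares the two on an invented listing, so it runs without access to a Data Lake Store; the item names and the $extension variable are assumptions made for illustration.

```powershell
# Hypothetical listing; the names are invented for illustration only
$items = @(
    [pscustomobject]@{ Name = "load1.TMP";    Type = "FILE" },
    [pscustomobject]@{ Name = "load2.tmp";    Type = "FILE" },
    [pscustomobject]@{ Name = "data.TMP.csv"; Type = "FILE" },
    [pscustomobject]@{ Name = "archive.TMP";  Type = "DIRECTORY" }
)
$extension = ".TMP"

# Substring match: case-sensitive, matches anywhere in the name, ignores item type
$bySubstring = @($items | Where-Object { $_.Name.Contains($extension) })

# Wildcard match: case-insensitive, anchored at the end of the name, files only
$byExtension = @($items | Where-Object { $_.Type -eq "FILE" -and $_.Name -like "*$extension" })

Write-Host "Substring match:" ($bySubstring.Name -join ", ")
Write-Host "Extension match:" ($byExtension.Name -join ", ")
```

Running a comparison like this against a real listing before calling Remove-AzureRmDataLakeStoreItem is a cheap way to check which items a filter would delete.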


DownloadFiles.ps1

Download all files in a folder to a folder on the local disk

Login-AzureRmAccount
 
$dataLakeStoreName = "myDataLakeStore" #Replace value with your own Data Lake Store name
$rootFolder = "/user/myfolder" #Replace value with the folder you want to download
$downloadDest = "c:\temp\"  #Replace value with the download destination folder
 
Export-AzureRmDataLakeStoreItem -Account $dataLakeStoreName -Path $rootFolder -Destination $downloadDest -Force -Recurse


GetFolderContent.ps1

List all files in a folder (recursive)

function GetFolderContent
{
    Param(
        [string]$rootFolder
    )
    $items = Get-AzureRmDataLakeStoreChildItem -Account $dataLakeStoreName -Path $rootFolder
 
    Write-Host "`nContents in $rootFolder"
 
    foreach ($item in $items)
    {
        if ($item.Type -eq "DIRECTORY")
        {
            $nextFolder = $item.Name
 
            if ($rootFolder -eq "/")
            {
                GetFolderContent "/$nextFolder"
            }
            else
            {
                GetFolderContent $rootFolder/$nextFolder
            }    
        }
        if ($item.Type -eq "FILE")
        {
            Write-Host $item.Name
        }
    }
    return $null
}
 
Login-AzureRmAccount
 
$dataLakeStoreName = "myDataLakeStore" #Replace value with your own Data Lake Store name
$rootFolder = "/user/myfolder" #Replace value with the folder you want to get contents of
 
GetFolderContent $rootFolder
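
The recursive scripts in this toolkit build child paths by joining a folder name and an item name with a forward slash, with a special case for the root folder. One way to keep that logic in a single place is a small helper such as the one sketched below. Join-DataLakePath is a hypothetical name and is not part of the toolkit; the sketch assumes Data Lake Store paths use forward slashes and start at "/".

```powershell
function Join-DataLakePath
{
    Param(
        [string]$parent,
        [string]$child
    )
    # Drop trailing slashes from the parent and leading slashes from the child,
    # then join the two parts with exactly one forward slash
    $left = $parent.TrimEnd('/')
    $right = $child.TrimStart('/')
    return "$left/$right"
}

Join-DataLakePath "/" "user"
Join-DataLakePath "/user" "myfolder"
```

Because the root folder "/" trims to an empty string, the same call works for the root and for nested folders, so the calling function needs no separate branch.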


MoveFiles.ps1

Move all files in a folder to another folder

Login-AzureRmAccount
 
$dataLakeStoreName = "myDataLakeStore" #Replace value with your own Data Lake Store name
$rootFolder = "/user/from" #Replace value with the folder you want to move files from
$destFolder = "/user/to" #Replace value with the folder you want to move files to
$count = 0
 
$items = Get-AzureRmDataLakeStoreChildItem -Account $dataLakeStoreName -Path $rootFolder
foreach ($item in $items)
{
    $fileName = $item.Name
    Write-Host "Moving $fileName to $destFolder"
    Move-AzureRmDataLakeStoreItem -Account $dataLakeStoreName -Path $rootFolder/$fileName -Destination $destFolder/$fileName -Force
    $count += 1
}
 
Write-Host "`n$count file(s) were moved"


RemoveFileExpiry.ps1

Remove file expiry on a file. The file will no longer be deleted after the expiration date is reached

function RemoveFileExpiry
{
    Param(
        [string]$fileName
    )
    Write-Host "Removing expiry on $fileName"
    Set-AzureRmDataLakeStoreItemExpiry -Account $dataLakeStoreName -Path $fileName
}
 
Login-AzureRmAccount
 
$dataLakeStoreName = "myDataLakeStore" #Replace value with your own Data Lake Store name
$fileName = "/user/myfile.jpg" #Replace value with the file you want to remove expiry on
 
RemoveFileExpiry $fileName


RemoveFolderExpiry.ps1

Remove file expiry on all files in a folder. All files in the folder will no longer be deleted after the expiration date is reached

function RemoveFolderExpiry
{
    Param(
        [string]$folderName
    )
    $items = Get-AzureRmDataLakeStoreChildItem -Account $dataLakeStoreName -Path $folderName
 
    foreach ($item in $items)
    {
        if ($item.Type -eq "DIRECTORY")
        {
            $nextFolder = $item.Name
 
            if ($folderName -eq "/")
            {
                RemoveFolderExpiry "/$nextFolder"
            }
            else
            {
                RemoveFolderExpiry $folderName/$nextFolder
            }    
        }
        if ($item.Type -eq "FILE")
        {
            $fileName = $item.Name
            Write-Host "Removing expiry on $folderName/$fileName"
            Set-AzureRmDataLakeStoreItemExpiry -Account $dataLakeStoreName -Path $folderName/$fileName
            $global:count += 1
        }
    }  
}
 
Login-AzureRmAccount
 
$dataLakeStoreName = "myDataLakeStore" #Replace value with your own Data Lake Store name
$folderName = "/user/myfolder" #Replace value with the folder you want to remove expiry on
$global:count = 0
 
RemoveFolderExpiry $folderName
Write-Host "`nRemoved expiry on $count file(s)"


SearchForFile.ps1

Search recursively for a file by name. Wildcards are allowed in the search string

function SearchForFile
{
    Param(
        [string]$rootFolder
    )
    $items = Get-AzureRmDataLakeStoreChildItem -Account $dataLakeStoreName -Path $rootFolder
 
    foreach ($item in $items)
    {
        if ($item.Type -eq "DIRECTORY")
        {
            $nextFolder = $item.Name
 
            if ($rootFolder -eq "/")
            {
                SearchForFile "/$nextFolder"
            }
            else
            {
                SearchForFile $rootFolder/$nextFolder
            }    
        }
        if ($item.Type -eq "FILE")
        {
            if ($item.Name -like $searchString)
            {          
                Write-Host $item.Name "found in" $rootFolder
            }
        }
    }
    return $null
}
 
Login-AzureRmAccount
 
$dataLakeStoreName = "myDataLakeStore" #Replace value with your own Data Lake Store name
$rootFolder = "/user/myfolder" #Replace value with the folder you want to search in
$searchString = "*filename*" #Replace value with the file you want to search for. Asterisk allowed
 
SearchForFile $rootFolder


SetFileExpiry.ps1

Set expiry on a file. The file will be deleted after the expiration date is reached

function SetFileExpiry
{
    Param(
        [string]$fileName
    )
    $now = Get-Date
    Write-Host "Setting expiry on $fileName"
    Set-AzureRmDataLakeStoreItemExpiry -Account $dataLakeStoreName -Path $fileName -Expiration $now.AddMonths(3) #Replace expiry as required
}
 
Login-AzureRmAccount
 
$dataLakeStoreName = "myDataLakeStore" #Replace value with your own Data Lake Store name
$fileName = "/user/myfile.jpg" #Replace value with the file you want expiry on
 
SetFileExpiry $fileName
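
The expiry scripts compute the expiration date with AddMonths(3). The sketch below shows how that date arithmetic behaves, using a fixed start date instead of Get-Date so the result is reproducible. The start date is an arbitrary example, and AddDays(90) is shown only as an alternative offset.

```powershell
# Fixed start date chosen for illustration; the scripts use Get-Date instead
$start = [datetime]::new(2018, 1, 31)
$culture = [cultureinfo]::InvariantCulture

# AddMonths clamps to the last valid day of the target month (April has 30 days)
$threeMonths = $start.AddMonths(3).ToString("yyyy-MM-dd", $culture)

# AddDays counts calendar days, so the result differs from a three-month offset
$ninetyDays = $start.AddDays(90).ToString("yyyy-MM-dd", $culture)

Write-Host "Three months after the start date: $threeMonths"
Write-Host "Ninety days after the start date:  $ninetyDays"
```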


SetFolderExpiry.ps1

Set expiry on all files in a folder. All files in the folder will be deleted after the expiration date is reached

function SetFolderExpiry
{
    Param(
        [string]$folderName
    )
    $now = Get-Date
    $items = Get-AzureRmDataLakeStoreChildItem -Account $dataLakeStoreName -Path $folderName
 
    foreach ($item in $items)
    {
        if ($item.Type -eq "DIRECTORY")
        {
            $nextFolder = $item.Name
 
            if ($folderName -eq "/")
            {
                SetFolderExpiry "/$nextFolder"
            }
            else
            {
                SetFolderExpiry $folderName/$nextFolder
            }    
        }
        if ($item.Type -eq "FILE")
        {
            $fileName = $item.Name
            Write-Host "Setting expiry on $folderName/$fileName"
            Set-AzureRmDataLakeStoreItemExpiry -Account $dataLakeStoreName -Path $folderName/$fileName -Expiration $now.AddMonths(3)
            $global:count += 1
        }
    }  
}
 
Login-AzureRmAccount
 
$dataLakeStoreName = "myDataLakeStore" #Replace value with your own Data Lake Store name
$folderName = "/user/myfolder" #Replace value with the folder you want expiry on
$global:count = 0
 
SetFolderExpiry $folderName
Write-Host "`nSet expiry on $count file(s)"


UploadFiles.ps1

Upload files to the Data Lake Store

Login-AzureRmAccount
 
$dataLakeStoreName = "myDataLakeStore" #Replace value with your own Data Lake Store name
$sourceFolder = "C:\temp" #Replace value with the folder you want to upload
$uploadDest = "/user/myfolder" #Replace value with the upload destination path
 
Import-AzureRmDataLakeStoreItem -Account $dataLakeStoreName -Path $sourceFolder -Destination $uploadDest -Force -Recurse


See Also

The TechNet Wiki itself hosts many more articles related to the Cortana Intelligence Suite. The best entry point is Cortana Intelligence Suite Resources on the TechNet Wiki.