Mounting an azureml.data.FileDataset

  • Question

  • Hi all,

    I am tinkering with a large file dataset that I don't want to download onto my Azure iPython notebook VM (somevm.westeurope.notebooks.azureml.net). The file dataset is created and registered in Azure ML and points to blob storage. While downloading a subset of the dataset using .take(n).download() works, I don't want to manually "spool"/"page" through the dataset when a "mount" function exists. Is there a way to just mount the dataset and consume it directly? What am I missing? Is this the right forum?

    I am using the following code inside a nb:

    import os
    from azureml.core import Workspace, Dataset

    subscription_id = 'theID'
    resource_group = 'resGroup'
    workspace_name = 'wsName'

    workspace = Workspace(subscription_id, resource_group, workspace_name)

    dataset = Dataset.get_by_name(workspace, name='BigDS')

    dataset.to_path() # this prints out array(['/some.png', ..., '/more.png'], dtype=object)

    # subsequent mounting does not seem to do anything:

    with dataset.mount(mount_point="/mnt/azmnt/code/Users/USERNAME/mnt") as mount_context:
        # list top-level mounted files and folders in the dataset
        os.listdir(mount_context.mount_point)

    # nothing is returned and nothing shows up in the mount folder

    # however, I can do a download and subsequently use those files but this won't work for the whole dataset...

    first_files = dataset.take(10)

    first_files.download() # returns an array of file paths; those files can be consumed.


    Tuesday, February 4, 2020 2:57 PM

Answers

  • Many thanks for helping me out. I was really stuck on this. It makes a lot of sense not to mount something to a folder that is already a mounted folder.

    While

    with dataset.mount() as mount_context:
        # list top-level mounted files and folders in the dataset
        os.listdir(mount_context.mount_point)

    still never works for me, the two-step approach using

    mount_context = dataset.mount()
    mount_context.start()
    os.listdir(mount_context.mount_point)
    mount_context.mount_point
    ! ls <that tmp folder>

    now does the job. Many thanks!
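    For what it's worth, the two-step form is effectively the context manager's entry/exit done by hand: FileDataset.mount() returns a MountContext, and the with-block calls its start() on entry and stop() on exit. A toy stand-in class (not the azureml implementation, just the pattern) sketches the equivalence:

    ```python
    class ToyMountContext:
        """Toy stand-in for azureml's MountContext: start() 'mounts', stop() 'unmounts'.

        Illustrates why `with dataset.mount() as mc:` and the explicit
        start()/stop() two-step follow the same lifecycle.
        """

        def __init__(self, mount_point="/tmp/toy-mount"):
            self.mount_point = mount_point
            self.mounted = False

        def start(self):  # the with-block calls this on entry
            self.mounted = True

        def stop(self):  # ...and this on exit
            self.mounted = False

        def __enter__(self):
            self.start()
            return self

        def __exit__(self, exc_type, exc, tb):
            self.stop()


    # With-block form: mounted inside the block, unmounted afterwards.
    with ToyMountContext() as mc:
        assert mc.mounted

    # Two-step form: same lifecycle, but calling stop() is your job.
    mc = ToyMountContext()
    mc.start()
    assert mc.mounted
    mc.stop()
    assert not mc.mounted
    ```

    One caveat with the real two-step approach: unlike the with-block, nothing unmounts for you, so remember to call mount_context.stop() when you are done.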

    Wednesday, February 5, 2020 1:30 PM

All replies

  • In this case, you are mounting your blob-storage-backed dataset under '/mnt/azmnt/code/Users', which is itself an Azure File Share mount in the same storage account. This leads to some weird/unexpected behavior. Please try mounting the dataset to a different place on the VM.

    Additionally, you can omit the mount_point argument entirely; one will be chosen automatically, which works in most cases.

    Also, posting questions on Stack Overflow with the 'azure-machine-learning-service' tag is preferable to this forum.
    Tuesday, February 4, 2020 5:19 PM