AutoML for time series forecasting with larger datasets

  • Question

  • We are trying to use AutoML for a set of ~9k time series (224,736 records including test data). We've tried various training cluster specs/sizes, but it seems that the featurization run does not finish in time. 

    We receive the following message from the TRAINING run:

    Run timed out. No model completed training in the specified time. Possible solutions:
     1) Increase the timeout when creating a run.
     2) Subsample your dataset to decrease featurization/training time.
     3) Decrease the max horizon (forecasting datasets only).

     We’ve observed that the FEATURIZATION run is cancelled after ~1 h 20 min because the TRAINING run times out and fails.

     What is very strange is that the featurization speed we see in the azureml_automl.log file is the same whether we use STANDARD_D2_V2 or STANDARD_D14_V2 nodes.

    We have a couple of questions:

    1. Why does featurization not use all the cores available on the processing node? That is what appears to happen, since the speeds are similar whether we use STANDARD_D2_V2 or STANDARD_D14_V2 nodes.
    2. Is there a way to have the FEATURIZATION run use more than a single node of the training cluster?
    3. Can we increase the timeout of the FEATURIZATION run? Currently we submit via experiment.submit() with an AutoMLConfig object.
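    To illustrate question 1 (purely a sketch of what we expected, not AutoML's actual implementation): each of the ~9k series can be featurized independently, so the work could in principle be fanned out across all cores of a node. The helper names below (featurize_series, featurize_all) are hypothetical:

```python
from concurrent.futures import ThreadPoolExecutor
import os

def featurize_series(series):
    # Hypothetical stand-in for per-series feature engineering
    # (lags, rolling statistics, etc.).
    values = series["values"]
    return {
        "grain": series["grain"],
        "lag_1": [None] + values[:-1],          # lag-1 feature
        "mean": sum(values) / len(values),
    }

def featurize_all(all_series, max_workers=None):
    # Each series is independent, so the work can be spread over every
    # available core. (Threads shown for brevity; CPU-bound featurization
    # would use ProcessPoolExecutor instead.)
    workers = max_workers or os.cpu_count()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(featurize_series, all_series))
```

    With ~9k independent series, this kind of fan-out is what we assumed would make a STANDARD_D14_V2 node noticeably faster than a STANDARD_D2_V2 one.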

    Thank you, Laurentiu 

    Friday, November 22, 2019 1:10 PM

All replies

  • Hi,

    Could you please add more details about the AutoMLConfig you are trying? If possible, please share a link to the sample.

    Please follow the documents below to configure and run an experiment for forecasting tasks.

    https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-auto-train-forecast#configure-and-run-experiment

    https://docs.microsoft.com/en-us/python/api/azureml-train-automl-client/azureml.train.automl.automlconfig.automlconfig?view=azure-ml-py

    Thanks




    Monday, November 25, 2019 9:57 AM
  • Hi Ram, 

    Thank you for your answer. I've tried using the updated API, as per the link provided, but I get the same behavior.  

    If we are running with 100 timeseries (2100 records), pre-processing takes ~ 5 min. 

    If we are running with 9364 time series (196,644 records), pre-processing times out after ~1 h 20 min, and experiment_timeout_minutes is not honored (possibly because it only applies to the training run).
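    For context, assuming featurization time scales roughly linearly with the number of series (a simplification on my part), a back-of-the-envelope extrapolation from the small run suggests the ~80-minute window could never be enough:

```python
# Observed: 100 series featurize in ~5 minutes; the full run is
# cancelled after ~80 minutes. Linear scaling is an assumption.
small_series, small_minutes = 100, 5
large_series = 9364
cutoff_minutes = 80

projected_minutes = small_minutes * large_series / small_series
print(round(projected_minutes))             # 468 (about 7.8 hours)
print(projected_minutes > cutoff_minutes)   # True
```

    So even under optimistic assumptions, featurization of the full dataset would need several times the window in which it is currently cancelled.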

    If needed, I can even share the azureml_automl.log for both scenarios. 

    I've opened a case (19112222001291) and submitted the code and sample data there. 

    Below is the code for the AutoML config. 

    import logging

    from azureml.train.automl import AutoMLConfig

    # compute_target, train_data, cores_per_iteration, concurrent_iterations,
    # iteration_timeout, and iterations_number are defined earlier in our script.

    time_column_name = "StartDate"
    target_column_name = "Quantity"
    grain_column_names = ["Global Store Code", "Local Item Code", "Tax Status Code"]
    k_cross_validation = 3
    metric = 'spearman_correlation'
    voting_ensemble = True
    stack_ensemble = True
    enable_tensorflow = False

    lags = [1, 3, 6]
    n_test_periods = 3
    time_series_settings = {
        'time_column_name': time_column_name,
        'grain_column_names': grain_column_names,
        'max_horizon': n_test_periods,
        'target_lags': lags,
        'preprocess': True
    }
    time_series_models = ['ElasticNet', 'LightGBM', 'GradientBoosting', 'DecisionTree',
                          'KNN', 'LassoLars', 'SGD', 'RandomForest', 'ExtremeRandomTrees', 'AutoArima']

    automl_config = AutoMLConfig(task='forecasting',
                                 debug_log='automl_errors.log',
                                 compute_target=compute_target,
                                 training_data=train_data,
                                 label_column_name=target_column_name,
                                 country_or_region='GR',
                                 n_cross_validations=k_cross_validation,
                                 featurization='auto',
                                 primary_metric=metric,
                                 whitelist_models=time_series_models,
                                 enable_tf=enable_tensorflow,
                                 enable_voting_ensemble=voting_ensemble,
                                 enable_stack_ensemble=stack_ensemble,
                                 max_cores_per_iteration=cores_per_iteration,
                                 max_concurrent_iterations=concurrent_iterations,
                                 iteration_timeout_minutes=iteration_timeout,
                                 iterations=iterations_number,
                                 verbosity=logging.DEBUG,
                                 experiment_timeout_minutes=1000,
                                 **time_series_settings)

    Thank you, 

    Laurentiu 


    Monday, November 25, 2019 2:39 PM
  • Hi,

    Could you please try the GPU-optimized VM sizes? These are specialized virtual machines available with single or multiple NVIDIA GPUs.

    Please follow the document below for the NCv3-series accelerated compute sizes. 

    https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ncv3-series

    Please follow the document below for quota increase requests.

    https://docs.microsoft.com/en-us/azure/azure-supportability/resource-manager-core-quotas-request

    Thanks

    Friday, December 20, 2019 9:43 AM
  • Just checking in to see if the above answer(s) helped.

    If this answers your query, please click “Mark as Answer” and up-vote it, as that may benefit other community members reading this thread.

    And if you have any further queries, do let us know. 

    Monday, March 23, 2020 1:38 AM