Storing a wide dataset in SQL Server

  • General discussion

  • Hi all,

    We are doing machine learning in Python, and we are considering SQL Server ML Services.

    I need to store a dataset with 4000 columns for ML training, and I need fast random access to rows by a single primary key.

    A SQL Server row is limited to about 8000 bytes and 1024 columns.

    I guess I can store each row as XML, but that may hurt performance.

    I guess I can split the 4000 columns into a few tables and retrieve each data row with a separate query to each table (as there is a limit of 4096 columns per SELECT statement). That is quite a lot of work.
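
A minimal sketch of the split-table idea, to show how much work it really is. Everything named here is an assumption for illustration: the `features_part<N>` table names, the `id` key, the `REAL` column type, and the chunk size of 500. The standard-library `sqlite3` module stands in for the database so the sketch runs anywhere; against SQL Server the same logic would go through a driver such as pyodbc, with the DDL adjusted to T-SQL types.

```python
import sqlite3

# Assumption: 500 columns per part table. The real value must respect both the
# 1024-column limit and the per-row byte limit for the chosen column types.
COLS_PER_TABLE = 500


def chunk_columns(columns, size=COLS_PER_TABLE):
    """Split feature column names into consecutive groups of at most `size`."""
    return [columns[i:i + size] for i in range(0, len(columns), size)]


def create_part_tables(conn, groups, prefix="features_part"):
    """Create one table per column group, all keyed by the same id."""
    for n, cols in enumerate(groups):
        col_defs = ", ".join(f"{c} REAL" for c in cols)
        conn.execute(f"CREATE TABLE {prefix}{n} (id INTEGER PRIMARY KEY, {col_defs})")


def insert_row(conn, key, row, groups, prefix="features_part"):
    """Write one logical row (a dict of column -> value) across all part tables."""
    for n, cols in enumerate(groups):
        placeholders = ", ".join("?" for _ in range(len(cols) + 1))
        conn.execute(
            f"INSERT INTO {prefix}{n} (id, {', '.join(cols)}) VALUES ({placeholders})",
            [key] + [row[c] for c in cols],
        )


def fetch_row(conn, key, groups, prefix="features_part"):
    """Reassemble one logical row with one primary-key lookup per part table."""
    row = {}
    for n, cols in enumerate(groups):
        cur = conn.execute(
            f"SELECT {', '.join(cols)} FROM {prefix}{n} WHERE id = ?", (key,)
        )
        values = cur.fetchone()
        if values is None:
            return None  # key not present in this part table
        row.update(zip(cols, values))
    return row


if __name__ == "__main__":
    columns = [f"f{i}" for i in range(10)]  # small stand-in for the 4000 features
    groups = chunk_columns(columns, size=4)
    conn = sqlite3.connect(":memory:")
    create_part_tables(conn, groups)
    insert_row(conn, 1, {c: float(i) for i, c in enumerate(columns)}, groups)
    print(fetch_row(conn, 1, groups))
```

With 4000 features and the assumed chunk size, `chunk_columns` yields 8 part tables, so each random row access costs 8 primary-key lookups.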

    Our current solution uses HBase. It allows an unlimited number of fields, but requires Hadoop.

    What could be a solution that fits SQL Server ML Services?


    Thursday, June 27, 2019 12:32 PM

All replies

  • Hi,

    You can perhaps use SQL Server wide tables, if you're using SQL 2016 or higher:

    https://docs.microsoft.com/en-us/sql/relational-databases/tables/tables?view=sql-server-2017
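
A rough sketch of what a wide-table definition could look like, generated from Python since the column list is long. The table name `training_data`, the `id` key, the `all_features` column set name, and the `REAL` type are assumptions for illustration, not anything from the question. Note that sparse columns are optimized for NULL values and the per-row size limit still applies, so this probably only fits if most of the 4000 values in a given row are NULL.

```python
def wide_table_ddl(table, columns, sql_type="REAL"):
    """Build CREATE TABLE text with SPARSE columns and an XML column set."""
    lines = ["id INT NOT NULL PRIMARY KEY"]
    # Every feature column is declared SPARSE and must be nullable.
    lines += [f"[{c}] {sql_type} SPARSE NULL" for c in columns]
    # The column set is what makes the table a "wide table".
    lines.append("all_features XML COLUMN_SET FOR ALL_SPARSE_COLUMNS")
    body = ",\n    ".join(lines)
    return f"CREATE TABLE [{table}] (\n    {body}\n);"


if __name__ == "__main__":
    # Hypothetical feature names f0..f3999.
    ddl = wide_table_ddl("training_data", [f"f{i}" for i in range(4000)])
    print(ddl[:200])
```

The generated statement would be run once against the database; whether the resulting rows actually fit depends on how dense the data is.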

    Saturday, July 6, 2019 11:50 PM