Technical limitation documentation

  • Question

  • I have a process whereby I need to write a csv with an array of >250000 objects. I haven't found a native psh or .net method that is able to process this number of records across a pipeline. The syntax is right, since it works fine when the number of objects in the array is <~30000. My not-so-satisfying workaround is to Export-CSV -append in "chunks" of a few thousand objects at a time.

    I'm sure someone will just say "why are you doing it in PowerShell... it's not designed for that... use a compiled language to get closer to the hardware..." I want to know HOW and WHY that's true, and where I can find that technical information regarding psh limitations.

    Where is there technical documentation that covers limits, especially:

    • The max size of an array (of objects, of hashtables…)
    • The max size the pipeline can handle
    • The max size cmdlets can handle (such as Export-CSV, Out-File…)
    Thursday, September 19, 2019 5:00 PM

Answers

  • The maximum size of all values and collections is set by the Windows OS (32/64).  You can find that information on the Internet.  The maximum for objects on the heap is limited by available memory and page file size.

    You can create collections of 2**32 on all platforms limited by available memory.

    This is true of all elements of Windows code.  It is set by the architecture.

    If you take some time and study computer technology then all of this will become clear.

    A pipeline has no limit because it does not store anything.  It just passes one object at a time.

    A Command has the limits placed by the parameters and its usage.  Learning how computers and code work will help you understand how to evaluate each command.  There is no single answer.


    \_(ツ)_/


    • Edited by jrv Thursday, September 19, 2019 8:10 PM
    • Marked as answer by primohacker Thursday, September 19, 2019 8:28 PM
    Thursday, September 19, 2019 8:07 PM
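
The point above about the pipeline passing one object at a time can be illustrated with a minimal sketch. It assumes the records can be generated or read lazily; `New-SampleRecord`, the `Id`/`Name` properties, and the temp-file name are hypothetical and not from the thread.

```powershell
# Hypothetical helper: emits objects one at a time instead of building an array first.
function New-SampleRecord {
    param([int]$Count = 250000)
    for ($i = 1; $i -le $Count; $i++) {
        [pscustomobject]@{ Id = $i; Name = "Joe$i" }
    }
}

# Assumed output location; any writable path would do.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'stream-demo.csv'

# Objects flow from the function straight into Export-Csv;
# no 250000-element collection is stored in a variable along the way.
New-SampleRecord -Count 250000 | Export-Csv -Path $path -NoTypeInformation

# Count the data rows (every line except the header) by reading the file lazily.
$rows = -1
foreach ($line in [System.IO.File]::ReadLines($path)) { $rows++ }
$rows
```

If the objects already sit in an array, memory use is driven by that array rather than by the pipeline or by Export-Csv.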

All replies

  • Without seeing your actual code we're only able to take a wild guess ... there might be a problem with the code you use ... even if it works for the purpose you use it for.

    Live long and prosper!

    (79,108,97,102|%{[char]$_})-join''

    Thursday, September 19, 2019 5:36 PM
  • So I've been around the IT block for a few years...  While this sort of document might be nice, I'd challenge you to find that documented for any scripting language.

    These are not, "one size fits all" so it wouldn't make any (financial) sense to spend time gathering such data... 

    My motto...  If it's not fast enough for you, find another way.

    Thursday, September 19, 2019 6:12 PM
  • Yeah, when finding "another way" I'd sure like to know if that way is possible before applying face pressure to wall repeatedly. I would think PowerShell things have actual hard limits to inputs/outputs. Would like to see if anyone knows what they are.
    Thursday, September 19, 2019 6:42 PM
  • Thanks for the reply Olaf. The way I see it I don't have a coding problem, I have an intelligence problem.

    Although, if you ask my wife she would say I have both.

    Thursday, September 19, 2019 6:46 PM
  • This works for me.  Granted the working set memory went up to 300 megs.  Do you have a minimal, reproducible example that doesn't work?  It's much slower to append inside the loop (40 minutes?).

    $a = for ($i = 1; $i -le 250000; $i++) { [pscustomobject]@{name='Joe'} }
    $a | export-csv joe.csv

    • Edited by JS2010 Thursday, September 19, 2019 7:15 PM
    Thursday, September 19, 2019 7:14 PM
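
The remark about appending inside the loop being much slower can be checked with a rough timing sketch like the one below. The object count is deliberately small and arbitrary, the file names are hypothetical, and the numbers will differ from machine to machine.

```powershell
$count = 2000   # kept small so the slow variant finishes quickly (arbitrary choice)
$dir = [System.IO.Path]::GetTempPath()
$onePass = Join-Path $dir 'joe-onepass.csv'   # hypothetical file names
$perItem = Join-Path $dir 'joe-append.csv'
Remove-Item $onePass, $perItem -ErrorAction SilentlyContinue

$a = for ($i = 1; $i -le $count; $i++) { [pscustomobject]@{ name = 'Joe' } }

# Variant 1: a single Export-Csv call receives every object through the pipeline.
$tOnePass = Measure-Command { $a | Export-Csv -Path $onePass -NoTypeInformation }

# Variant 2: one Export-Csv -Append call per object, so the file is reopened for every row.
$tPerItem = Measure-Command {
    foreach ($item in $a) { $item | Export-Csv -Path $perItem -NoTypeInformation -Append }
}

'{0:N0} ms in one pass, {1:N0} ms appending per object' -f $tOnePass.TotalMilliseconds, $tPerItem.TotalMilliseconds
```

Both variants should produce the same file content; only the number of Export-Csv invocations differs.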
  • I am familiar with architecture and "computer technology", but the connection between the layers of abstraction from PSH > .NET > C(?) > kernel > hardware makes it a little hard to follow what a cmdlet is actually doing in terms of processing cost.

    Do you have any links to considerations for PSH devs in terms of system resource optimization? Anything concrete to keep in mind when choosing specific cmdlets? I generally try to use .net whenever possible...

    (Obviously, I've already looked for such, but I'm interested in resources you know about, since you seem to know more about it than me.)

    Thursday, September 19, 2019 8:33 PM
  • Simple familiarity with those terms is insufficient. You must understand how these things affect the OS and the software development and you must then be able to translate it into subsystems and how they work.

    A complete understanding of how PowerShell is designed is a good starting point for understanding CmdLets.  You cannot know what is happening inside most programs but you can know what it must be doing and how that impacts the system.

    You are asking a question that is too vague.  This usually happens because of a lack of a deep technical understanding of computer engineering and software engineering.

    To understand the impact you need to learn how to use Windows tools to measure performance. The performance measurement system in Windows is very rich and can tell you almost anything once you have a deep understanding of Windows technology.

    Most developers today do not have much systems training and just write GUIs. The development tools and a good understanding of GUI design are all they need.

    For writing scripts the need for more information is in direct relation to the purpose of the script.  You need to first learn to design a script for a Windows system (or another OS with PowerShell Core) and know the parts of the system you are affecting. Once you learn enough basic computer engineering this will become obvious.

    If you ask a specific question about a specific script or command, others will be able to answer it, but your broad and ambiguous questions cannot be answered in any specific way.

    Learn computer technology at an engineering level and you will be able to find answers on your own.


    \_(ツ)_/

    Thursday, September 19, 2019 8:51 PM
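
As one small, concrete starting point for the measurement advice above, the sketch below times the creation of 250000 objects and samples the working set of the current PowerShell process before and after. It is only an illustration: the property names of the summary object are made up here, and working-set growth is a coarse indicator that varies between runs.

```powershell
# Working set of the current PowerShell process before the work starts.
$before = (Get-Process -Id $PID).WorkingSet64

# Time the object creation with a .NET stopwatch.
$timer = [System.Diagnostics.Stopwatch]::StartNew()
$a = for ($i = 1; $i -le 250000; $i++) { [pscustomobject]@{ name = 'Joe' } }
$timer.Stop()

# Working set after the array has been built.
$after = (Get-Process -Id $PID).WorkingSet64

# Summary object; the property names are illustrative only.
[pscustomobject]@{
    Objects      = $a.Count
    Seconds      = [math]::Round($timer.Elapsed.TotalSeconds, 2)
    WorkingSetMB = [math]::Round($after / 1MB, 1)
    GrowthMB     = [math]::Round(($after - $before) / 1MB, 1)
}
```

Running the same measurement around a specific cmdlet call, instead of the loop, gives a first impression of what that cmdlet costs on a given machine.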