Pass output from external function like xpdf's pdftotext to other commands

  • Question

  • I simply want to convert a pdf file to a .txt file and search the .txt file for strings. I'm using xpdf's pdftotext.

    The problem I'm running into is using the output of the conversion somehow. I initially tried assigning it to a variable in a few different ways, something like this: 

    $output = & pdftotext "test.pdf" 
    Get-Content $output | Where-Object {$_Content(" the ")}

    I've tried many different variations of piping and assigning output to a variable with no success (I should note that I provide a variable as the file path for the actual script). What I get is that I cannot pass "null" to Get-Content (or any other function I might try). This tells me that the variable isn't actually storing anything but a null value, right?

    How do I capture that output from pdftotext and use it elsewhere? The function does indeed convert to text, as intended - but I can't dynamically use that file thereafter.


    • Edited by aeakins Monday, June 25, 2018 11:09 PM
    Monday, June 25, 2018 11:09 PM

Answers

  • $output = pdftotext test.pdf
    $output | Where-Object {$_ -match 'the '}


    \_(ツ)_/


    • Marked as answer by aeakins Tuesday, June 26, 2018 12:20 AM
    • Edited by jrv Tuesday, June 26, 2018 12:40 AM
    Monday, June 25, 2018 11:22 PM

All replies

  • Hi again, jrv.

    This ended up working once I put the & and removed the extra $output. I should have explained that pdftotext isn't a function I built but an external application. Turned out I had gotten the right answer once before but had accidentally messed up the file path so it threw more than one error. Thanks!

    Tuesday, June 26, 2018 12:19 AM
  • This will work if you want a file:

    .\pdftotext test.pdf test.txt

    When a command or program is in the current folder, use ".\" to specify the current path.


    \_(ツ)_/

    Tuesday, June 26, 2018 12:42 AM
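
For the file route above, reading the converted text back and filtering it could look roughly like this. It is only a sketch: the function name `Find-InTextFile` and the file names are hypothetical, and it assumes the converter has already written the .txt file.

```powershell
# Hypothetical helper: return the lines of a text file that match a regex.
function Find-InTextFile {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [string] $Pattern
    )
    # Get-Content emits one string per line; -match is a case-insensitive regex test.
    Get-Content -Path $Path | Where-Object { $_ -match $Pattern }
}

# Usage, assuming test.txt was produced by: .\pdftotext test.pdf test.txt
# Find-InTextFile -Path .\test.txt -Pattern 'the '
```

Keeping the conversion and the search as separate steps means the search can be rerun without converting the PDF again.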
  • Yeah, I've done it that way before as well. I just read that I should use & if it's an external program. The folder happens to be in a subdirectory though, so this lets me specify the full path and looks more readable to me.

    Now I'm struggling to get it to output a count of instances I know exist in the resulting .txt file though. I tried:

    $output = pdftotext test.pdf
    $output | Where-Object {$_ -match 'string'.Count} | Out-File test.txt 
    But the file is blank. Any thoughts?


    Tuesday, June 26, 2018 12:58 AM
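
One likely cause of the empty result in the attempt above: in `$_ -match 'string'.Count`, the `.Count` property access is applied to the literal `'string'` first, so the filter is not matching against the word itself. `$output` may also be empty if the converter wrote to a file rather than to standard output. A minimal sketch of counting, using made-up sample lines in place of the converted text:

```powershell
# Stand-in for the converted text (hypothetical sample data).
$output = @(
    'first string here',
    'nothing to see',
    'string and another string'
)

# Number of lines that contain the pattern.
# @(...) keeps the result an array even when only one line matches.
$lineCount = @($output | Where-Object { $_ -match 'string' }).Count

# Number of individual occurrences across all lines.
$hitCount = ($output | Select-String -Pattern 'string' -AllMatches).Matches.Count

"$lineCount lines, $hitCount occurrences"
```

Piping that last string to `Out-File` would write the counts to a file instead of the console.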
  • Don't believe everything you read on the Internet.  Take the time to take the course and get the knowledge correct the first time.


    \_(ツ)_/

    Tuesday, June 26, 2018 1:08 AM
  • I find that your comment makes unwarranted assumptions about what I have and have not done. I have a basal knowledge of powershell from https://read.amazon.com/?asin=B078F19ZHJ, various online resources, and consistently reference the documentation.

    I assume you're taking a dig at my comment on the & operator, which, like most operators in PowerShell, functions far differently than it does in many other object-oriented languages. I've read how it works as a forced call and I know it doesn't need to be used for an external application. But I used it for several reasons, not excluding the fact that I initially thought there might be an issue with the invocation of the application. I realize not everything on the internet is true. It's kind of presumptuous to tell me not to.

    That said, I have to actually work through and struggle through writing scripts to actually learn anything. But I want a useful product out of the mix, otherwise I have no reason to use Powershell. Reading and taking courses needs to be complemented with practice. I'd rather struggle through doing something that's a little out of reach for me right now, learning along the way and reinforcing that learning, than read something that I'll memory dump because I haven't practiced it.

    When I come across something I don't know, I take time to try to research it, and I ask questions. No need to pass judgement on a beginner for asking.

    It's discouraging.


    • Edited by aeakins Tuesday, June 26, 2018 1:32 AM
    Tuesday, June 26, 2018 1:30 AM
  • Not a dig.  All programs can be executed at a PS prompt.  The use of the "&" is not normally required.  The only rule is that we need to use a path specifier if the command is in the current folder.

    Take some time to learn what the "&" is and where/why it is used.

    Using it is not wrong but is only required under specific circumstances. 

    Also ... what is the output of the program?  As far as I can tell it has no output other than info messages and error messages.  It converts a pdf to a text file in the same folder as the pdf.


    \_(ツ)_/

    Tuesday, June 26, 2018 2:13 AM
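
On the question of what the program outputs: the pdftotext man page says that giving `-` as the text-file argument sends the text to stdout, which is what makes capturing it in a variable possible. A minimal sketch, assuming pdftotext is on the PATH or is passed by full path; the function name `Get-PdfText` and its parameters are hypothetical. It also shows one case where `&` is actually needed: the command name is held in a variable.

```powershell
# Hypothetical wrapper: run a PDF-to-text converter and return its stdout lines.
function Get-PdfText {
    param(
        [Parameter(Mandatory)] [string] $PdfPath,
        # Name or full path of the converter; a path containing spaces also works here.
        [string] $Converter = 'pdftotext'
    )
    if (-not (Get-Command $Converter -ErrorAction SilentlyContinue)) {
        throw "Converter '$Converter' was not found."
    }
    # & is required because the command is stored in a variable.
    # '-' as the output file asks pdftotext to write the text to stdout.
    & $Converter $PdfPath '-'
}

# Usage (assumes test.pdf exists and pdftotext is installed):
# $output = Get-PdfText -PdfPath .\test.pdf
# $output | Where-Object { $_ -match 'the ' }
```

Without the `-` argument the converter writes test.txt next to the PDF and the variable stays empty, which matches the "null" error described in the question.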