Help Needed for Extracting specific links from a list of websites.

  • Question

  • Hello.

    I hope this is the right place to post this.

    I need to adapt a script that works for one site so that it also works for another. This is what I have working for the first site:

    $InputLinksFile = "c:\temp\InputLinks.txt"
    $OutputLinksFile = "C:\temp\OutputLinks.txt"
    $InputLinks = @()

    $BasePage = "https://www.fanfiction.net/tv/Buffy-The-Vampire-Slayer/?&srt=2&lan=1&r=10&p="
    [int]$FirstPageNumber = 600
    [int]$LastPageNumber = 601
    $CurrentPageNumber = $FirstPageNumber

    # Make a list of all the pages we want to input, counting from FirstPageNumber to LastPageNumber
    while ($CurrentPageNumber -le $LastPageNumber) {
        $InputLinks += "$BasePage$CurrentPageNumber"
        $CurrentPageNumber++
    }

    # If you want to manually input a list of pages instead, remove # in front of the next line:
    # $InputLinks = Get-Content -Path $InputLinksFile

    ForEach ($InputLink in $InputLinks) {
        # Fetch the entire page. Get links in page with ().Links. Page is compressed with gzip, so we'll have to account for that
        $InputPageLinks = (Invoke-WebRequest -Uri $InputLink -Headers @{"Accept-Encoding"="gzip"}).Links
        # Filter the link list to only contain links with the sequence "/1/" in it.
        $FilteredOutputLinks = $InputPageLinks | Where-Object {$_.href -like "*/1/*"}
        # The provided links are relative and not absolute, so we need to add the domain name to the output
        foreach ($OutputLink in $FilteredOutputLinks) {
            $FinalLink = "https://fanfiction.net$($OutputLink.href)"
            Out-File -Append -FilePath $OutputLinksFile -InputObject $FinalLink
        }
        Clear-Variable InputPageLinks
    }
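
    The filter and the hard-coded domain are the two parts of the script above that are tied to one site. A minimal sketch of how they might be pulled into a reusable helper is shown here. The function name Get-FilteredAbsoluteLink, its parameters, and the sample hrefs are all hypothetical and not part of the original script. It assumes the links are either relative or absolute hrefs and resolves them against the page URL with System.Uri instead of concatenating strings, so the resulting host is whatever the page URL uses (www.fanfiction.net here, not fanfiction.net).

    ```powershell
    # Hypothetical helper: keep only hrefs matching a wildcard pattern and
    # resolve each one against the URL of the page it was found on.
    function Get-FilteredAbsoluteLink {
        param(
            [Parameter(Mandatory = $true)] [string] $PageUrl,
            [string[]] $Href = @(),
            [string] $Pattern = '*/1/*'   # assumption: same filter as the script above
        )
        $base = [System.Uri]$PageUrl
        foreach ($h in $Href) {
            if ($h -like $Pattern) {
                # Uri(baseUri, relativeString) handles relative and absolute hrefs alike
                (New-Object System.Uri($base, $h)).AbsoluteUri
            }
        }
    }

    # Example with made-up hrefs; only the first one contains "/1/"
    Get-FilteredAbsoluteLink -PageUrl 'https://www.fanfiction.net/tv/Buffy-The-Vampire-Slayer/?p=600' `
        -Href '/s/123/1/Some-Title', '/u/456/author'
    # Output: https://www.fanfiction.net/s/123/1/Some-Title
    ```

    For a different site, only -PageUrl and -Pattern would need to change, provided the wanted links can be picked out with a wildcard pattern. Inside the loop, the hrefs could be passed in as (Invoke-WebRequest -Uri $InputLink).Links.href.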

    For some reason I can't post the links for the new site.

    I'm hoping someone can help me.

    Thanks so much in advance.

    Thursday, March 8, 2018 4:23 PM

All replies

  • Hi,

    This is a quick note to let you know that I am currently performing research on this issue and will get back to you as soon as possible. I appreciate your patience.

    If you have any updates during this process, please feel free to let me know.

    Best Regards,
    Albert


    Please remember to mark the replies as answers if they help.
    If you have feedback for TechNet Subscriber Support, contact tnmff@microsoft.com

    Friday, March 9, 2018 6:50 AM
  • How long do I have to wait until I can post the 2 links for the new site and an example of which links I would like extracted?
    Friday, March 9, 2018 6:58 AM