Trying to download multiple files from SharePoint

    Question

  • I am trying to pull down 135K+ files from our SharePoint repository. Needless to say, it is slow, so I wanted to wrap the download in a Parallel.ForEach loop. I understand that ClientContext is not thread-safe, so after researching the problem, here is what I have done:

     public static List<SP.ListItem> GetAllDocumentsInaLibrary()
            {
                List<SP.ListItem> items = new List<SP.ListItem>();
                string siteURL ="URL to Server";
                using (var ctx = new SP.ClientContext(siteURL))
                {
                    ctx.Load(ctx.Web, a => a.Lists);
                    ctx.ExecuteQuery();
                    int i = 0;
                    SP.List list = ctx.Web.Lists.GetByTitle("ArchivedData");
                    SP.ListItemCollectionPosition position = null;
                    int rowLimit = 100;
                    var camlQuery = new SP.CamlQuery();
                    camlQuery.ViewXml = @"<View Scope='RecursiveAll'>
                                          <Query>
                                            <OrderBy Override='TRUE'><FieldRef Name='ID'/></OrderBy>
                                          </Query>
                                          <ViewFields>
                                                <FieldRef Name='Title'/><FieldRef Name='ID' />
                                          </ViewFields>
                                          <RowLimit Page='TRUE'>" + rowLimit + "</RowLimit></View>";
                    do
                    {
                        SP.ListItemCollection listItems = null;
                        camlQuery.ListItemCollectionPosition = position;
                        listItems = list.GetItems(camlQuery);
                        ctx.Load(listItems);
                        ctx.ExecuteQuery();
                        i += 100;
                        position = listItems.ListItemCollectionPosition;
                        Parallel.ForEach(listItems,
                            () => CreateClientContext(siteURL),
                                (currentfileItem, localContext) =>
                                {
                                    localContext.Load(currentfileItem, x => x.File);
                                    localContext.ExecuteQuery();
                                    var fileRef = currentfileItem.File.ServerRelativeUrl;
                                    var fileInfo = SP.File.OpenBinaryDirect(localContext, fileRef);
                                    var fileName = @"c:\downloads\" + currentfileItem.File.Name;
                                    using (var fileStream = System.IO.File.Create(fileName))
                                    {
                                        fileInfo.Stream.CopyTo(fileStream);
                                        //Generate MD5 HASH
                                        using (var md5 = MD5.Create())
                                        {
                                            // Rewind first: CopyTo leaves the stream at its end, so hashing from there covers zero bytes.
                                            fileStream.Position = 0;
                                            // Format the digest bytes as hex; GetHashCode() on a byte[] is not the MD5 value.
                                            string md5Hash = BitConverter.ToString(md5.ComputeHash(fileStream)).Replace("-", "");
                                            Console.WriteLine("MD5 Hash for file {0}: {1}", fileName, md5Hash);
    
                                            //Record HASH and File name in db with file name and title if there, and SharePointID
                                            using (SqlConnection sqlCon = new SqlConnection("[Connection String"))
                                            {
                                                sqlCon.Open();
                                                using (SqlCommand sqlCmd1 = new SqlCommand { CommandText = "INSERT INTO [TABLE] ([FileName], [FileTitle], [SharePointId], [MD5Hash]) VALUES (@fileName, @fileTitle, @sharePointId, @MD5Hash)", Connection = sqlCon })
                                                {
                                                    sqlCmd1.Parameters.AddWithValue("@fileName", currentfileItem.File.Name);
                                                    sqlCmd1.Parameters.AddWithValue("@fileTitle", String.IsNullOrEmpty(currentfileItem.File.Title) ? "null" : currentfileItem.File.Title);
                                                    sqlCmd1.Parameters.AddWithValue("@sharePointId", currentfileItem.Id);
                                                    sqlCmd1.Parameters.AddWithValue("@MD5Hash", md5Hash);
                                                    sqlCmd1.ExecuteNonQuery();
                                                }
                                                sqlCon.Close();
                                            }
                                        }
                                    }
                                    //Remove the file we don't need it anymore
                                    System.IO.File.Delete(fileName);
                                });
                        items.AddRange(listItems.ToList());
                        Console.WriteLine(i);
                    }
                    while (position != null);
                }
                return items;
            }


    Here is the code for creating the ClientContext:

     private static ClientContext CreateClientContext( string url)
            {
                ClientContext context = new ClientContext(url);
                return context;
            }

    I am getting an error on the Parallel.ForEach loop trying to add the ClientContext to the loop:

     () => CreateClientContext(siteURL)

    It keeps coming up with this error:

    "Cannot convert lambda expression to type 'System.Threading.Tasks.ParallelOptions' because it is not a delegate type"

    I know what the error is saying, but I am having trouble formulating the Parallel.ForEach call. Can someone help me with this?

    Thanks 

    Friday, May 18, 2018 3:21 PM
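
    A note on the compiler error quoted in the question: every three-argument overload of Parallel.ForEach takes a ParallelOptions as its second parameter, which is why the first lambda is being matched against that type. The thread-local overloads take four arguments: the source, a localInit delegate, a body that receives (item, loopState, local) and returns the local value, and a localFinally delegate. Below is a minimal sketch of that shape. FakeContext and its Download method are hypothetical stand-ins for SP.ClientContext so the example runs without a SharePoint server; the URL and the item IDs are made up as well. Separately, and as far as I know, CSOM objects are bound to the context that created them, so an item loaded through ctx probably cannot be passed to localContext.Load; the sketch sidesteps this by handing each worker only an ID.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for SP.ClientContext: disposable and meant to be used by one thread at a time.
sealed class FakeContext : IDisposable
{
    public string Url { get; }
    public FakeContext(string url) { Url = url; }

    // Placeholder for the real per-item work (load the file, download, hash, record).
    public string Download(int itemId) => $"{Url}/item/{itemId}";

    public void Dispose() { }
}

static class Program
{
    static FakeContext CreateClientContext(string url) => new FakeContext(url);

    static void Main()
    {
        string siteURL = "https://example.invalid/site";   // assumed placeholder URL
        IEnumerable<int> itemIds = Enumerable.Range(1, 20); // stands in for one page of list item IDs
        var results = new ConcurrentBag<string>();          // thread-safe collector

        Parallel.ForEach(
            itemIds,
            // localInit: runs once per worker task, so each task gets its own context.
            () => CreateClientContext(siteURL),
            // body: note the three parameters and the returned local value.
            (itemId, loopState, localContext) =>
            {
                results.Add(localContext.Download(itemId));
                return localContext; // handed to the next iteration on the same task
            },
            // localFinally: runs once per worker task when it finishes.
            localContext => localContext.Dispose());

        Console.WriteLine(results.Count); // prints 20
    }
}
```

    With this overload, the number of contexts created follows the number of worker tasks rather than the number of items, which is the point of using thread-local state here.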

All replies

  • Hi,

    You have already instantiated a ClientContext with using (var ctx = new SP.ClientContext(siteURL)). May I ask why you want to instantiate a new one inside the ForEach?

    Best Regards,

    Lee


    Please remember to mark the replies as answers if they helped. If you have feedback for TechNet Subscriber Support, contact tnsf@microsoft.com.



    Monday, May 21, 2018 9:53 AM
  • Hi,

    I am checking in to see how things are going with this issue. Please let us know if you would like further assistance.

    You can mark the helpful post as answer to help other community members find the helpful information quickly.

    You can also share your own solution here and mark it as answer, and we can learn from each other.

    Best Regards,

    Lee



    Thursday, May 24, 2018 8:57 AM
  • Hi katghoti,

    If the PowerShell solution does not perform well enough, you may want to look at dedicated tools. See, e.g., https://directory.collab365.community/office365-sharepoint/office-365-migration-software/

    For instance, Migration Manager (http://www.slimapplications.com/product/migration-manager/) provides download-and-zip capabilities that can be used for large datasets (e.g. 100K items) and for libraries exceeding the list view threshold. The tool uses up to 6 parallel processes. The trial version is free and requires no installation.

    Paul

    Thursday, May 24, 2018 2:44 PM