Document not matching original

  • Question

  • My original Word document has about 75 single-page letters, one letter to a page. I am trying to save each page as a PDF document. I end up with 75 documents, but each one has extra lines after a carriage return, so the number of lines ends up being different. Is there a way to make sure that the CR/LF is not interpreted, or is translated to a CR only? Or a way to make the format of newDoc match the original that I am copying from? I have tried numerous approaches but have been unsuccessful. Thanks

    The following code demonstrates this problem -

    public void process_pages()
    {
        string file = textBox1.Text;

        object missing = System.Type.Missing;

        object Filename = file;

        // used as SaveChanges / OriginalFormat / RouteDocument = false when closing documents
        object Etc = false;

        int cntr = 0;

        // set up all of the requirements for Word automation
        //Microsoft.Office.Interop.Word.ApplicationClass wordApp;

        Microsoft.Office.Interop.Word.Application wordApp = new Microsoft.Office.Interop.Word.Application();

        wordApp.Application.Visible = false;
        wordApp.WindowState = Microsoft.Office.Interop.Word.WdWindowState.wdWindowStateMinimize;

        object saveFormat = Microsoft.Office.Interop.Word.WdSaveFormat.wdFormatPDF;

        // Open the existing document whose path was entered in textBox1.
        Microsoft.Office.Interop.Word.Document doc = wordApp.Documents.Open(ref Filename, ref missing, ref missing,
            ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing, ref missing,
            ref missing, ref missing, ref missing);

        // \page is a predefined bookmark in Word; it refers to the page that contains the selection.
        // For more information, see http://msdn.microsoft.com/en-us/library/bb221188.aspx
        object pageBookmark = @"\page";
        Microsoft.Office.Interop.Word.Range pageRange;

        // Loop through the pages in the document:
        // create a new document for each page,
        // copy the page setup and the formatted text of the current pageRange,
        // and execute the SaveAs method with PDF as the output format.

        int pagenum = doc.Content.ComputeStatistics(Microsoft.Office.Interop.Word.WdStatistic.wdStatisticPages);
        for (int i = 1; i <= pagenum; i++)
        {
            // The selection starts on page 1 after Open, so read the current page first.
            pageRange = wordApp.Selection.Bookmarks.get_Item(ref pageBookmark).Range;

            // Generates a new document for this page.
            Microsoft.Office.Interop.Word.Document newDoc = wordApp.Documents.Add(ref missing, ref missing, ref missing, ref missing);

            //newDoc.PageSetup.LinesPage = pageRange.PageSetup.LinesPage; This is a problem !

            newDoc.PageSetup.TextColumns = pageRange.PageSetup.TextColumns;

            // Copy all four margins; different left/right margins change where lines wrap.
            newDoc.PageSetup.TopMargin = pageRange.PageSetup.TopMargin;
            newDoc.PageSetup.BottomMargin = pageRange.PageSetup.BottomMargin;
            newDoc.PageSetup.LeftMargin = pageRange.PageSetup.LeftMargin;
            newDoc.PageSetup.RightMargin = pageRange.PageSetup.RightMargin;

            newDoc.PageSetup.FooterDistance = pageRange.PageSetup.FooterDistance;
            newDoc.PageSetup.HeaderDistance = pageRange.PageSetup.HeaderDistance;

            newDoc.PageSetup.SectionDirection = pageRange.PageSetup.SectionDirection;
            newDoc.PageSetup.SectionStart = pageRange.PageSetup.SectionStart;

            newDoc.PageSetup.PageWidth = pageRange.PageSetup.PageWidth;
            newDoc.PageSetup.PageHeight = pageRange.PageSetup.PageHeight;

            // copies the content as formatted text
            newDoc.Content.FormattedText = pageRange.FormattedText;

            object newDocName = "Z:\\page" + cntr.ToString() + ".pdf";

            try
            {
                newDoc.SaveAs(ref newDocName, ref saveFormat,
                    ref missing, ref missing, ref missing,
                    ref missing, ref missing, ref missing,
                    ref missing, ref missing, ref missing,
                    ref missing, ref missing, ref missing,
                    ref missing, ref missing);
            }
            catch (Exception exx)
            {
                MessageBox.Show(exx.Message);
            }

            ((Microsoft.Office.Interop.Word._Document)newDoc).Close(ref Etc, ref Etc, ref Etc);
            newDoc = null;

            cntr++;

            // Move to the next page only after the current page has been saved;
            // calling GoToNext at the top of the loop skips page 1.
            wordApp.Selection.GoToNext(Microsoft.Office.Interop.Word.WdGoToItem.wdGoToPage);
        }

        // Close the source document without saving and shut down this Word instance.
        ((Microsoft.Office.Interop.Word._Document)doc).Close(ref Etc, ref missing, ref missing);
        ((Microsoft.Office.Interop.Word._Application)wordApp).Quit(ref Etc, ref missing, ref missing);
    }

    Monday, August 30, 2010 5:26 PM

Answers

  • Possibly there is a difference between the paragraph formatting in the document that you are splitting and that of the default paragraph style in the Normal.dotm template, which is used for each of the documents that you create.

    I could give you some VBA code for doing it without running into those issues, but I cannot help with the language in which you are coding.

    -- Hope this helps.

    Doug Robbins - Word MVP,
    dkr[atsymbol]mvps[dot]org
    Posted via the Community Bridge

    • Marked as answer by is_smde Tuesday, August 31, 2010 7:34 PM
    Monday, August 30, 2010 11:00 PM

All replies

  • Yes, this did help -

    I have located the Normal.dotm template, and now I am trying to find out how it is arranged and what the distinct differences might be between the default that I am using and the one that was used to generate the original documents.

    I think that if I can match the two, this problem will simply go away.

    Can you direct me to a really good (exhaustive) tutorial on all of the members of a template and the steps in creating a new template?

    I really appreciate the help.

    I will mark the question as answered.

    Thanks

    Tuesday, August 31, 2010 2:41 PM
  • I would think that the thing you need to be looking at is the paragraph formatting. Assuming that the issue is that the spacing between the paragraphs has increased in the new documents, it is most likely that the space before or after the paragraphs differs.

    -- Hope this helps.

    Doug Robbins - Word MVP,
    dkr[atsymbol]mvps[dot]org
    Posted via the Community Bridge

    Tuesday, August 31, 2010 10:45 PM
  • It sure makes sense -

    I attempted to set the following values

    newDoc.Paragraphs.SpaceAfter = pageRange.ParagraphFormat.SpaceAfter;

    newDoc.Paragraphs.SpaceBefore = pageRange.ParagraphFormat.SpaceBefore;

    but this had no effect.

    It may be the Auto value. I will try that as well.

    Thursday, September 2, 2010 3:24 PM
  • Setting the following values had no effect:

    newDoc.Paragraphs.SpaceAfterAuto = pageRange.Paragraphs.SpaceAfterAuto;

    newDoc.Paragraphs.SpaceBeforeAuto = pageRange.Paragraphs.SpaceBeforeAuto;

    newDoc.Paragraphs.SpaceAfter = pageRange.Paragraphs.SpaceAfter;

    newDoc.Paragraphs.SpaceBefore = pageRange.ParagraphFormat.SpaceBefore;

    Is there a way to set all the paragraph format values more globally, or do I need to set each possible value individually?

    Thursday, September 2, 2010 3:32 PM
  • Doug,

    I am somewhat familiar with VB.

    If you provide a code example, I think it might help.

    I appreciate it. Thanks

    Thursday, September 2, 2010 3:47 PM
  • It may be notable that setting the value:

    newDoc.PageSetup.LinesPage = pageRange.PageSetup.LinesPage;

    generates an exception saying that the value is out of range.

    The newDoc value is 36, and the value that I am trying to set it to is 44.

    I am not sure how to modify this value either.

    Thursday, September 2, 2010 4:07 PM
  • Hmmm...

    I have commented out all the code that attempts to set values and have added the following lines:

    newDoc.Paragraphs.LineSpacingRule = 0;

    // 0 = wdLineSpaceSingle

    newDoc.Paragraphs.SpaceAfterAuto = 0;

    newDoc.Paragraphs.SpaceBeforeAuto = 0;

    newDoc.Paragraphs.SpaceAfter = 0;

    newDoc.Paragraphs.SpaceBefore = 0;

    I am still getting a new line after every paragraph mark...

    That is where the extra lines are coming from.

    More research...

    Thursday, September 2, 2010 4:57 PM
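One way to tell whether the extra lines are extra characters or extra spacing is to compare the control characters in the original page and in the copy. In Word's object model, `Range.Text` reports a paragraph mark as a single carriage return (`\r`, char 13), a manual line break (Shift+Enter) as `\v` (char 11), and a page or section break as `\f` (char 12), so there is no CR/LF pair involved. The C# sketch below is a hypothetical helper (`WordTextStats` and `BreakCounts` are illustrative names, not part of the Word API) that only counts these characters in a string; it does not call Word itself.

```csharp
// Hypothetical helper (not part of the Word object model): counts the control
// characters that Word uses in Range.Text so that two texts can be compared.
public sealed class BreakCounts
{
    public int ParagraphMarks;   // '\r' (char 13): end of a paragraph
    public int ManualLineBreaks; // '\v' (char 11): Shift+Enter inside a paragraph
    public int PageBreaks;       // '\f' (char 12): manual page break or section break
    public int LineFeeds;        // '\n' (char 10): not normally produced by Range.Text
    public int EmptyParagraphs;  // paragraph marks with no other character in their paragraph
}

public static class WordTextStats
{
    public static BreakCounts Count(string text)
    {
        var counts = new BreakCounts();
        bool paragraphIsEmpty = true; // no content seen since the last paragraph mark

        foreach (char c in text)
        {
            switch (c)
            {
                case '\r':
                    counts.ParagraphMarks++;
                    if (paragraphIsEmpty) counts.EmptyParagraphs++;
                    paragraphIsEmpty = true;
                    break;
                case '\n':
                    // Counted, but a bare line feed does not make the paragraph non-empty.
                    counts.LineFeeds++;
                    break;
                case '\v':
                    counts.ManualLineBreaks++;
                    paragraphIsEmpty = false;
                    break;
                case '\f':
                    counts.PageBreaks++;
                    paragraphIsEmpty = false;
                    break;
                default:
                    paragraphIsEmpty = false;
                    break;
            }
        }
        return counts;
    }
}
```

A possible use is to compare `WordTextStats.Count(pageRange.Text)` with `WordTextStats.Count(newDoc.Content.Text)`. If the paragraph-mark and empty-paragraph counts match (the new document may show one more, because every Word document ends with its own final paragraph mark), the extra height comes from paragraph formatting such as space before/after or line spacing rather than from additional characters.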
  • In VBA, you would use

    Dim i As Long
    With NewDoc.Range
        For i = 1 To .Paragraphs.Count
            With .Paragraphs(i)
                .SpaceAfter = 0
                .SpaceBefore = 0
            End With
        Next i
    End With

    -- Hope this helps.

    Doug Robbins - Word MVP,
    dkr[atsymbol]mvps[dot]org
    Posted via the Community Bridge

    Thursday, September 2, 2010 11:50 PM
  • Thanks for the information, Doug.

    It looks like I am already doing basically the same thing in my code.

    I have recently contacted the vendor that supplies the documents, asking about their Normal.dotm file, and they have indicated that they do not use a .dotm template to create the original documents. Apparently, they are using some sort of compressed, low-level format. The details are sketchy; their help desk person was completely in the dark on this issue (and might have gotten the whole thing mixed up, I might add, so this explanation might have nothing at all to do with what is really going on).

    Are you familiar at all with what they have told me? Is it possible that the original document's "FormattedText" is more or less being treated like a blob or an image? Would that explain why changing these values has little or no effect?

    Could this explain why their LinesPage value is 44, which is out of range on my system?

    This is all very new to me...

    Cheers :)

    Friday, September 3, 2010 7:04 PM
  • I am wondering if there is a way to load this document without a template. Can I turn off the template (the default, etc.) so that it is not used, and see what the document looks like? Thanks
    Friday, September 3, 2010 7:11 PM
  • Oh, I see!

    I just learned that the .dotm and the documents themselves are zipped XML files,

    so I unzipped one and started looking around at the body of the document.

    Perhaps there is some sort of interplay between the internal XML specified in the document itself

    and the settings in my Normal.dotm file.

    Is it possible to turn off the local template and use just the information specified by this document?

    I just deleted the Normal.dot and Normal.dotm files and re-ran. They were both re-created, and the results were the same.

    So where can the extra lines be coming from?

    Is it possible that the local values for something like rsidRDefault="00BB74AF" are the culprit?

    If so, can you please let me know how to modify this for good? A new Normal.dot and Normal.dotm file is regenerated each time I run the application.

    Friday, September 3, 2010 8:16 PM
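Because a .docx (and Normal.dotm) is a zip package of XML parts, the paragraph spacing that applies when a paragraph has no direct formatting can be read from `word/styles.xml`: first the document-wide defaults under `w:docDefaults/w:pPrDefault`, then the default paragraph style (the `w:style` with `w:type="paragraph"` and `w:default="1"`, usually "Normal"), whose values override the defaults. Comparing the letter file with Normal.dotm this way is one way to test the theory that their default spacing differs; for example, the Normal.dotm shipped with Word 2007/2010 typically adds 10 pt after each paragraph with 1.15 line spacing. The C# sketch below uses only `System.IO.Compression` and `System.Xml.Linq` and does not need Word. It assumes the part sits at the conventional path `word/styles.xml` (a fully general reader would resolve it through the package relationships) and that spacing values are the integer twips (twentieths of a point) Word writes; `StyleSpacingReport` and `SpacingInfo` are hypothetical names. Direct paragraph formatting in `word/document.xml` is not covered, and `rsid` attributes such as `w:rsidRDefault` are revision-tracking IDs that this sketch ignores.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical result type: one row per place where default paragraph spacing is defined.
public sealed class SpacingInfo
{
    public string Source;    // "docDefaults" or the styleId of the default paragraph style
    public double? BeforePt; // w:before converted from twips to points (null if absent)
    public double? AfterPt;  // w:after converted from twips to points (null if absent)
    public string Line;      // raw w:line; with w:lineRule="auto" it is in 240ths of a line
    public string LineRule;  // raw w:lineRule
}

public static class StyleSpacingReport
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Reads word/styles.xml from a .docx/.dotm package and reports default paragraph spacing.
    public static SpacingInfo[] Read(Stream package)
    {
        using (var zip = new ZipArchive(package, ZipArchiveMode.Read, leaveOpen: true))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/styles.xml"); // assumed conventional path
            if (entry == null) return new SpacingInfo[0];

            XDocument styles;
            using (Stream s = entry.Open()) styles = XDocument.Load(s);

            var results = new List<SpacingInfo>();

            // Document-wide paragraph defaults.
            XElement defaults = styles.Root?
                .Element(W + "docDefaults")?
                .Element(W + "pPrDefault")?
                .Element(W + "pPr")?
                .Element(W + "spacing");
            results.Add(FromSpacing("docDefaults", defaults));

            // Default paragraph style; its spacing overrides docDefaults.
            XElement normal = styles.Root?.Elements(W + "style").FirstOrDefault(st =>
                (string)st.Attribute(W + "type") == "paragraph" &&
                IsOn((string)st.Attribute(W + "default")));
            if (normal != null)
            {
                XElement spacing = normal.Element(W + "pPr")?.Element(W + "spacing");
                results.Add(FromSpacing((string)normal.Attribute(W + "styleId"), spacing));
            }
            return results.ToArray();
        }
    }

    // Word writes "1"; the on/off type also allows "true" and "on".
    static bool IsOn(string value) => value == "1" || value == "true" || value == "on";

    static SpacingInfo FromSpacing(string source, XElement spacing) => new SpacingInfo
    {
        Source = source,
        BeforePt = TwipsToPoints((string)spacing?.Attribute(W + "before")),
        AfterPt = TwipsToPoints((string)spacing?.Attribute(W + "after")),
        Line = (string)spacing?.Attribute(W + "line"),
        LineRule = (string)spacing?.Attribute(W + "lineRule"),
    };

    // Twips are twentieths of a point; returns null for missing or non-integer values.
    public static double? TwipsToPoints(string twips) =>
        int.TryParse(twips, NumberStyles.Integer, CultureInfo.InvariantCulture, out int value)
            ? value / 20.0
            : (double?)null;
}
```

For instance, opening the letter file and Normal.dotm (typically under %APPDATA%\Microsoft\Templates) with `File.OpenRead` and printing both reports side by side would show whether their default "space after" or line spacing differ.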
  • I have successfully modified the Normal.dotm file locally, and all new documents show space after = 0 in the Page Layout paragraph settings. But the document that I am creating looks the same, so I am convinced that the cause is somewhere inside the XML specification. I suspect that the default being used is overriding the local settings; there may not be much I can do...
    Friday, September 3, 2010 8:55 PM