Content Deployment Problems (Answered)

  • Friday, June 20, 2008 14:11 sandy_lamoureux_miltoncat.com
     
    Hello -
    I am having an ongoing problem with content deployments on MOSS 2007 SP1 that I'm hoping someone here can shed some light on.
    The GUI is problematic, so I use stsadm. I created the jobs in the GUI, and then I run them as:
    stsadm -o runcontentdeploymentjob -jobname "ChangesOnly"
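Because the length of each run matters in this thread, it can help to record how long the job takes on every attempt. The following is a minimal sketch in Python, assuming `stsadm` is on the PATH of the machine running it. The function name `run_timed` and the optional timeout are hypothetical and not part of stsadm; only the command line itself comes from the post.

```python
import subprocess
import time


def run_timed(cmd, timeout_seconds=None):
    """Run cmd (a list of arguments) and return (exit_code, elapsed_seconds).

    exit_code is None if the command was stopped after timeout_seconds.
    """
    start = time.monotonic()
    try:
        completed = subprocess.run(cmd, timeout=timeout_seconds)
        code = completed.returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this.
        code = None
    return code, time.monotonic() - start


# Command line taken from the post above; the job name is the poster's.
CD_JOB_CMD = ["stsadm", "-o", "runcontentdeploymentjob", "-jobname", "ChangesOnly"]
```

Calling `run_timed(CD_JOB_CMD)` with the default `timeout_seconds=None` only measures the run. Whether it is safe to stop a deployment partway through is not something this sketch addresses.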

    We have been having intermittent problems with CD for months, so last week I patched all of our SharePoint environments. They were already at SP1. I applied 3 hotfixes without issue: 946517, 95268, and 952704. I ran the SharePoint configuration wizard after the last two (as instructed) without a problem.

    We have an 8GB content database.
    Our 1-tier development and 2-tier staging environments are behind our firewall. Content deployments between these two environments, in both directions, run flawlessly using the same content that deploys from staging to production.
    Our 2-tier production environment is set up with the SQL Server behind the firewall and the web server in the DMZ. CD from staging to production is the problem.
    Both prod servers have 4GB of RAM allocated to them. If it is of note, they are both VMs (VMware), as are the dev and staging environments.

    The CD job runs and the changes go through, but it takes forever and brings down the website. Example: I started the job last night at 8:15pm. It was still running when I went to bed. The website went up and down a few times overnight (monitoring via siteuptime.com), with outages ranging from 2 minutes to 40 minutes. Finally, around 7am, after an extended outage, I had to reboot both servers because SQL Server was pegged at 99% usage.
    Rebooting usually fixes the problem. Note that siteuptime is quite accurate - nights that I don't run the CD job, there are no outages.

    Earlier this week, instead of a content deployment, I did an stsadm export from staging, manually copied the files to the production web server, and ran the import via stsadm on the production web server. The results were similar.

    I also disable the anti-virus during the import and/or CD processes to rule that out.

    I have deleted and re-created the CD jobs and paths many times - makes no difference.

    This is a problem because this is our public website, and we can't afford for it to be down. Running the CD overnight is really just a stop-gap solution, as overnight outages are better than ones during the business day. (We are not a 24-hour operation.)

    I guess I could schedule a reboot around 5am every morning, but that's not fixing the issue at hand.

    The following ports have been verified as open on our firewall:
    TCP 1433 SQL
    TCP/UDP 53
    TCP 80/443
    TCP/UDP 445
    TCP/UDP 88
    LDAP/LDAPS 389/636

    I suppose if the ports weren't open it wouldn't work at all. The changes do go through eventually.
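One way to go into the firewall conversation with concrete evidence is to test the TCP ports from the machine that actually makes the connection, for example from the DMZ web server toward the SQL Server. Below is a minimal sketch in Python using only the standard library. The function names are hypothetical, and the port list is simply the TCP entries from the list above. A successful connection only shows that the port is reachable; it says nothing about throughput, idle timeouts, or the UDP entries.

```python
import socket

# TCP ports from the list above. The UDP entries (53, 88, 445) are not
# covered: a plain connect test only works for TCP.
TCP_PORTS = [1433, 53, 80, 443, 445, 88, 389, 636]


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_ports(host, ports=TCP_PORTS, timeout=3.0):
    """Map each port to True (reachable) or False (refused, filtered, or timed out)."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}
```

For example, `check_ports("prod-sql.example.local")` returns a dictionary keyed by port number, where `prod-sql.example.local` is a placeholder host name rather than anything from this thread.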

    We don't have any custom features or anything custom, really, other than masterpages and css. It's pretty close to an out-of-box solution, just a little prettier.

    We've had this set up for 18 months. It's only in the past 2-3 months that CD has become problematic, and in the past few weeks that it's become the way I describe above. I suppose that could be due to database size, but it's not all that much bigger than it was 6 months ago.

    Any ideas are welcome. I know CD has been problematic for many, but it is working for me in my dev and staging environments. I realize that may mean it's a firewall issue - but what, exactly? I need to go in armed with the right questions, otherwise I'm told "It's not the firewall." So if anyone can provide me with good, detailed things to check, it would be much appreciated. And please write it out like I'm 5 - I'm just a developer who has been thrown into trying to figure out network/firewall issues.

    Much Obliged.

    --Sandy

Answers

  • Tuesday, June 24, 2008 12:38 sandy_lamoureux_miltoncat.com
     Answered
    Well, for anyone interested...
    I finally had enough. Our website wouldn't come up after multiple reboots yesterday morning, so we pulled both VMs behind the firewall and tried content deployments, as well as import/export. Not much luck there. But I noticed my content database had grown from 8GB 2 weeks ago to 20GB yesterday! I assume this had to do with the failed/timed out content deployments. So, after trying a few things it came down to this:

    1. Delete the entire web application on production.
    2. Create new web application on production with new blank template.
    3. On staging, delete existing content deployment jobs and create a new full content deployment job to production.
    4. Run full deployment to production.
    5. Move web server back out into DMZ.

    This worked, and now my production content database is around 600MB. (Oddly, my staging content database is 4GB, so I may have some repair to do there as well.)

    Note that deleting the web app was not easy. It put the content database into single-user mode, and I had to delete it manually.

    I have not yet tested the deployments through the firewall as I'd like to have some solid uptime before I possibly bring down the servers again. But I think I've found the issue, and at least the database size is under control.

    I hope this helps save someone some time and frustration in the future.

    If anyone has any ideas on preventing the ever-expanding database in the future, I'm all ears.
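This does not prevent the growth, but catching it early is easier if the database size is recorded regularly and compared against a threshold. The following is a rough sketch in Python of that arithmetic, using the two sizes mentioned in this post. The exact dates are approximations of "2 weeks ago" and "yesterday", and the function names and the 0.25 GB/day threshold are hypothetical.

```python
from datetime import date


def growth_per_day(samples):
    """Average growth in GB per day between the first and last (date, size_gb) sample."""
    (first_day, first_size), (last_day, last_size) = samples[0], samples[-1]
    days = (last_day - first_day).days
    if days <= 0:
        raise ValueError("samples must span at least one day")
    return (last_size - first_size) / days


def exceeds_threshold(samples, max_gb_per_day):
    """Return True if the average daily growth is above max_gb_per_day."""
    return growth_per_day(samples) > max_gb_per_day


# Sizes from the post: about 8 GB two weeks earlier, 20 GB the day before.
# The calendar dates are approximate.
samples = [(date(2008, 6, 9), 8.0), (date(2008, 6, 23), 20.0)]
rate = growth_per_day(samples)  # (20 - 8) / 14 GB per day
```

With these numbers the database grew by 12 GB in 14 days, a little under 1 GB per day, so `exceeds_threshold(samples, 0.25)` would flag it.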

    --Sandy


All replies

  • Wednesday, July 16, 2008 0:28 kuke

    Hi Sandy,

     

    It's worrying that the new Content Deployment QFEs [http://www.andrewconnell.com/blog/archive/2008/07/01/KBs-are-now-out-for-the-two-content-migrationdeployment-QFEs.aspx] you've applied haven't solved the issue - I was kind of hoping they would.

     

    We too have this ever-expanding database issue - previously we thought it was:

    • The recovery model of SQL and log files - so we set it to simple with no luck
    • Versioning on the production side - so I wrote a tool to enumerate all sites/lists and disable versioning with no luck
    • The fact that CD creates a lot of unused space in the DB - so we scheduled regular SHRINK commands with no luck

    The only other real discussion I've found on this issue was here: http://www.eggheadcafe.com/software/aspnet/30906343/full-deployment-of-conten.aspx, which states that the problem is Full Deployments that don't actually clean up after themselves, so you're only supposed to do a Full Deployment once. (We've been running full deployments nightly, as the incrementals were failing occasionally.) So the question is: if you're only supposed to run a Full Deployment once, why can you schedule them?!?

     

    I too would love any update on this issue.

     

    Just when you think you've resolved all your CD problems - you find new ones!

     

    Cheers,

     

    Peter