Audit Collection Services - forwarders stop forwarding events

  • Question

  • Hi,

    I'm in the process of deploying ACS and have run into a problem with the forwarders.

    I'm on OpsMgr 2007 R2. About half of my machines (i.e. 350) have had the ACS forwarder enabled. The problem is that many of the agents have stopped forwarding events to the collector. There are no errors in the event log or in the console. In the console the forwarders are all green, and using netstat on the collector server I can see that the machines have established a connection. *However*, the performance object "ACS Collector Client" on the collector, which shows Incoming Audits/sec per forwarder, lists only a few forwarders. This matches the fact that there are no audit events in the database from those forwarders for the current date. In the Operations Manager log on a machine where the forwarder is enabled there are no errors, only an informational event saying the forwarder has successfully connected to the collector. When I restart the forwarder, it is listed under the performance object/counter mentioned above.

    The collector filter I have applied is a positive list of the events I want written to the database, instead of a negative filter, which is the normal approach. I don't think this is the problem, though.

    Has anyone seen this symptom and/or knows how I can troubleshoot the problem?

    br
    Lars

    Has
    Thursday, January 7, 2010 12:58 PM
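The check Lars describes, comparing the machines that netstat shows with an established connection against the forwarders listed under the "ACS Collector Client" performance object, can be sketched in Python. This is a minimal sketch, assuming Windows `netstat -an` output, a collector listening on TCP port 51909 (the usual ACS default, but an assumption here), and a set of reporting forwarder IP addresses you have gathered yourself from the performance counter instances; the function names are hypothetical.

```python
# Hypothetical helper: find forwarders that hold an ESTABLISHED TCP session
# to the ACS collector but are not among the hosts you saw reporting audits.

ACS_PORT = 51909  # assumed ACS collector port; verify in your environment


def connected_forwarders(netstat_text: str, port: int = ACS_PORT) -> set[str]:
    """Return remote IPs with an ESTABLISHED TCP session to the given local port.

    Expects Windows `netstat -an` style rows: Proto, Local, Foreign, State.
    """
    hosts = set()
    for line in netstat_text.splitlines():
        parts = line.split()
        # Skip headers, UDP rows and anything that is not a 4-column TCP row.
        if len(parts) != 4 or parts[0].upper() != "TCP":
            continue
        local, foreign, state = parts[1], parts[2], parts[3]
        if state != "ESTABLISHED":
            continue
        if local.rsplit(":", 1)[-1] != str(port):
            continue
        # Drop the remote port; strip brackets from IPv6 addresses.
        hosts.add(foreign.rsplit(":", 1)[0].strip("[]"))
    return hosts


def silent_forwarders(netstat_text: str, reporting: set[str], port: int = ACS_PORT) -> set[str]:
    """Connected but not reporting: candidates for the 'stopped forwarding' symptom."""
    return connected_forwarders(netstat_text, port) - reporting


sample = """
  Proto  Local Address          Foreign Address        State
  TCP    10.0.0.5:51909         10.0.0.21:1040         ESTABLISHED
  TCP    10.0.0.5:51909         10.0.0.22:1041         ESTABLISHED
  TCP    10.0.0.5:51909         10.0.0.23:1042         TIME_WAIT
  TCP    10.0.0.5:3389          10.0.0.30:1050         ESTABLISHED
"""
print(sorted(silent_forwarders(sample, reporting={"10.0.0.21"})))  # ['10.0.0.22']
```

A set difference keeps the comparison cheap even with hundreds of forwarders; mapping the counter's instance names to IP addresses is left to you.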

Answers

  • Hi,
    Could you start by restoring the default filter on the ACS collector? If you then need to filter events, you can also use Group Policy objects to control what your machines audit. If you restart an ACS forwarder, can you then see events from that machine?
    Anders Bengtsson | Microsoft MVP - Operations Manager | http://www.contoso.se
    Friday, January 8, 2010 8:25 PM
  • Hi Lars

    If restarting the forwarder makes it work, then the noise filter isn't the problem. It sounds like the forwarders were disconnected by the collector at some point and then failed to reconnect. Restarting the forwarder kicks them back on.

    Are you using SQL Server Standard or Enterprise for the ACS database? With Standard Edition, the database must pause during daily maintenance operations. This may cause the ACS collector queue to fill with requests from ACS forwarders. A full ACS collector queue then causes ACS forwarders to be disconnected from the ACS collector. Although the disconnected ACS forwarders should reconnect after the database maintenance is complete, perhaps this is failing to happen.

    If you look at the collector event log, do you see a lot of disconnect events (4631, I think)?

    There is good troubleshooting info here:
    http://securevantage.spaces.live.com/blog/cns!905E136EE69247B4!332.entry

    Cheers

    Graham
    View OpsMgr tips and tricks at http://systemcentersolutions.wordpress.com/
    Monday, January 11, 2010 10:00 AM
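To answer Graham's question about disconnect events in a repeatable way, the collector's events can be tallied per forwarder. A minimal sketch, assuming you have exported the collector's event log into (event ID, forwarder name) pairs; the export format and function name are hypothetical, and 4631 is Graham's unconfirmed guess at the event ID, so it is kept as a parameter.

```python
from collections import Counter

# Graham's unconfirmed guess ("4631 I think"); adjust to the ID you actually see.
DISCONNECT_EVENT_ID = 4631


def disconnects_per_forwarder(events, event_id=DISCONNECT_EVENT_ID):
    """Count disconnect events per forwarder.

    `events` is an iterable of (event_id, forwarder_name) pairs, e.g. rows
    from an exported collector event log (hypothetical format).
    """
    return Counter(fwd for eid, fwd in events if eid == event_id)


events = [
    (4631, "DC01"),
    (9999, "DC01"),  # unrelated event, ignored
    (4631, "DC01"),
    (4631, "SRV07"),
]
print(disconnects_per_forwarder(events).most_common())  # [('DC01', 2), ('SRV07', 1)]
```

A forwarder that dominates this list points to a machine-specific problem rather than a collector-wide one.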

All replies

  • Hi Anders,

    Yes, I could remove my custom filter to restore the default filter, but I don't understand why this would make a difference. I have tested changing my filter to include event IDs that I know are logged on a forwarder that is not forwarding events but that hasn't disconnected according to the forwarder log. When I restart the agent, the events get logged.

    br
    Lars
    Tuesday, January 12, 2010 8:12 PM
  • I don't think it is a filter problem either. It looks as if the forwarder just "gives up"; however, it doesn't log anything, and neither does the collector. Also, the collector has a TCP connection with status ESTABLISHED when I run netstat. A restart solves the problem, but it worries me that there are no errors.

    The SQL Server is Standard Edition. I'll look into when the maintenance runs, but so far the collector queue has not filled up. I will look more into how SQL maintenance affects the forwarders.

    As far as I can see, only one server is generating 4631 events, so that looks like a machine-specific problem.

    I will look at the troubleshooting guides and also enable logging on some of the agents.

    I had hoped that the ACS implementation would be straightforward.

    br
    Lars


    Tuesday, January 12, 2010 8:21 PM
  • Hi Lars.

    The ACS implementation in itself is straightforward. Some issues can make it a more intensive task, but usually not on the ACS forwarder side, since that is the less complex part of ACS. Most issues and challenges are found on the collector side (filtering and so on) and on the reporting side, when an archiving solution is needed or when additional security settings and the planning/scheduling of reports are required.

    As you stated before (or perhaps I misunderstood you), it happens on one server only. Is there any policy in place preventing the ACS forwarder from running properly? Antivirus software, perhaps? A firewall? Other software running only on that server that has issues with ACS?

    Please let us know how it goes. For now I am marking Anders' and Graham's replies as answers.
    Best regards, Marnix Wolf

    (Thoughts on OpsMgr)
    Monday, January 18, 2010 7:40 PM
  • Hi Marnix,

    No, it happens on many servers. I don't think it is a policy, antivirus or any other piece of software. This looks like an ACS bug to me. I'm going to open a support case on this.

    Thanks

    Kind regards
    Lars
    Tuesday, January 19, 2010 3:01 PM
  • Hi Lars

    Please keep us posted on this. I haven't seen this occur anywhere I have implemented ACS, but I would certainly be interested in finding out the underlying cause.

    If anything (and this is a guess!), I would say it is more likely the collector that has the issue than the agents. If you bounce the ACS collector server, do you get any errors in any of the Windows event logs on startup? What is the underlying OS of the collector? Is SQL Server on the same box or remote? Which version of SQL Server?

    A support case though should be the quickest method of resolving the issue.

    Good Luck

    Graham
    View OpsMgr tips and tricks at http://systemcentersolutions.wordpress.com/
    Tuesday, January 19, 2010 3:13 PM
  • Hi Graham,

    Thanks for replying! I will keep you posted.

    If I bounce the collector service, I do not see any errors on the collector other than warnings for some forwarders saying a gap was detected. On some forwarders I get warnings saying they could not connect to the collector (while it is restarting); on others there are no warnings, just an informational event saying the forwarder successfully connected, with no event indicating it had ever disconnected, which I would have expected.

    The OS is Windows Server 2003 (64-bit) SP2.
    SQL Server is on a remote box. It's SQL Server 2005 Standard 64-bit.

    I will do some more investigation and testing.

    br
    Lars


    Thursday, January 21, 2010 10:07 AM
  • Has anyone resolved this problem? I have a similar problem and cannot find anything on the net to lead me to a solution.

    I have 2008 RODCs whose ACS forwarders connect fine to the collector, but my XP POSReady devices constantly connect/disconnect every 1-5 seconds. Any ideas out there are more than welcome! :-)

    Thanks
    Wednesday, April 21, 2010 8:57 PM
  • I too have had this problem pop up in the last 3 weeks or so. All but one of my forwarders just stopped forwarding with no clear errors. What was your resolution for this?
    Monday, July 12, 2010 3:57 PM
  • There is an issue if the Security log is cleared, which happens on a regular basis on the domain controllers. In this scenario the forwarder does not resume forwarding events. It is supposed to be fixed in Cumulative Update 2, although I have not had time to verify this.

    br

    Lars

    Monday, July 12, 2010 8:01 PM
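If the Security log clear is the trigger, one way to shortlist possibly affected forwarders is to compare when each host last had audits arrive in the ACS database with when its Security log was last cleared. A rough sketch, assuming you can obtain both sets of timestamps yourself; the input shapes and function names are hypothetical, and the heuristic will also flag a host that simply logged nothing after the clear. Event IDs 517 (Windows Server 2003) and 1102 (Windows Server 2008) are the Security log's "audit log cleared" events.

```python
from datetime import datetime

# Security-log "audit log cleared" event IDs: 517 on Windows Server 2003,
# 1102 on Windows Server 2008.
LOG_CLEARED_IDS = {517, 1102}


def last_clear_times(events):
    """Latest log-clear time per host from (host, event_id, when) tuples."""
    cleared = {}
    for host, event_id, when in events:
        if event_id in LOG_CLEARED_IDS and (host not in cleared or when > cleared[host]):
            cleared[host] = when
    return cleared


def likely_stalled(last_forwarded, cleared):
    """Hosts whose newest forwarded audit predates their latest log clear.

    `last_forwarded` maps host -> newest audit time seen in the ACS database
    (hypothetical input you would query yourself). Hosts with no forwarded
    events at all are not reported here.
    """
    return sorted(
        host for host, when in cleared.items()
        if host in last_forwarded and last_forwarded[host] < when
    )


events = [
    ("DC01", 517, datetime(2010, 7, 11, 2, 0)),
    ("DC02", 517, datetime(2010, 7, 11, 2, 0)),
    ("DC02", 528, datetime(2010, 7, 12, 9, 0)),  # unrelated logon event
]
last_forwarded = {
    "DC01": datetime(2010, 7, 10, 8, 0),
    "DC02": datetime(2010, 7, 12, 9, 0),
}
print(likely_stalled(last_forwarded, last_clear_times(events)))  # ['DC01']
```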
  • Good to know. I also noticed that mine stopped forwarding at the exact moment 8 new MS patches were installed on my ACS server. Coincidence? I'm about to find out. :)
    Thanks for your reply.

    Monday, July 12, 2010 8:04 PM
  • OK.
    In our case it was a domain policy that had been applied to grant another group read access to the event logs on the DCs. This inadvertently removed the Network Service account's rights to the logs. Changing all audit forwarders to log on as LocalSystem resolved the problem immediately.

    • Proposed as answer by SCOMGabe Friday, July 16, 2010 3:02 PM
    Friday, July 16, 2010 3:02 PM
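The root cause in the last reply was an event log security descriptor that no longer granted Network Service read access. A minimal sketch for checking an SDDL string (such as a log's CustomSD value) for that condition, assuming the event log read right is bit 0x1 and treating the generic GA/GR rights as granting read. It only looks at ACEs naming Network Service directly (alias NS or SID S-1-5-20), ignores group membership and ACE order, and the sample descriptors, including the domain group SID, are illustrative rather than real defaults.

```python
import re

NETWORK_SERVICE = {"NS", "S-1-5-20"}  # SDDL alias and well-known SID
LOG_READ = 0x1  # assumed event log read access bit


def _rights_grant_read(rights: str) -> bool:
    """True if an ACE rights field includes read access."""
    if rights.lower().startswith("0x"):
        return bool(int(rights, 16) & LOG_READ)
    # String form such as "GA" or "GRGW": split into two-letter codes.
    codes = {rights[i:i + 2] for i in range(0, len(rights), 2)}
    return bool(codes & {"GA", "GR"})


def network_service_can_read(sddl: str) -> bool:
    """Check whether an SDDL string lets Network Service read the log."""
    allowed = False
    for ace in re.findall(r"\(([^)]*)\)", sddl):
        fields = ace.split(";")
        if len(fields) < 6:
            continue
        ace_type, rights, sid = fields[0], fields[2], fields[5]
        if sid.upper() not in NETWORK_SERVICE or not _rights_grant_read(rights):
            continue
        if ace_type == "D":
            return False  # treat any matching deny as blocking
        if ace_type == "A":
            allowed = True
    return allowed


# Illustrative descriptors: before and after a policy replaced the ACL with
# one granting a (hypothetical) domain group read access instead.
before = "O:BAG:SYD:(A;;0xf0005;;;SY)(A;;0x5;;;BA)(A;;0x1;;;NS)"
after = "O:BAG:SYD:(A;;0xf0005;;;SY)(A;;0x5;;;BA)(A;;0x1;;;S-1-5-21-1-2-3-1234)"
print(network_service_can_read(before), network_service_can_read(after))  # True False
```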