Good troubleshooting steps for System Center agent failures:
1) If you are seeing Not Monitored on the Root Management Server (RMS), log on to the RMS and verify that the Configuration and Health services are running. You can check this in the Services snap-in. If a Management Server or an agent is showing up as not monitored, verify that the Health Service is running on each of those machines.
2) If you are upgrading from RTM to SP1, make sure that you import the latest updated management packs. The core system MPs are updated automatically after the upgrade, but the non-core MPs are not, so you need to re-import those manually. Kevin Holman wrote a good blog post about this. If you are seeing unmonitored objects after you upgrade, this is a likely cause. All the SP1 MPs should have version 6.0.6278.0. To find the version of each MP, open the OpsMgr console, go to the Administration view, and select the Management Packs node; the version number is listed in the Version column. If you see an MP with a version number below 6278, you will need to re-import it. Kevin also wrote this SQL script, which lists the MPs and their version numbers:
SELECT MPName, MPFriendlyName, MPVersion, MPIsSealed
FROM ManagementPack WITH(NOLOCK)
ORDER BY MPName
3) If someone renames or shuts down a server that is being monitored, that server will show up as not monitored. This happens only if a Management Server or agent had just been deployed when the server was brought down. In most cases you would instead see a grey icon with a check mark, which indicates that the server stopped heartbeating and that its last known state was healthy.
4) If there is no trust between the agent’s domain and the Management Server’s domain, you are likely to get the ‘not monitored’ state. You will also see this issue if there is only a one-way trust between the Management Server and the agent. In OpsMgr 2007, agents initiate communication with the MS, and the MS uses the same channel to communicate back to the agents. An easy way to check whether this is the issue is to look in the event log on the agent for error events stating that mutual authentication could not be established.
5) Gateway Servers are one of the biggest culprits behind the ‘not monitored’ state. In the vast majority of cases, a Gateway Server that shows as not monitored means that the certificates have not been configured correctly, the Gateway Approval command-line tool has not been run, or the wrong Public Key Infrastructure (PKI) is being used.
6) When you deploy a management pack and the action account is configured as a low-privilege account, some workflows (monitors, rules, discoveries, tasks, diagnostics, recoveries) may not be able to execute. By default they run under the low-privilege account, which may not have sufficient rights to access the instrumentation they need in order to function properly. You can get more information on this from Boris’s blog.
7) If the agent action account does not have enough privileges, some of the properties of the server will show up as not monitored. Check the OpsMgr agent event log for event 1201; the agent Health Service normally logs that event for each management pack it downloads. Alternatively, go to the Health State folder under %Program Files%\Microsoft System Center Operations Manager 2007 (there should be a management pack folder) and check whether the agent downloaded the various MPs. If there are no MPs in the folder, that is another sign that the agent is not getting configuration from the Management Server.
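The version check in step 2 can also be scripted once the query output is in hand. The following is a minimal Python sketch, assuming the rows have already been exported as (name, version) pairs. The function names and the pack names in the sample are made up for illustration; only the 6.0.6278.0 threshold comes from the text above.

```python
# Minimal sketch: flag management packs whose version is below the SP1 level.
# The sample rows are illustrative, not taken from a real management group.

SP1_VERSION = (6, 0, 6278, 0)  # SP1 MP version cited in step 2


def parse_version(text):
    """Turn a dotted version string such as '6.0.6278.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def packs_below_sp1(rows, minimum=SP1_VERSION):
    """Return the (name, version) pairs whose version sorts below `minimum`."""
    return [
        (name, version)
        for name, version in rows
        if parse_version(version) < minimum
    ]


if __name__ == "__main__":
    # Hypothetical rows in the shape (MPName, MPVersion).
    sample_rows = [
        ("Example.Core.Library", "6.0.6278.0"),
        ("Example.NonCore.Pack", "6.0.5000.0"),
    ]
    for name, version in packs_below_sp1(sample_rows):
        print(f"{name} {version}: candidate for re-import")
```

Custom or third-party packs follow their own version numbering, so treat a flagged row as a prompt to look closer rather than proof that a re-import is needed.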
While the above steps may not be the direct solution to the problem, they should help put you on the right track to diagnosing the root of the issue.