Last Updated 7-21-2008
Last name:

Enstore

Check the alarms pages (stken, d0en, cdfen). Deal with any critical alarms that have not yet been handled, and clear alarms that have been resolved, as appropriate.

Check for security-related incidents. Immediately investigate any netscan alarms that may be security related. If an alarm appears to be the result of a potential system compromise, immediately follow the procedure to report a security incident.

Check for Read CRC or selective CRC alarms on files. Investigate the affected volumes immediately, notify users of affected files, and clone the tape and send the original out for recovery as necessary. Determine whether the data loss can be attributed to a tape drive; if so, take the drive out of service immediately and investigate other volumes written by that drive around the same time.

Check for offline tape drives.

Check the volumes with issues pages (stken, d0en, cdfen). Arrange to have any volumes cloned or otherwise taken care of. If appropriate, notify the users of any file integrity issues immediately. For notification of damaged files, the mail addresses are cms-t1@fnal.gov;enstore-admin@fnal.gov for CMS, cdfdh_oper@fnal.gov;enstore-admin@fnal.gov (CDF operations) for CDF, d0en-announce@fnal.gov;enstore-admin@fnal.gov for D0, and, for individual stken users, the authorization list plus enstore-admin@fnal.gov.
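The address rules in the item above can be written down as a small lookup. This is only a sketch: the addresses are copied from the checklist, but the function name, the dictionary, and the group keys are hypothetical and are not part of any enstore tool.

```python
# Addresses are taken from the checklist item; all names below are hypothetical.
ENSTORE_ADMIN = "enstore-admin@fnal.gov"

DAMAGED_FILE_CONTACTS = {
    "CMS": ["cms-t1@fnal.gov"],
    "CDF": ["cdfdh_oper@fnal.gov"],
    "D0": ["d0en-announce@fnal.gov"],
}


def damaged_file_recipients(group, authorization_list=()):
    """Return the addresses to notify about damaged files for a group.

    Groups without a fixed contact (individual stken users) fall back to
    the authorization list supplied by the caller. enstore-admin is
    always included.
    """
    contacts = DAMAGED_FILE_CONTACTS.get(group.upper())
    if contacts is None:
        contacts = list(authorization_list)
    return contacts + [ENSTORE_ADMIN]
```

For an individual stken user, the caller passes that user's authorization list as the second argument, and the result is that list plus enstore-admin@fnal.gov.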

Scan the plots on the plots pages (stken, d0en, cdfen) for any discrepancies.

Scan the cron plots (stken, d0en, cdfen) to make sure all cron jobs are executing. For any that are not, investigate and fix, or report to mailto:enstore-admin@fnal.gov?subject=Attn:Devel for development assistance.

Walk through the enstore areas and check the stardom and nexsan raid array lights and status displays for disk failures or other problem indications.

Check email from enstore for any degraded enstore raid arrays (note: you must read the email). Replace the drive and rebuild any array that is degraded. Check that mail is being received from all enstore raid systems.

Check email from enstore that the metadata backups successfully completed. Check that metadata backup email is being received daily from all three systems: stken, d0en and cdfen.
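The "received from all three systems" part of this check can be sketched as below. It assumes, hypothetically, that each backup message names its system (stken, d0en or cdfen) in the subject line; the actual mail format is not described here, and the function name is illustrative. It only detects missing mail. Whether each backup completed successfully still has to be read from the message itself.

```python
# Hypothetical sketch: assumes the system name appears in each subject line.
EXPECTED_SYSTEMS = ("stken", "d0en", "cdfen")


def missing_backup_reports(subjects, expected=EXPECTED_SYSTEMS):
    """Return the expected systems not mentioned in any of today's subjects."""
    lowered = [subject.lower() for subject in subjects]
    return [
        system
        for system in expected
        if not any(system in subject for subject in lowered)
    ]
```

An empty result means mail arrived from every system; any name returned is a system whose backup mail should be chased up.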

dCache

Check that all cell services (stken, cdfen) are operational at the beginning and end of the day. If any are not, investigate and fix, getting developer assistance from mailto:dcache-admin@fnal.gov?subject=Attn: Devel.

Check e-mail (dcache-admin, dcache-auto) for reports of problems (this should include pageDcache reports).

Monitor pageDcache messages for door problems and resolve them or report them to dcache-admin for development assistance if needed.

Check email from enstore for any degraded public dCache raid arrays (note: you must read the email). Replace the drive and rebuild any array that is degraded. Check that mail is being received from all public dCache raid systems.

Check the incremental public dCache metadata consistency pages (cdf, d0, stken) for discrepancies. See the pnfs inconsistencies investigation guide for guidance.



Enter additional comments: