Last Updated 7-21-2008
Last name:
Enstore
Check the alarms pages (stken, d0en, cdfen) and deal with any critical alarms that have not been taken care of, and clear any alarms that have been taken care of as appropriate.
Check for any security-related incidents. Immediately investigate any netscan alarms that may be security related. If an alarm appears to be the result of a potential system compromise, immediately follow the procedure to report a security incident.
Check for Read CRC or selective CRC alarms on files. Investigate the affected volumes immediately, notify users of the affected files, clone the tape, and send the original out for recovery as necessary. Investigate whether the data loss can be attributed to a tape drive; if so, take the drive out of service immediately and investigate other volumes written around the same time by that drive.
Check for offline tape drives.
Check the volumes with issues pages (stken, d0en, cdfen). Arrange to have any volumes cloned or otherwise taken care of. If appropriate, notify the users of any file integrity issues immediately. For notification of damaged files, the mail addresses are cms-t1@fnal.gov;enstore-admin@fnal.gov for CMS, cdf operations cdfdh_oper@fnal.gov;enstore-admin@fnal.gov for CDF, d0en-announce@fnal.gov;enstore-admin@fnal.gov for D0, and for individual stken users the authorization list plus enstore-admin@fnal.gov.
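The recipient rules for damaged-file notifications can be captured in a small lookup, which may help avoid mistyped addresses. The following is only a sketch: the addresses are the ones listed in this checklist, while the function names and the way an authorization list is passed in are hypothetical and not part of any enstore tool.

```python
# Sketch only: helper names are hypothetical; addresses come from the checklist.
ENSTORE_ADMIN = "enstore-admin@fnal.gov"

# Experiment-wide notification lists for damaged files.
EXPERIMENT_LISTS = {
    "CMS": "cms-t1@fnal.gov",
    "CDF": "cdfdh_oper@fnal.gov",
    "D0": "d0en-announce@fnal.gov",
}


def damaged_file_recipients(group, authorization_list=()):
    """Return the addresses to notify about damaged files.

    For CMS, CDF and D0 the experiment list is used. For any other
    (individual stken) user, the caller must supply the addresses from
    that user's authorization list. enstore-admin is always included.
    """
    key = group.strip().upper()
    if key in EXPERIMENT_LISTS:
        recipients = [EXPERIMENT_LISTS[key]]
    else:
        if not authorization_list:
            raise ValueError(
                "individual stken users need their authorization list"
            )
        recipients = list(authorization_list)
    if ENSTORE_ADMIN not in recipients:
        recipients.append(ENSTORE_ADMIN)
    return recipients


def format_recipients(recipients):
    """Join addresses with ';' as they are written in the checklist."""
    return ";".join(recipients)
```

For example, `format_recipients(damaged_file_recipients("CDF"))` yields the CDF address pair in the same semicolon-separated form used above.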
Scan the plots on the plots pages (stken, d0en, cdfen) for any discrepancies.
Scan the cron plots (stken, d0en, cdfen) to make sure all cron jobs are executing. For any that are not, investigate and fix, or report to mailto:enstore-admin@fnal.gov?subject=Attn:Devel for development assistance.
Walk through the enstore areas and check the stardom and nexsan raid array lights and status displays for disk failures or other problem indications.
Check email from enstore for any degraded enstore raid arrays (note: you must read the email). Replace the drive and rebuild any arrays that are degraded. Check that mail is being received from all enstore raid systems.
Check email from enstore to confirm that the metadata backups completed successfully. Check that metadata backup email is being received daily from all three systems: stken, d0en and cdfen.
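The "mail received from all three systems" check is easy to get wrong by eye, because a missing message is less noticeable than a failure message. A minimal sketch of the comparison follows. It assumes you have already collected, by whatever means, the names of the systems whose backup mail arrived today; the function name and that input are hypothetical, and reading the mailbox itself is not covered here.

```python
# Sketch only: the three system names come from the checklist; the function
# and its input (systems whose backup mail was seen today) are hypothetical.
EXPECTED_SYSTEMS = ("stken", "d0en", "cdfen")


def missing_backup_reports(reported_systems):
    """Return the expected systems from which no backup mail was seen."""
    seen = {name.strip().lower() for name in reported_systems}
    return [system for system in EXPECTED_SYSTEMS if system not in seen]
```

An empty result means mail arrived from all three systems. It does not say whether the backups succeeded, so each message still has to be read.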
dCache
Check that all cell services (stken, cdfen) are operational at the beginning and end of the day. If not, investigate and fix, and get developer assistance from mailto:dcache-admin@fnal.gov?subject=Attn:Devel.
Check e-mail (dcache-admin, dcache-auto) for reports of problems (this should include pageDcache reports).
Monitor pageDcache messages for door problems and resolve them or report them to dcache-admin for development assistance if needed.
Check email from enstore for any degraded public dCache raid arrays (note: you must read the email). Replace the drive and rebuild any arrays that are degraded. Check that mail is being received from all public dCache raid systems.
Check the incremental public dCache metadata consistency pages (cdf, d0, stken) for discrepancies. See the pnfs inconsistencies investigation guide for guidance.
Enter additional comments: