What is tape migration?

Tape migration is an administrator-provided service that relocates files to other physical media. The smallest unit of migration is a single file, but migration is usually done in batches grouped by volume. Migration processes run in the background and acquire system resources at the lowest priority, which minimizes the impact on ENSTORE system performance. An ordinary user who accesses files only through the /pnfs path will see no difference before and after migration, and the files remain available even while a migration is in progress.

Tape migration has been under development for two years and is now fairly stable. An early version was used for the CDF 9940 -> 9940B migration in 2003-2004, and the current version was used for the STKEN 9840 (eagle) -> 9940B migration in 2004; both involved more than 1000 volumes. The CMS 9940 -> 9940B migration will start very soon.

How does tape migration help?

The following are typical applications of tape migration.

[1] migration to different media
Newer media/drives usually offer larger capacity, better price/performance and lower operating cost. To take advantage of newer technology, we can migrate data off old media onto new media. In the 9940 -> 9940B case, a 9940B tape is physically the same as a 9940 tape but uses a different format and holds more than three times as much data (200GB vs. 60GB). By migrating files off 9940 tapes and recycling them as 9940B tapes, we effectively tripled the capacity without adding new media. In other cases, the older media/drives were obsolete or no longer supported, and the data had to be migrated.

[2] data compaction
Deleted files still occupy space on the media. Migration is a way to reclaim that space.

[3] problematic media
When a tape starts to show signs of problems, its data can be migrated to a good one.

What is involved?
Even though migration is transparent to users, this is what happens behind the scenes.

Life cycle of a file in migration:

[0] Notation: p(x) is the pnfs entry, with all its attributes, for path x; f(x) is the file record, with all its attributes, for bfid x. Assume that file p1 in pnfs, with bfid b1, is on volume v1 and is about to be migrated.

[1] copy p1 to disk file d1
    -- the temporary disk file d1 is uniquely named based on the volume name and location cookie

[2] copy d1 to p2, getting bfid b2 on volume v2; delete d1
    -- p2 is in a special pnfs area that is kept separate from the rest
    -- p2 is named based on p1

[3] swap the metadata of p1(b1) and p2(b2)
    -- modify f(b2) so that its pnfsid points to p(p1)
    -- copy p(p2) to p(p1)
    -- now p(p1) points to f(b2), and reading through p1 reads f(b2) on v2

[4] final check
    -- read the file back through p1 and check the CRC
    -- if this succeeds, mark f(b1) deleted and remove p2; this concludes the migration of p1

Note:
* Each step guarantees its own success before proceeding to the next step.
* Before any metadata is changed, an extensive series of paranoid checks is performed. If any check fails, the whole step fails.
* The state of each step is recorded in a persistent store, so the task can be reissued with the same parameters and will pick up where it left off.
* The file is always available.
    -- before step [3], p(p1) reads f(b1) on v1; immediately after step [3], p(p1) reads f(b2) on v2
* The history is recorded.

In batch mode, step [4] (final check) is delayed until the migrate-to volume is invariant, i.e., no longer being written to. Should any error occur, no metadata is ever lost, and p(p1) can always be restored to point to f(b1).

What's the impact on users?

Migration is transparent to users. It requires no user involvement and no dedicated resources. To the ENSTORE system, migration is just another user task: when it needs system resources, it acquires them at the lowest priority, yielding to other user transfers.
It is also a long-running background task, so it makes full use of otherwise unused system capacity.
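The per-file life cycle described above, steps [1] to [4] with each step's state saved to a persistent store so that a reissued task resumes where it left off, can be sketched as a small resumable state machine. This is a hypothetical illustration, not ENSTORE's actual code: plain dicts stand in for pnfs, the file database and the tapes, a JSON file stands in for the persistent store, and the migration-area path `/pnfs/migration`, the `-m` suffix for new bfids, the `stop_after` interruption hook and the use of `zlib.adler32` as the CRC are all assumptions made for the example.

```python
import json
import os
import zlib

# Hypothetical sketch, not ENSTORE code. Plain dicts stand in for pnfs,
# the file database and the tapes; a JSON file stands in for the
# persistent store that records how far each file has progressed.
STEPS = ["copy_to_disk", "copy_to_tape", "swap_metadata", "final_check"]


class Migrator:
    def __init__(self, pnfs, filedb, tapes, state_path, spool_dir):
        self.pnfs = pnfs          # path -> {"pnfsid": ..., "bfid": ...}
        self.filedb = filedb      # bfid -> {"volume", "cookie", "pnfsid", "crc", "deleted"}
        self.tapes = tapes        # bfid -> bytes stored on tape
        self.state_path = state_path
        self.spool_dir = spool_dir

    def _load(self):
        if os.path.exists(self.state_path):
            with open(self.state_path) as f:
                return json.load(f)
        return {}

    def _save(self, state):
        with open(self.state_path, "w") as f:
            json.dump(state, f)

    def migrate(self, p1, v2, stop_after=None):
        """Run the remaining steps for p1; return True once all are done."""
        state = self._load()
        rec = state.setdefault(p1, {"done": 0})  # number of completed steps
        while rec["done"] < len(STEPS):
            getattr(self, STEPS[rec["done"]])(p1, v2, rec)
            rec["done"] += 1
            self._save(state)                    # persist before the next step
            if stop_after is not None and rec["done"] == stop_after:
                return False                     # simulated interruption
        return True

    def _disk_path(self, b1):
        f1 = self.filedb[b1]
        # [1] d1 is named from the source volume and location cookie
        return os.path.join(self.spool_dir, f"{f1['volume']}_{f1['cookie']}")

    def copy_to_disk(self, p1, v2, rec):
        b1 = self.pnfs[p1]["bfid"]
        rec["b1"] = b1
        with open(self._disk_path(b1), "wb") as f:
            f.write(self.tapes[b1])

    def copy_to_tape(self, p1, v2, rec):
        d1 = self._disk_path(rec["b1"])
        with open(d1, "rb") as f:
            data = f.read()
        b2 = rec["b1"] + "-m"                    # hypothetical new bfid
        p2 = "/pnfs/migration" + p1              # hypothetical migration area
        self.tapes[b2] = data
        self.pnfs[p2] = {"pnfsid": "id-" + p2, "bfid": b2}
        self.filedb[b2] = {"volume": v2, "cookie": "0", "pnfsid": "id-" + p2,
                           "crc": zlib.adler32(data), "deleted": False}
        os.remove(d1)                            # [2] delete d1
        rec.update(b2=b2, p2=p2)

    def swap_metadata(self, p1, v2, rec):
        b1, b2 = rec["b1"], rec["b2"]
        # Paranoid check before any metadata is touched.
        if self.filedb[b2]["crc"] != self.filedb[b1]["crc"]:
            raise RuntimeError("CRC mismatch; metadata left unchanged")
        self.filedb[b2]["pnfsid"] = self.pnfs[p1]["pnfsid"]
        # Copy p(p2)'s file attributes (here just the bfid) into p(p1).
        self.pnfs[p1]["bfid"] = self.pnfs[rec["p2"]]["bfid"]

    def final_check(self, p1, v2, rec):
        data = self.tapes[self.pnfs[p1]["bfid"]]  # read back through p1
        if zlib.adler32(data) != self.filedb[rec["b1"]]["crc"]:
            raise RuntimeError("read-back CRC mismatch")
        self.filedb[rec["b1"]]["deleted"] = True  # mark f(b1) deleted
        del self.pnfs[rec["p2"]]                  # remove p2
```

Calling `migrate(p1, v2, stop_after=2)` simulates an interruption after step [2]; reissuing `migrate(p1, v2)` with a fresh `Migrator` reads the saved state and finishes steps [3] and [4]. Until step [3] runs, p1 still points to b1, so the file stays readable throughout. A real implementation would also have to make each step safe to repeat, since a crash after a step finishes but before its state is saved would cause that step to run again.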