HPC, ABFT, Fault Tolerance, Parallel and Distributed Systems, MPI, Parallel Search, Checkpointing, Parallel Dynamic Programming