The conditional value-at-risk (CVaR) is a useful risk measure in fields such as machine learning, finance, insurance, energy, etc. When measuring very extreme risk, the commonly used CVaR estimation method of sample averaging does not work well due to limited data above the value-at-risk (VaR), the quantile corresponding to the CVaR level. To mitigate this problem, the CVaR can be estimated by extrapolating above a lower threshold than the VaR using a generalized Pareto distribution (GPD), which is often referred to as the peaks-over-threshold (POT) approach. This method often requires a very high threshold to fit well, leading to high variance in estimation, and can induce significant bias if the threshold is chosen too low. We address this bias-variance tradeoff by developing a novel CVaR estimator based on the POT approach that is proven to be asymptotically unbiased and less sensitive to lower thresholds being used. It is then shown empirically that the new estimator provides a significant reduction in error compared with competing CVaR estimators in finite samples from heavy-tailed distributions. To demonstrate the use of the estimator in sequential decision-making, it is applied in a best arm identification multi-armed bandit problem under a fixed budget, and a significant performance improvement is shown when compared with other estimators.