We successfully built an active learning (AL) pipeline by training models that quantify both aleatoric and epistemic uncertainty and using those quantities to drive our labeling efforts.
Specifically, after training you measure those uncertainties on a holdout set and then slice the results by feature. While these models tend to be more complex and potentially harder to interpret, we feel the pros outweigh the cons.
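One common way to obtain the two uncertainty estimates, which we sketch here as an illustration (the exact method used in the pipeline may differ), is the entropy decomposition over an ensemble of models: the entropy of the averaged prediction is the total uncertainty, the average of each member's entropy is the aleatoric part, and the difference (the mutual information) is the epistemic part.

```python
import numpy as np

def decompose_uncertainty(member_probs):
    """Split predictive uncertainty from an ensemble into aleatoric
    and epistemic parts via the entropy decomposition.

    member_probs: array of shape (n_members, n_samples, n_classes),
    each row a member's predicted class probabilities.
    Returns (total, aleatoric, epistemic), each of shape (n_samples,).
    """
    eps = 1e-12  # avoid log(0)
    mean_probs = member_probs.mean(axis=0)
    # Total uncertainty: entropy of the ensemble-averaged prediction.
    total = -np.sum(mean_probs * np.log(mean_probs + eps), axis=-1)
    # Aleatoric: average entropy of the individual members' predictions.
    aleatoric = -np.sum(
        member_probs * np.log(member_probs + eps), axis=-1
    ).mean(axis=0)
    # Epistemic: what remains, i.e. disagreement between members.
    epistemic = total - aleatoric
    return total, aleatoric, epistemic

# Members agree on 50/50: all uncertainty is aleatoric.
agree = np.array([[[0.5, 0.5]], [[0.5, 0.5]]])
# Members confidently disagree: mostly epistemic uncertainty.
disagree = np.array([[[0.9, 0.1]], [[0.1, 0.9]]])
```

Samples with high epistemic uncertainty are the ones worth sending to labelers, since more data can reduce that component; high aleatoric uncertainty reflects noise that labeling cannot fix.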
Additionally, exposing a confidence score to your end users is very useful for building trust in the predictions, and when acting on false positives or false negatives carries a nonzero cost, you can devise a strategy that minimizes the expected cost.
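As a minimal sketch of such a strategy, assuming a hypothetical two-class setting where a false negative is five times as costly as a false positive (the costs and classes here are illustrative, not from the original pipeline), you can pick the action with the lowest expected cost rather than simply the most probable class:

```python
import numpy as np

# Hypothetical cost matrix: rows = true class, columns = chosen action.
# Correct decisions cost 0; a false positive costs 1, a false negative 5.
COSTS = np.array([
    [0.0, 1.0],  # true class 0: acting "1" is a false positive
    [5.0, 0.0],  # true class 1: acting "0" is a false negative
])

def min_cost_action(probs, costs=COSTS):
    """Return the action minimizing expected cost E[cost | action] = p @ costs."""
    expected = probs @ costs
    return int(np.argmin(expected))

# With P(class 1) = 0.3, predicting class 1 is still cheaper in expectation
# (0.7 * 1 = 0.7) than predicting class 0 (0.3 * 5 = 1.5).
action = min_cost_action(np.array([0.7, 0.3]))
```

Note how the expected-cost rule can override the argmax of the probabilities when the error costs are asymmetric, which is exactly the situation described above.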