> To find the most informative examples, we separately cluster examples labeled clickbait and examples labeled benign, which yields some overlapping clusters
How can you get overlapping clusters if the two sets of labelled examples are disjoint?
The information you're seeking appears to be left out of the post. My best guess is that a separate embedding model, specifically tuned for document similarity, is used to generate the vectors, and a clustering algorithm is then run over them. They may also use PCA to reduce the dimensionality of the embeddings before clustering.
> How can you get overlapping clusters if the two sets of labelled examples are disjoint?
What's disjoint are the training labels and the classifier's output, not the values in high-dimensional space. For classification tasks, neighboring items can sit in the same cluster yet fall on opposite sides of the hyperplane, and therefore be placed in different classes despite the proximity.
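A toy sketch of that point, in case it helps. Everything here is made up for illustration (the 2-D points, the hyperplane `w`, `b`, and the `side` helper); it is not the post's actual classifier, only a linear boundary showing that two close points can receive different labels.

```python
import math

def side(w, b, x):
    """Label a point by which side of the hyperplane w.x + b = 0 it falls on."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return "clickbait" if score > 0 else "benign"

# Hypothetical decision boundary: the vertical line x0 = 1.
w, b = (1.0, 0.0), -1.0

# Hypothetical 2-D "embeddings": p and q are near neighbors, r is far away.
p = (0.95, 0.3)
q = (1.05, 0.3)
r = (5.0, 0.3)

print(side(w, b, p), side(w, b, q), side(w, b, r))
print(math.dist(p, q) < math.dist(q, r))
```

Here `p` and `q` are much closer to each other than `q` is to `r`, yet `p` and `q` get different labels while `q` and `r` share one.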
If the diagram is representative of what is happening, it would seem that each cluster is represented as a hypersphere, possibly using the cluster centroid as the center and the max distance from the centroid to any cluster member as the radius. Those hyperspheres can then overlap even though the underlying sets of examples are disjoint. Not sure if that is what is actually happening though.
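Here is a minimal sketch of that hypersphere reading, under the same guess: center at the centroid, radius equal to the max distance to any member. The point sets and the helper names `bounding_sphere` and `spheres_overlap` are hypothetical, and the post does not confirm this is how overlap is defined.

```python
import math

def bounding_sphere(points):
    """Return (centroid, radius), with radius = max distance from centroid to a member."""
    dim = len(points[0])
    centroid = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    radius = max(math.dist(centroid, p) for p in points)
    return centroid, radius

def spheres_overlap(a, b):
    """Two spheres overlap when their centers are no farther apart than the sum of their radii."""
    (ca, ra), (cb, rb) = a, b
    return math.dist(ca, cb) <= ra + rb

# Hypothetical 2-D "embeddings"; the two labeled sets share no points.
clickbait_cluster = [(0.0, 0.0), (2.0, 0.0), (1.0, 1.0)]
benign_cluster = [(1.5, 0.0), (3.5, 0.0), (2.5, 1.0)]
far_cluster = [(10.0, 10.0), (11.0, 10.0)]

print(set(clickbait_cluster).isdisjoint(benign_cluster))  # True: no shared examples
print(spheres_overlap(bounding_sphere(clickbait_cluster),
                      bounding_sphere(benign_cluster)))   # True: the spheres intersect
print(spheres_overlap(bounding_sphere(clickbait_cluster),
                      bounding_sphere(far_cluster)))      # False
```

So disjoint sets of examples can still produce overlapping clusters once each cluster is summarized by a region rather than by its member list.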