Hugging Face's (HF) Git LFS storage backend makes it easy to publish datasets and models, but hard for anyone to collaborate on repos: LFS stores a full copy of every changed file, so history bloats and operations get slower with every change.
Now, with over 12PB stored in LFS (1.3M models, 450k datasets, 680k spaces), HF is replacing LFS with XetHub's storage backend, which uses content-defined chunking and Merkle trees to deduplicate at the block level. This makes it possible to push small changes to huge files without transferring or storing the whole file again.
The implementation scales to 100TB repos while preserving all Git history to enable fast development on HF Hub.
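To make the block-level deduplication idea concrete, here is a minimal Python sketch. It is not XetHub's actual code or format: the gear-style rolling hash, the chunk-size constants, the `ChunkStore` class, and the `merkle_root` helper are all assumptions chosen for illustration. It splits a file at content-defined boundaries, stores each chunk under its SHA-256 digest, and shows that re-uploading a file after a small edit only adds the chunks around the edit.

```python
import hashlib
import random

# Illustrative parameters (assumptions, not XetHub's real settings).
MASK = (1 << 11) - 1      # cut when the low 11 hash bits are zero (~2 KiB average)
MIN_CHUNK = 256           # avoid very small chunks
MAX_CHUNK = 16 * 1024     # force a cut if no boundary is found

# Fixed pseudo-random table for a gear-style rolling hash.
_table_rng = random.Random(0)
GEAR = [_table_rng.getrandbits(32) for _ in range(256)]


def chunk(data):
    """Split bytes at boundaries that depend on content, not on offsets."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Older bytes shift out of the 32-bit hash, so it tracks a sliding window.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= MIN_CHUNK and (h & MASK) == 0) or size >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def merkle_root(hashes):
    """Hash digests pairwise, level by level, until one root remains.

    A leftover node at the end of an odd-length level is hashed on its own.
    """
    if not hashes:
        return hashlib.sha256(b"").hexdigest()
    level = list(hashes)
    while len(level) > 1:
        level = [
            hashlib.sha256("".join(level[j:j + 2]).encode()).hexdigest()
            for j in range(0, len(level), 2)
        ]
    return level[0]


class ChunkStore:
    """Hypothetical in-memory store that keeps each unique chunk once."""

    def __init__(self):
        self.blobs = {}

    def put(self, data):
        """Store data; return its chunk digests and how much was actually new."""
        recipe, new_chunks, new_bytes = [], 0, 0
        for c in chunk(data):
            digest = hashlib.sha256(c).hexdigest()
            if digest not in self.blobs:
                self.blobs[digest] = c
                new_chunks += 1
                new_bytes += len(c)
            recipe.append(digest)
        stats = {"total_chunks": len(recipe),
                 "new_chunks": new_chunks,
                 "new_bytes": new_bytes}
        return recipe, stats

    def get(self, recipe):
        """Reassemble a file from its list of chunk digests."""
        return b"".join(self.blobs[d] for d in recipe)


# Demo: upload a 256 KiB file, then the same file with 10 bytes inserted mid-way.
data_rng = random.Random(42)
size = 256 * 1024
v1 = data_rng.getrandbits(8 * size).to_bytes(size, "little")
v2 = v1[:size // 2] + b"small edit" + v1[size // 2:]

demo_store = ChunkStore()
recipe_v1, stats_v1 = demo_store.put(v1)
recipe_v2, stats_v2 = demo_store.put(v2)
print("v1:", stats_v1, "root", merkle_root(recipe_v1)[:12])
print("v2:", stats_v2, "root", merkle_root(recipe_v2)[:12])
```

Because chunk boundaries are derived from the bytes themselves, an insertion shifts later offsets without changing where later boundaries fall, so the second upload typically stores only a handful of new chunks. A fixed-size block scheme would see every block after the insertion change. The Merkle root here only serves as a compact fingerprint of a file version; a real system would also use the tree to find shared subtrees quickly.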
Read more:
- Acquisition: https://huggingface.co/blog/xethub-joins-hf
- Tech: https://xethub.com/assets/docs/how-xet-deduplication-works
- Benchmarks: https://about.xethub.com/blog/benchmarking-the-modern-develo...