202308 Wanjuan

We release Wanjuan 1.0, a large-scale multi-modal dataset for pretraining.