MongoDB GridFS triples the file sizes

files mongodb mongodb-3.6

I really like using MongoDB to store my data. Recently I tried out GridFS, and it fits my use case well.

My problem with it is the space requirement, which seems quite odd. I have ~107GB of images in Amazon S3, around 1 million files (all images, mostly small ones). I wrote a simple Java project that downloads the images from S3 and inserts them into two separate MongoDB GridFS collections (single server, MongoDB 3.6.5, 64-bit, Windows Server 2016). The problem is that when the upload/download completes, the GridFS collections take up more than 300GB of storage on the server. Is this expected for this kind of collection, or should I worry about the tripled size?
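As a rough sanity check on the numbers above, the sketch below works through the arithmetic. It assumes decimal units (1 GB = 10^9 bytes), files of roughly average size, and the 255 KiB default chunk size used by the GridFS drivers; it ignores per-document BSON overhead and index sizes. The class name `GridFsEstimate` is hypothetical.

```java
import java.util.Locale;

/**
 * Back-of-the-envelope estimate for the figures in the question.
 * Assumptions (not measured): 1 GB = 10^9 bytes, average-sized files,
 * and the 255 KiB default GridFS chunk size. Index and BSON overhead ignored.
 */
class GridFsEstimate {
    // Default GridFS chunk size in the drivers: 255 KiB = 261120 bytes.
    static final long DEFAULT_CHUNK_SIZE = 255L * 1024L;

    // Number of chunk documents for a file: ceil(length / chunkSize).
    // The last chunk is not padded, so the chunk payload equals the file length.
    static long chunkCount(long length, long chunkSize) {
        return (length + chunkSize - 1) / chunkSize;
    }

    public static void main(String[] args) {
        long totalBytes = 107_000_000_000L;     // ~107 GB of images in S3
        long fileCount = 1_000_000L;            // ~1 million files
        long observedBytes = 300_000_000_000L;  // >300 GB reported on the server

        long averageBytes = totalBytes / fileCount;
        long chunksPerAverageFile = chunkCount(averageBytes, DEFAULT_CHUNK_SIZE);
        double ratio = (double) observedBytes / totalBytes;

        System.out.println("average file size: " + averageBytes + " bytes");
        System.out.println("chunks per average file: " + chunksPerAverageFile);
        // Locale.ROOT keeps the decimal separator a '.' regardless of system locale.
        System.out.println(String.format(Locale.ROOT, "observed / source size: %.2f", ratio));
    }
}
```

This prints an average of 107000 bytes, 1 chunk per average file, and a ratio of 2.80. Because a typical image fits in a single chunk and GridFS does not pad the last chunk, chunk splitting by itself would not be expected to multiply the stored bytes; note also that the 300GB figure covers both GridFS collections together.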

Note: I simply inserted the images using the Java MongoDB driver (Spring Boot) without any significant customization; the extra space is in the image chunks. I do not delete or update any images (although I defined a unique index on the MD5 field to skip duplicate images), so compact and repair do not change the collection sizes. As far as I can tell, the collections are not overly preallocated, so I don't think my problem is the same as this one: Huge size on mongodb's gridfs. Should I compact?
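The unique index on the MD5 field relies on the server rejecting a duplicate after the upload has started. A minimal sketch of the alternative, checking the digest on the client before any chunk is written, is below. It assumes the stored `md5` value is the lowercase hex MD5 digest of the file bytes (worth confirming against an existing files document); the class `Md5Dedup` and its methods are hypothetical, and the actual GridFS upload call is omitted.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Set;

/**
 * Sketch: client-side MD5 de-duplication before uploading to GridFS.
 * Hypothetical names; the upload itself is not shown.
 */
class Md5Dedup {
    // Digests already seen in this run.
    private final Set<String> seen = new HashSet<>();

    // Lowercase hex MD5 digest of the given bytes.
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(Character.forDigit((b >> 4) & 0xF, 16));
                sb.append(Character.forDigit(b & 0xF, 16));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support MD5.
            throw new IllegalStateException(e);
        }
    }

    // True if these bytes have not been seen yet, i.e. they should be uploaded.
    boolean shouldUpload(byte[] data) {
        return seen.add(md5Hex(data));
    }

    public static void main(String[] args) {
        Md5Dedup dedup = new Md5Dedup();
        byte[] a = "image-a".getBytes(StandardCharsets.UTF_8);
        byte[] b = "image-b".getBytes(StandardCharsets.UTF_8);
        System.out.println(dedup.shouldUpload(a)); // true
        System.out.println(dedup.shouldUpload(b)); // true
        System.out.println(dedup.shouldUpload(a)); // false (duplicate)
    }
}
```

The in-memory set only covers a single run; for a restartable import it would need to be seeded from the digests already stored, or replaced by a lookup against the files collection.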

Also, it is currently a single MongoDB server, without a replica set.

Thank you very much for your help!