We have a db collection that is around 30 million documents, and I need to trim it down, to only keeping the documents created on the last month.
One approach would be use the remove
command with a condition on the created_at
field (the collection already have an index on this field):
db.my_collection.remove({created_at: {$lte: new Date("11/01/2012")}});
But this approach will be very slow, instead of that, a better way to do it is rename the current collection (for instance to “old_collection”) using renameCollection
. Then performing a query-and-insert from the “old_collection” into “my_collection”:
db.my_collection.renameCollection("old_collection");
db.createCollection("my_collection");
db.my_collection.createIndex(...); // recreate the indexes for the collection
// copy docs from old collection into the new collection
db.old_collection.find(
{created_at: {$gte: new Date("11/01/2012")}} ).sort({_id: -1}).forEach(
function(row) { db.my_collection.insert(row); } ); // drop old collection db.old_collection.drop();
This approach is typically faster than running a bunch of removes on your data