Mongodb faster execution for $in

I have two mongodb collections:
1). First collection is: sourcequeuemanualupload . It contains all records with statuses either Unworked or Assigned. This contains records when data is uploaded. Example of one of the Assigned item record in DB:

{
        "_id" : ObjectId("63e0e46a6047d75b9c20d8ec"),
        "Properties: Name" : "Hangman - Guess Words",
        "Appstore URL" : "https://itunes.apple.com/app/id1375993101?hl=None",
        "Region" : "na",
        "Create Date" : "na",
        "AHT" : "1",
        "sourceId" : "63e0e3719b4f812ba5333a31",
        "type" : "Manual",
        "uploadTime" : "2023-02-06T11:28:42.533+0000",
        "status" : "Assigned",
        "batchId" : "63e0e3719b4f812ba5333a31_746f22e4319b4d81b8ab255f5e653c2c_612023112842"
}

2). Second collection is queuedata. It contains the data(questions, responses user takes on tool) for items(from sourcequeuemanualupload collection) worked upon and they are all saved in this collection with a status of Completed. The _id of the record in sourcequeuemanualupload is stored here as “id” field as an identifier. Example of completed item:

{
        "_id" : ObjectId("63e0e4b19b4f812ba5333a34"),
        "templateId" : "63e0e28e9b4f812ba5333a30",
        "id" : "63e0e46a6047d75b9c20d8ec",
        "moderator" : "kodaga",
        "startTime" : "2023-02-06T11:29:46.048Z",
        "endTime" : "2023-02-06T11:29:52.438Z",
        "status" : "Completed",
        "AHT" : NumberLong(6),
        "userInput" : [
                {
                        "question" : "Is the URL leading to the desired store page link?",
                        "response" : "yes"
                },
                {
                        "question" : "Comments, if any.",
                        "response" : "test 1"
                }
        ]
}

There was a mistake made by the DBA initially as he did not update the status of Assigned items that were completed to Completed in sourcequeuemanualupload collection. So the Assigned items are left as Assigned itself. We are working on a fix for this to mark all the Assigned items(that have been worked upon) to Completed.

Volume of data currently:

> db.sourcequeuemanualupload.count()
414781

> db.sourcequeuemanualupload.count({"status":"Assigned"})
306418

> db.queuedata.count()
298128

Script I have written currently to identify the completed records and mark them as completed:


var assigned_tasks_arr = []

db.sourcequeuemanualupload.find({"status":"Assigned"}).forEach(function(rec){
    assigned_tasks_arr.push(rec._id.str);
});

print(assigned_tasks_arr.length)
> 306418

db.queuedata.count({id: {$in: assigned_tasks_arr}, "status":"Completed"})     <------------- STEP 1

var completed_items = db.queuedata.find({id: {$in: assigned_tasks_arr}, "status":"Completed"},{_id:0,id:1}).toArray();       <------------- STEP 2

var completed_items_ids = []

completed_items.forEach(function(rec){
    completed_items_ids.push(rec.id)
})

var completed_items_unique_objectid = []

completed_items.forEach(function(item){
    completed_items_unique_objectid.push(new ObjectId(item));
});

db.sourcequeuemanualupload.updateMany({_id:{$in: completed_items_unique_objectid}}, {$set:{"status":"Completed"}})  <------------- STEP 3

Basically, I fetch all the _ids of Assigned status items from sourcequeuemanualupload collection and store them in an array. Next to find if they actually have been completed and record is present in queuedata, I perform a search using $in and assigned_tasks_arr to get the actually completed items. Once i get the completed ids list, i would like to go ahead and update the statuses in sourcequeuemanualupload.

However, my STEP 1 and STEP 2 are taking>30 minutes for execution. It must be taking more than an hour or longer(but i could not get exact time as my session got terminated). I think the $in with array find query execution takes (N*log(M)), where N is the length of input array and M is the size of the collection. As you see my value of N is in millions and M is also in millions, this query will take a longer time. How do i optimize this query or write a faster query to find the ids and update at once?

I need help with faster execution of STEP 1, STEP 2 and STEP 3 in above script so that I dont have to wait for hours. Can we write a better join query here or any other way to optimize the query?

Thanks

I tried searching online for faster ways to optimize. I added index as ‘id’ to queuedata collection but it is still taking a long time 🙁
Need help for the above query

NOTE: using AWS documentDB 4.0.0 as if you are providing any suggestions on query , please check link: https://docs.aws.amazon.com/documentdb/latest/developerguide/mongo-apis.html as some commands are not supported