Long story short, I’m trying to fetch all e-mails in my Gmail inbox, which happens to have around 36k messages. The way I’m doing it is by using the googleapis
library to fetch each e-mail (given that I already have the list of ids):
```typescript
return this.gmailApi.users.messages.get({
  userId: 'me',
  id: email.id,
  format: 'metadata',
  metadataHeaders: ['Date', 'Subject', 'From']
});
```
Since I have thousands of e-mails to fetch, I’m trying to do as many as possible at the same time.
- First attempt was trying to fetch all e-mails at once with `Promise.all`
- Second attempt was doing the same, but in batches of 500*
- Third attempt reduced the batch size to 50*

*: still using `Promise.all` for each batch, iterating over the batches with a `for await` loop (see the sketch below)
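For reference, the batched attempts looked roughly like this (a simplified sketch of my own code; `fetchMessage` wraps the `messages.get` call above and `emailIds` is the list of ids I already have):

```typescript
import { gmail_v1 } from 'googleapis';

const BATCH_SIZE = 50; // also tried 500, and "everything at once" in the first attempt

// Same metadata call as shown above, for a single message id.
function fetchMessage(gmail: gmail_v1.Gmail, id: string) {
  return gmail.users.messages.get({
    userId: 'me',
    id,
    format: 'metadata',
    metadataHeaders: ['Date', 'Subject', 'From'],
  });
}

// Batched attempt: one Promise.all per slice of ids; still crashes with EMFILE.
async function fetchAllInBatches(gmail: gmail_v1.Gmail, emailIds: string[]) {
  const results: Awaited<ReturnType<typeof fetchMessage>>[] = [];
  for (let i = 0; i < emailIds.length; i += BATCH_SIZE) {
    const batch = emailIds.slice(i, i + BATCH_SIZE);
    results.push(...(await Promise.all(batch.map((id) => fetchMessage(gmail, id)))));
  }
  return results;
}
```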
In all attempts, I got the following error:
```
Uncaught GaxiosError Error: request to <gmail api url> failed, reason: connect EMFILE <xxx.xxx.xx.xxx:xxx> - Local (undefined:undefined)
at _request (/gmail-api/node_modules/gaxios/build/src/gaxios.js:149:19)
at processTicksAndRejections (<node_internals>/internal/process/task_queues:95:5)
gaxios.js:149
Process exited with code 1
```
Now, I know this is related to the number of file descriptors my OS allows me to keep open (mine is set to 4096, I believe), so it probably has nothing to do with the Google APIs themselves. So my question is:
What’s a performant way of dealing with this that won’t take hours (like it does when I fetch e-mails sequentially) and won’t exceed the maximum allowed number of file descriptors? I’m not sure yet why running it in small batches also didn’t work, but I strongly suspect there are better ways of handling this, either with connection reuse or some other processing technique.
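Just to make “some other processing technique” a bit more concrete, the kind of thing I have in mind is a small fixed-size pool that keeps only N requests in flight at any moment instead of firing a whole batch at once. This is only a sketch of the idea (the `fetchOne` callback is a placeholder for the `messages.get` call above), not something I’ve confirmed avoids the EMFILE error:

```typescript
// Sketch of a fixed-concurrency pool: at most `concurrency` calls to
// fetchOne are in flight at the same time.
async function mapWithConcurrency<T, R>(
  items: T[],
  concurrency: number,
  fetchOne: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;

  // Each worker repeatedly grabs the next unprocessed index until none remain.
  async function worker() {
    while (next < items.length) {
      const index = next++;
      results[index] = await fetchOne(items[index]);
    }
  }

  await Promise.all(Array.from({ length: concurrency }, () => worker()));
  return results;
}

// Hypothetical usage with the fetchMessage sketch above:
// const messages = await mapWithConcurrency(emailIds, 20, (id) => fetchMessage(gmail, id));
```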
Also, I can think of a few workarounds to deal with this, but they’re all “bad practices” and most would take a very long time anyway, so I would be glad if you could share any ideas.
Some (probably) irrelevant information about my project:
- Node.js v18.20
- TypeScript v5.6.3
- googleapis v144.0.0
- Ubuntu 20 (on WSL2)