I’m new. I’m working on a program that reads a batch of .docx documents. I extract each document’s content from its XML using XPath and xmldom, which gives me an array with every line of the document. The problem is that the array looks like this:
[
'-1911312-14668500FECHA: 15-12-25',
'NOMBRE Y APELLIDO: Jhon dee',
'C.I.: 20020202 EDAD: 45 ',
'DIRECCION: LA CASA',
'TLF: 55555555',
'CORREO: thiisatest@gmail',
' HISTORIA CLINICA GINECO-OBSTETRICA',
'HO',
'NULIG',
'FUR',
'3-8-23',
'EG',
'',
'FPP',
'',
'GS',
'',
'GSP',
'',
'',
'MC: CONTROL GINECOLOGICO',
'HEA',
'',
'APP: NIEGA PAT, NIEGA ALER, QX NIEGA.',
'APF: MADRE HTA, ABUELA DM.',
'',
'AGO: MENARQUIA: 10 FUR: CICLO: 4/28 ',
' TIPO: EUM',
' MET ANTICONCEP: GENODERM DESDE HACE 3 AÑOS.',
'PRS: NPS: ITS: VPH LIE BAJO GRADO 2017 , BIOPSIA.',
'FUC: NOV 2022, NEGATIVA. COLPO NEGATIVA.',
'',
'',
'EMBARAZO',
'#/AÑO',
'TIPO DE PARTO',
'INDICACION',
'RN',
'SEXO',
'RN',
'PESO',
'OBSERVACIONES',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'',
'EXAMEN FISICO:',
'PESO: 80,1 TALLA: TA: MMHG FC: FR: ',
'',
'PIEL Y MUCOSA: DLN',
'CARDIOPULMONAR: DLN',
'',
'MAMAS: ',
'',
'ABDOMEN: ',
'GENITALES: CUELLO SIN SECRECION , COLPO SE EVDIENCIA DOS LEISONES HPRA 1 Y HORA 5',
'',
'EXTREMIDADES: DLN',
'NEUROLOGICO: DLN',
'',
' IDX: LESION EN CUELLO UTERINO',
'',
'PLAN: DEFEROL OMEGA, CAUTERIZACION Y TIPIFICACION VIRAL',
'22-8-23',
'SE TOMA MUESTRA DE TIPIFICACION VIRAL.',
'',
'',
'',
'LABORATORIOS:',
'FECHA',
'HB/HTO',
'LEU/PLAQ',
'GLICEMIA',
'UREA',
'CREAT',
'HIV/VDRL',
'UROANALISIS',
'',
'',
'',
'',
'',
'',
'',
'',
... 44 more items
]
So, I want to put this content into a JS object like:
const customObj = {
fecha: "fecha on the doc",
....
}
I thought this would work:
const fillObject = (inputArray, keywords) => {
  const customObj = {};
  keywords.forEach((keyword, index) => {
    customObj[keyword] = inputArray.map(line => {
      const keywordIndex = line.indexOf(keyword);
      if (keywordIndex !== -1) {
        const nextKeywordIndex = keywords.slice(index + 1).reduce((acc, nextKeyword) => {
          const nextKeywordIndex = line.indexOf(nextKeyword);
          return nextKeywordIndex !== -1 && nextKeywordIndex < acc ? nextKeywordIndex : acc;
        }, line.length);
        return line.slice(keywordIndex, nextKeywordIndex).trim();
      }
      return null;
    }).filter(Boolean);
  });
  console.log(customObj);
  return customObj;
};
The function returns each keyword together with the text that follows it, up to the next keyword, but I only want the value itself.
The format of the documents is always the same, but sometimes there are spaces between a keyword and its content and sometimes there aren’t. The keywords are always in uppercase.
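Given that format (uppercase keywords, optional spaces before the value), one possible approach is to locate every keyword on a line and keep only the text between the end of one keyword and the start of the next. The following is a minimal sketch, not a definitive solution: `extractFields` and `escapeRegExp` are hypothetical names, the keys are assumed to be the lower-cased label without its colon, and only the first occurrence of each keyword is kept.

```javascript
// Hypothetical helper: escape a keyword so it can be embedded in a RegExp.
const escapeRegExp = (text) => text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Sketch: for each line, find every keyword occurrence, then take the text
// between the end of one keyword and the start of the next one on that line.
const extractFields = (lines, keywords) => {
  // Longest keywords first, so a longer label wins over a shorter one it contains.
  const alternation = [...keywords]
    .sort((a, b) => b.length - a.length)
    .map(escapeRegExp)
    .join("|");
  const result = {};
  for (const line of lines) {
    const matches = [...line.matchAll(new RegExp(alternation, "g"))];
    matches.forEach((match, i) => {
      const start = match.index + match[0].length;
      const end = i + 1 < matches.length ? matches[i + 1].index : line.length;
      // Assumed key format: "FECHA:" becomes "fecha".
      const key = match[0].replace(/:$/, "").toLowerCase();
      // Assumption: keep only the first occurrence of each keyword.
      if (!(key in result)) result[key] = line.slice(start, end).trim();
    });
  }
  return result;
};

const sampleLines = [
  "-1911312-14668500FECHA: 15-12-25",
  "NOMBRE Y APELLIDO: Jhon dee",
  "C.I.: 20020202 EDAD: 45 ",
  "TLF:55555555",
];
const sampleKeywords = ["FECHA:", "NOMBRE Y APELLIDO:", "C.I.:", "EDAD:", "TLF:"];
console.log(extractFields(sampleLines, sampleKeywords));
```

Because the value is sliced from the end of the keyword, the stray digits before `FECHA:` are ignored, and `trim()` makes `TLF:55555555` and `TLF: 55555555` produce the same value.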
I tried the fillObject function, but I want the search to be more precise and the resulting object to be cleaner. The output currently looks like this:
'FECHA:': [ 'FECHA: 19-10-23' ],
'NOMBRE Y APELLIDO:': [ 'NOMBRE Y APELLIDO: John Dee' ],
'C.I.:': [ 'C.I.: 3232323' ],
'EDAD:': [ 'EDAD: 56' ],
'DIRECCION:': [ 'DIRECCION: Marylan' ],
'TLF:': [ 'TLF: 55555555' ],
'CORREO:': [ 'CORREO: [email protected]' ],
'CONTACTO:': [
'CONTACTO: IG HISTORIA CLINICA GINECO-OBSTETRICA'
],
As you can see, some properties come out wrong. For example, “CONTACTO:” also captures the heading that follows it.
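The “CONTACTO:” case seems to happen because the section heading is not one of the keywords, so nothing marks where the value ends. One way to handle it, sketched here under the assumption that the section headings are known in advance, is to cut each value at the first heading it contains. `SECTION_HEADINGS` and `cutAtHeading` are hypothetical names, and the list below only contains headings visible in the sample array.

```javascript
// Assumed list of section headings that are not fields; extend as needed.
const SECTION_HEADINGS = [
  "HISTORIA CLINICA GINECO-OBSTETRICA",
  "EXAMEN FISICO:",
  "LABORATORIOS:",
];

// Sketch: drop everything from the first section heading onward.
const cutAtHeading = (value, headings = SECTION_HEADINGS) => {
  const cut = headings.reduce((end, heading) => {
    const at = value.indexOf(heading);
    return at !== -1 && at < end ? at : end;
  }, value.length);
  return value.slice(0, cut).trim();
};

console.log(cutAtHeading("IG HISTORIA CLINICA GINECO-OBSTETRICA")); // prints "IG"
```

Values that contain no heading are returned unchanged apart from trimming, so the helper can be applied to every extracted field.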