I’m building an application that does the following:
The input is a set of bio data files (~12 columns, several hundred to several thousand rows) and a large look-up file (different format, ~15 columns, ~50,000 rows). Each row in the library represents a “status”, and I have a program that contains a function to calculate the “energy” of a “status”.
1 – Modify the energy calculation, then form a graph using the new energies as the node weights.
2 – Write a C program that solves a graph-theoretic problem on this graph with a given (distributed) algorithm. The program should use the Message Passing Interface (MPI) for distributed computing and should run on a multi-core 64-bit Linux machine.
3 – The output of the graph problem determines which rows of the look-up file are chosen as the “optimized” rows.
4 – Modify the set of bio data files according to these selected rows, and return the modified bio data files as output.
I can provide: the input files, the library file that connects the two, a program that includes the bio energy calculation (which needs modification), and the graph problem formulation and algorithm.
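
To make step 1 more concrete, here is a rough sketch of how a node-weighted graph could be held in memory in C. It is only an illustration under my own assumptions: the names `Graph`, `graph_new`, `graph_add_edge` and `graph_free` are hypothetical, one node is assumed per “status” row, and the rule for which nodes are connected is left to the caller because it depends on the graph problem formulation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node-weighted, undirected graph.
   Assumption: node i corresponds to one "status" row and
   weight[i] is its (modified) energy. */
typedef struct {
    int      n;       /* number of nodes */
    double  *weight;  /* weight[i] = energy of node i */
    int     *deg;     /* deg[i] = number of neighbours of node i */
    int     *cap;     /* cap[i] = allocated length of adj[i] */
    int    **adj;     /* adj[i] = neighbour list of node i */
} Graph;

void graph_free(Graph *g)
{
    if (!g) return;
    if (g->adj)
        for (int i = 0; i < g->n; i++) free(g->adj[i]);
    free(g->adj);
    free(g->cap);
    free(g->deg);
    free(g->weight);
    free(g);
}

/* Create a graph with n nodes and no edges; copies the energies. */
Graph *graph_new(int n, const double *energy)
{
    if (n <= 0 || !energy) return NULL;
    Graph *g = calloc(1, sizeof *g);
    if (!g) return NULL;
    g->n = n;
    g->weight = malloc((size_t)n * sizeof *g->weight);
    g->deg = calloc((size_t)n, sizeof *g->deg);
    g->cap = calloc((size_t)n, sizeof *g->cap);
    g->adj = calloc((size_t)n, sizeof *g->adj);
    if (!g->weight || !g->deg || !g->cap || !g->adj) {
        graph_free(g);
        return NULL;
    }
    for (int i = 0; i < n; i++) g->weight[i] = energy[i];
    return g;
}

/* Append v to the neighbour list of u, growing the list if needed. */
static int push_neighbour(Graph *g, int u, int v)
{
    if (g->deg[u] == g->cap[u]) {
        int new_cap = g->cap[u] ? 2 * g->cap[u] : 4;
        int *p = realloc(g->adj[u], (size_t)new_cap * sizeof *p);
        if (!p) return -1;
        g->adj[u] = p;
        g->cap[u] = new_cap;
    }
    g->adj[u][g->deg[u]++] = v;
    return 0;
}

/* Add an undirected edge u-v. Returns 0 on success, -1 on a bad
   index, a self-loop, or an allocation failure. */
int graph_add_edge(Graph *g, int u, int v)
{
    assert(g != NULL);
    if (u < 0 || v < 0 || u >= g->n || v >= g->n || u == v) return -1;
    if (push_neighbour(g, u, v) != 0) return -1;
    if (push_neighbour(g, v, u) != 0) {
        g->deg[u]--;  /* undo the first half so the graph stays symmetric */
        return -1;
    }
    return 0;
}
```

The energies would come from the modified energy program and the edges from the problem formulation; neither is guessed at here.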

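
For step 2, one common way to spread the nodes over MPI processes is a contiguous block split, so each process owns a range of node indices. The sketch below is only an assumption about how the distribution might look, since the actual algorithm is not described here; `Range`, `block_range` and `owner_of` are hypothetical helper names. In a real MPI program the `rank` and `size` arguments would come from `MPI_Comm_rank` and `MPI_Comm_size`.

```c
#include <assert.h>

/* Half-open range [lo, hi) of node indices owned by one process. */
typedef struct {
    int lo;
    int hi;
} Range;

/* Split n nodes over `size` processes as evenly as possible.
   The first (n % size) ranks get one extra node each. */
Range block_range(int n, int size, int rank)
{
    assert(n >= 0 && size > 0 && rank >= 0 && rank < size);
    int base  = n / size;
    int extra = n % size;
    Range r;
    r.lo = rank * base + (rank < extra ? rank : extra);
    r.hi = r.lo + base + (rank < extra ? 1 : 0);
    return r;
}

/* Inverse mapping: which rank owns a given node index. */
int owner_of(int n, int size, int node)
{
    assert(size > 0 && node >= 0 && node < n);
    int base  = n / size;
    int extra = n % size;
    int big   = extra * (base + 1);  /* nodes held by the larger blocks */
    if (node < big) return node / (base + 1);
    return extra + (node - big) / base;
}
```

With this layout a process can tell locally which rank to message about any neighbour node, without a lookup table. Whether a block split suits the given algorithm is something the graph problem formulation would have to confirm.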