I've successfully implemented a Java program that uses two common data structures, a Tree and a Stack, along with an interface that lets a user enter a tree node ID and get information about that node in relation to its parent. You can see the latest version of this program in the src folder of its GitHub repository.
Background
I wrote this ad hoc program to study the evolution of gene flow across hundreds of organisms. It compares data in a file that consists of FeatureIDs, which are Strings (listed in the first column below as "ATM-0000011", "ATM-0000012", and so on), and the scores associated with each feature's presence or absence at a particular node in the tree, which are double primitives.
Here is what the data file looks like:
"FeatureID","112","115","120","119","124",... // this line has all tree node IDs
"ATM-0000011",2.213e-03,1.249e-03,7.8e-04,9.32e-04,1.472e-03,... // scores on these lines
"ATM-0000012",2.213e-03,1.249e-03,7.8e-04,9.32e-04,1.472e-03,... // correspond to the node ID
"ATM-0000013",0.94,1.249e-03,7.8e-04,9.32e-04,1.472e-03,... // order in the first line
... // ~30000 lines later
"ATM-0036186",0.94,0.96,0.97,0.95,0.95,...
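For reference, here is a minimal sketch of reading this format into three parallel structures: the node IDs from the header line, the FeatureIDs from the first column, and a double 2D array of scores whose row r belongs to featureIds[r]. The ScoreFile class and its parse method are hypothetical names, and the sketch assumes plain comma-separated lines with no commas inside quoted fields and none of the "//" annotations shown above.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical holder for the parsed file (illustrative names, not from tlacMain.java). */
final class ScoreFile {
    final String[] nodeIds;    // header line, minus the "FeatureID" label
    final String[] featureIds; // first column of every data line
    final double[][] scores;   // scores[row][col]: row matches featureIds, col matches nodeIds

    ScoreFile(String[] nodeIds, String[] featureIds, double[][] scores) {
        this.nodeIds = nodeIds;
        this.featureIds = featureIds;
        this.scores = scores;
    }

    /** Trims whitespace and strips surrounding double quotes, if present. */
    static String unquote(String s) {
        s = s.trim();
        if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
            return s.substring(1, s.length() - 1);
        }
        return s;
    }

    /** Parses header + data lines; assumes no commas inside quoted fields. */
    static ScoreFile parse(List<String> lines) {
        String[] header = lines.get(0).split(",");
        String[] nodeIds = new String[header.length - 1];
        for (int c = 1; c < header.length; c++) {
            nodeIds[c - 1] = unquote(header[c]);
        }
        int rows = lines.size() - 1;
        String[] featureIds = new String[rows];
        double[][] scores = new double[rows][nodeIds.length];
        for (int r = 0; r < rows; r++) {
            String[] fields = lines.get(r + 1).split(",");
            featureIds[r] = unquote(fields[0]);
            for (int c = 1; c < fields.length; c++) {
                scores[r][c - 1] = Double.parseDouble(fields[c].trim());
            }
        }
        return new ScoreFile(nodeIds, featureIds, scores);
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "\"FeatureID\",\"112\",\"115\",\"120\"",
            "\"ATM-0000011\",2.213e-03,1.249e-03,7.8e-04",
            "\"ATM-0000013\",0.94,1.249e-03,7.8e-04");
        ScoreFile f = parse(lines);
        System.out.println(Arrays.toString(f.nodeIds));              // [112, 115, 120]
        System.out.println(f.featureIds[1] + " " + f.scores[1][0]); // ATM-0000013 0.94
    }
}
```

With a real file, the lines could come from java.nio.file.Files.readAllLines(path).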
The Problem
Previously, it was good enough to build a 2D array of the doubles from the data file (excluding the header line and the FeatureIDs, since those are Strings) and then use that array to build stacks of doubles. One stack was built for the parent node and one for the child node, as determined by the user's input and the Tree.
The values in the parent and child stacks were then popped off together (which ensured that the same FeatureIDs were being compared without having to store the IDs in the data structure) and checked against a defined condition (i.e., whether both values were >= 0.75). If they met it, a counter was incremented. Once the comparisons were finished (both stacks were empty), the program returned the count(s).
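The earlier scheme can be sketched like this, assuming each stack is filled from one column of the score array (one tree node) and 0.75 is the threshold. columnStack and countShared are hypothetical helpers for illustration, not code from tlacMain.java. Because both stacks are pushed in the same row order, popping them together keeps the rows aligned, which is why the FeatureIDs never had to be stored.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class StackCount {
    /** Pushes one column (one tree node) of the score array onto a stack, row 0 first. */
    static Deque<Double> columnStack(double[][] scores, int col) {
        Deque<Double> stack = new ArrayDeque<>();
        for (double[] row : scores) {
            stack.push(row[col]);
        }
        return stack;
    }

    /**
     * Pops both stacks in lockstep and counts rows where both scores meet the threshold.
     * Assumes both stacks were built from the same rows in the same order.
     */
    static int countShared(Deque<Double> parent, Deque<Double> child, double threshold) {
        int count = 0;
        while (!parent.isEmpty() && !child.isEmpty()) {
            double p = parent.pop();
            double c = child.pop();
            if (p >= threshold && c >= threshold) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        double[][] scores = {
            {2.213e-03, 1.249e-03},
            {0.94, 0.96},
            {0.94, 0.97},
        };
        int n = countShared(columnStack(scores, 0), columnStack(scores, 1), 0.75);
        System.out.println(n); // 2
    }
}
```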
Now, instead of just counting, I want to build a list (or lists) of the FeatureIDs that met the comparison criteria. So instead of returning a counter saying that 4100 FeatureIDs met the criteria between node A and node B, I want a list of all 4100 FeatureID Strings that met the criteria in that comparison. I'm going to save that list to a file later, but that's not my concern here. This means I'll probably have to abandon the double 2D array / double stack scheme that had previously worked so well.
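One possible direction, sketched under the assumption that the FeatureIDs are read into a String[] in the same order as the rows of the existing double 2D array: the row index already pairs each score with its FeatureID, so comparing by index (instead of popping anonymous doubles) lets the loop record the ID directly, and the old count is simply the list's size(). The sharedFeatureIds method and its parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class SharedFeatures {
    /**
     * Returns the FeatureIDs whose scores at both nodes meet the threshold.
     * Assumes featureIds[r] labels row r of scores (a parallel array).
     */
    static List<String> sharedFeatureIds(String[] featureIds, double[][] scores,
                                         int parentCol, int childCol, double threshold) {
        List<String> result = new ArrayList<>();
        for (int r = 0; r < scores.length; r++) {
            if (scores[r][parentCol] >= threshold && scores[r][childCol] >= threshold) {
                result.add(featureIds[r]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] ids = {"ATM-0000011", "ATM-0000013", "ATM-0036186"};
        double[][] scores = {
            {2.213e-03, 1.249e-03},
            {0.94, 1.249e-03},
            {0.94, 0.96},
        };
        System.out.println(sharedFeatureIds(ids, scores, 0, 1, 0.75)); // [ATM-0036186]
    }
}
```

This keeps the double array untouched and adds only one String reference per row. If the stacks are still preferred, the same idea works by pushing row indices (a Deque<Integer>) instead of the doubles themselves.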
The Question
Given this problem, is there a clever fix, either a change to the input data file or somewhere in my code (tlacMain.java), that doesn't add much more data to the process? I just need ideas.