Type I Clusters: Intact Metabolic Pathways and Multimeric Proteins 
We measured the performance of our algorithm by examining whether it identified gene groupings known to be functionally related (Type I clusters).
Only four (16%) of 25 Type I clusters (spy0080-0081, spy1236-1237, spy1707-1711, spy2041-2042) could have been identified in entirety by significance analysis because all clustered genes exhibited significant differential expression (PF value < 0.05).
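The criterion used here is that a known locus is recoverable "in entirety" by per-gene significance analysis only if every member passes the cutoff. A minimal sketch of that bookkeeping follows. The function name, the gene labels, and the per-gene values are hypothetical and chosen only for illustration; the 0.05 cutoff is the one stated in the text. It sorts a locus into the three situations this section distinguishes: all members significant, only a subset significant, or none significant.

```python
from typing import Dict, List

ALPHA = 0.05  # cutoff stated in the text (PF value < 0.05)


def classify_cluster(genes: List[str], pvalues: Dict[str, float],
                     alpha: float = ALPHA) -> str:
    """Report how much of a known locus per-gene testing alone would recover.

    Hypothetical helper for illustration; not part of GenomeCrawler.
    """
    if not genes:
        raise ValueError("a locus must contain at least one gene")
    significant = [g for g in genes if pvalues[g] < alpha]
    if len(significant) == len(genes):
        return "complete"  # every member passes: locus recoverable gene by gene
    if significant:
        return "partial"   # only a subset passes the cutoff
    return "none"          # no member passes individually


# Hypothetical per-gene values for a five-gene locus (not data from this study)
locus = [f"gene{i}" for i in range(1, 6)]
pv = {"gene1": 0.01, "gene2": 0.20, "gene3": 0.03, "gene4": 0.04, "gene5": 0.30}
print(classify_cluster(locus, pv))
```

With these made-up values three of five genes pass, so the locus would be reported as only partially detected.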
A total of 11 (52.4%) of the remaining 21 clusters would not have been identified in their entirety without GenomeCrawler, because our initial analysis detected significant fold-changes in only a subset of the genes required to encode a particular pathway or locus; such a partial result is biologically implausible if every gene is essential for function.
GenomeCrawler expanded these clusters to contain more genes that encode intact loci (Table 3).
For example, we initially identified (Table 1) the significant upregulation of three of the five known gene members of the folate biosynthetic pathway [40] (spy1096-1100), but GenomeCrawler identified a significant cluster containing all five genes (Table 3 and Figure 2B).
We obtained a similar result for the eight-gene operon encoding the F0F1-type proton translocating ATPase [41] (spy0754-0761).
The initial significance analysis identified only four atp genes (Table 1), but neighbor clustering identified a significant cluster containing all eight genes necessary to encode a functional ATPase (Table 3).
Each of the 11 neighbor clusters that our initial analysis alone could have identified only partially gained gene members after application of the algorithms, yielding more complete sets of functionally related genes (Table 3).
These clusters encompass various metabolic processes, including purine biosynthesis (spy0025-0028), lactose metabolism (spy1916-1923), fatty acid biosynthesis (spy1743-1747), lipoteichoic acid synthesis (spy1308-1312), and sugar phosphotransferase transport (spy1058-1060) [14], suggesting that specific changes occur in the streptococcal metabolic program as the bacteria adhere to human pharyngeal cells in vitro.
Notably, the remaining ten Type I clusters consisted entirely of genes that were not individually significant; however, after application of our algorithms, the combined contribution of these genes yielded a significant cluster.
For example, the nine-gene operon that spans genes spy0738-0746 encodes streptolysin S, a potent cytolytic toxin that promotes internalization and host tissue dissemination [25,44].
Though the differential expression of the individual genes was not significant following our initial statistical analysis, GenomeCrawler identified a significant downregulated cluster containing all nine genes (Table 3).
Adherence-induced downregulation of streptolysin S is consistent with its previously determined role in host cell internalization [25]; however, without neighbor clustering, the differential expression of this operon would not have been immediately evident.
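To illustrate how individually weak but concordant changes across adjacent genes can add up to a significant signal, the sketch compares the mean log2 fold-change of a block of neighboring genes with the means of randomly drawn gene sets of the same size. This is a generic permutation test written for illustration, not GenomeCrawler's published scoring procedure. The simulated genome, the hypothetical nine-gene block, the assumed per-gene noise SD of 0.5, and the function names are all assumptions.

```python
import random
import statistics
from typing import Sequence


def per_gene_pvalue(x: float, noise_sd: float) -> float:
    """Two-sided normal p-value for one gene, assuming a known noise SD."""
    return 2 * (1 - statistics.NormalDist().cdf(abs(x) / noise_sd))


def window_permutation_pvalue(log2fc: Sequence[float], start: int, size: int,
                              n_perm: int = 5000, seed: int = 0) -> float:
    """Two-sided permutation p-value for the mean log2 fold-change of
    `size` adjacent genes beginning at index `start` (genome order).

    Null model (an assumption): any `size` genes drawn at random,
    ignoring chromosomal position.
    """
    values = list(log2fc)
    observed = statistics.fmean(values[start:start + size])
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(values, size)
        if abs(statistics.fmean(sample)) >= abs(observed):
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0


# Simulated data only: background genes with noise SD 0.5, plus a
# hypothetical nine-gene block that is modestly and uniformly downregulated.
bg_rng = random.Random(1)
background = [bg_rng.gauss(0.0, 0.5) for _ in range(191)]
operon = [-0.6, -0.4, -0.5, -0.7, -0.3, -0.5, -0.6, -0.4, -0.5]
genome = background[:100] + operon + background[100:]

print("smallest per-gene p:", min(per_gene_pvalue(x, 0.5) for x in operon))
print("block p:", window_permutation_pvalue(genome, start=100, size=9))
```

In this toy setting no single block gene reaches p < 0.05, yet the block as a whole does, because averaging nine concordant changes shrinks the noise by roughly a factor of three.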
Although individual gene members of Type I clusters may not be statistically significant as a result of technical variability within experiments [17], the genetic structure of certain Type I operons may provide an alternative explanation.
For example, the streptolysin operon contains an internal terminator downstream of the sagA gene (the first gene in the operon), which modulates the abundance of particular mRNA species (e.g., sagA mRNA versus the polycistronic message for all nine genes) under different environmental conditions [45].
If transcription is internally disrupted by such a terminator, the abundance of the sagA transcript may be much greater than that of the polycistronic message; such disproportionate transcript levels would affect log2 fold-change values and, in turn, the statistical significance of individual genes within these types of clusters.
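A small worked example makes this concrete. The abundances are hypothetical arbitrary units, not measurements from this study, and the variable names are illustrative. The first gene is assumed to be present in both a short message ending at the internal terminator and the full polycistronic message, whereas downstream genes appear only in the polycistronic message. If only the polycistronic message changes, the first gene's fold-change is diluted by the unchanged short transcript.

```python
import math


def log2_fc(condition: float, reference: float) -> float:
    """log2 fold-change of one abundance relative to another."""
    return math.log2(condition / reference)


# Hypothetical abundances (arbitrary units): "mono" is the short first-gene
# transcript ending at the internal terminator; "poly" is the full message.
control = {"mono": 100.0, "poly": 20.0}
adherent = {"mono": 100.0, "poly": 10.0}  # only the polycistronic message halves

# The first gene is counted in both messages; downstream genes only in "poly".
first_gene_fc = log2_fc(adherent["mono"] + adherent["poly"],
                        control["mono"] + control["poly"])
downstream_fc = log2_fc(adherent["poly"], control["poly"])

print(f"first gene: {first_gene_fc:.3f}, downstream genes: {downstream_fc:.3f}")
```

Under these assumptions the downstream genes show a twofold decrease (log2 fold-change of -1), while the first gene of the same operon changes by only about -0.13, so per-gene tests could disagree across a single transcriptional unit.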
Thus, in addition to helping resolve clusters that would not be easily recognized because of experimental technical variability, the neighbor clustering method may help to resolve operons with such internal terminators and regulators.
These results demonstrate that neighbor clustering effectively reconstructed a number of complete pathways and loci from processed array data.
Importantly, because functional gene data are not incorporated into its algorithms, GenomeCrawler is not biased toward identifying "expected" clusters.
Curating the output after the algorithms are applied may make GenomeCrawler less user-friendly; however, eliminating such bias is essential for this type of analysis.
