Introduction 
Microarray technology is now commonly used to reveal genome-wide transcriptional changes in bacterial pathogens during interactions with the host.
Several factors, however, limit the power of such analyses, including inadequate statistical analysis and insufficient sample replication, neither of which accounts for experimental variability and both of which often lead to arbitrary thresholds for significance [1,2].
In addition, unknown bacterial genes can confound the interpretation of expression profiles, restricting many microarray studies to the differential expression of well-characterized genes.
Several methods are available to organize gene expression profiles and to assist in extracting functional or regulatory gene information from microarray datasets.
Clustering algorithms group genes by similarities in expression patterns, based on the assumption that co-expressed genes share common function or regulation [3,4]; however, clustering solely by co-expression patterns may not reveal a considerable amount of information contained in array data.
These methods often: (1) produce unreliable data by missing known gene members of biological pathways; (2) fail to distinguish truly related gene clusters from coincidental groupings; and (3) identify clusters containing only unknown genes that may lack either common function or regulation, a considerable limitation for genomes containing a large percentage of undefined genes [1,2].
Because no tools exist to interpret unknown gene clusters or to assess their significance and completeness, a substantial portion of bacterial expression profiles cannot be interpreted using current clustering methods.
We introduce neighbor clustering as a new tool for analyzing bacterial microarray data that addresses some of these limitations by incorporating the physical position of genes on the bacterial chromosome into the analysis of expression data.
Information about gene function and regulation is stored intrinsically in the bacterial genome structure, as genes with common function or regulation tend to be physically proximate on the chromosome and often linked as operons [5,6].
We incorporated these positional data into a series of neighbor clustering algorithms, named GenomeCrawler, that identify groupings of potentially related genes from array data by combining two informative characteristics of bacterial genes that share common function or regulation [3-6]: (1) similar gene expression profiles (i.e., co-expression); and (2) physical proximity of genes on the chromosome.
The algorithms also recalculate the statistical significance of each gene as a member of a particular cluster, as well as the significance of each resulting grouping as a whole, to ensure accuracy of cluster assignments.
This process ultimately identifies significant clusters of co-expressed gene neighbors that likely share common function or regulation.
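The general idea of combining co-expression with chromosomal adjacency can be illustrated with a minimal sketch. This is not the GenomeCrawler implementation, whose details are not given here. It assumes that genes are supplied in chromosomal order, that co-expression is measured by Pearson correlation between adjacent genes, and that a cluster grows outward from a seed gene already called significant. The names `pearson` and `neighbor_clusters` and the cutoff `min_r` are hypothetical, and the sketch omits both the recalculation of per-gene and per-cluster significance and the circularity of the chromosome.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / (vx * vy) ** 0.5


def neighbor_clusters(profiles, seeds, min_r=0.9):
    """Grow clusters of adjacent, co-expressed genes around seed genes.

    profiles: expression vectors listed in chromosomal order (assumption).
    seeds:    indices of genes already called significant (assumption).
    min_r:    hypothetical correlation cutoff for adding a neighbor.
    Returns a sorted list of (start, end) inclusive index ranges.
    """
    n = len(profiles)
    clusters = set()
    for s in seeds:
        lo = hi = s
        # Extend to the left while the next gene tracks its neighbor.
        while lo > 0 and pearson(profiles[lo - 1], profiles[lo]) >= min_r:
            lo -= 1
        # Extend to the right in the same way.
        while hi < n - 1 and pearson(profiles[hi + 1], profiles[hi]) >= min_r:
            hi += 1
        if hi > lo:  # keep only groupings with at least two genes
            clusters.add((lo, hi))
    return sorted(clusters)
```

In this sketch a seed whose immediate neighbors are not co-expressed yields no cluster, and seeds that fall within the same run of co-expressed neighbors collapse into a single index range.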
We used this approach to analyze microarray expression data from group A streptococci (Streptococcus pyogenes) during adherence to human pharyngeal cells, the first overt infection step [7].
The ability of all bacterial pathogens to infect the human host depends upon coordinated regulation of diverse gene sets that are required for survival in host environments.
Although recent microarray studies have highlighted the molecular responses of streptococci in relevant host conditions [8-10], characterizing differentially expressed loci during pharyngeal cell adherence is critical for understanding the molecular basis for host colonization.
Studies from our laboratory [11,12] and others [13] have demonstrated that in vitro association with pharyngeal cells results in streptococcal phage induction and the increased expression of phage-encoded virulence factors.
Although the mechanisms mediating these responses are not known, the results of these studies indicate that streptococci sense and, on a transcriptional level, respond to various signals and cues in the pharyngeal cell environment.
We undertook the present study to understand and to assess more accurately the genome-wide transcriptional responses of streptococci during one of the earliest recognized stages of infection, namely adherence to human pharyngeal cells.
We compared data generated before and after neighbor clustering to show that this method provides a more comprehensive view of transcription by: (1) identifying more differentially expressed genes than even traditional, rigorous statistical analyses; (2) reconstructing intact biological pathways that statistical significance analysis could not reconstruct; and (3) providing preliminary insight into the function or regulation of uncharacterized genes by associating their co-expression with physically proximate, functionally defined genes.
