We have used a bioinformatics approach to evaluate the completeness and functionality of the reported human immunoglobulin heavy-chain IGHD gene repertoire. Using the hidden Markov model-based program iHMMune-align, we aligned 1,080 relatively unmutated heavy-chain sequences against the reported repertoire and compared these alignments with alignments of 1,639 more highly mutated sequences. We assessed the reliability of alignments to the less commonly seen IGHD genes in two ways. First, we compared the frequencies of gene utilization in the two data sets. Second, we analyzed features of the aligned IGHD gene segments, including their length, the frequency with which they appear to mutate, and the frequency with which specific mutations were seen. The analysis demonstrates that IGHD4-23 and IGHD5-24, which have been reported to be open reading frames of uncertain functionality, are represented in the expressed gene repertoire; however, the functionality of IGHD6-25 must be questioned. Sequence similarities make unequivocal identification of members of the IGHD1 gene family problematic, although all genes except IGHD1-14*01 appear to be functional. In contrast, reported allelic variants of IGHD2-2 and of the IGHD3 gene family appear to be nonfunctional, very rare, or nonexistent. The analysis also suggests that the reported repertoire is relatively complete, although one new putative polymorphism (IGHD3-10*p03) was identified. This study therefore confirms a surprising lack of diversity in the available IGHD gene repertoire. Restricting the germline sequence databases to the functional set described here will substantially improve the accuracy of IGHD gene alignments and, therefore, the accuracy of analysis of the V-D-J junction.