Frequently Asked Questions

Q: Why do some families indicate homology at the clan level and others at the Pfam family level?

A: The 100 query families in this database have been curated to account for homologies beyond those recognized by Pfam, as described in (M. W. Gonzalez and W. R. Pearson. RefProtDom: A Protein Database with Improved Domain Boundaries and Homology Relationships. Manuscript in Preparation). It is widely accepted that homology can always be traced to the superfamily level (i.e. the clan level in Pfam). Therefore, one of the most obvious steps in our extended annotation process was to coalesce all the Pfam families into their respective clans. Whenever a single family is the sole representative of its superfamily, the superfamily designation takes the form "PF#####". When the superfamily contains several families, it is named "CL[clan_id]".

Although we recommend evaluating homology at the superfamily level, we also provide the pfam_to_clan.txt file in case you need to map Pfam families to their clans. For anyone who uses the SQL releases of Pfam, the clan_id corresponds to the auto_clan number.

Q: Does your database contain extended annotation for all the sequences in Pfam?

A: No. The search libraries (library_*_domains*.fa.gz) and annotation files (family_members.annot.gz) contain only the sequences that have homologs in a set of 97 divergent families selected from Pfam v. 21. In addition, we supplemented the homologies specified by Pfam for these 97 families in two ways: 1) by extending partial homologies using local and semi-global searches (mode=ext), and 2) by annotating missed homologs using reverse PSI-BLAST searches and CATH/SCOP structural evidence (mode=ua).
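
As an illustration of the superfamily naming rule from the first answer, here is a minimal Python sketch that turns a Pfam accession into its superfamily label using pfam_to_clan.txt. The layout of that file is not described in this FAQ, so the sketch assumes one whitespace-separated pair per line (Pfam accession, then clan_id). The function names load_pfam_to_clan and superfamily_label are hypothetical, and the accessions and clan_id in the demo are placeholders, not real mapping entries.

```python
from typing import Dict, Iterable


def load_pfam_to_clan(lines: Iterable[str]) -> Dict[str, str]:
    """Build a {pfam_accession: clan_id} map from mapping-file lines.

    ASSUMPTION: each line holds a Pfam accession followed by a clan_id,
    separated by whitespace. Adjust the parsing if the real file differs.
    """
    mapping: Dict[str, str] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split()
        if len(fields) < 2:
            continue  # skip lines that do not hold a full pair
        mapping[fields[0]] = fields[1]
    return mapping


def superfamily_label(pfam_acc: str, mapping: Dict[str, str]) -> str:
    """Return "CL[clan_id]" for a family listed in the mapping.

    A family that is absent from the mapping is treated as the sole
    representative of its superfamily and keeps its own "PF#####" name.
    """
    clan_id = mapping.get(pfam_acc)
    return f"CL{clan_id}" if clan_id is not None else pfam_acc


if __name__ == "__main__":
    # Placeholder lines standing in for the contents of pfam_to_clan.txt.
    demo_lines = ["PF00001\t12", "PF00002\t12"]
    demo_map = load_pfam_to_clan(demo_lines)
    print(superfamily_label("PF00001", demo_map))  # CL12
    print(superfamily_label("PF99999", demo_map))  # PF99999
```

To use the real file, pass an open file handle instead of the placeholder list, for example `load_pfam_to_clan(open("pfam_to_clan.txt"))`. Keeping clan_id as a string avoids guessing whether the identifiers are zero-padded.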