CIDentify is a homology-based database search algorithm designed to aid in the identification of unknown peptides by mass spectrometry (MS). It is designed to be used in conjunction with Lutefisk97, a program for de novo interpretation of unknown MS/MS spectra written by Rich Johnson, Immunex Corp., Seattle, WA. (Current Lutefisk97 source and compiled application are available from http://128.95.12.16/Lutefisk97.html). CIDentify uses the list of possible peptides produced by Lutefisk97 to search a database for homologous sequences, taking into account the sequence ambiguities inherent in the MS data. (See J. A. Taylor and R. S. Johnson (1997), "Sequence Database Searches Via de novo Peptide Sequencing by Tandem Mass Spectrometry", Rapid Communications in Mass Spectrometry X:XX-XX.)

CIDentify is a derivative of William R. Pearson's FASTA homology-based database search program, adapted by Alex Taylor of the University of Washington. The current version of CIDentify is based on version 20u6 (Aug. 1996) of the FASTA program package. (For information on FASTA see W. R. Pearson and D. J. Lipman (1988), "Improved Tools for Biological Sequence Comparison", PNAS 85:2444-2448, and W. R. Pearson (1990), "Rapid and Sensitive Sequence Comparison with FASTP and FASTA", Methods in Enzymology 183:63-98.)

CIDentify takes a list of possible peptides from a Lutefisk97 output file as input. For example:

    Sequence        Rank   X-corr   IntScr
    AEFVNNTK           1    0.981    1.000
    AEFVEVTK           2    0.969    0.988
    AEFVDLTK           3    0.920    0.937
    AEFVN[144]AK       4    0.908    0.926
    AEFMPVTK           5    0.908    0.925
    AEFVEEAK           6    0.901    0.918
    AEFVKTVE           7    0.865    0.882
    AEFVK[201]K        8    0.865    0.882
    AEFVTKVE           9    0.865    0.882
    AEFVT[228]K       10    0.865    0.882

The Lutefisk97 rankings and scores are ignored by CIDentify. A number in brackets within a peptide sequence is the nominal mass of a dipeptide whose order and/or identity is unknown because of incomplete fragmentation data in the CID spectrum.
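The input format described above can be illustrated with a short Python sketch. It is not taken from the CIDentify or Lutefisk97 source (which is written in C); the function names tokenize_peptide and read_candidates are hypothetical, and the sketch assumes the input contains only the header row and candidate rows shown in the example. It splits each candidate into single-letter residues and bracketed dipeptide masses, and discards the rank and score columns.

```python
import re

# Hypothetical helpers for illustration; not part of CIDentify or Lutefisk97.
# A token is either a bracketed nominal mass such as [144] or one residue letter.
TOKEN = re.compile(r"\[(\d+)\]|([A-Z])")


def tokenize_peptide(seq):
    """Split a candidate such as 'AEFVN[144]AK' into residues (str)
    and bracketed dipeptide masses (int)."""
    tokens = []
    pos = 0
    for match in TOKEN.finditer(seq):
        if match.start() != pos:
            # Something between tokens did not fit either pattern.
            raise ValueError("unexpected text in sequence: %r" % seq)
        if match.group(1) is not None:
            tokens.append(int(match.group(1)))
        else:
            tokens.append(match.group(2))
        pos = match.end()
    if pos != len(seq):
        raise ValueError("unexpected text in sequence: %r" % seq)
    return tokens


def read_candidates(text):
    """Return tokenized candidates from a listing laid out like the
    example above. Assumes only a 'Sequence ...' header row and
    candidate rows are present; rank and score columns are dropped."""
    candidates = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "Sequence":
            continue
        candidates.append(tokenize_peptide(fields[0]))
    return candidates
```

For the fourth candidate in the example, tokenize_peptide("AEFVN[144]AK") yields the residues A, E, F, V, N, the integer 144, and the residues A, K.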
Each candidate peptide sequence is treated as equally probable and is used as a query sequence to search a sequence database. The score for each database sequence is the sum of its best score vs. each of the query sequences.

The CIDentify Result Compiler program can be used to combine the CIDentify output for several peptides derived from the same protein. To use the result compiler, you must first create a file containing the full path names of the individual CIDentify output files. The top scoring match in each individual file is given a rank score of 200, the second is given a rank score of 199, etc. The rank scores of database sequences found in more than one result list are combined, and the combined rank scores are then used to sort the final combined list.

Source Code Archive Contents:

• Compiled CIDentify application for PPC
• Example Lutefisk97 output file - "BSA - [200]MKDFVAFVDK"
• Example CIDentify output file - "BSA - [200]MKDFVAFVDK.out"
• C source code files for CIDentify:
    - CIDentify.CWX.µ - Metrowerks project file for compiling on the Macintosh
    - fffasta.c
    - fasta.rsrc - Macintosh program resources
    - nxgetaa.c
    - f_band.c
    - scalesws.c
    - zzlgmata.c
    - jat.c
    - LutefiskGlobals.c
    - checkevent.c - Macintosh specific routines
    - pam.c
    - getenv.c
    - getopt.c
    - time.c
    - ndispn.c
    - l_band.c
    - llmax.c
    - g_band.c
    - FileDlog.c - Macintosh specific dialog routines
    - Included header files:
        - altlib.h
        - ffasta.h
        - f_band.h
        - getenv.h
        - getopt.h
        - g_band.h
        - jat.h
        - llmax.h
        - Lutefisk.h
        - LutefiskGlobals.h
        - l_band.h
        - mytime.h
        - ndispn.h
        - nxgetaa.h
        - pam.h
        - scalesws.h
        - uascii.gbl
        - upam.gbl
        - zzlgmata.h
• C source code files and changes specific to CIDentify (DNA):
    - CIDentify (DNA) CWX.µ - Metrowerks project file for compiling on the Macintosh
    - lx_align3.c
    - lx_band2.c
    - faatran.c
    - zxlgmata.c
    - Included header files:
        - aamap.gbl
    - Removed files: f_band.c, g_band.c, l_band.c, llmax.c, zzlgmata.c
    - (The line "#define TFASTX" in ffasta.h must also be uncommented.)
• C source code files for CIDentify Result Compiler:
    - CIDentify RC CWX.µ - Metrowerks project file for compiling on the Macintosh
    - CIDentifyRC.c
    - Other source files (jat.c, checkevent.c, and FileDlog.c) are the same as for CIDentify.
    - To use the result compiler, make an index file with the path names of the CIDentify output files to be compiled.
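
The rank scoring used by the result compiler can be sketched in a few lines of Python. This is an illustration only and is not taken from CIDentifyRC.c. It makes several assumptions: that "combined" means the rank scores are added together, that rank scores never drop below zero for very long result lists, and that ties are broken alphabetically by identifier. The function name combine_rank_scores is hypothetical. Reading the CIDentify output files is left out, so each result list is passed in as a list of database sequence identifiers ordered best match first.

```python
from collections import defaultdict

# From the description above: the best match in each file gets 200,
# the second gets 199, and so on.
TOP_RANK_SCORE = 200


def combine_rank_scores(result_lists):
    """Combine per-file rankings into one sorted list.

    result_lists holds one list per CIDentify output file; each inner
    list contains database sequence identifiers, best match first.
    Returns (identifier, combined rank score) pairs, highest score first.
    """
    totals = defaultdict(int)
    for hits in result_lists:
        for position, seq_id in enumerate(hits):
            # Assumption: scores are summed and floored at zero.
            totals[seq_id] += max(TOP_RANK_SCORE - position, 0)
    # Assumption: ties are broken alphabetically by identifier.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))
```

Under these assumptions, a database sequence ranked first for one peptide and second for another would receive a combined rank score of 399 and sort ahead of any sequence that appears in only one result list.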