PROGRAM DESCRIPTION:

CIDentify is a homology-based database search algorithm designed to aid in
the identification of unknown peptides by mass spectrometry (MS). It is
intended to be used in conjunction with Lutefisk, the de novo MS/MS
interpretation program written by Rich Johnson, Immunex Corporation,
Seattle, WA. (Current Lutefisk source and compiled application are available
from http://www.immunex.com/researcher/lutefisk.html). CIDentify uses the
list of possible peptides produced by Lutefisk to search a database for
homologous sequences, taking into account sequence ambiguities due to the
nature of the MS data. (See J. A. Taylor and R. S. Johnson (1997), "Sequence
database searches via de novo peptide sequencing by tandem mass
spectrometry", Rapid Communications in Mass Spectrometry 11:1067-1075.)

CIDentify is a derivative of William R. Pearson's FASTA homology-based
database search program, adapted by Alex Taylor of the University of
Washington (now at Immunex Corp., Seattle, WA). The current version of
CIDentify is based on version 20u6 (Aug. 1996) of the FASTA program package.
(For information on FASTA see W. R. Pearson and D. J. Lipman (1988),
"Improved Tools for Biological Sequence Comparison", PNAS 85:2444-2448, and
W. R. Pearson (1990), "Rapid and Sensitive Sequence Comparison with FASTP
and FASTA", Methods in Enzymology 183:63-98.)

CIDentify takes a list of possible peptides from a Lutefisk output file as
input. For example:

    Sequence        Rank   X-corr   IntScr
    AEFVNNTK           1    0.981    1.000
    AEFVEVTK           2    0.969    0.988
    AEFVDLTK           3    0.920    0.937
    AEFVN[144]AK       4    0.908    0.926
    AEFMPVTK           5    0.908    0.925
    AEFVEEAK           6    0.901    0.918
    AEFVKTVE           7    0.865    0.882
    AEFVK[201]K        8    0.865    0.882
    AEFVTKVE           9    0.865    0.882
    AEFVT[228]K       10    0.865    0.882

The Lutefisk rankings and scores are ignored by CIDentify.
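
To make the input format concrete, here is a minimal Python sketch (not part
of CIDentify, which is written in C) that pulls the query sequences out of
Lutefisk-style text. The function name read_query_sequences is hypothetical,
and the sketch assumes that the first non-blank line is the header and that
each later line begins with the sequence, as in the example above.

```python
def read_query_sequences(text):
    """Return the candidate sequences from Lutefisk-style output text.

    Assumption: the first non-blank line is the header, and every later
    non-blank line starts with a sequence followed by whitespace-separated
    rank and score columns (which are ignored here).
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Skip the header line; keep only the first column of each data line.
    return [ln.split()[0] for ln in lines[1:]]


example = """\
 Sequence Rank X-corr IntScr
AEFVNNTK 1 0.981 1.000
AEFVEVTK 2 0.969 0.988
AEFVN[144]AK 4 0.908 0.926
"""

queries = read_query_sequences(example)
print(queries)
```

Because only the first column is read, the same sketch also accepts lines
that contain a sequence and nothing else.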
Because the rankings and scores are ignored, it is easy to create your own
CIDentify input file without using Lutefisk: copy the header line,
" Sequence Rank X-corr IntScr", into a new text file and add your own list
of sequences to search, one per line.

Numbers in brackets within a peptide sequence are the nominal mass of a
dipeptide of unknown order and/or identity due to incomplete fragmentation
data in the CID spectrum. All of the possible peptide sequences are treated
as equally probable when they are used as query sequences to search a
sequence database. The score for each database sequence is the sum of its
best score vs. each of the query sequences.

The CIDentify Result Compiler program can be used to combine the CIDentify
output for several peptides derived from the same protein. To use the result
compiler, you must first create a file containing the full path names of the
individual CIDentify output files. Then run the Result Compiler program and
select this file. The top scoring match in each individual file is given a
rank score of 200, the second is given a rank score of 199, etc. The rank
scores are combined for database sequences found in more than one results
list, and the combined scores are then used to sort the final list.

To get a brief summary of the important command-line options, invoke
CIDentify with the '-h' option. (On the Macintosh, command-line options are
entered on the 'Arguments:' line of the initial dialog box. To use
command-line options under Win32, the executable must be invoked from a DOS
prompt, for example by using the Command Prompt program.)
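
The Result Compiler's rank scoring described above can be illustrated with a
short Python sketch. This is not the CIDentifyRC source: the function name
compile_rank_scores and the sequence identifiers are hypothetical, and the
sketch assumes that "combined" means summed. It also leaves open what happens
to matches ranked below position 200, which this README does not specify.

```python
from collections import defaultdict

TOP_RANK_SCORE = 200  # rank score given to the top match in each file


def compile_rank_scores(result_lists):
    """Combine ranked hit lists into one list sorted by total rank score.

    result_lists: one list per CIDentify output file, each holding database
    sequence identifiers ordered from best to worst match.
    Assumption: rank scores for the same identifier are combined by summing.
    """
    totals = defaultdict(int)
    for hits in result_lists:
        for position, seq_id in enumerate(hits):
            # Position 0 scores 200, position 1 scores 199, and so on.
            totals[seq_id] += TOP_RANK_SCORE - position
    # Highest combined score first; ties are broken by identifier.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical identifiers, for illustration only.
peptide_a = ["ALBU_BOVIN", "ALBU_SHEEP", "ALBU_HUMAN"]
peptide_b = ["ALBU_SHEEP", "ALBU_BOVIN"]
peptide_c = ["ALBU_BOVIN", "TRFE_BOVIN"]

combined = compile_rank_scores([peptide_a, peptide_b, peptide_c])
for seq_id, score in combined:
    print(seq_id, score)
```

In this made-up example the entry found near the top of all three lists
(200 + 199 + 200 = 599) sorts first, which is the point of compiling results
for several peptides from the same protein.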

The most important command-line options are:

    [-q]  Quiet mode
    [-j Lutefisk query file name]
    [-p Database choice]
    [-C modified cysteine nominal mass]   (e.g.: -C "160")
    [-N N-terminal bonus residues]        (e.g.: -N "KR")
    [-l FASTLIBS file name]
    [-b Number of results to show when using quiet mode]
    [-s scoring matrix file name]

________________________________________________________________________________

VERSION HISTORY:

1.0   - (June, 1997) Initial release. Based on FASTA version 20u6
        (Aug. 1996).

1.0.3 - (Sept, 1997) Fixed some bugs in CIDentifyX and successfully compiled
        the source code for CIDentify and CIDentifyX under both OSF and
        Solaris using gcc v2.7.

1.0.4 - (Oct, 1997) Added some command-line options and made a few other
        adjustments so that it could be run in quiet mode on UNIX without
        having to feed it responses. Fixed more bugs in CIDentifyX dealing
        with reverse frames, unknown sequence characters (B, Z, X), and
        sequences that are too short stopping the search (found when
        encountering an entry in dbEST whose sequence was only "AC").
        Adjusted the scoring of the 1 query residue = 2 library residues
        case to be the matrix identity score of the query residue instead
        of the sum of the library matrix identities to reduce score
        inflation. Tweaked the output format slightly: longer line length,
        full descriptors by default, and inclusion of the query sequences
        used.

1.0.5 - (Oct, 1998) Added a '-C' command-line option to set the nominal
        mass of a modified cysteine. Added a '-N' command-line option to
        set which residues in the database sequence N-terminal to the
        alignment produce a bonus. The default is "RK" as would be expected
        for tryptic peptides. Made minor modifications to the CIDentifyX
        alignment output to make it consistent with the CIDentify output.
        Made adjustments to the result compiler to make it more robust.
        Successfully compiled CIDentify & CIDentifyX under LINUX and as
        console apps for Windoze NT.

1.0.6 - (Oct, 2000) Added support for changes in Lutefisk1900 - reading
        ambiguous amino acid pairs as letters or floating-point masses.

________________________________________________________________________________

CIDentifyMac AND CIDentifyWin32 ARCHIVE CONTENTS:

* Compiled CIDentify, CIDentifyX, and CIDentify Result Compiler applications
  for PPC or for Win32
* Example Lutefisk output file (CIDentify input) - "BSA-200MKDFVAFVDK"
* Example CIDentify output file - "BSA-200MKDFVAFVDK.out"
* This README file, "0_README", and the FASTA "COPYRIGHT"
* "environment" and "fastgbs" files for creating customized database menus
* A folder of FASTA documentation
* A folder of Blosum scoring matrices modified for CIDentify

CIDentifySrc SOURCE CODE ARCHIVE CONTENTS:

(The CIDentifySrc.tar.Z archive is a tar archive that has been UNIX
compressed.)

* Example Lutefisk output file (CIDentify input) - "BSA-200MKDFVAFVDK"
* Example CIDentify output file - "BSA-200MKDFVAFVDK.out"
* This README file, "0_README", and the FASTA "COPYRIGHT"
* "environment" and "fastgbs" files for creating customized database menus
* A folder of FASTA documentation
* A folder of Blosum scoring matrices modified for CIDentify
* C Source code files for CIDentify:
    - Makefile - Makefile for compiling CIDentify, CIDentifyX and
      CIDentifyRC on UNIX or LINUX
    - Macintosh/CIDentify.CWP4 - Metrowerks project file for compiling on
      the Macintosh
    - Win32/CIDentify.CWP4.mcp - Metrowerks project file for compiling on
      Win32
    - fffasta.c
    - nxgetaa.c
    - f_band.c
    - scalesws.c
    - zzlgmata.c
    - jat.c
    - LutefiskGlobals.c
    - pam.c
    - getenv.c - Needed for Macintosh & Win32 versions
    - getopt.c - Needed for Macintosh & Win32 versions
    - time.c
    - ndispn.c
    - l_band.c
    - llmax.c
    - g_band.c
    - Macintosh/FileDlog.c - Macintosh specific dialog routines
    - Macintosh/fasta.rsrc - Macintosh program resources
    - Macintosh/checkevent.c - Macintosh specific routines
    - Included header files:
        - altlib.h
        - ffasta.h
        - f_band.h
        - getenv.h
        - getopt.h
        - g_band.h
        - jat.h
        - llmax.h
        - Lutefisk.h
        - LutefiskGlobals.h
        - l_band.h
        - mytime.h
        - ndispn.h
        - nxgetaa.h
        - pam.h
        - scalesws.h
        - uascii.gbl
        - upam.gbl
        - zzlgmata.h
* C source code files and changes specific for CIDentifyX (DNA):
    - Macintosh/CIDentifyX.CWP5 - Metrowerks project file for compiling on
      the Macintosh
    - Win32/CIDentifyX.CWP5.mcp - Metrowerks project file for compiling on
      Win32
    - lx_align3.c
    - lx_band2.c
    - faatran.c
    - zxlgmata.c
    - Include header files:
        - aamap.gbl
    - Remove files: f_band.c, g_band.c, l_band.c, llmax.c, zzlgmata.c
    - (The line "#define TFASTX" in ffasta.h must also be uncommented when
      building the Mac or Win32 versions.)
* C Source code files for CIDentify Result Compiler (CIDentifyRC):
    - Macintosh/CIDentifyRC.CWP5 - Metrowerks project file for compiling on
      the Macintosh
    - Win32/CIDentifyRC.CWP5.mcp - Metrowerks project file for compiling on
      Win32
    - CIDentifyRC.c
    - checkevent.c and FileDlog.c - Macintosh specific routines, the same
      as for CIDentify
    - To use the result compiler, make an index file with the path names of
      the CIDentify output files to be compiled.
      [ Remember, UNIX uses '/' as a directory separator while the Mac uses
      ':' and Win32 uses '\' ]
* Compiling on the Macintosh
    Current Metrowerks projects are included in the "Macintosh" folder. If
    you have an older compiler you will need to create a new "Std C Console
    PPC" project and add the source files as specified above.
* Compiling under Win32
    Current Metrowerks projects are included in the "Win32" folder. If you
    have an older compiler you will need to create a new "C Console App"
    project and add the source files as specified above.
* Compiling on UNIX or LINUX
    Simply use the "make all" command after untarring the archive.

________________________________________________________________________________

Questions? Problems?
Contact Alex Taylor at jataylor@hairyfatguy.com -or- ataylor@immunex.com