Biol4230 - Python/Matrix homework 4 - DUE Monday, Feb. 12, 5:00 PM


Strategies for question 3 -- looking at the effects of the scoring matrix on alignment length and percent identity.

  1. It is useful to separate the search problem from the data analysis problem. Once you have the appropriate search results, you do not need to keep re-running searches. And, since most of the search work is done from a shell command, it may be easier to write a shell script that runs the 10 searches for each matrix, storing the results in appropriate files.

    One way to get this script working is to have it print the ssearch or blastp command, rather than actually running it. Then, you can save the resulting set of commands to a file, and try running one or two lines from that file. Once you have lines that run and produce results, you will know that your shell script will work.

  2. Likewise, your first version of the python data analysis program might simply analyze one matrix with one set of queries.
  3. When writing this program, remember that not all searches will have 5 results, so you will need to count how many results go into each average. (While you are debugging the program, you should also report how many samples you had. When I wrote my first version of the program, I had only one value in the average, not 10, because of an error.)
  4. You will need to pay attention to converting the strings from the results file into either integers (for alignment length) or floats (for percent identity), and make sure that values are converted with float() when dividing by the number of samples to get the average. (In Python 2, dividing one integer by another discards the fractional part.)
  5. It is also important to check that you are not dividing by zero when calculating the averages.
  6. Once you have the program working for one matrix, you can either modify it by embedding the code in a for loop that works with each matrix, or you could get the matrix name from the command line (see sys.argv, the list of command-line arguments) and use the matrix name to open the appropriate results files.
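
A minimal sketch of points 3 to 6, not a reference solution. It assumes the results are tab-delimited lines with percent identity in the third column and alignment length in the fourth; that layout, the column defaults, the sample lines, and the function names average_hits and matrix_from_argv are all assumptions made for illustration, so adjust them to match your own results files.

```python
import sys


def average_hits(lines, ident_col=2, len_col=3):
    """Return (n_samples, mean_percent_identity, mean_alignment_length).

    The column positions are assumptions about the results format.
    """
    idents = []
    lengths = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        idents.append(float(fields[ident_col]))  # percent identity -> float
        lengths.append(int(fields[len_col]))     # alignment length -> int
    n = len(idents)
    if n == 0:
        # No samples: report that instead of dividing by zero.
        return 0, None, None
    return n, sum(idents) / n, float(sum(lengths)) / n


def matrix_from_argv(argv, default="BL62"):
    """Take the matrix name from the command line, or fall back to a default."""
    return argv[1] if len(argv) > 1 else default


if __name__ == "__main__":
    # Made-up sample lines, only to exercise the function while debugging.
    sample = ["q1\ts1\t50.0\t100", "q1\ts2\t30.0\t200", "# comment"]
    n, mean_ident, mean_len = average_hits(sample)
    # Always print n, so an average built from too few samples is easy to spot.
    print(matrix_from_argv(sys.argv), n, mean_ident, mean_len)
```

Returning the sample count alongside the averages makes it easy to notice when an average was built from 1 value rather than the 10 you expected.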

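The "print the command first" idea from point 1 can also be sketched in python. This is only an illustration: the command template, the matrix names, and the query and library file names below are placeholders, and build_commands is a made-up helper name. Substitute the ssearch or blastp command line you are actually using.

```python
def build_commands(matrices, queries, library, template):
    """Return one shell command string per (matrix, query) pair."""
    commands = []
    for matrix in matrices:
        for query in queries:
            # Hypothetical naming scheme: one output file per query and matrix.
            out_file = "{0}_{1}.out".format(query, matrix)
            commands.append(template.format(
                matrix=matrix, query=query, library=library, out=out_file))
    return commands


if __name__ == "__main__":
    # Assumed template and file names; replace with your own command line.
    template = "ssearch36 -s {matrix} -m 8 {query}.fa {library} > {out}"
    for cmd in build_commands(["BL50", "BP62"], ["query1", "query2"],
                              "library.fa", template):
        print(cmd)  # print only; nothing is executed here
```

Redirect the printed lines to a file, try one or two of them by hand, and then run the whole file from the shell once they produce results.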
