Command line unix (Linux) (19-Jan-2018)


This will be your homework for Monday, Jan 22. My goal is to be certain that you can (1) log into a Unix machine, (2) run some simple commands to create files and directories, (3) edit a file using emacs (or a different editor, if you prefer), (4) transfer a file from your laptop to Unix, and (5) use the 'curl' command to transfer files from a web site to a Unix file.

  1. Get an ITS unix account
  2. On your computer, log in to your account on interactive.hpc.virginia.edu or interactive.hpc.virginia.edu (use whichever one works; both computers access the same account directories).

    Outside UVA, you MUST use the Cisco AnyConnect VPN.

  3. Look at your $PATH variable (used for finding programs), then save its value to a file called path.file
    echo $PATH
    echo $PATH > path.file
    

  4. List the contents of the file path.file
    cat path.file
    more path.file
    less path.file  # type "q" to return to the shell prompt
    

  5. Make a copy of the file
    cp path.file path.copy

  6. Make a sub-directory (folder) called "biol4230"
    mkdir biol4230
    
    Move into that directory with cd biol4230
    Make a new subdirectory in biol4230 called hwk1

    To make certain that I can read files in your biol4230 directory:

    chmod go+rx ~  # <- tilde (shift-backtick, under ESC on my keyboard)
    chmod go+rx ~/biol4230
    chmod go+rx ~/biol4230/hwk1
    

  7. Move the path.copy file into the biol4230/hwk1 folder
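
    One way steps 5 through 7 might look end to end is sketched below. It runs in a throwaway directory made with mktemp (an assumption of this sketch, so nothing in your real account is touched); in your own account you would run the cp, mkdir, and mv lines from your home directory.

```shell
# Sketch only: work in a scratch directory instead of your real home.
demo=$(mktemp -d)
cd "$demo"

echo "$PATH" > path.file        # stand-in for the path.file made earlier
cp path.file path.copy          # step 5: copy the file
mkdir -p biol4230/hwk1          # step 6: -p also creates biol4230 if needed
mv path.copy biol4230/hwk1/     # step 7: move the copy into hwk1

ls -l biol4230/hwk1             # the copy should now be listed here
```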

  8. List the contents of the biol4230/hwk1 folder
    cd ~/biol4230/hwk1
    ls
    ls -l
    
    What is the extra information in the second listing?

  9. Save the contents (directory listing) of the data folder to a file using the same strategy you used to create path.file.
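
    As a hedged illustration of redirecting a listing into a file: the sketch below uses a scratch directory and a hypothetical output name, dir.listing. Substitute the folder and file name you actually want.

```shell
# Sketch only: scratch directory with two placeholder files to list.
demo=$(mktemp -d)
cd "$demo"
touch a.txt b.txt

ls -l > dir.listing             # '>' sends the listing to a file instead of the screen
cat dir.listing                 # show what was saved
```

    Because the shell creates dir.listing before ls runs, the new file shows up in its own listing.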

  10. Use the emacs text editor to edit the directory listing file. Make several copies of some of the lines in the file, and save it.

    Check the contents of the biol4230/hwk1 directory.

  11. Transfer a file of accessions from your laptop to interactive.hpc.virginia.edu.
    1. At the NCBI web site, look up: glutathione S-transferase AND human[orgn] AND srcdb_refseq[prop] in the protein database.
    2. Use the Send to link on the search result page to send the accessions to a file.
    3. Use scp (Mac) or SecureFX (Windows) to copy the file of accessions to interactive.hpc or interactive.hpc.
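
    For the scp route, the command has the general shape scp LOCAL_FILE USER@HOST:REMOTE_PATH. The sketch below only prints the command it would run; the user ID mst3k and the file name gst_refseq.acc are placeholders, not names from the assignment. The printed command is meant to be run on your laptop, not on the Unix machine.

```shell
user="mst3k"                            # placeholder: your own computing ID
host="interactive.hpc.virginia.edu"
file="gst_refseq.acc"                   # placeholder: the file saved from NCBI
dest="biol4230/hwk1/"                   # relative to your remote home directory

cmd="scp $file $user@$host:$dest"
echo "$cmd"                             # dry run: prints the command only
```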

  12. Search for glutathione S-transferase in the SwissProt database by using srcdb_swiss_prot in place of srcdb_refseq, and then download a file of accessions. Transfer this file to interactive.hpc.

  13. Use the curl command (on interactive.hpc) to download a sequence from UniProt:
    curl http://www.uniprot.org/uniprot/P09488.fasta
    
    1. Check to see if the sequence has appeared in your directory.
    2. Look at the sequence (file). Is it in FASTA format?
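
    A note on the curl step: with no options, curl writes what it downloads to the terminal; curl -o FILE URL (or a > redirect) saves it to a file instead. To check a saved file, recall that a FASTA record starts with a '>' header line followed by sequence lines. The sketch below tests that on a made-up toy record (toy.fasta is not real sequence data), since the download itself needs network access.

```shell
demo=$(mktemp -d)
cd "$demo"

# Made-up toy record for illustration only, not a real protein.
printf '>toy_id made-up example\nACDEFGHIKL\nMNPQRSTVWY\n' > toy.fasta

# A FASTA file's first line begins with '>'.
if head -n 1 toy.fasta | grep -q '^>'; then
    verdict="looks like FASTA"
else
    verdict="does not look like FASTA"
fi
echo "$verdict"

n_records=$(grep -c '^>' toy.fasta)     # one '>' line per sequence
echo "records: $n_records"
```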

  14. Put all the scripts and results files in a directory on interactive.hpc (or interactive.hpc) called "biol4230/hwk1" and be certain that I can read it.
    cd                # go to home directory
    chmod go+rx .     # make it readable by others
    chmod go+rx biol4230/  # make biol4230 readable
    chmod go+rx biol4230/hwk1  # and hwk1
    
    cd biol4230/hwk1
    chmod go+r * # make all files in the directory readable by others (me) 
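
    To confirm the permissions took effect, ls -ld shows a directory's own mode string. The sketch below repeats the chmod commands in a scratch directory (an assumption, to keep it self-contained) and reads back the last three permission characters, which apply to "others".

```shell
# Sketch only: a scratch copy of the directory layout with one placeholder file.
demo=$(mktemp -d)
mkdir -p "$demo/biol4230/hwk1"
touch "$demo/biol4230/hwk1/hwk1.notes"

chmod go+rx "$demo" "$demo/biol4230" "$demo/biol4230/hwk1"
chmod go+r "$demo/biol4230/hwk1"/*

# Characters 8-10 of the mode string are the permissions for 'others'.
dir_other=$(ls -ld "$demo/biol4230/hwk1" | cut -c8-10)
file_other=$(ls -l "$demo/biol4230/hwk1/hwk1.notes" | cut -c8-10)
echo "directory: $dir_other  file: $file_other"
```

    An r and an x in the directory field mean others can list and enter it; an r in the file field means they can read the file.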
    

  15. When your homework is done, I should see:
    1. Your biol4230/hwk1/path.file
    2. A directory listing with several duplicated lines in it (from part 10, emacs)
    In addition, from the lecture handout, I should see:

    (0) A file called hwk1.notes, which lists the names of each of the files listed below and gives the answer to part (10).

    (1-3) A fasta format file for GSTM1_HUMAN/NP_000552 from NCBI

    (4a) file with a list of accessions for human refseq glutathione transferases

    (4c) list of accessions for human SwissProt glutathione transferases from Uniprot

    (5) file with a list of NP_ human refseq accessions: gst_refseq.NP_only

    (6) the GST fasta file transferred with get_ncbi_acc.sh

    (7) a file of NCBI Swissprot accessions for GSTs

    (8) a file with version numbers removed from file (7)

    (9) a summary of the differences between the UniProt and NCBI SwissProt GST accessions

    (10) the name of a different (between NCBI and Uniprot) SwissProt protein, and a possible explanation for the inclusion/exclusion from the NCBI/UniProt query (in hwk1.notes)
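
    Items (8) and (9) could be approached with sed, sort, and comm; this is one possible sketch, not the method from the lecture handout. The accessions and file names below are made-up placeholders.

```shell
demo=$(mktemp -d)
cd "$demo"

# Made-up placeholder accessions, not real database entries.
printf 'TOY001.2\nTOY002.1\nTOY003.4\n' > ncbi_sp.acc
printf 'TOY002\nTOY003\nTOY004\n' > uniprot_sp.acc

# (8) strip a trailing ".<digits>" version suffix, then sort for comm
sed 's/\.[0-9][0-9]*$//' ncbi_sp.acc | sort > ncbi_sp.nover
sort uniprot_sp.acc > uniprot_sp.sorted

# (9) comm needs sorted input; -23 keeps lines found only in the first file,
#     -13 keeps lines found only in the second
comm -23 ncbi_sp.nover uniprot_sp.sorted > only_ncbi
comm -13 ncbi_sp.nover uniprot_sp.sorted > only_uniprot

echo "only in NCBI list:    $(cat only_ncbi)"
echo "only in UniProt list: $(cat only_uniprot)"
```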

