For Lower threshold estimation, choose adjusted information content (auto)*
You can leave everything else as default
The Information page will open, giving you a list of your sequences and their lengths, as well as information about your matrix
Further down, a table shows in which sequences the motif was found, including position, score, and ln(P).
Change the Display limits to go from -800 to 0, and Go.
This will take a while
You will see a figure of each of your sequences with the TF1 binding sites (PSSM hits) shown in blue
Right now it shows all hits, regardless of significance
Click Go
See that the table with hits is much shorter now
Change the Display limits to go from -800 to 0
And Go
The figure will now only display sequences with significant hits
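The hit table and the filtering by significance described above can be illustrated with a minimal sketch of a PSSM scan. This is not the tool's implementation: the count matrix, the uniform background, the pseudocount, and the score threshold are all hypothetical values chosen for illustration, and the sequence is assumed to contain only upper-case A, C, G, T.

```python
import math

# Hypothetical 4-column count matrix (one list per base); not the tutorial's TF1 matrix.
COUNTS = {
    "A": [8, 0, 0, 1],
    "C": [0, 9, 0, 1],
    "G": [1, 0, 9, 0],
    "T": [0, 0, 0, 7],
}
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}  # assumed uniform background
PSEUDOCOUNT = 1.0  # assumed; spread over the bases according to the background


def log_odds_matrix(counts, background, pseudocount):
    """Convert counts into per-column log-odds weights ln(f / background)."""
    width = len(counts["A"])
    matrix = []
    for i in range(width):
        total = sum(counts[b][i] for b in "ACGT") + pseudocount
        column = {}
        for b in "ACGT":
            freq = (counts[b][i] + pseudocount * background[b]) / total
            column[b] = math.log(freq / background[b])
        matrix.append(column)
    return matrix


def scan(sequence, matrix, min_score):
    """Return (start, site, score) for every window scoring at least min_score."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= min_score:
            hits.append((start, window, round(score, 2)))
    return hits


pssm = log_odds_matrix(COUNTS, BACKGROUND, PSEUDOCOUNT)
for hit in scan("TTACGTAAACGTCC", pssm, min_score=3.0):
    print(hit)
```

Raising `min_score` plays the same role as tightening the threshold in the form: fewer windows pass, so the hit table gets shorter.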
The Matrix scan function can be used to look for binding sites of multiple TFs and cis-regulatory modules:
As Background model, choose Markov order 1 (a)
From the boxes below pick Individual sequences, and click on site, pval, rank, and limits
Set Lower threshold for p-value to 0 and higher threshold to 0.0001
Go
This will take a while
The table shows you the individual hits for each TF binding site in each sequence, together with location, p-value (b), etc.
Change the Display limits to go from -800 to 0
And Go
This will take a while
You will see a figure of each of your sequences and the binding sites (PSSM hits) in different colors
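The Markov order 1 background model selected for this scan conditions each nucleotide on the one before it. As a rough illustration of what that means, the sketch below estimates the transition probabilities from overlapping dinucleotide counts. The function name and the toy input are hypothetical, and the fallback of 0.25 for a base that never occurs as a predecessor is an assumption made only to keep the example well defined.

```python
from collections import Counter


def markov1_transitions(sequences):
    """Estimate P(next base | previous base) from overlapping dinucleotides."""
    pair_counts = Counter()
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            pair_counts[(prev, nxt)] += 1

    transitions = {}
    for prev in "ACGT":
        row_total = sum(pair_counts[(prev, nxt)] for nxt in "ACGT")
        transitions[prev] = {
            # Assumed fallback: uniform 0.25 when 'prev' was never followed by anything.
            nxt: (pair_counts[(prev, nxt)] / row_total) if row_total else 0.25
            for nxt in "ACGT"
        }
    return transitions


model = markov1_transitions(["ACGCGT"])  # toy input, not real promoter data
print(model["C"])  # probabilities of each base following a C
```

An order 0 model would instead keep a single frequency per nucleotide, and an order 2 model would condition on the two preceding bases.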
Change the lower threshold of crer_sig to 0 and leave everything else at default
Note that these are very permissive parameters with a high false-positive rate, but we chose them to get a first look at whether there are any CRERs at all
Go
The result table is huge (because of our permissive parameters) but we found some CRERs
Change the Display limits to go from -800 to 0 and un-check the box for legend
And Go
You will see a lot of CRERs depicted as red boxes in your sequences
Set Lower threshold for p-value to 0 and higher threshold to 0.0001, and set crer_sig to 2 (c)
Go on with Feature Map
Change the Display limits to go from -800 to 0
And Go
You will see the individual binding sites and predicted CRERs
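To get a feeling for why a cluster of sites can be significant even when each site alone is not, here is a minimal sketch that scores a region with a binomial tail: the probability of seeing at least the observed number of sites by chance, given the site p-value threshold. This is an illustrative assumption, not necessarily the exact computation the tool performs for crer_sig, and the function names, window length, and site counts are hypothetical.

```python
import math


def binomial_tail(trials, successes, p):
    """P(X >= successes) for X ~ Binomial(trials, p), summed in log space."""
    total = 0.0
    for k in range(successes, trials + 1):
        log_pmf = (
            math.lgamma(trials + 1)
            - math.lgamma(k + 1)
            - math.lgamma(trials - k + 1)
            + k * math.log(p)
            + (trials - k) * math.log1p(-p)
        )
        total += math.exp(log_pmf)
    return min(total, 1.0)


def region_significance(positions, n_sites, site_pvalue):
    """Hypothetical region score: -log10 of the chance of >= n_sites hits."""
    return -math.log10(binomial_tail(positions, n_sites, site_pvalue))


# Toy numbers: a 200 bp region scanned with a site p-value threshold of 0.0001.
for n in (1, 2, 3):
    print(n, round(region_significance(200, n, 0.0001), 2))
```

The score grows quickly with the number of sites in the region, which is why a stricter significance threshold removes most of the regions found with the permissive settings.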
(a) 1 means that your background model accounts for the frequencies of di-nucleotides like CpG; 0 would just count all 4 nucleotides independently of each other; 2 would account for tri-nucleotides
(b) The p-value for each site and PSSM tells how likely it is to get that score by chance. Note that your p-value threshold determines the number of false positives you allow/expect: e.g. p < 0.001 gives about one false prediction every 1 kb
(c) With this p-value you expect fewer than 1 false-positive site within 5.5 kb, and with crer_sig = 2 you expect 1 false positive per 100 tested CRERs
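The arithmetic behind these notes can be checked with a short sketch. It assumes one test per sequence position for a single matrix on a single strand, and it assumes that a significance value corresponds to -log10 of the expected number of false positives, which matches the figures quoted in the notes. The function names are hypothetical.

```python
def expected_false_positives(p_value, sequence_length_bp):
    """Expected chance hits: p-value threshold times the number of positions tested."""
    return p_value * sequence_length_bp


def sig_to_expected(sig):
    """Assumed relation: sig = -log10(expected false positives per test)."""
    return 10 ** (-sig)


print(expected_false_positives(0.001, 1000))   # threshold 0.001 over 1 kb
print(expected_false_positives(0.0001, 5500))  # threshold 0.0001 over 5.5 kb
print(sig_to_expected(2))                      # crer_sig = 2
```

Scanning both strands or several matrices multiplies the number of tests, so the expected number of false positives rises accordingly.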