PatScan offers three choices:
There are many options that can be employed in constructing a pattern to scan for. We suggest you consider the following simple examples:
    GPXGXXXXXXXXXGXXXXXNDGXXXXXXXXXXXP

This is a pattern composed of a single pattern unit. The pattern unit is a 34-character region: eight characters must match exactly, and the others can be anything. This is actually not a bad characterization of a fairly interesting class of proteins, but one would probably like to allow some mismatches (i.e., one would like a slightly less stringent pattern).
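To make the "eight of 34 positions must match" rule concrete, here is a minimal Python sketch of how such a pattern could be checked against a sequence. It is not PatScan's implementation: the function names and the `max_mismatches` parameter are hypothetical, and treating every `X` in the pattern as "anything" is taken from the description above.

```python
PATTERN = "GPXGXXXXXXXXXGXXXXXNDGXXXXXXXXXXXP"

def fixed_positions(pattern):
    """Indices of the pattern characters that must match (everything except X)."""
    return [i for i, ch in enumerate(pattern) if ch != "X"]

def count_mismatches(pattern, window):
    """Number of fixed positions at which the window differs from the pattern."""
    return sum(1 for i in fixed_positions(pattern) if window[i] != pattern[i])

def scan(pattern, sequence, max_mismatches=0):
    """Return 1-based (start, end) for every window within the mismatch budget.

    Assumption: every window is tested, so overlapping hits are all returned.
    """
    n = len(pattern)
    hits = []
    for start in range(len(sequence) - n + 1):
        window = sequence[start:start + n]
        if count_mismatches(pattern, window) <= max_mismatches:
            hits.append((start + 1, start + n))
    return hits

if __name__ == "__main__":
    print(len(PATTERN), len(fixed_positions(PATTERN)))  # 34 8
```

With `max_mismatches=0` only windows that agree at all eight fixed positions are returned; raising it to 1 gives the kind of slightly less stringent pattern the text asks for.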
    GPXGXXXXXXXXXGXXXXXNDGXXXXXXXXXXXP[1,0,0]

This pattern will allow a single mismatch out of the eight specified positions. The qualifier [1,0,0] means up to 1 mismatch, up to 0 "deletions" and up to 0 "insertions". A "deletion" means that a character in the input pattern is skipped.
Thus, if you matched

    GYHVLMMAWG[2,1,0]

against

    AGVGPPGGYAVCMAWGKRSTVLM

the matched subsequence would be

    GYAVCMAWG

NOTE: You cannot leave a space between the string of amino acid codes and the qualifier. Thus,

    GYHVLMMAWG [2,1,0]

is invalid.
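The worked example above can be reproduced with a small backtracking matcher. This is only a sketch under stated assumptions, not PatScan's algorithm: the names `parse_unit`, `match_at` and `first_match` are hypothetical, an "insertion" is assumed to mean an extra character in the sequence that has no counterpart in the pattern, and the order in which edits are tried is an arbitrary choice.

```python
import re
from functools import lru_cache

# Residues immediately followed by [mismatches,deletions,insertions]; no space allowed.
QUALIFIED = re.compile(r"^([A-Z]+)\[(\d+),(\d+),(\d+)\]$")

def parse_unit(text):
    """Split 'GYHVLMMAWG[2,1,0]' into (residues, mismatches, deletions, insertions)."""
    m = QUALIFIED.match(text)
    if m is None:
        raise ValueError("invalid pattern unit: %r" % text)
    return m.group(1), int(m.group(2)), int(m.group(3)), int(m.group(4))

def match_at(pattern, seq, start, mism, dele, ins):
    """Return the exclusive end index of a match beginning at seq[start], or None."""
    @lru_cache(maxsize=None)
    def go(p, s, m, d, i):
        if p == len(pattern):
            return s  # whole pattern consumed
        if s < len(seq):
            if pattern[p] == seq[s]:
                end = go(p + 1, s + 1, m, d, i)      # exact match
                if end is not None:
                    return end
            elif m > 0:
                end = go(p + 1, s + 1, m - 1, d, i)  # spend one mismatch
                if end is not None:
                    return end
        if d > 0:
            end = go(p + 1, s, m, d - 1, i)          # deletion: skip a pattern character
            if end is not None:
                return end
        if i > 0 and s < len(seq):
            end = go(p, s + 1, m, d, i - 1)          # insertion: skip a sequence character
            if end is not None:
                return end
        return None
    return go(0, start, mism, dele, ins)

def first_match(unit, seq):
    """Return the first matching subsequence of seq, scanning left to right."""
    pattern, mism, dele, ins = parse_unit(unit)
    for start in range(len(seq)):
        end = match_at(pattern, seq, start, mism, dele, ins)
        if end is not None and end > start:
            return seq[start:end]
    return None

if __name__ == "__main__":
    print(first_match("GYHVLMMAWG[2,1,0]", "AGVGPPGGYAVCMAWGKRSTVLM"))  # GYAVCMAWG
```

In this alignment H/A and L/C are the two mismatches and one M of the pattern is the deleted character. Passing `GYHVLMMAWG [2,1,0]` (with the space) to `parse_unit` raises `ValueError`, mirroring the rule in the NOTE.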
    any(TS) 1...1 GP 1...1 G 4...4 any(LFIVM) 4...4 G 5...5 NDG 10...11 P

This pattern contains 13 "pattern units". A match is successful only when each pattern unit is successfully matched.
Here we see two new kinds of pattern units. any(TS) will match either a T or an S in the first position. Similarly, notany(TS) would match anything but a T or an S.

The Min...Max notation is used to indicate any character string of the designated length. For example, 10...12 matches any 10, 11 or 12 character string.
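One way to get a feel for these pattern units is to translate them into an ordinary regular expression. The sketch below uses Python's `re` module; the helper names `unit_to_regex` and `pattern_to_regex` are hypothetical, only the unit types described so far are handled (qualified units such as `[1,0,0]` are rejected), and mapping `X` in a literal to "any character" is an assumption based on the first example.

```python
import re

RANGE = re.compile(r"^(\d+)\.\.\.(\d+)$")           # Min...Max
CLASS = re.compile(r"^(any|notany)\(([A-Z]+)\)$")   # any(TS) / notany(TS)

def unit_to_regex(unit):
    """Translate one pattern unit into a regular-expression fragment."""
    m = RANGE.match(unit)
    if m:
        return ".{%s,%s}" % (m.group(1), m.group(2))
    m = CLASS.match(unit)
    if m:
        negate = "^" if m.group(1) == "notany" else ""
        return "[" + negate + m.group(2) + "]"
    if unit.isalpha() and unit.isupper():
        return unit.replace("X", ".")  # assumption: X stands for any residue
    raise ValueError("unsupported pattern unit: %r" % unit)

def pattern_to_regex(pattern):
    """Translate a whitespace-separated list of pattern units into a compiled regex."""
    return re.compile("".join(unit_to_regex(u) for u in pattern.split()))

if __name__ == "__main__":
    rx = pattern_to_regex("any(TS) 1...1 GP 1...1 G 4...4 any(LFIVM) G")
    print(rx.pattern)                             # [TS].{1,1}GP.{1,1}G.{4,4}[LFIVM]G
    print(bool(rx.fullmatch("SPGPAGQQGAIG")))     # True
```

The translation is purely illustrative: a regular expression has no notion of mismatch, deletion or insertion budgets, so it covers only the exact-match units.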
There are a number of other useful types of pattern units, but this should be enough to get you started. For more information see Rules for Forming Patterns.
Note that complex patterns could often match against a number of overlapping areas of a sequence: only the first would be reported (after a successful match, the matching algorithm picks up at the first character past the matched substring).
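The reporting rule described above (report a match, then resume at the first character past it) can be sketched as a simple loop. This is an illustration using Python regular expressions as a stand-in for PatScan patterns; the function name `scan_non_overlapping` and the hand-written regex for the collagen-like pattern are assumptions.

```python
import re

def scan_non_overlapping(regex, sequence):
    """Return 1-based (start, end) hits, resuming just past each reported match."""
    hits = []
    pos = 0
    while pos <= len(sequence):
        m = regex.search(sequence, pos)
        if m is None:
            break
        hits.append((m.start() + 1, m.end()))
        # Resume at the first character past the match (guard against empty matches).
        pos = m.end() if m.end() > m.start() else m.end() + 1
    return hits

if __name__ == "__main__":
    # "G.G" could match at positions 1-3 and 3-5 of GAGAG, but only the first is reported.
    print(scan_non_overlapping(re.compile("G.G"), "GAGAG"))  # [(1, 3)]

    # Two adjacent hits are both reported because they do not overlap.
    collagen_like = re.compile("[TS].GP.G.{4}[LFIVM]G")
    print(scan_non_overlapping(collagen_like, "SPGPAGQQGAIGSPGPAGPRGPVG"))  # [(1, 12), (13, 24)]
```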
    all
    any(TS) 1...1 GP 1...1 G 4...4 any(LFIVM) G

    sp|P02461|CA13_HUMAN:[1120,1131]: S P GP A G QQGA I G
    sp|P02461|CA13_HUMAN:[1132,1143]: S P GP A G PRGP V G
    sp|P02463|CA14_MOUSE:[1095,1106]: S P GP R G SPGN I G
    sp|P02465|CA21_BOVIN:[158,169] : S V GP V G PAGP I G
    sp|P04258|CA13_BOVIN:[412,423] : S P GP R G QPGV M G
    sp|P04258|CA13_BOVIN:[964,975] : S P GP A G HQGA V G
    sp|P04258|CA13_BOVIN:[976,987] : S P GP A G PRGP V G
    sp|P08125|CA1A_CHICK:[80,91] : S P GP Q G PPGP L G
    sp|P13941|CA13_RAT:[304,315] : S P GP A G PRGP V G
    . . .
    sp|Q02388|CA17_HUMAN:[2720,2731]: S A GP P G PPGS V G
    sp|P05997|CA25_HUMAN:[769,780] : T P GP K G DRGG I G
    sp|P22138|RPA2_YEAST:[53,64] : T E GP D G GLLN L G
    sp|P42382|CH60_EHRCH:[29,40] : T A GP K G LTVA I G
    sp|Q01149|CA21_MOUSE:[749,760] : T K GP K G ENGI V G

    COMPLETED REQUEST
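Each hit in the sample output carries a sequence identifier, the matched coordinates, and one text fragment per pattern unit. If you want to post-process such output, a parser along the following lines may help. The line format is inferred from the sample alone and is an assumption, as is the function name `parse_hit`.

```python
import re

# identifier:[start,end] (optional space) : fragments separated by spaces
HIT = re.compile(r"^(\S+):\[(\d+),(\d+)\]\s*:\s*(.+)$")

def parse_hit(line):
    """Parse one hit line into a dict, or return None for non-hit lines."""
    m = HIT.match(line.strip())
    if m is None:
        return None
    pieces = m.group(4).split()
    return {
        "id": m.group(1),
        "start": int(m.group(2)),
        "end": int(m.group(3)),
        "pieces": pieces,             # one fragment per pattern unit
        "matched": "".join(pieces),   # the full matched subsequence
    }

if __name__ == "__main__":
    hit = parse_hit("sp|P02465|CA21_BOVIN:[158,169] : S V GP V G PAGP I G")
    print(hit["id"], hit["start"], hit["end"], hit["matched"])
    # The coordinates are inclusive, so the span length equals the matched length.
    print(hit["end"] - hit["start"] + 1 == len(hit["matched"]))  # True
```

In the sample every hit has eight fragments, one for each of the eight pattern units, and spans 12 residues.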
So, try out some patterns and see what you get. Note that we limit the maximum number of reported hits; you can override the maximum, but we suggest that you only do so once you know that you really want to see a truly large number of matches.