Locating Patterns in Protein Sequences

To submit a request, you need to give PatScan

Protein Sequence choices:

PatScan offers three choices:

Pattern

Many options are available for constructing a pattern. We suggest you start with the following simple examples:

  1. GPXGXXXXXXXXXGXXXXXNDGXXXXXXXXXXXP
    
    This pattern consists of a single pattern unit that describes a 34-character region. Eight positions must match exactly; the positions marked X can be any character. This is actually not a bad characterization of a fairly interesting class of proteins, but one would probably like to allow some mismatches (i.e., a slightly less stringent pattern).

  2. GPXGXXXXXXXXXGXXXXXNDGXXXXXXXXXXXP[1,0,0]
    
    This pattern will allow a single mismatch out of the eight specified positions.
    The [1,0,0] qualifier specifies:
    	up to 1 mismatch
    	up to 0 "deletions" and
    	up to 0 "insertions".
    
    A "deletion" means that a character in the input pattern is skipped.
    An "insertion" means that a character in the matched string is skipped.

    Thus, if you matched

    	GYHVLMMAWG[2,1,0]
    
    against
    	AGVGPPGGYAVCMAWGKRSTVLM
    
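    the matched subsequence would be
    	GYAVCMAWG
    
    Here the H and the L in the pattern are mismatched against A and C, and one of the two M's in the pattern is skipped (a deletion).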
    
    NOTE: You cannot leave a space between the string of amino acid codes and the qualifier. Thus,
    	GYHVLMMAWG [2,1,0]
    
    is invalid.

  3. any(TS) 1...1 GP 1...1 G 4...4 any(LFIVM) 4...4 G 5...5 NDG 10...11 P
    
    This pattern contains 13 "pattern units". A match is successful only when each pattern unit is successfully matched.

    Here we see two new kinds of pattern units.

    	any(TS)
    
    will match either a T or an S in the first position. Similarly,
    	notany(TS)
    
    would match anything but a T or an S.

    The

    	Min...Max
    
    notation is used to indicate any character string of the designated length. For example,
    	10...12 
    
    matches any string of 10, 11, or 12 characters.
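The pattern units introduced in these examples can be approximated with ordinary regular expressions. The Python sketch below is only an illustration, not PatScan's own implementation: the function names unit_to_regex and pattern_to_regex are hypothetical, and it assumes that units are separated by spaces and that an X inside a literal unit matches any character, as in example 1. Qualifiers such as [1,0,0] are not handled here.

```python
import re


def unit_to_regex(unit):
    """Translate one pattern unit into a regex fragment (illustrative only)."""
    m = re.fullmatch(r"(\d+)\.\.\.(\d+)", unit)
    if m:
        # Min...Max: any string whose length lies in the given range
        return ".{%s,%s}" % (m.group(1), m.group(2))
    m = re.fullmatch(r"(any|notany)\(([A-Z]+)\)", unit)
    if m:
        # any(TS) -> [TS], notany(TS) -> [^TS]
        negate = "^" if m.group(1) == "notany" else ""
        return "[" + negate + m.group(2) + "]"
    if re.fullmatch(r"[A-Z]+", unit):
        # Literal residues; X is treated as "any character" (assumption)
        return unit.replace("X", ".")
    raise ValueError("unsupported pattern unit: " + unit)


def pattern_to_regex(pattern):
    """Join the translated units into one compiled regular expression."""
    return re.compile("".join(unit_to_regex(u) for u in pattern.split()))
```

For instance, pattern_to_regex("any(TS) 1...1 GP 1...1 G 4...4 any(LFIVM) G").pattern evaluates to [TS].{1,1}GP.{1,1}G.{4,4}[LFIVM]G, which matches the fragment SPGPAGQQGAIG. Regular-expression backtracking may not pick the same match as PatScan when several alignments are possible.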

There are a number of other useful types of pattern units, but this should be enough to get you started. For more information see Rules for Forming Patterns.
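To make the mismatch, deletion, and insertion qualifier from example 2 more concrete, here is a small Python sketch of one way such a match could be computed. It is an assumption-laden illustration rather than PatScan's actual algorithm: the names match_at and find_first are hypothetical, X is treated as a wildcard, and the search simply tries an exact step first, then a mismatch, then a deletion, then an insertion.

```python
from functools import lru_cache


def match_at(pattern, text, start, mismatches, deletions, insertions):
    """Return the end index of a match beginning at text[start], or None."""

    @lru_cache(maxsize=None)
    def go(p, t, mm, dl, ins):
        if p == len(pattern):
            return t  # whole pattern consumed: the match ends at t
        if t < len(text):
            if pattern[p] in ("X", text[t]):
                end = go(p + 1, t + 1, mm, dl, ins)      # exact or wildcard
            elif mm > 0:
                end = go(p + 1, t + 1, mm - 1, dl, ins)  # spend a mismatch
            else:
                end = None
            if end is not None:
                return end
        if dl > 0:
            # Deletion: a character in the pattern is skipped
            end = go(p + 1, t, mm, dl - 1, ins)
            if end is not None:
                return end
        if ins > 0 and t < len(text):
            # Insertion: a character in the matched string is skipped
            return go(p, t + 1, mm, dl, ins - 1)
        return None

    return go(0, start, mismatches, deletions, insertions)


def find_first(pattern, text, mismatches=0, deletions=0, insertions=0):
    """Scan left to right and return the first matching substring, or None."""
    for start in range(len(text)):
        end = match_at(pattern, text, start, mismatches, deletions, insertions)
        if end is not None:
            return text[start:end]
    return None
```

With the values from example 2, find_first("GYHVLMMAWG", "AGVGPPGGYAVCMAWGKRSTVLM", 2, 1, 0) returns GYAVCMAWG, while the same call with no mismatches or deletions allowed returns None.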

Note that a complex pattern can often match several overlapping regions of a sequence. Only the first is reported: after a successful match, the matching algorithm resumes at the first character past the matched substring.
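The effect of resuming past the matched substring can be sketched in Python as follows. This is a minimal illustration using a regular expression in place of a PatScan pattern; the function name scan is hypothetical, and reporting positions as 1-based inclusive coordinates is an assumption.

```python
import re


def scan(regex, sequence):
    """Yield (start, end, text) for successive non-overlapping matches."""
    pos = 0
    while pos < len(sequence):
        m = regex.search(sequence, pos)
        if m is None or m.end() == m.start():
            break  # no further match (or an empty match, which cannot advance)
        # Report 1-based inclusive coordinates (assumed convention)
        yield (m.start() + 1, m.end(), m.group())
        # Resume at the first character past the matched substring
        pos = m.end()
```

For example, list(scan(re.compile("G.G"), "GAGAGAG")) gives [(1, 3, "GAG"), (5, 7, "GAG")]. The overlapping occurrence at positions 3 to 5 is not reported, because scanning resumes at position 4.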

Results

Results should look something like:
	all
	any(TS) 1...1 GP 1...1 G 4...4 any(LFIVM) G
	sp|P02461|CA13_HUMAN:[1120,1131]:  S P GP A G QQGA I G
	sp|P02461|CA13_HUMAN:[1132,1143]:  S P GP A G PRGP V G
	sp|P02463|CA14_MOUSE:[1095,1106]:  S P GP R G SPGN I G
	sp|P02465|CA21_BOVIN:[158,169]  :  S V GP V G PAGP I G
	sp|P04258|CA13_BOVIN:[412,423]  :  S P GP R G QPGV M G
	sp|P04258|CA13_BOVIN:[964,975]  :  S P GP A G HQGA V G
	sp|P04258|CA13_BOVIN:[976,987]  :  S P GP A G PRGP V G
	sp|P08125|CA1A_CHICK:[80,91]    :  S P GP Q G PPGP L G
	sp|P13941|CA13_RAT:[304,315]    :  S P GP A G PRGP V G
	.
	.
	.
	sp|Q02388|CA17_HUMAN:[2720,2731]:  S A GP P G PPGS V G
	sp|P05997|CA25_HUMAN:[769,780]  :  T P GP K G DRGG I G
	sp|P22138|RPA2_YEAST:[53,64]    :  T E GP D G GLLN L G
	sp|P42382|CH60_EHRCH:[29,40]    :  T A GP K G LTVA I G
	sp|Q01149|CA21_MOUSE:[749,760]  :  T K GP K G ENGI V G
	COMPLETED REQUEST

So, try out some patterns and see what you get. Note that we limit the maximum number of reported hits; you can override the maximum, but we suggest that you only do so once you know that you really want to see a truly large number of matches.
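If you want to process the reported hits with a script, lines in the format shown under Results can be split into their parts. The Python sketch below is based only on the sample output above; the name parse_hit and the dictionary keys are hypothetical, and the real output may contain line forms that this does not cover.

```python
import re

# identifier:[start,end] (optional padding) : matched pieces, one per pattern unit
HIT = re.compile(r"^\s*(\S+?):\[(\d+),(\d+)\]\s*:\s*(.*?)\s*$")


def parse_hit(line):
    """Return a dict describing one hit line, or None for any other line."""
    m = HIT.match(line)
    if m is None:
        return None
    seq_id, start, end, pieces = m.groups()
    return {
        "id": seq_id,
        "start": int(start),
        "end": int(end),
        "units": pieces.split(),  # substring matched by each pattern unit
    }
```

Applied to the CA21_BOVIN line from the sample, parse_hit returns the identifier sp|P02465|CA21_BOVIN, the positions 158 and 169, and the eight pieces S, V, GP, V, G, PAGP, I, G. Lines such as "all" or "COMPLETED REQUEST" yield None.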