Aedes aegypti cytochrome P450s

 

Oct. 18, 2005 under revision April 21, 2006 (in progress)

Revision continues May 19, 2006 to June 25, 2006

Compiled by David Nelson and David Drane

 

The completed and named sequences are here

(http://drnelson.utmem.edu/AedesFasta.June25.htm)

This file is more archival with detailed information.

Please see the FASTA file above.

 

Useful links for analysis

http://www.ncbi.nlm.nih.gov/Traces/trace.cgi  Trace Archive at NCBI

http://trace.ensembl.org/perl/traceview Trace files at Ensembl

http://132.192.64.52/blast/P450.html P450 Blast server

http://www.proweb.org/proweb/Tools/WU-blast.html Do-it-yourself WU Blast

http://www.bioinformatics.vg/bioinformatics_tools/JVT.shtml DNA translator

http://ncbi.nlm.nih.gov/BLAST/Blast.cgi?CMD=Web&LAYOUT=TwoWindows&AUTO_FORMAT=Semiauto&ALIGNMENTS=50&ALIGNMENT_VIEW=Pairwise&CLIENT=web&DATABASE=nr&DESCRIPTIONS=100&ENTREZ_QUERY=%28none%29&EXPECT=10&FILTER=L&FORMAT_OBJECT=Alignment&FORMAT_TYPE=HTML&GENETIC_CODE=0&NCBI_GI=on&PAGE=Translations&PROGRAM=tblastn&SERVICE=plain&SET_DEFAULTS.x=23&SET_DEFAULTS.y=10&SHOW_OVERVIEW=on&UNGAPPED_ALIGNMENT=no&END_OF_HTTPGET=Yes&SHOW_LINKOUT=yes&GET_SEQUENCE=yes   NCBI TBLASTN search

http://www.ncbi.nlm.nih.gov/BLAST/tracemb.shtml NCBI megablast

http://www.tigr.org/tigr-scripts/tgi/T_index.cgi?species=a_aegypti TIGR Aedes gene index page

 

206 Aedes sequences here including 142 complete sequences. 

Numbers in () are intron phases. Names have not been assigned for most genes.
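The annotation convention used in the records below can be read mechanically. The following is a minimal Python sketch, assuming that each sequence line holds a peptide segment, optional nucleotide coordinates written as bare integers, and intron phases written as (0), (1), (2) or with a question mark such as (1?). The function name parse_segment is hypothetical and is not part of this file or of any tool used to build it.

```python
import re

PHASE = re.compile(r"^\((\d)\??\)$")   # intron phase marker, e.g. (1) or (2?)
RESIDUES = re.compile(r"^[A-Z*]+$")    # amino acid letters, '*' marks a stop


def parse_segment(line):
    """Split one annotated line into coordinates, peptide and intron phases."""
    coords, phases, peptide = [], [], []
    for token in line.split():
        match = PHASE.match(token)
        if match:
            phases.append(int(match.group(1)))
        elif token.isdigit():
            coords.append(int(token))
        elif RESIDUES.match(token):
            peptide.append(token)
    return {"coords": coords, "peptide": "".join(peptide), "phases": phases}


segment = parse_segment("438 NSLEREAEEMIHHLRKLSRTQKVISMHNAFDVSVLNSIWTLIAGKR (2) 575")
print(segment["coords"], segment["phases"])
```

Tokens that fit none of the three patterns (accession numbers, free-text notes) are ignored, so header lines should not be passed to this function.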

 

Sequences were collected and assembled by David Drane and David Nelson from July to Sept.

2005. 3.5 million of the 15 million trace file sequences were downloaded from NCBI and

placed on a standalone BLAST server on a Mac G4 for TBLASTN searches

at an expect value of 10.  The WGS section of GenBank was searched, and 220 AAGE01XXXXXX

accession numbers are given at the end of this file.  The TIGR Gene Index was

searched for the text “P450”.  The EST section of GenBank was searched, and

discontiguous megablast was used to extend sequences by chromosome walking.

Most sequences should now be represented here, but not all are assembled. 

The Aedes mosquito seems to have more P450s than the Anopheles mosquito. 

 

This file is in progress.  The CYP4 and CYP325 families are not yet fully assembled

because there are some large introns in these sequences.

The sequences are presented in clan groups: the CYP2, CYP3, CYP4 and mitochondrial

clans.  Note: Aedes has a CYP18 that was not found in Anopheles.

CYP329 of Anopheles now looks like a pseudogene of a CYP9 sequence.

It has a shortened heme signature, and it has a P in place of the critical T in the I-helix

oxygen binding pocket.  It is the only Anopheles sequence in the CYP3 clan that does not

fall inside the CYP6 or CYP9 families.
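The two features used above to judge a sequence (an intact heme signature and the conserved T in the I-helix) can be screened for with simple pattern matching. The sketch below is only illustrative: the patterns F..G.R.C.G and [AG]G..T are commonly quoted rough consensus forms and are an assumption here, not the criteria actually used to compile this file, and real P450s can deviate from them. The function name motif_report is hypothetical.

```python
import re

HEME = re.compile(r"F..G.R.C.G")    # assumed consensus for the heme signature
I_HELIX = re.compile(r"[AG]G..T")   # assumed consensus ending at the conserved T


def motif_report(peptide):
    """Return the first match of each motif, or None when a motif is absent."""
    heme = HEME.search(peptide)
    helix = I_HELIX.search(peptide)
    return {
        "heme_signature": heme.group(0) if heme else None,
        "i_helix": helix.group(0) if helix else None,
    }


# Two short fragments taken from the CYP15B1 record in this file, joined for the demo.
fragment = "FQAGSETTSNT" + "HEYFVPFGSGKRRCLGESLAK"
print(motif_report(fragment))
# The same fragment cut off inside the heme signature (an artificial truncation).
print(motif_report(fragment[:24]))
```

A missing motif is only a flag for a closer look: a phase 1 intron inside the heme signature, for example, can split the match across two annotated lines.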

There are 11 complete sequences in the CYP2 clan (CYP15, 18, 303, 304, 305, 306:phm, 307).

Phantom (phm) is one of the Halloween genes.

There are 76 complete sequences in the CYP3 clan (CYP6, CYP9).

There are 34 complete sequences in the CYP4 clan (CYP4, CYP325).

There are 9 complete sequences in the mitochondrial clan (CYP12, 49, 301, 302:dib,

314:shd, 315:sad).  These include three of the Halloween genes: disembodied (dib), shade (shd),

and shadow (sad).

There are 21 pseudogenes so far.

There are 15 partial sequences (not including the pseudogenes).

 

CYP2/CYP18 clan sequences

 

>514720743 753475610 750240311 possible CYP15 N-term = DR747015.1 EST

MWQNLVVLIIFVILFCLRDMRKPGYFPP (1)

 

>CYP15B like 585964866 641740723 584363040

 78 GPNWFPLIGSGFEVFRLVKHFKFYHLMWAELMRRYGPIVGLRLGRDRVVIVSGLDA 257

258 IREVYSKDQFDGRPDGFFFRIRSFDKRLGVVFTDGAHWDIQRRFSVRTLKALGMGRTGMV 437

438 NSLEREAEEMIHHLRKLSRTQKVISMHNAFDVSVLNSIWTLIAGKR (2) 575

    FDLDDKKLEWIMETIHKSFRVIDMSGGVLNQFPPIRYVLPDKSGFAPLLNLLSPLWTFLQ 816
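Where a line carries nucleotide coordinates on both sides of the peptide, the span can be checked against the peptide length (three bases per residue). This is a small Python sketch under that assumption; the function names coords_match and strand are hypothetical. Descending coordinates, which appear in later records, are treated as minus-strand hits.

```python
def coords_match(start, peptide, end):
    """True when the nucleotide span equals three bases per residue."""
    span = abs(end - start) + 1   # inclusive span, valid for either strand
    return span == 3 * len(peptide)


def strand(start, end):
    """Report '+' for ascending coordinates and '-' for descending ones."""
    return "+" if end >= start else "-"


line_peptide = "IREVYSKDQFDGRPDGFFFRIRSFDKRLGVVFTDGAHWDIQRRFSVRTLKALGMGRTGMV"
print(coords_match(258, line_peptide, 437), strand(258, 437))
```

A False result does not prove an error. It only marks a line where the coordinates and the peptide shown do not cover the same stretch, for instance when the coordinates include flanking bases.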

 

>CYP15B like seq DR746695.1 adult female corpora allata cDNA

813859354 749484786 522065275 514869301 520643713

GTIKSIRSKLDQPDNPDCFIASYLRELNIAERHSSFTNEQLLCLCLDL

FQAGSETTSNTLGYGIAHMLHHPEIVQKIHNELDSVIGRYRLPLLADRPYLPYTEAVLCE

IQRISNIAPLAIAHRTVAPVQLGTYVIPKNTITLISLYSLHMDKAYWGDPEVFRPERFLN

ETGDKLIAHEYFVPFGS (1)

GKRRCLGESLAKSSLFLFFTAFMHAFLVEPAEPGKLPELDGIDGITLSPCPYFVQLKERLI*

 

>possible complete CYP15B1 assembled from parts 52% to 15B1 from Anopheles

AAGE01116789 AAGE01129498

Used trace archive seqs to verify seq at PLLNLLRPLWTFLQ

This region is not accurate in AAGE02003241.1

638470554 

823375362 

593712263 

586030336 

641740723 

569671400 

used AAGE02003241.1 for the C-term seq changes

MWQNLVVLIIFVILFCLRDMRKPGYFPP (1)

GPNWFPLIGSGFEVFRLVKHFKFYHLMWAELMRRYGPIVGLRLGRDRVVIVSGLDA

IREVYSKDQFDGRPDGFFFRIRSFDKRLGVVFTDGAHWDIQRRFSVRTLKALGMGRTGMV

NSLEREAEEMIHHLRKLSRTQKVISMHNAFDVSVLNSIWTLIAGKR (2)

FDLDDKKLEWIMETIHKSFRVIDMSGGVLNQFPPIRYVLPDKSGFAPLLNLLRPLWTFLQ (0)

GTIKSIRSKLDQPDNPDCFIASYLRELNIAERHSSFTNEQLLCLCLDL

FQAGSETTSNTLGYGIAHMLHHPEIVQKIHNELDSVIGRYRLPLLADRPYLPYTEAVLCE

IQRISNIAPLAIAHRTVAPVQLGTYVIPKNTITLISLYSLHMDKAYWGDPEVFRPERFLN

ETGDKLVAHEYFVPFGS (1)

GKRRCLGESLAKSSLFLFFTAFMHAFLVEPAEPGKLPELDGIDGITLSPCPYYVQLKERLI*
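An assembled record like the one above can be turned into a plain FASTA entry by dropping the coordinates and intron phase markers and joining the segments. The following Python sketch assumes that only the sequence lines of a record are passed in (not the header or note lines); the names assemble and to_fasta and the record name in the example are hypothetical.

```python
import re


def assemble(lines):
    """Join peptide segments, dropping coordinates and intron phase markers."""
    # Runs of capital letters (and '*') are kept; digits, spaces and
    # markers such as (1) or (2?) are skipped.
    return "".join("".join(re.findall(r"[A-Z*]+", line)) for line in lines)


def to_fasta(name, peptide, width=60):
    """Format a peptide as a FASTA record wrapped at the given width."""
    rows = [peptide[i:i + width] for i in range(0, len(peptide), width)]
    return ">" + name + "\n" + "\n".join(rows)


exons = [
    "MWQNLVVLIIFVILFCLRDMRKPGYFPP (1)",
    "GPNWFPLIGSGFEVFRLVKHFKFYHLMWAELMRRYGPIVGLRLGRDRVVIVSGLDA",
]
print(to_fasta("CYP15B1_first_two_segments", assemble(exons)))
```

Because every run of capital letters is kept, a note line passed in by mistake would contaminate the peptide, so the input should be restricted to sequence lines.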

 

>567527404  46% to CYP15B1 may be a CYP15 pseudogene

XXXXXXXXXXXXXXXXLVRRFRFYHHTCAAFMCLYRPIVDLRMGRDRVVIMTGLDP

I*KVYSKDEKENRPVGFFFRIRSFDKRLAVVFTDGAHWDIQRRFSVRTLKALGMGRTGLV

SSLEREAEEMIHHLRKLSRTQKVISRNNAFDVSVLNSIWTLIAER
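In-frame stop codons, written as * inside a translation, are one of the signs used in this file to call a pseudogene. A minimal Python sketch for locating them follows; the function name internal_stops is hypothetical, and a trailing * is treated as the normal end of the protein.

```python
def internal_stops(peptide):
    """Return 0-based positions of '*' that are not the final character."""
    body = peptide[:-1] if peptide.endswith("*") else peptide
    return [i for i, residue in enumerate(body) if residue == "*"]


# Second sequence line of the possible CYP15 pseudogene above.
print(internal_stops("I*KVYSKDEKENRPVGFFFRIRSFDKRLAVVFTDGAHWDIQRRFSVRTLKALGMGRTGLV"))
# C-terminal end of a complete sequence: the final '*' is not reported.
print(internal_stops("LSPCPYYVQLKERLI*"))
```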

 

>CYP18A1 AAGE01025833 AAGE01338874.1 AAGE01065191.1 529463664 572557122

66% to 18A1 (note: CYP18 not seen in Anopheles) complete

revised at cyan aa based on AAGE02007615.1

     MFLDTYLLGVVRQEFFDASKARST

2678 LLVFCCTLSCVVFLQWLFRLVCQIKKLPPGPWGVPIFGYLTFIGHEKHTQYMKLARKYG 2502

2501 SLFSAKLGAQLTVVISDYKIIREAFKTEDFTGRPHSPLLKTLGGF (1?)

     GIINSEGQLWKDQRRFLH 719

 718 EKLRHFGMTVLGNKKHLMESRIM (0)

 534 TEVAELLASLNEVGSQSTDLSKYLSVSVSNVICNIIMSVRFSLEDPKFKRFNWLIEEGMRLF 359

 358 GEIHTIDYIPQIQYLPGNINAKNKIAKNRQEMFDFYREVIDEHKRSFNAENIRDIVDAYL 179

 178 DEIQKAQAEGRDQELFDGKDH 116

     EIQMMQVIADLFSAGMETIKTTLLWLNVFMLRHPDAMKRVQDELDQVVGRNRLPKIEDVP 406

     YLPITETTILEVMRISSIVPLATTHSPKS (2) 319

2116 DVVINGYTIPAGSYVVPLINSVHMDPTLWDKPEEFNPSRFLDAEGKVHKPDFFIPFGVGR  1937

1936 RRCLGDVLARMELFLFFASIMHTFTIELPEDEPMPSLKGIIGVTISPQAFRVKLIPRPLN  1757

1756 ADLDRLRNVGSC*  1718

 

>AAGE01098313 (upper seq) CYP18 like fragment probable pseudogene

Query:  1703 ASLNEVGSRSS 1671

             ASLNEVGS+S+

Sbjct:   210 ASLNEVGSQST 220

 

Query:   313 VGSQSTELSKYLSVLVSNVICNIIMSVRFSLEDPKF--------------GGMHTIDYIP 176

             VGSQST+LSKYLSV VSNVICNIIMSVRFSLEDPKF              G +HTIDYIP

Sbjct:   215 VGSQSTDLSKYLSVSVSNVICNIIMSVRFSLEDPKFKRFNWLIEEGMRLFGEIHTIDYIP 274

 

Query:   175 QIQYLPGNV-------KNRQEMFDIYREVINEHKRSFNAENIRDIV 59

             QIQYLPGN+       KNRQEMFD YREVI+EHKRSFNAENIRDIV

Sbjct:   275 QIQYLPGNINAKNKIAKNRQEMFDFYREVIDEHKRSFNAENIRDIV 320

 

Query:    56 AYLDEILKAQAE 21

             AYLDEI KAQAE

Sbjct:   322 AYLDEIQKAQAE 333
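The percent values quoted throughout this file come from alignments like the one above. As a rough illustration, identity can be recounted from the Query and Sbjct rows of one alignment block. This Python sketch skips columns with a gap in either row, which is a simplification and an assumption here (BLAST itself reports identities over the full alignment length); the function name identity is hypothetical.

```python
def identity(query, sbjct):
    """Return (identical columns, compared columns), skipping gapped columns."""
    pairs = [(q, s) for q, s in zip(query, sbjct) if q != "-" and s != "-"]
    same = sum(1 for q, s in pairs if q == s)
    return same, len(pairs)


# First block of the AAGE01098313 alignment above.
same, total = identity("ASLNEVGSRSS", "ASLNEVGSQST")
print(same, total, round(100.0 * same / total))
```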

 

>AAGE01227048 (upper seq) CYP18 like fragment probable pseudogene

Query:   643 ASLNEVGSQPIDLNKYLSVSVSNVICNIIMSVRFSLEDPKFA-------------G-LHT 780

             ASLNEVGSQ  DL+KYLSVSVSNVICNIIMSVRFSLEDPKF              G +HT

Sbjct:   210 ASLNEVGSQSTDLSKYLSVSVSNVICNIIMSVRFSLEDPKFKRFNWLIEEGMRLFGEIHT 269

 

Query:   763 FAGLHTIDYIPQIQYLPGNV-------KNRQEMFDFYREMIDEHKQSFNAENIRDIV 912

             F  +HTIDYIPQIQYLPGN+       KNRQEMFDFYRE+IDEHK+SFNAENIRDIV

Sbjct:   264 FGEIHTIDYIPQIQYLPGNINAKNKIAKNRQEMFDFYREVIDEHKRSFNAENIRDIV 320

 

Query:   915 AYLDEILKAQAEDRDQELFEGKDHEI 992

             AYLDEI KAQAE RDQELF+GKDH++

Sbjct:   322 AYLDEIQKAQAEGRDQELFDGKDHDV 347

 

>CYP303A1 AAGE01109944 641807020 834983680 618119317

834966118 826136105 587934965

72% to 303A1 complete

MYWYYLACFIVVFIIFLYLDCIKPANFPPGPKWYPIIGSAIEIARARQKTGMLCKAIKLIASKYDHKGVIGF

KVGKDKTVMAISGDSLREMMSNEDLDGRPTGIFYETRTWGLRRGVLLTDEEFWQEQRRFI

VRHLKEFGFARKGMAEIIGNEAEYVKNDFHALVKAGNGKALVQMQSAFSVYILNTLWLMM

AGIRYTRENKDLKYLQSLLHELFANIDMMGALFSHFPFIRFFAPRLSGYKQFVEIHNLMH

KFIGAEVENHKKSFNDTDEPRDLMDVYLKILQSNRDIPESFSQEQLLAVCLDMFIAGSET

TTKTLGFAFLHLVRQRETQLKVQKELDEVVGRNRLPTLEDRVN (2)

LPYCEAVVLEALRMFMANTFGIPHRALRDTKLCGYDIPK (0)

DTMLVGMFRGMMLNDWESPTSFKPERFLKGGKIVIPPNFHPFGVGRHRCMGEMMGKAN 110

LFLFITTLFQSFDFLVPEGYPIPSDEPIDGATPSVRQYTALIVPR*

 

>581536484 803281860 586608108 826028980

574131458 595148561 754352758 590136340 519840563 753671460

67% to 512982119 above 58% to 304B anoph

tried walking the chromosome down to exon 2 so

some numbers above are in the intron

MLLNPSAILWTVAAGLLIYRCFRFMFDRPPNFPSGPPRFPLLGSYLVLLMVNYRHLHR

AAVRLGQFYRTKILGIYLGDFPSIVVNDLAIAKEVLARSEFDGRSDLFLARMRERNFQRR

GIFFTDGPHWKEQRRFVLRHLRDYGFGRRFDELEAETRSELMTLLDVLRYGPKFEHERLF

AKDGCVKCPDAFYGLLGNVYFQVICGERFQRKDMAQLYE (1)

 

>223483644 519671636 528946489 494183870

a second 304B like C-term sequence 57% to 304B

TGRHAINFQQKGDDYGTILSYLPWLKDYFPEATNYRILREVNNRMNDLIEAMVQKY

LASYDENHMRCFLDRYIYEMKQSKPLEGDAFTFQ (1)

YDQLVMILWDMLLPTLSGSAIQLSMLLERLLLNPRVATKVQQELDGVVGHGRLPTLDDRV 889

NLPYAEATLLEALRIDTLVPSGISHVALEDTKLCGYDIPKGCFVMLSLDVINNQREFWGDP

ENFRPERFLDESGKLSLKKDISVPFGGGKRLCVGETFSRNTLFLMFTALMQNFNIKPRPG

DPLPDLGQRITGVVTSMEPFWLRFEAR*

 

>CYP304B2xx Possible full length gene joining the 512982119 and 223483644 fragments complete

note this is a hybrid of two different genes, see corrected seq below

AAGE01051934

MFVTPTIFLWLVTIGLIAYRCHRFLFDRPKNFPDGPPKFPLLGGYAVMLLINFYHLHR

AANKLCEYYRTKILGIYLGNFPTVIVNDFATVKEVLNRVEFDGRPDLFIARMREKNFLLR

GIFFTQGPDWKEQRRFILRYLRDYGFGRRFDELEAETNAEILTLVEMLRYGPRHEHETEF

MTKDGCAMVPNVFFACFANAFLYVLTGERINRDEAGALFE

TGRHAINFQQKGDDYGTILSYLPWLKDYFPEATNYRILREVNNRMNDLIEAMVQKY

LASYDENHMRCFLDRYIYEMKQSKPLEGDAFTFQ (1)

YDQLVMILWDMLLPTLSGSAIQLSMLLERLLLNPRVATKVQQELDGVVGHGRLPTLDDRV 889

NLPYAEATLLEALRIDTLVPSGISHVALEDTKLCGYDIPKGCFVMLSLDVINNQREFWGDP

ENFRPERFLDESGKLSLKKDISVPFGGGKRLCVGETFSRNTLFLMFTALMQNFNIKPRPG

DPLPDLGQRITGVVTSMEPFWLRFEAR*

 

>CYP304B3yy/xx top part = my old Byy, bottom = my old Bxx + 1 aa diff

DW987682.1 EST supports this assembly, so mine are hybrids

AAGE02028825.1 revised seq on 4/20/06

46553 MLLNPSAILWTVAAGLLIYRCFRFMFDRPPNFPSGPPRFPLLGSYLVLLMVNYRHLHRAAVRLGQFYRTKILGIYLGDFP

SIVVNDLAIAKEVLARSEFDGRSDLFLARMRERNFQRRGIFFTDGPHWKEQRRFVLRHLRDYGFGRRFDELEAETRSELM

TLLDVLRYGPKFEHERLFAKDGCVKCPDAFYGLLGNVYFQVICGERFQRKDMAQLYE (2) 47203

61403 TGRHAINFQQKGDDYGTILSYLP

WLKDYFPEATNYRILREVNNRLNDLIEAMVQKYLASYDENHMRCFLDRYIYEMKQSKPLEGDAFTFQ (1) 61672

61872 YDQLVMILWDMLL

PTLSGSAIQLSMLLERLLLNPRVATKVQQELDGVVGHGRLPTLDDRVNLPYAEATLREALRIDTLVPSGISHVALEDTKL

CGYDIPKGCFVMLSLDVINNQREFWGDPENFRPERFLDESGKLSLKKDISVPFGGGKRLCVGETFSRNTLFLMFTALMQN

FNIKPRPGDPLPDLGQRITGVVTSMEPFWLRFEAR* 62501
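When an older assembly is replaced by a revised one, the positions that changed can be listed directly. The sketch below is a minimal Python example for two peptides of equal length; the function name differences is hypothetical, and sequences of unequal length would need a real alignment first.

```python
def differences(old, new):
    """List (1-based position, old residue, new residue) for equal-length peptides."""
    if len(old) != len(new):
        raise ValueError("sequences must have the same length")
    return [
        (position, a, b)
        for position, (a, b) in enumerate(zip(old, new), start=1)
        if a != b
    ]


# The same short stretch taken from the older fragment and from the revised sequence.
print(differences("REVNNRMNDLIEAMVQKY", "REVNNRLNDLIEAMVQKY"))
```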

 

 

>512982119 637789748 834948129 570603901 750442192 570627554 568540398

743856885 581525309 637183809 812171267 586112683 570800380

579961153 793213948 581533371 587665129 570695804 574007683

60% to 304B1 Anopheles.  Numbers above include a long chromosome

walk of about 5-7 kb, about 500 bp per step.  No C-term was found.

N-term exon is 55% to 304B anoph. and 48% to 304C anoph.

MFVTPTIFLWLVTIGLIAYRCHRFLFDRPKNFPDGPPKFPLLGGYAVMLLINFYHLHR

AANKLCEYYRTKILGIYLGNFPTVIVNDFATVKEVLNRVEFDGRPDLFIARMREKNFLLR

GIFFTQGPDWKEQRRFILRYLRDYGFGRRFDELEAETNAEILTLVEMLRYGPRHEHETEF

MTKDGCAMVPNVFFACFANAFLYVLTGERINRDEAGALFE

 

>CYP304B 494544931 512720460 41% to 476322188 72% to 304B1

827562306 594336057 512633341

(2) TGKYAMMFQRTGDDYGTIYSLLPWMRHLFPNRTRYRTIREGSLGVNRFIESII

QKRLETHEEGHVRCFLDLYFTEMKKTVPRTEDNRFTFQ (1)

HDQLVLGIVDFFFPAISGATTQ

IALLLERLLWHPEVVQKMQAEIDDVVGHGRLPTLDDRINLPYTEATLREAMRI

DTLVPSGVAHMAMKDTTLRGYDIPKDTILVLGLDSIHMQKDIWGDPERFRPERFLNYRGE

LSLSKDVSVPFGAGKRLCAGETFARNTMFLIVSALVQNFNIRQRLGDKLPDMGKRSTGII

ISPADYWVKFEPR*

 

>CYP304Byy AAGE01029809 Possible full length gene joining

the 581536484 and 494544931 fragments complete

note this is a hybrid of two different genes, see corrected seq below

MLLNPSAILWTVAAGLLIYRCFRFMFDRPPNFPSGPPRFPLLGSYLVLLMVNYRHLHR

AAVRLGQFYRTKILGIYLGDFPSIVVNDLAIAKEVLARSEFDGRSDLFLARMRERNFQRR

GIFFTDGPHWKEQRRFVLRHLRDYGFGRRFDELEAETRSELMTLLDVLRYGPKFEHERLF

AKDGCVKCPDAFYGLLGNVYFQVICGERFQRKDMAQLYE (1)

(2) TGKYAMMFQRTGDDYGTIYSLLPWMRHLFPNRTRYRTIREGSLGVNRFIESII

QKRLETHEEGHVRCFLDLYFTEMKKTVPRTEDNRFTFQ (1)

HDQLVLGIVDFFFPAISGATTQ

IALLLERLLWHPEVVQKMQAEIDDVVGHGRLPTLDDRINLPYTEATLREAMRI

DTLVPSGVAHMAMKDTTLRGYDIPKDTILVLGLDSIHMQKDIWGDPERFRPERFLNYRGE

LSLSKDVSVPFGAGKRLCAGETFARNTMFLIVSALVQNFNIRQRLGDKLPDMGKRSTGII

ISPADYWVKFEPR*

 

>CYP304Bxx/yy top part = my old Bxx, bottom = my old Byy

AAGE02028825.1 revised accurate seq 4/20/06

22307 MFVTPTIFLWLVTIGLIAYRCHRFLFDRPKNFPDGPPKFPLLGGYAVMLLINFYHLHRAANKLCEYYRTKILGIYLGNFP

TVIVNDFATVKEVLNRVEFDGRPDLFIARMREKNFLLRGIFFTQGPDWKEQRRFILRYLRDYGFGRRFDELEAETNAEIL

TLVEMLRYGPRHEHETEFMTKDGCAMVPNVFFACFANAFLYVLTGERINRDEAGALFE (2) 22960

35137 TGKYAMMFQRTGDDYGTIYSLL

PWMRHLFPNRTRYRTIREGSLGVNRFIESIIQKRLETHEEGHVRCFLDLYFTEMKKTVPRTEDNRFTFQH 35412

35469 DQLVLGIVDF

FFPAISGATTQIALLLERLLWHPEVVQKMQAEIDDVVGHGRLPTLDDRINLPYTEATLREAMRIDTLVPSGVAHMAMKDT

TLRGYDIPKDTILVLGLDSIHMQKDIWGDPERFRPERFLNYRGELSLSKDVSVPFGAGKRLCAGETFARNTMFLIVSALV

QNFNIRQRLGDKLPDMGKRSTGIIISPADYWVKFEPR* 36092

 

>CYP304C1 AAGE01104491 512990636 572473586 613989430 64% to CYP304C1

749978894 754492027 584954719 complete

MVLISELIIAALLGLLIYRFYRYLFERPSENFPPGPPRL

PLLGGYPFMLALNYKHLHKAAARLSQLYKSKLIGLYLGPLPAVIVNDYDTVKEVLTRPEF

DGRPDLFMARLRDQHFQRR (1)

GIFFTDSESWREQRRFFLRTLHHFGFGRRSPEAEADIQAGLEDVISLLRDGPKYEHEKAL

VDSAGFALCPTVFFAVFSNVLLRMIVGVRLAREDQAVMFE

VGKNAIAFHRNGDDYGMLLSYIPWIRHLFPKTTKYDLLRKVNQQANAVILSLAQKCES

SYDENDIRCLVDAYIQEMRATGSKGESTGKDEFGFQ (1)

YDQLVIGAADFLVPPFSAIPAKICLILERLIQYPEVQTKMYRELNEVVGLNRLPTLDDRA

DLPYCDAVIREGLRIDALVPSGIPHMAVTDTQLNGYQIPKGTVIVNSLEFIHHQPEIFRD

PDSFMPERFLTPDGKLALDQDKTLPFGAGKRVCGGEQFARNALFLGVTSLVQNFTFQ

LPAGRACPDLDGRITGVIQTTPDFRLKFVSRR*

 

>CYP305A6 AAGE01041187 494160882  476322188 754462117 mate pair = 754369970

which is an exact match to part of AAGE01202372

65% to 305A2 825745101 613940462

AAGE01202372.1 N-term exon for CYP305A complete

1435  MITLVLSSVVIVSFIFWLWQDLQRPPNFPP (1) 1346

GPKWLPFFGNTLLIRNLARISGGQHLAFEALSKQYKSPVIGLKLGREHVVVALQ

YPAVHEALTKEAFDGRPDNFFIRLRTMGTR (2)

LGITFTDGPFWTEHNSFVVR

HLRQAGYGRQPMQLQIQNELNELIGIIRDLDSEPVWPGSILPTSVINVLWTFTTGSRIPR

DDQRLTRLLKLLQDRSKAFDMSGG

ILSQLPWLRHIAPEWTGYNLINRFNQEIHEFFKATIEKHHQDYTEEKCSDDLIYAFIK

EMKERKDDPCSTFTDVQLSMIILDIFIAGSQTTSTTIDIALMILAMNTEIQRKIYAEIDD

NFHPDEIPDQNCRTNLQYTEAFLLEVMRLYQIAPIGGPRRALSDCTLGGYRIPRNTTILM

GLHTVQMDPDHWGDPENFRPERFIGPDGKIINTERLIPFGLGRRRCLGDSLARSCMFTFL

VGILQKFSLRLPDSLEGPSLKLTPGITLSPKPYKVVFEPRLK*

 

AAGE02003241.1

24317  MITLVLSSVVIVSFIFWLWQDLQRPPNFPP (1)  24228

13700  GPKWLPFFGNTLLIRNLARISGGQHLAFEALSKQYKSPVIGLKLGREHVVVALQYPA  13530

13529  VHEALTKEAFDGRPDNFFIRLRTMGTR (2) 13449

13391  LGITFTDGPFWTEH  13350

13349  NSFVVRHLRQAGYGRQPMQLQIQNELNELIGIIRDLDSEPVWPGSILPTSVINVLWTFTT  13170

13169  GSRIPRDDQRLTRLLKLLQDRSKAFDMSGGILSQLPWLRHIAPEWTGYNLINRFNQEIHE  12990

12989  FFKATIEKHHQDYTEEKCSDDLIYAFIKEMKERKDDPCSTFTDVQLSMIILDIFIAGSQT  12810

12809  TSTTIDIALMILAMNTEIQRKIYAEIDDNFHPDEIPDQNCRTNLQYTEAFLLEVMRLYQI  12630

12629  APIGGPRRALSDCTLGGYRIPRNTTILMGLHTVQMDPDHWGDPENFRPERFIGPDGKIIN  12450

12449  TERLIPFGLGRRRCLGDSLARSCMFTFLVGILQKFSLRLPDSLEGPSLKLTPGITLSPKP  12270

12269  YKVVFEPRLK*  12237

 

>73% to CYP305 above 519967093 521924636 570423900 pseudogene of AAGE01051792

contains a deletion and stop codon

FLPGPQWLPFIGNTPFVRKLARASGGQHLAFEALSKQYNSPVIGLKLGREYVVVALQYPAVREVHSKEEFDGRPDNF

LLKMRLERFVISRLGVTCTDGPFWAEHRNFVVRHLRQAGYGRQ

GIIRDMDGEPVWPGSILPTSVINVLWTFTTGSRIPRDDQRLARLLKLLQDRSKAFDMS

GGVLSQLPWLRHIAPEWTGYNLLKRFNQELHEFFMIIVERHHQEYHEEKCSDDLIYA

FIKEMKDRKDDPSSTFTDLQLTMIILDIFIAGSQTTSITIDLAFMMLTMHTDIQRDTCRN

R*DLHHDEMPSKRSYSLPYTE

 

AAGE02003240.1 this matches 305A5

Sbjct  53574  FLPGPQWLPFIGNTPFVRKLARASGGQHLAFEALSKQYNSPVIGLKLGREYVVVALQYPA  53395

 

Query  61     VREVHSKEEFDGRPDNF-----------------LLKMRLERFVISRLGVTCTDGPFWAE  103

              VREVHSKEEFDGRPDNF                 LLKMRLERFVISRLGVTCTDGPFWAE

Sbjct  53394  VREVHSKEEFDGRPDNFFLRLRTMGTR*DFKL*CLLKMRLERFVISRLGVTCTDGPFWAE  53215

 

Query  104    HRNFVVRHLRQAGYGRQ--------------GIIRDMDGEPVWPGSILPTSVINVLWTFT  149

              HRNFVVRHLRQAGYGRQ              GIIRDMDGEPVWPGSILPTSVINVLWTFT

Sbjct  53214  HRNFVVRHLRQAGYGRQPMQLQIQNELNELIGIIRDMDGEPVWPGSILPTSVINVLWTFT  53035

 

Query  150    TGSRIPRDDQRLARLLKLLQDRSKAFDMSGGVLSQLPWLRHIAPEWTGYNLLKRFNQELH  209

              TGSRIPRDDQRLARLLKLLQDRSKAFDMSGGVLSQLPWLRHIAPEWTGYNLLKRFNQELH

Sbjct  53034  TGSRIPRDDQRLARLLKLLQDRSKAFDMSGGVLSQLPWLRHIAPEWTGYNLLKRFNQELH  52855

 

Query  210    EFFMIIVERHHQEYHEEKCSDDLIYAFIKEMKDRKDDPSSTFTDLQLTMIILDIFIAGSQ  269

              EFFMIIVERHHQEYHEEKCSDDLIYAFIKEMKDRKDDPSSTFTDLQLTMIILDIFIAGSQ

Sbjct  52854  EFFMIIVERHHQEYHEEKCSDDLIYAFIKEMKDRKDDPSSTFTDLQLTMIILDIFIAGSQ  52675

 

Query  270    TTSITIDLAFMMLTMHTDIQRDT-CRNRXDLHHDEMPSKRS-YSLPYTE  316

              TTSITIDLAFMMLTMHTDIQ+        +LH DEMP +    SLPYTE

Sbjct  52674  TTSITIDLAFMMLTMHTDIQKKIHAEIDENLHQDEMPQQNDRTSLPYTE  52528

 

>CYP305A5 AAGE01051792 70% to CYP305A2 but no stop codon N-term exon is one of two choices.

82% to other CYP305 Aedes seq

520611721 836008963 529076567 570690021

AAGE01309663.1 CYP305A N-term exon matches by default since the other CYP305 has an exon 1 sequence complete

     MIVLVLTSVLIIAFSYWLLQELRRPPNYPP (1)

     GPQWLPFIGNTPFVRKLARASGGQHLAFEALSKQYNSPVIGLKLGREYVVVALQYPA 696

 697 VREVHSKEEFDGRPDNFFLRLRTMGTR (2?) 777

 838 LGVTCTDGPFWAEHRNFVVRHLRQAGYGRQPMQLQIQNELNELIGIIRDM 987

 988 DGEPVWPGSILPTSVINVLWTFTTGSRIPRDDQRLARLLKLLQDRSKAFDMSGGVLSQLP 1167

1168 WLRHIAPEWTGYNLLKRFNQELHEFFMIIVERHHQEYHEEKCSDDLIYAFIKEMKDRKDD 1347

1348 PSSTFTDLQLTMIILDIFIAGSQTTSITIDLAFMMLTMHTDIQKKIHAEIDENLHQDEMP 1527

1528 QQNDRTSLPYTEAFLLEVQRFFHIVPVSGPRRALSDCTLGGYQIPKNTTILMGLRTVHMD 1707

1708 PEHWGDPECFRPERFLSPDGKIITTERLIPFGLGRRRCLGESLARACMFTFLVGILQKFS 1887

1888 LRQPANCSEKPSPKLLPGITLSPKPYKVIFEPR* 1986

 

>CYP306A1 570772008 512981304 597667916 641824294 753304856 593374976 574131373

587966306 514783872 514783871 618134500 835036042 803206894 578828539

AAGE01228356 AAGE01635404 AAGE01635520 complete

MYLILGIVLILTYVLWTLLDRRGKPPGPFGLPILGYLPFIDSIKPYETLTNLAKRYG

PVYSLRMGQVDAVVLTAPDLIRDTLKREETTGRAPLFITHGIMGGH (1)

GIICAEGNLWRDQRRLSTEWLRKMGMTKFGPTRATLEARILIGVNELLE (0)

DLRRESEKVFAFDPAPLLHHILGNLMNDIVFGLQYERDDATWRYLQHLQEEGVKHIGVSMAVNFLPFLR (2)

HLPSSKRIIEFLLNGKAKTHKIYDSIIEKQRSRMEGGGSEVSDP

GRHDDCILSNFLQETRRRETGARPELAFCSDVQLRHLLADLFGAGVDTTFTTLRWLILFL

ALNKDAQERLRQEMASQLRGEPCLNDVDSLPYLKACVAEAQRLRTVVPLGIPHGAVS (0)

EITIAGYKVSKNTMIIPLLWSVHMDPSLWPNPDRFDPDRFLDESGQYSAPAHFMPFQT

GKRMCLGDELARMILLLYTGRLFWHFELDVFNGEGLDLTGVCGITLTPPPFEIIFKERV*

 

>CYP307A1 571521703 817504746 824335840 591439033 834970143

TC53059 TC28026 TC50479 78% to CYP307A1 complete

813467047 (exon 1) found by searching with the DNA seq above, 67% to anoph 307A1

246 MAYTLILVALMSLLSVVCYLKVLYEWHRKVRVQTVKSSRYAKKLQKLEESQPQEVEEAP 422

423 VEFPQAPGPYPWPVLGSAAIIGQYPAPFMGFSALAKKYGDVYSIRIGQGQCLVVSSLELI 602

603 REVLNQNGRYFGGRPDFLRYHQLFGGDRNN (1)

SLALCDWSSLQQKRRNLARKHCSPSDASSYYQKMSDVGV

AEMHYFMDQLTDVVTPGQDFKVKPLIMQACANMFSKYMCSVRFEYDDAGFQKMVHSFDEI

FYEINQGYAVDFMPWLAPFYFRHMSKLSSWSNYIRGFILERIVNEREQNLGEDEPERDFT

DALLKSLREDPSVSRDTIMYMLEDFIGGHSAIGNLVMLALGYVAKNPEIGARIQQEIDHV

TDKGLRNVTLYDTESMPYTVATIFEVLRYSSSPIVPHVATENTCIG

GYGVQTGTVVFINNYDLNTSEKYWDHPERFDPSR (2?)

SNESQKQILRVKKNIPHFLPFSIGKRTCIGQNLVRGFSFIMLANILQKYDVHT

NDPAQIKMKPACVAVPPDTYPLAFTQRSQ*

 

>CYP307B1 AAGE01081732 476411966 68% to 307B1 519649910 578920479 complete

revised according to AAGE02011086.1 and AAGE02028078.1 4/20/06

1027 MEKFTIFLFSSNTIYLLVACFLVTLIMLLLEVRQKISVKSDLVKLVKSFLFGQWLSVFTQNNKNRNL 848

847  NDTEVKVLRRAPGPKSYPIIGNLKDLDGYEVPYQAFSVLAKKYGPVVNLKLGVVDAVVIN 668

667  GIEHIKEVLINKAQYFDSRPNFRRYQLLFSGNKEN 533 (1)

     SLAFCDWSEVQKARRDMLVPHTFPRNFSGRFNELNGVINDEIRLVIGESNVNRVIEIK

14  PIIMNICANVFSQYFASHRFELEDPKFQKLVKNFDQIFYEVNQGYAADFLPFLLPLHHR 193

194 NLKRMDQLAEEIREIMLETIINDRYDNWVEGNTENDYVDSLINHVKSKIGPDMEWETALF 373

374 ALEDIIGGHSAVANFLVKTFGYIIQHPEVQQNIQSEVDRVLETEGKHTVDLSDRNHMPYT 553

554 EAVIMEALRLIASPIVPHVANQDSQIG 637 (1?)

685 GYDVPKDTLIFLNNYDLSMSENLWENPNDFVPERFLQNGRLVKPDFFIPFGAGRRS 864

865 CMGYKMTQLISFSIIANLLRSYTITPLSGHSYFVPVGSLAMPEKSYEFQINLRH* 1029

 

CYP3 clan CYP6 related sequences

Note: CYP6 and CYP9 sequences (in Anopheles) have only one intron and will be the easiest to assemble. 6AG, 6AH and 6AJ

are exceptions. 

 

CYP3 clan sequences

CYP6 related, 14 complete, 20 partials

 

>AAGE01198540 494152727 63% to CYP6Z2 67% to AY433537 519918984 574095157 569650597 complete

MFIYTFALFWLALVLVLRYIYSYWDRNGLASIKPQIPYGNLKSVAQK

TQSFGVATCELYWKSQERLAGIYLFFRPAVLIRDAHLAQRIMTTDFSYFHDRGVYCNEEI

DPFSANLFAL

PGKRWRNLRHRFTPLFTSGQLRCMMPTILDVGHKLQKFLEPAAERQEVVDIREIVSRGVL

ELIASLFFGFEADCINDPDDAFSKTLREFQLGGFMNNFRTACTFVCPELLQVTRISSLSP

QMIKFATDVVTKQIEHREKNNVSRKDFIQLLIDLRREEANNNEVALSFEQCAANVFLFYV

AGSDTSTSAITFTLHELTQNPEVMDKLQSEIDEMLVQTNGELTYTAIKELPYLDLCVKET

LRKYPGLAILNRKCTKSYAVPESSVVIQEGTQIMIPLLAYGMDEKYFPEPERYYPERFNKQSKNYDEKA

YYPFGEGPRNCI (1)

AYRMGVMVSKIGLILLLSKFKFEATQGPKIVFSAATVPLVPKGGIPVKISNR*

 

>AAGE01065173 78% to AAGE01198540

N-term is on AAGE02015843.1 (revised 4/20/06)

46903 MFIYTFALFWLAVAFAIRYIYSYWDRNGLPSIKPH

3333 IPYGNLKAVANRTESFGVATCDLYWKSKDRLVGIYLFFRPAVLIRDAHLAQQIMTTDFSH 3154

3153 FHDRGVFCNEEVDPFSANLFALAGKRWRNLRNKFTPLFTAGQLRCMMPIILSVGHKLQNV 2974

2973 LEPAAKKQEVLEIRELVSRCVLDIIASVFFGFEANCINDPNDAFIQNLRELQYDGFFNNL 2794

2793 RAAASFICPELLKLTRISSLSPEMIRFVTDIVTKQIEHREKNKVTRKDFIQLLIDLRRED 2614

2613 TNNNEAALGFEECAANVFLFYVAGSDTSTSAVAFTLHELTQNAETMGKLQTEIDEMLVKT 2434

2433 SGELTYDGIKEMSYLDLCVKETLRKYPGLAILNRECTKSYAVPNSDILLKKGTQVVIPLL 2254

2253 AYGMDEKYFPEPDRYLPERFDKSTKNYDEKAFYPFGEGPRNCI (1) 2116

2065 AFRMGVMVSKICLVLLLSRFNFEATRGPKIDFTPSTVALLPKGGIPVKISIR* 1907

 

>AAGE01047841 AY433537 62% to 6Z2 569650597 622013821 579345058 complete

MLFIYSVALLCIAVTLALKYVYSYWDRHGLPSVKPHIPFGNLKTVVKKTESFGIAIN

QLYWQTKGQLAGIYLFFRPAILVRD

AHLAQQIMTTDFNHFHDRGIYCNEEGDPFSANLFALPGKRWRNLRNKLTPLFTGGQLRGM

MPTILEVGEKLQKHLEPVAERQEVVEIRDIVSRFVLEIIATVFFGFEANCIEDRDDSFSK

VLREAQGERLSAVLRAAAMFVCPGLLRYTGISSLEPQVIAFVSEIVTKQIEHREKNSVTR

KDFIQQLIEIRRGSGENQVPAMSIEQCAANVFLFYAAGSETSTGTIAFSMHELSHHADVM

KKLQDEIDDALAKSNGAITYESVMQMQYLDLCVKETLRKYPGLPFLNRECTMDYKVPDSD

LVIRKGTQLVLPIYGFSMDEQYFPEPECYIPERFEEASKNYDEKAYYPFG

DGPRNCI (1)

AYRMGVLITKIGLILLLSKFTFEATQGPKMMFSSASVPLLPKDGISLKISN

RKR*

 

>AAGE01005406 80% to AY433537 62% to 6Z2 complete

possible pseudogene with frameshift at AVGDKLX X = ct

confirmed in four trace archive sequences

520668645, 757097876, 589569591, 811977620

5398 MLFVYTLTILSIAITLVLKFVYSYWDRYGVQNIKPHIPFGNLKTVVKKTESFGVAINQLY 5219

5218 WQTKGQLVGIYLFFRPAILIRDAHLAQQIMTTDFNHFHDRGVYCNEEGDPFSASLFSLPG 5039

5038 KRWRNLRNKLTPLFTGGQLRGMMPTILAVGDKLX 4940

4937 KHLEPVAENREPIEIRDIVSRFVLEIIATVFFGFEANCIKDRNDAFCRVLREAQRESMYT 4758

4757 NFRAAAVFVCPGLLKYTGISSLEPEVKEFVSGIVTEQIEHREKNGATRKDFIQQLIELRR 4578

4577 EDSQNQNVRMSIEQCAANVFLFYIAGSETSTGTITFTMHELSQHPEVMKKLQAEIDDTLA 4398

4397 KSNGEITYENVNQIQYLDLCVKETLRKYPGLPILNRECTSDYKVPDLDLVIRKGTQVVIP 4218

4217 LYGISMDEQYFPEPECYKPERFDGASKNYDEKAYYPFGEGPRNCI (1) 4083

4017 AFRMGVLVSKIGLVLLSSKFNFKPTQGPKIVFSPAAVPLVPKGGISLMISRRDK 3856

     VADLYMGLHISVVLKVVCS*

 

>AAGE01054542 476413066 56% to 20199522 76% to 6Y1 579367130

614744104 834925676 complete

MWLVYLVWLVAAVLLAVYLWIKKRFNFWKDRGVEYIEPEFPFGNFKTLGKVEHIAPITQR

HYDYFKQKGVPYGGVFMLTSPLLYILDTKLIKTLLVKDFNHFPNRGVYFNEKDDPLSAHMFAI

EGNKWKTLRNKLSPTFTSGRIKMTFPLVVGVCQQFCDHLGEVVQQSNEVEMHDLLSRYTI

DVIGTCAFGIDCNSFREPDNEFRKYGKIAFDKLPHSPLVVYLMKAFRSYANAFGMKQLHE

DVSSFFSKVVKDTIEYRESNNVVRNDFMDLLLKLKNTGRLEESGEEIGKISFEEIAAQAF

IFFTAGYDTSSTAMTYTLYELALNQKAQEKARKCVLDIFAANNGTLTYESVGNMGYLDQC

IN (1)

936 ETLRKHPPVAILERNADRDYKLPDSDIVIKKGRKIMIPTFAMHHDAEHFPDPE 760

759 RYDPDRFSPEQVACRDPYCYLPFGEGPRICIGMRFGTIQARVGLASLLKRFRFRVCDKTQ 580

579 IPVRYSKTNFILGPANGVWLRVEKL* 505

 

>AAGE01206812 586027460 593564617 637757183 494307621 complete 38% to 6M1

TC54189 TC23406 TC42024 38% CYP6P3 TC54190 TC23407 TC42025 TC574 TC6535

83% to TC63333 94% to TC54191 581543219

MFLVILLITLSLYLYQKWIYTYWKRRGVPQLNPSFPFGDVADTFKQRKSYANRLAELHHQ

SASDSHRFVGIYTLFQPILLVTDVELVRRMLTVDFEHFTDRGAHVNEKRDPLSGHLFSLAG

AKWRRMRLKLTPAFTTAKLKAMFPTMMACGRTLSAVIDDHVGRALAIRDLMTRFTMDVIASVGFGLE

CNSMRNPDELFRQMGGRFFSKSWKTSVRMLLAFVAPKVNRYLQVK

LNDDDVEEYMLNLVR

DTIAKREGGGEVRKDFIQLL ()

VQLRNQVEVKDGGSWEMNKVDQNKTLTVEEMAAQSFVFLN

AGYETTSSTVTFCLFELCRNKDLIRKVQEEIDRVMDGGREISYEALAEMTYLESCIDETL

RKYPISPVLFRVCTKPYKIPETDVVIEKDTLVQISLVGLQRDTRYYEDPVKFDPDRYGER

KSETMPHYSFGDGPRVCI (1)

GLRMGKVMAKMALVELLFRYDFELESPAADSGEIELDPSLLMLQAKHDVKLIPRFRAK*

 

>617983543 some differences with TC54191 TC42026 44% to CYP6Z3

94% to 586027460

584131270 520119914 760257438 832454533 625082069

39% to 6N1 Anopheles, complete

MLLPILLVVLVVYLFQKWTYSHWKRRGVPQLNPAFP

FGNVADTFKQRTSYSNRLAELHHQAVRDGHRFVGIYTLL

QPILLVTDVELVKRMLTVDFEHFVDRGAHVNEKRDPLSGHLFSLTGAKWRRMRLKLTPAF

TTAKLKAMFPTMMACGRTLSAVIDDHVGRALAIRDLMTRFTMDVI

ASVGFGLECNSMRNPDELFRQMGGRFFSKSWKTSVRMLLAFVAPKVNRYLQVKLND

DDVEEYMLNLVRDTIAKREGGGEVRKDFIQLLVQLRNQVEVKDGGSWEMNKVDQNKTLTV

EEMAAQSFVFLNAGYETTSSTVTFCLFELCRNKDLIGKVQEEIDRVMDGGREISYEALAE

MTYLESCIDETLRKYPISPVLFRVCTKPYKIPETDVVIEKDTLVQISLVGLQRDTRYYED

PMKFDPDRYGERKSETMPHYSFGDGPRVCI

GLRMGKVMAKMALVELLSRYDFELESPAADSGEIELDPSLLMLQAKHDVKLIPRLRTK*

 

>NABNU08TR  NABNU08 32% to CYP6Z4 59% to 586027460

(no genomic match) looks like a pseudogene

best genomic match was to 617983543 at 77%

I do not think the TIGR database actually has an Aedes seq that is missing from

the 15 million trace files of Aedes, so this may be a contaminant from another

species.

 

KTLTPFERAAQSSGSQKAFYETTSATGDGSRIERSRNKDLI

GKVQEEIDRVMDGGKGIS*

EALAETTYPESCTEETLRKHPSPPDQDRGGTKPNKTPETDDASEKDTPVQTPPGGTQRDK

REKEDPEKHEPERYGERKPETTPHHSRGDGPRDSTGHRKGKATAKKALAEQPTRNDYEQE

PPAADTGENEQEPSQPTPQAKHEVKQKPRQRAK

 

>AAGE01003592 512632636 TC63333 TC10785 TC15419 TC26904 TC37692 TC4101

41% to CYP6P4 83% to 586027460 836033925 753054225 574000494

(N-term looks identical to 586027460) complete

revised 4/21/06 used AAGE02004393.1, AAGE02030939.1

MFLVILLITLSLYLYQKWIYTYWKRRGVPQLNP

SFPFGDVADTFKQRKSYANRLAELHHQSASDSHRFVG

IYTLFQPILL

VTDVELVRRMLTVDFEHFTDRGAHVNEKRDPLSGHLFSLAGAKWRWMLQKLAPAFTSAKV

KSMFPTMMTCGRTLSAVVGDHLGRALPIRALMTRFTMDVIASVGFGLDCN

SMRNPDEPFHKMGSKFFSKSWKTSVRMLLAFVAPKVNRFLQL

KLNDDDVEEYMLNLVRDTIAKREHGGEVRNDFIQLLVQLRNQVEVEDGGSWEINKVEPNK

ALTVQEIAAQSFVFLNAGYETTSSTITFCLFELCRNRDLLGKLQEEIDEVVDGGREASYE

AITEMTYLEACVEETLRKYPISPVLFRVCTKPYRIPDTDFVIEKGTLVQISLVGLNRDPR

YYEAPLKFDPDRYGERKAETMVHYSFGDGPRGCIGLRMGKVMVKMALVELLSNYDFEMES

PTGENELDPSLLMLQPKHDVILIPKFM*

 

>CYP6AG3 AF288534 AAGE01003202 TC54102 TC12857 TC2905 TC29599 TC46955

TC9197 62% to CYP6AG2 48% to 6AG1 complete

Revised 4/21/06 used AAGE02011378.1 211773-218215 (+) strand

211773 MWWTVVGVLGGILSAIYLFLSWNFNCWKKDGIKGPKPRLLFGNL

       PNVLTQKKHIFYEYEKIYN (2) 211961

216796 DFKTEPVVGYFSVRTPQLMIREPELIKEVLIKGFRYFSA

       NEFSDVVDEKSDPLFARNPFSLSGEKWKTRRGEITPAFTNNR (0)

       IKALSTLMDEVCDRMTDHVKKQKESALETKE (0) 217188

217247 LMSKYTTDVVSNCVFAIDAQSF SKDKPEIREMGRRIMDFNFAA

       QIILMVTTFLPSVKKFYKFTFVPREVEQFFIRIMKDAIRHRKENNIVRNDYLDHLLSL

       QEKKQISEIDMAGHGVSFFADGFETSSLVMTYCLFDLASHPEIQTRLREEIRNVQATK

       GGINYDNIGEMTYLDQVLNETLRIHPIIPVLAKRCTESTVLVGPKDQKIPVSAGTTVV

       IPYFVQLDSQYYQEPNKYNPERFSPENGGTKPYRERGVYFPFGEGPRMCLGMRFAIAQ

       VKRGIIEIIDKFEISVNSKTQVPLKYEPKMFMLYPVGGIWLNYKPIK* 218215

 

>CYP6AG4 possible assembled whole sequence 93% to 6AG3

NABUJ77TF = 6AG4  NABUJ77 = 6AG3 57% to CYP6AG2 90% to 6AG3

AY431873 96% to 6AG3 only 2 aa diffs to new 1.231_5

60% to 6AG2 only 45% to 6AG1 complete

DR747526.1 EST 95% to 6AG3 98% (only 3 aa diffs) to AY431873

This seq was not found in the WGS section and may be a hybrid

Replace with

AAGE02011379

23932 MWWTVVGVLGGILSAIYLFLSWNFDCWKKDGIKGPKPRLLFGNLPNVLKQKKHIFYEYEKIYN (2) 24120

30535 DFKTEPVVGYFSVRTPQLMIREPELIKEVLIKGFRYFSANEFSDAVDEKSDPLFARN 30705

30706 PFSLSGEKWKTRRGEITPAFTNNR (0)

      IKALSTLMDEVCDRMTDHVKKQKEPAVDTKE (0) 30927

30987 LMSKYTTDVVSNCVFAIDAQSFSKDKPEIREMGRRIMDFNFRAQIILMITTFLPSVKKF 31163

31164 YKFTFLPREVEQFFIRIMKDAIRHRKENNIVRNDYLDHLLSLQEKKQISEIDMAGHGVSF 31343

31344 FADGFETSSTVMTNCLFDLASHPEIQTRLREEIRNVQATKGGINYDNIGEMTYLDQVLNE 31523

31524 TLRIHPIIPVLRKRCTESTVLVGPKDQKIPVSAGTTVVIPYFVQLDSQYYQEPNKYNPER 31703

31704 FSPENGGTKPYRERGVYFPFGEGPRMCLGMRFAIAQVKRGIIEIIDKFEISVNSKTQVPL 31883

31884 KYEPKMFMLYPVGGIWLDYKSIK* 31955

 

>AAGE01024260 51% CYP6AG1 complete

replace with AAGE02035807.1 1 aa diff to earlier version

3271  MLVTVGLLLTAFAALYLYLTWHFDYWRKRNVPGPEPLPLVGNFPAFFRRNRPVMEEKYQIYK (2)  3456

3518  DYCSKYNFVGIFTNRSPQIFITSPALARDILVKYFKNFHDNEIGLITNKELDPLF  3682

3683  GRNPFVLNGAAWKAKRAEITPAFTASR (0)  3763

3825  IKALYVSVENVCAQMTKYVKEHCESPIEMKELGDKFTTDVVSSCIFGADAQSFIHQDAE  4001

4002  IRDMGSKLMDSSLSFALKMAVMTVLPSVAKIANMSLVSKPREKFFIKLMAEAIRHREESS  4181

4182  EKYLDFLDYLSMLKKEKNITELDMAAHGVTFFLDGNETSSATLSLNLYELAKQPEIQKRL  4361

4362  REELMNATNDDGTISYETLSELPFLEQVFSEGLRLWPPVTFMSKVCTDPIELDLTSTRKV  4541

4542  PIERGTCAIISNWSLHRDPNFYEDPLKFNPDRFAPEKGGIFPYKEKGCYMPFGDGPRQCL  4721

4722  GMRFGRMQVKRGIYEVIRNFEISVASRTSDPLKIVSSPAISLGLSGIWLSFKPIRS* 4892

 

>AAGE01002325 58% to 6AG1 complete

5532 MFLTITLIVTAVAAIYLYLTWNYNYWKKLNVPGPSPLPGLGSFPSFITQRRPVADEM 5362

5361 DEIYR (2)

5286 EYKPKYNFVGVFSNRSPRIMITSSELAKDILSKNFKNFHDNEFGEMTNKEIDPLF 5107

5106 GRNPFMLTGDEWKAKRAEITPAFTTSR 5026 (0)

4960 MKALFPLVEDVCSRMTKYVTQNRGSVLDSKELSAKFTTDVVSSCIFACDAQSFTSGKPEI 4781

4780 REQGRKLMEQSFSSFLILLFIINFPTLAKIFKIGLVPKSLEKFFTDLMKEAISHRDASGT 4601

4600 NRVDYLDYLISLRNKKEISELDMAAHGVTFFIDGFETSSVAISFMLYEIAKNPEVQKRLR 4421

4420 KELQKVTTDQGTVSYDSLLELSYLDQVVNESLRLWPPAAFISKKCTEPMDLPLTANQNVT 4241

4240 IGKEICAIINIWSLHRDPEYYDDPLTFNPDRFSPETGGTAPYREKGCFIPFGDGPRQCLG 4061

4060 MRFARMQVKRCLYELVSNFKITVNEKTKQPMKLDPKQFLTMPLGGIWLDFEPISK* 3893

 

>AAGE01005157 51% to AAGE01002325 48% to 6AG1 complete

3390 MWLIVISILVTIVSLVYHYLTWNFNYWKYRGVPGPLPKPFLGTFSSTFTQKEHPIEENNRIYR 3202

3149 LFREYRKDVPFIGGFSFRSPQLFALSPTLVKDILVKYHKHFRANEVGGTFDSKADPLLAR 2970

2969 NPFFLDGEEWRSKRAQITPAFTNSR (0) 2895

2815 LKALLPIMDNICNNMVSYIDRHIPNGPIESKELSAKYTTDVVSSCIFGAEGGSLTS 2648

2647 DRSEIREMGNALFQQTFMFIVLAVISSIAPILKRFVKLSLIPKSIENYFVGLMTEAVRK 2471

2470 RKASGTKQVDYLDHLINLQEQKEISILDMAAHGVTFFIDGFETTSEVLGFSLLELSIDKE 2291

2290 IQNRLRQEIHSAEDGQLTFETIMELPYLDQIVN (1) 2192

     ETLRKWPPAYALSKRCTEEITFRLKDNHEVLIEKGITAILPIWAIHLDK 1990

1989 EFYPDPNRFNPDRFSEEDGGHSVRYYQEKGVFLPFGDGPRACIGRRIGLLQVKRALVEIV 1810

1809 KNYDFTVNSKTVLPIKIDPKNIAVTPLGGIWIDYRKL* 1696

 

>AAGE01024111 86% to AAGE01002325 complete

3449 MFITITLIVSAVTAIYLYLTWHFNYWKKLNVPGPSPLPGLGNFPSFITQKRPVAEEMDEIYR 3264 (2)

3191 EYKPKYNFAGVFSNRSPRIMITSAELAKDILVKNFKNFHDNEFGELTNKEIDPLL 3027

3026 GRNPFLLDGSEWKAKRAEVTPAFTTSR (0) 2949

2884 MKALFPLVEDVCSRMTKYLIKNRGSVIDAKELSAKFTTDVVSSCIFACDAQSFTSEKPEI 2705

2704 REQGRKLIEQTFSSFMLLLFIVNFPTLAKIFHVGFIPKSMEKFFTNLMKDAVRYRDASET 2525

2524 NRADYLDYLITLKKKKELSELDMAAHGVTFFIDGFETSSVAISFMLYEIAKNPTVQKRLR 2345

2344 QELKKVTTDNGTVSYDSLLELSYLDQVVNESLRLWPPAAFMSKKCTEPMELPLTANRSVT 2165

2164 IGKEVCAIINIWSLHRDPEYFDDPLTFNPDRFSPETGGTSPYREKGCFVPFGEGPRQCLG 1985

1984 MRFARMQVKRCLYEAVTNFAITVNPKTMEPMRLDPKQVLTMPLGGIWLNFEPISK* 1817

 

>CYP6AL1v2 AY771597 AAGE01003622 complete 40% to 6N1, 98% to 6AL1

476388850 494541913

TC67380 TC35464 TC46198 98% to AY771597, 2aa diffs to 6AL1

MLFLAFAIFVLFAIIQIVYHFRYWMRRGVPQLRPSFPFGDFGEF

FRQKHGIPMTYANIYARTRHLPYVGIYLSMRPVLFVNDPQMVKDILSRDFEHFHDRGL

HVNEETDPLSGNLFSLGGVKWKNMRAKLTPTFTSGCLKGMLAILIDKATVLQKQFAKE

IATHNTIEVKDLFARYTTDVIASVAYGIDNDSINNDHDLFRQMGIKVFQQDFKTSLRL

ALTFFIPKIKALLGFSLVAKDVEDFMINLVSKTIEHRERNGIQRKDMMQLMLQLRNSG

SVSINDQQWNLDSSATVKNLTINQVAAQVFVFFVAGYETSSTLMSFCVWELARNPEIQ

VKVHQEIDSVLSNYGGALTYEALADMEYLECCMEETLRKHPPVSFLNRECTKTYRIPE

TDVIIDKGTAVVVSLLGMHRDPQHFTQPTEFKPERFSSDEQSNESNKAYFPFGGGPRL

CIGMRLGMLQAKVALVTLLAKFEFSLGKEHVKDMELPLKANTLLLVPQDGIQLVVKKR

 

>AAGE01116725 827542817 63% to 6AL1v2 complete

    MLLAFLALSAPLVTVLIWLQFRYWTRCGVPQLDPSFPFGNFSEFFCQKNGIPS

    TYANLYHRTKHLPFVGIYLSLRPALLINDPELVKNILTRDFEHFHDRGIHVDEETDPMSG

    HLFALGGVKWKNLRAKLTPTFSSGSLKEMFPLLVEKATVLQKRFLKEIATSEVVEVKELAACY

  1 TSDVIASVAYGIDMDSINNRDDLFRRMGEK VLAHDLITSLRLALAFWFPKLKVMLGSKSI 180

181 APVIQEFMTELVRKTIEHREKEGVHRKDMMQLLLQLRNGVSLKRNGVQWTEDSAPKNAIK 360

361 SLSIDEVTAQVMVFFVAGYETSSSTVSFCLFELARHQDIQAKVHQEIDTVLAEHEGNLTY 540

541 ASLASMKYLEQCLEETVRKYPPVAILNRECTKTYRIPETDVIVEKGTPIVVPLMGMHRDP 720

721 QYFPQPNDFQPDRFEGGAQSKAYFGFGAGPRLCIGMRLGILQSKVAVVTLLRKFKF 888

889 SLANPEDQHTELRMKPRSFILTTEGGIQLVVQQRHVCET* 1008

 

>CYP6AL2 AAGE01012031 494577102 622074403 641800960 755156422 632907779 580010239

42% to 6Z2 complete 51% to 6AL1

MTLLSIGVALLCVAA

FAFLNYVFGYWKRRGIRQLTPHFPFGNFTDLFFGKASFPKVCENLYERSKQWRLLGGY

VLLRPILLVNDPQLAKDIMVKDFQHFHDRGPHVDEENDPLSGHLFSLAGEKWKHLRAKLT

PTFTSGRLKGMFQTLVDTGEVLQEYIQKYAEGEDVVEIREILARYNTDNIASVAFGIKID

SINNPNEPFRHIGRK (0)

VFEPNFRNNMRGLITFMVPKLNKYLKIKSVDDDVEKFILKVVQETLEYREKNGIVRRDMM

QLLLQLRNTGTVSVDERWDVETSDKFKKLTLKEVAAQAHVFFLAGFETSSTTMSFCLYEL

AKHPEIQRRVQAEIDSVTALHDGKLTYDSINDMRYLECCIDETLRKYPPVPVLNRECTQD

YKVPGMDFTIEKGTAIVLQIAGMQHDPQYYPDPMQFKPERFQDPEVKSKPYAPFGDGPRV

CIGMRMGKIQTKVGLCLLLSKFDFELFGHDEPELVMDPNNFVLTPVDGINLKVSCRE*

 

>AAGE01225620 AAGE01073711 84% to 6AL2 588918478 complete

818 MTPLSIGVALLCVAAFAFLNYVFSYWNRRGVQQLTPYFPFGNFSDLFLGKASFPRVCETL 639

638 YERTKKWRLLGVYILLRPVLLVNDPQLAKDIMVKDFQHFHDRGTHVDEENDPLSGHLFSL 459

458 AGEKWKHLRAKLTPTFTSGRLKGMFQTLVDTGEVLQDYIHTCAKNEEVVEIREILARYNT 279

278 DNIASVAFGIKIDSINNPNEPFRQIGRK (0) 192

120 FFESNFRNNMRLMITFMVPKLNKYFKIKSVDAEVEQFILGMAKETLEYREKNGVVRKDMM

QLLIQLRNTGTVSVDERWDVETSTNSKKLTIGEVAAQAHVFFLAGFETSSSTMSFCLYEL

AKNPEVQRKVQSEIDSVTALHDGKLTYDSINEMRYLECCIDETLRKYPPVPVLNRECTKD

YKVPDSDITIEKGTAVILQISAMHHDPQYYPDPLRFVPERFLDPDMKGKPYAPFGDGPRI

CIGLRMGKIQTKVGLCLLLSKFNFELYGHKESELVMSPNNFLNTPVNGINLKVSCRE*

 

>AAGE01005840 44% to 6AL2 810047144 637194488 complete

MLLCLLILGSIATFYLFLHHHYSYWKRRGISQLKPSFAFGDFGPVIRGRANFVHHLQGIYERTK

RDYSLLGLYVLFRPALLVNDFVVARDILSRDFQHFGDRGIYVDEKRDPFSGHLFALDGER

WRHVRHKVAPAFTPLKLKDVFQTQLIGGVVLQDHLKHFAESGQSVDVADLFLRYSVDMIA

SVAFGVEIDSVNCPEEQFYRVAHSSVESNVKNLLRWTGGFLIPKVLKYTGTR

LVDQHVQDFFMHVVQQTVEYREKTGFTRRDVLQSLLKIMNAESQNVSI 185

186 DFTITDLTVTAFTFLLAGMETSSSTATFCLYEIVNNQEIQRRLQKEIDESLQEHDGL 356

357 ITYDSVVAMKYLDHCVNEAMRKFPALAYLHRICTEDYLVPSTRTIIKKGTLVLIPIYALQ 536

537 RDQEFFPHPDLFLPDRFNDPEAIRQAPFFPFGEGPRSCIGQRMGKMNVKIALVHLLSRYN 716

717 FTLANPVDQGREAPIDPLHFTISPQGSFNMNVTHRKCSSPSSHSKSLTNHSVSLAH*

 

>AAGE01173027 TC56435 TC16115 TC27418 TC40206 TC8341 56% to CYP6N1v2

494318851, join with 223483845 complete

replace with AAGE02015839.1

note: ESTs DV300013.1 DV262803.1 have CVG not CIG at heme region

33781  MIALLLIGAVTLVFLFVKQRFNYWKVRGVPYVRPTFPLGNLWGIGTKKHLSEGLEDLYVQ  33602

33601  LKGKAQLGGIYFFINPVVLVTDLDLIKTILIKDFNFFHDRSIYYNEKDDPLTAHLFTMEG  33422

33421  IKWKNMRVKLTPTFTSGKMKLMFPIVRDCANELEKCISKEIVDGKEIEVKDILARYTTDV  33242

33241  IGNCAFGLECNSLHNPNAEFREMGRKVFQLQGLGFLKLLLTQQFSTLSRALGATVLQPDV  33062

33061  AKFFLKTVSDNVDYREKNKIERNDFIDLMIKLKNGQTLEHDKSDQRVEKLSIEQVAAQSF  32882

32881  VFFFAGFETSSTLMSFCLYELAQNQDLQDKARKDILDTLNKHGSLSYEAVHEMKYLENCVS (1)  32699

       ETLRKHPPASNIFRTATQDYTVPGTSLTIEKGTSVMIP  32522

32521  TLAIHRDPEYYPDPMKFDPDRFTADQVAARHPFAFLPFGEGPRVCIGMRFGLMQARVGLA  32342

32341  TLLKNFRFTVGERLETPAQLDPSSAILLIKGGLWLKVDKI*  32219
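
The CVG/CIG note above concerns the residue that follows the heme-binding cysteine. A minimal Python sketch of one way to check that position is given here. The pattern F..G.R.C.G is a simplified stand-in for the P450 heme-binding signature and heme_motif is a hypothetical helper name; both are assumptions of this example, not the method used to compile this file.

```python
import re

# Simplified heme-binding signature F-x-x-G-x-R-x-C-x-G (assumption: a loose
# stand-in for the full P450 cysteine heme-ligand pattern).
HEME_PATTERN = re.compile(r"F..G.R.(C.G)")


def heme_motif(protein):
    """Return (matched motif, tripeptide starting at the Cys) or None."""
    match = HEME_PATTERN.search(protein)
    if match is None:
        return None
    return match.group(0), match.group(1)


# Exon line copied from the genomic translation above.
genomic = "TLAIHRDPEYYPDPMKFDPDRFTADQVAARHPFAFLPFGEGPRVCIGMRFGLMQARVGLA"
print(heme_motif(genomic))
```

Running the same function on a translated EST would show whether it carries CVG or CIG at this position.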

 

>AY433475 AAGE01031181 complete 52% TO 6M4 519940462 2246449 DR747763.1

821767964 627434636 90% to 6M5

MEPITIILVTILVLLLTYGFHLIRRQLRFFXDHNVPH

IAGNFVLIDKTQHPANHFLRWYKQSKGQYPLTGVFMFIKPIAIPLDLELIKRILVKDFQY

FQNRGMYYNERDDPLSAHLFSLEGAKWRSLRAKISPTFTSGKMKMMYPTMMAAGKQFSEH

LEEKMSEENELEMRDLLARFTTDMIGTCAFGIECNSMKEPNSKFREMGRKHFES

PRSGLKDLLKITAPGLAR

FFGVTEILPDVAEFFMDVVKSTVEYRMKNNVRRNDFMDLLIAMLDDETEGSESLTISEIA

AQAYVFFIAGFETSSTTMTWALHELSRNPEIQEEGRKCVQEVLEKYNGVMSYEAIMEMTY

IDYIIN (1)

ETLRLYPPVPLHFRVVTKDYPVPGTDTVLPAGTFTMIPVYAIHHDEDIF

PEPEKFDPTRFTPEEVSKRHAYAWTPFGEGPRICIGLRFGMMQARIGLALLLNNLRFSPG

PKSCTKMEFQPENLILTPKQGLWLKVEKV*

 

>AAGE01032555 AAGE01493222 (4 aa diffs) 476379758 88% to AY433475

570738080 578795972 58% to 6M2 complete

MEVITITLLTILVLLIAYASHLLRRQIRFFKDRNVPHIPASF

ELLDKTIHPAKHFLRWYKQFKGQYPLTGVIMFIKPIAIPLDLDLIKRILVKDFQYFQNRG

IYYNERDDPLSAHLFSLEGAKWRSLRAKISPTFTSGKMKMMYPTMVAAGKQF 558

557 SEYLEEKVEDGNELEMRDLLARFTTDMIGTCAFGIECNSMKEPNSKFREMGRKHFEA 387

386 PRNALKDAFKMTAPGLARFLRVTEILPDVSEFFMDVVKSTVEYRMKNNVRRNDFMDLLI 210

209 AMLDDKTEGSESLTINEIAAQAYVFFIAGFETSSTTMTWALHELSRNPDI

QEEGRKCVQEVLEKYNGVMSYEAIMEMTYIDQIIN (1)

ETLRLYPPVPMHFRVVSKDY

HVPETDTILPAGTFTMIPVYAIHHDEDIFPEPEKFDPTRFTPEEVNKRHAFAWTPFGEGP

RVCIGLRFGMMQARIGLALMLKNLRFSPGPKTCTEMEFQPQNFILSPKEGLWLNVEKI*

 

>CYP6M5 AAGE01133741 494330821 73% to AY433475 578801721  826022155 639077358 760273799 581855956 complete

MEVITITLLTILILLLIYVLHLLRRQIHFFKDRNVPYKPASFERLDKTIHPAMHFLRWYKQFKG

QYPLSGVFMFIKPIVIPLDLELIKRILVKDFQYFQNRGIYYNERDDPLS

AHLFSLEGAKWRNLRAKISPTFTSGKMKMMYPTMVAAGKQFSEYLEEKVGDGNELEMRD

LLARFTTDMIGTCAFGIECNSMKEPNSKFREMGRKHFEAPRNVLKDAFKMTAPGLAR

FLRVTEILPDVSEFFMDVVKSTVEYRMKNNVRRN

DFMDLLIAMLDDKTEGSESLTISEIAAQAYVFFIAGFETSSTTMTWALHELSRNPDIQ

EEGRKCVQEVLEKYNGVMSYEAIMEMTYIDQIIN

ETLRLYPPVPMHFRVVSKDYHVPETDTILPAGTFTMI

PVYAIHHDEDIFPEPEKFDPTRFTPEEVNKRHAFAWTPFGEGPRVCIGLRFGMMQARIGL

ALMLKNLRFSPGPKTCTEMEFQPQNFILSPKEGLWLNVEKI*

 

>CYP6M6 AAGE01004894 complete 476324109 637742538 512549238 568770347

5 aa diffs to 6M6 476418676 494093520 66% to AY433475

MDVFLLIAAFVLLVAYGLHLLRKQVNFWADRNVPHNPVNFRQTVDQTVHMARRFQGYYHQFK

GQYPFAGMYLFTKPVALAIDLELLKCIFVKDFQYFHDRGTYYNEKDDPLSAHLFNLEGN

KWRNLRSKISPTFTSGKMKMMYPTMIAAGKQFSEYMDEKVGVEQELELKDLLARFTTDVI

GMCAFGIECNSMKDPNAEFREKGRMHFETPRNRKKDMMCSIAPKLARMMGLKQIIPDLSD

FFLGVVRETIDYRVKNGVRRNDFMDLLIGMLTGENVELGP

LTFNEVAAQAFVFFVAGFETSSTTMTWALYELSVNQDIQEKGRKCVRDVLEKYNGEL

SYETIMEMSYIDHILH (1)

ETLRKYPPVPVHFRIVTKDYKVPNTETVLPAGTSVMIPVYAVHHDPEIFPDPK RFDPDRFTTEEINKRHPYAWTPFGEGPRICIGMRFGMMQARIGLALLLNNFRFSSG

KKSTVPLDFTAKSFILSPDEGLWLKVEKL*

 

>AAGE01105997 63% to 6M6 822931992 complete

     MMEPLDVAITVVMVALAVYMYLDKKHSYWADRKVPFVKPKFFYGNAKEISQTMQ

     VGQVFQQFYHELKGRSPFGGIYMFTAPVAVVTDLELLKCIFVK

   4 DFQYFHDRGTFYSEKGDPLSAHMFNLEGNKWKMLRNKLSPTFTSGKMKMMFPTIVAAGK 180

 181 QFHDFMDEKVKQESEFELKDLLARFTTDVIGMCAFGIECNSIKDPDAQFRVMGRKLFTTG 360

 361 RSKPKSFLMNTMPKVAKLLRLRIFPADVSDFFMKVVRETIDYRMANNVHRNDFMDLLIQM 540

 541 RNPDENKSSEGLLSFNEIAAQAFVFYLAGFETSSTLLTWTLYELAVNQDIQEKGRQHVKE 720

 721 VLKKHDGEMTYESITSMKYLDQILN 795 (1)

 860 EALRKYPPVPVHFRETSKDYTVPDSNIVIEGGTRLFVPVYAIHHDPEIFPNPEQFNPDRF 1039

1040 TPEEEQKRHPYAWTPFGEGPRICIGLRFGMMQARIGLAYLLNSFKFSIGEKCKVPLEFDV 1219

1220 KSFILAPKGGLWLKVEKI* 1276

 

>476419050 52% to CYP6N3 637792107 793200622 complete

AAGE02015839.1 4/21/06

102508  MWIFLLLSIAVLLILQVRRKYSYWKRHGVPFIQPRFPFGSITPVGDRVHSSQLMARFYNQ  102687

102688  LKGTYPFAGMYFFTNPVVLALDLDFIKNVLVRDFQYFHDRGLYHNEKDDPLTCHLFNIEG  102867

102868  TKWTNLRRKLLPTFSSGKMKMMCPTILAIADRFRTAIENSISDQNEIEMRDFLARFTTDV  103047

103048  IGTCAFGIDCNSLENPDAEFLKMGNKIFEVPTSRIIAYFFVSTFQELSKKLHIKAVPEDV  103227

103228  SRFFYKVVRETMAYRQSSGVQRNDFMNLLMQLKEKGELEGSDEKLGTLTLDEVVAQAYVF  103407

103408  FLGGYETSSTNMCFCLYELALNGEIQEKARECVQKAVAKHGGLNYEALMDMPYLEQCIY (1)  103584

        EALRKYPPIANLFRSVTQDYNVPNSNVMLPKGMNVWIPIYAIH  103767

103768  HDPEFFPEPELFDPERFTQEECEKRKPFTYMPFGEGPRTCIATRFGMMETKTGLATLLMN  103947

103948  FKFTKSARLEVPPKFSTKHVMLTPVGGLWVKVEKIEQ*  104061

 

>AAGE01021887 66% to AAGE01005098 complete

replace with AAGE02015839.1 4/21/06

112574  MWLNLLVMVFALSVILVRRRYSYWKRIGVPFIQPRFPLGSIGSIGTRIHSSQLLAQFYQQ  112753

112754  LKGSHPFAGIFYFLQPVALALDLEFVKNVMVRDFQYFHDRGLYYNEKDDPLSSHLFNIEG  112933

112934  TKWTTLRRKLVPTFSSGKLKMMCPTVVSVADRFKMCIEKSIAKEEAIEMRELLARFTTDV  113113

113114  IGSCAFGIECNSLENPDDKFRKMGEKVFDVSPFAILAFFFLSTFKDLARKCRISITDSEV  113293

113294  AAFFSTIVQKTITYREKNNVQRNDFMNLLMQMMKKNKEDESEENSVTLTLDEVVAQSYVF  113473

113474  FLGGFETSRTTMSYCLYELSLNQEVQNRARKCIQSAVAKHDGLNYEALMDMPYLEQCIN (1) 113650

113709  ESLRKYPPISNALRSTTKDYAVPGTEVILKKGTDVIVPIYAIHHDPEYYPDPELFDPD  113882

113883  RFSADQCAKRKPFTFMPFGEGPRMCVASRFGMMETKIGLAAMLMSFRFSKCEKSIVPLKI  114062

114063  SPNHLMLTPAGGLWLKVEQLESDETEMGFSKLISDERVNRLGYSM*  114200

 

>AAGE01004071 494535013 complete 55% to 476419050 56% to 6N2

TC62330 TC28072 TC40767 58% to CYP6N2

MWNLSTSSNPHTIVPAHPKEILLLATSSVPFIPARFPVGSFDGVGVRNHPSQLLAKFYR

QMKGLHPFVGVYYFLQPVVVVLDLDFAKTILIRDFQYFHDRGLYYNEKDDPISGNLLHLE

GSRWTNQRKKLIPTFSSGKLRMMCPTILKVADNLKVSFERYVAERDEIEIKDILARFSTD

VIASCAFGLDCSSLLEADDEFRRMGTKVFDISGWKLLKLFFVFAFGNVARRCHMKLIDED

ISQFFFKVVRETIDFRKKNHVHRKDFLNLLIQLKDNG

ELEGSNEKLGTLTLNEVVAHSFVFFLGGFETASTTMSYCL

YELSLNEEVQERARQCVKAAIHKYGDLNYDDLLDMPYLEQCINETLRKYPPSTIYRIVTQ

NYHVPDSSIVFPKGMSVMIPVYAIHHDPEFWPSPELYDPDRFAPEECVSRNPLTFIPFGE

GPRMCVAARLGVLQTKIGLATLLMNFRFSRCKNSTEPLQYSPKHFILTPVGGLKMRVEKI

Q*

 

>AAGE01078584 494579395 48%  to 581602077 46% to 6N2 586209056 568771938

574225739 pseudogene (sequence does not continue)

1188 MLLYLLLTVVTLAYLWIGRRYSYWKQRSVPYVEPRFPFGNLQGLNKRHFGLLAQDVYSKL 1367

1368 KGSGSKFGGMFFFVNPVAVILDLDFAKDVFVKDFQYFHDRGVYSNEKVDPITSHLVAMEG 1547

1548 IKWKNLRAKLTPTFTSGKMKMMFPTITAVADEFRKCMVNEVDKGGEIEMKEFLARFTT

DVIGSCAFGLECNSLADPEAEFRKMGKKALTMSPMGFLR 525

524 RILSVTFRDLAKFLGVRISDPDVATFFMNVVRSTIEYRERNKVQRNDFMDLLIKLKNVEP 345

344 IDENTNQLGPLTFNEIVAQAFVFFLAGFETSSTT 243

MCFCLYELAKNQELQDKARRNIDEVLAKYGTMTYEAVH

EMRYMENCIN(1)

ESLRKYPPLPNILRNVNKPY

 

>AAGE01020246 79% to 494579395 58% to 6N1 complete

4381 MLLFLLLSVVTAAYLWVIWRYSYWKRRSVPYVEPSFPFGNLQGLNKRHFGLLTQDVYSK 4557

4558 LKGTGCKFGGMFFFVNPMVVILNLDFAKDVFVKDFQYFHDRGEYSNEKADPIMAHLVTME 4737

4738 GTKWKNLRTKLTPVFTSGKMKMMFPI

4816 ITAVAEEFRKCMAKEADKGEDIEMKELLARFTTDVIGNCAFGLECNSLMDPEAEFRKMGR 4995

4996 KAMAMSSADFLRRKLCNSFRGLAKLLGVRLSDPDVSDFFMNAVRSTIEYRERNKVQRNDL 5175

5176 MDLLIKLKNAELIDEKSDRLGPLTFNEIAAQAFVFFLAGFESSSTAMSFCLYELAKNQEL 5355

5356 QDKARRNINEVLVKHGTLTYEALYEMTYIENCIN 5457

5518 ESLRKYPPVTNIVRNVSKPYRVPGMNVTLEEDCRVLLPVYAIHHDPSLYPNPDQFDPERF 5697

5698 NPENSAARHPMAFVPFGEGPRICIGLRFGSMQARIGLTYLLKNFRFTLSEKMHDPLKMMS 5877

5878 NTIILASEGGLWMRIEKL* 5934

 

>AAGE01192518 80% to AAGE01020246 637742736 (758bp upstream do not have more P450 seq)

This break is not near the usual intron boundary CIN/ESLR. This is a probable

pseudogene fragment.

1554 YQVPGMNVTLEKGCRVLLPVYAIHQDPKLYPNPEQYDPDRFNPENSAARHSMAFVPFGEG 1375

1374 PRFCIGQRFGMMQARIGLTYLLKNFRFTLSEKTPSPLKILANSTVLASEGGLWLKLEKL* 1195

 

>AAGE01052546 494130149 64% to 476419050 615888679

12  ADRGLYYNEKDDPISCHLFNIEGSYWTNLRKKLSPIFSSGKLKLMCPMVITIAERFQKCL 191

192 SKSITQNQQEAEMKEWLNRFTIDVIGTCAFGIECNSLTNPEEKFRKMGVKMFHVANSR 365

366 IIKFFFISLFKNLAKKVHIKSVPEDVSEFFFKVIRKTIAFREMNHVLRNDFINLSMQLMA 545

546 DGKLEGSDEDVGKITLNEVVAQSFVFFLAGYETSSTVMMFCLYELSLQEDIQRRARENV 722

723 ITAVSRHGGLNYDALMDMGYLDQCVN

    ETMRKYPPAGNLGR 899

900 CVTKDYNIPNTNITLRK

 

EGPRNCIAARSGMLMAK 1136

 

AAGE02018066.1 Length=351875 use this seq

319978  MWIYLLIGIITSLVLFVRRKYLFWERQGVPFIKPKFPFGNLLVNGKRVHTSQLTTYYYNA  319799

319798  LKGKKHPIGGVFFFTTPFAVVLDRELMRNVLIQDFQHFHDRGLYYNEKDDPISCHLFNIE  319619

319618  GSYWTNLRKKLSPIFSSGKLKLMCPMVITIAERFQKCLSKSITQNQQEAEMKEWLNRFTI  319439

319438  DVIGTCAFGIECNSLTNPEEKFRKMGVKMFHVANSRIIKFFFISLFKNLAKKVHIKSVPE  319259

319258  DVSEFFFKVIRKTIAFREMNHVLRNDFINLSMQLMADGKLEGSDEDVGKITLNEVVAQSF  319079

319078  VFFLAGYETSSTVMMFCLYELSLQEDIQRRARENVITAVSRHGGLNYDALMDMGYLDQCV  318899

318898  N (1)

        ETMRKYPPAGNLGRCVTKDYNIPNTNITLRKGLNVVIPVH  318719

318718  GIHHDAEYYPDPERFDPERFSAEESTKRLPFTFMPFGEGPRNCIAARFGMLMAKVGVASM  318539

318538  LMRFQFSKCSKTAVPLVISPKHASMSPEGGMWLKVKEIK*  318419

 

>AAGE01005098 6263502207 69% to 6N2

TC55162 TC31642 TC43307 584989096 579853579 complete

    MWIYLLIAAITLSVLLVR

232 RKYSYWKRHGVPYIKPTFPFGNIRPAGNRVHSSQLMTRYYNELKGKHQFGGIFFFTNPV 408

409 ALALDLEFIKDVLVRDFQYFHDRGMYYNERDDPISGHLFNIEGTQWTNLRKKLLPTFSSG 588

589 KLKMMSPTIISVAERFQECLEKCITVDTEIEMKDLLARFTTDVIGTCAFGIDCNSLN 759

760 DPEVEFRKMGNKMFELP

TGRILKFFFISTFKNLARKARLKSVPEDVSEFFFRVVRETIDYREKSHIQRNDFMNLLMQLREKGALE

GSDEKVGTLSMNEVVAQAFVFFLGGFETSSTTMSYCLYELSLHEDIQERARECVQSAIAK

HGGFNYDAVMDMNYLELCIN (1)

ESLRKYPPGAN

LVRCATKDYQVRNSSVVFKKGMSVMVPIYAIHHDAEYYPDPERYDPERFGVEELAKRPPF

TFMPFGEGPRICIAARFGMMESKIGLAALLMNFKFSKCSKSIVPLVISNKHVVLTPAGGL

WLKVEKLEQ*

 

>AAGE01273771 520199522  728739223 86% to CYP6N4v1 578623429 complete

MLIYLTVLALTLAVLWIRKRYSYWMDRGILYVEPS

FPAGNLRGMGRKEHLSSQMQRCYKELKGKGPVGGMFFFINPVALAMDLDLIKSVLVKDFQ

YFHDRSVYYNEKDDPLSAHLFTMEGAKWKNLRAKLTPTFTSGKMKMMYPTIIGVADEFQK

LMKSEVSS

NAEIEMKEILARFTTDVIGTCAFGLECNSLHDPDAKFRAMGRKIFSFANGRF

LKAVIAQQFRSLARSLHIALVDKEVSDFFLGAVRDTIKYREENKIERNDFMSLLMKLKDD

GNTGNTETLTVEEIAAQAFVFFLAGFETSSTAMSYCLYELAQNSDLQNKARKSVMDSIKK

HGSLTYEAMQDMQYIDQCIN (1)

ESLRKYPPASTLTRSVSKDYKLPNSNVVLQQGSTLIVPVYA

LHHDAEYYPDPEKYNPDRFTPEEVAKRNPYCFLPFGEGPRICIGMRFGMMQARVGLAYLL

RDFSFTLSSKTPVPLKISPRSPVLTSEGGLWLKVQKL*

 

>AAGE01504815 581602077 91% to CYP6N3v2 753204063 815151384 632872571 complete 579754790 TC62375 TC16009 TC23365 TC47533 TC56452 TC50555 6N6v1 6T1.4

MWIYLTVLALTLAVLWVRKRYAYWKERGIPYVEPSFPAGNIR

GMGRKEHFSTQMQRCYKELKGKGPVGGVFFFINPVPLALDLDFIKTVLVKDFQYFHDRSI

YYNEKDDPLSAHLVALEGAKWKNLRTKLTPTFTSGKMKTMFPTIIGVADEFQKMMKNEVV

GNTEIEMKDILARFTTDVIGTCAFGIECNSLQDPNAQ

FRRMGRKIFSVAK

GRLLKLITAQQFRSLARMLGITLIDK 

DVSDFFIGAVRDTIKYREENKIERNDFMSLLMKLKNDESSQDTNSGDVE

TLTVEQIAAQAFVFFLAGFETSSTAMSNSLYELAQNSDLQNKARKSVMDAIKKYGSLTYE

AMQDMQYIDQCIN (1)

ESLRKYPPASNLTRTVSTDYKLPDSNVVLQQGSTLIVPVYALHHDA

EYYPDPEKYDPDRFTPEEVAKRNPYCFLPFGEGPRNCIGMRFGMLQARVGLAYLLRDFSF

TLSNKTPVPLKISPHSPILTSEGGLWLNVRKL*

 

>AAGE01026936 476398858 46% to 476419050 48% to 6N1 584294086 819690004

complete

MSALLIILALTPLFLFIIYVK

672 QKYAYWARRNVPFLKPHFPYGNFEALDRKSIADVAREAYEEMKNRGPFYGAYFFLQPL 499

498 ITITDPDLIKMVLIKDFNTFPDRGLYFNERDDPLSAHMFAIEGNKWRSLRQRLSPTFTSG 319

318 KMKMMFPTLAAVGDQFSAFLDEEIGSGKVVEVKDFMAKFTTDIIGSCAFGIECNSFK 148

147 DPHGRFRQFGKMVFETPVHGSLVRFALKSFPEISRRLRIK 28

ALHEEASKFFYGVVEDTVKYREKNGVERKDFLSLLIDMKKDGVDFT

MDEIAANSFIFFGAGFETSSSNQTFCLYELARNPECQDKARQSVLDALRNHGGMTYDAAC

DMQYLDQCIN (1)

ETLRLYPSVPVLERRAFQDYKIPGHDVVIPKGMKINIPAYAI

QRDERFYPDPDVFNPDRFHQKEVAKRHICTFIPFGEGPRICIGLRFGMMQSRVGLATILS

KFRISICSETANPLEYSSKTSVLIPKEGLWLRVDPL*

 

>AAGE01569058 TC56593 TC20604 TC28568 TC51124

56%  to CYP6AA1 834896125 complete

MGLYNTVLYLVLPIVWLLYTYFRRKYSYWADRNVPQVPGSL

PLGSFNGMGTKYHFVDVLKRVYDTYHKTHKAIGMYLSVKPILFVSDLDLIKKILVKDFNS

FRDRGMYYNEKDDPLSAHLFSIEGERWRFLRNKLSPTFTSGKIKYMYLTICE

IGEEFLACFDKYLDRKEAVDIKPLAQRFTSDVISSVAFGLKTNALKNEGSELLNKGDSVF

KPGRWETIRIFALLSYRDLAKKLGLRQFPRDVTDYFMDIIRGTVDHREKTNVMRQDFLQL

LLKLKNKGTIEDHEEESKEKITLNELAAQAFLFFFAGFETTSTTVSFALFELANNAEVQEKTR

QEVQRVLAKHGGHLTYDAIKDMTYLEQVVNETLRKHPPVGNLIRLANDPYRIDSLGTDIE

RDTMIMIPVHAIHNDPDIYPDPERFDPYRFTPEAINARHSHCFIPFGDGPRNCIGMRFAL

VEVKFGIAQLLTRLRFTVNEKTQFPVRYDPKSQFAEVKGGIWLNVERI*

 

>223495136 AABUM55TV.gz 59% to TC56593 40% to 6aa2 pseudogene?

Sequence has no exact match, 78% to 615844728 (TC56593, see above)

KTLLNHPPFFNLILLLNYPYLIHSL*TFF

487 QQNTIIMIPFHTIHNYPNIYPYP*RFYPNQFTP*SINSHHSHTFIPF*YPPLNCISIPFS 308

307 LLHLNFVIAHLLTKLRFTSNHKTHFKNRYDPKSRGAVVEGGIWLKVE 167
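
The "pseudogene?" call above rests partly on in-frame stops (*) inside the translation. A small Python sketch of how such internal stops could be listed follows; internal_stops is a hypothetical helper, and a trailing * is treated as the normal stop codon (an assumption of this example).

```python
def internal_stops(protein):
    """Return 1-based positions of '*' that are not the final character."""
    # A '*' at the very end is the expected stop codon, so it is ignored.
    body = protein[:-1] if protein.endswith("*") else protein
    return [i + 1 for i, aa in enumerate(body) if aa == "*"]


# First translated line of the fragment above.
fragment = "KTLLNHPPFFNLILLLNYPYLIHSL*TFF"
print(internal_stops(fragment))
```

An empty list does not prove a gene is functional; frameshifts and missing exons need separate checks.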

 

>AAGE01185776 223460790 57% to TC56593 53% to 6AA2 78% to 223468847

636165685 580089056 637123288 complete

MAFLFTTLCLLLPLLGLLYYYVRRKFAYWADRGVPYVPGSLPMGSFNDMGSTKHIVELLDAIYKQYRNTHK

AVGMFLSINPILLAVDLELVKQILVKDFNSFHDRGMYFNERDDPLASHMFSVEGERWRFL

RNKLSPTFSSGKIKYMFLTVREIGLEFLASFEPFMERKEPVEIGIQAQKFTCDVIGSCAF

GLSCNALKDESTELLDIADRVFNPKPLEMMYMLLLICFRKWAVKLRLKQTPADIERFFV

NMVRKTVEHREKNNITRPDFLQLLMQLKNKGTLEESEEDSKETISMNDVIAQA 682

681 FLFFFGGFETSSKALSFALFELALNPELQEKARDEVLRTLDKHDGLLTYEALKDMTYVEQIVH (1)

    ESLRKYAPIGNVIRKANEPYQIHSPDIILEKGTMVM 325

324 IPVHSIHHDPEIYPDPSRFDPDRFTPEAISARHSHSFLPFGDGPRNCIGMRFALLEVKFG 145

144 IAQLLSRLRFTVNEKTQLPLRYDPKANVASALGGLWLDVERI*

 

AAGE02023125.1 Length=57153 use this seq probable allele of upper seq

97% identical 12 diffs

15290  MVFLFTTLCLLLPHLGLLYYYVRRKFAYWADRGVPYVPGSLPMGSFNGMGSTKHFVELLD  15111

15110  PVYKQYRNTHKAVGMFLSINPVLLAVDPDLVKQILVKDFNSFHDRGMYFNERDDPLASHM  14931

14930  FSVEGERWRFLRNKLSPTFSSGKIKYMFLTVREIGLEFLASFEPFMERKEPVEIGIQAQK  14751

14750  FTCDVIGSCAFGLSCNALKDESTELLDIADRVFNPKPLEMMYMLLLICFRKWAVKLRLKQ  14571

14570  TPADIERFFVNMVRKTVEHREKNNISRPDFLQLLMQLKNKGTLEESKEDSKETISMNDVI  14391

14390  AQAFLFFFGGFETSSKALSFALFELALNPELQEKARDEVLRTLDKHDGLLTYEALKDMTY  14211

14210  VEQIVH (1)  14193

14133  ESLRKYAPIGNVIRKANEPYQIHSPDIILEKGTMVMIPVHSIHHDPEIYPDPSR  13972

13971  FDPDRFTPEAISARHSHSFLPFGDGPRNCIGMRFALLEVKFGIAQLLSRLRFTVNEKTQL  13792

13791  PLRYDPKTNVASALGGLWLDVERI*  13717
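
Figures such as "97% identical 12 diffs" can be reproduced for two sequences that align without gaps. The Python sketch below counts mismatches position by position; compare_aligned is a hypothetical helper and the example uses only the first 20 residues of the two sequences above, so it does not reproduce the full-length numbers.

```python
def compare_aligned(seq_a, seq_b):
    """Return (percent identity, list of (position, residue_a, residue_b))."""
    # Assumption: the sequences are already aligned without gaps.
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    diffs = [(i + 1, a, b)
             for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]
    identity = 100.0 * (len(seq_a) - len(diffs)) / len(seq_a)
    return identity, diffs


upper = "MAFLFTTLCLLLPLLGLLYY"  # first 20 aa of AAGE01185776
lower = "MVFLFTTLCLLLPHLGLLYY"  # first 20 aa of AAGE02023125.1
identity, diffs = compare_aligned(upper, lower)
print(f"{identity:.0f}% identical, {len(diffs)} diffs: {diffs}")
```

Sequences with insertions or deletions would need a real alignment (for example from BLAST) before this count is meaningful.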

 

>223468847 73% to 223460790 I-helix and end 62% to 223460790 pseudogene

AFLFFFVGFSTSFTPFSFSLFEFALFPQLRGVARDRFLRTLDDHDVLFTFAALIDLTYVALIFH (?)

DSLP*FAPFRYVFREAYVPFQFHSPDFFLG*S

490 TIVMIPFHSFLHDPEFFPDPSRFVPDRFSPEAISALHSHSFLPFGDGPRNCFGMRFALLE 311

310 VKFGIAPFLPPFPFSFPPPPPLPLRFVPKANVASTLPGLCFPVDLI 173

 

>AAGE01408667 TC66947 TC28477 TC49577 55% to CYP6P4 TC67159

TC33540 TC39062 593244050 519827477 50% to CYP6P4 complete

Revised May 15 2006, Trace files support the cyan seqs

MAILELYLAIGVTLVLATAGCVFLFLDKKRSFWKDRNFPCTGRAKMIYGDYKNMNQT

EHMQYINQRIYNEFKARKLPIGGTVLFLVPSTVVVDPDLIKAMLVKDFNFFHDRGVYNNP

EVDPLTGHLFSLEGQAWRQLRAKLSPTFTSGKMKMMFSTIL

SVADDLKEFLLEKTESGPTELEMKNVLAGFTTDV

IGSCAFGIECNSLRATHCRFREVSRKIFEQSVGQMLWMIVLMLFK

GVATKLKLKATPAEVENFFTNMVQETIDHRERNNVQRSDFMNILIQMKNSTNLEEKLTLN

EITAQSFIFFVAGFETSSTTMVNCLFELAMNPDIQEKLRAEIFKVC

GEGDLTYESVSSVEYLNMVIDETLRKHPVVDSLLRTSTQPYNIPNTDLKIPKGTF

VFIPVHALHHDPEYYPDPDRFDPERFNAENRASRHPFVYLPFGEGPRNCIGMRFGLMQTR

VGLITVLRNFRVRPSSNTPERLVVNPKSGIPAPLGGIPLLIERI*

 

 

>AY432230 AAGE01011017 52% to 6P4 583641208 589587999 CR937398.1

CR937850 CR937397.1 CR937849.1 complete

MDPVTVILTIFVGLTGLVYFFLRREQQKWPRLGVPFAKNPH

LLFGNVRGIFQKEHSCEILQRLYWEFKGRGLKLGGIMNFFQPAVLVIDPEISKSILVKDFNKFHDRGIFVDPAGDPLSANLFSLEGAQWKAMRTKMSPTFTSGKMKYMFESVLNVAERLKDYLAENCLKEDIELKNILQRFTMDVIGNVAFGV

ECNSIKNPSSEFRLMGLKANRFDGVRFLKFFIGGAYKNFAKKIKLKVVEDDVHKFFMSLV

HSTVHYREGNNVKRNDFLNLLMEIKNKGKFSDEPNSGGEGITMNEIAAQCFIFFTAGFET

SSTTINFCLYELANNPDIQDRLRNEIEDVVAKDGGELKYDTLLGMNYLDRVVS (1)

ETLRKYSAVDNLFRISNSPYTPDGCNFTIPAGTLFQIPIHSMHHDPEYFPDPGRFDPDRF

LPEVAKSRHPYCYLPFGEGPRVCIGVRFGLMQTKIGLVTLLRDFRFGPRSETPDRLQFEA

KTFVLTPQTGIYLKIEPIGI*

 

>AAGE01083421 476356097 39% to CYP6AD1 587854394 570810577 complete

MISGTVCILLVLANVAFLVLFVR

GVLQSRQVYWVRRRIPFVAWPHLLFGNVRRLWRHEHSSTIGQRLYRDLKARRLAAGGFNL

LVSPSILVADPDLAEEVLVGNVRRFPDRGL HVDAEVDPLSETLFALRGNRWKDKRNR 

LAPVFSEETLKPV FRMVASFADELRKEISINLDRRLQDVQEWVSRYVTQVMGKSVFGM 202

203 RCRMMQDPNTDFRRYGRISTELSWLLLLKNWIGVTMPWMARKVGLRITDATVEKFYVDLC 382

383 RSNVLVRESYKVKENDILQLFMRLREARQLTME 481

482 ELTTACYSFVKHGMEPCTSVMTFCLYELAKNLSIQKRLRDEISHNLEDTDGQ 637

638 LTYDVIMSMNYLDQVVN (1) 688

745 ETMRKYPPVDFIYRRSSQSRDNIPQGTLFVIPVYAFHHDPDHFPAPENF 897

898 DPERFTAKQARTRHPYCYLRFGAGPRECLGAR

    FGLLVVKAGLVTLLRRFRFAMPEELVHEKLQFKPNASVLSPVEGSVRLRVETI*

 

These trace files match 100%  MISGTVCILLVLA

gnl|ti|591523435 

gnl|ti|587360321 

gnl|ti|585826792 

gnl|ti|578932662 

gnl|ti|570810577 

gnl|ti|576970754 

 

These trace files match 100%  HVDAEVDPLSETLFALRGNRWKDKRNRLAPVFSEETLKPV

gnl|ti|639160181 

gnl|ti|591799078 

gnl|ti|591523435 

gnl|ti|576970754 

gnl|ti|570810577 

 

These trace files match 100%

RLAPVFSEETLKPVFRMVASFADELRKEISINLDRRLQDVQEWVSRYVTQVM

GKSVFGMRCRMMQDPNTDFRRYGRISTELSWLLLLKNWIGVTMPWMARKVGLRITD

gnl|ti|639160181 

gnl|ti|591523435 

gnl|ti|578800786 

gnl|ti|570810577 

gnl|ti|476356097 

 

These trace files match LAKNLSIQKRLRDEISHNLEDTDGQLTYDVIMSMNYLDQVVN  100%

gnl|ti|793209208 

gnl|ti|637742971 

gnl|ti|587849418 

gnl|ti|578800786 

gnl|ti|476356097 

gnl|ti|567212773 

 

These trace files match FHHDPDHFPAPENFDPERFTAKQARTRHPYCYLRFGAGPRECLGAR  100%

gnl|ti|476356097 

gnl|ti|793209208 

gnl|ti|614704229 

gnl|ti|637742971 

gnl|ti|588905694 

gnl|ti|743515336 

gnl|ti|587854394 

gnl|ti|587849418 

 

These trace files match CLGARFGLLVVKAGLVTLLRRFRFAMPEELVHEKLQFKPNASVLSPVEGSVRLRVETI  100%

gnl|ti|614704229 

gnl|ti|637742971 

gnl|ti|588905694 

gnl|ti|745128895 

gnl|ti|743515336 

gnl|ti|587854394 

 

>1.1327_5 AAGE01083421 JP’s version 17 diffs to AAGE01083421, new gene

MISGTVCIVLVLANVAFLVLFVRGVLQSRQVYWVRRRIPFVAWPHLLFGNVRRLWRHEHSSTIGQRLYRDLKARRLAAGGFNLLVSPSILVADPDLAEEVLVGNVRRFPDRGLHVDAEVDPLSETLFALSGNSWQDKRNQLTPVFSEETLKPVFRMIASFADELRKEISKNLDRRLQDVQEWVSRYVTQVMGKSVFGMRCRMMQDPNTDFRRYGRISTELSWLLLLKNWIGVTMPWIARKVGLRITDATVEKFYVDLCRSNVLVRESYKVKENDILQLFMRLREARQLTMEELTTACYSFVKHGMEPCTSVMTFCLYELAKNVSIQKRLRDEISHYLEDTDGQLTYDVIMSMNYLDQVVNETMRKYPPVDFIYRRSSQSRDNIPQGTLFVIPVYAFHHDPDHFPAPEKFDPERFTAKQARTRHPYCYLPFGAGPRECLGARFGLLVVKAGLVTLLRRFRFAMPEELVHEKLQFKPNASVLSPVEGSVRLRVEAI.

 

These trace files match 100%  MISGTVCIVLVLA

gnl|ti|822917015 

gnl|ti|749404661 

gnl|ti|630748299 

gnl|ti|520549356 

 

These trace files match 100%  HVDAEVDPLSETLFALSGNSWQDKRNQLTPVFSEETLKPV

gnl|ti|749404661 

gnl|ti|630748299 

gnl|ti|591518206  this seq also matches JP’s version

QLTPVFSEETLKPVFRMIASFADELRKEISKNLDRRLQDVQEWVSRYVTQ

VMGKSVFGMRCRMMQDPNTDFRRYGRISTELSWLLLLKNWIGVTMPWIARKVGLRITDAT

VEKFYVDLCRSNVLVRESYKVKENDILQLFMRLREARQLTMEELTTACYSFVKHGMEPCT

SVMTFCLYELAKNVSIQKRLRDEISHYLEDTDGQLTYDVIMSMNYLDQ

 

These trace files match FHHDPDHFPAPEKFDPERFTAKQARTRHPYCYLPFGAGPRECLGAR  100%

Two different sequences exist

gnl|ti|808282492 

gnl|ti|593108115 

 

These trace files match CLGARFGLLVVKAGLVTLLRRFRFAMPEELVHEKLQFKPNASVLSPVEGSVRLRVEAI  100%

gnl|ti|808282492 

gnl|ti|593642133 

gnl|ti|593108115 

 

>CYP6Pnew 574331490 AAGE01337778.1 571589719 600552785 complete

66% to 6P4

note: TC66947 begins 357bp downstream of this seq, same orientation

this seq matches AAGE02030882.1

MLPFLLAVVALLLTAAGLYIRSRHRFWSDRGIPCAPNPEFLFGHVRGQVTNKHAAY 353

VNRELYQQFKARGEGFGGYSFFAVPAVIIVDPELVKTILVRDFAVFHDRGIYNNPKDDPL 173

SGQLFLLEGLQWKILRQMLTPTFTSGRMKAMFGTIMDV

AEEFRQFLVDSRERESVIEMKEVLASFTTDVIGTCAFGIECNTLKNPDSDFLKYGKKVF

EQRMSTLFKFIFASLFK

DLARKLGVKITDAGVEKFFLGLVRETVEFREKNNVMRNDFMNLLLQLKNKGRLVDQLDE

ADEVAARGLTMEELAAQCFVFFIAGYETSSTTMNFCLYELAKNPDIQEKLREDIEEAVAS

NGGRVTYDLVMGLRYLDNVVN (1)

ETLRKYPPIESLNRVPTSDYTVPGTKHVLPKQTMITIPIYALHHDPDFYLDPDNFDP

DRFLPEAAQARHPYAFIPFGEGPRNCIGMRFGLMQTKIGLITLL

RNFRFSPSAKTPDKIAFDVKSFVLSPDGGNYLRYDKI*

 

>AAGE01395583 96% to 6Pnew 592239414 N-term part C-term part 569878875

803228481 a second C-term part, neither can extend 8 aa to the normal exon boundary

This looks like a pseudogene

    MLPLLLAVVTILLTAAALYIRSRHRFWSDRGIPCAPNPEFLFGHVRGQVTNKHAAYVNRE

    LYQQFKARGEGFGGYSFFAVPAVIIVDPELVK

  8 TILVRDFAVFHDRGIYNNPKDDPLSGQLFLLEGLQWKILRQMLTPTFTSGRMKAMFGTI 184

185 MDVAEEFRQFLVDSRKRESVIEMKEILASFTTDVIGTCAFGIECNTLKNPDSDFLKYGKK 364

365 VFEQRVSTLMKFIFASLFKDLARKLRIKITDAGVEKFFLGLVRETVEFREKNNVLRNDFM 544

545 NLLLQLKNKGRLVDQLDEADEVAARGLTMEELAAQCFVFFIAGYETSSTTMNFCLYELAK 724

725 NPDIQEKLREDIEEAVASNSGRVTYDLVMGL 817

 

this is the same as

AAGE02023125.1 100% match to 1.702_1 with error at intron boundary, missing small exon

This could be a pseudogene since the gene structure does not match related genes

The end of exon 1 could be broken off by an insertion, blocking expression.

If it is expressed, it is missing 2 conserved amino acids (VM).

48326  MLPLLLAVVTILLTAAALYIRSRHRFWSDRGIPCAPNPEFLFGHVRGQVTNKHAAYVNRE  48147

48146  LYQQFKARGEGFGGYSFFAVPAVIIVDPELVKTILVRDFAVFHDRGIYNNPKDDPLSGQL  47967

47966  FLLEGLQWKILRQMLTPTFTSGRMKAMFGTIMDVAEEFRQFLVDSRKRESVIEMKEILAS  47787

47786  FTTDVIGTCAFGIECNTLKNPDSDFLKYGKKVFEQRVSTLMKFIFASLFKDLARKLRIKI  47607

47606  TDAGVEKFFLGLVRETVEFREKNNVLRNDFMNLLLQLKNKGRLVDQLDEADEVAARGLTM  47427

47426  EELAAQCFVFFIAGYETSSTTMNFCLYELAKNPDIQEKLREDIEEAVASNSGRVTYDL  (0) 47253

44758  GLRYLDNVVN (1) 44729

44670  ETLRKYPPIESLNRVPTSDYTVPGTKHVLPKQTMITIPIYALHHDPDFYLDPDNFD  44503

44502  PDRFLPEAAQARHPYAFIPFGEGPRNCIGMRFGLMHTKIGLITLLRNFRFSPSPKTPDKI  44323

44322  AFDVKSFVLSPDGGNYLRYDKI*  44254
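
Errors at exon boundaries like the one noted above can sometimes be caught by checking that the nucleotide coordinates on each line span exactly three bases per residue. The Python sketch below assumes lines of the form "start SEQUENCE (phase) end", with the phase marker optional and the stop (*) counted as a codon; check_line is a hypothetical helper.

```python
import re

# start coordinate, residues, optional intron phase marker, end coordinate.
LINE = re.compile(
    r"^\s*(\d+)\s+([A-Z*]+)\s*(?:\(\d\??\)\s*)?(\d+)\s*(?:\(\d\??\))?\s*$"
)


def check_line(line):
    """Return (nucleotide span, 3 * residues, consistent?) or None."""
    m = LINE.match(line)
    if m is None:
        return None  # line carries no coordinate pair
    start, residues, end = int(m.group(1)), m.group(2), int(m.group(3))
    span = abs(start - end) + 1  # works for either strand
    return span, 3 * len(residues), span == 3 * len(residues)


print(check_line("44758  GLRYLDNVVN (1) 44729"))
print(check_line("44322  AFDVKSFVLSPDGGNYLRYDKI*  44254"))
```

A mismatch does not by itself show an error in the sequence; it only flags a line worth rechecking against the contig.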

 

>AAGE01028822 752849490 633799995 600024515 576876673 complete 65% to 6S1

MILILLLLAATALFFRWINAYRARYQFWKEHNVPHLEPRFPVGNAGDILKSTIHFAH

IMDNLYRELKHFGDYAGIYFFRDPVLVVLSPEFAKTVLVK 658

DFNYFLDRGVYSNEKDDPLSANLFFMEGHRWRKLRAKLTPTFTTGKLKAMFHTILAVGEQ 478

FDRYLQDYTKQKDEVEVKDLLARFTTDIIGSCAFGIDCNSLENPESKFRQMGKRMINFPK 298

LKALKIFFAMMYRKQARWLRIRFNDEDVSDFFFAVVRDTIRYREENNFERKDFMQLLIE- 121

LKNKGYMEDDGEYVEEL

QGGRLEKLTFEEIAAQAFVFFFAGFETSATTMTFALHLLASNQEVQDRGRKCVYEVLERH 444

DGKLSYEALMEMTYIDCIIQ (1)

ETLRIYPPVATIHRITTKPYKLPNGSVLPEGVGVVIPNLAFQRDPEFFPEPMQFRPE 157

RFFEDEKDKRHNFCHLPFGEGPRICIGMRFGLLQTRMGIAMLLKNYRFRLCPKSVFP

LKTDPINLIYGPAGDVWLGIEKIQ*

 

>AAGE01126587 45% to 6AJ1 missing part of exon 4 632860227 759657436 578977303

Walked by megablast to AAGE01030106.1, but this still does not get the N-term.

Best match to the N-term in WGS = AAGE01034156.1. Since there is only one 6AJ,

this must be the correct N-terminal.

Note: after the sequence PERDDQLQSLLKTK there is a gap compared to Anoph 6AJ1.

The seq KSSTY that comes right after this seq in 6AJ1 appears in the

opposite-strand translation at the same spot, so there might be an inversion

here and this might be a pseudogene. 13 aa at the end of exon 4 are missing.

Five trace file seqs have a 100% match in this region, so the seq is correct.

Alternatively, the seq may be shorter, like 6AH1 or 6AG. A phase 1 boundary is

possible.

1061 MILTTVFLIGLLYKNPVAFFLVIVAGLLVRELIKYHFRHWERCNVPGPKPSLIFGNIASN  882

881  IFLRQHFAEMIDGWYN (2)  834

763  KFPNAPFIGFYKIFKPSVMIRDPEMIKNVLVRDQACFSANDFAFDEKLDPLLAHNPFMVSG  587

586  ERWKKSRQLLTPIFTGSKMKQLFPIMDEISSQFVDFVGRQCGREVEAKS (0)  440

     ISAAYTTQNVAGCAFSLDADCFNNPNSEWRVMGKKIFQPTLLAGIKFMLMLFVPSVTWFIPVP (2)

2075 FLPKEVDRWMRKLVSTLLQERKNKQPERDDQLQSLLKTKS (1) 1956

1485 AELTEEQIAGHSLAFFSEGFETSSTTMGFAILH (0) 1381

1327 LAENPDVQEKLFQEIQNTLGKNDIPLTFDLVQKIEYLDWVLQESLRITPPA 1175

1174 AGLQKLCTQNYCLKYKVDGKEVGTWIMPGTTVLIPIVAVHM

 990 DPKYYPEPEKFRPERFSPEEKAKRTDPVYYPFGEGPRMCLGMRFAQIQIKMALLKLVQQF 811

 810 RVRTSPNYKPWQYNRNTFLTEAKDGLQVVFERRS* 706
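
The possible inversion noted above was found by seeing an expected peptide (KSSTY) in the opposite-strand translation. A minimal Python sketch of that kind of search is given here, using the standard genetic code and all six reading frames. The function names are hypothetical and the test DNA is a toy sequence invented for illustration, not taken from the contig.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def reverse_complement(dna):
    """Return the reverse complement of an ACGT string."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna, frame=0):
    """Translate one reading frame; unknown codons become X."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def find_peptide(dna, peptide):
    """Return (strand, frame) pairs whose translation contains peptide."""
    hits = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            if peptide in translate(seq, frame):
                hits.append((strand, frame))
    return hits


# Toy example: KSSTY encoded on the minus strand of an invented fragment.
toy = reverse_complement("AAATCTTCTACTTAT")
print(find_peptide(toy, "KSSTY"))
```

A hit only on the "-" strand, where related genes have the peptide on the "+" strand, is the pattern that would suggest an inversion.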

 

>AAGE01171970 69% to 6AK1

822917241 578615327 575500950 complete

MPLVAWLAAIALICYLAWKRNNFWNRHGVPYVLEIPAVGNFSSVALQMHSMFD

YVARIYDHVRTRDADFFGINIFFRKALVIRNPDMVKKMLVADSRYFINRQMC

TDREGDHFGYYNLMMIKEPLWKDLRGYLSPSVTSSRLRRMFSLIDE (0) 526

IGNNMLAHLDGVQQKPTK

LRETEFKELCARFTTDVIASTFFGIQANCLSDEESEFRYYGR 264

KIFEYGPKRALNMAAFFFMPELVPYLGFKLFPRDTERFLKTIIEQEIARRETSGENRGDF 84

IDSMIALKNNEATIGVEEKI (2)

HLKGDILVAQAATFYMASFETTSSVLSFTLYELTKN (0)

PEIQQRLREEIHNCIKKYGRDLSYECLVNEMPYLGMVISEAARLYPVLPFIERQCSLPAGA

TGYKLDPFHNFVVPNKMPVLVPIYAIHRDPK (0)

YFPDPLRFDPDRFSKDNADNIVPCSYMPFGVGPRTCLGSHFGTLQVKVAITRLLSKYRI

LRSESSPETLTYRKNAFTLHSNEGLYADLELDELC*

 

>AAGE01277917 65% to 6AH1

519656511 569678490  521887623 578889466 749413119 578997302 complete

MLELYIALAVLAVCLYFKWSCSYWRRVGNVDGPQPLPIFGNGLEQITGAKHFGEIFEEVYR (2?)

TYPTAAWVGIYELFNKPAIVVRDLELVKE 878

ILVGSFQHFNRNSFEVDETIDPLVAINPFTQSGDLWKERRSQVVPVFSQTKIRSCFPIIK 698

NVADNFLEYVTKTRKTSPDFEAKD 623

ICARFTIDSVASCAFGIDAESFTNPNSEFRRVGFELFNPSSIMATVRSLLALFAPKLASLLRIP  (2?)

FVPPYVDRWFRKLVNEVIRQRKEGEVKRQDLFQAMYDT 628

LTQQGTVDVKNDEIVGHSVTFLTEGFETSSTLMCYFLYELASNQHIQDRVLNEIDCVLKE 448

YDGKLTDEAVNKLTYMERAMYETLRMHSPVFTLTKVCTKEYELPPQYTDDVGKRITMKPG 268

MSAIIPVHAIHLDPEIYPDPCRFEPDRFLDENRKGRHRYAFLGFGEGPRICLGMKFGLSQ 88

SKIGIATLLSKYRVVGSDKQELPLEISRKSFLLASKNGIWVKFVERG*

 

 

CYP3 clan

CYP9J related sequences: 17 complete genes, two full-length pseudogenes,

and two partial pseudogenes

 

>CYP9J1 TC67648 TC11677 TC2154 TC45358 AY064092 AF390099 complete 50% to 9J4 96% to 9J2 

MVEVNIFSALAVGAVILLLYHYIAKKYHYFLTKPIPCIKPTFLL

GIFDMVVLKRVELVFGSKLLYNSYPDAKIIGYYELTKPTYMVRDPEMIKKIAIKDFDS

FTDRTPVFGDAVPADSLFFNSLFSLRGQKWRDMRSTLSPAFTGSRMRHMAELVVKCAT

SMTDFFHSEAKAGRRLEFNMKDTFSRFVCDAIASVAFGIEVDSFRDPENESYKKGNES

QKIHTFKSLATFVTLRFVPFLQKVFNFDFVDANVAGYFKKLILDNMDQRNKQGIVRND

LVNMLMETKNGALKYEEQDTQVPEGFATVEESHVGKSTHSRIWTDNELISQCFFFFFA

AFDNVSSILTFLSYELTVNQDIQRRLYEEIAVTESTLNGQPITYEALQKMAYLDMVVS

ETLRKYPTATLTDRYVNKDYVFDDEEGLRFVIEKGKTIWIPMLALHHDPKYFPEPERF

DPERFSEDNRSKIVPGTYLPFGAGPRSCIGPRLALLEVKMALYHLVKDFNLQASEKTQ

IPLRLSKSAFTMQAENGVWLELKARPKA

 

96% to CYP9J1, 9J2, 9J15 complete

nearly identical to 9J2 (9aa diffs)

This is a near exact match (1 aa diff) to CYP9Jae9 below without any frameshifts

This is not a pseudogene

MVEVNLFSALAVGAVILLLYHYIAKKYHYFLTKPIPCIKPTFLFGIFDMVVLKRVELVFGSKLLYNSYPDAK (2)

IIGYYELTKPTYMVRDPEMIKKIAIKDFDSFTDRT

PVFGDAVPADSLFFNSLF

SLRGQKWRDMRSTLSPAFTGSRMRHMAELVAKCATSMTDFIHSEAKAGRRLEFNMKDTFS

RFVCDAIASVAFGIEVDSF

RDPENEFYKKGNESQKIHTFKSLATFVTLRFVPFLQKVFNFDFVDANVAGYFKK

LILDNMDQRKKQGIVR

NDLVNMLMETKKGALKYEEPDMQVSEGYATVEESHVGKSTHSRIWTDNELISQCFFFFFA

AFDNVSSILAFLSYELTVNQDIQRRLYEEIAVTESTLNGQPITYEALQKMAYLDMVVSEA

LRKYPTATLTDRYANKDYVFDDEEGLRFVIEKGKTIWIPMLALHHDPKYFPEPERFDPER

FSEDNRSKIVPGTYLPFGAGPRSCIGPRLALLEVKMALYHLVKDFNLQASEK

TQIPLRLSKSAFTMQAENGV

WLELKARPKA

 

>CYP9Jae9 494315799 588882678 579665582 579009008

          Length = 536

 

 Score = 1060 bits (2742), Expect = 0.0

 Identities = 535/557 (96%), Positives = 535/557 (96%)

 Frame = +1

 

Query: 247  MVEVNLFSALAVGAVILLLYHYIAKKYHYFLTKPIPCIKPTFLLGIFDMVVLKRVELVFG 426

            MVEVNLFSALAVGAVILLLYHYIAKKYHYFLTKPIPCIKPTFL GIFDMVVLKRVELVFG

Sbjct: 1    MVEVNLFSALAVGAVILLLYHYIAKKYHYFLTKPIPCIKPTFLFGIFDMVVLKRVELVFG 60

 

Query: 427  SKLLYNSYPDAK*VCSTNDCRAYYDDEPSFSTRIIGYYELTKPTYMVRDPEMIKKIAIKD 606

            SKLLYNSYPDAK                     IIGYYELTKPTYMVRDPEMIKKIAIKD

Sbjct: 61   SKLLYNSYPDAK---------------------IIGYYELTKPTYMVRDPEMIKKIAIKD 99

 

Query: 607  FDSFTDRTPVFGDAVPADSLFFNSLFSLRGQKWRDMRSTLSPAFTGSRMRHMAELVAKCA 786

            FDSFTDRTPVFGDAVPADSLFFNSLFSLRGQKWRDMRSTLSPAFTGSRMRHMAELVAKCA

Sbjct: 100  FDSFTDRTPVFGDAVPADSLFFNSLFSLRGQKWRDMRSTLSPAFTGSRMRHMAELVAKCA 159

 

Query: 787  TSMTDFIHSEAKAGRRLEFNMKDTFSRFVCDAIASVAFGIEVDSFRDPENEFYKKGNESQ 966

            TSMTDFIHSEAKAGRRLEFNMKDTFSRFVCDAIASVAFGIEVDSFRDPENEFYKKGNESQ

Sbjct: 160  TSMTDFIHSEAKAGRRLEFNMKDTFSRFVCDAIASVAFGIEVDSFRDPENEFYKKGNESQ 219

 

Query: 967  KIHTFKSLATFVTLRFVPFLQKVFNFDFVDANVAGYFKKLILDNMDQRKKQGIVRNDLVN 1146

            KIHTFKSLATFVTLRFVPFLQKVFNFDFVDANVAGYFKKLILDNMDQRKKQGIVRNDLVN

Sbjct: 220  KIHTFKSLATFVTLRFVPFLQKVFNFDFVDANVAGYFKKLILDNMDQRKKQGIVRNDLVN 279

 

Query: 1147 MLMETKKGALKYEEPDMQVSEGYATVEESHVGKSTHSRIWTDNELISQCFFFFFAAFDNV 1326

            MLMETKKGALKYEEPDMQVSEGYATVEESHVGKSTHSRIWTDNELISQCFFFFFAAFDNV

Sbjct: 280  MLMETKKGALKYEEPDMQVSEGYATVEESHVGKSTHSRIWTDNELISQCFFFFFAAFDNV 339

 

Query: 1327 SSILAFLSYELTVNQDIQRRLYEEIAVTESTLNGQPITYEALQKMAYLDMVVSEALRKYP 1506

            SSILAFLSYELTVNQDIQRRLYEEIAVTESTLNGQPITYEALQKMAYLDMVVSEALRKYP

Sbjct: 340  SSILAFLSYELTVNQDIQRRLYEEIAVTESTLNGQPITYEALQKMAYLDMVVSEALRKYP 399

 

Query: 1507 TATLTDRYANKDYVFDDEEGLRFVIEKGKTIWIPMLALHHDPKYFPEPERFDPERFSEDN 1686

            TATLTDRYANKDYVFDDEEGLRFVIEKGKTIWIPMLALHHDPKYFPEPERFDPERFSEDN

Sbjct: 400  TATLTDRYANKDYVFDDEEGLRFVIEKGKTIWIPMLALHHDPKYFPEPERFDPERFSEDN 459

 

Query: 1687 RSKIVPGTYLPFGAGPRSCIGPRLALLEVKMALYHLVKDFNLQASEKTQIPLRLSKSAFT 1866

            RSKIVPGTYLPFGAGPRSCIGPRLALLEVKMALYHLVKDFNLQASEKTQIPLRLSKSAFT

Sbjct: 460  RSKIVPGTYLPFGAGPRSCIGPRLALLEVKMALYHLVKDFNLQASEKTQIPLRLSKSAFT 519

 

Query: 1867 MQAENGVWLELKARPKA 1917

            MQAENGVWLELKARPKA

Sbjct: 520  MQAENGVWLELKARPKA 536

 

AAGE02011007.1 1 diff to CYP9Jae9 (fourth P450 on this contig)

177820  MVEVNLFSALAVGAVILLLYHYIAKKYHYFLTKPIPCIKPTFLLGIFDMVVLKRVELVFG  177641

177640  SKLLYNSYPDAK (2) 177605

177541  IIGYYELTKPTYMVRDPEMIKKIAIKD  177461

177460  FDSFTDRTPVFGDAVPADSLFFNSLFSLRGQKWRDMRSTLSPAFTGSRMRHMAELVA  177290

177289  KCATSMTDFIHSEAKAGRRLEFNMKDTFSRFVCDAIASVAFGIEVDSFRDPENEFYKKGN  177110

177109  ESQKIHTFKSLATFVTLRFVPFLQKVFNFDFVDANVAGYFKKLILDNMDQRKKQGIVRND  176930

176929  LVNMLMETKKGALKYEEPDMQVSEGYATVEESHVGKSTHSRIWTDNELISQCFFFFFAA  176753

176752  FDNVSSILAFLSYELTVNQDIQRRLYEEIAVTESTLNGQPITYEALQKMAYLDMVVSEAL  176573

176572  RKYPTATLTDRYANKDYVFDDEEGLRFVIEKGKTIWIPMLALHHDPKYFPEPERFDPERF  176393

176392  SEDNRSKIVPGTYLPFGAGPRSCIGPRLALLEVKMALYHLVKDFNLQASEKTQIPLRLSK  176213

176212  SAFTMQAENGVWLELKAR  176159

 

>CYP9J2 TC64859 TC11586 TC5014 TC50571 AF329892 complete

50% to 9J4

96% to 9J1, 98% to AY064093

MVEVNLFSALAVGAVILLLYHYIAKKYHYFLTKPIPCIKPTFLL

GIFDMVVLKRVELVFGSKLLYNSYPDAKIIGYYELTKPTYMVRDPEMIKKIAIKDFDS

FTDRTPVFGDAVPADSLFFNSLFSLRGQKWRDMRSTLSPAFTGSRMRYMAELVVKCAT

SMTDFIHSEAKAGRRLEFNMKDTFSRFVCDAIASVAFGIEVDSFRDPENEFYKKGNET

QKIHTFKSLATFVTLRFVPFLQKVFNFDFVDANVAGYFKKLISDNMDQRKKQGIVRND

LVNMLMETKKGALKYEEPDLQVSEGYATVEESHVGKSTHSRIWTDNELISQCFFFFFA

AFDNVSSILAFLSYELTVNQDIQRRLYEEIAATESTLNGQPITYEALQKMAYLDMVVS

EALRKYPTATLTDRYANKDYVFDDEEGLRFVIEKGKTIWISMLALHHDPKYFPEPERF

DPERFSEDNRSKIVPGTYLPFGAGPRSCIGPRLALLEVKMALYHLVKDFNLQPSEKTQ

IPLRLSKSAFTMQAENGVWLELKARPKA

 

>CYP9J15 AY064093 complete 98% to 9J2 50% to 9J4

MVEVNLFSALAVGAVILLLYHYIAKKYHYFLTKPILCIKPTFLL

GIFDMVVLKRVELVFGSKLLYNSYPDAKIIGYYELTKPTYMVRDPEMIKKIAIKDFDS

FTDRTPVFGDAVPADSLFFNSLFSLRGQKWRDMRSTLSPAFTGSRMRHVAELVAKCAT

SMTDFFHSEAKAGRRLEFNMKDTFSRFVCDAIASVAFGIEVDSFRDPENEFYKKGNET

QKVHTFKSLTTFVTLRFVPFLQKVFNFDIVDANVAGYFKKLILDNMDQRKKQGIVRND

LVNMLMETKKGALKYEEPDLQVSEGYATVEESHVGKSTHSRIWTDNELISQCFFFFFA

AFDNVSSILAFLSYELTVNQDIQRRLYEEIAATESTLNGQPITYEALQKMAYLDMVVS

EALRKYPTATLTDRYANKDYVFDDEEGLRFVIEKGKTIWISMLALHHDPKYFPEPERF

DPERFSEDNRSKIVPGTYLPFGAGPRSCIGPRLALLEVKMALYHLVKDFNLQPSEKTQ

IPLRLSKSAFTMQAENGVWLELKARPKA
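
The "N diffs" and "% to" notes in these entries can be checked by comparing residues position by position. The following is a minimal sketch, assuming two peptides of equal length with no gaps; the function name `aa_diffs` is hypothetical and was not used to produce the numbers in this file. It is shown on the first line of CYP9J2 and CYP9J15 above.

```python
def aa_diffs(seq_a, seq_b):
    """List (position, residue_a, residue_b) for each mismatch; positions are 1-based.

    Assumes equal-length, ungapped peptides (no alignment is performed).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length (no gap handling here)")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b
    ]


def percent_identity(seq_a, seq_b):
    """Percent of identical positions between two equal-length peptides."""
    return 100.0 * (len(seq_a) - len(aa_diffs(seq_a, seq_b))) / len(seq_a)


# First sequence line of CYP9J2 and of CYP9J15, copied from the entries above
cyp9j2_start = "MVEVNLFSALAVGAVILLLYHYIAKKYHYFLTKPIPCIKPTFLL"
cyp9j15_start = "MVEVNLFSALAVGAVILLLYHYIAKKYHYFLTKPILCIKPTFLL"

diffs = aa_diffs(cyp9j2_start, cyp9j15_start)
print(diffs)
print(round(percent_identity(cyp9j2_start, cyp9j15_start), 1))
```

Full-length comparisons between genes of different length would need a real alignment first (for example the BLAST output quoted elsewhere in this file), so this sketch only suits near-identical sequences such as allelic variants.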

 

$$$$$$$

 

>AAGE01172381 AAGE01048909 TC52199 TC17152 TC22373 TC36113 72% to TC60679

575142964 494348902 trace archive seqs

join with TC60874 TC20300 TC25446 TC48456

complete, 57% to CYP9J2, 57% to 9J1 56% to 9J15

MLEVNLFAAIAVGALILAVYHHISKRYQYFLSKPVPCMKPTFLVGNSGPMLTRKKDIA

SHIRTMYDTYPDAKMIGFYDLTKPVYLVRDPEVVKTMTVKDFEHFTDHTPTMTGTGEE

VSEKSLFGNSLFALRGQKWRDMRSTLSPAFTGSKMRHMFELVVECGQSMAEFLLSEAK

AGKRLEFEMKDIFTRFGNDVIATVAFGIKVDSMRDRENEFYMKGKQLLNFQRFTLMIK

FLLMRAMPALAEKLGADFVDAEAGKYFTGVIMENMKQRKAHGIVRNDMIHMLMEVRKG

ALKHEKGEQETKDAGFATVEESQVGKTTHSRIWKDNELVAQCFIFFLAGFDTLSTGLT

FLTYELALNPEIQQRLYEEVMETESNLDGKPLTYE

VLQQMKYMDMVISESLRKWPPGIVADRYCTKEYQFKDGP

GSFLIEKGTSLWIPTIAIHNDPRYYPNPDKFDPERFSDENKSKINPAAYIPFGVGPRNCI

GSRLALMEMKSVVYYLLREFSFEPTEKTQIPLKLTMSGFTLQGEKGVWLEFKPRSI*

 

AAGE02011007.1 Length=228841 use this seq part of a large gene cluster

With 6 genes and one pseudogene on this contig (second P450 on contig)

Others are at 143937 (+)=CYP9Jae7, 164192 (-) = AAGE01021462, 177820 (-) = 9J2, 196721 (+)= 9J9v1 (6 diffs), 218249 (+) = 9J10

220106 (+) pseudogene N-term exon 1 only new

153975 MLEVNLFAAIAVGALILAVYHHISKRYQYFLSKPVPCMKPTFLVGNS

       GPMLTRKKDIASHIRTMYDTYPDAK (2) 154190

154250 MIGFYDLTKPVYLVRDPEVVKTMTVKDFEHFTDHTPTMTGTGEEVSEKSLF

GNSLFALRGQKWRDMRSTLSPAFTGSKMRHMFELVVECGQSMAEFLISEAKAGKRLEF

EMKDTFTRFGNDVIATVAFGIKVDSMRDRENEFYMKGKQLLNFQRFTLMIKFLLMRAM

PALAEKLGADFVDAEAGKYFTGVIMENMKQRKAHGIVRNDMIHMLMEVRKGALKHEKG

EQETKDAGFATVEESQVGKTTHSRIWKDNELVAQCFIFFLAGFDTLSTGLTFLTYELA

LNPEIQQRLYEEVMETESNLDGKPLTYEVLQQMKYMDMVISESLRKWPPGIVADRYCT

KDYQFKDGPGSFLIEKGTSLWIPTIAIHNDPRYYPNPDKFDPERFSDENKSKINPAAY

IPFGVGPRNCIGSRLALMEMKSVVYYLLREFSFEPTEKTQIPLKLTMSGFTLQGEKGV

WLEFKPRSI* 155653
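
The gene positions listed in the note for this contig ("position (strand) = name") can be turned into a small table for sorting or filtering by strand. This is only a sketch: the regular expression and the function name `parse_contig_note` are assumptions made for illustration, and entries without an "=" (such as the pseudogene at 220106) are deliberately skipped.

```python
import re

# position, strand in parentheses, "=", then the first word of the gene name
ENTRY = re.compile(r"(\d+)\s*\(([+-])\)\s*=\s*(\w+)")


def parse_contig_note(note):
    """Return a list of (position, strand, name) tuples from a contig note line."""
    return [(int(pos), strand, name) for pos, strand, name in ENTRY.findall(note)]


# Note text copied from the AAGE02011007.1 entry above
note = (
    "143937 (+)=CYP9Jae7, 164192 (-) = AAGE01021462, 177820 (-) = 9J2, "
    "196721 (+)= 9J9v1 (6 diffs), 218249 (+) = 9J10"
)

genes = parse_contig_note(note)
minus_strand = [name for _, strand, name in genes if strand == "-"]

for pos, strand, name in genes:
    print(pos, strand, name)
```

Trailing remarks such as "(6 diffs)" are dropped because only the first word after "=" is kept as the name.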

 

>AAGE01021462 82% to AAGE01172381 821673483 574004088 637743693 complete

AAGE01075209 only 4 aa diffs to 574004088

578 MEVDLFAAIAVGALILAVYHHLLKRYQYFLTKPVSCVKPSFPMGSSGVMLTRKRDIFSHI 399

398 QMMYNTYPDAK (2) 366

304 IMGFYDFTKPVYMIRDPEVIKRITVKDFDHFIDHTPSMTGQGEEPGENSLLGNTLFALR 128

127 GQKWRDMRSTMSPAFTGSKMRHMFELVAESGQSTAKFLLAEA 2

    KARKRLEFEMKDTFTRFGNDVIATVAFGIKVDSMRDRDNEFYMKGKQLLNFQTFL

LKIKFIMMRAMPTLA

EKLGVDLLDAEAVKYFKGMILENMKQRKAHGIIRNDMIHMLMEVRKGALKHEKDEQDTKD

AGFATVEESQVGKTTHSRIWKDNELVAQCFIFFVAGFDTVSTGLTFLAYELALNPEIQQR

LYEEIIETETTLEGKSLTYEVLQKMKYLDMVVSEGLRKWPAGI

LGDRYCTKDYQYKDAAGSFVIEKGTSLWIPTIAIHNDPQYYPNPEKFDPERFSDENKSKI

NPFAYMPFGVGPRNCIGSRLALMEMKLIMYYLLREFSFEPTEKTQIPLKLVMSGFALQGE

KDVWLEFKPRAL*

 

AAGE02011007.1 = old AAGE01021462 (third P450 on this contig)

164192  MEVDLFAAIAVGALILAVYHHLLKRYQYFLTKPVSCVKPSFPMGSSGVMLTRKRDIFSHI  164013

164012  QMMYNTYPDAK (2) 163980

163918  IMGFYDFTKPVYMIRDPEVIKRITVKDFDHFIDHTPSMTGQGEEPGENSLLGNT  163757

163756  LFALRGQKWRDMRSTMSPAFTGSKMRHMFELVAESGQSTAKFLLAEAKARKRLEFEMKDT  163577

163576  FTRFGNDVIATVAFGIKVDSMRDRDNEFYMKGKQLLNFQTFLLKIKFIMMRAMPTLAEKL  163397

163396  GVDLLDAEAVKYFKGMILENMKQRKAHGIIRNDMIHMLMEVRKGALKHEKDEQDTKDAGF  163217

163216  ATVEESQVGKTTHSRIWKDNELVAQCFIFFVAGFDTVSTGLTFLAYELALNPEIQQRLYE  163037

163036  EIIETETTLEGKSLTYEVLQKMKYLDMVVSEGLRKWPAGILGDRYCTKDYQYKDAAGSFV  162857

162856  IEKGTSLWIPTIAIHNDPQYYPNPEKFDPERFSDENKSKINPFAYMPFGVGPRNCIGSRL  162677

162676  ALMEMKLIMYYLLREFSFEPTEKTQIPLKLVMSGFALQGEKDVWLEFKPRAL*  162518
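
In the coordinate-annotated blocks, each line gives a start coordinate, the translated residues, and an end coordinate, so the span should equal three nucleotides per residue. The sketch below checks that for a single line. It assumes that a trailing "*" counts as one codon and that an intron phase marker such as "(2)" adds no nucleotides, which holds for the lines shown here but may not hold for every entry; `span_matches` is a hypothetical helper name.

```python
import re

# start coordinate, residues (optionally ending in '*'), optional "(phase)", end coordinate
COORD_LINE = re.compile(r"^\s*(\d+)\s+([A-Z*]+)\s*(?:\((\d)\))?\s+(\d+)\s*$")


def span_matches(line):
    """Return True or False for a coordinate line, or None if the line does not parse."""
    m = COORD_LINE.match(line)
    if m is None:
        return None
    start, residues, end = int(m.group(1)), m.group(2), int(m.group(4))
    span = abs(end - start) + 1  # works for both (+) and (-) strand numbering
    return span == 3 * len(residues)


# Two lines copied from the AAGE02011007.1 block above (minus strand)
first_exon = "164192  MEVDLFAAIAVGALILAVYHHLLKRYQYFLTKPVSCVKPSFPMGSSGVMLTRKRDIFSHI  164013"
exon_end = "164012  QMMYNTYPDAK (2) 163980"

print(span_matches(first_exon), span_matches(exon_end))
```

A False result does not necessarily mean an error in the sequence; lines that join fragments from different contigs or that omit one coordinate will not satisfy this simple check.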

 

>AAGE01005986 476365476 825769964 631521475 578081381 476406576 591384997

51% to TC52199 59% to 9J3 complete

MDFTFWGVMAAAAIGIGLIYRYMTRNYFYFADKPIPFLEPVFAIGNLGPLLMKKRDIFEHFRWLYNRFPNDK (2)

IFGMFSMSDPVFMIRDPAMLKRIAVKDFDHFADHSGLGGDTELDNPHMLVLNTLVALRG

NKWRDMRATLSPAFTGSKMRQMFALIAECGQRMVEFYKGAEEGSRIEVEAKEMFSRFTND

VIATTAFGIEVDSFRQPENEIFSLGKAVMQPSGLLNTLKGIGYVLFPKLMVKMNVDFLSK

KDDQFFRGTIQETMRIRQEKSIFRPDMIELLIQAKKGNLKHSADKQSEVEAFSAAEESQVGRRSHDRTWTDD

ELIAQALIFFSAGFETVSTTLSFVAYELARNDDVQSRLYEEILETNRSLDGKILSYEALQ

AMPYMDMVVSETMRLWPIGTIVDRLCVKDYVYDDGQGCRFTIEKGRSVMGSVIGMHHDPK

YYPQPEKFDPERFSAENRRNINPDTYLPFGIGPRNCI ()

GSRFALMEMKAVVYYLLLNLSFDVTEKTQIPLKMQKSPSRFVSEKGIWIALKPRVTVV*

 

AAGE02011007.1 1 diff to AAGE01005986 (first P450 on this contig)

143916  MDFTFWGVMAAAAIGIGLIYRYMTRNYFYFADKPIPFLEPVFAIGNLGPLLMKKRDIFEHFRWLY  144110

144111  NRFPNDK (2) 144131

144498  IFGMFSMSDPVFMIRDPAMLKRIAVKDFDHFADHSGLGGDTELDNPHMLVLNTLVALR  144671

144672  GNKWRDMRATLSPAFTGSKMRQMFALIAECGQRMVEFYKGAEEGSRIEVEAKEMFSRFT  144848

144849  NDVIATTAFGIEVDSFRQPENEIFSLGKAVMQPSGLLNTLKGIGYVLFPKLMVKMNVDFL  145028

145029  SKKDDQFFRGTIQETMRIRQEKSIFRPDMIELLIQAKKGNLKHSADKQSEVEAFSAAEE  145205

145206  SQVGRRSHDRTWTDDELIAQALIFFSAGFETVSTTLSFVAYELARNDDVQSRLYEEILET  145385

145386  NRSLDGKILSYEALQAMPYMDMVVSETMRLWPIGTIVDRLCVKDYVYDDGQGCRFTIEKG  145565

145566  RSVMGSVIGMHHDPKYYPQPEKFDPERFSAENRRNINPDTYLPFGIGPRNCI (1)  145721

145788  GSRFALMEMKAVVYYLLLNFSFDVTEKTQIPLKMQKSPSRFVSEKGIWIALKPRVTVV*  145964

 

>476384387 587659684 832450214 519879482 82% to AY433038 54% to 9J5 complete

note this seq is upstream of 9J6 on AAGE01000868 (4723-3066) on opp. strand

MEINLELWIAVISIGILLYKWITRNNDYFHEKPIPSMAVKPFFGGIAPLVFKSFSMNGFISHIYQKYPNVK (2)

VFGFFDALTPIFVVRDPELIKKITVKDFDHFIDHLPMFGNSENDNPYSIFGKTLFAL

TGKKWRQMRATLSPAFTGSKMRKMFELVIECSDSVAQFYKTQSNETHEVELTDLLTRF

GFDVIASCAFGIRMDSLRDRDNDFYNNGIKMRRFQRLSVAIRFVMFKFCPTLMGKLGIDV

IDRDQVRYFSALIKDAVKQ

RQTKDIIRHDMIQLLIQARKGTLKHQEEKEVEEGFATVKESSIGKTNVTFNMTDNEMI

AQAFVFFLAGFETVSTALTFLIHDLVMNKDVQHRLYEEVASTHEYLQGKHLNYDTLQKMK

YMDMVVSESMRMRPAGPFMDRVCIHDYDLDDGQGLKFTIDKGTAVWIPVQGIHMDPKYYP

NPERFDPERFNDENKAAINPMTYLPFGIGPRNCIGS

RFALMEIKAIVYYLLLHFSFEANRKTQIPLKLRKGFTVVAAEGEVWIDLKAR*

 

>CYP9J6 AY433038 AAGE01000868 (10298-11954) 520111339 528815988

616367213 644315757 56% to AY431970 56% to 9J5 complete

MEVNLGAVIVILSTVILIYKWITRNNDYFHEKPIPSMAVKPLFGSTGPLILKQFSLHGFINHIYQKYPNAK (2)

VLGIFDALTPIFVVRDPELIKKIAVKDFDHFIDHRPMFGNSENDN

PYSIFGKTLFALEGQKWRDMRATLSPAFTGSKMRKMFELVIECSDSVAQYYVKQSKKVVE

VELTDMCTRFGSDVIATCAFGIKMDSLRERDNEFYDNGKKMMRFERLSVALRMFAFKFFP

TLMGQMGIDIIDREQAKYFSALIMDAVRQRQTKGITRPDMIQLLIQARKGTLKHQEEKEV

EEGFATVKESSIGKTNVSFNMTDNEMIAQAFVFFLAGFETVSTTLTFLIYDLVV

NKDVQQRLYEEIVATNDSLQG

KLLNYDTLQKMKYLDMVLSESMRIRPAAATLDRLCVRDYEVDDGQGLKFTINKGTAVWIP

TQGIHMDPMYYPNPERFDPERFNDENKATIDPMTYLPFGVGPRNCIGSRFALMEIKAIVY

YLLLHFSFEANRKTPIPLELRKGFTIVAAEGEVWIDMKAR*

 

>AAGE01063458 263503628 49% to TC52199 48% to CYP9L3 476324061

DR747831 520184164 820336301 223394438 51% to CYP9J1 476322739 complete

MEVNLLLLLIIVGILGVIYRQVKKHYDYFHDKPIPSMATVPLLGSTGPLMTKRCTFNDFIQTIYYKYPSAKV

FGLFDMTTKMFVLRDPEVIKKITVKDFEYFVDRRPLFGANKEDDGNENIL

FNKTLVGMVDQRWRDMRAILSPAFTGSKMRAMFELIEQYCTQMVPILKEQSAESGYVDYE

MKDFFSRVANDIIATCAFGLQVESLKSRDNEFYTMGKQMMNFNRFIVLLRVMGLRFFPSL

MIKMGVDIVDREQNQYFSKIIKEAVRARETHGIVRPDMIHLLMQARKGTLKHQQETTEST

AGFATVEESDVGKSVVSKTMSEPEFIAQCLIFFLAGFDTVSTGMLFMAYELDLNPNIQQK

LYEEIAQTNKELGGKPATYDTLQKMKYMDMVVS

ESLRMWPVAAFDRKCGRDYVLDDGAGLKFTIDAGTCIWVPVYGIHRDPKYYPNPDKFEPE

RFSDENRGKIDMTMYMPFGMGPRNCIGSRFALMEIKAIMYALLLNFSIERNEKTQVPLKL

VKGFVGLQVENGLHLRFKKRK*

 

>CYP9J9v1 AAGE01125862 AAGE01449675 TC60679 TC19466 TC24864 TC43056

59% to CYP9J1 AY431945 DR747470 91% to TC60950,

90% to TC60951 588932055 complete

MVEVDLYVAVAIGAIILLLYHYGSKKYEYFLTKPIPALKPTFLLGNT

GAMMFRRRDVSAHVKLLYNSLEGYK (2)

VAGFYDLMKPIYM

LRDPEVIKQIAVKDFDYFMDHTPTMTNNRADDEVGGDSLFGNSLFALRGQK

WRDMRATLSPAFTGSKMRHMFELVADCAKSMAEFFKSEAAAGKKLEYEMKDTFSRFGNDV

IATVAFGIKVDSLRDRDNEFYMKGKNMLNFQSVSVMFKFLLLRAFPKLSQKIGVDFVDST

LTEYFKGMIVDNMKQRDAHNIFRNDMIQMLMEVRKGSLKHQKDEKETKDAGFATVEESNV

GKSTINRVWTENELIAQCFLFFLAGFDTVSTCMTFLTYELMLNPDIQQRLFDEVMETEES

LNGKPLTYEVLQRMEYMDMVVSEALRKWPPAVVSDRFCVKNYMYDDGKGTRFPIEKGQTM

WIPTIAIHSDPRYYENPEKFDPERFNEENRSKIDTGAYLPFGVGPRNCIGSRLALMEVKV

IIYNLLKEFSLEASEKTQIPLKMAKNFFALQAENGVWLELKPRKH*
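
Numbers in parentheses mark intron phases, so a protein entry can be split into exon-encoded segments at those markers. The following sketch assumes a marker always ends its line and treats "(2?)" as phase 2 and "()" as unknown; the function name `split_exons` is hypothetical.

```python
import re

# "(2)", "(2?)" or "()" at the end of a line
PHASE_MARK = re.compile(r"\((\d?)\??\)\s*$")


def split_exons(lines):
    """Join sequence lines into exon segments, splitting at intron phase markers.

    Returns (exons, phases); phases[i] is the phase of the intron after exons[i],
    or None when the marker carries no number.
    """
    exons, phases, current = [], [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # the file puts a blank line between sequence lines
        m = PHASE_MARK.search(line)
        if m:
            current.append(line[: m.start()].strip())
            exons.append("".join(current))
            phases.append(int(m.group(1)) if m.group(1) else None)
            current = []
        else:
            current.append(line)
    if current:
        exons.append("".join(current))
    return exons, phases


# First three sequence lines of the CYP9J9v1 entry above
entry_lines = [
    "MVEVDLYVAVAIGAIILLLYHYGSKKYEYFLTKPIPALKPTFLLGNT",
    "",
    "GAMMFRRRDVSAHVKLLYNSLEGYK (2)",
    "",
    "VAGFYDLMKPIYM",
]

exons, phases = split_exons(entry_lines)
print(len(exons), phases)
```

Header and note lines would need to be filtered out before calling this on a whole entry, since the sketch treats every non-blank line as sequence.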

 

AAGE02011007.1 6 diffs to 9J9v1 (fifth P450 on this contig)

196721  MVEVDLYVAVAIGAIILLLYHYGSKKYEYFLTKPIPALKPTFLLGNTGAMMFRRRDVSAH  196900

196901  VKLLYNSFEGYK (2) 196936

196999  VAGFYDLMKPIYMLRDPEVIKQIAVKDFDYFMDHTPTMTNSKADDEVGGDSLFGNSLFA  197175

197176  LRGQKWRDMRATLSPAFTGSKMRHMFELVADCAKSMAEFFKSEAAAGKKLEYEMKDTFSR  197355

197356  FGNDVIATVAFGIKVDSLRDRDNEFYMKGKNMLNFQSVSVLFKFLLLRAFPKLSQKIGVD  197535

197536  FVDSNLTEYFKGMIVDNMKQRDAHNIFRNDMIQMLMEVRKGSLKHQKDEKETKDAGFATV  197715

197716  EESNVGKSTINRVWTENELIAQCFLFFLAGFDTVSTCMTFLTYELMLNPDIQQRLFDEVM  197895

197896  ETEESLNGKPLTYEVLQRMEYMDMVVSEALRKWPPAVVSDRFCVKNYMYDDGKGTRFPIE  198075

198076  KGQTMWIPTIAIHSDPRYYENPEKFDPERFNEENRSKIDTGAYLPFGVGPRNCIGSRLAL  198255

198256  MEVKVIIYNLLKEFSLEASEKTQIPLKIAKNFFALQAENGVWLELKPR  198399

 

>AAGE01065801 AY431970 60% to 9J5 494336054 641818007 578928176 complete

AAGE02029679.1 use this seq change D to E

MEVELLHVGVLVAIVAFLYRWITRNNDYFHDKPIPSMAVTPFLGASGPLLLRKVTFNDFVQSIYNKYPGVK (2)

VFGMFETITPFFVIRDPELIKQIGIKDFDHFVDHRPTFGLDDETAEHPKALFRKTLFSM

TGQRWKEMRATLSPAFTGSKMRQMFSLMSECCDEMMKHYLDKAKGSG

RVEVEMKDLLSRISINVIASCAFGIKVDCFKEQEHEFLYHGRKMMGFGRPIVIARMLAMR

VFPKFAAKFGIDLLDREQANYFTHVFQETIRARESHGYIRHDMIDLLLQARKGTLKYQEE

KDDQEGFATVQESDVGKADVSKSMTEAEMIAQCLIFFLGGFDTVSTCAMFTAYELVRNPE

VQHKLYEEIKQTEKELEGKPLSYDALQKMKYMDMVVSETLRMWPLAPATDRLCTQDYTID

DGQGVRFTIDKGTCVWFPAAGLHHDPQYFPNPEKFDPERFNDENKR

NINLGAYLPFGIGPRNCIGSRFALMEVKAVMYFILLKFSFVRGAKTQIPMQLRKGFTNLG

PENGMHVELKLR*

 

>AAGE01006393 81% to AAGE01065801 83% to AY431970 complete

AAGE01400897 84% to AAGE01065801 834914646 N-term 578891344 C-term

749 MEVNLVYLAVVLAVIAYLYRWITRNNDYFHDKPIPSMAVRPFLGASGSLVLRKVSFPDFI 570

569 QTIYNKFPGVK (2) 537

474 VFGMFETITPFFVIRDPELIKQIAIKDFDHFVDHRPTFGLFDEESAEH 331

330 PNALFRKTLFSMTGQRWKEMRATLSPAFTGSKMRLMFSLMGECFDGMIDHYVKKAKTSGR 151

150 VEVEVKDMMSRVSINVIASCAFGIKVDCFKDQDHEFL 181

 182 RHGKKMMDFARPIVIARMMAMRVFPKLSSRFGIDLLDPEQARYFTQVFQETIKARESHGT 361

 362 VRNDMIDLLLQARKGTLKFQEEKNDQEGFATVQESDMGKVEVMKHITESEMIAQCLVFFL 541

 542 GGFDTVSTCAMFMAYELVRSPEVQQKLYEEVLETSKELAGKPLSYDALQKMKYMDMVVSE 721

 722 TLRIWPLAPATDRLCTKDYTIDDGQGLKFTIDKGTCVWFPAAGLHHDPQYFPNPERFDPE 901

 902 RFNDENKRNINLGAYLPFGIGPRNCIGSRFALMEVKAVMYYTLLKFTIVRSAKTQIPMQL 1081

1082 RKGFTNLGPEKGMHVELKLR*

 

AAGE02029680.1 Same as above use this seq

86528  MEVNLVYLAVVLAVIAYLYRWITRNNDYFHDKPIPSMAVRPFLGASGSLVLRKVSFPDFI  86707

86708  QTIYNKFPGVK (2) 86740

86803  VFGMFETITPFFVIRDPELIKQIAIKDFDHFVDHRPTFGLFDEESAEH  86946

86947  PNALFRKTLFSMTGQRWKEMRATLSPAFTGSKMRLMFSLMGECFDGMIDHYVKKAKTSGR  87126

87127  VEVEVKDMMSRVSINVIASCAFGIKVDCFKDQDHEFLRHGKKMMDFARPIVIARMMAMRV  87306

87307  FPKLSSRFGIDLLDPEQARYFTHVFQETIKARESHGTVRNDMIDLLLQARKGTLKFQEEK  87486

87487  NDQEGFATVQESDMGKVEVTKQITESEMIAQCLVFFLGGFDTVSTCAMFMAYELVRNPEV  87666

87667  QQKLYEEVLETSKELAGKPLSYDALQKMKYMDMVVSETLRIWPLAPATDRLCTKDYTIDD  87846

87847  GQGLKFTIDKGTCVWFPAAGLHHNPQYFPNPERFDPERFNDENKRNINLGAYLPFGIGPR  88026

88027  NCIGSRFALMEVKAVMYYTLLKFTIVRSAKTQIPMQLRKGFTNLGPEKGMHVELKLR  88197

 

>AAGE01179692 AAGE01102574 AAGE01259804 (3 aa diffs)

476398393 616358813 575550118 584317339

67% to TC60679 in 9J fam

53% to 9J4 63% to 9J9 complete

AAGE01266366 parts of two genes

MAAAVLVAVLLFCRYVAKKYQYFLTK

PVPCVKPTFLLGSSGPTIFRKVDVATHFKKIYDVFPQAP

VIGFYDFTTPMYLLRDPEMIKKVSIKDFDYFTDHVPMMPTDAEKEHNPDTLFGNT

LLSLRGQKWRDMRSTLSPAFTGSKMRHMFELVAECGRSLVEHFKAEAAAGRTMEHEMKET

FSKVGSDLIATLAFGIKVDSLREPENVFYANGKKMLNLKSLATFVKFLLITFVPRLMRWLKVDVLNGQSAAY

FKRIILDNMEQREAHKILRNDMIQILMEVRKGTLQHQKEEKDTKDAGFATVEESQVGKSS

HSRVWTENELVAQCLLFFLAGLDTISTCMT

FLTYELTVDPDIQQRLYEEITETYKSLNGKPLSYDVLQRMQYMDMIVSETLRKWPPGVIS

NRYCNKNYLYDDGRGTQFVIEKGQVILIPSYCIQRDPRYFPDPDRFDPERFNEANRAQIN

TSAYIPFGVGPRNCIGSRLALMEVKCMVYYLLKDFELIATGKTQIPERIARDSFGLHPEK

GVWIEFKPRSSQDS*

 

>AAGE01198792 (parts of two genes, 1-804 is N-term of AAGE01339434,

1707-973 is C-term of another gene) 95% to 476398393 

574155449 638535809 complete

MLKVDLFMAAALLAAVLLFCRYVAKKYQYFLTKPVPCVKPTFLLGSSGPTIFRKVDV

ATHFKKIYDVFPQAP (2)

VIGFYDFTTPMYLLRDPEMIKKVSTKDFDYFTDHVPMMPTDAEKEHGPETLFGNTLL

SLRGQKWRDMRSTLSPAFTESKMRHMFELVAECGRSLVEHF

QTEAAAGRTMVHEMKETFSKVGSDLIATLAFGIKVDSLREPENVFYANGKKMLNLKSLAT

FVKFLLIMFVPRLMRWLKVDVLNGQSAAYFKRMILDNMEQREAHKILRNDMIQILMEVRK

GTLQHQKEE

1707 KDTKDAGFATVEESQVGKSSHSRVWTETELVGQCLLFFLAGLDTISTCMTFLTYELTVDP 1528

1527 DIQQRLYEEITETYKSLNGKPLSYDVLQRMQYMDMIVSETLRKWPPGVISNRYCNKNYLY 1348

1347 DDGRGTQFVIEKGQVILIPSYCIQRDPRYFPDPDRFDPERFNEANRAQINTSAYIPFDVG 1168

1167 PRNCIGSRLALMEVKCMVYYLLKDFELIATGKTQIPERIARDSFGLHPEKGVWIEFKPRNSPDF* 973

 

>AAGE01102574 possible pseudogene fragment upstream of AAGE01179692

AAGE02011008.1 pseudogene piece

1906  T*LII*KGHTAFISTYRSHHDP*YYENPEQFDPEWFNEAYRAHISTNIYIPFEFRPRNCI  2085

2086  GSRLALIEVKRM  2121

2121  VYQHLKDFETVTTEPIPVRIARDPLALHPEKGVRAELK  2234

 

AAGE02011008.1  use this seq (first P450 on contig)

Note 6 P450 genes in this contig at 2k, 4k= AAGE01339434, 6k=9J8,

16k= AAGE01007189, 19k= CYP9LaeP 494089659, 31k = 9Jnew all (+)

2420  MAAAVLVAVLLFCRYVAKKYQYFLTKPVPCVKPTFLLGSSGPTIFRKVDVATHFKKIYDV  2599

2600  FPQAP (2) 2614

2669  VIGFYDFTTPMYLLRDPEMIKKVSIKDFDYFTDHVPM  2779

2780  MPTDAEKEHNPDTLFGNTLLSLRGQKWRDMRSTLSPAFTGSKMRHMFELVAECGRSLVEH  2959

2960  FKAEAAAGRTMEHEMKETFSKVGSDLIATLAFGIKVDSLREPENVFYANGKKMLNLKSLA  3139

3140  TFVKFLLITFVPRLMRWLKVDVLNGQSAAYFKRIILDNMEQREAHKILRNDMIQILMEVR  3319

3320  KGTLQHQKEEKDTKDAGFATVEESQVGKSSHSRVWTENELVAQCLLFFLAGLDTISTCMT  3499

3500  FLTYELTVDPDIQQRLYEEITETDKSLNGKPLSYDVLQRMQYMDMIVSETLRKWPPGVIS  3679

3680  NRYCNKNYLYDDGRGTQFVIEKGQVILIPSYCIQRDPRYFPDPDRFDPERFNEANRAQIN  3859

3860  TCAYIPFGVGPRNCIGSRLALMEVKCMVYYLLKDFELIATEKTQIPVSIARDSFGLHPEK  4039

4040  GVWIEFKPRSSQDS* 4084

 

>AAGE01339434 AAGE01406122 775439256 579949790 521969711 complete 61% to 9J10

AAGE01266366 parts of two genes

AAGE01198792 (parts of two genes, 1-804 is N-term of this gene,

1707-973 is C-term of another gene 95% to 476398393)

NABOD09TR  NABOD09 TC54929 TC20193 TC31509 TC43101 TC5368 TC8501

MEVDLLSAFAVGCIVILIYHYASQKYLYFLTKPIPSLKPTFLVGNIGDIIFRTKDALTHINELYYAFPESK (2?)

VVGFYELTKPVFMLRDPEVIKQITVKDFDHFMDRS

LPSANDRADTDQPVEGLFANSLVAFQGQKWKDMRSTLSPAFTGSKIRHMFDLVADCSRSM

VEHFRSEANAGRRLECELKDVFSRFCNDVIATVAFGIRVDSVRDPETEFYVKGKQLLDFQ

SPKIILKFLLFQTVPWLMRKLKVDFADADLADYFKGIIQ

DNMKQREVHGIVRNDMVQMLMEVRKGTLKHISDDRESKDSGFASVEESHFGKSTHSRAWT

DNELISQCFVFFIAGLDTVSSCLTFLTYELTLNPDIQKRLYEEVMDTERLLSEKPLSYEA

LQSMKYLDMVVSETLRKWPPTIDSDRYSTRDYLLDDGAGLKVPIEKGRSIYIPIVAIQND

PKYFPDPDRFDPERFSDENRSKIVPGTFIPFGAGPRNCIGSRLALMEVKVAVY

YLLREFSLERTERTDDPIRLTKKAIDLRTENGAWVELKPRKI*

 

AAGE02011008.1  use this seq 6 diffs to AAGE01339434 (second P450 on contig)

4256  MEVDLLSAFAVGCIVILIYHYVSQKYLYFLAKPIPSLKPTFLVGNIGDIIFRTKDALTHINELYYAF  4456

4457  PESK (2) 4468

4537  VVGFYELTKPVFMLRDPEVIKQITVKDFDHFMDRSLPSANDRADTDQPVEGLFANSLVAF  4716

4717  QGQKWKDMRSTLSPAFTGSKIRHMFDLVADCSRSMVEHFRSEANAGRRLECELKDVFSRF  4896

4897  CNDVIATVAFGIRVDSVRDPETEFYVKGKQLLDFQSPKIILKFLLFQTVPWLMRKLKVDF  5076

5077  ADADLADYFKGIIQDNMKQREVHGIVRNDMVQMLMEVRKGTLKHIGDDRESKDSGFASVE  5256

5257  ESHFGKSTHSRAWTDNELISQCFVFFIAGLDTVSSCLTFLTYELTLNPDIQKRLYEEVMD  5436

5437  TERLLSEKPLSYEALQSMKYLDMVVSETLRKWPPTIDTDRYSTRDYLLDDGAGLKVPIEK  5616

5617  GRSIYIPIVAIQNDPKYYPDPDRFDPERFSDENRSKIVPGTFIPFGAGPRNCIGSRLALM  5796

5797  EVKVAVYYLLREFSLERTERTDVPIRLTKKAIDLRTENGAWVELKPRKI*  5946

 

>CYP9J8 AAGE01187448 AAGE01142069 AAGE01118978

476375412 832374064 810104215 758886185 262894467 complete

AAGE01439874 N-term of 9J8 and C-term of AAGE01339434

9J8 is 609bp downstream of AAGE01339434

MLDPFLLAAFAAVIFLVYHLLNRKYQFFAERGIPYVKPTLLLGNGASVLLKKEDLLQNIQRTYDTFPNAK (2)

IMGIFDFVKPIMMIRDPDAIKQIGVKDFDHFVDHTPLFTPADCEDVGTNSLFGNSLFA

LRGQKWRDMRATLSPAFTGSKMRHMFELVLDC

ARSTAEYFREEAKSGRTTEYEMKNVFSRFSTDVIGSVAFGIKVDSLREQDNDFFVKGKAM

LNFQNLKSLLKVIMLRSAPGLMNRLNVDITSPQMNAYFKDMIMDNMKQREINGIVRNDMI

NILMQVQKGALLHQKDEQDTKDAGFATVEESSVGKALHNRVWSENELVAQCFLFFLAGFD

TVSTCLTFVSYELLANPDVQQKLFEEIMAVEASLDGKPLSYEVLQKMQYLDQIISETLRL

WPPAPFVDRYCVKDYLFDDGQGTRVPIEKGQIVWFPITALHHDAKYFPEPNRFDPER

FSEQNRPKINPGAYLPFGVGPRNCIGSRFALMEVKAIVYHLVKNFTLERSGKSRVPLKLE

KSYIAMIVEGGMWLEFRPRA*

 

AAGE02011008.1 use this seq 4 diffs to CYP9J8 AAGE01187448 (third P450 on contig)

6554  MLDPFLLAAFAAVIFLVYHLLNRKYQFFAERGIPYVKPTLLLGNGASVLLKKEDLLQNIQRTYET  6748

6749  FPNAK (2) 6763

6825  IMGIFDFVKPIMMIRDPDAIKQIGVKDFDHFVDHTPLFTPADCEDVGTNSLFGNSLFAL  7001

7002  RGQKWRDMRATLSPAFTGSKMRHMFELVLDCARSTAEYFREEAKSGRTTEYEMKNVFSRF  7181

7182  STDVIGSVAFGIKVDSLREQDNDFFVKGKAMLNFQNLKSLIKVIMLRSAPGLMNRLNVDI  7361

7362  TSPQMNAYFKDMIMDNMKQREINGIVRNDMINILMQVQKGALLHQKDEQDTKDAGFATVE  7541

7542  ESSVGKALHNRVWSENELVAQCFLFFLAGFDTVSTCLTFVSYELLANPDVQQKLFEEIMA  7721

7722  VEASLDGKLLSYEVLQKMQYLDQIISETLRLWPPAPFVDRYCVKDYLFDDGQGTRIPIEK  7901

7902  GQIVWFPITALHHDAKYFPEPNRFDPERFSEQNRPKINPGAYLPFGVGPRNCIGSRFALM  8081

8082  EVKAIVYHLVKNFTLERSGKSRVPLKLEKSYIAMIVEGGMWLEFRPRA*  8228

 

>AAGE01007189 AAGE01035444 65% to 9J8 complete

3059 MFFALAIFAGLVLFCLYNVQQKYKYFESRGIPYVKPSFLLGNSAPLIFKKKDMLRHIQDLY 2877

2876 HTHPEAK (2) 2856

     IMGLFDFTAPVWMVRDPEAIKQLAVKDFDHFS 2698

2697 DHTPIYTGGDVEDMGTDSLFGNSLLLLRGQKWRDMRATLSPAFTGSRMRLMFELVSECAQ 2518

2517 SMVDYFREEATAGKRLEYEMKDVFSRFSNDVIASVAFGIKVDSLREPDNEFFINGKELMN 2338

2337 FRNMKTVAKVLLMRMFPRLMIKLKADISSAEMNAYFRGMITDNMKQRQAHGIVRNDMINI 2158

2157 LMQVRQGALKNQKEDQETSNAGFAVVEESTTIGQPRDRVWSDNELAAQCFLFFIAGSET 1981

1980 VSTYLTFLAYELLINPEVQEKLFREIAEVERSLAGKPIGYDQLQAMKYMDMVVSENLRLW 1801

1800 PPAPFADRYCSKNYRYDDGQGTRVTIEKGQIVWFPTTALQHDPEYFPDPYRFDPERFSDQ 1621

1620 NRSKIKTGTYLPFGIGPRACIGSRLALLEVKVVAYHLVKHFKLVRSERSKVPLKLKSKMI 1441

1440 GMEVDGGVWLELEPRERS* 1384

 

AAGE02029680.1 Length=88206 note 9 P450s on this contig, use this seq

13173  MFFALAIFAGLVLFCLYNVQQKYKYFESRGIPYVKPSFLLGNSAPLIFKKKDMLRHIQDL  12994

12993  YHTHPEAK (2) 12970

12907  IMGLFDFTAPVWMVRDPEAIKQLAVKDFDHFSDHTPIYTGG  12785

12784  DVEDMGTDSLFGNSLLLLRGQKWRDMRATLSPAFTGSRMRLMFELVSECAQSMVDYFREE  12605

12604  ATAGKRTEYEMKDVFSRFSNDVIASVAFGIKVDSLREPDNEFFTNGKELMNFRNMKTVAK  12425

12424  VLLMRMFPRLMIKLKADISSAEMNAYFRGMITDNMKQRQAHGIVRNDMINILMQVRQGAL  12245

12244  KNQKEDQETTNTGFAVVEESTTIGQPRDRVWSDNELAAQCFLFFIAGSETVSTCLTFLAY  12065

12064  ELLINPEVQEKLFREIAEVERSLAGKPIGYDQLQAMKYMDMVVSENLRLWPPAPFADRYC  11885

11884  SKNYRYDDGQGTRATIEKGQIVWFPTTALQHDPEYFPDPYRFDPERFSDQNRSKIKTGTY  11705

11704  LPFGIGPRACIGSRLALLEVKVVAYHLVKHFKLVRSERSKVPLKLKSKMIGMEVDGGVWL  11525

11524  ELEPRERS*  11498

 

AAGE02011008.1 use this seq 1 diff to AAGE01007189 (fourth P450 on contig)

16655  MFFALAIFAGLVLFCLYNVQQKYKYFESRGIPYVKPSFLLGNGAPLIFKKKDMLRHIQDLYHT  16843

16844  HPEAK (2) 16858

16921  IMGLFDFTAPVWMVRDPEAIKQLAVKDFDHFS  17016

17017  DHTPIYTGGDVEDMGTDSLFGNSLLLLRGQKWRDMRATLSPAFTGSRMRLMFELVSECAQ  17196

17197  SMVDYFREEATAGKRLEYEMKDVFSRFSNDVIASVAFGIKVDSLREPDNEFFINGKELMN  17376

17377  FRNMKTVAKVLLMRMFPRLMIKLKADISSAEMNAYFRGMITDNMKQRQAHGIVRNDMINI  17556

17557  LMQVRQGALKNQKEDQETSNAGFAVVEESTTIGQPRDRVWSDNELAAQCFLFFIAGSETV  17736

17737  STYLTFLAYELLINPEVQEKLFREIAEVERSLAGKPIGYDQLQAMKYMDMVVSENLRLWP  17916

17917  PAPFADRYCSKNYRYDDGQGTRVTIEKGQIVWFPTTALQHDPEYFPDPYRFDPERFSDQN  18096

18097  RSKIKTGTYLPFGIGPRACIGSRLALLEVKVVAYHLVKHFKLVRSERSKVPLKLKSKMIG  18276

18277  MEVDGGVWLELEPRERS* 18330

 

>AAGE01004684 AAGE01021948 494089659 55% to CYP9L2 223518047 590305650 827533211

569625505 575383376 520166303 520595733 637720165 65% to 263503628 complete

possible pseudogene This seq has a stop codon seen in 9 trace files

MDINSYYLLVTIVLLILIVLYRRVSKHYGYFSDKPIPSLTPIPLFGNMFPLFMKKYTFPEFIQMIYNRFPDAK (2?)

LGMFDMSTRFVVLRDPELIKKVLVKDFEFFIDRRSLFGDSASESDSILITKTLL

LLTGQKWRDMRATLSPAFTGSKMRAMFELIVTYSDRMVGILKDQAGPVGYVDYE

MKECCSRIASDIIATCAYGLEVESLANRENDFYTMGKKMMNFGKTSFFVRLLLYSVFPKLMSKLQVDLFDGEQTRYFTEIIKDTVKARD*HGIVRPDMIHLLMQARKGVLKHHRETAEASAGFATVEESEVGKTAIGKTMT

DSEFVAQCLIFFIAGFEAISSQMSFMCYELATNPDIQQKLYEEIKETNKLLKGKPLTYDTL

QQMKYMDMVTSEALRMWSGPATDRKCVRDYVLDDGAGWKFPIEAGTCVMI 721

720 PSYAIHRDPKYYPNPDRFDPERFSEERRADINMTMYLPFGAGPRNCIGSRFALMEMKAIV 541

540 YGLLLNFSIERNEKTQVPLRLNKGFAPLAGEKGMHLRLKVRG* 418

 

AAGE02011008.1 Length=35048 use this seq both ours were wrong

19802  MDINSYYLLVTIVLLILIVLYRRVSKHYGYFSDKPIPSLTPIPLFGNMFPLFMKKYTFPE  19981

19982  FIQMIYNRFPDAK(2)

       ALGMFDMSTRFVVLRDPELIKKVLVKD  20161

20162  FEFFIDRRSLFGDSASESDSILITKTLLLLTGQKWRDMRATLSPAFTGSKMRAMFELIVT  20341

20342  YSDRMVGILKDQAGPVGYVDYEMKECCSRIASDIIATCAYGLEVESLANRENDFYTMGKK  20521

20522  MMNFGKTSFFVRLLLYSVFPKLMSKLQVDLFDGEQTRYFTEIIKDTVKARD*HGIVRPDM  20701

20702  IHLLMQARKGVLKHHQETAEASAGFATVEESEVGKTAIGKTMTDSEFVAQCLIFFIAGFE  20881

20882  AISSQMSFMCYELATNPDIQQKLYEEIKETNKLLKGKPLTYDTLQQMKYMDMVTSEALRM  21061

21062  WSGPATDRKCVRDYVLDDGAGWKFPIEAGTCVMIPSYAIHRDPKYYPNPDRFDPERFSEE  21241

21242  RRADINMTMYLPFGAGPRNCIGSRFALMEMKAIVYGLLLNFSIERNEKTQVPLRLNKGFA  21421

21422  PLAGEKGMHLRLKVRG*  21472

 

AAGE02011008.1 use this seq 1 diff to AAGE01004684 (fifth P450 on contig)

pseudogene This seq has a stop codon seen in 9 trace files

19826  MDINSYYLLVTIVLLILIVLYRRVSKHYGYFSDKPIPSLTPIPLFGNMFPLFMKKYTFPEFIQMIYNR 20005

20006  FPDAK (2) 20020

20081  ALGMFDMSTRFVVLRDPELIKKVLVKDFEFFIDRR  20185

20186  SLFGDSASESDSILITKTLLLLTGQKWRDMRATLSPAFTGSKMRAMFELIVTYSDRMV  20359

20360  GILKDQAGPVGYVDYEMKECCSRIASDIIATCAYGLEVESLANRENDFYTMGKKMMNFGK  20539

20540  TSFFVRLLLYSVFPKLMSKLQVDLFDGEQTRYFTEIIKDTVKARD*HGIVRPDMIHLLMQ  20719

20720  ARKGVLKHHQETAEASAGFATVEESEVGKTAIGKTMTDSEFVAQCLIFFIAGFEAISSQ  20896

20897  MSFMCYELATNPDIQQKLYEEIKETNKLLKGKPLTYDTLQQMKYMDMVTSEALRMWSGP  21073

21074  ATDRKCVRDYVLDDGAGWKFPIEAGTCVMIPSYAIHRDPKYYPNPDRFDPERFSEERRAD  21253

21254  INMTMYLPFGAGPRNCIGSRFALMEMKAIVYGLLLNFSIERNEKTQVPLRLNKGFAPLAG  21433

21434  EKGMHLRLKVRG* 21472

 

>CYP9Jnew AAGE01553900 AAGE01064689 59% to 9J5 644306108 757010867 616348285 complete AAGE02011008.1 (sixth P450 on contig)

30792  MEVNVLYLLIVVAVLAVIYRRITRFYEYFHDKPIPSMAAGPPFGSAGPLY

       RKKYSFNDFIKMTYDKFPGAK (2) 31004

31067  VFGLCDMTTKLFVIRDPELIKKVTVKDFDYFVNRRATFGESIDDHDEMLFAKSLLALN  31240

31241  DQKWRDMRATLSPAFTGSKMRAMFELIEGYSARMVEILKEQSQAAGYVDYEMKDCFTRVA  31420

31421  NDIIATCAFGLQVESLKNRENEFYVMGKNMLNFNRVSIMFRIFGFNLFPGLMAKLGVDLI  31600

31601  DAEFGQYFSKIIKDAVHTRETRGIVRPDMIHLLMQAKKGALKSQYETTDANTGFATVEE  31777

31778  SEVGRSSIAKAITESEMIAQCFVFFLAGFDSVSSEMVFMAYELALNPDVQQRLYEEIVET  31957

31958  DKQLGGKPPTYDTLQKMQYMDMVVSESLRMWPAGAFDRKCDRDYVLDDGAGLKFTIDRG  32134

32135  AYVWIPVHGIHRDPKYYPDPDKFDPERFSESNRDNIDMTMYMPFGAGPRNCIGSRFALME  32314

32315  IKAIMYALLLQFRIERNEKTSVPLKLVKGFAGLNGEGGVHLRLTLRQ* 32458

 

>AAGE01123974 494309314 592077078 733946792 579386359 62% to CYP9J5

pseudogene N-term missing, deletion in second exon

TPFLGASGPLMLRKVTFIEFIQSIYNKYPGVK (2?)

VFGMFDTITPFFEIRD

[DELETION]

KFGIDLLDREQADYFTHVFQETIRTRESHGIIRHDMIDLLLQARKGTLKYHEE

KDDQEGFATVQESEVGKVEMSKSMTEAEMIAQCLIFFLGGFDTVSSCIMFTAYELVRN

PEVQQKLCEEIVQTDKELGGKPLSYDALQKMKYMDMVVSESLRIWPLAPATDRLCTKDYI

VDDGQGLKFTIDKGTCVWFPAAGLHHDPQYFPNPERFDPERFNDENKRNINLGTYLPFGI

GPRNCIGSRFALMEVKAVMYYILLKFTIARSAKTQIPVQLRKGFTNVGPDHGMHMELKLR

*

 

>494345849 G719P81FE4.T0 pseudogene 92% to 494309314

DLLVQTIHGYV*SIIRRKMLRKGFRHCQES*

LGKMEMSKSMTEAEMIAQYLIFFLDGFATVSSCIMFTAYEVVRNPEVQRKLCEEIVQTDK

ELGGKPLSYEALQKMKYMDMVVSESLRIWPLAPATDRLCTKDYIVDDGQGLKFTIDKGTC

VWFPAAGLHHDPQYFPNPEQ

FDPERFNDENK

XXNSGLGTYLPFGIDPK

 

>AAGE01253357 476363988 755013039 587425733 632907226 55% to CYP9J5 complete

MEVNLLLLATVITVFVYLYRLITKNNDYFHDKPIPSLKARPLLGSTGPLLLKQVTFADFVSYVYNKFPGVK (2?)

VLGMFDTLTPFFVIRDPELIKQIAVKDFDHFMDHRPFFGESAESEEHPYALFKRVI

FALNGQRWRNMRATLSPAFTGRKMRLMFTLMVDCSERMLKHYESLMSSTGRMEVEIKDML

SRYGINVIASCAFGIDVDCFKDVDHEFMYHGRRMLQMGNPVVIAKMLFMRMFPNLAKKSG

MDVIPREQAVYFTKLIKETIRTRESQGIVRNDMIDLLLEARKGTLKYEEEREEVQEGFAT

VQESDVGKAQVTKAISEIDMIAQCLIFFIAGFESVSTTSMFMIYE

LILNPEIQQKLYEEVEQTYKQLGDKLLTYDALQSMKYMDMVVSETMRKWPLSPIGDRICV

RDYTLDDGQGLRFTIDKGTCVWFPIHGLHHDPQYYPNPDRFDPERFNDQNKGNIKMGTYL

PFGIGPRNCIGSRFALMELKAVMYHMLRKFSFHRSTNTRIPLKLRKGMNNVGTDEGMHVERIRRL*

 

>AAGE01015732 91% to 476363988 55% to 9J5 complete

2265 MEVNLLLLATVLTVFVYLYRLITKNNDYFHDKPIPSLKARPLLGSTGPLLLKQVTFSDFV 2444

2445 AYVYNKFPGVK (2) 2477

2538 VLGMFDTLTPFFVIRDPELIKQIAVKDFDHFMDHRPFFGESVESEEHPYALFKRVIFAL 2714

2715 NGQQWRNMRATLSPAFTGRKMRLMFTLMVDCSERMLKHYESLMSSTGQMEVEIKDMLSRY 2894

2895 GINVIASCAFGIDVDCFKDVDHEFMYHGTRMLQMGNPLVIAKMLFTRMFPKLANNWGMDV 3074

3075 IPREQAVYFSKLIKETIRTRESQGIVRNDMIDLLLEARKGKLKYEEEREEEQEGFATVQE 3254

3255 SDVGKAQVTKAISEVDMIAQCLIFFIAGFESVSANTMFMIYELILNPDIQQKLYEEVEQT 3434

3435 YKELGDKRLTYDALQSMKYMDMVISETLRKWPLTPVGDRMCVKDYVLDDGQGLRFTIDKG 3614

3615 TCVWFPIHGLHHDPQYYPNPDRFDPERFKDQNKGHIKMGTYIPFGIGPRNCIGSRFALME 3794

3795 MKALMYHMLRRFSFHRTANTQIPPKFRKGMNNFGTEQGLHVELRLRGQ* 3941

 

>AAGE01001411.1 3000-5000 region 10kb upstream of 9J7 complete

MDLDWTQLLAIVAIVVIIYRWLTGNHDYFHHKPIPSMTVRPIMGSTGPLLLKQCTFPEFIQSSYKKFAGAR (2)

VFGLFDTNIPMYVICDPDLIKRIAVADFDHFMDHRPIFGASNSDHPNLLFEKT

LFALTGQKWKNMRSTLSPAFTGSKMRQMFKFVVDCSESMVRFYQSEPRGTSHEMKDVFSR

FANDVIATCAFGIEVDSLRKRNNEFYVHGSKMLRLTRLSVVARLLGYRFAPTLMGKLGLD

INDQEQNQYFSSLVKETVKIRDVQGIFRPDMVHLLMEAKKGTLHHQEEIEHNKGFATVEE

SAMTKMRSMNSMTEVELIAQCLMFFLAGFDTVSTCLTFTAYELALNPTIQDKLYEEIKRI

HEAMSGKSLDYETLQKMSYMDMVISEVLRKWPAIAALDRLCVQDYEMDVGNGLKFTIDRG

SGIWIPIHAMHHDPKYYPDPERFLPERFSDENKASINMGAYLPFGIGPRNCIGSRFALME

VKAIVYHMLLRFSFERTAKTQVPVEIVKGFAPLKPKDGVFLEFRPRDAV*

 

>CYP9J7 262902386 621799144 520524964 618123933 complete

AAGE01001411.1 13000-15000 region

MDTIFVLALVGLLLLILLVLLYRFLSRKNDYFLNKPIPSLPGPLLLGGTSPLMLFRVSFTDYVKTVYDSFPDAK (2?)

VCGVMNTVIPLYIVRDPELIKKIAIKDFDHFADHRPVFGSDHGDHPNLIACKA

LFVLTGPKWKTMRATLSPAFTGAKMKFMFELIVECSEALVDYYRDQGAKEWDM

RTLFARFSNDVIATCAFGIKVNSSSDRDNEFYRRGKEMMVFTNFKTQLKIAGYLFTPWLM

NWFGIDLIKQEHSDYFAGLIRDTVRTREANGIIRPDMVHLLMQSRKGILKNQQ

EDDPEQEVSETTRSLPGPTMTESEMIGQCLFFFLAGFDTVSTALTFLAYELA

LNPDVQEKLSAEIAETHQSLNKRSIT

YEALHSMKYLDMVISESLRKWPSAPAVDRLCVQDYTLDDGQGLQFRMEKGIGIWIPIYGI

HRDPKYYPEPDKFDPERFSDQRKGDIQPGTYLPFGIGPRSCIGMRFALMELKCIVYYLLL

NFRLEKTERTEVPPVLEKGYVTLSAANGVWLKMVPK*

 

AAGE02029679.1 use this seq

77856  MDTIFVLALVGLLLLILLVLLYRFLSRKNDYFLNKPIPSLPGPLLLGGTSPLMLFRVSFT  77677

77676  DYVKTVYDSFPDAK (2) 77635

77577  VCGVMNTVIPLYIVRDPELIKKIAIKD  77497

77496  FDHFADHRPVFGSDHGDHPNLIACKALFVLTGPKWKTMRATLSPAFTGAKMKFMFELIVE  77317

77316  CSEALVDYYRDQGAKEWDMKDLFARFSNDVIATCAFGIKVNSSSDRDNEFYRRGKEMMVF  77137

77136  TNFKTQLKIAGYLFTPWLMNWFGIDLIKQEHSDYFAGLIRDTVRTREANGIIRPDMVHLL  76957

76956  MQSRKGILKNQQEDDPEQEVSETTRSLPGPTMTESEMIGQCLFFFLAGFDTVSTALTFLA  76777

76776  YELALNPDVQEKLSAEIAETHQSLNKRSITYEALYSMKYLDMVISESLRKWPSAPAVDRL  76597

76596  CVQDYTLDDGQGLQFRMEKGIGIWIPIYGIHRDPKYYPEPDKFDPERFSEQRKGDIQPGT  76417

76416  YLPFGIGPRSCIGMRFALMELKCIVYYLLLNFRLEKTERTEVPPVLEKGYVTLSAANGVW  76237

76236  LKMVPK* 76216

 

>AAGE01024220 39% to 9J8 40% to CYP329A1 Anopheles 826166409

note has a P at I-helix T location like CYP329

There is a deletion and a stop codon in N-term exon

This may be the CYP329A1 pseudogene equivalent in Aedes

Change N-term to eliminate the stop codon

METEDLYWFSFILVTIVGFFTFKLMTKNR

ILSGQRSSLREAAFYLRKSG*GNK

4729 IFGFYNYLSPVYYIRDPELIRKLWINEF

 

4455  METEDLYWFSFILVTIVGFFTFKLMTKNRHFFRVRGVPFEKPHFIYGNLGEVTSGKLSSLELIASF  4652

4653  YQKFENER (2) 4676

IFGFYNYLSPVYYIRDPELIRKLWINEFNSFANHAYFLDESKDPI

LGNQLHLLKNEKWRQMRHTLTPVLSGQSVSSMSSLIRTNSLDLV

5853 DHLKASVDSELEFKGIFLKYVFNVIANCAFGLELNTFKDESDKFCTYGTALVYGNNPVQT 5674

5673 LKTMMFYLFPKMTTQMKVRLMEDEHAAYFTNLIGSTISEREKKNVNRADVIQMLHQANKG 5494

5493 ELKAEGQDDEVLQMKDFSKCKWNQEELIAQCIAFFGSGFEPLVNLLSFA 5347

5346 AYELAANPDIQQKLLSELEGSLRDDPVVSDTVDKLSYLNMVISETLRKWPASPSLDR 5176

5175 ECSKDYLLDDGGCRVQFRKGDTLWVSIWALHRDERNFPDPERFDPERFSEKNKASITPG 4999

4998 TYMPYGVGPRNCI (1)

4902 GTRLASLVAKITLVDLVRNFKLELGSRMVQPLRLSKTSYSMEPEGGFWLKMTPR* 4738

 

>CYP9J10 complete AAGE01039952 AAGE01005096 494160094 72% to TC52199

754334205 821634863 803206067 637185165 TC60951 TC32891 TC38518

57% to CYP9J1, 90% to TC60679 98% to TC60950 (56% to Anopheles 9J4)

CYP9J10 TC60950 TC38519 60% to CYP9J4 98% to TC60951 (3 aa diffs)

98% to 9J10v2 (2 aa diffs) 98% to 9J10v1 (3 aa diffs)

MVEVDLYVALAVGAIVVLLYHYAAKKYEYFLTKPIPALKPTMLFGNTGPMMFRQRDVSSHLKMLYNTYEGSK (2)

MIGFYDLMKPIYMLRDPEVIKQIAVKDFDYFMDHTPTMTNSNPEDEVGGDSLFGNSLFA

LRGQKWRDMRATLSPAFTGSKMRHMFELVADCAKSMAEFFKAEAAAGKTLEYEMKD

TFSRFGNDVIATVAFGIKVDSLRDRDNEFYLKGKAMLNFQSLSVLLKVLFLRAFPKLS

HKLGLDFVDSTLTEYFKQMIVDNMKQRAAHGIMRNDMIQMLMEVRKGSLRHQK

DEKETKDAGFATVEESNVGKSNINRVWTENELISQCFLFFVAGFDTVSTCMTFLTYELMLNQNIQQ

RLYDEVMETEKSLNGKPLTYEVLQKMEYMDMVVSEALRKWPPAVISDRFCVKNYMYDDGQ

GTRFLVEKGQTMWIPTIAIHSDPKYYENPEKFDPERFNEENRSKIDTGAYLPFGVGPRNC

IGSRLALMEVKVIIYNLLKDFSLESSEKTQIPLKMSKNFFVLQAENGVWLELKPRKR*

 

AAGE02011007.1 5 diffs to 9J10 (sixth P450 on this contig)

218249  MVEVDLYVALAVGAIVVLLYHYAAKKYEYFLTKPIPALKPTMLFGNTGPMMFRQRDVSSH  218428

218429  LKMLYNTYEGSK (2) 218464

218529  MIGFYDLMKPIYMLRDPEVIKQIAVKDFDYFMDHTPTMTNSNPEDEVGGDSLFGNSLFA  218705

218706  LRGQKWRDMRATLSPAFTGSKMRHMFELVADCAKSMAEFFKAEAAAGKTLEYEMKDTFSR  218885

218886  FGNDVIATVAFGIKVDSLRDRDNEFYLKGKAMLNFQSLSVLVKVLFLRAFPKLSQKLGLD  219065

219066  FVDSTLTEYFKQMIVDNMKQRDAHGIMRNDMIQMLMEVRKGSLRHQKDEKETKDAGFATV  219245

219246  EESNVGKSNINRVWTENELISQCFLFFVAGFDTVSTCMTFLTYELMLNQNIQQRLYDEVL  219425

219426  ETEKSLNGKPLTYEVLQKMEYMDMVVSEALRKWPPAVISDRFCVKNYMYDDGQGTRFLVE  219605

219606  KGQTMWIPTIAIHSDPKYYENPEKFDPERFNEENRSKIDTGAYLPFGVGPRNCIGSRLAL  219785

219786  MEVKVIIYNLLKDFSLVSSEKTQIPLKMSKNFFVLQAENGVWLELKPRKR* 219938

 

>AAGE02011007.1 pseudogene exon 1 new seq last P450 on this contig

168bp downstream of 9J10, 51% to CYP9Jae4

220106  MLQVHMFLAATVVLLL*YSYSITT*RNIYEYSLSKPISCAKPTFLVGRNWSTSLCKADMT  220285

220286  LHFKKICVFFPDA  220324

 

>AAGE01088707 494098990 826090661 819721560 57% to 9J5 complete

MEVNLFYFGVLVAILGTLYYLLTKKHGHFLDKPIPSMAAKPILGSVSDLMLQRVPFSTFIQTLYDKYRGVK (2)

VFGLFDMMTPTYVIRDPELIKQVAVKDFDHFADHVQVFGNSSYDHPNLLTGKTLFSLTG

LRWKTMRATLSPAFTGSKMRYMFELIVECTERAVRYYEKNALKSGPKVYEMKDVFSRFAN

DVIATCAFGLQIESSRDRDNEFFVNGSKMLDFSRPSVMLRIMGHQLVPWLMAFFGWDVID

EQQNTYFKTLILDAIREREHRGIVRPDMINLLIHAKKGTLKHQQENEHVPEGFATVQESE

VGTSSVTTVMTDVEMVAQCLIFFLAGFDTVSTSLLYASYELAINPEVQQKLYDEIQNTRT

ALNGKPLTYDAMQKMKYMDMVMSEVLRMWPPAPSTDRLCTKNYVMDEGNGVKYTIEKGTS

VWFPIHALHHDPNYYPQPEKFDPERFSDERKGSINAGAYLPFGIGPRNCIGSRFALAEVK

TILYYMLGSFSFERCSKTEVPPVLAKGFDVIPANGMHIEFKPRPKK*

 

AAGE02029679.1 2 aa diffs to AAGE01088707 use this seq

32148  MEVNLFYFGVLVAILGTLYYLLTKKHGHFLDKPIPSMAAKPILGSVSDLMLQRVPFSTFI  32327

32328  QTLYDKYRGVK (2) 32349

32424  VFGLFDMMTPTYVIRDPELIKQVAVKDF  32507

32508  DHFADHVQVFGNSSYDHPNLLTGKTLFSLTGLRWKTMRATLSPAFTGSKMRYMFELIVEC  32687

32688  TERAVRYYEKNALKSGPKVYEMKDVFSRFANDVIATCAFGLQIESSRDRDNEFFVNGSKM  32867

32868  LDFSRPSVMLRIMGHQLVPWLMAFFGWDVIDEQQNTYFKTLILDAIREREHRGIVRPDMI  33047

33048  NLLIQAKKGTLKHQQENEQVPEGFATVQESEVGTSSVTTVMTDVEMVAQCLIFFLAGFDT  33227

33228  VSTSLLYASYELAINPEVQQKLYDEIQNTRTALNGKPLTYDAMQKMKYMDMVMSEVLRMW  33407

33408  PPAPSTDRLCTKNYVMDEGNGVKYTIEKGTSVWFPIHALHHDPNYYPQPEKFDPERFSDE  33587

33588  RKGSINAGAYLPFGIGPRNCIGSRFALAEVKTILYYMLGSFSFERCSKTEVPPVLAKGFD  33767

33768  VIPANGMHIEFKPRPKK* 33821

 

>AAGE01194580 86% to 494098990 832396347 complete

AAGE01341824 89% to 494098990

 543 MEVNLFYFGAIVAIFGALYYLLTKKHGYFHDKPIPAMGAKPILGSIGDLMLQRVPFNTFL 722

 723 QAAYDKYSGVK (2) 755

 816 VFGMFDLMTPTYVIRDPELIKQVGVKDFDHFVDHEQVFGNSSYDHPNLLTGKTLFSLTG 995

 996 SRWKTMRATLSPAFTGSKMRYMFELIVECIERAVKYYEEETKKKGAQVYEMKDVFSRFAN 1175

1176 DVIATCAFGLQVESSRDRDNEFFVNGSKMVDFGKPSFILRLMGHQLVPWLMAFFGWDVID 1355

1356 GQQNTYFKRLIMDAIKEREHRGIVRPDMINLLIQAKKGTLKHQQENEQVPEGFATVQESE 1535

1536 VGKSTATTMMTDVEMVAQCLIFFLAGFDTVSTSLLYTSYELAVNPEVQKKLYDEIQNTRT 1715

1716 ALGGK 1730

     PLTYDAVQKMK

     YMDMVISEVLRKWPPIASTD 879

 878 RVCTKNYVMDEGNGIKYTIEKGAALWFPTYALHHDPKYYPQPEKFDPERFSDERKGSINT 699

 698 GAYLPFGIGPRNCIGSRFALAEVKTILYYMLGSFSFERCSKTEVPPVMPKGFDVIPVNGM 519

 518 HIEFKPRPKG* 486

 

>AAGE01341824 89% to 494098990

1238 KGTLKHQQENEQVPEGFATVQESEVGKSTATTMMTDVEMVAQCLIFFLAGFDTVSTSLLY 1059

1058 TSYELAVNPEVQKKLYDEIQNTRTALGGKPLTYDAVQKMKYMDMVISEVLRKWPPIASTD 879

 878 RVCTKNYVMDEGNGIKYTIEKGAALWFPTYALHHDPKYYPQPEKFDPERFSDERKGSINT 699

 698 GAYLPFGIGPRNCIGSRFALAEVKTILYYMLGSFSFERCSKTEVPPVMPKGFDVIPVNGM 519

 518 HIEFKPRPKG* 486

 

 

CYP3 clan

Four CYP9M related sequences and one new CYP9 subfamily, plus one pseudogene

 

>AAGE01023613 494247077  812172036 586045833 57% to 9M1 complete

MVLLDLLVVLIPIVSYLLYRWAVATYDFFEKRKIPYVKPYPFVGGLWPVFSGKLHPTDAAVLGYNLFPEN

RFSGFFAFRRPGYLIHDPALAKQIMIKDFDHFTDHMNTISVDVDPIFGRALFFMDGQRWR

HGRSGLSPAFTGSKMRNMFTLLSKYVEGAMQRLAQDAGQGKMELEIRDLFQK (2?)

LGNDIITSISFGVEIDSVHNPNNEFFKRGKQLAATGGFQGLKFFFSLVVPDSVFKLFGI

RFLPKEAADFYVDVVSKTIKHREEYKIVRPDFIHLFVQARKNEL

KEETADDELKSAGFTTVEEHIEASTENSQYTDLDITAVAASFFFGGIETTTTMLCFALYE

LAGNKEVQQKLQAEIDSVRKELGGGSLTYEVLQKMKYLDMVVTETLRRWPPLGITNRVCV

KPYTFEDHEGTKVTIEKGQLIQIPVQSFHRDPSFFPDPYRFDPERFSEENKHKINQDAFL

PFGSGPRNCIGSRLALMQAKCLLYYLFSAFSLEYSDKMDVPIKLNKMSLTYTAKNGFWFN

LLPKKVAV*

 

>AAGE01008959 54% TO 9M1 74% TO 494247077 complete

6014 MGVLEWLAVFVPIVTYLLYRWSTATYDYFREKKIPFVKPYPLFGSLWPIFSGKLHPVDAT 6193

6194 ILGYDMFPGRRFSGFFTFRTPSYLVHDPALAKQVLIKDFDHFTDHTSTILPDVDPVLGRN 6373

6374 LFFMDGQRWRHGRSGLSPAFTGSKMRNMFVLLSNYVDGAMKRLAQDAGPGKMELELRDLFQK (2) 6559

6593 LGNDIITSISFGVDIDSIHNPNNKFYKRGQKVTATGGIQGFKVFLTTVIPGSVFKF 6784

6785 FGVKVLPKEAADFYVDVISKTVKQREEYKIVRPDFIHLFMQARKNELKEDKADEELKDAG 6964

6965 YSTVEEHLQSTTKNNQYTDLDIAAVAVSFFFG 7060

7060 GIETTSSVLSFVLYELCLNPAIQHKLQEEIDTVRAQLEGNPLSYEVLQKMKYMDMVVS 7233

7234 ETLRRWAPLGIVSRKCVKPYTFEDHDGTKVTVEKGHIIQIPLQSFHRDPNFFPDPYRFDP 7413

7414 ERFSDENKHKIKQDTYIPFGSGPRNCIGSRLALMQTKCVLFYLFANFSVEFSEKMDVPIK 7593

7594 LNKMALSYTAQNGFWFHFAARDVKT* 7671

 

>AAGE01012700 71% to 494247077 55% to 9M1 complete

2863 MFESLALIVPVAAFLVYLWSIATYNYFKKRKIPFVKPYPLIGGLWPALTGKVLPLEAATL 3042

3043 GYDMFPKHRFSGYFMFRNPEYLIHDPALAKQVMIKDFDHFTDHTSVFPVEVDPIIGRSLF 3222

3223 FMDGQRWRNGRSGMSPAFTGSKMRNMFTLLSKYADSAMQRLVEDAGKNKLELEIRDLFQK (2) 3402

3464 LGNDIITSISFGLDIDSVHNPDSEMFKKGKQLAGTTGFQGFKFFLSMALPSSIYKLFGI 3640

3641 RLITKDVADFYLDIVTNTIKYREENNIVRPDFIHLFVQARKNELKEDKTDETLDSAGFTT 3820

3821 VQEHIKSSSENSKYSDFGITAVVASFFFGSIETTSTVLCFAMYELAANPEIQQKLQDEIE 4000

4001 LVKDQLNDSPLTYEVLQKMKYLDMVVSETLRRWPPLGTTNRVCVKPYTLEDYDGTQVTIE 4180

4181 KGQAVQIPIISYHHDPNYFPDPYRFDPERFSDESRDKINQDAFLPFGSGPRNCIGSRLAL 4360

4361 MQVKSLLFYLLTCFSVEFSEKMDVPIKLKKMSMTYTAQSGFWFNLVPKSVEV* 4519

 

>494569869 25% to 9M1 N-term 39% to 494247077 pseudogene of 494247077

LDLLDLVGVLIPIGSYPLDLWVLTSYYSFEKIEIPYVKPYPLVG

ELRPEFTNVLLPSYDTGIGHLLFPETDLP*

FFWIHQIACLSPYSPQAILINMIGTYLFSDFCGI*

SADVDPDFGRALFLTDGLKTRPGRSGINVIVWAYNMNMLAVCYMPLFHGSYQKQAEDARQ

CNMSIDYCDVFHL ()

RGSDVIHYNIRLDVHIDCVHVPFYDYYHKRASG*RLTGVIFWDLEF
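
The pseudogene call above rests partly on in-frame stop codons (*) inside the translation. As a minimal sketch, assuming the translated pieces are simply concatenated in the order listed, the following Python function reports every * that is not the last character; the function name is hypothetical.

```python
def internal_stop_positions(segments):
    """Join translated segments and return the 0-based positions of every '*'
    that is not the final character of the joined sequence."""
    seq = "".join(segments)
    return [i for i, aa in enumerate(seq) if aa == "*" and i != len(seq) - 1]


# Translated pieces of 494569869, copied from the entry above (residues only).
pieces = [
    "LDLLDLVGVLIPIGSYPLDLWVLTSYYSFEKIEIPYVKPYPLVG",
    "ELRPEFTNVLLPSYDTGIGHLLFPETDLP*",
    "FFWIHQIACLSPYSPQAILINMIGTYLFSDFCGI*",
    "SADVDPDFGRALFLTDGLKTRPGRSGINVIVWAYNMNMLAVCYMPLFHGSYQKQAEDARQ",
    "CNMSIDYCDVFHL",
    "RGSDVIHYNIRLDVHIDCVHVPFYDYYHKRASG*RLTGVIFWDLEF",
]
stops = internal_stop_positions(pieces)
print(len(stops))
```

A complete gene in this file would be expected to return an empty list, since its only * is the terminal stop.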

 

>AAGE01026951 AAGE01099852 476324290 55% to 9M1

579013166 826063288 832528009 494192568 637071386

613990760 complete

METLVWIALVLLIIIFLIYRWSIACYDYFEKRNILY

VKPYPFFGGLWPVFCRKLHPTDATIMGYNMFPERRCSGLFTF

RNPAYDIHDPTLAKQIMVKDFDHFTDHMNTISADVDPILGRALFFMGGSRWRHGRAGLSP

AFTGSKMRNMFVLLSKHVDEAMRRLVEDAGEGALEVEIRELFQK (2)

LGNDITTSISFGVEVDSVHNPGNTFLEM

GKLLIATSAFQGFKYLLSLVVPESVFKFFGVRFFPKEAADFYLDIVTETISHREKNKIVR

PDFIQLFVQARKNELKKDNTDDNFK

SAGFTTVDEYIESSTENGQYTDLDIAAVALSFFFGGIETTTTAICFAVYEIVLNATIKEK

LQTEIDSVKEGLEGRPLSYEILQQMKYLDMVVSEALRRWPPVGVTNRACVKTYAFEENDG

TTVTIEEGQVVHIPVQSFHRDSNYFPDPLRFDPERFSDENKHVINQDAFLPFGSGPRNCV

GSRLALMQAKCILYYFFCTLFDGLFQQNGPTDQTQDYVSLLRSAEWFVVSFDAEHGKVVKYKK*

 

>AAGE01015749 494133555 519945594  763120971 810094850 53% to 9M1 complete

MILLLLVVAVGYLIYRWSVATFDYFEKLNVPFLKPYPFFGALWPSLKGE

KSPTDATAEGYRLFPGNRFSGFFSFREPGYLIHDPELIKQIAIRDFDHFTDHANNVPLEV

DPFLGRGLFFTGGQRWKHGRTALSPAFTGSKMRNMFQLLSSYTDGAMKRLVKDAAGGKLEREMKDLFQR (2?)

LGNDVMTSISLGFDTDSVHDPDNEFFQYGKRLSRTSGLQG

LRFFVLTLLPENILKVIGIRIIPSDIANFYNEVV

IKVIKERLEKNIVRPDFIHLMLQARKNELKADKTDEFLNDAGFSTVKEHLQSSAKNQ

IEWSDYDIAATSASFFFGGIESTTTLVCFALYEIALNHDVQQKLRAEVDATKLSLGDAKL

TYESMQQMKYMDMVITETLRKWPPFGVTNRRCTKAYSLENANGTKVTVHKGQVIFIPIYE

IQRDAQYYPNPERFDPERFSDQNRGNLNQDTYLPFGIGPRNCIGSRLTLMQAKCYLFYML

TCFEIQLSTKTDVPMQLDARSSALNAKNGLKMQLIPRGV*

 

>AAGE01236202 AAGE01528761 AAGE01574909 575351627 754305099 587660657 263512612

581727980 743856203 625109625 223413916

773058412 (exon 1) mate pair = 775439855 (exon 2)

832539269 578892595 complete 51% to 9M1

MMELLLLGAAALTAVCYLLYRWSTSTFGYFEKRSVPFGKPYPLLGALWPYLKGEKSPVDALCEGYRHFP

GCTYSGVFLFRSPCYLIHDPELIKKIAVRDFDHFADHANNVSLEVDPFMGRVLFFANGQR

WKQGRTALSPAFTGSKMRNMFGLVSEYTNGAVQRLVEDAEASGGKMERELNDLFKR (2?)

LGHDAITSISLGTDIDSIREPENEFFAHGKELAKTTGLQGFRFFIMSLLPEKILRLSR

MRIVPEHLANFYHGVVSKVIKHRLDNGIVRPDFIHLLLQARRNELKTDKTDE

KFNDAGFATVQEHLQAPTKNPIEWTDYDIAATVATFFFGGAESTTALLCFTIYELALN

PHVQQKLLAEIDSVQKTVGTEKLTYESMQQMKYLDM

VISETLRKWPPFGVTNRRCTKPYQIQDVDGHSVTIEKGQVVFLPIQHIHRDPHFFPNPMR

FDPERFATENRDQLNQDAYLPFGAGPRNCIGSRLSLMQTKCFLYYLLSTFEVQLSNRTEV

PIEIDLKATGLNSKNGFWFHLIQRVK*

 

>AAGE01014192 813491936 639416242 762398872 complete

TC52960 TC20003 TC26029 TC39058 TC4763 TC7436

40% to CYP9J4 40% to 9A4 (new subfamily in CYP9)

AAGE02005788.1 4968-6548 no introns

ESTs DV359961,DV294300,DV294302,DV359959

MEAFLLISALVGALILLYRYATAFANYFNQRGIKYRKPTFLLGNLGPILFQRTT

PVANLTDLYREFAGEKVYGFYEFRRPTIILRDLQLIKRVF

VKDFNHFTNHTAPVDEHMDSILGNGLISLEGQKWRDMRAMLSPMFTGNKIRHMVPLVGKC

AEDLCRFVERETDEVEWDVRELLAKCLVEVIGSCAFGIEVDSFNDPDNEFDRVAKYLMNQ

SDVRKVARFLLIMVFPKMCKQFGMELFDDKYKRLFRRLVSETMLKRESDGVSRPDLIQLLMLARRG

KLEADKDVEGESFAAANDYLETGTDDVKRSWSDDELTAQAVIFFAAGFDTTSTLLSFTLM

ELAIHPEIQDRLFEEIKSVQRSDSVISYEQIQSLEYLDAVISESLRKWPPLTATDRKCTK

DYLMVDPEDGSPMFSIEEGYSVWVPIYCFHHDPKYFPNPEKFDPDRFNRVNRHQLNPAAY

MPFGVGPRNCIGSRFALMSAKMILLRLLRSFRVEVCPKTDTTLQLSKTKMNMTLEKGHWV

YLKRRS*

 

CYP4 clan sequences

 

>AY433052 AAGE01072700.1 AY431937 88% to 4G16 complete

AAGE01141041.1 AAGE01223479.1 AAGE01094290.1

1517  MSATVAPADPVMANANIASPMNVFYFLLAPALLLWFIYWRISRQHMLKLAEKIPGPPGLP  1338

1337  LLGNALELIGTSH (1?)

1379  SVFRNVIEKGKDFNQVIKIWIGPKLIVFLVDPRDVELLLSSHVYIDKSPEYRFFKPWLGNGLLIST  1179

323  GHKWRQHRKLIAPTFHLNVLKSFIDLFNENSRLVVEKMHKEAGKTFDCHDYMSECTVEILL (1)

1240 ETAMGVSKKTQDQSGFDYAMAVMKMCDILHLRHRKMWLYPDLFFNMSQYAKRQVKLLDTI 1061

1060 HSLTRKVIRNKKAAFATGTRGSLATTSIKTAEFEKPKSNINTNSVEGLSFGQSANLK 890

 889 DDLDVDENDVGEKKRLAFLDLLLESAENGALISDEEIKNQVDTIMFEGHDTTAAGSSFFL 710

 709 SMMGIHQHIQDKVIQELDDIFGDSDRPATFQDTLEMKYLERCLMETLRMYPPVPIIARS

LKQDLKLASSDLVVPSGATIVVATYKLHRLETIYPNPNVFDPDNFLPERQANRHYYAFVP

FSAGPRSCV 320 (1)

 255 GRKYAMLKLKVILSTILRNFRVISDLKEEDFKLQADIILKREEGFQIRLEPRQRKPKAAKA*

 

>AAGE01114834 52% to AY433052 same as TC67187 76% to 4G17 N-term probable ortholog complete 80% to 4G17 full length

AAGE01340100.1 same as AAGE01229939.1 85% to 4G17 C-term, probable ortholog

633767131 823361413 823353110

MVIFMTLVLVASALFHFWMISRRYVQLGNKIPGPRAYP

FIGNANMLLGMNHNEIMERAMQLSYIYGSVARGWLGYHLVVFLTEPADIEIILNSYVHLT

KSSEYRFFKPWLGDGLLISSGEKWRSHRKLIAPAFHMNVLKTFVDVFNDNSLAVVERMRK

EVGKEFDVHDYMSEVTVDILLETAMGSQRTSESKEGFDYAMAVMK (2)

MCDILHSRQLKFHLRMDSVFNFTKIKQEQERLLGIIHGLTRKVVKQKKELFE

KNFADGKLPSPSLSEIIAKEESESKESLPV

ISQGSLLRDDLDFNDENDIGEKRRLAFLDLMIETAKSGADLTDEEIKEEVDTIMFEGHDT

TAAGSSFVLCLLGIHQDVQDRVYKEIYQIFGNSKRKATFNDTLEMKYLERVIFETLRMYP

PVPVIARKVTQDVRLASHDYVVPAGTTVVIGTYKVHRRADIYPNPDVFNPDNFLPERTQN

RHYYSYIPFSAGPRSCV (1)

GRKYAMLKLKVLLSTILRNYRVVSNLKESDFKLQGDIILKRTDGFRIQLEPRV*

 

>CYP4C38? exon 1 AAGE01133681.1 587572087 complete

These two pieces are probably from one gene, since no other

closely related sequences were found. 66% to 4C36

784  MSELTTFIYGILVFLIFAPFLQWWVKRARLVQIIDKIPGPKAYPFIGTTYTFFGKKHY (1)  611

 

>CYP4C38 N-term AAGE01207392.1 AAGE01470307.1 71% to 4C27

824335234 761357490 744250376 592527729

570727647 754993699 585845687 593920597 613947338

594452687 575404595 749489367 579218945 825227784

AAGE01009885.1 TC66432 Length = 995 71% to 4C27 anopheles

AAGE01207392.1 matches the N-term part of TC66432

parts are on AAGE02022591.1, AAGE02022592.1, AAGE02022593.1

use these seqs

456 ELFYIIDERTRRYPDIHRIWTGMRPEIRISKPEYVETIIGASKHMEKSHGYDFLFDWLGEGLLTSK  259

302 GERWFQHRKLITPTFHFNILDGFCDVFAEQGAVLAERLEPFANTGKPVDVFPFITKAALDIIC (1) 490

694 ETAMGVKVNAQTGGENNYVNAIYR (2)  762

822 MSEIFVDRSIKPWLHPEFIFKRTEYGRQHKKALDIVHGYTKK (0) 947

    VIRDRKEALQVKENSTGAGDTGEDLYFGTKKRLAFL 227

228 DLLLEGNAKHKQLTDDDVREEVDTFMFE (0)

    GHDTTTAGMSWALFLLGLHPDWQDRVHQEIDS 407

408 IFAGSDRPATMKDLGEMKLLERCLKETLRLYPSVSFFGRKLSEDVTLGQYHIPAGTLMGI 587

588 HAYHVHRDER (2?)

    FYPDPEKFDPDRFLPENTEHRHPFAYIPFSAGPRNCIGQKFAILEEKS 761

762 IVSSVLRKFRVRSANTRDEQKICQELITRPNEGIRLYLEKRQ*

 

>Exon 1 of 4C25 ortholog AAGE01102043.1    80% to 4C25 complete

Exon 2 of 4C25 ortholog AAGE01326257.1

AAGE01078331.1 83% to 4C25

515 MIEATVKSSFVLSKVAKMLSYFSPITIILATMIAGAIYVYNKRRARLVKLIEKIPGPASMPLIGNSLHINVDHD 294 (1?)

EIFNRIISIRKLYGRQQGFSRAWNGPIPYVMISKASAVE  (0) 935

PILGSPRHIEKSHDYEFLKPWLGTGLLTSQGKKWHPRRKILT  1504

1505  PAFHFKILDDFVDIFQEQSAVLVQRLQRELGNEEGFNCFPYVTLCALDIVC (1)  1657

1714  ETAMGRLIHAQKNSDSDYVKAVYQ (2)

      IGSIVQNRQQKIW  1887

1888  LQPDFIFKRTEDYRNHQRCLSILHEFSNRVIRERKEEIRKQKQSNNNTINGNANNAVEAN  2067

2068  ILDGNNNAEEFGRKKRLAFLDLLIEASQDGTVLSNEDIREEVDTFMFEGHDTTSAAISWI  2247

2248  LLLLGAEPAIQDRIVEEIDHIMGGDRDRFPTMKELNDMKYLECCIKEGLRLYPSVPLIAR  2427

2428  KLVEDVQIEDYTIPAGTTAMIVVYQLHRDPAVFPNPDKFNPDNFLPENCRGRHPYAYIPF  2607

2608  SAGPRNCIGQKFAVLEEKSVISAVLRKYRIEAVDRRENLTLLGELILRPKDGLRIKISRR  2787

2788  E*  2793

 

>AAGE01094388.1 exon 2 4C like possible pseudogene fragment cannot extend

2051  FNRIECIKRLYTYQSGGYMRTWNG  1980

 

 

>AAGE01029369.1 72% to 4C26 62% to 4C25  complete

did blast with exon 1 of 4C26 to find best match

did blast with last 500bp of AAGE01029369 to find trace seq on (+)

759912013

mate pair = 759644271 will be downstream of AAGE01029369 and should match next

contig, possibly with N-term exon.  Contig match = AAGE01001656.1 (-)

over 15kb, but no P450 seq.  Might be a short contig between these

AAGE01030574.1 exon 2 4C like

586027613 matches first 500 bp on (-)

mate pair = 586024059  matches AAGE01059591.1 (+) no P450 seq

repeat with first 500bp of AAGE01059591

600013440 matches on (-) mate pair = 600014884

this matches two contigs AAGE01312028.1 (-)

and AAGE01029369.1 (+)  this one has a P450 seq

that is 4C like and complements this exon 2 seq.

Join them

Now use last 500bp of AAGE01030574 to find a trace file that matches on (+)

636183786 mate pair = 637148886 matches AAGE01001355.1

this is the same contig found in a search above going downstream from the N-term of a 4C like P450.  The intron must be more than 17kb

join exon 1 seq

AAGE01098344.1 best hit to 4C26 N-term

searched by megablast to get 585803103(+), mate pair = 585951518

searched WGS with this to find adjacent contig downstream

this = AAGE01001355.1 16kb, no P450 seq

148  MNHNIAAKIASLFSVLSPITTVILVVMVCAIITYKKKRARLVHHINRIPGPFMLPIIGNGLHVTLGCKD  354

4051  EFLDRVISAQKMYGRRIGMSRAWNGPIPYVMISKASAVE  3935

3210  PILSNPKLVEKSVDYDFMKPWLGNGLLTSRASVWHPRRKTLTPAFHFKILSEFVNIFHK   3034

3033  QALVMNEKLAEQLDNTAGFDIVPFTTLCALDIFC (1?)

2868  ETAMGCPVNAQKNSDSEYVRAHK (2?)

      IGKIIRNRLQKVWLRPD  2686

2685  FIFKHTEDYRKHQECLQVLHNFSDRVVQERKTEIVAKRCQAEDLIDLNNNKVADETISCC  2506

2505  SKKQLEFLDLLIEGSLDGNGLTDLDVREEVDTFVIGGHDTTAAAMAWILLLLGSDQKIQD  2326

2325  RVIDEIDGIMNGDRDRRPTMQELNDMKYLECCIKEGLRLYPSIPLIARRLTEDVQVDDY   2149

2148  IIPSGTTTLIVVYQLHRDPSVFPNPDKYNPDNFLPENCSGRHPYAYIPFSAGPRNCIGQK  1969

      FAILEEKMVLSTVLRKFRIEAVERREDVKLLGDLVLRPRDGLKIRVSRRL* 1816
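
The contig joins in the notes for this entry come from mate-pair walking: a trace read that matches the end of one contig is looked up, and its mate pair is searched against the WGS contigs to find a neighbor. The Python sketch below records four of those steps as hand-transcribed links and follows them breadth-first. The LINKS table and the reachable function are illustrative assumptions, not a description of the software actually used, and strand and distance information is ignored.

```python
from collections import deque

# Hand-transcribed from the notes above:
# (contig searched, trace read found, its mate pair, contigs the mate pair matched)
LINKS = [
    ("AAGE01030574.1", "586027613", "586024059", ["AAGE01059591.1"]),
    ("AAGE01059591.1", "600013440", "600014884",
     ["AAGE01312028.1", "AAGE01029369.1"]),
    ("AAGE01030574.1", "636183786", "637148886", ["AAGE01001355.1"]),
    ("AAGE01098344.1", "585803103", "585951518", ["AAGE01001355.1"]),
]


def reachable(start, links=LINKS):
    """Return the set of contigs reachable from `start` by following mate pairs."""
    seen = {start}
    queue = deque([start])
    while queue:
        contig = queue.popleft()
        for src, _read, _mate, targets in links:
            if src != contig:
                continue
            for target in targets:
                if target not in seen:
                    seen.add(target)
                    queue.append(target)
    return seen


print(sorted(reachable("AAGE01030574.1")))
```

Starting from the exon 2 contig AAGE01030574.1, the walk reaches AAGE01029369.1, which matches the join described in the notes; AAGE01001355.1 is reached both from AAGE01030574.1 and from the exon 1 contig AAGE01098344.1.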

 

1.295_1 AAGE01029369 Hils Version  15 diffs to blast file

AAGE02013631.1  exon 1, AAGE02013630.1 exon 1 exact duplicate,

AAGE02013629.1 exon 2, AAGE02013628.1  exons 3-5 use this seq

MNHNIAAKIASLFSVLSPITTVILVVMVCAIITYKKKRARLVHHINRIPGPFMLPIIGNGLHVTLGCKDEFLDRVISAQK

MYGRRIGMSRAWNGPIPYVMISKASAVEPILSNPKLVEKSVDYDFMKPWLGNGLLTSRASVWHPRRKTLTPAFHFKILSE

FVNIFHKQALVMNEKLAEQLDNTAGFDIVPFTTLCALDIFCETAMGCPVNAQRNSDSEYVRAHKLIGKIIRNRLQKVWLR

PDFIFKHTEDYRKHHECLQVLHSFSDRVVQERKAEIVAKRRQAEDLIDLNNNNESEELTSCCRKKQLAFLDLLIEGSLDG

NGLTDLDVREEVDTFVIGGHDTTAAAMAWILLLLGSDQKIQDRVIDEIDGIMNGDRDRKPTMQELNDMKYLECCIKEGLR

LYPSIPLIARRLTEDVQVDDYIIPSGTTTLIVVYQLHRDPSVFPNPDKYNPDNFLPENCSGRHPYAYIPFSAGPRNCIGQ

KFAILEEKMVLSTVLRKFRIEAVERREDVKLLGDLVLRPRDGLKIRVSRRL.

 

>476414268 92% to Aedes albopictus AY971511 complete

760814858 568935720 581452704 754413849 580048410

walked upstream to 531423840

walked to 529070673

walked to 824339230 mate pair = 823396717 matches C-term

AAGE01143020.1 63% to 4C28 AAGE01462557.1 62% to 4C37

AAGE01324666.1 exon 2 4C like same as 476414268

supercontig 1.295 frame = -

177175 MLKEPLLLVITIASQLLHAVKEFPLPATVLLGVVIVVYLFAHADRDQLKSLLRINGAKDG 176996

176995 SKKSVKFYLNQLPGPQCIPLLGNSLMMATDRE (1) 176900

DMFNRLTTARKLYGRKQGICRIWNGRTPYVLISKAEPVERILSSSVNIEKGRDYGFLRPW 546

LGNGLLTCPGSRWYKRRKALNPTFNYKMLSDFLEVFNRQAQTMVRLMEKELNRENGFN

CTRYATLCSLDILCETAMGYPIQAQEQFGSDYVKAHEE (2)

IGRIMLERLQKIWLHPDFIYKRTNFYKRQSECLKILHGFSENVIKQRRLQRDASLANKHDEDPSI

EIGRKRQLAFLDLLLEATQDGQPLSDRDIRDEVDTFILGGHDTTATAIGWLLYL

LGTDPQVQDRVFEEIDSIMGQDRDRPPTMIELNEMKYLECCIKEALRLFPSIPLIARKLT

ESVNVGDYTIPAGTNAVIVVYQLHRDTQIFPNPDKFNPDRFLPENSQGRHQY

AYIPFSAGPRNCIGQKFGLLEEKAVAVAVLRKYRITSLDRREDLTLYGELVLKSKNGL

RISISQRQ*

 

AAGE02013627.1 exon 1, AAGE02013626.1 exons 2-3 use this seq (3 diffs)

12959  MLKEPLLLVITIASQLLHAVKEFPLPATVLLGVVIVVYLFAHADRDQLKSLLRINGAKDG

       SKKSVKFYLNQLPGPQCIPLLGNSLMMATDRE (1) 12684

13019  DMFNRLTTARKLYGRKQGICRIWNGRTPYVLISKAEPVERILSSSVNIEKGRDYGFLRPW  12840

12839  LGNGLLTCPGSRWYKRRKALNPTFNYKMLSDFLEVFNRQAQTMVRLMEKELNRENGFNCT  12660

12659  PYATLCSLDILCETAMGYPIQAQEQFGSDYVKAHEE (2) 12552

12391  IGRIMLERLQKIWLHPDFIYKRTNFYKRQSECLKILHGFSENVIKQRRLQRDASLAN  12221

12220  KHDEDPSIEIGRKRQLAFLDLLLEATQDGQPLSDRDIRDEVDTFILGGHDTTATAIGWLL  12041

12040  YLLGTDLQVQDRVFEEIDSIMGQDRDRPPTMIELNEMKYLECCIKEALRLFPSIPLIARK  11861

11860  LTESVNVGDYTIPAGTNAVIVVYQLHRDTQVFPNPDKFNPDRFLPENSQGRHQYAYIPFS  11681

11680  AGPRNCIGQKFGLLEEKAVAVAVLRKYRITSLDRREDLTLYGELVLKSKNGLRISISQRQ*  11498
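
Notes such as "3 diffs" compare two versions of the same protein position by position. A minimal Python sketch of that comparison, assuming the two versions are already the same length and need no gaps (the function name is hypothetical):

```python
def residue_diffs(a, b):
    """Return (0-based position, residue in a, residue in b) for each mismatch.

    Assumes both sequences are already aligned without gaps.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# The same stretch from the trace-based assembly and from AAGE02013626.1 above.
trace_version = "LGTDPQVQDRVFEEIDSIMGQ"
contig_version = "LGTDLQVQDRVFEEIDSIMGQ"
print(residue_diffs(trace_version, contig_version))  # [(4, 'P', 'L')]
```

Sequences of unequal length would need a real alignment first, which this sketch does not attempt.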

 

>AAGE01044016.1 AAGE01004063 47% to 4AR1, 614744667 579602080 complete

probably same gene as 4T1.6 (3 diffs), 4I1.3 (2 diffs)

MIAIIAFTAIFVLFVYVWQWRRRLSRPFRTVPGPPGLPLIGNCHQFIGKSSTNIFHMLI

ELERLYGSVFKVDVATGIWLFYMSPGDIERIMTGP

EFNCKSDDYDMLLEWLGTGLLISNGNKWFTHRKALTPAFHFKILDNFVQVFDEKSTILAR

KFLSYSGKVVGIFPLVKLCTLDVIVETAMGTESNAQTEESGYTMAVEDISEIVFWRMFNN

VYNTEFMFKLSNKYGTYKKCLETIREFTLSIIEKRRSTLNVFDKNGGTSEVCNDSTGLKK

KMALLDILLQTEIDGRPLTNEEVREEVDTFMFA (0)

GHDTTASAITFLLYAMAKYPDVQQKVYEEAVSVLGDSIDTPITL

SALNDLKYLDLVIKESLRMFPPVPYISRSTIK

EVELSGCTIPTGTNITVGIFNMHHNPKYFPDPEEFIPERFEVERGVEKQHPYAYVPFSAG

GRNCIGQKFAQYEIKSTISKVIRLCRIELI

RPNYEPPLKAEMILKPQDEMPLRFFPR*

 

>AAGE01046474.1 possible 4K2 like N-term joins with AAGE01021812

2979 MLIVVLLPLVITLCLVFAFVHRKLLQFPNLAGPPEWPIAGSATEIVNLSSI (1) 2827

2664 EIFKLLRRYAQQYGTAYKLSFWYQYTLVFAKPDIAE (0) 2557

supercontig 1.283 1377209 EIFKLLRRYAQQYGTAYKLSFWYQYTLVFAKPDIAE 1377316

 

AAGE01021812 52% to AAGE01044016 54% to CYP4K2

supercontig 1.283 1387420 KILNTQSYASKSEDYDKVAEWIGYGL 1387497

     KILNTQSYASKSEDYDKVAEWIGYGL 4389

4388 LISKGEKWFKRRKVLTPGFHFKILESFVRVFNEKSDVLCRKLASYGGSEVDVFPTLKLYT 4209

4208 LDVLCETALGYSCNAQTEDSFYPAAVEELMSILYWRFFNLFASVDTLFRFTKQYRRFHKL 4029

4028 IGDTREFTLKIIEEKRKLLNELHDEGAVNEEDDEGKKKMALLDLLLRATVDGKPLSD 3858

3857 DDIREEVDTFTFA 3819 (0)

3755 GHDTTASALTFLLFNIAKYSDVQQKLFEEISSVVGSTSELSLH (2) 3645

3583 TLNDLRYLDLVIKESLRLYPSVPMIARIATENTKLDDMPIPKCTCVSVDIFQMH 3404

3403 RDPDRFEDPESFIPERFDAIRDGGKHNAFTYIPFSAGNRNCI 3269 (1)

3219 GQKFAQYELKIAVVKLIQTFRLELPSPDIEPILKAEIVLKPAEKLPIRFITRTTK* 3049

 

AAGE02013268.1 use this seq

239125  MLIVVLLPLVITLCLVFAFVHRKLLQFPNLAGPPEWPIAGSATEIVNLSSI (1)  239274

239440  EIFKLLRRYAQQYGTAYKLSFWYQYTLVFAKPDIAE (0) 239547

249651  KILNTQSYASKSEDYDKVAEWIGYGLLISKGEKWFKRRKVLTPGFHFKILESFVRVFNE  249827

249828  KSDVLCRKLASYGGSEVDVFPTLKLYTLDVLCETALGYSCNAQTEDSFYPAAVEELMSIL 250007

250008  YWRFFNLFASVDTLFRFTKQYRRFHKLIGDTREFTLKIIEEKRKLLNELHDEGAVNEEDD  250187

250188  EGKKKMALLDLLLRATVDGKPLSDDDIREEVDTFTFA (0)

250362  GHDTTASALTFLLFNIAKYSDVQQKLFEEISSVVGSTSELSLH (2) 250490

250552  TLNDLRYLDLVIKESLRLYPSVPMIARIATENTKLDDMPIPKCTCVSVDIF  250704

250705  QMHRDPDRFEDPESFIPERFDAIRDGGKHNAFTYIPFSAGNRNCI (1) 250839

250901  GQKFAQYELKIAVVKLIQTFRLELPSPDIEPILKAEIVLKPAEKLPIRFITRTTK* 251068

 

>CYP4D23 AAGE01000026.1 476394815 TC65595 TC24018 TC42055 74% to 4D22

only 51% to 4D17, probable ortholog of 4D22 complete

4T2.8 (v1 2 diffs), 4T1.3 (v2 1 diff), 4T1.1 (v3 1 diff)

AAGE01263405.1 probable exon 1 of 4D22 ortholog

566  MSILDWILVITGAVLAINYLLVRRNLKYQSQWPGPAAVPLIGCYYLYFNKKPE (0)

5888 DVMDFIFTLSRKYGTMFRVWVGTRLALFCTNTPDTETVLSSQKLIRKSELYKFLVPWLGN 6067

6068 GLLLSTDQKWFNKRKIITPAFHFKILEQFIEVFDRQSGILVQKLKPEASGKLVNVYPYVT 6247

6248 LCALDVIC 6271 (1)

6334 ETAMGTPINAQTDVDSKYVRAVTELSYLLTTRFVKVWQRSDFLFNLSPDRKRQDKV 6501

6502 IKVLHDFTTNIIQKRRKELMDHGDSGISGDDSIGSKKKMAFLDVLLQASVDGKPL 6666

6667 TDKEIQEEVDTFMFEGHDTTTIAIAFTLLLLARHPEVQEKVYKEVTEIIGTDLSIPATYR 6846

6847 NLQDMKYLEMVIKESLRLYPPVPIIGRKFTEKTTIGGNVIPEDSNFNLGIIVMHRDPKLF 7026

7027 DDPEKFDPERFSPERTMEQSSPYAYIPFSAGPRNCI (1)

7202 GQKFAMLELKSTLSKVIRNYRLTEAGPEPQLIIQLTLKPKDGLKIAFVPRA* 7357

 

>CYP4D24 AAGE01006231.1 4T2.6 (100%) complete 494125342 62% to 4D16

AAGE01082298 bridged by 825775921 to AAGE01006231.1

1254 MLILLASVVVLSIGLAVYFYQQFANRLHYAAKIGGPKGYPLLGNSIQYGTKSPVEFLQE 1078

1077 VQKTNEQCGKFYRLWIGPDLIFPITDAKL 991 (0)

917 AILSSQKLLDKSVQYDFIRPWLGNGLLTSTGRKWHSRRKIITPTFHFKILEQFVEIFD 744

743 QQSNIFVGQLKSKAQSGEDFDVFPVVTLCALDVIC 639

4385 ESAMGTKVNAQLNSDSKYVRAVKD (2) 4456

MATVAMARSFKAFARFNFTFYFTPYRRMQDKALKVLHDYTDSVIRSRRLEL

AKGAFTKSDENENDVGIRKKVAFLDMLLQATVDGRPLDDLEVREEV

4812  DTFMFEGHDTTTSAISFLIGILAKHPDVQQKVYDEVRNVIGDDLNVSVTLSMLNQLNYLD  4991

LVIKETLRLYPSVPIYGRMLLENQEI (1)

5134  NGTVFPAGSNLAIFPYFMGRDPEYFENPLEFRPERFAVETSAEKANPYRYVPFSAGPRNCIGQKFA  5331

VAEIKSLISKLVRHYEVLPPKQPNSERMIAELVLRPEGGVPVRIRSRVR*

 

>AAGE01055570 54% to 4D16 TC58022 AAGE01032454.1 512569922 complete

mate pair = 514720868 = C-term

walked upstream to 822913819 mate pair = 822913819 = C-term

walked up to 803280909 mate pair 808283299 matches mid region

walked to 519825648 mate pair 520507511 matches C-term

walked to 826165713 and 825253376 mate pair = 825244224 matches exons 2,3

walked to 528823040 and 572484877 mate pair = 572478448 matches exons 3,4

walked to 585907890 walked to 812022667 and 749632380 mate pair = 749635932

matches exon 2, walked to 578582171 (possible repeat region)

walked to 580094767 mate pair = 585907890 above so got past the repeat

also found 759050174 mate pair = 759046608.  This mate pair has an N-term seq

which is almost identical to AAGE01124480

another hit that matches 759046608 exactly = 824317331 so the seq is confirmed

     MWFLLSLVAAACLAWAIYRKFARTLEISGQHTGPPALPILGNGLWFLNKQPD (1)

     EFLPIIQRLTDEYGDVFRFWQGPEFTLYVGRPSMIE (0)

     TLLTDKNLTDKS 392

 393 GEYGYLSNWLGDGLLLSKRNKWHARRKAITPAFHFKILEQFVDVFDRNASELVDVLGKHA 572

 573 DSGEVFDIFPHVLLYALDVIC (1) 635

 698 ESAMGTSVNALRNADSEYVRAVKEAANVSIKRMFDFIRRTPLFYLTPSYQQLR-KSLK 868

 869 VLHGYTDNVITSRRKQLSNSSNKNHKDSDDFGFRRKEAFLDMLLKTNINGKPLTDLEI 1042

1043 REEVDTFMFEGHDTTTSAVVFTLLNLAKHPAIQQKVYDEIESVIGNDLQKPIELSDLHDL 1222

1223 SYLEMVIKETLRLYPSVPLIGRRCVEETTIEGKTIPAGANIIVGVFFMGRDPNYFEKPLD 1402

1403 FIPERFSGEKSVEKFNPYKYIPFSAGPRNCI 1501

     GQKFALNEMKSVISKLLRHYEFILPAGSPAEPLLASELILKPHHGVPLQIRRRGH*

 

>516274867 broken CYP4D exon 1 probable pseudogene

AGQSALPILGNVLRFLNYLPD (1)

AGCTGGCCAATCGGCCTTACCAATCCTGGGGAATGTACTGAGGTTTCTCAACTACTTGCCCGATGGT
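
The peptide and nucleotide lines above can be checked against each other by translating the DNA with the standard genetic code. In the Python sketch below the reading frame offset of 1 is an assumption found by inspection, and the translate helper is hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna, offset=0):
    """Translate complete codons of `dna`, starting at 0-based `offset`."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE[dna[i:i + 3]] for i in range(offset, len(dna) - 2, 3)
    )


dna = "AGCTGGCCAATCGGCCTTACCAATCCTGGGGAATGTACTGAGGTTTCTCAACTACTTGCCCGATGGT"
print(translate(dna, offset=1))  # AGQSALPILGNVLRFLNYLPDG
```

The first 21 residues reproduce the peptide listed for 516274867. The final codon GGT would fit a phase 1 boundary (one exon base, G, followed by a GT splice donor), though that reading is an interpretation and is not stated in the notes.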

 

>AAGE01019344.1 AGE01032320 54% to 4D17 complete

AAGE01178606.1 exon 2 of 4D like seq, 45% to 4D17 but only 35% to 4D15

TC64783 Length = 903 83% to AAGE01055570

793219534 630759272 note that in 630759272 exons 2 and 3 are on the minus strand

and exon 4 is on the plus strand. But the order is correct on 587120742

520001141 512663558,

AAGE01124480.1 516274910 = exon 1 most like 4D17

744614807 568757642 576385708 are exact matches so this seq is

really different from the seq of AAGE01055570

this is probably the N-term exon of seq AAGE01019344

904  MWIYLSLLTVGFVAVVIYRKFARTLEVAQQYAGPPALPILGNGLWFLNKQPD (1)  1059

1674 EFLPIIHKLTSTYGDVVRFWQGPQFTLYVGNPSMIE (0)  1781

19   ILTNKHLTDKSGEYDYLSNWLGDGLLLSKRHKWHARRKAITPAFHFKILEQFVDVFDRNAAELV 198

199  DVLEKHADDGKTFDMFPYVLLYALDVIC (1)

332  ESAMGTSVNALRNADSEYVRAVKEAAHVSIKRMFDIIRRTSLFYLTPSYQKLRKALK 511

512  VLHGYTDNVIVSRRNQLMSKTDSGGVSDEFGAKKKDAFLDMLLRTSINGKPLTNLEIRE 688

689  EVDTFMFEGHDTTTSAVVFTLFNLAKHPEIQQKVYDEIVSVIGKDPKEKIELSHLHDLSY 868

869  TEMAIKETLRLFPSVPLIGRRCVEEITIEGKT

     IPAGANIIVGIYFMGRDPKYFENPSHFIPERFEGEFSVEKFNPYKYIPFSAGPRNCI (1)

     GQKFALNEMKSVISKLLRHYEFILPPDSVEEPPLASELILKPHRGVPLQIRHRALN*

 

>AY431801 64% to 4D24 AAGE01115931.1 AAGE01014858.1 AAGE01023514.1 complete

AY433130 change 1 aa use AAGE02013268.1

MFLLVTVFFAVVSLAVFVYQKFANQLYYGAKIGGPKCYPLVGNAFRFINKSPP ()

DFFLTIERTVREAGKCFRLWLGPELLIIVTDAKVAE ()

GVLSSPKFIEKSGEYNFIRPWLGDGLLTSSYRKWHSHRKIIT

PTFHFKILEQFVEIFDSQSNILIDKLTPFMESGETFDVFPLVTLCALDVIC (1)

ESAMGTKVNAQIHSDSEYVQAVKE 2240 (2)

2179 ITTIIHIRTYDVLARYDFLFNLSSYRKRQDKVLEVLHGYTNSVIRSRRRELSDAKEANPD 2000

1999 NNATSELGIRRKVAFLDMLLQATVDGRPLTDVEIREEVDTFMFEGHDTTTSAISFLLYRL 1820

1819 AKHPEVQHKVYDEIKAVIGEGMTGPVTLSMLNELHYLELVIKETLRLYPSVPFYGRKVLENSEI (1) 1628

1567 EGTTFPAGSNLILMPMFMGRDPEYFDDPLEFRPERFEKEISAEKVNPYRYIPFSAGPRNCI 1385

1384 GQKFAMAELKSVASKVLRHFEVLPPEGGQEESFIGEMILRPTYGVLLRLKKRQ* 1229

 

AAGE02013268.1

212620  MFLLVTVFFAVVSLAVFVYQKFANQLYYGAKIGGPKCYPLVGNAFRFINKSPP () 212462

202379  DFFLTIERTVREAGKCFRLWLGPELLIIVTDAKVAE (0) 202272

189962  GVLSSPKFIEKSGEYNFIRPWLGDGLLTSSYRKWHSHRKIITPTFHFKILEQFVEIFDSQ  189783

189782  SNILIDKLTPFMESGETFDVFPLVTLCALDVIC (1) 189684

189644  ESAMGTKVNAQIHSDSEYVQAVKE (2) 189553

189492  ITTIIHIRTYDVLARYDFLFNLSSYRKRQDKVLEVLHGYTNSVIRSRRRELSDAKEANPDNNA  189304

189303  TSELGIRRKVAFLDMLLQATVDGRPLTDVEIREEVDTFMFEGHDTTTSAISFLLYRLAKH  189124

189123  PEVQHKVYDEIKAVIGEGMTGPVTLSMLNELHYLELVIKETLRLYPSVPFYGRKVLENSEI (1) 188941

188880  EGTTFPAGSNLILMPMFMGRDPEYFDDPLEFRPERFEKE  188764

188763  ISAEKVNPYRYIPFSAGPRNCIGQKFAMAELKSVASKVLRHFEVLPPEGGQEESFIGEMI  188584

188583  LRPTYGVLLRLKKRQ*  188536

 

>CYP4H28 4T2.2 (2 diffs) complete

AAGE01082714.1 AAGE01027375.1 AY432644 55% to 4H18

MLAILVSLATVAFLWLVYQRRMARAAKIAAYFPHPKPVLPLLGNSLMFANKDAPAIFHTVLDLHKQCG  1218

QNLVTYGLFGDVQLHISSPKAIERVLLSKVTKKNYIYEYLEPWLGTGLLLSFGEKWFQRR  1038

KIITPTFHFKILEQFLEVFNAETDRLVTKIEQHVGGEEFDMYQYITLHALDSIC  876

2915 ETSMGVSINALDNPDNAYVHAIKDFGSIVIQRTFSALRSFPLLYFLHPFYWRQQKLIK 3094

3095 TMHN FTNSVIKAKRQALEEKRHTEGETKEHNEDDGIYGKKRMSFLDLLLNESSMSD 3262

3263 ADIREEVDTFMFEGHDTTTSGIYFSLMALAMHPDIQERLYGEIRQVLETEEERHAPLTNATLQQMKY  3463

3464  LDMVIKEVLRVYPSVPIIGRELLEDVEI (1) 3547

3604  NGCQVPRGTAMVVIIHNVHRNAEVFPDPERFDPERFSDESGGKRGPYDYIPFSVGARNCI  3783

GQKYALLEMKVTLVKLLLAYRFIPGKSTDSIRIQGDLVLRPFGNMALRIESR*

 

>4H29 4I1.8  512549996 AAGE01010708.1 784728638 complete

MVPLLMLISLLASALIWVLSALVKNLLVYRELQRKLPNFVSTPTVLLLGNTHLFKKDPTPPG

IFATFNQFHRTYGNDLIVQGLLNRPALQITSAPVVEQVLQARTIKKSIIYEFMRPWLNEG

LITSLGKKWAQRRKIITPAFHFKILEEFLAIFNERTEVFVDKIKDQVGKGDFNIYEHVTL

CTLDIISESAMGVKLNAQDDPNSSYVQAVKE (2)

MSEIIFQRLFGLLRMHKFFFQMSEAAQRQRAALKVLHK

FTDSVIFQRKDQLDDEQARQESKQKLEETDIYGKRKMTLLELLLNVSVEGHHLSNS

DIREEVDTFMFEGHDTTTSCISFSAYHIARHPEVQQKLYDEMVQVIGKDFKNAELSYSTL

QELKYLEMTIKEVLRIHPSVPIIGRKTTGDMRIDGETVPAGVDIAVLIYAMHNNP

EVFPEPEKFDPERFNEENSAKRHPYSYIPFSAGPRNCIGQKFALLEIKVTLVKLLGHYRL

LPCEPENEVKVKSDITLRPVNGTFVKIVPR

 

AAGE02013311.1 use this seq (3 aa diffs)

43167  MVPLLMLISLLASALIWVLSALVKNLLVYRELQRKLPNFVSTPTVLLLGNTHLFKKDPTP  42988

42987  PGIFATFNQFHRTYGNDLIVQGLLNRPALQITSAPVVEQVLQARTIKKSIIYEFMRPWLN  42808

42807  EGLITSLGKKWAQRRKIITPAFHFKILEEFLAIFNERTEVFVDKIKDQVGKGDFNIYEHV  42628

42627  TLCTLDIISESAMGVKLNAQDDPNSSYVQAVKE (2) 42529

37902  MSEIIFQRLFGLLRMHKFFFQMSEAAQRQRAALKVLHKFTDSVIFQRKDQLDDEQARQES  37723

37722  KQKLEETDIYGKRKMTLLELLLNVSVEGHHLSNSDIREEVDTFMFAGHDTTTSCISFSAY  37543

37542  HIARHPEVQQKLYDEMVQVIGKDFKNAELSYSTLQELKYLEMTIKEVLRIHPSVPIIGRK  37363

37362  TTGDMRIDGETVPAGVDIAVLIYAMHNNPEVFPEPEKFDPERFNEENSAKRHPYSYIPFS  37183

37182  AGPRNCVGQKYALLEIKVTLVKLLGHYRLLPCEPENEVKVKSDITLRPVNGTFVKIVPR  37006

 

>476148479 476152924 832469399 620727729 529569782 68% to 4H18 complete

AAGE01076911.1

MLLILTLIFATVGYALFNYHRQRQKLLNIRSHFDGPDSHYLWGTFPMFIGKTIP (1)

DIWDIITDLHKKHGEDIAIIAAFN

ELVMDLSSSKNVEKVLLAKSIKKSFAYDFLEPWLGTGLLISTGEKWFQRRKIITPTFHFS

MLEGFLEVFNKEANILVSKLKAKAGKDEFDIYDYVTLYALDSIC

ETSMGVQINAQDDPNNEYAVAVKQMSTFILRRVFSILRTFPSLFFLYPFAKEQKKVILKLH

NFTNSVIDARRAMLEKEKSNKNVTFDLQEENMY

TKRKMTFLDLLLNVTVNGKPLSREDIREEVDTFMFEGHDTTTSGISFTLWHLAKYQDV

QQKLFEEIDRVLGKDKVNAELTNLQIQELDYLDMVVKESLRLIPPVPIIGRTLVEDMEM (1)

NGVTIPAGTQISIKIYNIHRNPKIWEKSDEFIPERFSKTNESKRGPYDFIPFSAGSRNCIGQ

RYAMMELKVTIIKLIASFKVLPGDSMDKLRFKTDLVIRPDNGIPIKLVERI*

 

>AAGE01049176.1 67% to 4H14 complete

MLFLAIVVGALLYLVVNFYVTRKPLERMAVHFSGPKPHYLLGNVLEFLNKDLP (1)

GIFETMVGFHRKYGQDILTWNVLNLNMISVTSAENVEKVLMAKQTKKSFLYSFVEPWLGQGLL

ISSGEKWFQRRKIITPTFHFKILEQFVTVFNKETDTMVENLKKHVDGGEFDIYDYVTLMALDSIC

ETSMGTCVNAQKNPTNRYVQNVKRMSVLVLLRTISVLAGSPLLYDILHPHAWEQRKIIKQ

LHEFTISVIESRRRQLEADKLEQVDFDMNEESLYSKRKMTFLDLLLNVTVEGKPLTNADI

REEVDTFMFE (0)

GHDTTTSGISFAIYQLALNPQIQDKLYDEIVSILGKNSSNVELTF

QTLQDFRYLESVIKESMRLFPPVPFIGRTSVEDMEM (1)

NGTTVKAGQEFLVAIYVIHRNPKVYPDPERFDPERFSDTAESKRGPYDYIPFSAGSRNCI

GQRYAMLEMKVTLIKLLMNYKILPGESMGKVRVKSDLVLRPDRGIPVKLVARS*

 

>494155296  56% to AY205085 66% to 4H14 793189512 AAGE01213118.1

AAGE01473588.1 AAGE01538714.1 531423523 512616786 570666861 571502407

supercontig 1.85 Frame = - complete

2620039 MITLVLVAGVVLYFLRSFLQKRNKLLKIANHFGGPKPLPVIGNLLEFNTDIP (1) 2619884

2602651 GIVHLNHTYGPNLFVWGFLNENVLFLGDTKLVEKVLLAKQTQKSLLYSYLTCWLRTGLLLA 2602469

SGEKWFQRRKIITPTFHFKVLEQFVTVFNREAQTMVDVMRKHVGGKEFDVYSYVTLMALDSVC

ETSMGTSVNAQKDPDNRYVRNVKR (2?)

MSVLFLLRVIHPLATHPELYSLIHPNAYEQRKIVRELHEFT

DNVIATRRKQLKSDQMVDINRNVEDRYSKQKMTFLDLLLNVNIDGKPLTDLD

IREEVDTFMFE (0)

GHDTTTSGISFTIYQLALNPHVQDKIYEEIVAILGKNHKTVELTYQSLQEFKYLEMAIK

EGLRLFPSVPFIGRNLVEDLEF (1)

DDITLPAGQDILIPIYMIHRNPEIYPDPERYDPERFSDGTESKRGPYDYIPF

SAGTRNCIGQRFAMLEMKAALIKLIGNYRILPGESLKKLRIMTDLVVRPEKGVPIRLEERV*

   

AAGE02005220.1 Length=120659 USE THIS SEQ

89011  MITLVLVAGVVLYFLRSFLQKRNKLLKIANHFGGPKPLPVIGNLLEFNTDIP (1)  88856

71635  GIFEKIVHLNHTYGPNLFVWGFLNENVLFLGDTKLVEKVLLAKQTQKSLLYSYLTCWLRTGLLLAS  71438

71437  GEKWFQRRKIITPTFHFKVLEQFVTVFNREAQTMVDVMRKHVGGKEFDVYSYVTLMALDSVC (1) 71252

71190  ETSMGTSVNAQKDPDNRYVRNVKR (2) 71122

71063  MSVLFLLRVIHPLATHPELYSLIHPNAYEQRKIVRELHEFTDNVIATRRKQLKSGQM  70893

70892  LDINRNVEDRYSKQKMTFLDLLLNVNIDGKPLTDLDIREEVDTFMFE  (0) 70752

70216  GHDTTTSGISFTIYQLALNPHVQDKIYEEIVAILGKNHKTVELTYQSLQEFKYLEMAI  70043

70042  KEGLRLFPSVPFIGRNLVEDLEF (1)

69911  DDITLPAGQDILIPIYMIHRNPEIYPDPERYDPERFSDGTESKRGPYDYIPFSAGTRNCI  69732

69731  GQRFAMLEMKAALIKLIGNYRILPGESLKKLRIMTDLVVRPEKGVPIRLEERV*  69570

 

>TC65985 TC16577 TC24796 TC37697 57% to CYP4H14

AAGE01321728.1 AY205085 AAGE01106416.1 complete

MFNFAVFLVILVVGLARFCINRSKLQQLAKHFPGPKPALLVGNLLQFPADIGGIFRRMVYY

HEKFGPDIVTWGIGNTLKFNVSSTRNVEKVLMAKTVQKSLSYSFIEPWLGKGLLTSTGRK

WFQRRKIITPTFHFTILEGFAEVFNRNADTLIDKLKVHEGGSEFDVYRYVSLYALDSICE

TAMGVQVHAQDDPENQYVRDVNRLSELFLLRIFSFLGMFPTLYWYLHPNAWEQRKLIRTL

HQFTDNVIWKRREQLMNGPRNDEMDNTTLSKKKQTFLDLLLCMSVESQPLSNEDIREEVD

TFMFGGHDTTSSAISFTIMQLALHQDIQDKLYAEIVSILKGQNLKTTHLTFNNIQDFKYL

DLIVKESLRLLPPISYVGRKLTEDTELNGATIPAGQDIFIPIYMVHRNPKIYPDPERFI

PERFAENAENLRGPYDYIPFSIGSRNCIGQKYGMMQLKMTVVRLIANFRVLPSEATASVK

LRTDLVLRPEYGIPIKIEARN*

 

AAGE02013268.1 use this seq (3 aa diffs) no introns

13367  MFNFAVFLVILVVGLARFCINRSKLQQLAKHFPGPKPALLVGNLLQFPADIGGIFRRMVY  13188

13187  YHEKFGPDIVTWGIGNTLKFNVSSTRNVEKVLMAKTVQKSLSYSFIEPWLGKGLLTSTGR  13008

13007  KWFQRRKIITPTFHFTILEGFAEVFNRNADTLIDKLKVHEGGSEFDVYRYVSLYALDSIC  12828

12827  ETAMGVQVHAQDDPENQYVRDVNRLSELFLLRIFSFLGMFPTLYWYLHPNAWEQRKLIRT  12648

12647  LHQFTDNVIWKRREQLMNGPRNDEMDNTTSSKKKQTFLDLLLCMSVEGQSLSNEDIREEV  12468

12467  DTFMFGGHDTTSSAISFTIMQLALHQDIQDKLYAEIVSILKGQNLKTTHLTFNNIQDFKY  12288

12287  LDLIVKESLRLLPPISYVGRKLTEDTELNGATIPAGQDIFIPIYMVHRNPKIYPDPERFI  12108

12107  PERFAENAENLRGPYDYIPFSIGSRNCIGQKYGMMQLKMTVVRLIANFRVLPSEATASVK  11928

11927  LRTDLVLRPEYGIPIKIEARN*  11862

 

>AY431450 65% to 4J10 AAGE01108571 514842991 complete

continues on AAGE01227281.1 AAGE01378346.1

MFSSVLSLVIITLIVLLAVYEWYLRQRDGYRAALQYPGGPML

PVLGNILEVLIKDTVQTFNYARSNALKYGRSYRQWIFG

NVILNVIRIREAEPILSSTKHTRKSILYRFLEPLMGDGLLCSKGSKWQARRKILTPAFHF

SILNDFLQVFQEEAEKLVGLLDSCADAEEEVVLQSIVTRFTLNTIC (1)

ETAMGVKLDTFIGADKYRSQVYDVGERIVHRTMTPWLYDDGVYNLFGYQKPLEDA

IEPIHDFTRSIIRQKREQLKQDSTMHIVDSDGI (2)

YGSKQRYAMLNTLLMAEENDAIDEEGIREEVDTFMFEGHDTTAAGLIFSILLLATEQEAQ

QRVYDELLKARSTKSESEAFTIADYNNLKYLDRFVKEALRLYPPVSFISRNLSGPLEV

DSTTFPHGTIAHIHIYDLHRDPEQFPDPERFDPDRFLPEVAA

KRNPYAYVPFSAGPRNCIGQKYALLEMKTVLCALLINYRILPVTTRQEVIFIADLVLRAK

TPIKVQFAKRKANATRS*

 

>AAGE02025842.1 first P450 on contig (of two)

6 aa diffs to AY431450 all in short interval

trace files 811916166, 586617316, 582273387 match this seq

580134265, 753220309, match AY431450

There may be two sequences.

Searched with the first 211 nucleotides to see if there were two

alternate matches affecting synonymous codons. All but one trace file matched

this genomic seq.

180450  MFSSVLSLVIITLIVLLAVYEWYLRQRDGYRAALQYPGGPMLPVLGNILEVLIKDTVQTFNYARSN 180647

180648  ALKYGRSYRQWIFGNVILNVIRIREAEPILSSTKHTRKSILYRFLEPLMGDGLLCSKGS  180824

180825  KWQARRKILTPAFHFSILNDFLQVFQEEAEKLVGLLDSCADAEEEVVLQSIVTRFTLNTIC (1) 181007

181067  ETAMGVKLDTFIGADKYRSQVYDVGERIVHRTMTPWLYDDGVYNLFGYQKPL  181222

181223  EDAIEPIHDFTRSIIRQKREELKQDSTMHIEDSGDI (2) 181330

181387  YESKQRYAMLNTLLMAEENDVIDEEGIREEVDTFMFEGHDTTAAGLIFSILLLATEQEAQQRV  181575

181576  YDELLKARSTKSESEAFTIADYNNLKYLDRFVKEALRLYPPVSFISRNLSGPLEV (1)  181740

181798  DSTTFPHGTIAHIHIYDLHRDPEQFPDPERFDPDRFLPEVAAKRN  181932

181933  PYAYVPFSAGPRNCIGQKYALLEMKTVLCALLINYRILPVTTRQEVIFIADLVLRAKTPI  182112

182113  KVQFAKRKANATRS*  182157

 

>AAGE01397643.1 84% TO 223407477 569795084

250bp downstream of AAGE01227281.1

join with AAGE01226366.1

MDFLMDWWFAVLIIVIVLLAWDAIDKSGRPYRAMNKFPGPRVFPLIGTLSEILFKDQGK

TFQLAREWPKRYGGSYRFWVNSTLYVLNVVRVREAEPILSSTKNIDKSRFYKFLHPFLG

LGLLNSTGPKWMHRRRILTPSFHFNILNGFHRTFVEECDQLLATIDEHVDKGVSTAL

 

>AAGE01226366.1 95% to AAGE01331087.1 10 aa diffs (allele?)

      YLNP

1440  KKRYAMLDSLLVAEQKQLIDEAGIREEVDTFAFEGHDTTAAALVFIFFTLAHESAVQDRI  1261

1260  YSEIRQVYNGKPQSDRVFTPQDYSEMKFLDRALKECLRLWPPVAFISRNISEDIVLEDGA  1081

1080  VIPAGCVANIHIFDLHRDPEQYPDPDRFDADRFLPEEVDRRNPYAYVPFSAGPRNCIGQK  901

900   YAMMELKVVIVNALLKFRVLPVTKLEDINFVADLVLRSTNPIEVRFERR*  754

 

AAGE02025842.1 ESTs DW194177.1 EB096538.1 use this seq

Second P450 on contig (of two)

182386  MDFLMDWWFAVLIIVIVLLAWDAIDKSGRPYRAMNKFPGPRVFPLIGTLSEILFKDQ (1) () 182556

182616  AKTFQLAREWPKRYGGSYRFWVNSTLYVLNVVRVREAEPILSSTKNIDKSRFYKFLHPFLG  182798

182799  LGLLNSTGPKWMHRRRILTPSFHFNILNGFHRTFVEECDQLLATIDEHVDKGVSTALQPV  182978

182979  MSKFTLNTIC (1)

        ETSMGVKLSTVSGADVYRTKLYEIGEALVHRLMRPWLLNDFLC

        RLTGYKAAFDKLLLPVHSFTTGIINKKREQFQASSEPLVELTEENI (2)

        YLNPKKRYAMLDSLLVAEQKQLIDEAGIREEVDTFAFEG  183506

183507  HDTTAAALVFIFFTLAHESAVQDRIYSEIRQVYNGKPQSDRVFTPQDYSEMKFLDRALKE  183686

183687  CLRLWPPVAFISRNISEDIVLEDGAVIPAGCVANIHIFDLHRDPEQYPDPDRFDADRFLP  183866

183867  EEVDRRNPYAYVPFSAGPRNCIGQKYAMMELKVVIVNALLKFRVLPVTKLEDINFVADLV  184046

184047  LRSTNPIEVRFERR  184088

 

>AAGE01331087.1 61% to 4J5 575366287 574128015

no ESTs for this seq. no exact match in WGS 95% to AAGE02025842.1 second gene

      YLNP

1211  KKRYAMLDSLLVAEQKQLIDEAGIREEVDTFAFEGHDTTAAALVFIFFTLAHEPAVQDRI  1032

1031  YSEILQVYNGKPQSERAFTPQDYAEMKFLDRALKECLRLWPPVAFISRNISEDIVLDDGT  852

851   LIPAGCVANIHIFDLHRDPEQYPEPDRFDADRFLPEEVDRRNPYAYVPFSAGPRNCIGQK  672

671   YAMMELKVVVVNALLKFRVLPVTKLEDINFVADLVLRSTNPIEVRFERR*  525

 

>AAGE01288441.1 97% to AAGE01216085.1 10 aa diffs

note on finding the C-term.  This seq is 74% identical to 4J5

at the C-term.  This 4J5 seq continues as

GQKYALLEVKTAVAYLVLRYRILPATKREEIRFIADLVLRSATPLKVRFERRQNA*

which is 59% to AAGE01331087, so this is a good model.

The N-term part of this seq has only one seq in the trace files,

so it may be a poor version of the AAGE01216085.1 seq.

1368  IVIRGSFVINAIRARETEALLSSTKLIDKSILYTFLYPFMGKGLLTSTGPKWFHRRKILTAAFHFNI  1189

1188  LPKFLVTFQEECDKLLRKLDADVKADNTTTLQSVAARFTLNTIC  1057

997   ETAMGVKLDSMSMADEYRAKIQEVIKLLLLRVMNPWLVEEFPYRLLGFRRRLMKVL  830

829   KPIHAFTRSIIKQRRDLFHANVKNVDDFSEENIYVNTNQRYALLDTLLASEAKNQIDEEG  650

649   IREEVDTFMFEGHDTTASAFTFIFLVIANHQEAQRQLVEEIEAMIAGRIKPTEPLSMHDY  470

469   SELKFMDRVIKECLRLYPPVPFISRAILEDALLGDRFIPKDSMANLHIFDLHRDPDQFPD  290

289   PERFDPDRFLPANVEKRNPYAYVPFSAGPRNCI  191

 

>AAGE01216085.1 61% to 4J9 578920794 complete

TC57837 Length = 832 100% to AAGE01216085 extends the end

519943525 826152951 513457906 new C-term for 4J seq

attempted walking to join with N-term part. Ran into a gap

613942247 589588262

     MDWLTIVLLLILALLALYEVHLRLLLSNRAAKQFPGPRR

4    LPVLGNALALLFNDQVSTFKLPRRWAQRYKESYRLVIRGGFVINAIRARETEALLSSTKL  183

184  IDKSILYTFLYPFMGKGLLTSTGPKWFHRRKILTAAFHFNILPKFLVTFQEECDKLLRKL  363

364  DADVKAGNTTTLQSVAARFTLNTIC  438 (1)

498  ETAMGVKLDSMSMADEYRAKIQEVIKLLLLRVMNPWLVEEFPYRLLGFRRRLMKVL  665

666  KPIHAFTRSIIKQRRDLFHANVKNVDDFSEENIYVNTNQRYALLDTLLASEAKNQIDEEG  845

846  IREEVDTFMFEGHDTTASAFTFIFLVIANHQEAQRQLVEEIETMIAGRSNPTEPLSMHDY  1025

1026 GELKFMDRVIKECLRLYPPVPFISRAVLEDAQLGDRFIPKDSMANVHIFDLHRDPEQFPD  1205

1206 PERFDPDRFLPENVEKRNPYAYVPFSAGPRNCI  1304

QRFAMLELKAILTAVLREFRVLPVTKREDVVFVADMVLRSRDPIVVKFERR* 677

 

>223407477 AABIG09TP.gz 223407646 AABIH08TP.gz 65% to 4J9

AAGE01099570.1 574077942

MDFLTNWWFGALVIVTVLLVRDAIDKSGRIYRAINKFAGPPCLPLIGTLCEILFMNQGK (0)

TYQWARKWPKRYGGSYRFWFSSTLYVLNVVRVREAEHILSSTRNI

DKSRFYKFLHPFLGLGLLNSNGPKWMHRRRILTPSFHFNILNGFHHTFVEECDQLLATID

EHVDKGVPTALQPVMSKFTLNTIC

 

Correct seq 88% to AAGE02025842.1 matches Nelson 223407477 on top

AAGE02025843.1

6807 MDFLTNWWFGALVIVTVLLVRDAIDKSGRIYRAINKFAGPPCLPLIGTLCEILFMNQ (1) 6637

6575 ATTYQWARKWPKRYGGSYRFWFSSTLYVLNVVRVREAEHILSSTRNIDKSRFYKFLHPFLGLGLL

     NSNGSKWMHRRRILTPSFHFNILNGFHHTFVEECDQLLATIDEHVDKGVPTALQPVMSKFTLNTIC (1) 6183

6127 ETSMGVKLSTVSGADVYRTKLYEIGEVLVHRLMRPWLLNDFLCRLTGYKAA

     FDKLLLPVHSFTTGIINMKRKQFQESLEPSVELTEENI (2) 5861

5801 YLNPKKRYAMLDSLLLAEQKQLIDEAGIREEVDTFAFEGHDTTAAALVFIFFTLAREPAVQDRI

     YREILQVYSNKPQSSRAFTPQDYSEMKFLDRALKECLRLWPPVTFISRSISEDIILDDGS

     LIPAGCVANIHIMDMHHDPEQFPDPERFDADRFLPEQVDRRNPYAYVPFSAGPRNCIGQK

     YAMMELKVVVVNALLKFRVLPVTKLEDINFVADLVLRSTNPIEVRFERR* 5100

 

>AAGE02030510.1 Length=13206 98% to AAGE02025843.1 6807-5100. 9 aa diffs

new seq

11199  MDFLTNWWFGALVIVIVLLVRDAIDKSGRIYRAINKFAGPPCLPLIGTLCEILFMNQ (1) 11029

10967  ATTYQWARKWPKRYGGSYRFWFSSTLYVLNVVRVREAEPILSSTRNIDKSRFYKFLH  10797

10796  PFLGLGLLNSNGPKWMHRRRILTPSFHFNILNGFHHTFVEECDQLLATIDEHVDKGVPTA  10617

10616  LQPVMSKFTLNTIC (1) 10575

10519  ETSMGVKLSTVSGADVYRTKLYEIGEVLVHRLMRPWLLNDFLCRLTGYKAAFDKLLLP  10346

10345  VHSFTTGIINMKRKQFQESLEPSVELTEENI (2) 10253

10193  YLNPKKRYAMLDSLLLAEQKQLIDEAGIREEVDTFAFEGHDTTAAALVFIFFTLAREPAV  10014

10013  QDRIYSEILQVYSNKLQSALAFTPQDYSEMKFLDRALKECLRLWPPVTFISRSISEDIIL  9834

9833   DDGSLIPAGCVANIHIMDLHHDPEQFPDPERFDADRFLPEQVDRRNPYAYVPFSAGPRNC  9654

9653   IGQKYAMMELKVVVVNALLKFKVLPVTKLEDINFVADLVLRSTNPIEVRFERR* 9492

 

These two seqs are very close, but the region QDRIYSEILQVYSNKLQSALAFTPQDY

has 4 aa diffs.  Trace files support both sequences

588906795, 590281011 match this seq.

832533501, 832391117, 589181510, 579871626, 592076987  match the other seq AAGE02025843.1
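
The 4 aa diffs in this region can be listed with a short sketch such as the following. It assumes the two segments are in register and of equal length. The AAGE02025843.1 segment is joined across a line break in that entry, and the helper name aa_diffs is hypothetical.

```python
def aa_diffs(a, b):
    """List (position, residue_a, residue_b) where two equal-length peptides differ.

    Positions are 1-based within the segment, not genomic coordinates.
    """
    if len(a) != len(b):
        raise ValueError("segments must be the same length for an ungapped comparison")
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Same region taken from the two entries above
region_02030510 = "QDRIYSEILQVYSNKLQSALAFTPQDY"
region_02025843 = "QDRIYREILQVYSNKPQSSRAFTPQDY"

diffs = aa_diffs(region_02030510, region_02025843)
for pos, x, y in diffs:
    print(f"position {pos}: {x} -> {y}")
print(len(diffs), "aa diffs")
```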

 

>AAGE02030510.1 pseudogene like AY431450 new seq

12342  WRSWRKILTPASHFSIFSEFL*LLQKEVDKLVRLLE

       NGIDKYQWQV*DLNGKIGHSMMTPWFL  12007

       YDDGAYNLFGYQKSLEDAIEPIHDFTKN

11898  DEETIQEEVDNLMFEGYDTTAEGLIFSILLLATEQEAQQRV*NELLEDLS  11749

       TNLESESFTVASYKNFNY

 

>AAGE01099570.1 4J like pseudogene

396 RIDHSIMTQLLYDDGVYNLFKYRKSL*DAIEPIHDFIRSIFLQNCVQLNQDSMMYSEEVK 575

576 QTFASTV*VKLSRISRYGLKPTYIMMNILLTAEK 677

676 NDGTAEETILEEVDNTMFEGCDTTAAGLIFSILLLATEQEPQQRV*DKL*EDCSSKS 847

848 ESETFTWMSYNNLKYRFLK 904

GPEYALLEMITIICILLISYRAIXXXXIFIADQILQTKPTAKVDYARRKANAMRN*

 

>AAGE01003123.1 C-term of 4J like seq 90% to AAGE01331087, pseudogene

PRGCVANIHIMDMHHDPEQFPDPDRFNADRFLPEEVERRNPYAYVPFSAGPRNCI

2873  GQKYAMMEL*VVVVNALLKFRVLPVTKLKDINFVADLVLRSTNPIEVRFERR  2718

 

>476375054 Pseudogene 61% to 4J5, 4aa diffs to AAGE01003123, stop codon in same place

666 PRGCVANIHIMDMHHDPEQFPDPDRFNADRFLPEEVERRNPYAYVPFSAGTRICIGQKYA 845

846 MMEL*VVVVNALLKFGILPVTK 911
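
A quick way to confirm that the stop codon is in the same place in both pseudogene fragments is sketched below. It assumes both translations start at the same residue (PRGCV...), so that positions can be compared directly without an alignment. The names stop_positions, aage01003123 and trace_476375054 are hypothetical labels for this illustration.

```python
def stop_positions(peptide):
    """Return 1-based positions of in-frame stops ('*') in a translated peptide."""
    return [i + 1 for i, aa in enumerate(peptide) if aa == "*"]


# Translations joined from the sequence lines of the two entries above
aage01003123 = (
    "PRGCVANIHIMDMHHDPEQFPDPDRFNADRFLPEEVERRNPYAYVPFSAGPRNCI"
    "GQKYAMMEL*VVVVNALLKFRVLPVTKLKDINFVADLVLRSTNPIEVRFERR"
)
trace_476375054 = (
    "PRGCVANIHIMDMHHDPEQFPDPDRFNADRFLPEEVERRNPYAYVPFSAGTRICIGQKYA"
    "MMEL*VVVVNALLKFGILPVTK"
)

# Compare only the positions covered by both fragments
n = min(len(aage01003123), len(trace_476375054))
diffs = sum(1 for x, y in zip(aage01003123[:n], trace_476375054[:n]) if x != y)

print("stops:", stop_positions(aage01003123), stop_positions(trace_476375054))
print("aa diffs over", n, "shared positions:", diffs)
```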

 

>AAGE01584611.1 89% to AAGE01005255.1 matches 574201551, 520163843

Note: the mate pair of 520163843 is 520524408, which has a C-term

like AAGE01005255, but not identical.  These two genes are linked:

AAGE01584611 is upstream of 520524408 on the same strand.

12   AELYKSNIREVGKIIQQRIMNPLLFEDWIYKITGYQAEFDKILSPIHSFTNNIIRQRRET  191

192  FHATMRNVDSPSEENTYTNIKQRYAMLDSLLLAEAKHQIDAEGIREEVDTFTFEGHDT  365

366  IGSAFVFTFLLIAHDQLVQQSLYEEIQRMFNLQPIPTLQNYNDLKYMDRVIKESLRI  536

537  YPPVPFISRLITEDVQYDGKLVPRGTLMNVGIYDLHRDPEQFPDPLRFDPDRFLPEQVQR  716

717  RSPYAYIPFSAGPRNCI  767

 

>520524408 593182570 813103660 574115512 593092990 825253407

579726574 exact match to AAGE01484914.1

96% to AAGE01005255, but not the same gene

This seq is identical to TC57838

TC57838 Length = 974 9 aa diffs to AAGE01005255 complete

593092990 and 593182570, 825253407 579726574

Another set of WGS seqs are an exact match to AAGE01005255

So there are two very similar gene sequences that are 95% identical.

    MYVFTTVAGLLVFIFILYKIYLRSLPSYRAAKYFPGYPVYPIVQNLFT

    ALFKSQTGAFQQARQWARIFNNRTYRVLIQGVLYVQIIHHKDVEMLLSSSRLITKSPLYK 779

    LIVPFIGNGLLNSTGEKWHQRRKILTPTYHFNILQGFLQIFHEECRKLVNQLDKDAAQGI 599

    TTTLQPLSTQVTLNTIC (1)

    ETAMRLKLDTSETAEVYKSNIREVGKVIQQRIMNPLLFEDWIYKITGYQ

  1 AKFDKILRPIHAFTNSIIRQRRETFHETMKNVDSPSEENIYTNIKQRYAMLDSLLLAEA 177

178 KQQIDGEGIREEVDTFTFEGHDTTGSAFVFTFLLIAHEQLVQQRLFEEIERMFNLQPNP 354

355 TQQDYNDLKYMDRVIKESLRIYPPVPFISRLITEDVQYDGKLVPRGTIMNIEIYDLHRDP 534

535 EQFPDPERFDPDRFLPEEVQRRSPYAYVPFSAGPRNCI

GQRFAMLELKAILIGVLREFRVLPVTKREDVVFVGDMVLRSRDPIVVKFERR* 807

 

AAGE02035951.1 missing the last exon; use this seq. 3 aa diffs to 520524408

2639  MYVFTTVAGLLVFIFILYKIYLRSLPSYRAAKYFPGYPVYPIVQNLFTALFKSQTGAFQQ  2460

2459  ARQWARIFNNRTYRLLIQGVLYVQIIHHKDVEMLLSSSRLITKSPLYKLIVPFIGNGLLN  2280

2279  STGEKWHQRRKILTPTFHFNILQGFLQIFHEECRKLVNQLDKDAAQGITTTLQPLSTQVT  2100

2099  LNTIC (1) 2085

2026  ETAMGLKLDTSETAEVYKSNIREVGKVIQQRIMNPLLFEDWIYKITGYQAKFD  1868

1867  KILRPIHAFTNSIIRQRRETFHETMKNVDSPSEENIYTNIKQRYAMLDSLLLAEAKQQID  1688

1687  GEGIREEVDTFTFEGHDTTGSAFVFTFLLIAHEQLVQQRLFEEIERMFNLQPNPTQQDYN  1508

1507  DLKYMDRVIKESLRIYPPVPFISRLITEDVQYDGKLVPRGTIMNIEIYDLHRDPEQFPDP  1328

1327  ERFDPDRFLPEEVQRRSPYAYVPFSAGPRNCI (1) 1232

      GQRFAMLELKAILIGVLREFRVLPVTKREDVVFVGDMVLRSRDPIVVKFERR.

 

>TC57838 TC48249 matches 593092990 and 593182570, 825253407 579726574

 cyan = corrections

GCAAAATTCGACAAGATTCTTCGTCCCATTCATGCATTCACCAACAGCATCATCCGACAGCGAAGGGAAACATTTCATGA

AACTATGAAAAACGTGGACTCCCCATCGGAGGAGAACATATACACCAACATAAAGCAGCGCTACGCCATGCTGGATAGTC

TTCTGCTGGCGGAAGCCAAACAGCAAATTGACGGCGAAGGGATCCGCGAGGAGGTTGACACGTTTACCTTTGAAGGCCAC

GATACAACTGGCAGTGCCTTCGTGTTCACCTTTCTGTTGATTGCTCACGAGCAACTCGTTCAGCAGCGTCTGTTCGAAGA

GATTGAACGCATGTTCAACCTCCAACCCAATCCAACCCAACAGGACTACAATGACTTGAAGTACATGGATCGGGTGATCA

AGGAATCGCTTCGAATCTATCCGCCGGTGCCATTCATCTCCCGATTGATTACCGAGGATGTACAATACGATGGGAAGTTG

GTACCGAGGGGTACCATCATGAACATCGAAATCTACGATTTGCACCGAGATCCGGAGCAGTTTCCCGATCCGGAACGATT

CGATCCGGATCGGTTTCTGCCGGAGGAGGTCCAGCGGAGGAGTCCGTACGCTTATGTTCCGTTCAGTGCTGGACCGAGGA

ATTGCATTGGTCAACGGTTCGCCATGCTGGAGCTGAAGGCCATCCTCATCGGGGTGCTCCGCGAGTTCCGAGTCCTTCCC

GTTACCAAGCGGGAGGATGTGGTTTTCGTTGGGGACATGGTCCTCCGCTCGAGAGACCCAATCGTGGTCAAATTCGAACG

ACGTTAAGCTTTTTCTTGCTTTTTATAGCGACCCGTTGACCCAGTGAATTCAAGGATTTTTCAGTTTTTTACGGACAAAG

AGCGCATCCTGAACTGCTACTAGGCTACCCAAAAGTCATATTCTTAAATTGTTAATCCTAACATACTGTGTGAATAAATG

TTTTTTTATCGATT

 

>AAGE01005255.1 55% to 4J9 TC57836 complete

5509 MYVFTTVAGLLVFIFILYEIYLRSLPSYRAAKYFPGYPVYPIVQNLFTALFKSQTGSFQQ  5330

5329 ARQWARIFNHRTYRLLIQGVLFVQIIHHKDVEMLLSSSRLITKSPLYKLIVPFIGKGLLN  5150

5149 STGEKWHQRRKILTPTFHFNILQGFLQIFHEECRKLVYQLDKDAAQGITTTLQPLSTQVTLNTIC  4955 (1)

4896 ETAMGLKLDTSETAEVYKSNIREVGKVIQQRIMNPLLFEDWIYKITGYQAKFDK 4735

4734 ILRPIHAFINSIIRQRRETFHETMKNVDTPSEENIYTNIKQRYAMLDSLLLAESKQQID 4558

4557 AEGIREEVDTFTFEGHDTTGSAFVFTFLLIAHEQLVQQRLFEEIERMFNLQPNPAL 4390

4389 QDYNDLKYMDRVIKESLRIYPPVPFISRLITEDVQYDGKFVPRGTIMNVEIYDLHRDPEQ 4210

4209 FPDPERFDPDRFLPEDVQRRSPYAYVPFSAGPRNCI 4096

     GQRFAMLELKAILTAVLREFRVLPVTKREDVVFVADMVLRSRDPIVVKFERR* 783

 

>AAGE01001298.1 AAGE01138953 gene a C-term complete

821735340 matches on the (-)