Aedes aegypti cytochrome P450s

 

Oct. 18, 2005; under revision April 21, 2006 (in progress)

Revision continues May 19, 2006 to June 25, 2006

Compiled by David Nelson and David Drane

 

The completed and named sequences are here

(http://drnelson.utmem.edu/AedesFasta.June25.htm)

This file is more archival, with detailed information.

Please see the FASTA file above.

 

Useful links for analysis

http://www.ncbi.nlm.nih.gov/Traces/trace.cgi  Trace Archive at NCBI

http://trace.ensembl.org/perl/traceview Trace files at Ensembl

http://132.192.64.52/blast/P450.html P450 Blast server

http://www.proweb.org/proweb/Tools/WU-blast.html Do-it-yourself WU Blast

http://www.bioinformatics.vg/bioinformatics_tools/JVT.shtml DNA translator

http://ncbi.nlm.nih.gov/BLAST/Blast.cgi?CMD=Web&LAYOUT=TwoWindows&AUTO_FORMAT=Semiauto&ALIGNMENTS=50&ALIGNMENT_VIEW=Pairwise&CLIENT=web&DATABASE=nr&DESCRIPTIONS=100&ENTREZ_QUERY=%28none%29&EXPECT=10&FILTER=L&FORMAT_OBJECT=Alignment&FORMAT_TYPE=HTML&GENETIC_CODE=0&NCBI_GI=on&PAGE=Translations&PROGRAM=tblastn&SERVICE=plain&SET_DEFAULTS.x=23&SET_DEFAULTS.y=10&SHOW_OVERVIEW=on&UNGAPPED_ALIGNMENT=no&END_OF_HTTPGET=Yes&SHOW_LINKOUT=yes&GET_SEQUENCE=yes   NCBI TBLASTN search

http://www.ncbi.nlm.nih.gov/BLAST/tracemb.shtml NCBI megablast

http://www.tigr.org/tigr-scripts/tgi/T_index.cgi?species=a_aegypti TIGR Aedes gene index page

 

There are 206 Aedes sequences here, including 142 complete sequences.

Numbers in () are intron phases. Names have not been assigned for most genes.
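Intron phase is set by how many coding bases precede the intron: phase 0 falls between codons, phase 1 after the first base of a codon, and phase 2 after the second. A minimal Python sketch, assuming exon lengths are given as coding-base counts in transcript order (the function name intron_phases is hypothetical):

```python
def intron_phases(exon_cds_lengths):
    """Return the phase of each intron from the coding length (bp) of each exon.

    Phase = (total coding bases upstream of the intron) mod 3.
    """
    phases = []
    upstream = 0
    # Every exon except the last is followed by an intron.
    for length in exon_cds_lengths[:-1]:
        upstream += length
        phases.append(upstream % 3)
    return phases


print(intron_phases([84, 100, 250]))  # [0, 1]
```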

 

Sequences were collected and assembled by David Drane and David Nelson from July to

Sept. 2005. Of the 15 million trace file sequences, 3.5 million were downloaded from NCBI

and placed on a stand-alone BLAST server on a Mac G4 for TBLASTN searches

at an expect value of 10. The WGS section of GenBank was searched, and 220 AAGE01XXXXXX

accession numbers are given at the end of this file. The TIGR Gene Index was

searched for the text “P450”. The EST section of GenBank was searched, and

discontiguous megablast was used to extend sequences by chromosome walking.

Most sequences should be represented here now, but not all are assembled.
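TBLASTN compares a protein query against a nucleotide database translated in all six reading frames. The sketch below illustrates that translation step in Python with the standard genetic code (NCBI table 1); it is not the code used inside BLAST, and the function names are hypothetical:

```python
# Standard genetic code (NCBI table 1); codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(dna):
    """Reverse complement of a DNA string (other IUPAC codes are left as-is)."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate from the first base; codons with ambiguous bases become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Three forward-strand translations followed by three reverse-strand ones."""
    rc = reverse_complement(dna)
    return [translate(dna[f:]) for f in range(3)] + [translate(rc[f:]) for f in range(3)]


print(translate("ATGTGGCAAAAC"))  # MWQN
```

The example codons translate to MWQN, the first four residues of the possible CYP15 N-terminus listed below.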

The Aedes mosquito seems to have more P450s than the Anopheles mosquito. 

 

This file is in progress.  The CYP4 and CYP325 families are not yet fully assembled

because there are some large introns in these sequences.

The sequences are presented in clan groups: the CYP2, CYP3, CYP4 and mitochondrial

clans.  Note: Aedes has a CYP18 that was not found in Anopheles.

CYP329 of Anopheles now looks like a pseudogene of a CYP9 sequence.

Its heme signature is incomplete, and it has a P in place of the critical T in the I-helix

oxygen-binding pocket. It is the only sequence in the Anopheles CYP3 clan that does not

fall inside the CYP6 or CYP9 families.
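The two features used to judge CYP329 can be screened mechanically. A rough Python sketch, assuming the commonly cited consensus FxxGxRxCxG for the heme-binding signature and (A/G)GxxT for the I-helix threonine; these simplified motifs are heuristics chosen for illustration (a chance match elsewhere in a full-length protein is possible), and the function name is hypothetical:

```python
import re

# Simplified consensus motifs (heuristics, not curated database patterns):
#   heme-binding signature  F-x-x-G-x-R-x-C-x-G
#   I-helix oxygen pocket   (A/G)-G-x-x-T, T being the conserved threonine
HEME_MOTIF = re.compile(r"F..G.R.C.G")
I_HELIX_MOTIF = re.compile(r"[AG]G..T")


def check_p450_motifs(protein):
    """Report which of the two conserved P450 motifs occur in a protein."""
    protein = protein.upper()
    return {
        "heme_signature": HEME_MOTIF.search(protein) is not None,
        "i_helix_threonine": I_HELIX_MOTIF.search(protein) is not None,
    }


# Fragments taken from the CYP15B1 entry; the second has T replaced by P.
intact = "LFQAGSETTSNTLGYGIAHM" + "FVPFGSGKRRCLGESLAK"
t_to_p = "LFQAGSEPTSNTLGYGIAHM" + "FVPFGSGKRRCLGESLAK"
print(check_p450_motifs(intact))
print(check_p450_motifs(t_to_p))
```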

There are 11 complete sequences in the CYP2 clan (CYP15, 18, 303, 304, 305, 306:phm, 307)

Phantom (phm) is one of the Halloween genes.

There are 76 complete sequences in the CYP3 clan (CYP6, CYP9)

There are 34 complete sequences in the CYP4 clan (CYP4 , CYP325)

There are 9 complete sequences in the mitochondrial clan (CYP12, 49, 301, 302:dib,

314:shd, 315:sad). These include three of the Halloween genes: disembodied (dib), shade (shd)

and shadow (sad).

There are 21 pseudogenes so far.

There are 15 partial sequences (not including the pseudogenes).

 

CYP2/CYP18 clan sequences

 

>514720743 753475610 750240311 possible CYP15 N-term = DR747015.1 EST

MWQNLVVLIIFVILFCLRDMRKPGYFPP (1)

 

>CYP15B like 585964866 641740723 584363040

 78 GPNWFPLIGSGFEVFRLVKHFKFYHLMWAELMRRYGPIVGLRLGRDRVVIVSGLDA 257

258 IREVYSKDQFDGRPDGFFFRIRSFDKRLGVVFTDGAHWDIQRRFSVRTLKALGMGRTGMV 437

438 NSLEREAEEMIHHLRKLSRTQKVISMHNAFDVSVLNSIWTLIAGKR (2) 575

    FDLDDKKLEWIMETIHKSFRVIDMSGGVLNQFPPIRYVLPDKSGFAPLLNLLSPLWTFLQ 816

 

>CYP15B like seq DR746695.1 adult female corpora allata cDNA

813859354 749484786 522065275 514869301 520643713

GTIKSIRSKLDQPDNPDCFIASYLRELNIAERHSSFTNEQLLCLCLDL

FQAGSETTSNTLGYGIAHMLHHPEIVQKIHNELDSVIGRYRLPLLADRPYLPYTEAVLCE

IQRISNIAPLAIAHRTVAPVQLGTYVIPKNTITLISLYSLHMDKAYWGDPEVFRPERFLN

ETGDKLIAHEYFVPFGS (1)

GKRRCLGESLAKSSLFLFFTAFMHAFLVEPAEPGKLPELDGIDGITLSPCPYFVQLKERLI*

 

>possible complete CYP15B1 assembled from parts 52% to 15B1 from Anopheles

AAGE01116789 AAGE01129498

Used trace archive seqs to verify seq at PLLNLLRPLWTFLQ

This region is not accurate in AAGE02003241.1

638470554 

823375362 

593712263 

586030336 

641740723 

569671400 

used AAGE02003241.1 for the C-term seq changes

MWQNLVVLIIFVILFCLRDMRKPGYFPP (1)

GPNWFPLIGSGFEVFRLVKHFKFYHLMWAELMRRYGPIVGLRLGRDRVVIVSGLDA

IREVYSKDQFDGRPDGFFFRIRSFDKRLGVVFTDGAHWDIQRRFSVRTLKALGMGRTGMV

NSLEREAEEMIHHLRKLSRTQKVISMHNAFDVSVLNSIWTLIAGKR (2)

FDLDDKKLEWIMETIHKSFRVIDMSGGVLNQFPPIRYVLPDKSGFAPLLNLLRPLWTFLQ (0)

GTIKSIRSKLDQPDNPDCFIASYLRELNIAERHSSFTNEQLLCLCLDL

FQAGSETTSNTLGYGIAHMLHHPEIVQKIHNELDSVIGRYRLPLLADRPYLPYTEAVLCE

IQRISNIAPLAIAHRTVAPVQLGTYVIPKNTITLISLYSLHMDKAYWGDPEVFRPERFLN

ETGDKLVAHEYFVPFGS (1)

GKRRCLGESLAKSSLFLFFTAFMHAFLVEPAEPGKLPELDGIDGITLSPCPYYVQLKERLI*
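Entries like this one mix residues with intron-phase markers and, in some cases, genomic coordinates, so the annotations have to be stripped before motif searches or identity comparisons. A minimal Python sketch, assuming sequence lines have already been separated from header and note lines (the function name is hypothetical):

```python
import re

# Intron-phase markers as written in this file: (0), (1), (2), (2?), (1?), ()
PHASE_MARKER = re.compile(r"\(\d?\??\)")


def clean_sequence_lines(lines):
    """Join sequence lines into one protein string.

    Drops phase markers, coordinate numbers and whitespace; keeps uppercase
    residue letters (including X for unknown) and '*' for stop codons.
    """
    residues = []
    for line in lines:
        line = PHASE_MARKER.sub("", line)
        residues.append(re.sub(r"[^A-Z*]", "", line))
    return "".join(residues)


print(clean_sequence_lines(["78 GPNW 257", "NSLE (2) 575", "FVPFGS (1)", "GKRRC*"]))
# GPNWNSLEFVPFGSGKRRC*
```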

 

>567527404  46% to CYP15B1 may be a CYP15 pseudogene

XXXXXXXXXXXXXXXXLVRRFRFYHHTCAAFMCLYRPIVDLRMGRDRVVIMTGLDP

I*KVYSKDEKENRPVGFFFRIRSFDKRLAVVFTDGAHWDIQRRFSVRTLKALGMGRTGLV

SSLEREAEEMIHHLRKLSRTQKVISRNNAFDVSVLNSIWTLIAER

 

>CYP18A1 AAGE01025833 AAGE01338874.1 AAGE01065191.1 529463664 572557122

66% to 18A1 (note: CYP18 not seen in Anopheles) complete

revised at cyan aa based on AAGE02007615.1

     MFLDTYLLGVVRQEFFDASKARST

2678 LLVFCCTLSCVVFLQWLFRLVCQIKKLPPGPWGVPIFGYLTFIGHEKHTQYMKLARKYG 2502

2501 SLFSAKLGAQLTVVISDYKIIREAFKTEDFTGRPHSPLLKTLGGF (1?)

     GIINSEGQLWKDQRRFLH 719

 718 EKLRHFGMTVLGNKKHLMESRIM (0)

 534 TEVAELLASLNEVGSQSTDLSKYLSVSVSNVICNIIMSVRFSLEDPKFKRFNWLIEEGMRLF 359

 358 GEIHTIDYIPQIQYLPGNINAKNKIAKNRQEMFDFYREVIDEHKRSFNAENIRDIVDAYL 179

 178 DEIQKAQAEGRDQELFDGKDH 116

     EIQMMQVIADLFSAGMETIKTTLLWLNVFMLRHPDAMKRVQDELDQVVGRNRLPKIEDVP 406

     YLPITETTILEVMRISSIVPLATTHSPKS (2) 319

2116 DVVINGYTIPAGSYVVPLINSVHMDPTLWDKPEEFNPSRFLDAEGKVHKPDFFIPFGVGR  1937

1936 RRCLGDVLARMELFLFFASIMHTFTIELPEDEPMPSLKGIIGVTISPQAFRVKLIPRPLN  1757

1756 ADLDRLRNVGSC*  1718
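Where an exon line carries genomic coordinates at both ends, the span should be exactly three bases per residue; on the reverse strand the coordinates simply run high to low. A small Python consistency check, offered as a sketch (the function name is hypothetical), can catch transcription slips in lines like these:

```python
def span_matches_peptide(start, end, peptide):
    """Check that a genomic span encodes exactly one codon per residue.

    Works on either strand: reverse-strand spans are written high to low.
    A stop codon written as '*' counts as one residue.
    """
    length = abs(end - start) + 1
    return length % 3 == 0 and length // 3 == len(peptide)


# CYP18A1 line running 2678 -> 2502 on the reverse strand: 177 bp, 59 residues.
print(span_matches_peptide(
    2678, 2502,
    "LLVFCCTLSCVVFLQWLFRLVCQIKKLPPGPWGVPIFGYLTFIGHEKHTQYMKLARKYG"))  # True
```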

 

>AAGE01098313 (upper seq) CYP18 like fragment probable pseudogene

Query:  1703 ASLNEVGSRSS 1671

             ASLNEVGS+S+

Sbjct:   210 ASLNEVGSQST 220

 

Query:   313 VGSQSTELSKYLSVLVSNVICNIIMSVRFSLEDPKF--------------GGMHTIDYIP 176

             VGSQST+LSKYLSV VSNVICNIIMSVRFSLEDPKF              G +HTIDYIP

Sbjct:   215 VGSQSTDLSKYLSVSVSNVICNIIMSVRFSLEDPKFKRFNWLIEEGMRLFGEIHTIDYIP 274

 

Query:   175 QIQYLPGNV-------KNRQEMFDIYREVINEHKRSFNAENIRDIV 59

             QIQYLPGN+       KNRQEMFD YREVI+EHKRSFNAENIRDIV

Sbjct:   275 QIQYLPGNINAKNKIAKNRQEMFDFYREVIDEHKRSFNAENIRDIV 320

 

Query:    56 AYLDEILKAQAE 21

             AYLDEI KAQAE

Sbjct:   322 AYLDEIQKAQAE 333

 

>AAGE01227048 (upper seq) CYP18 like fragment probable pseudogene

Query:   643 ASLNEVGSQPIDLNKYLSVSVSNVICNIIMSVRFSLEDPKFA-------------G-LHT 780

             ASLNEVGSQ  DL+KYLSVSVSNVICNIIMSVRFSLEDPKF              G +HT

Sbjct:   210 ASLNEVGSQSTDLSKYLSVSVSNVICNIIMSVRFSLEDPKFKRFNWLIEEGMRLFGEIHT 269

 

Query:   763 FAGLHTIDYIPQIQYLPGNV-------KNRQEMFDFYREMIDEHKQSFNAENIRDIV 912

             F  +HTIDYIPQIQYLPGN+       KNRQEMFDFYRE+IDEHK+SFNAENIRDIV

Sbjct:   264 FGEIHTIDYIPQIQYLPGNINAKNKIAKNRQEMFDFYREVIDEHKRSFNAENIRDIV 320

 

Query:   915 AYLDEILKAQAEDRDQELFEGKDHEI 992

             AYLDEI KAQAE RDQELF+GKDH++

Sbjct:   322 AYLDEIQKAQAEGRDQELFDGKDHDV 347
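Identity figures of the kind quoted throughout this file can be read from a BLAST pairwise alignment such as the ones above. A minimal Python sketch, assuming the midline has been sliced to the same columns as the Query and Sbjct residues (the function names are hypothetical):

```python
def midline_counts(midline):
    """Count identities and positives in a BLAST pairwise midline.

    A letter marks an identical residue, '+' a conservative substitution,
    and a space a mismatch or gap. Returns (identities, positives, columns).
    """
    identities = sum(ch.isalpha() for ch in midline)
    positives = identities + midline.count("+")
    return identities, positives, len(midline)


def percent(part, whole):
    """Percentage, guarding against an empty alignment."""
    return 100.0 * part / whole if whole else 0.0


ident, pos, cols = midline_counts("ASLNEVGS+S+")
print(ident, pos, cols, round(percent(ident, cols), 1))  # 9 11 11 81.8
```

For the first AAGE01098313 block this gives 9 identities and 11 positives over 11 columns.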

 

>CYP303A1 AAGE01109944 641807020 834983680 618119317

834966118 826136105 587934965

72% to 303A1 complete

MYWYYLACFIVVFIIFLYLDCIKPANFPPGPKWYPIIGSAIEIARARQKTGMLCKAIKLIASKYDHKGVIGF

KVGKDKTVMAISGDSLREMMSNEDLDGRPTGIFYETRTWGLRRGVLLTDEEFWQEQRRFI

VRHLKEFGFARKGMAEIIGNEAEYVKNDFHALVKAGNGKALVQMQSAFSVYILNTLWLMM

AGIRYTRENKDLKYLQSLLHELFANIDMMGALFSHFPFIRFFAPRLSGYKQFVEIHNLMH

KFIGAEVENHKKSFNDTDEPRDLMDVYLKILQSNRDIPESFSQEQLLAVCLDMFIAGSET

TTKTLGFAFLHLVRQRETQLKVQKELDEVVGRNRLPTLEDRVN (2)

LPYCEAVVLEALRMFMANTFGIPHRALRDTKLCGYDIPK (0)

DTMLVGMFRGMMLNDWESPTSFKPERFLKGGKIVIPPNFHPFGVGRHRCMGEMMGKAN 110

LFLFITTLFQSFDFLVPEGYPIPSDEPIDGATPSVRQYTALIVPR*

 

>581536484 803281860 586608108 826028980

574131458 595148561 754352758 590136340 519840563 753671460

67% to 512982119 (below), 58% to Anopheles 304B

Tried walking the chromosome down to exon 2, so

some of the numbers above are in the intron

MLLNPSAILWTVAAGLLIYRCFRFMFDRPPNFPSGPPRFPLLGSYLVLLMVNYRHLHR

AAVRLGQFYRTKILGIYLGDFPSIVVNDLAIAKEVLARSEFDGRSDLFLARMRERNFQRR

GIFFTDGPHWKEQRRFVLRHLRDYGFGRRFDELEAETRSELMTLLDVLRYGPKFEHERLF

AKDGCVKCPDAFYGLLGNVYFQVICGERFQRKDMAQLYE (1)

 

>223483644 519671636 528946489 494183870

a second 304B like C-term sequence 57% to 304B

TGRHAINFQQKGDDYGTILSYLPWLKDYFPEATNYRILREVNNRMNDLIEAMVQKY

LASYDENHMRCFLDRYIYEMKQSKPLEGDAFTFQ (1)

YDQLVMILWDMLLPTLSGSAIQLSMLLERLLLNPRVATKVQQELDGVVGHGRLPTLDDRV 889

NLPYAEATLLEALRIDTLVPSGISHVALEDTKLCGYDIPKGCFVMLSLDVINNQREFWGDP

ENFRPERFLDESGKLSLKKDISVPFGGGKRLCVGETFSRNTLFLMFTALMQNFNIKPRPG

DPLPDLGQRITGVVTSMEPFWLRFEAR*

 

>CYP304B2xx Possible full length gene joining the 512982119 and 223483644 fragments complete

note this is a hybrid of two different genes, see corrected seq below

AAGE01051934

MFVTPTIFLWLVTIGLIAYRCHRFLFDRPKNFPDGPPKFPLLGGYAVMLLINFYHLHR

AANKLCEYYRTKILGIYLGNFPTVIVNDFATVKEVLNRVEFDGRPDLFIARMREKNFLLR

GIFFTQGPDWKEQRRFILRYLRDYGFGRRFDELEAETNAEILTLVEMLRYGPRHEHETEF

MTKDGCAMVPNVFFACFANAFLYVLTGERINRDEAGALFE

TGRHAINFQQKGDDYGTILSYLPWLKDYFPEATNYRILREVNNRMNDLIEAMVQKY

LASYDENHMRCFLDRYIYEMKQSKPLEGDAFTFQ (1)

YDQLVMILWDMLLPTLSGSAIQLSMLLERLLLNPRVATKVQQELDGVVGHGRLPTLDDRV 889

NLPYAEATLLEALRIDTLVPSGISHVALEDTKLCGYDIPKGCFVMLSLDVINNQREFWGDP

ENFRPERFLDESGKLSLKKDISVPFGGGKRLCVGETFSRNTLFLMFTALMQNFNIKPRPG

DPLPDLGQRITGVVTSMEPFWLRFEAR*

 

>CYP304B3yy/xx top part = my old Byy, bottom = my old Bxx + 1 aa diff

DW987682.1 EST supports this assembly, so my earlier CYP304B2xx and CYP304Byy assemblies are hybrids

AAGE02028825.1 revised seq on 4/20/06

46553 MLLNPSAILWTVAAGLLIYRCFRFMFDRPPNFPSGPPRFPLLGSYLVLLMVNYRHLHRAAVRLGQFYRTKILGIYLGDFP

SIVVNDLAIAKEVLARSEFDGRSDLFLARMRERNFQRRGIFFTDGPHWKEQRRFVLRHLRDYGFGRRFDELEAETRSELM

TLLDVLRYGPKFEHERLFAKDGCVKCPDAFYGLLGNVYFQVICGERFQRKDMAQLYE (2) 47203

61403 TGRHAINFQQKGDDYGTILSYLP

WLKDYFPEATNYRILREVNNRLNDLIEAMVQKYLASYDENHMRCFLDRYIYEMKQSKPLEGDAFTFQ (1) 61672

61872 YDQLVMILWDMLL

PTLSGSAIQLSMLLERLLLNPRVATKVQQELDGVVGHGRLPTLDDRVNLPYAEATLREALRIDTLVPSGISHVALEDTKL

CGYDIPKGCFVMLSLDVINNQREFWGDPENFRPERFLDESGKLSLKKDISVPFGGGKRLCVGETFSRNTLFLMFTALMQN

FNIKPRPGDPLPDLGQRITGVVTSMEPFWLRFEAR* 62501

 

 

>512982119 637789748 834948129 570603901 750442192 570627554 568540398

743856885 581525309 637183809 812171267 586112683 570800380

579961153 793213948 581533371 587665129 570695804 574007683

60% to Anopheles 304B1. The numbers above include a long chromosome

walk of about 5-7 kb, about 500 bp per step. No C-term was found.

The N-term exon is 55% to Anopheles 304B and 48% to Anopheles 304C.

MFVTPTIFLWLVTIGLIAYRCHRFLFDRPKNFPDGPPKFPLLGGYAVMLLINFYHLHR

AANKLCEYYRTKILGIYLGNFPTVIVNDFATVKEVLNRVEFDGRPDLFIARMREKNFLLR

GIFFTQGPDWKEQRRFILRYLRDYGFGRRFDELEAETNAEILTLVEMLRYGPRHEHETEF

MTKDGCAMVPNVFFACFANAFLYVLTGERINRDEAGALFE
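Each step of a chromosome walk like the one described for this fragment (about 500 bp per step) extends the assembled sequence with a read that overlaps its end. The core of one step can be sketched in Python as an exact suffix/prefix merge; this is a simplification (discontiguous megablast tolerates mismatches), and the function name and the default 20 bp minimum overlap are assumptions:

```python
def merge_overlapping(left, right, min_overlap=20):
    """Extend `left` with `right` using the longest exact suffix/prefix overlap.

    Returns the merged sequence, or None if no overlap of at least
    `min_overlap` bases exists. Real walking must tolerate sequencing
    errors; this exact-match version is only an illustration.
    """
    max_k = min(len(left), len(right))
    # Try the longest overlap first; never test k = 0.
    for k in range(max_k, max(min_overlap, 1) - 1, -1):
        if left[-k:] == right[:k]:
            return left + right[k:]
    return None


print(merge_overlapping("ACGTACGT", "ACGTTTTT", min_overlap=4))  # ACGTACGTTTTT
```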

 

>CYP304B 494544931 512720460 41% to 476322188 72% to 304B1

827562306 594336057 512633341

(2) TGKYAMMFQRTGDDYGTIYSLLPWMRHLFPNRTRYRTIREGSLGVNRFIESII

QKRLETHEEGHVRCFLDLYFTEMKKTVPRTEDNRFTFQ (1)

HDQLVLGIVDFFFPAISGATTQ

IALLLERLLWHPEVVQKMQAEIDDVVGHGRLPTLDDRINLPYTEATLREAMRI

DTLVPSGVAHMAMKDTTLRGYDIPKDTILVLGLDSIHMQKDIWGDPERFRPERFLNYRGE

LSLSKDVSVPFGAGKRLCAGETFARNTMFLIVSALVQNFNIRQRLGDKLPDMGKRSTGII

ISPADYWVKFEPR*

 

>CYP304Byy AAGE01029809 Possible full length gene joining

the 581536484 and 494544931 fragments complete

note this is a hybrid of two different genes, see corrected seq below

MLLNPSAILWTVAAGLLIYRCFRFMFDRPPNFPSGPPRFPLLGSYLVLLMVNYRHLHR

AAVRLGQFYRTKILGIYLGDFPSIVVNDLAIAKEVLARSEFDGRSDLFLARMRERNFQRR

GIFFTDGPHWKEQRRFVLRHLRDYGFGRRFDELEAETRSELMTLLDVLRYGPKFEHERLF

AKDGCVKCPDAFYGLLGNVYFQVICGERFQRKDMAQLYE (1)

(2) TGKYAMMFQRTGDDYGTIYSLLPWMRHLFPNRTRYRTIREGSLGVNRFIESII

QKRLETHEEGHVRCFLDLYFTEMKKTVPRTEDNRFTFQ (1)

HDQLVLGIVDFFFPAISGATTQ

IALLLERLLWHPEVVQKMQAEIDDVVGHGRLPTLDDRINLPYTEATLREAMRI

DTLVPSGVAHMAMKDTTLRGYDIPKDTILVLGLDSIHMQKDIWGDPERFRPERFLNYRGE

LSLSKDVSVPFGAGKRLCAGETFARNTMFLIVSALVQNFNIRQRLGDKLPDMGKRSTGII

ISPADYWVKFEPR*

 

>CYP304Bxx/yy top part = my old Bxx, bottom = my old Byy

AAGE02028825.1 revised accurate seq 4/20/06

22307 MFVTPTIFLWLVTIGLIAYRCHRFLFDRPKNFPDGPPKFPLLGGYAVMLLINFYHLHRAANKLCEYYRTKILGIYLGNFP

TVIVNDFATVKEVLNRVEFDGRPDLFIARMREKNFLLRGIFFTQGPDWKEQRRFILRYLRDYGFGRRFDELEAETNAEIL

TLVEMLRYGPRHEHETEFMTKDGCAMVPNVFFACFANAFLYVLTGERINRDEAGALFE (2) 22960

35137 TGKYAMMFQRTGDDYGTIYSLL

PWMRHLFPNRTRYRTIREGSLGVNRFIESIIQKRLETHEEGHVRCFLDLYFTEMKKTVPRTEDNRFTFQH 35412

35469 DQLVLGIVDF

FFPAISGATTQIALLLERLLWHPEVVQKMQAEIDDVVGHGRLPTLDDRINLPYTEATLREAMRIDTLVPSGVAHMAMKDT

TLRGYDIPKDTILVLGLDSIHMQKDIWGDPERFRPERFLNYRGELSLSKDVSVPFGAGKRLCAGETFARNTMFLIVSALV

QNFNIRQRLGDKLPDMGKRSTGIIISPADYWVKFEPR* 36092

 

>CYP304C1 AAGE01104491 512990636 572473586 613989430 64% to CYP304C1

749978894 754492027 584954719 complete

MVLISELIIAALLGLLIYRFYRYLFERPSENFPPGPPRL

PLLGGYPFMLALNYKHLHKAAARLSQLYKSKLIGLYLGPLPAVIVNDYDTVKEVLTRPEF

DGRPDLFMARLRDQHFQRR (1)

GIFFTDSESWREQRRFFLRTLHHFGFGRRSPEAEADIQAGLEDVISLLRDGPKYEHEKAL

VDSAGFALCPTVFFAVFSNVLLRMIVGVRLAREDQAVMFE

VGKNAIAFHRNGDDYGMLLSYIPWIRHLFPKTTKYDLLRKVNQQANAVILSLAQKCES

SYDENDIRCLVDAYIQEMRATGSKGESTGKDEFGFQ (1)

YDQLVIGAADFLVPPFSAIPAKICLILERLIQYPEVQTKMYRELNEVVGLNRLPTLDDRA

DLPYCDAVIREGLRIDALVPSGIPHMAVTDTQLNGYQIPKGTVIVNSLEFIHHQPEIFRD

PDSFMPERFLTPDGKLALDQDKTLPFGAGKRVCGGEQFARNALFLGVTSLVQNFTFQ

LPAGRACPDLDGRITGVIQTTPDFRLKFVSRR*

 

>CYP305A6 AAGE01041187 494160882  476322188 754462117 mate pair = 754369970

which is an exact match to part of AAGE01202372

65% to 305A2 825745101 613940462

AAGE01202372.1 N-term exon for CYP305A complete

1435  MITLVLSSVVIVSFIFWLWQDLQRPPNFPP (1) 1346

GPKWLPFFGNTLLIRNLARISGGQHLAFEALSKQYKSPVIGLKLGREHVVVALQ

YPAVHEALTKEAFDGRPDNFFIRLRTMGTR (2)

LGITFTDGPFWTEHNSFVVR

HLRQAGYGRQPMQLQIQNELNELIGIIRDLDSEPVWPGSILPTSVINVLWTFTTGSRIPR

DDQRLTRLLKLLQDRSKAFDMSGG

ILSQLPWLRHIAPEWTGYNLINRFNQEIHEFFKATIEKHHQDYTEEKCSDDLIYAFIK

EMKERKDDPCSTFTDVQLSMIILDIFIAGSQTTSTTIDIALMILAMNTEIQRKIYAEIDD

NFHPDEIPDQNCRTNLQYTEAFLLEVMRLYQIAPIGGPRRALSDCTLGGYRIPRNTTILM

GLHTVQMDPDHWGDPENFRPERFIGPDGKIINTERLIPFGLGRRRCLGDSLARSCMFTFL

VGILQKFSLRLPDSLEGPSLKLTPGITLSPKPYKVVFEPRLK*

 

AAGE02003241.1

24317  MITLVLSSVVIVSFIFWLWQDLQRPPNFPP (1)  24228

13700  GPKWLPFFGNTLLIRNLARISGGQHLAFEALSKQYKSPVIGLKLGREHVVVALQYPA  13530

13529  VHEALTKEAFDGRPDNFFIRLRTMGTR (2) 13449

13391  LGITFTDGPFWTEH  13350

13349  NSFVVRHLRQAGYGRQPMQLQIQNELNELIGIIRDLDSEPVWPGSILPTSVINVLWTFTT  13170

13169  GSRIPRDDQRLTRLLKLLQDRSKAFDMSGGILSQLPWLRHIAPEWTGYNLINRFNQEIHE  12990

12989  FFKATIEKHHQDYTEEKCSDDLIYAFIKEMKERKDDPCSTFTDVQLSMIILDIFIAGSQT  12810

12809  TSTTIDIALMILAMNTEIQRKIYAEIDDNFHPDEIPDQNCRTNLQYTEAFLLEVMRLYQI  12630

12629  APIGGPRRALSDCTLGGYRIPRNTTILMGLHTVQMDPDHWGDPENFRPERFIGPDGKIIN  12450

12449  TERLIPFGLGRRRCLGDSLARSCMFTFLVGILQKFSLRLPDSLEGPSLKLTPGITLSPKP  12270

12269  YKVVFEPRLK*  12237

 

>73% to CYP305 above 519967093 521924636 570423900 pseudogene of AAGE01051792

contains a deletion and stop codon

FLPGPQWLPFIGNTPFVRKLARASGGQHLAFEALSKQYNSPVIGLKLGREYVVVALQYPAVREVHSKEEFDGRPDNF

LLKMRLERFVISRLGVTCTDGPFWAEHRNFVVRHLRQAGYGRQ

GIIRDMDGEPVWPGSILPTSVINVLWTFTTGSRIPRDDQRLARLLKLLQDRSKAFDMS

GGVLSQLPWLRHIAPEWTGYNLLKRFNQELHEFFMIIVERHHQEYHEEKCSDDLIYA

FIKEMKDRKDDPSSTFTDLQLTMIILDIFIAGSQTTSITIDLAFMMLTMHTDIQRDTCRN

R*DLHHDEMPSKRSYSLPYTE

 

AAGE02003240.1 this matches 305A5

Sbjct  53574  FLPGPQWLPFIGNTPFVRKLARASGGQHLAFEALSKQYNSPVIGLKLGREYVVVALQYPA  53395

 

Query  61     VREVHSKEEFDGRPDNF-----------------LLKMRLERFVISRLGVTCTDGPFWAE  103

              VREVHSKEEFDGRPDNF                 LLKMRLERFVISRLGVTCTDGPFWAE

Sbjct  53394  VREVHSKEEFDGRPDNFFLRLRTMGTR*DFKL*CLLKMRLERFVISRLGVTCTDGPFWAE  53215

 

Query  104    HRNFVVRHLRQAGYGRQ--------------GIIRDMDGEPVWPGSILPTSVINVLWTFT  149

              HRNFVVRHLRQAGYGRQ              GIIRDMDGEPVWPGSILPTSVINVLWTFT

Sbjct  53214  HRNFVVRHLRQAGYGRQPMQLQIQNELNELIGIIRDMDGEPVWPGSILPTSVINVLWTFT  53035

 

Query  150    TGSRIPRDDQRLARLLKLLQDRSKAFDMSGGVLSQLPWLRHIAPEWTGYNLLKRFNQELH  209

              TGSRIPRDDQRLARLLKLLQDRSKAFDMSGGVLSQLPWLRHIAPEWTGYNLLKRFNQELH

Sbjct  53034  TGSRIPRDDQRLARLLKLLQDRSKAFDMSGGVLSQLPWLRHIAPEWTGYNLLKRFNQELH  52855

 

Query  210    EFFMIIVERHHQEYHEEKCSDDLIYAFIKEMKDRKDDPSSTFTDLQLTMIILDIFIAGSQ  269

              EFFMIIVERHHQEYHEEKCSDDLIYAFIKEMKDRKDDPSSTFTDLQLTMIILDIFIAGSQ

Sbjct  52854  EFFMIIVERHHQEYHEEKCSDDLIYAFIKEMKDRKDDPSSTFTDLQLTMIILDIFIAGSQ  52675

 

Query  270    TTSITIDLAFMMLTMHTDIQRDT-CRNRXDLHHDEMPSKRS-YSLPYTE  316

              TTSITIDLAFMMLTMHTDIQ+        +LH DEMP +    SLPYTE

Sbjct  52674  TTSITIDLAFMMLTMHTDIQKKIHAEIDENLHQDEMPQQNDRTSLPYTE  52528
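A premature stop such as the one in this fragment (...CRNR*DLHH...) is one of the simplest pseudogene signals to screen for. A short Python sketch, assuming the protein has already been translated and joined into one string (the function name is hypothetical):

```python
def internal_stops(protein):
    """Return 0-based positions of stop codons ('*') before the final residue.

    A terminal '*' is the normal end of a gene; a stop anywhere else points
    to a pseudogene or an assembly/sequencing error.
    """
    body = protein[:-1] if protein.endswith("*") else protein
    return [i for i, aa in enumerate(body) if aa == "*"]


print(internal_stops("MKL*QR*"))  # [3]
```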

 

>CYP305A5 AAGE01051792 70% to CYP305A2 but no stop codon; N-term exon is one of two choices.

82% to other CYP305 Aedes seq

520611721 836008963 529076567 570690021

AAGE01309663.1 CYP305A N-term exon matches by default since the other CYP305 has an exon 1 sequence complete

     MIVLVLTSVLIIAFSYWLLQELRRPPNYPP (1)

     GPQWLPFIGNTPFVRKLARASGGQHLAFEALSKQYNSPVIGLKLGREYVVVALQYPA 696

 697 VREVHSKEEFDGRPDNFFLRLRTMGTR (2?) 777

 838 LGVTCTDGPFWAEHRNFVVRHLRQAGYGRQPMQLQIQNELNELIGIIRDM 987

 988 DGEPVWPGSILPTSVINVLWTFTTGSRIPRDDQRLARLLKLLQDRSKAFDMSGGVLSQLP 1167

1168 WLRHIAPEWTGYNLLKRFNQELHEFFMIIVERHHQEYHEEKCSDDLIYAFIKEMKDRKDD 1347

1348 PSSTFTDLQLTMIILDIFIAGSQTTSITIDLAFMMLTMHTDIQKKIHAEIDENLHQDEMP 1527

1528 QQNDRTSLPYTEAFLLEVQRFFHIVPVSGPRRALSDCTLGGYQIPKNTTILMGLRTVHMD 1707

1708 PEHWGDPECFRPERFLSPDGKIITTERLIPFGLGRRRCLGESLARACMFTFLVGILQKFS 1887

1888 LRQPANCSEKPSPKLLPGITLSPKPYKVIFEPR* 1986

 

>CYP306A1 570772008 512981304 597667916 641824294 753304856 593374976 574131373

587966306 514783872 514783871 618134500 835036042 803206894 578828539

AAGE01228356 AAGE01635404 AAGE01635520 complete

MYLILGIVLILTYVLWTLLDRRGKPPGPFGLPILGYLPFIDSIKPYETLTNLAKRYG

PVYSLRMGQVDAVVLTAPDLIRDTLKREETTGRAPLFITHGIMGGH (1)

GIICAEGNLWRDQRRLSTEWLRKMGMTKFGPTRATLEARILIGVNELLE (0)

DLRRESEKVFAFDPAPLLHHILGNLMNDIVFGLQYERDDATWRYLQHLQEEGVKHIGVSMAVNFLPFLR (2)

HLPSSKRIIEFLLNGKAKTHKIYDSIIEKQRSRMEGGGSEVSDP

GRHDDCILSNFLQETRRRETGARPELAFCSDVQLRHLLADLFGAGVDTTFTTLRWLILFL

ALNKDAQERLRQEMASQLRGEPCLNDVDSLPYLKACVAEAQRLRTVVPLGIPHGAVS (0)

EITIAGYKVSKNTMIIPLLWSVHMDPSLWPNPDRFDPDRFLDESGQYSAPAHFMPFQT

GKRMCLGDELARMILLLYTGRLFWHFELDVFNGEGLDLTGVCGITLTPPPFEIIFKERV*

 

>CYP307A1 571521703 817504746 824335840 591439033 834970143

TC53059 TC28026 TC50479 78% to CYP307A1 complete

813467047 (exon 1) found by searching with the DNA seq above, 67% to anoph 307A1

246 MAYTLILVALMSLLSVVCYLKVLYEWHRKVRVQTVKSSRYAKKLQKLEESQPQEVEEAP 422

423 VEFPQAPGPYPWPVLGSAAIIGQYPAPFMGFSALAKKYGDVYSIRIGQGQCLVVSSLELI 602

603 REVLNQNGRYFGGRPDFLRYHQLFGGDRNN (1)

SLALCDWSSLQQKRRNLARKHCSPSDASSYYQKMSDVGV

AEMHYFMDQLTDVVTPGQDFKVKPLIMQACANMFSKYMCSVRFEYDDAGFQKMVHSFDEI

FYEINQGYAVDFMPWLAPFYFRHMSKLSSWSNYIRGFILERIVNEREQNLGEDEPERDFT

DALLKSLREDPSVSRDTIMYMLEDFIGGHSAIGNLVMLALGYVAKNPEIGARIQQEIDHV

TDKGLRNVTLYDTESMPYTVATIFEVLRYSSSPIVPHVATENTCIG

GYGVQTGTVVFINNYDLNTSEKYWDHPERFDPSR (2?)

SNESQKQILRVKKNIPHFLPFSIGKRTCIGQNLVRGFSFIMLANILQKYDVHT

NDPAQIKMKPACVAVPPDTYPLAFTQRSQ*

 

>CYP307B1 AAGE01081732 476411966 68% to 307B1 519649910 578920479 complete

revised according to AAGE02011086.1 and AAGE02028078.1 4/20/06

1027 MEKFTIFLFSSNTIYLLVACFLVTLIMLLLEVRQKISVKSDLVKLVKSFLFGQWLSVFTQNNKNRNL 848

847  NDTEVKVLRRAPGPKSYPIIGNLKDLDGYEVPYQAFSVLAKKYGPVVNLKLGVVDAVVIN 668

667  GIEHIKEVLINKAQYFDSRPNFRRYQLLFSGNKEN 533 (1)

     SLAFCDWSEVQKARRDMLVPHTFPRNFSGRFNELNGVINDEIRLVIGESNVNRVIEIK

14  PIIMNICANVFSQYFASHRFELEDPKFQKLVKNFDQIFYEVNQGYAADFLPFLLPLHHR 193

194 NLKRMDQLAEEIREIMLETIINDRYDNWVEGNTENDYVDSLINHVKSKIGPDMEWETALF 373

374 ALEDIIGGHSAVANFLVKTFGYIIQHPEVQQNIQSEVDRVLETEGKHTVDLSDRNHMPYT 553

554 EAVIMEALRLIASPIVPHVANQDSQIG 637 (1?)

685 GYDVPKDTLIFLNNYDLSMSENLWENPNDFVPERFLQNGRLVKPDFFIPFGAGRRS 864

865 CMGYKMTQLISFSIIANLLRSYTITPLSGHSYFVPVGSLAMPEKSYEFQINLRH* 1029

 

CYP3 clan CYP6 related sequences

Note: CYP6 and CYP9 sequences (in Anopheles) have only one intron and will be the easiest to assemble; 6AG, 6AH and 6AJ are exceptions.

 

CYP3 clan sequences

CYP6 related, 14 complete, 20 partials

 

>AAGE01198540 494152727 63% to CYP6Z2 67% to AY433537 519918984 574095157 569650597 complete

MFIYTFALFWLALVLVLRYIYSYWDRNGLASIKPQIPYGNLKSVAQK

TQSFGVATCELYWKSQERLAGIYLFFRPAVLIRDAHLAQRIMTTDFSYFHDRGVYCNEEI

DPFSANLFAL

PGKRWRNLRHRFTPLFTSGQLRCMMPTILDVGHKLQKFLEPAAERQEVVDIREIVSRGVL

ELIASLFFGFEADCINDPDDAFSKTLREFQLGGFMNNFRTACTFVCPELLQVTRISSLSP

QMIKFATDVVTKQIEHREKNNVSRKDFIQLLIDLRREEANNNEVALSFEQCAANVFLFYV

AGSDTSTSAITFTLHELTQNPEVMDKLQSEIDEMLVQTNGELTYTAIKELPYLDLCVKET

LRKYPGLAILNRKCTKSYAVPESSVVIQEGTQIMIPLLAYGMDEKYFPEPERYYPERFNKQSKNYDEKA

YYPFGEGPRNCI (1)

AYRMGVMVSKIGLILLLSKFKFEATQGPKIVFSAATVPLVPKGGIPVKISNR*

 

>AAGE01065173 78% to AAGE01198540

N-term is on AAGE02015843.1 (revised 4/20/06)

46903 MFIYTFALFWLAVAFAIRYIYSYWDRNGLPSIKPH

3333 IPYGNLKAVANRTESFGVATCDLYWKSKDRLVGIYLFFRPAVLIRDAHLAQQIMTTDFSH 3154

3153 FHDRGVFCNEEVDPFSANLFALAGKRWRNLRNKFTPLFTAGQLRCMMPIILSVGHKLQNV 2974

2973 LEPAAKKQEVLEIRELVSRCVLDIIASVFFGFEANCINDPNDAFIQNLRELQYDGFFNNL 2794

2793 RAAASFICPELLKLTRISSLSPEMIRFVTDIVTKQIEHREKNKVTRKDFIQLLIDLRRED 2614

2613 TNNNEAALGFEECAANVFLFYVAGSDTSTSAVAFTLHELTQNAETMGKLQTEIDEMLVKT 2434

2433 SGELTYDGIKEMSYLDLCVKETLRKYPGLAILNRECTKSYAVPNSDILLKKGTQVVIPLL 2254

2253 AYGMDEKYFPEPDRYLPERFDKSTKNYDEKAFYPFGEGPRNCI (1) 2116

2065 AFRMGVMVSKICLVLLLSRFNFEATRGPKIDFTPSTVALLPKGGIPVKISIR* 1907

 

>AAGE01047841 AY433537 62% to 6Z2 569650597 622013821 579345058 complete

MLFIYSVALLCIAVTLALKYVYSYWDRHGLPSVKPHIPFGNLKTVVKKTESFGIAIN

QLYWQTKGQLAGIYLFFRPAILVRD

AHLAQQIMTTDFNHFHDRGIYCNEEGDPFSANLFALPGKRWRNLRNKLTPLFTGGQLRGM

MPTILEVGEKLQKHLEPVAERQEVVEIRDIVSRFVLEIIATVFFGFEANCIEDRDDSFSK

VLREAQGERLSAVLRAAAMFVCPGLLRYTGISSLEPQVIAFVSEIVTKQIEHREKNSVTR

KDFIQQLIEIRRGSGENQVPAMSIEQCAANVFLFYAAGSETSTGTIAFSMHELSHHADVM

KKLQDEIDDALAKSNGAITYESVMQMQYLDLCVKETLRKYPGLPFLNRECTMDYKVPDSD

LVIRKGTQLVLPIYGFSMDEQYFPEPECYIPERFEEASKNYDEKAYYPFG

DGPRNCI (1)

AYRMGVLITKIGLILLLSKFTFEATQGPKMMFSSASVPLLPKDGISLKISN

RKR*

 

>AAGE01005406 80% to AY433537 62% to 6Z2 complete

possible pseudogene with a frameshift at AVGDKLX (X = ct)

confirmed in four trace archive sequences

520668645, 757097876, 589569591, 811977620

5398 MLFVYTLTILSIAITLVLKFVYSYWDRYGVQNIKPHIPFGNLKTVVKKTESFGVAINQLY 5219

5218 WQTKGQLVGIYLFFRPAILIRDAHLAQQIMTTDFNHFHDRGVYCNEEGDPFSASLFSLPG 5039

5038 KRWRNLRNKLTPLFTGGQLRGMMPTILAVGDKLX 4940

4937 KHLEPVAENREPIEIRDIVSRFVLEIIATVFFGFEANCIKDRNDAFCRVLREAQRESMYT 4758

4757 NFRAAAVFVCPGLLKYTGISSLEPEVKEFVSGIVTEQIEHREKNGATRKDFIQQLIELRR 4578

4577 EDSQNQNVRMSIEQCAANVFLFYIAGSETSTGTITFTMHELSQHPEVMKKLQAEIDDTLA 4398

4397 KSNGEITYENVNQIQYLDLCVKETLRKYPGLPILNRECTSDYKVPDLDLVIRKGTQVVIP 4218

4217 LYGISMDEQYFPEPECYKPERFDGASKNYDEKAYYPFGEGPRNCI (1) 4083

4017 AFRMGVLVSKIGLVLLSSKFNFKPTQGPKIVFSPAAVPLVPKGGISLMISRRDK 3856

     VADLYMGLHISVVLKVVCS*

 

>AAGE01054542 476413066 56% to 20199522 76% to 6Y1 579367130

614744104 834925676 complete

MWLVYLVWLVAAVLLAVYLWIKKRFNFWKDRGVEYIEPEFPFGNFKTLGKVEHIAPITQR

HYDYFKQKGVPYGGVFMLTSPLLYILDTKLIKTLLVKDFNHFPNRGVYFNEKDDPLSAHMFAI

EGNKWKTLRNKLSPTFTSGRIKMTFPLVVGVCQQFCDHLGEVVQQSNEVEMHDLLSRYTI

DVIGTCAFGIDCNSFREPDNEFRKYGKIAFDKLPHSPLVVYLMKAFRSYANAFGMKQLHE

DVSSFFSKVVKDTIEYRESNNVVRNDFMDLLLKLKNTGRLEESGEEIGKISFEEIAAQAF

IFFTAGYDTSSTAMTYTLYELALNQKAQEKARKCVLDIFAANNGTLTYESVGNMGYLDQC

IN (1)

936 ETLRKHPPVAILERNADRDYKLPDSDIVIKKGRKIMIPTFAMHHDAEHFPDPE 760

759 RYDPDRFSPEQVACRDPYCYLPFGEGPRICIGMRFGTIQARVGLASLLKRFRFRVCDKTQ 580

579 IPVRYSKTNFILGPANGVWLRVEKL* 505

 

>AAGE01206812 586027460 593564617 637757183 494307621 complete 38% to 6M1

TC54189 TC23406 TC42024 38% CYP6P3 TC54190 TC23407 TC42025 TC574 TC6535

83% to TC63333 94% to TC54191 581543219

MFLVILLITLSLYLYQKWIYTYWKRRGVPQLNPSFPFGDVADTFKQRKSYANRLAELHHQ

SASDSHRFVGIYTLFQPILLVTDVELVRRMLTVDFEHFTDRGAHVNEKRDPLSGHLFSLAG

AKWRRMRLKLTPAFTTAKLKAMFPTMMACGRTLSAVIDDHVGRALAIRDLMTRFTMDVIASVGFGLE

CNSMRNPDELFRQMGGRFFSKSWKTSVRMLLAFVAPKVNRYLQVK

LNDDDVEEYMLNLVR

DTIAKREGGGEVRKDFIQLL ()

VQLRNQVEVKDGGSWEMNKVDQNKTLTVEEMAAQSFVFLN

AGYETTSSTVTFCLFELCRNKDLIRKVQEEIDRVMDGGREISYEALAEMTYLESCIDETL

RKYPISPVLFRVCTKPYKIPETDVVIEKDTLVQISLVGLQRDTRYYEDPVKFDPDRYGER

KSETMPHYSFGDGPRVCI (1)

GLRMGKVMAKMALVELLFRYDFELESPAADSGEIELDPSLLMLQAKHDVKLIPRFRAK*

 

>617983543 some differences with TC54191 TC42026 44% to CYP6Z3

94% to 586027460

584131270 520119914 760257438 832454533 625082069

39% to Anopheles 6N1, complete

MLLPILLVVLVVYLFQKWTYSHWKRRGVPQLNPAFP

FGNVADTFKQRTSYSNRLAELHHQAVRDGHRFVGIYTLL

QPILLVTDVELVKRMLTVDFEHFVDRGAHVNEKRDPLSGHLFSLTGAKWRRMRLKLTPAF

TTAKLKAMFPTMMACGRTLSAVIDDHVGRALAIRDLMTRFTMDVI

ASVGFGLECNSMRNPDELFRQMGGRFFSKSWKTSVRMLLAFVAPKVNRYLQVKLND

DDVEEYMLNLVRDTIAKREGGGEVRKDFIQLLVQLRNQVEVKDGGSWEMNKVDQNKTLTV

EEMAAQSFVFLNAGYETTSSTVTFCLFELCRNKDLIGKVQEEIDRVMDGGREISYEALAE

MTYLESCIDETLRKYPISPVLFRVCTKPYKIPETDVVIEKDTLVQISLVGLQRDTRYYED

PMKFDPDRYGERKSETMPHYSFGDGPRVCI

GLRMGKVMAKMALVELLSRYDFELESPAADSGEIELDPSLLMLQAKHDVKLIPRLRTK*

 

>NABNU08TR  NABNU08 32% to CYP6Z4 59% to 586027460

(no genomic match) looks like a pseudogene

best genomic match was to 617983543 at 77%

I do not think the TIGR database actually has an Aedes seq that is missing from

the 15 million trace files of Aedes, so this may be a contaminant from another

species.

 

KTLTPFERAAQSSGSQKAFYETTSATGDGSRIERSRNKDLI

GKVQEEIDRVMDGGKGIS*

EALAETTYPESCTEETLRKHPSPPDQDRGGTKPNKTPETDDASEKDTPVQTPPGGTQRDK

REKEDPEKHEPERYGERKPETTPHHSRGDGPRDSTGHRKGKATAKKALAEQPTRNDYEQE

PPAADTGENEQEPSQPTPQAKHEVKQKPRQRAK

 

>AAGE01003592 512632636 TC63333 TC10785 TC15419 TC26904 TC37692 TC4101

41% to CYP6P4 83% to 586027460 836033925 753054225 574000494

(n-term looks identical to 586027460) complete

revised 4/21/06 used AAGE02004393.1, AAGE02030939.1

MFLVILLITLSLYLYQKWIYTYWKRRGVPQLNP

SFPFGDVADTFKQRKSYANRLAELHHQSASDSHRFVG

IYTLFQPILL

VTDVELVRRMLTVDFEHFTDRGAHVNEKRDPLSGHLFSLAGAKWRWMLQKLAPAFTSAKV

KSMFPTMMTCGRTLSAVVGDHLGRALPIRALMTRFTMDVIASVGFGLDCN

SMRNPDEPFHKMGSKFFSKSWKTSVRMLLAFVAPKVNRFLQL

KLNDDDVEEYMLNLVRDTIAKREHGGEVRNDFIQLLVQLRNQVEVEDGGSWEINKVEPNK

ALTVQEIAAQSFVFLNAGYETTSSTITFCLFELCRNRDLLGKLQEEIDEVVDGGREASYE

AITEMTYLEACVEETLRKYPISPVLFRVCTKPYRIPDTDFVIEKGTLVQISLVGLNRDPR

YYEAPLKFDPDRYGERKAETMVHYSFGDGPRGCIGLRMGKVMVKMALVELLSNYDFEMES

PTGENELDPSLLMLQPKHDV