Oct. 18, 2005; under revision April 21, 2006 (in progress)
Revision continues May 19, 2006 to June 25, 2006
Compiled by David Nelson and David Drane
The completed and named sequences are here (http://drnelson.utmem.edu/AedesFasta.June25.htm).
This file is more archival, with detailed information. Please see the FASTA file above.
Useful links for analysis
http://www.ncbi.nlm.nih.gov/Traces/trace.cgi Trace Archive at NCBI
http://trace.ensembl.org/perl/traceview Trace files at Ensembl
http://132.192.64.52/blast/P450.html P450 BLAST server
http://www.proweb.org/proweb/Tools/WU-blast.html Do-it-yourself WU-BLAST
http://www.bioinformatics.vg/bioinformatics_tools/JVT.shtml DNA translator
http://www.ncbi.nlm.nih.gov/BLAST/tracemb.shtml NCBI megablast
http://www.tigr.org/tigr-scripts/tgi/T_index.cgi?species=a_aegypti TIGR Aedes gene index page
206 Aedes sequences are here, including 142 complete sequences. Numbers in () are intron phases. Names have not been assigned for most genes.
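The intron phase is the position of the intron within a codon: 0 between codons, 1 after the first base, 2 after the second. A minimal Python sketch of that standard definition, assuming the number of coding nucleotides contributed by each exon is known; the function name and the exon lengths in the example are hypothetical, not taken from any gene in this file.

```python
def intron_phases(exon_cds_lengths):
    """Return the phase (0, 1 or 2) of each intron, in transcript order.

    exon_cds_lengths: number of coding nucleotides in each exon.
    Phase = coding nucleotides upstream of the intron, modulo 3:
      0 -> intron lies between two codons
      1 -> intron lies after the first base of a codon
      2 -> intron lies after the second base of a codon
    """
    phases = []
    coding_so_far = 0
    # The last exon has no intron after it, so it is not counted.
    for length in exon_cds_lengths[:-1]:
        coding_so_far += length
        phases.append(coding_so_far % 3)
    return phases


# Hypothetical gene with three exons of 91, 170 and 50 coding nucleotides.
print(intron_phases([91, 170, 50]))  # [1, 0]
```

Only the running total matters, so a phase can be checked without knowing where the gene starts on the contig.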
Sequences were collected and assembled by David Drane and David Nelson from July to Sept. 2005. Of the 15 million trace file sequences, 3.5 million were downloaded from NCBI and placed on a stand-alone BLAST server on a Mac G4 for TBLASTN searches at an expect value of 10. The WGS section of GenBank was searched, and 220 AAGE01XXXXXX accession numbers are given at the end of this file. The TIGR Gene Index was searched for the text “P450”. The EST section of GenBank was searched, and discontiguous megablast was used to extend sequences by chromosome walking. Most sequences should be represented here now, but not all are assembled.
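TBLASTN compares a protein query against a nucleotide database translated in all six reading frames. The sketch below illustrates that translation step for one read, using the standard genetic code (NCBI translation table 1). It shows the idea only, not how the BLAST programs are implemented; the function names and the example read are hypothetical.

```python
# Standard genetic code (NCBI translation table 1); codons are indexed
# with the bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
# Complement table; bases not listed (e.g. other IUPAC codes) are left as-is.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(dna):
    """Translate one reading frame; codons that are not plain ACGT become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame(dna):
    """Return the translations of all six reading frames of a DNA string."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


# Made-up 9 nt read, just to show all six frames.
for frame, protein in six_frame("ATGTGGTAA").items():
    print(frame, protein)
```

A protein query is then aligned against each of the six translations, which is why a P450 hit can turn up on either strand of a trace read.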
The Aedes mosquito seems to have more P450s than the Anopheles mosquito. This file is in progress: the CYP4 and CYP325 families are not yet fully assembled because some of these sequences contain large introns. The sequences are presented in clan groups: the CYP2, CYP3, CYP4 and mitochondrial clans. Note: Aedes has a CYP18 that was not found in Anopheles.
CYP329 of Anopheles now looks like a pseudogene of a CYP9 sequence. It is short in the heme signature, and it has a P (proline) at the position of the critical T (threonine) in the I-helix oxygen-binding pocket. It is the only Anopheles sequence in the CYP3 clan that does not fall inside the CYP6 or CYP9 families.
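Checking a heme signature can be approximated by scanning a protein for the cysteine-pocket motif near its C-terminus. A minimal sketch, assuming the simplified pattern F-x-x-G-x-[RH]-x-C-x-G (an assumption for illustration; curated motif definitions such as PROSITE's differ in detail, and some real P450s deviate). The example uses the C-terminal exons of the CYP15B1 assembly in this file; the function name is hypothetical.

```python
import re

# Simplified heme-binding motif F-x-x-G-x-[RH]-x-C-x-G (an assumption for
# illustration; curated motif definitions differ in detail).
HEME_MOTIF = re.compile(r"F..G.[RH].C.G")


def heme_signature(protein):
    """Return the first motif match in a protein string, or None."""
    # Remove whitespace so wrapped sequence lines can be passed in directly.
    seq = "".join(protein.split())
    match = HEME_MOTIF.search(seq)
    return match.group(0) if match else None


# C-terminal exons of the CYP15B1 assembly, joined across the (1) intron.
exon_a = "ETGDKLVAHEYFVPFGS"
exon_b = "GKRRCLGESLAKSSLFLFFTAFMHAF"
print(heme_signature(exon_a + exon_b))  # FGSGKRRCLG
```

A miss is a prompt for closer inspection, not proof of a pseudogene; the I-helix threonine and stop codons are separate checks.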
There are 11 complete sequences in the CYP2 clan (CYP15, 18, 303, 304, 305, 306:phm, 307). Phantom (phm) is one of the Halloween genes.
There are 76 complete sequences in the CYP3 clan (CYP6, CYP9).
There are 34 complete sequences in the CYP4 clan (CYP4, CYP325).
There are 9 complete sequences in the mitochondrial clan (CYP12, 49, 301, 302:dib, 314:shd, 315:sad). These include three of the Halloween genes: disembodied (dib), shade (shd) and shadow (sad).
There are 21 pseudogenes so far.
There are 15 partial sequences (not including the pseudogenes).
CYP2/CYP18 clan sequences
>514720743 753475610 750240311 possible CYP15 N-term = DR747015.1 EST
MWQNLVVLIIFVILFCLRDMRKPGYFPP
(1)
>CYP15B like 585964866 641740723 584363040
78 GPNWFPLIGSGFEVFRLVKHFKFYHLMWAELMRRYGPIVGLRLGRDRVVIVSGLDA 257
258 IREVYSKDQFDGRPDGFFFRIRSFDKRLGVVFTDGAHWDIQRRFSVRTLKALGMGRTGMV 437
438 NSLEREAEEMIHHLRKLSRTQKVISMHNAFDVSVLNSIWTLIAGKR (2) 575
FDLDDKKLEWIMETIHKSFRVIDMSGGVLNQFPPIRYVLPDKSGFAPLLNLLSPLWTFLQ 816
>CYP15B like seq DR746695.1 adult female corpora allata cDNA 813859354 749484786 522065275 514869301 520643713
GTIKSIRSKLDQPDNPDCFIASYLRELNIAERHSSFTNEQLLCLCLDL
FQAGSETTSNTLGYGIAHMLHHPEIVQKIHNELDSVIGRYRLPLLADRPYLPYTEAVLCE
IQRISNIAPLAIAHRTVAPVQLGTYVIPKNTITLISLYSLHMDKAYWGDPEVFRPERFLN
ETGDKLIAHEYFVPFGS
(1)
GKRRCLGESLAKSSLFLFFTAFMHAFLVEPAEPGKLPELDGIDGITLSPCPYFVQLKERLI*
>possible complete CYP15B1 assembled from parts, 52% to 15B1 from Anopheles AAGE01116789 AAGE01129498
Used trace archive seqs to verify seq at PLLNLLRPLWTFLQ. This region is not accurate in AAGE02003241.1
638470554 823375362 593712263 586030336 641740723 569671400
used AAGE02003241.1 for the C-term seq changes
MWQNLVVLIIFVILFCLRDMRKPGYFPP
(1)
GPNWFPLIGSGFEVFRLVKHFKFYHLMWAELMRRYGPIVGLRLGRDRVVIVSGLDA
IREVYSKDQFDGRPDGFFFRIRSFDKRLGVVFTDGAHWDIQRRFSVRTLKALGMGRTGMV
NSLEREAEEMIHHLRKLSRTQKVISMHNAFDVSVLNSIWTLIAGKR
(2)
FDLDDKKLEWIMETIHKSFRVIDMSGGVLNQFPPIRYVLPDKSGFAPLLNLLRPLWTFLQ (0)
GTIKSIRSKLDQPDNPDCFIASYLRELNIAERHSSFTNEQLLCLCLDL
FQAGSETTSNTLGYGIAHMLHHPEIVQKIHNELDSVIGRYRLPLLADRPYLPYTEAVLCE
IQRISNIAPLAIAHRTVAPVQLGTYVIPKNTITLISLYSLHMDKAYWGDPEVFRPERFLN
ETGDKLVAHEYFVPFGS (1)
GKRRCLGESLAKSSLFLFFTAFMHAFLVEPAEPGKLPELDGIDGITLSPCPYYVQLKERLI*
>567527404 46% to CYP15B1 may be a CYP15 pseudogene
XXXXXXXXXXXXXXXXLVRRFRFYHHTCAAFMCLYRPIVDLRMGRDRVVIMTGLDP
I*KVYSKDEKENRPVGFFFRIRSFDKRLAVVFTDGAHWDIQRRFSVRTLKALGMGRTGLV
SSLEREAEEMIHHLRKLSRTQKVISRNNAFDVSVLNSIWTLIAER
>CYP18A1 AAGE01025833 AAGE01338874.1 AAGE01065191.1 529463664 572557122 66% to 18A1 (note: CYP18 not seen in Anopheles) complete
revised at cyan aa based on AAGE02007615.1
MFLDTYLLGVVRQEFFDASKARST
2678 LLVFCCTLSCVVFLQWLFRLVCQIKKLPPGPWGVPIFGYLTFIGHEKHTQYMKLARKYG 2502
2501 SLFSAKLGAQLTVVISDYKIIREAFKTEDFTGRPHSPLLKTLGGF (1?)
GIINSEGQLWKDQRRFLH 719
718 EKLRHFGMTVLGNKKHLMESRIM (0)
534 TEVAELLASLNEVGSQSTDLSKYLSVSVSNVICNIIMSVRFSLEDPKFKRFNWLIEEGMRLF 359
358 GEIHTIDYIPQIQYLPGNINAKNKIAKNRQEMFDFYREVIDEHKRSFNAENIRDIVDAYL 179
178 DEIQKAQAEGRDQELFDGKDH 116
EIQMMQVIADLFSAGMETIKTTLLWLNVFMLRHPDAMKRVQDELDQVVGRNRLPKIEDVP 406
YLPITETTILEVMRISSIVPLATTHSPKS (2) 319
2116 DVVINGYTIPAGSYVVPLINSVHMDPTLWDKPEEFNPSRFLDAEGKVHKPDFFIPFGVGR 1937
1936 RRCLGDVLARMELFLFFASIMHTFTIELPEDEPMPSLKGIIGVTISPQAFRVKLIPRPLN 1757
1756 ADLDRLRNVGSC* 1718
>AAGE01098313 (upper seq) CYP18 like fragment probable pseudogene
Query: 1703 ASLNEVGSRSS 1671
            ASLNEVGS+S+
Sbjct: 210  ASLNEVGSQST 220
Query: 313 VGSQSTELSKYLSVLVSNVICNIIMSVRFSLEDPKF--------------GGMHTIDYIP 176
           VGSQST+LSKYLSV VSNVICNIIMSVRFSLEDPKF              G +HTIDYIP
Sbjct: 215 VGSQSTDLSKYLSVSVSNVICNIIMSVRFSLEDPKFKRFNWLIEEGMRLFGEIHTIDYIP 274
Query: 175 QIQYLPGNV-------KNRQEMFDIYREVINEHKRSFNAENIRDIV 59
           QIQYLPGN+       KNRQEMFD YREVI+EHKRSFNAENIRDIV
Sbjct: 275 QIQYLPGNINAKNKIAKNRQEMFDFYREVIDEHKRSFNAENIRDIV 320
Query: 56  AYLDEILKAQAE 21
           AYLDEI KAQAE
Sbjct: 322 AYLDEIQKAQAE 333
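The alignment blocks here follow BLAST's pairwise layout: the midline shows the residue for an identity, '+' for a substitution with a positive score, and a space otherwise. A small sketch that recovers identity and positive counts from one block, assuming the three lines are column-aligned (spacing intact) and taking percent identity over the aligned length including gaps, which is how BLAST reports its Identities line. The file does not say exactly how its own percentages (e.g. "66% to 18A1") were computed, and the function name is hypothetical.

```python
def alignment_stats(query, midline, sbjct):
    """Summarize one column-aligned BLAST block (Query, midline, Sbjct).

    Midline convention: residue letter = identity, '+' = positive-scoring
    substitution, space = neither (including gap columns).
    """
    if not len(query) == len(midline) == len(sbjct):
        raise ValueError("the three lines must have the same length")
    identities = sum(1 for c in midline if c.isalpha())
    positives = identities + midline.count("+")
    gaps = query.count("-") + sbjct.count("-")
    length = len(query)
    return {
        "identities": identities,
        "positives": positives,
        "gaps": gaps,
        "length": length,
        # Percent over the full aligned length, gap columns included.
        "percent_identity": round(100 * identities / length, 1),
    }


# First block of the AAGE01098313 alignment (sequence parts only).
print(alignment_stats("ASLNEVGSRSS", "ASLNEVGS+S+", "ASLNEVGSQST"))
```

Because the midline carries no position information, this only works on blocks whose spacing survived; a collapsed midline has to be realigned against Query and Sbjct first.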
>AAGE01227048 (upper seq) CYP18 like fragment probable pseudogene
Query: 643 ASLNEVGSQPIDLNKYLSVSVSNVICNIIMSVRFSLEDPKFA-------------G-LHT 780
           ASLNEVGSQ  DL+KYLSVSVSNVICNIIMSVRFSLEDPKF              G +HT
Sbjct: 210 ASLNEVGSQSTDLSKYLSVSVSNVICNIIMSVRFSLEDPKFKRFNWLIEEGMRLFGEIHT 269
Query: 763 FAGLHTIDYIPQIQYLPGNV-------KNRQEMFDFYREMIDEHKQSFNAENIRDIV 912
           F  +HTIDYIPQIQYLPGN+       KNRQEMFDFYRE+IDEHK+SFNAENIRDIV
Sbjct: 264 FGEIHTIDYIPQIQYLPGNINAKNKIAKNRQEMFDFYREVIDEHKRSFNAENIRDIV 320
Query: 915 AYLDEILKAQAEDRDQELFEGKDHEI 992
           AYLDEI KAQAE RDQELF+GKDH++
Sbjct: 322 AYLDEIQKAQAEGRDQELFDGKDHDV 347
>CYP303A1 AAGE01109944 641807020 834983680 618119317 834966118 826136105 587934965 72% to 303A1 complete
MYWYYLACFIVVFIIFLYLDCIKPANFPPGPKWYPIIGSAIEIARARQKTGMLCKAIKLIASKYDHKGVIGF
KVGKDKTVMAISGDSLREMMSNEDLDGRPTGIFYETRTWGLRRGVLLTDEEFWQEQRRFI
VRHLKEFGFARKGMAEIIGNEAEYVKNDFHALVKAGNGKALVQMQSAFSVYILNTLWLMM
AGIRYTRENKDLKYLQSLLHELFANIDMMGALFSHFPFIRFFAPRLSGYKQFVEIHNLMH
KFIGAEVENHKKSFNDTDEPRDLMDVYLKILQSNRDIPESFSQEQLLAVCLDMFIAGSET
TTKTLGFAFLHLVRQRETQLKVQKELDEVVGRNRLPTLEDRVN
(2)
LPYCEAVVLEALRMFMANTFGIPHRALRDTKLCGYDIPK
(0)
DTMLVGMFRGMMLNDWESPTSFKPERFLKGGKIVIPPNFHPFGVGRHRCMGEMMGKAN
110
LFLFITTLFQSFDFLVPEGYPIPSDEPIDGATPSVRQYTALIVPR*
>581536484 803281860 586608108 826028980 574131458 595148561 754352758 590136340 519840563 753671460 67% to 512982119 above, 58% to 304B anoph
Tried walking the chromosome down to exon 2, so some numbers above are in the intron.
MLLNPSAILWTVAAGLLIYRCFRFMFDRPPNFPSGPPRFPLLGSYLVLLMVNYRHLHR
AAVRLGQFYRTKILGIYLGDFPSIVVNDLAIAKEVLARSEFDGRSDLFLARMRERNFQRR
GIFFTDGPHWKEQRRFVLRHLRDYGFGRRFDELEAETRSELMTLLDVLRYGPKFEHERLF
AKDGCVKCPDAFYGLLGNVYFQVICGERFQRKDMAQLYE
(1)
>223483644 519671636 528946489 494183870 a second 304B-like C-term sequence, 57% to 304B
TGRHAINFQQKGDDYGTILSYLPWLKDYFPEATNYRILREVNNRMNDLIEAMVQKY
LASYDENHMRCFLDRYIYEMKQSKPLEGDAFTFQ
(1)
YDQLVMILWDMLLPTLSGSAIQLSMLLERLLLNPRVATKVQQELDGVVGHGRLPTLDDRV
889
NLPYAEATLLEALRIDTLVPSGISHVALEDTKLCGYDIPKGCFVMLSLDVINNQREFWGDP
ENFRPERFLDESGKLSLKKDISVPFGGGKRLCVGETFSRNTLFLMFTALMQNFNIKPRPG
DPLPDLGQRITGVVTSMEPFWLRFEAR*
>CYP304B2xx Possible full-length gene joining the 512982119 and 223483644 fragments, complete
Note: this is a hybrid of two different genes; see corrected seq below.
AAGE01051934
MFVTPTIFLWLVTIGLIAYRCHRFLFDRPKNFPDGPPKFPLLGGYAVMLLINFYHLHR
AANKLCEYYRTKILGIYLGNFPTVIVNDFATVKEVLNRVEFDGRPDLFIARMREKNFLLR
GIFFTQGPDWKEQRRFILRYLRDYGFGRRFDELEAETNAEILTLVEMLRYGPRHEHETEF
MTKDGCAMVPNVFFACFANAFLYVLTGERINRDEAGALFE
TGRHAINFQQKGDDYGTILSYLPWLKDYFPEATNYRILREVNNRMNDLIEAMVQKY
LASYDENHMRCFLDRYIYEMKQSKPLEGDAFTFQ
(1)
YDQLVMILWDMLLPTLSGSAIQLSMLLERLLLNPRVATKVQQELDGVVGHGRLPTLDDRV
889
NLPYAEATLLEALRIDTLVPSGISHVALEDTKLCGYDIPKGCFVMLSLDVINNQREFWGDP
ENFRPERFLDESGKLSLKKDISVPFGGGKRLCVGETFSRNTLFLMFTALMQNFNIKPRPG
DPLPDLGQRITGVVTSMEPFWLRFEAR*
>CYP304B3yy/xx top part = my old Byy, bottom = my old Bxx + 1 aa diff
DW987682.1 EST supports this assembly, so mine are hybrids
AAGE02028825.1 revised seq on 4/20/06
46553 MLLNPSAILWTVAAGLLIYRCFRFMFDRPPNFPSGPPRFPLLGSYLVLLMVNYRHLHRAAVRLGQFYRTKILGIYLGDFP
SIVVNDLAIAKEVLARSEFDGRSDLFLARMRERNFQRRGIFFTDGPHWKEQRRFVLRHLRDYGFGRRFDELEAETRSELM
TLLDVLRYGPKFEHERLFAKDGCVKCPDAFYGLLGNVYFQVICGERFQRKDMAQLYE
(2) 47203
61403 TGRHAINFQQKGDDYGTILSYLP
WLKDYFPEATNYRILREVNNRLNDLIEAMVQKYLASYDENHMRCFLDRYIYEMKQSKPLEGDAFTFQ
(1) 61672
61872 YDQLVMILWDMLL
PTLSGSAIQLSMLLERLLLNPRVATKVQQELDGVVGHGRLPTLDDRVNLPYAEATLREALRIDTLVPSGISHVALEDTKL
CGYDIPKGCFVMLSLDVINNQREFWGDPENFRPERFLDESGKLSLKKDISVPFGGGKRLCVGETFSRNTLFLMFTALMQN
FNIKPRPGDPLPDLGQRITGVVTSMEPFWLRFEAR* 62501
>512982119 637789748 834948129 570603901 750442192 570627554 568540398 743856885 581525309 637183809 812171267 586112683 570800380 579961153 793213948 581533371 587665129 570695804 574007683 60% to 304B1 anopheles
The numbers above include a long chromosome walk of about 5-7 kb, about 500 bp per step.
No C-term was found.
N-term exon is 55% to 304B anoph. and 48% to 304C anoph.
MFVTPTIFLWLVTIGLIAYRCHRFLFDRPKNFPDGPPKFPLLGGYAVMLLINFYHLHR
AANKLCEYYRTKILGIYLGNFPTVIVNDFATVKEVLNRVEFDGRPDLFIARMREKNFLLR
GIFFTQGPDWKEQRRFILRYLRDYGFGRRFDELEAETNAEILTLVEMLRYGPRHEHETEF
MTKDGCAMVPNVFFACFANAFLYVLTGERINRDEAGALFE
>CYP304B 494544931 512720460 41% to 476322188 72% to 304B1 827562306 594336057 512633341
(2)
TGKYAMMFQRTGDDYGTIYSLLPWMRHLFPNRTRYRTIREGSLGVNRFIESII
QKRLETHEEGHVRCFLDLYFTEMKKTVPRTEDNRFTFQ
(1)
HDQLVLGIVDFFFPAISGATTQ
IALLLERLLWHPEVVQKMQAEIDDVVGHGRLPTLDDRINLPYTEATLREAMRI
DTLVPSGVAHMAMKDTTLRGYDIPKDTILVLGLDSIHMQKDIWGDPERFRPERFLNYRGE
LSLSKDVSVPFGAGKRLCAGETFARNTMFLIVSALVQNFNIRQRLGDKLPDMGKRSTGII
ISPADYWVKFEPR*
>CYP304Byy AAGE01029809 Possible full-length gene joining the 581536484 and 494544931 fragments, complete
Note: this is a hybrid of two different genes; see corrected seq below.
MLLNPSAILWTVAAGLLIYRCFRFMFDRPPNFPSGPPRFPLLGSYLVLLMVNYRHLHR
AAVRLGQFYRTKILGIYLGDFPSIVVNDLAIAKEVLARSEFDGRSDLFLARMRERNFQRR
GIFFTDGPHWKEQRRFVLRHLRDYGFGRRFDELEAETRSELMTLLDVLRYGPKFEHERLF
AKDGCVKCPDAFYGLLGNVYFQVICGERFQRKDMAQLYE
(1)
(2)
TGKYAMMFQRTGDDYGTIYSLLPWMRHLFPNRTRYRTIREGSLGVNRFIESII
QKRLETHEEGHVRCFLDLYFTEMKKTVPRTEDNRFTFQ
(1)
HDQLVLGIVDFFFPAISGATTQ
IALLLERLLWHPEVVQKMQAEIDDVVGHGRLPTLDDRINLPYTEATLREAMRI
DTLVPSGVAHMAMKDTTLRGYDIPKDTILVLGLDSIHMQKDIWGDPERFRPERFLNYRGE
LSLSKDVSVPFGAGKRLCAGETFARNTMFLIVSALVQNFNIRQRLGDKLPDMGKRSTGII
ISPADYWVKFEPR*
>CYP304Bxx/yy top part = my old Bxx, bottom = my old Byy
AAGE02028825.1 revised accurate seq 4/20/06
22307 MFVTPTIFLWLVTIGLIAYRCHRFLFDRPKNFPDGPPKFPLLGGYAVMLLINFYHLHRAANKLCEYYRTKILGIYLGNFP
TVIVNDFATVKEVLNRVEFDGRPDLFIARMREKNFLLRGIFFTQGPDWKEQRRFILRYLRDYGFGRRFDELEAETNAEIL
TLVEMLRYGPRHEHETEFMTKDGCAMVPNVFFACFANAFLYVLTGERINRDEAGALFE (2) 22960
35137 TGKYAMMFQRTGDDYGTIYSLL
PWMRHLFPNRTRYRTIREGSLGVNRFIESIIQKRLETHEEGHVRCFLDLYFTEMKKTVPRTEDNRFTFQH
35412
35469 DQLVLGIVDF
FFPAISGATTQIALLLERLLWHPEVVQKMQAEIDDVVGHGRLPTLDDRINLPYTEATLREAMRIDTLVPSGVAHMAMKDT
TLRGYDIPKDTILVLGLDSIHMQKDIWGDPERFRPERFLNYRGELSLSKDVSVPFGAGKRLCAGETFARNTMFLIVSALV
QNFNIRQRLGDKLPDMGKRSTGIIISPADYWVKFEPR* 36092
>CYP304C1 AAGE01104491 512990636 572473586 613989430 64% to CYP304C1 749978894 754492027 584954719 complete
MVLISELIIAALLGLLIYRFYRYLFERPSENFPPGPPRL
PLLGGYPFMLALNYKHLHKAAARLSQLYKSKLIGLYLGPLPAVIVNDYDTVKEVLTRPEF
DGRPDLFMARLRDQHFQRR
(1)
GIFFTDSESWREQRRFFLRTLHHFGFGRRSPEAEADIQAGLEDVISLLRDGPKYEHEKAL
VDSAGFALCPTVFFAVFSNVLLRMIVGVRLAREDQAVMFE
VGKNAIAFHRNGDDYGMLLSYIPWIRHLFPKTTKYDLLRKVNQQANAVILSLAQKCES
SYDENDIRCLVDAYIQEMRATGSKGESTGKDEFGFQ
(1)
YDQLVIGAADFLVPPFSAIPAKICLILERLIQYPEVQTKMYRELNEVVGLNRLPTLDDRA
DLPYCDAVIREGLRIDALVPSGIPHMAVTDTQLNGYQIPKGTVIVNSLEFIHHQPEIFRD
PDSFMPERFLTPDGKLALDQDKTLPFGAGKRVCGGEQFARNALFLGVTSLVQNFTFQ
LPAGRACPDLDGRITGVIQTTPDFRLKFVSRR*
>CYP305A6 AAGE01041187 494160882 476322188 754462117 mate pair = 754369970, which is an exact match to part of AAGE01202372, 65% to 305A2 825745101 613940462
AAGE01202372.1 N-term exon for CYP305A complete
1435 MITLVLSSVVIVSFIFWLWQDLQRPPNFPP (1) 1346
GPKWLPFFGNTLLIRNLARISGGQHLAFEALSKQYKSPVIGLKLGREHVVVALQ
YPAVHEALTKEAFDGRPDNFFIRLRTMGTR
(2)
LGITFTDGPFWTEHNSFVVR
HLRQAGYGRQPMQLQIQNELNELIGIIRDLDSEPVWPGSILPTSVINVLWTFTTGSRIPR
DDQRLTRLLKLLQDRSKAFDMSGG
ILSQLPWLRHIAPEWTGYNLINRFNQEIHEFFKATIEKHHQDYTEEKCSDDLIYAFIK
EMKERKDDPCSTFTDVQLSMIILDIFIAGSQTTSTTIDIALMILAMNTEIQRKIYAEIDD
NFHPDEIPDQNCRTNLQYTEAFLLEVMRLYQIAPIGGPRRALSDCTLGGYRIPRNTTILM
GLHTVQMDPDHWGDPENFRPERFIGPDGKIINTERLIPFGLGRRRCLGDSLARSCMFTFL
VGILQKFSLRLPDSLEGPSLKLTPGITLSPKPYKVVFEPRLK*
AAGE02003241.1
24317 MITLVLSSVVIVSFIFWLWQDLQRPPNFPP (1) 24228
13700 GPKWLPFFGNTLLIRNLARISGGQHLAFEALSKQYKSPVIGLKLGREHVVVALQYPA 13530
13529 VHEALTKEAFDGRPDNFFIRLRTMGTR (2) 13449
13391 LGITFTDGPFWTEH 13350
13349 NSFVVRHLRQAGYGRQPMQLQIQNELNELIGIIRDLDSEPVWPGSILPTSVINVLWTFTT 13170
13169 GSRIPRDDQRLTRLLKLLQDRSKAFDMSGGILSQLPWLRHIAPEWTGYNLINRFNQEIHE 12990
12989 FFKATIEKHHQDYTEEKCSDDLIYAFIKEMKERKDDPCSTFTDVQLSMIILDIFIAGSQT 12810
12809 TSTTIDIALMILAMNTEIQRKIYAEIDDNFHPDEIPDQNCRTNLQYTEAFLLEVMRLYQI 12630
12629 APIGGPRRALSDCTLGGYRIPRNTTILMGLHTVQMDPDHWGDPENFRPERFIGPDGKIIN 12450
12449 TERLIPFGLGRRRCLGDSLARSCMFTFLVGILQKFSLRLPDSLEGPSLKLTPGITLSPKP 12270
12269 YKVVFEPRLK* 12237
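In the AAGE02003241.1 coordinate listing, the numbers bracketing each exon appear to be the inclusive ends of the translated segment, with a descending pair meaning the reverse strand. Under that assumption, each span should hold exactly three nucleotides per residue, which gives a quick consistency check when transcribing coordinates. A sketch using three segments of that listing; the function names are hypothetical.

```python
def exon_span_nt(start, end):
    """Length of an inclusive coordinate span on either strand."""
    return abs(end - start) + 1


def span_matches_peptide(start, end, peptide):
    """True if the span holds exactly one codon per residue.

    Assumes start/end are the inclusive ends of the translated segment;
    a trailing '*' counts as a codon because the stop occupies 3 nt.
    """
    return exon_span_nt(start, end) == 3 * len(peptide)


# Segments copied from the AAGE02003241.1 listing (reverse strand).
checks = [
    (24317, 24228, "MITLVLSSVVIVSFIFWLWQDLQRPPNFPP"),
    (13529, 13449, "VHEALTKEAFDGRPDNFFIRLRTMGTR"),
    (12269, 12237, "YKVVFEPRLK*"),
]
for start, end, peptide in checks:
    print(start, end, span_matches_peptide(start, end, peptide))  # True each time
```

A mismatch usually points to a coordinate placed on the wrong line or to a segment whose bases at a split codon were not counted.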
>73% to CYP305 above 519967093 521924636 570423900 pseudogene of AAGE01051792, contains a deletion and stop codon
FLPGPQWLPFIGNTPFVRKLARASGGQHLAFEALSKQYNSPVIGLKLGREYVVVALQYPAVREVHSKEEFDGRPDNF
LLKMRLERFVISRLGVTCTDGPFWAEHRNFVVRHLRQAGYGRQ
GIIRDMDGEPVWPGSILPTSVINVLWTFTTGSRIPRDDQRLARLLKLLQDRSKAFDMS
GGVLSQLPWLRHIAPEWTGYNLLKRFNQELHEFFMIIVERHHQEYHEEKCSDDLIYA
FIKEMKDRKDDPSSTFTDLQLTMIILDIFIAGSQTTSITIDLAFMMLTMHTDIQRDTCRN
R*DLHHDEMPSKRSYSLPYTE
AAGE02003240.1
this matches 305A5
Sbjct 53574 FLPGPQWLPFIGNTPFVRKLARASGGQHLAFEALSKQYNSPVIGLKLGREYVVVALQYPA 53395
Query 61    VREVHSKEEFDGRPDNF-----------------LLKMRLERFVISRLGVTCTDGPFWAE 103
            VREVHSKEEFDGRPDNF                 LLKMRLERFVISRLGVTCTDGPFWAE
Sbjct 53394 VREVHSKEEFDGRPDNFFLRLRTMGTR*DFKL*CLLKMRLERFVISRLGVTCTDGPFWAE 53215
Query 104   HRNFVVRHLRQAGYGRQ--------------GIIRDMDGEPVWPGSILPTSVINVLWTFT 149
            HRNFVVRHLRQAGYGRQ              GIIRDMDGEPVWPGSILPTSVINVLWTFT
Sbjct 53214 HRNFVVRHLRQAGYGRQPMQLQIQNELNELIGIIRDMDGEPVWPGSILPTSVINVLWTFT 53035
Query 150   TGSRIPRDDQRLARLLKLLQDRSKAFDMSGGVLSQLPWLRHIAPEWTGYNLLKRFNQELH 209
            TGSRIPRDDQRLARLLKLLQDRSKAFDMSGGVLSQLPWLRHIAPEWTGYNLLKRFNQELH
Sbjct 53034 TGSRIPRDDQRLARLLKLLQDRSKAFDMSGGVLSQLPWLRHIAPEWTGYNLLKRFNQELH 52855
Query 210   EFFMIIVERHHQEYHEEKCSDDLIYAFIKEMKDRKDDPSSTFTDLQLTMIILDIFIAGSQ 269
            EFFMIIVERHHQEYHEEKCSDDLIYAFIKEMKDRKDDPSSTFTDLQLTMIILDIFIAGSQ
Sbjct 52854 EFFMIIVERHHQEYHEEKCSDDLIYAFIKEMKDRKDDPSSTFTDLQLTMIILDIFIAGSQ 52675
Query 270   TTSITIDLAFMMLTMHTDIQRDT-CRNRXDLHHDEMPSKRS-YSLPYTE 316
            TTSITIDLAFMMLTMHTDIQ+        +LH DEMP +    SLPYTE
Sbjct 52674 TTSITIDLAFMMLTMHTDIQKKIHAEIDENLHQDEMPQQNDRTSLPYTE 52528
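Internal stop codons, like the two '*' in the Sbjct line of the AAGE02003240.1 alignment, are one of the features used here to call a pseudogene. A minimal sketch that lists them, assuming '*' marks a stop and a single trailing '*' is the normal end of the protein; positions are 1-based within the string given, not genomic coordinates, and the function name is hypothetical.

```python
def internal_stops(protein):
    """1-based positions of '*' stops that occur before the last residue.

    A single trailing '*' is the normal end of a protein and is ignored.
    """
    body = protein[:-1] if protein.endswith("*") else protein
    return [i + 1 for i, aa in enumerate(body) if aa == "*"]


# Sbjct segment from the AAGE02003240.1 alignment.
sbjct = "VREVHSKEEFDGRPDNFFLRLRTMGTR*DFKL*CLLKMRLERFVISRLGVTCTDGPFWAE"
print(internal_stops(sbjct))  # [28, 33]
```

An internal stop can also be a sequencing or assembly error, which is why frameshifts in this file are confirmed against several trace archive reads before a pseudogene call.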
>CYP305A5 AAGE01051792 70% to CYP305A2 but no stop codon; N-term exon is one of two choices. 82% to other CYP305 Aedes seq 520611721 836008963 529076567 570690021
AAGE01309663.1
CYP305A N-term exon matches by default, since the other CYP305 has an exon 1 sequence
complete
MIVLVLTSVLIIAFSYWLLQELRRPPNYPP (1)
GPQWLPFIGNTPFVRKLARASGGQHLAFEALSKQYNSPVIGLKLGREYVVVALQYPA 696
697 VREVHSKEEFDGRPDNFFLRLRTMGTR (2?) 777
838 LGVTCTDGPFWAEHRNFVVRHLRQAGYGRQPMQLQIQNELNELIGIIRDM 987
988 DGEPVWPGSILPTSVINVLWTFTTGSRIPRDDQRLARLLKLLQDRSKAFDMSGGVLSQLP 1167
1168 WLRHIAPEWTGYNLLKRFNQELHEFFMIIVERHHQEYHEEKCSDDLIYAFIKEMKDRKDD 1347
1348 PSSTFTDLQLTMIILDIFIAGSQTTSITIDLAFMMLTMHTDIQKKIHAEIDENLHQDEMP 1527
1528 QQNDRTSLPYTEAFLLEVQRFFHIVPVSGPRRALSDCTLGGYQIPKNTTILMGLRTVHMD 1707
1708 PEHWGDPECFRPERFLSPDGKIITTERLIPFGLGRRRCLGESLARACMFTFLVGILQKFS 1887
1888 LRQPANCSEKPSPKLLPGITLSPKPYKVIFEPR* 1986
>CYP306A1 570772008 512981304 597667916 641824294 753304856 593374976 574131373 587966306 514783872 514783871 618134500 835036042 803206894 578828539 AAGE01228356 AAGE01635404 AAGE01635520 complete
MYLILGIVLILTYVLWTLLDRRGKPPGPFGLPILGYLPFIDSIKPYETLTNLAKRYG
PVYSLRMGQVDAVVLTAPDLIRDTLKREETTGRAPLFITHGIMGGH
(1)
GIICAEGNLWRDQRRLSTEWLRKMGMTKFGPTRATLEARILIGVNELLE
(0)
DLRRESEKVFAFDPAPLLHHILGNLMNDIVFGLQYERDDATWRYLQHLQEEGVKHIGVSMAVNFLPFLR
(2)
HLPSSKRIIEFLLNGKAKTHKIYDSIIEKQRSRMEGGGSEVSDP
GRHDDCILSNFLQETRRRETGARPELAFCSDVQLRHLLADLFGAGVDTTFTTLRWLILFL
ALNKDAQERLRQEMASQLRGEPCLNDVDSLPYLKACVAEAQRLRTVVPLGIPHGAVS
(0)
EITIAGYKVSKNTMIIPLLWSVHMDPSLWPNPDRFDPDRFLDESGQYSAPAHFMPFQT
GKRMCLGDELARMILLLYTGRLFWHFELDVFNGEGLDLTGVCGITLTPPPFEIIFKERV*
>CYP307A1 571521703 817504746 824335840 591439033 834970143 TC53059 TC28026 TC50479 78% to CYP307A1 complete
813467047 (exon 1) found by searching with the DNA seq above, 67% to anoph 307A1
246 MAYTLILVALMSLLSVVCYLKVLYEWHRKVRVQTVKSSRYAKKLQKLEESQPQEVEEAP 422
423 VEFPQAPGPYPWPVLGSAAIIGQYPAPFMGFSALAKKYGDVYSIRIGQGQCLVVSSLELI 602
603 REVLNQNGRYFGGRPDFLRYHQLFGGDRNN (1)
SLALCDWSSLQQKRRNLARKHCSPSDASSYYQKMSDVGV
AEMHYFMDQLTDVVTPGQDFKVKPLIMQACANMFSKYMCSVRFEYDDAGFQKMVHSFDEI
FYEINQGYAVDFMPWLAPFYFRHMSKLSSWSNYIRGFILERIVNEREQNLGEDEPERDFT
DALLKSLREDPSVSRDTIMYMLEDFIGGHSAIGNLVMLALGYVAKNPEIGARIQQEIDHV
TDKGLRNVTLYDTESMPYTVATIFEVLRYSSSPIVPHVATENTCIG
GYGVQTGTVVFINNYDLNTSEKYWDHPERFDPSR
(2?)
SNESQKQILRVKKNIPHFLPFSIGKRTCIGQNLVRGFSFIMLANILQKYDVHT
NDPAQIKMKPACVAVPPDTYPLAFTQRSQ*
>CYP307B1 AAGE01081732 476411966 68% to 307B1 519649910 578920479 complete
revised according to AAGE02011086.1 and AAGE02028078.1 4/20/06
1027 MEKFTIFLFSSNTIYLLVACFLVTLIMLLLEVRQKISVKSDLVKLVKSFLFGQWLSVFTQNNKNRNL 848
847 NDTEVKVLRRAPGPKSYPIIGNLKDLDGYEVPYQAFSVLAKKYGPVVNLKLGVVDAVVIN 668
667 GIEHIKEVLINKAQYFDSRPNFRRYQLLFSGNKEN 533
(1)
SLAFCDWSEVQKARRDMLVPHTFPRNFSGRFNELNGVINDEIRLVIGESNVNRVIEIK
14 PIIMNICANVFSQYFASHRFELEDPKFQKLVKNFDQIFYEVNQGYAADFLPFLLPLHHR 193
194 NLKRMDQLAEEIREIMLETIINDRYDNWVEGNTENDYVDSLINHVKSKIGPDMEWETALF 373
374 ALEDIIGGHSAVANFLVKTFGYIIQHPEVQQNIQSEVDRVLETEGKHTVDLSDRNHMPYT 553
554 EAVIMEALRLIASPIVPHVANQDSQIG 637 (1?)
685 GYDVPKDTLIFLNNYDLSMSENLWENPNDFVPERFLQNGRLVKPDFFIPFGAGRRS 864
865 CMGYKMTQLISFSIIANLLRSYTITPLSGHSYFVPVGSLAMPEKSYEFQINLRH* 1029
CYP3 clan
CYP6 related sequences
Note: CYP6 and CYP9 sequences (in Anopheles) have only one intron and will be the easiest to assemble. 6AG, 6AH and 6AJ are exceptions.
CYP3 clan sequences
CYP6 related, 14 complete, 20 partials
>AAGE01198540 494152727 63% to CYP6Z2 67% to AY433537 519918984 574095157 569650597 complete
MFIYTFALFWLALVLVLRYIYSYWDRNGLASIKPQIPYGNLKSVAQK
TQSFGVATCELYWKSQERLAGIYLFFRPAVLIRDAHLAQRIMTTDFSYFHDRGVYCNEEI
DPFSANLFAL
PGKRWRNLRHRFTPLFTSGQLRCMMPTILDVGHKLQKFLEPAAERQEVVDIREIVSRGVL
ELIASLFFGFEADCINDPDDAFSKTLREFQLGGFMNNFRTACTFVCPELLQVTRISSLSP
QMIKFATDVVTKQIEHREKNNVSRKDFIQLLIDLRREEANNNEVALSFEQCAANVFLFYV
AGSDTSTSAITFTLHELTQNPEVMDKLQSEIDEMLVQTNGELTYTAIKELPYLDLCVKET
LRKYPGLAILNRKCTKSYAVPESSVVIQEGTQIMIPLLAYGMDEKYFPEPERYYPERFNKQSKNYDEKA
YYPFGEGPRNCI
(1)
AYRMGVMVSKIGLILLLSKFKFEATQGPKIVFSAATVPLVPKGGIPVKISNR*
>AAGE01065173 78% to AAGE01198540
N-term is on AAGE02015843.1 (revised 4/20/06)
46903 MFIYTFALFWLAVAFAIRYIYSYWDRNGLPSIKPH
3333 IPYGNLKAVANRTESFGVATCDLYWKSKDRLVGIYLFFRPAVLIRDAHLAQQIMTTDFSH 3154
3153 FHDRGVFCNEEVDPFSANLFALAGKRWRNLRNKFTPLFTAGQLRCMMPIILSVGHKLQNV 2974
2973 LEPAAKKQEVLEIRELVSRCVLDIIASVFFGFEANCINDPNDAFIQNLRELQYDGFFNNL 2794
2793 RAAASFICPELLKLTRISSLSPEMIRFVTDIVTKQIEHREKNKVTRKDFIQLLIDLRRED 2614
2613 TNNNEAALGFEECAANVFLFYVAGSDTSTSAVAFTLHELTQNAETMGKLQTEIDEMLVKT 2434
2433 SGELTYDGIKEMSYLDLCVKETLRKYPGLAILNRECTKSYAVPNSDILLKKGTQVVIPLL 2254
2253 AYGMDEKYFPEPDRYLPERFDKSTKNYDEKAFYPFGEGPRNCI (1) 2116
2065 AFRMGVMVSKICLVLLLSRFNFEATRGPKIDFTPSTVALLPKGGIPVKISIR* 1907
>AAGE01047841 AY433537 62% to 6Z2 569650597 622013821 579345058 complete
MLFIYSVALLCIAVTLALKYVYSYWDRHGLPSVKPHIPFGNLKTVVKKTESFGIAIN
QLYWQTKGQLAGIYLFFRPAILVRD
AHLAQQIMTTDFNHFHDRGIYCNEEGDPFSANLFALPGKRWRNLRNKLTPLFTGGQLRGM
MPTILEVGEKLQKHLEPVAERQEVVEIRDIVSRFVLEIIATVFFGFEANCIEDRDDSFSK
VLREAQGERLSAVLRAAAMFVCPGLLRYTGISSLEPQVIAFVSEIVTKQIEHREKNSVTR
KDFIQQLIEIRRGSGENQVPAMSIEQCAANVFLFYAAGSETSTGTIAFSMHELSHHADVM
KKLQDEIDDALAKSNGAITYESVMQMQYLDLCVKETLRKYPGLPFLNRECTMDYKVPDSD
LVIRKGTQLVLPIYGFSMDEQYFPEPECYIPERFEEASKNYDEKAYYPFG
DGPRNCI
(1)
AYRMGVLITKIGLILLLSKFTFEATQGPKMMFSSASVPLLPKDGISLKISN
RKR*
>AAGE01005406 80% to AY433537 62% to 6Z2 complete
possible pseudogene with frameshift at AVGDKLX (X = ct), confirmed in four trace archive sequences: 520668645, 757097876, 589569591, 811977620
5398 MLFVYTLTILSIAITLVLKFVYSYWDRYGVQNIKPHIPFGNLKTVVKKTESFGVAINQLY 5219
5218 WQTKGQLVGIYLFFRPAILIRDAHLAQQIMTTDFNHFHDRGVYCNEEGDPFSASLFSLPG 5039
5038 KRWRNLRNKLTPLFTGGQLRGMMPTILAVGDKLX 4940
4937 KHLEPVAENREPIEIRDIVSRFVLEIIATVFFGFEANCIKDRNDAFCRVLREAQRESMYT 4758
4757 NFRAAAVFVCPGLLKYTGISSLEPEVKEFVSGIVTEQIEHREKNGATRKDFIQQLIELRR 4578
4577 EDSQNQNVRMSIEQCAANVFLFYIAGSETSTGTITFTMHELSQHPEVMKKLQAEIDDTLA 4398
4397 KSNGEITYENVNQIQYLDLCVKETLRKYPGLPILNRECTSDYKVPDLDLVIRKGTQVVIP 4218
4217 LYGISMDEQYFPEPECYKPERFDGASKNYDEKAYYPFGEGPRNCI (1) 4083
4017 AFRMGVLVSKIGLVLLSSKFNFKPTQGPKIVFSPAAVPLVPKGGISLMISRRDK 3856
VADLYMGLHISVVLKVVCS*
>AAGE01054542 476413066 56% to 20199522 76% to 6Y1 579367130 614744104 834925676 complete
MWLVYLVWLVAAVLLAVYLWIKKRFNFWKDRGVEYIEPEFPFGNFKTLGKVEHIAPITQR
HYDYFKQKGVPYGGVFMLTSPLLYILDTKLIKTLLVKDFNHFPNRGVYFNEKDDPLSAHMFAI
EGNKWKTLRNKLSPTFTSGRIKMTFPLVVGVCQQFCDHLGEVVQQSNEVEMHDLLSRYTI
DVIGTCAFGIDCNSFREPDNEFRKYGKIAFDKLPHSPLVVYLMKAFRSYANAFGMKQLHE
DVSSFFSKVVKDTIEYRESNNVVRNDFMDLLLKLKNTGRLEESGEEIGKISFEEIAAQAF
IFFTAGYDTSSTAMTYTLYELALNQKAQEKARKCVLDIFAANNGTLTYESVGNMGYLDQC
IN (1)
936 ETLRKHPPVAILERNADRDYKLPDSDIVIKKGRKIMIPTFAMHHDAEHFPDPE 760
759 RYDPDRFSPEQVACRDPYCYLPFGEGPRICIGMRFGTIQARVGLASLLKRFRFRVCDKTQ 580
579 IPVRYSKTNFILGPANGVWLRVEKL* 505
>AAGE01206812 586027460 593564617 637757183 494307621 complete 38% to 6M1 TC54189 TC23406 TC42024 38% CYP6P3 TC54190 TC23407 TC42025 TC574 TC6535 83% to TC63333 94% to TC54191 581543219
MFLVILLITLSLYLYQKWIYTYWKRRGVPQLNPSFPFGDVADTFKQRKSYANRLAELHHQ
SASDSHRFVGIYTLFQPILLVTDVELVRRMLTVDFEHFTDRGAHVNEKRDPLSGHLFSLAG
AKWRRMRLKLTPAFTTAKLKAMFPTMMACGRTLSAVIDDHVGRALAIRDLMTRFTMDVIASVGFGLE
CNSMRNPDELFRQMGGRFFSKSWKTSVRMLLAFVAPKVNRYLQVK
LNDDDVEEYMLNLVR
DTIAKREGGGEVRKDFIQLL
()
VQLRNQVEVKDGGSWEMNKVDQNKTLTVEEMAAQSFVFLN
AGYETTSSTVTFCLFELCRNKDLIRKVQEEIDRVMDGGREISYEALAEMTYLESCIDETL
RKYPISPVLFRVCTKPYKIPETDVVIEKDTLVQISLVGLQRDTRYYEDPVKFDPDRYGER
KSETMPHYSFGDGPRVCI
(1)
GLRMGKVMAKMALVELLFRYDFELESPAADSGEIELDPSLLMLQAKHDVKLIPRFRAK*
>617983543 some differences with TC54191 TC42026 44% to CYP6Z3 94% to 586027460 584131270 520119914 760257438 832454533 625082069 complete 39% to 6N1 anopheles complete
MLLPILLVVLVVYLFQKWTYSHWKRRGVPQLNPAFP
FGNVADTFKQRTSYSNRLAELHHQAVRDGHRFVGIYTLL
QPILLVTDVELVKRMLTVDFEHFVDRGAHVNEKRDPLSGHLFSLTGAKWRRMRLKLTPAF
TTAKLKAMFPTMMACGRTLSAVIDDHVGRALAIRDLMTRFTMDVI
ASVGFGLECNSMRNPDELFRQMGGRFFSKSWKTSVRMLLAFVAPKVNRYLQVKLND
DDVEEYMLNLVRDTIAKREGGGEVRKDFIQLLVQLRNQVEVKDGGSWEMNKVDQNKTLTV
EEMAAQSFVFLNAGYETTSSTVTFCLFELCRNKDLIGKVQEEIDRVMDGGREISYEALAE
MTYLESCIDETLRKYPISPVLFRVCTKPYKIPETDVVIEKDTLVQISLVGLQRDTRYYED
PMKFDPDRYGERKSETMPHYSFGDGPRVCI
GLRMGKVMAKMALVELLSRYDFELESPAADSGEIELDPSLLMLQAKHDVKLIPRLRTK*
>NABNU08TR NABNU08 32% to CYP6Z4 59% to 586027460 (no genomic match) looks like a pseudogene
best genomic match was to 617983543 at 77%
I do not think the TIGR database actually has an Aedes seq that is missing from the 15 million trace files of Aedes, so this may be a contaminant from another species.
KTLTPFERAAQSSGSQKAFYETTSATGDGSRIERSRNKDLI
GKVQEEIDRVMDGGKGIS*
EALAETTYPESCTEETLRKHPSPPDQDRGGTKPNKTPETDDASEKDTPVQTPPGGTQRDK
REKEDPEKHEPERYGERKPETTPHHSRGDGPRDSTGHRKGKATAKKALAEQPTRNDYEQE
PPAADTGENEQEPSQPTPQAKHEVKQKPRQRAK
>AAGE01003592 512632636 TC63333 TC10785 TC15419 TC26904 TC37692 TC4101 41% to CYP6P4 83% to 586027460 836033925 753054225 574000494 (n-term looks identical to 586027460) complete
revised 4/21/06, used AAGE02004393.1, AAGE02030939.1
MFLVILLITLSLYLYQKWIYTYWKRRGVPQLNP
SFPFGDVADTFKQRKSYANRLAELHHQSASDSHRFVG
IYTLFQPILL
VTDVELVRRMLTVDFEHFTDRGAHVNEKRDPLSGHLFSLAGAKWRWMLQKLAPAFTSAKV
KSMFPTMMTCGRTLSAVVGDHLGRALPIRALMTRFTMDVIASVGFGLDCN
SMRNPDEPFHKMGSKFFSKSWKTSVRMLLAFVAPKVNRFLQL
KLNDDDVEEYMLNLVRDTIAKREHGGEVRNDFIQLLVQLRNQVEVEDGGSWEINKVEPNK
ALTVQEIAAQSFVFLNAGYETTSSTITFCLFELCRNRDLLGKLQEEIDEVVDGGREASYE
AITEMTYLEACVEETLRKYPISPVLFRVCTKPYRIPDTDFVIEKGTLVQISLVGLNRDPR
YYEAPLKFDPDRYGERKAETMVHYSFGDGPRGCIGLRMGKVMVKMALVELLSNYDFEMES
PTGENELDPSLLMLQPKHDV