Oct. 18, 2005; under revision April 21, 2006 (in progress)
Revision continues May 19, 2006 to June 25, 2006
Compiled by David Nelson and David Drane
The completed and named sequences are here
(http://drnelson.utmem.edu/AedesFasta.June25.htm)
This file is more archival, with detailed information. Please see the FASTA file above.
Useful links for analysis
http://www.ncbi.nlm.nih.gov/Traces/trace.cgi Trace Archive at NCBI
http://trace.ensembl.org/perl/traceview Trace files at Ensembl
http://132.192.64.52/blast/P450.html P450 Blast server
http://www.proweb.org/proweb/Tools/WU-blast.html Do-it-yourself WU Blast
http://www.bioinformatics.vg/bioinformatics_tools/JVT.shtml DNA translator
http://www.ncbi.nlm.nih.gov/BLAST/tracemb.shtml NCBI megablast
http://www.tigr.org/tigr-scripts/tgi/T_index.cgi?species=a_aegypti TIGR Aedes gene index page
206 Aedes sequences here, including 142 complete sequences.
Numbers in () are intron phases. Names have not been assigned for most genes.
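A minimal sketch of how the intron phase markers in this file could be read by a script. It assumes the usual convention (phase 0 = intron falls between codons, phase 1 = after the first base of a codon, phase 2 = after the second base). The function names are hypothetical; markers such as (1?) are treated as the stated phase, and flanking nucleotide coordinates are simply discarded.

```python
import re

# Matches "(0)", "(1)", "(2)" and uncertain forms such as "(1?)".
PHASE_MARKER = re.compile(r"\((\d)\??\)")


def clean_peptide(text):
    """Keep only uppercase residue letters and the stop symbol '*'."""
    return re.sub(r"[^A-Z*]", "", text)


def split_exons(annotated):
    """Split annotated peptide text into (exon_peptide, phase_after) pairs.

    The last exon has no following intron, so its phase is None.
    """
    exons = []
    pos = 0
    for marker in PHASE_MARKER.finditer(annotated):
        peptide = clean_peptide(annotated[pos:marker.start()])
        exons.append((peptide, int(marker.group(1))))
        pos = marker.end()
    tail = clean_peptide(annotated[pos:])
    if tail:
        exons.append((tail, None))
    return exons


def join_exons(annotated):
    """Return the uninterrupted protein sequence for an annotated record."""
    return "".join(peptide for peptide, _ in split_exons(annotated))
```

join_exons() gives the plain protein string for BLAST or alignment; split_exons() keeps the exon boundaries so intron positions can be compared between genes.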
Sequences were collected and assembled by David Drane and David Nelson from July to Sept. 2005.
3.5 million of 15 million trace file sequences were downloaded from NCBI and placed on
a standalone BLAST server on a Mac G4 for TBLASTN searches at an expect value of 10.
The WGS section of GenBank was searched, and 220 AAGE01XXXXXX accession numbers
are given at the end of this file. The TIGR Gene Index was searched for the text “P450”.
The EST section of GenBank was searched, and discontiguous megablast was used to
extend sequences by chromosome walking.
Most sequences should be represented here now, but not all are assembled.
The Aedes mosquito seems to have more P450s than the Anopheles mosquito.
This file is in progress. The CYP4 and CYP325 families are not yet fully assembled
because there are some large introns in these sequences.
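A hedged sketch of how a comparable TBLASTN search could be scripted today. The searches described above were run on a standalone BLAST server in 2005; the sketch swaps in the NCBI BLAST+ tblastn program instead, and the file and database names are hypothetical placeholders. It only builds the command line; nothing is executed.

```python
def tblastn_command(query_fasta, database, evalue=10.0, out_path="hits.tsv"):
    """Build a BLAST+ tblastn command (protein query vs. translated nucleotide db).

    evalue=10.0 mirrors the permissive expect value mentioned in the text.
    "-outfmt 6" requests tab-separated hits, which are easy to parse.
    """
    return [
        "tblastn",
        "-query", query_fasta,   # hypothetical protein FASTA of known P450s
        "-db", database,         # hypothetical local database of trace reads
        "-evalue", str(evalue),
        "-outfmt", "6",
        "-out", out_path,
    ]


if __name__ == "__main__":
    # Print the command so it can be inspected before being run by hand.
    print(" ".join(tblastn_command("p450_queries.fasta", "aedes_traces")))
```

The list form can be passed to subprocess.run(cmd, check=True) on a machine where BLAST+ and a formatted database are available.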
The sequences are presented in clan groups: the CYP2, CYP3, CYP4 and mitochondrial clans.
Note: Aedes has a CYP18 that was not found in Anopheles.
CYP329 of Anopheles now looks like a pseudogene of a CYP9 sequence.
It is short in the heme signature and it has a P at the critical T in the I-helix
oxygen binding pocket. It is the only sequence in the CYP3 clan that does not
fall inside the CYP6 or CYP9 families in Anopheles.
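A rough sketch of the kind of motif check described above (heme signature present, T at the critical I-helix position). The two patterns are simplified consensus forms assumed for illustration (FxxGxRxCxG for the heme signature, [AG]Gx[DE]T[TS] for the I-helix); real P450s, including some in this file, deviate from them, so a miss is a flag for manual inspection and not proof of a pseudogene. The function name is hypothetical.

```python
import re

# Simplified consensus patterns (assumptions, not family-specific profiles).
HEME_SIGNATURE = re.compile(r"F..G.R.C.G")
# The captured group is the position where the conserved Thr is expected.
I_HELIX = re.compile(r"[AG]G.[DE](.)[TS]")


def check_motifs(protein):
    """Report the heme signature and the residue at the critical I-helix position.

    Returns None for a motif that is not found at all.
    """
    heme = HEME_SIGNATURE.search(protein)
    ihelix = I_HELIX.search(protein)
    return {
        "heme_signature": heme.group(0) if heme else None,
        "i_helix_residue": ihelix.group(1) if ihelix else None,
    }
```

An i_helix_residue other than "T" (for example "P"), or a missing heme signature, marks a sequence worth checking by eye.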
There are 11 complete sequences in the CYP2 clan (CYP15, 18, 303, 304, 305, 306:phm, 307).
Phantom (phm) is one of the Halloween genes.
There are 76 complete sequences in the CYP3 clan (CYP6, CYP9).
There are 34 complete sequences in the CYP4 clan (CYP4, CYP325).
There are 9 complete sequences in the mitochondrial clan (CYP12, 49, 301, 302:dib,
314:shd, 315:sad). These include three of the Halloween genes: disembodied (dib),
shade (shd) and shadow (sad).
There are 21 pseudogenes so far.
There are 15 partial sequences (not including the pseudogenes).
CYP2/CYP18 clan sequences
>514720743 753475610 750240311 possible CYP15 N-term = DR747015.1 EST
MWQNLVVLIIFVILFCLRDMRKPGYFPP
(1)
>CYP15B like 585964866 641740723 584363040
78 GPNWFPLIGSGFEVFRLVKHFKFYHLMWAELMRRYGPIVGLRLGRDRVVIVSGLDA
257
258
IREVYSKDQFDGRPDGFFFRIRSFDKRLGVVFTDGAHWDIQRRFSVRTLKALGMGRTGMV 437
438
NSLEREAEEMIHHLRKLSRTQKVISMHNAFDVSVLNSIWTLIAGKR (2) 575
FDLDDKKLEWIMETIHKSFRVIDMSGGVLNQFPPIRYVLPDKSGFAPLLNLLSPLWTFLQ 816
>CYP15B like seq DR746695.1 adult female corpora allata cDNA
813859354 749484786 522065275 514869301 520643713
GTIKSIRSKLDQPDNPDCFIASYLRELNIAERHSSFTNEQLLCLCLDL
FQAGSETTSNTLGYGIAHMLHHPEIVQKIHNELDSVIGRYRLPLLADRPYLPYTEAVLCE
IQRISNIAPLAIAHRTVAPVQLGTYVIPKNTITLISLYSLHMDKAYWGDPEVFRPERFLN
ETGDKLIAHEYFVPFGS
(1)
GKRRCLGESLAKSSLFLFFTAFMHAFLVEPAEPGKLPELDGIDGITLSPCPYFVQLKERLI*
>possible complete CYP15B1 assembled from parts 52% to 15B1 from Anopheles
AAGE01116789
AAGE01129498
Used trace archive seqs to verify seq at PLLNLLRPLWTFLQ
This region is not accurate in AAGE02003241.1
638470554
823375362
593712263
586030336
641740723
569671400
used AAGE02003241.1 for the C-term seq changes
MWQNLVVLIIFVILFCLRDMRKPGYFPP
(1)
GPNWFPLIGSGFEVFRLVKHFKFYHLMWAELMRRYGPIVGLRLGRDRVVIVSGLDA
IREVYSKDQFDGRPDGFFFRIRSFDKRLGVVFTDGAHWDIQRRFSVRTLKALGMGRTGMV
NSLEREAEEMIHHLRKLSRTQKVISMHNAFDVSVLNSIWTLIAGKR
(2)
FDLDDKKLEWIMETIHKSFRVIDMSGGVLNQFPPIRYVLPDKSGFAPLLNLLRPLWTFLQ (0)
GTIKSIRSKLDQPDNPDCFIASYLRELNIAERHSSFTNEQLLCLCLDL
FQAGSETTSNTLGYGIAHMLHHPEIVQKIHNELDSVIGRYRLPLLADRPYLPYTEAVLCE
IQRISNIAPLAIAHRTVAPVQLGTYVIPKNTITLISLYSLHMDKAYWGDPEVFRPERFLN
ETGDKLVAHEYFVPFGS (1)
GKRRCLGESLAKSSLFLFFTAFMHAFLVEPAEPGKLPELDGIDGITLSPCPYYVQLKERLI*
>567527404 46% to CYP15B1 may be a CYP15 pseudogene
XXXXXXXXXXXXXXXXLVRRFRFYHHTCAAFMCLYRPIVDLRMGRDRVVIMTGLDP
I*KVYSKDEKENRPVGFFFRIRSFDKRLAVVFTDGAHWDIQRRFSVRTLKALGMGRTGLV
SSLEREAEEMIHHLRKLSRTQKVISRNNAFDVSVLNSIWTLIAER
>CYP18A1 AAGE01025833 AAGE01338874.1 AAGE01065191.1
529463664 572557122
66% to 18A1 (note: CYP18 not seen in Anopheles) complete
revised at cyan aa based on AAGE02007615.1
MFLDTYLLGVVRQEFFDASKARST
2678
LLVFCCTLSCVVFLQWLFRLVCQIKKLPPGPWGVPIFGYLTFIGHEKHTQYMKLARKYG 2502
2501
SLFSAKLGAQLTVVISDYKIIREAFKTEDFTGRPHSPLLKTLGGF (1?)
GIINSEGQLWKDQRRFLH
719
718 EKLRHFGMTVLGNKKHLMESRIM (0)
534
TEVAELLASLNEVGSQSTDLSKYLSVSVSNVICNIIMSVRFSLEDPKFKRFNWLIEEGMRLF 359
358
GEIHTIDYIPQIQYLPGNINAKNKIAKNRQEMFDFYREVIDEHKRSFNAENIRDIVDAYL 179
178 DEIQKAQAEGRDQELFDGKDH 116
EIQMMQVIADLFSAGMETIKTTLLWLNVFMLRHPDAMKRVQDELDQVVGRNRLPKIEDVP 406
YLPITETTILEVMRISSIVPLATTHSPKS (2) 319
2116
DVVINGYTIPAGSYVVPLINSVHMDPTLWDKPEEFNPSRFLDAEGKVHKPDFFIPFGVGR 1937
1936
RRCLGDVLARMELFLFFASIMHTFTIELPEDEPMPSLKGIIGVTISPQAFRVKLIPRPLN 1757
1756
ADLDRLRNVGSC* 1718
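The numbers flanking many peptide lines are nucleotide coordinates on the contig (descending when the gene is on the minus strand). A small sketch, with a hypothetical function name, of a sanity check that a coordinate pair spans exactly three bases per residue; a stop symbol * counts as one codon. Lines that end at an intron with phase 1 or 2 carry a split codon and are not expected to pass.

```python
def span_matches_peptide(start, end, peptide):
    """True if the inclusive nucleotide span equals 3 x the peptide length.

    Works for either strand because only the absolute span is used.
    """
    nucleotides = abs(end - start) + 1
    return nucleotides == 3 * len(peptide)
```

For example, the CYP18A1 line running from 2678 to 2502 covers 177 bases and carries 59 residues, so it passes; a failure usually means a frameshift, a mis-copied coordinate, or a wrapped line.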
>AAGE01098313 (upper seq) CYP18 like fragment probable pseudogene
Query: 1703 ASLNEVGSRSS 1671
ASLNEVGS+S+
Sbjct: 210 ASLNEVGSQST 220
Query: 313
VGSQSTELSKYLSVLVSNVICNIIMSVRFSLEDPKF--------------GGMHTIDYIP 176
VGSQST+LSKYLSV VSNVICNIIMSVRFSLEDPKF
G +HTIDYIP
Sbjct: 215
VGSQSTDLSKYLSVSVSNVICNIIMSVRFSLEDPKFKRFNWLIEEGMRLFGEIHTIDYIP 274
Query: 175
QIQYLPGNV-------KNRQEMFDIYREVINEHKRSFNAENIRDIV 59
QIQYLPGN+ KNRQEMFD YREVI+EHKRSFNAENIRDIV
Sbjct: 275
QIQYLPGNINAKNKIAKNRQEMFDFYREVIDEHKRSFNAENIRDIV 320
Query: 56 AYLDEILKAQAE 21
AYLDEI KAQAE
Sbjct: 322 AYLDEIQKAQAE 333
>AAGE01227048 (upper seq) CYP18 like fragment probable pseudogene
Query: 643
ASLNEVGSQPIDLNKYLSVSVSNVICNIIMSVRFSLEDPKFA-------------G-LHT 780
ASLNEVGSQ DL+KYLSVSVSNVICNIIMSVRFSLEDPKF
G +HT
Sbjct: 210
ASLNEVGSQSTDLSKYLSVSVSNVICNIIMSVRFSLEDPKFKRFNWLIEEGMRLFGEIHT 269
Query: 763
FAGLHTIDYIPQIQYLPGNV-------KNRQEMFDFYREMIDEHKQSFNAENIRDIV 912
F +HTIDYIPQIQYLPGN+
KNRQEMFDFYRE+IDEHK+SFNAENIRDIV
Sbjct: 264
FGEIHTIDYIPQIQYLPGNINAKNKIAKNRQEMFDFYREVIDEHKRSFNAENIRDIV 320
Query: 915 AYLDEILKAQAEDRDQELFEGKDHEI
992
AYLDEI KAQAE RDQELF+GKDH++
Sbjct: 322 AYLDEIQKAQAEGRDQELFDGKDHDV
347
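The "NN% to ..." figures throughout this file are percent identities from pairwise comparisons. A minimal sketch of that calculation for one aligned Query/Sbjct pair like the blocks above, assuming both rows have already been joined into equal-length strings with "-" for gaps. The function name is hypothetical, and the denominator is the full alignment length, gaps included.

```python
def percent_identity(query, subject):
    """Percent of alignment columns where both rows carry the same residue."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    if not query:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for q, s in zip(query, subject) if q == s and q != "-"
    )
    return 100.0 * matches / len(query)
```

Applied to the first short block for AAGE01098313 (ASLNEVGSRSS against ASLNEVGSQST), 9 of 11 columns match.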
>CYP303A1 AAGE01109944 641807020 834983680 618119317
834966118
826136105 587934965
72% to 303A1 complete
MYWYYLACFIVVFIIFLYLDCIKPANFPPGPKWYPIIGSAIEIARARQKTGMLCKAIKLIASKYDHKGVIGF
KVGKDKTVMAISGDSLREMMSNEDLDGRPTGIFYETRTWGLRRGVLLTDEEFWQEQRRFI
VRHLKEFGFARKGMAEIIGNEAEYVKNDFHALVKAGNGKALVQMQSAFSVYILNTLWLMM
AGIRYTRENKDLKYLQSLLHELFANIDMMGALFSHFPFIRFFAPRLSGYKQFVEIHNLMH
KFIGAEVENHKKSFNDTDEPRDLMDVYLKILQSNRDIPESFSQEQLLAVCLDMFIAGSET
TTKTLGFAFLHLVRQRETQLKVQKELDEVVGRNRLPTLEDRVN
(2)
LPYCEAVVLEALRMFMANTFGIPHRALRDTKLCGYDIPK
(0)
DTMLVGMFRGMMLNDWESPTSFKPERFLKGGKIVIPPNFHPFGVGRHRCMGEMMGKAN
110
LFLFITTLFQSFDFLVPEGYPIPSDEPIDGATPSVRQYTALIVPR*
>581536484
803281860 586608108 826028980
574131458
595148561 754352758 590136340 519840563 753671460
67% to 512982119 above 58% to 304B anoph
tried walking the chromosome down to exon 2 so some numbers above are in the intron
MLLNPSAILWTVAAGLLIYRCFRFMFDRPPNFPSGPPRFPLLGSYLVLLMVNYRHLHR
AAVRLGQFYRTKILGIYLGDFPSIVVNDLAIAKEVLARSEFDGRSDLFLARMRERNFQRR
GIFFTDGPHWKEQRRFVLRHLRDYGFGRRFDELEAETRSELMTLLDVLRYGPKFEHERLF
AKDGCVKCPDAFYGLLGNVYFQVICGERFQRKDMAQLYE
(1)
>223483644
519671636 528946489 494183870
a second 304B like C-term sequence 57% to 304B
TGRHAINFQQKGDDYGTILSYLPWLKDYFPEATNYRILREVNNRMNDLIEAMVQKY
LASYDENHMRCFLDRYIYEMKQSKPLEGDAFTFQ
(1)
YDQLVMILWDMLLPTLSGSAIQLSMLLERLLLNPRVATKVQQELDGVVGHGRLPTLDDRV
889
NLPYAEATLLEALRIDTLVPSGISHVALEDTKLCGYDIPKGCFVMLSLDVINNQREFWGDP
ENFRPERFLDESGKLSLKKDISVPFGGGKRLCVGETFSRNTLFLMFTALMQNFNIKPRPG
DPLPDLGQRITGVVTSMEPFWLRFEAR*
>CYP304B2xx Possible full length gene joining the 512982119 and 223483644 fragments complete
note this is a hybrid of two different genes, see corrected seq below
AAGE01051934
MFVTPTIFLWLVTIGLIAYRCHRFLFDRPKNFPDGPPKFPLLGGYAVMLLINFYHLHR
AANKLCEYYRTKILGIYLGNFPTVIVNDFATVKEVLNRVEFDGRPDLFIARMREKNFLLR
GIFFTQGPDWKEQRRFILRYLRDYGFGRRFDELEAETNAEILTLVEMLRYGPRHEHETEF
MTKDGCAMVPNVFFACFANAFLYVLTGERINRDEAGALFE
TGRHAINFQQKGDDYGTILSYLPWLKDYFPEATNYRILREVNNRMNDLIEAMVQKY
LASYDENHMRCFLDRYIYEMKQSKPLEGDAFTFQ
(1)
YDQLVMILWDMLLPTLSGSAIQLSMLLERLLLNPRVATKVQQELDGVVGHGRLPTLDDRV
889
NLPYAEATLLEALRIDTLVPSGISHVALEDTKLCGYDIPKGCFVMLSLDVINNQREFWGDP
ENFRPERFLDESGKLSLKKDISVPFGGGKRLCVGETFSRNTLFLMFTALMQNFNIKPRPG
DPLPDLGQRITGVVTSMEPFWLRFEAR*
>CYP304B3yy/xx top part = my old Byy, bottom = my old Bxx + 1 aa diff
DW987682.1 EST supports this assembly, so mine are hybrids
AAGE02028825.1 revised seq on 4/20/06
46553 MLLNPSAILWTVAAGLLIYRCFRFMFDRPPNFPSGPPRFPLLGSYLVLLMVNYRHLHRAAVRLGQFYRTKILGIYLGDFP
SIVVNDLAIAKEVLARSEFDGRSDLFLARMRERNFQRRGIFFTDGPHWKEQRRFVLRHLRDYGFGRRFDELEAETRSELM
TLLDVLRYGPKFEHERLFAKDGCVKCPDAFYGLLGNVYFQVICGERFQRKDMAQLYE
(2) 47203
61403 TGRHAINFQQKGDDYGTILSYLP
WLKDYFPEATNYRILREVNNRLNDLIEAMVQKYLASYDENHMRCFLDRYIYEMKQSKPLEGDAFTFQ
(1) 61672
61872 YDQLVMILWDMLL
PTLSGSAIQLSMLLERLLLNPRVATKVQQELDGVVGHGRLPTLDDRVNLPYAEATLREALRIDTLVPSGISHVALEDTKL
CGYDIPKGCFVMLSLDVINNQREFWGDPENFRPERFLDESGKLSLKKDISVPFGGGKRLCVGETFSRNTLFLMFTALMQN
FNIKPRPGDPLPDLGQRITGVVTSMEPFWLRFEAR* 62501
>512982119
637789748 834948129 570603901 750442192 570627554 568540398
743856885
581525309 637183809 812171267 586112683 570800380
579961153
793213948 581533371 587665129 570695804 574007683
60% to 304B1 Anopheles. Numbers above include a long chromosome walk of
about 5-7 kb, about 500 bp per step.
No C-term was found.
N-term exon is 55% to 304B anoph. and 48% to 304C anoph.
MFVTPTIFLWLVTIGLIAYRCHRFLFDRPKNFPDGPPKFPLLGGYAVMLLINFYHLHR
AANKLCEYYRTKILGIYLGNFPTVIVNDFATVKEVLNRVEFDGRPDLFIARMREKNFLLR
GIFFTQGPDWKEQRRFILRYLRDYGFGRRFDELEAETNAEILTLVEMLRYGPRHEHETEF
MTKDGCAMVPNVFFACFANAFLYVLTGERINRDEAGALFE
>CYP304B
494544931 512720460 41% to 476322188 72% to 304B1
827562306
594336057 512633341
(2)
TGKYAMMFQRTGDDYGTIYSLLPWMRHLFPNRTRYRTIREGSLGVNRFIESII
QKRLETHEEGHVRCFLDLYFTEMKKTVPRTEDNRFTFQ
(1)
HDQLVLGIVDFFFPAISGATTQ
IALLLERLLWHPEVVQKMQAEIDDVVGHGRLPTLDDRINLPYTEATLREAMRI
DTLVPSGVAHMAMKDTTLRGYDIPKDTILVLGLDSIHMQKDIWGDPERFRPERFLNYRGE
LSLSKDVSVPFGAGKRLCAGETFARNTMFLIVSALVQNFNIRQRLGDKLPDMGKRSTGII
ISPADYWVKFEPR*
>CYP304Byy AAGE01029809 Possible full length gene joining the 581536484 and 494544931 fragments complete
note this is a hybrid of two different genes, see corrected seq below
MLLNPSAILWTVAAGLLIYRCFRFMFDRPPNFPSGPPRFPLLGSYLVLLMVNYRHLHR
AAVRLGQFYRTKILGIYLGDFPSIVVNDLAIAKEVLARSEFDGRSDLFLARMRERNFQRR
GIFFTDGPHWKEQRRFVLRHLRDYGFGRRFDELEAETRSELMTLLDVLRYGPKFEHERLF
AKDGCVKCPDAFYGLLGNVYFQVICGERFQRKDMAQLYE
(1)
(2)
TGKYAMMFQRTGDDYGTIYSLLPWMRHLFPNRTRYRTIREGSLGVNRFIESII
QKRLETHEEGHVRCFLDLYFTEMKKTVPRTEDNRFTFQ
(1)
HDQLVLGIVDFFFPAISGATTQ
IALLLERLLWHPEVVQKMQAEIDDVVGHGRLPTLDDRINLPYTEATLREAMRI
DTLVPSGVAHMAMKDTTLRGYDIPKDTILVLGLDSIHMQKDIWGDPERFRPERFLNYRGE
LSLSKDVSVPFGAGKRLCAGETFARNTMFLIVSALVQNFNIRQRLGDKLPDMGKRSTGII
ISPADYWVKFEPR*
>CYP304Bxx/yy top part = my old Bxx, bottom = my old Byy
AAGE02028825.1 revised accurate seq 4/20/06
22307 MFVTPTIFLWLVTIGLIAYRCHRFLFDRPKNFPDGPPKFPLLGGYAVMLLINFYHLHRAANKLCEYYRTKILGIYLGNFP
TVIVNDFATVKEVLNRVEFDGRPDLFIARMREKNFLLRGIFFTQGPDWKEQRRFILRYLRDYGFGRRFDELEAETNAEIL
TLVEMLRYGPRHEHETEFMTKDGCAMVPNVFFACFANAFLYVLTGERINRDEAGALFE (2) 22960
35137 TGKYAMMFQRTGDDYGTIYSLL
PWMRHLFPNRTRYRTIREGSLGVNRFIESIIQKRLETHEEGHVRCFLDLYFTEMKKTVPRTEDNRFTFQH
35412
35469 DQLVLGIVDF
FFPAISGATTQIALLLERLLWHPEVVQKMQAEIDDVVGHGRLPTLDDRINLPYTEATLREAMRIDTLVPSGVAHMAMKDT
TLRGYDIPKDTILVLGLDSIHMQKDIWGDPERFRPERFLNYRGELSLSKDVSVPFGAGKRLCAGETFARNTMFLIVSALV
QNFNIRQRLGDKLPDMGKRSTGIIISPADYWVKFEPR* 36092
>CYP304C1 AAGE01104491 512990636 572473586 613989430
64% to CYP304C1
749978894
754492027 584954719 complete
MVLISELIIAALLGLLIYRFYRYLFERPSENFPPGPPRL
PLLGGYPFMLALNYKHLHKAAARLSQLYKSKLIGLYLGPLPAVIVNDYDTVKEVLTRPEF
DGRPDLFMARLRDQHFQRR
(1)
GIFFTDSESWREQRRFFLRTLHHFGFGRRSPEAEADIQAGLEDVISLLRDGPKYEHEKAL
VDSAGFALCPTVFFAVFSNVLLRMIVGVRLAREDQAVMFE
VGKNAIAFHRNGDDYGMLLSYIPWIRHLFPKTTKYDLLRKVNQQANAVILSLAQKCES
SYDENDIRCLVDAYIQEMRATGSKGESTGKDEFGFQ
(1)
YDQLVIGAADFLVPPFSAIPAKICLILERLIQYPEVQTKMYRELNEVVGLNRLPTLDDRA
DLPYCDAVIREGLRIDALVPSGIPHMAVTDTQLNGYQIPKGTVIVNSLEFIHHQPEIFRD
PDSFMPERFLTPDGKLALDQDKTLPFGAGKRVCGGEQFARNALFLGVTSLVQNFTFQ
LPAGRACPDLDGRITGVIQTTPDFRLKFVSRR*
>CYP305A6 AAGE01041187 494160882 476322188 754462117 mate pair = 754369970
which is an exact match to part of AAGE01202372
65% to 305A2 825745101 613940462
AAGE01202372.1
N-term exon for CYP305A complete
1435 MITLVLSSVVIVSFIFWLWQDLQRPPNFPP (1) 1346
GPKWLPFFGNTLLIRNLARISGGQHLAFEALSKQYKSPVIGLKLGREHVVVALQ
YPAVHEALTKEAFDGRPDNFFIRLRTMGTR
(2)
LGITFTDGPFWTEHNSFVVR
HLRQAGYGRQPMQLQIQNELNELIGIIRDLDSEPVWPGSILPTSVINVLWTFTTGSRIPR
DDQRLTRLLKLLQDRSKAFDMSGG
ILSQLPWLRHIAPEWTGYNLINRFNQEIHEFFKATIEKHHQDYTEEKCSDDLIYAFIK
EMKERKDDPCSTFTDVQLSMIILDIFIAGSQTTSTTIDIALMILAMNTEIQRKIYAEIDD
NFHPDEIPDQNCRTNLQYTEAFLLEVMRLYQIAPIGGPRRALSDCTLGGYRIPRNTTILM
GLHTVQMDPDHWGDPENFRPERFIGPDGKIINTERLIPFGLGRRRCLGDSLARSCMFTFL
VGILQKFSLRLPDSLEGPSLKLTPGITLSPKPYKVVFEPRLK*
AAGE02003241.1
24317 MITLVLSSVVIVSFIFWLWQDLQRPPNFPP (1) 24228
13700
GPKWLPFFGNTLLIRNLARISGGQHLAFEALSKQYKSPVIGLKLGREHVVVALQYPA 13530
13529 VHEALTKEAFDGRPDNFFIRLRTMGTR (2) 13449
13391 LGITFTDGPFWTEH 13350
13349
NSFVVRHLRQAGYGRQPMQLQIQNELNELIGIIRDLDSEPVWPGSILPTSVINVLWTFTT 13170
13169
GSRIPRDDQRLTRLLKLLQDRSKAFDMSGGILSQLPWLRHIAPEWTGYNLINRFNQEIHE 12990
12989
FFKATIEKHHQDYTEEKCSDDLIYAFIKEMKERKDDPCSTFTDVQLSMIILDIFIAGSQT 12810
12809 TSTTIDIALMILAMNTEIQRKIYAEIDDNFHPDEIPDQNCRTNLQYTEAFLLEVMRLYQI 12630
12629
APIGGPRRALSDCTLGGYRIPRNTTILMGLHTVQMDPDHWGDPENFRPERFIGPDGKIIN 12450
12449
TERLIPFGLGRRRCLGDSLARSCMFTFLVGILQKFSLRLPDSLEGPSLKLTPGITLSPKP 12270
12269 YKVVFEPRLK* 12237
>73% to CYP305 above 519967093 521924636 570423900 pseudogene of AAGE01051792
contains a deletion and stop codon
FLPGPQWLPFIGNTPFVRKLARASGGQHLAFEALSKQYNSPVIGLKLGREYVVVALQYPAVREVHSKEEFDGRPDNF
LLKMRLERFVISRLGVTCTDGPFWAEHRNFVVRHLRQAGYGRQ
GIIRDMDGEPVWPGSILPTSVINVLWTFTTGSRIPRDDQRLARLLKLLQDRSKAFDMS
GGVLSQLPWLRHIAPEWTGYNLLKRFNQELHEFFMIIVERHHQEYHEEKCSDDLIYA
FIKEMKDRKDDPSSTFTDLQLTMIILDIFIAGSQTTSITIDLAFMMLTMHTDIQRDTCRN
R*DLHHDEMPSKRSYSLPYTE
AAGE02003240.1
this matches 305A5
Sbjct 53574 FLPGPQWLPFIGNTPFVRKLARASGGQHLAFEALSKQYNSPVIGLKLGREYVVVALQYPA 53395
Query 61
VREVHSKEEFDGRPDNF-----------------LLKMRLERFVISRLGVTCTDGPFWAE 103
VREVHSKEEFDGRPDNF
LLKMRLERFVISRLGVTCTDGPFWAE
Sbjct 53394 VREVHSKEEFDGRPDNFFLRLRTMGTR*DFKL*CLLKMRLERFVISRLGVTCTDGPFWAE 53215
Query 104 HRNFVVRHLRQAGYGRQ--------------GIIRDMDGEPVWPGSILPTSVINVLWTFT 149
HRNFVVRHLRQAGYGRQ
GIIRDMDGEPVWPGSILPTSVINVLWTFT
Sbjct 53214
HRNFVVRHLRQAGYGRQPMQLQIQNELNELIGIIRDMDGEPVWPGSILPTSVINVLWTFT 53035
Query 150 TGSRIPRDDQRLARLLKLLQDRSKAFDMSGGVLSQLPWLRHIAPEWTGYNLLKRFNQELH 209
TGSRIPRDDQRLARLLKLLQDRSKAFDMSGGVLSQLPWLRHIAPEWTGYNLLKRFNQELH
Sbjct 53034
TGSRIPRDDQRLARLLKLLQDRSKAFDMSGGVLSQLPWLRHIAPEWTGYNLLKRFNQELH 52855
Query 210 EFFMIIVERHHQEYHEEKCSDDLIYAFIKEMKDRKDDPSSTFTDLQLTMIILDIFIAGSQ 269
EFFMIIVERHHQEYHEEKCSDDLIYAFIKEMKDRKDDPSSTFTDLQLTMIILDIFIAGSQ
Sbjct 52854
EFFMIIVERHHQEYHEEKCSDDLIYAFIKEMKDRKDDPSSTFTDLQLTMIILDIFIAGSQ 52675
Query 270
TTSITIDLAFMMLTMHTDIQRDT-CRNRXDLHHDEMPSKRS-YSLPYTE 316
TTSITIDLAFMMLTMHTDIQ+ +LH DEMP + SLPYTE
Sbjct 52674 TTSITIDLAFMMLTMHTDIQKKIHAEIDENLHQDEMPQQNDRTSLPYTE 52528
>CYP305A5 AAGE01051792 70% to CYP305A2 but no stop codon. N-term exon is one of two choices.
82% to other CYP305 Aedes seq
520611721 836008963 529076567 570690021
AAGE01309663.1
CYP305A N-term exon matches by default since the other CYP305 has an exon 1 sequence complete
MIVLVLTSVLIIAFSYWLLQELRRPPNYPP (1)
GPQWLPFIGNTPFVRKLARASGGQHLAFEALSKQYNSPVIGLKLGREYVVVALQYPA 696
697 VREVHSKEEFDGRPDNFFLRLRTMGTR (2?) 777
838
LGVTCTDGPFWAEHRNFVVRHLRQAGYGRQPMQLQIQNELNELIGIIRDM 987
988
DGEPVWPGSILPTSVINVLWTFTTGSRIPRDDQRLARLLKLLQDRSKAFDMSGGVLSQLP 1167
1168
WLRHIAPEWTGYNLLKRFNQELHEFFMIIVERHHQEYHEEKCSDDLIYAFIKEMKDRKDD 1347
1348 PSSTFTDLQLTMIILDIFIAGSQTTSITIDLAFMMLTMHTDIQKKIHAEIDENLHQDEMP
1527
1528
QQNDRTSLPYTEAFLLEVQRFFHIVPVSGPRRALSDCTLGGYQIPKNTTILMGLRTVHMD 1707
1708
PEHWGDPECFRPERFLSPDGKIITTERLIPFGLGRRRCLGESLARACMFTFLVGILQKFS 1887
1888
LRQPANCSEKPSPKLLPGITLSPKPYKVIFEPR* 1986
>CYP306A1 570772008 512981304 597667916 641824294
753304856 593374976 574131373
587966306
514783872 514783871 618134500 835036042 803206894 578828539
AAGE01228356
AAGE01635404 AAGE01635520 complete
MYLILGIVLILTYVLWTLLDRRGKPPGPFGLPILGYLPFIDSIKPYETLTNLAKRYG
PVYSLRMGQVDAVVLTAPDLIRDTLKREETTGRAPLFITHGIMGGH
(1)
GIICAEGNLWRDQRRLSTEWLRKMGMTKFGPTRATLEARILIGVNELLE
(0)
DLRRESEKVFAFDPAPLLHHILGNLMNDIVFGLQYERDDATWRYLQHLQEEGVKHIGVSMAVNFLPFLR
(2)
HLPSSKRIIEFLLNGKAKTHKIYDSIIEKQRSRMEGGGSEVSDP
GRHDDCILSNFLQETRRRETGARPELAFCSDVQLRHLLADLFGAGVDTTFTTLRWLILFL
ALNKDAQERLRQEMASQLRGEPCLNDVDSLPYLKACVAEAQRLRTVVPLGIPHGAVS
(0)
EITIAGYKVSKNTMIIPLLWSVHMDPSLWPNPDRFDPDRFLDESGQYSAPAHFMPFQT
GKRMCLGDELARMILLLYTGRLFWHFELDVFNGEGLDLTGVCGITLTPPPFEIIFKERV*
>CYP307A1 571521703 817504746 824335840 591439033 834970143
TC53059
TC28026 TC50479 78% to CYP307A1 complete
813467047
(exon 1) found by searching with the DNA seq above, 67% to anoph 307A1
246
MAYTLILVALMSLLSVVCYLKVLYEWHRKVRVQTVKSSRYAKKLQKLEESQPQEVEEAP 422
423
VEFPQAPGPYPWPVLGSAAIIGQYPAPFMGFSALAKKYGDVYSIRIGQGQCLVVSSLELI 602
603
REVLNQNGRYFGGRPDFLRYHQLFGGDRNN (1)
SLALCDWSSLQQKRRNLARKHCSPSDASSYYQKMSDVGV
AEMHYFMDQLTDVVTPGQDFKVKPLIMQACANMFSKYMCSVRFEYDDAGFQKMVHSFDEI
FYEINQGYAVDFMPWLAPFYFRHMSKLSSWSNYIRGFILERIVNEREQNLGEDEPERDFT
DALLKSLREDPSVSRDTIMYMLEDFIGGHSAIGNLVMLALGYVAKNPEIGARIQQEIDHV
TDKGLRNVTLYDTESMPYTVATIFEVLRYSSSPIVPHVATENTCIG
GYGVQTGTVVFINNYDLNTSEKYWDHPERFDPSR
(2?)
SNESQKQILRVKKNIPHFLPFSIGKRTCIGQNLVRGFSFIMLANILQKYDVHT
NDPAQIKMKPACVAVPPDTYPLAFTQRSQ*
>CYP307B1 AAGE01081732 476411966 68% to 307B1
519649910 578920479 complete
revised according to AAGE02011086.1 and AAGE02028078.1 4/20/06
1027
MEKFTIFLFSSNTIYLLVACFLVTLIMLLLEVRQKISVKSDLVKLVKSFLFGQWLSVFTQNNKNRNL 848
847
NDTEVKVLRRAPGPKSYPIIGNLKDLDGYEVPYQAFSVLAKKYGPVVNLKLGVVDAVVIN 668
667 GIEHIKEVLINKAQYFDSRPNFRRYQLLFSGNKEN 533
(1)
SLAFCDWSEVQKARRDMLVPHTFPRNFSGRFNELNGVINDEIRLVIGESNVNRVIEIK
14
PIIMNICANVFSQYFASHRFELEDPKFQKLVKNFDQIFYEVNQGYAADFLPFLLPLHHR 193
194
NLKRMDQLAEEIREIMLETIINDRYDNWVEGNTENDYVDSLINHVKSKIGPDMEWETALF 373
374
ALEDIIGGHSAVANFLVKTFGYIIQHPEVQQNIQSEVDRVLETEGKHTVDLSDRNHMPYT 553
554
EAVIMEALRLIASPIVPHVANQDSQIG 637 (1?)
685
GYDVPKDTLIFLNNYDLSMSENLWENPNDFVPERFLQNGRLVKPDFFIPFGAGRRS 864
865
CMGYKMTQLISFSIIANLLRSYTITPLSGHSYFVPVGSLAMPEKSYEFQINLRH*
1029
CYP3 clan
CYP6 related sequences
Note: CYP6 and CYP9 sequences (in Anopheles) have only one intron and will be the easiest
to assemble. 6AG, 6AH and 6AJ are exceptions.
CYP3 clan sequences
CYP6 related, 14 complete, 20 partials
>AAGE01198540
494152727 63% to CYP6Z2 67% to AY433537 519918984 574095157 569650597 complete
MFIYTFALFWLALVLVLRYIYSYWDRNGLASIKPQIPYGNLKSVAQK
TQSFGVATCELYWKSQERLAGIYLFFRPAVLIRDAHLAQRIMTTDFSYFHDRGVYCNEEI
DPFSANLFAL
PGKRWRNLRHRFTPLFTSGQLRCMMPTILDVGHKLQKFLEPAAERQEVVDIREIVSRGVL
ELIASLFFGFEADCINDPDDAFSKTLREFQLGGFMNNFRTACTFVCPELLQVTRISSLSP
QMIKFATDVVTKQIEHREKNNVSRKDFIQLLIDLRREEANNNEVALSFEQCAANVFLFYV
AGSDTSTSAITFTLHELTQNPEVMDKLQSEIDEMLVQTNGELTYTAIKELPYLDLCVKET
LRKYPGLAILNRKCTKSYAVPESSVVIQEGTQIMIPLLAYGMDEKYFPEPERYYPERFNKQSKNYDEKA
YYPFGEGPRNCI
(1)
AYRMGVMVSKIGLILLLSKFKFEATQGPKIVFSAATVPLVPKGGIPVKISNR*
>AAGE01065173 78% to AAGE01198540
N-term is on AAGE02015843.1 (revised 4/20/06)
46903 MFIYTFALFWLAVAFAIRYIYSYWDRNGLPSIKPH
3333
IPYGNLKAVANRTESFGVATCDLYWKSKDRLVGIYLFFRPAVLIRDAHLAQQIMTTDFSH 3154
3153
FHDRGVFCNEEVDPFSANLFALAGKRWRNLRNKFTPLFTAGQLRCMMPIILSVGHKLQNV 2974
2973
LEPAAKKQEVLEIRELVSRCVLDIIASVFFGFEANCINDPNDAFIQNLRELQYDGFFNNL 2794
2793
RAAASFICPELLKLTRISSLSPEMIRFVTDIVTKQIEHREKNKVTRKDFIQLLIDLRRED 2614
2613
TNNNEAALGFEECAANVFLFYVAGSDTSTSAVAFTLHELTQNAETMGKLQTEIDEMLVKT 2434
2433
SGELTYDGIKEMSYLDLCVKETLRKYPGLAILNRECTKSYAVPNSDILLKKGTQVVIPLL 2254
2253
AYGMDEKYFPEPDRYLPERFDKSTKNYDEKAFYPFGEGPRNCI (1) 2116
2065
AFRMGVMVSKICLVLLLSRFNFEATRGPKIDFTPSTVALLPKGGIPVKISIR* 1907
>AAGE01047841
AY433537 62% to 6Z2 569650597 622013821 579345058 complete
MLFIYSVALLCIAVTLALKYVYSYWDRHGLPSVKPHIPFGNLKTVVKKTESFGIAIN
QLYWQTKGQLAGIYLFFRPAILVRD
AHLAQQIMTTDFNHFHDRGIYCNEEGDPFSANLFALPGKRWRNLRNKLTPLFTGGQLRGM
MPTILEVGEKLQKHLEPVAERQEVVEIRDIVSRFVLEIIATVFFGFEANCIEDRDDSFSK
VLREAQGERLSAVLRAAAMFVCPGLLRYTGISSLEPQVIAFVSEIVTKQIEHREKNSVTR
KDFIQQLIEIRRGSGENQVPAMSIEQCAANVFLFYAAGSETSTGTIAFSMHELSHHADVM
KKLQDEIDDALAKSNGAITYESVMQMQYLDLCVKETLRKYPGLPFLNRECTMDYKVPDSD
LVIRKGTQLVLPIYGFSMDEQYFPEPECYIPERFEEASKNYDEKAYYPFG
DGPRNCI
(1)
AYRMGVLITKIGLILLLSKFTFEATQGPKMMFSSASVPLLPKDGISLKISN
RKR*
>AAGE01005406 80% to AY433537 62% to 6Z2 complete
possible pseudogene with frameshift at AVGDKLX X = ct
confirmed in four trace archive sequences
520668645, 757097876, 589569591, 811977620
5398
MLFVYTLTILSIAITLVLKFVYSYWDRYGVQNIKPHIPFGNLKTVVKKTESFGVAINQLY 5219
5218 WQTKGQLVGIYLFFRPAILIRDAHLAQQIMTTDFNHFHDRGVYCNEEGDPFSASLFSLPG
5039
5038
KRWRNLRNKLTPLFTGGQLRGMMPTILAVGDKLX 4940
4937
KHLEPVAENREPIEIRDIVSRFVLEIIATVFFGFEANCIKDRNDAFCRVLREAQRESMYT 4758
4757
NFRAAAVFVCPGLLKYTGISSLEPEVKEFVSGIVTEQIEHREKNGATRKDFIQQLIELRR 4578
4577 EDSQNQNVRMSIEQCAANVFLFYIAGSETSTGTITFTMHELSQHPEVMKKLQAEIDDTLA
4398
4397
KSNGEITYENVNQIQYLDLCVKETLRKYPGLPILNRECTSDYKVPDLDLVIRKGTQVVIP 4218
4217
LYGISMDEQYFPEPECYKPERFDGASKNYDEKAYYPFGEGPRNCI (1) 4083
4017
AFRMGVLVSKIGLVLLSSKFNFKPTQGPKIVFSPAAVPLVPKGGISLMISRRDK 3856
VADLYMGLHISVVLKVVCS*
>AAGE01054542
476413066 56% to 20199522 76% to 6Y1 579367130
614744104
834925676 complete
MWLVYLVWLVAAVLLAVYLWIKKRFNFWKDRGVEYIEPEFPFGNFKTLGKVEHIAPITQR
HYDYFKQKGVPYGGVFMLTSPLLYILDTKLIKTLLVKDFNHFPNRGVYFNEKDDPLSAHMFAI
EGNKWKTLRNKLSPTFTSGRIKMTFPLVVGVCQQFCDHLGEVVQQSNEVEMHDLLSRYTI
DVIGTCAFGIDCNSFREPDNEFRKYGKIAFDKLPHSPLVVYLMKAFRSYANAFGMKQLHE
DVSSFFSKVVKDTIEYRESNNVVRNDFMDLLLKLKNTGRLEESGEEIGKISFEEIAAQAF
IFFTAGYDTSSTAMTYTLYELALNQKAQEKARKCVLDIFAANNGTLTYESVGNMGYLDQC
IN (1)
936
ETLRKHPPVAILERNADRDYKLPDSDIVIKKGRKIMIPTFAMHHDAEHFPDPE 760
759
RYDPDRFSPEQVACRDPYCYLPFGEGPRICIGMRFGTIQARVGLASLLKRFRFRVCDKTQ 580
579
IPVRYSKTNFILGPANGVWLRVEKL* 505
>AAGE01206812
586027460 593564617 637757183 494307621 complete 38% to 6M1
TC54189
TC23406 TC42024 38% CYP6P3 TC54190 TC23407 TC42025 TC574 TC6535
83% to
TC63333 94% to TC54191 581543219
MFLVILLITLSLYLYQKWIYTYWKRRGVPQLNPSFPFGDVADTFKQRKSYANRLAELHHQ
SASDSHRFVGIYTLFQPILLVTDVELVRRMLTVDFEHFTDRGAHVNEKRDPLSGHLFSLAG
AKWRRMRLKLTPAFTTAKLKAMFPTMMACGRTLSAVIDDHVGRALAIRDLMTRFTMDVIASVGFGLE
CNSMRNPDELFRQMGGRFFSKSWKTSVRMLLAFVAPKVNRYLQVK
LNDDDVEEYMLNLVR
DTIAKREGGGEVRKDFIQLL
()
VQLRNQVEVKDGGSWEMNKVDQNKTLTVEEMAAQSFVFLN
AGYETTSSTVTFCLFELCRNKDLIRKVQEEIDRVMDGGREISYEALAEMTYLESCIDETL
RKYPISPVLFRVCTKPYKIPETDVVIEKDTLVQISLVGLQRDTRYYEDPVKFDPDRYGER
KSETMPHYSFGDGPRVCI
(1)
GLRMGKVMAKMALVELLFRYDFELESPAADSGEIELDPSLLMLQAKHDVKLIPRFRAK*
>617983543
some differences with TC54191 TC42026 44% to CYP6Z3
94% to
586027460
584131270
520119914 760257438 832454533 625082069
complete
39% to 6N1 anopheles complete
MLLPILLVVLVVYLFQKWTYSHWKRRGVPQLNPAFP
FGNVADTFKQRTSYSNRLAELHHQAVRDGHRFVGIYTLL
QPILLVTDVELVKRMLTVDFEHFVDRGAHVNEKRDPLSGHLFSLTGAKWRRMRLKLTPAF
TTAKLKAMFPTMMACGRTLSAVIDDHVGRALAIRDLMTRFTMDVI
ASVGFGLECNSMRNPDELFRQMGGRFFSKSWKTSVRMLLAFVAPKVNRYLQVKLND
DDVEEYMLNLVRDTIAKREGGGEVRKDFIQLLVQLRNQVEVKDGGSWEMNKVDQNKTLTV
EEMAAQSFVFLNAGYETTSSTVTFCLFELCRNKDLIGKVQEEIDRVMDGGREISYEALAE
MTYLESCIDETLRKYPISPVLFRVCTKPYKIPETDVVIEKDTLVQISLVGLQRDTRYYED
PMKFDPDRYGERKSETMPHYSFGDGPRVCI
GLRMGKVMAKMALVELLSRYDFELESPAADSGEIELDPSLLMLQAKHDVKLIPRLRTK*
>NABNU08TR NABNU08 32% to CYP6Z4 59% to 586027460
(no genomic match) looks like a pseudogene
best genomic match was to 617983543 at 77%
I do not think the TIGR database actually has an Aedes seq that is missing from
the 15 million trace files of Aedes, so this may be a contaminant from another
species
KTLTPFERAAQSSGSQKAFYETTSATGDGSRIERSRNKDLI
GKVQEEIDRVMDGGKGIS*
EALAETTYPESCTEETLRKHPSPPDQDRGGTKPNKTPETDDASEKDTPVQTPPGGTQRDK
REKEDPEKHEPERYGERKPETTPHHSRGDGPRDSTGHRKGKATAKKALAEQPTRNDYEQE
PPAADTGENEQEPSQPTPQAKHEVKQKPRQRAK
>AAGE01003592
512632636 TC63333 TC10785 TC15419 TC26904 TC37692 TC4101
41% to CYP6P4 83% to 586027460 836033925 753054225 574000494
(N-term looks identical to 586027460) complete
revised 4/21/06 used AAGE02004393.1, AAGE02030939.1
MFLVILLITLSLYLYQKWIYTYWKRRGVPQLNP
SFPFGDVADTFKQRKSYANRLAELHHQSASDSHRFVG
IYTLFQPILL
VTDVELVRRMLTVDFEHFTDRGAHVNEKRDPLSGHLFSLAGAKWRWMLQKLAPAFTSAKV
KSMFPTMMTCGRTLSAVVGDHLGRALPIRALMTRFTMDVIASVGFGLDCN
SMRNPDEPFHKMGSKFFSKSWKTSVRMLLAFVAPKVNRFLQL
KLNDDDVEEYMLNLVRDTIAKREHGGEVRNDFIQLLVQLRNQVEVEDGGSWEINKVEPNK
ALTVQEIAAQSFVFLNAGYETTSSTITFCLFELCRNRDLLGKLQEEIDEVVDGGREASYE
AITEMTYLEACVEETLRKYPISPVLFRVCTKPYRIPDTDFVIEKGTLVQISLVGLNRDPR
YYEAPLKFDPDRYGERKAETMVHYSFGDGPRGCIGLRMGKVMVKMALVELLSNYDFEMES
PTGENELDPSLLMLQPKHDVILIPKFM*
>CYP6AG3
AF288534 AAGE01003202 TC54102 TC12857 TC2905 TC29599 TC46955
TC9197 62% to CYP6AG2 48% to 6AG1 complete
Revised 4/21/06 used AAGE02011378.1 211773-218215 (+) strand
211773
MWWTVVGVLGGILSAIYLFLSWNFNCWKKDGIKGPKPRLLFGNL
PNVLTQKKHIFYEYEKIYN (2) 211961
216796
DFKTEPVVGYFSVRTPQLMIREPELIKEVLIKGFRYFSA
NEFSDVVDEKSDPLFARNPFSLSGEKWKTRRGEITPAFTNNR (0)
IKALSTLMDEVCDRMTDHVKKQKESALETKE (0) 217188
217247
LMSKYTTDVVSNCVFAIDAQSF
SKDKPEIREMGRRIMDFNFAA
QIILMVTTFLPSVKKFYKFTFVPREVEQFFIRIMKDAIRHRKENNIVRNDYLDHLLSL
QEKKQISEIDMAGHGVSFFADGFETSSLVMTYCLFDLASHPEIQTRLREEIRNVQATK
GGINYDNIGEMTYLDQVLNETLRIHPIIPVLAKRCTESTVLVGPKDQKIPVSAGTTVV
IPYFVQLDSQYYQEPNKYNPERFSPENGGTKPYRERGVYFPFGEGPRMCLGMRFAIAQ
VKRGIIEIIDKFEISVNSKTQVPLKYEPKMFMLYPVGGIWLNYKPIK* 218215
>CYP6AG4
possible assembled whole sequence 93% to 6AG3
NABUJ77TF = 6AG4
NABUJ77 = 6AG3 57% to CYP6AG2 90% to 6AG3
AY431873 96% to 6AG3 only 2 aa diffs to new 1.231_5
60% to
6AG2 only 45% to 6AG1 complete
DR747526.1
EST 95% to 6AG3 98% (only 3 aa diffs) to AY431873
This seq was not found in the WGS section; it may be a hybrid
Replace with AAGE02011379
23932
MWWTVVGVLGGILSAIYLFLSWNFDCWKKDGIKGPKPRLLFGNLPNVLKQKKHIFYEYEKIYN (2) 24120
30535
DFKTEPVVGYFSVRTPQLMIREPELIKEVLIKGFRYFSANEFSDAVDEKSDPLFARN 30705
30706
PFSLSGEKWKTRRGEITPAFTNNR (0)
IKALSTLMDEVCDRMTDHVKKQKEPAVDTKE (0) 30927
30987
LMSKYTTDVVSNCVFAIDAQSFSKDKPEIREMGRRIMDFNFRAQIILMITTFLPSVKKF 31163
31164
YKFTFLPREVEQFFIRIMKDAIRHRKENNIVRNDYLDHLLSLQEKKQISEIDMAGHGVSF
31343
31344
FADGFETSSTVMTNCLFDLASHPEIQTRLREEIRNVQATKGGINYDNIGEMTYLDQVLNE
31523
31524
TLRIHPIIPVLRKRCTESTVLVGPKDQKIPVSAGTTVVIPYFVQLDSQYYQEPNKYNPER
31703
31704
FSPENGGTKPYRERGVYFPFGEGPRMCLGMRFAIAQVKRGIIEIIDKFEISVNSKTQVPL 31883
31884
KYEPKMFMLYPVGGIWLDYKSIK*
31955
>AAGE01024260
51% CYP6AG1 complete
replace with AAGE02035807.1 1 aa diff to earlier version
3271
MLVTVGLLLTAFAALYLYLTWHFDYWRKRNVPGPEPLPLVGNFPAFFRRNRPVMEEKYQIYK (2) 3456
3518
DYCSKYNFVGIFTNRSPQIFITSPALARDILVKYFKNFHDNEIGLITNKELDPLF 3682
3683 GRNPFVLNGAAWKAKRAEITPAFTASR (0) 3763
3825
IKALYVSVENVCAQMTKYVKEHCESPIEMKELGDKFTTDVVSSCIFGADAQSFIHQDAE 4001
4002
IRDMGSKLMDSSLSFALKMAVMTVLPSVAKIANMSLVSKPREKFFIKLMAEAIRHREESS 4181
4182
EKYLDFLDYLSMLKKEKNITELDMAAHGVTFFLDGNETSSATLSLNLYELAKQPEIQKRL 4361
4362 REELMNATNDDGTISYETLSELPFLEQVFSEGLRLWPPVTFMSKVCTDPIELDLTSTRKV 4541
4542
PIERGTCAIISNWSLHRDPNFYEDPLKFNPDRFAPEKGGIFPYKEKGCYMPFGDGPRQCL 4721
4722
GMRFGRMQVKRGIYEVIRNFEISVASRTSDPLKIVSSPAISLGLSGIWLSFKPIRS* 4892
>AAGE01002325
58% to 6AG1 complete
5532
MFLTITLIVTAVAAIYLYLTWNYNYWKKLNVPGPSPLPGLGSFPSFITQRRPVADEM 5362
5361 DEIYR
(2)
5286
EYKPKYNFVGVFSNRSPRIMITSSELAKDILSKNFKNFHDNEFGEMTNKEIDPLF 5107
5106
GRNPFMLTGDEWKAKRAEITPAFTTSR 5026 (0)
4960
MKALFPLVEDVCSRMTKYVTQNRGSVLDSKELSAKFTTDVVSSCIFACDAQSFTSGKPEI 4781
4780
REQGRKLMEQSFSSFLILLFIINFPTLAKIFKIGLVPKSLEKFFTDLMKEAISHRDASGT 4601
4600
NRVDYLDYLISLRNKKEISELDMAAHGVTFFIDGFETSSVAISFMLYEIAKNPEVQKRLR 4421
4420
KELQKVTTDQGTVSYDSLLELSYLDQVVNESLRLWPPAAFISKKCTEPMDLPLTANQNVT 4241
4240
IGKEICAIINIWSLHRDPEYYDDPLTFNPDRFSPETGGTAPYREKGCFIPFGDGPRQCLG 4061
4060
MRFARMQVKRCLYELVSNFKITVNEKTKQPMKLDPKQFLTMPLGGIWLDFEPISK* 3893
>AAGE01005157
51% to AAGE01002325 48% to 6AG1 complete
3390
MWLIVISILVTIVSLVYHYLTWNFNYWKYRGVPGPLPKPFLGTFSSTFTQKEHPIEENNRIYR 3202
3149
LFREYRKDVPFIGGFSFRSPQLFALSPTLVKDILVKYHKHFRANEVGGTFDSKADPLLAR 2970
2969
NPFFLDGEEWRSKRAQITPAFTNSR (0) 2895
2815
LKALLPIMDNICNNMVSYIDRHIPNGPIESKELSAKYTTDVVSSCIFGAEGGSLTS 2648
2647
DRSEIREMGNALFQQTFMFIVLAVISSIAPILKRFVKLSLIPKSIENYFVGLMTEAVRK 2471
2470
RKASGTKQVDYLDHLINLQEQKEISILDMAAHGVTFFIDGFETTSEVLGFSLLELSIDKE 2291
2290
IQNRLRQEIHSAEDGQLTFETIMELPYLDQIVN (1) 2192
ETLRKWPPAYALSKRCTEEITFRLKDNHEVLIEKGITAILPIWAIHLDK 1990
1989
EFYPDPNRFNPDRFSEEDGGHSVRYYQEKGVFLPFGDGPRACIGRRIGLLQVKRALVEIV 1810
1809
KNYDFTVNSKTVLPIKIDPKNIAVTPLGGIWIDYRKL* 1696
>AAGE01024111
86% to AAGE01002325 complete
3449
MFITITLIVSAVTAIYLYLTWHFNYWKKLNVPGPSPLPGLGNFPSFITQKRPVAEEMDEIYR 3264 (2)
3191
EYKPKYNFAGVFSNRSPRIMITSAELAKDILVKNFKNFHDNEFGELTNKEIDPLL 3027
3026
GRNPFLLDGSEWKAKRAEVTPAFTTSR (0) 2949
2884
MKALFPLVEDVCSRMTKYLIKNRGSVIDAKELSAKFTTDVVSSCIFACDAQSFTSEKPEI 2705
2704
REQGRKLIEQTFSSFMLLLFIVNFPTLAKIFHVGFIPKSMEKFFTNLMKDAVRYRDASET 2525
2524
NRADYLDYLITLKKKKELSELDMAAHGVTFFIDGFETSSVAISFMLYEIAKNPTVQKRLR 2345
2344
QELKKVTTDNGTVSYDSLLELSYLDQVVNESLRLWPPAAFMSKKCTEPMELPLTANRSVT 2165
2164 IGKEVCAIINIWSLHRDPEYFDDPLTFNPDRFSPETGGTSPYREKGCFVPFGEGPRQCLG
1985
1984
MRFARMQVKRCLYEAVTNFAITVNPKTMEPMRLDPKQVLTMPLGGIWLNFEPISK* 1817
>CYP6AL1v2
AY771597 AAGE01003622 complete
40% to 6N1, 98% to 6AL1
476388850
494541913
TC67380
TC35464 TC46198 98% to AY771597, 2aa diffs to 6AL1
MLFLAFAIFVLFAIIQIVYHFRYWMRRGVPQLRPSFPFGDFGEF
FRQKHGIPMTYANIYARTRHLPYVGIYLSMRPVLFVNDPQMVKDILSRDFEHFHDRGL
HVNEETDPLSGNLFSLGGVKWKNMRAKLTPTFTSGCLKGMLAILIDKATVLQKQFAKE
IATHNTIEVKDLFARYTTDVIASVAYGIDNDSINNDHDLFRQMGIKVFQQDFKTSLRL
ALTFFIPKIKALLGFSLVAKDVEDFMINLVSKTIEHRERNGIQRKDMMQLMLQLRNSG
SVSINDQQWNLDSSATVKNLTINQVAAQVFVFFVAGYETSSTLMSFCVWELARNPEIQ
VKVHQEIDSVLSNYGGALTYEALADMEYLECCMEETLRKHPPVSFLNRECTKTYRIPE
TDVIIDKGTAVVVSLLGMHRDPQHFTQPTEFKPERFSSDEQSNESNKAYFPFGGGPRL
CIGMRLGMLQAKVALVTLLAKFEFSLGKEHVKDMELPLKANTLLLVPQDGIQLVVKKR
>AAGE01116725
827542817 63% to 6AL1v2 complete
MLLAFLALSAPLVTVLIWLQFRYWTRCGVPQLDPSFPFGNFSEFFCQKNGIPS
TYANLYHRTKHLPFVGIYLSLRPALLINDPELVKNILTRDFEHFHDRGIHVDEETDPMSG
HLFALGGVKWKNLRAKLTPTFSSGSLKEMFPLLVEKATVLQKRFLKEIATSEVVEVKELAACY
1 TSDVIASVAYGIDMDSINNRDDLFRRMGEK
VLAHDLITSLRLALAFWFPKLKVMLGSKSI 180
181
APVIQEFMTELVRKTIEHREKEGVHRKDMMQLLLQLRNGVSLKRNGVQWTEDSAPKNAIK 360
361
SLSIDEVTAQVMVFFVAGYETSSSTVSFCLFELARHQDIQAKVHQEIDTVLAEHEGNLTY 540
541
ASLASMKYLEQCLEETVRKYPPVAILNRECTKTYRIPETDVIVEKGTPIVVPLMGMHRDP 720
721
QYFPQPNDFQPDRFEGGAQSKAYFGFGAGPRLCIGMRLGILQSKVAVVTLLRKFKF 888
889
SLANPEDQHTELRMKPRSFILTTEGGIQLVVQQRHVCET* 1008
>CYP6AL2 AAGE01012031
494577102 622074403 641800960 755156422 632907779 580010239
42% to 6Z2 complete 51% to 6AL1
MTLLSIGVALLCVAA
FAFLNYVFGYWKRRGIRQLTPHFPFGNFTDLFFGKASFPKVCENLYERSKQWRLLGGY
VLLRPILLVNDPQLAKDIMVKDFQHFHDRGPHVDEENDPLSGHLFSLAGEKWKHLRAKLT
PTFTSGRLKGMFQTLVDTGEVLQEYIQKYAEGEDVVEIREILARYNTDNIASVAFGIKID
SINNPNEPFRHIGRK (0)
VFEPNFRNNMRGLITFMVPKLNKYLKIKSVDDDVEKFILKVVQETLEYREKNGIVRRDMM
QLLLQLRNTGTVSVDERWDVETSDKFKKLTLKEVAAQAHVFFLAGFETSSTTMSFCLYEL
AKHPEIQRRVQAEIDSVTALHDGKLTYDSINDMRYLECCIDETLRKYPPVPVLNRECTQD
YKVPGMDFTIEKGTAIVLQIAGMQHDPQYYPDPMQFKPERFQDPEVKSKPYAPFGDGPRV
CIGMRMGKIQTKVGLCLLLSKFDFELFGHDEPELVMDPNNFVLTPVDGINLKVSCRE*
>AAGE01225620
AAGE01073711 84% to 6AL2 588918478 complete
818
MTPLSIGVALLCVAAFAFLNYVFSYWNRRGVQQLTPYFPFGNFSDLFLGKASFPRVCETL 639
638
YERTKKWRLLGVYILLRPVLLVNDPQLAKDIMVKDFQHFHDRGTHVDEENDPLSGHLFSL 459
458
AGEKWKHLRAKLTPTFTSGRLKGMFQTLVDTGEVLQDYIHTCAKNEEVVEIREILARYNT 279
278
DNIASVAFGIKIDSINNPNEPFRQIGRK (0) 192
120
FFESNFRNNMRLMITFMVPKLNKYFKIKSVDAEVEQFILGMAKETLEYREKNGVVRKDMM
QLLIQLRNTGTVSVDERWDVETSTNSKKLTIGEVAAQAHVFFLAGFETSSSTMSFCLYEL
AKNPEVQRKVQSEIDSVTALHDGKLTYDSINEMRYLECCIDETLRKYPPVPVLNRECTKD
YKVPDSDITIEKGTAVILQISAMHHDPQYYPDPLRFVPERFLDPDMKGKPYAPFGDGPRI
CIGLRMGKIQTKVGLCLLLSKFNFELYGHKESELVMSPNNFLNTPVNGINLKVSCRE*
>AAGE01005840
44% to 6AL2 810047144 637194488 complete
MLLCLLILGSIATFYLFLHHHYSYWKRRGISQLKPSFAFGDFGPVIRGRANFVHHLQGIYERTK
RDYSLLGLYVLFRPALLVNDFVVARDILSRDFQHFGDRGIYVDEKRDPFSGHLFALDGER
WRHVRHKVAPAFTPLKLKDVFQTQLIGGVVLQDHLKHFAESGQSVDVADLFLRYSVDMIA
SVAFGVEIDSVNCPEEQFYRVAHSSVESNVKNLLRWTGGFLIPKVLKYTGTR
LVDQHVQDFFMHVVQQTVEYREKTGFTRRDVLQSLLKIMNAESQNVSI
185
186
DFTITDLTVTAFTFLLAGMETSSSTATFCLYEIVNNQEIQRRLQKEIDESLQEHDGL 356
357
ITYDSVVAMKYLDHCVNEAMRKFPALAYLHRICTEDYLVPSTRTIIKKGTLVLIPIYALQ 536
537
RDQEFFPHPDLFLPDRFNDPEAIRQAPFFPFGEGPRSCIGQRMGKMNVKIALVHLLSRYN 716
717
FTLANPVDQGREAPIDPLHFTISPQGSFNMNVTHRKCSSPSSHSKSLTNHSVSLAH*
>AAGE01173027
TC56435 TC16115 TC27418 TC40206 TC8341 56% to CYP6N1v2
494318851,
join with 223483845 complete
replace with AAGE02015839.1
note: ESTs DV300013.1 DV262803.1 have CVG not CIG at heme region
33781
MIALLLIGAVTLVFLFVKQRFNYWKVRGVPYVRPTFPLGNLWGIGTKKHLSEGLEDLYVQ 33602
33601 LKGKAQLGGIYFFINPVVLVTDLDLIKTILIKDFNFFHDRSIYYNEKDDPLTAHLFTMEG 33422
33421
IKWKNMRVKLTPTFTSGKMKLMFPIVRDCANELEKCISKEIVDGKEIEVKDILARYTTDV 33242
33241
IGNCAFGLECNSLHNPNAEFREMGRKVFQLQGLGFLKLLLTQQFSTLSRALGATVLQPDV 33062
33061
AKFFLKTVSDNVDYREKNKIERNDFIDLMIKLKNGQTLEHDKSDQRVEKLSIEQVAAQSF 32882
32881
VFFFAGFETSSTLMSFCLYELAQNQDLQDKARKDILDTLNKHGSLSYEAVHEMKYLENCVS (1) 32699
ETLRKHPPASNIFRTATQDYTVPGTSLTIEKGTSVMIP 32522
32521
TLAIHRDPEYYPDPMKFDPDRFTADQVAARHPFAFLPFGEGPRVCIGMRFGLMQARVGLA 32342
32341
TLLKNFRFTVGERLETPAQLDPSSAILLIKGGLWLKVDKI* 32219
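The note on this entry says the ESTs read CVG where the genomic translation reads CIG in the heme region. A minimal Python sketch for locating that residue is given here. It assumes a simplified heme-binding signature F-x-x-G-x-R-x-C-x-G; real P450s vary more than this pattern allows. The function name heme_site and the EST-style variant string are hypothetical, made up only for illustration.

```python
import re

# Simplified heme-binding signature F-x-x-G-x-R-x-C-x-G (an assumption;
# real P450 sequences show more variation than this pattern allows).
HEME = re.compile(r"F..G.R.C(.)G")

def heme_site(protein):
    """Return (motif, residue after the Cys), or None if nothing matches."""
    m = HEME.search(protein)
    return (m.group(0), m.group(1)) if m else None

# Tail of the genomic translation given in this entry.
genomic = "HPFAFLPFGEGPRVCIGMRFGLMQARVGLA"
# Hypothetical EST-style variant, built only to illustrate the CVG reading.
est_like = genomic.replace("CIG", "CVG")

print(heme_site(genomic))
print(heme_site(est_like))
```

The captured residue after the Cys is the position where the two readings differ.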
>AY433475
AAGE01031181 complete
52% TO 6M4 519940462 2246449 DR747763.1
821767964
627434636 90% to 6M5
MEPITIILVTILVLLLTYGFHLIRRQLRFFXDHNVPH
IAGNFVLIDKTQHPANHFLRWYKQSKGQYPLTGVFMFIKPIAIPLDLELIKRILVKDFQY
FQNRGMYYNERDDPLSAHLFSLEGAKWRSLRAKISPTFTSGKMKMMYPTMMAAGKQFSEH
LEEKMSEENELEMRDLLARFTTDMIGTCAFGIECNSMKEPNSKFREMGRKHFES
PRSGLKDLLKITAPGLAR
FFGVTEILPDVAEFFMDVVKSTVEYRMKNNVRRNDFMDLLIAMLDDETEGSESLTISEIA
AQAYVFFIAGFETSSTTMTWALHELSRNPEIQEEGRKCVQEVLEKYNGVMSYEAIMEMTY
IDYIIN (1)
ETLRLYPPVPLHFRVVTKDYPVPGTDTVLPAGTFTMIPVYAIHHDEDIF
PEPEKFDPTRFTPEEVSKRHAYAWTPFGEGPRICIGLRFGMMQARIGLALLLNNLRFSPG
PKSCTKMEFQPENLILTPKQGLWLKVEKV*
>AAGE01032555
AAGE01493222 (4 aa diffs) 476379758 88% to AY433475
570738080
578795972 58% to 6M2 complete
MEVITITLLTILVLLIAYASHLLRRQIRFFKDRNVPHIPASF
ELLDKTIHPAKHFLRWYKQFKGQYPLTGVIMFIKPIAIPLDLDLIKRILVKDFQYFQNRG
IYYNERDDPLSAHLFSLEGAKWRSLRAKISPTFTSGKMKMMYPTMVAAGKQF
558
557
SEYLEEKVEDGNELEMRDLLARFTTDMIGTCAFGIECNSMKEPNSKFREMGRKHFEA 387
386
PRNALKDAFKMTAPGLARFLRVTEILPDVSEFFMDVVKSTVEYRMKNNVRRNDFMDLLI 210
209
AMLDDKTEGSESLTINEIAAQAYVFFIAGFETSSTTMTWALHELSRNPDI
QEEGRKCVQEVLEKYNGVMSYEAIMEMTYIDQIIN
(1)
ETLRLYPPVPMHFRVVSKDY
HVPETDTILPAGTFTMIPVYAIHHDEDIFPEPEKFDPTRFTPEEVNKRHAFAWTPFGEGP
RVCIGLRFGMMQARIGLALMLKNLRFSPGPKTCTEMEFQPQNFILSPKEGLWLNVEKI*
>CYP6M5
AAGE01133741 494330821 73% to AY433475 578801721 826022155 639077358 760273799 581855956 complete
MEVITITLLTILILLLIYVLHLLRRQIHFFKDRNVPYKPASFERLDKTIHPAMHFLRWYKQFKG
QYPLSGVFMFIKPIVIPLDLELIKRILVKDFQYFQNRGIYYNERDDPLS
AHLFSLEGAKWRNLRAKISPTFTSGKMKMMYPTMVAAGKQFSEYLEEKVGDGNELEMRD
LLARFTTDMIGTCAFGIECNSMKEPNSKFREMGRKHFEAPRNVLKDAFKMTAPGLAR
FLRVTEILPDVSEFFMDVVKSTVEYRMKNNVRRN
DFMDLLIAMLDDKTEGSESLTISEIAAQAYVFFIAGFETSSTTMTWALHELSRNPDIQ
EEGRKCVQEVLEKYNGVMSYEAIMEMTYIDQIIN
ETLRLYPPVPMHFRVVSKDYHVPETDTILPAGTFTMI
PVYAIHHDEDIFPEPEKFDPTRFTPEEVNKRHAFAWTPFGEGPRVCIGLRFGMMQARIGL
ALMLKNLRFSPGPKTCTEMEFQPQNFILSPKEGLWLNVEKI*
>CYP6M6
AAGE01004894 complete
476324109 637742538 512549238 568770347
5 aa diffs to 6M6 476418676 494093520 66% to AY433475
MDVFLLIAAFVLLVAYGLHLLRKQVNFWADRNVPHNPVNFRQTVDQTVHMARRFQGYYHQFK
GQYPFAGMYLFTKPVALAIDLELLKCIFVKDFQYFHDRGTYYNEKDDPLSAHLFNLEGN
KWRNLRSKISPTFTSGKMKMMYPTMIAAGKQFSEYMDEKVGVEQELELKDLLARFTTDVI
GMCAFGIECNSMKDPNAEFREKGRMHFETPRNRKKDMMCSIAPKLARMMGLKQIIPDLSD
FFLGVVRETIDYRVKNGVRRNDFMDLLIGMLTGENVELGP
LTFNEVAAQAFVFFVAGFETSSTTMTWALYELSVNQDIQEKGRKCVRDVLEKYNGEL
SYETIMEMSYIDHILH
(1)
ETLRKYPPVPVHFRIVTKDYKVPNTETVLPAGTSVMIPVYAVHHDPEIFPDPK
RFDPDRFTTEEINKRHPYAWTPFGEGPRICIGMRFGMMQARIGLALLLNNFRFSSG
KKSTVPLDFTAKSFILSPDEGLWLKVEKL*
>AAGE01105997
63% to 6M6 822931992 complete
MMEPLDVAITVVMVALAVYMYLDKKHSYWADRKVPFVKPKFFYGNAKEISQTMQ
VGQVFQQFYHELKGRSPFGGIYMFTAPVAVVTDLELLKCIFVK
4
DFQYFHDRGTFYSEKGDPLSAHMFNLEGNKWKMLRNKLSPTFTSGKMKMMFPTIVAAGK 180
181
QFHDFMDEKVKQESEFELKDLLARFTTDVIGMCAFGIECNSIKDPDAQFRVMGRKLFTTG 360
361
RSKPKSFLMNTMPKVAKLLRLRIFPADVSDFFMKVVRETIDYRMANNVHRNDFMDLLIQM 540
541
RNPDENKSSEGLLSFNEIAAQAFVFYLAGFETSSTLLTWTLYELAVNQDIQEKGRQHVKE 720
721 VLKKHDGEMTYESITSMKYLDQILN 795 (1)
860
EALRKYPPVPVHFRETSKDYTVPDSNIVIEGGTRLFVPVYAIHHDPEIFPNPEQFNPDRF 1039
1040
TPEEEQKRHPYAWTPFGEGPRICIGLRFGMMQARIGLAYLLNSFKFSIGEKCKVPLEFDV 1219
1220
KSFILAPKGGLWLKVEKI* 1276
>476419050
52% to CYP6N3 637792107 793200622 complete
AAGE02015839.1
4/21/06
102508
MWIFLLLSIAVLLILQVRRKYSYWKRHGVPFIQPRFPFGSITPVGDRVHSSQLMARFYNQ 102687
102688
LKGTYPFAGMYFFTNPVVLALDLDFIKNVLVRDFQYFHDRGLYHNEKDDPLTCHLFNIEG 102867
102868
TKWTNLRRKLLPTFSSGKMKMMCPTILAIADRFRTAIENSISDQNEIEMRDFLARFTTDV 103047
103048
IGTCAFGIDCNSLENPDAEFLKMGNKIFEVPTSRIIAYFFVSTFQELSKKLHIKAVPEDV 103227
103228 SRFFYKVVRETMAYRQSSGVQRNDFMNLLMQLKEKGELEGSDEKLGTLTLDEVVAQAYVF 103407
103408
FLGGYETSSTNMCFCLYELALNGEIQEKARECVQKAVAKHGGLNYEALMDMPYLEQCIY (1) 103584
EALRKYPPIANLFRSVTQDYNVPNSNVMLPKGMNVWIPIYAIH 103767
103768
HDPEFFPEPELFDPERFTQEECEKRKPFTYMPFGEGPRTCIATRFGMMETKTGLATLLMN 103947
103948
FKFTKSARLEVPPKFSTKHVMLTPVGGLWVKVEKIEQ* 104061
>AAGE01021887 66% to AAGE01005098 complete
replace with AAGE02015839.1 4/21/06
112574
MWLNLLVMVFALSVILVRRRYSYWKRIGVPFIQPRFPLGSIGSIGTRIHSSQLLAQFYQQ 112753
112754 LKGSHPFAGIFYFLQPVALALDLEFVKNVMVRDFQYFHDRGLYYNEKDDPLSSHLFNIEG 112933
112934
TKWTTLRRKLVPTFSSGKLKMMCPTVVSVADRFKMCIEKSIAKEEAIEMRELLARFTTDV 113113
113114
IGSCAFGIECNSLENPDDKFRKMGEKVFDVSPFAILAFFFLSTFKDLARKCRISITDSEV 113293
113294
AAFFSTIVQKTITYREKNNVQRNDFMNLLMQMMKKNKEDESEENSVTLTLDEVVAQSYVF 113473
113474
FLGGFETSRTTMSYCLYELSLNQEVQNRARKCIQSAVAKHDGLNYEALMDMPYLEQCIN (1) 113650
113709
ESLRKYPPISNALRSTTKDYAVPGTEVILKKGTDVIVPIYAIHHDPEYYPDPELFDPD 113882
113883
RFSADQCAKRKPFTFMPFGEGPRMCVASRFGMMETKIGLAAMLMSFRFSKCEKSIVPLKI 114062
114063
SPNHLMLTPAGGLWLKVEQLESDETEMGFSKLISDERVNRLGYSM* 114200
>AAGE01004071
494535013 complete
55% to 476419050 56% to 6N2
TC62330 TC28072 TC40767 58% to CYP6N2
MWNLSTSSNPHTIVPAHPKEILLLATSSVPFIPARFPVGSFDGVGVRNHPSQLLAKFYR
QMKGLHPFVGVYYFLQPVVVVLDLDFAKTILIRDFQYFHDRGLYYNEKDDPISGNLLHLE
GSRWTNQRKKLIPTFSSGKLRMMCPTILKVADNLKVSFERYVAERDEIEIKDILARFSTD
VIASCAFGLDCSSLLEADDEFRRMGTKVFDISGWKLLKLFFVFAFGNVARRCHMKLIDED
ISQFFFKVVRETIDFRKKNHVHRKDFLNLLIQLKDNG
ELEGSNEKLGTLTLNEVVAHSFVFFLGGFETASTTMSYCL
YELSLNEEVQERARQCVKAAIHKYGDLNYDDLLDMPYLEQCINETLRKYPPSTIYRIVTQ
NYHVPDSSIVFPKGMSVMIPVYAIHHDPEFWPSPELYDPDRFAPEECVSRNPLTFIPFGE
GPRMCVAARLGVLQTKIGLATLLMNFRFSRCKNSTEPLQYSPKHFILTPVGGLKMRVEKI
Q*
>AAGE01078584
494579395 48% to 581602077 46% to 6N2 586209056 568771938
574225739
pseudogene (sequence does not continue)
1188
MLLYLLLTVVTLAYLWIGRRYSYWKQRSVPYVEPRFPFGNLQGLNKRHFGLLAQDVYSKL 1367
1368
KGSGSKFGGMFFFVNPVAVILDLDFAKDVFVKDFQYFHDRGVYSNEKVDPITSHLVAMEG 1547
1548
IKWKNLRAKLTPTFTSGKMKMMFPTITAVADEFRKCMVNEVDKGGEIEMKEFLARFTT
DVIGSCAFGLECNSLADPEAEFRKMGKKALTMSPMGFLR
525
524
RILSVTFRDLAKFLGVRISDPDVATFFMNVVRSTIEYRERNKVQRNDFMDLLIKLKNVEP 345
344
IDENTNQLGPLTFNEIVAQAFVFFLAGFETSSTT 243
MCFCLYELAKNQELQDKARRNIDEVLAKYGTMTYEAVH
EMRYMENCIN(1)
ESLRKYPPLPNILRNVNKPY
>AAGE01020246 79% to 494579395 58% to 6N1 complete
4381
MLLFLLLSVVTAAYLWVIWRYSYWKRRSVPYVEPSFPFGNLQGLNKRHFGLLTQDVYSK 4557
4558
LKGTGCKFGGMFFFVNPMVVILNLDFAKDVFVKDFQYFHDRGEYSNEKADPIMAHLVTME 4737
4738
GTKWKNLRTKLTPVFTSGKMKMMFPI
4816
ITAVAEEFRKCMAKEADKGEDIEMKELLARFTTDVIGNCAFGLECNSLMDPEAEFRKMGR 4995
4996 KAMAMSSADFLRRKLCNSFRGLAKLLGVRLSDPDVSDFFMNAVRSTIEYRERNKVQRNDL
5175
5176
MDLLIKLKNAELIDEKSDRLGPLTFNEIAAQAFVFFLAGFESSSTAMSFCLYELAKNQEL 5355
5356
QDKARRNINEVLVKHGTLTYEALYEMTYIENCIN 5457
5518
ESLRKYPPVTNIVRNVSKPYRVPGMNVTLEEDCRVLLPVYAIHHDPSLYPNPDQFDPERF 5697
5698
NPENSAARHPMAFVPFGEGPRICIGLRFGSMQARIGLTYLLKNFRFTLSEKMHDPLKMMS 5877
5878 NTIILASEGGLWMRIEKL*
5934
>AAGE01192518
80% to AAGE01020246 637742736 (758bp upstream do not have more P450 seq)
This break is not near the usual intron boundary CIN/ESLR.
This is a probable pseudogene fragment.
1554
YQVPGMNVTLEKGCRVLLPVYAIHQDPKLYPNPEQYDPDRFNPENSAARHSMAFVPFGEG 1375
1374
PRFCIGQRFGMMQARIGLTYLLKNFRFTLSEKTPSPLKILANSTVLASEGGLWLKLEKL* 1195
>AAGE01052546
494130149 64% to 76419050 615888679
12 ADRGLYYNEKDDPISCHLFNIEGSYWTNLRKKLSPIFSSGKLKLMCPMVITIAERFQKCL
191
192
SKSITQNQQEAEMKEWLNRFTIDVIGTCAFGIECNSLTNPEEKFRKMGVKMFHVANSR 365
366
IIKFFFISLFKNLAKKVHIKSVPEDVSEFFFKVIRKTIAFREMNHVLRNDFINLSMQLMA 545
546
DGKLEGSDEDVGKITLNEVVAQSFVFFLAGYETSSTVMMFCLYELSLQEDIQRRARENV 722
723
ITAVSRHGGLNYDALMDMGYLDQCVN
ETMRKYPPAGNLGR 899
900
CVTKDYNIPNTNITLRK
EGPRNCIAARSGMLMAK
1136
AAGE02018066.1
Length=351875 use this seq
319978
MWIYLLIGIITSLVLFVRRKYLFWERQGVPFIKPKFPFGNLLVNGKRVHTSQLTTYYYNA 319799
319798
LKGKKHPIGGVFFFTTPFAVVLDRELMRNVLIQDFQHFHDRGLYYNEKDDPISCHLFNIE 319619
319618
GSYWTNLRKKLSPIFSSGKLKLMCPMVITIAERFQKCLSKSITQNQQEAEMKEWLNRFTI 319439
319438
DVIGTCAFGIECNSLTNPEEKFRKMGVKMFHVANSRIIKFFFISLFKNLAKKVHIKSVPE 319259
319258
DVSEFFFKVIRKTIAFREMNHVLRNDFINLSMQLMADGKLEGSDEDVGKITLNEVVAQSF 319079
319078
VFFLAGYETSSTVMMFCLYELSLQEDIQRRARENVITAVSRHGGLNYDALMDMGYLDQCV 318899
318898 N (1)
ETMRKYPPAGNLGRCVTKDYNIPNTNITLRKGLNVVIPVH 318719
318718
GIHHDAEYYPDPERFDPERFSAEESTKRLPFTFMPFGEGPRNCIAARFGMLMAKVGVASM 318539
318538
LMRFQFSKCSKTAVPLVISPKHASMSPEGGMWLKVKEIK* 318419
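The numbers flanking each sequence line can be checked against the residue count. The Python sketch below assumes the flanking numbers are inclusive nucleotide positions on the contig and that a trailing * stands for the stop codon. Both are assumptions read off the layout, not stated in the file. The function name check_segment is hypothetical.

```python
def check_segment(start, residues, end):
    """Return (strand, consistent) for one coordinate-flanked line.

    Assumes start and end are inclusive nucleotide positions and that
    every character in residues (including '*') is one codon.
    """
    span = abs(end - start) + 1
    strand = "-" if end < start else "+"
    return strand, span == 3 * len(residues)

# First and last lines of the AAGE02018066.1 translation above.
first = check_segment(
    319978,
    "MWIYLLIGIITSLVLFVRRKYLFWERQGVPFIKPKFPFGNLLVNGKRVHTSQLTTYYYNA",
    319799,
)
last = check_segment(318538, "LMRFQFSKCSKTAVPLVISPKHASMSPEGGMWLKVKEIK*", 318419)
print(first, last)
```

Descending coordinates are reported as minus strand. A False result would point to a frameshift, a dropped residue, or a mistyped coordinate on that line.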
>AAGE01005098
6263502207 69% to 6N2
TC55162
TC31642 TC43307 584989096 579853579 complete
MWIYLLIAAITLSVLLVR
232
RKYSYWKRHGVPYIKPTFPFGNIRPAGNRVHSSQLMTRYYNELKGKHQFGGIFFFTNPV 408
409
ALALDLEFIKDVLVRDFQYFHDRGMYYNERDDPISGHLFNIEGTQWTNLRKKLLPTFSSG 588
589
KLKMMSPTIISVAERFQECLEKCITVDTEIEMKDLLARFTTDVIGTCAFGIDCNSLN 759
760
DPEVEFRKMGNKMFELP
TGRILKFFFISTFKNLARKARLKSVPEDVSEFFFRVVRETIDYREKSHIQRNDFMNLLMQLREKGALE
GSDEKVGTLSMNEVVAQAFVFFLGGFETSSTTMSYCLYELSLHEDIQERARECVQSAIAK
HGGFNYDAVMDMNYLELCIN
(1)
ESLRKYPPGAN
LVRCATKDYQVRNSSVVFKKGMSVMVPIYAIHHDAEYYPDPERYDPERFGVEELAKRPPF
TFMPFGEGPRICIAARFGMMESKIGLAALLMNFKFSKCSKSIVPLVISNKHVVLTPAGGL
WLKVEKLEQ*
>AAGE01273771
520199522 728739223 86% to CYP6N4v1 578623429 complete
MLIYLTVLALTLAVLWIRKRYSYWMDRGILYVEPS
FPAGNLRGMGRKEHLSSQMQRCYKELKGKGPVGGMFFFINPVALAMDLDLIKSVLVKDFQ
YFHDRSVYYNEKDDPLSAHLFTMEGAKWKNLRAKLTPTFTSGKMKMMYPTIIGVADEFQK
LMKSEVSS
NAEIEMKEILARFTTDVIGTCAFGLECNSLHDPDAKFRAMGRKIFSFANGRF
LKAVIAQQFRSLARSLHIALVDKEVSDFFLGAVRDTIKYREENKIERNDFMSLLMKLKDD
GNTGNTETLTVEEIAAQAFVFFLAGFETSSTAMSYCLYELAQNSDLQNKARKSVMDSIKK
HGSLTYEAMQDMQYIDQCIN
(1)
ESLRKYPPASTLTRSVSKDYKLPNSNVVLQQGSTLIVPVYA
LHHDAEYYPDPEKYNPDRFTPEEVAKRNPYCFLPFGEGPRICIGMRFGMMQARVGLAYLL
RDFSFTLSSKTPVPLKISPRSPVLTSEGGLWLKVQKL*
>AAGE01504815 581602077
91% to CYP6N3v2 753204063 815151384 632872571 complete 579754790 TC62375 TC16009 TC23365 TC47533
TC56452 TC50555 6N6v1 6T1.4
MWIYLTVLALTLAVLWVRKRYAYWKERGIPYVEPSFPAGNIR
GMGRKEHFSTQMQRCYKELKGKGPVGGVFFFINPVPLALDLDFIKTVLVKDFQYFHDRSI
YYNEKDDPLSAHLVALEGAKWKNLRTKLTPTFTSGKMKTMFPTIIGVADEFQKMMKNEVV
GNTEIEMKDILARFTTDVIGTCAFGIECNSLQDPNAQ
FRRMGRKIFSVAK
GRLLKLITAQQFRSLARMLGITLIDK
DVSDFFIGAVRDTIKYREENKIERNDFMSLLMKLKNDESSQDTNSGDVE
TLTVEQIAAQAFVFFLAGFETSSTAMSNSLYELAQNSDLQNKARKSVMDAIKKYGSLTYE
AMQDMQYIDQCIN
(1)
ESLRKYPPASNLTRTVSTDYKLPDSNVVLQQGSTLIVPVYALHHDA
EYYPDPEKYDPDRFTPEEVAKRNPYCFLPFGEGPRNCIGMRFGMLQARVGLAYLLRDFSF
TLSNKTPVPLKISPHSPILTSEGGLWLNVRKL*
>AAGE01026936
476398858 46% to 476419050 48% to 6N1 584294086 819690004
complete
MSALLIILALTPLFLFIIYVK
672
QKYAYWARRNVPFLKPHFPYGNFEALDRKSIADVAREAYEEMKNRGPFYGAYFFLQPL 499
498
ITITDPDLIKMVLIKDFNTFPDRGLYFNERDDPLSAHMFAIEGNKWRSLRQRLSPTFTSG 319
318
KMKMMFPTLAAVGDQFSAFLDEEIGSGKVVEVKDFMAKFTTDIIGSCAFGIECNSFK 148
147
DPHGRFRQFGKMVFETPVHGSLVRFALKSFPEISRRLRIK 28
ALHEEASKFFYGVVEDTVKYREKNGVERKDFLSLLIDMKKDGVDFT
MDEIAANSFIFFGAGFETSSSNQTFCLYELARNPECQDKARQSVLDALRNHGGMTYDAAC
DMQYLDQCIN
(1)
ETLRLYPSVPVLERRAFQDYKIPGHDVVIPKGMKINIPAYAI
QRDERFYPDPDVFNPDRFHQKEVAKRHICTFIPFGEGPRICIGLRFGMMQSRVGLATILS
KFRISICSETANPLEYSSKTSVLIPKEGLWLRVDPL*
>AAGE01569058
TC56593 TC20604 TC28568 TC51124
56% to CYP6AA1 834896125 complete
MGLYNTVLYLVLPIVWLLYTYFRRKYSYWADRNVPQVPGSL
PLGSFNGMGTKYHFVDVLKRVYDTYHKTHKAIGMYLSVKPILFVSDLDLIKKILVKDFNS
FRDRGMYYNEKDDPLSAHLFSIEGERWRFLRNKLSPTFTSGKIKYMYLTICE
IGEEFLACFDKYLDRKEAVDIKPLAQRFTSDVISSVAFGLKTNALKNEGSELLNKGDSVF
KPGRWETIRIFALLSYRDLAKKLGLRQFPRDVTDYFMDIIRGTVDHREKTNVMRQDFLQL
LLKLKNKGTIEDHEEESKEKITLNELAAQAFLFFFAGFETTSTTVSFALFELANNAEVQEKTR
QEVQRVLAKHGGHLTYDAIKDMTYLEQVVNETLRKHPPVGNLIRLANDPYRIDSLGTDIE
RDTMIMIPVHAIHNDPDIYPDPERFDPYRFTPEAINARHSHCFIPFGDGPRNCIGMRFAL
VEVKFGIAQLLTRLRFTVNEKTQFPVRYDPKSQFAEVKGGIWLNVERI*
>223495136
AABUM55TV.gz 59% to TC56593 40% to 6aa2 pseudogene?
Sequence has no exact match, 78% to 615844728 (TC56593, see above)
KTLLNHPPFFNLILLLNYPYLIHSL*TFF
487
QQNTIIMIPFHTIHNYPNIYPYP*RFYPNQFTP*SINSHHSHTFIPF*YPPLNCISIPFS 308
307
LLHLNFVIAHLLTKLRFTSNHKTHFKNRYDPKSRGAVVEGGIWLKVE 167
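In-frame stops (*) inside a translation are one reason an entry is flagged as a pseudogene. A small Python sketch for listing them follows. It assumes the translation is a single string with * as the stop symbol and treats only a final * as the normal terminator. The function name internal_stops is hypothetical.

```python
def internal_stops(translation):
    """Return 0-based positions of '*' that are not the final character."""
    body = translation.strip()
    return [i for i, aa in enumerate(body[:-1]) if aa == "*"]

# Middle line of the 223495136 translation above.
fragment = "QQNTIIMIPFHTIHNYPNIYPYP*RFYPNQFTP*SINSHHSHTFIPF*YPPLNCISIPFS"
# C-terminal end of an entry marked complete, for comparison.
intact_end = "GGLWLKVEKI*"

print(internal_stops(fragment))
print(internal_stops(intact_end))
```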
>AAGE01185776
223460790 57% to TC56593 53% to 6AA2 78% to 223468847
636165685
580089056 637123288 complete
MAFLFTTLCLLLPLLGLLYYYVRRKFAYWADRGVPYVPGSLPMGSFNDMGSTKHIVELLDAIYKQYRNTHK
AVGMFLSINPILLAVDLELVKQILVKDFNSFHDRGMYFNERDDPLASHMFSVEGERWRFL
RNKLSPTFSSGKIKYMFLTVREIGLEFLASFEPFMERKEPVEIGIQAQKFTCDVIGSCAF
GLSCNALKDESTELLDIADRVFNPKPLEMMYMLLLICFRKWAVKLRLKQTPADIERFFV
NMVRKTVEHREKNNITRPDFLQLLMQLKNKGTLEESEEDSKETISMNDVIAQA
682
681
FLFFFGGFETSSKALSFALFELALNPELQEKARDEVLRTLDKHDGLLTYEALKDMTYVEQIVH (1)
ESLRKYAPIGNVIRKANEPYQIHSPDIILEKGTMVM 325
324
IPVHSIHHDPEIYPDPSRFDPDRFTPEAISARHSHSFLPFGDGPRNCIGMRFALLEVKFG 145
144
IAQLLSRLRFTVNEKTQLPLRYDPKANVASALGGLWLDVERI*
AAGE02023125.1
Length=57153 use this seq probable allele of upper seq
97% identical, 12 diffs
15290
MVFLFTTLCLLLPHLGLLYYYVRRKFAYWADRGVPYVPGSLPMGSFNGMGSTKHFVELLD 15111
15110
PVYKQYRNTHKAVGMFLSINPVLLAVDPDLVKQILVKDFNSFHDRGMYFNERDDPLASHM 14931
14930 FSVEGERWRFLRNKLSPTFSSGKIKYMFLTVREIGLEFLASFEPFMERKEPVEIGIQAQK 14751
14750
FTCDVIGSCAFGLSCNALKDESTELLDIADRVFNPKPLEMMYMLLLICFRKWAVKLRLKQ 14571
14570
TPADIERFFVNMVRKTVEHREKNNISRPDFLQLLMQLKNKGTLEESKEDSKETISMNDVI 14391
14390
AQAFLFFFGGFETSSKALSFALFELALNPELQEKARDEVLRTLDKHDGLLTYEALKDMTY 14211
14210
VEQIVH (1) 14193
14133
ESLRKYAPIGNVIRKANEPYQIHSPDIILEKGTMVMIPVHSIHHDPEIYPDPSR 13972
13971
FDPDRFTPEAISARHSHSFLPFGDGPRNCIGMRFALLEVKFGIAQLLSRLRFTVNEKTQL 13792
13791
PLRYDPKTNVASALGGLWLDVERI*
13717
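Diff counts like the one quoted for this probable allele can be reproduced by comparing residues position by position. The Python sketch below assumes the two sequences are already aligned without gaps and have equal length. It is run only on the first 60 residues of each version, so it shows a subset of the differences, not the full count. The function name differences is hypothetical.

```python
def differences(a, b):
    """Return (1-based position, residue in a, residue in b) for each mismatch."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length (ungapped)")
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]

# First 60 residues of AAGE01185776 and of the AAGE02023125.1 version.
upper = "MAFLFTTLCLLLPLLGLLYYYVRRKFAYWADRGVPYVPGSLPMGSFNDMGSTKHIVELLD"
allele = "MVFLFTTLCLLLPHLGLLYYYVRRKFAYWADRGVPYVPGSLPMGSFNGMGSTKHFVELLD"

diffs = differences(upper, allele)
print(diffs)
print(len(upper) - len(diffs), "of", len(upper), "identical")
```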
>223468847
73% to 223460790 I-helix and end 62% to 223460790 pseudogene
AFLFFFVGFSTSFTPFSFSLFEFALFPQLRGVARDRFLRTLDDHDVLFTFAALIDLTYVALIFH
(?)
DSLP*FAPFRYVFREAYVPFQFHSPDFFLG*S
490
TIVMIPFHSFLHDPEFFPDPSRFVPDRFSPEAISALHSHSFLPFGDGPRNCFGMRFALLE 311
310
VKFGIAPFLPPFPFSFPPPPPLPLRFVPKANVASTLPGLCFPVDLI 173
>AAGE01408667
TC66947 TC28477 TC49577 55% to CYP6P4 TC67159
TC33540
TC39062 593244050 519827477 50% to CYP6P4 complete
Revised May 15 2006. Trace files support the cyan seqs
MAILELYLAIGVTLVLATAGCVFLFLDKKRSFWKDRNFPCTGRAKMIYGDYKNMNQT
EHMQYINQRIYNEFKARKLPIGGTVLFLVPSTVVVDPDLIKAMLVKDFNFFHDRGVYNNP
EVDPLTGHLFSLEGQAWRQLRAKLSPTFTSGKMKMMFSTIL
SVADDLKEFLLEKTESGPTELEMKNVLAGFTTDV
IGSCAFGIECNSLRATHCRFREVSRKIFEQSVGQMLWMIVLMLFK
GVATKLKLKATPAEVENFFTNMVQETIDHRERNNVQRSDFMNILIQMKNSTNLEEKLTLN
EITAQSFIFFVAGFETSSTTMVNCLFELAMNPDIQEKLRAEIFKVC
GEGDLTYESVSSVEYLNMVIDETLRKHPVVDSLLRTSTQPYNIPNTDLKIPKGTF
VFIPVHALHHDPEYYPDPDRFDPERFNAENRASRHPFVYLPFGEGPRNCIGMRFGLMQTR
VGLITVLRNFRVRPSSNTPERLVVNPKSGIPAPLGGIPLLIERI*
>AY432230
AAGE01011017 52% to 6P4 583641208 589587999 CR937398.1
CR937850
CR937397.1 CR937849.1 complete
MDPVTVILTIFVGLTGLVYFFLRREQQKWPRLGVPFAKNPH
LLFGNVRGIFQKEHSCEILQRLYWEFKGRGLKLGGIMNFFQPAVLVIDPEISKSILVKDFNKFHDRGIFVDPAGDPLSANLFSLEGAQWKAMRTKMSPTFTSGKMKYMFESVLNVAERLKDYLAENCLKEDIELKNILQRFTMDVIGNVAFGV
ECNSIKNPSSEFRLMGLKANRFDGVRFLKFFIGGAYKNFAKKIKLKVVEDDVHKFFMSLV
HSTVHYREGNNVKRNDFLNLLMEIKNKGKFSDEPNSGGEGITMNEIAAQCFIFFTAGFET
SSTTINFCLYELANNPDIQDRLRNEIEDVVAKDGGELKYDTLLGMNYLDRVVS
(1)
ETLRKYSAVDNLFRISNSPYTPDGCNFTIPAGTLFQIPIHSMHHDPEYFPDPGRFDPDRF
LPEVAKSRHPYCYLPFGEGPRVCIGVRFGLMQTKIGLVTLLRDFRFGPRSETPDRLQFEA
KTFVLTPQTGIYLKIEPIGI*
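The numbers in () are intron phases (0 = between codons, 1 = after the first base of a codon, 2 = after the second). To get a continuous protein from an entry, the markers and any flanking coordinates have to be stripped. A minimal Python sketch follows. It assumes residues are uppercase letters or * and that markers look like (0), (1), (2), (2?), (?) or (). The function name join_exons is hypothetical.

```python
import re

# Intron markers as they appear in this file: (0) (1) (2) (2?) (?) ()
MARK = re.compile(r"\(([0-2]?)\??\)")

def join_exons(lines):
    """Return (protein, [(residues before intron, phase or None), ...])."""
    protein = ""
    phases = []
    for line in lines:
        pos = 0
        for m in MARK.finditer(line):
            # Keep only residue characters; this drops coordinates and spaces.
            protein += re.sub(r"[^A-Z*]", "", line[pos:m.start()])
            phase = int(m.group(1)) if m.group(1) else None
            phases.append((len(protein), phase))
            pos = m.end()
        protein += re.sub(r"[^A-Z*]", "", line[pos:])
    return protein, phases

# Shortened lines around the intron of the entry above.
example = ["GMNYLDRVVS", "(1)", "ETLRKYSAVD"]
print(join_exons(example))
```

Only sequence lines should be passed in. Annotation lines contain uppercase letters that would be mistaken for residues.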
>AAGE01083421
476356097 39% to CYP6AD1 587854394 570810577 complete
MISGTVCILLVLANVAFLVLFVR
GVLQSRQVYWVRRRIPFVAWPHLLFGNVRRLWRHEHSSTIGQRLYRDLKARRLAAGGFNL
LVSPSILVADPDLAEEVLVGNVRRFPDRGL
HVDAEVDPLSETLFALRGNRWKDKRNR
LAPVFSEETLKPV
FRMVASFADELRKEISINLDRRLQDVQEWVSRYVTQVMGKSVFGM 202
203
RCRMMQDPNTDFRRYGRISTELSWLLLLKNWIGVTMPWMARKVGLRITDATVEKFYVDLC 382
383
RSNVLVRESYKVKENDILQLFMRLREARQLTME 481
482
ELTTACYSFVKHGMEPCTSVMTFCLYELAKNLSIQKRLRDEISHNLEDTDGQ 637
638
LTYDVIMSMNYLDQVVN (1) 688
745
ETMRKYPPVDFIYRRSSQSRDNIPQGTLFVIPVYAFHHDPDHFPAPENF 897
898
DPERFTAKQARTRHPYCYLRFGAGPRECLGAR
FGLLVVKAGLVTLLRRFRFAMPEELVHEKLQFKPNASVLSPVEGSVRLRVETI*
These trace files match 100% MISGTVCILLVLA
gnl|ti|591523435
gnl|ti|587360321
gnl|ti|585826792
gnl|ti|578932662
gnl|ti|570810577
gnl|ti|576970754
These trace files match 100%
HVDAEVDPLSETLFALRGNRWKDKRNRLAPVFSEETLKPV
gnl|ti|639160181
gnl|ti|591799078
gnl|ti|591523435
gnl|ti|576970754
gnl|ti|570810577
These trace files match 100%
RLAPVFSEETLKPVFRMVASFADELRKEISINLDRRLQDVQEWVSRYVTQVM
GKSVFGMRCRMMQDPNTDFRRYGRISTELSWLLLLKNWIGVTMPWMARKVGLRITD
gnl|ti|639160181
gnl|ti|591523435
gnl|ti|578800786
gnl|ti|570810577
gnl|ti|476356097
These trace files match LAKNLSIQKRLRDEISHNLEDTDGQLTYDVIMSMNYLDQVVN 100%
gnl|ti|793209208
gnl|ti|637742971
gnl|ti|587849418
gnl|ti|578800786
gnl|ti|476356097
gnl|ti|567212773
These trace files match FHHDPDHFPAPENFDPERFTAKQARTRHPYCYLRFGAGPRECLGAR 100%
gnl|ti|476356097
gnl|ti|793209208
gnl|ti|614704229
gnl|ti|637742971
gnl|ti|588905694
gnl|ti|743515336
gnl|ti|587854394
gnl|ti|587849418
These trace files match CLGARFGLLVVKAGLVTLLRRFRFAMPEELVHEKLQFKPNASVLSPVEGSVRLRVETI 100%
gnl|ti|614704229
gnl|ti|637742971
gnl|ti|588905694
gnl|ti|745128895
gnl|ti|743515336
gnl|ti|587854394
>1.1327_5 AAGE01083421 JP’s version 17 diffs to AAGE01083421, new gene
MISGTVCIVLVLANVAFLVLFVRGVLQSRQVYWVRRRIPFVAWPHLLFGNVRRLWRHEHSSTIGQRLYRDLKARRLAAGGFNLLVSPSILVADPDLAEEVLVGNVRRFPDRGLHVDAEVDPLSETLFALSGNSWQDKRNQLTPVFSEETLKPVFRMIASFADELRKEISKNLDRRLQDVQEWVSRYVTQVMGKSVFGMRCRMMQDPNTDFRRYGRISTELSWLLLLKNWIGVTMPWIARKVGLRITDATVEKFYVDLCRSNVLVRESYKVKENDILQLFMRLREARQLTMEELTTACYSFVKHGMEPCTSVMTFCLYELAKNVSIQKRLRDEISHYLEDTDGQLTYDVIMSMNYLDQVVNETMRKYPPVDFIYRRSSQSRDNIPQGTLFVIPVYAFHHDPDHFPAPEKFDPERFTAKQARTRHPYCYLPFGAGPRECLGARFGLLVVKAGLVTLLRRFRFAMPEELVHEKLQFKPNASVLSPVEGSVRLRVEAI.
These trace files match 100% MISGTVCIVLVLA
gnl|ti|822917015
gnl|ti|749404661
gnl|ti|630748299
gnl|ti|520549356
These trace files match 100%
HVDAEVDPLSETLFALSGNSWQDKRNQLTPVFSEETLKPV
gnl|ti|749404661
gnl|ti|630748299
gnl|ti|591518206 this seq also matches JP’s version
QLTPVFSEETLKPVFRMIASFADELRKEISKNLDRRLQDVQEWVSRYVTQ
VMGKSVFGMRCRMMQDPNTDFRRYGRISTELSWLLLLKNWIGVTMPWIARKVGLRITDAT
VEKFYVDLCRSNVLVRESYKVKENDILQLFMRLREARQLTMEELTTACYSFVKHGMEPCT
SVMTFCLYELAKNVSIQKRLRDEISHYLEDTDGQLTYDVIMSMNYLDQ
These trace files match FHHDPDHFPAPEKFDPERFTAKQARTRHPYCYLPFGAGPRECLGAR 100%
Two different sequences exist
gnl|ti|808282492
gnl|ti|593108115
These trace files match CLGARFGLLVVKAGLVTLLRRFRFAMPEELVHEKLQFKPNASVLSPVEGSVRLRVEAI 100%
gnl|ti|808282492
gnl|ti|593642133
gnl|ti|593108115
>CYP6Pnew
574331490 AAGE01337778.1 571589719 600552785 complete
66% to 6P4
note: TC66947 begins 357 bp downstream of this seq, same orientation
this seq matches AAGE02030882.1
MLPFLLAVVALLLTAAGLYIRSRHRFWSDRGIPCAPNPEFLFGHVRGQVTNKHAAY 353
VNRELYQQFKARGEGFGGYSFFAVPAVIIVDPELVKTILVRDFAVFHDRGIYNNPKDDPL
173
SGQLFLLEGLQWKILRQMLTPTFTSGRMKAMFGTIMDV
AEEFRQFLVDSRERESVIEMKEVLASFTTDVIGTCAFGIECNTLKNPDSDFLKYGKKVF
EQRMSTLFKFIFASLFK
DLARKLGVKITDAGVEKFFLGLVRETVEFREKNNVMRNDFMNLLLQLKNKGRLVDQLDE
ADEVAARGLTMEELAAQCFVFFIAGYETSSTTMNFCLYELAKNPDIQEKLREDIEEAVAS
NGGRVTYDLVMGLRYLDNVVN
(1)
ETLRKYPPIESLNRVPTSDYTVPGTKHVLPKQTMITIPIYALHHDPDFYLDPDNFDP
DRFLPEAAQARHPYAFIPFGEGPRNCIGMRFGLMQTKIGLITLL
RNFRFSPSAKTPDKIAFDVKSFVLSPDGGNYLRYDKI*
>AAGE01395583
96% to 6Pnew 592239414 N-term part C-term part 569878875
803228481
a second C-term part, neither can extend 8 aa to the normal exon boundary
This looks like a pseudogene
MLPLLLAVVTILLTAAALYIRSRHRFWSDRGIPCAPNPEFLFGHVRGQVTNKHAAYVNRE
LYQQFKARGEGFGGYSFFAVPAVIIVDPELVK
8
TILVRDFAVFHDRGIYNNPKDDPLSGQLFLLEGLQWKILRQMLTPTFTSGRMKAMFGTI 184
185
MDVAEEFRQFLVDSRKRESVIEMKEILASFTTDVIGTCAFGIECNTLKNPDSDFLKYGKK 364
365
VFEQRVSTLMKFIFASLFKDLARKLRIKITDAGVEKFFLGLVRETVEFREKNNVLRNDFM 544
545
NLLLQLKNKGRLVDQLDEADEVAARGLTMEELAAQCFVFFIAGYETSSTTMNFCLYELAK 724
725
NPDIQEKLREDIEEAVASNSGRVTYDLVMGL 817
This is the same as AAGE02023125.1
100% match to 1.702_1 with error at intron boundary, missing small exon
This could be a pseudogene since the gene structure does not match related genes
The end of exon 1 could be broken off by an insertion, blocking expression.
If it is expressed it is missing 2 conserved amino acids VM
48326
MLPLLLAVVTILLTAAALYIRSRHRFWSDRGIPCAPNPEFLFGHVRGQVTNKHAAYVNRE 48147
48146
LYQQFKARGEGFGGYSFFAVPAVIIVDPELVKTILVRDFAVFHDRGIYNNPKDDPLSGQL 47967
47966
FLLEGLQWKILRQMLTPTFTSGRMKAMFGTIMDVAEEFRQFLVDSRKRESVIEMKEILAS 47787
47786 FTTDVIGTCAFGIECNTLKNPDSDFLKYGKKVFEQRVSTLMKFIFASLFKDLARKLRIKI 47607
47606
TDAGVEKFFLGLVRETVEFREKNNVLRNDFMNLLLQLKNKGRLVDQLDEADEVAARGLTM 47427
47426
EELAAQCFVFFIAGYETSSTTMNFCLYELAKNPDIQEKLREDIEEAVASNSGRVTYDL (0) 47253
44758
GLRYLDNVVN (1) 44729
44670 ETLRKYPPIESLNRVPTSDYTVPGTKHVLPKQTMITIPIYALHHDPDFYLDPDNFD 44503
44502
PDRFLPEAAQARHPYAFIPFGEGPRNCIGMRFGLMHTKIGLITLLRNFRFSPSPKTPDKI 44323
44322
AFDVKSFVLSPDGGNYLRYDKI*
44254
>AAGE01028822
752849490 633799995 600024515 576876673 complete 65% to 6S1
MILILLLLAATALFFRWINAYRARYQFWKEHNVPHLEPRFPVGNAGDILKSTIHFAH
IMDNLYRELKHFGDYAGIYFFRDPVLVVLSPEFAKTVLVK
658
DFNYFLDRGVYSNEKDDPLSANLFFMEGHRWRKLRAKLTPTFTTGKLKAMFHTILAVGEQ
478
FDRYLQDYTKQKDEVEVKDLLARFTTDIIGSCAFGIDCNSLENPESKFRQMGKRMINFPK
298
LKALKIFFAMMYRKQARWLRIRFNDEDVSDFFFAVVRDTIRYREENNFERKDFMQLLIE-
121
LKNKGYMEDDGEYVEEL
QGGRLEKLTFEEIAAQAFVFFFAGFETSATTMTFALHLLASNQEVQDRGRKCVYEVLERH
444
DGKLSYEALMEMTYIDCIIQ
(1)
ETLRIYPPVATIHRITTKPYKLPNGSVLPEGVGVVIPNLAFQRDPEFFPEPMQFRPE
157
RFFEDEKDKRHNFCHLPFGEGPRICIGMRFGLLQTRMGIAMLLKNYRFRLCPKSVFP
LKTDPINLIYGPAGDVWLGIEKIQ*
>AAGE01126587
45% to 6AJ1 missing part of exon 4 632860227 759657436 578977303
walked by megablast to AAGE01030106.1 but this still does not get the N-term
best match to N-term in WGS = AAGE01034156.1
Since there is only one 6AJ, this must be the correct N-terminal.
Note: after the sequence PERDDQLQSLLKTK there is a gap compared to Anoph 6AJ1.
The seq KSSTY that comes right after this seq in 6AJ1 appears on the opposite
strand translation in the same spot, so there might be an inversion here and
this might be a pseudogene. 13 aa at the end of exon 4 are missing.
Five trace file seqs have a 100% match in this region, so the seq is correct.
Alternatively the seq may be shorter like 6AH1 or 6AG.
A PHASE 1 boundary is possible.
1061
MILTTVFLIGLLYKNPVAFFLVIVAGLLVRELIKYHFRHWERCNVPGPKPSLIFGNIASN 882
881 IFLRQHFAEMIDGWYN (2) 834
763
KFPNAPFIGFYKIFKPSVMIRDPEMIKNVLVRDQACFSANDFAFDEKLDPLLAHNPFMVSG 587
586
ERWKKSRQLLTPIFTGSKMKQLFPIMDEISSQFVDFVGRQCGREVEAKS (0) 440
ISAAYTTQNVAGCAFSLDADCFNNPNSEWRVMGKKIFQPTLLAGIKFMLMLFVPSVTWFIPVP (2)
2075
FLPKEVDRWMRKLVSTLLQERKNKQPERDDQLQSLLKTKS (1) 1956
1485
AELTEEQIAGHSLAFFSEGFETSSTTMGFAILH (0) 1381
1327
LAENPDVQEKLFQEIQNTLGKNDIPLTFDLVQKIEYLDWVLQESLRITPPA 1175
1174
AGLQKLCTQNYCLKYKVDGKEVGTWIMPGTTVLIPIVAVHM
990 DPKYYPEPEKFRPERFSPEEKAKRTDPVYYPFGEGPRMCLGMRFAQIQIKMALLKLVQQF
811
810 RVRTSPNYKPWQYNRNTFLTEAKDGLQVVFERRS*
706
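The note for this entry reports a peptide (KSSTY) found on the opposite strand translation, which suggested an inversion. A minimal Python sketch of that kind of check follows: translate all six frames and report which strand and frame contain the peptide. It uses the standard genetic code. The DNA string is a made-up toy, not Aedes sequence, and the function names are hypothetical.

```python
# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def revcomp(dna):
    """Reverse complement of an uppercase A/C/G/T string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(dna):
    """Translate from position 0; unknown codons become X."""
    return "".join(CODON.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))

def find_peptide(peptide, dna):
    """Return [(strand, frame offset)] for every frame containing the peptide."""
    hits = []
    for strand, seq in (("+", dna), ("-", revcomp(dna))):
        for frame in range(3):
            if peptide in translate(seq[frame:]):
                hits.append((strand, frame))
    return hits

# Toy DNA (hypothetical): encodes KSSTY only on the reverse strand.
toy = "ATAAGTAGAAGATTT"
print(translate(toy))
print(find_peptide("KSSTY", toy))
```

A hit on the minus strand only, at a spot where the plus strand carries the rest of the gene, is the pattern the note describes.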
>AAGE01171970
69% to 6AK1
822917241
578615327 575500950 complete
MPLVAWLAAIALICYLAWKRNNFWNRHGVPYVLEIPAVGNFSSVALQMHSMFD
YVARIYDHVRTRDADFFGINIFFRKALVIRNPDMVKKMLVADSRYFINRQMC
TDREGDHFGYYNLMMIKEPLWKDLRGYLSPSVTSSRLRRMFSLIDE
(0) 526
IGNNMLAHLDGVQQKPTK
LRETEFKELCARFTTDVIASTFFGIQANCLSDEESEFRYYGR
264
KIFEYGPKRALNMAAFFFMPELVPYLGFKLFPRDTERFLKTIIEQEIARRETSGENRGDF
84
IDSMIALKNNEATIGVEEKI
(2)
HLKGDILVAQAATFYMASFETTSSVLSFTLYELTKN
(0)
PEIQQRLREEIHNCIKKYGRDLSYECLVNEMPYLGMVISEAARLYPVLPFIERQCSLPAGA
TGYKLDPFHNFVVPNKMPVLVPIYAIHRDPK
(0)
YFPDPLRFDPDRFSKDNADNIVPCSYMPFGVGPRTCLGSHFGTLQVKVAITRLLSKYRI
LRSESSPETLTYRKNAFTLHSNEGLYADLELDELC*
>AAGE01277917
65% to 6AH1
519656511
569678490 521887623 578889466
749413119 578997302 complete
MLELYIALAVLAVCLYFKWSCSYWRRVGNVDGPQPLPIFGNGLEQITGAKHFGEIFEEVYR
(2?)
TYPTAAWVGIYELFNKPAIVVRDLELVKE
878
ILVGSFQHFNRNSFEVDETIDPLVAINPFTQSGDLWKERRSQVVPVFSQTKIRSCFPIIK
698
NVADNFLEYVTKTRKTSPDFEAKD
623
ICARFTIDSVASCAFGIDAESFTNPNSEFRRVGFELFNPSSIMATVRSLLALFAPKLASLLRIP (2?)
FVPPYVDRWFRKLVNEVIRQRKEGEVKRQDLFQAMYDT
628
LTQQGTVDVKNDEIVGHSVTFLTEGFETSSTLMCYFLYELASNQHIQDRVLNEIDCVLKE
448
YDGKLTDEAVNKLTYMERAMYETLRMHSPVFTLTKVCTKEYELPPQYTDDVGKRITMKPG
268
MSAIIPVHAIHLDPEIYPDPCRFEPDRFLDENRKGRHRYAFLGFGEGPRICLGMKFGLSQ
88
SKIGIATLLSKYRVVGSDKQELPLEISRKSFLLASKNGIWVKFVERG*
CYP3 clan
CYP9J related sequences: 17 complete genes, two full length pseudogenes
and two partial pseudogenes
>CYP9J1
TC67648 TC11677 TC2154 TC45358 AY064092 AF390099 complete 50% to 9J4 96% to 9J2
MVEVNIFSALAVGAVILLLYHYIAKKYHYFLTKPIPCIKPTFLL
GIFDMVVLKRVELVFGSKLLYNSYPDAKIIGYYELTKPTYMVRDPEMIKKIAIKDFDS
FTDRTPVFGDAVPADSLFFNSLFSLRGQKWRDMRSTLSPAFTGSRMRHMAELVVKCAT
SMTDFFHSEAKAGRRLEFNMKDTFSRFVCDAIASVAFGIEVDSFRDPENESYKKGNES
QKIHTFKSLATFVTLRFVPFLQKVFNFDFVDANVAGYFKKLILDNMDQRNKQGIVRND
LVNMLMETKNGALKYEEQDTQVPEGFATVEESHVGKSTHSRIWTDNELISQCFFFFFA
AFDNVSSILTFLSYELTVNQDIQRRLYEEIAVTESTLNGQPITYEALQKMAYLDMVVS
ETLRKYPTATLTDRYVNKDYVFDDEEGLRFVIEKGKTIWIPMLALHHDPKYFPEPERF
DPERFSEDNRSKIVPGTYLPFGAGPRSCIGPRLALLEVKMALYHLVKDFNLQASEKTQ
IPLRLSKSAFTMQAENGVWLELKARPKA
96% to CYP9J1, 9J2, 9J15 complete
nearly identical to 9J2 (9 aa diffs)
This is a near exact match (1 aa diff) to CYP9Jae9 below without any frameshifts
This is not a pseudogene
MVEVNLFSALAVGAVILLLYHYIAKKYHYFLTKPIPCIKPTFLFGIFDMVVLKRVELVFGSKLLYNSYPDAK
(2)
IIGYYELTKPTYMVRDPEMIKKIAIKDFDSFTDRT
PVFGDAVPADSLFFNSLF
SLRGQKWRDMRSTLSPAFTGSRMRHMAELVAKCATSMTDFIHSEAKAGRRLEFNMKDTFS
RFVCDAIASVAFGIEVDSF
RDPENEFYKKGNESQKIHTFKSLATFVTLRFVPFLQKVFNFDFVDANVAGYFKK
LILDNMDQRKKQGIVR
NDLVNMLMETKKGALKYEEPDMQVSEGYATVEESHVGKSTHSRIWTDNELISQCFFFFFA
AFDNVSSILAFLSYELTVNQDIQRRLYEEIAVTESTLNGQPITYEALQKMAYLDMVVSEA
LRKYPTATLTDRYANKDYVFDDEEGLRFVIEKGKTIWIPMLALHHDPKYFPEPERFDPER
FSEDNRSKIVPGTYLPFGAGPRSCIGPRLALLEVKMALYHLVKDFNLQASEK
TQIPLRLSKSAFTMQAENGV
WLELKARPKA
>CYP9Jae9
494315799 588882678 579665582 579009008
Length = 536
Score = 1060 bits (2742), Expect = 0.0
Identities = 535/557 (96%), Positives = 535/557 (96%)
Frame = +1
Query: 247  MVEVNLFSALAVGAVILLLYHYIAKKYHYFLTKPIPCIKPTFLLGIFDMVVLKRVELVFG 426
            MVEVNLFSALAVGAVILLLYHYIAKKYHYFLTKPIPCIKPTFL GIFDMVVLKRVELVFG
Sbjct: 1    MVEVNLFSALAVGAVILLLYHYIAKKYHYFLTKPIPCIKPTFLFGIFDMVVLKRVELVFG 60
Query: 427  SKLLYNSYPDAK*VCSTNDCRAYYDDEPSFSTRIIGYYELTKPTYMVRDPEMIKKIAIKD 606
            SKLLYNSYPDAK                     IIGYYELTKPTYMVRDPEMIKKIAIKD
Sbjct: 61   SKLLYNSYPDAK---------------------IIGYYELTKPTYMVRDPEMIKKIAIKD 99
Query: 607  FDSFTDRTPVFGDAVPADSLFFNSLFSLRGQKWRDMRSTLSPAFTGSRMRHMAELVAKCA 786
            FDSFTDRTPVFGDAVPADSLFFNSLFSLRGQKWRDMRSTLSPAFTGSRMRHMAELVAKCA
Sbjct: 100  FDSFTDRTPVFGDAVPADSLFFNSLFSLRGQKWRDMRSTLSPAFTGSRMRHMAELVAKCA 159
Query: 787  TSMTDFIHSEAKAGRRLEFNMKDTFSRFVCDAIASVAFGIEVDSFRDPENEFYKKGNESQ 966
            TSMTDFIHSEAKAGRRLEFNMKDTFSRFVCDAIASVAFGIEVDSFRDPENEFYKKGNESQ
Sbjct: 160  TSMTDFIHSEAKAGRRLEFNMKDTFSRFVCDAIASVAFGIEVDSFRDPENEFYKKGNESQ 219
Query: 967  KIHTFKSLATFVTLRFVPFLQKVFNFDFVDANVAGYFKKLILDNMDQRKKQGIVRNDLVN 1146
            KIHTFKSLATFVTLRFVPFLQKVFNFDFVDANVAGYFKKLILDNMDQRKKQGIVRNDLVN
Sbjct: 220  KIHTFKSLATFVTLRFVPFLQKVFNFDFVDANVAGYFKKLILDNMDQRKKQGIVRNDLVN 279
Query: 1147 MLMETKKGALKYEEPDMQVSEGYATVEESHVGKSTHSRIWTDNELISQCFFFFFAAFDNV 1326
            MLMETKKGALKYEEPDMQVSEGYATVEESHVGKSTHSRIWTDNELISQCFFFFFAAFDNV
Sbjct: 280  MLMETKKGALKYEEPDMQVSEGYATVEESHVGKSTHSRIWTDNELISQCFFFFFAAFDNV 339
Query: 1327 SSILAFLSYELTVNQDIQRRLYEEIAVTESTLNGQPITYEALQKMAYLDMVVSEALRKYP 1506
            SSILAFLSYELTVNQDIQRRLYEEIAVTESTLNGQPITYEALQKMAYLDMVVSEALRKYP
Sbjct: 340  SSILAFLSYELTVNQDIQRRLYEEIAVTESTLNGQPITYEALQKMAYLDMVVSEALRKYP 399
Query: 1507 TATLTDRYANKDYVFDDEEGLRFVIEKGKTIWIPMLALHHDPKYFPEPERFDPERFSEDN 1686
            TATLTDRYANKDYVFDDEEGLRFVIEKGKTIWIPMLALHHDPKYFPEPERFDPERFSEDN
Sbjct: 400  TATLTDRYANKDYVFDDEEGLRFVIEKGKTIWIPMLALHHDPKYFPEPERFDPERFSEDN 459
Query: 1687 RSKIVPGTYLPFGAGPRSCIGPRLALLEVKMALYHLVKDFNLQASEKTQIPLRLSKSAFT 1866
            RSKIVPGTYLPFGAGPRSCIGPRLALLEVKMALYHLVKDFNLQASEKTQIPLRLSKSAFT
Sbjct: 460  RSKIVPGTYLPFGAGPRSCIGPRLALLEVKMALYHLVKDFNLQASEKTQIPLRLSKSAFT 519
Query: 1867 MQAENGVWLELKARPKA 1917
            MQAENGVWLELKARPKA
Sbjct: 520  MQAENGVWLELKARPKA 536
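The 535/557 figure in this BLAST report can be accounted for from the alignment itself. The Python sketch below assumes the query coordinates are inclusive nucleotide positions in a translated search. It counts identities in one gapped block and then redoes the column arithmetic for the whole hit. The function name identity is hypothetical.

```python
def identity(query, subject):
    """Return (identical columns, total columns) for two gapped rows."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    same = sum(q == s and q != "-" for q, s in zip(query, subject))
    return same, len(query)

# The one block of this alignment that contains the 21-residue gap.
q = "SKLLYNSYPDAK*VCSTNDCRAYYDDEPSFSTRIIGYYELTKPTYMVRDPEMIKKIAIKD"
s = "SKLLYNSYPDAK---------------------IIGYYELTKPTYMVRDPEMIKKIAIKD"
block = identity(q, s)

# Whole-hit arithmetic: query spans nucleotides 247..1917, subject is 536 aa.
columns = (1917 - 247 + 1) // 3   # translated query length in codons
gap_columns = columns - 536       # columns where the subject has '-'
percent = round(100 * 535 / columns)

print(block, columns, gap_columns, percent)
```

With 557 columns, 21 of them gaps, one mismatch (the L/F column in the first block) accounts for the remaining difference between 536 and 535.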
AAGE02011007.1
1 diff to CYP9Jae9 (fourth P450 on this contig)
177820
MVEVNLFSALAVGAVILLLYHYIAKKYHYFLTKPIPCIKPTFLLGIFDMVVLKRVELVFG 177641
177640 SKLLYNSYPDAK (2) 177605
177541 IIGYYELTKPTYMVRDPEMIKKIAIKD 177461
177460
FDSFTDRTPVFGDAVPADSLFFNSLFSLRGQKWRDMRSTLSPAFTGSRMRHMAELVA 177290
177289
KCATSMTDFIHSEAKAGRRLEFNMKDTFSRFVCDAIASVAFGIEVDSFRDPENEFYKKGN 177110
177109
ESQKIHTFKSLATFVTLRFVPFLQKVFNFDFVDANVAGYFKKLILDNMDQRKKQGIVRND 176930
176929
LVNMLMETKKGALKYEEPDMQVSEGYATVEESHVGKSTHSRIWTDNELISQCFFFFFAA 176753
176752
FDNVSSILAFLSYELTVNQDIQRRLYEEIAVTESTLNGQPITYEALQKMAYLDMVVSEAL 176573
176572
RKYPTATLTDRYANKDYVFDDEEGLRFVIEKGKTIWIPMLALHHDPKYFPEPERFDPERF 176393
176392 SEDNRSKIVPGTYLPFGAGPRSCIGPRLALLEVKMALYHLVKDFNLQASEKTQIPLRLSK 176213
176212 SAFTMQAENGVWLELKAR 176159
>CYP9J2
TC64859 TC11586 TC5014 TC50571 AF329892 complete
50% to 9J4
96% to 9J1, 98% to AY064093
MVEVNLFSALAVGAVILLLYHYIAKKYHYFLTKPIPCIKPTFLL
GIFDMVVLKRVELVFGSKLLYNSYPDAKIIGYYELTKPTYMVRDPEMIKKIAIKDFDS
FTDRTPVFGDAVPADSLFFNSLFSLRGQKWRDMRSTLSPAFTGSRMRYMAELVVKCAT
SMTDFIHSEAKAGRRLEFNMKDTFSRFVCDAIASVAFGIEVDSFRDPENEFYKKGNET
QKIHTFKSLATFVTLRFVPFLQKVFNFDFVDANVAGYFKKLISDNMDQRKKQGIVRND
LVNMLMETKKGALKYEEPDLQVSEGYATVEESHVGKSTHSRIWTDNELISQCFFFFFA
AFDNVSSILAFLSYELTVNQDIQRRLYEEIAATESTLNGQPITYEALQKMAYLDMVVS
EALRKYPTATLTDRYANKDYVFDDEEGLRFVIEKGKTIWISMLALHHDPKYFPEPERF
DPERFSEDNRSKIVPGTYLPFGAGPRSCIGPRLALLEVKMALYHLVKDFNLQPSEKTQ
IPLRLSKSAFTMQAENGVWLELKARPKA
>CYP9J15
AY064093 complete
98% to 9J2 50% to 9J4
MVEVNLFSALAVGAVILLLYHYIAKKYHYFLTKPILCIKPTFLL
GIFDMVVLKRVELVFGSKLLYNSYPDAKIIGYYELTKPTYMVRDPEMIKKIAIKDFDS
FTDRTPVFGDAVPADSLFFNSLFSLRGQKWRDMRSTLSPAFTGSRMRHVAELVAKCAT
SMTDFFHSEAKAGRRLEFNMKDTFSRFVCDAIASVAFGIEVDSFRDPENEFYKKGNET
QKVHTFKSLTTFVTLRFVPFLQKVFNFDIVDANVAGYFKKLILDNMDQRKKQGIVRND
LVNMLMETKKGALKYEEPDLQVSEGYATVEESHVGKSTHSRIWTDNELISQCFFFFFA
AFDNVSSILAFLSYELTVNQDIQRRLYEEIAATESTLNGQPITYEALQKMAYLDMVVS
EALRKYPTATLTDRYANKDYVFDDEEGLRFVIEKGKTIWISMLALHHDPKYFPEPERF
DPERFSEDNRSKIVPGTYLPFGAGPRSCIGPRLALLEVKMALYHLVKDFNLQPSEKTQ
IPLRLSKSAFTMQAENGVWLELKARPKA
$$$$$$$
>AAGE01172381
AAGE01048909 TC52199 TC17152 TC22373 TC36113 72% to TC60679
575142964
494348902 trace archive seqs
join with
TC60874 TC20300 TC25446 TC48456
complete, 57% to CYP9J2, 57% to 9J1 56% to 9J15
MLEVNLFAAIAVGALILAVYHHISKRYQYFLSKPVPCMKPTFLVGNSGPMLTRKKDIA
SHIRTMYDTYPDAKMIGFYDLTKPVYLVRDPEVVKTMTVKDFEHFTDHTPTMTGTGEE
VSEKSLFGNSLFALRGQKWRDMRSTLSPAFTGSKMRHMFELVVECGQSMAEFLLSEAK
AGKRLEFEMKDIFTRFGNDVIATVAFGIKVDSMRDRENEFYMKGKQLLNFQRFTLMIK
FLLMRAMPALAEKLGADFVDAEAGKYFTGVIMENMKQRKAHGIVRNDMIHMLMEVRKG
ALKHEKGEQETKDAGFATVEESQVGKTTHSRIWKDNELVAQCFIFFLAGFDTLSTGLT
FLTYELALNPEIQQRLYEEVMETESNLDGKPLTYE
VLQQMKYMDMVISESLRKWPPGIVADRYCTKEYQFKDGP
GSFLIEKGTSLWIPTIAIHNDPRYYPNPDKFDPERFSDENKSKINPAAYIPFGVGPRNCI
GSRLALMEMKSVVYYLLREFSFEPTEKTQIPLKLTMSGFTLQGEKGVWLEFKPRSI*
AAGE02011007.1 Length=228841 use this seq
part of a large gene cluster with 6 genes and one pseudogene on this contig
(second P450 on contig)
Others are at 143937 (+)=CYP9Jae7, 164192 (-) = AAGE01021462, 177820 (-) = 9J2,
196721 (+)= 9J9v1 (6 diffs), 218249 (+) = 9J10
220106 (+) pseudogene N-term exon 1 only new
153975
MLEVNLFAAIAVGALILAVYHHISKRYQYFLSKPVPCMKPTFLVGNS
GPMLTRKKDIASHIRTMYDTYPDAK (2) 154190
154250
MIGFYDLTKPVYLVRDPEVVKTMTVKDFEHFTDHTPTMTGTGEEVSEKSLF
GNSLFALRGQKWRDMRSTLSPAFTGSKMRHMFELVVECGQSMAEFLISEAKAGKRLEF
EMKDTFTRFGNDVIATVAFGIKVDSMRDRENEFYMKGKQLLNFQRFTLMIKFLLMRAM
PALAEKLGADFVDAEAGKYFTGVIMENMKQRKAHGIVRNDMIHMLMEVRKGALKHEKG
EQETKDAGFATVEESQVGKTTHSRIWKDNELVAQCFIFFLAGFDTLSTGLTFLTYELA
LNPEIQQRLYEEVMETESNLDGKPLTYEVLQQMKYMDMVISESLRKWPPGIVADRYCT
KDYQFKDGPGSFLIEKGTSLWIPTIAIHNDPRYYPNPDKFDPERFSDENKSKINPAAY
IPFGVGPRNCIGSRLALMEMKSVVYYLLREFSFEPTEKTQIPLKLTMSGFTLQGEKGV
WLEFKPRSI* 155653
>AAGE01021462
82% to AAGE01172381 821673483 574004088 637743693 complete
AAGE01075209 only 4 aa diffs to 574004088
578
MEVDLFAAIAVGALILAVYHHLLKRYQYFLTKPVSCVKPSFPMGSSGVMLTRKRDIFSHI 399
398
QMMYNTYPDAK (2) 366
304
IMGFYDFTKPVYMIRDPEVIKRITVKDFDHFIDHTPSMTGQGEEPGENSLLGNTLFALR 128
127
GQKWRDMRSTMSPAFTGSKMRHMFELVAESGQSTAKFLLAEA 2
KARKRLEFEMKDTFTRFGNDVIATVAFGIKVDSMRDRDNEFYMKGKQLLNFQTFL
LKIKFIMMRAMPTLA
EKLGVDLLDAEAVKYFKGMILENMKQRKAHGIIRNDMIHMLMEVRKGALKHEKDEQDTKD
AGFATVEESQVGKTTHSRIWKDNELVAQCFIFFVAGFDTVSTGLTFLAYELALNPEIQQR
LYEEIIETETTLEGKSLTYEVLQKMKYLDMVVSEGLRKWPAGI
LGDRYCTKDYQYKDAAGSFVIEKGTSLWIPTIAIHNDPQYYPNPEKFDPERFSDENKSKI
NPFAYMPFGVGPRNCIGSRLALMEMKLIMYYLLREFSFEPTEKTQIPLKLVMSGFALQGE
KDVWLEFKPRAL*
AAGE02011007.1 = old AAGE01021462 (third P450 on this contig)
164192
MEVDLFAAIAVGALILAVYHHLLKRYQYFLTKPVSCVKPSFPMGSSGVMLTRKRDIFSHI 164013
164012 QMMYNTYPDAK (2) 163980
163918
IMGFYDFTKPVYMIRDPEVIKRITVKDFDHFIDHTPSMTGQGEEPGENSLLGNT 163757
163756
LFALRGQKWRDMRSTMSPAFTGSKMRHMFELVAESGQSTAKFLLAEAKARKRLEFEMKDT 163577
163576
FTRFGNDVIATVAFGIKVDSMRDRDNEFYMKGKQLLNFQTFLLKIKFIMMRAMPTLAEKL 163397
163396
GVDLLDAEAVKYFKGMILENMKQRKAHGIIRNDMIHMLMEVRKGALKHEKDEQDTKDAGF 163217
163216
ATVEESQVGKTTHSRIWKDNELVAQCFIFFVAGFDTVSTGLTFLAYELALNPEIQQRLYE 163037
163036
EIIETETTLEGKSLTYEVLQKMKYLDMVVSEGLRKWPAGILGDRYCTKDYQYKDAAGSFV 162857
162856 IEKGTSLWIPTIAIHNDPQYYPNPEKFDPERFSDENKSKINPFAYMPFGVGPRNCIGSRL 162677
162676
ALMEMKLIMYYLLREFSFEPTEKTQIPLKLVMSGFALQGEKDVWLEFKPRAL* 162518
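A quick consistency check on the coordinates in blocks like the one above: each numbered segment should span exactly three nucleotides per residue, with the stop (*) counted as one codon. The sketch below is an illustration only; the function name span_matches_residues is hypothetical and not part of the original analysis.

```python
def span_matches_residues(start: int, end: int, residues: str) -> bool:
    """Check that the inclusive span start..end holds 3 nt per residue.

    Works for either strand: minus-strand segments list the larger
    coordinate first, so the absolute difference is used.
    A trailing '*' (stop codon) counts as one codon.
    """
    span = abs(end - start) + 1
    return span == 3 * len(residues)


# First segment of the minus-strand block above (164192 down to 164013).
seg = "MEVDLFAAIAVGALILAVYHHLLKRYQYFLTKPVSCVKPSFPMGSSGVMLTRKRDIFSHI"
print(span_matches_residues(164192, 164013, seg))
```

A segment that fails this check usually points to a mistyped coordinate or a residue lost during assembly, so it is worth re-checking against the contig.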
>AAGE01005986
476365476 825769964 631521475 578081381 476406576 591384997
51% to TC52199 59% to 9J3 complete
MDFTFWGVMAAAAIGIGLIYRYMTRNYFYFADKPIPFLEPVFAIGNLGPLLMKKRDIFEHFRWLYNRFPNDK
(2)
IFGMFSMSDPVFMIRDPAMLKRIAVKDFDHFADHSGLGGDTELDNPHMLVLNTLVALRG
NKWRDMRATLSPAFTGSKMRQMFALIAECGQRMVEFYKGAEEGSRIEVEAKEMFSRFTND
VIATTAFGIEVDSFRQPENEIFSLGKAVMQPSGLLNTLKGIGYVLFPKLMVKMNVDFLSK
KDDQFFRGTIQETMRIRQEKSIFRPDMIELLIQAKKGNLKHSADKQSEVEAFSAAEESQVGRRSHDRTWTDD
ELIAQALIFFSAGFETVSTTLSFVAYELARNDDVQSRLYEEILETNRSLDGKILSYEALQ
AMPYMDMVVSETMRLWPIGTIVDRLCVKDYVYDDGQGCRFTIEKGRSVMGSVIGMHHDPK
YYPQPEKFDPERFSAENRRNINPDTYLPFGIGPRNCI
()
GSRFALMEMKAVVYYLLLNLSFDVTEKTQIPLKMQKSPSRFVSEKGIWIALKPRVTVV*
AAGE02011007.1 1 diff to AAGE01005986 (first P450 on this contig)
143916 MDFTFWGVMAAAAIGIGLIYRYMTRNYFYFADKPIPFLEPVFAIGNLGPLLMKKRDIFEHFRWLY 144110
144111 NRFPNDK (2) 144131
144498
IFGMFSMSDPVFMIRDPAMLKRIAVKDFDHFADHSGLGGDTELDNPHMLVLNTLVALR 144671
144672
GNKWRDMRATLSPAFTGSKMRQMFALIAECGQRMVEFYKGAEEGSRIEVEAKEMFSRFT 144848
144849
NDVIATTAFGIEVDSFRQPENEIFSLGKAVMQPSGLLNTLKGIGYVLFPKLMVKMNVDFL 145028
145029
SKKDDQFFRGTIQETMRIRQEKSIFRPDMIELLIQAKKGNLKHSADKQSEVEAFSAAEE 145205
145206
SQVGRRSHDRTWTDDELIAQALIFFSAGFETVSTTLSFVAYELARNDDVQSRLYEEILET 145385
145386
NRSLDGKILSYEALQAMPYMDMVVSETMRLWPIGTIVDRLCVKDYVYDDGQGCRFTIEKG 145565
145566
RSVMGSVIGMHHDPKYYPQPEKFDPERFSAENRRNINPDTYLPFGIGPRNCI (1) 145721
145788 GSRFALMEMKAVVYYLLLNFSFDVTEKTQIPLKMQKSPSRFVSEKGIWIALKPRVTVV* 145964
>476384387
587659684 832450214 519879482 82% to AY433038 54% to 9J5 complete
note: this seq is upstream of 9J6 on AAGE01000868 (4723-3066) on the opposite strand
MEINLELWIAVISIGILLYKWITRNNDYFHEKPIPSMAVKPFFGGIAPLVFKSFSMNGFISHIYQKYPNVK
(2)
VFGFFDALTPIFVVRDPELIKKITVKDFDHFIDHLPMFGNSENDNPYSIFGKTLFAL
TGKKWRQMRATLSPAFTGSKMRKMFELVIECSDSVAQFYKTQSNETHEVELTDLLTRF
GFDVIASCAFGIRMDSLRDRDNDFYNNGIKMRRFQRLSVAIRFVMFKFCPTLMGKLGIDV
IDRDQVRYFSALIKDAVKQ
RQTKDIIRHDMIQLLIQARKGTLKHQEEKEVEEGFATVKESSIGKTNVTFNMTDNEMI
AQAFVFFLAGFETVSTALTFLIHDLVMNKDVQHRLYEEVASTHEYLQGKHLNYDTLQKMK
YMDMVVSESMRMRPAGPFMDRVCIHDYDLDDGQGLKFTIDKGTAVWIPVQGIHMDPKYYP
NPERFDPERFNDENKAAINPMTYLPFGIGPRNCIGS
RFALMEIKAIVYYLLLHFSFEANRKTQIPLKLRKGFTVVAAEGEVWIDLKAR*
>CYP9J6
AY433038 AAGE01000868 (10298-11954) 520111339 528815988 616367213
644315757 56% to AY431970 56% to 9J5 complete
MEVNLGAVIVILSTVILIYKWITRNNDYFHEKPIPSMAVKPLFGSTGPLILKQFSLHGFINHIYQKYPNAK
(2)
VLGIFDALTPIFVVRDPELIKKIAVKDFDHFIDHRPMFGNSENDN
PYSIFGKTLFALEGQKWRDMRATLSPAFTGSKMRKMFELVIECSDSVAQYYVKQSKKVVE
VELTDMCTRFGSDVIATCAFGIKMDSLRERDNEFYDNGKKMMRFERLSVALRMFAFKFFP
TLMGQMGIDIIDREQAKYFSALIMDAVRQRQTKGITRPDMIQLLIQARKGTLKHQEEKEV
EEGFATVKESSIGKTNVSFNMTDNEMIAQAFVFFLAGFETVSTTLTFLIYDLVV
NKDVQQRLYEEIVATNDSLQG
KLLNYDTLQKMKYLDMVLSESMRIRPAAATLDRLCVRDYEVDDGQGLKFTINKGTAVWIP
TQGIHMDPMYYPNPERFDPERFNDENKATIDPMTYLPFGVGPRNCIGSRFALMEIKAIVY
YLLLHFSFEANRKTPIPLELRKGFTIVAAEGEVWIDMKAR*
>AAGE01063458
263503628 49% to TC52199 48% to CYP9L3 476324061 DR747831
520184164 820336301 223394438 51% to CYP9J1 476322739 complete
MEVNLLLLLIIVGILGVIYRQVKKHYDYFHDKPIPSMATVPLLGSTGPLMTKRCTFNDFIQTIYYKYPSAKV
FGLFDMTTKMFVLRDPEVIKKITVKDFEYFVDRRPLFGANKEDDGNENIL
FNKTLVGMVDQRWRDMRAILSPAFTGSKMRAMFELIEQYCTQMVPILKEQSAESGYVDYE
MKDFFSRVANDIIATCAFGLQVESLKSRDNEFYTMGKQMMNFNRFIVLLRVMGLRFFPSL
MIKMGVDIVDREQNQYFSKIIKEAVRARETHGIVRPDMIHLLMQARKGTLKHQQETTEST
AGFATVEESDVGKSVVSKTMSEPEFIAQCLIFFLAGFDTVSTGMLFMAYELDLNPNIQQK
LYEEIAQTNKELGGKPATYDTLQKMKYMDMVVS
ESLRMWPVAAFDRKCGRDYVLDDGAGLKFTIDAGTCIWVPVYGIHRDPKYYPNPDKFEPE
RFSDENRGKIDMTMYMPFGMGPRNCIGSRFALMEIKAIMYALLLNFSIERNEKTQVPLKL
VKGFVGLQVENGLHLRFKKRK*
>CYP9J9v1
AAGE01125862 AAGE01449675 TC60679 TC19466 TC24864 TC43056
59% to CYP9J1 AY431945 DR747470 91% to TC60950,
90% to TC60951 588932055 complete
MVEVDLYVAVAIGAIILLLYHYGSKKYEYFLTKPIPALKPTFLLGNT
GAMMFRRRDVSAHVKLLYNSLEGYK
(2)
VAGFYDLMKPIYM
LRDPEVIKQIAVKDFDYFMDHTPTMTNNRADDEVGGDSLFGNSLFALRGQK
WRDMRATLSPAFTGSKMRHMFELVADCAKSMAEFFKSEAAAGKKLEYEMKDTFSRFGNDV
IATVAFGIKVDSLRDRDNEFYMKGKNMLNFQSVSVMFKFLLLRAFPKLSQKIGVDFVDST
LTEYFKGMIVDNMKQRDAHNIFRNDMIQMLMEVRKGSLKHQKDEKETKDAGFATVEESNV
GKSTINRVWTENELIAQCFLFFLAGFDTVSTCMTFLTYELMLNPDIQQRLFDEVMETEES
LNGKPLTYEVLQRMEYMDMVVSEALRKWPPAVVSDRFCVKNYMYDDGKGTRFPIEKGQTM
WIPTIAIHSDPRYYENPEKFDPERFNEENRSKIDTGAYLPFGVGPRNCIGSRLALMEVKV
IIYNLLKEFSLEASEKTQIPLKMAKNFFALQAENGVWLELKPRKH*
AAGE02011007.1 6 diffs to 9J9v1 (fifth P450 on this contig)
196721
MVEVDLYVAVAIGAIILLLYHYGSKKYEYFLTKPIPALKPTFLLGNTGAMMFRRRDVSAH 196900
196901 VKLLYNSFEGYK (2) 196936
196999
VAGFYDLMKPIYMLRDPEVIKQIAVKDFDYFMDHTPTMTNSKADDEVGGDSLFGNSLFA 197175
197176 LRGQKWRDMRATLSPAFTGSKMRHMFELVADCAKSMAEFFKSEAAAGKKLEYEMKDTFSR 197355
197356
FGNDVIATVAFGIKVDSLRDRDNEFYMKGKNMLNFQSVSVLFKFLLLRAFPKLSQKIGVD 197535
197536
FVDSNLTEYFKGMIVDNMKQRDAHNIFRNDMIQMLMEVRKGSLKHQKDEKETKDAGFATV 197715
197716 EESNVGKSTINRVWTENELIAQCFLFFLAGFDTVSTCMTFLTYELMLNPDIQQRLFDEVM 197895
197896
ETEESLNGKPLTYEVLQRMEYMDMVVSEALRKWPPAVVSDRFCVKNYMYDDGKGTRFPIE 198075
198076
KGQTMWIPTIAIHSDPRYYENPEKFDPERFNEENRSKIDTGAYLPFGVGPRNCIGSRLAL 198255
198256
MEVKVIIYNLLKEFSLEASEKTQIPLKIAKNFFALQAENGVWLELKPR 198399
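The "N diffs" notes compare an earlier assembly with the contig version of the same protein. A minimal sketch of such a count, assuming the two sequences are the same length and already aligned without gaps (count_diffs is a hypothetical helper, not the method used to compile this file):

```python
def count_diffs(a: str, b: str) -> list:
    """List (position, residue_in_a, residue_in_b) for each mismatch.

    Positions are 1-based. Assumes a gap-free alignment of equal length;
    sequences with insertions or deletions need a real aligner instead.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# End of exon 1: 9J9v1 versus the AAGE02011007.1 copy shown above.
old = "VKLLYNSLEGYK"
new = "VKLLYNSFEGYK"
print(count_diffs(old, new))
```

Running it on full-length sequences reports every substituted position, which makes it easier to confirm a stated diff count.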
>AAGE01065801
AY431970 60% to 9J5 494336054 641818007 578928176 complete
AAGE02029679.1 use this seq, change D to E
MEVELLHVGVLVAIVAFLYRWITRNNDYFHDKPIPSMAVTPFLGASGPLLLRKVTFNDFVQSIYNKYPGVK
(2)
VFGMFETITPFFVIRDPELIKQIGIKDFDHFVDHRPTFGLDDETAEHPKALFRKTLFSM
TGQRWKEMRATLSPAFTGSKMRQMFSLMSECCDEMMKHYLDKAKGSG
RVEVEMKDLLSRISINVIASCAFGIKVDCFKEQEHEFLYHGRKMMGFGRPIVIARMLAMR
VFPKFAAKFGIDLLDREQANYFTHVFQETIRARESHGYIRHDMIDLLLQARKGTLKYQEE
KDDQEGFATVQESDVGKADVSKSMTEAEMIAQCLIFFLGGFDTVSTCAMFTAYELVRNPE
VQHKLYEEIKQTEKELEGKPLSYDALQKMKYMDMVVSETLRMWPLAPATDRLCTQDYTID
DGQGVRFTIDKGTCVWFPAAGLHHDPQYFPNPEKFDPERFNDENKR
NINLGAYLPFGIGPRNCIGSRFALMEVKAVMYFILLKFSFVRGAKTQIPMQLRKGFTNLG
PENGMHVELKLR*
>AAGE01006393
81% to AAGE01065801 83% to AY431970 complete
AAGE01400897
84% to AAGE01065801 834914646 N-term 578891344 C-term
749
MEVNLVYLAVVLAVIAYLYRWITRNNDYFHDKPIPSMAVRPFLGASGSLVLRKVSFPDFI 570
569
QTIYNKFPGVK (2) 537
474
VFGMFETITPFFVIRDPELIKQIAIKDFDHFVDHRPTFGLFDEESAEH 331
330
PNALFRKTLFSMTGQRWKEMRATLSPAFTGSKMRLMFSLMGECFDGMIDHYVKKAKTSGR 151
150
VEVEVKDMMSRVSINVIASCAFGIKVDCFKDQDHEFL 181
182
RHGKKMMDFARPIVIARMMAMRVFPKLSSRFGIDLLDPEQARYFTQVFQETIKARESHGT 361
362
VRNDMIDLLLQARKGTLKFQEEKNDQEGFATVQESDMGKVEVMKHITESEMIAQCLVFFL 541
542
GGFDTVSTCAMFMAYELVRSPEVQQKLYEEVLETSKELAGKPLSYDALQKMKYMDMVVSE 721
722 TLRIWPLAPATDRLCTKDYTIDDGQGLKFTIDKGTCVWFPAAGLHHDPQYFPNPERFDPE
901
902
RFNDENKRNINLGAYLPFGIGPRNCIGSRFALMEVKAVMYYTLLKFTIVRSAKTQIPMQL 1081
1082
RKGFTNLGPEKGMHVELKLR*
AAGE02029680.1 same as above, use this seq
86528
MEVNLVYLAVVLAVIAYLYRWITRNNDYFHDKPIPSMAVRPFLGASGSLVLRKVSFPDFI 86707
86708 QTIYNKFPGVK (2) 86740
86803
VFGMFETITPFFVIRDPELIKQIAIKDFDHFVDHRPTFGLFDEESAEH 86946
86947
PNALFRKTLFSMTGQRWKEMRATLSPAFTGSKMRLMFSLMGECFDGMIDHYVKKAKTSGR 87126
87127
VEVEVKDMMSRVSINVIASCAFGIKVDCFKDQDHEFLRHGKKMMDFARPIVIARMMAMRV 87306
87307
FPKLSSRFGIDLLDPEQARYFTHVFQETIKARESHGTVRNDMIDLLLQARKGTLKFQEEK 87486
87487
NDQEGFATVQESDMGKVEVTKQITESEMIAQCLVFFLGGFDTVSTCAMFMAYELVRNPEV 87666
87667
QQKLYEEVLETSKELAGKPLSYDALQKMKYMDMVVSETLRIWPLAPATDRLCTKDYTIDD 87846
87847 GQGLKFTIDKGTCVWFPAAGLHHNPQYFPNPERFDPERFNDENKRNINLGAYLPFGIGPR 88026
88027
NCIGSRFALMEVKAVMYYTLLKFTIVRSAKTQIPMQLRKGFTNLGPEKGMHVELKLR 88197
>AAGE01179692
AAGE01102574 AAGE01259804 (3 aa diffs)
476398393
616358813 575550118 584317339
67% to TC60679 in 9J fam 53% to 9J4 63% to 9J9 complete
AAGE01266366
parts of two genes
MAAAVLVAVLLFCRYVAKKYQYFLTK
PVPCVKPTFLLGSSGPTIFRKVDVATHFKKIYDVFPQAP
VIGFYDFTTPMYLLRDPEMIKKVSIKDFDYFTDHVPMMPTDAEKEHNPDTLFGNT
LLSLRGQKWRDMRSTLSPAFTGSKMRHMFELVAECGRSLVEHFKAEAAAGRTMEHEMKET
FSKVGSDLIATLAFGIKVDSLREPENVFYANGKKMLNLKSLATFVKFLLITFVPRLMRWLKVDVLNGQSAAY
FKRIILDNMEQREAHKILRNDMIQILMEVRKGTLQHQKEEKDTKDAGFATVEESQVGKSS
HSRVWTENELVAQCLLFFLAGLDTISTCMT
FLTYELTVDPDIQQRLYEEITETYKSLNGKPLSYDVLQRMQYMDMIVSETLRKWPPGVIS
NRYCNKNYLYDDGRGTQFVIEKGQVILIPSYCIQRDPRYFPDPDRFDPERFNEANRAQIN
TSAYIPFGVGPRNCIGSRLALMEVKCMVYYLLKDFELIATGKTQIPERIARDSFGLHPEK
GVWIEFKPRSSQDS*
>AAGE01198792
(parts of two genes, 1-804 is N-term of AAGE01339434,
1707-973 is C-term of another gene) 95% to 476398393
574155449
638535809 complete
MLKVDLFMAAALLAAVLLFCRYVAKKYQYFLTKPVPCVKPTFLLGSSGPTIFRKVDV
ATHFKKIYDVFPQAP
(2)
VIGFYDFTTPMYLLRDPEMIKKVSTKDFDYFTDHVPMMPTDAEKEHGPETLFGNTLL
SLRGQKWRDMRSTLSPAFTESKMRHMFELVAECGRSLVEHF
QTEAAAGRTMVHEMKETFSKVGSDLIATLAFGIKVDSLREPENVFYANGKKMLNLKSLAT
FVKFLLIMFVPRLMRWLKVDVLNGQSAAYFKRMILDNMEQREAHKILRNDMIQILMEVRK
GTLQHQKEE
1707
KDTKDAGFATVEESQVGKSSHSRVWTETELVGQCLLFFLAGLDTISTCMTFLTYELTVDP 1528
1527
DIQQRLYEEITETYKSLNGKPLSYDVLQRMQYMDMIVSETLRKWPPGVISNRYCNKNYLY 1348
1347
DDGRGTQFVIEKGQVILIPSYCIQRDPRYFPDPDRFDPERFNEANRAQINTSAYIPFDVG 1168
1167 PRNCIGSRLALMEVKCMVYYLLKDFELIATGKTQIPERIARDSFGLHPEKGVWIEFKPRNSPDF*
973
>AAGE01102574
possible pseudogene fragment upstream of AAGE01179692
AAGE02011008.1
pseudogene piece
1906
T*LII*KGHTAFISTYRSHHDP*YYENPEQFDPEWFNEAYRAHISTNIYIPFEFRPRNCI 2085
2086 GSRLALIEVKRM 2121
2121
VYQHLKDFETVTTEPIPVRIARDPLALHPEKGVRAELK 2234
AAGE02011008.1 use this seq (first P450 on contig)
Note: 6 P450 genes in this contig at 2k, 4k = AAGE01339434, 6k = 9J8,
16k = AAGE01007189, 19k = CYP9LaeP 494089659, 31k = 9Jnew, all (+)
2420 MAAAVLVAVLLFCRYVAKKYQYFLTKPVPCVKPTFLLGSSGPTIFRKVDVATHFKKIYDV 2599
2600 FPQAP (2) 2614
2669
VIGFYDFTTPMYLLRDPEMIKKVSIKDFDYFTDHVPM 2779
2780
MPTDAEKEHNPDTLFGNTLLSLRGQKWRDMRSTLSPAFTGSKMRHMFELVAECGRSLVEH 2959
2960
FKAEAAAGRTMEHEMKETFSKVGSDLIATLAFGIKVDSLREPENVFYANGKKMLNLKSLA 3139
3140
TFVKFLLITFVPRLMRWLKVDVLNGQSAAYFKRIILDNMEQREAHKILRNDMIQILMEVR 3319
3320
KGTLQHQKEEKDTKDAGFATVEESQVGKSSHSRVWTENELVAQCLLFFLAGLDTISTCMT 3499
3500
FLTYELTVDPDIQQRLYEEITETDKSLNGKPLSYDVLQRMQYMDMIVSETLRKWPPGVIS 3679
3680 NRYCNKNYLYDDGRGTQFVIEKGQVILIPSYCIQRDPRYFPDPDRFDPERFNEANRAQIN 3859
3860
TCAYIPFGVGPRNCIGSRLALMEVKCMVYYLLKDFELIATEKTQIPVSIARDSFGLHPEK 4039
4040 GVWIEFKPRSSQDS* 4084
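The coordinate-annotated blocks interleave contig positions, intron-phase markers such as (2), and residues. A rough sketch for recovering the plain protein sequence from such a block follows. It assumes the input contains only sequence lines (no accession or comment lines, whose capital letters would be mistaken for residues), and residues_from_block is a hypothetical name.

```python
import re


def residues_from_block(lines):
    """Join the residue runs from coordinate-annotated sequence lines.

    Intron-phase markers like (2), (2?) or () are dropped first, then
    every run of capital letters or '*' is kept; digits and spaces
    (the contig coordinates) are discarded.
    """
    parts = []
    for line in lines:
        line = re.sub(r"\([^)]*\)", "", line)  # remove phase markers
        parts.extend(re.findall(r"[A-Z*]+", line))
    return "".join(parts)


# Three lines taken from the AAGE02011008.1 block above.
block = [
    "2600 FPQAP (2) 2614",
    "2669",
    "VIGFYDFTTPMYLLRDPEMIKKVSIKDFDYFTDHVPM 2779",
]
print(residues_from_block(block))
```

The result can be pasted directly into a BLAST query or compared against the FASTA file referenced at the top of this document.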
>AAGE01339434
AAGE01406122 775439256 579949790 521969711 complete 61% to 9J10
AAGE01266366
parts of two genes
AAGE01198792
(parts of two genes, 1-804 is N-term of this gene,
1707-973 is C-term of another gene 95% to 476398393)
NABOD09TR NABOD09 TC54929 TC20193 TC31509 TC43101
TC5368 TC8501
MEVDLLSAFAVGCIVILIYHYASQKYLYFLTKPIPSLKPTFLVGNIGDIIFRTKDALTHINELYYAFPESK
(2?)
VVGFYELTKPVFMLRDPEVIKQITVKDFDHFMDRS
LPSANDRADTDQPVEGLFANSLVAFQGQKWKDMRSTLSPAFTGSKIRHMFDLVADCSRSM
VEHFRSEANAGRRLECELKDVFSRFCNDVIATVAFGIRVDSVRDPETEFYVKGKQLLDFQ
SPKIILKFLLFQTVPWLMRKLKVDFADADLADYFKGIIQ
DNMKQREVHGIVRNDMVQMLMEVRKGTLKHISDDRESKDSGFASVEESHFGKSTHSRAWT
DNELISQCFVFFIAGLDTVSSCLTFLTYELTLNPDIQKRLYEEVMDTERLLSEKPLSYEA
LQSMKYLDMVVSETLRKWPPTIDSDRYSTRDYLLDDGAGLKVPIEKGRSIYIPIVAIQND
PKYFPDPDRFDPERFSDENRSKIVPGTFIPFGAGPRNCIGSRLALMEVKVAVY
YLLREFSLERTERTDDPIRLTKKAIDLRTENGAWVELKPRKI*
AAGE02011008.1 use this seq 6 diffs to AAGE01339434 (second P450 on contig)
4256 MEVDLLSAFAVGCIVILIYHYVSQKYLYFLAKPIPSLKPTFLVGNIGDIIFRTKDALTHINELYYAF 4456
4457 PESK (2) 4468
4537
VVGFYELTKPVFMLRDPEVIKQITVKDFDHFMDRSLPSANDRADTDQPVEGLFANSLVAF 4716
4717 QGQKWKDMRSTLSPAFTGSKIRHMFDLVADCSRSMVEHFRSEANAGRRLECELKDVFSRF 4896
4897
CNDVIATVAFGIRVDSVRDPETEFYVKGKQLLDFQSPKIILKFLLFQTVPWLMRKLKVDF 5076
5077
ADADLADYFKGIIQDNMKQREVHGIVRNDMVQMLMEVRKGTLKHIGDDRESKDSGFASVE 5256
5257
ESHFGKSTHSRAWTDNELISQCFVFFIAGLDTVSSCLTFLTYELTLNPDIQKRLYEEVMD 5436
5437 TERLLSEKPLSYEALQSMKYLDMVVSETLRKWPPTIDTDRYSTRDYLLDDGAGLKVPIEK 5616
5617 GRSIYIPIVAIQNDPKYYPDPDRFDPERFSDENRSKIVPGTFIPFGAGPRNCIGSRLALM 5796
5797 EVKVAVYYLLREFSLERTERTDVPIRLTKKAIDLRTENGAWVELKPRKI* 5946
>CYP9J8
AAGE01187448 AAGE01142069 AAGE01118978
476375412
832374064 810104215 758886185 262894467 complete
AAGE01439874 N-term of 9J8 and C-term of AAGE01339434
9J8 is 609bp downstream of AAGE01339434
MLDPFLLAAFAAVIFLVYHLLNRKYQFFAERGIPYVKPTLLLGNGASVLLKKEDLLQNIQRTYDTFPNAK
(2)
IMGIFDFVKPIMMIRDPDAIKQIGVKDFDHFVDHTPLFTPADCEDVGTNSLFGNSLFA
LRGQKWRDMRATLSPAFTGSKMRHMFELVLDC
ARSTAEYFREEAKSGRTTEYEMKNVFSRFSTDVIGSVAFGIKVDSLREQDNDFFVKGKAM
LNFQNLKSLLKVIMLRSAPGLMNRLNVDITSPQMNAYFKDMIMDNMKQREINGIVRNDMI
NILMQVQKGALLHQKDEQDTKDAGFATVEESSVGKALHNRVWSENELVAQCFLFFLAGFD
TVSTCLTFVSYELLANPDVQQKLFEEIMAVEASLDGKPLSYEVLQKMQYLDQIISETLRL
WPPAPFVDRYCVKDYLFDDGQGTRVPIEKGQIVWFPITALHHDAKYFPEPNRFDPER
FSEQNRPKINPGAYLPFGVGPRNCIGSRFALMEVKAIVYHLVKNFTLERSGKSRVPLKLE
KSYIAMIVEGGMWLEFRPRA*
AAGE02011008.1 use this seq 4 diffs to CYP9J8 AAGE01187448 (third P450 on contig)
6554
MLDPFLLAAFAAVIFLVYHLLNRKYQFFAERGIPYVKPTLLLGNGASVLLKKEDLLQNIQRTYET 6748
6749 FPNAK (2) 6763
6825
IMGIFDFVKPIMMIRDPDAIKQIGVKDFDHFVDHTPLFTPADCEDVGTNSLFGNSLFAL 7001
7002
RGQKWRDMRATLSPAFTGSKMRHMFELVLDCARSTAEYFREEAKSGRTTEYEMKNVFSRF 7181
7182
STDVIGSVAFGIKVDSLREQDNDFFVKGKAMLNFQNLKSLIKVIMLRSAPGLMNRLNVDI 7361
7362
TSPQMNAYFKDMIMDNMKQREINGIVRNDMINILMQVQKGALLHQKDEQDTKDAGFATVE 7541
7542
ESSVGKALHNRVWSENELVAQCFLFFLAGFDTVSTCLTFVSYELLANPDVQQKLFEEIMA 7721
7722 VEASLDGKLLSYEVLQKMQYLDQIISETLRLWPPAPFVDRYCVKDYLFDDGQGTRIPIEK 7901
7902
GQIVWFPITALHHDAKYFPEPNRFDPERFSEQNRPKINPGAYLPFGVGPRNCIGSRFALM 8081
8082
EVKAIVYHLVKNFTLERSGKSRVPLKLEKSYIAMIVEGGMWLEFRPRA* 8228
>AAGE01007189
AAGE01035444 65% to 9J8 complete
3059 MFFALAIFAGLVLFCLYNVQQKYKYFESRGIPYVKPSFLLGNSAPLIFKKKDMLRHIQDLY
2877
2876 HTHPEAK (2) 2856
IMGLFDFTAPVWMVRDPEAIKQLAVKDFDHFS 2698
2697
DHTPIYTGGDVEDMGTDSLFGNSLLLLRGQKWRDMRATLSPAFTGSRMRLMFELVSECAQ 2518
2517
SMVDYFREEATAGKRLEYEMKDVFSRFSNDVIASVAFGIKVDSLREPDNEFFINGKELMN 2338
2337
FRNMKTVAKVLLMRMFPRLMIKLKADISSAEMNAYFRGMITDNMKQRQAHGIVRNDMINI 2158
2157
LMQVRQGALKNQKEDQETSNAGFAVVEESTTIGQPRDRVWSDNELAAQCFLFFIAGSET 1981
1980
VSTYLTFLAYELLINPEVQEKLFREIAEVERSLAGKPIGYDQLQAMKYMDMVVSENLRLW 1801
1800
PPAPFADRYCSKNYRYDDGQGTRVTIEKGQIVWFPTTALQHDPEYFPDPYRFDPERFSDQ 1621
1620
NRSKIKTGTYLPFGIGPRACIGSRLALLEVKVVAYHLVKHFKLVRSERSKVPLKLKSKMI 1441
1440 GMEVDGGVWLELEPRERS*
1384
AAGE02029680.1
Length=88206 note 9 P450s on this contig, use this seq
13173 MFFALAIFAGLVLFCLYNVQQKYKYFESRGIPYVKPSFLLGNSAPLIFKKKDMLRHIQDL 12994
12993 YHTHPEAK (2) 12970
12907
IMGLFDFTAPVWMVRDPEAIKQLAVKDFDHFSDHTPIYTGG 12785
12784
DVEDMGTDSLFGNSLLLLRGQKWRDMRATLSPAFTGSRMRLMFELVSECAQSMVDYFREE 12605
12604 ATAGKRTEYEMKDVFSRFSNDVIASVAFGIKVDSLREPDNEFFTNGKELMNFRNMKTVAK 12425
12424
VLLMRMFPRLMIKLKADISSAEMNAYFRGMITDNMKQRQAHGIVRNDMINILMQVRQGAL 12245
12244 KNQKEDQETTNTGFAVVEESTTIGQPRDRVWSDNELAAQCFLFFIAGSETVSTCLTFLAY 12065
12064
ELLINPEVQEKLFREIAEVERSLAGKPIGYDQLQAMKYMDMVVSENLRLWPPAPFADRYC 11885
11884 SKNYRYDDGQGTRATIEKGQIVWFPTTALQHDPEYFPDPYRFDPERFSDQNRSKIKTGTY 11705
11704
LPFGIGPRACIGSRLALLEVKVVAYHLVKHFKLVRSERSKVPLKLKSKMIGMEVDGGVWL 11525
11524 ELEPRERS* 11498
AAGE02011008.1 use this seq 1 diff to AAGE01007189 (fourth P450 on contig)
16655 MFFALAIFAGLVLFCLYNVQQKYKYFESRGIPYVKPSFLLGNGAPLIFKKKDMLRHIQDLYHT 16843
16844 HPEAK (2) 16858
16921 IMGLFDFTAPVWMVRDPEAIKQLAVKDFDHFS 17016
17017
DHTPIYTGGDVEDMGTDSLFGNSLLLLRGQKWRDMRATLSPAFTGSRMRLMFELVSECAQ 17196
17197
SMVDYFREEATAGKRLEYEMKDVFSRFSNDVIASVAFGIKVDSLREPDNEFFINGKELMN 17376
17377
FRNMKTVAKVLLMRMFPRLMIKLKADISSAEMNAYFRGMITDNMKQRQAHGIVRNDMINI 17556
17557
LMQVRQGALKNQKEDQETSNAGFAVVEESTTIGQPRDRVWSDNELAAQCFLFFIAGSETV 17736
17737
STYLTFLAYELLINPEVQEKLFREIAEVERSLAGKPIGYDQLQAMKYMDMVVSENLRLWP 17916
17917 PAPFADRYCSKNYRYDDGQGTRVTIEKGQIVWFPTTALQHDPEYFPDPYRFDPERFSDQN 18096
18097
RSKIKTGTYLPFGIGPRACIGSRLALLEVKVVAYHLVKHFKLVRSERSKVPLKLKSKMIG 18276
18277 MEVDGGVWLELEPRERS* 18330
>AAGE01004684
AAGE01021948 494089659 55% to CYP9L2 223518047 590305650 827533211
569625505
575383376 520166303 520595733 637720165 65% to 263503628 complete
possible pseudogene. This seq has a stop codon seen in 9 trace files
MDINSYYLLVTIVLLILIVLYRRVSKHYGYFSDKPIPSLTPIPLFGNMFPLFMKKYTFPEFIQMIYNRFPDAK
(2?)
LGMFDMSTRFVVLRDPELIKKVLVKDFEFFIDRRSLFGDSASESDSILITKTLL
LLTGQKWRDMRATLSPAFTGSKMRAMFELIVTYSDRMVGILKDQAGPVGYVDYE
MKECCSRIASDIIATCAYGLEVESLANRENDFYTMGKKMMNFGKTSFFVRLLLYSVFPKLMSKLQVDLFDGEQTRYFTEIIKDTVKARD*HGIVRPDMIHLLMQARKGVLKHHRETAEASAGFATVEESEVGKTAIGKTMT
DSEFVAQCLIFFIAGFEAISSQMSFMCYELATNPDIQQKLYEEIKETNKLLKGKPLTYDTL
QQMKYMDMVTSEALRMWSGPATDRKCVRDYVLDDGAGWKFPIEAGTCVMI
721
720
PSYAIHRDPKYYPNPDRFDPERFSEERRADINMTMYLPFGAGPRNCIGSRFALMEMKAIV 541
540
YGLLLNFSIERNEKTQVPLRLNKGFAPLAGEKGMHLRLKVRG* 418
AAGE02011008.1 Length=35048 use this seq; both of ours were wrong
19802
MDINSYYLLVTIVLLILIVLYRRVSKHYGYFSDKPIPSLTPIPLFGNMFPLFMKKYTFPE 19981
19982 FIQMIYNRFPDAK(2)
ALGMFDMSTRFVVLRDPELIKKVLVKD 20161
20162
FEFFIDRRSLFGDSASESDSILITKTLLLLTGQKWRDMRATLSPAFTGSKMRAMFELIVT 20341
20342 YSDRMVGILKDQAGPVGYVDYEMKECCSRIASDIIATCAYGLEVESLANRENDFYTMGKK 20521
20522
MMNFGKTSFFVRLLLYSVFPKLMSKLQVDLFDGEQTRYFTEIIKDTVKARD*HGIVRPDM 20701
20702
IHLLMQARKGVLKHHQETAEASAGFATVEESEVGKTAIGKTMTDSEFVAQCLIFFIAGFE 20881
20882
AISSQMSFMCYELATNPDIQQKLYEEIKETNKLLKGKPLTYDTLQQMKYMDMVTSEALRM 21061
21062
WSGPATDRKCVRDYVLDDGAGWKFPIEAGTCVMIPSYAIHRDPKYYPNPDRFDPERFSEE 21241
21242
RRADINMTMYLPFGAGPRNCIGSRFALMEMKAIVYGLLLNFSIERNEKTQVPLRLNKGFA 21421
21422 PLAGEKGMHLRLKVRG* 21472
AAGE02011008.1 use this seq 1 diff to AAGE01004684 (fifth P450 on contig)
pseudogene. This seq has a stop codon seen in 9 trace files
19826 MDINSYYLLVTIVLLILIVLYRRVSKHYGYFSDKPIPSLTPIPLFGNMFPLFMKKYTFPEFIQMIYNR
20005
20006 FPDAK (2) 20020
20081
ALGMFDMSTRFVVLRDPELIKKVLVKDFEFFIDRR 20185
20186 SLFGDSASESDSILITKTLLLLTGQKWRDMRATLSPAFTGSKMRAMFELIVTYSDRMV 20359
20360
GILKDQAGPVGYVDYEMKECCSRIASDIIATCAYGLEVESLANRENDFYTMGKKMMNFGK 20539
20540
TSFFVRLLLYSVFPKLMSKLQVDLFDGEQTRYFTEIIKDTVKARD*HGIVRPDMIHLLMQ 20719
20720 ARKGVLKHHQETAEASAGFATVEESEVGKTAIGKTMTDSEFVAQCLIFFIAGFEAISSQ 20896
20897
MSFMCYELATNPDIQQKLYEEIKETNKLLKGKPLTYDTLQQMKYMDMVTSEALRMWSGP 21073
21074
ATDRKCVRDYVLDDGAGWKFPIEAGTCVMIPSYAIHRDPKYYPNPDRFDPERFSEERRAD 21253
21254
INMTMYLPFGAGPRNCIGSRFALMEMKAIVYGLLLNFSIERNEKTQVPLRLNKGFAPLAG 21433
21434 EKGMHLRLKVRG* 21472
>CYP9Jnew
AAGE01553900 AAGE01064689 59% to 9J5 644306108 757010867 616348285 complete
AAGE02011008.1 (sixth P450 on contig)
30792 MEVNVLYLLIVVAVLAVIYRRITRFYEYFHDKPIPSMAAGPPFGSAGPLY
RKKYSFNDFIKMTYDKFPGAK (2) 31004
31067 VFGLCDMTTKLFVIRDPELIKKVTVKDFDYFVNRRATFGESIDDHDEMLFAKSLLALN 31240
31241
DQKWRDMRATLSPAFTGSKMRAMFELIEGYSARMVEILKEQSQAAGYVDYEMKDCFTRVA 31420
31421
NDIIATCAFGLQVESLKNRENEFYVMGKNMLNFNRVSIMFRIFGFNLFPGLMAKLGVDLI 31600
31601
DAEFGQYFSKIIKDAVHTRETRGIVRPDMIHLLMQAKKGALKSQYETTDANTGFATVEE 31777
31778
SEVGRSSIAKAITESEMIAQCFVFFLAGFDSVSSEMVFMAYELALNPDVQQRLYEEIVET 31957
31958
DKQLGGKPPTYDTLQKMQYMDMVVSESLRMWPAGAFDRKCDRDYVLDDGAGLKFTIDRG 32134
32135
AYVWIPVHGIHRDPKYYPDPDKFDPERFSESNRDNIDMTMYMPFGAGPRNCIGSRFALME 32314
32315 IKAIMYALLLQFRIERNEKTSVPLKLVKGFAGLNGEGGVHLRLTLRQ*
32458
>AAGE01123974
494309314 592077078 733946792 579386359 62% to CYP9J5
pseudogene
N-term missing, deletion in second exon
TPFLGASGPLMLRKVTFIEFIQSIYNKYPGVK
(2?)
VFGMFDTITPFFEIRD
[DELETION]
KFGIDLLDREQADYFTHVFQETIRTRESHGIIRHDMIDLLLQARKGTLKYHEE
KDDQEGFATVQESEVGKVEMSKSMTEAEMIAQCLIFFLGGFDTVSSCIMFTAYELVRN
PEVQQKLCEEIVQTDKELGGKPLSYDALQKMKYMDMVVSESLRIWPLAPATDRLCTKDYI
VDDGQGLKFTIDKGTCVWFPAAGLHHDPQYFPNPERFDPERFNDENKRNINLGTYLPFGI
GPRNCIGSRFALMEVKAVMYYILLKFTIARSAKTQIPVQLRKGFTNVGPDHGMHMELKLR
*
>494345849
G719P81FE4.T0 pseudogene 92% to 494309314
DLLVQTIHGYV*SIIRRKMLRKGFRHCQES*
LGKMEMSKSMTEAEMIAQYLIFFLDGFATVSSCIMFTAYEVVRNPEVQRKLCEEIVQTDK
ELGGKPLSYEALQKMKYMDMVVSESLRIWPLAPATDRLCTKDYIVDDGQGLKFTIDKGTC
VWFPAAGLHHDPQYFPNPEQ
FDPERFNDENK
XXNSGLGTYLPFGIDPK
>AAGE01253357
476363988 755013039 587425733 632907226 55% to CYP9J5 complete
MEVNLLLLATVITVFVYLYRLITKNNDYFHDKPIPSLKARPLLGSTGPLLLKQVTFADFVSYVYNKFPGVK
(2?)
VLGMFDTLTPFFVIRDPELIKQIAVKDFDHFMDHRPFFGESAESEEHPYALFKRVI
FALNGQRWRNMRATLSPAFTGRKMRLMFTLMVDCSERMLKHYESLMSSTGRMEVEIKDML
SRYGINVIASCAFGIDVDCFKDVDHEFMYHGRRMLQMGNPVVIAKMLFMRMFPNLAKKSG
MDVIPREQAVYFTKLIKETIRTRESQGIVRNDMIDLLLEARKGTLKYEEEREEVQEGFAT
VQESDVGKAQVTKAISEIDMIAQCLIFFIAGFESVSTTSMFMIYE
LILNPEIQQKLYEEVEQTYKQLGDKLLTYDALQSMKYMDMVVSETMRKWPLSPIGDRICV
RDYTLDDGQGLRFTIDKGTCVWFPIHGLHHDPQYYPNPDRFDPERFNDQNKGNIKMGTYL
PFGIGPRNCIGSRFALMELKAVMYHMLRKFSFHRSTNTRIPLKLRKGMNNVGTDEGMHVERIRRL*
>AAGE01015732
91% to 476363988 55% to 9J5 complete
2265
MEVNLLLLATVLTVFVYLYRLITKNNDYFHDKPIPSLKARPLLGSTGPLLLKQVTFSDFV 2444
2445
AYVYNKFPGVK (2) 2477
2538
VLGMFDTLTPFFVIRDPELIKQIAVKDFDHFMDHRPFFGESVESEEHPYALFKRVIFAL 2714
2715
NGQQWRNMRATLSPAFTGRKMRLMFTLMVDCSERMLKHYESLMSSTGQMEVEIKDMLSRY 2894
2895
GINVIASCAFGIDVDCFKDVDHEFMYHGTRMLQMGNPLVIAKMLFTRMFPKLANNWGMDV 3074
3075
IPREQAVYFSKLIKETIRTRESQGIVRNDMIDLLLEARKGKLKYEEEREEEQEGFATVQE 3254
3255
SDVGKAQVTKAISEVDMIAQCLIFFIAGFESVSANTMFMIYELILNPDIQQKLYEEVEQT 3434
3435
YKELGDKRLTYDALQSMKYMDMVISETLRKWPLTPVGDRMCVKDYVLDDGQGLRFTIDKG 3614
3615
TCVWFPIHGLHHDPQYYPNPDRFDPERFKDQNKGHIKMGTYIPFGIGPRNCIGSRFALME 3794
3795
MKALMYHMLRRFSFHRTANTQIPPKFRKGMNNFGTEQGLHVELRLRGQ* 3941
>AAGE01001411.1
3000-5000 region 10kb upstream of 9J7 complete
MDLDWTQLLAIVAIVVIIYRWLTGNHDYFHHKPIPSMTVRPIMGSTGPLLLKQCTFPEFIQSSYKKFAGAR
(2)
VFGLFDTNIPMYVICDPDLIKRIAVADFDHFMDHRPIFGASNSDHPNLLFEKT
LFALTGQKWKNMRSTLSPAFTGSKMRQMFKFVVDCSESMVRFYQSEPRGTSHEMKDVFSR
FANDVIATCAFGIEVDSLRKRNNEFYVHGSKMLRLTRLSVVARLLGYRFAPTLMGKLGLD
INDQEQNQYFSSLVKETVKIRDVQGIFRPDMVHLLMEAKKGTLHHQEEIEHNKGFATVEE
SAMTKMRSMNSMTEVELIAQCLMFFLAGFDTVSTCLTFTAYELALNPTIQDKLYEEIKRI
HEAMSGKSLDYETLQKMSYMDMVISEVLRKWPAIAALDRLCVQDYEMDVGNGLKFTIDRG
SGIWIPIHAMHHDPKYYPDPERFLPERFSDENKASINMGAYLPFGIGPRNCIGSRFALME
VKAIVYHMLLRFSFERTAKTQVPVEIVKGFAPLKPKDGVFLEFRPRDAV*
>CYP9J7
262902386 621799144 520524964 618123933 complete
AAGE01001411.1
13000-15000 region
MDTIFVLALVGLLLLILLVLLYRFLSRKNDYFLNKPIPSLPGPLLLGGTSPLMLFRVSFTDYVKTVYDSFPDAK
(2?)
VCGVMNTVIPLYIVRDPELIKKIAIKDFDHFADHRPVFGSDHGDHPNLIACKA
LFVLTGPKWKTMRATLSPAFTGAKMKFMFELIVECSEALVDYYRDQGAKEWDM
RTLFARFSNDVIATCAFGIKVNSSSDRDNEFYRRGKEMMVFTNFKTQLKIAGYLFTPWLM
NWFGIDLIKQEHSDYFAGLIRDTVRTREANGIIRPDMVHLLMQSRKGILKNQQ
EDDPEQEVSETTRSLPGPTMTESEMIGQCLFFFLAGFDTVSTALTFLAYELA
LNPDVQEKLSAEIAETHQSLNKRSIT
YEALHSMKYLDMVISESLRKWPSAPAVDRLCVQDYTLDDGQGLQFRMEKGIGIWIPIYGI
HRDPKYYPEPDKFDPERFSDQRKGDIQPGTYLPFGIGPRSCIGMRFALMELKCIVYYLLL
NFRLEKTERTEVPPVLEKGYVTLSAANGVWLKMVPK*
AAGE02029679.1
use this seq
77856
MDTIFVLALVGLLLLILLVLLYRFLSRKNDYFLNKPIPSLPGPLLLGGTSPLMLFRVSFT 77677
77676 DYVKTVYDSFPDAK (2) 77635
77577 VCGVMNTVIPLYIVRDPELIKKIAIKD 77497
77496 FDHFADHRPVFGSDHGDHPNLIACKALFVLTGPKWKTMRATLSPAFTGAKMKFMFELIVE 77317
77316
CSEALVDYYRDQGAKEWDMKDLFARFSNDVIATCAFGIKVNSSSDRDNEFYRRGKEMMVF 77137
77136
TNFKTQLKIAGYLFTPWLMNWFGIDLIKQEHSDYFAGLIRDTVRTREANGIIRPDMVHLL 76957
76956
MQSRKGILKNQQEDDPEQEVSETTRSLPGPTMTESEMIGQCLFFFLAGFDTVSTALTFLA 76777
76776
YELALNPDVQEKLSAEIAETHQSLNKRSITYEALYSMKYLDMVISESLRKWPSAPAVDRL 76597
76596
CVQDYTLDDGQGLQFRMEKGIGIWIPIYGIHRDPKYYPEPDKFDPERFSEQRKGDIQPGT 76417
76416
YLPFGIGPRSCIGMRFALMELKCIVYYLLLNFRLEKTERTEVPPVLEKGYVTLSAANGVW 76237
76236 LKMVPK* 76216
>AAGE01024220
39% to 9J8 40% to CYP329A1 Anopheles 826166409
note: has a P at I-helix T location like CYP329
There is a deletion and a stop codon in N-term exon
This may be the CYP329A1 pseudogene equivalent in Aedes
Change N-term to eliminate the stop codon
METEDLYWFSFILVTIVGFFTFKLMTKNR
ILSGQRSSLREAAFYLRKSG*GNK
4729
IFGFYNYLSPVYYIRDPELIRKLWINEF
4455 METEDLYWFSFILVTIVGFFTFKLMTKNRHFFRVRGVPFEKPHFIYGNLGEVTSGKLSSLELIASF 4652
4653 YQKFENER (2) 4676
IFGFYNYLSPVYYIRDPELIRKLWINEFNSFANHAYFLDESKDPI
LGNQLHLLKNEKWRQMRHTLTPVLSGQSVSSMSSLIRTNSLDLV
5853
DHLKASVDSELEFKGIFLKYVFNVIANCAFGLELNTFKDESDKFCTYGTALVYGNNPVQT 5674
5673
LKTMMFYLFPKMTTQMKVRLMEDEHAAYFTNLIGSTISEREKKNVNRADVIQMLHQANKG 5494
5493
ELKAEGQDDEVLQMKDFSKCKWNQEELIAQCIAFFGSGFEPLVNLLSFA 5347
5346
AYELAANPDIQQKLLSELEGSLRDDPVVSDTVDKLSYLNMVISETLRKWPASPSLDR 5176
5175
ECSKDYLLDDGGCRVQFRKGDTLWVSIWALHRDERNFPDPERFDPERFSEKNKASITPG 4999
4998
TYMPYGVGPRNCI (1)
4902
GTRLASLVAKITLVDLVRNFKLELGSRMVQPLRLSKTSYSMEPEGGFWLKMTPR* 4738
>CYP9J10
complete
AAGE01039952 AAGE01005096 494160094 72% to TC52199
754334205
821634863 803206067 637185165 TC60951 TC32891 TC38518
57% to CYP9J1, 90% to TC60679 98% to TC60950 (56% to Anopheles 9J4)
CYP9J10
TC60950 TC38519 60% to CYP9J4 98% to TC60951 (3 aa diffs)
98% to 9J10v2 (2 aa diffs) 98% to 9J10v1 (3 aa diffs)
MVEVDLYVALAVGAIVVLLYHYAAKKYEYFLTKPIPALKPTMLFGNTGPMMFRQRDVSSHLKMLYNTYEGSK
(2)
MIGFYDLMKPIYMLRDPEVIKQIAVKDFDYFMDHTPTMTNSNPEDEVGGDSLFGNSLFA
LRGQKWRDMRATLSPAFTGSKMRHMFELVADCAKSMAEFFKAEAAAGKTLEYEMKD
TFSRFGNDVIATVAFGIKVDSLRDRDNEFYLKGKAMLNFQSLSVLLKVLFLRAFPKLS
HKLGLDFVDSTLTEYFKQMIVDNMKQRAAHGIMRNDMIQMLMEVRKGSLRHQK
DEKETKDAGFATVEESNVGKSNINRVWTENELISQCFLFFVAGFDTVSTCMTFLTYELMLNQNIQQ
RLYDEVMETEKSLNGKPLTYEVLQKMEYMDMVVSEALRKWPPAVISDRFCVKNYMYDDGQ
GTRFLVEKGQTMWIPTIAIHSDPKYYENPEKFDPERFNEENRSKIDTGAYLPFGVGPRNC
IGSRLALMEVKVIIYNLLKDFSLESSEKTQIPLKMSKNFFVLQAENGVWLELKPRKR*
AAGE02011007.1 5 diffs to 9J10 (sixth P450 on this contig)
218249
MVEVDLYVALAVGAIVVLLYHYAAKKYEYFLTKPIPALKPTMLFGNTGPMMFRQRDVSSH 218428
218429 LKMLYNTYEGSK (2) 218464
218529 MIGFYDLMKPIYMLRDPEVIKQIAVKDFDYFMDHTPTMTNSNPEDEVGGDSLFGNSLFA 218705
218706
LRGQKWRDMRATLSPAFTGSKMRHMFELVADCAKSMAEFFKAEAAAGKTLEYEMKDTFSR 218885
218886
FGNDVIATVAFGIKVDSLRDRDNEFYLKGKAMLNFQSLSVLVKVLFLRAFPKLSQKLGLD 219065
219066
FVDSTLTEYFKQMIVDNMKQRDAHGIMRNDMIQMLMEVRKGSLRHQKDEKETKDAGFATV 219245
219246
EESNVGKSNINRVWTENELISQCFLFFVAGFDTVSTCMTFLTYELMLNQNIQQRLYDEVL 219425
219426
ETEKSLNGKPLTYEVLQKMEYMDMVVSEALRKWPPAVISDRFCVKNYMYDDGQGTRFLVE 219605
219606
KGQTMWIPTIAIHSDPKYYENPEKFDPERFNEENRSKIDTGAYLPFGVGPRNCIGSRLAL 219785
219786 MEVKVIIYNLLKDFSLVSSEKTQIPLKMSKNFFVLQAENGVWLELKPRKR*
219938
>AAGE02011007.1 pseudogene exon 1 new seq, last P450 on this contig
168bp downstream of 9J10, 51% to CYP9Jae4
220106
MLQVHMFLAATVVLLL*YSYSITT*RNIYEYSLSKPISCAKPTFLVGRNWSTSLCKADMT 220285
220286 LHFKKICVFFPDA 220324
>AAGE01088707
494098990 826090661 819721560 57% to 9J5 complete
MEVNLFYFGVLVAILGTLYYLLTKKHGHFLDKPIPSMAAKPILGSVSDLMLQRVPFSTFIQTLYDKYRGVK
(2)
VFGLFDMMTPTYVIRDPELIKQVAVKDFDHFADHVQVFGNSSYDHPNLLTGKTLFSLTG
LRWKTMRATLSPAFTGSKMRYMFELIVECTERAVRYYEKNALKSGPKVYEMKDVFSRFAN
DVIATCAFGLQIESSRDRDNEFFVNGSKMLDFSRPSVMLRIMGHQLVPWLMAFFGWDVID
EQQNTYFKTLILDAIREREHRGIVRPDMINLLIHAKKGTLKHQQENEHVPEGFATVQESE
VGTSSVTTVMTDVEMVAQCLIFFLAGFDTVSTSLLYASYELAINPEVQQKLYDEIQNTRT
ALNGKPLTYDAMQKMKYMDMVMSEVLRMWPPAPSTDRLCTKNYVMDEGNGVKYTIEKGTS
VWFPIHALHHDPNYYPQPEKFDPERFSDERKGSINAGAYLPFGIGPRNCIGSRFALAEVK
TILYYMLGSFSFERCSKTEVPPVLAKGFDVIPANGMHIEFKPRPKK*
AAGE02029679.1 2 aa diffs to AAGE01088707 use this seq
32148
MEVNLFYFGVLVAILGTLYYLLTKKHGHFLDKPIPSMAAKPILGSVSDLMLQRVPFSTFI 32327
32328 QTLYDKYRGVK (2) 32349
32424 VFGLFDMMTPTYVIRDPELIKQVAVKDF 32507
32508
DHFADHVQVFGNSSYDHPNLLTGKTLFSLTGLRWKTMRATLSPAFTGSKMRYMFELIVEC 32687
32688
TERAVRYYEKNALKSGPKVYEMKDVFSRFANDVIATCAFGLQIESSRDRDNEFFVNGSKM 32867
32868 LDFSRPSVMLRIMGHQLVPWLMAFFGWDVIDEQQNTYFKTLILDAIREREHRGIVRPDMI 33047
33048
NLLIQAKKGTLKHQQENEQVPEGFATVQESEVGTSSVTTVMTDVEMVAQCLIFFLAGFDT 33227
33228
VSTSLLYASYELAINPEVQQKLYDEIQNTRTALNGKPLTYDAMQKMKYMDMVMSEVLRMW 33407
33408
PPAPSTDRLCTKNYVMDEGNGVKYTIEKGTSVWFPIHALHHDPNYYPQPEKFDPERFSDE 33587
33588
RKGSINAGAYLPFGIGPRNCIGSRFALAEVKTILYYMLGSFSFERCSKTEVPPVLAKGFD 33767
33768 VIPANGMHIEFKPRPKK* 33821
>AAGE01194580
86% to 494098990 832396347 complete
AAGE01341824
89% to 494098990
543
MEVNLFYFGAIVAIFGALYYLLTKKHGYFHDKPIPAMGAKPILGSIGDLMLQRVPFNTFL 722
723 QAAYDKYSGVK (2) 755
816
VFGMFDLMTPTYVIRDPELIKQVGVKDFDHFVDHEQVFGNSSYDHPNLLTGKTLFSLTG 995
996
SRWKTMRATLSPAFTGSKMRYMFELIVECIERAVKYYEEETKKKGAQVYEMKDVFSRFAN 1175
1176
DVIATCAFGLQVESSRDRDNEFFVNGSKMVDFGKPSFILRLMGHQLVPWLMAFFGWDVID 1355
1356 GQQNTYFKRLIMDAIKEREHRGIVRPDMINLLIQAKKGTLKHQQENEQVPEGFATVQESE
1535
1536
VGKSTATTMMTDVEMVAQCLIFFLAGFDTVSTSLLYTSYELAVNPEVQKKLYDEIQNTRT 1715
1716 ALGGK
1730
PLTYDAVQKMK
YMDMVISEVLRKWPPIASTD
879
878
RVCTKNYVMDEGNGIKYTIEKGAALWFPTYALHHDPKYYPQPEKFDPERFSDERKGSINT 699
698
GAYLPFGIGPRNCIGSRFALAEVKTILYYMLGSFSFERCSKTEVPPVMPKGFDVIPVNGM 519
518 HIEFKPRPKG* 486
>AAGE01341824 89% to 494098990
1238
KGTLKHQQENEQVPEGFATVQESEVGKSTATTMMTDVEMVAQCLIFFLAGFDTVSTSLLY 1059
1058
TSYELAVNPEVQKKLYDEIQNTRTALGGKPLTYDAVQKMKYMDMVISEVLRKWPPIASTD 879
878
RVCTKNYVMDEGNGIKYTIEKGAALWFPTYALHHDPKYYPQPEKFDPERFSDERKGSINT 699
698
GAYLPFGIGPRNCIGSRFALAEVKTILYYMLGSFSFERCSKTEVPPVMPKGFDVIPVNGM 519
518 HIEFKPRPKG* 486
CYP3 clan
Four CYP9M related sequences and one new CYP9 subfamily, plus one pseudogene
>AAGE01023613
494247077 812172036 586045833 57% to 9M1 complete
MVLLDLLVVLIPIVSYLLYRWAVATYDFFEKRKIPYVKPYPFVGGLWPVFSGKLHPTDAAVLGYNLFPEN
RFSGFFAFRRPGYLIHDPALAKQIMIKDFDHFTDHMNTISVDVDPIFGRALFFMDGQRWR
HGRSGLSPAFTGSKMRNMFTLLSKYVEGAMQRLAQDAGQGKMELEIRDLFQK
(2?)
LGNDIITSISFGVEIDSVHNPNNEFFKRGKQLAATGGFQGLKFFFSLVVPDSVFKLFGI
RFLPKEAADFYVDVVSKTIKHREEYKIVRPDFIHLFVQARKNEL
KEETADDELKSAGFTTVEEHIEASTENSQYTDLDITAVAASFFFGGIETTTTMLCFALYE
LAGNKEVQQKLQAEIDSVRKELGGGSLTYEVLQKMKYLDMVVTETLRRWPPLGITNRVCV
KPYTFEDHEGTKVTIEKGQLIQIPVQSFHRDPSFFPDPYRFDPERFSEENKHKINQDAFL
PFGSGPRNCIGSRLALMQAKCLLYYLFSAFSLEYSDKMDVPIKLNKMSLTYTAKNGFWFN
LLPKKVAV*
>AAGE01008959 54% to 9M1 74% to 494247077 complete
6014
MGVLEWLAVFVPIVTYLLYRWSTATYDYFREKKIPFVKPYPLFGSLWPIFSGKLHPVDAT 6193
6194 ILGYDMFPGRRFSGFFTFRTPSYLVHDPALAKQVLIKDFDHFTDHTSTILPDVDPVLGRN
6373
6374
LFFMDGQRWRHGRSGLSPAFTGSKMRNMFVLLSNYVDGAMKRLAQDAGPGKMELELRDLFQK (2) 6559
6593
LGNDIITSISFGVDIDSIHNPNNKFYKRGQKVTATGGIQGFKVFLTTVIPGSVFKF 6784
6785
FGVKVLPKEAADFYVDVISKTVKQREEYKIVRPDFIHLFMQARKNELKEDKADEELKDAG 6964
6965
YSTVEEHLQSTTKNNQYTDLDIAAVAVSFFFG 7060
7060
GIETTSSVLSFVLYELCLNPAIQHKLQEEIDTVRAQLEGNPLSYEVLQKMKYMDMVVS 7233
7234
ETLRRWAPLGIVSRKCVKPYTFEDHDGTKVTVEKGHIIQIPLQSFHRDPNFFPDPYRFDP 7413
7414
ERFSDENKHKIKQDTYIPFGSGPRNCIGSRLALMQTKCVLFYLFANFSVEFSEKMDVPIK 7593
7594
LNKMALSYTAQNGFWFHFAARDVKT* 7671
>AAGE01012700 71% to 494247077 55% to 9M1 complete
2863
MFESLALIVPVAAFLVYLWSIATYNYFKKRKIPFVKPYPLIGGLWPALTGKVLPLEAATL 3042
3043
GYDMFPKHRFSGYFMFRNPEYLIHDPALAKQVMIKDFDHFTDHTSVFPVEVDPIIGRSLF 3222
3223 FMDGQRWRNGRSGMSPAFTGSKMRNMFTLLSKYADSAMQRLVEDAGKNKLELEIRDLFQK
(2) 3402
3464
LGNDIITSISFGLDIDSVHNPDSEMFKKGKQLAGTTGFQGFKFFLSMALPSSIYKLFGI 3640
3641
RLITKDVADFYLDIVTNTIKYREENNIVRPDFIHLFVQARKNELKEDKTDETLDSAGFTT 3820
3821
VQEHIKSSSENSKYSDFGITAVVASFFFGSIETTSTVLCFAMYELAANPEIQQKLQDEIE 4000
4001
LVKDQLNDSPLTYEVLQKMKYLDMVVSETLRRWPPLGTTNRVCVKPYTLEDYDGTQVTIE 4180
4181
KGQAVQIPIISYHHDPNYFPDPYRFDPERFSDESRDKINQDAFLPFGSGPRNCIGSRLAL 4360
4361
MQVKSLLFYLLTCFSVEFSEKMDVPIKLKKMSMTYTAQSGFWFNLVPKSVEV* 4519
>494569869
25% to 9M1 N-term 39% to 494247077 pseudogene of 494247077
LDLLDLVGVLIPIGSYPLDLWVLTSYYSFEKIEIPYVKPYPLVG
ELRPEFTNVLLPSYDTGIGHLLFPETDLP*
FFWIHQIACLSPYSPQAILINMIGTYLFSDFCGI*
SADVDPDFGRALFLTDGLKTRPGRSGINVIVWAYNMNMLAVCYMPLFHGSYQKQAEDARQ
CNMSIDYCDVFHL ()
RGSDVIHYNIRLDVHIDCVHVPFYDYYHKRASG*RLTGVIFWDLEF
>AAGE01026951
AAGE01099852 476324290 55% to 9M1 579013166
826063288 832528009 494192568 637071386 613990760 complete
METLVWIALVLLIIIFLIYRWSIACYDYFEKRNILY
VKPYPFFGGLWPVFCRKLHPTDATIMGYNMFPERRCSGLFTF
RNPAYDIHDPTLAKQIMVKDFDHFTDHMNTISADVDPILGRALFFMGGSRWRHGRAGLSP
AFTGSKMRNMFVLLSKHVDEAMRRLVEDAGEGALEVEIRELFQK
(2)
LGNDITTSISFGVEVDSVHNPGNTFLEM
GKLLIATSAFQGFKYLLSLVVPESVFKFFGVRFFPKEAADFYLDIVTETISHREKNKIVR
PDFIQLFVQARKNELKKDNTDDNFK
SAGFTTVDEYIESSTENGQYTDLDIAAVALSFFFGGIETTTTAICFAVYEIVLNATIKEK
LQTEIDSVKEGLEGRPLSYEILQQMKYLDMVVSEALRRWPPVGVTNRACVKTYAFEENDG
TTVTIEEGQVVHIPVQSFHRDSNYFPDPLRFDPERFSDENKHVINQDAFLPFGSGPRNCV
GSRLALMQAKCILYYFFCTLFDGLFQQNGPTDQTQDYVSLLRSAEWFVVSFDAEHGKVVKYKK*
>AAGE01015749
494133555 519945594 763120971
810094850 53% to 9M1 complete
MILLLLVVAVGYLIYRWSVATFDYFEKLNVPFLKPYPFFGALWPSLKGE
KSPTDATAEGYRLFPGNRFSGFFSFREPGYLIHDPELIKQIAIRDFDHFTDHANNVPLEV
DPFLGRGLFFTGGQRWKHGRTALSPAFTGSKMRNMFQLLSSYTDGAMKRLVKDAAGGKLEREMKDLFQR
(2?)
LGNDVMTSISLGFDTDSVHDPDNEFFQYGKRLSRTSGLQG
LRFFVLTLLPENILKVIGIRIIPSDIANFYNEVV
IKVIKERLEKNIVRPDFIHLMLQARKNELKADKTDEFLNDAGFSTVKEHLQSSAKNQ
IEWSDYDIAATSASFFFGGIESTTTLVCFALYEIALNHDVQQKLRAEVDATKLSLGDAKL
TYESMQQMKYMDMVITETLRKWPPFGVTNRRCTKAYSLENANGTKVTVHKGQVIFIPIYE
IQRDAQYYPNPERFDPERFSDQNRGNLNQDTYLPFGIGPRNCIGSRLTLMQAKCYLFYML
TCFEIQLSTKTDVPMQLDARSSALNAKNGLKMQLIPRGV*
>AAGE01236202
AAGE01528761 AAGE01574909 575351627 754305099 587660657 263512612
581727980 743856203 625109625 223413916
773058412 (exon 1) mate pair = 775439855 (exon 2)
832539269 578892595 complete 51% to 9M1
MMELLLLGAAALTAVCYLLYRWSTSTFGYFEKRSVPFGKPYPLLGALWPYLKGEKSPVDALCEGYRHFP
GCTYSGVFLFRSPCYLIHDPELIKKIAVRDFDHFADHANNVSLEVDPFMGRVLFFANGQR
WKQGRTALSPAFTGSKMRNMFGLVSEYTNGAVQRLVEDAEASGGKMERELNDLFKR
(2?)
LGHDAITSISLGTDIDSIREPENEFFAHGKELAKTTGLQGFRFFIMSLLPEKILRLSR
MRIVPEHLANFYHGVVSKVIKHRLDNGIVRPDFIHLLLQARRNELKTDKTDE
KFNDAGFATVQEHLQAPTKNPIEWTDYDIAATVATFFFGGAESTTALLCFTIYELALN
PHVQQKLLAEIDSVQKTVGTEKLTYESMQQMKYLDM
VISETLRKWPPFGVTNRRCTKPYQIQDVDGHSVTIEKGQVVFLPIQHIHRDPHFFPNPMR
FDPERFATENRDQLNQDAYLPFGAGPRNCIGSRLSLMQTKCFLYYLLSTFEVQLSNRTEV
PIEIDLKATGLNSKNGFWFHLIQRVK*
>AAGE01014192
813491936 639416242 762398872 complete
TC52960
TC20003 TC26029 TC39058 TC4763 TC7436
40% to CYP9J4 40% to 9A4 (new subfamily in CYP9)
AAGE02005788.1
4968-6548 no introns
ESTs
DV359961,DV294300,DV294302,DV359959
MEAFLLISALVGALILLYRYATAFANYFNQRGIKYRKPTFLLGNLGPILFQRTT
PVANLTDLYREFAGEKVYGFYEFRRPTIILRDLQLIKRVF
VKDFNHFTNHTAPVDEHMDSILGNGLISLEGQKWRDMRAMLSPMFTGNKIRHMVPLVGKC
AEDLCRFVERETDEVEWDVRELLAKCLVEVIGSCAFGIEVDSFNDPDNEFDRVAKYLMNQ
SDVRKVARFLLIMVFPKMCKQFGMELFDDKYKRLFRRLVSETMLKRESDGVSRPDLIQLLMLARRG
KLEADKDVEGESFAAANDYLETGTDDVKRSWSDDELTAQAVIFFAAGFDTTSTLLSFTLM
ELAIHPEIQDRLFEEIKSVQRSDSVISYEQIQSLEYLDAVISESLRKWPPLTATDRKCTK
DYLMVDPEDGSPMFSIEEGYSVWVPIYCFHHDPKYFPNPEKFDPDRFNRVNRHQLNPAAY
MPFGVGPRNCIGSRFALMSAKMILLRLLRSFRVEVCPKTDTTLQLSKTKMNMTLEKGHWV
YLKRRS*
CYP4 clan sequences
>AY433052
AAGE01072700.1 AY431937 88% to 4G16 complete
AAGE01141041.1
AAGE01223479.1 AAGE01094290.1
1517
MSATVAPADPVMANANIASPMNVFYFLLAPALLLWFIYWRISRQHMLKLAEKIPGPPGLP 1338
1337 LLGNALELIGTSH (1?)
1379 SVFRNVIEKGKDFNQVIKIWIGPKLIVFLVDPRDVELLLSSHVYIDKSPEYRFFKPWLGNGLLIST 1179
323
GHKWRQHRKLIAPTFHLNVLKSFIDLFNENSRLVVEKMHKEAGKTFDCHDYMSECTVEILL (1)
1240
ETAMGVSKKTQDQSGFDYAMAVMKMCDILHLRHRKMWLYPDLFFNMSQYAKRQVKLLDTI 1061
1060
HSLTRKVIRNKKAAFATGTRGSLATTSIKTAEFEKPKSNINTNSVEGLSFGQSANLK 890
889
DDLDVDENDVGEKKRLAFLDLLLESAENGALISDEEIKNQVDTIMFEGHDTTAAGSSFFL 710
709
SMMGIHQHIQDKVIQELDDIFGDSDRPATFQDTLEMKYLERCLMETLRMYPPVPIIARS
LKQDLKLASSDLVVPSGATIVVATYKLHRLETIYPNPNVFDPDNFLPERQANRHYYAFVP
FSAGPRSCV
320 (1)
255 GRKYAMLKLKVILSTILRNFRVISDLKEEDFKLQADIILKREEGFQIRLEPRQRKPKAAKA*
>AAGE01114834
52% to AY433052 same as TC67187 76% to 4G17N-term probable ortholog complete 80% to 4G17 full
length
AAGE01340100.1
same as AAGE01229939.1 85% to 4G17 C-term, probable ortholog
633767131
823361413 823353110
MVIFMTLVLVASALFHFWMISRRYVQLGNKIPGPRAYP
FIGNANMLLGMNHNEIMERAMQLSYIYGSVARGWLGYHLVVFLTEPADIEIILNSYVHLT
KSSEYRFFKPWLGDGLLISSGEKWRSHRKLIAPAFHMNVLKTFVDVFNDNSLAVVERMRK
EVGKEFDVHDYMSEVTVDILLETAMGSQRTSESKEGFDYAMAVMK
(2)
MCDILHSRQLKFHLRMDSVFNFTKIKQEQERLLGIIHGLTRKVVKQKKELFE
KNFADGKLPSPSLSEIIAKEESESKESLPV
ISQGSLLRDDLDFNDENDIGEKRRLAFLDLMIETAKSGADLTDEEIKEEVDTIMFEGHDT
TAAGSSFVLCLLGIHQDVQDRVYKEIYQIFGNSKRKATFNDTLEMKYLERVIFETLRMYP
PVPVIARKVTQDVRLASHDYVVPAGTTVVIGTYKVHRRADIYPNPDVFNPDNFLPERTQN
RHYYSYIPFSAGPRSCV
(1)
GRKYAMLKLKVLLSTILRNYRVVSNLKESDFKLQGDIILKRTDGFRIQLEPRV*
>CYP4C38? exon 1 AAGE01133681.1
587572087 complete
These two pieces are probably from one gene, since no other
closely related sequences were found. 66% to 4C36
784 MSELTTFIYGILVFLIFAPFLQWWVKRARLVQIIDKIPGPKAYPFIGTTYTFFGKKHY
(1) 611
>CYP4C38
N-term AAGE01207392.1 AAGE01470307.1 71% to 4C27
824335234
761357490 744250376 592527729
570727647
754993699 585845687 593920597 613947338
594452687
575404595 749489367 579218945 825227784
AAGE01009885.1
TC66432 Length = 995 71% to 4C27 Anopheles
AAGE01207392.1 matches the N-term part of TC66432
parts are on AAGE02022591.1, AAGE02022592.1, AAGE02022593.1
use these seqs
456
ELFYIIDERTRRYPDIHRIWTGMRPEIRISKPEYVETIIGASKHMEKSHGYDFLFDWLGEGLLTSK 259
302
GERWFQHRKLITPTFHFNILDGFCDVFAEQGAVLAERLEPFANTGKPVDVFPFITKAALDIIC (1) 490
694
ETAMGVKVNAQTGGENNYVNAIYR (2) 762
822
MSEIFVDRSIKPWLHPEFIFKRTEYGRQHKKALDIVHGYTKK (0) 947
VIRDRKEALQVKENSTGAGDTGEDLYFGTKKRLAFL 227
228
DLLLEGNAKHKQLTDDDVREEVDTFMFE (0)
GHDTTTAGMSWALFLLGLHPDWQDRVHQEIDS 407
408
IFAGSDRPATMKDLGEMKLLERCLKETLRLYPSVSFFGRKLSEDVTLGQYHIPAGTLMGI 587
588
HAYHVHRDER (2?)
FYPDPEKFDPDRFLPENTEHRHPFAYIPFSAGPRNCIGQKFAILEEKS 761
762
IVSSVLRKFRVRSANTRDEQKICQELITRPNEGIRLYLEKRQ*
>Exon 1 of 4C25 ortholog AAGE01102043.1 80% to 4C25 complete
Exon 2 of 4C25 ortholog AAGE01326257.1 AAGE01078331.1 83% to 4C25
515
MIEATVKSSFVLSKVAKMLSYFSPITIILATMIAGAIYVYNKRRARLVKLIEKIPGPASMPLIGNSLHINVDHD 294
(1?)
EIFNRIISIRKLYGRQQGFSRAWNGPIPYVMISKASAVE (0) 935
PILGSPRHIEKSHDYEFLKPWLGTGLLTSQGKKWHPRRKILT 1504
1505
PAFHFKILDDFVDIFQEQSAVLVQRLQRELGNEEGFNCFPYVTLCALDIVC (1) 1657
1714 ETAMGRLIHAQKNSDSDYVKAVYQ (2)
IGSIVQNRQQKIW 1887
1888
LQPDFIFKRTEDYRNHQRCLSILHEFSNRVIRERKEEIRKQKQSNNNTINGNANNAVEAN 2067
2068 ILDGNNNAEEFGRKKRLAFLDLLIEASQDGTVLSNEDIREEVDTFMFEGHDTTSAAISWI 2247
2248
LLLLGAEPAIQDRIVEEIDHIMGGDRDRFPTMKELNDMKYLECCIKEGLRLYPSVPLIAR 2427
2428
KLVEDVQIEDYTIPAGTTAMIVVYQLHRDPAVFPNPDKFNPDNFLPENCRGRHPYAYIPF 2607
2608
SAGPRNCIGQKFAVLEEKSVISAVLRKYRIEAVDRRENLTLLGELILRPKDGLRIKISRR 2787
2788 E* 2793
>AAGE01094388.1
exon 2 4C like possible pseudogene fragment cannot extend
2051 FNRIECIKRLYTYQSGGYMRTWNG 1980
>AAGE01029369.1
72% to 4C26 62% to 4C25 complete
did blast with exon 1 of 4C26 to find best match
did blast with last 500bp of AAGE01029369 to find trace seq on (+)
759912013 mate pair = 759644271 will be downstream of AAGE01029369 and should match next contig,
possibly with N-term exon. Contig match = AAGE01001656.1 (-)
over 15kb, but no P450 seq. Might be a short contig between these
AAGE01030574.1 exon 2 4C like
586027613 matches first 500 bp on (-)
mate pair = 586024059 matches AAGE01059591.1 (+) no P450 seq
repeat with first 500bp of AAGE01059591
600013440 matches on (-) mate pair = 600014884
this matches two contigs AAGE01312028.1 (-) and AAGE01029369.1 (+)
this one has a P450 seq that is 4C like and complements this exon 2 seq. Join them
Now use last 500bp of AAGE01030574 to find a trace file that matches on (+)
636183786 mate pair = 637148886 matches AAGE01001355.1
this is the same contig found in a search above going downstream from the N-term of
a 4C like P450. The intron must be more than 17kb
join exon 1 seq
AAGE01098344.1 best hit to 4C26 N-term
searched by megablast to get 585803103(+), mate pair = 585951518
searched WGS with this to find adjacent contig downstream
this = AAGE01001355.1 16kb, no P450 seq
148
MNHNIAAKIASLFSVLSPITTVILVVMVCAIITYKKKRARLVHHINRIPGPFMLPIIGNGLHVTLGCKD 354
4051
EFLDRVISAQKMYGRRIGMSRAWNGPIPYVMISKASAVE 3935
3210 PILSNPKLVEKSVDYDFMKPWLGNGLLTSRASVWHPRRKTLTPAFHFKILSEFVNIFHK 3034
3033 QALVMNEKLAEQLDNTAGFDIVPFTTLCALDIFC (1?)
2868 ETAMGCPVNAQKNSDSEYVRAHK (2?)
IGKIIRNRLQKVWLRPD 2686
2685
FIFKHTEDYRKHQECLQVLHNFSDRVVQERKTEIVAKRCQAEDLIDLNNNKVADETISCC 2506
2505 SKKQLEFLDLLIEGSLDGNGLTDLDVREEVDTFVIGGHDTTAAAMAWILLLLGSDQKIQD 2326
2325
RVIDEIDGIMNGDRDRRPTMQELNDMKYLECCIKEGLRLYPSIPLIARRLTEDVQVDDY 2149
2148
IIPSGTTTLIVVYQLHRDPSVFPNPDKYNPDNFLPENCSGRHPYAYIPFSAGPRNCIGQK 1969
FAILEEKMVLSTVLRKFRIEAVERREDVKLLGDLVLRPRDGLKIRVSRRL* 1816
1.295_1 AAGE01029369 Hils Version 15 diffs to blast file
AAGE02013631.1 exon 1, AAGE02013630.1 exon 1 exact duplicate,
AAGE02013629.1 exon 2, AAGE02013628.1 exons 3-5 use this seq
MNHNIAAKIASLFSVLSPITTVILVVMVCAIITYKKKRARLVHHINRIPGPFMLPIIGNGLHVTLGCKDEFLDRVISAQK
MYGRRIGMSRAWNGPIPYVMISKASAVEPILSNPKLVEKSVDYDFMKPWLGNGLLTSRASVWHPRRKTLTPAFHFKILSE
FVNIFHKQALVMNEKLAEQLDNTAGFDIVPFTTLCALDIFCETAMGCPVNAQRNSDSEYVRAHKLIGKIIRNRLQKVWLR
PDFIFKHTEDYRKHHECLQVLHSFSDRVVQERKAEIVAKRRQAEDLIDLNNNNESEELTSCCRKKQLAFLDLLIEGSLDG
NGLTDLDVREEVDTFVIGGHDTTAAAMAWILLLLGSDQKIQDRVIDEIDGIMNGDRDRKPTMQELNDMKYLECCIKEGLR
LYPSIPLIARRLTEDVQVDDYIIPSGTTTLIVVYQLHRDPSVFPNPDKYNPDNFLPENCSGRHPYAYIPFSAGPRNCIGQ
KFAILEEKMVLSTVLRKFRIEAVERREDVKLLGDLVLRPRDGLKIRVSRRL.
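The mate-pair walk described in the notes for this gene can be kept as a small table, which makes it easy to see when two independent walks land on the same contig (here AAGE01001355.1, the contig that sits between exon 1 and exon 2). The sketch below is only an illustration: the accession and trace numbers are copied by hand from the notes above, while the tuple layout and the function name are assumptions introduced for the example.

```python
from collections import defaultdict

# Each step: (query used for the search, trace hit, its mate pair,
#             contigs the mate pair matched).
# Values are transcribed from the walking notes; the layout is hypothetical.
WALK_STEPS = [
    ("AAGE01029369 last 500bp", "759912013", "759644271", ["AAGE01001656.1"]),
    ("AAGE01030574.1 first 500bp", "586027613", "586024059", ["AAGE01059591.1"]),
    ("AAGE01059591 first 500bp", "600013440", "600014884",
     ["AAGE01312028.1", "AAGE01029369.1"]),
    ("AAGE01030574 last 500bp", "636183786", "637148886", ["AAGE01001355.1"]),
    ("AAGE01098344.1 exon 1", "585803103", "585951518", ["AAGE01001355.1"]),
]


def contigs_reached_more_than_once(steps):
    """Map each contig hit by two or more walks to the queries that reached it."""
    reached = defaultdict(list)
    for query, _trace, _mate, contigs in steps:
        for contig in contigs:
            reached[contig].append(query)
    return {contig: queries for contig, queries in reached.items()
            if len(queries) > 1}


shared = contigs_reached_more_than_once(WALK_STEPS)
for contig, queries in shared.items():
    print(contig, "reached from:", ", ".join(queries))
```

A contig reached from both the exon 1 side and the exon 2 side is the kind of evidence used above to join the two pieces and to estimate the intron size.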
>476414268
92% to Aedes albopictus AY971511 complete
760814858
568935720 581452704 754413849 580048410
walked
upstream to 531423840
walked to
529070673
walked to
824339230 mate pair = 823396717 matches C-term
AAGE01143020.1
63% to 4C28 AAGE01462557.1 62% to 4C37
AAGE01324666.1
exon 2 4C like same as 476414268
supercontig
1.295 frame = -
177175
MLKEPLLLVITIASQLLHAVKEFPLPATVLLGVVIVVYLFAHADRDQLKSLLRINGAKDG 176996
176995
SKKSVKFYLNQLPGPQCIPLLGNSLMMATDRE (1) 176900
DMFNRLTTARKLYGRKQGICRIWNGRTPYVLISKAEPVERILSSSVNIEKGRDYGFLRPW
546
LGNGLLTCPGSRWYKRRKALNPTFNYKMLSDFLEVFNRQAQTMVRLMEKELNRENGFN
CTRYATLCSLDILCETAMGYPIQAQEQFGSDYVKAHEE
(2)
IGRIMLERLQKIWLHPDFIYKRTNFYKRQSECLKILHGFSENVIKQRRLQRDASLANKHDEDPSI
EIGRKRQLAFLDLLLEATQDGQPLSDRDIRDEVDTFILGGHDTTATAIGWLLYL
LGTDPQVQDRVFEEIDSIMGQDRDRPPTMIELNEMKYLECCIKEALRLFPSIPLIARKLT
ESVNVGDYTIPAGTNAVIVVYQLHRDTQIFPNPDKFNPDRFLPENSQGRHQY
AYIPFSAGPRNCIGQKFGLLEEKAVAVAVLRKYRITSLDRREDLTLYGELVLKSKNGL
RISISQRQ*
AAGE02013627.1 exon 1, AAGE02013626.1 exons 2-3
use this seq (3 diffs)
12959
MLKEPLLLVITIASQLLHAVKEFPLPATVLLGVVIVVYLFAHADRDQLKSLLRINGAKDG
SKKSVKFYLNQLPGPQCIPLLGNSLMMATDRE (1) 12684
13019
DMFNRLTTARKLYGRKQGICRIWNGRTPYVLISKAEPVERILSSSVNIEKGRDYGFLRPW 12840
12839
LGNGLLTCPGSRWYKRRKALNPTFNYKMLSDFLEVFNRQAQTMVRLMEKELNRENGFNCT 12660
12659 PYATLCSLDILCETAMGYPIQAQEQFGSDYVKAHEE (2) 12552
12391
IGRIMLERLQKIWLHPDFIYKRTNFYKRQSECLKILHGFSENVIKQRRLQRDASLAN 12221
12220
KHDEDPSIEIGRKRQLAFLDLLLEATQDGQPLSDRDIRDEVDTFILGGHDTTATAIGWLL 12041
12040 YLLGTDLQVQDRVFEEIDSIMGQDRDRPPTMIELNEMKYLECCIKEALRLFPSIPLIARK 11861
11860 LTESVNVGDYTIPAGTNAVIVVYQLHRDTQVFPNPDKFNPDRFLPENSQGRHQYAYIPFS 11681
11680
AGPRNCIGQKFGLLEEKAVAVAVLRKYRITSLDRREDLTLYGELVLKSKNGLRISISQRQ* 11498
>AAGE01044016.1 AAGE01004063 47% to 4AR1, 614744667
579602080 complete
probably same gene as 4T1.6 (3 diffs), 4I1.3 (2 diffs)
MIAIIAFTAIFVLFVYVWQWRRRLSRPFRTVPGPPGLPLIGNCHQFIGKSSTNIFHMLI
ELERLYGSVFKVDVATGIWLFYMSPGDIERIMTGP
EFNCKSDDYDMLLEWLGTGLLISNGNKWFTHRKALTPAFHFKILDNFVQVFDEKSTILAR
KFLSYSGKVVGIFPLVKLCTLDVIVETAMGTESNAQTEESGYTMAVEDISEIVFWRMFNN
VYNTEFMFKLSNKYGTYKKCLETIREFTLSIIEKRRSTLNVFDKNGGTSEVCNDSTGLKK
KMALLDILLQTEIDGRPLTNEEVREEVDTFMFA
(0)
GHDTTASAITFLLYAMAKYPDVQQKVYEEAVSVLGDSIDTPITL
SALNDLKYLDLVIKESLRMFPPVPYISRSTIK
EVELSGCTIPTGTNITVGIFNMHHNPKYFPDPEEFIPERFEVERGVEKQHPYAYVPFSAG
GRNCIGQKFAQYEIKSTISKVIRLCRIELI
RPNYEPPLKAEMILKPQDEMPLRFFPR*
>AAGE01046474.1
possible 4K2 like N-term joins with AAGE01021812
2979
MLIVVLLPLVITLCLVFAFVHRKLLQFPNLAGPPEWPIAGSATEIVNLSSI (1) 2827
2664
EIFKLLRRYAQQYGTAYKLSFWYQYTLVFAKPDIAE (0) 2557
supercontig
1.283 1377209 EIFKLLRRYAQQYGTAYKLSFWYQYTLVFAKPDIAE 1377316
AAGE01021812
52% to AAGE01044016 54% to CYP4K2
supercontig
1.283 1387420 KILNTQSYASKSEDYDKVAEWIGYGL 1387497
KILNTQSYASKSEDYDKVAEWIGYGL 4389
4388
LISKGEKWFKRRKVLTPGFHFKILESFVRVFNEKSDVLCRKLASYGGSEVDVFPTLKLYT 4209
4208
LDVLCETALGYSCNAQTEDSFYPAAVEELMSILYWRFFNLFASVDTLFRFTKQYRRFHKL 4029
4028
IGDTREFTLKIIEEKRKLLNELHDEGAVNEEDDEGKKKMALLDLLLRATVDGKPLSD 3858
3857
DDIREEVDTFTFA 3819 (0)
3755
GHDTTASALTFLLFNIAKYSDVQQKLFEEISSVVGSTSELSLH (2) 3645
3583
TLNDLRYLDLVIKESLRLYPSVPMIARIATENTKLDDMPIPKCTCVSVDIFQMH 3404
3403
RDPDRFEDPESFIPERFDAIRDGGKHNAFTYIPFSAGNRNCI 3269 (1)
3219
GQKFAQYELKIAVVKLIQTFRLELPSPDIEPILKAEIVLKPAEKLPIRFITRTTK* 3049
AAGE02013268.1
use this seq
239125
MLIVVLLPLVITLCLVFAFVHRKLLQFPNLAGPPEWPIAGSATEIVNLSSI (1) 239274
239440 EIFKLLRRYAQQYGTAYKLSFWYQYTLVFAKPDIAE
(0) 239547
249651
KILNTQSYASKSEDYDKVAEWIGYGLLISKGEKWFKRRKVLTPGFHFKILESFVRVFNE 249827
249828
KSDVLCRKLASYGGSEVDVFPTLKLYTLDVLCETALGYSCNAQTEDSFYPAAVEELMSIL 250007
250008
YWRFFNLFASVDTLFRFTKQYRRFHKLIGDTREFTLKIIEEKRKLLNELHDEGAVNEEDD 250187
250188 EGKKKMALLDLLLRATVDGKPLSDDDIREEVDTFTFA
(0)
250362
GHDTTASALTFLLFNIAKYSDVQQKLFEEISSVVGSTSELSLH (2) 250490
250552
TLNDLRYLDLVIKESLRLYPSVPMIARIATENTKLDDMPIPKCTCVSVDIF 250704
250705
QMHRDPDRFEDPESFIPERFDAIRDGGKHNAFTYIPFSAGNRNCI (1) 250839
250901 GQKFAQYELKIAVVKLIQTFRLELPSPDIEPILKAEIVLKPAEKLPIRFITRTTK*
251068
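The nucleotide coordinates that flank each block of protein sequence give a rough intron size when consecutive blocks are compared. The sketch below uses the first three coordinate pairs of the AAGE02013268.1 listing above. It is an approximation, not the method used to build this file: the function name is hypothetical, and where an intron has phase 1 or 2 the split codon may not be covered by the listed ranges, so a result can be off by a few bases.

```python
def intron_gaps(blocks):
    """Return the number of bases between consecutive (start, end) blocks.

    Works for either strand: coordinates may ascend (plus strand)
    or descend (minus strand) along the gene.
    """
    gaps = []
    for (s1, e1), (s2, e2) in zip(blocks, blocks[1:]):
        lo1, hi1 = sorted((s1, e1))
        lo2, hi2 = sorted((s2, e2))
        if lo2 > hi1:          # next block lies at higher coordinates
            gaps.append(lo2 - hi1 - 1)
        else:                  # next block lies at lower coordinates
            gaps.append(lo1 - hi2 - 1)
    return gaps


# First three coordinate pairs from the AAGE02013268.1 listing above.
blocks = [(239125, 239274), (239440, 239547), (249651, 249827)]
gaps = intron_gaps(blocks)
print(gaps)
```

The second gap is about 10 kb, which is the kind of large intron that makes the CYP4 sequences hard to assemble from short contigs.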
>CYP4D23
AAGE01000026.1 476394815
TC65595 TC24018 TC42055 74% to 4D22
only 51% to 4D17, probable ortholog of 4D22 complete
4T2.8 (v1 2 diffs), 4T1.3 (v2 1 diff), 4T1.1 (v3 1 diff)
AAGE01263405.1
probable exon 1 of 4D22 ortholog
566
MSILDWILVITGAVLAINYLLVRRNLKYQSQWPGPAAVPLIGCYYLYFNKKPE (0)
5888
DVMDFIFTLSRKYGTMFRVWVGTRLALFCTNTPDTETVLSSQKLIRKSELYKFLVPWLGN 6067
6068
GLLLSTDQKWFNKRKIITPAFHFKILEQFIEVFDRQSGILVQKLKPEASGKLVNVYPYVT 6247
6248
LCALDVIC 6271 (1)
6334
ETAMGTPINAQTDVDSKYVRAVTELSYLLTTRFVKVWQRSDFLFNLSPDRKRQDKV 6501
6502
IKVLHDFTTNIIQKRRKELMDHGDSGISGDDSIGSKKKMAFLDVLLQASVDGKPL 6666
6667
TDKEIQEEVDTFMFEGHDTTTIAIAFTLLLLARHPEVQEKVYKEVTEIIGTDLSIPATYR 6846
6847
NLQDMKYLEMVIKESLRLYPPVPIIGRKFTEKTTIGGNVIPEDSNFNLGIIVMHRDPKLF 7026
7027
DDPEKFDPERFSPERTMEQSSPYAYIPFSAGPRNCI (1)
7202
GQKFAMLELKSTLSKVIRNYRLTEAGPEPQLIIQLTLKPKDGLKIAFVPRA* 7357
>CYP4D24 AAGE01006231.1 4T2.6 (100%) complete 494125342 62%
to 4D16
AAGE01082298
bridged by 825775921 to AAGE01006231.1
1254
MLILLASVVVLSIGLAVYFYQQFANRLHYAAKIGGPKGYPLLGNSIQYGTKSPVEFLQE 1078
1077
VQKTNEQCGKFYRLWIGPDLIFPITDAKL 991 (0)
917
AILSSQKLLDKSVQYDFIRPWLGNGLLTSTGRKWHSRRKIITPTFHFKILEQFVEIFD 744
743
QQSNIFVGQLKSKAQSGEDFDVFPVVTLCALDVIC 639
4385
ESAMGTKVNAQLNSDSKYVRAVKD (2) 4456
MATVAMARSFKAFARFNFTFYFTPYRRMQDKALKVLHDYTDSVIRSRRLEL
AKGAFTKSDENENDVGIRKKVAFLDMLLQATVDGRPLDDLEVREEV
4812
DTFMFEGHDTTTSAISFLIGILAKHPDVQQKVYDEVRNVIGDDLNVSVTLSMLNQLNYLD 4991
LVIKETLRLYPSVPIYGRMLLENQEI
(1)
5134 NGTVFPAGSNLAIFPYFMGRDPEYFENPLEFRPERFAVETSAEKANPYRYVPFSAGPRNCIGQKFA 5331
VAEIKSLISKLVRHYEVLPPKQPNSERMIAELVLRPEGGVPVRIRSRVR*
>AAGE01055570
54% to 4D16 TC58022 AAGE01032454.1 512569922 complete
mate pair = 514720868 = C-term
walked upstream to 822913819 mate pair = 822913819 = C-term
walked up to 803280909 mate pair 808283299 matches mid region
walked to 519825648 mate pair 520507511 matches C-term
walked to 826165713 and 825253376 mate pair = 825244224 matches exons 2,3
walked to 528823040 and 572484877 mate pair = 572478448 matches exons 3,4
walked to 585907890 walked to 812022667 and 749632380 mate pair = 749635932
matches exon 2, walked to 578582171 (possible repeat region)
walked to 580094767 mate pair = 585907890 above so got past the repeat
also found 759050174 mate pair = 759046608.
This mate pair has an N-term seq which is almost identical to AAGE01124480
another hit that matches 759046608 exactly = 824317331 so the seq is confirmed
MWFLLSLVAAACLAWAIYRKFARTLEISGQHTGPPALPILGNGLWFLNKQPD (1)
EFLPIIQRLTDEYGDVFRFWQGPEFTLYVGRPSMIE (0)
TLLTDKNLTDKS 392
393
GEYGYLSNWLGDGLLLSKRNKWHARRKAITPAFHFKILEQFVDVFDRNASELVDVLGKHA 572
573 DSGEVFDIFPHVLLYALDVIC (1) 635
698
ESAMGTSVNALRNADSEYVRAVKEAANVSIKRMFDFIRRTPLFYLTPSYQQLR-KSLK 868
869
VLHGYTDNVITSRRKQLSNSSNKNHKDSDDFGFRRKEAFLDMLLKTNINGKPLTDLEI 1042
1043
REEVDTFMFEGHDTTTSAVVFTLLNLAKHPAIQQKVYDEIESVIGNDLQKPIELSDLHDL 1222
1223
SYLEMVIKETLRLYPSVPLIGRRCVEETTIEGKTIPAGANIIVGVFFMGRDPNYFEKPLD 1402
1403
FIPERFSGEKSVEKFNPYKYIPFSAGPRNCI 1501
GQKFALNEMKSVISKLLRHYEFILPAGSPAEPLLASELILKPHHGVPLQIRRRGH*
>516274867
broken CYP4D exon 1 probable pseudogene
AGQSALPILGNVLRFLNYLPD
(1)
AGCTGGCCAATCGGCCTTACCAATCCTGGGGAATGTACTGAGGTTTCTCAACTACTTGCCCGATGGT
>AAGE01019344.1
AGE01032320 54% to 4D17 complete
AAGE01178606.1 exon 2 of 4D like seq, 45% to 4D17 but only 35% to 4D15
TC64783 Length = 903 83% to AAGE01055570
793219534
630759272 note that in 630759272 exons 2 and 3 are on the minus strand
and exon 4 is on the plus strand. But the order is correct on 587120742
520001141 512663558,
AAGE01124480.1
516274910 = exon 1 most like 4D17
744614807 568757642 576385708 are exact matches so this seq is
really different from the seq of AAGE01055570
this is probably the N-term exon of seq AAGE01019344
904
MWIYLSLLTVGFVAVVIYRKFARTLEVAQQYAGPPALPILGNGLWFLNKQPD (1) 1059
1674 EFLPIIHKLTSTYGDVVRFWQGPQFTLYVGNPSMIE
(0) 1781
19
ILTNKHLTDKSGEYDYLSNWLGDGLLLSKRHKWHARRKAITPAFHFKILEQFVDVFDRNAAELV 198
199 DVLEKHADDGKTFDMFPYVLLYALDVIC (1)
332
ESAMGTSVNALRNADSEYVRAVKEAAHVSIKRMFDIIRRTSLFYLTPSYQKLRKALK 511
512 VLHGYTDNVIVSRRNQLMSKTDSGGVSDEFGAKKKDAFLDMLLRTSINGKPLTNLEIRE
688
689
EVDTFMFEGHDTTTSAVVFTLFNLAKHPEIQQKVYDEIVSVIGKDPKEKIELSHLHDLSY 868
869 TEMAIKETLRLFPSVPLIGRRCVEEITIEGKT
IPAGANIIVGIYFMGRDPKYFENPSHFIPERFEGEFSVEKFNPYKYIPFSAGPRNCI (1)
GQKFALNEMKSVISKLLRHYEFILPPDSVEEPPLASELILKPHRGVPLQIRHRALN*
>AY431801
64% to 4D24 AAGE01115931.1 AAGE01014858.1
AAGE01023514.1 complete
AY433130
change 1 aa use AAGE02013268.1
MFLLVTVFFAVVSLAVFVYQKFANQLYYGAKIGGPKCYPLVGNAFRFINKSPP
()
DFFLTIERTVREAGKCFRLWLGPELLIIVTDAKVAE
()
GVLSSPKFIEKSGEYNFIRPWLGDGLLTSSYRKWHSHRKIIT
PTFHFKILEQFVEIFDSQSNILIDKLTPFMESGETFDVFPLVTLCALDVIC
(1)
ESAMGTKVNAQIHSDSEYVQAVKE
2240 (2)
2179
ITTIIHIRTYDVLARYDFLFNLSSYRKRQDKVLEVLHGYTNSVIRSRRRELSDAKEANPD 2000
1999
NNATSELGIRRKVAFLDMLLQATVDGRPLTDVEIREEVDTFMFEGHDTTTSAISFLLYRL 1820
1819
AKHPEVQHKVYDEIKAVIGEGMTGPVTLSMLNELHYLELVIKETLRLYPSVPFYGRKVLENSEI (1) 1628
1567
EGTTFPAGSNLILMPMFMGRDPEYFDDPLEFRPERFEKEISAEKVNPYRYIPFSAGPRNCI 1385
1384
GQKFAMAELKSVASKVLRHFEVLPPEGGQEESFIGEMILRPTYGVLLRLKKRQ* 1229
AAGE02013268.1
212620 MFLLVTVFFAVVSLAVFVYQKFANQLYYGAKIGGPKCYPLVGNAFRFINKSPP
() 212462
202379 DFFLTIERTVREAGKCFRLWLGPELLIIVTDAKVAE
(0) 202272
189962
GVLSSPKFIEKSGEYNFIRPWLGDGLLTSSYRKWHSHRKIITPTFHFKILEQFVEIFDSQ 189783
189782 SNILIDKLTPFMESGETFDVFPLVTLCALDVIC (1)
189684
189644 ESAMGTKVNAQIHSDSEYVQAVKE (2) 189553
189492
ITTIIHIRTYDVLARYDFLFNLSSYRKRQDKVLEVLHGYTNSVIRSRRRELSDAKEANPDNNA 189304
189303
TSELGIRRKVAFLDMLLQATVDGRPLTDVEIREEVDTFMFEGHDTTTSAISFLLYRLAKH 189124
189123
PEVQHKVYDEIKAVIGEGMTGPVTLSMLNELHYLELVIKETLRLYPSVPFYGRKVLENSEI (1) 188941
188880
EGTTFPAGSNLILMPMFMGRDPEYFDDPLEFRPERFEKE 188764
188763
ISAEKVNPYRYIPFSAGPRNCIGQKFAMAELKSVASKVLRHFEVLPPEGGQEESFIGEMI 188584
188583 LRPTYGVLLRLKKRQ* 188536
>CYP4H28
4T2.2 (2 diffs) complete
AAGE01082714.1
AAGE01027375.1 AY432644
55% to 4H18
MLAILVSLATVAFLWLVYQRRMARAAKIAAYFPHPKPVLPLLGNSLMFANKDAPAIFHTVLDLHKQCG 1218
QNLVTYGLFGDVQLHISSPKAIERVLLSKVTKKNYIYEYLEPWLGTGLLLSFGEKWFQRR 1038
KIITPTFHFKILEQFLEVFNAETDRLVTKIEQHVGGEEFDMYQYITLHALDSIC 876
2915
ETSMGVSINALDNPDNAYVHAIKDFGSIVIQRTFSALRSFPLLYFLHPFYWRQQKLIK 3094
3095
TMHN FTNSVIKAKRQALEEKRHTEGETKEHNEDDGIYGKKRMSFLDLLLNESSMSD 3262
3263
ADIREEVDTFMFEGHDTTTSGIYFSLMALAMHPDIQERLYGEIRQVLETEEERHAPLTNATLQQMKY 3463
3464 LDMVIKEVLRVYPSVPIIGRELLEDVEI (1) 3547
3604
NGCQVPRGTAMVVIIHNVHRNAEVFPDPERFDPERFSDESGGKRGPYDYIPFSVGARNCI 3783
GQKYALLEMKVTLVKLLLAYRFIPGKSTDSIRIQGDLVLRPFGNMALRIESR*
>4H29
4I1.8 512549996 AAGE01010708.1 784728638 complete
MVPLLMLISLLASALIWVLSALVKNLLVYRELQRKLPNFVSTPTVLLLGNTHLFKKDPTPPG
IFATFNQFHRTYGNDLIVQGLLNRPALQITSAPVVEQVLQARTIKKSIIYEFMRPWLNEG
LITSLGKKWAQRRKIITPAFHFKILEEFLAIFNERTEVFVDKIKDQVGKGDFNIYEHVTL
CTLDIISESAMGVKLNAQDDPNSSYVQAVKE
(2)
MSEIIFQRLFGLLRMHKFFFQMSEAAQRQRAALKVLHK
FTDSVIFQRKDQLDDEQARQESKQKLEETDIYGKRKMTLLELLLNVSVEGHHLSNS
DIREEVDTFMFEGHDTTTSCISFSAYHIARHPEVQQKLYDEMVQVIGKDFKNAELSYSTL
QELKYLEMTIKEVLRIHPSVPIIGRKTTGDMRIDGETVPAGVDIAVLIYAMHNNP
EVFPEPEKFDPERFNEENSAKRHPYSYIPFSAGPRNCIGQKFALLEIKVTLVKLLGHYRL
LPCEPENEVKVKSDITLRPVNGTFVKIVPR
AAGE02013311.1
use this seq (3 aa diffs)
43167
MVPLLMLISLLASALIWVLSALVKNLLVYRELQRKLPNFVSTPTVLLLGNTHLFKKDPTP 42988
42987
PGIFATFNQFHRTYGNDLIVQGLLNRPALQITSAPVVEQVLQARTIKKSIIYEFMRPWLN 42808
42807
EGLITSLGKKWAQRRKIITPAFHFKILEEFLAIFNERTEVFVDKIKDQVGKGDFNIYEHV 42628
42627 TLCTLDIISESAMGVKLNAQDDPNSSYVQAVKE (2)
42529
37902 MSEIIFQRLFGLLRMHKFFFQMSEAAQRQRAALKVLHKFTDSVIFQRKDQLDDEQARQES 37723
37722
KQKLEETDIYGKRKMTLLELLLNVSVEGHHLSNSDIREEVDTFMFAGHDTTTSCISFSAY 37543
37542
HIARHPEVQQKLYDEMVQVIGKDFKNAELSYSTLQELKYLEMTIKEVLRIHPSVPIIGRK 37363
37362
TTGDMRIDGETVPAGVDIAVLIYAMHNNPEVFPEPEKFDPERFNEENSAKRHPYSYIPFS 37183
37182 AGPRNCVGQKYALLEIKVTLVKLLGHYRLLPCEPENEVKVKSDITLRPVNGTFVKIVPR 37006
>476148479
476152924 832469399 620727729 529569782 68% to 4H18 complete
AAGE01076911.1
MLLILTLIFATVGYALFNYHRQRQKLLNIRSHFDGPDSHYLWGTFPMFIGKTIP
(1)
DIWDIITDLHKKHGEDIAIIAAFN
ELVMDLSSSKNVEKVLLAKSIKKSFAYDFLEPWLGTGLLISTGEKWFQRRKIITPTFHFS
MLEGFLEVFNKEANILVSKLKAKAGKDEFDIYDYVTLYALDSIC
ETSMGVQINAQDDPNNEYAVAVKQMSTFILRRVFSILRTFPSLFFLYPFAKEQKKVILKLH
NFTNSVIDARRAMLEKEKSNKNVTFDLQEENMY
TKRKMTFLDLLLNVTVNGKPLSREDIREEVDTFMFEGHDTTTSGISFTLWHLAKYQDV
QQKLFEEIDRVLGKDKVNAELTNLQIQELDYLDMVVKESLRLIPPVPIIGRTLVEDMEM (1)
NGVTIPAGTQISIKIYNIHRNPKIWEKSDEFIPERFSKTNESKRGPYDFIPFSAGSRNCIGQ
RYAMMELKVTIIKLIASFKVLPGDSMDKLRFKTDLVIRPDNGIPIKLVERI*
>AAGE01049176.1
67% to 4H14 complete
MLFLAIVVGALLYLVVNFYVTRKPLERMAVHFSGPKPHYLLGNVLEFLNKDLP
(1)
GIFETMVGFHRKYGQDILTWNVLNLNMISVTSAENVEKVLMAKQTKKSFLYSFVEPWLGQGLL
ISSGEKWFQRRKIITPTFHFKILEQFVTVFNKETDTMVENLKKHVDGGEFDIYDYVTLMALDSIC
ETSMGTCVNAQKNPTNRYVQNVKRMSVLVLLRTISVLAGSPLLYDILHPHAWEQRKIIKQ
LHEFTISVIESRRRQLEADKLEQVDFDMNEESLYSKRKMTFLDLLLNVTVEGKPLTNADI
REEVDTFMFE
(0)
GHDTTTSGISFAIYQLALNPQIQDKLYDEIVSILGKNSSNVELTF
QTLQDFRYLESVIKESMRLFPPVPFIGRTSVEDMEM
(1)
NGTTVKAGQEFLVAIYVIHRNPKVYPDPERFDPERFSDTAESKRGPYDYIPFSAGSRNCI
GQRYAMLEMKVTLIKLLMNYKILPGESMGKVRVKSDLVLRPDRGIPVKLVARS*
>494155296 56% to AY205085 66% to 4H14 793189512
AAGE01213118.1
AAGE01473588.1
AAGE01538714.1 531423523 512616786 570666861 571502407
supercontig
1.85 Frame = - complete
2620039
MITLVLVAGVVLYFLRSFLQKRNKLLKIANHFGGPKPLPVIGNLLEFNTDIP (1) 2619884
2602651
GIVHLNHTYGPNLFVWGFLNENVLFLGDTKLVEKVLLAKQTQKSLLYSYLTCWLRTGLLLA 2602469
SGEKWFQRRKIITPTFHFKVLEQFVTVFNREAQTMVDVMRKHVGGKEFDVYSYVTLMALDSVC
ETSMGTSVNAQKDPDNRYVRNVKR
(2?)
MSVLFLLRVIHPLATHPELYSLIHPNAYEQRKIVRELHEFT
DNVIATRRKQLKSDQMVDINRNVEDRYSKQKMTFLDLLLNVNIDGKPLTDLD
IREEVDTFMFE
(0)
GHDTTTSGISFTIYQLALNPHVQDKIYEEIVAILGKNHKTVELTYQSLQEFKYLEMAIK
EGLRLFPSVPFIGRNLVEDLEF (1)
DDITLPAGQDILIPIYMIHRNPEIYPDPERYDPERFSDGTESKRGPYDYIPF
SAGTRNCIGQRFAMLEMKAALIKLIGNYRILPGESLKKLRIMTDLVVRPEKGVPIRLEERV*
AAGE02005220.1
Length=120659 USE THIS SEQ
89011 MITLVLVAGVVLYFLRSFLQKRNKLLKIANHFGGPKPLPVIGNLLEFNTDIP
(1) 88856
71635 GIFEKIVHLNHTYGPNLFVWGFLNENVLFLGDTKLVEKVLLAKQTQKSLLYSYLTCWLRTGLLLAS 71438
71437
GEKWFQRRKIITPTFHFKVLEQFVTVFNREAQTMVDVMRKHVGGKEFDVYSYVTLMALDSVC (1) 71252
71190 ETSMGTSVNAQKDPDNRYVRNVKR (2) 71122
71063
MSVLFLLRVIHPLATHPELYSLIHPNAYEQRKIVRELHEFTDNVIATRRKQLKSGQM 70893
70892 LDINRNVEDRYSKQKMTFLDLLLNVNIDGKPLTDLDIREEVDTFMFE (0) 70752
70216
GHDTTTSGISFTIYQLALNPHVQDKIYEEIVAILGKNHKTVELTYQSLQEFKYLEMAI 70043
70042 KEGLRLFPSVPFIGRNLVEDLEF (1)
69911 DDITLPAGQDILIPIYMIHRNPEIYPDPERYDPERFSDGTESKRGPYDYIPFSAGTRNCI 69732
69731
GQRFAMLEMKAALIKLIGNYRILPGESLKKLRIMTDLVVRPEKGVPIRLEERV* 69570
>TC65985
TC16577 TC24796 TC37697 57% to CYP4H14
AAGE01321728.1 AY205085 AAGE01106416.1
complete
MFNFAVFLVILVVGLARFCINRSKLQQLAKHFPGPKPALLVGNLLQFPADIGGIFRRMVYY
HEKFGPDIVTWGIGNTLKFNVSSTRNVEKVLMAKTVQKSLSYSFIEPWLGKGLLTSTGRK
WFQRRKIITPTFHFTILEGFAEVFNRNADTLIDKLKVHEGGSEFDVYRYVSLYALDSICE
TAMGVQVHAQDDPENQYVRDVNRLSELFLLRIFSFLGMFPTLYWYLHPNAWEQRKLIRTL
HQFTDNVIWKRREQLMNGPRNDEMDNTTLSKKKQTFLDLLLCMSVESQPLSNEDIREEVD
TFMFGGHDTTSSAISFTIMQLALHQDIQDKLYAEIVSILKGQNLKTTHLTFNNIQDFKYL
DLIVKESLRLLPPISYVGRKLTEDTELNGATIPAGQDIFIPIYMVHRNPKIYPDPERFI
PERFAENAENLRGPYDYIPFSIGSRNCIGQKYGMMQLKMTVVRLIANFRVLPSEATASVK
LRTDLVLRPEYGIPIKIEARN*
AAGE02013268.1
use this seq (3aa diffs) no introns
13367
MFNFAVFLVILVVGLARFCINRSKLQQLAKHFPGPKPALLVGNLLQFPADIGGIFRRMVY 13188
13187
YHEKFGPDIVTWGIGNTLKFNVSSTRNVEKVLMAKTVQKSLSYSFIEPWLGKGLLTSTGR 13008
13007
KWFQRRKIITPTFHFTILEGFAEVFNRNADTLIDKLKVHEGGSEFDVYRYVSLYALDSIC 12828
12827
ETAMGVQVHAQDDPENQYVRDVNRLSELFLLRIFSFLGMFPTLYWYLHPNAWEQRKLIRT 12648
12647 LHQFTDNVIWKRREQLMNGPRNDEMDNTTSSKKKQTFLDLLLCMSVEGQSLSNEDIREEV 12468
12467
DTFMFGGHDTTSSAISFTIMQLALHQDIQDKLYAEIVSILKGQNLKTTHLTFNNIQDFKY 12288
12287 LDLIVKESLRLLPPISYVGRKLTEDTELNGATIPAGQDIFIPIYMVHRNPKIYPDPERFI 12108
12107
PERFAENAENLRGPYDYIPFSIGSRNCIGQKYGMMQLKMTVVRLIANFRVLPSEATASVK 11928
11927 LRTDLVLRPEYGIPIKIEARN* 11862
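A quick consistency check on listings like the one above is that the nucleotide span between the two flanking coordinates should equal three times the number of residues on the line, counting the stop (*) as one codon. The helper below is a minimal sketch with a hypothetical name; lines that end at a phase 1 or phase 2 intron may legitimately fail the check because of the split codon.

```python
def span_matches(start, end, peptide):
    """True if the coordinate span covers exactly len(peptide) codons.

    `peptide` is the residue string as printed, with '*' for the stop codon.
    Start may be larger than end for minus-strand listings.
    """
    span = abs(start - end) + 1
    return span == 3 * len(peptide)


# Two lines from the TC65985 listing on AAGE02013268.1 above.
first = span_matches(
    13367, 13188,
    "MFNFAVFLVILVVGLARFCINRSKLQQLAKHFPGPKPALLVGNLLQFPADIGGIFRRMVY")
last = span_matches(11927, 11862, "LRTDLVLRPEYGIPIKIEARN*")
print(first, last)
```

Because this gene has no introns, every full line of the listing is expected to pass. A failing line in another record usually points to a coordinate typo or to a codon split by an intron.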
>AY431450
65% to 4J10 AAGE01108571 514842991 complete
continues on AAGE01227281.1 AAGE01378346.1
MFSSVLSLVIITLIVLLAVYEWYLRQRDGYRAALQYPGGPML
PVLGNILEVLIKDTVQTFNYARSNALKYGRSYRQWIFG
NVILNVIRIREAEPILSSTKHTRKSILYRFLEPLMGDGLLCSKGSKWQARRKILTPAFHF
SILNDFLQVFQEEAEKLVGLLDSCADAEEEVVLQSIVTRFTLNTIC
(1)
ETAMGVKLDTFIGADKYRSQVYDVGERIVHRTMTPWLYDDGVYNLFGYQKPLEDA
IEPIHDFTRSIIRQKREQLKQDSTMHIVDSDGI
(2)
YGSKQRYAMLNTLLMAEENDAIDEEGIREEVDTFMFEGHDTTAAGLIFSILLLATEQEAQ
QRVYDELLKARSTKSESEAFTIADYNNLKYLDRFVKEALRLYPPVSFISRNLSGPLEV
DSTTFPHGTIAHIHIYDLHRDPEQFPDPERFDPDRFLPEVAA
KRNPYAYVPFSAGPRNCIGQKYALLEMKTVLCALLINYRILPVTTRQEVIFIADLVLRAK
TPIKVQFAKRKANATRS*
>AAGE02025842.1 first P450 on contig (of two)
6 aa diffs to AY431450 all in short interval
trace files 811916166, 586617316, 582273387 match this seq
580134265, 753220309, match AY431450
There may be two sequences.
Searched with the first 211 nucleotides to see if there were two
alternate matches affecting synonymous codons. All but one trace file matched
this genomic seq.
180450
MFSSVLSLVIITLIVLLAVYEWYLRQRDGYRAALQYPGGPMLPVLGNILEVLIKDTVQTFNYARSN
180647
180648 ALKYGRSYRQWIFGNVILNVIRIREAEPILSSTKHTRKSILYRFLEPLMGDGLLCSKGS 180824
180825
KWQARRKILTPAFHFSILNDFLQVFQEEAEKLVGLLDSCADAEEEVVLQSIVTRFTLNTIC (1) 181007
181067
ETAMGVKLDTFIGADKYRSQVYDVGERIVHRTMTPWLYDDGVYNLFGYQKPL 181222
181223 EDAIEPIHDFTRSIIRQKREELKQDSTMHIEDSGDI (2) 181330
181387 YESKQRYAMLNTLLMAEENDVIDEEGIREEVDTFMFEGHDTTAAGLIFSILLLATEQEAQQRV 181575
181576
YDELLKARSTKSESEAFTIADYNNLKYLDRFVKEALRLYPPVSFISRNLSGPLEV (1) 181740
181798
DSTTFPHGTIAHIHIYDLHRDPEQFPDPERFDPDRFLPEVAAKRN 181932
181933
PYAYVPFSAGPRNCIGQKYALLEMKTVLCALLINYRILPVTTRQEVIFIADLVLRAKTPI 182112
182113 KVQFAKRKANATRS* 182157
>AAGE01397643.1
84% TO 223407477 569795084
250bp downstream of AAGE01227281.1
join with AAGE01226366.1
MDFLMDWWFAVLIIVIVLLAWDAIDKSGRPYRAMNKFPGPRVFPLIGTLSEILFKDQGK
TFQLAREWPKRYGGSYRFWVNSTLYVLNVVRVREAEPILSSTKNIDKSRFYKFLHPFLG
LGLLNSTGPKWMHRRRILTPSFHFNILNGFHRTFVEECDQLLATIDEHVDKGVSTAL
>AAGE01226366.1
95% to AAGE01331087.1 10 aa diffs (allele?)
YLNP
1440
KKRYAMLDSLLVAEQKQLIDEAGIREEVDTFAFEGHDTTAAALVFIFFTLAHESAVQDRI 1261
1260 YSEIRQVYNGKPQSDRVFTPQDYSEMKFLDRALKECLRLWPPVAFISRNISEDIVLEDGA 1081
1080
VIPAGCVANIHIFDLHRDPEQYPDPDRFDADRFLPEEVDRRNPYAYVPFSAGPRNCIGQK 901
900
YAMMELKVVIVNALLKFRVLPVTKLEDINFVADLVLRSTNPIEVRFERR* 754
AAGE02025842.1
ESTs DW194177.1 EB096538.1 use this seq
Second P450 on contig (of two)
182386
MDFLMDWWFAVLIIVIVLLAWDAIDKSGRPYRAMNKFPGPRVFPLIGTLSEILFKDQ (1) () 182556
182616
AKTFQLAREWPKRYGGSYRFWVNSTLYVLNVVRVREAEPILSSTKNIDKSRFYKFLHPFLG 182798
182799
LGLLNSTGPKWMHRRRILTPSFHFNILNGFHRTFVEECDQLLATIDEHVDKGVSTALQPV 182978
182979 MSKFTLNTIC (1)
ETSMGVKLSTVSGADVYRTKLYEIGEALVHRLMRPWLLNDFLC
RLTGYKAAFDKLLLPVHSFTTGIINKKREQFQASSEPLVELTEENI (2)
YLNPKKRYAMLDSLLVAEQKQLIDEAGIREEVDTFAFEG 183506
183507
HDTTAAALVFIFFTLAHESAVQDRIYSEIRQVYNGKPQSDRVFTPQDYSEMKFLDRALKE 183686
183687
CLRLWPPVAFISRNISEDIVLEDGAVIPAGCVANIHIFDLHRDPEQYPDPDRFDADRFLP 183866
183867
EEVDRRNPYAYVPFSAGPRNCIGQKYAMMELKVVIVNALLKFRVLPVTKLEDINFVADLV 184046
184047 LRSTNPIEVRFERR 184088
>AAGE01331087.1
61% to 4J5 575366287 574128015
no ESTs for this seq. no exact match in WGS
95% to AAGE02025842.1 second gene
YLNP
1211
KKRYAMLDSLLVAEQKQLIDEAGIREEVDTFAFEGHDTTAAALVFIFFTLAHEPAVQDRI 1032
1031
YSEILQVYNGKPQSERAFTPQDYAEMKFLDRALKECLRLWPPVAFISRNISEDIVLDDGT 852
851
LIPAGCVANIHIFDLHRDPEQYPEPDRFDADRFLPEEVDRRNPYAYVPFSAGPRNCIGQK 672
671
YAMMELKVVVVNALLKFRVLPVTKLEDINFVADLVLRSTNPIEVRFERR* 525
>AAGE01288441.1
97% to AAGE01216085.1 10 aa diffs
note on finding the C-term. This seq is 74% identical to 4J5
at the C-term. This 4J5 seq continues as
GQKYALLEVKTAVAYLVLRYRILPATKREEIRFIADLVLRSATPLKVRFERRQNA*
which is 59% to AAGE01331087 so this is a good model.
The N-term part of this seq has only one seq in the trace files,
so it may be a poor version of the AAGE01216085.1 seq.
1368 IVIRGSFVINAIRARETEALLSSTKLIDKSILYTFLYPFMGKGLLTSTGPKWFHRRKILTAAFHFNI 1189
1188
LPKFLVTFQEECDKLLRKLDADVKADNTTTLQSVAARFTLNTIC 1057
997
ETAMGVKLDSMSMADEYRAKIQEVIKLLLLRVMNPWLVEEFPYRLLGFRRRLMKVL 830
829
KPIHAFTRSIIKQRRDLFHANVKNVDDFSEENIYVNTNQRYALLDTLLASEAKNQIDEEG 650
649 IREEVDTFMFEGHDTTASAFTFIFLVIANHQEAQRQLVEEIEAMIAGRIKPTEPLSMHDY 470
469
SELKFMDRVIKECLRLYPPVPFISRAILEDALLGDRFIPKDSMANLHIFDLHRDPDQFPD 290
289
PERFDPDRFLPANVEKRNPYAYVPFSAGPRNCI
191
>AAGE01216085.1
61% to 4J9 578920794 complete
TC57837
Length = 832 100% to AAGE01216085 extends the end
519943525
826152951 513457906 new C-term for 4J seq
attempted walking to join with N-term part. Ran into a gap
613942247
589588262
MDWLTIVLLLILALLALYEVHLRLLLSNRAAKQFPGPRR
4
LPVLGNALALLFNDQVSTFKLPRRWAQRYKESYRLVIRGGFVINAIRARETEALLSSTKL 183
184
IDKSILYTFLYPFMGKGLLTSTGPKWFHRRKILTAAFHFNILPKFLVTFQEECDKLLRKL 363
364 DADVKAGNTTTLQSVAARFTLNTIC 438 (1)
498
ETAMGVKLDSMSMADEYRAKIQEVIKLLLLRVMNPWLVEEFPYRLLGFRRRLMKVL 665
666
KPIHAFTRSIIKQRRDLFHANVKNVDDFSEENIYVNTNQRYALLDTLLASEAKNQIDEEG 845
846
IREEVDTFMFEGHDTTASAFTFIFLVIANHQEAQRQLVEEIETMIAGRSNPTEPLSMHDY 1025
1026
GELKFMDRVIKECLRLYPPVPFISRAVLEDAQLGDRFIPKDSMANVHIFDLHRDPEQFPD 1205
1206
PERFDPDRFLPENVEKRNPYAYVPFSAGPRNCI
1304
QRFAMLELKAILTAVLREFRVLPVTKREDVVFVADMVLRSRDPIVVKFERR*
677
>223407477
AABIG09TP.gz 223407646 AABIH08TP.gz 65% to 4J9
AAGE01099570.1
574077942
MDFLTNWWFGALVIVTVLLVRDAIDKSGRIYRAINKFAGPPCLPLIGTLCEILFMNQGK
(0)
TYQWARKWPKRYGGSYRFWFSSTLYVLNVVRVREAEHILSSTRNI
DKSRFYKFLHPFLGLGLLNSNGPKWMHRRRILTPSFHFNILNGFHHTFVEECDQLLATID
EHVDKGVPTALQPVMSKFTLNTIC
Correct seq 88% to AAGE02025842.1
matches Nelson 223407477 on top
AAGE02025843.1
6807
MDFLTNWWFGALVIVTVLLVRDAIDKSGRIYRAINKFAGPPCLPLIGTLCEILFMNQ (1) 6637
6575 ATTYQWARKWPKRYGGSYRFWFSSTLYVLNVVRVREAEHILSSTRNIDKSRFYKFLHPFLGLGLL
NSNGSKWMHRRRILTPSFHFNILNGFHHTFVEECDQLLATIDEHVDKGVPTALQPVMSKFTLNTIC (1)
6183
6127 ETSMGVKLSTVSGADVYRTKLYEIGEVLVHRLMRPWLLNDFLCRLTGYKAA
FDKLLLPVHSFTTGIINMKRKQFQESLEPSVELTEENI (2) 5861
5801
YLNPKKRYAMLDSLLLAEQKQLIDEAGIREEVDTFAFEGHDTTAAALVFIFFTLAREPAVQDRI
YREILQVYSNKPQSSRAFTPQDYSEMKFLDRALKECLRLWPPVTFISRSISEDIILDDGS
LIPAGCVANIHIMDMHHDPEQFPDPERFDADRFLPEQVDRRNPYAYVPFSAGPRNCIGQK
YAMMELKVVVVNALLKFRVLPVTKLEDINFVADLVLRSTNPIEVRFERR* 5100
>AAGE02030510.1
Length=13206 98% to AAGE02025843.1 6807-5100. 9 aa diffs
new seq
11199
MDFLTNWWFGALVIVIVLLVRDAIDKSGRIYRAINKFAGPPCLPLIGTLCEILFMNQ (1) 11029
10967
ATTYQWARKWPKRYGGSYRFWFSSTLYVLNVVRVREAEPILSSTRNIDKSRFYKFLH 10797
10796
PFLGLGLLNSNGPKWMHRRRILTPSFHFNILNGFHHTFVEECDQLLATIDEHVDKGVPTA 10617
10616 LQPVMSKFTLNTIC (1) 10575
10519
ETSMGVKLSTVSGADVYRTKLYEIGEVLVHRLMRPWLLNDFLCRLTGYKAAFDKLLLP 10346
10345 VHSFTTGIINMKRKQFQESLEPSVELTEENI (2)
10253
10193
YLNPKKRYAMLDSLLLAEQKQLIDEAGIREEVDTFAFEGHDTTAAALVFIFFTLAREPAV 10014
10013
QDRIYSEILQVYSNKLQSALAFTPQDYSEMKFLDRALKECLRLWPPVTFISRSISEDIIL 9834
9833
DDGSLIPAGCVANIHIMDLHHDPEQFPDPERFDADRFLPEQVDRRNPYAYVPFSAGPRNC 9654
9653
IGQKYAMMELKVVVVNALLKFKVLPVTKLEDINFVADLVLRSTNPIEVRFERR* 9492
These two seqs are very close but the region QDRIYSEILQVYSNKLQSALAFTPQDY
has 4 aa diffs. Trace files support both sequences
588906795, 590281011 match this seq.
832533501, 832391117, 589181510, 579871626, 592076987 match the other seq AAGE02025843.1
>AAGE02030510.1
pseudogene like AY431450 new seq
12342 WRSWRKILTPASHFSIFSEFL*LLQKEVDKLVRLLE
NGIDKYQWQV*DLNGKIGHSMMTPWFL
12007
YDDGAYNLFGYQKSLEDAIEPIHDFTKN
11898
DEETIQEEVDNLMFEGYDTTAEGLIFSILLLATEQEAQQRV*NELLEDLS 11749
TNLESESFTVASYKNFNY
>AAGE01099570.1
4J like pseudogene
396
RIDHSIMTQLLYDDGVYNLFKYRKSL*DAIEPIHDFIRSIFLQNCVQLNQDSMMYSEEVK 575
576
QTFASTV*VKLSRISRYGLKPTYIMMNILLTAEK 677
676
NDGTAEETILEEVDNTMFEGCDTTAAGLIFSILLLATEQEPQQRV*DKL*EDCSSKS 847
848
ESETFTWMSYNNLKYRFLK 904
GPEYALLEMITIICILLISYRAIXXXXIFIADQILQTKPTAKVDYARRKANAMRN*
>AAGE01003123.1
C-term of 4J like seq 90% to AAGE01331087, pseudogene
PRGCVANIHIMDMHHDPEQFPDPDRFNADRFLPEEVERRNPYAYVPFSAGPRNCI
2873
GQKYAMMEL*VVVVNALLKFRVLPVTKLKDINFVADLVLRSTNPIEVRFERR 2718
>476375054
Pseudogene 61% to 4J5, 4aa diffs to AAGE01003123, stop codon in same place
666
PRGCVANIHIMDMHHDPEQFPDPDRFNADRFLPEEVERRNPYAYVPFSAGTRICIGQKYA 845
846
MMEL*VVVVNALLKFGILPVTK 911
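Counts such as "4aa diffs to AAGE01003123" can be reproduced by comparing the two fragments position by position. The sketch below assumes an ungapped alignment that starts at the same residue in both fragments (both begin PRGCVANIHI) and compares only the shared length. The function name is hypothetical and the strings are copied from the two pseudogene entries above.

```python
def aa_diffs(a, b):
    """List (position, residue_in_a, residue_in_b) over the shared length.

    Positions are 1-based. No gaps are introduced, so this is only
    meaningful for fragments that are already in register.
    """
    shared = min(len(a), len(b))
    return [(i + 1, a[i], b[i]) for i in range(shared) if a[i] != b[i]]


# AAGE01003123.1 fragment (two lines joined)
aage01003123 = (
    "PRGCVANIHIMDMHHDPEQFPDPDRFNADRFLPEEVERRNPYAYVPFSAGPRNCI"
    "GQKYAMMEL*VVVVNALLKFRVLPVTKLKDINFVADLVLRSTNPIEVRFERR"
)
# Trace 476375054 fragment (two lines joined)
trace_476375054 = (
    "PRGCVANIHIMDMHHDPEQFPDPDRFNADRFLPEEVERRNPYAYVPFSAGTRICIGQKYA"
    "MMEL*VVVVNALLKFGILPVTK"
)

diffs = aa_diffs(aage01003123, trace_476375054)
print(len(diffs), diffs)
```

The in-frame stop (*) falls at the same position in both fragments, in line with the "stop codon in same place" note.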
>AAGE01584611.1
89% to AAGE01005255.1 matches 574201551, 520163843
note mate pair of 520163843 = 520524408 that has a C-term
like AAGE01005255, but not identical. These two genes are linked
AAGE01584611 is upstream of 520524408 on the same strand
12
AELYKSNIREVGKIIQQRIMNPLLFEDWIYKITGYQAEFDKILSPIHSFTNNIIRQRRET 191
192
FHATMRNVDSPSEENTYTNIKQRYAMLDSLLLAEAKHQIDAEGIREEVDTFTFEGHDT 365
366 IGSAFVFTFLLIAHDQLVQQSLYEEIQRMFNLQPIPTLQNYNDLKYMDRVIKESLRI 536
537
YPPVPFISRLITEDVQYDGKLVPRGTLMNVGIYDLHRDPEQFPDPLRFDPDRFLPEQVQR 716
717 RSPYAYIPFSAGPRNCI 767
>520524408 593182570 813103660 574115512 593092990 825253407
579726574
exact match to AAGE01484914.1
96% to AAGE01005255, but not the same gene
This seq is identical to TC57838
TC57838 Length = 974 9 aa diffs to AAGE01005255 complete
593092990 and 593182570, 825253407 579726574
Another set of WGS seqs is an exact match to AAGE01005255
So there are two very similar gene sequences that are 95% identical.
MYVFTTVAGLLVFIFILYKIYLRSLPSYRAAKYFPGYPVYPIVQNLFT
ALFKSQTGAFQQARQWARIFNNRTYRVLIQGVLYVQIIHHKDVEMLLSSSRLITKSPLYK 779
LIVPFIGNGLLNSTGEKWHQRRKILTPTYHFNILQGFLQIFHEECRKLVNQLDKDAAQGI 599
TTTLQPLSTQVTLNTIC (1)
ETAMRLKLDTSETAEVYKSNIREVGKVIQQRIMNPLLFEDWIYKITGYQ
1 AKFDKILRPIHAFTNSIIRQRRETFHETMKNVDSPSEENIYTNIKQRYAMLDSLLLAEA 177
178
KQQIDGEGIREEVDTFTFEGHDTTGSAFVFTFLLIAHEQLVQQRLFEEIERMFNLQPNP 354
355
TQQDYNDLKYMDRVIKESLRIYPPVPFISRLITEDVQYDGKLVPRGTIMNIEIYDLHRDP
534
535
EQFPDPERFDPDRFLPEEVQRRSPYAYVPFSAGPRNCI
GQRFAMLELKAILIGVLREFRVLPVTKREDVVFVGDMVLRSRDPIVVKFERR*
807
AAGE02035951.1 missing the last exon use this seq 3 aa diffs to 520524408
2639 MYVFTTVAGLLVFIFILYKIYLRSLPSYRAAKYFPGYPVYPIVQNLFTALFKSQTGAFQQ 2460
2459 ARQWARIFNNRTYRLLIQGVLYVQIIHHKDVEMLLSSSRLITKSPLYKLIVPFIGNGLLN 2280
2279 STGEKWHQRRKILTPTFHFNILQGFLQIFHEECRKLVNQLDKDAAQGITTTLQPLSTQVT 2100
2099 LNTIC (1) 2085
2026 ETAMGLKLDTSETAEVYKSNIREVGKVIQQRIMNPLLFEDWIYKITGYQAKFD 1868
1867 KILRPIHAFTNSIIRQRRETFHETMKNVDSPSEENIYTNIKQRYAMLDSLLLAEAKQQID 1688
1687 GEGIREEVDTFTFEGHDTTGSAFVFTFLLIAHEQLVQQRLFEEIERMFNLQPNPTQQDYN 1508
1507 DLKYMDRVIKESLRIYPPVPFISRLITEDVQYDGKLVPRGTIMNIEIYDLHRDPEQFPDP 1328
1327 ERFDPDRFLPEEVQRRSPYAYVPFSAGPRNCI (1) 1232
GQRFAMLELKAILIGVLREFRVLPVTKREDVVFVGDMVLRSRDPIVVKFERR.
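The "aa diffs" and percent-identity figures quoted in these notes can be spot-checked with a position-by-position comparison of two already-aligned protein fragments of equal length. The following Python sketch is an illustration only. The function name is hypothetical, it assumes a gap-free alignment, and it is not necessarily how the figures in this file were derived. The example input is the first 60-residue line of AAGE02035951.1 and of AAGE01005255.1, so its result describes that fragment alone, not the full-length genes.

```python
def compare_aligned(a: str, b: str):
    """Compare two gap-free, equal-length protein strings.

    Returns (diffs, identity), where diffs is a list of
    (1-based position, residue in a, residue in b) tuples and
    identity is the percentage of matching positions.
    """
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    diffs = [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]
    identity = 100.0 * (len(a) - len(diffs)) / len(a)
    return diffs, identity


if __name__ == "__main__":
    # First protein line of AAGE02035951.1 and of AAGE01005255.1.
    seq_a = "MYVFTTVAGLLVFIFILYKIYLRSLPSYRAAKYFPGYPVYPIVQNLFTALFKSQTGAFQQ"
    seq_b = "MYVFTTVAGLLVFIFILYEIYLRSLPSYRAAKYFPGYPVYPIVQNLFTALFKSQTGSFQQ"
    diffs, identity = compare_aligned(seq_a, seq_b)
    for pos, x, y in diffs:
        print(f"position {pos}: {x} -> {y}")
    print(f"{identity:.1f}% identical over {len(seq_a)} residues")
```

For sequences that differ in length or contain indels, a real alignment (for example from BLAST) is needed before counting differences.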
>TC57838
TC48249 matches 593092990 and 593182570, 825253407 579726574
cyan = corrections
GCAAAATTCGACAAGATTCTTCGTCCCATTCATGCATTCACCAACAGCATCATCCGACAGCGAAGGGAAACATTTCATGA
AACTATGAAAAACGTGGACTCCCCATCGGAGGAGAACATATACACCAACATAAAGCAGCGCTACGCCATGCTGGATAGTC
TTCTGCTGGCGGAAGCCAAACAGCAAATTGACGGCGAAGGGATCCGCGAGGAGGTTGACACGTTTACCTTTGAAGGCCAC
GATACAACTGGCAGTGCCTTCGTGTTCACCTTTCTGTTGATTGCTCACGAGCAACTCGTTCAGCAGCGTCTGTTCGAAGA
GATTGAACGCATGTTCAACCTCCAACCCAATCCAACCCAACAGGACTACAATGACTTGAAGTACATGGATCGGGTGATCA
AGGAATCGCTTCGAATCTATCCGCCGGTGCCATTCATCTCCCGATTGATTACCGAGGATGTACAATACGATGGGAAGTTG
GTACCGAGGGGTACCATCATGAACATCGAAATCTACGATTTGCACCGAGATCCGGAGCAGTTTCCCGATCCGGAACGATT
CGATCCGGATCGGTTTCTGCCGGAGGAGGTCCAGCGGAGGAGTCCGTACGCTTATGTTCCGTTCAGTGCTGGACCGAGGA
ATTGCATTGGTCAACGGTTCGCCATGCTGGAGCTGAAGGCCATCCTCATCGGGGTGCTCCGCGAGTTCCGAGTCCTTCCC
GTTACCAAGCGGGAGGATGTGGTTTTCGTTGGGGACATGGTCCTCCGCTCGAGAGACCCAATCGTGGTCAAATTCGAACG
ACGTTAAGCTTTTTCTTGCTTTTTATAGCGACCCGTTGACCCAGTGAATTCAAGGATTTTTCAGTTTTTTACGGACAAAG
AGCGCATCCTGAACTGCTACTAGGCTACCCAAAAGTCATATTCTTAAATTGTTAATCCTAACATACTGTGTGAATAAATG
TTTTTTTATCGATT
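The protein lines numbered 1 to 807 in the 520524408 record appear to be the frame-1 translation of this TC57838 nucleotide sequence. The following Python sketch translates a DNA string with the standard genetic code so that correspondence can be checked. The function name is hypothetical, and codons containing ambiguous bases are reported as X by assumption.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
_BASES = "TCAG"
_AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
          "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}


def translate(dna: str, frame: int = 0) -> str:
    """Translate DNA starting at a 0-based frame offset.

    Stop codons become '*', unrecognised codons become 'X',
    and a trailing partial codon is ignored.
    """
    dna = dna.upper().replace("\n", "").replace(" ", "")
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


if __name__ == "__main__":
    # First 27 bases of TC57838 as listed above.
    print(translate("GCAAAATTCGACAAGATTCTTCGTCCC"))
```

Pasting the full TC57838 sequence into `translate` and reading up to the first `*` gives the peptide to compare against the C-terminal exons listed for 520524408.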
>AAGE01005255.1
55% TO 4J9 TC57836 complete
5509 MYVFTTVAGLLVFIFILYEIYLRSLPSYRAAKYFPGYPVYPIVQNLFTALFKSQTGSFQQ 5330
5329 ARQWARIFNHRTYRLLIQGVLFVQIIHHKDVEMLLSSSRLITKSPLYKLIVPFIGKGLLN 5150
5149 STGEKWHQRRKILTPTFHFNILQGFLQIFHEECRKLVYQLDKDAAQGITTTLQPLSTQVTLNTIC 4955 (1)
4896 ETAMGLKLDTSETAEVYKSNIREVGKVIQQRIMNPLLFEDWIYKITGYQAKFDK 4735
4734 ILRPIHAFINSIIRQRRETFHETMKNVDTPSEENIYTNIKQRYAMLDSLLLAESKQQID 4558
4557 AEGIREEVDTFTFEGHDTTGSAFVFTFLLIAHEQLVQQRLFEEIERMFNLQPNPAL 4390
4389 QDYNDLKYMDRVIKESLRIYPPVPFISRLITEDVQYDGKFVPRGTIMNVEIYDLHRDPEQ 4210
4209 FPDPERFDPDRFLPEDVQRRSPYAYVPFSAGPRNCI 4096
GQRFAMLELKAILTAVLREFRVLPVTKREDVVFVADMVLRSRDPIVVKFERR* 783
>AAGE01001298.1
AAGE01138953 gene a C-term complete
821735340 matches on the (-)