D. Nelson
Some names not previously assigned are now assigned. Some sequences have been
revised, while others are known but remain confidential. Look for a new Bombyx
genome paper in Science late this year or early next year. Then all will be revealed.
Revised Feb. 9, 2007
D. Nelson
Some sequences are given placeholder names like CYP6AEaa until official names
can be assigned. There are 51 complete sequences; 9 more are nearly complete,
missing only the N- or C-terminus or a small internal piece. This makes 60 genes
with strong, intact or nearly intact assemblies. Another 12, in closely related
gene subfamilies, have all or nearly all exons accounted for, but the exons
cannot be connected because the small contigs do not overlap. This makes at
least 72 genes. There are 7 more partials that are less than half a gene at the
present time. If these can be completed, that would make 79 P450s in silkworm,
which is comparable to Drosophila.
Parts of some genes were assembled from confidential sequences. The Beijing
Genomics Institute has now set up a database with a BLAST server to make this
data public, so I have included it here.
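The running totals quoted above can be checked with a few lines of arithmetic. The sketch below only restates the counts given in the text; the category labels are illustrative names chosen for this example, not official terms.

```python
# Counts as stated in the text above; the labels are illustrative only.
counts = [
    ("complete", 51),
    ("nearly complete", 9),
    ("exons found but not connected", 12),
    ("partial, less than half a gene", 7),
]

running = 0
totals = {}
for label, n in counts:
    running += n
    totals[label] = running  # cumulative gene count after this category
    print(f"{label:32s} {n:3d}   running total {running}")
```

The cumulative values after each category correspond to the 60, 72 and 79 quoted in the note.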
SilkDB: a knowledgebase for silkworm biology and genomics
Nucleic Acids Research 33, 399-402 (2005)
http://silkworm.genomics.org.cn/
July 20, 2004
Note: trees have been built and naming is being done, but some sequences are
still uncertain. The old CYP301B sequence has had its C-terminal exon changed
based on an Apis sequence, and it is now CYP49A1; the old CYP49A1 is now CYP49A2.
>CYP4G22
old CYP4Gxx 64% to 4G19
AV404689 Bombyx mori prothoracic gland EST spans two contigs
AV405174 Bombyx mori prothoracic gland EST
AV404871 Bombyx mori prothoracic gland EST
BP183989 P5PG Bombyx mori cDNA clone
BP182770 cDNA clone NRPG1026
BP182055 NRPG Bombyx mori cDNA clone
BP183011 cDNA clone NRPG1327
BP183771 P5PG Bombyx mori cDNA clone
AU004478 EST
BP183321 NRPG Bombyx mori cDNA clone
BAAB01118673.1
BAAB01085960.1 84% to 4G20
MSYTNAENVVPTSTFSAINLFYVLLVPAVILWYAYWRMSRRRLYELADKLNGPPGLPLLGNALEFVGGSA
()
DIFRNIVQKSADYDHESVVKIWIGPRLLVFLYDPRDVEVILSSHVYIDKAEEYRFFKPWLGNGLLIST (1?)
GQKWRSHRKLIAPTFHLNVLKSFIDLFNANSRAVVDKLKKEASNFDCHDYMSECTVEILL
(1)
234
ETAMGVSKSTQDQSGFEYAMAVMKMCDILHLRHTKIWLRPDLLFKFTDYAKNQTKLLDIIHGLTKK 431 (0)
988
VIKRKKEEFASGKKPSNLNETATTSEPSTGKLTSVEGLSFGQSSGLKDDLDVDDDVGQK 1164
1165
KRLAFLDLLLESSQSGVAISDEEIKEQVDTIMFE 1266 ()
1340
GHDTTAAGSSFFLSMMGIHQDIQDKVIEELDQIFGDSDRPVTFQDTLEMKYLERCLMET 1516
1517
LRLYPPVPIIARQVNQEITL 1576 ()
1680
SNGKKIPAGTTLVIATYKLHRRPDVYPNPNKFDPDNFLPERSANRHYYAFVPFSAGPRSCV 1862 (1)
1960
GRKYAMLKLKVILSTILRNFRVISDLKESDFKLQADIILKRAEGFQVRLQPRKRMAKA* 2136
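Several peptide segments in the entry above are flanked by numbers that look like nucleotide coordinates on the genomic contig, and some are followed by parenthesised markers such as (0), (1) or (). A quick consistency check is that the coordinate span equals three times the segment length. This assumes that the numbers are inclusive nucleotide coordinates and that each residue, and the terminal * stop, occupies one codon. The sketch below tries this on three segments of CYP4G22. The function names `split_segment_line` and `span_matches` are hypothetical helpers written for this example, and treating the parenthesised tokens as annotation markers to be set aside is likewise an assumption.

```python
import re

def split_segment_line(line):
    """Split one sequence line into (peptide, coordinates, markers).

    Assumes bare integers are coordinates, parenthesised tokens are
    markers, and every other token is peptide text.
    """
    peptide, coords, markers = [], [], []
    for token in line.split():
        if token.isdigit():
            coords.append(int(token))
        elif re.fullmatch(r"\([^)]*\)", token):
            markers.append(token)
        else:
            peptide.append(token)
    return "".join(peptide), coords, markers

def span_matches(start, end, peptide):
    """True if the inclusive span start..end holds exactly len(peptide) codons.

    abs() lets the check work for descending (reverse-strand) coordinates too.
    """
    return abs(end - start) + 1 == 3 * len(peptide)

# In the entry the start coordinate sits on the line before the segment,
# so the two are joined by hand here for the example.
lines = [
    "234 ETAMGVSKSTQDQSGFEYAMAVMKMCDILHLRHTKIWLRPDLLFKFTDYAKNQTKLLDIIHGLTKK 431 (0)",
    "988 VIKRKKEEFASGKKPSNLNETATTSEPSTGKLTSVEGLSFGQSSGLKDDLDVDDDVGQK 1164",
    "1960 GRKYAMLKLKVILSTILRNFRVISDLKESDFKLQADIILKRAEGFQVRLQPRKRMAKA* 2136",
]

results = []
for line in lines:
    pep, coords, markers = split_segment_line(line)
    ok = span_matches(coords[0], coords[1], pep)
    results.append(ok)
    print(coords, len(pep), markers, "consistent" if ok else "check by hand")
```

A segment that fails the check is not necessarily wrong; a codon split by an intron or a frameshift could change the count, so a mismatch is only a prompt to look at the contig again.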
>CYP4G23 old CYP4Gyy 58% to 4G19 59% to CYP4Gxx
CK508129 rswdd0_001928.y1 swd Bombyx mori cDNA
CK505752 rswcc0_009305.y1 swc Bombyx mori cDNA
AV401408 BP122820 BP123076 BP183420 CK509955 CK505752 CK505553
BAAB01050777.1
BAAB01096227.1 BAAB01156775.1 BAAB01003438.1
BAAB01073763.1
BAAB01110168.1 BAAB01050777.1
MTSLVDETEGYHVNSRVIFYPLLGLTTAIWILYRWQQNSHMHKLAELLPGPASIPIFGNALTLMRKNPHE
()
LVNLALGYAQTFGNVIRVWLGSKLIVFLVDADDIEIILNSHVHIDKATEYRFFKPWLGEGLLISS
(1?)
GPKWRSHRKMIAPTFHINILKSFVGIFNQNSNNVVEKLKSEVGKTFDVHDYMSGTTVDILL
()
ETAMGISRKTQDESGFDYAMAVMK
()
MCDIIHQRHYKFWMRSEIVFKLTSFFKQQTKLLGIIHGLTNK
()
VIKNKKETYLENKAKGIIPPTLEEFTHHSGEILANNAKTLSDTVFKGYRDDLDFNDENDV
()
GEKKRLAFWDLMIESSQNGTNKISDHEIKEEVDTIMFE
()
GHDTTAAGSSFVLCLLGIHQDVQARVYDELYQIFGDSDRPATFADTLEMKYL
ERVILESLRLYPPVPVIARKLNRDVTI
()
STKNYVIPAGTTVVIGTFMLHRQPKYYKDPEVFNPDNFLPENTQNRHYYSYIPFSAGPRSCV
(1)
GRKYALLKLKILLSTILRNFRTISEIPEKEFKLQGDIILKRAEGFQMKVEPRKRVPTNVAR*
>CYP4G24 old CYP4Gzz 71% TO CYP4Gyy
FIRST TWO EXONS ARE JOINED WITHOUT OVERLAPPING EVIDENCE,
BUT THEY ARE THE ONLY OTHER 4G N-TERM SEQUENCES WITHOUT A PARTNER
BAAB01149226.1
BAAB01135847.1 BAAB01073803.1 BAAB01199362.1
MSSLLNYNFEFDVPPIFHTLLLAIMIMWM
LHRWQQSSRLFRLGNKLPGPMALPLVGNSLLILGKKAEGW
= BAAB01031491.1
VKYALDYSEKYG
TVVRAWAGPKLVVFLTDANDVEVILNSQIHIDKSPEYRFFKPWLGEGLLISSG
= BAAB01016193.1
XKWRSHRKMIAPTFHINILKSFMGVFNENSKSVVKKLRSEVGKTFDVHDYMSCVTVDILL
ETAMGITKTTQDAASFDYAMAVMK
MCNIIHQRHYKVWLHFDAIFKLTSLFKKQRELLKTIHGLTNK (0)
VIKKKKFMYLQNKEKGIIPPTIEELTKIKDTNDSIMEDSAKTLSDTVFKGYRDDLDFNDEQDV
()
GEKKRLAFLDLMIESAQNHTCNISDHEIKEEVDTIMFE ()
GHDTTAAGSSFVLCLLGIHQEIQSKV
YDELFEIFGDSDRLVTFADTLQMKYLERVILESLRLYPPVPAIARKLTRDVQI
()
VTNNYIIPAGSTVVIGTFKIHRDPKYHKNPNVFNPDNFLPENTQNRHYYSYIPFSAGPRSCV
(1)
GRKYALLKLKVLLSTILRNYKTTSEISEDQFVLQADIILKRYDGFKIRIEPRNKNHSNTV*
CYP4L subfamily sequence
>CYP4L6 67% to 4L4
CK535040 EST 5aa diffs BP182824 EST
BAAB01121615.1
BAAB01097697.1 BAAB01093085.1 BAAB01107041.1
MYLILLICLLVLVLTVSWWSMLNRICKSNVPGPFPLPIIGNAHQFVVRST
(1)
EFLGLLKSFTDKYGDVFRVHFFSYPYVLISHPKYAE
(0)
932 ALVSSADLITKGRSYSFLKAWLGEGLLTAS 1018
(1)
1506
GPRWRLHRKFLTPAFHFNILQNFLPVFCKKSEILRDKIRRLADGQPIDLFPITALAALDNVAESIM 1688 (1)
GVSVNAQQNSESEYVRAIEX
()
LSQITTLRMQIPLLGEDFIFNLTSYKKKQNIALEVVHGQTKKVIEARRCELEKNNKTNISGTNE
()
IGIKNKHAFLDLLLLAEIDGKL
()
12 MDEQSVREEVDTFMLEGHDTTTSGI 86
85 LVYTLFGLSKHPDIQEKWYEEQLTIFGEEMDRTPAY ()
NELAQMKVLEWVIKESLRMYPSVP
264
265
LIERWITKDAE
VGGLKLSKGTSVVLNIFQMHRNPEVFEKPLEFIPERFDSLEHKNPFSWLAFSAGPRNCI
()
GQKFAMMEMKVTLSTLVRNFKLVPVDIEPILCADLILRSQNGVKVGFLPRTQSNSKT
CYP4M subfamily sequences
>CYP4M5 EST BP125511
BAAB01099467.1
BAAB01097680.1 BAAB01039373.1
MFVYLIFIASFFLLIHLAFNY
NSKAVMMNKVPGPKLSFILGNAPEIMMLSSVELMKLARKFASRWDGIYRIWAFPLSIINIYNPDDVEVIVSTTKHN
EKSSVYKFLKPWLGDGLLISK
GEKWQQRRKILTPAFHFNILRQFSVIIEENSQRLVESLEKCIGKPIDIVPVVSEYTLNSIC
ETSMGTQLSDKTEDAWKAYKDAIYELGPYFFQRFTRVYLYFDIIFYLTSLWRKMKKPLKSLHG
FTSTVIKERKIYVEQNGVKF
GEDVNDDDLYIYKKRRKTAMLDLLIAAQKDGEIDDHGIQEEVDTFMFE
EHDTTASGLTFCFMLLANHRAVQDKIVEEINYIMGDSTRRANLEDLSKMKYLECCIKESLRLYPPVHFISRNLNEPV
VLSNYEIPAGSFCHIHIFDLHRRADIYEDPLVYDPDRFSQENSKGRHPYAYIPFSAGPRNCI
GQKFAMIEMKSAVAEVLRKYELVPVTRPSEIELIADIILRNSGPVEITFNKRTK
>CYP4M9
BAAB01103156.1 BAAB01142776.1 BAAB01178932.1 57% to 4M5
BY921981.1 EST N-term
MWMYFILILLLLLTIHFLLNYNYRARLLRRIPGPRGYFIVGNALDVILSPAELFASTRK
NAAQWPNLNRFWSFGIGALNIYGPDEIEAIISSTQHITKSPVYNFLSDWLRDGLLLST
GTKWQKRRKILTPAFHFNILKQFCVILEENSQRFTENLKDTEGKSINVVPAISEYTLHSII
()
675
ETAMRTQLGSETSEAGRSYKNAICELGNQFVHRLARLPLHNNFIYNLYTLGKQNKH
LNIVHSFTKKVIKDRRQYIRGNGGNNFDDEKDTQADEHSIYFNKKKTAMLDLLLKAERDGL
IDEIGVQEEIDTFMFE
()
1150
GHDTTATGLTYCIMLIANHKSIQVGALRKYID 1245 ()
295
GESTRAADIEDLSKMRYLERCIKESLRLYPPVPSMGRILSEEI 423 ()
1467
VLDGYTVPAGTYCHIQIFDLHRREDLFKDPLVFDPDRFLPHNTEGRHPYAYIPFSAGPRNCI 1282 ()
GQKFAILEMKSLLSAVLRRYNLYPITKPEDLKFVLDLVLRTTEPVHVRFVKRNKV*
CYP4S subfamily sequences
>CYP4S5 66% to CYP4S4 from the moth Mamestra brassicae
BAAB01177041.1
BAAB01181662.1 BAAB01048279.1 BAAB01025317.1
AADK01013707.1
MIWAVVLLIIFIFLIWKALEDEENPLDSLPGPERKPIIGATMRFVNLNT
()
3009
GEMFIKLREYHAMYGTRYVVKIFKRRILHLSNERDVE (0)
365 VVLSHSKNIKKSKPYTFLSPWLGSDLLLST 454
GFKWHSRRKILTPTFHFNILKSFLEIIKDKSCDLVKRLEEYRGEEVDLMPVISDFTLFTIC
ETAMGTQLDSDKSAETSEYKMAILQIGSLLLDRLTKVWLHNDFIFRQFTVGRKFQKCLKQVHSFAHN
VIVERKRQRASGRDPTVVAEDVFG
RKKRLAMLDLLLEAEEKNEIDFEGIMDEVNTFMFE
(1?)
GHDTTAVALTFSLMLVAEDDQVQ
(0?)
DRIYKELQGIFGDSDRRPTISDVAEMKYLEAVVKETLRLYPSVPFIAREITEDFML
()
DDLKIKKGSEVAVHIYDLHRRKELFSDPEKFLPDRFLNGELKHPYSFVPFSAGPRNCI
(1)
GQRFATLEMKCVLSEICRSFRLEPRTKGWRPTLVAEMLLRPNEPIHVKFIKRKQS*
>CYP4S6 94% to CYP4S5
BAAB01121123.1
BAAB01054179.1 BAAB01073078.1 BAAB01008096.1 BAAB01081659.1
AADK01014846.1
164
MIWTVVVLIIFIFLIWKALEDEENPLDSLPGPERKPIIGAALRFVNLNT 18 ()
REMFIKLREYHAMYGTRFVVKIFKRRVLHLSNEKDAE (0)
VVLSHSKNIKKSKGYTFLSPWLGSGLLLST (1)
1053
GFKWHSRRKILTPTFHFNILKSFLEIIKDKSCDLVKRLEEYRGEEVDLMPVISDFTLYTIC 1235
1966
ETAMGTQLDSDKSAETKEYKMAILQIASLLLNRLTKVWLHNDLIFDQLTVGRKFQK 2133
2134
CLKQVHSFAHNVIVERKRQRASGRDYSVVAEDVFGRKKRLAMLDLLLEAEEKNEIDFEGI 2313
2314
MDEVNTFMFE 2343 ()
2862
GHDTTAVALTFSLMLVAEDDQVQ 2930 ()
3158
DRIYKELQGIFGDSDRRPTISDVAEMKYLEAVVKETLRLYPSVPFIAREITEDFML 3325 ()
578
DDLNIKKGSEVVVHIYDLHRRKELFADPEKFQPDRFLNGELKHPYSFVPFSAGPRNCI 751 ()
GQRFAMLEMKCVLSEICRSFRLEPRTKGWRPTLVAEMLLRPNEPIHVKFIKRKQS*
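The header gives CYP4S6 as 94% identical to CYP4S5. A figure like this comes from counting matching positions in an alignment. The sketch below does that for the first exon only, where the two segments are the same length and need no gaps. The helpers `mismatches` and `percent_identity` are hypothetical names for this example, and a comparison over the whole protein would need a proper alignment tool. Because it covers a single short exon, the result is not expected to reproduce the whole-protein figure.

```python
def mismatches(a, b):
    """Return the 1-based positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return [i for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]

def percent_identity(a, b):
    """Percent of aligned positions that carry the same residue."""
    return 100.0 * (len(a) - len(mismatches(a, b))) / len(a)

# First-exon segments copied from the two entries above.
cyp4s5_exon1 = "MIWAVVLLIIFIFLIWKALEDEENPLDSLPGPERKPIIGATMRFVNLNT"
cyp4s6_exon1 = "MIWTVVVLIIFIFLIWKALEDEENPLDSLPGPERKPIIGAALRFVNLNT"

diff_positions = mismatches(cyp4s5_exon1, cyp4s6_exon1)
identity = percent_identity(cyp4s5_exon1, cyp4s6_exon1)
print("differing positions:", diff_positions)
print("identity over exon 1: %.1f%%" % identity)
```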
CYP4AU sequences
>CYP4AU2
BAAB01192732.1 CYP325 like N-term
AADK01032226.1
1927
MIILLLLFFVASLYWLYWTNKSKRMNKMTASLPVPPTLPILGNATLFIG
>CYP4AU2
35% to 4C3 aa 112-276 37% to 4G19 BAAB01022101.1 exons 2,3,4
AADK01025934.1
(1)
AIGDPEDAQVVLENCLDKDVVYRFLRPWLGHGLFVAP (1)
VCLWKSHRKVLLPVFHNKVVEQYLQMISVQADILVERLNEKANKGEFDVLKYITACTLDIVF
(1)
ETAMGERMDVQRSPDTPYLRARHTVMTILNMRFFKVWLQPDCIFNLTSYSKQQKDNIDLTHKFTDE
(0)
>CYP4AU2
62% to CYP4AU1 I-helix to end BAAB01055413.1
AADK01029434.1
Note: N-terminal is known but confidential (10/18/2007)
Only missing 23-24 aa in the middle between the two segments
(1) GKVRPVLDMLFGREIEFTDEQLREHIDSITIAGNDTTALVMAYTLVRLGIHQNVQEKVYLE
()
QRTIFGDSKRGADKVDVAQMQYLERVLKESMRLYTVVPIIARNVHKDTYL ()
PRCGVTLPAGIGAVVGPFAIHRSKSVWGPDADEFDPDRFLPERSLNRHPAAFLPFSHGSRNCI
()
GRNFGMLIMKSIVSTISRSYRIEADELGPLKIEMLLFPIRGHQIRISRRETLA*
>CYP4AU-like
BAAB01077742.1 Length = 3282 68% to BAAB01022101.1 exon 2
AADK01002646.1
69% to CYP4AU2
2223
AICDTEHAQIILDCLNKDMVYRFLRLWLEHGLFVAS 2330
CYP4AX sequences 50% identical to CYP4S4
>CYP4AX1 CYP4qq 50% to CYP4S4 53% to CYP4Sxx lower case is from CYP4rr
BAAB01092965.1
BAAB01135526.1 BAAB01183645.1, AADK01005955.1
AADK01030377.1
(frameshift after GQKF in last exon)
MWFSLVVVAVIYALWKLFYKEDDPIDSLPGPTKLPIIGNMLDMFNMTP ()
GEKFKYERQLSKTYKQRYMQKIFYRRIVYVHHPDDVE ()
VVLSHSKNITKNVNYDFLKPWLGTGLLLST
()
GSKWFKRRKILTSAFHFDILKDFASLFEEKSRRLVDQLRANNGEPISLLPVMSNYTLFTLC
(1)
ETALGTKLDTDRSVAAAEYKDAISKTAQISIYRLPRIWLYIDAIFNRTSAGREFAKNVDIIHSFADN
VIVQRKEQRLNSLDKGLVERDEFNRKKRTALLDLLLEAEAKREIDLEGIREEVNTFMFA
()
GHDTTGTALTFSLMLLSDHEEAQ
()
ERILEEYNEVMRGKETPTLSEFAEMKYLDAVIKETLRLYPNPHRVGRVLTEDITL
(1)
GGVPIKAGTEIGVQIIDLHHREDFFPEPEKFRPERFLRGEIQHPYSFVPFSAGPRNCL
()
GQKF
AMLEIKSVLTHICNNFKLVPMKRNWRVETVSDIVLKPAEPIYIKFVPR*
>CYP4AX2 CYP4rr 94% identical to CYP4qq lower case is from CYP4qq
BAAB01152933.1
BAAB01194854.1 BAAB01058563.1 BAAB01102208.1 BAAB01008375.1
MWFSLLVVAVIYALWKLFYKEDDPIDSLPGPAKLPIIGNMLDMFNMTP
()
GEKFKYERQLSKTYKQRYMQKVFYRRIVYLHHPDDVE
(0)
VVLSHSKNITKNVIYDFLKPWLGTGLLLST
()
GSKWYKRRKILTSAFHFDILKDFASLFEERSRRLVDQLRANNGEAISILPVMNNFTLLTIC
(1)
ETALGTKLDTDRSVNTAAYKDAISKIGQICIYRLSRIWLYIDAIFNRTSAGREFAKNVDIIHSFADN
VIVQRKEQRLNSLDKGLVERDEFNRKKRTVLLDLLLEAEAKREIDLEGIREEVNTFMFA
()
ghdttgtaltfslmllsdheeaq
()
ERILEEYNEVMRGKETPTLSEFAEMKYLDAVIKETLRLYPNPHRIGRVLTEDITL
(1?)
GGVPMRAGTEVCVLTIDLHYREDFFPEPEKFRPERFLRGEIQHPYSFVPFSAGPRNCL
(1)
GQKFAMLEIKSVLTHICNNFKLVPMKRNWRVETVSDIVLKPAEPIYIKFVPR*
&&&&&&&&&&&&&&&&&
>AM106362 Lutzomyia longipalpis Jacobina = best EST to these genes
MWGKFFFICDGLFWAVFFSPPLRGGLGSFFRGGGRKFPGPFYGS
PSLGRLPAQMGFSQKGFLGLPKVILSKITQSNENFGWAEAVC
()
AYFPPEECQYVFNANECLSRDDIYDYIKPFTGDGLVTLP
()
AETWKDHRKFLNPCFNLKILQSYMPIFNTEVKTLIGRLGQRIGKGSFDMYDYMDACALDVVC
()
QTTLGTQMNIQKNENMDYLDAANSLLATMTTRIFNPLYHSDFIFNLSKWAKMEQKNSDITFGFVDNILQR
KKAAYKKFQPSDEQNNLDEGTSFKSPQLFIDQLLKLSMEGKYFTDTDVKNEANTIVA
CYP6B sequence
>CYP6B29 51% TO 6B1 CK494244 EST 57% to 6B27 aa 144-500
BAAB01105883.1
BAAB01064450.1
2461
MAIIYILSASVVLPLLLYLYFTRHFNYWKKRNVPGPKPVPLFGNLMELALRKKNIGIVFKELYENFPNEK 2670
2671
VVGIYRMTTPCLLIRDLDVIKNIMIKDFDVFVDRGVELS*SGLGANLFHADGDTWRV 2841
2842
LRNRFTPLFTSGKLKN
MLHLMIERANKYIEHVEMLCDHQPEQDIHTLVQKYTMATIAACAFGLDIDTTDPNKDQLK
TLEEIDRLSLTANFAFELDMMYPGVLKKLNSTLFPGFVSRFFKDVVKTIIEQRNGKPTDR
NDFMDLILALRQLGDIQATKRNSEDKEYSIELTDELIEAQAFVFYIAGYETSATTMTFML
YQLALNPDIQDKVIAEIDQGLKESKGEVTYEMLQNLTYFEKAFNETLRMYSIVEPLQRNA
KIDYKIPDTDIVIEKGTTVLFSPLGIHHDEKYYPNPSKFDPERFSPANISARHPCAHIPF
GTGPRNCI
()
661
GMRFAKIQSRVCMVKMFSKFRFELAKNTPRNLDIDPTRLLLGPKGGIPLKIVRR* 825
CYP6AB subfamily sequences
>CYP6AB4 64% TO 6AB1 OVER WHOLE SEQ BAAB01081811.1 BAAB01162324.1
BAAB01031011.1,
AADK01009517.1
MLTAAIFVIIVALVYLYSTRTFRYWEKRGIKHDKPVPFFGTDSEGYLLRKSMTQTAVDAY
LKYPNEKVIGFFRSSRPELIIRDPDIIKRILTTDFAYFYPRGLNPHKKVIEPLMRNLFFA
DGDLWRLLRQRMTPAFTSGKLRAMFPLVVERAERLQSRTLEIASQPLDARELMARYTTDF
IGACGFGLDADSLNDEDSAFRKLGAAIFNITVQQAIVAALKEIFPGIFKHFKYSSKYETD
FMSLVSSILKQRNYKPSGRNDFIDLLLECRMKGEIVVESIEKMKPDGTSEVVRMELTEQL
LAAQVFIFFAAGFETSSSATSFTLHQLAFHPEIQEKVQKELDQVLAKYNNKLCYDAIKEM
RYLESAFKEAMRMFPSLGFLIRECARQYTFPELNLTIDEGVGIIIPLQALHNDPEYFDSP
NEFRPERFMPSEYNHNKTKFVYLPFGDGPRGCI
()
GARLGLMQSLAGLAAVLSKFTVKPAPSTKRHPVVEPKSSVVQSIKDGLPLLFIERTKS*
>CYP6AB5
BAAB01021567.1 BAAB01206787.1 58% to 6AB3 72% to 6AB2
56% TO CYP6ABxx
MISSING C-TERM BEYOND A REPEAT SEQ.
possible C-term = BAAB01196051.1 66% to 6AB2
Bmb030331 from Li Bin
MFFYLLIVILITLYYYGVR
TFDYWKKKGVNHDPPLPFFGNNLRQFMQKASMAMMATETYKKYPEEKVVGFYRGTSPELV
VRDPELIKRILVTDFSSFYARGFNPHKKVIEPLLKNLFFADGDLWRLIRQRFTPAFSTAK
LKAMFHIITERAEKLQMIAENEAYENFCDVRELMARYTTDFIGACGFGLNIDSLSDENSQ
FRKLGKRIFKRDLSDAVRAALKLMFPELCKHLTFLTPELEKSMTYLVQNVIREKNYKPSG
RNDFIDLMLELKQKGKLLGESIEAKNANGTPKQVELEFDDLLMTAQVFVFFGAGFETSST
ASSYTLHQLAFNPECQEKTQKEIDEVLSKHNNKITYDAIKEMTYLEMAFNEAMRLYPSVG
YLVRMCTVPEYTFPEINLTINEDVKLMIPIQAIQKDEKYFKDPERFHPERFSSGAKANLK
PYTFLPFGEGPRACV
()
923
GERLGQMLSMAGLVAVLQKYTVEPVEISLRDPIPDPTTTVSEGFVHGLPLKLRRRERRI* 1102
>CYP6AB8 96% identical to CYP6AB4 20 aa diffs
whole seq known but confidential
CYP6AE sequences.
>CYP6AE2 Bombyx mori (silkworm)
old CYP6AEcc BAAB01091139.1 contig444108,
frameshift at 1529 whole gene in one contig
No accession number
Junwen Ai
submitted to nomenclature committee Jan. 31, 2007
Nearly identical to sequence CYP6AEcc on the Bombyx page, except that one
region from amino acids 106-256 does not match.
The EST BY914225 agrees with the Ai sequence, so the old CYP6AEcc sequence
appears to be a hybrid assembled from two genes.
663
MSVSALVFAAFVLLVTYIYYWSTRKFDYWKRKNVPYAKPVPFFGNYMRYITLQSFLGDVM 842
843
QKLCQQFPDRPYFGSFYGTEPALVLQNPEIIKQVFTKDFYYFNSRENRDYNHKEVFTQNL 1022
1023
FFANGDRWKVIRQNLTPLFSSSKMKNMFYLVEKCNHSLEDMLDKETKDLQSIEIRSAMIR 1202
1203
YTLDSICSSAFGIETNTLSEGAENSPFPSMGSTIFSSSITRGLKLIGRSMWPGIFYKLGL 1382
1383
RCFPTEIDDFFERLLTEVFENRGYKPTNRNDFVDLILSLKQNDYLTGDG 1529
1529
LVPKNVDAKKVTVKVDDALLIAQCVVFFGAGFETSATTLSAALYELAKNPEAQRRAQEEV 1708
1709
DELLLKHNNKLNYDCLAELPYLEACMNEAMRLYPVLPNITREAVTDYTFPDGLRIDKGMR 1888
1889
VHVPVYAIHRNPDNFPDPEEFRPERHLGDAKNDIKQFTYFPFGEGPRICI 2038 (1)
2917
GMRFGKMQTIAGMVTCLKKYNFELADGMSKTVPFRSTTVLTQPSTGLFLKATPRDGWKQRIFAR* 3111
>CK526561 a related EST rswfa0_003045.y1 swf Bombyx mori cDNA.
2aa diffs with BAAB01091139.1, 6aa diffs with CYP6AEbb
GFETSASTLSAALYELAKNPEAQRRAQEEVDELLLK
414
HDNKLNYDCLAELPYLEACMNEAMRLYPVLPNITREAVTDYTFPDGLRIDKGMRVHVPVY 235
234 AIHRNPDNFPDPEEFRPERHLGDAKNDIKQFTYFPFGEGPRICIG
>CYP6AE3P (old CYP6AEbb) BAAB01174895.1 95% to BAAB01091139.1 54% to 6AE1
there are probably two very similar genes
BAAB01172535.1 fills in missing sequence for this contig
Bmb026776 from Li Bin
MSVSALVFSAFVLLVTYIYYWSTRKFDYWKRKNVPYAKPVPFFGNYMRYITLQSFLGDVM
QKLCQQFPDRPYFGSFYGTEPALVLQNPEIIKQVFTKDFYYFNSRENRDYNHKEVFTQNL
FFANGDRWKVIRQNLTPLFSSSKMKNMFYLVEKCNHSLEDMLDKETKDLQSIEIRSAMIR
34 YTLDSICSSAFGIESNTLREGGENSPFANMGSIVFSSSITRGLKWISRSMWPGIFYKLGL
213
214
QCFPAEIDGFFER 252
LLTEVFENRGYKPTNRNDFVDLILSLKQNDYLTGDGLVPKNVDAKKVTVKVDDALLIAQC
VVFFGAGFETSATTLSAALYELAKNPEAQRRAQEEVDELLLKHNNKLNYDCLAELPYLEA
CMNEAMRLYPXXXXXTRETVADYTFPDGLRIDKGMRVHVPVYAIHRNPDNFPDPEEFRPE
RHLGDAKNDIKQFTFFPFGEGPRICI (1)
GMRFGKMQTIAGLITCLKKFNFELADGMPRTLAFRSTTLLTQPSTGLFLKATPRDGWEQRIFAR*
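The CYP6AE2 notes describe an earlier assembly that matched the submitted sequence except in one contiguous region, which was taken as a sign of a hybrid of two genes. One way to look for that pattern is to count mismatches in consecutive windows and see whether they cluster. The sketch below does this under the assumption that the two sequences are already aligned without gaps. The helper name `window_mismatches` and the window size of 10 are arbitrary choices for illustration. The two 60-residue rows are the corresponding lines beginning YTLDSICSSAFGIE in the CYP6AE2 and CYP6AE3P entries above.

```python
def window_mismatches(a, b, size=10):
    """Count mismatches in consecutive, non-overlapping windows."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return [
        sum(x != y for x, y in zip(a[i:i + size], b[i:i + size]))
        for i in range(0, len(a), size)
    ]

# Corresponding 60-residue rows copied from the two entries.
cyp6ae2_row = "YTLDSICSSAFGIETNTLSEGAENSPFPSMGSTIFSSSITRGLKLIGRSMWPGIFYKLGL"
cyp6ae3p_row = "YTLDSICSSAFGIESNTLREGGENSPFANMGSIVFSSSITRGLKWISRSMWPGIFYKLGL"

per_window = window_mismatches(cyp6ae2_row, cyp6ae3p_row)
print("mismatches per 10-residue window:", per_window)
print("total mismatches:", sum(per_window))
```

Windows with no mismatches on either side of a run of windows with several would be the kind of localized disagreement the notes describe, though a single row is far too short to draw any conclusion from.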
>CYP6AE4 (old CYP6AEaa) BAAB01211364.1
84% to CYP6AEff 53% to 6AE1
missing C-term
MLFLTLLFILSVCVYFIY
(frameshift)
YRVCNRRFDYWRKKNVSFVPPVPILGNYSGYILLKESISKVVHNLCKLFP 2567
2568
NDPYIGAFFGTEPTLIVKDPEFIKLVLTKDFYHFNGREGSKYTHNEVVTQNIFFTYGDRW 2747
2748 KVIRQNLTPLFSSLKMRNMFHIIEKCSGIFENLLDEESLAPEVEMKSLMSRFTMDCIGGC
2927
2928
AFGVDTKAMQEPKDNIFTTMGYLFFESTTYRGIKNVLRAIWPGIFYGLGLKVFPTDLNEF 3107
3108
FSKLLVGIFEARDYKPSSRNDFVDLLLNLKKNRHIVGDRLQKTTTGDEGADSKFELEVDD 3287
3288
GLLVGQCLAFFSAGFETTSTISNFTLYELAKNPDVQKRAQKEVDEYIKKHNNKLDYDCVK 3467
3468
ELPFVEACIDEALRLYPVLGVLTREVMEQYTFPTGLTLDKGDRVHIPVYHLQRDPEYFPE 3647
3648
PELFKPERFYGEEKKNIRPFTYLPFGDGPRICI 3746 (1)
>CYP6AE5 (old CYP6AEff) BAAB01096775.1 91% TO CYP6AEee
BAAB01211363.1
51% to CYP6AE1 Depressaria pastinacella 86% TO CYP6AEdd
AADK01005027.1
MILTIIFILSLCVYILYRISTRKFDYWQKKNVNYVQPTPFLGNYSGYILLKENLL
DVVYNLSKLFPNDPYVGAFFGTKPTLIVKDPEFIKLVLTKDFFYFTGKECFEYTHKEVIT
QGIIFTYGDRWKVIRQNVTPLFSSSKMRSMFRIIEHCSGVFENLLD
EESLAPEVEMKSLM
SRFTMDCIGGCVFGVDINAMQEPKDNIFTTMGCLFLETTTSRGIKNVVKAIWPEIFYGLG
FKVFPTDIHKFFSKLLVRIFEARDYKPSERTDFVDLLLNLKKNRHIVGDRLQKIKTGDEG
ADSKFELEVDDGLLVGQCLAFFSTGFETSSTISNFTLYELAKNPDVQKRAQKEVDEYIKK
HNNKLDYDCVKELPFVEACIDEALRLYPLFGVISRQTGERYTFPTGLTLDKGDRVHIPVY
HLQRDPEYFPEPELFKPERFYGEEKKNIRPFTYLPFGDGPRICI
(1)
11263
GMRFAKMQILAGLVTILKKYTVQLADGMPETIDIEPKAIVTQPAISLRLKFVPRNDLQKRIFA* 11069
>CYP6AE6P (old CYP6AEee) BAAB01091974.1 93% to CYP6AEff
BAAB01178163.1 = C-term exon 50% TO BAAB01091139.1
BAAB01149335.1 = N-TERMINAL (MOST PROBABLE SEQUENCE TO COMPLETE GENE)
1055
MILTIIFILSLCVYILYRISTRKFDYWQKKNVSYVEPAPFLGNYSGYILLKENLLDVVHN 1234
1235
LSKLFPNDPYVGAFFGTKPTLIVKDPEFIKLVLIKDFYYFHGREGSKYTHNEVITQGIFF 1414
1415
TYGDRWKVIRQNLTPLFSSSKIRNMFRTIEKCSGVFENLLD 1537
EESLAPEVEMRSVMSRFT
MDCIGGCVFGVDINAMQEPKDNIFTTMGCLFLETTTSRGIKNVVKAIWPEIFYGLGFKVF
PTDIHKFFSKLLVRIFEARDYKPSERTDFVDLLLNLKKNRHIVGDRLQKIKTGDEGADSK
LELEVNDDLLVAQCVSFFIAGFETSSNSLTFTLYELAKNPDVQKRAQKEVDEYIKKHNNK
LDYDCVKELPFVEACIDEALRLYPLFGVISR
(frameshift)
DKRERYTFPTGLTLDKGDRVHIPVYHLHHDPEYFPEPELFN
PERFYGEEKKNIRPFTYLPFGAGPRVCI
GERFAKMQMLAGLVPILKRYTVRLAEGMPETINFEPKAIASQPNIGVRLNLLPRNN
>CYP6AE7 (old CYP6AEdd) 53% to 6AE1 BAAB01210600.1 BAAB01149335.1 87% to CYP6AEaa
Bmb021626 from Li Bin
MILTIIFILSLCVYILYRISTRKFDYWQKKNVSYVEPAPFLGNYSGYILLKENLLDVVHN
LSKLFPNDPYVGAFFGTKPTLIVKDPEFIKLVLIKDFYYFHGREGSKYTHNEVITQGIFF
TYGDRWKVIRQNLTPLFSSSKIRNMFRTIEKCSGVFENLLDEESLAPEVEMRSVMSRFTM
DCIGGCAFGVDTNAMQEPKDNIFKTMGYLFFESTTHRGIKNVFKAIWPEIFYGLGFKVFP
TDLNEFFSKLLVGIFEARDYKPSSQNVFINLLLNLKKNRHIVGDRLLKIKTGNVRAESKI
KLEVDDELLVSQCVAFFIAGFETSSTISSFTLYELAKNPDVQKRAQKEVDEYIKKHNNKL
DYDCVKELPFVEACIDEALRLYPVLGVLTREVMEQYTFPTGLTLDKGDRVHIPVYHLQRD
PEYFPEPELFKPERFYGEEKRNIRPFTYLPFGAGPRTCI
(1)
GQRFAKMQMLAGLVTILKRYTVRLAEGMPETINFEQRAIVTQPNIGIRLNLLPRNN*