Revised Oct. 18, 2007

D. Nelson

 

Some names not previously assigned have now been assigned.  Some sequences have been revised, while others are known but remain confidential.  Look for a new Bombyx genome paper in Science late this year or early next year.  Then all will be revealed.

 

Revised Feb. 9, 2007

D. Nelson

 

Some sequences are given placeholder names like CYP6AEaa until official names can be assigned.  There are 51 complete sequences; 9 more are nearly complete, missing only the N- or C-terminus or a small internal piece.  This makes 60 genes with strong, intact or nearly intact assemblies.  Another 12, in closely related gene subfamilies, have all or nearly all exons accounted for, but the exon connections cannot be made because the small contigs do not overlap.  This makes at least 72 genes.  There are 7 more partials that are currently less than half a gene.  If these can be completed, that would make 79 P450s in silkworm, which is comparable to Drosophila.
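The running totals in the note above can be checked with a few lines of Python.  This is only a sketch of the arithmetic: the counts are the ones stated in the note, and the category labels are illustrative shorthand, not official terms.

```python
# Gene tallies as stated in the Feb. 9, 2007 note.
# The labels are illustrative shorthand (an assumption), not official categories.
counts = [
    ("complete", 51),
    ("nearly complete", 9),
    ("exons found but not joinable", 12),
    ("partials under half a gene", 7),
]

running = 0
totals = {}
for label, n in counts:
    running += n
    totals[label] = running  # cumulative total after adding this category
    print(f"{label:32s} +{n:2d} -> {running}")
```

The cumulative values after the second, third and fourth categories correspond to the 60, 72 and 79 genes quoted in the note.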

 

Parts of some genes were assembled from confidential sequences.  The Beijing Genomics Institute has now set up a database with a BLAST server to make this data public, so I have included it here.

 

SilkDB: a knowledgebase for silkworm biology and genomics

Nucleic Acids Research 33, D399-D402 (2005)

 

http://silkworm.genomics.org.cn/

 

July 20, 2004

 

Note: trees have been built and naming is in progress, but some sequences are still uncertain.  The old CYP301B sequence has had its C-terminal exon changed based on an Apis sequence and is now CYP49A1; the old CYP49A1 is now CYP49A2.

 

CYP4G subfamily sequences

 

>CYP4G22 old CYP4Gxx 64% to 4G19

AV404689 Bombyx mori prothoracic gland EST spans two contigs

AV405174 Bombyx mori prothoracic gland EST

AV404871 Bombyx mori prothoracic gland EST

BP183989 P5PG Bombyx mori cDNA clone

BP182770 cDNA clone NRPG1026

BP182055 NRPG Bombyx mori cDNA clone

BP183011 cDNA clone NRPG1327

BP183771 P5PG Bombyx mori cDNA clone

AU004478 EST

BP183321 NRPG Bombyx mori cDNA clone

BAAB01118673.1 BAAB01085960.1 84% to 4G20

MSYTNAENVVPTSTFSAINLFYVLLVPAVILWYAYWRMSRRRLYELADKLNGPPGLPLLGNALEFVGGSA ()

DIFRNIVQKSADYDHESVVKIWIGPRLLVFLYDPRDVEVILSSHVYIDKAEEYRFFKPWLGNGLLIST (1?)

GQKWRSHRKLIAPTFHLNVLKSFIDLFNANSRAVVDKLKKEASNFDCHDYMSECTVEILL (1)

 234 ETAMGVSKSTQDQSGFEYAMAVMKMCDILHLRHTKIWLRPDLLFKFTDYAKNQTKLLDIIHGLTKK 431 (0)

 988 VIKRKKEEFASGKKPSNLNETATTSEPSTGKLTSVEGLSFGQSSGLKDDLDVDDDVGQK 1164

1165 KRLAFLDLLLESSQSGVAISDEEIKEQVDTIMFE 1266 ()

1340 GHDTTAAGSSFFLSMMGIHQDIQDKVIEELDQIFGDSDRPVTFQDTLEMKYLERCLMET 1516

1517 LRLYPPVPIIARQVNQEITL 1576 ()

1680 SNGKKIPAGTTLVIATYKLHRRPDVYPNPNKFDPDNFLPERSANRHYYAFVPFSAGPRSCV 1862 (1)

1960 GRKYAMLKLKVILSTILRNFRVISDLKESDFKLQADIILKRAEGFQVRLQPRKRMAKA* 2136
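Sequence lines in these entries often carry a start and end coordinate around the peptide and a marker in parentheses such as (0), (1), (1?) or ().  The following is a minimal Python sketch for reading such lines.  It assumes that the flanking integers are nucleotide positions on the contig and that the parenthesised marker is an intron-phase annotation; neither is stated explicitly on this page.  The names `parse_segment` and `coords_consistent` are hypothetical helpers, not part of any existing tool.

```python
import re

# Optional start coordinate, peptide (letters with optional trailing '*'),
# optional end coordinate, optional marker such as (0), (1?) or ().
SEGMENT = re.compile(
    r"^\s*(?:(\d+)\s+)?([A-Za-z]+\*?)\s*(?:(\d+))?\s*(?:\(([^)]*)\))?\s*$"
)

def parse_segment(line):
    """Split one sequence line into (start, peptide, end, marker).

    Returns None for lines that do not look like a sequence segment,
    such as accession lists or free-text notes.
    """
    m = SEGMENT.match(line)
    if m is None:
        return None
    start, pep, end, marker = m.groups()
    return (int(start) if start else None, pep,
            int(end) if end else None, marker)

def coords_consistent(start, pep, end):
    """True if the inclusive coordinate span is three nucleotides per residue.

    The span is direction-agnostic because some segments are numbered
    in descending order (assumed here to be reverse-strand hits).
    A trailing '*' is counted as one codon.
    """
    return abs(end - start) + 1 == 3 * len(pep)

# Two lines copied from the CYP4G22 entry above.
lines = [
    " 234 ETAMGVSKSTQDQSGFEYAMAVMKMCDILHLRHTKIWLRPDLLFKFTDYAKNQTKLLDIIHGLTKK 431 (0)",
    " 988 VIKRKKEEFASGKKPSNLNETATTSEPSTGKLTSVEGLSFGQSSGLKDDLDVDDDVGQK 1164",
]
for line in lines:
    start, pep, end, marker = parse_segment(line)
    print(start, end, len(pep), marker, coords_consistent(start, pep, end))
```

A check like this can flag lines where a coordinate was mistyped.  It says nothing about whether the exon joins themselves are correct.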

 

>CYP4G23 old CYP4Gyy 58% to 4G19 59% to CYP4Gxx

CK508129   rswdd0_001928.y1 swd Bombyx mori cDNA

CK505752 rswcc0_009305.y1 swc Bombyx mori cDNA

AV401408 BP122820 BP123076 BP183420 CK509955 CK505752 CK505553

BAAB01050777.1 BAAB01096227.1 BAAB01156775.1 BAAB01003438.1

BAAB01073763.1 BAAB01110168.1 BAAB01050777.1

MTSLVDETEGYHVNSRVIFYPLLGLTTAIWILYRWQQNSHMHKLAELLPGPASIPIFGNALTLMRKNPHE ()

LVNLALGYAQTFGNVIRVWLGSKLIVFLVDADDIEIILNSHVHIDKATEYRFFKPWLGEGLLISS (1?)

GPKWRSHRKMIAPTFHINILKSFVGIFNQNSNNVVEKLKSEVGKTFDVHDYMSGTTVDILL ()

ETAMGISRKTQDESGFDYAMAVMK ()

MCDIIHQRHYKFWMRSEIVFKLTSFFKQQTKLLGIIHGLTNK ()

VIKNKKETYLENKAKGIIPPTLEEFTHHSGEILANNAKTLSDTVFKGYRDDLDFNDENDV ()

GEKKRLAFWDLMIESSQNGTNKISDHEIKEEVDTIMFE ()

GHDTTAAGSSFVLCLLGIHQDVQARVYDELYQIFGDSDRPATFADTLEMKYL

ERVILESLRLYPPVPVIARKLNRDVTI ()

STKNYVIPAGTTVVIGTFMLHRQPKYYKDPEVFNPDNFLPENTQNRHYYSYIPFSAGPRSCV (1)

GRKYALLKLKILLSTILRNFRTISEIPEKEFKLQGDIILKRAEGFQMKVEPRKRVPTNVAR*

 

>CYP4G24 old CYP4Gzz 71% TO CYP4Gyy

FIRST TWO EXONS ARE JOINED WITHOUT OVERLAPPING EVIDENCE

BUT THEY ARE THE ONLY OTHER 4G N-TERM SEQUENCES WITHOUT A PARTNER

BAAB01149226.1 BAAB01135847.1 BAAB01073803.1 BAAB01199362.1

MSSLLNYNFEFDVPPIFHTLLLAIMIMWM

LHRWQQSSRLFRLGNKLPGPMALPLVGNSLLILGKKAEGW = BAAB01031491.1

VKYALDYSEKYG

TVVRAWAGPKLVVFLTDANDVEVILNSQIHIDKSPEYRFFKPWLGEGLLISSG = BAAB01016193.1

XKWRSHRKMIAPTFHINILKSFMGVFNENSKSVVKKLRSEVGKTFDVHDYMSCVTVDILL

ETAMGITKTTQDAASFDYAMAVMK

MCNIIHQRHYKVWLHFDAIFKLTSLFKKQRELLKTIHGLTNK (0)

VIKKKKFMYLQNKEKGIIPPTIEELTKIKDTNDSIMEDSAKTLSDTVFKGYRDDLDFNDEQDV ()

GEKKRLAFLDLMIESAQNHTCNISDHEIKEEVDTIMFE ()

GHDTTAAGSSFVLCLLGIHQEIQSKV

YDELFEIFGDSDRLVTFADTLQMKYLERVILESLRLYPPVPAIARKLTRDVQI ()

VTNNYIIPAGSTVVIGTFKIHRDPKYHKNPNVFNPDNFLPENTQNRHYYSYIPFSAGPRSCV (1)

GRKYALLKLKVLLSTILRNYKTTSEISEDQFVLQADIILKRYDGFKIRIEPRNKNHSNTV*
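Several headers record a former placeholder name (for example "old CYP4Gxx"), and other entries still cite those placeholders in their percent-identity notes.  A small Python sketch for building a lookup from old to current names is given here.  It assumes the header pattern ">NEW old OLD" or ">NEW (old OLD)" seen on this page; `placeholder_map` is a hypothetical helper name.

```python
import re

# Matches ">CYP4G22 old CYP4Gxx ..." and ">CYP6AE3P (old CYP6AEbb) ..."
OLD_NAME = re.compile(r"^>(CYP\w+)\s+\(?old\s+(CYP\w+)\)?")

def placeholder_map(headers):
    """Map former placeholder names to the names now assigned."""
    mapping = {}
    for header in headers:
        m = OLD_NAME.match(header)
        if m:
            new, old = m.groups()
            mapping[old] = new
    return mapping

# Header lines copied from this page.
headers = [
    ">CYP4G22 old CYP4Gxx 64% to 4G19",
    ">CYP4G23 old CYP4Gyy 58% to 4G19 59% to CYP4Gxx",
    ">CYP4G24 old CYP4Gzz 71% TO CYP4Gyy",
    ">CYP6AE3P (old CYP6AEbb) BAAB01174895.1 95% to BAAB01091139.1 54% to 6AE1",
]
names = placeholder_map(headers)
for old, new in names.items():
    print(old, "->", new)
```

Headers that give the old name without the word "old" (such as the CYP4AX entries) are not matched by this pattern and would need to be handled by hand.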

 

CYP4L subfamily sequence

 

>CYP4L6  67% to 4L4

CK535040 EST 5aa diffs BP182824 EST

BAAB01121615.1 BAAB01097697.1 BAAB01093085.1 BAAB01107041.1

MYLILLICLLVLVLTVSWWSMLNRICKSNVPGPFPLPIIGNAHQFVVRST (1)

EFLGLLKSFTDKYGDVFRVHFFSYPYVLISHPKYAE (0)

 932 ALVSSADLITKGRSYSFLKAWLGEGLLTAS 1018 (1)

1506 GPRWRLHRKFLTPAFHFNILQNFLPVFCKKSEILRDKIRRLADGQPIDLFPITALAALDNVAESIM 1688 (1)

GVSVNAQQNSESEYVRAIEX ()

LSQITTLRMQIPLLGEDFIFNLTSYKKKQNIALEVVHGQTKKVIEARRCELEKNNKTNISGTNE ()

IGIKNKHAFLDLLLLAEIDGKL ()

12  MDEQSVREEVDTFMLEGHDTTTSGI 86

85  LVYTLFGLSKHPDIQEKWYEEQLTIFGEEMDRTPAY ()

NELAQMKVLEWVIKESLRMYPSVP 264

265 LIERWITKDAE

VGGLKLSKGTSVVLNIFQMHRNPEVFEKPLEFIPERFDSLEHKNPFSWLAFSAGPRNCI ()

GQKFAMMEMKVTLSTLVRNFKLVPVDIEPILCADLILRSQNGVKVGFLPRTQSNSKT

 

CYP4M subfamily sequences

 

>CYP4M5 EST BP125511

BAAB01099467.1 BAAB01097680.1 BAAB01039373.1

MFVYLIFIASFFLLIHLAFNY

NSKAVMMNKVPGPKLSFILGNAPEIMMLSSVELMKLARKFASRWDGIYRIWAFPLSIINIYNPDDVEVIVSTTKHN

EKSSVYKFLKPWLGDGLLISK

GEKWQQRRKILTPAFHFNILRQFSVIIEENSQRLVESLEKCIGKPIDIVPVVSEYTLNSIC

ETSMGTQLSDKTEDAWKAYKDAIYELGPYFFQRFTRVYLYFDIIFYLTSLWRKMKKPLKSLHG FTSTVIKERKIYVEQNGVKF

GEDVNDDDLYIYKKRRKTAMLDLLIAAQKDGEIDDHGIQEEVDTFMFE

EHDTTASGLTFCFMLLANHRAVQDKIVEEINYIMGDSTRRANLEDLSKMKYLECCIKESLRLYPPVHFISRNLNEPV

VLSNYEIPAGSFCHIHIFDLHRRADIYEDPLVYDPDRFSQENSKGRHPYAYIPFSAGPRNCI

GQKFAMIEMKSAVAEVLRKYELVPVTRPSEIELIADIILRNSGPVEITFNKRTK

 

>CYP4M9 BAAB01103156.1 BAAB01142776.1 BAAB01178932.1 57% to 4M5

BY921981.1 EST N-term

MWMYFILILLLLLTIHFLLNYNYRARLLRRIPGPRGYFIVGNALDVILSPAELFASTRK

NAAQWPNLNRFWSFGIGALNIYGPDEIEAIISSTQHITKSPVYNFLSDWLRDGLLLST

GTKWQKRRKILTPAFHFNILKQFCVILEENSQRFTENLKDTEGKSINVVPAISEYTLHSII ()

675 ETAMRTQLGSETSEAGRSYKNAICELGNQFVHRLARLPLHNNFIYNLYTLGKQNKH

LNIVHSFTKKVIKDRRQYIRGNGGNNFDDEKDTQADEHSIYFNKKKTAMLDLLLKAERDGL

IDEIGVQEEIDTFMFE ()

1150 GHDTTATGLTYCIMLIANHKSIQVGALRKYID 1245 ()

295 GESTRAADIEDLSKMRYLERCIKESLRLYPPVPSMGRILSEEI 423 ()

1467 VLDGYTVPAGTYCHIQIFDLHRREDLFKDPLVFDPDRFLPHNTEGRHPYAYIPFSAGPRNCI 1282 ()

GQKFAILEMKSLLSAVLRRYNLYPITKPEDLKFVLDLVLRTTEPVHVRFVKRNKV*

 

CYP4S subfamily sequences

 

>CYP4S5 66% to CYP4S4 from the moth Mamestra brassicae

BAAB01177041.1 BAAB01181662.1 BAAB01048279.1 BAAB01025317.1

AADK01013707.1

MIWAVVLLIIFIFLIWKALEDEENPLDSLPGPERKPIIGATMRFVNLNT ()

3009 GEMFIKLREYHAMYGTRYVVKIFKRRILHLSNERDVE (0)

 365 VVLSHSKNIKKSKPYTFLSPWLGSDLLLST 454

GFKWHSRRKILTPTFHFNILKSFLEIIKDKSCDLVKRLEEYRGEEVDLMPVISDFTLFTIC

ETAMGTQLDSDKSAETSEYKMAILQIGSLLLDRLTKVWLHNDFIFRQFTVGRKFQKCLKQVHSFAHN

VIVERKRQRASGRDPTVVAEDVFG

RKKRLAMLDLLLEAEEKNEIDFEGIMDEVNTFMFE (1?)

GHDTTAVALTFSLMLVAEDDQVQ (0?)

DRIYKELQGIFGDSDRRPTISDVAEMKYLEAVVKETLRLYPSVPFIAREITEDFML ()

DDLKIKKGSEVAVHIYDLHRRKELFSDPEKFLPDRFLNGELKHPYSFVPFSAGPRNCI (1)

GQRFATLEMKCVLSEICRSFRLEPRTKGWRPTLVAEMLLRPNEPIHVKFIKRKQS*

 

>CYP4S6 94% to CYP4S5

BAAB01121123.1 BAAB01054179.1 BAAB01073078.1 BAAB01008096.1 BAAB01081659.1

AADK01014846.1

 164 MIWTVVVLIIFIFLIWKALEDEENPLDSLPGPERKPIIGAALRFVNLNT 18 ()

     REMFIKLREYHAMYGTRFVVKIFKRRVLHLSNEKDAE (0)

     VVLSHSKNIKKSKGYTFLSPWLGSGLLLST (1)

1053 GFKWHSRRKILTPTFHFNILKSFLEIIKDKSCDLVKRLEEYRGEEVDLMPVISDFTLYTIC 1235

1966 ETAMGTQLDSDKSAETKEYKMAILQIASLLLNRLTKVWLHNDLIFDQLTVGRKFQK 2133

2134 CLKQVHSFAHNVIVERKRQRASGRDYSVVAEDVFGRKKRLAMLDLLLEAEEKNEIDFEGI 2313

2314 MDEVNTFMFE 2343 ()

2862 GHDTTAVALTFSLMLVAEDDQVQ 2930 ()

3158 DRIYKELQGIFGDSDRRPTISDVAEMKYLEAVVKETLRLYPSVPFIAREITEDFML 3325 ()

 578 DDLNIKKGSEVVVHIYDLHRRKELFADPEKFQPDRFLNGELKHPYSFVPFSAGPRNCI 751 ()

     GQRFAMLEMKCVLSEICRSFRLEPRTKGWRPTLVAEMLLRPNEPIHVKFIKRKQS*

 

CYP4AU sequences

 

>CYP4AU2 BAAB01192732.1 CYP325 like N-term

AADK01032226.1

1927 MIILLLLFFVASLYWLYWTNKSKRMNKMTASLPVPPTLPILGNATLFIG

 

>CYP4AU2 35% to 4C3 aa 112-276 37% to 4G19 BAAB01022101.1 exons 2,3,4

AADK01025934.1

(1) AIGDPEDAQVVLENCLDKDVVYRFLRPWLGHGLFVAP (1)

VCLWKSHRKVLLPVFHNKVVEQYLQMISVQADILVERLNEKANKGEFDVLKYITACTLDIVF (1)

ETAMGERMDVQRSPDTPYLRARHTVMTILNMRFFKVWLQPDCIFNLTSYSKQQKDNIDLTHKFTDE (0)

 

>CYP4AU2 62% to CYP4AU1 I-helix to end BAAB01055413.1

AADK01029434.1

Note: the N-terminal is known but confidential (10/18/2007)

Only missing 23-24 aa in the middle between the two segments

(1) GKVRPVLDMLFGREIEFTDEQLREHIDSITIAGNDTTALVMAYTLVRLGIHQNVQEKVYLE ()

QRTIFGDSKRGADKVDVAQMQYLERVLKESMRLYTVVPIIARNVHKDTYL ()

PRCGVTLPAGIGAVVGPFAIHRSKSVWGPDADEFDPDRFLPERSLNRHPAAFLPFSHGSRNCI ()

GRNFGMLIMKSIVSTISRSYRIEADELGPLKIEMLLFPIRGHQIRISRRETLA*

 

>CYP4AU-like BAAB01077742.1 Length = 3282 68% to BAAB01022101.1 exon 2

AADK01002646.1

69% to CYP4AU2

2223 AICDTEHAQIILDCLNKDMVYRFLRLWLEHGLFVAS 2330

 

CYP4AX sequences 50% identical to CYP4S4

 

>CYP4AX1 CYP4qq 50% to CYP4S4 53% to CYP4Sxx lower case is from CYP4rr

BAAB01092965.1 BAAB01135526.1 BAAB01183645.1, AADK01005955.1

AADK01030377.1 (frameshift after GQKF in last exon)

MWFSLVVVAVIYALWKLFYKEDDPIDSLPGPTKLPIIGNMLDMFNMTP ()

GEKFKYERQLSKTYKQRYMQKIFYRRIVYVHHPDDVE ()

VVLSHSKNITKNVNYDFLKPWLGTGLLLST ()

GSKWFKRRKILTSAFHFDILKDFASLFEEKSRRLVDQLRANNGEPISLLPVMSNYTLFTLC (1)

ETALGTKLDTDRSVAAAEYKDAISKTAQISIYRLPRIWLYIDAIFNRTSAGREFAKNVDIIHSFADN

VIVQRKEQRLNSLDKGLVERDEFNRKKRTALLDLLLEAEAKREIDLEGIREEVNTFMFA ()

GHDTTGTALTFSLMLLSDHEEAQ ()

ERILEEYNEVMRGKETPTLSEFAEMKYLDAVIKETLRLYPNPHRVGRVLTEDITL (1)

GGVPIKAGTEIGVQIIDLHHREDFFPEPEKFRPERFLRGEIQHPYSFVPFSAGPRNCL ()

GQKF

AMLEIKSVLTHICNNFKLVPMKRNWRVETVSDIVLKPAEPIYIKFVPR*

 

>CYP4AX2 CYP4rr 94% identical to CYP4qq lower case is from CYP4qq

BAAB01152933.1 BAAB01194854.1 BAAB01058563.1 BAAB01102208.1 BAAB01008375.1

MWFSLLVVAVIYALWKLFYKEDDPIDSLPGPAKLPIIGNMLDMFNMTP ()

GEKFKYERQLSKTYKQRYMQKVFYRRIVYLHHPDDVE (0)

VVLSHSKNITKNVIYDFLKPWLGTGLLLST ()

GSKWYKRRKILTSAFHFDILKDFASLFEERSRRLVDQLRANNGEAISILPVMNNFTLLTIC (1)

ETALGTKLDTDRSVNTAAYKDAISKIGQICIYRLSRIWLYIDAIFNRTSAGREFAKNVDIIHSFADN

VIVQRKEQRLNSLDKGLVERDEFNRKKRTVLLDLLLEAEAKREIDLEGIREEVNTFMFA ()

ghdttgtaltfslmllsdheeaq ()

ERILEEYNEVMRGKETPTLSEFAEMKYLDAVIKETLRLYPNPHRIGRVLTEDITL (1?)

GGVPMRAGTEVCVLTIDLHYREDFFPEPEKFRPERFLRGEIQHPYSFVPFSAGPRNCL (1)

GQKFAMLEIKSVLTHICNNFKLVPMKRNWRVETVSDIVLKPAEPIYIKFVPR*

 

&&&&&&&&&&&&&&&&&

 

>AM106362 Lutzomyia longipalpis Jacobina = best EST to these genes

MWGKFFFICDGLFWAVFFSPPLRGGLGSFFRGGGRKFPGPFYGS

PSLGRLPAQMGFSQKGFLGLPKVILSKITQSNENFGWAEAVC ()

AYFPPEECQYVFNANECLSRDDIYDYIKPFTGDGLVTLP ()

AETWKDHRKFLNPCFNLKILQSYMPIFNTEVKTLIGRLGQRIGKGSFDMYDYMDACALDVVC ()

QTTLGTQMNIQKNENMDYLDAANSLLATMTTRIFNPLYHSDFIFNLSKWAKMEQKNSDITFGFVDNILQR

KKAAYKKFQPSDEQNNLDEGTSFKSPQLFIDQLLKLSMEGKYFTDTDVKNEANTIVA

 

 

CYP6B sequence

 

>CYP6B29 51% TO 6B1 CK494244 EST 57% to 6B27 aa 144-500

BAAB01105883.1 BAAB01064450.1

2461 MAIIYILSASVVLPLLLYLYFTRHFNYWKKRNVPGPKPVPLFGNLMELALRKKNIGIVFKELYENFPNEK 2670

2671 VVGIYRMTTPCLLIRDLDVIKNIMIKDFDVFVDRGVELS*SGLGANLFHADGDTWRV 2841

2842 LRNRFTPLFTSGKLKN

MLHLMIERANKYIEHVEMLCDHQPEQDIHTLVQKYTMATIAACAFGLDIDTTDPNKDQLK

TLEEIDRLSLTANFAFELDMMYPGVLKKLNSTLFPGFVSRFFKDVVKTIIEQRNGKPTDR

NDFMDLILALRQLGDIQATKRNSEDKEYSIELTDELIEAQAFVFYIAGYETSATTMTFML

YQLALNPDIQDKVIAEIDQGLKESKGEVTYEMLQNLTYFEKAFNETLRMYSIVEPLQRNA

KIDYKIPDTDIVIEKGTTVLFSPLGIHHDEKYYPNPSKFDPERFSPANISARHPCAHIPF

GTGPRNCI ()

661 GMRFAKIQSRVCMVKMFSKFRFELAKNTPRNLDIDPTRLLLGPKGGIPLKIVRR*  825

 

CYP6AB subfamily sequences

 

>CYP6AB4 64% TO 6AB1 OVER WHOLE SEQ BAAB01081811.1 BAAB01162324.1

BAAB01031011.1, AADK01009517.1

MLTAAIFVIIVALVYLYSTRTFRYWEKRGIKHDKPVPFFGTDSEGYLLRKSMTQTAVDAY

LKYPNEKVIGFFRSSRPELIIRDPDIIKRILTTDFAYFYPRGLNPHKKVIEPLMRNLFFA

DGDLWRLLRQRMTPAFTSGKLRAMFPLVVERAERLQSRTLEIASQPLDARELMARYTTDF

IGACGFGLDADSLNDEDSAFRKLGAAIFNITVQQAIVAALKEIFPGIFKHFKYSSKYETD

FMSLVSSILKQRNYKPSGRNDFIDLLLECRMKGEIVVESIEKMKPDGTSEVVRMELTEQL

LAAQVFIFFAAGFETSSSATSFTLHQLAFHPEIQEKVQKELDQVLAKYNNKLCYDAIKEM

RYLESAFKEAMRMFPSLGFLIRECARQYTFPELNLTIDEGVGIIIPLQALHNDPEYFDSP

NEFRPERFMPSEYNHNKTKFVYLPFGDGPRGCI ()

GARLGLMQSLAGLAAVLSKFTVKPAPSTKRHPVVEPKSSVVQSIKDGLPLLFIERTKS*

 

>CYP6AB5 BAAB01021567.1 BAAB01206787.1 58% to 6AB3 72% to 6AB2

56% TO  CYP6ABxx

MISSING C-TERM BEYOND A REPEAT SEQ.

possible C-term = BAAB01196051.1 66% to 6AB2

Bmb030331 from Li Bin

MFFYLLIVILITLYYYGVR

TFDYWKKKGVNHDPPLPFFGNNLRQFMQKASMAMMATETYKKYPEEKVVGFYRGTSPELV

VRDPELIKRILVTDFSSFYARGFNPHKKVIEPLLKNLFFADGDLWRLIRQRFTPAFSTAK

LKAMFHIITERAEKLQMIAENEAYENFCDVRELMARYTTDFIGACGFGLNIDSLSDENSQ

FRKLGKRIFKRDLSDAVRAALKLMFPELCKHLTFLTPELEKSMTYLVQNVIREKNYKPSG

RNDFIDLMLELKQKGKLLGESIEAKNANGTPKQVELEFDDLLMTAQVFVFFGAGFETSST

ASSYTLHQLAFNPECQEKTQKEIDEVLSKHNNKITYDAIKEMTYLEMAFNEAMRLYPSVG

YLVRMCTVPEYTFPEINLTINEDVKLMIPIQAIQKDEKYFKDPERFHPERFSSGAKANLK

PYTFLPFGEGPRACV ()

923 GERLGQMLSMAGLVAVLQKYTVEPVEISLRDPIPDPTTTVSEGFVHGLPLKLRRRERRI* 1102

 

>CYP6AB8 96% identical to CYP6AB4 20 aa diffs

whole seq known but confidential

 

CYP6AE sequences

 

>CYP6AE2    Bombyx mori (silkworm)

            old CYP6AEcc BAAB01091139.1 contig444108,

            frameshift at 1529 whole gene in one contig

            No accession number

            Junwen Ai

            submitted to nomenclature committee Jan. 31, 2007

            nearly identical to sequence CYP6AEcc on Bombyx page except

            one region from amino acids 106-256 does not match.

            The EST BY914225 agrees with the Ai sequence so the old

            CYP6AEcc sequence appears to be a hybrid assembled from two genes.

 663 MSVSALVFAAFVLLVTYIYYWSTRKFDYWKRKNVPYAKPVPFFGNYMRYITLQSFLGDVM 842

 843 QKLCQQFPDRPYFGSFYGTEPALVLQNPEIIKQVFTKDFYYFNSRENRDYNHKEVFTQNL 1022

1023 FFANGDRWKVIRQNLTPLFSSSKMKNMFYLVEKCNHSLEDMLDKETKDLQSIEIRSAMIR 1202

1203 YTLDSICSSAFGIETNTLSEGAENSPFPSMGSTIFSSSITRGLKLIGRSMWPGIFYKLGL 1382

1383 RCFPTEIDDFFERLLTEVFENRGYKPTNRNDFVDLILSLKQNDYLTGDG 1529

1529 LVPKNVDAKKVTVKVDDALLIAQCVVFFGAGFETSATTLSAALYELAKNPEAQRRAQEEV 1708

1709 DELLLKHNNKLNYDCLAELPYLEACMNEAMRLYPVLPNITREAVTDYTFPDGLRIDKGMR 1888

1889 VHVPVYAIHRNPDNFPDPEEFRPERHLGDAKNDIKQFTYFPFGEGPRICI 2038 (1)

2917 GMRFGKMQTIAGMVTCLKKYNFELADGMSKTVPFRSTTVLTQPSTGLFLKATPRDGWKQRIFAR* 3111

 

>CK526561   a related EST rswfa0_003045.y1 swf Bombyx mori cDNA.

2aa diffs with BAAB01091139.1, 6aa diffs with CYP6AEbb

    GFETSASTLSAALYELAKNPEAQRRAQEEVDELLLK

414 HDNKLNYDCLAELPYLEACMNEAMRLYPVLPNITREAVTDYTFPDGLRIDKGMRVHVPVY 235

234 AIHRNPDNFPDPEEFRPERHLGDAKNDIKQFTYFPFGEGPRICIG

 

>CYP6AE3P (old CYP6AEbb) BAAB01174895.1 95% to BAAB01091139.1 54% to 6AE1

there are probably two very similar genes

BAAB01172535.1 fills in missing sequence for this contig

Bmb026776 from Li Bin

MSVSALVFSAFVLLVTYIYYWSTRKFDYWKRKNVPYAKPVPFFGNYMRYITLQSFLGDVM

QKLCQQFPDRPYFGSFYGTEPALVLQNPEIIKQVFTKDFYYFNSRENRDYNHKEVFTQNL

FFANGDRWKVIRQNLTPLFSSSKMKNMFYLVEKCNHSLEDMLDKETKDLQSIEIRSAMIR

34  YTLDSICSSAFGIESNTLREGGENSPFANMGSIVFSSSITRGLKWISRSMWPGIFYKLGL 213

214 QCFPAEIDGFFER 252

LLTEVFENRGYKPTNRNDFVDLILSLKQNDYLTGDGLVPKNVDAKKVTVKVDDALLIAQC

VVFFGAGFETSATTLSAALYELAKNPEAQRRAQEEVDELLLKHNNKLNYDCLAELPYLEA

CMNEAMRLYPXXXXXTRETVADYTFPDGLRIDKGMRVHVPVYAIHRNPDNFPDPEEFRPE

RHLGDAKNDIKQFTFFPFGEGPRICI (1)

GMRFGKMQTIAGLITCLKKFNFELADGMPRTLAFRSTTLLTQPSTGLFLKATPRDGWEQRIFAR*

 

>CYP6AE4 (old CYP6AEaa) BAAB01211364.1 84% to CYP6AEff 53% to 6AE1

missing C-term

MLFLTLLFILSVCVYFIY (frameshift)

     YRVCNRRFDYWRKKNVSFVPPVPILGNYSGYILLKESISKVVHNLCKLFP 2567

2568 NDPYIGAFFGTEPTLIVKDPEFIKLVLTKDFYHFNGREGSKYTHNEVVTQNIFFTYGDRW 2747

2748 KVIRQNLTPLFSSLKMRNMFHIIEKCSGIFENLLDEESLAPEVEMKSLMSRFTMDCIGGC 2927

2928 AFGVDTKAMQEPKDNIFTTMGYLFFESTTYRGIKNVLRAIWPGIFYGLGLKVFPTDLNEF 3107

3108 FSKLLVGIFEARDYKPSSRNDFVDLLLNLKKNRHIVGDRLQKTTTGDEGADSKFELEVDD 3287

3288 GLLVGQCLAFFSAGFETTSTISNFTLYELAKNPDVQKRAQKEVDEYIKKHNNKLDYDCVK 3467

3468 ELPFVEACIDEALRLYPVLGVLTREVMEQYTFPTGLTLDKGDRVHIPVYHLQRDPEYFPE 3647

3648 PELFKPERFYGEEKKNIRPFTYLPFGDGPRICI 3746 (1)

 

>CYP6AE5 (old CYP6AEff) BAAB01096775.1 91% TO CYP6AEee

BAAB01211363.1 51% to CYP6AE1 Depressaria pastinacella 86% TO CYP6AEdd

AADK01005027.1

MILTIIFILSLCVYILYRISTRKFDYWQKKNVNYVQPTPFLGNYSGYILLKENLL

DVVYNLSKLFPNDPYVGAFFGTKPTLIVKDPEFIKLVLTKDFFYFTGKECFEYTHKEVIT

QGIIFTYGDRWKVIRQNVTPLFSSSKMRSMFRIIEHCSGVFENLLD EESLAPEVEMKSLM

SRFTMDCIGGCVFGVDINAMQEPKDNIFTTMGCLFLETTTSRGIKNVVKAIWPEIFYGLG

FKVFPTDIHKFFSKLLVRIFEARDYKPSERTDFVDLLLNLKKNRHIVGDRLQKIKTGDEG

ADSKFELEVDDGLLVGQCLAFFSTGFETSSTISNFTLYELAKNPDVQKRAQKEVDEYIKK

HNNKLDYDCVKELPFVEACIDEALRLYPLFGVISRQTGERYTFPTGLTLDKGDRVHIPVY

HLQRDPEYFPEPELFKPERFYGEEKKNIRPFTYLPFGDGPRICI (1)

11263  GMRFAKMQILAGLVTILKKYTVQLADGMPETIDIEPKAIVTQPAISLRLKFVPRNDLQKRIFA* 11069

 

>CYP6AE6P (old CYP6AEee) BAAB01091974.1 93% to CYP6AEff

BAAB01178163.1 = C-term exon 50% TO BAAB01091139.1

BAAB01149335.1 = N-TERMINAL (MOST PROBABLE SEQUENCE TO COMPLETE GENE)

1055 MILTIIFILSLCVYILYRISTRKFDYWQKKNVSYVEPAPFLGNYSGYILLKENLLDVVHN 1234

1235 LSKLFPNDPYVGAFFGTKPTLIVKDPEFIKLVLIKDFYYFHGREGSKYTHNEVITQGIFF 1414

1415 TYGDRWKVIRQNLTPLFSSSKIRNMFRTIEKCSGVFENLLD 1537

     EESLAPEVEMRSVMSRFT

MDCIGGCVFGVDINAMQEPKDNIFTTMGCLFLETTTSRGIKNVVKAIWPEIFYGLGFKVF

PTDIHKFFSKLLVRIFEARDYKPSERTDFVDLLLNLKKNRHIVGDRLQKIKTGDEGADSK

LELEVNDDLLVAQCVSFFIAGFETSSNSLTFTLYELAKNPDVQKRAQKEVDEYIKKHNNK

LDYDCVKELPFVEACIDEALRLYPLFGVISR (frameshift)

DKRERYTFPTGLTLDKGDRVHIPVYHLHHDPEYFPEPELFN

PERFYGEEKKNIRPFTYLPFGAGPRVCI

GERFAKMQMLAGLVPILKRYTVRLAEGMPETINFEPKAIASQPNIGVRLNLLPRNN

 

>CYP6AE7 (old CYP6AEdd) 53% to 6AE1 BAAB01210600.1 BAAB01149335.1 87% to CYP6AEaa

Bmb021626 from Li Bin

MILTIIFILSLCVYILYRISTRKFDYWQKKNVSYVEPAPFLGNYSGYILLKENLLDVVHN

LSKLFPNDPYVGAFFGTKPTLIVKDPEFIKLVLIKDFYYFHGREGSKYTHNEVITQGIFF

TYGDRWKVIRQNLTPLFSSSKIRNMFRTIEKCSGVFENLLDEESLAPEVEMRSVMSRFTM

DCIGGCAFGVDTNAMQEPKDNIFKTMGYLFFESTTHRGIKNVFKAIWPEIFYGLGFKVFP

TDLNEFFSKLLVGIFEARDYKPSSQNVFINLLLNLKKNRHIVGDRLLKIKTGNVRAESKI

KLEVDDELLVSQCVAFFIAGFETSSTISSFTLYELAKNPDVQKRAQKEVDEYIKKHNNKL

DYDCVKELPFVEACIDEALRLYPVLGVLTREVMEQYTFPTGLTLDKGDRVHIPVYHLQRD

PEYFPEPELFKPERFYGEEKRNIRPFTYLPFGAGPRTCI (1)

GQRFAKMQMLAGLVTILKRYTVRLAEGMPETINFEQRAIVTQPNIGIRLNLLPRNN*