Xenopus tropicalis cytochrome P450s
 
 
The 2006 Bioinformatics class searched the ESTdb for matches to human P450s in 
Xenopus tropicalis.  ESTdb holds over 1 million ESTs for this frog species.
The yellow sequences were turned in for assignment #1.
Additional sequence was added later to complete them, by walking 
upstream or downstream on more X. tropicalis ESTs. The refseq_rna 
database was also searched.
(in progress, Jan. 23, 2006) 
Note: there are 22.9 million reads for X. tropicalis in the Trace Archive.
Incomplete sequences can be completed by chromosome walking with the MegaBlast server.
This file now has 55 sequences: 52 from X. tropicalis and 3 from X. laevis.
50 of the 52 are full-length CYP sequences. 
I have just added a cluster of 25 genes and some pseudogenes on scaffold 55.
That will push the gene count up near 75. 
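
As a rough way to keep the running tally above in step with the file, one could count the FASTA-style header lines (lines starting with ">"). This is only a sketch, not the method used to arrive at the counts above. The function name count_headers is hypothetical, and the two-entry sample is trimmed from entries in this file. Partial-exon and scaffold-fragment entries are counted along with full-length sequences, so the result is an upper bound on the gene count.

```python
def count_headers(text):
    """Count FASTA-style header lines (those starting with '>')."""
    return sum(1 for line in text.splitlines() if line.lstrip().startswith(">"))


# Two headers with the first sequence line of each, for illustration.
sample = """>CYP2Q2
MDTSWLWTLLLCLLISAMLIYSTWNKMYRKRNLPPGPTPIPLFG

>CYP2Q3
MDTTWLWSLQLFLLIATMLIYSTWNKMYRKRNLPPGPTPIPLFG
"""
print(count_headers(sample))
```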
 

>BX728777 CX904306.1 CYP1A1 N. Abdletawab

552208048 411550065 409289324 388847477

62629_prot from UCSC browser scaf 287 (+) 1408174-1414975

62% to 1A1 57% to 1A2, 90% to 1A6, 91% to 1A7

MMDNSTTTEVLVASIVFAIVFLVIRSQRVKLPPGTKKLPGP

MPYPVIGNLLSLSKNPHLSLTKMSETYGDVFQIQIGTKPMLVLSGLETLRQALIRQSDEF

AGRPDLFTFRLVGDGQSMTFSSDSGEV

WRARRRLAQNALKTFATSPSPTSSNSCLVEENIITEAEYLIRKFKELIDDKGEFDPYRYV

VVSVANVICGMCFGKRYNHDDEELLNVVNLTDEFGAAAASGNPADFIPILQYFPNSSMKA

FKEINQKFLAFMQKFTKEHYKTFDKNHIRDITDSLIQHSQEKRVDENSDIQLSNEKIVNI

VNDLFGAGFDTITTALSWSLMYLVAHPNIQQRIQDELDQVIGRERRPRLSDRAQLPYTE

AFILEMFRHSSFMPFTIPH (1)

CTTKDTMLNGYFIPKGICVLINQWQVNHDP(2)

NLWQDPFKFCPERFLNNDGTMVNKTEMEKVMIFGL

GKRRCVGEAIGRMEVFLFLTTMLQQMQFFKQDGEKLDMSPQYGLTMKHKR

CHLTAKLRFALLTN*

 

>DN053435 DN024870 51% to CYP1A8P ortholog DN024871 mate pair to DN024870
DN025714.1

MESAVKKTLMDMMPMLLKASISFLTVLLVMSILWKKRNSLPGPWAVPI

VGNFFQLGDQIHITLTDMRNRYGDVFQIKLGLMPIVVVSGLETVKRVLLKEGENFADRPN

FYSFSLFSNGSSMTFSEKYGESWKIHKKIMKNALRNLSNESTNSSNCSCRLEEYVCAEAS

DLVQELTDLSAEKVAFDPSQSIVITVANVVCALSFGKRYDHHDKEFLTLIDFNNDLRKA

AGGGLLADFIPILRFIPSSSVKALKKFVQSFHSFIAKCVKDHFATFEENNIRDITDA

LIQLCKERKSEDKNQLLSDDQIISTVNDIFGAGFDTITSALLWAIFYLLRYPEFQDKIHK

EIEEKIGCNRAPRFNDRKDLHYTEAFINEVLRHSSFVPFGLPHCTTMDTKLNGYFLPKGT

CVFTNLYQVNHDNTVWKDADMFMPERFLDQNGQIIKSLTEKVLVFGMGVRKCLGEDVARN

EMFVIMTIMMQRLKLVKSTKHELDPIPVYGLTLKPKPYYLVAKVRT*

 

>CX846813.1 C.Blackwell 1B1 as query 55% to 1B1 ortholog
CL126458.1 from GSS, Trace archive 483147144 391272900 233714403
422555774 (from Trace search with Human DNA for last part) 483233841
MNWKIWEDLGQSSVPKLLLSFLCALTVAHILKWIHEWIIPRWIRS

SQPPGPFPWPLFGNALQMGSYPHLAFIDLAKRYGNIFQIKLGSQKIVVLNGDLVIRHALL

HKGEDFAGRPKFTSYQFVSGGRSLAFGCYTEKWKAHRKLAHSTVRAFSTGNPQTKRCLAE

NVLKEARDLIALFSELGQGGKYFYPGRHTVVSVANVMSAVCFGRRYQHGDLEFQSLLSNN

DKFTRSVGAGSLVDVMPWLQRFPNPVRSVFRSFQQ (1)

VNYEFYDFVYKKFLLHRNTANQAV

TRDMMDAFIHILITKEGKVRADDADGGEEKGKNGQYFFHSLEAEHVPS

TVTDIFGASQDTLSTALQWVIFFLVR (2)

YPEIQTKLQDEMDRVIGKDRLPCIEDQPKLPYLMAFLYEF

MRFSSFVPITIPHATTKNTTIMGYQIPKDTVVFVNQWSVNHDPQKWSNPGEFNPSRFLDD

NGLINKDLVSNIMIFSVGKRRCIGEELSKIQLFMFSSILLHQCIFTALPADNLNPKGDYG

LSIKPKPFRISMTLRHGSMDLLNNSVLSGMAE*

 

>CYP2D45

NM_001015719.1 CX969358.1  54% to 2D6 E.Mahrous

MSLLSQLCPFALGCNVFTLGIIFTLLLLLLDFMKRRKPCTDFPP

SPPSWPFVGNLLQMDFRDLHNSFKQLSKQYGDVMSLRVFWKPTVVLNGFEVIKEALIQ

KSEDTADRPPFNLYEILGFVGNNKAVVLANYGQSWKDLRRFTLSTLRDFGMGKKSLEE

RVRDEAGYLCDAFQSEQGGPFDPHVLINTAVSNVICSIIFGERFEYDDHKFLKLLCLI

EESIKAESGPVPQIISSLPWSSKVPGLARLFFQPRIHMLQYLQEIINEHKQTWDSGHT

RDFIDAFMLEMKKAKGVKDSNFNDQNLLLTTADLFSAGSETTTTTLRWGLLFMLLYPD

VQRKVQEEIDQVIGRTRKPTMGDVLQMPYTNAVIHEIQRYADIIPLSVPHMAYRDTHI

KGFFIPKGTVIMTNLSSVLKDEKVWEKPFQFYPEHFLDRDGKFVKREAFMAFSAGRRV

CLGEQLARMELFLFFTSLLQRFSFQIPDGEPCLREDPVFVFLQVPHDYKICAKVR

 

>CYP2D.1 scaffold_160:807096-818137  (-) strand UCSC browser

52% to 2D6

MSLLSQLCPFALGCNVFTLGIIFTLLLLLLDFMKRRKPCTDFPPSPPSWPFVGNLLQMDSSSLSNSFRQ (0)

LKKQYGDVFSLQFYWQNVVVLNGYEAIKEALLQKSEDFADRPPFELYEGIGFTGNNK (1)

GVVTAKYGQSWKDLRRFTLSTLRDFGMGKKSLEERVGEEAGYLCDAFLSEQ (1)

GQLFDPHYKLNTAVANIISFIVFGDRFDYDDYKFQKLLNLNQAMFEVESGTMAQ (0)

IATAIPWLAKLPGLAKMIYRPHVDVLEYLQKIISDHQKTWNPACTRDLIDAFTLQMEK (0)

AKGDKENHFNEKNLLFTTFDLFTAGSETSTTTLRWGLLYMLQYPDVQ (1)

RKVQEEIDKVIGKSRKPVMADVLQMSYTNAVIHEIQRCADLVPLSLIHMTYRDTEVQGFSIPK (0)

GVAVIPNLSSVLKDEKVWEKPFQFYPEHFLDADGKFVKQEAFLPFST (1)

GRRACLGERLARMELFLFFTSLLQRFSFQIPDGEPCPRDDPIVYIVQIPHPYKLCAKIR*
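
The exon-by-exon layout above carries trailing markers such as (0) and (1), which this file does not define (they look like intron phases, but that is an assumption). To get one continuous protein string for a BLAST query, a minimal sketch could strip those markers and join the lines. The name join_exons is hypothetical, and the example lines are made-up short fragments, not real sequence.

```python
import re

# A single digit in parentheses, e.g. "(0)" or "(1)".
PHASE_MARKER = re.compile(r"\(\d\)")


def join_exons(lines):
    """Concatenate exon lines into one string, dropping '(n)' markers and whitespace."""
    pieces = []
    for line in lines:
        cleaned = PHASE_MARKER.sub("", line)
        pieces.append("".join(cleaned.split()))
    return "".join(pieces)


# Shortened, made-up exon lines for illustration only.
example = ["MSLLSQ (0)", "LKKQYG (1)", "GVVTAK*"]
print(join_exons(example))
```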

 

>CYP2D.2 scaffold_160:866974-882965 (-) strand UCSC browser

DR873330.1 Trace archive 408392602, 234381521

MSLLSQLCPFALGCNVFTLGIIFTLLLLLLDFMKRRKPCTDFPPSPPSWPFVGNLLQMDFSSLSFRQ (0)

LRKQYGDVFSLQLGWQNVVVLNGYEAIKEALLQKSEDFADRPPFELYEGIGFTGNNK (1)

GVVLANYSQSWKDLRRFTLSTLRDFGMGKKSLEEKVREEAGYLCDAFQSEQ (1)

GQLFDPHYKLNTAVANIMNSIVFGDRFDYDDYKFQKLLNLNQEMFEVEFGTMAQ (0)

IATAIPWLAKLPGLAKMIYRPHVDVLEYLQKIISDHQKTCNPACTRDLIDAFTLEMEK (0)

VKGDKENYFNEKNLLFTAFDLFTAGSETSSTTLRWGLLYMLLYPDVQ (1)

RKVQEEIDQVIGKSRKAAMADVLQMSYTNAVIHEIQRCADLVPLSVTHMTYRDTEVQGFSIPK (0)

GVAVCPNLSSVLKDEKVWEKPFQFYPEHFLDADGKFVKQEAFLPFST (1)

GRRACLGERLARMELFLFFTSLLQRFSFQIPDGEPCPRDDPIVYIVQFPHPYKLCAKIR

 

>CYP2D.3 scaffold_160:1923301-1927860  (+) strand then a gap  UCSC browser

Trace archive to fill in seq gap with exon 4 387743496

241672823 to finish exon 7

418485537 walking down, 479264026 = exon 8, 248788894 = exon 9

5 aa diffs to 2D45

MSLLSQLCPFALGCNVFTLGIIFTLLLLLLDFMKRRKPCTDFPPSPPSWPFVGNLLQMDFRDLHNSFKQ (0)

LSKQYGDVMSLQVFWKSMVVLNGFEVIKEALIQKSEDTADRPPFNLYEILGFVGNNK (1)

AVVLANYGQSWKDLRRFTLSTLRDFGMGKKSLEERVRDEAGYLCDAFQSEQ (1)

GGPFDPHVLINTAVSNVICSIIFGERFEYDDHKFLKLLCLIEESIKAESGPVPQ (0)

IISSLPWSSKVPGLARLFFQPRIHMLQYLQEIINEHKQTWDSGHTRDFIDAFMLEMKK (0)

AKGVKDSNFNDQNLLLTTADLFSAGSETTTTTLRWGLLFMLLYPDVQ (1)

RKVQEEIDQVIGRTRKPTMGDVLQMPYTNAVIHEIQRYGDIIPLSVPHMAYRDTHIKGFFIPK (0)

GTVIMTNLSSVLKDEKVWEKPFQFYPEHFLDRDGKFVKREAFMAFSA (1)

GRRVCLGEQLARMELFLFFTSLLQRFSFQIPDGEPCPREDPVFVFLQVPHDYKICAKVR*

 

>CYP2D.4 scaffold_160: 1936385-1938199 (+) strand (exons 6-9)

BX707908.1, CX969358 mate = CX969359, 3 aa diffs to 2D45

MSLLSQLCPFALGCNVVTLGIIFTLLLLLLDFMKRRKPCTDFPPSPPSWPFVGNLLQMDFRDLHNSFKQ

LSKQYGDVMSLRVFWKPTVVLNGFEVIKEALIQKSEDTADRPPFNLYEILGFVGNNK

AVVLANYGQSWKDLRRFTLSTLRDFGMGKKSLEERVRDEAGYLCDAFQSEQ

GGPFDPHVLINTAVSNVICSIIFGERFEYDDHKFLKLLCLIEESIKAESGPVPQ

IISSLPWSSKVPGLARLFFQPRIHMLQYLQEIINEHKQTWDSGHTRDFIDAFMLEMKK

AKGVKDSNFNDQNLLLTTADLFSAGSETTTTTLRWGLLFMLLYPDVQ

RKVQEEIDQVIGRTRKPTMGDVLQMPYTNAVIHEIQRYGDIIPLSVPHMAYRDTHIKGFFIPK

GTVIMTNLSSVLKDEKVWEKPFQFYPEHFLDRDGKFVKREAFMAFSA

GRRVCLGEQLARMELFLFFTSLLQRFSFQIPDGEPCPREDPVFVFLQVPHDYKICAKVR*

 

There are some assembly difficulties at scaffold 160 in this region, and some duplicate exons exist.  CYP2D.3 and CYP2D.4 may be the same gene, with only 4 aa diffs between them.
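
The comparison between CYP2D.3 and CYP2D.4 can be repeated exon by exon with a simple position-wise count. The sketch below assumes the two strings are already aligned and the same length (no gaps); aa_diffs is a hypothetical helper name. The two strings are the second exons of CYP2D.3 and CYP2D.4 as listed above, with the trailing marker removed.

```python
def aa_diffs(seq_a, seq_b):
    """Return 1-based positions where two equal-length, gap-free sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length; align them first")
    return [i + 1 for i, (a, b) in enumerate(zip(seq_a, seq_b)) if a != b]


# Second exon of CYP2D.3 and of CYP2D.4, copied from the entries above.
d3_exon2 = "LSKQYGDVMSLQVFWKSMVVLNGFEVIKEALIQKSEDTADRPPFNLYEILGFVGNNK"
d4_exon2 = "LSKQYGDVMSLRVFWKPTVVLNGFEVIKEALIQKSEDTADRPPFNLYEILGFVGNNK"
diffs = aa_diffs(d3_exon2, d4_exon2)
print(len(diffs), diffs)
```

Running the same helper over each exon pair and summing the counts gives the total number of differences between the two models.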

 

>DN017333.1 51% to 2C8, Ramy Naguib Attia

cannot extend in the ESTdb

MSPSIFTLLIFVLLVLLSIMWWKKNLKDRSLLPPGPTPLPFLGNLLQVKPKEFLKALDK (0)

LKEKHGSVFTVYFGARPTVILCGYQTVKEALIDQADTFSSRGKMALAEHILKGY (1)

GITGSNGERWKQLRRFALTTLRNFGMGKRTIEKRIQEETTFLIEEFRNAE (1)

GMPFDPTFYLGCAVSNIICSIVFGERFDYNDKQFLFLLKNINKVLRFMNSTWGV (0)

VFFTFDKIMCHIPGPHQKAMKHLVDLKAFVQQRVRESKEILDINSPQHFIDCFLIKMQE (0)

EQENPHSEFHMDNLIGSALNLFFAGTETVSTTLRYGILILLKWPHIQ (1)

GRIQEEIDDVIGRQQCPKIEDRSKMPYTDAVIHEIQRFSDIVPTGLPHTATQDTTFRGHTIPK (0)

GTDVFALLTTVLKDPEVFQNPEEFNPERFLDENGILKKSQAFMPFSA (1)

GKRMCPGESLARMEIFLFLTTLLQKFTLIPTVPSVDLDVTPEISSSGHLPREYKMCVLPTT*

 

>CYP2Q2
CX972427.1 best hit to CYP2A13 in ESTdb X.tropicalis S.Hill, A.Bolen

NM_001010998.1 89% to CYP2Q1, 55% to 2A6 same as CX972427.1

from refseq database

MDTSWLWTLLLCLLISAMLIYSTWNKMYRKRNLPPGPTPIPLFG

NVMQIKRGEMVKSLIELGKKYGDIYTLYFGPSPVVILCSYRAIKEALIDQAEEFSGRG

AIPSFDQYFQGYGVVFTNGEEWKNLRRFSLSTLRNFGMGKRGIEERIKEEAQFLVAEI

KSYKEKPFDPTNILVQ

CVSNVICSVVFGNRFEYANKDFQNLLSLFQSVFQETSSSWGQLLNMLPAVMN
HVPGPHKNIIRDMNKLEDFVLQRVKENEKTVDPNSPRDLIDSFLIKMQQENKNPTSPFHM
KNLIATILSIFFAGTETVSTTLRHGFLILLIHPEIEAKLQEEIDRVVGQNRSPTIEDRNK
MPYTDAVIHEIQRLSDVIPMNVPHLVTKDTKFRGYTIPKGTNIYPLLCAVLRDPEQFDTP
SKFNPNHFLDDKGCFKSNDGFMPFSTGKRICLGEGLARMELFLFLT

NILQNFKLHSESGLTEDNIAPKMKGFANYPTSYQLSFIPR

 

>CYP2Q3

NM_001010999.1 54% to 2A6 79% to NM_001010998.1, 78% to 2Q1

from refseq database

MDTTWLWSLQLFLLIATMLIYSTWNKMYRKRNLPPGPTPIPLFG

NVLQIKRGEMVKSLLELGKKYGPVYTLYFGPSPVIILCDYQSIKEALNDQAEEFSGRG

KIPSWDQFFQGYGESFSNGDEWKQLRRFSLTTLRNFGMGKRGIEERIQEEAQFLVAEI

KSYKGKPFDPTKILVQCVSNVICSVVFGQRYEYSNKDFHKLLYMFQAVFEDTSSTLGQ

LMTLLPNIMNHIPGPHKTVVNKLNKVNDFILQRVKENEKTLDPNSPRHFIDSFLIQMQ

KEKDNPVTKFHWKNLLCTIMNLFFAGTETVSTTLRHGFLMLLIHPEIEEKLHEEIDRV

VGQDRSPTIEDRSKMPYTDAVIHEIQRFSDVLPMSLPHLVMKDTQFRGYTIPKGTDVY

PLICAALRDPKQFATPNKFNPQHFLDDNGLFKSSNAFLPFSTGKRICLGEGLARMELF

LFLTNILQNFKLHSENQFAEDDIAPKMNGFANYPLSYEFSLIPRVQSLLVL

 

>CX329225.2 DR834894.1 CX379987.2 Yun Peng 74% to human 2R1

MFPPVPLVALVAAALLIGGFLVRQIVKQRKPRGFPPGPPGLPLIGNILA

LASDPHVYMKKQSKIHGQ (0)

IFSLDLGGISTVVLNGYDAV

KECLVRQSDVFADRPSLPLFKKLTNMGGLLNAKYGRCWTEHRKLAVSCFRTFGCSQKSFE

SKISEECLFFLDAIDSYKGKALDPKHLVTIAVSNVSNLILFGERFRYDDNDFLHMIEIFS

ENIELATSAWVFLYNAFPLIGFLPFGKHQQLFRNASEVYDFLLQIIGRFSENRKPQSPRH

FIDAYMDEMERNEAD

PDSTYSMENLIFSVGELIIAGTETTTNVLRWAMLFMALYPNIQGQVQKEIDGVVGLNRMP
TFEEKSRMPYTEAVLHEILRYCNIAPLGIFHATSRDTVVRGYSIPEGTTVITNLYSVHFD
EKYWTDPEIFYPERFLDSAGQFTKKEAFVPFSLGRRHCLGEQLARMEMYLFFTALLQRFH
LHFPQGFVPNLRPKLGMTLQPHPYVICAERR*

 

>CX850388.1 different from 2R1 above 91%

MFPPVPLVALVAAALLIGGFLVRPIVKQRKPRGFPPGSPGLPLIGNILALASDPHVYLKK  184

QSKIHGQIFSLDLGGISTVVLNGYDAVKECLVRQSDVFADRLSLPLFKKLTNMGGLLNAK  364

YGRCWTEHRKLAVSCFRTFCCSQKSFESKISEECLFFLEANDSYE

 

>CYP2U1

CX851239.1 CX439683.1 CX959423.1 DR836116.1

best hit to CYP2U1 in ESTdb X.tropicalis M.Puljic

best match in human = CYP2U1 63%, CYP2U1 ortholog

MSDLAQDSMSGTLDWKQMGYASWSLLGDCASVSALLLYIALFLGLYLLMGSLWRYYQI

IHSNAPPGPTPWPIVGNFAFMLMPGWLM

QLLNFGIAKGKLRRVPAGATRRGAFLYPHIVLTEMAKMYGKIYGLYIGTRLMVILNDFNS

VKDALVSHSEVFSDRPSVSLVTIITKRKGIVFAPYGPIWRQQRRFSHSTLRYFGLGKLSL

EPKIIEEFKYVKAEMLKFGNKGFSPFEIINNAVSNVICSISFGKRFNYEDKEFKTMLSLM

SRGLEISVNSEAVLICLCSWLYYLPFGPFKELRQIVIDITAFLKRIIAE

HQVTLDPANPRDFIDMYLLHIKEEQKGQAESIFNTEYLFYIIGDLFIAGTDTTTNTLLWS

LLYMCLYPDVQEKVQAEIDTVIGRDRPPSLTDKSQMPFTEATIMEVQRMTVVVPLSVPHM

ASESSVFHGYTIPKGSVVMANLWSVHRDPKVWEKPNDFMPKRFLDENGQILKKEAFIPFG

IGRRVCMGEQLAKMELFLMFVNLLQSFSFSLADDTFKPSLEGRFGLTLAPYPFDIKITKR

*

 

>DN060997.1 DR833173.1 DR842090.1 CF374775.1

best hit to CYP2S1 in ESTdb X.tropicalis H.Penmatsa, K. Iyer, G. Vasser

best match in human = 2C18 55%, 47% to 2S1 not a 2S ortholog
MEILGATAVLLVICAF

FLLLNTIQVIRRQGKGKLPPGPTPLPFLGNFLQLRGEEVFKSLLEFGKKYGPVYTIHLGM

EPVVVLCSFDIVKEALNDNGDEFGARGHMPLLEKISHGGHGVVASNGERWKQLRRFSLMT

LRNFGMGKRSIEERIQEEAHFLTNEFKYTKGQPVDP

TFYFSKAVSNVICSVVFGDRFEYEDTEFLRLLGLLNQVFRGFSSVWGQLYNIFPKV

MGKLPGPHNMIFKSVNSLQEFIMQRINMHQET

LDPSSPRDFIDCFLIKMQQEKDVPQTEFHMQGALNTTFDMFGAGTETVSTTLRYGLLILL

KHPDIEERIQKEIDSVIGRNRAPCIEDRSRMPYTDAVIHEIQRFVDIIPMGIPHKVTRDI

QFQGYFIPKGTTVYPMLSSVLHDPKQFKYPDIFNPGHFIDENGKFCKNDGFMPFSSGKRI

CVGEGLARMELFLFITTILQNYTLRSPVDTEDLDLTPELSGFGNIPRPYKLCFIPR*

 

>NM_001001212.2 51% to 2C18

MDWALEINGLPILLLIAALLLLLARKVGKKVKGCLPPGPKPLPI

LGNLLQLKSREIHKPLLEFNKKYGPVYTLYMGSMPAVVLCGYEAVKEALVDNAEKFSG

RAEVPIVNLTTQGYGIAFSNGERWKELRRFSLTTLRNFGMGKRSIEERIQEEIHFLLE

AFHETQGSFFSPAFIIRRSVSNVICSVVFGKRFDYTDQKLQILLDLIAENLRRVDNIW

VQVYNFIPKLLNILPGPHHKLTENYKAQLRYVEEIVQEHGKTLDPSAPQDYIDAFLLK

MEQERKKAHTEYNVQNLLSCSLDIFFAGQESTSSTLGYGLLILMKYPHIKEKVQAEIE

SVIGRSRRPCMDDRAKMPYTEAVIHEIMRFIDFFPLGVPHSVTEDTLYRGYVIPKGTT

IFPFLHSVLFDPSMFERPQEFYPGHFLNQDGSFRKNEGFMAFSAGKRACPGKSLARVE

IFLYLTSILQQFDPQPALSPKDIDLSPEYSGFGKMAPSFQLKLVPH

 

>NM_001005711.1 45% to 2C8

MEPLTIFLCLFIFLLLLFTWKTHKRRVQLPPGPYPLPLLGNVLQ

GITVLYDSYRKLSEQYGPVFTVWLGSTPMVVLCGYEVLKDALINHSQEFGARGAFPVP

ERLTDGYGVISTNGTRWQQLRRFSVTVLRNFGMGKRSMEERIHEETQHLIQAVQHTGG

EAFDPLYLLGRAVNNIINLIVFGRRWDYKDKMMIKLFNIINSILLFLRSPLGVIYSAL

YQIMQHLPGPHQKIFHDSETVKSFIREQINSHKETLDSDSPRDYIDCFLIKANQEKDH

HSSEFSQENLVNTVFDFFVAGTETATNTIQFSLLVIITYPHIQAQVQKEIDKVVGPDR

LPGIADRAQMPYTNAVIHEIHRFLDLVPLSLPHMATQDTVCRGFRIPKGTTVIPLIGS

ALCDPAHWETPEEFNPEHFLNQNGEFYIPPAFMPFSAGKRVCLGEGLARMEIFLFFTA

LLQKFTIRVANQTDTFNLRTLRRAFRKKGLFYQLRAMPRTCTVEK

 

>NM_001004777.1 (gap missing C-helix, 22 aa) CX454308.2 69% to NM_001035117

MDRKQPYKTLMEVSKKYGSVFSVRVGPLKMVVLCGYDTVKDALLNYPDEFADRPALPLFD

ELVKGHGIIFSNGENWKVMRRFSLSTLRDFGMGKKTIESKIIEECDHLVQKFNSYGGKPFDNTM

IMNAAAANIIASILLSHRFHYENPTLLRLLKLVNENMRLMASPIALLYNTYPSIMRWV

PGCHKTIYNNAQELMEFIRETFSKHKVELDINDQRNLIDAFLSRQQEEKPHSAKYFHD

DNLTILVIDLFAAGMETTSTTLRWALLLMMKYPEIQKKVQDEIEKVIGSVEPRAEHRK

EMPYTDAVLHEIQRFANITPMNGPHATTKDVTFRGFFLPKGTYVIPLLASVLKDENYF

EKPNEFYPEHFLDSEGHFMKNEAFLPFSAGRRSCAGENLARMELFLFFTSLLQNFTFQ

APPGEELDLTPDVGGTVPPRPHTVCALPRS

 

>NM_001004878.1 66% to NM_001035117, 51% to 2K17 zebrafish

MDRKQPYKALLKVSKKYGPVCSFQIGPLKTVVLCGYDTVKDALLNDEFADRPAMPMLD

DVAKGHGILSSNGENWRVMRRFALSTLRDFGMGKKTIESKINEE

CDHLVQKFSSYGGKPFDTTMIMNAAVANIIASILLSHRFHYENPTLLRLLKLVNENTK

FMASRIAMLYNTFPSIMRWIPGCHKSIYKNAQELLEFIRETFSKQKVELDINDQRNLI

DAFLSRQQEPNSGKYFHDDNLTILVFDLFVAGMETTSTTLRWALLLMMKYPEIQKKVQ

DEIEKVIGSAEPRAEHRKEMPYTDAVIHEIQRFANIFPMNGPHATTKDVTFRGFLIPK

GTFVIPLLASVLKDENYFKKPNEFYPEHFLDSEGHFVKNDAFLPFSAGRRSCAGENLA

RMELFLFFTSLLQNFTFQAPPGEELDLTPDVGIATPPMQHTVCALPRA

 

>scaffold_21945:198-3026 1 aa diff to scaffold 55 (-) 506398-495841

 198 EKVHDEISRVIGSAHPTYSHRTQMPFTNAVIHEILRFADIVPLSVPHETTRDVHFKGYFIPK (0) 1158

2607 GTYIIPLLSSVLKDKTQFDAPEEFNPNHFLDSEGNFLKKEAFMPFSA (1) 2747

2847 GRRACPGEILARMELFIFFTSLLQKFSFRPPPGVTNINLSSDVGFTSVPLEGMICAIPRA 3026

 

>scaffold_2219:41253-41363 (-) exon 2 partial

41363 LWKKYGSIFSVQIGSQKMVVLCGYETVKDALVNYAEE 41253

 

>scaffold_3861: 233-373 (+) exon 8, this scaffold has large gaps

100% to second exon 4-9 86% to 21818_prot

233 GTFVIPLLTSVLYDQTRFEKPKEFYPQHFLDSEGNFVKNEAFLPFSA 373

 

>scaffold_3861:8120-8284 (+) exon 2 this scaffold has large gaps

100% to 21818_prot

8120 LWKKYGSIFRVQIGSQKMVVLCGYETVKDALINHGEEFSERPRLPIFQVIANGY 8284

 

>scaffold_3433:6638-6781 (+) exon 6, 95% to DT436641.1

6638 AKHPETYSYFHNENLVRLVRNVFSAGVETTSTALRWALLLMIKYPDIQ 6781

 

>scaffold_3433:23829-23954 (+) exon 4

100% to Green = DT436641.1 75% to 21819_prot

23829 FVFSLGKPFDNTMIMNAAVANIIVSIVLGHRFDYQDPKFLRL 23954

 

>scaffold_2615 : 2-55 (+) exon 9

2 VGFTSVPLEGMICAIPRA 55

 

>scaffold_2899 : 33841-33948 (+)

94% to 49369_prot   scaffold_996:232793-245538

33841 GTQVIPLLASVLQDETYFEKPEEFYPQHFLDSEGLF 33948

 

>scaffold_590 : 150217-150324 exon 8, 83% to $$$$$$8

150217 LWDKPYFEKPDEFYAQHFLDSEGNFVKNEAFLPFSA 150324

 

>14945_prot 55% to DN017333.1, 52% to 2C8

scaffold_1232:575-12890 (+) exon 3 partial

  575 MAMDSAGTVLLAACVIVLFYLVKWRGNNKRKNLPPGPTAFPLLGNFLQVSTTEIPSSCVE 754

 1024 LSKTYGPVFTLYLGGHRSIILIGYDAVKEALIDNSDVFSDRGEGGVSEMIFKNY 1185

 3914 GVILSNGERWKTMRRFTLTTLRNFGMGKRSVEERIQEEARSLEEAFRKKK 4063

 5417 DEPFDPIYLLGLAVSNIICSIIFGERFDYEDEKFMTLLMYIREFVKLLNSFFGM 5578

 5936 LFNFFPNLFCYIPGPHQNIFTYFNKLKQFVKDEAKSHKDTLDANCPRDFIDCFLIRME 6109

 8038 QEKNNPNSEFHYENLFGTILDLFLAGTETTSSTLRYAFLILLKYPEIQ 8181

 9338 ENVYKEIVQVIGQHRYPSVEDRSKMPYTEAVIHEVQRIGDILPLGLEHAASKDTTFRGYDIPK 9526

11964 GTLIFPLLTSVLKDPKYFKNPDQFDPEHFLDENGCFKKNDAFMPFST 12104

12708 GKRVCAGEGLARMELFIFLTTILQKFILKSTVATEEIKITPEPNTNGSRPWPYKMFVVPRC 12890
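
Exon lines in this layout are flanked by two numbers. Assuming these are the nucleotide coordinates of the exon on the scaffold, the span between them should be close to three times the number of residues on the line. The sketch below checks that for a single line; span_check and EXON_LINE are hypothetical names. Exons that begin or end mid-codon can be off by a nucleotide or two, so a small nonzero difference is not necessarily an error, while a large one suggests a mistyped coordinate.

```python
import re

# Matches lines of the form: <start> <residues> [(n)] <end>
EXON_LINE = re.compile(r"^\s*(\d+)\s+([A-Z*]+)\s*(?:\(\d\))?\s*(\d+)\s*$")


def span_check(line):
    """Return (n_residues, span_nt, span_nt - 3 * n_residues), or None if the line does not match."""
    m = EXON_LINE.match(line)
    if m is None:
        return None
    start, residues, end = int(m.group(1)), m.group(2), int(m.group(3))
    span_nt = abs(end - start) + 1  # abs() handles both (+) and (-) strand coordinates
    return len(residues), span_nt, span_nt - 3 * len(residues)


# First two exon lines of 14945_prot, copied from the entry above.
lines = [
    "  575 MAMDSAGTVLLAACVIVLFYLVKWRGNNKRKNLPPGPTAFPLLGNFLQVSTTEIPSSCVE 754",
    " 1024 LSKTYGPVFTLYLGGHRSIILIGYDAVKEALIDNSDVFSDRGEGGVSEMIFKNY 1185",
]
for line in lines:
    print(span_check(line))
```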

 

>scaffold_16683:748-5473 = BC092552.1

MDPVSVLLSVVVCIFLFKVFYDGEKESQNFPPGPKPLPLIGNLHIINMEKPYLTFME

LAEKYGSVFSFHLGTEKVVVLCGTDAVRDALINHAEEFSGRPKVAIFDQIFKGH

GIIFADGENWKVMRRFSLSTLRDFGMGKKTIEEKISEESDCL

VETFKSHGGKPFDNTMIMNAAVANIIVALLLSQRFDYQDPTLLKLVKSINKIVRITGSSMVMLYNTF

PSIMQWIPGSHQNVVKNAEKIYTFLIETFTKHRHQLDVNDQRDLIDTFLIKQQEEKSSST

KFFHDENLKVLLLNLFGAGMETTSTTLRWGILLMMKYPEVQKKVQDEIDRVIGSAEPRLE

HQKQMPYTDAVIHEIQRFADLVPNNVPHATTKDVTFRGYFIPK

GTHVIPLLTSVLKDKDYFKKPNEFYPEHFLDSEGHFVKNEAFLPFSA

GRRICAGETLAKMELFLFFTNLLQNFTFQPPPGVEVQLTRGVAITSIPTEHKICALPRS*

 

>52542_prot scaffold_1232:27024-44511 (+)

MELGVTWSLILAVIVSFLVYSFTWRRKLRKINMPPGPPLYPLLGNMLQIS

AKEFPQSLVKLSEKYGTVFTVYLPSKPAVILSGYDCIKEALLDNNESFGA

RGESPLGYLLFKDYGVIFSNGERWKQLRRFSLSCLRDFGMGKKSIEERIQ

EEARCLVEELGKNGDTPMDPTYMLTLAVSNVICTVVFGERFDYKDEKFMT

LISLLKIVSRDFSSAWGIRSRRPRTRSCAQKLLNLFPNTLSRLPGPPQRL

FRNFDKLKAFVAESLKSHQETLNSDCPRDFIDCFLIKMEKEKNNPQTEFH

SDNLFGTVLDLFFAGTETTSITLKYSFLMLLKYTEVTRKAMEEIDNIIGQ

ERCPFYEDRIKMPYTNAVIHEIQRMADIVPLGVPHATTHDIIFRGYNIPK

DTIIFPLMTSVLKDPKYFNDPKQFDPAHFLDENGSFKKNDAFQPFSIGKR

SCLGEGLARMEIFLFITSILQAFNLKSDTAPQDIDITPEPDKNGAIPRTY

KMYFVPK

 

>14947_prot scaffold_1232:47102-62978 (+)

MAVLGIETLFLVCSFTFLVFLFSRRQRHARLPPGPTPLPLLGNVLQLDFSKQVKEFVKLGSQY

GPVSMVYLGPYPVLVLNGYDVVKEAFVDNGEVFSNRGKNAFIEMIFKGR

GVAFSNGERWRQMRRFSLSTLRDFGMGKRRVEERVQEE

ACALVEEFKKTKGTPFNSTYLMTLAVSNVICSVVFGERFDYQNETFLSVL

ALLKDTFKIITSPWTQLFSFAPGLLKHLPGPHKKAAENLDRLKTFVTEFV

ASHEETLEENFPRDYIDCFLIKMRQEKDNVNTEFDYENLFVTLMNLFFAG

TETTSITLQYGMLILLKYPDIQKKIHEEIDSVIGFNRCPSMEDRPKMPYT

DATIHEIQRFADIVPMGVPRSTNKDTTLRGYDIPKGTTVFPMLTCILKDP

RYFKDPESFNPCHFLDEKGCLKKTDAFIPFSIGKRVCLGEGLARMEIFLF

LTSILQRFELKCHMDPKDIDISPVPSKSAYMPRPYELYITPR

 

>52545_prot scaffold_1232:71267-83903 (+) short seq

exon 7 gap filled in by DT419848.1, missing exon 2

82% to 52547_prot scaffold_1232:122253-139910

71267 MDVAGLGTFLLVLITFILTLSSWNTMYKKVNLPPGPTPLPLIGNLMNIKKGKMVNSLMK (0)

 

GLSFSNGERWRQMRHFTLKTLKNFGMGKKSIEEKIQEEALCLVEEIRKSG (1)

ETPVDPSKLIMDAVSNVFCSIMFGRRFEYNEEKFANLLTNVNEIFRLMSNTWGQ (0)

LESIFPSVMAYIPGPHKKKNTLSEELISFLHERVKSNQETFDPSAPRDFIDEYLMKIEQ (0)

EKKNPNSEFTMRNTLLTFFSIFLGGTETSTTTIKHGLLLLIKYPEIQ (1)

79425 AKLHMEIDHVIGRNRIVNINDRNAMPYMEAVINEIQRFSDIAPLNAPRKVTKDVQFRGYSIPK (0) 79511

DTEIYPLLCTVHRDPKYFSSPYEFNPSHFLDEQGRFRKSEAMMAFSA (1)

GKRICPGESLARMELFLFFTTILQNFTLTSPTHFTEDDVAPKMAGFMNHPIQYKASFISR* 83903

 

>scaffold_1232:89713-89889 extra exon 1

MDVTGLGTILLVLISCVLIFSSWKTFYQKHNLPPGPTPLPLMGNLMNIKKGKLVSSLMK

 

>scaffold_1232

90% to 52547_prot   scaffold_1232:122253-139910, 54% to 2Q2

 93883 MDVTGLGTILLVLISCVLIFSSWKTFYQKHNLPPGPTPLPLMGNLMNIKKGKLVSSLMK 94059

 96262 LWEQYGAVYTLYFGTQPVIVLCGYDAVKEALVDQAEAFGARGKISSLDPVTQGY 96423

 96988 GIGFSNGERWRQMRHFTLKALRDYGMGKKSIEEKIQEEALCLVEEFRKSG 97137

 98027 EMPINPSTHIMKAVANIFFSIMLGNRFEYNNETFSALLATLEEMYTLMNNTWSQ 98194

 99835 IENVLPKLMAYIPGPHKKRDALAKELILFFHERVKANQETFDPSAPRDFIDEFLIKMEQ 100014

101345 EKKNPNSEFTMRNILMTFFSIFIGGTETSTTTLKHGLLLLIKYPEIQ 101485

116449 AKLHMEIDNVIGRNRTVNLNDRNSMPYMEAVINEIQRFSDIAPLNLPRKVTKDVQFRGYCIPK 116637

119036 DTEIYPLLCTVHRDAKYFSSPYEFNPSHFLDEQGRFKKNDALMAFSA 119176

119933 GKRMCPGESLARMELFLFFTTILQNFTLTSPTHFTEDDVAPKMTGIINHPIQYKASFIA 120109

extra exons 7,8,8

113766 AKLHMEIDNVIGRNRTVNLNDRKFMPYMEAVIN 113864

115357 DTEIYPLLCTVHRDAKYFSSPYEFNPSHFLDEQGRFKKNDALMAFSA 115479

115792 DTEIYPLLCTVHRDAKYFSSPYEFNPSHFLDEQGRFKKNDALMAFSA 115932

 

>52547_prot scaffold_1232:122253-139910 (+) poor model revised

missing exons 2,3 found on DT436730.1

58% to 52548_prot scaffold_1232:145476-158239, 55% to 2Q2

MYVAGLGTILLVLISCVLIFSSWKTLYQKHNLPPGPTPLPLIGNLMNIKRGKLVSSLMK (0)

LWEQYGAVYTLYFGIQPVIVLCGYDAVKEALVDQAEDFGARGKISSLDPVTQGY

GLSFSNGERWRQLRHFTLKALRDFGMGKKSIEEKIQEEALCLVEEFRKSG

EMPTDPEKPIMKAVSNIFFTIVLGNRFEYNDETFSALLAKVEEMFRLMSNTWSQ (0)

IENVLPKLMAYIPGPHKKRDALGKQLILFLHERIKANQETFDPSAPRDFIDEFLIKMEQ (0)

EKKNPNSEFTMKNTLLTFYSIFLGGTETSTTTLKHGLLLLIKYPEIQ

AKLHMEIDNVIGRNRTANMIDRNSMPYMEAVINEIQRFSDIIPLNVPRKVTKDVQFRGYCIPK

DTEIYPLLCTVHHDAKYFSSPYEFNPSHFLDEQGKFKKNNAMMAFSA

GKRICPGESLTRMELFLFFTTILQNFTLTSPTHFTDNDVAPKMTGFINHPIQYKASFISR

 

>52548_prot scaffold_1232:145476-158239 (+) 67% to 2Q2

MDITGLGTLVLILLISCIVIYSTWNSMYRKRNLPPGPTPLPLIGNLLQIKRGEMVKSLTE

FGKQYGPVYTLYLGPRPVIVLNGYQAVKEALIDQGEEFSGRGKLVVADLIFGGF

GVVFSNGDRWKQLRRFSLMTLRDFGMGKRSIEERIKEEAQCLQVELHKYK (1)

QTPTDPQNILVQAVSNVICSVVFGNRFEYENSEFLKLLRLFNETFQMMSSTWGQ

LQQIIPFIMNYIPGPHQKIDKVVARQLEFVSERVKKNQETIDFNSPRDFIDCFLIKMQQ

ETQNPTSEFNLKNLLMTVLNLFVAGTETVSSTLRNGILLLLKYPHIQ

EKLHKEIDVVIGQNRSPNIDDRSKMPYMDAVIHEIQRFTDILPMNLPHSVIKDTAFQGYTIPK

DTDVYPMLCSVLRDPTQFTTPENFNPEHFLDDSGCFKKSDAFMPFST

GKRICLGEGLARMELFLFLTTILQNFTLTSETQITESDITPRMAGFANVPISYKVSFVPR

 

>4371_prot scaffold_1232:170631-190051 (+) = CYP2Q3

MDTTWLWSLQLFLLIATMLIYSTWNKMYRKRNLPPGPTPIPLFGNVLQIKRGEMVKSLLE

LGKKYGPVYTLYFGPSPVIILCDYQSIKEALNDQAEEFSGRGKIPSWDQFFQGY

GESFSNGDEWKQLRRFSLTTLRNFGMGKRGIEERIQEEAQFLVAEIKSYK

GKPFDPTKILVQCVSNVICSVVFGQRYEYSNKDFHKLLYMFQAVFEDTSSTLGQ

LMTLLPNIMNHIPGPHKTVVNKLNKVNDFILQRVKENEKTLDPNSPRHFIDSFLIQMQK

EKDNPVTKFHWKNLLCTIMNLFFAGTETVSTTLRHGFLMLLIHPEIE

EKLHEEIDRVVGQDRSPTIEDRSKMPYTDAVIHEIQRFSDVLPMSLPHLVMKDTQFRGYTIPK

GTDVYPLICAALRDPKQFATPNKFNPQHFLDDNGLFKSSNAFLPFST

GKRICLGEGLARMELFLFLTNILQNFKLHSENQFAEDDIAPKMNGFANYPLSYEFSLIPRVQSLLVL*

 

>4372_prot scaffold_1232:202751-216554 (+) = CYP2Q2

MDTSWLWTLLLCLLISAMLIYSTWNKMYRKRNLPPGPTPIPLFGNVMQ

IKRGEMVKSLIELGKKYGDIYTLYFGPSPVVILCSYRAIKEALIDQAEEF

SGRGAIPSFDQYFQGYGVVFTNGEEWKNLRRFSLSTLRNFGMGKRGIEER

IKEEAQFLVAEIKSYKEKPFDPTNILVQCVSNVICSVVFGNRFEYANKDF

QNLLSLFQSVFQETSSSWGQLLNMLPAVMNHVPGPHKNIIRDMNKLEDFV

LQRVKENEKTVDPNSPRDLIDSFLIKMQQENKNPTSPFHMKNLIATILSI

FFAGTETVSTTLRHGFLILLIHPEIEAKLQEEIDRVVGQNRSPTIEDRNK

MPYTDAVIHEIQRLSDVIPMNVPHLVTKDTKFRGYTIPKGTNIYPLLCAV

LRDPEQFDTPSKFNPNHFLDDKGCFKSNDGFMPFSTGKRICLGEGLARME

LFLFLTNILQNFKLHSESGLTEDNIAPKMKGFANYPTSYQLSFIPR

 

>21828_prot two genes fused scaffold_55:606485-680452

second part has some P450 exons poor model (revised)

upper part is rhesus blood group glycoprotein rhag

DT438894.1

603844 MDLVYSPSVCLLLATAVFIILYTLIDWARSSARNFPSGPLALPLIGHLHIINLKRPSEALNK 603659

602714 ISKTHGNIFRIQMGTVEMVVLAGYEAVKEALIDNAEAFAGRPFVPILDDIFHGY 602553

593338 GIPFSHGDNWKEMRRFTLSTFRDFGMGKRTIEDKIIEECGFLIKEIEVYK (1) 593189

590829 DEPVELKEFISVAVGNIISSIVLGHRFDNYQHPTLLRVLELVHENFRLLGSPSVI (0) 590665

588598 LYNIFPIMRFFPGDHKKIMKNLEELHCFLRETFLKHLKVLERDDQRGYIDAYLVRQLE (0) 588425

586539 EKGNPKSYFHEQNLLSILATLFAAGTDTTIASIRWAISFMVKNPLIQ (1) 586399

584383 KRVHEEIDRVIGSSQPQFHHRKSMPYTNAVVHETQRVANVVPMNLPHATTRDINFRGYHLPK (0) 584198

581601 GTYIVPLLESVLFDKTQFERAEEFYPEHFLDSDGKFVMRPAFLPFST (1) 581464

       GKRICIGETLAKMELFIFFTSLMQKFSFHPPPGDPNFDVKPAIGLTSPPLPRKLCIVPRS

 

>6348_prot scaffold_55:566754-603844 DT450622.1 83% to 21828_prot

578469 MDLVYSPSMCLLLAAVVFIILYTLIDWARSSARNFPSGPLALPLIGHLHIINLKRPSEALNK 578284

577359 ISKTHGNIFRIQMGTVEMVVLAGYETVKEALIDNAEAFAGRPFVPILDDIFHGY 577198

575649 GIPFSNGENWKEMRRFTISRFRDFGVGKRTMEDKITEESVCLIKEMEVLK 575500

575200 DEPVELTPYISVAVGNIIASIVLGHRFDDYKNPTLLRVLQLTSENLSYLGSPSVL 575036

574075 LYNVFPILRFFPGDRNKLLKNLKELHCFLRETFMKHLKVLERDDQRGYIDAFLVKQLE 573902

572545 EKENSNSYFHEKNLICILVSLFSAGTDTTIASIRWALTFMVKNPHIQ 572408

571008 QRVHEEIDRVIGSSQPQFHHRTSMPYTNAVVHETQRVANVVPMNLPHATTTDVNFRGYHLPK 570823

569487 GTYVVPLLESVLFDKTQFERAEEFYPEHFLDSDGKFVMRPAFLPFST 569347

566936 GKRICIGETLAKMEVFIFFTTLMQKFSFHAPPGEPDIEIKRGIGLTSPPLPQKLCIVRRS 566757

 

>scaffold_55 exon 7, same as NM_001015757.1

558700 MDFTFSLATYLVLVVTVLYILSNWKRKALNNFPPGPKGWPLVGNVFSIDLKKPQRTYIE 558524

550333 EKVHDEIARVIGSAHPTYSHRTQMPFTNAVIHEMLRFADIVPLSVPHETTRDVHFKGYFIPK 550148

 

>21822_prot 2 P450s fused scaffold_55:465695-533227 (-)

99% to DT436641.1

549227 GTYIIPLLTSVLKDKTQFDAPEEFNPNHFLDSEGNFLKKEAFMPFSA 549087

548988 GRRACPGEILARMELFIFFTSLLQKFSFHPPPGVTNINLSSDVGFTSVPLEGMICAIPRA 548809

 

>scaffold 55 partial seq 100% to DT436641.1

543754 LSKKYGPVFSVQMGRKKMVILVGYETVKDALVTHAEEFGGRASIPVNKNLEK 543599

542568 GMIFSNGENWKAMRRFTITTLKDFGMGKSTIEETIAHECSYLVQYFASFK 542419

542089 GKPFDDSTILITSVANIIVAILLGHRMEYEDPVFLRLVNLNSEYVKLLGSPMVT 541928

541086 IYNMFPALGFLPGCHKTIEKNIKELYAFVRRTFVEHQKHLDIHDQRSFIDAFLARQKEE 540910

540653 AKHPETYSYFHNENLVRLVRNLFSAGMETTSTALRWALLLMIKYPDIQ 540510

 

>DT436641.1 DT433530.1 DT443285.1 DN045517.1 S.Sarva 95% to NM_001015757

MDFTFSLATYLVLVVTVLYILSNWKRKALNNFPPGPKGWPLVGNVFSIDLKKPQRTYIE

LSKKYGPVFSVQMGRKKMVILVGYETVKDALVTHAEEFGGRASIPVNKNLEKGL

GMIFSNGENWKAMRRFTITTLKDFGMGKSTIEETIAHECSYLVQYFASFK

GKPFDDSTILITSVANIIVAILLGHRMEYEDPVFLRLVNLNSEYVKLLGSPMVT

IYNMFPALGFLPGCHKTIEKNIKELYAFVRRTFVEHQKHLDIHDQRSFIDAFLARQKEE
AKHPETYSYFHNENLVRLVRNLFSAGMETTSTALRWALLLMIKYPDIQ
EKVHDEISRVIGSAHPTYSHRTQMPFTNAVIHEILRFSDILPLGVPHETTRDVHFKGYFIPK
GTYIIPLLSSVLKDKTQFDAPEEFNPNHFLDSEGNFLKKEAFMPFSA
GRRACPGEILARMELFIFFTSLLQKFSFHPPPGVTNINLSSDVGFTSVPLEGMICAIPRA*

 

>scaffold 55 (-) 100% to DT436641.1

539574 GMIFSNGENWKAMRRFTITTLKDFGMGKSTIEETIAHECSYLVQYFASFK 539425

539096 GKPFDDSTILITSVANIIVAILLGHRMEYEDPVFLRLVNLNSEYVKLLGSPMVT 538935

538098 IYNMFPALGFLPGCHKTIE 538042

 

>scaffold 55 (-) 96% to DT436641.1

532638 MDFTFSLATYLVLVVTVLYILSNWKRKALNNFPPGPKGWPLVGNVFSIDLKKPQRTYIE 532462

530541 LSKKYGPVFSVQMGRKKMVILVGYETVKDALVTHAEEFGGRASIPVNKNLEKGL 530383

527019 GITFSNGENWKAMRRFTITTLKDFGMGKSTIEETIAHECSYLVQYFASFK 526870

525259 GKPFDNSTILSTSVANIIAPILFGHRMEYEDPVFLRLVNLNSEYVKLLGSPMVT 525098

524043 IYNMFPALGFLPGCHKTIEKNLKELYAFVRRTFVEHQKHLDIHDQRSFIDVFLARQKE 523870

521860 EAKHPETNSYFHNENLVRLVRNVFSAGMETTSTALRWALLLMIKYPDIQ 521714

521136 EKVHDEISRVIGSAHPTYSHRTQMPFTNAVIHEILRFADIVPLSVPHETTRDVHFKGYFIPK 520951

519207 XXXXXXXXXXXLKDKTQFDAPEEFNPNHVLDSEGNFLKKEAFMPFSA 519100

519001 GRRACPGEILARMELFIFFTSLLQKFSFHSPPGVTNINLSSDVGFTSVPLEGMICAIPRA 518822

 

>scaffold 55 (-) 94% to NM_001015757.1 95% to DT436641.1

506398 MDFTFSLATYLVLVVTVFYILSNWKRKALNNFPPGPKGWPLVGNVFSIDLKKPQRTYIE 506222

504460 LSKKYGPVFSVQMGRKKMVILVGYETVKDALVTHAEEFGGRAYIPVNKDLEKGL 504299

500885 GITFSNGENWKAMRRFTITTLKDFGMGKSTIEEKITHECSYLVQYFAFSK 500736

500410 GKPFDNSTILITSVANIIVAILLGHRMEYEDPVFLRLLNLNSEYVKLLGSPMVT 500252

499245 IYNMFPALGFLPGCHKTIERNMKELYAFVRRTFVEHQKNLDIHDQRSFIDAFLARQKEE 499069

497715 AKHPETKSYFHNENLVRLVRNVFSAGVETTSTALRWALLLMIKYPDIQ 497572

497006 EKVHDEISRVIGSAHPTYSHRTQMPFTNAVIHEILRFADIVPLNVPHETTRDVHFKGYFIPK 496821

496260 GTYIIPLLSSVLKDKTQFDAPEEFNPNHFLDSEGNFLKKEAFMPFSA 496120

496020 GRRACPGEILARMELFIFFTSLLQKFSFRPPPGVTNINLSSDVGFTSVPLEGMICAIPRA 495841

 

>Green = DT436641.1 75% to 21819_prot

trace archive for gap

243598069 431692585 (both run into gap)

484940 MFLGDPVTVLLAVALCLIVAITLYRQKRDSSKNFPPGPKPLPIIGNIHNINLKRPYLTYL E 484758

481692 LWKKYGPIFRVQIGSQKMVVLCGYETVKDALVNYAEEFSERPVVPIFLDVVKEY 481531

       seq gap

479226 GKPFDNTMIMNAAVANIIVSIVLGHRFDYQDPKFLRLMSLINENLRLTGSPTVM 479065

477865 LYNVFPSVMRWLPGNHQTVGKNAAENQRFIRETFIKHKEKLDVNDQRNLVDAFLVKQQE 477689

474586 KNGNAVYFHDDNLTMLVSNLFAAGMETTSTSVRWGLLLMMKYPEIQ 474449

470107 KNVQNEIEKVIGQSRPQTEHRKSMPYTDAVIHEIQRFGNIIPMNLPHATAQDVTFRGYFLPK 469922

466117 GTYIIPLLSSVLKDKTQFDAPEEFNPNHFLDSEGNFLKKEAFMPFSA 465977

465877 GRRACPGEILARMELFIFFTSLLQKFSFHPPPGVTNINLSSDVGFTSVPLEGMICAIPRA 465698

 

>$$$$2 100% to $$$$3 and 21819_prot

454570 MFLGDPVTVLLAVALCLIVANTLYRQKRDSYKNFPPGPKPLPIIGNIHNINLKRPYLTYLE 454388

 

>$$$$3 100% to $$$$2 and 21819_prot

449457 MFLGDPVTVLLAVALCLIVANTLYRQKRDSYKNFPPGPKPLPIIGNIHNINLKRPYLTYLE 449275

 

>6347_prot scaffold_55:435508-454570 (-)  = first exon of seq below

join with scaffold_55:422403-435585  (-) between 6347 and 21819

84% to 21819_prot

duplicated exons 5 and 6

436471 LWKKYGPIFSVQIGSQKMVVLCGYETVKDALVNYAEEFSERPVVPIFLDAVKEY 436310

435614 GIIFSHGENWKVMRRFTLSTLRDFGMGRRTIEDRINEECDFLVEQFKSFK 435465

434353 GEPFENTMIMNAAVANIIVSIVLGHRFDYQDPIFLRLMSLINENIRLMGSPTVM 434192

432802 LYNVFPSVMRWLPGNHQTVGKNAAENRRFLRETFTKHRDKLDINDQRNLVDAFLVKQ Q 432629

432003 EKNGNAVYFHDENLTMLVSNLFAAGMETTSTSVRWGLLLMMKYPEIQ 431863

430611 LYNVFPSVMRWLPGNHQTVGKNAAENRRFIRETFTKHRDKLDVNDQRNLIDAYLVRQQ 430438

429812 EKNGNAVYFHDDNLTVLVSNLFAAGMETTSTSVRWGLLLMMKYPEIQ 429672

428838 ENVQNEIEKVIGQSRPQTEHRKSMPYTDAVIHEIQRFGNIIPMNLPHATAQDVTFRGYFLPK 428653

427205 GTYVIPLLTSVLYDQTRFEKPKEFYPQHFLDSEGNFVKNEAFLPFSA 427065

426319 GKRSCAGENLAKMELFLFFTSLLQNFTFQAPPGEELDLTPAIGITTPPLPHNICALPRT 426143

 

>21819_prot parts of two genes long last intron has more exons

scaffold_55:413156-422625 (-) corrected  gene model

81% to scaffold_55:314488-344970

422625 MFLGDPVTVLLAVALCLIVANTLYRQKRDSYKNFPPGPKPLPIIGNIHNINLKRPYLTYLE (0)  422443

421910 LWKKYGSIFSVQIGSQKMVVLCGYETVKDALVNHGEEFSERPEIPIFHVIAKGY (1) 421749

420605 GVIFSHGENWKVMRRFTLSTLRDFGMGKKSIEDKINEECDSLVEKLRSY (1) 420456

419591 GKAFENSVTINAAVANIIVSLLLGRRFDYEDPTFLRLMSLMNANFRLMGSPMVM 419430

417270 LYNLYPSIIRWLPGSHKTVGKNAAETQRFIRETFTKRREKLDVNDQRNLIDAFLVRQQ 417097

416953 ETKEDGCSFHDDNLTVLVSNLFAAGMETTSSTLRWGLLLMMKYPEIQ  (1) 416813

415727 KNVQNEIEKVIGQSRPQTEHRKSMPYTDAVIHEIQRFGNIIPMNLPHATAQDVTFRGYFLPK (0) 415542

414638 GTYVIPLLTSVLYDKDHFEKPNEFYPQHFLDSEGNFVRNEAFLPFSA (1) 414498

413332 GKRSCAGENLAKMQLFLFFTSLLQNFTFQAPPGEELDLTPTTGFTTPPLLHNICALPRT  413156

394115 NFSFQAPPGEELDLTPTTGFTTPSLLHNICALPHT* 394233

394006 NFTFQAPPGEELDLTSTTGFTTPPLPHNICALPRT* 394114

393899 NFTFQAPPGEELDLTPTTGFTTPPLPHNICALPRT* 394005

393791 NFTFQAPPGEELDLTPTTGFTTPPLPHNICALPRT* 393898

 

>second exon 4-9 86% to 21818_prot CX463658.2 CR436794.1 CR426826.1

84% to 21819_prot, N-term from ESTs

       MFLGDPVTVLLAVALCLIVANTLYRQKRDSYKNFPPGPKPLPIIGNIHNINLKRP

       YLTYLELWKKYGPIFSVQIGSQKMVVLCGYETVKDALVNYAEEFSERPVIPIFLDAVKEY

       GVIFSHGENWKVMRRFTLSTLRDFGMGRRTIEDRINEECDFLVEQFKSFK

393678 GKPFDNTMIMNAAVANIIVSIVLGHRFDYQDPIFLRLMSLINENVRLTGSPKAM  393517

391361 LYNVFPSVMRWLPGNHQTVGKNAAEYHRFIRETFTKYRDKLDINDQRNLVDAFLVKQQ 391188

390087 EKNGNAVYFHDDNLTVLVSNLFVAGMETTSTSVRWGLLLMMKFPEIQ  389947

373288 ENVQNEIEKVIGQSRPQTEHRKSMPYTDAVIHEIQRFGNIIPMNLPHATAQDVTFRGYFLPK 373103

       GTFVIPLLTSVLYDQTRFEKPKEFYPQHFLDSEGNFVKNEAFLPFSA

369454 GKRSCAGENLARMELFLFFTSLLQNFTFQAPPGEELDLTPAIGITTPPLPHNICALPRT 369278

 

>scaffold_55 fragment of exon 5 same as 389947 exon 5

368850 TTSTSVRWGLLLMMKFPEIQ 368791

 

>21818_prot 72% to NM_001004878.1 82% to 21819_prot

scaffold_55:344864-351768 (-) CF344279.1

354925 MFLGDPVTILLAVVLCLIVANTLYRGKKDGVGNLLPGPKPLPIIGNIHILNLKKPYLTYLK (0) 354743

       LWKKYGSIFRVQIGSQKMVVLCGYETVKDALINHGEEFSERPRLPIFQVIANGY

351804 GVAFSHGENWKVMRRFTLTALRDFGMGRRTIEDRINEECDFLVEAFKSYK 351655

350789 GKPFENLMILNAAVANIIVSIVFGHRFDYQNPTFLRLMRLINENARLLGSPTAM 350628

348627 LYNVFPSVMRWLPGSHKTLRKNVDEIKIFIRETFTKQRDKLDVNDQRNLIDAFLVKQQ 348454

347806 EKNGNGPYFHDENLTTLVNNLFSAGMETTSSTLRWGLLLMMKYPEIQ 347666

347063 KNVQNEIEKVIGQSRPQIEHRKSMPYTDAVIHEIQRFGNIIPMNLPHATAQDVTFRGYFLPK 346956

346172 GTYVIPLLTSVLYDQSHFEKPNEFYPQHFLDSEGNFVKNEAFLPFSA 346032

345043 GKRSCAGENLANMELFLFFTSLLQNFTFQAPPGEELDLTPGTGLSAPPLPHNICALPRT 344867

 

>scaffold_55:314488-344970 81% to 21819_prot

339702 MFLGDPVTLLLAVVLSLIVANTLYRKERVNVQNFPPGPKPLPIIGNIHNINAKRPYLTYLE (0) 339520

337426 LWKKYGSVFSVQIGSQRMVLLCGYETVKDALVNHAEEFSDRPIIPLFHEITKGN 337265

333747 GVVFANGENWKVMRRFTILALRDFGMGRRTIEYRINEECDFLVEKIKSYRG 333595

333068 GEPFENTMIMNAAVANIIVSILLGHRFDYQDPTILRLLSLINQSVKITGSPMVM 332907

331666 LYNMFPSVMRWLPGSHKTLAINVAEIQSFIRETFTKYRDKLEINDQRNLIDAFLVKQQE 331490

330183 NKENGLYFHDDNLTMLVSNLFTAGMETTSSTLRWGLLLMMKYPEIQ 330046

328774 ENVQNEIEKVIGQSRPQTEHRKSMPYTDAVIHEIQRFGNIIPMNLPHATAQDVTFRGYFLPK 328589

327361 GTFVIPLLMSVLYDQSHFENPNEFYPQHFLDSEGNFVKNEAFLPFSA 327221

321805 GKRSCAGENLARMELFLFFTSLLQNFTFQAPPGEELDLTPGTGLSAPPSPYKICALPCS 321629

 

>21816_prot 77% to NM_001035117 correct seq 80% to 21810_prot

scaffold_55:303150-314597 (-)

314597 MDPISILLSIAVCVFLLNLFYGGKGDSKMFPPGPKPLPLIGNLLIMNMKKPHLTFME (0) 314427

314228 LAEKYGSVFSVQLGTEKVVVLCGTDAVKEALINHADEFSERPKIPIFEDVSKGY 314067

312244 GLIFSHGENWKVMRRFTLTTLRDFGMGKKTIEERICEESDCLVEAFKSYK 312095

310744 GKPFENTLIMNAAVANIIVSILLGHRFDYQDTALLKLIKIINENVRLMGSPMVM 310583

308441 LYNTYPSVMQWLPGKHKTVAENTLKLFKFLEETFTKHRDQLDVNDQRDLVDTFLVKQQE 308265

307774 EKPSSSKFFHDQNLTLLVSNLFGAGMETTSTTLRWGLLLMMKYPDIQ 307634

306167 KKVQDEIDKVIGSAEPQTEHRKLMPYTDAVIHEIQRFANIAPSNLPHATTTDVTFRGYFIPK 305985

304474 GTQVIPLLTSVLQDKNYFKKPEEFYPEHFLDSEGHFMKNEAFLPFSA 304334

303329 GRRSCAGETLAKMELFLFFTKLLQNFTFQPPPGVEVQLTSGEGFTSSPLQHNICALPRT 303153

 

>21813_prot scaffold_55:257853-278655 (+) 90% to 21811_prot

part of exon 2 in seq gap, trace archive for exon 2 586458683

262853 MDPVSVLLSVVVCIFLYKVFYGGKERPENFPPGPKPLPLIGNLHIMNMRKPHLTFME (0) 263023

265029 LAKTYGSVFSFQLGLEKIVVLCGTDTVKDALINHAEEFSERAKIPVFEDIAKGH

269321 GIVFAHGENWKVMRRFTLSALRDFGMGKKTIEDKICEESDCLVETFKSYN 269470

270333 GKPFDNTFILNSAAANIIVTILLGDRFDYKDPKMLNLIKVVNQNMRIGGGFMVR 270494

273263 LYNTYPTIMRWIPGSHQTVSKNVATIFKFLNETFTEHRKVLDVNDQRDLIDAFLVKQQE 273439

274172 EELSSKKFFYNQNLTVLVTNLFAAGMETTSTTLRWGLLLMMKYPEIQ 274312

276404 KKIQKEIDQVIGSAQPRLEHRKQMPYTDAVIHEIQRFANIAPINIPHETTQDVTFRGYFIPK 276589

277716 GTQVIPLLASVLRDKAYFKKPEEFYPEHFLDSEGNFVKNEAFLPFSA  277856

278476 GKRSCAGETLAKMELFLFFTKLLQNFTFQPPPGVEVQLTCGVALTSIPLDHKICALPRS 278652

 

>scaffold_55:287553-290430 exons 1-3 (+) 89% to 21811_prot

21815_prot exons 4-8 missing exon 9 scaffold_55:291301-297995 (+)

287561 MDPVSVLLSVVVCIFLYKVFYGGEKESQNFPPGPKPLPLIGNLHIMNMRKPHLTFME (0) 287731

289748 LAKTYGSVFSVQLGLRKTVVLCGADTVRDALINHAEEFSERARIPVFEDITKGHG 289912

290242 GIVFAHGENWKVMRRFTLSTLRDFGMGKKTIEDKICEESDSLVEIFKSYN 290391

291900 GKPFDNTLILNSAVANIIVTILLGDRFDYKDPTLLKLVKVVNQNIRIGGGFMAR 292061

292702 LYNIYPSVMRWIPGDHKTVFKNIAKVYKFLNKTFTEHRKVLDVNDQRDLIDAFLVKQQE 292878

294169 EKLSSKKFFHNQNLTVLVANLFAAGMETTSTTLRWGLLLMMKYPEIQ 294309

295416 KKIQEEIDRVIGSAEPRLEHRKLMPYTDAVIHEIQRFANIAPNNVPHETTQDVTFRGYFIPK 295601

296551 GTQVIPMLTSVLRDKAYFKKPEEFYPEHFLDSEGKFVKNEAFLPFSA  296691

297892 GRRSCAGETLAKMELFLFFTKLLQNFTFQPPPGVEVQLTCGVAMTSIPLYHNIC ALSRS* 298071

 

>21812_prot scaffold_55:242882-252687 (-) 90% to 21809_prot

DN029946.1 fills seq gap

252687 MFSFEPITLFMAIVICLLIYLVYGGKGTPPNFPPGPKPLPLIGNLHIMNLKKPYMTLME (0) 252511

250984 LGKKYGSVFSVQLGTEKVVVLCGYDAVKDALINHAEEFSDRPIIEAFHRRSNGH 250823

250732 GITFSHGENWKVMRRFTLATLRDFGMGKRTIEDKINEECISLVETFQSYK 250583

       GEPFENSLILNAAVANIIVSILLGHRFEYQDPTLLKLIRLINEIARILGTPIVM

       LYNAYPSVMRWLPGSHHNVEKNTQKSHTFI

247704 KETFAEHKAQLDINDQRDFIDAFLIKQSE 247618

245612 EKSATGRFFHNENLVSLVDSLFSAGMETTSTTLRWSLMLMMKYPEIQ 245472

245315 KKVQEEIDKVIGSAQPQMEHRKQMPYTDAVIHEIQRFADIVPTNLPHSTTKDVTFRGYLIPK 245130

243475 GTQVIPLLTSVLRDKAYFERPYEFYPQHFLDSEGNFVKNEAFIPFSA 243335

243067 GKRSCAGETLAKMELFLFFTKLLQNFTFQSPPGQDLHLTPLVGFTSAPMVHKICALSRTLD* 242882

 

>21811_prot scaffold_55:216779-238524 (+) 90% to 21813_prot

78% to NM_00103511

223244 MDPVSVLLSVVICIFLYKVFYGGKETSKNFPPGPKPLPLIGNLHIMNMKKPHLTFME (0) 223414

225164 LAEKYGSVFSFEFGLRKTVVLCGTDTVRDALINHAEEFSERARIPVFEDITKGH (1) 225325

225574 GIVFAHGENWKVMRRFTLSTLRDFGMGKKTIEDKICEESDCLVEIFKSYN (1) 225723

227389 GKPFDNTLIMNSAVANIIVTILLGDRFDYKDPTMLKLVKVVNQNIRITGGLMAR (0)227550

230255 LYNIYPSIMRWIPGSHQTVSKNMAKVFKFLNETFTEHRKQLDVNDQRDLIDAFLVKQRE (0) 230431

232470 EKLSAKTFFHNDNLTVLVTNLFGAGMETTSTTLRWGLLLMMKYPVIQ (1) 232610

234765 KKVQKEIDQVIGSAQPRLEHRKQMPYTDAVIHEIQRFANIAPINIPHETTQDVTFRGYFIPK (0) 234947

236825 GTQVIPVLTSVLQDKAYFKKPEEFYPEHFLDSEGKFVKNEAFLPFSA (1) 236965

238345 GKRSCAGETLAKMELFLFFTKLLQNFTFQPPPGVEVQLTCGVALTSIPADHKICALPLS 238521

 

>21810_prot scaffold_55:188843-209598 (-) 80% to 21816_prot

209598 MDPVSVLLSVVVCIFLFKVFYGGKRTLENFPPGPKPLPLIGNLHMMNMKKPHLTFME (0) 209428

207937 LAEKYGSVFSVHLGTEKVVVLCGTDTVRDALINHAEEFSERAKMPIFEDFSKGL (1) 207776

206533 GVVFGHGENWKVMRRFTLSTLRDFGMGKKTIEERISEESDCLVETIKSYE (1) 206384

205040 GKPFDNTLIMNAAVANIIVHILLNHRFDYQDPTLLKL LINIVIDNIKIGGSPIVM   204879

200634 LYNTYPSVVRWIPGSHKTLGENTAQLYKFLEETFTQHREQLDVNDQRDLIDAFLVKQQE  200458

198405 EKPSSAKFFHNENLVALLANLFVAGMETSSTTLRWGLLLMMKYPDIQ 198265

192757 KKVQDEIDKVIGSAEPRLEHRKLMPYTDAVIHEIQRFANIAPISLPHATTTDVTFRGYFIPK (0) 192572

191365 DTQVMIVLTSVLQDKDYFKKPEEFYPEHFLNSKGNFVKNEAFLPFSA (1) 191222

189019 GRRRICAGETLAKMELFLFFTKLLQNFTFQPPPGVEVDLTCADAMTSKPQEHQICALPRG* 188843

 

>21809_prot bad model green parts OK 90% to 21812_prot

77% to NM_001035117 (lower case)

scaffold_55:163837-187896 (-)

184541 MVSFEPITLFLAIVICLFLIYLVYGGKGTPPNFPPGPKPLPLIGNLHIINLKKPYMTFME (0) 184362

173558 LGKKYGSVFRVQLGTEKVVVLCGYDAVKDALINHAEEFSDRPIIETFHRRSNGH

173308 GITFSHGENWKVMRRFTIATLRDFGMGKRTIEDRINEECHSLVETFQSYK 173159

171488 GEPFETNLIMNAAVANIIVSILLGHRFEYQDPTLLKLIGLSNEMVRILGSPIVL 171339

169346 LYNAYPSVMKWLPGSHHNVIKNTQKSHTFIKETFTEHKAQLDINDQRDFIDAFLAKQSE 169170

167042 KKPNPGLFFHNENLVSLVDGLFVAGMETTSTTLRWGLLLMMKYPEIQ 166899

166538 KVQDEINKVIGSAQPQTEHRKQMPYTDAVIHEIQRFADIIPANLPHATTKDVTFRGYFIPK 166356

164553 GTQVIPMLTSVLRDKDYFERPYEFYPQHFLDSEGNFVKNEAFLPFSA 164413

164016 GKRSCAGETLAKMELFLFFTNLLQNFTFQPPPGQDLNLTTTGGFTSIPMVHKICALSRN 163840

 

>21808_prot New exons 3-5 scaffold_55:152750-158460 (+) no ESTs 83% to 21810_prot

exon 7 decaying, probable pseudogene

154072 MDPVSVLLSVVICIFLYKIFYGGKETPENSPPGPKPLPLIGNLHMINMKKPHLTFME (0) 154242

       seq gap

155983 GIVFAHGENWKVMRRFTLSTLRDFGMGKKTIEDRISEESDCLVGVFKSYE (1)156132

156898 GKPFDNTMIMNAAVANIIVHILLNHRFEYQDPTLLKLIKIVSENIRIGGSPIVM  (0) 157059

157308 LYNTYPSIMRWIPGRHKTVGANTAKLYDFLKETFTRHREHLDVNDQRDLIDVFLVKQQE  (0)157484

158540 KKLSSTKFFHDENLTVLLGNLFGAGMETTSTTLRWGLLLMMKYPEVQ 158680

159012 LYNAFPSVMGWLPGRQQRLFENSQTFHESI KHKSQLDISDQRDLL 159147

 

>scaffold_55: 150542-150360 (-) 100% to NM_001004878.1

150542 MLAADPMTILLSAFICLLLGFVLFGRKRNVCQNFPPGPRPLPVIGNLLLMDRKQPYKALLK (0) 150360

 

>NM_001004878.1    66% to NM_001035117, 51% to 2K17 zebrafish

21807_prot (extra N-term piece) P=Q in browser

scaffold_55:119438-130135 (-)

130135 MLAADPMTILLSAFICLLLGFVLFGRKRNVCQNFPPGPRPLPVIGNLLLMDRKQPYKALLK (0) 129953

126756 VSKKYGPVCSFQIGPLKTVVLCGYDTVKDALLNDEFADRPAMPMLDDVAKGH 126601

124037 GILSSNGENWRVMRRFALSTLRDFGMGKKTIESKINEECDHLVQKFSSY 123891

123551 GKPFDTTMIMNAAVANIIASILLSHRFHYENPTLLRLLKLVNENTKFMASRIAM 123390

123231 LYNTFPSIMRWIPGCHKSIYKNAQELLEFIRETFSKQKVELDINDQRNLIDAFLSRQQE (0) 123055

122583 PNSGKYFHDDNLTILVFDLFVAGMETTSTTLRWALLLMMKYPEIQ 122449

121271 KKVQDEIEKVIGSAEPRAEHRKEMPYTDAVIHEIQRFANIFPMNGPHATTKDVTFRGFLIPK 121089

119957 GTFVIPLLASVLKDENYFKKPNEFYPEHFLDSEGHFVKNDAFLPFSA 119817

119617 GRRSCAGENLARMELFLFFTSLLQNFTFQAPPGEELDLTPDVGIATPPMPHTVCALPRA 119441

 

>exon 1 with frameshift at end same as 21811_prot

115608 MDPVSVLLSVVICIFLYKVFYGGKETSKNFPPGPKPLPLIGNLHIMNMKKPHLTX 115769

115769 ME (0)115774

>$$$$$ 93% to NM_001004777

112992 MLAADPMTILLSAFICLLLGFVLFGRKRNVCQNFPPGPRALQVIGNLLLMDRRQPYETLIE (0) 112810

 

>NM_001004777.1 (gap missing C-helix, 22 aa) CX454308.2 69% to NM_001035117

scaffold_55:85652-94653 (-)

94653 MLAADPMTILLSAFICLLLGFVLFGRKRNVCQNFPPGPRALPVIGNLLLMDRKQPYKTLME (0) 94471

93921 VSKKYGSVFSVRVGPLKMVVLCGYDTVKDALLNYPDEFADRPALPLFDELVKGH 93760

      GIIFSNGENWKVMRRFSLSTLRDFGMGKKTIESKIIEECDHLVQKFNSYG

91987 GKPFDTTMIMNAAAANIIASILLSHRFHYENPTLLRLLKLVNENMRLMASPIAL 91826

89951 LYNTYPSIMRWVPGCHKTIYNNAQELMEFIRETFSKHKVELDINDQRNLIDAFLSRQQE 89775

88013 EKPHSAKYFHDDNLTILVIDLFAAGMETTSTTLRWALLLMMKYPEIQ 87873

87405 KKVQDEIEKVIGSVEPRAEHRKEMPYTDAVLHEIQRFANITPMNGPHATTKDVTFRGFFLPK 87220

86157 GTYVIPLLASVLKDENYFEKPNEFYPEHFLDSEGHFMKNEAFLPFSA 86017

85828 GRRSCAGETLARMELFLFFTSLLQNFTFQAPPGEELDLTPDVGGTVPPRPHTVCALPRS 85652

 

>scaffold_55

82615 GRRSCAGKTLAKMKLFLFFTSILQNFTFQAPPGVEPDLTPAISGTRTHKPHTVCALPRA 82439

 

>$$$$$4 95% to NM_001004777.1

78951 MLAADPMTILLSAFICLLLGFVLFGRKRNVCQNFPPGPRALPVIGNLLLMDRKQPYKTLME (0) 78769

78170 VSKKYGPIFSVRAGPQKMVVLCGYDTVKDALLNYPDEFADRPALPLFDEVVKGH 78009

76552 GIFFSNGENWKVMRRFGLSALRDFGMGKKTIESKINEECDHLVQKFNSYG 76403

75689 GKPFDTTMIMNAAAANIIASILLSHRFQYENPTLLRLLKLVNENIRLMASPIAL 75528

74153 LYNTYPSIMRWVPGCHKTIYKNAQELMEFIRVTFSKHKAELDINDQRNLIDAFLSRQQE 73977

67696 EKPHSAKYFHDDNLTILVFDLFAAGMETTSTTLRWALLLMMKYPEIQ 67556

67082 KKVQDEIEKVIGSVEPRAEHRKEMPYTDAVLHEIQRFGNITPMNGPHATTKDVTFRGFFLPK 66897

69093 GTYVIPLLASVLKDENYFEKPNEFYPEHFLDSEGHFVKNEAFLPFSA 68953

68767 GRRSCAGETLARMELFLFFTSLLQNFTFQAPPGEELDLTPDVGGTVPPRPHTVCALARS 68591

 

>$$$$$5

Note frameshift in exon 1

55169 MDPVSVLLSVVVCIFLYKVFYGGKEASQ 55086

55084 NFPPGPKPLPLIGNLHMMNMKKPHLTFME 54998

53620 FSKKYGPVFSIQLGLNKAIVLCGADAVKDALINHGDEFSGRPKIPVFDQISKGY 53459

52239 GVVFADGENWKVMRRFALSTLRDFGMGRKTIEDTIVEEXXXXXXXXXXXX 52126

51713 AKPFDNTLILNAAVANIIVHILLNHRFEYQDPTLIKLIKSVSENVKIAGSPIVM 51552

50894 LYNTYPSIMGWIPGSHKTVFENFQKLSNFLKETFTKRRDQLDVNDQRDLIDAFLVKQQE 50718

50601 LALQFQEKSSSKKFFHDENLKVLLGDLFAAGMETTSTTLRWGILMMMKYPDIQ 50443

49280 KKVQDEIDRVIGSAEPRLEHRKQIPYTDAVIHEIQRFANLVPIVLPHSITEDVTFRGYFLPK 49095

48788 GTQVIPLLISVMQDKDYFQKPEEFYPEHFLDSKGNFVKNEAFLPFSV 48648

48515 GKRSCVGETLAKMELFLFFTKLLQNFTFQPPHGVEVQLTCGDALTSIPLDHKICALPRS 48339

 

>$$$$$$6 nearly identical to $5

37714 GVVFADGENWKVMRRFALSTLRDFGMGRKTIEDTIVEESGCLVETFKSHE 37565

37222 GKPFDNTLILNAAVANIIVHILLNHRFDYQDPTLIKLIKSVSENVKIAGSPIVM 37061

36400 LYNTYPSIMGWIPGSHKTVFENFQKLSNFLKETFTKRRDQLDVNDQRDLIDAFLVKQQE 36224

36089 EKSSSKKFFHDENLKVLLGDLFAAGMETTSTTLRWGILLMMKYPDIQ 35949

34787 KKVQDEIDRVIGSAEPRLEHRKQIPYTDAVIHEIQRFANLVPIVLPHSITEDVTFRGYFL K 34608

34295 GTQVIPLLISVMQDKDYFQKPEEFYPEHFLDSKGNFVKNEAFLPFSV 34155

34022 GKRSCVGETLAKMELFLFFTKLLQNFTFQPPHGVEVQLTCGDALTSIPLDHKICALPRS 33846

 

>$$$$$$7 duplicate exons to $8

29803 LAKKYGPVFSVQLGTKKTVVLCGTDAVKDALINYADEFSGRPKTPLSEQASKGN 29642

28967 GIIFANGENWKVMRRFTLSTLRDFGMGKKTIEDRISEESDCLVETFKSHKGR 28812

28033 GKPFDNTLILNAAVANIIVHILLNHRFDYQDPTFLKLIKSVNDNVRNGARPIIVVSKLWP 27854

 

>$$$$$$8 no ESTs

26641 LAKKYGPVFSVQLGTKKTVVLCGTDAVKDALINYADEFSGRPKTPLSEQASKGN 26480

25807 GIIFANGENWKVMRRFTLSTLRDFGMGKKTIEDRISEESDCLVETFKSHKGR 25652

22131 LYNAFPSIIRWIPGTHKRIFASSQNFFNFLKEIFMKRKDQLDVNDQRDLVDAFLVKQQE 21955

21874 EKSSSTKFFHDENLKVLIGNLFGAGMETTSTTLRWGILLMMKYPEIQ 21734

20135 KKVQDEIDRVMGSTEPRPEHRKQMPYTDAVIHEIQRFADLVPNGVPHATTTDVTFRGYFIPK 19950

19461 GTQVFPLLTSVLRDKAYFKKPDEFYPEHFLDSEGNFLKNEAFLPFSAG 19318

 

>21803_prot scaffold_55:62-7002 (-) 84% to seq at 28033

DN017398.1 DT401910.1 DN087618.1 DN099678.1 DN087299.1

Seq completed by ESTs

49361_prot scaffold_996:1053-7259 same seq as 21803_prot scaffold_55:62-7002

7002 MDPVSVLLSVVVCIFLFKFFYGGEKGSQNFPPGPKPLPLIGNLHMINMKKPYLTFME (0) 6832

6071 LAEKYGPVFSVHLGANKAVVLCGTDAVKDALINYADEFSGRPKTPLFEQTFKGN (1) 5910

4393 GIVFADGENWKVMRRFTISTLRDFGMGKKTIEDRIIEESCCLVETFKSHK (1) 4244

2832 GKPFDNTMILNAAVANIIVHILLKHRFEYQDPTLLKLIKGVNENVRNGARPIVM (0) 2671

     LYNAFPSIIQWIPGTHKRIFANTQNFFNILKEIFIEHRDQLDVNDQRDLIDTFLVKQQE

     EKSSSTKFFHDENLKVLIGNLFAAGMETTSTTLRWGILLMMKYPEIQ

 661 KKVQDEIDRVIGSAEPRLEHRKLMPYTDAVVHEIQRFANLVPNGLPHATTTDVTFRGYFIPK (0) 476

     GTQVIPLLTSVLRDKAYFKKPEEFYPEHFLDSKGNFLKNEAFLPFSA

     GKRTCAGETLAKMELFLFFTKLLQNFTFQPPPGVEVQLTRGVSLTSIPLDHKICALSRS*

 

The cluster of 25 P450 genes on scaffold 55 continues on scaffold 996, upstream of

21803_prot. One side of this cluster has genes that are homologous to Chr 6p21

in humans. The other side of the cluster has a methylmalonyl CoA mutase gene (MUT), also

on 6p21. There are no P450 gene clusters in humans on chr 6, but CYP21A2 is at

6p21. The CYP21A2 gene is at 32 Mb and the MUT and rhag genes are at 49.5 Mb, not

in a syntenic region.

 

>49362_prot scaffold_996:115436-122968

86% to 21803_prot scaffold_55:62-7002

MDLVSVLLSVVVCIFLYKVFYGGEKESQNFPPGPKPLPLIGNLHIMNMKK

PFLTFMELAEKYGPVFSVQLGTKKVVVLCGTDAVKDALVNHADEFSGRPK

IPMFDQTSKGHGVIFADGENWKVMRRFTLSTLRDFGMGKKTLEDRIGEES

GCLVETFKSHEGKPFDNTLILNAAVANIIVHILLNHRFDYQDPTLLKLIK

SVSENVRIGGRPIVMLYNTYPSIMQWVPGSHKSIYENSQNLLNFLKETFT

EHRHQLDVNDQRDLIDTFLVKQQEEKSSSTKFFHDENLTILLSNLFGAGM

ETTSTTLRWGILLMMKYPDIQKKVQDEIDQVIGSAEPRLEHRKQMPYTDA

VIHEIQRFANLAPNGLPHATTTDVTFRGYFIPKGTQVIPVLTSVLRDKAY

FKKPEEFYPEHFLDSEGKFLKNEAFLPFSAGKRICAGETLAKMELFLFFT

KLLQNFTFQPPPGVEVQLTCGDAITSIPLDHKICALSRS

 

>49364_prot scaffold_996:134740-168381 poor model, missing exons 6,7

same seq as:

NM_001035117 CYP2 family member, 50% to 2K21 zebrafish

from refseq database

83% to 49362_prot scaffold_996:115436-122968

MDPVSVLLSVVVCIFLFKVFYDGEKESQNFPPGPKPLPLIGNLHIINMEKPYLTFME

LAEKYGSVFSFHLGTEKVVVLCGTDAVRDALINHAEEFSGRPKVAIFDQIFKGH

GIIFADGENWKVMRRFSLSTLRDFGMGKKTIEEKISEESDCLVETFKSH

GGKPFDNTMIMNAAVANIIVALLLSQRFDYQDPTLLKLVKSINKIVRITGSSMVM

LYNTFPSIMQWIPGSHQNVVKNAEKIYTFLIETFTKHRHQLDVNDQRDLIDTFLIKQQE

EKSSSTKFFHDENLKVLLLNLFGAGMETTSTTLRWGILLMMKYPEVQ

KKVQDEIDRVIGSAEPRLEHQKQMPYTDAVIHEIQRFADLVPNNVPHATTKDVTFRGYFIPK

GTHVIPLLTSVLKDKDYFKKPNEFYPEHFLDSEGHFVKNEAFLPFSA

GRRICAGETLAKMELFLFFTNLLQNFTFQPPPGVEVQLTRGVAITSIPTEHKICALPRS

 

>4055_prot scaffold_996:168592-176929

14029_prot scaffold_996:176841-181757 join these two

89% to 49367_prot scaffold_996:195103-207745

172383 MDLVSVLLSVVICIFLYKVFYGGEKESQNFPPGPKPLPIIGNFHMINMKKPHLTFME 172553

172634 LAKKYGSVFSIQLGPEKLVVVCGADAVKDALVNHADEFSARPTIPVFDKTSKGH 172795

174055 GVFFANGENWKVMRRFTLSTLRDFGMGKKTIEDRICEESDFLMETFKSYK 174204

174922 GKPFDNTMIMNAAVANIIVHILLNHRFDYQDPTLLKLINIVSENISIAAKPIVL 175080

176775 LYNAYPSIMEWVPGTHKSVAENMLKLYNFLRETFTQHRDQLDVNDQRDLIDVFLVKQQE 176951

177972 EKPSSTKFFNDQNLTVLLADLFGAGMETTSTTLRWGLLFIMKYPDIQ 178112

179041 KKVQDEIDKVIGSAQPRLEHRKKMPYTDAVIHEIQRLGNLAPNVGHETTTDVTFRGYFIPK 179223

180149 GTQVIILLTSVLQDKDYFKKPEEFYPEHFLDSEGNFVKNEAFLPFSA 180289

181578 GRRICVGETLAKMELFLFFTKLLQNFTFQPPPGVEVDLTCADAITSKPLEHQICALPRS* 181757

 

>49367_prot scaffold_996:195103-207745

80% to 49362_prot scaffold_996:115436-122968

MDLVSVLLAVVICFFIFKVFYGGKNAFQNFPPGPKPLPIIGNFHMINMKKPYLTFME

LAEKYGPVFSIQLGTEKVVVLYGADAVKDALINHGDEFSGRPTIPVFDRISKGH

GLFFANGENWKVMRRFTLSTLRDFGMGKKTIEDRICEESDFLMETFKSYK

GKPFDNTMIMNAAVANIIVHILLNHRFDYQDPTLLKLINTISENVRIAGKPMVV

LYNAYPSIMQWFPGIHKSVAESILQFYDFLRETFTQHRDQLDVNDQRDLIDVFLVKQQE

EKSSSTKFFNDHNLTALVADLFGAGMETTSTTLRWGLLFMMKYPDIQ

KKVQDEIDRVIGSAQPRLEHRKTMPYTDAVIHEIQRLGNLAPFIGHETTTDVTFRGYFIPK

GTQAIVLLASVLQDKDYFKKPEEFYPEHFLDSEGNFVKNEAFLPFSA

GRRMCVGETLAKMELFLFFTKLLQNFTFQPPPGVEVDLTCGDAVTSKPLDHQICALPRS

 

>49368_prot scaffold_996:213773-226056, 81% to 21809_prot

MFPLEPTTLFVAIVLCLFLIYLLLHNGKGTPPNFPPGPKPLPFIGNLHIM

NLNKPHKTYMELGNKYGSVFSVQLGTEKVVVLCGYDAVKDALINHAEEFS

ERAVSTLSRKRLKGYGIIFSHGENWKVMRRFTLATLRDFGMGKRTTEDTI

NEECNFLMETFKSYKGEPFETNLIMNAAVANIIVSILLGHRFEYQDPTLL

KLIGLVNEIVKLSGRPIIMIYDAFPSVVSWLPGSHQKVLENTRGLRNFIK

ETFTEHKARLDINDQRDLIDVFLVKQREEKPNPGLFFHNENLISLVSNLF

VAGMETTSTTLRWGLLLMMKYPEIQKKVQNEIDKVIGSAQPQMEHRKQMP

YTDAVIHEIQRFADIVPTNLPHATTMDVTFRGYLIPKGTRVIPLLTSVLR

DKAYFEKPYEFYPEHFLDSEGNFVKNEAFIPFSAGKRICAGETLAKMELF

LFFTNLLQNFTFRSPPGQDLPLTTAEGFTSIPMVHKICAVSRA

 

>49369_prot scaffold_996:232793-245538 missing exon 4, 78% to $$$$$4

MLVADPMTILLSAFICLLLGFVLVGNKRHIYRKFPPGPRALPFIGNIQMIYVKQPYKTLLE

LSKTYGSIFSIQVGTEKMVVLCGYDTVKDALLNYPDDFADRPALPLIDDLAKRH

GVFFSNGENWRVMRRFALSALKDFGMGKKRMEKTIIEECDHLVQKFNSYGGQYH

Seq gap at repeat seq.

LYHTYPSIMRWVPGCHKTVYKNGRELYHFLKETFSKHKADLDINNQRNLIDAFLSKQQK

EKSKPDGFFHDDNLTTLLFDLFTAGMETIANTLRWAILLMMKYPEVQ

KKVQDEIEKVIGSAEPRVEHRKNMPYTDAVIHEIQRFANITPMNCPYATSQDVTFKGYFLPK

GTQVIPLLASVLQDEAYFEKPEEFYPQHFLDSEGHFVKNEASIPFSA

GRRSCAGENLARMELFLFFTSLLQNFTFQAPPGEELDLTPDVGLSTPPMQHTTCALSRACS

 

A flanking gene on scaffold 996 is MUT (methylmalonyl CoA mutase), which is on human chr 6p21,

just like the rhag gene at the other end of this cluster. There are no P450s at

this location on human chr 6.

 

>NM_001015757.1 47% to 2K6 zebrafish

from refseq database

MDFTFSLATYLVLVVTVLYILSNWKRKALNNFPPGPKGWPLVGNVFSIDLKKPQRTYIE

LSKKYGPVFSVQMGRKKMVILVGYETVKDALVTHAEEFGGRAYIPVTKDLEKGL

GMIFSNGENWKAMRRFTITTLKDFGMGKSTIEETIAHECSYLVQYFASFK

GKPFDNSTILITSVANIIVAILLGHRMEYEDPVFLRLVNLNSEYVKLLGSPMVT

IYNMFPALGFLPGCHKTVKKNLKELYAFLKRTFVEYQKNFDIHDQRSFIDVFLARQKEE

AKHPETYSYFHNENLVRLVRNLFSAGMETTSTALRWALLLMIKYPDIQ

EKVHDEIARVIGSAHPTYSHRTQMPFTNAVIHEMLRFADIVPLSVPHETTRDVHFKGYFIPK

GTYIIPLLTSVLKDKTQFDAPEQFNPNHFLDSEGNFLKKEAFMPFSA

GRRACPGEILARMELFIFFTSLLQKFSFRPPPGVTNINLSSDVGFTSVPLEGMICAIPRA

 

>CYP3A81

NM_001015786.1 59% to 3A5

from refseq database

MNLIPHLSTGTWILLAALLVLILLYGIWPYGYFKKMGIPGPTPL

PFIGTFLEFRKGMVQFDTECFKKYGKMWGTYDGRQPVLAIMDPAIIKTILVKECYTNF

TNRRNFGLNGPFESAITIAEDEQWKRIRNVLSPTFTSGKLKEMFQIMKDYSDILVKNI

QGYVEKDEPCATKDVIGAYSMDVITSTSFSVNIDSLNKPSDPFVIHMKKLLKTGLLSP

LLILVVIFPFLRPILEGLNLNFVPKDFTEFFMNAVTSFREKRKKGDHSGRVDLLQLMM

DSRTTGGNDLSNKHKALTDAEIMAQSVIFIVAGYETTSTALSYLFYNLATHPDVQQRL

HEEIDSFLPDKASPTYDILMQMEYLDMVIQETLRLFPPAGRLERVSKQNVEINGVSIP

KGIVTLIPAYVLQRDPEYWPEPEEFRPERFSKENRATHTPFTFLPFGDGPRNCIGLRF

ALLSMKVAIVTLLQNFSVRPCAETLIPMEFSTIGFLQPKKPIVLKFLSRAAAHE

 

>CYP3A82

BX779027 CX415432.1 93% to 3A81 N. Abdletawab

CX415433 = mate pair

Trace archive 243619963 419527443 418746304

476067541 (exon 1)

416306382 joint before I-helix

MNLIPHLSTGTWILLAVLLVLILL (2)

YGIWPYGLFKKMGIPGPTPLPFIGTFLEFRKGMIQFDIECFKKYGR

MWGMYDGRQPVLAIMDPAIIKTILVKECYTNFTNRRNFGLNGPLESAITVAEDEQWKRIR

SVLSPTFTSGKLKEMFQIMKDYSDILVKNVQVSVDKDEPCATKDVIGAYSMDVITSTSFS

VNIDSLQNPSNPFVIHIKNLLKTGFLSPVIIFAVIFPFLRPIFEVLNISFFPKDFTQFFM

NAVTSFREKRKKGDHS (0)

GRVDLLQLMVDSGTTEGNDSSNQHKALTDAEIMAQSLIFIFAGYETTSTALSYLF
YNLATHPDVQQKLHEEIDSFLPDKASPTYDILMQMEYLDMVIQETLRLFPPAGRLERVSK
QNVEINGVIIPKGTVAMIPAYVLQRDPEYWPEPEEFRPERFSKENRATQTPFTFLPFGDG
PRNCIGLRFALLSMKVAIATLLQNFSVRPCAETLIPMEFSTIGVLQPKKPIVLKFLSRAAAHE*
 

>CYP3A83 CX982440.1 DT453227 G.Vasser

60% to 3A80 chicken, 56% to 3A81 57% to 3A82

MTFLPDFSMATWTLLVLLLTLLAYYAIWPYKLFKRYGIPGPTPIPFIGTFL

GNRHGLMEFDMECFKKYGKVWGIYEGQKPLLAIVDPVIIKSIMVKECYTNFTNRRDFGLS

GPLKSSVLISKDEQWKRIRTVLSPTFTSGKLKQMFPLMNHYGELLVKNIHKKINNKEPLD

MKHIFGSYSMDIVLSTSFSVNVDSMNNPNDPFVTNARNLFTFSFFNPLFLISILCPFLVP

LLDKMNFCFLSLKILNFFKDAVASIKKKRQKGTH

EDRVDFLQLMVDAQSNEGKSVPEEEKHGYKE

LSDTEILAQSLIFIMAGYETTSTTLMFLAYNIATHPDVQRKLEEEIDALLPNKAPPTYDA

LMKMEYMDMVINESLRMFPPAIRIDRVCKKTMEINGVTIPAGVVIVVPLFALHLNPEIWP

EPEEFQPERFSKENQKNQDPYNFLPFGVGPRNCIGMRFALVNMKVALTILLQNFRLETCK

DTPVPLKICTKGYLKPTKPIILNLIPKVGQTVEE*
 

>CYP4T8

DR860523.1 CX967683.1 DT412234.1 DR834559.1

best match to 4B1 in ESTdb X.tropicalis

A.Bolen, K. Iyer, E. Mahrous, S. Aggarwal

89% to CYP4T4 of Xenopus laevis

best match in human = 4B1 55%, probably a 4B1 ortholog

MWNTLVWQQVA

ALLCLLAVLLKATQLYLSQKRQENIFKQFPGPPRHWLLGNVDQIRRDGKDLDLLVNWTHK

NGGAFPVWFGNFSSFLFLTHPDYAKVIFGREEPKSSISYDFLVPWIGKGLLVLTGPKWFQ

HRRLLTPGFHYDVLKPYVNLISKCTTDMLDNWEKLITKQKTVELFQHLSLMTLDSIMKCA

FSYDSNCQKDSNNAYIKAVFDLS

YVANLRLRCFPYHNDTVFYLSPHGYRFRKACRITHEHTDKVIQQRKESMKLEKELEKIQQ

KRHLDFLDILLFARDEKGHGLSDEDLRAEVDTFMFEGHDTTASGISWILYCMAKYPEHQQ

KCREEIKEVLGDRQIMEWEDLGKIPYTNMCIKESLRMYPPVPGVARQLRNPVTFFDGRSV

PAGTVIGLSIYAIHKNPAVWEDPEVFNPLRFSPENSANRHSHAFLPFAAGPRNCIGQNFA

MNEMKVAVALTLNRFHLAPDLENPPIRIPQLVLKSKNGIHVHLTKVQ*

 

>CYP4T9

NM_001017348.2 55% to 4B1 human, 73% to DR860523.1 C. Blackwell

79% to CYP4T3 of Xenopus laevis

from refseq database

MASTLWKALSSPWLSVNIYQIGQFVALLCVVLLLLKAYALYSRG

RRFAAALVPFPGPPAHWLYGHVNQFRRDGKDLDRLMVWVNKYPNAFPLWIGKFFGTLI

ITDPDYAKVVFGRSDPKTSTGYNFLVPWIGKGLLILSGNTWFQHRRLITPGFHYDVLK

PYVSLISDSTKIMLDELDVYSNKDESVELFQHVSLMTLDSIMKCAFSYHSNCQTDKDN

DYIQAVYDLSWLTQQRIRTFPYHSNLIYFLSPHGFRFRKACRIVHLHTDKVIGQRKKL

LESKEELEKVQKKRHLDFLDILLCSKDENGQGLSHEDLRAEVDTFMFEGHDTTSSGIS

WILYCMATHPEHQQKCREEISEALGERQTMEWDDLNRMPYTTMCIKESLRLYPPVPSV

SRELAKPITFHDGRSLPAGMLVSLQIYAIHRNPNVWKDPEIFDPLRFSPENSSKRHSH

AFVPFAAGPRNCIGQNFAMNEMKVAVALTLKRFELSPDLSKPPLKQPQLVLRSKNGIH

VYLKKAS

 

>CYP4F.a2 AL648475.2, CF786466.1, AL896688.2 S.Hill, N.Liao

NM_001016020.2 62% to 4F22, 1aa different to NM_001015770

90% to NM_001015810, same as DT404766.1

from refseq database

MLPFLDHFLDSLNMSRTSFRVYIFAAVILMFCLIMCRTIFKMAI

YIYAYIINARRLRCFPEPPRRSWLLGHLGMFMPTEEGLTEISSAICNLRRTLLTWLGP

IPEVSLVHPDTVKPVVAASAAIAPKDELFYGFLRPWLGDGLLLSRGEKWGQHRRLLTP

AFHFDILKNYVKIFNQSTDIMLAKWRRLTAEGPVSLDMFEHVSLMTLDTLLKCTFSYD

SDCQEKPSDYISAIYELSSLVVKREHYLPHHFDFIYNLSSNGRKFRQACKTVHEFTAG

VVQQRKKALQEKGMEEWIKSKQGKTKDFIDILLLSKNEDGSQLSDEDMRAEVDTFMFE

GHDTTASGLSWILYNLACHPEYQEKCRKEITELLEGKDIKHLEWDELSKLPFTTMCIK

ESLRLHPPVVAVIRRCTEDIKLPKGDILPKGNCCIINIFGIHHNPDVWPNPQVYDPYR

FDPENLQERSSYAFVPFSAGPRNCIGQNFAMAEMKIVLALILYNFQVRLDETKTVRRK

PELILRAENGLWLQVEELKR

 

>CYP4F.a3 CX980073.1, CX968437.1, CF589328.1

NM_001015770.1 62% to 4F22

from refseq database

note there are two seqs that differ at only 2 nucleotides

but there are three ESTs that support each sequence so these

are probably alleles.

MLPFLDHFLDSLNMSRTSFRVYIFAAVILMFCLIMCRTIFKMAT

YIYAYIINARRLRCFPEPPRRSWLLGHLGMFMPTEEGLTEISSAICNLRRTLLTWLGP

IPEVSLVHPDTVKPVVAASAAIAPKDELFYGFLRPWLGDGLLLSRGEKWGQHRRLLTP

AFHFDILKNYVKIFNQSTDIMLAKWRRLTAEGPVSLDMFEHVSLMTLDTLLKCTFSYD

SDCQEKPSDYISAIYELSSLVVKREHYLPHHFDFIYNLSSNGRKFRQACKTVHEFTAG

VVQQRKKALQEKGMEEWIKSKQGKTKDFIDILLLSKNEDGSQLSDEDMRAEVDTFMFE

GHDTTASGLSWILYNLACHPEYQEKCRKEITELLEGKDIKHLEWDELSKLPFTTMCIK

ESLRLHPPVVAVIRRCTEDIKLPKGDILPKGNCCIINIFGIHHNPDVWPNPQVYDPYR

FDPENLQERSSYAFVPFSAGPRNCIGQNFAMAEMKIVLALILYNFQVRLDETKTVRRK

PELILRAENGLWLQVEELKR
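
The CYP4F.a2 and CYP4F.a3 entries are described as differing by one amino acid. A minimal sketch of how two equal-length sequences can be compared position by position is given here. The function name differences is hypothetical, and only the first line of each entry is used, since the remaining lines are identical in this file.

```python
def differences(seq_a, seq_b):
    """Return (1-based position, residue_a, residue_b) for each mismatch."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be the same length")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b
    ]


# First line of each entry as given in this file.
a2_first = "MLPFLDHFLDSLNMSRTSFRVYIFAAVILMFCLIMCRTIFKMAI"
a3_first = "MLPFLDHFLDSLNMSRTSFRVYIFAAVILMFCLIMCRTIFKMAT"

print(differences(a2_first, a3_first))  # [(44, 'I', 'T')]
```

This only works for sequences without insertions or deletions; sequences of different length would need a real alignment first.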

 

>CYP4F.b1

NM_001015810.1 64% to 4F22

from refseq database

MLPSLDHFLDSLNMSRSSFRVYIFAAVILLFCLIMFRTILKMAI

YIYAYIINARRLRCFPEPPRRSWLLGHLGLFMPTEEGLTEVSNTISNFRKSFLTWMGP

ISLVSMVHPDTIKPMVAASAAIAPKDELFYGFLRPWLGDGLLLSRGEKWGRQRRLLTP

AFHFDILKNYVKIFNQSTDIMLAKWRRLAAVGPVSLDMFEHVSLMTLDTLLKCTFSYD

SDCQEKPSDYIAAIYELSSLVVKREHYLPHHFDFIYNLSSNGRKFHQACKTVHEFTAG

VVQQRKKALQEKGIEEWIKSKQGKTKDFIDILLLSKDEDGNQLSDEDMRAEVDTFMFE

GHDTTASGLSWILYNLACHPEYQEKCRKEITELLEGKDTKHLEWDELSQLPFTTMCIK

ESLRLHPPVTAVSRRCTEDIKLPDGKVIPKGNSCLISIYGTHHNPDVWPNPQVYDPYR

FDPEKLQERSSHAFVPFSAGPRNCIGQNFAMAEMKIVLALTLYNFYMRLDETKTVRRK

PELILRAENGLWLQVEELKQ

 

>CYP4V

CX972921.1 AL876066.2 CR409635.1 DT527908.1

best hit to CYP4V2 in ESTdb X.tropicalis Xin Liu

best match in human = 4V2 62%, a 4V ortholog

MELGGEVHLLVWVAAAVVLLTLLALSILPALQDYVRKRRILKPIPGPGPNYPLIGDALFLKN

NGGDFFLQICEYTESYRLQPLLKVWIGTIPFIVVYHADTVEPVLSSSKHMDKAFLYKFLHPW

LGKGLLTSTGEKWRSRRKMITPTFHFAILSEFLEVMNEQSKILVEKLQTHVDGESFDCFM

DVTLCALDIISETAMGRKIQAQSNRDSEYVQAIYKMSDIIQRRQKMPWLWLDFLYAHLRD

GKEHDKNLKILHSFTDKAILERAEELKKMGEQKKEHCDSDPESDKPKKRSAFLDMLLMAT

DDAGNKMSYMDIREEVDTFMFEGHDTTAAALNWSLFLLGSHPEAQRQVHKELDEVFGKSD

RPVTMDDLKKLRYLEAVIKESLRIYPSVPLFGRTVTEDCSIRGFHVPKGVN

VVIIPYALHRDPEYFPEPEEFRPERFFPENASGRNPYAYIPFSAGLRNCIGQRFALMEEK

VVLSSILRNYWVEASQKREELCLLGELILRPQDGMWIKLKNRETAPTA*

 

>CX843268.1 CX920105.1 CX846214.1 54% to CYP5A1 human, Ramy Naguib Attia

missing last exon in seq gap, not in ESTdb or UCSC genome browser

Trace archive 414884422 418775930

scaffold_1778:46661-92162

MGESWLWGLDGCTVTLTLVAGFLGLLYWYSVSAFWQLEKAGIKHPKPLPFIGNI

MLFQKGFWEGDRHLLKTYGPICGYYMGRRPMIVIAEPDAIKQVLQKDFVNFTNRMRLNLV

TKPMSDSLLCLRDDKWKRVRSVLTPSFSAARMKEMCPLINQCCDVLVENLMEYASSGEAC

NVQRCYACFTMDVVASVAFGTQVDSQRDSDHPLVQNCKRFLELFTPFKPVVLLCLAFPSI

MIPIARRLPNKHRDRINSFFLKVIRDIIAFRENQPPNERRRDFLQLMLDAR

DSAGHVSVDHFDIVNQADLSVPQNQDRGQDPPRKSTQKTLNEEEILGQAFIFLIAGYETT

CSLLSFASYLLATHPDCQEKLLKEVDEFSQEHEEADYNTVHDLPYMEMVINETLRMYPPA

YRFAREAARDCTVMGLGIPAGAVVEIPIGCLQNDPRFWHEPEKFNPER (2)

FTAEEKQKRHPFLFLPFGAGPRSCIGMRLALLEAKITLYRVLRKFRFQTCDLTQ (0)

(MISSING last exon)

 

>BC060001.1 Xenopus laevis CYP5A1 for comparison, 92% to X. tropicalis

MEEPTLWGLDGCTVTFALVAGFLGLLYWYSVSAFSQLDKVGIKH

PKPLPFIGNVMLFKKGFWEGDRHLIKTYGPICGYYMGRRPMIVIAEPDAIKQVLQKDF

VNFTNRMKLNLVTKPMSDSLLCLRDDKWKRVRSVMTPSFSAIRMKEMCPLINQCCDVL

VDNLLEYASSGEACNVQRCYACYTMDVVASVAFGTQVDSQRDPDHPLVQNCKRFLELF

TPFKPLILLCLAFPSIMIPIARRLPNKQRDRINSFFLKVIRDIIAFRENQPPNERRRD

FLQLMLDAQDSVSHVTVDHFDIVNQADLSVPQNSPSEKQDRGQDPPRKSSKKLNKEEI

LGQAFIFLIAGYETTCSLLSFTSYLLATHPDCQEKLLKEVDEFSQEHKEADYNTVHDL

PYMDMVINETLRMYPPAYRFAREAARDCTVMGQNIPAGAVVEIPIGCLQNDPRFWHEP

EKFNPERFTAEEKQKRHPFLFLPFGAGPRSCIGMRLALLEAKITLYRVLQKFRFQTCD

LTQ IPLQLSAMSTLRPKDGVYVTVVAR*

 

>CYP7A1

5102_prot 64% to human 7A1 scaffold_8:3335708-3339166 UCSC browser

MLTVSLIWGLVVALCCFFWLIVGIRRRQPGEPPLENGLIPYLGCALQFGA

NPLEFLRVRQNKFGNVFTCKIAGQFVHFVTDPFSFNSVMRHGRHFDWQKF

HFATSAKAFGHSNIDSSDSEVTQNVHDSFLKTLQGDALDPLISNMMENLQ

HTMLQNSSYKVNSKDWVTEGLYAFCYRVMFEAGYLTLFGKEFNSPEDKNL

ARQEAQRALILNAIENFKEFDKIFPALVAGLPIHVFKSAYSARENLAKDL

LHENLRKRNNISELISLRMFLNESMTSLNDMEKAKTHLALLWASQANTLP

ATFWSVFYLLRCPHAMKASTEEVQRVLEKASQKVNCDGRYIFLNRHELDD

MPVLDSIIKEAMRLSSASLNIRVAKENFVLHIDDKQAFNIRKDDIVALYP

QMVHLNPDIYEDPNNFKYDRYLGEDGKEKTSFFLNGRKLKYYYMPFGSGK

TKCPGRQFAVHEIKQMLTLIICYFDMELVDKNIRSPPLDQSRAGLGILQP

THDVDFRYKLKAH

 

>CYP7B1

CX908537.1 CX805060.1 CX814354.1 BX713865.1 CN080250.1

52% to 7B1 Y. Peng

MLDTLLYTVTGLVLGGLLLLLLLPRRQ

RREGEPPLENGWIPFIGLAYEFHKNALEFLISRQQKYGDIFTVHVAGKYITFIMDSTQFQ

YVIKHGKQLDFHEFANSLSSRTFDHPRLTEAKFPHLNDKLHRIYKIMQGRALDKLTDSMM

GNLQRVFKWKFSQATDWKAEKMYQFCCCIMFEASFMTLYGRDPIADGHKVISEIREK

FTKFDAKFPYLVINIPIALLGATKKIREELIHFFFPNKMEKRSEISEVVQERKNVLEQYE
LQDYDRAAHHFAFLWASVGNTIPATFWAMYYLVRHPEALAAVRDEIDHLLQSTGQKKGP
EYDIHITREQLDSMVLLGSAIKESFRLCAASMNIRLVQEDFDLELEGNQTIRLRKDDFIA
LYPPALHMDPEIYEDPERYKYDRFVENGKEKILFYKKGKKLKEYLMPFGSGTSKCPGRFF

AMNEIKQFLAVLLIYVEMELVEHKALGHDNRRSGLGILLPNSDIMFRFKPRTLDL*

 

>CYP8A1

DT406915.1 DN063760.1 best hit to CYP8A1 in ESTdb X.tropicalis H.Penmatsa

best match in human = 8A1 55%, an 8A1 ortholog
cannot extend in ESTdb

29605_prot model missing internal exon(s) UCSC browser

MVWAGVFSLLLLILCIGFCYYRFLHRTR (2)

QPNEPPLDRGSIPWLGYALEFGKDAAKFLSDMKEKHGDIFT (0)

IQVAGKFITVLLDPHSYDAVFWAPSNQLDFGKYARMLMDRMFDVRLPPSGGNEEKTLLAS (2)

HFQGSNLTKLTRSMFHNLSTILLKDRRLPNTEWTDQGLFDFIYGVMLR (2)

AGYLTLFGTESEQYTSTYSPMRDLKHSEDVYKEFRKLDWLLMKAARNTLST (1)

GEKEEASLVKNRLQKLISIKSHKGKCCKSSWFEYYQQHLEEIQATEDMQSRALVLQLWATQ (0)

GNAGPATFWLVLYLLKHPEAMAAVQAEFESIFQRNLQEKRHIEEMNQDLLDKMIIL (1)

DNVLNETLRLTAAPFISREVLTDMTLKLADGHQYQLRRGDRLCLFPFVSPQMDPEVHQQPE (0)

VFQHNRFLNADGTEKTEFYKKGKRLKYYNLPWGAGSNVCVGKKHAVNSIKQ (2)

FVCLLLFYFDFELKTPAEKIPEFNRTRYGFGLLQPEHDILFRYRRKA*

 

>CYP8B1

CX973251.1 best hit to CYP8B1 in ESTdb X.tropicalis M.Puljic

best match in human = 8B1 54%, an 8B1 ortholog
cannot extend in ESTdb
Trace archive 413359888 234313629 248374609 continues on CF343441.1
MALFLPIILALLVSVIGGLYLLGMFRKRRPDEPPLDKGTIPWLGYALDFRKNTSTFLQKMHKK

HGDIFTVQIAGYYFTFVMDPLSFGPIIKESKGNLDFEEFAKDLVLRVFGYQSFTNDHKML

EKSSTKHLMGDGLIVMTQAMMENLQNLMVHNIGSGKGEREWQQDGLFNYSYNIVFRAGYL

ALYGNEPAKNKGSKEKAKEFDRKHSDELFYEFRKYDQLFPRLAYAVLPPKDKIEAERLKR

LFWNMLSVKKTLQKENISGWIGEQHQ

QRAEQGLPEYMQDRFMFLLLWASQGNTGPASFWFLLYLLKHPEALKAVREEVEAVLKETG

QEVKPGGPLINLTRDMLMKTPVMDSAVEETLRLTAAPVLIRAVKQDMKIKMASGKDFSMR

KGDRVALFPYIAVQMDPEIHAEPEKFKYNRFLNEDGTKKTEFFKNGKKVKYYTMPWGAGS

TICPGRFFATNELKQFAFLMLTYFEFELVNPNEEIPSIDPNRWGFGTMQPTRDVQFRYRLRY*

 

>CYP11A1

53% to CYP11A1 human 62% to 11A1 Fugu

CX377330.2 CX940317, Q. Tran, S. Sarva

cannot extend in ESTdb

22200_prot scaffold_62:97483-110117 model short at N-term

propose a GC boundary at the end of exon 1 to preserve length

MLLLRRLPAVPSGLRMISHHSVVGAGPEMGTLSQVD

TPLPYNQMPGNWKRGWLELYRFWRKDGFHNIHYHMMENFQRFGPIYR (2)

EALGIYDSVFIQLPEDAATLFHVEGLH

PERLRVPPWYEYRDYRNRRYGVLLKKGEDWRSHRIALNREVLSMSAMSRF

LPLLDSVGQDFVHRAHIQVERSGRGKWTADLTNELFRFALESVCYVLYGQ

RLGLLQDYIDPESQQFIDSVSLMFNTTAPMLYLPPSLLRKINSSIWKDHV

RAWDAIFTHADRCIQQIYSSLRQQSDSTYSGVLSSLLLQDQMPLEDIKAS

VTELMAGGVDTTSMTLQWAMYELARTPSVQEKLRSEVIAARDASGKDLTA

LLKRIPLVKAALKETLRLHPVAITLQRYTQRDTVIRNYIIPQGTLVQVGL

YAMGRNPDIFALPQRFSPERWLGGGPTHFRGLGFGFGPRQCIGRRIAEIE

MQLFLIHILENFKIEINRMVDVGTTFNLILFPSKPIHLTLRPLK*

 

>54949_prot X. tropicalis predicted protein from UCSC browser

scaffold_1888:38981-41244

trace archive 483095886 413360250 242989364 mate = 242988788

584719101 584719005 234592190

50% to 11B2 human 49% to 11B1

scaffold_8238:179-3044 UCSC browser

387238891 419638572 mate = 419631756

MMAALVCGGTCSWDTVRGLRTKSTHFSTVQLAQDSQSLTSAKAQS

LPFKSIPCTGRNAWANLARYWKNNSFQQLHLVMEGHFQNLGPIYR (2)

ETLGTHSSVNIIHPQDVARLFQSEGVFPRRMGIEAWAAHRDLRNHKCGVFL (2)

LNGEDWRSDRLILNKEVLSLTGVKKFLPFLDEVANDFVSFLMRRINKNTRGTLTVDLYADLFRFTME (1)

ASGYVLYGQRLGLLEEHPNEDSLRFIRAVETMIKTTLPLVYLPHQLLRLTDSALWTQHMEAWDVIFQQ (1)

ADRCIQNIYQEFCLGQERGYSGIMAELLLQGELPLDSITANVTELMAGGVDT (0)

TAMPLLFTLFELARNPSVQQELRAEIKRAEGQCPKDMNQLLNSMPLLKGAIKETLR (2)

LYPVGITVQRYPMKDIVLQNYHIPAG (0)

TLVQVGLYPMGRSSELFQNPLRYDPTRWMRRDETNFKALAFGFGSRQCIGRRIAETEMMLFLMH (0)

VSIMNTDIKAQTYSGH*

 

>AF449175.1 Xenopus laevis steroid 11-beta-hydroxylase protein (CYP11B1)

FAFGPIYRENLGTHSSVNIIHPHDVARLFQSEGIFPRRMGIGVWAAHRDLRNHKCGVFL

LNGEEWRSDWLILNKEVLSLAGVKKFLPFLDEVANDFVSFLMRRINKNTRGTLTVDLYADLFRFTME

ASGYVLYGLRLGLLEEHPNEDSLRFIRAVETMIKTTLPLLYLPHQLLRLMDSSLWIQHMEAWDIIFQQ

TDRCIQNIYQEFCLGQERGYSGIMAELLLQGELPLDSIKANVTELMAGGVDT

TAMPLLFTLFELARNPITS

 

>CYP17

CX931022 DR883840.1 CX957932.1 50% to CYP17A1 human, Guo Zhu

MISYVAAAVLLAFGLALLSIWKFAGGKPRGAKYPNSLPCLPFIGSLLHLASHLAPHI

LFNKLQEKYGSLYSFKMGSHYIVIVNHHEHAKEVLLKKGKTFGGRPRAVTTDLLTRNAKD

IAFADYSPTWKFHRKLVHAALSMFGEGTVAIEKIISREAASLCQTLITFQGSPLDMAPEL

TRAVTNVVCALCFNARY

KRCDPEFEEMLAYSKGIVDTVAKDSLVDIFPWLQIFPNKDLEILKRSVAIRDKLLQKKLK

EHKEAFCGEEVNDLLDALLKAKLSMENNNSNISQEVGLTDDHLLMTVGDIFGAGVETTTT

VLKWAVAYLLHYPKVQAKIQEELDVKVGFGRHPVLSDRRILPYLDATISEVLRIRPVAPL

LIPHVALHESSIGEYTIPQDARVVINLWSLHHDPNEWXNPEEFIPDRFLDENGNHLYTPS

QSYLPFGAGIRVCLGEALAKMEIFLFLSWILQRFTLEVPAGDSLPDLDGKFGVVLQVKKFRVT

AKLREVWKNIDLTT*

 

>CYP19

CX885719.1 CX885718.2 DN051995.1

best hit to CYP19 in ESTdb X.tropicalis Z.Zhang

best match in human = CYP19 71%, CYP19 ortholog

5697_prot from UCSC browser scaffold_27:1527865-1546466

MEALNPVQYNSTEAVPTLAPATTVSLLLFIFLLIILWNQEETCLIPGPAYCMGLGP

LISYGRFLLTGIGKAANYYNNMYGEFVRVWINGEETLVISKASATFHIMKHSHYISRFGS

KLGLQCIGMNENGIIFNSNPSLWKVIRPFFIKALSGPGLMQTTEICIRSTKRYLDNLGNV

TNELGNVDVLKLMRLIMLDTSNNLFLRIPLD

ENEIVLKIQKYFDAWQALLLKPDIFFKISWLYKKYEKSANDLK

EAIEILIEQKRQKLSSSEKLDENMDFASELIFAQNHGDLTAENVNQCILE

MLIAAPDTMSVSLFFMLVLVAQHPKIEEGIMNEIDNVIGDRDVESNDIPN

LKVLENFIYESMRYQPVVDLVMRKALEDDMIDGYYVKKGTNIILNLGRMH

RIEYFPKPNEFTLENFEKTVPYRYFQPFGSGPRAC

AGKYIAMVMMKVILVTLFKRYKVQTLGGRCLENIQNNNDLSMHPDESQPCLEMIFIPKNTAELKQ*

 

>CYP20

DR851274.1 CX431022.2 CF240500.1 DT417160.1

best hit to CYP20 in ESTdb X.tropicalis Z.Zhang

best match in human = CYP20 72%, CYP20 ortholog
MLDFAIFAIT

FLLILVGAVLYLYPSSRQACGIPGLAPTEEKDGNLQDIVNSGSLHEFLVNLHERFGPVAS

FWFGRRLVVSLGSLDLLKQHINPNKTSDPFQMMLKSLLGYQSGVIGEAAESHVQKKLFEN

GIIKALHSNFSVVIKLSEELLAKWLTYPQSQHVPLCQHMLGFAMKSVTQTAMGSSFDDDQ

EVIHFRRNHDAIWSEIGKGFLDGSIERSPNRKKLYEDALMEMETVLKKAIKERKVKNPGR

HVFVDSLLQGNLSDKQVLEDSMIFSLAGCVITANLCTWAIYFLT

TSEEVQDKLFKEVTRVIGKGPITMDKLEQLSYCRQILCETVR

TASLTPISARLQELEGRVDQHIIPKETLVLYALGVVLQDNTAWPLAYRFDPDRFNDETAK

QSLTLLGFSGSQECPELRFAYMVAMVLLSVLVRKLHLLPVKGQVMETKYELVTSPKEEAW

ITVSKRS*

 

>CYP21 41% to human CYP21A2, 45% to Fugu CYP21, 46% to zebrafish CYP21

50686_prot scaffold_1026:150552-153289

50687_prot

MALLLLLLLFLVLLSLQWAKKYFLGFSNPHVHYPPCPPPLPFLGNLLHLA

HKDLPIHLLHLSRKYGSIYRLSFWGKDFVVLNNSNLIREALLKKWADFAG

RPKSYIGDLISLGGKDLSLGDYTPVWKVQRRLTHTSLQNCVRNDLENVLI

REARLLCQDCLNLNGEPVDISRSFSLRTCRIIAELTFGTTYDLSDPKFQE

IHKCIVNIIKLWESPSVTALDFIPFLQ (0)

KFPNQTLKLLMDTAKQRDSFIKSQVEAHKAHLPSSKCDEDILDGMMRFLL

EKSGDDSSGMSEFSEDHLHMAVVDLFIGGTETTASLLTWTVAYLMHYPEA

QDKIHQEIIGAVGMERYATYTDRNSLPYLNATVSEMLRLRPVVPLAVPHC

TIRDTSIAGYTIPKGTTVIPNIYAAHLDETIWDNPTQFYPENSHSSRALL

PFSVGARLCIGETLARMEVFFFLSHLLRDFRLLPPSPELLPELSGVFGIN

FKCRPFLVCISPRENTPKIQDLNNKT*

 

>CYP24A1

DT404490  67% to CYP24A1 human, Guo Zhu

Cannot extend in ESTdb

26514_prot scaffold_125:1595529-1618234 UCSC browser (errors in model)

MTSRIKRDFLGMLLKSRSISVQHSIPTATAVCDLKEKELPAPSSCPHSLA

ALPGPTKLPILGSLLDILRKGGLKRQHEALASYHKQFGKIFRMKLGSFDS

VHIGAPCLLEALYREESNYPKRLEIKPWKAYRDYRDEAYGLLILEGKDWQ

RVRSAFQQKLMKPTEVGKLDTKINEVLVDFMKRIDSVCDEDGT

IEDLYCELNKWSLESICLVLYEKRFGFLQPNLGEEAQNFITAIKM

MMSTFGLMMVTPVELHKSLNTKIWK

DHTHAWDSIFKTAKCHIDRRLTKLSSKGSEDFLCTIYNDSKLSKKEMYAT

ITEMLIGAVETTANSLLWAIFNISRNPHIQKKLLEEIESVLLPDQVPTAD

DIRNMPYLKACLKESMRITPSIPFTTRTLDKETVLGDYVLPKGTVLTINS

HVLGSNQECFDNWNQFRPERWLQQKNTINPFAHVPFGIGKRMCIGRRLAE

LQLQLTLCW (0)

LIRKYEIVATDNDPVETLHLGTLMPSRELPVAFHRR*

 

>CYP26A1

NM_001016147.2 (seq gap) CX456558.2 CX423895.2 CX424757.1

69% to human CYP26A1, L. Zhu

MDLYTLLTSALCTLALPVLLLLTAAKLWEVYCLSRKDASCRNPL

PPGTMGLPFFGETLQMVLQRRKFLQVKRRKYGRIYKTHLFGSPTVRVTGAENVRQILL

GEHKLVSVHWPASVRTILGAGCLSNLHDSEHKYTKKVIMQAFSREALANYVPLMEEEL

RRSVNLWLQSDSCVLVYPAIKRLMFRIAMRLLLGCDPQRLGREQEETLLEAFEEMTRN

LFSLPIDVPFSGLYRGLRARNIIHAQIEENIKEKLQREPDGQCRDALQLLIDHSRRTG

EPVNLQALKESATELLFGGHGTTASAATSLTTFLALHKDVLEK

VRKELESQGLLSNKPEEKKELSIEVLQQLKYTSCV

IKETLRLSPPVAGGFRVALKTFVLNGYQIPKGWNVIYSIADTHGEAELFPDKDE

FNPDRFLTPLPGDSSRFGFIPFGGGVRCCVGKEFAKILL

KVFIVELCRNCDWELLNGSPAMKTSPIICPVDNLPAKFKPFASSI

 

>CYP26B1

CX905408.1 CX388776.2 CX940676.2 82% to human 26B1 L. Chen

MIFQSFDLVSALATLAACLVSVALLLAVSQQLWQ

LRWAATRDKSCKLPIPKGSMGFPLVGETFHWILQGSDFQSSRREKYGNVFKTHLLGRPLI

RVTGAENVRKILMGEHHLVSTEWPRSTRMLLGPNSLANSIGDIHRHKRKVFSKIFSHEAL

ESYLPKIQLVIQDTLRVWSSNPESINVYCEAQKLTFRMAIRVLLGFR

LSDEELSQLFQVFQQFVENVFSLPVDVPFSGYRRGIRAREMLLKSLEKAIQEKLQNTQGK
DYADALDILIESGKEHGKELTMQELKDGTLELIFAAYATTASASTSLIMQLLKHPSVLEK
LREELRGNSILHNGCVCEGALRVETISSLHYLDCVIKEILRLFSPVSGGYRTVLQTFELD
GFQIPKGWSVLYSIRDTHDTAPVFKDVDVFDPDRFGQDRTEDKDGRFHYLPFGGGVRNCL
GKHLAKLFLKVLAIELASMSRFEL

ATRTFPKIMPVPVVHPADELKVRFFGLDSNQNEIMTETEAMLGATV*

 

>CYP26C1

CX830022.1 CN120927.1 CR567555.1 CX376643.2 CR567556.1

65% to human 26C1 L. Chen

note this seq has an insertion compared to human, but

the insertion is supported by several ESTs and is real

also seen in X. laevis (see below)

MFLLEISYTSFFEAALTSALSLVLLLAASHQLWS

LRWHSTRDRGSSLPLPKGSMGWPFFGETLHWLVQGSSFHSSRREKYGNIFKTHLLGKPVI
RVTGAENIRKILLGEHHLVSTQWPQSTQIILGSNTLSNSIGELHRQKRKMMSKVLSSAAL
ESYLPRIHEAVRWEVRSWCRGVGPVSMLSCAKALTFRIAARILLGLSLTDTQFQELTRTF
EQLVENLFCLPLDIPFSGLRKGMKARDTLHQYMEEAIKEKLSKRDPDACEDALDYLINSA

KEGGKEINMQELKESAIELIFAAFLTTASASTSLVLLLLKHPSAIHKIRQELASHGL

SEHCEQCLPATENPNNNILQDNGHQCLTAGCQLPLVMGTEGQVKTLWEQTKQLLTDRTDK

DPQNSLSSKNLVNGENRIQEAPCSHDKSNCSPVPGKLQNSVFEGT

CQQNISLEKLKSLHYLDC

VVKEVLRLLPPVSGGYRTALQTFELDGYQIPKGWSVMYSIRDTHETAAVYQNAEMFDPER

FSTERDEGKLGRFNYIPFGGGARSCIGKELAQIILKILAMELVTTAKWELATPSFPKM

QTVPVVHPVDGLQLSFSFLGSNDSDKAARNRSLANP*

 

>CYP26C1 BC111476 Xenopus laevis cDNA

MFLLEISYTSFFEATLTSVLSLVLLLAASHQLWSLRWHSTRDRG

STLPLPKGSMGWPFFGETLHWLVQGSSFHSSRREKYGNVFKTHLLGKPVIRVTGAENI

RKILLGEHSLVSTQWPQSTQMILGSNTLSNSIGELHRQKRKVMSKVLSSAALECYFPR

IQEAVRWEVRGWCRGVGPVSMFACAKALTFRIASRILLGLSLTDSQFHELARTFEQLV

ENLFSLPLDIPFSGLRKGIKARDTLHQYMEEAIKEKLTRRDPDACEDALDYLINSAKE

GGKEINMQELKESAIELLFAAFLTTASASTSLVLLLLKHPSAILKMRQELASHGFSKQ

CQCLPDMENPNNNILQDNGHRCLTAGCQLPLLMGTEGHLKTQGEQTEQLLTDKTDPQN

SLSSKNPLKGKNRIQEAPCSHDKSTCTPVPGKLQSPVSEGTSQQNSNLEKIKSLHYLE

CVVKEVLRLLPPVSGGYRTALQTFELDGYQIPKGWSVMYSIRDTHETAAVYQNAEMFD

PERFSSERDEGKLGKFNYIPFGGGVRSCIGKELAKVILKILAMELVTTAKWELATPSF

PKMQTVPVVHPVDGLQLSFSFLSSSDRDRAARNGSLA

 

>CYP27A1

DR832386.1 CX969640.1 DR852196.1

best hit to CYP27A1 in ESTdb X. tropicalis Xin Liu

best match in human = CYP27A1 55%, probably a CYP27A1 ortholog
cannot extend in ESTdb
Trace archive 570051728(+), walked to 411568263(+) 494948503(+)

MPSASKLGFLPLGRCRWLLHTGRGVSVSQGRAVAGAAVGAVGEEKKMKTF

EDLPGPSLLTNIYWVFLRGYILYTHELQAIYKKNYGPMWKST

LGRYKTVNIADVDILETVLRQEGKYPMRSDMEVWKEHRRQRDLSLGPFTEEGHKWHTLRS

VLNKRMLKPAEAMLYTGVVNEVVTDFLVRLEEMRSETPSGDMVNDIPNALYRFAFEGISY

ILFETRIGCLEKQIPVETQRFIDSIGAMLKNSIFVTIFPPWTNNLLPYYKRYMDSWDNIF

AFGNKLINEKMKKIEARLERDEEVQGEYLTYLISSGKLTDKEIYGSVAELLLAGVDTTSN

TLSWALYHLAREPEIXNALYQEVIGVVPGQNIPTSEDISSMPLLRAVIKETLRL

 

>CYP27A like CX393015 DR862140.1 DN030991.1 49% to 27A1 human

scaffold_31:1,728,368-1,736,213 19854_prot UCSC browser

MIAQRLQTGA

QALLQQSCRASVQTVRKKATLGVSGATVVEGKTLKTLDDLPGPSPLKLLYWIFLRGYLFR

THELQVIFRKTYGPMWKMSDRQHAMVTVASPDLLESLLRKEGKYPTRADMFIMREHRDLR

GHSYGPVTEEGHQWHRIRTILNQRMLKPRETVVYAGSMNEVVSDLLLKIKELTAQSSSGT

QVNGVAELMYKFAFESICTVLFETRLGCLNKEILPETQKFIDSIGIMLEHLTMLTRLPQW

TKGILPYWGRYIEAWDTIFDFGKKLIDKKMEDIEGRLKRGEEVEGEYLTYLLSSGKLSME

EVYGSVVELLQAGVDTTSN

TLTWALYQLSRNPEIQNNLYQEVIRVIPGETIPDSEAIARMPLLKAVIKETLRL

FPVVPENARMINEKEVTIKDYVFPVKTQFILGHYAISRDETTFPEADRF

LPERWLRDSGMKHHPFGSIPFGYGVRACVGRRIAELEMHLALSRIIKMFQVIPDPDLGEV

GAKNRAVLVANRPVNLRFIERQPRPE*

 

>CYP27?

AL787054.2 DT421274.1 CR427561.1 Quynh Tran

50% to 27A1 human, 42% to 27B1 human and 38% to 27C1 human

Also BC094536.1

MRKGCHALLWKTCWANVQTGRE

KATLGVAGAVAEQEKKLKMSTDLPGPSTLNILYWVFLRGYVFESHKLQVIWKKRYGPLW

KTCIGSHRLVNVASPELLETLLRQEGKYPMRTDMFMWKEHRDLQDFSYGPLTEEGHRWHT

LRRVLNQRMLKPKEAVRYTESFNDVVTDLLVVIKEITAQSPNRTTVDGVANLMYKFAFES

ICTVLFETRIGCLKKEIPPETEKFINSIAIMLENQTRMEKLPRWTRGIFPYWRRFVEGWD

NIFIYGKKLIDKKMEEIEGRLKRGEEVEGEYLTYLLSSGKLSMEEICGSVAELLQAGVDTTSNT

LTWALYQLARNPEIQHNLHQEVIGVTPGDTIPDSEAIARMPLLRAVIKETLRLYPVVPEN

GRVVTEKDVILNDYIIPKNSQFVLCHYALSRDETQFPEPDRFLPERWLR

DSGMKHHPFSSIPFGYGVRACAGRRIAELEMHLALSRIIKMFQVVPDPELGEVGTKNRTV

LVSSRPINLQFIER*

 

>CYP27B1

NM_001006906.1 CYP27B1 54% to human

MAQTLKLGSSRSSQLFRGLQELWAETVLKNSEKVIKGHKSLADM

PGPSTVSFISDLFCRRGLARLHELQLEGKAKFGPVWKASFGPILTVHVAEPSLIEQVL

RQEGKHPIRSDLSSWKDYRQCRGHSYGLLTAEGEEWQQFRSILGKHMLKPKEVEAYSD

VLNDVVGDLIKKINYQRSQNQNNVVKDIAKEFYMFGLEGISSVLFESRIGCLEPTVPK

ETEKFIQSINTMFVMTLLTMAMPKFLHKIFRKPWQKFCESWDYMFAFAKGHIDKRMKD

VAQKLAQGEKVEGKYLTYYLAQEKIPMKSIYGNVTELLLAGVDTISSTLSWSLYELAQ

HPDIQSAVYSEVEEILQGKQIPSPSDVARMPLLKAVVKEVLRLYPVIPGNARVVADRD

IQVGDYIIPKKTLITLCHYATSRDENVFSNPNEFQPDRWLKKEDTHHPYASLPFGFGK

RSCIGRRIAELEVYLALARILSHFEVKPEQPGSLVMPMTRTLLVPEKEINLQFLER

 

>CYP27C1

NM_001011341.1 short at N-term (58 aa) CX494774.2 68% to human

from refseq database

MAALGQLLRGSARLEGLARSFHRFPGAQAAGQALEHEQAEGVLGATVKGSPMVKNLKE

MPGPSTMANLVEFFWRDGFGRIQEIQQKHARQYGRIFKSHFGPQ

FVVSIADKDMVAQVLRAERDAPQRANMESWHEYRELRGRSTGLISAEGEKWLNMRSVL

RQKILRPRDVAMYSGGVNEVVEDLVKRIRKLRVQESDGLTVTNVNDLYFKYSMEAIAT

ILYECRLGCLDDQIPQQTKEYIEALELMFSMFKTTMYAGAIPKWLRPLIPKPWREFCR

SWDGLFKFSQIHVDDRLRQIESQLEKGEEVQGGVLTHLLLSKELDLEEIYANMTEMLL

AGVDTTSFTLSWATYLLAKNPGIQEAVYQQIVQNFGKDQVPTAEDVPKMPLVRAVVKE

TLRLFPVLPGNGRVTQDDLVVGGYFIPKGTQLALCHYSTSYDAECFPAAEEFRPERWI

RSGNLERKENFGSIPFGYGIRSCIGRRVAELEMHLLLIQLLQNFEIKPSPQTTTVLPK

THGLLCPGGKINVRFVDRQ

 

>CYP39A1

CX851900.1 CX931743.2 CX956889.1

best hit to CYP39 in ESTdb X. tropicalis N. Liao

best match in human = CYP39 51%, probably a CYP39 ortholog

MDPIASVSSALLSPTAALGLLVALLTAVLVRYLLPNG

SQKPPYPPCIRGWIPWFGAAFDMGKAPLEFIARAREKHGPIFTVLAAGNRLTFLSGKEGI

SAFFSSKEADFQQAVQKPVQHTASINKEDFLKSHSSIHETIKLRLSQNRLHLYFDRIRNEFSTRIE

LLNPEGTEDLFALVKKVMYPAVADTLFGKGLCPTGKGKLEEFAEHFWKFDEGFEYGSQLP

EFLLRDWSQSKQWLLRLFKKIVIEAEMNNPLEETSKTLHQHLLDTLKGNSTYNNSLLLLW

ASQANANPVTFWTLGFIISDPLVYKAAMDEIHSVFGKAGNKELNMNEAELKRLPFIKTCV

LEAIRLRSPGAITRKAVQPLKINNYLVPAGDLLMLSPYWLHRDPTLFPEPEMFR

PERWSKANLEKNVFLEGFVAFGGGKYQCPGRWFALMEMHMLVVMMLYKYEFSLLDPLPKQ

SNLHLVGTQQPDGPCRVRYKLRK*

 

>CYP46A1

NM_001032346 CYP46 ortholog 54% to human CYP46, 82% to NM_001032348

from refseq database, note: this frog has two CYP46 genes

MGLWALIGWAALLLLALILICFLLFSGYIHYIHMKYDHIPGPPR

DSFFLGHSPTMLRLMKNNLLMYDHFLGWVQKYGPVVRINGLHRVIILVVSPEAVKELL

MSPKYSKDKFYDVIANMFGVRFMGKGLVTDRDYDHWHKQRRIMDPAFSRTYLMGLMGP

FNEKAEELMEKLMEKADGKCEIKMHDMLSRLTLDVIGKVAFGMELNSLNDDLTPFPKA

ISLVMKGIVEMRNPMVRYSLAKRGFIRKVQESIRLLRQTGKECIERRQKQIQDGEEIP

VDILTQILKGAAMEEECDPEILLDNFVTFFIAGQETTANQLSFVVMELGRNPEILEKA

QAEIDEVIGSKRDIEYEDLGKLQYLSQVLKETLRLYPTAPGTSRGLTEDMVIDGVKVP

ENVTIMLNSYIMGRMEQYYSDPLTFNPDRFSPDAPKPYYSYFPFSLGPRSCIGQVFSQ

MEAKVVMAKLLQRYEFELAEGQSFKILDTGTLRPLDGVICRLRPRTSKKAATLQ

 

>CYP46A4

NM_001032348.1 CYP46 again 53% to human CYP46

from refseq database, note: this frog has two CYP46 genes

zebrafish also has 2 CYP46 genes

MGLWALFGWASLLLLALTLICFLLFCGYIQYIHMKYDHIPGPPRD

GFIFGHSPTILRLMKNNKVVYDQYLDWVQ

YGPVVRINALHRVIVLITSPEGVK

EFLMSPKYSKNDIYDRVATLYGM

RFMGKGLVTDKDHDHWYKQRRIMDPAFSR

TYLMDLMGPFNEKAEELMERLSEQADGKSDTEMHNLFSRVTLDVIAK

VAFGMELNSLKDDLTPLPQAISLVMNGI

VETRNPMIKYSLAKRGFIRKVQESIRLLRQTGKECIERRQKQIQDGEEIP

MDILTQILKGAALEEDCDPETLLDNFVTFFIAGQETTANQLSFAVMELGRNPEILQKA

QKEIDEVIGSRRFIEHEDLSKLHYLSQVLKETLRLYPTAPGTSRGLKEEIVIEGVRIP

PNVNVMFNSYIMGRMEQNYTDPLTFNPDRFSPGAPKPYYTYFPFSLGPRSCIGQVFSQ

MEAKVVMAKLLQRYDFELAEGQSFSIFDTGSLRPLDGVICRLRPRTSNTATTNKYIF

 

>CYP46A5 CX981536.1 CX970619.1 CX970620.1 CX370643.2
S. Aggarwal 87% to 46A1 Xenopus trop. and 84% to 46A4
54% to 46A1 human
scaffold_588:627624-672691
MGLWAILGWAALLLLALILICFLLYCGYIHYIHMKYDHIPGPPRDR (2)
SFIFGHSTALLKLVNENLLMYDYFLDW (2)
VHKYGPVMRINGLHKVAVLVASPEGIK (0)
EFLMSPKYLKDEFYDFFGSLFGER (2)
LMGKGLLTDRDYDHWHKQRRIMDPAFSRT (2)
YLMGLMGPFNEKAEELMEKLSENSDRKCEVNMHDMFSKVTLDVIGK (0)

VGFGMELNSLNDDQTPFPRAISLVMKGSVEIRNPMIK (0)

YSLAKRGLIRKVQESIRLLRQTGKECIERRQKQIQDGEEIPVDILTQILRGA (1)

ALEKDCDPETLLDNFVTFFIA (1)
GQETTANQLSFAVMSLGRNPEILKK (2)
AQAEIDEVIGSKRDIEYEDLGKLSYLSQ (0)
VLKETLRLYPTAPGTSRTLENEIVIDEVRIPGNVTLM (0)
LNSYVMGRMEQYYKDPLMFNPDRFSPDAPK (2)
PYFTYFPFSLGPRNCIGQVFSQ (0)
MEAKVVMAKFLQRYEFELAEGQSFKILDTGTLRPLDGVICRLRSRTNNKKANK*

 

>CYP51A1

NM_001016194.2 80% to CYP51 human, CYP51 ortholog L. Zhu

MLLSLWEAGGTLLEEAVGGSLASRILIPCTFLLALAYVSKLAFK

HLQAEDPGNVKYPPFISSNIPFLGHAIAFGKSPISFLENAYDKYGPVFSFTMVGKTFT

YLVGSDAAALLFNSKNEDLNAEDVYSRLTTPVFGKGVAYDVPNPIFLEQKKMLKTGLN

IAHFKTHVQMIEEETQEYFERWGDSGVRNLFEALSELIILTASRCLHGKEIRSMLNER

VAQLYADLDGGFTHAAWLLPGWLPLPSFRRRDRAHREIKNIFYQVIQKRRNSAEREDD

MLQTLLDATYKDGTPLNDDEIAGMLIGLLLAGQHTSSTTSAWMGFFLAKNKSLQAQCF

AEQKAVCGEDLPPLNYDQLKDLQALDRCIKETLRLRPPIMTMMRMARTPQSVAGYNIP

PGHQVCVSPTVNHRLRDTWDKNTDFNPDRYLHDNPAAGEKFAYVPFGAGRHRCIGENF

AYVQIKTIWSTMLRMYEFELVDGYFPTINYTTMIHTPNNPVIRYKRRKN