Diatom P450s from Thalassiosira pseudonana
data from JGI (Joint Genome Institute)
There are 10 P450s in this genome. One is a clear CYP97E sequence
(CYP97E2) related to CYP97E1 from another diatom, Skeletonema costatum.
A second CYP97 (CYP97F1) is related to CYP97B3 from plants and to a
CYP97 from Chlamydomonas. One CYP51 is present (CYP51C1). The other 7
P450s are very different from known P450s and are placed in new
families CYP5018 through CYP5024. Only CYP97E2 has introns. The
others are intron free.
Last modified July 26, 2004
D. Nelson
>CYP51C1 scaffold 68 48% to 51G1 Arabidopsis
MYCNSPTRLTTNPQRIYKCNPSRLSLTHTQFFHKRLTFLIGPEAQEPFFKAPDEVLSQN
EVYGFMKPVFGPGIVYDASKKNRQVQFQSMANGLRTARLKGYTAKIERETRQYLESWGES
GELDLFHALSELTILTASRCLHGDDVRENLFKEVSELYHDLDQGLTPLTVFFPNAPTKSH
MKRNAARAKMVELFSKVIKNRRDNPDVQHSDGTDILSIFMDVKYKDGSNITDEQVTGLLI
ALLFAGQHTSCITSTWTSLFILNNPAILKRIIAEQNDVFGSQPDADVDYKMVNEDMPLLH
NSMKEALRLCPPLILLIRYALKDVKVKAAGKDYTIPKGDMVLISPSVGMRIPEVFKEPNT
FDPDRFGPDREEDKSSPFAYMGFGGGMHSCMGQNFAFVQVKTILSVLFREFELEMVSETM
PDIDYEAMVVGPKGDCRVRYKRRQ*
>CYP97E1 from the diatom Skeletonema costatum AF459441
This diatom mRNA may be short at the N-terminal end
RQAFLALSIGLLSVGGVNSFQAPVAGSRVVTAAPSITQLFSTLDKKEVVETKKQQSSTST
TTPLVDVSSSRVDVDVNVLD
MASYESDLLSTWDEDPSLQKGFDWEIEKLRRYFAGLRQTPDGRW
VRKSTLFEFLVTNSPSKVVGVGPDGERYESPPKPVNIFDVGVLVGKNTLTWLGFGPNL
GMAAVPDAVIQKYEGSFFTFIKGALGGDLQTLAGGPLFLLLAKYYTDHGPIFNLSFGP
KSFLVISDPVMARHILRDSSPEQYCKGMLAEILEPIMGDGLIPADPKIWKVRRRAVVP
GFHKKWLNSMIGLFGDCGDRLVDDLEKRSTSDKPVIDMEERFCSVTLDIIGKAVFNYD
FGSVTKESPIVKAVYRVLREAEHRSSSFIPYWNLPYAEKWMVGQVEFRKDMGMLDDIL
AKLINRAVETRQEATVEELEERETSDDPSLLRFLVDMRGEDLTSKVLRDDLMTMLIAG
HETTAAMLTWTMFGLVSNDPGMMKEIQAEVRTVMGNKSRPDYDDVVAMKKLRYALIEA
LRLYPEPPVLIRRARQEDTLPPGGTGLSGGVKVLRGTDIFISTWNLHRAPEYWENADK
YDPTRWERPFKNPGVKGWNGYDPEKQSSQSLYPNEITSDYAFLPFGAGKRKCIGDQFA
MLEASVTLSMIMNKFDFTLVGTPEDVGMKTGATIHTMNGLNMMVSPRSETNPIPGTNE
WWTKQHLMRGLSSTGRPYTSDEDAAWTTSANGMRP
>CYP97E2 52% to 97B3 from Arabidopsis from GDLQ to VVSRR
85% identical to Skeletonema costatum AF459441 over 654 aa
scaffold 12 170174 to 172672, 53% to CYP97F1 (the scaffold 23 sequence)
MCTKLSSRRTLLALYFAFTGCTAFQLPSATPSRASITKAYSTHLDKEIKSKTPLVNPSKIYT
QADIDTLDLSSYENELLAAWDTDSSLQRGFDWEIEKLRRNFAGLRQREDGQWVRKPSLFD
FLVTNTPSNVVGVSNTGERYESPPKPVNMLDVGLLITKNLLNTLGFGPSLGMAAVPDAVI
QKYEGSFFSFIKGVLGGDLQTLAGGPLFLLLAKYYQDYGPIFNLSFGPKSFLVISDPVMA
RHILRDSSPEQYCKGMLAEILEPIMGDGLIPADPKIWKVR (2?)
RRAVVPGFHKKWLNNMVTLFGDCGERLVNDLDARATAKTPVDMEERFCSV
TLDIIGKAVFNYDFGSVTKESPIVKAVYRVLREAEHRSSSFIPYWDLPYADKWMGGQVEF
RKDMGMLDDILTKLINRAIETRDEASVEELEDRDVGDDPSLLRFLADMRGEDLTSKVLRD
DLMTMLIAGHETTA (0)
AMLTWTVFGLVSNDSGLMKEIQAEVRTVMGDKLRPDYDDIAKMKKMRYALIEALRLYPEPPVLIRRARSEDN
LPAGGSGLSGGVKVLRGTDIFISTWNLHRAPEYWENPEKYDPTRWERRFKNPGVKGWNGY
DPEKQSESSLYPNEITADYAFLPFGAGKRKCIGDQFAMLEASVTL (0)
AMIINKFDFTLVGSPKDVGMKTGATIHTMNGLNMVVSRRSEDNPIPETNDYWIQQHLSRGLNVNGRPYSS
NEDAAWTASSRDKNEGVVSRLVN*
>CYP97F1 Scaffold 23 132652 to 134655 also on scaffold 784
52% to 97B3 Arabidopsis, 52% to 97E1 from another diatom
MMIHHSRVCV
LQLRIGQHNTARAKEKPNPNMKFTTALAVLCWTSVTNAFVPSSFTSPALKNEQQQVRASS
PLYALDTKEKEETTTATSASSTDTSSTPAAAATEESEGLPWWWEYIWKLPVMQPAEPGTD
IIFADSARVLRTNIEQIYGGFPSLDQCPLAEGEITDIADGTMFIGLQRYQQQYGSPYKLC
FGPKSFLVISDPVQAKHVLRDANTLYDKGILAEILKPIMGKGLIPADPETWSVRRRAIVP
AFHKAWLNHMVGLFGYCNEGLIASLEEAAKKNDAPNGQQGGKIEMEEKFCSVALDIIGLS
VFNYEFGSVSEESPVIKAVYSALVEAEHRSMTPAPYWDLPFANEVVPRLRKFNSDLKVLD
DVLTDLIDRAKNSRQVEDIEELEKRDYANVKDPSLLRFLVDMRGADIDNKQLRDDLMTML
IAGHETTAAVLTWALFELTKHPEQMAKVRAEIDSVLGDRTPTYDDIKEMQYLRLVVAETL
RLYPEPPLLIRRCRTENKLPKGGGREATVIRGMDIFLSLYNLHHDERFWPEPNEFKPERW
ESKYINPEVPEWAGYDPAKWINTNLYPNEVASDFAYLPFGGGARKCVGDEFATLEATVTL
AMLLRRFEFEFDSAKLAASKIDIMDHPEDLEHAVGMRTGATIHTRKGLHMVIRKREL*
>CYP5018A1 Scaffold 2 194927 to 196540 (minus strand) also on scaffold 853
MFDTMTAPSSPSSSRSALVTAAAASLV
SLSLLSIYKRRRTTSNNELPYPPTPPDRNYFLGHAMSLRRVPGEPKKSHDLLFLNWMNKL
NSKVVMFELPFLGRLFGLGRMICVGDAEIARHILVTANYNKSPTYSVLQPLIGMSSMVAT
EGKMWKDQRKLYNPGFSPEFLRNCVSTIIEKCNRFIVRCDGDVENGVATDMLARSIDLTS
DVIVQVAFGEDWGVDSKDKHGIETLQTIRDLTVAVGENMTNPLRKYFGLRSIWRTRRLSA
ALDQDMQNLVKRRLAQVLAGDADLEKDILSLTLSGVLEAKQESKSGAISLSKDEMERMTS
QLKTFYFAGHDTSSSAIAWAYWLLTKHPESLQRAREEVVSHLGRDWSDEALTGDSLCNTT
YQCLQKCEYLDAVARETLRLYPPAASTRWATDAKGANAGGFNLEKSVVHVNFYAIQRDPD
VWENPDSFVPERFLGEEGRKRILSYSFLPFSKGSRDCIGKYFALLEIKIALAALISRYDA
SVVNENEQYVIRLTSVPHDGCKVNLSRRRK*
>CYP5019A1 Scaffold 43 84821 to 86698 (minus strand)
MLSLCSHIFFHIVHVHLQPS
WLLGHYPLFVNPDKHHQTFTAHATPSGISALWGPSTDKFFSSVRADHCRSILRQSSSRNF
VSFIVRHGRRTLGEESIILINGGKRWKRQRKVIQKAFHLEVVKGGREAVGEVADVVVDWI
LRACSGRSDSGVHDGDCGGRLVGADGNEKVCVEAEDFFKLFALEVFGKVAMGYDFRCFPS
LATSDDNGNSNSNVALHNKKDVVSNGTHTYNDNACNCLQMPPDAQSFDFLNVDIGNRSTP
TSLMNPCMQFYSIPTPHNKKYHHHMDRIKGLVGKIIGLQLNRLCSDGGVMEGDTNMITHL
LQSTIEENFSPTDTDGNTGCPFSSSLPSKSIPDTVISNLTPSDKDQIIESVSKMLITFLM
AGYETTAISMSFVVYFLSKYKRCQERCAEEARRVLGRFGVHGTDIDDDELVYCRAVFMET
IRLHLPVMFTTRVTEKEMSFDTGLEEGHNVTIPKGTRCVVCPTVVHMDERNFERAEEFLP
ERWVRWERGRWVERDYETEGLKSTALPSITEDEQDSPPISAKYDEENNSASSISAADPHN
FFSFSDGARNCVGKRLAIMESTILIAVLLRDVCVDFAEEGFEMKKVRRFVTCGPESLPVVFWRRE*
>CYP5020A1 scaffold 88 80450 to 82390 (plus strand)
MITAHSPIIIHSSRRQCNRKRYTKSLSPTMEGSSSSSSSSAMLFRLSSFL
MTQQEDSLLSPPLFDNLIPSSFNNNSLLVLLTSITIILLL
IIIIKQLHSHHTRQPFHPKHNLPPSVPNPHPILGHHAYTAYTPTSPQSDHVFKHHANKQG
ISSFWFFSTPSISITRASHARSILRHSVERRHMSFFLRRFKRVMGKDSLIYVEKDQLWRS
HRSIVQRAFTAEAVMGMGRKVWRVADSFAGSILEESRKNSGADAGTGGYRKDALELFKWV
IIDVFGVVALGYDFQCTKSLKLAPLAQSFDYSIDDVQERTLASEMCNPFMQCYWLPTERN
RRYAFHNVRVQGLMSDICEKRKSTIDSEEVRRPAADSTSHMQLRTFSSSLASRGDDLLTI
LLKTPSPNNNQGNDNVDVMSTKEVTEMLLTLFYAGYDTSSTALSCAMYLLSTHKEIQTIC
ANEARAASISLQQDPSTWNEQLAYCKAVVMESLRLHSPVTLTSRTLETNVQLDECTSVPK
GTRVYIPIELIQTSECNFARATEFLPERWVRRDESTGLWVERDYKSEESQSFVSKDSSYI
PPANPHNIFAFSDGARNCVGRRLAMMESTMVLACMVRDLVVNVPEGFKLEMKKKFVMTTP
EEVPLIFTERCRDGRV*
>CYP5021A1 Scaffold 8 518667 to 520907 (plus strand) also on scaffold 849 and scaffold 3
MDNRPLSSLTPECTIDRY
PNATTSSFQSFGGAATCSGSSTASTTTHLLLSSILIGLLSPIISCLFVILLLMFFKRRQT
RHELEGGKCNLPTVVWRPRFMNYTSKDEGSDEDIEIDDYEAWAREYARSLQSDDGNNSGK
STHMKKLGSSAITNILPRMERLNGPYGMYATVYGVSTKVLHVAHPVPARAILTGSGVVDV
GGMNNGIGECFERQNSSFLGEISQSVTRPFKRLSSGMEGAAISPSEERKQRRRSSALRLL
TGSTKYPAYDHFKNFSGDGVFTADGSDWKAKRASVLHCLLRSGGADCMLEKEINRAADSF
EREVTWAKQTMNKEGDDKDGPVMNVVTMLQRSTIGLIYRIITHHNVEFSPDIDTNEQFIC
SPKSSAASLTSLDKNQHNGAKASEDDNHTKPDVKKDSQMKLLLPIYLDAVTKIRMIVLAQ
SRSIWYLLPRWAYRTFSPMYRDEERTMVPIRQFARLACENAVEGSPLELLSQRSSHASKE
GEATSAVSKDLLDEAITLLFAGQDTSAATLSWTLHLLSLHPQEQQKVVEEVRSVLSSLDE
GEMVSKNTISQLPYLDAVIKESMRLYPVAPFIVRKLTTDVTIPIESQSVEDDATTTTIPE
STFACIWIYALQRNPKLWTKPDEFIPERWIEPDLRSNDLGQQEVGSYMPFALGPRNCLGQ
PIAQVILRVLLARILNKYEVRDPKFDALQRLGEETGEAFDTKYLLKDMQAGFTVLPSNGL
RIKLVERC*
>CYP5022A1 Scaffold 45 115153 to 116619
MGEMTESTVVASLLFAYDVLNSYPFVPPFRSVYGMSILGDDELIICDPKVFDKYVVRAEDKYPIGGAEA
VTTFTDFYKENNLTKALEGTGHGPEWKAWRKSLDPDMYVAWETYLPTIADAANKISKVAG
TSNIEFVDFLSRSAFDMFTAVMYGESPQTTNNNVASKEDIEFVRASQSAFDVTGRLLSNP
LDKLFGGILQSEFNVNMEKTYYFANLRTKQYADGATELQRAAKTAHGEESESKCPITAIK
TQFLNPSFIERLVHRGELSNDNIAELAAPLLMAGVDTTAYVMSWLYLNLASNLDVQAKLA
EELKSVLDGADLTTKEQMDSLPYLKACIRESHRLTPATPILAKTLEKDIDVVVDDACYKV
EAGHRISLNLRAIPMDPAYVDNPTLFQPERFLPSAVEARRGTPSEIIDHPSFADPFGRGK
RRCLGVNVAVAEIMILAARLVQDWEIGLVDTEDRHRWTAKQKLMLKADPYPSMTVSPRS*
>CYP5023A1 Scaffold 85 7912 to 9600 (plus strand)
MQQMATTCFVSSFTLPSSRITGPPTSFGRTSIQHEHLPSLTNLCAIYERGSTQLKA
ALLPSVVTKISPPLRNTLVLAAAAVAIYKNRHRLYLGSDPDPNFSEPLPEGSLGCPLLGN
LGFFTKNGTPETGPGEFFRSQAKTVDNASIFKYMALGKPVAMVSGMKNVKSAFNTEFKKI
RTGSLIKNFYRLFGKQSLLFISDADRHQYLRRLVGQSMTPEAISNAMPALVNVATDQIEL
LSEHPITVMESALTNYTLDVAWRQILGLDLKEDEIVTFYDAVDNWIGGITNVRTLFLPGM
ENTKAGKGLIYLKSKIERRIDELLANGPDGSTMSYMVYAKDEEDATKSMTREEIIDNALL
LILAGSETAASTLTVAMLALGLNKDAFQKLKDEQRQLISRHGEELTRSMLDKDCPYLDAV
VKETMRIKPLAGTGAVRIAEETIVVDGKQIPKGYGVAFNIFLTHASDPVVKEEDGSHMDV
AKGFKPERWLSDETKPTEYMPFGYGARFCLGYNLAMAEMKVFLALFARRVDYDLVNMTPD
HVTWKKMSIIPKPDDGAVISVTSISK*
>CYP5024A1 Scaffold 15 295386 to 296981 (plus strand)
MLIRTNTIAPLAVILLSVGGCHSFAPVVQHQCHGASLFSSTAAAEQKDVSKL
PLPPNPEKKDISELPLPPNLGMNLFRNIRDTFSYLSNPDRFVADRSAKLGPIFLAYQFFK
PTVYCGGKENVAEFISGTELKNKVIHPALPESFVELHTKWGALNMDATESMFKEARVLFG
DVLSSREALEQYSAAADREISDYVDNLAERVKTNPQQPIYLAPELKSLCLQIFSKIFSGE
GLSEQQMQQFNDYNDALLALSKGTDQYKKGKTALDELRVEMLRRFRALDDPNIPSDTPGK
WYHDQIFGRENFDNEERIATGMVLFIWGAFIECASLCVDSLALSYKYGLQEKIDGVREEY
ATRQATGLSSSDPKFWNTNDMPYTNGILRETLRTAPPGAGVPRFSYDDFELAGYRIPANY
PVMLDPRIGNMDANLYTKPEQFEPLRWVPTKAKESACPFQGSALNLGIGSWFPGGFGAHQ
CPGVPLAELVGRMFITKISNEFDAWEFSGEGLDKSGDIDYVKIPVRIPRDDFGMRFTLN*
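
The entries above follow a FASTA-like layout: a ">" header line, optional
free-text notes, then sequence lines that may carry a trailing "*" or a
parenthesized note such as (0) or (2?). The Python sketch below is one
possible way to pull the sequences out of such a listing. It is not part
of the original data. The function name parse_listing is hypothetical,
and the reading of (0) and (2?) as intron position notes rather than
residues is an assumption based on the remark that CYP97E2 has introns.

```python
import re

# Assumption: notes such as "(0)" or "(2?)" inside a sequence mark intron
# positions and are not residues, so they are dropped before joining lines.
INTRON_NOTE = re.compile(r"\(\d\??\)")
# A sequence line is taken to be uppercase letters with an optional final "*".
SEQ_LINE = re.compile(r"[A-Z]+\*?")


def parse_listing(text):
    """Return {entry name: protein sequence} for a '>'-headed listing."""
    chunks = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(">"):
            # The entry name is the first token of the header line.
            fields = line[1:].split()
            name = fields[0] if fields else "unnamed"
            chunks[name] = []
        elif name is not None:
            cleaned = INTRON_NOTE.sub("", line).strip()
            # Free-text annotation lines fail this match and are skipped.
            if SEQ_LINE.fullmatch(cleaned):
                chunks[name].append(cleaned.rstrip("*"))
    return {key: "".join(parts) for key, parts in chunks.items()}


# A few lines copied from the CYP97E2 entry (an excerpt, not the full entry).
excerpt = """>CYP97E2 52% to 97B3 from Arabidopsis from GDLQ to VVSRR
85% identical to Skeletonema costatum AF459441 over 654 aa
RHILRDSSPEQYCKGMLAEILEPIMGDGLIPADPKIWKVR (2?)
DLMTMLIAGHETTA (0)
NEDAAWTASSRDKNEGVVSRLVN*
"""

for entry, seq in parse_listing(excerpt).items():
    print(entry, len(seq))
```

Lines before the first ">" header are ignored, so the introductory text
does not need to be removed first. A note line written entirely in
uppercase letters would be mistaken for sequence, so the result is worth
checking against the listing by eye.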