Amphioxus (cephalochordate) ESTs, WGS and HTGS sequences

 

Branchiostoma floridae (many seqs)

Branchiostoma belcheri (1 seq)

 

D. Nelson August 13, 2004, added CYP51 Jan. 19, 2005,

Many new WGS trace file sequences. Modified May 10, 2005

 

To retrieve trace archive files such as AFSA125350.y1, go to

http://www.ncbi.nlm.nih.gov/Traces/trace.cgi?size=1&cmd=retrieve&s=&m=&retrieve=Search&val=TRACE_NAME%3D%27ATUP646033.g1%27&file=trace&gz=on&fasta=on&scfrcf=scf&dopt=fasta

 

and add the accession number into the search window as shown here:

 

TRACE_NAME='ATUP646033.g1'

 

The TRACE_NAME= prefix limits the search to the appropriate field.  The quotes around the accession number are required.
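
The retrieval link above can also be built for any trace name. The following is a minimal sketch that reuses the URL and fixed parameters exactly as quoted in these notes; the function name build_trace_url is hypothetical, and whether the endpoint still responds at this address is not verified here.

```python
from urllib.parse import quote

# Base address copied from the retrieval link quoted in these notes.
TRACE_CGI = "http://www.ncbi.nlm.nih.gov/Traces/trace.cgi"


def build_trace_url(trace_name: str) -> str:
    """Return the retrieval URL for one trace name (hypothetical helper).

    The query TRACE_NAME='<name>' is percent-encoded so that '=' becomes
    %3D and the required single quotes become %27, as in the quoted link.
    """
    query = f"TRACE_NAME='{trace_name}'"
    encoded = quote(query, safe="")
    return (
        f"{TRACE_CGI}?size=1&cmd=retrieve&s=&m=&retrieve=Search"
        f"&val={encoded}&file=trace&gz=on&fasta=on&scfrcf=scf&dopt=fasta"
    )


print(build_trace_url("AFSA125350.y1"))
```

Called with ATUP646033.g1, the helper reproduces the link given above character for character.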

 

CYP2 clan

 

>AFPZ293133.y1 exon 1 90% to AC150407.1 gene at 44817-45095

ATGN222242.g1, ATGI214615.g1, AFSA852354.g2, AFSA840251.g2

AFSA264268.g2, AFSA229077.g2, ATGN243325.b1, ATGI179003.g1

ATUP200634.x2,

Exon 2 ATUP200634.y2 mate pair to ATUP200634.x2

AFSA601420.g2, AFSA489196.b2, AFSA489196.g2

MESAVSFVSGLLANLTLQSILVLVLAFLVTYWLLGTGDRQKNLPPGPRGLPLLGN

LLSFRPSYLLSNLAAWRDKYGDVFCVRIANRLAVVLNG ()

HKAIQDALVKQPEVFSNRPPPFIDSAKDQG

 

AFSA125350.y1: walk with this one; it overlaps AFSA489196.g2 on the way to exon 3

 

>AFSA820329.b2 new exon 3 AFSA896985.b2 ATUP864470.b1 ATUP552587.x1

AFPZ806277.x1 AFPZ776469.x1 AFPZ908625.y1 ATGI225190.b1 AFPZ278931.y1

AFSA277571.g2

Exon 4 ATGN165786.g1 exon 4 ATUP951310.g1 ATUP682533.g1 ATUP194164.y2

ATGI58079.g1 ATWW119241.g1 ATUP547159.g1 AFPZ278931.x4 AFPZ278931.x1

ASWX119777.b2 (joins exon 3 of AFSA820329.b2)

ASWX119777.g2 exon 2 seq with frameshift mate pair of ASWX119777.b2

AFSA254465.b2 exon 2 seq

ATUP194164.x2 mate pair of ATUP194164.y2, AFPZ526846.y1

HKAIQDALVKQPEVFSNRPPPMLDSAKDQG

GVAMSEYGEDWKVKRRIGLT

ALRQFGMGKRSLEGKITEEARILCDVLAEKNGTATDMSLLLSNAVSNVICAMSF

GERFEHNDMEFQRLMRLMSEMVGGSGGNAGSSISRFIPLVRKLPFFKKGLERRV

KMSLEVVDFIKSKIKEHKETFDPADIRDIIDVYLMETQQQTPDDADRTITEMGMINTMRD

LFIAGAETTATTLKWGLLYLARHLEVQRKVQDEIDREFGASPPTLSQRGKLPYTEATILE

IQRIRPIAPLAVPHTTSTATVLHGFDIPADTFVIPNLWSAMMDPAVAPDPETFNPDRFLD

EDGTVVRPEWLIPFSL

GRRQCLGEQLAKMELFLFLTTLLQHFTFKLPDGAPALSMEGSLGIVLAPKAYQICAVPRDN*

 

>AFPZ869380.y1 AFSA270651.b2 (4 aa diffs to ATGN165786.g1) exon 4

GRRQCLGEQLAKMELFLFLATLLQHFTFKLPYGAPAPSMEGSMGIVLAPKAYQICAVPRDN*
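
The "4 aa diffs" note can be checked by a position-by-position comparison of the two exon 4 peptides. This is a small sketch; the function name aa_differences is hypothetical, and it assumes the two peptides are the same length and already aligned without gaps, which holds for the two sequences copied here from the notes.

```python
def aa_differences(a: str, b: str) -> list[tuple[int, str, str]]:
    """List (1-based position, residue in a, residue in b) for each mismatch."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length (ungapped comparison)")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


# Exon 4 peptides copied from the two records above.
exon4_atgn165786 = "GRRQCLGEQLAKMELFLFLTTLLQHFTFKLPDGAPALSMEGSLGIVLAPKAYQICAVPRDN*"
exon4_afpz869380 = "GRRQCLGEQLAKMELFLFLATLLQHFTFKLPYGAPAPSMEGSMGIVLAPKAYQICAVPRDN*"

diffs = aa_differences(exon4_atgn165786, exon4_afpz869380)
for pos, x, y in diffs:
    print(f"{x}{pos}{y}")
print(len(diffs), "aa diffs")
```

The comparison finds four substitutions, in agreement with the header note.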

 

>ASFW203349.g2 WGS exon 1 94% to AC150407.1 gene at 44817-45095

AFSA277180.g2, AFSA337814.g2, AFSA337814.b2

MESAVSFVSGLLANLTLQSTLVLVLAFLVTYWLLGAGDRQKNLPPGPRGLPLLGNLLSF

RPSHLLSNLAAWRQQYGDVFCVRIANRLAVVLNG

 

>AFPZ80615.b2 ASWX143119.b2 ATUP19565.y2 ATUP590430.x1 ATUP871195.g1

AFPZ617150.x1 ATGI45147.b1 AFPZ676420.b2 ATUP848153.y1(poor seq)

New exon 3 seq

Exon 4 ATGI75778.b1 ATWX106000.g1 ATWX78968.b1 ATUP879942.b1

ATUP25175.g1 AFSA516244.g2 AFSA298646.b2 AFPZ676420.b2(joins with exon 3)

ATGI213326.g1 overlaps ATUP871195.g1 exon 3 and goes 800bp upstream

Has exact match to exon 2 from AC150407.1 first seq.

Note: there are two seqs with exact aa seq but one silent nuc diff

These are probably from different genes.

ATGI213326.g1 AFPZ69924.g2 ASWX145321.g2 ATUP276953.x2 are 100% identical

In the exon 2 region

HKAIQDALVKQPEVFSNRPPAVVDSANDQ

GVVMAQYGEGWKVKRRLGLTALRQFGMGKRSLEGKITEEARAVCDILAEK

DGAATDISLLLSNGVSNVICSMSFGERFEYNDTEFQRLMRLMSELVTGSAISR

FNPYVRKLPFIKKGVESRMKMAKEITEFIKAKIKEHKDTFDPADIRDIIDVYLMETQQQI

PDDGDRTITEMGMINTMRDLFIAGAETTATTLKWGLLYLARHLEVQRKVQDEIDQEFGSS

PPTLSQRGKLPYTEATILEIQRIRPIVPLSVPHTTSAATVLHGFDIPANTFVIPNLWSAM

MDPTVAPDPETFNPDRFLDEDGTHVRPEWFIPFSL (1)

GRRQCLGEQLAKMELFIFLTTLLQHFTFKLPDGAPAPSMDGSLGVVLAPDPYQICAIQRD

 

>AC150407.1 two linked genes 5000-6800 region and 44000-48000 region

AFSA41261.x1 WGS exon 1 96% to AC150407.1 gene at 44817-45095

ATUP33911.b1, ASWX26139.b2, ATUP603318.x1, AWXX12138.b1, AFPZ106727.x1

ATWW205625.g1, ATWW98645.b1, AFSA387158.g2, AFSA192393.g2, ATGN203140.g1

ATUP879942.g1, AFSA748020.g2, ATGN123505.b1, ATUP723083.y1, AFPZ103877.x1

APWS99281.g1, ATUP590430.y1 (mate pair to exon 1 sequence ATUP590430.x1)

ATUP871195.b1 (mate pair to exon 1 sequence ATUP871195.g1)

Note ATGN123505.b1 overlaps with AFSA940316.g2 to join exons 1 and 2

AC150407.1 is missing exons 1 and 2 (sequence gap)

exon 2 WGS seqs = APNK3267.b2, AFPZ279007.x4, AFPZ279007.x1, ATUP615432.y1

ATWX107943.g1, ATGN209312.b1, ATUP704177.b1, AFSA940316.g2

exon 2 from ATUP615432.y1 (overlaps exon 3)

next 8 are all exon 2 different from AFPZ80615.b2 exon 2 at one nucl.

AFPZ279007.x4 AFPZ279007.x1 ATWX107943.g1 AFSA940316.g2

APNK3267.b2 ATUP615432.y1 ATGN209312.b1 ATUP704177.b1

Exon 3 WGS sequences ATUP603318.y1 (mate pair to exon 1 sequence ATUP603318.x1)

ATWW25636.b1, AWYB3861.g1, ATUP183373.b1, ATUP688580.x1, ATUP664516.g1

ASWX26139.g2 (mate pair to exon 1 sequence ASWX26139.b2)

ATUP239851.g1, ATWX22588.b1 ATUP615432.y1 ASFW157059.g2 ATWW157810.g1

AFPZ279007.x1 AFPZ279007.x4 ATUP704177.b1

Exon 4 AFPZ919116.b2 ASWX175598.x1 ATGN296415.g1 ASFW157059.g2

MESAVAFASGLLANLTLQSTLVLVLAFLTTYWLLGAGGRQKNLPPGPRGLPLLGNL

LSFRPSRLLSNLAAWRQQYGDVFCVRIANRLTVVLNG

HKAIQDALVKQPEVFSNRPPAVVDSANDQ

5313 GVVMAQYGEGWKVKRRLGLTALRQFGMGKRSLEGKITEEARAVCDILAEKDGAATD

ISLLLSNGVSNVICSMSFGERFEYNDTEFQRLMRLMSELVGGSAISRFNPYVRKL

PFIRKGVESRVKMSMEIVEFIKLKIKEHKETFDPADIRDIIDVYLMETQQQTPDDGDR

TITEMGMINTMRDLFIAGAETTATTLKWGLLYLARHLEVQRKVQDEIDREFGSSPPTLS

QRGKLPYTEATILEIQRIRPIVPLSVPHTTSTATVLHGFDIPANTFVIPNLWSAMMDPTV

APDPETFNPDRFLDEDGTLVRPEWFIPFSL (1) 6266

6618 GRRQCLGEQLAKMELFIFLTTLLQHFTFKLPDGAPAPSMDGSLGVVLAPNPYQICAVPRDN* 6803
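
The mate pairs noted in this record all share the same read stem and trailing number and differ only in the direction letter (x with y, b with g). Under that assumption, which is inferred from the pairs listed in these notes and is not a documented naming rule, a mate name can be derived as sketched below. The helper name mate_name is hypothetical, and resequenced reads with a different trailing number (for example .x4 next to .x1) are not handled.

```python
import re

# Read names in these notes look like STEM.<direction letter><number>,
# e.g. ATUP590430.x1 (assumption drawn from the examples in this file).
_READ = re.compile(r"^(?P<stem>[A-Z]+\d+)\.(?P<end>[bgxy])(?P<run>\d+)$")
_OPPOSITE = {"b": "g", "g": "b", "x": "y", "y": "x"}


def mate_name(read: str) -> str:
    """Return the presumed mate-pair name for a trace read name."""
    m = _READ.match(read)
    if m is None:
        raise ValueError(f"unrecognised read name: {read!r}")
    return f"{m['stem']}.{_OPPOSITE[m['end']]}{m['run']}"


for read in ("ATUP590430.x1", "ATUP871195.g1", "ASWX26139.b2"):
    print(read, "->", mate_name(read))
```

A name produced this way is only a candidate; the mate still has to be retrieved and checked against the expected exon.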

 

>AC150407.1 two linked genes 5000-6800 region and 44800-48000 region

WGS exon 1 = AFPZ47185.g2, APWS114129.b1, AFSA346565.b2, APNK34495.b3

WGS exon 2 = ATUP592730.y1

ATWX46513.g1, ATGN30351.b1, ATGN26143.g1, ATGN270605.g1, ATUP890454.x1

APWS24002.b1, ATGI65252.g1, ATGN86876.g1, ATWW134155.g1, AFPZ623416.g2

Exon3 ATGI94162.b1 ASWX162986.g2 AFPZ47185.b2 AFPZ855725.x1 AFPZ929344.x1

ASWX162986.b2 ATGI65252.g1

Exon 4 ASWX162986.b2 ATUP592730.x3 ATUP592730.x1 AFSA165640.b2 AFSA319547.b2

ATWW189487.g1

44817 MESVVPFASGLLANLTLQSTLVLVLAFLTTYWLLGAGGRQKKPPPGPRGLPLLGNL

      LSFRPSRLLSNLAAWRQQYGDVFCVRIANRLAVVLNG 45095 (2)

46004 HKAIQDALVKQPEVFSDRPSPFRFSDKDQ 46090 (1)

46542 GVVMAQYGESWKVKRRLGLTALRQFGMGKRSLEGKITEEARAVCDILAEKDGA

      AMDVSLLLSNGVSNVICSMSFGERFEYNDEEFQRLMRLMSELVSAGGISRFIPLVRKLPF

      LNEGSKNRAKMSMEIVEFIKVKIKEHKETFDPADIRDIIDVYLMETQQQTPDDVDR

      TITEMGMIGTVRDLFIAGAETTATTLKWGLLYLARHLEVQRKVQDEIDREFGSSPPTLS

      QRGKLPYTEATILEIQRIRPIVPLSVPHTTSAATVLHGFDIPANTFVIPNLWSAMMDSAV

      APDPETFNPDRFLDEDGTLVRPEWFIPFSL (1) 47495

47838 GRRQCLGEQLAKMELFIFLTTLLQHFTFKLPEGAPAPSMDGSLGVVLAPKPYQICGVPR* 48017

 

>AC150409.1   Branchiostoma floridae very similar to above clone

45% to 2U1 fugu

AFPZ598379.y1 AFPZ177295.y01 AFPZ177295.y1 ATUP642958.x1

ATGI278297.g1 ATUP598704.y1 ATUP909233.y1

AFSA820152.g2

exon 3 with some frameshifts

97% to AC150407.1 46000 might be an allele

exon 3,4 ATWW157810.g1

exon 4 ATUP610859.x1 ATWX26371.g1 ATUP919332.x1 ATWW121802.b1 ATGI269677.g1

ASFW189763.g2 ASFW131684.g2 AFSA849352.b2 AFSA173661.g2 AFPZ99561.y1

10750 MESAVAFASGLLANLTLQSTLVLVLAFLTTYWLLGAGGRQKNLPPGPRGLPLLGNLLSFR 10571

10570 PSRLLSNLAAWRQQYGDVFCVRIANRLTVVLNG 10472 (2)

8793 HKAIQDALVKQPEVFSDRPSPFRFSDKDQ 8707 (1)

8256 GVVMAQYGESWKVKRRLGLTALRQFGMGKRSLEGKITEEARAVCDILAEKDGTATDIS 8083

8082 LLLSNGVSNVICSMSFGERFEYNDAEFQRLMRLMSELVSAGGISRFIPLVRKLPFFNEGS 7903

7902 KNRAKMSMEIV 7870

EFIKAKIKEHKETFDPADIRDIIDVYLMETQQQTPDDVDRTITEMGMIGTVRDLFI

AGAETTATTLKWGLLYLARHLEVQRKVQDEIDREFGSSPPTLSQRGKLPYTEATILEIQR

IRPIVPLSVPHTTSAATVLHDFDIPANTFVIPNLWSAMMDPTVAPDPETFNPDRFLDEDGTLV

RPEWFIPFSL (1)

GRRQCLGEQLAKMELFTFLTTLLQHFTFKLPDGAPAPSMDGSLGVVLAPKPYQICAVPRDN*

 

>50% to CYP2U1 zebrafish ASFW117295.g2 AFSA812739.g2 ASFW150452.g2 New seq

TTWKGGVFFLPRALKPRRGRPKVREEIAREFASPVPPWSERERLPYTEATIMEIQGIRPIVPLNIFHGN

TSATTLYGYDIPAGTYIIPSLWSAMMDPKVTPEPEEFRPERFLDDEGKVVKPEWFLPFSA (1)

GRRRCLGEQLAKMELFLFYTSLLQHFTFKLPDGAPAPPMDGSLGFVLSPPAYDICAVPRHSS*

 

>62% to 2U1 zebrafish

AFSA220461.b2 ATWX77582.b1 APWS173478.g1 new C-term exon

GRRICLGEQLAKMELFLFLTSLLQQFTFKLPEGAPKPDMCGEIGATLLPKPYNIQAISRKK*

 

>DE198043.1 genomic survey sequence. NEW 1/6/06

          Length=653 43% to 2U1 Fugu

VQTTVRAELDRVLMRGESVSAAHRRALPVTEATVMEILRLATPSPLNFRATACDVTLRG  476

YRLPEGTWTLMNCWAVHRDPLQWTEPDTFDFTRFLDREGRVTTPPAWRPFGIGTRS  308

 

>DE197854.1 genomic survey sequence. NEW 1/6/06

DE000432

          Length=702 51% to 2U1

71   SVLHRYIIPKDTIVFAGQWSVHHDPELFPEPDMFDPERFLDDEGNFKNIEYFMPFSM (1) 241

375  GPRSCMGQPLAEVQLFLLFTNLMQNFKLKLPEGAAKPSSEGVMGITLAPKPFDLV  539

 

>DE017611.1 genomic survey sequence. NEW 1/6/06

          Length=625 46% to 2K11

GVLFAAYGPDWKHQRKFALMTLRDFGVGKRSLEGKIR  373

EEADALIQEVESKNGLPFDIKQMLPNAVSNVICSIAFGNRFEYGDPEFLRLIGLLNAAVE  553

AQPSRDILPNIHPVFRRLPFGS  619

 

>DE195161.1 genomic survey sequence. NEW 1/6/06

67% to 2N11

LFLAGTDTTSTTLRWALLYMILHPDIQEKVQQEIDSVLGPNQEPEMAHR  322

 

>DE189345.1 genomic survey sequence. NEW 1/6/06

61% to 2N11

LFLAGTDTTATTLHWAVLFMILHPDIQQKVQQEIDSVLGPNQDPSMEHR

 

>DE013036.1 genomic survey sequence. NEW 1/6/06

          Length=592 44% to 2Z2

VIYDLFFTGAETSSTCLRWAVFLMAVYPDVQARIHREVDTVLGSDGEVTLDKRAALPFL  386

DATISEVYYLNS  350

 

>DE012415.1 genomic survey sequence. NEW 1/6/06

58% to 2N11

AGTDTTATTLHWAVLFMILHPDIQQKVQQEIDSVLGPNQDPSMEHR

 

>CF918864 BI377274 Amphioxus 5-6 hrs cDNA 45% to 2U1 fugu

RVRRDATVSLAHRPEMPYTDAFLHEVLRIRPPGPLSVPHMAGPGATLNGYEIPQNTQVYA

NLWSLHMDPEYWPEPERFDPTRFIGPDGKVLPNPPSYAPFSLGRRACPGKQLAKSEAFLF

LVTMVQRFSFKLPEGAPVPPMDGVMGFSLAAQPHSLCAISRN*

 

>CF918826 BI383662 Amphioxus 5-6 hrs cDNA 51% to 2U1 fugu

AAESGTRPDYIIPQDAMIFVNLWSVHMDPQLFPDPNTFRPERFLDQDGNFVKQAVIPFGI

GPRVCLGEQLAKMEVFMLFVSLMQRFTFHLPEGAPEPSMLGKLASAINVPCPFELCAVAR*

 

>BI387982 Amphioxus 26hr cDNA library 48% to 2N2 zebrafish

NGKPVPKPAALMPFSA

GRRACPGEAVXKADTFLLLGGLVQNFRFSIPEGEGPPDLTPDDKTGGDTCIPYPYKVVMSCRKCML*

 

>BI388387.1 C-helix to mid

EGANYSDGCXGVIFAPYGSFWKEQRKFTLMSLRDFGFGNRSIYGKIVEESQVLQSVIAKF

DGQPFSTHRLLHNAVANVTCNILFGDRWEYDDPLFQRMMDALNYMVSTNVFAVPQNFIPF

TRYIPGWAGRLEPWLKKFLSIMGYLREELDKHKVIFDPTDLRDFINTYLLEIQNQ

 

>BI387848 Amphioxus 26hr cDNA library

52% to 2U1 fugu, 50% to 2U1 mouse, 75% to BI377261

AFSA235046.g2  ATGI91479.g1  ATUP593811.y1 AFSA108094.b2

RHASDLLLDGTETTGNTLLWALLYMTQNPTIQHK (0)

VQQELDAVVGESQPTLSHRSQLPYVNACLLETMRIRTLVPLAVPHATTQDVTIQEFDIPQGTQ (0)

VLPNLYSLHMDPTYWPDPDRFDPERFLDAEGNVINKPQSFMPFGG (1)

GRVCLGEQLARMELFLFFSTLLQSFTFKTPEGAPPPKTDGGLGITWTP

 

>AFPZ7602.y1 ATGI55268.b1 

VILNLYSLHVDPTYWPDPERFDPERFLDAEGNVINKPESFMPFAG (1)

 

>ASWX176511.y1  87% to BI377261

ATUP829661.g2 AFPZ642936.g2  ATUP921353.y1  ATWW61130.b1

VLTNLHSLHMDPAYWPDPDRFDPERFLDAEGKVINKPKSFMPFSG (1)

 

>ATUP598105.y1  ATGN136393.b1  ATUP193767.y2 ATGI126577.b1

ATWW83807.g1 

VHEELDAVVGESLPTLSHRSLLPYVNACLQEVMRIRPVGPLAIPHATTEAVRVRGYDIPKRTQ (0)

VLLNLYSLHMDPAYWPDPDRCDPERFLDAEGNVINKPESFMPFGG (1)

GRVCLGEQLARIELFLFFSTLLQSFTFKTPEGAPPPNADGILGLTLAPHPFQLCAIPR*

 

>ATUP937768.y3  ATUP937768.y1  ATUP905825.y1  ATWW1274.b2

AFSA664077.g2 

VLFNLYSLHMDPAYWPDPDRFDPERFLDAEGNVINKPESFMPFGG (1)

GRRVCLGEQLARMELFLFFSTLLQSFTFKTPEGAPPPNTDGIFRLTLKP HPFQLCAIRR*

 

>AFSA690405.b2 exon 1 and part of exon 2 89% to AFSA636542.b2

walk to ATWW106344.b1   APWS97989.g1

note: this is probably a poor version of APWS97989.g1

downstream the sequences are the same

MAILFSWIVESVLEILQISGLTLQTILVFCVPFLLACTF*KRPRNLPXYPAGRVPVLGH 849

LLALGRAPHLKLTXWRRQYGDVFTVRMGMEDVVVLNGYTAVRDALVDRSELFASRPPNYL 669

FDLTVGFGE ()

DIVTARWGSQFX QRRRL

 

>ATUP47463.b1 exon 1,2 87% to AFSA636542.b2

MAAVVSWISESVQEIPQISGLTLQTCLVFXAAFLLTCALXRRPRNLPPYPAGHVPVLG 791

HLLALGRAPLLKLTAWRRQYGDVFTVRMGMEDVVVLNGYTAVKDALVDRSELFASRPPNY 611

LFDSSVGFGK ()

DIGAARWGTGLKQRRRFATAALKHLGMKVGTGSVEDNIRQEASCLRKR (0)

 

>ATGI68302.b1 exon 1 82% to ASWX66916.b2

MAVIVSWIAELVWEIFQISGLTIQTFLVFCVVFLLAYVLLKRHKNLPPYPAGRVPVLGHL 326

LALGREPPLKLTAWRRQYGDVFTVRMGMEDVVVLNGYTAVKDALVDRSELFASRPPNYLL 506

DAIVGCGK ()

 

>AFPZ28428.y1 exon 1 79% to AFSA636542.b2

MATAVFRWIIQSVQDTLQIYGLNLQSLLVFCTAFVLACALLKRSPNLPPYPAGRVPVLG 304

HLLALGRAPHLKLTAWRRQYGDVFTVRMGMEDAVVLNGYTAVKDALVDRSELFASRPPNY 484

LFDLTVDSGK ()

 

>ASWX66916.b2 exon 1 89% to AFSA636542.b2

walked to AWXX13027.b1 ATUP266482.b1

mate pair = ASWX66916.g2 exon 3

AFSA35511.g2 exons 2,3 ATGN165304.g1 ASFW57081.b2

walked upstream to ATUP266482.b1 which = ASWX66916.b2 join seqs.

MAAVVSWIAESVLEILPMSGPTLQTFLVFCVAFLLTWALLRRPRNLPPYPAGRVPVLG 489

HLLALGRAPHLQLTAWRRQYGDVFTVRMGMEDVVVLNGYTAVKDALVDRSELFASRPPNY 669

LFDSSVGFGK ()

DIGAARWGTELKQRRRFATAALKHLGMKVGTGSVEDNIRQEASCLRNR  (0)

IAEYHGQPFGISNDMKVAVANVICSMAFGRRYGYEDETFRELSEAIRNLLAEIGSGQFISVFPLLRFVPG

 

>ATWX43498.b1 exons 2,3 very similar to AFSA35511.g2

walked to ATUP237571.b1; no obvious exon, kept walking to ASWX77262.g2 = exon 4

ASFW107932.b3 exon 4 walked to AFPZ187653.x1 exon 4,5 ASWX45971.b2

DIGAAPLGDRVEAEKRFATAALKHLGMKVGTGSVEDNIRQEASCLRKR (0)

IAEYHGQPFAISNDMKVAVANVICSMAFGRRYGYEDETFRELSEAIRNLLAEIGSGQFISVFPLLRFVPG ()

ACKEVLKHLSKIHEVLWDEIARHRENFDRENPRDFLDFCLLELEQREK

VEGLTEENVLYMAQNLFLAGTDTTANTLLWSLLYMTLNPDIQNK (0)

VHEELDA

 

>AFPZ601018.b2  ATGN133651.b1 ATGN143242.b1 

walked to ATUP71680.y1 (exon 5)

walked to ATUP705359.g2 (exon 4)

walked to ATGI77993.b1 (exon 3)

walked to AFPZ24940.g2 (exon 2)

walked from exon 7 downstream to AFSA524984.b2 to try to find a mate pair

to exon 1; this did not work

tried finding more exon 3,4 hits to look for more mate pairs

ATUP206044.x2 mate pair = ATUP206044.y2 = exon 1

MTGAVQWIADSVQEILQISELTLQTFLVLCSTFLLACVVFNRSRSRNL

PPYPAGRVPVLGHLLALGRAPLLKLTAWRRQYGDVFTVRMGMEDVVVLNGYTAVQDALVD

RSELFASRSPFYYLFDALFAFGK (1)

DIISARWGSGFRQKKRFATTVLKNLGMRVGRGSIEDSIREEASCLRNR (0)

IAENNGQPFDIAHDVAVAVANIICSMAFGKRYDYEDETFRELTKAIATISIELGAGHIT SVFPLLRFVPV (1)

VLYNHSHLYATVNRPIIKALEASSKVKNVMREEISRHREHLDRENPRDFLDLCLLELEQQE

KVEGLTEENVFHMAQDLFLGGTDTTANTLTWSLLYMTLNPDVQNK (0)

VHEELDAVVGESLPALSHRSQLPYVNACLLETMRIRTIVPLASHATTQEVKVQGYDIPKGTQ (0)

LMLTSPHMDPANWPDPDPFDPERFLDAEGNVIKKPESFMPFSG (1)

GRRVCLGEQLARMELFLFFSTLLQSFTFKTPVGAPPPNTDGIPGLTFMPHPFQLLAIER*

 

>APWS97989.g1 exon 1 93% to AFSA636542.b2

ATUP196459.x2 AFSA321451.g2 ASFW202410.g2

Walked upstream to ATGI153668.g1 AFPZ313895.x1 mate pair AFPZ313895.y1

to try to find a mate pair in the C-term part

AFPZ313895.x1 mate pair AFPZ313895.y1 = AFSA636542.b2 seq exon 3

These two seqs are 95% identical

APWS97989.g1 exon 4 end of exon 4 = BI377261 join seqs.

BI377261 Amphioxus 5-6 hrs cDNA 49% to 2U1 fugu 75% to BI387848

AFPZ459499.y1 ATUP541153.g1 AFSA636542.b2

MAIIVSWIVESVLEILQISGLTLQTILVFCVAFLLACTFWKRPRNLPPYPAGRVPVLGH 585

LLALGRAPHLKLTAWRRQYGDVFTVRMGMEDVVVLNGYTAVRDALVDRSELFASRPPNYL 405

FDLTVGFGE ()

Missing exon 2

VAEYEGKPIDIAHGINVAVANVICSMTFGKRYDYEDETFRELSEAVVTIMSELGAGQIIS VFPLLRFVPG (1)

ASYSVSAQLAKIQKVLREEMSRHREHLDRENPRDFLDFCLLELEQQEKVAGLTEENVLYMAQ

NLFFAGTDTTTNTLRWSLLYMALNPDIQKK (0)

VQEELDAIVGESLPTLSHRSQLPYVNACLLETMRIRHIGPLAVPHATTDTVKVKEYDIAKGTQ (0)

VLPNLHSLHMDPAYWPDPERFDPERFLDAEGNVINKPESFMPFSG (1)

GRRVCLGEQLARMELFLFFSTLLQSFTFKTPEGAPPPSTDGVF GVTLTPHPFQLCAIPR*

 

>AFSA636542.b2 ATUP541153.g1 ATUP933964.y1  ATUP933964.x1 ATUP738986.y1

ATGI10244.g1 ATGN171873.g1  ATUP926693.b1 (exon 6) AFSA482736.g2 (exon 6)

AFSA726698.g2 (exon 6)

34% to 2N1 35% to 2D4

MAVIVSWIVESVLEILQISGLTLQTILVFCVAFLIACTFLLK

RPRNLPPFPAGRVPVLGHLLALGRAPHLTLTAWRRQYGDVFTVRMGMEDVVVLNGYTAV

KDALVDMSELFASRPPNYLFDLTVGFGE  (1)

DIVTARWGSKFRQRRRFATTALRNLGMKVGTGSIEEKIREEAIRLRNR (0)

VAEYEGKPIDIAHGINVAVANVICSMTFGKRYDYEDETFRELSEAVVTI

MSELGAGQIISVFPLLRFVPG (1)

ASYSVSGQLAKIQKVLREEMSRHREHLDHENPRDFL

DFCLLELELQEKVAGLTEENVLYMTQNLFFGGTDTTTNTLLWSLLYMILNPDIQKK (0)

AQEELDAVVGESLPTLSHRSQLHYVNACLLEVMRIRH

IGPLAVPHATTDTVKVKEYDIAKGTQ (0)

VLPNLHSLHMDPAYWPDPDRFDPVRFLDAE GNVINKPESFMPFSG (1)

GRRVCLGEQLARMELVLFFSTLLQSFTFKTPEGAPPPSTDGIFGITLTPHPFQLCAIPR

 

>exon 1 ATGN171873.g1 APWS102929.b1 ATGI10244.g1

ATGGCTGTAATTGTCAGCTGGATAGTTGAGTC

CGTCCTGGAGATTTTGCAGATCTCCGGGCTGACTCTGCAAACAATTCTCGTCTTCTGTGT

GGCCTTCCTCATTGCGTGCACGTTCTTGTTAAAGCGCCCCAGGAACCTGCCACCTTTCCC

GGCAGGACGCGTGCCTGTTCTCGGGCACCTCCTCGCCTTGGGCCGAGCGCCTCACCTCAC

GCTGACGGCGTGGAGGCGGCAGTACGGGGACGTCTTCACCGTCAGGATGGGGATGGAAGA

TGTGGTGGTTCTGAACGGCTACACTGCCGTCAAGGATGCGCTCGTGGACATGTCCGAGCT

GTTCGCGTCCAGGCCGCCAAACTACCTGTTCGATTTGACAGTTGGATTCGGAGAAGGT (1)

 

>DIVT exon 2 ATGI10244.g1

AGACATTGTTACTGCACGTTGGG

GGAGCAAGTTCAGACAGAGACGGAGGTTTGCTACCACGGCGTTAAGGAACCTCGGCATGA

AGGTCGGCACTGGCAGCATTGAAGAGAAAATCCGAGAGGAAGCTATACGTCTCCGCAACA

GGGT

 

>VAE exon 3 ATUP738986.y1

AGGTTGCAGAATACGAGGGAAAAC

CTATTGATATCGCCCATGGTATCAACGTGGCGGTCGCGAACGTCATCTGCTCCATGACGT

TCGGAAAGCGCTACGACTACGAGGATGAAACGTTCCGGGAGCTCTCTGAGGCGGTTGTGA

CAATAATGTCTGAGCTTGGAGCGGGGCAGATTATCAGTGTCTTCCCCCTGTTACGGTTTG

TTCCAGGAGGT

 

>ASYS exon 4

AGCCAGCTACAGTGTATCTGGACAACTGGCGAAGATCCAAAAGGT

GTTGAGGGAAGAAATGTCTCGCCATCGAGAACACCTGGATCACGAGAACCCACGAGACTT

CCTCGACTTCTGCCTGCTGGAGCTGGAACTGCAGGAAAAGGTGGCTGGTCTGACGGAAGA

GAACGTCCTGTATATGACACAGAACCTTTTCTTCGGTGGAACAGACACGACCACCAACAC

ATTGCTGTGGAGTCTACTCTACATGATTTTGAACCCAGACATCCAAAAGAAGGT

 

>AQEEL exon 5

AGGCACAAGAGGAGCTTGATGCCGTTGTTGGTGAGAGTCTGCCCACCCTGTCCC

ACCGTTCCCAGCTGCACTACGTGAACGCCTGCCTGTTGGAGGTCATGAGGATCCGCCATA

TCGGGCCTCTTGCCGTTCCCCACGCCACCACAGACACGGTCAAAGTGAAGGAGTACGACA

TCGCTAAGGGAACCCAGGT

 

>VLP exon 6 AFSA726698.g2 ATUP926693.b1 AFSA482736.g2

ATUP933964.y1

AGGTACTACCGAA CTTGCACTCCCTCCACATGGACCCCGNCTACTGGCTTGATCCGGACC

GTTTTGACCCCGTAAGATTCCTGGACGCGGAA

GGGAACGTCATCAACAAGCCTGAGTCCTTCATGCCTTTTTCTGGAGGT

 

>GRR exon 7 ATUP933964.x1

AGGCCGACGTGTGTGTCTTGGTGAGCAGCTGGCCAGGATGGAACTTGTCCTG

TTCTTCTCGACTCTACTGCAGTCCTTCACCTTCAAGACGCCAGAGGGCGCCCCTC

CTCCAAGCACTGACGGCATCTTTGGGATAACATTGACACCGCATCCGTTCCAGCTTTGTG

CAATACCACGTTAG
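
The exon sequences above appear to be listed with their flanking splice-site dinucleotides (a trailing GT donor and a leading AG acceptor), and the (1) after exon 1 appears to mark a codon split after its first base. That reading is an interpretation of the listing, not something the notes state. Under it, the exon 1/exon 2 junction can be checked with the sketch below, which trims the GT and AG, joins the exons, and translates with the standard genetic code. The names join_exons and translate are hypothetical.

```python
# Standard genetic code, built in TCAG order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON = dict(zip((a + b + c for a in _BASES for b in _BASES for c in _BASES), _AMINO))


def join_exons(upstream: str, downstream: str) -> str:
    """Drop the assumed GT donor and AG acceptor, then join the two exons."""
    if not upstream.endswith("GT") or not downstream.startswith("AG"):
        raise ValueError("expected ...GT donor and AG... acceptor")
    return upstream[:-2] + downstream[2:]


def translate(dna: str) -> str:
    """Translate from the first base; an incomplete final codon is ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON[dna[i:i + 3]] for i in range(0, usable, 3))


# Last 15 bases of exon 1 (in frame from its first base) and the first
# line of exon 2, both copied from the listing above.
exon1_tail = "GGATTCGGAGAAGGT"
exon2_head = "AGACATTGTTACTGCACGTTGGG"

print(translate(join_exons(exon1_tail, exon2_head)))
```

The translation reads GFGEDIVTARW, which matches the end of exon 1 (GFGE) running into the start of exon 2 (DIVTARW) in the AFSA636542.b2 protein sequence.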

 

Other closely related exons

 

>ATUP699472.x1 exons 6,7

VLLNVYSLHMDPAYWLDPDRFDPERFLDAEGKVINKPESFLPFGG (1)

GGRVCLGEQLARMELFLFFTTLLQSFTFKPPEGASPPNADGILGLTLAPHPFQLSAIPR*

 

>AFPZ728456.y1 exons 5,6

VHEELDAVVGESLPTLSHRSQLPYVNACLQEVMRIRPVGPLAIPHATTEAVKVRGYDIPKRTQ (0)

VLLNLYSLHMDPAYWPDPDRFDPERFLDAEGKVINKPDSFLPFGG (1)

 

>AFPZ476483.b2 exons 5,6

VHEELDAVVGESLPTLSHRSQLPYVNAC

LQEVMRIRPVGPLAIPHATTEAVKVRGYDIPKRTQ (0)

VLLNLYSLHMDPAYWPDPDGFDPEXFLDAEGKVXHKPES

 

>exons 5,6

AFSA16336.x4  AFSA16336.x1  AFPZ506410.x1 APNK80508.g2  ASWX68286.g3 

ATUP343092.y1 ATGN182700.g1 ATUP756295.y1 AFPZ866552.y1 ATUP443435.g1

ASFW36405.b2  AFSA625448.b2 AFSA427303.b2 AFSA716480.g2 AFPZ471003.x1

VQQELDAVMGASLPSLSHRSKLPYVNACLMETMRIRTLLSVILHATAQEVKVQGYDIPKGTR (0)

VLMNMHSLHMDPAYWPDPDRFDPERFLDAEGNVINKLPSFMPFSG (1)

AGGTACAGCAGGAGCTTGATGCC

GTTATGGGCGCGAGTCTGCCCAGCCTGTCCCACCGCTCCAAGCTGCCCTACGTGAACGCC

TGCCTGATGGAGACCATGCGGATCCGCACTCTTCTGTCTGTCATCCTTCACGCCACCGCG

CAGGAGGTCAAAGTGCAGGGATACGACATTCCTAAGGGAACTCGGGT

AGGTGTTGATGAACATGC

ACTCCCTCCACATGGACCCCGCCTACTGGCCTGACCCGGACCGGTTTGACCCCGAAAGGT

TTCTGGACGCGGAAGGGAACGTCATCAACAAACTTCCATCCTTCATGCCTTTTTCAGGAGGT

 

>ATGI42736.b1   ATGN217089.g1  ATUP49594.g2   AFSA786188.b2 

AFSA126109.g2  AFPZ657783.y1 ATUP738387.x1

AFPZ495923.y1  dup. exon 5 (pseudogene) exon 6 and part of 7

VHEELDAVVGASLPALSDRSQLL

YVNACLLETMRIRTLVPVSLPH

VQQELDAVVGASLPALSHRSQLPYVNACLMETMRIRTLLSVILHATAQEVKVQGYDISKGTR (0)

VLMNMHSLHMDPAYWPDPDRFDPERFLDAEGNVINKLPAFMPFSG (1)

GHRVCLGEQLARMELFLFFSTLLQSFTIKTPEGAPPPNTDGIFGLALKPHPFQLCAIPR*

AGGTGTTGATGAACATGC

ACTCCCTCCACATGGACCCCGCCTACTGGCCTGACCCGGACCGGTTTGACCCCGAAAGGT

TTCTGGACGCGGAAGGGAACGTCATCAACAAACTTCCAGCCTTCATGCCTTTTTCAGGAGGT

 

>AFSA241515.g2  AFPZ140710.y1  APWS45577.b1  ASWX65492.b2  

ATUP320554.x1  ATGN323264.g1  ATGN284296.b1  ATGI170827.b1 

ATUP12995.x2   ATUP716729.x1  AFSA152443.b2

VLMNMYSLHMDPVYWPDPDRFDPERFLDAEGNVINKPESFMPFGG (1)

GRRVCLGEQLARMELFLFFSTLLQSFHFKTPEGAPAPCADGIFRMTVTPHPFELCAIPV*

 

>AFSA83521.b2

VLMNMYSIHMDPVYWPDPDRFDPERFLDAEGNVINKPESFMPFGG (1)

GRRVCLGEQLARMELFLFFSDLLQSFTFKTPEGAPAPCADGIFPMTLTPXPFELCAIPR*

 

>APWS92234.g2  ATWX24634.b1  ATGN357284.b1  ATUP895861.x1  ATGI42736.g1

ATGI104268.g2  ATWW86466.b1  ATWW117588.b1  ATUP559927.b1  AFPZ509619.y1

ATGN267193.b1

IHEELDAVVGESLPALSHRPQLPYVNACLLETLRIRTLV XXXXHATTQDVKVQQFDIPKGTQ (0)

VLPNLHSLHTDPAYWPDPDRFDPERFLDAEGNVINKPESFMPFSG (1)

 

>ATUP710771.b1 1 aa diff to APWS92234.g2

AFSA525510.b2

VLPNLHSLHTDPAYWPDPDRFDPERFLDAEGNVINKPESFMPFGG (1)

 

>AFPZ185379.x1 ATGI56471.b1  ATGN311806.g1  ATUP672818.b1 AFSA909926.b2

AFSA523356.b2  AFSA330395.g2 AFSA330395.b2

ATGI93799.g1 APWS110835.b1 ASWX76430.g2 

GTQCKLHACRSTLEDPLEQQAKLSSLTEENVLHMAGDLFLAGTETTTNTLQWSLLYMTLNPDIQNK (0)

VQEELDAVVAESLPTLSHRSQLPYVNACLLEVMRIRTLIPAVRHVTTQEVKVQEYHISMGTW (0)

VLANLHSLHTDPAYWPDPDRFDPERFLDAEGNVINNPKSFMPFGG (1)

GRRACLGEQLARMELFLFFSTLLQSFTFTTPEGALPPNTDGVFGLTLVPHPFQLCATPR*

 

>ATUP912010.y1 ATGI128166.b1  ASFW164761.b2  AFSA840082.g2

AFSA174046.g2  AFSA315286.g2  AFPZ159110.y1  ATWW201417.g1

AFSA778163.g2 

VLVNLHSLHMDPVYWPDPDRFDPERFLDAEGNVVNKPQSFMPFAG

 

>ATUP680104.g1

VILNLHSVHMDPAFWPDPDRFDPDRFLDAEGNFINKPESFMPFSA (1)

 

>ATWW125683.g1

VHEELDAVVGASLPTLAHRSQLPYVNAFLMEVMRIRYVGPLGVPHATTAAVKVQEYDIPEGTQ (0)

IILNLHSVHMDPAFWPDPDRFDPDRFLDAEGNFINKPESFIPFSA

 

>APWS102434.g1 ATWW177217.g1

KVQGYDIPKGXX

VLMNLYSLHMDPAYWPDPDRFDPERFLDAEGNLINKPESFMPFG (1)

 

>ATWW233361.g1 ATUP551452.y1

VHEELDAVVGESLPALSHRSQLPYVNACLMEIMRIRYVGPLSVPHATTAPVKVQEYDIPKGTQ (0)

VIVNLHSLHVDPAYWPDPDRFDPDRFLDAEGNFINKPESFMPFS

 

>AFPZ859823.x1 AWYB2850.g1 ATWW63772.b1 AFPZ870007.y4 AFPZ870007.y1

mate pair AFPZ859823.y1 = exon 7 ATUP557464.y1

ATUP557464.y1 ASFW50972.b2 AFSA305932.b2  AFPZ122560.y1  ATUP820771.x1

Almost 51% to CYP2U1 human

VWTKIQFSNIPLLITIVSGKLVTRFLFPVLFLPLVNR ???? uncertain

AFMEVLKQNSRVHEVLWDEIARHRETFDSENPRDFIDFCLLELEQQE

KVDGLTEENVMYMAQDLFFAGTETATNTLLWSLLYMTLNPGVQQK (0)

VHEELDTVVGASLPTLSHRSRLPYANACLMETMRIRHIAPLIIPHATTDTVRVQEYDIPEGTQ (0)

VLMNMYSLHMDPAYWPDPDRFDPERFLDAEGNVINKPESFMPFGG (1)

GRRVCLGEQLARMELFLFFSTLLQSFSFKTPEGAPAPCADGIFRMTLTPHPFELCAIPR*

 

>ATWW225973.b1  CYP2U like ATUP29908.b1 ATUP481728.g1 APNK56784.b2

AFSA784029.g2 ATUP411423.g1

FLDSDGKVVTRPESFMPFST (1)

GRRVCLGEQLAKMELFLLFSSLLKHFTLKLPEGAAAPSTDGIMGFFYVPPKVNMCITKR*

 

CYP17/CYP1 like

>ATGI187647.g1

MWLMTITVGLVTLILVKWLKDYVQRWRMPPGPFFWPVIGNLSCKYRGS (0)

 

>ATGI151113.b1  ATWW15542.g1

SYLTFIDLAKTYGDVFSLKMGMTDVVVLNSLDAVKEAFVKKGEDFAGRPKMT (1)

 

>AFPZ866519.x1   66% to C-helix of CYP17A2

TDISSEGGKDIAFADYSPTWKLHRKLFHSAIR (2)

 

>ATGI157309.b1 ATGN157240.b1 ATUP362994.b1 AFPZ866519.x1

GYASAQNLQSKVHESLEDTIAVFSKMEGQAVDLEDYIYQLVYNVICSAAFGTR (2)

 

>AFPZ866519.y1

YNMDDEDFDTLMKISKDTTETFGQGLLADVYPVLRFLPSS (1)

 

>AFPZ295620.x1

SVTANRKMTHQLMEIMQRHLEQHRESFDP (1)

IPLNEYQCTLLQITSVTSQITMIKAQKDAEEEGIQDIDSLTDTHLRQLIGDISF (1)

 

>ASWX154218.b2 I-helix to EXXR region ATGN97768.g1 AFSA255326.b2

47% to CYP1A7

AGTISTILTLRWAILYLAVHPEIQEKVAAELDSVVGRDRLPELSDREATPYTEAIFHEVMRMASMDPV

SLPHATTVDTTLS ()

GYQIPKGTWILPNLWALHHDPDTWGDPDVFRP

 

>AFPZ295620.y1 ASWX154218.g2 (very end + downstream seq)

DVFRPERFLDESGKPIPKPAALMPFG (2)

VGRRACPGEALGKADTFLLLGGLVQNFRFSIPEGEGPPDLTPDEIGQ

GSISIPYPYNVVMTCRK*

 

35% to Xenopus CYP17 and 36% to CYP1A6 and CYP1A7

MWLMTITVGLVTLILVKWLKDYVQRWRMPPGPFFWPVIGNLSCKYRGS (0)

SYLTFIDLAKTYGDVFSLKMGMTDVVVLNSLDAVKEAFVKKGEDFAGRPKMT (1)

TDISSEGGKDIAFADYSPTWKLHRKLFHSAIR (2)

GYASAQNLQSKVHESLEDTIAVFSKMEGQAVDLEDYIYQLVYNVICSAAFGTR (2)

YNMDDEDFDTLMKISKDTTETFGQGLLADVYPVLRFLPSS (1)

SVTANRKMTHQLMEIMQRHLEQHRESFDP (1)

IPLNEYQCTLLQITSVTSQITMIKAQKDAEEEGIQDIDSLTDTHLRQLIGDISF (1)

AGTISTILTLRWAILYLAVHPEIQEKVAAELDSVVGRDRLPELSDREAT

PYTEAIFHEVMRMASMDPVSLPHATTVDTTLS ()

GYQIPKGTWILPNLWALHHDPDTWGDPDVFRPERFLDESGKPIPKPAALMPFG (2)

VGRRACPGEALGKADTFLLLGGLVQNFRFSIPEGEGPPDLTPDEIGQGSISIPYPYNVVMTCRK*

 

>DE040433.1 Amphioxus genomic survey sequence. No introns, NEW 1/6/06

41% to 1B1 Danio, 40% to CYP1C1 fugu, 39% to 1A1 human, 39% to Xenopus 1A6, 1A7

trace file 630869645 632546376 539391436

MAAVATAALFGLSYLQVVLIAVLLVLVAAVVASSLRQNTPSLPPGPWGF

PVVGIFPALGSRPHHAFSRMAEKYGDVFRVKFGSRT

VIILNGIDMVKDACVKQSACFAGRPALYSFKQVKNGITFKTYSPSWVARKKVTVGALKGF

VNGRVGALTASAETMITEEAQELARVFLSKSGQPSNPEEYAHTAVANVVCALCFGKRYEH