Amphioxus (cephalochordate) ESTs, WGS and HTGS sequences
Branchiostoma floridae (many seqs)
Branchiostoma belcheri (1 seq)
D. Nelson
August 13, 2004; added CYP51 Jan. 19, 2005; many new WGS trace file sequences added; modified May 10, 2005
To retrieve trace archive files such as AFSA125350.y1, go to
http://www.ncbi.nlm.nih.gov/Traces/trace.cgi?size=1&cmd=retrieve&s=&m=&retrieve=Search&val=TRACE_NAME%3D%27ATUP646033.g1%27&file=trace&gz=on&fasta=on&scfrcf=scf&dopt=fasta
and enter the accession number in the search window as shown here:
TRACE_NAME='ATUP646033.g1'
The TRACE_NAME= prefix limits the search to the appropriate field. The quotes around the accession number are required.
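The retrieval link above is the trace.cgi address plus a URL-encoded query, so it can be rebuilt for any trace name. A minimal Python sketch, assuming the legacy trace.cgi endpoint accepts exactly the parameters in the link shown (that interface may no longer be served); `trace_retrieval_url` is a hypothetical helper, not an NCBI tool. Encoding turns `TRACE_NAME='ATUP646033.g1'` into `TRACE_NAME%3D%27ATUP646033.g1%27`.

```python
from urllib.parse import urlencode

# Endpoint copied from the retrieval link above; it may no longer be live.
TRACE_CGI = "http://www.ncbi.nlm.nih.gov/Traces/trace.cgi"


def trace_retrieval_url(trace_name: str) -> str:
    """Build a trace.cgi retrieval URL for one trace name (hypothetical helper)."""
    params = [
        ("size", "1"),
        ("cmd", "retrieve"),
        ("s", ""),
        ("m", ""),
        ("retrieve", "Search"),
        # TRACE_NAME= restricts the search to the trace-name field;
        # the single quotes around the name are required.
        ("val", f"TRACE_NAME='{trace_name}'"),
        ("file", "trace"),
        ("gz", "on"),
        ("fasta", "on"),
        ("scfrcf", "scf"),
        ("dopt", "fasta"),
    ]
    # urlencode percent-encodes '=' as %3D and "'" as %27 inside the value.
    return TRACE_CGI + "?" + urlencode(params)


print(trace_retrieval_url("AFSA125350.y1"))
```

Called with ATUP646033.g1, it reproduces the example link above character for character.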
CYP2 clan
>AFPZ293133.y1
exon 1 90% to AC150407.1 gene at 44817-45095
ATGN222242.g1, ATGI214615.g1, AFSA852354.g2, AFSA840251.g2
AFSA264268.g2, AFSA229077.g2, ATGN243325.b1, ATGI179003.g1
ATUP200634.x2,
Exon 2
ATUP200634.y2 mate pair to ATUP200634.x2
AFSA601420.g2, AFSA489196.b2, AFSA489196.g2
MESAVSFVSGLLANLTLQSILVLVLAFLVTYWLLGTGDRQKNLPPGPRGLPLLGN
LLSFRPSYLLSNLAAWRDKYGDVFCVRIANRLAVVLNG
()
HKAIQDALVKQPEVFSNRPPPFIDSAKDQG
AFSA125350.y1: walk with this one; it overlaps AFSA489196.g2 on the way to exon 3
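Walking here means stepping from one read to the next through overlapping sequence. As a rough illustration only, a greedy exact-overlap extension in Python; the actual walks were presumably done with similarity searches of the trace archive, which tolerate mismatches, and the reads in the example are invented toy sequences, not amphioxus data. `walk` and `longest_overlap` are hypothetical names.

```python
def longest_overlap(contig: str, read: str, min_overlap: int) -> int:
    """Length of the longest contig suffix that equals a read prefix.

    Overlaps shorter than min_overlap are ignored, and the whole read may not
    be consumed, so every accepted read extends the contig.
    """
    for k in range(min(len(contig), len(read) - 1), min_overlap - 1, -1):
        if contig.endswith(read[:k]):
            return k
    return 0


def walk(seed: str, reads: list[str], min_overlap: int = 4) -> str:
    """Greedily extend seed to the right using exact overlaps with unused reads."""
    contig = seed
    unused = list(reads)
    while True:
        best_k, best_read = 0, None
        for read in unused:
            k = longest_overlap(contig, read, min_overlap)
            if k > best_k:
                best_k, best_read = k, read
        if best_read is None:  # nothing overlaps the current end: stop walking
            return contig
        contig += best_read[best_k:]
        unused.remove(best_read)


# Toy reads (invented for illustration, not amphioxus traces).
print(walk("ACGTACGTTT", ["AGGATTTCAA", "GTTTCCAGGA"]))  # ACGTACGTTTCCAGGATTTCAA
```

Greedy extension picks the longest overlap at each step; real trace data would need a mismatch-tolerant search instead of `endswith`.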
>AFSA820329.b2
new exon 3 AFSA896985.b2 ATUP864470.b1 ATUP552587.x1
AFPZ806277.x1
AFPZ776469.x1 AFPZ908625.y1 ATGI225190.b1 AFPZ278931.y1
AFSA277571.g2
Exon 4
ATGN165786.g1 exon 4 ATUP951310.g1 ATUP682533.g1 ATUP194164.y2
ATGI58079.g1
ATWW119241.g1 ATUP547159.g1 AFPZ278931.x4 AFPZ278931.x1
ASWX119777.b2 (joins exon 3 of AFSA820329.b2)
ASWX119777.g2: exon 2 seq with frameshift; mate pair of ASWX119777.b2
AFSA254465.b2 exon 2 seq
ATUP194164.x2 mate pair of ATUP194164.y2,
AFPZ526846.y1
HKAIQDALVKQPEVFSNRPPPMLDSAKDQG
GVAMSEYGEDWKVKRRIGLT
ALRQFGMGKRSLEGKITEEARILCDVLAEKNGTATDMSLLLSNAVSNVICAMSF
GERFEHNDMEFQRLMRLMSEMVGGSGGNAGSSISRFIPLVRKLPFFKKGLERRV
KMSLEVVDFIKSKIKEHKETFDPADIRDIIDVYLMETQQQTPDDADRTITEMGMINTMRD
LFIAGAETTATTLKWGLLYLARHLEVQRKVQDEIDREFGASPPTLSQRGKLPYTEATILE
IQRIRPIAPLAVPHTTSTATVLHGFDIPADTFVIPNLWSAMMDPAVAPDPETFNPDRFLD
EDGTVVRPEWLIPFSL
GRRQCLGEQLAKMELFLFLTTLLQHFTFKLPDGAPALSMEGSLGIVLAPKAYQICAVPRDN*
>AFPZ869380.y1
AFSA270651.b2 (4 aa diffs to ATGN165786.g1) exon 4
GRRQCLGEQLAKMELFLFLATLLQHFTFKLPYGAPAPSMEGSMGIVLAPKAYQICAVPRDN*
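Counts such as "4 aa diffs" can be checked by comparing the two exon 4 peptides position by position. A small Python sketch, assuming the sequences are already aligned without gaps (true here, since both have the same length); the peptides are copied from the first entry's exon 4 (the group that includes ATGN165786.g1) and from the AFSA270651.b2 line, and `compare_aligned` is a hypothetical helper. The terminal stop `*` is dropped before computing identity.

```python
def compare_aligned(a: str, b: str) -> tuple[int, float]:
    """Return (substitutions, percent identity) for two gap-free aligned peptides.

    A trailing stop '*' is removed from both sequences before comparison.
    """
    a, b = a.rstrip("*"), b.rstrip("*")
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for x, y in zip(a, b) if x != y)
    return diffs, 100.0 * (len(a) - diffs) / len(a)


# Exon 4 peptides copied from the two entries above.
atgn165786_exon4 = "GRRQCLGEQLAKMELFLFLTTLLQHFTFKLPDGAPALSMEGSLGIVLAPKAYQICAVPRDN*"
afsa270651_exon4 = "GRRQCLGEQLAKMELFLFLATLLQHFTFKLPYGAPAPSMEGSMGIVLAPKAYQICAVPRDN*"
diffs, identity = compare_aligned(atgn165786_exon4, afsa270651_exon4)
print(diffs, round(identity, 1))
```

It reports 4 substitutions, matching the note on AFSA270651.b2.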
>ASFW203349.g2
WGS exon 1 94% to AC150407.1 gene at 44817-45095
AFSA277180.g2, AFSA337814.g2, AFSA337814.b2
MESAVSFVSGLLANLTLQSTLVLVLAFLVTYWLLGAGDRQKNLPPGPRGLPLLGNLLSF
RPSHLLSNLAAWRQQYGDVFCVRIANRLAVVLNG
>AFPZ80615.b2
ASWX143119.b2 ATUP19565.y2 ATUP590430.x1
ATUP871195.g1
AFPZ617150.x1
ATGI45147.b1 AFPZ676420.b2 ATUP848153.y1 (poor seq)
New exon 3 seq
Exon 4
ATGI75778.b1 ATWX106000.g1 ATWX78968.b1 ATUP879942.b1
ATUP25175.g1
AFSA516244.g2 AFSA298646.b2 AFPZ676420.b2 (joins with exon 3)
ATGI213326.g1 overlaps ATUP871195.g1 exon 3 and goes 800 bp upstream.
Has an exact match to exon 2 of the first AC150407.1 seq.
Note: there are two seqs with the exact same aa seq but one silent nucleotide difference. These are probably from different genes.
ATGI213326.g1, AFPZ69924.g2, ASWX145321.g2 and ATUP276953.x2 are 100% identical in the exon 2 region.
HKAIQDALVKQPEVFSNRPPAVVDSANDQ
GVVMAQYGEGWKVKRRLGLTALRQFGMGKRSLEGKITEEARAVCDILAEK
DGAATDISLLLSNGVSNVICSMSFGERFEYNDTEFQRLMRLMSELVTGSAISR
FNPYVRKLPFIKKGVESRMKMAKEITEFIKAKIKEHKDTFDPADIRDIIDVYLMETQQQI
PDDGDRTITEMGMINTMRDLFIAGAETTATTLKWGLLYLARHLEVQRKVQDEIDQEFGSS
PPTLSQRGKLPYTEATILEIQRIRPIVPLSVPHTTSAATVLHGFDIPANTFVIPNLWSAM
MDPTVAPDPETFNPDRFLDEDGTHVRPEWFIPFSL
(1)
GRRQCLGEQLAKMELFIFLTTLLQHFTFKLPDGAPAPSMDGSLGVVLAPDPYQICAIQRD
>AC150407.1
two linked genes 5000-6800 region and 44000-48000 region
AFSA41261.x1
WGS exon 1 96% to AC150407.1 gene at 44817-45095
ATUP33911.b1, ASWX26139.b2, ATUP603318.x1, AWXX12138.b1, AFPZ106727.x1
ATWW205625.g1, ATWW98645.b1, AFSA387158.g2, AFSA192393.g2, ATGN203140.g1
ATUP879942.g1, AFSA748020.g2, ATGN123505.b1, ATUP723083.y1, AFPZ103877.x1
APWS99281.g1, ATUP590430.y1 (mate pair to exon 1 sequence ATUP590430.x1)
ATUP871195.b1 (mate pair to exon 1 sequence ATUP871195.g1)
Note: ATGN123505.b1 overlaps with AFSA940316.g2 to join exons 1 and 2.
AC150407.1 is missing exons 1 and 2 (sequence gap).
Exon 2 WGS seqs = APNK3267.b2, AFPZ279007.x4, AFPZ279007.x1, ATUP615432.y1, ATWX107943.g1, ATGN209312.b1, ATUP704177.b1, AFSA940316.g2
Exon 2 from ATUP615432.y1 (overlaps exon 3)
The next 8 are all exon 2, differing from AFPZ80615.b2 exon 2 at one nucleotide:
AFPZ279007.x4 AFPZ279007.x1 ATWX107943.g1 AFSA940316.g2
APNK3267.b2 ATUP615432.y1 ATGN209312.b1 ATUP704177.b1
Exon 3 WGS sequences ATUP603318.y1 (mate pair to exon 1 sequence ATUP603318.x1)
ATWW25636.b1, AWYB3861.g1, ATUP183373.b1, ATUP688580.x1, ATUP664516.g1
ASWX26139.g2 (mate pair to exon 1 sequence ASWX26139.b2)
ATUP239851.g1, ATWX22588.b1 ATUP615432.y1 ASFW157059.g2 ATWW157810.g1
AFPZ279007.x1 AFPZ279007.x4 ATUP704177.b1
Exon 4
AFPZ919116.b2 ASWX175598.x1 ATGN296415.g1 ASFW157059.g2
MESAVAFASGLLANLTLQSTLVLVLAFLTTYWLLGAGGRQKNLPPGPRGLPLLGNL
LSFRPSRLLSNLAAWRQQYGDVFCVRIANRLTVVLNG
HKAIQDALVKQPEVFSNRPPAVVDSANDQ
5313
GVVMAQYGEGWKVKRRLGLTALRQFGMGKRSLEGKITEEARAVCDILAEKDGAATD
ISLLLSNGVSNVICSMSFGERFEYNDTEFQRLMRLMSELVGGSAISRFNPYVRKL
PFIRKGVESRVKMSMEIVEFIKLKIKEHKETFDPADIRDIIDVYLMETQQQTPDDGDR
TITEMGMINTMRDLFIAGAETTATTLKWGLLYLARHLEVQRKVQDEIDREFGSSPPTLS
QRGKLPYTEATILEIQRIRPIVPLSVPHTTSTATVLHGFDIPANTFVIPNLWSAMMDPTV
APDPETFNPDRFLDEDGTLVRPEWFIPFSL
(1) 6266
6618
GRRQCLGEQLAKMELFIFLTTLLQHFTFKLPDGAPAPSMDGSLGVVLAPNPYQICAVPRDN* 6803
>AC150407.1
two linked genes 5000-6800 region and 44800-48000 region
WGS exon 1 = AFPZ47185.g2, APWS114129.b1, AFSA346565.b2, APNK34495.b3
WGS exon 2 = ATUP592730.y1
ATWX46513.g1, ATGN30351.b1, ATGN26143.g1, ATGN270605.g1, ATUP890454.x1
APWS24002.b1, ATGI65252.g1, ATGN86876.g1, ATWW134155.g1, AFPZ623416.g2
Exon 3
ATGI94162.b1 ASWX162986.g2 AFPZ47185.b2
AFPZ855725.x1 AFPZ929344.x1
ASWX162986.b2
ATGI65252.g1
Exon 4
ASWX162986.b2 ATUP592730.x3 ATUP592730.x1 AFSA165640.b2 AFSA319547.b2
ATWW189487.g1
44817
MESVVPFASGLLANLTLQSTLVLVLAFLTTYWLLGAGGRQKKPPPGPRGLPLLGNL
LSFRPSRLLSNLAAWRQQYGDVFCVRIANRLAVVLNG 45095 (2)
46004
HKAIQDALVKQPEVFSDRPSPFRFSDKDQ 46090 (1)
46542
GVVMAQYGESWKVKRRLGLTALRQFGMGKRSLEGKITEEARAVCDILAEKDGA
AMDVSLLLSNGVSNVICSMSFGERFEYNDEEFQRLMRLMSELVSAGGISRFIPLVRKLPF
LNEGSKNRAKMSMEIVEFIKVKIKEHKETFDPADIRDIIDVYLMETQQQTPDDVDR
TITEMGMIGTVRDLFIAGAETTATTLKWGLLYLARHLEVQRKVQDEIDREFGSSPPTLS
QRGKLPYTEATILEIQRIRPIVPLSVPHTTSAATVLHGFDIPANTFVIPNLWSAMMDSAV
APDPETFNPDRFLDEDGTLVRPEWFIPFSL (1) 47495
47838
GRRQCLGEQLAKMELFIFLTTLLQHFTFKLPEGAPAPSMDGSLGVVLAPKPYQICGVPR* 48017
>AC150409.1 Branchiostoma floridae, very similar to above clone
45% to 2U1 fugu
AFPZ598379.y1
AFPZ177295.y01 AFPZ177295.y1 ATUP642958.x1
ATGI278297.g1
ATUP598704.y1 ATUP909233.y1
AFSA820152.g2 exon 3 with some frameshifts, 97% to AC150407.1 at 46000; might be an allele
exon 3,4
ATWW157810.g1
exon 4
ATUP610859.x1 ATWX26371.g1 ATUP919332.x1 ATWW121802.b1 ATGI269677.g1
ASFW189763.g2
ASFW131684.g2 AFSA849352.b2 AFSA173661.g2 AFPZ99561.y1
10750
MESAVAFASGLLANLTLQSTLVLVLAFLTTYWLLGAGGRQKNLPPGPRGLPLLGNLLSFR 10571
10570
PSRLLSNLAAWRQQYGDVFCVRIANRLTVVLNG 10472 (2)
8793
HKAIQDALVKQPEVFSDRPSPFRFSDKDQ 8707 (1)
8256
GVVMAQYGESWKVKRRLGLTALRQFGMGKRSLEGKITEEARAVCDILAEKDGTATDIS 8083
8082
LLLSNGVSNVICSMSFGERFEYNDAEFQRLMRLMSELVSAGGISRFIPLVRKLPFFNEGS 7903
7902
KNRAKMSMEIV 7870
EFIKAKIKEHKETFDPADIRDIIDVYLMETQQQTPDDVDRTITEMGMIGTVRDLFI
AGAETTATTLKWGLLYLARHLEVQRKVQDEIDREFGSSPPTLSQRGKLPYTEATILEIQR
IRPIVPLSVPHTTSAATVLHDFDIPANTFVIPNLWSAMMDPTVAPDPETFNPDRFLDEDGTLV
RPEWFIPFSL
(1)
GRRQCLGEQLAKMELFTFLTTLLQHFTFKLPDGAPAPSMDGSLGVVLAPKPYQICAVPRDN*
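The genomic coordinates written beside the AC150407.1 and AC150409.1 peptides can be sanity-checked: an inclusive span of n nucleotides should encode n/3 residues, and descending coordinates (as in AC150409.1) indicate the reverse strand. A Python sketch, assuming each coordinate pair brackets whole codons of the peptide on that line and that a terminal `*` stands for the stop codon; `span_length` and `consistent` are hypothetical helpers, and the fragments are copied from the entries above.

```python
def span_length(start: int, end: int) -> int:
    """Inclusive length of a genomic span; works for either strand."""
    return abs(end - start) + 1


def consistent(start: int, end: int, protein: str) -> bool:
    """True if the span holds exactly one codon per residue ('*' counts as the stop codon)."""
    return span_length(start, end) == 3 * len(protein)


fragments = [
    (44817, 45095, "MESVVPFASGLLANLTLQSTLVLVLAFLTTYWLLGAGGRQKKPPPGPRGLPLLGNL"
                   "LSFRPSRLLSNLAAWRQQYGDVFCVRIANRLAVVLNG"),
    (46004, 46090, "HKAIQDALVKQPEVFSDRPSPFRFSDKDQ"),
    (47838, 48017, "GRRQCLGEQLAKMELFIFLTTLLQHFTFKLPEGAPAPSMDGSLGVVLAPKPYQICGVPR*"),
    (10750, 10571, "MESAVAFASGLLANLTLQSTLVLVLAFLTTYWLLGAGGRQKNLPPGPRGLPLLGNLLSFR"),
    (10570, 10472, "PSRLLSNLAAWRQQYGDVFCVRIANRLTVVLNG"),
]
for start, end, protein in fragments:
    strand = "+" if end >= start else "-"
    print(start, end, strand, len(protein), consistent(start, end, protein))
```

Every fragment listed passes, which supports reading these coordinate pairs as codon-exact exon boundaries.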
>50% to CYP2U1 zebrafish ASFW117295.g2 AFSA812739.g2 ASFW150452.g2 New seq
TTWKGGVFFLPRALKPRRGRPKVREEIAREFASPVPPWSERERLPYTEATIMEIQGIRPIVPLNIFHGN
TSATTLYGYDIPAGTYIIPSLWSAMMDPKVTPEPEEFRPERFLDDEGKVVKPEWFLPFSA
(1)
GRRRCLGEQLAKMELFLFYTSLLQHFTFKLPDGAPAPPMDGSLGFVLSPPAYDICAVPRHSS*
>62% to 2U1 zebrafish
AFSA220461.b2 ATWX77582.b1 APWS173478.g1 new C-term exon
GRRICLGEQLAKMELFLFLTSLLQQFTFKLPEGAPKPDMCGEIGATLLPKPYNIQAISRKK*
>DE198043.1
genomic survey sequence. NEW 1/6/06
Length=653
43% to 2U1 Fugu
VQTTVRAELDRVLMRGESVSAAHRRALPVTEATVMEILRLATPSPLNFRATACDVTLRG 476
YRLPEGTWTLMNCWAVHRDPLQWTEPDTFDFTRFLDREGRVTTPPAWRPFGIGTRS 308
>DE197854.1
genomic survey sequence. NEW 1/6/06
DE000432
Length=702
51% to 2U1
71
SVLHRYIIPKDTIVFAGQWSVHHDPELFPEPDMFDPERFLDDEGNFKNIEYFMPFSM (1) 241
375
GPRSCMGQPLAEVQLFLLFTNLMQNFKLKLPEGAAKPSSEGVMGITLAPKPFDLV 539
>DE017611.1 genomic survey sequence. NEW 1/6/06
Length=625
46% to 2K11
GVLFAAYGPDWKHQRKFALMTLRDFGVGKRSLEGKIR 373
EEADALIQEVESKNGLPFDIKQMLPNAVSNVICSIAFGNRFEYGDPEFLRLIGLLNAAVE 553
AQPSRDILPNIHPVFRRLPFGS 619
>DE195161.1
genomic survey sequence. NEW 1/6/06
67% to 2N11
LFLAGTDTTSTTLRWALLYMILHPDIQEKVQQEIDSVLGPNQEPEMAHR 322
>DE189345.1
genomic survey sequence. NEW 1/6/06
61% to 2N11
LFLAGTDTTATTLHWAVLFMILHPDIQQKVQQEIDSVLGPNQDPSMEHR
>DE013036.1 genomic survey sequence. NEW 1/6/06
Length=592
44% to 2Z2
VIYDLFFTGAETSSTCLRWAVFLMAVYPDVQARIHREVDTVLGSDGEVTLDKRAALPFL 386
DATISEVYYLNS 350
>DE012415.1 genomic survey sequence. NEW 1/6/06
58% to 2N11
AGTDTTATTLHWAVLFMILHPDIQQKVQQEIDSVLGPNQDPSMEHR
>CF918864
BI377274 Amphioxus 5-6 hrs cDNA 45% to 2U1 fugu
RVRRDATVSLAHRPEMPYTDAFLHEVLRIRPPGPLSVPHMAGPGATLNGYEIPQNTQVYA
NLWSLHMDPEYWPEPERFDPTRFIGPDGKVLPNPPSYAPFSLGRRACPGKQLAKSEAFLF
LVTMVQRFSFKLPEGAPVPPMDGVMGFSLAAQPHSLCAISRN*
>CF918826
BI383662 Amphioxus 5-6 hrs cDNA 51% to 2U1 fugu
AAESGTRPDYIIPQDAMIFVNLWSVHMDPQLFPDPNTFRPERFLDQDGNFVKQAVIPFGI
GPRVCLGEQLAKMEVFMLFVSLMQRFTFHLPEGAPEPSMLGKLASAINVPCPFELCAVAR*
>BI387982
Amphioxus 26hr cDNA library 48% to 2N2 zebrafish
NGKPVPKPAALMPFSA
GRRACPGEAVXKADTFLLLGGLVQNFRFSIPEGEGPPDLTPDDKTGGDTCIPYPYKVVMSCRKCML*
>BI388387.1
C-helix to mid
EGANYSDGCXGVIFAPYGSFWKEQRKFTLMSLRDFGFGNRSIYGKIVEESQVLQSVIAKF
DGQPFSTHRLLHNAVANVTCNILFGDRWEYDDPLFQRMMDALNYMVSTNVFAVPQNFIPF
TRYIPGWAGRLEPWLKKFLSIMGYLREELDKHKVIFDPTDLRDFINTYLLEIQNQ
>BI387848
Amphioxus 26hr cDNA library
52% to 2U1 fugu, 50% to 2U1 mouse, 75% to BI377261
AFSA235046.g2 ATGI91479.g1 ATUP593811.y1 AFSA108094.b2
RHASDLLLDGTETTGNTLLWALLYMTQNPTIQHK
(0)
VQQELDAVVGESQPTLSHRSQLPYVNACLLETMRIRTLVPLAVPHATTQDVTIQEFDIPQGTQ
(0)
VLPNLYSLHMDPTYWPDPDRFDPERFLDAEGNVINKPQSFMPFGG (1)
GRVCLGEQLARMELFLFFSTLLQSFTFKTPEGAPPPKTDGGLGITWTP
>AFPZ7602.y1
ATGI55268.b1
VILNLYSLHVDPTYWPDPERFDPERFLDAEGNVINKPESFMPFAG
(1)
>ASWX176511.y1 87% to BI377261
ATUP829661.g2
AFPZ642936.g2 ATUP921353.y1 ATWW61130.b1
VLTNLHSLHMDPAYWPDPDRFDPERFLDAEGKVINKPKSFMPFSG
(1)
>ATUP598105.y1 ATGN136393.b1 ATUP193767.y2 ATGI126577.b1
ATWW83807.g1
VHEELDAVVGESLPTLSHRSLLPYVNACLQEVMRIRPVGPLAIPHATTEAVRVRGYDIPKRTQ
(0)
VLLNLYSLHMDPAYWPDPDRCDPERFLDAEGNVINKPESFMPFGG
(1)
GRVCLGEQLARIELFLFFSTLLQSFTFKTPEGAPPPNADGILGLTLAPHPFQLCAIPR*
>ATUP937768.y3 ATUP937768.y1 ATUP905825.y1
ATWW1274.b2
AFSA664077.g2
VLFNLYSLHMDPAYWPDPDRFDPERFLDAEGNVINKPESFMPFGG
(1)
GRRVCLGEQLARMELFLFFSTLLQSFTFKTPEGAPPPNTDGIFRLTLKP
HPFQLCAIRR*
>AFSA690405.b2
exon 1 and part of exon 2 89% to AFSA636542.b2
walk to ATWW106344.b1 APWS97989.g1
Note: this is probably a poor version of APWS97989.g1; downstream the sequences are the same.
MAILFSWIVESVLEILQISGLTLQTILVFCVPFLLACTF*KRPRNLPXYPAGRVPVLGH
849
LLALGRAPHLKLTXWRRQYGDVFTVRMGMEDVVVLNGYTAVRDALVDRSELFASRPPNYL
669
FDLTVGFGE
()
DIVTARWGSQFX
QRRRL
>ATUP47463.b1
exon 1,2 87% to AFSA636542.b2
MAAVVSWISESVQEIPQISGLTLQTCLVFXAAFLLTCALXRRPRNLPPYPAGHVPVLG
791
HLLALGRAPLLKLTAWRRQYGDVFTVRMGMEDVVVLNGYTAVKDALVDRSELFASRPPNY
611
LFDSSVGFGK
()
DIGAARWGTGLKQRRRFATAALKHLGMKVGTGSVEDNIRQEASCLRKR
(0)
>ATGI68302.b1
exon 1 82% to ASWX66916.b2
MAVIVSWIAELVWEIFQISGLTIQTFLVFCVVFLLAYVLLKRHKNLPPYPAGRVPVLGHL
326
LALGREPPLKLTAWRRQYGDVFTVRMGMEDVVVLNGYTAVKDALVDRSELFASRPPNYLL
506
DAIVGCGK
()
>AFPZ28428.y1
exon 1 79% to AFSA636542.b2
MATAVFRWIIQSVQDTLQIYGLNLQSLLVFCTAFVLACALLKRSPNLPPYPAGRVPVLG
304
HLLALGRAPHLKLTAWRRQYGDVFTVRMGMEDAVVLNGYTAVKDALVDRSELFASRPPNY
484
LFDLTVDSGK
()
>ASWX66916.b2
exon 1 89% to AFSA636542.b2
walked to AWXX13027.b1 ATUP266482.b1
mate pair = ASWX66916.g2 exon 3
AFSA35511.g2 exons 2,3 ATGN165304.g1 ASFW57081.b2
walked upstream to ATUP266482.b1, which = ASWX66916.b2; join seqs.
MAAVVSWIAESVLEILPMSGPTLQTFLVFCVAFLLTWALLRRPRNLPPYPAGRVPVLG
489
HLLALGRAPHLQLTAWRRQYGDVFTVRMGMEDVVVLNGYTAVKDALVDRSELFASRPPNY
669
LFDSSVGFGK
()
DIGAARWGTELKQRRRFATAALKHLGMKVGTGSVEDNIRQEASCLRNR (0)
IAEYHGQPFGISNDMKVAVANVICSMAFGRRYGYEDETFRELSEAIRNLLAEIGSGQFISVFPLLRFVPG
>ATWX43498.b1
exons 2,3 very similar to AFSA35511.g2
walked to ATUP237571.b1: no obvious exon; kept walking to ASWX77262.g2 = exon 4
ASFW107932.b3 exon 4; walked to AFPZ187653.x1 exon 4,5 ASWX45971.b2
DIGAAPLGDRVEAEKRFATAALKHLGMKVGTGSVEDNIRQEASCLRKR
(0)
IAEYHGQPFAISNDMKVAVANVICSMAFGRRYGYEDETFRELSEAIRNLLAEIGSGQFISVFPLLRFVPG
()
ACKEVLKHLSKIHEVLWDEIARHRENFDRENPRDFLDFCLLELEQREK
VEGLTEENVLYMAQNLFLAGTDTTANTLLWSLLYMTLNPDIQNK
(0)
VHEELDA
>AFPZ601018.b2 ATGN133651.b1 ATGN143242.b1
walked to ATUP71680.y1 (exon 5)
walked to ATUP705359.g2 (exon 4)
walked to ATGI77993.b1 (exon 3)
walked to AFPZ24940.g2 (exon 2)
walked from exon 7 downstream to AFSA524984.b2 to try to find a mate pair to exon 1; did not work
tried finding more exon 3,4 hits to look for more mate pairs
ATUP206044.x2 mate pair = ATUP206044.y2 = exon 1
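The mate-pair notes follow a visible naming pattern: ATUP206044.x2/.y2, ASWX119777.b2/.g2, ATUP871195.g1/.b1. A Python sketch that infers a read's presumed mate from its trace name, assuming (from these examples only, not from trace archive documentation) that mates share the template name and read number and differ only in the direction letter, x/y or b/g. `mate_of` and `mate_pairs` are hypothetical helpers.

```python
import re
from typing import Optional

# Assumed convention, inferred from pairs noted on this page: x<->y and b<->g.
MATE_LETTER = {"x": "y", "y": "x", "b": "g", "g": "b"}
TRACE_RE = re.compile(r"^([A-Z]+\d+)\.([a-z])(\d+)$")


def mate_of(trace_name: str) -> Optional[str]:
    """Return the presumed mate-pair trace name, or None if the name does not fit."""
    m = TRACE_RE.match(trace_name)
    if m is None or m.group(2) not in MATE_LETTER:
        return None
    template, letter, number = m.groups()
    return f"{template}.{MATE_LETTER[letter]}{number}"


def mate_pairs(names: list[str]) -> set[tuple[str, str]]:
    """Pairs of names in the list whose presumed mate is also in the list."""
    present = set(names)
    pairs = set()
    for name in names:
        mate = mate_of(name)
        if mate in present:
            pairs.add(tuple(sorted((name, mate))))
    return pairs


print(mate_of("ATUP206044.x2"))
print(mate_pairs(["ATUP590430.x1", "ATUP590430.y1", "AFPZ279007.x4", "AFPZ279007.x1"]))
```

Re-reads such as AFPZ279007.x1 and AFPZ279007.x4 share a direction letter, so under this rule they are not treated as mates.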
MTGAVQWIADSVQEILQISELTLQTFLVLCSTFLLACVVFNRSRSRNL
PPYPAGRVPVLGHLLALGRAPLLKLTAWRRQYGDVFTVRMGMEDVVVLNGYTAVQDALVD
RSELFASRSPFYYLFDALFAFGK
(1)
DIISARWGSGFRQKKRFATTVLKNLGMRVGRGSIEDSIREEASCLRNR
(0)
IAENNGQPFDIAHDVAVAVANIICSMAFGKRYDYEDETFRELTKAIATISIELGAGHIT
SVFPLLRFVPV (1)
VLYNHSHLYATVNRPIIKALEASSKVKNVMREEISRHREHLDRENPRDFLDLCLLELEQQE
KVEGLTEENVFHMAQDLFLGGTDTTANTLTWSLLYMTLNPDVQNK
(0)
VHEELDAVVGESLPALSHRSQLPYVNACLLETMRIRTIVPLASHATTQEVKVQGYDIPKGTQ
(0)
LMLTSPHMDPANWPDPDPFDPERFLDAEGNVIKKPESFMPFSG
(1)
GRRVCLGEQLARMELFLFFSTLLQSFTFKTPVGAPPPNTDGIPGLTFMPHPFQLLAIER*
>APWS97989.g1
exon 1 93% to AFSA636542.b2
ATUP196459.x2
AFSA321451.g2 ASFW202410.g2
Walked upstream to ATGI153668.g1 AFPZ313895.x1 mate pair AFPZ313895.y1 to try to find a mate pair in the C-term part
AFPZ313895.x1 mate pair AFPZ313895.y1 = AFSA636542.b2 seq exon 3; these two seqs are 95% identical
APWS97989.g1 exon 4; end of exon 4 = BI377261, join seqs.
BI377261 Amphioxus 5-6 hrs cDNA, 49% to 2U1 fugu, 75% to BI387848
AFPZ459499.y1
ATUP541153.g1 AFSA636542.b2
MAIIVSWIVESVLEILQISGLTLQTILVFCVAFLLACTFWKRPRNLPPYPAGRVPVLGH
585
LLALGRAPHLKLTAWRRQYGDVFTVRMGMEDVVVLNGYTAVRDALVDRSELFASRPPNYL
405
FDLTVGFGE
()
Missing exon 2
VAEYEGKPIDIAHGINVAVANVICSMTFGKRYDYEDETFRELSEAVVTIMSELGAGQIIS
VFPLLRFVPG (1)
ASYSVSAQLAKIQKVLREEMSRHREHLDRENPRDFLDFCLLELEQQEKVAGLTEENVLYMAQ
NLFFAGTDTTTNTLRWSLLYMALNPDIQKK
(0)
VQEELDAIVGESLPTLSHRSQLPYVNACLLETMRIRHIGPLAVPHATTDTVKVKEYDIAKGTQ (0)
VLPNLHSLHMDPAYWPDPERFDPERFLDAEGNVINKPESFMPFSG
(1)
GRRVCLGEQLARMELFLFFSTLLQSFTFKTPEGAPPPSTDGVF
GVTLTPHPFQLCAIPR*
>AFSA636542.b2
ATUP541153.g1 ATUP933964.y1
ATUP933964.x1 ATUP738986.y1
ATGI10244.g1
ATGN171873.g1 ATUP926693.b1 (exon 6) AFSA482736.g2 (exon 6)
AFSA726698.g2 (exon 6)
34% to 2N1
35% to 2D4
MAVIVSWIVESVLEILQISGLTLQTILVFCVAFLIACTFLLK
RPRNLPPFPAGRVPVLGHLLALGRAPHLTLTAWRRQYGDVFTVRMGMEDVVVLNGYTAV
KDALVDMSELFASRPPNYLFDLTVGFGE (1)
DIVTARWGSKFRQRRRFATTALRNLGMKVGTGSIEEKIREEAIRLRNR
(0)
VAEYEGKPIDIAHGINVAVANVICSMTFGKRYDYEDETFRELSEAVVTI
MSELGAGQIISVFPLLRFVPG
(1)
ASYSVSGQLAKIQKVLREEMSRHREHLDHENPRDFL
DFCLLELELQEKVAGLTEENVLYMTQNLFFGGTDTTTNTLLWSLLYMILNPDIQKK (0)
AQEELDAVVGESLPTLSHRSQLHYVNACLLEVMRIRH
IGPLAVPHATTDTVKVKEYDIAKGTQ
(0)
VLPNLHSLHMDPAYWPDPDRFDPVRFLDAE
GNVINKPESFMPFSG (1)
GRRVCLGEQLARMELVLFFSTLLQSFTFKTPEGAPPPSTDGIFGITLTPHPFQLCAIPR
>exon 1
ATGN171873.g1 APWS102929.b1 ATGI10244.g1
ATGGCTGTAATTGTCAGCTGGATAGTTGAGTC
CGTCCTGGAGATTTTGCAGATCTCCGGGCTGACTCTGCAAACAATTCTCGTCTTCTGTGT
GGCCTTCCTCATTGCGTGCACGTTCTTGTTAAAGCGCCCCAGGAACCTGCCACCTTTCCC
GGCAGGACGCGTGCCTGTTCTCGGGCACCTCCTCGCCTTGGGCCGAGCGCCTCACCTCAC
GCTGACGGCGTGGAGGCGGCAGTACGGGGACGTCTTCACCGTCAGGATGGGGATGGAAGA
TGTGGTGGTTCTGAACGGCTACACTGCCGTCAAGGATGCGCTCGTGGACATGTCCGAGCT
GTTCGCGTCCAGGCCGCCAAACTACCTGTTCGATTTGACAGTTGGATTCGGAGAAGGT (1)
>DIVT
exon 2 ATGI10244.g1
AGACATTGTTACTGCACGTTGGG
GGAGCAAGTTCAGACAGAGACGGAGGTTTGCTACCACGGCGTTAAGGAACCTCGGCATGA
AGGTCGGCACTGGCAGCATTGAAGAGAAAATCCGAGAGGAAGCTATACGTCTCCGCAACA
GGGT
>VAE
exon 3 ATUP738986.y1
AGGTTGCAGAATACGAGGGAAAAC
CTATTGATATCGCCCATGGTATCAACGTGGCGGTCGCGAACGTCATCTGCTCCATGACGT
TCGGAAAGCGCTACGACTACGAGGATGAAACGTTCCGGGAGCTCTCTGAGGCGGTTGTGA
CAATAATGTCTGAGCTTGGAGCGGGGCAGATTATCAGTGTCTTCCCCCTGTTACGGTTTG
TTCCAGGAGGT
>ASYS
exon 4
AGCCAGCTACAGTGTATCTGGACAACTGGCGAAGATCCAAAAGGT
GTTGAGGGAAGAAATGTCTCGCCATCGAGAACACCTGGATCACGAGAACCCACGAGACTT
CCTCGACTTCTGCCTGCTGGAGCTGGAACTGCAGGAAAAGGTGGCTGGTCTGACGGAAGA
GAACGTCCTGTATATGACACAGAACCTTTTCTTCGGTGGAACAGACACGACCACCAACAC
ATTGCTGTGGAGTCTACTCTACATGATTTTGAACCCAGACATCCAAAAGAAGGT
>AQEEL
exon 5
AGGCACAAGAGGAGCTTGATGCCGTTGTTGGTGAGAGTCTGCCCACCCTGTCCC
ACCGTTCCCAGCTGCACTACGTGAACGCCTGCCTGTTGGAGGTCATGAGGATCCGCCATA
TCGGGCCTCTTGCCGTTCCCCACGCCACCACAGACACGGTCAAAGTGAAGGAGTACGACA
TCGCTAAGGGAACCCAGGT
>VLP
exon 6 AFSA726698.g2 ATUP926693.b1 AFSA482736.g2
ATUP933964.y1
AGGTACTACCGAA
CTTGCACTCCCTCCACATGGACCCCGNCTACTGGCTTGATCCGGACC
GTTTTGACCCCGTAAGATTCCTGGACGCGGAA
GGGAACGTCATCAACAAGCCTGAGTCCTTCATGCCTTTTTCTGGAGGT
>GRR
exon 7 ATUP933964.x1
AGGCCGACGTGTGTGTCTTGGTGAGCAGCTGGCCAGGATGGAACTTGTCCTG
TTCTTCTCGACTCTACTGCAGTCCTTCACCTTCAAGACGCCAGAGGGCGCCCCTC
CTCCAAGCACTGACGGCATCTTTGGGATAACATTGACACCGCATCCGTTCCAGCTTTGTG
CAATACCACGTTAG
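Assuming the numbers in parentheses after each exon, (0), (1) and (2), are intron phases (how many bases of a split codon an exon ends with; "()" appears where this is not known) and that the exon nucleotide listings above include the flanking intron AG and GT dinucleotides, the phases can be recomputed from the sequence. Neither convention is stated on this page; both are inferences that fit the AFSA636542.b2 exons. A Python sketch that splices the end of exon 1 to exon 2 and translates them with the standard genetic code; `exon_core`, `splice` and `translate` are hypothetical helpers.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def exon_core(exon: str) -> str:
    """Strip the intronic AG (acceptor) and GT (donor) flanks, assuming they are shown."""
    seq = exon.upper()
    if seq.startswith("AG"):
        seq = seq[2:]
    if seq.endswith("GT"):
        seq = seq[:-2]
    return seq


def splice(exons: list[str]) -> tuple[str, list[int]]:
    """Join exon cores; record coding length so far mod 3 after each exon.

    This equals the intron phase only if the first exon starts on a codon boundary.
    """
    cds, phases = "", []
    for exon in exons:
        cds += exon_core(exon)
        phases.append(len(cds) % 3)
    return cds, phases


def translate(cds: str) -> str:
    """Translate complete codons; a trailing partial codon is ignored, unknown codons give X."""
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))


# End of exon 1 (from codon TTC onward, with its GT) and all of exon 2, as listed above.
EXON1_TAIL = "TTCGCGTCCAGGCCGCCAAACTACCTGTTCGATTTGACAGTTGGATTCGGAGAAGGT"
EXON2 = (
    "AGACATTGTTACTGCACGTTGGG"
    "GGAGCAAGTTCAGACAGAGACGGAGGTTTGCTACCACGGCGTTAAGGAACCTCGGCATGA"
    "AGGTCGGCACTGGCAGCATTGAAGAGAAAATCCGAGAGGAAGCTATACGTCTCCGCAACA"
    "GGGT"
)
cds, phases = splice([EXON1_TAIL, EXON2])
print(phases)          # [1, 0]
print(translate(cds))  # FASRPPNYLFDLTVGFGEDIVTARWGSKFRQRRRFATTALRNLGMKVGTGSIEEKIREEAIRLRNR
```

The computed phases agree with the (1) after FGE and the (0) after RNR in the AFSA636542.b2 peptide, and the translation matches its exon 1 tail and exon 2 sequence.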
Other closely related exons
>ATUP699472.x1
exons 6,7
VLLNVYSLHMDPAYWLDPDRFDPERFLDAEGKVINKPESFLPFGG
(1)
GGRVCLGEQLARMELFLFFTTLLQSFTFKPPEGASPPNADGILGLTLAPHPFQLSAIPR*
>AFPZ728456.y1
exons 5,6
VHEELDAVVGESLPTLSHRSQLPYVNACLQEVMRIRPVGPLAIPHATTEAVKVRGYDIPKRTQ
(0)
VLLNLYSLHMDPAYWPDPDRFDPERFLDAEGKVINKPDSFLPFGG
(1)
>AFPZ476483.b2
exons 5,6
VHEELDAVVGESLPTLSHRSQLPYVNAC
LQEVMRIRPVGPLAIPHATTEAVKVRGYDIPKRTQ
(0)
VLLNLYSLHMDPAYWPDPDGFDPEXFLDAEGKVXHKPES
>exons 5,6
AFSA16336.x4 AFSA16336.x1 AFPZ506410.x1 APNK80508.g2 ASWX68286.g3
ATUP343092.y1
ATGN182700.g1 ATUP756295.y1 AFPZ866552.y1 ATUP443435.g1
ASFW36405.b2 AFSA625448.b2 AFSA427303.b2
AFSA716480.g2 AFPZ471003.x1
VQQELDAVMGASLPSLSHRSKLPYVNACLMETMRIRTLLSVILHATAQEVKVQGYDIPKGTR
(0)
VLMNMHSLHMDPAYWPDPDRFDPERFLDAEGNVINKLPSFMPFSG (1)
AGGTACAGCAGGAGCTTGATGCC
GTTATGGGCGCGAGTCTGCCCAGCCTGTCCCACCGCTCCAAGCTGCCCTACGTGAACGCC
TGCCTGATGGAGACCATGCGGATCCGCACTCTTCTGTCTGTCATCCTTCACGCCACCGCG
CAGGAGGTCAAAGTGCAGGGATACGACATTCCTAAGGGAACTCGGGT
AGGTGTTGATGAACATGC
ACTCCCTCCACATGGACCCCGCCTACTGGCCTGACCCGGACCGGTTTGACCCCGAAAGGT
TTCTGGACGCGGAAGGGAACGTCATCAACAAACTTCCATCCTTCATGCCTTTTTCAGGAGGT
>ATGI42736.b1 ATGN217089.g1 ATUP49594.g2 AFSA786188.b2
AFSA126109.g2 AFPZ657783.y1 ATUP738387.x1
AFPZ495923.y1 duplicated exon 5 (pseudogene), exon 6 and part of exon 7
VHEELDAVVGASLPALSDRSQLL
YVNACLLETMRIRTLVPVSLPH
VQQELDAVVGASLPALSHRSQLPYVNACLMETMRIRTLLSVILHATAQEVKVQGYDISKGTR
(0)
VLMNMHSLHMDPAYWPDPDRFDPERFLDAEGNVINKLPAFMPFSG (1)
GHRVCLGEQLARMELFLFFSTLLQSFTIKTPEGAPPPNTDGIFGLALKPHPFQLCAIPR*
AGGTGTTGATGAACATGC
ACTCCCTCCACATGGACCCCGCCTACTGGCCTGACCCGGACCGGTTTGACCCCGAAAGGT
TTCTGGACGCGGAAGGGAACGTCATCAACAAACTTCCAGCCTTCATGCCTTTTTCAGGAGGT
>AFSA241515.g2 AFPZ140710.y1 APWS45577.b1
ASWX65492.b2
ATUP320554.x1 ATGN323264.g1 ATGN284296.b1
ATGI170827.b1
ATUP12995.x2 ATUP716729.x1 AFSA152443.b2
VLMNMYSLHMDPVYWPDPDRFDPERFLDAEGNVINKPESFMPFGG
(1)
GRRVCLGEQLARMELFLFFSTLLQSFHFKTPEGAPAPCADGIFRMTVTPHPFELCAIPV*
>AFSA83521.b2
VLMNMYSIHMDPVYWPDPDRFDPERFLDAEGNVINKPESFMPFGG
(1)
GRRVCLGEQLARMELFLFFSDLLQSFTFKTPEGAPAPCADGIFPMTLTPXPFELCAIPR*
>APWS92234.g2 ATWX24634.b1 ATGN357284.b1
ATUP895861.x1 ATGI42736.g1
ATGI104268.g2 ATWW86466.b1 ATWW117588.b1
ATUP559927.b1 AFPZ509619.y1
ATGN267193.b1
IHEELDAVVGESLPALSHRPQLPYVNACLLETLRIRTLV
XXXXHATTQDVKVQQFDIPKGTQ
(0)
VLPNLHSLHTDPAYWPDPDRFDPERFLDAEGNVINKPESFMPFSG
(1)
>ATUP710771.b1
1 aa diff to APWS92234.g2
AFSA525510.b2
VLPNLHSLHTDPAYWPDPDRFDPERFLDAEGNVINKPESFMPFGG (1)
>AFPZ185379.x1
ATGI56471.b1 ATGN311806.g1 ATUP672818.b1 AFSA909926.b2
AFSA523356.b2 AFSA330395.g2 AFSA330395.b2
ATGI93799.g1
APWS110835.b1 ASWX76430.g2
GTQCKLHACRSTLEDPLEQQAKLSSLTEENVLHMAGDLFLAGTETTTNTLQWSLLYMTLNPDIQNK
(0)
VQEELDAVVAESLPTLSHRSQLPYVNACLLEVMRIRTLIPAVRHVTTQEVKVQEYHISMGTW
(0)
VLANLHSLHTDPAYWPDPDRFDPERFLDAEGNVINNPKSFMPFGG
(1)
GRRACLGEQLARMELFLFFSTLLQSFTFTTPEGALPPNTDGVFGLTLVPHPFQLCATPR*
>ATUP912010.y1
ATGI128166.b1 ASFW164761.b2 AFSA840082.g2
AFSA174046.g2 AFSA315286.g2 AFPZ159110.y1
ATWW201417.g1
AFSA778163.g2
VLVNLHSLHMDPVYWPDPDRFDPERFLDAEGNVVNKPQSFMPFAG
>ATUP680104.g1
VILNLHSVHMDPAFWPDPDRFDPDRFLDAEGNFINKPESFMPFSA
(1)
>ATWW125683.g1
VHEELDAVVGASLPTLAHRSQLPYVNAFLMEVMRIRYVGPLGVPHATTAAVKVQEYDIPEGTQ
(0)
IILNLHSVHMDPAFWPDPDRFDPDRFLDAEGNFINKPESFIPFSA
>APWS102434.g1
ATWW177217.g1
KVQGYDIPKGXX
VLMNLYSLHMDPAYWPDPDRFDPERFLDAEGNLINKPESFMPFG
(1)
>ATWW233361.g1
ATUP551452.y1
VHEELDAVVGESLPALSHRSQLPYVNACLMEIMRIRYVGPLSVPHATTAPVKVQEYDIPKGTQ
(0)
VIVNLHSLHVDPAYWPDPDRFDPDRFLDAEGNFINKPESFMPFS
>AFPZ859823.x1
AWYB2850.g1 ATWW63772.b1 AFPZ870007.y4 AFPZ870007.y1
mate pair AFPZ859823.y1 = exon 7 ATUP557464.y1
ATUP557464.y1
ASFW50972.b2 AFSA305932.b2
AFPZ122560.y1 ATUP820771.x1
Almost 51% to CYP2U1 human
VWTKIQFSNIPLLITIVSGKLVTRFLFPVLFLPLVNR ???? uncertain
AFMEVLKQNSRVHEVLWDEIARHRETFDSENPRDFIDFCLLELEQQE
KVDGLTEENVMYMAQDLFFAGTETATNTLLWSLLYMTLNPGVQQK
(0)
VHEELDTVVGASLPTLSHRSRLPYANACLMETMRIRHIAPLIIPHATTDTVRVQEYDIPEGTQ
(0)
VLMNMYSLHMDPAYWPDPDRFDPERFLDAEGNVINKPESFMPFGG
(1)
GRRVCLGEQLARMELFLFFSTLLQSFSFKTPEGAPAPCADGIFRMTLTPHPFELCAIPR*
>ATWW225973.b1 CYP2U like ATUP29908.b1 ATUP481728.g1
APNK56784.b2
AFSA784029.g2
ATUP411423.g1
FLDSDGKVVTRPESFMPFST
(1)
GRRVCLGEQLAKMELFLLFSSLLKHFTLKLPEGAAAPSTDGIMGFFYVPPKVNMCITKR*
>ATGI187647.g1
MWLMTITVGLVTLILVKWLKDYVQRWRMPPGPFFWPVIGNLSCKYRGS (0)
>ATGI151113.b1 ATWW15542.g1
SYLTFIDLAKTYGDVFSLKMGMTDVVVLNSLDAVKEAFVKKGEDFAGRPKMT (1)
>AFPZ866519.x1 66% to C-helix of CYP17A2
TDISSEGGKDIAFADYSPTWKLHRKLFHSAIR
(2)
>ATGI157309.b1
ATGN157240.b1 ATUP362994.b1 AFPZ866519.x1
GYASAQNLQSKVHESLEDTIAVFSKMEGQAVDLEDYIYQLVYNVICSAAFGTR
(2)
>AFPZ866519.y1
YNMDDEDFDTLMKISKDTTETFGQGLLADVYPVLRFLPSS
(1)
>AFPZ295620.x1
SVTANRKMTHQLMEIMQRHLEQHRESFDP
(1)
IPLNEYQCTLLQITSVTSQITMIKAQKDAEEEGIQDIDSLTDTHLRQLIGDISF
(1)
>ASWX154218.b2 I-helix to EXXR region ATGN97768.g1 AFSA255326.b2
47% to CYP1A7
AGTISTILTLRWAILYLAVHPEIQEKVAAELDSVVGRDRLPELSDREATPYTEAIFHEVMRMASMDPV
SLPHATTVDTTLS
()
GYQIPKGTWILPNLWALHHDPDTWGDPDVFRP
>AFPZ295620.y1
ASWX154218.g2 (very end + downstream seq)
DVFRPERFLDESGKPIPKPAALMPFG
(2)
VGRRACPGEALGKADTFLLLGGLVQNFRFSIPEGEGPPDLTPDEIGQ
GSISIPYPYNVVMTCRK*
35% to Xenopus CYP17 and 36% to CYP1A6 and CYP1A7
MWLMTITVGLVTLILVKWLKDYVQRWRMPPGPFFWPVIGNLSCKYRGS (0)
SYLTFIDLAKTYGDVFSLKMGMTDVVVLNSLDAVKEAFVKKGEDFAGRPKMT (1)
TDISSEGGKDIAFADYSPTWKLHRKLFHSAIR
(2)
GYASAQNLQSKVHESLEDTIAVFSKMEGQAVDLEDYIYQLVYNVICSAAFGTR
(2)
YNMDDEDFDTLMKISKDTTETFGQGLLADVYPVLRFLPSS (1)
SVTANRKMTHQLMEIMQRHLEQHRESFDP
(1)
IPLNEYQCTLLQITSVTSQITMIKAQKDAEEEGIQDIDSLTDTHLRQLIGDISF
(1)
AGTISTILTLRWAILYLAVHPEIQEKVAAELDSVVGRDRLPELSDREAT
PYTEAIFHEVMRMASMDPVSLPHATTVDTTLS
()
GYQIPKGTWILPNLWALHHDPDTWGDPDVFRPERFLDESGKPIPKPAALMPFG
(2)
VGRRACPGEALGKADTFLLLGGLVQNFRFSIPEGEGPPDLTPDEIGQGSISIPYPYNVVMTCRK*
>DE040433.1 Amphioxus genomic survey sequence. No introns, NEW 1/6/06
41% to 1B1 Danio, 40% to CYP1C1 fugu, 39% to 1A1 human, 39% to Xenopus 1A6, 1A7
trace file 630869645 632546376 539391436
MAAVATAALFGLSYLQVVLIAVLLVLVAAVVASSLRQNTPSLPPGPWGF
PVVGIFPALGSRPHHAFSRMAEKYGDVFRVKFGSRT
VIILNGIDMVKDACVKQSACFAGRPALYSFKQVKNGITFKTYSPSWVARKKVTVGALKGF
VNGRVGALTASAETMITEEAQELARVFLSKSGQPSNPEEYAHTAVANVVCALCFGKRYEH