Comparison of Dictyostelium assemblies from the genome annotation and my annotation.

Attempts are made to reconcile the differences in the assemblies.

D. Nelson

May 13, 2003

 

I have identified 55 sequences.

 

51     = seq 19             ng7190 use ng7190

508A1  = seq 1+2            ng3018 missing N-term exon use my seq

508A2  = seq 22+45+69+86+88 ng5262 N-term incorrect use my seq

508A3  = seq 21+67+72+82+87 ng10949 correct the two diffs and add my N-term

508A4  = seq 66+70          ng4360 remove K from mine at boundary then use mine

508B1  = seq 15+77          ng6400 missing most of N-term use my seq

508B2P = seq 60

508C1  = seq 4+23+24+83     ng1934 missing three parts use my seq

508D1  = seq 59+81          ng1935 missing N-terminal exon use my seq

508E1  = seq 71+85          ng8897 use my seq correct two other defects

513A1  = seq 8+53           ng6889 N-term missing use my seq

513A2P = seq 38             ng12344 correct 1 aa diff use my seq

513A3  = seq 7+50           ng4838 add missing intron back use my seq

513B1  = seq 49             ng627 2aa diffs use ng627 seq

513C1  = seq 5+56           ng6446 100% match

513D1  = seq 54+55+78       ng11186

513E1  = seq 51+52+90       ng3101 use N instead of D

513E2P = seq 37a complete except for missing N-term exon 55% to 513E1

         (old name 513C1P)  ng1289 missing first 13 aa use my seq

513E3P = seq 37b upstream of 513E2P only 429 bp between them (old name 513C2P)

513F1  = seq 57

513G1P = seq 41 37% to CYP513A1

514A1  = seq 11             ng1282 add 7aa to exon 1 and change 1 aa L to F in ng1282

514A2  = seq 65             ng1006 joint at N-term exon does not agree, 2aa diffs

      inside, use ng1006 seq

514A3P = seq 89

514A4  = new seq            ng2792 7 aa diffs to 514A1

515A1  = seq 79b            ng4137 some parts of both seqs have support

515A2P = seq 79a

515B1  = seq 6              ng2647 N-terms do not match use ng2647 but extend back

516A1  = seq 9+43           ng9107 use ng9107

516A2P = seq 64

516B1  = seq 10+42          ng9290 use mine  but add insertion

516B2P = seq 48 partial, might be a pseudogene

517A1  = seq 3+61           ng4501 use this seq with my N-terminal

517A2  = seq 74             ng10310 correct 3aa diffs and use my seq

517A3P = seq 36             ng682 missing some seq. Use my seq add new C-term piece

517A4  = new seq            ng5440 most like 517A1, 13 aa diffs

518A1  = seq 20

518A2P = seq 25+27+29+30

518B1  = seq 26+28

519A1  = seq 12             ng8085 add N-term exon then use ng8085

519B1  = seq 35+44+75       ng6320 use ng6320

519C1  = seq 31+32+33       ng5581 use my seq (N-term missing)

519C2P = seq 46

519D1  = seq 14+34          ng12432 use my seq

519E1  = seq 17+84          ng6352 add two short pieces use ng6352

520A1  = seq 47+68+80       ng8864 use this seq with my N-term

520B1  = seq 58             ng1210 correct stop codon. use my seq.

521A1  = seq 13             ng2405 100% match

522A1  = seq 18

519H1P = seq 39+73          ng11335 old 523A1 correct three aa diffs add my N-term

524A1  = seq 91             ng752 N-terms do not agree use my seq

525A1  = seq 62

554A1  = seq 40             ng1193 small diffs use ng1193 seq

555A1  = seq 76             ng10297 use ng10297

556A1  = seq 92             ng10943 use my seq
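
The list above follows a loose pattern (name, my seq numbers, ng ID, note), so it can be turned into records for bookkeeping. The sketch below is only an illustration: the line layout is inferred from the entries above, the function name parse_entry and the field names are hypothetical, and continuation lines (such as the second line of 513E2P) are simply skipped.

```python
import re

# Assumed layout of one summary line: "<name> = seq <parts>  <ngNNNN> <note>".
# This is inferred from the list, not a formal specification.
ENTRY_RE = re.compile(
    r"^(?P<name>\S+)\s*=\s*(?P<mine>new seq|seq\s+\S+)(?:\s+(?P<rest>.*))?$"
)
NG_RE = re.compile(r"\bng\d+\b")


def parse_entry(line):
    """Split one summary line into name, my seq numbers, ng ID and note."""
    m = ENTRY_RE.match(line.strip())
    if m is None:
        return None  # continuation lines and free text are not parsed
    mine = m.group("mine")
    rest = m.group("rest") or ""
    ng = NG_RE.search(rest)
    # "new seq" entries have no seq numbers of mine
    parts = [] if mine == "new seq" else mine.split(None, 1)[1].split("+")
    note = NG_RE.sub("", rest, count=1).strip() if ng else rest.strip()
    return {
        "name": m.group("name"),
        "seq_parts": parts,
        "ng_id": ng.group(0) if ng else None,
        "note": note,
    }


lines = [
    "508A2  = seq 22+45+69+86+88 ng5262 N-term incorrect use my seq",
    "508B2P = seq 60",
    "517A4  = new seq            ng5440 most like 517A1, 13 aa diffs",
]
entries = [parse_entry(line) for line in lines]

# Entries without an ng ID have no database prediction listed on that line.
no_prediction = [e["name"] for e in entries if e["ng_id"] is None]
print(no_prediction)
```

Filtering on a missing ng ID gives a quick cross-check against the list of seqs the database did not identify, provided the ng ID sits on the first line of the entry.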

 

The Dicty database did not identify the following 13 seqs as P450s: 508B2P, 513E2P, 513F1, 513G1P, 515A2P, 516A2P, 516B2P, 518A1, 518A2P, 518B1, 519C2P, 522A1, 525A1.

Note that 518A1, 518B1 and 522A1 have non-standard heme signatures.

The Dicty database found two new genes I had missed, 514A4 and 517A4, both very similar to existing genes.

 

>514A2

>aa-ng1006        Contig_0356     pred cdna   pORF 5256         3526 strand r start y stop y ph 0

MNLIYTIILTIIILVLIISIKDLFFEDKIKKINKSIPSPPTIPIFGNLLQINSKDVATCF

NDFYKQYGKVYRLRLGSVETVVLTGGDIIDECFNKKYRDFLKARYVKFSRYLGKDTNILH

SNGDYHFLLKGVLSSQVTVRKLNNGRLEFNKYILQMFNNLNNNDEGSTMFLANDVPSQIK

KLILKVVLNFTLGIEENDDINLSLFQNGSNIFKAAGLFIYSDYLPFLFPLDIKSMAKSNM

ISSYVFVRDYLAKKLEEVKKKEYIINGDDDGGVDTSQTPLIESYYKLYLQGLIGYDSILL

SIVDIIIASVDTTSNSISFIIARLTNHQEIQSKIYEEIMSNDINNNSNNISFSDHSKYPY

IISIMNETYRYYASVPLPEPNMTTEDIEVDGYKIAKGTQIYKNIRGTLISKEFWGEDALE

FKPERFKTQTLNQKGLLHFGAGPRGCPGARFTECFFFTLMVLLFKNYKLQNPNDNPIDDR

GDVGLSMQCKPYDALFIKRN

 

JC1c209h10.r1 has FL not CF

JC3a120c12.s1 has ?F not FL

Contig_0356   has CF not FL; use CF

 

Query:     1 MNLIYTIILTIIILVLIISIKDLFFEDKIKKINKSIPSPPTIPIFGNLLQINSKDVATCF 60

             MNLIYTIILTIIIL       DLFFEDKIKKINKSIPSPPTIPIFGNLLQINSKDVAT 

Sbjct:     1 MNLIYTIILTIIIL-------DLFFEDKIKKINKSIPSPPTIPIFGNLLQINSKDVATFL 53

 

Query:    61 NDFYKQYGKVYRLRLGSVETVVLTGGDIIDECFNKKYRDFLKARYVKFSRYLGKDTNILH 120

             NDFYKQYGKVYRLRLGSVETVVLTGGDIIDECFNKKYRDFLKARYVKFSRYLGKDTNILH

Sbjct:    54 NDFYKQYGKVYRLRLGSVETVVLTGGDIIDECFNKKYRDFLKARYVKFSRYLGKDTNILH 113

 

Query:   121 SNGDYHFLLKGVLSSQVTVRKLNNGRLEFNKYILQMFNNLNNNDEGSTMFLANDVPSQIK 180

             SNGDYHFLLKGVLSSQVTVRKLNNGRLEFNKYILQMFNNLNNNDEGSTMFLANDVPSQIK

Sbjct:   114 SNGDYHFLLKGVLSSQVTVRKLNNGRLEFNKYILQMFNNLNNNDEGSTMFLANDVPSQIK 173

 

Query:   181 KLILKVVLNFTLGIEENDDINLSLFQNGSNIFKAAGLFIYSDYLPFLFPLDIKSMAKSNM 240

             KLILKVVLNFTLGIEENDDINLSLFQNGSNIFKAAGLFIYSDYLPFLFPLDIKSMAKSNM

Sbjct:   174 KLILKVVLNFTLGIEENDDINLSLFQNGSNIFKAAGLFIYSDYLPFLFPLDIKSMAKSNM 233

 

Query:   241 ISSYVFVRDYLAKKLEEVKKKEYIINGDDDGGVDTSQTPLIESYYKLYLQGLIGYDSILL 300

             ISSYVFVRDYLAKKLEEVKKKEYIINGDDDGGVDTSQTPLIESYYKLYLQGLIGYDSILL

Sbjct:   234 ISSYVFVRDYLAKKLEEVKKKEYIINGDDDGGVDTSQTPLIESYYKLYLQGLIGYDSILL 293

 

Query:   301 SIVDIIIASVDTTSNSISFIIARLTNHQEIQSKIYEEIMSNDINNNSNNISFSDHSKYPY 360

             SIVDIIIASVDTTSNSISFIIARLTNHQEIQSKIYEEIMSNDINNNSNNISFSDHSKYPY

Sbjct:   294 SIVDIIIASVDTTSNSISFIIARLTNHQEIQSKIYEEIMSNDINNNSNNISFSDHSKYPY 353

 

Query:   361 IISIMNETYRYYASVPLPEPNMTTEDIEVDGYKIAKGTQIYKNIRGTLISKEFWGEDALE 420

             IISIMNETYRYYASVPLPEPNMTTEDIEVDGYKIAKGTQIYKNIRGTLISKEFWGEDALE

Sbjct:   354 IISIMNETYRYYASVPLPEPNMTTEDIEVDGYKIAKGTQIYKNIRGTLISKEFWGEDALE 413

 

Query:   421 FKPERFKTQTLNQKGLLHFGAGPRGCPGARFTECFFFTLMVLLFKNYKLQNPNDNPIDDR 480

             FKPERFKTQTLNQKGLLHFGAGPRGCPGARFTECFFFTLMVLLFKNYKLQNPNDNPIDDR

Sbjct:   414 FKPERFKTQTLNQKGLLHFGAGPRGCPGARFTECFFFTLMVLLFKNYKLQNPNDNPIDDR 473

 

Query:   481 GDVGLSMQCKPYDALFIKRN 500

             GDVGLSMQCKPYDALFIKRN

Sbjct:   474 GDVGLSMQCKPYDALFIKRN 493

 

>CYP513B1

>aa-ng627        Contig_0257     pred cdna   pORF 5998         4427 strand r start y stop y ph 0

MNLLVLSVILAIIIYLIFKRNYKYSPSKINSKIPGPIGLPIFGNILSLDNKNGIHTTFQK

WFKIYGPIYSVNMGNKSAVVLTGFPIIKKAFIDNSEAFAPHYTFESRYKLNKCSDITQEN

GKNQSALKRIFLSELTVTRIKKQESHIQNEIVKLMKVLDKHSEDGKPFLLNNYFSMFSIN

IISRFLFGIDFPYQDFEETSDLMVGIRDLLIASGEIVLSDFLPIPHSKRSKLYTSYQALV

VQIETLVKSHKYKEDDECMLSKLMIEHDKGNIPWDAVISNCNTIITAGSDSTSSTALFFL

IEMMNNPTIQTKVYNDIVVSFEQNQQADDYMNESMVILKYSKYRSLIPYLSLALKENYRK

HPAAPFGAPHETTQETVIEGYTIAKGTMIFQNIYATQRSDTFYSQPDEFIPERWNGDENS

QTLISFGTGIRDCIGKSLAYNEIFTIIASVLNRYEFINPNPSIPFDDNGIPGLTTQCKNT

VVQIKKR

 

Note: only one seq has L and C at the two differing positions. The others all have S and W, so correct my seq.

 

>CYP513B1 seq 49 complete seq 45% to seq 7, 50

          Length = 488

 

 Score = 2504 (881.5 bits), Expect = 4.8e-264, P = 4.8e-264

 Identities = 485/487 (99%), Positives = 485/487 (99%)

 

Query:     1 MNLLVLSVILAIIIYLIFKRNYKYSPSKINSKIPGPIGLPIFGNILSLDNKNGIHTTFQK 60

             MNLLVLSVILAIIIYLIFKRNYKYSPSKINSKIPGPIGLPIFGNILSLDNKNGIHTTFQK

Sbjct:     1 MNLLVLSVILAIIIYLIFKRNYKYSPSKINSKIPGPIGLPIFGNILSLDNKNGIHTTFQK 60

 

Query:    61 WFKIYGPIYSVNMGNKSAVVLTGFPIIKKAFIDNSEAFAPHYTFESRYKLNKCSDITQEN 120

             WFKIYGPIYSVNMGNKSAVVLTGFPIIKKAFIDNSEAFAPHYTFESRYKLNKCSDITQEN

Sbjct:    61 WFKIYGPIYSVNMGNKSAVVLTGFPIIKKAFIDNSEAFAPHYTFESRYKLNKCSDITQEN 120

 

Query:   121 GKNQSALKRIFLSELTVTRIKKQESHIQNEIVKLMKVLDKHSEDGKPFLLNNYFSMFSIN 180

             GKNQSALKRIFLSELTVTRIKKQESHIQNEIVKLMKVLDKHSEDGKPFLLNNYFSMFSIN

Sbjct:   121 GKNQSALKRIFLSELTVTRIKKQESHIQNEIVKLMKVLDKHSEDGKPFLLNNYFSMFSIN 180

 

Query:   181 IISRFLFGIDFPYQDFEETSDLMVGIRDLLIASGEIVLSDFLPIPHSKRSKLYTSYQALV 240

             IISRFLFGIDFPYQDFEETSDLMVGIRDLLIASGEIVLSDFLPIPHSKRSKLYTSYQALV

Sbjct:   181 IISRFLFGIDFPYQDFEETSDLMVGIRDLLIASGEIVLSDFLPIPHSKRSKLYTSYQALV 240

 

Query:   241 VQIETLVKSHKYKEDDECMLSKLMIEHDKGNIPWDAVISNCNTIITAGSDSTSSTALFFL 300

             VQIETLVKSHKYKEDDECML KLMIEHDKGNIP DAVISNCNTIITAGSDSTSSTALFFL

Sbjct:   241 VQIETLVKSHKYKEDDECMLLKLMIEHDKGNIPCDAVISNCNTIITAGSDSTSSTALFFL 300

 

Query:   301 IEMMNNPTIQTKVYNDIVVSFEQNQQADDYMNESMVILKYSKYRSLIPYLSLALKENYRK 360

             IEMMNNPTIQTKVYNDIVVSFEQNQQADDYMNESMVILKYSKYRSLIPYLSLALKENYRK

Sbjct:   301 IEMMNNPTIQTKVYNDIVVSFEQNQQADDYMNESMVILKYSKYRSLIPYLSLALKENYRK 360

 

Query:   361 HPAAPFGAPHETTQETVIEGYTIAKGTMIFQNIYATQRSDTFYSQPDEFIPERWNGDENS 420

             HPAAPFGAPHETTQETVIEGYTIAKGTMIFQNIYATQRSDTFYSQPDEFIPERWNGDENS

Sbjct:   361 HPAAPFGAPHETTQETVIEGYTIAKGTMIFQNIYATQRSDTFYSQPDEFIPERWNGDENS 420

 

Query:   421 QTLISFGTGIRDCIGKSLAYNEIFTIIASVLNRYEFINPNPSIPFDDNGIPGLTTQCKNT 480

             QTLISFGTGIRDCIGKSLAYNEIFTIIASVLNRYEFINPNPSIPFDDNGIPGLTTQCKNT

Sbjct:   421 QTLISFGTGIRDCIGKSLAYNEIFTIIASVLNRYEFINPNPSIPFDDNGIPGLTTQCKNT 480

 

Query:   481 VVQIKKR 487

             VVQIKKR

Sbjct:   481 VVQIKKR 487
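
The two differences in this alignment can be located by comparing the Query and Sbjct rows column by column. The sketch below uses the row covering residues 241-300, copied from the alignment above; the helper name list_differences is hypothetical, and the position arithmetic assumes the row contains no gaps (true for this row).

```python
def list_differences(query, sbjct, start=1):
    """Return (position, query_residue, sbjct_residue) for each mismatch.

    Assumes both rows are the same length and contain no gap characters,
    so the column index maps directly onto the residue number.
    """
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    return [
        (start + i, q, s)
        for i, (q, s) in enumerate(zip(query, sbjct))
        if q != s
    ]


# Query = ng627, Sbjct = my seq 49, residues 241-300 of the alignment.
query_row = "VQIETLVKSHKYKEDDECMLSKLMIEHDKGNIPWDAVISNCNTIITAGSDSTSSTALFFL"
sbjct_row = "VQIETLVKSHKYKEDDECMLLKLMIEHDKGNIPCDAVISNCNTIITAGSDSTSSTALFFL"

diffs = list_differences(query_row, sbjct_row, start=241)
print(diffs)  # [(261, 'S', 'L'), (274, 'W', 'C')]
```

This reports S/L at position 261 and W/C at position 274, which accounts for the 485/487 identities.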

 

>517A3P

MEIVNV (frameshift)

FIILIILFLVKDF (0)

VKKNKKIHTKSPSGPIAFPILGNVVQIRFWELFKIQEHEL (10 aa deletion)

IVRAWIGE RLFLFVSNYDVKYFQKDENFLYKPSLLVPGWRYASSNGLGVMSSSDDEWKRAKSS

VS**LRVHTSKKLMEEKAIEFIDSLEKISNNNEI (0)

FYPKGHIQGYACSMLFKYMFNQDLSVESGLSRTIGNAVEHVFGNLSKLTAFDCFE

IFSPLYDWFFTRRL*DCDIV*QIINILTENH

 

This is a pseudogene, so the sequences in green need to be added.

Ng682 has an extension that I did not have that needs to be added to my seq.

The seq similarity stops after LCSNG so I removed the last two aa.
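
In-frame stop codons (shown as * in the translation) are part of the evidence that this is a pseudogene. A minimal sketch for listing them follows; the function name internal_stops is hypothetical, and the fragment is the last line of my 517A3P seq as given above. A trailing * is treated as the normal terminal stop and ignored.

```python
def internal_stops(protein):
    """Return 1-based positions of '*' that are not the final character."""
    body = protein[:-1] if protein.endswith("*") else protein
    return [i + 1 for i, aa in enumerate(body) if aa == "*"]


# Last exon fragment of my 517A3P seq, copied from the listing above.
fragment = "IFSPLYDWFFTRRL*DCDIV*QIINILTENH"

stops = internal_stops(fragment)
print(stops)  # [15, 21]
```

Positions are relative to the start of the fragment, not to the full-length protein.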

 

>aa-ng682        Contig_0274     pred cdna   pORF 17004       15801 strand r start y stop y ph 0

MEIVNVKKNKKIHTKSPSGPIAFPILGNVVQIRFWELFKIQEHELIVRAWIGERLFLFVS

NYDVKYFQKDENFLYKPSLLVPGWRYASSNGLGVMSSSDDEWKRAKSS

FYPKGHIQGYACSMLFKYMFNQDLSVESGLSRTIGNAVEHVFGNLSKLTAFDCFEIFSPLYDWFFTRRL

KGCDIVRQIISSQNENHFKSIDPSKPRDLMDDLLIEYGLNEITKEDTMKINQICFDIFGPAIG

TVTITMNWVILQLCNRPEPQEIAYLEIKKAVKDDEYVNFNHKQNAPYIVAFIKETMRLCS

NGPN

 

>CYP517A3P Seq 36 pseudogene 77% to seq 3 77% to seq 61

           Length = 242

 

 Score = 546 (192.2 bits), Expect = 8.8e-99, Sum P(2) = 8.8e-99

 Identities = 103/103 (100%), Positives = 103/103 (100%)

 

Query:     6 VKKNKKIHTKSPSGPIAFPILGNVVQIRFWELFKIQEHELIVRAWIGERLFLFVSNYDVK 65

             VKKNKKIHTKSPSGPIAFPILGNVVQIRFWELFKIQEHELIVRAWIGERLFLFVSNYDVK

Sbjct:    20 VKKNKKIHTKSPSGPIAFPILGNVVQIRFWELFKIQEHELIVRAWIGERLFLFVSNYDVK 79

 

Query:    66 YFQKDENFLYKPSLLVPGWRYASSNGLGVMSSSDDEWKRAKSS 108

             YFQKDENFLYKPSLLVPGWRYASSNGLGVMSSSDDEWKRAKSS

Sbjct:    80 YFQKDENFLYKPSLLVPGWRYASSNGLGVMSSSDDEWKRAKSS 122

 

 Score = 418 (147.1 bits), Expect = 8.8e-99, Sum P(2) = 8.8e-99

 Identities = 79/86 (91%), Positives = 80/86 (93%)

 

Query:   109 FYPKGHIQGYACSMLFKYMFNQDLSVESGLSRTIGNAVEHVFGNLSKLTAFDCFEIFSPL 168

             FYPKGHIQGYACSMLFKYMFNQDLSVESGLSRTIGNAVEHVFGNLSKLTAFDCFEIFSPL

Sbjct:   157 FYPKGHIQGYACSMLFKYMFNQDLSVESGLSRTIGNAVEHVFGNLSKLTAFDCFEIFSPL 216

 

Query:   169 YDWFFTRRLKGCDIVRQIISSQNENH 194

             YDWFFTRRL  CDIV QII+   ENH

Sbjct:   217 YDWFFTRRL*DCDIV*QIINILTENH 242

 

>CYP524A1

MKTPTKYFIIFILLAALAVF (0)

KGSLPGPSFVPPFFGMLFQLIFTPFSFYEKQEKYGPISWTSIMNKF

VLFVTDAEINRQVFKEENAKLYLSLGAKKILTEK

AIPFIEGAP HRQLRKQLLPLFTIRALSSYLPIQESIVDEHIAMWIKNGKADINARNNCRDLNMAISTGVFV

GNNTPESVRDDIAKNFFVMNEGFLCLPIDLPGTTLRKAINARVRLVEIFTDIIAKSRKRMGDGE

KPQSLIDLWVEHFLNCPKEERDELSNDTIIFTLLSFMFASQDALTS

SLVWTVQLMAEHPDILAKVRAEQASLRPNNEKLDLDTMRQATYTRMVVSEILRFRPPAVMVPHE

NIEDIVIGDNVHVPKGTMILPSIWSAHFQEGGYSDPYKFDPQRFDSVRKEDVTCAKNSLVFGAG

PHFCIGKELAKNQIEVFLTKLAMSTEWTHNKTPGGDEIIFGPTIFPKDGCNITIKARN*

 

Note: two exon extensions occur in ng752. VSEAT to KLIVA extends exon 1 downstream, and

VGKFT to ITQK extends exon 2 upstream.  This makes the N-term too long.

The N-terminal up to the PFFGML motif is usually 30-50 aa long, but in ng752 it is 96 aa long.  I include the sequence VLFV to LTEK, which ng752 removes as an intron.

I think my solution is better, since it preserves the length between the PFFGML motif and the C-helix motif (HRQLRK) as compared to the Arabidopsis CYP710A1 seq and

the CYP26B1 from Fugu (see aligns below, deleted intron shown in red).
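
The N-terminal length argument can be checked by counting the residues that precede the PFFGML motif in each version. The sketch below is illustrative only: the function name n_term_length is hypothetical, and the two strings are just the N-terminal portions of my CYP524A1 seq and of ng752 as listed in this section.

```python
def n_term_length(protein, motif="PFFGML"):
    """Number of residues before the first occurrence of the motif."""
    idx = protein.find(motif)
    return None if idx < 0 else idx


# N-terminal portions only, copied from the two sequences in this section.
mine = "MKTPTKYFIIFILLAALAVF" + "KGSLPGPSFVPPFFGMLFQLIF"
ng752 = (
    "MKTPTKYFIIFILLAALAVF"
    + "VSEATSKVGQQTTTTTQQVKCSGLQCTLNKLIVA"   # exon 1 extension in ng752
    + "VGKFTIKQILVGIVIVLATIALHQQYVITQK"      # exon 2 extension in ng752
    + "KGSLPGPSFVPPFFGMLFQLIF"
)

len_mine = n_term_length(mine)
len_ng752 = n_term_length(ng752)
print(len_mine, len_ng752)  # 31 96
```

With my exon boundaries the N-terminal is 31 aa, inside the usual 30-50 aa range; the ng752 prediction gives 96 aa.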

 

>aa-ng752        Contig_0287     pred cdna   pORF 3086         1351 strand r start y stop y ph 0

MKTPTKYFIIFILLAALAVF

VSEATSKVGQQTTTTTQQVKCSGLQCTLNKLIVA VGKFTIKQILVGIVIVLATIALHQQYVITQK

KGSLPGPSFVP PFFGML FQLIFTPFSFYEKQEKYGPISWTSIMNKF

AIPFIEGAP HRQLRKQLLPLFTIRALSSYLPIQESIVDEHIAMWIKNGK

ADINARNNCRDLNMAISTGVFVGNNTPESVRDDIAKNFFVMNEGFLCLPIDLPGTTLRKA

INARVRLVEIFTDIIAKSRKRMGDGEKPQSLIDLWVEHFLNCPEEERDELSNDTIIFTLL

SFMFASQDALTSSLVWTVQLMAEHPDILAKVRAEQASLRPNNEKLDLDTMRQATYTRMVV

SEILRFRPPAVMVPHENIEDIVIGDNVHVPKGTMILPSIWSAHFQEGGYSDPYKFDPQRF

DSVRKEDVTCAKNSLVFGAGPHFCIGKELAKNQIEVFLTKLAMSTEWTHNKTPGGDEIIF

GPTIFPKDGCNITIKARN

 

>CYP710A1 (subject) CYP524A1 = query

         Length = 496

 

 Score = 780 (274.6 bits), Expect = 1.3e-80, P = 1.3e-80

 Identities = 167/469 (35%), Positives = 267/469 (56%)

 

Query:     8 FIIFILLAALA-VFKG-SLPGPSFVPPFFGMLFQLIFTPFSFYEKQEKYGPISWTS---I 62

             F++F+L+  L+ +FK  ++PGP FVPP  G    L+  P SF++KQ     IS  S   +

Sbjct:    19 FLLFLLVEQLSYLFKKRNIPGPFFVPPIIGNAVALVRDPTSFWDKQSSTANISGLSANYL 78

 

Query:    63 MNKFVLFVTDAEINRQVFKEENAKLYLSLG---AKKILTEKAIPFIEGAPHRQLRKQLLP 119

             + KF++++ D E++ Q+F       +  +G    KK+  +  + ++ G  H+ +R+QL P

Sbjct:    79 IGKFIVYIRDTELSHQIFSNVRPDAFHLIGHPFGKKLFGDHNLIYMFGEDHKSVRRQLAP 138

 

Query:   120 LFTIRALSSYLPIQESIVDEHIAMW---IKNGKADINARNNCRDLNMAISTGVFVGNNTP 176

              FT +ALS+Y  +Q+ ++  H+  W      G   ++ R   R+LN+  S  VFVG   

Sbjct:   139 NFTPKALSTYSALQQLVILRHLRQWEGSTSGGSRPVSLRQLVRELNLETSQTVFVGPYLD 198

 

Query:   177 ESVRDDIAKNFFVMNEGFLCLPIDLPGTTLRKAINARVRLVEIFTDIIAKSRKRMGDGEK 236

             +  ++    ++ + N G + LPIDLPG    +A  A  RL E       KS+ RM  GE+

Sbjct:   199 KEAKNRFRTDYNLFNLGSMALPIDLPGFAFGEARRAVKRLGETLGICAGKSKARMAAGEE 258

 

Query:   237 PQSLIDLWVEHFLNCPKEERDELSNDTIIFTLL-SFMFASQDALTSSLVWTVQLMAEHPD 295

             P  LID W++  +   +  +   S D  I  LL  F+FA+QDA TSSL+W V L+   P+

Sbjct:   259 PACLIDFWMQAIV--AENPQPPHSGDEEIGGLLFDFLFAAQDASTSSLLWAVTLLDSEPE 316

 

Query:   296 ILAKVRAEQASL-RP-NNEKLDLDTMRQATYTRMVVSEILRFRPPAVMVPHENIEDIVIG 353

             +L +VR E A +  P +N  + +D + +  YTR V  E++R+RPPA MVPH    D  +

Sbjct:   317 VLNRVREEVAKIWSPESNALITVDQLAEMKYTRSVAREVIRYRPPATMVPHVAAIDFPLT 376

 

Query:   354 DNVHVPKGTMILPSIWSAHFQEGGYSDPYKFDPQRFDSVRKEDVTCAKNSLVFGAGPHFC 413

             +   +PKGT++ PS++ + FQ  G+++P +FDP RF   R+ED    +N L FG GPH C

Sbjct:   377 ETYTIPKGTIVFPSVFDSSFQ--GFTEPDRFDPDRFSETRQEDQVFKRNFLAFGWGPHQC 434

 

Query:   414 IGKELAKNQIEVFLTKLAMSTEWTHNKTPGGDEIIFGPTIFPKDGCNITIKAR 466

             +G+  A N + +F+   +   ++   ++ G DEI++ PTI PKDGC + +  R

Sbjct:   435 VGQRYALNHLVLFIAMFSSLLDFKRLRSDGCDEIVYCPTISPKDGCTVFLSRR 487

 

Fugu CYP26B1

>CYP26B1 Scaffold_4267 Scaffold_49 partial sequence 74% to 26B1

         Length = 513

 

 Score = 355 (125.0 bits), Expect = 3.4e-36, P = 3.4e-36

 Identities = 122/440 (27%), Positives = 195/440 (44%)

 

Query:    24 LPGPSFVPPFFGMLFQLIFTPFSFY-EKQEKYGPISWTSIMNKFVLFVTDAEINRQVFKE 82

             +P  S   PF G     +     F+  +++KYG +  T ++ + ++ VT AE  R+V  

Sbjct:    49 MPKGSMGFPFIGETCHWLLQGSGFHASRRQKYGNVFKTHLLGRPLIRVTGAENIRKVLMG 108

 

Query:    83 ENAKLYLSL--GAKKILTEKAIPFIEGAPHRQLRKQLLPLFTIRALSSYLPIQESIVDEH 140

             E+  + +        +L   ++    G  HR+ RK    +F+  AL SYLP  + ++ E

Sbjct:   109 EHTLVTVDWPQSTSTLLGPNSLANSIGDIHRKKRKVFAKVFSHEALESYLPKIQQVIQES 168

 

 

>CYP524A1 Seq 91 complete seq 25% to 10, 42 468 aa

          Length = 468

 

 Score = 1926 (678.0 bits), Expect = 1.1e-236, Sum P(3) = 1.1e-236

 Identities = 371/388 (95%), Positives = 376/388 (96%)

 

Query:     1 MKTPTKYFIIFILLAALAVF 20

             MKTPTKYFIIFILLAALAVF

Sbjct:     1 MKTPTKYFIIFILLAALAVF 20

 

Query:    86 KGSLPGPSFVPPFFGMLFQLIFTPFSFYEKQEKYGPISWTSIMNKFAIPFIEGAPHRQLR 145

             KGSLPGPSFVPPFFGMLFQLIFTPFSFYEKQEKYGPISWTSIMNKF +   +   +RQ+

Sbjct:    21 KGSLPGPSFVPPFFGMLFQLIFTPFSFYEKQEKYGPISWTSIMNKFVLFVTDAEINRQVF 80

 

Query:   146 KQ 147

             K+

Sbjct:    81 KE 82

 

Query:   112 FYEKQEK-YGPISWTSIMNKFAIPFIEGAPHRQLRKQLLPLFTIRALSSYLPIQESIVDE 170

             F E+  K Y  +    I+ + AIPFIEGAPHRQLRKQLLPLFTIRALSSYLPIQESIVDE

Sbjct:    80 FKEENAKLYLSLGAKKILTEKAIPFIEGAPHRQLRKQLLPLFTIRALSSYLPIQESIVDE 139

 

Query:   171 HIAMWIKNGKADINARNNCRDLNMAISTGVFVGNNTPESVRDDIAKNFFVMNEGFLCLPI 230

             HIAMWIKNGKADINARNNCRDLNMAISTGVFVGNNTPESVRDDIAKNFFVMNEGFLCLPI

Sbjct:   140 HIAMWIKNGKADINARNNCRDLNMAISTGVFVGNNTPESVRDDIAKNFFVMNEGFLCLPI 199

 

Query:   231 DLPGTTLRKAINARVRLVEIFTDIIAKSRKRMGDGEKPQSLIDLWVEHFLNCPEEERDEL 290

             DLPGTTLRKAINARVRLVEIFTDIIAKSRKRMGDGEKPQSLIDLWVEHFLNCP+EERDEL

Sbjct:   200 DLPGTTLRKAINARVRLVEIFTDIIAKSRKRMGDGEKPQSLIDLWVEHFLNCPKEERDEL 259

 

Query:   291 SNDTIIFTLLSFMFASQDALTSSLVWTVQLMAEHPDILAKVRAEQASLRPNNEKLDLDTM 350

             SNDTIIFTLLSFMFASQDALTSSLVWTVQLMAEHPDILAKVRAEQASLRPNNEKLDLDTM

Sbjct:   260 SNDTIIFTLLSFMFASQDALTSSLVWTVQLMAEHPDILAKVRAEQASLRPNNEKLDLDTM 319

 

Query:   351 RQATYTRMVVSEILRFRPPAVMVPHENIEDIVIGDNVHVPKGTMILPSIWSAHFQEGGYS 410

             RQATYTRMVVSEILRFRPPAVMVPHENIEDIVIGDNVHVPKGTMILPSIWSAHFQEGGYS

Sbjct:   320 RQATYTRMVVSEILRFRPPAVMVPHENIEDIVIGDNVHVPKGTMILPSIWSAHFQEGGYS 379

 

Query:   411 DPYKFDPQRFDSVRKEDVTCAKNSLVFGAGPHFCIGKELAKNQIEVFLTKLAMSTEWTHN 470

             DPYKFDPQRFDSVRKEDVTCAKNSLVFGAGPHFCIGKELAKNQIEVFLTKLAMSTEWTHN

Sbjct:   380 DPYKFDPQRFDSVRKEDVTCAKNSLVFGAGPHFCIGKELAKNQIEVFLTKLAMSTEWTHN 439

 

Query:   471 KTPGGDEIIFGPTIFPKDGCNITIKARN 498

             KTPGGDEIIFGPTIFPKDGCNITIKARN

Sbjct:   440 KTPGGDEIIFGPTIFPKDGCNITIKARN 467

 

>aa-ng1193        Contig_0430     pred cdna   pORF 4315         2561 strand r start y stop y ph 0

MDLLLFIFFLILFYYSVKYYKADNQNSLSLSGPTPVPILGNIHQVGKDAHLTIPIISKKY

HGIFRMWLGGTYYVVVSDYKLIREMYIENFENFKNRIATFKTMTGDDSRGIIGCNGDIWD

SNKELIMKSYKKVLNKDMNDFILLKSKELFNFFEKNGIKNEEEEDDDDDGNKSIIINNTR

FYFQSLTLTVMFKMIFNENKSFQLYSDSTEFKLIFKTILNLLNSLNVYNVIYDFLGIFQP

ILLKFTKILDKNSFLSKIATEKFNSRIKEIDFTSDDFKANDLLDSLIMTINEDENGLNEK

QIENIKSICIDFLMAGTDTTGSTIEWIILKLVNNPEFQELIFQELKKLNKSEITANDKIN

TPFLNSFIKETNRLYPIAPLSLPRKSINEMIIGDNKYYIPANTNILMDVKGFSLDENNYK

DPNEFKPDRFLNSKVSDTLNFGIGPRNCIGQTIAMNQIHIFLSNLILNYRMFSIDCLPLP

ENLILSVSVRPTEYSLKLIKRV

 

>CYP554A1 SEQ 40 30% to 508 469 aa complete

MDLLLFIFFLILFYYS (0)

YYKADNQNSLSLSGPTPVPILGNIHQVGKDAHLTIPIISKKYHGLFRMWFGGTYYVVV

SDYKLIREMYIENFENFKNRIA

TFKTMTGDDSRGIIGCNGDIWDSNKELIMKSYKKVLNKDMNDFILLKSKELFNFFEKNGIKNE (INTRON)

IINNTRFYFQSLTLT

VMFKMIFNENKSFQLYSDST

EFKLIFKTILNLLNSLNVYNVI

YDFLGIFQPILLKFTKILDKNSFLSKIA

TEKFNSRIKEIDFTSDDFKANDLLDSLIMTINEDEN

GLNEKQIENIKSICIDFLMAGTDTTGSTIEWIILKLVNNPEFQELIFQELK

KLNKSEITANDKINTPFLNSFIKETNRLY (0)

KSNNNFFFRKSINEMIIGDNKYYIPANTNILMDVKGFSLDENNYKDPNEFKPDRFLNSKVSDTLNFGI

GPRNCIGQTIAMNQIHIFLSNLILNYRMFSIDCLPLPENLF*

 

1. VK there are two possible GTs for this exon boundary

2. IIACP2E4184 is only seq with LFRMWF so change this to IFRMWL

3. EEEDDDDDGNKS no good GT AG pairs in frame so leave this in

4. PIAPLSLP matches other related P450s in this region so this is correct

5. ILSVSVRPTEYSLKLIKRV adds correct C-terminal

Use the ng1193 seq as the correct seq.

 

>CYP554A1 SEQ 40 30% to 508 469 aa complete

          Length = 470

 

 Score = 2323 (817.7 bits), Expect = 7.3e-245, P = 7.3e-245

 Identities = 458/483 (94%), Positives = 459/483 (95%)

 

Query:     1 MDLLLFIFFLILFYYSVKYYKADNQNSLSLSGPTPVPILGNIHQVGKDAHLTIPIISKKY 60

             MDLLLFIFFLILFYYS  YYKADNQNSLSLSGPTPVPILGNIHQVGKDAHLTIPIISKKY

Sbjct:     1 MDLLLFIFFLILFYYS--YYKADNQNSLSLSGPTPVPILGNIHQVGKDAHLTIPIISKKY 58

 

Query:    61 HGIFRMWLGGTYYVVVSDYKLIREMYIENFENFKNRIATFKTMTGDDSRGIIGCNGDIWD 120

             HG+FRMW GGTYYVVVSDYKLIREMYIENFENFKNRIATFKTMTGDDSRGIIGCNGDIWD

Sbjct:    59 HGLFRMWFGGTYYVVVSDYKLIREMYIENFENFKNRIATFKTMTGDDSRGIIGCNGDIWD 118

 

Query:   121 SNKELIMKSYKKVLNKDMNDFILLKSKELFNFFEKNGIKNEEEEDDDDDGNKSIIINNTR 180

             SNKELIMKSYKKVLNKDMNDFILLKSKELFNFFEKNGIKNE             IINNTR

Sbjct:   119 SNKELIMKSYKKVLNKDMNDFILLKSKELFNFFEKNGIKNE-------------IINNTR 165

 

Query:   181 FYFQSLTLTVMFKMIFNENKSFQLYSDSTEFKLIFKTILNLLNSLNVYNVIYDFLGIFQP 240

             FYFQSLTLTVMFKMIFNENKSFQLYSDSTEFKLIFKTILNLLNSLNVYNVIYDFLGIFQP

Sbjct:   166 FYFQSLTLTVMFKMIFNENKSFQLYSDSTEFKLIFKTILNLLNSLNVYNVIYDFLGIFQP 225

 

Query:   241 ILLKFTKILDKNSFLSKIATEKFNSRIKEIDFTSDDFKANDLLDSLIMTINEDENGLNEK 300

             ILLKFTKILDKNSFLSKIATEKFNSRIKEIDFTSDDFKANDLLDSLIMTINEDENGLNEK

Sbjct:   226 ILLKFTKILDKNSFLSKIATEKFNSRIKEIDFTSDDFKANDLLDSLIMTINEDENGLNEK 285

 

Query:   301 QIENIKSICIDFLMAGTDTTGSTIEWIILKLVNNPEFQELIFQELKKLNKSEITANDKIN 360

             QIENIKSICIDFLMAGTDTTGSTIEWIILKLVNNPEFQELIFQELKKLNKSEITANDKIN

Sbjct:   286 QIENIKSICIDFLMAGTDTTGSTIEWIILKLVNNPEFQELIFQELKKLNKSEITANDKIN 345

 

Query:   361 TPFLNSFIKETNRLYPIAPLSLPRKSINEMIIGDNKYYIPANTNILMDVKGFSLDENNYK 420

             TPFLNSFIKETNRLY        RKSINEMIIGDNKYYIPANTNILMDVKGFSLDENNYK

Sbjct:   346 TPFLNSFIKETNRLYKSNNNFFFRKSINEMIIGDNKYYIPANTNILMDVKGFSLDENNYK 405

 

Query:   421 DPNEFKPDRFLNSKVSDTLNFGIGPRNCIGQTIAMNQIHIFLSNLILNYRMFSIDCLPLP 480

             DPNEFKPDRFLNSKVSDTLNFGIGPRNCIGQTIAMNQIHIFLSNLILNYRMFSIDCLPLP

Sbjct:   406 DPNEFKPDRFLNSKVSDTLNFGIGPRNCIGQTIAMNQIHIFLSNLILNYRMFSIDCLPLP 465

 

Query:   481 ENL 483

             ENL

Sbjct:   466 ENL 468
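
Segments present in ng1193 but absent from my seq show up as runs of '-' in the Sbjct row. The sketch below pulls such a run out of the row covering query residues 121-180 of the alignment above. The function name gap_insertions is hypothetical, and the position arithmetic assumes the Query row itself has no gaps (true for this row).

```python
def gap_insertions(query, sbjct, query_start=1):
    """Return (first_pos, last_pos, residues) for each run of '-' in sbjct.

    Positions refer to the query numbering and assume the query row
    contains no gap characters.
    """
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    runs = []
    i, n = 0, len(query)
    while i < n:
        if sbjct[i] == "-":
            j = i
            while j < n and sbjct[j] == "-":
                j += 1
            runs.append((query_start + i, query_start + j - 1, query[i:j]))
            i = j
        else:
            i += 1
    return runs


# Query = ng1193, Sbjct = my seq 40, residues 121-180 of the query.
query_row = "SNKELIMKSYKKVLNKDMNDFILLKSKELFNFFEKNGIKNEEEEDDDDDGNKSIIINNTR"
sbjct_row = "SNKELIMKSYKKVLNKDMNDFILLKSKELFNFFEKNGIKNE-------------IINNTR"

runs = gap_insertions(query_row, sbjct_row, query_start=121)
print(runs)  # [(162, 174, 'EEEDDDDDGNKSI')]
```

The exact boundaries of a gap next to repeated residues (here the run of I) depend on how the aligner places it, so the extracted segment can be shifted by a residue relative to the true exon boundary.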

 

>aa-ng1210        Contig_0437     pred cdna   pORF 7346         5011 strand r start y stop y ph 0

MDVNGPWSLPIIGGIYLINDNPNRALTKLSKKYGGIYKIWLGESFSMVVSDPEIVNEIWV

KQHDNFINRPKNITHKMFSSNYRSLNFGDNPNWKFNRSMASSHFTKTKLLSSKVTSVVEK

KLNKLIETMEYHSINKLPFDSYVGFSEYSLNIILNMLVSMDIDECENSTQNVIYSINEIF

KMLSTNSPQYSFPYLKFFFKKDLNNFKFHLDKIKSFIHSIYLKQLESYDPSNPRNILDSF

ISDLQSNDIDILLQICIDIVVAGTDTVANLLQWFVLFCINYPEIQEKLYNEIIEVVGKDC

KVLKYEHISKMPYLYGCFRESLRIRPVTPLSLPRVAKCDTYIKDDIFIPKGATIIQNIFG

MGNDEKYISEPNKFKPERWVEYIKNKKVNKNGNENSVNKYFNDLDKISIPFGVGKRQCLS

PAMAEQESLLSIATVVLNYKLKSNGQKKLNEKEVYSITIKPQPFKLFLEKRFVRKRV

 

Note: ng1210 is missing the N-terminal exon and some of exon 2. Compare to 520A1.

One amino acid diff. Y vs. *

 

Stop codon in JC1c13f09.s1, Y in Contig_0437; correct seq to Y.

Use my sequence for N-terminal

 

>CYP520B1 Seq 58 47% to seq 47 317 aa

MNYLLIIICIIFFSLFFDF

KIRKNWNLNFKRLFKKDVNGPWSLPIIGGIYLINDNPNRALTKLSKKYGGIYKIWLGESFSM

VVSDPEIVNEIWVKQHDNFINRPKNITHK

MFSSNYRSLNFGDNPNWKFNRSMASSHFTKTKLLSSK

VTSVVEKKLNKLIETMEYHSINKLP