Comparison of Dictyostelium assemblies from genome annotation and my annotation.
Attempts are made to reconcile the differences in the assemblies.
D. Nelson
May 13, 2003
I have 55 sequences identified
51 = seq 19  ng7190  use ng7190
508A1 = seq 1+2  ng3018  missing N-term exon; use my seq
508A2 = seq 22+45+69+86+88  ng5262  N-term incorrect; use my seq
508A3 = seq 21+67+72+82+87  ng10949  correct the two diffs and add my N-term
508A4 = seq 66+70  ng4360  remove K from mine at boundary, then use mine
508B1 = seq 15+77  ng6400  missing most of N-term; use my seq
508B2P = seq 60
508C1 = seq 4+23+24+83  ng1934  missing three parts; use my seq
508D1 = seq 59+81  ng1935  missing N-terminal exon; use my seq
508E1 = seq 71+85  ng8897  use my seq; correct two other defects
513A1 = seq 8+53  ng6889  N-term missing; use my seq
513A2P = seq 38  ng12344  correct 1 aa diff; use my seq
513A3 = seq 7+50  ng4838  add missing intron back; use my seq
513B1 = seq 49  ng627  2 aa diffs; use ng627 seq
513C1 = seq 5+56  ng6446  100% match
513D1 = seq 54+55+78  ng11186
513E1 = seq 51+52+90  ng3101  use N instead of D
513E2P = seq 37a  complete except for missing N-term exon; 55% to 513E1 (old name 513C1P)  ng1289  missing first 13 aa; use my seq
513E3P = seq 37b  upstream of 513E2P, only 429 bp between them (old name 513C2P)
513F1 = seq 57
513G1P = seq 41  37% to CYP513A1
514A1 = seq 11  ng1282  add 7 aa to exon 1 and change 1 aa L to F in ng1282
514A2 = seq 65  ng1006  joint at N-term exon does not agree, 2 aa diffs inside; use ng1006 seq
514A3P = seq 89
514A4 = new seq  ng2792  7 aa diffs to 514A1
515A1 = seq 79b  ng4137  some parts of both seqs have support
515A2P = seq 79a
515B1 = seq 6  ng2647  N-terms do not match; use ng2647 but extend back
516A1 = seq 9+43  ng9107  use ng9107
516A2P = seq 64
516B1 = seq 10+42  ng9290  use mine but add insertion
516B2P = seq 48  partial, might be a pseudogene
517A1 = seq 3+61  ng4501  use this seq with my N-terminal
517A2 = seq 74  ng10310  correct 3 aa diffs and use my seq
517A3P = seq 36  ng682  missing some seq; use my seq, add new C-term piece
517A4 = new seq  ng5440  most like 517A1, 13 aa diffs
518A1 = seq 20
518A2P = seq 25+27+29+30
518B1 = seq 26+28
519A1 = seq 12  ng8085  add N-term exon, then use ng8085
519B1 = seq 35+44+75  ng6320  use ng6320
519C1 = seq 31+32+33  ng5581  use my seq (N-term missing)
519C2P = seq 46
519D1 = seq 14+34  ng12432  use my seq
519E1 = seq 17+84  ng6352  add two short pieces; use ng6352
520A1 = seq 47+68+80  ng8864  use this seq with my N-term
520B1 = seq 58  ng1210  correct stop codon; use my seq
521A1 = seq 13  ng2405  100% match
522A1 = seq 18
519H1P = seq 39+73  ng11335  (old name 523A1) correct three aa diffs; add my N-term
524A1 = seq 91  ng752  N-terms do not agree; use my seq
525A1 = seq 62
554A1 = seq 40  ng1193  small diffs; use ng1193 seq
555A1 = seq 76  ng10297  use ng10297
556A1 = seq 92  ng10943  use my seq
Dicty database did not identify the following as P450s: 508B2P, 513E2P, 513F1, 513G1P, 515A2P, 516A2P, 516B2P, 518A1, 518A2P, 518B1, 519C2P, 522A1, 525A1 (13 seqs).
Note: 518A1, 518B1 and 522A1 have non-standard heme signatures.
Dicty database found two new genes I had missed, 514A4 and 517A4, both very similar to existing genes.
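The entries in the list above follow a loose pattern: a CYP name, the numbers of my sequence fragments joined by "+", an optional ng gene model ID, and a free-text note. A minimal Python sketch for splitting one entry into those fields is given here. The field names (name, parts, ng, note) and the function parse_entry are labels made up for this sketch, and an ng ID that appears later in the note (as in the 513E2P entry) is left inside the note.

```python
import re

# Pattern for entries such as
# "508A2 = seq 22+45+69+86+88 ng5262 N-term incorrect use my seq".
# Group names are labels chosen for this sketch, not part of the notes.
ENTRY = re.compile(
    r"^(?P<name>\S+)\s*=\s*(?:new\s+)?seq\s*"
    r"(?P<parts>\d+[ab]?(?:\+\d+[ab]?)*)?\s*"
    r"(?P<ng>ng\d+)?\s*(?P<note>.*)$"
)


def parse_entry(line):
    """Split one list entry into name, fragment numbers, ng ID and note."""
    m = ENTRY.match(line.strip())
    if m is None:
        return None
    parts = m.group("parts")
    return {
        "name": m.group("name"),
        "parts": parts.split("+") if parts else [],
        "ng": m.group("ng"),          # None when no gene model is listed
        "note": m.group("note").strip(),
    }


if __name__ == "__main__":
    print(parse_entry("508A2 = seq 22+45+69+86+88 ng5262 N-term incorrect use my seq"))
    print(parse_entry("508B2P = seq 60"))
```

Entries without an ng ID, such as 508B2P, come back with ng set to None, which makes it easy to list the sequences that the Dicty database did not identify.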
>514A2
>aa-ng1006 Contig_0356 pred cdna pORF 5256 3526 strand r start y stop y ph 0
MNLIYTIILTIIILVLIISIKDLFFEDKIKKINKSIPSPPTIPIFGNLLQINSKDVATCF
NDFYKQYGKVYRLRLGSVETVVLTGGDIIDECFNKKYRDFLKARYVKFSRYLGKDTNILH
SNGDYHFLLKGVLSSQVTVRKLNNGRLEFNKYILQMFNNLNNNDEGSTMFLANDVPSQIK
KLILKVVLNFTLGIEENDDINLSLFQNGSNIFKAAGLFIYSDYLPFLFPLDIKSMAKSNM
ISSYVFVRDYLAKKLEEVKKKEYIINGDDDGGVDTSQTPLIESYYKLYLQGLIGYDSILL
SIVDIIIASVDTTSNSISFIIARLTNHQEIQSKIYEEIMSNDINNNSNNISFSDHSKYPY
IISIMNETYRYYASVPLPEPNMTTEDIEVDGYKIAKGTQIYKNIRGTLISKEFWGEDALE
FKPERFKTQTLNQKGLLHFGAGPRGCPGARFTECFFFTLMVLLFKNYKLQNPNDNPIDDR
GDVGLSMQCKPYDALFIKRN
JC1c209h10.r1 has FL not CF
JC3a120c12.s1 has ?F not FL
Contig_0356 has CF not FL; use CF
Query: 1    MNLIYTIILTIIILVLIISIKDLFFEDKIKKINKSIPSPPTIPIFGNLLQINSKDVATCF 60
            MNLIYTIILTIIIL       DLFFEDKIKKINKSIPSPPTIPIFGNLLQINSKDVAT
Sbjct: 1    MNLIYTIILTIIIL-------DLFFEDKIKKINKSIPSPPTIPIFGNLLQINSKDVATFL 53
Query: 61   NDFYKQYGKVYRLRLGSVETVVLTGGDIIDECFNKKYRDFLKARYVKFSRYLGKDTNILH 120
            NDFYKQYGKVYRLRLGSVETVVLTGGDIIDECFNKKYRDFLKARYVKFSRYLGKDTNILH
Sbjct: 54   NDFYKQYGKVYRLRLGSVETVVLTGGDIIDECFNKKYRDFLKARYVKFSRYLGKDTNILH 113
Query: 121  SNGDYHFLLKGVLSSQVTVRKLNNGRLEFNKYILQMFNNLNNNDEGSTMFLANDVPSQIK 180
            SNGDYHFLLKGVLSSQVTVRKLNNGRLEFNKYILQMFNNLNNNDEGSTMFLANDVPSQIK
Sbjct: 114  SNGDYHFLLKGVLSSQVTVRKLNNGRLEFNKYILQMFNNLNNNDEGSTMFLANDVPSQIK 173
Query: 181  KLILKVVLNFTLGIEENDDINLSLFQNGSNIFKAAGLFIYSDYLPFLFPLDIKSMAKSNM 240
            KLILKVVLNFTLGIEENDDINLSLFQNGSNIFKAAGLFIYSDYLPFLFPLDIKSMAKSNM
Sbjct: 174  KLILKVVLNFTLGIEENDDINLSLFQNGSNIFKAAGLFIYSDYLPFLFPLDIKSMAKSNM 233
Query: 241  ISSYVFVRDYLAKKLEEVKKKEYIINGDDDGGVDTSQTPLIESYYKLYLQGLIGYDSILL 300
            ISSYVFVRDYLAKKLEEVKKKEYIINGDDDGGVDTSQTPLIESYYKLYLQGLIGYDSILL
Sbjct: 234  ISSYVFVRDYLAKKLEEVKKKEYIINGDDDGGVDTSQTPLIESYYKLYLQGLIGYDSILL 293
Query: 301  SIVDIIIASVDTTSNSISFIIARLTNHQEIQSKIYEEIMSNDINNNSNNISFSDHSKYPY 360
            SIVDIIIASVDTTSNSISFIIARLTNHQEIQSKIYEEIMSNDINNNSNNISFSDHSKYPY
Sbjct: 294  SIVDIIIASVDTTSNSISFIIARLTNHQEIQSKIYEEIMSNDINNNSNNISFSDHSKYPY 353
Query: 361  IISIMNETYRYYASVPLPEPNMTTEDIEVDGYKIAKGTQIYKNIRGTLISKEFWGEDALE 420
            IISIMNETYRYYASVPLPEPNMTTEDIEVDGYKIAKGTQIYKNIRGTLISKEFWGEDALE
Sbjct: 354  IISIMNETYRYYASVPLPEPNMTTEDIEVDGYKIAKGTQIYKNIRGTLISKEFWGEDALE 413
Query: 421  FKPERFKTQTLNQKGLLHFGAGPRGCPGARFTECFFFTLMVLLFKNYKLQNPNDNPIDDR 480
            FKPERFKTQTLNQKGLLHFGAGPRGCPGARFTECFFFTLMVLLFKNYKLQNPNDNPIDDR
Sbjct: 414  FKPERFKTQTLNQKGLLHFGAGPRGCPGARFTECFFFTLMVLLFKNYKLQNPNDNPIDDR 473
Query: 481  GDVGLSMQCKPYDALFIKRN 500
            GDVGLSMQCKPYDALFIKRN
Sbjct: 474  GDVGLSMQCKPYDALFIKRN 493
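The first alignment row for 514A2 contains a run of gap characters in the subject, which is where the two assemblies disagree at the N-terminal exon joint. A small Python sketch for pulling out the query residues that sit opposite such gaps is given here. The function name missing_segments is made up for this sketch, and it assumes the two rows are already aligned column by column as in the report.

```python
import re


def missing_segments(query_row, sbjct_row, query_start=1):
    """Return (start, end, residues) for query residues aligned to subject gaps.

    Positions are 1-based query coordinates, as in the alignment report.
    """
    if len(query_row) != len(sbjct_row):
        raise ValueError("aligned rows must have equal length")
    segments = []
    for m in re.finditer(r"-+", sbjct_row):
        # Count only real residues before the gap to get the query coordinate.
        residues_before = len(query_row[:m.start()].replace("-", ""))
        residues = query_row[m.start():m.end()].replace("-", "")
        start = query_start + residues_before
        segments.append((start, start + len(residues) - 1, residues))
    return segments


if __name__ == "__main__":
    query = "MNLIYTIILTIIILVLIISIKDLFFEDKIKKINKSIPSPPTIPIFGNLLQINSKDVATCF"
    sbjct = "MNLIYTIILTIIIL-------DLFFEDKIKKINKSIPSPPTIPIFGNLLQINSKDVATFL"
    print(missing_segments(query, sbjct))
```

For this row the sketch reports the seven residues VLIISIK at query positions 15 to 21, which are present in aa-ng1006 but absent from the subject.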
>CYP513B1
>aa-ng627 Contig_0257 pred cdna pORF 5998 4427 strand r start y stop y ph 0
MNLLVLSVILAIIIYLIFKRNYKYSPSKINSKIPGPIGLPIFGNILSLDNKNGIHTTFQK
WFKIYGPIYSVNMGNKSAVVLTGFPIIKKAFIDNSEAFAPHYTFESRYKLNKCSDITQEN
GKNQSALKRIFLSELTVTRIKKQESHIQNEIVKLMKVLDKHSEDGKPFLLNNYFSMFSIN
IISRFLFGIDFPYQDFEETSDLMVGIRDLLIASGEIVLSDFLPIPHSKRSKLYTSYQALV
VQIETLVKSHKYKEDDECMLSKLMIEHDKGNIPWDAVISNCNTIITAGSDSTSSTALFFL
IEMMNNPTIQTKVYNDIVVSFEQNQQADDYMNESMVILKYSKYRSLIPYLSLALKENYRK
HPAAPFGAPHETTQETVIEGYTIAKGTMIFQNIYATQRSDTFYSQPDEFIPERWNGDENS
QTLISFGTGIRDCIGKSLAYNEIFTIIASVLNRYEFINPNPSIPFDDNGIPGLTTQCKNT
VVQIKKR
Note: only one seq has L and C. The others all have S and W, so correct my seq.
>CYP513B1 seq 49 complete seq 45% to seq 7, 50
Length = 488
Score = 2504 (881.5 bits), Expect = 4.8e-264, P = 4.8e-264
Identities = 485/487 (99%), Positives = 485/487 (99%)
Query: 1
MNLLVLSVILAIIIYLIFKRNYKYSPSKINSKIPGPIGLPIFGNILSLDNKNGIHTTFQK 60
MNLLVLSVILAIIIYLIFKRNYKYSPSKINSKIPGPIGLPIFGNILSLDNKNGIHTTFQK
Sbjct: 1
MNLLVLSVILAIIIYLIFKRNYKYSPSKINSKIPGPIGLPIFGNILSLDNKNGIHTTFQK 60
Query: 61
WFKIYGPIYSVNMGNKSAVVLTGFPIIKKAFIDNSEAFAPHYTFESRYKLNKCSDITQEN 120
WFKIYGPIYSVNMGNKSAVVLTGFPIIKKAFIDNSEAFAPHYTFESRYKLNKCSDITQEN
Sbjct: 61
WFKIYGPIYSVNMGNKSAVVLTGFPIIKKAFIDNSEAFAPHYTFESRYKLNKCSDITQEN 120
Query: 121
GKNQSALKRIFLSELTVTRIKKQESHIQNEIVKLMKVLDKHSEDGKPFLLNNYFSMFSIN 180
GKNQSALKRIFLSELTVTRIKKQESHIQNEIVKLMKVLDKHSEDGKPFLLNNYFSMFSIN
Sbjct: 121
GKNQSALKRIFLSELTVTRIKKQESHIQNEIVKLMKVLDKHSEDGKPFLLNNYFSMFSIN 180
Query: 181
IISRFLFGIDFPYQDFEETSDLMVGIRDLLIASGEIVLSDFLPIPHSKRSKLYTSYQALV 240
IISRFLFGIDFPYQDFEETSDLMVGIRDLLIASGEIVLSDFLPIPHSKRSKLYTSYQALV
Sbjct: 181
IISRFLFGIDFPYQDFEETSDLMVGIRDLLIASGEIVLSDFLPIPHSKRSKLYTSYQALV 240
Query: 241
VQIETLVKSHKYKEDDECMLSKLMIEHDKGNIPWDAVISNCNTIITAGSDSTSSTALFFL 300
VQIETLVKSHKYKEDDECML KLMIEHDKGNIP DAVISNCNTIITAGSDSTSSTALFFL
Sbjct: 241
VQIETLVKSHKYKEDDECMLLKLMIEHDKGNIPCDAVISNCNTIITAGSDSTSSTALFFL 300
Query: 301
IEMMNNPTIQTKVYNDIVVSFEQNQQADDYMNESMVILKYSKYRSLIPYLSLALKENYRK 360
IEMMNNPTIQTKVYNDIVVSFEQNQQADDYMNESMVILKYSKYRSLIPYLSLALKENYRK
Sbjct: 301 IEMMNNPTIQTKVYNDIVVSFEQNQQADDYMNESMVILKYSKYRSLIPYLSLALKENYRK 360
Query: 361
HPAAPFGAPHETTQETVIEGYTIAKGTMIFQNIYATQRSDTFYSQPDEFIPERWNGDENS 420
HPAAPFGAPHETTQETVIEGYTIAKGTMIFQNIYATQRSDTFYSQPDEFIPERWNGDENS
Sbjct: 361
HPAAPFGAPHETTQETVIEGYTIAKGTMIFQNIYATQRSDTFYSQPDEFIPERWNGDENS 420
Query: 421
QTLISFGTGIRDCIGKSLAYNEIFTIIASVLNRYEFINPNPSIPFDDNGIPGLTTQCKNT 480
QTLISFGTGIRDCIGKSLAYNEIFTIIASVLNRYEFINPNPSIPFDDNGIPGLTTQCKNT
Sbjct: 421
QTLISFGTGIRDCIGKSLAYNEIFTIIASVLNRYEFINPNPSIPFDDNGIPGLTTQCKNT 480
Query: 481 VVQIKKR 487
VVQIKKR
Sbjct: 481 VVQIKKR 487
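The two amino acid differences in CYP513B1 (S versus L, and W versus C) both fall in the row that starts at query position 241. A minimal Python sketch that lists mismatched columns with their query coordinates is given here. The function name mismatches is made up for this sketch, and it assumes an ungapped row, which holds for this alignment.

```python
def mismatches(query_row, sbjct_row, query_start):
    """List (query_position, query_residue, subject_residue) for differing columns.

    Assumes the two rows are the same length and contain no gap characters.
    """
    if len(query_row) != len(sbjct_row):
        raise ValueError("aligned rows must have equal length")
    return [
        (query_start + i, q, s)
        for i, (q, s) in enumerate(zip(query_row, sbjct_row))
        if q != s
    ]


if __name__ == "__main__":
    # Row 241-300: query is aa-ng627, subject is my seq 49.
    query = "VQIETLVKSHKYKEDDECMLSKLMIEHDKGNIPWDAVISNCNTIITAGSDSTSSTALFFL"
    sbjct = "VQIETLVKSHKYKEDDECMLLKLMIEHDKGNIPCDAVISNCNTIITAGSDSTSSTALFFL"
    for pos, q, s in mismatches(query, sbjct, 241):
        print(pos, q, s)
```

With these rows the differences come out at positions 261 (S in ng627, L in my seq) and 274 (W in ng627, C in my seq), which accounts for the 485/487 identities.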
>517A3P
MEIVNV (frameshift)
FIILIILFLVKDF (0)
VKKNKKIHTKSPSGPIAFPILGNVVQIRFWELFKIQEHEL
(10 aa deletion)
IVRAWIGE
RLFLFVSNYDVKYFQKDENFLYKPSLLVPGWRYASSNGLGVMSSSDDEWKRAKSS
VS**LRVHTSKKLMEEKAIEFIDSLEKISNNNEI (0)
FYPKGHIQGYACSMLFKYMFNQDLSVESGLSRTIGNAVEHVFGNLSKLTAFDCFE
IFSPLYDWFFTRRL*DCDIV*QIINILTENH
This is a pseudogene, so the sequences in green need to be added.
Ng682 has an extension that I did not have that needs to be added to my seq.
The seq similarity stops after LCSNG, so I removed the last two aa.
>aa-ng682 Contig_0274 pred cdna pORF 17004 15801 strand r start y stop y ph 0
MEIVNVKKNKKIHTKSPSGPIAFPILGNVVQIRFWELFKIQEHELIVRAWIGERLFLFVS
NYDVKYFQKDENFLYKPSLLVPGWRYASSNGLGVMSSSDDEWKRAKSS
FYPKGHIQGYACSMLFKYMFNQDLSVESGLSRTIGNAVEHVFGNLSKLTAFDCFEIFSPLYDWFFTRRL
KGCDIVRQIISSQNENHFKSIDPSKPRDLMDDLLIEYGLNEITKEDTMKINQICFDIFGPAIG
TVTITMNWVILQLCNRPEPQEIAYLEIKKAVKDDEYVNFNHKQNAPYIVAFIKETMRLCS
NGPN
>CYP517A3P Seq 36 pseudogene 77% to seq 3 77% to seq 61
Length = 242
Score = 546 (192.2 bits), Expect = 8.8e-99, Sum P(2) = 8.8e-99
Identities = 103/103 (100%), Positives = 103/103 (100%)
Query: 6    VKKNKKIHTKSPSGPIAFPILGNVVQIRFWELFKIQEHELIVRAWIGERLFLFVSNYDVK 65
            VKKNKKIHTKSPSGPIAFPILGNVVQIRFWELFKIQEHELIVRAWIGERLFLFVSNYDVK
Sbjct: 20   VKKNKKIHTKSPSGPIAFPILGNVVQIRFWELFKIQEHELIVRAWIGERLFLFVSNYDVK 79
Query: 66   YFQKDENFLYKPSLLVPGWRYASSNGLGVMSSSDDEWKRAKSS 108
            YFQKDENFLYKPSLLVPGWRYASSNGLGVMSSSDDEWKRAKSS
Sbjct: 80   YFQKDENFLYKPSLLVPGWRYASSNGLGVMSSSDDEWKRAKSS 122
Score = 418 (147.1 bits), Expect = 8.8e-99, Sum P(2) = 8.8e-99
Identities = 79/86 (91%), Positives = 80/86 (93%)
Query: 109  FYPKGHIQGYACSMLFKYMFNQDLSVESGLSRTIGNAVEHVFGNLSKLTAFDCFEIFSPL 168
            FYPKGHIQGYACSMLFKYMFNQDLSVESGLSRTIGNAVEHVFGNLSKLTAFDCFEIFSPL
Sbjct: 157  FYPKGHIQGYACSMLFKYMFNQDLSVESGLSRTIGNAVEHVFGNLSKLTAFDCFEIFSPL 216
Query: 169  YDWFFTRRLKGCDIVRQIISSQNENH 194
            YDWFFTRRL  CDIV QII+   ENH
Sbjct: 217  YDWFFTRRL*DCDIV*QIINILTENH 242
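The "Identities" line of each alignment can be checked directly from the aligned rows. A minimal Python sketch is given here for the second segment of the CYP517A3P comparison (query 109 to 194). The function name identity_stats is made up for this sketch. The integer division reflects an observation from the numbers in these reports, where 79/86 (91.9%) is printed as 91%, so the percentages appear to be truncated rather than rounded; that is an inference from the figures, not a documented property of the search program.

```python
def identity_stats(rows):
    """Return (identical, aligned_length, truncated_percent) for one segment.

    rows is a list of (query_row, sbjct_row) pairs taken from the report.
    """
    identical = 0
    total = 0
    for query_row, sbjct_row in rows:
        if len(query_row) != len(sbjct_row):
            raise ValueError("aligned rows must have equal length")
        total += len(query_row)
        identical += sum(q == s for q, s in zip(query_row, sbjct_row))
    # Integer division truncates, matching the percentages seen in the reports.
    return identical, total, identical * 100 // total


if __name__ == "__main__":
    row1 = "FYPKGHIQGYACSMLFKYMFNQDLSVESGLSRTIGNAVEHVFGNLSKLTAFDCFEIFSPL"
    rows = [
        (row1, row1),  # query 109-168 and subject 157-216 are identical
        ("YDWFFTRRLKGCDIVRQIISSQNENH", "YDWFFTRRL*DCDIV*QIINILTENH"),
    ]
    print(identity_stats(rows))
```

The seven mismatched columns in the last row are the two stop codons in my pseudogene sequence plus the five residues that differ near the end.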
>CYP524A1
MKTPTKYFIIFILLAALAVF (0)
KGSLPGPSFVPPFFGMLFQLIFTPFSFYEKQEKYGPISWTSIMNKF
VLFVTDAEINRQVFKEENAKLYLSLGAKKILTEK
AIPFIEGAP
HRQLRKQLLPLFTIRALSSYLPIQESIVDEHIAMWIKNGKADINARNNCRDLNMAISTGVFV
GNNTPESVRDDIAKNFFVMNEGFLCLPIDLPGTTLRKAINARVRLVEIFTDIIAKSRKRMGDGE
KPQSLIDLWVEHFLNCPKEERDELSNDTIIFTLLSFMFASQDALTS
SLVWTVQLMAEHPDILAKVRAEQASLRPNNEKLDLDTMRQATYTRMVVSEILRFRPPAVMVPHE
NIEDIVIGDNVHVPKGTMILPSIWSAHFQEGGYSDPYKFDPQRFDSVRKEDVTCAKNSLVFGAG
PHFCIGKELAKNQIEVFLTKLAMSTEWTHNKTPGGDEIIFGPTIFPKDGCNITIKARN*
Note: two exon extensions occur in ng752. VSEAT to KLIVA extends exon 1 downstream.
VGKFT to ITQK extends exon 2 upstream. This makes the N-term too long.
The N-terminal region up to the PFFGML motif is usually 30-50 aa long, but in ng752 it is 96 aa long. I include the sequence VLFV to LTEK, which ng752 removes as an intron.
I think my solution is better, since it preserves the length between the PFFGML motif and the C-helix motif (HRQLRK) as compared to the Arabidopsis CYP710A1 seq and the CYP26B1 from Fugu (see aligns below, deleted intron shown in red).
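The N-terminal length argument can be checked by counting the residues that precede the PFFGML motif in each version of CYP524A1. A minimal Python sketch is given here. The function name residues_before_motif and the variable names are made up for this sketch; the 30-50 aa range is the rule of thumb stated in the note, and the sequences are the N-terminal parts of my assembly and of aa-ng752 as listed in these notes.

```python
MOTIF = "PFFGML"
TYPICAL_RANGE = (30, 50)  # usual N-terminal length before the motif, per the note

# N-terminal part of my CYP524A1 assembly.
my_nterm = (
    "MKTPTKYFIIFILLAALAVF"
    "KGSLPGPSFVPPFFGMLFQLIFTPFSFYEKQEKYGPISWTSIMNKF"
)

# N-terminal part of aa-ng752, including the two exon extensions.
ng752_nterm = (
    "MKTPTKYFIIFILLAALAVF"
    "VSEATSKVGQQTTTTTQQVKCSGLQCTLNKLIVA"
    "VGKFTIKQILVGIVIVLATIALHQQYVITQK"
    "KGSLPGPSFVP"
    "PFFGML"
)


def residues_before_motif(seq, motif=MOTIF):
    """Number of residues that precede the first occurrence of the motif."""
    index = seq.find(motif)
    if index < 0:
        raise ValueError("motif not found")
    return index


def in_typical_range(n, bounds=TYPICAL_RANGE):
    return bounds[0] <= n <= bounds[1]


if __name__ == "__main__":
    for label, seq in (("my seq", my_nterm), ("ng752", ng752_nterm)):
        n = residues_before_motif(seq)
        print(label, n, in_typical_range(n))
```

My assembly has 31 residues before the motif, inside the usual range, while ng752 has 96, which agrees with the figure given in the note.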
>aa-ng752 Contig_0287 pred cdna pORF 3086 1351 strand r start y stop y ph 0
MKTPTKYFIIFILLAALAVF
VSEATSKVGQQTTTTTQQVKCSGLQCTLNKLIVA
VGKFTIKQILVGIVIVLATIALHQQYVITQK
KGSLPGPSFVP
PFFGML
FQLIFTPFSFYEKQEKYGPISWTSIMNKF
AIPFIEGAP HRQLRKQLLPLFTIRALSSYLPIQESIVDEHIAMWIKNGK
ADINARNNCRDLNMAISTGVFVGNNTPESVRDDIAKNFFVMNEGFLCLPIDLPGTTLRKA
INARVRLVEIFTDIIAKSRKRMGDGEKPQSLIDLWVEHFLNCPEEERDELSNDTIIFTLL
SFMFASQDALTSSLVWTVQLMAEHPDILAKVRAEQASLRPNNEKLDLDTMRQATYTRMVV
SEILRFRPPAVMVPHENIEDIVIGDNVHVPKGTMILPSIWSAHFQEGGYSDPYKFDPQRF
DSVRKEDVTCAKNSLVFGAGPHFCIGKELAKNQIEVFLTKLAMSTEWTHNKTPGGDEIIF
GPTIFPKDGCNITIKARN
>CYP710A1 (subject) CYP524A1 = query
Length = 496
Score = 780 (274.6 bits), Expect = 1.3e-80, P = 1.3e-80
Identities = 167/469 (35%), Positives = 267/469 (56%)
Query: 8
FIIFILLAALA-VFKG-SLPGPSFVPPFFGMLFQLIFTPFSFYEKQEKYGPISWTS---I
62
F++F+L+ L+ +FK ++PGP FVPP G
L+ P SF++KQ IS S +
Sbjct: 19 FLLFLLVEQLSYLFKKRNIPGPFFVPPIIGNAVALVRDPTSFWDKQSSTANISGLSANYL
78
Query: 63 MNKFVLFVTDAEINRQVFKEENAKLYLSLG---AKKILTEKAIPFIEGAPHRQLRKQLLP 119
+ KF++++ D E++ Q+F + +G
KK+ + + ++ G H+ +R+QL P
Sbjct: 79
IGKFIVYIRDTELSHQIFSNVRPDAFHLIGHPFGKKLFGDHNLIYMFGEDHKSVRRQLAP 138
Query: 120
LFTIRALSSYLPIQESIVDEHIAMW---IKNGKADINARNNCRDLNMAISTGVFVGNNTP 176
FT +ALS+Y +Q+ ++ H+ W G ++ R
R+LN+ S VFVG
Sbjct: 139
NFTPKALSTYSALQQLVILRHLRQWEGSTSGGSRPVSLRQLVRELNLETSQTVFVGPYLD 198
Query: 177
ESVRDDIAKNFFVMNEGFLCLPIDLPGTTLRKAINARVRLVEIFTDIIAKSRKRMGDGEK 236
+ ++ ++ + N G + LPIDLPG +A A
RL E
KS+ RM GE+
Sbjct: 199
KEAKNRFRTDYNLFNLGSMALPIDLPGFAFGEARRAVKRLGETLGICAGKSKARMAAGEE 258
Query: 237 PQSLIDLWVEHFLNCPKEERDELSNDTIIFTLL-SFMFASQDALTSSLVWTVQLMAEHPD
295
P LID W++ +
+ + S D I LL F+FA+QDA TSSL+W V L+ P+
Sbjct: 259
PACLIDFWMQAIV--AENPQPPHSGDEEIGGLLFDFLFAAQDASTSSLLWAVTLLDSEPE 316
Query: 296
ILAKVRAEQASL-RP-NNEKLDLDTMRQATYTRMVVSEILRFRPPAVMVPHENIEDIVIG 353
+L +VR E A + P +N + +D + + YTR V E++R+RPPA
MVPH D +
Sbjct: 317
VLNRVREEVAKIWSPESNALITVDQLAEMKYTRSVAREVIRYRPPATMVPHVAAIDFPLT 376
Query: 354
DNVHVPKGTMILPSIWSAHFQEGGYSDPYKFDPQRFDSVRKEDVTCAKNSLVFGAGPHFC 413
+ +PKGT++ PS++ + FQ G+++P +FDP RF R+ED
+N L FG GPH C
Sbjct: 377
ETYTIPKGTIVFPSVFDSSFQ--GFTEPDRFDPDRFSETRQEDQVFKRNFLAFGWGPHQC 434
Query: 414
IGKELAKNQIEVFLTKLAMSTEWTHNKTPGGDEIIFGPTIFPKDGCNITIKAR 466
+G+ A N + +F+ + ++ ++ G DEI++ PTI PKDGC + + R
Sbjct: 435
VGQRYALNHLVLFIAMFSSLLDFKRLRSDGCDEIVYCPTISPKDGCTVFLSRR 487
Fugu CYP26B1
>CYP26B1 Scaffold_4267 Scaffold_49 partial sequence 74% to 26B1
Length = 513
Score = 355 (125.0 bits), Expect = 3.4e-36, P = 3.4e-36
Identities = 122/440 (27%), Positives = 195/440 (44%)
Query: 24 LPGPSFVPPFFGMLFQLIFTPFSFY-EKQEKYGPISWTSIMNKFVLFVTDAEINRQVFKE 82
+P S PF G + F+
+++KYG + T ++ + ++ VT AE R+V
Sbjct: 49
MPKGSMGFPFIGETCHWLLQGSGFHASRRQKYGNVFKTHLLGRPLIRVTGAENIRKVLMG 108
Query: 83 ENAKLYLSL--GAKKILTEKAIPFIEGAPHRQLRKQLLPLFTIRALSSYLPIQESIVDEH
140
E+ + + +L ++ G
HR+ RK +F+ AL SYLP + ++ E
Sbjct: 109
EHTLVTVDWPQSTSTLLGPNSLANSIGDIHRKKRKVFAKVFSHEALESYLPKIQQVIQES 168
>CYP524A1 Seq 91 complete seq 25% to 10, 42 468 aa
Length = 468
Score = 1926 (678.0 bits), Expect = 1.1e-236, Sum P(3) = 1.1e-236
Identities = 371/388 (95%), Positives = 376/388 (96%)
Query: 1
MKTPTKYFIIFILLAALAVF 20
MKTPTKYFIIFILLAALAVF
Sbjct: 1
MKTPTKYFIIFILLAALAVF 20
Query: 86
KGSLPGPSFVPPFFGMLFQLIFTPFSFYEKQEKYGPISWTSIMNKFAIPFIEGAPHRQLR 145
KGSLPGPSFVPPFFGMLFQLIFTPFSFYEKQEKYGPISWTSIMNKF + + +RQ+
Sbjct: 21
KGSLPGPSFVPPFFGMLFQLIFTPFSFYEKQEKYGPISWTSIMNKFVLFVTDAEINRQVF 80
Query: 146 KQ 147
K+
Sbjct: 81 KE 82
Query: 112
FYEKQEK-YGPISWTSIMNKFAIPFIEGAPHRQLRKQLLPLFTIRALSSYLPIQESIVDE 170
F E+ K Y +
I+ + AIPFIEGAPHRQLRKQLLPLFTIRALSSYLPIQESIVDE
Sbjct: 80 FKEENAKLYLSLGAKKILTEKAIPFIEGAPHRQLRKQLLPLFTIRALSSYLPIQESIVDE 139
Query: 171
HIAMWIKNGKADINARNNCRDLNMAISTGVFVGNNTPESVRDDIAKNFFVMNEGFLCLPI 230
HIAMWIKNGKADINARNNCRDLNMAISTGVFVGNNTPESVRDDIAKNFFVMNEGFLCLPI
Sbjct: 140 HIAMWIKNGKADINARNNCRDLNMAISTGVFVGNNTPESVRDDIAKNFFVMNEGFLCLPI 199
Query: 231
DLPGTTLRKAINARVRLVEIFTDIIAKSRKRMGDGEKPQSLIDLWVEHFLNCPEEERDEL 290
DLPGTTLRKAINARVRLVEIFTDIIAKSRKRMGDGEKPQSLIDLWVEHFLNCP+EERDEL
Sbjct: 200
DLPGTTLRKAINARVRLVEIFTDIIAKSRKRMGDGEKPQSLIDLWVEHFLNCPKEERDEL 259
Query: 291
SNDTIIFTLLSFMFASQDALTSSLVWTVQLMAEHPDILAKVRAEQASLRPNNEKLDLDTM 350
SNDTIIFTLLSFMFASQDALTSSLVWTVQLMAEHPDILAKVRAEQASLRPNNEKLDLDTM
Sbjct: 260
SNDTIIFTLLSFMFASQDALTSSLVWTVQLMAEHPDILAKVRAEQASLRPNNEKLDLDTM 319
Query: 351
RQATYTRMVVSEILRFRPPAVMVPHENIEDIVIGDNVHVPKGTMILPSIWSAHFQEGGYS 410
RQATYTRMVVSEILRFRPPAVMVPHENIEDIVIGDNVHVPKGTMILPSIWSAHFQEGGYS
Sbjct: 320
RQATYTRMVVSEILRFRPPAVMVPHENIEDIVIGDNVHVPKGTMILPSIWSAHFQEGGYS 379
Query: 411 DPYKFDPQRFDSVRKEDVTCAKNSLVFGAGPHFCIGKELAKNQIEVFLTKLAMSTEWTHN 470
DPYKFDPQRFDSVRKEDVTCAKNSLVFGAGPHFCIGKELAKNQIEVFLTKLAMSTEWTHN
Sbjct: 380
DPYKFDPQRFDSVRKEDVTCAKNSLVFGAGPHFCIGKELAKNQIEVFLTKLAMSTEWTHN 439
Query: 471 KTPGGDEIIFGPTIFPKDGCNITIKARN 498
KTPGGDEIIFGPTIFPKDGCNITIKARN
Sbjct: 440 KTPGGDEIIFGPTIFPKDGCNITIKARN 467
>aa-ng1193 Contig_0430 pred cdna pORF 4315 2561 strand r start y stop y ph 0
MDLLLFIFFLILFYYSVKYYKADNQNSLSLSGPTPVPILGNIHQVGKDAHLTIPIISKKY
HGIFRMWLGGTYYVVVSDYKLIREMYIENFENFKNRIATFKTMTGDDSRGIIGCNGDIWD
SNKELIMKSYKKVLNKDMNDFILLKSKELFNFFEKNGIKNEEEEDDDDDGNKSIIINNTR
FYFQSLTLTVMFKMIFNENKSFQLYSDSTEFKLIFKTILNLLNSLNVYNVIYDFLGIFQP
ILLKFTKILDKNSFLSKIATEKFNSRIKEIDFTSDDFKANDLLDSLIMTINEDENGLNEK
QIENIKSICIDFLMAGTDTTGSTIEWIILKLVNNPEFQELIFQELKKLNKSEITANDKIN
TPFLNSFIKETNRLYPIAPLSLPRKSINEMIIGDNKYYIPANTNILMDVKGFSLDENNYK
DPNEFKPDRFLNSKVSDTLNFGIGPRNCIGQTIAMNQIHIFLSNLILNYRMFSIDCLPLP
ENLILSVSVRPTEYSLKLIKRV
>CYP554A1 SEQ 40 30% to 508 469 aa complete
MDLLLFIFFLILFYYS (0)
YYKADNQNSLSLSGPTPVPILGNIHQVGKDAHLTIPIISKKYHGLFRMWFGGTYYVVV
SDYKLIREMYIENFENFKNRIA
TFKTMTGDDSRGIIGCNGDIWDSNKELIMKSYKKVLNKDMNDFILLKSKELFNFFEKNGIKNE
(INTRON)
IINNTRFYFQSLTLT
VMFKMIFNENKSFQLYSDST
EFKLIFKTILNLLNSLNVYNVI
YDFLGIFQPILLKFTKILDKNSFLSKIA
TEKFNSRIKEIDFTSDDFKANDLLDSLIMTINEDEN
GLNEKQIENIKSICIDFLMAGTDTTGSTIEWIILKLVNNPEFQELIFQELK
KLNKSEITANDKINTPFLNSFIKETNRLY
(0)
KSNNNFFFRKSINEMIIGDNKYYIPANTNILMDVKGFSLDENNYKDPNEFKPDRFLNSKVSDTLNFGI
GPRNCIGQTIAMNQIHIFLSNLILNYRMFSIDCLPLPENLF*
1. VK there are two possible GTs for this exon boundary
2. IIACP2E4184 is only seq with LFRMWF so change this to IFRMWL
3. EEEDDDDDGNKS no good GT AG pairs in frame so leave this in
4. PIAPLSLP matches other related P450s in this region so this is correct
5. ILSVSVRPTEYSLKLIKRV adds correct C-terminal
Use the ng1193 seq as the correct seq.
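Point 2 above is a majority call: one read disagrees with the rest over a short window, so the window is changed to what the other reads show. A minimal Python sketch of that kind of column-by-column vote is given here. Only the read name IIACP2E4184 and the two window sequences (LFRMWF and IFRMWL) come from these notes; the other read names and the number of supporting reads are hypothetical placeholders, and the function names are made up for this sketch.

```python
from collections import Counter


def column_consensus(reads):
    """Majority residue per column for equal-length, already aligned windows."""
    windows = list(reads.values())
    if len({len(w) for w in windows}) != 1:
        raise ValueError("all windows must have the same length")
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*windows))


def outliers(reads):
    """Names of reads whose window differs from the consensus."""
    consensus = column_consensus(reads)
    return [name for name, window in reads.items() if window != consensus]


if __name__ == "__main__":
    reads = {
        "IIACP2E4184": "LFRMWF",          # the one read named in the notes
        "hypothetical_read_2": "IFRMWL",  # placeholder, not a real read name
        "hypothetical_read_3": "IFRMWL",  # placeholder, not a real read name
    }
    print(column_consensus(reads))
    print(outliers(reads))
```

A simple vote like this treats every read equally; it does not weigh read quality, which the notes do not describe.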
>CYP554A1 SEQ 40 30% to 508 469 aa complete
Length = 470
Score = 2323 (817.7 bits), Expect = 7.3e-245, P = 7.3e-245
Identities = 458/483 (94%), Positives = 459/483 (95%)
Query: 1
MDLLLFIFFLILFYYSVKYYKADNQNSLSLSGPTPVPILGNIHQVGKDAHLTIPIISKKY 60
MDLLLFIFFLILFYYS
YYKADNQNSLSLSGPTPVPILGNIHQVGKDAHLTIPIISKKY
Sbjct: 1
MDLLLFIFFLILFYYS--YYKADNQNSLSLSGPTPVPILGNIHQVGKDAHLTIPIISKKY 58
Query: 61
HGIFRMWLGGTYYVVVSDYKLIREMYIENFENFKNRIATFKTMTGDDSRGIIGCNGDIWD 120
HG+FRMW GGTYYVVVSDYKLIREMYIENFENFKNRIATFKTMTGDDSRGIIGCNGDIWD
Sbjct: 59
HGLFRMWFGGTYYVVVSDYKLIREMYIENFENFKNRIATFKTMTGDDSRGIIGCNGDIWD 118
Query: 121
SNKELIMKSYKKVLNKDMNDFILLKSKELFNFFEKNGIKNEEEEDDDDDGNKSIIINNTR 180
SNKELIMKSYKKVLNKDMNDFILLKSKELFNFFEKNGIKNE
IINNTR
Sbjct: 119
SNKELIMKSYKKVLNKDMNDFILLKSKELFNFFEKNGIKNE-------------IINNTR 165
Query: 181
FYFQSLTLTVMFKMIFNENKSFQLYSDSTEFKLIFKTILNLLNSLNVYNVIYDFLGIFQP 240
FYFQSLTLTVMFKMIFNENKSFQLYSDSTEFKLIFKTILNLLNSLNVYNVIYDFLGIFQP
Sbjct: 166
FYFQSLTLTVMFKMIFNENKSFQLYSDSTEFKLIFKTILNLLNSLNVYNVIYDFLGIFQP 225
Query: 241
ILLKFTKILDKNSFLSKIATEKFNSRIKEIDFTSDDFKANDLLDSLIMTINEDENGLNEK 300
ILLKFTKILDKNSFLSKIATEKFNSRIKEIDFTSDDFKANDLLDSLIMTINEDENGLNEK
Sbjct: 226
ILLKFTKILDKNSFLSKIATEKFNSRIKEIDFTSDDFKANDLLDSLIMTINEDENGLNEK 285
Query: 301
QIENIKSICIDFLMAGTDTTGSTIEWIILKLVNNPEFQELIFQELKKLNKSEITANDKIN 360
QIENIKSICIDFLMAGTDTTGSTIEWIILKLVNNPEFQELIFQELKKLNKSEITANDKIN
Sbjct: 286
QIENIKSICIDFLMAGTDTTGSTIEWIILKLVNNPEFQELIFQELKKLNKSEITANDKIN 345
Query: 361
TPFLNSFIKETNRLYPIAPLSLPRKSINEMIIGDNKYYIPANTNILMDVKGFSLDENNYK 420
TPFLNSFIKETNRLY
RKSINEMIIGDNKYYIPANTNILMDVKGFSLDENNYK
Sbjct: 346 TPFLNSFIKETNRLYKSNNNFFFRKSINEMIIGDNKYYIPANTNILMDVKGFSLDENNYK 405
Query: 421
DPNEFKPDRFLNSKVSDTLNFGIGPRNCIGQTIAMNQIHIFLSNLILNYRMFSIDCLPLP 480
DPNEFKPDRFLNSKVSDTLNFGIGPRNCIGQTIAMNQIHIFLSNLILNYRMFSIDCLPLP
Sbjct: 406
DPNEFKPDRFLNSKVSDTLNFGIGPRNCIGQTIAMNQIHIFLSNLILNYRMFSIDCLPLP 465
Query: 481 ENL 483
ENL
Sbjct: 466 ENL 468
>aa-ng1210 Contig_0437 pred cdna pORF 7346 5011 strand r start y stop y ph 0
MDVNGPWSLPIIGGIYLINDNPNRALTKLSKKYGGIYKIWLGESFSMVVSDPEIVNEIWV
KQHDNFINRPKNITHKMFSSNYRSLNFGDNPNWKFNRSMASSHFTKTKLLSSKVTSVVEK
KLNKLIETMEYHSINKLPFDSYVGFSEYSLNIILNMLVSMDIDECENSTQNVIYSINEIF
KMLSTNSPQYSFPYLKFFFKKDLNNFKFHLDKIKSFIHSIYLKQLESYDPSNPRNILDSF
ISDLQSNDIDILLQICIDIVVAGTDTVANLLQWFVLFCINYPEIQEKLYNEIIEVVGKDC
KVLKYEHISKMPYLYGCFRESLRIRPVTPLSLPRVAKCDTYIKDDIFIPKGATIIQNIFG
MGNDEKYISEPNKFKPERWVEYIKNKKVNKNGNENSVNKYFNDLDKISIPFGVGKRQCLS
PAMAEQESLLSIATVVLNYKLKSNGQKKLNEKEVYSITIKPQPFKLFLEKRFVRKRV
Note: ng1210 is missing the N-terminal exon and some of exon 2. Compare to 520A1.
One amino acid diff: Y vs. *
Stop codon in JC1c13f09.s1, Y in Contig_0437; correct seq to Y.
Use my sequence for the N-terminal.
>CYP520B1 Seq 58 47% to seq 47 317 aa
MNYLLIIICIIFFSLFFDF
KIRKNWNLNFKRLFKKDVNGPWSLPIIGGIYLINDNPNRALTKLSKKYGGIYKIWLGESFSM
VVSDPEIVNEIWVKQHDNFINRPKNITHK
MFSSNYRSLNFGDNPNWKFNRSMASSHFTKTKLLSSK
VTSVVEKKLNKLIETMEYHSINKLP