Drosophila P450 Links
What's New Jan. 27, 2000
All Drosophila melanogaster P450 protein sequences have been posted as a FASTA file:
All Drosophila melanogaster P450s
These sequences are all in GenBank. There are no confidential sequences left.
There are 86 P450 genes and 4 pseudogenes. CYP51 is absent. Since CYP51 is
also absent in C. elegans, this important eukaryotic sterol biosynthetic gene
may have been lost in the common ancestor of flies and nematode worms.
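For readers who want to check the record count in the posted FASTA file, here is a minimal Python sketch. It is not part of this site's own tooling. It assumes only that each record begins with a '>' header line. The file name in the commented-out call and the two example records are hypothetical.

```python
"""Minimal FASTA reader (illustrative sketch, not the site's own tooling)."""
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; a new record starts at each '>' line."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)  # a sequence may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)


def load_fasta(path: str) -> Dict[str, str]:
    """Read a FASTA file into a {header: sequence} dictionary."""
    with open(path) as handle:
        return dict(read_fasta(handle))


# For the real file, something like (hypothetical file name):
# records = load_fasta("drosophila_p450s.fasta")
example = [">CypA hypothetical", "MALLLAV", "LLGLL", "", ">CypB hypothetical", "MSLW"]
records = dict(read_fasta(example))
print(len(records))  # 2
```

A dictionary keyed by header silently merges records with duplicate headers. If that is possible, count with `sum(1 for _ in read_fasta(handle))` instead.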
What's New Jan. 19, 2000
I have posted a 4 family tree with 89 sequences, including the new Drosophila
sequences in the 4 family and in the 18 clan.
New 4 Family tree
A second tree covering the remaining sequences, including the 6, 9, 12 and 28 families, is also posted:
New 6 and 9 Family tree
What's New Jan. 2, 2000
Two new sequence alignments, covering the I-helix to the end of each protein (roughly the
C-terminal half), are posted. The first is a 4 family alignment of 73 sequences (mainly
insect sequences). The second alignment has 80 sequences (mainly from the 6, 9, 12 and 28
families). Many new Drosophila sequences are included in both alignments. See the second
alignment.
These will form the basis for naming the new Drosophila
P450s, which will be done by Jan. 15, 2000. Once the trees for these alignments
have been debugged and polished, they will be posted with the new nomenclature
for the Drosophila sequences. The trees will contain about 31-32 confidential
sequences, but these will not be in the alignments.
Nov. 29, 1999: We are now in the final stages of completion of the fly genome.
In the next month, nearly all P450s in Drosophila should be identified and
posted to this site. Currently 80 N-terminal sequences have been identified, and
57 of these are from complete sequences. The remaining 23 partial sequences should be
filled in soon as Celera deposits the Drosophila data. I will hold off
naming them until the sequence data are complete. For earlier updates, see the
What's New section of the main page.
A Celera press release of July 28 stated that one million sequences (500 million bp of
sequence) had been completed from the Drosophila genome. Below is a quote from the
press release.
"Celera expects to complete the random sequencing phase of
Drosophila in early September when it will begin sequencing the
human genome. This will entail completing another 2 million
sequences-or about 1 billion letters of genetic code. Working with
the Berkeley Drosophila Genome Project (BDGP), Celera will then
fill gaps and resolve ambiguities in the sequence to produce finished
sequence. Celera will begin making sequence data available to the
public in October 1999, and anticipates release of the completed
sequence by the end of the year and publication in collaboration with
the BDGP in early 2000."
Note July 9: The Rubin sequencing effort continues to deposit more sequences, with over
1700 in June and at least 26 so far in July. These will be searched for new P450s.
Note June 16: The 4 family has been partially displayed in a tree including subfamilies
4A-4P, with many mammalian sequences included. The 47 family is probably misnamed. Also,
4E4 should be a separate subfamily. Another tree has been prepared that reduces the
number of very similar sequences so that more CYP4 subfamilies can be included, up to
Cyp4aa1 (formerly Cyp47). See the tree above with 61 CYP4 sequences.
Note: June 14, 1999. The tree with 56 insect P450s includes many new Drosophila
sequences, some of which are not yet named. This tree is based on an alignment that covers
the I-helix to the ends of the sequences, since many are missing the N-terminal. The 4 family
sequences are not included here; there are too many to fit, so they will be treated in a
separate tree.
Note: June 11, 1999
The Drosophila P450s have been found in GenBank by systematic BLAST searches of the
nr, month, est_others (other ESTs), gss and htgs sections, using representatives of different
P450 families as queries. The first search, with Cyp4d2, yielded 101 new ESTs, 6 new sequences
from month, one from htgs and none from gss or nr. The second search, with Cyp6d2, found only
17 new ESTs and one sequence from month. The third search hit only 5 new ESTs
and one sequence from nr. At this point the search was halted, since the returns were not
worth the effort of scanning the output for new sequences. Some of the new sequences are
very different from other P450s (AC005130) and cannot easily be assembled into a
complete sequence by comparison with known P450s. I have identified exon-containing
ORFs from this gene, but I cannot detect the exon boundaries. If you are brave, have a try
at it. The new sequences (almost 300 in the original FASTA file) have been compared
with each other by repeated Do-It-Yourself WU-BLAST searches and condensed into 98
contigs. Ten of these are from other Drosophila species (4d10, 4e5, 6a9, 9b3, 9f1, 13b1,
28a1, 28a2, 28a3, 28a4), and 88 are from D. melanogaster. Given that C. elegans has 80 P450
genes, these 88 genes and gene fragments may represent nearly all the P450s from
Drosophila, though some are probably N- and C-terminal pieces of the same gene, and the
number of contigs will drop as the genome is completed.
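Condensing overlapping entries into contigs is, in effect, a grouping problem. Entries that match each other go into the same group, and the matches chain together transitively. The Python sketch below shows only that grouping step, using a union-find (disjoint-set) structure. It is an illustration, not the procedure used here, which relied on repeated WU-BLAST comparisons. The example pairs come from the "exact match" comments in the hit list at the end of this page. Treating each exact match as two pieces of the same gene is an assumption of the sketch.

```python
from typing import Dict, Iterable, List, Tuple


class DisjointSet:
    """Union-find over string IDs, with path halving in find()."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, item: str) -> str:
        self.parent.setdefault(item, item)  # a new item starts as its own root
        while self.parent[item] != item:
            self.parent[item] = self.parent[self.parent[item]]  # path halving
            item = self.parent[item]
        return item

    def union(self, a: str, b: str) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def group_into_contigs(matches: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Return sorted groups of IDs linked by one or more pairwise matches."""
    ds = DisjointSet()
    for a, b in matches:
        ds.union(a, b)
    groups: Dict[str, List[str]] = {}
    for item in list(ds.parent):
        groups.setdefault(ds.find(item), []).append(item)
    return sorted(sorted(group) for group in groups.values())


# Pairs taken from the "exact match" comments in the hit list on this page.
matches = [
    ("AL055637", "AL058497"),
    ("AL059237", "AC005811"),
    ("AL062712", "AC005811"),
    ("AL068269", "AC005811"),
    ("AL075733", "AC005811"),
    ("AL054245", "AL062352"),
]
contigs = group_into_contigs(matches)
print(len(contigs))  # 3
```

Because grouping is transitive, the four AL entries that each match AC005811 end up in one group even though no pair among them was compared directly. The same transitivity is why fragments can collapse and the contig count can fall as more sequence arrives.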
Note: On May 28, 1999, 28,049 Drosophila genome survey sequences were deposited by
Genoscope in France. These are BAC end sequences. The percentage of the Drosophila
genome sequenced, as reported in the MOT tables, jumped from 15% to 24%. I have not
had a chance to search these for P450 hits, but there should be a number of new P450s in
this large collection, which covers about 9% of the Drosophila genome.
A preliminary BLAST search with 6a2 as the query found 37 bona fide P450 hits among the
genome survey sequences in the month section of GenBank. These probably represent 25 different genes.
There are probably more than this, but I will have to search with members of other
families, such as 9 and 28, to find them. These sequences have now been translated and added
to the FASTA file above.
June 10, 1999: More extensive searches of the nr, est, htgs, gss, and month sections of GenBank have
identified 235 ESTs, 44 genome survey sequences, 30 AC00XXXX genomic P1 clones and 41 other sequences,
for a total of 350 accession numbers for Drosophila P450s. These have all been translated and are
being assembled into contigs (see the FASTA file).
Hit (database|accession|ID, description)   Score (bits)   E value   Comment
|AC007549 Drosophila melanogaster chromosome 2 clo... 1012 0.0 Cyp6a2
emb|AL054861.1|CNS00A30 Drosophila melanogaster 182 3e-77 cyp6a9
emb|AL053264.1|CNS0098O Drosophila melanogaster 272 4e-73 cyp6a9
emb|AL072094.1|CNS00GEP Drosophila melanogaster 178 1e-72 cyp6a9
emb|AL055555.1|CNS004XH Drosophila melanogaster 171 5e-50 cyp6a9
emb|AL070586.1|CNS00DGA Drosophila melanogaster 108 1e-38 cyp6a9
emb|AL054261.1|CNS004MS Drosophila melanogaster 105 1e-22 cyp6a9
emb|AL069964.1|CNS00DFU Drosophila 136 6e-32 72% identical to 6a9
emb|AL054065.1|CNS004PR Drosophila melanogaster 222 2e-62 cyp6a8
emb|AL063862.1|CNS00350 Drosophila melanogaster 123 5e-28 cyp9c1
emb|AL076220.1|CNS00JFP Drosophila melanogaster 77 5e-23 cyp9c1
gb|AC007581.2|AC007581 Drosophila melanogaster 89 7e-18 cyp9c1
gb|AC007291.10|AC007291 Drosophila melanogaster 57 9e-14 cyp4e3
emb|AL076873.1|CNS00JXU Drosophila 191 1e-59 exact match with AA951440
emb|AL076863.1|CNS00JXK Drosophila 173 2e-79 exact match with AA951440
emb|AL052842.1|CNS000F5 Drosophila 215 8e-56 exact match with AA699131
emb|AL074108.1|CNS00HVU Drosophila 171 7e-48 exact match with AA699131
emb|AL078165.1|CNS00KMI Drosophila 196 4e-50 exact match with Dm3472
emb|AL069773.1|CNS00ERU Drosophila 87 4e-17 exact match to Dm3472
emb|AL065891.1|CNS006T4 Drosophila 126 5e-29 exact match with AA141600
emb|AL058810.1|CNS0017H Drosophila 68 3e-11 exact match with Dm0590
emb|AL055637.1|CNS00ALR Drosophila 62 1e-09 exact match with AL058497
emb|AL058497.1|CNS00BYD Drosophila 40 0.006 exact match to AL055637
emb|AL070449.1|CNS00FAM 72 1e-12 exact match with composite sequence CK01076
emb|AL059533.1|CNS005I8 62 2e-09 exact match with composite sequence CK01076
emb|AL061295.1|CNS001S5 Drosophila 59 1e-08 exact match to L46858
emb|AL061650.1|CNS00613 Drosophila 74 3e-13 60% identical to L46858
emb|AL065705.1|CNS006L5 Drosophila 58 2e-08 exact match to AA698945
emb|AL059237.1|CNS00CG4 58 2e-08 exact match to AC005811, AL062712, AL068269, AL075733
emb|AL062712.1|CNS002HH 54 3e-07 exact match to AC005811, AL059237, AL068269, AL075733
emb|AL068269.1|CNS00LIR 70 3e-18 exact match to AC005811, AL062712, AL059237, AL075733
emb|AL075733.1|CNS00J4Z 51 2e-06 exact match to AC005811, AL062712, AL068269, AL059237
emb|AL054245.1|CNS009UB Drosophila 46 2e-12 exact match to AL062352
emb|AL062352.1|CNS002D3 Drosophila 89 4e-29 exact match to AL054245
emb|AL057969.1|CNS00BXP Drosophila 61 3e-09 exact match to AL067059
emb|AL067059.1|CNS007EC Drosophila 56 9e-08 exact match to AL057969
emb|AL057750.1|CNS00162 136 5e-58 65% identical to AL062684
emb|AL062684.1|CNS002GP 155 7e-38 65% identical to AL057750
gb|AC007356.6|AC007356 Drosophila 71 2e-12 probable mitochondrial clan sequence
gb|AC005472.9|AC005472 114 2e-25 66% identical to AA567377
gb|AC007571.2|AC007571 Drosophila 82 1e-15 probable new family
emb|AL072844.1|CNS00H2C Drosophila 80 5e-15 42% identical to 6a5
emb|AL070820.1|CNS00FNQ Drosophila 40 0.006 40% identical to CYP28A1
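Hit lists like the one above are easier to sort and filter once they are parsed into fields. The Python sketch below is a minimal example and is not the tool that produced the list. It assumes fields are separated by whitespace. It also assumes the bit score is the first all-digit token that is immediately followed by a number (the E value). A description containing two adjacent numbers would break that assumption. The `1e-10` cutoff is an arbitrary choice for illustration, not the criterion used on this page.

```python
from typing import List, NamedTuple, Optional


class Hit(NamedTuple):
    accession: str
    description: str
    score: int
    evalue: float
    note: str


def _as_float(token: str) -> Optional[float]:
    """Return the token as a float, or None if it is not numeric."""
    try:
        return float(token)
    except ValueError:
        return None


def parse_hit(line: str) -> Optional[Hit]:
    """Parse 'ID description score E-value comment'; None if no score/E pair."""
    tokens = line.split()
    if len(tokens) < 3:
        return None
    fields = tokens[0].split("|")
    # 'emb|AL054861.1|CNS00A30' -> 'AL054861'; '|AC007549' -> 'AC007549'
    accession = (fields[1] if len(fields) > 1 else fields[0]).split(".")[0]
    for i in range(1, len(tokens) - 1):
        evalue = _as_float(tokens[i + 1])
        if tokens[i].isdigit() and evalue is not None:
            return Hit(
                accession=accession,
                description=" ".join(tokens[1:i]),
                score=int(tokens[i]),
                evalue=evalue,
                note=" ".join(tokens[i + 2:]),
            )
    return None


def significant(lines: List[str], max_evalue: float = 1e-10) -> List[Hit]:
    """Keep parsed hits whose E value is at or below max_evalue."""
    hits = [parse_hit(line) for line in lines]
    return [h for h in hits if h is not None and h.evalue <= max_evalue]


rows = [
    "emb|AL054861.1|CNS00A30 Drosophila melanogaster 182 3e-77 cyp6a9",
    "emb|AL070449.1|CNS00FAM 72 1e-12 exact match with composite sequence CK01076",
    "emb|AL058497.1|CNS00BYD Drosophila 40 0.006 exact match to AL055637",
]
for hit in significant(rows):
    print(hit.accession, hit.score, hit.evalue)
# AL054861 182 3e-77
# AL070449 72 1e-12
```

Comments such as "72% identical to 6a9" are not mistaken for scores, because "72%" is not an all-digit token.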