Feb. 8, 1999; revised Oct. 6, 1999; revised Dec. 22, 1999
There are 50 known P450 genes in humans and 15 pseudogenes (see below). Of the 50 known genes, 47 have ESTs in the EST section of GenBank, which is 94% of the known P450s. Six human genes, CYP2G1, CYP4F8, CYP7A1, CYP8B1, CYP11B1 and CYP2F1, have no exact EST matches to their coding regions (out of 1,250,000 human ESTs). Of these, CYP4F8, CYP11B1 and CYP8B1 do have ESTs in the 3' non-coding region.

The number of human disease genes cloned by positional cloning that have an EST in dbEST is 83/91, or 91.2%. This is from Bassett's estimate, which was made in 1997. If new data are added from the list of disease genes at NCBI, the numbers increase to 118/122, so 96.7% of human disease genes have ESTs. This number is similar to the percentage of known human P450s represented in the EST database. From these numbers we could predict 47/0.967 = 48.6, or about 49, human P450s. Put another way, there ought to be 2 P450 genes in humans that do not have ESTs in the database. CYP2F1, CYP2G1 and CYP7A1 are known from humans but are not found in the EST database, which is one more than predicted. Equivalently, the estimate of 49 human P450s is one short of the actual count of 50. This suggests that there are very few human P450s left to be discovered. The prediction is probably not far wrong, because even CYP27B1, which was notoriously hard to clone, has ESTs in the database.

Humans have 50 sequenced CYP genes and 15 pseudogenes:
1A1, 1A2, 1B1, 2A6, 2A7, 2A7PT (telomeric), 2A7PC (centromeric), 2A13, 2B6, 2B7P, 2C8, 2C9, 2C18, 2C19, 2D6, 2D7P, 2D7AP, 2D8P, 2D8BP, 2E1, 2F1, 2F1P, 2G1, 2J2, 2R1, 2S1, 3A4, 3A5, 3A5P1, 3A5P2, 3A7, 3A43, 4A11, 4B1, 4F2, 4F3, 4F8, 4F9P, 4F10P, 4F11, 4F12, 4X1, 4Z1, 5A1, 7A1, 7B1, 8A1, 8B1, 11A1, 11B1, 11B2, 17, 19, 21A1P, 21A2, 24, 26A1, 26B1, 27A1, 27B1, 39A1, 46, 51, 51P1, 51P2

2C10, 3A3 and 4A9 have been removed because they are probably sequencing artifacts.

2G1 has just been sequenced in humans. The chromosome 19 region is represented by 93 contigs in AC008537 (Sept. 2, 1999). The 2G1 sequence has been assembled from these contigs and from one pseudogene fragment from AC008962 that is currently missing from AC008537. The sequence may have to be adjusted later.

2R1, 4X1 and 4F12 are missing their N-termini.
2S1, 4F11, 4Z1 and 39A1 are partial sequences.
For more details see New Human P450s.
For a list of UNIGENE entries of human P450s see human UNIGENE P450s.
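
The EST-coverage arithmetic and the gene tally above can be rechecked with a few lines of Python. This is only a sketch: the counts are copied from the text, the function and variable names are invented for illustration, and the pseudogene test assumes that every pseudogene name in the list ends in P, optionally followed by a number or by T or C (as in 2A7PT and 2A7PC).

```python
import re

# Counts quoted in the text above (no new data).
P450_WITH_EST = 47
KNOWN_P450_GENES = 50
DISEASE_WITH_EST = 118
DISEASE_TOTAL = 122


def predicted_total(observed_with_est: int, coverage: float) -> float:
    """Expected gene count if only a fraction `coverage` of genes have ESTs."""
    return observed_with_est / coverage


p450_coverage = P450_WITH_EST / KNOWN_P450_GENES      # 0.94
disease_coverage = DISEASE_WITH_EST / DISEASE_TOTAL   # about 0.967
estimate = predicted_total(P450_WITH_EST, disease_coverage)  # about 48.6

# The 65 names from the list above, without the CYP prefix or annotations.
NAMES = """
1A1 1A2 1B1 2A6 2A7 2A7PT 2A7PC 2A13 2B6 2B7P 2C8 2C9 2C18 2C19 2D6 2D7P
2D7AP 2D8P 2D8BP 2E1 2F1 2F1P 2G1 2J2 2R1 2S1 3A4 3A5 3A5P1 3A5P2 3A7 3A43
4A11 4B1 4F2 4F3 4F8 4F9P 4F10P 4F11 4F12 4X1 4Z1 5A1 7A1 7B1 8A1 8B1 11A1
11B1 11B2 17 19 21A1P 21A2 24 26A1 26B1 27A1 27B1 39A1 46 51 51P1 51P2
""".split()


def is_pseudogene(name: str) -> bool:
    """Assumed naming rule: trailing P, optionally followed by digits, T or C."""
    return re.search(r"P(\d+|[TC])?$", name) is not None


pseudogenes = [n for n in NAMES if is_pseudogene(n)]
genes = [n for n in NAMES if not is_pseudogene(n)]

print(f"P450 EST coverage: {p450_coverage:.1%}")
print(f"Disease-gene EST coverage: {disease_coverage:.1%}")
print(f"Predicted human P450s: {estimate:.1f}")
print(f"{len(genes)} genes, {len(pseudogenes)} pseudogenes")
```

The suffix rule is a convenience for this list only. It is not a general parser for CYP nomenclature.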
>Human 2G1 assembly hypothetical 78% identical to rat 2G1 from AC008537 and AC008962
MELGGAVTIFLALRLSCLLILIAWKRMDKAGKLPPGPTPILFLGHLLQVRTDATFQSFMK*
LREKYSPVFTVYMGPRPVVVLCGHEAVKEALIDQADEFSGRGELASIKQNFQGHG*
VALANGERWRILRRFSLTILRDFGMGKQSIKERIQEEASYLLEEFQKTK*
GAPIDPIFLLSRTVSNVISSVVFRSRFDYEDKQFLNLLRLINESFIEMSTPWAQ*
LYDMYSGIMQYLPGRHNLIYYLVEELKDFIASRVKINEASFDPQNPRDFIDCFLIKMH*
QEEKNPNTEFYLKNLVLTTLNLFVGGTETVSTTLHYGFLLLMKHPEVE*
AKIHEEINQVIGPHRLPRVDDRVKMPYTDVVIHEIQRLVDIVPMGVPHNIIQDTQFRGYLLPK*
GTDVFPLLGSVLKDPKYFRYPDAFYPQHFLDEQGRFKKNEAFVPFSSGRGK*
RICLGEAMARMELFLYFTSTLQNFSLCSLVPLVDIDITPKLSGFGNITPTYELCLVAR
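
The first 60 residues of this assembly are the query in the two GSS comparisons reported later in this section. As a sketch of how the identity counts in those comparisons can be recomputed, the snippet below compares the query with each subject string column by column. The strings are copied from the alignments as printed; the function names are illustrative. The midline produced here marks identities only, because the "+" symbols for conservative substitutions would require a scoring matrix that is not reproduced here.

```python
QUERY = "MELGGAVTIFLALRLSCLLILIAWKRMDKAGKLPPGPTPILFLGHLLQVRTDATFQSFMK"

# Subject strings as printed in the two GSS alignments.
SUBJECTS = {
    "AQ620239": "MEMGGAVTIFLALCLSCLLILIAWK*MNKAGKLPPGPTPIPFLXEPAASRTDATFQSFMK",
    "AQ791192": "MELGGAVNIFLALSSSCLLILIAWKPMDKARKLPPGPTPILFLGHLLHVRTDATFQSFMN",
}


def count_identities(query: str, subject: str) -> int:
    """Number of aligned columns where both sequences carry the same residue."""
    if len(query) != len(subject):
        raise ValueError("aligned strings must have equal length")
    return sum(q == s for q, s in zip(query, subject))


def identity_midline(query: str, subject: str) -> str:
    """Identity-only midline: the residue where columns match, a space otherwise."""
    return "".join(q if q == s else " " for q, s in zip(query, subject))


for accession, subject in SUBJECTS.items():
    n = count_identities(QUERY, subject)
    print(accession, f"{n} identities over {len(QUERY)} printed columns")
    print(QUERY)
    print(identity_midline(QUERY, subject))
    print(subject)
```

The identity counts agree with the numerators in the reported alignments (49 and 53). The reported denominators are 61, whereas the strings as printed have 60 columns; the snippet does not attempt to explain that difference.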
AC008537 carries many P450s, including CYP2B7, CYP2A6, CYP2A7 and CYP2G1, a possible new CYP2F sequence, and probably some pseudogene fragments.

Note: this GSS might be the N-terminus of a human 2G1 pseudogene (see AC008962).

AQ620239 HS_5182_B2_D05_T7A RPCI-11 Human Male BAC Library Homo
Length = 499
Identities = 49/61 (80%), Positives = 52/61 (84%)
Frame = +1
Query: 1   MELGGAVTIFLALRLSCLLILIAWKRMDKAGKLPPGPTPILFLGHLLQVRTDATFQSFMK 60
           ME+GGAVTIFLAL LSCLLILIAWK M+KAGKLPPGPTPI FL      RTDATFQSFMK
Sbjct: 205 MEMGGAVTIFLALCLSCLLILIAWK*MNKAGKLPPGPTPIPFLXEPAASRTDATFQSFMK 384

This is a GSS fragment for a 2G1-related gene.

AQ791192 HS_4507_B1_H06_T7A CIT Approved Human Genomic Sperm Library D Homo
Length = 515
Identities = 53/61 (86%), Positives = 54/61 (87%)
Frame = +1
Query: 1   MELGGAVTIFLALRLSCLLILIAWKRMDKAGKLPPGPTPILFLGHLLQVRTDATFQSFMK 60
           MELGGAV IFLAL  SCLLILIAWK MDKA KLPPGPTPILFLGHLL VRTDATFQSFM
Sbjct: 199 MELGGAVNIFLALSSSCLLILIAWKPMDKARKLPPGPTPILFLGHLLHVRTDATFQSFMN 378

CYP3A43 human
GenEMBL AC011904 8902-46787, 13 exons
Gene assembled from genomic sequence by Henry Strobel and David Nelson on Dec 11, 1999. Intron-exon boundaries were defined by comparison to rat 3A9 and human 3A sequences.
GT-AG pairs were found for all introns.
ESTs: AA417369 zu08d03.s1 and AA416822 zu08d03.r1 (Soares testis), opposite ends of the same clone.
67% identical to rat 3A9.

Assembled gene
* = intron-exon boundary
** = EST support for this boundary
MDLIPNFAMETWVLVATSLVLLYI*
YGTHSHKLFKKLGIPGPTPLPFLGTILFYLR*
GLWNFDRECNEKYGEMWG*
LYEGQQPMLVIMDPDMIKTVLVKECYSVFTNQM*
PLGPMGFLKSALSFAEDEEWKRIRTLLSPAFTSVKFKE*
MVPIISQCGDMLVRSLRQEAENSKSINLKE*
DFFGAYTMDVITGTLFGVNLDSLNNPQDPFLKNMKKLLKLDFLDPFLLLI*
SLFPFLTPVFEALNIGLFPKDVTHFLKNSIERMKESRLKDKQK*
HRVDFFQQMIDSQNSKETKSHK*
ALSDLELVAQSIIIIFAAYDTTSTTLPFIMYELATHPDVQQKLQEEIDAVLPNK**
APVTYDALVQMEYLDMVVNETLRLFPVVSRVTRVCKKDIEINGVFIPKGLAVMVPIYALHHDPKYWTEPEKFCPE**
RFSKKNKDSIDLYRYIPFGAGPRNCIGMRFALTNIKLAVIRALQNFSFKPCKETQ**
IPLKLDNLPILQPEKPIVLKVHLRDGITSGP*

Coding region: 504 amino acids
exon 1 8902-8972
exon 2 17239-17332
exon 3 19906-19958
exon 4 24929-25028
exon 5 28274-28387
exon 6 28952-29040
exon 7 30332-30480
exon 8 36377-36504
exon 9 37619-37685
exon 10 40616-40776
exon 11 42399-42625
exon 12 44323-44485
exon 13 46692-46787
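
The exon coordinates can be cross-checked against the stated coding length. The sketch below sums the exon spans and divides by three, treating the coordinates as 1-based and inclusive, which is an assumption about the convention used here. Variable names are illustrative.

```python
# Exon coordinates for CYP3A43 on AC011904, copied from the table above.
EXONS = [
    (8902, 8972), (17239, 17332), (19906, 19958), (24929, 25028),
    (28274, 28387), (28952, 29040), (30332, 30480), (36377, 36504),
    (37619, 37685), (40616, 40776), (42399, 42625), (44323, 44485),
    (46692, 46787),
]


def span_length(start: int, end: int) -> int:
    """Length of a 1-based, inclusive coordinate range (assumed convention)."""
    return end - start + 1


exon_lengths = [span_length(s, e) for s, e in EXONS]
total_nt = sum(exon_lengths)
codons, remainder = divmod(total_nt, 3)

# Intron lengths follow from the gaps between consecutive exons.
intron_lengths = [
    span_length(EXONS[i][1] + 1, EXONS[i + 1][0] - 1)
    for i in range(len(EXONS) - 1)
]

print("exon lengths (nt):", exon_lengths)
print("total coding nt:", total_nt)
print("codons:", codons, "remainder:", remainder)
print("intron lengths (nt):", intron_lengths)
```

Under that assumption the exons sum to 1512 nucleotides, or 504 complete codons, which agrees with the stated coding region of 504 amino acids. Whether the stop codon falls inside the last listed coordinate is not stated in the entry and is not checked here.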