David R. Nelson
Feb. 8, 1999. Revised Sept. 20, 1999, Oct. 13, 1999, Dec. 11, 1999 and Feb. 10, 2000

In November 1995, I did a comprehensive search of all the human ESTs available at the time, looking for new P450s not yet identified or cloned. There were 297,363 human ESTs on Nov. 30, 1995. I found 12 P450s that had not yet been described. Over the last five years, 11 of these have been cloned; one remains. I was planning to write this up and publish it, but the current state of human genome sequencing would make this a useless effort. Therefore, I have decided to publish these findings on the web and let the P450 cloners go get these sequences.

CYP2S1
52% identical to CYP2B subfamily members, 50% to CYP2A members and 50% to CYP2G1. This sequence is probably in a new subfamily. It has been named CYP2S1.
PGTEFTNKNMLMTVIYXLFAGTMTVSTTVGYTLLLXMKYPHVQKWVRX
ELNRELGAGQAPSLGDRTRLPYTDAVLHEAQRLLALVPMGIPRTLMRTTR
FRGYTLPQGTEVFPLLGSILHEPNIFKHPEEFNPDRFLDADGRFRKHEAFLP
FSLGKRVCLGEGLAKAEVFLFFTTILQAFSLESPCPPDTLSLKPTVSGLFNIPP
AFQLQVRPTDLHSTTQTR*
This sequence is a consensus of ESTs T84852, AA315278, AA300981 and AA301039. There was no UNIGENE entry for any of these ESTs.
Note on Sept. 17, 1999: I found a mouse homolog for this sequence, and it has the full C-terminal sequence. When I searched with this mouse C-terminal sequence, I also found the human C-terminal ESTs AA316621 and AA496320.
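The CYP2S1 fragment above is a consensus of four overlapping ESTs, but the page does not say how the consensus was built. As a hedged illustration only, assuming the EST translations have already been aligned column by column, a simple majority rule that writes X where the reads tie could look like this. The function name, the gap character and the tie rule are assumptions, not the procedure actually used.

```python
from collections import Counter

def consensus(aligned: list[str], gap: str = "-", ambiguous: str = "X") -> str:
    """Majority-rule consensus of pre-aligned, equal-length sequences.

    Columns where every read is a gap are dropped; columns where two or
    more residues tie for the top count get the ambiguity code.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must already be aligned to equal length")
    out = []
    for column in zip(*aligned):
        counts = Counter(c for c in column if c != gap)
        if not counts:
            continue  # no read covers this column
        ranked = counts.most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            out.append(ambiguous)  # tie: the reads do not settle this column
        else:
            out.append(ranked[0][0])
    return "".join(out)

print(consensus(["PGTEFTNK", "PGTEFTNK", "PGTDFTNK"]))
```

With real ESTs the hard part is the alignment step, which this sketch takes as given; an X in its output plays the same role as the X residues in the fragment above, marking a position the available data do not resolve.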
Mouse ortholog of CYP2S1, starting at amino acid 61 (ESTs AA546445, AI585412):
LSKKYGPVFTVYLGPWRRVVVLVGHDAVREALGGQAEEFSGRGTLATLDKTFDGHGVFF
ANGERWKQLRKFTLLALRDLGMGKREGEELIQAEVQSLVEAFRKTGGRPFNPSMLLGPA
TSNVVCSLVFGIRFAYDDKEFQAVNQAASGTLLGSSSQWGQ
GAP
ESTs AA967201, AA543966:
RLPYTDAVLHEAQRLLALVPMGMPHTITRTTCFRGYTLPKGTEVFPLIGSILHDPAVFQNPGEFHPGRFLDEDGRLRKHEAFL
PYSLGKRVCLGEGLARAELWLFFTSILQAFSLETPCPPGDLSLKPAISGLFNIPPDFQLRVWPTGDQSR*

Human genomic DNA matches to 2S1:
gb|AC011505.2|AC011505 Homo sapiens chromosome 19 clone CITB-H1_2081K17, WORKING DRAFT SEQUENCE, 10 unordered pieces Length = 166330
LOCUS AC011510 130776 bp DNA HTG 06-FEB-2000
gb|AC011510.3|AC011510 Homo sapiens chromosome 19 clone CT-2195B23, WORKING DRAFT SEQUENCE, 9 ordered pieces Length = 130776
Nine exons. Unigene entry Hs.98370
MEATGTWALLLALALLLLLTLALSGTRARGHLPPGPTPLPLLGNLLQLRPGALYSGLMR*
LSKKYGPVFTIYLGPWRPVVVLVGQEAVREALGGQAEEFSGRGTVAMLEGTFDGH*
GVFFSNGERWRQLRKFTMLALRDLGMGKREGEELIQAEARCLVETFQGTE*
GRPFDPSLLLAQATSNVVCSLLFGLRFSYEDKEFQAVVRAAGGTLLGVSSQGGQ*
TYEMFSWFLRPLPGPHKQLLHHVSTLAAFTVRQVQQHQGNLDASGPARDLVDAFLLKMAQ*
EEQNPGTEFTNKNMLMTVIYLLFAGTMTVSTTVGYTLLLLMKYPHVQ*
KWVREELNRELGAGQAPSLGDRTRLPYTDAVLHEAQRLLALVPMGIPRTLMRTTRFRGYTLPQ*
GTEVFPLLGSILHEPNIFKHPEEFNPDRFLDADGRFRKHEAFLPFSL*
GKRVCLGEGLAKAEVFLFFTTILQAFSLESPCPPDTLSLKPTVSGLFNIPPAFQLQVRPTDLHSTTQTR*
Exon 1 comp(108139-108315)
Exon 2 comp(106871-107036)
Exon 3 comp(103651-103801)
Exon 4 comp(102958-103118)
Exon 5 comp(102692-102871)
Exon 6 comp(100208-100349)
Exon 7 comp(98748-98935)
Exon 8 comp(96286-96427)
Exon 9 comp(95897-96105)
3 prime UTR comp(94852-95896) = EST AI445492, which has a poly A tail
Human ESTs = AA315278, AA300981, AA316621, AA496320, T84852, AA301039, plus 17 ESTs in the 3 prime UTR in Unigene entry Hs.98370
*********************
There is a new human sequence in the genomic data from Genbank.
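The CYP2S1 exon coordinates use comp(a-b), which appears to abbreviate the GenBank feature-table complement() operator: the exon lies on the opposite strand of AC011510, so the exons run from high to low coordinates and each one must be reverse-complemented to read the coding strand. A minimal sketch of that bookkeeping, assuming 1-based inclusive coordinates (the GenBank convention) and a genomic sequence already loaded as a string; the helper names are hypothetical.

```python
import re

# 'comp(a-b)' marks a span on the complementary strand; a bare 'a-b' is on
# the strand as deposited. Coordinates are assumed 1-based and inclusive.
SPAN = re.compile(r"^(comp\()?(\d+)-(\d+)(?(1)\))$")
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def parse_span(text: str) -> tuple[int, int, str]:
    """'comp(108139-108315)' -> (108139, 108315, '-'); '8902-8972' -> '+'."""
    m = SPAN.match(text.strip())
    if m is None:
        raise ValueError(f"unrecognised span: {text!r}")
    return int(m.group(2)), int(m.group(3)), "-" if m.group(1) else "+"

def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]

def extract(genomic: str, span: str) -> str:
    """One exon, read 5' to 3' on the gene's coding strand."""
    start, end, strand = parse_span(span)
    piece = genomic[start - 1:end]  # 1-based inclusive -> Python slice
    return reverse_complement(piece) if strand == "-" else piece

# Exon 1 of CYP2S1 as listed above: 108315 - 108139 + 1 = 177 nt.
start, end, strand = parse_span("comp(108139-108315)")
print(end - start + 1, strand)
```

To rebuild the coding sequence, concatenate the extract() results in exon order (exon 1 first, which on this strand means highest coordinates first) and then translate.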
It has been named CYP3A43.
Human GenEMBL AC011904 8902-46787, 13 exons
Gene assembled from genomic sequence by Henry Strobel and David Nelson on Dec. 11, 1999. Intron exon boundaries were defined by comparison to rat 3A9 and human 3A sequences. GT AG pairs were found for all introns.
ESTs AA417369 zu08d03.s1 and AA416822 zu08d03.r1 (Soares testis) are opposite ends of the same clone.
67% identical to rat 3A9
Assembled gene (* = intron exon boundary, ** = EST support for this boundary)
MDLIPNFAMETWVLVATSLVLLYI*
YGTHSHKLFKKLGIPGPTPLPFLGTILFYLR*
GLWNFDRECNEKYGEMWG*
LYEGQQPMLVIMDPDMIKTVLVKECYSVFTNQM*
PLGPMGFLKSALSFAEDEEWKRIRTLLSPAFTSVKFKE*
MVPIISQCGDMLVRSLRQEAENSKSINLKE*
DFFGAYTMDVITGTLFGVNLDSLNNPQDPFLKNMKKLLKLDFLDPFLLLI*
SLFPFLTPVFEALNIGLFPKDVTHFLKNSIERMKESRLKDKQK*
HRVDFFQQMIDSQNSKETKSHK*
ALSDLELVAQSIIIIFAAYDTTSTTLPFIMYELATHPDVQQKLQEEIDAVLPNK**
APVTYDALVQMEYLDMVVNETLRLFPVVSRVTRVCKKDIEINGVFIPKGLAVMVPIYALHHDPKYWTEPEKFCPE**
RFSKKNKDSIDLYRYIPFGAGPRNCIGMRFALTNIKLAVIRALQNFSFKPCKETQ**
IPLKLDNLPILQPEKPIVLKVHLRDGITSGP*
Coding region 504 amino acids
exon 1 8902-8972
exon 2 17239-17332
exon 3 19906-19958
exon 4 24929-25028
exon 5 28274-28387
exon 6 28952-29040
exon 7 30332-30480
exon 8 36377-36504
exon 9 37619-37685
exon 10 40616-40776
exon 11 42399-42625
exon 12 44323-44485
exon 13 46692-46787
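Because the CYP3A43 exon list gives explicit spans, the coding length and the phase of each intron can be tallied directly. A minimal sketch, assuming 1-based inclusive coordinates, that all 13 spans are coding (the page calls this the coding region) and that exon 1 begins at the ATG; the helper names are hypothetical.

```python
# CYP3A43 exon spans on AC011904, copied from the list above.
# Assumption: coordinates are 1-based and inclusive (the GenBank convention).
CYP3A43_EXONS = [
    (8902, 8972), (17239, 17332), (19906, 19958), (24929, 25028),
    (28274, 28387), (28952, 29040), (30332, 30480), (36377, 36504),
    (37619, 37685), (40616, 40776), (42399, 42625), (44323, 44485),
    (46692, 46787),
]

def exon_lengths(exons: list[tuple[int, int]]) -> list[int]:
    """Length in nucleotides of each inclusive (start, end) span."""
    return [end - start + 1 for start, end in exons]

def intron_phases(lengths: list[int]) -> list[int]:
    """Phase of each intron, assuming exon 1 starts at the ATG: 0 means the
    intron falls between codons; 1 or 2 means it interrupts a codon after
    that many bases."""
    phases, running = [], 0
    for n in lengths[:-1]:
        running += n
        phases.append(running % 3)
    return phases

lengths = exon_lengths(CYP3A43_EXONS)
total_nt = sum(lengths)
print(total_nt, total_nt // 3, total_nt % 3)  # nucleotides, codons, leftover
print(intron_phases(lengths))
```

For these 13 spans the sum is 1512 nt, exactly 504 codons with nothing left over. A total that left a remainder would flag a coordinate worth rechecking.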
T91507 and T91536
CYP2R1 (still confidential)
UNIGENE entry Hs.16846 (14 ESTs)

CYP4F11
This sequence is made from two ESTs in the original 12, from the N- and C-terminal regions. The sequence has been named CYP4F11.
GenEMBL AC005336, cosmid F20129, end of cosmid
ESTs T56269 and T56204: opposite ends of clone yb89b03
ESTs T69576 and T69645: opposite ends of clone yc44c03
EST AA991369 and EST W23003
G07004 human STS WI-8821
The N-terminal region up through the C helix and the C-terminal region from the I helix to the end are present. The middle region is not present. The rest of this gene should be on cosmid R28342, which is not in the database yet. About 22 amino acids are missing at the N-terminal.
LLLVGGSWLLARVLAWTYTFYDNCRRLQCFPQPPKQNWFWGHQGLVTPTE
EGMKTLTQLVTTYPQGFKLWLGPTFPLLILCHPDIIRPITSASAAVAPKD
MIFYXXLKPWLGDGLLLSXXDKWNRQRRM
(167 amino acid gap)
IRGEXDTXMXGGHDTTASGLSWVLYHLKRHPEYQEQCRKEVKEXL
KDREPIEIEWDDLAQXPFLTMCIKESLRLXPPVPVISRCXTQDXVLPDGRXIPKXIV
CLINIIG IHYNPTVWPDPEVYDPFRFDQENIKERSPLAFIPFSAGPRNCIGQAFAM
AEMKVVLALTLLHFRILPTHTEPRRKPELILRAEGGLWLRVEPLGANSQ*
The 4F11 sequence is now known in full from a cDNA submitted by Henry Strobel and Xiaoming Cui.
cDNA sequence (the N-terminal is on AC011517; the rest is on AC020950):
MPQLSLSWLGLGPVAASPWLLLLLVGGSWLLARVLAWTYTFYDNCRRLQC
FPQPPKQNWFWGHQGLVTPTEEGMKTLTQLVTTYPQGFKLWLGPTFPLLI
LCHPDIIRPITSASAAVAPKDMIFYGFLKPWLGDGLLLSGGDKWSRHRRML
TPAFHFNLKPYMKIFNKSVNIMHDKWQRLASEGSARLDMFEHISLMTLDS
LQKCVFSFESNCQEKPSEYIAAILELSAFVEKRNQQILLHTDFLYYLTPDGQR
FRRACHLVHDFTDAVIQERRRTLPTQGIDDFLKNKAKSKTLDFIDVLLLSKD
EDGKELSDEDIRAEADTFMFEGHDTTASGLSWVLYHLAKHPEYQEQCRQEV
QELLKDREPIEIEWDDLAQLPFLTMCIKESLRLHPPVPVISRCCTQDFVLPDG
RVIPKGIVCLINIIGIHYNPTVWPDPEVYDPFRFNQENIKERSPLAFIPFSAGP
RNCIGQAFAMAEMKVVLALTLLHFRILPTHIEPRRKPELILRAEGGLWLRVEPLGANSQ
4F11 genomic DNA, 12 exons
Exon 1 AC011517 comp(11030-11223), four frameshifts in this exon
MPQLSLSWLGLGPVAASPWLLLLLVGGSWLLARVLAWTYTFYDNCRRLQCFPQPPKQNWFWGHQGL
Exon 2 AC011517 comp(7979-8120), missing G base at end of exon before GT pair; three frameshifts in this exon
VTPTEEGMKTLTQLVTTYPQGFKLWLGPTFPLLILCHPDIIRPITSAS
Exon 3 AC020950 comp(22900-22954)
AAVAPKDMIFYGFLKPWLG
Exon 4 AC020950 comp(22683-22811)
DGLLLSGGDKWSRHRRMLTPAFHFNLKPYMKIFNKSVNIMH
Exon 5 AC020950 comp(18976-19097)
DKWQRLASEGSARLDMFEHISLMTLDSLQKCVFSFESNCQ
Exon 6 AC020950 comp(18027-18297)
EKPSEYIAAILELSAFVEKRNQQILLHTDFLYYLTPDGQRFRRACH
LVHDFTDAVIQERRRTLPTQGIDDFLKNKAKSKTLDFIDVLLLSK
Exon 7 AC020950 7459-7522, three frameshifts in this exon
DEDGKELSDEDIRAEADTFMFE
Exon 8 AC020950 7717-7849, one frameshift in this exon
GHDTTASGLSWVLYHLAKHPEYQEQCRQEVQELLKDREPIEIE
Exon 9 AC020950 15360-15493
WDDLAQLPFLTMCIKESLRLHPPVPVISRCCTQDFVLPDGRVIPK
Exon 10 AC020950 15595-15662
GIVCLINIIGIHYNPTVWPDPEV
Exon 11 AC005336 comp(36538-36620)
YDPFRFNQENIKERSPLAFIPFSAGPR
Exon 12 AC020950 comp(17336-17514), one frameshift in this exon
NCIGQAFAMAEMKVVLALTLLHFRILPTHIEPRRKPELILRAEGGLWLRVEPLGANSQ*
Exons 1 and 2 are on one contig on AC011517.
Exons 3 and 4 are on one contig on AC020950.
Exons 5 and 6 are on one contig on AC020950.
Exons 7 and 8 are on one contig on AC020950.
Exons 9 and 10 are on one contig on AC020950.
Exons 11 and 12 are on one contig on AC005336.

T98002
CYP4F12 GenEMBL AC004523, missing N-terminal
UNIGENE entry Hs.110130 (12 ESTs)
ITPTEEGLKNSTQMSATYSQGFTIWLGPIIPFIVLCHPDTIRSI
TNASAAIAPKDNLFIRFLKPWLGEGILLSGGDKWSRHRRMLTPAFHFNILKSYITIFN
KSANIMLDKWQHLASEGSSCLDMFEHISLMTLDSLQKCIFSFDSHCQERPSEYIATIL
ELSALVEKRSQHILQHMDFLYYLSHDGRRFHRACRLVHDFTDAVIRERRRTLPTQGID
DFFKDKAKSKTLDFIDVLLLSKDEDGKALSDEDIRAEADTFMFGGHDTTASGLSWVLY
NLARHPEYQERCRQEVQELLKDRDPKEIEWDDLAQLPFLTMCVKESLRLHPPAPFISR
CCTQDIVLPDGRVIPKGITCLIDIIGVHHNPTVWPDPEVYDPFRFDPENSKGRSPLAF
IPFSAGPRNCIGQAFAMAEMKVVLALMLLHFRFLPDHTEPRRKLELIMRAEGGLWLRV
EPLNVSLQ

R53456
CYP4X1 (still confidential)
UNIGENE entry Hs.26040 (13 ESTs)

R21282
CYP26
CYP26A1 human GenEMBL NM_000783
White,J.A., Beckett-Jones,B., Guo,Y.D., Dilworth,F.J., Bonasoro,J., Jones,G. and Petkovich,M.
cDNA cloning of human retinoic acid-metabolizing enzyme (hP450RAI) identifies a novel family of cytochromes P450. J. Biol. Chem. 272 (30), 18538-18541 (1997)
Note: new family in mammals; homolog to human ESTs R51129 and R21282
MGLPALLASALCTFVLPLLLFLAAIKLWDLYCVSGRDRSCALPL
PPGTMGFPFFGETLQMVLQRRKFLQMKRRKYGFIYKTHLFGRPTVRVMGADNVRRILL
GDDRLVSVHWPASVRTILGSGCLSNLHDSSHKQRKKVIMRAFSREALECYVPVITEEV
GSSLEQWLSCGERGLLVYPEVKRLMFRIAMRILLGCEPQLAGDGDSEQQLVEAFEEMT
RNLFSLPIDVPFSGLYRGMKARNLIHARIEQNIRAKICGLRASEAGQGCKDALQLLIE
HSWERGERLDMQALKQSSTELLFGGHETTASAATSLITYLGLYPHVLQKVREELKSKG
LLCKSNQDNKLDMEILEQLKYIGCVIKETLRLNPPVPGGFRVALKTFELNGYQIPKGW
NVIYSICDTHDVAEIFTNKEEFNPDRFMLPHPEDASRFSFIPFGGGLRSCVGKEFAKI
LLKIFTVELARHCDWQLLNGPPTMKTSPTVYPVDNLPARFTHFHGEI

CYP39A1
A new P450 family in humans
The EST R07010 covers the C-terminal part of a P450. Two ESTs with coding regions are not in UNIGENE, but the opposite end of EST R11279 (R11221) is in UNIGENE entry Hs.20766, which has 16 EST sequences. This sequence is most like CYP7B1 and CYP8B1, but the percent identity is only 28%, so the sequence is in a new family. It has been named CYP39A1.
The *s indicate predicted intron exon boundaries. An h after the * indicates that the joint is confirmed by a human EST, an m that it is supported by a mouse EST, and a c that it is supported by a chicken EST.
The N-terminal exon was identified from the genomic sequence, but this identification is tentative and requires confirmation. The N-terminal region has an EST, AA398040 zt89c07.r1, that is part of UNIGENE entry Hs.119154 with 5 ESTs. These appear to be from an untranslated region of a gene, including a poly A tail. I suspect that the AA398040 EST is flawed and contains retained intron sequence. The N-terminal and the intron sequence are both found in the genomic clone AC008104.
CYP39A1 has 12 exons. Only the boundaries after exons 1, 2 and 3 are not confirmed by EST data.
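The boundary notation just described (a * at each predicted boundary, optionally followed by h, m or c) is regular enough to check mechanically. A hedged Python sketch, assuming the flags are separated from the * and from each other by whitespace, as they are in the CYP39A1 listing, that reports which joints lack EST support. Function names are hypothetical, and free-text remarks after a flag are simply skipped.

```python
import re

# Evidence codes used after a '*' in the CYP39A1 listing.
EVIDENCE = {"h": "human EST", "m": "mouse EST", "c": "chicken EST"}

# A run of residues, the '*' that ends the exon, then any whitespace-separated
# single-letter h/m/c flags. Other remarks (e.g. "see 22757 ...") are skipped.
EXON = re.compile(r"([A-Z]+)\*((?:\s+[hmc]\b)*)")

def parse_exons(annotated: str) -> list[tuple[str, list[str]]]:
    """Return (residues, evidence) for each exon in an annotated string."""
    exons = []
    for m in EXON.finditer(annotated):
        flags = m.group(2).split()
        exons.append((m.group(1), [EVIDENCE[f] for f in flags]))
    return exons

def unconfirmed_joints(exons: list[tuple[str, list[str]]]) -> list[int]:
    """1-based numbers of exons whose following boundary has no EST support."""
    return [i + 1 for i, (_, evidence) in enumerate(exons[:-1]) if not evidence]

# Residues shortened to five per exon; flags and remarks copied from the listing.
listing = ("MELIS* YGPIF* ASIPK* HLLYP* m NWSKS* m see 22757 (-) on AL035670 "
           "TLLQA* m c VAFWT* m c KDKIK* h m c NYIIP* h (sequence not perfect "
           "at this boundary) EKGKF* m FALLE* h m SYLHL*")
exons = parse_exons(listing)
print(len(exons), unconfirmed_joints(exons))
```

On the abbreviated listing it reports 12 exons with unconfirmed joints after exons 1, 2 and 3, matching the statement in the text.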
The 2nd and 3rd exons were found by running the genomic DNA through FGENESH, a gene-finding program on the Baylor College of Medicine sequence analysis web site. The 2nd exon was also found by searching the genomic DNA with CYP8B1 as the query sequence, and the 1st exon was found by searching with CYP7B1. CYP8B1 in mouse and human has no introns, but CYP8A1 has 10 exons. It is probable that CYP8B1 evolved from a processed mRNA that had the introns removed. CYP8A1 and CYP39A1 do not share any intron exon boundaries. CYP7B1 is known only as mRNA, so its intron boundaries cannot be defined from it; however, one CYP7B1 intron break is seen in GSS data 5 amino acids before the EXXR pair, and it is not shared with CYP39A1. CYP7A1 has 6 exons, and it may share one intron exon boundary at the end of the 2nd exon, but the alignment is not very good here. CYP51 shares one intron exon boundary at the KYG motif (the end of exon 1 in CYP39A1), which corresponds to the end of exon 2 in CYP51. The KYG motif is often associated with introns, and it may be an ancient site for a very early intron. I speculate that the conservation of sequence at this site, and at some other well-conserved intron locations, may be due to the intron and not to any structural requirements of the P450s. CYP39A1 is most like CYP7B1 and CYP8B1, with CYP7A1 and CYP51 as additional matches. Since CYP7B1 is an oxysterol 7-alpha-hydroxylase, CYP7A1 is cholesterol 7-alpha-hydroxylase and CYP8B1 is a sterol 12-alpha-hydroxylase, it is probable that CYP39A1 will have a sterol as substrate. However, CYP8A1 is prostacyclin synthase, so this prediction may be incorrect.
MELISPTVIIILGCLALFLLLQRKNLRRPPCIKGWIPWIGVGFEFGKAPLEFIEKARIK* YGPIFTVFAMGNRMTFVTEEEGINVFLKSKKVDFELAVQNIVYRT* ASIPKNVFLALHEKLYIMLKGKMGTVNLHQFTGQLTEELHEQLENLGTHGTMDLNNLVR* HLLYPVTVNMLFNKSLFSTNKKKIKEFHQYFQVYDEDFEYGSQLPECLLR* m NWSKSKKWFLELFEKNIPDIKACKSAKDNSM* m see 22757 (-) on AL035670 TLLQATLDIVETETSKENSPNYGLLLLWASLSNAVP* m c VAFWTLAYVLSHPDIHKAIMEGISSVFGKAG* m c KDKIKVSEDDLENLLLIKWCVLETIRLKAPGVITRKVVKPVEIL* h m c NYIIPSGDLLMLSPFWLHRNPKYFPEPELFKPERW* h (sequence not perfect at this boundary) EKGKFRRKHSFLGTASWAFGAGSSQCPGKV* m FALLEVQMCIILILYKYDCSLLDPLPKQ* h m SYLHLVGVPQPEGQCRIEYKQRI* human genomic clones that contain this gene: AC008104 Homo sapiens clone 446_F_17, WORKING DRAFT SEQUENCE, 13 unordered pieces Length = 180699 AL035670.15|HS347E1 Homo sapiens chromosome 6 clone dJ347E1, WORKING DRAFT SEQUENCE, in unordered pieces Length = 99785 Mouse ortholog CYP39A1 known from ESTs
Mouse EST AI118926 ue22b04.y1 MELFSPIAIAVLGSCVLFLFSRLKNLLGPPCIQGWIPWIGAGLEFGKAPLEFI
QFKTYDEGFEYGSQLPEWLLRNWSKSKRWLLALFEKNIGNIKAHGSAGHSGTLLQAILEVVETETRQYSPNYGLVVLWAALAN
APPIAFWTLGYILSHPDIHRTVLESISSVFGTAGKDKIKVSEDDLKKLLIIKWCILESVRLRAPGVITRKVVKPVKILNHTVP
SGDLLMLSPFWLHRNPKYFPEPESFKPERWKEANLDKYIFLDYFMAFGGRKFQCPGKWFALLEIQLCIILVLYKYECSLLDPL
PKQSSRHLVGVPQPAGKCRIEYKQRA*
Exact matches for mouse AI118926 ue22b04.y1 N-terminal fragment:
AA272844 va97h09.r1
AA606237 vo06d06.r1
AI552260 vf73b10.y1 (extends to 3 prime)
AA457858 vf73b10.r1 (extends to 3 prime)
Mouse EST MELFSPIAIAVLGSCVLFLFSRLKNLLGPPCIQGWIPWIGAGLEFGKAPLEFI
          ||| ||  |  ||   |||    |||  |||| ||||||| | ||||||||||
Human 39A MELISPTVIIILGCLALFLLLQRKNLRRPPCIKGWIPWIGVGFEFGKAPLEFIEKARIK*
66% identical
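Percent identity between two already-aligned sequences is the number of positions with the same residue divided by the number of positions compared. The page does not say over which span the 66% figure was taken, so the sketch below, which assumes an ungapped alignment and is only illustrative, is applied to the two 53-residue N-terminal fragments shown above; it also regenerates the identity marker line.

```python
def percent_identity(a: str, b: str) -> tuple[int, int, float]:
    """Identities between two ungapped, already-aligned sequences.

    Only overlapping positions are compared (zip stops at the shorter
    sequence); '*' (stop) and 'X' (unknown) never count as identities.
    """
    n = min(len(a), len(b))
    same = sum(1 for x, y in zip(a, b) if x == y and x not in "*X")
    return same, n, (100.0 * same / n) if n else 0.0

def match_line(a: str, b: str) -> str:
    """BLAST-style marker line: '|' under identities, blank elsewhere."""
    return "".join("|" if x == y and x not in "*X" else " " for x, y in zip(a, b))

mouse = "MELFSPIAIAVLGSCVLFLFSRLKNLLGPPCIQGWIPWIGAGLEFGKAPLEFI"
human = "MELISPTVIIILGCLALFLLLQRKNLRRPPCIKGWIPWIGVGFEFGKAPLEFIEKARIK*"
same, n, pct = percent_identity(mouse, human)
print(f"{same}/{n} = {pct:.0f}%")
print("Mouse EST", mouse)
print("         ", match_line(mouse, human))
print("Human 39A", human)
```

For this N-terminal fragment alone it counts 36 identities over 53 positions (about 68%). A longer span, or a gapped alignment where gap columns enter the denominator as in BLAST output, will give a different number.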
These sequences are similar to a chicken sequence gb|AI979980.1|AI979980 pat.pk0008.g11 chicken activated T cell cDNA Gallus gallus cDNA clone pat.pk0008.g11 5' similar to cholesterol 7-alpha hydroxylase, mRNA sequence Length = 500 Score = 185 bits (465), Expect = 2e-46 Identities = 86/159 (54%), Positives = 119/159 (74%) Frame = +2 Query: 50 SGTLLQAILEVVETETRQYSPNYGLVVLWAALANAPPIAFWTLGYILSHPDIHRTVLESI 109 S LLQ +L+ + + PNYGL++LWA+ ANA PIAFWTL +ILS P +++ V+E + Sbjct: 29 SKXLLQHLLD--NLQGKHLXPNYGLLMLWASQANAVPIAFWTLVFILSSPSVYKKVMEDL 202 Query: 110 SSVFGTAGKDKIKVSEDDXXXXXXXXXXXXESVRLRAPGVITRKVVKPVKILNHTVPSGD 169 +SVFG AGKD+I+VSE+D E++RLR+PG IT+KV+KP++I + T+P+GD Sbjct: 203 TSVFGNAGKDEIEVSEEDLKNLPYIKWCTLEAIRLRSPGAITKKVIKPIRIQSFTIPAGD 382 Query: 170 LLMLSPFWLHRNPKYFPEPESFKPERWKEANLDKYIFLD 208 +LMLSP+WLHRNPKYFP+PE FKP+RWKE + + FLD Sbjct: 383 MLMLSPYWLHRNPKYFPDPEMFKPDRWKE-EI*RRXFLD 496 H06539 and R36281 CYP46 (in Genbank Aug 10, 1999) no UNIGENE entry for these ESTs opposite end of R36281 = R49568 with UNIGENE entry Hs.25121 (5 ESTs) The UNIGENE entry is only the 3 prime untranslated region of the mRNA CYP46 human GenEMBL NM_006668 Lund EG, Guileyardo JM and Russell DW. cDNA cloning of cholesterol 24-hydroxylase, a mediator of cholesterol homeostasis in the brain. Proc. Natl. Acad. Sci. U.S.A. 96, 7238-7243 (1999) 32% identity with Drosophila 4D2 ESTs H06539, H51951, R36281 mouse homolog EST AA096922 MSPGLLLLGSAVLLAFGLCCTFVHRARSRYEHIPGPPRPSFLLG HLPCFWKKDEVGGRVLQDVFLDWAKKYGPVVRVNVFHKTSVIVTSPESVKKFLMSTKY NKDSKMYRALQTVFGERLFGQGLVSECNYERWHKQRRVIDLAFSRSSLVSLMETFNEK AEQLVEILEAKADGQTPVSMQDMLTYTAMDILAKAAFGMETSMLLGAQKPLSQAVKLM LEGITASRNTLAKFLPGKRKQLREVRESIRFLRQVGRDWVQRRREALKRGEEVPADIL TQILKAEEGAQDDEGLLDNFVTFFIAGHETSANHLAFTVMELSRQPEIVARLQAEVDE VIGSKRYLDFEDLGRLQYLSQVLKESLRLYPPAWGTFRLLEEETLIDGVRVPGNTPLL FSTYVMGRMDTYFEDPLTFNPDRFGPGAPKPRFTYFPFSLGHRSCIGQQFAQMEVKVV MAKLLQRLEFRLVPGQRFGLQEQATLKPLDPVLCTLRPRGWQPAPPPPPC
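Several naming calls on this page rest on a single identity figure: CYP2S1 at 52% to CYP2B is called a probable new subfamily, CYP39A1 at 28% a new family, CYP46 sits at 32% to Drosophila 4D2, and in the entry that follows, the assembled CYP4A20 at 57% to 4A11 is placed in 4A rather than 4Z. These agree with the customary P450 nomenclature cutoffs of more than 40% identity for the same family and more than 55% for the same subfamily. A minimal sketch of those cutoffs; treating them as strict "greater than" tests is an assumption, and real assignments also weigh phylogeny and which region was compared.

```python
# Customary P450 naming cutoffs (percent amino-acid identity to the closest
# named sequence). Treated here as strict "greater than" thresholds.
FAMILY_CUTOFF = 40.0
SUBFAMILY_CUTOFF = 55.0

def naming_call(identity_pct: float) -> str:
    """First-pass family/subfamily call from a single identity figure."""
    if identity_pct > SUBFAMILY_CUTOFF:
        return "same subfamily"
    if identity_pct > FAMILY_CUTOFF:
        return "same family, new subfamily"
    return "new family"

# Identity figures quoted on this page.
for label, pct in [("CYP2S1 vs CYP2B", 52), ("CYP39A1 vs CYP7B1/8B1", 28),
                   ("CYP46 vs Drosophila 4D2", 32), ("H21976 vs 4A11, C-term", 55),
                   ("assembled CYP4A20 vs 4A11", 57)]:
    print(f"{label}: {pct}% -> {naming_call(pct)}")
```

Values sitting right at a cutoff, like the 55% C-terminal figure for H21976, are exactly where the text hedges, and a threshold function cannot settle them.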
It looks like it has now been cloned (Feb. 4, 2000; see below), but the N-terminal 167 amino acids are still missing.
CYP4Z1X = CYP4A20
H21976 is 55% identical to CYP4A11 in the C-terminal region. It does have a UNIGENE entry, Hs.176588, with eight ESTs from only two different clones. It is 60% identical to mouse 4A14 and 58% identical to rabbit 4B1. Since the 4B and 4A subfamilies cannot be distinguished, this is probably a new subfamily. It has been named CYP4Z1 until the full-length sequence is known; the name may have to be changed later.
The assembled gene is 57% identical to 4A11, so this is not 4Z1 but a new human CYP4A20.
Missing first 167 amino acids
NKWEEHIAQNSRLELFQHVSLMTLDSIMKCAFSHQGSIQLDRS
SYLKAVFNLSKISNQRMNNFLHHNDLVFKFSSQGQIFSKFNQELHQFT
HLEKVIQDRKESLKDKLKQDTTQKRRWDFLDILLSAKV
ENTKDFSEADLQAEVKTFMFAGHDTTSSAISWILYCLAKYPEHQQRCRDEIRELLGDGSSITW
EHLSQMPYTTMCIKECLRLYAPVVNISRLLDKPITFPDGRSLPA
GITVFINIWALHHNPYFWEDPQV
FNPLRFSRENSEKIHPYAFIPFSAG
PRNCIGQHFAIIECKVAVALTLLRFKLAPDHSRPPQPVRQVVLKSKNGIHVFAKKV
Related gene on AJ131016 = 4A11
Query: 1      NKWEEHIAQNSRLELFQHVSLMTLDSIMKCAFSHQGSIQLDRS 43
              +KWEE + Q+S LE+FQHVSLMTLD+IMK AFSHQGSIQ+DRS
Sbjct: 134795 DKWEELLGQDSPLEVFQHVSLMTLDTIMKSAFSHQGSIQVDRS 134667
Sequence upstream of this one (4A11):
MSVSVLSPSRLLGDVSGILQAASLLILLLLLIKAVQLYLHRQWL
LKALQQFPCPPSHWLFGHIQELQQDQELQRIQKWVETFPSACPHWLWGGKVRVQLYDP
DYMKVILGRSDPKSHGSYRFLAPWIGYGLLLLNGQTWFQHRRMLTPAFHYDILKPYVG
LMADSVRVMLDKWEELLGQDSPLEVFQHVSLMTLDTIMKCAFSHQGSIQVDRNSQSYI
QAISDLNNLVFSRVRNAFHQNDTIYSLTSAGRWTHRACQLAHQHTDQVIQLRKAQLQK
EGELEKIKRKRHLDFLDILLLAKMENGSILSDKDLRAEVDTFMFEGHDTTASGISWIL
YALATHPKHQERCREEIHSLLGDGASITWNHLDQMPYTTMCIKEALRLYPPVPGIGRE
LSTPVTFPDGRSLPKGIMVLLSIYGLHHNPKVWPNPEVFDPSRFAPGSAQHSHAFLPF
SGGSRNCIGKQFAMNELKVATALTLLRFELLPDPTRIPIPIARLVLKSKNGIHLRLRR
LPNPCEDKDQL
Probable exon of 4A11 gene
GYGLLLLNGQTWFQHRRMLTPAFHNDILKPYVGLMADSVRVML
LPTTLFNLYLTAVIITWM*QLN*KIHMPEIYLLFLHLPMMWLLLDSSSPRKPPFSLPNCHQSSLGPCSL*YTHIHIFVSTLGYGLLLLNGQTWFQHRRMLTPAFHNDILKPYVGLMADSVRVMLVSPCLSPLPTPTHSTHSHQLHSQPCVPQAAIDIDTWNNTLRSLL*E*RVPQSCIR*ENTQASGHIHTLFPTIGIEIHGEHKALLLPLWNLSTRDWRHMQSCWTVGLPMLAGLGNGSFSGSMDRPRCHPPRLWPGAPGFCMDLSHPGKEMNTRSMFSEQYLP*IASSPVRTLISHPTVPSTF**MEKN*SPCFLYNTVPILAP*AIFGTETFCPKVGVLLVPQN*LSSQNTEESPIIYPRTTITTKFSQAYVLTESQNSILNPCYCERIC*MLY*KCVRIRCQFTNTGFTVPVSTMHLLCKVKNPNQDICKLQEKTRYLQVHEKARPY*QSTSWRLRRSMPPPTTGQMGRAPWPGFPSGGLSARLLDDPGHHHEECL CPQPFSTFISQLS*SHGCSSSIRKFTCLKYTSFSCICP*CGFFWIVLPPGSHLSPSQIVTSHP*VRAASDTHTYTYSCLP*GTACSC*MGRHGSSIDGC*PQPSTMTS*SHTWGSWQTLYE*CW*VHVSLLSPHPLTAHTLTNSTLNPVSHRQP*T*THGTTLSGHCCESRGFPRVV*GRRTPRHQVISTLCSPP*E*RFMVNTRPFSSHFGTSAQGTGGICNLVGQ*DFLCWLAWAMAASVAAWTGQDVTHPGSGLGPQVSAWI*ATLGRK*TPGLCSQSNTFHR*HHLQSGL*FLTQLCPAHFDEWKRIEVPVFSTTQCPY*PLKLSLALKLSALKLVSY*SLRTSSALRTLKSPPLFIPEQQ*QQNFHKHMS*LNLKIQYLIHAIVRGSVECCIKSV*ELGVSLPTLALQSQFLQCICSVK*RIPTKISASYRKKQDICRYMKKPGLISKALLGD*EGPCPLPPQDKWEELLGQDSPLEVFQHVSLMTLDTIMKSA AHNPFQPLSHSCHNHMDVAAQLENSHA*NIPPFPAFAHDVASSG*FFPQEATFLPPKLSPVIPRSVQPLIHTHTHIRVYLRVRLAPVEWADMVPASTDADPSLPQ*HPEAIRGAHGRLCTSDAGESMSLSSPHTHSQHTLSPTPLSTLCPTGSHRHRHMEQHSQVIAVRVEGSPELYKVGEHPGIRSYPHFVPHHRNRDSW*TQGPSPPTLEPQHKGLEAYAILLDSRTSYAGWPGQWQLQWQHGQAKMSPTQALAWGPRFLHGFKPPWEGNEHQVYVLRAIPSIDSIISSQDSDFSPNCAQHILMNGKELKSLFSLQHSAHTSPLSYLWH*NFLP*SWCPTSPSELAQLSEH*RVPHYLSQNNNNNKIFTSICPN*ISKFNT*SMLL*EDLLNVVLKVCEN*VSVYQHWLYSPSFYNASAL*SEESQPRYLQATGKNKISAGT*KSQALLAKHFLEIKKVHAPSHHRTNGKSSLARIPLWRSFSTSP**PWTPS*RVP Possible N-term? MNWEAIILSKLTQQQKTKYHMFSLISGNYMLDTYGHKYGNDRHWGLLEG* Upstream seq of AQ394813 (probably in intron region) EF*YIVEGFLS*LLTANSLRTRTISNSFLNSQWIWEKTCQVLNIFRDLFLESKTSVYLCSLQMFLNSYLFFIP MOUSE CYP4A14 sequence related to this human seq. 
(60% identical) GITATISIYGLHHNPRFWPNPKVFDPSRFAPDSSHHSHAYLPFSGGS RNCIGKQFAMNELKVAVALTLLRFELLPDPTRIPVPIARLVLKSKNGIHLCLKKLR" Blast of 4Z1 shows two genes on one genbank entry emb|AJ131016.1|HSA131016 Homo sapiens SCL gene locus Length = 193471 Score = 108 bits (268), Expect = 4e-23 Identities = 52/54 (96%), Positives = 52/54 (96%) Frame = -1 Query: 50 RNCIGQHFAIIECKVAVALTLLRFKLAPDHSRPPQPVRQVVLKSKNGIHVCQKK 103 RNCIGQHFAIIECKVAVALTLLRFKLAPDHSRPPQPVRQVVLKSKNGIHV KK Sbjct: 160300 RNCIGQHFAIIECKVAVALTLLRFKLAPDHSRPPQPVRQVVLKSKNGIHVFAKK 160139 Score = 67.9 bits (163), Expect = 7e-11 Identities = 34/54 (62%), Positives = 43/54 (78%) Frame = -3 This seq shows three diffs to 4A11 Query: 46 SAGLRNCIGQHFAIIECKVAVALTLLRFKLAPDHSRPPQPVRQVVLKSKNGIHV 99 SA RNCIG+ FA+ + KVA ALTLLRF+L PD +R P P+ ++VLKSKNGIH+ Sbjct: 129476 SAWPRNCIGKQFAMNQLKVARALTLLRFELLPDPTRIPIPMARLVLKSKNGIHL 129315 Score = 64.0 bits (153), Expect = 1e-09 Identities = 29/29 (100%), Positives = 29/29 (100%) Frame = -3 Query: 22 QVFNPLRFSRENSEKIHPYAFIPFSAGLR 50 QVFNPLRFSRENSEKIHPYAFIPFSAGLR Sbjct: 161417 QVFNPLRFSRENSEKIHPYAFIPFSAGLR 161331 Score = 58.2 bits (138), Expect = 6e-08 Identities = 23/23 (100%), Positives = 23/23 (100%) Frame = -3 Query: 1 GITVFINIWALHHNPYFWEDPQV 23 GITVFINIWALHHNPYFWEDPQV Sbjct: 162536 GITVFINIWALHHNPYFWEDPQV 162468 Blast of human genomic DNA with mouse 4A14 emb|AJ131016.1|HSA131016 Homo sapiens SCL gene locus Length = 193471 Score = 80.8 bits (196), Expect = 8e-14 Identities = 37/62 (59%), Positives = 42/62 (67%) Frame = -1 Query: 1 MGFFLFSPTRYLDGISGFFQWAFLLSLFLVLFKAVQFYLRRQWLLKTLQHFPCMPSHWLW 60 M + SP+R L G+SG Q LL L L+L KA Q YL RQWLLK LQ FPC PSHWL+ Sbjct: 140578 MSVSVLSPSRRLGGVSGILQVTSLLILLLLLIKAAQLYLHRQWLLKALQQFPCPPSHWLF 140399 Query: 61 GH 62 GH Sbjct: 140398 GH 140393 Score = 55.0 bits (130), Expect = 4e-06 Identities = 26/44 (59%), Positives = 33/44 (74%) Frame = -1 Query: 66 DKELQQILIWVEKFPSACLQCLSGSNIRVLLYDPDYVKVVLGRS 109 D+ELQ+I V+ FPSAC + G +RV LYDPDY+KV+LGRS Sbjct: 137275 
DQELQRIQERVKTFPSACPYWIWGGKVRVQLYDPDYMKVILGRS 137144 Score = 82.7 bits (201), Expect = 2e-14 Identities = 37/48 (77%), Positives = 41/48 (85%) Frame = -2 Query: 120 FAPWIGYGLLLLNGKKWFQHRRMLTPAFHYDILKPYVKIMADSVNIML 167 F +GYGLLLLNG+ WFQHRRMLTPAFH DILKPYV +MADSV +ML Sbjct: 135972 FVSTLGYGLLLLNGQTWFQHRRMLTPAFHNDILKPYVGLMADSVRVML 135829 Score = 67.5 bits (162), Expect = 7e-10 Identities = 30/43 (69%), Positives = 37/43 (85%) Frame = -3 Query: 168 DKWEKLDGQDHPLEIFHCVSLMTLDTVMKCAFSYQGSVQLDEN 210 DKWE+L GQD PLE+F VSLMTLDT+MK AFS+QGS+Q+D + Sbjct: 134795 DKWEELLGQDSPLEVFQHVSLMTLDTIMKSAFSHQGSIQVDRS 134667 Score = 64.4 bits (154), Expect = 6e-09 Identities = 27/54 (50%), Positives = 41/54 (75%) Frame = -2 Query: 210 NSKLYTKAVEDLNNLTFFRLRNAFYKYNIIYNMSSDGRLSHHACQIAHEHTDGV 263 NS+ Y +A+ DLN+L F +RNAF++ + IY+++S GR +H ACQ+AH+HT V Sbjct: 134301 NSQSYIQAISDLNSLVFCCMRNAFHENDTIYSLTSAGRWTHRACQLAHQHTGSV 134140 Score = 55.0 bits (130), Expect(2) = 1e-53 Identities = 25/38 (65%), Positives = 33/38 (86%) Frame = -1 Query: 260 TDGVIKMRKSQLQNEEELQKARKKRHLDFLDILLFARM 297 TD VI++RK+QLQ E EL+K ++KRHLDFLDILL A++ Sbjct: 133711 TDQVIQLRKAQLQKEGELEKIKRKRHLDFLDILLLAKV 133598 Score = 179 bits (450), Expect(2) = 1e-53 Identities = 87/112 (77%), Positives = 102/112 (90%), Gaps = 32/112 (28%) Frame = -3 Query: 295 ARMEDRNSLSDEDLRAEVDTFMFEGHDTTASGISWIFYALATHPEHQQRCREEVQSILGD 354 ++ME+ + LSD+DLRAEVDTFMFEGHDTTASGISWI YALATHP+HQ+RCREE+ +LGD Sbjct: 133520 SQMENGSILSDKDLRAEVDTFMFEGHDTTASGISWILYALATHPKHQERCREEIHGLLGD 133341 Query: 355 GTSVTW--------------------------------DHLGQMPYTTMCIKEALRLYPP 382 G S+TW +HL QMPYTTMCIKEALRLYPP Sbjct: 133340 GASITW*VRAQKMGFPAFSTGAPGLPRPCWCSGWNCFRNHLDQMPYTTMCIKEALRLYPP 133161 Query: 383 VISVSRELSSPVTFPDGRSIPKGI 406 V + RELS+PVTFPDGRS+PKG+ Sbjct: 133160 VPGIGRELSTPVTFPDGRSLPKGM 133089 Score = 41.4 bits (95), Expect = 0.053 Identities = 16/26 (61%), Positives = 19/26 (72%) Frame = -1 Query: 405 GITATISIYGLHHNPRFWPNPKVFDP 430 GI +SIYGLHHNP+ WPN +V P Sbjct: 132199 
GIMVLLSIYGLHHNPKVWPNLEVCGP 132122 Score = 52.3 bits (123), Expect = 3e-05 Identities = 22/27 (81%), Positives = 25/27 (92%) Frame = -3 Query: 426 KVFDPSRFAPDSSHHSHAYLPFSGGSR 452 +VFDPSRFAP S+ HSHA+LPFSGGSR Sbjct: 131990 QVFDPSRFAPGSAQHSHAFLPFSGGSR 131910 Score = 100 bits (247), Expect = 8e-20 Identities = 49/59 (83%), Positives = 54/59 (91%) Frame = -3 Query: 448 SGGSRNCIGKQFAMNELKVAVALTLLRFELLPDPTRIPVPIARLVLKSKNGIHLCLKKL 506 S RNCIGKQFAMN+LKVA ALTLLRFELLPDPTRIP+P+ARLVLKSKNGIHL L++L Sbjct: 129476 SAWPRNCIGKQFAMNQLKVARALTLLRFELLPDPTRIPIPMARLVLKSKNGIHLRLRRL 129300 This part is the 4Z1 sequence Score = 58.6 bits (139), Expect = 4e-07 Identities = 26/43 (60%), Positives = 35/43 (80%) Frame = -1 Query: 168 DKWEKLDGQDHPLEIFHCVSLMTLDTVMKCAFSYQGSVQLDEN 210 +KWE+ Q+ LE+F VSLMTLD++MKCAFS+QGS+QLD + Sbjct: 193444 NKWEEHIAQNSRLELFQHVSLMTLDSIMKCAFSHQGSIQLDRS 193316 An EST covers this region and can extend to the next exon gb|AI675602.1|AI675602 wc02e11.x1 NCI_CGAP_Pr28 Homo sapiens cDNA clone IMAGE:2314028 3' similar to SW:CP41_RAT P08516 CYTOCHROME P450 4A1 ;, mRNA sequence Length = 469 Score = 91.7 bits (224), Expect = 1e-18 Identities = 43/43 (100%), Positives = 43/43 (100%) Frame = +2 Query: 1 NKWEEHIAQNSRLELFQHVSLMTLDSIMKCAFSHQGSIQLDRS 43 NKWEEHIAQNSRLELFQHVSLMTLDSIMKCAFSHQGSIQLDRS Sbjct: 179 NKWEEHIAQNSRLELFQHVSLMTLDSIMKCAFSHQGSIQLDRS 307 Three frames for this EST does not extend 5 or 3 prime may have retained introns FHFQY*QGVLFYMVVIIRQIHFVHFIFSNPSKL*KGHKIT*KLGKDGHVYDHDIHPLPQNKWEEHIAQNSRLELFQHVSLMTLDSIMKCAFSHQGSIQLDRSVVKGR*LFANNCVTH*HVVPSSLFQYPGLIPESSVQP*QNLQPAHEQFSTSQRP FSFSVLTRGFILYGCHNKANSFCTLYIFKPQQTLKGT*NNLKIGKRWACV*S*YSSPAPEQMGGTHCPKLTSGALSTCLPDDPGQHHEVCLQPPGQHPVGQVSGKRKVIVCQ*LCHPLTCCSIFPIPVPWTHT*KQCSTLAKSPTSA*TIFYITTT FIFSTDKGFYFIWLS**GKFILYTLYFQTPANSKRDIK*LKNWEKMGMCMIMIFIPCPRTNGRNTLPKTHVWSSFNMSP**PWTAS*SVPSATRAASSWTGQW*KEGNCLPITVSPTNMLFHLPYSSTLDSYLKAVFNLSKISNQRMNNFLHHND Compare to earlier section of earlier gene FVSTLGYGLLLLNGQTWFQHRRMLTPAFHNDILKPYVGLMADSVRVML 
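The "three frames" listings in this section are conceptual translations of an EST or genomic survey sequence in each forward reading frame, with * for stop codons; a P450 exon shows up as a long stop-free stretch in one frame, as with NKWEEHIAQNS... above. The page does not say which tool produced them. A minimal sketch using the standard genetic code, assuming plain DNA input; the function names are illustrative, and the reverse-strand frames come from the reverse complement.

```python
from itertools import product

# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def translate(dna: str, frame: int = 0) -> str:
    """Translate one reading frame; '*' is a stop, 'X' an ambiguous codon."""
    dna = dna.upper()
    return "".join(CODONS.get(dna[i:i + 3], "X")
                   for i in range(frame, len(dna) - 2, 3))

def three_frames(dna: str) -> list[str]:
    """The three forward reading frames."""
    return [translate(dna, f) for f in range(3)]

def six_frames(dna: str) -> list[str]:
    """Forward frames followed by the three frames of the reverse complement."""
    rc = dna.upper().translate(COMPLEMENT)[::-1]
    return three_frames(dna) + three_frames(rc)

print(three_frames("ATGGCCTAA"))
```

Codons containing N, common in single-pass ESTs, come out as X, which is one source of the X residues in the EST-derived sequences on this page.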
Compare to next section of earlier gene NSQSYIQAISDLNSLVFCCMRNAFHENDTIYSLTSAGRWTHRACQLAHQHTGSV Score = 96.3 bits (236), Expect = 2e-18 Identities = 40/63 (63%), Positives = 51/63 (80%) Frame = -3 Query: 298 EDRNSLSDEDLRAEVDTFMFEGHDTTASGISWIFYALATHPEHQQRCREEVQSILGDGTS 357 E+ S+ DL+AEV TFMF GHDTT+S ISWI Y LA +PEHQQRCR+E++ +LGDG+S Sbjct: 178967 ENTKDFSEADLQAEVKTFMFAGHDTTSSAISWILYCLAKYPEHQQRCRDEIRELLGDGSS 178788 Query: 358 VTW 360 +TW Sbjct: 178787 ITW 178779 Score = 77.6 bits (188), Expect = 7e-13 Identities = 32/46 (69%), Positives = 39/46 (84%) Frame = -3 Query: 361 DHLGQMPYTTMCIKEALRLYPPVISVSRELSSPVTFPDGRSIPKGI 406 +HL QMPYTTMCIKE LRLY PV+++SR L P+TFPDGRS+P G+ Sbjct: 171935 EHLSQMPYTTMCIKECLRLYAPVVNISRLLDKPITFPDGRSLPAGL 171798 Score = 40.6 bits (93), Expect = 0.090 Identities = 18/37 (48%), Positives = 24/37 (64%) Frame = -3 Query: 391 SSPVTFPDGRSIPKGITATISIYGLHHNPRFWPNPKV 427 SS + P S+ GIT I+I+ LHHNP FW +P+V Sbjct: 162578 SSLMHLPAFSSVYSGITVFINIWALHHNPYFWEDPQV 162468 Score = 76.9 bits (186), Expect = 1e-12 Identities = 38/60 (63%), Positives = 47/60 (78%) Frame = -1 Query: 447 FSGGSRNCIGKQFAMNELKVAVALTLLRFELLPDPTRIPVPIARLVLKSKNGIHLCLKKL 506 F G RNCIG+ FA+ E KVAVALTLLRF+L PD +R P P+ ++VLKSKNGIH+ KK+ Sbjct: 160315 FLGIPRNCIGQHFAIIECKVAVALTLLRFKLAPDHSRPPQPVRQVVLKSKNGIHVFAKKV 160136 Assembled gene Missing first 167 amino acids NKWEEHIAQNSRLELFQHVSLMTLDSIMKCAFSHQGSIQLDRS SYLKAVFNLSKISNQRMNNFLHHNDLVFKFSSQGQIFSKFNQELHQFT HLEKVIQDRKESLKDKLKQDTTQKRRWDFLDILLSAKV ENTKDFSEADLQAEVKTFMFAGHDTTSSAISWILYCLAKYPEHQQRCRDEIRELLGDGSSITW EHLSQMPYTTMCIKECLRLYAPVVNISRLLDKPITFPDGRSLPA GITVFINIWALHHNPYFWEDPQV FNPLRFSRENSEKIHPYAFIPFSAG PRNCIGQHFAIIECKVAVALTLLRFKLAPDHSRPPQPVRQVVLKSKNGIHVFAKKV From GSS gb|AQ394813|AQ394813 CITBI-E1-2542J12.TF CITBI-E1 Homo sapiens genomic clone 2542J12, genomic survey sequence [Homo sapiens] Length = 715 Score = 91.7 bits (224), Expect = 7e-19 Identities = 43/43 (100%), Positives = 43/43 (100%) Frame = +1 Query: 1 NKWEEHIAQNSRLELFQHVSLMTLDSIMKCAFSHQGSIQLDRS 43 
NKWEEHIAQNSRLELFQHVSLMTLDSIMKCAFSHQGSIQLDRS Sbjct: 523 NKWEEHIAQNSRLELFQHVSLMTLDSIMKCAFSHQGSIQLDRS 651 Three frames of this entry PNSNI*LRVFCLDS*LQTL*EQELSLIPF*ILNGSGKKLVKY*IFLGIFS*NQKPLFIYALYRCF*ILIYFLFHVLSA*FFHDKSHINCTNET*KRGKNSYFPTLTKSQIASKAQYHFQY*QGVLFYMVVIIRQIHFVHFIFSNPSKL*KGHKIT*KLGKDGHVYDHDIHPLPQNKWEEHIAQNSRLELFQHVSLMTLDS RILIYS*GFSVLTLNCKLFENKNYL*FLSKFSMDLGKNLSSTKYF*GSFPRIKNLCLFMLSTDVFEFLFIFYSMFCQLSFSMIKVI*IVLMKLEKEGKIAISPPLQSHK*HLRHSIIFSTDKVFYFIWLS**GKFILYTLYFQTPANSKRDIK*LRNWEKMGMCMIMIFIPCPRTNGRNTLPKTHVWSSFNMSP**PWT EF*YIVEGFLS*LLTANSLRTRTISNSFLNSQWIWEKTCQVLNIFRDLFLESKTSVYLCSLQMFLNSYLFFIPCFVSLVFP**KSYKLY**NLKKREK*LFPHPYKVTNSI*GTVSFSVLTRCFILYGCHNKANSFCTLYIFKPQQTLKGT*NNLEIGKRWACV*S*YSSPAPEQMGGTHCPKLTSGALSTCLPDDPGQ This entry extends 5 prime end emb|AL136278.1|HS819P24S H.sapiens STS from genomic clone 819P24, sequence tagged site [Homo sapiens] Length = 499 Score = 45.7 bits (106), Expect = 2e-04 Identities = 24/25 (96%), Positives = 24/25 (96%) Frame = +2 Query: 1 EF*YIVEGFLS*LLTANSLRTRTIS 25 EF*YIVEGFLS*LLTANSLR RTIS Sbjct: 425 EF*YIVEGFLS*LLTANSLRIRTIS 499 Three frames of this entry TQASLSWKSTP*TDAPLFESKAPS**CPLQPAPTTASPCNALTHIQTDTTPIITYLLLFLYSSWLHDSCNLSEHLILVTCLEKSFQIELIN*FTLQS**QLNSV*IFILKWRTSSQ*TFGF*QSEIFWDIGRYVQGGK*HLNSNI*LRVFCLDS*LQTL*E*ELS NSSFLELEEHTLN*CTSLRV*GSFLMMPPSACSNHCLSLQCPYPHPNRHHPHYHLPIALSLLFLAP*QLQSFRTSDFSNVFGKKLSN*IN*LIHITELITA*QCINIYIEMENFVSINIWLLTK*NFLGYWQVCPRG*ITSEF*YIVEGFLS*LLTANSLRIRTIS KLKLP*VGRAHPELMHLSSSLRLLPDDAPFSLLQPLPLLAMPLPTSKQTPPPLSPTYCSFSTLLGSMTVAIFQNI*F**RVWKKAFKLN*LINSHYRVNNSLTVYKYLY*NGELRLNKHLAFDKVKFSGILAGMSKGVNNI*ILIYS*GFSVLTLNCKLFENKNYL These two entries from GSS extend 5 prime again gb|AQ061657|AQ061657 CIT-HSP-2348A12.TR CIT-HSP Homo sapiens genomic clone 2348A12 Length = 475 Score = 101 bits (249), Expect = 1e-21 Identities = 51/54 (94%), Positives = 51/54 (94%) Frame = +3 Query: 1 KLKLP*VGRAHPELMHLSSSLRLLPDDAPFSLLQPLPLLAMPLPTSKQTPPPLS 54 KLK *VGRAHPELMHLSSSLRLLPDDAPFSLLQPLPLLAMPLPTSKQTPP LS Sbjct: 312 
KLKPX*VGRAHPELMHLSSSLRLLPDDAPFSLLQPLPLLAMPLPTSKQTPPXLS 473 Three frames of this entry LFR*HYRILKLIISFYTTTVLQFCCIISILLVRILRCRD*AVCLKSQGRPGAVAHACNPSTLGGQGGRMA*AQEFKTSLANMAKPHLY*KYMIIRQIRGRVEFKLKPLELEEHTLN*CT AI*VTL*NTQAYYILLYYYSLTILLYYLHFASKDIEMQRLSSLFKVTRQARCSGSCL*SQHFGRPRWADGLSPGVQDQPGQHGQTPSLLKIHDHKAD*GKS*VQTQAS*VGRAHPELMH SYLGDTIEYSSLLYPFILLQSYNFAVLSPFC**GY*DAEIKQFV*SHKAGQVQWLMPVIPALWEAKVGGWLEPRSSRPAWPTWPNPISTKNT*S*GRLGEELSSNSSLLSWKSTP*TDA This seq contain ALU repeat The entry below may extend the sequence gb|AC006028.2|AC006028 Homo sapiens clone GS165O14, WORKING DRAFT SEQUENCE, 5 unordered pieces Length = 284673 Score = 30.9 bits (68), Expect = 6.1 Identities = 13/15 (86%), Positives = 14/15 (92%) Frame = -1 Query: 1 SYLGDTIEYSSLLYP 15 SYLGDTIEYSSL +P Sbjct: 284577 SYLGDTIEYSSLTWP 284533 Three frames of the sequence This frame has a possible ETAM exon of A P450 VMLTVDSRGSPCVEL*ADNNFTQETAMTMITPSYLGDTIEYSSLTWPLPCWAASALGASVPSQ*PQPAAGLPLGSPAHLNRHWV*DTLWSL *C*RSTLEDPLVWNCERITISHRKQL*P*LRQAI*VTL*NTQA*PGHFLAGQPLP*EPQCPLSDHSQLLASHWVPPHT*TVTGFKTPCGR DADGRL*RIPLCGIVSG*QFHTGNSYDHDYAKLFR*HYRILKLDLATSLLGSLCLRSLSALSVTTASCWPPTGFPRTPEPSLGLRHPVVV These ESTs have this ETAM exon (may be accidental hit) gb|AA514190|AA514190 HFLEST-741 Human fetal liver (S.Xue) Homo sapiens cDNA Length = 503 Score = 46.9 bits (109), Expect = 3e-05 Identities = 25/35 (71%), Positives = 28/35 (79%) Frame = -2 Query: 1 ETAMTMITPSYLGDTIEYSSLTWPLPCWAASALGA 35 ETAMTMITPSYLGDTIEYSS +A++ALGA Sbjct: 340 ETAMTMITPSYLGDTIEYSS-------YASNALGA 257 gb|H00072|H00072 ph5g11u_19/1TV Homo sapiens cDNA clone ph5g11u_19/1TV. Length = 334 Score = 46.9 bits (109), Expect = 3e-05 Identities = 25/35 (71%), Positives = 28/35 (79%) Frame = -3 Query: 1 ETAMTMITPSYLGDTIEYSSLTWPLPCWAASALGA 35 ETAMTMITPSYLGDTIEYSS +A++ALGA Sbjct: 269 ETAMTMITPSYLGDTIEYSS-------YASNALGA 186 gb|H00069|H00069 ph5f06u_19/1TV Homo sapiens cDNA clone ph5f06u_19/1TV. 
Length = 405 Score = 45.7 bits (106), Expect = 8e-05 Identities = 24/35 (68%), Positives = 28/35 (79%) Frame = -2 Query: 1 ETAMTMITPSYLGDTIEYSSLTWPLPCWAASALGA 35 ETAMTMITPSYLGDTIEYSS ++++ALGA Sbjct: 164 ETAMTMITPSYLGDTIEYSS-------YSSNALGA 81 gb|T48598|T48598 ph6f9_19/1TV Homo sapiens cDNA clone ph6f9_19/1TV. Length = 375 Score = 45.7 bits (106), Expect = 8e-05 Identities = 24/35 (68%), Positives = 28/35 (79%) Frame = -3 Query: 1 ETAMTMITPSYLGDTIEYSSLTWPLPCWAASALGA 35 ETAMTMITPSYLGDTIEYSS ++++ALGA Sbjct: 271 ETAMTMITPSYLGDTIEYSS-------YSSNALGA 188 emb|AL045794.1|AL045794 DKFZp434G226_r1 434 (synonym: htes3) Homo sapiens cDNA clone DKFZp434G226 5', mRNA sequence Length = 508 Score = 45.3 bits (105), Expect = 1e-04 Identities = 21/21 (100%), Positives = 21/21 (100%) Frame = -2 Query: 1 ETAMTMITPSYLGDTIEYSSL 21 ETAMTMITPSYLGDTIEYSSL Sbjct: 120 ETAMTMITPSYLGDTIEYSSL 58 gb|AI535983.1|AI535983 xu.P6.B5 conorm Homo sapiens cDNA 3', mRNA sequence Length = 517 Score = 45.3 bits (105), Expect = 1e-04 Identities = 21/21 (100%), Positives = 21/21 (100%) Frame = -2 Query: 1 ETAMTMITPSYLGDTIEYSSL 21 ETAMTMITPSYLGDTIEYSSL Sbjct: 99 ETAMTMITPSYLGDTIEYSSL 37 gb|T41397|T41397 ph4c2_19/1TV Homo sapiens cDNA clone ph4c2_19/1TV. 
Length = 308
Score = 44.5 bits (103), Expect = 2e-04
Identities = 24/35 (68%), Positives = 27/35 (76%)
Frame = -1

Query: 1   ETAMTMITPSYLGDTIEYSSLTWPLPCWAASALGA 35
           E AMTMITPSYLGDTIEYSS       +A++ALGA
Sbjct: 308 EPAMTMITPSYLGDTIEYSS-------YASNALGA 225

gb|AI535783.1|AI535783 jun1.M13-Control conorm Homo sapiens cDNA 3', mRNA sequence
Length = 506
Score = 34.4 bits (77), Expect = 0.19
Identities = 15/15 (100%), Positives = 15/15 (100%)
Frame = -2

Query: 1  ETAMTMITPSYLGDT 15
          ETAMTMITPSYLGDT
Sbjct: 58 ETAMTMITPSYLGDT 14

gb|AQ059217|AQ059217 CIT-HSP-2348B20.TR CIT-HSP Homo sapiens genomic clone 2348B20
Length = 403
Score = 63.6 bits (152), Expect = 2e-10
Identities = 31/32 (96%), Positives = 31/32 (96%)
Frame = +2

Query: 1   KLKLP*VGRAHPELMHLSSSLRLLPDDAPFSL 32
           KLK P*VGRAHPELMHLSSSLRLLPDDAPFSL
Sbjct: 308 KLKPP*VGRAHPELMHLSSSLRLLPDDAPFSL 403

From ESTs:
GITVFINIWALHHNPYFWEDPQVFNPLRFSRENSEKIHPYAFIPFSAGLRNCIGQHF
AIIECKVAVALTLLR
FKLAPDHSRPPQPVRQVVLKSKNGIHVCQKKFA*

gb|AA193450|AA193450 zr40e07.r1 Soares NhHMPu S1 Homo sapiens cDNA clone 665892 5'
Length = 623
Score = 100 bits (246), Expect = 3e-20
Identities = 47/47 (100%), Positives = 47/47 (100%)
Frame = -2

Query: 45  YLKAVFNLSKISNQRMNNFLHHNDLVFKFSSQGQIFSKFNQELHQFT 91
           YLKAVFNLSKISNQRMNNFLHHNDLVFKFSSQGQIFSKFNQELHQFT
Sbjct: 262 YLKAVFNLSKISNQRMNNFLHHNDLVFKFSSQGQIFSKFNQELHQFT 122

Three frames of this EST:
GHFFLNPRETLKGP*NNLKLGKDGPCDDQIFIPCPRTKWGEHIAPKPHGLGLFNMSPPGWTLGQHHGRWSLQAPGQPSKLTGQVVKGKVIGLPINWGHPTNMVFHLPLIPVPWTHYLKAVFNLSKISNQRMNNFLHHNDLVFKFSSQGQIFSNLTKNFISSQVSPGIYMARVHYETSHCLRH*LWLWLLLLWTYGMVIIQI
DTFF*TPGKL*RGLKIT*NWEKMGHVMIRYSSLAPEQNGGNTLPQNLTVWGSLTCLPLDGPWDSIMEGGPFRHQGSHPS*QVRW*KGR*LVCQLTGVTQLTWCSIFP*FQYPGLIT*KQCSTLAKSPTSA*TIFYITTTWFSNSALKAKSFQI*PRTSSVHRLVLGFTWPESTMRHLIV*DIDSGYGCFYYGHMAWSSFRL
WTLFFKPPGNSKGALK*LKIGKRWAM**SDIHPLPQNKMGGTHCPKTSRSGAL*HVSPWMDPGTASWKVVPSGTRAAIQVDRSGGKREGNWFAN*LGSPN*HGVPSSPNSSTLDSLPESSVQP*QNLQPAHEQFSTSQRPGFQIQLSRPNLFKFNQELHQFTG*SWDLHGQSPL*DISLSETLTLVMAASTMDIWHGHHSDW

Compare to these sequences.

Next region upstream:
NKWEEHIAQNSRLELFQHVSLMTLDSIMKCAFSHQGSIQLDRS

Section before this, from a related gene:
FVSTLGYGLLLLNGQTWFQHRRMLTPAFHNDILKPYVGLMADSVRVML

gb|H21976|H21976 yl38c11.r1 Homo sapiens cDNA clone 160532 5' similar to SP:CP4B_RABIT P15128 CYTOCHROME P450 IVB1 ;.
Length = 332
Score = 105 bits (259), Expect = 8e-22
Identities = 45/45 (100%), Positives = 45/45 (100%)
Frame = +1

Query: 239 GITVFINIWALHHNPYFWEDPQVFNPLRFSRENSEKIHPYAFIPF 283
           GITVFINIWALHHNPYFWEDPQVFNPLRFSRENSEKIHPYAFIPF
Sbjct: 46  GITVFINIWALHHNPYFWEDPQVFNPLRFSRENSEKIHPYAFIPF 180

Score = 81.1 bits (197), Expect = 2e-14
Identities = 40/51 (78%), Positives = 41/51 (79%)
Frame = +2

Query: 273 EKIHPYAFIPFSAGPRNCIGQHFAIIECKVAVALTLLRFKLAPDHSRPPQP 323
           +K  P     FSAG RNCIGQHFAIIEC VAVALTLLRFKLAPD SRPPQP
Sbjct: 149 KKYIPMPSYHFSAGLRNCIGQHFAIIECXVAVALTLLRFKLAPDXSRPPQP 301

gb|AI675602.1|AI675602 wc02e11.x1 NCI_CGAP_Pr28 Homo sapiens cDNA clone IMAGE:2314028 3' similar to SW:CP41_RAT P08516 CYTOCHROME P450 4A1 ;, mRNA sequence
Length = 469
Score = 91.7 bits (224), Expect(2) = 4e-28
Identities = 43/43 (100%), Positives = 43/43 (100%)
Frame = +2

Query: 1   NKWEEHIAQNSRLELFQHVSLMTLDSIMKCAFSHQGSIQLDRS 43
           NKWEEHIAQNSRLELFQHVSLMTLDSIMKCAFSHQGSIQLDRS
Sbjct: 179 NKWEEHIAQNSRLELFQHVSLMTLDSIMKCAFSHQGSIQLDRS 307

Score = 55.8 bits (132), Expect(2) = 4e-28
Identities = 25/25 (100%), Positives = 25/25 (100%)
Frame = +3

Query: 44  SYLKAVFNLSKISNQRMNNFLHHND 68
           SYLKAVFNLSKISNQRMNNFLHHND
Sbjct: 393 SYLKAVFNLSKISNQRMNNFLHHND 467

gb|AI668602.1|AI668602 yl48g04.x5 Soares breast 3NbHBst Homo sapiens cDNA clone IMAGE:161526 3' similar to SW:CP4B_RABIT P15128 CYTOCHROME P450 4B1 ;, mRNA sequence
Length = 629
Score = 153 bits (384), Expect = 2e-36
Identities = 75/79 (94%), Positives = 75/79 (94%)
Frame = -1

Query: 264 PLRFSRENSEKIHPYAFIPFSAGPRNCIGQHFAIIECKVAVALTLLRFKLAPDHSRPPQP 323
           PLRFSRE SE IHPYAFIPFSAG RNCIGQHFAIIEC VAVALTLLRFKLAPDHSRPPQP
Sbjct: 629 PLRFSREISEXIHPYAFIPFSAGLRNCIGQHFAIIECXVAVALTLLRFKLAPDHSRPPQP 450

Query: 324 VRQVVLKSKNGIHVFAKKV 342
           VRQVVLKSKNGIHVFAKKV
Sbjct: 449 VRQVVLKSKNGIHVFAKKV 393

gb|AI668594.1|AI668594 yl38c11.x5 Soares breast 3NbHBst Homo sapiens cDNA clone IMAGE:160532 3' similar to SW:CP4B_RABIT P15128 CYTOCHROME P450 4B1 ;, mRNA sequence
Length = 629
Score = 159 bits (399), Expect = 3e-38
Identities = 77/79 (97%), Positives = 77/79 (97%)
Frame = -1

Query: 264 PLRFSRENSEKIHPYAFIPFSAGPRNCIGQHFAIIECKVAVALTLLRFKLAPDHSRPPQP 323
           PLRFSRENSE IHPYAFIPFSAG RNCIGQHFAIIECKVAVALTLLRFKLAPDHSRPPQP
Sbjct: 629 PLRFSRENSEXIHPYAFIPFSAGLRNCIGQHFAIIECKVAVALTLLRFKLAPDHSRPPQP 450

Query: 324 VRQVVLKSKNGIHVFAKKV 342
           VRQVVLKSKNGIHVFAKKV
Sbjct: 449 VRQVVLKSKNGIHVFAKKV 393

gb|AI820775.1|AI820775 yl38c11.y5 Soares breast 3NbHBst Homo sapiens cDNA clone IMAGE:160532 5' similar to SW:CP4B_RABIT P15128 CYTOCHROME P450 4B1 ;, mRNA sequence
Length = 548
Score = 220 bits (555), Expect = 2e-56
Identities = 103/104 (99%), Positives = 103/104 (99%)
Frame = +2

Query: 239 GITVFINIWALHHNPYFWEDPQVFNPLRFSRENSEKIHPYAFIPFSAGPRNCIGQHFAII 298
           GITVFINIWALHHNPYFWEDPQVFNPLRFSRENSEKIHPYAFIPFSAG RNCIGQHFAII
Sbjct: 101 GITVFINIWALHHNPYFWEDPQVFNPLRFSRENSEKIHPYAFIPFSAGLRNCIGQHFAII 280

Query: 299 ECKVAVALTLLRFKLAPDHSRPPQPVRQVVLKSKNGIHVFAKKV 342
           ECKVAVALTLLRFKLAPDHSRPPQPVRQVVLKSKNGIHVFAKKV
Sbjct: 281 ECKVAVALTLLRFKLAPDHSRPPQPVRQVVLKSKNGIHVFAKKV 412

gb|AI733538.1|AI733538 yl48g04.y5 Soares breast 3NbHBst Homo sapiens cDNA clone IMAGE:161526 5' similar to SW:CP4B_RABIT P15128 CYTOCHROME P450 4B1 ;, mRNA sequence
Length = 535
Score = 220 bits (555), Expect = 2e-56
Identities = 103/104 (99%), Positives = 103/104 (99%)
Frame = +3

Query: 239 GITVFINIWALHHNPYFWEDPQVFNPLRFSRENSEKIHPYAFIPFSAGPRNCIGQHFAII 298
           GITVFINIWALHHNPYFWEDPQVFNPLRFSRENSEKIHPYAFIPFSAG RNCIGQHFAII
Sbjct: 87  GITVFINIWALHHNPYFWEDPQVFNPLRFSRENSEKIHPYAFIPFSAGLRNCIGQHFAII 266

Query: 299 ECKVAVALTLLRFKLAPDHSRPPQPVRQVVLKSKNGIHVFAKKV 342
           ECKVAVALTLLRFKLAPDHSRPPQPVRQVVLKSKNGIHVFAKKV
Sbjct: 267 ECKVAVALTLLRFKLAPDHSRPPQPVRQVVLKSKNGIHVFAKKV 398

gb|H25624|H25624 yl48g04.r1 Homo sapiens cDNA clone 161526 5' similar to gb:J02871 CYTOCHROME P450 IVB1 (HUMAN);contains Alu repetitive element;.
Length = 432
Score = 201 bits (507), Expect = 7e-51
Identities = 96/102 (94%), Positives = 97/102 (94%), Gaps = 1/102 (0%)
Frame = +3

Query: 239 GITVFINIWALHHNPYFWEDPQVFNPLRFSRENSEKIHPYAFIPFSAGPRNCIGQHFAII 298
           GITVFINIWALHHNPYFWEDPQVFNPLRFSRENSEKIHPYAFIPFSAG RNCIGQHFAII
Sbjct: 87  GITVFINIWALHHNPYFWEDPQVFNPLRFSRENSEKIHPYAFIPFSAGLRNCIGQHFAII 266

Query: 299 ECKVAVALTLLRFKLAPDHSRPPQPVRQVVLKSKN-GIHVFAK 340
           ECKVAVALTLLRFKLAPDHSRPPQPVRQVVLKSK    HVF +
Sbjct: 267 ECKVAVALTLLRFKLAPDHSRPPQPVRQVVLKSKXWEFHVFCQ 395

For a list of UNIGENE entries of human P450s, see human UNIGENE P450s.
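The Identities figures in the BLAST reports on this page can be rechecked from the aligned Query and Sbjct rows. In the 35-column ETAM hits, the seven gap columns count toward the length but never as identities, and the percentages appear to be truncated rather than rounded (13/15 is printed as 86%, not 87%). A minimal Python sketch under those assumptions follows; `alignment_identity` is a hypothetical helper, not part of BLAST, and it does not compute Positives, which depend on the substitution matrix.

```python
def alignment_identity(query: str, sbjct: str) -> tuple[int, int, int]:
    """Return (identities, columns, percent) for two aligned rows.

    Hypothetical helper: gap columns ('-') count toward the alignment
    length but never as identities; a stop ('*') aligned to a stop does
    count. The percentage is floored, not rounded.
    """
    if not query or len(query) != len(sbjct):
        raise ValueError("aligned rows must be non-empty and equal in length")
    identities = sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")
    columns = len(query)
    return identities, columns, (100 * identities) // columns


# Rows copied from the AC006028 and AA514190 hits above.
print(alignment_identity("SYLGDTIEYSSLLYP", "SYLGDTIEYSSLTWP"))
# (13, 15, 86)
print(alignment_identity("ETAMTMITPSYLGDTIEYSSLTWPLPCWAASALGA",
                         "ETAMTMITPSYLGDTIEYSS-------YASNALGA"))
# (25, 35, 71)
```

The sketch uses alignment columns as the denominator, which reproduces the reports here when the gaps fall only in the Sbjct row. In the H25624 hit, where the gap is in the Query row, the report prints 96/102 even though the rows span 103 columns, so treat the denominator as an approximation in that case.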
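The "Three frames" listings on this page are conceptual translations of a nucleotide sequence read from three successive offsets, with stop codons shown as `*`; negative BLAST frames (such as Frame = -2) refer to the reverse-complement strand. A minimal Python sketch of that step, assuming the standard genetic code (NCBI translation table 1) and rendering any codon with an ambiguous base as `X`. The function names are illustrative only, and the short test sequence is made up rather than taken from the clones above.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order:
# TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(dna: str) -> str:
    """Translate one reading frame; a trailing partial codon is ignored."""
    dna = dna.upper().replace("U", "T")
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    # Codons containing ambiguous bases (N, R, Y, ...) are not in the table.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def reverse_complement(dna: str) -> str:
    return dna.upper().replace("U", "T").translate(COMPLEMENT)[::-1]


def three_frames(dna: str, strand: int = 1) -> list[str]:
    """Translations starting at offsets 0, 1 and 2 of the chosen strand."""
    seq = dna if strand > 0 else reverse_complement(dna)
    return [translate(seq[offset:]) for offset in range(3)]


print(three_frames("ATGGCCTAA"))             # ['MA*', 'WP', 'GL']
print(three_frames("ATGGCCTAA", strand=-1))  # ['LGH', '*A', 'RP']
```

In practice a library routine such as Biopython's `Seq.translate()` performs the same conversion and lets you choose other genetic-code tables.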