Bio Data Mining/Using HMMER on a Mac
*Introduction [#y0a46a75]
[[HMMER:http://hmmer.janelia.org/]] is a hidden Markov mo...
Binary (executable) files for Linux and Mac OS X are distributed...
The steps were verified in the following environment.
-Mac OS X 10.6.3
-HMMER 3.0
-Pfam 24.0
-TAIR 9
*Requirements [#u1910b30]
-make
-gcc
*Download [#yd2147b9]
-HMMER~
http://hmmer.janelia.org/
Under Source in the Download section, click "FTP" or "HTTP"...
*Installation [#aa3c836d]
Double-click the downloaded ''hmmer-3.0.tar.gz''...
Launch Terminal, change to the extracted directory, and run the following...
#geshi(bash){{
./configure
make
make check
sudo make install
}}
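After installing, it can be worth confirming that the HMMER commands are actually reachable from the shell. The following is a minimal sketch, not part of HMMER: the helper name check_tool is hypothetical, and it assumes the install step placed hmmsearch and hmmbuild in a directory that is on the PATH.

```shell
#!/bin/sh
# check_tool is a hypothetical helper, not part of HMMER.
# It reports whether a command can be found on the current PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: not found on PATH" >&2
        return 1
    fi
}

# Check two of the programs that HMMER installs.
for tool in hmmsearch hmmbuild; do
    check_tool "$tool" || true
done
```

If a command is reported as missing, the install prefix is probably not on the PATH; running hmmsearch -h afterwards prints the option summary.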
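*Running [#vd2ad01b]
Here, against the Arabidopsis thaliana protein data, Bcl-2...
**Downloading the profile HMM [#cf7af770]
The search query comes from [[Pfam:http://pfam.sanger.ac.uk/]]...
Pfam is a site that collects and publishes profile HMMs.
[[Bcl-2 family page:http://pfam.sanger.ac.uk/family...
There, right-click the download link at the very bottom and choose 「リンク先のファ...
The rest of this page assumes the file is named ''PF00452''.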
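Before searching, a quick look at the header of the downloaded profile confirms that it is the expected model. The sketch below assumes the file is a HMMER3 text profile whose header lines start with tags such as NAME, ACC, DESC and LENG, and that the per-position table begins at a line whose first word is HMM. The helper name hmm_field is hypothetical.

```shell
#!/bin/sh
# hmm_field is a hypothetical helper: print the value of one header
# field (e.g. NAME, ACC, LENG) from a HMMER3 text profile.
hmm_field() {
    # $1 = profile file, $2 = field tag
    awk -v key="$2" '
        $1 == "HMM" { exit }            # header ends where the model table begins
        $1 == key {
            sub(/^[^ \t]+[ \t]+/, "")   # drop the tag, keep the rest of the line
            print
            exit
        }
    ' "$1"
}

# Usage, assuming the downloaded file is named PF00452.
if [ -f PF00452 ]; then
    for tag in NAME ACC DESC LENG; do
        echo "$tag: $(hmm_field PF00452 "$tag")"
    done
fi
```

For the profile used on this page, the name, accession and length should match the Query, Accession and M= values that hmmsearch prints at the top of its report.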
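**Downloading the protein data [#u08aeeea]
The search target comes from [[TAIR:http://www.arabidopsis.org/]]...
TAIR is a research portal that gathers information on Arabidopsis thaliana...
Specifically, launch Terminal and run the following.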
#geshi(bash){{
$ ftp ftp.arabidopsis.org
}}
This connects to the FTP server, which asks for a user name and password...
#geshi(bash){{
Connected to ftp.arabidopsis.org.
220 (vsFTPd 2.0.5)
Name (ftp.arabidopsis.org:username):
}}
Enter ''anonymous'' as the user name and no password...
After logging in, a message like this is displayed.
#geshi(text){{
230-Welcome to ftp.arabidopsis.org, the guest ftp server ...
230-project.
230-
230-If you have any trouble with this server, first try l...
230-with a minus sign (-) as the first character of your ...
230-will turn off a feature that may be confusing your ft...
230-
230-Please send any questions, comments or problem report...
230-curator@arabidopsis.org.
230-
230-Anonymous access is now available, as well as the pre...
230-where a password was required.
230-
230-You may access this server using the URL
230-
230- ftp://ftp.arabidopsis.org
230-or
230- ftp://tairpub@ftp.arabidopsis.org/home/tair
230 Login successful.
Remote system type is UNIX.
Using binary mode to transfer files.
}}
First, change to the directory where the protein sequence data is stored...
#geshi(bash){{
ftp> cd /Sequences/blast_datasets/other_datasets/CURRENT
}}
Incidentally, running the ls command switches to Extended Passive Mode and...
Then download the protein data.
#geshi(bash){{
ftp> get At_GB_all_prot.gz
ftp> get gp_GB_all_prot.gz
}}
These files are text files written in FASTA format...
''At'' contains the protein data for Arabidopsis thaliana only, and ''gp''...
When the download finishes, log out of the FTP server.
#geshi(bash){{
ftp> quit
}}
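As a sanity check on the downloads, the number of sequences in each file can be counted from the FASTA header lines. This is a minimal sketch with a hypothetical helper name, count_fasta; it assumes every sequence starts with a line beginning with ">" and that compressed files end in .gz.

```shell
#!/bin/sh
# count_fasta is a hypothetical helper: count sequences in a FASTA file,
# gzip-compressed or not, by counting header lines that start with ">".
count_fasta() {
    case "$1" in
        *.gz) gzip -dc "$1" ;;
        *)    cat "$1" ;;
    esac | grep -c '^>'
}

# Usage, assuming the two files were downloaded to the current directory.
for f in At_GB_all_prot.gz gp_GB_all_prot.gz; do
    if [ -f "$f" ]; then
        echo "$f: $(count_fasta "$f") sequences"
    fi
done
```

hmmsearch reports the number of target sequences in its statistics summary (160839 for At_GB_all_prot.gz in the run shown on this page), so the count from an intact copy of the same file should agree with it.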
**Searching with HMMER [#g91b0787]
The query profile HMM and the target protein data...
Searching with HMMER uses the hmmsearch command; as arguments, the search qu...
#geshi(bash){{
$ hmmsearch PF00452 At_GB_all_prot.gz
}}
This produces output like the following.
#geshi(bash){{
# hmmsearch :: search profile(s) against a sequence datab...
# HMMER 3.0 (March 2010); http://hmmer.org/
# Copyright (C) 2010 Howard Hughes Medical Institute.
# Freely distributed under the GNU General Public License...
# - - - - - - - - - - - - - - - - - - - - - - - - - - - -...
# query HMM file: PF00452
# target sequence database: At_GB_all_prot.gz
# - - - - - - - - - - - - - - - - - - - - - - - - - - - -...
Query: Bcl-2 [M=101]
Accession: PF00452.12
Description: Apoptosis regulator proteins, Bcl-2 family
Scores for complete sequences (score includes all domains):
--- full sequence --- --- best 1 domain --- -#dom-
E-value score bias E-value score bias exp ...
------- ------ ----- ------- ------ ----- ---- -...
[No hits detected that satisfy reporting thresholds]
Domain annotation for each sequence (and alignments):
[No targets detected that satisfy reporting thresholds]
Internal pipeline statistics summary:
-------------------------------------
Query model(s): 1 (101 nodes)
Target sequences: 160839 (62392833 r...
Passed MSV filter: 5198 (0.032318);...
Passed bias filter: 4482 (0.0278664)...
Passed Vit filter: 286 (0.00177818...
Passed Fwd filter: 0 (0); expect...
Initial search space (Z): 160839 [actual num...
Domain search space (domZ): 0 [number of ...
# CPU time: 2.20u 0.06s 00:00:02.26 Elapsed: 00:00:02.00
# Mc/sec: 3135.16
//
}}
In other words, in the Arabidopsis thaliana protein data, Bcl-2 fam...
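The values in parentheses in the statistics summary above appear to be each filter's surviving count divided by the number of target sequences. The sketch below reproduces that arithmetic with the counts from the run above; the helper name fraction and the six-significant-digit format are assumptions chosen to match the printed values.

```shell
#!/bin/sh
# fraction is a hypothetical helper: print count/total with up to
# six significant digits.
fraction() {
    awk -v n="$1" -v total="$2" 'BEGIN { printf "%.6g\n", n / total }'
}

total=160839    # "Target sequences" in the run above
for pair in MSV:5198 bias:4482 Vit:286 Fwd:0; do
    name=${pair%%:*}     # text before the colon
    count=${pair##*:}    # text after the colon
    echo "$name filter: $count / $total = $(fraction "$count" "$total")"
done
```

Nothing passed the Fwd filter, which is why the report lists no hits.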
Incidentally, for the protein data of all plants minus Arabidopsis thaliana...
#geshi(bash){{
$ hmmsearch PF00452 gp_GB_all_prot.gz
# hmmsearch :: search profile(s) against a sequence datab...
# HMMER 3.0 (March 2010); http://hmmer.org/
# Copyright (C) 2010 Howard Hughes Medical Institute.
# Freely distributed under the GNU General Public License...
# - - - - - - - - - - - - - - - - - - - - - - - - - - - -...
# query HMM file: PF00452
# target sequence database: gp_GB_all_prot.gz
# - - - - - - - - - - - - - - - - - - - - - - - - - - - -...
Query: Bcl-2 [M=101]
Accession: PF00452.12
Description: Apoptosis regulator proteins, Bcl-2 family
Scores for complete sequences (score includes all domains):
--- full sequence --- --- best 1 domain --- -#dom-
E-value score bias E-value score bias exp ...
------- ------ ----- ------- ------ ----- ---- -...
------ inclusion threshold ------
3.4 13.9 0.0 8.5 12.7 0.0 1.7 ...
3.5 13.9 0.0 16 11.8 0.0 2.1 ...
3.8 13.8 0.0 19 11.5 0.0 2.2 ...
3.9 13.8 0.0 18 11.6 0.0 2.1 ...
4.2 13.7 0.0 5.2 13.3 0.0 1.0 ...
4.2 13.6 0.1 8.7 12.6 0.1 1.5 ...
4.3 13.6 0.0 20 11.5 0.0 2.1 ...
4.3 13.6 0.0 20 11.5 0.0 2.1 ...
4.8 13.5 0.0 21 11.4 0.0 2.1 ...
4.8 13.5 0.0 21 11.4 0.0 2.1 ...
4.8 13.5 0.0 21 11.4 0.0 2.1 ...
5.3 13.3 0.1 85 9.4 0.0 2.8 ...
6.9 12.9 0.1 16 11.7 0.1 1.6 ...
6.9 12.9 0.1 16 11.7 0.1 1.6 ...
8.1 12.7 0.1 87 9.4 0.0 2.2 ...
8.3 12.7 0.1 20 11.5 0.1 1.6 ...
8.3 12.7 0.1 20 11.5 0.1 1.6 ...
8.4 12.7 0.1 3.1e+02 7.7 0.0 2.5 ...
8.4 12.7 0.1 3.1e+02 7.7 0.0 2.5 ...
Domain annotation for each sequence (and alignments):
>> gi|21616905|gb|AAM66415.1| NBS-LRR protein [Oryza sat...
# score bias c-Evalue i-Evalue hmmfrom hmm to ...
--- ------ ----- --------- --------- ------- ------- ...
1 ? 12.7 0.0 0.00013 8.5 2 80 ....
Alignments for each domain:
== domain 1 score: 12.7 bits; conditional E-value: ...
HHHHHHHHHHHHHHCHHHCTTS--...
Bcl-2 2 rslgdeleqeheelfenlleqlni...
+++++e+ +e +++ n+ +i...
gi|21616905|gb|AAM66415.1| 45 KKVIQEITREGTNV-TNFNTLQEI...
55556666655553.344444455...
>> gi|218200452|gb|EEC82879.1| hypothetical protein OsI_...
# score bias c-Evalue i-Evalue hmmfrom hmm to ...
--- ------ ----- --------- --------- ------- ------- ...
1 ? -2.8 0.0 8.9 5.6e+05 9 29 ....
2 ? 11.8 0.0 0.00025 16 33 88 ....
Alignments for each domain:
== domain 1 score: -2.8 bits; conditional E-value: ...
HHHHHHHCHHHCTTS---SHH CS
Bcl-2 9 eqeheelfenlleqlnietpe 29
++ +e+++en+l++++ e+++
gi|218200452|gb|EEC82879.1| 41 DYHYEKEIENVLRRVHEEEDD 61
567888999999988875554 PP
== domain 2 score: 11.8 bits; conditional E-value: ...
HHHHHHHHHHHHH------.......
Bcl-2 33 elfaevaeelfsdgginWG.......
+f ++++++ +d+ inWG ...
gi|218200452|gb|EEC82879.1| 376 YAFVAMGNDVTTDDAINWGmayp...
46999**************9999...
>> gi|125602143|gb|EAZ41468.1| hypothetical protein OsJ_...
# score bias c-Evalue i-Evalue hmmfrom hmm to ...
--- ------ ----- --------- --------- ------- ------- ...
1 ? -2.9 0.0 9.6 6.1e+05 9 29 ....
2 ? 11.5 0.0 0.0003 19 33 86 ....
Alignments for each domain:
== domain 1 score: -2.9 bits; conditional E-value: ...
HHHHHHHCHHHCTTS---SHH CS
Bcl-2 9 eqeheelfenlleqlnietpe 29
++ +e+++en+l++++ e+++
gi|125602143|gb|EAZ41468.1| 59 DYHYEKEIENVLRRVHEEEDD 79
567888999999988875554 PP
== domain 2 score: 11.5 bits; conditional E-value: ...
HHHHHHHHHHHHH------.......
Bcl-2 33 elfaevaeelfsdgginWG.......
+f ++++++ +d+ inWG ...
gi|125602143|gb|EAZ41468.1| 394 YAFVAMGNDVTTDDAINWGmayp...
46999**************9999...
>> gi|222639889|gb|EEE68021.1| hypothetical protein OsJ_...
# score bias c-Evalue i-Evalue hmmfrom hmm to ...
--- ------ ----- --------- --------- ------- ------- ...
1 ? -2.8 0.0 8.9 5.7e+05 9 29 ....
2 ? 11.6 0.0 0.00028 18 33 86 ....
Alignments for each domain:
== domain 1 score: -2.8 bits; conditional E-value: ...
HHHHHHHCHHHCTTS---SHH CS
Bcl-2 9 eqeheelfenlleqlnietpe 29
++ +e+++en+l++++ e+++
gi|222639889|gb|EEE68021.1| 41 DYHYEKEIENVLRRVHEEEDD 61
567888999999988875554 PP
== domain 2 score: 11.6 bits; conditional E-value: ...
HHHHHHHHHHHHH------.......
Bcl-2 33 elfaevaeelfsdgginWG.......
+f ++++++ +d+ inWG ...
gi|222639889|gb|EEE68021.1| 376 YAFVAMGNDVTTDDAINWGmayp...
46999**************9999...
>> gi|157328291|emb|CAO17778.1| unnamed protein product ...
# score bias c-Evalue i-Evalue hmmfrom hmm to ...
--- ------ ----- --------- --------- ------- ------- ...
1 ? 13.3 0.0 8.2e-05 5.2 48 99 ....
Alignments for each domain:
== domain 1 score: 13.3 bits; conditional E-value: ...
-----HHHHHHH--HHHHHCCCH...
Bcl-2 48 inWGRivallafagalakklveq...
i+WGR+v + + g +++ +++...
gi|157328291|emb|CAO17778.1| 4 ISWGRVVLVQC-TGGNCQETRNS...
89***996555.55568888888...
>> gi|116055559|emb|CAL58227.1| peroxisomal membrane 22 ...
# score bias c-Evalue i-Evalue hmmfrom hmm to ...
--- ------ ----- --------- --------- ------- ------- ...
1 ? 12.6 0.1 0.00014 8.7 27 64 ....
Alignments for each domain:
== domain 1 score: 12.6 bits; conditional E-value: ...
SHHHHHHHHHHHHHHHHHH---...
Bcl-2 27 tpeeaselfaevaeelfsdggi...
++ as++f +++ e s +++...
gi|116055559|emb|CAL58227.1| 91 SKCVASDVFVQIVVEERSANDL...
33448899999999999999**...
>> gi|113622936|dbj|BAF22881.1| Os08g0139700 [Oryza sati...
# score bias c-Evalue i-Evalue hmmfrom hmm to ...
--- ------ ----- --------- --------- ------- ------- ...
1 ? -3.0 0.0 10 6.4e+05 9 29 ....
2 ? 11.5 0.0 0.00031 20 33 86 ....
Alignments for each domain:
== domain 1 score: -3.0 bits; conditional E-value: 10
HHHHHHHCHHHCTTS---SHH CS
Bcl-2 9 eqeheelfenlleqlnietpe 29
++ +e+++en+l++++ e+++
gi|113622936|dbj|BAF22881.1| 77 DYHYEKEIENVLRRVHEEEDD 97
567888999999988875554 PP
== domain 2 score: 11.5 bits; conditional E-value: ...
HHHHHHHHHHHHH------......
Bcl-2 33 elfaevaeelfsdgginWG......
+f ++++++ +d+ inWG ...
gi|113622936|dbj|BAF22881.1| 412 YAFVAMGNDVTTDDAINWGmay...
46999**************999...
>> gi|115474741|ref|NP_001060967.1| Os08g0139700 [Oryza ...
# score bias c-Evalue i-Evalue hmmfrom hmm to ...
--- ------ ----- --------- --------- ------- ------- ...
1 ? -3.0 0.0 10 6.4e+05 9 29 ....
2 ? 11.5 0.0 0.00031 20 33 86 ....
Alignments for each domain:
== domain 1 score: -3.0 bits; conditional E-value: 10
HHHHHHHCHHHCTTS---S...
Bcl-2 9 eqeheelfenlleqlniet...
++ +e+++en+l++++ e+++
gi|115474741|ref|NP_001060967.1| 77 DYHYEKEIENVLRRVHEEE...
5678889999999888755...
== domain 2 score: 11.5 bits; conditional E-value: ...
HHHHHHHHHHHHH-----...
Bcl-2 33 elfaevaeelfsdgginW...
+f ++++++ +d+ inW...
gi|115474741|ref|NP_001060967.1| 412 YAFVAMGNDVTTDDAINW...
46999*************...
>> gi|115607548|gb|ABJ16553.1| (E)-beta-caryophyllene/be...
# score bias c-Evalue i-Evalue hmmfrom hmm to ...
--- ------ ----- --------- --------- ------- ------- ...
1 ? 11.4 0.0 0.00033 21 33 86 ....
Alignments for each domain:
== domain 1 score: 11.4 bits; conditional E-value: ...
HHHHHHHHHHHHH------.......
Bcl-2 33 elfaevaeelfsdgginWG.......
+f ++++++ +d+ inWG ...
gi|115607548|gb|ABJ16553.1| 436 YAFVAMGNDVTTDDAINWGmayp...
46999**************9999...
>> gi|38636781|dbj|BAD03024.1| putative sesquiterpene cy...
# score bias c-Evalue i-Evalue hmmfrom hmm to ...
--- ------ ----- --------- --------- ------- ------- ...
1 ? 11.4 0.0 0.00033 21 33 86 ....
Alignments for each domain:
== domain 1 score: 11.4 bits; conditional E-value: ...
HHHHHHHHHHHHH------.......
Bcl-2 33 elfaevaeelfsdgginWG.......
+f ++++++ +d+ inWG ...
gi|38636781|dbj|BAD03024.1| 436 YAFVAMGNDVTTDDAINWGmayp...
46999**************9999...
>> gi|42761322|dbj|BAD11575.1| putative sesquiterpene cy...
# score bias c-Evalue i-Evalue hmmfrom hmm to ...
--- ------ ----- --------- --------- ------- ------- ...
1 ? 11.4 0.0 0.00033 21 33 86 ....
Alignments for each domain:
== domain 1 score: 11.4 bits; conditional E-value: ...
HHHHHHHHHHHHH------.......
Bcl-2 33 elfaevaeelfsdgginWG.......
+f ++++++ +d+ inWG ...
gi|42761322|dbj|BAD11575.1| 436 YAFVAMGNDVTTDDAINWGmayp...
46999**************9999...
>> gi|145282028|gb|ABP49617.1| ubiquitin-protein ligase-...
# score bias c-Evalue i-Evalue hmmfrom hmm to ...
--- ------ ----- --------- --------- ------- ------- ...
1 ? -1.7 0.0 4 2.6e+05 67 88 ....
2 ? -0.8 0.0 2 1.3e+05 14 45 ....
3 ? 9.4 0.0 0.0013 85 15 83 ....
Alignments for each domain:
== domain 1 score: -1.7 bits; conditional E-value: 4
CCCHTCCHHHHCHHHHHHHHHH CS
Bcl-2 67 lveqgeeelvkrivellseyle 88
++ ++++e+v+++ e++ +y++
gi|145282028|gb|ABP49617.1| 21 CEAEKKDEAVNMLKEMVQRYVA 42
4444446666666666666664 PP
== domain 2 score: -0.8 bits; conditional E-value: 2
HHCHHHCTTS---SHHHHHHH.H...
Bcl-2 14 elfenlleqlnietpeeasel.f...
++++ ++e l + + e+ e+ f...
gi|145282028|gb|ABP49617.1| 125 SAYNPMIEYLCNHGQTEKAEVfF...
57888999888877777555548...
== domain 3 score: 9.4 bits; conditional E-value: 0...
HCHHHCTTS---SHHHHHHHHHH...
Bcl-2 15 lfenlleqlnietpeeaselfae...
+++++le +i p + +lf++...
gi|145282028|gb|ABP49617.1| 215 ALDSMLEGGHI--P--EASLFRS...
56777777676..3..4689***...
>> gi|162687831|gb|EDQ74211.1| predicted protein [Physco...
# score bias c-Evalue i-Evalue hmmfrom hmm to ...
--- ------ ----- --------- --------- ------- ------- ...
1 ? 11.7 0.1 0.00026 16 17 70 ....
Alignments for each domain:
== domain 1 score: 11.7 bits; conditional E-value: ...
HHHCTTS---SHHHHHHHHHHHH...
Bcl-2 17 enlleqlnietpeeaselfaeva...
++l lni++ ++ + f +v+...
gi|162687831|gb|EDQ74211.1| 212 QQLKGLLNITNF-TTDTDFVSVM...
555555666444.4445599***...
>> gi|168016571|ref|XP_001760822.1| predicted protein [P...
# score bias c-Evalue i-Evalue hmmfrom hmm to ...
--- ------ ----- --------- --------- ------- ------- ...
1 ? 11.7 0.1 0.00026 16 17 70 ....
Alignments for each domain:
== domain 1 score: 11.7 bits; conditional E-value: ...
HHHCTTS---SHHHHHHH...
Bcl-2 17 enlleqlnietpeeasel...
++l lni++ ++ + ...
gi|168016571|ref|XP_001760822.1| 212 QQLKGLLNITNF-TTDTD...
555555666444.44455...
>> gi|116782627|gb|ABK22581.1| unknown [Picea sitchensis]
# score bias c-Evalue i-Evalue hmmfrom hmm to ...
--- ------ ----- --------- --------- ------- ------- ...
1 ? 0.7 0.0 0.69 4.4e+04 28 50 ....
2 ? 9.4 0.0 0.0014 87 15 82 ....
Alignments for each domain:
== domain 1 score: 0.7 bits; conditional E-value: 0...
HHHHHHHHHHHHHHHHHH----- CS
Bcl-2 28 peeaselfaevaeelfsdgginW 50
++ +e++ e+ +++ + +g nW
gi|116782627|gb|ABK22581.1| 47 YDQLYEAIIEIFKKINEIPGANW 69
34457788888888888899999 PP
== domain 2 score: 9.4 bits; conditional E-value: 0...
HCHHHCTTS---SHHH.HHHHHH...
Bcl-2 15 lfenlleqlnietpee.aselfa...
+ +++++ ++++++e ++e f+...
gi|116782627|gb|ABK22581.1| 76 SMVKMIQEYDLDQNKEiDREEFH...
56689999999888888999***...
>> gi|162692330|gb|EDQ78687.1| predicted protein [Physco...
# score bias c-Evalue i-Evalue hmmfrom hmm to ...
--- ------ ----- --------- --------- ------- ------- ...
1 ? 11.5 0.1 0.00031 20 18 77 ....
Alignments for each domain:
== domain 1 score: 11.5 bits; conditional E-value: ...
HHCTTS---SHHHHHHHHHHHHH...
Bcl-2 18 nlleqlnietpeeaselfaevae...
+l lni t+ ++ + f +v++...
gi|162692330|gb|EDQ78687.1| 213 QLKGLLNI-THFTTDTDFISVMT...
44455566.4445556699****...
>> gi|168007190|ref|XP_001756291.1| predicted protein [P...
# score bias c-Evalue i-Evalue hmmfrom hmm to ...
--- ------ ----- --------- --------- ------- ------- ...
1 ? 11.5 0.1 0.00031 20 18 77 ....
Alignments for each domain:
== domain 1 score: 11.5 bits; conditional E-value: ...
HHCTTS---SHHHHHHHH...
Bcl-2 18 nlleqlnietpeeaself...
+l lni t+ ++ + f...
gi|168007190|ref|XP_001756291.1| 213 QLKGLLNI-THFTTDTDF...
44455566.444555669...
>> gi|222847830|gb|EEE85377.1| predicted protein [Populu...
# score bias c-Evalue i-Evalue hmmfrom hmm to ...
--- ------ ----- --------- --------- ------- ------- ...
1 ? 7.7 0.0 0.0048 3.1e+02 19 55 ....
2 ? 2.3 0.0 0.23 1.5e+04 47 79 ....
Alignments for each domain:
== domain 1 score: 7.7 bits; conditional E-value: 0...
HCTTS---SHHHHHHHHHHHHHH...
Bcl-2 19 lleqlnietpeeaselfaevaee...
++++ ++ + a+++++ev++e...
gi|222847830|gb|EEE85377.1| 73 CSNHNSV-GETAARDTLEEVMAE...
5555555.445599*********...
== domain 2 score: 2.3 bits; conditional E-value: 0...
------HHHHHHH--HHHHHCCC...
Bcl-2 47 ginWGRivallafagalakklve...
+++W Ri al+ + a+ + +...
gi|222847830|gb|EEE85377.1| 242 DPSWARIAALVPEVVSCAEACDQ...
689*******9988888888887...
>> gi|224061623|ref|XP_002300572.1| predicted protein [P...
# score bias c-Evalue i-Evalue hmmfrom hmm to ...
--- ------ ----- --------- --------- ------- ------- ...
1 ? 7.7 0.0 0.0048 3.1e+02 19 55 ....
2 ? 2.3 0.0 0.23 1.5e+04 47 79 ....
Alignments for each domain:
== domain 1 score: 7.7 bits; conditional E-value: 0...
HCTTS---SHHHHHHHHH...
Bcl-2 19 lleqlnietpeeaselfa...
++++ ++ + a+++++...
gi|224061623|ref|XP_002300572.1| 73 CSNHNSV-GETAARDTLE...
5555555.445599****...
== domain 2 score: 2.3 bits; conditional E-value: 0...
------HHHHHHH--HHH...
Bcl-2 47 ginWGRivallafagala...
+++W Ri al+ + a...
gi|224061623|ref|XP_002300572.1| 242 DPSWARIAALVPEVVSCA...
689*******99888888...
Internal pipeline statistics summary:
-------------------------------------
Query model(s): 1 (101 nodes)
Target sequences: 1210800 (410418238 ...
Passed MSV filter: 33081 (0.0273216)...
Passed bias filter: 28502 (0.0235398)...
Passed Vit filter: 1482 (0.00122398...
Passed Fwd filter: 23 (1.89957e-0...
Initial search space (Z): 1210800 [actual num...
Domain search space (domZ): 19 [number of ...
# CPU time: 14.69u 0.41s 00:00:15.10 Elapsed: 00:00:21.93
# Mc/sec: 1890.21
//
}}
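Every hit in the report above is listed under the "inclusion threshold" line, with full-sequence E-values between 3.4 and 8.4. For filtering a long report, hmmsearch can also write a whitespace-delimited table with its --tblout option. The sketch below filters such a table by E-value. The helper name significant_hits, the file name hits.tbl and the default cutoff of 0.01 are assumptions for illustration, and the column positions are described in the comments.

```shell
#!/bin/sh
# significant_hits is a hypothetical helper: list targets from a tabular
# hmmsearch report whose full-sequence E-value is at or below a cutoff.
# Assumed column layout: $1 = target name, $5 = full-sequence E-value,
# $6 = full-sequence bit score; lines starting with "#" are comments.
significant_hits() {
    # $1 = table file, $2 = optional E-value cutoff (default 0.01)
    awk -v max="${2:-0.01}" '
        /^#/ { next }
        ($5 + 0) <= (max + 0) { print $1, $5, $6 }
    ' "$1"
}

# Usage, assuming the table was written beforehand with something like:
#   hmmsearch --tblout hits.tbl PF00452 gp_GB_all_prot.gz > hits.txt
if [ -f hits.tbl ]; then
    significant_hits hits.tbl        # hits at or below the 0.01 cutoff
    significant_hits hits.tbl 10     # hits at or below E-value 10
fi
```

With the E-values shown above, the 0.01 cutoff would print nothing, so none of these hits count as a confident match.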