Bio Data Mining / Using Random Forest in R

This article is still a work in progress.

Introduction

Random Forest is an ensemble learning method based on sampling, like Bagging. The biggest difference from Bagging is that it samples attributes (features) as well as cases (training examples).
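The two kinds of sampling can be sketched in base R, without any package. This is only an illustration of the idea, not the internals of a particular implementation: the variable names (`boot_rows`, `oob_rows`, `split_vars`) and the seed are made up for the sketch, and the choice of the square root of the number of attributes is assumed here because it is the usual default for classification.

```r
data(iris)
set.seed(1)  # arbitrary seed, only to make the sketch repeatable

predictors <- setdiff(names(iris), "Species")
n <- nrow(iris)

# Sampling of cases (shared with Bagging): draw n rows with replacement.
boot_rows <- sample(n, size = n, replace = TRUE)

# Rows that were never drawn are "out-of-bag" for this tree.
oob_rows <- setdiff(seq_len(n), boot_rows)

# Sampling of attributes (specific to Random Forest): at a split,
# only a random subset of the predictors is considered.
mtry <- floor(sqrt(length(predictors)))
split_vars <- sample(predictors, size = mtry)

print(split_vars)
```

In a real forest the row sample is drawn once per tree, while the attribute subset is redrawn at every split of every tree.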

R has a package called randomForest, which is what this article uses.

The contents of this article were checked with R 2.12.1 and randomForest 4.6-2.

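Installation

After starting R, specify where packages are downloaded from and install the package.

> options(repos = c(CRAN = "http://cran.r-project.org"))
> install.packages("randomForest")

If you skip the options() call and no repository is configured, R asks which server to download the package from during installation; select a mirror site in Japan.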
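To confirm that the installation worked, a small check such as the following may help. It is a sketch that uses only base R functions; the variable name `has_rf` is made up for the example.

```r
# TRUE if the randomForest package can be loaded, FALSE otherwise.
has_rf <- requireNamespace("randomForest", quietly = TRUE)

if (has_rf) {
  # Report the installed version (this article was checked with 4.6-2).
  print(packageVersion("randomForest"))
} else {
  message("randomForest is not installed; run install.packages(\"randomForest\")")
}
```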


Running

Here we run Random Forest on the iris dataset that ships with R. Typing iris at the R prompt displays the dataset.

> iris
    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
1            5.1         3.5          1.4         0.2     setosa
2            4.9         3.0          1.4         0.2     setosa
3            4.7         3.2          1.3         0.2     setosa
4            4.6         3.1          1.5         0.2     setosa
5            5.0         3.6          1.4         0.2     setosa
...
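Printing the whole data frame scrolls through all 150 rows, so a more compact look at the dataset can be useful before modelling. The following sketch uses only base R; the variable name `species_counts` is made up for the example.

```r
data(iris)

# Structure: 150 observations of 5 variables
# (4 numeric measurements plus the factor Species).
str(iris)

# Number of cases per class; Species is the attribute to be predicted.
species_counts <- table(iris$Species)
print(species_counts)
```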

The steps for running Random Forest are as follows.

Load the randomForest library
Load the data
Set the random number seed
Run Random Forest
Display the result

> library(randomForest)
> data(iris)
> set.seed(17)
> iris.rf <- randomForest(formula = Species ~ ., data = iris)
> print(iris.rf)

Call:
 randomForest(formula = Species ~ ., data = iris) 
               Type of random forest: classification
                     Number of trees: 500
No. of variables tried at each split: 2

        OOB estimate of  error rate: 4.67%
Confusion matrix:
           setosa versicolor virginica class.error
setosa         50          0         0        0.00
versicolor      0         47         3        0.06
virginica       0          4        46        0.08
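The numbers in this output can be recomputed from the fitted object, which may make them easier to read. In the confusion matrix shown above, 3 + 4 = 7 of the 150 cases are misclassified, and 7 / 150 is about 4.67%. The sketch below repeats the fit and derives the same quantities from the `confusion`, `err.rate`, `ntree` and `mtry` components of the result; the names `conf`, `oob_error` and `class_error` are made up for the example, and the exact counts may differ with other versions of R or randomForest.

```r
library(randomForest)
data(iris)
set.seed(17)
iris.rf <- randomForest(Species ~ ., data = iris)

# Confusion matrix of out-of-bag predictions, without the class.error column.
conf <- iris.rf$confusion[, levels(iris$Species)]

# OOB error rate: share of cases that lie off the diagonal.
oob_error <- 1 - sum(diag(conf)) / sum(conf)
print(oob_error)

# The same value is stored in the last row of err.rate.
print(iris.rf$err.rate[iris.rf$ntree, "OOB"])

# Per-class error, as in the class.error column.
class_error <- 1 - diag(conf) / rowSums(conf)
print(class_error)

# Defaults used by the call: 500 trees, floor(sqrt(4)) = 2 variables per split.
print(c(ntree = iris.rf$ntree, mtry = iris.rf$mtry))
```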

I would like to add an explanation of parameter settings and related topics, but that will have to wait for another time.
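Until then, here is a minimal sketch of three arguments of randomForest() that are often adjusted. The values chosen (1000 trees, 3 variables per split) are arbitrary and only for illustration, not recommendations, and the names `iris.rf2` and `imp` are made up for the example.

```r
library(randomForest)
data(iris)
set.seed(17)

iris.rf2 <- randomForest(Species ~ ., data = iris,
                         ntree = 1000,       # number of trees (default: 500)
                         mtry = 3,           # variables tried at each split (default here: 2)
                         importance = TRUE)  # also compute permutation-based importance
print(iris.rf2)

# Variable importance: one row per predictor.
imp <- importance(iris.rf2)
print(imp)
```

Setting mtry to the total number of predictors (4 for iris) removes the attribute sampling, which makes the procedure correspond to Bagging.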


References

"A Thorough Introduction to Random Forest in R: Classification and Prediction with Ensemble Learning", a lecture given at Tokyo.R #11 | hamadakoichi blog
Leo Breiman, "Random Forests", Machine Learning, 45:5-32, 2001.
