First Steps in Data Analysis with Python (Windows Edition)

Last updated: 2018-11-12 (Mon) 15:31:51

Introduction

This article sets up an environment for data analysis on Windows using Anaconda.

The steps in this article were verified with the following versions.

Windows 10
Anaconda 5.3 Python 3.7 version

Installation

Download the Python 3 version of Anaconda and install it.

Anaconda

Verifying the installation

Jupyter Notebook

From the Anaconda3 (64-bit) folder in the Windows Start menu, select Jupyter Notebook to launch it.

When the browser opens, select Python 3 from the New menu.

A new notebook is then created.
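As a first cell in the new notebook, a quick check along the following lines can confirm which Python and library versions the notebook is actually running. This is a minimal sketch, not part of the original procedure: the helper name `module_version` is hypothetical, and it assumes each library exposes a `__version__` attribute (scikit-learn, pandas, and Matplotlib do).

```python
import importlib
import sys


def module_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


# Print the interpreter version, then the version of each library used in this article.
print("Python", sys.version.split()[0])
for name in ("sklearn", "pandas", "matplotlib"):
    print(name, module_version(name) or "not available")
```

If any line reports "not available", the notebook is probably not running on the Anaconda installation.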

scikit-learn

Train an SVM (SVC) on the iris dataset that ships with scikit-learn.

Type the Python program into the input cell, then run it either by clicking the Run button or by pressing Shift+Enter.

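from sklearn import datasets
from sklearn.svm import SVC
# Create a support vector classifier with default parameters
clf = SVC()
# Load the bundled iris dataset (150 samples, 4 features, 3 classes)
iris = datasets.load_iris()
# Train the classifier on the whole dataset
clf.fit(iris.data, iris.target)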
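The cell above only shows the fitted estimator, so it does not tell you whether training actually worked. One way to check, sketched below as an addition to the original steps, is to compute the accuracy on the training data with `score`. Because the same data is used for training and evaluation, the number is optimistic and only serves as a sanity check.

```python
from sklearn import datasets
from sklearn.svm import SVC

# Same setup as above: default SVC trained on the full iris dataset
iris = datasets.load_iris()
clf = SVC()
clf.fit(iris.data, iris.target)

# Fraction of training samples whose predicted class matches the true class
accuracy = clf.score(iris.data, iris.target)
print("training accuracy: {:.3f}".format(accuracy))

# Predicted and true classes for the first five samples
print(clf.predict(iris.data[:5]), iris.target[:5])
```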

pandas

To check that pandas works, load the iris data into pandas.

First, display the iris data as it is.

iris.data

Next, put iris.data into a pandas data frame and display it.

import pandas as pd
df_iris = pd.DataFrame(iris.data, columns=['sepal_length', 'sepal_width', 'petal_length', 'petal_width'])
df_iris
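Displaying the whole data frame truncates the middle rows, so it can help to look at its shape, the first few rows, and summary statistics instead. The following is a small sketch of that, using the same column names as above; it is an illustration added here rather than a step from the original procedure.

```python
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()
df_iris = pd.DataFrame(
    iris.data,
    columns=['sepal_length', 'sepal_width', 'petal_length', 'petal_width'],
)

# (number of rows, number of columns)
print(df_iris.shape)
# First five rows
print(df_iris.head())
# Count, mean, standard deviation, min/max and quartiles for each column
print(df_iris.describe())
```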

Matplotlib

Draw a graph with Matplotlib.

First, add the predicted values to the pandas data frame.

df_iris['predict'] = clf.predict(iris.data)
df_iris

Next, split the rows into three groups according to the predicted value.

x0 = df_iris[df_iris.predict==0]['sepal_length']
y0 = df_iris[df_iris.predict==0]['sepal_width']
x1 = df_iris[df_iris.predict==1]['sepal_length']
y1 = df_iris[df_iris.predict==1]['sepal_width']
x2 = df_iris[df_iris.predict==2]['sepal_length']
y2 = df_iris[df_iris.predict==2]['sepal_width']
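The six assignments above can also be written with `groupby`, which splits the data frame by predicted class in one pass. This is an alternative sketch, not the method used in the rest of this article; the dictionary name `groups` is introduced here for illustration.

```python
import pandas as pd
from sklearn import datasets
from sklearn.svm import SVC

iris = datasets.load_iris()
clf = SVC().fit(iris.data, iris.target)

df_iris = pd.DataFrame(
    iris.data,
    columns=['sepal_length', 'sepal_width', 'petal_length', 'petal_width'],
)
df_iris['predict'] = clf.predict(iris.data)

# Map each predicted class to the sepal columns of the rows in that class
groups = {
    label: group[['sepal_length', 'sepal_width']]
    for label, group in df_iris.groupby('predict')
}

# Number of samples predicted for each class
for label, group in groups.items():
    print(label, len(group))
```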

Display these groups with Matplotlib.

%matplotlib notebook
import matplotlib.pyplot as plt
fig = plt.figure()
subplt = fig.add_subplot(1, 1, 1)
subplt.scatter(x0, y0, c='red')
subplt.scatter(x1, y1, c='green')
subplt.scatter(x2, y2, c='blue')
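The plot above has no axis labels or legend, so it is hard to tell which color is which class. The sketch below adds both and draws the three groups in a loop. It makes a few assumptions that are not in the original: it uses the non-interactive Agg backend so that it also runs outside a notebook, it writes the figure to a hypothetical file name `iris_predict.png`, and it labels each color with the species name from `iris.target_names`.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no notebook is required
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import datasets
from sklearn.svm import SVC

iris = datasets.load_iris()
clf = SVC().fit(iris.data, iris.target)

df_iris = pd.DataFrame(
    iris.data,
    columns=['sepal_length', 'sepal_width', 'petal_length', 'petal_width'],
)
df_iris['predict'] = clf.predict(iris.data)

fig, ax = plt.subplots()
# One scatter call per predicted class, with the same colors as above
for label, color in zip((0, 1, 2), ('red', 'green', 'blue')):
    group = df_iris[df_iris.predict == label]
    ax.scatter(group['sepal_length'], group['sepal_width'],
               c=color, label=str(iris.target_names[label]))
ax.set_xlabel('sepal_length')
ax.set_ylabel('sepal_width')
ax.legend()
fig.savefig('iris_predict.png')  # hypothetical output file name
```

Inside Jupyter Notebook, the `matplotlib.use("Agg")` line and the `savefig` call can be dropped and the `%matplotlib notebook` magic used instead, as in the cell above.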
