Importing fetch_lfw_people¶

Train a classifier with LinearSVC.
Evaluate the model with hold-out validation (train_test_split).

from sklearn.datasets import fetch_lfw_people
from sklearn.model_selection import train_test_split
from sklearn import svm

from matplotlib import pyplot as plt

import numpy as np

Loading the images¶

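data : 1140 images, each stored as a flat vector of 2914 values (62 rows × 47 columns)
images : the same data as image arrays (each row of data reshaped with reshape(62,47))
target : class labels (the correct answers when classifying data)
target_names : the person's name (string) for each value of target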
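The relationship between data and images can be checked with plain NumPy. The following is a minimal sketch that uses a small made-up array in place of lfw.images (the values are an assumption, not real face data) so that it runs without downloading anything.

```python
import numpy as np

# Toy stand-in for lfw.images: 3 "images" of 62 x 47 pixels (assumed values)
images_demo = np.arange(3 * 62 * 47, dtype=np.float32).reshape(3, 62, 47)

# Flatten each image into one row, which is the layout of lfw.data
data_demo = images_demo.reshape(len(images_demo), -1)
print(data_demo.shape)  # (3, 2914)

# Reshaping a row back to (62, 47) recovers the original image
restored = data_demo[0].reshape(62, 47)
print(np.array_equal(restored, images_demo[0]))  # True
```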

lfw = fetch_lfw_people(data_home='./scikit_learn_data/', min_faces_per_person=100, resize=0.5)
print('lfw:',lfw.keys())
print('data:',lfw.data.shape)
print('images:',lfw.images.shape)
print('target:',lfw.target.shape)
print('target_names:',lfw.target_names)

lfw: dict_keys(['data', 'images', 'target', 'target_names', 'DESCR'])
data: (1140, 2914)
images: (1140, 62, 47)
target: (1140,)
target_names: ['Colin Powell' 'Donald Rumsfeld' 'George W Bush' 'Gerhard Schroeder'
 'Tony Blair']
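min_faces_per_person=100 only sets a lower bound, so the five people need not have the same number of images. A quick way to check is to count the labels. The helper name count_per_class below is hypothetical, and the toy arrays stand in for lfw.target and lfw.target_names.

```python
import numpy as np

def count_per_class(target, target_names):
    """Return {name: number of samples} for integer class labels."""
    counts = np.bincount(target, minlength=len(target_names))
    return {str(name): int(n) for name, n in zip(target_names, counts)}

# Toy stand-ins for lfw.target and lfw.target_names (assumed values, not real data)
toy_target = np.array([0, 2, 2, 1, 2, 0])
toy_names = np.array(['A', 'B', 'C'])
print(count_per_class(toy_target, toy_names))  # {'A': 2, 'B': 1, 'C': 3}
```

With the real data the call would be count_per_class(lfw.target, lfw.target_names).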

Splitting the data into training and test sets¶

One quarter of the data is held out for testing.

X = lfw.data
y = lfw.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Number of training and test samples
print('num of train data:', X_train.shape[0],
      '\nnum of test  data:', X_test.shape[0])

num of train data: 855 
num of test  data: 285
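The split above is purely random, so the class proportions can differ between the training and test sets. As one possible alternative, train_test_split accepts a stratify argument that keeps the label proportions roughly equal in both parts. The sketch below uses synthetic arrays of the same length, 1140 (an assumption, not the LFW data), to show that the sizes stay at 855 and 285.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 1140 samples, 5 classes with uneven frequencies (assumed)
rng = np.random.RandomState(0)
y_demo = rng.choice(5, size=1140, p=[0.2, 0.1, 0.5, 0.1, 0.1])
X_demo = rng.rand(1140, 10)

Xtr, Xte, ytr, yte = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=0, stratify=y_demo)

print(Xtr.shape[0], Xte.shape[0])  # 855 285

# Class proportions in each part stay close to those of the full set
train_ratio = np.bincount(ytr, minlength=5) / len(ytr)
test_ratio = np.bincount(yte, minlength=5) / len(yte)
print(train_ratio)
print(test_ratio)
```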

v,h = lfw.images.shape[1:3] # keep the vertical and horizontal image size
n_train = X_train.shape[0]  # keep the number of training samples

Let's take a quick look at the data.¶

for i in range(10):
    subplt = plt.subplot(2,5, i+1)
    subplt.imshow(X_train.reshape(n_train,v,h)[i], cmap='gray')
plt.show()

Let's define a model and train it.¶

linSVC=svm.LinearSVC(verbose=1)

linSVC.fit(X_train, y_train)

[LibLinear]

C:\Anaconda3\envs\AI\lib\site-packages\sklearn\svm\_base.py:947: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
  "the number of iterations.", ConvergenceWarning)

LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
          intercept_scaling=1, loss='squared_hinge', max_iter=1000,
          multi_class='ovr', penalty='l2', random_state=None, tol=0.0001,
          verbose=1)

A "failed to converge" warning came up (-.-), so I tried increasing max_iter.¶

linSVC.max_iter=10000
linSVC.fit(X_train, y_train)

[LibLinear]

C:\Anaconda3\envs\AI\lib\site-packages\sklearn\svm\_base.py:947: ConvergenceWarning: Liblinear failed to converge, increase the number of iterations.
  "the number of iterations.", ConvergenceWarning)

LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
          intercept_scaling=1, loss='squared_hinge', max_iter=10000,
          multi_class='ovr', penalty='l2', random_state=None, tol=0.0001,
          verbose=1)

Still no good. Simply raising the iteration count does not seem to make it converge, so let's set this aside for now.¶
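A frequently suggested remedy for this warning is to standardize the features before fitting, since liblinear tends to converge slowly on features with a large or uneven scale. The following is a sketch under that assumption and has not been verified on the face data. It uses synthetic data from make_classification, multiplied by 255 to mimic a large feature scale (also an assumption).

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in for the face data (assumed), with a deliberately large scale
X_demo, y_demo = make_classification(
    n_samples=400, n_features=50, n_informative=10, n_classes=3, random_state=0)
X_demo = X_demo * 255.0

Xtr, Xte, ytr, yte = train_test_split(X_demo, y_demo, test_size=0.25, random_state=0)

# Standardize each feature to zero mean and unit variance, then fit the linear SVM
scaled_svc = make_pipeline(StandardScaler(), LinearSVC(max_iter=10000))
scaled_svc.fit(Xtr, ytr)
print('test accuracy:', scaled_svc.score(Xte, yte))
```

Putting the scaler inside the pipeline means it is fitted on the training data only and then reused unchanged for the test data.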

Using the current model, I compared the predictions on the test data with the correct labels.¶

They differ here and there...

linSVC.predict(X_test) # predictions from the trained model

array([0, 1, 2, 0, 1, 2, 0, 4, 2, 2, 1, 4, 2, 2, 2, 0, 2, 2, 2, 1, 2, 0,
       2, 1, 0, 2, 2, 4, 2, 2, 0, 4, 0, 2, 0, 2, 2, 3, 2, 4, 4, 2, 0, 2,
       3, 2, 4, 2, 2, 4, 3, 0, 0, 0, 2, 3, 2, 0, 4, 2, 0, 1, 2, 0, 2, 3,
       1, 2, 4, 2, 3, 3, 3, 2, 2, 2, 2, 3, 3, 2, 3, 2, 0, 0, 0, 2, 2, 1,
       1, 4, 4, 3, 4, 0, 2, 2, 2, 2, 4, 2, 2, 2, 1, 3, 0, 4, 2, 2, 2, 2,
       1, 0, 3, 0, 2, 0, 1, 2, 4, 2, 2, 2, 2, 3, 2, 2, 0, 2, 1, 2, 1, 0,
       3, 2, 2, 0, 4, 1, 0, 2, 4, 0, 2, 2, 4, 2, 2, 3, 2, 1, 2, 2, 2, 2,
       0, 0, 3, 2, 2, 0, 2, 2, 2, 2, 2, 3, 1, 0, 0, 2, 2, 2, 0, 2, 4, 2,
       4, 3, 1, 4, 1, 2, 0, 2, 0, 2, 2, 2, 2, 2, 2, 2, 2, 3, 1, 2, 1, 2,
       4, 2, 2, 1, 2, 2, 2, 3, 4, 0, 1, 0, 1, 0, 0, 3, 3, 2, 2, 0, 0, 2,
       4, 2, 4, 1, 0, 2, 1, 0, 2, 2, 2, 2, 0, 2, 3, 0, 2, 0, 0, 2, 0, 1,
       0, 1, 1, 4, 0, 2, 2, 1, 2, 1, 3, 4, 2, 2, 0, 2, 0, 4, 2, 2, 2, 3,
       2, 2, 2, 1, 2, 3, 2, 2, 3, 1, 1, 2, 4, 0, 3, 0, 3, 2, 0, 4, 2],
      dtype=int64)

y_test # correct labels

array([0, 1, 2, 0, 2, 2, 4, 4, 2, 2, 1, 4, 1, 4, 2, 0, 2, 2, 2, 2, 4, 1,
       2, 1, 0, 2, 2, 4, 2, 2, 0, 4, 0, 2, 0, 2, 2, 2, 2, 4, 4, 2, 0, 2,
       3, 1, 4, 2, 2, 4, 3, 0, 0, 0, 2, 2, 2, 2, 4, 2, 0, 2, 2, 0, 2, 3,
       1, 2, 4, 2, 3, 3, 3, 2, 2, 2, 2, 3, 3, 1, 3, 2, 0, 0, 0, 4, 2, 1,
       1, 4, 4, 3, 4, 0, 2, 2, 2, 2, 4, 2, 2, 2, 1, 3, 0, 4, 2, 2, 2, 0,
       1, 0, 4, 0, 2, 0, 1, 2, 4, 2, 4, 2, 2, 3, 2, 2, 0, 2, 1, 2, 1, 0,
       3, 2, 3, 0, 4, 1, 0, 1, 4, 0, 2, 2, 1, 2, 2, 3, 3, 1, 2, 2, 2, 2,
       0, 0, 3, 2, 2, 0, 2, 2, 2, 2, 0, 3, 1, 0, 0, 2, 2, 2, 0, 3, 4, 2,
       4, 3, 3, 4, 1, 2, 0, 2, 0, 2, 2, 2, 2, 2, 2, 1, 2, 3, 1, 2, 1, 4,
       0, 2, 2, 1, 2, 2, 2, 0, 0, 0, 0, 0, 1, 0, 0, 3, 3, 2, 2, 0, 0, 2,
       4, 2, 4, 1, 0, 0, 0, 2, 2, 2, 2, 2, 1, 2, 3, 0, 2, 0, 0, 2, 0, 1,
       1, 1, 0, 4, 0, 2, 2, 1, 2, 1, 3, 4, 2, 2, 0, 2, 0, 4, 2, 2, 2, 3,
       0, 2, 2, 1, 2, 3, 2, 2, 3, 1, 1, 2, 4, 0, 3, 0, 3, 2, 0, 4, 2],
      dtype=int64)
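Comparing two long arrays by eye is error-prone. sklearn.metrics offers confusion_matrix and classification_report, which summarize where the predictions and the correct labels disagree. The small arrays below are illustrative only. In the notebook, y_test and linSVC.predict(X_test) would take their place.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, classification_report

# Toy labels (assumed values, not the notebook's results)
y_true = np.array([0, 1, 2, 2, 1, 0])
y_pred = np.array([0, 2, 2, 2, 1, 0])

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)
print(cm)

# Per-class precision, recall and F1
print(classification_report(y_true, y_pred))

# Indices where the prediction differs from the correct label
mismatches = np.flatnonzero(y_true != y_pred)
print(mismatches)  # [1]
```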

linSVC.score(X_train,y_train)

1.0

linSVC.score(X_test,y_test)

0.8701754385964913
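A training score of 1.0 against a test score of about 0.87 suggests that the model fits the training data more closely than it generalizes. One possible follow-up, offered as a sketch and not as a tested fix for this dataset, is to compare several values of the regularization parameter C with cross-validation. The data is synthetic and the grid values are arbitrary assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

# Synthetic stand-in for the training data (assumed)
X_demo, y_demo = make_classification(
    n_samples=300, n_features=40, n_informative=8, n_classes=3, random_state=0)

# Smaller C means stronger regularization
param_grid = {'C': [0.001, 0.01, 0.1, 1.0]}
search = GridSearchCV(LinearSVC(max_iter=10000), param_grid, cv=5)
search.fit(X_demo, y_demo)

print('best C:', search.best_params_['C'])
print('mean cross-validated accuracy:', search.best_score_)
```

In the notebook, the search would be fitted on X_train and y_train, keeping X_test for the final check.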

It does seem to classify to some extent, but...¶
