petitviolet_blog


Working with sparse matrices in SciPy

python notes study

Matrix computation

import scipy.sparse as sp
import numpy as np

a = sp.lil_matrix((1, 10000))  # creates a 1 x 10000 sparse matrix
b = sp.lil_matrix((1, 10000))
# a.shape => (1, 10000)
for i in range(a.shape[1]):
	r = np.random.rand()
	if r < 0.9:
		r = 0.0
	a[0, i] = r
# each element of a now holds a random value (zero about 90% of the time)
a
# => <1x10000 sparse matrix of type '<type 'numpy.float64'>'
        with 947 stored elements in LInked List format>
# b was filled in the same way
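
The element-by-element loop above gets slow for long vectors. As a minimal sketch of the same idea, the values can be generated in one NumPy call and then wrapped in a sparse matrix. The helper name `random_sparse_row` and its parameters are assumptions made for illustration; they are not part of SciPy.

```python
import numpy as np
import scipy.sparse as sp


def random_sparse_row(n, threshold=0.9, seed=None):
    """Return a 1 x n CSR matrix whose entries below `threshold` are zero.

    Hypothetical helper for illustration, not a SciPy function.
    """
    rng = np.random.default_rng(seed)
    values = rng.random(n)            # uniform samples in [0, 1)
    values[values < threshold] = 0.0  # zero out roughly `threshold` of them
    # Building from a dense 2-D array stores only the non-zero entries.
    return sp.csr_matrix(values.reshape(1, n))


row_a = random_sparse_row(10000, seed=0)
row_b = random_sparse_row(10000, seed=1)
print(row_a.shape, row_a.nnz)
```

This briefly allocates a dense array of length n, which is fine at this size; for very large n a different construction would be needed.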

Conversion

ca = a.tocsr()
ca
# => <1x10000 sparse matrix of type '<type 'numpy.float64'>'
        with 947 stored elements in Compressed Sparse Row format>
# the matrix was converted from lil to csr
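
The other formats have matching conversion methods. A small sketch on a toy 3 x 4 matrix (the values are made up for illustration) that converts one matrix to several formats and checks that the contents are unchanged:

```python
import scipy.sparse as sp

m = sp.lil_matrix((3, 4))  # LIL is convenient for incremental assignment
m[0, 1] = 2.0
m[2, 3] = 5.0

csr = m.tocsr()  # suited to row slicing and matrix products
csc = m.tocsc()  # suited to column slicing
coo = m.tocoo()  # plain (row, col, value) triplets

formats = [x.format for x in (m, csr, csc, coo)]
print(formats)

# The stored values survive every conversion.
same = bool((csr.toarray() == csc.toarray()).all()
            and (csr.toarray() == coo.toarray()).all())
print(same)
```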

SciPy provides the following seven sparse matrix formats.
I took the list from here.

csc_matrix: Compressed Sparse Column format
csr_matrix: Compressed Sparse Row format
bsr_matrix: Block Sparse Row format
lil_matrix: List of Lists format
dok_matrix: Dictionary of Keys format
coo_matrix: COOrdinate format (aka IJV, triplet format)
dia_matrix: DIAgonal format
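
As a rough illustration of why the formats differ, COO is handy when the non-zero entries are already known as (row, column, value) triplets, and the result can then be converted to CSR for arithmetic. The triplets below are made-up sample data.

```python
import numpy as np
import scipy.sparse as sp

rows = np.array([0, 1, 2])
cols = np.array([3, 0, 2])
vals = np.array([1.5, 2.5, 3.5])

# Build a 3 x 4 matrix directly from the triplets.
coo = sp.coo_matrix((vals, (rows, cols)), shape=(3, 4))
dense = coo.toarray()

# Convert once to CSR before doing products or row slicing.
csr = coo.tocsr()
print(dense)
```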

Matrix product

# transpose
ta = a.T
# matrix product
print(a.dot(ta))  # a (1, 1) matrix, but it is still represented as a sparse matrix
# => (0, 0)        853.19504342
a * ta  # this also works
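
Since the product comes back as a 1 x 1 sparse matrix, indexing with `[0, 0]` is one way to pull out the plain number. A small sketch with hand-picked values, assuming CSR inputs; it also shows `multiply`, which is the element-wise product rather than the matrix product:

```python
import numpy as np
import scipy.sparse as sp

u = sp.csr_matrix(np.array([[1.0, 0.0, 2.0, 0.0]]))
w = sp.csr_matrix(np.array([[3.0, 4.0, 0.0, 0.0]]))

inner = u.dot(w.T)               # 1 x 1 sparse matrix
inner_value = inner[0, 0]        # scalar: 1*3 + 0*4 + 2*0 + 0*0
elementwise = u.multiply(w)      # element-wise product, still sparse
squared_norm = u.dot(u.T)[0, 0]  # scalar: 1*1 + 2*2

print(inner_value, squared_norm)
```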

Vector magnitude

import math

v = np.array([[1, 1]])
math.sqrt(np.dot(v, v.T)[0, 0])  # np.dot returns a (1, 1) array, so take the single element
# => 1.4142135623730951
np.linalg.norm(v)
# => 1.4142135623730951

np.linalg.norm(a)
# => raises an error
np.linalg.norm(a.todense())
np.linalg.norm(a.toarray())
# => 29.209502621916037
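
The dense copy can be avoided. As a sketch, `scipy.sparse.linalg.norm` computes the Frobenius norm of a sparse matrix directly, and the same number can be obtained by hand from the element-wise square. The sample vector is made up for illustration, and a CSR input is assumed.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

s = sp.csr_matrix(np.array([[3.0, 0.0, 4.0, 0.0]]))

norm_sparse = spla.norm(s)                  # Frobenius norm by default
norm_manual = np.sqrt(s.multiply(s).sum())  # square root of the sum of squares
norm_dense = np.linalg.norm(s.toarray())    # reference value from a dense copy

print(norm_sparse, norm_manual, norm_dense)
```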

Cosine similarity

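import scipy.spatial.distance as dis
# cosine() expects 1-D dense vectors and returns the cosine distance,
# i.e. 1 - cosine similarity
dis.cosine(a.toarray().ravel(), b.toarray().ravel())
# => 0.91347774109309299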

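I wonder whether I have to write my own code to keep sparse matrices sparse the whole way through.
Is there a ready-made distance function for a pair of 1 x N vectors lying around somewhere?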
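
One possible answer, offered as a sketch rather than a definitive implementation: the cosine similarity of two 1 x N sparse rows needs only a sparse dot product and two sparse norms, so nothing has to be converted to a dense array. The function names below are hypothetical helpers, and CSR inputs are assumed.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla


def sparse_cosine_similarity(x, y):
    """Cosine similarity of two 1 x N sparse rows (hypothetical helper)."""
    nx, ny = spla.norm(x), spla.norm(y)
    if nx == 0.0 or ny == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    # x.dot(y.T) is a 1 x 1 sparse matrix; [0, 0] extracts the scalar.
    return x.dot(y.T)[0, 0] / (nx * ny)


def sparse_cosine_distance(x, y):
    """1 - cosine similarity, the convention used by scipy.spatial.distance.cosine."""
    return 1.0 - sparse_cosine_similarity(x, y)


x = sp.csr_matrix(np.array([[1.0, 0.0, 0.0, 1.0]]))
y = sp.csr_matrix(np.array([[1.0, 0.0, 0.0, 0.0]]))
similarity = sparse_cosine_similarity(x, y)
distance = sparse_cosine_distance(x, y)
print(similarity, distance)
```

For the toy vectors here the similarity is 1 / sqrt(2), since the dot product is 1 and the norms are sqrt(2) and 1.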