The where function converts an index mask into index positions:

indices = where(mask)
indices
=> (array([11, 12, 13, 14]),)
x[indices] # this indexing is equivalent to the fancy indexing x[mask]
=> array([ 5.5, 6. , 6.5, 7. ])
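The snippet above relies on x and mask, which are defined outside this section. As a self-contained sketch, assume x = arange(0, 10, 0.5) and a mask selecting 5 < x < 7.5. Both are assumptions, chosen because they reproduce the indices shown above.

```python
import numpy as np

# Assumed inputs (not shown in this section): a grid of values and a boolean mask
x = np.arange(0, 10, 0.5)
mask = (5 < x) & (x < 7.5)

# where() on a boolean mask returns a tuple with one index array per dimension
indices = np.where(mask)
positions = indices[0]

# Indexing with the index tuple and with the mask select the same elements
selected_by_position = x[indices]
selected_by_mask = x[mask]
```

The tuple form is what lets x[indices] work directly as an index, also for multidimensional arrays.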
The diag function extracts the diagonal, or an off-diagonal, of an array:

diag(A)
=> array([ 0, 11, 22, 33, 44])
diag(A, -1)
=> array([10, 21, 32, 43])
The take function is similar to fancy indexing:

v2 = arange(-3,3)
v2
=> array([-3, -2, -1, 0, 1, 2])
row_indices = [1, 3, 5]
v2[row_indices] # fancy indexing
=> array([-2, 0, 2])
v2.take(row_indices)
=> array([-2, 0, 2])

But take also works on lists and other objects:

take([-3, -2, -1, 0, 1, 2], row_indices)
=> array([-2, 0, 2])
The choose function builds a new array by picking elements from several arrays:

which = [1, 0, 1, 0]
choices = [[-2,-2,-2,-2], [5,5,5,5]]
choose(which, choices)
=> array([ 5, -2, 5, -2])
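To make the selection rule explicit: element i of the result is taken from choices[which[i]] at position i. The sketch below checks this against a plain Python loop; the names picked and manual are illustrative only.

```python
import numpy as np

which = [1, 0, 1, 0]
choices = np.array([[-2, -2, -2, -2],
                    [5, 5, 5, 5]])

# choose() picks, for each position i, the i-th element of choices[which[i]]
picked = np.choose(which, choices)

# The same rule written as an explicit loop
manual = np.array([choices[w][i] for i, w in enumerate(which)])
```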
Vectorizing code is the key to writing efficient numerical calculations with Python/NumPy. It means expressing as much of a program as possible in terms of matrix and vector operations, such as matrix multiplication.

We can use the usual arithmetic operators (addition, subtraction, multiplication, division) to combine arrays with scalars.

v1 = arange(0, 5)
v1 * 2
=> array([0, 2, 4, 6, 8])
v1 + 2
=> array([2, 3, 4, 5, 6])
A * 2, A + 2
=> (array([[ 0,  2,  4,  6,  8],
           [20, 22, 24, 26, 28],
           [40, 42, 44, 46, 48],
           [60, 62, 64, 66, 68],
           [80, 82, 84, 86, 88]]),
    array([[ 2,  3,  4,  5,  6],
           [12, 13, 14, 15, 16],
           [22, 23, 24, 25, 26],
           [32, 33, 34, 35, 36],
           [42, 43, 44, 45, 46]]))
When we add, subtract, multiply, or divide arrays with each other, the default behavior is element-wise:

A * A # element-wise multiplication
=> array([[   0,    1,    4,    9,   16],
          [ 100,  121,  144,  169,  196],
          [ 400,  441,  484,  529,  576],
          [ 900,  961, 1024, 1089, 1156],
          [1600, 1681, 1764, 1849, 1936]])
v1 * v1
=> array([ 0, 1, 4, 9, 16])
A.shape, v1.shape
=> ((5, 5), (5,))
A * v1
=> array([[  0,   1,   4,   9,  16],
          [  0,  11,  24,  39,  56],
          [  0,  21,  44,  69,  96],
          [  0,  31,  64,  99, 136],
          [  0,  41,  84, 129, 176]])
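The last result, A * v1, works because NumPy broadcasts the shape (5,) vector across the rows of the (5, 5) array, so column j is multiplied by v1[j]. The sketch below contrasts this with scaling each row instead. The construction of A is an assumption, since A is defined outside this section, chosen to match the values printed above.

```python
import numpy as np

# Assumed definition of A (matches the values shown in the text)
A = np.array([[n + m * 10 for n in range(5)] for m in range(5)])
v1 = np.arange(0, 5)

# Shape (5,) is treated as (1, 5): element [i, j] becomes A[i, j] * v1[j]
column_scaled = A * v1

# Shape (5, 1) broadcasts along columns: element [i, j] becomes A[i, j] * v1[i]
row_scaled = A * v1[:, np.newaxis]
```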
What about matrix multiplication? There are two ways.

1. Use the dot function, which performs matrix-matrix, matrix-vector, or inner (scalar) product multiplication depending on its arguments:

dot(A, A)
=> array([[ 300,  310,  320,  330,  340],
          [1300, 1360, 1420, 1480, 1540],
          [2300, 2410, 2520, 2630, 2740],
          [3300, 3460, 3620, 3780, 3940],
          [4300, 4510, 4720, 4930, 5140]])
dot(A, v1)
=> array([ 30, 130, 230, 330, 430])
dot(v1, v1)
=> 30
2. Cast the array objects to the matrix type, for which the arithmetic operators follow standard matrix algebra:

M = matrix(A)
v = matrix(v1).T # make it a column vector
v
=> matrix([[0],
           [1],
           [2],
           [3],
           [4]])
M * M
=> matrix([[ 300,  310,  320,  330,  340],
           [1300, 1360, 1420, 1480, 1540],
           [2300, 2410, 2520, 2630, 2740],
           [3300, 3460, 3620, 3780, 3940],
           [4300, 4510, 4720, 4930, 5140]])
M * v
=> matrix([[ 30],
           [130],
           [230],
           [330],
           [430]])
# inner product
v.T * v
=> matrix([[30]])
# with matrix objects, standard matrix algebra applies
v + M*v
=> matrix([[ 30],
           [131],
           [232],
           [333],
           [434]])
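The NumPy documentation no longer recommends the matrix class, and on Python 3.5 or later the @ operator gives the same matrix products with ordinary arrays. The following sketch reproduces the results above under that approach. The definition of A is again an assumption that matches the printed values.

```python
import numpy as np

# Assumed definition of A (matches the values shown in the text)
A = np.array([[n + m * 10 for n in range(5)] for m in range(5)])

# A column vector as a plain 2-D array of shape (5, 1)
v = np.arange(5).reshape(-1, 1)

MM = A @ A          # matrix-matrix product
Mv = A @ v          # matrix-vector product, shape (5, 1)
inner = v.T @ v     # inner product, shape (1, 1)
combined = v + A @ v
```

Element-wise multiplication remains available as A * A, so both kinds of product can be used on the same objects.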
Adding, subtracting, or multiplying objects with incompatible shapes raises an error:

v = matrix([1,2,3,4,5,6]).T
shape(M), shape(v)
=> ((5, 5), (6, 1))
M * v
=> Traceback (most recent call last):
     File "<ipython-input-9-995fb48ad0cc>", line 1, in <module>
       M * v
     File "/Applications/Spyder-Py2.app/Contents/Resources/lib/python2.7/numpy/matrixlib/defmatrix.py", line 341, in __mul__
       return N.dot(self, asmatrix(other))
   ValueError: shapes (5,5) and (6,1) not aligned: 5 (dim 1) != 6 (dim 0)
See also the related functions inner, outer, cross, kron, and tensordot. Their documentation is available with, for example, help(kron).
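As a quick orientation, here is a minimal sketch of what these functions return for small inputs. The vectors a and b and the 2x2 block are arbitrary illustrative values, not taken from the text.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

inner_ab = np.inner(a, b)    # sum of element-wise products
outer_ab = np.outer(a, b)    # 3x3 array with entries a[i] * b[j]
cross_ab = np.cross(a, b)    # vector cross product in three dimensions

# Kronecker product: every entry of the left array scales a copy of the right one
kron_block = np.kron(np.eye(2, dtype=int), np.array([[1, 2], [3, 4]]))

# tensordot with axes=0 is the tensor (outer) product
tensor_ab = np.tensordot(a, b, axes=0)
```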
Above we used .T to transpose the matrix object v. The transpose function does the same thing.

Let's look at some other functions that transform matrix objects:

C = matrix([[1j, 2j], [3j, 4j]])
C
=> matrix([[ 0.+1.j, 0.+2.j],
           [ 0.+3.j, 0.+4.j]])
Complex conjugate:

conjugate(C)
=> matrix([[ 0.-1.j, 0.-2.j],
           [ 0.-3.j, 0.-4.j]])

Hermitian conjugate (conjugate transpose):

C.H
=> matrix([[ 0.-1.j, 0.-3.j],
           [ 0.-2.j, 0.-4.j]])
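The .H attribute exists only on matrix objects. With a plain array, the conjugate transpose can be written by chaining conj() and .T, as in this small sketch using the same values as C above.

```python
import numpy as np

# Same values as C in the text, stored as an ordinary 2-D array
C = np.array([[1j, 2j], [3j, 4j]])

# Conjugate every element, then transpose
C_H = C.conj().T

expected = np.array([[-1j, -3j], [-2j, -4j]])
```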
real and imag extract the real and imaginary parts of complex-valued arrays:

real(C) # same as: C.real
=> matrix([[ 0., 0.],
           [ 0., 0.]])
imag(C) # same as: C.imag
=> matrix([[ 1., 2.],
           [ 3., 4.]])
angle and abs return the complex argument and the absolute value:

angle(C+1) # heads up MATLAB Users, angle is used instead of arg
=> array([[ 0.78539816, 1.10714872],
          [ 1.24904577, 1.32581766]])
abs(C)
=> matrix([[ 1., 2.],
           [ 3., 4.]])
Inverse:

from scipy.linalg import *
inv(C) # equivalent to C.I
=> matrix([[ 0.+2.j , 0.-1.j ],
           [ 0.-1.5j, 0.+0.5j]])
C.I * C
=> matrix([[ 1.00000000e+00+0.j, 4.44089210e-16+0.j],
           [ 0.00000000e+00+0.j, 1.00000000e+00+0.j]])

Determinant:

linalg.det(C)
=> (2.0000000000000004+0j)
linalg.det(C.I)
=> (0.50000000000000011+0j)
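The same quantities can be computed on ordinary arrays with numpy.linalg. This is a sketch rather than a replacement for the session above, and it compares results with a tolerance because of floating-point round-off such as the 4.44e-16 entry shown.

```python
import numpy as np

C = np.array([[1j, 2j], [3j, 4j]])

C_inv = np.linalg.inv(C)

# The product with the inverse is the identity only up to round-off
is_identity = np.allclose(C_inv @ C, np.eye(2))

det_C = np.linalg.det(C)
det_C_inv = np.linalg.det(C_inv)
```

By hand, det(C) = (1j)(4j) - (2j)(3j) = -4 + 6 = 2, so the determinant of the inverse is 1/2.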
Storing a data set in a NumPy array makes it convenient to compute statistics. To get a feel for this, let's use NumPy to process the Stockholm temperature data.

# reminder, the temperature dataset is stored in the data variable:
shape(data)
=> (77431, 7)

# the temperature data is in column 3
mean(data[:,3])
=> 6.1971096847515925
The daily mean temperature in Stockholm over the last 200 years has been about 6.2 C.

std(data[:,3]), var(data[:,3])
=> (8.2822716213405663, 68.596023209663286)

# lowest daily average temperature
data[:,3].min()
=> -25.800000000000001
# highest daily average temperature
data[:,3].max()
=> 28.300000000000001
d = arange(0, 10)
d
=> array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
# sum up all elements
sum(d)
=> 45
# product of all elements
prod(d+1)
=> 3628800
# cumulative sum
cumsum(d)
=> array([ 0, 1, 3, 6, 10, 15, 21, 28, 36, 45])
# cumulative product
cumprod(d+1)
=> array([ 1, 2, 6, 24, 120, 720, 5040, 40320, 362880, 3628800])
# same as: diag(A).sum()
trace(A)
=> 110
We can compute with subsets of the data in an array by using indexing, fancy indexing, and the other methods of extracting data from an array described above.

As an example, let's return to the temperature data set:

!head -n 3 stockholm_td_adj.dat
1800 1 1 -6.1 -6.1 -6.1 1
1800 1 2 -15.4 -15.4 -15.4 1
1800 1 3 -15.0 -15.0 -15.0 1

The columns of the data set are: year, month, day, daily average temperature, low, high, location.
If we are interested in the average temperature of one particular month only, say February, we can create an index mask and use it to select only the data for that month:

unique(data[:,1]) # the month column takes values from 1 to 12
=> array([ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11., 12.])
mask_feb = data[:,1] == 2
# the temperature data is in column 3
mean(data[mask_feb,3])
=> -3.2121095707366085
With these tools we have very powerful data-processing capabilities at our disposal. For example, computing the average temperature for each month of the year takes only a few lines of code:

months = arange(1,13)
monthly_mean = [mean(data[data[:,1] == month, 3]) for month in months]
fig, ax = subplots()
ax.bar(months, monthly_mean)
ax.set_xlabel("Month")
ax.set_ylabel("Monthly avg. temp.");
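Because the data file is not included here, the following sketch applies the same mask-per-month pattern to a tiny hand-made array. Only the first two rows echo the file excerpt above; the remaining rows are hypothetical values added for illustration, and only the first four columns are kept.

```python
import numpy as np

# Columns: year, month, day, daily average temperature
# Rows 3 to 5 are hypothetical illustration values, not real measurements
data = np.array([
    [1800, 1, 1, -6.1],
    [1800, 1, 2, -15.4],
    [1800, 2, 1, -3.0],
    [1800, 2, 2, -5.0],
    [1800, 3, 1, 1.0],
])

# Months that actually occur in the sample
months = np.unique(data[:, 1]).astype(int)

# One boolean mask per month selects the matching rows; column 3 holds the temperature
monthly_mean = [data[data[:, 1] == month, 3].mean() for month in months]
```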
When functions such as min and max are applied to multidimensional arrays, we sometimes want the calculation to cover the entire array and sometimes only each row or column. The axis argument specifies how these functions should behave:

m = rand(3,3)
m
=> array([[ 0.09260423, 0.73349712, 0.43306604],
          [ 0.65890098, 0.4972126 , 0.83049668],
          [ 0.80428551, 0.0817173 , 0.57833117]])
# global max
m.max()
=> 0.83049668273782951
# max in each column
m.max(axis=0)
=> array([ 0.80428551, 0.73349712, 0.83049668])
# max in each row
m.max(axis=1)
=> array([ 0.73349712, 0.83049668, 0.80428551])
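One way to remember the convention: the axis given is the one that is collapsed. The sketch below uses a small fixed array (arbitrary values) so the results can be checked by eye.

```python
import numpy as np

m = np.array([[1, 5, 3],
              [7, 2, 9]])

global_max = m.max()          # over all elements
column_max = m.max(axis=0)    # collapses the rows: one value per column
row_max = m.max(axis=1)       # collapses the columns: one value per row

# Other reductions such as sum follow the same rule
column_sum = m.sum(axis=0)
row_sum = m.sum(axis=1)
```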
The shape of a NumPy array can be modified without copying the underlying data, which makes reshape a fast operation even for large arrays.

A
=> array([[ 0,  1,  2,  3,  4],
          [10, 11, 12, 13, 14],
          [20, 21, 22, 23, 24],
          [30, 31, 32, 33, 34],
          [40, 41, 42, 43, 44]])
n, m = A.shape
B = A.reshape((1,n*m))
B
=> array([[ 0, 1, 2, 3, 4, 10, 11, 12, 13, 14, 20, 21, 22, 23, 24, 30, 31, 32, 33, 34, 40, 41, 42, 43, 44]])
B[0,0:5] = 5 # modify the array
B
=> array([[ 5, 5, 5, 5, 5, 10, 11, 12, 13, 14, 20, 21, 22, 23, 24, 30, 31, 32, 33, 34, 40, 41, 42, 43, 44]])
A # and the original variable is also changed. B is only a different view of the same data
=> array([[ 5,  5,  5,  5,  5],
          [10, 11, 12, 13, 14],
          [20, 21, 22, 23, 24],
          [30, 31, 32, 33, 34],
          [40, 41, 42, 43, 44]])
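A caveat: reshape returns a view only when the requested shape can be expressed over the existing memory layout, and otherwise it silently copies. The sketch below uses shares_memory to tell the two cases apart; the small arrays are illustrative.

```python
import numpy as np

A = np.arange(6).reshape(2, 3)

# Reshaping a contiguous array gives a view onto the same data
B = A.reshape(1, 6)
view_shares = np.shares_memory(A, B)

# The transpose is not contiguous, so flattening it forces a copy
T = A.T
R = T.reshape(6)
copy_shares = np.shares_memory(T, R)
```

Checking with shares_memory is a cheap way to confirm whether a later in-place modification will affect the original array.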
We can also use the flatten function to turn a higher-dimensional array into a vector, but this function creates a copy of the data.

B = A.flatten()
B
=> array([ 5, 5, 5, 5, 5, 10, 11, 12, 13, 14, 20, 21, 22, 23, 24, 30, 31, 32, 33, 34, 40, 41, 42, 43, 44])
B[0:5] = 10
B
=> array([10, 10, 10, 10, 10, 10, 11, 12, 13, 14, 20, 21, 22, 23, 24, 30, 31, 32, 33, 34, 40, 41, 42, 43, 44])
A # now A has not changed, because B's data is a copy of A's, not referring to the same data
=> array([[ 5,  5,  5,  5,  5],
          [10, 11, 12, 13, 14],
          [20, 21, 22, 23, 24],
          [30, 31, 32, 33, 34],
          [40, 41, 42, 43, 44]])
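For comparison, ravel is not mentioned in the text but is the usual counterpart to flatten: it returns a view whenever possible. A minimal sketch of the difference, with illustrative values:

```python
import numpy as np

A = np.arange(6).reshape(2, 3)

flat_copy = A.flatten()   # always a copy
flat_view = A.ravel()     # a view when the layout allows it, as it does here

flat_copy[0] = 100        # leaves A untouched
flat_view[1] = 200        # writes through to A
```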
With newaxis we can insert a new dimension into an array, for example to convert a vector into a column or row matrix:

v = array([1,2,3])
shape(v)
=> (3,)
# make a column matrix of the vector v
v[:, newaxis]
=> array([[1],
          [2],
          [3]])
# column matrix
v[:,newaxis].shape
=> (3, 1)
# row matrix
v[newaxis,:].shape
=> (1, 3)
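A common use of these shapes is to combine a column and a row through broadcasting, for instance to form an outer product without an explicit loop. A short sketch:

```python
import numpy as np

v = np.array([1, 2, 3])

column = v[:, np.newaxis]   # shape (3, 1)
row = v[np.newaxis, :]      # shape (1, 3)

# (3, 1) * (1, 3) broadcasts to (3, 3): entry [i, j] is v[i] * v[j]
outer = column * row
```

newaxis is simply an alias for None, so v[:, None] is an equivalent spelling.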
The functions repeat, tile, vstack, hstack, and concatenate build larger vectors and matrices out of smaller ones.

tile and repeat

a = array([[1, 2], [3, 4]])
# repeat each element 3 times
repeat(a, 3)
=> array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4])
# tile the matrix 3 times
tile(a, 3)
=> array([[1, 2, 1, 2, 1, 2],
          [3, 4, 3, 4, 3, 4]])
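Both functions accept extra arguments that control the direction of the repetition. The sketch below shows repeat with an axis argument, which keeps the array two-dimensional, and tile with a tuple of repetitions per dimension.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

# Repeat each row twice instead of flattening first
rows_repeated = np.repeat(a, 2, axis=0)

# Tile twice along the rows and once along the columns
stacked_tiles = np.tile(a, (2, 1))
```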
concatenate

b = array([[5, 6]])
concatenate((a, b), axis=0)
=> array([[1, 2],
          [3, 4],
          [5, 6]])
concatenate((a, b.T), axis=1)
=> array([[1, 2, 5],
          [3, 4, 6]])
hstack and vstack

vstack((a,b))
=> array([[1, 2],
          [3, 4],
          [5, 6]])
hstack((a,b.T))
=> array([[1, 2, 5],
          [3, 4, 6]])
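To achieve high performance, assignment in Python usually does not copy the underlying object. It only binds another name to the same data, which is loosely called a shallow copy here.

A = array([[1, 2], [3, 4]])
A
=> array([[1, 2],
          [3, 4]])
# now B is referring to the same array data as A
B = A
# changing B affects A
B[0,0] = 10
B
=> array([[10, 2],
          [ 3, 4]])
A
=> array([[10, 2],
          [ 3, 4]])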
If we want to avoid changing the original array in this way, we need to make a deep copy with the copy function:

B = copy(A)
# now, if we modify B, A is not affected
B[0,0] = -5
B
=> array([[-5, 2],
          [ 3, 4]])
A
=> array([[10, 2],
          [ 3, 4]])
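The same distinction applies to slices, which are views onto the original data rather than copies. The sketch below puts the three cases side by side; the variable names are illustrative.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])

alias = A             # another name for the same array object
first_row = A[0]      # a view onto the first row of A
independent = A.copy()  # a separate array with its own data

# A single write through the original is visible via the alias and the view only
A[0, 0] = 10
```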
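In general we want to avoid iterating over the elements of an array whenever we can, because iteration is much slower than vectorized operations.

Sometimes, however, iteration is unavoidable. In those cases the Python for loop is the most convenient way to iterate over an array:

v = array([1,2,3,4])
for element in v:
    print(element)
=> 1
   2
   3
   4

M = array([[1,2], [3,4]])
for row in M:
    print("row", row)
    for element in row:
        print(element)
=> row [1 2]
   1
   2
   row [3 4]
   3
   4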
When we need to iterate over an array and modify its elements, enumerate gives us both each element and its index in the loop:

for row_idx, row in enumerate(M):
    print("row_idx", row_idx, "row", row)
    for col_idx, element in enumerate(row):
        print("col_idx", col_idx, "element", element)
        # update the matrix M: square each element
        M[row_idx, col_idx] = element ** 2
=> row_idx 0 row [1 2]
   col_idx 0 element 1
   col_idx 1 element 2
   row_idx 1 row [3 4]
   col_idx 0 element 3
   col_idx 1 element 4

# each element in M is now squared
M
=> array([[ 1,  4],
          [ 9, 16]])
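As an alternative to nested enumerate loops, NumPy provides ndenumerate, which yields an index tuple and a value for every element. The sketch below performs the same in-place squaring and compares it with the vectorized expression M ** 2, which is the form to prefer when no per-element logic is needed.

```python
import numpy as np

M = np.array([[1, 2], [3, 4]])
vectorized = M ** 2   # computed up front from the original values

# ndenumerate yields ((row_idx, col_idx), element) pairs
for index, element in np.ndenumerate(M):
    M[index] = element ** 2
```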
As mentioned before, to get good performance we should avoid looping over the elements of our vectors and matrices wherever possible and use vectorized algorithms instead. The first step in converting a scalar algorithm into a vectorized one is to make sure the functions involved accept array input:

def Theta(x):
    """
    Scalar implementation of the Heaviside step function.
    """
    if x >= 0:
        return 1
    else:
        return 0

Theta(array([-3,-2,-1,0,1,2,3]))
=> Traceback (most recent call last):
     File "<ipython-input-11-1f7d89baf696>", line 1, in <module>
       Theta(array([-3, -2, -1, 0, 1, 2, 3]))
     File "<ipython-input-10-fbb0379ab8cb>", line 2, in Theta
       if x >= 0:
   ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Clearly Theta is not a vectorized function, so it cannot handle array input.

To get a vectorized version of Theta we can use the vectorize function:

Theta_vec = vectorize(Theta)
Theta_vec(array([-3,-2,-1,0,1,2,3]))
=> array([0, 0, 0, 1, 1, 1, 1])
We can also write the function so that it accepts array input from the start:

def Theta(x):
    """
    Vector-aware implementation of the Heaviside step function.
    """
    return 1 * (x >= 0)

Theta(array([-3,-2,-1,0,1,2,3]))
=> array([0, 0, 0, 1, 1, 1, 1])
# still works for scalars as well
Theta(-1.2), Theta(2.6)
=> (0, 1)
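The NumPy documentation describes vectorize as a convenience that is essentially a loop, so the array-aware form is usually preferable for performance. The sketch below puts several equivalent formulations side by side. The helper name theta_scalar is hypothetical, and the second argument to heaviside is the value returned at x == 0, set to 1 here to match the definition above.

```python
import numpy as np

def theta_scalar(x):
    """Scalar Heaviside step function (hypothetical helper name)."""
    return 1 if x >= 0 else 0

x = np.array([-3, -2, -1, 0, 1, 2, 3])

looped = np.vectorize(theta_scalar)(x)   # convenient, but loops internally
comparison = 1 * (x >= 0)                # boolean array scaled to integers
selected = np.where(x >= 0, 1, 0)        # explicit element-wise selection
builtin = np.heaviside(x.astype(float), 1.0)  # returns floats
```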
To use an array in a condition such as an if statement, reduce it to a single truth value with any or all:

M
=> array([[ 1,  4],
          [ 9, 16]])
if (M > 5).any():
    print("at least one element in M is larger than 5")
else:
    print("no element in M is larger than 5")
=> at least one element in M is larger than 5
if (M > 5).all():
    print("all elements in M are larger than 5")
else:
    print("not all elements in M are larger than 5")
=> not all elements in M are larger than 5
NumPy arrays are statically typed, so the type of an array cannot change once it has been created. We can, however, explicitly cast an array to another type with the astype function, which creates a new array (see also the similar asarray function):

M.dtype
=> dtype('int64')
M2 = M.astype(float)
M2
=> array([[ 1.,  4.],
          [ 9., 16.]])
M2.dtype
=> dtype('float64')
M3 = M.astype(bool)
M3
=> array([[ True, True],
          [ True, True]], dtype=bool)
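The practical difference between the two functions is when they copy. astype creates a new array by default, even if the type is unchanged, while asarray hands back the same object when no conversion is needed. A small sketch:

```python
import numpy as np

M = np.array([[1, 4], [9, 16]])

as_float = M.astype(float)        # new array with a new dtype
same_type = M.astype(M.dtype)     # still a new array (copy by default)
passthrough = np.asarray(M)       # no conversion needed: same object
converted = np.asarray(M, dtype=float)  # conversion needed: new array
```

This makes asarray a good fit for functions that accept either lists or arrays and want to avoid needless copies.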