错位的梦寐

Converting categorical labels to numeric labels



Label encoding

The map() function

Map each category to a value with a dictionary

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'key1' : ['a', 'a', 'b', 'b', 'a'],
   ...:                    'key2' : ['one', 'two', 'one', 'two', 'one']})

In [3]: df
Out[3]:
  key1 key2
0    a  one
1    a  two
2    b  one
3    b  two
4    a  one

In [4]: df['key1'] = df['key1'].map({'a': 1, 'b': 2})

In [5]: df
Out[5]:
  key1 key2
0    1  one
1    1  two
2    2  one
3    2  two
4    1  one
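
The dictionary above is typed by hand. As a minimal sketch, assuming the categories should simply be numbered in sorted order starting from 0, the mapping can also be built from the column itself. The names `mapping`, `key1_code`, and `unmapped` are illustrative and not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({'key1': ['a', 'a', 'b', 'b', 'a'],
                   'key2': ['one', 'two', 'one', 'two', 'one']})

# Build the dictionary from the sorted unique values of the column
# (assumption: codes start at 0 and follow sorted order).
mapping = {label: code for code, label in enumerate(sorted(df['key1'].unique()))}

# Keep the original column and store the codes in a new, hypothetical column.
df['key1_code'] = df['key1'].map(mapping)

# Values that are missing from the dictionary become NaN instead of raising.
unmapped = pd.Series(['a', 'c']).map(mapping)
```

Because map() turns unknown categories into NaN without raising, it is worth checking the result with isna() whenever the dictionary might not cover every value.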

LabelEncoder

from sklearn import preprocessing
le = preprocessing.LabelEncoder()

lb=le.fit(["paris", "paris", "tokyo", "amsterdam"])

lb.transform(["tokyo", "tokyo", "paris", "amsterdam", "amsterdam"]) 

# array([2, 2, 1, 0, 0], dtype=int64)

Alternatively:

preprocessing.LabelEncoder().fit_transform(
    ["paris", "paris", "tokyo", "amsterdam"])

# array([1, 1, 2, 0], dtype=int64)

It can also be applied to a DataFrame column:

df = pd.DataFrame({'key1' : ['a', 'a', 'b', 'b', 'a'],
                   'key2' : ['one', 'two', 'one', 'two', 'one']})

preprocessing.LabelEncoder().fit_transform(df['key1'])
# array([0, 0, 1, 1, 0])
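
The integer codes follow the sorted order of the labels, which is why "amsterdam" becomes 0. The sketch below, reusing the city labels from above, shows one way to inspect that order and to map codes back to labels. The variable names `codes`, `classes`, and `restored` are illustrative.

```python
from sklearn import preprocessing

le = preprocessing.LabelEncoder()
codes = le.fit_transform(["paris", "paris", "tokyo", "amsterdam"])

# classes_ holds the sorted unique labels; a label's position is its code.
classes = list(le.classes_)

# inverse_transform converts integer codes back to the original labels.
restored = list(le.inverse_transform([2, 2, 1]))
```

Keeping the fitted encoder around is what makes the reverse lookup possible, so the one-line fit_transform form is convenient only when the codes never need to be decoded.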

One-hot encoding

The pd.get_dummies() function

get_dummies is the pandas way of performing one-hot encoding.

Reference example:

import pandas as pd
df = pd.DataFrame([  
            ['green' , 'A'],   
            ['red'   , 'B'],   
            ['blue'  , 'A']])  

df.columns = ['color',  'class'] 
pd.get_dummies(df)

Before get_dummies:

   color class
0  green     A
1    red     B
2   blue     A

After get_dummies:

   color_blue  color_green  color_red  class_A  class_B
0           0            1          0        1        0
1           0            0          1        0        1
2           1            0          0        1        0
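
Recent pandas versions return True/False columns from get_dummies by default, so the 0/1 output shown here may need an explicit dtype. The sketch below, under that assumption, also shows the columns parameter for encoding only some columns. The names `dummies` and `color_only` are illustrative.

```python
import pandas as pd

df = pd.DataFrame([['green', 'A'], ['red', 'B'], ['blue', 'A']],
                  columns=['color', 'class'])

# dtype=int forces 0/1 output regardless of the pandas default.
dummies = pd.get_dummies(df, dtype=int)

# columns= encodes only the listed columns and leaves the others untouched.
color_only = pd.get_dummies(df, columns=['color'], dtype=int)
```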

Using OneHotEncoder

Reference example:

from sklearn.preprocessing import  OneHotEncoder

df = pd.DataFrame([  
            ['green' , 'A'],   
            ['red'   , 'B'],   
            ['blue'  , 'A']])  

df.columns = ['color',  'class'] 
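
The example above stops after building the DataFrame. One plausible continuation, offered as a sketch and not as the original author's code, fits a OneHotEncoder on both columns. It assumes a scikit-learn version that accepts string categories directly (0.20 or later); the names `enc`, `encoded`, and `categories` are illustrative.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame([['green', 'A'], ['red', 'B'], ['blue', 'A']],
                  columns=['color', 'class'])

enc = OneHotEncoder()

# fit_transform returns a sparse matrix by default; toarray() makes it dense.
encoded = enc.fit_transform(df).toarray()

# categories_ lists the sorted categories learned for each input column,
# which gives the meaning of each output column in order.
categories = [list(c) for c in enc.categories_]
```

Unlike get_dummies, the fitted encoder remembers its categories, so the same `enc` can be reused with transform() on new data to get columns in the same order.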

