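Converting categorical labels to numeric labels
Label encoding
The map() function
map() replaces each value in a column by looking it up in a dictionary.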
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'key1' : ['a', 'a', 'b', 'b', 'a'],
...: 'key2' : ['one', 'two', 'one', 'two', 'one']})
In [3]: df
Out[3]:
  key1 key2
0    a  one
1    a  two
2    b  one
3    b  two
4    a  one
In [4]: df['key1'] = df['key1'].map({'a': 1, 'b': 2})  # integer codes, not the strings '1' and '2'
In [5]: df
Out[5]:
   key1 key2
0     1  one
1     1  two
2     2  one
3     2  two
4     1  one
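One caveat with a dictionary mapping is that any value missing from the dictionary becomes NaN. The sketch below illustrates this and shows one possible way to build the mapping from the data itself. The Series `s` and the names `mapping` and `auto_mapping` are made up for illustration and do not come from the original example.

```python
import pandas as pd

# Hypothetical data: 'c' is deliberately absent from the mapping below.
s = pd.Series(['a', 'a', 'b', 'c'])
mapping = {'a': 1, 'b': 2}

# Values that are not keys of the dictionary are mapped to NaN.
mapped = s.map(mapping)
missing = s[mapped.isna()].unique()

# One way to avoid gaps: derive the mapping from the sorted unique values.
auto_mapping = {label: code for code, label in enumerate(sorted(s.unique()))}
codes = s.map(auto_mapping)
```

Checking `missing` before training is a cheap way to catch categories that were forgotten in a hand-written dictionary.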
LabelEncoder
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
lb=le.fit(["paris", "paris", "tokyo", "amsterdam"])
lb.transform(["tokyo", "tokyo", "paris", "amsterdam", "amsterdam"])
# array([2, 2, 1, 0, 0], dtype=int64)
Or, fitting and transforming in one step:
preprocessing.LabelEncoder().\
fit_transform(["paris", "paris", "tokyo", "amsterdam"])
# array([1, 1, 2, 0], dtype=int64)
df = pd.DataFrame({'key1' : ['a', 'a', 'b', 'b', 'a'],
'key2' : ['one', 'two', 'one', 'two', 'one']})
preprocessing.LabelEncoder().fit_transform(df['key1'])
# array([0, 0, 1, 1, 0])
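The codes returned by LabelEncoder are indices into its sorted list of classes, which is why "amsterdam" gets 0 above. A minimal sketch of inspecting that list and reversing the encoding follows; "berlin" is a made-up label used only to show what happens with a value the encoder has not seen.

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
codes = le.fit_transform(["paris", "paris", "tokyo", "amsterdam"])

# classes_ holds the sorted unique labels; a label's position is its code.
classes = list(le.classes_)

# inverse_transform maps integer codes back to the original labels.
decoded = list(le.inverse_transform([2, 0, 1]))

# Labels that were not present during fit cause transform to raise ValueError.
unseen_error = None
try:
    le.transform(["berlin"])
except ValueError as err:
    unseen_error = str(err)
```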
One-hot encoding
The pd.get_dummies() function
get_dummies is the pandas way to one-hot encode categorical columns.
import pandas as pd
df = pd.DataFrame([
['green' , 'A'],
['red' , 'B'],
['blue' , 'A']])
df.columns = ['color', 'class']
pd.get_dummies(df)
Before get_dummies:
   color class
0  green     A
1    red     B
2   blue     A
After get_dummies (pandas 2.0 and later print True/False here unless dtype=int is passed):
   color_blue  color_green  color_red  class_A  class_B
0           0            1          0        1        0
1           0            0          1        0        1
2           1            0          0        1        0
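When only some columns should be expanded, get_dummies accepts a `columns` argument. The sketch below reuses the same three rows, encodes only `color`, and leaves `class` untouched. The variable name `dummies` is chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'color': ['green', 'red', 'blue'],
                   'class': ['A', 'B', 'A']})

# Encode only the 'color' column; dtype=int yields 0/1 rather than booleans.
dummies = pd.get_dummies(df, columns=['color'], dtype=int)

# Columns that were not encoded come first, followed by the new indicator columns.
print(dummies)
```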
Using OneHotEncoder
from sklearn.preprocessing import OneHotEncoder
df = pd.DataFrame([
['green' , 'A'],
['red' , 'B'],
['blue' , 'A']])
df.columns = ['color', 'class']
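The snippet above stops before the encoder is applied. One plausible way to continue is sketched below; it is an assumption about the intended usage, not necessarily what the original author wrote. The `handle_unknown='ignore'` setting and the names `enc`, `dense`, and `categories` are choices made for this example.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame([['green', 'A'],
                   ['red', 'B'],
                   ['blue', 'A']], columns=['color', 'class'])

# handle_unknown='ignore' encodes categories unseen during fit as all zeros
# instead of raising an error (an assumption; the default is to raise).
enc = OneHotEncoder(handle_unknown='ignore')

# fit_transform returns a sparse matrix by default; toarray() makes it dense.
encoded = enc.fit_transform(df[['color', 'class']])
dense = encoded.toarray()

# categories_ lists the learned categories per input column, in output order.
categories = [list(c) for c in enc.categories_]
```

Unlike get_dummies, the fitted encoder remembers its categories, so the same `enc.transform` call can be applied to new data and always yields the same columns.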