Определение стоимости автомобилей¶
Сервис по продаже автомобилей с пробегом «Не бит, не крашен» разрабатывает приложение для привлечения новых клиентов. В нём можно быстро узнать рыночную стоимость своего автомобиля. В вашем распоряжении исторические данные: технические характеристики, комплектации и цены автомобилей. Вам нужно построить модель для определения стоимости.
Заказчику важны:
- качество предсказания;
- скорость предсказания;
- время обучения.
Инструкция по выполнению проекта¶
Чтобы усилить исследование, не ограничивайтесь градиентным бустингом. Попробуйте более простые модели — иногда они работают лучше. Эти редкие случаи легко пропустить, если всегда применять только бустинг.
Поэкспериментируйте и сравните характеристики моделей: время обучения, время предсказания, точность результата.
Основные шаги:
- Загрузите данные, путь к файлу: /datasets/autos.csv.
- Изучите данные. Заполните пропущенные значения и обработайте аномалии в столбцах. Если среди признаков имеются неинформативные, удалите их.
- Подготовьте выборки для обучения моделей.
- Обучите разные модели, одна из которых — LightGBM, как минимум одна — не бустинг. Для каждой модели попробуйте разные гиперпараметры.
- Проанализируйте время обучения, время предсказания и качество моделей.
- Опираясь на критерии заказчика, выберете лучшую модель, проверьте её качество на тестовой выборке.
Примечания:
- Для оценки качества моделей применяйте метрику RMSE.
- Значение метрики RMSE должно быть меньше 2500.
- Самостоятельно освойте библиотеку LightGBM и её средствами постройте модели градиентного бустинга.
- Время выполнения ячейки кода Jupyter Notebook можно получить специальной командой. Найдите её.
- Модель градиентного бустинга может долго обучаться, поэтому измените у неё только два-три параметра.
- Если перестанет работать Jupyter Notebook, удалите лишние переменные оператором
del
.
Описание данных¶
Данные находятся в файле /datasets/autos.csv.
Признаки
DateCrawled
— дата скачивания анкеты из базыVehicleType
— тип автомобильного кузоваRegistrationYear
— год регистрации автомобиляGearbox
— тип коробки передачPower
— мощность (л. с.)Model
— модель автомобиляKilometer
— пробег (км)RegistrationMonth
— месяц регистрации автомобиляFuelType
— тип топливаBrand
— марка автомобиляRepaired
— была машина в ремонте или нетDateCreated
— дата создания анкетыNumberOfPictures
— количество фотографий автомобиляPostalCode
— почтовый индекс владельца анкеты (пользователя)LastSeen
— дата последней активности пользователя
Целевой признак
Price
— цена (евро)
Подготовка данных¶
Настройка тетради¶
# Базовые библиотеки
import pandas as pd # Датафреймы
import numpy as np # Математика для массивов
from math import factorial # Факториалы
from scipy import stats as st # Статистика
import os # Библиотека для оптимизации чтения данных из файла
import time # Расчет времени выполнения функций
# Pipeline (пайплайн)
from sklearn.pipeline import(
Pipeline, # Pipeline с ручным вводом названий шагов.
make_pipeline # Pipeline с автоматическим названием шагов.
)
# Функция для поддержки экспериментальной функции HavingGridSearchSV
from sklearn.experimental import enable_halving_search_cv
# Ускоренная автоматизация поиска лучших моделей и их параметров
from sklearn.model_selection import HalvingGridSearchCV
# Ускоренная автоматизация рандомного поиска лучших моделей и их параметров
from sklearn.model_selection import HalvingRandomSearchCV
# Автоматизация раздельного декодирования признаков
from sklearn.compose import(
make_column_selector,
make_column_transformer,
ColumnTransformer
)
# Обработка данных для машинного обучения
# Стандартизация данных
import re
#! pip install sklearn.preprocessing
from sklearn.preprocessing import(
OneHotEncoder, # Создание отдельных столбцов для каждого категориального значения, drop='first' (удаление первого столбца против dummy-ловушки), sparse=False (?)
OrdinalEncoder, # Кодирование порядковых категориальных признаков
#TargetEncoder, # Кодирование категорий на основе таргетов (ошибка, модуль не найден)
LabelEncoder,
StandardScaler,
MinMaxScaler
)
# Кодирование категорий на основе таргетов
!pip install -U category_encoders
from category_encoders.target_encoder import TargetEncoder
# Другие функции предобработки данных
from sklearn.impute import KNNImputer # Заполнение пропусков в данных методом k-блжиайших соседей.
from sklearn.utils import shuffle # Перемешивание данных для уравновешивания их в разных выборках
from statsmodels.stats.outliers_influence import variance_inflation_factor # Коэффициент инфляции дисперсии (5 и более - признак коррелирует со всеми остальными, его можно удалить и выразить через другие признаки)
from sklearn.model_selection import(
GridSearchCV, # Поиск гиперпараметров по сетке (GridSearch)
train_test_split, # Разделение выборок с целевыми и нецелевыми признаками на обучающую и тестовую
validation_curve,
StratifiedKFold, # Кроссвалидация с указанием количества фолдов (частей, на которые будет разбита обучающая выборка, одна из которых будет участвовать в валидации)
KFold, # Кроссвалидация
cross_val_score # Оценка качества модели на кроссвалидации
)
# Различные модели машинного обучения (в данном проекте требуется регрессия)
# (есть разбор на https://russianblogs.com/article/83691573909/)
# Линейная модель
from sklearn.linear_model import(
#LogisticRegression, # Линейная классификация
LinearRegression, # Линейная регрессия
Ridge , # Линейная регрессия. "Хребтовая" регрессия (метод наименьших квадратов)
BayesianRidge , # Линейная регрессия. Байесовская "хребтовая" регрессия (максимизации предельного логарифмического правдоподобия)
SGDRegressor # Линейная регрессия. SGD - Стохастический градиентный спуск (минимизирует регуляризованные эмпирические потери за счет стохастического градиентного спуска)
)
# Решающее дерево
from sklearn.tree import(
#DecisionTreeClassifier, # Решающее дерево. Классификация
DecisionTreeRegressor # Решающее дерево. Регрессия
)
# Случайный лес
from sklearn.ensemble import(
#RandomForestClassifier, # Случайный лес. Классификация
RandomForestRegressor # Случайный лес. Регрессия
)
# Машина опорных векторов
from sklearn.svm import(
SVR # # Линейная модель. Регрессия с использованием опорных векторов
)
# Нейронная сеть
from sklearn.neural_network import(
MLPRegressor # Нейронная сеть. Регрессия
)
# CatBoost (made in Yandex)
from catboost import(
CatBoostRegressor # CatBoost (Яндекс). Регрессия
)
# LightGBM
from lightgbm import(
LGBMRegressor # LightGBM. Регрессия
)
# Метрики (Показатели качества моделей)
from sklearn.metrics import(
# Метрики для моделей регрессии
mean_absolute_error, # MAE, Средняя абсолютная ошибка (не чувствительная к выбросам)
mean_absolute_percentage_error, # MAPE, Средняя абсолютная ошибка в % (универсальная в %)
mean_squared_error, # MSE, Средняя квадратичная ошибка (дисперсия, чувствительная к выбросам), RMSE (сигма) = mean_squared_error(test_y, preds, squared=False)
r2_score, # R^2, Коэффициент детерминации (универсальная в %, чувствительная к выбросам, может быть отрицательной и возвращать NaN)
# Другое
make_scorer, # Функция для использования собственных функций в параметре scoring функции HalvingGridSearchCV
ConfusionMatrixDisplay
)
# Визуализация графиков
import seaborn as sns
import matplotlib
%matplotlib inline
from matplotlib import pyplot as plt
from matplotlib import rcParams, rcParamsDefault
from pandas.plotting import scatter_matrix
# Для поиска совпадений
# в названиях населённых пунктов
from fuzzywuzzy import fuzz
from fuzzywuzzy import process
# Улучшенная функция
# определения корреляции
# (возвращает сообщение о том,
# что модуль не найден)
!pip3 install phik
import phik
Collecting category_encoders Downloading category_encoders-2.6.2-py2.py3-none-any.whl (81 kB) |████████████████████████████████| 81 kB 1.1 MB/s eta 0:00:011 Requirement already satisfied: scikit-learn>=0.20.0 in /opt/conda/lib/python3.9/site-packages (from category_encoders) (0.24.1) Requirement already satisfied: scipy>=1.0.0 in /opt/conda/lib/python3.9/site-packages (from category_encoders) (1.9.1) Requirement already satisfied: patsy>=0.5.1 in /opt/conda/lib/python3.9/site-packages (from category_encoders) (0.5.2) Requirement already satisfied: statsmodels>=0.9.0 in /opt/conda/lib/python3.9/site-packages (from category_encoders) (0.13.2) Requirement already satisfied: pandas>=1.0.5 in /opt/conda/lib/python3.9/site-packages (from category_encoders) (1.2.4) Requirement already satisfied: numpy>=1.14.0 in /opt/conda/lib/python3.9/site-packages (from category_encoders) (1.21.1) Requirement already satisfied: python-dateutil>=2.7.3 in /opt/conda/lib/python3.9/site-packages (from pandas>=1.0.5->category_encoders) (2.8.1) Requirement already satisfied: pytz>=2017.3 in /opt/conda/lib/python3.9/site-packages (from pandas>=1.0.5->category_encoders) (2021.1) Requirement already satisfied: six in /opt/conda/lib/python3.9/site-packages (from patsy>=0.5.1->category_encoders) (1.16.0) Requirement already satisfied: threadpoolctl>=2.0.0 in /opt/conda/lib/python3.9/site-packages (from scikit-learn>=0.20.0->category_encoders) (3.1.0) Requirement already satisfied: joblib>=0.11 in /opt/conda/lib/python3.9/site-packages (from scikit-learn>=0.20.0->category_encoders) (1.1.0) Requirement already satisfied: packaging>=21.3 in /opt/conda/lib/python3.9/site-packages (from statsmodels>=0.9.0->category_encoders) (21.3) Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /opt/conda/lib/python3.9/site-packages (from packaging>=21.3->statsmodels>=0.9.0->category_encoders) (2.4.7) Installing collected packages: category-encoders Successfully installed category-encoders-2.6.2
/opt/conda/lib/python3.9/site-packages/fuzzywuzzy/fuzz.py:11: UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning warnings.warn('Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning')
Collecting phik Downloading phik-0.12.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (679 kB) |████████████████████████████████| 679 kB 953 kB/s eta 0:00:01 Requirement already satisfied: joblib>=0.14.1 in /opt/conda/lib/python3.9/site-packages (from phik) (1.1.0) Requirement already satisfied: numpy>=1.18.0 in /opt/conda/lib/python3.9/site-packages (from phik) (1.21.1) Requirement already satisfied: pandas>=0.25.1 in /opt/conda/lib/python3.9/site-packages (from phik) (1.2.4) Requirement already satisfied: matplotlib>=2.2.3 in /opt/conda/lib/python3.9/site-packages (from phik) (3.3.4) Requirement already satisfied: scipy>=1.5.2 in /opt/conda/lib/python3.9/site-packages (from phik) (1.9.1) Requirement already satisfied: kiwisolver>=1.0.1 in /opt/conda/lib/python3.9/site-packages (from matplotlib>=2.2.3->phik) (1.4.4) Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.3 in /opt/conda/lib/python3.9/site-packages (from matplotlib>=2.2.3->phik) (2.4.7) Requirement already satisfied: cycler>=0.10 in /opt/conda/lib/python3.9/site-packages (from matplotlib>=2.2.3->phik) (0.11.0) Requirement already satisfied: python-dateutil>=2.1 in /opt/conda/lib/python3.9/site-packages (from matplotlib>=2.2.3->phik) (2.8.1) Requirement already satisfied: pillow>=6.2.0 in /opt/conda/lib/python3.9/site-packages (from matplotlib>=2.2.3->phik) (8.4.0) Requirement already satisfied: pytz>=2017.3 in /opt/conda/lib/python3.9/site-packages (from pandas>=0.25.1->phik) (2021.1) Requirement already satisfied: six>=1.5 in /opt/conda/lib/python3.9/site-packages (from python-dateutil>=2.1->matplotlib>=2.2.3->phik) (1.16.0) Installing collected packages: phik Successfully installed phik-0.12.3
# Отображение всех столбцов таблицы
pd.set_option('display.max_columns', None)
# Обязательно для нормального отображения графиков plt
rcParams['figure.figsize'] = 10, 6
%config InlineBackend.figure_format = 'svg'
# Дополнительно и не обязательно для декорирования графиков
factor = .8
default_dpi = rcParamsDefault['figure.dpi']
rcParams['figure.dpi'] = default_dpi * factor
# Глобальная переменная
# для функций со случайными значениями
STATE = 42
Загрузка и изучение данных¶
# Загрузка данных
def read_csv_file(path1, path2):
if os.path.exists(path1):
data = pd.read_csv(path1)
elif os.path.exists(path2):
data = pd.read_csv(path2)
else:
print('Файл не найден')
return data
data = read_csv_file(
'/datasets/autos.csv',
'datasets/autos.csv'
)
# Первичный анализ данных
print(data.info())
data.head(10)
<class 'pandas.core.frame.DataFrame'> RangeIndex: 354369 entries, 0 to 354368 Data columns (total 16 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 DateCrawled 354369 non-null object 1 Price 354369 non-null int64 2 VehicleType 316879 non-null object 3 RegistrationYear 354369 non-null int64 4 Gearbox 334536 non-null object 5 Power 354369 non-null int64 6 Model 334664 non-null object 7 Kilometer 354369 non-null int64 8 RegistrationMonth 354369 non-null int64 9 FuelType 321474 non-null object 10 Brand 354369 non-null object 11 Repaired 283215 non-null object 12 DateCreated 354369 non-null object 13 NumberOfPictures 354369 non-null int64 14 PostalCode 354369 non-null int64 15 LastSeen 354369 non-null object dtypes: int64(7), object(9) memory usage: 43.3+ MB None
DateCrawled | Price | VehicleType | RegistrationYear | Gearbox | Power | Model | Kilometer | RegistrationMonth | FuelType | Brand | Repaired | DateCreated | NumberOfPictures | PostalCode | LastSeen | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2016-03-24 11:52:17 | 480 | NaN | 1993 | manual | 0 | golf | 150000 | 0 | petrol | volkswagen | NaN | 2016-03-24 00:00:00 | 0 | 70435 | 2016-04-07 03:16:57 |
1 | 2016-03-24 10:58:45 | 18300 | coupe | 2011 | manual | 190 | NaN | 125000 | 5 | gasoline | audi | yes | 2016-03-24 00:00:00 | 0 | 66954 | 2016-04-07 01:46:50 |
2 | 2016-03-14 12:52:21 | 9800 | suv | 2004 | auto | 163 | grand | 125000 | 8 | gasoline | jeep | NaN | 2016-03-14 00:00:00 | 0 | 90480 | 2016-04-05 12:47:46 |
3 | 2016-03-17 16:54:04 | 1500 | small | 2001 | manual | 75 | golf | 150000 | 6 | petrol | volkswagen | no | 2016-03-17 00:00:00 | 0 | 91074 | 2016-03-17 17:40:17 |
4 | 2016-03-31 17:25:20 | 3600 | small | 2008 | manual | 69 | fabia | 90000 | 7 | gasoline | skoda | no | 2016-03-31 00:00:00 | 0 | 60437 | 2016-04-06 10:17:21 |
5 | 2016-04-04 17:36:23 | 650 | sedan | 1995 | manual | 102 | 3er | 150000 | 10 | petrol | bmw | yes | 2016-04-04 00:00:00 | 0 | 33775 | 2016-04-06 19:17:07 |
6 | 2016-04-01 20:48:51 | 2200 | convertible | 2004 | manual | 109 | 2_reihe | 150000 | 8 | petrol | peugeot | no | 2016-04-01 00:00:00 | 0 | 67112 | 2016-04-05 18:18:39 |
7 | 2016-03-21 18:54:38 | 0 | sedan | 1980 | manual | 50 | other | 40000 | 7 | petrol | volkswagen | no | 2016-03-21 00:00:00 | 0 | 19348 | 2016-03-25 16:47:58 |
8 | 2016-04-04 23:42:13 | 14500 | bus | 2014 | manual | 125 | c_max | 30000 | 8 | petrol | ford | NaN | 2016-04-04 00:00:00 | 0 | 94505 | 2016-04-04 23:42:13 |
9 | 2016-03-17 10:53:50 | 999 | small | 1998 | manual | 101 | golf | 150000 | 0 | NaN | volkswagen | NaN | 2016-03-17 00:00:00 | 0 | 27472 | 2016-03-31 17:17:06 |
# Анализ значений датафрейма
data.hist()
plt.subplots_adjust(wspace=.4, hspace=.5)
data.describe()
Price | RegistrationYear | Power | Kilometer | RegistrationMonth | NumberOfPictures | PostalCode | |
---|---|---|---|---|---|---|---|
count | 354369.000000 | 354369.000000 | 354369.000000 | 354369.000000 | 354369.000000 | 354369.0 | 354369.000000 |
mean | 4416.656776 | 2004.234448 | 110.094337 | 128211.172535 | 5.714645 | 0.0 | 50508.689087 |
std | 4514.158514 | 90.227958 | 189.850405 | 37905.341530 | 3.726421 | 0.0 | 25783.096248 |
min | 0.000000 | 1000.000000 | 0.000000 | 5000.000000 | 0.000000 | 0.0 | 1067.000000 |
25% | 1050.000000 | 1999.000000 | 69.000000 | 125000.000000 | 3.000000 | 0.0 | 30165.000000 |
50% | 2700.000000 | 2003.000000 | 105.000000 | 150000.000000 | 6.000000 | 0.0 | 49413.000000 |
75% | 6400.000000 | 2008.000000 | 143.000000 | 150000.000000 | 9.000000 | 0.0 | 71083.000000 |
max | 20000.000000 | 9999.000000 | 20000.000000 | 150000.000000 | 12.000000 | 0.0 | 99998.000000 |
# Подсчёт пропусков
data_shape = data.shape[0]
print('Всего объектов:', data_shape)
print()
print('Количество объектов с пропусками в признаках:')
for i in data.columns:
if data_shape - data[i].loc[data[i].notna()].shape[0] > 0:
_a = data_shape - data[i].loc[data[i].notna()].shape[0]
_b = int((1 - data[i].loc[data[i].notna()].shape[0] / data_shape) * 100)
_c = data[i].dtype
print(f'{i} ({_c})\t= {_a} ({_b}%)')
Всего объектов: 354369 Количество объектов с пропусками в признаках: VehicleType (object) = 37490 (10%) Gearbox (object) = 19833 (5%) Model (object) = 19705 (5%) FuelType (object) = 32895 (9%) Repaired (object) = 71154 (20%)
# Анализ значений атрибута "RegistrationYear"
print('Уникальные значения атрибута "RegistrationYear":')
print(np.sort(data['RegistrationYear'].unique()))
print()
print('Количество значений атрибута "RegistrationYear", которые меньше 1990 и больше 2023:',
data.loc[
(data['RegistrationYear'] < 1900) |
(data['RegistrationYear'] > 2023),
'RegistrationYear'
].count(),
'это',
(data.loc[
(data['RegistrationYear'] < 1900) |
(data['RegistrationYear'] > 2023),
'RegistrationYear'
].count() / data_shape) * 100, '%'
)
Уникальные значения атрибута "RegistrationYear": [1000 1001 1039 1111 1200 1234 1253 1255 1300 1400 1500 1600 1602 1688 1800 1910 1915 1919 1920 1923 1925 1927 1928 1929 1930 1931 1932 1933 1934 1935 1936 1937 1938 1940 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2066 2200 2222 2290 2500 2800 2900 3000 3200 3500 3700 3800 4000 4100 4500 4800 5000 5300 5555 5600 5900 5911 6000 6500 7000 7100 7500 7800 8000 8200 8455 8500 8888 9000 9229 9450 9996 9999] Количество значений атрибута "RegistrationYear", которые меньше 1990 и больше 2023: 171 это 0.048254785266205566 %
# Анализ минимальна значений атрибута "Power"
string = f"{data.loc[data['Power'] < .75, 'Power'].count()} объектов имеют мощность двигателя меньше 0.75 л.с. "
string += f"Это {data.loc[data['Power'] < .75, 'Power'].count() / data_shape * 100}% от всего количества объектов. "
print(string, '\n')
print('Примеры подобных объектов:')
data.loc[data['Power'] < .75].head()
print()
40225 объектов имеют мощность двигателя меньше 0.75 л.с. Это 11.351162206626427% от всего количества объектов. Примеры подобных объектов:
# Анализ максимальная значений атрибута "Power"
string = f"{data.loc[data['Power'] > 5000, 'Power'].count()} объектов имеют мощность двигателя больше 5000 л.с. "
string += f"Это {data.loc[data['Power'] > 5000, 'Power'].count() / data_shape * 100}% от всего количества объектов. "
print(string, '\n')
print('Примеры подобных объектов:')
data.loc[data['Power'] > 5000].head()
82 объектов имеют мощность двигателя больше 5000 л.с. Это 0.023139721589642434% от всего количества объектов. Примеры подобных объектов:
DateCrawled | Price | VehicleType | RegistrationYear | Gearbox | Power | Model | Kilometer | RegistrationMonth | FuelType | Brand | Repaired | DateCreated | NumberOfPictures | PostalCode | LastSeen | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
7661 | 2016-04-02 19:25:25 | 1499 | small | 1999 | manual | 7515 | lupo | 150000 | 4 | petrol | volkswagen | NaN | 2016-04-02 00:00:00 | 0 | 65830 | 2016-04-06 11:46:49 |
11039 | 2016-03-25 19:55:32 | 0 | sedan | 1998 | manual | 10317 | other | 150000 | 8 | petrol | fiat | no | 2016-03-25 00:00:00 | 0 | 57520 | 2016-04-01 19:16:33 |
25232 | 2016-03-28 19:57:39 | 10900 | bus | 2009 | manual | 10520 | caddy | 150000 | 6 | gasoline | volkswagen | no | 2016-03-28 00:00:00 | 0 | 36272 | 2016-04-07 02:47:02 |
33952 | 2016-03-09 11:37:03 | 3740 | small | 2006 | manual | 6920 | aygo | 90000 | 10 | NaN | toyota | no | 2016-03-09 00:00:00 | 0 | 94116 | 2016-03-17 05:16:32 |
44520 | 2016-03-10 22:37:21 | 2500 | convertible | 1998 | manual | 7512 | golf | 150000 | 6 | NaN | volkswagen | NaN | 2016-03-10 00:00:00 | 0 | 68239 | 2016-04-05 15:17:50 |
# Анализ уникальных значений
# категориальных текстовых признаков
for i in data.select_dtypes(include='object').columns:
print(f'Уникальные значения признака "{i}":', data[i].unique())
print(f'Всего унимальных значений признака "{i}":', len(data[i].unique()))
print()
Уникальные значения признака "DateCrawled": ['2016-03-24 11:52:17' '2016-03-24 10:58:45' '2016-03-14 12:52:21' ... '2016-03-21 09:50:58' '2016-03-14 17:48:27' '2016-03-19 18:57:12'] Всего унимальных значений признака "DateCrawled": 271174 Уникальные значения признака "VehicleType": [nan 'coupe' 'suv' 'small' 'sedan' 'convertible' 'bus' 'wagon' 'other'] Всего унимальных значений признака "VehicleType": 9 Уникальные значения признака "Gearbox": ['manual' 'auto' nan] Всего унимальных значений признака "Gearbox": 3 Уникальные значения признака "Model": ['golf' nan 'grand' 'fabia' '3er' '2_reihe' 'other' 'c_max' '3_reihe' 'passat' 'navara' 'ka' 'polo' 'twingo' 'a_klasse' 'scirocco' '5er' 'meriva' 'arosa' 'c4' 'civic' 'transporter' 'punto' 'e_klasse' 'clio' 'kadett' 'kangoo' 'corsa' 'one' 'fortwo' '1er' 'b_klasse' 'signum' 'astra' 'a8' 'jetta' 'fiesta' 'c_klasse' 'micra' 'vito' 'sprinter' '156' 'escort' 'forester' 'xc_reihe' 'scenic' 'a4' 'a1' 'insignia' 'combo' 'focus' 'tt' 'a6' 'jazz' 'omega' 'slk' '7er' '80' '147' '100' 'z_reihe' 'sportage' 'sorento' 'v40' 'ibiza' 'mustang' 'eos' 'touran' 'getz' 'a3' 'almera' 'megane' 'lupo' 'r19' 'zafira' 'caddy' 'mondeo' 'cordoba' 'colt' 'impreza' 'vectra' 'berlingo' 'tiguan' 'i_reihe' 'espace' 'sharan' '6_reihe' 'panda' 'up' 'seicento' 'ceed' '5_reihe' 'yeti' 'octavia' 'mii' 'rx_reihe' '6er' 'modus' 'fox' 'matiz' 'beetle' 'c1' 'rio' 'touareg' 'logan' 'spider' 'cuore' 's_max' 'a2' 'galaxy' 'c3' 'viano' 's_klasse' '1_reihe' 'avensis' 'roomster' 'sl' 'kaefer' 'santa' 'cooper' 'leon' '4_reihe' 'a5' '500' 'laguna' 'ptcruiser' 'clk' 'primera' 'x_reihe' 'exeo' '159' 'transit' 'juke' 'qashqai' 'carisma' 'accord' 'corolla' 'lanos' 'phaeton' 'verso' 'swift' 'rav' 'picanto' 'boxster' 'kalos' 'superb' 'stilo' 'alhambra' 'mx_reihe' 'roadster' 'ypsilon' 'cayenne' 'galant' 'justy' '90' 'sirion' 'crossfire' 'agila' 'duster' 'cr_reihe' 'v50' 'c_reihe' 'v_klasse' 'm_klasse' 'yaris' 'c5' 'aygo' 'cc' 'carnival' 'fusion' '911' 'bora' 'forfour' 'm_reihe' 'cl' 'tigra' '300c' 'spark' 'v70' 'kuga' 'x_type' 'ducato' 's_type' 'x_trail' 'toledo' 'altea' 'voyager' 'calibra' 'bravo' 'antara' 'tucson' 'citigo' 'jimny' 'wrangler' 'lybra' 'q7' 'lancer' 'captiva' 'c2' 'discovery' 'freelander' 'sandero' 'note' '900' 'cherokee' 'clubman' 'samara' 'defender' '601' 'cx_reihe' 'legacy' 'pajero' 'auris' 'niva' 's60' 'nubira' 'vivaro' 'g_klasse' 'lodgy' '850' 'range_rover' 'q3' 'serie_2' 'glk' 'charade' 'croma' 'outlander' 'doblo' 'musa' 'move' '9000' 'v60' '145' 'aveo' '200' 'b_max' 'range_rover_sport' 'terios' 'rangerover' 'q5' 'range_rover_evoque' 'materia' 'delta' 'gl' 'kalina' 'amarok' 'elefantino' 'i3' 'kappa' 'serie_3' 'serie_1'] Всего унимальных значений признака "Model": 251 Уникальные значения признака "FuelType": ['petrol' 'gasoline' nan 'lpg' 'other' 'hybrid' 'cng' 'electric'] Всего унимальных значений признака "FuelType": 8 Уникальные значения признака "Brand": ['volkswagen' 'audi' 'jeep' 'skoda' 'bmw' 'peugeot' 'ford' 'mazda' 'nissan' 'renault' 'mercedes_benz' 'opel' 'seat' 'citroen' 'honda' 'fiat' 'mini' 'smart' 'hyundai' 'sonstige_autos' 'alfa_romeo' 'subaru' 'volvo' 'mitsubishi' 'kia' 'suzuki' 'lancia' 'toyota' 'chevrolet' 'dacia' 'daihatsu' 'trabant' 'saab' 'chrysler' 'jaguar' 'daewoo' 'porsche' 'rover' 'land_rover' 'lada'] Всего унимальных значений признака "Brand": 40 Уникальные значения признака "Repaired": [nan 'yes' 'no'] Всего унимальных значений признака "Repaired": 3 Уникальные значения признака "DateCreated": ['2016-03-24 00:00:00' '2016-03-14 00:00:00' '2016-03-17 00:00:00' '2016-03-31 00:00:00' '2016-04-04 00:00:00' '2016-04-01 00:00:00' '2016-03-21 00:00:00' '2016-03-26 00:00:00' '2016-04-07 00:00:00' '2016-03-15 00:00:00' '2016-03-11 00:00:00' '2016-03-20 00:00:00' '2016-03-23 00:00:00' '2016-03-27 00:00:00' '2016-03-12 00:00:00' '2016-03-13 00:00:00' '2016-03-18 00:00:00' '2016-03-10 00:00:00' '2016-03-07 00:00:00' '2016-03-09 00:00:00' '2016-03-08 00:00:00' '2016-04-03 00:00:00' '2016-03-29 00:00:00' '2016-03-25 00:00:00' '2016-03-28 00:00:00' '2016-03-30 00:00:00' '2016-03-22 00:00:00' '2016-02-09 00:00:00' '2016-03-05 00:00:00' '2016-04-02 00:00:00' '2016-03-16 00:00:00' '2016-03-19 00:00:00' '2016-04-05 00:00:00' '2016-03-06 00:00:00' '2016-02-12 00:00:00' '2016-03-03 00:00:00' '2016-03-01 00:00:00' '2016-03-04 00:00:00' '2016-04-06 00:00:00' '2016-02-15 00:00:00' '2016-02-24 00:00:00' '2016-02-27 00:00:00' '2015-03-20 00:00:00' '2016-02-28 00:00:00' '2016-02-17 00:00:00' '2016-01-27 00:00:00' '2016-02-20 00:00:00' '2016-02-29 00:00:00' '2016-02-10 00:00:00' '2016-02-23 00:00:00' '2016-02-21 00:00:00' '2015-11-02 00:00:00' '2016-02-19 00:00:00' '2016-02-26 00:00:00' '2016-02-11 00:00:00' '2016-01-10 00:00:00' '2016-02-06 00:00:00' '2016-02-18 00:00:00' '2016-01-29 00:00:00' '2016-03-02 00:00:00' '2015-12-06 00:00:00' '2016-01-24 00:00:00' '2016-01-30 00:00:00' '2016-02-02 00:00:00' '2016-02-16 00:00:00' '2016-02-13 00:00:00' '2016-02-05 00:00:00' '2016-02-22 00:00:00' '2015-11-17 00:00:00' '2014-03-10 00:00:00' '2016-02-07 00:00:00' '2016-01-23 00:00:00' '2016-02-25 00:00:00' '2016-02-14 00:00:00' '2016-01-02 00:00:00' '2015-09-04 00:00:00' '2015-11-12 00:00:00' '2015-12-27 00:00:00' '2015-11-24 00:00:00' '2016-01-20 00:00:00' '2016-02-03 00:00:00' '2015-12-05 00:00:00' '2015-08-07 00:00:00' '2016-01-28 00:00:00' '2016-01-31 00:00:00' '2016-02-08 00:00:00' '2016-01-07 00:00:00' '2016-01-22 00:00:00' '2016-01-18 00:00:00' '2016-01-08 00:00:00' '2015-11-23 00:00:00' '2016-01-13 00:00:00' '2016-01-17 00:00:00' '2016-01-15 00:00:00' '2015-11-08 00:00:00' '2016-01-26 00:00:00' '2016-02-04 00:00:00' '2016-01-25 00:00:00' '2016-01-16 00:00:00' '2015-08-10 00:00:00' '2016-01-03 00:00:00' '2016-01-19 00:00:00' '2015-12-30 00:00:00' '2016-02-01 00:00:00' '2015-12-17 00:00:00' '2015-11-10 00:00:00' '2016-01-06 00:00:00' '2015-09-09 00:00:00' '2015-06-18 00:00:00'] Всего унимальных значений признака "DateCreated": 109 Уникальные значения признака "LastSeen": ['2016-04-07 03:16:57' '2016-04-07 01:46:50' '2016-04-05 12:47:46' ... '2016-03-19 20:44:43' '2016-03-29 10:17:23' '2016-03-21 10:42:49'] Всего унимальных значений признака "LastSeen": 179150
# Анализ нефвных совпадений
# признака "Model"
for i in data['Model'].fillna('no_value').unique():
print(i, '~', process.extract(i, data['Model'].fillna('no_value').unique(), limit=3))
golf ~ [('golf', 100), ('gl', 67), ('twingo', 60)] no_value ~ [('no_value', 100), ('altea', 60), ('lupo', 51)] grand ~ [('grand', 100), ('panda', 60), ('logan', 60)] fabia ~ [('fabia', 100), ('ibiza', 60), ('agila', 60)] 3er ~ [('3er', 100), ('5er', 67), ('1er', 67)] 2_reihe ~ [('2_reihe', 100), ('3_reihe', 86), ('z_reihe', 86)] other ~ [('other', 100), ('transporter', 72), ('boxster', 67)] c_max ~ [('c_max', 100), ('s_max', 80), ('b_max', 80)] 3_reihe ~ [('3_reihe', 100), ('2_reihe', 86), ('z_reihe', 86)] passat ~ [('passat', 100), ('tt', 60), ('arosa', 55)] navara ~ [('navara', 100), ('rav', 72), ('niva', 68)] ka ~ [('ka', 100), ('kadett', 90), ('kangoo', 90)] polo ~ [('polo', 100), ('doblo', 67), ('toledo', 60)] twingo ~ [('twingo', 100), ('elefantino', 72), ('citigo', 67)] a_klasse ~ [('a_klasse', 100), ('e_klasse', 88), ('b_klasse', 88)] scirocco ~ [('scirocco', 100), ('cc', 90), ('clio', 68)] 5er ~ [('5er', 100), ('3er', 67), ('1er', 67)] meriva ~ [('meriva', 100), ('materia', 77), ('niva', 68)] arosa ~ [('arosa', 100), ('carisma', 67), ('corsa', 60)] c4 ~ [('c4', 100), ('300c', 60), ('a4', 50)] civic ~ [('civic', 100), ('mii', 60), ('cc', 57)] transporter ~ [('transporter', 100), ('other', 72), ('note', 68)] punto ~ [('punto', 100), ('picanto', 67), ('ducato', 55)] e_klasse ~ [('e_klasse', 100), ('a_klasse', 88), ('b_klasse', 88)] clio ~ [('clio', 100), ('cl', 90), ('scirocco', 68)] kadett ~ [('kadett', 100), ('ka', 90), ('tt', 90)] kangoo ~ [('kangoo', 100), ('ka', 90), ('aygo', 68)] corsa ~ [('corsa', 100), ('cordoba', 67), ('carisma', 67)] one ~ [('one', 100), ('phaeton', 72), ('ypsilon', 72)] fortwo ~ [('fortwo', 100), ('sorento', 62), ('forfour', 62)] 1er ~ [('1er', 100), ('3er', 67), ('5er', 67)] b_klasse ~ [('b_klasse', 100), ('a_klasse', 88), ('e_klasse', 88)] signum ~ [('signum', 100), ('insignia', 57), ('tiguan', 50)] astra ~ [('astra', 100), ('antara', 73), ('rav', 72)] a8 ~ [('a8', 100), ('meriva', 60), ('corsa', 60)] jetta ~ [('jetta', 100), ('tt', 90), ('a8', 60)] fiesta ~ [('fiesta', 100), ('a8', 60), ('a4', 60)] c_klasse ~ [('c_klasse', 100), ('a_klasse', 88), ('e_klasse', 88)] micra ~ [('micra', 100), ('rav', 72), ('corsa', 60)] vito ~ [('vito', 100), ('viano', 67), ('citigo', 60)] sprinter ~ [('sprinter', 100), ('spider', 71), ('note', 68)] 156 ~ [('156', 100), ('159', 67), ('145', 67)] escort ~ [('escort', 100), ('colt', 68), ('sorento', 62)] forester ~ [('forester', 100), ('boxster', 67), ('other', 62)] xc_reihe ~ [('xc_reihe', 100), ('x_reihe', 93), ('c_reihe', 93)] scenic ~ [('scenic', 100), ('scirocco', 57), ('seicento', 57)] a4 ~ [('a4', 100), ('meriva', 60), ('corsa', 60)] a1 ~ [('a1', 100), ('meriva', 60), ('corsa', 60)] insignia ~ [('insignia', 100), ('niva', 77), ('a8', 60)] combo ~ [('combo', 100), ('croma', 60), ('doblo', 60)] focus ~ [('focus', 100), ('modus', 60), ('fox', 60)] tt ~ [('tt', 100), ('kadett', 90), ('jetta', 90)] a6 ~ [('a6', 100), ('meriva', 60), ('corsa', 60)] jazz ~ [('jazz', 100), ('ka', 45), ('a8', 45)] omega ~ [('omega', 100), ('megane', 73), ('one', 60)] slk ~ [('slk', 100), ('sl', 90), ('clk', 67)] 7er ~ [('7er', 100), ('3er', 67), ('5er', 67)] 80 ~ [('80', 100), ('850', 80), ('a8', 50)] 147 ~ [('147', 100), ('145', 67), ('c4', 45)] 100 ~ [('100', 100), ('500', 67), ('900', 67)] z_reihe ~ [('z_reihe', 100), ('2_reihe', 86), ('3_reihe', 86)] sportage ~ [('sportage', 100), ('range_rover_sport', 69), ('bora', 68)] sorento ~ [('sorento', 100), ('seicento', 67), ('fortwo', 62)] v40 ~ [('v40', 100), ('v50', 67), ('v70', 67)] ibiza ~ [('ibiza', 100), ('fabia', 60), ('a8', 60)] mustang ~ [('mustang', 100), ('musa', 73), ('gl', 60)] eos ~ [('eos', 100), ('mondeo', 72), ('terios', 67)] touran ~ [('touran', 100), ('bora', 68), ('tiguan', 67)] getz ~ [('getz', 100), ('sportage', 60), ('voyager', 51)] a3 ~ [('a3', 100), ('meriva', 60), ('corsa', 60)] almera ~ [('almera', 100), ('altea', 73), ('rav', 72)] megane ~ [('megane', 100), ('omega', 73), ('one', 60)] lupo ~ [('lupo', 100), ('up', 90), ('no_value', 51)] r19 ~ [('r19', 100), ('159', 67), ('90', 60)] zafira ~ [('zafira', 100), ('rav', 72), ('calibra', 62)] caddy ~ [('caddy', 100), ('cayenne', 50), ('charade', 50)] mondeo ~ [('mondeo', 100), ('eos', 72), ('one', 67)] cordoba ~ [('cordoba', 100), ('corolla', 71), ('corsa', 67)] colt ~ [('colt', 100), ('escort', 68), ('cl', 67)] impreza ~ [('impreza', 100), ('a8', 60), ('a4', 60)] vectra ~ [('vectra', 100), ('rav', 72), ('a8', 60)] berlingo ~ [('berlingo', 100), ('golf', 60), ('3er', 60)] tiguan ~ [('tiguan', 100), ('tigra', 73), ('touran', 67)] i_reihe ~ [('i_reihe', 100), ('2_reihe', 86), ('3_reihe', 86)] espace ~ [('espace', 100), ('eos', 60), ('ceed', 60)] sharan ~ [('sharan', 100), ('samara', 67), ('charade', 62)] 6_reihe ~ [('6_reihe', 100), ('2_reihe', 86), ('3_reihe', 86)] panda ~ [('panda', 100), ('grand', 60), ('santa', 60)] up ~ [('up', 100), ('lupo', 90), ('superb', 90)] seicento ~ [('seicento', 100), ('sorento', 67), ('picanto', 67)] ceed ~ [('ceed', 100), ('espace', 60), ('cayenne', 55)] 5_reihe ~ [('5_reihe', 100), ('2_reihe', 86), ('3_reihe', 86)] yeti ~ [('yeti', 100), ('i3', 60), ('x_type', 51)] octavia ~ [('octavia', 100), ('rav', 60), ('carisma', 57)] mii ~ [('mii', 100), ('civic', 60), ('micra', 60)] rx_reihe ~ [('rx_reihe', 100), ('x_reihe', 93), ('xc_reihe', 88)] 6er ~ [('6er', 100), ('3er', 67), ('5er', 67)] modus ~ [('modus', 100), ('musa', 67), ('focus', 60)] fox ~ [('fox', 100), ('fortwo', 60), ('forester', 60)] matiz ~ [('matiz', 100), ('materia', 67), ('elefantino', 54)] beetle ~ [('beetle', 100), ('leon', 60), ('toledo', 50)] c1 ~ [('c1', 100), ('300c', 60), ('c4', 50)] rio ~ [('rio', 100), ('sirion', 90), ('terios', 90)] touareg ~ [('touareg', 100), ('touran', 62), ('gl', 60)] logan ~ [('logan', 100), ('leon', 67), ('grand', 60)] spider ~ [('spider', 100), ('sprinter', 71), ('superb', 67)] cuore ~ [('cuore', 100), ('corsa', 60), ('one', 60)] s_max ~ [('s_max', 100), ('c_max', 80), ('b_max', 80)] a2 ~ [('a2', 100), ('meriva', 60), ('corsa', 60)] galaxy ~ [('galaxy', 100), ('galant', 67), ('glk', 60)] c3 ~ [('c3', 100), ('300c', 60), ('c4', 50)] viano ~ [('viano', 100), ('vivaro', 73), ('vito', 67)] s_klasse ~ [('s_klasse', 100), ('a_klasse', 88), ('e_klasse', 88)] 1_reihe ~ [('1_reihe', 100), ('2_reihe', 86), ('3_reihe', 86)] avensis ~ [('avensis', 100), ('aveo', 68), ('eos', 60)] roomster ~ [('roomster', 100), ('roadster', 75), ('boxster', 67)] sl ~ [('sl', 100), ('slk', 90), ('focus', 60)] kaefer ~ [('kaefer', 100), ('ka', 90), ('3er', 60)] santa ~ [('santa', 100), ('antara', 73), ('panda', 60)] cooper ~ [('cooper', 100), ('3er', 60), ('5er', 60)] leon ~ [('leon', 100), ('ypsilon', 77), ('phaeton', 68)] 4_reihe ~ [('4_reihe', 100), ('2_reihe', 86), ('3_reihe', 86)] a5 ~ [('a5', 100), ('meriva', 60), ('corsa', 60)] 500 ~ [('500', 100), ('100', 67), ('v50', 67)] laguna ~ [('laguna', 100), ('niva', 60), ('panda', 55)] ptcruiser ~ [('ptcruiser', 100), ('duster', 66), ('primera', 62)] clk ~ [('clk', 100), ('cl', 90), ('slk', 67)] primera ~ [('primera', 100), ('rav', 72), ('sprinter', 67)] x_reihe ~ [('x_reihe', 100), ('xc_reihe', 93), ('rx_reihe', 93)] exeo ~ [('exeo', 100), ('eos', 57), ('toledo', 51)] 159 ~ [('159', 100), ('156', 67), ('r19', 67)] transit ~ [('transit', 100), ('transporter', 67), ('touran', 62)] juke ~ [('juke', 100), ('no_value', 51), ('range_rover_evoque', 51)] qashqai ~ [('qashqai', 100), ('i3', 60), ('arosa', 50)] carisma ~ [('carisma', 100), ('arosa', 67), ('yaris', 67)] accord ~ [('accord', 100), ('cc', 90), ('cordoba', 62)] corolla ~ [('corolla', 100), ('cordoba', 71), ('corsa', 67)] lanos ~ [('lanos', 100), ('arosa', 60), ('eos', 60)] phaeton ~ [('phaeton', 100), ('one', 72), ('leon', 68)] verso ~ [('verso', 100), ('range_rover_sport', 72), ('range_rover', 68)] swift ~ [('swift', 100), ('tt', 60), ('transit', 50)] rav ~ [('rav', 100), ('bravo', 90), ('navara', 72)] picanto ~ [('picanto', 100), ('punto', 67), ('seicento', 67)] boxster ~ [('boxster', 100), ('other', 67), ('forester', 67)] kalos ~ [('kalos', 100), ('ka', 90), ('arosa', 60)] superb ~ [('superb', 100), ('up', 90), ('spider', 67)] stilo ~ [('stilo', 100), ('ypsilon', 67), ('rio', 60)] alhambra ~ [('alhambra', 100), ('bora', 77), ('rav', 72)] mx_reihe ~ [('mx_reihe', 100), ('x_reihe', 93), ('m_reihe', 93)] roadster ~ [('roadster', 100), ('roomster', 75), ('duster', 71)] ypsilon ~ [('ypsilon', 100), ('leon', 77), ('one', 72)] cayenne ~ [('cayenne', 100), ('one', 60), ('ceed', 55)] galant ~ [('galant', 100), ('galaxy', 67), ('tt', 60)] justy ~ [('justy', 100), ('duster', 55), ('s_type', 55)] 90 ~ [('90', 100), ('900', 90), ('9000', 90)] sirion ~ [('sirion', 100), ('rio', 90), ('one', 72)] crossfire ~ [('crossfire', 100), ('eos', 60), ('rio', 60)] agila ~ [('agila', 100), ('fabia', 60), ('altea', 60)] duster ~ [('duster', 100), ('roadster', 71), ('ptcruiser', 66)] cr_reihe ~ [('cr_reihe', 100), ('c_reihe', 93), ('xc_reihe', 88)] v50 ~ [('v50', 100), ('v40', 67), ('500', 67)] c_reihe ~ [('c_reihe', 100), ('xc_reihe', 93), ('cr_reihe', 93)] v_klasse ~ [('v_klasse', 100), ('a_klasse', 88), ('e_klasse', 88)] m_klasse ~ [('m_klasse', 100), ('a_klasse', 88), ('e_klasse', 88)] yaris ~ [('yaris', 100), ('auris', 80), ('carisma', 67)] c5 ~ [('c5', 100), ('300c', 60), ('c4', 50)] aygo ~ [('aygo', 100), ('kangoo', 68), ('sportage', 51)] cc ~ [('cc', 100), ('scirocco', 90), ('accord', 90)] carnival ~ [('carnival', 100), ('niva', 90), ('carisma', 67)] fusion ~ [('fusion', 100), ('one', 72), ('sirion', 67)] 911 ~ [('911', 100), ('a1', 45), ('c1', 45)] bora ~ [('bora', 100), ('alhambra', 77), ('calibra', 77)] forfour ~ [('forfour', 100), ('fortwo', 62), ('fox', 60)] m_reihe ~ [('m_reihe', 100), ('mx_reihe', 93), ('2_reihe', 86)] cl ~ [('cl', 100), ('clio', 90), ('clk', 90)] tigra ~ [('tigra', 100), ('tiguan', 73), ('rav', 72)] 300c ~ [('300c', 100), ('c4', 60), ('c1', 60)] spark ~ [('spark', 100), ('ka', 60), ('espace', 55)] v70 ~ [('v70', 100), ('v40', 67), ('v50', 67)] kuga ~ [('kuga', 100), ('ka', 67), ('a8', 60)] x_type ~ [('x_type', 100), ('s_type', 83), ('yeti', 51)] ducato ~ [('ducato', 100), ('picanto', 62), ('punto', 55)] s_type ~ [('s_type', 100), ('x_type', 83), ('justy', 55)] x_trail ~ [('x_trail', 100), ('rio', 60), ('rav', 60)] toledo ~ [('toledo', 100), ('leon', 68), ('polo', 60)] altea ~ [('altea', 100), ('almera', 73), ('materia', 67)] voyager ~ [('voyager', 100), ('3er', 60), ('5er', 60)] calibra ~ [('calibra', 100), ('bora', 77), ('rav', 72)] bravo ~ [('bravo', 100), ('rav', 90), ('alhambra', 68)] antara ~ [('antara', 100), ('astra', 73), ('santa', 73)] tucson ~ [('tucson', 100), ('one', 72), ('fusion', 67)] citigo ~ [('citigo', 100), ('golf', 60), ('clio', 60)] jimny ~ [('jimny', 100), ('i3', 45), ('viano', 40)] wrangler ~ [('wrangler', 100), ('gl', 90), ('range_rover_sport', 68)] lybra ~ [('lybra', 100), ('rav', 72), ('bora', 67)] q7 ~ [('q7', 100), ('q3', 50), ('q5', 50)] lancer ~ [('lancer', 100), ('freelander', 75), ('outlander', 75)] captiva ~ [('captiva', 100), ('niva', 68), ('carnival', 67)] c2 ~ [('c2', 100), ('300c', 60), ('c4', 50)] discovery ~ [('discovery', 100), ('move', 68), ('3er', 60)] freelander ~ [('freelander', 100), ('lancer', 75), ('defender', 67)] sandero ~ [('sandero', 100), ('rio', 72), ('mondeo', 62)] note ~ [('note', 100), ('transporter', 68), ('sprinter', 68)] 900 ~ [('900', 100), ('90', 90), ('9000', 86)] cherokee ~ [('cherokee', 100), ('3er', 60), ('5er', 60)] clubman ~ [('clubman', 100), ('cl', 90), ('clk', 60)] samara ~ [('samara', 100), ('rav', 72), ('navara', 67)] defender ~ [('defender', 100), ('freelander', 67), ('3er', 60)] 601 ~ [('601', 100), ('s60', 67), ('v60', 67)] cx_reihe ~ [('cx_reihe', 100), ('x_reihe', 93), ('c_reihe', 93)] legacy ~ [('legacy', 100), ('omega', 55), ('logan', 55)] pajero ~ [('pajero', 100), ('rio', 72), ('phaeton', 62)] auris ~ [('auris', 100), ('yaris', 80), ('carisma', 67)] niva ~ [('niva', 100), ('carnival', 90), ('insignia', 77)] s60 ~ [('s60', 100), ('601', 67), ('v60', 67)] nubira ~ [('nubira', 100), ('rav', 72), ('bora', 68)] vivaro ~ [('vivaro', 100), ('viano', 73), ('rio', 72)] g_klasse ~ [('g_klasse', 100), ('a_klasse', 88), ('e_klasse', 88)] lodgy ~ [('lodgy', 100), ('logan', 60), ('legacy', 55)] 850 ~ [('850', 100), ('80', 80), ('500', 67)] range_rover ~ [('range_rover', 100), ('rangerover', 95), ('range_rover_sport', 90)] q3 ~ [('q3', 100), ('a3', 50), ('c3', 50)] serie_2 ~ [('serie_2', 100), ('serie_3', 86), ('serie_1', 86)] glk ~ [('glk', 100), ('gl', 90), ('slk', 67)] charade ~ [('charade', 100), ('sharan', 62), ('rav', 60)] croma ~ [('croma', 100), ('cordoba', 67), ('carisma', 67)] outlander ~ [('outlander', 100), ('lancer', 75), ('freelander', 63)] doblo ~ [('doblo', 100), ('polo', 67), ('combo', 60)] musa ~ [('musa', 100), ('mustang', 73), ('modus', 67)] move ~ [('move', 100), ('discovery', 68), ('range_rover', 68)] 9000 ~ [('9000', 100), ('90', 90), ('900', 86)] v60 ~ [('v60', 100), ('v40', 67), ('v50', 67)] 145 ~ [('145', 100), ('156', 67), ('147', 67)] aveo ~ [('aveo', 100), ('avensis', 68), ('phaeton', 68)] 200 ~ [('200', 100), ('100', 67), ('500', 67)] b_max ~ [('b_max', 100), ('c_max', 80), ('s_max', 80)] range_rover_sport ~ [('range_rover_sport', 100), ('range_rover', 90), ('rangerover', 81)] terios ~ [('terios', 100), ('rio', 90), ('eos', 67)] rangerover ~ [('rangerover', 100), ('range_rover', 95), ('range_rover_sport', 81)] q5 ~ [('q5', 100), ('a5', 50), ('c5', 50)] range_rover_evoque ~ [('range_rover_evoque', 100), ('range_rover', 90), ('rangerover', 81)] materia ~ [('materia', 100), ('meriva', 77), ('astra', 67)] delta ~ [('delta', 100), ('a8', 60), ('jetta', 60)] gl ~ [('gl', 100), ('wrangler', 90), ('glk', 90)] kalina ~ [('kalina', 100), ('ka', 90), ('calibra', 62)] amarok ~ [('amarok', 100), ('samara', 67), ('ka', 60)] elefantino ~ [('elefantino', 100), ('twingo', 72), ('rio', 60)] i3 ~ [('i3', 100), ('yeti', 60), ('qashqai', 60)] kappa ~ [('kappa', 100), ('ka', 90), ('kalina', 55)] serie_3 ~ [('serie_3', 100), ('serie_2', 86), ('serie_1', 86)] serie_1 ~ [('serie_1', 100), ('serie_2', 86), ('serie_3', 86)]
# Анализ нефвных совпадений
# признака "Brand"
for i in data['Brand'].unique():
print(i, '~', process.extract(i, data['Brand'].unique(), limit=3))
volkswagen ~ [('volkswagen', 100), ('volvo', 54), ('opel', 45)] audi ~ [('audi', 100), ('hyundai', 60), ('subaru', 51)] jeep ~ [('jeep', 100), ('peugeot', 36), ('chevrolet', 31)] skoda ~ [('skoda', 100), ('honda', 60), ('kia', 50)] bmw ~ [('bmw', 100), ('volkswagen', 30), ('mazda', 30)] peugeot ~ [('peugeot', 100), ('opel', 45), ('seat', 45)] ford ~ [('ford', 100), ('mercedes_benz', 45), ('alfa_romeo', 45)] mazda ~ [('mazda', 100), ('lada', 67), ('audi', 44)] nissan ~ [('nissan', 100), ('saab', 51), ('seat', 45)] renault ~ [('renault', 100), ('seat', 55), ('sonstige_autos', 51)] mercedes_benz ~ [('mercedes_benz', 100), ('ford', 45), ('seat', 45)] opel ~ [('opel', 100), ('citroen', 51), ('volkswagen', 45)] seat ~ [('seat', 100), ('smart', 67), ('renault', 55)] citroen ~ [('citroen', 100), ('opel', 51), ('chevrolet', 50)] honda ~ [('honda', 100), ('hyundai', 67), ('skoda', 60)] fiat ~ [('fiat', 100), ('daihatsu', 68), ('kia', 57)] mini ~ [('mini', 100), ('nissan', 45), ('mitsubishi', 45)] smart ~ [('smart', 100), ('seat', 67), ('subaru', 55)] hyundai ~ [('hyundai', 100), ('honda', 67), ('audi', 60)] sonstige_autos ~ [('sonstige_autos', 100), ('renault', 51), ('audi', 45)] alfa_romeo ~ [('alfa_romeo', 100), ('rover', 54), ('land_rover', 50)] subaru ~ [('subaru', 100), ('smart', 55), ('audi', 51)] volvo ~ [('volvo', 100), ('volkswagen', 54), ('chevrolet', 54)] mitsubishi ~ [('mitsubishi', 100), ('audi', 45), ('fiat', 45)] kia ~ [('kia', 100), ('suzuki', 72), ('lancia', 60)] suzuki ~ [('suzuki', 100), ('kia', 72), ('subaru', 50)] lancia ~ [('lancia', 100), ('dacia', 73), ('kia', 60)] toyota ~ [('toyota', 100), ('sonstige_autos', 40), ('audi', 36)] chevrolet ~ [('chevrolet', 100), ('chrysler', 59), ('volvo', 54)] dacia ~ [('dacia', 100), ('lancia', 73), ('daihatsu', 72)] daihatsu ~ [('daihatsu', 100), ('dacia', 72), ('fiat', 68)] trabant ~ [('trabant', 100), ('seat', 45), ('fiat', 45)] saab ~ [('saab', 100), ('nissan', 51), ('seat', 50)] chrysler ~ [('chrysler', 100), ('chevrolet', 59), ('rover', 46)] jaguar ~ [('jaguar', 100), ('subaru', 50), ('audi', 45)] daewoo ~ [('daewoo', 100), ('lada', 45), ('alfa_romeo', 38)] porsche ~ [('porsche', 100), ('ford', 45), ('seat', 45)] rover ~ [('rover', 100), ('land_rover', 90), ('alfa_romeo', 54)] land_rover ~ [('land_rover', 100), ('rover', 90), ('lada', 68)] lada ~ [('lada', 100), ('land_rover', 68), ('mazda', 67)]
# Проверка коррелируемости
# признаков датафрейма
data.corr()
Price | RegistrationYear | Power | Kilometer | RegistrationMonth | NumberOfPictures | PostalCode | |
---|---|---|---|---|---|---|---|
Price | 1.000000 | 0.026916 | 0.158872 | -0.333199 | 0.110581 | NaN | 0.076055 |
RegistrationYear | 0.026916 | 1.000000 | -0.000828 | -0.053447 | -0.011619 | NaN | -0.003459 |
Power | 0.158872 | -0.000828 | 1.000000 | 0.024002 | 0.043380 | NaN | 0.021665 |
Kilometer | -0.333199 | -0.053447 | 0.024002 | 1.000000 | 0.009571 | NaN | -0.007698 |
RegistrationMonth | 0.110581 | -0.011619 | 0.043380 | 0.009571 | 1.000000 | NaN | 0.013995 |
NumberOfPictures | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
PostalCode | 0.076055 | -0.003459 | 0.021665 | -0.007698 | 0.013995 | NaN | 1.000000 |
Выводы из анализа данных
- Датафрейм содержит 354369 объектов и 16 признаков, 7 из которых являются 64-битными целочисленными, а 9 типа
object
. Целевым является целочисленный признакPrice
. - В данных есть пропуски. Их количество в разных признаках более 1% от всего количества. Поэтому, их нельзя удалить, а требуется заменить на значение
no_value
. - Категориальные признаки
VehicleType
,FuelType
иModel
имеют значенияother
, которые можно изменить наno_value
также, как и пропуски. - По мнению автора данной работы, следующие признаки могут быть неинформативными для моделей машинного обучения:
DateCrawled
— дата скачивания анкеты из базы (может влиять на цену относительно даты размещения объявления, но незначительно)RegistrationMonth
— месяц регистрации автомобиля (большее значение имеет год регистрации)DateCreated
— дата создания анкеты (в целях предсказания цен автомобилей в будущих анкетах эта информация не акутальна)NumberOfPictures
— количество фотографий автомобиля (может влиять на цену, т.к. фотографии продоваемого объекта вызывают доверие, но признак содержит только нули)PostalCode
— почтовый индекс владельца анкеты (может влиять, если местоположение продавца и покупателя имеет значение, но это не точно)LastSeen
— дата последней активности пользователя (может указывать на продолжительность наличия объявления в сети, но этот признак в меньшей степени может влиять на цену, чем другие признаки, описывающие характеристики продаваемого авто)
- Числовые данные не распределены нормально и имеют выбросы.
- Числовые признаки имеют разный диапазон. Для использования в машинном обучении их требуется стандартизировать.
- Корреляция аттрибутов между собой и с целевым признаком слабая. Наибольшей обратной корреляцией с целевым признаком обладают признаки
RegistrationYear
. - Анализ текстовых категриальных признаков
Model
иBrand
выявил неявное совпадение значенийrange_rover
иrangerover
в признакеModel
. Эти значения следует объеденить вrange_rover
. - Названия признаков не в «змеином» стиле. Можно привести их в соответствие со «змеиный» стилем.
Данные с пропусками:
VehicleType
— тип автомобильного кузоваGearbox
— тип коробки передачModel
— модель автомобиляFuelType
— тип топливаRepaired
— была машина в ремонте или нет
Все признаки с пропусками категориального типа object
. Для них будет использована категория no_value
, указывающая на отсутствие значений.
Ненормальности в данных:
- Атрибут
RegistrationYear
содержит 171 значение с годами производства автомобилей меньше 1900 года и больше 2023 года. Объектами с этими значениями можно принебречь, т.к. их мне 1% от всего количества объектов. Их требуется удалить. - Атрибут
Power
содержит более 11% значений с мощностью двигателя выходящую за пределы известных значейни. Например, меньше, чем у самого маломощного автомобиля «Benz Patent Motorwagen», у которого мощность двигателя равна 0.75 л.с. (https://1gai-ru.turbopages.org/turbo/1gai.ru/s/blog/cars/513900-desyat-samyh-malomoschnyh-avtomobiley.html, 2023). Также атрибутPower
содержит значения мощности двигателя более 5000 л.с.. Это превышает мощность самого мощного автомобиля Devel Sixteen (https://www.driver-helper.ru/text/sovetiy/top-10-samyx-moshhnyx-serijnyx-avto-v-mire). Учитывая большую долю подобных объектов и тот факт, что подобные объекты могут появиться в эксплуатационном данных, вместо того, чтобы от них избавляться, в них следует заменить ненормальные значенияPower
на медианные для каждой группы связкиBrand
Model
.
Предобработка данных¶
# Замена пропусков
# на значение "no_value"
for i in data.columns:
if data_shape - data[i].loc[data[i].notna()].shape[0] > 0:
data.loc[data[i].isna(), i] = 'no_value'
# Замена значения "other" на "no_value"
# для унификации отсутствующей информации
# в признаках "VehicleType" и "FuellType"
data.loc[
(data['VehicleType'] == 'other') |
(data['FuelType'] == 'other') |
(data['Model'] == 'other'),
['VehicleType', 'FuelType', 'Model']
] = 'no_value'
# Определение максимальной даты просмотра объявления
# для установки в качестве предельного срока
# выпуска автомобиля
data['DateCrawled'] = pd.to_datetime(
data['DateCrawled'], format='%Y-%m-%dT%H:%M:%S'
)
date_crawled_max = data['DateCrawled'].max()
date_crawled_max
Timestamp('2016-04-07 14:36:58')
# Удаление неинформативных признаков
data = data.drop([
'DateCrawled',
'RegistrationMonth',
'DateCreated',
'NumberOfPictures',
'PostalCode',
'LastSeen'
], axis=1)
# Удаление ненормальностей в данных
# признака "RegistrationYear"
data = data.loc[
(data['RegistrationYear'] > 1900) &
(data['RegistrationYear'] < date_crawled_max.year)
]
# Замена ненормальностей в данных
# признака "Power" в более 10% объектов
#power_group_median = data.groupby(['Brand', 'Model'])['Power'].median()
power_group_median = data.pivot_table(values='Power', index=['Brand', 'Model'], aggfunc='median')
for i in power_group_median.index:
data['Power'] = np.where(
((data['Power'] < .75) | (data['Power'] > 5000)) &
((data['Power'] == i[0]) & (data['Model'] == i[1])),
power_group_median.loc[i],
data['Power']
)
data['Power']
0 0.0 1 190.0 2 163.0 3 75.0 4 69.0 ... 354364 0.0 354365 0.0 354366 101.0 354367 102.0 354368 100.0 Name: Power, Length: 330174, dtype: float64
# Объединение неявно совпадающих значений признака "Model"
# "range_rover" и "rangerover" в "range_rover"
data.loc[data['Model'] == 'rangerover', 'Model'] = 'range_rover'
# Приведение названий признаков датафрейма
# к "змеиному" стилю
data.columns = [re.sub(r'(?<!^)(?=[A-Z])', '_', i).lower() for i in data.columns]
Проверка результатов предобработкаи данных¶
# Проверка изменений
data.info()
data.head(10)
<class 'pandas.core.frame.DataFrame'> Int64Index: 330174 entries, 0 to 354368 Data columns (total 10 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 price 330174 non-null int64 1 vehicle_type 330174 non-null object 2 registration_year 330174 non-null int64 3 gearbox 330174 non-null object 4 power 330174 non-null float64 5 model 330174 non-null object 6 kilometer 330174 non-null int64 7 fuel_type 330174 non-null object 8 brand 330174 non-null object 9 repaired 330174 non-null object dtypes: float64(1), int64(3), object(6) memory usage: 27.7+ MB
price | vehicle_type | registration_year | gearbox | power | model | kilometer | fuel_type | brand | repaired | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 480 | no_value | 1993 | manual | 0.0 | golf | 150000 | petrol | volkswagen | no_value |
1 | 18300 | coupe | 2011 | manual | 190.0 | no_value | 125000 | gasoline | audi | yes |
2 | 9800 | suv | 2004 | auto | 163.0 | grand | 125000 | gasoline | jeep | no_value |
3 | 1500 | small | 2001 | manual | 75.0 | golf | 150000 | petrol | volkswagen | no |
4 | 3600 | small | 2008 | manual | 69.0 | fabia | 90000 | gasoline | skoda | no |
5 | 650 | sedan | 1995 | manual | 102.0 | 3er | 150000 | petrol | bmw | yes |
6 | 2200 | convertible | 2004 | manual | 109.0 | 2_reihe | 150000 | petrol | peugeot | no |
7 | 0 | no_value | 1980 | manual | 50.0 | no_value | 40000 | no_value | volkswagen | no |
8 | 14500 | bus | 2014 | manual | 125.0 | c_max | 30000 | petrol | ford | no_value |
9 | 999 | small | 1998 | manual | 101.0 | golf | 150000 | no_value | volkswagen | no_value |
# Проверка изменений
data.hist()
plt.subplots_adjust(wspace=.4, hspace=.5)
data.describe()
price | registration_year | power | kilometer | |
---|---|---|---|---|
count | 330174.000000 | 330174.000000 | 330174.000000 | 330174.000000 |
mean | 4540.116554 | 2002.089226 | 111.900141 | 127920.581269 |
std | 4564.387345 | 6.802931 | 182.410180 | 37913.642129 |
min | 0.000000 | 1910.000000 | 0.000000 | 5000.000000 |
25% | 1149.000000 | 1999.000000 | 70.000000 | 125000.000000 |
50% | 2850.000000 | 2002.000000 | 105.000000 | 150000.000000 |
75% | 6500.000000 | 2007.000000 | 143.000000 | 150000.000000 |
max | 20000.000000 | 2015.000000 | 20000.000000 | 150000.000000 |
# Анализ уникальных значений
# категориальных текстовых признаков
for i in data.select_dtypes(include='object').columns:
print(f'Уникальные значения признака "{i}":', data[i].unique())
print(f'Всего унимальных значений признака "{i}":', len(data[i].unique()))
print()
Уникальные значения признака "vehicle_type": ['no_value' 'coupe' 'suv' 'small' 'sedan' 'convertible' 'bus' 'wagon'] Всего унимальных значений признака "vehicle_type": 8 Уникальные значения признака "gearbox": ['manual' 'auto' 'no_value'] Всего унимальных значений признака "gearbox": 3 Уникальные значения признака "model": ['golf' 'no_value' 'grand' 'fabia' '3er' '2_reihe' 'c_max' '3_reihe' 'passat' 'navara' 'ka' 'twingo' 'a_klasse' 'scirocco' '5er' 'arosa' 'civic' 'transporter' 'punto' 'e_klasse' 'corsa' 'one' 'fortwo' 'clio' '1er' 'b_klasse' 'signum' 'astra' 'a8' 'jetta' 'polo' 'fiesta' 'c_klasse' 'micra' 'sprinter' '156' 'escort' 'forester' 'xc_reihe' 'scenic' 'a4' 'a1' 'insignia' 'combo' 'focus' 'tt' 'a6' 'jazz' 'omega' 'slk' '7er' '80' '147' '100' 'meriva' 'z_reihe' 'sorento' 'v40' 'ibiza' 'mustang' 'eos' 'vito' 'touran' 'getz' 'a3' 'megane' 'lupo' 'r19' 'caddy' 'mondeo' 'cordoba' 'colt' 'impreza' 'vectra' 'berlingo' 'tiguan' 'sharan' '6_reihe' 'c4' 'panda' 'up' 'i_reihe' 'ceed' 'kangoo' '5_reihe' 'yeti' 'octavia' 'zafira' 'mii' 'rx_reihe' '6er' 'fox' 'matiz' 'beetle' 'rio' 'touareg' 'logan' 'spider' 'cuore' 's_max' 'modus' 'a2' 'galaxy' 'c3' 'viano' 's_klasse' '1_reihe' 'avensis' 'roomster' 'sl' 'kaefer' 'santa' 'cooper' 'leon' '4_reihe' 'a5' 'sportage' 'laguna' 'ptcruiser' 'clk' 'primera' 'espace' 'x_reihe' 'exeo' '159' 'transit' 'juke' 'qashqai' 'carisma' 'accord' 'corolla' 'lanos' 'phaeton' 'verso' 'swift' 'rav' 'picanto' 'boxster' 'kalos' 'superb' 'stilo' 'alhambra' 'mx_reihe' 'roadster' 'ypsilon' 'cayenne' 'galant' 'justy' '90' 'sirion' 'crossfire' 'agila' 'duster' 'v50' '500' 'c_reihe' 'v_klasse' 'm_klasse' 'yaris' 'c5' 'aygo' 'almera' 'seicento' 'cc' 'fusion' '911' 'bora' 'forfour' 'm_reihe' 'cl' 'tigra' '300c' 'cr_reihe' 'spark' 'v70' 'kuga' 'x_type' 'ducato' 's_type' 'x_trail' 'toledo' 'altea' 'voyager' 'calibra' 'carnival' 'bravo' 'antara' 'tucson' 'c1' 'kadett' 'citigo' 'jimny' 'wrangler' 'lybra' 'q7' 'lancer' 'captiva' 'discovery' 'freelander' 'sandero' 'note' '900' 'cherokee' 'clubman' 'samara' 'defender' 'cx_reihe' 'legacy' '601' 'pajero' 'c2' 'niva' 's60' 'nubira' 'vivaro' 'g_klasse' 'auris' 'lodgy' '850' 'range_rover' 'q3' 'glk' 'charade' 'croma' 'outlander' 'doblo' 'musa' 'move' '9000' 'v60' '145' '200' 'b_max' 'range_rover_sport' 'aveo' 'terios' 'q5' 'range_rover_evoque' 'materia' 'delta' 'gl' 'serie_2' 'kalina' 'elefantino' 'i3' 'amarok' 'kappa' 'serie_3' 'serie_1'] Всего унимальных значений признака "model": 249 Уникальные значения признака "fuel_type": ['petrol' 'gasoline' 'no_value' 'lpg' 'cng' 'electric' 'hybrid'] Всего унимальных значений признака "fuel_type": 7 Уникальные значения признака "brand": ['volkswagen' 'audi' 'jeep' 'skoda' 'bmw' 'peugeot' 'ford' 'mazda' 'nissan' 'renault' 'mercedes_benz' 'seat' 'honda' 'fiat' 'opel' 'mini' 'smart' 'sonstige_autos' 'alfa_romeo' 'subaru' 'volvo' 'mitsubishi' 'kia' 'hyundai' 'suzuki' 'lancia' 'citroen' 'toyota' 'chevrolet' 'dacia' 'daihatsu' 'trabant' 'saab' 'chrysler' 'jaguar' 'daewoo' 'porsche' 'rover' 'land_rover' 'lada'] Всего унимальных значений признака "brand": 40 Уникальные значения признака "repaired": ['no_value' 'yes' 'no'] Всего унимальных значений признака "repaired": 3
# Анализ количества
# удалённых объектов
print('Всего удалённо объектов:', data_shape - data.shape[0])
print(f'Доля удалённых объектов: {(1 - data.shape[0] / data_shape)*100}%')
Всего удалённо объектов: 24195 Доля удалённых объектов: 6.827628827578036%
# Анализ коррляции признаков
# после предобработки данных
data.corr()
price | registration_year | power | kilometer | |
---|---|---|---|---|
price | 1.000000 | 0.490673 | 0.164822 | -0.336981 |
registration_year | 0.490673 | 1.000000 | 0.067158 | -0.220855 |
power | 0.164822 | 0.067158 | 1.000000 | 0.027308 |
kilometer | -0.336981 | -0.220855 | 0.027308 | 1.000000 |
# Анализ коррляции признаков
# после предобработки данных
data.phik_matrix()
interval columns not set, guessing: ['price', 'registration_year', 'power', 'kilometer']
price | vehicle_type | registration_year | gearbox | power | model | kilometer | fuel_type | brand | repaired | |
---|---|---|---|---|---|---|---|---|---|---|
price | 1.000000 | 0.276073 | 0.609296 | 0.305974 | 0.005928 | 0.567335 | 0.311190 | 0.263280 | 0.356071 | 0.366852 |
vehicle_type | 0.276073 | 1.000000 | 0.209775 | 0.336422 | 0.005379 | 0.942273 | 0.164271 | 0.573069 | 0.644883 | 0.207384 |
registration_year | 0.609296 | 0.209775 | 1.000000 | 0.146034 | 0.000000 | 0.574288 | 0.307834 | 0.274391 | 0.356596 | 0.235890 |
gearbox | 0.305974 | 0.336422 | 0.146034 | 1.000000 | 0.008489 | 0.626205 | 0.070258 | 0.282519 | 0.523394 | 0.482828 |
power | 0.005928 | 0.005379 | 0.000000 | 0.008489 | 1.000000 | 0.000000 | 0.000000 | 0.000000 | 0.002184 | 0.013631 |
model | 0.567335 | 0.942273 | 0.574288 | 0.626205 | 0.000000 | 1.000000 | 0.437192 | 0.711109 | 0.997654 | 0.281206 |
kilometer | 0.311190 | 0.164271 | 0.307834 | 0.070258 | 0.000000 | 0.437192 | 1.000000 | 0.137378 | 0.276410 | 0.226748 |
fuel_type | 0.263280 | 0.573069 | 0.274391 | 0.282519 | 0.000000 | 0.711109 | 0.137378 | 1.000000 | 0.355297 | 0.194107 |
brand | 0.356071 | 0.644883 | 0.356596 | 0.523394 | 0.002184 | 0.997654 | 0.276410 | 0.355297 | 1.000000 | 0.164810 |
repaired | 0.366852 | 0.207384 | 0.235890 | 0.482828 | 0.013631 | 0.281206 | 0.226748 | 0.194107 | 0.164810 | 1.000000 |
Выводы из преварительной добработки данных
Датафрейм был успешно оптимизирован и подготовлен к использованию в машинном обучении:
- Все неопределенные значения во всех признаках заменены на
no_value
. - Удалены неинформативные признаки
DateCrawled
,RegistrationMonth
,DateCreated
,NumberOfPictures
,PostalCode
,LastSeen
. - Удален 171 объект с датой регистрации автомобиля в признаке
RegistrationYear
меньше 1900 года и старше 2023 года. - Заменены значения признака
Power
с мощностью двигателя менее 0.75 л.с. и более 5000 на медианные значения группBrand
+Model
. - Изменены неявно совпадающие значения
range_rover
иrangerover
признакаModel
на значениеrange_rover
. - Названия признаков приведены к «змеиному» стилю.
После предвартиельной обработки данных все числовые признаки имеют среднюю и слабую корреляцию между собой и с целевым признаком. Наибольшей прямой корреляцией с целевым признаком обладает признак registration_year
, а обратной корреляцией kilometer
. После предварительной обработки данных стало очевидным то, что требуется все категориальные признаки кодировать с помощью технологии One Hot Encoding.
Обучение моделей¶
Полезные функции подготовки данных и подбора моделей и их параметров¶
# Функция для кодирования категориальных текстовых признаков
# с помощью технологии One Hot Encoding (pd.get_dummies())
def features_get_dummies(features, column_name):
features = features.join(
pd.get_dummies(
data[column_name],
prefix=column_name,
prefix_sep='_',
drop_first=True
)
)
features = features.drop(column_name, axis=1)
return features
# Функция подготовки данных перед подбором моделей и их параметров
def data_preprocessing(data, target_name):
# Перемешивание объектов
# для их лучшего распределения в выборках
data = shuffle(data, random_state=STATE)
# Разделение датафреймов на целевую и нецелевую выборку
features = data.drop([target_name], axis=1)
target = data[target_name]
# Разделение целевой и нецелевой выборки
# на обучающие и тестовые выборки
features_train, features_test, target_train, target_test = train_test_split(
features,
target,
test_size=.25,
random_state=STATE
)
return features_train, features_test, target_train, target_test
# Функция создания структуры пайплайна
def params_and_model_selection(
features_train,
features_test,
target_train,
target_test,
model_params
):
start_time = time.time()
#funtion_time = %timeit
# Стандартизация числовых значений
numeric_transformer = make_pipeline(
StandardScaler()
)
# Шаг препроцессинга в Пайплайне
preprocessor = make_column_transformer(
(numeric_transformer, features_train.columns)
)
# Pipeline
pipe = Pipeline([
('preprocessor', preprocessor),
('regressor', model_params[0]['regressor'][0])
])
pipe.fit(features_train, target_train)
#predict = pipe.predict(features_test) # спорно, т.к. это должно быть в результате, а здесь должна использоваться валидационная выборка
# Передача функции ошибки через make_scorer в HalvingGridSearchCV
smape_score = make_scorer(
mean_squared_error,
squared=False # Для RMSE
)
# HalvingGridSearchCV
# (о подборе оптимальных параметров:
# https://scikit-learn.ru/3-2-tuning-the-hyper-parameters-of-an-estimator/)
#grid = HalvingRandomSearchCV(
grid = HalvingGridSearchCV(
pipe,
model_params,
cv=4, # параметр KFold для кроссвалидации (обучющая и валидационная выборки 75:25)
n_jobs=-1, # количество параллельно выполняемых заданий (-1 - задействованы все процессоры)
scoring=smape_score, # Передача функции ошибки через make_scorer в HalvingGridSearchCV
error_score='raise', #0 , #
random_state=STATE
)
grid.fit(features_train, target_train)
finish_time = time.time()
funtion_time = finish_time - start_time
return grid, funtion_time
# Вывод на печать результатов модели
def print_model_result(grids, data_times, model_name):
print('Модель :', model_name)
print('RMSE :', grids[-1].best_score_)
print(f'Время : {data_times[-1]} секунд')
print('Параметры:\n', grids[-1].best_estimator_)
print()
print('-'*20)
print()
Функции моделей¶
# LinearRegression
def grids_LinearRegression(
features_train,
features_test,
target_train,
target_test,
grids,
data_times
):
grid, time_best = params_and_model_selection(
features_train,
features_test,
target_train,
target_test,
[{
'regressor': [LinearRegression()] # score: R^2
}]
)
grids.append(grid)
data_times.append(time_best)
return grids, data_times
# DecisionTreeRegressor
def grids_DecisionTreeRegressor(
features_train,
features_test,
target_train,
target_test,
grids,
data_times
):
grids_this = 0
grids_best = 0
funtion_time = 0
time_best = 0
# Поиск "regressor__max_depth"
range_min = 1
range_max = 201
range_step = 20
for i in range(1, 5, 1):
# Поиск лучших параметров
grids_this, funtion_time = params_and_model_selection(
features_train,
features_test,
target_train,
target_test,
[{
'regressor': [DecisionTreeRegressor(random_state=STATE)], # score: R^2
'regressor__max_depth': range(
range_min,
range_max,
range_step
)
}]
)
# Выбор лучшей модели
if grids_best == 0:
grids_best = grids_this
time_best = funtion_time
elif grids_this.best_score_ > grids_best.best_score_:
grids_best = grids_this
time_best = funtion_time
if range_step == 1: break
# Выбор параметров поиска
regressor__max_depth = grids_this.best_params_['regressor__max_depth']
if int(regressor__max_depth - range_step / 2) > 0:
range_min = int(regressor__max_depth - range_step / 2)
else:
range_min = regressor__max_depth
range_max = int(regressor__max_depth + range_step / 2) + 1
range_step = int(range_step / 2)
grids.append(grids_best)
data_times.append(time_best)
return grids, data_times
# RandomForestRegressor
def grids_RandomForestRegressor(
features_train,
features_test,
target_train,
target_test,
grids,
data_times
):
grids_this = 0
grids_best = 0
funtion_time = 0
time_best = 0
# Поиск "regressor__max_depth"
range_min = 20
range_max = 61
range_step = 20
for i in range(1, 5, 1):
# Поиск лучших параметров
grids_this, funtion_time = params_and_model_selection(
features_train,
features_test,
target_train,
target_test,
[{
'regressor': [RandomForestRegressor(random_state=STATE)], # score: R^2
'regressor__max_depth': range(
range_min,
range_max,
range_step
),
'regressor__n_estimators': [1]
}]
)
# Выбор лучшей модели
if grids_best == 0:
grids_best = grids_this
time_best = funtion_time
elif grids_this.best_score_ > grids_best.best_score_:
grids_best = grids_this
time_best = funtion_time
if range_step == 1: break
# Выбор параметров поиска
regressor__max_depth = grids_this.best_params_['regressor__max_depth']
if int(regressor__max_depth - range_step / 2) > 0:
range_min = int(regressor__max_depth - range_step / 2)
else:
range_min = regressor__max_depth
range_max = int(regressor__max_depth + range_step / 2) + 1
range_step = int(range_step / 2)
if range_step == 0: range_step = 1
# Поиск "regressor__n_estimators"
range_min = 10
range_max = 31
range_step = 10
for i in range(1, 5, 1):
# Поиск лучших параметров
grids_this, funtion_time = params_and_model_selection(
features_train,
features_test,
target_train,
target_test,
[{
'regressor': [RandomForestRegressor(random_state=STATE)], # score: R^2
'regressor__max_depth': [regressor__max_depth],
'regressor__n_estimators': range(
range_min,
range_max,
range_step
)
}]
)
# Выбор лучшей модели
if grids_best == 0:
grids_best = grids_this
time_best = funtion_time
elif grids_this.best_score_ > grids_best.best_score_:
grids_best = grids_this
time_best = funtion_time
if range_step == 1: break
# Выбор параметров поиска
regressor__n_estimators = grids_this.best_params_['regressor__n_estimators']
if int(regressor__n_estimators - range_step / 2) > 0:
range_min = int(regressor__n_estimators - range_step / 2)
else:
range_min = regressor__n_estimators
range_max = int(regressor__n_estimators + range_step / 2) + 1
range_step = int(range_step / 10)
if range_step == 0: range_step = 1
grids.append(grids_best)
data_times.append(time_best)
return grids, data_times
# SGDRegressor
def grids_SGDRegressor(
features_train,
features_test,
target_train,
target_test,
grids,
data_times
):
# Поиск лучших параметров
grids_best, time_best = params_and_model_selection(
features_train,
features_test,
target_train,
target_test,
[{
'regressor': [SGDRegressor()]
}]
)
grids.append(grids_best)
data_times.append(time_best)
return grids, data_times
# MLPRegressor
def grids_MLPRegressor(
features_train,
features_test,
target_train,
target_test,
grids,
data_times
):
# Поиск лучших параметров
grids_best, time_best = params_and_model_selection(
features_train,
features_test,
target_train,
target_test,
[{
'regressor': [MLPRegressor()]
}]
)
grids.append(grids_best)
data_times.append(time_best)
return grids, data_times
# CatBoostRegressor
def grids_CatBoostRegressor(
features_train,
features_test,
target_train,
target_test,
grids,
data_times
):
grids_best, time_best = params_and_model_selection(
features_train,
features_test,
target_train,
target_test,
[{
'regressor': [CatBoostRegressor()]
}]
)
grids.append(grids_best)
data_times.append(time_best)
return grids, data_times
# LGBMRegressor
def grids_LGBMRegressor(
features_train,
features_test,
target_train,
target_test,
grids,
data_times
):
# Поиск лучших параметров
grids_this = 0
grids_best = 0
funtion_time = 0
time_best = 0
# Поиск "regressor__max_depth"
range_min = 1
range_max = 201
range_step = 20
for i in range(1, 5, 1):
#print('regressor__max_depth =', range(range_min, range_max, range_step))
# Поиск лучших параметров
grids_this, funtion_time = params_and_model_selection(
features_train,
features_test,
target_train,
target_test,
[{
'regressor': [LGBMRegressor(random_state=STATE)], # score: R^2
'regressor__max_depth': range(
range_min,
range_max,
range_step
),
'regressor__n_estimators': [1]
}]
)
# Выбор лучшей модели
if grids_best == 0:
grids_best = grids_this
time_best = funtion_time
elif grids_this.best_score_ > grids_best.best_score_:
grids_best = grids_this
time_best = funtion_time
if range_step == 1: break
# Выбор параметров поиска
regressor__max_depth = grids_this.best_params_['regressor__max_depth']
if int(regressor__max_depth - range_step / 2) > 0:
range_min = int(regressor__max_depth - range_step / 2)
else:
range_min = regressor__max_depth
range_max = int(regressor__max_depth + range_step / 2) + 1
range_step = int(range_step / 2)
if range_step == 0: range_step = 1
# Поиск "regressor__n_estimators"
range_min = 1
range_max = 51
range_step = 10
for i in range(1, 5, 1):
#print('regressor__n_estimators =', range(range_min, range_max, range_step))
# Поиск лучших параметров
grids_this, funtion_time = params_and_model_selection(
features_train,
features_test,
target_train,
target_test,
[{
'regressor': [LGBMRegressor(random_state=STATE)], # score: R^2
'regressor__max_depth': [regressor__max_depth],
'regressor__n_estimators': range(
range_min,
range_max,
range_step
)
}]
)
# Выбор лучшей модели
if grids_best == 0:
grids_best = grids_this
time_best = funtion_time
elif grids_this.best_score_ > grids_best.best_score_:
grids_best = grids_this
time_best = funtion_time
if range_step == 1: break
# Выбор параметров поиска
regressor__n_estimators = grids_this.best_params_['regressor__n_estimators']
if int(regressor__n_estimators - range_step / 2) > 0:
range_min = int(regressor__n_estimators - range_step / 2)
else:
range_min = regressor__n_estimators
range_max = int(regressor__n_estimators + range_step / 2) + 1
range_step = int(range_step / 10)
if range_step == 0: range_step = 1
grids.append(grids_best)
data_times.append(time_best)
return grids, data_times
Применение функций¶
# Подготовка выборок из датафрейма
# Разделение обучающего датафрейма на целевую и нецелевую выборку
features_train, features_test, target_train, target_test = data_preprocessing(data, 'price')
# Кодирование категориальных текстовых признаков
# с помощью технологии TargetEncoder
features_encoding = ['vehicle_type', 'gearbox', 'model', 'fuel_type', 'brand', 'repaired']
te_fit = TargetEncoder().fit(features_train[features_encoding], target_train)
features_train[features_encoding] = te_fit.transform(features_train[features_encoding])
features_test[features_encoding] = te_fit.transform(features_test[features_encoding])
print(features_train.info())
features_train.head()
<class 'pandas.core.frame.DataFrame'> Int64Index: 247630 entries, 223244 to 257691 Data columns (total 9 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 vehicle_type 247630 non-null float64 1 registration_year 247630 non-null int64 2 gearbox 247630 non-null float64 3 power 247630 non-null float64 4 model 247630 non-null float64 5 kilometer 247630 non-null int64 6 fuel_type 247630 non-null float64 7 brand 247630 non-null float64 8 repaired 247630 non-null float64 dtypes: float64(7), int64(2) memory usage: 18.9 MB None
vehicle_type | registration_year | gearbox | power | model | kilometer | fuel_type | brand | repaired | |
---|---|---|---|---|---|---|---|---|---|
223244 | 4759.405926 | 2005 | 6967.682167 | 190.0 | 5767.135775 | 150000 | 6756.042179 | 6010.171569 | 5385.242357 |
98910 | 3540.618365 | 2000 | 2270.272120 | 0.0 | 3793.928198 | 150000 | 3394.617204 | 3261.835992 | 2672.054560 |
62348 | 6827.486197 | 2010 | 4081.965033 | 120.0 | 3223.449143 | 80000 | 3807.829533 | 3147.120726 | 5385.242357 |
318127 | 4759.405926 | 2009 | 4081.965033 | 143.0 | 5890.414408 | 125000 | 6756.042179 | 6384.934049 | 5385.242357 |
290228 | 4759.405926 | 1993 | 4081.965033 | 45.0 | 2604.409468 | 150000 | 3394.617204 | 4530.331913 | 5385.242357 |
# Поиск лучших моделей и их параметров
data_grids = []
data_times = []
# LinearRegression (dummy-model)
data_grids, data_times = grids_LinearRegression(
features_train,
features_test,
target_train,
target_test,
data_grids,
data_times
)
print_model_result(data_grids, data_times, 'LinearRegression')
Модель : LinearRegression RMSE : 3069.23279496429 Время : 2.045431137084961 секунд Параметры: Pipeline(steps=[('preprocessor', ColumnTransformer(transformers=[('pipeline', Pipeline(steps=[('standardscaler', StandardScaler())]), Index(['vehicle_type', 'registration_year', 'gearbox', 'power', 'model', 'kilometer', 'fuel_type', 'brand', 'repaired'], dtype='object'))])), ('regressor', LinearRegression())]) --------------------
# DecisionTreeRegressor
data_grids, data_times = grids_DecisionTreeRegressor(
features_train,
features_test,
target_train,
target_test,
data_grids,
data_times
)
print_model_result(data_grids, data_times, 'DecisionTreeRegressor')
Модель : DecisionTreeRegressor RMSE : 3520.5951256556477 Время : 14.969941139221191 секунд Параметры: Pipeline(steps=[('preprocessor', ColumnTransformer(transformers=[('pipeline', Pipeline(steps=[('standardscaler', StandardScaler())]), Index(['vehicle_type', 'registration_year', 'gearbox', 'power', 'model', 'kilometer', 'fuel_type', 'brand', 'repaired'], dtype='object'))])), ('regressor', DecisionTreeRegressor(max_depth=1, random_state=42))]) --------------------
# SGDRegressor
data_grids, data_times = grids_SGDRegressor(
features_train,
features_test,
target_train,
target_test,
data_grids,
data_times
)
print_model_result(data_grids, data_times, 'SGDRegressor')
Модель : SGDRegressor RMSE : 3089.779746424787 Время : 5.435046195983887 секунд Параметры: Pipeline(steps=[('preprocessor', ColumnTransformer(transformers=[('pipeline', Pipeline(steps=[('standardscaler', StandardScaler())]), Index(['vehicle_type', 'registration_year', 'gearbox', 'power', 'model', 'kilometer', 'fuel_type', 'brand', 'repaired'], dtype='object'))])), ('regressor', SGDRegressor())]) --------------------
# CatBoostRegressor
data_grids, data_times = grids_CatBoostRegressor(
features_train,
features_test,
target_train,
target_test,
data_grids,
data_times
)
print_model_result(data_grids, data_times, 'CatBoostRegressor')
Learning rate set to 0.097814 0: learn: 4268.9324355 total: 101ms remaining: 1m 41s 1: learn: 4010.6410460 total: 138ms remaining: 1m 8s 2: learn: 3784.1669481 total: 181ms remaining: 1m 3: learn: 3579.5232253 total: 218ms remaining: 54.4s 4: learn: 3400.3400769 total: 255ms remaining: 50.7s 5: learn: 3242.2229009 total: 299ms remaining: 49.5s 6: learn: 3101.0951468 total: 336ms remaining: 47.6s 7: learn: 2976.0448761 total: 382ms remaining: 47.3s 8: learn: 2865.0184049 total: 418ms remaining: 46s 9: learn: 2769.5971830 total: 455ms remaining: 45s 10: learn: 2688.7300008 total: 496ms remaining: 44.6s 11: learn: 2611.4192577 total: 533ms remaining: 43.8s 12: learn: 2547.1626659 total: 581ms remaining: 44.1s 13: learn: 2486.4814447 total: 618ms remaining: 43.5s 14: learn: 2436.7976506 total: 659ms remaining: 43.3s 15: learn: 2388.2476102 total: 698ms remaining: 42.9s 16: learn: 2349.1750890 total: 734ms remaining: 42.4s 17: learn: 2311.5397711 total: 777ms remaining: 42.4s 18: learn: 2279.0709995 total: 812ms remaining: 41.9s 19: learn: 2249.5200641 total: 853ms remaining: 41.8s 20: learn: 2225.0638599 total: 896ms remaining: 41.8s 21: learn: 2202.5869378 total: 939ms remaining: 41.7s 22: learn: 2181.3138603 total: 989ms remaining: 42s 23: learn: 2162.7987170 total: 1.03s remaining: 41.9s 24: learn: 2146.4960625 total: 1.08s remaining: 42s 25: learn: 2130.0284629 total: 1.11s remaining: 41.8s 26: learn: 2114.8175590 total: 1.16s remaining: 41.9s 27: learn: 2103.0645814 total: 1.21s remaining: 41.9s 28: learn: 2092.1972325 total: 1.25s remaining: 42s 29: learn: 2082.0707635 total: 1.29s remaining: 41.8s 30: learn: 2072.5199001 total: 1.33s remaining: 41.7s 31: learn: 2061.9256277 total: 1.38s remaining: 41.6s 32: learn: 2053.0818270 total: 1.41s remaining: 41.3s 33: learn: 2045.8192642 total: 1.45s remaining: 41.2s 34: learn: 2038.9822441 total: 1.49s remaining: 41s 35: learn: 2033.6254723 total: 1.52s remaining: 40.7s 36: learn: 2028.4506164 total: 1.57s remaining: 40.8s 37: learn: 2020.3179117 total: 1.6s remaining: 40.6s 38: learn: 2015.5523488 total: 1.64s remaining: 40.4s 39: learn: 2010.3228599 total: 1.68s remaining: 40.2s 40: learn: 2004.3267459 total: 1.72s remaining: 40.2s 41: learn: 1998.5867855 total: 1.76s remaining: 40.3s 42: learn: 1994.6349425 total: 1.8s remaining: 40.1s 43: learn: 1991.2172038 total: 1.84s remaining: 40s 44: learn: 1988.5022244 total: 1.88s remaining: 39.9s 45: learn: 1984.3815351 total: 1.92s remaining: 39.8s 46: learn: 1978.7809852 total: 1.96s remaining: 39.8s 47: learn: 1976.1214547 total: 2s remaining: 39.6s 48: learn: 1972.7190666 total: 2.04s remaining: 39.5s 49: learn: 1968.6779676 total: 2.07s remaining: 39.4s 50: learn: 1963.1907207 total: 2.11s remaining: 39.2s 51: learn: 1960.2988615 total: 2.15s remaining: 39.2s 52: learn: 1957.8772666 total: 2.18s remaining: 39s 53: learn: 1955.8311334 total: 2.22s remaining: 39s 54: learn: 1953.0846871 total: 2.26s remaining: 38.8s 55: learn: 1950.5234577 total: 2.29s remaining: 38.6s 56: learn: 1946.2558032 total: 2.34s remaining: 38.7s 57: learn: 1943.7185580 total: 2.37s remaining: 38.5s 58: learn: 1941.8747937 total: 2.41s remaining: 38.4s 59: learn: 1938.4969708 total: 2.45s remaining: 38.4s 60: learn: 1935.7775907 total: 2.48s remaining: 38.2s 61: learn: 1934.4810527 total: 2.52s remaining: 38.1s 62: learn: 1931.2776221 total: 2.55s remaining: 38s 63: learn: 1929.5589586 total: 2.59s remaining: 37.8s 64: learn: 1926.3388558 total: 2.63s remaining: 37.9s 65: learn: 1923.6750746 total: 2.66s remaining: 37.7s 66: learn: 1922.5330800 total: 2.69s remaining: 37.5s 67: learn: 1920.8281168 total: 2.74s remaining: 37.6s 68: learn: 1918.3175340 total: 2.77s remaining: 37.4s 69: learn: 1915.7574885 total: 2.82s remaining: 37.4s 70: learn: 1913.6291187 total: 2.85s remaining: 37.3s 71: learn: 1911.3063450 total: 2.88s remaining: 37.2s 72: learn: 1910.0146675 total: 2.92s remaining: 37.1s 73: learn: 1908.5637534 total: 2.96s remaining: 37s 74: learn: 1906.4780092 total: 2.99s remaining: 36.9s 75: learn: 1905.1691680 total: 3.03s remaining: 36.9s 76: learn: 1903.1997125 total: 3.06s remaining: 36.7s 77: learn: 1901.8821513 total: 3.1s remaining: 36.7s 78: learn: 1900.1337731 total: 3.14s remaining: 36.6s 79: learn: 1899.1992384 total: 3.17s remaining: 36.5s 80: learn: 1897.3895928 total: 3.21s remaining: 36.4s 81: learn: 1895.8139714 total: 3.24s remaining: 36.3s 82: learn: 1893.6768162 total: 3.28s remaining: 36.2s 83: learn: 1891.4986383 total: 3.32s remaining: 36.2s 84: learn: 1889.8489640 total: 3.36s remaining: 36.2s 85: learn: 1888.8394635 total: 3.4s remaining: 36.1s 86: learn: 1887.7631764 total: 3.43s remaining: 36s 87: learn: 1886.0409055 total: 3.47s remaining: 35.9s 88: learn: 1884.4447070 total: 3.51s remaining: 35.9s 89: learn: 1882.9561620 total: 3.54s remaining: 35.8s 90: learn: 1881.6691320 total: 3.58s remaining: 35.8s 91: learn: 1879.9898022 total: 3.63s remaining: 35.8s 92: learn: 1879.0661497 total: 3.66s remaining: 35.7s 93: learn: 1877.3728090 total: 3.71s remaining: 35.7s 94: learn: 1876.1311891 total: 3.74s remaining: 35.7s 95: learn: 1874.7015338 total: 3.78s remaining: 35.6s 96: learn: 1873.5465573 total: 3.82s remaining: 35.6s 97: learn: 1872.7624631 total: 3.85s remaining: 35.5s 98: learn: 1870.7938629 total: 3.9s remaining: 35.5s 99: learn: 1869.2673785 total: 3.93s remaining: 35.4s 100: learn: 1867.9182654 total: 3.97s remaining: 35.4s 101: learn: 1866.6275050 total: 4.02s remaining: 35.4s 102: learn: 1864.8356438 total: 4.05s remaining: 35.3s 103: learn: 1863.8575595 total: 4.09s remaining: 35.3s 104: learn: 1862.9924644 total: 4.13s remaining: 35.2s 105: learn: 1862.4445378 total: 4.16s remaining: 35.1s 106: learn: 1861.5129977 total: 4.2s remaining: 35.1s 107: learn: 1859.9204173 total: 4.24s remaining: 35s 108: learn: 1858.7382351 total: 4.28s remaining: 35s 109: learn: 1857.3892643 total: 4.31s remaining: 34.9s 110: learn: 1856.6283950 total: 4.35s remaining: 34.8s 111: learn: 1855.4247124 total: 4.39s remaining: 34.8s 112: learn: 1854.3838611 total: 4.43s remaining: 34.8s 113: learn: 1853.5954063 total: 4.46s remaining: 34.7s 114: learn: 1852.6219831 total: 4.51s remaining: 34.7s 115: learn: 1851.9004090 total: 4.55s remaining: 34.7s 116: learn: 1851.2159787 total: 4.59s remaining: 34.6s 117: learn: 1850.5915299 total: 4.62s remaining: 34.5s 118: learn: 1849.8976949 total: 4.65s remaining: 34.5s 119: learn: 1848.8652075 total: 4.7s remaining: 34.4s 120: learn: 1847.8232751 total: 4.73s remaining: 34.4s 121: learn: 1847.2230104 total: 4.77s remaining: 34.3s 122: learn: 1846.5371880 total: 4.8s remaining: 34.3s 123: learn: 1845.8262919 total: 4.84s remaining: 34.2s 124: learn: 1845.3827261 total: 4.88s remaining: 34.1s 125: learn: 1844.6014292 total: 4.91s remaining: 34.1s 126: learn: 1843.7578276 total: 4.95s remaining: 34s 127: learn: 1843.0869271 total: 4.99s remaining: 34s 128: learn: 1841.9619538 total: 5.02s remaining: 33.9s 129: learn: 1841.2991078 total: 5.06s remaining: 33.9s 130: learn: 1840.4296282 total: 5.1s remaining: 33.8s 131: learn: 1839.1957353 total: 5.13s remaining: 33.7s 132: learn: 1838.1950594 total: 5.17s remaining: 33.7s 133: learn: 1837.3731522 total: 5.21s remaining: 33.6s 134: learn: 1836.7928530 total: 5.24s remaining: 33.6s 135: learn: 1835.7954051 total: 5.28s remaining: 33.6s 136: learn: 1834.9445079 total: 5.31s remaining: 33.5s 137: learn: 1834.0497983 total: 5.36s remaining: 33.5s 138: learn: 1833.0138969 total: 5.39s remaining: 33.4s 139: learn: 1832.2018222 total: 5.42s remaining: 33.3s 140: learn: 1831.6937920 total: 5.46s remaining: 33.3s 141: learn: 1831.1211248 total: 5.49s remaining: 33.2s 142: learn: 1830.3238911 total: 5.53s remaining: 33.1s 143: learn: 1829.7812267 total: 5.57s remaining: 33.1s 144: learn: 1829.0570199 total: 5.6s remaining: 33s 145: learn: 1828.4505519 total: 5.64s remaining: 33s 146: learn: 1828.0967705 total: 5.68s remaining: 33s 147: learn: 1827.0128422 total: 5.72s remaining: 32.9s 148: learn: 1826.1013723 total: 5.76s remaining: 32.9s 149: learn: 1825.4182648 total: 5.8s remaining: 32.9s 150: learn: 1824.6176075 total: 5.84s remaining: 32.8s 151: learn: 1823.8060667 total: 5.88s remaining: 32.8s 152: learn: 1823.1678913 total: 5.92s remaining: 32.7s 153: learn: 1822.7543456 total: 5.95s remaining: 32.7s 154: learn: 1822.1850611 total: 5.98s remaining: 32.6s 155: learn: 1821.5692808 total: 6.03s remaining: 32.6s 156: learn: 1820.9095442 total: 6.06s remaining: 32.6s 157: learn: 1820.4462638 total: 6.09s remaining: 32.5s 158: learn: 1819.8860522 total: 6.14s remaining: 32.5s 159: learn: 1819.2441115 total: 6.17s remaining: 32.4s 160: learn: 1818.4616216 total: 6.22s remaining: 32.4s 161: learn: 1817.9427457 total: 6.26s remaining: 32.4s 162: learn: 1817.4403200 total: 6.29s remaining: 32.3s 163: learn: 1816.9666062 total: 6.33s remaining: 32.3s 164: learn: 1816.2439721 total: 6.37s remaining: 32.2s 165: learn: 1815.8090863 total: 6.41s remaining: 32.2s 166: learn: 1815.3273199 total: 6.44s remaining: 32.1s 167: learn: 1814.5158132 total: 6.48s remaining: 32.1s 168: learn: 1814.1986002 total: 6.52s remaining: 32.1s 169: learn: 1813.6626373 total: 6.56s remaining: 32s 170: learn: 1812.9822789 total: 6.61s remaining: 32s 171: learn: 1812.5582427 total: 6.64s remaining: 32s 172: learn: 1811.8354672 total: 6.67s remaining: 31.9s 173: learn: 1811.3725020 total: 6.71s remaining: 31.9s 174: learn: 1810.8608786 total: 6.75s remaining: 31.8s 175: learn: 1810.2449498 total: 6.79s remaining: 31.8s 176: learn: 1809.6423035 total: 6.83s remaining: 31.7s 177: learn: 1809.2480727 total: 6.86s remaining: 31.7s 178: learn: 1808.6675072 total: 6.91s remaining: 31.7s 179: learn: 1808.0039389 total: 6.94s remaining: 31.6s 180: learn: 1807.6046747 total: 6.98s remaining: 31.6s 181: learn: 1807.3519822 total: 7.01s remaining: 31.5s 182: learn: 1806.8990466 total: 7.05s remaining: 31.5s 183: learn: 1806.2886441 total: 7.1s remaining: 31.5s 184: learn: 1805.5854407 total: 7.14s remaining: 31.5s 185: learn: 1805.0783816 total: 7.18s remaining: 31.4s 186: learn: 1804.5242344 total: 7.22s remaining: 31.4s 187: learn: 1804.3149081 total: 7.25s remaining: 31.3s 188: learn: 1803.7452142 total: 7.29s remaining: 31.3s 189: learn: 1803.4434408 total: 7.33s remaining: 31.2s 190: learn: 1803.1064225 total: 7.37s remaining: 31.2s 191: learn: 1802.6029852 total: 7.4s remaining: 31.2s 192: learn: 1802.1195912 total: 7.44s remaining: 31.1s 193: learn: 1801.6937097 total: 7.48s remaining: 31.1s 194: learn: 1801.2427150 total: 7.51s remaining: 31s 195: learn: 1800.9240433 total: 7.55s remaining: 31s 196: learn: 1800.5739351 total: 7.59s remaining: 31s 197: learn: 1800.1988974 total: 7.63s remaining: 30.9s 198: learn: 1799.8508234 total: 7.67s remaining: 30.9s 199: learn: 1799.1884678 total: 7.71s remaining: 30.8s 200: learn: 1798.5902332 total: 7.74s remaining: 30.8s 201: learn: 1798.0212189 total: 7.78s remaining: 30.7s 202: learn: 1797.7243252 total: 7.82s remaining: 30.7s 203: learn: 1797.0336469 total: 7.86s remaining: 30.7s 204: learn: 1796.3191776 total: 7.9s remaining: 30.6s 205: learn: 1795.9519571 total: 7.93s remaining: 30.6s 206: learn: 1795.5950900 total: 7.97s remaining: 30.5s 207: learn: 1795.2389059 total: 8.01s remaining: 30.5s 208: learn: 1794.7079839 total: 8.04s remaining: 30.4s 209: learn: 1794.1596165 total: 8.08s remaining: 30.4s 210: learn: 1793.7831297 total: 8.12s remaining: 30.3s 211: learn: 1793.4907585 total: 8.15s remaining: 30.3s 212: learn: 1792.9769478 total: 8.19s remaining: 30.2s 213: learn: 1792.5382337 total: 8.22s remaining: 30.2s 214: learn: 1791.9538646 total: 8.26s remaining: 30.2s 215: learn: 1791.2988125 total: 8.3s remaining: 30.1s 216: learn: 1790.8879168 total: 8.33s remaining: 30.1s 217: learn: 1790.5466705 total: 8.37s remaining: 30s 218: learn: 1790.1821375 total: 8.41s remaining: 30s 219: learn: 1789.8022776 total: 8.45s remaining: 29.9s 220: learn: 1789.3897690 total: 8.48s remaining: 29.9s 221: learn: 1789.2006412 total: 8.51s remaining: 29.8s 222: learn: 1788.6142665 total: 8.56s remaining: 29.8s 223: learn: 1788.1002454 total: 8.6s remaining: 29.8s 224: learn: 1787.6531289 total: 8.64s remaining: 29.8s 225: learn: 1787.2842224 total: 8.68s remaining: 29.7s 226: learn: 1786.9077368 total: 8.71s remaining: 29.6s 227: learn: 1786.4275520 total: 8.75s remaining: 29.6s 228: learn: 1786.1205733 total: 8.78s remaining: 29.6s 229: learn: 1785.6904303 total: 8.82s remaining: 29.5s 230: learn: 1785.3968018 total: 8.86s remaining: 29.5s 231: learn: 1784.9782433 total: 8.89s remaining: 29.4s 232: learn: 1784.6855910 total: 8.94s remaining: 29.4s 233: learn: 1784.2238292 total: 8.97s remaining: 29.4s 234: learn: 1783.9608709 total: 9s remaining: 29.3s 235: learn: 1783.6271962 total: 9.04s remaining: 29.3s 236: learn: 1783.2335930 total: 9.07s remaining: 29.2s 237: learn: 1782.7868410 total: 9.11s remaining: 29.2s 238: learn: 1782.5721117 total: 9.15s remaining: 29.1s 239: learn: 1781.9219510 total: 9.18s remaining: 29.1s 240: learn: 1781.7145898 total: 9.22s remaining: 29s 241: learn: 1781.3611394 total: 9.26s remaining: 29s 242: learn: 1781.1198496 total: 9.29s remaining: 28.9s 243: learn: 1780.6603243 total: 9.33s remaining: 28.9s 244: learn: 1780.2112602 total: 9.36s remaining: 28.9s 245: learn: 1779.9453147 total: 9.4s remaining: 28.8s 246: learn: 1779.5035305 total: 9.44s remaining: 28.8s 247: learn: 1779.1024113 total: 9.47s remaining: 28.7s 248: learn: 1778.6124969 total: 9.51s remaining: 28.7s 249: learn: 1778.2540646 total: 9.56s remaining: 28.7s 250: learn: 1777.8708548 total: 9.59s remaining: 28.6s 251: learn: 1777.6396625 total: 9.63s remaining: 28.6s 252: learn: 1777.3333206 total: 9.67s remaining: 28.5s 253: learn: 1776.9195690 total: 9.71s remaining: 28.5s 254: learn: 1776.6025572 total: 9.74s remaining: 28.5s 255: learn: 1776.1948877 total: 9.78s remaining: 28.4s 256: learn: 1775.9161063 total: 9.82s remaining: 28.4s 257: learn: 1775.6270258 total: 9.85s remaining: 28.3s 258: learn: 1775.3998128 total: 9.89s remaining: 28.3s 259: learn: 1775.0897997 total: 9.93s remaining: 28.3s 260: learn: 1774.8271029 total: 9.96s remaining: 28.2s 261: learn: 1774.3388322 total: 10s remaining: 28.2s 262: learn: 1774.0200502 total: 10s remaining: 28.1s 263: learn: 1773.6887452 total: 10.1s remaining: 28.1s 264: learn: 1773.4059839 total: 10.1s remaining: 28.1s 265: learn: 1773.1201130 total: 10.2s remaining: 28s 266: learn: 1772.7696711 total: 10.2s remaining: 28s 267: learn: 1772.4872386 total: 10.2s remaining: 27.9s 268: learn: 1772.2062464 total: 10.3s remaining: 27.9s 269: learn: 1771.8423772 total: 10.3s remaining: 27.9s 270: learn: 1771.5846551 total: 10.3s remaining: 27.8s 271: learn: 1771.4041686 total: 10.4s remaining: 27.8s 272: learn: 1770.8989639 total: 10.4s remaining: 27.7s 273: learn: 1770.1904435 total: 10.5s remaining: 27.7s 274: learn: 1769.7568932 total: 10.5s remaining: 27.6s 275: learn: 1769.3664708 total: 10.5s remaining: 27.6s 276: learn: 1769.0642151 total: 10.6s remaining: 27.6s 277: learn: 1768.6658957 total: 10.6s remaining: 27.5s 278: learn: 1768.1364497 total: 10.6s remaining: 27.5s 279: learn: 1767.9284926 total: 10.7s remaining: 27.5s 280: learn: 1767.6985789 total: 10.7s remaining: 27.4s 281: learn: 1767.1823263 total: 10.8s remaining: 27.4s 282: learn: 1766.8906104 total: 10.8s remaining: 27.4s 283: learn: 1766.6383109 total: 10.8s remaining: 27.3s 284: learn: 1766.1665040 total: 10.9s remaining: 27.3s 285: learn: 1765.8448374 total: 10.9s remaining: 27.3s 286: learn: 1765.5680723 total: 11s remaining: 27.3s 287: learn: 1765.3666445 total: 11s remaining: 27.2s 288: learn: 1765.1500794 total: 11.1s remaining: 27.2s 289: learn: 1764.8564873 total: 11.1s remaining: 27.2s 290: learn: 1764.5185652 total: 11.1s remaining: 27.1s 291: learn: 1764.1867872 total: 11.2s remaining: 27.1s 292: learn: 1763.8993393 total: 11.2s remaining: 27s 293: learn: 1763.6660679 total: 11.2s remaining: 27s 294: learn: 1763.3575871 total: 11.3s remaining: 26.9s 295: learn: 1763.1142639 total: 11.3s remaining: 26.9s 296: learn: 1762.6812398 total: 11.3s remaining: 26.8s 297: learn: 1762.2830127 total: 11.4s remaining: 26.8s 298: learn: 1762.0901304 total: 11.4s remaining: 26.7s 299: learn: 1761.7681685 total: 11.5s remaining: 26.7s 300: learn: 1761.4818991 total: 11.5s remaining: 26.7s 301: learn: 1761.1103422 total: 11.5s remaining: 26.7s 302: learn: 1760.8641990 total: 11.6s remaining: 26.6s 303: learn: 1760.6317205 total: 11.6s remaining: 26.6s 304: learn: 1760.3780051 total: 11.6s remaining: 26.5s 305: learn: 1760.0414518 total: 11.7s remaining: 26.5s 306: learn: 1759.8263695 total: 11.7s remaining: 26.4s 307: learn: 1759.5421926 total: 11.8s remaining: 26.4s 308: learn: 1759.2803304 total: 11.8s remaining: 26.4s 309: learn: 1758.7037405 total: 11.8s remaining: 26.3s 310: learn: 1758.4650643 total: 11.9s remaining: 26.3s 311: learn: 1758.0051509 total: 11.9s remaining: 26.2s 312: learn: 1757.8479266 total: 11.9s remaining: 26.2s 313: learn: 1757.5739874 total: 12s remaining: 26.1s 314: learn: 1757.3063596 total: 12s remaining: 26.1s 315: learn: 1757.0848107 total: 12s remaining: 26.1s 316: learn: 1756.7994567 total: 12.1s remaining: 26s 317: learn: 1756.5600904 total: 12.1s remaining: 26s 318: learn: 1756.2659534 total: 12.1s remaining: 25.9s 319: learn: 1756.0907102 total: 12.2s remaining: 25.9s 320: learn: 1755.8491010 total: 12.2s remaining: 25.8s 321: learn: 1755.5478877 total: 12.2s remaining: 25.8s 322: learn: 1755.1168360 total: 12.3s remaining: 25.7s 323: learn: 1754.8449535 total: 12.3s remaining: 25.7s 324: learn: 1754.5057935 total: 12.4s remaining: 25.7s 325: learn: 1754.0880522 total: 12.4s remaining: 25.6s 326: learn: 1753.8344479 total: 12.4s remaining: 25.6s 327: learn: 1753.6281704 total: 12.5s remaining: 25.5s 328: learn: 1753.2965795 total: 12.5s remaining: 25.5s 329: learn: 1752.9300382 total: 12.5s remaining: 25.4s 330: learn: 1752.5147734 total: 12.6s remaining: 25.4s 331: learn: 1752.1240970 total: 12.6s remaining: 25.4s 332: learn: 1751.9118084 total: 12.6s remaining: 25.3s 333: learn: 1751.5902142 total: 12.7s remaining: 25.3s 334: learn: 1751.3225292 total: 12.7s remaining: 25.2s 335: learn: 1750.9929799 total: 12.7s remaining: 25.2s 336: learn: 1750.7245450 total: 12.8s remaining: 25.1s 337: learn: 1750.3474505 total: 12.8s remaining: 25.1s 338: learn: 1750.1473267 total: 12.9s remaining: 25.1s 339: learn: 1749.9925988 total: 12.9s remaining: 25s 340: learn: 1749.7293843 total: 12.9s remaining: 25s 341: learn: 1749.5443800 total: 13s remaining: 24.9s 342: learn: 1749.2786608 total: 13s remaining: 24.9s 343: learn: 1749.0282543 total: 13s remaining: 24.9s 344: learn: 1748.8016758 total: 13.1s remaining: 24.8s 345: learn: 1748.5816151 total: 13.1s remaining: 24.8s 346: learn: 1748.2391353 total: 13.1s remaining: 24.7s 347: learn: 1747.9193570 total: 13.2s remaining: 24.7s 348: learn: 1747.5006882 total: 13.2s remaining: 24.6s 349: learn: 1747.2459727 total: 13.2s remaining: 24.6s 350: learn: 1746.9780406 total: 13.3s remaining: 24.6s 351: learn: 1746.6494659 total: 13.3s remaining: 24.5s 352: learn: 1746.4461148 total: 13.3s remaining: 24.5s 353: learn: 1746.2071127 total: 13.4s remaining: 24.4s 354: learn: 1745.9392687 total: 13.4s remaining: 24.4s 355: learn: 1745.5625000 total: 13.5s remaining: 24.4s 356: learn: 1745.2700794 total: 13.5s remaining: 24.3s 357: learn: 1744.8984304 total: 13.5s remaining: 24.3s 358: learn: 1744.6932283 total: 13.6s remaining: 24.3s 359: learn: 1744.5093506 total: 13.6s remaining: 24.2s 360: learn: 1744.2132385 total: 13.7s remaining: 24.2s 361: learn: 1743.7651854 total: 13.7s remaining: 24.1s 362: learn: 1743.2861394 total: 13.7s remaining: 24.1s 363: learn: 1743.0874491 total: 13.8s remaining: 24.1s 364: learn: 1742.8353213 total: 13.8s remaining: 24s 365: learn: 1742.6024486 total: 13.8s remaining: 24s 366: learn: 1742.3204285 total: 13.9s remaining: 23.9s 367: learn: 1742.1034864 total: 13.9s remaining: 23.9s 368: learn: 1741.6239581 total: 13.9s remaining: 23.8s 369: learn: 1741.3617940 total: 14s remaining: 23.8s 370: learn: 1741.0757835 total: 14s remaining: 23.8s 371: learn: 1740.7511262 total: 14.1s remaining: 23.7s 372: learn: 1740.4421277 total: 14.1s remaining: 23.7s 373: learn: 1740.1921007 total: 14.1s remaining: 23.7s 374: learn: 1739.7759459 total: 14.2s remaining: 23.6s 375: learn: 1739.5004514 total: 14.2s remaining: 23.6s 376: learn: 1739.2280361 total: 14.2s remaining: 23.5s 377: learn: 1739.0545965 total: 14.3s remaining: 23.5s 378: learn: 1738.8086327 total: 14.3s remaining: 23.4s 379: learn: 1738.6508553 total: 14.3s remaining: 23.4s 380: learn: 1738.3791908 total: 14.4s remaining: 23.4s 381: learn: 1738.1401524 total: 14.4s remaining: 23.3s 382: learn: 1737.8842934 total: 14.5s remaining: 23.3s 383: learn: 1737.6455247 total: 14.5s remaining: 23.3s 384: learn: 1737.4289805 total: 14.5s remaining: 23.2s 385: learn: 1737.1759950 total: 14.6s remaining: 23.2s 386: learn: 1736.9287996 total: 14.6s remaining: 23.1s 387: learn: 1736.6644316 total: 14.7s remaining: 23.1s 388: learn: 1736.4509275 total: 14.7s remaining: 23.1s 389: learn: 1736.1780686 total: 14.7s remaining: 23s 390: learn: 1735.8027451 total: 14.8s remaining: 23s 391: learn: 1735.5312872 total: 14.8s remaining: 23s 392: learn: 1735.2365895 total: 14.8s remaining: 22.9s 393: learn: 1735.0621982 total: 14.9s remaining: 22.9s 394: learn: 1734.7786646 total: 14.9s remaining: 22.8s 395: learn: 1734.5900719 total: 14.9s remaining: 22.8s 396: learn: 1734.4205195 total: 15s remaining: 22.8s 397: learn: 1734.1769668 total: 15s remaining: 22.7s 398: learn: 1734.0092385 total: 15s remaining: 22.7s 399: learn: 1733.6076731 total: 15.1s remaining: 22.6s 400: learn: 1733.4073943 total: 15.1s remaining: 22.6s 401: learn: 1733.1870464 total: 15.2s remaining: 22.5s 402: learn: 1732.8418426 total: 15.2s remaining: 22.5s 403: learn: 1732.5884280 total: 15.2s remaining: 22.5s 404: learn: 1732.4253999 total: 15.3s remaining: 22.4s 405: learn: 1732.2204470 total: 15.3s remaining: 22.4s 406: learn: 1732.0078576 total: 15.3s remaining: 22.3s 407: learn: 1731.6408685 total: 15.4s remaining: 22.3s 408: learn: 1731.4376410 total: 15.4s remaining: 22.3s 409: learn: 1731.0719856 total: 15.4s remaining: 22.2s 410: learn: 1730.9297863 total: 15.5s remaining: 22.2s 411: learn: 1730.5167411 total: 15.5s remaining: 22.2s 412: learn: 1730.2006985 total: 15.6s remaining: 22.1s 413: learn: 1729.9278694 total: 15.6s remaining: 22.1s 414: learn: 1729.6756361 total: 15.6s remaining: 22s 415: learn: 1729.4111341 total: 15.7s remaining: 22s 416: learn: 1729.2568988 total: 15.7s remaining: 22s 417: learn: 1729.0928255 total: 15.7s remaining: 21.9s 418: learn: 1728.8499582 total: 15.8s remaining: 21.9s 419: learn: 1728.5351801 total: 15.8s remaining: 21.9s 420: learn: 1728.2158585 total: 15.9s remaining: 21.8s 421: learn: 1728.0929619 total: 15.9s remaining: 21.8s 422: learn: 1727.9116979 total: 15.9s remaining: 21.7s 423: learn: 1727.6771975 total: 16s remaining: 21.7s 424: learn: 1727.4173729 total: 16s remaining: 21.7s 425: learn: 1727.0662389 total: 16s remaining: 21.6s 426: learn: 1726.8669824 total: 16.1s remaining: 21.6s 427: learn: 1726.6371360 total: 16.1s remaining: 21.5s 428: learn: 1726.3195048 total: 16.1s remaining: 21.5s 429: learn: 1726.0887469 total: 16.2s remaining: 21.5s 430: learn: 1725.8260628 total: 16.2s remaining: 21.4s 431: learn: 1725.5895624 total: 16.3s remaining: 21.4s 432: learn: 1725.2705483 total: 16.3s remaining: 21.4s 433: learn: 1725.1367573 total: 16.3s remaining: 21.3s 434: learn: 1725.0005454 total: 16.4s remaining: 21.3s 435: learn: 1724.8051876 total: 16.4s remaining: 21.2s 436: learn: 1724.6564043 total: 16.4s remaining: 21.2s 437: learn: 1724.5366264 total: 16.5s remaining: 21.1s 438: learn: 1724.2321711 total: 16.5s remaining: 21.1s 439: learn: 1724.0565394 total: 16.5s remaining: 21.1s 440: learn: 1723.8850055 total: 16.6s remaining: 21s 441: learn: 1723.6696154 total: 16.6s remaining: 21s 442: learn: 1723.3431246 total: 16.7s remaining: 20.9s 443: learn: 1723.1398505 total: 16.7s remaining: 20.9s 444: learn: 1722.8639785 total: 16.7s remaining: 20.9s 445: learn: 1722.6735897 total: 16.8s remaining: 20.8s 446: learn: 1722.5845354 total: 16.8s remaining: 20.8s 447: learn: 1722.4097049 total: 16.8s remaining: 20.7s 448: learn: 1722.1139844 total: 16.9s remaining: 20.7s 449: learn: 1721.7760568 total: 16.9s remaining: 20.7s 450: learn: 1721.4713080 total: 17s remaining: 20.6s 451: learn: 1721.2194677 total: 17s remaining: 20.6s 452: learn: 1721.0612090 total: 17s remaining: 20.6s 453: learn: 1720.9146539 total: 17.1s remaining: 20.5s 454: learn: 1720.8222587 total: 17.1s remaining: 20.5s 455: learn: 1720.6646628 total: 17.1s remaining: 20.5s 456: learn: 1720.4637749 total: 17.2s remaining: 20.4s 457: learn: 1720.2393639 total: 17.2s remaining: 20.4s 458: learn: 1720.1202312 total: 17.3s remaining: 20.3s 459: learn: 1719.7680282 total: 17.3s remaining: 20.3s 460: learn: 1719.5163737 total: 17.3s remaining: 20.3s 461: learn: 1719.3732885 total: 17.4s remaining: 20.2s 462: learn: 1719.2246511 total: 17.4s remaining: 20.2s 463: learn: 1719.1061951 total: 17.4s remaining: 20.1s 464: learn: 1718.8771668 total: 17.5s remaining: 20.1s 465: learn: 1718.6532539 total: 17.5s remaining: 20.1s 466: learn: 1718.4049513 total: 17.5s remaining: 20s 467: learn: 1718.2375973 total: 17.6s remaining: 20s 468: learn: 1717.9303183 total: 17.6s remaining: 19.9s 469: learn: 1717.7103214 total: 17.7s remaining: 19.9s 470: learn: 1717.4267298 total: 17.7s remaining: 19.9s 471: learn: 1717.1456832 total: 17.7s remaining: 19.8s 472: learn: 1716.9588359 total: 17.8s remaining: 19.8s 473: learn: 1716.8348814 total: 17.8s remaining: 19.8s 474: learn: 1716.5559279 total: 17.8s remaining: 19.7s 475: learn: 1716.3504419 total: 17.9s remaining: 19.7s 476: learn: 1716.2040290 total: 17.9s remaining: 19.6s 477: learn: 1716.0826465 total: 17.9s remaining: 19.6s 478: learn: 1715.9015271 total: 18s remaining: 19.6s 479: learn: 1715.7024150 total: 18s remaining: 19.5s 480: learn: 1715.6024799 total: 18s remaining: 19.5s 481: learn: 1715.3289656 total: 18.1s remaining: 19.4s 482: learn: 1715.2076024 total: 18.1s remaining: 19.4s 483: learn: 1714.9178652 total: 18.2s remaining: 19.4s 484: learn: 1714.7177816 total: 18.2s remaining: 19.3s 485: learn: 1714.4598430 total: 18.2s remaining: 19.3s 486: learn: 1714.0183643 total: 18.3s remaining: 19.2s 487: learn: 1713.7304333 total: 18.3s remaining: 19.2s 488: learn: 1713.5103897 total: 18.3s remaining: 19.2s 489: learn: 1713.3742397 total: 18.4s remaining: 19.1s 490: learn: 1713.2619764 total: 18.4s remaining: 19.1s 491: learn: 1713.0387966 total: 18.4s remaining: 19s 492: learn: 1712.9401211 total: 18.5s remaining: 19s 493: learn: 1712.7394449 total: 18.5s remaining: 19s 494: learn: 1712.5343663 total: 18.5s remaining: 18.9s 495: learn: 1712.3784743 total: 18.6s remaining: 18.9s 496: learn: 1712.0922978 total: 18.6s remaining: 18.9s 497: learn: 1711.8156402 total: 18.7s remaining: 18.8s 498: learn: 1711.6820771 total: 18.7s remaining: 18.8s 499: learn: 1711.4950015 total: 18.7s remaining: 18.7s 500: learn: 1711.2591853 total: 18.8s remaining: 18.7s 501: learn: 1711.0977516 total: 18.8s remaining: 18.7s 502: learn: 1710.9929387 total: 18.9s remaining: 18.6s 503: learn: 1710.8271099 total: 18.9s remaining: 18.6s 504: learn: 1710.6331287 total: 18.9s remaining: 18.6s 505: learn: 1710.4762621 total: 19s remaining: 18.5s 506: learn: 1710.3142589 total: 19s remaining: 18.5s 507: learn: 1710.1899185 total: 19s remaining: 18.4s 508: learn: 1709.9372884 total: 19.1s remaining: 18.4s 509: learn: 1709.7034273 total: 19.1s remaining: 18.4s 510: learn: 1709.5120325 total: 19.2s remaining: 18.3s 511: learn: 1709.2075357 total: 19.2s remaining: 18.3s 512: learn: 1709.0654959 total: 19.2s remaining: 18.3s 513: learn: 1708.8787820 total: 19.3s remaining: 18.2s 514: learn: 1708.6184315 total: 19.3s remaining: 18.2s 515: learn: 1708.5472845 total: 19.3s remaining: 18.1s 516: learn: 1708.4376861 total: 19.4s remaining: 18.1s 517: learn: 1708.2355679 total: 19.4s remaining: 18.1s 518: learn: 1707.9584908 total: 19.4s remaining: 18s 519: learn: 1707.6131821 total: 19.5s remaining: 18s 520: learn: 1707.3502514 total: 19.5s remaining: 17.9s 521: learn: 1707.1687955 total: 19.6s remaining: 17.9s 522: learn: 1706.8852222 total: 19.6s remaining: 17.9s 523: learn: 1706.7647810 total: 19.6s remaining: 17.8s 524: learn: 1706.6047831 total: 19.7s remaining: 17.8s 525: learn: 1706.4204895 total: 19.7s remaining: 17.8s 526: learn: 1706.2095678 total: 19.7s remaining: 17.7s 527: learn: 1706.0232433 total: 19.8s remaining: 17.7s 528: learn: 1705.7865371 total: 19.8s remaining: 17.6s 529: learn: 1705.6659996 total: 19.8s remaining: 17.6s 530: learn: 1705.5108255 total: 19.9s remaining: 17.6s 531: learn: 1705.2820013 total: 19.9s remaining: 17.5s 532: learn: 1705.0822526 total: 20s remaining: 17.5s 533: learn: 1704.9111084 total: 20s remaining: 17.5s 534: learn: 1704.7166620 total: 20s remaining: 17.4s 535: learn: 1704.5360805 total: 20.1s remaining: 17.4s 536: learn: 1704.3616382 total: 20.1s remaining: 17.3s 537: learn: 1704.2146442 total: 20.1s remaining: 17.3s 538: learn: 1704.0714488 total: 20.2s remaining: 17.3s 539: learn: 1703.8316317 total: 20.2s remaining: 17.2s 540: learn: 1703.5975953 total: 20.3s remaining: 17.2s 541: learn: 1703.4061100 total: 20.3s remaining: 17.2s 542: learn: 1703.2313359 total: 20.3s remaining: 17.1s 543: learn: 1703.0239437 total: 20.4s remaining: 17.1s 544: learn: 1702.8668595 total: 20.4s remaining: 17s 545: learn: 1702.7368019 total: 20.4s remaining: 17s 546: learn: 1702.5079268 total: 20.5s remaining: 17s 547: learn: 1702.4216445 total: 20.5s remaining: 16.9s 548: learn: 1702.2927510 total: 20.5s remaining: 16.9s 549: learn: 1702.0422832 total: 20.6s remaining: 16.8s 550: learn: 1701.8703780 total: 20.6s remaining: 16.8s 551: learn: 1701.7375122 total: 20.6s remaining: 16.8s 552: learn: 1701.6181835 total: 20.7s remaining: 16.7s 553: learn: 1701.5040375 total: 20.7s remaining: 16.7s 554: learn: 1701.3217843 total: 20.8s remaining: 16.7s 555: learn: 1701.1424688 total: 20.8s remaining: 16.6s 556: learn: 1700.8518772 total: 20.8s remaining: 16.6s 557: learn: 1700.6148405 total: 20.9s remaining: 16.6s 558: learn: 1700.4257132 total: 20.9s remaining: 16.5s 559: learn: 1700.3250255 total: 21s remaining: 16.5s 560: learn: 1700.1770679 total: 21s remaining: 16.4s 561: learn: 1700.0776019 total: 21.1s remaining: 16.4s 562: learn: 1699.9274944 total: 21.1s remaining: 16.4s 563: learn: 1699.7862868 total: 21.1s remaining: 16.3s 564: learn: 1699.5850276 total: 21.2s remaining: 16.3s 565: learn: 1699.3798315 total: 21.2s remaining: 16.3s 566: learn: 1699.2410570 total: 21.2s remaining: 16.2s 567: learn: 1699.0771009 total: 21.3s remaining: 16.2s 568: learn: 1698.9200611 total: 21.3s remaining: 16.1s 569: learn: 1698.8371874 total: 21.3s remaining: 16.1s 570: learn: 1698.6693974 total: 21.4s remaining: 16.1s 571: learn: 1698.4458230 total: 21.4s remaining: 16s 572: learn: 1698.2352671 total: 21.5s remaining: 16s 573: learn: 1698.0114403 total: 21.5s remaining: 16s 574: learn: 1697.9620283 total: 21.5s remaining: 15.9s 575: learn: 1697.7285149 total: 21.6s remaining: 15.9s 576: learn: 1697.5572914 total: 21.6s remaining: 15.8s 577: learn: 1697.2472830 total: 21.6s remaining: 15.8s 578: learn: 1697.0865106 total: 21.7s remaining: 15.8s 579: learn: 1696.8986570 total: 21.7s remaining: 15.7s 580: learn: 1696.6495019 total: 21.7s remaining: 15.7s 581: learn: 1696.5006199 total: 21.8s remaining: 15.6s 582: learn: 1696.3224090 total: 21.8s remaining: 15.6s 583: learn: 1696.1328786 total: 21.8s remaining: 15.6s 584: learn: 1695.9305801 total: 21.9s remaining: 15.5s 585: learn: 1695.7786486 total: 21.9s remaining: 15.5s 586: learn: 1695.6893543 total: 21.9s remaining: 15.4s 587: learn: 1695.6262466 total: 22s remaining: 15.4s 588: learn: 1695.4207369 total: 22s remaining: 15.4s 589: learn: 1695.3013015 total: 22.1s remaining: 15.3s 590: learn: 1695.1384591 total: 22.1s remaining: 15.3s 591: learn: 1694.9932710 total: 22.1s remaining: 15.2s 592: learn: 1694.7981887 total: 22.2s remaining: 15.2s 593: learn: 1694.6564635 total: 22.2s remaining: 15.2s 594: learn: 1694.2926821 total: 22.2s remaining: 15.1s 595: learn: 1694.1911682 total: 22.3s remaining: 15.1s 596: learn: 1694.0801305 total: 22.3s remaining: 15s 597: learn: 1693.9037712 total: 22.3s remaining: 15s 598: learn: 1693.6964583 total: 22.4s remaining: 15s 599: learn: 1693.5709800 total: 22.4s remaining: 14.9s 600: learn: 1693.4147224 total: 22.4s remaining: 14.9s 601: learn: 1693.2299947 total: 22.5s remaining: 14.9s 602: learn: 1693.0585101 total: 22.5s remaining: 14.8s 603: learn: 1692.8997478 total: 22.6s remaining: 14.8s 604: learn: 1692.7430051 total: 22.6s remaining: 14.8s 605: learn: 1692.6348260 total: 22.6s remaining: 14.7s 606: learn: 1692.4622758 total: 22.7s remaining: 14.7s 607: learn: 1692.3065340 total: 22.7s remaining: 14.6s 608: learn: 1692.1929430 total: 22.7s remaining: 14.6s 609: learn: 1691.9846774 total: 22.8s remaining: 14.6s 610: learn: 1691.8124788 total: 22.8s remaining: 14.5s 611: learn: 1691.4861328 total: 22.9s remaining: 14.5s 612: learn: 1691.3641961 total: 22.9s remaining: 14.5s 613: learn: 1691.2283038 total: 22.9s remaining: 14.4s 614: learn: 1691.1405414 total: 23s remaining: 14.4s 615: learn: 1690.8798935 total: 23s remaining: 14.3s 616: learn: 1690.7862335 total: 23s remaining: 14.3s 617: learn: 1690.6369271 total: 23.1s remaining: 14.3s 618: learn: 1690.4939458 total: 23.1s remaining: 14.2s 619: learn: 1690.4174686 total: 23.1s remaining: 14.2s 620: learn: 1690.2908262 total: 23.2s remaining: 14.1s 621: learn: 1690.0350903 total: 23.2s remaining: 14.1s 622: learn: 1689.9688565 total: 23.3s remaining: 14.1s 623: learn: 1689.7280056 total: 23.3s remaining: 14s 624: learn: 1689.5527882 total: 23.3s remaining: 14s 625: learn: 1689.3676756 total: 23.4s remaining: 14s 626: learn: 1689.1537142 total: 23.4s remaining: 13.9s 627: learn: 1688.9982296 total: 23.4s remaining: 13.9s 628: learn: 1688.7045118 total: 23.5s remaining: 13.8s 629: learn: 1688.5835104 total: 23.5s remaining: 13.8s 630: learn: 1688.4210762 total: 23.6s remaining: 13.8s 631: learn: 1688.3263688 total: 23.6s remaining: 13.7s 632: learn: 1688.2075667 total: 23.6s remaining: 13.7s 633: learn: 1687.9733608 total: 23.7s remaining: 13.7s 634: learn: 1687.8498311 total: 23.7s remaining: 13.6s 635: learn: 1687.6652688 total: 23.7s remaining: 13.6s 636: learn: 1687.4933491 total: 23.8s remaining: 13.5s 637: learn: 1687.3639066 total: 23.8s remaining: 13.5s 638: learn: 1687.2164424 total: 23.8s remaining: 13.5s 639: learn: 1687.1215159 total: 23.9s remaining: 13.4s 640: learn: 1686.9672682 total: 23.9s remaining: 13.4s 641: learn: 1686.7835279 total: 23.9s remaining: 13.3s 642: learn: 1686.6344900 total: 24s remaining: 13.3s 643: learn: 1686.4368296 total: 24s remaining: 13.3s 644: learn: 1686.1283055 total: 24s remaining: 13.2s 645: learn: 1686.0319744 total: 24.1s remaining: 13.2s 646: learn: 1685.8784097 total: 24.1s remaining: 13.2s 647: learn: 1685.7247837 total: 24.1s remaining: 13.1s 648: learn: 1685.6297880 total: 24.2s remaining: 13.1s 649: learn: 1685.4069375 total: 24.2s remaining: 13s 650: learn: 1685.2322691 total: 24.2s remaining: 13s 651: learn: 1685.0892882 total: 24.3s remaining: 13s 652: learn: 1684.9853738 total: 24.3s remaining: 12.9s 653: learn: 1684.8911601 total: 24.4s remaining: 12.9s 654: learn: 1684.7264021 total: 24.4s remaining: 12.8s 655: learn: 1684.6448773 total: 24.4s remaining: 12.8s 656: learn: 1684.4500992 total: 24.5s remaining: 12.8s 657: learn: 1684.2712591 total: 24.5s remaining: 12.7s 658: learn: 1684.0837157 total: 24.5s remaining: 12.7s 659: learn: 1683.9473053 total: 24.6s remaining: 12.7s 660: learn: 1683.8144984 total: 24.6s remaining: 12.6s 661: learn: 1683.7403280 total: 24.6s remaining: 12.6s 662: learn: 1683.5161112 total: 24.7s remaining: 12.5s 663: learn: 1683.3907945 total: 24.7s remaining: 12.5s 664: learn: 1683.2898509 total: 24.8s remaining: 12.5s 665: learn: 1683.0970672 total: 24.8s remaining: 12.4s 666: learn: 1683.0061456 total: 24.8s remaining: 12.4s 667: learn: 1682.9051575 total: 24.9s remaining: 12.4s 668: learn: 1682.7057448 total: 24.9s remaining: 12.3s 669: learn: 1682.5544938 total: 24.9s remaining: 12.3s 670: learn: 1682.3945318 total: 25s remaining: 12.2s 671: learn: 1682.2167941 total: 25s remaining: 12.2s 672: learn: 1682.0770457 total: 25s remaining: 12.2s 673: learn: 1681.8354251 total: 25.1s remaining: 12.1s 674: learn: 1681.6784281 total: 25.1s remaining: 12.1s 675: learn: 1681.4974934 total: 25.1s remaining: 12s 676: learn: 1681.3850834 total: 25.2s remaining: 12s 677: learn: 1681.2517942 total: 25.2s remaining: 12s 678: learn: 1681.1611254 total: 25.2s remaining: 11.9s 679: learn: 1681.0016668 total: 25.3s remaining: 11.9s 680: learn: 1680.8935747 total: 25.3s remaining: 11.9s 681: learn: 1680.8247735 total: 25.3s remaining: 11.8s 682: learn: 1680.7218487 total: 25.4s remaining: 11.8s 683: learn: 1680.5864594 total: 25.4s remaining: 11.7s 684: learn: 1680.4605058 total: 25.5s remaining: 11.7s 685: learn: 1680.2376068 total: 25.5s remaining: 11.7s 686: learn: 1680.1758320 total: 25.5s remaining: 11.6s 687: learn: 1680.0241639 total: 25.6s remaining: 11.6s 688: learn: 1679.8791839 total: 25.6s remaining: 11.5s 689: learn: 1679.7162481 total: 25.6s remaining: 11.5s 690: learn: 1679.5416180 total: 25.7s remaining: 11.5s 691: learn: 1679.4170630 total: 25.7s remaining: 11.4s 692: learn: 1679.2436408 total: 25.7s remaining: 11.4s 693: learn: 1679.1117381 total: 25.8s remaining: 11.4s 694: learn: 1678.9576004 total: 25.8s remaining: 11.3s 695: learn: 1678.8058421 total: 25.8s remaining: 11.3s 696: learn: 1678.6894377 total: 25.9s remaining: 11.3s 697: learn: 1678.5200190 total: 25.9s remaining: 11.2s 698: learn: 1678.3965343 total: 26s remaining: 11.2s 699: learn: 1678.2687540 total: 26s remaining: 11.1s 700: learn: 1678.0802346 total: 26s remaining: 11.1s 701: learn: 1677.8958989 total: 26.1s remaining: 11.1s 702: learn: 1677.7963547 total: 26.1s remaining: 11s 703: learn: 1677.6392694 total: 26.1s remaining: 11s 704: learn: 1677.4352800 total: 26.2s remaining: 10.9s 705: learn: 1677.3125238 total: 26.2s remaining: 10.9s 706: learn: 1677.1476560 total: 26.2s remaining: 10.9s 707: learn: 1677.0367564 total: 26.3s remaining: 10.8s 708: learn: 1676.9361584 total: 26.3s remainin