About this course
4.7
377 ratings
83 reviews
If you want to break into competitive data science, then this course is for you! Participating in predictive modelling competitions can help you gain practical experience and improve and harness your data modelling skills in various domains such as credit, insurance, marketing, natural language processing, sales forecasting and computer vision, to name a few. At the same time, you get to do it in a competitive context against thousands of participants, each of whom tries to build the most predictive algorithm. Pushing each other to the limit can result in better performance and smaller prediction errors. Being able to achieve high ranks consistently can help you accelerate your career in data science.

In this course, you will learn to analyse and solve such predictive modelling tasks competitively.

When you finish this class, you will:
- Understand how to solve predictive modelling competitions efficiently and learn which of the skills obtained are applicable to real-world tasks.
- Learn how to preprocess the data and generate new features from various sources such as text and images.
- Be taught advanced feature engineering techniques like generating mean-encodings, using aggregated statistical measures or finding nearest neighbors as a means to improve your predictions.
- Be able to form reliable cross validation methodologies that help you benchmark your solutions and avoid overfitting or underfitting when tested with unobserved (test) data.
- Gain experience of analysing and interpreting the data. You will become aware of inconsistencies, high noise levels, errors and other data-related issues such as leakages, and you will learn how to overcome them.
- Acquire knowledge of different algorithms and learn how to efficiently tune their hyperparameters and achieve top performance.
- Master the art of combining different machine learning models and learn how to ensemble.
- Get exposed to past (winning) solutions and codes and learn how to read them.

Disclaimer: This is not a machine learning course in the general sense. This course will teach you how to get high-rank solutions against thousands of competitors, with a focus on the practical usage of machine learning methods rather than the theoretical underpinnings behind them.

Prerequisites:
- Python: work with DataFrames in pandas, plot figures in matplotlib, import and train models from scikit-learn, XGBoost, LightGBM.
- Machine Learning: basic understanding of linear models, K-NN, random forest, gradient boosting and neural networks.
100% online courses

Start immediately and learn on your own schedule.

Flexible deadlines

Reset deadlines to fit your schedule.

Advanced level

Approx. 45 hours to complete

Suggested: 6-10 hours/week

English

Subtitles: English

Skills you will gain

Data Analysis, Feature Extraction, Feature Engineering, XGBoost

Syllabus - What you will learn from this course

Week 1
6 hours to complete

Introduction & Recap

This week we will introduce you to competitive data science. You will learn about the mechanics of competitions, the difference between competitions and real-life data science, and the hardware and software that people usually use in competitions. We will also briefly recap the major ML models frequently used in competitions.
8 videos (Total 46 min), 7 readings, 6 quizzes
8 videos
Meet your lecturers (2 min)
Course overview (7 min)
Competition Mechanics (6 min)
Kaggle Overview [screencast] (7 min)
Real World Application vs Competitions (5 min)
Recap of main ML algorithms (9 min)
Software/Hardware Requirements (5 min)
7 readings
Welcome! (10 min)
Week 1 overview (10 min)
Disclaimer (10 min)
Explanation for quiz questions (10 min)
Additional Materials and Links (10 min)
Explanation for quiz questions (10 min)
Additional Material and Links (10 min)
5 practice exercises
Practice Quiz (8 min)
Recap (8 min)
Recap (12 min)
Software/Hardware (6 min)
Graded Soft/Hard Quiz (8 min)
2 hours to complete

Feature Preprocessing and Generation with Respect to Models

In this module we will summarize approaches to working with features: preprocessing, generation and extraction. We will see that the choice of machine learning model affects both the preprocessing we apply to the features and our approach to generating new ones. We will also discuss feature extraction from text with Bag of Words and Word2vec, and feature extraction from images with convolutional neural networks.
7 videos (Total 73 min), 4 readings, 4 quizzes
7 videos
Numeric features (13 min)
Categorical and ordinal features (10 min)
Datetime and coordinates (8 min)
Handling missing values (10 min)
Bag of words (10 min)
Word2vec, CNN (13 min)
4 readings
Explanation for quiz questions (10 min)
Additional Material and Links (10 min)
Explanation for quiz questions (10 min)
Additional Material and Links (10 min)
4 practice exercises
Feature preprocessing and generation with respect to models (8 min)
Feature preprocessing and generation with respect to models (8 min)
Feature extraction from text and images (8 min)
Feature extraction from text and images (8 min)
29 minutes to complete

Final Project Description

This is a reminder that it is best to start the final project soon. The final project is in fact a competition, and this module contains information about it.
1 video (Total 4 min), 2 readings
1 video
2 readings
Final project (10 min)
Final project advice #1 (10 min)
Week 2
2 hours to complete

Exploratory Data Analysis

We will start this week with Exploratory Data Analysis (EDA). It is a very broad and exciting topic and an essential component of the solving process. Besides the regular videos, you will find a walkthrough of the EDA process for the Springleaf competition data and an example of prolific EDA for the NumerAI competition with extraordinary findings.
8 videos (Total 80 min), 2 readings, 1 quiz
8 videos
Building intuition about the data (6 min)
Exploring anonymized data (15 min)
Visualizations (11 min)
Dataset cleaning and other things to check (7 min)
Springleaf competition EDA I (8 min)
Springleaf competition EDA II (16 min)
Numerai competition EDA (6 min)
2 readings
Week 2 overview (10 min)
Additional material and links (10 min)
1 practice exercise
Exploratory data analysis (12 min)
2 hours to complete

Validation

In this module we will discuss various validation strategies. We will see that the strategy we choose depends on the competition setup and that a correct validation scheme is one of the building blocks of any winning solution.
4 videos (Total 51 min), 3 readings, 2 quizzes
4 videos
Validation strategies (7 min)
Data splitting strategies (14 min)
Problems occurring during validation (20 min)
3 readings
Validation strategies (10 min)
Comments on quiz (10 min)
Additional material and links (10 min)
2 practice exercises
Validation (8 min)
Validation (8 min)
5 hours to complete

Data Leakages

Finally, in this module we will cover something unique to data science competitions: we will see examples of how it is sometimes possible to reach a top position in a competition with very little machine learning, just by exploiting a data leakage.
3 videos (Total 26 min), 3 readings, 3 quizzes
3 videos
Leaderboard probing and examples of rare data leaks (9 min)
Expedia challenge (9 min)
3 readings
Comments on quiz (10 min)
Additional material and links (10 min)
Final project advice #2 (10 min)
1 practice exercise
Data leakages (8 min)
Week 3
3 hours to complete

Metrics Optimization

This week we will first study another component of competitions: the evaluation metrics. We will recap the most prominent ones and then see how we can efficiently optimize the metric given in a competition.
8 videos (Total 83 min), 3 readings, 2 quizzes
8 videos
Regression metrics review I (14 min)
Regression metrics review II (8 min)
Classification metrics review (20 min)
General approaches for metrics optimization (6 min)
Regression metrics optimization (10 min)
Classification metrics optimization I (7 min)
Classification metrics optimization II (6 min)
3 readings
Week 3 overview (10 min)
Comments on quiz (10 min)
Additional material and links (10 min)
2 practice exercises
Metrics (12 min)
Metrics (12 min)
4 hours to complete

Advanced Feature Engineering I

In this module we will study a very powerful technique for feature generation. It goes by many names, but here we call it "mean encodings". We will see the intuition behind mean encodings and how to construct, regularize and extend them.
3 videos (Total 27 min), 2 readings, 2 quizzes
3 videos
Regularization (7 min)
Extensions and generalizations (10 min)
2 readings
Comments on quiz (10 min)
Final project advice #3 (10 min)
1 practice exercise
Mean encodings (8 min)
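
To make the idea of a mean encoding concrete, here is a minimal sketch in plain Python. It is an illustration only and is not taken from the course materials: it assumes the common formulation in which a category is replaced by the mean of the target for that category, regularized in two ways (smoothing toward the global mean and out-of-fold computation). The function names, the `alpha` parameter and the toy data are hypothetical.

```python
from collections import defaultdict


def fit_mean_encoding(categories, targets, alpha=1.0):
    """Return ({category: smoothed target mean}, global_mean).

    alpha acts like a count of pseudo-observations placed at the global mean,
    so rare categories are pulled toward it (alpha=0 gives the raw mean).
    """
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        sums[category] += target
        counts[category] += 1
    encoding = {
        category: (sums[category] + alpha * global_mean) / (counts[category] + alpha)
        for category in counts
    }
    return encoding, global_mean


def out_of_fold_mean_encode(categories, targets, n_folds=2, alpha=1.0):
    """Encode each row using target statistics from the other folds only.

    Requires n_folds >= 2 so that every fold has a non-empty training part.
    """
    n_rows = len(categories)
    encoded = [0.0] * n_rows
    for fold in range(n_folds):
        # Rows are assigned to folds by index modulo n_folds (an arbitrary choice).
        train = [i for i in range(n_rows) if i % n_folds != fold]
        holdout = [i for i in range(n_rows) if i % n_folds == fold]
        encoding, global_mean = fit_mean_encoding(
            [categories[i] for i in train], [targets[i] for i in train], alpha
        )
        for i in holdout:
            # Categories unseen in the training folds fall back to the global mean.
            encoded[i] = encoding.get(categories[i], global_mean)
    return encoded


# Hypothetical toy data: a categorical feature and a binary target.
cities = ["a", "a", "b", "b"]
clicked = [1, 0, 1, 1]
print(fit_mean_encoding(cities, clicked, alpha=0.0)[0])
print(out_of_fold_mean_encode(cities, clicked))
```

Computing the encoding out of fold means that a row's own target never contributes to its encoded value, which is one way to limit the target leakage that a naive mean encoding introduces.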
Week 4
3 hours to complete

Hyperparameter Optimization

In this module we will talk about the hyperparameter optimization process. We will also have a special video with practical tips and tricks, recorded by four instructors.
6 videos (Total 86 min), 4 readings, 2 quizzes
6 videos
Hyperparameter tuning II (12 min)
Hyperparameter tuning III (13 min)
Practical guide (16 min)
KazAnova's competition pipeline, part 1 (18 min)
KazAnova's competition pipeline, part 2 (17 min)
4 readings
Week 4 overview (10 min)
Comments on quiz (10 min)
Additional material and links (10 min)
Additional materials and links (10 min)
2 practice exercises
Practice quiz (6 min)
Graded quiz (8 min)
4 hours to complete

Advanced Feature Engineering II

In this module we will learn about a few more advanced feature engineering techniques.
4 videos (Total 22 min), 2 readings, 2 quizzes
4 videos
Matrix factorizations (6 min)
Feature Interactions (5 min)
t-SNE (5 min)
2 readings
Comments on quiz (10 min)
Additional Materials and Links (10 min)
1 practice exercise
Graded Advanced Features II Quiz (12 min)
10 hours to complete

Ensembling

Nowadays it is hard to find a competition won by a single model! Every winning solution incorporates ensembles of models. In this module we will talk about the main ensembling techniques in general and, of course, about how best to ensemble models in practice.
8 videos (Total 92 min), 4 readings, 4 quizzes
8 videos
Bagging (9 min)
Boosting (16 min)
Stacking (16 min)
StackNet (14 min)
Ensembling Tips and Tricks (14 min)
CatBoost 1 (7 min)
CatBoost 2 (7 min)
4 readings
Validation schemes for 2-nd level models (10 min)
Comments on quiz (10 min)
Additional materials and links (10 min)
Final project advice #4 (10 min)
2 practice exercises
Ensembling (8 min)
Ensembling (12 min)
33%

started a new career after completing these courses

83%

got a tangible career benefit from this course

Top reviews

by MS, Mar 29th 2018

Top Kagglers gently introduce one to Data Science Competitions. One will have a great chance to learn various tips and tricks and apply them in practice throughout the course. Highly recommended!

by MM, Nov 10th 2017

This course is fantastic. It's chock full of practical information that is presented clearly and concisely. I would like to thank the team for sharing their knowledge so generously.

Instructors

Dmitry Ulyanov

Visiting lecturer
HSE Faculty of Computer Science

Alexander Guschin

Visiting lecturer at HSE, Lecturer at MIPT
HSE Faculty of Computer Science

Mikhail Trofimov

Visiting lecturer
HSE Faculty of Computer Science

Dmitry Altukhov

Visiting lecturer
HSE Faculty of Computer Science

Marios Michailidis

Research Data Scientist
H2O.ai

About National Research University Higher School of Economics

National Research University - Higher School of Economics (HSE) is one of the top research universities in Russia. Established in 1992 to promote new research and teaching in economics and related disciplines, it now offers programs at all levels of university education across an extraordinary range of fields of study including business, sociology, cultural studies, philosophy, political science, international relations, law, Asian studies, media and communications, IT, mathematics, engineering, and more. Learn more at www.hse.ru

About the Advanced Machine Learning Specialization

This specialization gives an introduction to deep learning, reinforcement learning, natural language understanding, computer vision and Bayesian methods. Top Kaggle machine learning practitioners and CERN scientists will share their experience of solving real-world problems and help you to fill the gaps between theory and practice. Upon completion of 7 courses you will be able to apply modern machine learning methods in enterprise and understand the caveats of real-world data and settings.

Frequently Asked Questions (FAQ)

  • When you enroll for a Certificate, you will have access to all videos, quizzes, and programming assignments (if applicable). Peer-reviewed assignments can only be submitted and reviewed once your session has begun. If you choose to explore the course without purchasing it, you may not be able to access certain assignments.

  • When you enroll in the course, you get access to all of the courses in the Specialization, and you can earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page, and from there you can print it or add it to your LinkedIn profile. If you only want to read and view the course content, you can audit the course for free.

More questions? Visit the Learner Help Center.