-
p-value 定义,benefits,limitations
-
Linear Regression 的 assumptions name and describe,选两个 describe 如何 test
-
Overfitting 定义,什么时候会发生,如何 detect
-
When developing a linear/logistic regression model with hundreds of variables, which modelling technique will you consider for variable reduction? Name a couple of effective techniques and describe these in detail.
-
What aspects will you pay attention to when preparing the modelling dataset? Please elaborate the treatments when facing the described situation and explain why the treatments are appropriate.
-
Name at least 3 performance metrics for the classification model. Describe their meaning, calculation, advantages, and disadvantages.
-
What is the normal distribution? What are the properties of normally distributed random variable? How to check if the data follows normal distribution?
-
Imagine the model is already in the production. What are the checks and controls that have to be implemented to monitor the model? What are the reasons that can cause model deterioration?
-
What is the data standardization? What is the formula for data standardization? In what situations and why standardization has to be applied?
-
What is the decision tree? What are the advantages of the decision tree comparing to OLS? How is the decision tree constructed?