The best way to meet the requirements is to use a principal component analysis (PCA) model. PCA is a technique that reduces the dimensionality of the dataset by transforming the original variables into a smaller set of new variables, called principal components, that capture most of the variance and information in the data [1]. This technique has the following advantages:
It can increase the accuracy of the model by removing noise, redundancy, and multicollinearity from the data, and by enhancing the interpretability and generalization of the model [2][3].
It can decrease the processing time of the model by reducing the number of features and the computational complexity of the model, and by improving the convergence and stability of the model [4][5].
It is suitable for numeric variables, as it relies on the covariance or correlation matrix of the data, and it can handle a large quantity of variables, as it can extract the most relevant ones [1][6] (see the sketch after this list).
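To make this concrete, here is a minimal sketch using scikit-learn. The synthetic data and the 95% variance threshold are illustrative assumptions, not details given in the question:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Synthetic stand-in for the dataset: 200 numeric features driven by
# 10 latent factors, so most variance lies in a few directions (assumed).
rng = np.random.default_rng(42)
latent = rng.normal(size=(1000, 10))
X = latent @ rng.normal(size=(10, 200)) + 0.1 * rng.normal(size=(1000, 200))

# Standardize first so PCA reflects the correlation structure
# instead of being dominated by large-scale features.
X_scaled = StandardScaler().fit_transform(X)

# Keep the smallest number of components that explain 95% of the variance.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_scaled)

print(f"Original features:  {X.shape[1]}")          # 200
print(f"Components kept:    {pca.n_components_}")   # close to 10
print(f"Variance explained: {pca.explained_variance_ratio_.sum():.2%}")
```

Training on X_reduced instead of X then operates on a small fraction of the original feature count, which is where the accuracy and latency gains described above come from.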
The other options are not effective or appropriate, because they have the following drawbacks:
A: Creating new features and interaction variables can increase the accuracy of the model by capturing more complex and nonlinear relationships in the data, but it can also increase the processing time of the model by adding more features and increasing the computational complexity of the model [7]. Moreover, it can introduce more noise, redundancy, and multicollinearity into the data, which can degrade the performance and interpretability of the model [8].
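The combinatorial growth is easy to demonstrate with scikit-learn's PolynomialFeatures; the 20-feature starting point is an assumed example:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.random.rand(100, 20)  # assumed: 100 rows, 20 original features

# Degree-2 interaction terms only (no squares, no bias column).
interactions = PolynomialFeatures(degree=2, interaction_only=True,
                                  include_bias=False)
X_expanded = interactions.fit_transform(X)

# 20 originals + C(20, 2) = 190 pairwise products = 210 features.
print(X.shape[1], "->", X_expanded.shape[1])  # 20 -> 210
```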
C: Applying normalization to the feature set can increase the accuracy of the model by scaling the features to a common range and preventing some features from dominating others, and it can also decrease the processing time of the model by reducing numerical instability and improving the convergence of the model [9][10]. However, normalization alone is not enough to address the high dimensionality and high latency issues of the dataset, as it does not reduce the number of features; it only changes their scale [11].
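A short sketch of that last point: min-max normalization rescales every feature to [0, 1] but leaves the feature count, and therefore the dimensionality, unchanged (the data shape here is assumed for illustration):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.random.rand(1000, 200) * 100.0  # assumed: 200 features on a large scale

X_norm = MinMaxScaler().fit_transform(X)

print(X.shape, "->", X_norm.shape)   # (1000, 200) -> (1000, 200): same dimensionality
print(X_norm.min(), X_norm.max())    # values now span [0.0, 1.0]
```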
D: Using a multiple correspondence analysis (MCA) model is not suitable, because MCA is designed for categorical variables, not numeric ones. MCA is a technique that reduces the dimensionality of the dataset by transforming the original categorical variables into a smaller set of new variables, called factors, that capture most of the inertia and information in the data [12]. MCA is analogous to PCA, but it applies to nominal or ordinal variables, not to continuous or interval variables [13].
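For contrast, here is a from-scratch sketch of MCA as correspondence analysis of the one-hot indicator matrix; the toy categorical data is assumed for illustration, and in practice a library such as prince packages these steps:

```python
import numpy as np
import pandas as pd

# Toy nominal data (assumed): MCA expects categories like these,
# not the continuous numeric variables in the question's dataset.
df = pd.DataFrame({
    "color": ["red", "blue", "red", "green", "blue", "green"],
    "size":  ["S", "M", "L", "S", "L", "M"],
    "shape": ["round", "square", "round", "round", "square", "square"],
})

# MCA = correspondence analysis of the one-hot indicator matrix.
Z = pd.get_dummies(df).to_numpy(dtype=float)

P = Z / Z.sum()       # correspondence matrix
r = P.sum(axis=1)     # row masses
c = P.sum(axis=0)     # column masses
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))  # standardized residuals

U, sigma, _ = np.linalg.svd(S, full_matrices=False)

# Observation coordinates on the first two factors.
row_coords = (U * sigma) / np.sqrt(r)[:, None]
print(row_coords[:, :2])
```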
1: Principal Component Analysis - Amazon SageMaker
2: How to Use PCA for Data Visualization and Improved Performance in Machine Learning | by Pratik Shukla | Towards Data Science
3: Principal Component Analysis (PCA) for Feature Selection and some of its Pitfalls | by Nagesh Singh Chauhan | Towards Data Science
4: How to Reduce Dimensionality with PCA and Train a Support Vector Machine in Python | by James Briggs | Towards Data Science
5: Dimensionality Reduction and Its Applications | by Aniruddha Bhandari | Towards Data Science
6: Principal Component Analysis (PCA) in Python | by Susan Li | Towards Data Science
7: Feature Engineering for Machine Learning | by Dipanjan (DJ) Sarkar | Towards Data Science
8: Feature Engineering — How to Engineer Features and How to Get Good at It | by Parul Pandey | Towards Data Science
9: Feature Scaling for Machine Learning: Understanding the Difference Between Normalization vs. Standardization | by Benjamin Obi Tayo Ph.D. | Towards Data Science
10: Why, How and When to Scale your Features | by George Seif | Towards Data Science
11: Normalization vs Dimensionality Reduction | by Saurabh Annadate | Towards Data Science
12: Multiple Correspondence Analysis - Amazon SageMaker
13: Multiple Correspondence Analysis (MCA) | by Raul Eulogio | Towards Data Science