Below is my personalised training plan for moving into a career in Data/Analytics with Machine Learning, with specific reference to consumer-direct/e-commerce businesses. It is to be implemented from March to November 2016.
Areas to cover
Mathematics: Linear algebra, Multivariate calculus
Probability and Statistics: Inferential stats, Descriptive stats, Regression analysis (linear, logistic), Cluster analysis, Classification and regression trees (CART), Hypothesis testing, Decision trees, Predictive modelling/forecasting
Programming/Development: Java, C/C++, Python, R, Hadoop, Spark, SQL, XML, VBA/Advanced Excel, HTML, APIs
Data Science/Analytics: Data cleaning/exploration/manipulation, Data analysis, Data analysis libraries (e.g. for Python: NumPy, pandas), Clustering, Segmentation, Time series forecasting, Making inferences/predictions, Building data pipelines, Working with large datasets/databases, Distributed computing (Hadoop), Data mining, Data analysis practice (Dataquest, GitHub)
Data visualisation: Visualisation, Presentation, Dashboards, Platforms (QlikView, Tableau, D3.js)
Machine Learning: Boosting, Ensemble modelling
Core Objective: skills development
1. Data Scientist – essential
Data capture, cleaning and handling
Data organisation, analysis and insight-generation
Data visualisation and presentation
Programming languages: Python, R, SAS, MATLAB, SQL (+ latest technologies: Hive, Pig, Spark)
Other skills: Distributed computing (Hadoop); Predictive modelling; Maths/stats; Machine learning
Talents/characteristics: Inquisitive; Pattern-seeking; Story-teller; Visual communicator
1A. Data & Analytics Manager – secondary
Managing teams of data scientists and analysts
Programming languages: Python, R, SAS, MATLAB, SQL, Java
Other skills: Leadership; Project management; Data mining; Predictive modelling
Talents/characteristics: Leader; Communicator; Interpersonal skills
2. Data Analyst – essential
Collecting, processing and analysing data (primarily statistical analysis)
Knowledge of area of data specialism (e.g. business processes, marketing)
Programming languages: Python, R, C/C++, SQL, JavaScript, HTML
Other skills: Advanced Excel; Databases (SQL, NoSQL); Maths/stats; Machine learning
Talents/characteristics: Intuitive; Analytical; Communicator
3. Additional skills (Data architect) – optional
Integrating, protecting and maintaining data sources
Understanding database architecture
Maintaining databases/data warehouses; Ensuring data is readily accessible to all users
Programming languages: SQL, XML (+ latest technologies: Hive, Pig, Spark)
Other skills: Modelling; Systems development; BI tools
4. Additional skills (Data engineer) – optional
Developing, building, testing and maintaining architectures (e.g. databases, processing systems)
Programming languages: SQL, R, MATLAB, SAS, SPSS, Python, Java, C++, Ruby, Perl (+ latest technologies: Hive, Pig)
Other skills: Modelling; Databases (SQL, NoSQL); APIs; Data warehousing
5. Additional skills (Database administrator) – optional
Maintaining integrity and accessibility of all data
Other skills: Data modelling/design; Data backup/recovery; Databases (SQL, NoSQL); Data security; ERP/business systems knowledge
6. Additional skills (Business analyst) – optional
Improving business processes
Business partnering
Providing an interface between Systems/IT and the rest of the business/data users
Programming languages: SQL
Other skills: Excel; Data visualisation (e.g. Tableau); Data modelling; Project management