A while back O’Reilly released a free downloadable report presenting the results of a survey of Data Scientists across the industry, with circa 250 respondents. The report looks at a list of skills and classifies Data Scientists into 4 main categories:
- Data Businessperson
- Data Creative
- Data Developer
- Data Researcher
Under each of these headings the roles are defined as:
As an Architect I can see a fit with the “Jack of All Trades” box, although I think the role would reach across the Researcher, Creative and Businessperson categories if we were to be classed. It is still important for an Architect to understand the skills that a Data Scientist needs across these areas, because going forward there will be more opportunities to work side by side with Data Scientists on solutions and architectures.
The report lists the skills that a Data Scientist may have under each classification:
- Algorithms (ex: computational complexity, CS theory)
- Back-End Programming (ex: JAVA/Rails/Objective C)
- Bayesian/Monte-Carlo Statistics (ex: MCMC, BUGS)
- Big and Distributed Data (ex: Hadoop, Map/Reduce)
- Business (ex: management, business development, budgeting)
- Classical Statistics (ex: general linear model, ANOVA)
- Data Manipulation (ex: regexes, R, SAS, web scraping)
- Front-End Programming (ex: JavaScript, HTML, CSS)
- Graphical Models (ex: social networks, Bayes networks)
- Machine Learning (ex: decision trees, neural nets, SVM, clustering)
- Math (ex: linear algebra, real analysis, calculus)
- Optimization (ex: linear, integer, convex, global)
- Product Development (ex: design, project management)
- Science (ex: experimental design, technical writing/publishing)
- Simulation (ex: discrete, agent-based, continuous)
- Spatial Statistics (ex: geographic covariates, GIS)
- Structured Data (ex: SQL, JSON, XML)
- Surveys and Marketing (ex: multinomial modeling)
- Systems Administration (ex: *nix, DBA, cloud tech.)
- Temporal Statistics (ex: forecasting, time-series analysis)
- Unstructured Data (ex: noSQL, text mining)
- Visualization
ML = Machine Learning
OR = Operations Research
From reading other reports, this is by no means a full list of skills, but it provides a good insight into what a Data Scientist needs in their skills bag.
The report then looks at the typical tasks covered by each category, splitting them into 22 core tasks grouped under 5 main tasks.
The visualisation below illustrates the results, showing the skills and tasks for each Data Scientist type as a percentage of the skill that is needed.
Overall, a good report that highlights the business areas and skills of a Data Scientist.
Report Source
Analyzing the Analyzers
An Introspective Survey of Data Scientists and Their Work
http://www.oreilly.com/data/free/analyzing-the-analyzers.csp