O’Reilly released a free downloadable report a while back that presents the results of a survey of Data Scientists across the industry, with around 250 respondents. The report looks at a list of skills and classifies Data Scientists into four main categories:

  • Data Businessperson
  • Data Creative
  • Data Developer
  • Data Researcher

Under each of these headings the roles are defined as:

[Figure: DS+Types]

As an Architect I can see a fit with the “Jack of All Trades” box, although I think an Architect would reach across the Researcher, Creative and Businessperson categories if we were to be classified. Either way, it is important for an Architect to understand the skills a Data Scientist needs across these areas, because going forward there will be more opportunities to work side by side with Data Scientists on solutions and architectures.

The report lists the skills it considers when classifying each type of Data Scientist:

  • Algorithms (ex: computational complexity, CS theory)
  • Back-End Programming (ex: JAVA/Rails/Objective C)
  • Bayesian/Monte-Carlo Statistics (ex: MCMC, BUGS)
  • Big and Distributed Data (ex: Hadoop, Map/Reduce)
  • Business (ex: management, business development, budgeting)
  • Classical Statistics (ex: general linear model, ANOVA)
  • Data Manipulation (ex: regexes, R, SAS, web scraping)
  • Front-End Programming (ex: JavaScript, HTML, CSS)
  • Graphical Models (ex: social networks, Bayes networks)
  • Machine Learning (ex: decision trees, neural nets, SVM, clustering)
  • Math (ex: linear algebra, real analysis, calculus)
  • Optimization (ex: linear, integer, convex, global)
  • Product Development (ex: design, project management)
  • Science (ex: experimental design, technical writing/publishing)
  • Simulation (ex: discrete, agent-based, continuous)
  • Spatial Statistics (ex: geographic covariates, GIS)
  • Structured Data (ex: SQL, JSON, XML)
  • Surveys and Marketing (ex: multinomial modeling)
  • Systems Administration (ex: *nix, DBA, cloud tech.)
  • Temporal Statistics (ex: forecasting, time-series analysis)
  • Unstructured Data (ex: noSQL, text mining)
  • Visualization

ML = Machine Learning

OR = Operations Research

Judging by other reports, this is by no means a full list of skills, but it gives a good insight into what a Data Scientist needs in their skills bag.

The report then looks at the typical tasks covered by each category and groups them into 22 core tasks under 5 main headings.

[Figure: Data+Science+Skills+2]

The visualisation below illustrates the results, showing for each Data Scientist type the percentage of each skill and task that is needed.

[Figure: Data+Science+Skills]

Overall, a good report that highlights the business areas and skills of a Data Scientist.

Report Source

Analyzing the Analyzers

An Introspective Survey of Data Scientists and Their Work

http://www.oreilly.com/data/free/analyzing-the-analyzers.csp