At the end of 2014 O’Reilly published a Data Science Salary Survey report. Two areas of the report caught my attention, not because of the salary side but because of the other data collected and the trends it shows.
The first of these is the popularity of the tools that Data Scientists use. R and Excel seem to be on a par, which is interesting as R is typically seen as more powerful than Excel (I’m sure there is a bigger debate around that, but I won’t get into it here!), although Excel is more graphically pleasing to the user when manipulating data. However, the data does not show whether someone is using both or prefers one over the other.
The respondents fall into several roles, which most probably explains the swing between Windows-type and Linux-type environments and the tools used:
- Analyst – includes coding
- Software developer
- Technical lead
- Product developer
- Non-coding Analyst
- Database administrator
- UI/UX developer
Interestingly, no single Data Scientist role is listed among them.
The report also shows how much cloud computing is used by the Data Scientists who responded to the survey. Approximately a third have still not moved to the cloud, while two thirds are using it or experimenting with it in some way. As the common tools are now being adapted for the cloud, such as R cluster computing, which is now available, there will be a greater shift to a cloud experience for data manipulation. The one thing that lets R down is its use of memory to load and hold data: the bigger the data set, the more memory you need. This limitation may ease over time, and R Cluster is one way around it.
Of course, this is only a report based on a limited number of respondents, showing a sample of what is being carried out in the field of Data Science. The trends may be different with a bigger data set and with respondents in different roles.