Max Hemingway

~ Musings as I work through life, career and everything.

Tag Archives: Data Science

Query LinkedIn with Rlinkedin

Friday 20 Mar 2015

Posted by Max Hemingway in Data Science, Programming

Tags

Data Science, Programming

A good article has appeared on R-bloggers on how to analyse LinkedIn using R.

It shows how to use a package called Rlinkedin to analyse LinkedIn data and create a word cloud (tag cloud).

[Image: tag cloud]
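
At its core, a tag cloud is a word-frequency count, with each word drawn at a size that reflects how often it appears. The following is a minimal sketch of that counting step in Python; it is not the Rlinkedin code from the article, and the sample headlines and stop-word list are invented purely for illustration.

```python
import re
from collections import Counter

# Hypothetical sample text standing in for profile headlines (not real LinkedIn data).
HEADLINES = [
    "Data Scientist and R programmer",
    "Enterprise Architect working with data",
    "Data engineer and cloud architect",
]

# Small illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"and", "with", "the", "a", "of"}


def word_frequencies(texts):
    """Count lower-cased words across all texts, skipping stop words."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts


freqs = word_frequencies(HEADLINES)
# The most frequent words are the ones a tag cloud would draw largest.
print(freqs.most_common(3))
```

A plotting library would then map each count to a font size; the counting step is the same whichever tool draws the cloud.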

Rlinkedin can also be used to query a number of LinkedIn APIs:

  • Connections API
  • Profile API
  • People Search API
  • Job Search API
  • Company Profile API
  • Groups API
  • Share API

R Cheat Sheets

Friday 13 Mar 2015

Posted by Max Hemingway in Data Science, Programming

Tags

Data Science, Programming, R

There is a good collection of R Cheat Sheets at RStudio that cover:

  • Package Development Cheat Sheet
  • Data Wrangling Cheat Sheet (using dplyr and tidyr)
  • R Markdown Cheat Sheet
  • R Markdown Reference Guide (using Markdown, Knitr and Pandoc)

Correlation does not imply causation

Tuesday 10 Mar 2015

Posted by Max Hemingway in Data Science

Tags

Data Science, learning

Watching different Data Science & Statistics training videos, one statement that comes up often is “Correlation does not imply causation”.

Wikipedia defines this as:

Correlation does not imply causation is a phrase used in science and statistics to emphasize that a correlation between two variables does not necessarily imply that one causes the other.

The xkcd comic site has a great strip on the subject.

[Image: correlation]

To illustrate this further, there are a number of Graphs that have been put together that visualise why Correlation does not imply causation.

[Image: graph]

If, however, Mozzarella Cheese is ever found to be linked to Engineering Doctorates, it will be time to buy shares in the cheese manufacturing companies, as sales would soar!
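
To see how easily a strong correlation can appear between unrelated things, here is a minimal Python sketch. The two series below are invented for illustration (they are not the figures behind the graph above): both simply trend upwards over time, and that alone is enough to produce a Pearson correlation close to 1.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up yearly figures: both rise steadily, but neither causes the other.
cheese_kg_per_person = [9.0, 9.4, 9.5, 9.9, 10.1, 10.6, 10.8, 11.0]
doctorates_awarded = [480, 500, 510, 540, 555, 580, 600, 610]

r = pearson(cheese_kg_per_person, doctorates_awarded)
print(f"Pearson r = {r:.3f}")
```

The high value of r says only that the two series move together. Here the shared upward trend over time explains it, which is exactly the trap the phrase warns about.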

Data Science in the Cloud ebook

Monday 09 Mar 2015

Posted by Max Hemingway in Cloud, Data Science

Tags

Cloud, Data Science

Microsoft have released a free e-book to download about using Data Science, R and Azure ML (Machine Learning).

Data Science in the Cloud with Microsoft Azure Machine Learning and R

The topics in the book cover:

  • Data management with Azure ML.
  • Data transformation with Azure ML and R.
  • Data I/O between Azure ML and the R Scripts.
  • R graphics with Azure ML.
  • Building and evaluating machine learning models with Azure ML and R.
  • Publishing Azure ML models as a web service.

Added to my ever-increasing pile of things to read.

Course on Data Analysis and Statistical Inference

Friday 06 Mar 2015

Posted by Max Hemingway in Data Science

Tags

Data Science, learning, R

Scanning my daily feeds from Feedly, I came across this post about a new Data Analysis and Statistical Inference course on Coursera that has just started this week. It looks to be a good grounding on the subject.

The course is split into 7 modules:

  • Unit 1 – Introduction to data
  • Unit 2 – Probability and distributions
  • Unit 3 – Foundations for inference
  • Unit 4 – Inference for numerical variables
  • Unit 5 – Inference for categorical variables
  • Unit 6 – Introduction to linear regression
  • Unit 7 – Multiple linear regression

The timetable of work over 10 weeks:

Week 1: Introduction to Data, March 2 – 9

  • Review the START HERE! pages
  • Review Learning Objectives for Unit 1
  • Watch the videos for Unit 1 Introduction to Data
  • Start Quiz 1 — due at 13:00 EST (-5:00), Monday, March 16
  • Begin Lab 0 — due at 13:00 EST (-5:00), Monday, March 16, this lab is not graded (for practice)
  • Begin Lab 1 — due at 13:00 EST (-5:00), Monday, March 16
  • Explore the Discussion Forums and contribute

Week 2: Probability and Distributions, March 9 – 16

  • Review Learning Objectives for Unit 2
  • View videos for Unit 2 Probability and Distributions
  • Start Quiz 2 — due at 13:00 EST (-5:00), Monday, March 23
  • Begin Lab 2 — due at 13:00 EST (-5:00), Monday, March 23
  • Explore the Discussion Forums and contribute
  • Begin your Project Proposal — due at 13:00 EST (-5:00), Monday, March 23
  • Complete Quiz 1 — due at 13:00 EST (-5:00), Monday, March 16
  • Complete Lab 0 and Lab 1 — due at 13:00 EST (-5:00), March 16

Week 3: Foundations for Inference, March 16 – 23

  • Review Learning Objectives for Unit 3
  • View videos for Unit 3 Foundations for Inference
  • Start Quiz 3 — due at 13:00 EST (-5:00), Monday, March 30
  • Begin Labs 3A and 3B — due at 13:00 EST (-5:00), Monday, March 30
  • Explore the Discussion Forums and contribute
  • Submit your Project Proposal before 13:00 EST (-5:00), Monday, March 23
  • Complete Quiz 2 — due at 13:00 EST (-5:00), Monday, March 23
  • Complete Lab 2 — due at 13:00 EST (-5:00), Monday, March 23

Week 4: Foundations for Inference and Midterm, March 23 – 30

  • No new materials
  • Review Learning Objectives for Unit 3
  • Complete videos for Unit 3 Foundations for Inference
  • Complete Quiz 3 — due at 13:00 EST (-5:00), Monday, March 30
  • Complete Lab 3A and 3B — due at 13:00 EST (-5:00), Monday, March 30
  • Begin assessing Project Proposals — due 13:00 EST (-5:00), Monday, April 6
  • Begin Midterm — due 13:00 EST (-5:00), Monday, April 6

Week 5: Statistical Inference for Numerical Variables, March 30 – April 6

  • Review Learning Objectives for Unit 4
  • View videos for Unit 4 Statistical Inference for Numerical Variables
  • Start Quiz 4 — due at 13:00 EST (-5:00), Monday, April 13
  • Begin Lab 4 — due at 13:00 EST (-5:00), Monday, April 13
  • Explore the Discussion Forums and contribute
  • Complete Project Proposal assessments — due 13:00 EST (-5:00), Monday, April 6
  • Complete Midterm — due 13:00 EST (-5:00), Monday, April 6
  • Please submit at least 3 hours before the deadline

Week 6: Statistical Inference for Categorical Variables, April 6 – 13

  • Review Learning Objectives for Unit 5
  • View videos for Unit 5 Statistical Inference for Categorical Variables
  • Start Quiz 5 — due at 13:00 EST (-5:00), Monday, April 20
  • Begin Lab 5 — due at 13:00 EST (-5:00), Monday, April 20
  • Begin Data Analysis Project — due at 13:00 EST (-5:00), Monday, April 20
  • Explore the Discussion Forums and contribute
  • Complete Quiz 4 — due at 13:00 EST (-5:00), Monday, April 13
  • Complete Lab 4 — due at 13:00 EST (-5:00), Monday, April 13

Week 7: Simple Linear Regression, April 13 – 20

  • Review Learning Objectives for Unit 6
  • View videos for Unit 6 Simple Linear Regression
  • Start Quiz 6 — due at 13:00 EST (-5:00), Monday, April 27
  • Begin Lab 6 — due at 13:00 EST (-5:00), Monday, April 27
  • Explore the Discussion Forums and contribute
  • Complete Quiz 5 — due at 13:00 EST (-5:00), Monday, April 20
  • Complete Lab 5 — due at 13:00 EST (-5:00), Monday, April 20
  • Complete Data Analysis Project — due at 13:00 EST (-5:00), Monday, April 20

Week 8: Multiple Linear Regression, April 20 – 27

  • Review Learning Objectives for Unit 7
  • View videos for Unit 7 Multiple Linear Regression
  • Start Quiz 7 — due at 13:00 EST (-5:00), Monday, May 4
  • Begin Lab 7 — due at 13:00 EST (-5:00), Monday, May 4
  • Explore the Discussion Forums and contribute
  • Begin review of Data Analysis Project due at 13:00 EST (-5:00), Monday, May 4
  • Complete Quiz 6 — due at 13:00 EST (-5:00), Monday, April 27
  • Complete Lab 6 — due at 13:00 EST (-5:00), Monday, April 27

Week 9: Review and catch up, April 27 – May 4

  • Note that all assignment due times are now in Eastern Standard Time (EST)
  • View review videos
  • Complete Quiz 7 — due at 13:00 EST (-5:00), Monday, May 4
  • Complete Lab 7 — due at 13:00 EST (-5:00), Monday, May 4
  • Explore the Discussion Forums and contribute
  • Complete review of Data Analysis Project — due at 13:00 EST (-5:00), Monday, May 4
  • Final exam is available this week — due 13:00 EST (-5:00), Monday, May 11

Week 10: Final Exam, May 4 – 11

(Source: https://www.coursera.org/course/statistics)
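
Since every deadline is quoted as 13:00 EST (-5:00), it can help to convert it to UTC before working out your own local time. This is a minimal Python sketch, assuming the fixed -5:00 offset stated in the timetable and the year 2015 (taken from the date of this post); the function name is just an illustrative helper.

```python
from datetime import datetime, timedelta, timezone

# The timetable quotes deadlines as EST with a fixed -5:00 offset.
EST = timezone(timedelta(hours=-5), "EST")


def deadline_utc(year, month, day, hour=13, minute=0):
    """Return a course deadline (given in EST, -5:00) as a UTC datetime."""
    local = datetime(year, month, day, hour, minute, tzinfo=EST)
    return local.astimezone(timezone.utc)


# Quiz 1 deadline from Week 1: 13:00 EST, Monday, March 16 (year assumed to be 2015).
quiz1 = deadline_utc(2015, 3, 16)
print(quiz1.strftime("%Y-%m-%d %H:%M %Z"))  # 2015-03-16 18:00 UTC
```

From the UTC value, calling astimezone() with no argument converts to whatever time zone the machine running the script is set to.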

Data Scientist Job Titles, Architecture and Software Warlocks

Monday 23 Feb 2015

Posted by Max Hemingway in Architecture, Data Science

Tags

Architecture, Data Science

An interesting piece of research on Data Scientist Job Titles has been carried out using LinkedIn data on over 10,000 professionals.

The data splits out into 11 categories listing 700+ Job Titles:

  • Recruiter
  • Engineering
  • Developer
  • Data Plumbing
  • Data Science
  • Statistician
  • Research
  • Business Analytics
  • Consultant
  • Trainer
  • Student

The raw data is available to play with. I have been looking at the Architecture job titles that fall into the “Data Plumbing” category to get an initial view of Architecture roles within Data Science and what that means. I will continue this research and blog about it later.

[Image: Architect]

As a bit of fun, if you want a generated Job Title, one app to try is this one, which generates a Job Title for you and comes up with “Your Silicon Valley job title is…”

My favourite generated title is “Software Warlock”.

[Image: Software Warlock]

A close second is “Grand Poobah of Digital Innovation”.

Mapping Social Media Clickbait in R and ggplot2

Monday 16 Feb 2015

Posted by Max Hemingway in Data Science, Open Source, Social Media

Tags

Data Science, Social Media

Have you been caught out by the ever-increasing world of Clickbait, enticed in by titles like “You’ll Never Believe What The Parrot Did Next!” in your Social Media feeds, such as Facebook?

The main purpose of Clickbait is to get you to a site where adverts are displayed, so that you click onward and generate revenue for the site’s owners.

Max Woolf @ Minimaxir.com has recently mapped out 15,656 BuzzFeed Listicles which have been shared on Facebook.

[Image: BuzzFeed]

This has been achieved using R and ggplot2, based on a dataset from BuzzFeed. A copy of the code is also available in the author’s GitHub repository.

Looking at the dataset itself:

The top 3 articles shared:

  • 41 Camping Hacks That Are Borderline Genius – 1,734,676 Shares
  • 50 Things That Look Just Like Your Childhood – 1,655,900 Shares
  • 27 Surreal Places To Visit Before You Die – 1,329,602 Shares

The bottom 3 articles shared:

  • 8 Celebrity Tweets You Missed Today – 1 share
  • 7 Outtakes From Out100s 2012 Portraits – 1 share
  • 5 Questions About The JOBS Act Vote And Whats Changed – 1 share

The total number of shares in the data set is a staggering 185,415,297.
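
To put those numbers in context, a few lines of Python show how much of the total the top 3 articles account for, and what the average per listicle works out to. This is a quick sketch using only the figures quoted in this post; it is not part of Max Woolf's R code.

```python
# Share counts quoted above for the top 3 listicles.
top_shares = {
    "41 Camping Hacks That Are Borderline Genius": 1_734_676,
    "50 Things That Look Just Like Your Childhood": 1_655_900,
    "27 Surreal Places To Visit Before You Die": 1_329_602,
}
TOTAL_SHARES = 185_415_297  # total shares in the data set, as quoted above
TOTAL_LISTICLES = 15_656    # number of listicles mapped, as quoted above

top_total = sum(top_shares.values())
top_fraction = top_total / TOTAL_SHARES
mean_shares = TOTAL_SHARES / TOTAL_LISTICLES

print(f"Top 3 combined: {top_total:,} shares")
print(f"Share of total: {top_fraction:.1%}")
print(f"Mean shares per listicle: {mean_shares:,.0f}")
```

The top 3 together make up only a few percent of all shares, while the bottom of the list sits at a single share each, so the distribution has a very long tail.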

Testing your base R skills

Monday 09 Feb 2015

Posted by Max Hemingway in Data Science, Programming

Tags

Data Science, Programming

Fancy testing your R skills? There is a Base R Assessment written by Francis Smart now available. It presents a good challenge, testing your skills across 5 different skill levels.

The tests are slightly slow at the moment but worth a go to challenge yourself.

[Image: R Test]

Source: http://www.econometricsbysimulation.com/2015/02/base-r-assessment.html

Statistical Inference Resources

Tuesday 03 Feb 2015

Posted by Max Hemingway in Data Science, Machine Learning

Tags

Data Science, Machine Learning, Statistical Inference

Here are some useful links and resources for learning Statistical Inference:

  • Statistical Inference – Coursera Data Science series
  • An Introduction to Statistical Learning with Applications in R
  • DSO 530: Applied Modern Statistical Learning Techniques
  • The Elements of Statistical Learning
  • Wikibook of Statistics
  • Introduction to Statistical Thought
  • Advanced Data Analysis from an Elementary Point of View
  • simpleR – Using R for Introductory Statistics
  • Forecasting: Principles and Practice

*Check the sites for any applicable rules around downloading, of course.

Have you tried R yet?

Wednesday 28 Jan 2015

Posted by Max Hemingway in Data Science

Tags

Data Science, Programming

If you have not yet had a chance to try “R” as a language, here is a good site for having a go at some of the functions and power of the R programming language.

Code School – Try R

There are 8 lessons in this group:

  1. Using R
  2. Vectors
  3. Matrices
  4. Summary Statistics
  5. Factors
  6. Data Frames
  7. Real-World Data
  8. What’s Next

The lessons present you with an R interface so you don’t have to have the R software installed on your end device.

There are also other good lessons available from http://www.codeschool.com such as Try Git and Git Real.

License

Creative Commons Licence
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
