Max Hemingway

~ Musings as I work through life, career and everything.

Tag Archives: Big Data

Learning Data Science – Useful References

14 Tuesday Jul 2015

Posted by Max Hemingway in Big Data, Data Science, Machine Learning, Open Source

≈ 1 Comment

Tags

Big Data, Data, Data Science, Knowledge, Machine Learning

Firstly, thanks to Tim Osterbuhr, who prompted me to create this list of resources that I have found useful in learning about Data Science after he read my blog post on Learning Data Science. Tim has also provided some of the links below.

Here is the list of Useful References for Learning Data Science. (This list is by no means exhaustive.)

From my Blog

  • Learning Data Science
  • Data Science in the Cloud ebook
  • Data Science and Information Theory
  • Data Mining Courses
  • Open Source, Open Human, Open Data, Open Sesame!
  • Data Scientist Skill Set
  • R {swirls} – Learning R by doing
  • Correlation does not imply causation
  • Statistical Inference Resources

From Around the Web

  • 6 checkpoints to ensure regression model validity for analytics
  • Algorithms: Design and Analysis
  • Analyzing Big Data with Twitter
  • Big Data Analytics: Descriptive Vs. Predictive Vs. Prescriptive
  • Data Analysis
  • Data Mining for the Masses
  • Data Science Course
  • Google Visualization API Reference
  • k-means clustering
  • Occam’s Razor
  • PCA Step by Step
  • Regression Equation: What it is and How to use it
  • Using JavaScript visualization libraries with R

Public Data Sets

  • http://www.cs.cmu.edu/~./enron/
  • http://www.secviz.org/content/the-davix-live-cd
  • http://www.caida.org/data/overview/
  • http://www.secviz.org/content/visual-analytics-workshop-with-worlds-leading-security-visualization-expert-0
  • http://snap.stanford.edu/data/
  • http://analytics.ncsu.edu/
  • https://code.google.com/p/google-refine/

Data Science Books

  • 9 Free Books for Learning Data Mining & Data Analysis
  • 16 Free Data Science Books
  • 27 free data mining books

Happy to add other links from readers to this list.

Techdays Online Azure Special

02 Tuesday Jun 2015

Posted by Max Hemingway in Architecture, Big Data, Cloud, DevOps/OpsDev, IoT, Machine Learning

≈ Leave a comment

Tags

Architecture, Big Data, Cloud, DevOps, IoT, Machine Learning, Open Source, OpsDev

Microsoft are running a TechDays Online Azure Special over the next 3 days.

Registration is at https://info.microsoft.com/UK-Azure-WBNR-FY15-06Jun-Azure-Techdays-2015-Registration.html

  • June 02, 2015 09:00 AM – TechDays Online Azure Special Day One: Keynotes, IOT, Hybrid and Open Source
  • June 03, 2015 09:00 AM – TechDays Online Azure Special Day Two: Apps, Architecture, Big Data and Machine Learning
  • June 04, 2015 09:00 AM – TechDays Online Azure Special Day Three: Cloud Infrastructure and Dev Ops

Hopefully the sessions will be available offline after the event for reference and catch up.

Open Source Web Crawlers and Data Sets

15 Friday May 2015

Posted by Max Hemingway in Big Data, Data Science

≈ 1 Comment

Tags

Big Data, Data, Data Science

A great list of 50 Open Source Web Crawlers has been produced by Baiju NT on a Big Data Blog.

Web Crawlers are useful for gathering data from other sites when performing research, although caution is needed: with today’s levels of protection, some sites’ defences may treat your data gathering as an attack.
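One way to reduce the risk of being mistaken for an attacker is to honour a site's robots.txt before crawling it. The sketch below uses Python's standard `urllib.robotparser` module to do that. The robots.txt content, the `my-research-bot` user agent and the example.com URLs are made up for illustration; a real crawler would fetch the file from the target site and should also pause between requests.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used here so the example runs offline.
# A real crawler would download this from the site it intends to visit.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(SAMPLE_ROBOTS_TXT)
USER_AGENT = "my-research-bot"  # assumed name, pick your own

for url in (
    "http://example.com/public/index.html",
    "http://example.com/private/data.html",
):
    verdict = "allowed" if may_fetch(parser, USER_AGENT, url) else "blocked"
    print(f"{url}: {verdict}")

# The requested delay (in seconds) between requests, if the site states one.
print("Crawl delay:", parser.crawl_delay(USER_AGENT))
```

Respecting these rules does not guarantee a site will welcome your crawler, but it is a sensible minimum before gathering any data.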

It’s probably best to check first whether any data sets already exist with the data you are looking for.

https://www.quandl.com/ is a search engine for data sets that has listed 12 million data sets.

There are lots of data sets available from governments such as http://data.gov.uk/ in the UK.

If a smaller list of good data sources is needed, have a look at http://www.kdnuggets.com/datasets/index.html

Sources:

  • https://www.quandl.com/
  • http://www.kdnuggets.com/datasets/index.html
  • http://bigdata-madesimple.com/top-50-open-source-web-crawlers-for-data-mining/

Data Mining Courses

28 Tuesday Apr 2015

Posted by Max Hemingway in Big Data, Data Science

≈ 1 Comment

Tags

Big Data, Data, Data Science, learning

Via Coursera, the University of Illinois at Urbana-Champaign is running a specialisation on Data Mining. As with all Coursera courses, you don’t have to take the specialisation, but can take the courses individually or one after another. Taking the courses outside of the specialisation means that you won’t get to complete the capstone project and earn your certificate at the end.

This track is made up of 5 courses covering:

Pattern Discovery in Data Mining

  • Introduction to data mining
  • Concepts and challenges in pattern discovery and analysis
  • Scalable pattern discovery algorithms
  • Pattern evaluation
  • Mining flexible patterns in multi-dimensional space
  • Mining sequential patterns
  • Mining graph patterns
  • Pattern-based classification
  • Application examples of pattern discovery

Text Retrieval and Search Engines

  • Introduction to text data mining
  • Basic concepts in text retrieval
  • Information retrieval models
  • Implementation of a search engine
  • Evaluation of search engines
  • Advanced search engine technologies

Cluster Analysis in Data Mining

  • Basic concept and introduction
  • Partitioning methods
  • Hierarchical methods
  • Density-based methods
  • Probabilistic models and EM algorithm
  • Spectral clustering
  • Clustering high dimensional data
  • Clustering streaming data
  • Clustering graph data and network data
  • Constraint-based clustering and semi-supervised clustering
  • Application examples of cluster analysis

Text Mining and Analytics

  • Overview of text analytics and applications
  • Extending a search engine to support text analytics (text categorization, text clustering, text summarization)
  • Topic mining and analysis with statistical topic models
  • Opinion mining and summarization
  • Integrative analysis of text and structured data

Data Visualization

  • Visualization Infrastructure (graphics programming and human perception)
  • Basic Visualization (charts, graphs, animation, interactivity)
  • Visualizing Relationships (hierarchies, networks)
  • Visualizing Information (text, databases)

These courses would complement the courses from Johns Hopkins on Data Science.

Source: https://www.coursera.org/specialization/datamining/20

Big Data – 4V’s + Verification

27 Monday Apr 2015

Posted by Max Hemingway in Big Data, Data Science, IoT

≈ Leave a comment

Tags

Big Data, Data, Data Science, Infographic, IoT

IBM have released an Infographic on the “Four V’s of Big Data” which covers:

  • Volume – Scale of Data
  • Variety – Different forms of Data
  • Velocity – Analysis of Streaming Data
  • Veracity – Uncertainty of Data

There should be another V for “Verification”, which covers the questions you ask of the data in order to obtain the results. The inferences drawn from the results should also be checked, as different views of the data, or questions asked in a slightly different way, could produce completely different outcomes.
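As a small illustration of how the wording of a question changes the answer, the sketch below asks two versions of "what is a typical order worth?" of the same data. The order values are invented for the example and are not taken from the infographic.

```python
from statistics import mean, median

# Hypothetical order values: five small orders and one very large one.
order_values = [10, 12, 11, 13, 12, 500]

# Question 1: "What is the average order worth?"
average_order = mean(order_values)

# Question 2: "What is the middle order worth?"
typical_order = median(order_values)

print("Average order value:", average_order)
print("Median order value:", typical_order)
```

Both figures are correct answers to a reasonable question, yet the single large order pulls the average far above the median. Verifying which question the business is actually asking matters as much as the data itself.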

Having the right data is important, as is ensuring that the data gathered is relevant to the business questions you are asking. Two stats in the infographic stick out for me on this:

  • $3.1 Trillion a year on poor data quality
  • 40 Zettabytes of data created by 2020

Perhaps with the right Verification there would be less uncertainty (Veracity), and a huge saving to businesses through reduced losses of money and time and less incorrect data.

Do you know Big Data?

07 Tuesday Apr 2015

Posted by Max Hemingway in Big Data, Data Science, Tools

≈ Leave a comment

Tags

Big Data, Data, Data Science, Knowledge

Whilst looking into some suitable questions to ask about Big Data, I came across an excellent poster titled “Do you know Big Data?” produced by Altamira.

The poster covers a set of questions that help you question Big Data and a Big Data project.

  • What is Big Data?
  • What types of Big Data are there?
  • How do we extract knowledge from Big Data?
  • What do we do with knowledge we extract?
  • What types of Visual Techniques are there?
  • What types of Statistical Algorithms are there?
  • How big is Big Data?
  • What is a Data Scientist?
  • How do we implement Big Data solutions?
  • How do we address privacy and ethics in Big Data?
  • How do we secure Big Data?
  • What are leading Big Data tools?
  • What questions should we ask about Databases?
  • What questions about Predictive Tools?

A useful tool as a starting place to research further elements of Big Data.

License

Creative Commons Licence
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
