
Max Hemingway

~ Musings as I work through life, career and everything.


Tag Archives: Big Data

Learning Data Science – Useful References

Tuesday 14 July 2015

Posted by Max Hemingway in Big Data, Data Science, Machine Learning, Open Source


Tags

Big Data, Data, Data Science, Knowledge, Machine Learning

Firstly, thanks to Tim Osterbuhr, who prompted me to create this list of resources that I have found useful in learning about Data Science after he read my blog post on Learning Data Science. Tim has also provided some of the links below.

Here is the list of Useful References for Learning Data Science. (This list is by no means exhaustive.)

From my Blog

  • Learning Data Science
  • Data Science in the Cloud ebook
  • Data Science and Information Theory
  • Data Mining Courses
  • Open Source, Open Human, Open Data, Open Sesame!
  • Data Scientist Skill Set
  • R {swirls} – Learning R by doing
  • Correlation does not imply causation
  • Statistical Inference Resources

From Around the Web

  • 6 checkpoints to ensure regression model validity for analytics
  • Algorithms: Design and Analysis
  • Analyzing Big Data with Twitter
  • Big Data Analytics: Descriptive Vs. Predictive Vs. Prescriptive
  • Data Analysis
  • Data Mining for the Masses
  • Data Science Course
  • Google Visualization API Reference
  • k-means clustering
  • Occam’s Razor
  • PCA Step by Step
  • Regression Equation: What it is and How to use it
  • Using JavaScript visualization libraries with R

Public Data Sets

  • http://www.cs.cmu.edu/~./enron/
  • http://www.secviz.org/content/the-davix-live-cd
  • http://www.caida.org/data/overview/
  • http://www.secviz.org/content/visual-analytics-workshop-with-worlds-leading-security-visualization-expert-0
  • http://snap.stanford.edu/data/
  • http://analytics.ncsu.edu/
  • https://code.google.com/p/google-refine/

Data Science Books

  • 9 Free Books for Learning Data Mining & Data Analysis
  • 16 Free Data Science Books
  • 27 free data mining books

Happy to add other links from readers to this list.


Techdays Online Azure Special

Tuesday 2 June 2015

Posted by Max Hemingway in Architecture, Big Data, Cloud, DevOps/OpsDev, IoT, Machine Learning


Tags

Architecture, Big Data, Cloud, DevOps, IoT, Machine Learning, Open Source, OpsDev

Microsoft are running a TechDays Online Azure Special over the next three days.

Registration is at https://info.microsoft.com/UK-Azure-WBNR-FY15-06Jun-Azure-Techdays-2015-Registration.html

  • June 02, 2015 09:00 AM – TechDays Online Azure Special Day One: Keynotes, IOT, Hybrid and Open Source
  • June 03, 2015 09:00 AM – TechDays Online Azure Special Day Two: Apps, Architecture, Big Data and Machine Learning
  • June 04, 2015 09:00 AM – TechDays Online Azure Special Day Three: Cloud Infrastructure and Dev Ops

Hopefully the sessions will be available on demand after the event, for reference and catch-up.


Open Source Web Crawlers and Data Sets

Friday 15 May 2015

Posted by Max Hemingway in Big Data, Data Science


Tags

Big Data, Data, Data Science

A great list of 50 Open Source Web Crawlers has been produced by Baiju NT on the Big Data Made Simple blog.

Web crawlers are useful for gathering data from other sites when performing research, although caution should be used: with today’s levels of protection, some sites’ defences may treat your data gathering as an attack.
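As a minimal sketch of crawling with that caution in mind, the Python below uses the standard library’s `urllib.robotparser` to honour a site’s robots.txt rules and any `Crawl-delay` it declares, pausing between requests. The robots.txt contents, the `research-bot` user agent, the example.com URLs and the caller-supplied `fetch` function are all hypothetical; a real crawler would load the live robots.txt (for example with `RobotFileParser.set_url()` and `read()`) and should still respect each site’s terms of use.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, parsed offline for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

def build_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser without any network access."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules

def polite_schedule(urls, rules, user_agent="research-bot", default_delay=1.0):
    """Return (allowed_urls, delay_seconds): the URLs robots.txt permits for this
    user agent, and the pause to leave between requests."""
    allowed = [u for u in urls if rules.can_fetch(user_agent, u)]
    delay = rules.crawl_delay(user_agent) or default_delay
    return allowed, float(delay)

def crawl(urls, rules, fetch, user_agent="research-bot"):
    """Fetch only permitted URLs, sleeping between requests to rate-limit the crawl.
    `fetch` is a hypothetical caller-supplied function (e.g. built on urllib.request)."""
    allowed, delay = polite_schedule(urls, rules, user_agent)
    results = {}
    for i, url in enumerate(allowed):
        if i:
            time.sleep(delay)  # no pause before the first request
        results[url] = fetch(url)
    return results
```

With these rules, a URL under `/private/` is skipped and requests are spaced two seconds apart.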

It’s probably best to check first whether any data sets exist with the data you are looking for.

https://www.quandl.com/ is a search engine for data sets, with 12 million data sets listed.

There are lots of data sets available from governments such as http://data.gov.uk/ in the UK.

If a smaller list of good data sources is needed, have a look at http://www.kdnuggets.com/datasets/index.html

Sources:

  • https://www.quandl.com/
  • http://www.kdnuggets.com/datasets/index.html
  • http://bigdata-madesimple.com/top-50-open-source-web-crawlers-for-data-mining/


Data Mining Courses

Tuesday 28 April 2015

Posted by Max Hemingway in Big Data, Data Science


Tags

Big Data, Data, Data Science, learning

Via Coursera, the University of Illinois at Urbana-Champaign is running a specialisation on Data Mining. As with all Coursera courses, you don’t have to take the whole specialisation; you can take the courses individually or one after another. Taking the courses outside of the specialisation means that you won’t get to complete the capstone project and earn the certificate at the end.

This track is made up of 5 courses, covering:

Pattern Discovery in Data Mining

  • Introduction to data mining
  • Concepts and challenges in pattern discovery and analysis
  • Scalable pattern discovery algorithms
  • Pattern evaluation
  • Mining flexible patterns in multi-dimensional space
  • Mining sequential patterns
  • Mining graph patterns
  • Pattern-based classification
  • Application examples of pattern discovery
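The course outline doesn’t name particular pattern discovery algorithms, so as an illustration of my own choosing, here is a minimal Python sketch of Apriori, a classic frequent-itemset algorithm. It grows itemsets one item at a time and prunes any candidate with an infrequent subset; the shopping baskets in the usage note are made up.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for every itemset that appears in
    at least `min_support` transactions, found level by level (Apriori)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count every single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = list(frequent)
        # Join step: combine frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: drop candidates with any infrequent (k-1)-subset
        # (the Apriori property: every subset of a frequent itemset is frequent).
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        # Count support for the surviving candidates with one pass over the data.
        frequent = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t)
            if support >= min_support:
                frequent[c] = support
        result.update(frequent)
        k += 1
    return result
```

For example, with the baskets `{bread, milk}`, `{bread, butter}`, `{bread, milk, butter}` and `{milk}` and `min_support=2`, the pairs `{bread, milk}` and `{bread, butter}` are frequent (support 2), while `{milk, butter}` (support 1) is not.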

Text Retrieval and Search Engines

  • Introduction to text data mining
  • Basic concepts in text retrieval
  • Information retrieval models
  • Implementation of a search engine
  • Evaluation of search engines
  • Advanced search engine technologies
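To give a flavour of an information retrieval model, here is a minimal Python sketch that ranks documents by the summed TF-IDF weights of the query terms they contain. This is a simplified scoring of my own (whitespace tokenisation, raw term frequency, natural-log IDF, no length normalisation), not necessarily the scheme taught on the course.

```python
import math
from collections import Counter

def tfidf_rank(query, docs):
    """Rank documents by the sum of TF-IDF weights of the query terms they contain.
    Returns a list of (doc_index, score), highest score first."""
    tokenised = [d.lower().split() for d in docs]
    n = len(docs)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for doc in tokenised for term in set(doc))
    scores = []
    for i, doc in enumerate(tokenised):
        tf = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term in tf:
                idf = math.log(n / df[term])  # rarer terms carry more weight
                score += tf[term] * idf
        scores.append((i, score))
    # sorted() is stable, so tied documents keep their original order.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

With the documents "data mining finds patterns in data", "search engines rank documents" and "text mining and search", the query "data search" ranks the first document highest: "data" occurs twice there and in no other document, whereas "search" appears in two of the three documents and so carries less weight.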

Cluster Analysis in Data Mining

  • Basic concept and introduction
  • Partitioning methods
  • Hierarchical methods
  • Density-based methods
  • Probabilistic models and EM algorithm
  • Spectral clustering
  • Clustering high dimensional data
  • Clustering streaming data
  • Clustering graph data and network data
  • Constraint-based clustering and semi-supervised clustering
  • Application examples of cluster analysis
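Partitioning methods are a natural starting point, and k-means is the best-known of them. Below is a minimal Python sketch of Lloyd’s k-means algorithm on 2-D points, assuming squared Euclidean distance and initial centroids drawn at random from the data. It is an illustration, not the course’s own implementation.

```python
import random

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k distinct input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda j: (p[0] - centroids[j][0]) ** 2 + (p[1] - centroids[j][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append((
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                ))
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels
```

On two well-separated groups such as `(0, 0), (0, 1), (1, 0)` and `(10, 10), (10, 11), (11, 10)` with `k=2`, the sketch assigns each group its own label, with centroids at about `(1/3, 1/3)` and `(31/3, 31/3)`.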

Text Mining and Analytics

  • Overview of text analytics and applications
  • Extending a search engine to support text analytics (text categorization, text clustering, text summarization)
  • Topic mining and analysis with statistical topic models
  • Opinion mining and summarization
  • Integrative analysis of text and structured data

Data Visualization

  • Visualization Infrastructure (graphics programming and human perception)
  • Basic Visualization (charts, graphs, animation, interactivity)
  • Visualizing Relationships (hierarchies, networks)
  • Visualizing Information (text, databases)

These courses would complement the courses from Johns Hopkins on Data Science.

Source: https://www.coursera.org/specialization/datamining/20


Big Data – 4V’s + Verification

Monday 27 April 2015

Posted by Max Hemingway in Big Data, Data Science, IoT


Tags

Big Data, Data, Data Science, Infographic, IoT

IBM have released an Infographic on the “Four V’s of Big Data” which covers:

  • Volume – Scale of Data
  • Variety – Different forms of Data
  • Velocity – Analysis of Streaming Data
  • Veracity – Uncertainty of Data

[Image: IBM infographic, “The Four V’s of Big Data”]

There should be another V for “Verification”, which covers the questions you ask of the data in order to obtain the results. The inferences drawn from the results should also be checked, as different views of the data, or questions asked in a slightly different way, could produce completely different outcomes.
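As a small illustration of how the phrasing of a question changes the answer, consider a hypothetical set of order values. Asking for the average order and asking what the middle customer spends sound like the same question, but the Python sketch below (using the standard `statistics` module) gives very different answers because of one large outlier.

```python
from statistics import mean, median

# Hypothetical order values: most orders are small, one is very large.
orders = [20, 22, 25, 25, 30, 35, 1000]

# Two slightly different phrasings of "what does a typical order look like?"
average_order = mean(orders)   # "What is the average order value?"
middle_order = median(orders)  # "What does the middle customer spend?"

print(f"mean = {average_order:.2f}, median = {middle_order}")
# prints: mean = 165.29, median = 25
```

Neither answer is wrong; verifying which question the business actually needs answered is what decides which figure to report.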

Having the right data is important, as is ensuring that the data gathered and collected is relevant to the business questions you are asking. Two stats in the infographic stand out for me on this:

  • $3.1 trillion a year lost to poor data quality
  • 40 zettabytes of data created by 2020

Perhaps with the right Verification there would be less uncertainty (Veracity), and businesses could save much of the money and time currently lost to incorrect data.


Do you know Big Data?

Tuesday 7 April 2015

Posted by Max Hemingway in Big Data, Data Science, Tools


Tags

Big Data, Data, Data Science, Knowledge

Whilst looking into some suitable questions to ask about Big Data, I came across an excellent poster titled “Do you know Big Data?” produced by Altamira.

The poster covers a set of questions that help you interrogate Big Data and a Big Data project:

  • What is Big Data?
  • What types of Big Data are there?
  • How do we extract knowledge from Big Data?
  • What do we do with knowledge we extract?
  • What types of Visual Techniques are there?
  • What types of Statistical Algorithms are there?
  • How big is Big Data?
  • What is a Data Scientist?
  • How do we implement Big Data solutions?
  • How do we address privacy and ethics in Big Data?
  • How do we secure Big Data?
  • What are leading Big Data tools?
  • What questions should we ask about Databases?
  • What questions should we ask about Predictive Tools?

[Image: Altamira poster, “Do you know Big Data?”]

A useful tool as a starting place to research further elements of Big Data.


Technology Couch Podcast


Topical discussions with different guests on Technology

Chat and views on latest Technology trends, news and what is currently hot in the industry


Currently Reading

@HemingwayReads

Other Publications I contribute to

https://sparrowhawkbushcraft.com/

Recent Posts

  • My Virtual Selfie – Avatars and Identity Security
  • Air Launching Satellites into Space – UK First
  • Our Acceptance of Modern Technologies
  • Pen based Productivity Tools – The Chronodex 2023
  • Top 10 Tech Podcasts for 2023

Categories

  • 21st Century Human
  • 3D Printing
  • Applications
  • Architecture
  • Arduino
  • Automation
  • BCS
  • Big Data
  • Certification
  • Cloud
  • Cobotics
  • Connected Home
  • Data
  • Data Fellowship
  • Data Science
  • Development
  • DevOps/OpsDev
  • Digital
  • DigitalFit
  • Drone
  • Enterprise Architecture
  • F-TAG
  • Governance
  • Health
  • Innovation
  • IoT
  • Machine Learning
  • Micro:Bit
  • Mindset
  • Mobiles
  • Networks
  • Open Source
  • Podcasts
  • Productivity
  • Programming
  • Quantum
  • Raspberry Pi
  • Robotics
  • Scouting
  • Scouts
  • Security
  • Smart Home
  • Social Media
  • Space
  • STEM
  • Tools
  • Uncategorized
  • Wearable Tech
  • Windows
  • xR

Top Posts & Pages

  • Data Fellowship - BCS Level 4 Certificate in Data Analysis Tools
  • A formula for Innovation
  • Have you tried R yet?
  • Personal Knowledge Management System
  • The Nature and Cycle of CPD
  • Data Fellowship - BCS Level 4 Diploma in Data Analysis Concepts
  • My Virtual Selfie - Avatars and Identity Security


Tags

# 3D Printing 21st Century Human Applications Architecture Automation BCS Big Data Blockchain Certification Cloud Cobot Cobotics Coding Communication Connected Home Continuous Delivery CPD Data Data Fellowship Data Science Delivery Development DevOps Digital DigitalFit Digital Human Docker Drone Email Encryption Enterprise Architecture Framework GTD Hashtag Infographic Information Theory Innovation IoT Journal Knowledge learning Machine Learning Micro:Bit MicroLearning Mindset Mixed Reality Networks Open Source OpsDev PKMS Podcasts Productivity Programming Proving It R RaspberryPI Robot Robotics Scouts Security Smart Home Social Media Standards Statistical Inference STEM Technology Couch Podcast Thinking Tools Training Visualisation Voice Wearable Tech Windows xR

License

Creative Commons Licence
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
