Max Hemingway

~ Musings as I work through life, career and everything.

Tag Archives: Data

Experimental Mindset

10 Wednesday Feb 2021

Posted by Max Hemingway in 21st Century Human, Data, Data Science, Mindset

Tags

21st Century Human, Data, Data Science, Mindset

We have all at some time done some sort of experiment, from working out at a young age which cries and actions resulted in the reward of milk, to test driving cars to find which is best suited to our needs before buying. These are experiments that produced results from things we have tried, even if we may not have thought of them as developing an Experimental Mindset. In this article I am concentrating on how this applies to data.

Here are my notes from my research into the topic.

The main areas for an Experimental Mindset are:

  • Learning
  • Testing
  • Evaluating

In order to constantly learn you need to be open to learning and develop your Growth Mindset. I have covered this in another blog post so won't repeat it here: Having the Right Digital Mindset: Business (Change, Agility and a Growth Mindset).

Having an Experimental Mindset is one of the key traits of a Data Analyst or Data Scientist, and it is not a new term. It has been around as long as the fields of science and research have. These arenas have developed methodologies for testing and evaluating that have been adopted and taken forward by many other areas, such as business and computing.

At a high level this methodology can be shown as:

Observations –> Hypothesis –> Scientific Law

Overlaid with the areas for data this can be shown as:

Observations (Learning) –> Hypothesis (Testing) –> Scientific Law (Evaluating)

or as:

Observations (Data) –> Hypothesis (Product/Service) –> Scientific Law (Predictive Model)

Using this methodology, one of the more common types of Hypothesis Testing is A/B Testing. This sets out a framework for a simple controlled experiment against two versions (A and B) to look at the impact of changes to a thing or product. Some useful articles on A/B Testing are listed below that go into the details of it:

  • A/B Testing
  • A Beginner’s Guide To A/B Testing: An Introduction
  • A Refresher on A/B Testing

Udacity host a course by Google on A/B testing.
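
As a rough illustration of the A/B Testing described above, here is a minimal sketch of one common way to compare two versions: a two-proportion z-test on conversion rates. The function name and the visitor and conversion counts are made up for illustration, and this is only one of several ways such a test could be evaluated.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: 5000 visitors per version.
z, p = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=250, n_b=5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference between A and B is unlikely to be chance alone, but it says nothing about the risks listed in this post, which still need to be reviewed separately.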

There are some risks to A/B Testing that should be considered when reviewing the results:

  • Sampling Bias
  • Study Population
  • Target Population
  • Segmentation
  • World Time Zones
  • Data/Privacy Laws
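
To show why Segmentation appears in the list above, here is a small sketch of Simpson's Paradox (covered in the Further Reading), where one version wins in every segment but loses overall. All of the counts and segment names are made up for illustration.

```python
# Illustrative (made-up) counts: (conversions, visitors) per segment and version.
results = {
    "mobile":  {"A": (81, 87),   "B": (234, 270)},
    "desktop": {"A": (192, 263), "B": (55, 80)},
}

def rate(conversions, visitors):
    return conversions / visitors

def overall(variant):
    # Add up the raw counts across segments before dividing.
    conversions = sum(seg[variant][0] for seg in results.values())
    visitors = sum(seg[variant][1] for seg in results.values())
    return rate(conversions, visitors)

for segment, variants in results.items():
    print(segment, {v: round(rate(*counts), 3) for v, counts in variants.items()})
print("overall", {v: round(overall(v), 3) for v in ("A", "B")})
```

In this made-up data A converts better on both mobile and desktop, yet B looks better overall because the two versions were shown to very different mixes of visitors.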

I will go further into the realms of A/B testing in a later blog post.

Further Reading

  • 5 Benefits of Adopting an Experimental Mindset
  • A/B Testing
  • A Beginner’s Guide To A/B Testing: An Introduction
  • A Refresher on A/B Testing
  • Comparison of Segmentation Approaches
  • Design Thinking Mindsets for Human-Centered Design
  • Embracing an Experimental Mindset
  • Sampling Bias
  • Sampling bias: What is it and why does it matter?
  • Simpson’s Paradox and segmentation: why analysis is crucial
  • The Upside of an Experimental Mindset

Data Storytelling

08 Monday Feb 2021

Posted by Max Hemingway in Data, Data Science

Tags

Data, Data Science

Humans have been using the medium of storytelling since the beginning, but only really recording it from the moment a wet painted hand went onto a cave wall. These days we read stories in books or access stories over the internet on our tablets and other devices.

Photo by Suzy Hazelwood on Pexels.com

The main key to all storytelling is data in one form or another, from 1 x woolly mammoth and 3 x hunters (that's 4 items of data) in a cave painting to the complexity of how many bits and bytes are in an online book.

For a good explanation of "What is data?", Cassie Kozyrkov, Head of Decision Intelligence at Google, has written some great posts and videos on the subject.

So when we have data, we use stories to explain what it is telling us – hopefully not through 1000s of PowerPoint slides… Make it Stop!! What are you going to put in those slides that will keep the audience hooked and focused?

Stories are normally based around a simple concept of beginning, middle and end; however, there is more to it than that if you want to tell a good story.

The first thing to do before getting to the story is to make sure you understand what the data is telling you. If you don't understand the data and you're asked a question, will you be able to answer it or further illustrate your point? Keep in mind – EVALUATE – LEARN – PRACTICE. Then maybe practice some more until you are confident with what you're about to talk about.

Decluttered and simple visuals help to tell the story and keep the audience focused on what you are telling them, rather than spending their time trying to understand all the text and facts on the screen. Information is Beautiful, by David McCandless, is a site that shows some ways to display data visually in easy to understand ways. Here is his TED talk:

Stories normally follow a Hero's Journey, which takes the plot line through a series of steps to keep the audience wanting more and to continue to read the rest or listen until the end. When storytelling about data, a similar construct can be used based on the Hero's Journey:

| Sequence | Hero's Storytelling Step | Data Storytelling Step |
|---|---|---|
| 1 | Status Quo | What's the current normal |
| 2 | Call to Adventure | The Question (What is being asked of the data) |
| 3 | Assistance | What are the Sources |
| 4 | Departure | Turn the data into something understandable |
| 5 | Trials | Data Analysis |
| 6 | Approach | Methods used |
| 7 | Crisis | Data Modelling / Wrangling |
| 8 | Treasure | The Findings |
| 9 | Result | Result |
| 10 | Return | Presentation |
| 11 | New Life | New normal |
| 12 | Resolution | Review |
| 13 | End | End or maybe a different question? |

Table: Data Storytelling using a Hero's Journey

There is a good explanation of the different styles of Hero's Journey on Wikipedia; the above table is changed a bit from those. Here's a video that goes through a format:

Now we have a structure, how you tell the story is just as important. How can you persuade the audience about the data and point of view that you are presenting?

There are, then, these three means of effecting persuasion. The man who is to be in command of them must, it is clear, be able (1) to reason logically, (2) to understand human character and goodness in their various forms, and (3) to understand the emotions–that is, to name them and describe them, to know their causes and the way in which they are excited.

Aristotle

Aristotle set out his Powers of Persuasion in four areas:

  • Ethos – Author/Speaker (Character, Credibility, Authority, Truthfulness)
  • Pathos – How the topic affects you – connect and bridge the gap (Current emotional state, Target emotional state)
  • Logos – Why it affects you – story / proposal (Reasonableness, Consistency, Clarity)
  • Kairos – Time and place

Ethos – ‘It is not true, as some writers assume in their treatises on rhetoric, that the personal goodness revealed by the speaker contributes nothing to his power of persuasion; on the contrary, his character may almost be called the most effective means of persuasion he possesses.’

Pathos – ‘persuasion may come through the hearers, when the speech stirs their emotions. Our judgements when we are pleased and friendly are not the same as when we are pained and hostile.’

Logos – ‘persuasion is effected through the speech itself when we have proved a truth or an apparent truth by means of the persuasive arguments suitable to the case in question.’

Rhetoric, Aristotle

Kairos is an Ancient Greek word meaning the right, critical, or opportune moment.

How we can use these areas is illustrated in this example:

When preparing for the Storytelling session it's worth checking that you are not going to fall into the trap of the “echo chamber effect”. From my post on the subject I have created the following term to help me remember: STACK

  • Step Back
  • Think
  • Absorb other views
  • Challenge your thinking
  • communicate your Knowledge

Storytelling is more trustworthy than just presenting data on its own. One to consider when you create your next PowerPoint Presentation.

Further Reading

  • Data Storytelling: The Essential Data Science Skill Everyone Needs
  • 5 Data Storytelling Tips for Creating More Persuasive Charts and Graphs
  • Data Storytelling: What It Is, Why It Matters
  • What is data storytelling? Plus 5 great examples

Data Fellowship

08 Monday Feb 2021

Posted by Max Hemingway in Data, Data Science

Tags

Data, Data Science

Data, it’s everywhere, and there are thousands, millions, billions… let's just say “lots” of data created every second of the day, from articles and discussions on the internet, to texts and WhatsApp messages, to cars, to, well, anything with a chip in it really. It goes a huge way to ruling our lives and telling us how to live, from what to eat to the carbon footprint of the world. So when I was given an opportunity to undertake an apprenticeship in Data Analytics on a Data Fellowship Apprenticeship over the next 18 months, of course I was going to jump at that!

It is a great way to check my understanding and knowledge, learn many new things and, more importantly for me, gain a qualification at Data Analyst Level 4 standard.

So what is the So What? At the moment the programme is just starting, so there is not much to report back so far; however, I have started to document some of my journey and bits in my GitHub repo and will use this and my blog to record my thoughts and learnings going forward. Watch this space, as they say.

Your Digital Exhaust – The data we share

06 Wednesday May 2020

Posted by Max Hemingway in Connected Home, Data, Security, Social Media

≈ 1 Comment

Tags

Connected Home, Data, Security, Social Media

Everyone who uses a computer or mobile creates their own digital exhaust in the form of data that we leave behind and spew out of our devices – from location data to social media posts and videos. Other things we own, such as cars and houses, are also generating data, from SatNavs to Smart Meters.

If we could measure an individual's volume of data and information against today's climate change measures and visualise it, we would probably call it an ecological disaster on a person by person scale; however, we go about our daily lives creating data with and without knowing it.

To be clear, creating data does have a climate effect, as there are systems behind what we create and they all need power, cooling etc. However, I am putting the ecological effects to one side, as there is enough said already about climate and climate change, and focusing on the data itself.

At the beginning of 2020, the digital universe was estimated to consist of 44 zettabytes of data, which is 44 trillion gigabytes and growing. That’s a lot of data!
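
As a quick check of the arithmetic above, here is a small sketch that converts zettabytes to gigabytes. It assumes decimal (SI) prefixes, where a zettabyte is 10^21 bytes and a gigabyte is 10^9 bytes.

```python
# Decimal (SI) prefixes are assumed here.
ZETTABYTE = 10**21  # bytes
GIGABYTE = 10**9    # bytes

gigabytes = 44 * ZETTABYTE // GIGABYTE
print(f"{gigabytes:,} GB")  # prints "44,000,000,000,000 GB", i.e. 44 trillion
```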

We go about generating data without knowing or thinking until a news article catches our attention about something someone said many years ago. Recent times have seen an almost doubling of the use of the internet. This in turn increases the amount of data being created as people discover ways to help alleviate lockdown, from video calls to new dances on TikTok.

To put this into perspective a bit, with a trolley full of phones you can create a virtual traffic jam, but don't try that at home. This example illustrates the data being generated from a device and how others are using it, in this case to look at traffic patterns.

With this increase of posts and data about people across the many different platforms available, are you stopping to think about what you're posting? We go about generating data without thinking until a news article catches our attention about something someone said many years ago that has been found on a social platform somewhere.

Sci-Fi moment alert! In an episode of “The Orville” by Seth MacFarlane called “Lasting Impressions”, the crew of the Orville open a time capsule and recreate someone's life in a holodeck using just the data from an iPhone (after accessing a video on the phone in which its owner gives consent for the data to be used in the future), then interact with the phone's original owner. This provides the crew with a view into that person's life and what they were like.

Have you thought about what would happen to your data in the future?

This concept can easily be recreated today, and there are TV programmes that investigate and look at people to check who they really are (Catfish the TV show). It's easy to see how people leave a trail of digital evidence and clues from what they post when they are not secure in what they do or do not think about what they post.

Here are some good tips to help secure your online presence:

Privacy and security settings exist for a reason: Learn about and use the privacy and security settings on social networks. They are there to help you control who sees what you post and manage your online experience in a positive way.

Once posted, always posted: Protect your reputation on social networks. What you post online stays online. Think twice before posting pictures you wouldn’t want your parents or future employers to see. Recent research found that 70 percent of job recruiters rejected candidates based on information they found online.

Your online reputation can be a good thing: Recent research also found that recruiters respond to a strong, positive personal brand online. So show your smarts, thoughtfulness and mastery of the environment.

Keep personal info personal: Be cautious about how much personal information you provide on social networking sites. The more information you post, the easier it may be for a hacker or someone else to use that information to steal your identity, access your data or commit other crimes such as stalking.

Know and manage your friends: Social networks can be used for a variety of purposes. Some of the fun is creating a large pool of friends from many aspects of your life. That doesn’t mean all friends are created equal. Use tools to manage the information you share with friends in different groups or even have multiple online pages. If you’re trying to create a public persona as a blogger or expert, create an open profile or a “fan” page that encourages broad participation and limits personal information. Use your personal profile to keep your real friends (the ones you know and trust) up to date with your daily life.

Be honest if you’re uncomfortable: If a friend posts something about you that makes you uncomfortable or seems inappropriate, let them know. Likewise, stay open minded if a friend approaches you because something you’ve posted makes him or her uncomfortable. People have different tolerances for how much the world knows about them; respect those differences.

Know what action to take: If someone is harassing or threatening you, remove them from your friends list, block them and report them to the site administrator.

Keep security software current: Having the latest security software, web browser and operating system is the best defense against viruses, malware and other online threats.

Own your online presence: When applicable, set the privacy and security settings on websites to your comfort level for information sharing. It’s OK to limit how and with whom you share information.

Source: https://staysafeonline.org/stay-safe-online/securing-key-accounts-devices/social-media/

Additional tips are available at this source.

Further Reading

Tips on being Social Media Savvy

An A-Z Guide to being an Architect

07 Thursday Jan 2016

Posted by Max Hemingway in Architecture, Big Data, Cloud, Development, DevOps/OpsDev, Enterprise Architecture, Governance, Innovation, IoT, Open Source, Productivity, Programming, Security, Social Media, Tools

Tags

Architecture, Cloud, CPD, Data, Development, DevOps, Innovation, IoT, Knowledge, learning, Open Source, OpsDev, Productivity, Programming, Social Media

Back in 2008 Microsoft published An A-Z Guide to Being an Architect in their Architecture Journals.

Here is my take on an updated A to Z Guide to being an Architect. A couple of these may be similar.

A – Architect

Having the right level of skills as an Architect, or engaging an Architect with the right level of skills, will depend on the work needing to be undertaken. There are several types of Architect, with some specialising in certain areas and others being multi domain skilled. The list below covers some of the different types of Architect; this is not an exhaustive list:

  • Enterprise Architect
  • Information Architect
  • Solutions Architect
  • Software Architect
  • Systems Architect

B – Blueprints

Following Blueprints and Patterns, either published by vendors (such as the Microsoft Blueprints) or developed internally around your products and services, will help ensure repeatability and cost control around the design process.

Some examples showing different pattern types can be found at Architecture Patterns.

C – Contextual Web Era

The up and coming 4th Platform area is the Contextual Web Era

  • 1st Platform – Mainframe Era
  • 2nd Platform – Client Server Era
  • 3rd Platform – Cloud Era
  • 4th Platform – Contextual Web Era

This is an up and coming era with lots of new innovation and developments. Keeping up with developments is key going forward for any architect to understand designs/solutions, art of the possible now and future, innovation and for developing roadmaps for solutions.

D – DevOps

To quote Wikipedia – “DevOps (a clipped compound of “development” and “operations”) is a culture, movement or practice that emphasizes the collaboration and communication of both software developers and other information-technology (IT) professionals while automating the process of software delivery and infrastructure changes”. Having knowledge of DevOps, OpsDev and Agile assists with architecting a solution for a business and understanding their practices and modes of interacting with technology to meet business requirements. A good book on the subject of DevOps is “The Phoenix Project” by Gene Kim.

E – Enterprise Architecture

EA (Enterprise Architecture) is a blueprint that defines how a business can meet its objectives and strategy. This is achieved by conducting analysis, design, planning, recommendations and implementations through an Enterprise Architecture Framework.

Enterprise Architecture Wikibook

F – Four Two Zero One Zero

42010 is the ISO Standard that most frameworks adhere to. Working to a Framework brings structure to your designs and life cycles.

There are a number of frameworks available such as:

  • DoDAF
  • MoDAF
  • TOGAF
  • Zachman
  • Other Frameworks are available

Enterprise Architecture Wikipedia Book

G – Governance

Governance is an important part of architecture as it:

  • Ensures Conformance
  • Controls Variance
  • Maintains Vitality
  • Enables Communication
  • Sets Direction
  • Resolves Issues
  • Provides Guidance and Prioritisation
  • Promotes Best Practice
  • Minimises Risk
  • Protects IT environments from tactical IT changes, project solutions, and strategic proposals that are not in an organisation's global best interest
  • Controls Technical Diversity, Over-Engineering and Unnecessary Complexity
  • Ensures projects can proceed quickly & efficiently
  • Provides control over IT spend
  • Upholds Quality Standards
  • Makes efficient and optimal use of resources and increases the effectiveness of IT processes

H – Hands On

It is important to be current and understand the technologies you are architecting. There are lots of options available to get your hands dirty using technology, from Cloud Servers to virtual machines on your compute device. There are other computing devices such as the Raspberry Pi that provide a cheap alternative to standing up small farms to learn on.

I – IoT

IoT (Internet of Things) is where physical things are connected by the internet using embedded sensors, software, networks and electronics. This allows the items to be managed, controlled and reported on. My blog posts on IoT Device Security Considerations and Security Layers go into more detail on this subject.

J – Juxtaposition

Juxtaposition is something an architect should be doing to compare things/items/artefacts etc.

noun:
1. an act or instance of placing close together or side by side, especially for comparison or contrast.
2. the state of being close together or side by side.

Source: http://dictionary.reference.com/browse/juxtaposition

K – Knowledge

I would class Skills with Knowledge. It is important as an Architect to ensure that your skills/knowledge are up to date and where you are unsure of a technology, you have a plan to address and skill up. Build a good CPD (Continuing Professional Development) plan and work towards completing it.

L – Language

With the move to cloud it is important to ensure your scripting skills are up to date as most cloud platforms use scripting to assist with the deployment of environments. This is also true of other DevOps/OpsDev applications. If you are unsure on what to learn this guide may help you – Learn a Programming Language – But which one?

M – Micro Segmentation

Micro Segmentation allows a business to use Networks, Compute and Storage to automate and deliver complex solutions by carving up and using the infrastructure. This segments parts of the infrastructure to specific functions/tasks. It can also be used in a security context to segment networks, firewalls, compute and storage to increase security and reduce cyber attacks. VMware have produced a book, “Micro Segmentation for Dummies”, that can be downloaded from here.

N – Next Generation

Next Generation refers to the next stage or development to something such as a new release of hardware or software. Next Generation is becoming a common term now to define products and artefacts, an example being Next Generation Firewalls.

O – Open Source

Open Source has been available for a long time with software such as Linux; however, there is a bigger shift towards using Open Source and acceptance by businesses. Some examples of Open Source that is now mainstream within business include:

  • Ansible
  • Chef
  • Docker
  • Puppet

P – Performance

Performance can cover people as well as solutions / systems. Performance metrics should be set out at the inception of an engagement then monitored and reported on. This will be a factor in driving Continuous Improvement going forward as well as forecasting / planning for future upgrades and expansion.

Q – Quality

Quality is a huge subject and has a lot of standards governing it and how it affects all aspects of business and architecture. Knowing which standards apply and how they affect a solution will assist in the whole architecture lifecycle. There are also a number of tools available to help you:

  • Architecture Frameworks
  • ITIL
  • Six Sigma

There is also a level of pride and satisfaction in producing a quality solution and system achieving the objectives and requirements set out by the business.

R – Roadmap

Any architecture/solution should have a roadmap to set out its future. Roadmaps should include items such as:

  • Current state
  • Future state
  • Innovation
  • Upgrades / Releases
  • New Features / Functions
  • End of Life / Replacement

S – SMAC

SMAC stands for Social, Mobile, Analytics, Cloud. SMAC is an acronym that covers the areas and concepts when these four technologies are brought together to drive innovation in business. A good description of SMAC written by a colleague can be found at Acronyms SMAC.

T – Transformation

The majority, if not all, systems will undergo a form of transformation. This may range from a simple upgrade to a complex redesign and migration to something else.

U – UX

UX (User eXperience) affects how people interact with your architecture / design and how they feel about it (emotions and attitudes). With the boom in apps and the nearing Contextual Web Era, UX is one of the most important factors to getting an architecture used. If your users don’t like the system they may find something else to use that they like.

V – Vision

Understanding the vision of your customer and their business is the driving factor for any architecture.

When working with your customer, and also with your colleagues, you should look to become a Trusted Advisor. A great book on the subject is The Trusted Advisor by David Maister. The book covers three main areas: perspectives on trust, the structure of trust building and putting trust to work.

W – WWW

The internet is a key delivery mechanism for systems. How it works, and its key components, should be understood, such as:

  • IPv4 – IPv6
  • DNS
  • Routing
  • Connectivity
  • Security
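
As a small hands-on illustration of the IPv4 and IPv6 item in the list above, here is a sketch using Python's standard `ipaddress` module. The addresses are taken from ranges reserved for documentation and are examples only, not real hosts.

```python
import ipaddress

# Documentation-only example addresses (reserved ranges), not real hosts.
samples = ["192.0.2.10", "2001:db8::1"]

for text in samples:
    addr = ipaddress.ip_address(text)
    # version is 4 or 6; max_prefixlen is the address width in bits (32 or 128).
    print(f"{text}: IPv{addr.version}, {addr.max_prefixlen}-bit address")

# A /24 IPv4 network holds 2**(32 - 24) = 256 addresses.
network = ipaddress.ip_network("192.0.2.0/24")
print(network.num_addresses)
print(ipaddress.ip_address("192.0.2.10") in network)
```

The jump from 32-bit to 128-bit addresses is the main reason IPv6 exists, which is worth keeping in mind when planning the longevity of a solution.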

X – X86

x86 is a standard that everyone knows, as it's one of the most common platform types available.

Y – Year

Year is for the longevity of the solution you are designing. How many years are you expecting it to last? What are the Business Requirements, statutory obligations, depreciation etc. that need to be planned in? Consider things like End of Life, Maintenance and Upgrades on hardware and software from a solution point of view.

Z – Zero Defects

The best solution is the one with zero defects, but reaching this goal can be a challenge and can also consume a lot of expense. The best way to ensure Zero Defects is to use:

  • Best Practice
  • Reference Architectures
  • Blueprints/Patterns
  • Checklists
  • Reuse
  • Lessons Learnt

This is my current A to Z and some of the entries may be different in your version so “What is in your A to Z of being an Architect?”

I will look to write some further blog posts on the areas listed in this A to Z

The Internet of Security and Things

08 Tuesday Sep 2015

Posted by Max Hemingway in Big Data, Cloud, IoT, Security

Tags

Cloud, Data, IoT, Security

How secure is the Internet of Things?

Traditionally we have been used to Malware protection and Anti-Virus on our PCs, then moving to laptops and other devices, now on phones and slowly moving towards the Internet of Things. One article in the news today caught my eye, where it is reported that Malware is being found pre-installed on devices, in this case Mobile Phones (G Data Report).

It would seem that the hackers are trying to get the jump on the industry well before the devices fall into the hands of the consumer. This is not the first time such incidents have been reported.

The race for Internet of Things sensors, devices and “Things” is growing fast; however, with these incidents of Malware being found, how long will it be before code that shouldn’t be there is appearing on chips on sensors?

There are lots of Operating Systems available for the IoT. These range from the mainstream ones that appear in the news and everyone knows, such as Microsoft, Raspberry Pi, Linux etc, to the less known ones that are used on chipsets, such as Contiki, TinyOS, Nano-RK. (See https://maxhemingway.com/2015/04/14/iot-operating-systems/).

There are a number of challenges for the IoT industry, businesses and consumers (this list is not exhaustive):

  • Authentication
  • Data Capture
  • Encryption
  • Intrusion – Application, Network and Physical
  • Location tracking
  • Malware/Anti-Virus
  • Service disruption
  • Taking control of devices

These threats will drive the Internet of Security to protect the Internet of Things.

Cisco is looking to tackle some of these by running a Security Grand Challenge to offer prizes to the best security solutions.

More competitions and challenges will probably emerge as the industries try to understand and protect against the risks and use a crowd source model to help protect the IoT.

Visual Introduction to Machine Learning

03 Monday Aug 2015

Posted by Max Hemingway in Data Science, Machine Learning

≈ 2 Comments

Tags

Data, Data Science, Machine Learning

I came across this “Visual Introduction to Machine Learning” in a forum. This is an experimental site showing statistical thinking with an interactive web page. The page builds as you scroll down and takes you through a journey of Machine Learning.

It provides a high level graphical view of:

  • Nuance
  • Drawing boundaries
  • Machine learning
  • Forks
  • Tradeoffs
  • Best splits
  • Recursion
  • Trees
  • Making predictions
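
To give a feel for the "Best splits" and "Trees" items in the list above, here is a minimal sketch of how a decision tree might pick a split point on a single feature using Gini impurity. The feature values, labels and function names are made up for illustration and are not taken from the site.

```python
def gini(labels):
    """Gini impurity: 0.0 means every label in the group is the same."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(values, labels):
    """Try each value as a threshold and keep the one with the purest groups."""
    best_threshold, best_score = None, float("inf")
    for threshold in sorted(set(values)):
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        if not left or not right:
            continue  # a split must put something on both sides
        # Weighted average impurity of the two groups.
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score

# Hypothetical data: one numeric feature and a two-class label.
feature = [2, 5, 8, 40, 55, 70]
label = ["low", "low", "low", "high", "high", "high"]
print(best_split(feature, label))  # (8, 0.0)
```

A full tree repeats this search recursively on each side of the split, which is where the "Recursion" item in the list comes in.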

The URL of the web page indicates that this is part 1, so hopefully there will be more to follow, as the first page indicates further posts on “overfitting, and how it relates to a fundamental trade-off in machine learning”.

You can follow this project on Twitter @r2d3us

Other posts on Machine Learning

  • In-depth Introduction to Machine Learning
  • Learning Data Science

Learning Data Science – Useful References

14 Tuesday Jul 2015

Posted by Max Hemingway in Big Data, Data Science, Machine Learning, Open Source

≈ 1 Comment

Tags

Big Data, Data, Data Science, Knowledge, Machine Learning

Firstly, thanks to Tim Osterbuhr, who prompted me to create this list of resources that I have found useful in learning about Data Science after he read my blog post on Learning Data Science. Tim has also provided some of the links below.

Here is the list of Useful References for Learning Data Science. (This list is by no means exhaustive.)

From my Blog

  • Learning Data Science
  • Data Science in the Cloud ebook
  • Data Science and Information Theory
  • Data Mining Courses
  • Open Source, Open Human, Open Data, Open Sesame!
  • Data Scientist Skill Set
  • R {swirls} – Learning R by doing
  • Correlation does not imply causation
  • Statistical Inference Resources

From Around the Web

  • 6 checkpoints to ensure regression model validity for analytics
  • Algorithms: Design and Analysis
  • Analyzing Big Data with Twitter
  • Big Data Analytics: Descriptive Vs. Predictive Vs. Prescriptive
  • Data Analysis
  • Data Mining for the Masses
  • Data Science Course
  • Google Visualization API Reference
  • k-means clustering
  • Occam’s Razor
  • PCA Step by Step
  • Regression Equation: What it is and How to use it
  • Using JavaScript visualization libraries with R

Public Data Sets

  • http://www.cs.cmu.edu/~./enron/
  • http://www.secviz.org/content/the-davix-live-cd
  • http://www.caida.org/data/overview/
  • http://www.secviz.org/content/visual-analytics-workshop-with-worlds-leading-security-visualization-expert-0
  • http://snap.stanford.edu/data/
  • http://analytics.ncsu.edu/
  • https://code.google.com/p/google-refine/

Data Science Books

  • 9 Free Books for Learning Data Mining & Data Analysis
  • 16 Free Data Science Books
  • 27 free data mining books

Happy to add other links from readers to this list.

Open Source Web Crawlers and Data Sets

15 Friday May 2015

Posted by Max Hemingway in Big Data, Data Science

≈ 1 Comment

Tags

Big Data, Data, Data Science

A great list of 50 Open Source Web Crawlers has been produced by Baiju NT on a Big Data Blog.

Web Crawlers are useful in gathering data from other sites when performing research, although caution should be used, as with today’s levels of protection some sites' defenses may consider your data gathering as an attack.
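
One way to exercise the caution described above is to check a site's robots.txt before crawling it. The sketch below uses Python's standard `urllib.robotparser`. The robots.txt content, crawler name and example.com URLs are hypothetical, and the rules are parsed locally so the example runs without touching a real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, parsed locally so the sketch runs offline.
robots_txt = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

agent = "example-research-bot"  # hypothetical crawler name
print(parser.can_fetch(agent, "https://example.com/data/report.html"))    # True
print(parser.can_fetch(agent, "https://example.com/private/notes.html"))  # False
print(parser.crawl_delay(agent))  # 5 (seconds to wait between requests)
```

Respecting these rules and any crawl delay does not guarantee a site will welcome your crawler, but it is a reasonable first courtesy.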

Its probably best to check first if any data sets exist with the data you are looking for.

https://www.quandl.com/ is a search engine for data sets that has listed 12 million data sets.

There are lots of data sets available from governments such as http://data.gov.uk/ in the UK.

If a smaller list of good data sources is needed, have a look at http://www.kdnuggets.com/datasets/index.html

Sources:

  • https://www.quandl.com/
  • http://www.kdnuggets.com/datasets/index.html
  • http://bigdata-madesimple.com/top-50-open-source-web-crawlers-for-data-mining/

Data Mining Courses

28 Tuesday Apr 2015

Posted by Max Hemingway in Big Data, Data Science

≈ 1 Comment

Tags

Big Data, Data, Data Science, learning

Via Coursera, the University of Illinois at Urbana-Champaign is running a specialisation on Data Mining. As with all Coursera courses, you don’t have to take the specialisation; you can take the courses individually or one after another. Taking the courses outside of the specialisation means that you won’t get to complete the capstone project and earn your certificate at the end.

This track is made up of 5 courses covering:

Pattern Discovery in Data Mining

  • Introduction to data mining
  • Concepts and challenges in pattern discovery and analysis
  • Scalable pattern discovery algorithms
  • Pattern evaluation
  • Mining flexible patterns in multi-dimensional space
  • Mining sequential patterns
  • Mining graph patterns
  • Pattern-based classification
  • Application examples of pattern discovery

Text Retrieval and Search Engines

  • Introduction to text data mining
  • Basic concepts in text retrieval
  • Information retrieval models
  • Implementation of a search engine
  • Evaluation of search engines
  • Advanced search engine technologies

Cluster Analysis in Data Mining

  • Basic concept and introduction
  • Partitioning methods
  • Hierarchical methods
  • Density-based methods
  • Probabilistic models and EM algorithm
  • Spectral clustering
  • Clustering high dimensional data
  • Clustering streaming data
  • Clustering graph data and network data
  • Constraint-based clustering and semi-supervised clustering
  • Application examples of cluster analysis

Text Mining and Analytics

  • Overview of text analytics and applications
  • Extending a search engine to support text analytics (text categorization, text clustering, text summarization)
  • Topic mining and analysis with statistical topic models
  • Opinion mining and summarization
  • Integrative analysis of text and structured data

Data Visualization

  • Visualization Infrastructure (graphics programming and human perception)
  • Basic Visualization (charts, graphs, animation, interactivity)
  • Visualizing Relationships (hierarchies, networks)
  • Visualizing Information (text, databases)

These courses would complement the Data Science courses from Johns Hopkins.

Source: https://www.coursera.org/specialization/datamining/20


License

Creative Commons Licence
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
