Back in February 2021 I wrote a short blog post about a Data Fellowship apprenticeship that I was beginning. Today that journey came to an end when I received notification that I had passed the final parts of the course: the exam, projects and interview. This means that I now hold a qualification and am awaiting my certificate as BCS Data Analyst (level 4).
It has been a long journey to completion, but each stage has been an adventure and one that I have enjoyed working through.
I know that I haven’t posted on my blog in a while, mainly because I have been busy with my Data Fellowship and a few other things. Recently I have been studying for today’s exam “BCS Level 4 Certificate in Data Analysis Tools” – QAN 603/0824/2.
Being able to still take exams at home (under exam conditions) is a bit more relaxing than having to travel to an exam centre, but it is just as unnerving when you finish and press the end exam button, awaiting the mark. Taking exams at home, under the same conditions with cameras on and the screen shared, opens qualifications up to more people and lets them fit better around a normal working day.
The objectives of this part of the course/exam are:
Explain the purpose and outputs of data integration activities
Explain how data from multiple sources can be integrated to provide a unified view of the data
Describe how programming languages for statistical computing (SQL) can be applied to data integration activities, improving speed and data quality for analysis
Explain how to take account of data quality when preparing data for analysis, improving quality, accuracy and usefulness
Explain the nature and challenges of data volumes being processed through integration activities and how a programming approach can improve this
Understand testing requirements to ensure that unified data sets are correct, complete and up to date
Explain the capabilities (speed, cost, function) of statistical programming languages and software tools, when manipulating, processing and cleaning data and the tools required to solve analysis issues
Explain how statistical programming languages are used in preparing data for analysis and within analysis projects
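To make a few of these objectives concrete, here is a minimal sketch of integrating two sources into a unified view with SQL, followed by simple quality and completeness checks. It uses Python's built-in sqlite3 module with an in-memory database. The table names, column names and rows are all made up for illustration and are not from the course material.

```python
import sqlite3

# Hypothetical example data: two separate "sources", a CRM customer list
# and a shop order log. All names and values are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE crm_customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE shop_orders (order_id INTEGER PRIMARY KEY,
                              customer_id INTEGER, amount REAL);
    INSERT INTO crm_customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Alan');
    INSERT INTO shop_orders VALUES (10, 1, 25.0), (11, 1, 15.0),
                                   (12, 2, 40.0), (13, 9, 5.0);
""")

# Unified view: one row per customer with order count and total spend.
# LEFT JOIN keeps customers that have no orders at all.
unified = conn.execute("""
    SELECT c.customer_id, c.name,
           COUNT(o.order_id) AS orders,
           COALESCE(SUM(o.amount), 0) AS total
    FROM crm_customers AS c
    LEFT JOIN shop_orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name
    ORDER BY c.customer_id
""").fetchall()

# Data quality check: orders that point at a customer missing from the CRM.
orphans = conn.execute("""
    SELECT o.order_id
    FROM shop_orders AS o
    LEFT JOIN crm_customers AS c ON c.customer_id = o.customer_id
    WHERE c.customer_id IS NULL
""").fetchall()

# Completeness test: the unified view should have one row per CRM customer.
customer_count = conn.execute("SELECT COUNT(*) FROM crm_customers").fetchone()[0]
is_complete = len(unified) == customer_count

print(unified)
print("Orphaned orders:", orphans)
print("Unified view complete:", is_complete)
```

The LEFT JOIN is a deliberate choice here: an inner join would silently drop the customer with no orders, which is exactly the kind of gap the testing objective is asking you to catch.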
The last exam that I took was in an examination centre where you turn up and sit at an already configured computer. This time I sat the exam at home in my dining room with the camera and microphone on. Special software ensured that the only windows I had open were the exam and the meeting room with the invigilator watching me.
Sitting down to get ready for the exam, I hit that unfortunate moment when the laptop announces it is about to reboot and install an operating system upgrade. Great timing! There was just enough time to get another device loaded with the right software and logged in to the required pages. Not a good start for the mindset going into an exam, but all went well in the end.
Study for this stage of the Data Fellowship has been part of the apprenticeship course and objectives. For me it was a cementing of the concepts and bringing some areas up to date.
Objectives are: Demonstrate knowledge and understanding of Data Analysis and its underlying architecture, principles, and techniques. Key areas are:
Explore the different types of data, including open and public data, administrative data, and research data
Understand the data lifecycle
Illustrate the differences between structured and unstructured data
Understand the importance of clearly defining customer requirements for data analysis
Understand the quality issues that can arise with data and how to avoid and/or resolve these
Explore the steps involved in carrying out routine data analysis tasks
Understand the range of data protection and legal issues
Explore the fundamentals of data structures
Explore the database system design, implementation, and maintenance
Understand the organisation’s data architecture
Understand the importance of the domain context for data analytics
Our brain is an amazing organ that learns, remembers, and controls, moves and repairs a complex body. It is in control of lots of functions and, as part of that, it is also responsible for our Logical and Creative Thinking. There are lots of articles that talk about the left side of the brain being responsible for Logic and the right side for Creativity. This was first researched by Roger Wolcott Sperry with his work on the split brain.
Either way, the brain is still an amazing thing and you can learn to use both Logical and Creative Thinking techniques; you just need to apply a growth mindset.
“We cannot solve our problems with the same thinking we used when we created them.” – Albert Einstein
Logical Thinking
Logical thinking helps us to make “sense” of things, come up with solutions and make decisions.
The five W’s and one H are commonly used as questions to help form logical thinking. These are:
Who
When
Why
What
Where
How
Some add another H – How Much – to the list, as cost can play an important part in decisions.
Creative Thinking
Creative thinking helps us approach things with an out of the box approach and an ability to look at things through different lenses to discover new solutions.
Balanced View
To take a balanced view across Logical and Creative thinking, the Six Thinking Hats by Dr. Edward de Bono uses the idea of parallel thinking to plan and use thinking more effectively. This can include both logical and creative thinking.
Blue Hat – Process
manage process
action plans
next steps
reviewing thinking
summary
White Hat – Facts
data
facts
information needed
information available
Red Hat – Feelings
feelings
hunches
instinct
intuition
Green Hat – Creativity
creativity
solutions
ideas
alternatives
possibilities
Yellow Hat – Benefits
positives
brightness and optimism
value
benefits
Black Hat – Cautions
difficulties
potential problems
weaknesses
Build on the Skills
Learn different ways of thinking
Learn some new ways of thinking that you have not used before.
Practice and mix it up
As the phrase goes “Practice makes perfect”. Using different methods of thinking can bring different views and possibly different solutions to the problem/challenge.
Personally I have created my own set of cards based on several ways and methods of thinking that I use when I am looking at a problem. See my blog post Playing a Game with Innovation and Thinking.
Work with others
There is nothing better than working with others to bring in different views and ways of thinking that you may not have thought of previously. This is a great way of seeing how other people approach the problem/challenge, and it helps identify areas where you can improve or learn.
Be creative
Spend some time on creative hobbies that will help you build your creative thinking.
Learning a new skill
Learning a new skill will help you develop your thinking.
We have all at some time done some sort of experiment, from working out at a young age which cries and actions resulted in the reward of milk, to test driving cars to find which is best suited to our needs before buying. These are experiments that produced results from things we tried, though we may not have thought of them as developing an Experimental Mindset. In this article I am concentrating on how this applies to data.
Here are my notes from my research into the topic.
Having an Experimental Mindset is one of the key traits in being a Data Analyst or Data Scientist, and it is not a new term. It has been around as long as the fields of science and research have. These arenas have developed methodologies for testing and evaluating that have been adopted and taken forward by many other areas, such as business and computing.
Overlaid with the areas for data this can be shown as:
Observations (Learning) –> Hypothesis (Testing) –> Scientific Law (Evaluating)
or as:
Observations (Data) –> Hypothesis (Product/Service) –> Scientific Law (Predictive Model)
Using this methodology, one of the more common types of Hypothesis Testing is A/B Testing. This sets out a framework for a simple controlled experiment comparing two versions (A and B) to look at the impact of a change to a thing or product. There are some useful articles on A/B Testing that go into the details of it.
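To make the A/B testing idea a bit more concrete, here is a minimal sketch of one common way to evaluate such an experiment: a two-sided two-proportion z-test on conversion rates. This is only one of several possible tests, the function name is my own, and the visitor and conversion numbers are invented for illustration.

```python
from math import erfc, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates.

    conv_a / conv_b are the number of conversions and n_a / n_b the number
    of visitors for versions A and B. Returns (z, p_value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B perform the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# Hypothetical experiment: 2,400 visitors saw each version.
z, p_value = two_proportion_z_test(conv_a=120, n_a=2400, conv_b=156, n_b=2400)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value suggests the difference between A and B is unlikely to be chance alone, but the threshold you act on (often 0.05) and the sample size should be decided before the experiment starts.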
Humans have been using the medium of storytelling since the beginning, but only really recording it from the moment a wet painted hand went onto a cave wall. These days we read stories in books or access stories over the internet on our tablets and other devices.
The main key to all storytelling is data in one form or another, from 1 x woolly mammoth and 3 x hunters (that’s 4 items of data) in a cave painting to the complexity of how many bits and bytes are in an online book.
For a good explanation of What is data? – Cassie Kozyrkov, Head of Decision Intelligence at Google, has written some great posts and made some great videos on the subject.
So when we have data, we use stories to explain what it is telling us – hopefully not through 1000’s of PowerPoint slides… Make it Stop!! What are you going to put in those slides that will keep the audience hooked and focused?
Stories are normally based around a simple concept of beginning, middle and end; however, there is more to it than that if you want to tell a good story.
The first thing to do before getting to the story is to make sure you understand what the data is telling you. If you don’t understand the data and you’re asked a question, will you be able to answer it or further illustrate your point? Keep in mind – EVALUATE – LEARN – PRACTICE. Then maybe practice some more until you are confident with what you’re about to talk about.
Decluttered and simple visuals help to tell the story and keep the audience focused on what you are telling them, rather than spending their time trying to understand all the text and facts on the screen. Information is Beautiful, by David McCandless, is a site that shows some ways to display data visually in easy to understand ways. Here is his TED talk:
Stories normally follow a Hero’s Journey, which takes the plot line through a series of steps to keep the audience wanting more and reading or listening until the end. When storytelling about data, a similar construct can be used based on the Hero’s Journey:
| Sequence | Hero’s Storytelling Step | Data Storytelling Step |
|---|---|---|
| 1 | Status Quo | What’s the current normal |
| 2 | Call to Adventure | The Question (What is being asked of the data) |
| 3 | Assistance | What are the Sources |
| 4 | Departure | Turn the data into something understandable |
| 5 | Trials | Data Analysis |
| 6 | Approach | Methods used |
| 7 | Crisis | Data Modelling / Wrangling |
| 8 | Treasure | The Findings |
| 9 | Result | Result |
| 10 | Return | Presentation |
| 11 | New Life | New normal |
| 12 | Resolution | Review |
| 13 | End | End or maybe a different question? |

Data Storytelling using a Hero’s Journey
There is a good explanation of the different styles of Hero’s Journey on Wikipedia. The table above is changed a bit from those. Here’s a video that goes through a format:
Now that we have a structure, how you tell the story is just as important. How can you persuade the audience about the data and the point of view that you are presenting?
There are, then, these three means of effecting persuasion. The man who is to be in command of them must, it is clear, be able (1) to reason logically, (2) to understand human character and goodness in their various forms, and (3) to understand the emotions–that is, to name them and describe them, to know their causes and the way in which they are excited.
Pathos – How the topic affects you – connect and bridge the gap (Current emotional state, Target emotional state)
Logos – Why it affects you – story / proposal (Reasonableness, Consistency, Clarity)
Kairos – Time and place
Ethos – ‘It is not true, as some writers assume in their treatises on rhetoric, that the personal goodness revealed by the speaker contributes nothing to his power of persuasion; on the contrary, his character may almost be called the most effective means of persuasion he possesses.’
Logos – ‘persuasion is effected through the speech itself when we have proved a truth or an apparent truth by means of the persuasive arguments suitable to the case in question.’
Pathos – ‘persuasion may come through the hearers, when the speech stirs their emotions. Our judgements when we are pleased and friendly are not the same as when we are pained and hostile.’
Rhetoric, Aristotle
Kairos is an Ancient Greek word meaning the right, critical, or opportune moment.
How we can use these areas is illustrated in this example:
When preparing for the Storytelling session it’s worth checking that you are not going to fall into the trap of the “echo chamber effect”. From my post on the subject, I have created the following term to help me remember – STACK
Step Back
Think
Absorb other views
Challenge your thinking
communicate your Knowledge
Storytelling is more trustworthy than just presenting data on its own. One to consider when you create your next PowerPoint Presentation.
Data: it’s everywhere, and there are thousands, millions, billions… let’s just say “lots” of data created every second of the day, from articles and discussions on the internet, to texts and WhatsApp messages, to cars, to, well, anything with a chip in it really. It goes a huge way to ruling our lives and telling us how to live, from what to eat to the carbon footprint of the world. So when I was given an opportunity to undertake an apprenticeship in Data Analytics on a Data Fellowship Apprenticeship over the next 18 months, of course I was going to jump at that!
A great way to check my understanding and knowledge on things and learn many new things and more importantly for me provide a qualification at Data Analyst Level 4 standard.
So what is the So What? At the moment the programme is starting, so not much to report back so far, however I have started to document some of my journey and bits in my GitHub repo and will use this and my blog to record my thoughts and learnings going forward. Watch this space as they say.
When learning Data Science one area to learn is that of probability.
William Chen has created a good 10 page Probability Cheat Sheet to help guide you through. The content is based on “Harvard’s Introduction to Probability”.
The cheatsheet summarizes important probability concepts, formulas, and distributions, with figures, examples, and stories.
There are also 16 Data Science books listed on his site, a couple of which cover statistics and probability.
I came across this “Visual Introduction to Machine Learning” in a forum. This is an experimental site showing statistical thinking with an interactive web page. The page builds as you scroll down and takes you through a journey of Machine Learning.
It provides a high level graphical view of:
Nuance
Drawing boundaries
Machine learning
Forks
Tradeoffs
Best splits
Recursion
Trees
Making predictions
The URL of the web page indicates that this is part 1, so hopefully there will be more to follow, with the first page indicating further posts on “overfitting, and how it relates to a fundamental trade-off in machine learning”.
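To give a feel for what “best splits” means in the list above, here is a minimal sketch of how a decision tree might pick a threshold on a single feature by minimising weighted Gini impurity. This is a common approach and not necessarily the exact method the site uses. The function names, feature values and class labels are all made up for illustration.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means all one class)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


def best_split(values, labels):
    """Try every midpoint between sorted feature values and return the
    (threshold, weighted_impurity) pair with the lowest impurity."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_threshold, best_score = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between two identical values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Hypothetical data: one numeric feature and two classes, "A" and "B".
feature = [5, 10, 20, 60, 70, 90]
classes = ["A", "A", "A", "B", "B", "B"]
threshold, impurity = best_split(feature, classes)
print(f"Best threshold: {threshold}, weighted Gini impurity: {impurity}")
```

A tree is then built by applying the same search again to each side of the split, which is where the recursion in the list comes in.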
Firstly, thanks to Tim Osterbuhr, who prompted me to create this list of resources that I have found useful in learning about Data Science after he read my blog post on Learning Data Science. Tim has also provided some of the links below.
Here is the list of Useful References for Learning Data Science. (This list is by no means exhaustive.)