A colleague recently asked me: “What’s the difference between Data Science and Information Theory?”

Here is my viewpoint on this.

Information Theory
Wikipedia states “Information theory is a branch of applied mathematics, electrical engineering, and computer science involving the quantification of information.

Information theory was developed by Claude E. Shannon to find fundamental limits on signal processing operations such as compressing data and on reliably storing and communicating data”[1]

The influential 1948 article by mathematician Claude E. Shannon is called “A Mathematical Theory of Communication”. [2]

Shannon states in his paper that a communication system essentially consists of five parts [2]:

1. An information source which produces a message or sequence of messages to be communicated to the receiving terminal.
2. A transmitter which operates on the message in some way to produce a signal suitable for transmission over the channel.
3. The channel is merely the medium used to transmit the signal from transmitter to receiver.
4. The receiver ordinarily performs the inverse operation of that done by the transmitter, reconstructing the message from the signal.
5. The destination is the person (or thing) for whom the message is intended.
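The five parts above can be sketched as a small pipeline. This is only an illustrative sketch, not anything taken from Shannon's paper: the function names, the choice of 8-bit UTF-8 encoding as the "signal", and the bit-flipping noise model with its `flip_probability` parameter are all my own assumptions.

```python
import random
from typing import List, Optional


def source() -> str:
    """Information source: produces the message to be communicated."""
    return "hello"


def transmitter(message: str) -> List[int]:
    """Transmitter: turns the message into a signal (assumed here: a list of bits)."""
    bits = []
    for byte in message.encode("utf-8"):
        # Emit the 8 bits of each byte, most significant bit first.
        bits.extend((byte >> i) & 1 for i in range(7, -1, -1))
    return bits


def channel(signal: List[int], flip_probability: float = 0.0,
            rng: Optional[random.Random] = None) -> List[int]:
    """Channel: carries the signal; each bit is flipped with the given probability."""
    rng = rng or random.Random()
    return [bit ^ 1 if rng.random() < flip_probability else bit for bit in signal]


def receiver(signal: List[int]) -> str:
    """Receiver: inverse of the transmitter, rebuilding the message from the bits."""
    data = bytearray()
    for start in range(0, len(signal), 8):
        byte = 0
        for bit in signal[start:start + 8]:
            byte = (byte << 1) | bit
        data.append(byte)
    return data.decode("utf-8", errors="replace")


def destination(message: str) -> None:
    """Destination: the person (or thing) the message is intended for."""
    print("Received:", message)


# Noiseless case: the message arrives unchanged.
destination(receiver(channel(transmitter(source()))))
```

Raising `flip_probability` above zero gives a feel for why the noisy-channel parts of the paper are needed: the receiver can no longer simply invert the transmitter and expect the original message back.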

He then goes on to classify communication systems into three main categories: discrete, continuous and mixed. He applies his mathematical theory to these across the following sections of the paper:

PART I: DISCRETE NOISELESS SYSTEMS
1. THE DISCRETE NOISELESS CHANNEL
2. THE DISCRETE SOURCE OF INFORMATION
3. THE SERIES OF APPROXIMATIONS TO ENGLISH
4. GRAPHICAL REPRESENTATION OF A MARKOFF PROCESS
5. ERGODIC AND MIXED SOURCES
6. CHOICE, UNCERTAINTY AND ENTROPY
7. THE ENTROPY OF AN INFORMATION SOURCE
8. REPRESENTATION OF THE ENCODING AND DECODING OPERATIONS
9. THE FUNDAMENTAL THEOREM FOR A NOISELESS CHANNEL
10. DISCUSSION AND EXAMPLES
PART II: THE DISCRETE CHANNEL WITH NOISE
11. REPRESENTATION OF A NOISY DISCRETE CHANNEL
12. EQUIVOCATION AND CHANNEL CAPACITY
13. THE FUNDAMENTAL THEOREM FOR A DISCRETE CHANNEL WITH NOISE
14. DISCUSSION
15. EXAMPLE OF A DISCRETE CHANNEL AND ITS CAPACITY
16. THE CHANNEL CAPACITY IN CERTAIN SPECIAL CASES
17. AN EXAMPLE OF EFFICIENT CODING
PART III: MATHEMATICAL PRELIMINARIES
18. SETS AND ENSEMBLES OF FUNCTIONS
19. BAND LIMITED ENSEMBLES OF FUNCTIONS
20. ENTROPY OF A CONTINUOUS DISTRIBUTION
21. ENTROPY OF AN ENSEMBLE OF FUNCTIONS
22. ENTROPY LOSS IN LINEAR FILTERS
23. ENTROPY OF A SUM OF TWO ENSEMBLES
PART IV: THE CONTINUOUS CHANNEL
24. THE CAPACITY OF A CONTINUOUS CHANNEL
25. CHANNEL CAPACITY WITH AN AVERAGE POWER LIMITATION
26. THE CHANNEL CAPACITY WITH A PEAK POWER LIMITATION
PART V: THE RATE FOR A CONTINUOUS SOURCE
27. FIDELITY EVALUATION FUNCTIONS
28. THE RATE FOR A SOURCE RELATIVE TO A FIDELITY EVALUATION
29. THE CALCULATION OF RATES
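
Entropy (sections 6 and 7 in the list above) is probably the best-known quantity from the paper: H = -Σ p_i log2(p_i), measured in bits per symbol. As a rough sketch, it can be estimated for a piece of text by treating the observed character frequencies as the probabilities. The function name `shannon_entropy` and the character-frequency approach are my own assumptions for illustration.

```python
import math
from collections import Counter


def shannon_entropy(message: str) -> float:
    """Estimate the entropy of a message in bits per symbol.

    Uses H = -sum(p * log2(p)), where p is the relative frequency
    of each distinct character in the message.
    """
    if not message:
        return 0.0
    counts = Counter(message)
    total = len(message)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


print(shannon_entropy("aaaa"))  # one symbol only, so no uncertainty
print(shannon_entropy("abab"))  # two equally likely symbols
print(shannon_entropy("abcd"))  # four equally likely symbols
```

A message made of one repeated character carries no uncertainty, while a message that uses more symbols with equal frequency has higher entropy and so needs more bits per symbol to encode.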

Data Science
Wikipedia states “Data science is, in general terms, the extraction of knowledge from data. The key word in this job title is “science,” with the main goals being to extract meaning from data and to produce data products.”[3]

A Data Science practitioner will use various tools and methodologies to extract information from data in order to answer the question they have been set. The most important part of Data Science is the question.

Some of the tools/methodologies used are (this list is by no means complete):
• Computer Programming
• Data Engineering
• Data Warehousing
• Discrete Optimisation
• Geometric Methods
• Graphical Representation
• Information Theory
• Machine Learning
• Modelling
• Pattern recognition and learning
• Probability Models
• Statistical Learning

A great starting point for learning some of these is my previous blog post: https://maxhemingway.com/2014/12/12/learning-data-science/

In Conclusion
In answer to the question: Data Science can use the foundations that Shannon set out for Information Theory in 1948, along with the work of others as the theory has progressed over time, as well as other tools/methodologies, to answer the question set. Information Theory is part of the core syllabus on some Data Science courses [4].
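
As one hedged illustration of that overlap, entropy is commonly used in Machine Learning to score how well a candidate split separates labelled data; this score is usually called information gain. The sketch below is my own minimal example: the function names and the toy "yes"/"no" labels are assumptions, not something taken from the references.

```python
import math
from collections import Counter
from typing import List, Sequence


def entropy(labels: Sequence[str]) -> float:
    """Entropy in bits of the label distribution: -sum(p * log2(p))."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(labels: Sequence[str], groups: List[Sequence[str]]) -> float:
    """Entropy of all labels minus the size-weighted entropy of the groups.

    `groups` is assumed to be a partition of `labels` produced by some split.
    """
    total = len(labels)
    remainder = sum(len(group) / total * entropy(group) for group in groups)
    return entropy(labels) - remainder


labels = ["yes", "yes", "no", "no"]

# A split that separates the labels perfectly removes all uncertainty.
print(information_gain(labels, [["yes", "yes"], ["no", "no"]]))

# A split that leaves each group as mixed as before tells us nothing.
print(information_gain(labels, [["yes", "no"], ["yes", "no"]]))
```

The higher the information gain, the more a split reduces uncertainty about the label, which is the same notion of uncertainty Shannon defined for messages.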