AI? ML? Big data? Data science? Data engineering? What is it all about? What are the differences? Is it just another buzzword? In this article we will try to dispel your doubts.
Over the last few years, artificial intelligence (AI) has moved from sci-fi books and movies into everyday life. As we have all heard, it has once again helped to discover a new drug, it has beaten a world champion in yet another game, and a rival company's product is due to have it in a few months. Demystifying the term and leaving aside philosophical and futuristic considerations, AI is used today to describe algorithms that are able to draw conclusions from the data presented to them.
At the moment, the methods enjoying the greatest triumphs, thanks to their accuracy and capabilities, are those of so-called machine learning (ML), and in fact the terms ML and AI have become synonymous. There is a saying, popular among engineers, that AI is a sales term and ML is a technological term. "If something is in a PowerPoint presentation then it is AI, if it is written in Python [programming language] then it is machine learning." Among machine learning algorithms, the best known are artificial neural networks, and in particular those known as deep learning networks. Like other machine learning models, these networks have more in common with mathematical functions and linear regression than with the brain and consciousness. The name refers historically to the fact that their computational scheme was modelled on neurons conducting electrical impulses. Networks are usually organised into layers, which process the input information one after another and present the result at the end. Whether a network is deep or not depends on the number of these layers. There is no specific threshold, but we can talk about deep networks when there are 5 to 10 layers. For comparison, the popular VGG16 network has 16 weight layers, and some ResNet variants have more than 100.
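To make the "layers of functions" idea concrete, here is a minimal sketch in plain Python. It is an illustration rather than how real libraries implement networks: the function names `dense_layer` and `forward`, the choice of `tanh` as the nonlinearity, and the hand-picked weights are all assumptions made for this example. Each layer computes a weighted sum per neuron, much like linear regression, and passes the result on to the next layer.

```python
import math

def dense_layer(inputs, weights, biases):
    """One layer: a weighted sum per neuron (as in linear regression), then tanh."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        weighted_sum = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(math.tanh(weighted_sum))
    return outputs

def forward(inputs, layers):
    """Pass the input through the layers one by one and return the final result."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations

# Hypothetical, hand-picked parameters: 2 inputs -> 3 hidden neurons -> 1 output.
# In a real network these numbers would be learned from data.
layers = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),
    ([[0.7, -0.5, 0.2]], [0.05]),
]

result = forward([1.0, 2.0], layers)
print(len(layers), "layers ->", result)
```

A deep network is the same idea with many more entries in `layers` and far more neurons in each of them.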
Why are deep networks and related deep learning so special?
Well, they have made a real breakthrough. They are the ones that made it possible to recognise images and people in video, automatically translate texts, synthesise speech or generate fake news. Why have these technologies matured only now? Two factors have had a significant impact. Firstly, training such networks requires a lot of computing power, which is now relatively cheap and easily available. Secondly, there is the learning mechanism itself. Leaving aside the technical details, methods that coped with training neurons grouped in a few layers had huge problems with neurons grouped in dozens of layers or more, and it took improved training techniques to remove that obstacle.
It is said that the best definition of big data is "when Excel is no longer sufficient to process the data". The last decade has seen an explosion in the amount of information produced, mainly due to computerisation. Analytics on a website, sensors in a factory, ordering systems and smartphone applications all collect large amounts of data about customers or a company. At some point, classic databases are no longer enough to store it efficiently. This is where big data solutions come in. They make it possible not only to store, collect and send data, but also to process it (for example, to check in real time for dangerous anomalies registered by sensors) and to make it available, for example for generating reports. Whether you will need big data solutions in the near future or in a new project can be assessed relatively easily. The role of the data engineer is to build systems for acquiring and working with this information.
Data science is another related term. A person working in this field, the data scientist, combines the competences of a programmer and an analyst, knows artificial intelligence solutions (although we already know that a better term is machine learning) and big data and, most importantly, is able to use them to extract useful knowledge and conclusions for the business. He or she solves problems that classic analysts, equipped with a spreadsheet and statistical methods, are not able to cope with.
While the data scientist is a natural evolution of the data analyst, the data engineer is the descendant of the database engineer in the big data environment. This is simply someone who is tasked with developing and then maintaining databases and systems for big data solutions. A slightly extreme example of demand for data engineers might be Facebook, which processes millions of messages, notifications or reactions per second. However, even if you are planning an app to be used by thousands of people, you need to take data engineering into account.
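As an illustration of the kind of processing mentioned earlier, checking sensor readings for dangerous anomalies as they arrive, here is a minimal sketch. It is not a production big data pipeline: the function name `detect_anomalies`, the rolling window size, the threshold of three standard deviations and the sample temperatures are all assumptions chosen for this example.

```python
from collections import deque
from statistics import mean, stdev

def detect_anomalies(readings, window_size=5, threshold=3.0):
    """Yield (index, value) for readings far from the recent rolling average."""
    window = deque(maxlen=window_size)  # the most recent "normal" readings
    for index, value in enumerate(readings):
        if len(window) == window_size:
            avg = mean(window)
            spread = stdev(window)
            # With zero spread (perfectly constant readings) nothing is flagged.
            if spread > 0 and abs(value - avg) > threshold * spread:
                yield index, value
                continue  # keep the anomaly out of the baseline window
        window.append(value)

# Hypothetical temperature readings from a single factory sensor.
temperatures = [20.1, 20.3, 19.9, 20.0, 20.2, 20.1, 35.7, 20.0, 19.8]
anomalies = list(detect_anomalies(temperatures))
print(anomalies)
```

Because the function consumes readings one at a time and keeps only a small window in memory, the same logic could in principle run on an endless stream. Doing this reliably for thousands of sensors at once is where big data tooling and data engineers come in.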
I hope this short post has somewhat clarified the concepts in the title. First of all, it is worth remembering that ML is in practice the technical synonym for AI, and that it has nothing to do with the consciousness of machines. Big data is a term used to describe solutions dedicated to data that cannot be handled efficiently by classic databases. Data scientist and data engineer are professions that evolved from the data analyst and the database engineer in a world full of data, which requires a wider range of tools and skills than before.