Contrary to appearances, work on an AI project does not begin when the developer sits down at the computer; it begins at the data collection stage. The collected data is the foundation of the tool being built, so getting this process right is crucial to the project's success. Knowing the pitfalls along the way, we believe this is the right moment to start working with a company such as Numlabs.

The price of mistakes

The rule is simple - the results of any AI-based system depend on the quality of its input data. Unfortunately, as our experience shows, few people keep this in mind. Mistakes made while collecting, describing and processing data are among the most common problems our team encounters when starting work on a commissioned project. The reasons vary: from accumulating the wrong data, through saving it in impractical formats, to entrusting the task to subcontractors who lack the necessary skills. The effects are always the same - higher project costs, longer implementation time and unsatisfactory results. What might this look like? Here are two examples:

1. A client hired a subcontractor to manually transcribe a large data file. As a result, the client received spreadsheets in which time was stored as hh:mm:ss.sss. Because the work was done by hand, the data contained typical human errors (for example, mistyped values). The anomaly was only discovered by the team hired to develop the AI tool. Before proceeding with the actual task, the developers first had to analyze the subcontractor's data to diagnose the problem, and then use a better transcription tool, or at least one that finds and highlights errors in the worksheet so that corrections are easier to apply. Ultimately, delivery was delayed.

2. Another client hired AI professionals to help expand a team responsible for broad growth and a culture of experimentation. The developers worked with a well-managed and committed team, so at first nothing pointed to complications. The mistakes came to light only at the end of the work. An audit of the solutions responsible for switching experiments and collecting data showed that the cause lay at the heart of the project - in the data. The internal team had been focused on another task and treated data collection as a side job. The cost of the project increased significantly, because errors at the root shaped everything built on top of them.
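
To make the first example more concrete, here is a minimal sketch of the kind of check that finds and highlights errors in a manually transcribed time column. It is an illustration, not the tool used in the project described above. It assumes the values are clock times in the hh:mm:ss.sss format (so hours stay below 24) and that the column has already been read into a list of strings; the function names are hypothetical.

```python
import re

# Assumed format from the example: hh:mm:ss.sss (two-digit fields, milliseconds).
TIME_PATTERN = re.compile(r"([0-9]{2}):([0-9]{2}):([0-9]{2})\.([0-9]{3})")


def is_valid_time(value: str) -> bool:
    """Return True if value is a well-formed hh:mm:ss.sss time."""
    match = TIME_PATTERN.fullmatch(value.strip())
    if match is None:
        return False
    hours, minutes, seconds, _millis = (int(part) for part in match.groups())
    # Assumption: values are clock times, so hours must be below 24.
    return hours < 24 and minutes < 60 and seconds < 60


def find_invalid_times(rows: list[str]) -> list[tuple[int, str]]:
    """Return (row_number, value) pairs for entries that fail validation.

    Row numbers are 1-based to match how spreadsheets number rows.
    """
    return [
        (index, value)
        for index, value in enumerate(rows, start=1)
        if not is_valid_time(value)
    ]
```

Running find_invalid_times on a column would return the row numbers and raw values that need a second look, so a person can correct them before the data reaches the ML team. If the column stores durations rather than clock times, the limit on hours would have to be relaxed.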

"Data Defender" from start to finish

Such problems can, however, be avoided relatively easily. They result from suboptimal project organization: an approach that relies on solutions addressing most, but not all, of the project's challenges. Data that is useful from a machine learning perspective is often, as in the second example above, a by-product of the tool's core functionality. The team working on that tool therefore typically does not focus on the data collected for training the AI system and does not have machine learning expertise. As a result, the data has no stakeholder of its own: nobody who would support the decisions of the Product Owner and the developers and point out the possible consequences of actions proposed by people who are not involved in ML. This makes the project error-prone and puts it at risk of failure.

It is at this initial stage that a "Data Defender" should join the work - a specialist, or a team of specialists, oriented towards the future machine learning work. Present in the project from the very beginning, the Data Defender can lay the foundations of a data collection system that integrates the accumulation of information with the subsequent work on it. The benefit for the client is clear - this approach protects against problems that delay production, or at least leaves room to adapt the project to machine learning later. The presence of a "Data Defender" throughout the project, not just in its last phase, is therefore crucial.

Our experience in your project

At Numlabs we specialize in building practical tools that use artificial intelligence. By joining a project, we bring our machine learning know-how with us. Including us as early as possible minimizes the risks and reduces the costs associated with processing data for ML. In short, we ensure more efficient project execution. At the same time, we care about how comfortable the tool is to work with, because we know from experience how important user experience and ergonomics are for employees. Cooperation with Numlabs consists primarily of consulting on the project, evaluating the collected data early, and improving the product so that it satisfies both its users and the team creating it. Would you like to benefit from our experience? Contact us and find out more.