Data Science Internship at eonum AG
Experience report by Joel Bessire
26 August 2019
After successfully completing my master's degree in mathematics at the University of Zurich in 2016, I was quite uncertain about my future career. Many of my fellow students opted for insurance companies or banks, some for computer science, and some jumped straight into their doctoral thesis. I decided to go on a long trip abroad and think about my career afterwards. But even after my return, I had not formed a concrete plan, so I decided to work temporarily while keeping an eye on several job portals. I was hoping for advertisements in areas of computer science such as data science or artificial intelligence (AI). Although I had little previous knowledge, this field particularly appealed to me.
In the beginning, the search was rather difficult. Despite the solid qualifications of a mathematics degree, almost all job advertisements came with long lists of requirements and demands for previous experience. Although I was confident that I could learn these skills on the job, experienced applicants had a clear advantage. At the start of 2019, eonum AG invited me to an interview. Our conversation was very pleasant, and I was immediately struck by how likeable the team was. A few days later, I signed a contract for a four-month internship.
On 1 April, my first day in Bern began. Tim Peter, eonum AG's founder and CEO, welcomed me and outlined the agenda of my internship. The first step was to set up my workstation: installing the current Ubuntu release on my notebook and installing the required software from the terminal. The commands that I had to look up and try to understand were documented in the in-house wiki. A wiki is a documentation platform where all relevant information can be entered, collected and shared with other users; the most famous example of a wiki is Wikipedia. For someone who had last worked with a terminal five years earlier during his studies, the sheer flood of commands, options and parameters was tremendous. At the end of my first day, I was exhausted. The cognitive challenge reminded me of my student days. It was demanding, but fun from the very beginning.
The next step was to gain an overview of the skills I had to acquire during the internship. The four main fields were classic machine learning (ML) and deep learning (DL), natural language processing (NLP), basics and engineering, as well as programming. Among the relevant terms were neural networks, Keras, Word2vec, OSI, ssh, the Linux shell, Python, JSON and so on. The know-how to be learned could fill an entire degree program, so it was important to approach the whole thing as systematically as possible.
To familiarize myself with ML and DL, I enrolled in an online specialization on the Coursera training platform. The DL specialization by Andrew Ng, co-founder of Coursera and well-known AI researcher, required basic knowledge of Python, which I already possessed thanks to a basic course during my studies. Shallow as well as deep neural networks were introduced, followed by optimization and regularization techniques. The course content was explained by Andrew Ng himself, with quizzes and programming tasks at the end of every week.
Now my mathematics studies proved to be extremely useful. Neural networks, in a nutshell, are nonlinear functions that map from one geometric space to another. These functions have weights, which are optimized during training: a loss function is defined, and its value is minimized by gradient descent. Those who feel threatened by AI and fear a Terminator-like future may be reassured: there is no question of an AI that feels, thinks and acts like a human.
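This idea fits in a few lines of code. The following is a purely illustrative sketch (not eonum's code): a one-hidden-layer network, i.e. a composition of nonlinear maps with weights, trained by plain gradient descent on the classic XOR toy problem.

```python
import numpy as np

# Illustrative sketch: a tiny neural network trained by gradient descent.
rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])  # XOR inputs
y = np.array([[0.], [1.], [1.], [0.]])                  # XOR targets

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr, losses = 1.0, []
for step in range(2000):
    # Forward pass: map the inputs through two nonlinear layers.
    h = sigmoid(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    losses.append(np.mean((p - y) ** 2))   # the loss to be minimized
    # Backward pass: gradients of the loss with respect to each weight.
    dp = 2 * (p - y) / len(X) * p * (1 - p)
    dW2, db2 = h.T @ dp, dp.sum(axis=0)
    dh = dp @ W2.T * h * (1 - h)
    dW1, db1 = X.T @ dh, dh.sum(axis=0)
    # Gradient descent: step each weight against its gradient.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
```

Frameworks such as Keras automate exactly this loop, including the gradient computation, but the underlying mathematics is no more mysterious than the above.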
At the same time as the online course, I started reading the book “Deep Learning with Python” by François Chollet. Chollet is a well-known AI scientist and the author of Keras, a user-friendly DL library written in Python. Many topics in the book complemented the online course perfectly. To get up to speed on basic ML topics as well, I completed some microcourses on the online platform Kaggle. On Kaggle, research institutions or companies upload data and describe related problems. Kaggle users, mostly data scientists and ML specialists, then try to solve the challenges that have been set. Anyone can register, and to get started, Kaggle provides a page with learning content and exercises: the microcourses mentioned above.
For the field of basics and engineering, I read several chapters of the book “Linux-Server mit Debian GNU/Linux”. As time went by, I came to understand what I had done on the first day when setting up my workstation. My handling of the terminal became more and more fluent, and I learned a lot about how computers work. In contrast to working with a graphical user interface (GUI), working from a terminal offers far more possibilities. The formal language used to communicate with the computer obeys a syntax, which needs to be studied and understood so that the computer actually does what it is supposed to do. However, the actual work can be delayed because you are busy with time-consuming installations. I fondly remember my attempt to run a Python program with the TensorFlow framework on the graphics processing unit (GPU) of an external computer. Not particularly complicated in theory, this task proved much more complex in practice than expected. Troubleshooting was an arduous, time-consuming task, but it led to a deeper understanding and many new insights.
To gain some basic knowledge of NLP, I watched videos of the lecture “NLP with Deep Learning” by Chris Manning, professor of machine learning at Stanford University, and solved the tasks associated with the lecture. On 18 June, I accompanied Tim to SwissText, an annual conference on text analysis and NLP. In the morning, there were various sessions in which, among others, Tamedia and Migros Bank explained how they use NLP in their companies. In the afternoon, we attended a workshop on efforts to advance NLP research for Swiss German.
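A central idea from that lecture, and from Word2vec in general, is that words are represented as vectors and semantic similarity becomes a geometric quantity. The toy sketch below uses invented three-dimensional vectors purely for illustration; real Word2vec embeddings are learned from large text corpora and have hundreds of dimensions.

```python
import numpy as np

# Toy illustration of the word-vector idea: similar words point in
# similar directions. These vectors are invented for the example.
vectors = {
    "hospital": np.array([0.9, 0.1, 0.3]),
    "clinic":   np.array([0.8, 0.2, 0.4]),
    "banana":   np.array([0.1, 0.9, 0.0]),
}

def cosine(a, b):
    # Cosine similarity: 1 for parallel vectors, 0 for orthogonal ones.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sim_related   = cosine(vectors["hospital"], vectors["clinic"])
sim_unrelated = cosine(vectors["hospital"], vectors["banana"])
```

With learned embeddings, “hospital” and “clinic” end up close together because they appear in similar contexts, while unrelated words do not.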
After approximately two months, Tim decided that I had gathered enough know-how to start on small practical tasks. I began working my way through the in-house Python scripts to get a picture of them. Personally, I was impressed by the sheer number of lines of code, most of them written by Tim himself. Now my theoretical training proved very useful. Tim assigned suitable projects to me, discussed them with me and was always patient when I had questions.
One of the projects was the integration of SHAP values into the length-of-stay prediction of our software Casematch. Casematch uses implicit knowledge from data to make predictions about the length of stay of individual patients. SHAP values can be used to explain these forecasts: in the form of a list and various graphs, they indicate which factors led to the predicted length of stay. This allows users to gain detailed insight into ML models, which are often referred to as black boxes. The outputs become more understandable and comprehensible even for non-experts.
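The idea behind SHAP can be shown on a toy model. The sketch below is not Casematch's implementation (the real system uses a trained model and the SHAP library); the linear model and the feature names are invented for illustration. It computes exact Shapley values by averaging a feature's marginal contribution over all subsets of the other features.

```python
from itertools import combinations
from math import factorial

# Hypothetical toy model: predicted length of stay (in days) from three
# invented features. Invented for illustration; not Casematch's model.
def predict(age, severity, comorbidities):
    return 2.0 + 0.05 * age + 1.5 * severity + 0.8 * comorbidities

baseline = {"age": 50, "severity": 1, "comorbidities": 0}  # reference patient
patient  = {"age": 70, "severity": 3, "comorbidities": 2}  # patient to explain
features = list(baseline)

def value(subset):
    # Model output when only the features in `subset` take the patient's value.
    args = {k: (patient[k] if k in subset else baseline[k]) for k in features}
    return predict(**args)

def shapley(feature):
    # Weighted average of the feature's marginal contribution over all
    # subsets S of the remaining features (exact Shapley formula).
    n, total = len(features), 0.0
    others = [k for k in features if k != feature]
    for r in range(n):
        for S in combinations(others, r):
            weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            total += weight * (value(set(S) | {feature}) - value(set(S)))
    return total

phi = {k: shapley(k) for k in features}
```

For this linear model, each Shapley value equals the feature's direct contribution (e.g. 1.5 extra days per severity point), and the values sum to the difference between the patient's prediction and the baseline prediction. That additivity is precisely what makes the resulting lists and graphs interpretable for non-experts.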
After an intensive four months, I am now at the end of my internship. It was a challenging time in which I learned a lot. Thanks to the small team, I benefited from close support. The work is exciting and varied, because you are involved in every step from the idea through development to the finished product. The team is warm-hearted and progressive. You are encouraged to put forward your own ideas, have the best opportunities for professional self-fulfillment, and you are constantly developing in this already dynamic field. The office situation offers variety too: together with four other parties, eonum AG forms a shared office community. The break and lunch conversations could not be more diverse, and here too, the contact is open and honest.
I am now looking forward to taking up a permanent position at eonum AG and to shaping the future together.