As a teenager, I remember the excitement around each new episode of the reality TV show ‘Survivor’. While viewers were given near-real glimpses of the characters’ intrigues and strategies, we all know that the real insights are hidden in the raw data.

A new public dataset was recently released, documenting everything a data scientist could wish to have on the TV show: from ratings to castaways’ personalities and voting results.

In this blog post I will play with some visualizations that I find interesting.

What are the personality characteristics of the winners?

The Myers-Briggs Type Indicator is a popular personality classification method, though its…

The most exciting event for my kids, especially during COVID-19 times, is opening the delivery boxes that arrive at our front door almost daily. Even though most of the deliveries are just groceries, the kids are always enthusiastic and thrilled about what treasures they are going to find in the box this time.

This is also how I felt when I came across a new COVID-19 dataset. It takes an interesting yet simple approach: rather than focusing on cause of death and an endless debate about potential biases for each record, this dataset simply counts all deaths…

Perhaps not the most elegant way to do it, but worth it for the catchy title

Long piped R commands… Some love them, and some don’t!

When you want to shoot an impressive video of a complex falling-domino setup, are you going to risk accidentally flipping one tile before the setup is complete and ruining the whole thing? Or would you prefer to break it into independent, better-isolated units that also let you explore and test each part without harming the others?

In the case of R, the pipe operator, originally from the magrittr package, allows you to pass an object along to the next function. …
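To make the trade-off concrete, here is a minimal sketch of the same computation written once as a single piped command and once as separate, independently testable steps. It assumes R 4.1 or later for the native `|>` pipe (magrittr's `%>%` would read the same way here), and it uses the built-in `mtcars` data purely for illustration; the variable names are mine, not from the original post.

```r
# One long piped command (native pipe, R >= 4.1).
# With magrittr loaded, `%>%` could replace `|>` in this example.
piped <- mtcars |>
  subset(cyl == 4) |>                     # keep four-cylinder cars
  transform(kpl = mpg * 0.425144) |>      # miles per gallon -> km per litre
  head(3)                                 # first three rows

# The same logic broken into isolated units, each of which
# can be inspected or tested on its own.
four_cyl <- subset(mtcars, cyl == 4)
with_kpl <- transform(four_cyl, kpl = mpg * 0.425144)
stepwise <- head(with_kpl, 3)

# Both routes give the same result.
stopifnot(identical(piped, stepwise))
```

The piped version is more compact, while the stepwise version leaves `four_cyl` and `with_kpl` available for inspection if something goes wrong midway.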

5 minutes to create the animation. 50 minutes to figure out how to embed it into Medium

In my previous blog post I visualized mobility data from the COVID-19 Open Data project.

This blog post will demonstrate another popular visualization, a bar chart race animation. I used the cumulative number of COVID-19 deaths as the presented outcome.

A single frame from the animation (by the author)

The animation was created using the robservable R package, which allows the use of Observable notebooks (or parts of them) as htmlwidgets in R.

The HTML animation file can be found in my GitHub repo. Below is a video recorded from my screen, hence the degraded quality.

Visualizing mobility data trends with saxophone-shaped graphs

Rumor has it that COVID-19 is a great opportunity to show your data-science / analytics skills, and you don’t have to be an epidemiologist in order to share your 2 cents with the world. Well… here is my attempt.

Google has shared a public dataset about daily mobility across the world before, during, and after the first COVID-19 wave, starting February 15, 2020. The version I have has 6 types of mobility measures:

mobility retail and recreation

mobility grocery and pharmacy

mobility parks

mobility transit stations

mobility workplaces

mobility residential

In this post I focused on overall state-level analysis…

From a data-science perspective.


What is the wildest wish list you have for a software tool? Go crazy: ask yourself, what if I want to take this and that, and combine/compare/compose them with minimal effort? Most likely someone else, a domain expert, has already figured out a good way to solve it. They may have needed to wrap it in a fancy software structure that is not intuitive at first and has its own lingo. Is it worth learning it and leveraging the wide-scope solutions you are likely to discover further down the road? What else characterizes wide-scope packages?


ML example…

Written in December 2019.

My previous post from mid 2018 described my learning experience with ‘meta’ machine learning aggregator packages in R: mlr, caret and SuperLearner. These packages unify multiple independent modeling packages under a single interface, and provide a ‘meta’ machine learning framework around them for common tasks such as resampling, tuning, benchmarking, ensembling and others.

Since then, a couple of major developments have emerged and matured in the R ecosystem, replacing deprecated packages that, ahem, ahem, I am not comfortable admitting it, have started to suffer from … scope creep.
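As a rough illustration of what such a ‘meta’ layer does, the base R sketch below wraps two models behind one common fit/predict interface and benchmarks them with the same k-fold resampling. This is not the API of mlr, caret or SuperLearner; the `learners` list, the `cv_rmse()` helper and the choice of `mtcars` with `mpg ~ wt + hp` are all hypothetical and chosen only to show the idea.

```r
set.seed(1)

# Hypothetical unified interface: every learner has fit() and predict().
learners <- list(
  baseline = list(  # always predicts the training mean of the target
    fit = function(formula, data) mean(data[[all.vars(formula)[1]]]),
    predict = function(model, newdata) rep(model, nrow(newdata))
  ),
  linear = list(    # ordinary linear regression via stats::lm
    fit = function(formula, data) stats::lm(formula, data = data),
    predict = function(model, newdata) {
      unname(stats::predict(model, newdata = newdata))
    }
  )
)

# k-fold cross-validated RMSE for one learner, given fixed fold labels.
cv_rmse <- function(learner, formula, data, folds) {
  target <- all.vars(formula)[1]
  errors <- vapply(sort(unique(folds)), function(i) {
    train <- data[folds != i, , drop = FALSE]
    test  <- data[folds == i, , drop = FALSE]
    model <- learner$fit(formula, train)
    pred  <- learner$predict(model, test)
    sqrt(mean((test[[target]] - pred)^2))
  }, numeric(1))
  mean(errors)
}

# Same folds for every learner, so the comparison is like for like.
folds <- sample(rep(seq_len(5), length.out = nrow(mtcars)))

# Benchmark: one resampled error estimate per learner.
benchmark <- vapply(
  learners, cv_rmse, numeric(1),
  formula = mpg ~ wt + hp, data = mtcars, folds = folds
)
print(benchmark)
```

Adding another model means adding one more fit/predict pair to the list; the resampling and benchmarking code stays untouched, which is the convenience the aggregator packages offer at a much larger scale.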

The current refactoring packages I will discuss…

From a data-science perspective.

You’ve got a fresh new project on your desk, some exciting data, a challenging Kaggle competition, a new client you wish to impress, and you are fully motivated. At first, the problem seems to be well defined, and you even feel comfortable with the task at hand. You have just completed a similar task. This new one should not be much different. Maybe even just a few copy/pastes with some modifications at the edges.

But then it comes… The client / collaborator / boss has just one simple additional request… It usually goes like this:

‘Hmmmm, I wonder how would…

This post was published around mid 2018.

Also check out a follow-up post from the end of 2019 about 2nd generation meta packages.

Do you remember learning about linear regression in your Statistics class? Congratulations! You are now a fortunate member of a diverse community of Statisticians, Mathematicians, Data Scientists, Computer Scientists, Engineers, and many (many) others who have used a ‘machine learning’ approach! What the heck, you can even tell your friends that you are doing some fancy ‘Artificial Intelligence’.

Linear regression is only one of many supervised models. There are hundreds of other models including: non-linear models, polynomial models, Tree…

Dror Berel

BioPassenger, Computational Biology for the people
