Who are Data Scientists?
Candidates in this role are responsible for extracting data and interpreting what it means. Data scientists draw on tools and methods from both statistics and machine learning. They also spend a lot of time collecting, cleaning, and munging data, because raw data is rarely clean.
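To make the cleaning and munging step concrete, here is a minimal Python sketch. The records, field names, and cleaning rules (trim whitespace, normalize capitalization, drop rows with missing or non-numeric values, remove duplicates) are hypothetical choices for illustration; real projects usually set their rules based on the data at hand and often use libraries such as pandas.

```python
# Hypothetical raw records: stray whitespace, inconsistent case, missing values, duplicates.
RAW_ROWS = [
    {"name": " Alice ", "city": "boston", "age": "34"},
    {"name": "Bob", "city": "NEW YORK", "age": ""},
    {"name": "alice", "city": "Boston ", "age": "34"},
    {"name": "Carol", "city": None, "age": "29"},
    {"name": "Dan", "city": "Chicago", "age": "n/a"},
    {"name": "eve", "city": "chicago", "age": " 41 "},
]


def clean_rows(rows):
    """Normalize text fields, parse ages, and drop incomplete or duplicate rows."""
    cleaned = []
    seen = set()
    for row in rows:
        # Treat None as an empty string, then trim and normalize capitalization.
        name = (row.get("name") or "").strip().title()
        city = (row.get("city") or "").strip().title()
        age_text = (row.get("age") or "").strip()
        # Skip rows missing a required field or with a non-numeric age.
        if not name or not city or not age_text.isdigit():
            continue
        record = (name, city, int(age_text))
        # Skip rows that become exact duplicates after normalization.
        if record in seen:
            continue
        seen.add(record)
        cleaned.append({"name": name, "city": city, "age": int(age_text)})
    return cleaned


if __name__ == "__main__":
    for row in clean_rows(RAW_ROWS):
        print(row)
```

With the sample data, only two records survive: Alice (her lowercase duplicate is removed) and Eve.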
A data scientist needs both technical and non-technical skills to do the job effectively.
To quickly screen for the best data science candidates, make sure they have the following:
- They should have a degree in mathematics, statistics, computer science, or management information systems.
- They should have hands-on experience with data collection and analysis.
- Candidates who can work as individual contributors and have excellent problem-solving skills are the best fit.
- They should have excellent communication skills, both verbal and visual.
Evaluating the Technical Skills of a Data Scientist
Data science involves three types of technical tools:
1- Tools for pulling data,
2- Tools for analyzing the data, and
3- Tools for presenting the results
When technically screening candidates for a data scientist position, look for the tools and technical skills below.
Data scientists generally use these tools for pulling and pre-processing data:
1- SQL: This is a must-have skill for every data scientist, whether the data is structured or unstructured. Companies are adopting modern SQL engines such as Apache Hive, Spark SQL, Flink SQL, and Impala.
2- Big Data Technologies: This is among the most important skills needed to become a data scientist. Candidates should know Big Data technologies such as Hadoop and its ecosystem, and ideally Spark and Flink as well.
3- UNIX: Much raw data sits on UNIX or Linux servers before it is loaded into a data store for processing, so UNIX knowledge is valuable for data scientists.
4- Python: Python is one of the most popular languages among data scientists. It is an interpreted, high-level, object-oriented programming language with dynamic semantics, dynamic typing, and dynamic binding.
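SQL and Python are often used together when pulling data. The sketch below uses Python's built-in sqlite3 module with an in-memory database as a stand-in for engines like Apache Hive or Spark SQL, which have their own connectors and SQL dialects; the orders table and its columns are hypothetical. It runs a typical aggregation query (GROUP BY with a HAVING filter) and passes the threshold as a bound parameter instead of formatting it into the SQL string.

```python
import sqlite3


def top_customers(conn, min_total):
    """Return (customer, total) pairs whose summed order amount is at least min_total."""
    query = """
        SELECT customer, SUM(amount) AS total
        FROM orders
        GROUP BY customer
        HAVING SUM(amount) >= ?
        ORDER BY total DESC
    """
    # The "?" placeholder is filled in safely by sqlite3, avoiding SQL injection.
    return conn.execute(query, (min_total,)).fetchall()


def build_demo_db():
    """Create an in-memory database holding a small hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders (customer, amount) VALUES (?, ?)",
        [
            ("acme", 120.0),
            ("globex", 40.0),
            ("acme", 80.0),
            ("initech", 150.0),
            ("globex", 30.0),
        ],
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    for customer, total in top_customers(conn, 100):
        print(customer, total)
    conn.close()
```

With the sample data it prints acme 200.0 and initech 150.0; globex (70.0 in total) falls below the threshold.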
Tools for Data Analysis & Pattern Matching
The right tools depend on the level of statistical knowledge involved: some are suited to advanced statistics and others to more basic analysis.
1- SAS: Many companies use SAS, so it is a plus if your candidate has worked with it. It helps them carry out statistical analysis and data manipulation easily.
2- R: R is one of the most popular languages in the statistics world. It is an open-source language and environment that supports object-oriented programming, and because it is free, it can be used anywhere. Many data scientists choose it first because so many statistical methods are already implemented in R.
3- Machine Learning: Machine learning is one of the most in-demand and useful skills a data scientist can have. Many machine learning tools are available, such as Weka and NLTK, but tools built on top of big data technologies, such as Mahout, MLlib, and FlinkML, are attracting growing industry attention.
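Tools like Weka or MLlib package many ready-made algorithms. To illustrate the kind of logic they wrap, here is a minimal k-nearest-neighbors classifier in plain Python. The features, labels, and data are hypothetical, and this is a teaching sketch, not how any of the tools above implement the algorithm.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, point, k=3):
    """Predict a label for `point` by majority vote among its k nearest training examples.

    `train` is a list of (features, label) pairs.
    """
    # Sort every training example by distance to the query point and keep the k closest.
    nearest = sorted(train, key=lambda item: euclidean(item[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (hours of product use per week, support tickets) -> label.
TRAIN = [
    ((1.0, 5.0), "churn"),
    ((2.0, 4.0), "churn"),
    ((1.5, 6.0), "churn"),
    ((8.0, 1.0), "stay"),
    ((9.0, 0.0), "stay"),
    ((7.5, 2.0), "stay"),
]

if __name__ == "__main__":
    print(knn_predict(TRAIN, (2.0, 5.0)))
    print(knn_predict(TRAIN, (8.5, 1.0)))
```

A point near the low-usage, high-ticket examples is classified as churn, and one near the high-usage examples as stay. Sorting the whole training set is fine for toy data; larger datasets usually call for more efficient neighbor search.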
Tools for Visualization
Tableau: Tableau is a popular visualization tool, especially in Silicon Valley.
A few other visualization tools are JasperSoft, SAP BI, QlikView, and MicroStrategy.
We hope this article helps you screen data scientist candidates and even source their profiles.