Data science relies on many tools, and programming is naturally a large part of it. However, peripheral skills matter too. Let us look at three concrete reasons why Bash comes in handy for data scientists.
ETL operations
Take, for instance, ETL: Extract, Transform, Load. Data, in particular text data, is extracted from a source, transformed into the desired format and then loaded into its destination. These tasks are by no means menial; they require judgement as well as good tooling. Bash, together with the standard command-line utilities it runs, offers a range of tools for working with data files and for filtering and modifying parts of them. For instance, awk is a domain-specific language for text processing. The grep command and its variants quickly match patterns in text output, and the matches can then be passed on to other commands using pipes (|). The list goes on, with commands such as sed, cat, head and tail. All of these come in handy for working with text data.
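As an illustration, the script below runs a tiny ETL job with the commands named above. It is a minimal sketch: the file names (sales.csv, region_totals.csv), the columns and the sample rows are hypothetical, and in practice the extract step would read your own data instead of a hand-written sample.

```shell
#!/usr/bin/env bash
# Minimal ETL sketch using standard text tools.
# sales.csv, region_totals.csv and the column layout (region, product, amount)
# are hypothetical examples.
set -euo pipefail

# Extract: a small sample CSV stands in for real exported data
# (note the blank line and the spaces after the commas).
cat > sales.csv <<'CSV'
region, product, amount
north, widget, 10
south, widget, 5

north, gadget, 7
south, gadget, 3
CSV

# Transform, then Load the result into a new file:
#   tail -n +2    skip the header line
#   grep -v '^$'  drop empty lines
#   sed           remove spaces around commas
#   awk           sum the amount column (3rd field) per region (1st field)
#   sort          print the regions in a stable order
{
  echo "region,total"
  tail -n +2 sales.csv \
    | grep -v '^$' \
    | sed 's/ *, */,/g' \
    | awk -F, '{ total[$1] += $3 } END { for (r in total) print r "," total[r] }' \
    | sort
} > region_totals.csv

# Preview the loaded file.
head -n 3 region_totals.csv
```

Running it prints the header line region,total followed by north,17 and south,8. Each stage does one small job, so a step can be swapped out (for example, a different awk aggregation) without touching the rest of the pipeline.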
Bash can be used on both Linux and Windows
Continuing the same train of thought: Bash, which is available on Linux and on Windows (through the Windows Subsystem for Linux, WSL), is an integral part of automating the creation, copying and moving of files. By writing Bash scripts, a data scientist can automate much of the file handling involved in their work. Certainly, Batch and PowerShell can do the same on Windows. But Bash runs on both Linux and Windows, and it offers rich scripting possibilities: a wide range of commands, plus functions that other Bash scripts can reuse by sourcing the file that defines them. Combined with pipes, this makes Bash a modular and flexible choice for automating file and network tasks.
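The sketch below shows the kind of file automation described above: a function stored in its own file, loaded with source and then called to create folders, copy files and move them. The file name file_utils.sh, the function organize_csv_files and the incoming, backup and processed directories are all hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: a reusable file-handling function kept in its own file and loaded
# with `source`. file_utils.sh, organize_csv_files and the directory names
# are hypothetical.
set -euo pipefail

# Library file: it only defines the function, so sourcing it runs nothing.
cat > file_utils.sh <<'EOF'
# organize_csv_files SRC BACKUP DONE
# Creates BACKUP and DONE if needed, copies each *.csv from SRC into BACKUP
# and then moves it into DONE.
organize_csv_files() {
  local src="$1" backup="$2" done_dir="$3"
  local f
  mkdir -p -- "$backup" "$done_dir"
  for f in "$src"/*.csv; do
    [ -e "$f" ] || continue   # no match: the glob is left unexpanded
    cp -- "$f" "$backup/"     # copy: keep a backup
    mv -- "$f" "$done_dir/"   # move: mark the file as processed
  done
}
EOF

# Demo input: two small files in a hypothetical "incoming" folder.
mkdir -p incoming
printf 'a,1\n' > incoming/day1.csv
printf 'b,2\n' > incoming/day2.csv

# Any script can load the function and call it.
source ./file_utils.sh
organize_csv_files incoming backup processed
ls backup processed
```

Because file_utils.sh only defines the function, any other Bash script can reuse it with the same source line, which is what keeps this style of automation modular.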
Are you using MySQL or MariaDB? Bash to the rescue!
Are you using the relational database MySQL (or MariaDB) on Linux to store tabular data? And do you need to import and export data to and from the RDBMS on the command line? Then Bash has you covered: use mysqldump to export data, and the following form of the mysql command to import a data file: mysql -u <username> -p < datafile.sql. With -p and no value, mysql prompts for the password; if you do pass the password inline, it must follow -p without a space (-p<password>), because a word after a space is read as a database name. If the file does not select a database itself, name it before the redirection: mysql -u <username> -p <database> < datafile.sql. If your .sql file contains CREATE DATABASE and CREATE TABLE statements, you can create entire databases and table schemas from the command line! This makes Bash a formidable ally when you want more control over a MariaDB/MySQL installation and the databases and tables stored in it.
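A minimal sketch of both directions follows. It writes an example schema file containing CREATE DATABASE and CREATE TABLE statements and defines two helper functions around mysqldump and mysql. The database sales_demo, the table orders and the function names export_db and import_sql are hypothetical; the script only defines the helpers and does not contact a server until you call them.

```shell
#!/usr/bin/env bash
# Sketch: a schema file plus helpers around mysqldump (export) and
# mysql (import). sales_demo, orders, export_db and import_sql are
# hypothetical names; nothing contacts a server until a helper is called.
set -euo pipefail

# A .sql file whose CREATE DATABASE / CREATE TABLE statements build a
# database and its schema when fed to the mysql client.
cat > schema.sql <<'SQL'
CREATE DATABASE IF NOT EXISTS sales_demo;
USE sales_demo;
CREATE TABLE IF NOT EXISTS orders (
  id INT PRIMARY KEY,
  region VARCHAR(32),
  amount DECIMAL(10, 2)
);
SQL

# export_db USER DB OUTFILE
# --databases makes the dump include CREATE DATABASE and USE statements;
# -p without a value prompts for the password.
export_db() {
  local user="$1" db="$2" out="$3"
  mysqldump -u "$user" -p --databases "$db" > "$out"
}

# import_sql USER FILE
# The mysql client reads the SQL statements from standard input.
import_sql() {
  local user="$1" file="$2"
  mysql -u "$user" -p < "$file"
}
```

For example, import_sql root schema.sql would create the database and its table, and export_db root sales_demo sales_demo.sql would write a dump that, thanks to --databases, carries its own CREATE DATABASE and USE statements, so it can be re-imported with import_sql without naming a database.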
Happy Bash learning, and of course Linux as a whole! Stay tuned for more insights and tips! Also check out "Learn Linux as a data scientist".
By Paul-Christian Markovski, for NailLinuxExam.com.