
Four Linux disk and file commands that aren't unique to Linux, and two important commands for finding content in command output or text files

Let us summarize four really basic commands that are the same in Linux and Windows:

cd – Change directory.

dir – List directory contents.

mkdir – Make a directory.

rmdir – Remove a directory.
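
A quick session that uses all four, in order, might look like this (the directory name demo is just an example, and the comments use Bash syntax):

mkdir demo   # make a new directory called demo
cd demo      # step into it
dir          # list its contents (empty for now)
cd ..        # step back out
rmdir demo   # remove the now-empty directory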

Now let us find specific words in command output

Okay, that much is clear. But during your daily work, you often want to find specific words in the output of other commands.

In Windows you would use the findstr command as follows:

<command that prints output> | findstr <search string>

But in Linux you will use grep.

<command that prints output> | grep <search string>
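
For example, to show only the lines of ip addr output that contain an address, you could filter for "inet" like this (any other search string works the same way):

ip addr | grep inet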

Let us find content in text files

But what if you want to find content in plain text files? In Windows you have the type command, which you can use as follows:

type <filename> | findstr <search string>

In Linux you can use the cat command as follows:

cat <filename> | grep <search string>
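
As a side note, grep can also read files directly, so you do not strictly need cat here. A shorter equivalent is:

grep <search string> <filename>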

Curious to find out more about Linux network commands that are similar to Windows? Check this out.

Three reasons why data scientists should learn Bash in 2021

Data science makes use of many tools, and programming is naturally a large part of it. But as is commonly known, peripheral skills are vital too. Let us check out three concrete reasons why Bash can come in handy for data scientists.

ETL operations

Take for instance the acronym ETL: Extract, Transform, Load. Data, and in particular text data, is extracted, transformed into a desired format and then loaded again. These tasks are by no means menial and require intellect as well as the right tools. Bash offers a range of tools to work with data files, and to filter and modify segments of them. For instance, awk is a domain-specific language for text processing. Furthermore, the grep command and its variants quickly match patterns in text output (and can be combined with other commands using the pipe functionality). The list goes on, with commands such as sed, cat, head and tail. All of these come in handy for working with text data.
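
To make this concrete, here is a minimal sketch of a tiny extract-transform-load pipeline; the file name sales.csv and its column layout are assumptions for the example:

# Extract the second CSV column, trim trailing spaces (transform),
# drop empty lines and load the result into a new file.
awk -F ',' '{ print $2 }' sales.csv | sed 's/ *$//' | grep -v '^$' > cleaned.txt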

Bash can be used in both Linux and Windows

Continuing on the same train of thought: Bash, which is now available in both Linux and Windows (through the Windows Subsystem for Linux, WSL), is integral to automating the creation, copying and moving of files. By writing Bash scripts, a data scientist can automate much of the work that involves files. Yes, certainly, Batch and PowerShell in Windows can be used for the same purpose. But Bash is available in both Linux and Windows, and has much richer scripting possibilities, with more commands and the ability to define functions that can then be called from other Bash scripts. The list goes on, and Bash with its pipe functionality is more modular and more permissive for automating file and network tasks.
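
As a minimal sketch of that function reuse, assuming two hypothetical files helpers.sh and main.sh:

# helpers.sh -- a small library of reusable functions
backup_file() {
    cp "$1" "$1.bak"   # copy the given file to a .bak twin
}

# main.sh -- another script that reuses the function
source ./helpers.sh    # pull the function definitions into this script
backup_file data.csv   # call the shared function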

Are you using MySQL or MariaDB? Bash to the rescue!

Are you using the relational database MySQL (or MariaDB) in Linux for the storage of tabular data? Furthermore, do you need to work on the command line to import and export data to and from the RDBMS? Well then, with Bash you can use mysqldump for the export of data, and the following form of the mysql command to import a data file: mysql -u <username> -p <database> < datafile.sql (the -p switch prompts for the password). If your .sql file contains CREATE DATABASE and CREATE TABLE statements, you can create entire databases and table schemas from the command line! Therefore Bash is a formidable ally when you want more control over a MariaDB/MySQL installation and the databases and tables stored on it.
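
Here is a hedged round-trip sketch; the user name datauser and the database name salesdb are assumptions for the example:

# Export the salesdb database to a .sql dump file (prompts for the password)
mysqldump -u datauser -p salesdb > salesdb_backup.sql

# Import the dump back into the server
mysql -u datauser -p salesdb < salesdb_backup.sql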

Happy Bash learning, and of course Linux as a whole! Stay tuned for more insights and tips! Also check out our post on learning Linux as a data scientist.

By Paul-Christian Markovski, for NailLinuxExam.com.

5 Linux network commands that are similar or the same as in the Windows shell – Part 2

In the first part of this blog series, we did a round-up of five common Linux network commands and their equivalents in Windows. In this part we will continue with those five Linux networking commands and automate their usage with Bash scripts.

The commands we will cover are netstat, nslookup, ping, traceroute and curl. As you may know, these commands have a range of different switches that can be activated when running them. Using Bash, we will automate running the commands and handling the output they generate. We will also use the read Bash built-in to create simple user interfaces that request input from the keyboard. In one of our examples we will use a handy while loop in combination with the read command to read all lines in a file. That is a code snippet you will definitely find useful in the future.

Let’s get to it!

netstat for network connections

So netstat can show you network connections, the routing table and network interface statistics. But let's say we want to narrow down our search to network connections in listening mode, and only those that were opened by users on the local Linux box. We can do that with the -l switch in combination with a grep command that filters for connections whose socket path contains "user" (in the /run directory on Debian-based Linux). This way we get a neat table as output, showing all listening network connections initiated by users.

Check out the code here:

netstat -l | grep user

Above we pipe the output from netstat to grep and match only the entries in the output that contain "user".

Checking listening network connections for a specific user

Let’s move on to the next netstat Bash example. We are going to build on the last command snippet. Now we want to narrow down the search even more, to only show listening network connections that were opened by a specific user ID. Recall that all users in a Linux system have numeric user IDs associated with them. If you want to learn more about Linux user IDs, please read “What is a Linux UID?”.

Now let us look at the Bash script.

read -p "Which user ID? " usr
netstat -l | grep user | grep $usr

As you can see, the second line is almost identical to the one in the last section. The difference is that we have added another grep command, matching against the contents of the variable $usr. The first line is where we define the usr variable, by reading from the keyboard with the Bash built-in read.

The result is a pretty useful, although simple, Bash script. We ask the user to enter a UID, then we search for all listening network connections, grep for only user-initiated connections, and finally filter for the specific UID.

But netstat -l is not fool-proof!

You need to know that the solution above is not fool-proof: we are using predetermined matching conditions, and we cannot guarantee that there will be no false positives. However, although the output may contain some extra matching lines, we can be sure that all listening network connections for the specific UID will be displayed.

netstat for TCP and UDP connections

Now it's time to go further and automate netstat even more. In this scenario we want to check the listening network sockets for both TCP and UDP. We also want to map each network socket to the PID and program name that opened it.

netstat -tulpn
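
The switches break down as follows; note that you may need root privileges for -p to reveal the programs of other users:

# -t  TCP sockets         -u  UDP sockets
# -l  listening sockets   -p  show PID/program name
# -n  numeric addresses and ports (no DNS lookups)
sudo netstat -tulpn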

nslookup to look up domain names

Our first nslookup example will ask for a domain name and look up the associated A record for it.

read -p "Enter domain: " domain
nslookup $domain

As you can see, we are reading the domain name from the keyboard and simply passing that variable value to nslookup.

nslookup for several domains on the same line

Let's say you want to get the A records for several domains in one go. Since the default internal field separator (IFS) includes the space character, you can separate several values with spaces and assign them to different variables. The read command reads input until it encounters a newline character, then splits that line into words and assigns one word to each variable. Like this:

read -p "Enter domains: " domain domain2

nslookup $domain
nslookup $domain2

The words are assigned to each variable in turn, and then we just run nslookup twice, once for each variable. Simple enough.
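
If you have many domains, typing them on one line quickly gets unwieldy. Here is the while loop with read promised in the introduction, a sketch that assumes a file called domains.txt with one domain per line:

# Look up every domain listed in domains.txt, line by line
while IFS= read -r domain; do
    nslookup "$domain"
done < domains.txt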

nslookup can do more

nslookup can look up all sorts of DNS records, such as the name servers responsible for a domain (NS records) and the mail servers (MX records). Check out the following example:

read -p "Enter domain: " domain

nslookup -type=MX $domain
nslookup -type=NS $domain

Here we read one domain from the keyboard and assign it to a variable, nothing new here. But check out the two nslookup commands. We use the -type switch to specify which type of DNS record we are looking for. MX stands for Mail Exchange, for routing email. And NS stands for Name Server, the authoritative domain server(s) for the specific domain.

curl to get web page contents

curl is, in its generic form, a tool to transfer data to and from servers. It is often used to transfer HTTP data. Here we will see how to get the web page headers for a domain.

curl -I https://www.google.com

The -I switch tells curl to fetch the headers only. This is perhaps the simplest of all curl examples. Be aware that curl supports a wide range of protocols: DICT, FILE, FTP, FTPS, GOPHER, HTTP, HTTPS, IMAP, IMAPS, LDAP, LDAPS, POP3, POP3S, RTMP, RTSP, SCP, SFTP, SMB, SMBS, SMTP, SMTPS, TELNET and TFTP.

curl to fetch a user-specified web page

So let us continue by using curl to fetch an entire web page. This is the default mode when you use curl without switches.

read -p "Domain name? " domain

curl $domain

Like before, we ask the user to submit a domain name (it should be prefixed with http:// or https://). Then we simply invoke curl with the read variable, and the entire web page is printed to stdout.

curl to fetch and save web pages

Finally, let us modify the previous example and save the output to a file. Like this:

read -p "Domain name? " domain

curl $domain -o output.txt
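
If you also want to suppress the progress meter while still seeing any errors, you can add the -sS switches to the same command:

curl -sS "$domain" -o output.txt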

That’s it! To understand the usefulness of curl, I highly recommend that you run man curl and read about all the available options.

You can find the GitHub repository with the simple examples here.

Author: Paul-Christian Markovski, for NailLinuxExam.com.

5 reasons for you to store your Bash scripts in Git repositories

1 Maintain a history of your script improvements – Force yourself to continuously improve

Git will help you get a clear overview of how your scripts have evolved. Maybe you need to go back a version or two and fork a script from there. If you have added proper comments to your scripts, then you and others can follow your reasoning easily.
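
A hedged sketch of that workflow; the script name backup.sh and the commit hash a1b2c3d are hypothetical:

git log --oneline backup.sh          # list the commit history of the script
git checkout -b retry-idea a1b2c3d   # fork a new branch from an older commit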

A funny thing happens when you start adding code to Git repositories and keep them updated as you add or change code. You become more prone to improving your code, because you know which parts need more work. Maybe there is even a #ToDo note somewhere that signifies a totally new feature; that #ToDo comment will nag you more when you have a Git repository and constantly go there to check it out. Add reason number 2 below, and you will also feel the peer pressure from others to continue building on your useful scripts.

2 Others might take over your scripts for regular usage

Think of the Linux users that you will work with who will need the same tool chest that you have. Show some team spirit and share your work, of course with proper attribution to you as the script's author. This earns you a lot of respect.

3 Understand the fallacy of “the perfect script” thinking

There are no perfect scripts, and no silver bullets for any IT problem of a larger scale. In the Linux world we are also used to having many ways to solve simple tasks on the command line. This can be used to argue that your solution does not need to be hidden away or kept out of version control. With version control you introduce the idea of newer and better versions of your Bash script. It is a nice trick to show yourself and others that yes, improvements and new ideas are welcome.

4 DevOps and Test-Driven Development ask for iteration and cross-border approaches

These two paradigms really show that more collaboration is expected between subject matter experts, whether a Linux user, a Python scripter or a LAMP engineer (to give only three examples). We take inspiration from each other's solutions to tasks on the command line, and these solutions are stored in Bash scripts. By using Git you show that you are open to sharing your scripts, and you reject the "write once, all good, don't argue" type of mentality.

5 Git is used to version control more than you think

These days you will find that Git is used by many more people than just programmers: technical writers, scripters and data scientists alike. The idea of applying version control and having central repositories that can be forked and worked on by many people is simply very appealing. It also deals with the number one silo: the expert who sits on all the knowledge that is so tremendously useful to others as well.