The scheduling format for the Linux scheduling daemon cron is not easy to remember, especially if you don’t work with cron that frequently. The first reaction for most Linux sysadmins who can’t remember the ordering of the fields is to type ‘man crontab’, but unfortunately that man page section does not contain the schedule format. If you are like me, you will immediately start Googling it.

What is the best way to locate the man page for the crontab scheduling format, then? For one thing, you can search the man pages for the keyword ‘crontab’ using the command below –

daniel@linubuvma:/tmp$ man -k crontab
anacrontab (5)       - configuration file for anacron
crontab (1)          - maintain crontab files for individual users (Vixie Cron)
crontab (5)          - tables for driving cron

You see, there are two sections for crontab – section 1 describes the command usage and section 5 shows the tables we are looking for. If you are familiar with how man page section numbers are assigned, you would have immediately jumped to section 5 of the man page for crontab –


1. General commands
2. System calls
3. C library functions
4. Special files (usually devices, those found in /dev) and drivers
5. File formats and conventions
6. Games and screensavers
7. Miscellanea
8. System administration commands and daemons

The short answer to ‘how do I see the crontab schedule format?’ is – run

 man 5 crontab 

Per the man page, the time and date fields in order are –

field          allowed values
-----          --------------
minute         0-59
hour           0-23
day of month   1-31
month          1-12 (or names, see below)
day of week    0-7 (0 or 7 is Sun, or use names)
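
The field order and ranges above can be sanity-checked with a short Python sketch (a hypothetical helper, not a full cron parser – it handles only bare numbers and ‘*’, ignoring names, ranges, steps and lists from crontab(5)):

```python
# Minimal sanity check for the five cron time/date fields.
# Only bare numbers and '*' are handled; names, ranges, steps
# and lists from crontab(5) are out of scope for this sketch.
FIELDS = [
    ("minute", 0, 59),
    ("hour", 0, 23),
    ("day of month", 1, 31),
    ("month", 1, 12),
    ("day of week", 0, 7),  # 0 or 7 is Sunday
]

def check_schedule(schedule):
    parts = schedule.split()
    if len(parts) != len(FIELDS):
        return False
    for value, (name, low, high) in zip(parts, FIELDS):
        if value == "*":
            continue
        if not value.isdigit() or not low <= int(value) <= high:
            return False
    return True

print(check_schedule("30 4 * * 0"))   # 4:30 AM every Sunday -> True
print(check_schedule("61 4 * * 0"))   # invalid minute -> False
```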

One of the most commonly used Linux system administration tools is chown, which is part of the coreutils package. It is used to change the user and/or group ownership of a given file or directory. One thing to be aware of with this tool is that it does not change the ownership of symbolic links, as shown below –

root@linubuvma:/tmp# touch test
root@linubuvma:/tmp# ls -l test
-rw-r--r-- 1 root root 12 Dec 20 08:01 test
root@linubuvma:/tmp# ln -s test sltest
root@linubuvma:/tmp# ls -l sltest
lrwxrwxrwx 1 root root 4 Dec 20 08:01 sltest -> test
root@linubuvma:/tmp# chown daniel:daniel sltest
root@linubuvma:/tmp# ls -l sltest
lrwxrwxrwx 1 root root 4 Dec 20 08:01 sltest -> test

The reason this doesn’t work is in the man page for chown – “symbolic links named by arguments are silently left unchanged unless -h is used.” By simply running chown on a symbolic link without the ‘-h’ option, you are changing the ownership of the target. The ‘-h’ option affects the symbolic links themselves instead of any referenced file.

root@linubuvma:/tmp# chown -h daniel:daniel sltest

root@linubuvma:/tmp# ls -l sltest
lrwxrwxrwx 1 daniel daniel 4 Dec 20 08:01 sltest -> test

Though not portable, in some distros

 chown -R daniel:daniel /path/to/directory 

will recursively change the ownership of all files, including symbolic links and directories. In my case, ‘chown -R’ behaves this way with GNU chown, which is part of the ‘GNU coreutils 8.21’ package on Ubuntu 14.04.
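
The same distinction exists in Python’s os module, which is handy when scripting ownership changes: os.chown() follows symlinks by default, while os.lchown() changes the link itself, like ‘chown -h’. The sketch below (my own example, not from the man page) reassigns your own uid/gid so it runs without root:

```python
import os
import tempfile

# Demonstrate chown vs chown -h from Python: os.chown() follows the
# symlink to its target, os.lchown() changes the link itself.
tmpdir = tempfile.mkdtemp()
target = os.path.join(tmpdir, "test")
link = os.path.join(tmpdir, "sltest")

open(target, "w").close()
os.symlink(target, link)

uid, gid = os.getuid(), os.getgid()   # our own ids, so no root needed
os.lchown(link, uid, gid)             # equivalent of: chown -h uid:gid sltest

# os.lstat() inspects the link itself, os.stat() the target it points to
print(os.lstat(link).st_uid == uid)
```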

How to record your ssh session using screen.

Per the man page – “Screen is a full-screen window manager that multiplexes a physical terminal between several processes (typically interactive shells)”.
Screen is most commonly used to create multiple sessions to remote hosts within a single terminal window or even run multiple commands locally without leaving your shell terminal. For instance, you could be tailing the log file in one session, then run a long process, then ssh into other machine etc. all within a single window.

Screen is the go-to tool when you are working over a remote connection, such as ssh, and want to be able to resume your work at any time or from any other host without worrying about a dropped connection.

In this post, I will show you how you can record your bash session.

Installation –

apt-get install screen    (Debian/Ubuntu)
yum install screen        (Redhat/CentOS)

My local environment and the remote host I am sshing to –

daniel@linubuvma:/tmp$ screen -v
Screen version 4.01.00devel (GNU) 2-May-06
daniel@linubuvma:/tmp$ uname -r
3.13.0-106-generic
daniel@linubuvma:/tmp$ cat /etc/issue
Ubuntu 14.04.5 LTS \n \l

daniel@linubuvma:/tmp$ ssh ns2 'uname -r ; cat /etc/issue'
2.6.32-642.6.1.el6.x86_64
CentOS release 6.8 (Final)
Kernel \r on an \m

The ‘-L’ option of screen is used to record your session; the session log is automatically saved to a file named ‘screenlog.n’ in your current directory.

daniel@linubuvma:/tmp$ ls
config-err-hbzs5e  one          ssh-4yheApHRgMBF  ssh-RK7GpeFuzUB8  VMwareDnD    vmware-root-2347660412
gpg-kZux7q         screenlog.0  ssh-BBblvGtb5284  vmware-daniel     vmware-root
daniel@linubuvma:/tmp$ free -m
             total       used       free     shared    buffers     cached
Mem:          3946       2489       1457          6        547       1031
-/+ buffers/cache:        911       3035
Swap:         4092          0       4092
daniel@linubuvma:/tmp$ exit
[screen is terminating]
daniel@linubuvma:/tmp$ 

The whole bash session will be logged in screenlog.0 in this case –

daniel@linubuvma:/tmp$ cat screenlog.0 
daniel@linubuvma:/tmp$ ls
config-err-hbzs5e  one          ssh-4yheApHRgMBF  ssh-RK7GpeFuzUB8  VMwareDnD    vmware-root-2347660412
gpg-kZux7q         screenlog.0  ssh-BBblvGtb5284  vmware-daniel     vmware-root
daniel@linubuvma:/tmp$ free -m
             total       used       free     shared    buffers     cached
Mem:          3946       2489       1457          6        547       1031
-/+ buffers/cache:        911       3035
Swap:         4092          0       4092
daniel@linubuvma:/tmp$ exit
exit
daniel@linubuvma:/tmp$ 

Recording your session of an ssh connection to a remote host is similar – use the ‘-L’ option followed by the command to ssh to the remote host. Here I also pass:
Option -fn (flow-control off)
Option -t (title bar name), in this case ‘practice’.

daniel@linubuvma:/tmp$ screen -fn -t practice -L  ssh ns2
Last login: Tue Dec 27 09:46:10 2016 from linubuvma.home.net

[daniel@kauai ~]$ hostname -f
kauai.example.net
[daniel@kauai ~]$ uptime
 10:08:18 up 18 days, 10:02, 14 users,  load average: 0.19, 0.49, 0.64
[daniel@kauai ~]$ exit
[screen is terminating]


daniel@linubuvma:/tmp$ cat screenlog.0 
Last login: Tue Dec 27 09:46:10 2016 from linubuvma.home.net
[daniel@kauai ~]$ hostname -f
kauai.example.net
[daniel@kauai ~]$ uptime
 10:08:18 up 18 days, 10:02, 14 users,  load average: 0.19, 0.49, 0.64
[daniel@kauai ~]$ exit
logout
Connection to ns2 closed.
daniel@linubuvma:/tmp$ 
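
One caveat with ‘-L’ is that screenlog files capture raw terminal output, including ANSI escape sequences, so they can look noisy in a pager. A small Python sketch to strip the common CSI codes (my own helper; the regex covers typical color/cursor sequences, not every escape screen can emit):

```python
import re

# Strip common ANSI CSI escape sequences (colors, cursor movement)
# from a screenlog capture. Not exhaustive -- screen can emit other
# escapes -- but enough to make most logs readable.
ANSI_CSI = re.compile(r"\x1b\[[0-9;?]*[A-Za-z]")

def clean_screenlog(text):
    return ANSI_CSI.sub("", text)

raw = "\x1b[01;34mdaniel@linubuvma\x1b[00m:/tmp$ free -m"
print(clean_screenlog(raw))   # prompt line without the color codes
```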

Additional resources –

https://www.rackaid.com/blog/linux-screen-tutorial-and-how-to/
https://linux.die.net/man/1/screen

Randomly ordering files in a directory with python

I have a playlist file which contains audio files to play. The audio player unfortunately plays the music files sequentially, in whatever order they are listed in the playlist file, so occasionally I have to regenerate the playlist file to randomize the order of the audio files. Here is a simple script I wrote for this purpose; the core component is the random.shuffle(list) Python function –

Create script file as shuffle_files.py –

#!/usr/bin/env python

import os
import random
import sys

music_files=[]

if len(sys.argv) != 2:
  print "Usage:", sys.argv[0], "/path/directory"
  sys.exit(1)
else:
  dir_name=sys.argv[1]
  if os.path.isdir(dir_name):
    for file_name in os.listdir(dir_name):
      music_files.append(file_name)
  else:
    print "Directory", dir_name, "does not exist"
    sys.exit(1)
# shuffle list
random.shuffle(music_files)
for item in music_files:
  print os.path.join(dir_name,item)

Run the script by providing a path to a directory with files. Each iteration should list the files in the directory in a different order.
Note – the script does not recurse into the directories, it can be easily modified with os.walk if necessary.

root@svm1010:/home/daniel/scripts# python shuffle_files.py /opt/iotop/iotop
/opt/iotop/iotop/setup.py
/opt/iotop/iotop/README
/opt/iotop/iotop/iotop
/opt/iotop/iotop/iotop.8
/opt/iotop/iotop/NEWS
/opt/iotop/iotop/iotop.py
/opt/iotop/iotop/PKG-INFO
/opt/iotop/iotop/THANKS
/opt/iotop/iotop/sbin
/opt/iotop/iotop/setup.cfg
/opt/iotop/iotop/ChangeLog
/opt/iotop/iotop/.gitignore
/opt/iotop/iotop/COPYING


root@svm1010:/home/daniel/scripts# python shuffle_files.py /opt/iotop/iotop
/opt/iotop/iotop/PKG-INFO
/opt/iotop/iotop/COPYING
/opt/iotop/iotop/iotop
/opt/iotop/iotop/setup.cfg
/opt/iotop/iotop/NEWS
/opt/iotop/iotop/README
/opt/iotop/iotop/.gitignore
/opt/iotop/iotop/setup.py
/opt/iotop/iotop/THANKS
/opt/iotop/iotop/iotop.py
/opt/iotop/iotop/ChangeLog
/opt/iotop/iotop/iotop.8
/opt/iotop/iotop/sbin


root@svm1010:/home/daniel/scripts# python shuffle_files.py /opt/iotop/iotop
/opt/iotop/iotop/THANKS
/opt/iotop/iotop/setup.py
/opt/iotop/iotop/NEWS
/opt/iotop/iotop/README
/opt/iotop/iotop/iotop.8
/opt/iotop/iotop/.gitignore
/opt/iotop/iotop/ChangeLog
/opt/iotop/iotop/sbin
/opt/iotop/iotop/PKG-INFO
/opt/iotop/iotop/iotop
/opt/iotop/iotop/COPYING
/opt/iotop/iotop/iotop.py
/opt/iotop/iotop/setup.cfg
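
As noted above, the script does not recurse into subdirectories; a recursive variant using os.walk might look like this (a sketch extending the original script, written in Python 3 style):

```python
import os
import random

def shuffled_files(dir_name):
    """Collect files from dir_name and all subdirectories, shuffled."""
    collected = []
    # os.walk yields (directory, subdirectories, files) for the whole tree
    for root, dirs, files in os.walk(dir_name):
        for file_name in files:
            collected.append(os.path.join(root, file_name))
    random.shuffle(collected)
    return collected
```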

Reference – https://docs.python.org/2/library/random.html?highlight=shuffle#random.shuffle

Ngrep is a very user-friendly packet sniffer – basically the “grep” equivalent at the network layer.

Here is a quick way of seeing the plain-HTTP connections your browser makes even while you are visiting a secure site (browsers still make port 80 requests for things such as OCSP certificate checks). Make sure that is the only site you are visiting, as the command will capture all port 80 connections.

Installation –

apt-get install ngrep

Let us redirect all the traffic ngrep captures to a file –

ngrep -d any -W byline port 80 | tee  /tmp/net_output

Now visit a secure site, say https://cnet.com, and you will see nicely formatted output –

root@lindell:~# ngrep -d any -W byline port 80 | tee  /tmp/output
interface: any
filter: (ip or ip6) and ( port 80 )
####
T 17.31.198.19:33954 -> 72.21.91.29:80 [AP]
POST / HTTP/1.1.
Host: ocsp.digicert.com.
User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:50.0) Gecko/20100101 Firefox/50.0.
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8.
Accept-Language: en-US,en;q=0.5.
Accept-Encoding: gzip, deflate.
Content-Length: 83.
Content-Type: application/ocsp-request.
Connection: keep-alive.
..

From here, you can parse the /tmp/output file.

Similarly, you can parse the output file for the type of web server your favorite sites are using. Keep the ngrep command running, and visit all your favorite sites. Note that this works for http only; https traffic is encrypted, so for https only the destination IP and port are visible.

In this case, I searched for the ‘Server:’ field in the HTTP response headers from the web servers. Apparently nginx is the most popular; it is also interesting to see AmazonS3 storage being used for hosting static content –

root@lindell:~# awk '/Server:/ {print $2}' /tmp/output |sort | uniq -c |sort -nr
    155 nginx.
     40 Apache.
     36 Apache-Coyote/1.1.
     20 Apache/2.2.3
     14 nginx/1.8.1.
      7 AmazonS3.
      6 Akamai
      5 ECS
      5 cloudflare-nginx.
      4 Omniture
      4 ESF.
      3 sffe.
      3 nginx/1.10.2.
      2 Microsoft-IIS/7.5.
      2 gws.
      2 AkamaiGHost.
      1 WildFly/8.
      1 Varnish.
      1 openresty.
      1 NetDNA-cache/2.2.
      1 Cowboy.
      1 ATS.
      1 Apache/2.2.14
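
The same tally can be done in Python with collections.Counter, which is handy if you want to post-process the counts further (my own sketch; the sample string below stands in for the captured /tmp/output file):

```python
from collections import Counter

# Count 'Server:' response headers in an ngrep capture -- the Python
# equivalent of: awk '/Server:/ {print $2}' | sort | uniq -c | sort -nr
def server_counts(capture_text):
    servers = [
        line.split()[1]
        for line in capture_text.splitlines()
        if line.startswith("Server:") and len(line.split()) > 1
    ]
    return Counter(servers)

sample = "Server: nginx.\nDate: ...\nServer: Apache.\nServer: nginx.\n"
print(server_counts(sample).most_common())
```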

References –
http://ngrep.sourceforge.net/usage.html
https://wiki.christophchamp.com/index.php?title=Ngrep

In some cases, you might want to block all users from logging in to a system, or, just after you log in, prevent everyone else from connecting to the server. This can be helpful during server maintenance, or when only one actively logged-in user should do some work on a shared account.

Solution – create the /etc/nologin file, and put the text of the notice in the body of the file. If a user attempts to log in to a system where this file exists, the contents of the nologin file are displayed, and the login is terminated.

[root@kauai ~]# echo 'System is under maintenance till Dec. 24, 2PM EST.' > /etc/nologin

Now try to log in to the server as a non-superuser –

daniel@linubuvma:~$ ssh ns2
System is under maintenance till Dec. 24, 2PM EST.
Connection closed by 192.168.10.103

If your ssh configuration allows it, the root user can still log in to the server, though root will also be greeted with the contents of the /etc/nologin file –

daniel@linubuvma:~$ ssh root@ns2
root@ns2's password:
System is under maintenance till Dec. 24, 2PM EST.
Last login: Sat Dec 12 01:11:35 2015 from linubuvma.home.net
[root@kauai ~]# 
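
If you do this often, a small context-manager sketch can make sure the lock file is removed when maintenance ends. This is a hypothetical helper of my own; the path is parameterized so the example can run against a scratch file instead of the real /etc/nologin, which requires root:

```python
import os
from contextlib import contextmanager

@contextmanager
def maintenance_mode(notice, path="/etc/nologin"):
    """Create a nologin file for the duration of a with-block.

    Writing to the real /etc/nologin requires root; pass a scratch
    path to try the helper out as a regular user.
    """
    with open(path, "w") as f:
        f.write(notice + "\n")
    try:
        yield path
    finally:
        os.remove(path)   # lift the login block even if work fails
```

Usage: `with maintenance_mode('System is under maintenance till Dec. 24, 2PM EST.'): do_maintenance()` – logins are blocked only inside the with-block.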

Reference – https://docs.oracle.com/cd/E19683-01/806-4078/6jd6cjs3v/index.html

During user login, a Linux box might show the message of the day (motd), new email, or package update information. This is particularly common on Ubuntu boxes. In some cases, you want to prevent all these messages from being displayed, for instance because they delay your login.

Solution – Create a file named .hushlogin in the user’s home directory.

A typical login to an Ubuntu box might look like this –

[daniel@kauai etc]$ ssh practice
daniel@practice's password: 
Welcome to Ubuntu 14.04.1 LTS (GNU/Linux 3.13.0-39-generic x86_64)

 * Documentation:  https://help.ubuntu.com/

  System information as of Sat Jan 10 11:37:24 EST 2015

  System load:  0.0                Processes:           290
  Usage of /:   46.1% of 45.15GB   Users logged in:     1
  Memory usage: 13%                IP address for eth0: 192.168.10.206
  Swap usage:   0%

  Graph this data and manage this system at:
    https://landscape.canonical.com/

168 packages can be updated.
63 updates are security updates.

You have new mail.
Last login: Sat Jan 10 11:37:26 2015 from linux.local

To suppress all this information, create a .hushlogin file in the user’s home directory, then log out and log back in –

daniel@linubuvma:~$ touch ~/.hushlogin

daniel@linubuvma:~$ exit
logout
Connection to practice closed.

[daniel@kauai etc]$ ssh practice
daniel@practice's password: 

daniel@linubuvma:~$ 

How to interact with web services.

Curl is the de facto CLI tool for interacting with web services, as well as non-HTTP services such as FTP or LDAP. Linux and Unix system administrators as well as developers love it for its ease of use and debugging capabilities. When you want to interact with web services from within scripts, curl is the number one choice. For downloading files from the web, wget is commonly used as well, but curl can do way more.

Since enough has been written about curl, this post is about a tool that makes interaction with web services a lot more human friendly, with nicely formatted and colored output – httpie. It is written in Python.

Installation

apt-get  install httpie     #(Debian/Ubuntu)
yum install httpie          #(Redhat/CentOS)

Note – although the package name is httpie, the binary file is installed as http.

When troubleshooting web services, the first thing we usually check is the http request and response headers –

daniel@lindell:/$ http -p hH  httpbin.org
GET / HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: httpbin.org
User-Agent: HTTPie/0.9.2

HTTP/1.1 200 OK
Access-Control-Allow-Credentials: true
Access-Control-Allow-Origin: *
Connection: keep-alive
Content-Length: 12150
Content-Type: text/html; charset=utf-8
Date: Thu, 22 Dec 2016 01:32:13 GMT
Server: nginx

Here -H prints the request headers and -h the response headers. Similarly, -B is for the request body and -b is for the response body.

We can also pass more complex HTTP headers, in this case “If-Modified-Since”; the web server will return 304 if the static content I am requesting has not been modified since the given date. Moving the date a few years back, it responds with a 200 status code.

daniel@lindell:/$ http -p hH http://linuxfreelancer.com/wp-content/themes/soulvision/images/texture.jpg "If-Modified-Since: Wed, 21 Dec 2016 20:51:14 GMT"
GET /wp-content/themes/soulvision/images/texture.jpg HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: linuxfreelancer.com
If-Modified-Since:  Wed, 21 Dec 2016 20:51:14 GMT
User-Agent: HTTPie/0.9.2

HTTP/1.1 304 Not Modified
Connection: Keep-Alive
Date: Thu, 22 Dec 2016 01:39:28 GMT
ETag: "34441c-f04-4858fcd6af900"
Keep-Alive: timeout=15, max=100
Server: Apache/2.2.14 (Ubuntu)

daniel@lindell:/$ http -p hH http://linuxfreelancer.com/wp-content/themes/soulvision/images/texture.jpg "If-Modified-Since: Wed, 21 Dec 2008 20:51:14 GMT"
GET /wp-content/themes/soulvision/images/texture.jpg HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: linuxfreelancer.com
If-Modified-Since:  Wed, 21 Dec 2008 20:51:14 GMT
User-Agent: HTTPie/0.9.2

HTTP/1.1 200 OK
Accept-Ranges: bytes
Connection: Keep-Alive
Content-Length: 3844
Content-Type: image/jpeg
Date: Thu, 22 Dec 2016 01:39:37 GMT
ETag: "34441c-f04-4858fcd6af900"
Keep-Alive: timeout=15, max=100
Last-Modified: Sat, 01 May 2010 22:23:00 GMT
Server: Apache/2.2.14 (Ubuntu)
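
The If-Modified-Since value must be an RFC 1123 HTTP-date; if you want to generate one programmatically, Python’s email.utils.formatdate with usegmt=True produces exactly that format (my own sketch, unrelated to httpie itself):

```python
from email.utils import formatdate, parsedate_to_datetime

# Build an RFC 1123 HTTP-date like 'Wed, 21 Dec 2016 20:51:14 GMT'
# from a Unix timestamp, suitable for an If-Modified-Since header.
def http_date(epoch_seconds):
    return formatdate(epoch_seconds, usegmt=True)

stamp = http_date(1482353474)
print(stamp)
# Round-trip it back to a timestamp to confirm the format parses
print(int(parsedate_to_datetime(stamp).timestamp()))
```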

httpie also makes passing JSON as well as using the POST/PUT methods a lot easier. There is no need to format your payload as JSON; it defaults to JSON. Debugging is easier too with the -v option, which shows the raw wire data –

daniel@lindell:/$ http -v PUT httpbin.org/put name=JoeDoe email=joedoe@gatech.edu
PUT /put HTTP/1.1
Accept: application/json
Accept-Encoding: gzip, deflate
Connection: keep-alive
Content-Length: 48
Content-Type: application/json
Host: httpbin.org
User-Agent: HTTPie/0.9.2

{
    "email": "joedoe@gatech.edu", 
    "name": "JoeDoe"
}

HTTP/1.1 200 OK
Access-Control-Allow-Credentials: true
Access-Control-Allow-Origin: *
Connection: keep-alive
Content-Length: 487
Content-Type: application/json
Date: Thu, 22 Dec 2016 01:44:20 GMT
Server: nginx

{
    "args": {}, 
    "data": "{\"name\": \"JoeDoe\", \"email\": \"joedoe@gatech.edu\"}", 
    "files": {}, 
    "form": {}, 
    "headers": {
        "Accept": "application/json", 
        "Accept-Encoding": "gzip, deflate", 
        "Content-Length": "48", 
        "Content-Type": "application/json", 
        "Host": "httpbin.org", 
        "User-Agent": "HTTPie/0.9.2"
    }, 
    "json": {
        "email": "joedoe@gatech.edu", 
        "name": "JoeDoe"
    }, 
    "origin": "192.1.1.2", 
    "url": "http://httpbin.org/put"
}
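
Under the hood, httpie’s key=value pairs become a JSON object; the equivalent in plain Python is a json.dumps call plus matching headers. The sketch below shows the encoding step only – no request is sent:

```python
import json

# What 'http PUT httpbin.org/put name=JoeDoe email=...' puts on the
# wire: a JSON body plus headers. Only the encoding step is shown.
payload = {"name": "JoeDoe", "email": "joedoe@gatech.edu"}
body = json.dumps(payload)
headers = {
    "Accept": "application/json",
    "Content-Type": "application/json",
    "Content-Length": str(len(body)),
}
print(body)
print(headers["Content-Length"])   # matches the 48 bytes seen above
```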

I have only touched the surface of httpie here; please feel free to get more detailed information from the GitHub repo. It has built-in JSON support, form/file upload, HTTPS, proxies and authentication, custom headers, persistent sessions, etc.

Article on wget and curl from previous post.

In this series of Docker tutorials, I will walk you through hands-on experimentation with Docker. The operating system I am working on is Ubuntu 16.04.

Docker is a containerization technology which allows deployment of applications in containers. Its advantage is speed: a Docker container hosting an application can be up and running in a few milliseconds.

As opposed to virtual machines, containers run on top of the host OS and share the host kernel. Thus you can only run a Linux container on a Linux host or machine.

Docker installation – use this link for instructions on how to install Docker.

Installation limitation – Docker runs on 64-bit OSes only and requires Linux kernel version 3.10 or above. You can verify both using the commands below –

root@lindell:~# arch
x86_64
root@lindell:~# uname -r
4.4.0-47-generic

Docker – terminology

    Images – the building blocks of Docker. Once created or built, they can be shared, updated and used to launch containers. No image, no containers.

    Containers – images in action. Containers give images life; a container is an image plus the ecosystem the operating system needs to run the application.

    Registry – where images are stored. Registries can be public or private; Docker Hub is a typical example of a public registry.

    Data volumes – persistent storage used by containers.

    Dockerfile – a file containing instructions to be read by Docker for building a Docker image.

    Node – a physical or virtual machine running the Docker engine.

Our first Docker container
After installing Docker and making sure the Docker engine is running, run the commands below to check the available Docker images (‘docker images’) and whether any Docker containers are running (‘docker ps’). Both commands should return no results on a first-time installation.

root@lindell:~# docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
root@lindell:~# docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED
             STATUS              PORTS               NAMES

The next step is to get a Docker image from Docker Hub. For security reasons, we are going to use only official images –

root@lindell:~# docker search --filter=is-official=true ubuntu
NAME                 DESCRIPTION                                     STARS     OFFICIAL   AUTOMATED
ubuntu               Ubuntu is a Debian-based Linux operating s...   5238      [OK]
ubuntu-upstart       Upstart is an event-based replacement for ...   69        [OK]
ubuntu-debootstrap   debootstrap --variant=minbase --components...   27        [OK]


root@lindell:~# docker run -ti ubuntu /bin/bash
Unable to find image 'ubuntu:latest' locally
latest: Pulling from library/ubuntu

b3e1c725a85f: Pull complete
4daad8bdde31: Pull complete
63fe8c0068a8: Pull complete
4a70713c436f: Pull complete
bd842a2105a8: Pull complete
Digest: sha256:7a64bc9c8843b0a8c8b8a7e4715b7615e4e1b0d8ca3c7e7a76ec8250899c397a
Status: Downloaded newer image for ubuntu:latest

root@d1b13e2c3d3f:/# docker images
bash: docker: command not found

root@d1b13e2c3d3f:/# hostname -f
d1b13e2c3d3f

root@d1b13e2c3d3f:/# uname -r
4.4.0-47-generic

We just downloaded an official Ubuntu image and started an Ubuntu container by running /bin/bash inside the newly started container. The ‘-ti’ option runs bash interactively (-i) by allocating a pseudo-TTY (-t).

Note that the kernel version in the container is the same as the host’s kernel version. On the first run, Docker tries to find the Ubuntu image in local storage; if it can’t find it, it downloads it from Docker Hub. On subsequent runs, starting containers is much faster.

If we check the images and processes running now –

root@lindell:~# docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
ubuntu              latest              104bec311bcd        5 days ago          129 MB
root@lindell:~# docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED              STATUS              PORTS               NAMES
d1b13e2c3d3f        ubuntu              "/bin/bash"         About a minute ago   Up About a minute              

At this point, if we exit from the container, docker ps will no longer show the container as it has been terminated. We use ‘docker ps -a’ instead to view it; after starting it again with ‘docker start’, we can get a shell in it with ‘docker exec’ –

root@d1b13e2c3d3f:/# exit
exit


root@lindell:~# docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES

root@lindell:~# docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS                     PORTS               NAMES
d1b13e2c3d3f        ubuntu              "/bin/bash"         5 minutes ago       Exited (0) 5 seconds ago

root@lindell:~# docker exec -ti d1b13 /bin/bash
root@d1b13e2c3d3f:/# uptime
 01:39:46 up  1:18,  0 users,  load average: 0.39, 0.39, 0.37
root@d1b13e2c3d3f:/# 
               

In Part 2 of this quick introduction to Docker, we will walk through using a Dockerfile to automate image creation. We will see how quickly we can go from development to deployment.

You might also find helpful some of the questions I have answered about Docker on Stack Overflow.

Splunk offers a free version with a 500 MB per day indexing limit, meaning you can only add 500 MB of new data for indexing per day. This might work for most home users; the only problem is that the first time you install Splunk, you might configure it to ingest your existing log files, which are most likely above 500 MB if you consolidate your logs on a syslog server as I do. In that case, Splunk will stop indexing any data above 500 MB per day. During first-time indexing, make sure your existing data or log files are below this limit. If for some reason you ask Splunk to ingest far more than 500 MB of data and want to start fresh, run the following command to clean up the data –

 splunk  clean eventdata 

You can find the details on Splunk Free on this link.
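
To avoid tripping the limit in the first place, you can total the size of the files you are about to index before pointing Splunk at them. A hypothetical helper of my own; it takes 500 MB as 500 * 1024 * 1024 bytes, which may not match Splunk’s exact accounting:

```python
import os

DAILY_LIMIT = 500 * 1024 * 1024  # Splunk Free indexing limit, in bytes

def total_size(paths):
    """Sum the sizes of the given log files, in bytes."""
    return sum(os.path.getsize(p) for p in paths)

def under_limit(paths, limit=DAILY_LIMIT):
    """True if the files fit inside one day's indexing allowance."""
    return total_size(paths) <= limit
```

Run it over your syslog directory first, and only hand Splunk a set of files that comes in under the limit.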

Here is the series of commands I had to execute to clean up the event data –

[daniel@localhost]$ pwd 
/opt/splunk/bin
[daniel@localhost]$ sudo -H -u splunk ./splunk  clean eventdata
In order to clean, Splunkd must not be running.

[daniel@localhost bin]$ sudo -H -u splunk /opt/splunk/bin/splunk stop
Stopping splunkd...
Shutting down.  Please wait, as this may take a few minutes.
..                                                         [  OK  ]
Stopping splunk helpers...
                                                           [  OK  ]
Done.

[daniel@localhost bin]$ sudo -H -u splunk ./splunk  clean eventdata
This action will permanently erase all events from ALL indexes; it cannot be undone.
Are you sure you want to continue [y/n]? y
Cleaning database _audit.
Cleaning database _blocksignature.
Cleaning database _internal.
Cleaning database _introspection.
Cleaning database _thefishbucket.
Cleaning database history.
Cleaning database main.
Cleaning database summary.
Disabled database 'splunklogger': will not clean.

[daniel@localhost bin]$ sudo -H -u splunk /opt/splunk/bin/splunk start
Checking prerequisites...
	Checking http port [8000]: open
	Checking mgmt port [8089]: open
	Checking appserver port [127.0.0.1:8065]: open
	Checking kvstore port [8191]: open
	Checking configuration...  Done.
	Checking critical directories...	Done
	Checking indexes...
		Validated: _audit _blocksignature _internal _introspection _thefishbucket history main summary
	Done
	Checking filesystem compatibility...  Done
	Checking conf files for problems...
	Done
All preliminary checks passed.

Starting splunk server daemon (splunkd)...  
Done
                                                           [  OK  ]

Waiting for web server at https://127.0.0.1:8000 to be available.. Done


If you get stuck, we're here to help.  
Look for answers here: http://docs.splunk.com

The Splunk web interface is at https://localhost:8000