The shell has environment variables that determine its behavior, and exported environment variables are also a popular way of changing an application's behavior. Variable definitions can be loaded, or ‘sourced’, using the source builtin command or its ‘.’ notation. In this post, I will share a particular problem I encountered while sourcing environment variables saved in a .envrc file.

Problem – sourcing .envrc was not loading the right environment variables (the same happens with ‘. .envrc’). Renaming .envrc to any other file name works though.

[daniel@kauai tmp]$ cat .envrc 
NAME='Jhon Doe'
[daniel@kauai tmp]$ source .envrc 
[daniel@kauai tmp]$ echo $NAME
Alice Bob

As you can see, the variable NAME was set to ‘Jhon Doe’ and yet after sourcing .envrc, NAME is showing ‘Alice Bob’! Renaming the file seems to resolve the issue –

[daniel@kauai tmp]$ source .envrcs 
[daniel@kauai tmp]$ echo $NAME
Jhon Doe

Troubleshooting using strace – I followed the tips in ‘is-it-possible-to-strace-the-builtin-commands-to-bash’ to strace ‘source’. Stracing shell builtins is not straightforward, as they run inside the shell process rather than as separate programs. After looking at the output, I found that the source builtin was actually reading .envrc from a different directory, not my current working directory! It was sourcing the file from one of the directories in the $PATH environment variable.
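
One way to approximate it (a sketch, since source runs inside the shell process itself) is to trace a fresh shell that does the sourcing, and filter for the file name to see which path actually gets opened –

[daniel@kauai tmp]$ strace -f -e trace=open bash -c 'source .envrc' 2>&1 | grep envrc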

Read the man pages – Looking at the man page for bash, under the section for the source builtin –

source filename [arguments]
Read and execute commands from filename in the current shell environment and return the exit status of the last command executed from filename. If filename does not contain a slash, file names in PATH are used to find the directory containing filename. The file searched for in PATH need not be executable. When bash is not in posix mode, the current directory is searched if no file is found in PATH. If the sourcepath option to the shopt builtin command is turned off, the PATH is not searched. If any arguments are supplied, they become the positional parameters when filename is executed. Otherwise the positional parameters are unchanged. The return status is the status of the last command exited within the script (0 if no commands are executed), and false if filename is not found or cannot be read.

Apparently this is expected behavior. If I didn't have a .envrc file in one of the $PATH directories, this would never have surfaced. In this case, there are several solutions –

1. Remove .envrc from the $PATH directories [not the best option]
2. Rename .envrc to a different file name [not ideal either]
3. When sourcing the file, use an absolute path [good practice]

[daniel@kauai tmp]$ echo $NAME
Alice Bob
[daniel@kauai tmp]$ pwd
/tmp
[daniel@kauai tmp]$ source /tmp/.envrc 
[daniel@kauai tmp]$ echo $NAME
Jhon Doe

4. When sourcing the file, include a slash (‘/’) in the path, for instance – source ./.envrc [good practice]
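
For example (assuming the same .envrc as above) –

[daniel@kauai tmp]$ source ./.envrc
[daniel@kauai tmp]$ echo $NAME
Jhon Doe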
5. Disable the sourcepath shell option [not a bad idea]

[daniel@kauai tmp]$ shopt sourcepath
sourcepath     	on
[daniel@kauai tmp]$ shopt -u sourcepath
[daniel@kauai tmp]$ source .envrc 
[daniel@kauai tmp]$ echo $NAME
Jhon Doe

Bottom line – Make sure you understand how environment variable sourcing works in bash and follow good practices, so that you won't waste precious hours trying to figure out why your script is behaving strangely.

Sooner or later, you will find yourself adding sensitive data into Ansible playbooks or host or group vars files. Such information might include MySQL DB credentials, AWS secret keys, API credentials, etc. Keeping such sensitive information in plain text might not be acceptable for security compliance reasons, and could even lead to your systems being owned when your company hires a third party to do pen testing, or worse yet, by outside hackers. In addition, sharing such playbooks in public repositories such as GitHub won't be easy, as you would have to manually search and redact all the sensitive information from all your playbooks, and as we know, manual procedures are error prone. You might ‘forget’ to remove some of the passwords.

One solution for this is a password vault to hold all your sensitive data. Ansible provides a utility called ansible-vault to create this encrypted file, and the data can be decrypted when running your playbooks with a single option. This is the equivalent of Chef's data bag.

In this blog post, I will share with you how to use a secret key file to protect sensitive data in Ansible with the ansible-vault utility. The simplest use case is to protect the encrypted file with a password or passphrase, but that is not convenient, as you have to type the password every time you run a playbook, nor is it as strong as a key file containing hundreds or thousands of random characters. Thus the steps below describe only the procedure for setting up a secret key file, rather than a password-protected encrypted file. Let us get started.

The first step is to generate a key file containing a long random string of characters –

#openssl rand -base64 512 |xargs > /opt/ansible/vaultkey

Create or initialize the vault with the key file generated above –

#ansible-vault create --vault-password-file=/opt/ansible/vaultkey /opt/ansible/lamp/group_vars/dbservers.yml

Populate your vault; refer to the Ansible documentation on the format of the vault file –

#ansible-vault edit --vault-password-file=/opt/ansible/vaultkey /opt/ansible/lamp/group_vars/dbservers.yml

You can view the contents by replacing ‘edit’ with ‘view’ –

#ansible-vault view --vault-password-file=/opt/ansible/vaultkey /opt/ansible/lamp/group_vars/dbservers.yml
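
For illustration, the view sub-command prints the decrypted YAML; with hypothetical variable names and values, the output might look like –

mysql_root_password: 'S3cr3tRootPass'
mysql_repl_password: 'S3cr3tReplPass'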

That is it – you now have a secret key file that protects an encrypted YAML file holding all the sensitive variables used in your Ansible playbooks.

There comes a time, though, when you have to change the secret key file – say, an admin leaves the company after winning the Mega jackpot lottery 🙂 We have to generate a new key file and rekey the encrypted file as soon as possible –

Generate a new key file –

#openssl rand -base64 512 |xargs > /opt/ansible/vaultkey.new

Rekey to new key file –

#ansible-vault rekey --new-vault-password-file=/opt/ansible/vaultkey.new --vault-password-file=/opt/ansible/vaultkey /opt/ansible/lamp/group_vars/dbservers.yml
Rekey successful

Verify –

#ansible-vault view --vault-password-file=/opt/ansible/vaultkey.new /opt/ansible/lamp/group_vars/dbservers.yml

Last but not least, make sure the secret key file is well protected and is readable only by the owner.

#chmod 600 /opt/ansible/vaultkey.new

Finally, you can use the vault with ansible-playbook. In this case, I am running it against site.yml, a master playbook that sets up a LAMP cluster in AWS (pulling the AWS instances with the ec2.py dynamic inventory script) –

#ansible-playbook -i /usr/local/bin/ec2.py site.yml --vault-password-file /opt/ansible/vaultkey.new
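
Inside the playbook, the encrypted variables are then referenced like any other group variable; a hypothetical task from the dbservers role –

- name: set the MySQL root password
  mysql_user: name=root password={{ mysql_root_password }} state=present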

Web sites store information on the local machines of site visitors using cookies. On subsequent visits, the browser sends the data from the cookies on the visitor's machine back to the web server, which might then use that information as a historical record of the user's activity on the site – at a minimum, the time the cookie was created, when it is set to expire, and the last time the user visited the site. Cookies are also used by sites to ‘remember’ user activity, say the shopping cart items or login/session information, to address the shortcomings of the stateless HTTP protocol.

Most users think that only the sites they have directly visited store cookies on their computers; in reality the number is way higher than that. A single site you visit usually has lots of links in it, especially ads, that store cookies on your computer. In this post, I will demonstrate how to list all the sites that left cookies on your computer, as well as extract additional information from the cookies. When I ran the script and counted the top 10 sites with the largest number of entries in the cookies sqlite DB, all but one or two of them were sites I had never directly visited!

This Python script was written to extract cookie information on a Linux box running Firefox. The cookie data is stored as a sqlite file, and thus you will need the sqlite3 Python module to read it.

The script takes the path to the cookies file as well as the path to an output file; it writes its results to that file and also dumps them to the screen.

root@dnetbook:/home/daniel/python# python cookie_viewer.py 
cookie_viewer.py cookie-fullpath output-file

root@dnetbook:/home/daniel/python# python /home/daniel/python/cookie_viewer.py $(find /home/daniel/ -type f -name 'cookies.sqlite' | head -1) /tmp/test.txt
doubleclick.net,Thu Feb 11 17:56:01 2016,Thu Apr 23 20:46:58 2015,Tue Feb 11 17:56:01 2014
twitter.com,Thu Feb 11 17:56:05 2016,Tue Apr 21 22:27:46 2015,Tue Feb 11 17:56:05 2014
imrworldwide.com,Thu Feb 11 17:56:12 2016,Tue Apr 21 22:19:35 2015,Tue Feb 11 17:56:12 2014
quantserve.com,Thu Aug 13 19:32:02 2015,Thu Apr 23 20:46:57 2015,Tue Feb 11 18:32:0

The output shows the domain name of the site, the cookie expiry date, the last access time and the creation time.
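
The script reads everything from the moz_cookies table; if you have the sqlite3 command-line tool installed, you can peek at the raw rows (epoch timestamps and all) directly –

root@dnetbook:/home/daniel/python# sqlite3 cookies.sqlite 'select baseDomain, expiry, lastAccessed, creationTime from moz_cookies limit 3;'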

Code follows –

#!/usr/bin/env python

''' Given a location to firefox cookie sqlite file
    Write its date param - expiry, last accessed,
    Creation time to a file in plain text.
    id
    baseDomain
    appId
    inBrowserElement
    name
    value
    host
    path
    expiry
    lastAccessed
    creationTime
    isSecure
    isHttpOnly
    python /home/daniel/python/cookie_viewer.py $(find /home/daniel/ -type f -name 'cookies.sqlite' | head -1) /tmp/test.txt 
'''

import sys
import os
from datetime import datetime
import sqlite3

def Usage():
    print "{0} cookie-fullpath output-file".format(sys.argv[0])
    sys.exit(1)

if len(sys.argv)<3:
    Usage()

sqldb=sys.argv[1]
destfile=sys.argv[2]
# Some dates in the cookies file might not be valid, or too big
MAXDATE=2049840000

# cookies file must be there, most often file name is cookies.sqlite
if not os.path.isfile(sqldb):
    Usage()

# a hack - to convert the epoch times to human readable format
def convert(epoch):
    mydate=epoch[:10]
    if int(mydate)>MAXDATE:
        mydate=str(MAXDATE)
    if len(epoch)>10:
        mytime=epoch[10:]   # digits after the first 10 are the fractional (sub-second) part
    else:
        mytime='0'
    fulldate=float(mydate+'.'+mytime)
    x=datetime.fromtimestamp(fulldate)
    return x.ctime()

# Bind to the sqlite db and execute sql statements
conn=sqlite3.connect(sqldb)
cur=conn.cursor()
try:
    data=cur.execute('select * from moz_cookies')
except sqlite3.Error, e:
    print 'Error {0}:'.format(e.args[0])
    sys.exit(1)
mydata=data.fetchall()

# Dump results to a file
with open(destfile, 'w') as fp:
    for item in mydata:
        urlname=item[1]
        expiry=convert(str(item[8]))
        accessed=convert(str(item[9]))
        created=convert(str(item[10]))
        fp.writelines(urlname + ',' + expiry + ',' + accessed + ',' + created)
        fp.writelines('\n')

# Dump to stdout as well
with open(destfile) as fp:
    for line in fp:
        print line

Top 10 sites with the highest number of entries in the cookies file –

root@dnetbook:/home/daniel/python# awk -F, '{print $1}' /tmp/test.txt  | sort | uniq -c | sort -nr | head -10
     73 taboola.com
     59 techrepublic.com
     43 insightexpressai.com
     34 pubmatic.com
     33 2o7.net
     31 rubiconproject.com
     28 demdex.net
     27 chango.com
     26 yahoo.com
     26 optimizely.com

In Python, you can read from and write to files without importing any modules. Python has the built-in function “open”, which returns a file object that can be used to read and manipulate files. Let us see two ways of opening a file for reading/writing –

   fp_in = open('/etc/hosts', 'r')  # default is 'r', we can omit it.
   fp_out = open('/tmp/hosts', 'w')
   for line in fp_in:
       fp_out.write(line)

   fp_in.close()
   fp_out.close()


   with open('/etc/hosts') as fp_in:
       with open('/tmp/hosts', 'w') as fp_out:
           for line in fp_in:
               fp_out.write(line)
   # No need to close the files; they are automatically closed at the end of the block.

The most common reason given for closing the file object in the first case is to free up resources. But there is a second reason why you should always use the ‘with’ keyword: after writing to a file object, and before closing it, the whole content from the source file might not appear in the destination file yet. This is because write uses buffering, and the changes will not be reflected on disk until you run flush() or close() on the file object. Here is the help page for ‘write’ –

write(...)
    write(str) -> None.  Write string str to file.
    
    Note that due to buffering, flush() or close() may be needed before
    the file on disk reflects the data written.

Let me demonstrate this by copying the /var/log/messages file to /tmp/messages; the bigger the file, the more likely you are to witness the effect of buffering. First I will take a copy of /var/log/messages to /var/log/messages.orig, and work with messages.orig, as the former will most likely change in size as I work along.

[root@kauai ~]# wc -l /var/log/messages.orig 
10544 /var/log/messages.orig
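
The copy was done from an interactive Python session along these lines (a sketch; the session is deliberately left open, so the tail of the write buffer has not been flushed yet) –

[root@kauai ~]# python
>>> fp_in = open('/var/log/messages.orig')
>>> fp_out = open('/tmp/messages', 'w')
>>> for line in fp_in:
...     fp_out.write(line)
...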

[root@kauai ~]# wc -l /tmp/messages 
10542 /tmp/messages
[root@kauai ~]# tail -1 /tmp/messages 
Nov 16 02:36:02 kauai syslog-ng[1605]: Log statistics; processed='src.internal(s_sys[root@kauai ~]# 

[root@kauai ~]# tail -1 /var/log/messages
Nov 16 02:46:02 kauai syslog-ng[1605]: Log statistics; processed='src.internal(s_sys#2)=1787', stamp='src.internal(s_sys#2)=1416123362', processed='source(s_name_servers)=0', processed='destination(d_mesg)=7693', processed='destination(d_auth)=210', processed='source(s_sys)=12643', processed='global(payload_reallocs)=3568', processed='destination(d_mail)=12', processed='destination(d_kern)=5176', processed='destination(d_mlal)=0', processed='destination(d_ns_filtered)=0', processed='global(msg_clones)=0', processed='destination(d_spol)=0', processed='destination(hosts)=12643', processed='destination(d_boot)=0', processed='global(sdata_updates)=0', processed='center(received)=0', processed='destination(d_cron)=3653', processed='center(queued)=0'

Notice how the destination file /tmp/messages is truncated – its last line is cut off mid-way, without even a newline character at the end.

Now close the file object in the same Python session, which flushes the remaining buffer to disk –

>>> fp_out.close()

[root@kauai ~]# wc -l /tmp/messages 
10544 /tmp/messages

[root@kauai ~]# tail -1 /var/log/messages
Nov 16 02:56:02 kauai syslog-ng[1605]: Log statistics; processed='src.internal(s_sys#2)=1788', stamp='src.internal(s_sys#2)=1416123962', processed='source(s_name_servers)=0', processed='destination(d_mesg)=7694', processed='destination(d_auth)=211', processed='source(s_sys)=12646', processed='global(payload_reallocs)=3570', processed='destination(d_mail)=12', processed='destination(d_kern)=5176', processed='destination(d_mlal)=0', processed='destination(d_ns_filtered)=0', processed='global(msg_clones)=0', processed='destination(d_spol)=0', processed='destination(hosts)=12646', processed='destination(d_boot)=0', processed='global(sdata_updates)=0', processed='center(received)=0', processed='destination(d_cron)=3654', processed='center(queued)=0'

This problem would not have happened if we had used the ‘with’ keyword, as it automatically does the flush() and close() for us at the end of the block statement –

    with open('/var/log/messages.orig') as fp_in:
        with open('/tmp/messages','w') as fp_out:
            for line in fp_in:
                fp_out.write(line)

[root@kauai ~]# wc -l /var/log/messages.orig 
10544 /var/log/messages.orig
[root@kauai ~]# wc -l /tmp/messages 
10544 /tmp/messages

There you go – the source and destination files are in sync immediately.

This script is based on the list of U.S. federal holidays found in Wikipedia – U.S. Federal holidays. Some of the dates, such as New Year's Day, are straightforward, as the day and month are fixed. Others require some effort – take for instance Thanksgiving, which falls on the fourth Thursday of November, or Memorial Day, the last Monday of May.

The script is written in bash, and has only been tested on a 32-bit Ubuntu netbook. It will exit with an error message if you try to get the holidays for the year 2038 or above; this is a known issue with UNIX dates on 32-bit operating systems – UNIX: Year 2038 problem.
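
The floating holidays are located with date's %u format specifier, which prints the ISO day of the week (1 = Monday through 7 = Sunday); for example, verifying that Thanksgiving 2014, the fourth Thursday of November, really is a Thursday –

daniel@dnetbook:~$ date '+%u' -d "20141127"
4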

Sample output

daniel@dnetbook:~$ /usr/local/bin/federalholidays.sh
Usage: federalholidays.sh Year
Eg. federalholidays.sh 2014


daniel@linubuvma:~$ ./federalholidays.sh 1500
New Year's Day:               Monday, January 01, 1500
Martin Luther King, Jr. Day:  Monday, January 15, 1500
Washington's Birthday:        Monday, February 19, 1500
Memorial Day:                 Monday, May 28, 1500
Independence Day:             Wednesday, July 04, 1500
Labor Day:                    Monday, September 03, 1500
Columbus Day:                 Monday, October 08, 1500
Veteran's Day:                Sunday, November 11, 1500
Thanksgiving:                 Thursday, November 22, 1500
Christmas Day:                Tuesday, December 25, 1500


daniel@linubuvma:~$ ./federalholidays.sh 2014
New Year's Day:               Wednesday, January 01, 2014
Martin Luther King, Jr. Day:  Monday, January 20, 2014
Washington's Birthday:        Monday, February 17, 2014
Memorial Day:                 Monday, May 26, 2014
Independence Day:             Friday, July 04, 2014
Labor Day:                    Monday, September 01, 2014
Columbus Day:                 Monday, October 13, 2014
Veteran's Day:                Tuesday, November 11, 2014
Thanksgiving:                 Thursday, November 27, 2014
Christmas Day:                Thursday, December 25, 2014


daniel@linubuvma:~$ ./federalholidays.sh 2500
New Year's Day:               Friday, January 01, 2500
Martin Luther King, Jr. Day:  Monday, January 18, 2500
Washington's Birthday:        Monday, February 15, 2500
Memorial Day:                 Monday, May 31, 2500
Independence Day:             Sunday, July 04, 2500
Labor Day:                    Monday, September 06, 2500
Columbus Day:                 Monday, October 11, 2500
Veteran's Day:                Thursday, November 11, 2500
Thanksgiving:                 Thursday, November 25, 2500
Christmas Day:                Saturday, December 25, 2500

Here is the whole script; feel free to modify it or report any problems –


#!/bin/bash

ARCH=$(arch)
ARGC=$#

function Usage
{

echo "Usage: $(basename $0) Year"
echo "Eg. $(basename $0) 2014"
exit 1

}

# we will need the year as argument in YYYY format
[[ $ARGC -ne 1 ]] &&  Usage

myyear="$1"
dformat='+%A, %B %d, %Y'

[[ "$myyear" -ge 2038 ]] && [[ "$ARCH" = "i686" ]] && echo 'Year 2038 problem : http://en.wikipedia.org/wiki/Year_2038_problem ' && exit 1

#We will ignore any year below 1902
[[ "$myyear" -lt 1902 ]] && [[ "$ARCH" = "i686" ]] && exit 1

##Function to get the nth day week of the month, for instance, Third Monday of March.

function nth_xday_of_month
{

my_nth=$1
my_xday=$2
my_month=$3
my_year=$4

case "$my_nth" in

1)  mydate=$(echo {01..07})
    ;;
2)  mydate=$(echo {08..14})
    ;;
3)  mydate=$(seq 15 21)
    ;;
4)  mydate=$(seq 22 28)
   ;;
5)  mydate=$(seq 29 31)
    ;;
*) echo "Error: week number must be between 1 and 5"
   exit 1
   ;;
esac


for x in $mydate; do
  nthday=$(date '+%u' -d "${my_year}${my_month}${x}")
  if [ "$nthday" -eq "$my_xday" ]; then
   date "${dformat}" -d "${my_year}${my_month}${x}"
  fi
done
}


##Memorial day - Last Monday of May.

for x in {31..01}; do y=$(date '+%u' -d "${myyear}05${x}"); if [ "$y" -eq 1 ]; then memday="${x}" ; break; fi ; done

echo "New Year's Day:              " $(date "${dformat}"  -d "${myyear}0101")
echo "Martin Luther King, Jr. Day: " $(nth_xday_of_month 3 1 01 ${myyear})
echo "Washington's Birthday:       " $(nth_xday_of_month 3 1 02 ${myyear})
echo "Memorial Day:                " $(date "${dformat}" -d "${myyear}05${memday}")
echo "Independence Day:            " $(date "${dformat}" -d "${myyear}0704")
echo "Labor Day:                   " $(nth_xday_of_month 1 1 09 ${myyear})
echo "Columbus Day:                " $(nth_xday_of_month 2 1 10 ${myyear})
echo "Veteran's Day:               " $(date "${dformat}" -d "${myyear}1111")
echo "Thanksgiving:                " $(nth_xday_of_month 4 4 11 ${myyear})
echo "Christmas Day:               " $(date "${dformat}" -d "${myyear}1225")

: <<'federal_holidays_comment'

http://en.wikipedia.org/wiki/Federal_holidays_in_the_United_States

Jan 1 - New Year's Day - 1st day of the year
Third Monday of January - Martin Luther King, Jr. Day 
Third Monday of February - Washington's Birthday
Last Monday of May - Memorial Day.
July 4 - Independence Day.
First Monday of September - Labor Day.
Second Monday of October - Columbus Day.
November 11 - Veteran's Day.
Fourth Thursday of November - Thanksgiving
December 25 - Christmas Day
federal_holidays_comment

One of the things which confuses many Linux users is why the access time attribute of a file does not change, although the file has been clearly accessed a number of times recently. Let me illustrate here by accessing a file, and checking whether the access time changes or not. I will use

 stat -c %x filename 

to grab the atime attribute.

[root@ip-10-136-87-176 lvm]# sleep 10; date; cat myfile ; stat -c %x myfile
Sun Aug  3 20:56:51 UTC 2014
Beam me up, Scotty.
2014-08-03 20:54:40.000000000 +0000
[root@ip-10-136-87-176 lvm]# sleep 10; date; cat myfile ; stat -c %x myfile
Sun Aug  3 20:57:23 UTC 2014
Beam me up, Scotty.
2014-08-03 20:54:40.000000000 +0000

The atime has not changed. Let us check

/proc/mounts

for any mount options.

[root@ip-10-136-87-176 lvm]# pwd
/mnt/lvm
[root@ip-10-136-87-176 lvm]# grep /mnt/lvm /proc/mounts 
/dev/xvdj1 /mnt/lvm ext3 rw,seclabel,relatime,errors=continue,barrier=1,data=ordered 0 0

The answer to our question lies in the

relatime

option. Starting with version 2.6.30, the Linux kernel switched to using relatime by default when mounting file systems. Here is the excerpt from the man page for the mount command –

relatime
              Update inode access times relative to modify or change time.  Access time is only updated  if  the
              previous  access time was earlier than the current modify or change time. (Similar to noatime, but
              doesn’t break mutt or other applications that need to know if a file has been read since the  last
              time it was modified.)

              Since Linux 2.6.30, the kernel defaults to the behavior provided by this option (unless noatime
              was specified), and the strictatime option is required to obtain traditional semantics. In
              addition, since Linux 2.6.30, the file’s last access time is always updated if it is more than
              1 day old.

If the kernel were to update the atime every time a file was accessed, that would be a big performance killer for disks. Especially on servers with lots of frequently accessed files, updating the atime attribute on every access would be a huge I/O burden, which is why the kernel defaults to relatime. But as always, the Linux kernel provides you the mechanism to update the atime on every access if you want to. For this, you can use the

strictatime

option during mount. Let me illustrate this –

[root@ip-10-136-87-176 /]# umount /mnt/lvm
[root@ip-10-136-87-176 /]# mount  -o strictatime /dev/xvdj1 /mnt/lvm/
[root@ip-10-136-87-176 /]# grep '/mnt/lvm' /proc/mounts 
/dev/xvdj1 /mnt/lvm ext3 rw,seclabel,errors=continue,barrier=1,data=ordered 0 0
[root@ip-10-136-87-176 /]# cd /mnt/lvm/
[root@ip-10-136-87-176 lvm]# sleep 10; date; cat myfile ; stat -c %x myfile
Sun Aug  3 21:06:22 UTC 2014
Beam me up, Scotty.
2014-08-03 21:06:22.000000000 +0000
[root@ip-10-136-87-176 lvm]# sleep 60; date; cat myfile ; stat -c %x myfile
Sun Aug  3 21:07:27 UTC 2014
Beam me up, Scotty.
2014-08-03 21:07:27.000000000 +0000

Note: If the file system is mounted with a readonly option, the atime won’t be updated for obvious reasons.

The strace command allows us to trace the system calls made by a program. In this blog post, I will show you how to use strace to capture some of the syscalls made by Apache when clients make http requests. strace has several options, but here we will consider only the following –

-p pid : attach to the process with process ID pid
-o filename : write the strace output to the file filename rather than to stderr
-ff : if the -o filename option is in effect, each process's trace is written to filename.pid, where pid is the numeric process ID of each process
-e expr : filter for specific syscalls only (eg. open, close, fstat etc.)

We will attach strace to the parent process of the apache threads. With -ff specified, strace will trace all children of the parent process and save each trace to a file named filename.PID. We will use ab (the Apache HTTP server benchmarking tool) to generate traffic to the web server, and see which files apache opens while serving client requests by explicitly filtering for the open and close syscalls.

1. Let us find the parent process –

ns1 strace # ps xo comm,pid,ppid | grep apache2
apache2          2062     1

The PID to trace in this case is 2062.

2. Run the strace command; while it is running, launch another session and run the ab command –

ns1 strace # strace -ff -o wiki.home.net -e trace=open,close -p 2062
Process 2062 attached - interrupt to quit
Process 18526 attached
Process 18531 attached
Process 18532 attached
Process 18536 attached
Process 18537 attached
Process 18538 attached
Process 18539 attached
Process 18526 detached
Process 18531 detached
Process 18532 detached
Process 18538 detached
^CProcess 2062 detached
Process 18536 detached
Process 18537 detached
Process 18539 detached


140706133405:root:homevm:/home/daniel:# ab -n 25 -c 10 http://wiki.home.net./
This is ApacheBench, Version 2.3 <$Revision: 655654 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking wiki.home.net. (be patient).....done


Server Software:        Apache/2.2.22
Server Hostname:        wiki.home.net.
Server Port:            80

Document Path:          /
Document Length:        0 bytes

Concurrency Level:      10
Time taken for tests:   20.595 seconds
Complete requests:      25
Failed requests:        0
Write errors:           0
Non-2xx responses:      25
Total transferred:      11300 bytes
HTML transferred:       0 bytes
Requests per second:    1.21 [#/sec] (mean)
Time per request:       8237.856 [ms] (mean)
Time per request:       823.786 [ms] (mean, across all concurrent requests)
Transfer rate:          0.54 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        1  282 459.3      2    1004
Processing:  5372 6551 733.6   6846    7631
Waiting:      367 1546 734.1   1842    2629
Total:       5374 6833 991.9   7199    8049

Percentage of the requests served within a certain time (ms)
  50%   7170
  66%   7616
  75%   7819
  80%   7933
  90%   8005
  95%   8013
  98%   8049
  99%   8049
 100%   8049 (longest request)

3. Once ab completes, stop the strace command and run ls in the current directory to see the strace output for each apache thread that served the http requests, as well as the output for the parent process –

ns1 strace #  ls -l
total 88
-rw-r--r-- 1 root root 10617 Jul  6 13:34 wiki.home.net.18526
-rw-r--r-- 1 root root 10617 Jul  6 13:34 wiki.home.net.18531
-rw-r--r-- 1 root root 10617 Jul  6 13:34 wiki.home.net.18532
-rw-r--r-- 1 root root 10441 Jul  6 13:34 wiki.home.net.18536
-rw-r--r-- 1 root root 10441 Jul  6 13:34 wiki.home.net.18537
-rw-r--r-- 1 root root 10617 Jul  6 13:34 wiki.home.net.18538
-rw-r--r-- 1 root root 10441 Jul  6 13:34 wiki.home.net.18539
-rw-r--r-- 1 root root   581 Jul  6 13:34 wiki.home.net.2062

4. As you can see, the apache parent process doesn't serve any client requests; the child threads are the ones serving them, and in the strace output for each child thread we can see the files accessed/opened –

strace output for parent process - 

ns1 strace # cat wiki.home.net.2062
close(20)                               = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
close(20)                               = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
close(20)                               = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
close(20)                               = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
close(20)                               = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
close(20)                               = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
close(20)                               = 0
--- SIGCHLD (Child exited) @ 0 (0) ---


ns1 strace # head -25 wiki.home.net.18526
open("/proc/sys/kernel/ngroups_max", O_RDONLY) = 20
close(20)                               = 0
open("/etc/group", O_RDONLY|O_CLOEXEC)  = 20
close(20)                               = 0
open("/.htaccess", O_RDONLY|O_CLOEXEC)  = -1 ENOENT (No such file or directory)
open("/var/.htaccess", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
open("/var/www/.htaccess", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
close(22)                               = 0
close(22)                               = 0
close(22)                               = 0
open("/var/www/wiki/index.php", O_RDONLY) = 22
close(22)                               = 0
open("/var/www/wiki/includes/WebStart.php", O_RDONLY) = 22
close(22)                               = 0
open("/var/www/wiki/includes/Init.php", O_RDONLY) = 22
close(22)                               = 0
open("/var/www/wiki/includes/AutoLoader.php", O_RDONLY) = 22
close(22)                               = 0
open("/var/www/wiki/includes/profiler/Profiler.php", O_RDONLY) = 22
close(22)                               = 0
open("/var/www/wiki/includes/Defines.php", O_RDONLY) = 22
close(22)                               = 0
open("/var/www/wiki/includes/normal/UtfNormalDefines.php", O_RDONLY) = 22
close(22)                               = 0
open("/var/www/wiki/includes/DefaultSettings.php", O_RDONLY) = 22

This is my first attempt in response to a question posed on one of the StackExchange sites for Unix/Linux – How do you compare two folders and copy the difference to a third folder?. The script compares the latest directory, given as the first argument, to an old directory, the second argument, and copies the files and directories which exist only in the latest directory into a difference directory, the third argument, creating it if it doesn't exist. It also copies files which differ in the latest directory compared to the old one. Make sure to put the arguments in the right order – latest directory first, old directory next, and the difference directory last.
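
The directory comparison itself is delegated to Python's filecmp.dircmp; the attributes the script relies on can be explored interactively (the directory names here are placeholders) –

>>> import filecmp
>>> d = filecmp.dircmp('current', 'old')
>>> d.left_only      # names that exist only in 'current'
>>> d.diff_files     # files present in both, but with different contents
>>> d.common_dirs    # common subdirectories, which the script recurses into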

Sample usage:

daniel@linubuvma:~/scripts/python$ python copy_difference.py /tmp/test/current /tmp/test/old /tmp/test/difference

(Silent output is good).

daniel@linubuvma:~/practice/python$ ls -1R /tmp/test/current/
/tmp/test/current/:
dirc
extra
newone
one
three
two

/tmp/test/current/dirc:

/tmp/test/current/extra:
extra2
fourth

/tmp/test/current/extra/extra2:

/tmp/test/current/newone:
file2
fileone
daniel@linubuvma:~/practice/python$ ls -1R /tmp/test/old
/tmp/test/old:
extra
newone
one
two

/tmp/test/old/extra:

/tmp/test/old/newone:
file2
daniel@linubuvma:~/practice/python$ ls -1R /tmp/test/difference
ls: cannot access /tmp/test/difference: No such file or directory
daniel@linubuvma:~/practice/python$ python copy_difference.py /tmp/test/current /tmp/test/old /tmp/test/difference
daniel@linubuvma:~/practice/python$ ls -1R /tmp/test/difference
/tmp/test/difference:
extra
newone
three
two

/tmp/test/difference/extra:
fourth

/tmp/test/difference/newone:
fileone

Here is the Python script.


#!/usr/bin/env python

import os, sys
import filecmp
import re
import shutil
holderlist=[]

def compareme(dir1, dir2):
    dircomp=filecmp.dircmp(dir1,dir2)
    only_in_one=dircomp.left_only
    diff_in_one=dircomp.diff_files
    for x in only_in_one:
        holderlist.append(os.path.abspath(os.path.join(dir1, x)))
    for x in diff_in_one:
        holderlist.append(os.path.abspath(os.path.join(dir1, x)))
    # Recurse into subdirectories present in both trees
    for item in dircomp.common_dirs:
        compareme(os.path.abspath(os.path.join(dir1, item)), os.path.abspath(os.path.join(dir2, item)))
    # Return unconditionally, so callers get a list even when there are no common subdirectories
    return holderlist

def main():
 if len(sys.argv) > 3:
   dir1=sys.argv[1]
   dir2=sys.argv[2]
   dir3=sys.argv[3]
 else:
   print "Usage: ", sys.argv[0], "currentdir olddir difference"
   sys.exit(1)

 if not dir3.endswith('/'): dir3=dir3+'/'

 source_files=compareme(dir1,dir2)
 dir1=os.path.abspath(dir1)
 dir3=os.path.abspath(dir3)
 destination_files=[]
 new_dirs_create=[]
 for item in source_files:
   destination_files.append(re.sub(dir1, dir3, item) )
 for item in destination_files:
  new_dirs_create.append(os.path.split(item)[0])
 for mydir in set(new_dirs_create):
   if not os.path.exists(mydir): os.makedirs(mydir)
#copy pair
 copy_pair=zip(source_files,destination_files)
 for item in copy_pair:
   if os.path.isfile(item[0]):
    shutil.copyfile(item[0], item[1])

if __name__ == '__main__':
 main()

Getting the URLs in your favorites or bookmarks as a plain list.

I have tons of pages that I bookmarked in my Firefox browser on a Linux box, and wanted to get a simple listing of these URLs with titles.

1. Export bookmarks to a JSON file
2. Parse the JSON file to get a simple list

1. How to export bookmarks in Firefox as JSON.
Go to Bookmarks menu
Show All Bookmarks
Import and Backup (click the down arrow to expand it)
Backup
Save (Make sure JSON is selected at the right bottom corner)

The file will be saved with a name like ‘bookmarks-2013-12-07.json’; the format is ‘bookmarks-yyyy-mm-dd.json’. Write down the path where you saved this file, as we will need it for the next step.

2. Get a simple list out of the JSON format file

We are going to use Python's json module to load the file into a Python object and print the entries containing URLs. Pass the path to the bookmarks file as the first argument to the script.


#!/usr/bin/env python
'''extract a list of URLs from a Firefox exported bookmarks JSON file '''

import sys
import os
import json
import io

def Usage():
    print "{0} Path-to-bookmarks-file".format(sys.argv[0])
    sys.exit(1)

if len(sys.argv) < 2:
    Usage()

bookmark_file = sys.argv[1]

#Does the file exist?
if not os.path.isfile(bookmark_file):
    print "{0} not found.".format(bookmark_file)
    sys.exit(1)

# Load JSON file
fp_data = io.open(bookmark_file, encoding='utf-8')
try:
    jdata = json.load(fp_data)
except ValueError:
    print "{0} not valid JSON file".format(bookmark_file)
    sys.exit(1)
fp_data.close()


#Recursive function to get the title and URL keys from JSON file

def grab_keys(bookmarks_data, bookmarks_list=None):
  # Use None instead of a mutable default, so repeated calls don't share one list
  if bookmarks_list is None:
    bookmarks_list = []
  if 'children' in bookmarks_data:
    for item in bookmarks_data['children']:
      bookmarks_list.append({'title': item.get('title', 'No title'),
                             'uri': item.get('uri', 'None')})
      grab_keys(item, bookmarks_list)
  return bookmarks_list


def main():
  mydata=grab_keys(jdata)
  for item in mydata:
    myurl = item['uri']
    if myurl.startswith('http') or myurl.startswith('ftp'):
      print item['uri'], "  ", item['title']

if __name__=="__main__":
  main()

Save this file, say as ‘get_bookmarks.py’, and running it will give output similar to the one below –

[root@localhost]# python get_bookmarks.py bookmarks-2013-12-07.json
https://www.google.com/ Google
https://aws.amazon.com/ Amazon Web Services, Cloud Computing: Compute, Storage, Database
http://docs.python.org/3/py-modindex.html Python Module Index – Python v3.3.3 documentation
http://www.linuxhomenetworking.com/wiki/#.UqMjHddn21E Linux Home Networking
http://www.zytrax.com/books/dns/ DNS for Rocket Scientists - Contents
http://www.centos.org/ Centos
http://wiki.centos.org/ Wiki
http://www.centos.org/docs/6/ Documentation
http://www.centos.org/modules/newbb/ Forums

Another way of approaching the problem is to export the bookmarks as an HTML file and then dump it as a text file. Here I used ‘lynx’ (install it using ‘yum install lynx’ on CentOS/RHEL/Fedora) to dump the file and grepped for the URLs –

[root@localhost]# lynx -dump bookmarks.html | egrep '[0-9]+\.[[:space:]]+http'
3. https://www.google.com/
4. https://aws.amazon.com/
5. http://docs.python.org/3/py-modindex.html
6. http://www.linuxhomenetworking.com/wiki/#.UqMjHddn21E
7. http://www.zytrax.com/books/dns/
9. http://www.centos.org/
10. http://wiki.centos.org/
11. http://www.centos.org/docs/6/
12. http://www.centos.org/modules/newbb/

[root@localhost]# lynx -dump bookmarks.html | egrep '[0-9]+\.[[:space:]]+http' | awk '{print $2}'
https://www.google.com/
https://aws.amazon.com/
http://docs.python.org/3/py-modindex.html
http://www.linuxhomenetworking.com/wiki/#.UqMjHddn21E
http://www.zytrax.com/books/dns/
http://www.centos.org/
http://wiki.centos.org/
http://www.centos.org/docs/6/
http://www.centos.org/modules/newbb/

In order to use this script, you need to do certain things in advance –

1. Download youtube-dl, a script which allows you to download videos

https://github.com/rg3/youtube-dl

2. Install ffmpeg, an audio/video conversion tool.
Ubuntu users can run the following command –

#apt-get install ffmpeg libavcodec-extra-53


Usage example –

# ./musicdownloader.sh http://www.youtube.com/watch?v=8tHu-OwzwPg BereketMengstead-mizerey.mp3

#!/bin/bash

downloader=`which youtube-dl`
ffmpeg=`which ffmpeg`
bitrate=192000

ARGC=$#
LINK=$1
FILENAME=$2
SAVEDFILE=$(basename $0)_mymusic123.mp4

if [ $ARGC -ne 2 ]; then
  echo "Usage: $(basename $0) url-link output-file"
  echo "Example: $(basename  $0)  http://www.youtube.com/watch?v=fQZNiMckKbI Azmari-ethio01.mp3"
  exit 1
fi
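
# -f 18 asks youtube-dl for format 18 (the 360p MP4 stream); ffmpeg then
# extracts the audio track and re-encodes it to mp3 at $bitrate.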

$downloader -f 18 $LINK -o $SAVEDFILE  &&  $ffmpeg -i $SAVEDFILE -f mp3 -ab $bitrate -vn $FILENAME

if [ $? -eq 0 ];
then
 echo "File saved in " $FILENAME
 rm $SAVEDFILE
fi