Concatenating / Merging .csv on the Linux or Mac OS Terminal
For data analysis, .csv files with hundreds of thousands of records still play a role. You might think:
Hey, why worry about .csv, that's an ancient format, nobody uses that! Think again! CSV (comma-separated values) files, which can also be separated by tabs or pretty much any character you want (but never mind that), are still used a lot in data analysis as a raw input format.
Thanks to a former colleague I was tasked with merging 1.5 million records that were split over several files into a big CSV file. Of course they all contained neat little header portions like so:
# file 1
Name,Email
Sirius,s.black@hogwarts.com
Remus,r.lupin@hogwarts.com
Whistler,grumpy_redneck@vampirekillerelite.net
Salome,givemethehead@dungeon.com
# file 2
Name,Email
Galadriel,AwesomeElvenQueen@loth.lorien
Saruman,WhiteHand69@dark.tower
So the simple way would be to just cat file* > output_simple.csv, right? That would repeat the row that contains the headers though, which is no good.
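To see the problem for yourself, here is a small sketch that recreates shortened versions of the two sample files and counts the header lines after a plain cat. The scratch directory and the names demo_dir and header_count are made up for this illustration.

```shell
#!/bin/bash
# Recreate shortened versions of the two sample files in a scratch directory.
# (demo_dir and the file names are hypothetical, chosen for this demo.)
demo_dir=$(mktemp -d)
printf 'Name,Email\nSirius,s.black@hogwarts.com\n' > "$demo_dir/file1.csv"
printf 'Name,Email\nGaladriel,AwesomeElvenQueen@loth.lorien\n' > "$demo_dir/file2.csv"

# Naive concatenation: the header line of every input file ends up in the result.
cat "$demo_dir"/file*.csv > "$demo_dir/output_simple.csv"

# Count how often the header row appears in the merged file.
header_count=$(grep -c '^Name,Email$' "$demo_dir/output_simple.csv")
echo "header lines: $header_count"
```

With two input files this reports two header lines, one per file, and the second one sits in the middle of the data.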
CSV: keep only the first header / skip repeated headers
To concatenate without repeating the headers, we can craft a simple (if that term ever applies to bash) script:
#!/bin/bash
# both arguments are required: a file name prefix and the output file
if [[ $# -ne 2 ]] ; then
    echo 'usage:'
    echo './merge.sh pattern output.csv'
    exit 1
fi
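output_file=$2
i=0
# expand the pattern with a glob instead of parsing ls, so odd file names survive
shopt -s nullglob  # an unmatched pattern expands to nothing
files=("$1"*.csv)
echo "${files[@]}"
for filename in "${files[@]}"; do
    echo $i
    if [[ $i -eq 0 ]] ; then
        # copy csv headers from first file
        echo "first file"
        head -n 1 "$filename" > "$output_file"
    fi
    echo $i "common part"
    # append the data rows (everything after the header line) of every file
    tail -n +2 "$filename" >> "$output_file"
    i=$(( i + 1 ))
done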
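It's still fairly simple, because it just uses a loop, a conditional, and head and tail, which read a line-based file from the top or from the bottom. Here head, asked for a single line, prints only the header row, and tail -n +2 prints everything from the second line onward, which is the data without the header.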
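If the head and tail calls look cryptic, this little sketch shows what each of them returns for a three-line sample. The temporary file and the variable names sample, header and rows are assumptions made for the demo; head -n 1 is the POSIX spelling of the line count option.

```shell
#!/bin/bash
# Build a tiny sample file (hypothetical, for illustration only).
sample=$(mktemp)
printf 'Name,Email\nSirius,s.black@hogwarts.com\nRemus,r.lupin@hogwarts.com\n' > "$sample"

# head -n 1 prints only the first line, i.e. the header row.
header=$(head -n 1 "$sample")

# tail -n +2 prints everything from line 2 onward, i.e. the data rows.
rows=$(tail -n +2 "$sample")

echo "header: $header"
echo "rows:"
echo "$rows"

rm -f "$sample"
```

Note the plus sign: tail -n 2 would give the last two lines, while tail -n +2 starts at line two and runs to the end of the file, no matter how long it is.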
This script will write output like the following to output.csv if run like this: ./merge.sh file output.csv
Name,Email
Sirius,s.black@hogwarts.com
Remus,r.lupin@hogwarts.com
Whistler,grumpy_redneck@vampirekillerelite.net
Salome,givemethehead@dungeon.com
Galadriel,AwesomeElvenQueen@loth.lorien
Saruman,WhiteHand69@dark.tower
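If you'd rather not keep a script around, the same result can probably be had with an awk one-liner. This is an alternative sketch, not the script from above, and it assumes every input file starts with exactly one header line. The scratch directory demo_dir and the output name merged.csv are made up for the demo.

```shell
#!/bin/bash
# Recreate shortened versions of the two sample files (hypothetical scratch dir).
demo_dir=$(mktemp -d)
printf 'Name,Email\nSirius,s.black@hogwarts.com\n' > "$demo_dir/file1.csv"
printf 'Name,Email\nGaladriel,AwesomeElvenQueen@loth.lorien\n' > "$demo_dir/file2.csv"

# NR is the line number across all input files, FNR the line number within
# the current file. Keep the very first line overall (the first header) and
# every line that is not the first line of its file (the data rows).
awk 'NR == 1 || FNR > 1' "$demo_dir"/file*.csv > "$demo_dir/merged.csv"

cat "$demo_dir/merged.csv"
```

The output file name deliberately does not match the file*.csv pattern, so the merged file is never picked up as one of its own inputs.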
I hope somebody finds this useful. If you found this post, let me know what you're doing with it.
Are you in email marketing? Are you a data scientist?