Concatenating / Merging .csv on the Linux or Mac OS Terminal

For data analysis, .csv files with hundreds of thousands of records still play a role. You might think:

Hey, why worry about .csv, that's an ancient format, nobody uses that!
Think again! CSV (comma-separated values) files, which can also be separated by tabs or pretty much any character you want (but never mind that), are still used a lot in data analysis as a raw input format.

Thanks to a former colleague I was tasked with merging 1.5 million records that were split over several files into a big CSV file. Of course they all contained neat little header portions like so:

# file 1
Name,Email
Sirius,s.black@hogwarts.com
Remus,r.lupin@hogwarts.com
Whistler,grumpy_redneck@vampirekillerelite.net
Salome,givemethehead@dungeon.com
# file 2
Name,Email
Galadriel,AwesomeElvenQueen@loth.lorien
Saruman,WhiteHand69@dark.tower

So the simple way would be to just cat file* > output_simple.csv, right? That would repeat the row that contains the headers though, which is no good.
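To see the problem for yourself, here is a minimal sketch that recreates shortened versions of the two sample files in a scratch directory and counts the header rows after a plain cat. The scratch directory and the file names file1.csv and file2.csv are assumptions made for illustration.

```shell
#!/bin/bash
# Sketch: reproduce the duplicated-header problem with two tiny sample files.
workdir=$(mktemp -d)   # scratch directory, only for this demonstration

printf 'Name,Email\nSirius,s.black@hogwarts.com\n' > "$workdir/file1.csv"
printf 'Name,Email\nGaladriel,AwesomeElvenQueen@loth.lorien\n' > "$workdir/file2.csv"

# The naive merge: every file's header row ends up in the output.
cat "$workdir"/file* > "$workdir/output_simple.csv"

# Count how many lines are exactly the header row.
header_count=$(grep -c '^Name,Email$' "$workdir/output_simple.csv")
echo "header rows: $header_count"
```

With two input files this reports two header rows, one of which sits in the middle of the data.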

CSV: keep only the first header / skip repeated headers

To concatenate without repeating the headers, we can craft a simple (if that term ever applies to bash) script:

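#!/bin/bash
# Merge all CSV files whose names start with a prefix, keeping only the first header.

# both arguments are required
if [[ $# -ne 2 ]] ; then
    echo 'usage:'
    echo './merge.sh pattern output.csv'
    exit 1
fi

output_file=$2

i=0

# expand the glob directly instead of parsing ls, so file names with spaces survive
for filename in "$1"*.csv; do
  # skip an unmatched pattern and a leftover output file from an earlier run
  if [[ ! -f $filename || $filename -ef $output_file ]] ; then
    continue
  fi
  echo "merging $filename"
  if [[ $i -eq 0 ]] ; then
    # copy csv headers from first file
    head -n 1 "$filename" > "$output_file"
  fi
  # copy the rows below the header from every file, including the first
  tail -n +2 "$filename" >> "$output_file"
  i=$(( i + 1 ))
done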

It's still fairly simple, because it only uses a loop, a conditional, and head and tail, which read a line-based file from the top or the bottom. Here head prints only the first line (the header), while tail -n +2 prints everything from the second line onward.
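If head and tail are new to you, this small sketch shows the two calls in isolation on a throwaway file. The temporary file and the variable names header and body are made up for the example.

```shell
#!/bin/bash
# Sketch: split one CSV into its header row and its data rows.
sample=$(mktemp)   # throwaway file, only for this demonstration
printf 'Name,Email\nSirius,s.black@hogwarts.com\nRemus,r.lupin@hogwarts.com\n' > "$sample"

# head -n 1 prints only the first line of the file.
header=$(head -n 1 "$sample")

# tail -n +2 prints from line 2 to the end, i.e. everything but the header.
body=$(tail -n +2 "$sample")

echo "header: $header"
echo "rows:"
echo "$body"
```

Note the plus sign: tail -n 2 would print the last two lines, whereas tail -n +2 starts at the second line.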

Run as ./merge.sh file output.csv, the script writes the following to output.csv:

Name,Email
Sirius,s.black@hogwarts.com
Remus,r.lupin@hogwarts.com
Whistler,grumpy_redneck@vampirekillerelite.net
Salome,givemethehead@dungeon.com
Galadriel,AwesomeElvenQueen@loth.lorien
Saruman,WhiteHand69@dark.tower
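If you would rather not keep a script around, awk can probably do the same job in one line. This is an alternative to the script above, not what I used for the 1.5 million records, so treat it as a sketch. It relies on awk's built-in counters NR (line number across all files) and FNR (line number within the current file); the scratch directory and file names are again just for illustration.

```shell
#!/bin/bash
# Sketch: merge CSV files with awk, keeping only the very first header row.
workdir=$(mktemp -d)   # scratch directory, only for this demonstration
printf 'Name,Email\nSirius,s.black@hogwarts.com\n' > "$workdir/file1.csv"
printf 'Name,Email\nGaladriel,AwesomeElvenQueen@loth.lorien\n' > "$workdir/file2.csv"

# Print a line if it is not the first line of its file (FNR > 1),
# or if it is the very first line overall (NR == 1, the header we keep).
awk 'FNR > 1 || NR == 1' "$workdir"/file*.csv > "$workdir/merged.csv"

cat "$workdir/merged.csv"
```

Like the script, this assumes every file starts with exactly one header row and that the output file name does not match the input pattern.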

I hope somebody finds this useful. If you found this post, let me know what you're doing with it.

Are you in email marketing? Are you a data scientist?

Thank you for reading! If you have any comments, additions or questions, please tweet or toot them at me!