Importing a CSV file into Drupal 7 programmatically using the batch API

The Challenge

So, you've got this CSV file you need to get into Drupal. Out-of-the-box solutions like the feeds module, migrate, etc. just don't cut it sometimes. If you find yourself needing to do any of the following, it's a good idea to import this file as manually as possible, so you have total control over how each row of the file is transformed into a drupal object.

Why would I choose this approach?

  • You need to create multiple entities, nodes or users per row, instead of one per row.
  • You need to relate columns or fields to other Drupal objects, like find the author by "name" and match to the UID.
  • You need to perform any kind of advanced processing on a field rather than just copy-pasting the values.
  • You just like understanding how everything works, you're a decent programmer and want to avoid a headache.

Build a CSV importer tool using the batch API

Drupal's default batch API is a little tricky, but we're gonna help you through it. Here's what we're going to build:

  1. A form where the user can upload a CSV file
  2. An implementation of Drupal's Batch API to import each row of the CSV file 1-at-a-time to avoid memory problems.
  3. Provide validation of the file to make sure the CSV structure is correct.
  4. Programatic importing where we manually generate the node object(s) per row of the CSV.

Download the Sample Module

If you're the type that hates reading and just wants a sample module to play with, no probs, here you go. The rest of this article is simply going to be breaking apart this sample and explaining how it functions.

1. Make a new drupal module

The very first thing we need to do is create a new drupal module with its .info and .module files. If you don't know how to do this, then download the sample module and start from there.

2. Make a new menu item

Declare a menu item where we can store the form and upload our file. We do this using drupal's hook_menu

function csvImporter_menu() {
  $items['csvImporter'] = array(
    'title' => 'Import Data from CSV File',
    'description' => 'Import content from a <abbr title="Comma Separated Values">CSV</abbr> file.',
    'access callback' => 'user_access',
    'access arguments' => array('access content'),
    'page callback' => 'csvImporter_pagecallback',
    'type' => MENU_NORMAL_ITEM,
  );
  return $items ;
}

 

function csvImporter_pagecallback(){
    $module_path = drupal_get_path('module', 'csvImporter');
    
    $form = drupal_get_form('csvImporter_form');
    
    $output = "<p>This tool will attempt to import CSV data";
    $output .= drupal_render($form);
    
    return $output;
  }

Build the $form object

We're going to build a super basic drupal form with a file upload widget. This is how we collect our CSV for processing. We'll do some basic validation on the provided file here as well, like making sure it's the correct format, and has the correct number of lines before passing it off to our batch processor.

function csvImporter_form() {
  $form['#attributes'] = array(
    'enctype' => 'multipart/form-data'
  );

  $form['csvfile'] = array( 
    '#title' => t('CSV File'),
    '#type'  => 'file',
    '#description' => ($max_size = parse_size(ini_get('upload_max_filesize'))) ? t('Due to server restrictions, the <strong>maximum upload file size is !max_size</strong>. Files that exceed this size will be disregarded.', array('!max_size' => format_size($max_size))) : '',    
  ) ;
  $form['submit'] = array(
    '#type' => 'submit',
    '#value' => t('Commence Import'),
  ) ;
  $form['#validate'] = array(
    'csvImporter_validate_fileupload',
    'csvImporter_form_validate',
  ) ;
  return $form ;
}


function csvImporter_validate_fileupload(&$form, &$form_state) {
  $validators = array(
    'file_validate_extensions' => array( 'csv' ),
  );
  
  if ( $file = file_save_upload('csvfile', $validators, "public://", FILE_EXISTS_REPLACE) ) {
    $form_state['values']['csvupload'] = $file->destination;       
  }
  else {
    form_set_error('csvImporter', t('Unable to copy upload file to !dest', array('!dest' => $destination)));
  }      
}


function csvImporter_form_validate(&$form, &$form_state) {
  if ( isset( $form_state['values']['csvupload'] ) ) {
    if ( $handle = fopen($form_state['values']['csvupload'], 'r') ) {
      $line_count = 1 ;
      $first = TRUE ;
      if ( $line = fgetcsv($handle, 4096) ) {

        //-------------------------------------------------------------------------------------------|
        //  This is where you can validate aspects of the file itself, like the number of columns
        //-------------------------------------------------------------------------------------------|
          if(count($line) != 3){
            form_set_error('csvfile', t('This file has the incorrect number of columns. Expecting 3'));
          }
        //-------------------------------------------------------------------------------------------|
        //  End validating aspects of the CSV file
        //-------------------------------------------------------------------------------------------|
        
        
        
      }
      fclose($handle);
    }
    else {
      form_set_error('csvfile', t('Unable to read uploaded file !filepath', array('!filepath' => $form_state['values']['csvupload'])));
    }
  }
}


function csvImporter_form_submit(&$form, &$form_state) {
  $batch = array(
    'title' => t('Importing CSV ...'),
    'operations' => array(),
    'init_message' => t('Commencing'),
    'progress_message' => t('Processed @current out of @total.'),
    'error_message' => t('An error occurred during processing'),
    'finished' => 'csvImporter_import_finished',
  ) ;
  if ( isset( $form_state['values']['csvupload'] ) ) {
    if ( $handle = fopen($form_state['values']['csvupload'], 'r') ) {
      $batch['operations'][] = array('_csvImporter_remember_filename', array( $form_state['values']['csvupload'] ) ) ;
      $line_count = 1 ;
      $first = TRUE ;
      $line = fgetcsv($handle, 4096);
      while ( $line = fgetcsv($handle, 4096) ) {
        /**
         * we use base64_encode to ensure we don't overload the batch
         * processor by stuffing complex objects into it
         */
        $batch['operations'][] = array('_csvImporter_import_line', array(array_map('base64_encode', $line)));        
      }
      fclose($handle);
    } // we caught this in csvImporter_form_validate()
  } // we caught this in csvImporter_form_validate()
  batch_set($batch);
}

Make some helper functions for dealing with the batch API

We need a few helpers for our batch API to process correctly, like what happens when it's done importing, and remembering the filename.

function csvImporter_import_finished($success, $results, $operations) {
  if ( !empty($results['failed_rows']) ) {
    $dir  = file_directory_path() . '/csvImporter/' ;
    if ( file_check_directory( $dir, FILE_CREATE_DIRECTORY ) ) {
      $csv_filename = 'failed_rows-'. basename($results['uploaded_filename']); // we validated extension on upload
      $csv_filepath = $dir .'/'. $csv_filename;
      $targs = array(
        '!csv_url' => l(check_plain($csv_filename), $csv_filepath),
        '%csv_filename' => $csv_filename,
        '%csv_filepath' => $csv_filepath,
      ) ;
      if ( $handle = fopen($csv_filepath, 'w+') ) {
        foreach( $results['failed_rows'] as $failed_row ) {
          fputcsv($handle, $failed_row);
        }
        fclose($handle);
        drupal_set_message(t('Some rows failed to import. You may download a CSV of these rows: !csv_url', $targs), 'error');
      }
      else {
        drupal_set_message(t('Some rows failed to import, but unable to write error CSV to %csv_filepath', $targs), 'error');
      }
    }
    else {
      drupal_set_message(t('Some rows failed to import, but unable to create directory for error CSV at %csv_directory', $targs), 'error');
    }
  }
  return t('The CSV import has completed.');
}


function _csvImporter_remember_filename($filename, &$context) {
  $context['results']['uploaded_filename'] = $filename;
}

Write a function for importing each individual line

This is the cool part, and the only part you're probably going to modify for your use. Here we actually process each line, one-at-a-time, and do stuff with it. You could make each line into a node by inserting it programmatically. In this example, we're simply setting a message.

function _csvImporter_import_line($line, $session_nid, &$context) {
      
    $context['results']['rows_imported']++;
    $line = $cleaned_line = array_map('base64_decode', $line);
    
    
    //-------------------------------------------------------------------------------------------|
    //  Give feedback to the importer about which operation we're preforming
    //-------------------------------------------------------------------------------------------|
      //Show the row count by default
      $context['message'] = t('Importing row !c', array( '!c' => $context['results']['rows_imported'] ));
    
      //Or, Provide some feedback about the row currently being processed
      $context['message'] = t('Importing %title', array('%title' => $line[0]));
    //-------------------------------------------------------------------------------------------|
    //  End giving feedback to the importer
    //-------------------------------------------------------------------------------------------|
  
    //-------------------------------------------------------------------------------------------|
    //  This is where you would convert the line of the CSV file into a new object, be it a node, user, etc.
    //-------------------------------------------------------------------------------------------|
  
      drupal_set_message("Parsed line {$line[0]}");
  
    //-------------------------------------------------------------------------------------------|
    //  End converting the line to a drupal object
    //-------------------------------------------------------------------------------------------|
    
    //-------------------------------------------------------------------------------------------|
    //  You can capture and report on failed lines by doing something like this
    //-------------------------------------------------------------------------------------------|
      if ( $line[1] == 'ROW' && $line[2] == 'FAILS' ) {
        $context['results']['failed_rows'][] = $line ;
      }
    //-------------------------------------------------------------------------------------------|
    //  End capturing and reporting on failed lines
    //-------------------------------------------------------------------------------------------|
  }

 

Rejoice, you're done.

If this helped you, or sucked, please let us know in the comments!

Download the Sample Module

In case you missed it earlier, here's the completed module ready for manipulation.