Using Data Loader from the command line
Moving data in and out of the Salesforce environment is a key facet of many applications. The Apex Data Loader provides a handy GUI for accomplishing these tasks, but using the GUI requires some human interaction. Your particular requirements may call for unattended operation of import and export activities. For these scenarios, you can call the Data Loader from the command line, also referred to as the CLI (command-line interface).
This article will walk you through the process of performing a simple data import with Data Loader through the CLI and discuss some additional facets of using this approach. The command line interface can perform much more complicated operations than the one described in the following sections, but you can refer to the samples and documentation for more information on the extended settings required for these other operations.
When you use the CLI to move data, you are using the same Data Loader engine you would use through the graphical front-end interface. You can do all the same things from the command line - insert data, update data, upsert data, delete data, or extract data. You can move data to and from a comma-separated values file (.csv) or to and from a relational database, such as Oracle or SQL Server, that has a standard JDBC driver.
With a default installation on a Windows machine, all the Data Loader files will be placed in the Program Files\salesforce.com\Apex Data Loader APIversion directory, where APIversion corresponds to the API version. The process command, which you will learn about in the next section, will be in the bin subdirectory, and the configuration files will normally be in the conf subdirectory.
The main configuration file used by the CLI is the process-conf.xml file. (Another file that can be involved in using Data Loader from the command line, the config.properties file, is discussed in the section below on best configuration practices.)
The process-conf.xml file contains a description for every potential process that could be called from the command line. Each of these processes is referred to as a Process Bean. In this simple example, there is only one Process Bean in the process-conf.xml file. In normal practice, though, you would have a number of Process Bean sections in the configuration file, so calling out the specific process would make sense.
The process-conf.xml file consists of a set of property-value pairs. The file used for this sample process is shown below, with an explanation of the key entries following the listing.
<!DOCTYPE beans PUBLIC "-//SPRING//DTD BEAN//EN" "http://www.springframework.org/dtd/spring-beans.dtd">
<beans>
  <bean id="csvInsertArtist"
        class="com.salesforce.dataloader.process.ProcessRunner"
        singleton="false">
    <description>Inserts artist names from CSV file into Artist custom object.</description>
    <property name="name" value="csvInsertArtist"/>
    <property name="configOverrideMap">
      <map>
        <entry key="sfdc.debugMessages" value="false"/>
        <entry key="sfdc.debugMessagesFile" value="c:\dataloader\csvInsertArtist.log"/>
        <entry key="sfdc.endpoint" value="https://www.salesforce.com"/>
        <entry key="sfdc.username" value="Your Salesforce ID"/>
        <entry key="sfdc.password" value="Your encrypted Salesforce password"/>
        <entry key="sfdc.timeoutSecs" value="540"/>
        <entry key="sfdc.loadBatchSize" value="200"/>
        <entry key="sfdc.entity" value="Artist__c"/>
        <entry key="process.operation" value="insert"/>
        <entry key="process.mappingFile" value="c:\dataloader\artistmap.sdl"/>
        <entry key="process.outputError" value="c:\dataloader\errorInsertArtist.csv"/>
        <entry key="process.outputSuccess" value="c:\dataloader\successInsertArtist.csv"/>
        <entry key="dataAccess.name" value="c:\dataloader\Artists_1_100.csv"/>
        <entry key="dataAccess.type" value="csvRead"/>
        <entry key="process.initialLastRunDate" value="2007-06-06T00:00:00.000-0800"/>
      </map>
    </property>
  </bean>
</beans>
The next section discusses these entries in more detail.
The process-conf.xml file contains properties for one or more Process Beans. Some of the more important properties are
- name - used to identify the bean in the config.properties file and when you call it from the CLI
- sfdc.entity - the Salesforce object that is the target of the actions. The value of this property is case sensitive.
- process.operation - identifies the type of operation to perform (insert, update, upsert, delete, or extract)
- process.mappingFile - identifies the mapping between the fields in the source file (dataAccess.name) and the Salesforce object. This mapping file is a .sdl file. The mapping file is the same format as the .sdl file created by the GUI when you save a mapping scheme.
- dataAccess.type - identifies how the source data is accessed, such as csvRead for reading from a .csv file
- sfdc.username - your Salesforce.com user name
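For reference, a mapping (.sdl) file is a simple properties-style file pairing each source column (left) with a target field (right). A minimal sketch of the artistmap.sdl file referenced above might look like the following; the CSV column name Artist_Name and the target field Name are assumptions for illustration:

```
# artistmap.sdl - maps CSV column names (left) to Salesforce field names (right)
Artist_Name=Name
```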
One other property, the sfdc.password entry, deserves a brief discussion of its own. Since the XML file is stored as plain text on your client machine, you would not want to store your password in the file. To avoid this problem, the Data Loader installation includes the encrypt command in the \bin subdirectory. You can use this command to encrypt your password, either with the default encryption or with an encryption key you supply. You can simply copy the encrypted version of your password to the Bean definition. For more information on using the encrypt command, please refer to the Data Loader documentation.
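As a rough sketch, the sequence might look like the following, based on the encrypt utility's documented -g (generate a key from seed text) and -e (encrypt a password) options; the seed text, password, and key file path are placeholders, and you should check the utility's built-in help for the exact syntax in your version:

```
rem Generate an encryption key from seed text and save it to a file
encrypt -g mySeedText > c:\dataloader\key.txt

rem Encrypt the password with that key; paste the output into sfdc.password
encrypt -e myPassword c:\dataloader\key.txt
```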
Running the job
With your configuration files in order, you are now ready to run the job. For this job, I opened up a command window on my Windows machine and moved to the Program Files\salesforce.com\Apex Data Loader APIversion\bin directory, where APIversion corresponds to the API version. Once there, I ran the command
process ../conf csvInsertArtist
The first parameter identifies the directory where the configuration files reside - the default conf directory. The second parameter identifies the particular Process Bean description in the XML configuration file.
Assuming all your files are in order, the job should run smoothly in response to this command. In the case of this job, the successful completion will also create .csv files listing the successful inserts and any errors - which come in handy for me if I want to delete the records I just inserted. The success file has the IDs of all the rows inserted, so I could run a delete job where I mapped the values in that column to the ID field in the matching Salesforce object.
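As a sketch of that cleanup step, the following standalone Python script pulls the IDs out of a success file and writes a one-column CSV that a delete Process Bean could use as its dataAccess.name source. The file names and the sample rows are assumptions for illustration, modeled on the configuration above:

```python
import csv

# Create a sample success file with hypothetical rows. Data Loader writes
# the new record ID in an "ID" column alongside the original field values.
with open("successInsertArtist.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["ID", "Artist_Name__c", "STATUS"])
    writer.writerow(["a0B000000000001AAA", "Miles Davis", "Item Created"])
    writer.writerow(["a0B000000000002AAA", "John Coltrane", "Item Created"])

# Extract just the IDs into a new CSV that a delete process can read.
with open("successInsertArtist.csv", newline="") as src, \
     open("deleteArtists.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)
    writer = csv.writer(dst)
    writer.writerow(["Id"])
    for row in reader:
        writer.writerow([row["ID"]])
```

The resulting deleteArtists.csv contains only an Id column, which the delete job's mapping file would map to the ID field of the target object.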
File interactions and best practices
The previous description walked you through a simple example, using a version of the process-conf.xml file with one Process Bean listing. In reality, there are three places where you can specify configuration parameters - on the command line, as you did with the configuration directory and the Process Bean name, in the config.properties file, or in the process-conf.xml file. These three locations can all be used in running a single job.
Property value interactions
There are three places where you can define property-value pairs -
- The command line - Requires two parameters: the configuration directory and the name of the process. You can also include any other property-value pairs that could be listed in the process-conf.xml file. Any property-value pairs included as part of the command line are used in place of their specifications in the other files.
- config.properties - A file located in the configuration directory, which can contain any property-value pairs. Any property-value pairs listed in this file are used in place of their specifications in the process-conf.xml file.
- process-conf.xml - A file that contains properties for all processes. Each process has its own section in the file. To invoke a particular process defined in process-conf.xml, pass the name of the process to the command via the process.name parameter, for example: java -cp DataLoader.jar -Dsalesforce.config.dir=conf com.salesforce.dataloader.process.ProcessRunner process.name=csvUpsertAccount. Otherwise, the process-conf.xml file will not be referenced and all required Data Loader values will need to be entered in the config.properties file.
The scope of arguments on the command line is the individual process being called. The scope of the property-value pairs in the process-conf.xml file is the specific process section that contains the pair. The property-value pairs in the optional config.properties file apply to all processes.
These differences naturally lead to a preferred set of practices. The process-conf.xml file acts as a sort of repository, with all the necessary configuration parameters for all possible processes. The config.properties file should be used to hold those parameters that remain the same over all or most of the processes, such as the sfdc.endpoint parameter. The parameters used as part of the command line are specific to individual runs of a single process.
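Following that practice, a config.properties file holding the connection settings shared by all processes might look like the following sketch; the values are placeholders drawn from the sample configuration earlier in the article:

```
# config.properties - settings shared by all processes
sfdc.endpoint=https://www.salesforce.com
sfdc.username=Your Salesforce ID
sfdc.password=Your encrypted Salesforce password
sfdc.timeoutSecs=540
```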
The role of the GUI
You can use the GUI interface to Data Loader to help in specifying property values for your command line processes. The GUI uses the config.properties file, but in a different way. This file is used by the GUI interface to store property values each time it runs a job. You can get many of the correct values for parameters in the process-conf.xml file by running the desired job from the GUI and simply copying the parameter names and values from the resulting config.properties file.
In a similar manner, you can use the GUI to map between source and target fields. When you store one of these mappings, the result is put in an .sdl file, which can be used as the value for the process.mappingFile value in the process-conf.xml file.
Troubleshooting

Hopefully, you now have enough information to create and run your own Data Loader jobs from the command line. This section describes some problems that might be preventing you from reaching your goal, along with suggestions for overcoming them.
- My process is throwing Java exceptions
Java exceptions frequently come from the Data Loader process not being able to find files required for operation, such as the swt file included in the Java code for Data Loader. When the Apex Data Loader is installed, the process also installs all the files needed for the runtime environment in the same directory – Program Files\salesforce.com\Apex Data Loader APIversion, where APIversion corresponds to the API version. You may be trying to run the dataloader.jar file without being able to find the rest of these files. One easy solution to this issue is to use the default directory structures and run the process command from the \bin subdirectory in the default directory structure.
- I have some extraneous rows of data at the beginning of my .csv file. How can I get the Data Loader to ignore them?
Normally, Data Loader expects the first row of data to contain the column names for the file. The next row is considered the first row of real data. In cases where you might have additional rows that you do not want loaded, you simply set the property rowToStartAt to a value larger than 0. If you had one row of garbage following the field names, you would set this variable to 1.
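In the process-conf.xml sample above, that would be one more entry in the configOverrideMap. The exact key name varies by Data Loader version (some versions spell it process.loadRowToStartAt), so check the property reference for your installation:

```
<entry key="process.rowToStartAt" value="1"/>
```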
- Can I run multiple batch jobs with a single process command?
Each individual process requires its own separate command. You could make your own .bat file which combines multiple process commands, so you could run them with a single .bat file call.
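As a sketch, such a batch file might look like the following; the two process names are hypothetical Process Beans, and the relative path assumes the file runs from the \bin subdirectory:

```
@echo off
rem runJobs.bat - run several Data Loader processes in sequence
call process ..\conf csvInsertArtist
call process ..\conf csvUpsertAccount
```

The call statements ensure control returns to the batch file after each process command finishes.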
The purpose of this article was to give you a very simple example to lay the groundwork for more complex CLI Data Loader operations. There are a number of other sources that can help you with the implementation details of other types of tasks:
- The documentation for the Data Loader product
- The samples that are installed with the product. These samples are in the /samples subdirectory.
- This page, which gives you all the property options for the process-conf.xml file.
- This paper is an excellent account of one developer's experiences using the Data Loader through the command line.