Extract Files from Salesforce
The Problem
If you’ve ever needed to download files from Salesforce in bulk, you know it’s not straightforward. Whether you’re migrating data, backing up files, or auditing attachments, the native UI forces you to click through records one by one. For orgs with hundreds or thousands of files, this quickly becomes impractical.
The Salesforce Files (ContentVersion) system is powerful, but extracting files programmatically requires dealing with the API, managing authentication, and organizing the downloaded files in a meaningful way. This is especially painful when files are scattered across different parent records like Accounts, Cases, or Opportunities.
The Solution
I built a Node.js tool that automates the entire process of extracting files from Salesforce. It queries ContentVersion records, downloads the actual file data, and automatically organizes them into folders based on their linked parent records. All the files end up neatly organized in a local files folder, ready for whatever you need to do with them.
Why You’d Want This
Here are some real-world scenarios where this tool has been invaluable:
Data Migration: When moving to a new system or consolidating orgs, you need all those files extracted and organized by account or record type.
Backup and Archival: Regular backups of file attachments for compliance or disaster recovery purposes.
Audit and Analysis: Download all files related to specific accounts or record types for external review or processing.
Development and Testing: Pull production files into a sandbox environment for realistic testing scenarios.
File Cleanup Projects: Before deleting old records, extract their files for archival purposes.
How It Works
The tool uses the Salesforce REST API to query ContentVersion records and the records they are linked to (the LinkedEntity relationship, which Salesforce exposes on ContentDocumentLink). You can customize the query to target specific objects, date ranges, or other criteria. Each file is then downloaded and placed in a folder named after its parent record, so it is easy to see which file belongs to which record.
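To make the two REST calls concrete, here is a minimal sketch of how a query URL and a file download could look. It assumes Node 18 or later (for the global fetch), an access token you already obtained, and API version v58.0. The function names are hypothetical and are not taken from the tool's server.js.

```javascript
// Hypothetical helpers; the API version and all names are assumptions for illustration.
const API_VERSION = "v58.0";

// URL for running a SOQL query through the REST API query resource.
function buildQueryUrl(instanceUrl, soql) {
  return `${instanceUrl}/services/data/${API_VERSION}/query?q=${encodeURIComponent(soql)}`;
}

// URL for the binary body (VersionData) of a single ContentVersion record.
function buildDownloadUrl(instanceUrl, contentVersionId) {
  return `${instanceUrl}/services/data/${API_VERSION}/sobjects/ContentVersion/${contentVersionId}/VersionData`;
}

// Fetch the file bytes for one ContentVersion and return them as a Buffer.
async function downloadVersion(instanceUrl, accessToken, contentVersionId) {
  const response = await fetch(buildDownloadUrl(instanceUrl, contentVersionId), {
    headers: { Authorization: `Bearer ${accessToken}` },
  });
  if (!response.ok) {
    throw new Error(`Download failed for ${contentVersionId}: HTTP ${response.status}`);
  }
  return Buffer.from(await response.arrayBuffer());
}
```

Keeping the URL builders separate from the network call makes them easy to check without touching a real org.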
For example, if you’re extracting files from Accounts, you might end up with a structure like:
files/
├── Acme Corporation/
│   ├── contract.pdf
│   └── proposal.docx
├── Global Industries/
│   └── presentation.pptx
└── Tech Solutions Inc/
    ├── invoice.pdf
    └── specs.xlsx
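A folder layout like this depends on turning record names into safe folder names. The sketch below shows one way to do that. The helper names and the replacement character are assumptions, not the tool's actual behavior, and it assumes the file title may or may not already carry its extension.

```javascript
const path = require("node:path");

// Replace characters that are invalid in file or folder names on common platforms.
function sanitizeName(name) {
  const cleaned = String(name).replace(/[<>:"\/\\|?*\x00-\x1f]/g, "_").trim();
  return cleaned === "" ? "_unnamed" : cleaned;
}

// Build <rootDir>/<parent record name>/<file name>, adding the extension
// only when the title does not already end with it.
function buildTargetPath(rootDir, parentName, title, extension) {
  const suffix = extension ? `.${extension}` : "";
  const hasSuffix = title.toLowerCase().endsWith(suffix.toLowerCase());
  const fileName = hasSuffix ? title : title + suffix;
  return path.join(rootDir, sanitizeName(parentName), sanitizeName(fileName));
}
```

For instance, buildTargetPath("files", "Acme Corporation", "contract", "pdf") resolves to the contract.pdf entry under the Acme Corporation folder.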
Setup and Usage
Prerequisites
Make sure you have Node.js installed on your machine. This tool was built with Node v19.3.0, but any recent version should work.
Installation
- Clone the repository from GitHub
- Run npm install to install the required dependencies
- Copy .env.example to .env and populate it with your Salesforce credentials:
  - Domain name (e.g., yourorg.my.salesforce.com)
  - Username
  - Password
  - Security token
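Before running anything, it can help to check that all four settings are present. The sketch below does that for a plain object such as process.env. The variable names (SF_DOMAIN and so on) are hypothetical, so check .env.example in the repository for the real keys.

```javascript
// Hypothetical variable names; the real keys are defined in the repository's .env.example.
const REQUIRED_KEYS = ["SF_DOMAIN", "SF_USERNAME", "SF_PASSWORD", "SF_SECURITY_TOKEN"];

// Validate the settings and return them in a shape convenient for a login call.
function readCredentials(env) {
  const missing = REQUIRED_KEYS.filter((key) => !env[key] || env[key].trim() === "");
  if (missing.length > 0) {
    throw new Error(`Missing required settings: ${missing.join(", ")}`);
  }
  return {
    loginUrl: `https://${env.SF_DOMAIN}`,
    username: env.SF_USERNAME,
    // Username-password API logins expect the security token appended to the password.
    passwordWithToken: env.SF_PASSWORD + env.SF_SECURITY_TOKEN,
  };
}
```

Calling readCredentials(process.env) after the .env file has been loaded fails fast with a list of whatever is missing, rather than surfacing later as an opaque login error.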
Configuring the Query
The tool’s power comes from its flexibility. Open server.js and modify the SOQL query to target exactly what you need. By default, it queries all ContentVersion records linked to Accounts, but you can easily adjust this:
// Example: Only get files from the last 6 months
WHERE LinkedEntity.Type = 'Account'
AND CreatedDate = LAST_N_MONTHS:6
// Example: Only get PDFs from specific accounts
WHERE LinkedEntity.Type = 'Account'
AND FileExtension = 'pdf'
AND LinkedEntity.Name IN ('Acme Corp', 'Global Inc')
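If the account names come from a list rather than being typed by hand, the IN clause can be generated. This is a small sketch with hypothetical helper names. It escapes backslashes and single quotes, which SOQL string literals require.

```javascript
// Escape a value for use inside a single-quoted SOQL string literal.
function escapeSoqlString(value) {
  return String(value).replace(/\\/g, "\\\\").replace(/'/g, "\\'");
}

// Build a "<field> IN ('a', 'b')" condition from a list of values.
function buildInCondition(field, values) {
  if (values.length === 0) {
    throw new Error("IN list needs at least one value");
  }
  const quoted = values.map((value) => `'${escapeSoqlString(value)}'`);
  return `${field} IN (${quoted.join(", ")})`;
}
```

Calling buildInCondition("LinkedEntity.Name", ["Acme Corp", "Global Inc"]) reproduces the last line of the second example, and a name containing an apostrophe no longer breaks the query.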
Running the Tool
Once configured, simply run:
node server.js
The tool will authenticate to Salesforce, execute your query, and start downloading files. Progress is logged to the console, and all files are saved to the files directory.
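For large result sets, the REST API returns query results in pages. Each response carries a done flag and, when more records remain, a nextRecordsUrl. The sketch below shows how those pages could be collected. It is not the tool's actual code, and fetchJson is a hypothetical function you supply that requests a URL and returns the parsed JSON body.

```javascript
// Collect every record for a query by following nextRecordsUrl until done is true.
// fetchJson is a caller-supplied function: (url) => Promise resolving to the parsed JSON body.
async function collectAllRecords(fetchJson, firstUrl) {
  const records = [];
  let url = firstUrl;
  while (url) {
    const page = await fetchJson(url);
    records.push(...page.records);
    // nextRecordsUrl is only present while more pages remain.
    url = page.done ? null : page.nextRecordsUrl;
  }
  return records;
}
```

Note that nextRecordsUrl is a path relative to the instance, so fetchJson needs to prefix it with your org's domain.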
Tips for Daily Use
Start Small: Test with a limited query first (maybe just one account) to make sure everything works as expected before running a massive download.
Watch Your API Limits: Large file extractions can consume API calls quickly. Monitor your org’s API usage if you’re working with thousands of files.
Use Specific Queries: The more targeted your query, the faster and more useful your results. Filter by date, record type, or file extension to get exactly what you need.
Check File Extensions: Some files might have unusual or missing extensions. The tool preserves the original file type, but you might want to add validation or filtering based on your needs.
Schedule Regular Backups: Add this to a cron job or scheduled task for regular, automated backups of your Salesforce files.
Version Control Your Queries: Save different query configurations for different use cases. You might have one for monthly backups, another for account-specific extractions, etc.
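For the API limits tip, a rough estimate before a big run can be useful. The sketch below assumes one API request per query page (up to 2,000 records by default) plus one per file download, and it ignores login calls. The function names are hypothetical, and limitsBody stands for the JSON returned by the REST API limits resource.

```javascript
// Rough estimate: one call per query page plus one call per file download.
// Assumes the default REST query page size of up to 2,000 records.
function estimateApiCalls(fileCount, pageSize = 2000) {
  const queryCalls = Math.max(1, Math.ceil(fileCount / pageSize));
  return queryCalls + fileCount;
}

// Compare the estimate with the remaining daily requests reported by the limits resource.
function hasApiHeadroom(limitsBody, neededCalls) {
  return limitsBody.DailyApiRequests.Remaining >= neededCalls;
}
```

If the estimate comes close to the remaining daily allowance, splitting the extraction across several days or narrowing the query is the safer option.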
Common Use Cases in Development
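Sandbox Refreshes: A refreshed sandbox does not always include files; Developer and Developer Pro sandboxes, for example, copy no record data at all. Use this tool to pull the files you need from production so you can restore them selectively for testing.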
Cross-Org Migrations: Moving data between orgs? Extract files from the source org and upload them to the destination.
Test Data Generation: Download sample files from production to use as test data in your development environments.
Compliance Reporting: Extract files for specific date ranges or record types when responding to audit requests.
Conclusion
File management in Salesforce shouldn’t require clicking through hundreds of records. Whether you’re migrating data, backing up files, or just need to analyze attachments at scale, having a programmatic solution saves hours of manual work.
The tool is open source and available on GitHub, so feel free to customize it for your specific needs or contribute improvements back to the community.
This post was partially written with assistance from Claude AI to provide additional context and usage examples.