How to create a tab-separated file with Scrapy

Does Scrapy have a Tab Separated Exporter?

Scrapy does not support the tab-separated file format out of the box. This means that you need to write a little code of your own, in the form of an exporter class.

Tab-separated (or tab-delimited) files are like CSV files, except that they use tabs instead of commas as column separators.
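As a quick illustration of the format itself, the following sketch uses only Python's standard csv module (not Scrapy) to write a few rows with a tab delimiter. The function name rows_to_tsv and the sample rows are made up for this example.

```python
import csv
import io


def rows_to_tsv(rows):
    """Serialize an iterable of rows into a tab-separated string."""
    buffer = io.StringIO()
    # Same writer that produces CSV; only the delimiter changes.
    writer = csv.writer(buffer, delimiter="\t")
    writer.writerows(rows)
    return buffer.getvalue()


sample = [["author", "text"], ["Jane Austen", "Hello, world"]]
print(rows_to_tsv(sample))
```

Because the comma is no longer the separator, a value such as "Hello, world" is written as-is, without surrounding quotes.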

Scrapy bundles the CsvItemExporter class, which supports CSV files out of the box.

We can create a subclass of CsvItemExporter and override the comma separator with a tab. Sounds simple? Great! Let’s see the details.

The all-new Tab Separated Item Exporter 

Create a new file, exporters.py, in the project folder. In my case, the name of the Scrapy project is tabs, so the file path is tabs/exporters.py.

Paste in the following code:

from scrapy.exporters import CsvItemExporter

class TabSeparatedItemExporter(CsvItemExporter):
    def __init__(self, *args, **kwargs):
        kwargs["delimiter"] = "\t"  # Use any one-character separator that you like
        super().__init__(*args, **kwargs)

That’s it! You have written a custom exporter.

Enable the exporter

Open settings.py, and set FEED_EXPORTERS to the following dictionary:

FEED_EXPORTERS = {
    "tsv": "tabs.exporters.TabSeparatedItemExporter",
}

The dictionary key tsv can be anything that you want. I like tsv, because, well, it’s going to be a Tab Separated File 🙂.

How to use the custom exporter

There are two ways to use this exporter:

  • Enable FEEDS in settings.py
  • Use the -o switch with scrapy crawl spider

If you prefer the settings.py route, here is what you need to add:

FEEDS = {
    "anyfile.anyextension": {
        "format": "tsv",  # this should match the FEED_EXPORTERS dictionary key
    },
}

As you can see, the file name plays no role in choosing the exporter. The value of the format key must match the key that you defined in FEED_EXPORTERS.

If you don’t want to add FEEDS to the settings, you can instead run the spider with -o followed by the file name:

scrapy crawl quotes -o my_file.tsv

In this case, the file extension is critical, because it determines which feed exporter is used. Here, the .tsv extension matches the tsv key that we registered in FEED_EXPORTERS in settings.py.
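The lookup can be pictured with a small sketch. This is a simplified model of the behavior described above, not Scrapy's actual implementation, and the function name pick_exporter is hypothetical.

```python
from pathlib import Path

# Same mapping as in settings.py
FEED_EXPORTERS = {"tsv": "tabs.exporters.TabSeparatedItemExporter"}


def pick_exporter(output_file, exporters):
    """Return the exporter path registered for the file's extension."""
    # "my_file.tsv" -> ".tsv" -> "tsv"
    extension = Path(output_file).suffix.lstrip(".")
    if extension not in exporters:
        raise ValueError(f"No exporter registered for extension {extension!r}")
    return exporters[extension]


print(pick_exporter("my_file.tsv", FEED_EXPORTERS))
```

In this model, my_file.tsv resolves to the custom exporter, while an extension with no matching key is rejected.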

Did it work for you? Do let me know in the comments!

Quick Review of creating tab-separated files with Scrapy

How to create Tab Separated files with Scrapy

  1. Create a new exporter from CsvItemExporter.

    Create a new file and, in it, define a class that inherits from scrapy.exporters.CsvItemExporter

  2. Set the delimiter to a tab character

    In the constructor of the new exporter, set kwargs["delimiter"] = "\t"

  3. Enable the exporter in settings.py

    In the settings, set FEED_EXPORTERS to {"tsv": "tabs.exporters.TabSeparatedItemExporter"}

  4. Use the -o switch with the new file extension

    Run the spider and use the -o switch with a file name that has the custom file extension. Note that the file extension must be the dictionary key used in FEED_EXPORTERS:
    scrapy crawl quotes -o output_file.tsv
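Once the spider has run, one way to check the result is to read the file back with Python's standard csv module. This is a minimal sketch: the helper name read_tsv is made up, and it assumes that the file is UTF-8 encoded and starts with a header row.

```python
import csv


def read_tsv(path):
    """Return the rows of a tab-separated file as a list of dicts."""
    # newline="" lets the csv module handle line endings itself.
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return list(reader)


# Example call, once output_file.tsv exists:
# rows = read_tsv("output_file.tsv")
```

Each row comes back as a dictionary keyed by the column names from the header line.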

Frequently Asked Questions

How to create CSV with Python

Scrapy has built-in exporters for CSV, JSON, and many other formats. Run your spider with -o your_file_name.csv
Note that the file extension is important. A quick tip: use an unsupported extension such as file_name.xyz, and Scrapy will report the file types supported by the current version.
