How to create a tab-separated file with Scrapy

Does Scrapy have a Tab Separated Exporter?

This is a classic case of good-news-bad-news.

Let’s look at the bad news first. 

Scrapy does not support the tab-separated file format out of the box. That leaves you with only one choice: write your own exporter class.

Here comes the good news. Tab-separated files are essentially CSV files, but instead of commas, these use tabs as column separators.
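To see that claim in plain Python, here is a minimal sketch that uses only the standard csv module, with no Scrapy involved. The sample rows and the dump helper are made up for illustration. The same rows are written twice, and the only thing that changes is the delimiter argument.

```python
import csv
import io

# Hypothetical sample data: a header row plus one record.
rows = [
    ["author", "text"],
    ["Jane Doe", "Hello, world"],
]


def dump(rows, delimiter):
    """Write rows to an in-memory buffer using the given column separator."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter)
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = dump(rows, ",")   # comma-separated
tsv_text = dump(rows, "\t")  # tab-separated: same writer, different delimiter

print(repr(csv_text))
print(repr(tsv_text))
```

Note that the comma-separated version has to quote "Hello, world" because the value contains a comma, while the tab-separated version can write it as-is.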

Scrapy bundles the CsvItemExporter class, which means that it supports CSV files out of the box.

We can create a subclass of CsvItemExporter and override the comma separator with a tab. Sounds simple? Great! Let’s see the details.

The all-new Tab Separated Item Exporter 

Create a new file, exporters.py, in the project folder. In my case, the Scrapy project is named tabs, so the file path is tabs/exporters.py.

Paste in the following code:

from scrapy.exporters import CsvItemExporter

class TabSeparatedItemExporter(CsvItemExporter):
    def __init__(self, *args, **kwargs):
        # Force the tab delimiter; any single-character separator works here
        kwargs["delimiter"] = "\t"
        super().__init__(*args, **kwargs)

That’s it! You have written a custom exporter.

Enable the exporter

Open settings.py, and set the FEED_EXPORTERS to the following dictionary:

FEED_EXPORTERS = {
    "tsv": "tabs.exporters.TabSeparatedItemExporter",
}

The dictionary key tsv can be anything that you want. I like tsv, because, well, it’s going to be a Tab Separated File 🙂.

How to use the custom exporter

There are two ways to use this exporter:

  • Enable FEEDS in settings.py
  • Use the -o switch with scrapy crawl

If you prefer the settings.py route, here is what you would need to add:

FEEDS = {
    "anyfile.anyextension": {
        "format": "tsv",  # must match the key used in FEED_EXPORTERS
    }
}

As you can see, the file name and its extension play no role here. Only the value of the format key matters, and it must match the key you defined in FEED_EXPORTERS.

If you don’t want to add FEEDS to the settings, you can run the spider with the -o switch followed by the file name:

scrapy crawl quotes -o my_file.tsv

In this case, the file extension is critical, because it determines which feed exporter is used. The .tsv extension matches the tsv key that we defined in FEED_EXPORTERS in settings.py, so our custom exporter is picked.
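To check the result, you can read the file back with Python's standard csv module. This is a minimal sketch, not part of the Scrapy project: read_tsv is a hypothetical helper, the author and text columns are made-up sample fields, and UTF-8 encoding is assumed. To keep the snippet self-contained it writes a tiny sample file first. In practice you would pass the path of the file your spider produced, such as my_file.tsv.

```python
import csv
import os
import tempfile


def read_tsv(path):
    """Read a tab-separated file with a header row into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


# Stand-in for the spider output: a tiny hand-written TSV file (assumed fields).
sample_path = os.path.join(tempfile.mkdtemp(), "my_file.tsv")
with open(sample_path, "w", newline="", encoding="utf-8") as handle:
    handle.write("author\ttext\r\nJane Doe\tHello, world\r\n")

rows = read_tsv(sample_path)
print(rows)
```

If every row comes back with the expected column names as keys, the tab delimiter was applied correctly.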

Did it work for you? Do let me know in the comments!
