Does Scrapy have a Tab Separated Exporter?
Scrapy does not support the tab-separated file format out of the box, so you need to write a small exporter class of your own.
Tab-separated (or tab-delimited) files are laid out like CSV files, except that tabs rather than commas separate the columns.
Scrapy bundles the CsvItemExporter class, which supports CSV files out of the box.
We can create a new subclass of the CsvItemExporter class and override the comma separator with a tab separator. Sounds simple? Great! Let’s see the details.
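To see why changing the delimiter is enough, here is a minimal sketch that uses only Python's standard csv module, which CsvItemExporter builds on. The to_tsv helper and the sample rows are made up for illustration and are not part of Scrapy.

```python
import csv
import io


def to_tsv(rows):
    """Serialize rows (lists of values) as tab-separated text."""
    buffer = io.StringIO()
    # Same writer as for CSV; only the delimiter changes.
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample data, for illustration only.
sample = [["author", "quote"], ["Jane Austen", "A quote, with a comma"]]
print(to_tsv(sample))
```

The comma inside the second field needs no quoting here, because only the tab is treated as a column separator.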
The all-new Tab Separated Item Exporter
Create a new file, exporters.py, in the project folder. In my case, the name of the Scrapy project is tabs, so the file path is tabs/exporters.py.
Paste in the following code:
from scrapy.exporters import CsvItemExporter


class TabSeparatedItemExporter(CsvItemExporter):
    def __init__(self, *args, **kwargs):
        # Use any one-character separator that you like
        kwargs["delimiter"] = "\t"
        super().__init__(*args, **kwargs)
That’s it! You have written a custom exporter.
Enable the exporter
Open settings.py, and set the FEED_EXPORTERS to the following dictionary:
FEED_EXPORTERS = {
    "tsv": "tabs.exporters.TabSeparatedItemExporter",
}
The dictionary key tsv can be anything that you want. I like tsv because, well, it’s going to be a Tab Separated File 🙂.
How to use the custom exporter
There are two ways to use this exporter:
- Enable FEEDS in settings.py
- Use the -o switch with scrapy crawl spider
If you prefer the settings.py route, here is what you need to add:
FEEDS = {
    "anyfile.anyextension": {
        "format": "tsv",  # this must be a FEED_EXPORTERS dictionary key
    }
}
As you can see, the file name plays no role in choosing the exporter. The value of the format key must match a key that you defined in FEED_EXPORTERS.
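Put together, the two settings might look like the sketch below. The path output/quotes.tsv is a made-up example, and the encoding and overwrite keys are optional FEEDS options whose availability depends on your Scrapy version, so treat them as assumptions to check against the documentation for your release.

```python
# settings.py (sketch)

# Register the custom exporter under the format name "tsv".
FEED_EXPORTERS = {
    "tsv": "tabs.exporters.TabSeparatedItemExporter",
}

# Hypothetical output path; any file name works here.
FEEDS = {
    "output/quotes.tsv": {
        "format": "tsv",     # must be a key of FEED_EXPORTERS
        "encoding": "utf8",  # optional
        "overwrite": True,   # optional; assumes a Scrapy version that supports it
    },
}
```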
If you don’t want to add FEEDS in the settings, you can instead run the spider with -o followed by a file name:
scrapy crawl quotes -o my_file.tsv
In this case, the file extension is critical, because it determines which feed exporter is used. Here, the tsv extension matches the key of our custom exporter in FEED_EXPORTERS in settings.py.
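One way to check the result is to read the file back with Python's csv module. This is only a sketch: it assumes the exporter wrote a header row and that the file is UTF-8 encoded, and read_tsv is a hypothetical helper name, not something Scrapy provides.

```python
import csv


def read_tsv(path):
    """Return the rows of a tab-separated file as a list of dicts."""
    # newline="" lets the csv module handle line endings itself.
    with open(path, newline="", encoding="utf-8") as handle:
        # The first row is taken as the column names.
        return list(csv.DictReader(handle, delimiter="\t"))
```

After the crawl, calling read_tsv("my_file.tsv") should give one dictionary per scraped item, keyed by the column names in the header row.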
Did it work for you? Do let me know in the comments!
Quick Review of creating tab-separated files with Scrapy
How to create Tab Separated files with Scrapy
- Create a new exporter from CsvItemExporter. Create a new file and define a class in it that inherits from scrapy.exporters.CsvItemExporter.
- Set the delimiter to a tab character. In the constructor of the new exporter, set kwargs["delimiter"] = "\t".
- Enable the exporter in settings.py. In the settings, set FEED_EXPORTERS to {"tsv": "tabs.exporters.TabSeparatedItemExporter"}.
- Use the -o switch with the new file extension. Run the spider and use the -o switch with a file name that has the custom file extension. Note that the file extension is the dictionary key of the feed exporter:
  scrapy crawl quotes -o output_file.tsv
Frequently Asked Questions
How to create CSV with Python
Scrapy has built-in exporters for CSV, JSON, and many other formats. Run your spider with -o your_file_name.csv
Note that the file extension is important. A quick tip: use an unrecognized extension such as file_name.xyz, and Scrapy will report an error that lists all the file types supported by the current version of Scrapy.