Failing to index files with special characters in title or author names due to pdftotext.exe error
Hello,
I realized that 806 files in my Zotero library are unindexed. I attempted to index them with Rebuild Index... > Index Unindexed Items, and by right-clicking an individual file and selecting "Reindex Item", but neither approach worked. So I enabled Debug Output Logging and tried again to see what was going wrong. A number of the PDF files were old papers that do not contain text, so it makes sense that those could not be indexed. However, most of the PDF files that are failing to index are just regular PDFs of recently published papers, without any special security settings like encryption enabled. Still, when I attempted to Reindex Item for these PDFs, I was getting this error:
(1)(+0000069): Error running C:\Program Files (x86)\Zotero\pdftotext.exe
Looking at the debug log, I realized that many of these unindexed files have "special" characters in their file names, either because the titles contain Greek letters (e.g. α, β, μ) or unusual punctuation (e.g. 3′-DNA), or because the author names are not Anglicized and contain accents (e.g. ą, ć, ç, ł, ö). I did not intentionally include these characters in the file names when I added the files to my Zotero library. I usually just use my Google Chrome Zotero plugin to save new papers.
I tested whether these characters were indeed the cause of the error. Here is one example. A file that would not reindex was named "Gu et al_2020_Effect of the Short-Term Use of Fluoroquinolone and β-Lactam Antibiotics on.pdf". I opened the PDF's file location and renamed the file to "Gu et al_2020_Effect of the Short-Term Use of Fluoroquinolone and.pdf", eliminating the β character from the file name. After I did this, when I tried to open the PDF in Zotero, a "File Not Found" box popped up, as expected. So I clicked "Locate..." and opened the renamed PDF. I then right-clicked the file in Zotero and clicked "Reindex Item". This time the indexing worked fine, and when I looked at my Search preferences, the number of unindexed files had decreased from 806 to 805.
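To get a sense of how many files are affected before renaming anything by hand, something like the following could list the PDFs whose names contain non-ASCII characters. This is only a rough sketch: the function names are made up for illustration, the folder to scan is passed on the command line (where Zotero keeps attachments depends on the setup), and it only lists files. It does not rename anything, since renaming outside Zotero breaks the attachment link as described above.

```python
import sys
from pathlib import Path


def has_non_ascii(name: str) -> bool:
    """Return True if the file name contains any character outside ASCII."""
    return not name.isascii()


def find_non_ascii_pdfs(root):
    """Recursively collect PDF files under `root` whose names are not pure ASCII."""
    root = Path(root)
    return sorted(
        p
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() == ".pdf" and has_non_ascii(p.name)
    )


if __name__ == "__main__":
    # The folder to scan is an assumption supplied by the user,
    # e.g. the directory that holds the Zotero attachments.
    if len(sys.argv) != 2:
        sys.exit("usage: python find_non_ascii_pdfs.py <folder>")
    matches = find_non_ascii_pdfs(sys.argv[1])
    for path in matches:
        print(path)
    print(f"{len(matches)} PDF file(s) with non-ASCII characters in the name")
```

Note that this flags any non-ASCII character, so it may list files that index without trouble; it is just a way to narrow down the candidates.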
Repeating this process for hundreds of additional files is not something I really want to do. I am also a little confused as to why these characters in the PDF names are such an issue, since Zotero displays them fine and I can also search for them. Is there any way to update pdftotext.exe so that it can handle special characters in PDF file names?
I am running Zotero 6.0.36
Submitted with Debug ID D1432943356
-
dstillman
Fixed in the Zotero 7 beta, which no longer uses pdftotext.