[Z7 Beta] Random (?) sentences in the outline view

samvimes · June 6, 2024

Hi all,

when working with this PDF ( https://www.degruyter.com/document/doi/10.1515/9781400856626.159/html), I've noticed that Zotero shows an outline as available, which was already a surprise because Firefox doesn't show an outline for this document.

Apparently there is some automatic outline detection happening (which would be a great feature), but in this case it is not very helpful; it just shows some apparently random sentences in the outline view:

https://s3.amazonaws.com/zotero.org/images/forums/u5025031/oas4su82aldgac7hmwhi.png

martynas_b · June 6, 2024

Yes, this feature is still experimental, but it will improve over time. Thanks for reporting.

samvimes · June 11, 2024

So far I've noticed quite a few cases in which the outline extraction produces somewhat unhelpful results. In this PDF, for example ( https://www.degruyter.com/document/doi/10.1524/9783050060187/html), Zotero just recognizes the entries of the PDF's table of contents as chapter headings:

https://s3.amazonaws.com/zotero.org/images/forums/u5025031/q0vtjjvsbbd5knuuwxar.png

Is it helpful to you if I report all the problems I'm seeing with this feature?

martynas_b · June 11, 2024

Is it helpful to you if I report all the problems I'm seeing with this feature?

Definitely helpful!

samvimes · June 11, 2024

Okay, I'll keep them coming in this case:

In this book ( http://link.springer.com/10.1007/978-3-322-80378-8), the outline contains only the book's title:
https://s3.amazonaws.com/zotero.org/images/forums/u5025031/qvbuogn2t2sinkd5nyrz.png

In this text ( https://www.nomos-elibrary.de/index.php?doi=10.5771/0023-5652-2015-182-78) it just extracts the first sentence and the title:
https://s3.amazonaws.com/zotero.org/images/forums/u5025031/btp7pt5czzr9h7hayw3g.png

In another scanned and OCRed book, it recognizes only part of one chapter heading:
https://s3.amazonaws.com/zotero.org/images/forums/u5025031/3kloi0gcn21ix2diae49.png

poettli · June 11, 2024

This paper: https://doi.org/10.2151/sola.15A-012

The detected outline:
https://s3.amazonaws.com/zotero.org/images/forums/u2119014/a6zaj4vkkfs1pk7lge32.png

mjthoraval · June 11, 2024

Another OA example: https://doi.org/10.1063/5.0086745

https://s3.amazonaws.com/zotero.org/images/forums/u265723/a2jheecqqm6kijs2iis7.png

In this case, I had removed the first page of the PDF file before generating the outline to obtain this result.

If I keep the first page, here is the result:
https://s3.amazonaws.com/zotero.org/images/forums/u265723/rpl57rjak7jf6zudcwq5.png

Zotero 7.0.0-beta.85+c0c00a00e (64-bit)
Windows 10

mjthoraval · June 11, 2024

Another OA example, this one showing that the detection is drawn to punctuation in equations: https://doi.org/10.1103/PhysRevFluids.9.053301

https://s3.amazonaws.com/zotero.org/images/forums/u265723/vjnz99zpq74z8fh9v9em.png

Note that it works nicely in some cases, so this feature is really useful to have, even if it only partially works.

Zotero 7.0.0-beta.85+c0c00a00e (64-bit)
Windows 10

mjthoraval · June 12, 2024

I have also found three books where I see the problem.
I have sent them to support@zotero.org.

https://s3.amazonaws.com/zotero.org/images/forums/u265723/5097ld8ln3sl29v03rkg.png

https://s3.amazonaws.com/zotero.org/images/forums/u265723/lubofuyzmziidpul78rj.png

https://s3.amazonaws.com/zotero.org/images/forums/u265723/i1ulc6dqmnf8aqdvq3ve.png

mjthoraval · June 14, 2024

Another interesting recent OA article: https://doi.org/10.1021/acsami.3c17037
It extracts some useful bookmarks, but the hierarchical structure is not recognized:

https://s3.amazonaws.com/zotero.org/images/forums/u265723/k09gyqyo1sc0itv6hhtp.png

Zotero 7.0.0-beta.87+f59a4da7f (64-bit)
Windows 10

samvimes · June 22, 2024

In an OCRed PDF, Zotero detects some random sentences as the outline:

https://s3.amazonaws.com/zotero.org/images/forums/u5025031/4kv0unsu2103h72jibqz.png

Maybe it is because I quite often work with OCRed texts, but so far the outline detection feature has rarely been successful for me.