[COMPRESS] tar files and missing bytes?

[COMPRESS] tar files and missing bytes?

Tim Allison
All,
  We recently made TikaInputStream's skip() inherently strict so that it
throws an EOF if a parser tries to skip past the end of a file.  We didn't
notice any problems in our regression tests (aside from some likely
truncated mp4s), but we recently got an issue [1] from a user where this is
a problem for a tar file created by 7z [2].
  Is this a valid tar, or are we right to throw an EOF?

         Thank you.

                   Best,

                       Tim

[1] https://issues.apache.org/jira/browse/TIKA-3110
[2] https://github.com/AlexOkayJ/apache-tika-tar-issue/blob/master/src/main/resources/7ztar.tar

Re: [COMPRESS] tar files and missing bytes?

Stefan Bodewig
On 2020-06-11, Tim Allison wrote:

>   We recently made TikaInputStream's skip() inherently strict so that it
> throws an EOF if a parser tries to skip past the end of a file.  We didn't
> notice any problems in our regression tests (aside from some likely
> truncated mp4s), but we recently got an issue [1] from a user where this is
> a problem for a tar file created by 7z [2].

>   Is this a valid tar, or are we right to throw an EOF?

Yes, it is, unfortunately. It somewhat depends on what you consider
"valid".

I saw the mail about the TIKA issue before I found this mail, see
https://issues.apache.org/jira/browse/TIKA-3110?focusedCommentId=17134328&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17134328

Stefan
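
For readers unfamiliar with what a "strict" skip() means in the first message,
here is a minimal sketch in plain Java. It is not Tika's actual code: the class
name StrictSkipInputStream and its error message are hypothetical, and it only
assumes the standard java.io contract, under which InputStream.skip() may
return fewer bytes than requested without signalling an error.

```java
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical wrapper for illustration only; not TikaInputStream. */
class StrictSkipInputStream extends FilterInputStream {

    StrictSkipInputStream(InputStream in) {
        super(in);
    }

    /**
     * Skips exactly n bytes, or throws EOFException if the underlying
     * stream ends first. A non-positive n skips nothing and returns 0.
     */
    @Override
    public long skip(long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped > 0) {
                remaining -= skipped;
                continue;
            }
            // skip() may legally return 0 before the end of the stream,
            // so read a single byte to tell "no progress" apart from EOF.
            if (in.read() == -1) {
                throw new EOFException("tried to skip " + n
                        + " bytes, but only " + (n - remaining)
                        + " were available");
            }
            remaining--;
        }
        return Math.max(n, 0);
    }
}
```

Wrapped around a 10-byte ByteArrayInputStream, skip(10) returns 10, while
skip(11) throws an EOFException. A lenient reader would instead return the
short count and leave the decision to the caller, which is the difference
that matters for an archive whose final entry is shorter than its header
implies.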
