[compress][io] Avoid InputStream#skip?

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

[compress][io] Avoid InputStream#skip?

Stefan Bodewig
Hi all

while looking into https://issues.apache.org/jira/browse/COMPRESS-449 I
realized that calling skip on any InputStream probably means you are up
to receiving an IOException because some stream in there decided to
implement skip via seek and now faces a stream that doesn't support
seek. And there is no way you can test for this.

It looks as if commons-io has decided to simple not use skip at all,
IOUtils.skip only uses read. commons-compress IOUtils.skip tries to be
smart with a combination of skip and read but is fooled when the stream
it reads from is System.in and the amount to skip is big enough.

OTOH not using skip but read on a stream that was happy to provide a
proper skip is a severe waste of time.

Any other ideas than playing save and wasting time?

Stefan

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [compress][io] Avoid InputStream#skip?

garydgregory
Hi All:

We should at the very least document all of this carefully.

My gut reaction is that an API called skip should call skip internally.
Doing anything else clever as you point out has performance implications
and might violate the principle of least surprise.

We could consider creating a Skipper interface that can be implemented by
with Skip and a SkipRead implementations. Then all of the code that used to
call skip would be parameterized. But this might be too framework-y for
[io], or [compress]

Gary

On Sun, Apr 22, 2018 at 9:52 AM, Stefan Bodewig <[hidden email]> wrote:

> Hi all
>
> while looking into https://issues.apache.org/jira/browse/COMPRESS-449 I
> realized that calling skip on any InputStream probably means you are up
> to receiving an IOException because some stream in there decided to
> implement skip via seek and now faces a stream that doesn't support
> seek. And there is no way you can test for this.
>
> It looks as if commons-io has decided to simple not use skip at all,
> IOUtils.skip only uses read. commons-compress IOUtils.skip tries to be
> smart with a combination of skip and read but is fooled when the stream
> it reads from is System.in and the amount to skip is big enough.
>
> OTOH not using skip but read on a stream that was happy to provide a
> proper skip is a severe waste of time.
>
> Any other ideas than playing save and wasting time?
>
> Stefan
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>
Reply | Threaded
Open this post in threaded view
|

Re: [compress][io] Avoid InputStream#skip?

Stefan Bodewig
On 2018-04-22, Gary Gregory wrote:

> We should at the very least document all of this carefully.

commons-io's IOUtils already quite clearly states it is using read and
not using skip at all.

> My gut reaction is that an API called skip should call skip internally.
> Doing anything else clever as you point out has performance implications
> and might violate the principle of least surprise.

Unfortunately I don't see any clean way to use skip at all.

> We could consider creating a Skipper interface that can be implemented by
> with Skip and a SkipRead implementations. Then all of the code that used to
> call skip would be parameterized. But this might be too framework-y for
> [io], or [compress]

Just grep for IOUtils.skip through the compress code base, it is used
all over the place. :-)

Stefan

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [compress][io] Avoid InputStream#skip?

sebb-2-2
On 23 April 2018 at 19:45, Stefan Bodewig <[hidden email]> wrote:

> On 2018-04-22, Gary Gregory wrote:
>
>> We should at the very least document all of this carefully.
>
> commons-io's IOUtils already quite clearly states it is using read and
> not using skip at all.
>
>> My gut reaction is that an API called skip should call skip internally.
>> Doing anything else clever as you point out has performance implications
>> and might violate the principle of least surprise.
>
> Unfortunately I don't see any clean way to use skip at all.

If skip only fails when trying to read a new buffer, it might be
possible to get the file position before calling skip.
This assumes that the failure does not occur after reading any further.

>> We could consider creating a Skipper interface that can be implemented by
>> with Skip and a SkipRead implementations. Then all of the code that used to
>> call skip would be parameterized. But this might be too framework-y for
>> [io], or [compress]
>
> Just grep for IOUtils.skip through the compress code base, it is used
> all over the place. :-)
>
> Stefan
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [compress][io] Avoid InputStream#skip?

Stefan Bodewig
On 2018-04-23, sebb wrote:

> On 23 April 2018 at 19:45, Stefan Bodewig <[hidden email]> wrote:
>> On 2018-04-22, Gary Gregory wrote:

>>> We should at the very least document all of this carefully.

>> commons-io's IOUtils already quite clearly states it is using read and
>> not using skip at all.

>>> My gut reaction is that an API called skip should call skip internally.
>>> Doing anything else clever as you point out has performance implications
>>> and might violate the principle of least surprise.

>> Unfortunately I don't see any clean way to use skip at all.

> If skip only fails when trying to read a new buffer, it might be
> possible to get the file position before calling skip.
> This assumes that the failure does not occur after reading any further.

I've tried to implement something like this, see the COMPRESS-449
branch. Unfortunately it breaks the assumption that I can mark() the
stream before calling skip() and reset() afterwards - see
Pack200TestCase#testInputStreamMethods()

Stefan

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [compress][io] Avoid InputStream#skip?

Stefan Bodewig
In reply to this post by Stefan Bodewig
Hi all

as skip throwing exception is uncommon enough that it hasn't been
reported before in Compress and only seems to happen if you use streams
like System.in maybe we can solve the issue by providing a wrapper
stream that overrides skip so that it uses read and advice people to use
that if they use special streams?

For Compress I really don't want to stop using skip as we potentially
use it to skip larger chunks of uncompressed archives when moving ahead
to the next entry without wanting to read the current one and reading
will mean a significant slowdown for everybody in order to avoid
problems in a few corner cases.

Stefan

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [compress][io] Avoid InputStream#skip?

sebb-2-2
On 1 May 2018 at 11:24, Stefan Bodewig <[hidden email]> wrote:
> Hi all
>
> as skip throwing exception is uncommon enough that it hasn't been
> reported before in Compress and only seems to happen if you use streams
> like System.in maybe we can solve the issue by providing a wrapper
> stream that overrides skip so that it uses read and advice people to use
> that if they use special streams?

+1

> For Compress I really don't want to stop using skip as we potentially
> use it to skip larger chunks of uncompressed archives when moving ahead
> to the next entry without wanting to read the current one and reading
> will mean a significant slowdown for everybody in order to avoid
> problems in a few corner cases.
>
> Stefan
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [compress][io] Avoid InputStream#skip?

Torsten Curdt-3
+1

On Tue, May 1, 2018 at 12:59 PM, sebb <[hidden email]> wrote:

> On 1 May 2018 at 11:24, Stefan Bodewig <[hidden email]> wrote:
> > Hi all
> >
> > as skip throwing exception is uncommon enough that it hasn't been
> > reported before in Compress and only seems to happen if you use streams
> > like System.in maybe we can solve the issue by providing a wrapper
> > stream that overrides skip so that it uses read and advice people to use
> > that if they use special streams?
>
> +1
>
> > For Compress I really don't want to stop using skip as we potentially
> > use it to skip larger chunks of uncompressed archives when moving ahead
> > to the next entry without wanting to read the current one and reading
> > will mean a significant slowdown for everybody in order to avoid
> > problems in a few corner cases.
> >
> > Stefan
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [hidden email]
> > For additional commands, e-mail: [hidden email]
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>
Reply | Threaded
Open this post in threaded view
|

Re: [compress][io] Avoid InputStream#skip?

Matt Sicker
Speaking of rarely used APIs in java.io, would that also include
mark/available/reset? Or are those all implemented in terms of skip()?

On 1 May 2018 at 06:05, Torsten Curdt <[hidden email]> wrote:

> +1
>
> On Tue, May 1, 2018 at 12:59 PM, sebb <[hidden email]> wrote:
>
> > On 1 May 2018 at 11:24, Stefan Bodewig <[hidden email]> wrote:
> > > Hi all
> > >
> > > as skip throwing exception is uncommon enough that it hasn't been
> > > reported before in Compress and only seems to happen if you use streams
> > > like System.in maybe we can solve the issue by providing a wrapper
> > > stream that overrides skip so that it uses read and advice people to
> use
> > > that if they use special streams?
> >
> > +1
> >
> > > For Compress I really don't want to stop using skip as we potentially
> > > use it to skip larger chunks of uncompressed archives when moving ahead
> > > to the next entry without wanting to read the current one and reading
> > > will mean a significant slowdown for everybody in order to avoid
> > > problems in a few corner cases.
> > >
> > > Stefan
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: [hidden email]
> > > For additional commands, e-mail: [hidden email]
> > >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [hidden email]
> > For additional commands, e-mail: [hidden email]
> >
> >
>



--
Matt Sicker <[hidden email]>