[COMPRESS] Anyone implemented "pigz"?

Roger Whitcomb
Someone here was doing benchmarks using "pigz" (see here: http://zlib.net/pigz/, basically multi-threaded "gzip") and I couldn't find any "reasonable" Java implementations.  Anyone thought about it for Commons Compress?

Thanks,
Roger Whitcomb

Re: [COMPRESS] Anyone implemented "pigz"?

garydgregory
I've not heard of it on the ML yet. Go for it! ;-)

Gary

Re: [COMPRESS] Anyone implemented "pigz"?

sebb-2-2
AFAICT the implementation is written in C and uses some C libraries.

It would have to be completely rewritten for Java.
Not a trivial job, though it may be possible to use the algorithm.

On 10 May 2017 at 01:03, Gary Gregory <[hidden email]> wrote:

> I've not heard of it on the ML yet. Go for it! ;-)
>
> Gary
>
> On Tue, May 9, 2017 at 4:44 PM, Roger Whitcomb <[hidden email]>
> wrote:
>
>> Someone here was doing benchmarks using "pigz" (see here:
>> http://zlib.net/pigz/, basically multi-threaded "gzip") and I couldn't
>> find any "reasonable" Java implementations.  Anyone thought about it for
>> Commons Compress?
>>
>> Thanks,
>> Roger Whitcomb
>>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: [COMPRESS] Anyone implemented "pigz"?

garydgregory
I think the question is can/should [Compress] use any of the stock code
in java.util.zip in a multi-threaded fashion for performance gains.

Gary
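
One low-effort way to use the stock java.util.zip classes from several threads is to compress fixed-size chunks independently with GZIPOutputStream and concatenate the resulting gzip members, which the gzip format allows. The following is only a sketch under that assumption; it is not what pigz does (pigz writes a single stream), and the class name, method names, and chunk size are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of Commons Compress.
final class ConcatenatedGzipSketch {

    /** Compresses one chunk into a complete, standalone gzip member. */
    static byte[] gzipChunk(byte[] data, int off, int len) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data, off, len);
        }
        return bos.toByteArray();
    }

    /** Compresses chunks in parallel and concatenates the members in input order. */
    static byte[] compress(byte[] input, int chunkSize, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<byte[]>> parts = new ArrayList<>();
            int off = 0;
            do { // do-while so that empty input still yields one (empty) member
                final int start = off;
                final int len = Math.min(chunkSize, input.length - off);
                Callable<byte[]> task = () -> gzipChunk(input, start, len);
                parts.add(pool.submit(task));
                off += len;
            } while (off < input.length);

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            for (Future<byte[]> part : parts) {
                out.write(part.get()); // futures are consumed in submission order
            }
            return out.toByteArray();
        } finally {
            pool.shutdown();
        }
    }

    /** Reads all members back with the stock GZIPInputStream. */
    static byte[] decompress(byte[] gz) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] input = new byte[300_000];
        for (int i = 0; i < input.length; i++) {
            input[i] = (byte) (i % 97);
        }
        byte[] gz = compress(input, 128 * 1024, 4);
        System.out.println("round trip ok: " + Arrays.equals(input, decompress(gz)));
    }
}
```

The trade-off is that every member carries its own header and trailer and starts with an empty compression window, so the output is somewhat larger than a single-stream gzip of the same data.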


Re: [COMPRESS] Anyone implemented "pigz"?

Matt Sicker
In reply to this post by sebb-2-2
Those C libraries are pthread (don't need that in Java as it has its own
thread API) and zlib (pretty standard gz library). With that in mind, this
may be a useful reference: http://www.jcraft.com/jzlib/


Re: [COMPRESS] Anyone implemented "pigz"?

Stefan Bodewig
In reply to this post by garydgregory
On 2017-05-10, Gary Gregory wrote:

> I think the question is can/should [Compress] use any of the stock code
> in java.util.zip in a multi-threaded fashion for performance gains.

We rely on java.util.zip.Deflater for DEFLATE which isn't thread safe by
itself.

But we could implement the same strategy pigz uses, which is to break up
the stream into chunks and work on the chunks in parallel. Combining the
output of several streams may become tricky using the Java API.

If my first read of the comments in
https://github.com/madler/pigz/blob/master/pigz.c is correct then we'd
need to manipulate the output of Deflater in order to strip headers and
trailers and insert empty stored blocks as well as create headers and
trailers of our own for the combined output.

In theory we could implement something like pigz on top of the LZ77
support I've added for Snappy and LZ4 (and some additional Huffman code
yet to be written) but it would be slower than zlib - probably a lot -
and likely eat up the speed gain provided by parallel processing.

Stefan
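
The chunk-and-combine strategy described above can be sketched with the stock Deflater, with a few assumptions stated up front. Constructing the Deflater with nowrap=true yields raw DEFLATE data, so there is no zlib header or trailer to strip. Every chunk except the last is ended with SYNC_FLUSH, which pads to a byte boundary with an empty stored block so the chunk outputs can simply be concatenated. The gzip header and trailer are written by hand. Unlike pigz, this sketch computes the CRC sequentially over the whole input and does not prime each chunk with the tail of the previous one, so it will compress slightly worse. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.CRC32;
import java.util.zip.Deflater;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, not part of Commons Compress or pigz.
final class ParallelGzipSketch {

    /** Raw-deflates one chunk with its own Deflater, since Deflater is not thread safe. */
    static byte[] deflateChunk(byte[] data, int off, int len, boolean last) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true); // true = raw DEFLATE
        try {
            deflater.setInput(data, off, len);
            if (last) {
                deflater.finish(); // only the last chunk emits the final block
            }
            int flush = last ? Deflater.NO_FLUSH : Deflater.SYNC_FLUSH;
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            do {
                n = deflater.deflate(buf, 0, buf.length, flush);
                out.write(buf, 0, n);
                // A full buffer means more output may be pending for this flush.
            } while (last ? !deflater.finished() : n == buf.length);
            return out.toByteArray();
        } finally {
            deflater.end();
        }
    }

    static void writeIntLE(ByteArrayOutputStream out, int v) {
        out.write(v & 0xff);
        out.write((v >>> 8) & 0xff);
        out.write((v >>> 16) & 0xff);
        out.write((v >>> 24) & 0xff);
    }

    /** Produces a single gzip stream from chunks deflated in parallel. */
    static byte[] compress(byte[] input, int chunkSize, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<byte[]>> parts = new ArrayList<>();
            int off = 0;
            do { // do-while so that empty input still yields a final block
                final int start = off;
                final int len = Math.min(chunkSize, input.length - off);
                final boolean last = start + len == input.length;
                Callable<byte[]> task = () -> deflateChunk(input, start, len, last);
                parts.add(pool.submit(task));
                off += len;
            } while (off < input.length);

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            // Minimal gzip header: magic, CM=8 (deflate), no flags, no mtime, OS=255 (unknown).
            out.write(new byte[] {0x1f, (byte) 0x8b, 8, 0, 0, 0, 0, 0, 0, (byte) 0xff}, 0, 10);
            for (Future<byte[]> part : parts) {
                byte[] b = part.get(); // consumed in submission order
                out.write(b, 0, b.length);
            }
            // Trailer: CRC-32 and length of the uncompressed data, both little-endian.
            CRC32 crc = new CRC32();
            crc.update(input, 0, input.length);
            writeIntLE(out, (int) crc.getValue());
            writeIntLE(out, input.length);
            return out.toByteArray();
        } finally {
            pool.shutdown();
        }
    }

    /** Reads the result back with the stock GZIPInputStream as a sanity check. */
    static byte[] gunzip(byte[] gz) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] input = new byte[300_000];
        for (int i = 0; i < input.length; i++) {
            input[i] = (byte) (i % 97);
        }
        byte[] gz = compress(input, 128 * 1024, 4);
        System.out.println("round trip ok: " + Arrays.equals(input, gunzip(gz)));
    }
}
```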


Re: [COMPRESS] Anyone implemented "pigz"?

Matt Sicker
Would the scattering and gathering byte channel APIs in java.nio be helpful
in splitting up a stream into chunks for parallel processing?


RE: [COMPRESS] Anyone implemented "pigz"?

Roger Whitcomb
In reply to this post by garydgregory
Exactly.

-----Original Message-----
From: Gary Gregory [mailto:[hidden email]]
Sent: Tuesday, May 09, 2017 5:29 PM
To: Commons Developers List <[hidden email]>
Subject: Re: [COMPRESS] Anyone implemented "pigz"?

I think the question is can/should [Compress] use any of the stock code in java.util.zip in a multi-threaded fashion for performance gains.

Gary


Re: [COMPRESS] Anyone implemented "pigz"?

Stefan Bodewig
In reply to this post by Matt Sicker
On 2017-05-10, Matt Sicker wrote:

> Would the scattering and gathering byte channel APIs in java.nio be helpful
> in splitting up a stream into chunks for parallel processing?

Possibly. pigz breaks up the stream into chunks of 128 KiB, and using the
scattering part we should be able to do the same. I'm not so sure about
gathering, as we'd need to massage the individual outputs and create new
overall headers and trailers.

Stefan
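
For the scattering half, a rough sketch of how a java.nio scattering read could fill several 128 KiB buffers in one pass might look like this. It assumes a blocking channel such as FileChannel (which implements ScatteringByteChannel); the class name, method name, and batch size are hypothetical, and each returned buffer would then be handed to a separate compression task.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.ScatteringByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Commons Compress.
final class ScatteringChunkReaderSketch {

    static final int CHUNK_SIZE = 128 * 1024; // the chunk size mentioned above

    /**
     * Reads up to {@code batch} chunks from a blocking channel.
     * Returns only the buffers that received data, flipped and ready for reading.
     */
    static List<ByteBuffer> readBatch(ScatteringByteChannel ch, int batch) throws IOException {
        ByteBuffer[] bufs = new ByteBuffer[batch];
        for (int i = 0; i < batch; i++) {
            bufs[i] = ByteBuffer.allocate(CHUNK_SIZE);
        }
        // A single read may deliver fewer bytes than requested,
        // so keep reading until the last buffer is full or the channel hits EOF.
        while (bufs[batch - 1].hasRemaining()) {
            if (ch.read(bufs) < 0) {
                break;
            }
        }
        List<ByteBuffer> filled = new ArrayList<>();
        for (ByteBuffer b : bufs) {
            if (b.position() > 0) {
                b.flip();
                filled.add(b);
            }
        }
        return filled;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("scatter-demo", ".bin");
        try {
            Files.write(tmp, new byte[300_000]);
            try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
                for (ByteBuffer chunk : readBatch(ch, 4)) {
                    System.out.println("chunk of " + chunk.remaining() + " bytes");
                }
            }
        } finally {
            Files.delete(tmp);
        }
    }
}
```

The gathering side is deliberately left out: as noted above, the per-chunk outputs would still need their own combined header and trailer rather than a plain gathering write.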
