[FILEUPLOAD] RfC: Proposed API changes for streaming

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

[FILEUPLOAD] RfC: Proposed API changes for streaming

jochen-2

Hi,

first of all, let me say thanks for giving me the possibility to contribute
to the project as a committer. Let's hope, I'll be doing fine. ;-)

As already written, the changes required for streaming are nontrivial. I'd
like to discuss them even if I were an experienced project member. Which I
am not, so please be alert even more.

The main reason for API changes is, that the existing interface FileItem is
to powerful. Therefore I propose to introduce a new interface, let's call it
SlimFileItem, or StreamingFileItem, with the following signature:

    public interface SlimFileItem extends Serializable {
        InputStream getInputStream() throws IOException;
        String getContentType();
        String getName();
        String getFieldName();
        boolean isFormField();
    }

As you can see, the FileItem can be changed to extend SlimFileItem.
Additionally, FileItem has a semantic difference: It allows to invoke
getInputStream() more than once, while SlimFileItem should throw an
IllegalStateException on the second invocation. So far, these changes should
not introduce a problem.

The SlimFileItem is not to be created by a factory: The need for a factory
is mainly caused by the fact, that the FileItem's data needs to be written.
SlimFileItem is best implemented as an inner class of the MultipartStream.

So the API changes can be reduced to an additional method

    Iterator /* SlimFileItem */ getItemIterator(RequestContext ctx)
      throws FileUploadException;

in FileUploadBase. The existing method

    List parseRequest(RequestContext ctx) throws FileUploadException;

would be changed to use getItemIterator internally and convert the
SlimFileItem instances into FileItem.

Finally, a suggestion for simplification: Let's change the
FileUploadException to extend IOException, and not Exception. It will avoid
a lot of casting, type checking, and conversions. Besides, it also makes
sense in the semantic level. At least, I do thing so.

The only argument I can see against the exception change is the following:
People will see compiler errors when using constructs like

    try {
       ...
    } catch (IOException e) {
       ...
    } catch (FileUploadException e) {
       ...
    }

The obvious solution would be to change the order. IMO, that's acceptable.


Regards for any comments,

Jochen

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [FILEUPLOAD] RfC: Proposed API changes for streaming

Martin Cooper-3
On 6/9/06, Jochen Wiedmann <[hidden email]> wrote:
>
>
> Hi,
>
> first of all, let me say thanks for giving me the possibility to
> contribute
> to the project as a committer. Let's hope, I'll be doing fine. ;-)


Let's hope. ;-) Please add youself to the list of developers in the
project.xml file.

As already written, the changes required for streaming are nontrivial.


This looks interesting. However, given the non-trivial nature of the
changes, and the "stylistic" effect on the way FileUpload works, I wonder if
this wouldn't be better incorporated into the FileUpload 2 design, rather
than being retrofitted into FileUpload 1.x.

Sadly, there is no one place where the goals for FileUpload 2 are written
down right now. Some are in existing enhancement requests, some are
mentioned in random old e-mail threads, and some are still just in my head.
Now that there are two of us, I guess I'd better try to find some time to
write things down. ;-)

I'd
> like to discuss them even if I were an experienced project member. Which I
> am not, so please be alert even more.


I've had coffee. I'm alert. ;-)

The main reason for API changes is, that the existing interface FileItem is
> to powerful.


By "too powerful", you mean that it tackles two tasks - providing an
interface to the incoming data, and implicitly providing storage for it.
That's true, and if we want to separate those functions, then perhaps that
calls for a more radical separation than you are suggesting.

Therefore I propose to introduce a new interface, let's call it
> SlimFileItem, or StreamingFileItem, with the following signature:


I don't like "Slim". "Stream" or "Streaming" would be better. But I'm
wondering if the relationship between this and the existing notion of a
FileItem should be so direct. I'm really thinking off the top of my head
here, and haven't thought it through, though, so I don't have any more
concrete ideas at this point. It's worth some thought, though.

    public interface SlimFileItem extends Serializable {

>         InputStream getInputStream() throws IOException;
>         String getContentType();
>         String getName();
>         String getFieldName();
>         boolean isFormField();
>     }
>
> As you can see, the FileItem can be changed to extend SlimFileItem.
> Additionally, FileItem has a semantic difference: It allows to invoke
> getInputStream() more than once, while SlimFileItem should throw an
> IllegalStateException on the second invocation. So far, these changes
> should
> not introduce a problem.


Well, I guess I don't agree here. If FileItem extended  SimFileItem, then I
could pass a FileItem to methods that accept a SlimFileItem. At that point,
that SlimFileItem may or may not throw an exception if getInputStream is
called more than once. That's bad API behaviour.

The SlimFileItem is not to be created by a factory: The need for a factory
> is mainly caused by the fact, that the FileItem's data needs to be
> written.


Sort of. It's really that FileItem is accomplishing two different tasks, as
I noted above.

SlimFileItem is best implemented as an inner class of the MultipartStream.


Hmm. Not if you expect to expose SlimFileItem as part of the public API,
which you are suggesting with the getItemIterator method below.
Multipartstream isn't (supposed to be) part of the public API, so exposing
an inner class of it through the public API is a no-no.

So the API changes can be reduced to an additional method

>
>     Iterator /* SlimFileItem */ getItemIterator(RequestContext ctx)
>       throws FileUploadException;
>
> in FileUploadBase. The existing method
>
>     List parseRequest(RequestContext ctx) throws FileUploadException;
>
> would be changed to use getItemIterator internally and convert the
> SlimFileItem instances into FileItem.
>
> Finally, a suggestion for simplification: Let's change the
> FileUploadException to extend IOException, and not Exception.


No. A FileUploadException isn't always related to IO, so extending
IOException would simply be wrong.

--
Martin Cooper


It will avoid

> a lot of casting, type checking, and conversions. Besides, it also makes
> sense in the semantic level. At least, I do thing so.
>
> The only argument I can see against the exception change is the following:
> People will see compiler errors when using constructs like
>
>     try {
>        ...
>     } catch (IOException e) {
>        ...
>     } catch (FileUploadException e) {
>        ...
>     }
>
> The obvious solution would be to change the order. IMO, that's acceptable.
>
>
> Regards for any comments,
>
> Jochen
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>
Reply | Threaded
Open this post in threaded view
|

Re: [FILEUPLOAD] RfC: Proposed API changes for streaming

jochen-2
Martin Cooper wrote:

> Let's hope. ;-) Please add youself to the list of developers in the
> project.xml file.

Done.


> This looks interesting. However, given the non-trivial nature of the
> changes, and the "stylistic" effect on the way FileUpload works, I
> wonder if
> this wouldn't be better incorporated into the FileUpload 2 design, rather
> than being retrofitted into FileUpload 1.x.

I am open for both. In either way, we can start in a branch. That leaves
everything open, doesn't it? I have one concern, though: Please let's try to
have a at least a beta release within short time, say three months?


> Well, I guess I don't agree here. If FileItem extended  SimFileItem, then I
> could pass a FileItem to methods that accept a SlimFileItem. At that point,
> that SlimFileItem may or may not throw an exception if getInputStream is
> called more than once. That's bad API behaviour.

Personally I have no problem with my suggestion: If a method would accept a
StreamingFileItem, then that method should not expect, that getInputStream()
cannot be used more than once, so that's okay.

However, let's say that FileItem doesn't extend StreamingFileItem. Fine with me.


> SlimFileItem is best implemented as an inner class of the MultipartStream.
>
>
> Hmm. Not if you expect to expose SlimFileItem as part of the public API,
> which you are suggesting with the getItemIterator method below.

That's something I do not understand. The StreamingFileItem is an interface,
which is exposed. What's the problem, if the implementation is an inner class?


> Multipartstream isn't (supposed to be) part of the public API, so exposing
> an inner class of it through the public API is a no-no.

That's good to know. I get you right, that this means we may change that
classes API, may we?


> No. A FileUploadException isn't always related to IO, so extending
> IOException would simply be wrong.

Ok.


Attached you find a proposed patch, that takes a first step: The
MultipartStream receives an inner class ItemInputStream. This inner class is
used by readBodyData. (Obviously, the same InputStream will later be
returned by the StreamingFileItem.)


Jochen

Index: /home/jwi/workspace/commons-fileupload/src/java/org/apache/commons/fileupload/MultipartStream.java
===================================================================
--- /home/jwi/workspace/commons-fileupload/src/java/org/apache/commons/fileupload/MultipartStream.java (revision 413075)
+++ /home/jwi/workspace/commons-fileupload/src/java/org/apache/commons/fileupload/MultipartStream.java (working copy)
@@ -496,64 +496,22 @@
     public int readBodyData(OutputStream output)
         throws MalformedStreamException,
                IOException {
-        boolean done = false;
-        int pad;
-        int pos;
-        int bytesRead;
-        int total = 0;
-        while (!done) {
-            // Is boundary token present somewere in the buffer?
-            pos = findSeparator();
-            if (pos != -1) {
-                // Write the rest of the data before the boundary.
+        final ItemInputStream istream = new ItemInputStream();
+        final byte[] bytes = new byte[8192];
+        for (;;) {
+            int res = istream.read(bytes);
+            if (res == -1) {
                 if (output != null) {
-                    output.write(buffer, head, pos - head);
+                    output.flush();
                 }
-                total += pos - head;
-                head = pos;
-                done = true;
-            } else {
-                // Determine how much data should be kept in the
-                // buffer.
-                if (tail - head > keepRegion) {
-                    pad = keepRegion;
-                } else {
-                    pad = tail - head;
-                }
-                // Write out the data belonging to the body-data.
+                return (int) istream.getBytesRead();
+            }
+            if (res > 0  &&  output != null) {
                 if (output != null) {
-                    output.write(buffer, head, tail - head - pad);
-                }
-
-                // Move the data to the beginning of the buffer.
-                total += tail - head - pad;
-                System.arraycopy(buffer, tail - pad, buffer, 0, pad);
-
-                // Refill buffer with new data.
-                head = 0;
-                bytesRead = input.read(buffer, pad, bufSize - pad);
-
-                // [pprrrrrrr]
-                if (bytesRead != -1) {
-                    tail = pad + bytesRead;
-                } else {
-                    // The last pad amount is left in the buffer.
-                    // Boundary can't be in there so write out the
-                    // data you have and signal an error condition.
-                    if (output != null) {
-                        output.write(buffer, 0, pad);
-                        output.flush();
-                    }
-                    total += pad;
-                    throw new MalformedStreamException(
-                            "Stream ended unexpectedly");
+                    output.write(bytes, 0, res);
                 }
             }
         }
-        if (output != null) {
-            output.flush();
-        }
-        return total;
     }
 
 
@@ -748,6 +706,122 @@
         }
     }
 
+    /**
+     * An {@link InputStream} for reading an items contents.
+     */
+    public class ItemInputStream extends InputStream {
+        private long total;
+        private int pad, pos;
+
+        ItemInputStream() {
+            findSeparator();
+        }
+
+        private void findSeparator() {
+            pos = MultipartStream.this.findSeparator();
+            if (pos == -1) {
+                if (tail - head > keepRegion) {
+                    pad = keepRegion;
+                } else {
+                    pad = tail - head;
+                }
+            }
+        }
+
+        /** Returns the number of bytes, which have been read
+         * by the stream.
+         */
+        public long getBytesRead() {
+            return total;
+        }
+
+        public int available() throws IOException {
+            if (pos == -1) {
+                return tail - head - pad;
+            } else {
+                return pos - head;
+            }
+        }
+
+        public int read() throws IOException {
+            if (available() == 0) {
+                if (makeAvailable() == 0) {
+                    return -1;
+                }
+            }
+            ++total;
+            int b = buffer[head++];
+            return b >= 0 ? b : b + 256;
+        }
+
+        public int read(byte[] b, int off, int len) throws IOException {
+            if (len == 0) {
+                return 0;
+            }
+            int res = available();
+            if (res == 0) {
+                res = makeAvailable();
+                if (res == 0) {
+                    return -1;
+                }
+            }
+            res = Math.min(res, len);
+            System.arraycopy(buffer, head, b, off, res);
+            head += res;
+            total += res;
+            return res;
+        }
+
+        public void close() throws IOException {
+            for (;;) {
+                int av = available();
+                if (av == 0) {
+                    av = makeAvailable();
+                    if (av == 0) {
+                        break;
+                    }
+                }
+                skip(av);
+            }
+        }
+
+        public long skip(long bytes) throws IOException {
+            int av = available();
+            if (av == 0) {
+                av = makeAvailable();
+                if (av == 0) {
+                    return 0;
+                }
+            }
+            long res = Math.min(av, bytes);
+            head += res;
+            return res;
+        }
+
+        private int makeAvailable() throws IOException {
+            if (pos != -1) {
+                return 0;
+            }
+
+            // Move the data to the beginning of the buffer.
+            total += tail - head - pad;
+            System.arraycopy(buffer, tail - pad, buffer, 0, pad);
+
+            // Refill buffer with new data.
+            head = 0;
+            int bytesRead = input.read(buffer, pad, bufSize - pad);
+            if (bytesRead == -1) {
+                // The last pad amount is left in the buffer.
+                // Boundary can't be in there so signal an error
+                // condition.
+                throw new MalformedStreamException(
+                        "Stream ended unexpectedly");
+            }
+            tail = pad + bytesRead;
+            findSeparator();
+            return available();
+        }
+    }
 
     // ------------------------------------------------------ Debugging methods
 
Index: /home/jwi/workspace/commons-fileupload/src/test/org/apache/commons/fileupload/SizesTest.java
===================================================================
--- /home/jwi/workspace/commons-fileupload/src/test/org/apache/commons/fileupload/SizesTest.java (revision 0)
+++ /home/jwi/workspace/commons-fileupload/src/test/org/apache/commons/fileupload/SizesTest.java (revision 0)
@@ -0,0 +1,81 @@
+/*
+ * Copyright 2001-2004 The Apache Software Foundation
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.commons.fileupload;
+
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.util.Iterator;
+import java.util.List;
+import javax.servlet.http.HttpServletRequest;
+import junit.framework.TestCase;
+
+
+/**
+ * Unit test for items with varying sizes.
+ */
+public class SizesTest extends TestCase
+{
+    public void testFileUpload()
+            throws IOException, FileUploadException
+    {
+        ByteArrayOutputStream baos = new ByteArrayOutputStream();
+        int add = 16;
+        int num = 0;
+        for (int i = 0;  i < 16384;  i += add) {
+            if (++add == 32) {
+                add = 16;
+            }
+            String header = "-----1234\r\n"
+                + "Content-Disposition: form-data; name=\"field" + (num++) + "\"\r\n"
+                + "\r\n";
+            baos.write(header.getBytes("US-ASCII"));
+            for (int j = 0;  j < i;  j++) {
+                baos.write((byte) j);
+            }
+            baos.write("\r\n".getBytes("US-ASCII"));
+        }
+        baos.write("-----1234--\r\n".getBytes("US-ASCII"));
+
+        List fileItems = parseUpload(baos.toByteArray());
+        Iterator fileIter = fileItems.iterator();
+        add = 16;
+        num = 0;
+        for (int i = 0;  i < 16384;  i += add) {
+            if (++add == 32) {
+                add = 16;
+            }
+            FileItem item = (FileItem) fileIter.next();
+            assertEquals("field" + (num++), item.getFieldName());
+            byte[] bytes = item.get();
+            assertEquals(i, bytes.length);
+            for (int j = 0;  j < i;  j++) {
+                assertEquals((byte) j, bytes[j]);
+            }
+        }
+        assertTrue(!fileIter.hasNext());
+    }
+
+    private List parseUpload(byte[] bytes) throws FileUploadException {
+        String contentType = "multipart/form-data; boundary=---1234";
+
+        FileUploadBase upload = new DiskFileUpload();
+        HttpServletRequest request = new MockHttpServletRequest(bytes, contentType);
+
+        List fileItems = upload.parseRequest(request);
+        return fileItems;
+    }
+
+}


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]
Reply | Threaded
Open this post in threaded view
|

RE: [FILEUPLOAD] RfC: Proposed API changes for streaming

James.Ring
In reply to this post by jochen-2
Hi there,

I haven't had much experience with FileUpload, so forgive me if my question
is way off...

> -----Original Message-----
> From: Jochen Wiedmann [mailto:[hidden email]]
> Sent: Saturday, 10 June 2006 6:05 AM
> To: [hidden email]
> Subject: [FILEUPLOAD] RfC: Proposed API changes for streaming
>
>
>
> Hi,
>
> As you can see, the FileItem can be changed to extend SlimFileItem.
> Additionally, FileItem has a semantic difference: It allows to invoke
> getInputStream() more than once, while SlimFileItem should throw an
> IllegalStateException on the second invocation. So far, these
> changes should
> not introduce a problem.
>

Why do you want to throw an IllegalStateException on subsequent invocations
of getInputStream()?

> Jochen

Regards,
James

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [FILEUPLOAD] RfC: Proposed API changes for streaming

jochen-2
[hidden email] wrote:

> Why do you want to throw an IllegalStateException on subsequent invocations
> of getInputStream()?

Because the nature of streaming implies, that the data can be returned only
once (as opposed to the FileItem where getInputStream() may be called as
often as you like, each time returning the same data).

Throwing an exception helps users to understand immediately, what is wrong.
Otherwise, they might possible be forced to investigate a subtle problem
without any help.


Jochen

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

RE: [FILEUPLOAD] RfC: Proposed API changes for streaming

James.Ring
In reply to this post by jochen-2
Hi Jochen,

> -----Original Message-----
> From: Jochen Wiedmann [mailto:[hidden email]]
> Sent: Thursday, 15 June 2006 10:49 PM
> To: Jakarta Commons Developers List
> Subject: Re: [FILEUPLOAD] RfC: Proposed API changes for streaming
>
> Because the nature of streaming implies, that the data can be
> returned only
> once (as opposed to the FileItem where getInputStream() may
> be called as
> often as you like, each time returning the same data).

As I understand it, by the time you call getInputStream(), the user's POST
request is already entirely in the server's memory space, or it has been
written to disk. This data isn't consumed when you read() it, so why can't
you get another InputStream over it?
 
Forgive me if my understanding is not correct!

>
> Jochen

Regards,
James

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [FILEUPLOAD] RfC: Proposed API changes for streaming

jochen-2
[hidden email] wrote:

> As I understand it, by the time you call getInputStream(), the user's POST
> request is already entirely in the server's memory space, or it has been
> written to disk. This data isn't consumed when you read() it, so why can't
> you get another InputStream over it?

No. At the time when getInputStream() is called, the request data is read up
to the beginning of the file, or form field. The InputStream returns the
file, or form field, after which the request data is positioned at the end
of the file, or form field.


Jochen


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [FILEUPLOAD] RfC: Proposed API changes for streaming

Jerome Lacoste-3
On 6/16/06, Jochen Wiedmann <[hidden email]> wrote:

> [hidden email] wrote:
>
> > As I understand it, by the time you call getInputStream(), the user's POST
> > request is already entirely in the server's memory space, or it has been
> > written to disk. This data isn't consumed when you read() it, so why can't
> > you get another InputStream over it?
>
> No. At the time when getInputStream() is called, the request data is read up
> to the beginning of the file, or form field. The InputStream returns the
> file, or form field, after which the request data is positioned at the end
> of the file, or form field.

(Caution I am not familiar with the original code. Just read that thread).

To me it sounds like SlimFileItem is implying that getInputStream() is
doing some kind of iterator side effect.

First if kept like this, it probably shouldn't be a getter, maybe
inputStream(). Also it makes wonder if the separation between the
iterator and the iterator item is done at the right place. Think about
JDBC or Collection. This proposal is not intuitive.



Second "while SlimFileItem should throw an IllegalStateException on
the second invocation".

"Should throw" as part of an interface implies that FileItem does not
respect the spec of its base class. "May throw" sounds better.

Also the above sentence sounds classish not interfaceish (i.e. reading
it makes it sound like SlimFileItem is a class not an interface).



Finally, "FileBaseUpload would be changed to use getItemIterator
internally and convert the SlimFileItem instances into FileItem."

I don't particularly like down castings. That ties the class to the
particular implementation. What does this hide?

Cheers,

Jerome

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]