FtpClient.Upload using FtpDataStream.ReceiveBufferSize

Apr 9, 2012 at 6:54 PM

I'm trying to track down an issue where FtpClient hangs when the network/FTP stream is referenced after all of the data for a large file has been uploaded, and I've run across this.

In FtpClient.Upload(Stream istream, FtpFile remote, FtpDataType datatype, long rest), FtpDataStream.ReceiveBufferSize is being used instead of FtpDataStream.SendBufferSize.  Is there any reason for this?

Coordinator
Apr 9, 2012 at 7:10 PM
Edited Apr 9, 2012 at 7:11 PM

I believe that's a bug; it should be using SendBufferSize. What server software are you uploading to? The hang could be happening while the client waits on the server to respond after the data channel is closed. I doubt the hang is related to the buffer size; most likely the client code is just waiting for the server to acknowledge that it has received the file.

Coordinator
Apr 9, 2012 at 7:45 PM

I can't reproduce the hang. Please log the FTP transaction (passwords should be omitted, but double-check before posting anything) and post it here. Include, if possible, the OS and version and the FTP server software and version. See the FtpLogStream property of the FtpClient object for setting up a file stream to log to.

Apr 9, 2012 at 7:56 PM

I'm uploading to a Windows Server 2003 machine using the built-in FTP server (IIS 6.0).  I believe I also encountered this same problem sending the large file to a Linux server running vsftpd, but I can double-check that if you want me to.

Here's what was sent to the output while debugging in Visual Studio:

> 220-Microsoft FTP Service
> 220 ed2c Web FTP
< AUTH TLS
> 500 'AUTH TLS': command not understood
< AUTH SSL
> 500 'AUTH SSL': command not understood
< USER backup
> 331 Password required for backup.
< PASS [omitted for security]
> 230-Welcome to the ed2c FTP server!
> 230 User backup logged in.
< CWD 
> 250 CWD command successful.
< PWD
> 257 "/backup" is current directory.
< TYPE I
> 200 Type set to I.
< EPSV
> 500 'EPSV': command not understood
< FEAT
> 211-FEAT
>     SIZE
>     MDTM
> 211 END
< PASV
> 227 Entering Passive Mode (99,190,188,74,19,140).
< STOR /backup/Subversion.dump.rar.enc
> 125 Data connection already open; Transfer starting.

I have verified that, while the upload completes (all data is sent, and the uploaded file is valid), the app hangs in FtpControlConnection.ReadLine(), when it calls "this.StreamReader.ReadLine()".
Going back up the stack trace a little, the real spot that needs looking at is FtpDataStream.Close(). The app hangs on the line "if (this.ControlConnection.ResponseStatus && !this.ControlConnection.ReadResponse()) {". The XML comments for the ReadResponse() method actually mention that it should not be called unless a command has been sent. A command was sent (STOR), but maybe it freezes because the upload of the large file took so long (~45 minutes) that the server "forgot" a command was ever sent?

Coordinator
Apr 9, 2012 at 9:12 PM

The control connection isn't supposed to be closed due to inactivity during a transfer; however, that might be what's happening. I'm working on a hack now to enable a keep-alive operation during transfers. I've got it implemented in FtpClient's upload method; however, it needs to be implemented in FtpDataStream so that it works with all the various ways there are to upload and download files. I will let you know via this forum when it's ready, and you can give it a shot to see if it clears this up. It's an unsupported feature (not outlined in RFC 959 that I'm aware of), so it will be whatever it is; I can't guarantee it won't cause problems elsewhere.

Coordinator
Apr 9, 2012 at 9:31 PM

Alright, revision 653152fdea2d adds a new KeepAliveInterval property to FtpControlConnection / FtpClient. Set it to a non-zero value (I recommend 15) to have the client code attempt to keep the control connection active during large transfers. Let me know the results.

Thanks,

J.P.

Apr 9, 2012 at 9:50 PM

OK, I've downloaded the latest version and will let you know how the test goes when I get a chance to complete it. It does take a while, after all, since I have to send a larger file (200+ MB) to test it.

FYI, I did another test with the Linux server, which uses vsftpd, and it worked as intended:

> 220 Welcome to the FTP service.
< AUTH TLS
> 530 Please login with USER and PASS.
< AUTH SSL
> 530 Please login with USER and PASS.
< USER anomaly
> 331 Please specify the password.
< PASS [omitted for security]
> 230 Login successful.
< CWD ed2c/
> 250 Directory successfully changed.
< PWD
> 257 "/mnt/files/public/ed2c"
< TYPE I
> 200 Switching to Binary mode.
< EPSV
> 229 Entering Extended Passive Mode (|||48667|).
< STOR /mnt/files/public/ed2c/Subversion.dump.rar.enc
> 150 Ok to send data.
> 226 Transfer complete.
< QUIT
> 221 Goodbye.

Apr 9, 2012 at 10:29 PM

I tried specifying a KeepAliveInterval, and unfortunately it didn't work.  In fact, the first time the NOOP was sent (as noted in the output window), the data transfer stopped and the application hung.  Eventually, on the server side, since no more data was coming in, the connection timed out, but the app still thought it was open.

So, I guess something is going on where the NOOP interrupts the data being sent.

Coordinator
Apr 9, 2012 at 10:39 PM

That sounds about right. RFC 959 doesn't define this behavior as far as I know, so it's really unpredictable how a server will handle executing commands while a transfer is in progress. A Google search earlier revealed several posts about IIS 6 FTP and the Server 2003 firewall causing connection timeouts. I couldn't tell whether it was a passive-port issue or something else, but you might try disabling Windows Firewall on the server, or you could use different server software. I use FileZilla for FTP serving on Windows; I know it works, since I wrote most of the code against FileZilla first and tested others later. It will be tomorrow at the earliest before I can set up a Server 2003 system with IIS 6 FTP to try to debug the problem myself.

Apr 10, 2012 at 1:54 PM

FYI, if you're going to keep the keep-alive hack, you'll need to add a check for KeepAliveInterval == 0:

if (DateTime.Now.Subtract(this.LastNoOp).Seconds >= this.ControlConnection.KeepAliveInterval) {

The above is always true if KeepAliveInterval is zero, the default.
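
To make the suggestion concrete, here is a minimal sketch of the guarded check; the names LastNoOp and KeepAliveInterval follow the snippet above, but the class shape and the switch to TotalSeconds are my own illustration (TimeSpan.Seconds only returns the 0-59 seconds component, so TotalSeconds is what you'd want for intervals over a minute):

```csharp
using System;

class KeepAliveCheck
{
    public DateTime LastNoOp { get; set; }
    public int KeepAliveInterval { get; set; } // seconds; 0 (the default) disables keep-alive

    // True only when keep-alive is enabled and the interval has elapsed.
    public bool ShouldSendNoOp(DateTime now)
    {
        return this.KeepAliveInterval > 0 &&
            now.Subtract(this.LastNoOp).TotalSeconds >= this.KeepAliveInterval;
    }

    static void Main()
    {
        var c = new KeepAliveCheck { LastNoOp = DateTime.Now, KeepAliveInterval = 0 };
        // With the default of 0, no NOOP should ever be sent.
        Console.WriteLine(c.ShouldSendNoOp(DateTime.Now.AddHours(1))); // False

        c.KeepAliveInterval = 15;
        Console.WriteLine(c.ShouldSendNoOp(c.LastNoOp.AddSeconds(20))); // True
    }
}
```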

Coordinator
Apr 10, 2012 at 2:12 PM

Thanks, the latest revision includes this fix! Sometimes the little things are the worst mistakes.

Apr 12, 2012 at 4:54 PM

OK, I think I've made some progress on identifying what can be done to gracefully handle this problem, which seems to occur with Windows Server/IIS FTP servers.  Since I knew that FileZilla can transfer the file "successfully", I tried it with the most debug info possible turned on.  Below is how FileZilla handled it:

...<snip> connects to server, does directory listing, etc </snip>...

2012-04-12 11:33:10 10696 1 Command: STOR Subversion.dump.rar.enc
2012-04-12 11:33:10 10696 1 Trace: CTransferSocket::OnConnect
2012-04-12 11:33:10 10696 1 Trace: CTransferSocket::OnReceive(), m_transferMode=1
2012-04-12 11:33:10 10696 1 Trace: First EAGAIN in CTransferSocket::OnSend()
2012-04-12 11:33:10 10696 1 Trace: CFtpControlSocket::OnReceive()
2012-04-12 11:33:10 10696 1 Response: 125 Data connection already open; Transfer starting.
2012-04-12 11:33:10 10696 1 Trace: CFtpControlSocket::TransferParseResponse()
2012-04-12 11:33:10 10696 1 Trace:   code = 1
2012-04-12 11:33:10 10696 1 Trace:   state = 4
2012-04-12 11:33:10 10696 1 Trace: CFtpControlSocket::SendNextCommand()
2012-04-12 11:33:10 10696 1 Trace: CFtpControlSocket::TransferSend()
2012-04-12 11:33:10 10696 1 Trace:   state = 5
2012-04-12 11:33:11 10696 1 Trace: Made progress in CTransferSocket::OnSend()

...<snip> nothing else logged here, this is where the file sends (takes about 45 minutes) </snip>...

2012-04-12 12:15:30 10696 1 Trace: CTransferSocket::TransferEnd(1)
2012-04-12 12:15:30 10696 1 Trace: CFtpControlSocket::TransferEnd()
2012-04-12 12:15:50 10696 1 Error: Connection timed out
2012-04-12 12:15:50 10696 1 Trace: CControlSocket::DoClose(2050)
2012-04-12 12:15:50 10696 1 Trace: CFtpControlSocket::ResetOperation(2114)
2012-04-12 12:15:50 10696 1 Trace: CControlSocket::ResetOperation(2114)
2012-04-12 12:15:50 10696 1 Trace: CFtpControlSocket::ResetOperation(2114)
2012-04-12 12:15:50 10696 1 Trace: CControlSocket::ResetOperation(2114)

...<snip> reconnects, does directory listing, etc </snip>...

2012-04-12 12:15:51 10696 1 Command: APPE Subversion.dump.rar.enc
2012-04-12 12:15:51 10696 1 Trace: CTransferSocket::OnConnect
2012-04-12 12:15:51 10696 1 Trace: CTransferSocket::OnReceive(), m_transferMode=1
2012-04-12 12:15:51 10696 1 Trace: CTransferSocket::TransferEnd(1)
2012-04-12 12:15:51 10696 1 Trace: CFtpControlSocket::TransferEnd()
2012-04-12 12:15:52 10696 1 Trace: CFtpControlSocket::OnReceive()
2012-04-12 12:15:52 10696 1 Response: 125 Data connection already open; Transfer starting.
2012-04-12 12:15:52 10696 1 Trace: CFtpControlSocket::TransferParseResponse()
2012-04-12 12:15:52 10696 1 Trace:   code = 1
2012-04-12 12:15:52 10696 1 Trace:   state = 6
2012-04-12 12:15:52 10696 1 Trace: CFtpControlSocket::SendNextCommand()
2012-04-12 12:15:52 10696 1 Trace: CFtpControlSocket::TransferSend()
2012-04-12 12:15:52 10696 1 Trace:   state = 7
2012-04-12 12:15:52 10696 1 Trace: CFtpControlSocket::OnReceive()
2012-04-12 12:15:52 10696 1 Response: 226 Transfer complete.

As you can see, after the transfer completes (CTransferSocket knows it completed, just as you would in FtpClient.Upload based on how much was written), it times out waiting for any response on the control connection. It then simply reconnects, sends an APPE command, and gets the 226 transfer-complete response.

The first thing that is probably required to handle this problem is implementing your own timeout for FtpControlConnection.ReadResponse and, subsequently, FtpControlConnection.ReadLine. You could use a ManualResetEvent with a predefined timeout value that is used in FtpDataStream.Close when FtpControlConnection.ReadResponse is called. That way, FtpDataStream.Close is non-blocking. If a timeout occurs, FtpDataStream.Close can throw an exception.

This exception could then be caught in FtpClient.Upload (with a try/catch around the using), which would then trigger a retry that APPEs the rest of the file contents (even if it's 0 bytes, I guess?). The number of retries should be limited, of course.

Another way would be to call FtpDataStream.Close in FtpClient.Upload before you call OnTransferProgress. Catch the timeout exception there, retry X number of times, and then let it exit the using block, disposing the FtpDataStream.
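
To illustrate the non-blocking read idea, here's a generic sketch (not the library's actual API; ReadLineWithTimeout is an illustrative helper, and a Task stands in for the ManualResetEvent/delegate plumbing): the blocking read runs on a worker thread, and the caller gives up and throws after the timeout instead of hanging forever.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class TimedRead
{
    // Runs a potentially blocking read on a worker thread and gives up
    // after the specified timeout, letting the caller throw instead of hang.
    public static string ReadLineWithTimeout(Func<string> readLine, TimeSpan timeout)
    {
        Task<string> task = Task.Run(readLine);
        if (!task.Wait(timeout))
            throw new TimeoutException("No response from server within " + timeout);
        return task.Result;
    }

    static void Main()
    {
        // A reader that responds promptly completes normally.
        Console.WriteLine(ReadLineWithTimeout(
            () => "226 Transfer complete.", TimeSpan.FromSeconds(5)));

        // A reader that never responds trips the timeout instead of hanging.
        try
        {
            ReadLineWithTimeout(
                () => { Thread.Sleep(10000); return "too late"; },
                TimeSpan.FromMilliseconds(100));
        }
        catch (TimeoutException)
        {
            Console.WriteLine("timed out");
        }
    }
}
```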

I would help implement these changes, but they are pretty extensive, and I'm not sure how much of it you'd agree with or be willing to include in your library. Plus, I've never contributed to a project like this on CodePlex and am not 100% sure how changes would be committed. If you do want me to contribute, let me know exactly what you'd like me to implement and some simple instructions on how to submit the changes (I know how to use source control; I'm just not sure which repository, where changes should be committed, etc.).

Coordinator
Apr 12, 2012 at 5:39 PM

First, thanks for doing this. The debugging you have done is pretty extensive work in itself, and I appreciate you taking the time to do it. It sounds like what FileZilla does is detect the failure and then resume the transfer where the failure occurred, which happens to be at the very end. I'm going to have to do some thinking on how to proceed from here. Catching the exception in the Upload() method and resuming the upload accordingly is very much doable; however, I generally try not to hide exceptions from the calling code, so firing an event that allows the user to tell FtpClient to restart the transfer where it left off is probably the path I would like to see taken. If the user doesn't handle the event or chooses not to restart, then FtpClient should re-throw the trapped exception in the upload method so that the calling code can do whatever it needs to for dealing with the error. That doesn't address handling such a failure when using FtpDataStream objects directly, which can be done using the OpenDownload() and OpenUpload() methods; however, those are more low-level, so I think it's acceptable to expect the person utilizing them to handle a failure and restart the transfer where they want. I believe there are already read and write timeout properties inherited from FtpChannel (I'll have to look; I haven't touched that code in a while). If I recall correctly, they are just proxies to the timeouts on the underlying network stream.

Apr 12, 2012 at 5:49 PM

No problem.  A library like this has allowed me to figure out why my application's FTP uploading wasn't completing successfully.  I was originally using .NET's FtpWebRequest, and it would either hang or generate a "protocol violation" error, with no way to avoid or recover from it.

As far as not hiding exceptions from the calling code goes, I tend to agree, but if it were a specific timeout exception, and FtpClient allowed the calling code to specify how timeouts should be handled (a retry count, for example), then I wouldn't think it'd be a big deal.  You could implement a retry, and after the retries exceed the maximum specified by the calling code, rethrow the exception so the calling code can handle it.  It just depends on how much control you want to require the calling code to have in situations like this.
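
The retry-then-rethrow idea could be sketched as a small wrapper (a generic helper under names of my own choosing, not part of the library):

```csharp
using System;

static class Retry
{
    // Invokes the action up to maxRetries + 1 times, rethrowing the last
    // timeout once the caller's retry budget is exhausted.
    public static void WithRetries(Action upload, int maxRetries)
    {
        for (int attempt = 0; ; attempt++)
        {
            try
            {
                upload();
                return;
            }
            catch (TimeoutException)
            {
                if (attempt >= maxRetries)
                    throw; // let the calling code handle it
            }
        }
    }

    static void Main()
    {
        int calls = 0;
        // Fails twice, succeeds on the third attempt.
        WithRetries(() => { if (++calls < 3) throw new TimeoutException(); }, maxRetries: 3);
        Console.WriteLine(calls); // 3
    }
}
```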

There are some read and write timeouts among the streams being used to send and receive the data, but I'm pretty sure I tested them, and specifying them doesn't cause any kind of timeout in this case.  You'd still have to throw the timeout exception yourself after recognizing that ReadResponse has hung.

Coordinator
Apr 12, 2012 at 8:11 PM

I've just pushed up a new revision that adds a ReadResponseTimeout property that lets you specify how long the control connection will wait for the server to respond. If it times out, an exception of type FtpResponseTimeoutException is thrown. I have not tested it well yet. What I have done is set the timeout to 1 millisecond, start a download, catch the exception, increase the timeout to 30 seconds, and restart the download. It worked as expected. I made this work by using a delegate to invoke StreamReader.ReadLine() asynchronously; I'm not entirely sure of the implications of doing it this way. I haven't changed any of the methods to recover from the timeouts yet. This new code might help you with the IIS problem: if you track the progress of the transfer in your code and catch the timeout exception, you can restart your download where the progress left off, which **I think** should be the same behavior as what you posted above with FileZilla.

Coordinator
Apr 12, 2012 at 8:37 PM

After looking at this further, I think the proper way to handle the situation is this:

FtpClient cl = new FtpClient(...);
try {
    cl.Upload(.....);
}
catch (FtpResponseTimeoutException) {
    cl.Connect();
}

It leaves full control over how errors are handled to the person using the FTP client library. Give that a shot with the new timeout property and let me know how it affects your project.

Apr 12, 2012 at 9:04 PM

I've got a test running now.

I'm handling the TransferProgress event, saving the FtpTransferInfo so that when I catch the exception, I will have the information I need to attempt a resume.

If this takes care of the problem, then I may look into modifying it slightly so that the FtpResponseTimeoutException contains the information one would need to resume (position, length, etc).  Maybe by catching the exception in FtpClient.Upload and then rethrowing it with FtpTransferInfo?  I'm not sure yet.

Apr 12, 2012 at 9:57 PM

Well, it seems like everything worked as expected.  The exception was thrown, I caught it, and then retried.  However, another exception was then thrown saying that the server does not support resume, even though it does (http://www.microsoft.com/technet/prodtechnol/WindowsServer2003/Library/IIS/19594888-29b0-49cb-91ee-41037cb79b0c.mspx?mfr=true).

It appears that your library sends a REST command for resuming downloads AND uploads.  If I'm not mistaken, REST specifies where in the file you would like to resume a download and is sent before the RETR, while APPE should be used to resume an upload.  Is this not correct, or did I find a bug?  This may be an issue with FtpClient.OpenWrite() and FtpDataStream.Seek(), but I will defer to your RFC expertise.

Coordinator
Apr 12, 2012 at 10:14 PM

Uploads should work with REST too. Comment out the following lines in FtpDataStream.Seek():

if (!this.ControlConnection.HasCapability(FtpCapability.REST)) {
    throw new IOException("The FTP server does not support stream seeking.");
}

The exception is thrown because the server isn't advertising REST STREAM in its FEAT list. It should be safe to comment those lines out; in fact, I may remove them from the main code base. The REST command failing would be no worse than the exception above.

Coordinator
Apr 13, 2012 at 2:48 AM

I went ahead and removed that code and committed it just now. It's kind of useless; an exception will be thrown either way, so I figure why not try to execute the REST command, and if it fails, it fails. This isn't the first case I've seen where a server supports more features than it advertises via the FEAT command.

Apr 13, 2012 at 3:19 AM

OK, that makes sense.  I'll give it another good test tomorrow.

I'm still a little confused as to why APPE isn't used at all.  For a resumed upload, it appears to use REST and then STOR.  Why doesn't it use APPE?

Coordinator
Apr 13, 2012 at 11:24 AM

Why does it need to be used? RFC 959 doesn't say REST only works with RETR. REST + STOR do the job for resuming, and STOR allows you to overwrite while APPE does not. If you want to use the APPE command, I can add an OpenAppend() method that lets you open a file for writing using APPE instead of STOR.

Apr 13, 2012 at 4:57 PM

I don't know that APPE needs to be used.  I just saw that FileZilla was using it, rather than STOR, for resuming the upload.

I tried another test, and when it resumes (with rest == file length) it overwrites the file with a zero-byte file.  Is that what's supposed to happen when you issue a REST command with the total number of bytes?  I would think that, if a rest position is specified, it shouldn't create a new file but resume where it left off; and if it left off at the end, nothing is written.  What do you think?

Coordinator
Apr 13, 2012 at 5:07 PM

No, that is not what is supposed to happen. This is definitely a bug that needs to be addressed. This behavior is happening on IIS 6, correct?

Apr 13, 2012 at 5:37 PM

Correct, that is what I tested with last.  If you need me to test with IIS 7+, let me know.

Coordinator
Apr 13, 2012 at 6:24 PM

Nah, I've changed the code to ignore rest on uploads where FtpClient.System == Windows_NT (IIS) and just use APPE. There are just too many variables to reliably decide when to use APPE over REST + STOR. APPE isn't listed as a required command for server implementations, but neither is REST, and the FEAT list of the server can't be relied upon for determining the best path to take. So, in the end, we end up with hacked-together software for those special cases. Revision 1a45cc90d659 contains the changes. Hopefully this wraps up our IIS FTP adventure. :)
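
For clarity, the decision boils down to something like this (a sketch; Choose and ResumeStrategy are illustrative names of my own, not library API, and the "Windows_NT" string follows the FtpClient.System comparison described above):

```csharp
using System;

enum ResumeStrategy { RestThenStor, Appe }

static class UploadResume
{
    // Picks how to resume an upload: IIS (Windows_NT) mishandles REST on
    // STOR by truncating the file, so fall back to APPE there. Everything
    // else gets the standard REST + STOR sequence.
    public static ResumeStrategy Choose(string systemType)
    {
        return systemType != null &&
            systemType.StartsWith("Windows_NT", StringComparison.OrdinalIgnoreCase)
            ? ResumeStrategy.Appe
            : ResumeStrategy.RestThenStor;
    }

    static void Main()
    {
        Console.WriteLine(Choose("Windows_NT"));    // Appe
        Console.WriteLine(Choose("UNIX Type: L8")); // RestThenStor
    }
}
```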

Apr 13, 2012 at 9:56 PM

I've tested the latest revision, and it has solved the problem (with me catching the timeout exception and redoing the upload with the rest position specified).

I appreciate all of your help and the time dedicated to addressing these issues (thanks to Microsoft).  I'll continue to use this library, and if I can ever be of any help, don't hesitate to contact me.

Coordinator
Apr 14, 2012 at 4:23 AM

No problem; glad we got it worked out. The REST + STOR issue with IIS was a pretty serious bug, so I'm glad we ran across it and got it sorted.