
Filename encoding problem with special characters (> 7-bit ASCII)

May 27, 2013 at 2:17 PM
Edited May 27, 2013 at 2:17 PM
Hi there,

I'm trying to use the FTP Client to upload files to a Linux Server (Pure-FTPd) and have run into the following issue.

When I upload a file whose name contains characters above code point 127, the client encodes the filename as UTF-8, which is 2 bytes for such characters (e.g. the "ö" character becomes the bytes #C3#B6). The server, however, interprets these bytes as Latin-1 characters (in my example, "ö" shows up as "Ã¶" on the server).
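
To make the byte-level effect concrete, here is a minimal C# sketch using only System.Text (it is not FtpClient code and does not talk to a server). It encodes "ö" as UTF-8 and then reads the resulting bytes back one byte per character as Latin-1 (ISO-8859-1, code page 28591), which is effectively what a server using a Latin-1 file name encoding does.

```csharp
using System;
using System.Text;

string name = "\u00F6"; // "ö"

// What a UTF-8 client puts on the wire for this character.
byte[] utf8Bytes = Encoding.UTF8.GetBytes(name);
Console.WriteLine(BitConverter.ToString(utf8Bytes)); // C3-B6

// Reading those bytes back one byte per character, as Latin-1 does,
// yields two characters ("Ã" and "¶") instead of one "ö".
Encoding latin1 = Encoding.GetEncoding(28591);
string misread = latin1.GetString(utf8Bytes);
Console.WriteLine(misread == "\u00C3\u00B6"); // True
```

The same two-byte sequence is therefore a valid file name to both sides; they simply disagree on how to display it.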

Is there already a built-in way to avoid this, such as forcing the filename encoding to ASCII (or another non-UTF-8 encoding) even though the client detected UTF-8 support?

Regards,
David
Coordinator
May 27, 2013 at 2:34 PM
Can you confirm this problem exists with other servers besides Pure-FTPd and send a copy of the transaction log? Please refer to the Examples/Debug.cs file for info on obtaining a transaction log.
Coordinator
May 27, 2013 at 2:44 PM
I've also added a new method, DisableUTF8(), which reverts the client to ASCII; you can try that as well. It's untested, so let me know the results.
May 27, 2013 at 2:56 PM
Edited May 27, 2013 at 2:57 PM
I'm currently setting up a new test project for that.

From what I can tell from the FileZilla command log, FileZilla ran into a similar issue when I tried to rename the file on the server. So it is definitely a server problem: in this configuration the server reports UTF-8 support but doesn't actually use it:

Command: OPTS UTF8 ON
Response: 200 OK, UTF-8 enabled
Status: Connected
Status: '/import/downloads/Verbundmörtel Zubehör + Technische Daten DE.pdf' is being renamed to '/import/downloads/Verbundmörtel Zubehör + Technische Daten DE.pdf1'
Command: CWD /import/downloads
Response: 250 OK. Current directory is /import/downloads
Command: PWD
Response: 257 "/import/downloads" is your current location
Command: RNFR Verbundmörtel Zubehör + Technische Daten DE.pdf
Response: 550 Sorry, but that file doesn't exist
Status: Receiving directory listing...
Command: TYPE I
Response: 200 TYPE is now 8-bit binary
Command: PASV
Response: 227 Entering Passive Mode (IP,Port)
Command: MLSD
Response: 150 Accepted data connection
Status: Received invalid string, disabling UTF-8. Select UTF-8 in server manager, to force UTF-8.
Response: 226-Options: -a -l
Response: 226 393 matches total
Status: Directory listing complete
Status: '/import/downloads/Verbundmörtel Zubehör + Technische Daten DE.pdf' is being renamed to '/import/downloads/Verbundmörtel Zubehör + Technische Daten DE.pdf1'
Command: RNFR Verbundmörtel Zubehör + Technische Daten DE.pdf
Response: 350 RNFR accepted - file exists, ready for destination
Command: RNTO Verbundmörtel Zubehör + Technische Daten DE.pdf1
Response: 250 File successfully renamed or moved

Will post as soon as I've got some debug info from your client.
May 27, 2013 at 3:14 PM
I tried the DisableUTF8() method first; the server immediately answered the "OPTS UTF8 OFF" command with "502 command unknown". I'll try to get the full transaction log now.
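
A 502 reply means the server permanently rejected OPTS UTF8 OFF, so any decision to stop sending UTF-8 has to be made on the client side alone. As a minimal sketch (the helper names ParseReply, IsPositiveCompletion and IsPermanentNegative are hypothetical, not part of FtpClient), this C# snippet splits a single-line reply into its three-digit code and text and classifies it by its first digit as RFC 959 describes. It does not handle multi-line replies such as the 211 FEAT response.

```csharp
using System;
using System.Globalization;

// Split a single-line FTP reply such as "502 command unknown" into its
// three-digit code and its text (RFC 959: code, then a space, then text).
(int Code, string Text) ParseReply(string line)
{
    if (line.Length < 4 || line[3] != ' ' ||
        !int.TryParse(line.Substring(0, 3), NumberStyles.None, CultureInfo.InvariantCulture, out int code))
    {
        throw new FormatException("Not a single-line FTP reply: " + line);
    }
    return (code, line.Substring(4));
}

// RFC 959 reply classes by first digit: 2 = positive completion,
// 4 = transient negative, 5 = permanent negative completion.
bool IsPositiveCompletion(int code) => code / 100 == 2;
bool IsPermanentNegative(int code) => code / 100 == 5;

var reply = ParseReply("502 command unknown");
Console.WriteLine($"{reply.Code} positive={IsPositiveCompletion(reply.Code)} permanent={IsPermanentNegative(reply.Code)}");
// 502 positive=False permanent=True
```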
May 27, 2013 at 3:42 PM
Edited May 27, 2013 at 3:42 PM
Okay, so I get two different results with version 1.0.4885.16106 (obtained via NuGet) and the recent commit 1.0.4895.xxx.

With 4885, the file is uploaded with the UTF-8-encoded characters (which the server interprets as ANSI characters). The log is as follows:

220---------- Welcome to Pure-FTPd [privsep] [TLS] ----------
220-You are user number 2 of 50 allowed.
220-Local time is now 16:30. Server port: 21.
220-IPv6 connections are also welcome on this server.
220 You will be disconnected after 10 minutes of inactivity.
USER ***
331 User *** OK. Password required
PASS <omitted>
230 OK. Current restricted directory is /
FEAT
211-Extensions supported:
EPRT
IDLE
MDTM
SIZE
REST STREAM
MLST type*;size*;sizd*;modify*;UNIX.mode*;UNIX.uid*;UNIX.gid*;unique*;
MLSD
AUTH TLS
PBSZ
PROT
UTF8
TVFS
ESTA
PASV
EPSV
SPSV
ESTP
211 End.
OPTS UTF8 ON
200 OK, UTF-8 enabled
CWD /import/downloads
250 OK. Current directory is /import/downloads
220---------- Welcome to Pure-FTPd [privsep] [TLS] ----------
220-You are user number 3 of 50 allowed.
220-Local time is now 16:30. Server port: 21.
220-IPv6 connections are also welcome on this server.
220 You will be disconnected after 10 minutes of inactivity.
USER ***
331 User *** OK. Password required
PASS <omitted>
230 OK. Current restricted directory is /
FEAT
211-Extensions supported:
EPRT
IDLE
MDTM
SIZE
REST STREAM
MLST type*;size*;sizd*;modify*;UNIX.mode*;UNIX.uid*;UNIX.gid*;unique*;
MLSD
AUTH TLS
PBSZ
PROT
UTF8
TVFS
ESTA
PASV
EPSV
SPSV
ESTP
211 End.
OPTS UTF8 ON
200 OK, UTF-8 enabled
PWD
257 "/import/downloads" is your current location
CWD /import/downloads
250 OK. Current directory is /import/downloads
TYPE I
200 TYPE is now 8-bit binary
SIZE Verbundmörtel Zubehör + Technische Daten DE.pdf
550 Can't check for file existence
EPSV
229 Extended Passive mode OK (|||50326|)
STOR Verbundmörtel Zubehör + Technische Daten DE.pdf
150 Accepted data connection
226-File successfully transferred
226 0.204 seconds (measured here), 6.41 Mbytes per second
QUIT
221-Goodbye. You uploaded 1340 and downloaded 0 kbytes.
221 Logout.
QUIT
221-Goodbye. You uploaded 0 and downloaded 0 kbytes.
221 Logout.

The file is then finally saved as "VerbundmÃ¶rtel ZubehÃ¶r + Technische Daten DE.pdf".

Using the recently compiled 1.0.4895 I get the following output:

220---------- Welcome to Pure-FTPd [privsep] [TLS] ----------
220-You are user number 2 of 50 allowed.
220-Local time is now 16:37. Server port: 21.
220-IPv6 connections are also welcome on this server.
220 You will be disconnected after 10 minutes of inactivity.
USER ***
331 User *** OK. Password required
PASS <omitted>
230 OK. Current restricted directory is /
FEAT
211-Extensions supported:
EPRT
IDLE
MDTM
SIZE
REST STREAM
MLST type*;size*;sizd*;modify*;UNIX.mode*;UNIX.uid*;UNIX.gid*;unique*;
MLSD
AUTH TLS
PBSZ
PROT
UTF8
TVFS
ESTA
PASV
EPSV
SPSV
ESTP
211 End.
OPTS UTF8 ON
200 OK, UTF-8 enabled
CWD /import/downloads
250 OK. Current directory is /import/downloads
220---------- Welcome to Pure-FTPd [privsep] [TLS] ----------
220-You are user number 3 of 50 allowed.
220-Local time is now 16:37. Server port: 21.
220-IPv6 connections are also welcome on this server.
220 You will be disconnected after 10 minutes of inactivity.
USER ***
331 User *** OK. Password required
PASS <omitted>
230 OK. Current restricted directory is /
PWD
257 "/import/downloads" is your current location
CWD /import/downloads
250 OK. Current directory is /import/downloads
TYPE I
200 TYPE is now 8-bit binary
SIZE Verbundmörtel Zubehör + Technische Daten DE.pdf
550 Can't check for file existence
EPSV
229 Extended Passive mode OK (|||59183|)
STOR Verbundmörtel Zubehör + Technische Daten DE.pdf
150 Accepted data connection
226-File successfully transferred
226 0.468 seconds (measured here), 2.80 Mbytes per second
QUIT
221-Goodbye. You uploaded 1340 and downloaded 0 kbytes.
221 Logout.
Disposing FtpClient object...
Disposing FtpSocketStream...
QUIT
221-Goodbye. You uploaded 0 and downloaded 0 kbytes.
221 Logout.

In this case the file is saved as "Verbundm?rtel Zubeh?r + Technische Daten DE.pdf". The question marks imply that the filename was transmitted as ASCII: I think the .NET Encoding class substitutes "?" for every character it cannot represent when it converts the name to ASCII.
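
This matches how .NET's built-in ASCII encoding behaves by default: its replacement fallback turns every character above U+007F into "?". A minimal C# sketch (standard System.Text only, not FtpClient code) reproduces the exact name seen on the server.

```csharp
using System;
using System.Text;

string fileName = "Verbundm\u00F6rtel Zubeh\u00F6r + Technische Daten DE.pdf";

// Encoding.ASCII replaces each non-ASCII character with the single byte '?' (0x3F).
byte[] asciiBytes = Encoding.ASCII.GetBytes(fileName);
string onTheWire = Encoding.ASCII.GetString(asciiBytes);
Console.WriteLine(onTheWire); // Verbundm?rtel Zubeh?r + Technische Daten DE.pdf
```

Because the substitution happens before the bytes leave the client, the original characters cannot be recovered on the server.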
Coordinator
May 27, 2013 at 3:45 PM
The file name was transmitted correctly as per the log: STOR Verbundmörtel Zubehör + Technische Daten DE.pdf

The way it was stored on the file system of the server has to do with the server OS or FTP server software.
May 27, 2013 at 3:49 PM
Edited May 27, 2013 at 3:56 PM
Strangely, though, FileZilla manages to upload the file, and it ends up on the server with a correctly encoded file name. So either FileZilla does something weird, or it is not entirely a server-side issue, or at least you can work around the server-side problem from the client.

/edit: I tried it a few times with FileZilla. As long as the client believes the server is still in UTF-8 mode, the file name is stored in the same malformed form (UTF-8 bytes read as ANSI).

Once FileZilla reports that it received an invalid string, it transmits the file name in ANSI encoding even though the server actually remains in UTF-8 mode. That seems like an ugly hack, but it obviously works.
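
The fallback described above can be sketched as follows. This is not FileZilla's actual code; it is a minimal C# illustration assuming that the client decodes the raw directory listing with a strict UTF-8 decoder and, on failure, switches to a single-byte encoding (Latin-1 here, as an assumption about this server) for later commands. ChooseEncoding is a hypothetical name.

```csharp
using System;
using System.Text;

// Strict UTF-8: throws instead of silently inserting U+FFFD for invalid bytes.
var strictUtf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);
// Assumed fallback for this server: Latin-1 (ISO-8859-1, code page 28591).
Encoding fallback = Encoding.GetEncoding(28591);

// Decide which encoding to use for further commands, based on raw listing bytes.
Encoding ChooseEncoding(byte[] listingBytes)
{
    try
    {
        strictUtf8.GetString(listingBytes);
        return strictUtf8;   // listing is valid UTF-8: keep UTF-8
    }
    catch (DecoderFallbackException)
    {
        return fallback;     // invalid UTF-8: switch to the single-byte encoding
    }
}

// "ö" stored as the single Latin-1 byte 0xF6 is never valid UTF-8.
byte[] latin1Name = { 0x4D, 0xF6, 0x72 };              // "Mör" in Latin-1
byte[] utf8Name = Encoding.UTF8.GetBytes("M\u00F6r");  // "Mör" in UTF-8

Console.WriteLine(ChooseEncoding(latin1Name).WebName); // iso-8859-1
Console.WriteLine(ChooseEncoding(utf8Name).WebName);   // utf-8
```

The heuristic only works because a Latin-1 name containing non-ASCII characters is almost never a valid UTF-8 byte sequence.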

So the problem is indeed mostly on the server side, but I'm afraid I can't change the server configuration, because the server is managed by our service provider.
Coordinator
May 27, 2013 at 11:12 PM
Edited May 27, 2013 at 11:15 PM
Alright, sorry for the long delay in getting back to you; I had to go out of town for most of the day. I've tracked down and fixed this bug. The FtpClient.Capabilities enum was not properly copied via reflection to cloned control connections (see FtpClient.EnableThreadSafeDataConnections), because the property setter was private instead of protected. As a result, the cloned connection used for uploads and downloads didn't know that the server supported UTF8, so the internal System.Text.Encoding of System.Net.FtpClient was never switched over to UTF8. Anyway, it's fixed now. I tested the file name from your transaction logs against my development server, confirmed that the bug existed, implemented a fix, and pushed a new NuGet package.
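
The effect of the bug can be modelled in a few lines. This C# sketch is hypothetical: plain bool flags stand in for FtpClient.Capabilities, and CommandEncoding is an invented helper, so nothing here is FtpClient's actual code. It only shows why a clone that does not inherit the UTF8 capability falls back to ASCII and sends "?".

```csharp
using System;
using System.Text;

// Hypothetical model: a connection's command encoding depends only on whether
// it knows that the server advertised UTF8 in its FEAT reply.
Encoding CommandEncoding(bool knowsUtf8) => knowsUtf8 ? Encoding.UTF8 : Encoding.ASCII;

string command = "STOR Verbundm\u00F6rtel.pdf";

// Parent control connection: ran FEAT, saw UTF8, sent OPTS UTF8 ON.
bool parentKnowsUtf8 = true;
// Cloned connection with the bug: capability not copied, so it stays false.
bool buggyCloneKnowsUtf8 = false;
// Cloned connection after the fix: capability copied from the parent.
bool fixedCloneKnowsUtf8 = parentKnowsUtf8;

byte[] buggy = CommandEncoding(buggyCloneKnowsUtf8).GetBytes(command);
byte[] fixedBytes = CommandEncoding(fixedCloneKnowsUtf8).GetBytes(command);

Console.WriteLine(Encoding.ASCII.GetString(buggy));          // STOR Verbundm?rtel.pdf
Console.WriteLine(BitConverter.ToString(fixedBytes, 13, 2)); // C3-B6 (the "ö")
```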

-- edit --

Worth noting for anyone reading this thread: the bug was introduced a few revisions back, when I changed the Connect() method so that cloned connections no longer reload the Capabilities enum (execute FEAT and process the response). Reloading is wasteful, since it was already done on the parent control connection.
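
For readers unfamiliar with the FEAT step: in the transaction logs above, the server lists UTF8 among its extensions, and that is what the Capabilities enum records. A minimal C# sketch of detecting it from the raw reply lines, assuming the reply has already been split into lines; AdvertisesUtf8 is a hypothetical name, not FtpClient's parser. Per RFC 2389, feature lines start with a single space; the sketch trims them, so it also accepts the unindented lines shown in the logs.

```csharp
using System;
using System.Linq;

// Returns true if a multi-line FEAT reply lists the UTF8 feature.
bool AdvertisesUtf8(string[] featReply) =>
    featReply
        .Skip(1)                                                         // drop "211-Extensions supported:"
        .TakeWhile(line => !line.StartsWith("211 ", StringComparison.Ordinal)) // stop at "211 End."
        .Select(line => line.Trim())
        .Any(feature => feature.Split(' ')[0].Equals("UTF8", StringComparison.OrdinalIgnoreCase));

string[] reply =
{
    "211-Extensions supported:",
    " EPRT",
    " REST STREAM",
    " MLST type*;size*;sizd*;modify*;UNIX.mode*;UNIX.uid*;UNIX.gid*;unique*;",
    " UTF8",
    " TVFS",
    "211 End."
};
Console.WriteLine(AdvertisesUtf8(reply)); // True
```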
Coordinator
May 27, 2013 at 11:23 PM
The file name appeared correctly in the logs because whatever text encoding was used to write to the console window or log file supported UTF-8; the log was no reflection of what was actually sent to the server. (Obviously, or this bug wouldn't have existed. I just wanted to point out why I was wrong to assume the file name was being sent correctly.)
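
One way to make a debug log reflect the wire format rather than the .NET string is to log the encoded bytes. The helper below, WireHex, is hypothetical (not an FtpClient API); this minimal C# sketch shows how the same command looks on the wire under ASCII and under UTF-8.

```csharp
using System;
using System.Text;

// Hypothetical debug helper: render the bytes a command is actually sent as,
// so the log does not depend on the console or log file encoding.
string WireHex(string command, Encoding encoding) =>
    BitConverter.ToString(encoding.GetBytes(command));

string command = "SIZE M\u00F6r";
Console.WriteLine("ASCII: " + WireHex(command, Encoding.ASCII)); // ASCII: 53-49-5A-45-20-4D-3F-72
Console.WriteLine("UTF-8: " + WireHex(command, Encoding.UTF8));  // UTF-8: 53-49-5A-45-20-4D-C3-B6-72
```

With bytes in the log, the 0x3F ("?") sent by the buggy clone would have been visible immediately.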
May 28, 2013 at 7:34 AM
Edited May 28, 2013 at 8:05 AM
Just for fun, I rigged the (old) FtpClient with an overload of Execute() that temporarily changes the client encoding:
    /// <summary>
    /// Executes a command, temporarily switching the client's encoding.
    /// </summary>
    /// <param name="command">The command to execute</param>
    /// <param name="targetEncoding">The encoding the command string is transmitted in. Used to work around server misconfiguration from the client side.</param>
    /// <returns>The server's reply to the command</returns>
    public FtpReply Execute(string command, Encoding targetEncoding)
    {
        Encoding oldEncoding = this.Encoding;
        this.Encoding = targetEncoding;
        try
        {
            return Execute(command);
        }
        finally
        {
            // Restore the original encoding even if Execute throws.
            this.Encoding = oldEncoding;
        }
    }
When I used this method to call:

    CurrentReply = client.Execute(string.Format("RNFR {0}", file.Name), Encoding.ASCII); // This is in ASCII because of the Reflection UTF8 bug.
    [...]
    CurrentReply = client.Execute(string.Format("RNFR {0}", file.Name), Encoding.Default);

it renamed the malformed file to the correct name. So the server OS/ftpd configuration is definitely wrong, but you can compensate for it on the client side by using the same hack as FileZilla.
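
Why the single-byte encoding succeeds can be checked by comparing bytes. This C# sketch assumes (as the FileZilla log suggests) that the server stores the name as single-byte Latin-1/ANSI bytes; Matches is a hypothetical helper. Note that on .NET Framework, Encoding.Default is the system ANSI code page (Windows-1252 on Western European systems), which maps "ö" to 0xF6 just like Latin-1; on .NET Core and .NET 5+, Encoding.Default is always UTF-8, so the sketch names the single-byte encoding explicitly.

```csharp
using System;
using System.Linq;
using System.Text;

// Assumption: the directory entry holds the Latin-1 bytes of the name.
Encoding latin1 = Encoding.GetEncoding(28591);

string name = "Verbundm\u00F6rtel Zubeh\u00F6r + Technische Daten DE.pdf";
byte[] storedOnServer = latin1.GetBytes(name);

byte[] sentAsUtf8 = Encoding.UTF8.GetBytes("RNFR " + name);
byte[] sentAsLatin1 = latin1.GetBytes("RNFR " + name);

// Does the argument ("RNFR " is 5 bytes in both encodings) match the stored bytes?
bool Matches(byte[] command) => command.Skip(5).SequenceEqual(storedOnServer);

Console.WriteLine(Matches(sentAsUtf8));   // False (cf. "550 ... doesn't exist" in the FileZilla log)
Console.WriteLine(Matches(sentAsLatin1)); // True  (cf. "350 RNFR accepted")
```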

/edit: I just tried this with the new build; it now also works with Encoding.UTF8.