How is the modified date handled?

Jun 19, 2012 at 6:02 PM
Edited Jun 19, 2012 at 6:03 PM

I was previously using FtpWebRequest, but when I called ListDirectoryDetails it returned a badly formatted string whose layout apparently depends on how the FTP server was implemented. One of the sites I connect to uses the standard /bin/ls format for the modified date: if the file is more than some number of months old (I think normally 6), it shows a year instead of a time of day (e.g. Jun 01 16:37 for a recent file; Jun 01 2011 for one from last year).

I wanted to find a library that would handle these implementation differences, so I could get the most recent version of a daily file that can be published several times a day, each time with a slightly different file name (the hour changes). This library appeared to do this at first, but I noticed it reports the file as last modified {6/19/2012 1:16:00 AM} when it was actually modified Jun 01 16:37. The ordering by modified date still happens to come out right here, but I suspect it could come out wrong depending on the actual timestamps.

Has this been investigated?
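For reference, the year-or-time rule described above can be handled by inferring the missing year from a reference date. The sketch below is a minimal illustration under stated assumptions: `UnixListDate` is a hypothetical helper (not part of System.Net.FtpClient), and the rule "if the inferred date lands in the future, it belongs to the previous year" is an assumption that relies on ls only showing a time of day for recent files. It parses the date text only, not the whole listing line.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration; not part of System.Net.FtpClient.
public static class UnixListDate
{
    // Parses the date column of a /bin/ls long listing:
    //   "Jun 01 16:37"  recent entry: time of day shown, year omitted
    //   "Jun 01 2011"   older entry: year shown, time of day omitted
    // 'now' is the reference time used to fill in the missing year.
    public static bool TryParse(string text, DateTime now, out DateTime result)
    {
        CultureInfo culture = CultureInfo.InvariantCulture;

        // Collapse runs of spaces so "Jun  1 16:37" and "Jun 01 16:37" parse alike.
        string normalized = string.Join(" ",
            text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries));

        // Older entries carry the year but no time of day.
        if (DateTime.TryParseExact(normalized, "MMM d yyyy", culture,
                DateTimeStyles.None, out result))
        {
            return true;
        }

        // Recent entries carry no year: assume the reference year first...
        if (DateTime.TryParseExact(now.Year + " " + normalized, "yyyy MMM d HH:mm",
                culture, DateTimeStyles.None, out result))
        {
            // ...and if that lands in the future, the entry is from last year.
            // (One day of slack allows for clock and time zone differences.)
            if (result > now.AddDays(1))
            {
                result = result.AddYears(-1);
            }
            return true;
        }

        result = DateTime.MinValue;
        return false;
    }
}
```

With a reference time of June 19, 2012, "Jun 01 16:37" becomes June 1, 2012 16:37, while "Dec 20 10:00" is taken to mean December 20, 2011. Entries that show a year carry no time of day, so they parse as midnight and cannot be ordered within a single day.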

Coordinator
Jun 19, 2012 at 6:05 PM
What server software is this happening with, and could you provide the
transaction log? It's hard to debug any issues without specific
details such as the server software and version and the FTP transaction
log. Without those I can't give you much of an answer.
Coordinator
Jun 19, 2012 at 6:22 PM
Also, the way the last write time is retrieved depends on the server
and its features. The MLST/MLSD commands are preferred for file
listings and file info when they are available; however, not all
servers implement these commands. Their output is parser-friendly,
whereas LIST output is not. There are parsers for various LIST
formats built into the library. The modification date in LIST output
is generally ignored because it's not as accurate; instead, the MDTM
command is used to get the last write time. If the last write time is
wrong, I need to see the transaction log to really understand why.
Providing the server information allows me to set up the server, try
to reproduce the problem, and verify that it has been fixed.
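The MDTM reply mentioned above has a fixed, machine-readable shape defined in RFC 3659: `213 YYYYMMDDHHMMSS`, optionally followed by fractional seconds, with the time given in UTC. A minimal sketch of decoding it, assuming the raw reply line is already available; `MdtmReply` is a hypothetical helper, and the library performs its own MDTM handling internally.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration; not part of System.Net.FtpClient.
public static class MdtmReply
{
    // Decodes a reply such as "213 20120601163700" or "213 20120601163700.123".
    // Any other reply (e.g. "550 Could not get file modification time.")
    // is treated as a failure and leaves 'utc' at DateTime.MinValue.
    public static bool TryParse(string reply, out DateTime utc)
    {
        utc = DateTime.MinValue;
        if (reply == null || !reply.StartsWith("213 ", StringComparison.Ordinal))
            return false;

        string value = reply.Substring(4).Trim();

        // Drop optional fractional seconds.
        int dot = value.IndexOf('.');
        if (dot >= 0)
            value = value.Substring(0, dot);

        // RFC 3659 times are UTC, so keep the result as DateTimeKind.Utc.
        return DateTime.TryParseExact(value, "yyyyMMddHHmmss",
            CultureInfo.InvariantCulture,
            DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal,
            out utc);
    }
}
```

A `550` reply, like the ones in the transaction log further down this thread, simply makes `TryParse` return false and leaves the result at `DateTime.MinValue`.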
Jun 19, 2012 at 6:38 PM
Edited Jun 19, 2012 at 6:39 PM

I can't provide you with the server information because we have our own security credentials, but here is the output from the list operation. As you can see, the modified dates are 6/1 16:37 and 6/6 14:22, while the returned FtpListItem[] shows Modify: 6/19/2012 1:16:00 AM and Modify: 6/19/2012 6:14:00 AM respectively.

The second run shows where the logic breaks down, when the months differ.

< LIST *20120501*.txt
> 150 Here comes the directory listing.
> -rwxrwxr-x    1 ftp      ftp         25841 Jun 01 16:37 mr005aeci2012050100est20.txt
> -rwxrwxr-x    1 ftp      ftp         25841 Jun 06 14:22 mr005aeci2012050100est45.txt
> 226 Directory send OK.
>
< LIST *2012*.txt
> 150 Here comes the directory listing.
> -rwxrwxr-x    1 ftp      ftp         26176 Feb 01 17:52 mr005aeci2012010100est10.txt
> -rwxrwxr-x    1 ftp      ftp         26176 Feb 06 17:08 mr005aeci2012010100est25.txt
> -rwxrwxr-x    1 ftp      ftp         25707 Mar 01 18:03 mr005aeci2012020100est10.txt
> -rwxrwxr-x    1 ftp      ftp         25707 Mar 06 16:27 mr005aeci2012020100est20.txt
> -rwxrwxr-x    1 ftp      ftp         25774 Apr 02 16:37 mr005aeci2012030100est10.txt
> -rwxrwxr-x    1 ftp      ftp         25841 Apr 05 13:39 mr005aeci2012030100est45.txt
> -rwxrwxr-x    1 ftp      ftp         25841 May 01 17:09 mr005aeci2012040100est10.txt
> -rwxrwxr-x    1 ftp      ftp         25841 May 04 16:12 mr005aeci2012040100est30.txt
> -rwxrwxr-x    1 ftp      ftp         25841 Jun 01 16:37 mr005aeci2012050100est20.txt
> -rwxrwxr-x    1 ftp      ftp         25841 Jun 06 14:22 mr005aeci2012050100est45.txt
> 226 Directory send OK.
>
mr005aeci2012010100est10.txt: 6/19/2012 1:17:00 AM
mr005aeci2012010100est25.txt: 6/19/2012 6:17:00 AM
mr005aeci2012020100est10.txt: 6/19/2012 1:18:00 AM
mr005aeci2012020100est20.txt: 6/19/2012 6:16:00 AM
mr005aeci2012030100est10.txt: 6/19/2012 2:16:00 AM
mr005aeci2012030100est45.txt: 6/19/2012 5:13:00 AM
mr005aeci2012040100est10.txt: 6/19/2012 1:17:00 AM
mr005aeci2012040100est30.txt: 6/19/2012 4:16:00 AM
mr005aeci2012050100est20.txt: 6/19/2012 1:16:00 AM
mr005aeci2012050100est45.txt: 6/19/2012 6:14:00 AM

Coordinator
Jun 19, 2012 at 6:42 PM
Are there MDTM commands in the transaction log?
Jun 19, 2012 at 6:47 PM

This library does clean up a lot of things, but I was really looking for a library that handled these different directory formats. I can parse the raw listing myself; I was just hoping the library could do this parsing for me.

Coordinator
Jun 19, 2012 at 6:51 PM
This library does do that. What you've encountered is most likely a
bug, and I'd like to fix it, but I'm wondering whether the MDTM command
was called in the transaction. The date/time is not supposed to be
parsed from LIST output, so if that is happening it's a bug. The
date/time in LIST output on most, if not all, systems is not accurate.
The MDTM command should be accurate. I'm trying to figure out how the
FTP library got those dates out of the information the server provided.
Coordinator
Jun 19, 2012 at 6:57 PM

Also, I'm not asking for an account on the server; rather, I'm wondering what the name and version of the server software are.

Jun 19, 2012 at 6:58 PM
Edited Jun 19, 2012 at 7:00 PM

Here is what comes right before the first output block above. The server advertises the MDTM feature, but no MDTM command is being sent. Is there something I need to configure? I also tried passing MLST and MLSD as a parameter to GetListing, but it errors out saying "unknown command".

< FEAT
> 211-Features:
>  EPRT
>  EPSV
>  MDTM
>  PASV
>  REST STREAM
>  SIZE
>  TVFS
> 211 End
< TYPE A
> 200 Switching to ASCII mode.
< PASV
> 227 Entering Passive Mode (148,142,64,33,59,6)
< LIST *2012*.txt
> 150 Here comes the directory listing.

Here is my Download method:

public void DownloadLatestFile(string filePath, string filePattern, Action<StreamReader> processStreamCallback)
{
    FtpListItem latestFile;
    using (var client = GenerateFtpClient())
    {
        // log the FTP transaction to the console
        FtpClient.FtpLogStream = Console.OpenStandardOutput();
        FtpClient.FtpLogFlushOnWrite = true;
        client.SetWorkingDirectory(filePath);
        client.DataChannelType = FtpDataChannelType.Passive;

        var files = client.GetListing(filePattern);
        var filesRaw = client.GetRawListing(filePattern); // raw LIST output, for comparison
        foreach (var file in files)
            Console.WriteLine(file.Name + ": " + file.Modify);

        // pick the most recently modified file
        latestFile = files.OrderByDescending(p => p.Modify).First();
    }
    DownloadFile(Path.Combine(filePath, latestFile.Name), processStreamCallback);
}

Jun 19, 2012 at 7:02 PM

It is a third-party system; I can try contacting them to find out more. This appears to be the format their system uses: http://cr.yp.to/ftp/list/binls.html

Coordinator
Jun 19, 2012 at 7:11 PM

I think I've found the problem in the FtpListFormatParser class. It appears to be a regression; I'm not sure how long it's been there. The last write time from long-format UNIX listings is not supposed to be used, but it is. I'll give you some code to try in a few minutes to see if it clears up the problem. After you get a chance to test it, let me know so I can push the changes up.

Coordinator
Jun 19, 2012 at 7:36 PM

Alright, here's some code to try. Put this code anywhere before you call GetListing() and see if it clears up the problem. The latest revision in the source section of the site contains the fixes.

FtpListFormatParser.Parsers.Clear();

// UNIX format directory
FtpListFormatParser.Parsers.Add(new FtpListFormatParser(
    @"(d[\w-]{9})\s+\d+\s+([\w\d]+)\s+([\w\d]+)\s+\d+\s+(\w+\s+\d+\s+\d+:?\d+)\s+(.*)",
    5, -1, 0, 1, 2, 3, -1, FtpObjectType.Directory));

// UNIX format file
FtpListFormatParser.Parsers.Add(new FtpListFormatParser(
    @"(-[\w-]{9})\s+\d+\s+([\w\d]+)\s+([\w\d]+)\s+(\d+)\s+(\w+\s+\d+\s+\d+:?\d+)\s+(.*)",
    6, 4, 0, 1, 2, 3, -1, FtpObjectType.File));
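To see what the file pattern above captures, the sketch below applies the same regular expression to one line from the listing posted earlier, using plain `System.Text.RegularExpressions`. `UnixFileEntry` and `UnixFileLine` are hypothetical names for illustration; the sketch does not attempt to interpret the numeric constructor arguments of `FtpListFormatParser`.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical result type for illustration only.
public sealed class UnixFileEntry
{
    public string Permissions { get; set; }
    public string Owner { get; set; }
    public string Group { get; set; }
    public long Size { get; set; }
    public string Date { get; set; }
    public string Name { get; set; }
}

public static class UnixFileLine
{
    // The "UNIX format file" pattern from the snippet above, unchanged.
    static readonly Regex FilePattern = new Regex(
        @"(-[\w-]{9})\s+\d+\s+([\w\d]+)\s+([\w\d]+)\s+(\d+)\s+(\w+\s+\d+\s+\d+:?\d+)\s+(.*)");

    // Returns null when the line is not a plain-file entry (e.g. a directory).
    public static UnixFileEntry Parse(string line)
    {
        Match m = FilePattern.Match(line);
        if (!m.Success)
            return null;

        return new UnixFileEntry
        {
            Permissions = m.Groups[1].Value, // e.g. "-rwxrwxr-x"
            Owner = m.Groups[2].Value,
            Group = m.Groups[3].Value,
            Size = long.Parse(m.Groups[4].Value, CultureInfo.InvariantCulture),
            Date = m.Groups[5].Value,        // raw date text, e.g. "Jun 01 16:37"
            Name = m.Groups[6].Value,
        };
    }
}
```

Note that the owner and group groups use `[\w\d]+`, which does not include `-` or `.`, so a line whose owner is, say, `www-data` would not match this pattern.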

Coordinator
Jun 19, 2012 at 7:38 PM
Edited Jun 19, 2012 at 7:40 PM

If this works I strongly recommend getting the latest revision from the source section and removing the above code from your project. The list format you sent looks pretty standard and there is no reason you should have to modify the Parsers collection like we're doing above. The latest code should eliminate the problem without the above hack.

I also forgot to mention that with the parsers above, FtpListItem.Modify will equal DateTime.MinValue. The latest code loads the modify time automatically. One workaround is to use FtpDirectory and FtpFile instead. FtpDirectory can get a file listing and is actually the preferred way to do what you're doing above. FtpDirectory and FtpFile closely mirror the functionality of System.IO.Directory and System.IO.File, if you're familiar with those classes.

Jun 19, 2012 at 7:41 PM

I am getting a "cannot resolve constructor" error; it looks like there is one more integer argument than expected.

Coordinator
Jun 19, 2012 at 7:43 PM
Ah yes, because you're using an older code base. The binary you
downloaded is lacking many new features and bug fixes. Again, I urge
you to get the latest code base from the source section.
Jun 19, 2012 at 7:50 PM

I downloaded the newest version and commented out the above code again. Whether it is commented out or not, the modified time is now 1/1/0001 12:00 AM.

Coordinator
Jun 19, 2012 at 7:50 PM

I've uploaded a new binary for your convenience: http://netftp.codeplex.com/releases/view/89836

Coordinator
Jun 19, 2012 at 7:54 PM

That is the default behavior of FtpListItem. The new code (and binary) should be calling MDTM to load the last write time; you should see one MDTM call for each object returned in the file listing. If you don't see MDTM in the transaction log after LIST is called, make sure you've got the latest revision. Sometimes CodePlex is slow to show new data. The changeset is 2d8b3df1518c.

Jun 19, 2012 at 8:01 PM

This is what I am getting for my output.

< LIST *2012*.txt
> 150 Here comes the directory listing.
> -rwxrwxr-x    1 ftp      ftp         26176 Feb 01 17:52 mr005aeci2012010100est10.txt
> -rwxrwxr-x    1 ftp      ftp         26176 Feb 06 17:08 mr005aeci2012010100est25.txt
> -rwxrwxr-x    1 ftp      ftp         25707 Mar 01 18:03 mr005aeci2012020100est10.txt
> -rwxrwxr-x    1 ftp      ftp         25707 Mar 06 16:27 mr005aeci2012020100est20.txt
> -rwxrwxr-x    1 ftp      ftp         25774 Apr 02 16:37 mr005aeci2012030100est10.txt
> -rwxrwxr-x    1 ftp      ftp         25841 Apr 05 13:39 mr005aeci2012030100est45.txt
> -rwxrwxr-x    1 ftp      ftp         25841 May 01 17:09 mr005aeci2012040100est10.txt
> -rwxrwxr-x    1 ftp      ftp         25841 May 04 16:12 mr005aeci2012040100est30.txt
> -rwxrwxr-x    1 ftp      ftp         25841 Jun 01 16:37 mr005aeci2012050100est20.txt
> -rwxrwxr-x    1 ftp      ftp         25841 Jun 06 14:22 mr005aeci2012050100est45.txt
> 226 Directory send OK.
>
< MDTM *2012*.txt/mr005aeci2012010100est10.txt
> 550 Could not get file modification time.
< MDTM *2012*.txt/mr005aeci2012010100est25.txt
> 550 Could not get file modification time.
< MDTM *2012*.txt/mr005aeci2012020100est10.txt
> 550 Could not get file modification time.
< MDTM *2012*.txt/mr005aeci2012020100est20.txt
> 550 Could not get file modification time.
< MDTM *2012*.txt/mr005aeci2012030100est10.txt
> 550 Could not get file modification time.
< MDTM *2012*.txt/mr005aeci2012030100est45.txt
> 550 Could not get file modification time.
< MDTM *2012*.txt/mr005aeci2012040100est10.txt
> 550 Could not get file modification time.
< MDTM *2012*.txt/mr005aeci2012040100est30.txt
> 550 Could not get file modification time.
< MDTM *2012*.txt/mr005aeci2012050100est20.txt
> 550 Could not get file modification time.
< MDTM *2012*.txt/mr005aeci2012050100est45.txt
> 550 Could not get file modification time.

Coordinator
Jun 19, 2012 at 8:04 PM
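While the LIST command on the server supports globs (wildcard
patterns), this code does not: it builds each entry's path from the
argument you pass, which is why the MDTM commands above were sent for
paths like *2012*.txt/mr005aeci2012010100est10.txt. Use the full path
to the directory you're listing, e.g., cl.GetListing("./") to list the
current directory, and filter out what you don't want from the array
of files returned.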
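Following that advice, the filtering can be done client-side by turning the wildcard pattern into a regular expression and applying it to the names in the full listing. A minimal sketch, assuming only `*` and `?` wildcards are needed and that matching should be case-sensitive; `GlobFilter` is a hypothetical helper, and in practice the names would come from the `Name` property of each item returned by `GetListing`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of System.Net.FtpClient.
public static class GlobFilter
{
    // Converts a simple glob ('*' = any run of characters, '?' = one character)
    // into an anchored regular expression. Everything else is matched literally.
    public static Regex ToRegex(string glob)
    {
        string body = Regex.Escape(glob)
            .Replace(@"\*", ".*")
            .Replace(@"\?", ".");
        return new Regex("^" + body + "$");
    }

    // Keeps only the names matching the glob. Matching is case-sensitive
    // (an assumption; adjust if the server treats names case-insensitively).
    public static List<string> Filter(IEnumerable<string> names, string glob)
    {
        Regex regex = ToRegex(glob);
        return names.Where(name => regex.IsMatch(name)).ToList();
    }
}
```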
Jun 19, 2012 at 8:23 PM
Edited Jun 19, 2012 at 8:25 PM

Thank you for all your help. I got it to work without having to filter the full listing myself. MDTM still fails during the initial listing, but after getting all the files that match the pattern, I can loop over that list and create a new FtpFile for each one; then MDTM works and I can read LastWriteTime. I will probably change this method to download the FtpFile directly rather than passing the FullName string, but for anyone who needs an example in the future, here is my working method:

public void DownloadLatestFile(string filePath, string filePattern, Action<StreamReader> processStreamCallback)
{
    string latestFile;
    using (var client = GenerateFtpClient())
    {
        // log the FTP transaction to the console
        client.FtpLogStream = Console.OpenStandardOutput();
        client.FtpLogFlushOnWrite = true;
        client.SetWorkingDirectory(filePath);
        client.DataChannelType = FtpDataChannelType.Passive;

        var files = client.GetListing(filePattern);

        // wrap each entry in an FtpFile so LastWriteTime is loaded
        // with MDTM, then take the most recently modified one
        latestFile = files
            .Select(p => new FtpFile(client, Path.Combine(filePath, p.Name)))
            .OrderByDescending(p => p.LastWriteTime)
            .First()
            .FullName;
    }
    DownloadFile(latestFile, processStreamCallback);
}
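One thing to watch in the method above: `System.IO.Path.Combine` uses the local directory separator, so on Windows it inserts `\` between the parts unless the first part already ends with a separator, while the UNIX-style server in this thread uses `/`. A minimal sketch of an FTP-specific join, assuming `/`-separated remote paths; `FtpPath` is a hypothetical helper, not a library class.

```csharp
using System;

// Hypothetical helper for illustration; not part of System.Net.FtpClient.
public static class FtpPath
{
    // Joins a remote directory and a name with '/', whatever the local OS.
    public static string Combine(string directory, string name)
    {
        if (string.IsNullOrEmpty(directory))
            return name;

        // An absolute name wins, mirroring Path.Combine.
        if (name.StartsWith("/", StringComparison.Ordinal))
            return name;

        return directory.TrimEnd('/') + "/" + name;
    }
}
```

In the method above, `FtpPath.Combine(filePath, p.Name)` would take the place of `Path.Combine(filePath, p.Name)`.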
Jun 19, 2012 at 8:26 PM

Thank you for all of your help. I do have a couple of questions about releases. I am researching FTP libraries for a team of developers, and we want something that is actively maintained, which this one clearly is. When do you normally build a new DLL and post it as the current release? The version I downloaded was from March, but there have been a lot of revisions since then.

Coordinator
Jun 19, 2012 at 8:44 PM

We make releases when we believe they are tested enough to not give people fits when trying to use it. Unfortunately people do not report bugs that often so issues like this one sometimes go unnoticed for months. It is difficult to ensure reliability on every FTP server configuration out there and sometimes things just get broken. If you want to use this library the best path forward is to use the latest source revision and let us know when you run across a bug. We do not get paid to work on this project so bugs get fixed when we get around to it. I know that might sound bad but that's just the reality of the situation. For what it's worth I try to fix things as quickly as possible but I make no guarantees that's how things will always play out and I make no guarantee about the reliability of this code.

Developer
Jun 22, 2012 at 8:36 AM
Edited Jun 22, 2012 at 10:14 AM

Hi !

I think this thread is related to the regression fix and the use of the MDTM command?

I don't understand why the date should not be read from the LIST output. Apart from switching from a time of day to a year when the file is old, the date is still accurate.

The way I did it was:

TryParse the date; if that fails, this is an "exotic" date format that TryParse doesn't understand, and then I use TryParseExact with format strings matching the unknown date format.

So there are two possibilities: either the version used predates this change, or this is a new, unknown format that could have been added to the list of strings to match.

string[] formats = new string[] { "MMM dd HH:mm", "MMM dd  yyyy" };

I know this may not be the best solution, and it may be possible to use a more powerful regex or find another way to parse the date. But it avoids calling MDTM on each item of the list, which could really slow down listing directories with a lot of files, and even more so over a slow connection.

And also, what happens if the server doesn't support MDTM?

EDIT: First, I can confirm this fix doesn't work on servers without MDTM.
Second, it seems to work better to call TryParseExact first and then, if the date is still DateTime.MinValue, try TryParse.
If TryParse doesn't work either, the result should be DateTime.MinValue, so we could try MDTM whenever the DateTime is MinValue.
The edge case is when TryParse succeeds but returns the wrong date.

So I think we should use both solutions.

Coordinator
Jun 22, 2012 at 12:56 PM

I think you're right that both options should be available. We should prefer MDTM over the date in a LIST listing when MDTM is listed in FEAT; otherwise, fall back to the date/time in the LIST listing, with an option to turn off MDTM for people who want performance over date/time accuracy. For starters, I think we need to remove public access to FtpListItem and force people to use FtpFile and FtpDirectory, which is what I really intended. Returning an array of FtpListItem objects from FtpClient.GetListing() was a poor decision on my part when starting this project. FtpFile and FtpDirectory load last write times automatically when they can, and performance-wise they're better because they don't call MDTM for every file in the list unless the user actually reads the modify time of each file. The LastWriteTime property of FtpFileSystemObject detects whether it can, or needs to, call MDTM when its getter is accessed.
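The strategy proposed here (use MDTM when FEAT lists it, otherwise fall back to the LIST date, with an opt-out for speed) can be sketched from the raw FEAT reply. Per RFC 3659, MLSD support is advertised by the `MLST` line in FEAT, and feature lines start with a single space. `FeatReply`, `ModifyTimeSource`, and the `preferSpeed` flag are hypothetical names for illustration; the library tracks capabilities itself (for example via `FtpCapability`).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical names for illustration; not part of System.Net.FtpClient.
public enum ModifyTimeSource { Mlsd, Mdtm, ListDate }

public static class FeatReply
{
    // Collects feature names from a multi-line FEAT reply such as:
    //   211-Features:
    //    MDTM
    //    REST STREAM
    //   211 End
    public static HashSet<string> ParseFeatures(IEnumerable<string> replyLines)
    {
        var features = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        foreach (string line in replyLines)
        {
            // Feature lines begin with a space; the 211 status lines do not.
            if (line.Length < 2 || line[0] != ' ')
                continue;

            // Keep only the feature name ("REST" from "REST STREAM",
            // "MLST" from "MLST type*;size*;modify*;").
            string feature = line.Trim();
            int space = feature.IndexOf(' ');
            features.Add(space < 0 ? feature : feature.Substring(0, space));
        }
        return features;
    }

    // MLST in FEAT implies MLSD support; otherwise prefer MDTM unless the
    // caller has opted to trade accuracy for speed.
    public static ModifyTimeSource Choose(ISet<string> features, bool preferSpeed)
    {
        if (features.Contains("MLST"))
            return ModifyTimeSource.Mlsd;
        if (features.Contains("MDTM") && !preferSpeed)
            return ModifyTimeSource.Mdtm;
        return ModifyTimeSource.ListDate;
    }
}
```

Fed the FEAT reply from the transaction log earlier in this thread (which lists MDTM but not MLST), `Choose` picks MDTM, or the LIST date when `preferSpeed` is true.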

Developer
Jun 22, 2012 at 2:21 PM

I think the problem with forcing the use of FtpFile and FtpDirectory is that we move away from "true" FTP. GetListing should return the same information as GetRawListing, without needing to call other methods, because that is what is expected from the LIST command. Then, only if users need it, they can call MDTM or any other command to get more, or more accurate, information.

This means that after a call to LIST, I should be able to see all the information in my objects, matching what is in the raw listing. The fact that FtpFile and FtpDirectory load their data on demand breaks this behavior.

Coordinator
Jun 22, 2012 at 2:46 PM
Edited Jun 22, 2012 at 2:47 PM

Alright, I'm reverting the changes, except that, per your suggestion, we'll use if(!TryParseExact()) { if(!TryParse()) { date = DateTime.MinValue; } } so that we try known formats first with TryParseExact, then a free-for-all with TryParse(), and if that fails we give up with DateTime.MinValue. In addition, I'm adding a format which I think may be related to why this thread was started: MMM[sp][sp]d[sp]HH:mm, which I see ls -l producing on Mac OS X and Linux. The long listing format on these two systems doesn't produce a two-digit day for single-digit days, i.e., we just get 1 instead of 01, so I think "MMM[sp]dd[sp]HH:mm" may not have been catching those. I'll try to add some documentation stating that if FtpListItem.Modify == DateTime.MinValue, you should call FtpClient.GetLastWriteTime() to try to load the last write time. That way it's completely up to the user of the code and doesn't cause a performance hit for users who don't care.
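The parse order described here can be sketched as follows. This is an illustration, not the library's actual parser: `ListDateParser` is a hypothetical name, and the format list only contains the formats mentioned in this thread (`MMM dd HH:mm`, `MMM dd  yyyy`, and the new `MMM  d HH:mm`), matched against the invariant culture.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration; not the library's parser.
public static class ListDateParser
{
    // Known LIST date formats, tried in order by TryParseExact.
    // Only the formats mentioned in this thread; not an exhaustive list.
    static readonly string[] KnownFormats =
    {
        "MMM dd HH:mm",  // "Jun 01 16:37"
        "MMM dd  yyyy",  // "Jun 01  2011"
        "MMM  d HH:mm",  // "Jun  1 16:37" (single-digit day, space-padded)
    };

    public static DateTime Parse(string text)
    {
        DateTime date;
        // 1. known formats first
        if (!DateTime.TryParseExact(text, KnownFormats, CultureInfo.InvariantCulture,
                DateTimeStyles.None, out date))
        {
            // 2. then a free-for-all
            if (!DateTime.TryParse(text, CultureInfo.InvariantCulture,
                    DateTimeStyles.None, out date))
            {
                // 3. give up
                date = DateTime.MinValue;
            }
        }
        return date;
    }
}
```

By default, spaces in a `TryParseExact` custom format appear to need an exact match unless a style such as `DateTimeStyles.AllowInnerWhite` is used, which is why the single-digit-day variant gets its own format. When a format has no year, .NET fills in the current year, and the free-for-all `TryParse` fallback can still succeed with a wrong date, which is the edge case raised above.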

Coordinator
Jun 22, 2012 at 4:03 PM

Alright StrAbz, I've pushed up the reverted and new changes we discussed. I've tested them lightly, and they seem to be working OK.

Jun 22, 2012 at 7:58 PM

So, I have tested this, and it appears that GetListing() now returns the correct modified date. Is it safe to depend on this always being filled with a correct value? I am creating a utility class that should work with any FTP connection, so I want to know whether I can count on the Modify field and remove the extra logic of creating a new FtpFile (below).

var latestFile = files.Select(p => new FtpFile(client, Path.Combine(filePath, p.Name))).OrderByDescending(p => p.LastWriteTime).First();

Coordinator
Jun 22, 2012 at 8:16 PM
Edited Jun 22, 2012 at 8:30 PM

Because there is no written specification for LIST formats, no, it's not always safe to depend on the modification time being correct. We can only give it our best effort. I added a project called FileListing to the source tree; it's available in the latest revision. It contains an example you can build on to get last write times accurate to the second if the server supports the MDTM command. Essentially, this is what you want if modification times are important to you:

foreach(FtpListItem o in cl.GetListing(path)) {
	if(cl.HasCapability(FtpCapability.MLSD)) {
		// the server supports MLSD, so we should almost
		// certainly have an accurate date/time representation
		// of the modification date already; o.Modify can be
		// used as-is.
	}
	else if(cl.HasCapability(FtpCapability.MDTM)) {
		// the server does not support MLSD and the last
		// write time is important. the server does support
		// MDTM, which can give us the most accurate date/time
		// representation of the last write time.
		DateTime modify = cl.GetLastWriteTime("{0}/{1}", path, o.Name);

		// make sure the MDTM command succeeded
		if(modify != DateTime.MinValue) {
			o.Modify = modify;
		}
		else {
			// the MDTM command failed. do you care? if
			// so, this is the place to trigger a failure.
		}
	}
	else {
		// the server doesn't support any way to get modification times
		// other than what was provided by LIST, if anything. this means
		// you are stuck with whatever is in o.Modify. if the value is
		// DateTime.MinValue, either the modification time was not available
		// or there was a problem parsing it. if there was a problem, you
		// should provide the complete listing from the transaction log
		// as well as a list of the entries that are invalid so that
		// we can look into adding support to our parsers.
	}
}

If you want the best performance and accuracy, you need to use a modern FTP server that supports MLSD, which is specifically designed to be parsed by a machine. The LIST format is designed to be read by humans and is not parser-friendly. The actual formatting of LIST entries is left up to the developer of the FTP server software. Most server software sticks to UNIX's ls long listing format, but some use variations of it that might not be compatible with our parsers. IIS, for example, can give listings in either ls long listing format or DIR (DOS) style. If you can't use a server that supports MLSD, my advice is to verify that the modification times match what is returned in the file listing by checking the transaction log. If you run across entries that are wrong, please send us the complete file listing and point out the entries that were not parsed correctly, and we'll look at adding the format to our parsers. This is the best we can do for LIST formats; we can't guarantee that every format you might run across can be parsed out of the box. It just isn't feasible.

-- EDIT --

Also keep in mind that you can add your own parsers, have a look at the documentation page on this site for more information and examples about how to do that.
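Because MLSD output is meant for machines, parsing it is much simpler than parsing LIST. Per RFC 3659, each line is a list of `fact=value;` pairs, a single space, and then the name, and the `modify` fact uses the same UTC `YYYYMMDDHHMMSS` form as MDTM. A minimal sketch under those assumptions; `MlsdEntry` is a hypothetical type, and the library already parses MLSD listings itself when the server supports them.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical type for illustration; not part of System.Net.FtpClient.
public sealed class MlsdEntry
{
    public string Name { get; private set; }

    // Fact names are case-insensitive in RFC 3659.
    public Dictionary<string, string> Facts { get; } =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

    // Parses a line such as
    // "type=file;size=25841;modify=20120601163700; mr005aeci2012050100est20.txt"
    public static MlsdEntry Parse(string line)
    {
        // Facts and the name are separated by the first space.
        int space = line.IndexOf(' ');
        if (space < 0)
            throw new FormatException("MLSD line has no pathname: " + line);

        var entry = new MlsdEntry { Name = line.Substring(space + 1) };
        string[] facts = line.Substring(0, space)
            .Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string fact in facts)
        {
            int eq = fact.IndexOf('=');
            if (eq > 0)
                entry.Facts[fact.Substring(0, eq)] = fact.Substring(eq + 1);
        }
        return entry;
    }

    // The "modify" fact as UTC, or DateTime.MinValue if absent or malformed.
    public DateTime Modify
    {
        get
        {
            string value;
            DateTime utc;
            if (Facts.TryGetValue("modify", out value) &&
                DateTime.TryParseExact(value.Split('.')[0], "yyyyMMddHHmmss",
                    CultureInfo.InvariantCulture,
                    DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal,
                    out utc))
            {
                return utc;
            }
            return DateTime.MinValue;
        }
    }
}
```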

Jun 22, 2012 at 8:36 PM

For the final else (no MDTM support), I assume that creating a new FtpFile to get LastWriteTime will not work, because FtpFile uses MDTM to get that time. Is that correct?

Coordinator
Jun 22, 2012 at 8:53 PM

That is correct.

Coordinator
Jun 22, 2012 at 8:56 PM
Edited Jun 22, 2012 at 9:00 PM

I should mention that IIS DOS-format listings provide accurate date/time representations and are well supported by this library. If you are hitting an IIS server that uses this format for file listings, you do not need to worry about calling MDTM; in fact, IIS doesn't accept MDTM against a directory, only against files. Several other servers behave the same way: MDTM only works on files.
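For comparison with the UNIX format, a line in IIS's DOS-style listing always carries both a date and a time of day, which is why it needs no MDTM round trip. A minimal sketch of parsing one such line, assuming IIS's default layout (`MM-dd-yy`, a 12-hour time with AM/PM, then `<DIR>` or a size, then the name); the exact layout can vary with server settings, and `DosListEntry` is a hypothetical type rather than the library's parser.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical type for illustration; not part of System.Net.FtpClient.
public sealed class DosListEntry
{
    public DateTime Modify { get; private set; }
    public bool IsDirectory { get; private set; }
    public long Size { get; private set; }
    public string Name { get; private set; }

    // Assumed layout: date, 12-hour time with AM/PM, "<DIR>" or a size, then the name.
    static readonly Regex Line = new Regex(
        @"^(\d{2}-\d{2}-\d{2,4})\s+(\d{1,2}:\d{2}[AP]M)\s+(<DIR>|\d+)\s+(.+)$");

    // Two-digit years by default; four-digit years as an assumed alternative.
    static readonly string[] DateFormats = { "MM-dd-yy h:mmtt", "MM-dd-yyyy h:mmtt" };

    // Returns null when the line does not look like a DOS-style entry.
    public static DosListEntry Parse(string line)
    {
        Match m = Line.Match(line);
        if (!m.Success)
            return null;

        DateTime modify;
        if (!DateTime.TryParseExact(m.Groups[1].Value + " " + m.Groups[2].Value,
                DateFormats, CultureInfo.InvariantCulture, DateTimeStyles.None, out modify))
            return null;

        bool isDir = m.Groups[3].Value == "<DIR>";
        return new DosListEntry
        {
            Modify = modify,
            IsDirectory = isDir,
            Size = isDir ? 0 : long.Parse(m.Groups[3].Value, CultureInfo.InvariantCulture),
            Name = m.Groups[4].Value,
        };
    }
}
```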