Tag Archives: msdos

Create a Distinct Ordered List From an Unordered Duplicate List Using DOS (MS-DOS)

As I’ve stated in previous posts, I’m often forced to use minimalist tools to accomplish a job. In this installment, I challenged myself to deduplicate a list of random numbers (or any values, really) and output just the distinct ones into a new file. It’s very easy in SQL (“Select distinct…”), but there’s a little more to it when DOS is your only option. Not many people will find themselves limited in such a way, but as with many of my posts, this is more about the mental exercise and the application of techniques than it is about the end product.

To see how this works, start with a file that has about 100 lines, each line beginning with a random number between 1 and 50 inclusive. You can generate this list using Microsoft Excel. Enter the formula “=INT(RAND()*50)+1” into cell A1 and copy it down the next 99 rows. Voila: a set of random integers that’s sure to have some duplicates. Copy/paste (or just save) this set into a text file named “UnsortedDupes.txt”.

Drop the following script into a batch file within the same directory as “UnsortedDupes.txt”

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
@Echo Off
REM AUTHOR: www.dullsharpness.com

REM Read up on Delayed Variable Expansion in the SET command's help
SETLOCAL ENABLEDELAYEDEXPANSION

REM Initialize the variable that will hold previous value
REM (choose a value that you won't encounter in your list)
set prev=bogusvalue

REM Pipe the file into a sorter, and for each full sorted line, cast aside
REM the lines that match the previous line.
for /F %%i in ('type UnsortedDupes.txt ^| sort') Do (
   echo PREV1=%prev%, VAL=%%i, PREV2="!prev!"
   If NOT "!prev!"=="%%i" (echo %%i>> DeDuped.txt)
   set prev=%%i
)

REM Turn off the SETLOCAL settings
ENDLOCAL

After you run the batch file, you’ll find that all of your distinct/deduplicated values have been echoed into “DeDuped.txt.”

Worth noting:


  • The reason this script works, which I presume is obvious, is that when we iterate over a list of sorted duplicate values, each time the value we’re processing differs from the previous value we processed, we recognize that it’s a new unique value and echo it to the deduped list. And each time the value we’re processing matches the previous value processed, we know we’ve already recorded it, so we ignore it.

  • Delayed variable expansion is required for this script to work. Otherwise, the PREV variable won’t get assigned its new value each time through the loop. Behavior of DOS variable expansion is discussed in more detail here.

  • I made the sort operation more complicated (and perhaps less efficient) than it needed to be because I piped standard output into it using the TYPE command. I did this because I wanted to demonstrate the technique of piping within the FOR command, and specifically the requirement to escape the pipe character with a caret “^|”. An easier way to perform the same thing is with “sort UnsortedDupes.txt”, and the SORT help listing says that’s a faster way too.

  • Using numbers for this exercise exposes an imperfection in the SORT command: it sorts alphabetically instead of numerically. The result, for example, is that “10” comes before “2”. There does not appear to be a way to overcome the SORT command’s limitation.

  • This example shows how to dedupe a list of single values, but the FOR command allows you to examine a specific field from a delimited row. And SORT allows you to specify which character in the row to start examining. Therefore, applying either one of those tactics would allow you to deduplicate your list based on a specific field or character number. Review the help contents of FOR and SORT to learn how to use those capabilities.


That’s all for now.

Elapsed Timer Using Pure DOS (MS-DOS)

A while back, I discussed a Windows Script Host script for printing elapsed time updates to the terminal screen. It was more a mental exercise for me than something I thought would wow my readers, but a few people found it useful, which I was pleased to learn about in the comments. As mentioned in that previous post, some of the machines I work on everyday are limited in their installed software and in their ability to download the web’s finest utilities, which was the impetus for my creating such a script. But suppose you’re even more of a purist and would like to run such a script using purely DOS? The script in the listing below will do it for you.

One technique worth highlighting here is my use of the poor man’s “SLEEP” command near the bottom, which is mimicked by pinging a bogus IP address. Since DOS has no “sleep” mechanism, leveraging the ping timeout is one way to simulate sleep. Also useful is the technique for extracting hours, minutes, and seconds from the current time variable (see a separate note about that below).

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
@Echo Off
REM AUTHOR: www.dullsharpness.com
REM ~~~~~~~~~~~~~~~~~~~~~
REM     Set Variables Here
REM ~~~~~~~~~~~~~~~~~~~~~
REM Ping interval is the timeout in seconds before a ping tries again.
REM Therefore this variable dictates how often the time gets
REM printed to the screen. Consider it the "sleep" interval.
SET PINGINTERVALS=5
REM We use a bogus IP because we want the ping to timeout.
SET BOGUS_IP=128.0.0.1
REM ~~~~~~~~~~~~~~~~~~~~~
REM     END User Variables
REM ~~~~~~~~~~~~~~~~~~~~~

REM Convert the ping interval seconds to millseconds
SET /A PINGINTERVALMS=%PINGINTERVALS%*1000

REM Print Current Time
echo. | time | find /i "current"

REM Determine the start time values
FOR /F "usebackq tokens=1,2,3 delims=:" %%i in (`echo %Time:~0,8%`) DO (
 set START_HOUR=%%i
 set START_MINUTE=%%j
 set START_SECOND=%%k
)

REM Clean up leading zeroes if there are any,
REM otherwise calculations get screwy
If %START_HOUR:~0,1%==0 set START_HOUR=%START_HOUR:~1,1%
If %START_MINUTE:~0,1%==0 set START_MINUTE=%START_MINUTE:~1,1%
If %START_SECOND:~0,1%==0 set START_SECOND=%START_SECOND:~1,1%

REM Come back and iterate from this point after ping timeout
:iterate
FOR /F "usebackq tokens=1,2,3 delims=:" %%i in (`echo %Time:~0,8%`) DO (
 set CUR_HOUR=%%i
 set CUR_MINUTE=%%j
 set CUR_SECOND=%%k
)

REM Clean up leading zeroes if there are any
If %CUR_HOUR:~0,1%==0 set CUR_HOUR=%CUR_HOUR:~1,1%
If %CUR_MINUTE:~0,1%==0 set CUR_MINUTE=%CUR_MINUTE:~1,1%
If %CUR_SECOND:~0,1%==0 set CUR_SECOND=%CUR_SECOND:~1,1%

REM Calculate using a set of "CALC" values.
REM We want to leave "CUR" values untouched
set CALC_HOUR=%CUR_HOUR%
set CALC_MINUTE=%CUR_MINUTE%
set CALC_SECOND=%CUR_SECOND%

REM Start by calculating seconds, which need to rollover every minute
If %CUR_SECOND% LSS %START_SECOND% (
 set /A CALC_SECOND=%CALC_SECOND%+60-%START_SECOND%
 set /A CALC_MINUTE-=1
) Else (
 set /A CALC_SECOND-=%START_SECOND%
)

REM Minutes here. They need to rollover every hour.
If %CUR_MINUTE% LSS %START_MINUTE% (
 set /A CALC_MINUTE=%CALC_MINUTE%+60-%START_MINUTE%
 set /A CALC_HOUR-=1
) Else (
 set /A CALC_MINUTE-=%START_MINUTE%
)
 
set /A CALC_HOUR=%CALC_HOUR%-%START_HOUR%

REM Prepend a leading zero when necessary
If %CALC_HOUR% LSS 10 set CALC_HOUR=0%CALC_HOUR%
If %CALC_MINUTE% LSS 10 set CALC_MINUTE=0%CALC_MINUTE%
If %CALC_SECOND% LSS 10 set CALC_SECOND=0%CALC_SECOND%

REM Print the elapsed time
echo %CALC_HOUR%:%CALC_MINUTE%:%CALC_SECOND%

REM Prepare for the next iteration
set PREV_HOUR=%CUR_HOUR%
set PREV_MINUTE=%CUR_MINUTE%
set PREV_SECOND=%CUR_SECOND%

REM Ping a bogus IP to mimic "sleep" behavior
ping -n 1 -w %PINGINTERVALMS% %BOGUS_IP% > nul
GOTO Iterate

Note the following:

  • The ping timeout “sleep” approach is not perfect, and your elapsed time printouts won’t always be at exactly the interval you specify. But the elapsed time calculation itself is not a function of that interval, so printed times are still correct

  • Setting the ping interval to anything less than 3 or 4 seconds can cause the printouts to be a little erratic. This is related to the previous bullet about the ping/sleep approach being imperfect

  • While the FOR loop used above (line 23) to set hour/minute/second variables is effective, it could be simplified by setting the variables like so:

    set HOURVARIABLE=%TIME:~0,2%
    set MINUTEVARIABLE=%TIME:~3,2%
    set SECONDVARIABLE=%TIME:~6,2%

    This works because %TIME% is a special DOS variable that always represents the current timestamp, and the syntax shown performs string extraction. Additional special DOS variables (%DATE%, %CD%, etc.) and string extraction examples are shown in the SET command reference (type “help set” or “set /?” at a command prompt)

    The reason I chose to demonstrate the FOR loop technique in the main script is because it can be applied more universally, such as when the text strings you’re operating on are less predictable, delimited differently, etc.


  • Line 20 shows a technique I use often, wherein “echo.” issues a hard return. So I’m actually piping a hard return into the “TIME” command (which would otherwise prompt me to enter a new time). Then I’m piping that output into a “FIND” so it only grabs the line I’m interested in. Alternative approaches are:
    • “TIME /T” which prints current time as well, but not always in the format desired.
    • Echo The current time is: %TIME%” which could have produced results identical to the series of piped commands in the main listing.

  • Last Note: I think this script misbehaves when the day rolls over while it’s running. Fixing that issue is left as an exercise for the reader.

To be clear, I don’t expect this script to be one that anyone’s clamoring for. Sure, it could be useful. But it was really about the challenge of creating something seemingly simple, yet doing it with a very limited toolset. I’m publishing it primarily because it demonstrates some techniques that others might find useful when also trying to accomplish something seemingly simple using pure DOS.

As always, please leave comments if this is helpful. Enjoy.