ZBasic Performance Measurements

 
Forum Index -> ZBasic Language
mikep



Joined: 24 Sep 2005
Posts: 765
Location: Austin, TX

Posted: 26 February 2006, 19:59 PM    Post subject: ZBasic Performance Measurements

I'm trying to measure the performance of ZBasic. Here is my test program:
Code:
Const count as Long = 100000
Sub Main()
   Dim b as Byte
   Dim i as Long
   Dim t as Single, o as Single
   
   ' empty loop
   t = timer()
   For i=1 to count
   Next
   o = timer() - t
   Debug.Print "Loop Overhead:";CStr(o)
   
   ' measurement loop
   t = timer()
   For i=1 to count
      b = b + 1
   Next
   t = timer() - t - o
   Debug.Print "Time for ";CStr(count); " iterations:";CStr(t)
   Debug.Print "Number of iterations per second "; CStr(CSng(count)/t)
End Sub
When b is declared as a Byte, I get this result:
Code:
Number of iterations per second 99805.07


When b is an Integer or Long, the speed is slightly lower, presumably because the addition takes more work.

However, this result doesn't match the stated claim of 160,000 instructions per second. What am I missing here?

BTW, when the same program is run under BasicX, the result is 27,061 iterations per second.
dkinzer
Site Admin


Joined: 03 Sep 2005
Posts: 2499
Location: Portland, OR

Posted: 27 February 2006, 4:55 AM    Post subject:

The claim of 160,000 instructions per second (maximum) is based on the measured execution time of a byte increment instruction. The test method is to measure the execution time of a loop without the instruction under test and then measure it again with that instruction added. Here is the code for the basic loop that establishes the baseline speed.
Code:

Sub Main()
   ' make bit 0 of Port C an output
   Register.DDRC = &H01
   Do
      ' toggle bit 0 of Port C on every pass through the loop
      Register.PortC = Register.PortC Xor &H01
   Loop
End Sub


When this program is run, pin 12 will produce a square wave whose period is twice the nominal loop execution time. The period is measured with an oscilloscope or logic analyzer.

The program is then modified as shown below, adding a statement that generates a byte increment instruction.
Code:

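Dim b as Byte
Sub Main()
   ' make bit 0 of Port C an output
   Register.DDRC = &H01
   Do
      ' toggle bit 0 of Port C on every pass through the loop
      Register.PortC = Register.PortC Xor &H01
      ' statement under test: increment a module-level Byte
      b = b + 1
   Loop
End Sub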


When this program is compiled and run, the increase in the period of the output on pin 12 is attributable to the added statement. The measured difference is twice the execution time of that statement.
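The arithmetic of this two-run method can be sketched in a few lines of Python. The function name and both periods below are hypothetical placeholders, chosen only so the result lands on the 160,000 figure; they are not measurements from this thread.

```python
def instruction_rate(period_base_s, period_test_s):
    """Instructions per second from two square-wave periods (in seconds).

    The pin toggles once per loop pass, so one full period covers two
    passes; the period increase is therefore two executions of the
    added statement.
    """
    delta = period_test_s - period_base_s
    if delta <= 0:
        raise ValueError("the test period must be longer than the baseline")
    per_statement_s = delta / 2
    return 1.0 / per_statement_s


# Hypothetical scope readings, not measurements from this thread.
baseline_period = 20.0e-6   # loop without the increment
test_period = 32.5e-6       # loop with b = b + 1 added
print(f"{instruction_rate(baseline_period, test_period):,.0f} per second")
```

Halving the period difference is the key step: measuring the full period rather than a single edge-to-edge interval is what makes the difference equal to two executions of the statement.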

Note that in this case the tested statement increments a module-level variable, which compiles to a single pcode instruction. Your test code increments a local variable, for which the following code is generated:
Code:
                  b = b + 1
006e 0d0000      PSHR_A         bp+0
0071 c4          INCI_B

There is no pcode instruction for incrementing a variable whose address is relative to the stack frame. There is, however, an instruction for incrementing a variable whose address is on the top of the stack (TOS), so that one is used: it is still more efficient than pushing the variable's value, incrementing it on the TOS and then popping it back into the variable.

There are 8-bit, 16-bit and 32-bit versions of each of the three addressing-mode variants of the increment instruction, as well as corresponding decrement instructions.
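A toy stack machine makes the difference between the two cases concrete. This is a sketch under stated assumptions: only the mnemonics PSHR_A and INCI_B come from the listing above; the one-instruction form for a module-level variable (called INC_B here), the W and L width suffixes, the memory addresses, the operand format, and the assumption that INCI_B pops the address are all hypothetical and are not ZBasic's actual pcode encoding.

```python
# Toy stack machine (hypothetical encoding, not ZBasic's real pcode).
# Width suffixes: B = 8-bit, W = 16-bit, L = 32-bit (W and L are invented here).
MASKS = {"B": 0xFF, "W": 0xFFFF, "L": 0xFFFFFFFF}


def run(program, memory, bp):
    """Execute a list of (opcode, operand) pairs; return how many ran."""
    stack = []
    for op, arg in program:
        if op == "PSHR_A":
            # Push the address of a frame-relative variable (bp + offset).
            stack.append(bp + arg)
        elif op.startswith("INCI_"):
            # Increment the variable whose address is on top of the stack
            # (assumed here to pop that address).
            addr = stack.pop()
            memory[addr] = (memory[addr] + 1) & MASKS[op[-1]]
        elif op.startswith("INC_"):
            # Hypothetical one-instruction form: operand is an absolute address.
            memory[arg] = (memory[arg] + 1) & MASKS[op[-1]]
        else:
            raise ValueError(f"unknown opcode {op}")
    return len(program)


memory = {100: 0, 200: 255}  # 100: module-level b, 200: local b at bp+0
bp = 200

n_global = run([("INC_B", 100)], memory, bp)                   # module-level Byte
n_local = run([("PSHR_A", 0), ("INCI_B", None)], memory, bp)   # local Byte

print(n_global, n_local, memory)  # 1 2 {100: 1, 200: 0}
```

The local Byte starts at 255 to show that the 8-bit mask makes it wrap to 0, just as an unsigned byte would; the masks for the other widths behave the same way at 16 and 32 bits.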


Last edited by dkinzer on 27 February 2006, 19:23 PM; edited 1 time in total
mikep



Joined: 24 Sep 2005
Posts: 765
Location: Austin, TX

Posted: 27 February 2006, 6:44 AM    Post subject:

Yes, my mistake. I forgot to check the list file. When I make b a global instead of a stack-relative local, I get the following (rounded) results:
  • Byte = 168,977 increments per second
  • Integer = 158,514 increments per second
  • Long = 145,868 increments per second
spamiam



Joined: 13 Nov 2005
Posts: 665

Posted: 27 February 2006, 16:28 PM    Post subject:

mikep wrote:
Yes, my mistake. I forgot to check the list file. When I make b a global instead of a stack-relative local, I get the following (rounded) results:
  • Byte = 168,977 increments per second
  • Integer = 158,514 increments per second
  • Long = 145,868 increments per second


Wow, I would have expected a GREATER performance hit for a long vs a byte.

The AVR machine code under optimal circumstances consists of a subtract, then 3 subtracts w/ carry. (The AVR has no add-immediate instruction for a single register, so adding 1 is done by subtracting -1.)

So a long increment takes the hardware four times as long as a byte increment. I suppose this means that most of the time for a ZBasic instruction is spent on operations other than reading the data from memory into registers and doing the arithmetic.

I wonder what the raw hardware speed is for a byte, int and long increment (or add). Does anyone here play with AVRs in C or assembly and have a scope? I don't have a scope, and I only use C, and the compiler optimizes away trivial code like repetitive increments unless I turn off ALL optimizations.

-Tony
stevech



Joined: 23 Feb 2006
Posts: 657

Posted: 27 February 2006, 16:38 PM    Post subject:

I did a little test with something like this:
Code:
flag = TRUE
Do
   n = n + 1   ' n is an UnsignedLong
Loop While flag

Before entering the loop, I had started a task that sleeps for 10 seconds and then clears flag. I then printed n divided by 10 and got about 360,000.
I did it this way to reduce the loop overhead per iteration. I'll try it again with n as a global rather than a local.
mikep



Joined: 24 Sep 2005
Posts: 765
Location: Austin, TX

Posted: 27 February 2006, 16:52 PM    Post subject:

spamiam wrote:
Wow, I would have expected a GREATER performance hit for a long vs a byte.

Given a clock speed of 14.7456 MHz, 168,977 increments per second means about 87 clocks per increment.

Assuming 3 machine instructions for the byte increment itself, the remaining clocks are spent fetching the pcode from EEPROM, decoding the instruction, and loading/storing values on the ZBasic stack.

Similarly, a Long increment takes about 101 clocks. The extra 14 clocks are the instructions that handle the more complex Long increment and load/save 4 bytes instead of 1.
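The clock arithmetic above is easy to check. This sketch divides the 14.7456 MHz clock by each of the rounded rates reported earlier in the thread (the Integer case is included for completeness):

```python
CLOCK_HZ = 14_745_600  # 14.7456 MHz system clock, as stated above

# Rounded increment rates for a global variable, as reported in the thread.
rates = {"Byte": 168_977, "Integer": 158_514, "Long": 145_868}

# Clocks spent per increment = clock rate / increments per second.
clocks = {name: CLOCK_HZ / rate for name, rate in rates.items()}
for name, c in clocks.items():
    print(f"{name:8s} {c:6.1f} clocks per increment")

print(f"Long - Byte: {clocks['Long'] - clocks['Byte']:.1f} extra clocks")
```

This gives roughly 87, 93 and 101 clocks per increment for Byte, Integer and Long, so the Long costs about 14 clocks more than the Byte.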
dkinzer
Site Admin


Joined: 03 Sep 2005
Posts: 2499
Location: Portland, OR

Posted: 27 February 2006, 17:02 PM    Post subject:

The setup code is nearly the same for byte, word and dword increment/decrement, explaining why the times are not proportional to operand length.
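One way to see this in the thread's own numbers is to fit a straight line of clocks per increment against operand size. With only three rounded data points this is a rough illustration, and the linear model (a fixed setup cost plus a per-byte cost) is an assumption, not a description of the VM's internals:

```python
CLOCK_HZ = 14_745_600  # 14.7456 MHz, per the earlier post

# (operand size in bytes, measured increments per second) from the thread
samples = [(1, 168_977), (2, 158_514), (4, 145_868)]

xs = [size for size, _ in samples]
ys = [CLOCK_HZ / rate for _, rate in samples]  # clocks per increment

# Ordinary least-squares fit: clocks = fixed + per_byte * size
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
per_byte = sxy / sxx
fixed = mean_y - per_byte * mean_x

print(f"fixed ~ {fixed:.1f} clocks, per byte ~ {per_byte:.1f} clocks")
```

The fit comes out to roughly 83 clocks of fixed cost plus about 4.5 clocks per operand byte, which is consistent with the shared setup code dominating the total.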

Quote:
The AVR machine code under optimal circumstances consists of a subtract, then 3 subtracts w/ carry.


The optimal code is a word add, then 2 byte adds with carry. The timing is the same as the sequence that you described but the code is two bytes shorter this way.
Code:
   adiw   r24, 1      ; add 1 to the register pair r25:r24 (low word)
   adc    r22, zero   ; add the carry into r22
   adc    r23, zero   ; add the carry into r23
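To see how the carry ripples through this sequence, here is a small Python simulation of the three instructions on 8-bit registers. It assumes, as the use of ADIW on r24 implies, that the low word of the Long sits in r25:r24 and the high word in r23:r22, and that `zero` names a register holding 0; the helper names are made up for this sketch.

```python
def unpack(value):
    """Spread a 32-bit value over the assumed registers (r24 = LSB, r23 = MSB)."""
    return {"r24": value & 0xFF, "r25": (value >> 8) & 0xFF,
            "r22": (value >> 16) & 0xFF, "r23": (value >> 24) & 0xFF}


def pack(regs):
    """Reassemble the 32-bit value from the same register layout."""
    return (regs["r23"] << 24) | (regs["r22"] << 16) | (regs["r25"] << 8) | regs["r24"]


def increment_long(regs):
    """Model: adiw r24, 1 / adc r22, zero / adc r23, zero."""
    # adiw r24, 1: 16-bit add on the pair r25:r24; C is the carry out of bit 15.
    word = ((regs["r25"] << 8) | regs["r24"]) + 1
    carry = word >> 16
    word &= 0xFFFF
    regs["r24"], regs["r25"] = word & 0xFF, word >> 8
    # adc r22, zero then adc r23, zero: add 0 plus the carry, passing C along.
    for name in ("r22", "r23"):
        total = regs[name] + carry
        carry = total >> 8
        regs[name] = total & 0xFF
    return regs


for v in (0x0000FFFF, 0x00FFFFFF, 0xFFFFFFFF):
    print(hex(v), "->", hex(pack(increment_long(unpack(v)))))
```

For the cycle counts: per the AVR instruction set summary, ADIW takes 2 cycles and ADC 1, so this sequence takes 4 cycles in 6 bytes, while SUBI plus three SBCI also takes 4 cycles but needs 8 bytes.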
mikep



Joined: 24 Sep 2005
Posts: 765
Location: Austin, TX

Posted: 27 February 2006, 17:11 PM    Post subject:

stevech wrote:
I then printed n divided by 10 and got about 360,000.
I did it this way to reduce the overhead in the loop iteration. I'll try it again with n as a global rather than a local.

The proper way to eliminate the loop overhead is to time two tests (one with an empty loop and one with the code under test) and subtract, as Don and I have both remarked earlier. It turns out that the loop time is much longer than the increment time.

Are you sure you divided by 10? Your result doesn't make sense: ZBasic is not as fast as 360,000 per second. I rewrote the test using your sample code (which includes the loop overhead) and got 38,506 iterations per second.
Code:
Private flag as Boolean
Private n as UnsignedLong
Private delayStack(1 to 50) as Byte

Public Sub Main()
   flag = TRUE
   n = 0
   CallTask "DelayTask", delayStack   
   Do
      n=n+1
   Loop While flag
   Debug.Print CStr(n\10)
End Sub

Private Sub DelayTask()
   Call Sleep(10.0)
   flag = FALSE
End Sub
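A quick plausibility check supports the doubt about 360,000: converted to clocks per iteration at 14.7456 MHz, that rate leaves fewer clocks for the whole loop (increment plus loop test) than a bare Byte increment needed in the earlier measurements. The rates below are the ones quoted in this thread.

```python
CLOCK_HZ = 14_745_600  # 14.7456 MHz clock quoted earlier in the thread


def clocks_per_iteration(rate_per_s):
    """Clock cycles available per iteration at a given iteration rate."""
    return CLOCK_HZ / rate_per_s


bare_increment = clocks_per_iteration(168_977)  # global Byte increment alone
reported = clocks_per_iteration(360_000)        # recalled 360,000/s figure
measured = clocks_per_iteration(38_506)         # loop + UnsignedLong increment, rerun above

print(f"bare increment : {bare_increment:6.1f} clocks")
print(f"360,000/s claim: {reported:6.1f} clocks")
print(f"rerun above    : {measured:6.1f} clocks")
```

That works out to roughly 41 clocks per pass for the 360,000 figure, against about 87 clocks for a bare global Byte increment and about 383 clocks per pass for the rerun above.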
stevech



Joined: 23 Feb 2006
Posts: 657

Posted: 27 February 2006, 22:50 PM    Post subject:

360K is what I recall, but I may be mistaken; I'm away from home now. Your code above is virtually identical to what I did. So...

I never make mistakes (I thought I made one once - but I was wrong!)

 


All content Copyright © 2005-2012 Elba Corp. All Rights Reserved.
Opinions expressed in posts are those of the author and not necessarily those of Elba Corp.
Powered by phpBB © 2001, 2005 phpBB Group