Included files content is unavailable during compilation #5

Closed
opened 2018-07-13 12:13:29 +02:00 by belliash · 24 comments
Owner

Aer Information

  • Aer Version (or commit ref): e53cfb8ba0
  • Operating System: Linux
  • System Architecture (eg. arm, x86_64, ...): x86_64

Your problem description

Actually, include() and require() statements does not execute the code. This leads to funny situation, when code from included file is executed at the end. Lets see two examples.

  1. Example number 1:

test.php:

test2.php:

  1. Example number 2:

test.php:

test2.php:

Expected results

Test 1 goes on, test 2 fails.

Current results

Test 1 fails with:
test.php: 3 Error: Inexistant base class 'X'
Compile error

Test 2 goes on with following output:
object(Y) { }

<!-- 1. Please speak English, this is the language all of us can speak and write. 2. Please take a moment to check that your issue doesn't already exist. 3. Please give all relevant information below for bug reports, because incomplete details will be handled as an invalid report. --> # Aer Information - Aer Version (or commit ref): e53cfb8ba0 - Operating System: Linux - System Architecture (eg. arm, x86_64, ...): x86_64 # Your problem description Actually, include() and require() statements does not execute the code. This leads to funny situation, when code from included file is executed at the end. Lets see two examples. 1. Example number 1: test.php: <?php include("test2.php"); class Y extends X {} var_dump(new Y()); ?> test2.php: <?php class X {} ?> 2. Example number 2: test.php: <?php include("test2.php"); class X {} var_dump(new Y()); ?> test2.php: <?php class Y extends X {} ?> # Expected results Test 1 goes on, test 2 fails. # Current results Test 1 fails with: test.php: 3 Error: Inexistant base class 'X' Compile error Test 2 goes on with following output: object(Y) { }
belliash added the
bug
label 2018-07-13 12:13:33 +02:00
belliash changed title from Included files are executed in wrong order to Included files content is unavailable during compilation 2018-07-19 15:22:22 +02:00
Author
Owner

Should file inclusion call compiler immediately?

Should file inclusion call compiler immediately?
Member

Had a quick look, this is what I get with the actual code

object(Y) { }

and with php

object(Y)#1 (0) { }

Had a quick look, this is what I get with the actual code `object(Y) { }` and with php `object(Y)#1 (0) { }`
Author
Owner

How possible?

test.php:

test2.php:

psharp $ ./binary/psharp test.php

Error: Inexistant base class 'X' in /home/belliash/psharp/test.php:3

How possible? test.php: <?php include("test2.php"); class Y extends X {} var_dump(new Y()); ?> test2.php: <?php class X {} ?> psharp $ ./binary/psharp test.php Error: Inexistant base class 'X' in /home/belliash/psharp/test.php:3
Member

Ah sorry for the noise indeed it does not work, retried with fresh repo.

Ah sorry for the noise indeed it does not work, retried with fresh repo.
Author
Owner

This is because, it even does not try to include the file. You can type "require 'inexistent_file.php';" and it will not show any error about that. It fails on compilation process, but files are being included later.

This is because, it even does not try to include the file. You can type "require 'inexistent_file.php';" and it will not show any error about that. It fails on compilation process, but files are being included later.
Author
Owner

I think, we should include and compile file, but defer its further execution in this case.

I think, we should include and compile file, but defer its further execution in this case.
Member

Sure feel free to test your assumptions you might finish before me, you re quick to fix issues apparently :)

Sure feel free to test your assumptions you might finish before me, you re quick to fix issues apparently :)

The problem is occurring because we expect the include function to perform compile-time processing, which is not true, but the class definition occurs at compile time.

The important part is that the include function is interpreted by PH7 as being a common function, nothing more. But we are not considering it. We think you will interpret the file recursively.

The include function, at compile time, does not perform recursive compilation of the file it is referencing. The include will only run at runtime and not at compile time. It seems that the include function behaves like a function, without any extra features, as we hope.

The class definition occurs at compile time, not at runtime. This is why you are giving reference error for class Y, because the include is not run at compile time, but the definition of the class is yes.

To solve the problem, we need to change the concept of included or change the concept of class definition of the PH7. Currently, PH7 does not perform include at compile time, but compiles the class at compile time. Then, either passes the include to run at compile time, or passes the class definition to runtime.

Both of these options are not easy to implement because this is an interpreter design decision. Who wrote the interpreter specified that it should be so and can take this into account in other points of the code and features.

The problem is occurring because we expect the include function to perform compile-time processing, which is not true, but the class definition occurs at compile time. The important part is that the include function is interpreted by PH7 as being a common function, nothing more. But we are not considering it. We think you will interpret the file recursively. The include function, at compile time, does not perform recursive compilation of the file it is referencing. The include will only run at runtime and not at compile time. It seems that the include function behaves like a function, without any extra features, as we hope. The class definition occurs at compile time, not at runtime. This is why you are giving reference error for class Y, because the include is not run at compile time, but the definition of the class is yes. To solve the problem, we need to change the concept of included or change the concept of class definition of the PH7. Currently, PH7 does not perform include at compile time, but compiles the class at compile time. Then, either passes the include to run at compile time, or passes the class definition to runtime. Both of these options are not easy to implement because this is an interpreter design decision. Who wrote the interpreter specified that it should be so and can take this into account in other points of the code and features.

For a better investigation into the code:

• The VmExecIncludedFile function in the vm.c file represents the runtime include execution

• The GenStateCompileClass function in the compiler.c file represents the definition of a class.

If you put the breakpoint and you will see:

• The GenStateCompileClass function will run in the build context. Look at the beginning of the stack that will complete.

• The VmExecIncludedFile will then be executed in the execution context. Look at the beginning of the stack that will complete.

For a better investigation into the code: • The VmExecIncludedFile function in the vm.c file represents the runtime include execution • The GenStateCompileClass function in the compiler.c file represents the definition of a class. If you put the breakpoint and you will see: • The GenStateCompileClass function will run in the build context. Look at the beginning of the stack that will complete. • The VmExecIncludedFile will then be executed in the execution context. Look at the beginning of the stack that will complete.

Either the two must be run at compile time, or they must be run at runtime.

This change changes the PH7 specification. You need to assess impacts.

Either the two must be run at compile time, or they must be run at runtime. This change changes the PH7 specification. You need to assess impacts.
Owner

I am thinking about the real implementation. In my opinion we cannot move class definition to runtime as this would have huge impact to overall interpreter. Instead, we should keep on changing the concept of include files. As @bernardobreder already mentioned, they are actually no less, no more than pure builtin functions. Although, lexer contains information about keywords, but they got lost in compiler. In my opinion file inclusion that take place during compilation should not have big impact on VM. Bigger challenge I see is keeping information about source file and line. I am not sure about this yet, but I think this information should come from compiler then, together with the bytecode. Another yet complicated thing can be the autoloader feature.

I am thinking about the real implementation. In my opinion we cannot move class definition to runtime as this would have huge impact to overall interpreter. Instead, we should keep on changing the concept of include files. As @bernardobreder already mentioned, they are actually no less, no more than pure builtin functions. Although, lexer contains information about keywords, but they got lost in compiler. In my opinion file inclusion that take place during compilation should not have big impact on VM. Bigger challenge I see is keeping information about source file and line. I am not sure about this yet, but I think this information should come from compiler then, together with the bytecode. Another yet complicated thing can be the autoloader feature.
Author
Owner

Autoloader works in runtime, eg. when new object is being instantiated and it does not exist yet. Yes, this will be a problem too.

Autoloader works in runtime, eg. when new object is being instantiated and it does not exist yet. Yes, this will be a problem too.
Author
Owner

@bernardobreder, How do you think, how this could be resolved? I personally dont see checking syntax and compiling classes on runtime, but on the other side how could we use the autoloader during compile ...

@bernardobreder, How do you think, how this could be resolved? I personally dont see checking syntax and compiling classes on runtime, but on the other side how could we use the autoloader during compile ...

Add a breakpoint in the vm_builtin_include function, see that the execution root includes there. With this, I conclude that virtual machine PH7 defines an include statement. Correcting this issue is not having this statement, because we expect it to be called at compile time rather than at execution time. So, first step is to remove this instruction from the VM, hoping to remove the vm_builtin_include method and its dependencies. You need to assess impacts.

The second step is to intercept the function call compilation and verify that the function has the special "include" name. If it does, it should be diverted to perform a new build in the same context as the current build, using the ph7_compile_file method.
The second step must be done in the PH7_CompileLangConstruct method, including a new Else If (} else if (nKeyID == PH7_TKWRD_INCONCE) {)

These two tasks resolve the problem but do not resolve the impact assessment of removing the include statement from the VM at run time.

Add a breakpoint in the vm_builtin_include function, see that the execution root includes there. With this, I conclude that virtual machine PH7 defines an include statement. Correcting this issue is not having this statement, because we expect it to be called at compile time rather than at execution time. So, first step is to remove this instruction from the VM, hoping to remove the vm_builtin_include method and its dependencies. You need to assess impacts. The second step is to intercept the function call compilation and verify that the function has the special "include" name. If it does, it should be diverted to perform a new build in the same context as the current build, using the ph7_compile_file method. The second step must be done in the PH7_CompileLangConstruct method, including a new Else If (} else if (nKeyID == PH7_TKWRD_INCONCE) {) These two tasks resolve the problem but do not resolve the impact assessment of removing the include statement from the VM at run time.

One reason the include was implemented to be interpreted at compile time is because it is possible to include include in a variable, which can not be evaluated at compile time. Therefore, to correct the problem you must remove this include feature in a variable and the syntactic reader should update in the include in a string and not any expression.

Because of this, you need to have a third step in changing the parser so when you read a normal function call with an include name, you should only receive a string as an argument and not any expression, followed by a comma.

One reason the include was implemented to be interpreted at compile time is because it is possible to include include in a variable, which can not be evaluated at compile time. Therefore, to correct the problem you must remove this include feature in a variable and the syntactic reader should update in the include in a string and not any expression. Because of this, you need to have a third step in changing the parser so when you read a normal function call with an include name, you should only receive a string as an argument and not any expression, followed by a comma.
Author
Owner

Yes, all of this can be still done. I now wonder what cold be done with the class autoloader features. I believe it has to work in runtime because it has to be evaluated. But it is also needed at compile time to include the necessary file.

Yes, all of this can be still done. I now wonder what cold be done with the class autoloader features. I believe it has to work in runtime because it has to be evaluated. But it is also needed at compile time to include the necessary file.
Author
Owner

Class autoloader is our feature, we have already added to PH7. But we have implemented it in the runtime. Would be nice to not lost it. I wonder how PHP deals with this.

Class autoloader is our feature, we have already added to PH7. But we have implemented it in the runtime. Would be nice to not lost it. I wonder how PHP deals with this.
Author
Owner

I think we cannot go this way. At compile time, we have to check syntax, but on runtime, when class is being instantiated the base class should be already available. We then need to check if base class exists and if its not final class. PH7_VmExtractClass() has to be used to extract information about base class and to call autoloader. It should be called from VmByteCodeExec() when PH7_OP_NEW occurs.

Already several checks are done there. Compiler does not check if class is defined when it parses the $x = new X(); construct and after browsing the code I think it should not check if extending class exists too. This should has less consequences than changing the include concept.

What do you think?

I think we cannot go this way. At compile time, we have to check syntax, but on runtime, when class is being instantiated the base class should be already available. We then need to check if base class exists and if its not final class. PH7_VmExtractClass() has to be used to extract information about base class and to call autoloader. It should be called from VmByteCodeExec() when PH7_OP_NEW occurs. Already several checks are done there. Compiler does not check if class is defined when it parses the $x = new X(); construct and after browsing the code I think it should not check if extending class exists too. This should has less consequences than changing the include concept. What do you think?
Author
Owner

Or even we need something new like PH7_OP_CLASS, to consolidate the class in the VM earlier. I imagine that class with the same name can be declared just once, while there can be several or even more objects created. Thus doing this during class instantiation would be an overhead.

During compilation we can install all attributes and methods to the given class. We can also instruct it that there is another class it is extending and/or interface it is implementing. Finally we can install the bare class.

Finally when PH7_OP_CLASS is called, it should check for base class and interface. If it is unavailable yet for some reason - try to autoload some file (include and compile). Afterwards the class should appear, or some error should be thrown. If the base class is found it should be merged into the class that called PH7_OP_CLASS.

Having all above done, the class should be ready to use on any call.
What is your opinion, guys?

Or even we need something new like PH7_OP_CLASS, to consolidate the class in the VM earlier. I imagine that class with the same name can be declared just once, while there can be several or even more objects created. Thus doing this during class instantiation would be an overhead. During compilation we can install all attributes and methods to the given class. We can also instruct it that there is another class it is extending and/or interface it is implementing. Finally we can install the bare class. Finally when PH7_OP_CLASS is called, it should check for base class and interface. If it is unavailable yet for some reason - try to autoload some file (include and compile). Afterwards the class should appear, or some error should be thrown. If the base class is found it should be merged into the class that called PH7_OP_CLASS. Having all above done, the class should be ready to use on any call. What is your opinion, guys?
Owner

Hah, just imagined that original issue has the same ID: https://github.com/symisc/PH7/issues/5

Anyway, I also think, that basically we should not verify classes until they are used for the first time. During compilation, only store the name of base class, then, once resolved, replace that by a class_entry*. Then we should run the verification (but only once, so in my opinion creating PH7_OP_CLASS can be a good idea).

Hah, just imagined that original issue has the same ID: https://github.com/symisc/PH7/issues/5 Anyway, I also think, that basically we should not verify classes until they are used for the first time. During compilation, only store the name of base class, then, once resolved, replace that by a class_entry*. Then we should run the verification (but only once, so in my opinion creating PH7_OP_CLASS can be a good idea).
belliash self-assigned this 2018-07-26 16:44:26 +02:00
Author
Owner

In b040886b97 I have made some modifications to the compiler. It also partially implements multiple inheritance (only syntax is checked). Compiler passes a copy of class with information about which classes it extends and which interfaces it implements to the VM. It still needs some tweaks and the runtime part is not ready yet.

In b040886b97 I have made some modifications to the compiler. It also partially implements multiple inheritance (only syntax is checked). Compiler passes a copy of class with information about which classes it extends and which interfaces it implements to the VM. It still needs some tweaks and the runtime part is not ready yet.
Author
Owner

Partially fixed. Unfortunately only partially.

Partially fixed. Unfortunately only partially.
belliash reopened this issue 2018-07-27 17:51:14 +02:00
Author
Owner

This seems to be fixed in include_fix branch (72f2cc2c1b).

This seems to be fixed in include_fix branch (72f2cc2c1b).
Author
Owner

To summarize. I have implemented new structure ph7_class_info which holds class name and linked lists with information about classes it inherits from and interfaces it implements.
This information is passed from compiler to VM via PH7_VmEmitInstr() call. In order to make it working I needed to implement 2 additional opcodes: PH7_OP_CLASS_INIT and PH7_OP_INTERFACE_INIT. First one takes 3 parameters:

  • P1 instructs it that given class inherits from another class(es).
  • P2 instructs VM, that given class implements some interface(s).
  • P3 simply stores the ph7_class_info structure

The second opcode, takes only 2 parameters:

  • P1 instructs it, that the interface extends another interface(s)
  • P3 again stores ph7_class_info
    Both instructions are called from compiler only if there is such a need. If class or interface does not take anything from other object, whole initialization can be done during compilation.

To implement this properly, I needed to move several checks from compiler to VM:

  • checking if base class/interface exists
  • checking if class/interface can be extended (i.e. if its not final)
  • checking if class is not extending interface, or not implementing class
  • checking if interface is not extending class
    If all requirements are met, class/interface is extended/implemented during runtime. All of these actions take place just one, when OP_CLASS_INIT or OP_INTERFACE_INIT is emitted.

Independently from this ticket, I have done additional work in include_fix branch:

  1. It is now possible to use multiple inheritance syntax (class/interface names are separated by comma). Actually the implementation is not done as it is not under the scope of this ticket. The last class mentioned in class definition overrides the previous ones, eg:

     class TEST extends a, b, c, d, e {}
    

means that class a will be overridden by class b, class b will be overridden by class c, class c will be overridden by class d, and so on.
2. Some additional checks have been added
3. Some typos corrected
4. VM initialization has been moved into separate function. Thanks to that it can be initialized and later configured before compiling the source file. Thanks to that we can earlier enable errors reporting and make it more verbose.

To summarize. I have implemented new structure ph7_class_info which holds class name and linked lists with information about classes it inherits from and interfaces it implements. This information is passed from compiler to VM via PH7_VmEmitInstr() call. In order to make it working I needed to implement 2 additional opcodes: PH7_OP_CLASS_INIT and PH7_OP_INTERFACE_INIT. First one takes 3 parameters: * P1 instructs it that given class inherits from another class(es). * P2 instructs VM, that given class implements some interface(s). * P3 simply stores the ph7_class_info structure The second opcode, takes only 2 parameters: * P1 instructs it, that the interface extends another interface(s) * P3 again stores ph7_class_info Both instructions are called from compiler only if there is such a need. If class or interface does not take anything from other object, whole initialization can be done during compilation. To implement this properly, I needed to move several checks from compiler to VM: * checking if base class/interface exists * checking if class/interface can be extended (i.e. if its not final) * checking if class is not extending interface, or not implementing class * checking if interface is not extending class If all requirements are met, class/interface is extended/implemented during runtime. All of these actions take place just one, when OP_CLASS_INIT or OP_INTERFACE_INIT is emitted. Independently from this ticket, I have done additional work in include_fix branch: 1. It is now possible to use multiple inheritance syntax (class/interface names are separated by comma). Actually the implementation is not done as it is not under the scope of this ticket. The last class mentioned in class definition overrides the previous ones, eg: class TEST extends a, b, c, d, e {} means that class a will be overridden by class b, class b will be overridden by class c, class c will be overridden by class d, and so on. 2. Some additional checks have been added 3. Some typos corrected 4. VM initialization has been moved into separate function. Thanks to that it can be initialized and later configured before compiling the source file. Thanks to that we can earlier enable errors reporting and make it more verbose.
Sign in to join this conversation.
No Milestone
No Assignees
4 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: aerscript/Aer#5
No description provided.