X-Authentication-Warning: delorie.com: mail set sender to geda-user-bounces using -f
X-Recipient: geda-user AT delorie DOT com
X-CMAE-Analysis: v=2.4 cv=ev4acqlX c=1 sm=1 tr=0 ts=60204c9f
 a=+cj0cO56Fp8x7EdhTra87A==:117 a=8ZHd1Vedsh/TjeYqJD4SKQ==:17
 a=9+rZDBEiDlHhcck0kWbJtElFXBc=:19 a=dLZJa+xiwSxG16/P+YVxDGlgEgI=:19
 a=IkcTkHD0fZMA:10 a=qa6Q16uM49sA:10 a=a1KZgU7cAAAA:8 a=Mj1Xp5F7AAAA:8
 a=Cq1Uj3AEd3N1culE-xcA:9 a=QEXdDO2ut3YA:10 a=ng0hpkU2jXKPaRTLMVYJ:22
 a=OCttjWrK5_uSHO_3Hkg-:22
X-SECURESERVER-ACCT: glimrick AT epilitimus DOT com
X-Original-DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
	d=epilitimus.com; s=default; h=Content-Transfer-Encoding:Content-Type:
	In-Reply-To:MIME-Version:Date:Message-ID:From:References:To:Subject:Sender:
	Reply-To:Cc:Content-ID:Content-Description:Resent-Date:Resent-From:
	Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Id:List-Help:
	List-Unsubscribe:List-Subscribe:List-Post:List-Owner:List-Archive;
	bh=iFaBu3bu+yGMj8xxtSONLXtOBsKgnGw7bZLHL9DuufM=; b=eXkQl56SYzBQa3CicMCmbGe1ul
	/c+QUp95qL85GTGtBPDPPuYKrTyXyKAeZvtl+V40DhfTK9XSMM/1urs+N8st68WFoNwFqkd9qH/UB
	hZI3TSWC3tHnVEcDtWqHjtnVaCubKb+3HsRk7RdCRH8qo6p/B0hIQ6HfG9G2sqjU60xvw1zoIejBz
	iBpSDm3PFzVJxukM4ZMYVwoDkgBjM+v2q9Iw1frkZs0WI09h7e0QhtN6V/+qtHzlp5w9N1140R2qF
	JAMysNhUDqYwGMkaLRaSTXjpkgQWNG4YhpwAhgHkQrhXVKFla7wWlbvate7iQT2KRb6pGbSPa7I1u
	sIP2IwrQ==;
Subject: Re: [geda-user] recursive subsheet issue
To: geda-user AT delorie DOT com
References: <d3f9a609-d761-8646-8f4f-0f3fb3fa6b33 AT epilitimus DOT com>
 <alpine DOT DEB DOT 2 DOT 21 DOT 2102021301260 DOT 1295 AT nimbus>
 <0cc19ac4-217d-0897-52db-8b6fdcbbd975 AT epilitimus DOT com>
 <alpine DOT DEB DOT 2 DOT 21 DOT 2102022220190 DOT 4264 AT nimbus>
 <fa457adf-e71d-b096-81d9-7e7198106020 AT epilitimus DOT com>
 <alpine DOT DEB DOT 2 DOT 21 DOT 2102071614290 DOT 2567 AT nimbus>
From: "Glenn (glimrick AT epilitimus DOT com) [via geda-user AT delorie DOT com]" <geda-user AT delorie DOT com>
Message-ID: <68e49810-304f-3252-3201-b7a68ad16ff8@epilitimus.com>
Date: Sun, 7 Feb 2021 12:24:53 -0800
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101
 Firefox/60.0 SeaMonkey/2.53.3
MIME-Version: 1.0
In-Reply-To: <alpine.DEB.2.21.2102071614290.2567@nimbus>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-AntiAbuse: This header was added to track abuse, please include it with any abuse report
X-AntiAbuse: Primary Hostname - a2plcpnl0121.prod.iad2.secureserver.net
X-AntiAbuse: Original Domain - delorie.com
X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12]
X-AntiAbuse: Sender Address Domain - epilitimus.com
X-Get-Message-Sender-Via: a2plcpnl0121.prod.iad2.secureserver.net: authenticated_id: glimrick AT epilitimus DOT com
X-Authenticated-Sender: a2plcpnl0121.prod.iad2.secureserver.net: glimrick AT epilitimus DOT com
X-Source: 
X-Source-Args: 
X-Source-Dir: 
X-CMAE-Envelope: MS4xfBvqh8o98LF8gJMmW6p9ZQdny3HHlm0JbT0WsrqO2wx44s4RSCULd4wQoT4vWBawbeetFGbkdGEKQybNHX3zpLPGIlyGiKvPzHup1Q5xARyPpNiBy3J4
 v0/lg3q+xIMxhm++BtooMKEZKbahQbX5uLeHnvnlzfFfz6h4AIkK1pt5MOoYNtH+9S1kS+QFTPVw07LgNOijpAeE9yiw8vMgsyd70Sup2+HJtK36BqndrkwW
Reply-To: geda-user AT delorie DOT com
Errors-To: nobody AT delorie DOT com
X-Mailing-List: geda-user AT delorie DOT com
X-Unsubscribes-To: listserv AT delorie DOT com
Precedence: bulk

Roland Lutz wrote:
>
> On Sat, 6 Feb 2021, Glenn (glimrick AT epilitimus DOT com) [via
> geda-user AT delorie DOT com] wrote:
>> Assuming I remembered correctly how to do it you should now find 3 new
>> commits on the stable-1.10 branch in my repo.
>
> `stable-1.10' is only for fixes which should go into the next bugfix
> release; regular development usually happens on `master'.
>
I put it on stable-1.10 because until you merge 1.10.2 back into master
I can only build stable-1.10.2. As far as I know the patch should apply
and build against master just fine. But I have no way to test it.
>
>>> commit 32d8fb59f919c5a26b5dae90a439f45c2ed7f86f
>>> Author: Glenn Pavlovic <glimrick AT epilitimus DOT com>
>>> Date:   Sat Feb 6 21:24:53 2021 -0800
>>>
>>>     Block inifinite subsheet recursion
>
> A few things which I noticed when looking at the patch:
>
> 1) The recursion check compares the os.path.realpath of the schematic,
> which is not necessarily a good idea.  As a rule of thumb, if you
> don't have to mess with symbolic links in your code, don't.  (I know
> gEDA/gaf violates this rule on a few instances, but this is because I
> wasn't sure if I'd break anything by changing them, and shouldn't be
> used as an example for new code.)
>
The reason for using realpath was to disambiguate a case where recursion
was occurring via symbolic links, i.e. to remove symbolic links from the
equation. Because the subsheet mechanism uses filenames which come from
the filesystem symbolic links are something which must be taken into
account.  Also see below

> 2) Using the filename for comparison isn't necessary anyway since the
> composite_sources field already contains Schematic objects; you can
> simply compare these objects.
>
> 3) The current comparison is done at the top of s_traverse_sheet1, but
> I think it makes more sense to do the comparison in the loop which
> invokes s_traverse_sheet1.  This means top-level schematics aren't
> checked, but these can't be recursive anyway.
>
My first reaction was: Yes they can. Any sheet can recurse to itself. I
checked.
Upon thinking about it further I understand what you meant. This way
still seems cleaner to me.
> 4) Passing a common `stack' variable as an argument to
> s_traverse_sheet1 is kind of mixing paradigms:
>
>   a) Either `stack' is conceptually shared between s_traverse_sheet1
> invocations.  In this case, there's no need to pass it as an argument;
> s_traverse_sheet1 can just use the `sheet_stack' variable.  (I'd
> prefer to place the definition before the function body to make things
> clearer, but this isn't strictly necessary.)
>
>   b) Or, s_traverse_sheet1 passes the current invocation stack to the
> next level of hierarchy.  In this case, it's conceptually cleaner to
> pass a copy of the variable which is discarded afterwards.  Since you
> don't need to do stack.pop() in this case, the `set' type seems more
> appropriate:
>
>     s_traverse_sheet1(sheet, set())
>     ...
>     s_traverse_sheet1(subsheet, set.union(stack, {sheet_path}))
>
re: 2-4 Schematic objects can't be compared as if the same schematic is
referenced by different paths, e.g. a symbolic link, then there are
multiple objects, one for each path.

The reason for using a list of filenames was so that a useful diagnostic
message could be provided so the user can find the source of the problem
in their file tree. Such a message requires the order in which the
sheets were referenced, meaning a set cannot be used. The only
alternative is to recreate the list by traversing back up the sheet
chain collecting ids as you go. Since you have the filenames as you
traverse downward it is cleaner to collect them as you go. Full
pathnames were used as the sheet structure may not be contained in a
single directory, nor even in subdirectories of the working directory.

I referred to it as a stack because that is essentially how it operates.
Subsheet pathnames are pushed on as they are encountered and removed as
processing is completed, just like a call stack. The only deviation from
a strict stack paradigm is when recursion is detected the data is
traversed like a list in order to provide the diagnostic information,
just like if you are outputting a stack trace. Imagine trying to find
the recursion point in a complex sheet structure if all you have to go
on is that recursion exists somewhere.

In order to capture the top level sheet which invokes the chain which
contains the recursion the 'stack' needs to be accessible at both the
top level and in s_traverse_sheet1.

re: 4a It was a judgement call as to whether to define it prior to
s_traverse_sheet1 and use it in both there and the top level loop or
define it as I did and and pass it in as a parameter. I chose the course
I did because: 1) I am still getting used to the scoping rules in python
and wasn't certain that a variable from outside the body of
s_traverse_sheet1, but inside the class constructor (which we are in
this case) would be accessible as it isn't really a 'global' (PS testing
says yes), 2) I find it cleaner to define it closest to the point where
the data originates, i.e. the top level loop, and then pass it along.
But in thanking about I can see your approach as well.

You can't do the check at the top level loop because that wouldn't check
anything but the top level sheets and so recursion below the top level
wouldn't be caught. The only place to catch them all is in
s_traverse_sheet1. If you check inside the loop in s_traverse_sheet1 as
you point out the top level sheets aren't checked and you have also done
a bunch of unneeded processing.

> 5) A change like this should have a regression test, i.e., a minimal
> test in xorn/tests/netlist/regression/ which fails before applying the
> change and passes afterwards.  This way, if someone inadvertently
> re-introduces the bug, the test will fail and the error be noticed.
>
Agreed. I tried. I actually had a very nice test all set up and commited
(a commit which I had to discard) before I discovered that the way the
test framework is set up it wouldn't work.
1. Recursion sets the error flag which causes xorn to exit with a
non-zero status.
2. The test framework bails as soon as it sees a non-zero status from
xorn which means the output is not checked.
3. If the recursion is allowed to go uncaught it crashes with a python
runtime error which also issues a non-zero status (99 instead of 1).
4. Because the error information is all on stderror the only way to
really distinguish whether the error was handled correctly is to compare
that output.
5. Because the diagnostic message contains path names 'run-test'
couldn't just be modified to check output after a non-zero status as the
output on someone elses machine would be different than on my machine.
I gave up when I hit #5 If you have a solution I am open to suggestions.
>
> 6) Keep Python coding style in mind:
>
>   a) operators are preceded and followed by a space, and commas are
> followed by a space
I thought I did
>
>   b) `if' is followed by a space and doesn't have parentheses around
> its condition
Again I thought I did, but admittedly old C/C++ habits die hard. (PS
Okay I see it)
>
>   c) iterating over a sequence is preferentially done using the `for
> member in sequence' syntax, on in this case, `for member in
> reversed(sequence)'
Yep, that was my original approach, but that doesn't give you positional
information, AFAIK, and parts of the diagnostic message depend on where
you are in the sequence, which was the reason for the numeric 'for'. It
is cleaner to compare i>0 than i<len(stack)-1 which is the reason for
the, admittedly clunky looking, range(len(stack)-1,-1,-1) construct to
iterate backwards.
>
>   d) an `if' statement which handles an error condition and ends with
> a `return' statement or similar should usually have its condition be
> true on error and not have an `else' clause
Don't know about 'usually' but I can see the argument you are making, If
it really bugs you I will change it.
>
> 7) Netlist errors should be concise and not contain line breaks.  In
> this case, the important information is the name of the instantiating
> schematic (which is implicitly reported by Sheet.error) and the name
> of the instantiated schematic; only the latter should be included in
> the error message.
Generally speaking I would agree, but I think it is useful in tracking
down an error like this to have the sequence. Imagine reading a stack
trace where all you have are the endpoints and nothing about how you got
from one to the other. If working on a large project you may be using a
sheet chain that someone else produced, possibly in collaboration with
some third parties, in which case you can't just go changing schematics,
and it is possible that the recursion is caused by something more subtle
like a mis-configured rc file. It is also possible that the actual
problem is at some midpoint, e.g. an incorrect symbolic link, not the
endpoints. I am loathe to assume I can foresee all the ways in which
this error might occur.
>
>
>>> commit 23b15ba475ec26fb55727af2903f17fae371c4e5
>>> Author: Glenn Pavlovic <glimrick AT epilitimus DOT com>
>>> Date:   Sat Feb 6 21:55:22 2021 -0800
>>>
>>>     Fixed a rare, freaky bug
>
> Do I understand correctly that this bug can only be triggered by a
> recursing subsheet, which is caught by the previous commit?
>
> In this case, I'd prefer to not handle this case specially.  An
> invalid internal state can lead to errors, but it's the invalid
> internal state which should be fixed, not the assumption that the
> state is sane.
Frankly I don't know. The specific case I found that causes it uses a
specific combination which involves recursion, and is invalid for other
reasons as well. When I remove any one of the three pieces the bug goes
away. Prior to the previous commit that set of conditions would have
resulted in a crash from run away recursion and so never would have
reached the point where this bug would have shown up. The reason I
didn't track it further back was that the code is trying to remove
something from a list which isn't in the list in this situation. So the
end result appears to me to be the same. Since the input which causes
the issue is both already invalid, and unlikely to occur it may be safe
to simply discard this commit, but then the bug would still exist. I
felt it was better to provide at least something as opposed to dropping
a "by the way there is a bug in gaf.netlist.netlist which happens only
if you do this weird thing" on your plate.
>
>
>>> commit 1f2de17cce2e4e97b8be14dda145155d5d8bd1ae (HEAD -> stable-1.10)
>>> Author: Glenn Pavlovic <glimrick AT epilitimus DOT com>
>>> Date:   Sat Feb 6 22:05:27 2021 -0800
>>>
>>>     Remove a cruft comment
>
> This comment was a result of me erring very hard on the side of not
> removing comments.  I believe it is safe to remove it.
Victory!
:)

Glenn