[RNG] Duplicated blocks?

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

[RNG] Duplicated blocks?

Gilles Sadowski-2
Hello.

SonarQube reports "duplicated blocks":
    https://sonarcloud.io/component_measures?id=commons-rng&metric=duplicated_blocks&view=list

Does it report about the license header?
If so, how to deactivate that spurious report?

Regards,
Gilles

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [RNG] Duplicated blocks?

Alex Herbert


> On 13 Sep 2019, at 15:27, Gilles Sadowski <[hidden email]> wrote:
>
> Hello.
>
> SonarQube reports "duplicated blocks":
>    https://sonarcloud.io/component_measures?id=commons-rng&metric=duplicated_blocks&view=list
>
> Does it report about the license header?

No. If you click though to the source code (from a violation in the list) and scroll you will see a bar in the left margin for the duplicated code. You can click on it and it states where the duplicate is.

There are 13 reported duplicates and it’s not always a spurious report. Some of the code blocks are duplicates. In this case it is for code that performs work to compute more than a single value. As such it is not a simple refactor to move the block into a method without having to return an object encapsulating the multiple values computed; or pass in an array to hold all the value computed; or have a common parent with shared state variable. In the case of common code in the random generator next() method of Well19937a and Well44497a the various workarounds to eliminate common code would have negative effects and is not worth it IMO. In one case it is in classes within different packages (e.g. AbstractPcg6432 vs PcgRxsMXs64) where the amount of duplicate code is actually the same but it is outside of the package structure to have a common parent. That is unreleased code though so could be rearranged without breaking binary compatibility.

In about half of the cases it is a spurious report where:

- the code block as text is identical but the types of the variables are different (e.g. byte/short/int in the MarsagliaTsangWangDiscreteSampler) - in these cases I could try renaming variables but it seems odd since the code is deliberately copied with modification only for the data type.
- the value of the constants are different on a single line of code that calls a method (e.g. the AbstractXoRoShiRo128 vs AbstractXoShiRo256).

In one case it is exactly the same algorithm (BoxMullerGaussianSampler vs BoxMullerNormalizedGaussianSampler); here one is deprecated as it is a known duplicate.

> If so, how to deactivate that spurious report?

Don’t know. I’m currently OK with all the duplicates as they are. Ideally they could be removed as exceptions without turning off the duplicates report but leaving them on is fine.

There are a fair number of other code smells (total = 107).

We can get rid of 9 by renaming SEED to S in the commons/rng/simple/internal/ Converters.

There are 33 for deprecation to ignore.

There are 18 for tests that have no assertions. These are tests that check constructors do not throw when self-seeded (i.e. input seed is too small). They could be updated to assert the created object is not null. It seems pointless to do that. The methods are commented to state they are checking construction paths. The generators could be tested to ensure they do not return all zeros or the same number. But a full statistical test of the output seems over the top.

There are 15 'Extract the assignment out of this expression’ that I prefer as is in MarsagliaTsangWangDiscreteSampler. But I did write it. I could change this and it would not affect sampler performance as the code is all in the sampler creation method. The other 3 are in the generator next() methods and are written that way to match the reference code. It is a style that I am fine with in these cases. If changed I would like to do performance tests to check it doesn’t make the method slower to remove a coding style warning. So I recommend not changing those for simplicity.

3 for commented out code in the pom.xml could be fixed.

There are 2 ignored tests using zero array seeds. It is known that xor-shift based generators cannot handle all zero seeds. These tests would never work for those generators (there are 18 such generators in the library). I suggest deleting these tests. An alternative would be to test the first 50 values. If all zero then assume the generator is dead. Otherwise do the statistical test on the generator. The test could be commented to state that if a generator is outputting non-zero values from an all zero seed that the output should be random.

The TODO comment in InternalGamma can be removed. It is 3 years old and there is not currently a need to move this helper class to a different component.

Then there are a few legitimate oddities for the remaining 10 or so. I can comment the code so this can be seen when clicking through from Sonar.

Doing all the fixes would remove 48 code smells. It would make looking at the Sonar report easier.

Alex


>
> Regards,
> Gilles
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [RNG] Duplicated blocks?

Gilles Sadowski-2
Hi.

Le ven. 13 sept. 2019 à 22:34, Alex Herbert <[hidden email]> a écrit :

>
>
>
> > On 13 Sep 2019, at 15:27, Gilles Sadowski <[hidden email]> wrote:
> >
> > Hello.
> >
> > SonarQube reports "duplicated blocks":
> >    https://sonarcloud.io/component_measures?id=commons-rng&metric=duplicated_blocks&view=list
> >
> > Does it report about the license header?
>
> No. If you click though to the source code (from a violation in the list) and scroll you will see a bar in the left margin for the duplicated code. You can click on it and it states where the duplicate is.
>
> There are 13 reported duplicates and it’s not always a spurious report. Some of the code blocks are duplicates. In this case it is for code that performs work to compute more than a single value. As such it is not a simple refactor to move the block into a method without having to return an object encapsulating the multiple values computed; or pass in an array to hold all the value computed; or have a common parent with shared state variable. In the case of common code in the random generator next() method of Well19937a and Well44497a the various workarounds to eliminate common code would have negative effects and is not worth it IMO. In one case it is in classes within different packages (e.g. AbstractPcg6432 vs PcgRxsMXs64) where the amount of duplicate code is actually the same but it is outside of the package structure to have a common parent. That is unreleased code though so could be rearranged without breaking binary compatibility.
>
> In about half of the cases it is a spurious report where:
>
> - the code block as text is identical but the types of the variables are different (e.g. byte/short/int in the MarsagliaTsangWangDiscreteSampler) - in these cases I could try renaming variables but it seems odd since the code is deliberately copied with modification only for the data type.
> - the value of the constants are different on a single line of code that calls a method (e.g. the AbstractXoRoShiRo128 vs AbstractXoShiRo256).
>
> In one case it is exactly the same algorithm (BoxMullerGaussianSampler vs BoxMullerNormalizedGaussianSampler); here one is deprecated as it is a known duplicate.
>
> > If so, how to deactivate that spurious report?
>
> Don’t know. I’m currently OK with all the duplicates as they are. Ideally they could be removed as exceptions without turning off the duplicates report but leaving them on is fine.

IIRC it is possible to instruct SonarQube that a particular should be
considered as solved.
[I could do it when SonarQube was located at Apache; but one has to be
logged in (which
is reasonable for tracking purpose).]

>
> There are a fair number of other code smells (total = 107).
>
> We can get rid of 9 by renaming SEED to S in the commons/rng/simple/internal/ Converters.

I don't agree with this rule as it contradicts a more important one IMO:
self-documenting code.

> There are 33 for deprecation to ignore.
>
> There are 18 for tests that have no assertions. These are tests that check constructors do not throw when self-seeded (i.e. input seed is too small). They could be updated to assert the created object is not null. It seems pointless to do that. The methods are commented to state they are checking construction paths. The generators could be tested to ensure they do not return all zeros or the same number. But a full statistical test of the output seems over the top.
>
> There are 15 'Extract the assignment out of this expression’ that I prefer as is in MarsagliaTsangWangDiscreteSampler. But I did write it. I could change this and it would not affect sampler performance as the code is all in the sampler creation method. The other 3 are in the generator next() methods and are written that way to match the reference code. It is a style that I am fine with in these cases. If changed I would like to do performance tests to check it doesn’t make the method slower to remove a coding style warning. So I recommend not changing those for simplicity.

Fine.
Could you check that you can declare issues as "solved"?

Thanks,
Gilles

>
> 3 for commented out code in the pom.xml could be fixed.
>
> There are 2 ignored tests using zero array seeds. It is known that xor-shift based generators cannot handle all zero seeds. These tests would never work for those generators (there are 18 such generators in the library). I suggest deleting these tests. An alternative would be to test the first 50 values. If all zero then assume the generator is dead. Otherwise do the statistical test on the generator. The test could be commented to state that if a generator is outputting non-zero values from an all zero seed that the output should be random.
>
> The TODO comment in InternalGamma can be removed. It is 3 years old and there is not currently a need to move this helper class to a different component.
>
> Then there are a few legitimate oddities for the remaining 10 or so. I can comment the code so this can be seen when clicking through from Sonar.
>
> Doing all the fixes would remove 48 code smells. It would make looking at the Sonar report easier.
>
> Alex
>
>
> >
> > Regards,
> > Gilles

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [RNG] Duplicated blocks?

Alex Herbert


> On 13 Sep 2019, at 22:10, Gilles Sadowski <[hidden email]> wrote:
>
> Hi.
>
> Le ven. 13 sept. 2019 à 22:34, Alex Herbert <[hidden email] <mailto:[hidden email]>> a écrit :
>>
>>
>>
>>> On 13 Sep 2019, at 15:27, Gilles Sadowski <[hidden email]> wrote:
>>>
>>> Hello.
>>>
>>> SonarQube reports "duplicated blocks":
>>>   https://sonarcloud.io/component_measures?id=commons-rng&metric=duplicated_blocks&view=list
>>>
>>> Does it report about the license header?
>>
>> No. If you click though to the source code (from a violation in the list) and scroll you will see a bar in the left margin for the duplicated code. You can click on it and it states where the duplicate is.
>>
>> There are 13 reported duplicates and it’s not always a spurious report. Some of the code blocks are duplicates. In this case it is for code that performs work to compute more than a single value. As such it is not a simple refactor to move the block into a method without having to return an object encapsulating the multiple values computed; or pass in an array to hold all the value computed; or have a common parent with shared state variable. In the case of common code in the random generator next() method of Well19937a and Well44497a the various workarounds to eliminate common code would have negative effects and is not worth it IMO. In one case it is in classes within different packages (e.g. AbstractPcg6432 vs PcgRxsMXs64) where the amount of duplicate code is actually the same but it is outside of the package structure to have a common parent. That is unreleased code though so could be rearranged without breaking binary compatibility.
>>
>> In about half of the cases it is a spurious report where:
>>
>> - the code block as text is identical but the types of the variables are different (e.g. byte/short/int in the MarsagliaTsangWangDiscreteSampler) - in these cases I could try renaming variables but it seems odd since the code is deliberately copied with modification only for the data type.
>> - the value of the constants are different on a single line of code that calls a method (e.g. the AbstractXoRoShiRo128 vs AbstractXoShiRo256).
>>
>> In one case it is exactly the same algorithm (BoxMullerGaussianSampler vs BoxMullerNormalizedGaussianSampler); here one is deprecated as it is a known duplicate.
>>
>>> If so, how to deactivate that spurious report?
>>
>> Don’t know. I’m currently OK with all the duplicates as they are. Ideally they could be removed as exceptions without turning off the duplicates report but leaving them on is fine.
>
> IIRC it is possible to instruct SonarQube that a particular should be
> considered as solved.
> [I could do it when SonarQube was located at Apache; but one has to be
> logged in (which
> is reasonable for tracking purpose).]
>
>>
>> There are a fair number of other code smells (total = 107).
>>
>> We can get rid of 9 by renaming SEED to S in the commons/rng/simple/internal/ Converters.
>
> I don't agree with this rule as it contradicts a more important one IMO:
> self-documenting code.
>
>> There are 33 for deprecation to ignore.
>>
>> There are 18 for tests that have no assertions. These are tests that check constructors do not throw when self-seeded (i.e. input seed is too small). They could be updated to assert the created object is not null. It seems pointless to do that. The methods are commented to state they are checking construction paths. The generators could be tested to ensure they do not return all zeros or the same number. But a full statistical test of the output seems over the top.
>>
>> There are 15 'Extract the assignment out of this expression’ that I prefer as is in MarsagliaTsangWangDiscreteSampler. But I did write it. I could change this and it would not affect sampler performance as the code is all in the sampler creation method. The other 3 are in the generator next() methods and are written that way to match the reference code. It is a style that I am fine with in these cases. If changed I would like to do performance tests to check it doesn’t make the method slower to remove a coding style warning. So I recommend not changing those for simplicity.
>
> Fine.
> Could you check that you can declare issues as "solved”?

I can sign in with GitHub and it links my Apache credentials. This allows me to comment on issues and ‘confirm’ them which I could not do when not logged in.

However I do not see options to mark false positives and won’t fix. This stack overflow thread [1] for SonarCloud 5 states that I need to be added to an "Administer Issues” group for the project. Maybe the SonarCloud instance is a new version but the same should apply. I don’t have enough permissions.

Is this an issue to raise with INFRA?

[1] https://stackoverflow.com/questions/29646256/sonarqube-5-how-do-i-mark-false-positive <https://stackoverflow.com/questions/29646256/sonarqube-5-how-do-i-mark-false-positive>



>
> Thanks,
> Gilles
>
>>
>> 3 for commented out code in the pom.xml could be fixed.
>>
>> There are 2 ignored tests using zero array seeds. It is known that xor-shift based generators cannot handle all zero seeds. These tests would never work for those generators (there are 18 such generators in the library). I suggest deleting these tests. An alternative would be to test the first 50 values. If all zero then assume the generator is dead. Otherwise do the statistical test on the generator. The test could be commented to state that if a generator is outputting non-zero values from an all zero seed that the output should be random.
>>
>> The TODO comment in InternalGamma can be removed. It is 3 years old and there is not currently a need to move this helper class to a different component.
>>
>> Then there are a few legitimate oddities for the remaining 10 or so. I can comment the code so this can be seen when clicking through from Sonar.
>>
>> Doing all the fixes would remove 48 code smells. It would make looking at the Sonar report easier.
>>
>> Alex
>>
>>
>>>
>>> Regards,
>>> Gilles
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: [RNG] Duplicated blocks?

Gilles Sadowski-2
>>> [...]
> > Could you check that you can declare issues as "solved”?
>
> I can sign in with GitHub and it links my Apache credentials. This allows me to comment on issues and ‘confirm’ them which I could not do when not logged in.
>
> However I do not see options to mark false positives and won’t fix. This stack overflow thread [1] for SonarCloud 5 states that I need to be added to an "Administer Issues” group for the project. Maybe the SonarCloud instance is a new version but the same should apply. I don’t have enough permissions.
>
> Is this an issue to raise with INFRA?

I'd think so.  Here is the ticket:
    https://issues.apache.org/jira/browse/INFRA-18842

Gilles

>
> [1] https://stackoverflow.com/questions/29646256/sonarqube-5-how-do-i-mark-false-positive <https://stackoverflow.com/questions/29646256/sonarqube-5-how-do-i-mark-false-positive>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]