Adding an observation class for weight composition

Hello Casal2 development team,

I want to add a new class of observational data to Casal2. This is proportions by weight bin.

Here is where my thinking has got to so far.

The expected proportions at weight will be derived from the numbers at age, via the growth model (length at age with assumed CV, plus weight at length). In CASAL, there is no error in the weight at length conversion. Would a CV of weight at length be a useful addition here (data exist for its estimation outside of the model)?

There then needs to be a transition matrix between the weight prediction and the bin structure of the observations. Presumably, observed weight bins cannot be entirely within expected weight bins, although the reverse may be true. There also needs to be a weight plus group. I would only think this was needed at the high-weight end, but it might be prudent to allow it at both ends?

I would not expect weight bins to be at the gram level, but they could be as small as 100 g. Bin width would need to be specifiable.

I am assuming the likelihood would be multinomial. A small constant would need to be added to the observed value for each bin (as is done for lengths, I assume). There is of course a lot of correlation between adjacent bins. But I’m not sure what the best alternative error distribution would be. Put another way, I don’t have a better answer right now.

Keen to hear what you think.
Matt

Thanks, Matt, for your first post and for getting the contributions started. Legend! That’s awesome.
We can work through this with Teresa and Ian to get a spec into the repository.
If you think of any other details or questions, please keep adding them to this topic.

Hi Matt,
Is your data really a weight composition, or is it the number of fish in a 40 kg carton?

Ian

Hi Ian,

It is a carton having a known number of fish in it, and a total weight. They assure us they deliberately pack fish of a similar size together.

For one of the locations (Leigh fisheries), it is usually (they say) four fish per carton.

Matt


Matt

But the carton volume is constant? So carton weight is approximately constant, within a narrow range set by the size of fish that just cannot fit while slightly smaller ones can? For the example above, you might expect a carton that is X kg with 4 fish, but down to 0.8*X for 3 fish?

Ian

Ian,

You’re right.

So, I made an enquiry. The data being provided are derived from pack weights and numbers of fish per pack. The latter is important: it seems they do have numbers of fish recorded, not just weight with an assumed number. So we have mean weights, coming in ~4-fish “packages”. The actual calculation of this, and the original data, are beyond my control (being provided by a subcontractor).

So, what is provided to me (the assessment), and what I want as an observation class, is proportion per 100 g bin.
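
A minimal sketch of how pack records might be turned into proportions per 100 g bin. The subcontractor's actual calculation is not known, so everything here is an assumption: the function name, the (total weight, number of fish) record layout, the example values, and in particular the choice to assign every fish in a pack the pack's mean weight.

```python
import math
from collections import Counter

def proportions_by_bin(packs, bin_width_kg=0.1):
    """packs: list of (total_weight_kg, n_fish) tuples (hypothetical layout).

    Assumption: every fish in a pack is assigned the pack's mean weight.
    Returns {bin_lower_edge_kg: proportion of fish in that bin}.
    """
    counts = Counter()
    for total_weight_kg, n_fish in packs:
        mean_weight = total_weight_kg / n_fish
        # Bin index containing the mean weight; the small epsilon guards
        # against floating-point error for weights exactly on a bin edge.
        idx = math.floor(mean_weight / bin_width_kg + 1e-9)
        counts[idx] += n_fish
    total = sum(counts.values())
    return {round(idx * bin_width_kg, 3): n / total
            for idx, n in sorted(counts.items())}

# Illustrative (made-up) packs: two 4-fish packs and one 3-fish pack.
packs = [(10.0, 4), (10.2, 4), (9.0, 3)]
props = proportions_by_bin(packs)
```

Because similar-sized fish are packed together, treating each fish as having the pack mean weight may be tolerable, but it will understate the true spread of individual weights.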

Matt


@MattD @doonanij

Please respond with a write-up in a draft document so that we can clarify how this new observation class would be an extension of one of the proportions-at-length observation classes like ProcessRemovalsByLength or ProportionsAtLength.

Since these are observations from port sampling, would this be related to retained catch instead of total catch?

OK, here is the draft text so far.

One question:
Do we just keep the CV of weight in this observation, or should it be included elsewhere?

I have some equations etc. available from the casal2 manual (for length observations), and also from the SS3 manual (which has implemented generic composition data, i.e., weight compositions).

Proportions-at-weight: new Casal2 observation type (December 2020)

The observation process_removals_by_weight was added to allow the use of fish market data, where fish weights have been measured instead of fish lengths or ages and are collated into proportions-at-weight. The observation is strictly from retained catch and is therefore associated with mortality from a defined fishery. When discarding is included in the model, the retained selectivity should be used for these data. If there is no discarding, then the total catch selectivity is used. However, for current coding purposes, this observation must be linked to a fishery as defined in a mortality_instantaneous block, which ignores discards, so the selectivity is that defined in the mortality_instantaneous block.

The expected proportions-at-weight are derived by normalizing the expected numbers-at-weight. There are two stages in calculating the expected numbers-at-weight. Firstly, Casal2 calculates the expected numbers-at-length from the numbers-at-age using the age-length relationship and its distribution in the @age_length block. Secondly, the numbers-at-length are converted into numbers-at-weight using the length-weight relationship in the @length_weight block and its distribution of weight about the mean weight-at-length. The length-weight distribution is currently only applied in Casal2 in the @observation block for weight composition data and is therefore specified here, as the CV of either a normal or lognormal distribution.
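
The two stages can be sketched as follows. This is an illustration under stated assumptions, not the Casal2 implementation: it assumes a normal distribution of length about mean length-at-age, a lognormal distribution of weight about the mean weight-at-length a*L^b evaluated at the length-bin midpoint, and strictly positive weight bin edges. All function names, parameter names, and numeric values are hypothetical.

```python
import math

def norm_cdf(x, mu, sigma):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def bin_mass(edges, cdf):
    """Probability mass falling in each bin [edges[i], edges[i+1])."""
    c = [cdf(e) for e in edges]
    return [c[i + 1] - c[i] for i in range(len(edges) - 1)]

def expected_proportions_at_weight(numbers_at_age, mean_length_at_age, cv_length,
                                   length_edges, a, b, cv_weight, weight_edges):
    # Stage 1: numbers-at-age -> numbers-at-length (normal about mean length).
    n_len = [0.0] * (len(length_edges) - 1)
    for n_age, mean_len in zip(numbers_at_age, mean_length_at_age):
        sd = cv_length * mean_len
        mass = bin_mass(length_edges, lambda x: norm_cdf(x, mean_len, sd))
        for j, m in enumerate(mass):
            n_len[j] += n_age * m

    # Stage 2: numbers-at-length -> numbers-at-weight (lognormal about a*L^b).
    sigma = math.sqrt(math.log(1.0 + cv_weight ** 2))
    n_wt = [0.0] * (len(weight_edges) - 1)
    for j, n in enumerate(n_len):
        mid_len = 0.5 * (length_edges[j] + length_edges[j + 1])
        # Log-scale mean chosen so the arithmetic mean weight is a * L^b.
        mu = math.log(a * mid_len ** b) - 0.5 * sigma ** 2
        mass = bin_mass(weight_edges, lambda w: norm_cdf(math.log(w), mu, sigma))
        for k, m in enumerate(mass):
            n_wt[k] += n * m

    # Normalize numbers-at-weight to proportions-at-weight.
    total = sum(n_wt)
    return [x / total for x in n_wt]

# Illustrative (made-up) inputs: three ages, 1 cm length bins, 100 g weight bins.
length_edges = list(range(20, 91))                            # cm
weight_edges = [round(0.2 + 0.1 * i, 1) for i in range(119)]  # 0.2 .. 12.0 kg
props = expected_proportions_at_weight(
    [1000.0, 600.0, 300.0], [40.0, 50.0, 58.0], 0.1,
    length_edges, 2e-5, 3.0, 0.1, weight_edges)
```

Mass falling outside the weight bins is simply dropped before normalizing in this sketch; how the real observation should treat it (for example via a plus group) is a design decision for the spec.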

The user must specify the units of weight for the proportions-at-weight observations (which may be g or kg), a vector containing the lower edge of each weight bin, and a vector containing the proportions in each weight bin. Note that the units for weight in the proportions-at-weight do not have to be the same as those specified in @length_weight. Observation-specific weight bins must be a sequential subset of the model weight bins, with no missing or added values. Observations may be specified for any category used in the partition, or for some combinations of them [LINK TO CATEGORY SECTION?], e.g., for males and females separately, or alternatively, one set for combined sex. The weight bins must be the same for each year; if this is not the case, then years for which they differ need to be entered as separate process_removals_by_weight blocks.

If there is no plus group, i.e., weight_plus=false, then Casal2 requires a weight_bins vector of length n+1, where n is the number of proportions-at-weight supplied for each year (the final value provides the upper limit to the final bin). If weight_plus=true, then Casal2 expects a weight_bins vector of length n, matching the n proportions-at-weight. The last proportion then represents the numbers from the lower edge of the last weight bin to the maximum weight the age-weight relationship allows [not sure how we should specify this?].
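
A small sketch of the bin-count convention, read from the proposed input for this observation, where 101 weight_bins values accompany 100 proportions per year with weight_plus false. The helper names are hypothetical, and treating each bin as closed at its lower edge and open at its upper edge is an assumption.

```python
import bisect

def n_bins(weight_bins, weight_plus):
    """Number of proportions expected for a given weight_bins vector."""
    return len(weight_bins) if weight_plus else len(weight_bins) - 1

def bin_index(weight, weight_bins, weight_plus):
    """Index of the bin containing `weight`, or None if it falls outside the bins.

    Bins are treated as [lower, upper); with a plus group the last bin has no
    upper limit.
    """
    if weight < weight_bins[0]:
        return None
    i = bisect.bisect_right(weight_bins, weight) - 1
    if i >= n_bins(weight_bins, weight_plus):
        return None  # at or above the final upper limit when there is no plus group
    return i

# 1.8, 1.9, ..., 11.8 kg, as in the proposed input (101 values).
weight_bins = [round(1.8 + 0.1 * i, 1) for i in range(101)]
```

With these bins, `n_bins(weight_bins, False)` is 100 and a 12 kg fish is outside the observation, whereas with `weight_plus=True` there are 101 bins and the same fish falls in the last one.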

Casal2 generates a warning if the mean weight estimated for the youngest age in the partition is greater than the lower edge of the first weight bin. This is to guard against including weight observations that may have a substantial contribution from fish younger than the youngest age in the partition.

The only likelihood currently available in Casal2 for process_removals_by_weight observations is the multinomial, with effective sample sizes for each year provided as error_values. Note that in the implementation of the multinomial likelihood in Casal2, weight bins with an observed proportion of zero make no contribution to the likelihood.
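
For reference, a sketch of the multinomial negative log-likelihood as I understand it from the CASAL and Casal2 manuals for composition data: -log(L) = -log(N!) + sum_i [log((N*O_i)!) - N*O_i*log(Z(E_i, delta))], with Z(x, delta) = x when x >= delta and delta/(2 - x/delta) otherwise. Both forms should be checked against the manual before being relied on; the function names and example values are hypothetical.

```python
import math

def zero_fun(x, delta):
    """Robustifying function Z(x, delta) (assumed CASAL-style form)."""
    return x if x >= delta else delta / (2.0 - x / delta)

def multinomial_neg_log_lik(obs, exp, n_eff, delta=1e-11):
    """Negative log-likelihood for one year of proportions.

    obs, exp: observed and expected proportions per weight bin.
    n_eff: effective sample size (the error_value for that year).
    """
    nll = -math.lgamma(n_eff + 1.0)              # -log(N!)
    for o, e in zip(obs, exp):
        nll += math.lgamma(n_eff * o + 1.0)      # log((N * O_i)!)
        nll -= n_eff * o * math.log(zero_fun(e, delta))
    return nll

# A bin with an observed proportion of zero adds nothing to the total.
nll_two_bins = multinomial_neg_log_lik([0.5, 0.5], [0.4, 0.6], 25)
nll_with_zero_bin = multinomial_neg_log_lik([0.5, 0.5, 0.0], [0.4, 0.6, 0.0], 25)
```

The two values above are equal because both terms for the zero-observation bin vanish, which is the behaviour noted in the paragraph above.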

No specification is made for the specific time within the time step that the sample was taken. This is because the sample is linked to a fishery, where removals are defined to occur at the mid-point.

Proposed Casal2 input:

@observation Observed_weight_frequency_east

type process_removals_by_weight

method_of_removal EastChathamRise #fishery

time_step Summer #Not truly needed, but length code needs it currently

mortality_instantaneous_process instant_mort

length_weight_cv 0.1

length_weight_dist lognormal

years 1991 1992

categories male

weight_unit kg

#delta 1e-5 ### robustification value for the likelihood; default 1e-11

weight_plus false

weight_bins 1.8 1.9 2 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3 3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 4 4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 5 5.1 5.2 5.3 5.4 5.5 5.6 5.7 5.8 5.9 6 6.1 6.2 6.3 6.4 6.5 6.6 6.7 6.8 6.9 7 7.1 7.2 7.3 7.4 7.5 7.6 7.7 7.8 7.9 8 8.1 8.2 8.3 8.4 8.5 8.6 8.7 8.8 8.9 9 9.1 9.2 9.3 9.4 9.5 9.6 9.7 9.8 9.9 10 10.1 10.2 10.3 10.4 10.5 10.6 10.7 10.8 10.9 11 11.1 11.2 11.3 11.4 11.5 11.6 11.7 11.8

table obs

1991 0.002 0.010 0.041 0.094 0.126 0.086 0.073 0.051 0.045 0.050 0.037 0.025 0.016 0.017 0.011 0.008 0.011 0.011 0.009 0.012 0.013 0.009 0.010 0.009 0.010 0.007 0.008 0.006 0.005 0.010 0.008 0.014 0.008 0.012 0.006 0.007 0.009 0.005 0.007 0.006 0.004 0.006 0.004 0.004 0.007 0.005 0.002 0.006 0.004 0.003 0.004 0.002 0.004 0.003 0.002 0.002 0.002 0.001 0.002 0.003 0.001 0.002 0.001 0.001 0.001 0.001 0.001 0.001 0.001 0.000 0.001 0.001 0.000 0.001 0.001 0.001 0.001 0.000 0.001 0.001 0.001 0.001 0.001 0.001 0.000 0.001 0.000 0.001 0.000 0.000 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.001

1992 0.002 0.006 0.016 0.018 0.027 0.038 0.058 0.069 0.051 0.061 0.063 0.052 0.032 0.028 0.021 0.018 0.013 0.012 0.008 0.011 0.012 0.007 0.012 0.012 0.015 0.007 0.009 0.011 0.009 0.010 0.014 0.013 0.012 0.019 0.011 0.013 0.013 0.009 0.010 0.014 0.008 0.009 0.006 0.006 0.008 0.008 0.005 0.008 0.007 0.006 0.003 0.006 0.006 0.004 0.002 0.004 0.007 0.003 0.004 0.003 0.002 0.003 0.002 0.002 0.002 0.002 0.002 0.002 0.002 0.002 0.002 0.001 0.001 0.001 0.002 0.002 0.001 0.002 0.002 0.002 0.001 0.001 0.001 0.002 0.001 0.001 0.001 0.001 0.001 0.000 0.001 0.001 0.000 0.001 0.001 0.000 0.000 0.001 0.000 0.001

end_table

likelihood multinomial

table error_values

1991 25

1992 25

end_table

delta 1e-5 is the robustification value, delta, for the likelihood (i.e., use Z(α,Δ) in the likelihood rather than Ei). It is needed for the current likelihood code.

Thanks, Matt.

Since these are port-sampled data, the new classes should be similar to the classes ProcessRemovalsByLengthRetained and ProcessRemovalsByLengthRetainedTotal.

Just a thought - in the @model block we currently have to specify length_bins when there are length composition observations. Would we/should we have to do an analogous thing for weight compositions?

I don’t think that having to specify population weight bins would be necessary, as the length-weight relationship would implicitly link the population length bins and the population weight bins.