The following explains in detail some of the parameters in the workflow and how to avoid common mistakes that may result in an error or biologically faulty results.
Chains to include in calculations
The chain identifier is a single-letter code representing a chain in the protein's structure. Please provide a list of all chains you would like to retain during simulations.
IMPORTANT: The list must include the chains on which the library positions are located. You must also include the chains of the essential residues. The list may also include chains that interact with the library positions and that you would like FuncLib's calculations to take into account.
Do not include chains containing only ligands or ions (i.e., no amino acid residues).
How to find the chain identifier? The chain IDs must correspond to chains within the PDB file. The chain ID is the character in column 22 of each line that starts with ATOM or HETATM.
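The column rule above can be checked with a few lines of Python before submitting. This is a minimal sketch, not part of FuncLib: the function names are hypothetical, it assumes a standard fixed-column PDB file, and it treats only the 20 canonical amino acids in ATOM records as protein residues (modified residues stored as HETATM are not counted).

```python
# Three-letter codes of the 20 canonical amino acids.
AMINO_ACIDS = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}


def chains_in_pdb(pdb_lines):
    """Map each chain ID to True if the chain has at least one amino acid residue."""
    chains = {}
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")) and len(line) > 21:
            chain_id = line[21]      # column 22 (1-based) holds the chain ID
            res_name = line[17:20]   # columns 18-20 hold the residue name
            is_protein = line.startswith("ATOM") and res_name in AMINO_ACIDS
            chains[chain_id] = chains.get(chain_id, False) or is_protein
    return chains


def protein_chains(pdb_lines):
    """Return the sorted chain IDs that contain amino acid residues."""
    return sorted(c for c, has_aa in chains_in_pdb(pdb_lines).items() if has_aa)
```

Calling `protein_chains(open("structure.pdb"))` (the file name is a placeholder) lists the chains that are candidates for the chain list; chains that contain only ligands or ions are left out.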
Amino acid positions to diversify
Select several positions (2 to 15) in the active site. The selected positions should not include key catalytic residues performing the chemistry of the catalysis; rather, residues composing the first and second shell of the active site or in (in)direct contact with the substrate should be selected.
The Protein Contacts Atlas can be used to find first- and second-shell residues.
See Figures 1B and 4A in the FuncLib paper for examples of position selection.
Important: Make sure the chains on which the library positions are located are specified in "Chain identifier to design" (see above).
Essential amino acid residues
Essential (fixed) residues are residues whose wild-type side-chain conformations FuncLib retains during all simulations.
Fixed residues may include:
- Key catalytic residues performing the chemical reaction itself
- Residues located on an interface with another chain/protein
- Residues involved in binding a secondary ligand/cofactor
- Metal ion chelating residues
- Catalytic residues in a different active site than the one being diversified using the current FuncLib calculation
MAKE SURE YOU USE THE CORRECT NUMBERS (the relevant numbers are the ones in the submitted PDB file).
If you enter a number that does not exist in the PDB, your query will crash.
Important 1: Do not include here any of the amino acid positions to diversify, as they should be mutated. The essential amino acid residues will be kept fixed during simulations.
Important 2: Include only protein residues, not heteroatoms such as cofactors and ligands.
Important 3: Make sure the chains on which the essential positions are located are specified in "Chain identifier to design".
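The checks described in this section and the previous one (numbers must exist in the PDB file, the two lists must not overlap, and every chain must be in the chain list) can be run locally before submitting. The following is a minimal sketch under stated assumptions: the function names are hypothetical, positions are given as (residue number, chain ID) pairs, and the PDB file uses standard fixed columns with plain integer residue numbers.

```python
def residues_in_pdb(pdb_lines):
    """Collect (residue number, chain ID) pairs from ATOM records."""
    found = set()
    for line in pdb_lines:
        if line.startswith("ATOM") and len(line) >= 26:
            # Columns 23-26 hold the residue number, column 22 the chain ID.
            found.add((int(line[22:26]), line[21]))
    return found


def check_positions(pdb_lines, diversify, essential, chains):
    """Return a list of problems; an empty list means none were found."""
    present = residues_in_pdb(pdb_lines)
    problems = []
    for label, positions in (("diversify", diversify), ("essential", essential)):
        for num, chain in positions:
            if (num, chain) not in present:
                problems.append(f"{label} position {num}{chain} is not in the PDB file")
            if chain not in chains:
                problems.append(f"chain {chain} of {label} position {num}{chain} is not in the chain list")
    # A position cannot be both diversified and kept fixed.
    for num, chain in sorted(set(diversify) & set(essential)):
        problems.append(f"position {num}{chain} is listed as both diversify and essential")
    return problems
```

An empty result only means these three checks passed; it does not guarantee that the server will accept the input.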
Ligands to keep during simulations
FuncLib can be used to increase reactivity and/or specificity toward a certain ligand or transition-state analog.
To use this option, make sure the ligand coordinates are specified in the protein structure.
Important: Use this option for ligands consisting of at least three atoms. For ions, please use the "Ions to keep during simulations" option below.
Please provide the residue number and the chain it is located on (e.g., 1A).
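To confirm that an identifier such as 1A points at a ligand of at least three atoms, the HETATM records of the PDB file can be counted. This is a rough sketch with hypothetical function names; it assumes the identifier is an integer residue number followed by a single-letter chain ID, and it counts every atom listed for that residue, including hydrogens if present.

```python
import re


def parse_residue_id(text):
    """Split an identifier such as '1A' into (residue number, chain ID)."""
    match = re.fullmatch(r"(-?\d+)([A-Za-z])", text.strip())
    if match is None:
        raise ValueError(f"expected a number followed by a chain letter, got {text!r}")
    return int(match.group(1)), match.group(2)


def count_hetero_atoms(pdb_lines, res_num, chain):
    """Count HETATM records that belong to the given residue and chain."""
    return sum(
        1
        for line in pdb_lines
        if line.startswith("HETATM")
        and len(line) >= 26
        and line[21] == chain              # column 22: chain ID
        and int(line[22:26]) == res_num    # columns 23-26: residue number
    )


def looks_like_ligand(pdb_lines, residue_id):
    """True if the residue has at least three HETATM atoms."""
    res_num, chain = parse_residue_id(residue_id)
    return count_hetero_atoms(pdb_lines, res_num, chain) >= 3
```

A count of zero means the identifier does not match any HETATM record in the file, which usually points to a wrong residue number or chain.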
Ions to keep during simulations
Some proteins have ions that are important for structure/catalysis. To preserve their original function, it is recommended to keep them in the exact same position during FuncLib's simulations.
To use this option, make sure the ion's coordinates are specified in the protein structure, either in the self-uploaded PDB file or in the Protein Data Bank version of the structure.
Please provide both the residue number and the chain it is located on (e.g., 1A).
We strongly recommend including the ion-coordinating residues in the "Essential amino acid residues".
Sequence space file
A sequence space can be manually uploaded instead of calculated by the algorithm. Use this option if:
- This is a re-run. Use the sequence space sent to you at the end of the previous run, and edit whatever is necessary
- You have prior knowledge of the system (e.g., deep mutational scanning data) that you want to use to calculate combinations of mutations
To avoid failing your run, please note:
- Format the sequence space file appropriately (see example below)
- We only calculate up to 500,000 mutants. If the sequence space includes more mutants, the run will fail
- All positions mentioned in the sequence space file should also be mentioned in the Amino acid positions to diversify (see above)
Example: A sequence space file diversifying four positions (106, 132 and 271 on chain A; 217 on chain B) should be in the following format (bolded parts are constant and should not be altered). The first amino acid in each row is the wild-type identity (e.g., position 106 on chain A is isoleucine).
106 A PIKAA ICHLM
132 A PIKAA FL
271 A PIKAA LIR
217 B PIKAA ML
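A file in this format can be sanity-checked locally before upload. The sketch below is not FuncLib's own parser; the function names are hypothetical, and it assumes that each non-empty line has exactly four whitespace-separated fields with PIKAA as the third. The size it reports is the full combinatorial product of the allowed identities, which should be read as a rough upper bound, since how the server counts mutants is not specified here.

```python
from math import prod

MAX_MUTANTS = 500_000  # limit stated above


def parse_sequence_space(text):
    """Parse lines such as '106 A PIKAA ICHLM' into {(106, 'A'): 'ICHLM'}."""
    space = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) != 4 or fields[2] != "PIKAA":
            raise ValueError(f"unexpected line: {raw!r}")
        number, chain, _, identities = fields
        space[(int(number), chain)] = identities
    return space


def count_combinations(space):
    """Product of the number of allowed identities at each position."""
    return prod(len(identities) for identities in space.values())


def positions_not_diversified(space, diversify):
    """Positions in the file that are missing from the diversify list."""
    return sorted(set(space) - set(diversify))
```

For the four-line example above, `count_combinations` gives 5 × 2 × 3 × 2 = 60, well under the 500,000 limit.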