[gentoo-commits] proj/sci:master commit in: sci-biology/trf-bin/, sci-biology/trf-bin/files/ - gentoo-commits

From:	Martin Mokrejs <mmokrejs@×××××××××××××××.cz>
To:	gentoo-commits@l.g.o
Subject:	[gentoo-commits] proj/sci:master commit in: sci-biology/trf-bin/, sci-biology/trf-bin/files/
Date:	Tue, 31 Jan 2017 16:47:30
Message-Id:	`1485881052.a2df729d9eae2e016883572c2fbfe941184f421d.mmokrejs@gentoo`

1

commit:     a2df729d9eae2e016883572c2fbfe941184f421d

2

Author:     Martin Mokrejš <mmokrejs <AT> fold <DOT> natur <DOT> cuni <DOT> cz>

3

AuthorDate: Tue Jan 31 16:44:12 2017 +0000

4

Commit:     Martin Mokrejs <mmokrejs <AT> fold <DOT> natur <DOT> cuni <DOT> cz>

5

CommitDate: Tue Jan 31 16:44:12 2017 +0000

6

URL:        https://gitweb.gentoo.org/proj/sci.git/commit/?id=a2df729d

7

8

sci-biology/trf-bin: renamed to reflect it is an upstream's binary

9

10

A lot of fixes in the ebuild as well:

11

- install snapshot copies of HTML docs to prevent checksum errors

12

- properly set MY_PV version number

13

- fix install location

14

- distinguish between 32 and 64bit binaries

15

16

Package-Manager: Portage-2.3.3, Repoman-2.3.1

17

18

 sci-biology/trf-bin/files/trf.definitions.txt | 159 +++++++++++++++++

19

 sci-biology/trf-bin/files/trf.txt             | 176 +++++++++++++++++++

20

 sci-biology/trf-bin/files/trf.whatsnew.txt    | 243 ++++++++++++++++++++++++++

21

 sci-biology/trf-bin/metadata.xml              |   8 +

22

 sci-biology/trf-bin/trf-bin-4.09.ebuild       |  53 ++++++

23

 5 files changed, 639 insertions(+)

24

25

diff --git a/sci-biology/trf-bin/files/trf.definitions.txt b/sci-biology/trf-bin/files/trf.definitions.txt

26

new file mode 100644

27

index 0000000..ddedf76

28

--- /dev/null

29

+++ b/sci-biology/trf-bin/files/trf.definitions.txt

30

@@ -0,0 +1,159 @@

31

+   [1][trflogo.png]

32

+

33

+

34

+

35

+FASTA Format:

36

+

37

+   The FASTA format is a plain text format which looks something like

38

+   this:

39

+

40

+   >myseq

41

+   AGTCGTCGCT AGCTAGCTAG CATCGAGTCT TTTCGATCGA GGACTAGACT TCTAGCTAGC

42

+   TAGCATAGCA TACGAGCATA TCGGTCATGA GACTGATTGG GCTTTAGCTA GCTAGCATAG

43

+   CATACGAGCA TATCGGTAGA CTGATTGGGT TTAGGTTACC

44

+

45

+   The first line starts with a greater than sign ">" and contains a name

46

+   or other identifier for the sequence. This is the sequence header and

47

+   must be in a single line. The remaining lines contain the sequence

48

+   data. The sequence can be in upper or lower case letters. Anything

49

+   other than letters (numbers for example) is ignored. Multiple sequences

50

+   can be present in the same file as long as each sequence has its own

51

+   header.

52

+

53

+Table Explanation:

54

+

55

+   The summary table includes the following information:

56

+    1. Indices of the repeat relative to the start of the sequence.

57

+    2. Period size of the repeat.

58

+    3. Number of copies aligned with the consensus pattern.

59

+    4. Size of consensus pattern (may differ slightly from the period

60

+       size).

61

+    5. Percent of matches between adjacent copies overall.

62

+    6. Percent of indels between adjacent copies overall.

63

+    7. Alignment score.

64

+    8. Percent composition for each of the four nucleotides.

65

+    9. Entropy measure based on percent composition.

66

+

67

+   If the output contains more than 120 repeats, multiple linked tables

68

+   are produced. The links to the other tables appear at the top and

69

+   bottom of each table.

70

+

71

+   Note: If you save multiple linked summary table files, use the default

72

+   names supplied by your browser to preserve the automatic linking.

73

+

74

+Alignment Explanation:

75

+

76

+   The alignment is presented as follows:

77

+    1. In each pair of lines, the actual sequence is on the top and a

78

+       consensus sequence for all the copies is on the bottom.

79

+    2. Each pair of lines is one period except for very small patterns.

80

+    3. The 10 sequence characters before and after a repeat are shown.

81

+    4. Symbol * indicates a mismatch.

82

+    5. Symbol - indicates an insertion or deletion.

83

+    6. Statistics refers to the matches, mismatches and indels overall

84

+       between adjacent copies in the sequence, not between the sequence

85

+       and the consensus pattern.

86

+    7. Distances between matching characters at corresponding positions

87

+       are listed as distance, number at that distance, percentage of all

88

+       matches.

89

+    8. ACGTcount is percentage of each nucleotide in the repeat sequence.

90

+    9. Consensus sequence is shown by itself.

91

+   10. If chosen as an option, 500 characters of flanking sequence on each

92

+       side of the repeat are shown.

93

+

94

+   Note: If you save the alignment file, use the default name supplied by

95

+   your browser to preserve the automatic cross-referencing with the

96

+   summary table.

97

+

98

+Program Parameters:

99

+

100

+   Input to the program consists of a sequence file and the following

101

+   parameters:

102

+    1. Alignment Parameters. Weights for match, mismatch and indels. These

103

+       parameters are for Smith-Waterman style local alignment using

104

+       wraparound dynamic programming. Lower weights allow alignments with

105

+       more mismatches and indels. Match weight is +2 in all options here.

106

+       Mismatch and indel weights (interpreted as negative numbers) are

107

+       either 3, 5, or 7. A 3 is more permissive and a 7 less permissive

108

+       of these types of alignments choices.

109

+    2. Minimum Alignment Score. The alignment score must meet or exceed

110

+       this value for the repeat to be reported.

111

+    3. Maximum Period Size. The period size must be no larger than this

112

+       value for the repeat to be reported. Period size is the programs

113

+       best guess at the pattern size of the tandem repeat. The program

114

+       will find all repeats with period size between 1 and 2000.

115

+    4. Maximum TR array size. Specifies the longest TR array (the complete

116

+       repeating sequence) expected to be found in the input, in millions

117

+       of base pairs. Some sequences have very long TR arrays, such as

118

+       chromosome 18 in HG38 which has an array measuring over 5.3 million

119

+       base pairs.

120

+    5. Detection Parameters. Matching probability Pm and indel probability

121

+       Pi. Pm = .80 and Pi = .10 by default and cannot be modified in this

122

+       version of the program.

123

+

124

+Options:

125

+

126

+    1. Flanking sequence. Flanking sequence consists of the 500

127

+       nucleotides on each side of a repeat. Flanking sequence is recorded

128

+       in the alignment file. This may be useful for PCR primer

129

+       determination.

130

+    2. Masked Sequence File. The masked sequence file is a [2]FASTA format

131

+       file containing a copy of the sequence with every character that

132

+       occurred in a tandem repeat changed to the letter 'N'. The word

133

+       "masked" is added to the sequence description line just after the

134

+       '>' character.

135

+    3. Data File. The data file is a text file which contains the same

136

+       information, in the same order, as the repeat [3]table file, plus

137

+       consensus and repeat sequences. This file contains no labeling and

138

+       is suitable for additional processing, for example with a perl

139

+       script, outside of the program.

140

+

141

+

142

+

143

+     [4][USEMAP:buttonarrow.png]  [5]Home

144

+

145

+      [6][USEMAP:buttonarrow.png]  [7]What's New

146

+

147

+     [8][USEMAP:buttonarrow.png]  [9]Submit Page

148

+

149

+     [10][USEMAP:buttonarrow.png]   [11]Downloads

150

+

151

+

152

+

153

+     __________________________________________________________________

154

+

155

+   [12][bu.gif] Last revised February 22, 2016

156

+   Send any questions or comments to:

157

+   [13]Yozen Hernandez

158

+

159

+References

160

+

161

+   1. http://tandem.bu.edu/trf/trf.html

162

+   2. http://tandem.bu.edu/trf/trf.definitions.html#fasta

163

+   3. http://tandem.bu.edu/trf/trf.definitions.html#table

164

+   4. LYNXIMGMAP:http://tandem.bu.edu/trf/trf.definitions.html#FPMap0

165

+   5. http://tandem.bu.edu/trf/trf.html

166

+   6. LYNXIMGMAP:http://tandem.bu.edu/trf/trf.definitions.html#FPMap3

167

+   7. http://tandem.bu.edu/trf/trf.whatnew.html

168

+   8. LYNXIMGMAP:http://tandem.bu.edu/trf/trf.definitions.html#FPMap1

169

+   9. http://tandem.bu.edu/trf/trf.submit.options.html

170

+  10. LYNXIMGMAP:http://tandem.bu.edu/trf/trf.definitions.html#FPMap2

171

+  11. http://tandem.bu.edu/trf/trf.download.html

172

+  12. http://www.bu.edu/

173

+  13. javascript:spamGuard('yhernand','bu.edu')

174

+

175

+[USEMAP]

176

+http://tandem.bu.edu/trf/trf.definitions.html#FPMap2

177

+   1. http://tandem.bu.edu/trf/trf.download.html

178

+

179

+[USEMAP]

180

+http://tandem.bu.edu/trf/trf.definitions.html#FPMap1

181

+   1. http://tandem.bu.edu/trf/trf.submit.options.html

182

+

183

+[USEMAP]

184

+http://tandem.bu.edu/trf/trf.definitions.html#FPMap3

185

+   1. http://tandem.bu.edu/trf/trf.whatnew.html

186

+

187

+[USEMAP]

188

+http://tandem.bu.edu/trf/trf.definitions.html#FPMap0

189

+   1. http://tandem.bu.edu/trf/trf.html

190

191

diff --git a/sci-biology/trf-bin/files/trf.txt b/sci-biology/trf-bin/files/trf.txt

192

new file mode 100644

193

index 0000000..f75ec5f

194

--- /dev/null

195

+++ b/sci-biology/trf-bin/files/trf.txt

196

@@ -0,0 +1,176 @@

197

+   [1][trflogo.png]

198

+

199

+

200

+Using Command Line Version of Tandem Repeats Finder

201

+

202

+   Once the program is installed you can run it with no parameters to

203

+   obtain information on proper usage syntax.

204

+

205

+   If you installed the program as trf then by typing trf at the command

206

+   line you will see the following output:

207

+

208

+Please use: trf File Match Mismatch Delta PM PI Minscore MaxPeriod [options]

209

+

210

+Where: (all weights, penalties, and scores are positive)

211

+  File = sequences input file

212

+  Match  = matching weight

213

+  Mismatch  = mismatching penalty

214

+  Delta = indel penalty

215

+  PM = match probability (whole number)

216

+  PI = indel probability (whole number)

217

+  Minscore = minimum alignment score to report

218

+  MaxPeriod = maximum period size to report

219

+  [options] = one or more of the following:

220

+        -m        masked sequence file

221

+        -f        flanking sequence

222

+        -d        data file

223

+        -h        suppress html output

224

+        -r        no redundancy elimination

225

+        -l <n>    maximum TR length expected (in millions) (eg, -l 3 or -l=3 for

226

+ 3 million)

227

+

228

+Note the sequence file should be in FASTA format:

229

+

230

+>Name of sequence

231

+aggaaacctgccatggcctcctggtgagctgtcctcatccactgctcgctgcctctccag

232

+atactctgacccatggatcccctgggtgcagccaagccacaatggccatggcgccgctgt

233

+actcccacccgccccaccctcctgatcctgctatggacatggcctttccacatccctgtg

234

+

235

+

236

+   The program accepts a minimum of eight parameters. Options can be

237

+   specified to generate additional files.

238

+

239

+   The following is a more detailed description of the parameters:

240

+     * File: The sequence file to be analyzed in FASTA format( [2]see for

241

+       details). Multiple sequence in the same file are allowed.

242

+     * Match, Mismatch, and Delta: Weights for match, mismatch and indels.

243

+       These parameters are for Smith-Waterman style local alignment using

244

+       wraparound dynamic programming. Lower weights allow alignments with

245

+       more mismatches and indels. A match weight of 2 has proven

246

+       effective with mismatch and indel penalties in the range of 3 to 7.

247

+       Mismatch and indel weights are interpreted as negative numbers. A 3

248

+       is more permissive and a 7 less permissive. The recomended values

249

+       for Match Mismatch and Delta are 2, 7, and 7 respectively.

250

+     * PM and PI: Probabilistic data is available for PM values of 80 and

251

+       75 and PI values of 10 and 20. The best performance can be achieved

252

+       with values of PM=80 and PI=10. Values of PM=75 and PI=20 give

253

+       results which are very similar, but often require as much as ten

254

+       times the processing time when compared with values of PM=80 and

255

+       PI=10.

256

+     * Minscore: The alignment of a tandem repeat must meet or exceed this

257

+       alignment score to be reported. For example, if we set the matching

258

+       weight to 2 and the minimun score to 50, assuming perfect

259

+       alignment, we will need to align at least 25 characters to meet the

260

+       minimum score (for example 5 copies with a period of size 5).

261

+     * Maxperiod: Period size is the program's best guess at the pattern

262

+       size of the tandem repeat. The program will find all repeats with

263

+       period size between 1 and 2000, but the output can be limited to a

264

+       smaller range.

265

+     * -m: This is an optional parameter and when present instructs the

266

+       program to generate a masked sequence file. The masked sequence

267

+       file is a FASTA format file containing a copy of the sequence with

268

+       every location that occurred in a tandem repeat changed to the

269

+       letter 'N'. The word "masked" is added to the sequence description

270

+       line just after the '>' character.

271

+     * -f: If this option is present, flanking sequence around each repeat

272

+       is recorded in the alignment file. This may be useful for PCR

273

+       primer determination. Flanking sequence consists of the 500

274

+       nucleotides on each side of a repeat.

275

+     * -d: A data file is produced if this option is present. This file is

276

+       a text file which contains the same information, in the same order,

277

+       as the summary table file, plus consensus pattern and repeat

278

+       sequences. This file contains no labeling and is suitable for

279

+       additional processing, for example with a perl script, outside of

280

+       the program.

281

+     * -h: suppress HTML output (this automatically switches -d to ON)

282

+     * -l <n>: Specifies that the longest TR array expected in the input

283

+       is at most n million bp long. The default is 2 (for 2 million).

284

+       Setting this option too high may result in an error message if you

285

+       did not have enough availablememory. We have only tested this

286

+       option uo to value 29.

287

+     * -u: Prints the help/usage message above

288

+     * -v: Prints the version information

289

+     * -ngs: More compact .dat output on multisequence files, returns 0 on

290

+       success. You may pipe input in with this option using - for file

291

+       name. Short 50 flanks are appended to .dat output. .dat output

292

+       actually goes to stdout instead of file. Sequence headers are

293

+       displayed in output as @header. Only headers containing repeats are

294

+       shown.

295

+

296

+   Using recommended parameters the command line will look something like:

297

+   trf yoursequence.txt 2 7 7 80 10 50 500 -f -d -m

298

+

299

+   Once the program starts running it will print update messages to the

300

+   screen. The word "Done" will be printed when the program finishes.

301

+

302

+   For single sequence input files there will be at least two HTML format

303

+   output files, a repeat table file and an alignment file. If the number

304

+   of repeats found is greater than 120, multiple linked repeat tables are

305

+   produced. The links to the other tables appear at the top and the

306

+   bottom of each table. To view the results start by opening the first

307

+   repeat table file with your web browser. This file has the extension

308

+   ".1.html". Alignment files can be accessed from the repeat table files.

309

+   Alignment files end with the ".txt.html" extension.

310

+

311

+   For input files containing multiple sequences a summary page is

312

+   produced that links to the output of individual sequences. This file

313

+   has the extension "summary.html". You should start by opening this file

314

+   if your input had multiple sequences in the same file. Also note that

315

+   the output files of individual sequences will have an identifier of the

316

+   form ".sn." ( n an integer) embedded in the name indicating the index

317

+   of the sequence in the input file. The identifier is omitted for single

318

+   sequence input files.

319

+

320

+   For more information on the output please see [3]Table Explanation and

321

+   [4]Alignment Explanation.

322

+

323

+

324

+

325

+

326

+     [5][USEMAP:buttonarrow.png]  [6]Home

327

+

328

+      [7][USEMAP:buttonarrow.png]  [8]What's New

329

+

330

+     [9][USEMAP:buttonarrow.png]  [10]Submit Page

331

+

332

+     [11][USEMAP:buttonarrow.png]   [12]Downloads

333

+

334

+

335

+     __________________________________________________________________

336

+

337

+   [13][bu.gif] Last revised September 11, 2003

338

+   Send any questions or comments to:

339

+   [14]Yozen Hernandez

340

+

341

+References

342

+

343

+   1. http://tandem.bu.edu/trf/trf.html

344

+   2. http://tandem.bu.edu/trf/trf.definitions.html#fasta

345

+   3. http://tandem.bu.edu/trf/trf.definitions.html#table

346

+   4. http://tandem.bu.edu/trf/trf.definitions.html#alignment

347

+   5. LYNXIMGMAP:http://tandem.bu.edu/trf/trf.unix.help.html#FPMap0

348

+   6. http://tandem.bu.edu/trf/trf.html

349

+   7. LYNXIMGMAP:http://tandem.bu.edu/trf/trf.unix.help.html#FPMap3

350

+   8. http://tandem.bu.edu/trf/trf.whatnew.html

351

+   9. LYNXIMGMAP:http://tandem.bu.edu/trf/trf.unix.help.html#FPMap1

352

+  10. http://tandem.bu.edu/trf/trf.submit.options.html

353

+  11. LYNXIMGMAP:http://tandem.bu.edu/trf/trf.unix.help.html#FPMap2

354

+  12. http://tandem.bu.edu/trf/trf.download.html

355

+  13. http://www.bu.edu/

356

+  14. javascript:spamGuard('yhernand','bu.edu')

357

+

358

+[USEMAP]

359

+http://tandem.bu.edu/trf/trf.unix.help.html#FPMap2

360

+   1. http://tandem.bu.edu/trf/trf.download.html

361

+

362

+[USEMAP]

363

+http://tandem.bu.edu/trf/trf.unix.help.html#FPMap1

364

+   1. http://tandem.bu.edu/trf/trf.submit.options.html

365

+

366

+[USEMAP]

367

+http://tandem.bu.edu/trf/trf.unix.help.html#FPMap3

368

+   1. http://tandem.bu.edu/trf/trf.whatnew.html

369

+

370

+[USEMAP]

371

+http://tandem.bu.edu/trf/trf.unix.help.html#FPMap0

372

+   1. http://tandem.bu.edu/trf/trf.html

373

374

diff --git a/sci-biology/trf-bin/files/trf.whatsnew.txt b/sci-biology/trf-bin/files/trf.whatsnew.txt

375

new file mode 100644

376

index 0000000..8967a7d

377

--- /dev/null

378

+++ b/sci-biology/trf-bin/files/trf.whatsnew.txt

379

@@ -0,0 +1,243 @@

380

+   [1][trflogo.png]

381

+

382

+

383

+[newspaper.png] What's New?

384

+

385

+New For Version 4.09 (Feb 22, 2016)

386

+

387

+   [checkmark.png] new -l/-L flag allows the user to specify the length of

388

+   the longest expected TR array in the input sequence, in millions. The

389

+   default value is 2, for 2 million bp. For HG38, a value of 6 is

390

+   necessary.

391

+

392

+   Example usage:

393

+trf409.linux64.exe hg38.fasta 2 5 7 80 10 50 2000 -l 6

394

+

395

+   [checkmark.png] Setting a sufficiently high value may result in a

396

+   crash, very long execution time, or a sharp drop in available memory on

397

+   your system. We have only tested up to a value of 25.

398

+

399

+   [checkmark.png] For workloads requiring over 3GB of RAM (any value of

400

+   -l above 5), the 32-bit builds cannot be used.

401

+

402

+   [checkmark.png] New -v/-V flag allows you to quickly check your version

403

+   of TRF.

404

+

405

+   [checkmark.png] New -u/-U flag allows you to quickly display the usage

406

+   help text.

407

+

408

+

409

+New For Version 4.07b

410

+

411

+   [checkmark.png] new -ngs flag allows more compact output and piping of

412

+   input on linux systems, returns standard linux exit value of 0 on

413

+   success

414

+

415

+   [checkmark.png] temporary output is no longer written to disk

416

+

417

+   [checkmark.png] changed alignment to go further when score drops to 0

418

+   on first pass, more repeats are reported now

419

+

420

+   [checkmark.png] fixed bestlist structure deallocations, this

421

+   significantly improves run speed on multi-sequence fasta files

422

+

423

+   [checkmark.png] fixed line in alignment files "file K of N", K was off

424

+   by 1 before

425

+

426

+   [checkmark.png] Added check to make sure all required parameters are

427

+   entered from the commandline

428

+

429

+

430

+New For Version 4.04

431

+

432

+   [checkmark.png] Widened radius of narrowband alignment to avoid losing

433

+   alignment in some cases

434

+

435

+

436

+New For Version 4.03

437

+

438

+   [checkmark.png] Added -R switch to produce output without redundancy if

439

+   desired

440

+

441

+   [checkmark.png] Fixed a bug in the redundancy algorithm which sometimes

442

+   caused repeats to vanish

443

+

444

+   [checkmark.png] Fixed a bug when N characters being part of pattern

445

+   cause problems

446

+

447

+

448

+New For Version 4.00

449

+

450

+   [checkmark.png] Improved longer period detection and period 1 detection

451

+

452

+   [checkmark.png] Improved alignment

453

+

454

+   [checkmark.png] Added a flag to suppress HTML output (-h)

455

+

456

+   [checkmark.png] Fixed a loading sequence problem with incomplete FASTA

457

+   format files

458

+

459

+   [checkmark.png] Increased minimum period for larger repeats from 1.9 to

460

+   1.8 (patternsize > 50)

461

+

462

+   [checkmark.png] Added Linux GUI version (GTK+)

463

+

464

+

465

+New For Version 3.21

466

+

467

+   [checkmark.png] Fixed Sequence Name Bug: This bug affected the parsing

468

+   of FASTA header on Windows versions of the program. The bug caused the

469

+   program to report a sequence name that had a control character appended

470

+   at the end and a missing parameters line in the output data file. This

471

+   bug surfaced as a result of version 3.20 fixes.

472

+

473

+

474

+New For Version 3.20

475

+

476

+   [checkmark.png] Improved Redundancy Control: We have improved the

477

+   program's ability to remove redundant versions of the same tandem

478

+   repeat. On earlier versions certain conditions could cause the

479

+   algorithm to leave a redundant version in the output. The new version

480

+   properly identifies these and removes them.

481

+

482

+   [checkmark.png] Improved End-of-Line Identification: Different

483

+   operating systems use different conventions for end of line (EOL)

484

+   character sequence in text files. We have made improvements in the

485

+   routines that allows TRF to read sequence text from files using various

486

+   EOL conventions.

487

+

488

+   [checkmark.png] Fixed Memory Overrun Bug: We have corrected a problem

489

+   where some large-pattern, low-scoring repeats could cause a memory

490

+   fault in previous versions. Thanks to Angie Hinrichs at UC Santa Cruz's

491

+   Genome Bioinformatics group for the bug report and the offending

492

+   sequence.

493

+

494

+

495

+

496

+Previous Update (Version 3.01)

497

+

498

+   [checkmark.png] Unlimited Sequence Size: We have eliminated the

499

+   sequence size restriction of previous versions. In this version you are

500

+   only limited by the memory available in your system.

501

+

502

+   [checkmark.png] Multi-Sequence Files: The program handles files

503

+   containing multiple sequences. Each sequence must contain its own FASTA

504

+   header. A summary page is produced which links to the results of

505

+   individual sequences.

506

+

507

+   [checkmark.png] Data File Includes Repeat Sequence: We now include

508

+   complete repeat sequence for each repeat record reported in the data

509

+   file.

510

+

511

+   [checkmark.png] Smaller Scores: The downloadable versions of the

512

+   program are now able to report matches with scores as low as 20. We

513

+   recommend caution when using this feature since very large output files

514

+   can be generated at this score level.

515

+

516

+   [checkmark.png] Longer Patter Sizes: The program finds repeats with

517

+   period size as large as 2000 base pairs.

518

+

519

+   [checkmark.png] New File Naming Convention: For input files containing

520

+   a single sequence the naming convention for output files has not been

521

+   changed. For input files containing multiple sequences a summary page

522

+   is produced. This file has the extension "summary.html" and contains

523

+   links to the repeat tables of the individual sequences. In the name of

524

+   each of those repeat tables and their alignment files, an additional

525

+   identifier ".sn." ( n an integer, for example: ".s3.") has been

526

+   inserted before the parameters to indicate the sequence index in the

527

+   input file.

528

+

529

+   [checkmark.png] Repeat Table Changes: Each table now shows the total

530

+   number of repeats found in the sequence and links to other tables have

531

+   been added at the bottom of the page.

532

+

533

+   [checkmark.png] Longer Flanking Sequence: 500 characters of flanking

534

+   sequence on each side of the repeat are now reported.

535

+

536

+

537

+   Previous Update (Version 2.02)

538

+

539

+   [checkmark.png] Multiple Repeat Tables: If the output contains more

540

+   than 140 repeats, multiple linked repeat tables and alignment files

541

+   will be produced. This will speed downloading time and overcome

542

+   problems with tables too big for web browsers to format.

543

+

544

+   [checkmark.png] Consensus Sequence: The program prints the consensus

545

+   sequence for each repeat in the alignment file, below the alignment.

546

+

547

+   [checkmark.png] Flanking Sequence: As an option, the program prints 200

548

+   characters of flanking sequence from each side of the repeat. This may

549

+   be useful for PCR primer determination. Find it in the alignment file.

550

+

551

+   [checkmark.png] Masked Sequence File: As an option, the program returns

552

+   a copy of the original sequence with the tandem repeats "masked" out.

553

+   The masked sequence file is a [2]FASTA format file with every tandem

554

+   repeat character changed to the letter 'N'. The word "masked" is added

555

+   to the sequence description line just after the '>' character.

556

+

557

+   [checkmark.png] Data File: As an option, the program returns a text

558

+   file containing the same data, in the same order, as the summary table

559

+   file, plus consensus sequences, but without any labels or formatting

560

+   instructions. This file is suitable for automated processing, for

561

+   example with a perl script.

562

+

563

+   [checkmark.png] Select Parameters: Now you can select parameters when

564

+   you submit a sequence, or simply use the default parameters. Visit the

565

+   [3]Submit Options Page for more details.

566

+

567

+   [checkmark.png] Sequence Alphabet: The program now handles sequences

568

+   containing letter other than A, C, G, and T.

569

+

570

+   [checkmark.png] Enhanced Alignment File: We have modified the

571

+   presentation of the alignment file. The output should be easier to view

572

+   and print.

573

+

574

+   [checkmark.png] Automatic Redundancy Removal: The program now reports

575

+   only the smallest period size for a repeat unless a larger period size

576

+   has a significantly higher score.

577

+

578

+   [checkmark.png] Windows Version Now Available: A Windows version of the

579

+   program is now available for download. This version of the the program

580

+   can be run under Windows 95/98 and Windows NT 4. Please visit our

581

+   [4]Download Page for more details.

582

+

583

+

584

+

585

+

586

+     [5][USEMAP:buttonarrow.png]  [6]Home

587

+

588

+     [7][USEMAP:buttonarrow.png]  [8]Submit Page

589

+

590

+     [9][USEMAP:buttonarrow.png]  [10]Downloads

591

+     __________________________________________________________________

592

+

593

+   [11][bu.gif] Last revised February 22, 2016

594

+   Send any questions or comments to:

595

+   [12]Yozen Hernandez

596

+

597

+References

598

+

599

+   1. http://tandem.bu.edu/trf/trf.html

600

+   2. http://tandem.bu.edu/trf/trf.definitions.html#fasta

601

+   3. http://tandem.bu.edu/trf/trf.submit.options.html

602

+   4. http://tandem.bu.edu/trf/trf.download.html

603

+   5. LYNXIMGMAP:http://tandem.bu.edu/trf/trf.whatnew.html#FPMap0

604

+   6. http://tandem.bu.edu/trf/trf.html

605

+   7. LYNXIMGMAP:http://tandem.bu.edu/trf/trf.whatnew.html#FPMap1

606

+   8. http://tandem.bu.edu/trf/trf.submit.options.html

607

+   9. LYNXIMGMAP:http://tandem.bu.edu/trf/trf.whatnew.html#FPMap2

608

+  10. http://tandem.bu.edu/trf/trf.download.html

609

+  11. http://www.bu.edu/

610

+  12. javascript:spamGuard('yhernand','bu.edu')

611

+

612

+[USEMAP]

613

+http://tandem.bu.edu/trf/trf.whatnew.html#FPMap2

614

+   1. http://tandem.bu.edu/trf/trf.download.html

615

+

616

+[USEMAP]

617

+http://tandem.bu.edu/trf/trf.whatnew.html#FPMap1

618

+   1. http://tandem.bu.edu/trf/trf.submit.options.html

619

+

620

+[USEMAP]

621

+http://tandem.bu.edu/trf/trf.whatnew.html#FPMap0

622

+   1. http://tandem.bu.edu/trf/trf.html

623

624

diff --git a/sci-biology/trf-bin/metadata.xml b/sci-biology/trf-bin/metadata.xml

625

new file mode 100644

626

index 0000000..959160f

627

--- /dev/null

628

+++ b/sci-biology/trf-bin/metadata.xml

629

@@ -0,0 +1,8 @@

630

+<?xml version="1.0" encoding="UTF-8"?>

631

+<!DOCTYPE pkgmetadata SYSTEM "http://www.gentoo.org/dtd/metadata.dtd">

632

+<pkgmetadata>

633

+  <maintainer type="project">

634

+    <email>sci-biology@g.o</email>

635

+    <name>Gentoo Biology Project</name>

636

+  </maintainer>

637

+</pkgmetadata>

638

639

diff --git a/sci-biology/trf-bin/trf-bin-4.09.ebuild b/sci-biology/trf-bin/trf-bin-4.09.ebuild

640

new file mode 100644

641

index 0000000..11fae9c

642

--- /dev/null

643

+++ b/sci-biology/trf-bin/trf-bin-4.09.ebuild

644

@@ -0,0 +1,53 @@

645

+# Copyright 1999-2017 Gentoo Foundation

646

+# Distributed under the terms of the GNU General Public License v2

647

+# $Id$

648

+

649

+EAPI=6

650

+

651

+inherit eutils

652

+

653

+MY_PV="${PV/.}" # drop the dot

654

+MY_PN="trf"

655

+MY_P="trf${MY_PV}"

656

+

657

+DESCRIPTION="Tandem Repeats Finder"

658

+HOMEPAGE="http://tandem.bu.edu/trf/trf.html"

659

+SRC_URI="x86? ( http://tandem.bu.edu/trf/downloads/${MY_P}.linux32 )

660

+	amd64? ( http://tandem.bu.edu/trf/downloads/${MY_P}.linux64 )"

661

+

662

+LICENSE="trf"	# http://tandem.bu.edu/trf/trf.license.html

663

+SLOT="0"

664

+KEYWORDS="~amd64 ~x86"

665

+RESTRICT="mirror bindist"

666

+

667

+S="${WORKDIR}"

668

+

669

+QA_PREBUILT="opt/${MY_PN}/.*"

670

+

671

+src_unpack() {

672

+	if use x86; then

673

+		cp "${DISTDIR}/${MY_P}".linux32 "${S}/${MY_PN}" || die

674

+	elif use amd64; then

675

+		cp "${DISTDIR}/${MY_P}".linux64 "${S}/${MY_PN}" || die

676

+	else

677

+		eerror "Unsupported platform, check http://tandem.bu.edu/trf/downloads/"

678

+	fi

679

+	default

680

+}

681

+

682

+src_install() {

683

+	exeinto /opt/"${MY_PN}"/bin

684

+	doexe "${MY_PN}"

685

+	# GTK version (http://tandem.bu.edu/trf/downloads/trf400.linuxgtk.exe) has broken linking

686

+	#if use gtk; then

687

+	#	doexe trf400.linuxgtk.exe

688

+	#	make_desktop_entry /opt/${PN}/trf400.linuxgtk.exe "Tandem Repeats Finder" || die

689

+	#fi

690

+	# http://tandem.bu.edu/trf/trf.unix.help.html

691

+	# http://tandem.bu.edu/trf/trf.definitions.html

692

+	# http://tandem.bu.edu/trf/trf.whatnew.html

693

+	dodoc \

694

+		"${FILESDIR}/"trf.txt \

695

+		"${FILESDIR}/"trf.definitions.txt \

696

+		"${FILESDIR}/"trf.whatsnew.txt

697

+}

Gentoo Archives: gentoo-commits