nthash

v0.4.0 Latest
Published: Sep 16, 2021 License: MIT Imports: 3 Imported by: 7

README

ntHash

ntHash implementation in Go


Overview

This is a Go implementation of the ntHash recursive hash function for hashing all possible k-mers in a DNA/RNA sequence.
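To illustrate what "recursive" means here, the following is a minimal sketch of a forward-strand rolling hash in the ntHash style: the first k-mer is hashed from scratch, and each later hash is derived from the previous one by rotating it, removing the outgoing base, and adding the incoming base. The function names (`seed`, `directHash`, `rollingHashes`) and the seed constants are assumptions made for this example; they are not taken from this package and may differ from what it uses internally.

```go
package main

import (
	"fmt"
	"math/bits"
)

// seed maps a base to a 64-bit word. These values are illustrative
// placeholders (an assumption), not necessarily the package's constants.
func seed(b byte) uint64 {
	switch b {
	case 'A':
		return 0x3c8bfbb395c60474
	case 'C':
		return 0x3193c18562a02b4c
	case 'G':
		return 0x20323ed082572324
	case 'T':
		return 0x295549f54be24456
	}
	return 0 // any other symbol (e.g. N) contributes nothing
}

// directHash hashes one k-mer from scratch: each base's seed ends up
// rotated left by its distance from the end of the k-mer, and all the
// rotated seeds are XORed together.
func directHash(kmer []byte) uint64 {
	var h uint64
	for _, b := range kmer {
		h = bits.RotateLeft64(h, 1) ^ seed(b)
	}
	return h
}

// rollingHashes returns the forward hash of every k-mer in seq. Only the
// first hash is computed from scratch; each later hash is derived from
// its predecessor in constant time.
func rollingHashes(seq []byte, k int) []uint64 {
	if k <= 0 || k > len(seq) {
		return nil
	}
	hashes := make([]uint64, 0, len(seq)-k+1)
	h := directHash(seq[:k])
	hashes = append(hashes, h)
	for i := k; i < len(seq); i++ {
		out, in := seq[i-k], seq[i]
		// rotate the old hash, cancel the outgoing base, add the incoming base
		h = bits.RotateLeft64(h, 1) ^ bits.RotateLeft64(seed(out), k) ^ seed(in)
		hashes = append(hashes, h)
	}
	return hashes
}

func main() {
	seq := []byte("ACGTCGTCAGTCGATGCAGT")
	for i, h := range rollingHashes(seq, 11) {
		fmt.Printf("%s\t%016x\n", seq[i:i+11], h)
	}
}
```

Because rotation distributes over XOR, the rolled value for each window equals the value `directHash` would give for that window, which is what makes the constant-time update valid.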

For more information, read the ntHash paper by Mohamadi et al. or check out their C++ implementation.

This implementation was inspired by Luiz Irber and his recent blog post on his cool Rust ntHash implementation.

I have coded this up in Go so that ntHash can be used in my HULK and GROOT projects, but feel free to use it in your own.

Installation

go get github.com/will-rowe/nthash

Example usage

range over ntHash values for a sequence

package main

import (
    "log"
    "github.com/will-rowe/nthash"
)

var (
    sequence = []byte("ACGTCGTCAGTCGATGCAGTACGTCGTCAGTCGATGCAGT")
    kmerSize = 11
)

func main() {

    // create the ntHash iterator using a pointer to the sequence and a k-mer size
    // (NewHasher takes the k-mer size as a uint)
    hasher, err := nthash.NewHasher(&sequence, uint(kmerSize))

    // check for errors (e.g. bad k-mer size choice)
    if err != nil {
        log.Fatal(err)
    }

    // collect the hashes by ranging over the hash channel produced by the Hash method
    canonical := true
    for hash := range hasher.Hash(canonical) {
        log.Println(hash)
    }
}

Documentation

Overview

Package nthash is a port of the ntHash (https://github.com/bcgsc/ntHash) recursive hash function for DNA k-mers.

It was inspired by the Rust port by Luiz Irber (https://github.com/luizirber/nthash).

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type NTHi

type NTHi struct {
	// contains filtered or unexported fields
}

NTHi is the ntHash iterator

func NewHasher

func NewHasher(seq *[]byte, k uint) (*NTHi, error)

NewHasher is the constructor function for the ntHash iterator. seq is a pointer to the sequence being hashed. k is the k-mer size to use.

func (*NTHi) Hash

func (nthi *NTHi) Hash(canonical bool) <-chan uint64

Hash returns a channel to range over the ntHash values of a sequence. Set canonical to true to return the hashes of the canonical k-mers; otherwise the forward hashes are returned.
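As a rough illustration of the difference between forward and canonical hashes, the sketch below hashes a k-mer and its reverse complement and keeps the smaller value, so both strands of the same k-mer map to one hash. Taking the minimum is an assumption based on how ntHash is usually described; the helper names and seed constants are hypothetical, and the package computes its values incrementally rather than from scratch as done here.

```go
package main

import (
	"fmt"
	"math/bits"
)

// seed maps a base to a 64-bit word (illustrative placeholder values).
func seed(b byte) uint64 {
	switch b {
	case 'A':
		return 0x3c8bfbb395c60474
	case 'C':
		return 0x3193c18562a02b4c
	case 'G':
		return 0x20323ed082572324
	case 'T':
		return 0x295549f54be24456
	}
	return 0
}

// complement returns the Watson-Crick complement of a base.
func complement(b byte) byte {
	switch b {
	case 'A':
		return 'T'
	case 'C':
		return 'G'
	case 'G':
		return 'C'
	case 'T':
		return 'A'
	}
	return 'N'
}

// reverseComplement returns the reverse complement of a k-mer.
func reverseComplement(kmer []byte) []byte {
	rc := make([]byte, len(kmer))
	for i, b := range kmer {
		rc[len(kmer)-1-i] = complement(b)
	}
	return rc
}

// forwardHash hashes a k-mer as written, left to right.
func forwardHash(kmer []byte) uint64 {
	var h uint64
	for _, b := range kmer {
		h = bits.RotateLeft64(h, 1) ^ seed(b)
	}
	return h
}

// canonicalHash returns the smaller of the forward hash and the hash of
// the reverse complement (assumed rule), making the result strand-independent.
func canonicalHash(kmer []byte) uint64 {
	fh := forwardHash(kmer)
	rh := forwardHash(reverseComplement(kmer))
	if rh < fh {
		return rh
	}
	return fh
}

func main() {
	kmer := []byte("ACGTCGTCAGT")
	fmt.Printf("forward:   %016x\n", forwardHash(kmer))
	fmt.Printf("canonical: %016x\n", canonicalHash(kmer))
	fmt.Printf("canonical of reverse complement: %016x\n", canonicalHash(reverseComplement(kmer)))
}
```

The last two printed values are equal by construction, which is the property that makes canonical hashes useful when the strand of a read is unknown.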

func (*NTHi) MultiHash

func (nthi *NTHi) MultiHash(canonical bool, numMultiHash uint) <-chan []uint64

MultiHash returns a channel to range over the multi ntHash values of a sequence. Set canonical to true to return the hashes of the canonical k-mers; otherwise the forward hashes are returned. numMultiHash sets the number of hashes to generate for each k-mer.
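One way several hashes per k-mer can be derived is to mix the single base hash with the hash index and the k-mer size, which avoids rehashing the k-mer. The sketch below follows that idea. The function `multiHashes` is hypothetical, and the mixing constants `multiSeed` and `multiShift` are assumptions recalled from the reference C++ ntHash; whether this package uses the same scheme or values is not confirmed by the documentation above.

```go
package main

import "fmt"

// Assumed mixing constants; treat them as illustrative only.
const (
	multiShift        = 27
	multiSeed  uint64 = 0x90b45d39fb6da1fa
)

// multiHashes derives n hash values from one base hash of a k-mer of
// size k. The first value is the base hash itself; each further value
// multiplies the base hash by a word built from the index and k, then
// folds the high bits back in with a shift and XOR.
func multiHashes(base uint64, k, n uint) []uint64 {
	if n == 0 {
		return nil
	}
	hashes := make([]uint64, n)
	hashes[0] = base
	for i := uint(1); i < n; i++ {
		h := base * (uint64(i) ^ (uint64(k) * multiSeed)) // wraps modulo 2^64
		h ^= h >> multiShift
		hashes[i] = h
	}
	return hashes
}

func main() {
	// an arbitrary stand-in for a k-mer's base hash
	base := uint64(0x0123456789abcdef)
	for i, h := range multiHashes(base, 11, 4) {
		fmt.Printf("hash %d: %016x\n", i, h)
	}
}
```

A scheme like this is convenient for Bloom filters and sketches, where each k-mer needs several hash values but only one pass over the sequence is wanted.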

func (*NTHi) Next

func (nthi *NTHi) Next(canonical bool) (uint64, bool)

Next returns the next ntHash value from an ntHash iterator
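The relationship between a pull-style `Next` method and a channel-returning method such as `Hash` can be sketched with a stand-in iterator. Everything here is hypothetical: `sliceIter` and `Stream` are names invented for the example, and the sketch assumes the boolean result reports whether a value was produced (false once the iterator is exhausted), which the documentation above does not state.

```go
package main

import "fmt"

// sliceIter is a stand-in iterator over precomputed values.
type sliceIter struct {
	values []uint64
	pos    int
}

// Next returns the next value and true, or 0 and false when exhausted
// (assumed convention for the boolean).
func (it *sliceIter) Next() (uint64, bool) {
	if it.pos >= len(it.values) {
		return 0, false
	}
	v := it.values[it.pos]
	it.pos++
	return v, true
}

// Stream wraps Next in a goroutine and exposes the values on a channel,
// closing the channel once the iterator is exhausted.
func (it *sliceIter) Stream() <-chan uint64 {
	ch := make(chan uint64)
	go func() {
		defer close(ch)
		for {
			v, ok := it.Next()
			if !ok {
				return
			}
			ch <- v
		}
	}()
	return ch
}

func main() {
	// pull style: the caller drives the loop
	pull := &sliceIter{values: []uint64{3, 1, 4}}
	for v, ok := pull.Next(); ok; v, ok = pull.Next() {
		fmt.Println("next:", v)
	}

	// channel style: range until the channel is closed
	push := &sliceIter{values: []uint64{3, 1, 4}}
	for v := range push.Stream() {
		fmt.Println("stream:", v)
	}
}
```

In this sketch the channel is unbuffered, so a consumer that stops ranging early leaves the producing goroutine blocked; a pull-style loop avoids that and also avoids per-value channel overhead.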
